Wikipedia:Link rot
This help page is a how-to guide. It details processes or procedures of some aspect(s) of Wikipedia's norms and practices. It is not one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community.
This page in a nutshell: Steps may be taken to reduce or repair dead external links.
Like most large websites, Wikipedia suffers from the phenomenon known as link rot, in which external links stop working (a link in this state is called a dead link) because the linked websites disappear, change their content, or move. This presents a significant threat to Wikipedia's verifiability policy and its source citation guideline.
In general, do not delete cited information solely because the URL to the source no longer works. The tools, procedures and processes outlined on this page can often recover or replace the source.
Preventing link rot
Automatic archiving
All new links added to Wikipedia are automatically saved to the Wayback Machine within about 24 hours. This is done by a program called "NoMore404", which the Internet Archive runs and maintains. It scans the IRC feed channels, extracts newly added external URLs, and saves a snapshot of each to the Wayback Machine. This system became active sometime after 2015, though earlier efforts were also made. Around 2010-2012, Archive.is attempted to archive all external links then existing on Wikipedia. The effort was incomplete, but a significant number of links were archived at Archive.is during this period, making it a major archival source that fills gaps in coverage. As of 2019, Archive.is is still making automated archives, though the extent and frequency of its coverage are unknown.
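The detection step described above can be sketched as follows. This is a minimal illustration, not NoMore404's actual code. It assumes the old and new wikitext of an edit are already in hand (rather than read from the IRC feed), and it uses a deliberately simple URL pattern. For each new link it builds a request URL for the Wayback Machine's public "Save Page Now" endpoint, which is https://web.archive.org/save/ followed by the page URL.

```python
import re

# Deliberately simple pattern: http(s) URLs ending at whitespace or at
# characters that usually terminate a link in wikitext markup.
URL_RE = re.compile(r"https?://[^\s\[\]<>|{}\"']+")


def new_external_urls(old_text: str, new_text: str) -> list:
    """Return URLs that appear in new_text but not in old_text, in order of first appearance."""
    old_urls = set(URL_RE.findall(old_text))
    seen = set()
    added = []
    for url in URL_RE.findall(new_text):
        if url not in old_urls and url not in seen:
            seen.add(url)
            added.append(url)
    return added


def save_page_now_url(url: str) -> str:
    """Build a Wayback Machine "Save Page Now" request URL for a page."""
    return "https://web.archive.org/save/" + url
```

A real archiver would also need to handle trailing punctuation, protocol-relative links and template-generated URLs, which this pattern does not.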
As of 2015, a Wikipedia bot and tool called WP:IABOT automates the repair of link rot. It runs continuously, checking the links in all articles on Wikipedia to see whether they are dead, saving pages to the Wayback Machine if no archive exists yet, and replacing dead links in the wikitext with archived versions. Although the bot runs automatically, end users can also direct it through its web interface, which is available in the page history tab under "Fix dead links".
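A bot like this has to find an existing archive for each dead link. One public way to query the Wayback Machine is its Availability API (https://archive.org/wayback/available). Given an optional timestamp, it returns JSON describing the closest snapshot. The sketch below only builds the query URL and interprets a response that has already been fetched and decoded. It is not how IABot itself is implemented, and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build an Availability API query; timestamp is YYYYMMDDhhmmss or a prefix of it."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_archive(response: dict) -> Optional[str]:
    """Return the closest usable snapshot URL from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    # Skip snapshots that captured an error page instead of the content.
    if closest.get("status", "200") != "200":
        return None
    return closest.get("url")
```

For a response shaped like {"archived_snapshots": {"closest": {"available": true, "status": "200", "url": "...", "timestamp": "..."}}}, closest_archive returns the snapshot URL. An empty archived_snapshots object yields None, meaning no archive was found.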
As of 2015, the periodic bot WP:WAYBACKMEDIC checks for link rot in the archive links themselves. Archive databases are dynamic: archives go missing or move, and new ones are added. This bot keeps archive links working.
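Maintaining archive links starts with knowing what each one points to. A Wayback Machine snapshot link normally has the form https://web.archive.org/web/TIMESTAMP/ORIGINAL-URL. The timestamp is up to 14 digits (YYYYMMDDhhmmss) and is sometimes followed by a two-letter modifier such as "id_". The hypothetical helper below splits such a link into its parts, so that a script could, for example, compare the original URL with the citation's |url= or re-check the snapshot. It is a sketch, not WaybackMedic's code.

```python
import re
from typing import Optional, Tuple

# Timestamp of 1-14 digits, optional modifier like "id_", then the original URL.
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$"
)


def parse_wayback_url(archive_url: str) -> Optional[Tuple[str, str]]:
    """Split a Wayback Machine snapshot URL into (timestamp, original URL).

    Returns None for anything that is not a single-snapshot link,
    such as /web/*/ listing pages.
    """
    m = WAYBACK_RE.match(archive_url)
    if not m:
        return None
    timestamp, original = m.groups()
    return timestamp, original
```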
Manual archiving
Suggestions for ways to manually improve archiving:
- Avoid bare URLs. Use citation templates such as {{cite web}} for citations, and {{webarchive}} for external links sections.
- Use a web archiving service such as Internet Archive, WebCite or Archive.is. A complete list is available at WP:List of web archives on Wikipedia. Add the archive URL and the date of the snapshot to |archiveurl= and |archivedate= in the citation template. If the link is not yet dead, include |deadurl=no; otherwise set |deadurl=yes.
- If the link is still live but has not yet been archived anywhere, log in to the archive service of your choice (if it requires an account) and request that the page be archived.
- Run WP:IABOT on pages via its user interface. Note that Featured content is automatically checked once a week via User:FA RotBot and IABot.
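The manual steps above come down to adding a few parameters to a citation. The sketch below assembles a {{cite web}} citation using the parameter names given on this page (|archiveurl=, |archivedate=, |deadurl=). The function and its arguments are hypothetical. Citation templates have renamed some parameters over time, so check the template's own documentation before relying on these names.

```python
from typing import Optional


def cite_web(url: str, title: str, access_date: str,
             archive_url: Optional[str] = None,
             archive_date: Optional[str] = None,
             dead: Optional[bool] = None) -> str:
    """Return {{cite web}} wikitext with optional archive parameters.

    Values containing "|" or "}}" would need escaping, which this sketch does not do.
    """
    params = [("url", url), ("title", title), ("accessdate", access_date)]
    if archive_url:
        if not archive_date:
            raise ValueError("archive_date is required with archive_url")
        params += [("archiveurl", archive_url), ("archivedate", archive_date)]
        if dead is not None:
            # "no" keeps the title on the original link; "yes" switches it to the archive.
            params.append(("deadurl", "yes" if dead else "no"))
    body = " ".join(f"|{name}={value}" for name, value in params)
    return "{{cite web " + body + "}}"
```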
Alternative methods
Most citation templates have a |quote= parameter that can store a limited amount of quoted text from the source material. This is especially useful for sources that cannot be archived with web archiving services, and it also provides insurance against failure of the chosen web archiving service. Storing the entire text of the source is not appropriate under fair use policies, so quote only the portions that most directly support the assertions in the Wikipedia article. Where applicable, public domain materials can be copied to Wikisource.
Repairing a dead link
There are several ways to try to repair a dead link, detailed below:
Searching
If the dead link includes enough information (article title, names, etc.), it is often possible to use it to find the web page at a different location, either on the same site or elsewhere.
Often a web page has simply moved within the same site. A site index or site-specific search feature is a useful place to look for the moved page. If these tools are not available, many Internet search engines can restrict a search to a specified site.
Failing this, a general Internet search for the page may turn up alternatives.
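A dead URL often carries its own search terms: the host name and a descriptive "slug" at the end of the path. The hypothetical helper below turns them into a site-restricted query using the site: operator that major search engines accept. It assumes the last path segment is descriptive, which is often but not always true, so treat the result as a starting point for searching.

```python
import re
from urllib.parse import unquote, urlsplit


def site_search_query(dead_url: str) -> str:
    """Build a 'site:host keywords' query from a dead URL's host and final path segment."""
    parts = urlsplit(dead_url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    # Take the last non-empty path segment as the likely descriptive slug.
    segments = [s for s in parts.path.split("/") if s]
    slug = unquote(segments[-1]) if segments else ""
    # Drop a file extension such as .html or .php.
    slug = re.sub(r"\.[A-Za-z0-9]{1,5}$", "", slug)
    # Split on punctuation and keep non-numeric words as keywords.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", slug) if w and not w.isdigit()]
    query = "site:" + host
    if words:
        query += " " + " ".join(words)
    return query
```

For example, http://example.com/news/2010/city-council-approves-budget.html gives "site:example.com city council approves budget".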
Internet archives
Check for archived versions at one of the many web archive services. The "Big 3" archive services (web.archive.org, the Wayback Machine; webcitation.org; and archive.is) account for over 90% of all archive links on Wikipedia, and the Wayback Machine alone accounts for over 80%. Other archive services are listed at WP:WEBARCHIVES and web archive.
If multiple archive dates are available, use the one most likely to match the page as seen by the editor who added the reference, on the date given in |accessdate=. If that parameter is not specified, search the article's revision history to determine when the link was added to the article.
View the archive to verify that it contains valid page information. Snapshots dated close to, or earlier than, the time the link was added to the Wikipedia page are usually more likely to show valid information.
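The date-selection rule above can be expressed as a small function. It assumes you already have a list of snapshot timestamps in the 14-digit YYYYMMDDhhmmss form the Wayback Machine uses. It prefers the latest snapshot taken on or before the access date and falls back to the earliest later one. This is a heuristic only; you still need to view the chosen archive to confirm that it shows the cited content.

```python
from datetime import datetime
from typing import Optional, Sequence


def parse_wayback_timestamp(ts: str) -> datetime:
    """Parse a 14-digit Wayback timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def pick_snapshot(timestamps: Sequence[str], access_date: datetime) -> Optional[str]:
    """Prefer the latest snapshot on or before access_date, else the earliest after it."""
    if not timestamps:
        return None
    dated = sorted((parse_wayback_timestamp(ts), ts) for ts in timestamps)
    on_or_before = [ts for when, ts in dated if when <= access_date]
    if on_or_before:
        return on_or_before[-1]
    return dated[0][1]
```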
For most citation templates, archives are entered using the required |archiveurl= and |archivedate= parameters and the optional |deadurl= parameter. The primary link is automatically switched to the archive unless |deadurl=no is set, so for a dead link the |deadurl= parameter can simply be omitted. To pre-emptively supply an archived version of a URL that may later go dead, set |deadurl=no. The title then keeps the original link, and the archive is linked at the end. When the original URL has been usurped for spam or advertising, or is otherwise unsuitable, setting |deadurl=unfit or |deadurl=usurped suppresses display of the original URL (but |url= is still required).
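The display rules in the previous paragraph amount to a small decision table. The hypothetical function below models them in simplified form. It returns the URL the citation title links to, plus the URL (if any) linked after the title. It is not the citation module's actual code, and the rendered wording differs.

```python
from typing import Optional, Tuple


def citation_links(url: str, archive_url: str,
                   deadurl: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (title link, trailing link) for a citation that has an archive.

    Simplified model of the |deadurl= rules described on this page.
    """
    if deadurl == "no":
        # Pre-emptive archive: title keeps the original; archive linked at the end.
        return url, archive_url
    if deadurl in ("unfit", "usurped"):
        # Original URL is not displayed at all, although |url= must still be set.
        return archive_url, None
    # Omitted or "yes": the title switches to the archive, and the original
    # is still linked in the trailing "archived from the original" text.
    return archive_url, url
```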
The following bookmarklets search an archive for copies of the page currently displayed in the browser:
Archive site | Bookmarklet |
---|---|
Archive.org | javascript:void(window.open('https://web.archive.org/web/*/'+location.href)) |
UKGWA (UK Government Web Archive) | javascript:void(window.open('http://webarchive.nationalarchives.gov.uk/*/'+location.href)) |
WebCite | javascript:void(window.open('https://www.webcitation.org/query.php?url='+encodeURIComponent(location.href))) |
Memento
The Memento interface lets you search multiple archiving services with a single search. The Memento database is cached, so results are returned quickly, but the cache can also become out of date. Therefore it should not be relied on as the final word: very often, when it reports that no archives are available, archives actually exist. You may still need to check individual archive sites, but Memento can be a quick first check.
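Memento aggregators describe the archived copies of a URL in a TimeMap. This is a list in the link format defined by RFC 7089, in which each archived copy (a "memento") has a rel value containing "memento" and a datetime attribute. The rough parser below assumes you have already fetched such a TimeMap as text, and it lists the mementos in date order. It uses regular expressions rather than a full link-format parser, so treat it as a sketch.

```python
import re
from email.utils import parsedate_to_datetime

# One entry: <URI> followed by its ;-separated parameters up to the next "<".
ENTRY_RE = re.compile(r"<([^>]+)>([^<]*)")
# A quoted parameter such as rel="memento" or datetime="Sun, 01 Mar 2009 ...".
PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_timemap(text: str) -> list:
    """Return (datetime, memento URI) pairs from a link-format TimeMap, oldest first."""
    mementos = []
    for uri, params in ENTRY_RE.findall(text):
        attrs = dict(PARAM_RE.findall(params))
        rels = attrs.get("rel", "").split()
        # rel may be "memento", "first memento", "last memento", etc.
        if "memento" in rels and "datetime" in attrs:
            mementos.append((parsedate_to_datetime(attrs["datetime"]), uri))
    mementos.sort()
    return mementos
```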
Mitigating a dead link
At times, all attempts to repair the link will be unsuccessful. In that event, consider finding an alternate source so that the loss of the original does not harm the verifiability of the article. Alternate sources about broad topics are usually easily located. A simple search engine query might locate an appropriate alternative, but be extremely careful to avoid citing mirrors and forks of Wikipedia itself, which would violate Wikipedia:Verifiability.
Sometimes, finding an appropriate source is not possible, or would require more extensive research techniques, such as a visit to a library or the use of a subscription-based database. If that is the case, consider consulting with Wikipedia editors at Wikipedia:WikiProject Resource Exchange, the Wikipedia:Village pump, or Wikipedia:Help desk. Also, consider contacting experts or other interested editors at a relevant WikiProject.
Sometimes a link is dead because the website has moved to a new URL, for example from http://example.com to http://example.co.uk. If you discover a URL change like this, please submit a URL-move request at WP:BOTREQ, and a bot will make the change.
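At its core, the URL move described above is a host substitution that keeps the rest of the address. The minimal sketch below assumes the move is a straight domain change with paths unchanged, which a real bot request would need to confirm. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def move_domain(url: str, old_host: str, new_host: str) -> str:
    """Rewrite url onto new_host if its host is exactly old_host; otherwise return it unchanged."""
    parts = urlsplit(url)
    # hostname is lower-cased by urlsplit; subdomains such as www. are not matched.
    if (parts.hostname or "") != old_host.lower():
        return url
    netloc = new_host
    if parts.port is not None:
        # Keep an explicit port (assumes the new site uses the same one).
        netloc += ":" + str(parts.port)
    # Scheme, path, query and fragment are carried over unchanged.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```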
Keeping dead links
A dead, unarchived source URL may still be useful. Such a link indicates that the information was (probably) verifiable in the past, and it might give another user with greater resources or expertise enough information to find the reference. The link could also come back to life. A dead URL also makes it possible to check whether the source has been cited elsewhere, or to contact the person originally responsible for it. For example, one could contact the Yale Computer Science department if http://www.cs.yale.edu/~EliYale/Defense-in-Depth-PhD-thesis.pdf were dead. Place {{dead link}} after the dead URL and just before the </ref> tag, if applicable, leaving the original link intact.
Placing {{dead link}} automatically adds the article to the Articles with dead external links project category, and to a monthly subcategory based on the |date= parameter. Do not delete a URL just because it has been tagged with {{dead link}} for a long time.
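The tagging convention above can be sketched for a bot-style script. This hypothetical helper appends {{dead link|date=Month Year}} just before the closing </ref> of each reference that contains a given dead URL. It leaves the URL itself intact and skips references that are already tagged. The monthly category name follows the pattern "Articles with dead external links from Month Year", which is an assumption here. Regular-expression handling of wikitext is fragile, so this is an illustration rather than a safe editing tool.

```python
import re
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# <ref> or <ref name=...>, its body (non-greedy), then </ref>.
# Self-closing reuses such as <ref name="x" /> are not matched.
REF_RE = re.compile(r"(<ref(?:\s[^>/]*)?>)(.*?)(</ref>)", re.DOTALL | re.IGNORECASE)


def month_year(on: date) -> str:
    """Format a date as 'Month Year', independent of the system locale."""
    return MONTHS[on.month - 1] + " " + str(on.year)


def tag_dead_link(wikitext: str, dead_url: str, on: date) -> str:
    """Append {{dead link|date=Month Year}} just before </ref> in refs citing dead_url."""
    tag = "{{dead link|date=" + month_year(on) + "}}"

    def add_tag(m):
        opening, body, closing = m.groups()
        if dead_url in body and "{{dead link" not in body.lower():
            return opening + body.rstrip() + " " + tag + closing
        return m.group(0)

    return REF_RE.sub(add_tag, wikitext)


def monthly_category(on: date) -> str:
    """Assumed name of the monthly tracking category for a given tag date."""
    return "Category:Articles with dead external links from " + month_year(on)
```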
Link rot on non-Wikimedia sites
Non-Wikimedia sites are also susceptible to link rot. After a page move or page deletion, links to Wikipedia pages from other websites may break. In most page moves, a redirect remains at the old title, so this causes no problem. But if a page is completely deleted or usurped (i.e. replaced with other content), any external websites that link to it will suffer link rot.
Replacing page content with a disambiguation page may still cause link rot, but it is less harmful, because a disambiguation page is essentially a type of soft redirect that leads the reader to the required content. If a page is usurped with content about another subject that shares its name, a hatnote may be placed at the top directing readers to the original content on its new page; this is also a type of soft redirect, though a less obvious one. In these cases, readers arriving from a rotten external link should be able to find what they are looking for. However, the situation is best avoided, because they must go through an additional page, which may give a poor impression of both Wikipedia and the linking website.
Because the Wikipedia software does not store Referer information, it is impossible to tell how many external web pages will be affected by a move or deletion, but the risk of link rot is probably greatest for older and higher-profile pages. In practice, little can be done: maintaining non-Wikimedia websites is not within the scope of being a Wikimedian, nor in most cases within our capability (although if such links can be fixed, it is helpful to do so). However, it may be good practice to consider the potential impact on other sites when deleting or moving Wikipedia pages, especially if no redirect or hatnote will remain. If a move or deletion is expected to cause significant damage, this might be a factor to consider in WP:RM, WP:AFD and WP:RFD discussions, although other factors may carry more weight.
See also
- Help:Using the Wayback Machine—how-to guide
- Wikipedia:Using WebCite—how-to guide
- Wikipedia:Using Archive.is—how-to guide
- Special:LinkSearch—to find all the pages that contain a particular URL
- Wikipedia:Citing sources/Further considerations#Pre-emptive archiving—brief guide on how to use various archiving services
- Wikipedia:Citing sources#Preventing and repairing dead links
- Wikipedia:External links#Longevity of links—prescribes removal of dead URLs from the "External links" section
- Wikipedia:Offline sources—essay
- Category:Articles with bare URLs for citations—the backlog of articles containing bare URLs at risk of link rot, sub-categorised by month
- Category:Articles with dead external links—the backlog of articles containing dead links, sub-categorised by month
Bots
- InternetArchiveBot (IABot)—automatically fixes dead links whenever possible, and tags them when it isn't
- WaybackMedic: automatically fixes dead links whose status is difficult to determine, and makes other general fixes
- User:Legobot—can mass tag links with {{dead link}}. Requests can be made at User talk:Legoktm.
External links
- Wayback Machine add-on, official Wayback add-on (Firefox)
- Resurrect Pages, a third-party add-on that provides links to seven cache and archive websites when you come across a dead link (Firefox).
- Webcache, an add-on for Opera.
- weblinkchecker.py—script from the Python Wikipedia Bot collection which finds broken external links.