Search engine cache explained

A search engine cache is a collection of stored copies of web pages, each showing the page as it was when a web crawler last indexed it. Cached copies can be used to view a page's contents when the live version cannot be reached or has been altered or taken down.^[1]

A web crawler collects the contents of a web page, which a web search engine then indexes. The search engine may also make its stored copy accessible to users. A site's webmaster can instruct search engines not to do so through robots.txt^[2] or meta tags,^[3] and search engines whose crawlers obey these restrictions will then not make a cached copy available.
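As an illustration of the meta-tag mechanism, the following Python sketch checks whether a page opts out of cached copies through the `noarchive` directive named in the cited specifications, given either in a `<meta name="robots">` tag or in an `X-Robots-Tag` HTTP header. The names `RobotsMetaParser` and `may_show_cached_copy` are hypothetical, and the sketch deliberately ignores robots.txt, crawler-specific meta names, and user-agent-prefixed header values; real search engines apply their own, more complete rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values keep their case.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute holds comma-separated directives, e.g. "index, noarchive".
        for directive in attr_map.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def may_show_cached_copy(html, x_robots_tag=""):
    """Return False if the page opts out of cached copies with 'noarchive'.

    Assumption: only the generic 'robots' meta name and plain comma-separated
    X-Robots-Tag values are handled; 'googlebot: noarchive'-style values are not.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    header_directives = {
        d.strip().lower() for d in x_robots_tag.split(",") if d.strip()
    }
    return "noarchive" not in (parser.directives | header_directives)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="index, noarchive"></head></html>'
    print(may_show_cached_copy(page))  # False
    print(may_show_cached_copy("<html><head></head></html>"))  # True
```

A crawler following this logic would still be free to index the page; the directive only asks it not to offer the stored copy to users.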

Search engine caches can be used in criminal investigations,^[4] legal proceedings,^[5] and journalism.^[6] Bing, Yandex Search, and Baidu are examples of search engines that offer cached versions of web pages to their users.

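Search engine caches may not be fully covered by the legal exemptions that usually shield technology providers from copyright infringement claims. For example, the Brussels Court of Appeal rejected Google's argument that its cache was a "technically necessary copy", holding that the copy Google stores on its server is not technically necessary for efficient network transmission.^[7]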

Google Cache

Google retired its web cache service in 2024. The service had been intended to help users reach pages that appeared in Google search results but were temporarily offline; it was never meant for long-term or even medium-term archiving. Google said that the Internet of 2024 was much more reliable than it had been "way back" in earlier days, so the cache was no longer an important service to maintain.

Google pointed to the Internet Archive's Wayback Machine as a better alternative and suggested that it might work with the Internet Archive in the future.^[8] In September 2024, Google and the Internet Archive announced a collaboration that provides links to the Wayback Machine from within Google Search.^[9]

Notes and References

[1] Book: The Data Journalism Handbook. Wilfried Ruetten. 2012. O'Reilly Media, Inc. ISBN 9781449330064. "When a page becomes controversial, the publishers may take it down or alter it without acknowledgment. If you suspect you're running into the problem, the first place to turn is Google's cache of the page as it was when it did its last crawl."
[2] Web site: Robots meta tag, data-nosnippet, and X-Robots-Tag specifications. "noarchive: Do not show a cached link in search results."
[3] Web site: Special tags that Google understands - Search Console Help. "noarchive - Don't show a Cached link for a page in search results."
[4] Book: Investigating Internet Crimes: An Introduction to Solving Crimes in Cyberspace. Todd G. Shipley, Art Bowker. 2013. Newnes. ISBN 9780124079298. "For the investigator this can be a valuable piece of information. Depending on when Google crawled the site, the last page may contain information different from the current page. Documenting and capturing Google's cached page of a webpage can therefore be important step to ensure this time snapshot is preserved."
[5] Book: Regulation of Securities: SEC Answer Book. Steven Mark Levy. 2011. Aspen Publishers Online. ISBN 9781454805434. "The World Wide Web is not as ephemeral as one might think. An increasing number of older web pages are available online through such services as the Wayback Machine, Yahoo Cache, or Bing Cache. Some plaintiffs' lawyers and corporate gadflies use these services as a matter of routine."
[6] Web site: Google's caches and .com search engine provide 'right to be forgotten' solutions. Cleland Thom. Press Gazette. 2014-10-23. "Journalists can also access delisted content via the Google cache."
[7] Web site: Brussels Court of Appeal upholds judgment against Google News and Google Cache. Herman De Bauw, Valerie Vandenweghe. June 2011. Archived 2015-04-26 at https://web.archive.org/web/20150426180521/www.eubelius.com/en/spotlight/brussels-court-appeal-upholds-judgment-against-google-news-and-google-cache. "For the cache function, the Court rejected the exception of a "technically necessary copy". This exception exempts temporary reproduction which is a necessary part of a technical process applied by an intermediary for transmission in a network between third parties. According to the Court, the cache copy that Google stores on its server is not technically necessary for efficient transmission."
[8] Web site: Google Search's cache links are officially being retired. 2 February 2024.
[9] Web site: New Feature Alert: Access Archived Webpages Directly Through Google Search. Chris Freeland. September 11, 2024.