{"id":337153,"date":"2025-08-12T01:09:12","date_gmt":"2025-08-12T01:09:12","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/337153\/"},"modified":"2025-08-12T01:09:12","modified_gmt":"2025-08-12T01:09:12","slug":"reddit-will-block-the-internet-archive","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/337153\/","title":{"rendered":"Reddit will block the Internet Archive"},"content":{"rendered":"<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">Reddit says that it has caught AI companies scraping its data from the Internet Archive\u2019s Wayback Machine, so it\u2019s going to start blocking the Internet Archive from indexing the vast majority of Reddit. The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; instead, it will only be able to index the Reddit.com homepage, which effectively means the Internet Archive will only be able to archive insights into which news headlines and posts were most popular on a given day.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">\u201cInternet Archive provides a service to the open web, but we\u2019ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,\u201d spokesperson Tim Rathschmidt tells The Verge.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">The Internet Archive\u2019s mission is to keep a digital archive of websites on the internet and <a href=\"https:\/\/archive.org\/about\/\" target=\"_blank\" rel=\"noopener\">\u201cother cultural artifacts,\u201d<\/a> and the Wayback Machine is a tool you can use to look at pages as they appeared on certain dates, but Reddit believes not all of its content should be archived that 
way. \u201cUntil they\u2019re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we\u2019re limiting some of their access to Reddit data to protect redditors,\u201d Rathschmidt says.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">The limits will start \u201cramping up\u201d today, and Reddit says it reached out to the Internet Archive \u201cin advance\u201d to \u201cinform them of the limits before they go into effect,\u201d according to Rathschmidt. He says Reddit has also \u201craised concerns\u201d in the past about the ability of people to scrape content from the Internet Archive.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">Reddit has a recent history of cutting off access to scraper tools as AI companies have begun to use (and abuse) them en masse, but it\u2019s willing to provide that data if companies pay. Reddit struck <a href=\"https:\/\/www.theverge.com\/2024\/2\/22\/24080165\/google-reddit-ai-training-data\" target=\"_blank\" rel=\"noopener\">a deal with Google<\/a> for both Google Search and AI training data early last year, and a few months later, it started blocking major search engines from crawling its data <a href=\"https:\/\/www.theverge.com\/2024\/7\/24\/24205244\/reddit-blocking-search-engine-crawlers-ai-bot-google\" target=\"_blank\" rel=\"noopener\">unless they pay<\/a>. 
It also said its infamous <a href=\"https:\/\/www.theverge.com\/2023\/4\/18\/23688463\/reddit-developer-api-terms-change-monetization-ai\" target=\"_blank\" rel=\"noopener\">API changes from 2023<\/a>, which forced some third-party apps to shut down, <a href=\"https:\/\/www.theverge.com\/23779477\/reddit-protest-blackouts-crushed\" target=\"_blank\" rel=\"noopener\">leading to protests<\/a>, were because those APIs were abused to train AI models.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">Reddit also struck an AI deal with <a target=\"_blank\" href=\"https:\/\/www.theverge.com\/2024\/5\/16\/24158529\/reddit-openai-chatgpt-api-access-advertising\" rel=\"noreferrer noopener\">OpenAI<\/a>, but it <a target=\"_blank\" href=\"https:\/\/www.theverge.com\/ai-artificial-intelligence\/679768\/reddit-sues-anthropic-alleging-its-bots-accessed-reddit-more-than-100000-times-since-last-july\" rel=\"noreferrer noopener\">sued Anthropic<\/a> in June, claiming Anthropic was still scraping from Reddit even after Anthropic <a target=\"_blank\" href=\"https:\/\/www.theverge.com\/2024\/7\/31\/24210565\/reddit-microsoft-anthropic-perplexity-pay-ai-search\" rel=\"noreferrer noopener\">said it<\/a> wasn\u2019t scraping anymore.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\">\u201cWe have a longstanding relationship with Reddit and continue to have ongoing discussions about this matter,\u201d Mark Graham, director of the Wayback Machine, says in a statement to The Verge.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1ymtmqpi _17nnmdy1 _17nnmdy0 _1xwtict1\"><strong>Update, August 11th<\/strong>: Added statement from the Wayback Machine.<\/p>\n","protected":false},"excerpt":{"rendered":"Reddit says that it has caught AI companies scraping its data from the Internet 
Archive\u2019s Wayback Machine, so&hellip;\n","protected":false},"author":2,"featured_media":337154,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,12,2511,326,53,16,15],"class_list":{"0":"post-337153","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-news","11":"tag-reddit","12":"tag-tech","13":"tag-technology","14":"tag-uk","15":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/115013128177954788","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/337153","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=337153"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/337153\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/337154"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=337153"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=337153"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=337153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}