{"id":130343,"date":"2025-10-18T15:00:14","date_gmt":"2025-10-18T15:00:14","guid":{"rendered":"https:\/\/www.europesays.com\/ie\/130343\/"},"modified":"2025-10-18T15:00:14","modified_gmt":"2025-10-18T15:00:14","slug":"wikipedia-contributors-are-worried-about-ai-scraping","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ie\/130343\/","title":{"rendered":"Wikipedia Contributors Are Worried About AI Scraping"},"content":{"rendered":"<p>                  <img decoding=\"async\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/10\/44b03b0db8e1d6ce874080785a3d7e33c6-Wikipedia-herrman.rhorizontal.w1100.jpg\" class=\"lede-image\" data-content-img=\"\" width=\"1100\" height=\"733\" style=\"width:100%;height:auto;\" fetchpriority=\"high\"\/> <\/p>\n<p class=\"clay-paragraph_drop-cap\" data-editable=\"text\" data-uri=\"nymag.com\/intelligencer\/_components\/clay-paragraph\/instances\/cmgv5mv6u002d0ij44lmrfxf5@published\" data-word-count=\"81\">Over at the <a href=\"https:\/\/diff.wikimedia.org\/2025\/10\/17\/new-user-trends-on-wikipedia\/\" rel=\"nofollow noopener\" target=\"_blank\">official blog<\/a> of the Wikipedia community, Marshall Miller untangled a recent mystery. \u201cAround May 2025, we began observing unusually high amounts of apparently human traffic,\u201d he wrote. Higher traffic would generally be good news for a volunteer-sourced platform that aspires to reach as many people as possible, but it would also be surprising: The rise of chatbots and the AI-ification of Google Search have <a href=\"https:\/\/nymag.com\/intelligencer\/article\/inside-the-medias-traffic-apocalypse.html\" rel=\"nofollow noopener\" target=\"_blank\">left many big websites with fewer visitors<\/a>. 
Maybe Wikipedia, like <a href=\"https:\/\/nymag.com\/intelligencer\/article\/why-you-are-reading-reddit-a-lot-more-these-days.html\" rel=\"nofollow noopener\" target=\"_blank\">Reddit<\/a>, is an exception?<\/p>\n<p class=\"clay-paragraph\" data-editable=\"text\" data-uri=\"nymag.com\/intelligencer\/_components\/clay-paragraph\/instances\/cmgv5nkut000h3b74syjej2hg@published\" data-word-count=\"5\">Nope! It was just bots:<\/p>\n<blockquote data-uri=\"nymag.com\/intelligencer\/_components\/blockquote\/instances\/cmgv5nqbv000p3b74r2usjq0k@published\" class=\"blockquote\" data-editable=\"text\" data-word-count=\"86\">\n<p>This [rise] led us to investigate and update our bot detection systems. We then used the new logic to reclassify our traffic data for March\u2013August 2025, and found that much of the unusually high traffic for the period of May and June was coming from bots that were built to evade detection \u2026 after making this revision, we are seeing declines in human pageviews on Wikipedia over the past few months, amounting to a decrease of roughly 8% as compared to the same months in 2024.<\/p>\n<\/blockquote>\n<p class=\"clay-paragraph\" data-editable=\"text\" data-uri=\"nymag.com\/intelligencer\/_components\/clay-paragraph\/instances\/cmgv5nwnm000y3b74ftrxoxeq@published\" data-word-count=\"115\">To be clearer about what this means, these bots aren\u2019t just vaguely inauthentic users or some incidental side effect of the general spamminess of the internet. In many cases, they\u2019re bots working on behalf of AI firms, going undercover as humans to scrape Wikipedia for training or summarization. Miller got right to the point. \u201cWe welcome new ways for people to gain knowledge,\u201d he wrote. 
\u201cHowever, LLMs, AI chatbots, search engines, and social platforms that use Wikipedia content must encourage more visitors to Wikipedia.\u201d Fewer real visits means fewer contributors and donors, and it\u2019s easy to see how such a situation could send one of the great experiments of the web into a death spiral.<\/p>\n<p class=\"clay-paragraph\" data-editable=\"text\" data-uri=\"nymag.com\/intelligencer\/_components\/clay-paragraph\/instances\/cmgv5nxud00173b74r8ixmtjp@published\" data-word-count=\"166\">Arguments like this are intuitive and easy to make, and you\u2019ll hear them beyond the ecosystem of the web: AI models ingest a lot of material, often without clear permission, and then offer it back to consumers in a form that\u2019s often <a href=\"https:\/\/nymag.com\/intelligencer\/article\/ai-boom-expanding-google-dominance.html\" rel=\"nofollow noopener\" target=\"_blank\">directly competitive<\/a> with the people or companies that provided it in the first place. Wikipedia\u2019s authority here is bolstered by how it isn\u2019t trying to make money \u2014\u00a0it\u2019s run by a foundation, not an established commercial entity that feels threatened by a new one \u2014\u00a0but also by its unique position. It was founded as a stand-alone reference resource before settling ambivalently into a new role: A site that people mostly just found through Google but in greater numbers than ever. 
With the rise of LLMs, Wikipedia became important in a new way as a uniquely large, diverse, well-curated data set about the world; in return, AI platforms are now effectively keeping users away from Wikipedia even as they explicitly use and reference its materials.<\/p>\n<p class=\"clay-paragraph\" data-editable=\"text\" data-uri=\"nymag.com\/intelligencer\/_components\/clay-paragraph\/instances\/cmgv5nxud00183b743zj0lrt2@published\" data-word-count=\"83\">Here\u2019s an example: Let\u2019s say you\u2019re reading this article and become curious about Wikipedia itself \u2014 its early history, the <a href=\"https:\/\/www.foxnews.com\/media\/wikipedias-co-founder-anonymous-editors-why-site-biased-against-conservatives-how-fix\" rel=\"nofollow noopener\" target=\"_blank\">wildly<\/a> <a href=\"https:\/\/nymag.com\/intelligencer\/article\/jimmy-wales-on-why-wikipedia-is-still-so-good.html\" rel=\"nofollow noopener\" target=\"_blank\">divergent<\/a> opinions of its original founders, its funding, etc. Unless you\u2019ve been paying attention to this stuff for decades, it may feel as if it\u2019s always been there. Surely, there\u2019s more to it than that, right? So you ask Google, perhaps as a shortcut for getting to a Wikipedia page, and Google uses AI to generate a blurb that looks like this:<\/p>\n<p>                  <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/10\/c9a68aa98bcf6381e1b918bc8893c9abd8-Wikipedia-herrman-3.rhorizontal.w700.jpg\" class=\"img-data\" data-content-img=\"\" width=\"700\" height=\"467\" style=\"width:100%;height:auto;\"\/> <\/p>\n<p class=\"clay-paragraph\" data-editable=\"text\" data-uri=\"nymag.com\/intelligencer\/_components\/clay-paragraph\/instances\/cmgv5osgo001q3b74yiq1i3ci@published\" data-word-count=\"67\">This is an AI Overview that summarizes, among other things, Wikipedia. Formally, it\u2019s pretty close to an encyclopedia article. 
With a few formatting differences \u2014\u00a0notice the bullet-point AI-ese \u2014\u00a0it hits a lot of the same points as Wikipedia\u2019s article about itself. It\u2019s a bit shorter than the top section of the official article and contains far fewer details. It\u2019s fine! But it\u2019s a summary of a summary.<\/p>\n<p class=\"clay-paragraph\" data-editable=\"text\" data-uri=\"nymag.com\/intelligencer\/_components\/clay-paragraph\/instances\/cmgv5ots6001z3b74dqvcy2z3@published\" data-word-count=\"30\">The next option you encounter still isn\u2019t Wikipedia\u2019s article \u2014\u00a0that shows up further down. It\u2019s a prompt to \u201cDive deeper in AI Mode.\u201d If you do that, you see this:<\/p>\n<p>                  <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/10\/3dc9f3ac5c1ebfc378370c00f99c16de92-Wikipedia-herrman-2.rvertical.w570.jpg\" class=\"img-data\" data-content-img=\"\" width=\"570\" height=\"712\" style=\"width:100%;height:auto;\"\/> <\/p>\n<p class=\"clay-paragraph\" data-editable=\"text\" data-uri=\"nymag.com\/intelligencer\/_components\/clay-paragraph\/instances\/cmgv5s7sw002i3b74jh5kn0ep@published\" data-word-count=\"58\">It\u2019s another summary, this time with a bit of commentary. (Also: If Wikipedia is \u201cgenerally not considered a reliable source itself because it is a tertiary source that synthesizes information from other places,\u201d then what does that make a chatbot?) 
There are links in the form of footnotes, but as Miller\u2019s post suggests, people aren\u2019t really clicking them.<\/p>\n<p class=\"clay-paragraph\" data-editable=\"text\" data-uri=\"nymag.com\/intelligencer\/_components\/clay-paragraph\/instances\/cmgv5s928002r3b74ism3eluz@published\" data-word-count=\"88\">Google\u2019s treatment of Wikipedia\u2019s autobiography is about as pure an example as you\u2019ll see of AI companies\u2019 effective relationship to the web (and maybe much of the world) around them as they build strange, complicated, but often <a href=\"https:\/\/nymag.com\/intelligencer\/article\/what-do-people-actually-use-chatgpt-for.html\" rel=\"nofollow noopener\" target=\"_blank\">compelling<\/a> products and deploy them to hundreds of millions of people. To these companies, it\u2019s a resource to be consumed, processed, and then turned into a product that attempts to render everything before it obsolete \u2014 or at least to bury it under a heaping pile of its own output.<\/p>\n","protected":false},"excerpt":{"rendered":"Over at the official blog of the Wikipedia community, Marshall Miller untangled a recent mystery. 
\u201cAround May 2025,&hellip;\n","protected":false},"author":2,"featured_media":130344,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[261],"tags":[291,289,290,18,19,17,9297,9298,82,65729],"class_list":{"0":"post-130343","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-eire","12":"tag-ie","13":"tag-ireland","14":"tag-john-herrman","15":"tag-screen-time","16":"tag-technology","17":"tag-wikipedia"},"share_on_mastodon":{"url":"","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/130343","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/comments?post=130343"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/130343\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media\/130344"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media?parent=130343"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/categories?post=130343"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/tags?post=130343"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}