{"id":194041,"date":"2025-09-02T12:03:10","date_gmt":"2025-09-02T12:03:10","guid":{"rendered":"https:\/\/www.europesays.com\/us\/194041\/"},"modified":"2025-09-02T12:03:10","modified_gmt":"2025-09-02T12:03:10","slug":"ai-companies-must-honour-a-foundational-rule-of-the-internet-respecting-site-owners-wishes-on-content","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/us\/194041\/","title":{"rendered":"AI companies must honour a foundational rule of the internet: Respecting site owners\u2019 wishes on content"},"content":{"rendered":"<p class=\"figcap-text\">AI tools are diverting traffic and attention away from Wikipedia and other websites, starving them of the revenue that they need to survive, researchers have concluded. Gregory Bull\/The Associated Press<\/p>\n<p class=\"c-article-body__text text-pr-5\">Viet Vu is manager of economic research at the Dais at Toronto Metropolitan University.<\/p>\n<p class=\"c-article-body__text text-pr-5\">One foundational rule of the internet has been the social contract between website owners and search giants such as Google.<\/p>\n<p class=\"c-article-body__text text-pr-5\">Owners would agree to let search-engine bots crawl and index their websites for free, in return for their sites showing up in search results for relevant queries, driving traffic. 
If they didn\u2019t want their website to be indexed, they could politely note it in what\u2019s called a \u201crobots.txt\u201d file on their server, and upon seeing it, the bot would leave the site unindexed.<\/p>\n<p class=\"c-article-body__text text-pr-5\">Without intervention from any political or legal bodies, most companies have complied (with some hiccups) with the voluntary standard since its introduction in 1994. That is, until the latest generation of large language models, or LLMs, emerged.<\/p>\n<p class=\"c-article-body__text text-pr-5\">LLMs are data-hungry. A previous version of the model behind ChatGPT, OpenAI\u2019s chatbot, was reportedly trained on data equivalent to 10 trillion words, or enough to fill more than 13 billion Globe and Mail op-eds. It would take a columnist writing one op-ed daily for 36.5 million years to generate enough text to train that model.<\/p>\n<p class=\"c-article-body__text text-pr-5\">To satisfy this need, <a href=\"https:\/\/www.theglobeandmail.com\/topics\/artificial-intelligence\/\" target=\"_self\" rel=\"noopener\" title=\"https:\/\/www.theglobeandmail.com\/topics\/artificial-intelligence\/\">artificial-intelligence<\/a> companies have turned to the broader internet for high-quality text. As it turns out, there aren\u2019t nearly enough websites that allow bots to collect their data for building better chatbots.<\/p>\n<p class=\"c-article-body__text text-pr-5\">Some AI companies state that they respect robots.txt. 
In some cases, their public statements are allegedly at odds with their actual practices. But even when these pledges are genuine, companies benefit from another loophole in robots.txt: To block a bot, the website owner must be able to specify the bot\u2019s name. And the names of many AI companies\u2019 bots are disclosed or discovered only after the bots have already crawled freely across the internet.<\/p>\n<p class=\"c-article-body__text text-pr-5\">The impact of these bots is profound. Take Wikipedia, for example: Its content is freely licensed, allowing bots to crawl through its trove of high-quality information. Between January, 2024, and April, 2025, the non-profit found that <a href=\"https:\/\/diff.wikimedia.org\/2025\/04\/01\/how-crawlers-impact-the-operations-of-the-wikimedia-projects\/\" target=\"_self\" rel=\"noopener\" title=\"https:\/\/diff.wikimedia.org\/2025\/04\/01\/how-crawlers-impact-the-operations-of-the-wikimedia-projects\/\">its multimedia downloads increased by 50 per cent<\/a>, driven in large part by AI companies\u2019 bots downloading freely licensed images to train their products.<\/p>\n<p class=\"c-article-body__text text-pr-5\">Wikipedia says the bot traffic is adding to its operating costs. That added burden might have been worthwhile if the chatbots had directed users to a page inviting them to contribute to the donation-funded organization.<\/p>\n<p class=\"c-article-body__text text-pr-5\">Instead, <a href=\"https:\/\/datareportal.com\/reports\/digital-2025-exploring-trends-in-wikipedia-traffic\" target=\"_self\" rel=\"noopener\" title=\"https:\/\/datareportal.com\/reports\/digital-2025-exploring-trends-in-wikipedia-traffic\">researchers have concluded<\/a> that as AI tools divert traffic and attention away from Wikipedia and other websites, those sites are increasingly starved of the revenue that they need to survive, even as the tools themselves rely on those sites for input. 
Figures published by Similarweb indicate a 22.7-per-cent drop in Wikipedia\u2019s overall traffic between 2022 and 2025.<\/p>\n<p class=\"c-article-body__text text-pr-5\">Rather than the old quid pro quo, in which websites let search engines crawl them in exchange for traffic from search results, the current arrangement leaves websites facing higher costs and fewer actual visitors, all while supplying the material used to train ever-improving AI chatbots. Without intervention, this threatens to create a vicious cycle in which AI products drive websites to shut down, destroying the very data AI needs to improve further.<\/p>\n<p class=\"c-article-body__text text-pr-5\">In response, companies such as Cloudflare, a web-services provider, have started treating AI bots as hackers trying to compromise their customers\u2019 cybersecurity. In August, Cloudflare alleged that Perplexity, a leading AI company, was actively developing new ways to hide its crawling activities to circumvent existing cybersecurity barriers.<\/p>\n<p class=\"c-article-body__text text-pr-5\">A technical showdown against the largest and best-resourced technology companies in the world isn\u2019t sustainable. The obvious solution would be for these AI companies to honour the trust-based mechanism of robots.txt. However, given competitive pressures, companies aren\u2019t incentivized to do the right thing.<\/p>\n<p class=\"c-article-body__text text-pr-5\">Individual countries are also struggling to find ways to protect content creators. For example, Canada\u2019s Online News Act was an attempt to compel social-media companies to compensate news organizations for lost revenue. 
Instead of achieving that goal, Ottawa learned that companies such as Meta would rather remove Canadians\u2019 access to news on their platforms than compensate publishers.<\/p>\n<p class=\"c-article-body__text text-pr-5\">Our next-best bet is an international agreement, akin to the Montreal Protocol, which bound countries to co-ordinate laws phasing out substances that eroded the Earth\u2019s ozone layer. Absent American leadership, Canada should lead in establishing a similar protocol for AI bots. It could encourage countries to co-ordinate legislative efforts compelling companies to honour robots.txt instructions. If all tech companies around the world had to operate under common rules, it would level the playing field by removing the competitive pressure to race to the bottom.<\/p>\n<p class=\"c-article-body__text text-pr-5\">AI technology can, and should, benefit the world \u2013 but it cannot do so by breaking the internet.<\/p>\n","protected":false},"excerpt":{"rendered":"AI tools are diverting traffic and attention away from Wikipedia and other 
websites,&hellip;\n","protected":false},"author":3,"featured_media":194042,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[2148,2138,671,104,2132,692,2147,2131,2143,2144,2140,2133,2130,79,407,746,2142,2137,2159,2134,2135,454,712,2139,1165,728,2149,108,2154,2155,2157,2152,2156,2150,2153,2136,85,2146,80,2145,2151,1458,158,1164,2141,67,132,68,1154,107,2158],"class_list":{"0":"post-194041","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-internet","8":"tag-alberta","9":"tag-arts-news","10":"tag-bc","11":"tag-breaking-news","12":"tag-breaking-news-video","13":"tag-british-columbia","14":"tag-canada","15":"tag-canada-news","16":"tag-canada-sports","17":"tag-canada-sports-news","18":"tag-canada-trafficcanada-weather","19":"tag-canadian-breaking-news","20":"tag-canadian-news","21":"tag-economy","22":"tag-education","23":"tag-environment","24":"tag-federal-government","25":"tag-foreign-news","26":"tag-globe-and-mail","27":"tag-globe-and-mail-breaking-news","28":"tag-globe-and-mail-canada-news","29":"tag-government","30":"tag-internet","31":"tag-life-news","32":"tag-lifestyle","33":"tag-local-news","34":"tag-manitoba","35":"tag-national-news","36":"tag-new-brunswick","37":"tag-newfoundland-and-labrador","38":"tag-northwest-territories","39":"tag-nova-scotia","40":"tag-nunavut","41":"tag-ontario","42":"tag-pei","43":"tag-photos","44":"tag-political-news","45":"tag-political-opinion","46":"tag-politics","47":"tag-politics-news","48":"tag-quebec","49":"tag-sports-news","50":"tag-technology","51":"tag-travel","52":"tag-trudeau","53":"tag-united-states","54":"tag-unitedstates","55":"tag-us","56":"tag-us-news","57":"tag-world-news","58":"tag-yukon"},"share_on_mastodon":{"url":"","error":"Validation failed: Text character limit of 500 
exceeded"},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/194041","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/comments?post=194041"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/194041\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media\/194042"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media?parent=194041"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/categories?post=194041"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/tags?post=194041"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}