{"id":400232,"date":"2025-09-05T14:49:11","date_gmt":"2025-09-05T14:49:11","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/400232\/"},"modified":"2025-09-05T14:49:11","modified_gmt":"2025-09-05T14:49:11","slug":"ai-bots-bombard-publisher-websites-with-no-meaningful-value-exchange","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/400232\/","title":{"rendered":"AI bots bombard publisher websites with &#8216;no meaningful value exchange&#8217;"},"content":{"rendered":"<p>        <img width=\"1038\" height=\"778\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/09\/shutterstock_2163824051-scaled-e1756904135903-1038x778.webp.webp\" class=\"attachment-4x3-large-crop size-4x3-large-crop wp-post-image\" alt=\"Robots.txt illustration showing example of what the coding looks like\" decoding=\"async\" fetchpriority=\"high\"  \/><br \/>\n                Robots.txt illustration. Picture: Shutterstock\/BestForBest<\/p>\n<p>Unwanted <a href=\"https:\/\/pressgazette.co.uk\/subject\/artificial-intelligence\/\" target=\"_blank\" rel=\"noopener\">AI<\/a> scraping of publisher websites is placing a costly burden on publishers, according to one leading industry executive.<\/p>\n<p>Chris Dicker, chief executive of Candr Media Group and board member of the Independent Publishers Alliance, said that the publisher\u2019s Trusted Reviews website was taken down multiple times on 16 August when it was scraped 1.6 million times in a day.<\/p>\n<p>This was up from a previous record of 1.2 million scrapes on the site a day earlier.<\/p>\n<p>He said the average level of AI scraping for Trusted Reviews runs at between roughly 70,000 and 100,000 times a day.<\/p>\n<p>Dicker said he believed the 1.6 million AI scrapes performed on 16 August resulted in 603 actual users of generative AI platforms later landing on the Trusted Reviews site, meaning a clickthrough rate from ChatGPT and other similar platforms of 0.037%. He noted in a LinkedIn post that this was \u201cdramatically lower than you would expect from traditional search\u201d.<\/p>\n<p>He told Press Gazette that over the previous week the ratio of AI bot scrapes to resulting human visits was 1,888 to one.<\/p>\n<p>He also said engagement from that traffic \u201cperformed poorly\u201d, with 58% less time spent on the site than the average user and 10% fewer pages viewed.<\/p>\n<p>Dicker said Candr is currently preparing to change hosting provider and has therefore not yet taken certain steps, such as implementing <a href=\"https:\/\/tollbit.com\/bot-paywall\/\" target=\"_blank\" rel=\"noopener\">Tollbit\u2019s bot paywall<\/a>, which allows sites to set fees for AI crawlers to access their content; the publisher still plans to implement it.<\/p>\n<p>But Dicker told Press Gazette the data raised concerns about some AI companies not respecting robots.txt signals that a website does not want to be crawled by specified bots.<\/p>\n<p>He said Tollbit\u2019s dashboard for Trusted Reviews showed that OpenAI had ignored robots.txt to scrape the site 12.2 million times in the past three months, with Meta doing so 2.8 million times, Amazon 2.4 million times, Perplexity 101,000 times and Bytedance 95,000 times.<\/p>\n<p>Trusted Reviews saw a 75% increase in AI bots ignoring robots.txt compared with the previous three months, he added, while over the past six months the figure was up 89% on the prior six months.<\/p>\n<p>Website hosting company <a href=\"https:\/\/pressgazette.co.uk\/news\/major-uk-and-us-publishers-join-forces-to-block-ai-scrapers\/\" target=\"_blank\" rel=\"noopener\">Cloudflare announced in July that it would allow websites to block all scrapers by default.<\/a><\/p>\n<p>In response to <a href=\"https:\/\/www.linkedin.com\/feed\/update\/urn:li:activity:7366097880232640512\/\" target=\"_blank\" rel=\"noopener\">Dicker\u2019s LinkedIn post<\/a>, OpenAI\u2019s head of media partnerships Varun Shetty said: \u201cTook a 
look at your robots.txt file and you are allowing OAI-Searchbot access to your site. If that\u2019s something you\u2019d prefer not to do, you can block the crawler and you should see this issue resolved. We\u2019re working closely with publishers on finding ways that our products can drive them value \u2013 will continue to do that.\u201d<\/p>\n<p>However, Dicker told Press Gazette that it was not OAI-Searchbot that was driving the spike in scraping \u2013 it was <a href=\"https:\/\/platform.openai.com\/docs\/bots\" target=\"_blank\" rel=\"noopener\">another OpenAI bot, ChatGPT-User<\/a>, which is blocked by Trusted Reviews.<\/p>\n<p>This bot has been the most prominent of the scrapers on Trusted Reviews, Dicker said, although there are occasional spikes from other tech companies, such as Apple around the time it said it planned to create an LLM, and Amazon ahead of sales periods. Bytedance and Meta are also present.<\/p>\n<p>Dicker also noted that with the onus on publishers to block bots via robots.txt, they need to know what the bots are in order to do so, meaning any new bots that spring up are harder to block in advance.<\/p>\n<p>\u2018Actual hard cost\u2019 to publishers of AI scraping is hard to pin down<\/p>\n<p>Dicker also raised concerns about the cost to publishers, especially smaller ones, of huge spikes in scraping.<\/p>\n<p>\u201cFrom an Independent Publishers Alliance perspective, some of our members have come to me and said, Chris, we\u2019ve got an issue where we are getting scraped so much that our hosting providers are saying we now need to move up a package and that is costing thousands of pounds a year,\u201d he said.<\/p>\n<p>Dicker added that the impact is wider than just the hosting fees: \u201cWhat\u2019s the impact on the users that were trying to come to our site when the site went down? What brand impact is happening?\u2026 [and] the fact that we won\u2019t be able to serve adverts to them at that particular time. 
The actual hard cost is almost impossible to actually put a number on. How do you put a number on damage to your brand?\u201d<\/p>\n<p class=\"has-vivid-red-color has-text-color has-link-color wp-elements-03d0024a831742df0c01a61b62606c41\"><strong>[Read more: <a href=\"https:\/\/pressgazette.co.uk\/news\/urgent-bid-lodged-with-uk-regulator-to-stop-google-ai-overviews-stealing-journalism\/\" target=\"_blank\" rel=\"noopener\">Urgent bid from Independent Publishers Alliance and campaign groups lodged with UK regulator to stop Google AI Overviews \u2018stealing journalism\u2019<\/a>]<\/strong><\/p>\n<p>Dicker said Trusted Reviews has been around for 20 years and spent millions on marketing \u201cto put it in the position it\u2019s in. What\u2019s the impact that all of a sudden users are coming to the site and not able to access it, or come on and have a terrible experience because of the bot activity? A fraction of the millions of pounds we spent? Add it to your bill.\u201d<\/p>\n<p>AI scraping \u2018not matched by meaningful value exchange\u2019<\/p>\n<p>Stuart Forrest, global SEO director for publishing at Bauer Media, told Press Gazette that LLM bots (excluding Google) make up an average of around 20% of total scraping on Bauer sites, and at other publishers too according to data he has seen.<\/p>\n<p>He noted that a <a href=\"https:\/\/chatgpt-vs-google.com\/\" target=\"_blank\" rel=\"noopener\">new tracker from software company Ahrefs<\/a> shows that in July, 42% of referral traffic to 44,000 websites came from Google (pointing out that the sample is not limited to publishers, and that Google\u2019s share could be even higher among publishers alone). 
Meanwhile ChatGPT, the biggest LLM by referral traffic sent, referred only 0.2% of all traffic.<\/p>\n<p>\u201cOn one hand, you\u2019ve got this growth in cost of scraping, but it\u2019s just not matched by any kind of meaningful value exchange\u2026 for me, that\u2019s the fundamental \u2013 that there is not a value exchange for publishers.\u201d<\/p>\n<p>Forrest added: \u201cWhat do we do with that information when we\u2019re deciding <a href=\"https:\/\/pressgazette.co.uk\/news\/major-uk-and-us-publishers-join-forces-to-block-ai-scrapers\/\" target=\"_blank\" rel=\"noopener\">whether or not to block<\/a>? Right now, just using that data, you\u2019d say, well, we\u2019re going to block, right, because they\u2019re stealing our IP,\u201d adding that publishers can know \u201cwith some confidence\u201d that when a scrape is happening it\u2019s to answer a real user query based on their content \u2013 but without getting the referral traffic.<\/p>\n<p>He said the argument often made against blocking is that LLMs are changing search behaviour, with new platforms like ChatGPT growing fast, meaning it will be advantageous in the long run to be present on them. But he pointed to Ahrefs data showing that just 0.2% of traffic to websites is coming from ChatGPT.<\/p>\n<p>\u201cWe\u2019ve just not seen that exponential growth in referral traffic,\u201d he said. 
\u201cCertainly, if you use referral as a business model, it\u2019s not going to be worth it\u2026 it doesn\u2019t feel like there\u2019s ever going to be a similar to Google business model in which we let them have our content free, we benefit from referral traffic.\u201d<\/p>\n<p>Forrest said he was not sure whether high levels of scraping are currently causing a huge cost burden for many publishers, but noted that if CDN (content delivery network) providers are seeing an impact on their margins, they will likely pass those costs on to publishers as contracts come up for renewal.<\/p>\n<p>Email <b><a href=\"mailto:pged@pressgazette.co.uk\" target=\"_blank\" rel=\"noopener\">pged@pressgazette.co.uk<\/a><\/b> to point out mistakes, provide story tips or send in a letter for publication on our &#8220;Letters Page&#8221; blog<\/p>\n","protected":false},"excerpt":{"rendered":"Robots.txt illustration. 
Picture: Shutterstock\/BestForBest Unwanted AI scraping of publisher websites is placing a costly financial burden on publishers,&hellip;\n","protected":false},"author":2,"featured_media":400233,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,138133,1315,53,16,15],"class_list":{"0":"post-400232","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-bauer","11":"tag-chatgpt","12":"tag-technology","13":"tag-uk","14":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/115152248268833159","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/400232","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=400232"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/400232\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/400233"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=400232"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=400232"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=400232"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}