{"id":64074,"date":"2025-07-14T05:29:09","date_gmt":"2025-07-14T05:29:09","guid":{"rendered":"https:\/\/www.europesays.com\/us\/64074\/"},"modified":"2025-07-14T05:29:09","modified_gmt":"2025-07-14T05:29:09","slug":"behind-the-iab-tech-labs-new-initiative-to-deal-with-ai-scraping-and-publisher-revenue-loss","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/us\/64074\/","title":{"rendered":"Behind The IAB Tech Lab\u2019s New Initiative To Deal With AI Scraping And Publisher Revenue Loss"},"content":{"rendered":"<p>In June, the IAB Tech Lab proposed a new initiative to create guardrails around how AI bots are permitted to access content, with an emphasis on publisher monetization.<\/p>\n<p>It\u2019s hoping that its new solution will get publishers back on their feet \u2013 and keep them there.<\/p>\n<p>Publishers are like \u201cthe plankton of the digital media ecosystem,\u201d said IAB Tech Lab CEO Anthony Katsur.<\/p>\n<p>Every living thing in an aquatic environment depends on plankton. If they die out, the rest of the ocean goes down with them. And if publishers collapse, that would be an \u201cextinction-level event\u201d for digital media, Katsur said.        
<\/p>\n<p>Many publishers are still managing to stay afloat, but the water is choppy, with <a href=\"https:\/\/www.adexchanger.com\/the-sell-sider\/from-clicks-to-connections-three-essential-tips-for-the-post-traffic-era\/\" data-wpel-link=\"internal\" target=\"_blank\" rel=\"noopener\">traffic<\/a> falling off the metaphorical cliff and no metaphorical harness in sight.<\/p>\n<p><strong>A life raft for publishers<\/strong><\/p>\n<p>The IAB\u2019s initiative, currently called the LLM Content Ingest API Initiative (\u201cwhich we need to rename,\u201d Katsur joked; it\u2019s \u201ca mouthful\u201d), can be broken down into four major components.<\/p>\n<p>The first is access controls, which determine who is allowed to access a publisher\u2019s content in the first place.<\/p>\n<p>Once controls are established, access terms come into play, such as licensing models and content tiers. 
Under the IAB\u2019s guidelines, content will be segregated into tiers based on relevance and value.<\/p>\n<p>\u201cYour archival content from 10 years ago is not worth as much as your late-breaking news or your interview with Taylor Swift,\u201d Katsur said.<\/p>\n<p>The guidelines would also mandate logging the use of content, which Katsur defines as \u201ctracking and recording when and how publisher content is accessed or used by an LLM or AI system,\u201d so publishers can accurately track and invoice for usage of their data.<\/p>\n<p>Content logging ties into the final part of the initiative, which Katsur believes is the most important facet: tokenization. Tokenization involves breaking content down into smaller units made up of words, parts of words, punctuation or metadata, Katsur said. These units, called tokens, are used to train LLMs and generate their responses. Under the proposal, publisher content is tokenized and uniquely attributed to each publisher.<\/p>\n<p>Then, \u201cusing the logging and reporting functions that we are proposing,\u201d he explained, publishers can see exactly how the information scraped from their sites is being used.<\/p>\n<p>Tokenization is useful for brands, too, so they can see what is being said about their products and by whom. 
Many LLMs scrape sites like Reddit, for example, and parrot back what they find as fact \u2013 despite the information often being outdated, if not outright incorrect.<\/p>\n<p>As AI continues to make a name for itself in search, a set of guidelines like the LLM Content Ingest API Initiative (looking forward to that new name) is the best way to ensure that query responses are accurate, Katsur said, and that publishers \u2013 and with them, the rest of the ad tech ecosystem \u2013 continue to thrive.<\/p>\n<p><strong>The big picture<\/strong><\/p>\n<p>But let\u2019s zoom out.<\/p>\n<p>What actually happens when a bot scrapes a website?<\/p>\n<p>First, it\u2019s important to note that AI isn\u2019t born with limitless knowledge. It has to get that knowledge from somewhere. That\u2019s why AI bots mine websites, which are vast troves of information.<\/p>\n<p>Sometimes, scraping is one-and-done. When a query is for something straightforward, like a chocolate chip cookie recipe, a bot typically won\u2019t need to keep scraping a site for updated information, Katsur explained, since a cookie recipe doesn\u2019t generally update or evolve. And once an AI model has a good recipe, it can feed it (no pun intended) to the hundreds of thousands of people requesting it.<\/p>\n<p>That doesn\u2019t mean a page scraped once will never be scraped again, though. There is a common misconception \u201cthat once an LLM crawls, it stores all the data and never crawls again,\u201d said Katsur. IAB research has shown that crawlers will recrawl content they have already accessed.<\/p>\n<p>Still, scraping the same page a handful of additional times doesn\u2019t scale against the pay-per-visit model that publishers are used to.<\/p>\n<p>With a pay-per-crawl model, a publisher gets paid when a bot pulls information from its site \u2013 and that\u2019s basically the end of the story. 
No matter how many of a generative AI search engine\u2019s users benefit from that information down the line, the publisher only gets paid once per scrape.<\/p>\n<p>Pay per query, on the other hand, is more similar to the way publishers currently drive revenue, and is the model favored by the IAB Tech Lab. \u201cNow you\u2019re getting paid per use,\u201d said Katsur, \u201cwhich is similar to getting paid per visit.\u201d<\/p>\n<p>\u201cPay per query scales,\u201d he said. \u201cPay per crawl does not.\u201d<\/p>\n<p>The problem is, even pay per crawl isn\u2019t guaranteed. Plenty of bots are <a href=\"https:\/\/www.404media.co\/ai-scraping-bots-are-breaking-open-libraries-archives-and-museums\/\" data-wpel-link=\"external\" target=\"_blank\" rel=\"noopener\">scraping sites without providing any compensation<\/a> and, technically, that\u2019s allowed \u2013 for now.<\/p>\n<p>But that seems to be changing, as more companies develop models that put publisher monetization at the forefront.<\/p>\n<p>Earlier this month, <a href=\"https:\/\/www.adexchanger.com\/daily-news-roundup\/wednesday-20250207\/\" data-wpel-link=\"internal\" target=\"_blank\" rel=\"noopener\">Cloudflare<\/a> implemented a new pay-per-crawl model that gives publishers free rein over the access they provide to bots. 
Publishers can give full access, block all scraping or opt into the new pay-per-crawl model, which requires bots to share payment information so they can be charged for each scrape.<\/p>\n<p>That\u2019s something \u2013\u00a0although, until this sort of model is widely adopted, publisher traffic is still in serious danger.<\/p>\n<p>But, hey, along with the LLM Content Ingest API Initiative, it\u2019s definitely a start.<\/p>\n","protected":false},"excerpt":{"rendered":"In June, the IAB Tech Lab proposed a new initiative to create guardrails around how AI bots are&hellip;\n","protected":false},"author":3,"featured_media":64075,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[691,45686,738,12993,45687,45688,304,45689,45690,45691,4196,45693,158,45692,67,132,68],"class_list":{"0":"post-64074","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-anthony-katsur","10":"tag-artificial-intelligence","11":"tag-cloudflare","12":"tag-content-monetization","13":"tag-data-scraping","14":"tag-generative-ai","15":"tag-iab","16":"tag-iab-tech-lab","17":"tag-llms","18":"tag-monetization","19":"tag-publishers","20":"tag-technology","21":"tag-tony-katsur","22":"tag-united-states","23":"tag-unitedstates","24":"tag-us"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@us\/114849943570755971","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/64074","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/comments?post=64074"
}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/64074\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media\/64075"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media?parent=64074"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/categories?post=64074"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/tags?post=64074"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}