Most big AI providers scrape the open web, hoovering up content to improve their chatbots, which then compete with publishers for the attention of internet users. However, more AI orgs might have to pay up soon, because the Really Simple Licensing (RSL) spec has reached version 1.0, providing guidance on how to set machine-readable rules for crawlers.

“Today’s release of RSL 1.0 marks an inflection point for the open internet,” said Eckart Walther, chair of the RSL technical steering committee, in a statement. “RSL establishes clarity, transparency, and the foundation for new economic frameworks for publishers and AI systems, ensuring that internet innovation can continue to flourish, underpinned by clear, accountable content rights.”

Introduced in September, RSL represents a response to the explosion of automated content harvesting intended to provide fodder for AI model training. It’s intended to complement the Robots Exclusion Protocol [RFC 9309], a way for websites to declare acceptable methods of engagement through a robots.txt file.

In a bid to prevent their content from being laundered for profit in an AI model, publishers are increasingly trying to negotiate licensing deals or block bot-based data gathering. Website operators typically publish a robots.txt file at the site root to provide guidance to automated traffic, but robots.txt compliance is voluntary and many crawlers simply ignore its directives.
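That voluntariness is structural: the rules are parsed and honored (or not) by the bot itself. Python's standard library ships a robots.txt parser that shows how a well-behaved crawler reads such a file; the bot name below is hypothetical:

```python
from urllib import robotparser

# A hypothetical robots.txt that bars one crawler while allowing the rest.
rules = """
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A compliant bot checks before fetching; a non-compliant one never calls this.
print(rp.can_fetch("ExampleAIBot", "https://example.com/articles/1"))  # False
print(rp.can_fetch("OtherBot", "https://example.com/articles/1"))      # True
```

Nothing in the protocol stops a crawler from skipping the check entirely, which is the gap RSL's licensing layer is meant to address.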

RSL builds upon syndication spec RSS and the Robots Exclusion Protocol by providing a way to declare requirements for accessing and processing content, which may involve a demand for compensation.

The specification includes an XML vocabulary for describing content usage, licensing, and legal terms of service. The RSL document – functionally a machine-readable license – can be integrated with other web mechanisms, including robots.txt, HTTP headers, RSS feeds, and HTML link elements.
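As an illustration of those integration points, a site might point crawlers at its license terms from robots.txt and from individual pages. The directive name, media type, and URLs below are assumptions for illustration; the RSL 1.0 spec defines the exact syntax:

```text
# robots.txt (hypothetical example)
User-agent: *
License: https://example.com/license.xml

<!-- HTML head of an article page (hypothetical example) -->
<link rel="license" type="application/rsl+xml"
      href="https://example.com/license.xml">
```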

It provides support for license acquisition and enforcement via the Open License Protocol (OLP), the Crawler Authorization Protocol (CAP), and the Encrypted Media Standard (EMS).

The RSL 1.0 release adds new usage categories such as “ai-all,” “ai-input,” and “ai-index” to accommodate more specific AI usage rules, such as allowing search engines to index content but not use it for AI search applications. It also includes a new “contribution” payment option for noncommercial organizations that want “a good faith monetary or in-kind contribution that supports the development or maintenance of the assets, or the broader content ecosystem.”
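A hedged sketch of what such a declaration could look like: everything here except the quoted category names and the “contribution” payment type is an assumption about the XML vocabulary, so check element and attribute names against the published spec before use:

```xml
<!-- Illustrative only: element and attribute names are assumptions -->
<rsl xmlns="https://rslstandard.org/rsl">
  <content url="https://example.com/articles/">
    <license>
      <permits type="usage">ai-index</permits>     <!-- search indexing OK -->
      <prohibits type="usage">ai-input</prohibits> <!-- no AI-search answers -->
      <payment type="contribution"/>               <!-- noncommercial option -->
    </license>
  </content>
</rsl>
```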

While RSL, like the Robots Exclusion Protocol, is not a technical access control mechanism, it provides support for publishers and partners that choose to implement paywalls and other barriers.

There are various technical options for enforcing the preferences expressed in RSL and robots.txt declarations against bots that fail to comply, such as network-level barriers, but sometimes legal intervention is required to halt bad behavior. Bad bots may still flout or bypass RSL requirements, but the spec’s support for licensing services, encryption, and authentication should help publishers who choose to challenge such behavior in court.
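A network-level barrier can be as simple as refusing requests from known non-compliant crawlers, though in practice operators match on IP ranges and verified bot signatures rather than the easily spoofed User-Agent header. A minimal sketch, with a hypothetical bot name:

```python
# Minimal sketch: a deny-list keyed on the User-Agent product token.
# "ExampleAIBot" is hypothetical; real blocklists are curated crawler
# signatures, and the User-Agent string alone is easily spoofed.
NONCOMPLIANT_BOTS = {"ExampleAIBot"}

def gate(user_agent: str) -> int:
    """Return the HTTP status to serve: 402 steers a bot toward licensing."""
    product = user_agent.split("/", 1)[0].strip()
    if product in NONCOMPLIANT_BOTS:
        return 402  # Payment Required: serve license terms, not content
    return 200

print(gate("ExampleAIBot/1.0"))  # 402
print(gate("Mozilla/5.0"))       # 200
```

Returning 402 rather than 403 is a design choice here: it signals that access is available on commercial terms, which is the posture RSL encourages.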

The RSL spec has been endorsed by infrastructure companies like Cloudflare and Akamai, which offer content tollbooth services for billing AI bots; by publishers like The Associated Press; by community sites like Stack Overflow; and by micropayment biz Supertab, among others.

“From what we’ve seen over the last couple of years and the effect that bot scraping has had on these publications, whether that be from the traffic onto their sites, the loss of the advertising revenue on those sites, et cetera, it’s time for a new offering that benefits these publications and the content that they provide,” Supertab director of growth Erick McAfee told The Register in an interview.

Supertab provides a payment layer for RSL and has been beta testing implementations with about a dozen customers for the past two quarters, although bots aren’t actually being billed at this point. McAfee said the testing aims to validate how payments would flow if in fact the bots comply.

“The goal is to be able in the future to provide an invoice to these LLMs and explain, ‘This is the cause and this is the effect and this is the cost of what’s happened.’ But as of right now, we’re just collecting data to show what’s going on,” he said.

McAfee said that while he couldn’t share information about specific customers, “the data is impressive in the sense that it’s definitely impactful” in terms of the impact AI bots have had on site visits and reduced advertising revenue. ®