Starting Jan. 1, California will require developers of generative artificial intelligence (AI) models to publicly disclose what data they use to train their systems.
The measure, Assembly Bill 2013, known as the Generative Artificial Intelligence Training Data Transparency Act, was signed into law earlier in 2024 and is scheduled to take effect at the start of next year.
The law requires developers to publish detailed information on their websites about the datasets that power their models. Disclosures must include the sources of data, whether the datasets are publicly available or proprietary, their size and type, whether copyrighted material or personal data are included, and the time period during which the data was collected.
Raising Legal Stakes
Bloomberg Law described AB 2013 as among the most comprehensive U.S. rules on AI disclosure, requiring companies to publish details about the data that trains their models. Researchers and legal analysts add that compliance will not be simple.
As Goodwin Law explained, “Implementing AB 2013 presents several key challenges for developers of generative AI systems. One of the foremost difficulties lies in assembling comprehensive documentation of training datasets, especially for models that have evolved over time. … Many generative models incorporate data from heterogeneous sources, some of which may lack clear provenance or licensing information.”
Generative AI firms are already navigating lawsuits alleging that models were trained on copyrighted works without permission. A disclosure mandate could make it easier to trace which datasets were used, potentially strengthening claims from rights holders. At the same time, researchers argue that transparency could provide a foundation for independent audits and risk assessments.
Industry Pushback
Industry leaders, however, are voicing concern. According to The Wall Street Journal, business-tech executives warned the bill could have a “chilling effect” on development in California, with startups particularly exposed to compliance burdens.
Still, some analysts argue California’s targeted strategy may prove more durable, and some industry voices contend that thoughtful regulation could strengthen rather than stifle innovation. Microsoft’s Chief Scientist Eric Horvitz said that oversight, if “done properly,” can accelerate advances in AI by encouraging more responsible data use and building public trust in new systems.
California has long shaped national practice in technology regulation, from privacy rules to emissions standards. If the disclosure requirements prove workable, other states could follow suit. That possibility gives the law significance well beyond California’s borders.
The broader policy debate is whether transparency alone will be enough. As PYMNTS has reported, Colorado has delayed implementation of its own AI act until June 2026.
When Citi unveiled its upgraded platform, PYMNTS noted how financial institutions are moving toward clearer safeguards and responsible scaling. California’s law signals that disclosure may soon be more than a best practice; it could become a baseline expectation across industries.