California's AB 2013, the Generative AI Training Data Transparency Act, took effect January 1, 2026. If you develop a generative AI system and make it available to users in California, you are now required to post specific information about your training data on your website before the system goes live.
The law has been in effect for six months. OpenAI and Anthropic have published compliant disclosures. xAI challenged the law in federal court and lost its bid for a preliminary injunction in March 2026. The law stands.
TL;DR: California AB 2013 requires generative AI developers to disclose 12 categories of training data information on their website before making a system available to Californians. Effective January 1, 2026. No standalone penalty, but enforcement runs via the AG under California's Unfair Competition Law. xAI's constitutional challenge was rejected at the preliminary injunction stage. OpenAI and Anthropic have both posted compliant disclosures.
What AB 2013 requires
The law requires developers to publish a high-level summary of their training data before making a covered AI system publicly available, and to update that disclosure whenever a substantial modification to the system is made.
The disclosure must cover 12 categories, drawn directly from the statute:
- The sources or owners of the training datasets
- A description of how the datasets serve the intended purpose of the AI system
- The number of data points in the datasets (may be stated as a range)
- Whether the data includes content protected by copyright, trademark, or patent
- Whether the data is entirely in the public domain
- Whether the developer purchased or licensed the data
- Whether the data includes personal information as defined under California law
- Whether the data includes aggregate consumer information
- Whether the data was collected directly from individuals
- Whether synthetic data generation was used in developing the system
- The dates or date range when the data was collected
- The countries where the data was collected
The law specifies that the disclosure must be publicly accessible on the developer's website. It does not require publication of the datasets themselves, model weights, or proprietary training pipelines.
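One way to keep track of the 12 categories internally is a small record type with one field per category. The sketch below is illustrative only: the class name, field names, and the `missing_categories` helper are hypothetical and do not come from the statute, and filling every field says nothing about whether the content is legally sufficient.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TrainingDataDisclosure:
    """One field per disclosure category listed above (field names are hypothetical)."""
    sources_or_owners: Optional[str] = None
    purpose_alignment: Optional[str] = None
    data_point_count: Optional[str] = None  # may be stated as a range
    includes_ip_protected_content: Optional[bool] = None  # copyright, trademark, patent
    entirely_public_domain: Optional[bool] = None
    purchased_or_licensed: Optional[bool] = None
    includes_personal_information: Optional[bool] = None
    includes_aggregate_consumer_information: Optional[bool] = None
    collected_directly_from_individuals: Optional[bool] = None
    synthetic_data_used: Optional[bool] = None
    collection_dates: Optional[str] = None
    collection_countries: Optional[str] = None


def missing_categories(disclosure: TrainingDataDisclosure) -> list[str]:
    """Return the names of categories that have not been filled in yet."""
    return [f.name for f in fields(disclosure) if getattr(disclosure, f.name) is None]


# A partially completed draft: only two of the 12 categories are answered.
draft = TrainingDataDisclosure(
    sources_or_owners="Publicly available web text; licensed datasets",
    synthetic_data_used=True,
)
print(missing_categories(draft))
```

Using `None` as the default makes an unanswered category distinguishable from a deliberate "no", which matters for the yes/no categories.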
Who is covered
AB 2013 covers developers of generative AI systems that are released on or after January 1, 2022, and made available to Californians for free or for a fee.
The law applies based on where users are located, not where the developer is headquartered. A company outside California must comply if it makes a covered system available to California residents.
Not covered: Systems used solely for internal business operations and never made available externally. Systems used solely for security and integrity purposes, national airspace operations, or national security and defense are also exempt.
The scope includes API access. If you offer a generative AI model through an API that developers use to build products served to California users, you are covered as the developer of the underlying model. If you are deploying another company's model via API without modification, you are not the developer and the disclosure obligation sits with the model provider.
If you have fine-tuned a model on your own data and are making that system available to California users, the fine-tuning data is part of your training dataset and you need your own AB 2013 disclosure covering it.
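The coverage rules described in this section can be sketched as a simple decision function. This is a rough triage aid under the article's reading of the law, not a legal determination: the `SystemProfile` fields and the `needs_own_disclosure` name are hypothetical, and edge cases (for example, what counts as a modification) still need counsel.

```python
from dataclasses import dataclass
from datetime import date

# Systems released on or after this date are in scope, per the section above.
COVERAGE_START = date(2022, 1, 1)


@dataclass
class SystemProfile:
    """Hypothetical description of a generative AI system for triage purposes."""
    release_date: date
    available_to_californians: bool
    internal_use_only: bool = False
    # Used solely for security/integrity, national airspace, or national security/defense.
    exempt_purpose_only: bool = False
    uses_third_party_model: bool = False
    fine_tuned_on_own_data: bool = False


def needs_own_disclosure(profile: SystemProfile) -> bool:
    """Return True if, on this article's reading, you owe your own AB 2013 disclosure."""
    if profile.internal_use_only or profile.exempt_purpose_only:
        return False
    if profile.release_date < COVERAGE_START:
        return False
    if not profile.available_to_californians:
        return False
    # Deploying someone else's model unmodified leaves the obligation with the provider.
    if profile.uses_third_party_model and not profile.fine_tuned_on_own_data:
        return False
    return True


wrapper_app = SystemProfile(date(2025, 3, 1), True, uses_third_party_model=True)
fine_tuned = SystemProfile(
    date(2025, 3, 1), True, uses_third_party_model=True, fine_tuned_on_own_data=True
)
print(needs_own_disclosure(wrapper_app))  # False
print(needs_own_disclosure(fine_tuned))   # True
```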
The xAI lawsuit and what it means for compliance
Elon Musk's xAI filed a federal lawsuit against California Attorney General Rob Bonta on December 29, 2025, three days before the law took effect. The lawsuit challenged AB 2013 on three constitutional grounds.
Fifth Amendment (Takings Clause): xAI argued that training data information constitutes trade secrets worth billions of dollars, and that compelled public disclosure without compensation is an unconstitutional taking.
First Amendment: The complaint alleged forced disclosure compels speech without sufficient justification.
Due Process: xAI argued the law's requirements are unconstitutionally vague.
A federal judge in the Central District of California denied xAI's motion for a preliminary injunction on March 4, 2026, finding the constitutional claims insufficiently developed to justify blocking the law. The law remained in effect throughout. xAI has appealed to the Ninth Circuit, but as of June 2026 the law remains fully operative.
The practical consequence: xAI's challenge did not create a compliance exception for other developers. OpenAI and Anthropic did not join the lawsuit and posted disclosures in January 2026.
The trade secret tension
AB 2013 includes no explicit trade secret exemption, which is the central issue in the xAI litigation. The law requires disclosure of information about training datasets, not the datasets themselves, but even categorical disclosure about sources and licensing may feel sensitive to some developers.
For most companies the practical risk is lower than xAI portrays it. Disclosing that a dataset contains licensed content from general categories of sources, covers a date range, and includes no personal information does not necessarily reveal the specific datasets or curation methodology that constitute a trade secret.
Legal counsel who have reviewed AB 2013 generally recommend erring toward disclosure rather than withholding under an implied trade secret exception that the statute does not provide. On that view, the enforcement risk of non-compliance under California's Unfair Competition Law outweighs the competitive risk of publishing the required categorical summary.
How OpenAI and Anthropic complied
Both companies published training data summaries in January 2026, providing a practical template for what regulators may consider sufficient.
OpenAI's disclosure describes training data categories, including text from the internet, licensed datasets, books, and code. It confirms synthetic data generation, notes geographic distribution of sources, and states that personal information may be present in internet-sourced data with steps taken to filter sensitive content.
Anthropic published a similar disclosure covering training data categories, synthetic data use, and geographic scope, with a note that proprietary details about specific datasets are protected under trade secret law where applicable.
The shared approach: describe categories honestly, note what was licensed, acknowledge personal information presence while describing mitigation, and invoke trade secret protection for specific dataset identities rather than for the categorical disclosure itself. That framing is now the de facto compliance standard.
Enforcement: what the AG can actually do
AB 2013 does not create its own enforcement mechanism or specify penalties. Enforcement runs through California's Unfair Competition Law (UCL, Business and Professions Code §17200), which allows the state Attorney General, city attorneys, and county counsel to sue for injunctive relief and civil penalties of up to $2,500 per violation.
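To get a feel for the scale of the penalty ceiling, the arithmetic can be sketched as below. The article does not say how violations would be counted for AB 2013 (per system, per day, or otherwise), so the violation count here is purely an input you supply, and the function name is hypothetical.

```python
# Ceiling cited above for civil penalties under the UCL, in US dollars.
MAX_PENALTY_PER_VIOLATION_USD = 2_500


def max_civil_penalty(violation_count: int) -> int:
    """Upper bound on civil penalties for a given, assumed number of violations."""
    if violation_count < 0:
        raise ValueError("violation_count must be non-negative")
    return violation_count * MAX_PENALTY_PER_VIOLATION_USD


# Example: an assumed count of 3 violations.
print(max_civil_penalty(3))  # 7500
```

This is an upper bound only; the figure is "up to" $2,500, so an actual award could be lower, and injunctive relief is a separate remedy.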
The practical enforcement picture six months after the law took effect: the AG's office has not brought an AB 2013 enforcement action as of June 2026. The focus has been on monitoring compliance rather than enforcement. Most major AI developers have either posted disclosures or are in the process of doing so.
The absence of enforcement to date does not mean the risk is zero. UCL enforcement does not require proof of harm to individual consumers, and the AG can act on a complaint or on its own initiative. Companies that have made a deliberate choice not to comply are in a different risk category than companies that posted a good-faith disclosure that may be incomplete.
There is also no private right of action under AB 2013. Individual California residents cannot sue a developer directly for failing to post the required disclosure. Enforcement is entirely at the government's discretion.
AB 2013 vs. SB 942: two separate laws
A common point of confusion is that California has two AI transparency laws with similar names.
AB 2013 (this article) requires developers to disclose training data information on their website. The obligation is on the AI developer, and it covers what went into building the model.
SB 942 (the California AI Transparency Act, amended by AB 853) requires AI systems to label and watermark AI-generated content so users can tell when content was produced by AI. The operative date was pushed to August 2, 2026, to align with the EU AI Act. The obligation is on the deployer of AI-generated content, not on the model developer.
A company can be subject to both. If you develop a generative AI model and deploy it to produce content for California users, AB 2013 applies to your training data disclosure and SB 942 applies to your content labeling. The California SB 942 compliance guide covers the content labeling obligations separately.
Copy-paste AB 2013 disclosure template
This template covers the 12 required categories. Have legal counsel review before publishing, particularly the personal information and copyright sections.
[Company Name] Generative AI Training Data Disclosure
Published pursuant to California Assembly Bill 2013. Last updated: [date].
Sources and owners: Training datasets include [categories: publicly available web text / licensed datasets from third-party providers / user-generated content / proprietary data collected by the company]. [Add or remove categories as accurate.]
Purpose: These datasets are used to develop [system name]'s ability to [generate text / images / code / other].
Volume: Approximately [range, e.g., hundreds of billions of] [tokens / data points] across all training datasets.
Copyright status: Training data [includes / does not include] content protected by copyright, trademark, or patent. [If included, describe the basis: we license this content / we rely on fair use.]
Public domain: Training data [is / is not] entirely in the public domain.
Licensing: Training data was obtained through [purchase / licensing agreements / public domain sources / a combination of these].
Personal information: Training data [includes / may include] personal information as defined under California law, sourced from [general category, e.g., publicly available internet content]. We [take / have taken] steps to [filter / minimize] the inclusion of sensitive personal information.
Aggregate consumer information: Training data [does / does not] include aggregate consumer information.
Direct collection: Training data [was / was not] collected directly from individuals.
Synthetic data: Synthetic data generation [was / was not] used in developing this system.
Collection dates: Training data was collected between approximately [date range].
Countries: Training data was sourced from content originating in [countries or regions, e.g., the United States, European Union member states, and other countries].
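If you maintain disclosures for several systems, filling the template from structured values can reduce copy-paste errors. The sketch below renders only four of the template lines for brevity; the remaining categories would follow the same pattern. The function name, placeholder keys, and the "ExampleCo" / "ExampleGen" values are all hypothetical.

```python
# (label, sentence) pairs taken from the template above; placeholders are in braces.
TEMPLATE_LINES = [
    ("Sources and owners", "Training datasets include {sources}."),
    ("Purpose", "These datasets are used to develop {system_name}'s ability to {capability}."),
    ("Synthetic data", "Synthetic data generation {synthetic} used in developing this system."),
    ("Collection dates", "Training data was collected between approximately {date_range}."),
]


def render_disclosure(company: str, updated: str, values: dict[str, str]) -> str:
    """Fill the template lines; a missing value raises KeyError instead of leaving a gap."""
    header = [
        f"{company} Generative AI Training Data Disclosure",
        f"Published pursuant to California Assembly Bill 2013. Last updated: {updated}.",
    ]
    body = [f"{label}: {sentence.format(**values)}" for label, sentence in TEMPLATE_LINES]
    return "\n".join(header + body)


example_values = {
    "sources": "publicly available web text and licensed datasets from third-party providers",
    "system_name": "ExampleGen",
    "capability": "generate text",
    "synthetic": "was",
    "date_range": "2019 and 2025",
}
print(render_disclosure("ExampleCo", "2026-06-01", example_values))
```

Failing loudly on a missing value is deliberate: a published page with an unfilled placeholder is worse than a build error. The rendered text still needs the legal review noted above.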
Checklist
- Determined whether your product is covered (developer of generative AI system vs. application layer on API)
- If fine-tuned: disclosure covers fine-tuning data separately from base model
- All 12 disclosure categories reviewed against actual training data
- Disclosure page created and publicly accessible on company website
- Disclosure posted before system was made available in California
- Process established to update disclosure when substantial modifications occur
- Trade secret analysis completed for any categories where disclosure feels sensitive
- Legal counsel reviewed before publication
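The update item in the checklist can be backed by a simple staleness check, sketched below. It assumes you already keep a log of dates on which you judged a change to be a substantial modification; deciding what qualifies is a legal question this code does not answer, and the function name is hypothetical.

```python
from datetime import date


def disclosure_is_stale(last_updated: date, substantial_modifications: list[date]) -> bool:
    """True if any logged substantial modification postdates the published disclosure."""
    return any(change > last_updated for change in substantial_modifications)


# Hypothetical modification log for one system.
changes = [date(2026, 2, 10), date(2026, 5, 20)]
print(disclosure_is_stale(date(2026, 1, 15), changes))  # True
print(disclosure_is_stale(date(2026, 6, 1), changes))   # False
```

A disclosure updated on the same day as a modification is treated as current here; tighten the comparison if your process needs the update to land first.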
Related Reading
- California SB 942 AI transparency act: August 2026 compliance
- AI training data copyright and fair use ruling 2026
- Privacy-first AI APIs that don't train on your data
- Multi-state AI compliance strategy 2026
- AI governance guide for small teams
- Amazon KDP content types AI disclosure guide 2026
- Synthetic data governance and GDPR compliance 2026
