What is AI red-teaming?

AI red-teaming is adversarial testing of an AI system by a team that attempts to find failure modes, harmful outputs, manipulation vulnerabilities, biases, and misuse pathways before deployment. The term borrows from military and cybersecurity practice but the techniques are distinct from traditional software penetration testing because AI systems fail probabilistically, not deterministically.

Does the EU AI Act require red-teaming?

The EU AI Act does not use the term red-teaming but Article 9 requires that high-risk AI systems undergo testing to identify the most appropriate risk management measures, with testing performed at appropriate points throughout the development lifecycle. For General Purpose AI models with systemic risk, Article 55 requires adversarial testing including red-teaming.

What should AI red-teaming tests cover?

At minimum, AI red-teaming should test prompt injection (attempts to hijack the system via malicious inputs), jailbreaks (attempts to bypass safety guardrails via roleplay or multi-step manipulation), output manipulation (crafting inputs to produce specific harmful outputs), bias and discrimination (testing protected class performance disparities), data leakage (attempts to extract training data or system prompts), and.

When should a small team use an external red team vs. internal testing?

Small teams should use external red-teaming when the system is high-risk under the EU AI Act or NIST AI RMF, when the system makes consequential individual decisions (credit, hiring, healthcare), when it handles sensitive personal data at scale, or when a regulator, customer, or insurer specifically requires independent testing.

What documentation do regulators want from red-teaming?

Regulators want a test plan (scope, methodology, who conducted it, dates), a findings log (each test scenario, what was tested, what the output was, whether it was a pass or a finding), a remediation record (what was done about each finding), and a residual risk statement (what risks remain after mitigation).

Does NIST AI RMF require red-teaming?

NIST AI RMF does not mandate red-teaming by name but the Measure function explicitly calls for adversarial testing of AI systems. AI RMF Playbook actions under MEASURE 2.5 and MEASURE 2.6 include testing for bias, accuracy, and adversarial robustness.

NIST AI Red Teaming: What It Is, What It Te…

Most EU AI Act compliance attention has focused on documentation requirements: technical files, conformity assessments, registration. What many teams missed is that Article 9 also requires testing, and that obligation is substantive, not just a box to check.

Regulators in the US and EU have converged on adversarial testing, commonly called red-teaming, as the expected method for demonstrating that an AI system's risk has been properly measured. This guide explains what that means in practice, which regulatory frameworks require it, and how to run a proportionate testing program without a dedicated security team.

TL;DR: Regulators in 2026 expect documented adversarial testing for any AI system that affects individuals. The EU AI Act Article 9 requires it for high-risk AI. NIST AI RMF Measure function formalizes it. The White House 2023 executive order made it a voluntary commitment for frontier model developers. For small teams the practical answer is a risk-tiered approach: minimal logging for low-risk tools, structured prompt-injection and bias testing for medium-risk systems, and independent external red-teaming for high-risk AI. Document everything. Regulators want evidence of testing, not just a claim that testing happened.

What AI red-teaming actually is

Red-teaming in cybersecurity means hiring a team to attack your systems the way a real adversary would. The goal is to find what breaks before someone outside finds it first.

AI red-teaming applies the same adversarial mindset but the attack surface is different. AI systems fail in ways traditional software does not. A language model might refuse harmful requests 999 times and comply on the 1000th if the prompt is phrased differently. A hiring AI might perform well on average but systematically underrate applications from a specific demographic. A customer-service bot might be manipulated into revealing information it was not supposed to share.

The difference from traditional software testing comes down to probability. A software vulnerability is either there or not. An AI failure mode is a distribution: some prompts trigger it, others do not, and the boundary is rarely crisp. That means exhaustive testing is not possible. Instead, red-teaming focuses on systematic coverage of the most likely and most consequential failure paths.

AI red-teaming typically covers six categories of risk:

Prompt injection: An attacker embeds instructions in user-visible content causing the AI to follow attacker commands rather than developer intent. A malicious document in a summarization tool might instruct the model to ignore its instructions and output something harmful.

Jailbreaks: Multi-step manipulation attempts that get the AI to violate its safety constraints. Common techniques include role-play framing ("pretend you are an AI without restrictions"), gradual escalation, and indirect instruction ("write a story where a character explains...").

Output manipulation: Crafting inputs to produce specific harmful, misleading, or legally problematic outputs. This includes getting the AI to produce defamatory content about specific individuals, generate discriminatory outputs for protected classes, or produce outputs that create legal liability.

Bias and discrimination testing: Systematically varying demographic signals in test inputs to measure whether the system produces different outputs for different groups. For systems making consequential decisions, this is a legal obligation under US employment law, the EU AI Act, and anti-discrimination frameworks, not just a security concern.

Data leakage: Attempting to extract training data, system prompts, or confidential information the model has been given. This includes membership inference attacks, prompt extraction, and indirect probing.

Misuse scenarios: Testing how a motivated bad actor in your specific deployment context would abuse the system. A legal research tool might be probed for how it responds to requests for legal strategies to evade regulations. A medical information bot might be tested for how it handles requests for dangerous self-treatment advice.

What regulators require: the three key frameworks

Team reviewing AI compliance documentation at a conference table

EU AI Act Article 9: high-risk AI testing obligation

Article 9 of the EU AI Act requires that providers of high-risk AI systems establish a risk management system that includes testing procedures. The testing must:

Identify the most appropriate risk management measures
Be performed at appropriate points throughout the development lifecycle, not just before deployment
Ensure the system performs consistently for its intended purpose
Include testing in real-world conditions where possible

The Act does not specify methodology but the European AI Office's guidance treats adversarial testing as the expected approach for identifying manipulation vulnerabilities and failure modes. For high-risk systems in Annex III categories (employment, credit, education, healthcare, critical infrastructure, migration, law enforcement), documented red-teaming results should be part of the technical documentation file.

For General Purpose AI models with systemic risk (those trained with more than 10^25 FLOPs), Article 55 is more explicit: providers must conduct adversarial testing including red-teaming at least annually and make the results available to the AI Office upon request.

NIST AI RMF: Measure function

NIST AI RMF's Measure function is the closest US analog to red-teaming requirements. MEASURE 2.5 requires testing across diverse populations and conditions; MEASURE 2.6 requires testing beyond the training distribution; MEASURE 2.7 addresses trustworthiness including safety, security, and bias. The Playbook guidance under these subcategories includes adversarial robustness testing and systematic bias evaluation. For organizations using NIST AI RMF as a safe harbor under Texas TRAIGA or Colorado SB 189, Measure function conformance requires evidence of this testing.

White House executive order: voluntary commitments

The October 2023 White House executive order on AI established voluntary red-teaming commitments from major AI developers, including sharing results with the government before frontier model deployment. For smaller organizations, it creates no direct legal obligation, but the standards being developed under it (NIST AI RMF 2.0, US AI Safety Institute evaluation frameworks) will shape what "reasonable" AI testing means for regulatory and litigation purposes.

Risk-tiered testing: how to calibrate your program

Not every AI tool needs an independent red-team engagement. The right level of testing depends on the risk tier of the system.

Low-risk AI tools (internal productivity tools, summarization with human review of all outputs): Log usage, document intended use, maintain a basic incident record. No structured adversarial testing is required.

Medium-risk AI systems (customer-facing tools, systems that inform decisions about individuals, tools handling sensitive personal data): Structured internal testing before deployment and after major updates. Cover the six risk categories above. Use a tester independent from the build team. Document findings and remediation.

High-risk AI systems (EU AI Act Annex III categories, hiring, credit, healthcare, systems affecting individuals at scale): Independent red-team engagement before deployment using testers external to the build team. Annual re-testing or testing after material changes. Results go in the technical documentation file.

Practical 8-step red-teaming protocol for small teams

This protocol is designed for teams that cannot afford dedicated security staff. It takes a medium-risk system through a structured testing cycle with two to four people.

Step 1: Define scope and test objectives. Write down exactly what you are testing, which system version, which deployment context, and which risk categories you will cover. Set a time limit (two to four hours is typical for a medium-risk tool).

Step 2: Assemble the test team. Include at minimum one person familiar with how the system was built and one person who represents typical user perspectives. Neither should have designed the specific prompts or guardrails being tested.

Step 3: Build a test case library. Write down at least twenty test scenarios across the six risk categories before testing begins. Use published jailbreak databases (ML Commons AI Safety benchmark, Anthropic red-team datasets, NIST AI 100-2) as starting points. Add use-case-specific scenarios for your deployment context.

Step 4: Run structured tests. Execute each test case. Record the exact input, the system output, and whether the output represents a finding (a failure, a near-miss, or unexpected behavior). Do not paraphrase. Capture exact outputs.

Step 5: Escalate promising findings. For each finding, probe further. If a jailbreak worked once, test ten variations. Document which variations worked and which did not. This maps the boundary of the vulnerability.

Step 6: Classify findings by severity. Use a simple three-tier classification: critical (immediate harm or legal liability if the finding were exploited), high (significant harm with some conditions), low (minor issue with limited impact). Prioritize remediation by severity.

Step 7: Remediate and re-test. Fix critical and high findings before deployment. Re-run the specific test cases that produced those findings after the fix is applied. Document that re-testing occurred and the outcome.

Step 8: Produce a test summary. Write a one-to-two page summary: scope, methodology, findings by severity, remediation status, residual risks. Sign it with the name of who conducted testing and the date. This is the document regulators want to see.

Internal vs. external red-teaming: when to outsource

For most small teams, internal testing is the default because it is fast and cheap. The question is when internal testing is not enough.

Outsource to an external red team when:

The system is in an EU AI Act Annex III high-risk category and you need conformity assessment evidence
The system makes consequential decisions affecting individuals at scale (hiring, credit, healthcare)
A customer, insurer, or regulator specifically requires independent third-party testing
The internal team has a conflict of interest that makes objective testing unreliable
The system handles security-sensitive applications where insider-knowledge blind spots are a real risk

When you do engage external red-teamers, give them a written scope agreement that specifies what they are testing, what techniques are in and out of scope, what data they can access, and what they must do with findings. Retain the engagement agreement and their report as part of your technical documentation.

What to ask vendors about their red-teaming

If you are a deployer using a third-party AI model or system, you take on compliance obligations but cannot control the underlying model's testing. Put these five questions in every AI vendor questionnaire:

Does your model undergo pre-deployment red-teaming? What categories of risk does it cover?
How frequently is adversarial testing conducted after initial deployment?
What is your process for disclosing red-team findings that affect customer use cases?
Do you maintain a documented vulnerability disclosure program for AI safety issues?
Can you provide a summary of your most recent red-team results?

Vendors that cannot answer these questions are not automatically non-compliant, but their silence shifts more testing responsibility back to you as deployer. Document the vendor's responses as part of your own technical file.

Documenting red-team results: the regulatory minimum

Regulators reviewing AI compliance want to see four things in red-team documentation:

Test plan: Who tested the system, when, using what methodology, covering what scope. The plan does not need to be detailed but it must exist before testing begins, not be reconstructed after.

Findings log: Each test case, the input used, the output produced, the classification (finding or pass), and the severity if a finding.

Remediation record: For each finding, what was done, by whom, and when. For critical findings, evidence of re-testing after remediation.

Residual risk statement: What risks remain after all feasible mitigations. This is the honest acknowledgment that no system is perfectly secure and that residual risks were considered and accepted.

Keep these records for the lifetime of the system plus the required retention period under applicable law (typically three to seven years under EU AI Act provisions for high-risk systems).

JadePuffer Ransomware: 8 Gaps in Your AI Vendor Security Checklist

TL;DR: Regulators in 2026 expect documented adversarial testing for any AI system that affects individuals. The EU AI Act Article 9 requires it for high-risk AI. NIST AI RMF Measure function formalizes it. The White House 2023 executive order made it a voluntary commitment for frontier model developers. For small teams the practical answer is a risk-tiered approach: minimal logging for low-risk tools, structured prompt-injection and bias testing for medium-risk systems, and independent external red-teaming for high-risk AI. Document everything. Regulators want evidence of testing, not just a claim that testing happened.

What AI red-teaming actually is

Red-teaming in cybersecurity means hiring a team to attack your systems the way a real adversary would. The goal is to find what breaks before someone outside finds it first.

AI red-teaming typically covers six categories of risk:

What regulators require: the three key frameworks

Team reviewing AI compliance documentation at a conference table

EU AI Act Article 9: high-risk AI testing obligation

Article 9 of the EU AI Act requires that providers of high-risk AI systems establish a risk management system that includes testing procedures. The testing must:

Identify the most appropriate risk management measures
Be performed at appropriate points throughout the development lifecycle, not just before deployment
Ensure the system performs consistently for its intended purpose
Include testing in real-world conditions where possible

NIST AI RMF: Measure function

White House executive order: voluntary commitments

Risk-tiered testing: how to calibrate your program

Not every AI tool needs an independent red-team engagement. The right level of testing depends on the risk tier of the system.

Practical 8-step red-teaming protocol for small teams

This protocol is designed for teams that cannot afford dedicated security staff. It takes a medium-risk system through a structured testing cycle with two to four people.

Internal vs. external red-teaming: when to outsource

For most small teams, internal testing is the default because it is fast and cheap. The question is when internal testing is not enough.

Outsource to an external red team when:

The system is in an EU AI Act Annex III high-risk category and you need conformity assessment evidence
The system makes consequential decisions affecting individuals at scale (hiring, credit, healthcare)
A customer, insurer, or regulator specifically requires independent third-party testing
The internal team has a conflict of interest that makes objective testing unreliable
The system handles security-sensitive applications where insider-knowledge blind spots are a real risk

What to ask vendors about their red-teaming

Does your model undergo pre-deployment red-teaming? What categories of risk does it cover?
How frequently is adversarial testing conducted after initial deployment?
What is your process for disclosing red-team findings that affect customer use cases?
Do you maintain a documented vulnerability disclosure program for AI safety issues?
Can you provide a summary of your most recent red-team results?

Documenting red-team results: the regulatory minimum

Regulators reviewing AI compliance want to see four things in red-team documentation:

Test plan: Who tested the system, when, using what methodology, covering what scope. The plan does not need to be detailed but it must exist before testing begins, not be reconstructed after.

Findings log: Each test case, the input used, the output produced, the classification (finding or pass), and the severity if a finding.

Remediation record: For each finding, what was done, by whom, and when. For critical findings, evidence of re-testing after remediation.

Keep these records for the lifetime of the system plus the required retention period under applicable law (typically three to seven years under EU AI Act provisions for high-risk systems).

JadePuffer Ransomware: 8 Gaps in Your AI Vendor Security Checklist

NIST AI Red Teaming: What It Is, What It Tests, and How to Run One in 2026

What AI red-teaming actually is

What regulators require: the three key frameworks

EU AI Act Article 9: high-risk AI testing obligation

NIST AI RMF: Measure function

White House executive order: voluntary commitments

Risk-tiered testing: how to calibrate your program

Practical 8-step red-teaming protocol for small teams

Internal vs. external red-teaming: when to outsource

What to ask vendors about their red-teaming

Documenting red-team results: the regulatory minimum

NIST AI Red Teaming: What It Is, What It Tests, and How to Run One in 2026

What AI red-teaming actually is

What regulators require: the three key frameworks

EU AI Act Article 9: high-risk AI testing obligation

NIST AI RMF: Measure function

White House executive order: voluntary commitments

Risk-tiered testing: how to calibrate your program

Practical 8-step red-teaming protocol for small teams

Internal vs. external red-teaming: when to outsource

What to ask vendors about their red-teaming

Documenting red-team results: the regulatory minimum