DGTLENG 304: Engineering Agentic Products
DGTLENG 304 · Lesson 4 of 5

Certification and Regulatory Frameworks

The Certification Challenge

Certification is the process by which a regulatory authority (or an organization's own assurance process) confirms that a product is acceptably safe for its intended use. For conventional products, this process is well-established: the product is designed to meet standards, tested against those standards, and certified based on the evidence.

For agentic products, the certification challenge is fundamental: the standards were not designed for products that make their own decisions.

Existing frameworks assume that product behavior can be fully specified before deployment, that testing can verify the specified behavior, and that a certified product behaves the same way throughout its service life. Agentic products violate all three assumptions — behavior emerges at runtime, testing cannot cover all scenarios, and learning products change their behavior over time.

This lesson examines the current regulatory landscape, its limitations, and the directions it is evolving.

Prescriptive Standards

The traditional approach: the standard specifies how the system shall be built and verified, through specific design requirements, specific implementation constraints, and specific test procedures.

How They Work

The standard defines detailed rules. The product developer demonstrates compliance with each rule. The certification authority reviews the evidence and grants certification.

Example: DO-178C (Software Considerations in Airborne Systems and Equipment Certification). This standard defines five software levels (A through E, often referred to as DAL A through E) based on the severity of failure conditions. For each level, it specifies required activities (requirements development, design, coding, testing), required evidence (documentation, traceability, test results), and required independence (who can verify what). A DAL-A system (catastrophic failure condition) requires 100% structural code coverage at the MC/DC (Modified Condition/Decision Coverage) level, meaning every condition in a decision must be shown to independently affect that decision's outcome.
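To make the MC/DC criterion concrete, the sketch below checks whether a test set contains an "independence pair" for each condition in a decision. It assumes the unique-cause form of MC/DC; the decision `a and (b or c)` and the helper name `mcdc_covered_conditions` are hypothetical illustrations, not taken from DO-178C.

```python
from itertools import combinations

def decision(a: bool, b: bool, c: bool) -> bool:
    # Hypothetical decision with three conditions.
    return a and (b or c)

def mcdc_covered_conditions(func, tests):
    """Return the indices of conditions that have an independence pair.

    An independence pair (unique-cause MC/DC) is two test vectors that
    differ in exactly one condition and produce different outcomes.
    """
    covered = set()
    for t1, t2 in combinations(tests, 2):
        diff = [i for i, (x, y) in enumerate(zip(t1, t2)) if x != y]
        if len(diff) == 1 and func(*t1) != func(*t2):
            covered.add(diff[0])
    return covered

# Four vectors (n + 1 for n = 3 conditions) are enough for this decision.
tests = [
    (True, True, False),   # outcome True
    (False, True, False),  # outcome False; differs from vector 1 only in a
    (True, False, False),  # outcome False; differs from vector 1 only in b
    (True, False, True),   # outcome True; differs from vector 3 only in c
]
covered = mcdc_covered_conditions(decision, tests)
print(sorted(covered))
```

Dropping the last two vectors leaves only condition `a` covered, which is the kind of gap a structural coverage report would flag. Note that the check only says something about decisions that exist in the code; it is silent on behavior that was never written down as a decision.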

Example — DO-254 (Design Assurance Guidance for Airborne Electronic Hardware): Similar to DO-178C but for hardware — defining assurance activities for programmable logic (FPGAs, ASICs) used in airborne systems.

Strengths

Prescriptive standards provide clear, unambiguous compliance targets. Developers know exactly what they must do and produce. Certification authorities have objective criteria for approval. The process is mature, well-understood, and supported by decades of successful practice.

Limitations for Agentic Products

Prescriptive standards assume that behavior is fully specified and deterministic. DO-178C requires tracing every software requirement to test cases that verify it — which works when requirements specify behavior explicitly. For a learned perception model, what are the "requirements"? For a behavior tree that produces emergent behavior from composed sub-behaviors, how do you trace each emergent behavior to a requirement?

The coverage metrics (MC/DC, requirements coverage) measure how thoroughly the specified behavior is tested. They do not address unspecified behavior — which is precisely what agentic products exhibit.

Performance-Based Standards

A newer approach: the standard specifies what the system shall achieve — performance outcomes — without prescribing how to achieve them.

How They Work

The standard defines measurable performance targets. The product developer demonstrates that the product meets those targets. The certification authority evaluates the evidence — which may include testing, simulation, analysis, and field data.

Example — ISO 26262 (Automotive Functional Safety): Defines Automotive Safety Integrity Levels (ASIL A through D) based on severity, exposure, and controllability. Specifies safety goals and required processes but allows flexibility in how they are achieved. The developer defines the safety concept and demonstrates that the implementation achieves it.
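As a rough illustration of how severity, exposure, and controllability combine into an ASIL, the sketch below uses a commonly cited sum shortcut over the class numbers (S1 to S3, E1 to E4, C1 to C3). The function name `asil` is hypothetical, the S0/E0/C0 classes are deliberately left out, and the determination table in the standard itself remains the authoritative source.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Map S1-S3, E1-E4, C1-C3 class numbers to QM or ASIL A-D.

    Assumption: uses the sum shortcut (S + E + C) rather than a literal
    copy of the standard's table.
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4
            and 1 <= controllability <= 3):
        raise ValueError("expected S1-S3, E1-E4, C1-C3")
    total = severity + exposure + controllability
    # Sums of 6 or less fall back to QM (quality management only).
    return {7: "ASIL A", 8: "ASIL B", 9: "ASIL C", 10: "ASIL D"}.get(total, "QM")

print(asil(3, 4, 3))  # highest severity, exposure, and least controllable
print(asil(3, 4, 1))  # same hazard, but easily controllable
print(asil(1, 2, 2))  # low severity and exposure
```

The three calls print `ASIL D`, `ASIL B`, and `QM` respectively, showing how better controllability or lower exposure reduces the required integrity level.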

Example — ISO 21448 (SOTIF — Safety of the Intended Functionality): Specifically addresses the challenge that a system may behave exactly as designed and still cause harm — because the design does not handle all situations correctly. SOTIF defines a process for identifying and mitigating these "insufficiencies" in the intended functionality. This is directly relevant to agentic products, where "working as designed" may not mean "safe in all conditions."

Example: FDA Software as a Medical Device (SaMD). Classifies software based on the significance of the information it provides to healthcare decisions and the state of the healthcare situation. Increasingly accommodates AI/ML-based systems through Predetermined Change Control Plans (PCCPs), which define the boundaries within which a learning system can evolve without requiring a new marketing submission for each change.

Strengths

Performance-based standards accommodate diverse implementation approaches. A developer can use rules, learned models, or hybrid architectures — as long as the performance targets are met. This flexibility is essential for agentic products where prescriptive compliance is impractical.

Limitations for Agentic Products

Performance targets must be measurable, which means they must be tested. Testing agentic products faces the same coverage challenges described in Lesson 3 — you cannot test all scenarios. Statistical performance metrics (e.g., "the system shall correctly classify objects with 99.9% accuracy") require defining the test distribution, which may not match the operational distribution.
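A short calculation shows how much testing a statistical target implies. The sketch below assumes independent test samples drawn from the operational distribution and zero observed errors; the function name `failure_free_trials` is hypothetical. It solves (1 - p)^n <= 1 - confidence for the smallest n.

```python
import math

def failure_free_trials(max_error_rate: float, confidence: float) -> int:
    """Smallest n such that n error-free independent trials support the
    claim 'error rate < max_error_rate' at the given confidence level."""
    if not (0 < max_error_rate < 1 and 0 < confidence < 1):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error_rate))

# 99.9% accuracy means an error rate below 0.001.
n = failure_free_trials(0.001, 0.95)
print(n)  # 2995
```

Roughly 3,000 error-free samples are needed to support a 99.9% accuracy claim at 95% confidence, and the figure is only meaningful if the test samples resemble what the product will see in operation.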

Performance-based standards also face the question of what to measure. An autonomous vehicle that drives safely 99.99% of the time still causes harm in the 0.01%. Is that performance level acceptable? Who decides? These are not technical questions — they are societal questions that the standard alone cannot answer.

Adaptive Certification

An emerging approach: continuous monitoring replaces (or supplements) point-in-time certification.

How It Works

Instead of certifying a fixed product and assuming it remains compliant, adaptive certification establishes a continuous assurance process:

  1. Initial certification: The product passes an initial certification based on available evidence (testing, simulation, analysis).
  2. Operational monitoring: The product's behavior is continuously monitored during deployment. Runtime monitors (from Lesson 3) track safety-relevant properties.
  3. Learning governance: If the product learns and adapts, the learning process is governed — with defined bounds on what can change, validation gates before updates deploy, and monitoring after deployment.
  4. Continuous evidence: Operational data feeds back into the safety case, providing ongoing evidence that the product remains safe. The safety case is a living document updated with field evidence.
  5. Adaptive response: If monitoring detects degradation or new failure modes, the certification authority can require corrective action, restrict operational use, or suspend certification until the issue is resolved.
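The learning-governance and monitoring steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference design: the names `UpdateBounds`, `passes_validation_gate`, and `SafetyMonitor`, the accuracy metric, and all thresholds are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class UpdateBounds:
    # Hypothetical limits agreed with the authority at initial certification.
    min_accuracy: float
    max_accuracy_drop: float

def passes_validation_gate(current_acc: float, candidate_acc: float,
                           bounds: UpdateBounds) -> bool:
    """Step 3: allow a learned update only if it stays inside the bounds."""
    return (candidate_acc >= bounds.min_accuracy
            and current_acc - candidate_acc <= bounds.max_accuracy_drop)

class SafetyMonitor:
    """Steps 2 and 5: track a rolling violation rate and flag degradation."""

    def __init__(self, window: int, max_violation_rate: float):
        self.events = deque(maxlen=window)  # keeps only the latest samples
        self.max_violation_rate = max_violation_rate

    def record(self, violated: bool) -> str:
        self.events.append(violated)
        rate = sum(self.events) / len(self.events)
        return "restrict_operation" if rate > self.max_violation_rate else "ok"

bounds = UpdateBounds(min_accuracy=0.99, max_accuracy_drop=0.002)
print(passes_validation_gate(0.995, 0.994, bounds))  # small drop: allowed
print(passes_validation_gate(0.995, 0.991, bounds))  # drop too large: blocked

monitor = SafetyMonitor(window=5, max_violation_rate=0.2)
statuses = [monitor.record(v) for v in (False, False, False, False, True, True)]
print(statuses[-1])  # two violations in the last five samples
```

In a real deployment the monitor's output would feed the safety case (step 4) and trigger the agreed response (step 5). A production monitor would also need a minimum sample count before acting, since a single early violation dominates a nearly empty window.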

Strengths

Adaptive certification acknowledges the reality of agentic products: their behavior evolves, operational conditions are unpredictable, and pre-deployment verification is inherently incomplete. By building continuous assurance into the framework, it addresses the limitations that prescriptive and performance-based approaches cannot overcome.

Limitations

Adaptive certification is nascent. Regulatory authorities are exploring it, but few have implemented it at scale. The infrastructure requirements are significant: continuous monitoring, data pipelines, automated safety case updating, and real-time communication between deployed products and certification authorities.

It also raises accountability questions. If a product is continuously monitored and a failure occurs, who is responsible — the manufacturer who designed it, the operator who deployed it, the certification authority who approved it, or the monitoring system that failed to detect the problem?

The Gap

The fundamental gap in the current regulatory landscape is this: existing standards were designed for products whose behavior is fully specified at design time. Agentic products violate that assumption.

Closing this gap requires:

  • New safety argumentation approaches — moving from "we tested all specified behaviors" to "we have a comprehensive, multi-layered assurance case that provides sufficient confidence in operational safety"
  • Statistical safety evidence — accepting probabilistic safety claims ("the probability of failure is below X per operating hour") alongside deterministic ones ("the system always does Y in condition Z")
  • Continuous assurance — supplementing point-in-time certification with ongoing monitoring and evidence collection
  • Cross-domain collaboration — the automotive, aerospace, medical, and defense sectors are all facing this challenge. Solutions developed in one sector should inform others
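To see what a probabilistic claim of the form "failure rate below X per operating hour" demands as evidence, consider a simplified derivation. It assumes failures follow a Poisson process with a constant rate and that no failures are observed; the symbols \lambda_0 (target rate), T (failure-free operating hours), and C (confidence level) are introduced here for illustration.

```latex
P(\text{no failure in } T \mid \lambda) = e^{-\lambda T}
\quad\Longrightarrow\quad
e^{-\lambda_0 T} \le 1 - C
\quad\Longrightarrow\quad
T \ge \frac{-\ln(1 - C)}{\lambda_0}
```

For C = 0.95, -ln(0.05) is about 3, so T must be roughly 3/\lambda_0. A hypothetical target of 10^{-9} failures per hour would therefore need on the order of 3 × 10^9 failure-free operating hours. Under these assumptions, field data alone cannot carry a claim that strict, which is why statistical evidence has to sit inside a layered assurance case rather than replace it.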

Certification Approaches

Approach: Prescriptive standards
What it requires: Compliance with specific design rules, implementation constraints, and test procedures defined by the standard
Examples: DO-178C (airborne software) and DO-254 (airborne hardware), which define assurance activities, evidence, and coverage levels per criticality
Strength: Clear, unambiguous compliance targets; mature and well-understood process
Gap for agentic products: Assumes behavior is fully specified and deterministic; coverage metrics do not address emergent or learned behavior

Assessment

Why are prescriptive standards (like DO-178C) insufficient for certifying agentic products? (Select all that apply)

A medical robotics company is developing a surgical assistant that uses AI to identify and classify tissue in real time. The product must receive FDA clearance. The perception model is a deep neural network trained on surgical image data. Outline the certification strategy you would propose, addressing both what current regulatory frameworks offer and where you would need to make novel safety arguments.