Verification and Validation Agents
The V&V Problem That Agents Address
Verification and validation (V&V) is one of the most resource-intensive activities in engineering. The goal is straightforward: confirm that the system meets its requirements (verification) and that it fulfills its intended purpose (validation). The execution is anything but straightforward — determining what to test, generating test cases, running the tests, interpreting results, identifying gaps, and iterating until confidence is sufficient.
The challenge scales with system complexity. A simple component might have dozens of requirements and a handful of test cases. A complex system has thousands of requirements, intricate interdependencies, and a combinatorial explosion of test conditions. Engineers cannot manually generate all relevant test cases, cannot manually identify all coverage gaps, and cannot manually trace all the connections between requirements, design decisions, and verification evidence.
V&V agents address this by automating the reasoning that drives verification activities: what to test, how to test it, what the results mean, and where to focus next. Unlike automated test execution (which runs predefined tests), V&V agents reason about verification strategy — they decide what testing is needed, not just how to run it.
Automated Test Generation
Test generation is the most mature capability of V&V agents. Given a formal or semi-formal specification of requirements, the agent generates test cases designed to verify compliance.
From requirements to test cases. The agent parses requirements, whether structured text, formal specifications, or model-based requirements from the system model (DGTLENG 101), and generates test cases that exercise each requirement's conditions. A requirement such as "the system shall maintain temperature between 10 °C and 50 °C during all operational modes" yields test cases at the boundaries (10 °C and 50 °C), at nominal conditions, and at transitions between operational modes.
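As a minimal sketch of this step, assume the requirement has already been parsed into a numeric range. The `RangeRequirement` class, the requirement ID, the mode names, and the 0.5 °C offset are all hypothetical, chosen only for illustration. Boundary and nominal evaluation points could then be generated like this:

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RangeRequirement:
    """Hypothetical parsed form of a 'shall stay between low and high' requirement."""
    req_id: str
    low: float
    high: float
    modes: tuple


def boundary_test_cases(req, step=0.5):
    """Return (mode, temperature, within_limits) tuples at and around each bound."""
    points = [
        req.low - step,            # just outside the lower bound
        req.low,                   # on the lower bound
        req.low + step,            # just inside the lower bound
        (req.low + req.high) / 2,  # nominal
        req.high - step,           # just inside the upper bound
        req.high,                  # on the upper bound
        req.high + step,           # just outside the upper bound
    ]
    return [
        (mode, t, req.low <= t <= req.high)
        for mode, t in product(req.modes, points)
    ]


# Illustrative requirement; the ID and mode names are made up.
req = RangeRequirement("REQ-TEMP-001", 10.0, 50.0, ("startup", "cruise", "standby"))
cases = boundary_test_cases(req)
```

The sketch covers boundaries and nominal conditions per mode only; test cases for transitions between modes would need a model of the allowed mode sequences.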
Boundary value analysis at scale. For systems with many interacting parameters, the boundary conditions form a high-dimensional surface. An agent can systematically generate test points along this surface: corner cases, edge cases, and combined boundary conditions that manual test planning would miss. A system with 20 parameters, each having an upper and a lower bound, has 2^20 = 1,048,576 corner cases. The agent does not test all of them; it uses engineering risk assessment to prioritize, and in doing so it can surface corner cases that human planners are unlikely to consider.
Property-based test generation. Instead of testing specific input-output pairs, the agent generates tests that check invariant properties: "the output is always positive," "the response time never exceeds the threshold," "the system always returns to safe mode after a fault." The agent generates random or structured inputs and checks whether the property holds, searching for violations.
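The idea can be sketched with only the standard library. The system under test (`limit_heater_power`), its 120 W limit, and the input range are hypothetical stand-ins; a real agent would more likely drive a dedicated property-based testing tool and a real model or simulation.

```python
import random


def limit_heater_power(demand_w, max_w=120.0):
    """Toy system under test (hypothetical): clip heater demand to [0, max_w]."""
    return min(max(demand_w, 0.0), max_w)


def output_in_range(demand_w, output_w):
    """Invariant property: the commanded power always stays within 0..120 W."""
    return 0.0 <= output_w <= 120.0


def random_demand(rng):
    """Input generator: demands well outside the valid range are included on purpose."""
    return rng.uniform(-1000.0, 1000.0)


def find_violation(system, prop, gen_input, trials=1000, seed=0):
    """Return the first generated input that violates prop, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = gen_input(rng)
        if not prop(x, system(x)):
            return x
    return None


counterexample = find_violation(limit_heater_power, output_in_range, random_demand)
```

A result of `None` means only that no violation was found in the sampled inputs; it is not a proof that the property holds.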
What the agent cannot do: Generate tests for requirements it cannot parse. Vague, ambiguous, or contradictory requirements (DGTLENG 201) produce vague, irrelevant, or contradictory tests. The quality of the agent's test generation is bounded by the quality of the requirements it works from. This creates a feedback loop — the agent's inability to generate clear tests for a vague requirement is itself diagnostic of a requirements problem.
Model Checking and Consistency Analysis
Beyond generating tests for execution, V&V agents can perform static analysis on models — checking for properties that should always hold without running simulations.
Structural consistency. The agent traverses the system model checking: every requirement has at least one design allocation, every interface has matching ports on both sides, no circular dependencies exist in the hierarchy, all parametric constraints evaluate without errors, and naming conventions are followed. These are the engineering equivalent of linting rules for code, and they catch errors that accumulate silently in large models.
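Two of these checks can be sketched as follows, assuming the model has been exported to plain dictionaries. The data layout and the element names are hypothetical and do not reflect any particular modeling tool's format.

```python
def unallocated_requirements(requirements, allocations):
    """Return requirements that have no design element allocated to them."""
    return sorted(r for r in requirements if not allocations.get(r))


def find_cycle(children):
    """Return one cycle in the hierarchy as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {n: WHITE for n in children}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in children.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:             # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for n in list(children):
        if color[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None


# Illustrative model fragments (names are made up).
requirements = ["R1", "R2", "R3"]
allocations = {"R1": ["Pump"], "R2": []}
hierarchy = {"System": ["Pump", "Controller"], "Pump": ["Motor"], "Motor": ["Pump"]}

missing = unallocated_requirements(requirements, allocations)
cycle = find_cycle(hierarchy)
```

Here `missing` lists R2 and R3, and `cycle` reports the Pump and Motor loop. The recursive search is adequate for a sketch; a very deep hierarchy would call for an iterative version.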
Behavioral property checking. For models with behavioral specifications (state machines, activity diagrams, sequence diagrams), the agent can check properties like: every state is reachable, no deadlock states exist, all fault conditions have defined recovery paths, and timing constraints are satisfiable. This is formal verification applied to engineering models — exhaustive checking of all possible states rather than testing a sample.
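For a small finite state machine, reachability and deadlock checks reduce to a graph search. The sketch below assumes the state machine is given as a dictionary of `state -> {event: next_state}`; the states and events are hypothetical. Production model checkers use far more scalable techniques than this explicit enumeration.

```python
from collections import deque


def reachable_states(transitions, initial):
    """Breadth-first search over the state graph from the initial state."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_state_machine(transitions, initial, terminal=frozenset()):
    """Return (unreachable states, reachable deadlock states), both sorted."""
    states = set(transitions) | {s for t in transitions.values() for s in t.values()}
    reached = reachable_states(transitions, initial)
    unreachable = states - reached
    # A deadlock is a reachable state with no exits that is not an intended end state.
    deadlocks = {s for s in reached if not transitions.get(s) and s not in terminal}
    return sorted(unreachable), sorted(deadlocks)


# Illustrative state machine: "Fault" has no recovery transition.
machine = {
    "Off": {"power_on": "Standby"},
    "Standby": {"start": "Run"},
    "Run": {"fault": "Fault", "stop": "Standby"},
    "Fault": {},
    "Maintenance": {"done": "Off"},
}
unreachable, deadlocks = check_state_machine(machine, "Off")
```

In this example the check flags "Maintenance" as unreachable and "Fault" as a deadlock, which corresponds to a fault condition without a defined recovery path.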
Cross-discipline consistency. The agent checks that the structural model, the thermal model, and the electrical model agree on shared parameters — component dimensions, material properties, interface loads. Inconsistencies between discipline models are a chronic source of integration problems. An agent that continuously monitors cross-discipline consistency catches these before they propagate into detailed design.
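A simple version of this check compares every parameter that appears in more than one discipline model. The sketch assumes each model exposes its parameters as a flat dictionary with matching names and units; the parameter names, values, and the 0.1% tolerance are hypothetical.

```python
import math


def parameter_mismatches(models, rel_tol=1e-3):
    """Return shared parameters whose values disagree across discipline models."""
    by_param = {}
    for model_name, params in models.items():
        for name, value in params.items():
            by_param.setdefault(name, {})[model_name] = value

    mismatches = {}
    for name, values in by_param.items():
        if len(values) < 2:
            continue  # parameter is not shared, nothing to compare
        ref = next(iter(values.values()))
        if not all(math.isclose(v, ref, rel_tol=rel_tol) for v in values.values()):
            mismatches[name] = values
    return mismatches


# Illustrative discipline models (values are made up).
models = {
    "structural": {"bracket_mass_kg": 1.20, "panel_thickness_mm": 3.0},
    "thermal": {"panel_thickness_mm": 2.5, "conductivity_w_mk": 167.0},
    "electrical": {"bracket_mass_kg": 1.20},
}
mismatches = parameter_mismatches(models)
```

In practice the harder part is usually establishing that two parameters in different tools really refer to the same quantity in the same units; the comparison itself is the easy step.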
What the agent cannot do: Check properties that are not formalized. If the model does not capture a constraint, the agent cannot verify it. Model checking is only as complete as the model itself. The agent also cannot assess whether the model correctly represents the physical system — that validation requires physical testing and engineering judgment.
Coverage Analysis: Finding What You Have Not Tested
Perhaps the most valuable capability of V&V agents is identifying what has not been verified — the gaps in coverage that represent the highest risk.
Requirements coverage. The agent maps verification evidence (test results, analysis results, inspection records) to requirements. Requirements without adequate evidence are flagged. Requirements with evidence that is partial (some conditions tested but not all) are highlighted. The agent computes coverage metrics and identifies the requirements that are most under-tested relative to their criticality.
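The mapping can be sketched as follows, assuming each requirement lists the conditions it must be verified under and each piece of evidence names the requirement and condition it covers. The data layout, the integer criticality scale, and the IDs are hypothetical.

```python
def coverage_report(requirements, evidence):
    """Return (req_id, status, fraction_covered, criticality) tuples.

    requirements: {req_id: {"criticality": int, "conditions": set of str}}
    evidence: iterable of (req_id, condition) pairs backed by passing evidence
    """
    covered = {}
    for req_id, condition in evidence:
        covered.setdefault(req_id, set()).add(condition)

    report = []
    for req_id, spec in requirements.items():
        done = covered.get(req_id, set()) & spec["conditions"]
        fraction = len(done) / len(spec["conditions"])
        status = "none" if not done else "partial" if fraction < 1 else "full"
        report.append((req_id, status, fraction, spec["criticality"]))

    # Most critical first; within the same criticality, least covered first.
    report.sort(key=lambda r: (-r[3], r[2]))
    return report


# Illustrative data (IDs and conditions are made up).
requirements = {
    "R1": {"criticality": 3, "conditions": {"cold", "nominal", "hot"}},
    "R2": {"criticality": 1, "conditions": {"nominal"}},
    "R3": {"criticality": 3, "conditions": {"nominal", "hot"}},
}
evidence = [("R1", "nominal"), ("R2", "nominal")]
report = coverage_report(requirements, evidence)
```

The report puts R3 (critical, no evidence) first, then R1 (critical, partial evidence), then R2 (fully covered).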
Parameter space coverage. For requirements that depend on continuous parameters (temperature, load, speed), the agent assesses how well the tested conditions span the parameter space. A requirement tested only at room temperature is not verified at the temperature extremes. The agent identifies the regions of the parameter space where no evidence exists and recommends tests to fill the gaps.
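For a single continuous parameter, gap detection can be sketched as a scan over the sorted test points. The function name, the 5 °C maximum gap, and the sample values are hypothetical; a multi-parameter space would need a grid or a space-filling measure instead.

```python
def untested_intervals(required_range, tested_points, max_gap):
    """Return sub-intervals of required_range wider than max_gap with no test point."""
    low, high = required_range
    inside = sorted(p for p in tested_points if low <= p <= high)
    edges = [low] + inside + [high]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b - a > max_gap]


# A 10-50 degC requirement tested only near room temperature (illustrative values).
gaps = untested_intervals((10.0, 50.0), [20.0, 22.0, 25.0], max_gap=5.0)
```

Here `gaps` contains the intervals from 10 to 20 and from 25 to 50, which are the temperature extremes the paragraph describes as unverified.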
Scenario coverage. Complex systems operate under combinations of conditions — operating modes, environmental conditions, failure states, transition sequences. The agent generates a combinatorial model of the scenario space and maps existing tests to it. Untested combinations are identified and prioritized by risk.
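A full-factorial version of this mapping is sketched below. The dimensions and their values are hypothetical, and real scenario spaces are usually too large to enumerate completely, which is where risk-based prioritization or reduced combinatorial designs come in.

```python
from itertools import product


def untested_scenarios(dimensions, executed):
    """Return every combination of dimension values that has no executed test."""
    names = list(dimensions)
    all_scenarios = set(product(*(dimensions[n] for n in names)))
    return sorted(all_scenarios - set(executed))


# Illustrative scenario space: 3 modes x 2 environments x 2 failure states = 12.
dimensions = {
    "mode": ("startup", "nominal", "shutdown"),
    "environment": ("cold", "hot"),
    "failure": ("none", "sensor_loss"),
}
executed = [
    ("startup", "cold", "none"),
    ("nominal", "hot", "none"),
    ("nominal", "hot", "sensor_loss"),
]
untested = untested_scenarios(dimensions, executed)
scenario_coverage = 1 - len(untested) / 12
```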
What the agent cannot do: Determine whether coverage is sufficient. Coverage metrics tell you what percentage of conditions have been tested. Whether that percentage is adequate for the program's risk tolerance is an engineering and programmatic judgment. The agent provides the data; the engineer (and the program's risk framework) makes the sufficiency decision.
Risk-Informed Testing: The Agent Decides What to Test Next
The most advanced V&V agents do not just identify coverage gaps — they prioritize which gaps to fill based on risk, cost, and available resources.
Risk scoring. The agent combines multiple factors to score each untested condition: the severity of the associated failure mode (from the safety analysis), the likelihood that the design is marginal in that region (from simulation results or surrogate predictions), and the cost of testing (equipment, time, resource availability). High-severity, high-likelihood, low-cost conditions are tested first.
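One simple way to combine the three factors is a ratio of expected consequence to test cost. This formula is an assumption made for illustration, not a scoring method given in the text; the condition names and numbers are also hypothetical.

```python
def risk_score(severity, likelihood, cost):
    """Hypothetical score: severity times likelihood of a marginal design, per unit cost."""
    return severity * likelihood / cost


def rank_conditions(conditions):
    """Return condition IDs ordered from highest to lowest score."""
    scored = sorted(
        conditions,
        key=lambda c: risk_score(c["severity"], c["likelihood"], c["cost"]),
        reverse=True,
    )
    return [c["id"] for c in scored]


# Illustrative untested conditions (values are made up).
conditions = [
    {"id": "vibration", "severity": 5, "likelihood": 0.2, "cost": 8.0},
    {"id": "hot-soak", "severity": 5, "likelihood": 0.4, "cost": 2.0},
    {"id": "cold-start", "severity": 3, "likelihood": 0.1, "cost": 1.0},
]
ranking = rank_conditions(conditions)
```

With these numbers the hot-soak condition ranks first: it is high severity, relatively likely to be marginal, and cheap to test. A program might instead weight the factors or treat the highest severity class as mandatory regardless of cost.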
Adaptive test planning. As test results come in, the agent updates its risk scores and adjusts the test plan. A test that reveals marginal performance in one region triggers additional tests in that region. A test that shows large margin in another region reduces the priority of further testing there. The test plan evolves as information accumulates.
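A very simple update rule illustrates the idea: after each test, scale the likelihood estimates of untested conditions in the same region up or down depending on the measured margin. The thresholds, the scaling factor, and the condition names are hypothetical; a real agent might use a surrogate model or Bayesian update instead.

```python
def update_likelihoods(likelihoods, region_of, tested, margin,
                       low=0.1, high=0.5, factor=2.0):
    """Rescale likelihoods of other conditions in the tested condition's region."""
    if margin < low:
        scale = factor          # marginal result: raise concern in this region
    elif margin > high:
        scale = 1.0 / factor    # large margin: lower concern in this region
    else:
        scale = 1.0
    region = region_of[tested]
    return {
        cond: min(1.0, p * scale)
        if region_of[cond] == region and cond != tested else p
        for cond, p in likelihoods.items()
    }


# Illustrative conditions grouped by region (values are made up).
likelihoods = {"hot-45C": 0.2, "hot-50C": 0.3, "cold-10C": 0.2}
region_of = {"hot-45C": "hot", "hot-50C": "hot", "cold-10C": "cold"}

after_marginal = update_likelihoods(likelihoods, region_of, "hot-45C", margin=0.05)
after_large_margin = update_likelihoods(likelihoods, region_of, "hot-45C", margin=0.8)
```

The updated likelihoods would then feed back into the risk scores, so the next test selected reflects what the last test revealed.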
Resource-constrained scheduling. Given a fixed test budget (time, lab availability, personnel), the agent optimizes the test sequence to maximize the reduction in risk per unit of resource spent. This is a planning and optimization problem that the agent can solve — the constraints are quantifiable and the objective (maximize risk reduction) is definable.
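With a single budget dimension, this resembles a knapsack problem. The sketch below uses a greedy heuristic (highest risk reduction per unit cost first), which is easy to follow but not guaranteed to be optimal; the test IDs, risk reductions, and costs are hypothetical.

```python
def greedy_schedule(tests, budget):
    """Select tests by descending risk reduction per unit cost within the budget."""
    chosen, remaining = [], budget
    ordered = sorted(tests, key=lambda t: t["risk_reduction"] / t["cost"], reverse=True)
    for t in ordered:
        if t["cost"] <= remaining:
            chosen.append(t["id"])
            remaining -= t["cost"]
    return chosen, remaining


# Illustrative candidate tests; cost is in lab days (values are made up).
tests = [
    {"id": "T1", "risk_reduction": 8.0, "cost": 4.0},
    {"id": "T2", "risk_reduction": 3.0, "cost": 1.0},
    {"id": "T3", "risk_reduction": 5.0, "cost": 5.0},
    {"id": "T4", "risk_reduction": 2.0, "cost": 2.0},
]
schedule, unused = greedy_schedule(tests, budget=7.0)
```

Real schedules also involve precedence, shared equipment, and personnel constraints, which typically call for an integer programming or constraint solver rather than a greedy pass.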
What the agent cannot do: Make the final accept/reject decision. A V&V agent can report that 97% of requirements are verified, that the remaining 3% are low-risk, and that additional testing is not cost-effective. The decision to accept this risk and proceed to the next program phase is a human decision that involves programmatic, regulatory, and ethical considerations beyond the agent's scope.
Notice the pattern: agents excel at systematic, exhaustive analysis (coverage mapping, consistency checking, boundary generation) and fall short where engineering judgment is required (sufficiency decisions, physical fidelity assessment, risk acceptance). The agent does the work that is tedious at scale; the engineer makes the decisions that require context and accountability.
Assessment
A V&V agent identifies that 15 out of 200 requirements for a subsystem have no verification evidence. Of these 15, the agent's risk scoring ranks 3 as high-severity (associated with safety-critical failure modes) and 12 as low-severity (associated with operational convenience). Given a limited test budget, which responses are appropriate? (Select all that apply)
Consider a system you are familiar with that undergoes a formal verification process. Describe: (1) how verification activities are currently planned, including who decides what to test and how, (2) where the biggest coverage gaps typically appear and why they are missed by manual planning, (3) which V&V agent capabilities (test generation, model checking, coverage analysis, risk-informed prioritization) would provide the most value and why, and (4) what the V&V agent should NOT be trusted to do and where human judgment must remain in the loop.