DGTLENG 206 · Lesson 4 of 5

NLP for Engineering Documents

Engineering Text Is Not General Text

Engineering organizations run on documents: requirements specifications, design standards, test reports, maintenance logs, failure analyses, certification packages, interface control documents, safety assessments. These documents accumulate over decades and contain critical institutional knowledge — but they are written in a dialect that general-purpose natural language processing struggles with.

Engineering text has specific characteristics that distinguish it from the news articles, social media posts, and web pages that most NLP models are trained on.

Dense technical vocabulary. "The bracket shall withstand a limit load of 15,000 lbf at the lug interface without permanent deformation" contains terms (limit load, lug interface, permanent deformation) with precise engineering meanings that differ from common usage. "Limit" does not mean "boundary." "Permanent" does not mean "long-lasting." These terms have specific definitions in standards documents, and misinterpreting them changes the requirement.

Implicit structure. Requirements documents follow conventions that encode meaning in position, numbering, and formatting. Requirement 3.2.4.1 is a child of 3.2.4, which is a child of 3.2, which belongs to Section 3. This hierarchy is not just organizational — it reflects decomposition from system to subsystem to component requirements. NLP that ignores this structure loses the decomposition relationships that give individual requirements their context.
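
The decomposition encoded in the numbering can be recovered mechanically. The following is a minimal sketch, assuming requirement numbers are purely dotted integers as in the example above; the function names `parent_id` and `ancestors` are illustrative, not part of any particular tool.

```python
from typing import List, Optional


def parent_id(req_id: str) -> Optional[str]:
    """Return the parent of a dotted requirement number, or None at the top level."""
    head, sep, _ = req_id.rpartition(".")
    return head if sep else None


def ancestors(req_id: str) -> List[str]:
    """List every ancestor, from the immediate parent up to the section number."""
    chain = []
    current = parent_id(req_id)
    while current is not None:
        chain.append(current)
        current = parent_id(current)
    return chain


print(ancestors("3.2.4.1"))  # ['3.2.4', '3.2', '3']
```

Attaching the ancestor chain to each requirement before further processing is one way to keep the system-to-component context available to later steps.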

Cross-referencing. "Per MIL-STD-810H, Method 514.8" is a pointer to a specific vibration testing methodology defined in an external standard. Engineering text is dense with such references — to standards, to other sections, to external documents, to previous analyses. Understanding a requirement often requires resolving these references, which general NLP cannot do without domain-specific knowledge.
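
Detecting a reference is the easy half of the problem; resolving it needs a catalogue of the referenced documents. As a rough sketch of the detection half, assuming references follow the "MIL-STD-number, Method number" shape of the example above (the pattern and the function name `find_standard_refs` are illustrative and would miss most other citation styles):

```python
import re

# Illustrative pattern only: "MIL-STD-<digits><optional revision letter>",
# optionally followed by ", Method <number>".
STANDARD_REF = re.compile(
    r"\b(?P<standard>MIL-STD-\d+[A-Z]?)"
    r"(?:,\s*Method\s+(?P<method>\d+(?:\.\d+)?))?"
)


def find_standard_refs(text):
    """Return (standard, method) pairs; method is None when no method is cited."""
    return [(m.group("standard"), m.group("method")) for m in STANDARD_REF.finditer(text)]


text = "Vibration testing shall be performed per MIL-STD-810H, Method 514.8."
print(find_standard_refs(text))  # [('MIL-STD-810H', '514.8')]
```

Each extracted pair would then be looked up in the organization's standards library, which is where the domain-specific knowledge enters.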

Ambiguity with consequences. In casual text, ambiguity is normal and usually harmless. In engineering requirements, ambiguity is a defect. "The system shall operate at high temperatures" is ambiguous: what temperature? What does "operate" mean — full performance, or degraded mode? Who determines "high"? Ambiguous requirements propagate through design, analysis, and test — each downstream engineer interprets the ambiguity differently, and the inconsistencies surface only when the interpretations collide (usually during test, the most expensive time to discover a design problem).

In MBSE-driven organizations, the system model provides the structured backbone — requirements are allocated to components, interfaces are defined, verification methods are specified. But the natural language content within those model elements still matters. The requirement text is what engineers read, review, and implement. AI can bridge the gap between the model's structure and the document's content.

What NLP Enables for Engineering

NLP applied to engineering documents enables four primary capabilities, each with distinct value, distinct risks, and distinct roles for human oversight. The value is not in replacing engineers but in scaling their attention — processing thousands of pages to surface the items that require expert judgment.

Ambiguity Detection

Requirements ambiguity is one of the most persistent and costly problems in systems engineering. Commonly cited estimates attribute 30 to 50 percent of requirements defects to ambiguity in the original specification. Catching ambiguity during requirements review, rather than during integration test, can reduce rework costs by an order of magnitude.

NLP-based ambiguity detection identifies linguistic patterns associated with unclear requirements:

Vague adjectives and adverbs: "sufficient," "adequate," "reasonable," "high," "fast," "minimal" — words that convey judgment without specifying a measurable threshold
Passive voice without agents: "The data shall be processed" — by whom? By what? The passive construction hides the responsible entity
Unbounded lists: "including but not limited to" — this phrase makes the requirement untestable because the scope is undefined
Implicit assumptions: "The system shall resume normal operation after a fault" — what is "normal operation"? How quickly? What if the fault damaged hardware?
Missing units or ranges: "The temperature shall not exceed the rated value" — what rated value? Rated by whom? Where is it defined?

Rule-based NLP (pattern matching, part-of-speech tagging) catches many of these patterns reliably. Large language models add contextual understanding: recognizing that "high temperature" is ambiguous in a thermal requirement but not in a general description of operating environments. Combining rule-based screening (tuned for high recall, so that few potential ambiguities are missed) with LLM-based filtering (higher precision, fewer false positives) is more effective than either approach alone.
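
The two-stage arrangement can be sketched as a screen followed by a filter. This is a minimal illustration, not a production design: `stand_in_filter` is a hypothetical placeholder for an LLM judgment and simply treats a flagged term as ambiguous only inside a binding "shall" statement.

```python
import re

VAGUE_TERMS = ("sufficient", "adequate", "reasonable", "high", "fast", "minimal")
VAGUE_PATTERN = re.compile(r"\b(" + "|".join(VAGUE_TERMS) + r")\b", re.IGNORECASE)


def rule_screen(text):
    """Stage 1: flag every vague term, accepting false positives for the sake of recall."""
    return [m.group(1).lower() for m in VAGUE_PATTERN.finditer(text)]


def hybrid_review(texts, context_filter):
    """Stage 2: keep only the candidates that the context filter confirms."""
    confirmed = {}
    for text in texts:
        kept = [term for term in rule_screen(text) if context_filter(text, term)]
        if kept:
            confirmed[text] = kept
    return confirmed


def stand_in_filter(text, term):
    # Hypothetical placeholder for an LLM call: only "shall" statements are binding.
    return "shall" in text.lower()


texts = [
    "The unit shall operate at high temperatures.",
    "This section describes high temperature operating environments.",
]
print(hybrid_review(texts, stand_in_filter))
```

Both sentences pass the rule screen because both contain "high"; only the requirement survives the filter, which is the division of labor described above.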

Specification Extraction

Engineering documents contain structured information buried in prose. A test report might contain hundreds of measured values, pass/fail determinations, and test conditions scattered across tables, paragraphs, and figure captions. Extracting this information manually to populate a database or a model is tedious, error-prone, and a bottleneck in data-driven workflows.

NLP-based extraction identifies and structures specific information types from unstructured text:

Parameter extraction: identifying numerical values with their associated parameters, units, and conditions ("yield strength of 276 MPa at 25 degrees Celsius")
Requirement parsing: decomposing a natural language requirement into structured fields (subject, action, object, constraint, verification method)
Entity recognition: identifying references to components, systems, standards, tests, and personnel within engineering text
Relationship extraction: identifying connections between entities ("Component A interfaces with Component B through Interface C per ICD-1234")

The extracted information feeds into the system model — populating properties, creating traceability links, and connecting document content to model structure. This is where NLP directly supports MBSE: it converts the unstructured content that engineers write into the structured data that the model requires.
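
Parameter extraction in its simplest form is a search for numbers followed by known units. The sketch below assumes a small hand-written unit list; `UNITS` and `extract_quantities` are illustrative names, and a real extractor would also need to attach each value to its parameter name and conditions.

```python
import re

# Illustrative unit list; a real system would draw on a maintained unit vocabulary.
UNITS = ["MPa", "lbf", "Hz", "ms", "degrees Celsius"]
QUANTITY = re.compile(
    r"(?P<value>\d+(?:,\d{3})*(?:\.\d+)?)\s*"
    r"(?P<unit>" + "|".join(re.escape(u) for u in UNITS) + r")\b"
)


def extract_quantities(text):
    """Return (value, unit) pairs in order of appearance."""
    return [
        (float(m.group("value").replace(",", "")), m.group("unit"))
        for m in QUANTITY.finditer(text)
    ]


sentence = "The alloy has a yield strength of 276 MPa at 25 degrees Celsius."
print(extract_quantities(sentence))  # [(276.0, 'MPa'), (25.0, 'degrees Celsius')]
```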

Compliance Checking

Engineering products must comply with standards, regulations, and contractual requirements. Compliance verification involves checking that every applicable requirement from every applicable standard has been addressed in the design, analysis, or test documentation. This is a combinatorial problem: a single product might reference 50 standards containing thousands of requirements, each of which must be traced to evidence of compliance.

NLP assists compliance checking by:

Requirement matching: identifying which paragraphs in a standard are applicable to a given product based on its characteristics (operating environment, materials, criticality level)
Evidence mapping: linking compliance evidence (test reports, analysis results, design documents) to specific standard requirements
Gap detection: identifying standard requirements that have no linked evidence — compliance gaps that might otherwise be discovered during certification review
Change impact analysis: when a standard is revised, identifying which existing compliance evidence is affected and may need to be updated
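
Once requirement matching and evidence mapping have produced their links, gap detection reduces to finding requirements with nothing attached. A minimal sketch, assuming the links are held in a plain dictionary; all identifiers in the example data are hypothetical.

```python
def find_compliance_gaps(applicable_requirements, evidence_links):
    """Return the requirement IDs that have no linked evidence, in their original order."""
    return [req for req in applicable_requirements if not evidence_links.get(req)]


# Hypothetical requirement and report identifiers, for illustration only.
applicable = ["STD-A 4.1", "STD-A 4.2", "STD-B 7.3"]
evidence = {
    "STD-A 4.1": ["TR-001"],
    "STD-B 7.3": [],  # an entry exists, but no evidence has been attached
}
print(find_compliance_gaps(applicable, evidence))  # ['STD-A 4.2', 'STD-B 7.3']
```

The hard part in practice is producing trustworthy links in the first place; the gap list is only as good as the matching that feeds it.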

Compliance checking is a domain where AI assistance has high leverage: the task is rule-intensive, document-intensive, and error-prone when done manually. But the consequences of incorrect compliance determination are severe — a product declared compliant when it is not can fail in service, with regulatory and safety consequences. Human review of AI-assisted compliance determinations is not optional.

Report Generation

Engineering reports follow standardized structures: test reports, design reviews, trade studies, failure analyses. Much of the content is formulaic — describing the test setup, summarizing the configuration, listing the applicable requirements, tabulating results. NLP and LLM-based generation can draft these sections from structured data (model properties, test results, analysis outputs), producing a first draft that engineers then review, correct, and supplement with interpretation.

The value is not in replacing the engineering analysis — it is in reducing the hours spent formatting, cross-referencing, and writing boilerplate prose that conveys information already captured in the model and in tool outputs. An engineer who spends 40 hours writing a test report might spend 4 hours reviewing and correcting a generated draft — if the generation is good enough to be a useful starting point.
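
The formulaic part of a report can be drafted directly from structured results. The sketch below uses a fixed template rather than an LLM, and assumes every limit is an upper bound; the field names and requirement IDs are hypothetical.

```python
def draft_results_section(test_name, results):
    """Draft the tabulated-results section of a test report from structured data."""
    lines = [f"Results for {test_name}", ""]
    failed = []
    for row in results:
        # Assumption: each limit is a maximum, so a measured value above it fails.
        passed = row["measured"] <= row["limit"]
        if not passed:
            failed.append(row["requirement"])
        lines.append(
            f"{row['requirement']}: measured {row['measured']} {row['unit']} "
            f"against a limit of {row['limit']} {row['unit']} "
            f"({'PASS' if passed else 'FAIL'})"
        )
    lines.append("")
    lines.append(f"{len(results) - len(failed)} of {len(results)} requirements passed.")
    lines.append("Interpretation: [to be written by the responsible engineer]")
    return "\n".join(lines)


rows = [
    {"requirement": "REQ-A", "measured": 8, "limit": 10, "unit": "ms"},
    {"requirement": "REQ-B", "measured": 12, "limit": 10, "unit": "ms"},
]
print(draft_results_section("Latency test", rows))
```

The draft deliberately leaves the interpretation line empty, so the generated text covers only what the data already states.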

For each capability, notice the pattern: AI scales the engineer's attention, but the engineer retains responsibility for correctness. The human review is not a formality — it is the quality gate. The AI's contribution is converting a needle-in-a-haystack search (find the ambiguity in 2,000 requirements) into a focused review task (evaluate these 47 flagged items).

What LLMs Changed

Large language models (GPT-class, Claude-class) represent a step change in what NLP can do with engineering text. Traditional NLP required extensive task-specific training: a custom model for ambiguity detection, a different custom model for entity extraction, another for classification. Each required labeled training data from the engineering domain, which is expensive to create and difficult to maintain.

LLMs bring three capabilities that traditional NLP lacked:

Zero-shot and few-shot learning. An LLM can perform a task — classify a requirement, extract a parameter, summarize a test report — with only a natural language description of the task and a few examples. No custom training, no labeled dataset, no ML pipeline. This dramatically lowers the barrier to deploying NLP in engineering contexts where labeled data is scarce and the tasks are diverse.

Contextual reasoning. Traditional NLP operates on surface patterns — word frequencies, part-of-speech tags, syntactic structures. LLMs can reason about meaning in context. Given a requirement "the system shall operate in extreme environments," an LLM can explain why this is ambiguous (no definition of "extreme," no enumeration of environments, no performance specification for each environment) in a way that is useful for the engineer reviewing it. This contextual reasoning is particularly valuable for cross-referencing and compliance tasks where understanding the relationship between documents matters.

Natural language interface. Engineers can interact with NLP capabilities through natural language prompts rather than through programming interfaces. "Find all requirements in this specification that reference thermal performance but do not specify a temperature range" is a query that a traditional NLP system cannot parse but an LLM can answer — imperfectly, but usefully.

These capabilities have made engineering NLP accessible to organizations that could never have built custom NLP systems. But they come with a fundamental limitation that is more dangerous in engineering than in most other domains.

Limitations: Hallucination in Safety-Critical Contexts

LLMs generate text that is statistically plausible given the prompt and the model's training data. They do not verify facts, check references, or validate reasoning. When an LLM "extracts" a specification value from a document, it may be reading the document correctly — or it may be generating a plausible-looking value that does not appear in the document at all. This is hallucination, and it is the central risk of LLM deployment in engineering.

Why hallucination is uniquely dangerous in engineering:

In a customer service chatbot, a hallucinated fact might irritate a customer. In an engineering context, a hallucinated material property that enters the system model and feeds into a stress analysis can lead to a structural failure. The consequences scale with the criticality of the application — and engineering applications are frequently safety-critical.

Hallucination is not random. LLMs hallucinate more in specific circumstances:

When asked about specific numerical values (material properties, test results, dimensional tolerances) — the model generates plausible numbers rather than admitting it does not have the information
When the prompt requests information that is not present in the provided context — the model fills the gap with generated content rather than stating the gap exists
When asked to make inferences that require domain expertise the model does not reliably have — the model produces confident-sounding but incorrect reasoning

Mitigation strategies for engineering deployment:

Retrieval-augmented generation (RAG) provides the LLM with the actual source documents as context, reducing (but not eliminating) hallucination by grounding the response in retrieved text. The LLM answers based on what it reads, not what it remembers — but it can still misinterpret what it reads.
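
The retrieval and prompt-assembly steps can be sketched without any model call. This is a deliberately crude illustration: token overlap stands in for the embedding search a real system would likely use, and the source labels and passage text are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word and number tokens; a crude stand-in for embedding-based retrieval."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, k=2):
    """Return the k (source_id, text) pairs sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        passages.items(),
        key=lambda item: len(query_tokens & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(query, passages, k=2):
    """Assemble a prompt that carries the retrieved text and its source labels."""
    context = "\n".join(f"[{src}] {text}" for src, text in retrieve(query, passages, k))
    return (
        "Answer using only the passages below and cite the bracketed source.\n"
        "If the passages do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


passages = {
    "TR-12 sec 4.2": "Measured yield strength of the bracket material was 276 MPa.",
    "TR-12 sec 2.1": "The test fixture was bolted to the shaker table.",
}
question = "What is the yield strength of the bracket material?"
print(build_grounded_prompt(question, passages, k=1))
```

The instruction to state when the passages lack an answer targets the second hallucination trigger listed above, although the model may still ignore it.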

Citation requirements force the LLM to indicate which source document and section support each claim. If the model cannot cite a source, the claim is flagged for verification. This does not prevent hallucination but makes it detectable.
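
One way to make citations checkable is to require a verbatim quote alongside each cited section and confirm that the quote really appears there. A minimal sketch under that assumption; the claim fields (`statement`, `section`, `quote`) and the status labels are illustrative.

```python
import re


def normalize(text):
    """Collapse whitespace and ignore case so line breaks do not defeat the match."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def check_claims(claims, sections):
    """Label each claim 'quote_found', 'quote_not_found', or 'uncited'."""
    report = []
    for claim in claims:
        section = claim.get("section")
        quote = claim.get("quote")
        if not section or not quote:
            status = "uncited"
        elif normalize(quote) in normalize(sections.get(section, "")):
            status = "quote_found"
        else:
            status = "quote_not_found"
        report.append((claim["statement"], status))
    return report


sections = {"4.2": "Measured yield strength of the bracket\nmaterial was 276 MPa."}
claims = [
    {"statement": "Yield strength is 276 MPa", "section": "4.2",
     "quote": "bracket material was 276 MPa"},
    {"statement": "Yield strength is 310 MPa", "section": "4.2",
     "quote": "material was 310 MPa"},
    {"statement": "The bracket is titanium"},
]
for statement, status in check_claims(claims, sections):
    print(f"{status}: {statement}")
```

A status of `quote_found` shows only that the quoted text exists in the cited section; whether it actually supports the statement is still a question for the reviewer.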

Structured output validation checks LLM outputs against known constraints — extracted values must fall within physically possible ranges, referenced standards must exist, component names must match the model's component hierarchy. Any output that violates a known constraint is rejected.
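
A validator of this kind can be sketched as a set of independent checks that each report a violation. The numeric bounds below are illustrative assumptions, not engineering data, and the record fields and identifiers are hypothetical.

```python
# Illustrative bounds only; real limits would come from the organization's own data.
PLAUSIBLE_RANGES = {
    "yield_strength_MPa": (1.0, 5000.0),
    "temperature_C": (-273.15, 4000.0),
}


def validate_extraction(record, known_components, known_standards):
    """Return a list of constraint violations; an empty list means the record passes."""
    problems = []
    bounds = PLAUSIBLE_RANGES.get(record["parameter"])
    if bounds is None:
        problems.append(f"unknown parameter: {record['parameter']}")
    elif not bounds[0] <= record["value"] <= bounds[1]:
        problems.append(f"value {record['value']} outside plausible range {bounds}")
    if record["component"] not in known_components:
        problems.append(f"component not in model: {record['component']}")
    if record["standard"] not in known_standards:
        problems.append(f"standard not in library: {record['standard']}")
    return problems


components = {"BRACKET-01"}
standards = {"MIL-STD-810H"}
good = {"parameter": "yield_strength_MPa", "value": 276.0,
        "component": "BRACKET-01", "standard": "MIL-STD-810H"}
bad = {"parameter": "yield_strength_MPa", "value": 276000.0,
       "component": "BRACKET-99", "standard": "MIL-STD-810H"}
print(validate_extraction(good, components, standards))  # []
print(validate_extraction(bad, components, standards))
```

Passing these checks means only that a value is not obviously impossible; a plausible but wrong number still gets through, which is why the human review described next remains necessary.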

Human-in-the-loop verification remains the final safeguard. For safety-critical applications, every AI-generated claim that enters the engineering record must be verified by a qualified engineer against the source material. The AI saves time by drafting and extracting; the engineer ensures correctness by reviewing and approving.

The organizational principle: LLMs in engineering should be deployed as advisory tools that assist engineers, not as autonomous tools that produce trusted output without review. The level of review should be proportional to the criticality of the application — higher criticality demands more rigorous verification of AI outputs. This mirrors the advisory-before-autonomous principle from the deployment maturity framework in Lesson 5.

Assessment

An NLP tool flags the following requirement as ambiguous: 'The control system shall respond to operator commands within an acceptable timeframe.' Which aspects make this requirement ambiguous? (Select all that apply)

Implement a simplified requirements ambiguity detector in Python. Given a list of engineering requirements (as strings), scan each requirement for common ambiguity indicators: vague qualifiers (sufficient, adequate, reasonable, appropriate, high, low, fast, slow, minimal, excessive), passive voice without agents (shall be + past participle without a 'by' clause), and missing quantification (requirements containing 'shall' but no numbers, units, or measurable thresholds). Return a structured report listing each requirement, the ambiguity types found, and the specific words or patterns that triggered each flag. This simulates the rule-based layer of a hybrid ambiguity detection system.

python

import re

# Sample engineering requirements to analyze
requirements = [
  "REQ-001: The structure shall withstand a limit load of 15000 lbf without permanent deformation.",
  "REQ-002: The system shall respond to operator inputs in a reasonable timeframe.",
  "REQ-003: Adequate thermal protection shall be provided for all external surfaces.",
  "REQ-004: The data shall be transmitted to the ground station.",
  "REQ-005: The control software shall process sensor data at a rate of 100 Hz with latency not exceeding 10 ms.",
  "REQ-006: The system shall maintain sufficient accuracy during high-vibration environments.",
  "REQ-007: The battery shall provide a minimum of 8 hours of continuous operation at full rated power.",
  "REQ-008: The component shall be designed to minimize weight.",
]

VAGUE_QUALIFIERS = [
  "sufficient", "adequate", "reasonable", "appropriate",
  "high", "low", "fast", "slow", "minimal", "excessive",
  "normal", "typical", "acceptable", "suitable"
]

def check_vague_qualifiers(requirement_text):
  """
  Check for vague adjectives/adverbs that lack measurable definitions.
  Return a list of found vague terms.
  """
  # YOUR CODE: search for vague qualifiers in the text
  pass

def check_passive_voice(requirement_text):
  """
  Check for passive voice constructions without agents.
  Look for 'shall be [past participle]' without a following 'by' clause.
  Return a (detected, phrase) tuple: (True, matched phrase) if passive voice
  without an agent is detected, otherwise (False, None).
  """
  # YOUR CODE: use regex to find passive constructions
  pass

def check_missing_quantification(requirement_text):
  """
  Check if a requirement contains 'shall' but no numbers or units.
  Return True if the requirement appears to lack measurable criteria.
  """
  # YOUR CODE: check for presence of numbers, units, or thresholds
  pass

def analyze_requirements(requirements):
  """
  Analyze each requirement for ambiguity indicators.
  Return a list of dicts, each with:
    - 'id': requirement ID
    - 'text': requirement text
    - 'flags': list of {type, detail} ambiguity flags
    - 'risk_level': 'clear' if no flags, 'review' if 1 flag, 'rewrite' if 2+ flags
  """
  # YOUR CODE: run all checks on each requirement
  pass

# Run analysis and print report
# analyze_requirements returns None until it is implemented, so fall back to an empty list
results = analyze_requirements(requirements) or []
for r in results:
  status = r['risk_level'].upper()
  print(f"[{status}] {r['id']}")
  if r['flags']:
    for flag in r['flags']:
      print(f"  - {flag['type']}: {flag['detail']}")
  else:
    print("  No ambiguity indicators found.")
  print()
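
For comparison after attempting the exercise, here is one possible solution sketch, not the only valid one. It makes several assumptions: participles are detected by an "-ed" or "-en" ending (irregular forms are missed), "missing quantification" means no digit anywhere in the text, the requirement ID is separated from the text by ": ", and `check_passive_voice` returns a (detected, phrase) pair, which is one reading of the docstring. The flag type names are arbitrary.

```python
import re

VAGUE_QUALIFIERS = [
    "sufficient", "adequate", "reasonable", "appropriate",
    "high", "low", "fast", "slow", "minimal", "excessive",
    "normal", "typical", "acceptable", "suitable",
]
VAGUE_RE = re.compile(r"\b(" + "|".join(VAGUE_QUALIFIERS) + r")\b", re.IGNORECASE)
# Heuristic: 'shall be' followed by a word ending in -ed or -en.
PASSIVE_RE = re.compile(r"\bshall be (\w+(?:ed|en))\b", re.IGNORECASE)


def check_vague_qualifiers(requirement_text):
    """Return the vague terms found, lowercased, in order of appearance."""
    return [m.group(1).lower() for m in VAGUE_RE.finditer(requirement_text)]


def check_passive_voice(requirement_text):
    """Return (True, phrase) for 'shall be <participle>' with no later 'by', else (False, None)."""
    match = PASSIVE_RE.search(requirement_text)
    rest = requirement_text[match.end():] if match else ""
    if match and not re.search(r"\bby\b", rest, re.IGNORECASE):
        return True, match.group(0)
    return False, None


def check_missing_quantification(requirement_text):
    """Return True if the text contains 'shall' but no digit at all."""
    has_shall = re.search(r"\bshall\b", requirement_text, re.IGNORECASE) is not None
    return has_shall and re.search(r"\d", requirement_text) is None


def analyze_requirements(requirements):
    """Run all three checks on each 'ID: text' entry and assign a risk level."""
    results = []
    for entry in requirements:
        # Split off the ID first so the digits in 'REQ-002' do not count as quantification.
        req_id, sep, text = entry.partition(": ")
        if not sep:
            req_id, text = "", entry
        flags = []
        vague = check_vague_qualifiers(text)
        if vague:
            flags.append({"type": "vague_qualifier", "detail": ", ".join(vague)})
        passive, phrase = check_passive_voice(text)
        if passive:
            flags.append({"type": "passive_without_agent", "detail": phrase})
        if check_missing_quantification(text):
            flags.append({"type": "missing_quantification", "detail": "no numeric value found"})
        risk = "clear" if not flags else "review" if len(flags) == 1 else "rewrite"
        results.append({"id": req_id, "text": text, "flags": flags, "risk_level": risk})
    return results


sample = [
    "REQ-001: The structure shall withstand a limit load of 15000 lbf without permanent deformation.",
    "REQ-004: The data shall be transmitted to the ground station.",
    "REQ-006: The system shall maintain sufficient accuracy during high-vibration environments.",
]
for r in analyze_requirements(sample):
    print(r["id"], r["risk_level"], [f["type"] for f in r["flags"]])
```

Because the checks are purely lexical, this layer over-flags by design. REQ-008 in the exercise data ("shall be designed to minimize weight") is caught as passive and unquantified even though a reviewer might accept it as a design goal, which is the kind of case the LLM-based filtering stage is meant to sort out.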