The Engineering Data Problem
The Volume Is Not the Problem
Engineering organizations produce staggering amounts of data. A single aircraft program generates millions of requirements, thousands of CAD models, terabytes of simulation results, test data from hundreds of test events, manufacturing specifications for tens of thousands of parts, supplier qualification packages, operational telemetry from fielded assets, and maintenance records spanning decades. A semiconductor fab produces billions of metrology data points per week. An offshore platform generates continuous sensor streams from thousands of instruments.
None of that is the problem.
The problem is that this data lives in disconnected tools, incompatible formats, and organizational silos that were never designed to talk to each other. Requirements sit in DOORS. CAD models live in Teamcenter. Simulation results are on a shared drive somewhere. Test data is in a lab database that three people know how to query. Manufacturing specs are in an ERP system. Supplier data arrives as PDFs in email. Operational telemetry flows into a historian that the design team has never accessed.
Each of these tools is perfectly capable within its domain. The fragmentation is between domains. And that fragmentation is where engineering value is destroyed.
What Fragmentation Actually Costs
The direct costs are well-documented but routinely underestimated.
Duplicated effort. When an engineer cannot find existing analysis, they redo it. Studies in aerospace and automotive consistently show that 30 to 40 percent of engineering time is spent searching for, translating, or recreating information that already exists somewhere in the organization. That is not an individual efficiency problem; it is structural waste built into the data architecture.
Inconsistent information. When the same data exists in multiple places with no authoritative source, versions diverge. The stress analyst uses Rev C of the load case. The thermal analyst uses Rev B. The structural designer is working from a CAD model that reflects Rev D geometry. Nobody discovers the inconsistency until integration testing, where the cost of correction is an order of magnitude higher than at design time.
Decisions based on stale data. A change is approved and released in the PLM system, but the downstream simulation model still uses the old parameters because the connection is manual. The simulation passes. The hardware fails. The root cause is not a simulation error — it is a data synchronization failure.
Integration failures discovered late. When data flows are manual, the first time all the pieces come together is during integration and test. Every data handoff that depends on a human copying a value from one tool to another is a defect injection point. The more handoffs in the chain, the more likely the integrated result will contain errors that trace back to broken data flow, not bad engineering.
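The compounding effect of manual handoffs can be made concrete with a small calculation. The sketch below assumes that each handoff fails independently with the same probability, which is a simplification, and the 2 percent error rate is a hypothetical figure chosen only for illustration, not a measured value.

```python
def chance_of_any_error(per_handoff_error_rate: float, handoffs: int) -> float:
    """Probability that at least one of `handoffs` manual transfers goes wrong.

    Assumes every handoff fails independently with the same probability.
    """
    if not 0.0 <= per_handoff_error_rate <= 1.0:
        raise ValueError("per_handoff_error_rate must be between 0 and 1")
    if handoffs < 0:
        raise ValueError("handoffs must be non-negative")
    # P(no error anywhere) = (1 - p) ** n, so P(at least one error) is the complement.
    return 1.0 - (1.0 - per_handoff_error_rate) ** handoffs


# Hypothetical 2% error rate per manual handoff (illustrative assumption).
for n in (1, 5, 10, 20):
    print(f"{n:>2} handoffs: {chance_of_any_error(0.02, n):.1%} chance of at least one error")
```

Under these assumptions, a chain of 20 handoffs has roughly a one-in-three chance of carrying at least one transcription error into integration, even though each individual handoff looks reliable.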
The Root Cause: Data as a Byproduct, Not an Asset
Most engineering organizations treat data as a byproduct of work. An engineer creates a CAD model to define geometry. The data in that model — parameters, constraints, material assignments, tolerances — is incidental to the engineer's primary goal. It exists because the tool requires it, not because the organization manages it.
This mindset produces tools that optimize for individual productivity but ignore the data relationships that would make the organization productive. A CAD tool that produces excellent geometry but stores it in a proprietary format that only that tool can read has optimized the wrong thing. The geometry is valuable. The inability to connect that geometry to the requirements it satisfies, the analyses that validate it, and the manufacturing processes that produce it destroys value downstream.
The shift that digital engineering demands is treating data as a first-class engineering product — structured, governed, versioned, and connected — not as a side effect of using tools.
Structured vs. Unstructured Data
This distinction is concrete and consequential.
Unstructured data is information locked in documents. A requirement written as a paragraph in a Word file. A trade study captured in a PowerPoint deck. A test report as a PDF. The information is there, but it is accessible only to a human reader. You cannot query it, filter it, trace it, or automate anything with it. To find all requirements allocated to the thermal subsystem that lack a verification method, someone must read every page.
Structured data is information stored with typed attributes in a queryable format. A requirement as a record in a database with fields for ID, shall statement, performance threshold, verification method, rationale, allocation, status, parent requirement, and linked design elements. The same information, but now accessible to computation. That query about unverified thermal requirements takes seconds, not weeks.
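As a minimal sketch of what "accessible to computation" means, the example below models a requirement as a typed record and answers the query about thermal requirements that lack a verification method. The field names cover only a subset of those listed above, and the record IDs and shall statements are hypothetical placeholders, not taken from any real program or tool schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Requirement:
    req_id: str
    shall_statement: str
    allocation: str                             # subsystem the requirement is allocated to
    verification_method: Optional[str] = None   # e.g. "test" or "analysis"; None if unassigned
    status: str = "draft"
    parent_id: Optional[str] = None
    linked_design_elements: list[str] = field(default_factory=list)


def unverified(requirements: list[Requirement], subsystem: str) -> list[Requirement]:
    """Return requirements allocated to `subsystem` that have no verification method."""
    return [
        r for r in requirements
        if r.allocation == subsystem and not r.verification_method
    ]


# Hypothetical records for illustration only.
requirements = [
    Requirement("THM-001", "The system shall ... (placeholder)", "thermal", "test"),
    Requirement("THM-002", "The system shall ... (placeholder)", "thermal"),
    Requirement("STR-001", "The system shall ... (placeholder)", "structures"),
]

for req in unverified(requirements, "thermal"):
    print(req.req_id)
```

The same filter over a folder of Word documents would require a person to read each one. Here the question is a one-line expression because the allocation and verification method are typed fields rather than prose.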
The difference is not cosmetic. It determines whether the organization can form the "logical and numerical associations across information streams" that the digital engineering definition requires. You cannot associate what you cannot query. You cannot query what is not structured. And structuring retroactively what was created as prose is slow, manual, and error-prone.
Why This Matters for Digital Engineering
The DE definition speaks of forming associations across information streams using science, computation, data infrastructure, and AI. Every word in that definition assumes that data is accessible, structured, and connected. If your requirements live in Word documents, your CAD lives in disconnected vaults, your simulation results live on shared drives, and your test data lives in spreadsheets — you do not have information streams. You have information puddles.
The gap between where most organizations are and where digital engineering needs them to be is not a tools problem. It is a data architecture problem. Buying better tools does not help if those tools create more disconnected puddles. The work of digital engineering begins with connecting the streams.
Engineering Data Architecture: Disconnected Silos vs. Integrated Fabric
In the disconnected model, every boundary between tools is a potential failure point. In the integrated model, those boundaries still exist, but they are bridged by data connections that maintain consistency and traceability. The tools are not replaced; they are connected.
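One way to picture the data connections in the integrated model is as explicit trace links between artifacts that remain in their own tools. The sketch below is an assumption-laden illustration, not a description of any particular product: the artifact IDs and the `DOWNSTREAM` link table are hypothetical, and a real integration would read these links from the tools themselves. Given a changed artifact, it walks the links to list everything that needs review.

```python
from collections import deque

# Hypothetical trace links: each artifact maps to the artifacts derived from it.
DOWNSTREAM = {
    "REQ-LOAD-12": ["CAD-WINGBOX", "SIM-STRESS-07"],
    "CAD-WINGBOX": ["SIM-STRESS-07", "SIM-THERMAL-03"],
    "SIM-STRESS-07": ["TEST-STATIC-02"],
    "SIM-THERMAL-03": [],
    "TEST-STATIC-02": [],
}


def impacted_by(changed: str, downstream: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over trace links.

    Returns every artifact reachable downstream of `changed`, in discovery order.
    """
    visited = {changed}
    impacted = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in visited:
                visited.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


print(impacted_by("REQ-LOAD-12", DOWNSTREAM))
```

In the disconnected model this list exists only in people's heads. With links recorded as data, an approved change to a load requirement can flag the affected CAD model, simulations, and test for review without anyone having to remember the dependency.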
Assessment
An organization stores requirements in DOORS, CAD in Teamcenter, simulation results on a shared drive, and test data in a lab database. A change to a structural load requirement is approved in DOORS. Which of the following are likely consequences of this disconnected architecture? (Select all that apply)
Identify three specific data handoffs in your organization (or one you are familiar with) where information moves from one tool, team, or lifecycle phase to another. For each handoff, describe: (1) what data moves, (2) how it moves today (manual export, email, copy-paste, API, etc.), (3) what breaks or degrades during the transfer, and (4) what the downstream consequence is when the handoff fails. Then propose one change to the highest-risk handoff that would reduce fragmentation.