DGTLENG 206 · Lesson 1 of 5

Framing Engineering Problems for AI

Not Every Problem Needs AI

The first skill in applied AI is not building models. It is deciding whether to build one at all. AI is a powerful tool with real costs — data acquisition, model development, validation, maintenance, and the risk of incorrect predictions in safety-critical contexts. The engineering value of AI comes not from applying it everywhere but from applying it where three conditions are met simultaneously.

Is there enough data? Machine learning learns from examples. If you have five test results, you do not have enough data to train a regression model, no matter how sophisticated the architecture. Engineering data is expensive — each data point may represent a physical test, a high-fidelity simulation, or months of operational monitoring. The question is not "do I have data?" but "do I have enough data to learn the pattern I care about, within the domain I care about?"
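A learning curve is one practical way to ask whether a dataset is big enough. You fit the same model on progressively larger training sets and watch the error on held-out points. The sketch below is a toy that rests on several assumptions. `true_response` is a hypothetical smooth response standing in for a test or simulation output. The noise level and the cubic-polynomial model are illustrative choices. The held-out points come from the same input range as the training data, so the curve says nothing about extrapolation.

```python
import numpy as np

def true_response(x):
    # Hypothetical smooth engineering response (stand-in for a test or simulation output)
    return 1.0 + 0.5 * x - 0.3 * x**2 + 0.05 * x**3

def learning_curve(sizes, noise=0.05, n_test=500, degree=3, seed=0):
    """Return the held-out RMSE of a polynomial fit for each training-set size."""
    rng = np.random.default_rng(seed)
    x_test = rng.uniform(0.0, 4.0, n_test)
    y_test = true_response(x_test)  # noise-free targets, so RMSE measures model error
    errors = []
    for n in sizes:
        # Noisy "measurements" drawn from the same input domain as the test points
        x_train = rng.uniform(0.0, 4.0, n)
        y_train = true_response(x_train) + rng.normal(0.0, noise, n)
        coeffs = np.polyfit(x_train, y_train, degree)
        y_pred = np.polyval(coeffs, x_test)
        errors.append(float(np.sqrt(np.mean((y_pred - y_test) ** 2))))
    return errors

sizes = [5, 10, 20, 50, 200]
for n, err in zip(sizes, learning_curve(sizes)):
    print(f"n={n:4d}  held-out RMSE={err:.4f}")
```

Suppose held-out error is still falling steeply at the largest dataset you can afford. In that case, the model has probably not yet seen enough data. Where the curve flattens, more data buys little for this model within this domain.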

In MBSE-driven organizations, the system model defines what data exists and where it lives. Model data (component properties, interface definitions, requirement allocations), simulation data (stress fields, thermal profiles, performance curves), test data (measured vs. predicted), and operational data (telemetry, maintenance logs) are all structured and traceable. This structured data is the foundation that makes AI feasible in engineering — without it, the data wrangling cost dominates the entire effort.

Is the pattern learnable? Some engineering relationships are smooth, continuous, and well-behaved. Material yield strength varies predictably with temperature. Drag varies smoothly with angle of attack in the pre-stall regime. These patterns are learnable from modest datasets. Other relationships involve discontinuities, phase transitions, or chaotic dynamics. A composite laminate can fail catastrophically with little warning, and turbulent flow transitions are sensitive to initial conditions. Learnable patterns have structure that a model can exploit. Purely random processes have no such structure, and chaotic processes retain little of it beyond short prediction horizons.
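One quick way to see the difference between a learnable and an unlearnable relationship is to fit the same simple model to each and compare the error on fresh samples. In this sketch, `smooth_response` is a hypothetical parabolic drag curve with illustrative coefficients, not real aerodynamic data. `chaotic_response` is the logistic map iterated 50 times from an initial condition. It is a textbook chaotic system, used here as a stand-in for sensitivity to initial conditions. Both are fitted with a degree-4 polynomial using NumPy's `polyfit`.

```python
import numpy as np

def smooth_response(alpha_deg):
    # Hypothetical pre-stall drag coefficient: parabolic in angle of attack (toy numbers)
    return 0.02 + 0.0008 * alpha_deg**2

def chaotic_response(x0, steps=50, r=4.0):
    # Logistic map iterated from initial condition x0: deterministic, but chaotic for r = 4
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

def holdout_rmse(func, lo, hi, degree=4, n=200, seed=0):
    """Fit a polynomial to random samples of func and report RMSE on fresh samples."""
    rng = np.random.default_rng(seed)
    x_train, x_test = rng.uniform(lo, hi, n), rng.uniform(lo, hi, n)
    coeffs = np.polyfit(x_train, func(x_train), degree)
    residual = np.polyval(coeffs, x_test) - func(x_test)
    return float(np.sqrt(np.mean(residual**2)))

print("smooth  RMSE:", holdout_rmse(smooth_response, 0.0, 10.0))
print("chaotic RMSE:", holdout_rmse(chaotic_response, 0.01, 0.99))
```

The polynomial reproduces the smooth curve to rounding error. For the chaotic map, the held-out error stays close to the spread of the outputs themselves. Tiny changes in the initial condition scramble the output, so neighbouring samples carry almost no information about each other.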

Is the cost of being wrong acceptable? A surrogate model that mispredicts drag by 5% during a design screening study wastes a few hours of follow-up analysis. A surrogate model that mispredicts failure stress by 5% on a flight-critical structure can kill people. The consequences of error determine how much validation is required and whether AI augmentation is appropriate at all. Advisory applications (AI suggests, human decides) tolerate higher error rates than autonomous applications (AI acts without human review).

The Four Problem Types

Engineering problems map onto four fundamental ML problem types. The mapping is not always one-to-one — a single engineering workflow may involve multiple problem types in sequence. But recognizing which type you are dealing with determines the methods, the data requirements, and the validation strategy.

Classification

Classification assigns discrete labels to inputs. The model sees a set of features and predicts a category.

In engineering, classification problems appear throughout the lifecycle. Pass/fail determination: given a set of test measurements, does the component meet its specification? This is binary classification, where the features are the measurements and the label is the verdict. Fault diagnosis: given a pattern of sensor readings from an operating system, which fault category does the pattern indicate? This is multi-class classification, where the features are sensor signatures and the classes are known fault types. Quality inspection: given an image of a manufactured part, does it contain a defect? This is image classification, often performed with convolutional neural networks.
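Fault diagnosis can be sketched as multi-class classification with a deliberately simple nearest-centroid classifier. The classifier learns the average sensor signature of each fault class, then assigns a new reading to the closest one. Everything here is hypothetical: the three fault classes, the two features (vibration RMS and bearing temperature rise), their values, and the synthetic training data. A production system would use a richer model trained on real labelled data.

```python
import numpy as np

# Hypothetical fault taxonomy and mean sensor signatures: [vibration RMS (g), bearing temp rise (K)]
SIGNATURES = {
    "nominal": np.array([0.2, 5.0]),
    "imbalance": np.array([1.5, 8.0]),
    "bearing_wear": np.array([0.8, 25.0]),
}

def make_training_data(n_per_class=50, seed=0):
    """Generate synthetic labelled sensor readings scattered around each signature."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, mean in SIGNATURES.items():
        X.append(mean + rng.normal(0.0, [0.1, 2.0], size=(n_per_class, 2)))
        y += [label] * n_per_class
    return np.vstack(X), np.array(y)

def fit_centroids(X, y):
    # Nearest-centroid classifier: one mean feature vector per fault class
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def predict(centroids, X, scale):
    """Assign each reading to the class with the nearest centroid (features divided by scale)."""
    labels = list(centroids)
    C = np.array([centroids[k] for k in labels]) / scale
    d = np.linalg.norm(X[:, None, :] / scale - C[None, :, :], axis=2)
    return np.array(labels)[d.argmin(axis=1)]

X, y = make_training_data()
scale = X.std(axis=0)  # put both sensors on comparable scales before measuring distance
centroids = fit_centroids(X, y)
print(predict(centroids, np.array([[1.4, 9.0], [0.3, 24.0]]), scale))
```

Dividing by each feature's standard deviation matters here. Without it, the temperature channel, whose numbers are much larger, would dominate the distance.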

A critical issue for classification in engineering is class imbalance. Failures are rare. In a dataset of 10,000 operational cycles, perhaps 50 are fault events. A model that predicts "no fault" every time achieves 99.5% accuracy while being completely useless. Other metrics matter more than raw accuracy:

- Precision: the fraction of flagged events that are real faults.
- Recall: the fraction of real faults that are flagged.
- F1 score: the harmonic mean of precision and recall.

In safety-critical applications, the cost of a false negative (a missed fault) far exceeds the cost of a false positive (an unnecessary inspection).
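The arithmetic behind the 99.5% figure can be checked directly. This sketch computes accuracy, precision, recall, and F1 by hand for the always-"no fault" model on the 10,000-cycle example, with faults labelled 1. When a model flags nothing, precision is undefined; this sketch reports it as 0 by convention.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem, treating `positive` as the fault class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # faults correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed faults
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # undefined when nothing is flagged; report 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# 10,000 operational cycles, 50 of them fault events (label 1)
y_true = [1] * 50 + [0] * 9950
always_no_fault = [0] * len(y_true)
print(classification_metrics(y_true, always_no_fault))
# → {'accuracy': 0.995, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

Recall of zero exposes the problem immediately: every one of the 50 faults is missed.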

The connection to MBSE: the system model defines the fault taxonomy (what failure modes exist), the sensor architecture (what measurements are available), and the requirements that define pass/fail thresholds. Classification models operate on data generated within this model-defined framework.

Regression

Regression predicts continuous numerical outputs from inputs. It is the ML analog of curve fitting, extended to high-dimensional spaces.

Engineering regression problems include property prediction: predicting material properties (yield strength, fatigue life, thermal conductivity) from composition and processing parameters. Performance estimation: predicting system-level performance (range, efficiency, capacity) from design parameters. Surrogate modeling: predicting simulation outputs (stress, temperature, displacement fields) from simulation inputs, replacing expensive solvers with fast approximations.

Surrogate models deserve special emphasis because they sit at the intersection of MBSE and AI. The system model defines the design parameters. Simulations — configured from model data — generate the training data. The surrogate learns the simulation's input-output mapping. Once trained, the surrogate replaces the simulator for applications that require thousands of evaluations: optimization, uncertainty quantification, real-time digital twin predictions. The surrogate's validity is bounded by the design space defined in the model — and extrapolation outside that space is the primary risk.
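The following minimal surrogate workflow uses a toy "simulation" in place of a real solver. `expensive_simulation` is a hypothetical closed-form plate-stress formula. The thickness bounds stand in for a design space defined in the system model. The surrogate is a cubic polynomial fitted with NumPy. The `predict` function refuses inputs outside the training bounds, which is one simple way to guard against extrapolation.

```python
import numpy as np

def expensive_simulation(thickness_mm):
    # Hypothetical stand-in for a costly solver: peak plate stress (MPa) at a fixed load
    return 480.0 / thickness_mm**2

# Design-space bounds, as a (hypothetical) system model might define them
T_MIN, T_MAX = 2.0, 6.0

def train_surrogate(n_runs=15, degree=3):
    """Run the 'simulation' at evenly spaced design points and fit a polynomial surrogate."""
    t_train = np.linspace(T_MIN, T_MAX, n_runs)
    return np.polyfit(t_train, expensive_simulation(t_train), degree)

def predict(coeffs, thickness_mm):
    """Evaluate the surrogate, refusing inputs outside the training design space."""
    t = np.asarray(thickness_mm, dtype=float)
    if np.any((t < T_MIN) | (t > T_MAX)):
        raise ValueError("thickness outside the surrogate's validated design space")
    return np.polyval(coeffs, t)

coeffs = train_surrogate()
t_check = np.linspace(T_MIN, T_MAX, 101)
in_range_err = np.max(np.abs(predict(coeffs, t_check) - expensive_simulation(t_check)))
# Raw polynomial evaluation at t = 1 mm, bypassing the guard, to show what extrapolation does
extrap_err = abs(np.polyval(coeffs, 1.0) - expensive_simulation(1.0))
print(f"max in-range error: {in_range_err:.2f} MPa, error at 1 mm: {extrap_err:.2f} MPa")
```

Inside the bounds, the cubic tracks the "solver" to within a few MPa. Outside them, the same polynomial is off by far more than its in-range error, and nothing in its output signals a problem.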

Clustering

Clustering discovers natural groupings in data without predefined labels. It is unsupervised — the algorithm finds structure, not answers.

Engineering clustering applications include design space exploration: grouping thousands of candidate designs by similarity to identify distinct design families rather than examining each individually. Anomaly detection: identifying data points that do not belong to any normal cluster — potential faults, unusual operating conditions, or data quality issues. Operational profiling: grouping operational duty cycles to discover that a fleet of vehicles operates in three distinct usage patterns, each requiring different maintenance strategies.
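Operational profiling and anomaly detection can share one pipeline. First, cluster historical duty cycles. Then flag new cycles that sit far from every cluster centre. The sketch implements plain k-means in NumPy: Lloyd's algorithm, k-means++ seeding, and several restarts. The two features, the three usage patterns, and the 99th-percentile distance threshold are hypothetical choices for illustration. The features are assumed to be on comparable scales already.

```python
import numpy as np

def nearest(X, centroids):
    """Index of the closest centroid for each row of X."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the restart with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++ seeding: pick each new centroid with probability proportional to squared distance
        centroids = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
            centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centroids = np.array(centroids)
        for _ in range(n_iter):
            labels = nearest(X, centroids)
            # Move each centroid to the mean of its points (keep it if its cluster is empty)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        inertia = ((X - centroids[nearest(X, centroids)]) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, centroids)
    return best[1]

# Hypothetical fleet data: [mean engine load fraction, idle time fraction] per duty cycle
rng = np.random.default_rng(1)
patterns = np.array([[0.30, 0.40], [0.70, 0.05], [0.90, 0.25]])  # e.g. urban, highway, heavy-haul
X = np.vstack([p + rng.normal(0.0, 0.03, size=(100, 2)) for p in patterns])

centroids = kmeans(X, k=3)
dist = np.linalg.norm(X - centroids[nearest(X, centroids)], axis=1)
threshold = np.percentile(dist, 99)  # hypothetical cut-off: the farthest 1% of historical cycles

new_cycles = np.array([[0.71, 0.06], [0.10, 0.90]])
new_dist = np.linalg.norm(new_cycles - centroids[nearest(new_cycles, centroids)], axis=1)
print(np.round(centroids, 2))
print("anomalous:", new_dist > threshold)
```

With these synthetic data, the first new cycle falls inside the highway-like cluster. The second lies far from all three clusters and is flagged. With real fleet data, choosing k and the threshold is itself an engineering judgement.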

Clustering is often the first step in an analysis pipeline. Before you build a classifier or a regressor, clustering helps you understand the structure of your data: are there distinct subpopulations? Are there outliers? Is the data distribution what you expected based on the system model?

Generation

Generation creates new artifacts — designs, text, code, images — that satisfy specified constraints.

Engineering generation applications include design synthesis: generating candidate geometries that satisfy performance constraints, using generative adversarial networks, variational autoencoders, or topology optimization with learned priors. Report drafting: generating structured engineering reports from model data — assembly descriptions, test summaries, trade study narratives. Code generation: producing analysis scripts, configuration files, or simulation setup code from natural language descriptions of the analysis to be performed.

Generation is the newest and least mature of the four problem types in engineering applications. The primary risk is plausibility without correctness: a generated design may look reasonable and satisfy stated constraints while violating unstated constraints that the model did not capture. A generated report may read fluently while containing factual errors. Human review remains essential for all generative applications in engineering.
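Plausibility without correctness can be illustrated with a generate-and-filter loop. Here, random sampling stands in for a trained generative model. The filter checks only the stated constraint: the bending stress of a rectangular section, σ = 6M/(bh²). A second, unstated constraint is checked only afterwards. It is a hypothetical 4:1 depth-to-width limit, of the kind used to guard against lateral-torsional buckling. The moment, allowable stress, sampling ranges, and limit are all assumptions.

```python
import numpy as np

M = 2.0e6          # applied bending moment, N·mm (hypothetical)
SIGMA_ALLOW = 150  # allowable bending stress, MPa (hypothetical)

def generate_candidates(n, seed=0):
    """Stand-in for a generative model: random rectangular sections (width b, height h) in mm."""
    rng = np.random.default_rng(seed)
    return rng.uniform([5, 10], [60, 120], size=(n, 2))

def meets_stated_constraint(bh):
    # Stated constraint: bending stress of a rectangle, sigma = 6M / (b h^2), in MPa
    b, h = bh[:, 0], bh[:, 1]
    return 6 * M / (b * h**2) <= SIGMA_ALLOW

def meets_unstated_constraint(bh, max_aspect=4.0):
    # Constraint the generator was never given: a hypothetical depth-to-width limit h/b
    return bh[:, 1] / bh[:, 0] <= max_aspect

cands = generate_candidates(1000)
ok = cands[meets_stated_constraint(cands)]
hidden_fail = ok[~meets_unstated_constraint(ok)]
print(f"{len(ok)} candidates pass the stated check; {len(hidden_fail)} of them violate the unstated one")
```

Some candidates that pass the stated check fail the unstated one. Nothing in the generator's output reveals this without the second check, and that check is the role human review plays in practice.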

Mapping Engineering Problems to ML Types

The practitioner's task is to take a real engineering problem — "we need to reduce the time it takes to evaluate thermal compliance across 500 design variants" — and decompose it into one or more ML problem types with clear inputs, outputs, data sources, and success criteria.

The decomposition requires understanding what data the system model makes available (structured parameters, simulation results, test records), what the engineering workflow actually needs (a ranked list, a pass/fail verdict, an anomaly flag, a draft report), and what level of accuracy the downstream decision requires.
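Writing the decomposition down as a small data structure forces each part to be explicit: problem type, inputs, output, data source, and success criterion. This sketch uses a Python dataclass. The two sub-problems for the thermal-compliance example, their parameter names, and their success criteria are hypothetical illustrations rather than a prescribed breakdown.

```python
from dataclasses import dataclass

@dataclass
class MLSubproblem:
    """One ML problem type within a decomposed engineering workflow."""
    problem_type: str       # "classification", "regression", "clustering" or "generation"
    inputs: list[str]
    output: str
    data_source: str        # model, simulation, test or operational data
    success_criterion: str

# Hypothetical decomposition of the 500-variant thermal-compliance example
thermal_compliance = [
    MLSubproblem("regression", ["heat_load_W", "fin_count", "material"], "peak_temp_C",
                 "simulation data (thermal runs configured from the system model)",
                 "held-out peak-temperature error small relative to the requirement margin"),
    MLSubproblem("classification", ["peak_temp_C (predicted)", "temp_limit_C"], "pass/fail verdict",
                 "model data (requirement thresholds)",
                 "no predicted passes that fail when re-checked with the full solver"),
]

for step in thermal_compliance:
    print(f"{step.problem_type:>14}: {', '.join(step.inputs)} -> {step.output}")
```

Keeping the data source and success criterion alongside each sub-problem makes it easy to see which step drives the validation effort.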

For each problem type, consider: in your own engineering domain, which application would provide the most value, and which carries the most risk? The answer to these two questions is often the same application — which is why the decision framework matters.

Assessment


A manufacturing team has 50,000 images of machined parts, of which 200 show surface defects. They want to build an automated inspection system. Which of the following are valid concerns? (Select all that apply)


Choose a specific engineering product or system you are familiar with. Identify one problem from each of the four ML types (classification, regression, clustering, generation) that would be valuable for that product. For each: (1) state the specific engineering problem, (2) identify the data source (model data, simulation data, test data, or operational data), (3) explain what 'enough data' means for this problem, and (4) describe the consequences of the model being wrong and what level of human oversight is appropriate.