DGTLENG 206: Applied AI for Engineers
DGTLENG 206 · Lesson 3 of 5

Anomaly Detection and Predictive Maintenance

From Reactive to Predictive

Equipment fails. The question is whether you find out before or after it happens, and how much it costs either way. The maintenance philosophy an organization adopts determines the answer, and that philosophy has evolved through four distinct levels — each enabled by progressively more data and more sophisticated analysis.

Reactive maintenance is the default: run the equipment until it breaks, then fix it. This minimizes maintenance effort in the short term and maximizes downtime, emergency repair cost, and secondary damage in the long term. A bearing fails, the shaft scores, the housing cracks, and a $200 bearing replacement becomes a $40,000 rebuild with six weeks of lost production. Reactive maintenance is appropriate only when the equipment is non-critical, redundant, or cheap enough that replacement is faster than any inspection program.

Scheduled maintenance replaces components on a fixed calendar or usage interval, regardless of their actual condition. Change the oil every 5,000 hours. Replace the pump seals every 18 months. This prevents some catastrophic failures but introduces waste: components are replaced with useful life remaining, and maintenance windows are scheduled based on averages rather than actual condition. The intervals are set conservatively — based on the worst-case degradation rate — which means most components are replaced well before they need to be.

Condition-based maintenance uses periodic or continuous monitoring to assess equipment health and trigger maintenance when degradation is actually detected. Vibration analysis, oil analysis, thermography, and ultrasonic inspection provide direct evidence of condition. This reduces both unexpected failures and unnecessary replacements. But it requires sensors, monitoring infrastructure, and trained analysts to interpret the data — and the decision remains human: an analyst reviews the data, judges the trend, and recommends action.

Predictive maintenance uses machine learning to forecast when failure will occur, enabling maintenance to be scheduled with optimal timing. Not "the bearing is degrading" (condition-based) but "the bearing will reach failure threshold in approximately 340 operating hours" (predictive). This enables maintenance to be planned, parts to be ordered, and downtime to be scheduled during planned outages rather than emergency stops.

The progression from reactive to predictive is a progression in data utilization. Reactive uses no data. Scheduled uses population statistics. Condition-based uses periodic or continuous measurements of the individual asset. Predictive uses historical patterns learned from data to forecast future states. Each level requires the capabilities of the previous level plus additional infrastructure.

In MBSE-driven organizations, the system model defines the maintenance-relevant architecture: which components are critical, what failure modes exist, what sensors are available, what the operational envelope looks like. Predictive maintenance does not replace this engineering knowledge — it operates within the framework the model defines.

Training on Normal Operation

The foundational insight of anomaly detection for maintenance is this: you do not need failure data to detect failures. You need data from normal operation.

This is counterintuitive. In supervised classification (Lesson 1), you need labeled examples of each class — pass and fail, fault and no-fault. But failure data is scarce in well-maintained systems. A critical pump might fail once in five years. Waiting to collect enough failure examples to train a classifier is neither practical nor desirable.

Anomaly detection inverts the approach. Instead of learning what failure looks like, you learn what normal looks like. Any deviation from normal is flagged as an anomaly. The model learns the boundaries of healthy operation — the expected vibration signatures, temperature profiles, pressure relationships, power consumption patterns — and flags anything that falls outside those boundaries.

Autoencoders are neural networks trained to reconstruct their input. The network compresses the input (sensor readings) into a lower-dimensional representation, then reconstructs the original from the compressed form. During training on normal data, the autoencoder learns to represent normal patterns efficiently. When presented with abnormal data — a vibration pattern it has never seen — the reconstruction error spikes because the compressed representation cannot capture the unfamiliar pattern. High reconstruction error equals anomaly.
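The reconstruction-error idea can be illustrated without a neural network. The sketch below is not an autoencoder: it swaps in the simplest linear stand-in, a one-dimensional bottleneck fitted as the principal axis of two-dimensional data, so that it runs with the standard library alone. The data, the temperature-vibration relationship, and all function names are hypothetical.

```python
import math
import random

def fit_linear_bottleneck(points):
    """Fit a 1-D linear bottleneck (the principal axis) to 2-D normal data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form angle of the largest-variance axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def reconstruction_error(point, model):
    """Encode to one number, decode back to 2-D, and measure what was lost."""
    (mx, my), (ux, uy) = model
    dx, dy = point[0] - mx, point[1] - my
    code = dx * ux + dy * uy           # encode: project onto the learned axis
    rx, ry = code * ux, code * uy      # decode: map the code back to 2-D
    return math.hypot(dx - rx, dy - ry)

random.seed(0)
# Hypothetical normal operation: vibration rises roughly linearly with temperature.
normal = []
for _ in range(500):
    temp = random.uniform(60, 90)
    normal.append((temp, 0.05 * temp + random.gauss(0, 0.1)))

model = fit_linear_bottleneck(normal)
typical = reconstruction_error((80.0, 4.0), model)  # follows the learned pattern
unusual = reconstruction_error((65.0, 4.4), model)  # high vibration for this temperature
print(f"typical: {typical:.3f}  unusual: {unusual:.3f}")
```

A real autoencoder replaces the single linear axis with nonlinear encoder and decoder networks, but the scoring step is the same: compare the input with its reconstruction.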

One-class support vector machines (SVM) learn a boundary around normal data in feature space. Any new data point that falls outside the boundary is classified as an anomaly. The method is geometric: it finds the tightest enclosure around the normal data and flags anything outside.

Statistical process control methods learn the distribution of each monitored parameter under normal conditions and flag deviations beyond expected statistical limits. These methods are simpler, more interpretable, and work well for individual parameters — but they miss multivariate anomalies where each individual parameter is within limits yet the combination is abnormal. A temperature of 85 degrees is normal. A vibration of 4.2 mm/s is normal. But the combination of 85 degrees and 4.2 mm/s, if it has never been observed before, may indicate a new operating regime or an emerging fault.
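A minimal sketch of per-parameter control limits follows, assuming a plain mean plus or minus three standard deviations rule. The baseline readings and function names are hypothetical, chosen only to echo the temperature and vibration example above.

```python
import statistics

def control_limits(samples, k=3.0):
    """Return (lower, upper) limits at k sample standard deviations from the mean."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    return mean - k * sd, mean + k * sd

def out_of_limits(value, limits):
    lower, upper = limits
    return value < lower or value > upper

# Hypothetical baseline readings collected during normal operation.
temps = [78, 80, 82, 84, 86, 79, 81, 83, 85, 82]
vibs = [3.1, 3.4, 3.8, 4.1, 4.4, 3.2, 3.6, 3.9, 4.3, 3.8]

temp_limits = control_limits(temps)
vib_limits = control_limits(vibs)

# Each parameter is judged on its own; the combination is never examined.
flags = (
    out_of_limits(85, temp_limits),   # within limits
    out_of_limits(4.2, vib_limits),   # within limits
    out_of_limits(95, temp_limits),   # outside limits
)
print(flags)
```

Because each check looks at one parameter in isolation, a reading can pass every check and still be an unusual combination, which is the limitation described above.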

The MBSE connection: The system model defines which sensors monitor which components, what the normal operating envelope is, and what failure modes each component exhibits. The anomaly detection model learns normal behavior from data generated within this model-defined framework. When the model's operational envelope changes — a new operating regime is added, a sensor is relocated — the anomaly detection baseline must be updated.

Anomaly Scoring

Not all anomalies are equal. A vibration reading 2% above the historical maximum is different from one 200% above. Effective anomaly detection systems produce scores, not binary flags, and the engineering value lies in how those scores are interpreted.

Reconstruction error (from autoencoders) provides a continuous anomaly score. Higher error means greater deviation from normal. The score can be trended over time: a gradually increasing reconstruction error suggests progressive degradation, while a sudden spike suggests an acute event.

Mahalanobis distance measures how far a data point is from the center of the normal data distribution, accounting for correlations between parameters. A temperature that is high for the current vibration level is more anomalous than one that is high when vibration is also high, because the combination is unusual even if the individual values are not.
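For two parameters the Mahalanobis distance can be written out by hand, which makes the role of the correlation term visible. This is a sketch under stated assumptions: the baseline pairs are hypothetical, and the closed-form 2x2 inverse is used in place of a linear algebra library.

```python
import math

def fit_gaussian_2d(points):
    """Estimate the mean and sample covariance of 2-D normal-operation data."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(point, fit):
    """sqrt(d^T S^-1 d), using the closed-form inverse of a 2x2 covariance."""
    (mx, my), (sxx, sxy, syy) = fit
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy * sxy
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

# Hypothetical (temperature, vibration) pairs: the two rise together.
baseline = [(78, 3.1), (80, 3.4), (82, 3.8), (84, 4.1), (86, 4.4),
            (79, 3.2), (81, 3.6), (83, 3.9), (85, 4.3), (82, 3.8)]
fit = fit_gaussian_2d(baseline)

consistent = mahalanobis((86, 4.4), fit)  # high vibration with high temperature
odd_pair = mahalanobis((78, 4.4), fit)    # same vibration at a low temperature
print(f"consistent: {consistent:.1f}  odd pair: {odd_pair:.1f}")
```

Both test points sit inside the range of values seen for each parameter individually; only the second breaks the learned relationship between them, and its distance is far larger.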

Threshold setting is the critical engineering decision. Set the threshold too low and the system floods operators with false alarms — alert fatigue sets in, and real anomalies are ignored. Set the threshold too high and genuine degradation goes undetected until it is too late. The threshold is not a statistical parameter to be optimized in isolation. It is an engineering decision that balances the cost of false alarms (unnecessary inspections, operator distraction, loss of trust in the system) against the cost of missed detections (unplanned downtime, secondary damage, safety risk).

In practice, multi-level thresholds work well. A "watch" level triggers increased monitoring frequency. A "warning" level triggers an engineering review. An "alarm" level triggers immediate action. Each level carries a different false-positive tolerance: you can accept many watch-level alerts that turn out to be benign, but you need alarm-level alerts to be highly reliable.
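The three levels can be sketched as a simple mapping from score to response. The numeric cut-offs below are placeholders, not recommended values; in practice each would come from the cost trade-off described above.

```python
# Hypothetical cut-offs on an anomaly score normalized to the range 0 to 1.
WATCH, WARNING, ALARM = 0.5, 0.7, 0.9

ACTIONS = {
    "normal": "no action",
    "watch": "increase monitoring frequency",
    "warning": "request engineering review",
    "alarm": "take immediate action",
}

def alert_level(score, watch=WATCH, warning=WARNING, alarm=ALARM):
    """Map a continuous anomaly score to one of the named alert levels."""
    if not watch < warning < alarm:
        raise ValueError("thresholds must be strictly increasing")
    if score >= alarm:
        return "alarm"
    if score >= warning:
        return "warning"
    if score >= watch:
        return "watch"
    return "normal"

for score in (0.2, 0.55, 0.75, 0.95):
    level = alert_level(score)
    print(f"{score:.2f} -> {level}: {ACTIONS[level]}")
```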

Remaining Useful Life Estimation

Anomaly detection answers "is something wrong?" Remaining useful life (RUL) estimation answers "how long until it fails?" This is the core capability that distinguishes predictive from condition-based maintenance.

RUL estimation requires run-to-failure data — historical records of components that were monitored from healthy through degradation to actual failure. This data is rare and valuable. Each run-to-failure record provides one training example: a time series of sensor readings annotated with the known time-to-failure at each point.

Degradation modeling fits a mathematical curve to the observed degradation trend and projects it forward to the failure threshold. If vibration amplitude is growing linearly at 0.1 mm/s per month and the failure threshold is 7.0 mm/s, and the current reading is 4.5 mm/s, the projected RUL is 25 months. This works when the degradation mechanism is understood and the progression is monotonic. It fails when degradation is nonlinear, when multiple mechanisms interact, or when the failure threshold is not well defined.
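The linear projection in the example above can be sketched as a least-squares fit followed by extrapolation to the threshold. The monthly readings are hypothetical values constructed to match the stated growth rate of 0.1 mm/s per month; the function names are illustrative.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t

def linear_rul(times, values, failure_threshold):
    """Project the fitted trend to the threshold; None if there is no upward trend."""
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    time_at_threshold = (failure_threshold - intercept) / slope
    return max(0.0, time_at_threshold - times[-1])

months = [0, 1, 2, 3, 4, 5]
vibration = [4.0, 4.1, 4.2, 4.3, 4.4, 4.5]  # mm/s, currently at 4.5

rul = linear_rul(months, vibration, failure_threshold=7.0)
print(f"projected RUL: {rul:.1f} months")  # (7.0 - 4.5) / 0.1 = 25 months
```

Returning None for a flat or falling trend is one way to refuse to extrapolate when the monotonic assumption does not hold.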

Recurrent neural networks (RNNs) and their variants (LSTM, GRU) learn temporal patterns in sensor data and predict the remaining time-to-failure directly. Given a sequence of recent sensor readings, the model outputs an RUL estimate. These models can learn complex, nonlinear degradation patterns from data — but they require substantial run-to-failure datasets for training, which are the scarcest resource in predictive maintenance.

Survival analysis methods (Cox proportional hazards, Weibull models) estimate the probability of failure as a function of time and covariates (operating conditions, component age, maintenance history). Instead of a single RUL number, survival analysis provides a probability distribution — "there is a 10% chance of failure within 100 hours, 50% within 500 hours, 90% within 1,200 hours." This probabilistic framing is more honest about the uncertainty inherent in RUL estimation and more useful for maintenance planning.
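A two-parameter Weibull model shows how a probability statement of this kind is produced. The shape and scale values below are hypothetical placeholders; in practice they would be estimated from failure records, and a Cox model would additionally adjust for covariates.

```python
import math

def weibull_survival(t, shape, scale):
    """Probability that a component survives beyond age t (hours)."""
    return math.exp(-((t / scale) ** shape))

def failure_probability_within(age, horizon, shape, scale):
    """P(failure in the next `horizon` hours, given survival to `age`)."""
    still_alive = weibull_survival(age, shape, scale)
    alive_later = weibull_survival(age + horizon, shape, scale)
    return 1.0 - alive_later / still_alive

# Hypothetical parameters: shape > 1 models wear-out (hazard rises with age).
SHAPE, SCALE = 2.0, 1000.0
age = 400.0  # operating hours accumulated so far

outlook = {h: failure_probability_within(age, h, SHAPE, SCALE)
           for h in (100, 500, 1200)}
for horizon, p in outlook.items():
    print(f"P(failure within {horizon} h) = {p:.0%}")
```

With these placeholder parameters the three horizons come out at about 9%, 48%, and 91%, the same kind of statement as the example in the text.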

Transfer learning addresses the data scarcity problem. A model trained on run-to-failure data from one fleet of turbines can be adapted to a different fleet with limited local data. The model transfers general knowledge about degradation patterns and is fine-tuned on the specific characteristics of the target fleet. This does not eliminate the need for local data, but it reduces the requirement from hundreds of run-to-failure records to tens.

Integration with Digital Twins

Predictive maintenance reaches its full potential when integrated with digital twins (DGTLENG 202). The digital twin maintains a real-time model of the physical system's state, driven by sensor data. Anomaly detection and RUL estimation become twin capabilities — the twin does not just mirror current state but predicts future state.

Physics-informed anomaly detection uses the twin's physics model to generate expected sensor readings for the current operating conditions. The anomaly score is the deviation between measured and predicted values, where "predicted" comes from physics, not statistics. This approach is more robust than purely data-driven methods because it accounts for operating condition changes. A higher temperature is expected when the load increases — the physics model predicts this, so it is not flagged as an anomaly. But a higher temperature at the same load, when the physics model predicts no change, is genuinely anomalous.
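The load and temperature example can be sketched with a deliberately simple physics model. The steady-state relation used here (ambient temperature plus a thermal resistance times load) and its coefficient are assumptions for illustration, not a model taken from any particular twin.

```python
def expected_temperature(load_kw, ambient_c, thermal_resistance_c_per_kw):
    """Hypothetical steady-state model: temperature rise is proportional to load."""
    return ambient_c + thermal_resistance_c_per_kw * load_kw

def physics_residual(measured_c, load_kw, ambient_c,
                     thermal_resistance_c_per_kw=0.5):
    """Measured value minus the value the physics model expects."""
    expected = expected_temperature(load_kw, ambient_c, thermal_resistance_c_per_kw)
    return measured_c - expected

def is_anomalous(residual_c, tolerance_c=5.0):
    return abs(residual_c) > tolerance_c

# Same measured temperature, two different loads.
at_high_load = physics_residual(measured_c=85.0, load_kw=120.0, ambient_c=25.0)
at_base_load = physics_residual(measured_c=85.0, load_kw=100.0, ambient_c=25.0)

print(at_high_load, is_anomalous(at_high_load))  # explained by the higher load
print(at_base_load, is_anomalous(at_base_load))  # not explained by the load
```

The score is the residual, so a reading is judged against what the current operating condition should produce rather than against a fixed historical range.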

What-if maintenance planning uses the twin to simulate the consequences of different maintenance decisions. If the RUL estimate says 340 hours remaining, the twin can simulate: what happens if we run for 200 more hours before maintenance? What load restrictions would extend the RUL to 500 hours? What is the probability of secondary damage if the primary component fails? These simulations inform the maintenance decision with engineering analysis, not just statistical prediction.

Fleet-level optimization extends from individual twins to populations. With digital twins of every critical asset, maintenance scheduling becomes an optimization problem: given the predicted RUL of every component across the fleet, schedule maintenance to minimize total cost (planned downtime plus risk-weighted unplanned downtime) while respecting constraints (minimum fleet availability, maintenance crew capacity, parts inventory).
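A toy version of the scheduling problem can be solved by exhaustive search, which is only feasible for a handful of assets. The sketch below simplifies heavily: RUL is treated as a point estimate rather than a distribution, a window later than the RUL incurs a fixed failure cost, and an earlier window incurs a cost per hour of unused life. All asset names, costs, and windows are hypothetical.

```python
from collections import Counter
from itertools import product

def best_schedule(rul_hours, window_hours, crew_capacity,
                  failure_cost, wasted_life_cost_per_hour):
    """Try every assignment of assets to windows and keep the cheapest feasible one."""
    assets = list(rul_hours)
    best_cost, best_plan = float("inf"), None
    for choice in product(range(len(window_hours)), repeat=len(assets)):
        # Constraint: no window may hold more jobs than the crew can handle.
        if max(Counter(choice).values()) > crew_capacity:
            continue
        cost = 0.0
        for asset, w in zip(assets, choice):
            slack = rul_hours[asset] - window_hours[w]
            if slack < 0:
                cost += failure_cost                      # window is after predicted failure
            else:
                cost += slack * wasted_life_cost_per_hour  # life thrown away
        if cost < best_cost:
            best_cost = cost
            best_plan = {a: window_hours[w] for a, w in zip(assets, choice)}
    return best_plan, best_cost

plan, cost = best_schedule(
    rul_hours={"pump_a": 250, "pump_b": 360, "fan_c": 900},
    window_hours=[200, 300, 800],
    crew_capacity=1,
    failure_cost=50_000.0,
    wasted_life_cost_per_hour=10.0,
)
print(plan, cost)
```

A production scheduler would use a proper solver and risk-weighted costs drawn from the RUL distribution; the point here is only the structure of objective plus constraints.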

The system model underpins this integration. It defines the component hierarchy, the sensor allocation, the failure modes, the physics models embedded in the twin, and the maintenance procedures. The AI operates on data structured by this model framework.

False Positive Management

False positives are the silent killer of predictive maintenance programs. A system that generates ten false alarms for every real anomaly will be ignored within months, no matter how good its true detection rate is. Managing false positives is not a tuning exercise — it is a design discipline.

Root causes of false positives include:

  • Operating condition changes that the model has not learned (new product run, seasonal temperature variation, changed load profile)
  • Sensor drift or calibration shifts that the model interprets as process changes
  • Transient conditions (startup, shutdown, load changes) that are normal but infrequent enough to fall outside the learned normal baseline
  • Model staleness — the definition of "normal" has shifted gradually, and the model has not been retrained

Mitigation strategies:

  • Condition-aware models that explicitly account for operating mode, ambient conditions, and load level as inputs — so that expected variation is modeled, not flagged
  • Temporal filtering that requires an anomaly to persist for a defined duration before alerting — transient spikes are logged but do not trigger action
  • Ensemble agreement that requires multiple independent models or features to flag the same anomaly before alerting — reducing the chance that a single noisy signal triggers a false alarm
  • Operator feedback loops where maintenance technicians can mark alerts as true or false positives, and this feedback is used to retrain and improve the model over time
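Two of the mitigation strategies above, temporal filtering and ensemble agreement, are simple enough to sketch directly. The threshold, persistence length, and vote count are hypothetical settings.

```python
def persistent_alerts(scores, threshold, min_consecutive):
    """Alert only once the score has exceeded the threshold for enough samples in a row."""
    alerts = []
    run = 0
    for score in scores:
        run = run + 1 if score > threshold else 0
        alerts.append(run >= min_consecutive)
    return alerts

def ensemble_alert(model_flags, min_votes):
    """Alert only when at least `min_votes` independent detectors agree."""
    return sum(bool(flag) for flag in model_flags) >= min_votes

# A single transient spike (index 1) followed by a sustained excursion.
scores = [0.2, 0.9, 0.3, 0.8, 0.85, 0.9, 0.95, 0.4]
alerts = persistent_alerts(scores, threshold=0.7, min_consecutive=3)
print(alerts)

print(ensemble_alert([True, False, False], min_votes=2))  # one noisy detector
print(ensemble_alert([True, True, False], min_votes=2))   # two detectors agree
```

The persistence requirement trades detection delay for fewer false alarms, so the duration should be short relative to how quickly the failure mode of interest develops.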

The economics are asymmetric. In most industrial contexts, the cost of a false alarm is an unnecessary inspection (hours of labor, possibly a brief shutdown). The cost of a missed real anomaly is catastrophic failure (days of downtime, major repair cost, possible safety incident). The threshold should be set to favor detection over precision — but not so far that alert fatigue makes the system useless. The optimal operating point depends on the specific costs, and those costs come from the engineering and business context, not from the ML model.
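One way to make the asymmetry concrete is to score candidate thresholds against past alerts that technicians have labeled as real or false. The scored events and both cost figures below are hypothetical; the sketch only shows how the cost ratio moves the chosen threshold.

```python
def total_cost(threshold, scored_events, false_alarm_cost, miss_cost):
    """Cost of operating at `threshold` over labeled (score, is_real) events."""
    cost = 0
    for score, is_real in scored_events:
        flagged = score >= threshold
        if flagged and not is_real:
            cost += false_alarm_cost   # unnecessary inspection
        elif not flagged and is_real:
            cost += miss_cost          # degradation went undetected
    return cost

def best_threshold(candidates, scored_events, false_alarm_cost, miss_cost):
    return min(candidates,
               key=lambda t: total_cost(t, scored_events, false_alarm_cost, miss_cost))

# Hypothetical history: anomaly score and whether inspection confirmed a defect.
events = [(0.2, False), (0.3, False), (0.4, False), (0.45, True), (0.5, False),
          (0.6, False), (0.7, True), (0.8, False), (0.9, True)]
candidates = [0.3, 0.5, 0.7, 0.9]

strongly_asymmetric = best_threshold(candidates, events,
                                     false_alarm_cost=2_000, miss_cost=100_000)
mildly_asymmetric = best_threshold(candidates, events,
                                   false_alarm_cost=2_000, miss_cost=5_000)
print(strongly_asymmetric, mildly_asymmetric)
```

When a miss costs fifty times a false alarm, the lowest candidate wins despite five needless inspections; when the ratio is only 2.5, a higher threshold becomes cheaper. Alert fatigue is not priced in here and would need its own term.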

Economics of Maintenance Strategies

The economic case for predictive maintenance is not "it saves money." The economic case is specific: it reduces the total cost of maintenance by shifting spend from emergency repairs and unnecessary replacements to planned, condition-driven interventions.

Reactive maintenance has the lowest upfront cost (no monitoring equipment, no analysis infrastructure) and the highest long-term cost (emergency repairs, secondary damage, unplanned downtime, safety incidents). Total cost is dominated by the tail events — the one catastrophic failure that costs more than years of monitoring would have.

Scheduled maintenance has moderate, predictable costs (regular replacement intervals) but wastes component life and maintenance labor. When intervals are set conservatively, as they must be for safety, the waste is substantial — studies across industries consistently show that 30 to 60 percent of scheduled replacements occur on components with significant remaining life.

Condition-based maintenance reduces waste by replacing components based on observed degradation. It requires investment in monitoring equipment and trained analysts. The economic return depends on the cost ratio: monitoring cost versus the cost of unnecessary replacements avoided and unexpected failures prevented.

Predictive maintenance adds the time dimension — knowing when failure will occur enables optimal scheduling. The additional cost is the ML infrastructure (data pipelines, model development, model monitoring). The additional value is maintenance planning: ordering parts just in time, scheduling downtime during planned outages, batching maintenance across multiple components to minimize total downtime.

The business case must be built on specific numbers for specific equipment. A predictive maintenance program for a fleet of gas turbines, where a single unplanned shutdown costs $500,000 per day, has a very different return profile than one for HVAC systems in a commercial building. The AI does not change the economics — it enables a more precise response to economics that already exist.

Maintenance Strategy Comparison

Reactive maintenance

  • When maintenance occurs: After the equipment has already failed. No advance warning.
  • Data required: None. No monitoring infrastructure needed.
  • Cost profile: Lowest upfront investment but highest total cost. Emergency repairs cost 3-10x planned repairs, and secondary damage multiplies the bill.
  • Risk profile: Highest risk of safety incidents, cascading failures, and extended unplanned downtime.
  • Appropriate when: Equipment is non-critical, redundant, or cheap enough that replacement is faster than any inspection program.

Compare the four strategies. Notice that each builds on the data and infrastructure of the previous level. Predictive maintenance does not replace condition-based monitoring — it adds a forecasting layer on top of it. An organization that skips condition-based monitoring and jumps straight to predictive maintenance will find that it has no data to train on and no monitoring infrastructure to deploy against.

Assessment

Question 1 of 3

An anomaly detection system for a gas turbine fleet uses an autoencoder trained on 12 months of normal operation data. After deployment, it flags a significant anomaly on Turbine 7. Maintenance inspection finds no defect. Which of the following are plausible explanations? (Select all that apply)


Choose a specific piece of critical equipment from your engineering domain (a turbine, a pump, a structural component, an electronic assembly, or anything else with meaningful failure consequences). Describe:

  (1) What sensors you would monitor and why. Connect each sensor to a specific failure mode from the system model.
  (2) How you would define "normal" for training an anomaly detection model: what data you would collect, over what time period, and what operating conditions must be represented.
  (3) How you would handle the first false positive: the system flags an anomaly, maintenance inspects and finds nothing. What do you do with this information?
  (4) What would be required to move from anomaly detection to RUL estimation for this equipment: what data is missing, and how would you collect it?