Case Study · 01

Rocket Engine Health Orchestrator.

A rocket engine often fails quietly first — several sensors drifting at once, each still inside its own limit. Conventional redline monitoring is structurally blind to that. This is a research prototype that reads the channels together and flags the combined signature before any single one breaks.

Stack: Python · PyTorch · Transformers · CNN · Three.js
Period: 2026
Status: research prototype · early build

The problem

When a rocket engine fails, the failure has underlying causes — a bearing degrading, a valve mis-commanding, combustion turning unstable — that show up across the sensors as they develop. Often they show up quietly and across many channels at once: vibration creeping up, a temperature trending warm, a pressure easing down, each within its own limit but together forming the fault's signature.

Conventional monitoring watches each channel against its own redline, and is structurally blind to this class of failure — the evidence is distributed, and no single channel breaks its limit until the fault is well advanced. Catching these coupled, sub-threshold faults is the problem this project takes on.
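
A toy calculation makes the blind spot concrete. The sketch below is illustrative only: the channel names, nominal statistics, and both limits are made-up values, and the combined score (the Euclidean norm of per-channel z-scores) is a simple stand-in for the learned fusion described later, not the project's detector.

```python
import math

REDLINE_SIGMA = 4.0  # hypothetical per-channel limit, in standard deviations
JOINT_LIMIT = 4.0    # hypothetical limit on the combined score

# Hypothetical nominal statistics: channel -> (mean, standard deviation).
NOMINAL = {
    "vibration": (2.0, 0.5),
    "bearing_temp": (350.0, 4.0),
    "feed_pressure": (8.0, 0.25),
    "coolant_flow": (12.0, 0.5),
}


def z_scores(readings, nominal):
    """Deviation of each channel from its nominal mean, in standard deviations."""
    return {name: (readings[name] - mean) / std for name, (mean, std) in nominal.items()}


def redline_trips(z):
    """Channels that break their own limit (the conventional check)."""
    return [name for name, value in z.items() if abs(value) >= REDLINE_SIGMA]


def joint_score(z):
    """Euclidean norm of the z-scores: all channels considered together."""
    return math.sqrt(sum(value * value for value in z.values()))


# Every channel has drifted 2.5 standard deviations: up, up, down, down.
readings = {"vibration": 3.25, "bearing_temp": 360.0, "feed_pressure": 7.375, "coolant_flow": 10.75}
z = z_scores(readings, NOMINAL)

print(redline_trips(z))  # → []
print(joint_score(z))    # → 5.0
```

No single channel reaches its limit, yet the combined deviation sits past the joint limit. A real detector also has to account for correlation between channels and for commanded changes, which this sketch ignores.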

The system is positioned as an advisory layer above the engine's existing deterministic redline protection, which retains all authority to act. It is a research prototype — not a deployable or certifiable system — built to demonstrate one approach to the problem, and to be extended.

The approach

Detect the fault from its combined signature, not from any single channel.

Rather than watch readings in isolation, the system learns the engine's normal command-to-response behavior — given the controls and current state, what the channels should do — and flags departures from it. Trained on a system's abundant normal operating history rather than its scarce failures, it ignores benign changes that are merely the expected response to a command: a pressure rising because the engine was throttled up is expected; the same rise without the command is not.
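
The throttle example can be sketched with a deliberately tiny stand-in for the command-to-response model. Everything here is an assumption for illustration: a single throttle command, a single pressure channel, a linear least-squares fit instead of a learned model, and synthetic nominal history.

```python
import numpy as np


def fit_response_model(commands, responses):
    """Least-squares fit of response ~ gain * command + offset, on nominal data only."""
    design = np.column_stack([commands, np.ones_like(commands)])
    coeffs, *_ = np.linalg.lstsq(design, responses, rcond=None)
    return coeffs


def residual(coeffs, command, response):
    """Observed response minus the response expected for that command."""
    gain, offset = coeffs
    return response - (gain * command + offset)


# Synthetic nominal history: pressure tracks throttle, plus sensor noise.
rng = np.random.default_rng(0)
throttle = rng.uniform(0.6, 1.0, size=500)
pressure = 10.0 * throttle + rng.normal(0.0, 0.05, size=500)

coeffs = fit_response_model(throttle, pressure)
sigma = np.std(residual(coeffs, throttle, pressure))

# Same kind of pressure rise, with and without the command that explains it.
commanded = residual(coeffs, command=1.0, response=10.0)   # throttled up: expected
uncommanded = residual(coeffs, command=0.7, response=8.0)  # no throttle change: not expected

print(abs(commanded) < 3 * sigma)    # the commanded rise leaves a small residual
print(abs(uncommanded) > 3 * sigma)  # the uncommanded rise leaves a large one
```

The point of the sketch is only that the residual, not the raw reading, carries the signal. The prototype's model also conditions on current state, which a static linear fit does not capture.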

Each sensor is treated as its own kind of signal. A dedicated encoder turns each channel into a common representation, and an attention layer fuses those representations to weigh the channels against one another. The output is a verdict: an anomaly flag, a calibrated confidence, and the channels the model weighted most heavily in reaching it.

Figure: schematic of the combinatorial idea (illustrative, not measured data). Four channels sit in caution, none past its redline; the orchestrator fuses them into a single verdict and names the channels it weighted most.

Treating the channels as a set that attention fuses is what makes the system combinatorial — the signal lives in the combination, whether that combination is a temporal cascade or a simultaneous coincidence. That is the whole idea: several channels, each unremarkable alone, resolving into a named assessment while no single one has crossed a redline.

Architecture

Per-channel encoders, fused by attention.

The v1 prototype models eight representative channels. Every encoder — whatever its internals — emits a fixed-length embedding in a shared space, so the fusion layer sees a uniform set of tokens and never depends on how any channel was produced. That shared interface is what makes the system modular: an encoder can be swapped, or a new channel added, without changing anything above it.

Encoders matched to each signal

Statistical encoders handle most channels (chamber pressure, bearing and nozzle temperatures, coolant flow, feed pressure). Each embedding is a residual against expected behavior: cheap to compute, legible, and the bulk of the system.
A small convolutional encoder (CNN) handles turbopump vibration, whose waveform and frequency-band structure a scalar residual can't represent.
The command-to-response model handles the channels whose meaning is inherently dynamic (shaft speed, mixture ratio), emitting its hidden state as their representation rather than judging them in isolation.
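
The shared interface behind these encoders might look like the following sketch. The names (ChannelEncoder, EMBED_DIM, the feature choices) are hypothetical, and two simplifications are swapped in: band energies from an FFT stand in for the convolutional encoder, and zero-padding stands in for a learned projection into the shared space.

```python
from typing import Protocol

import numpy as np

EMBED_DIM = 8  # hypothetical embedding width shared by every encoder


class ChannelEncoder(Protocol):
    """The only contract the fusion layer relies on: window in, fixed-length vector out."""

    def encode(self, window: np.ndarray) -> np.ndarray: ...


class ResidualEncoder:
    """Statistical encoder: summarises the residual against expected behavior."""

    def __init__(self, expected: float, scale: float):
        self.expected = expected
        self.scale = scale

    def encode(self, window: np.ndarray) -> np.ndarray:
        r = (window - self.expected) / self.scale
        slope = np.polyfit(np.arange(r.size), r, 1)[0]
        features = np.array([r.mean(), r.std(), r[-1], slope])
        embedding = np.zeros(EMBED_DIM)
        embedding[: features.size] = features  # zero-padded placeholder for a learned projection
        return embedding


class BandEnergyEncoder:
    """Stand-in for the vibration CNN: log energy in EMBED_DIM frequency bands."""

    def encode(self, window: np.ndarray) -> np.ndarray:
        spectrum = np.abs(np.fft.rfft(window - window.mean())) ** 2
        bands = np.array_split(spectrum, EMBED_DIM)
        return np.log1p(np.array([band.sum() for band in bands]))


encoders: dict[str, ChannelEncoder] = {
    "chamber_pressure": ResidualEncoder(expected=10.0, scale=0.1),
    "turbopump_vibration": BandEnergyEncoder(),
}
t = np.arange(256) / 256.0
windows = {
    "chamber_pressure": 10.0 + 0.05 * t,                # slow upward drift
    "turbopump_vibration": np.sin(2 * np.pi * 40 * t),  # a single vibration tone
}

# One token per channel, all the same length, whatever produced them.
tokens = np.stack([encoders[name].encode(windows[name]) for name in encoders])
print(tokens.shape)  # → (2, 8)
```

Because the fusion layer only ever sees the stacked tokens, replacing BandEnergyEncoder with a trained CNN changes nothing downstream.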

The orchestrator

An attention layer takes the per-channel embeddings as tokens, each tagged with a learned channel identity, and weighs them against one another to produce the verdict. Confidence is meant to be a calibrated measure of how far behavior sits from learned-normal: most reliable in the regime the model has seen, least reliable on behavior unlike its training data. An overconfident miss is the dangerous failure here, so the plan is post-hoc calibration (likely temperature scaling) after training.
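
For reference, temperature scaling fits a single scalar T on held-out data and divides the model's logit by it before the sigmoid. The sketch below assumes the verdict comes from one binary anomaly logit and uses a plain grid search on synthetic data; the function names and the deliberately overconfident "model" are invented for illustration.

```python
import numpy as np


def anomaly_probability(logits, temperature=1.0):
    """Sigmoid of the temperature-scaled logit."""
    return 1.0 / (1.0 + np.exp(-logits / temperature))


def negative_log_likelihood(logits, labels, temperature):
    p = np.clip(anomaly_probability(logits, temperature), 1e-12, 1.0 - 1e-12)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))


def fit_temperature(logits, labels, grid=None):
    """Pick the T that minimises NLL on a held-out calibration set (grid search)."""
    grid = np.linspace(0.25, 10.0, 400) if grid is None else grid
    losses = [negative_log_likelihood(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


# Synthetic calibration set: labels follow the true logits,
# but the "model" reports logits three times too large (overconfident).
rng = np.random.default_rng(0)
true_logits = rng.normal(0.0, 2.0, size=5000)
labels = (rng.uniform(size=5000) < anomaly_probability(true_logits)).astype(float)
raw_logits = 3.0 * true_logits

temperature = fit_temperature(raw_logits, labels)
before = negative_log_likelihood(raw_logits, labels, 1.0)
after = negative_log_likelihood(raw_logits, labels, temperature)
print(temperature > 1.0, after < before)  # T above 1 softens overconfident outputs
```

Temperature scaling leaves the ranking of verdicts unchanged and only rescales their confidence. It does nothing for inputs unlike the calibration data, which is the open problem noted under the roadmap.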

The implicated channels are the ones the fusion layer weighted most heavily — a direct view of the model's attention, offered as a starting point for where to look. The working assumption is stated plainly: weighting indicates relevance, not proven cause.
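
A minimal sketch of how attention weights turn into a list of implicated channels, written in NumPy rather than PyTorch to keep it short. It assumes single-query attention pooling; the snake_case channel names, the dimensions, and the random "learned" parameters are placeholders, not the prototype's values.

```python
import numpy as np

CHANNELS = [
    "chamber_pressure", "bearing_temp", "nozzle_temp", "coolant_flow",
    "feed_pressure", "turbopump_vibration", "shaft_speed", "mixture_ratio",
]


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def fuse(tokens, channel_ids, query):
    """Single-query attention pooling over one token per channel."""
    keys = tokens + channel_ids                     # tag each token with its channel identity
    scores = keys @ query / np.sqrt(keys.shape[1])  # scaled dot-product scores
    weights = softmax(scores)                       # one weight per channel, summing to 1
    pooled = weights @ keys                         # fused representation fed to the verdict head
    return pooled, weights


def implicated_channels(weights, names, k=3):
    """The k channels the fusion weighted most heavily: relevance, not proven cause."""
    order = np.argsort(weights)[::-1][:k]
    return [(names[i], float(weights[i])) for i in order]


rng = np.random.default_rng(0)
dim = 16
tokens = rng.normal(size=(len(CHANNELS), dim))                  # stand-in encoder outputs
channel_ids = rng.normal(scale=0.1, size=(len(CHANNELS), dim))  # stand-in learned identities
query = rng.normal(size=dim)                                    # stand-in learned query

pooled, weights = fuse(tokens, channel_ids, query)
top = implicated_channels(weights, CHANNELS)
for name, weight in top:
    print(f"{name}: {weight:.2f}")
```

Because each token maps to exactly one sensor, each weight can be read directly as "how much this channel contributed to the fused state".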

Design decisions

What I considered, and why I rejected it.

The priority was keeping the architecture modular and easy to iterate on — starting simple and only adding complexity when the data shows it's needed. Two decisions shaped the rest.

Per-channel encoders, not grouped or monolithic

Grouping similar channels under one encoder, or running a single transformer over all raw channels concatenated, were both on the table. I rejected them: a per-channel embedding maps to one physical quantity, so the fusion layer's attention weights map directly back to individual sensors. That is what keeps the output legible to an operator. A group embedding is a blend — when the model attends to it, you can't ask what happened in any specific channel inside it.

Command context in the fusion layer, not the encoders

The hardest practical problem is avoiding false alarms during commanded changes: a throttle-down legitimately moves many channels at once, and without context that looks like a fault. Baking command signals into each encoder would couple them to command logic and break modularity. Instead the encoders stay focused purely on sensor data, and command context is handled in the fusion layer through cross-attention to a short command history.
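
One way to picture that split is the NumPy sketch below: channel tokens act as queries, the command history supplies keys and values, and the encoders never see a command. The shapes, the two-dimensional command vector, and the random projection matrices are assumptions for illustration; in the prototype these would be learned PyTorch parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def condition_on_commands(channel_tokens, command_history, w_q, w_k, w_v):
    """Cross-attention: each channel token looks back over recent commands."""
    q = channel_tokens @ w_q                         # (channels, dim)
    k = command_history @ w_k                        # (history, dim)
    v = command_history @ w_v                        # (history, dim)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (channels, history), rows sum to 1
    return channel_tokens + attn @ v, attn           # residual update keeps the sensor content


rng = np.random.default_rng(0)
n_channels, history_len, dim, cmd_dim = 8, 20, 16, 2
channel_tokens = rng.normal(size=(n_channels, dim))        # from the per-channel encoders
command_history = rng.normal(size=(history_len, cmd_dim))  # e.g. throttle and valve commands
w_q = rng.normal(size=(dim, dim))
w_k = rng.normal(size=(cmd_dim, dim))
w_v = rng.normal(size=(cmd_dim, dim))

conditioned, attn = condition_on_commands(channel_tokens, command_history, w_q, w_k, w_v)
print(conditioned.shape, attn.shape)  # → (8, 16) (8, 20)
```

Adding a new command signal only widens command_history and its two projections; no encoder changes.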

Four floor commitments

These hold regardless of how the design evolves:

The orchestrator must catch coupled, sub-threshold faults that per-channel redlines miss on the same data — that baseline comparison is the result.
Training, calibration, and fault-injection data are generated separately and kept disjoint. The model sees only commands and telemetry at inference time.
Attention weights indicate which channels were relevant, not what caused the fault — and the output is presented honestly about that distinction.
The system advises; it does not act. Deterministic redline protection keeps all authority.
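
For the second commitment, one simple safeguard is to give each data split its own random stream derived from a single root seed. This is a sketch of that idea using NumPy's SeedSequence; the split names, the root seed, and the per-run parameters are hypothetical, and separate streams are only one part of keeping the datasets disjoint.

```python
import numpy as np

SPLITS = ("train", "calibrate", "fault_injection")


def split_generators(root_seed):
    """One independent random generator per split, all reproducible from one seed."""
    children = np.random.SeedSequence(root_seed).spawn(len(SPLITS))
    return {name: np.random.default_rng(child) for name, child in zip(SPLITS, children)}


def draw_run_parameters(rng, n_runs):
    """Hypothetical per-run parameters: an engine gain and a sensor-noise level."""
    gains = rng.normal(1.0, 0.01, n_runs)
    noise_levels = rng.uniform(0.01, 0.05, n_runs)
    return np.column_stack([gains, noise_levels])


generators = split_generators(root_seed=2026)
params = {name: draw_run_parameters(rng, 100) for name, rng in generators.items()}

# No run parameters are shared between the training and calibration splits.
shared = set(params["train"][:, 0]) & set(params["calibrate"][:, 0])
print(len(shared))  # → 0
```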

Data generation

You can't test failure detection without manufacturing failures.

The system needs data it doesn't have: a real engine's operating history, and examples of the faults it's meant to catch. A physics-based synthetic generator supplies both for the prototype. A reduced-order engine model produces nominal telemetry with realistic cross-channel correlation — driven through a duty cycle (startup ramp, steady state, throttle changes, shutdown) and layered with sensor noise, channel lag, and run-to-run variation.
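
A heavily reduced sketch of such a generator follows. The three channels, gains, lag constants, and noise levels are invented for illustration and are far simpler than a real reduced-order engine model; the aim is only to show the ingredients named above: a duty cycle, channel lag, run-to-run variation, and sensor noise.

```python
import numpy as np


def duty_cycle(n_steps):
    """Throttle command: startup ramp, steady state, throttle-down, shutdown."""
    t = np.linspace(0.0, 1.0, n_steps)
    return np.interp(t, [0.0, 0.1, 0.5, 0.55, 0.9, 1.0], [0.0, 1.0, 1.0, 0.7, 0.7, 0.0])


def first_order_lag(signal, alpha):
    """Discrete first-order lag: the output moves a fraction alpha toward the input each step."""
    out = np.empty_like(signal)
    out[0] = signal[0]
    for i in range(1, signal.size):
        out[i] = out[i - 1] + alpha * (signal[i] - out[i - 1])
    return out


def generate_nominal(n_steps=1000, seed=0):
    """Returns (throttle command, telemetry array with one column per channel)."""
    rng = np.random.default_rng(seed)
    throttle = duty_cycle(n_steps)
    gain = 1.0 + rng.normal(0.0, 0.01)                                  # run-to-run variation
    spool = first_order_lag(throttle, 0.05) * gain                      # normalised shaft speed
    shaft_speed = spool * 30000.0
    chamber_pressure = first_order_lag(spool, 0.2) * 10.0               # follows the shaft
    bearing_temp = 300.0 + first_order_lag(spool, 0.01) * 60.0          # slow thermal response
    clean = np.column_stack([shaft_speed, chamber_pressure, bearing_temp])
    noise = rng.normal(0.0, [50.0, 0.02, 0.3], size=clean.shape)        # per-channel sensor noise
    return throttle, clean + noise


throttle, telemetry = generate_nominal()
correlation = np.corrcoef(telemetry[:, 0], telemetry[:, 1])[0, 1]
print(telemetry.shape)    # → (1000, 3)
print(correlation > 0.9)  # shaft speed and chamber pressure move together
```

The cross-channel correlation is not imposed; it falls out of every channel being driven through the same spool state.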

Faults are injected as physical perturbations in that same model — a bearing degradation is a rising friction term plus a thermal term — so the multi-channel signature emerges through the model's couplings on its own, rather than being drawn into the data by hand. The generator authors the faults, so it holds ground truth, which makes it the system's test oracle: known fault in, expected verdict out.
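
The same idea in miniature, under stated assumptions: a toy model with three channels, made-up coupling constants, and a thermal term driven directly by the friction term. The fault touches only the friction term; the channel signatures and the ground-truth labels both follow from it.

```python
import numpy as np


def simulate(n_steps=600, fault_start=None, friction_rate=0.0, seed=0):
    """Toy coupled model. Returns (telemetry by channel, ground-truth fault flag per step)."""
    rng = np.random.default_rng(seed)

    # The injected fault: friction rises linearly from fault_start onward.
    friction = np.zeros(n_steps)
    if fault_start is not None:
        friction[fault_start:] = friction_rate * np.arange(1, n_steps - fault_start + 1)
    heat = 40.0 * friction  # thermal term, here driven by friction

    # Channels respond through the model's couplings, not through hand-drawn signatures.
    telemetry = {
        "shaft_speed": 30000.0 * (1.0 - 0.05 * friction) + rng.normal(0.0, 30.0, n_steps),
        "bearing_temp": 350.0 + heat + rng.normal(0.0, 0.5, n_steps),
        "turbopump_vibration": 2.0 + 1.5 * friction + rng.normal(0.0, 0.05, n_steps),
    }
    ground_truth = friction > 0.0  # the generator authored the fault, so it knows the answer
    return telemetry, ground_truth


nominal, nominal_truth = simulate()
faulted, faulted_truth = simulate(fault_start=300, friction_rate=0.001)

print(int(faulted_truth.sum()))                                  # → 300
print(faulted["bearing_temp"][-1] > nominal["bearing_temp"][-1])  # temperature trends warm
print(faulted["shaft_speed"][-1] < nominal["shaft_speed"][-1])    # shaft speed eases down
```

Used as a test oracle, ground_truth is what a detector's verdicts are scored against, step by step.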

The seam is deliberately clean. The generator and a real telemetry feed produce the same thing, multi-channel time series on the same interface, so real history can replace the generator without touching the logic above it. This is a reduced-order model, not a validated simulation of any specific engine; the claim is that its behavior is physically plausible and correctly coupled, enough to make the detection problem genuine.
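
That seam could be expressed as a small interface, sketched here with hypothetical names (TelemetrySource, SyntheticSource, RecordedSource) and a two-channel frame. The recorded source reads CSV text purely as an example of "real history"; the actual feed format is not specified in this writeup.

```python
import csv
import io
from typing import Iterator, Protocol

Frame = dict[str, float]
CHANNELS = ("chamber_pressure", "shaft_speed")


class TelemetrySource(Protocol):
    """Anything that yields one frame of channel readings per time step."""

    def frames(self) -> Iterator[Frame]: ...


class SyntheticSource:
    """Placeholder for the physics-based generator."""

    def __init__(self, n_steps: int):
        self.n_steps = n_steps

    def frames(self) -> Iterator[Frame]:
        for _ in range(self.n_steps):
            yield {"chamber_pressure": 10.0, "shaft_speed": 30000.0}


class RecordedSource:
    """Replays recorded telemetry from CSV text with the same channel names."""

    def __init__(self, csv_text: str):
        self.csv_text = csv_text

    def frames(self) -> Iterator[Frame]:
        for row in csv.DictReader(io.StringIO(self.csv_text)):
            yield {name: float(row[name]) for name in CHANNELS}


def mean_chamber_pressure(source: TelemetrySource) -> float:
    """Downstream logic depends only on the interface, not on where frames come from."""
    values = [frame["chamber_pressure"] for frame in source.frames()]
    return sum(values) / len(values)


recorded = RecordedSource("chamber_pressure,shaft_speed\n9.5,29500\n10.5,30500\n")
print(mean_chamber_pressure(SyntheticSource(n_steps=5)))  # → 10.0
print(mean_chamber_pressure(recorded))                    # → 10.0
```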

The interface

An assessment is only useful if a human can act on it.

High-dimensional, correlated state is exactly what a person can't read from raw numbers — the system's own problem, turned on the human looking at it. So the interface is built to make the reasoning legible at a glance.

Split screen

The physical engine on the left, the orchestrator's reasoning on the right — truth and interpretation side by side.

Severity is color

State reads before a single number is parsed; nominal, caution, and alarm are immediate.

Verdict routing

When a verdict forms, the implicated channels brighten and animate inward — you watch the evidence assemble.

Reasoning points at metal

The named channels glow on the 3-D engine in sync, tying an abstract assessment back to physical hardware.

Self-explaining tour

A guided walkthrough carries a first-time viewer through the whole idea, unattended.

The contrast, shown

The demo can replay a drift that the orchestrator flags while every individual redline stays quiet: the thesis, demonstrated.

Roadmap & open problems

The first rung of a ladder.

The prototype is built so the next rungs are reachable without rebuilding.

v1 (this build): the full pipeline — per-channel encoders, command-to-response modeling, attention fusion, a verdict, and the demonstration — on synthetic data, advisory, fusing over a flat set of channels.
Next: push more of the detection weight into the learned fusion and harden it against a richer fault library.
Then: graph attention over a learned channel-influence topology — so a verdict traces to a connected subgraph of channels rather than a bag of weights. The learned graph becomes not just a detector but an explanation.
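
As a rough picture of the graph direction, the sketch below restricts attention to edges of a channel-influence graph, so each channel's update can only draw on its neighbours. It is speculative: the adjacency matrix here is fixed by hand, whereas the roadmap item is to learn it, and the channel subset and token values are placeholders.

```python
import numpy as np


def graph_attention(tokens, adjacency):
    """Self-attention where channel i may only attend to channels j with adjacency[i, j]."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores = np.where(adjacency, scores, -np.inf)         # mask out non-edges
    scores = scores - scores.max(axis=1, keepdims=True)   # stable softmax; self-loops keep rows finite
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens, weights


names = ["shaft_speed", "bearing_temp", "turbopump_vibration", "coolant_flow"]

# Hand-written influence graph: shaft speed is linked to bearing temperature and vibration;
# coolant flow is linked only to itself. Self-loops are required on every row.
adjacency = np.eye(len(names), dtype=bool)
adjacency[0, 1] = adjacency[1, 0] = True
adjacency[0, 2] = adjacency[2, 0] = True

rng = np.random.default_rng(0)
tokens = rng.normal(size=(len(names), 16))
updated, weights = graph_attention(tokens, adjacency)

print(np.count_nonzero(weights[~adjacency]))  # → 0
```

With the mask in place, the nonzero weights for a verdict form a subgraph of the influence graph rather than an unstructured list.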

The honest unknowns sit in the same territory:

Faithful explanation. Implicated-channel weighting can be shown to align with authored ground truth, but alignment within synthetic scenarios isn't proof of causal faithfulness. The graph-structure direction is one concrete way to attack it.
Trust under novelty. Learned components are least sure exactly where a system meant to catch the unprecedented needs them most. Bounding that is open.
Authority and certification. The advisory placement isn't a temporary limitation to engineer away — it reflects where trust in these methods actually stands.

The open architectural question I want to test first: whether cross-attention over a command sequence is enough to capture command-to-response dynamics, or whether a small recurrent model does better. That's the first experiment once the pipeline runs end-to-end.

Stack

What's under the hood.

Per-channel encoders emit fixed-length embeddings into a shared space; an attention layer fuses them, conditioned on command history through cross-attention, into a calibrated verdict. A reduced-order physics model generates nominal and faulted telemetry. A Three.js front end maps the verdict back onto the engine.

Encoders

Statistical residual encoders
Convolutional encoder (vibration)
Command-to-response model (shaft speed, mixture ratio)
Shared embedding interface

Fusion

PyTorch · attention orchestrator
Cross-attention to command history
Calibrated confidence

Data generation

Reduced-order physics model
Faults injected as perturbations
Disjoint train / calibrate / test

Visualization

Three.js live 3-D engine
Severity color · verdict routing
Self-explaining guided tour

Artifacts

The work, in public.

Demo

Interactive demonstration

Live 3-D engine, orchestrator verdict, guided tour

Code

GitHub repository

Source — encoders, attention fusion, and the data generator

Writeup

Design rationale

Decisions considered and rejected, in narrative form
