Satellite Health Monitor

An end-to-end anomaly-detection system for a simulated Low-Earth-Orbit (LEO) satellite. A Rust physics engine generates realistic telemetry, and a Python XGBoost classifier watches the stream in real time, firing alerts within ~65 seconds of fault onset. Every prediction comes with a SHAP explanation traceable to physical quantities.

Rust, Python, XGBoost, SHAP, Real-time ML, Anomaly Detection, Feature Engineering

Context

Satellite anomaly detection is a classic real-time ML problem: the data stream is continuous, faults can be subtle, and a false positive is almost as costly as a missed alert. The challenge is compounded by periodic physical events (eclipse every ~95 minutes) that create abrupt telemetry changes indistinguishable from certain faults unless the model has a long enough temporal horizon.

I built both the simulator and the detector from scratch. The simulator is written in Rust for speed and models orbital mechanics in the Earth-Centered Inertial (ECI) frame, attitude control, eclipse transitions, thermal effects, and battery dynamics. Each 24-hour run injects one fault at a random time and produces a labelled CSV. The detector is trained on 500 such runs (~43 million raw telemetry rows before feature engineering).

Training runs: 500 (24 h each)
Raw telemetry: 43 M rows at 1 Hz (500 × 86 400 s ≈ 43.2 M)
Features: 532 after engineering
Macro F1 (test): 1.00 across 8 classes
Alert latency: ~65 s from fault onset
False positives: 0 on nominal runs

Method

1. Rust Simulator (orbital-mechanics.rs)

ECI orbit + attitude dynamics, eclipse model, fault injection at a random start time.
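The simulator itself is Rust; to keep all examples in one language, the sketch below uses Python and shows only three ingredients of this step: a circular orbit in ECI, a cylindrical Earth-shadow eclipse test, and a label vector for a fault starting at a random second. The circular orbit, the 550 km altitude, the fixed Sun direction and every function name are assumptions for illustration, not taken from orbital-mechanics.rs, and attitude dynamics are not modelled here.

```python
import numpy as np

MU_EARTH = 3.986004418e14  # m^3/s^2, Earth's gravitational parameter
R_EARTH = 6.371e6          # m, mean Earth radius


def orbital_period_s(altitude_m: float) -> float:
    """Period of a circular orbit: T = 2*pi*sqrt(a^3 / mu)."""
    a = R_EARTH + altitude_m
    return 2 * np.pi * np.sqrt(a**3 / MU_EARTH)


def circular_orbit_eci(t_s, altitude_m, inclination_rad):
    """ECI positions (N, 3) of a circular orbit starting at the ascending node (RAAN = 0)."""
    a = R_EARTH + altitude_m
    n = 2 * np.pi / orbital_period_s(altitude_m)  # mean motion, rad/s
    u = n * np.asarray(t_s, dtype=float)          # argument of latitude
    x = a * np.cos(u)
    y = a * np.sin(u) * np.cos(inclination_rad)
    z = a * np.sin(u) * np.sin(inclination_rad)
    return np.stack([x, y, z], axis=-1)


def in_eclipse(r_eci, sun_dir):
    """Cylindrical shadow: behind Earth and within one Earth radius of the Earth-Sun line."""
    s = sun_dir / np.linalg.norm(sun_dir)
    along = r_eci @ s                                    # component along the Sun direction
    perp = np.linalg.norm(r_eci - np.outer(along, s), axis=-1)
    return (along < 0) & (perp < R_EARTH)


def inject_fault_label(n_seconds, fault_class, rng):
    """Label vector: 0 (nominal) before a random onset second, fault_class from then on."""
    onset = int(rng.integers(0, n_seconds))
    labels = np.zeros(n_seconds, dtype=int)
    labels[onset:] = fault_class
    return labels, onset
```

At the assumed 550 km altitude the period comes out near 95 minutes, and with the Sun in the orbital plane roughly a third of each orbit is spent in shadow, which is the periodic event the detector must learn to ignore.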

2. Feature Engineering (preprocess.py)

Rolling statistics over 30–600 s windows, slopes over 1 min to 1 h, and eclipse counters: 532 features in total.
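A minimal pandas sketch of the three feature families, assuming 1 Hz data in a DataFrame with one column per telemetry channel plus a boolean `in_eclipse` column. The window lists, column names and helper names are hypothetical, and preprocess.py may compute slopes differently (for example by simple differencing rather than a least-squares fit).

```python
import numpy as np
import pandas as pd


def rolling_slope(s: pd.Series, window: int) -> pd.Series:
    """Least-squares slope (units per second) over a trailing window of 1 Hz samples."""
    x = np.arange(window, dtype=float)
    xc = x - x.mean()
    denom = float((xc ** 2).sum())
    # slope = sum(xc * y) / sum(xc ** 2); y needs no centering because sum(xc) == 0
    return s.rolling(window).apply(lambda y: float(np.dot(xc, y)) / denom, raw=True)


def seconds_since_eclipse_change(in_eclipse: pd.Series) -> pd.Series:
    """Counter that restarts at 0 on every eclipse entry or exit."""
    segment = in_eclipse.astype(int).diff().ne(0).cumsum()
    return in_eclipse.groupby(segment).cumcount()


def build_features(df, channels, stat_windows=(30, 120, 600), slope_windows=(60, 600, 3600)):
    """Rolling mean/std and rolling slopes per channel, plus one eclipse counter."""
    feats = {}
    for c in channels:
        for w in stat_windows:
            roll = df[c].rolling(w)
            feats[f"{c}_mean_{w}s"] = roll.mean()
            feats[f"{c}_std_{w}s"] = roll.std()
        for w in slope_windows:
            feats[f"{c}_slope_{w}s"] = rolling_slope(df[c], w)
    feats["eclipse_elapsed_s"] = seconds_since_eclipse_change(df["in_eclipse"])
    return pd.DataFrame(feats, index=df.index)
```

The short windows describe the current state, while the 1 h slope still reflects the sunlit baseline for most of an eclipse, which is what lets a model tell an eclipse from a power fault.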

3. XGBoost (train.py)

400 trees, multi:softprob objective, 8 classes (nominal plus the seven fault types listed below). Time-based split: 70 % train / 30 % test.
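A sketch of a chronological 70/30 split and the two stated hyperparameters. Whether train.py splits within each run or across the sequence of runs is not stated; this version simply splits one timestamp array, and `time_split` and `XGB_PARAMS` are hypothetical names.

```python
import numpy as np

# Hyperparameters given in the write-up; everything else stays at XGBoost defaults.
XGB_PARAMS = {"n_estimators": 400, "objective": "multi:softprob"}


def time_split(timestamps, train_frac=0.7):
    """Chronological split: the earliest 70 % of rows train, the latest 30 % test.

    Unlike a shuffled split, every training row precedes every test row, so
    features built from trailing windows cannot carry test-period data into training.
    """
    order = np.argsort(timestamps, kind="stable")
    cut = int(len(order) * train_frac)
    return order[:cut], order[cut:]
```

The indices would then feed something like `xgboost.XGBClassifier(**XGB_PARAMS).fit(X[train_idx], y[train_idx])`. A shuffled split would put near-duplicate neighbouring seconds on both sides and inflate test scores.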

4. Threshold Tuning (evaluate.py)

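Per-class thresholds: each class probability is divided by its threshold (adjusted proba = proba / threshold), so a threshold above 1 makes that class harder to predict. Battery = 4.0, solar_panel = 3.5.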
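A sketch of the adjustment, assuming the final label is the argmax of the adjusted scores and that classes without a stated threshold keep a divisor of 1.0. The class names and their order are hypothetical.

```python
import numpy as np

# Hypothetical class order: nominal plus the seven fault types.
CLASSES = ["nominal", "battery", "sensor_drift", "wheel_friction",
           "solar_panel", "solar_storm", "thruster", "meteorite"]

# Divisors from the write-up; unlisted classes are assumed to stay at 1.0.
THRESHOLDS = np.ones(len(CLASSES))
THRESHOLDS[CLASSES.index("battery")] = 4.0
THRESHOLDS[CLASSES.index("solar_panel")] = 3.5


def predict_with_thresholds(proba, thresholds=THRESHOLDS):
    """Divide each class probability by its threshold, then take the argmax per row."""
    adjusted = np.asarray(proba, dtype=float) / thresholds
    return adjusted.argmax(axis=-1), adjusted
```

With a divisor of 4.0, a battery prediction wins only when its raw probability exceeds four times that of every class whose divisor is 1.0, which trades a little recall for fewer spurious battery and solar-panel alerts.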

5. SHAP (explain.py)

TreeExplainer on 500 stratified test rows per class; global + per-class importance.
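A sketch of the two pieces around the explainer: stratified sampling of up to 500 test rows per class, and aggregation of SHAP values into importances as mean absolute SHAP. It assumes the values are arranged as (rows, features, classes); depending on the shap version, `shap.TreeExplainer(model).shap_values(X)` may instead return a list of per-class arrays, which `np.stack(values, axis=-1)` converts. Function names are hypothetical, and explain.py may define per-class importance over only that class's rows rather than all sampled rows.

```python
import numpy as np


def stratified_indices(y, per_class=500, seed=0):
    """Up to `per_class` row indices from each class, sampled without replacement."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    picked = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        picked.append(rng.choice(members, size=min(per_class, members.size), replace=False))
    return np.concatenate(picked)


def shap_importance(shap_values):
    """Mean |SHAP| per feature and class, plus a global average over classes.

    shap_values: array of shape (rows, features, classes).
    Returns (global: (features,), per_class: (features, classes)).
    """
    sv = np.asarray(shap_values, dtype=float)
    per_class = np.abs(sv).mean(axis=0)
    return per_class.mean(axis=1), per_class
```

Sampling per class keeps rare faults from being drowned out by the far more numerous nominal rows when global importance is averaged.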

6. Real-time Inference (inference.py)

Sliding buffer of the latest 4 000 raw rows (~67 min at 1 Hz); predict every 30 s and alert after 3 consecutive predictions of the same fault class.
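A sketch of the streaming loop: a fixed-length buffer, scoring every 30 samples, and an alert once the same non-nominal class wins three times in a row with at least 50 % confidence (the rule given for the live demo). `StreamingDetector` is hypothetical, and `predict_proba` is an assumed callable standing in for feature computation plus the trained model; whether inference.py applies the confidence check to raw or threshold-adjusted probabilities is not stated.

```python
from collections import deque

import numpy as np


class StreamingDetector:
    """Keep the last `buffer_rows` samples, score every `stride` samples, and
    alert after `hits` consecutive confident predictions of one fault class."""

    def __init__(self, predict_proba, buffer_rows=4000, stride=30, hits=3,
                 min_conf=0.5, nominal_class=0):
        self.predict_proba = predict_proba  # callable: list of buffered rows -> class probabilities
        self.buffer = deque(maxlen=buffer_rows)
        self.stride, self.hits, self.min_conf = stride, hits, min_conf
        self.nominal_class = nominal_class
        self.samples_seen = 0
        self.streak_class, self.streak_len = None, 0

    def push(self, row):
        """Add one 1 Hz sample; return the alerted class while the streak holds, else None."""
        self.buffer.append(row)
        self.samples_seen += 1
        if self.samples_seen % self.stride:
            return None  # score only every `stride` samples (30 s at 1 Hz)
        proba = np.asarray(self.predict_proba(list(self.buffer)))
        cls, conf = int(proba.argmax()), float(proba.max())
        if cls == self.nominal_class or conf < self.min_conf:
            self.streak_class, self.streak_len = None, 0  # nominal or unsure: reset
            return None
        if cls == self.streak_class:
            self.streak_len += 1
        else:
            self.streak_class, self.streak_len = cls, 1
        return cls if self.streak_len >= self.hits else None
```

Requiring three agreeing predictions 30 s apart means at least 60 s pass between the first fault-class prediction and the alert, in line with the ~65 s latency reported above.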

Anomaly classes

Battery Degradation: gradual capacity loss; the SOC slope diverges over hours.
Sensor Drift: temperature-ratio features skew as a sensor's calibration drifts.
Wheel Friction: angular-velocity noise builds exponentially over ~30 min.
Solar Panel Fault: panel efficiency drops; distinguished from eclipse by the long-range power slope.
Solar Storm: broadband attitude disturbance; angular rates spike on all axes.
Thruster Failure: slow z-axis angular drift; the model picks it up via the omega_z rolling std.
Meteorite Impact: sudden, broad angular kick, damped by the reaction wheels within one orbit.
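Three of these signatures can be sketched as additive perturbations to the body angular rates (rad/s) as a function of seconds since onset. Only the qualitative shapes come from the list above (noise building exponentially over ~30 min, a slow z-axis drift, a kick resolved within one orbit); every constant and function name is a hypothetical placeholder, not the simulator's actual fault model.

```python
import numpy as np

ORBIT_S = 95 * 60  # ~95 min orbital period, as stated in the write-up


def wheel_friction(t, rng, base_std=1e-5, tau_s=600.0, ramp_s=1800.0):
    """Body-rate noise whose std grows as exp(t / tau) for ~30 min, then saturates."""
    std = base_std * np.exp(np.minimum(t, ramp_s) / tau_s)
    return rng.normal(size=(t.size, 3)) * std[:, None]


def thruster_failure(t, rng, drift_rad_s2=1e-7):
    """Constant unbalanced torque about z: omega_z drifts linearly (rng unused)."""
    out = np.zeros((t.size, 3))
    out[:, 2] = drift_rad_s2 * t
    return out


def meteorite_impact(t, rng, kick_rad_s=0.01):
    """Sudden kick along a random axis, damped by the wheels to <1 % within one orbit."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    decay = np.exp(-t / (ORBIT_S / 5))  # e^-5 ≈ 0.7 % of the kick remains after one orbit
    return kick_rad_s * decay[:, None] * axis
```

Adding, say, `wheel_friction(t_since_onset, rng)` to nominal rates shows why rolling standard deviations separate friction from thruster drift: the first raises variance on every axis, while the second shifts the mean of omega_z alone.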

Results: Live Demo

In the live demo, the user picks a scenario and telemetry plays back at accelerated speed while the ML panel updates every 30 simulated seconds. An alert fires when the model predicts the same class three times in a row with at least 50 % confidence, and SHAP bars show which features drove each prediction.

Conclusion

The final system achieves macro F1 = 1.00 on a held-out time split and zero false positives across all tested nominal runs, including the eclipse entry and exit windows that originally caused spurious solar-panel alerts. The SHAP breakdown confirms that each anomaly class is driven by the physically appropriate signal: angular-velocity noise for wheel friction, z-axis drift for thruster failure, battery SOC slope for degradation, and the long-range power slope for solar panel faults during eclipse.

The key lesson is architectural: when an anomaly can be masked by a periodic physical event, you need a feature whose temporal horizon spans that period. Short rolling windows encode current state; long-range slopes encode pre-event baselines. Combining both is what made the eclipse ambiguity learnable.