Robot Learning for Anomaly Detection and Introspection through Cameras

2 February, 2026

Project Overview

The aim of this project is to detect anomalies in critical or legacy control systems (e.g., cranes) and alert operators to them without any direct connection to the system, relying purely on CCTV-based introspection. This provides an air gap that protects the system from cyber attacks. However, vision-based state estimation introduces significantly more noise, which makes it harder for the model to tell whether a deviation is a true anomaly.

Camera-based State Estimation

Methodology

In this work, we propose AWARE, a platform-agnostic framework for reliable real-world anomaly detection. AWARE comprises two major components:

  • A world model that learns the system's dynamics and predicts the next state given a history of states and actions. By comparing the real-time prediction error with a precomputed reference, the model raises a flag whenever a large behavioral discrepancy occurs (e.g., a collision or a dangerous movement).

  • A latent estimator that estimates dynamics parameters from a window of states and actions. This provides a complementary signal for detecting subtle changes in system dynamics (e.g., a degraded motor or a shift in the payload's center of mass).
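The prediction-error check performed by the world model can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AWARE's implementation: a least-squares linear model stands in for the learned world model, the "precomputed reference" is assumed to be a high quantile (here 99%) of smoothed prediction errors on nominal data, and errors are averaged over a short window to damp the noise of camera-based state estimates. All function names, parameters, and the toy `simulate` system are hypothetical.

```python
import numpy as np


def _design(states, actions):
    """Regressors [s_t, a_t, 1] for every transition t -> t+1."""
    return np.hstack([states[:-1], actions[:-1], np.ones((len(states) - 1, 1))])


def fit_world_model(states, actions):
    """Hypothetical linear stand-in for a learned world model: s_{t+1} ~ [s_t, a_t, 1] @ W."""
    W, *_ = np.linalg.lstsq(_design(states, actions), states[1:], rcond=None)
    return W


def prediction_errors(W, states, actions):
    """One-step prediction error ||s_{t+1} - f(s_t, a_t)|| for each transition."""
    return np.linalg.norm(states[1:] - _design(states, actions) @ W, axis=1)


def smoothed(errors, window):
    """Moving average over `window` consecutive errors to damp per-frame camera noise."""
    return np.convolve(errors, np.ones(window) / window, mode="valid")


def reference_threshold(W, states, actions, window=5, quantile=0.99):
    """Assumed 'precomputed reference': a high quantile of smoothed errors on nominal data."""
    return np.quantile(smoothed(prediction_errors(W, states, actions), window), quantile)


def flag_anomalies(W, threshold, states, actions, window=5):
    """True for every window whose smoothed error exceeds the reference."""
    return smoothed(prediction_errors(W, states, actions), window) > threshold


def simulate(rng, T, disturbance_from=None, noise=0.01):
    """Toy 2-D linear system with an optional constant push (hypothetical data)."""
    A = np.array([[0.9, 0.1], [0.0, 0.9]])
    B = np.array([[0.0], [0.1]])
    s = np.zeros((T, 2))
    a = rng.uniform(-1.0, 1.0, size=(T, 1))
    for t in range(T - 1):
        s[t + 1] = A @ s[t] + B @ a[t]
        if disturbance_from is not None and t >= disturbance_from:
            s[t + 1, 1] -= 0.5  # e.g. a collision pushing the state off-nominal
    # Additive Gaussian noise mimics camera-based state estimation.
    return s + rng.normal(0.0, noise, size=s.shape), a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s_nom, a_nom = simulate(rng, 2000)
    W = fit_world_model(s_nom, a_nom)
    thr = reference_threshold(W, s_nom, a_nom)
    s_test, a_test = simulate(rng, 200, disturbance_from=100)
    flags = flag_anomalies(W, thr, s_test, a_test)
    print("first flagged window:", int(np.argmax(flags)))
```

Averaging over a window trades a few frames of detection latency for fewer false alarms caused by single noisy frames; in practice the window length and quantile would need tuning on real camera data.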

Figure: example anomaly cases. Case A: Payload Collision; Case B: Motor Degradation; Case C: Dangerous Behavior; Case D: Persistent Drag.

When such anomalies are detected, the model alerts the operator in real time, enabling proactive expert intervention and preventing further damage. AWARE also supports damage introspection by decoding the estimated latent to pinpoint the source of the issue.
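The latent estimator and the introspection step can likewise be illustrated with a toy model. Purely for illustration, we assume that each motor's latent parameter is a scalar gain k_i in q_{t+1,i} = q_{t,i} + dt * k_i * u_{t,i}, estimated by per-motor least squares over a window of states and commands; "decoding" the latent then amounts to comparing each estimated gain with its nominal value and naming the motor with the largest relative drop. AWARE's actual estimator is learned, and the names `estimate_gains`, `diagnose`, and `simulate_window`, as well as the 20% tolerance, are hypothetical.

```python
import numpy as np


def estimate_gains(q, u, dt):
    """Per-motor least-squares gain k_i for the toy model q_{t+1,i} = q_{t,i} + dt * k_i * u_{t,i}.

    q: (T, n) measured joint positions over a window; u: (T, n) motor commands.
    """
    dq = np.diff(q, axis=0)  # (T-1, n) observed increments
    du = dt * u[:-1]         # (T-1, n) increments expected per unit gain
    return (dq * du).sum(axis=0) / (du * du).sum(axis=0)


def diagnose(k_est, k_nominal, tol=0.2):
    """Index of the most degraded motor, or None if every relative drop is within tol."""
    rel_drop = 1.0 - k_est / k_nominal
    worst = int(np.argmax(rel_drop))
    return worst if rel_drop[worst] > tol else None


def simulate_window(rng, k_true, T=100, dt=0.05, noise=0.002):
    """Hypothetical data: random commands, integrated positions, camera-style noise."""
    n = len(k_true)
    u = rng.uniform(-1.0, 1.0, size=(T, n))
    q = np.zeros((T, n))
    for t in range(T - 1):
        q[t + 1] = q[t] + dt * k_true * u[t]
    return q + rng.normal(0.0, noise, size=q.shape), u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k_true = np.array([1.0, 0.5, 1.0])  # motor 1 runs at half its nominal gain
    q, u = simulate_window(rng, k_true)
    k_hat = estimate_gains(q, u, dt=0.05)
    print("estimated gains:", k_hat.round(2), "degraded motor:", diagnose(k_hat, np.ones(3)))
```

Because the estimate is a physically interpretable per-motor quantity, the same value that signals "something changed" also indicates where, which is the property the introspection step relies on.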

Results

  • Our recent results demonstrate a 77% detection success rate for payload collisions and motor degradation using high-noise, camera-based state estimation.
  • The trained model achieves highly accurate motion prediction, with approximately 3 degrees of error over a 2-second horizon.
  • Our method also achieves a 96% success rate in identifying which motor is impaired, using only camera-based state estimation.
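The horizon error in the second result refers to multi-step prediction. One common way to measure it, assumed here because the exact protocol is not stated, is an open-loop rollout in which the model's own predictions are fed back as inputs and the angle error is recorded at every step up to the horizon. The damped pendulum, the 20 Hz step rate, and all function names below are hypothetical stand-ins, not AWARE's system or evaluation code.

```python
from functools import partial

import numpy as np


def rollout(step_fn, s0, actions):
    """Open-loop rollout: each prediction is fed back as the next input state."""
    states = [np.asarray(s0, dtype=float)]
    for a in actions:
        states.append(step_fn(states[-1], a))
    return np.stack(states[1:])


def horizon_error_deg(step_fn, s0, actions, true_future, angle_idx=0):
    """Absolute angle error in degrees at each step of the horizon."""
    pred = rollout(step_fn, s0, actions)
    return np.degrees(np.abs(pred[:, angle_idx] - true_future[:, angle_idx]))


def pendulum_step(s, a, damping, dt=0.05):
    """Hypothetical damped pendulum (semi-implicit Euler); state = (angle rad, rate rad/s)."""
    theta, omega = s
    omega = omega + dt * (-9.81 * np.sin(theta) - damping * omega + a)
    return np.array([theta + dt * omega, omega])


if __name__ == "__main__":
    dt, horizon_s = 0.05, 2.0
    steps = int(round(horizon_s / dt))  # 40 steps at an assumed 20 Hz
    actions = np.zeros(steps)
    truth = rollout(partial(pendulum_step, damping=0.5), [0.3, 0.0], actions)
    model = partial(pendulum_step, damping=0.45)  # a slightly mis-identified model
    err = horizon_error_deg(model, [0.3, 0.0], actions, truth)
    print(f"end-of-horizon error: {err[-1]:.2f} deg, mean over horizon: {err.mean():.2f} deg")
```

Open-loop errors compound over the horizon, so whether a figure such as "about 3 degrees" refers to the final step or the mean over all steps noticeably changes its meaning; the sketch reports both.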

Unlike modern VLMs, which typically require large-scale cloud compute and incur high latency, AWARE is designed to run in real time, fully on device (NVIDIA Jetson Orin), while consuming less than 200 W.

Future Work

It would be interesting to evaluate how well AWARE transfers to a completely different platform, for example, a quadruped robot. Such a robot has higher-dimensional observations that pose many challenges for state estimation, such as self-occlusion. Moreover, a moving robot implies a dynamic nominal condition that changes with the interaction between the robot and its environment. It also remains unclear whether AWARE can still detect anomalies reliably when the locomotion policy internally incorporates some degree of self-correction.

This project is still ongoing, and more interesting aspects of the method have yet to be published.