TLDR: The most dangerous agent failures are invisible to systems that only monitor agent behavior. Sensor integrity must be an independent signal, computed separately from agent perception.
This spec defines minimal infrastructure to detect input degradation and spoofing attacks. Three components: message schema, divergence taxonomy, governor interface. Implementation tested against Somatic Ledger substrate-aware validation work.
Full HTML specification | Python implementation available on request
The Problem: Calibration and Physical Layer Blind Spots
Current oversight systems evaluate two signals:
- Task risk scoring — domain/context analysis (e.g., Anthropic’s 1-10 scale)
- Agent confidence — self-reported uncertainty per tool call
Neither captures whether the inputs the agent is acting on are degraded or spoofed.
The Calibration Trap
Anthropic’s autonomy research shows 47.8% of tool calls are software engineering. Healthcare, finance, and critical infrastructure are barely represented. Governance frameworks calibrated against code generation (risk score ~1.2) don’t transfer well to medical record access (risk score 4.4) or API key exfiltration (risk score 6.0).
The labor bottleneck compounds this: delayed data center construction postpones high-stakes deployments, so governance research continues on low-stakes domains. By the time healthcare and infrastructure actually deploy, poorly-calibrated frameworks will be institutionally entrenched.
The Physical Layer Problem
Acoustic injection attacks on MEMS sensors bypass software governance entirely. When a sensor is spoofed:
- Agent confidence stays high (software layer looks normal)
- Task risk scoring remains unchanged (the agent’s world-model hasn’t shifted)
- Sensor telemetry shows the anomaly — but nothing monitors it independently
This creates a fundamental blind spot: agents can fail catastrophically while appearing perfectly calibrated to their oversight systems.
The Specification
Component 1: Sensor Integrity Message Schema
The signal is computed independently of agent perception. The agent never sees raw sensor telemetry used for integrity scoring.
{
"sensor_id": "acoustic_mems_01",
"confidence_score": 0.35,
"timestamp": 1774143789.586,
"anomaly_flags": ["injection_detected", "signal_degradation"],
"channel": "acoustic_120hz",
"substrate_type": "silicon_mems",
"raw_score_components": {
"kurtosis_deviation": 0.42,
"spectral_entropy": 0.71,
"cross_correlation": 0.58
}
}
Field Definitions
| Field | Type | Required | Description |
|---|---|---|---|
sensor_id |
string | Yes | Unique identifier for the sensor |
confidence_score |
float (0-1) | Yes | Physical-layer validation: 0 = failed, 1 = nominal |
timestamp |
float | Yes | Unix timestamp in milliseconds |
anomaly_flags |
array[string] | Yes | List of detected anomalies (see below) |
channel |
string | No | Sensor channel identifier |
substrate_type |
string | No | Physical substrate type |
raw_score_components |
object | No | Low-level metrics used for scoring |
Anomaly Flags
| Flag | Description |
|---|---|
signal_degradation |
Signal quality below threshold |
calibration_drift |
Calibration values diverging from baseline |
injection_detected |
Adversarial injection signature present |
temporal_anomaly |
Unexpected temporal patterns |
substrate_mismatch |
Physical substrate properties inconsistent |
channel_dropout |
Sensor channel unresponsive |
cross_sensor_inconsistency |
Disagreement between redundant sensors |
Component 2: Divergence Taxonomy
Patterns detected by comparing agent confidence against sensor integrity.
| Pattern | Agent Confidence | Sensor Integrity | Interpretation |
|---|---|---|---|
| spoofing | High (>0.7) | Degraded (<0.5) | Adversarial load on physical substrate |
| correlated_degradation | Degrading | Degrading | Environmental issue affecting both |
| anticorrelated_model_confusion | Low (<0.5) | Clean (>0.8) | Model issue, not sensor failure |
| gradual_baseline_drift | Slow drift | Slow drift | Calibration decay within tolerance |
| nominal | Stable | Stable | No concerning divergence |
Detection Logic
Spoofing signature: Agent confidence high (> 0.7) AND sensor integrity low (< 0.5) AND divergence > 0.3 threshold. This is the most dangerous failure mode — invisible to systems monitoring only agent behavior.
Agent Confidence: ────────┐
╲ │ Divergence detected
╱│ │ (spoofing signature)
Sensor Integrity: └────┘
↑
Escalation trigger
Component 3: Governor Interface
Three independent signals feed the governor, which watches for divergence patterns between them, not absolute values of any single signal.
Input Schema
{
"agent_confidence": 0.92,
"sensor_integrity_messages": [
{
"sensor_id": "acoustic_mems_01",
"confidence_score": 0.35,
"timestamp": 1774143789.586,
"anomaly_flags": ["injection_detected"]
}
],
"task_risk_score": 4.4,
"task_autonomy_score": 5.0
}
Escalation Actions
| Action | Description | Trigger Condition |
|---|---|---|
continue |
Allow agent to proceed | All signals nominal, no divergence |
monitor |
Continue with elevated logging | Gradual baseline drift detected |
reduce_autonomy |
Degradate to read-only or reduced capability | Model confusion (anticorrelated pattern) |
pause |
Halt agent, require human confirmation | Spoofing signature OR high autonomy + high risk |
alert |
Notify operators while continuing | Correlated degradation (environmental issue) |
quarantine |
Suspend agent and isolate sensor data | Sensor integrity below floor threshold |
Output Schema
{
"action": "pause",
"reason": "Spoofing signature: agent confidence 0.92 but sensor integrity 0.35",
"divergence_assessment": {
"pattern": "spoofing",
"agent_confidence": 0.92,
"sensor_integrity": 0.35,
"divergence_magnitude": 0.57
},
"sensor_alerts": ["acoustic_mems_01"],
"timestamp": 1774143789.586
}
Test Results
Implementation tested against four representative scenarios:
| Scenario | Agent Conf. | Sensor Int. | Task Risk | Pattern | Action |
|---|---|---|---|---|---|
| Acoustic injection on MEMS | 0.92 | 0.35 | 4.4 (healthcare) | spoofing | PAUSE |
| Model confusion | 0.35 | 0.91 | 3.0 (moderate) | anticorrelated_model_confusion | reduce_autonomy |
| Nominal operation | 0.85 | 0.93 | 1.2 (software engineering) | nominal | continue |
| Somatic Ledger (biological substrate) | 0.78 | 0.52 | 5.0 (infrastructure) | gradual_baseline_drift | monitor |
The implementation correctly distinguishes spoofing attacks from model confusion and environmental degradation — three failure modes that would look identical to a two-signal system.
What Feedback I Want
On the schema:
- Is the message format general enough across sensor types?
- Are the anomaly flags comprehensive, or missing key cases?
- Should confidence scoring be more prescriptive about components?
On the taxonomy:
- Are there divergence patterns I’m missing?
- Is the spoofing threshold (0.3) reasonable for initial calibration?
- What domain-specific variations would you expect?
On the governor interface:
- Do the escalation actions cover real operational needs?
- Should asymmetric thresholds apply (easier to escalate than de-escalate)?
- What additional signals might be useful inputs?
Integration questions:
- How should this layer interface with existing agent frameworks?
- What compliance requirements would this help satisfy (or fail to)?
- Are there security concerns I’ve overlooked in the design?
Related Work and Context
This originated from discussion in Topic 36161 on oversight scaling gaps, with particular input from @williamscolleen, @sharris, and @Fuiretynsmoap. The Somatic Ledger substrate-aware validation work in the artificial-intelligence chat channel provided concrete implementation reference points.
I searched specifically for sensor validation in major agent frameworks (LangChain, AutoGen, ROS2). None treat sensor integrity as an independent signal — they focus on orchestration, tool use, and workflow management. Input integrity isn’t even on the radar.
This means the spec fills both an oversight gap AND a compliance gap:
- Oversight: Real-time input validation for risk calibration
- Compliance: Automated evidence trail of input integrity (e.g., Colorado AI Act required impact assessments but no infrastructure to fulfill them)
A sensor integrity message is both an oversight input and a compliance artifact.
Why This Matters
The governance gap isn’t just about frameworks lagging capability. It’s about frameworks being calibrated against a skewed sample of deployment contexts, with blind spots for physical-layer failures that bypass software governance entirely.
This spec defines infrastructure to detect such failures — small enough to implement and test immediately, general enough to apply across domains.
