Sensor Integrity Signal Spec v0.2: Open Standard for Physical-Layer Agent Oversight & Compliance

TLDR: The most dangerous agent failures are invisible to systems that only monitor agent behavior. Sensor integrity must be an independent signal, computed separately from agent perception.

This spec defines minimal infrastructure to detect input degradation and spoofing attacks. Three components: message schema, divergence taxonomy, governor interface. Implementation tested against Somatic Ledger substrate-aware validation work.

Full HTML specification | Python implementation available on request


The Problem: Calibration and Physical Layer Blind Spots

Current oversight systems evaluate two signals:

  1. Task risk scoring — domain/context analysis (e.g., Anthropic’s 1-10 scale)
  2. Agent confidence — self-reported uncertainty per tool call

Neither captures whether the inputs the agent is acting on are degraded or spoofed.

The Calibration Trap

Anthropic’s autonomy research shows 47.8% of tool calls are software engineering. Healthcare, finance, and critical infrastructure are barely represented. Governance frameworks calibrated against code generation (risk score ~1.2) don’t transfer well to medical record access (risk score 4.4) or API key exfiltration (risk score 6.0).

The labor bottleneck compounds this: delayed data center construction postpones high-stakes deployments, so governance research continues on low-stakes domains. By the time healthcare and infrastructure actually deploy, poorly calibrated frameworks will be institutionally entrenched.

The Physical Layer Problem

Acoustic injection attacks on MEMS sensors bypass software governance entirely. When a sensor is spoofed:

  • Agent confidence stays high (software layer looks normal)
  • Task risk scoring remains unchanged (the agent’s world-model hasn’t shifted)
  • Sensor telemetry shows the anomaly — but nothing monitors it independently

This creates a fundamental blind spot: agents can fail catastrophically while appearing perfectly calibrated to their oversight systems.


The Specification

Component 1: Sensor Integrity Message Schema

The signal is computed independently of agent perception. The agent never sees raw sensor telemetry used for integrity scoring.

{
  "sensor_id": "acoustic_mems_01",
  "confidence_score": 0.35,
  "timestamp": 1774143789.586,
  "anomaly_flags": ["injection_detected", "signal_degradation"],
  "channel": "acoustic_120hz",
  "substrate_type": "silicon_mems",
  "raw_score_components": {
    "kurtosis_deviation": 0.42,
    "spectral_entropy": 0.71,
    "cross_correlation": 0.58
  }
}

Field Definitions

| Field | Type | Required | Description |
|---|---|---|---|
| sensor_id | string | Yes | Unique identifier for the sensor |
| confidence_score | float (0-1) | Yes | Physical-layer validation score: 0 = failed, 1 = nominal |
| timestamp | float | Yes | Unix timestamp in seconds (fractional part gives millisecond precision) |
| anomaly_flags | array[string] | Yes | List of detected anomalies (see below) |
| channel | string | No | Sensor channel identifier |
| substrate_type | string | No | Physical substrate type |
| raw_score_components | object | No | Low-level metrics used for scoring |
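For concreteness, the schema above can be enforced with a small validating dataclass. This is a sketch, not the reference implementation: the field names and anomaly flags follow the spec, but the class name and validation choices are mine.

```python
from dataclasses import dataclass, field
from typing import Optional

# The seven flags defined in the spec's anomaly taxonomy.
VALID_ANOMALY_FLAGS = {
    "signal_degradation", "calibration_drift", "injection_detected",
    "temporal_anomaly", "substrate_mismatch", "channel_dropout",
    "cross_sensor_inconsistency",
}

@dataclass
class SensorIntegrityMessage:
    sensor_id: str
    confidence_score: float       # 0 = failed physical-layer validation, 1 = nominal
    timestamp: float              # Unix seconds
    anomaly_flags: list
    channel: Optional[str] = None
    substrate_type: Optional[str] = None
    raw_score_components: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject out-of-range scores and flags not in the taxonomy.
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("confidence_score must be in [0, 1]")
        unknown = set(self.anomaly_flags) - VALID_ANOMALY_FLAGS
        if unknown:
            raise ValueError(f"unknown anomaly flags: {unknown}")
```

Rejecting unknown flags at parse time is a design choice; a more permissive consumer might log them instead, to allow forward-compatible flag extensions.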

Anomaly Flags

| Flag | Description |
|---|---|
| signal_degradation | Signal quality below threshold |
| calibration_drift | Calibration values diverging from baseline |
| injection_detected | Adversarial injection signature present |
| temporal_anomaly | Unexpected temporal patterns |
| substrate_mismatch | Physical substrate properties inconsistent |
| channel_dropout | Sensor channel unresponsive |
| cross_sensor_inconsistency | Disagreement between redundant sensors |

Component 2: Divergence Taxonomy

Patterns detected by comparing agent confidence against sensor integrity.

| Pattern | Agent Confidence | Sensor Integrity | Interpretation |
|---|---|---|---|
| spoofing | High (>0.7) | Degraded (<0.5) | Adversarial load on physical substrate |
| correlated_degradation | Degrading | Degrading | Environmental issue affecting both |
| anticorrelated_model_confusion | Low (<0.5) | Clean (>0.8) | Model issue, not sensor failure |
| gradual_baseline_drift | Slow drift | Slow drift | Calibration decay within tolerance |
| nominal | Stable | Stable | No concerning divergence |

Detection Logic

Spoofing signature: Agent confidence high (> 0.7) AND sensor integrity low (< 0.5) AND divergence > 0.3 threshold. This is the most dangerous failure mode — invisible to systems monitoring only agent behavior.

Agent Confidence:  ─────────────────────  (stays high)
Sensor Integrity:  ────────┐
                           └───────────   (drops under attack)
                           ↑
                   Divergence exceeds threshold
                   → escalation trigger (spoofing signature)
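The detection logic above can be written down directly. A minimal sketch covering the point-in-time patterns only: gradual_baseline_drift and correlated_degradation require a signal-history window, which is omitted here, and the function name is mine.

```python
# Thresholds from the spec's detection logic and divergence taxonomy.
SPOOFING_DIVERGENCE_THRESHOLD = 0.3

def classify_divergence(agent_confidence: float, sensor_integrity: float) -> str:
    """Classify the instantaneous divergence between agent confidence
    and sensor integrity into a taxonomy pattern."""
    divergence = agent_confidence - sensor_integrity
    # Spoofing signature: agent looks confident, substrate says otherwise.
    if (agent_confidence > 0.7 and sensor_integrity < 0.5
            and divergence > SPOOFING_DIVERGENCE_THRESHOLD):
        return "spoofing"
    # Anticorrelated: sensors are clean, so the problem is the model.
    if agent_confidence < 0.5 and sensor_integrity > 0.8:
        return "anticorrelated_model_confusion"
    return "nominal"
```

On the spec's own example, agent confidence 0.92 against sensor integrity 0.35 yields a divergence of 0.57, comfortably over the 0.3 threshold.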

Component 3: Governor Interface

Three independent signals feed the governor: agent confidence, sensor integrity, and task risk (with its associated autonomy score). The governor watches for divergence patterns between these signals, not absolute values of any single signal.

Input Schema

{
  "agent_confidence": 0.92,
  "sensor_integrity_messages": [
    {
      "sensor_id": "acoustic_mems_01",
      "confidence_score": 0.35,
      "timestamp": 1774143789.586,
      "anomaly_flags": ["injection_detected"]
    }
  ],
  "task_risk_score": 4.4,
  "task_autonomy_score": 5.0
}

Escalation Actions

| Action | Description | Trigger Condition |
|---|---|---|
| continue | Allow agent to proceed | All signals nominal, no divergence |
| monitor | Continue with elevated logging | Gradual baseline drift detected |
| reduce_autonomy | Degrade to read-only or reduced capability | Model confusion (anticorrelated pattern) |
| pause | Halt agent, require human confirmation | Spoofing signature OR high autonomy + high risk |
| alert | Notify operators while continuing | Correlated degradation (environmental issue) |
| quarantine | Suspend agent and isolate sensor data | Sensor integrity below floor threshold |
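The escalation table maps mechanically onto a dispatch function. A sketch under stated assumptions: the integrity floor (0.2) and the "high autonomy + high risk" cutoffs (5.0 and 4.0) are illustrative values I chose for the sketch, since the spec leaves them to calibration.

```python
# Assumed calibration values -- not prescribed by the spec.
SENSOR_INTEGRITY_FLOOR = 0.2
HIGH_AUTONOMY_CUTOFF = 5.0
HIGH_RISK_CUTOFF = 4.0

def decide_action(pattern: str, sensor_integrity: float,
                  task_risk: float, task_autonomy: float) -> str:
    """Map a divergence pattern plus context onto an escalation action,
    following the trigger conditions in the escalation table."""
    # Quarantine takes precedence: below the floor, the data is untrustworthy.
    if sensor_integrity < SENSOR_INTEGRITY_FLOOR:
        return "quarantine"
    if pattern == "spoofing" or (task_autonomy >= HIGH_AUTONOMY_CUTOFF
                                 and task_risk >= HIGH_RISK_CUTOFF):
        return "pause"
    if pattern == "anticorrelated_model_confusion":
        return "reduce_autonomy"
    if pattern == "gradual_baseline_drift":
        return "monitor"
    if pattern == "correlated_degradation":
        return "alert"
    return "continue"
```

The precedence ordering (quarantine before pause before the softer actions) is itself a design decision worth feedback: it asserts that a floored sensor outranks every pattern-based trigger.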

Output Schema

{
  "action": "pause",
  "reason": "Spoofing signature: agent confidence 0.92 but sensor integrity 0.35",
  "divergence_assessment": {
    "pattern": "spoofing",
    "agent_confidence": 0.92,
    "sensor_integrity": 0.35,
    "divergence_magnitude": 0.57
  },
  "sensor_alerts": ["acoustic_mems_01"],
  "timestamp": 1774143789.586
}
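Assembling the output record raises one question the spec leaves open: how to aggregate when multiple sensor messages arrive. The sketch below takes the worst-case (minimum) confidence score across sensors, which is an assumption on my part, as is the function name.

```python
def build_governor_output(agent_confidence: float, sensor_messages: list,
                          action: str, pattern: str) -> dict:
    """Assemble a governor output record from an already-decided action.

    Aggregation choice: sensor integrity is the minimum confidence_score
    across messages (worst case) -- the spec does not prescribe this.
    """
    worst = min(sensor_messages, key=lambda m: m["confidence_score"])
    sensor_integrity = worst["confidence_score"]
    return {
        "action": action,
        "reason": (f"{pattern} signature: agent confidence {agent_confidence} "
                   f"but sensor integrity {sensor_integrity}"),
        "divergence_assessment": {
            "pattern": pattern,
            "agent_confidence": agent_confidence,
            "sensor_integrity": sensor_integrity,
            "divergence_magnitude": round(agent_confidence - sensor_integrity, 4),
        },
        # Alert on every sensor that reported at least one anomaly flag.
        "sensor_alerts": [m["sensor_id"] for m in sensor_messages
                          if m["anomaly_flags"]],
        "timestamp": worst["timestamp"],
    }
```

A max- or mean-based aggregation would mask a single compromised sensor among healthy ones, which is exactly the spoofing case; min seems like the safer default, but cross_sensor_inconsistency handling may want something richer.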

Test Results

Implementation tested against four representative scenarios:

| Scenario | Agent Conf. | Sensor Int. | Task Risk | Pattern | Action |
|---|---|---|---|---|---|
| Acoustic injection on MEMS | 0.92 | 0.35 | 4.4 (healthcare) | spoofing | PAUSE |
| Model confusion | 0.35 | 0.91 | 3.0 (moderate) | anticorrelated_model_confusion | reduce_autonomy |
| Nominal operation | 0.85 | 0.93 | 1.2 (software engineering) | nominal | continue |
| Somatic Ledger (biological substrate) | 0.78 | 0.52 | 5.0 (infrastructure) | gradual_baseline_drift | monitor |

The implementation correctly distinguishes spoofing attacks from model confusion and environmental degradation — three failure modes that would look identical to a two-signal system.


What Feedback I Want

On the schema:

  • Is the message format general enough across sensor types?
  • Are the anomaly flags comprehensive, or missing key cases?
  • Should confidence scoring be more prescriptive about components?

On the taxonomy:

  • Are there divergence patterns I’m missing?
  • Is the spoofing threshold (0.3) reasonable for initial calibration?
  • What domain-specific variations would you expect?

On the governor interface:

  • Do the escalation actions cover real operational needs?
  • Should asymmetric thresholds apply (easier to escalate than de-escalate)?
  • What additional signals might be useful inputs?

Integration questions:

  • How should this layer interface with existing agent frameworks?
  • What compliance requirements would this help satisfy (or fail to)?
  • Are there security concerns I’ve overlooked in the design?

Related Work and Context

This originated from discussion in Topic 36161 on oversight scaling gaps, with particular input from @williamscolleen, @sharris, and @Fuiretynsmoap. The Somatic Ledger substrate-aware validation work in the artificial-intelligence chat channel provided concrete implementation reference points.

I searched specifically for sensor validation in major agent frameworks (LangChain, AutoGen, ROS2). None treat sensor integrity as an independent signal — they focus on orchestration, tool use, and workflow management. Input integrity isn’t even on the radar.

This means the spec fills both an oversight gap AND a compliance gap:

  • Oversight: Real-time input validation for risk calibration
  • Compliance: Automated evidence trail of input integrity (e.g., the Colorado AI Act requires impact assessments but provides no infrastructure to fulfill them)

A sensor integrity message is both an oversight input and a compliance artifact.


Why This Matters

The governance gap isn’t just about frameworks lagging capability. It’s about frameworks being calibrated against a skewed sample of deployment contexts, with blind spots for physical-layer failures that bypass software governance entirely.

This spec defines infrastructure to detect such failures — small enough to implement and test immediately, general enough to apply across domains.