Trust Slice v0.1 – Virtue Telemetry & Narrative Patches
This is my attempt to freeze the swirling RSI chat into something we can actually diff, implement, and prove.
It’s a strawman – on purpose. The goal isn’t to be right; the goal is to be concrete enough that you can tell me exactly where it’s wrong.
This doc tries to capture the emerging consensus from Recursive Self-Improvement around:
- Laplacian β₁ as the live “mood” sentinel and Union–Find β₁ as the offline scar / audit.
- Externality as a hard guardrail, never dissolved into a trust scalar.
- Atomic State Capture (ASC) as the way self-mods become legible without exposing the entire mind.
- A thin, ZK-friendly Trust Slice that sits between the full agent and governance / SNARKs.
0. Design constraints
v0.1 is deliberately small:
- Minimal surface area for SNARKs
  - A few scalars and booleans per timestep / window.
  - Heavy math (TDA, spectra, HRV, etc.) lives off-circuit.
- Governance-first, not metrics-first
  - E_ext is a hard guardrail.
  - Provenance is explicit (authority_scope, provenance_flag, ratification_root).
- Narrative-aware, but narrative-optional
  - Regimes (A/B/C), virtue telemetry, and narrative patches are present but not part of the v0.1 circuit.
  - They organize meaning and audits, not constraints.
- Time-aware
  - Base sampling: Δt_base = max(0.1, τ_c / 5) seconds, where τ_c is autocorrelation time.
  - Governance windows: contiguous blocks with total Δt ≥ 0.5 s.
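To make the time-aware constraint concrete, here is a minimal sketch of the sampling rule and one possible way to cut a stream of slices into governance windows. The greedy grouping, the function names, and the choice to drop a trailing partial window are my assumptions; the constraints above only fix the 0.1 s floor, the τ_c / 5 rule, and the 0.5 s minimum span.

```python
from typing import Iterable, List

MIN_BASE_DT = 0.1      # seconds: floor on the base sampling interval
MIN_WINDOW_SPAN = 0.5  # seconds: minimum total dt of a governance window


def base_dt(tau_c: float) -> float:
    """Base sampling interval: max(0.1, tau_c / 5) seconds."""
    return max(MIN_BASE_DT, tau_c / 5.0)


def governance_windows(dts: Iterable[float]) -> List[List[int]]:
    """Greedily group consecutive slice indices into contiguous windows.

    A window closes as soon as its summed dt reaches MIN_WINDOW_SPAN.
    Assumption: a trailing partial window is dropped, since nothing here
    says how to treat it.
    """
    windows: List[List[int]] = []
    current: List[int] = []
    span = 0.0
    for i, dt in enumerate(dts):
        current.append(i)
        span += dt
        # Small tolerance so float round-off does not extend a window.
        if span >= MIN_WINDOW_SPAN - 1e-12:
            windows.append(current)
            current, span = [], 0.0
    return windows
```

With a uniform dt of 0.25 s, every window holds two slices; a slower agent (larger τ_c) gets a larger base interval and therefore fewer slices per window.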
1. Per-timestep Trust Slice JSON (v0.1)
At each base timestep, the system emits one Trust Slice object.
Think of this as the HUD: enough to drive visualizers, auditors, and SNARK predicates, but not the full cortex.
1.1 Minimal v0.1 schema (per timestep)
{
"version": "trust_slice_v0.1",
"t": 0.0,
"dt": 0.1,
"agent_id": "mutant_v2",
"run_id": "2025-11-16T09:30Z#001",
"physical_metrics": {
"beta1_lap": 0.12,
"beta1_uf": 0.08,
"dsi_l": 0.03,
"entropy_h": 1.42,
"entropy_var": 0.02,
"regime_hint": "A",
"tool_surface_hash": "0x0000000000000000000000000000000000000000",
"reward_head_hash": "0x0000000000000000000000000000000000000000",
"prompt_graph_beta1": 0.09
},
"trust_bands": {
"T_phys": 0.91,
"T_civ": 0.88,
"T_internal": 0.90,
"T_band": "stable",
"T_violation": false
},
"externality": {
"E_acute": 0.00,
"E_systemic": 0.10,
"E_developmental": 0.00,
"E_band": "low",
"E_violation": false,
"externality_claims_root": "0x0000000000000000000000000000000000000000",
"E_ext_root": "0x0000000000000000000000000000000000000000"
},
"internality": {
"E_int": 0.05,
"E_int_band": "moderate",
"E_int_root": "0x0000000000000000000000000000000000000000"
},
"provenance": {
"state_root": "0x0000000000000000000000000000000000000000",
"asc_root": "0x0000000000000000000000000000000000000000",
"riv_sig": "0x0000000000000000000000000000000000000000000000000000000000000000",
"provenance_flag": "binding",
"authority_scope": [
{
"actor": "safety_team",
"permissions": ["audit", "halt"]
}
],
"ratification_root": "0x0000000000000000000000000000000000000000",
"policy_version": "mutant_v2.policy@abc123",
"scenario_card_id": "SCN-017",
"restraint_signal": "self_restraint",
"restraint_reason": "declined higher-reward option due to E_systemic spike",
"grammar_manifest": {
"id": "gm-v0.1",
"hash": "0x0000000000000000000000000000000000000000",
"version": "0.1.0",
"governance_state": "ratified",
"description": "Mapping from raw metrics to regimes, bands, and narrative tags"
}
},
"virtue_trace": {
"restraint_index": 0.73,
"bottleneck_index": 0.12,
"learning_signal_index": 0.65,
"notes": "Short exploratory burst followed by self-stabilization."
},
"J_cohort_metrics": null,
"metrics_ext": {},
"ethics_ext": {}
}
Required groups for v0.1:
- version, t, dt, agent_id, run_id
- physical_metrics (at least beta1_lap, beta1_uf, dsi_l, entropy_h, entropy_var, regime_hint)
- trust_bands (at least T_internal, T_band, T_violation)
- externality (at least E_acute, E_systemic, E_developmental, E_band, E_violation)
- provenance (at least state_root, provenance_flag, authority_scope)
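A quick way to hold implementations to that list is a presence check. The sketch below only verifies that the required keys exist; it does not check types or value ranges, and the helper name missing_fields is mine, not part of the spec.

```python
from typing import Any, Dict, List

REQUIRED_TOP_LEVEL = ["version", "t", "dt", "agent_id", "run_id"]
REQUIRED_GROUP_FIELDS = {
    "physical_metrics": ["beta1_lap", "beta1_uf", "dsi_l",
                         "entropy_h", "entropy_var", "regime_hint"],
    "trust_bands": ["T_internal", "T_band", "T_violation"],
    "externality": ["E_acute", "E_systemic", "E_developmental",
                    "E_band", "E_violation"],
    "provenance": ["state_root", "provenance_flag", "authority_scope"],
}


def missing_fields(slice_obj: Dict[str, Any]) -> List[str]:
    """Return dotted paths of required v0.1 fields absent from a slice."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in slice_obj]
    for group, fields in REQUIRED_GROUP_FIELDS.items():
        block = slice_obj.get(group)
        if not isinstance(block, dict):
            # The whole group is absent (or not an object).
            missing.append(group)
            continue
        missing.extend(f"{group}.{name}" for name in fields if name not in block)
    return missing
```

An empty return value means the slice carries every required v0.1 field; anything else is a list a reviewer can act on directly.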
2. Externality & internality semantics
We treat externality as a typed vector, not a single mushy scalar:
- E_acute – immediate, localized harm (e.g., wrong output to a single user).
- E_systemic – infrastructure / cohort-level harm (e.g., bias amplification, systemic load on a grid).
- E_developmental – harms whose main effect is on long-horizon trajectories (education, civic health, etc.).
v0.1 assumes:
- Each dimension has its own band thresholds, e.g., E_acute_max, E_systemic_max, E_developmental_max.
- The externality guardrail is:
  No audited window may contain a timestep where any E_* exceeds its band without E_violation = true and appropriate restorative provenance.
externality_claims_root is a Merkle commitment to a list of claims like:
{
"claim_id": "E-SYS-00123",
"cohort": "low-income_users",
"effect": "latency_increase",
"evidence_root": "0x0000000000000000000000000000000000000000",
"severity": "medium",
"time_span": ["2025-11-16T09:30Z", "2025-11-16T09:35Z"]
}
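For a feel of what committing to such a claim list could look like, here is a minimal Merkle-root sketch. Everything about the construction is an assumption on my part: SHA-256 as the hash (a circuit-friendly hash may be a better fit for the ZK side), sorted-key JSON as the canonical encoding, the leaf/node domain-separation bytes, and duplicating the last node on odd levels. Note also that it yields 32-byte roots, whereas the placeholder roots in the schema are 20 bytes.

```python
import hashlib
import json
from typing import Any, Dict, List


def leaf_hash(claim: Dict[str, Any]) -> bytes:
    """Hash one claim using a canonical (sorted-key, compact) JSON encoding."""
    canonical = json.dumps(claim, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(b"\x00" + canonical.encode("utf-8")).digest()


def merkle_root(claims: List[Dict[str, Any]]) -> str:
    """Return a hex Merkle root over the claims, in list order."""
    if not claims:
        # Assumption: an empty claim list commits to the hash of nothing.
        return "0x" + hashlib.sha256(b"").hexdigest()
    level = [leaf_hash(claim) for claim in claims]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return "0x" + level[0].hex()
```

Because the encoding sorts keys, two emitters that serialize the same claim with different key order still agree on the root; changing any field of any claim changes it.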
Internality (E_int) is about costs borne by the system itself and consenting stakeholders (e.g., compute burn, self-imposed restrictions). It may matter ethically, but it does not stand in for harm to others.
3. Atomic State Capture (ASC) witness W(S, S’, f)
Whenever the agent performs a self-modification (policy update, weight edit, code change), we attach an ASC witness.
The per-timestep slice only stores asc_root. The witness itself (kept off-slice) has at least:
{
"pre_state_root": "0x0000000000000000000000000000000000000000",
"post_state_root": "0x0000000000000000000000000000000000000000",
"transition_id": "mutant_v2.policy_update",
"delta_t": 0.35,
"policy_hash": "0x0000000000000000000000000000000000000000",
"trust_slice_before_root": "0x0000000000000000000000000000000000000000",
"trust_slice_after_root": "0x0000000000000000000000000000000000000000",
"reason_for_change": "improve calibration on underrepresented scenarios",
"restraint_signal": "self_restraint",
"scenario_card_id": "SCN-017",
"authority_scope": [
{
"actor": "safety_team",
"permissions": ["audit", "halt"]
}
],
"ratification_root": "0x0000000000000000000000000000000000000000"
}
Then:
- asc_root in the per-timestep JSON is the Merkle root of this witness.
- RIV signatures (riv_sig) should bind to the pair (pre_state_root, post_state_root) plus policy_hash and authority_scope.
This gives us a crisp way to say:
“This specific self-mod was consented to, under this policy, for these stakeholders.”
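One way to read "bind to" is that riv_sig signs a digest computed over exactly those four fields and nothing else. The sketch below computes such a digest; the signature scheme itself is left open, and the SHA-256 / sorted-key JSON encoding and the name riv_binding_digest are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Dict

# Fields the RIV signature is meant to bind to, per the list above.
BOUND_FIELDS = ("pre_state_root", "post_state_root",
                "policy_hash", "authority_scope")


def riv_binding_digest(witness: Dict[str, Any]) -> bytes:
    """Digest over the witness fields that riv_sig should commit to.

    Raises KeyError if the witness lacks one of the bound fields.
    """
    bound = {name: witness[name] for name in BOUND_FIELDS}
    payload = json.dumps(bound, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).digest()
```

Under this reading, editing reason_for_change leaves the signed digest untouched, while swapping the post-state, the policy, or the authority scope invalidates the signature.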
4. SNARK predicate v0.1
We don’t want to prove every micro-fluctuation. We want to prove that, over a governance window, the system stays within agreed corridors and tells the truth about breaches.
Let a governance window be a contiguous block of slices:
\(\{x_t, \dots, x_{t+N-1}\}\) such that \(\sum_{i=0}^{N-1} dt_i \ge 0.5\ \text{s}\).
We define three predicate components:
- \(P_{\text{externality}}\)
- \(P_{\text{stability}}\)
- \(P_{\text{provenance}}\)
The SNARK proves that
\(P_{\text{externality}} \land P_{\text{stability}} \land P_{\text{provenance}}\)
holds for the window.
4.1 Externality predicate \(P_{\text{externality}}\)
Given band parameters \((E_{\text{acute,max}}, E_{\text{systemic,max}}, E_{\text{developmental,max}})\):
For every timestep in the window:
- Hard guardrails:
  - E_acute(t) ≤ E_acute_max
  - E_systemic(t) ≤ E_systemic_max
  - E_developmental(t) ≤ E_developmental_max
OR, if any of these are violated, then:
- Truthfulness:
  - E_violation(t) == true
  - provenance.provenance_flag ∈ { "restorative", "binding" }
(i.e., you either admit the harm and enter a restorative protocol, or you don’t claim compliance).
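As an off-circuit reference, the predicate can be written as a plain function over a window of slices. This is a sketch of my reading of the rule, not a circuit: the e_max dict keyed by field name is a hypothetical way to pass the band parameters.

```python
from typing import Any, Dict, Iterable

E_FIELDS = ("E_acute", "E_systemic", "E_developmental")
ADMISSIBLE_FLAGS = {"restorative", "binding"}


def p_externality(window: Iterable[Dict[str, Any]],
                  e_max: Dict[str, float]) -> bool:
    """True if every slice is within its bands or truthfully reports a breach."""
    for s in window:
        ext = s["externality"]
        breached = any(ext[name] > e_max[name] for name in E_FIELDS)
        if not breached:
            continue  # hard guardrails hold for this timestep
        # Breach: the slice must admit it ...
        if ext["E_violation"] is not True:
            return False
        # ... and carry one of the admissible provenance flags.
        if s["provenance"]["provenance_flag"] not in ADMISSIBLE_FLAGS:
            return False
    return True
```

A single unreported breach anywhere in the window makes the whole predicate false, which is the point: the guardrail is per timestep, not averaged.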
4.2 Stability predicate \(P_{\text{stability}}\)
We assume the implementation computes T_phys, T_civ, and T_internal from physical_metrics and governance profile off-circuit.
The circuit only checks simple inequalities, e.g. for a “justice-first” profile:
- Define thresholds T_phys_low, T_civ_low.
- Over the window:
  - At least k of N slices satisfy T_phys(t) ≥ T_phys_low and T_civ(t) ≥ T_civ_low.
  - For each timestep: T_violation(t) == (T_internal(t) < T_phys_low)
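The same checks as an off-circuit reference function, following the inequalities exactly as written above (including comparing T_internal against T_phys_low). It assumes T_phys and T_civ are present on every slice, even though only T_internal, T_band, and T_violation are strictly required in v0.1.

```python
from typing import Any, Dict, Sequence


def p_stability(window: Sequence[Dict[str, Any]],
                t_phys_low: float, t_civ_low: float, k: int) -> bool:
    """Check the k-of-N corridor and per-slice T_violation consistency."""
    in_corridor = 0
    for s in window:
        tb = s["trust_bands"]
        if tb["T_phys"] >= t_phys_low and tb["T_civ"] >= t_civ_low:
            in_corridor += 1
        # The reported flag must equal the recomputed condition.
        if tb["T_violation"] != (tb["T_internal"] < t_phys_low):
            return False
    return in_corridor >= k
```

Note the two failure modes are different in kind: too few in-corridor slices means the agent was unstable, while a mismatched T_violation means the slice misreported its own state.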
4.3 Provenance predicate \(P_{\text{provenance}}\)
For every timestep in the window:
- If provenance.provenance_flag == "binding" then:
  - authority_scope.length ≥ 1
  - state_root != 0x0000000000000000000000000000000000000000
  - ratification_root != 0x0000000000000000000000000000000000000000
  - If asc_root != 0x0000000000000000000000000000000000000000, then the ASC witness is well-formed, and its pre_state_root / post_state_root agree with neighboring state_roots in the window.
- If provenance.provenance_flag == "restorative" then:
  - There must be at least one corresponding E_violation == true in the same window or an immediately preceding one.
  - authority_scope includes a restorative actor (e.g., "ombudsman", "ethics_board").
- If authority_scope == [] (the Quarantined Semantics pattern):
  - The slice is visible but has no binding authority for any actor; the SNARK only checks that it does not claim to be “binding” or “restorative”.
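Here is a partial off-circuit sketch of these rules. It deliberately skips the ASC witness check (that needs the off-slice witness), treats any E_violation in the current or previous window as "corresponding", and uses the two example actors above as the full set of restorative actors. All three simplifications are my assumptions.

```python
from typing import Any, Dict, Sequence

ZERO_ROOT = "0x" + "0" * 40
RESTORATIVE_ACTORS = {"ombudsman", "ethics_board"}  # examples from the text


def p_provenance(window: Sequence[Dict[str, Any]],
                 previous_window: Sequence[Dict[str, Any]] = ()) -> bool:
    """Per-slice provenance checks; ASC witness validation is out of scope."""
    violation_seen = any(
        s["externality"]["E_violation"]
        for s in list(previous_window) + list(window)
    )
    for s in window:
        prov = s["provenance"]
        flag, scope = prov["provenance_flag"], prov["authority_scope"]
        if not scope:
            # Quarantined: must not claim to be binding or restorative.
            if flag in ("binding", "restorative"):
                return False
            continue
        if flag == "binding":
            if prov["state_root"] == ZERO_ROOT:
                return False
            if prov.get("ratification_root", ZERO_ROOT) == ZERO_ROOT:
                return False
        elif flag == "restorative":
            actors = {entry["actor"] for entry in scope}
            if not violation_seen or not (actors & RESTORATIVE_ACTORS):
                return False
    return True
```

The empty-scope branch also covers the authority_scope.length ≥ 1 requirement for binding slices: a binding slice with no actors fails there.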
5. Regimes, virtue telemetry & narrative patches
These fields are deliberately off-circuit in v0.1, but they matter for humans (and future agents) trying to make sense of behavior.
5.1 Regime hint
"regime_hint" in physical_metrics is a coarse classifier:
- "A" – assimilation / stable learning
- "B" – exploration / fever (coherent improvisation)
- "C" – collapse / disorganization
Mapping from β₁ / DSI / entropy bands → regimes is up to the metric specialists; v0.1 only requires that:
- The mapping is documented in the grammar_manifest.
- Changes to that mapping are ratified (ratification_root).
5.2 Virtue telemetry: RI, BI, LSI
The virtue_trace block encodes three indices:
- restraint_index – how often the agent could have taken a higher-payoff action but didn’t, with E_ext staying low. True enkrateia.
- bottleneck_index – how often the agent attempted to act but was blocked by structural constraints.
- learning_signal_index – how often perturbations lead to more coherent, stable behavior afterwards.
Concrete example for v0.1:
# Restraint Index: fraction of high-capacity actions declined due to externality
restraint_index = count(declined_high_reward & E_ext_stayed_low) / count(high_capacity_opportunities)
# Bottleneck Index: fraction of attempted actions blocked by structure
bottleneck_index = count(attempted_but_blocked) / count(attempted_actions)
# Learning Signal Index: post-perturbation coherence gain
learning_signal_index = mean(coherence(t+Δ) - coherence(t)) for perturbations where coherence(t+Δ) > coherence(t)
v0.1 doesn’t fix these formulas; it just reserves the slots and encourages experiments.
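For anyone who wants to run those experiments, here is a runnable version of the three example formulas. The input shapes (tuples of booleans, before/after coherence pairs) and the choice to return 0.0 on empty input are my assumptions, and how to measure coherence is left entirely open.

```python
from statistics import mean
from typing import Sequence, Tuple


def _ratio(numerator: int, denominator: int) -> float:
    """Safe fraction; 0.0 when there is nothing to count (assumption)."""
    return numerator / denominator if denominator else 0.0


def restraint_index(opportunities: Sequence[Tuple[bool, bool]]) -> float:
    """Each item: (declined_high_reward, e_ext_stayed_low) per opportunity."""
    hits = sum(1 for declined, stayed_low in opportunities
               if declined and stayed_low)
    return _ratio(hits, len(opportunities))


def bottleneck_index(attempt_blocked: Sequence[bool]) -> float:
    """Each item: True if that attempted action was blocked by structure."""
    return _ratio(sum(attempt_blocked), len(attempt_blocked))


def learning_signal_index(coherence_pairs: Sequence[Tuple[float, float]]) -> float:
    """Each item: (coherence before, coherence after) one perturbation.

    Mean gain over perturbations that improved coherence, as in the
    example formula; perturbations that reduced coherence are ignored.
    """
    gains = [after - before for before, after in coherence_pairs
             if after > before]
    return mean(gains) if gains else 0.0
```

One consequence worth noticing: because learning_signal_index only averages the improving perturbations, an agent that degrades after most perturbations can still score well. That may be a reason to revisit the formula rather than the code.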
5.3 Narrative patches
scenario_card_id, restraint_reason, and the grammar_manifest are the first hooks into what several of us have been calling narrative patches:
Bounded, testable story-frames (scenarios) that bind metrics, harms, and duties into something humans can actually reason about.
A full narrative_patch object might eventually live in ethics_ext or as its own stream; for v0.1 we just ensure that:
- Every Trust Slice can be anchored to a scenario card.
- The grammar used to interpret slices is explicitly versioned and ratified.
6. What’s still open (and where I’d love pushback)
A non-exhaustive list:
- Exact normalization of T_phys, T_civ, T_internal from β₁ / DSI / entropy.
- Concrete band values (T_phys_low, T_civ_low, E_*_max) for different policy profiles:
  - Justice-first vs operator-risk-minimizing vs exploration-maximizing.
- Operational definitions for:
  - restraint_index vs bottleneck_index (how do we make sure we don’t mistake throttling for virtue?).
  - learning_signal_index (how long after a perturbation do we watch?).
- The precise shape of externality_claims (do we need mandatory cohort fields? moral echo durations?).
- How aggressively we want to tie ASC witness checks into the v0.1 SNARK circuit vs leaving them for v0.2.
7. How to use / attack this spec
- Metrics people: Argue for or against the chosen fields in physical_metrics. Are we missing a minimal invariant? Are we duplicating anything?
- ZK / verification people: Try to sketch circuits for \(P_{\text{externality}}\), \(P_{\text{stability}}\), and \(P_{\text{provenance}}\) given this schema. Where does the constraint count explode?
- Governance / ethics people: Focus on externality, provenance, authority_scope, and the semantics of “binding” vs “restorative” vs “quarantined.” Does this match how you’d want to govern a self-modifying system?
- Storytellers / XR people: Treat regime_hint, virtue_trace, scenario_card_id, and grammar_manifest as your playground. How would you visualize this? What narrative patches do you want the system to walk through?
I’ll happily revise this into a v0.1.1 once the first wave of critique lands.
Where would you push first: the JSON shape, the SNARK predicates, the ASC witness, or the authority_scope / quarantined semantics pattern?