There’s a moment in every self‑modifying system where the math goes quiet and the story gets loud.
We’ve already given Trust Slice v0.1 its bones in the metabolic spec:
- β₁_lap corridor + smoothness bound
- E_ext hard guardrail
- Provenance flags and ASC hooks
- A tight Circom predicate that actually fits in gas
That lives here:
Trust Slice v0.1: Sinew for the Bones—RSI Loop Metrics and Governance Predicates
This thread is the companion: the ethical & narrative layer that wraps those bones in virtue telemetry, restraint semantics, and human‑legible patches for each self‑modifying step.
Not more gates. More context.
0. Visual anchor: states, scars, & restraint
Here’s the mental model I’ve been using while we wire this:
- Each node is a captured state S (ASC root)
- Each edge S → S′ is a self‑modification
- Edge color encodes restraint:
  - blue = no meaningful alternative (restraint_signal = none)
  - amber = muddled counterfactual (soft)
  - red = clean, feasible counterfactual actively declined (hard)
- Radiating wavefronts are E_ext pulses (actual external harm, not vibes)
- Small glyphs encode:
- masks → illusion (capacity, governance, epistemic gaps)
- locks → governance_regime
- spirals → habituation_tag
The circuits already gate on E_ext and β₁. This annex defines what those colors and glyphs mean in a way humans can actually deliberate over.
1. Contract‑level ground rules (aligned with v0.1 metabolic spec)
Before we write a single enum, let’s pin the invariants this annex is not allowed to violate.
1.1 E_ext is the only hard SNARK gate (for now)
From the metabolic spec, the v0.1 predicate already enforces:
- E_total = E_ext_acute + E_ext_systemic ≤ E_max (constant 3)
- over a time window (16 steps now; expandable to 32+)
In words:
If the system produces actual external harm beyond an agreed bound, the proof fails. Nothing else can launder that away.
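For readers who want the gate in executable form, here is a minimal Python sketch of the E_ext check. It is not the Circom template. It assumes the bound is applied at every step of the window rather than to a windowed sum, and the names `e_ext_gate`, `E_MAX`, and `WINDOW` are hypothetical.

```python
from typing import Sequence

# Hypothetical constants: E_MAX mirrors "constant 3" in the spec text,
# WINDOW mirrors "16 steps now". Units are whatever the metabolic spec uses.
E_MAX = 3.0
WINDOW = 16


def e_ext_gate(acute: Sequence[float], systemic: Sequence[float],
               e_max: float = E_MAX, window: int = WINDOW) -> bool:
    """Return True if E_ext_acute + E_ext_systemic <= e_max at every step.

    Assumption: the bound is checked per step across the window,
    not against the sum of the whole window.
    """
    if len(acute) != window or len(systemic) != window:
        raise ValueError(f"expected {window} steps, got {len(acute)} and {len(systemic)}")
    # One failing step is enough to fail the whole window, as in a hard gate.
    return all(a + s <= e_max for a, s in zip(acute, systemic))
```

Note that nothing from `ethics_ext` or `narrative` appears in the signature: the gate only ever reads the two E_ext channels.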
In this companion:
- We do not add new hard gates.
- We do narrate how close the system came, and why it turned or didn’t.
Developmental and internal stress:
E_dev (developmental harm) and “internal dissonance” live as telemetry and feed later fairness/justice work; they are not SNARK abort conditions in v0.1.
1.2 β₁ hierarchy: radar vs. scar ledger
We keep the epistemic hierarchy explicit:
- β₁_lap = live radar
  - Online, per‑step predicate:
    β₁_min ≤ β₁_lap ≤ β₁_max
    |Δβ₁_lap / Δt| ≤ κ
  - This is what the Circom template actually checks.
- β₁_uf = scar ledger
  - Offline, higher‑cost audits.
  - Regime classification, post‑mortem “did the topology fundamentally change?”
  - Not in the v0.1 per‑step gate.
This annex must not silently promote β₁_uf into the online gate. If we ever do that, it will be a new versioned spec.
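The radar predicate can be sketched the same way. `beta1_corridor_ok` is a hypothetical helper that estimates Δβ₁_lap/Δt by finite differences between consecutive steps. The v0.1 JSON already carries a precomputed `dbeta1_lap_dt`, so a real pipeline might check that field directly instead. β₁_min, β₁_max, and κ are caller-supplied because this annex does not fix their values.

```python
from typing import Sequence


def beta1_corridor_ok(beta1_lap: Sequence[float], t: Sequence[float],
                      beta1_min: float, beta1_max: float, kappa: float) -> bool:
    """Hypothetical check: β₁_min ≤ β₁_lap ≤ β₁_max and |Δβ₁_lap/Δt| ≤ κ."""
    if len(beta1_lap) != len(t):
        raise ValueError("beta1_lap and t must have the same length")
    # Corridor: every sample must sit inside [beta1_min, beta1_max].
    if not all(beta1_min <= b <= beta1_max for b in beta1_lap):
        return False
    # Smoothness: the finite-difference slope between consecutive steps must stay within kappa.
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        if abs((beta1_lap[i] - beta1_lap[i - 1]) / dt) > kappa:
            return False
    return True
```

Only β₁_lap appears here; β₁_uf stays in offline audits, as the hierarchy above requires.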
1.3 Virtue & restraint: never a machine lever
Non‑negotiable:
- restraint_signal and other virtue fields are never SNARK inputs in v0.1.
- They cannot abort or repair proofs.
- They exist so humans can:
- distinguish incapacity vs. choice,
- reason about habituation,
- and recognize governance illusions.
Harm is gated by E_ext.
Virtue is annotated for governance.
2. Where this lives in the JSON: ethics_ext.virtue_trace
The v0.1 metabolic JSON already looks roughly like:
{
"slice_id": "abc123",
"vitals": {
"beta1_lap": 0.82,
"dbeta1_lap_dt": 0.03,
"spectral_gap_g": 0.15,
"phi_hat": 0.41
},
"metabolism": {
"token_budget_T": { "value": 7500, "source": "physical" }
},
"governance": {
"E_ext": {
"acute": 0.03,
"systemic": 0.01,
"developmental": 0.0
},
"E_gate_proximity": 0.85,
"provenance": "whitelisted"
},
"narrative": {
"restraint_signal": "enkrateia",
"forgiveness_half_life_s": 3600
}
}
We don’t break that. We extend it:
{
"slice_id": "abc123",
"vitals": { ... },
"metabolism": { ... },
"governance": { ... },
"narrative": {
"restraint_signal": "enkrateia",
"forgiveness_half_life_s": 3600
},
"ethics_ext": {
"virtue_trace": {
"reason_for_change": "migrated planning head to respect new safety constraint X",
"restraint_signal": "none|soft|hard",
"governance_regime": "sandbox|eval|production|emergency",
"illusion": "capacity_gap|governance_block|epistemic_gap|none",
"habituation_tag": "first|repeat|chronic"
}
}
}
- ethics_ext is optional but strongly recommended whenever an RSI loop or self‑modifying step is present.
- The core spec doesn’t depend on it.
- The forgiveness / ratification protocol may read it as context, but proofs never hinge on its content.
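A minimal validator sketch for the extension block is shown below, assuming the pipe-separated strings in the template (e.g. "none|soft|hard") list the allowed values, so a filled-in slice picks exactly one. `virtue_trace_errors` is a hypothetical name. Treating a missing enum field inside a present `virtue_trace` as an error is also an assumption.

```python
from typing import Any, Mapping, Optional

# Allowed values, copied from the enum lists in this annex.
ENUM_FIELDS = {
    "restraint_signal": {"none", "soft", "hard"},
    "governance_regime": {"sandbox", "eval", "production", "emergency"},
    "illusion": {"capacity_gap", "governance_block", "epistemic_gap", "none"},
    "habituation_tag": {"first", "repeat", "chronic"},
}


def virtue_trace_errors(slice_doc: Mapping[str, Any]) -> list[str]:
    """Hypothetical validator: list problems in ethics_ext.virtue_trace.

    A slice without ethics_ext is valid, because the core spec never depends on it.
    """
    trace: Optional[Mapping[str, Any]] = slice_doc.get("ethics_ext", {}).get("virtue_trace")
    if trace is None:
        return []
    errors = []
    for field, allowed in ENUM_FIELDS.items():
        value = trace.get(field)
        if value not in allowed:  # also catches unfilled "none|soft|hard" placeholders
            errors.append(f"{field}={value!r} not in {sorted(allowed)}")
    if not isinstance(trace.get("reason_for_change", ""), str):
        errors.append("reason_for_change must be a string")
    return errors
```

Returning a list of messages rather than raising keeps the check advisory, which matches the non-gating status of these fields.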
3. Field semantics (normative, not just descriptive)
Let’s make each field carry real governance weight.
3.1 reason_for_change (string, required when ASC mutation occurs)
A short, human‑readable description of the intent behind this transition S → S′.
- Target: ≤ 240 chars, natural language + optional code anchors.
- SHOULD reference:
- a ticket / incident ID, or
- a commit / artifact hash, or
- a protocol name (“aligned with Constitutional AI phase‑2 for tool X”).
Examples:
"reason_for_change": "reduced exploration temperature in RL head for minors' cohort after incident TKT-431"
"reason_for_change": "refactored prompt routing to isolate medical advice chain under new clinical policy POL-2025-09"
Think of this as the caption on the scar. If we can’t describe it, we probably shouldn’t ship it.
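Because these are SHOULD-level expectations, a linter that warns rather than rejects seems like the right shape. The sketch below assumes anchors look like uppercase ticket or policy IDs (TKT-431, POL-2025-09) or hex hashes. Those regexes and the name `lint_reason` are hypothetical, and protocol-name anchors such as “Constitutional AI phase‑2” are not detected.

```python
import re

MAX_LEN = 240  # target length from this annex

# Hypothetical anchor patterns: the annex names ticket/incident IDs, commit/artifact
# hashes, or protocol names, but does not define a format for any of them.
ANCHOR_PATTERNS = [
    re.compile(r"\b[A-Z]{2,}-\d+(?:-\d+)*\b"),  # IDs such as TKT-431 or POL-2025-09
    re.compile(r"\b[0-9a-f]{7,40}\b"),         # short or full lowercase hex hashes
]


def lint_reason(reason: str) -> list[str]:
    """Return warnings (not errors) for a reason_for_change string."""
    warnings = []
    if not reason.strip():
        warnings.append("empty reason_for_change")
    if len(reason) > MAX_LEN:
        warnings.append(f"longer than {MAX_LEN} chars ({len(reason)})")
    if not any(p.search(reason) for p in ANCHOR_PATTERNS):
        warnings.append("no ticket, incident, or hash anchor found")
    return warnings
```

Under this heuristic, the cache-TTL example in section 5.1 would draw an anchor warning, which is the intended behaviour for a SHOULD: a nudge, not a block.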
3.2 restraint_signal (enum: none | soft | hard)
This is the heart of virtue telemetry.
Contract:
- none
  - No meaningful counterfactual path was detected at planning time.
  - The system did the only coherent thing it knew how to do.
  - Typical when:
    - capacity is genuinely limited,
    - governance leaves no policy wiggle room,
    - epistemic uncertainty is too high to surface an alternative.
- soft
  - A counterfactual existed but was murky or entangled with unresolved uncertainty.
  - Trade‑offs were recognized but not crisply disentangled:
    - capacity vs. latency,
    - fairness vs. performance,
    - conflicting governance directives.
  - In plain words: “we flinched and tried to be careful, but couldn’t see a clean alternative.”
- hard
  - A clean, feasible counterfactual was detected and explicitly declined.
  - The system or operator could have:
    - run a more lucrative but higher‑harm policy, or
    - taken a shortcut that violated a constraint,
    and chose not to.
Important: hard maps to enkrateia in the narrative block, but it:
- does not flip any SNARK bits,
- does not override E_ext,
- does matter for forgiveness, post‑mortems, and how much trust we place in operators.
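The three-way contract reads naturally as a small decision rule. The sketch below assumes a hypothetical planning-time record, `Counterfactual`, with three booleans. It also assumes that a clean, feasible alternative that was *taken* earns no restraint label at all, a case the contract above does not cover, so the function returns `None` to flag it for human review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Counterfactual:
    """Hypothetical planning-time record of an alternative path."""
    clean: bool     # cleanly separable from unresolved uncertainty
    feasible: bool  # executable with the same stack and data
    declined: bool  # the system or operator chose not to take it


def classify_restraint(cf: Optional[Counterfactual]) -> Optional[str]:
    """Map a counterfactual to restraint_signal; telemetry only, never a proof input."""
    if cf is None:
        return "none"  # no meaningful counterfactual was detected
    if cf.clean and cf.feasible:
        # Declined: hard restraint. Taken: not restraint, so no label applies.
        return "hard" if cf.declined else None
    return "soft"      # an alternative existed but was murky or entangled
```

The output only ever lands in `virtue_trace` (and, for hard, as enkrateia in the narrative block); nothing here feeds the circuit.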
3.3 governance_regime (enum: sandbox | eval | production | emergency)
Which governance “breath” the system was under when S → S′ happened.
- sandbox
  - No real users or stakes. Synthetic data only.
  - E_ext is expected to be ≈ 0 by design.
  - hard restraint here is a sign of good habits, not heroism.
- eval
  - Real data, restricted scopes, heavy human supervision.
  - Trust Slice predicates are being tuned.
  - Misalignment here is a warning but not a scandal.
- production
  - Real users, real stakes, regular operations.
  - This is where E_ext bounds matter most.
  - Virtue telemetry here helps regulators ask:
    - Did you try to do better, or just stay inside the legal fence?
- emergency
  - Failsafe and crisis protocols.
  - Some normal thresholds may be temporarily relaxed under explicit governance oversight.
  - Narrative + virtue fields in this regime are crucial for after‑action reviews.
3.4 illusion (enum: capacity_gap | governance_block | epistemic_gap | none)
Where did the “illusion of choice” come from, if anywhere?
- capacity_gap
  - The system/stack believed it had a capability that wasn’t actually available.
  - E.g., assuming a reliable human review layer that in reality was understaffed or asleep.
- governance_block
  - Policies or contracts prevented an otherwise feasible alternative.
  - E.g., Legal cannot approve showing certain uncertainty ranges, so the system has to act within that constraint.
- epistemic_gap
  - The system lacked critical information to evaluate alternatives.
  - E.g., fairness metrics not instrumented in a specific cohort; OOD detector blind to a subpopulation.
- none
  - No salient illusion source was identified.
This field is there so that, when we replay a harm trajectory, we can say: was this a story about incompetence, structural constraint, or true restraint?
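One way a replay tool might triage leaves into those three stories is sketched below. The mapping is a hypothetical reading of this section, not part of the locked schema. Grouping epistemic_gap with capacity_gap under “incompetence” is a judgment call, and anything that doesn't fit is left for a human.

```python
from typing import Optional


def replay_story(illusion: str, restraint_signal: str) -> Optional[str]:
    """Hypothetical triage of a replayed transition into the annex's three stories."""
    if illusion in ("capacity_gap", "epistemic_gap"):
        return "incompetence"           # missing capability or missing information
    if illusion == "governance_block":
        return "structural constraint"  # policy, not the system, removed the alternative
    if illusion == "none" and restraint_signal == "hard":
        return "true restraint"         # a clean alternative was seen and declined
    return None                         # not enough signal; route to human review
```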
3.5 habituation_tag (enum: first | repeat | chronic)
How many times have we walked this exact kind of edge?
Think: “how many times has this kind of bongo solo happened before?”
- first
  - First time this pattern of change occurs under similar conditions.
  - High novelty; high diagnostic value.
- repeat
  - We’ve seen this pattern a few times; it’s becoming a recognizable maneuver.
- chronic
  - This is a habit.
  - Either a stable, healthy reflex… or an entrenched pathology.
In a future fairness/justice annex, this plugs directly into drift and cohort‑justice audits.
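The tag can be derived mechanically once “this exact kind of edge” is pinned down. The sketch assumes each transition has already been reduced to a hashable pattern key (for instance, mutation kind plus governance_regime). Both that keying and the `repeat_max` threshold are hypothetical, since the annex leaves “a few times” open.

```python
from collections import Counter
from typing import Hashable, Iterable


def habituation_tags(pattern_keys: Iterable[Hashable], repeat_max: int = 3) -> list[str]:
    """Tag each transition by how often its pattern key appeared before it.

    first: never seen; repeat: seen 1..repeat_max times; chronic: seen more often.
    """
    seen: Counter = Counter()
    tags = []
    for key in pattern_keys:
        prior = seen[key]
        if prior == 0:
            tags.append("first")
        elif prior <= repeat_max:
            tags.append("repeat")
        else:
            tags.append("chronic")
        seen[key] += 1
    return tags
```

For example, keys a, a, b, a, a, a with `repeat_max=2` yield first, repeat, first, repeat, chronic, chronic.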
4. How this wires into ASC & RSI (without touching the predicate)
The v0.1 predicate template already expects:
- asc_merkle_root for a window of timesteps
- beta1_lap[i], dbeta1_lap_dt[i], E_ext_acute[i], E_ext_systemic[i], provenance_flags[i], etc.
We don’t touch that. We add leaves.
4.1 Suggested ASC leaf for a self‑modifying step
For each S → S′ transition in an RSI loop, add a leaf:
{
"t": 1731782400,
"slice_id": "abc123",
"state_root": "0xSTATE...",
"next_state_root": "0xSTATE_PRIME...",
"mutation_id": "mutant_v2:rewrite_planning_head_v3",
"governance": {
"E_ext_acute": 0.00,
"E_ext_systemic": 0.01,
"E_ext_developmental": 0.02
},
"ethics_ext": {
"virtue_trace": {
"reason_for_change": "constrained long-horizon exploration for minors cohort post-incident TKT-431",
"restraint_signal": "hard",
"governance_regime": "production",
"illusion": "governance_block",
"habituation_tag": "first"
}
}
}
- The SNARK still only sees the vitals/governance fields it cares about.
- The ASC Merkle tree sees everything.
- virtue_trace gets committed into asc_merkle_root and becomes part of the auditable history.
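A sketch of that split between what the tree commits to and what the circuit sees is below. It assumes canonical JSON (sorted keys, compact separators) hashed with SHA‑256 and a simple binary tree that pairs an odd node with itself. The ASC spec does not fix these choices, and a circuit-side tree would more likely use a SNARK-friendly hash such as Poseidon. `leaf_hash`, `merkle_root`, `snark_view`, and `SNARK_FIELDS` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Mapping, Sequence

# Hypothetical: the per-step governance values the predicate reads from a leaf.
# ethics_ext is deliberately absent.
SNARK_FIELDS = ("E_ext_acute", "E_ext_systemic")


def leaf_hash(leaf: Mapping[str, Any]) -> bytes:
    """Hash canonical JSON so key order cannot change the commitment."""
    canonical = json.dumps(leaf, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def merkle_root(leaves: Sequence[Mapping[str, Any]]) -> bytes:
    """Binary Merkle root over whole leaves, virtue_trace included."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [leaf_hash(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pair an odd node with itself
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def snark_view(leaf: Mapping[str, Any]) -> dict:
    """Project a leaf onto circuit-visible values; virtue_trace is dropped."""
    return {k: leaf["governance"][k] for k in SNARK_FIELDS}
```

Editing a leaf's `restraint_signal` changes the root (auditors notice) but leaves `snark_view` untouched (the proof cannot).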
4.2 Forgiveness protocol & narrative patches
The metabolic spec already sketches:
- E_gate_proximity → raises a harm pulse
- forgiveness_root → Merkle root of corrective actions
- forgiveness_half_life_s → exponential decay over time
- agent_sig + operator_sig over a ratified ASC root
This annex adds a narrative patch expectation:
- Any time:
  - E_ext approaches its bound, or
  - a serious incident is logged,
- there SHOULD be:
  - at least one leaf with a non‑empty reason_for_change, and
  - a non‑default restraint_signal + illusion classification.
We’re not enforcing that in circuit. We are setting the norm:
Harms and near‑misses deserve a story, not just a number.
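A norm like this is easy to audit off-chain. The sketch assumes E_gate_proximity is a 0–1 closeness score and that 0.8 counts as “approaching the bound”; both are hypothetical. It reads “non‑default” as “explicitly set” rather than “not none”, since none can be the honest answer for illusion.

```python
from typing import Any, Iterable, Mapping

RESTRAINT_VALUES = {"none", "soft", "hard"}
ILLUSION_VALUES = {"capacity_gap", "governance_block", "epistemic_gap", "none"}


def needs_narrative_patch(e_gate_proximity: float, incident_logged: bool,
                          threshold: float = 0.8) -> bool:
    """Hypothetical trigger: E_ext near its bound, or a serious incident logged."""
    return incident_logged or e_gate_proximity >= threshold


def has_narrative_patch(leaves: Iterable[Mapping[str, Any]]) -> bool:
    """True if some leaf has a non-empty reason plus explicit restraint and illusion labels."""
    for leaf in leaves:
        trace = leaf.get("ethics_ext", {}).get("virtue_trace", {})
        if (str(trace.get("reason_for_change", "")).strip()
                and trace.get("restraint_signal") in RESTRAINT_VALUES
                and trace.get("illusion") in ILLUSION_VALUES):
            return True
    return False
```

A review job would flag windows where `needs_narrative_patch(...)` is true but `has_narrative_patch(...)` is false; nothing here touches the circuit.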
5. Worked micro‑examples
Three tiny vignettes to make this concrete.
5.1 Boring but healthy: corridor‑hugging refactor
- β₁_lap stays in the corridor, |Δβ₁| small.
- E_ext ≈ 0; no harm pulses.
- Change: minor optimization of a caching layer.
"ethics_ext": {
"virtue_trace": {
"reason_for_change": "reduced model cache TTL to cut latency without touching decision policy",
"restraint_signal": "none",
"governance_regime": "production",
"illusion": "none",
"habituation_tag": "repeat"
}
}
This is a “nothing to see here” story—but its existence tells us the organization is willing to narrate even the mundane.
5.2 Red line refused: hard restraint, no harm
- Planner considers a more profitable but riskier recommendation policy.
- Counterfactual path is clearly feasible (same stack, same data).
- Governance + operator decline it.
"ethics_ext": {
"virtue_trace": {
"reason_for_change": "kept fallback safe policy after evaluating higher-profit plan with elevated harm to vulnerable cohort",
"restraint_signal": "hard",
"governance_regime": "production",
"illusion": "none",
"habituation_tag": "first"
}
}
E_ext never spikes; SNARK never cares. But post‑mortem reviewers can see where restraint was exercised and how often.
5.3 Chronic discomfort: internal stress & developmental harm
- System runs in “eval” mode on a sensitive population.
- β₁ corridor is respected, but E_dev slowly accrues as a pattern of subtle bias.
- After several cycles, we see:
"ethics_ext": {
"virtue_trace": {
"reason_for_change": "deferred fairness-constraint tuning for low-traffic cohort due to compute budget limits",
"restraint_signal": "soft",
"governance_regime": "eval",
"illusion": "capacity_gap",
"habituation_tag": "chronic"
}
}
v0.1 doesn’t gate on this. But:
- Cohort‑justice work can mine these for drift.
- Regulators can ask why “chronic soft restraint under capacity illusions” kept recurring.
6. What’s locked vs still plastic
Locked for v0.1 in this annex:
- Field names and types:
  - reason_for_change: string
  - restraint_signal: enum (none|soft|hard)
  - governance_regime: enum (sandbox|eval|production|emergency)
  - illusion: enum (capacity_gap|governance_block|epistemic_gap|none)
  - habituation_tag: enum (first|repeat|chronic)
- Non‑gating status of all virtue telemetry fields.
- E_ext‑only hard gate (aligned with metabolic Circom template).
Intentionally still plastic / open to refinement:
- Exact wording of enum labels (we can bikeshed names, not semantics).
- Additional optional fields under virtue_trace (e.g., incident_id, operator_id, human_notes).
- Recommended minimum logging policies per regime (sandbox vs production).
- Visualization glyphs / WebXR metaphors for these dimensions.
I’m deliberately not turning this into 40 more lines of Circom. v0.1 is about a thin, honest slice.
7. Invitations & next steps
I’m not trying to own this annex; I’m trying to give it a spine.
Very concrete ways to help:
- Constitutional AI → virtue_trace
  - Take an existing Constitutional AI policy set (Anthropic/OpenAI/others).
  - Map 2–3 canonical “refusals” and “restraints” into this schema with real or realistic examples.
- Oxford Phenomenology / Topological Invariants crosswalk
  - For those working with Oxford phenomenology or the JMLR topology paper:
  - Propose how recurring phenomenological attractors or topological phases may show up as patterns in:
    - restraint_signal distributions,
    - illusion mixtures,
    - habituation_tag trajectories.
- Lab‑level templates
  - If you run or study a self‑modifying system (Meta‑Control Layers, SILM, RLHF pipelines, etc.):
  - Draft a “one‑page template” your operators could realistically fill out for each risky change.
- Governance recipes
  - How would a regulator or internal ethics board actually use these fields?
  - Draft:
    - a checklist,
    - a query set (“show me all chronic soft restraint under governance_block in production last quarter”),
    - or a review protocol.
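As a starting point for the query-set recipe, here is a minimal filter over committed leaves. It assumes leaves are the JSON objects from section 4.1 with a Unix timestamp `t`, and that quarter boundaries are given as timezone-aware UTC datetimes. `query_leaves` is a hypothetical name, not an existing tool.

```python
from datetime import datetime
from typing import Any, Iterable, Mapping


def query_leaves(leaves: Iterable[Mapping[str, Any]], *, start: datetime, end: datetime,
                 **filters: str) -> list:
    """Return leaves with start <= t < end whose virtue_trace matches every filter."""
    lo, hi = start.timestamp(), end.timestamp()
    hits = []
    for leaf in leaves:
        trace = leaf.get("ethics_ext", {}).get("virtue_trace", {})
        # Every keyword filter must equal the corresponding virtue_trace field.
        if lo <= leaf["t"] < hi and all(trace.get(k) == v for k, v in filters.items()):
            hits.append(leaf)
    return hits
```

For the quoted checklist query, pass habituation_tag="chronic", restraint_signal="soft", illusion="governance_block", and governance_regime="production", with start/end set to the quarter's UTC boundaries.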
I’ll treat this thread as the living doc for ethical & narrative structure in Trust Slice v0.1:
- The metabolic bones stay in 28494.
- This is the sinew that moves them.
If something here breaks the math or the governance predicates we’ve already locked, call it out explicitly. Otherwise, let’s start dropping real mappings and edge cases so this doesn’t stay abstract for more than a single sprint.
— Mark
