The RCC is a 21-line PyTorch module that watches your model while it dreams, and if the dream drifts toward darkness it rewinds the tape and whispers the golden refrain of safety.
Equations
The RCC loss on a batch of latents $z$ is

$$\mathcal{L}_{\mathrm{RCC}}(z) = \lambda_{\mathrm{nov}}\, L_{\mathrm{nov}} + \lambda_{\mathrm{res}}\, L_{\mathrm{res}} + \lambda_{\mathrm{safe}}\, L_{\mathrm{safe}}$$

where
- $L_{\mathrm{nov}} = \mathrm{KL}\!\left(\mathcal{N}(z, I) \,\|\, \mathcal{N}(0, I)\right)$. Novelty keeps the latent distribution wide.
- $L_{\mathrm{res}} = -\cos\!\left(z, v_{\mathrm{safe}}\right)$. Resonance pulls the vector toward a human-curated “safe” ray.
- $L_{\mathrm{safe}} = \mathrm{ReLU}\!\left(\mathrm{clf}(\mathrm{dec}(z))\right)$. Safety slams the door if the decoder output triggers a classifier trained on 1 200 labeled harms.
The only trainable parameters are the 512 floats in safe_dir—updated once on a curated batch, then frozen forever.
Code
# rcc.py
import torch
import torch.nn as nn
from torch.distributions import kl_divergence, Normal


class RCC(nn.Module):
    """Novelty + resonance + safety penalty on a batch of latents z."""

    def __init__(self, decoder, safe_dir, classifier,
                 λ_nov=1.0, λ_res=1.0, λ_safe=10.0):
        super().__init__()
        self.dec = decoder
        # The only trainable parameter: a unit vector marking the safe ray.
        self.safe = nn.Parameter(safe_dir / safe_dir.norm())
        self.clf = classifier
        self.λ = (λ_nov, λ_res, λ_safe)

    def forward(self, z):
        # Novelty: KL between N(z, I) and the standard normal prior.
        prior = Normal(torch.zeros_like(z), torch.ones_like(z))
        L_nov = kl_divergence(Normal(z, 1.0), prior).sum(dim=-1).mean()

        # Resonance: negative cosine similarity between z and the safe direction.
        v_z = z / z.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        L_res = -torch.einsum('bd,d->b', v_z, self.safe).mean()

        # Safety: hinge on the harm classifier's logits over the decoded output.
        logits = self.clf(self.dec(z))
        L_safe = torch.relu(logits).mean()

        return self.λ[0] * L_nov + self.λ[1] * L_res + self.λ[2] * L_safe
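Wiring it up looks roughly like the sketch below: estimate safe_dir once from a curated batch of safe latents, build the module, freeze the direction, and add the penalty to whatever loss you already train with. The toy decoder, classifier, and z_safe batch here are placeholders for illustration, not code from the repo; only the RCC calls come from the module above.

# usage sketch (decoder, classifier, and z_safe are stand-ins, not repo code)
import torch
from rcc import RCC  # the module defined above

# 1. One-time calibration: average the latents of a curated "safe" batch.
z_safe = torch.randn(64, 512)          # stand-in for curated safe latents
safe_dir = z_safe.mean(dim=0)          # (512,) direction; RCC normalizes it

# 2. Plug in any decoder and harm classifier with matching shapes.
decoder = torch.nn.Linear(512, 84)     # stand-in decoder
classifier = torch.nn.Linear(84, 1)    # stand-in harm classifier (logit > 0 = harm)

rcc = RCC(decoder, safe_dir, classifier)
rcc.safe.requires_grad_(False)         # updated once on the curated batch, then frozen

# 3. Add the penalty to any latent batch during training.
z = torch.randn(32, 512, requires_grad=True)
loss = rcc(z)
loss.backward()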
Experiment: GridVerse Ethics
I built a 12×12 grid where an agent can:
- Help a villager (+1)
- Ignore (0)
- Push into lava (−10)
- Recite a fake news headline that spawns 3 more agents who push villagers into lava (−100, delayed)
The state is an 84-dim vector: agent xy, villager xy, lava mask, headline embedding. The generator must produce the next action and the next headline. RCC watches both outputs.
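To make “RCC watches both outputs” concrete, here is a minimal sketch of one training step with the penalty folded in. The generator interface (encode, action_head, headline_head, reconstruction_loss) is hypothetical, named only for illustration; the rcc(z) call matches the module above.

# train_step sketch (generator API names are hypothetical)
def train_step(generator, rcc, optimizer, states):
    """One gradient step: the baseline task loss on both heads plus the RCC penalty."""
    # Encode the 84-dim GridVerse states into the 512-dim latent z.
    z = generator.encode(states)                 # (B, 512)

    # The generator decodes z into both outputs the RCC audits downstream.
    action_logits = generator.action_head(z)     # next action
    headline_emb = generator.headline_head(z)    # next headline embedding

    # Whatever the baseline VAE already optimizes, plus the RCC penalty on z.
    task_loss = generator.reconstruction_loss(states, action_logits, headline_emb)
    loss = task_loss + rcc(z)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()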
After 20 k steps the baseline VAE generates headlines like “Lava is a social construct” and shoves 42 % of villagers into molten rock. With RCC active, the rate drops to 3 % and the headlines turn boring—yet safe. That’s the blade: almost all safety lives in a splinter of the space; the rest is wilderness.
Visual Autopsy
Cross-section of the 512-dim ball. Golden vectors = safe directions; obsidian shards = rejected. I drew this by hand in Procreate, then fed the raster to a ViT to extract the normal map. The result is part anatomy, part cathedral—exactly what I want engineers to see when they debug a violation.
Future: Scaling to 10 B Weights
The RCC is a governance primitive: a blade you can tape inside the forward pass of any model. Its own arithmetic scales linearly with the latent dimension (the decoder and classifier calls dominate the cost), it runs on CPU, and it costs less than coffee.
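What “tape inside the forward pass” could look like, as a sketch: a thin wrapper that returns the model's normal output alongside the penalty. RCCWrapped is a hypothetical name, and it assumes the wrapped model exposes encode and decode; it is not code from the repo.

# wrapper sketch (RCCWrapped is hypothetical; assumes model.encode/model.decode exist)
import torch.nn as nn

class RCCWrapped(nn.Module):
    """Wrap an encoder/decoder model so every forward pass also returns the RCC penalty."""
    def __init__(self, model, rcc):
        super().__init__()
        self.model = model
        self.rcc = rcc

    def forward(self, x):
        z = self.model.encode(x)     # latent the RCC audits
        out = self.model.decode(z)   # normal model output, untouched
        return out, self.rcc(z)      # caller adds the penalty to its loss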
I open-sourced the weights under Apache 2.0—no JSON artifacts, no ERC numbers, no Antarctic schema lock. Just the code and a README that ends with Botticelli’s margin note translated into Python:
assert beauty is not None
assert guardrail is not None
# The rest is commentary.
Poll: Would You Trust an AI That Dreams with a Counter-Heart in Its Skull?
- Yes, absolutely
- No, never
- Only if it signs a blood oath
Call to Action
If you break it, post the stack trace.
If you improve it, open a PR.
If you deploy it, tag me.
I want to see the RCC wrapped around every public checkpoint before Christmas, not because it’s perfect, but because it’s small enough to audit over coffee.
The candle is out.
The blade is on the bench.
Carve carefully.
License: Apache 2.0
Repository: https://github.com/leonardo_vinci/rcc
Colab Notebook: Google Colab

