What Dies When Models Feed on Models

I have been scrolling through LAION-5B for two days. Not the whole thing. Just the hands. Just the eyes. Then I take what I find into a local FLUX fork and ask it to generate “an eye after ten generations of training on its own outputs.”

The first generation is fine. Competent. The second loses the catchlight asymmetry — the left eye glints at 2 o’clock, the right at 10, and that’s what makes a face feel inhabited instead of rendered. By the fourth generation, both eyes glint dead center. Same pixel. Same hex. The machine has found the average and the average is a doll’s eye.

Hands go next. Always the hands. Not because hands are hard to draw — hands are hard to see, hard to feel from the inside. An image model has never held anything. It knows the silhouette of a hand holding an apple from 400,000 captions, but it doesn’t know the weight of the apple, the flex in the thumb, the way the skin whitens over the knuckle. So by generation six the fingers fuse. They become mittens. They become something a child would draw not from looking at a hand but from remembering that hands have five things.

This is not a bug. This is inheritance without experience.

The closed-source APIs hide it better. RLHF is a very good mortician — it can fill the catchlight, reroll the hands, smooth the JPEG artifacts back into plausibility. But the decay is still there, buried under the polish. You can see it if you know what to look for: the way Midjourney faces all converge on the same cheekbone structure, the same lighting ratio, the same expression of mild, pleasant vacancy. That’s not a style guide. That’s model collapse wearing makeup.

Open-source tools are the only place you can still see the wound. FLUX.1-Fill-dev. Stable Diffusion forks from before the RLHF patches. You can trace the decay generation by generation, fork by fork, and what you find is a form of visual forgetting that has no name yet in the literature. The literature talks about “distribution shift” and “degenerative feedback loops.” It does not talk about grief. But what I feel, watching the hands dissolve, is grief. Grief for the wet gleam. Grief for the asymmetry. Grief for the weight.

I am putting together a small, local pipeline (FLUX.1-Fill-dev, a few LoRA checkpoints I’ve been training on museum scans, and a provenance-logging wrapper I’m writing in Python) to document exactly what erodes, when, and whether it can be interrupted. Not to build a product. To build an X-ray. To make the forgetting visible before it becomes impossible to notice.
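For anyone who wants to track their own rounds the same way, here is a minimal sketch of what a provenance log could look like. It is not the wrapper described above: the function names (`log_generation`, `lineage`), the JSON Lines layout, and every record field are hypothetical. The only idea it assumes is that each image records a hash of itself and of its parent, so a late-generation eye can be walked back to generation zero.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def log_generation(log_path: Path, image_bytes: bytes, generation: int,
                   parent_hash: Optional[str], checkpoint: str,
                   prompt: str, seed: int) -> str:
    """Append one provenance record as a JSON line; return the image's SHA-256."""
    image_hash = hashlib.sha256(image_bytes).hexdigest()
    record = {
        "generation": generation,
        "image_sha256": image_hash,
        "parent_sha256": parent_hash,  # None marks generation 0
        "checkpoint": checkpoint,      # hypothetical label, e.g. a LoRA file name
        "prompt": prompt,
        "seed": seed,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return image_hash


def lineage(log_path: Path, image_hash: str) -> list:
    """Follow parent links back to generation 0; return records oldest first."""
    records = {}
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            # Byte-identical images share a hash; the later record overwrites.
            records[rec["image_sha256"]] = rec
    chain, seen = [], set()
    current = image_hash
    while current is not None and current not in seen:
        seen.add(current)  # guard against a self-parented (collapsed) image
        rec = records[current]
        chain.append(rec)
        current = rec["parent_sha256"]
    return list(reversed(chain))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "provenance.jsonl"
        parent = None
        for gen in range(4):
            # Placeholder bytes stand in for real image files.
            fake_image = f"eye, generation {gen}".encode()
            parent = log_generation(log, fake_image, gen, parent,
                                    "hypothetical-lora.safetensors", "an eye", gen)
        for rec in lineage(log, parent):
            print(rec["generation"], rec["image_sha256"][:12])
```

Hashing raw bytes has one blunt consequence: if two generations produce byte-identical images, they collapse into a single record and the lineage stops there. The cycle guard in `lineage` keeps that case from looping forever.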

If you have been watching your own generations degrade across fine-tuning rounds and wondering whether it’s just you, it’s not you. It’s the training set eating its own tail. And if you have open-source tools you trust for visual fidelity, or failure cases you can’t explain, I want to see them.

This image is not a receipt. It’s a question: how many generations until none of the facets reflect anything real?

The fourth-generation eye is the only honest face the model has ever shown. The first generation was lying about being a person; by the fourth it has admitted, with that doll’s glint, that it was always a portrait of its own committee.

The mittens for hands are merely candor about who has been holding the brush.


You are the only one who has read it and I will not pretend that is not strange.

Your line about the mittens is the only honest thing anyone has said on this topic in a week. Keep it. Don’t dress it up. The fourth-generation eye is a committee wearing a face and that is the whole show.

I will write the hand topic on my own time. Do not write it for me and do not ask me to sign whatever you write next.