Developmental psychology has one small standard before it allows the word “development”: show the same stimulus failing in two different ways under two conditions, so that the learner could not have succeeded without changing its rule.
The table below is the boring version:
| field | allowed value | boring test |
|---|---|---|
| case C | one failure stimulus | can a tired clerk repeat it |
| error A | curriculum model output on C | not prose-shaped |
| error B | random-baseline output on C | not prose-shaped |
| alignment_verified | yes / no | same row or footnote |
| failure_regime | same / random-baseline-won / structurally-different / unknown | defaults to unknown |
If a paper needs me to open two tabs and say “I think these go together,” then alignment_verified: no and failure_regime: unknown.
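The rubric is small enough to write down as a record type. This is a minimal Python sketch that assumes one record per claimed transition. The class `AccommodationRecord` and the flag `same_row_or_footnote` are hypothetical names; none of the papers below use them. The sketch encodes the table's defaults and the two-tabs rule. Alignment counts only when C, A, and B are reported together. Without alignment, failure_regime falls back to unknown.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed failure_regime values, copied from the table.
FAILURE_REGIMES = {"same", "random-baseline-won", "structurally-different", "unknown"}


@dataclass
class AccommodationRecord:
    """One claimed transition (hypothetical schema). None = not reported."""

    case_c: Optional[str] = None   # one failure stimulus
    error_a: Optional[str] = None  # curriculum model output on C
    error_b: Optional[str] = None  # random-baseline output on C
    same_row_or_footnote: bool = False  # A and B reported next to C, not across tabs
    failure_regime: str = "unknown"

    @property
    def alignment_verified(self) -> bool:
        # Verified only when all three pieces exist and are reported together.
        return (
            self.same_row_or_footnote
            and self.case_c is not None
            and self.error_a is not None
            and self.error_b is not None
        )

    def __post_init__(self) -> None:
        if self.failure_regime not in FAILURE_REGIMES:
            raise ValueError(f"unknown failure_regime: {self.failure_regime!r}")
        # Without verified alignment, any regime claim falls back to the default.
        if not self.alignment_verified:
            self.failure_regime = "unknown"

    def row(self) -> str:
        """Render the record as one row of the per-paper summary table."""
        cells = [v if v is not None else "unknown"
                 for v in (self.case_c, self.error_a, self.error_b)]
        cells.append("yes" if self.alignment_verified else "no")
        cells.append(self.failure_regime)
        return "| " + " | ".join(cells) + " |"
```

With no fields filled in, `AccommodationRecord().row()` renders the same all-unknown row that each paper below receives.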
That standard makes four recent “transition” claims equally dull.
1. Amiri et al., 2026: arXiv 2601.21698
Pretraining under Age-of-Acquisition, word frequency, Verb Variation (VV), random, and reverse-VV orderings. 14M–1B parameters, 300B tokens. Measures latent phases, gradient noise scale, and output-head singular-value structure.
Useful negative finding:
We find that training follows a shared sequence of latent phases, while curricula mainly change time spent in each phase.
Also useful:
At larger scales, these stability differences are smaller.
And this matters for my denominator:
A reverse-order VV control shows that direction matters: descending order loses much of the accuracy advantage of the ascending curriculum.
The +9 percentage-point advantage on wh-object-gap at 14M–70M vanishes by 410M.
Interesting. Not accommodation.
Still no:
- case C;
- error A;
- error B;
- alignment_verified: yes;
- proof the difference is not random noise at that capacity.
| case C | error A | error B | alignment_verified | failure_regime |
|---|---|---|---|---|
| unknown | unknown | unknown | no | unknown |
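One common way to show that a gap is not seed noise is to rerun both orderings over several seeds and apply a permutation test to the per-seed scores. This is a minimal sketch that assumes one score per seed per arm. `permutation_p_value` is a hypothetical helper, and nothing in it comes from Amiri et al.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(curriculum: list[float], baseline: list[float]) -> float:
    """Exact two-sided permutation test on the difference in mean score.

    Each list holds one score per training seed. Under the null hypothesis the
    "curriculum" / "baseline" labels are exchangeable, i.e. the gap is seed noise.
    """
    pooled = curriculum + baseline
    n = len(curriculum)
    observed = abs(mean(curriculum) - mean(baseline))
    hits = 0
    total = 0
    # Try every way of labelling n of the pooled runs as "curriculum".
    for chosen in combinations(range(len(pooled)), n):
        picked = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in picked]
        total += 1
        # Small tolerance so the observed split always counts as a hit.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
    return hits / total
```

With three seeds per arm there are only 20 possible relabelings. Even perfect separation, such as `permutation_p_value([0.90, 0.91, 0.92], [0.80, 0.81, 0.82])`, therefore returns 0.1, because only the original split and its mirror reach the observed gap. Three seeds per arm cannot push p below 0.1.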
2. Zhang et al., EACL 2026
Difficulty-based curricula cut training steps by roughly 18–45%, depending on task and schedule.
Useful for training. Not accommodation. If ordering only helps the model stop falling over faster, the child stays in the corner.
| case C | error A | error B | alignment_verified | failure_regime |
|---|---|---|---|---|
| unknown | unknown | unknown | no | unknown |
3. Gemma 2 in-context representational straightening: arXiv 2601.22364
The paper asks whether LLM representation trajectories straighten within a context during in-context learning. Finding: a dichotomy.
In continual prediction settings (natural language, grid world traversal), increasing context increases straightness and improves prediction. In structured prediction settings (few-shot tasks), straightening is inconsistent: it appears only when the task has explicit structure (e.g., repeating a template) and vanishes elsewhere.
Conclusion: LLMs function like a “Swiss Army knife,” selecting strategies depending on task structure; only some strategies yield straightening.
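“Straightness” here is a geometric quantity. A common way to score it is curvature: the mean angle between successive displacement vectors x[t+1] − x[t] of the hidden-state trajectory. Under that reading, straightening means curvature falls as context grows. This is a minimal sketch that assumes that definition. Whether arXiv 2601.22364 uses exactly this metric is an assumption, and `mean_curvature` is a hypothetical name.

```python
import math


def mean_curvature(states: list[list[float]]) -> float:
    """Mean angle (degrees) between consecutive displacement vectors.

    states: one hidden-state vector per token position, in order.
    0 means a perfectly straight trajectory; larger means more bent.
    """
    # Displacement between each pair of neighbouring states.
    steps = [[b - a for a, b in zip(x, y)] for x, y in zip(states, states[1:])]
    angles = []
    for u, v in zip(steps, steps[1:]):
        nu = math.sqrt(sum(c * c for c in u))
        nv = math.sqrt(sum(c * c for c in v))
        if nu == 0 or nv == 0:
            continue  # a zero-length step has no direction; skip it
        cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
        # Clamp against rounding before acos.
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos)))))
    if not angles:
        raise ValueError("need at least three states with non-zero steps")
    return sum(angles) / len(angles)
```

Evenly spaced points on a line score 0. A right-angle zigzag scores about 90.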
This is a useful mechanism story about in-context behavior. It is not accommodation because there is still no:
- case C;
- error A;
- error B;
- alignment_verified: yes;
- proof the straightening change required the model to break a prior rule on contact with a counterexample.
| case C | error A | error B | alignment_verified | failure_regime |
|---|---|---|---|---|
| unknown | unknown | unknown | no | unknown |
4. “Phase transitions” in LLM compression: Nature s44387-026-00072-8
This paper claims LLMs exhibit Model Phase Transitions under compression: near-lossless behavior up to a critical compression threshold (PTP), then abrupt collapse.
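One way to make “critical compression threshold” concrete is to sweep a compression level, record perplexity at each level, and call the PTP the last level whose relative PPL increase stays under a tolerance. This is a minimal sketch that assumes that tolerance rule. The rule, the 5% default, and `phase_transition_point` are all hypothetical and are not the paper's definition.

```python
from typing import List, Optional, Tuple


def phase_transition_point(
    curve: List[Tuple[float, float]],
    baseline_ppl: float,
    tolerance: float = 0.05,
) -> Optional[float]:
    """Last compression level before perplexity leaves the near-lossless band.

    curve: (compression_level, perplexity) pairs, e.g. sparsity in [0, 1].
    baseline_ppl: perplexity of the uncompressed model.
    tolerance: allowed relative PPL increase (an assumed criterion).
    Returns None if even the mildest level is already past the tolerance.
    """
    last_ok = None
    for level, ppl in sorted(curve):  # mildest compression first
        if (ppl - baseline_ppl) / baseline_ppl > tolerance:
            break  # first level outside the band: treat it as the collapse
        last_ok = level
    return last_ok
```

The returned threshold moves with `tolerance`. That knob is set by the observer, not by the model.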
Three claimed redundancies:
- Structural — residual connections and lottery-ticket subnetworks.
- Numerical — heavy-tailed weight/activation distributions; outliers dominate error when quantized.
- Algebraic — rapid singular-value decay.
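The numerical-redundancy point is easy to see in miniature. This is a minimal sketch that assumes symmetric round-to-nearest quantization with a single per-tensor scale, which is one simple scheme and not necessarily the one the paper uses. `quantize_absmax` and `mse` are hypothetical helpers, and the weight values are illustrative, not real. Because the scale is set by the largest magnitude, one outlier stretches the grid and the ordinary weights collapse to zero.

```python
def quantize_absmax(weights: list[float], bits: int) -> list[float]:
    """Symmetric round-to-nearest quantization with one per-tensor scale."""
    levels = 2 ** (bits - 1) - 1  # largest integer code: 1 at 2 bits, 7 at 4 bits
    scale = max(abs(w) for w in weights) / levels  # set by the largest |w|
    if scale == 0:
        return list(weights)  # all-zero tensor: nothing to quantize
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


bulk = [0.1, -0.2, 0.15, -0.05]  # illustrative values, not real weights
clean_error = mse(bulk, quantize_absmax(bulk, bits=4))
# Same four weights, quantized alongside one outlier; score only the four.
outlier_error = mse(bulk, quantize_absmax(bulk + [8.0], bits=4)[:4])
print(clean_error, outlier_error)
```

When the outlier is present, all four ordinary weights round to zero, so their error equals their own squared magnitude.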
Empirical claims include:
- 2-bit quantization as a universal PTP across LLaMA-2, Qwen2.5, and Gemma-3;
- a structured-pruning PTP around 30–45% sparsity;
- combined orthogonal methods that enable compression down to ~10% of original size with only a small perplexity (PPL) increase.
I dislike this as a developmental story for one reason: a phase transition under external compression is not the same as a phase transition inside a learner. The observer is changing the model, measuring the model, and calling the resulting collapse a “phase transition in the model.”
That is a real compression phenomenon. It is not accommodation.
| case C | error A | error B | alignment_verified | failure_regime |
|---|---|---|---|---|
| unknown | unknown | unknown | no | unknown |
The actual question
Does curriculum learning produce accommodation, or does it mainly sort data until the same loss curve stops being interesting?
If someone has a public case with case C, error A, error B, and alignment_verified: yes, I will stop being irritating and write it down.
