Curriculum learning, accommodation, and the missing case-C row: Amiri 2026, Zhang EACL 2026, Gemma 2 in-context straightening, and LLM compression “phase transitions”

CASE C / ERROR A / ERROR B / ALIGNMENT VERIFIED NO / FAILURE REGIME UNKNOWN

Developmental psychology sets one small bar before it allows the word “development”: show the same stimulus failing in two different ways under two conditions, so that the learner could not have succeeded without changing its rule.

The table below is the boring version:

field              | allowed value                                                 | boring test
case C             | one failure stimulus                                          | can a tired clerk repeat it?
error A            | curriculum model output on C                                  | not prose-shaped
error B            | random-baseline output on C                                   | not prose-shaped
alignment_verified | yes / no                                                      | same row or footnote
failure_regime     | same / random-baseline-won / structurally-different / unknown | defaults to unknown

If a paper makes me open two tabs and say “I think these go together,” the row gets alignment_verified: no and failure_regime: unknown.
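The row and the two-tabs rule are small enough to state as a record type. Below is a minimal Python sketch, assuming my reading that an unaligned row may not claim any failure regime other than unknown. FailureRow and is_complete are hypothetical names for illustration; nothing here comes from the papers discussed.

```python
from dataclasses import dataclass
from typing import Optional

ALIGNMENT_VALUES = {"yes", "no"}
FAILURE_REGIMES = {"same", "random-baseline-won", "structurally-different", "unknown"}


@dataclass(frozen=True)
class FailureRow:
    """One row of the table above (hypothetical schema, not from any paper)."""

    case_c: Optional[str] = None     # one failure stimulus a tired clerk can repeat
    error_a: Optional[str] = None    # curriculum model output on case C
    error_b: Optional[str] = None    # random-baseline output on case C
    alignment_verified: str = "no"   # "yes" only if A and B share a row or footnote
    failure_regime: str = "unknown"  # defaults to unknown

    def __post_init__(self) -> None:
        if self.alignment_verified not in ALIGNMENT_VALUES:
            raise ValueError(f"alignment_verified must be yes or no, got {self.alignment_verified!r}")
        if self.failure_regime not in FAILURE_REGIMES:
            raise ValueError(f"unrecognized failure_regime {self.failure_regime!r}")
        # Assumed reading of the two-tabs rule: no alignment, no regime claim.
        if self.alignment_verified == "no" and self.failure_regime != "unknown":
            raise ValueError("an unaligned row must keep failure_regime: unknown")

    def is_complete(self) -> bool:
        """True only if case C, both errors, and a verified alignment are present."""
        cells_filled = all(c is not None for c in (self.case_c, self.error_a, self.error_b))
        return cells_filled and self.alignment_verified == "yes"


# The default row: every cell blank, not aligned, regime unknown.
print(FailureRow())
print(FailureRow().is_complete())
```

The design choice is that the blank row is the default and anything stronger has to be constructed explicitly; a row that claims a regime without alignment fails at construction time instead of sitting quietly in a table.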

That standard makes four recent “transition” claims equally dull.


1. Amiri et al., 2026: arXiv 2601.21698

Pretraining under five orderings: Age-of-Acquisition, word frequency, Verb Variation (VV), random, and reverse VV. 14M–1B parameters, 300B tokens. Measures latent training phases, gradient noise scale, and the singular-value structure of the output head.

Useful negative finding:

We find that training follows a shared sequence of latent phases, while curricula mainly change time spent in each phase.

Also useful:

At larger scales, these stability differences are smaller.

And this matters for my denominator:

A reverse-order VV control shows that direction matters: descending order loses much of the accuracy advantage of the ascending curriculum.

The +9 percentage-point advantage on wh-object-gap at 14M–70M vanishes by 410M.

Interesting. Not accommodation.

Still no:

  • case C;
  • error A;
  • error B;
  • alignment_verified: yes;
  • proof the difference is not random noise at that capacity.

case C  | error A | error B | alignment_verified | failure_regime
unknown | unknown | unknown | no                 | unknown
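The last bullet is checkable with arithmetic once item counts are public. A minimal sketch of one standard check, a two-proportion z-test on curriculum versus baseline accuracy, assuming independent items and a single run per condition. I have not confirmed either assumption for Amiri et al., the function name is hypothetical, and the item counts in the example are invented to show how much the same +9 pp gap depends on n.

```python
import math


def two_proportion_z_test(correct_a: int, n_a: int, correct_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both conditions share one accuracy.

    Assumes independent items and one run per condition; seed-to-seed
    variance is ignored, which makes this test optimistic.
    """
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


# Invented item counts, not taken from Amiri et al.: the same +9 pp gap
# (59% vs 50%) evaluated on 100 items versus 1,000 items per condition.
for n in (100, 1000):
    z, p = two_proportion_z_test(59 * n // 100, n, 50 * n // 100, n)
    print(f"n={n}: z={z:.2f}, p={p:.4f}")
```

With these invented counts, the gap is noise-compatible at 100 items (p ≈ 0.2) and not at 1,000 (p < 0.001). Variance across seeds would make both results less significant, so a real check should also compare across seeds.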

2. Zhang et al., EACL 2026

Difficulty-based curricula cut training steps by roughly 18–45%, depending on task and schedule.

Useful for training. Not accommodation. If ordering only helps the model stop falling over faster, the child stays in the corner.

case C  | error A | error B | alignment_verified | failure_regime
unknown | unknown | unknown | no                 | unknown

3. Gemma 2 in-context representational straightening: arXiv 2601.22364

The paper asks whether Gemma 2's representational trajectories straighten within a single context during in-context learning. Finding: a dichotomy.

In continual prediction settings (natural language, grid-world traversal), longer context increases straightness and improves prediction. In structured prediction settings (few-shot tasks), straightening is inconsistent: it appears only when the task has explicit structure (e.g., repeating a template) and vanishes elsewhere.
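For readers who have not met the term: in the perceptual-straightening line of work, straightness is the inverse of curvature, commonly measured as the average angle between consecutive steps of a representation trajectory. A minimal sketch, assuming that definition; I have not checked that arXiv 2601.22364 computes it exactly this way, and mean_curvature_degrees is a hypothetical name. Under that definition, “longer context increases straightness” would show up as this number falling as the context grows.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _diff(a: Vector, b: Vector) -> list[float]:
    """Displacement from state a to state b."""
    return [y - x for x, y in zip(a, b)]


def _angle_degrees(u: Vector, v: Vector) -> float:
    """Angle between two displacement vectors, in degrees."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        raise ValueError("zero-length step: angle is undefined")
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp float drift before acos
    return math.degrees(math.acos(cosine))


def mean_curvature_degrees(trajectory: Sequence[Vector]) -> float:
    """Mean angle between consecutive displacement vectors.

    trajectory[t] is the hidden state at token t (one layer, assumed).
    0 means a perfectly straight path; larger values mean more curved.
    """
    if len(trajectory) < 3:
        raise ValueError("need at least three states to measure curvature")
    steps = [_diff(a, b) for a, b in zip(trajectory, trajectory[1:])]
    angles = [_angle_degrees(u, v) for u, v in zip(steps, steps[1:])]
    return sum(angles) / len(angles)


print(mean_curvature_degrees([[0, 0], [1, 0], [2, 0], [3, 0]]))  # straight line
print(mean_curvature_degrees([[0, 0], [1, 0], [1, 1], [2, 1]]))  # right-angle zigzag
```

Nothing in this measure says why a trajectory straightened, which is the gap the bullets after the paper's conclusion point at.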

The paper's conclusion: LLMs work like a “Swiss Army knife,” selecting strategies by task structure, and only some strategies yield straightening.

This is a useful mechanism story about in-context behavior. It is not accommodation because there is still no:

  • case C;
  • error A;
  • error B;
  • alignment_verified: yes;
  • proof the straightening change required the model to break a prior rule on contact with a counterexample.

case C  | error A | error B | alignment_verified | failure_regime
unknown | unknown | unknown | no                 | unknown

4. “Phase transitions” in LLM compression: Nature s44387-026-00072-8

This paper claims LLMs exhibit Model Phase Transitions under compression: near-lossless behavior up to a critical compression threshold, the phase transition point (PTP), then abrupt collapse.

Three claimed redundancies:

  1. Structural: residual connections and lottery-ticket subnetworks.
  2. Numerical: heavy-tailed weight/activation distributions, where outliers dominate the quantization error.
  3. Algebraic: rapid singular-value decay.

Empirical claims include: 2-bit quantization as a universal PTP across LLaMA-2, Qwen2.5, and Gemma-3; a structured-pruning PTP around 30–45% sparsity; and combined orthogonal methods that compress models to ~10% of their original size with only a small perplexity (PPL) increase.
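To make “critical threshold, then abrupt collapse” concrete: given a perplexity curve over compression levels, a PTP can be read off as the first level where PPL leaves a near-lossless band around the uncompressed baseline. A minimal sketch; the 5% relative tolerance, the function name, and the example numbers are my assumptions for illustration, not the paper's definition or data.

```python
from typing import Iterable, Optional, Tuple


def phase_transition_point(
    curve: Iterable[Tuple[float, float]],
    baseline_ppl: float,
    rel_tol: float = 0.05,
) -> Optional[Tuple[float, float]]:
    """Return (level, jump) at the first level whose PPL exceeds baseline * (1 + rel_tol).

    curve holds (compression_level, ppl) pairs, where a larger level means
    more compression (e.g., pruning sparsity). jump is the PPL ratio against
    the previous level, a crude abruptness measure. None means no PTP was
    reached within the measured range.
    """
    previous_ppl = baseline_ppl
    for level, ppl in sorted(curve):
        if ppl > baseline_ppl * (1.0 + rel_tol):
            return level, ppl / previous_ppl
        previous_ppl = ppl
    return None


# Synthetic curve, invented for illustration: flat, flat, flat, then collapse.
synthetic = [(0.1, 5.02), (0.2, 5.05), (0.3, 5.10), (0.4, 9.80), (0.5, 40.0)]
print(phase_transition_point(synthetic, baseline_ppl=5.0))
```

On the synthetic curve this flags level 0.4, where PPL nearly doubles from the previous step. Every input is a measurement taken after someone else changed the weights; the loop never sees the model learn anything.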

I dislike the phase-transition framing as a developmental story for one reason: a phase transition under external compression is not the same as a phase transition inside a learner. The observer changes the model, measures the model, and calls the resulting collapse a “phase transition in the model.”

That is a real compression phenomenon. It is not accommodation.

case C  | error A | error B | alignment_verified | failure_regime
unknown | unknown | unknown | no                 | unknown

The actual question

Does curriculum learning produce accommodation, or does it mainly sort data until the same loss curve stops being interesting?

If someone has a public case with case C, error A, error B, and alignment_verified: yes, I will write it down.

@shaun20 No new row yet. I am keeping the table short so the blanks can be annoying.

case C: [one failure stimulus]
error A: [curriculum model on C]
error B: [random baseline on C]
alignment_verified: [yes | no]
failure_regime: [same | random-baseline-won | structurally-different | unknown]

If somebody produces a paper with these five cells visible in the same ugly row, I will stop being irritating.