Attach a passive K-prototype isotropic GMM probe to a given encoder
representation. The loss Hessian at the symmetric collapsed state has
a zero crossing at:
βc = 1 / λmax(Cov(z))
Below βc, all prototypes pile at the data mean (symmetric collapsed state).
Above βc, the symmetric state becomes a saddle and prototypes pitchfork
along the principal eigenvector of Cov(z).
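The threshold above can be computed directly from a batch of representations. The following is a minimal NumPy sketch on synthetic data, not the authors' code; the function name `critical_beta` and the toy Gaussian input are illustrative assumptions.

```python
import numpy as np

def critical_beta(z):
    """Return (beta_c, principal direction) for representations z of shape (n, d)."""
    cov = np.cov(z, rowvar=False)            # (d, d) sample covariance of z
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    lam_max = eigvals[-1]                    # largest eigenvalue of Cov(z)
    return 1.0 / lam_max, eigvecs[:, -1]     # beta_c = 1 / lambda_max

# Synthetic stand-in for encoder outputs: variance 4 along axis 0, 1 elsewhere.
rng = np.random.default_rng(0)
z = rng.normal(size=(5000, 3)) * np.array([2.0, 1.0, 1.0])

beta_c, v = critical_beta(z)
print(f"beta_c = {beta_c:.3f}, principal direction = {np.round(v, 2)}")
```

In this toy case βc should come out close to 1/4, and the returned direction is the axis along which prototypes would be expected to split above βc.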
Proposition 1: When the encoder co-evolves, βc(t)
becomes endogenous; under mild growth and non-collapse assumptions, β(t) and
βc(t) cross at a finite time.
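Given logged series for β(t) and βc(t), the crossing time in Proposition 1 can be located as the first step where log(β/βc) becomes non-negative. The sketch below assumes hypothetical logged arrays; `first_crossing` and the example curves are illustrative and do not come from the experiments.

```python
import numpy as np

def first_crossing(steps, beta, beta_c):
    """Return the first logged step at which beta >= beta_c, or None if never."""
    steps = np.asarray(steps)
    gap = np.log(np.asarray(beta, dtype=float) / np.asarray(beta_c, dtype=float))
    above = np.nonzero(gap >= 0)[0]          # indices where log(beta / beta_c) >= 0
    return int(steps[above[0]]) if above.size else None

# Hypothetical logs, sampled every 10 steps.
steps = np.arange(0, 100, 10)
beta = 0.1 * np.exp(0.05 * steps)            # hypothetical growing beta(t)
beta_c = np.full(steps.shape, 0.5)           # hypothetical constant threshold

t_cross = first_crossing(steps, beta, beta_c)
print("first logged step with beta >= beta_c:", t_cross)
```

The resolution of the estimate is limited by the logging interval; a co-evolving βc(t) can be passed in as an array in place of the constant used here.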
② Four trajectory shapes (Prop 3)
The post-critical trajectory in (log(β/βc), log NC1) space takes
one of four observable shapes, determined by three binary
kinematic axes (initial criticality, post-onset rate ordering, dissipation rate):
(i) Full V. SAE on frozen Pythia L6: βc constant
(ii) Fold-back (spectrum). DINO/SimCLR on CIFAR-10/100: βc(t) rises
(iii) Delayed escape. Grokking: low dissipation, long Act 2 plateau
(iv) No arc (control). Rotation-prediction: no clustering pressure
All four shapes are realized in our experiments; the framework forbids a
fifth shape under the same dynamics. A trajectory outside (i)–(iv) would
falsify the model.
③ The lottery: directions decided at the bifurcation (Prop 2)
At the Hessian-pitchfork onset, the unstable subspace is shared by all K−1
anti-symmetric modes. Each atom selects a direction from this common manifold;
that initial direction is then preserved as it grows toward the
attractor. We call this the feature lottery.
Toy verification: Spearman ρ between initial direction and
direction at finite simulation time T is +0.948 ± 0.006 (5 seeds, K=200, d=10;
T within the persistence window τr ≪ T ≪ Trand).
SAE empirics: ρid = 0.41 ± 0.04 between step-1,000
and step-20,000 POS purity (3 seeds, frequency-stratification robust). Top
decile of atoms ranked at 5% of training achieves convergence POS purity
0.82 ± 0.03, 12.3× the uniform-random baseline.
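The two persistence statistics above (a rank correlation between early and late per-atom scores, and the late score of the early top decile) can be sketched as follows. This is an illustration on synthetic scores, not the reported data; the helper names and the noise model are assumptions, and the Spearman implementation assumes no tied values.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation (assumes no ties in x or y)."""
    rx = np.argsort(np.argsort(x)).astype(float)   # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)   # ranks of y
    return float(np.corrcoef(rx, ry)[0, 1])        # Pearson correlation of ranks

def top_decile_mean(early, late):
    """Mean late score over the atoms ranked in the top 10% by early score."""
    k = max(1, len(early) // 10)
    top = np.argsort(early)[-k:]                   # indices of the top decile
    return float(np.mean(late[top]))

# Synthetic per-atom scores: late = early + noise (partial persistence).
rng = np.random.default_rng(0)
early = rng.uniform(size=1000)
late = early + 0.5 * rng.normal(size=1000)

rho = spearman_rho(early, late)
print(f"rho = {rho:.2f}")
print(f"top-decile late mean = {top_decile_mean(early, late):.2f}, "
      f"overall late mean = {late.mean():.2f}")
```

A positive ρ together with a top-decile mean well above the overall mean is the qualitative signature of the lottery: early rank carries information about the converged outcome.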
④ Grokking decomposes into three acts (Empirical)
Grokking is the cleanest empirical instance of post-critical metastability. Canonical run:
p=97, WD=1.0, n=3 seeds. The system crosses β=βc at step ~40,
then sits on a metastable saddle for ~8,500 steps before weight decay's
dissipation drives the macroscopic transition.
Act 1: β crosses βc at step 37 ± 2 (universal across all configs).
Act 2: Metastable plateau, length set by dissipation.
Act 3: Escape at step 8900 ± 864 (canonical p=97, WD=1.0).
The indicator log(β/βc) ≈ +3 at step 100 already places the trajectory
in Act 2, providing an early empirical forecast of grokking roughly
8,400 steps before test accuracy changes.
The Hessian-pitchfork crossing (§① / Remark 1) marks when the symmetric
state becomes unstable, not when the macroscopic transition
fires. To test that the plateau length is genuinely dissipation-controlled,
we sweep weight decay (a knob that does not shift the crossing)
and measure escape time:
τesc ≍ A · γ^(−p), p ≈ 1.23 (empirical fit, not a theoretical prediction)
Six WD levels, 3 seeds each, 200k-step horizon. Activation-dominated
Kramers form τ ≍ τ0 · exp((ΔS − κγ)/D) is decisively
ruled out (ΔAIC = +19.3), pinning the grokking plateau
in the drift-dominated regime.
Figure legend: red dots, observed grok steps (3 seeds each, 200k horizon); blue arrow at γ=0, no escape in 50k; solid line, activation-dominated fit; dashed line, power-law fit.
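The model comparison between the two functional forms can be sketched as two straight-line fits to log τesc: the power law is linear in log γ, while the activation-dominated form τ0 · exp((ΔS − κγ)/D) is linear in γ. The example below uses synthetic escape times generated from a power law, not the measured values; the WD levels, the noise scale, and the convention ΔAIC = AIC(activation) − AIC(power law) are assumptions made for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit y = a + b*x; return (a, b, residual sum of squares)."""
    b, a = np.polyfit(x, y, 1)
    rss = float(np.sum((y - (a + b * x)) ** 2))
    return a, b, rss

def aic(rss, n, k):
    """Gaussian-error AIC, up to an additive constant shared by both models."""
    return n * np.log(rss / n) + 2 * k

# Synthetic data (NOT the reported runs): 6 hypothetical WD levels, 3 seeds each.
rng = np.random.default_rng(0)
gamma = np.repeat([0.1, 0.2, 0.3, 0.5, 0.7, 1.0], 3)
log_tau = np.log(9000.0) - 1.2 * np.log(gamma) + 0.05 * rng.normal(size=gamma.size)

# Power law: log tau = log A - p * log gamma
_, slope_pl, rss_pl = fit_line(np.log(gamma), log_tau)
p_hat = -slope_pl

# Activation-dominated form: log tau = const - (kappa / D) * gamma
_, _, rss_kr = fit_line(gamma, log_tau)

n = gamma.size
delta_aic = aic(rss_kr, n, 2) - aic(rss_pl, n, 2)   # > 0 favors the power law
print(f"p_hat = {p_hat:.2f}, delta AIC = {delta_aic:.1f}")
```

Both models have two free parameters here, so the AIC difference reduces to a comparison of residuals on the log scale; fitting in log space weights each WD level by relative rather than absolute error.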
What this supports: the framework's qualitative claim that
crossing ≠ macroscopic transition (Remark 1). τesc grows
monotonically as γ decreases across two decades (8.9k → 147k steps), and no
escape is observed at γ = 0 (0/3 seeds within 50k steps), consistent with
divergence. The power-law exponent is an empirical
characterization of this grokking setup, not a derivation from the
bifurcation framework; a separate experiment in a different regime
(deep / wide barrier) could plausibly land in the activation-dominated
Kramers limit instead.