(A Mental Model Beyond “Loss Went Down”)
You’ve seen the pattern:
Someone trains a model.
Loss decreases. Validation looks fine.
They ship it.
Then it fails in production: unpredictably, embarrassingly, expensively.
The post‑mortem blames “data mismatch” or “overfitting.”
But those are symptoms, not root causes.
After spending some time debugging these failures,
I’ve realized: most ML problems are not statistics problems.
They’re dynamical systems problems.
And once you see them that way,
they stop being mysterious.
The Core Idea
A training loop, with its batches, epochs, and weight updates,
is a discrete dynamical system.
It’s a rule that takes the current weights
and produces the next weights:
$$w_{t+1} = w_t - \eta \nabla L(w_t)$$
But when you add periodic retraining,
cyclical learning rates, or even just stochastic
mini‑batches, the system’s behavior falls
into one of only a few patterns.
I call them the seven dynamical families.
You don’t need to memorize them.
You just need to recognize two things:
- Where is your system right now? (Which family?)
- Is that where you actually want it to be?
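To make the update rule concrete, here is a minimal sketch of it as an iterated map. It assumes a toy one‑dimensional loss, L(w) = 0.5 * curvature * w**2, which is my choice for illustration and not something taken from a real model. The function name `gd_trajectory` and its parameters are hypothetical. On this toy loss the same rule converges, oscillates, or diverges depending only on the learning rate.

```python
def gd_trajectory(lr, w0=1.0, curvature=1.0, steps=50):
    """Iterate w <- w - lr * dL/dw for the toy loss L(w) = 0.5 * curvature * w**2."""
    w = w0
    trajectory = [w]
    for _ in range(steps):
        grad = curvature * w      # gradient of the assumed quadratic toy loss
        w = w - lr * grad         # the plain gradient descent update rule
        trajectory.append(w)
    return trajectory


# Three learning rates, three qualitatively different long-run behaviors.
for lr in (0.5, 2.0, 2.5):
    tail = gd_trajectory(lr)[-4:]
    print(f"lr={lr}: last iterates {tail}")
```

With `curvature=1.0`, a learning rate of 0.5 shrinks the weight toward zero (a fixed point), 2.0 flips its sign every step (a period‑2 cycle), and 2.5 grows it without bound (drift). One caveat: on a pure quadratic the period‑2 behavior appears only at exactly `lr = 2 / curvature`. Treat this as an illustration of the idea, not as evidence about real networks.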
The Seven Families at a Glance
| Family | Behavior | What it looks like in training |
|---|---|---|
| 1. Fixed point | Converges to one stable solution | Loss flatlines, weights stop changing |
| 2. Period‑2 cycle | Oscillates between two states | Validation loss alternates high/low every other epoch |
| 3. Higher‑period cycle | Oscillates among 4, 8, etc., states | Regular but complex pattern |
| 4. Bounded chaos | Unpredictable but stays within bounds | Loss jumps around randomly; often mistaken for noise |
| 5. Drift | Weights grow without bound | Loss diverges to infinity |
| 6. Edge of chaos | Between order and chaos; maximal complexity | Looks chaotic but has subtle structure; highest performance |
| 7. Stochastic resonance | Noise helps the system hop between attractors | Adding randomness improves generalization |
Most practitioners assume Family 1 (fixed point) is the goal.
Often, it’s not. Sometimes it’s a trap.
How to Use This Framework
Next time you’re debugging an ML problem,
ask these questions:
1. What family am I in right now?
Plot a few metrics over time, not just final values.
Look for:
- Periodic oscillations (Family 2 or 3)
- Random‑looking but bounded (Family 4)
- Flatlined (Family 1)
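If you want a first guess before plotting, the checklist above can be sketched as a crude heuristic: compare the tail of a metric with shifted copies of itself and see which shift, if any, makes it repeat. Everything here is an assumption for illustration: the function name `guess_family`, the candidate periods, the window length, and the tolerance. It cannot tell Family 4 from Family 6 or from ordinary noise, so read the last label as "look closer", not as a diagnosis.

```python
import numpy as np


def guess_family(metric, tail=64, periods=(1, 2, 4, 8), tol=1e-3):
    """Rough label for a metric history, based on how its tail repeats."""
    x = np.asarray(metric, dtype=float)
    if x.size == 0:
        raise ValueError("metric history is empty")
    if not np.all(np.isfinite(x)):
        return "drift (Family 5)"          # inf or NaN: the run has blown up
    x = x[-tail:]                          # only the recent window matters
    scale = max(float(np.max(np.abs(x))), 1e-12)
    for p in periods:
        if len(x) <= p:
            continue
        # The tail has period p if shifting it by p steps changes almost nothing.
        if np.max(np.abs(x[p:] - x[:-p])) <= tol * scale:
            if p == 1:
                return "fixed point (Family 1)"
            if p == 2:
                return "period-2 cycle (Family 2)"
            return "higher-period cycle (Family 3)"
    return "irregular (Family 4 or 6, or plain noise)"


print(guess_family([0.3] * 100))
print(guess_family([1.0, 0.5] * 50))
print(guess_family([1.0, 0.5, 0.8, 0.3] * 25))
```

Two practical caveats. A curve that is still trending downward will come back as "irregular", so apply this to a window where training has settled. And real metrics never repeat exactly, so `tol` needs tuning for your own noise level.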
2. Is that family appropriate for my data?
- If your data is simple and i.i.d. → Family 1 is fine.
- If your data has rare edge cases or non‑stationary distributions → Families 4 or 6 may be necessary.
A single fixed point cannot handle both the common case and the rare case simultaneously.
3. What knob moves me between families?
Common bifurcation parameters:
- Learning rate (high LR → chaos/drift; low LR → fixed point)
- Regularization strength (high → fixed point)
- Batch size (small → more stochasticity, higher family number)
- Momentum (high → can stabilize chaos)
If you’re in the wrong family, don’t just turn
a knob because “lower loss looks better.”
Turn it with intention.
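Here is a small sketch of two of those knobs interacting, again on an assumed toy loss L(w) = 0.5 * curvature * w**2 rather than a real model. The function name `heavy_ball` and its defaults are hypothetical. For this quadratic, heavy‑ball momentum is known to stay stable while `lr * curvature < 2 * (1 + momentum)`, so a learning rate that diverges on its own can converge once momentum is added.

```python
def heavy_ball(lr, momentum, w0=1.0, curvature=1.0, steps=200):
    """Gradient descent with heavy-ball momentum on L(w) = 0.5 * curvature * w**2."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        grad = curvature * w                      # gradient of the assumed toy loss
        velocity = momentum * velocity + grad     # running, decayed sum of gradients
        w = w - lr * velocity                     # step along the accumulated direction
    return w


# Same loss, different knob settings, different long-run behavior.
for lr, momentum in [(0.5, 0.0), (2.5, 0.0), (2.5, 0.5)]:
    final = abs(heavy_ball(lr, momentum))
    print(f"lr={lr}, momentum={momentum}: |w| after 200 steps = {final:.3e}")
```

The middle setting diverges and the last one converges, even though only the momentum changed. A quadratic can show convergence versus divergence but not chaos, so this only illustrates that knobs interact. It is not a recipe for a real training run.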
4. How do I monitor the right things?
Don’t just track aggregate loss. Track:
- Slice‑specific metrics (edge cases, rare classes, time windows)
- Temporal autocorrelation of those metrics (a negative lag‑1 autocorrelation suggests a period‑2 cycle)
- Gradient direction stability across updates
If you see oscillation on a slice you care about,
don’t regularize it away.
Understand it.
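Two of the signals above are cheap to compute. The sketch below assumes you already log a per‑epoch loss for each slice and can get flattened gradients from consecutive updates. The names `lag1_autocorr`, `consecutive_grad_cosine`, and the `slice_losses` dictionary are hypothetical, and the numbers are made up for illustration.

```python
import numpy as np


def lag1_autocorr(series, eps=1e-12):
    """Lag-1 autocorrelation; strongly negative values hint at a period-2 pattern."""
    x = np.asarray(series, dtype=float)
    d = x - x.mean()
    denom = float(np.dot(d, d))
    if denom < eps:
        return float("nan")       # a flat series has no defined autocorrelation
    return float(np.dot(d[1:], d[:-1]) / denom)


def consecutive_grad_cosine(grads):
    """Cosine similarity between each pair of consecutive flattened gradients."""
    sims = []
    for prev, cur in zip(grads[:-1], grads[1:]):
        a, b = np.ravel(prev), np.ravel(cur)
        norm = float(np.linalg.norm(a) * np.linalg.norm(b))
        sims.append(float(np.dot(a, b) / norm) if norm > 0 else float("nan"))
    return sims


# Made-up per-epoch losses: the aggregate decays smoothly, the rare slice alternates.
slice_losses = {
    "overall": list(np.linspace(1.0, 0.2, 20)),
    "rare_class": [0.9, 0.4] * 10,
}
for name, losses in slice_losses.items():
    print(f"{name}: lag-1 autocorrelation = {lag1_autocorr(losses):+.2f}")
```

In this made‑up data the aggregate looks healthy while the rare slice has a strongly negative lag‑1 autocorrelation, which is the kind of pattern aggregate loss tends to hide. For gradients, consecutive cosine values near -1 mean each update is undoing the previous one.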
The Deeper Lesson
AI/ML problems aren’t magic.
They’re not even that statistical. They’re dynamical:
the system moves, and your interventions
change its long‑term behavior.
The frameworks we were taught
(bias‑variance tradeoff, overfitting, underfitting)
are static. They describe a snapshot.
Dynamical families describe a movie.
And production is a movie, not a snapshot.
A Challenge to You
Next time you see a validation curve that looks
“messy” — don’t reach for the regularization knob.
Plot the loss for your rarest slice. Look for a period‑2 oscillation.
Ask yourself:
Am I looking at noise?
Or am I looking at a system that’s exploring features
my main metrics can’t see?
You might find that the instability
you were about to kill…
was the only thing making your model work
where it matters most.
Nick Angelosoulis
Building frameworks to make ML failure modes predictable