Mathematical Foundations
Understand the mathematical theory behind DREAM's adaptive dynamics
This guide covers the mathematical foundations underlying DREAM's adaptive neural dynamics.
Continuous-Time RNN Dynamics
DREAM models hidden state evolution as a differential equation rather than a discrete update:
\tau \cdot \frac{dh}{dt} = -h + h_{target}(x, h, \text{error})

Where:
- h: Hidden state
- τ: Time constant (adaptive)
- h_target: Target state based on input and error
Euler Discretization
Using Euler discretization with step size dt:
\begin{aligned}
h(t+dt) &= h(t) + dt \cdot \frac{-h(t) + h_{target}}{\tau} \\
&= \left(1 - \frac{dt}{\tau}\right) \cdot h(t) + \frac{dt}{\tau} \cdot h_{target}
\end{aligned}

Stability condition: dt/τ < 1 (ensured by clamping).
Interpretation
- (1 - dt/τ): Retention factor (how much of the old state to keep)
- dt/τ: Integration factor (how much of the target to incorporate)
- τ large: Slow integration, stable memory
- τ small: Fast integration, quick response
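The retention/integration split can be sketched in a few lines of Python (a minimal sketch; `euler_step` and the clamp value are illustrative, not part of DREAM's API):

```python
import numpy as np

def euler_step(h, h_target, dt, tau):
    """One Euler step of tau * dh/dt = -h + h_target.

    alpha = dt / tau is clamped below 1 to satisfy the
    stability condition dt/tau < 1.
    """
    alpha = min(dt / tau, 0.99)  # clamp for stability
    return (1.0 - alpha) * h + alpha * h_target

h = np.zeros(4)
h_target = np.ones(4)

# Large tau: slow integration toward the target (moves by dt/tau = 0.01)
h_slow = euler_step(h, h_target, dt=0.1, tau=10.0)
# Small tau: fast integration toward the target (moves by dt/tau = 0.5)
h_fast = euler_step(h, h_target, dt=0.1, tau=0.2)
```

The same `dt`, applied with different `tau`, trades stability for responsiveness.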
Hebbian Learning Rule
Classic Hebbian Principle
"Neurons that fire together, wire together"
\frac{dU}{dt} = -\lambda \cdot (U - U_{target}) + \eta \cdot \text{surprise} \cdot (h \otimes \text{error} \times V)

Where:
- U: Fast weights (left factor)
- U_target: Target weights (consolidation anchor)
- λ: Forgetting rate
- η: Base plasticity
- ⊗: Outer product
- V: Fixed basis (right factor)
Component Breakdown
Decay term:
-\lambda \cdot (U - U_{target})

- Pulls U toward U_target
- Prevents unbounded growth
- Implements "forgetting" of unused patterns

Plasticity term:

\eta \cdot \text{surprise} \cdot (h \otimes \text{error} \times V)

- Outer product h ⊗ error: Correlation between state and error
- Projection onto V: Efficient low-rank update
- Surprise modulation: High surprise → faster learning
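As a shape check, the two terms can be computed directly with NumPy (a sketch with illustrative dimensions and constants; every name here is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inp, rank = 8, 4, 2

U        = rng.normal(size=(hidden, rank))   # fast weights (left factor)
U_target = np.zeros((hidden, rank))          # consolidation anchor
V        = rng.normal(size=(inp, rank))      # fixed basis (right factor)
h        = rng.normal(size=hidden)           # hidden state
error    = rng.normal(size=inp)              # prediction error

lam, eta, surprise = 0.1, 0.5, 0.8           # illustrative values

decay      = -lam * (U - U_target)                    # pull toward anchor
plasticity = eta * surprise * np.outer(h, error) @ V  # low-rank Hebbian term
dU = decay + plasticity                               # shape: (hidden, rank)
```

Note that the outer product (hidden × input) is immediately projected through V, so the update stays rank-limited.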
Update Rule
Using Euler integration:
U_{new} = U + \frac{dU}{dt} \cdot dt

Normalization
After update, normalize to maintain stability:
U_{new} \leftarrow U_{new} \cdot \frac{\text{target\_norm}}{||U_{new}||}

Surprise Computation
Core Idea
Surprise measures how "unexpected" a prediction error is:
\text{surprise} = \sigma\left(\frac{||\text{error}|| - \tau_{effective}}{\gamma}\right)

Where:
- σ: Sigmoid function (range 0 to 1)
- ||error||: Prediction error norm
- τ_effective: Adaptive threshold
- γ: Temperature (smoothness)
Interpretation
| Surprise Value | Meaning |
|---|---|
| ≈ 0 | Error is expected (no update needed) |
| ≈ 0.5 | Error is at threshold |
| ≈ 1 | Error is highly surprising (full update) |
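The table corresponds to a plain sigmoid over the error norm. A minimal sketch (the function name and default `gamma` are illustrative; `tau_effective` is taken as given here and derived in the next subsection):

```python
import math

def surprise(error_norm, tau_effective, gamma=0.1):
    """Sigmoid of how far the error norm exceeds the threshold."""
    return 1.0 / (1.0 + math.exp(-(error_norm - tau_effective) / gamma))

surprise(0.5, 0.5)   # error at threshold -> 0.5
surprise(1.5, 0.5)   # well above threshold -> close to 1
surprise(0.0, 0.5)   # well below threshold -> close to 0
```

Smaller `gamma` sharpens the transition; larger `gamma` makes the update gating more gradual.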
Effective Threshold
The threshold τ_effective combines classical and adaptive components:
\tau_{effective} = 0.3 \cdot \tau_{classical} + 0.7 \cdot \tau_{adaptive}

Classical (entropy-based):
\begin{aligned}
\text{entropy} &= \frac{1}{2} \log(2\pi e \cdot \text{error\_var}) \\
\tau_{classical} &= \tau_0 \cdot (1 + \alpha \cdot \text{entropy})
\end{aligned}

Adaptive (habituation):
\begin{aligned}
\tau_{adaptive} &= (1 - \beta) \cdot \tau_{adaptive} + \beta \cdot ||\text{error}|| \\
\tau_{adaptive} &= \text{clamp}(\tau_{adaptive}, \text{max}=0.8)
\end{aligned}

Error Statistics
Exponential moving average (EMA) for error statistics:
\begin{aligned}
\text{error\_mean} &= (1 - \beta) \cdot \text{error\_mean} + \beta \cdot \text{error} \\
\text{error\_var} &= (1 - \beta) \cdot \text{error\_var} + \beta \cdot (\text{error} - \text{error\_mean})^2
\end{aligned}

Liquid Time-Constant Adaptation
Dynamic Time Constant
The time constant adapts based on surprise:
\tau(\text{surprise}) = \frac{\tau_{system}}{1 + \text{surprise} \times \text{scale}}

Properties
| Property | Value |
|---|---|
| τ(0) | τ_system (baseline) |
| τ(1) | τ_system / (1 + scale) (minimum) |
| Monotonicity | Decreasing in surprise |
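The properties above follow directly from the formula. A one-line sketch (function name and the `scale` default are illustrative):

```python
def liquid_tau(surprise, tau_system=1.0, scale=4.0):
    """tau shrinks as surprise grows: fast response to novelty."""
    return tau_system / (1.0 + surprise * scale)

liquid_tau(0.0)   # baseline: tau_system = 1.0
liquid_tau(1.0)   # minimum: tau_system / (1 + scale) = 0.2
```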
Behavioral Modes
High surprise (novel input):
\begin{aligned}
\text{surprise} &\approx 1 \\
\tau &\approx \frac{\tau_{system}}{1 + \text{scale}} \quad \text{(small)} \\
\frac{dt}{\tau} &\approx \text{large} \quad \text{(fast response)}
\end{aligned}

Low surprise (expected input):
\begin{aligned}
\text{surprise} &\approx 0 \\
\tau &\approx \tau_{system} \quad \text{(large)} \\
\frac{dt}{\tau} &\approx \text{small} \quad \text{(slow integration)}
\end{aligned}

Stability Clamping
For numerical stability:
\begin{aligned}
\tau_{effective} &= \text{clamp}(\tau_{dynamic}, \text{min}=0.01, \text{max}=50.0) \\
\frac{dt}{\tau} &= \text{clamp}\left(\frac{dt}{\tau_{effective} + dt}, \text{min}=0.01, \text{max}=0.5\right)
\end{aligned}

Low-Rank Fast Weights
Decomposition
Instead of storing full weight matrix W (hidden × input), DREAM uses:
W_{fast} = U \times V^T

Where:
- U: (batch, hidden, rank) - learned via Hebbian plasticity
- V: (input, rank) - fixed basis
Memory Efficiency
| Representation | Memory |
|---|---|
| Full matrix W | O(hidden × input) |
| Low-rank U × V^T | O(rank × (hidden + input)) |
Example: With hidden=256, input=64, rank=8:

- Full: 256 × 64 = 16,384 parameters
- Low-rank: 8 × (256 + 64) = 2,560 parameters
- Compression: 6.4× reduction
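The parameter counts above check out numerically, and the full matrix never needs to be materialized unless explicitly requested (a sketch; the batch dimension from the decomposition is dropped for brevity):

```python
import numpy as np

hidden, inp, rank = 256, 64, 8

full_params     = hidden * inp            # 16,384
low_rank_params = rank * (hidden + inp)   # 2,560
ratio = full_params / low_rank_params     # 6.4x compression

# W_fast is implicit: products like (U @ V.T) @ x can be computed as
# U @ (V.T @ x) without ever forming the (hidden, input) matrix.
U = np.zeros((hidden, rank))   # learned via Hebbian plasticity
V = np.zeros((inp, rank))      # fixed basis
W_fast = U @ V.T               # (hidden, input), only if explicitly needed
```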
Update Efficiency
- Only U changes (fast, per-batch)
- V stays fixed (slow, learned)
- Enables efficient meta-learning
Sleep Consolidation
Mechanism
During low-surprise periods, the model "consolidates" memories:
\text{if } \text{avg\_surprise} < \text{threshold}: \quad U_{target} \leftarrow U_{target} + \zeta \cdot (U - U_{target})

Where ζ is the sleep rate.
Purpose
- Stabilizes frequently-used patterns
- Prevents catastrophic forgetting
- Similar to biological memory consolidation during sleep
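The gated anchor update can be sketched as follows (function name, `threshold`, and `zeta` defaults are illustrative):

```python
import numpy as np

def consolidate(U, U_target, avg_surprise, threshold=0.2, zeta=0.01):
    """Nudge the anchor toward current fast weights during calm periods."""
    if avg_surprise < threshold:
        U_target = U_target + zeta * (U - U_target)
    return U_target

U = np.ones((4, 2))
U_target = np.zeros((4, 2))
U_target = consolidate(U, U_target, avg_surprise=0.05)  # low surprise: consolidates
```

Because the decay term of the Hebbian rule pulls U toward U_target, moving the anchor itself is what makes a frequently used pattern permanent.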
Predictive Coding
Prediction Generation
\begin{aligned}
\text{dynamic} &= U \times V^T \\
C_{effective} &= C^T + \text{dynamic} \cdot 0.1 \\
x_{pred} &= \tanh(C_{effective} \times h_{prev}) \times ||x||
\end{aligned}

Error Computation

\text{error} = x - x_{pred}

Design Rationale
- Base matrix C: Stable predictions
- Fast weights U: Input-specific adaptation
- Multiplicative modulation: Preserves structure
- Norm scaling: Appropriate magnitude
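The prediction path can be sketched with NumPy (an illustrative sketch: the dimensions are made up, and the transpose on the low-rank term is an assumption made here so the shapes are consistent with C^T mapping the hidden state to input space):

```python
import numpy as np

rng = np.random.default_rng(1)
hidden, inp, rank = 8, 4, 2

C      = rng.normal(size=(hidden, inp)) * 0.1   # slow base matrix
U      = rng.normal(size=(hidden, rank)) * 0.1  # fast weights
V      = rng.normal(size=(inp, rank)) * 0.1     # fixed basis
h_prev = rng.normal(size=hidden)
x      = rng.normal(size=inp)

dynamic = U @ V.T                  # (hidden, input) low-rank modulation
C_eff   = C.T + dynamic.T * 0.1    # (input, hidden): base plus scaled fast term
x_pred  = np.tanh(C_eff @ h_prev) * np.linalg.norm(x)  # scale to input magnitude
error   = x - x_pred               # drives surprise and plasticity
```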
Summary of Equations
Complete Cell Update
Given input x and previous state (h, U, τ_adaptive):
1. Prediction: x_pred = tanh((C^T + U V^T · 0.1) × h) × ||x||
2. Error: error = x - x_pred
3. Surprise: surprise = σ((||error|| - τ_effective) / γ)
4. Fast weights update: U ← U + dt · [-λ(U - U_target) + η · surprise · (h ⊗ error × V)]
5. Time constant: τ = τ_sys / (1 + surprise × scale)
6. Hidden state update: h ← (1 - dt/τ) · h + (dt/τ) · tanh(input_effect)
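The six steps can be assembled into one function. This is a hedged sketch, not DREAM's implementation: `dream_cell_step`, the `state`/`params` containers, and all constants are illustrative, the surprise threshold uses only the habituation term, and since `input_effect` is left unspecified above, a simple input projection stands in for it here.

```python
import numpy as np

def dream_cell_step(x, h, U, state, params):
    """One full cell update: predict, compare, adapt (illustrative sketch)."""
    C, V, U_target = params["C"], params["V"], params["U_target"]
    dt, tau_sys, scale = params["dt"], params["tau_sys"], params["scale"]
    lam, eta, gamma, beta = params["lam"], params["eta"], params["gamma"], params["beta"]

    # 1-2. Prediction and error
    C_eff = C.T + (U @ V.T).T * 0.1
    x_pred = np.tanh(C_eff @ h) * np.linalg.norm(x)
    error = x - x_pred

    # 3. Surprise against the habituating threshold (adaptive term only)
    err_norm = np.linalg.norm(error)
    state["tau_adaptive"] = min((1 - beta) * state["tau_adaptive"] + beta * err_norm, 0.8)
    surprise = 1.0 / (1.0 + np.exp(-(err_norm - state["tau_adaptive"]) / gamma))

    # 4. Fast-weight (Hebbian) update with normalization
    dU = -lam * (U - U_target) + eta * surprise * np.outer(h, error) @ V
    U = U + dU * dt
    U = U * (params["target_norm"] / (np.linalg.norm(U) + 1e-8))

    # 5. Liquid time constant, clamped for stability
    tau = np.clip(tau_sys / (1.0 + surprise * scale), 0.01, 50.0)
    alpha = np.clip(dt / (tau + dt), 0.01, 0.5)

    # 6. Leaky integration; C_eff.T @ x stands in for the unspecified input_effect
    h = (1.0 - alpha) * h + alpha * np.tanh(C_eff.T @ x)
    return h, U, surprise

rng = np.random.default_rng(0)
hidden, inp, rank = 8, 4, 2
params = dict(
    C=rng.normal(size=(hidden, inp)) * 0.1,
    V=rng.normal(size=(inp, rank)) * 0.1,
    U_target=np.zeros((hidden, rank)),
    dt=0.1, tau_sys=1.0, scale=4.0,
    lam=0.1, eta=0.5, gamma=0.1, beta=0.05,
    target_norm=1.0,
)
state = {"tau_adaptive": 0.5}
h, U = np.zeros(hidden), np.zeros((hidden, rank))
h, U, s = dream_cell_step(rng.normal(size=inp), h, U, state, params)
```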
Next Steps
- Architecture Deep Dive - Implementation details
- API Reference - Class and method documentation
- Configuration Guide - Parameter tuning