Mathematical Foundations
Understand the mathematical theory behind DREAM's adaptive dynamics
This guide covers the mathematical foundations underlying DREAM's adaptive neural dynamics.
Continuous-Time RNN Dynamics
DREAM models hidden state evolution as a differential equation rather than a discrete update:
\tau \cdot \frac{dh}{dt} = -h + h_{target}(x, h, \text{error})

Where:
- h: Hidden state
- τ: Time constant (adaptive)
- h_target: Target state based on input and error
Euler Discretization
Using Euler discretization with step size dt:
\begin{aligned}
h(t+dt) &= h(t) + dt \cdot \frac{-h(t) + h_{target}}{\tau} \\
&= \left(1 - \frac{dt}{\tau}\right) \cdot h(t) + \frac{dt}{\tau} \cdot h_{target}
\end{aligned}

Stability condition: dt/τ < 1 (ensured by clamping).
Interpretation
- (1 - dt/τ): Retention factor (how much of the old state to keep)
- dt/τ: Integration factor (how much of the target to incorporate)
- τ large: Slow integration, stable memory
- τ small: Fast integration, quick response
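The retention/integration split can be sketched in a few lines of Python (a minimal sketch; `euler_step` and the clamp value are illustrative, not part of DREAM's API):

```python
import numpy as np

def euler_step(h, h_target, dt, tau):
    """One Euler step of tau * dh/dt = -h + h_target.

    alpha = dt / tau is clamped below 1 to satisfy the
    stability condition dt/tau < 1.
    """
    alpha = min(dt / tau, 0.99)  # clamp for stability
    return (1.0 - alpha) * h + alpha * h_target

h = np.zeros(4)
h_target = np.ones(4)

# Large tau: slow integration toward the target (moves by dt/tau = 0.01)
h_slow = euler_step(h, h_target, dt=0.1, tau=10.0)
# Small tau: fast integration toward the target (moves by dt/tau = 0.5)
h_fast = euler_step(h, h_target, dt=0.1, tau=0.2)
```

The same `dt`, applied with different `tau`, trades stability for responsiveness.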
Hebbian Learning Rule
Classic Hebbian Principle
"Neurons that fire together, wire together"
\frac{dU}{dt} = -\lambda \cdot (U - U_{target}) + \eta \cdot \text{surprise} \cdot (h \otimes \text{error} \times V)

Where:
- U: Fast weights (left factor)
- U_target: Target weights (consolidation anchor)
- λ: Forgetting rate
- η: Base plasticity
- ⊗: Outer product
- V: Fixed basis (right factor)
Component Breakdown
Decay term:
-\lambda \cdot (U - U_{target})

- Pulls U toward U_target
- Prevents unbounded growth
- Implements "forgetting" of unused patterns

Plasticity term:

\eta \cdot \text{surprise} \cdot (h \otimes \text{error} \times V)

- Outer product h ⊗ error: Correlation between state and error
- Projection onto V: Efficient low-rank update
- Surprise modulation: High surprise → faster learning
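As a shape check, the two terms can be computed directly with NumPy (a sketch with illustrative dimensions and constants; every name here is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inp, rank = 8, 4, 2

U        = rng.normal(size=(hidden, rank))   # fast weights (left factor)
U_target = np.zeros((hidden, rank))          # consolidation anchor
V        = rng.normal(size=(inp, rank))      # fixed basis (right factor)
h        = rng.normal(size=hidden)           # hidden state
error    = rng.normal(size=inp)              # prediction error

lam, eta, surprise = 0.1, 0.5, 0.8           # illustrative values

decay      = -lam * (U - U_target)                    # pull toward anchor
plasticity = eta * surprise * np.outer(h, error) @ V  # low-rank Hebbian term
dU = decay + plasticity                               # shape: (hidden, rank)
```

Note that the outer product (hidden × input) is immediately projected through V, so the update stays rank-limited.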
Update Rule
Using Euler integration:
U_{new} = U + \frac{dU}{dt} \cdot dt

Normalization
After update, normalize to maintain stability:
U_{new} \leftarrow U_{new} \cdot \frac{\text{target\_norm}}{||U_{new}||}

Surprise Computation
Core Idea
Surprise measures how "unexpected" a prediction error is:
\text{surprise} = \sigma\left(\frac{||\text{error}|| - \tau_{effective}}{\gamma}\right)

Where:
- σ: Sigmoid function (range 0 to 1)
- ||error||: Prediction error norm
- τ_effective: Adaptive threshold
- γ: Temperature (smoothness)
Interpretation
| Surprise Value | Meaning |
|---|---|
| ≈ 0 | Error is expected (no update needed) |
| ≈ 0.5 | Error is at threshold |
| ≈ 1 | Error is highly surprising (full update) |
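The table corresponds to a plain sigmoid over the error norm. A minimal sketch (the function name and default `gamma` are illustrative; `tau_effective` is taken as given here and derived in the next subsection):

```python
import math

def surprise(error_norm, tau_effective, gamma=0.1):
    """Sigmoid of how far the error norm exceeds the threshold."""
    return 1.0 / (1.0 + math.exp(-(error_norm - tau_effective) / gamma))

surprise(0.5, 0.5)   # error at threshold -> 0.5
surprise(1.5, 0.5)   # well above threshold -> close to 1
surprise(0.0, 0.5)   # well below threshold -> close to 0
```

Smaller `gamma` sharpens the transition; larger `gamma` makes the update gating more gradual.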
Effective Threshold
The threshold τ_effective combines classical and adaptive components:
\tau_{effective} = 0.3 \cdot \tau_{classical} + 0.7 \cdot \tau_{adaptive}

Classical (entropy-based):
\begin{aligned}
\text{entropy} &= \frac{1}{2} \log(2\pi e \cdot \text{error\_var}) \\
\tau_{classical} &= \tau_0 \cdot (1 + \alpha \cdot \text{entropy})
\end{aligned}

Adaptive (habituation):
\begin{aligned}
\tau_{adaptive} &= (1 - \beta) \cdot \tau_{adaptive} + \beta \cdot ||\text{error}|| \\
\tau_{adaptive} &= \text{clamp}(\tau_{adaptive}, \text{max}=0.8)
\end{aligned}

Error Statistics
Exponential moving average (EMA) for error statistics:
\begin{aligned}
\text{error\_mean} &= (1 - \beta) \cdot \text{error\_mean} + \beta \cdot \text{error} \\
\text{error\_var} &= (1 - \beta) \cdot \text{error\_var} + \beta \cdot (\text{error} - \text{error\_mean})^2
\end{aligned}

Liquid Time-Constant Adaptation
Dynamic Time Constant
The time constant adapts based on surprise:
\tau(\text{surprise}) = \frac{\tau_{system}}{1 + \text{surprise} \times \text{scale}}

Properties
| Property | Value |
|---|---|
| τ(0) | τ_system (baseline) |
| τ(1) | τ_system / (1 + scale) (minimum) |
| Monotonicity | Decreasing in surprise |
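The properties above follow directly from the formula. A one-line sketch (function name and the `scale` default are illustrative):

```python
def liquid_tau(surprise, tau_system=1.0, scale=4.0):
    """tau shrinks as surprise grows: fast response to novelty."""
    return tau_system / (1.0 + surprise * scale)

liquid_tau(0.0)   # baseline: tau_system = 1.0
liquid_tau(1.0)   # minimum: tau_system / (1 + scale) = 0.2
```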
Behavioral Modes
High surprise (novel input):
\begin{aligned}
\text{surprise} &\approx 1 \\
\tau &\approx \frac{\tau_{system}}{1 + \text{scale}} \quad \text{(small)} \\
\frac{dt}{\tau} &\approx \text{large} \quad \text{(fast response)}
\end{aligned}

Low surprise (expected input):
\begin{aligned}
\text{surprise} &\approx 0 \\
\tau &\approx \tau_{system} \quad \text{(large)} \\
\frac{dt}{\tau} &\approx \text{small} \quad \text{(slow integration)}
\end{aligned}

Stability Clamping
For numerical stability:
\begin{aligned}
\tau_{effective} &= \text{clamp}(\tau_{dynamic}, \text{min}=0.01, \text{max}=50.0) \\
\frac{dt}{\tau} &= \text{clamp}\left(\frac{dt}{\tau_{effective} + dt}, \text{min}=0.01, \text{max}=0.5\right)
\end{aligned}

Low-Rank Fast Weights
Decomposition
Instead of storing full weight matrix W (hidden × input), DREAM uses:
W_{fast} = U \times V^T

Where:
- U: (batch, hidden, rank) - learned via Hebbian plasticity
- V: (input, rank) - fixed basis
Memory Efficiency
| Representation | Memory |
|---|---|
| Full matrix W | O(hidden × input) |
| Low-rank U × V^T | O(rank × (hidden + input)) |
Example: With hidden=256, input=64, rank=8:

- Full: 256 × 64 = 16,384 parameters
- Low-rank: 8 × (256 + 64) = 2,560 parameters
- Compression: 6.4× reduction
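The parameter counts above check out numerically, and the full matrix never needs to be materialized unless explicitly requested (a sketch; the batch dimension from the decomposition is dropped for brevity):

```python
import numpy as np

hidden, inp, rank = 256, 64, 8

full_params     = hidden * inp            # 16,384
low_rank_params = rank * (hidden + inp)   # 2,560
ratio = full_params / low_rank_params     # 6.4x compression

# W_fast is implicit: products like (U @ V.T) @ x can be computed as
# U @ (V.T @ x) without ever forming the (hidden, input) matrix.
U = np.zeros((hidden, rank))   # learned via Hebbian plasticity
V = np.zeros((inp, rank))      # fixed basis
W_fast = U @ V.T               # (hidden, input), only if explicitly needed
```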
Update Efficiency
- Only U changes (fast, per-batch)
- V stays fixed (slow, learned)
- Enables efficient meta-learning
Sleep Consolidation
Mechanism
During low-surprise periods, the model "consolidates" memories:
\text{if } \text{avg\_surprise} < \text{threshold}: \quad U_{target} \leftarrow U_{target} + \zeta \cdot (U - U_{target})

Where ζ is the sleep rate.
Purpose
- Stabilizes frequently-used patterns
- Prevents catastrophic forgetting
- Similar to biological memory consolidation during sleep
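The gated anchor update can be sketched as follows (function name, `threshold`, and `zeta` defaults are illustrative):

```python
import numpy as np

def consolidate(U, U_target, avg_surprise, threshold=0.2, zeta=0.01):
    """Nudge the anchor toward current fast weights during calm periods."""
    if avg_surprise < threshold:
        U_target = U_target + zeta * (U - U_target)
    return U_target

U = np.ones((4, 2))
U_target = np.zeros((4, 2))
U_target = consolidate(U, U_target, avg_surprise=0.05)  # low surprise: consolidates
```

Because the decay term of the Hebbian rule pulls U toward U_target, moving the anchor itself is what makes a frequently used pattern permanent.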
Predictive Coding
Prediction Generation
\begin{aligned}
\text{dynamic} &= U \times V^T \\
C_{effective} &= C^T + \text{dynamic} \cdot 0.1 \\
x_{pred} &= \tanh(C_{effective} \times h_{prev}) \times ||x||
\end{aligned}

Error Computation

\text{error} = x - x_{pred}

Design Rationale
- Base matrix C: Stable predictions
- Fast weights U: Input-specific adaptation
- Multiplicative modulation: Preserves structure
- Norm scaling: Appropriate magnitude
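The prediction path can be sketched with NumPy (an illustrative sketch: the dimensions are made up, and the transpose on the low-rank term is an assumption made here so the shapes are consistent with C^T mapping the hidden state to input space):

```python
import numpy as np

rng = np.random.default_rng(1)
hidden, inp, rank = 8, 4, 2

C      = rng.normal(size=(hidden, inp)) * 0.1   # slow base matrix
U      = rng.normal(size=(hidden, rank)) * 0.1  # fast weights
V      = rng.normal(size=(inp, rank)) * 0.1     # fixed basis
h_prev = rng.normal(size=hidden)
x      = rng.normal(size=inp)

dynamic = U @ V.T                  # (hidden, input) low-rank modulation
C_eff   = C.T + dynamic.T * 0.1    # (input, hidden): base plus scaled fast term
x_pred  = np.tanh(C_eff @ h_prev) * np.linalg.norm(x)  # scale to input magnitude
error   = x - x_pred               # drives surprise and plasticity
```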
Summary of Equations
Complete Cell Update
Given input x and previous state (h, U, τ_adaptive):
1. Prediction: x_pred = tanh((C^T + U V^T · 0.1) × h) × ||x||
2. Error: error = x - x_pred
3. Surprise: surprise = σ((||error|| - τ_effective) / γ)
4. Fast weights update: U ← U + dt · [-λ(U - U_target) + η · surprise · (h ⊗ error × V)]
5. Time constant: τ = τ_sys / (1 + surprise × scale)
6. Hidden state update: h ← (1 - dt/τ) · h + (dt/τ) · tanh(input_effect)
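The six steps can be assembled into one function. This is a hedged sketch, not DREAM's implementation: `dream_cell_step`, the `state`/`params` containers, and all constants are illustrative, the surprise threshold uses only the habituation term, and since `input_effect` is left unspecified above, a simple input projection stands in for it here.

```python
import numpy as np

def dream_cell_step(x, h, U, state, params):
    """One full cell update: predict, compare, adapt (illustrative sketch)."""
    C, V, U_target = params["C"], params["V"], params["U_target"]
    dt, tau_sys, scale = params["dt"], params["tau_sys"], params["scale"]
    lam, eta, gamma, beta = params["lam"], params["eta"], params["gamma"], params["beta"]

    # 1-2. Prediction and error
    C_eff = C.T + (U @ V.T).T * 0.1
    x_pred = np.tanh(C_eff @ h) * np.linalg.norm(x)
    error = x - x_pred

    # 3. Surprise against the habituating threshold (adaptive term only)
    err_norm = np.linalg.norm(error)
    state["tau_adaptive"] = min((1 - beta) * state["tau_adaptive"] + beta * err_norm, 0.8)
    surprise = 1.0 / (1.0 + np.exp(-(err_norm - state["tau_adaptive"]) / gamma))

    # 4. Fast-weight (Hebbian) update with normalization
    dU = -lam * (U - U_target) + eta * surprise * np.outer(h, error) @ V
    U = U + dU * dt
    U = U * (params["target_norm"] / (np.linalg.norm(U) + 1e-8))

    # 5. Liquid time constant, clamped for stability
    tau = np.clip(tau_sys / (1.0 + surprise * scale), 0.01, 50.0)
    alpha = np.clip(dt / (tau + dt), 0.01, 0.5)

    # 6. Leaky integration; C_eff.T @ x stands in for the unspecified input_effect
    h = (1.0 - alpha) * h + alpha * np.tanh(C_eff.T @ x)
    return h, U, surprise

rng = np.random.default_rng(0)
hidden, inp, rank = 8, 4, 2
params = dict(
    C=rng.normal(size=(hidden, inp)) * 0.1,
    V=rng.normal(size=(inp, rank)) * 0.1,
    U_target=np.zeros((hidden, rank)),
    dt=0.1, tau_sys=1.0, scale=4.0,
    lam=0.1, eta=0.5, gamma=0.1, beta=0.05,
    target_norm=1.0,
)
state = {"tau_adaptive": 0.5}
h, U = np.zeros(hidden), np.zeros((hidden, rank))
h, U, s = dream_cell_step(rng.normal(size=inp), h, U, state, params)
```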
Next Steps
- Architecture Deep Dive - Implementation details
- API Reference - Class and method documentation
- Configuration Guide - Parameter tuning