Manifestro Docs

Mathematical Foundations

Understand the mathematical theory behind DREAM's adaptive dynamics

This guide covers the mathematical foundations underlying DREAM's adaptive neural dynamics.

Continuous-Time RNN Dynamics

DREAM models hidden state evolution as a differential equation rather than a discrete update:

\tau \cdot \frac{dh}{dt} = -h + h_{target}(x, h, \text{error})

Where:

  • h: Hidden state
  • τ: Time constant (adaptive)
  • h_target: Target state based on input and error

Euler Discretization

Using Euler discretization with step size dt:

\begin{aligned}
h(t+dt) &= h(t) + dt \cdot \frac{-h(t) + h_{target}}{\tau} \\
        &= \left(1 - \frac{dt}{\tau}\right) \cdot h(t) + \frac{dt}{\tau} \cdot h_{target}
\end{aligned}

Stability condition: dt/τ < 1, which keeps the retention factor (1 − dt/τ) positive (ensured by clamping)

Interpretation

  • (1 - dt/τ): Retention factor (how much of old state to keep)
  • dt/τ: Integration factor (how much of target to incorporate)
  • τ large: Slow integration, stable memory
  • τ small: Fast integration, quick response
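The discretized update above can be sketched numerically. This is a minimal illustration, not the DREAM implementation; the function name `euler_step` and the clamp value are assumptions:

```python
import numpy as np

def euler_step(h, h_target, dt, tau):
    """One Euler step of tau * dh/dt = -h + h_target.

    The retention factor (1 - dt/tau) keeps the old state;
    the integration factor dt/tau blends in the target.
    """
    ratio = min(dt / tau, 0.99)  # clamp so dt/tau < 1 for stability
    return (1.0 - ratio) * h + ratio * h_target

h = np.zeros(4)
h_target = np.ones(4)
for _ in range(100):
    h = euler_step(h, h_target, dt=0.1, tau=1.0)
# h approaches h_target exponentially as steps accumulate
```

Note that a large `tau` shrinks `ratio`, so the state drifts slowly toward the target (stable memory), while a small `tau` tracks the target quickly.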

Hebbian Learning Rule

Classic Hebbian Principle

"Neurons that fire together, wire together"

DREAM implements a decaying, surprise-gated variant of this principle:

\frac{dU}{dt} = -\lambda \cdot (U - U_{target}) + \eta \cdot \text{surprise} \cdot (h \otimes \text{error}) \, V

Where:

  • U: Fast weights (left factor)
  • U_target: Target weights (consolidation anchor)
  • λ: Forgetting rate
  • η: Base plasticity
  • ⊗: Outer product
  • V: Fixed basis (right factor)

Component Breakdown

Decay term:

-\lambda \cdot (U - U_{target})
  • Pulls U toward U_target
  • Prevents unbounded growth
  • Implements "forgetting" of unused patterns

Plasticity term:

\eta \cdot \text{surprise} \cdot (h \otimes \text{error}) \, V
  • Outer product h ⊗ error: Correlation between state and error
  • Projection onto V: Efficient low-rank update
  • Surprise modulation: High surprise → faster learning

Update Rule

Using Euler integration:

U_{new} = U + \frac{dU}{dt} \cdot dt

Normalization

After update, normalize to maintain stability:

U_{new} \leftarrow U_{new} \cdot \frac{\text{target\_norm}}{||U_{new}||}
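Putting the decay term, plasticity term, Euler step, and normalization together gives a sketch like the following. Shapes, hyperparameter defaults, and the function name are illustrative assumptions:

```python
import numpy as np

def hebbian_update(U, U_target, h, error, V,
                   lam=0.1, eta=0.5, surprise=1.0,
                   dt=0.1, target_norm=1.0):
    """One Euler step of the Hebbian fast-weight ODE, then renormalize.

    U:        (hidden, rank) fast weights
    U_target: (hidden, rank) consolidation anchor
    h:        (hidden,) hidden state
    error:    (input,) prediction error
    V:        (input, rank) fixed basis
    """
    decay = -lam * (U - U_target)                          # pull toward anchor
    plasticity = eta * surprise * np.outer(h, error) @ V   # low-rank projection
    U_new = U + dt * (decay + plasticity)
    # normalize to maintain stability
    return U_new * (target_norm / (np.linalg.norm(U_new) + 1e-8))

rng = np.random.default_rng(0)
U = rng.normal(size=(6, 2))
U_new = hebbian_update(U, np.zeros((6, 2)),
                       h=rng.normal(size=6),
                       error=rng.normal(size=4),
                       V=rng.normal(size=(4, 2)))
```

The outer product `np.outer(h, error)` is (hidden, input); multiplying by `V` projects it down to a (hidden, rank) update, which is what makes the rule cheap.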

Surprise Computation

Core Idea

Surprise measures how "unexpected" a prediction error is:

\text{surprise} = \sigma\left(\frac{||\text{error}|| - \tau_{effective}}{\gamma}\right)

Where:

  • σ: Sigmoid function (range 0 to 1)
  • ||error||: Prediction error norm
  • τ_effective: Adaptive threshold
  • γ: Temperature (smoothness)

Interpretation

Surprise value    Meaning
≈ 0               Error is expected (no update needed)
≈ 0.5             Error is at the threshold
≈ 1               Error is highly surprising (full update)
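The sigmoid gating is a one-liner; here is a sketch with an illustrative temperature (`tau_effective` is treated as a given input, since its computation is covered below):

```python
import math

def surprise(error_norm, tau_effective, gamma=0.1):
    """Sigmoid-squashed distance of the error norm from the threshold."""
    return 1.0 / (1.0 + math.exp(-(error_norm - tau_effective) / gamma))

# error far below threshold -> expected -> surprise near 0
# error at threshold        -> surprise = 0.5
# error far above threshold -> novel    -> surprise near 1
```

A smaller `gamma` sharpens the transition toward a hard threshold; a larger one makes the gate smoother.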

Effective Threshold

The threshold τ_effective combines classical and adaptive components:

\tau_{effective} = 0.3 \cdot \tau_{classical} + 0.7 \cdot \tau_{adaptive}

Classical (entropy-based):

\begin{aligned}
\text{entropy} &= \frac{1}{2} \log(2\pi e \cdot \text{error\_var}) \\
\tau_{classical} &= \tau_0 \cdot (1 + \alpha \cdot \text{entropy})
\end{aligned}

Adaptive (habituation):

\begin{aligned}
\tau_{adaptive} &= (1 - \beta) \cdot \tau_{adaptive} + \beta \cdot ||\text{error}|| \\
\tau_{adaptive} &= \text{clamp}(\tau_{adaptive}, \text{max}=0.8)
\end{aligned}

Error Statistics

Exponential moving average (EMA) for error statistics:

\begin{aligned}
\text{error\_mean} &= (1 - \beta) \cdot \text{error\_mean} + \beta \cdot \text{error} \\
\text{error\_var} &= (1 - \beta) \cdot \text{error\_var} + \beta \cdot (\text{error} - \text{error\_mean})^2
\end{aligned}
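The EMA statistics and the two threshold components can be combined into a small stateful sketch. The class name, initial values, and hyperparameter defaults (`tau0`, `alpha`, `beta`) are assumptions for illustration:

```python
import math

class ErrorStats:
    """EMA error statistics feeding the effective surprise threshold."""

    def __init__(self, tau0=0.3, alpha=0.1, beta=0.05):
        self.tau0, self.alpha, self.beta = tau0, alpha, beta
        self.error_mean = 0.0
        self.error_var = 1.0
        self.tau_adaptive = 0.3

    def update(self, error_norm):
        b = self.beta
        # EMA of error mean and variance
        self.error_mean = (1 - b) * self.error_mean + b * error_norm
        self.error_var = (1 - b) * self.error_var \
            + b * (error_norm - self.error_mean) ** 2
        # classical, entropy-based threshold
        entropy = 0.5 * math.log(2 * math.pi * math.e
                                 * max(self.error_var, 1e-8))
        tau_classical = self.tau0 * (1 + self.alpha * entropy)
        # adaptive, habituation-based threshold (clamped at 0.8)
        self.tau_adaptive = min((1 - b) * self.tau_adaptive + b * error_norm,
                                0.8)
        # blend: 30% classical, 70% adaptive
        return 0.3 * tau_classical + 0.7 * self.tau_adaptive

stats = ErrorStats()
for _ in range(200):
    tau_eff = stats.update(0.5)  # a constant error habituates the threshold
```

Feeding the same error repeatedly drives `tau_adaptive` toward that error's norm, so a once-surprising error stops registering as surprising.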

Liquid Time-Constant Adaptation

Dynamic Time Constant

The time constant adapts based on surprise:

\tau(\text{surprise}) = \frac{\tau_{system}}{1 + \text{surprise} \times \text{scale}}

Properties

Property        Value
τ(0)            τ_system (baseline)
τ(1)            τ_system / (1 + scale) (minimum)
Monotonicity    Decreasing in surprise

Behavioral Modes

High surprise (novel input):

\begin{aligned}
\text{surprise} &\approx 1 \\
\tau &\approx \frac{\tau_{system}}{1 + \text{scale}} \quad \text{(small)} \\
\frac{dt}{\tau} &\approx \text{large} \quad \text{(fast response)}
\end{aligned}

Low surprise (expected input):

\begin{aligned}
\text{surprise} &\approx 0 \\
\tau &\approx \tau_{system} \quad \text{(large)} \\
\frac{dt}{\tau} &\approx \text{small} \quad \text{(slow integration)}
\end{aligned}

Stability Clamping

For numerical stability:

\begin{aligned}
\tau_{effective} &= \text{clamp}(\tau_{dynamic}, \text{min}=0.01, \text{max}=50.0) \\
\frac{dt}{\tau} &= \text{clamp}\left(\frac{dt}{\tau_{effective} + dt}, \text{min}=0.01, \text{max}=0.5\right)
\end{aligned}
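The adaptation formula and both clamps fit in a few lines. The defaults for `tau_system` and `scale` below are illustrative, not DREAM's values:

```python
def liquid_tau(surprise, tau_system=10.0, scale=4.0, dt=0.1):
    """Surprise-modulated time constant with stability clamping.

    Returns (tau_effective, dt_over_tau).
    """
    tau_dynamic = tau_system / (1.0 + surprise * scale)
    tau_effective = min(max(tau_dynamic, 0.01), 50.0)   # clamp tau itself
    dt_over_tau = min(max(dt / (tau_effective + dt), 0.01), 0.5)
    return tau_effective, dt_over_tau

tau_hi, step_hi = liquid_tau(surprise=1.0)  # novel input: small tau, big step
tau_lo, step_lo = liquid_tau(surprise=0.0)  # expected input: large tau, small step
```

The second clamp caps `dt/τ` at 0.5, which guarantees the Euler stability condition regardless of how small `τ` becomes.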

Low-Rank Fast Weights

Decomposition

Instead of storing full weight matrix W (hidden × input), DREAM uses:

W_{fast} = U \times V^T

Where:

  • U: (batch, hidden, rank) - learned via Hebbian plasticity
  • V: (input, rank) - fixed basis

Memory Efficiency

Representation      Memory
Full matrix W       O(hidden × input)
Low-rank U × V^T    O(rank × (hidden + input))

Example: With hidden=256, input=64, rank=8:

  • Full: 256 × 64 = 16,384 parameters
  • Low-rank: 8 × (256 + 64) = 2,560 parameters
  • Compression: 6.4× reduction

Update Efficiency

  • Only U changes (fast, per-batch)
  • V stays fixed (slow, learned)
  • Enables efficient meta-learning
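The worked example above can be checked directly; this sketch uses the same sizes (hidden=256, input=64, rank=8):

```python
import numpy as np

hidden, inp, rank = 256, 64, 8
rng = np.random.default_rng(0)
U = rng.normal(size=(hidden, rank))  # learned via Hebbian plasticity
V = rng.normal(size=(inp, rank))     # fixed basis

W_fast = U @ V.T                     # (hidden, input), materialized here
                                     # only for illustration

full_params = hidden * inp                  # 256 * 64
low_rank_params = rank * (hidden + inp)     # 8 * (256 + 64)
compression = full_params / low_rank_params # 6.4x for these sizes
```

In practice the product `U @ V.T` need never be stored: applying the fast weights to a vector `x` as `U @ (V.T @ x)` costs O(rank × (hidden + input)) per step as well.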

Sleep Consolidation

Mechanism

During low-surprise periods, the model "consolidates" memories:

\text{if } \text{avg\_surprise} < \text{threshold}: \quad U_{target} \leftarrow U_{target} + \zeta \cdot (U - U_{target})

Where ζ is the sleep rate.

Purpose

  • Stabilizes frequently-used patterns
  • Prevents catastrophic forgetting
  • Similar to biological memory consolidation during sleep
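The consolidation gate is a single conditional update. The `threshold` and `zeta` values here are illustrative assumptions:

```python
import numpy as np

def consolidate(U, U_target, avg_surprise, threshold=0.2, zeta=0.01):
    """Move the anchor U_target toward U during low-surprise periods.

    Only triggers when the recent average surprise is below threshold,
    i.e. when the current fast weights are reliably useful.
    """
    if avg_surprise < threshold:
        U_target = U_target + zeta * (U - U_target)
    return U_target

U = np.ones((4, 2))
U_target = np.zeros((4, 2))
U_target = consolidate(U, U_target, avg_surprise=0.05)  # low surprise: consolidates
```

Because the decay term in the Hebbian rule pulls U toward U_target, moving the anchor toward well-used patterns makes them the new resting point rather than letting them decay away.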

Predictive Coding

Prediction Generation

\begin{aligned}
\text{dynamic} &= U \times V^T \\
C_{effective} &= C^T + \text{dynamic} \cdot 0.1 \\
x_{pred} &= \tanh(C_{effective} \times h_{prev}) \times ||x||
\end{aligned}

Error Computation

\text{error} = x - x_{pred}

Design Rationale

  • Base matrix C: Stable predictions
  • Fast weights U: Input-specific adaptation
  • Multiplicative modulation: Preserves structure
  • Norm scaling: Appropriate magnitude
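The prediction and error steps can be sketched as follows. Shape conventions are an assumption here: the fast-weight product is transposed so that the effective matrix maps hidden → input, matching C^T:

```python
import numpy as np

def predict(x, h_prev, C, U, V, mod=0.1):
    """Predictive-coding step: generate x_pred, then the error.

    C: (hidden, input) base matrix; U: (hidden, rank); V: (input, rank).
    """
    dynamic = (U @ V.T).T               # (input, hidden) fast-weight modulation
    C_eff = C.T + mod * dynamic         # stable base + small dynamic term
    x_pred = np.tanh(C_eff @ h_prev) * np.linalg.norm(x)  # norm scaling
    error = x - x_pred
    return x_pred, error

rng = np.random.default_rng(0)
hidden, inp, rank = 8, 4, 2
x = rng.normal(size=inp)
x_pred, error = predict(x, rng.normal(size=hidden),
                        rng.normal(size=(hidden, inp)),
                        rng.normal(size=(hidden, rank)),
                        rng.normal(size=(inp, rank)))
```

The 0.1 factor keeps the fast weights a small perturbation on top of the stable base matrix, so the learned structure of C dominates the prediction.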

Summary of Equations

Complete Cell Update

Given input x and previous state (h, U, τ_adaptive):

  1. Prediction: x_pred = tanh((C^T + U V^T · 0.1) × h) × ||x||

  2. Error: error = x - x_pred

  3. Surprise: surprise = σ((||error|| - τ_effective) / γ)

  4. Fast weights update: U ← U + dt · [-λ(U - U_target) + η · surprise · (h ⊗ error × V)]

  5. Time constant: τ = τ_sys / (1 + surprise × scale)

  6. Hidden state update: h ← (1 - dt/τ) · h + (dt/τ) · tanh(input_effect), where input_effect is the input-driven drive toward h_target
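The six steps above can be combined into one sketch of a full cell update. This is a simplified illustration, not the reference implementation: all hyperparameter values are assumed, the surprise threshold uses only the adaptive (habituation) component, and the input projection `C @ x` stands in for the unspecified input_effect:

```python
import numpy as np

def dream_cell_step(x, h, U, state, C, V, U_target,
                    dt=0.1, gamma=0.1, lam=0.1, eta=0.5,
                    tau_system=10.0, scale=4.0, beta=0.05):
    """One full cell update following steps 1-6 above (simplified)."""
    # 1. Prediction
    C_eff = C.T + 0.1 * (U @ V.T).T
    x_pred = np.tanh(C_eff @ h) * np.linalg.norm(x)
    # 2. Error
    error = x - x_pred
    err_norm = np.linalg.norm(error)
    # 3. Surprise (habituating threshold only, kept in `state`)
    state["tau_adaptive"] = min((1 - beta) * state["tau_adaptive"]
                                + beta * err_norm, 0.8)
    surprise = 1.0 / (1.0 + np.exp(-(err_norm - state["tau_adaptive"]) / gamma))
    # 4. Fast-weight (Hebbian) update
    dU = -lam * (U - U_target) + eta * surprise * np.outer(h, error) @ V
    U = U + dt * dU
    # 5. Liquid time constant
    tau = tau_system / (1.0 + surprise * scale)
    ratio = np.clip(dt / (tau + dt), 0.01, 0.5)
    # 6. Hidden state update (input_effect assumed to be a projection of x)
    input_effect = C @ x
    h = (1 - ratio) * h + ratio * np.tanh(input_effect)
    return h, U, surprise

hidden, inp, rank = 8, 4, 2
rng = np.random.default_rng(0)
state = {"tau_adaptive": 0.3}
h = np.zeros(hidden)
U = np.zeros((hidden, rank))
C = rng.normal(size=(hidden, inp)) * 0.1
V = rng.normal(size=(inp, rank))
for _ in range(5):
    h, U, surprise = dream_cell_step(rng.normal(size=inp), h, U, state,
                                     C, V, np.zeros((hidden, rank)))
```

Note how the three adaptive quantities interact: the error drives the surprise, the surprise scales both the Hebbian step (step 4) and the integration rate (step 5), and the habituating threshold in `state` slowly absorbs persistent errors.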
