# Benchmarks

Comprehensive performance comparison of DREAM against LSTM and Transformer baselines.

## Overview
We evaluated DREAM on three audio tasks against two standard baselines:
- LSTM (2-layer, 256 hidden)
- Transformer (4-layer, d_model=128)
## Test Suite
### Test 1: Basic ASR Reconstruction
Task: Reconstruct mel spectrograms from 9 audio files.
Setup:
- Input: 80 mel bins, 1014 frames
- Training: 100 epochs
- Metric: Reconstruction loss (MSE)
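The metric is plain MSE over all mel bins and frames. A minimal sketch of how it can be computed (the array shapes follow the setup above; the data here is synthetic, not the benchmark audio):

```python
import numpy as np

def reconstruction_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error averaged over all mel bins and frames."""
    return float(np.mean((pred - target) ** 2))

# Synthetic stand-ins with the benchmark's shape: 80 mel bins x 1014 frames.
rng = np.random.default_rng(0)
target = rng.standard_normal((80, 1014))
pred = target + 0.1 * rng.standard_normal((80, 1014))

loss = reconstruction_mse(pred, target)  # ~0.01 for 0.1-scale error
```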
Results:
| Model | Parameters | Initial Loss | Final Loss | Improvement | Time |
|---|---|---|---|---|---|
| DREAM | 82K | 0.9298 | 0.0010 | 99.9% | 502s |
| LSTM | 893K | 0.7889 | 0.0478 | 93.9% | 9s |
| Transformer | 551K | 0.9416 | 0.0696 | 92.6% | 11s |
Training curves (loss by epoch):

| Epoch | DREAM | LSTM | Transformer |
|---|---|---|---|
| 20 | 0.024 | 0.210 | 0.190 |
| 40 | 0.006 | 0.131 | 0.133 |
| 60 | 0.003 | 0.089 | 0.104 |
| 80 | 0.002 | 0.063 | 0.084 |
| 100 | 0.001 | 0.048 | 0.070 |

Conclusion: DREAM achieves the lowest final loss (99.9% improvement) but requires more training time due to online adaptation.
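The improvement column is straightforward arithmetic on the initial and final losses from the table:

```python
def improvement_pct(initial: float, final: float) -> float:
    """Relative loss reduction, as a percentage of the initial loss."""
    return 100.0 * (initial - final) / initial

# DREAM's and LSTM's numbers from the results table.
dream_improvement = improvement_pct(0.9298, 0.0010)  # ~99.9%
lstm_improvement = improvement_pct(0.7889, 0.0478)   # ~93.9%
```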
### Test 2: Speaker Adaptation
Task: Adapt to speaker change mid-sequence.
Setup:
- Concatenate two different speakers
- Measure steps to recover baseline loss
- Target: <50 steps (Spec 7.5)
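One way to measure "steps to recover" is to scan the post-switch loss trace for the first step that falls back within a tolerance of the pre-switch baseline. A hedged sketch (the `tol` value and the toy loss trace are illustrative, not the benchmark's actual implementation):

```python
def adaptation_steps(losses, baseline, switch_idx, tol=0.05):
    """Steps after the speaker switch until the loss returns to
    within `tol` (relative) of the pre-switch baseline."""
    for step, loss in enumerate(losses[switch_idx:]):
        if loss <= baseline * (1 + tol):
            return step
    return len(losses) - switch_idx  # never recovered within the trace

# Hypothetical trace: stable, spike at the switch (index 2), quick recovery.
trace = [1.0, 1.0, 1.9, 1.2, 1.02, 1.0]
steps = adaptation_steps(trace, baseline=1.0, switch_idx=2)
```

A model that never leaves the tolerance band after the switch scores 0 steps, which is how all three models score in the results below.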
Results:
| Model | Baseline Loss | Max Post-Switch | Adapt Steps | Surprise Spike |
|---|---|---|---|---|
| DREAM | 1.2078 | 1.9657 | 0 | 0.119 |
| LSTM | 1.0435 | 1.5807 | 0 | N/A |
| Transformer | 1.1963 | 1.6963 | 0 | N/A |
Conclusion: All models adapt instantly (0 steps), but only DREAM detects change via surprise spike.
### Test 3: Noise Robustness
Task: Reconstruction with additive white noise.
Setup:
- SNR levels: 20 dB, 10 dB, 5 dB, 0 dB
- Metric: Loss ratio (loss at 10 dB SNR ÷ clean loss)
- Target: <3× ratio
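Additive white noise at a target SNR can be generated by scaling Gaussian noise to the required power relative to the signal. A minimal sketch (the sine signal is a synthetic stand-in for the benchmark audio):

```python
import numpy as np

def add_white_noise(signal: np.ndarray, snr_db: float, rng) -> np.ndarray:
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = rng.standard_normal(signal.shape) * np.sqrt(noise_power)
    return signal + noise

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 100, 16000))  # 1 s of a tone at 16 kHz
noisy = add_white_noise(clean, snr_db=10.0, rng=rng)
# The loss ratio above is then loss(model on noisy) / loss(model on clean).
```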
Results:
| Model | Clean (20dB) | 10dB Loss | Ratio | Surprise Response |
|---|---|---|---|---|
| DREAM | 1.2308 | 1.3390 | 1.09× | ❌ No |
| LSTM | 1.0163 | 1.1052 | 1.09× | N/A |
| Transformer | 1.2867 | 1.3757 | 1.07× | N/A |
Surprise response by SNR (DREAM):

| SNR | Max Surprise |
|---|---|
| 20 dB | 0.973 |
| 10 dB | 0.987 |
| 5 dB | 0.995 |
| 0 dB | 1.000 |

Conclusion: DREAM is stable under noise (1.09× ratio); surprise increases with noise level but saturates near 1.0.
## Summary

### Overall Performance
| Test | DREAM | LSTM | Transformer | Target |
|---|---|---|---|---|
| ASR Improvement | 99.9% ✅ | 93.9% | 92.6% | >90% |
| Adaptation Steps | 0 ✅ | 0 | 0 | <50 |
| Noise Ratio | 1.09× ✅ | 1.09× | 1.07× | <3× |
### Key Findings
- DREAM achieves best reconstruction quality (99.9% vs 93-94%)
- Instant adaptation to speaker changes (0 steps)
- Stable under noise (1.09× ratio at 10dB SNR)
- Fewer parameters (82K vs 551-893K)
### Trade-offs
| Aspect | DREAM | Baselines |
|---|---|---|
| Quality | ✅ Best | Good |
| Training Speed | ❌ Slower (502s) | ✅ Fast (9-11s) |
| Parameters | ✅ 82K | ❌ 551-893K |
| Online Adaptation | ✅ Yes | ❌ No |
## Running Benchmarks

### Quick Start
```bash
# Run all benchmarks (15-30 minutes)
uv run python tests/benchmarks/run_all.py

# Individual tests
uv run python tests/benchmarks/test_01_basic_asr.py
uv run python tests/benchmarks/test_02_speaker_adaptation.py
uv run python tests/benchmarks/test_03_noise_robustness.py

# Generate visualizations
uv run python tests/benchmarks/visualize.py
```

### Output Files
After running:

```text
tests/benchmarks/results/
├── results_basic_asr.json
├── results_speaker_adaptation.json
├── results_noise_robustness.json
├── figures/
│   ├── fig1_training_curves.pdf
│   ├── fig2_speaker_adaptation.pdf
│   ├── fig3_noise_robustness.pdf
│   └── benchmark_table.tex
└── BENCHMARK_REPORT.md
```

### Hardware Requirements
Minimum:
- 8GB RAM
- CPU (slower)
Recommended:
- GPU with 4GB+ VRAM
- 16GB RAM
- SSD
Estimated Runtime:
- Test 1 (ASR): ~5-10 min per model
- Test 2 (Adaptation): ~1 min per model
- Test 3 (Noise): ~2 min per model
- Total: ~15-30 minutes
## Reproducibility

### Environment
```text
# Python
Python 3.10+

# Dependencies
torch>=2.0.0
numpy>=1.24.0
librosa>=0.10.0  # for audio tests
```

### Dataset
10 audio files from an LJ Speech-like corpus:
- Sample rate: 16kHz
- Features: Mel spectrogram (80 bins)
- Duration: ~3-10 seconds each
### Hyperparameters
```python
DREAMConfig(
    input_dim=80,
    hidden_dim=256,
    rank=16,
    forgetting_rate=0.005,
    base_plasticity=0.5,
    base_threshold=0.3,
    ltc_tau_sys=5.0,
    ltc_surprise_scale=5.0,
)
```

## Next Steps
- Guides — Tutorials and examples
- Technical Report — Implementation details
- Benchmark Results — Full analysis