# Benchmarks

Comprehensive performance comparison of DREAM against LSTM and Transformer baselines.

## Overview
We evaluated DREAM on three audio tasks against two standard baselines:
- LSTM (2-layer, 256 hidden)
- Transformer (4-layer, d_model=128)
## Test Suite
### Test 1: Basic ASR Reconstruction
Task: Reconstruct mel spectrograms from 9 audio files.
Setup:
- Input: 80 mel bins, 1014 frames
- Training: 100 epochs
- Metric: Reconstruction loss (MSE)
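The metric is plain MSE over all mel bins and frames. A minimal sketch of how it can be computed (the array shapes follow the setup above; the data here is synthetic, not the benchmark audio):

```python
import numpy as np

def reconstruction_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error averaged over all mel bins and frames."""
    return float(np.mean((pred - target) ** 2))

# Synthetic stand-ins with the benchmark's shape: 80 mel bins x 1014 frames.
rng = np.random.default_rng(0)
target = rng.standard_normal((80, 1014))
pred = target + 0.1 * rng.standard_normal((80, 1014))

loss = reconstruction_mse(pred, target)  # ~0.01 for 0.1-scale error
```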
Results:
| Model | Parameters | Initial Loss | Final Loss | Improvement | Time |
|---|---|---|---|---|---|
| DREAM | 82K | 0.9298 | 0.0010 | 99.9% | 502s |
| LSTM | 893K | 0.7889 | 0.0478 | 93.9% | 9s |
| Transformer | 551K | 0.9416 | 0.0696 | 92.6% | 11s |
Training curves (loss by epoch):

| Epoch | DREAM | LSTM | Transformer |
|---|---|---|---|
| 20 | 0.024 | 0.210 | 0.190 |
| 40 | 0.006 | 0.131 | 0.133 |
| 60 | 0.003 | 0.089 | 0.104 |
| 80 | 0.002 | 0.063 | 0.084 |
| 100 | 0.001 | 0.048 | 0.070 |

Conclusion: DREAM achieves the lowest final loss (99.9% improvement) but requires more training time due to online adaptation.
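The improvement column is straightforward arithmetic on the initial and final losses from the table:

```python
def improvement_pct(initial: float, final: float) -> float:
    """Relative loss reduction, as a percentage of the initial loss."""
    return 100.0 * (initial - final) / initial

# DREAM's and LSTM's numbers from the results table.
dream_improvement = improvement_pct(0.9298, 0.0010)  # ~99.9%
lstm_improvement = improvement_pct(0.7889, 0.0478)   # ~93.9%
```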
### Test 2: Speaker Adaptation
Task: Adapt to speaker change mid-sequence.
Setup:
- Concatenate two different speakers
- Measure steps to recover baseline loss
- Target: <50 steps (Spec 7.5)
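One way to measure "steps to recover" is to scan the post-switch loss trace for the first step that falls back within a tolerance of the pre-switch baseline. A hedged sketch (the `tol` value and the toy loss trace are illustrative, not the benchmark's actual implementation):

```python
def adaptation_steps(losses, baseline, switch_idx, tol=0.05):
    """Steps after the speaker switch until the loss returns to
    within `tol` (relative) of the pre-switch baseline."""
    for step, loss in enumerate(losses[switch_idx:]):
        if loss <= baseline * (1 + tol):
            return step
    return len(losses) - switch_idx  # never recovered within the trace

# Hypothetical trace: stable, spike at the switch (index 2), quick recovery.
trace = [1.0, 1.0, 1.9, 1.2, 1.02, 1.0]
steps = adaptation_steps(trace, baseline=1.0, switch_idx=2)
```

A model that never leaves the tolerance band after the switch scores 0 steps, which is how all three models score in the results below.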
Results:
| Model | Baseline Loss | Max Post-Switch | Adapt Steps | Surprise Spike |
|---|---|---|---|---|
| DREAM | 1.2078 | 1.9657 | 0 | 0.119 |
| LSTM | 1.0435 | 1.5807 | 0 | N/A |
| Transformer | 1.1963 | 1.6963 | 0 | N/A |
Conclusion: All models adapt instantly (0 steps), but only DREAM detects change via surprise spike.
### Test 3: Noise Robustness
Task: Reconstruction with additive white noise.
Setup:
- SNR levels: 20 dB, 10 dB, 5 dB, 0 dB
- Metric: Loss ratio (loss at 10 dB SNR ÷ clean loss)
- Target: <3× ratio
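Additive white noise at a target SNR can be generated by scaling Gaussian noise to the required power relative to the signal. A minimal sketch (the sine signal is a synthetic stand-in for the benchmark audio):

```python
import numpy as np

def add_white_noise(signal: np.ndarray, snr_db: float, rng) -> np.ndarray:
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = rng.standard_normal(signal.shape) * np.sqrt(noise_power)
    return signal + noise

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 100, 16000))  # 1 s of a tone at 16 kHz
noisy = add_white_noise(clean, snr_db=10.0, rng=rng)
# The loss ratio above is then loss(model on noisy) / loss(model on clean).
```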
Results:
| Model | Clean (20dB) | 10dB Loss | Ratio | Surprise Response |
|---|---|---|---|---|
| DREAM | 1.2308 | 1.3390 | 1.09× | ❌ No |
| LSTM | 1.0163 | 1.1052 | 1.09× | N/A |
| Transformer | 1.2867 | 1.3757 | 1.07× | N/A |
Surprise response by SNR (DREAM):

| SNR | Max Surprise |
|---|---|
| 20 dB | 0.973 |
| 10 dB | 0.987 |
| 5 dB | 0.995 |
| 0 dB | 1.000 |

Conclusion: DREAM is stable under noise (1.09× ratio); surprise increases with noise level but saturates near 1.0.
## Summary

### Overall Performance
| Test | DREAM | LSTM | Transformer | Target |
|---|---|---|---|---|
| ASR Improvement | 99.9% ✅ | 93.9% | 92.6% | >90% |
| Adaptation Steps | 0 ✅ | 0 | 0 | <50 |
| Noise Ratio | 1.09× ✅ | 1.09× | 1.07× | <3× |
### Key Findings
- DREAM achieves best reconstruction quality (99.9% vs 93-94%)
- Instant adaptation to speaker changes (0 steps)
- Stable under noise (1.09× ratio at 10dB SNR)
- Fewer parameters (82K vs 551-893K)
### Trade-offs
| Aspect | DREAM | Baselines |
|---|---|---|
| Quality | ✅ Best | Good |
| Training Speed | ❌ Slower (502s) | ✅ Fast (9-11s) |
| Parameters | ✅ 82K | ❌ 551-893K |
| Online Adaptation | ✅ Yes | ❌ No |
## Running Benchmarks

### Quick Start
```bash
# Run all benchmarks (15-30 minutes)
uv run python tests/benchmarks/run_all.py

# Individual tests
uv run python tests/benchmarks/test_01_basic_asr.py
uv run python tests/benchmarks/test_02_speaker_adaptation.py
uv run python tests/benchmarks/test_03_noise_robustness.py

# Generate visualizations
uv run python tests/benchmarks/visualize.py
```

### Output Files
After running:

```text
tests/benchmarks/results/
├── results_basic_asr.json
├── results_speaker_adaptation.json
├── results_noise_robustness.json
├── figures/
│   ├── fig1_training_curves.pdf
│   ├── fig2_speaker_adaptation.pdf
│   ├── fig3_noise_robustness.pdf
│   └── benchmark_table.tex
└── BENCHMARK_REPORT.md
```

### Hardware Requirements
Minimum:
- 8GB RAM
- CPU (slower)
Recommended:
- GPU with 4GB+ VRAM
- 16GB RAM
- SSD
Estimated Runtime:
- Test 1 (ASR): ~5-10 min per model
- Test 2 (Adaptation): ~1 min per model
- Test 3 (Noise): ~2 min per model
- Total: ~15-30 minutes
## Reproducibility

### Environment
```text
# Python
Python 3.10+

# Dependencies
torch>=2.0.0
numpy>=1.24.0
librosa>=0.10.0  # for audio tests
```

### Dataset
10 audio files from an LJ Speech-like corpus:
- Sample rate: 16kHz
- Features: Mel spectrogram (80 bins)
- Duration: ~3-10 seconds each
### Hyperparameters
```python
DREAMConfig(
    input_dim=80,
    hidden_dim=256,
    rank=16,
    forgetting_rate=0.005,
    base_plasticity=0.5,
    base_threshold=0.3,
    ltc_tau_sys=5.0,
    ltc_surprise_scale=5.0,
)
```

## Next Steps
- Guides — Tutorials and examples
- Technical Report — Implementation details
- Benchmark Results — Full analysis