# FAQ

Frequently asked questions about DREAM (Dynamic Recall and Elastic Adaptive Memory).
## General Questions
Q: How does DREAM compare to LSTM?
A: DREAM has several advantages over LSTM:
| Feature | LSTM | DREAM |
|---|---|---|
| Time constants | Fixed | Adaptive (LTC) |
| Learning rate | Global (optimizer) | Local (per synapse) |
| Memory update | Always same speed | Surprise-modulated |
| State representation | Hidden only | Hidden + Fast Weights |
| Adaptation during inference | ❌ No | ✅ Yes |
DREAM excels at online learning and non-stationary data where patterns change over time. LSTM is better for static patterns and has more mature tooling.
Q: Can I use DREAM for text/NLP tasks?
A: Yes! DREAM works well for sequential data including text. For NLP:
```python
# Use embeddings as input
embedding = nn.Embedding(vocab_size, embed_dim)
model = DREAM(input_dim=embed_dim, hidden_dim=256)

# Process
embedded = embedding(token_ids)  # (batch, seq_len, embed_dim)
output, state = model(embedded)
```

Common NLP applications:
- Language modeling
- Sequence labeling
- Text classification
- Machine translation (encoder-decoder)
Q: Is DREAM compatible with transformers?
A: Yes! You can combine DREAM with transformers:
Option 1: DREAM + Attention

```python
class HybridModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.dream = DREAM(embed_dim, hidden_dim)
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads)
```

Option 2: DREAM as encoder, transformer as decoder

```python
encoder = DREAM(input_dim, hidden_dim)
decoder = TransformerDecoder(...)
```

Option 3: Replace self-attention with DREAM

```python
# In a transformer encoder layer, instead of self-attention:
self.dream = DREAM(d_model, d_model)
```

Q: What's the best rank value?
A: It depends on your needs:
| Rank | Use Case | Memory |
|---|---|---|
| 4-8 | Compression, speed | Low |
| 16 | Balanced | Medium |
| 32 | Expressivity | High |
Recommendation: Start with rank=16. Adjust based on:
- Memory constraints: Lower rank
- Complex patterns: Higher rank
- Speed requirements: Lower rank
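To see why rank trades memory for expressivity, here is a rough back-of-the-envelope sketch. It assumes the fast weights are stored as a low-rank factorization `U ≈ A @ B.T` with factors of shape `(hidden_dim, rank)` — an illustrative assumption, not necessarily DREAM's exact parameterization:

```python
# Parameter count of a rank-r fast-weight factorization vs. a dense matrix.
# Assumes U (hidden_dim x hidden_dim) is factored as A @ B.T with A, B of
# shape (hidden_dim, rank) -- an assumption for illustration only.
def fast_weight_params(hidden_dim: int, rank: int) -> int:
    return 2 * hidden_dim * rank  # entries in the A and B factors

hidden_dim = 256
dense = hidden_dim * hidden_dim  # entries in a full dense U

for rank in (4, 8, 16, 32):
    low = fast_weight_params(hidden_dim, rank)
    print(f"rank={rank:2d}: {low:6d} params ({low / dense:.1%} of dense)")
```

At `hidden_dim=256`, rank 16 needs 8192 parameters versus 65536 for a dense matrix, which is why lower ranks cut memory so sharply.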
Q: Can I train DREAM on CPU?
A: Yes, but GPU is recommended for:
- Large sequences (seq_len > 100)
- Large batch sizes (batch_size > 16)
- Deep models (multiple layers)
CPU training tips:
```python
# Smaller model
config = DREAMConfig(hidden_dim=64, rank=8)

# Smaller batches
batch_size = 4

# Shorter sequences
seq_len = 50
```

Q: Does DREAM support mixed precision?
A: Yes! Use PyTorch's AMP:
```python
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for batch in dataloader:
    optimizer.zero_grad()
    with autocast():
        output, state = model(batch)
        loss = criterion(output, target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Benefits:
- 2-3x faster on Volta/Ampere GPUs
- Reduced memory usage
- Larger batch sizes possible
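When memory still caps your batch size, gradient accumulation gives the same effective batch by summing scaled micro-batch gradients before each optimizer step. A self-contained sketch with a plain `nn.Linear` stand-in (the model, shapes, and step count are illustrative, not part of the DREAM API):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in model and data (illustrative only).
model = nn.Linear(4, 1)
data, target = torch.randn(8, 4), torch.randn(8, 1)
criterion = nn.MSELoss()

# Reference: gradient of the full batch of 8.
model.zero_grad()
criterion(model(data), target).backward()
full_grad = model.weight.grad.clone()

# Accumulate over 4 micro-batches of 2; scaling each loss by
# 1/accumulation_steps makes the summed gradients match full-batch.
accumulation_steps = 4
model.zero_grad()
for i in range(accumulation_steps):
    xb, yb = data[2 * i:2 * i + 2], target[2 * i:2 * i + 2]
    loss = criterion(model(xb), yb) / accumulation_steps
    loss.backward()  # gradients add up across backward() calls
# optimizer.step() would go here, once per accumulation cycle

print(torch.allclose(full_grad, model.weight.grad, atol=1e-5))
```

This works with AMP too: scale the loss before `scaler.scale(loss).backward()` and call `scaler.step(optimizer)` once per cycle.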
## Training Questions
Q: How do I save/load a trained model?
A: Use PyTorch's state dict:
```python
# Save
torch.save({
    'model': model.state_dict(),
    'config': config,
    'optimizer': optimizer.state_dict(),
    'epoch': epoch,
}, 'checkpoint.pt')

# Load
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']
```

Q: Can I freeze fast weights during training?
A: Yes, freeze specific parameters:
```python
# Freeze U (fast weights)
for param in model.cell.U:
    param.requires_grad = False

# Freeze all model parameters
for param in model.parameters():
    param.requires_grad = False

# Unfreeze specific parameters
for param in model.classifier.parameters():
    param.requires_grad = True
```

Q: What optimizer works best?
A: We recommend AdamW:
```python
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4
)
```

Alternatives:
- Adam: Good default, but less regularization
- SGD with momentum: For fine-tuning
- LAMB: For very large models
Q: How do I use learning rate scheduling?
A: Use PyTorch schedulers:
```python
# Reduce on plateau
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=5
)

# Cosine annealing
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs
)

# Step decay
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=30, gamma=0.1
)

# In the training loop
for epoch in range(num_epochs):
    train()
    validate()
    scheduler.step()  # ReduceLROnPlateau needs scheduler.step(val_loss)
```

## Architecture Questions
Q: How many layers should I use?
A: Depends on task complexity:
| Layers | Use Case |
|---|---|
| 1 | Simple tasks, speed |
| 2-3 | Most tasks (recommended) |
| 4+ | Complex patterns |
Example:
```python
# Simple
model = DREAM(input_dim=64, hidden_dim=128)

# Standard
stack = DREAMStack(
    input_dim=64,
    hidden_dims=[128, 128],
    rank=16
)

# Deep
stack = DREAMStack(
    input_dim=64,
    hidden_dims=[128, 256, 256, 128],
    rank=16,
    dropout=0.1
)
```

Q: Should I enable LTC?
A: LTC (Liquid Time-Constants) provides:
- ✅ Adaptive integration speeds
- ✅ Better handling of novelty
- ✅ More interpretable dynamics
Disable LTC when:
- You need maximum speed
- You want standard RNN behavior
- Your data is stationary
```python
# Enable LTC (default)
config = DREAMConfig(ltc_enabled=True)

# Disable LTC
config = DREAMConfig(ltc_enabled=False)
```

Q: How does surprise-driven plasticity work?
A: Surprise modulates learning:
- Low surprise (expected error) → Small updates
- High surprise (unexpected error) → Large updates
```python
# High surprise triggers faster learning
surprise = sigmoid((error_norm - threshold) / temperature)

# Fast-weights update
dU = -decay * (U - U_target) + plasticity * surprise * hebbian
```

Benefit: the model adapts quickly to novel patterns while maintaining stability for expected patterns.
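As a minimal numeric sketch of those two equations (scalar Python with made-up values for `threshold`, `temperature`, `decay`, and `plasticity`; the real implementation operates on tensors):

```python
import math

def surprise_gate(error_norm: float, threshold: float = 1.0,
                  temperature: float = 0.5) -> float:
    """Sigmoid gate: near 0 for expected errors, near 1 for surprising ones."""
    return 1.0 / (1.0 + math.exp(-(error_norm - threshold) / temperature))

def fast_weight_delta(U: float, U_target: float, hebbian: float,
                      surprise: float, decay: float = 0.1,
                      plasticity: float = 0.5) -> float:
    """Scalar form of dU = -decay*(U - U_target) + plasticity*surprise*hebbian."""
    return -decay * (U - U_target) + plasticity * surprise * hebbian

low = surprise_gate(error_norm=0.2)   # expected error -> small gate
high = surprise_gate(error_norm=3.0)  # unexpected error -> gate near 1
print(low, high)
print(fast_weight_delta(U=1.0, U_target=0.0, hebbian=0.8, surprise=high))
```

The decay term pulls `U` back toward its target, so a quiet gate mostly forgets, while a surprised gate writes the Hebbian term in strongly.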
## Debugging Questions
Q: How do I monitor what's happening inside DREAM?
A: Log state statistics:
```python
@torch.no_grad()
def log_state(state, step):
    print(f"Step {step}:")
    print(f"  Hidden norm: {state.h.norm().item():.4f}")
    print(f"  U norm: {state.U.norm().item():.4f}")
    print(f"  Surprise: {state.avg_surprise.mean().item():.4f}")
    print(f"  Adaptive tau: {state.adaptive_tau.mean().item():.4f}")

# During training
for step, batch in enumerate(dataloader):
    output, state = model(batch)
    if step % 100 == 0:
        log_state(state, step)
```

Q: Loss is NaN. What do I do?
A: Try these fixes:
```python
# 1. Reduce the learning rate
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# 2. Use a smaller time step
config = DREAMConfig(time_step=0.05)

# 3. Clip gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# 4. Normalize the input
x = (x - x.mean()) / x.std()
```

See Troubleshooting for more solutions.
Q: How do I check if gradients are flowing?
A: Inspect gradients:
```python
for name, param in model.named_parameters():
    if param.grad is not None:
        grad_norm = param.grad.norm().item()
        print(f"{name}: grad norm = {grad_norm:.4f}")
    else:
        print(f"{name}: no gradient")
```

No gradients? Check that:

- `requires_grad=True` is set for the parameters
- The loss is connected to the model output
- `loss.backward()` is called
## Performance Questions
Q: How can I make training faster?
A: Optimizations:
```python
# 1. Use a GPU
model = model.to('cuda')

# 2. Mixed precision
from torch.cuda.amp import autocast

# 3. Reduce the rank
config = DREAMConfig(rank=8)

# 4. Larger batch size
batch_size = 64

# 5. Multiple data-loading workers
loader = DataLoader(dataset, batch_size=32, num_workers=4)
```

Q: How can I reduce memory usage?
A: Memory optimizations:
```python
# 1. Smaller model
config = DREAMConfig(hidden_dim=128, rank=8)

# 2. Gradient accumulation
accumulation_steps = 4

# 3. Detach state periodically (truncates backprop through time)
state = state.detach()

# 4. Clear the CUDA cache
torch.cuda.empty_cache()
```

## Deployment Questions
Q: Can I export DREAM to ONNX?
A: DREAM can be exported to ONNX, but there are limitations due to dynamic state. Contact maintainers for ONNX export support.
Q: Can I use DREAM for real-time inference?
A: Yes! DREAM supports streaming inference:
```python
model = DREAM(input_dim=64, hidden_dim=128)
state = model.init_state(batch_size=1)

# Process one timestep at a time
for x_t in data_stream:
    x_t = x_t.unsqueeze(0).unsqueeze(0)  # (1, 1, input_dim)
    output, state = model(x_t, state)
    # output: (1, 1, hidden_dim) -- use it for prediction
```

Latency: roughly 1-5 ms per timestep on GPU, depending on model size.
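The same carry-the-state-forward pattern can be demonstrated end to end with a built-in `nn.GRUCell` as a stand-in (the cell, sizes, and fake stream are illustrative; substitute your DREAM model and its `init_state()` in practice):

```python
import torch
import torch.nn as nn

# Stand-in recurrent cell to illustrate streaming inference;
# swap in your DREAM model in real code.
cell = nn.GRUCell(input_size=64, hidden_size=128)
h = torch.zeros(1, 128)  # analogous to model.init_state(batch_size=1)

stream = [torch.randn(64) for _ in range(5)]  # fake data stream

outputs = []
with torch.no_grad():  # no autograd graph needed at inference time
    for x_t in stream:
        h = cell(x_t.unsqueeze(0), h)  # state carried across timesteps
        outputs.append(h)

print(len(outputs), outputs[-1].shape)
```

The key point is that the state object is the only thing you keep between calls; everything else is per-timestep.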
Q: How do I deploy DREAM to production?
A: Deployment checklist:

1. Save the model:

   ```python
   torch.save(model.state_dict(), 'model.pt')
   ```

2. Version control:
   - Pin the DREAM version
   - Document dependencies

3. Monitoring:
   - Log prediction latency
   - Monitor surprise levels
   - Track model drift

4. Testing:
   - Test on production-like data
   - Validate performance metrics
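Step 1 is worth verifying with a quick save/load round trip before shipping. A sketch using a plain `nn.Linear` as a stand-in for the trained model (illustrative only; substitute your DREAM instance on both sides):

```python
import os
import tempfile
import torch
import torch.nn as nn

# Stand-in for the trained model.
model = nn.Linear(64, 10)

path = os.path.join(tempfile.mkdtemp(), 'model.pt')
torch.save(model.state_dict(), path)

# Serving side: rebuild the same architecture, load weights, switch to eval.
serving_model = nn.Linear(64, 10)
serving_model.load_state_dict(torch.load(path, map_location='cpu'))
serving_model.eval()  # disables dropout etc. for inference

# Sanity check: identical outputs for the same input.
x = torch.randn(1, 64)
with torch.no_grad():
    match = torch.allclose(model(x), serving_model(x))
print(match)
```

`map_location='cpu'` matters when the model was trained on GPU but served on CPU.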
## Still Have Questions?
- Check the documentation
- Search discussions
- Create a new discussion
- Report issues
## Next Steps
- Contributing - Contribute to DREAM
- Troubleshooting - Common issues
- Examples - Real-world examples