FAQ

Frequently asked questions about DREAM (Dynamic Recall and Elastic Adaptive Memory).

General Questions

Q: How does DREAM compare to LSTM?

A: DREAM has several advantages over LSTM:

Feature                     | LSTM               | DREAM
----------------------------|--------------------|----------------------
Time constants              | Fixed              | Adaptive (LTC)
Learning rate               | Global (optimizer) | Local (per synapse)
Memory update               | Always same speed  | Surprise-modulated
State representation        | Hidden only        | Hidden + Fast Weights
Adaptation during inference | ❌ No              | ✅ Yes

DREAM excels at online learning and non-stationary data where patterns change over time. LSTM is better for static patterns and has more mature tooling.


Q: Can I use DREAM for text/NLP tasks?

A: Yes! DREAM works well for sequential data including text. For NLP:

# Use embeddings as input
embedding = nn.Embedding(vocab_size, embed_dim)
model = DREAM(input_dim=embed_dim, hidden_dim=256)

# Process
embedded = embedding(token_ids)  # (batch, seq_len, embed_dim)
output, state = model(embedded)

Common NLP applications:

  • Language modeling
  • Sequence labeling
  • Text classification
  • Machine translation (encoder-decoder)
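Putting the pieces together, here is a minimal text-classification sketch. It assumes DREAM's `(output, state)` return convention; `nn.GRU` stands in for DREAM so the example runs end to end — swap in `DREAM(input_dim=embed_dim, hidden_dim=...)` in practice.

```python
import torch
import torch.nn as nn

# Sketch of a text classifier. nn.GRU stands in for DREAM here so the
# example is runnable end to end; both return (output, state).
class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq_len, embed_dim)
        output, state = self.encoder(embedded)  # (batch, seq_len, hidden_dim)
        return self.classifier(output[:, -1])   # classify from final timestep

model = TextClassifier(vocab_size=1000, embed_dim=64, hidden_dim=128, num_classes=5)
logits = model(torch.randint(0, 1000, (8, 20)))  # 8 sequences of 20 tokens
print(logits.shape)  # torch.Size([8, 5])
```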

Q: Is DREAM compatible with transformers?

A: Yes! You can combine DREAM with transformers:

Option 1: DREAM + Attention

class HybridModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.dream = DREAM(embed_dim, hidden_dim)
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads)

Option 2: DREAM as encoder, transformer as decoder

encoder = DREAM(input_dim, hidden_dim)
decoder = TransformerDecoder(...)

Option 3: Replace self-attention with DREAM

# In transformer encoder layer
self.dream = DREAM(d_model, d_model)
# Instead of self-attention
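A runnable forward pass for Option 1 might look like the sketch below. Again `nn.GRU` stands in for DREAM so the example executes; the attention wiring is unchanged once `DREAM(embed_dim, hidden_dim)` is dropped in.

```python
import torch
import torch.nn as nn

# Option 1 forward pass: recurrent encoder followed by self-attention over
# its states. nn.GRU stands in for DREAM so the sketch runs as-is.
class HybridModel(nn.Module):
    def __init__(self, embed_dim=64, hidden_dim=128, num_heads=4):
        super().__init__()
        self.dream = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, x):
        h, _ = self.dream(x)                   # (batch, seq_len, hidden_dim)
        attended, _ = self.attention(h, h, h)  # self-attention over encoder states
        return attended

model = HybridModel()
out = model(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 128])
```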

Q: What's the best rank value?

A: It depends on your needs:

Rank | Use Case           | Memory
-----|--------------------|-------
4-8  | Compression, speed | Low
16   | Balanced           | Medium
32   | Expressivity       | High

Recommendation: Start with rank=16. Adjust based on:

  • Memory constraints: Lower rank
  • Complex patterns: Higher rank
  • Speed requirements: Lower rank
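To build intuition for the memory column, here is rough arithmetic assuming the fast-weight matrix is stored as a rank-r factorization (two `hidden_dim × rank` factors). This is an illustration of how rank scales cost, not necessarily DREAM's exact layout.

```python
# Rough fast-weight memory arithmetic under a rank-r factorization
# assumption: two hidden_dim x rank factor matrices per cell.
def fast_weight_params(hidden_dim, rank):
    return 2 * hidden_dim * rank

hidden_dim = 256
for rank in (4, 8, 16, 32):
    print(f"rank={rank:2d}: {fast_weight_params(hidden_dim, rank)} params")
# rank=32 costs 8x the fast-weight memory of rank=4
```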

Q: Can I train DREAM on CPU?

A: Yes, but GPU is recommended for:

  • Large sequences (seq_len > 100)
  • Large batch sizes (batch_size > 16)
  • Deep models (multiple layers)

CPU training tips:

# Smaller model
config = DREAMConfig(hidden_dim=64, rank=8)

# Smaller batches
batch_size = 4

# Shorter sequences
seq_len = 50

Q: Does DREAM support mixed precision?

A: Yes! Use PyTorch's AMP:

from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for batch in dataloader:
    optimizer.zero_grad()
    
    with autocast():
        output, state = model(batch)
        loss = criterion(output, target)
    
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Benefits:

  • 2-3x faster on Volta/Ampere GPUs
  • Reduced memory usage
  • Larger batch sizes possible

Training Questions

Q: How do I save/load a trained model?

A: Use PyTorch's state dict:

# Save
torch.save({
    'model': model.state_dict(),
    'config': config,
    'optimizer': optimizer.state_dict(),
    'epoch': epoch,
}, 'checkpoint.pt')

# Load
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

Q: Can I freeze fast weights during training?

A: Yes, freeze specific parameters:

# Freeze U (fast weights)
# (set requires_grad on the parameter itself, not on slices of it)
for name, param in model.cell.named_parameters():
    if 'U' in name:
        param.requires_grad = False

# Freeze entire DREAM cell
for param in model.parameters():
    param.requires_grad = False

# Unfreeze specific parameters
for param in model.classifier.parameters():
    param.requires_grad = True
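After freezing, it is worth sanity-checking the trainable parameter count. The helper below works for any `nn.Module`, DREAM included; the two-layer model is just a stand-in for the demo.

```python
import torch.nn as nn

# Sanity check after freezing: count trainable vs. total parameters.
def count_params(model):
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total

model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 4))
for param in model[0].parameters():  # freeze the first layer
    param.requires_grad = False

trainable, total = count_params(model)
print(trainable, total)  # 68 212
```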

Q: What optimizer works best?

A: We recommend AdamW:

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4
)

Alternatives:

  • Adam: Good default, but less regularization
  • SGD with momentum: For fine-tuning
  • LAMB: For very large models

Q: How do I use learning rate scheduling?

A: Use PyTorch schedulers:

# Reduce on plateau
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=5
)

# Cosine annealing
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs
)

# Step decay
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=30, gamma=0.1
)

# In training loop
for epoch in range(num_epochs):
    train()
    validate()
    scheduler.step()  # or scheduler.step(val_loss)
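One common pattern not shown above is linear warmup followed by cosine decay, composed with `SequentialLR`. Warmup often stabilizes the first few epochs; the hyperparameters below are illustrative.

```python
import torch

# Linear warmup for 5 epochs, then cosine decay for the remaining 95.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=95)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[5]
)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()   # (training step would go here)
    scheduler.step()

print(lrs[0], max(lrs))  # starts at 1e-4, peaks at 1e-3
```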

Architecture Questions

Q: How many layers should I use?

A: Depends on task complexity:

Layers | Use Case
-------|--------------------------
1      | Simple tasks, speed
2-3    | Most tasks (recommended)
4+     | Complex patterns

Example:

# Simple
model = DREAM(input_dim=64, hidden_dim=128)

# Standard
stack = DREAMStack(
    input_dim=64,
    hidden_dims=[128, 128],
    rank=16
)

# Deep
stack = DREAMStack(
    input_dim=64,
    hidden_dims=[128, 256, 256, 128],
    rank=16,
    dropout=0.1
)

Q: Should I enable LTC?

A: LTC (Liquid Time-Constants) provides:

  • ✅ Adaptive integration speeds
  • ✅ Better handling of novelty
  • ✅ More interpretable dynamics

Disable LTC when:

  • You need maximum speed
  • You want standard RNN behavior
  • Your data is stationary

# Enable LTC (default)
config = DREAMConfig(ltc_enabled=True)

# Disable LTC
config = DREAMConfig(ltc_enabled=False)
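To see what "adaptive integration speeds" means, here is a minimal sketch of the mechanism behind liquid time-constants: the effective time constant depends on the input instead of being fixed. DREAM's exact parameterization may differ; this shows the idea only.

```python
import torch

# One leaky-integration step: move h toward target at speed 1/tau.
def leaky_step(h, target, tau, dt=0.1):
    return h + (dt / tau) * (target - h)

h = torch.zeros(4)
target = torch.ones(4)

# Fixed tau: every unit integrates at the same speed
h_fixed = leaky_step(h, target, tau=torch.full((4,), 1.0))

# Adaptive tau: a gate (e.g. a function of the input) shrinks tau,
# so gated units update faster within the same step
gate = torch.tensor([0.0, 0.3, 0.6, 0.9])
tau_adaptive = 1.0 / (1.0 + gate)   # higher gate -> smaller tau
h_adaptive = leaky_step(h, target, tau=tau_adaptive)

print(h_fixed)     # all units moved equally
print(h_adaptive)  # gated units moved further
```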

Q: How does surprise-driven plasticity work?

A: Surprise modulates learning:

  1. Low surprise (expected error) → Small updates
  2. High surprise (unexpected error) → Large updates

# High surprise triggers faster learning
surprise = sigmoid((error_norm - threshold) / temperature)

# Fast weights update
dU = -decay * (U - U_target) + plasticity * surprise * hebbian

Benefit: Model adapts quickly to novel patterns while maintaining stability for expected patterns.
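The two update rules above can be made concrete with toy tensors. The hyperparameter values (threshold, temperature, decay, plasticity) below are illustrative, not DREAM's defaults.

```python
import torch

# The surprise and fast-weight update rules from above, with toy values.
threshold, temperature = 1.0, 0.5
decay, plasticity = 0.1, 0.5

U = torch.zeros(4, 4)
U_target = torch.zeros(4, 4)
hebbian = torch.ones(4, 4)      # stand-in for the pre/post activity outer product

for error_norm in (0.5, 3.0):   # expected vs. surprising error
    surprise = torch.sigmoid(torch.tensor((error_norm - threshold) / temperature))
    dU = -decay * (U - U_target) + plasticity * surprise * hebbian
    print(f"error={error_norm}: surprise={surprise:.3f}, |dU|={dU.norm():.3f}")
# the surprising error produces a much larger fast-weight update
```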


Debugging Questions

Q: How do I monitor what's happening inside DREAM?

A: Log state statistics:

@torch.no_grad()
def log_state(state, step):
    print(f"Step {step}:")
    print(f"  Hidden norm: {state.h.norm().item():.4f}")
    print(f"  U norm: {state.U.norm().item():.4f}")
    print(f"  Surprise: {state.avg_surprise.mean().item():.4f}")
    print(f"  Adaptive tau: {state.adaptive_tau.mean().item():.4f}")

# During training
for step, batch in enumerate(dataloader):
    output, state = model(batch)
    if step % 100 == 0:
        log_state(state, step)

Q: Loss is NaN. What do I do?

A: Try these fixes:

# 1. Reduce learning rate
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# 2. Smaller time step
config = DREAMConfig(time_step=0.05)

# 3. Gradient clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# 4. Normalize input
x = (x - x.mean()) / x.std()

See Troubleshooting for more solutions.
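When a run goes bad, it also helps to locate where the NaN/Inf first appears. The helper below checks parameters and gradients by name; run it right after `loss.backward()`. It works for any `nn.Module`; the injected NaN is just for demonstration.

```python
import torch

# Report every parameter or gradient that contains NaN/Inf.
def find_bad_values(model):
    bad = []
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            bad.append(f"{name} (param)")
        if param.grad is not None and not torch.isfinite(param.grad).all():
            bad.append(f"{name} (grad)")
    return bad

model = torch.nn.Linear(4, 2)
with torch.no_grad():
    model.weight[0, 0] = float('nan')  # inject a NaN to demonstrate
print(find_bad_values(model))  # ['weight (param)']
```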


Q: How do I check if gradients are flowing?

A: Inspect gradients:

for name, param in model.named_parameters():
    if param.grad is not None:
        grad_norm = param.grad.norm().item()
        print(f"{name}: grad norm = {grad_norm:.4f}")
    else:
        print(f"{name}: no gradient")

No gradients? Check:

  • requires_grad=True for parameters
  • Loss is connected to model output
  • loss.backward() is called

Performance Questions

Q: How can I make training faster?

A: Optimizations:

# 1. Use GPU
model = model.to('cuda')

# 2. Mixed precision
from torch.cuda.amp import autocast

# 3. Reduce rank
config = DREAMConfig(rank=8)

# 4. Larger batch size
batch_size = 64

# 5. Multiple data workers
loader = DataLoader(dataset, batch_size=32, num_workers=4)

Q: How can I reduce memory usage?

A: Memory optimizations:

# 1. Smaller model
config = DREAMConfig(hidden_dim=128, rank=8)

# 2. Gradient accumulation
accumulation_steps = 4

# 3. Detach state periodically
state = state.detach()

# 4. Clear cache
torch.cuda.empty_cache()
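The `accumulation_steps = 4` line above, spelled out as a loop: average the loss over several small batches before each optimizer step, simulating a batch 4x larger at the memory cost of a small one. A plain `nn.Linear` stands in for the model here.

```python
import torch

# Gradient accumulation: backward() on every small batch, step() every
# accumulation_steps batches.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()
accumulation_steps = 4

batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]

steps_taken = 0
optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = criterion(model(x), y) / accumulation_steps  # scale for the average
    loss.backward()                                     # gradients accumulate
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        steps_taken += 1

print(steps_taken)  # 2 optimizer steps for 8 small batches
```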

Deployment Questions

Q: Can I export DREAM to ONNX?

A: DREAM can be exported to ONNX, but there are limitations due to dynamic state. Contact maintainers for ONNX export support.


Q: Can I use DREAM for real-time inference?

A: Yes! DREAM supports streaming inference:

model = DREAM(input_dim=64, hidden_dim=128)
state = model.init_state(batch_size=1)

# Process one timestep at a time
for x_t in data_stream:
    x_t = x_t.unsqueeze(0).unsqueeze(0)  # (1, 1, input_dim)
    output, state = model(x_t, state)
    # output: (1, 1, hidden_dim)
    # Use output for prediction

Latency: ~1-5ms per timestep on GPU (depends on model size).


Q: How do I deploy DREAM to production?

A: Deployment checklist:

  1. Save model:

    torch.save(model.state_dict(), 'model.pt')
  2. Version control:

    • Pin DREAM version
    • Document dependencies
  3. Monitoring:

    • Log prediction latency
    • Monitor surprise levels
    • Track model drift
  4. Testing:

    • Test on production-like data
    • Validate performance metrics
