FAQ

Frequently asked questions about DREAM (Dynamic Recall and Elastic Adaptive Memory).

General Questions

Q: How does DREAM compare to LSTM?

A: DREAM has several advantages over LSTM:

Feature                     | LSTM               | DREAM
----------------------------|--------------------|----------------------
Time constants              | Fixed              | Adaptive (LTC)
Learning rate               | Global (optimizer) | Local (per synapse)
Memory update               | Always same speed  | Surprise-modulated
State representation        | Hidden only        | Hidden + Fast Weights
Adaptation during inference | ❌ No              | ✅ Yes

DREAM excels at online learning and non-stationary data where patterns change over time. LSTM is better for static patterns and has more mature tooling.


Q: Can I use DREAM for text/NLP tasks?

A: Yes! DREAM works well for sequential data including text. For NLP:

# Use embeddings as input
embedding = nn.Embedding(vocab_size, embed_dim)
model = DREAM(input_dim=embed_dim, hidden_dim=256)

# Process
embedded = embedding(token_ids)  # (batch, seq_len, embed_dim)
output, state = model(embedded)

Common NLP applications:

  • Language modeling
  • Sequence labeling
  • Text classification
  • Machine translation (encoder-decoder)
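Putting the pieces together, here is a minimal text-classification sketch. It assumes DREAM's `(output, state)` return convention; `nn.GRU` stands in for DREAM so the example runs end to end — swap in `DREAM(input_dim=embed_dim, hidden_dim=...)` in practice.

```python
import torch
import torch.nn as nn

# Sketch of a text classifier. nn.GRU stands in for DREAM here so the
# example is runnable end to end; both return (output, state).
class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq_len, embed_dim)
        output, state = self.encoder(embedded)  # (batch, seq_len, hidden_dim)
        return self.classifier(output[:, -1])   # classify from final timestep

model = TextClassifier(vocab_size=1000, embed_dim=64, hidden_dim=128, num_classes=5)
logits = model(torch.randint(0, 1000, (8, 20)))  # 8 sequences of 20 tokens
print(logits.shape)  # torch.Size([8, 5])
```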

Q: Is DREAM compatible with transformers?

A: Yes! You can combine DREAM with transformers:

Option 1: DREAM + Attention

class HybridModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.dream = DREAM(embed_dim, hidden_dim)
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads)

Option 2: DREAM as encoder, transformer as decoder

encoder = DREAM(input_dim, hidden_dim)
decoder = TransformerDecoder(...)

Option 3: Replace self-attention with DREAM

# In transformer encoder layer
self.dream = DREAM(d_model, d_model)
# Instead of self-attention
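A runnable forward pass for Option 1 might look like the sketch below. Again `nn.GRU` stands in for DREAM so the example executes; the attention wiring is unchanged once `DREAM(embed_dim, hidden_dim)` is dropped in.

```python
import torch
import torch.nn as nn

# Option 1 forward pass: recurrent encoder followed by self-attention over
# its states. nn.GRU stands in for DREAM so the sketch runs as-is.
class HybridModel(nn.Module):
    def __init__(self, embed_dim=64, hidden_dim=128, num_heads=4):
        super().__init__()
        self.dream = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, x):
        h, _ = self.dream(x)                   # (batch, seq_len, hidden_dim)
        attended, _ = self.attention(h, h, h)  # self-attention over encoder states
        return attended

model = HybridModel()
out = model(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 128])
```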

Q: What's the best rank value?

A: It depends on your needs:

Rank | Use Case           | Memory
-----|--------------------|-------
4-8  | Compression, speed | Low
16   | Balanced           | Medium
32   | Expressivity       | High

Recommendation: Start with rank=16. Adjust based on:

  • Memory constraints: Lower rank
  • Complex patterns: Higher rank
  • Speed requirements: Lower rank
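To build intuition for the memory column, here is rough arithmetic assuming the fast-weight matrix is stored as a rank-r factorization (two `hidden_dim × rank` factors). This is an illustration of how rank scales cost, not necessarily DREAM's exact layout.

```python
# Rough fast-weight memory arithmetic under a rank-r factorization
# assumption: two hidden_dim x rank factor matrices per cell.
def fast_weight_params(hidden_dim, rank):
    return 2 * hidden_dim * rank

hidden_dim = 256
for rank in (4, 8, 16, 32):
    print(f"rank={rank:2d}: {fast_weight_params(hidden_dim, rank)} params")
# rank=32 costs 8x the fast-weight memory of rank=4
```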

Q: Can I train DREAM on CPU?

A: Yes, but GPU is recommended for:

  • Large sequences (seq_len > 100)
  • Large batch sizes (batch_size > 16)
  • Deep models (multiple layers)

CPU training tips:

# Smaller model
config = DREAMConfig(hidden_dim=64, rank=8)

# Smaller batches
batch_size = 4

# Shorter sequences
seq_len = 50

Q: Does DREAM support mixed precision?

A: Yes! Use PyTorch's AMP:

from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for batch in dataloader:
    optimizer.zero_grad()
    
    with autocast():
        output, state = model(batch)
        loss = criterion(output, target)
    
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Benefits:

  • 2-3x faster on Volta/Ampere GPUs
  • Reduced memory usage
  • Larger batch sizes possible

Training Questions

Q: How do I save/load a trained model?

A: Use PyTorch's state dict:

# Save
torch.save({
    'model': model.state_dict(),
    'config': config,
    'optimizer': optimizer.state_dict(),
    'epoch': epoch,
}, 'checkpoint.pt')

# Load
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

Q: Can I freeze fast weights during training?

A: Yes, freeze specific parameters:

# Freeze U (fast weights)
# (set requires_grad on the parameter itself, not on slices of it)
for name, param in model.cell.named_parameters():
    if 'U' in name:
        param.requires_grad = False

# Freeze entire DREAM cell
for param in model.parameters():
    param.requires_grad = False

# Unfreeze specific parameters
for param in model.classifier.parameters():
    param.requires_grad = True
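After freezing, it is worth sanity-checking the trainable parameter count. The helper below works for any `nn.Module`, DREAM included; the two-layer model is just a stand-in for the demo.

```python
import torch.nn as nn

# Sanity check after freezing: count trainable vs. total parameters.
def count_params(model):
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total

model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 4))
for param in model[0].parameters():  # freeze the first layer
    param.requires_grad = False

trainable, total = count_params(model)
print(trainable, total)  # 68 212
```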

Q: What optimizer works best?

A: We recommend AdamW:

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4
)

Alternatives:

  • Adam: Good default, but less regularization
  • SGD with momentum: For fine-tuning
  • LAMB: For very large models

Q: How do I use learning rate scheduling?

A: Use PyTorch schedulers:

# Reduce on plateau
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=5
)

# Cosine annealing
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs
)

# Step decay
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=30, gamma=0.1
)

# In training loop
for epoch in range(num_epochs):
    train()
    validate()
    scheduler.step()  # or scheduler.step(val_loss)
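One common pattern not shown above is linear warmup followed by cosine decay, composed with `SequentialLR`. Warmup often stabilizes the first few epochs; the hyperparameters below are illustrative.

```python
import torch

# Linear warmup for 5 epochs, then cosine decay for the remaining 95.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=95)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[5]
)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()   # (training step would go here)
    scheduler.step()

print(lrs[0], max(lrs))  # starts at 1e-4, peaks at 1e-3
```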

Architecture Questions

Q: How many layers should I use?

A: Depends on task complexity:

Layers | Use Case
-------|--------------------------
1      | Simple tasks, speed
2-3    | Most tasks (recommended)
4+     | Complex patterns

Example:

# Simple
model = DREAM(input_dim=64, hidden_dim=128)

# Standard
stack = DREAMStack(
    input_dim=64,
    hidden_dims=[128, 128],
    rank=16
)

# Deep
stack = DREAMStack(
    input_dim=64,
    hidden_dims=[128, 256, 256, 128],
    rank=16,
    dropout=0.1
)

Q: Should I enable LTC?

A: LTC (Liquid Time-Constants) provides:

  • ✅ Adaptive integration speeds
  • ✅ Better handling of novelty
  • ✅ More interpretable dynamics

Disable LTC when:

  • You need maximum speed
  • You want standard RNN behavior
  • Your data is stationary

# Enable LTC (default)
config = DREAMConfig(ltc_enabled=True)

# Disable LTC
config = DREAMConfig(ltc_enabled=False)
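To see what "adaptive integration speeds" means, here is a minimal sketch of the mechanism behind liquid time-constants: the effective time constant depends on the input instead of being fixed. DREAM's exact parameterization may differ; this shows the idea only.

```python
import torch

# One leaky-integration step: move h toward target at speed 1/tau.
def leaky_step(h, target, tau, dt=0.1):
    return h + (dt / tau) * (target - h)

h = torch.zeros(4)
target = torch.ones(4)

# Fixed tau: every unit integrates at the same speed
h_fixed = leaky_step(h, target, tau=torch.full((4,), 1.0))

# Adaptive tau: a gate (e.g. a function of the input) shrinks tau,
# so gated units update faster within the same step
gate = torch.tensor([0.0, 0.3, 0.6, 0.9])
tau_adaptive = 1.0 / (1.0 + gate)   # higher gate -> smaller tau
h_adaptive = leaky_step(h, target, tau=tau_adaptive)

print(h_fixed)     # all units moved equally
print(h_adaptive)  # gated units moved further
```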

Q: How does surprise-driven plasticity work?

A: Surprise modulates learning:

  1. Low surprise (expected error) → Small updates
  2. High surprise (unexpected error) → Large updates

# High surprise triggers faster learning
surprise = sigmoid((error_norm - threshold) / temperature)

# Fast weights update
dU = -decay * (U - U_target) + plasticity * surprise * hebbian

Benefit: Model adapts quickly to novel patterns while maintaining stability for expected patterns.
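The two update rules above can be made concrete with toy tensors. The hyperparameter values (threshold, temperature, decay, plasticity) below are illustrative, not DREAM's defaults.

```python
import torch

# The surprise and fast-weight update rules from above, with toy values.
threshold, temperature = 1.0, 0.5
decay, plasticity = 0.1, 0.5

U = torch.zeros(4, 4)
U_target = torch.zeros(4, 4)
hebbian = torch.ones(4, 4)      # stand-in for the pre/post activity outer product

for error_norm in (0.5, 3.0):   # expected vs. surprising error
    surprise = torch.sigmoid(torch.tensor((error_norm - threshold) / temperature))
    dU = -decay * (U - U_target) + plasticity * surprise * hebbian
    print(f"error={error_norm}: surprise={surprise:.3f}, |dU|={dU.norm():.3f}")
# the surprising error produces a much larger fast-weight update
```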


Debugging Questions

Q: How do I monitor what's happening inside DREAM?

A: Log state statistics:

@torch.no_grad()
def log_state(state, step):
    print(f"Step {step}:")
    print(f"  Hidden norm: {state.h.norm().item():.4f}")
    print(f"  U norm: {state.U.norm().item():.4f}")
    print(f"  Surprise: {state.avg_surprise.mean().item():.4f}")
    print(f"  Adaptive tau: {state.adaptive_tau.mean().item():.4f}")

# During training
for step, batch in enumerate(dataloader):
    output, state = model(batch)
    if step % 100 == 0:
        log_state(state, step)

Q: Loss is NaN. What do I do?

A: Try these fixes:

# 1. Reduce learning rate
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# 2. Smaller time step
config = DREAMConfig(time_step=0.05)

# 3. Gradient clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# 4. Normalize input
x = (x - x.mean()) / x.std()

See Troubleshooting for more solutions.
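When a run goes bad, it also helps to locate where the NaN/Inf first appears. The helper below checks parameters and gradients by name; run it right after `loss.backward()`. It works for any `nn.Module`; the injected NaN is just for demonstration.

```python
import torch

# Report every parameter or gradient that contains NaN/Inf.
def find_bad_values(model):
    bad = []
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            bad.append(f"{name} (param)")
        if param.grad is not None and not torch.isfinite(param.grad).all():
            bad.append(f"{name} (grad)")
    return bad

model = torch.nn.Linear(4, 2)
with torch.no_grad():
    model.weight[0, 0] = float('nan')  # inject a NaN to demonstrate
print(find_bad_values(model))  # ['weight (param)']
```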


Q: How do I check if gradients are flowing?

A: Inspect gradients:

for name, param in model.named_parameters():
    if param.grad is not None:
        grad_norm = param.grad.norm().item()
        print(f"{name}: grad norm = {grad_norm:.4f}")
    else:
        print(f"{name}: no gradient")

No gradients? Check:

  • requires_grad=True for parameters
  • Loss is connected to model output
  • loss.backward() is called

Performance Questions

Q: How can I make training faster?

A: Optimizations:

# 1. Use GPU
model = model.to('cuda')

# 2. Mixed precision
from torch.cuda.amp import autocast

# 3. Reduce rank
config = DREAMConfig(rank=8)

# 4. Larger batch size
batch_size = 64

# 5. Multiple data workers
loader = DataLoader(dataset, batch_size=32, num_workers=4)

Q: How can I reduce memory usage?

A: Memory optimizations:

# 1. Smaller model
config = DREAMConfig(hidden_dim=128, rank=8)

# 2. Gradient accumulation
accumulation_steps = 4

# 3. Detach state periodically
state = state.detach()

# 4. Clear cache
torch.cuda.empty_cache()
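The `accumulation_steps = 4` line above, spelled out as a loop: average the loss over several small batches before each optimizer step, simulating a batch 4x larger at the memory cost of a small one. A plain `nn.Linear` stands in for the model here.

```python
import torch

# Gradient accumulation: backward() on every small batch, step() every
# accumulation_steps batches.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()
accumulation_steps = 4

batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]

steps_taken = 0
optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = criterion(model(x), y) / accumulation_steps  # scale for the average
    loss.backward()                                     # gradients accumulate
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        steps_taken += 1

print(steps_taken)  # 2 optimizer steps for 8 small batches
```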

Deployment Questions

Q: Can I export DREAM to ONNX?

A: DREAM can be exported to ONNX, but there are limitations due to dynamic state. Contact maintainers for ONNX export support.


Q: Can I use DREAM for real-time inference?

A: Yes! DREAM supports streaming inference:

model = DREAM(input_dim=64, hidden_dim=128)
state = model.init_state(batch_size=1)

# Process one timestep at a time
for x_t in data_stream:
    x_t = x_t.unsqueeze(0).unsqueeze(0)  # (1, 1, input_dim)
    output, state = model(x_t, state)
    # output: (1, 1, hidden_dim)
    # Use output for prediction

Latency: ~1-5ms per timestep on GPU (depends on model size).


Q: How do I deploy DREAM to production?

A: Deployment checklist:

  1. Save model:

    torch.save(model.state_dict(), 'model.pt')
  2. Version control:

    • Pin DREAM version
    • Document dependencies
  3. Monitoring:

    • Log prediction latency
    • Monitor surprise levels
    • Track model drift
  4. Testing:

    • Test on production-like data
    • Validate performance metrics
