Manifestro Docs

Troubleshooting

Common issues and solutions for DREAM

Memory Issues

"CUDA out of memory"

Problem: Running out of GPU memory during training.

Solutions:

  1. Reduce batch size:

    batch_size = 16  # instead of 32
  2. Reduce model size:

    config = DREAMConfig(
        hidden_dim=128,  # instead of 256
        rank=8           # instead of 16
    )
  3. Use gradient accumulation:

    accumulation_steps = 4
    optimizer.zero_grad()
    
    for i, batch in enumerate(dataloader):
        loss = compute_loss(batch) / accumulation_steps
        loss.backward()
    
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
  4. Use mixed precision:

    from torch.cuda.amp import autocast, GradScaler  # torch.amp in PyTorch >= 2.3

    scaler = GradScaler()

    optimizer.zero_grad()
    with autocast():
        output, state = model(x)
        loss = criterion(output, target)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
  5. Clear cache:

    torch.cuda.empty_cache()
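The options above compose. As a rough sketch of how gradient accumulation and mixed precision fit into a single training loop (a plain `nn.Linear` stands in for the DREAM model here, and the loop falls back to full precision on CPU so it runs anywhere):

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler  # torch.amp in PyTorch >= 2.3

# Stand-in model; in practice this would be a DREAM instance with
# `output, state = model(x)` -- the loop shape is the same.
model = nn.Linear(64, 10)
criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

accumulation_steps = 4
use_amp = device == "cuda"
scaler = GradScaler(enabled=use_amp)  # no-op when AMP is disabled

batches = [(torch.randn(8, 64), torch.randn(8, 10)) for _ in range(8)]

optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    x, y = x.to(device), y.to(device)
    with torch.autocast(device_type=device, enabled=use_amp):
        # Divide so the accumulated gradient matches a full-size batch.
        loss = criterion(model(x), y) / accumulation_steps
    scaler.scale(loss).backward()
    if (i + 1) % accumulation_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```

With `accumulation_steps = 4` and micro-batches of 8, the effective batch size is 32 at roughly a quarter of the peak activation memory.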

Numerical Stability

"Loss is NaN" or "Loss exploded"

Problem: Training produces NaN or Inf values.

Causes:

  • Learning rate too high
  • Time step too large
  • Unstable initialization
  • Gradient explosion

Solutions:

  1. Reduce learning rate:

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # instead of 1e-3
  2. Smaller time step:

    config = DREAMConfig(time_step=0.05)  # instead of 0.1
  3. Larger time constant:

    config = DREAMConfig(ltc_tau_sys=15.0)  # instead of 10.0
  4. Smaller weights:

    config = DREAMConfig(target_norm=1.5)  # instead of 2.0
  5. Gradient clipping:

    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
  6. Check input normalization:

    # Ensure inputs are normalized (the epsilon guards against zero std)
    x = (x - x.mean()) / (x.std() + 1e-8)
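If none of the above reveals where the NaNs originate, PyTorch's anomaly detection can point at the first offending operation. It slows training considerably, so enable it only while debugging; the tensors below are placeholders:

```python
import torch

# Anomaly mode re-checks every backward op and raises with a traceback
# pointing at the forward op that created the first NaN/Inf gradient.
torch.autograd.set_detect_anomaly(True)

x = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (x * 3.0).sum()
loss.backward()  # a NaN here would raise, naming the offending op

torch.autograd.set_detect_anomaly(False)  # turn off once found
```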

Learning Issues

"Model doesn't learn"

Problem: Loss stays constant or decreases very slowly.

Check:

# 1. Gradients are flowing
for name, param in model.named_parameters():
    if param.grad is None:
        print(f"No gradient for {name}")

# 2. Surprise is non-zero
print(f"Surprise: {state.avg_surprise.mean().item()}")

# 3. Input is normalized
print(f"Input mean: {x.mean().item()}, std: {x.std().item()}")

# 4. Check loss computation
print(f"Loss: {loss.item()}")

Solutions:

  1. Increase plasticity:

    config = DREAMConfig(base_plasticity=0.2)  # instead of 0.1
  2. Reduce threshold:

    config = DREAMConfig(base_threshold=0.3)  # instead of 0.5
  3. Enable LTC:

    config = DREAMConfig(ltc_enabled=True)
  4. Check data loading:

    # Verify data is correct
    print(f"Input shape: {x.shape}")
    print(f"Target shape: {y.shape}")
    print(f"Unique labels: {y.unique()}")
  5. Try different initialization:

    def init_weights(m):
        if hasattr(m, 'weight') and m.weight.dim() > 1:
            torch.nn.init.xavier_uniform_(m.weight)
    
    model.apply(init_weights)

"Model converges too slowly"

Problem: Training takes many epochs to converge.

Solutions:

  1. Increase learning rate:

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3)
  2. Increase plasticity:

    config = DREAMConfig(base_plasticity=0.15)
  3. Use learning rate warmup:

    from torch.optim.lr_scheduler import LinearLR
    
    warmup = LinearLR(optimizer, start_factor=0.1, end_factor=1.0, total_iters=1000)
  4. Reduce sequence length:

    # Use shorter sequences for faster iteration
    seq_len = 50  # instead of 100
  5. Use larger batch size:

    batch_size = 64  # instead of 32
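Warmup and a decaying schedule can be combined with `SequentialLR`. A sketch with placeholder step counts (the lone `Parameter` stands in for `model.parameters()`):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()
optimizer = torch.optim.AdamW(params, lr=3e-3)

# 1,000-step linear warmup, then cosine decay over the remaining steps.
# Tune both counts to your actual training length.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.1, end_factor=1.0, total_iters=1000),
        CosineAnnealingLR(optimizer, T_max=9000),
    ],
    milestones=[1000],
)

for _ in range(10):
    optimizer.step()
    scheduler.step()  # call once per optimizer step
```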

Performance Issues

"Training is slow"

Problem: Training takes too long per epoch.

Optimizations:

  1. Use GPU:

    model = model.to('cuda')
    x = x.to('cuda')
  2. Reduce rank:

    config = DREAMConfig(rank=8)  # instead of 16
  3. Disable LTC if not needed:

    config = DREAMConfig(ltc_enabled=False)
  4. Use mixed precision:

    from torch.cuda.amp import autocast  # torch.amp in PyTorch >= 2.3
    
    with autocast():
        output, state = model(x)
  5. Increase batch size:

    batch_size = 64  # larger batches are more efficient
  6. Use multiple workers for data loading:

    loader = DataLoader(dataset, batch_size=32, num_workers=4)
  7. Profile to find bottlenecks:

    with torch.profiler.profile() as prof:
        output, state = model(x)

    # Use sort_by="cpu_time_total" when running on CPU
    print(prof.key_averages().table(sort_by="cuda_time_total"))
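For quick comparisons between configurations (e.g. different `rank` values, or LTC on vs. off), a simple wall-clock timer is often enough. `time_forward` here is an illustrative helper, with `nn.Linear` standing in for the model:

```python
import time
import torch
from torch import nn

model = nn.Linear(64, 64)  # stand-in for a DREAM model
x = torch.randn(32, 64)

def time_forward(model, x, iters=50):
    """Average forward-pass time in seconds over `iters` runs."""
    # Warm-up so lazy initialization doesn't skew the measurement.
    for _ in range(5):
        model(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # CUDA kernels run async; wait first
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print(f"{time_forward(model, x) * 1e3:.3f} ms / forward")
```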

State Management Issues

"State shape mismatch"

Problem: State tensors have incorrect shapes.

Solution:

# Ensure batch size matches
state = model.init_state(batch_size=x.shape[0])

# Check state shapes
print(f"Input shape: {x.shape}")
print(f"State h shape: {state.h.shape}")
print(f"State U shape: {state.U.shape}")

"Memory leak with state"

Problem: GPU memory grows over time.

Solution:

# Detach state for truncated BPTT
state = state.detach()

# Or reinitialize state periodically
if step % 100 == 0:
    state = model.init_state(batch_size)
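Putting the two together, a truncated-BPTT loop detaches the state at window boundaries so the graph (and memory) stays bounded. A `GRUCell` stands in for a DREAM cell here, with `torch.zeros` in place of `model.init_state`:

```python
import torch
from torch import nn

# Stand-in recurrent cell; with DREAM, the state would come from
# model.init_state(batch_size) and be detached the same way.
cell = nn.GRUCell(16, 32)
readout = nn.Linear(32, 16)
optimizer = torch.optim.AdamW(
    list(cell.parameters()) + list(readout.parameters()), lr=1e-3
)
criterion = nn.MSELoss()

batch_size, tbptt_window = 4, 25
h = torch.zeros(batch_size, 32)
stream = torch.randn(100, batch_size, 16)  # one long input sequence

window_loss = torch.zeros(())
for t, x in enumerate(stream):
    h = cell(x, h)
    window_loss = window_loss + criterion(readout(h), x)
    if (t + 1) % tbptt_window == 0:
        optimizer.zero_grad()
        window_loss.backward()
        optimizer.step()
        window_loss = torch.zeros(())
        h = h.detach()  # truncate the graph so memory stays bounded
```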

Import Errors

"No module named 'dream'"

Problem: Cannot import DREAM.

Solutions:

  1. Verify installation:

    pip list | grep dreamnn
  2. Reinstall:

    pip install --upgrade dreamnn
  3. Check Python environment:

    python -c "import sys; print(sys.executable)"
    pip --version
  4. Install from source:

    git clone https://github.com/karl4th/dream-nn.git
    cd dream-nn
    pip install -e .

"ImportError: cannot import name 'DREAM'"

Problem: Import statement is incorrect.

Solution:

# Correct imports
from dream import DREAM, DREAMConfig, DREAMCell
from dream import DREAMStack, DREAMState

# Or
import dream
model = dream.DREAM(input_dim=64, hidden_dim=128)

Version Compatibility

"PyTorch version mismatch"

Problem: Incompatible PyTorch version.

Solution:

# Check PyTorch version
python -c "import torch; print(torch.__version__)"

# Upgrade if needed
pip install --upgrade torch

# DREAM requires PyTorch >= 2.0.0
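The same check can be made programmatic, so training fails fast with a clear message:

```python
import torch

# DREAM requires PyTorch >= 2.0.0; local-version suffixes like
# "+cu121" only appear after the patch number, so this parse is safe.
major, minor = (int(part) for part in torch.__version__.split(".")[:2])
assert (major, minor) >= (2, 0), (
    f"DREAM requires PyTorch >= 2.0.0, found {torch.__version__}"
)
```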

Debugging Tips

Enable Debug Mode

import logging
logging.basicConfig(level=logging.DEBUG)

# Add debug prints
model.train()
for batch in dataloader:
    print(f"Input shape: {batch.shape}")
    output, state = model(batch)
    print(f"Output shape: {output.shape}")
    print(f"Surprise: {state.avg_surprise.mean().item()}")

Check Gradients

for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name}: grad norm = {param.grad.norm().item():.4f}")
    else:
        print(f"{name}: no gradient")

Monitor State Statistics

@torch.no_grad()
def log_state(state, step):
    print(f"Step {step}:")
    print(f"  h norm: {state.h.norm().item():.4f}")
    print(f"  U norm: {state.U.norm().item():.4f}")
    print(f"  Surprise: {state.avg_surprise.mean().item():.4f}")
    print(f"  Adaptive tau: {state.adaptive_tau.mean().item():.4f}")

Still Having Issues?

If you can't find a solution here:

  1. Check the FAQ for common questions
  2. Search existing issues on GitHub
  3. Create a new issue with:
    • Minimal reproducible example
    • Error message (full traceback)
    • Environment details (Python, PyTorch, OS versions)
    • What you've already tried
