# Troubleshooting

Common issues and solutions for DREAM.
## Memory Issues

### "CUDA out of memory"
**Problem:** Running out of GPU memory during training.

**Solutions:**

- Reduce batch size:

  ```python
  batch_size = 16  # instead of 32
  ```

- Reduce model size:

  ```python
  config = DREAMConfig(
      hidden_dim=128,  # instead of 256
      rank=8,          # instead of 16
  )
  ```

- Use gradient accumulation:

  ```python
  accumulation_steps = 4

  optimizer.zero_grad()
  for i, batch in enumerate(dataloader):
      loss = compute_loss(batch) / accumulation_steps
      loss.backward()
      if (i + 1) % accumulation_steps == 0:
          optimizer.step()
          optimizer.zero_grad()
  ```

- Use mixed precision:

  ```python
  from torch.cuda.amp import autocast, GradScaler

  scaler = GradScaler()

  with autocast():
      output, state = model(x)
      loss = criterion(output, target)

  scaler.scale(loss).backward()
  scaler.step(optimizer)
  scaler.update()
  ```

- Clear cache:

  ```python
  torch.cuda.empty_cache()
  ```
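The accumulation and mixed-precision fixes above compose into one loop. A minimal sketch of how they fit together, using a stand-in `nn.Linear` model and synthetic batches (the exact DREAM forward signature is not assumed here), with AMP enabled only when CUDA is actually available so the same loop also runs on CPU:

```python
import torch

# Stand-ins for the DREAM model and dataloader; substitute your own.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 4).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

use_amp = device.type == "cuda"
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # no-op when disabled
accumulation_steps = 4

batches = [(torch.randn(16, 8), torch.randn(16, 4)) for _ in range(8)]
optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    x, y = x.to(device), y.to(device)
    with torch.cuda.amp.autocast(enabled=use_amp):
        # Divide so gradients average over the accumulated micro-batches
        loss = criterion(model(x), y) / accumulation_steps
    scaler.scale(loss).backward()
    if (i + 1) % accumulation_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```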
## Numerical Stability

### "Loss is NaN" or "Loss exploded"
**Problem:** Training produces NaN or Inf values.

**Causes:**

- Learning rate too high
- Time step too large
- Unstable initialization
- Gradient explosion
**Solutions:**

- Reduce learning rate:

  ```python
  optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # instead of 1e-3
  ```

- Use a smaller time step:

  ```python
  config = DREAMConfig(time_step=0.05)  # instead of 0.1
  ```

- Use a larger time constant:

  ```python
  config = DREAMConfig(ltc_tau_sys=15.0)  # instead of 10.0
  ```

- Use smaller weights:

  ```python
  config = DREAMConfig(target_norm=1.5)  # instead of 2.0
  ```

- Clip gradients:

  ```python
  torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
  ```

- Check input normalization:

  ```python
  # Ensure inputs are normalized
  x = (x - x.mean()) / x.std()
  ```
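If none of the above isolates the problem, PyTorch's anomaly detection can point at the exact operation that first produced a non-finite value. A generic sketch (the `nn.Linear` is a stand-in for the DREAM model):

```python
import torch

# Anomaly detection records the forward trace and raises during
# backward() at the op that produced NaN/Inf. It is slow; enable it
# only while debugging.
torch.autograd.set_detect_anomaly(True)

model = torch.nn.Linear(8, 4)  # stand-in for the DREAM model
x = torch.randn(16, 8)
output = model(x)
loss = output.pow(2).mean()

# Catch non-finite values before they propagate further
assert torch.isfinite(output).all(), "non-finite values in model output"

loss.backward()
torch.autograd.set_detect_anomaly(False)
```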
## Learning Issues

### "Model doesn't learn"

**Problem:** Loss stays constant or decreases very slowly.
**Check:**

```python
# 1. Gradients are flowing
for name, param in model.named_parameters():
    if param.grad is None:
        print(f"No gradient for {name}")

# 2. Surprise is non-zero
print(f"Surprise: {state.avg_surprise.mean().item()}")

# 3. Input is normalized
print(f"Input mean: {x.mean().item()}, std: {x.std().item()}")

# 4. Check loss computation
print(f"Loss: {loss.item()}")
```

**Solutions:**
- Increase plasticity:

  ```python
  config = DREAMConfig(base_plasticity=0.2)  # instead of 0.1
  ```

- Reduce threshold:

  ```python
  config = DREAMConfig(base_threshold=0.3)  # instead of 0.5
  ```

- Enable LTC:

  ```python
  config = DREAMConfig(ltc_enabled=True)
  ```

- Check data loading:

  ```python
  # Verify data is correct
  print(f"Input shape: {x.shape}")
  print(f"Target shape: {y.shape}")
  print(f"Unique labels: {y.unique()}")
  ```

- Try different initialization:

  ```python
  def init_weights(m):
      if hasattr(m, 'weight') and m.weight.dim() > 1:
          torch.nn.init.xavier_uniform_(m.weight)

  model.apply(init_weights)
  ```
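A complementary sanity check is to overfit a single batch: if the model, loss, and optimizer are wired correctly, the loss on one repeated batch should fall sharply within a few hundred steps. The sketch below uses a stand-in `nn.Linear`; swap in the DREAM model and your own loss:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)  # stand-in for the DREAM model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
criterion = torch.nn.MSELoss()

# One fixed batch, repeated every step
x, y = torch.randn(16, 8), torch.randn(16, 2)

first_loss = None
for step in range(200):
    loss = criterion(model(x), y)
    if first_loss is None:
        first_loss = loss.item()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# If the loss does not fall well below its starting value, the problem
# is in the training setup, not the dataset.
print(f"loss: {first_loss:.4f} -> {loss.item():.4f}")
```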
### "Model converges too slowly"

**Problem:** Training takes many epochs to converge.

**Solutions:**
- Increase learning rate:

  ```python
  optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3)
  ```

- Increase plasticity:

  ```python
  config = DREAMConfig(base_plasticity=0.15)
  ```

- Use learning rate warmup:

  ```python
  from torch.optim.lr_scheduler import LinearLR

  warmup = LinearLR(optimizer, start_factor=0.1, end_factor=1.0, total_iters=1000)
  ```

- Reduce sequence length:

  ```python
  # Use shorter sequences for faster iteration
  seq_len = 50  # instead of 100
  ```

- Use a larger batch size:

  ```python
  batch_size = 64  # instead of 32
  ```
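Warmup can also be chained with a decay schedule. A sketch using `SequentialLR` (the step counts are placeholders; tune them to your run length, and the `nn.Linear` stands in for the DREAM model):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

model = torch.nn.Linear(8, 4)  # stand-in for the DREAM model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3)

# Warm up for the first 1000 steps, then cosine-decay for the rest.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.1, end_factor=1.0, total_iters=1000),
        CosineAnnealingLR(optimizer, T_max=9000),
    ],
    milestones=[1000],
)

for step in range(2000):
    optimizer.step()  # real training step elided
    scheduler.step()
```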
## Performance Issues

### "Training is slow"

**Problem:** Training takes too long per epoch.

**Optimizations:**
- Use the GPU:

  ```python
  model = model.to('cuda')
  x = x.to('cuda')
  ```

- Reduce rank:

  ```python
  config = DREAMConfig(rank=8)  # instead of 16
  ```

- Disable LTC if not needed:

  ```python
  config = DREAMConfig(ltc_enabled=False)
  ```

- Use mixed precision:

  ```python
  from torch.cuda.amp import autocast

  with autocast():
      output, state = model(x)
  ```

- Increase batch size:

  ```python
  batch_size = 64  # larger batches are more efficient
  ```

- Use multiple workers for data loading:

  ```python
  loader = DataLoader(dataset, batch_size=32, num_workers=4)
  ```

- Profile to find bottlenecks:

  ```python
  with torch.profiler.profile() as prof:
      output, state = model(x)

  print(prof.key_averages().table(sort_by="cuda_time_total"))
  ```
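On PyTorch 2.x (which DREAM already requires), `torch.compile` may also speed up the forward and backward passes. Whether DREAM's stateful recurrence is fully traceable is not guaranteed, so this sketch falls back to eager mode if compilation fails:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.Tanh())  # stand-in
x = torch.randn(32, 64)

try:
    fast_model = torch.compile(model)
    y = fast_model(x)  # first call triggers compilation
except Exception:
    fast_model = model  # compiler backend unavailable: run eagerly
    y = fast_model(x)
```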
## State Management Issues

### "State shape mismatch"

**Problem:** State tensors have incorrect shapes.

**Solution:**

```python
# Ensure batch size matches
state = model.init_state(batch_size=x.shape[0])

# Check state shapes
print(f"Input shape: {x.shape}")
print(f"State h shape: {state.h.shape}")
print(f"State U shape: {state.U.shape}")
```

### "Memory leak with state"
**Problem:** GPU memory grows over time.

**Solution:**

```python
# Detach state for truncated BPTT
state = state.detach()

# Or reinitialize state periodically
if step % 100 == 0:
    state = model.init_state(batch_size)
```

## Import Errors
### "No module named 'dream'"

**Problem:** Cannot import DREAM.

**Solutions:**
- Verify installation:

  ```bash
  pip list | grep dreamnn
  ```

- Reinstall:

  ```bash
  pip install --upgrade dreamnn
  ```

- Check the Python environment:

  ```bash
  python -c "import sys; print(sys.executable)"
  pip --version
  ```

- Install from source:

  ```bash
  git clone https://github.com/karl4th/dream-nn.git
  cd dream-nn
  pip install -e .
  ```
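To see which interpreter is running and where (or whether) the `dream` package resolves on its import path, a quick diagnostic from inside Python:

```python
import importlib.util
import sys

# Confirm which Python is running; `pip` must belong to this interpreter.
print(f"Interpreter: {sys.executable}")

# Locate the installed package (PyPI name: dreamnn, import name: dream)
spec = importlib.util.find_spec("dream")
print(spec.origin if spec else "module 'dream' not found on sys.path")
```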
### "ImportError: cannot import name 'DREAM'"

**Problem:** Import statement is incorrect.

**Solution:**
```python
# Correct imports
from dream import DREAM, DREAMConfig, DREAMCell
from dream import DREAMStack, DREAMState

# Or
import dream
model = dream.DREAM(input_dim=64, hidden_dim=128)
```

## Version Compatibility
### "PyTorch version mismatch"

**Problem:** Incompatible PyTorch version.

**Solution:**

```bash
# Check PyTorch version
python -c "import torch; print(torch.__version__)"

# Upgrade if needed
pip install --upgrade torch

# DREAM requires PyTorch >= 2.0.0
```

## Debugging Tips
### Enable Debug Mode

```python
import logging
logging.basicConfig(level=logging.DEBUG)

# Add debug prints
model.train()
for batch in dataloader:
    print(f"Input shape: {batch.shape}")
    output, state = model(batch)
    print(f"Output shape: {output.shape}")
    print(f"Surprise: {state.avg_surprise.mean().item()}")
```

### Check Gradients
```python
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name}: grad norm = {param.grad.norm().item():.4f}")
    else:
        print(f"{name}: no gradient")
```

### Monitor State Statistics
```python
@torch.no_grad()
def log_state(state, step):
    print(f"Step {step}:")
    print(f"  h norm: {state.h.norm().item():.4f}")
    print(f"  U norm: {state.U.norm().item():.4f}")
    print(f"  Surprise: {state.avg_surprise.mean().item():.4f}")
    print(f"  Adaptive tau: {state.adaptive_tau.mean().item():.4f}")
```

## Still Having Issues?
If you can't find a solution here:

- Check the FAQ for common questions
- Search existing issues on GitHub
- Create a new issue with:
  - A minimal reproducible example
  - The error message (full traceback)
  - Environment details (Python, PyTorch, OS versions)
  - What you've already tried
## Next Steps

- FAQ - Frequently asked questions
- Contributing - Contribute to DREAM
- Training Best Practices - Optimize training