Monitoring training

Real datasets can span millions of points across thousands of features. Fitting Glass Box UMAP at that scale takes real time, and in such cases you’ll want to observe how training is proceeding. This guide shows you how to monitor training progress and keep a record of each fit on disk.

Automated logging with TensorBoard

Under the hood, Glass Box UMAP trains its encoder with PyTorch Lightning, and every fit is automatically logged with TensorBoard to a temporary directory. To persist these logs, just pass an explicit checkpoint_dir to GlassBoxUMAP:

from pathlib import Path
import shutil

from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from glass_box_umap import GlassBoxUMAP

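# Store checkpoints and logs in ./runs/
checkpoint_dir = Path.cwd() / "runs"

# The directory may or may not exist already; remove it for a fresh start.
# ignore_errors=True keeps rmtree from raising when it doesn't exist yet.
shutil.rmtree(checkpoint_dir, ignore_errors=True)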

embedder = GlassBoxUMAP(
    random_state=0,
    checkpoint_dir=checkpoint_dir,
    quiet=True,
)

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)
embedder.fit(X)

!tree runs/
runs/
├── checkpoints
│   └── best.ckpt
└── logs
    ├── events.out.tfevents.1778537655.evans-Apple-MacBook-Pro.71219.0
    └── hparams.yaml

3 directories, 3 files
  • checkpoints/best.ckpt is the same checkpoint that restore_best_weights (default True) reloads at the end of training, so you don’t normally need to touch it. It stays on disk in case you want to inspect or reload a specific run later.

  • logs/events.out.tfevents.… is the TensorBoard event file.
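If you later want to inspect or reload a particular run, it can help to locate these files programmatically. The sketch below uses only pathlib and assumes the directory layout shown by tree above; find_run_artifacts is a hypothetical helper, not part of glass_box_umap, and how best.ckpt is actually reloaded depends on the library, so that step is not shown.

```python
from pathlib import Path


def find_run_artifacts(checkpoint_dir):
    """Locate the best checkpoint and TensorBoard event files of one run.

    Hypothetical helper. Assumes the layout checkpoints/best.ckpt and
    logs/events.out.tfevents.* ; missing files are reported as None / [].
    """
    root = Path(checkpoint_dir)
    best = root / "checkpoints" / "best.ckpt"
    logs = root / "logs"
    # Event file names embed a timestamp and hostname, so match them by prefix.
    events = sorted(logs.glob("events.out.tfevents.*")) if logs.is_dir() else []
    return {
        "best_checkpoint": best if best.is_file() else None,
        "event_files": events,
    }
```

For example, find_run_artifacts("runs") returns the path to runs/checkpoints/best.ckpt together with a sorted list of the event files under runs/logs/.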

The TensorBoard event file isn’t human-readable, but you can view it with a TensorBoard server. From the project root, run:

tensorboard --logdir runs/

This starts a server (by default at http://localhost:6006) that auto-discovers every event file under runs/. Leave it running while you train: it polls the directory and refreshes the logged data as new events are written, so you can watch a fit in progress.

Note

tensorboard ships as a dependency of glass-box-umap, so nothing extra needs to be installed.
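Because TensorBoard scans runs/ recursively and lists each subdirectory containing event files as its own run, one way to keep a record of every fit (instead of deleting runs/ beforehand) is to give each fit a fresh subdirectory. The helper below is a hypothetical sketch using only the standard library; the run-&lt;timestamp&gt; naming scheme is an assumption, and the returned path would be passed as checkpoint_dir.

```python
from datetime import datetime
from pathlib import Path


def new_run_dir(root="runs", prefix="run"):
    """Create and return a new, uniquely named subdirectory of `root`.

    Hypothetical helper: names look like run-20240101-120000, with a numeric
    suffix appended if two fits start within the same second.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    suffix = 0
    while True:
        name = f"{prefix}-{stamp}" if suffix == 0 else f"{prefix}-{stamp}-{suffix}"
        candidate = root / name
        try:
            candidate.mkdir()  # raises if another fit already claimed this name
            return candidate
        except FileExistsError:
            suffix += 1
```

For example, GlassBoxUMAP(random_state=0, checkpoint_dir=new_run_dir("runs"), quiet=True) would write each fit to its own folder, and a single tensorboard --logdir runs/ would then show all of them side by side.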

Visualizing embedding evolution during training

As an alternative diagnostic, Glass Box UMAP exposes a LiveEmbeddingCallback that streams the embedding itself to your browser through a Bokeh server. After each training epoch, it runs transform on a slice of X and pushes the new 2D coordinates to the page, where a slider lets you scrub back through every epoch, a play button replays the trajectory, and a save button writes a self-contained HTML snapshot of the run.
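The per-epoch mechanism described above can be illustrated without Lightning or Bokeh. The class below is a hypothetical, simplified stand-in, not LiveEmbeddingCallback’s actual implementation: an on_epoch_end hook calls transform_fn on a fixed subset and appends the coordinates to a history, which is what makes it possible to jump to any past epoch (the slider) or step through all of them in order (the play button).

```python
from typing import Any, Callable, Iterator, List, Tuple


class EmbeddingHistory:
    """Hypothetical, simplified recorder of per-epoch embeddings.

    Illustrates the idea of one transform call per epoch feeding a
    scrubbable history; it is not the library's callback.
    """

    def __init__(self, transform_fn: Callable[[Any], Any], X_subset: Any) -> None:
        self.transform_fn = transform_fn
        self.X_subset = X_subset          # the same slice is embedded every epoch
        self.snapshots: List[Any] = []    # snapshots[i] = coordinates after epoch i

    def on_epoch_end(self) -> None:
        # One extra forward pass per epoch: the overhead discussed in the Warning.
        self.snapshots.append(self.transform_fn(self.X_subset))

    def at_epoch(self, epoch: int) -> Any:
        """Coordinates a slider would display when set to `epoch`."""
        return self.snapshots[epoch]

    def replay(self) -> Iterator[Tuple[int, Any]]:
        """(epoch, coordinates) pairs a play button would step through, in order."""
        yield from enumerate(self.snapshots)
```

In a real training loop the trainer would invoke the hook after each epoch; in a quick experiment you can call on_epoch_end by hand with any function that maps rows to coordinates.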

Warning

Running transform after every epoch is not free. On large datasets the extra forward pass per epoch will noticeably slow training, so pass a representative subsample to the callback (the callback example below uses X[:500]) rather than the full X. If all you need is the loss curve, prefer TensorBoard.
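One way to build such a subsample is to draw row indices uniformly at random without replacement, with a fixed seed so the same points are tracked from run to run. This is a NumPy sketch; random_subsample is a hypothetical helper, and uniform sampling is an assumption (a stratified draw may suit strongly imbalanced labels better).

```python
import numpy as np


def random_subsample(X, labels, n=500, seed=0):
    """Return up to n rows of X and their labels, drawn without replacement.

    Hypothetical helper; a fixed seed makes the subset reproducible.
    """
    X = np.asarray(X)
    n = min(n, len(X))
    rng = np.random.default_rng(seed)
    # Sorting keeps the original row order, which makes the subset easier to inspect.
    idx = np.sort(rng.choice(len(X), size=n, replace=False))
    return X[idx], [labels[i] for i in idx]
```

With labels defined as in the callback example, X_sub, labels_sub = random_subsample(X, labels, n=500) could then be passed as X=X_sub and labels=labels_sub in place of the first-500-rows slice.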

The live embedding offers diagnostic insight into how the learned manifold forms: you can watch the geometry settle (or fail to) and catch a misbehaving run within a few epochs instead of waiting for training to finish.

It plugs in through extra_callbacks:

from glass_box_umap.plotting import LiveEmbeddingCallback

# Create the embedder
embedder = GlassBoxUMAP(
    random_state=0,
    quiet=True
)

# Now create the callback, passing embedder.transform
labels = [str(digit) for digit in y]  # class labels (digits 0-9) as strings
callback = LiveEmbeddingCallback(
    transform_fn=embedder.transform,
    X=X[:500],
    labels=labels[:500],
    block_after_fit=False,
)

# Append the callback to `extra_callbacks`
embedder.extra_callbacks.append(callback)

_ = embedder.fit(X)
Live embedding serving at http://localhost:65267/
Training done. Server still serving at http://localhost:65267/.

Note

Running the code above opens an interface in your browser that updates after each epoch. For convenience, a copy of the interface is reproduced below.

Press “Play” to watch the embedding evolve over the course of training.