Monitoring training

Real datasets can span millions of points across thousands of features. Fitting Glass Box UMAP at that scale takes real time, and in such cases you’ll want to observe how training is proceeding. This guide shows you how to monitor training progress and keep a record of each fit on disk.

Automated logging with TensorBoard

Under the hood, Glass Box UMAP trains its encoder with PyTorch Lightning, and every fit is automatically logged with TensorBoard to a temporary directory. To persist these logs, just pass an explicit checkpoint_dir to GlassBoxUMAP:

from pathlib import Path
import shutil

from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from glass_box_umap import GlassBoxUMAP

# Store logs to ./runs/
checkpoint_dir = Path.cwd() / "runs"

# The directory can exist, but needn't. We remove it for a fresh start;
# ignore_errors=True keeps this from raising when it doesn't exist yet.
shutil.rmtree(checkpoint_dir, ignore_errors=True)

embedder = GlassBoxUMAP(
    random_state=0,
    checkpoint_dir=checkpoint_dir,
    quiet=True,
)

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)
embedder.fit(X)

!tree runs/

runs/
├── checkpoints
│   └── best.ckpt
└── logs
    ├── events.out.tfevents.1778537655.evans-Apple-MacBook-Pro.71219.0
    └── hparams.yaml

3 directories, 3 files

checkpoints/best.ckpt is the same checkpoint that restore_best_weights (default True) reloads at the end of training, so you don’t normally need to touch it. It stays on disk in case you want to inspect or reload a specific run later.
logs/events.out.tfevents.… is the TensorBoard event file.
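To locate these files from Python rather than with tree, a small helper can walk the directory. This is a minimal sketch that assumes only the layout shown above (.ckpt files and events.out.tfevents.* files somewhere below checkpoint_dir); find_run_artifacts is a hypothetical helper, not part of glass_box_umap.

```python
from pathlib import Path


def find_run_artifacts(checkpoint_dir: Path) -> dict[str, list[Path]]:
    """Collect checkpoint and TensorBoard event files below `checkpoint_dir`."""
    return {
        # Checkpoints such as checkpoints/best.ckpt
        "checkpoints": sorted(checkpoint_dir.rglob("*.ckpt")),
        # Event files such as logs/events.out.tfevents.<...>
        "event_files": sorted(checkpoint_dir.rglob("events.out.tfevents.*")),
    }


if __name__ == "__main__":
    # Assumes the fit above wrote to ./runs/
    for kind, paths in find_run_artifacts(Path.cwd() / "runs").items():
        for path in paths:
            print(f"{kind}: {path}")
```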

The TensorBoard event file isn’t human-readable, but it can be viewed with a TensorBoard server.

Note

tensorboard ships as a dependency of glass-box-umap, so nothing extra needs to be installed.

From the project root:

tensorboard --logdir runs/

That starts a server (default http://localhost:6006) which auto-discovers every event file under runs/. Leave it running while you train and it will poll the directory and refresh logged data as new events are written, allowing you to watch a fit in progress.
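Because TensorBoard treats each subdirectory that contains event files as a separate run, giving every fit its own subdirectory of runs/ lets you compare fits side by side instead of overwriting the previous one. The sketch below shows one way to generate such directories; new_run_dir and its naming scheme are hypothetical and not part of glass_box_umap.

```python
from datetime import datetime
from pathlib import Path


def new_run_dir(root: Path, label: str = "fit") -> Path:
    """Create and return a fresh, uniquely named subdirectory of `root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    candidate = root / f"{label}-{stamp}"
    suffix = 1
    # Two fits started within the same second would share a timestamp,
    # so append a counter until the name is unused.
    while candidate.exists():
        candidate = root / f"{label}-{stamp}-{suffix}"
        suffix += 1
    candidate.mkdir(parents=True)
    return candidate
```

Passing checkpoint_dir=new_run_dir(Path.cwd() / "runs") to GlassBoxUMAP should then place each fit's logs under its own timestamped folder, assuming the directory layout shown earlier is created relative to checkpoint_dir.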

Visualizing embedding evolution during training

As an alternative diagnostic, Glass Box UMAP exposes a LiveEmbeddingCallback that streams the embedding itself to a Bokeh server in your browser. After each training epoch, it runs transform on a slice of X and pushes the new 2D coordinates to the page, where a slider lets you scrub back through every epoch, a play button replays the trajectory, and a save button writes a self-contained HTML snapshot of the run.

Warning

Running transform after every epoch is not free. On large datasets the extra forward pass per epoch will noticeably slow training, so pass a representative subsample to the callback (as in the example at the end of this section) rather than the full X. When all you need is the loss curve, prefer TensorBoard.
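Taking the first rows of X is only representative when the rows are in no particular order. If your data is sorted (for example by class or by time), drawing row indices at random is a safer default. A minimal sketch using only the standard library; subsample_indices is a hypothetical helper, not part of glass_box_umap:

```python
import random


def subsample_indices(n_samples: int, size: int, seed: int = 0) -> list[int]:
    """Return up to `size` distinct row indices, drawn uniformly, in ascending order."""
    size = min(size, n_samples)  # never ask for more rows than exist
    rng = random.Random(seed)    # seeded so the same rows are tracked across runs
    return sorted(rng.sample(range(n_samples), size))


# The digits dataset used in this guide has 1797 rows.
idx = subsample_indices(n_samples=1797, size=500)
```

The resulting list can index a NumPy array directly (X[idx]), and the matching labels can be picked with [labels[i] for i in idx] so that points and colors stay aligned.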

This live embedding offers diagnostic insight into how the learned manifold is forming. You can watch the geometry settle (or fail to) and catch a misbehaving run within a few epochs instead of waiting for the training to complete.

It plugs in through extra_callbacks, as shown in the example after the API reference below.

Install the plotting extras

glass_box_umap.plotting is an optional dependency that’s required for this feature. It can be installed like so:

pip install "glass-box-umap[plotting]"
# or
uv pip install "glass-box-umap[plotting]"

LiveEmbeddingCallback API

From the API docs:

plotting.LiveEmbeddingCallback(X: Tensor, labels: list[str] | None = None, port: int = 0, output_backend: Literal['canvas', 'webgl'] = 'webgl', hover_images: ndarray[tuple[Any, ...], dtype[uint8]] | None = None, block_after_fit: bool = True) → None

PyTorch Lightning callback that serves a live-updating Bokeh scatter.

Spins up a Bokeh server on a background thread, opens a browser tab, and streams a fresh embedding (via transform_fn) to the page after each training epoch. Each session keeps a per-frame history that the user can scrub through with a slider, play back with a button, or export to a self-contained HTML file. Training keeps running on the main thread; updates cross to the Bokeh event loop via Document.add_next_tick_callback.

Parameters:

transform_fn -- Callable that maps the high-dimensional X to a (n_samples, 2) array. Typically the embedder’s transform method.
X -- High-dimensional input fed to transform_fn after each epoch.
labels -- Optional per-sample categorical labels for coloring.
port -- Port the Bokeh server listens on. 0 (default) lets the OS pick a free port, which avoids EADDRINUSE collisions when the callback is re-instantiated within the same process (e.g. a Jupyter kernel that already hosts a previous run’s server).
output_backend -- Bokeh rendering backend for the scatter. Defaults to "webgl"; switch to "canvas" if the GPU/driver/browser combination renders the plot incorrectly.
hover_images -- Optional uint8 image array of shape (n_samples, H, W) or (n_samples, H, W, 3 | 4). When set, each tooltip shows the sample’s image above the index/label text.
block_after_fit -- When True (default), block at the end of training so the Bokeh server keeps serving until the user presses Ctrl-C. Set to False from interactive contexts (e.g. Jupyter) where the host process already keeps the server alive.
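The port=0 behavior described above is an operating-system convention rather than anything specific to Bokeh: binding a socket to port 0 asks the OS to assign an unused port. The sketch below illustrates that convention with the standard library. It is not the callback's implementation, and pick_free_port is a hypothetical helper.

```python
import socket


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # 0 means "any free port"
        # getsockname() reports the port the OS actually assigned
        return sock.getsockname()[1]


port = pick_free_port()
print(f"The OS assigned port {port}")
```

The port is released as soon as the socket closes, so another process could claim it afterwards. A server that binds to port 0 itself and then reports the assigned port avoids that gap, which is presumably why the serving URL is only known once the callback prints it.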

from glass_box_umap.plotting import LiveEmbeddingCallback

# Create the embedder
embedder = GlassBoxUMAP(
    random_state=0,
    quiet=True
)

# Now create the callback, passing embedder.transform
labels = [str(idx) for idx in y]
callback = LiveEmbeddingCallback(
    transform_fn=embedder.transform,
    X=X[:500],
    labels=labels[:500],
    block_after_fit=False,
)

# Append the callback to `extra_callbacks`
embedder.extra_callbacks.append(callback)

_ = embedder.fit(X)

Live embedding serving at http://localhost:65267/

Training done. Server still serving at http://localhost:65267/.

Note

The above code will open an interface in your browser, updating after each epoch. For your convenience, we replicate the interface below.

Hit “Play” to observe the embedding evolve throughout the training.