Monitoring training
Real datasets can span millions of points across thousands of features. Fitting Glass Box UMAP at that scale can take a while, and in such cases you'll want to see how training is progressing. This guide shows you how to monitor training progress and keep a record of each fit on disk.
Automated logging with TensorBoard
Under the hood, Glass Box UMAP trains its encoder with PyTorch Lightning, and every fit is automatically logged with TensorBoard to a temporary directory. To persist these logs, just pass an explicit checkpoint_dir to GlassBoxUMAP:
from pathlib import Path
import shutil
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from glass_box_umap import GlassBoxUMAP
# Store logs to ./runs/
checkpoint_dir = Path.cwd() / "runs"
# The directory may or may not exist; remove it (if present) for a fresh start.
shutil.rmtree(checkpoint_dir, ignore_errors=True)
embedder = GlassBoxUMAP(
    random_state=0,
    checkpoint_dir=checkpoint_dir,
    quiet=True,
)
X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)
embedder.fit(X)
!tree runs/
runs/
├── checkpoints
│   └── best.ckpt
└── logs
    ├── events.out.tfevents.1778537655.evans-Apple-MacBook-Pro.71219.0
    └── hparams.yaml
3 directories, 3 files
- checkpoints/best.ckpt is the same checkpoint that restore_best_weights (default True) reloads at the end of training, so you don't normally need to touch it. It stays on disk in case you want to inspect or reload a specific run later.
- logs/events.out.tfevents.… is the TensorBoard event file.
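Once several fits have written to the same root, it can help to tell their event files apart. Assuming the conventional TensorBoard naming, where the first field after events.out.tfevents. is the file's creation time in Unix seconds (1778537655 in the listing above) followed by the host name and further fields, a minimal standard-library sketch can list them oldest first. The helper name list_event_files is hypothetical, not part of glass_box_umap.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path

PREFIX = "events.out.tfevents."


def list_event_files(root: Path) -> list[tuple[datetime, Path]]:
    """Find TensorBoard event files anywhere under `root`, oldest first.

    Assumes the naming scheme events.out.tfevents.<unix seconds>.<host>...,
    i.e. the first field after the prefix is the creation time.
    """
    found = []
    for path in root.rglob(PREFIX + "*"):
        # Take the text between the prefix and the next dot.
        stamp = path.name[len(PREFIX):].split(".", 1)[0]
        if stamp.isdigit():
            created = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
            found.append((created, path))
    # Tuples sort by creation time first, then by path.
    return sorted(found)
```

For example, `for created, path in list_event_files(Path("runs")): print(created.isoformat(), path)` prints one line per event file, with the most recent fit last.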
The TensorBoard event file isn't human-readable, but it can be viewed with a TensorBoard server.
Note
tensorboard ships as a dependency of glass-box-umap, so nothing extra needs to be installed.
From the project root, run:
tensorboard --logdir runs/
That starts a server (default http://localhost:6006) which auto-discovers every event file under runs/. Leave it running while you train and it will poll the directory and refresh logged data as new events are written, allowing you to watch a fit in progress.
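Because TensorBoard picks up every event file under runs/ and shows each subdirectory holding event files as a separate run, one way to keep every fit on record side by side is to give each fit its own checkpoint_dir inside runs/. A minimal sketch; the helper new_run_dir and its timestamp naming scheme are hypothetical, not part of glass_box_umap. Since checkpoint_dir doesn't need to exist beforehand, the helper only picks an unused name.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def new_run_dir(root: Path, name: str | None = None) -> Path:
    """Pick an unused subdirectory of `root` to hold one fit's output.

    Hypothetical helper. The name defaults to a timestamp; "-1", "-2", ...
    is appended while the candidate already exists. Nothing is created here.
    """
    stem = name or datetime.now().strftime("%Y%m%d-%H%M%S")
    candidate = root / stem
    counter = 1
    while candidate.exists():
        candidate = root / f"{stem}-{counter}"
        counter += 1
    return candidate
```

For instance, `GlassBoxUMAP(random_state=0, checkpoint_dir=new_run_dir(Path.cwd() / "runs"), quiet=True)` would put each fit's checkpoints/ and logs/ in its own subdirectory of runs/, and `tensorboard --logdir runs/` would list the fits as separate runs.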
Visualizing embedding evolution during training
As an alternative diagnostic, Glass Box UMAP provides a LiveEmbeddingCallback that streams the embedding itself to your browser through a Bokeh server. After each training epoch, it runs transform on the rows of X you pass it and pushes the new 2D coordinates to the page, where a slider lets you scrub back through every epoch, a play button replays the trajectory, and a save button writes a self-contained HTML snapshot of the run.
Warning
Running transform after every epoch is not free. On large datasets the extra forward pass per epoch can noticeably slow training, so pass a representative subsample to the callback (the example in this section uses X[:500]) rather than the full X. When all you need is the loss curve, prefer TensorBoard.
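Slicing the first rows is only representative if the rows are in no particular order. When they are grouped or sorted (for example by class or acquisition time), a prefix can miss whole groups. A minimal NumPy sketch of a label-stratified random subsample; the helper name stratified_subsample is hypothetical, and the subsample size is only approximately n because per-class shares are rounded.

```python
import numpy as np


def stratified_subsample(labels, n, seed=0):
    """Return sorted row indices for a label-stratified random subsample.

    Each class keeps roughly its share of the rows (at least one row per
    class), so the total is approximately, not exactly, `n`.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    picked = []
    for cls, count in zip(classes, counts):
        # Python ints keep round() returning a plain int.
        k = max(1, round(n * int(count) / labels.size))
        rows = np.flatnonzero(labels == cls)
        picked.append(rng.choice(rows, size=min(k, int(count)), replace=False))
    return np.sort(np.concatenate(picked))
```

With the digits data, `idx = stratified_subsample(y, 500)` would give indices to pass as `X=X[idx]` and `labels=[labels[i] for i in idx]`; the list comprehension is needed because labels is a plain Python list.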
This live embedding offers diagnostic insight into how the learned manifold is forming. You can watch the geometry settle (or fail to) and catch a misbehaving run within a few epochs instead of waiting for the training to complete.
It plugs in through extra_callbacks:
Install the plotting extras
glass_box_umap.plotting depends on optional packages that are required for this feature. Install them through the plotting extra:
pip install "glass-box-umap[plotting]"
# or
uv pip install "glass-box-umap[plotting]"
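In a script that should still run when the extra isn't installed, one option is to guard the import and fall back to TensorBoard alone. A minimal sketch; the helper name live_view_available is hypothetical, and it assumes that a missing plotting extra (or a missing glass_box_umap) surfaces as an ImportError when glass_box_umap.plotting is imported.

```python
def live_view_available() -> bool:
    """Report whether glass_box_umap.plotting can be imported.

    Assumption: missing optional dependencies raise ImportError on import.
    ModuleNotFoundError is a subclass of ImportError, so it is caught too.
    """
    try:
        import glass_box_umap.plotting  # noqa: F401
    except ImportError:
        return False
    return True
```

A script could then append a LiveEmbeddingCallback to extra_callbacks only when `live_view_available()` returns True.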
LiveEmbeddingCallback API
From the API docs:
- plotting.LiveEmbeddingCallback(X: Tensor, labels: list[str] | None = None, port: int = 0, output_backend: Literal['canvas', 'webgl'] = 'webgl', hover_images: ndarray[tuple[Any, ...], dtype[uint8]] | None = None, block_after_fit: bool = True) -> None
  PyTorch Lightning callback that serves a live-updating Bokeh scatter.
  Spins up a Bokeh server on a background thread, opens a browser tab, and streams a fresh embedding (via transform_fn) to the page after each training epoch starts. Each session keeps a per-frame history that the user can scrub through with a slider, play back with a button, or export to a self-contained HTML file. Training keeps running on the main thread; updates cross to the Bokeh event loop via Document.add_next_tick_callback.
  Parameters:
  - transform_fn -- Callable that maps the high-dimensional X to an (n_samples, 2) array. Typically the embedder's transform method.
  - X -- High-dimensional input fed to transform_fn after each epoch.
  - labels -- Optional per-sample categorical labels for coloring.
  - port -- Port the Bokeh server listens on. 0 (default) lets the OS pick a free port, which avoids EADDRINUSE collisions when the callback is re-instantiated within the same process (e.g. a Jupyter kernel that already hosts a previous run's server).
  - output_backend -- Bokeh rendering backend for the scatter. Defaults to "webgl"; switch to "canvas" if the GPU/driver/browser combination renders the plot incorrectly.
  - hover_images -- Optional uint8 image array of shape (n_samples, H, W) or (n_samples, H, W, 3 | 4). When set, each tooltip shows the sample's image above the index/label text.
  - block_after_fit -- When True (default), block at the end of training so the Bokeh server keeps serving until the user presses Ctrl-C. Set to False from interactive contexts (e.g. Jupyter) where the host process already keeps the server alive.
from glass_box_umap.plotting import LiveEmbeddingCallback
# Create the embedder
embedder = GlassBoxUMAP(
    random_state=0,
    quiet=True
)
# Now create the callback, passing embedder.transform
labels = [str(idx) for idx in y]
callback = LiveEmbeddingCallback(
    transform_fn=embedder.transform,
    X=X[:500],
    labels=labels[:500],
    block_after_fit=False,
)
# Append the callback to `extra_callbacks`
embedder.extra_callbacks.append(callback)
_ = embedder.fit(X)
Live embedding serving at http://localhost:65267/
Training done. Server still serving at http://localhost:65267/.
Note
The code above opens an interface in your browser that updates after each epoch. For convenience, we replicate the interface below.
Press "Play" to watch the embedding evolve over the course of training.
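The callback's hover_images parameter expects a uint8 array of shape (n_samples, H, W) or (n_samples, H, W, 3 | 4), but source images often come as floats in some other range. The digits images from scikit-learn's load_digits(), for instance, are 8-by-8 arrays with values from 0 to 16. A minimal NumPy sketch for rescaling such a stack; the helper name to_uint8_images is hypothetical.

```python
import numpy as np


def to_uint8_images(images, max_value=None):
    """Rescale an image stack to uint8 values in [0, 255].

    `images` has shape (n_samples, H, W) or (n_samples, H, W, C). Values are
    divided by `max_value` (default: the stack's maximum, or 1.0 if that is 0),
    clipped to [0, 1], scaled to 255 and rounded.
    """
    images = np.asarray(images, dtype=np.float64)
    if max_value is None:
        max_value = float(images.max()) or 1.0
    scaled = np.clip(images / max_value, 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8)
```

For the digits example, `to_uint8_images(load_digits().images, max_value=16)[:500]` lines up row for row with X[:500], so it could be passed to the callback as hover_images.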