Saving & Loading

A fitted GlassBoxUMAP consists of a PyTorch model plus the state that connects raw input features to that model: the PCA basis, the feature means, and the training hyperparameters. The save and load methods round-trip all of it through a single file on disk.

Fit a model

We’ll use scikit-learn’s digits dataset, which contains 1,797 handwritten digit images, each flattened to 64 features (8x8 grayscale pixels).

from glass_box_umap import GlassBoxUMAP

def load_data():
    from sklearn.datasets import load_digits
    from sklearn.preprocessing import StandardScaler
    digits, _ = load_digits(return_X_y=True)
    return StandardScaler().fit_transform(digits)

X = load_data()
embedder = GlassBoxUMAP(quiet=True)
embedder.fit(X)

Save the model

save takes a path and writes a PyTorch checkpoint.

from pathlib import Path

model_path = Path.cwd() / "embedder.pt"
embedder.save(model_path)

print(f"saved model ({model_path.stat().st_size / 1024:.1f} KiB) to {model_path.name}")
saved model (167.2 KiB) to embedder.pt

Load the model

GlassBoxUMAP.load is a classmethod that reconstructs the embedder from a checkpoint.

loaded = GlassBoxUMAP.load(model_path)
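
To make the round trip concrete, here is a minimal sketch of the save / classmethod load pattern using only the standard library. The TinyEmbedder class, its fields, and the JSON file format are hypothetical stand-ins chosen for illustration. This is not how GlassBoxUMAP is implemented: it writes a PyTorch checkpoint.

```python
import json
import tempfile
from pathlib import Path


class TinyEmbedder:
    """Hypothetical stand-in for a fitted model: weights plus preprocessing state."""

    def __init__(self, weights, feature_means, n_components=2):
        self.weights = weights              # stands in for learned parameters
        self.feature_means = feature_means  # stands in for the centering vector
        self.n_components = n_components    # stands in for a hyperparameter

    def save(self, path):
        # Bundle everything needed to rebuild the object into one payload.
        payload = {
            "weights": self.weights,
            "feature_means": self.feature_means,
            "n_components": self.n_components,
        }
        Path(path).write_text(json.dumps(payload))

    @classmethod
    def load(cls, path):
        # A classmethod builds a new instance, so no fitted object is needed first.
        payload = json.loads(Path(path).read_text())
        return cls(**payload)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tiny.json"
    original = TinyEmbedder(weights=[[0.5, -1.0], [2.0, 0.25]], feature_means=[0.1, 0.2])
    original.save(path)
    restored = TinyEmbedder.load(path)

print(restored.weights == original.weights)
```

The point of the pattern is that load is called on the class rather than on an instance, so a fresh process can rebuild the embedder from the file alone.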

The reloaded embedder is functionally identical to the original. We can confirm that with ==:

loaded == embedder
True

Note

== performs a semantic comparison: it checks that both embedders describe the same trained model (matching architecture, hyperparameters, learned weights, PCA fit, and centering vector) and ignores incidental runtime state such as device placement, logging verbosity, and DataLoader workers. Two loads of the same file therefore always compare equal, even if one was moved to CPU and the other left on GPU.
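
One way to picture a comparison that ignores runtime state is a dataclass whose incidental fields are excluded from the generated __eq__. The sketch below is an illustration under that assumption: ToyEmbedder and its fields are hypothetical, and it does not show how GlassBoxUMAP implements ==.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEmbedder:
    # Fields that define the trained model take part in ==.
    n_components: int
    weights: tuple
    feature_means: tuple
    # Runtime state is excluded from the generated __eq__ via compare=False.
    device: str = field(default="cpu", compare=False)
    quiet: bool = field(default=False, compare=False)


a = ToyEmbedder(2, (0.5, -1.0), (0.1, 0.2), device="cuda", quiet=True)
b = ToyEmbedder(2, (0.5, -1.0), (0.1, 0.2), device="cpu")
c = ToyEmbedder(2, (0.5, -1.5), (0.1, 0.2), device="cpu")

same_model = a == b        # differs only in runtime state
different_model = a == c   # learned weights differ

print(same_model, different_model)
```

The toy uses tuples so that plain == works on the weights. With real tensors or arrays, an elementwise == returns an array rather than a single bool, so a real implementation would need an explicit whole-array check such as torch.equal or np.array_equal.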

In practice, being functionally identical means that transform and compute_contributions produce bitwise-identical outputs from either embedder:

import numpy as np

Z_original = embedder.transform(X)
Z_loaded = loaded.transform(X)
assert np.array_equal(Z_original, Z_loaded)

C_original = embedder.compute_contributions(X)
C_loaded = loaded.compute_contributions(X)
assert np.array_equal(C_original, C_loaded)

print("embeddings and contributions match exactly")
embeddings and contributions match exactly
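
The assertions above use np.array_equal, which is a strict check. The short example below, with made-up values, contrasts it with the tolerance-based np.allclose to show what "bitwise-identical" rules out.

```python
import numpy as np

reference = np.array([0.1, 0.2, 0.3])
perturbed = reference + 1e-12   # tiny change, far below the default tolerances

exact_match = np.array_equal(reference, perturbed)   # every element must be identical
close_match = np.allclose(reference, perturbed)      # equal within default tolerances

print(exact_match, close_match)
```

A round trip that passes np.array_equal has therefore preserved the outputs exactly, not just approximately.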

Caveats

Custom encoders

If you fit a model with a custom encoder, the Python process that loads it must register the same encoder before calling load. See Saving/Loading in the Custom Encoders guide for details.
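
To show why registration has to happen first, here is a generic sketch of a name-based registry, assuming the checkpoint stores only an encoder's name and constructor arguments. Every name in it (ENCODER_REGISTRY, register_encoder, build_encoder, MyEncoder, the checkpoint keys) is hypothetical. The actual registration API is described in the Custom Encoders guide.

```python
# Hypothetical registry: maps a name stored in the checkpoint to a class.
ENCODER_REGISTRY = {}


def register_encoder(name):
    """Decorator that records an encoder class under the given name."""
    def decorator(cls):
        ENCODER_REGISTRY[name] = cls
        return cls
    return decorator


def build_encoder(checkpoint):
    """Rebuild an encoder from a checkpoint dict holding its name and kwargs."""
    name = checkpoint["encoder_name"]
    if name not in ENCODER_REGISTRY:
        raise KeyError(f"encoder {name!r} is not registered; register it before loading")
    return ENCODER_REGISTRY[name](**checkpoint["encoder_kwargs"])


checkpoint = {"encoder_name": "my_encoder", "encoder_kwargs": {"hidden": 32}}

# Loading before the class is registered fails: the name cannot be resolved.
try:
    build_encoder(checkpoint)
    failed_before_registration = False
except KeyError:
    failed_before_registration = True


@register_encoder("my_encoder")
class MyEncoder:
    def __init__(self, hidden):
        self.hidden = hidden


# After registration, the same checkpoint resolves to the class.
encoder = build_encoder(checkpoint)
print(failed_before_registration, type(encoder).__name__, encoder.hidden)
```

In a scheme like this, the file stores a reference to the encoder, not its code, which is why the class has to be importable and registered in the loading process.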