`glass_box_umap`

Subpackages

glass_box_umap.plotting

Submodules

glass_box_umap.jacobian

Overview

Classes
`GlassBoxUMAP`	Glass Box UMAP model.
`ParametricUMAP`	Parametric UMAP model.

Classes

class GlassBoxUMAP(*, n_neighbors: int = 15, min_dist: float = 0.1, metric: str = 'euclidean', n_components: int = 2, negative_sample_rate: int = 5, repulsion_strength: float = 1.0, pca_components: int | None = None, encoder_name: str = 'default', encoder_kwargs: dict[str, Any] = dict(), lr: float = 0.001, epochs: int = 200, batch_size: int = 10000, num_batches: int | None = None, num_workers: int = 0, checkpoint_dir: Path | None = None, restore_best_weights: bool = True, random_state: int | None = None, quiet: bool = False, extra_callbacks: list[pl.Callback] = list())

Glass Box UMAP model.

Base Classes:

ParametricUMAP

Attributes:

n_neighbors: Number of nearest neighbors used to construct the high-dimensional graph.

min_dist: Minimum distance between points in the low-dimensional embedding.

metric: Distance metric used for computing nearest neighbors.

n_components: Dimensionality of the learned embedding.

random_state: Random seed for reproducibility. If None, no seed is set.

encoder_kwargs: Additional keyword arguments passed to the encoder constructor.

pca_components: Number of PCA components for input preprocessing. If None, no PCA is applied. PCA requires 2D input (n_samples, n_features); leave this None when fitting on multi-dimensional data (e.g. images for a convolutional encoder).

lr: Learning rate for the optimizer.

epochs: Number of training epochs.

batch_size: Batch size for training and (default) inference.

negative_sample_rate: Number of negative samples per positive edge in the UMAP loss.

repulsion_strength: Weighting of the repulsive term in the UMAP loss.

num_workers: Number of data loading workers.

checkpoint_dir: Directory for saving training checkpoints. If None, a temporary directory is used.
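To illustrate how negative_sample_rate and repulsion_strength could enter a UMAP-style loss, here is a minimal NumPy sketch. It follows the commonly used cross-entropy formulation with similarity q = 1 / (1 + a * d^(2b)) and is not taken from this package's source. The function name umap_loss, the curve parameters a and b (normally derived from min_dist), the clipping constant eps, and the use of means rather than sums are all assumptions made for the example.

```python
import numpy as np


def umap_loss(emb_from, emb_to, negative_sample_rate=5, repulsion_strength=1.0,
              a=1.0, b=1.0, rng=None, eps=1e-4):
    """Hypothetical UMAP-style loss for a batch of positive edges.

    emb_from, emb_to: (n_edges, n_components) embeddings of the two
    endpoints of each positive edge.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_edges = emb_from.shape[0]

    def similarity(u, v):
        # Low-dimensional similarity q = 1 / (1 + a * d^(2b)).
        d2 = np.sum((u - v) ** 2, axis=1)
        return 1.0 / (1.0 + a * d2 ** b)

    # Attractive term: endpoints of a positive edge should have q close to 1.
    q_pos = similarity(emb_from, emb_to)
    attraction = -np.log(np.clip(q_pos, eps, 1.0))

    # Repulsive term: pair every source point with `negative_sample_rate`
    # randomly drawn targets, which should have q close to 0.
    src = np.repeat(emb_from, negative_sample_rate, axis=0)
    neg_idx = rng.integers(0, n_edges, size=n_edges * negative_sample_rate)
    q_neg = similarity(src, emb_to[neg_idx])
    repulsion = -np.log(np.clip(1.0 - q_neg, eps, 1.0))

    # repulsion_strength scales only the repulsive part.
    return attraction.mean() + repulsion_strength * repulsion.mean()
```

In this sketch, raising negative_sample_rate draws more random pairs per positive edge, while raising repulsion_strength increases the weight of the repulsive term without changing how many pairs are sampled.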

Methods:

compute_contributions(X: NDArray[floating] | Tensor, batch_size: int | None = None, reduction: Literal['l2'] | None = None) → NDArray[float32]

Compute per-feature contributions to the embedding via Gradient x Input.

Projects gradients back to raw feature space if PCA preprocessing was used.

Parameters:

X : NDArray[floating] | Tensor

The input data (same format as passed to fit/transform). Shape: (n_samples, n_features).

batch_size : int | None

Batch size for Jacobian computation. Defaults to self.batch_size.

reduction : Literal['l2'] | None

How to reduce contributions across embedding dimensions. If "l2", takes the L2 norm across components, returning shape (n_samples, n_features). If None, returns the full (n_samples, n_components, n_features) array.

Returns:

Feature contributions array. Shape is (n_samples, n_components, n_features) when reduction is None, or (n_samples, n_features) when a reduction is applied.

Return type:

NDArray[float32]
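The shapes and the PCA back-projection described above can be illustrated with a small NumPy sketch. It uses a toy encoder f(z) = tanh(W z) with an analytic Jacobian instead of the package's encoder, and a random orthonormal matrix as a stand-in for fitted PCA components. All names here are hypothetical, and multiplying the gradient by the raw (uncentered) input is an assumption about the convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_pca, n_components = 6, 5, 3, 2

X = rng.normal(size=(n_samples, n_features)).astype(np.float32)

# Stand-in for PCA: rows of `components` are orthonormal directions
# in raw feature space, shape (n_pca, n_features).
components = np.linalg.qr(rng.normal(size=(n_features, n_pca)))[0].T
mean = X.mean(axis=0)
Z = (X - mean) @ components.T  # (n_samples, n_pca)

# Toy encoder f(z) = tanh(W z); its Jacobian is diag(1 - f(z)^2) @ W.
W = rng.normal(size=(n_components, n_pca))
Y = np.tanh(Z @ W.T)  # (n_samples, n_components)
J_pca = (1.0 - Y ** 2)[:, :, None] * W[None, :, :]  # (n_samples, n_components, n_pca)

# Chain rule back to raw features: dY/dX = dY/dZ @ dZ/dX, with dZ/dX = components.
J_raw = J_pca @ components  # (n_samples, n_components, n_features)

# Gradient x Input: elementwise product of each Jacobian row with the input.
contributions = (J_raw * X[:, None, :]).astype(np.float32)

# reduction="l2": L2 norm across the embedding components.
contributions_l2 = np.linalg.norm(contributions, axis=1)

print(contributions.shape)     # (n_samples, n_components, n_features)
print(contributions_l2.shape)  # (n_samples, n_features)
```

The unreduced array keeps one contribution per embedding axis, which helps when the axes are interpreted separately. The L2 reduction gives a single non-negative score per sample and feature.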

compute_jacobian(x: Tensor, batch_size: int = 1024) → Tensor

Compute the Jacobian of a model using vmap + jacrev with functional_call.

See glass_box_umap.jacobian.compute_jacobian() for details.

Return type: Tensor
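The vmap + jacrev + functional_call pattern named above can be sketched with torch.func as follows. This is a minimal illustration of the general technique, not the code in glass_box_umap.jacobian. The helper name batched_jacobian and the chunking by batch_size are assumptions made for the example, and it requires a PyTorch version that ships torch.func.

```python
import torch
from torch import nn
from torch.func import functional_call, jacrev, vmap


def batched_jacobian(model: nn.Module, x: torch.Tensor, batch_size: int = 1024) -> torch.Tensor:
    """Per-sample Jacobian of the model output w.r.t. its input.

    Returns a tensor of shape (n_samples, out_dim, in_dim) for 2D input.
    """
    # Treat parameters and buffers as constants of a pure function.
    params = {name: p.detach() for name, p in model.named_parameters()}
    buffers = {name: b for name, b in model.named_buffers()}

    def single(sample: torch.Tensor) -> torch.Tensor:
        # Add and remove a batch dimension so the module sees its usual input shape.
        out = functional_call(model, (params, buffers), (sample.unsqueeze(0),))
        return out.squeeze(0)

    # jacrev differentiates one sample; vmap maps it over the batch dimension.
    per_sample_jac = vmap(jacrev(single))

    # Process the input in chunks to bound peak memory.
    chunks = [per_sample_jac(chunk) for chunk in x.split(batch_size)]
    return torch.cat(chunks, dim=0)


model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))
jac = batched_jacobian(model, torch.randn(5, 4), batch_size=2)
print(jac.shape)  # (5, 2, 4)
```

Reverse mode (jacrev) is a natural fit here because the embedding dimension is usually much smaller than the input dimension, so each sample needs only n_components backward passes.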

class ParametricUMAP(*, n_neighbors: int = 15, min_dist: float = 0.1, metric: str = 'euclidean', n_components: int = 2, negative_sample_rate: int = 5, repulsion_strength: float = 1.0, pca_components: int | None = None, encoder_name: str = 'default', encoder_kwargs: dict[str, Any] = dict(), lr: float = 0.001, epochs: int = 200, batch_size: int = 10000, num_batches: int | None = None, num_workers: int = 0, checkpoint_dir: Path | None = None, restore_best_weights: bool = True, random_state: int | None = None, quiet: bool = False, extra_callbacks: list[Callback] = list())

Parametric UMAP model.

Attributes:

n_neighbors : int: Number of nearest neighbors used to construct the high-dimensional graph.

min_dist : float: Minimum distance between points in the low-dimensional embedding.

metric : str: Distance metric used for computing nearest neighbors.

n_components : int: Dimensionality of the learned embedding.

negative_sample_rate : int: Number of negative samples per positive edge in the UMAP loss.

repulsion_strength : float: Weighting of the repulsive term in the UMAP loss.

pca_components : int | None: Number of PCA components for input preprocessing. If None, no PCA is applied. PCA requires 2D input (n_samples, n_features); leave this None when fitting on multi-dimensional data (e.g. images for a convolutional encoder).

encoder_name : str: Name of the registered encoder architecture.

encoder_kwargs : dict[str, Any]: Additional keyword arguments passed to the encoder constructor.

lr : float: Learning rate for the optimizer.

epochs : int: Number of training epochs.

batch_size : int: Batch size for training and (default) inference.

num_batches : int | None: Cap the number of batches per epoch. Useful for large graphs where a full pass would be prohibitively long. If None, trains on all batches.

num_workers : int: Number of data loading workers.

checkpoint_dir : Path | None: Directory for saving training checkpoints. If None, a temporary directory is used.

restore_best_weights : bool: If True, restore the model weights from the epoch with the lowest loss after training. If False, keep the weights from the final epoch.

random_state : int | None: Random seed for reproducibility. If None, no seed is set.

quiet : bool: If True, suppress Lightning logs and progress output.

extra_callbacks : list[pl.Callback]: Additional Lightning callbacks to attach to the trainer.
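The semantics of num_batches and restore_best_weights described above can be sketched in plain Python. This toy loop only illustrates the documented behavior; the package trains through Lightning, and the names run_training, make_batches, and train_step, as well as the dict-based state, are hypothetical.

```python
from itertools import islice


def run_training(make_batches, train_step, state, epochs,
                 num_batches=None, restore_best_weights=True):
    """Toy training loop; `state` is a dict standing in for model weights."""
    best_loss, best_state = float("inf"), dict(state)

    for _ in range(epochs):
        # islice stops after num_batches items; with None it yields every batch.
        losses = [train_step(state, batch)
                  for batch in islice(make_batches(), num_batches)]
        if not losses:
            continue
        epoch_loss = sum(losses) / len(losses)

        # Remember a copy of the weights from the lowest-loss epoch.
        if epoch_loss < best_loss:
            best_loss, best_state = epoch_loss, dict(state)

    if restore_best_weights:
        # Replace the final-epoch weights with the best snapshot.
        state.clear()
        state.update(best_state)
    return state
```

With num_batches set, each epoch sees only the first num_batches batches produced by make_batches, so the cost of an epoch is bounded regardless of graph size.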

Methods:

to(device: str | device) → Self

Move the model (if initialized) and update the target device.

Return type: Self

fit(X: NDArray[floating] | Tensor) → Self

Return type: Self

transform(X: NDArray[floating] | Tensor, batch_size: int | None = None) → NDArray[floating]

Return type: NDArray[floating]

fit_transform(X: NDArray[floating] | Tensor) → NDArray[floating]

Return type: NDArray[floating]

save(path: Path) → None

classmethod load(path: Path) → Self

Return type: Self
