glass_box_umap
Overview
GlassBoxUMAP | Glass Box UMAP model.
ParametricUMAP | Parametric UMAP model.
Classes
- class GlassBoxUMAP(*, n_neighbors: int = 15, min_dist: float = 0.1, metric: str = 'euclidean', n_components: int = 2, negative_sample_rate: int = 5, repulsion_strength: float = 1.0, pca_components: int | None = None, encoder_name: str = 'default', encoder_kwargs: dict[str, Any] = dict(), lr: float = 0.001, epochs: int = 200, batch_size: int = 10000, num_batches: int | None = None, num_workers: int = 0, checkpoint_dir: Path | None = None, restore_best_weights: bool = True, random_state: int | None = None, quiet: bool = False, extra_callbacks: list[pl.Callback] = list())
Glass Box UMAP model.
Base Classes:
Attributes:
- n_neighbors
Number of nearest neighbors used to construct the high-dimensional graph.
- min_dist
Minimum distance between points in the low-dimensional embedding.
- metric
Distance metric used for computing nearest neighbors.
- n_components
Dimensionality of the learned embedding.
- random_state
Random seed for reproducibility. If None, no seed is set.
- encoder_kwargs
Additional keyword arguments passed to the encoder constructor.
- pca_components
Number of PCA components for input preprocessing. If None, no PCA is applied. PCA requires 2D input of shape (n_samples, n_features); leave this None when fitting on input with more than two dimensions (e.g. images for a convolutional encoder).
- lr
Learning rate for the optimizer.
- epochs
Number of training epochs.
- batch_size
Batch size for training and, by default, inference.
- negative_sample_rate
Number of negative samples per positive edge in the UMAP loss.
- repulsion_strength
Weighting of the repulsive term in the UMAP loss.
- num_workers
Number of data loading workers.
- checkpoint_dir
Directory for saving training checkpoints. If None, a temporary directory is used.
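The shape rule for pca_components can be illustrated with a minimal NumPy sketch. The helper name maybe_pca and the SVD-based projection are assumptions made for illustration; they are not taken from the library's preprocessing code.

```python
import numpy as np


def maybe_pca(X, pca_components=None):
    """Hypothetical helper mirroring the documented pca_components rule."""
    X = np.asarray(X, dtype=np.float64)
    if pca_components is None:
        # No PCA: input of any shape passes through, e.g. (n_samples, H, W) images.
        return X
    if X.ndim != 2:
        raise ValueError("PCA requires 2D input of shape (n_samples, n_features)")
    # Center the data, then project onto the leading right singular vectors.
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:pca_components].T
```

With this sketch, a (20, 5) array reduced with pca_components=2 comes back as (20, 2), while a (4, 8, 8) image batch is accepted only when pca_components is None.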
Methods:
- compute_contributions(X: NDArray[floating] | Tensor, batch_size: int | None = None, reduction: Literal['l2'] | None = None) -> NDArray[float32]
Compute per-feature contributions to the embedding via Gradient x Input.
Projects gradients back to raw feature space if PCA preprocessing was used.
- Parameters:
X : NDArray[floating] | Tensor
The input data (same format as passed to fit/transform). Shape: (n_samples, n_features).
batch_size : int | None
Batch size for Jacobian computation. Defaults to self.batch_size.
reduction : Literal['l2'] | None
How to reduce contributions across embedding dimensions. If "l2", takes the L2 norm across components, returning shape (n_samples, n_features). If None, returns the full (n_samples, n_components, n_features) array.
- Returns:
Feature contributions array. Shape is (n_samples, n_components, n_features) when reduction is None, or (n_samples, n_features) when reduction is "l2".
- Return type:
NDArray[float32]
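The shapes and the "l2" reduction described above can be sketched for the simplest possible case, a linear encoder whose Jacobian is just its weight matrix. Everything in this block is an assumption for illustration: the function name gradient_x_input, the linear encoder W, the optional PCA axes argument, and the choice to multiply by the raw (uncentered) input. The library computes Jacobians of its own trained encoder and may differ in these details.

```python
import numpy as np


def gradient_x_input(X, W, pca_axes=None, reduction=None):
    """Gradient x Input for a linear encoder f(z) = W @ z (illustrative only).

    X: (n_samples, n_features) raw inputs.
    W: (n_components, k) weights; k == n_features when no PCA is used.
    pca_axes: optional (k, n_features) matrix, so that z = pca_axes @ (x - mean).
    """
    X = np.asarray(X, dtype=np.float64)
    jac = np.asarray(W, dtype=np.float64)  # df/dz, identical for every sample
    if pca_axes is not None:
        # Chain rule back to raw features: df/dx = (df/dz) @ (dz/dx).
        jac = jac @ np.asarray(pca_axes, dtype=np.float64)
    # Broadcast to (n_samples, n_components, n_features).
    contrib = jac[None, :, :] * X[:, None, :]
    if reduction == "l2":
        # L2 norm across embedding components -> (n_samples, n_features).
        contrib = np.linalg.norm(contrib, axis=1)
    elif reduction is not None:
        raise ValueError(f"unsupported reduction: {reduction!r}")
    return contrib.astype(np.float32)
```

For a linear encoder without PCA, summing the unreduced array over the feature axis recovers the embedding X @ W.T. A nonlinear encoder has a different Jacobian for each sample, which is why the method above computes Jacobians in batches.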
- class ParametricUMAP(*, n_neighbors: int = 15, min_dist: float = 0.1, metric: str = 'euclidean', n_components: int = 2, negative_sample_rate: int = 5, repulsion_strength: float = 1.0, pca_components: int | None = None, encoder_name: str = 'default', encoder_kwargs: dict[str, Any] = dict(), lr: float = 0.001, epochs: int = 200, batch_size: int = 10000, num_batches: int | None = None, num_workers: int = 0, checkpoint_dir: Path | None = None, restore_best_weights: bool = True, random_state: int | None = None, quiet: bool = False, extra_callbacks: list[Callback] = list())
Parametric UMAP model.
Attributes:
- pca_components : int | None
Number of PCA components for input preprocessing. If None, no PCA is applied. PCA requires 2D input of shape (n_samples, n_features); leave this None when fitting on input with more than two dimensions (e.g. images for a convolutional encoder).
- num_batches : int | None
Cap on the number of batches per epoch. Useful for large graphs where a full pass would be prohibitively long. If None, trains on all batches.
- checkpoint_dir : Path | None
Directory for saving training checkpoints. If None, a temporary directory is used.
- restore_best_weights : bool
If True, restore the model weights from the epoch with the lowest loss after training. If False, keep the weights from the final epoch.
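How num_batches and restore_best_weights interact with a training loop can be sketched in plain Python. This is a toy loop under stated assumptions, not the library's trainer: the function train, the step callback, and the use of the mean batch loss as the per-epoch loss are all hypothetical.

```python
import copy
from itertools import islice


def train(batches, step, weights, epochs, num_batches=None, restore_best_weights=True):
    """Toy loop; step(weights, batch) must return (new_weights, loss)."""
    best_loss = float("inf")
    best_weights = copy.deepcopy(weights)
    for _ in range(epochs):
        losses = []
        # islice(iterable, None) yields every batch; an integer caps the epoch.
        for batch in islice(batches, num_batches):
            weights, loss = step(weights, batch)
            losses.append(loss)
        if not losses:
            continue  # nothing was trained this epoch
        epoch_loss = sum(losses) / len(losses)
        if epoch_loss < best_loss:
            # Snapshot the weights of the lowest-loss epoch seen so far.
            best_loss = epoch_loss
            best_weights = copy.deepcopy(weights)
    return best_weights if restore_best_weights else weights
```

Capping batches shortens each epoch but means later batches are never visited unless the data order changes between epochs.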
Methods:
- to(device: str | device) -> Self
Move the model (if initialized) to the given device and update the target device.
- Return type:
Self