Tools (gd.tl)

The tools module provides functions for model training, projections, embeddings, imputation, differential expression, pathway analysis, and trajectory analysis.

Model Training

gedi

Run GEDI batch correction and dimensionality reduction.

gedi2py.tools.gedi(adata, batch_key, *, n_latent=10, layer=None, layer2=None, max_iterations=100, track_interval=5, mode='Bsphere', ortho_Z=True, C=None, H=None, key_added='gedi', random_state=None, verbose=True, n_jobs=-1, copy=False)[source]

Run GEDI batch correction and dimensionality reduction.

Gene Expression Decomposition for Integration (GEDI) learns shared metagenes and sample-specific factors for batch effect correction.

Parameters:
  • adata (AnnData) – Annotated data matrix with cells as observations.

  • batch_key (str) – Key in adata.obs for batch/sample labels.

  • n_latent (int, default: 10) – Number of latent factors (K).

  • layer (str | None, default: None) – Layer to use instead of adata.X. If None, uses adata.X. For paired data (e.g., CITE-seq), this is the first count matrix.

  • layer2 (str | None, default: None) – Second layer for paired count data (M_paired mode). When specified along with layer, GEDI models the log-ratio: Yi = log((M1+1)/(M2+1)). This is useful for CITE-seq ADT/RNA ratios or similar paired assays.

  • max_iterations (int, default: 100) – Maximum number of optimization iterations.

  • track_interval (int, default: 5) – Interval for tracking convergence metrics.

  • mode (Literal['Bl2', 'Bsphere'], default: 'Bsphere') – Normalization mode for B matrices: “Bsphere” (recommended) or “Bl2”.

  • ortho_Z (bool, default: True) – Whether to orthogonalize Z matrix.

  • C (ndarray | None, default: None) – Gene × pathway prior matrix for pathway analysis. Optional.

  • H (ndarray | None, default: None) – Covariate × sample prior matrix. Optional.

  • key_added (str, default: 'gedi') – Base key for storing results. Results are stored as:

    • adata.obsm[f'X_{key_added}']: Cell embeddings

    • adata.varm[f'{key_added}_Z']: Gene loadings

    • adata.uns[key_added]: Parameters and metadata

  • random_state (int | None, default: None) – Random seed for reproducibility. If None, uses global settings.

  • verbose (bool, default: True) – Whether to print progress messages.

  • n_jobs (int, default: -1) – Number of parallel jobs. -1 uses all available cores.

  • copy (bool, default: False) – Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

  • Returns None if copy=False, else returns an AnnData.

  • Sets the following fields:

    • .obsm['X_gedi'] (numpy.ndarray) – Cell embeddings (n_cells × n_latent).

    • .varm['gedi_Z'] (numpy.ndarray) – Shared metagenes (n_genes × n_latent).

    • .uns['gedi'] (dict) – Model parameters and metadata.

Examples

Standard usage with log-transformed data:

>>> import gedi2py as gd
>>> import scanpy as sc
>>> adata = sc.read_h5ad("data.h5ad")
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> sc.pp.neighbors(adata, use_rep="X_gedi")
>>> sc.tl.umap(adata)
>>> gd.pl.embedding(adata, color="sample")

Paired data mode (e.g., CITE-seq with two count layers):

>>> # adata.layers['adt'] = ADT counts
>>> # adata.layers['rna'] = RNA counts (for same features)
>>> gd.tl.gedi(
...     adata,
...     batch_key="sample",
...     layer="adt",
...     layer2="rna",
...     n_latent=10
... )

Example

import gedi2py as gd

# Basic usage
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# With more options
gd.tl.gedi(
    adata,
    batch_key="sample",
    n_latent=20,
    max_iterations=200,
    mode="Bsphere",
    ortho_Z=True,
    key_added="gedi",
)

# Paired data mode (e.g., CITE-seq with ADT/RNA counts)
# GEDI models the log-ratio: Yi = log((M1+1)/(M2+1))
gd.tl.gedi(
    adata,
    batch_key="sample",
    layer="adt",       # First count matrix (numerator)
    layer2="rna",      # Second count matrix (denominator)
    n_latent=10,
)
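The log-ratio used in paired mode can be reproduced by hand with NumPy. The following is a minimal sketch of the formula Yi = log((M1+1)/(M2+1)) only, not of what gedi2py does internally; the toy matrices M1 and M2 are made up for illustration.

```python
import numpy as np

# Toy paired count matrices with identical shape (values are illustrative).
M1 = np.array([[0, 3], [9, 1]], dtype=float)  # plays the role of `layer`
M2 = np.array([[0, 1], [4, 3]], dtype=float)  # plays the role of `layer2`

# The pseudocount of 1 keeps the ratio finite when either count is zero.
Y = np.log((M1 + 1.0) / (M2 + 1.0))

# Equivalent form: a difference of log1p-transformed counts.
Y_alt = np.log1p(M1) - np.log1p(M2)
```

The operation is elementwise, so it does not depend on whether the matrices are stored cells-first or features-first, as long as both layers share the same orientation.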

Stored Results

  • adata.obsm['X_gedi']: Cell embeddings (n_cells × n_latent)

  • adata.varm['gedi_Z']: Gene loadings (n_genes × n_latent)

  • adata.uns['gedi']: Model parameters and metadata

Projections

get_projection

Compute and retrieve GEDI projections.

compute_zdb

Compute ZDB (shared manifold) projection.

compute_db

Compute DB (latent factor embedding) projection.

compute_adb

Compute ADB (pathway activity) projection.

gedi2py.tools.get_projection(adata, which='zdb', *, key='gedi', key_added=None, copy=False)[source]

Compute and retrieve GEDI projections.

Projections transform the learned GEDI parameters into interpretable representations:

  • ZDB: Shared manifold projection (genes × cells) = Z @ diag(D) @ B

  • DB: Latent factor embedding (K × cells) = diag(D) @ B

  • ADB: Pathway activity projection (pathways × cells) = C_rot @ A @ diag(D) @ B
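The ZDB and DB products can be illustrated with plain NumPy. This is a sketch on random toy matrices, assuming Z is genes × K, D is a length-K vector, and B is K × cells; it does not call gedi2py. ADB is left out because the shapes of C_rot and A are not stated here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_cells, K = 6, 4, 3  # toy sizes, chosen only for illustration

Z = rng.normal(size=(n_genes, K))   # shared metagenes (genes x K)
D = rng.uniform(0.5, 2.0, size=K)   # scaling vector (K,)
B = rng.normal(size=(K, n_cells))   # cell loadings (K x cells)

DB = np.diag(D) @ B   # latent factor embedding, shape (K, n_cells)
ZDB = Z @ DB          # shared manifold projection, shape (n_genes, n_cells)

# Scaling the rows of B by D gives the same result without building diag(D).
DB_fast = D[:, None] * B
```

Transposing either result gives the cells-first layout that adata.obsm expects.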

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results in .uns[key].

  • which (Literal['zdb', 'db', 'adb'], default: 'zdb') – Which projection to compute: "zdb", "db", or "adb".

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • key_added (str | None, default: None) – Key to store the result in adata.obsm. If None, defaults to X_{key}_{which} (e.g., X_gedi_zdb).

  • copy (bool, default: False) – If True, return a copy of the projection array instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True, returns the projection as a numpy array.

  • Otherwise, stores the result in adata.obsm[key_added] and returns None.

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> gd.tl.get_projection(adata, "zdb")
>>> adata.obsm["X_gedi_zdb"]  # (n_cells, n_genes)

gedi2py.tools.compute_zdb(adata, *, key='gedi', key_added=None, copy=False)[source]

Compute ZDB (shared manifold) projection.

Alias for get_projection(adata, "zdb", ...).

See get_projection() for full documentation.

Return type:

AnnData | ndarray | None

gedi2py.tools.compute_db(adata, *, key='gedi', key_added=None, copy=False)[source]

Compute DB (latent factor embedding) projection.

Alias for get_projection(adata, "db", ...).

See get_projection() for full documentation.

Return type:

AnnData | ndarray | None

gedi2py.tools.compute_adb(adata, *, key='gedi', key_added=None, copy=False)[source]

Compute ADB (pathway activity) projection.

Alias for get_projection(adata, "adb", ...).

See get_projection() for full documentation.

Return type:

AnnData | ndarray | None

Projection Types

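Type    Shape                    Description
zdb     (n_genes, n_cells)       Full projection: shared manifold
db      (n_latent, n_cells)      Latent factors: batch-corrected cell embeddings
adb     (n_pathways, n_cells)    Pathway activity scores (requires C matrix)

Shapes follow the matrix definitions given for get_projection. Arrays stored in adata.obsm are indexed by cell first; for example, adata.obsm['X_gedi_zdb'] is (n_cells, n_genes).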

Example

# Get DB projection (latent factors)
gd.tl.get_projection(adata, which="db")
db = adata.obsm['X_gedi_db']

# Compute ZDB (full projection) and return it as an array
zdb = gd.tl.compute_zdb(adata, copy=True)

Embeddings

svd

Compute factorized SVD from GEDI decomposition.

pca

Compute PCA coordinates from GEDI decomposition.

umap

Compute UMAP embedding from GEDI results.

gedi2py.tools.svd(adata, *, key='gedi', copy=False)[source]

Compute factorized SVD from GEDI decomposition.

Computes SVD while preserving GEDI’s factorized structure: SVD(Z) × SVD(middle) × SVD(DB). This maintains biological interpretability by respecting the decomposition structure.
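One way to read the factorized structure is sketched below in NumPy: take the SVD of each factor, then the SVD of the small K × K matrix left in the middle, and recombine. The interpretation of "middle" and all variable names are assumptions made for illustration; this is not the gedi2py implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_cells, K = 50, 40, 5  # toy sizes

Z = rng.normal(size=(n_genes, K))    # stand-in for the metagene matrix
DB = rng.normal(size=(K, n_cells))   # stand-in for the DB projection

# SVD of each factor separately.
Uz, sz, Vzt = np.linalg.svd(Z, full_matrices=False)   # Uz: (n_genes, K)
Ud, sd, Vdt = np.linalg.svd(DB, full_matrices=False)  # Vdt: (K, n_cells)

# Small K x K matrix that remains between the two orthonormal bases.
middle = (np.diag(sz) @ Vzt) @ (Ud @ np.diag(sd))
Um, d, Vmt = np.linalg.svd(middle)

u = Uz @ Um          # left singular vectors, (n_genes, K)
v = Vdt.T @ Vmt.T    # right singular vectors, (n_cells, K)

# The recombined factors reproduce Z @ DB without ever forming its full SVD.
reconstruction = u @ np.diag(d) @ v.T
```

Only the K × K middle matrix needs a dense SVD after the two thin factor decompositions, which is why this route stays cheap when K is much smaller than the number of genes or cells.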

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results in .uns[key].

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • copy (bool, default: False) – If True, return the SVD result as a dict instead of storing in adata.

Return type:

dict | None

Returns:

  • If copy=True, returns a dict with keys 'd', 'u', 'v'.

  • Otherwise, stores results in adata.uns[key]['svd'] and returns None.

  • The SVD components are

    • d: Singular values (K,)

    • u: Left singular vectors (n_genes, K) - gene loadings

    • v: Right singular vectors (n_cells, K) - cell embeddings

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> gd.tl.svd(adata)
>>> adata.uns["gedi"]["svd"]["d"]  # singular values

gedi2py.tools.pca(adata, *, n_components=None, key='gedi', key_added=None, copy=False)[source]

Compute PCA coordinates from GEDI decomposition.

PCA coordinates are computed as V @ diag(d) from the factorized SVD, where V are the right singular vectors (cell embeddings).
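The relation between the SVD and the PCA coordinates can be checked on a toy matrix. In this sketch an ordinary NumPy SVD of a genes × cells matrix X stands in for the factorized SVD, so X and its sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))  # toy genes x cells matrix

# Thin SVD: X = U @ diag(d) @ V.T
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T  # right singular vectors, one row per cell

# PCA coordinates: each cell's row of V scaled by the singular values.
pca_coords = V @ np.diag(d)  # (n_cells, n_components)

# The same coordinates arise from projecting the cells onto U.
projected = X.T @ U
```

Keeping only the first n_components columns of pca_coords gives a lower-dimensional embedding.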

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results in .uns[key].

  • n_components (int | None, default: None) – Number of PCs to compute. If None, uses all K latent factors.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • key_added (str | None, default: None) – Key to store PCA in adata.obsm. Defaults to X_{key}_pca.

  • copy (bool, default: False) – If True, return the PCA coordinates instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True, returns PCA coordinates as numpy array (n_cells, n_components).

  • Otherwise, stores in adata.obsm[key_added] and returns None.

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> gd.tl.pca(adata, n_components=5)  # at most n_latent components are available
>>> adata.obsm["X_gedi_pca"]

gedi2py.tools.umap(adata, *, n_neighbors=15, min_dist=0.1, n_components=2, metric='euclidean', input_key='pca', key='gedi', key_added=None, random_state=None, copy=False)[source]

Compute UMAP embedding from GEDI results.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • n_neighbors (int, default: 15) – Size of local neighborhood for UMAP.

  • min_dist (float, default: 0.1) – Minimum distance between points in the embedding.

  • n_components (int, default: 2) – Dimensionality of the UMAP embedding.

  • metric (str, default: 'euclidean') – Distance metric for neighbor search.

  • input_key (Literal['pca', 'db', 'zdb'], default: 'pca') – Which GEDI representation to use as input:

    • "pca": PCA coordinates (default)

    • "db": DB latent factor embedding

    • "zdb": ZDB shared manifold projection

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • key_added (str | None, default: None) – Key to store UMAP in adata.obsm. Defaults to X_{key}_umap.

  • random_state (int | None, default: None) – Random seed for reproducibility. If None, uses settings.random_state.

  • copy (bool, default: False) – If True, return UMAP coordinates instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True, returns UMAP coordinates as numpy array (n_cells, n_components).

  • Otherwise, stores in adata.obsm[key_added] and returns None.

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> gd.tl.umap(adata, n_neighbors=30)
>>> gd.pl.embedding(adata, basis="X_gedi_umap", color="cell_type")

Example

# Compute all embeddings
gd.tl.pca(adata)
gd.tl.umap(adata)

# Access results
pca_coords = adata.obsm['X_gedi_pca']
umap_coords = adata.obsm['X_gedi_umap']

Imputation

impute

Compute imputed expression values from GEDI model.

variance

Compute gene variance explained by GEDI model.

dispersion

Compute gene dispersion from GEDI model.

gedi2py.tools.impute(adata, *, samples=None, key='gedi', layer_added=None, copy=False)[source]

Compute imputed expression values from GEDI model.

Imputed values are computed as the expected expression under the GEDI model: Y_imputed = Z @ Q_i @ D @ B_i + o + o_i + s_i for each sample.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • samples (list[int] | None, default: None) – List of sample indices to impute. If None, imputes all samples.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • layer_added (str | None, default: None) – Layer name to store imputed values. If None, defaults to {key}_imputed.

  • copy (bool, default: False) – If True, return the imputed matrix instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True, returns imputed expression matrix (n_cells, n_genes).

  • Otherwise, stores in adata.layers[layer_added] and returns None.

Notes

The imputed expression is computed sample-by-sample to preserve the sample-specific components (Q_i, o_i, s_i).

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> gd.tl.impute(adata)
>>> adata.layers["gedi_imputed"]

gedi2py.tools.variance(adata, *, key='gedi', copy=False)[source]

Compute gene variance explained by GEDI model.

Variance is computed across the imputed expression values, representing the systematic variation captured by the model.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • copy (bool, default: False) – If True, return variance vector instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True, returns variance vector (n_genes,).

  • Otherwise, stores in adata.var['{key}_variance'] and returns None.

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> gd.tl.variance(adata)
>>> adata.var["gedi_variance"]

gedi2py.tools.dispersion(adata, *, key='gedi', copy=False)[source]

Compute gene dispersion from GEDI model.

Dispersion is computed as variance / mean (the index of dispersion, also known as the Fano factor) from the imputed expression values.
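A per-gene variance / mean ratio can be computed from any cells × genes matrix as sketched below. The toy matrix stands in for the imputed layer, and the population variance (ddof=0) is an assumption; the estimator gedi2py uses is not stated here.

```python
import numpy as np

# Toy stand-in for an imputed expression matrix, shape (n_cells, n_genes).
imputed = np.array([[1.0, 2.0],
                    [3.0, 6.0]])

gene_mean = imputed.mean(axis=0)  # per-gene mean across cells
gene_var = imputed.var(axis=0)    # per-gene variance across cells (ddof=0)

# Variance divided by the mean once, not by the mean squared.
dispersion = gene_var / gene_mean
```

Genes with a mean at or near zero would need special handling before dividing.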

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • copy (bool, default: False) – If True, return dispersion vector instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True, returns dispersion vector (n_genes,).

  • Otherwise, stores in adata.var['{key}_dispersion'] and returns None.

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> gd.tl.dispersion(adata)
>>> adata.var["gedi_dispersion"]

Example

# Impute denoised expression
gd.tl.impute(adata)
imputed = adata.layers['gedi_imputed']

# Compute variance and dispersion
gd.tl.variance(adata)
gd.tl.dispersion(adata)

Differential Expression

differential

Compute differential expression effects from GEDI model.

diff_q

Compute cell-specific differential expression (diffQ).

diff_o

Compute global offset differential effect (diffO).

gedi2py.tools.differential(adata, contrast, *, mode='full', include_offset=True, key='gedi', key_added=None, copy=False)[source]

Compute differential expression effects from GEDI model.

Uses the sample-level covariate matrix H and learned regression coefficients (Rk, Ro) to compute differential expression effects for a given contrast.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • contrast (ndarray | list) – Contrast vector of length L (number of covariates). Specifies the linear combination of covariate effects to compute.

  • mode (Literal['full', 'offset', 'metagene'], default: 'full') – Type of differential effect to compute:

    • "full": Full cell-specific effect (diffQ, J × N matrix)

    • "offset": Global gene offset effect (diffO, J vector)

    • "metagene": Effect on metagene loadings

  • include_offset (bool, default: True) – If True and mode is "full", add the offset effect (diffO) to the cell-specific effect.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • key_added (str | None, default: None) – Key to store results. Defaults depend on mode:

    • "full": adata.layers['{key}_diff']

    • "offset": adata.var['{key}_diff_offset']

  • copy (bool, default: False) – If True, return the result instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • Depending on mode and copy:

    • mode="full", copy=True: (n_cells, n_genes) array

    • mode="offset", copy=True: (n_genes,) array

    • Otherwise: None (results are stored in adata)

Notes

Differential effects require that a covariate matrix H was provided during model training via gd.tl.gedi(..., H=covariate_matrix).

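The contrast vector defines a linear combination of covariate effects. For example, with covariates [treatment, sex], a contrast of [1, 0] computes the treatment effect, while [1, -1] computes the difference between the treatment and sex effects. An interaction effect would require a dedicated interaction column in H.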
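The way a contrast combines covariate effects can be sketched with a small matrix product. The coefficient matrix Ro below is hypothetical (genes × covariates, with made-up values); its layout inside gedi2py is an assumption.

```python
import numpy as np

# Hypothetical offset coefficients: one row per gene, one column per covariate.
Ro = np.array([[0.5, 0.1],
               [-0.2, 0.3],
               [0.0, -0.4]])

first_only = np.array([1, 0])    # selects the first covariate's effect
difference = np.array([1, -1])   # first covariate minus second covariate

effect_first = Ro @ first_only       # equals column 0 of Ro
effect_difference = Ro @ difference  # column 0 minus column 1
```

Any other weighting of the covariates works the same way, as long as the contrast has one entry per covariate.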

Examples

>>> import gedi2py as gd
>>> import numpy as np
>>> # Assuming H is (n_samples, 2) with [treatment, control]
>>> gd.tl.gedi(adata, batch_key="sample", H=H_matrix)
>>> contrast = np.array([1, -1])  # treatment - control
>>> gd.tl.differential(adata, contrast)
>>> adata.layers["gedi_diff"]  # (n_cells, n_genes)

gedi2py.tools.diff_q(adata, contrast, *, include_offset=True, key='gedi', key_added=None, copy=False)[source]

Compute cell-specific differential expression (diffQ).

Alias for differential(adata, contrast, mode="full", ...).

See differential() for full documentation.

Return type:

AnnData | ndarray | None

gedi2py.tools.diff_o(adata, contrast, *, key='gedi', key_added=None, copy=False)[source]

Compute global offset differential effect (diffO).

Alias for differential(adata, contrast, mode="offset", ...).

See differential() for full documentation.

Return type:

AnnData | ndarray | None

Example

import numpy as np

# Create contrast over the covariates in H (length L = number of covariates).
# Here H is assumed to have two covariate columns: first vs second.
contrast = np.array([1, -1])

# Compute differential expression (default mode="full")
gd.tl.differential(adata, contrast=contrast)

# Access results (default storage for mode="full")
de = adata.layers['gedi_diff']

Pathway Analysis

pathway_associations

Compute pathway-gene associations from GEDI model.

pathway_scores

Compute per-cell pathway activity scores.

top_pathway_genes

Get top genes associated with a pathway.

gedi2py.tools.pathway_associations(adata, *, sparse_output=False, key='gedi', key_added=None, copy=False)[source]

Compute pathway-gene associations from GEDI model.

Uses the pathway coefficient matrix A and gene loading matrix Z to compute associations between pathways and genes, accounting for the learned latent structure.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • sparse_output (bool, default: False) – If True, return a sparse matrix. Useful for large pathway sets.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • key_added (str | None, default: None) – Key to store results in adata.varm. Defaults to {key}_pathway_assoc.

  • copy (bool, default: False) – If True, return the association matrix instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True, returns pathway association matrix (n_genes, n_pathways).

  • Otherwise, stores in adata.varm[key_added] and returns None.

Notes

Pathway associations require that a gene-level prior matrix C was provided during model training via gd.tl.gedi(..., C=pathway_matrix).

The association matrix represents how strongly each gene is associated with each pathway through the learned latent factors.

Examples

>>> import gedi2py as gd
>>> # C is (n_genes, n_pathways) pathway membership matrix
>>> gd.tl.gedi(adata, batch_key="sample", C=pathway_matrix)
>>> gd.tl.pathway_associations(adata)
>>> adata.varm["gedi_pathway_assoc"]  # (n_genes, n_pathways)

gedi2py.tools.pathway_scores(adata, *, key='gedi', key_added=None, copy=False)[source]

Compute per-cell pathway activity scores.

Uses the ADB projection (pathway activity) to compute pathway scores for each cell.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • key_added (str | None, default: None) – Key to store results in adata.obsm. Defaults to X_{key}_pathway_scores.

  • copy (bool, default: False) – If True, return the score matrix instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True, returns pathway score matrix (n_cells, n_pathways).

  • Otherwise, stores in adata.obsm[key_added] and returns None.

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample", C=pathway_matrix)
>>> gd.tl.pathway_scores(adata)
>>> adata.obsm["X_gedi_pathway_scores"]  # (n_cells, n_pathways)

gedi2py.tools.top_pathway_genes(adata, pathway_idx, *, n_genes=20, key='gedi')[source]

Get top genes associated with a pathway.

Returns the genes with the highest association scores for a given pathway, based on the pathway association matrix.
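Ranking genes by one column of an association matrix can be sketched as follows. The gene names and scores are made up, and sorting by raw score (rather than absolute value) is an assumption about what "highest" means.

```python
import numpy as np

# Made-up gene names and a toy (n_genes, n_pathways) association matrix.
gene_names = np.array(["GeneA", "GeneB", "GeneC", "GeneD"])
assoc = np.array([[0.1, 0.9],
                  [0.7, 0.2],
                  [0.4, 0.8],
                  [0.9, 0.0]])

pathway_idx = 0
n_genes = 2

# Sort the chosen pathway's column in descending order and keep the top rows.
order = np.argsort(assoc[:, pathway_idx])[::-1][:n_genes]
top_genes = gene_names[order].tolist()
```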

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • pathway_idx (int) – Index of the pathway to query.

  • n_genes (int, default: 20) – Number of top genes to return.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

Return type:

list[str]

Returns:

  • List of gene names with the highest pathway associations.

Examples

>>> import gedi2py as gd
>>> gd.tl.pathway_associations(adata)
>>> top_genes = gd.tl.top_pathway_genes(adata, pathway_idx=0, n_genes=10)
>>> print(top_genes)

Example

# Run GEDI with pathway prior
C = load_pathway_matrix()  # genes × pathways
gd.tl.gedi(adata, batch_key="sample", C=C)

# Get pathway associations
gd.tl.pathway_associations(adata)

# Get top genes per pathway
top_genes = gd.tl.top_pathway_genes(adata, pathway_idx=0, n_genes=10)

Dynamics / Trajectory

vector_field

Compute vector field for trajectory between two conditions.

gradient

Compute gradient of pathway activity across cells.

pseudotime

Compute pseudotime ordering based on GEDI embeddings.

gedi2py.tools.vector_field(adata, start_contrast, end_contrast, *, n_steps=10, key='gedi', key_added=None, copy=False)[source]

Compute vector field for trajectory between two conditions.

Uses the differential expression framework to compute a vector field representing the transcriptional trajectory from one condition to another.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • start_contrast (ndarray | list) – Contrast vector defining the starting condition.

  • end_contrast (ndarray | list) – Contrast vector defining the ending condition.

  • n_steps (int, default: 10) – Number of interpolation steps between conditions.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • key_added (str | None, default: None) – Key to store results. Defaults to {key}_vector_field.

  • copy (bool, default: False) – If True, return the vector field dict instead of storing in adata.

Return type:

AnnData | dict | None

Returns:

  • If copy=True, returns a dict with keys:

    • vectors: (n_steps, n_genes) array of expression changes

    • positions: (n_steps, L) array of contrast positions

  • Otherwise, stores in adata.uns[key_added] and returns None.

Notes

The vector field represents the direction and magnitude of transcriptional change at each point along the trajectory from start to end condition.

This requires that a covariate matrix H was provided during model training.
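The interpolation between two contrasts can be sketched with NumPy. Linear interpolation and the hypothetical coefficient matrix Ro (genes × covariates) are assumptions made for illustration; how gedi2py derives the vectors entry is not specified here.

```python
import numpy as np

start = np.array([0.0, 1.0])  # e.g. control
end = np.array([1.0, 0.0])    # e.g. treatment
n_steps = 5

# Evenly spaced contrast positions from start to end, shape (n_steps, L).
positions = np.linspace(start, end, n_steps)

# Hypothetical per-gene coefficients, one column per covariate.
Ro = np.array([[0.5, 0.1],
               [-0.2, 0.3],
               [0.0, -0.4]])

# Gene-level effect evaluated at each position, shape (n_steps, n_genes).
vectors = positions @ Ro.T
```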

Examples

>>> import gedi2py as gd
>>> import numpy as np
>>> # Define trajectory from control to treatment
>>> start = np.array([0, 1])  # control
>>> end = np.array([1, 0])    # treatment
>>> gd.tl.vector_field(adata, start, end, n_steps=20)
>>> adata.uns["gedi_vector_field"]

gedi2py.tools.gradient(adata, pathway_idx=None, *, key='gedi', copy=False)[source]

Compute gradient of pathway activity across cells.

Uses the pathway coefficient matrix A and cell loadings B to compute the gradient of pathway activity in the latent space.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • pathway_idx (int | None, default: None) – Index of the pathway to compute gradient for. If None, computes gradients for all pathways.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • copy (bool, default: False) – If True, return the gradient array instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True:

    • If pathway_idx is specified: (n_cells, K) gradient array

    • If pathway_idx is None: (n_cells, K, n_pathways) gradient array

  • Otherwise, stores in adata.obsm['{key}_gradient'] and returns None.

Notes

The gradient represents the direction in latent space that maximally increases pathway activity. This can be used for trajectory analysis or identifying cells transitioning along a pathway.

This requires that a pathway prior matrix C was provided during model training.

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample", C=pathway_matrix)
>>> gd.tl.gradient(adata, pathway_idx=0)
>>> adata.obsm["gedi_gradient"]  # direction to increase pathway 0

gedi2py.tools.pseudotime(adata, start_cells, *, use_rep=None, key='gedi', key_added='gedi_pseudotime', copy=False)[source]

Compute pseudotime ordering based on GEDI embeddings.

Uses diffusion-based pseudotime computation on the GEDI latent representation to order cells along a trajectory.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • start_cells (ndarray | list) – Indices or boolean mask of cells to use as trajectory start points.

  • use_rep (str | None, default: None) – Representation to use. Defaults to X_{key}_pca or X_{key}.

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • key_added (str, default: 'gedi_pseudotime') – Key to store pseudotime in adata.obs.

  • copy (bool, default: False) – If True, return pseudotime array instead of storing in adata.

Return type:

AnnData | ndarray | None

Returns:

  • If copy=True, returns pseudotime array (n_cells,).

  • Otherwise, stores in adata.obs[key_added] and returns None.

Notes

This is a simple geodesic distance-based pseudotime. For more sophisticated trajectory inference, consider using specialized tools like scanpy’s PAGA or scvelo.
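A geodesic distance-based ordering of the kind mentioned in this note can be sketched with a k-nearest-neighbor graph and Dijkstra's algorithm. The function name geodesic_pseudotime, the graph construction, and the final scaling to [0, 1] are all assumptions for illustration, not the gedi2py implementation.

```python
import heapq
import numpy as np

def geodesic_pseudotime(X, start_cells, n_neighbors=5):
    """Shortest-path distance from the nearest start cell over a kNN graph."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    start = np.asarray(start_cells)
    # Accept either a boolean mask or a list of integer indices.
    idx = np.flatnonzero(start) if start.dtype == bool else start.astype(int)
    if idx.size == 0:
        raise ValueError("at least one start cell is required")

    # Dense pairwise Euclidean distances (fine for small illustrative inputs).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Symmetric kNN adjacency; column 0 of the argsort is the cell itself.
    order = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in order[i]:
            neighbors[i].add(int(j))
            neighbors[int(j)].add(i)

    # Multi-source Dijkstra starting from all start cells at time 0.
    time = np.full(n, np.inf)
    time[idx] = 0.0
    heap = [(0.0, int(s)) for s in idx]
    heapq.heapify(heap)
    while heap:
        t, i = heapq.heappop(heap)
        if t > time[i]:
            continue  # stale heap entry
        for j in neighbors[i]:
            cand = t + dist[i, j]
            if cand < time[j]:
                time[j] = cand
                heapq.heappush(heap, (cand, j))

    # Scale reachable cells to [0, 1]; unreachable cells stay at infinity.
    finite = np.isfinite(time)
    if time[finite].max() > 0:
        time[finite] /= time[finite].max()
    return time

# Four cells on a line, starting from the first one.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
pt = geodesic_pseudotime(X_toy, np.array([True, False, False, False]), n_neighbors=2)
```

The dense distance matrix makes this sketch quadratic in the number of cells, so a real dataset would need a sparse neighbor search instead.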

Examples

>>> import gedi2py as gd
>>> # Use cells in cluster 0 as starting point
>>> start_cells = adata.obs["cluster"] == "0"
>>> gd.tl.pseudotime(adata, start_cells)
>>> adata.obs["gedi_pseudotime"]

Example

import numpy as np

# Define start and end contrasts
start = np.array([1, 0, 0, 0])
end = np.array([0, 0, 0, 1])

# Compute vector field
gd.tl.vector_field(adata, start_contrast=start, end_contrast=end)

# Compute pseudotime from a set of start cells (boolean mask over cells)
start_cells = adata.obs["cluster"] == "0"
gd.tl.pseudotime(adata, start_cells)