gedi2py.tools.gedi

gedi2py.tools.gedi(adata, batch_key, *, n_latent=10, layer=None, layer2=None, max_iterations=100, track_interval=5, mode='Bsphere', ortho_Z=True, C=None, H=None, key_added='gedi', random_state=None, verbose=True, n_jobs=-1, copy=False)

Run GEDI batch correction and dimensionality reduction.

Gene Expression Decomposition for Integration (GEDI) learns shared metagenes and sample-specific factors for batch effect correction.

Parameters:
  • adata (AnnData) – Annotated data matrix with cells as observations.

  • batch_key (str) – Key in adata.obs for batch/sample labels.

  • n_latent (int, default: 10) – Number of latent factors (K).

  • layer (str | None, default: None) – Layer to use instead of adata.X. If None, uses adata.X. For paired data (e.g., CITE-seq), this is the first count matrix.

  • layer2 (str | None, default: None) – Second layer for paired count data (M_paired mode). When specified together with layer, GEDI models the log-ratio Y_i = log((M1 + 1) / (M2 + 1)), where M1 is the count matrix in layer and M2 is the count matrix in layer2. This is useful for CITE-seq ADT/RNA ratios or similar paired assays.

  • max_iterations (int, default: 100) – Maximum number of optimization iterations.

  • track_interval (int, default: 5) – Interval for tracking convergence metrics.

  • mode (Literal['Bl2', 'Bsphere'], default: 'Bsphere') – Normalization mode for B matrices: “Bsphere” (recommended) or “Bl2”.

  • ortho_Z (bool, default: True) – Whether to orthogonalize the Z matrix (shared metagenes).

  • C (numpy.ndarray | None, default: None) – Gene × pathway prior matrix for pathway analysis. Optional.

  • H (numpy.ndarray | None, default: None) – Covariate × sample prior matrix. Optional.

  • key_added (str, default: 'gedi') – Base key for storing results. Results are stored as:
      - adata.obsm[f'X_{key_added}']: cell embeddings
      - adata.varm[f'{key_added}_Z']: gene loadings (shared metagenes)
      - adata.uns[key_added]: parameters and metadata

  • random_state (int | None, default: None) – Random seed for reproducibility. If None, uses global settings.

  • verbose (bool, default: True) – Whether to print progress messages.

  • n_jobs (int, default: -1) – Number of parallel jobs. -1 uses all available cores.

  • copy (bool, default: False) – If True, return a modified copy of adata instead of updating adata in place.
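The C and H priors are passed as plain arrays. One way to build them is sketched below, assuming C is a binary gene-set membership matrix whose rows follow adata.var_names, and H is a one-hot encoding of a sample-level covariate whose columns follow the sample order gedi uses. Both helpers (build_pathway_prior, build_sample_covariates) are hypothetical and not part of gedi2py; the encoding and sample ordering gedi2py actually expects may differ, so check before relying on this.

```python
import numpy as np


def build_pathway_prior(gene_names, gene_sets):
    """Hypothetical helper: binary gene x pathway matrix (a candidate for C).

    Entry [g, p] is 1.0 when gene g belongs to gene set p, else 0.0.
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    pathways = list(gene_sets)
    C = np.zeros((len(gene_names), len(pathways)))
    for j, name in enumerate(pathways):
        for gene in gene_sets[name]:
            i = index.get(gene)
            if i is not None:  # genes absent from the data are skipped
                C[i, j] = 1.0
    return C, pathways


def build_sample_covariates(sample_labels, covariate_by_sample):
    """Hypothetical helper: one-hot covariate x sample matrix (a candidate for H).

    Rows are covariate levels (sorted); columns follow the order of sample_labels.
    """
    levels = sorted(set(covariate_by_sample[s] for s in sample_labels))
    H = np.zeros((len(levels), len(sample_labels)))
    for j, sample in enumerate(sample_labels):
        H[levels.index(covariate_by_sample[sample]), j] = 1.0
    return H, levels


# Usage with toy inputs (in practice, genes would be list(adata.var_names)):
genes = ["CD3E", "CD4", "MS4A1", "LYZ"]
C, pathways = build_pathway_prior(
    genes, {"T cell": ["CD3E", "CD4"], "B cell": ["MS4A1", "CD19"]}
)  # CD19 is not in `genes`, so it is ignored; C has shape (4, 2)
H, levels = build_sample_covariates(
    ["s1", "s2", "s3"], {"s1": "control", "s2": "treated", "s3": "treated"}
)  # H has shape (2, 3)
```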

Return type:

AnnData | None

Returns:

  • Returns None if copy=False, else returns an AnnData object.

  • Sets the following fields (keys shown for the default key_added='gedi'):
      - .obsm['X_gedi'] (numpy.ndarray) – Cell embeddings (n_cells × n_latent).
      - .varm['gedi_Z'] (numpy.ndarray) – Shared metagenes (n_genes × n_latent).
      - .uns['gedi'] (dict) – Model parameters and metadata.
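Because the output keys are derived from key_added, and the default copy=False returns None rather than the object, a small post-run check can help when comparing several runs. This is a minimal sketch that relies only on the documented slots and shapes; expected_gedi_fields and check_gedi_output are hypothetical helpers, not gedi2py functions.

```python
import numpy as np


def expected_gedi_fields(n_cells, n_genes, n_latent=10, key_added="gedi"):
    """Hypothetical helper: map each documented output slot to its expected shape."""
    return {
        ("obsm", f"X_{key_added}"): (n_cells, n_latent),  # cell embeddings
        ("varm", f"{key_added}_Z"): (n_genes, n_latent),  # shared metagenes
        ("uns", key_added): None,  # parameters and metadata (dict, no fixed shape)
    }


def check_gedi_output(adata, n_latent=10, key_added="gedi"):
    """Hypothetical check that an AnnData (or look-alike) carries the documented fields."""
    expected = expected_gedi_fields(adata.n_obs, adata.n_vars, n_latent, key_added)
    for (slot, key), shape in expected.items():
        store = getattr(adata, slot)
        if key not in store:
            raise KeyError(f"missing adata.{slot}[{key!r}]")
        if shape is not None and np.shape(store[key]) != shape:
            raise ValueError(
                f"adata.{slot}[{key!r}] has shape {np.shape(store[key])}, expected {shape}"
            )
    return True


# Usage (requires gedi2py; not executed here):
# out = gd.tl.gedi(adata, batch_key="sample", n_latent=10, copy=True)  # new AnnData
# check_gedi_output(out, n_latent=10)
# gd.tl.gedi(adata, batch_key="sample", key_added="gedi_run2")  # in place, returns None
# check_gedi_output(adata, n_latent=10, key_added="gedi_run2")
```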

Examples

Standard usage with log-transformed data:

>>> import gedi2py as gd
>>> import scanpy as sc
>>> adata = sc.read_h5ad("data.h5ad")
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> sc.pp.neighbors(adata, use_rep="X_gedi")
>>> sc.tl.umap(adata)
>>> gd.pl.embedding(adata, color="sample")

Paired data mode (e.g., CITE-seq with two count layers):

>>> # adata.layers['adt'] = ADT counts
>>> # adata.layers['rna'] = RNA counts (for same features)
>>> gd.tl.gedi(
...     adata,
...     batch_key="sample",
...     layer="adt",
...     layer2="rna",
...     n_latent=10
... )
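The log-ratio that paired mode models (see layer2) can be reproduced with NumPy, for example to inspect the values before fitting. This is a minimal sketch of the documented formula only: it assumes the natural logarithm and dense, non-negative count arrays of equal shape, and paired_log_ratio is a hypothetical helper, not part of gedi2py. How gedi2py orients or stores the matrices internally is not assumed here.

```python
import numpy as np


def paired_log_ratio(m1, m2):
    """Hypothetical helper: element-wise log((M1 + 1) / (M2 + 1)).

    Assumes dense, non-negative count arrays of equal shape and the natural log;
    for sparse AnnData layers, call .toarray() first.
    """
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    if m1.shape != m2.shape:
        raise ValueError(f"shape mismatch: {m1.shape} vs {m2.shape}")
    if (m1 < 0).any() or (m2 < 0).any():
        raise ValueError("counts must be non-negative")
    # log1p(a) - log1p(b) equals log((a + 1) / (b + 1)) without forming the ratio
    return np.log1p(m1) - np.log1p(m2)


# Toy counts standing in for adata.layers['adt'] and adata.layers['rna']
adt = np.array([[0, 3], [7, 1]])
rna = np.array([[0, 1], [1, 1]])
ratio = paired_log_ratio(adt, rna)  # entries: log(1), log(2), log(4), log(1)
```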