gedi2py.tools.gedi
- gedi2py.tools.gedi(adata, batch_key, *, n_latent=10, layer=None, layer2=None, max_iterations=100, track_interval=5, mode='Bsphere', ortho_Z=True, C=None, H=None, key_added='gedi', random_state=None, verbose=True, n_jobs=-1, copy=False)
Run GEDI batch correction and dimensionality reduction.
Gene Expression Decomposition for Integration (GEDI) learns shared metagenes and sample-specific factors for batch effect correction.
- Parameters:
  - adata (AnnData) – Annotated data matrix with cells as observations.
  - batch_key (str) – Key in adata.obs for batch/sample labels.
  - n_latent (int, default: 10) – Number of latent factors (K).
  - layer (str | None, default: None) – Layer to use instead of adata.X. If None, uses adata.X. For paired data (e.g., CITE-seq), this is the first count matrix.
  - layer2 (str | None, default: None) – Second layer for paired count data (M_paired mode). When specified along with layer, GEDI models the log-ratio Yi = log((M1+1)/(M2+1)), where M1 and M2 are the counts in layer and layer2, respectively. This is useful for CITE-seq ADT/RNA ratios or similar paired assays.
  - max_iterations (int, default: 100) – Maximum number of optimization iterations.
  - track_interval (int, default: 5) – Interval, in iterations, at which convergence metrics are tracked.
  - mode (Literal['Bl2', 'Bsphere'], default: 'Bsphere') – Normalization mode for the B matrices: 'Bsphere' (recommended) or 'Bl2'.
  - ortho_Z (bool, default: True) – Whether to orthogonalize the Z matrix.
  - C (numpy.ndarray | None, default: None) – Optional gene × pathway prior matrix for pathway analysis.
  - H (numpy.ndarray | None, default: None) – Optional covariate × sample prior matrix.
  - key_added (str, default: 'gedi') – Base key for storing results. Results are stored as:
    - adata.obsm[f'X_{key_added}']: cell embeddings
    - adata.varm[f'{key_added}_Z']: gene loadings
    - adata.uns[key_added]: parameters and metadata
  - random_state (int | None, default: None) – Random seed for reproducibility. If None, uses global settings.
  - verbose (bool, default: True) – Whether to print progress messages.
  - n_jobs (int, default: -1) – Number of parallel jobs. -1 uses all available cores.
  - copy (bool, default: False) – Whether to return a copy of adata.
- Return type:
  AnnData | None
- Returns:
  None if copy=False, else an AnnData. Sets the following fields (shown for the default key_added='gedi'):
  - .obsm['X_gedi'] (numpy.ndarray) – Cell embeddings (n_cells × n_latent).
  - .varm['gedi_Z'] (numpy.ndarray) – Shared metagenes (n_genes × n_latent).
  - .uns['gedi'] (dict) – Model parameters and metadata.
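Because every output location is derived from key_added, a downstream script can compute the keys rather than hard-coding 'gedi'. The sketch below assumes only the key patterns and shapes documented on this page. It builds the three keys for any key_added and checks the documented shapes (n_cells × n_latent for the embedding, n_genes × n_latent for the metagenes) against plain dicts standing in for adata.obsm, adata.varm and adata.uns. The helpers gedi_result_keys and check_gedi_outputs are hypothetical and are not part of gedi2py.

```python
import numpy as np


def gedi_result_keys(key_added="gedi"):
    """Hypothetical helper: documented storage keys for a given key_added."""
    return {
        "obsm": f"X_{key_added}",  # cell embeddings (n_cells x n_latent)
        "varm": f"{key_added}_Z",  # shared metagenes (n_genes x n_latent)
        "uns": key_added,          # model parameters and metadata (dict)
    }


def check_gedi_outputs(obsm, varm, uns, n_cells, n_genes, n_latent, key_added="gedi"):
    """Hypothetical check that stored results match the documented shapes.

    obsm, varm and uns are plain dicts standing in for adata.obsm,
    adata.varm and adata.uns. Raises ValueError on any mismatch.
    """
    keys = gedi_result_keys(key_added)
    emb = obsm[keys["obsm"]]
    z = varm[keys["varm"]]
    if emb.shape != (n_cells, n_latent):
        raise ValueError(f"embedding shape {emb.shape} != {(n_cells, n_latent)}")
    if z.shape != (n_genes, n_latent):
        raise ValueError(f"metagene shape {z.shape} != {(n_genes, n_latent)}")
    if not isinstance(uns[keys["uns"]], dict):
        raise ValueError("uns entry is not a dict")
    return True


# Placeholder arrays with the documented shapes; these are not real GEDI output.
n_cells, n_genes, n_latent = 50, 200, 10
obsm = {"X_gedi": np.zeros((n_cells, n_latent))}
varm = {"gedi_Z": np.zeros((n_genes, n_latent))}
uns = {"gedi": {}}
ok = check_gedi_outputs(obsm, varm, uns, n_cells, n_genes, n_latent)
```

Since AnnData's obsm, varm and uns containers support key lookup, the same check could in principle be pointed at a real object after running gedi. That use is untested here.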
Examples
Standard usage with log-transformed data:
>>> import gedi2py as gd
>>> import scanpy as sc
>>> adata = sc.read_h5ad("data.h5ad")
>>> gd.tl.gedi(adata, batch_key="sample", n_latent=10)
>>> sc.pp.neighbors(adata, use_rep="X_gedi")
>>> sc.tl.umap(adata)
>>> gd.pl.embedding(adata, color="sample")
Paired data mode (e.g., CITE-seq with two count layers):
>>> # adata.layers['adt'] = ADT counts
>>> # adata.layers['rna'] = RNA counts (for the same features)
>>> gd.tl.gedi(
...     adata,
...     batch_key="sample",
...     layer="adt",
...     layer2="rna",
...     n_latent=10,
... )
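In paired mode, the layer2 description says GEDI models Yi = log((M1+1)/(M2+1)), where M1 comes from layer and M2 from layer2. The sketch below computes that quantity element-wise with NumPy, which can help when sanity-checking paired inputs before a run. It rests on several assumptions. The natural logarithm is assumed because the base is not stated. Both matrices are assumed to be dense and the same shape (cells × features); sparse layers would need converting, for example with .toarray(). The function name paired_log_ratio is hypothetical. This is not gedi2py's internal code.

```python
import numpy as np


def paired_log_ratio(m1, m2):
    """Hypothetical helper: element-wise log((M1 + 1) / (M2 + 1)).

    Assumes m1 and m2 are dense count matrices of identical shape
    (cells x features) and that the natural log is intended.
    """
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    if m1.shape != m2.shape:
        raise ValueError(f"shape mismatch: {m1.shape} vs {m2.shape}")
    # log1p(a) - log1p(b) == log((a + 1) / (b + 1)); the +1 pseudocount
    # keeps cells with zero counts in either layer finite.
    return np.log1p(m1) - np.log1p(m2)


# Toy counts: 2 cells x 2 features (illustrative values only).
adt = np.array([[0, 3], [10, 1]])
rna = np.array([[0, 1], [4, 1]])
y = paired_log_ratio(adt, rna)
```

The +1 pseudocount means a feature with zero counts in both layers maps to a ratio of 0, not an undefined value.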