User Guide
This guide provides tutorials and examples for using gedi2py to integrate single-cell RNA-seq data.
Tutorials
Overview
gedi2py uses the GEDI (Gene Expression Decomposition for Integration) algorithm to learn a shared gene expression space across multiple samples while correcting for batch effects.
The GEDI Model
GEDI models gene expression as:
$$Y_i = ZDB_i + Q_i B_i + \mathbf{1}s_i^T + o_i\mathbf{1}^T + o\mathbf{1}^T + \epsilon$$
Where:
$Y_i$ is the log-transformed expression matrix for sample $i$
$Z$ is the shared metagene matrix (genes × latent factors)
$D$ is a diagonal scaling matrix
$B_i$ is the sample-specific cell factor matrix
$Q_i$ captures sample-specific deviations
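$s_i$ and $o_i$ are the sample-specific cell and gene offsets ($s_i$ has one entry per cell, $o_i$ one entry per gene)
$o$ is the global gene offset
$\epsilon$ is the residual noise term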
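To make the matrix shapes in the equation concrete, the following NumPy sketch assembles each term for one sample using random toy data. It is an illustration of the formula only, not gedi2py's implementation. The toy sizes are hypothetical, and the shapes of $B_i$ (latent factors × cells) and $Q_i$ (genes × latent factors) are assumptions chosen so that every product is conformable with $Z$ (genes × latent factors).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only for illustration
n_genes, n_cells, n_factors = 50, 20, 5

Z = rng.normal(size=(n_genes, n_factors))           # shared metagenes (genes x factors)
D = np.diag(rng.uniform(0.5, 2.0, size=n_factors))  # diagonal scaling matrix
B_i = rng.normal(size=(n_factors, n_cells))         # cell factors (assumed factors x cells)
Q_i = 0.1 * rng.normal(size=(n_genes, n_factors))   # deviations (assumed genes x factors)
s_i = rng.normal(size=n_cells)                      # one offset per cell
o_i = rng.normal(size=n_genes)                      # sample-specific offset per gene
o = rng.normal(size=n_genes)                        # global offset per gene
eps = 0.01 * rng.normal(size=(n_genes, n_cells))    # residual noise

ones_genes = np.ones(n_genes)
ones_cells = np.ones(n_cells)

# Y_i = Z D B_i + Q_i B_i + 1 s_i^T + o_i 1^T + o 1^T + eps
Y_i = (
    Z @ D @ B_i
    + Q_i @ B_i
    + np.outer(ones_genes, s_i)   # same cell offset added to every gene (row)
    + np.outer(o_i, ones_cells)   # same sample-specific gene offset added to every cell
    + np.outer(o, ones_cells)     # same global gene offset added to every cell
    + eps
)

print(Y_i.shape)  # genes x cells
```

Each outer product broadcasts a vector across the other dimension, which is why $s_i$ shifts whole columns (cells) while $o_i$ and $o$ shift whole rows (genes).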
Workflow
A typical gedi2py workflow consists of:
1. Load data: read H5AD files or other formats
2. Preprocess: filter, normalize, and log-transform (using scanpy)
3. Run GEDI: train the model to learn latent factors
4. Analyze: compute projections, embeddings, and differential expression
5. Visualize: plot results using gedi2py or scanpy
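As a rough illustration of what the preprocessing step does numerically, here is a minimal NumPy sketch of filtering, library-size normalization, and log-transformation on a toy count matrix. The function name `preprocess_counts` and its parameters are hypothetical and not part of gedi2py; in practice scanpy functions such as `sc.pp.normalize_total` and `sc.pp.log1p` perform the normalization and log steps on an AnnData object.

```python
import numpy as np

def preprocess_counts(counts, min_counts=1, target_sum=1e4):
    """Filter, normalize, and log-transform a cells x genes count matrix.

    Hypothetical helper for illustration only.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)

    # Filter: drop cells whose total count is below the threshold
    keep = totals >= min_counts
    counts, totals = counts[keep], totals[keep]

    # Normalize: rescale every remaining cell to the same total count
    normalized = counts / totals[:, None] * target_sum

    # Log-transform: log(1 + x) keeps zeros at zero
    return np.log1p(normalized), keep

# Toy data: 3 cells x 3 genes; the second cell is empty and gets filtered out
toy_counts = np.array([[10, 0, 5],
                       [0, 0, 0],
                       [2, 2, 4]])
log_expr, kept_cells = preprocess_counts(toy_counts)
print(log_expr.shape, kept_cells)
```

Filtering before normalizing avoids dividing by a zero total for empty cells.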
API Convention
gedi2py follows the scanpy API convention:
import gedi2py as gd
# Tools module (gd.tl)
gd.tl.gedi(adata, ...) # Run GEDI
gd.tl.umap(adata, ...) # Compute UMAP
# Plotting module (gd.pl)
gd.pl.embedding(adata, ...) # Plot embeddings
gd.pl.convergence(adata, ...) # Plot convergence
# I/O module (gd.io)
gd.read_h5ad(...) # Read data
gd.write_h5ad(...) # Write data
Results are stored in the AnnData object:
adata.obsm['X_gedi'] - Cell embeddings (DB projection)
adata.varm['gedi_Z'] - Gene loadings (Z matrix)
adata.uns['gedi'] - Model parameters and metadata
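The sketch below shows one way to inspect these three entries after a run. The helper `summarize_gedi_results` is hypothetical and not part of gedi2py. It assumes that `X_gedi` is shaped cells × latent factors, that `gedi_Z` is shaped genes × latent factors, and that `adata.uns['gedi']` behaves like a dictionary. A simple stand-in object is used here so the example runs without a trained model; a real AnnData object exposes the same `obsm`, `varm`, and `uns` attributes.

```python
from types import SimpleNamespace

import numpy as np

def summarize_gedi_results(adata):
    """Report the shapes of stored GEDI results (hypothetical helper)."""
    embedding = np.asarray(adata.obsm["X_gedi"])  # assumed cells x factors
    loadings = np.asarray(adata.varm["gedi_Z"])   # assumed genes x factors
    return {
        "n_cells": embedding.shape[0],
        "n_genes": loadings.shape[0],
        "n_factors": embedding.shape[1],
        "uns_keys": sorted(adata.uns["gedi"]),    # assumed dict-like
    }

# Stand-in for an AnnData object; the key inside uns['gedi'] is a placeholder
mock_adata = SimpleNamespace(
    obsm={"X_gedi": np.zeros((20, 5))},
    varm={"gedi_Z": np.zeros((50, 5))},
    uns={"gedi": {"placeholder_key": None}},
)

summary = summarize_gedi_results(mock_adata)
print(summary)
```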
Next Steps
Start with the Quick Start for a minimal example
See Basic Workflow for a complete analysis
Learn about Batch Correction for multi-sample integration