gedi2py

Gene Expression Decomposition for Integration

A scverse-compliant Python package for single-cell RNA-seq batch correction and dimensionality reduction.

gedi2py implements the GEDI algorithm for integrating single-cell RNA sequencing data across multiple samples and batches. It uses a latent variable model with block coordinate descent optimization to learn shared gene expression patterns while correcting for batch effects.
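To make "block coordinate descent" concrete, here is a minimal, generic sketch on a toy rank-1 matrix factorization. It is not the GEDI model and not gedi2py's implementation. The function names `rank1_bcd` and `squared_error` are hypothetical and exist only for this illustration. The idea it shows is the one named above: split the parameters into blocks, then repeatedly optimize one block while holding the others fixed.

```python
# Generic illustration of block coordinate descent (alternating least squares).
# This is NOT the GEDI model or gedi2py's implementation; all names are hypothetical.

def rank1_bcd(Y, n_iter=50):
    """Fit Y ~ u v^T by solving for u with v fixed, then for v with u fixed."""
    n_rows, n_cols = len(Y), len(Y[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        # Block 1: with v fixed, each u[i] has a closed-form least-squares update.
        vv = sum(x * x for x in v)
        u = [sum(Y[i][j] * v[j] for j in range(n_cols)) / vv for i in range(n_rows)]
        # Block 2: with u fixed, update every v[j] the same way.
        uu = sum(x * x for x in u)
        v = [sum(Y[i][j] * u[i] for i in range(n_rows)) / uu for j in range(n_cols)]
    return u, v


def squared_error(Y, u, v):
    """Sum of squared residuals between Y and the rank-1 reconstruction u v^T."""
    return sum((Y[i][j] - u[i] * v[j]) ** 2
               for i in range(len(Y)) for j in range(len(Y[0])))


# A toy matrix that is exactly rank 1, so the fit should be essentially exact.
Y = [[2.0, 4.0, 6.0],
     [1.0, 2.0, 3.0],
     [3.0, 6.0, 9.0]]
u, v = rank1_bcd(Y)
print(squared_error(Y, u, v))
```

Each block update here is a simple least-squares solve, which is what makes this style of optimization attractive: no step size needs tuning, and the objective never increases from one update to the next.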

Quick Install

pip install gedi2py
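After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The sketch below relies only on the standard library. The helper name `is_installed` is hypothetical and is not part of gedi2py.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Prints True only if gedi2py is installed in the current environment.
print("gedi2py installed:", is_installed("gedi2py"))
```

If this prints `False` right after a successful `pip install`, the most likely cause is that `pip` and `python` point at different environments.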

Quick Start

import gedi2py as gd
import scanpy as sc

# Load your data
adata = sc.read_h5ad("data.h5ad")

# Run GEDI batch correction
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# Compute UMAP embedding
gd.tl.umap(adata)

# Visualize
gd.pl.embedding(adata, color=["sample", "cell_type"])

Key Features

Memory-efficient: C++ backend keeps large matrices in native memory
Fast: OpenMP parallelization for multi-threaded optimization
scverse-compliant: Works seamlessly with AnnData, scanpy, and the scverse ecosystem
Flexible: Supports multiple input types (counts, log-transformed, binary)
Comprehensive: Includes projections, embeddings, imputation, and differential analysis
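As a small illustration of the three input types mentioned above, the sketch below derives log-transformed and binary representations from a toy count matrix. It only shows what the data look like. It does not use gedi2py, and the way gedi2py selects an input type is not shown here. The `log(1 + x)` transform is an assumption: it is a common convention for count data, not necessarily the one gedi2py expects.

```python
import math

# Toy raw counts: 2 cells x 3 genes (values invented for illustration).
counts = [[0, 3, 1],
          [5, 0, 0]]

# Log-transformed: log(1 + x), an assumed but common choice for count data.
log_transformed = [[math.log1p(x) for x in row] for row in counts]

# Binary: 1 where any expression was detected, 0 otherwise.
binary = [[1 if x > 0 else 0 for x in row] for row in counts]

print(log_transformed)
print(binary)
```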

Documentation

Getting Started

API Reference

Modules

gedi2py follows the scanpy convention with submodules for different functionality:

Module	Description
`gd.tl`	Tools for model training, projections, embeddings, imputation, and analysis
`gd.pl`	Plotting functions for embeddings, convergence, and feature visualization
`gd.io`	Input/output for H5AD, 10X formats, and model persistence

Citation

If you use gedi2py in your research, please cite:

Mikaeili Namini, A., & Najafabadi, H.S. (2024). GEDI: Gene Expression Decomposition for Integration of single-cell RNA-seq data.

License

gedi2py is released under the MIT License.