Quick Start
This guide shows the minimal steps to run GEDI on your single-cell data.
Minimal Example
import gedi2py as gd
import scanpy as sc
# 1. Load data (AnnData with sample/batch labels)
adata = sc.read_h5ad("your_data.h5ad")
# 2. Preprocess with scanpy
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
# 3. Run GEDI
gd.tl.gedi(adata, batch_key="sample", n_latent=10)
# 4. Visualize
gd.tl.umap(adata)
gd.pl.embedding(adata, color="sample")
What This Does
1. Load data: Reads an H5AD file containing your expression matrix and cell metadata
2. Preprocess: Standard single-cell preprocessing (filter, normalize, log-transform)
3. Run GEDI: Trains the GEDI model to learn:
   - Shared metagenes across samples
   - Sample-specific cell embeddings
   - Batch-corrected representations
4. Visualize: Computes UMAP and plots cells colored by sample
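To make the preprocessing step concrete, here is a minimal NumPy sketch of what the three scanpy calls do to a small dense count matrix. It is an illustration only, not scanpy's implementation: `preprocess_sketch` and `toy_counts` are hypothetical names, and the sketch assumes the default behavior of `sc.pp.normalize_total`, which scales each cell to the median total count when no `target_sum` is given.

```python
import numpy as np

def preprocess_sketch(counts, min_cells=3):
    """Filter genes, normalize per-cell totals, and log-transform (illustrative only)."""
    counts = np.asarray(counts, dtype=float)
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    keep_genes = (counts > 0).sum(axis=0) >= min_cells
    filtered = counts[:, keep_genes]
    # Scale every cell to the median total count of the non-empty cells.
    totals = filtered.sum(axis=1)
    target = np.median(totals[totals > 0])
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid division by zero
    normalized = filtered / safe_totals[:, None] * target
    # Natural log of (1 + x), as in sc.pp.log1p.
    return np.log1p(normalized), keep_genes

# Toy matrix: 4 cells (rows) x 3 genes (columns); the middle gene is seen in one cell only.
toy_counts = np.array([
    [1, 0, 3],
    [2, 0, 1],
    [4, 1, 0],
    [0, 0, 5],
])
log_norm, keep_genes = preprocess_sketch(toy_counts)
print(keep_genes)      # which genes survive the filter
print(log_norm.shape)  # (n_cells, n_kept_genes)
```

After this transformation every cell has the same total on the original scale, so differences in sequencing depth no longer dominate the values passed to the model.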
Understanding the Output
After running gd.tl.gedi(), your AnnData object contains:
# Cell embeddings (batch-corrected)
adata.obsm['X_gedi'] # shape: (n_cells, n_latent)
# Gene loadings
adata.varm['gedi_Z'] # shape: (n_genes, n_latent)
# Model parameters
adata.uns['gedi']['D'] # Scaling factors
adata.uns['gedi']['sigma2'] # Noise variance
adata.uns['gedi']['params'] # Full parameter dictionary
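A quick way to confirm that a run produced the entries listed above is to compare their shapes against `n_latent`. The helper below is a hypothetical sketch, not part of gedi2py. It only relies on the keys documented here and is exercised on a stand-in object rather than a real AnnData, with placeholder values for `D`, `sigma2`, and `params`.

```python
from types import SimpleNamespace

import numpy as np

def summarize_gedi_output(adata, n_latent):
    """Check the documented GEDI outputs and return their dimensions (hypothetical helper)."""
    embeddings = np.asarray(adata.obsm["X_gedi"])  # (n_cells, n_latent)
    loadings = np.asarray(adata.varm["gedi_Z"])    # (n_genes, n_latent)
    if embeddings.shape[1] != n_latent or loadings.shape[1] != n_latent:
        raise ValueError("latent dimension does not match n_latent")
    return {
        "n_cells": embeddings.shape[0],
        "n_genes": loadings.shape[0],
        "n_latent": n_latent,
        "uns_keys": sorted(adata.uns["gedi"]),
    }

# Stand-in with the same attribute layout as AnnData; all values are placeholders.
fake_adata = SimpleNamespace(
    obsm={"X_gedi": np.zeros((5, 10))},
    varm={"gedi_Z": np.zeros((7, 10))},
    uns={"gedi": {"D": np.ones(10), "sigma2": 1.0, "params": {}}},
)
summary = summarize_gedi_output(fake_adata, n_latent=10)
print(summary)
```

The same function should work on a real AnnData object after `gd.tl.gedi()`, assuming the keys are stored as described in this section.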
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| batch_key | required | Column in adata.obs |
| n_latent | 10 | Number of latent factors (metagenes) |
| layer | None | Layer to use instead of adata.X |
| layer2 | None | Second layer for paired data mode (M_paired) |
| | 100 | Maximum optimization iterations |
| | "Bsphere" | Constraint on B matrices ("Bsphere" or "Bl2") |
Paired Data Mode (M_paired)
For paired count data (e.g., CITE-seq with ADT/RNA ratios), use the layer2 parameter:
# Assuming adata has two count layers:
# - adata.layers['adt']: ADT protein counts
# - adata.layers['rna']: RNA counts (same features)
gd.tl.gedi(
adata,
batch_key="sample",
layer="adt", # Numerator counts (M1)
layer2="rna", # Denominator counts (M2)
n_latent=10
)
GEDI will model the log-ratio: Yi = log((M1+1)/(M2+1))
This is particularly useful for:
- CITE-seq: Modeling ADT protein abundance relative to background
- Dual-modality assays: Any paired count measurements
- Ratio-based analyses: When the ratio between two measurements is biologically meaningful
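The log-ratio transform can be checked by hand on a small example. The sketch below applies Yi = log((M1+1)/(M2+1)) element-wise with NumPy. `paired_log_ratio` and the toy matrices are hypothetical and only illustrate the formula; they say nothing about how GEDI computes it internally.

```python
import numpy as np

def paired_log_ratio(m1, m2):
    """Element-wise log((M1 + 1) / (M2 + 1)) for two count matrices of equal shape."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    if m1.shape != m2.shape:
        raise ValueError("paired layers must have the same shape")
    # The +1 pseudocount keeps the ratio finite when either count is zero.
    return np.log((m1 + 1.0) / (m2 + 1.0))

# Toy paired counts: 2 cells x 2 features.
adt = np.array([[9, 0], [3, 1]])  # numerator counts (M1)
rna = np.array([[4, 0], [3, 3]])  # denominator counts (M2)
y = paired_log_ratio(adt, rna)
print(y)
```

Equal counts give 0, a numerator larger than the denominator gives a positive value, and the reverse gives a negative value, so the sign of each entry shows which measurement dominates.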
Next Steps
- See Basic Workflow for a complete analysis pipeline
- Learn about Batch Correction for multi-sample integration
- Check the API Reference for all available functions