Core
The core module contains the main GEDIModel class and global settings.
GEDIModel
- class gedi2py.GEDIModel(adata, batch_key, *, n_latent=10, layer=None, layer2=None, mode='Bsphere', ortho_Z=True, C=None, H=None, random_state=None, verbose=None, n_jobs=None)
Bases: object
GEDI model for single-cell RNA-seq data integration.
Gene Expression Decomposition for Integration (GEDI) learns shared metagenes and sample-specific factors for batch effect correction.
- Parameters:
  adata (AnnData) – Annotated data matrix with cells as observations (n_cells x n_genes).
  batch_key (str) – Key in adata.obs containing batch/sample labels.
  n_latent (int, default: 10) – Number of latent factors (K).
  layer (str | None, default: None) – Layer to use instead of adata.X. If None, uses adata.X. For paired data (e.g., CITE-seq), this is the first count matrix.
  layer2 (str | None, default: None) – Second layer for paired count data (M_paired mode). When specified along with layer, GEDI models the log-ratio Yi = log((M1+1)/(M2+1)). This is useful for CITE-seq ADT/RNA ratios or similar paired assays.
  mode (Literal['Bl2', 'Bsphere'], default: 'Bsphere') – Normalization mode for B matrices: “Bsphere” (recommended) or “Bl2”.
  ortho_Z (bool, default: True) – Whether to orthogonalize the Z matrix.
  C (ndarray | None, default: None) – Gene × pathway prior matrix for pathway analysis. Optional.
  H (ndarray | None, default: None) – Covariate × sample prior matrix. Optional.
  random_state (int | None, default: None) – Random seed for reproducibility.
  verbose (int | None, default: None) – Verbosity level (0-3). If None, uses global settings.
  n_jobs (int | None, default: None) – Number of parallel jobs. -1 uses all available cores.
- is_trained
Whether the model has been trained.
- n_iter
Number of iterations completed.
Examples
Standard usage:
>>> import gedi2py as gd
>>> import scanpy as sc
>>> adata = sc.read_h5ad("data.h5ad")
>>> model = gd.GEDIModel(adata, batch_key="sample", n_latent=10)
>>> model.train(max_iterations=100)
>>> Z = model.get_Z()
>>> embeddings = model.get_latent_representation()
Paired data mode (e.g., CITE-seq):
>>> model = gd.GEDIModel(
...     adata, batch_key="sample", n_latent=10,
...     layer="adt", layer2="rna"
... )
>>> model.train(max_iterations=100)
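The log-ratio used in paired mode, Yi = log((M1+1)/(M2+1)), can be reproduced by hand to see what values the model would be fitting. The following is a minimal NumPy sketch, assuming M1 and M2 are dense count matrices of identical shape. The function name `paired_log_ratio` is hypothetical and not part of gedi2py, and the library's internal handling (sparse input, matrix orientation) may differ.

```python
import numpy as np

def paired_log_ratio(m1, m2):
    """Compute Y = log((M1 + 1) / (M2 + 1)) element-wise."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    if m1.shape != m2.shape:
        raise ValueError("paired matrices must have the same shape")
    # log1p(x) = log(1 + x); subtracting logs avoids forming the ratio explicitly
    return np.log1p(m1) - np.log1p(m2)

# Toy counts standing in for the two layers (illustrative values only)
m1 = np.array([[0, 3], [9, 1]])
m2 = np.array([[0, 1], [4, 1]])
Y = paired_log_ratio(m1, m2)
```

Entries where both layers hold the same count map to 0, and the pseudocount of 1 keeps the ratio finite when either count is zero.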
The GEDIModel class provides fine-grained control over the GEDI algorithm.
Basic Usage
import gedi2py as gd
# Create model
model = gd.GEDIModel(
    adata,
    batch_key="sample",
    n_latent=10,
)
# Train
model.train(max_iterations=100)
# Get results
Z = model.get_Z()
embeddings = model.get_latent_representation()
Step-by-Step Training
For more control, initialize and optimize separately:
model = gd.GEDIModel(adata, batch_key="sample", n_latent=10)
# Initialize parameters
model.initialize()
# Run optimization in batches
for i in range(10):
    model.optimize(iterations=10)
    print(f"sigma2: {model.get_sigma2()}")
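Running the optimizer in chunks also makes it possible to stop early once the noise variance levels off. The sketch below is one possible stopping rule, not something gedi2py provides: `optimize_until_stable` and its `chunk`, `max_chunks`, and `rel_tol` parameters are hypothetical, and it assumes `get_sigma2()` returns a scalar. It only relies on `optimize(iterations=...)` and `get_sigma2()` as used in the loop above.

```python
def optimize_until_stable(model, chunk=10, max_chunks=20, rel_tol=1e-4):
    """Call model.optimize in chunks until sigma2 stops changing.

    Returns the sigma2 value recorded after each chunk.
    """
    history = []
    for _ in range(max_chunks):
        model.optimize(iterations=chunk)
        sigma2 = float(model.get_sigma2())  # assumed to be a scalar
        converged = False
        if history:
            prev = history[-1]
            # Relative change between consecutive chunks
            converged = abs(prev - sigma2) <= rel_tol * max(abs(prev), 1e-12)
        history.append(sigma2)
        if converged:
            break
    return history
```

With a real model, one would call `model.initialize()` first and then `optimize_until_stable(model)`. The tolerance is a judgment call and probably needs tuning per dataset.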
Parameters
| Parameter | Type | Description |
|---|---|---|
| adata | AnnData | Annotated data matrix with cells as observations |
| batch_key | str | Column in adata.obs containing sample/batch labels |
| n_latent | int | Number of latent factors (default: 10) |
| layer | str or None | Layer to use (default: None uses adata.X) |
| mode | str | Constraint mode: “Bsphere” or “Bl2” (default: “Bsphere”) |
| ortho_Z | bool | Orthogonalize Z matrix (default: True) |
| C | NDArray or None | Gene-pathway prior matrix (default: None) |
| H | NDArray or None | Covariate-sample prior matrix (default: None) |
| random_state | int or None | Random seed for reproducibility |
| verbose | int or None | Verbosity level (0-3) |
| n_jobs | int or None | Number of threads (-1 for all) |
Attributes
| Attribute | Description |
|---|---|
| is_trained | Whether the model has been trained |
| n_iter | Number of optimization iterations completed |
Methods
| Method | Description |
|---|---|
| initialize() | Initialize model parameters using randomized SVD |
| optimize(iterations, track_interval) | Run optimization iterations |
| train(max_iterations, track_interval) | Full training (initialize + optimize) |
| get_Z() | Get shared metagenes (n_genes × n_latent) |
| get_D() | Get scaling factors (n_latent,) |
| get_sigma2() | Get noise variance |
| get_Bi() | Get sample-specific cell factors |
| get_latent_representation() | Get DB projection (n_cells × n_latent) |
| get_tracking() | Get convergence tracking data |
- __init__(adata, batch_key, *, n_latent=10, layer=None, layer2=None, mode='Bsphere', ortho_Z=True, C=None, H=None, random_state=None, verbose=None, n_jobs=None)
- initialize()
Initialize model parameters using randomized SVD.
This is called automatically by train(), but can be called separately for more control.
- Return type:
- get_Z()
Get shared metagenes matrix.
- Returns:
Shared metagenes of shape (n_genes, n_latent).
- Return type:
np.ndarray
- get_D()
Get scaling factors.
- Returns:
Scaling factors of shape (n_latent,).
- Return type:
np.ndarray
- get_Bi()
Get sample-specific cell factor matrices.
- Returns:
List of Bi matrices, each of shape (n_latent, n_cells_in_sample).
- Return type:
list of np.ndarray
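The documented shapes suggest how the per-sample factors relate to a cell-level embedding: each Bi is (n_latent, n_cells_in_sample), D is (n_latent,), and the "DB projection" is (n_cells × n_latent). The sketch below stacks the Bi matrices and scales each latent factor by D. Treat it as an illustration of the shapes only: `stack_scaled_factors` is a hypothetical helper, and the source does not state whether `get_latent_representation()` computes exactly this or in what order it returns cells.

```python
import numpy as np

def stack_scaled_factors(D, Bi_list):
    """Stack per-sample Bi matrices and scale each latent factor by D.

    D: (n_latent,); Bi_list: list of (n_latent, n_cells_in_sample).
    Returns an array of shape (n_cells_total, n_latent).
    """
    D = np.asarray(D, dtype=float)
    B = np.concatenate([np.asarray(b, dtype=float) for b in Bi_list], axis=1)
    if B.shape[0] != D.shape[0]:
        raise ValueError("D and Bi disagree on n_latent")
    # Equivalent to diag(D) @ B without building the diagonal matrix;
    # the transpose puts cells in rows
    return (D[:, None] * B).T

# Random stand-ins for model.get_D() and model.get_Bi() (two samples, K = 3)
rng = np.random.default_rng(0)
D = np.array([2.0, 1.0, 0.5])
Bi_list = [rng.normal(size=(3, 4)), rng.normal(size=(3, 6))]
emb = stack_scaled_factors(D, Bi_list)
```

Rows of `emb` follow the sample order of `Bi_list`, which need not match the row order of `adata`. Any comparison against `get_latent_representation()` would have to account for that.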
Settings
- gedi2py.settings
Configuration for gedi2py.
- gedi2py.verbosity
Verbosity level: 0 (silent), 1 (normal), 2 (verbose), 3 (debug).
- gedi2py.n_jobs
Number of parallel jobs. -1 means all available cores.
- gedi2py.random_state
Default random state for reproducibility.
Global configuration settings for gedi2py.
import gedi2py as gd
# Set verbosity (0=silent, 1=progress, 2=detailed, 3=debug)
gd.settings.verbosity = 1
# Set number of threads (-1 for all available)
gd.settings.n_jobs = 4
# Set random seed for reproducibility
gd.settings.random_state = 42
Available Settings
| Setting | Default | Description |
|---|---|---|
| verbosity | 1 | Verbosity level (0-3) |
| n_jobs | -1 | Number of threads for parallel operations |
| random_state | 0 | Default random seed |
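Because these settings are global, changing them for one experiment affects every later call. One way to limit that is a small context manager that restores the previous values on exit. This is a sketch, not a gedi2py feature: `temporary_settings` is a hypothetical helper, and it assumes the settings object supports plain attribute reads and writes, as the assignments above suggest. A stand-in object with the documented defaults is used so the example runs without the library.

```python
from contextlib import contextmanager
from types import SimpleNamespace

@contextmanager
def temporary_settings(settings, **overrides):
    """Temporarily set attributes on a settings object, then restore them."""
    saved = {name: getattr(settings, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(settings, name, value)
        yield settings
    finally:
        # Restore the original values even if the body raised
        for name, value in saved.items():
            setattr(settings, name, value)

# Stand-in with the documented defaults; with the library installed one would
# presumably pass gd.settings instead (untested assumption).
demo = SimpleNamespace(verbosity=1, n_jobs=-1, random_state=0)
with temporary_settings(demo, verbosity=0, n_jobs=2):
    inside = (demo.verbosity, demo.n_jobs)
after = (demo.verbosity, demo.n_jobs)
```

Passing `verbose`, `n_jobs`, or `random_state` directly to `GEDIModel` is the documented per-model alternative and avoids touching global state at all.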