Core

The core module contains the main GEDIModel class and global settings.

GEDIModel

class gedi2py.GEDIModel(adata, batch_key, *, n_latent=10, layer=None, layer2=None, mode='Bsphere', ortho_Z=True, C=None, H=None, random_state=None, verbose=None, n_jobs=None)

Bases: object

GEDI model for single-cell RNA-seq data integration.

Gene Expression Decomposition for Integration (GEDI) learns shared metagenes and sample-specific factors for batch effect correction.

Parameters:
  • adata (AnnData) – Annotated data matrix with cells as observations (n_cells x n_genes).

  • batch_key (str) – Key in adata.obs containing batch/sample labels.

  • n_latent (int, default: 10) – Number of latent factors (K).

  • layer (str | None, default: None) – Layer to use instead of adata.X. If None, uses adata.X. For paired data (e.g., CITE-seq), this is the first count matrix.

  • layer2 (str | None, default: None) – Second layer for paired count data (M_paired mode). When specified along with layer, GEDI models the log-ratio: Yi = log((M1+1)/(M2+1)). This is useful for CITE-seq ADT/RNA ratios or similar paired assays.

  • mode (Literal['Bl2', 'Bsphere'], default: 'Bsphere') – Normalization mode for B matrices: “Bsphere” (recommended) or “Bl2”.

  • ortho_Z (bool, default: True) – Whether to orthogonalize the Z matrix.

  • C (NDArray | None, default: None) – Gene × pathway prior matrix for pathway analysis. Optional.

  • H (NDArray | None, default: None) – Covariate × sample prior matrix. Optional.

  • random_state (int | None, default: None) – Random seed for reproducibility.

  • verbose (int | None, default: None) – Verbosity level (0-3). If None, uses global settings.

  • n_jobs (int | None, default: None) – Number of parallel jobs. -1 uses all available cores.
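As an illustration of the C argument, the sketch below builds a gene × pathway matrix from named gene sets. It assumes a binary membership encoding (1 where a gene belongs to a pathway) and rows ordered like adata.var_names; the documentation above does not state which encoding GEDI expects. The helper build_pathway_prior and the toy gene sets are hypothetical and not part of gedi2py.

```python
import numpy as np

def build_pathway_prior(gene_names, pathways):
    """Return a (n_genes, n_pathways) 0/1 matrix. Hypothetical helper, not part of gedi2py."""
    gene_index = {gene: i for i, gene in enumerate(gene_names)}
    C = np.zeros((len(gene_names), len(pathways)), dtype=float)
    for j, members in enumerate(pathways.values()):
        for gene in members:
            i = gene_index.get(gene)
            if i is not None:  # genes absent from the dataset are skipped
                C[i, j] = 1.0
    return C

# Toy inputs; in practice gene_names would come from adata.var_names.
genes = ["CD3E", "CD19", "MS4A1", "GNLY"]
gene_sets = {"t_cell": ["CD3E"], "b_cell": ["CD19", "MS4A1", "CD79A"]}
C = build_pathway_prior(genes, gene_sets)
```

An array built this way has the gene × pathway orientation described for C and could be passed as C=C when constructing the model.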

is_trained

Whether the model has been trained.

n_iter

Number of iterations completed.

Examples

Standard usage:

>>> import gedi2py as gd
>>> import scanpy as sc
>>> adata = sc.read_h5ad("data.h5ad")
>>> model = gd.GEDIModel(adata, batch_key="sample", n_latent=10)
>>> model.train(max_iterations=100)
>>> Z = model.get_Z()
>>> embeddings = model.get_latent_representation()

Paired data mode (e.g., CITE-seq):

>>> model = gd.GEDIModel(
...     adata, batch_key="sample", n_latent=10,
...     layer="adt", layer2="rna"
... )
>>> model.train(max_iterations=100)
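
To make the paired-mode transform concrete, here is a minimal NumPy sketch of the log-ratio Yi = log((M1+1)/(M2+1)) described for layer2. It assumes a natural logarithm and dense arrays; the function name paired_log_ratio is illustrative and is not a gedi2py function.

```python
import numpy as np

def paired_log_ratio(m1, m2):
    """Elementwise log((M1 + 1) / (M2 + 1)) for two count matrices of equal shape."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    if m1.shape != m2.shape:
        raise ValueError("paired count matrices must have the same shape")
    # The +1 pseudocount keeps the ratio finite when a count is zero.
    return np.log((m1 + 1.0) / (m2 + 1.0))

Y = paired_log_ratio([[0, 1], [9, 0]], [[0, 3], [4, 0]])
```

Entries where both counts are equal (including both zero) map to 0, and the sign of each entry shows which of the two layers has the larger count.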

The GEDIModel class provides fine-grained control over the GEDI algorithm.

Basic Usage

import gedi2py as gd

# Create model
model = gd.GEDIModel(
    adata,
    batch_key="sample",
    n_latent=10,
)

# Train
model.train(max_iterations=100)

# Get results
Z = model.get_Z()
embeddings = model.get_latent_representation()

Step-by-Step Training

For more control, initialize and optimize separately:

model = gd.GEDIModel(adata, batch_key="sample", n_latent=10)

# Initialize parameters
model.initialize()

# Run optimization in batches
for i in range(10):
    model.optimize(iterations=10)
    print(f"sigma2: {model.get_sigma2()}")
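
The chunked loop above can also be turned into a simple stopping rule. The sketch below stops once the relative change in sigma2 between chunks falls below a tolerance. It uses only initialize(), optimize(iterations=...) and get_sigma2() as documented here; the helper name optimize_until_stable and the tolerance are assumptions, not part of gedi2py.

```python
def optimize_until_stable(model, chunk=10, max_chunks=10, rel_tol=1e-4):
    """Run model.optimize in chunks until sigma2 stabilizes. Hypothetical helper.

    Returns the total number of optimization iterations requested.
    """
    model.initialize()
    previous = None
    total = 0
    for _ in range(max_chunks):
        model.optimize(iterations=chunk)
        total += chunk
        current = float(model.get_sigma2())
        # Stop when sigma2 changed by less than rel_tol relative to the last chunk.
        if previous is not None and abs(previous - current) <= rel_tol * abs(previous):
            break
        previous = current
    return total
```

Whether sigma2 alone is an adequate convergence signal depends on the data; the output of get_tracking() may offer more to inspect.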

Parameters

Parameter     Type            Description
------------  --------------  ----------------------------------------------------------
adata         AnnData         Annotated data matrix with cells as observations
batch_key     str             Column in adata.obs containing sample/batch labels
n_latent      int             Number of latent factors (default: 10)
layer         str | None      Layer to use (default: None uses adata.X)
layer2        str | None      Second layer for paired count data (default: None)
mode          str             Constraint mode: “Bsphere” or “Bl2” (default: “Bsphere”)
ortho_Z       bool            Orthogonalize Z matrix (default: True)
C             NDArray | None  Gene-pathway prior matrix (default: None)
H             NDArray | None  Covariate-sample prior matrix (default: None)
random_state  int | None      Random seed for reproducibility
verbose       int | None      Verbosity level (0-3)
n_jobs        int | None      Number of threads (-1 for all)

Attributes

Attribute   Description
----------  -------------------------------------------
is_trained  Whether the model has been trained
n_iter      Number of optimization iterations completed

Methods

Method                                 Description
-------------------------------------  ------------------------------------------------
initialize()                           Initialize model parameters using randomized SVD
optimize(iterations, track_interval)   Run optimization iterations
train(max_iterations, track_interval)  Full training (initialize + optimize)
get_Z()                                Get shared metagenes (n_genes × n_latent)
get_D()                                Get scaling factors (n_latent,)
get_sigma2()                           Get noise variance
get_Bi()                               Get sample-specific cell factors
get_latent_representation()            Get DB projection (n_cells × n_latent)
get_tracking()                         Get convergence tracking data

__init__(adata, batch_key, *, n_latent=10, layer=None, layer2=None, mode='Bsphere', ortho_Z=True, C=None, H=None, random_state=None, verbose=None, n_jobs=None)

initialize()

Initialize model parameters using randomized SVD.

This is called automatically by train(), but can be called separately for more control.

Return type:

None

optimize(iterations=100, track_interval=5)

Run optimization iterations.

Parameters:
  • iterations (int, default: 100) – Number of optimization iterations.

  • track_interval (int, default: 5) – Interval for tracking convergence metrics.

Return type:

None

train(max_iterations=100, track_interval=5)

Train the GEDI model (initialize + optimize).

Parameters:
  • max_iterations (int, default: 100) – Maximum number of optimization iterations.

  • track_interval (int, default: 5) – Interval for tracking convergence metrics.

Return type:

None

property is_trained: bool

Whether the model has been trained.

get_Z()

Get shared metagenes matrix.

Returns:

Shared metagenes of shape (n_genes, n_latent).

Return type:

np.ndarray

get_D()

Get scaling factors.

Returns:

Scaling factors of shape (n_latent,).

Return type:

np.ndarray

get_sigma2()

Get estimated noise variance.

Returns:

Noise variance (sigma^2).

Return type:

float

get_Bi()

Get sample-specific cell factor matrices.

Returns:

List of Bi matrices, each of shape (n_latent, n_cells_in_sample).

Return type:

list of np.ndarray

get_latent_representation()

Get cell embeddings in latent space (DB projection).

Returns:

Cell embeddings of shape (n_cells, n_latent).

Return type:

np.ndarray
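
To show how the documented shapes fit together, the sketch below derives an (n_cells, n_latent) array from D of shape (n_latent,) and a list of Bi matrices of shape (n_latent, n_cells_in_sample). It assumes that "DB projection" means scaling each row of the column-stacked Bi matrices by the matching entry of D, and that cells are ordered sample by sample. Both are assumptions; the library's actual computation and cell ordering may differ. The function db_projection is illustrative only.

```python
import numpy as np

def db_projection(D, Bi_list):
    """Scale stacked per-sample factors by D. An assumed reading of 'DB projection'."""
    D = np.asarray(D, dtype=float)
    B = np.hstack(Bi_list)       # (n_latent, n_cells), samples side by side
    return (D[:, None] * B).T    # (n_cells, n_latent)

# Toy inputs: 2 latent factors, two samples with 3 and 2 cells.
D = np.array([2.0, 1.0])
Bi_list = [np.ones((2, 3)), np.zeros((2, 2))]
emb = db_projection(D, Bi_list)
```

In practice, get_latent_representation() should be preferred, since it returns the embeddings directly.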

get_tracking()

Get tracking data from optimization.

Returns:

Dictionary with tracking data (sigma2, etc.).

Return type:

dict

Settings

gedi2py.settings

Configuration for gedi2py.

gedi2py.settings.verbosity

Verbosity level: 0 (silent), 1 (normal), 2 (verbose), 3 (debug).

gedi2py.settings.n_jobs

Number of parallel jobs. -1 means all available cores.

gedi2py.settings.random_state

Default random state for reproducibility.

Global settings are changed by assigning to attributes of gd.settings:

import gedi2py as gd

# Set verbosity (0=silent, 1=progress, 2=detailed, 3=debug)
gd.settings.verbosity = 1

# Set number of threads (-1 for all available)
gd.settings.n_jobs = 4

# Set random seed for reproducibility
gd.settings.random_state = 42

Available Settings

Setting       Default  Description
------------  -------  -----------------------------------------
verbosity     1        Verbosity level (0-3)
n_jobs        -1       Number of threads for parallel operations
random_state  0        Default random seed
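
The verbose parameter of GEDIModel is documented as falling back to the global settings when it is None. The sketch below illustrates that pattern with a stand-in Settings dataclass whose defaults mirror the table above. Extending the same fallback to n_jobs, and resolving -1 via os.cpu_count(), are assumptions made for illustration; none of these names are gedi2py internals.

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Stand-in for a global settings object (hypothetical, not gedi2py's class)."""
    verbosity: int = 1
    n_jobs: int = -1
    random_state: int = 0

settings = Settings()

def resolve(value, name):
    """Use the explicit argument if given, otherwise the global setting."""
    return getattr(settings, name) if value is None else value

def effective_n_jobs(n_jobs=None):
    """Translate -1 into the number of available cores (at least 1)."""
    n_jobs = resolve(n_jobs, "n_jobs")
    return (os.cpu_count() or 1) if n_jobs == -1 else n_jobs
```

With this pattern, an explicit argument such as verbose=3 always wins, and changing a global value affects only calls that leave the argument as None.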