I/O (gd.io)

The I/O module provides functions for reading and writing data files and persisting GEDI models.

Reading Data

  • read_h5ad – Read H5AD file.

  • read_10x_h5 – Read 10X Genomics H5 file.

  • read_10x_mtx – Read 10X Genomics MTX directory.

gedi2py.io.read_h5ad(filename, *, backed=None)

Read H5AD file.

Wrapper around anndata.read_h5ad() with a consistent interface.

Parameters:
  • filename (str | Path) – Path to the H5AD file.

  • backed (str | None, default: None) – If 'r', open in read-only backed mode. If 'r+', open in read-write backed mode. If None, load into memory.

Returns:

Annotated data matrix.

Return type:

AnnData

Examples

>>> import gedi2py as gd
>>> adata = gd.read_h5ad("data.h5ad")

Example

import gedi2py as gd

# Read H5AD file
adata = gd.read_h5ad("data.h5ad")

# Read with backed mode (for large files)
adata = gd.read_h5ad("large_data.h5ad", backed="r")

gedi2py.io.read_10x_h5(filename, *, genome=None, gex_only=True)

Read 10X Genomics H5 file.

Reads gene expression data from 10X Genomics HDF5 format files, including those from Cell Ranger.

Parameters:
  • filename (str | Path) – Path to the 10X H5 file.

  • genome (str | None, default: None) – Genome name to read (for multi-genome references). If None, reads the first available genome.

  • gex_only (bool, default: True) – If True, read only gene expression features, excluding antibody capture, CRISPR, and other feature types in multimodal data.

Returns:

Annotated data matrix with:

  • X: Sparse count matrix (cells × genes)

  • obs: Cell barcodes

  • var: Gene information (id, name, feature_type)

Return type:

AnnData

Notes

Compatible with:
  • Cell Ranger v2 (matrix.h5)

  • Cell Ranger v3+ (filtered_feature_bc_matrix.h5)

  • Multi-modal outputs

Examples

>>> import gedi2py as gd
>>> adata = gd.read_10x_h5("filtered_feature_bc_matrix.h5")
>>> adata
AnnData object with n_obs × n_vars = 5000 × 20000

Example

import gedi2py as gd

# Read 10X Genomics H5 file
adata = gd.read_10x_h5("filtered_feature_bc_matrix.h5")

# Read specific genome
adata = gd.read_10x_h5("multi_genome.h5", genome="GRCh38")

# Include non-gene-expression features
adata = gd.read_10x_h5("multimodal.h5", gex_only=False)
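
To make the genome and gex_only options concrete, here is a minimal sketch of the feature selection they describe, written over plain Python records instead of an HDF5 file. It assumes Cell Ranger v3-style feature metadata where each feature has a feature_type (such as "Gene Expression" or "Antibody Capture") and a genome label, and it assumes only gene expression features are filtered by genome. Feature and select_features are hypothetical names, not part of gedi2py, whose actual implementation may differ.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """One row of 10X feature metadata (hypothetical container)."""

    feature_id: str
    name: str
    feature_type: str  # e.g. "Gene Expression", "Antibody Capture"
    genome: str        # assumed empty for features not tied to a genome


def select_features(
    features: list[Feature],
    genome: Optional[str] = None,
    gex_only: bool = True,
) -> list[int]:
    """Return indices of the features kept under the genome/gex_only rules."""
    gex = [f for f in features if f.feature_type == "Gene Expression"]
    if genome is None and gex:
        genome = gex[0].genome  # "first available genome"
    keep = []
    for i, feat in enumerate(features):
        if feat.feature_type == "Gene Expression":
            # Gene expression features are filtered by genome.
            if feat.genome == genome:
                keep.append(i)
        elif not gex_only:
            # Other modalities are kept only when gex_only is False.
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Toy feature list, not real Cell Ranger output.
    features = [
        Feature("g1", "GENE1", "Gene Expression", "GRCh38"),
        Feature("g2", "Gene1", "Gene Expression", "mm10"),
        Feature("ab1", "CD3_antibody", "Antibody Capture", ""),
    ]
    print(select_features(features))                  # [0]
    print(select_features(features, genome="mm10"))   # [1]
    print(select_features(features, gex_only=False))  # [0, 2]
```

In this sketch the default call keeps only human gene expression features, which matches the documented defaults (first genome, gene expression only).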

gedi2py.io.read_10x_mtx(path, *, var_names='gene_symbols', make_unique=True)

Read 10X Genomics MTX directory.

Reads gene expression data from a 10X Genomics Matrix Market exchange format directory (matrix.mtx, genes.tsv/features.tsv, barcodes.tsv).

Parameters:
  • path (str | Path) – Path to the directory containing matrix files.

  • var_names (str, default: 'gene_symbols') – Which column to use for variable names: 'gene_symbols' or 'gene_ids'.

  • make_unique (bool, default: True) – If True, make variable names unique by appending suffixes.

Returns:

Annotated data matrix.

Return type:

AnnData

Examples

>>> import gedi2py as gd
>>> adata = gd.read_10x_mtx("filtered_feature_bc_matrix/")

Example

import gedi2py as gd

# Read 10X MTX directory
adata = gd.read_10x_mtx("filtered_feature_bc_matrix/")

# Use gene IDs instead of gene symbols as variable names
adata = gd.read_10x_mtx("filtered_feature_bc_matrix/", var_names="gene_ids")
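
The make_unique option matters mostly with var_names='gene_symbols', since several gene IDs can share one symbol. The sketch below shows one suffixing scheme, modeled on anndata's AnnData.var_names_make_unique() (keep the first occurrence, append -1, -2, ... to later ones, never reusing an existing name). Whether gedi2py delegates to that method is an assumption, and make_unique here is a hypothetical standalone helper.

```python
def make_unique(names: list[str], join: str = "-") -> list[str]:
    """Return a copy of `names` in which repeated entries get numeric suffixes.

    The first occurrence of each name is kept unchanged; later occurrences
    become name-1, name-2, ..., skipping any candidate already in use.
    """
    taken = set(names)           # never generate a name that exists anywhere
    emitted: set[str] = set()    # names already placed in the output
    last_suffix: dict[str, int] = {}
    result = []
    for name in names:
        if name not in emitted:
            result.append(name)
            emitted.add(name)
            continue
        k = last_suffix.get(name, 0)
        while True:
            k += 1
            candidate = f"{name}{join}{k}"
            if candidate not in taken:
                break
        last_suffix[name] = k
        taken.add(candidate)
        emitted.add(candidate)
        result.append(candidate)
    return result


if __name__ == "__main__":
    print(make_unique(["Gapdh", "Actb", "Gapdh", "Gapdh"]))
    # ['Gapdh', 'Actb', 'Gapdh-1', 'Gapdh-2']
```

Reserving every original name up front is what keeps a generated suffix from colliding with a symbol that appears later in the list.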

Writing Data

  • write_h5ad – Write AnnData to H5AD file.

gedi2py.io.write_h5ad(adata, filename, *, compression='gzip', compression_opts=None)

Write AnnData to H5AD file.

Wrapper around anndata.AnnData.write_h5ad() with a consistent interface.

Parameters:
  • adata (AnnData) – Annotated data matrix to write.

  • filename (str | Path) – Path to output H5AD file.

  • compression (str | None, default: 'gzip') – Compression algorithm. Options: 'gzip', 'lzf', None.

  • compression_opts (int | None, default: None) – Compression level (for gzip, 1-9).

Return type:

None

Examples

>>> import gedi2py as gd
>>> gd.write_h5ad(adata, "results.h5ad")

Example

import gedi2py as gd

# adata: an existing AnnData object
# Write H5AD file with gzip compression
gd.write_h5ad(adata, "output.h5ad", compression="gzip")

# Write without compression (faster, larger file)
gd.write_h5ad(adata, "output.h5ad", compression=None)
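
HDF5's gzip filter is DEFLATE compression, the algorithm Python's standard zlib module implements, so the speed/size trade-off behind compression_opts can be previewed without writing an H5AD file. The sketch below compresses a synthetic, mostly-zero buffer (a stand-in for sparse counts, not real data) at a few levels. It ignores HDF5 chunking and anything gedi2py or anndata do internally, so real file sizes and timings will differ; compare_levels is a hypothetical helper.

```python
import time
import zlib


def compare_levels(
    data: bytes, levels: tuple[int, ...] = (1, 6, 9)
) -> dict[int, tuple[int, float]]:
    """Compress `data` with DEFLATE at each level.

    Returns {level: (compressed_size_in_bytes, seconds_taken)}.
    """
    results: dict[int, tuple[int, float]] = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # DEFLATE is lossless: every level must round-trip exactly.
        if zlib.decompress(packed) != data:
            raise RuntimeError(f"round-trip failed at level {level}")
        results[level] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    # Synthetic stand-in for sparse counts: mostly zero bytes, a few small values.
    row = bytes([0] * 50 + [3, 0, 0, 1] + [0] * 46)
    buffer = row * 10_000  # 1,000,000 bytes
    for level, (size, secs) in compare_levels(buffer).items():
        print(f"level {level}: {size} bytes in {secs * 1000:.2f} ms")
```

Higher levels usually buy a smaller output at the cost of more CPU time, which is the same choice compression_opts exposes for gzip.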

Model Persistence

  • save_model – Save GEDI model parameters to file.

  • load_model – Load GEDI model parameters from file.

gedi2py.io.save_model(adata, filename, *, key='gedi', compression='gzip')

Save GEDI model parameters to file.

Saves the GEDI model parameters stored in adata.uns[key] to a separate file for later loading.

Parameters:
  • adata (AnnData) – Annotated data matrix with GEDI results.

  • filename (str | Path) – Path to output file (will use .npz format).

  • key (str, default: 'gedi') – Key in adata.uns where GEDI results are stored.

  • compression (str, default: 'gzip') – Compression for numpy save.

Return type:

None

Examples

>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample")
>>> gd.io.save_model(adata, "gedi_model.npz")

Save just the GEDI model parameters, which is more compact than saving the entire AnnData object.

Example

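import gedi2py as gd

# Run GEDI (adata: AnnData with sample labels in adata.obs["sample"])
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# Save model (NumPy .npz format)
gd.io.save_model(adata, "gedi_model.npz")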

Saved Parameters

  • Z matrix (shared metagenes)

  • D vector (scaling factors)

  • Bi matrices (sample-specific factors)

  • Qi matrices (sample-specific deviations)

  • Offset vectors (o, oi, si)

  • sigma2 (noise variance)

  • Convergence tracking data
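
The model file is described as NumPy .npz. As a hedged illustration of what such an archive holds, the sketch below writes and reads a dictionary of arrays with numpy.savez_compressed and numpy.load. The key names (Z, D, sigma2), the shapes, and the save_params/load_params helpers are hypothetical stand-ins for a few of the parameters listed above; gedi2py's actual keys and layout may differ. One practical detail from NumPy's documentation: the savez functions append .npz to a file name that lacks it, so if save_model writes through them (an assumption), a name such as gedi_model.h5 would land on disk as gedi_model.h5.npz.

```python
from __future__ import annotations

from pathlib import Path

import numpy as np


def save_params(path: str | Path, params: dict[str, np.ndarray]) -> Path:
    """Write arrays to a compressed .npz archive; return the path actually written."""
    path = Path(path)
    if path.suffix != ".npz":
        # Mirror NumPy, which appends .npz when the extension is missing.
        path = path.with_name(path.name + ".npz")
    np.savez_compressed(path, **params)
    return path


def load_params(path: str | Path) -> dict[str, np.ndarray]:
    """Read every array from an .npz archive into a plain dict."""
    with np.load(path) as archive:  # NpzFile closes the underlying zip on exit
        return {key: archive[key] for key in archive.files}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = {
        "Z": rng.normal(size=(2000, 10)),  # hypothetical genes x factors matrix
        "D": np.ones(10),                  # hypothetical scaling factors
        "sigma2": np.array(0.5),           # hypothetical noise variance
    }
    written = save_params("gedi_model.h5", params)
    print(written)  # gedi_model.h5.npz
    print({k: v.shape for k, v in load_params(written).items()})
```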

gedi2py.io.load_model(adata, filename, *, key='gedi')

Load GEDI model parameters from file.

Loads previously saved GEDI model parameters and stores them in adata.uns[key].

Parameters:
  • adata (AnnData) – Annotated data matrix to store model in.

  • filename (str | Path) – Path to saved model file.

  • key (str, default: 'gedi') – Key in adata.uns to store results.

Return type:

None

Examples

>>> import gedi2py as gd
>>> gd.io.load_model(adata, "gedi_model.npz")
>>> adata.uns["gedi"]["model"]["Z"]  # Loaded metagenes

Load a saved GEDI model into an AnnData object.

Example

import gedi2py as gd

# Load into a new AnnData object
adata = gd.read_h5ad("data.h5ad")
gd.io.load_model(adata, "gedi_model.npz")

# Model parameters are now stored under adata.uns["gedi"]
Z = adata.uns["gedi"]["model"]["Z"]  # shared metagenes

Note

The AnnData object must have the same genes and samples as when the model was saved.
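
Gene-indexed parameters such as the shared metagenes Z are stored by position, so in practice "the same genes" means the same genes in the same order. A minimal pre-flight check could look like the sketch below. It assumes you kept the gene names and sample labels from the original run yourself (the documented save_model is not stated to store them), and check_compatible is a hypothetical helper, not part of gedi2py. Samples are compared as sets because how gedi2py orders samples is not documented here.

```python
from collections.abc import Sequence


def check_compatible(
    saved_genes: Sequence[str],
    current_genes: Sequence[str],
    saved_samples: Sequence[str],
    current_samples: Sequence[str],
) -> list[str]:
    """Return a list of problems; an empty list means the names line up.

    Sample arguments may repeat labels (e.g. per-cell values of
    adata.obs["sample"]); only the set of distinct labels is compared.
    """
    problems = []
    if list(saved_genes) != list(current_genes):
        if set(saved_genes) == set(current_genes):
            problems.append("genes match but are in a different order")
        else:
            missing = set(saved_genes) - set(current_genes)
            extra = set(current_genes) - set(saved_genes)
            problems.append(
                f"gene sets differ: {len(missing)} missing, {len(extra)} extra"
            )
    if set(saved_samples) != set(current_samples):
        problems.append("sample labels differ")
    return problems


if __name__ == "__main__":
    print(check_compatible(["A", "B"], ["B", "A"], ["s1"], ["s1", "s1"]))
    # ['genes match but are in a different order']
```

A reordering problem can be fixed by subsetting the AnnData to the saved gene order before calling load_model; a differing gene set means the model should be refit.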

Convenience Functions

These functions are also available at the top level:

import gedi2py as gd

# These are equivalent
adata = gd.read_h5ad("data.h5ad")
adata = gd.io.read_h5ad("data.h5ad")

gd.write_h5ad(adata, "output.h5ad")
gd.io.write_h5ad(adata, "output.h5ad")

File Formats

H5AD

The H5AD format is the standard file format for AnnData objects, based on HDF5. It efficiently stores:

  • Expression matrices (dense or sparse)

  • Cell metadata (obs)

  • Gene metadata (var)

  • Embeddings (obsm)

  • Loadings (varm)

  • Unstructured data (uns)

10X Formats

10X Genomics provides two main formats:

  • H5: Single HDF5 file with all data

  • MTX: Directory with matrix.mtx, barcodes.tsv, and features.tsv (genes.tsv in Cell Ranger v2; the files are gzipped in Cell Ranger v3+)

gedi2py reads both formats into AnnData objects, with read_10x_h5 and read_10x_mtx respectively.
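
In the MTX layout, matrix.mtx is a Matrix Market coordinate file with features as rows and barcodes as columns, so a reader has to transpose it to get AnnData's cells × genes orientation. The stdlib-only sketch below parses the coordinate section of such a file into a cells × genes dictionary of counts to show that step. It is illustrative only: it assumes integer values, does no gzip handling, does not validate the %%MatrixMarket banner, and leaves out pairing rows and columns with features.tsv and barcodes.tsv. read_mtx_transposed is a hypothetical name, not a gedi2py function.

```python
import io
from typing import TextIO


def read_mtx_transposed(
    handle: TextIO,
) -> tuple[tuple[int, int], dict[tuple[int, int], int]]:
    """Parse a Matrix Market coordinate file of shape (features, barcodes).

    Returns ((n_cells, n_genes), {(cell, gene): count}) with 0-based indices,
    i.e. the matrix transposed to the cells x genes layout AnnData uses.
    """
    lines = (line.strip() for line in handle)
    # Skip the %%MatrixMarket banner, % comment lines, and blank lines.
    body = (line for line in lines if line and not line.startswith("%"))
    n_genes, n_cells, nnz = map(int, next(body).split())
    counts: dict[tuple[int, int], int] = {}
    for line in body:
        gene, cell, value = line.split()
        # Matrix Market indices are 1-based; swapping (row, col) transposes.
        counts[(int(cell) - 1, int(gene) - 1)] = int(value)
    if len(counts) != nnz:
        raise ValueError(f"expected {nnz} entries, found {len(counts)}")
    return (n_cells, n_genes), counts


if __name__ == "__main__":
    # Toy file: 3 features x 2 barcodes with 3 nonzero entries.
    example = io.StringIO(
        "%%MatrixMarket matrix coordinate integer general\n"
        "% example comment\n"
        "3 2 3\n"
        "1 1 5\n"
        "3 1 2\n"
        "2 2 7\n"
    )
    shape, counts = read_mtx_transposed(example)
    print(shape)   # (2, 3)
    print(counts)  # {(0, 0): 5, (0, 2): 2, (1, 1): 7}
```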

Workflow Example

import gedi2py as gd
import scanpy as sc

# Load raw data
adata = gd.read_10x_h5("raw_feature_bc_matrix.h5")

# Preprocess
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

# Save preprocessed data
gd.write_h5ad(adata, "preprocessed.h5ad")

# Run GEDI
# (assumes adata.obs["sample"] holds sample labels, e.g. after concatenating
# several 10X runs; a single 10X file provides no such column)
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# Save model separately (smaller file)
gd.io.save_model(adata, "gedi_model.npz")

# Save full results
gd.write_h5ad(adata, "with_gedi.h5ad")

# Later: reload just the model
adata2 = gd.read_h5ad("preprocessed.h5ad")
gd.io.load_model(adata2, "gedi_model.npz")