I/O (gd.io)
The I/O module provides functions for reading and writing data files and persisting GEDI models.
Reading Data
read_h5ad | Read H5AD file.
read_10x_h5 | Read 10X Genomics H5 file.
read_10x_mtx | Read 10X Genomics MTX directory.
- gedi2py.io.read_h5ad(filename, *, backed=None)
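Read H5AD file.
Wrapper around anndata.read_h5ad() with a consistent interface.
- Parameters:
filename: Path to the H5AD file.
backed: Optional backed mode passed through to anndata.read_h5ad(); "r" opens the file read-only without loading X into memory, which helps with large files.
- Returns:
Annotated data matrix.
- Return type:
AnnData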
Examples
>>> import gedi2py as gd
>>> adata = gd.read_h5ad("data.h5ad")
Example
import gedi2py as gd

# Read H5AD file
adata = gd.read_h5ad("data.h5ad")

# Read in backed mode (for large files)
adata = gd.read_h5ad("large_data.h5ad", backed="r")
- gedi2py.io.read_10x_h5(filename, *, genome=None, gex_only=True)
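Read 10X Genomics H5 file.
Reads gene expression data from 10X Genomics HDF5 files, including those produced by Cell Ranger.
- Parameters:
filename: Path to the 10X H5 file.
genome: Genome to read from a multi-genome file (e.g. "GRCh38").
gex_only: If True, keep only gene expression features; set to False to also include other feature types from multimodal files.
- Returns:
Annotated data matrix with:
X: sparse count matrix (cells × genes)
obs: cell barcodes
var: gene information (id, name, feature_type)
- Return type:
AnnData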
Notes
- Compatible with:
Cell Ranger v2 (matrix.h5)
Cell Ranger v3+ (filtered_feature_bc_matrix.h5)
Multi-modal outputs
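The Returns description says X is a cells × genes sparse matrix, while 10X HDF5 files store counts as features × barcodes. The sketch below illustrates why no expensive transpose is needed, assuming the Cell Ranger v3+ layout (a matrix group holding data, indices, indptr, and shape arrays in compressed sparse column form; v2 files nest the same arrays under a per-genome group). The helper csc_features_by_barcodes_to_cell_rows is hypothetical, densifies toy arrays for readability, and is not part of gedi2py.

```python
# Toy arrays laid out like the "matrix" group of a Cell Ranger v3+ HDF5 file
# (assumed layout). The stored matrix is features x barcodes in CSC form.


def csc_features_by_barcodes_to_cell_rows(data, indices, indptr, shape):
    """Return a dense cells x genes matrix (list of lists).

    Column j of the features x barcodes matrix spans data[indptr[j]:indptr[j+1]],
    and indices gives the feature (row) index of each stored value. That column
    is exactly row j of the transposed cells x genes matrix.
    """
    n_features, n_barcodes = shape
    rows = []
    for j in range(n_barcodes):
        row = [0] * n_features
        for k in range(indptr[j], indptr[j + 1]):
            row[indices[k]] = data[k]
        rows.append(row)
    return rows


# 3 features x 2 barcodes:
#   barcode 0 has counts {feature 0: 5, feature 2: 1}
#   barcode 1 has counts {feature 1: 3}
data = [5, 1, 3]
indices = [0, 2, 1]
indptr = [0, 2, 3]
shape = (3, 2)

X = csc_features_by_barcodes_to_cell_rows(data, indices, indptr, shape)
print(X)  # [[5, 0, 1], [0, 3, 0]]
```

With SciPy, the same three arrays can be wrapped as scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_barcodes, n_features)) to obtain the cells × genes matrix directly, without copying or transposing; whether gedi2py does exactly this internally is not stated here.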
Examples
>>> import gedi2py as gd
>>> adata = gd.read_10x_h5("filtered_feature_bc_matrix.h5")
>>> adata
AnnData object with n_obs × n_vars = 5000 × 20000
Example
import gedi2py as gd

# Read 10X Genomics H5 file
adata = gd.read_10x_h5("filtered_feature_bc_matrix.h5")

# Read a specific genome from a multi-genome file
adata = gd.read_10x_h5("multi_genome.h5", genome="GRCh38")

# Include non-gene-expression features
adata = gd.read_10x_h5("multimodal.h5", gex_only=False)
- gedi2py.io.read_10x_mtx(path, *, var_names='gene_symbols', make_unique=True)
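Read 10X Genomics MTX directory.
Reads gene expression data from a 10X Genomics Matrix Market Exchange (MTX) directory containing matrix.mtx, genes.tsv or features.tsv, and barcodes.tsv.
- Parameters:
path: Path to the MTX directory.
var_names: Which gene annotation to use as variable names (default 'gene_symbols').
make_unique: Whether to make duplicate variable names unique (default True).
- Returns:
Annotated data matrix.
- Return type:
AnnData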
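An MTX directory stores the counts in Matrix Market coordinate format: a %%MatrixMarket banner, optional % comment lines, a size line (rows, columns, nonzeros), then one 1-based "row col value" triplet per nonzero entry. In 10X output, rows are features and columns are barcodes, so a reader has to transpose to get cells × genes. The sketch below parses only the count matrix (barcodes.tsv and features.tsv supply the labels); parse_10x_mtx is a hypothetical helper that assumes uncompressed text and integer counts, and is not gedi2py's implementation.

```python
def parse_10x_mtx(text):
    """Parse Matrix Market coordinate text into a dense cells x genes list of lists.

    Assumes the 10X convention (rows = features, columns = barcodes, 1-based
    indices) and integer counts. Hypothetical helper, for illustration only.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    if not lines[0].startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    body = [ln for ln in lines if not ln.startswith("%")]  # drop banner/comments
    n_features, n_barcodes, nnz = (int(x) for x in body[0].split())
    X = [[0] * n_features for _ in range(n_barcodes)]
    for ln in body[1 : 1 + nnz]:
        row, col, value = ln.split()
        # Transpose on the fly: feature `row`, barcode `col` -> X[cell][gene].
        X[int(col) - 1][int(row) - 1] = int(value)
    return X


mtx = """%%MatrixMarket matrix coordinate integer general
% toy example: 3 features x 2 barcodes, 3 nonzeros
3 2 3
1 1 5
3 1 1
2 2 3
"""
print(parse_10x_mtx(mtx))  # [[5, 0, 1], [0, 3, 0]]
```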
Examples
>>> import gedi2py as gd
>>> adata = gd.read_10x_mtx("filtered_feature_bc_matrix/")
Example
import gedi2py as gd

# Read 10X MTX directory
adata = gd.read_10x_mtx("filtered_feature_bc_matrix/")

# Keep variable names as read, without making duplicates unique
adata = gd.read_10x_mtx("filtered_feature_bc_matrix/", make_unique=False)
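Gene symbols in 10X feature files are not guaranteed to be unique, which is the situation make_unique addresses. The sketch below shows one common scheme, assuming gedi2py follows the anndata-style convention of suffixing repeats with -1, -2, and so on (the exact scheme gedi2py uses is not documented here); make_unique_names is a hypothetical helper.

```python
def make_unique_names(names, sep="-"):
    """Suffix repeated names with -1, -2, ...; first occurrences are unchanged.

    Simplified: this version does not check whether a suffixed name collides
    with a name that is already present in the input.
    """
    counts = {}
    out = []
    for name in names:
        if name in counts:
            counts[name] += 1
            out.append(f"{name}{sep}{counts[name]}")
        else:
            counts[name] = 0
            out.append(name)
    return out


print(make_unique_names(["TBCE", "ACTB", "TBCE", "TBCE"]))
# ['TBCE', 'ACTB', 'TBCE-1', 'TBCE-2']
```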
Writing Data
write_h5ad | Write AnnData to H5AD file.
- gedi2py.io.write_h5ad(adata, filename, *, compression='gzip', compression_opts=None)
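Write AnnData to H5AD file.
Wrapper around anndata.AnnData.write_h5ad() with a consistent interface.
- Parameters:
adata: AnnData object to write.
filename: Output path.
compression: Compression filter for the HDF5 datasets (default 'gzip'); None disables compression.
compression_opts: Options for the compression filter, passed through to anndata (for 'gzip', the compression level 0 to 9).
- Return type:
None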
Examples
>>> import gedi2py as gd
>>> gd.write_h5ad(adata, "results.h5ad")
Example
import gedi2py as gd

adata = gd.read_h5ad("data.h5ad")

# Write H5AD file with gzip compression (the default)
gd.write_h5ad(adata, "output.h5ad", compression="gzip")

# Write without compression (faster, larger file)
gd.write_h5ad(adata, "output.h5ad", compression=None)
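The gzip filter that HDF5 applies to H5AD datasets is DEFLATE, the same algorithm as Python's zlib module, and in h5py the gzip compression_opts value is a level from 0 to 9. Assuming gd.write_h5ad forwards compression_opts unchanged, the size/speed trade-off can be previewed with zlib on a mostly-zero buffer that loosely stands in for sparse counts. Actual H5AD sizes also depend on chunking and data layout, so treat the numbers as indicative only.

```python
import time
import zlib

# A mostly-zero buffer, loosely standing in for a sparse count matrix.
buf = bytearray(1_000_000)
for i in range(0, len(buf), 997):
    buf[i] = i % 251 or 1  # sprinkle a few nonzero "counts"
raw = bytes(buf)

results = {}
for level in (1, 4, 9):  # gzip levels run from 0 (store only) to 9 (max effort)
    start = time.perf_counter()
    packed = zlib.compress(raw, level)
    elapsed_ms = (time.perf_counter() - start) * 1000
    results[level] = len(packed)
    print(f"level {level}: {len(packed):>7} bytes, {elapsed_ms:.2f} ms")

print(f"uncompressed: {len(raw)} bytes")
```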
Model Persistence
save_model | Save GEDI model parameters to file.
load_model | Load GEDI model parameters from file.
- gedi2py.io.save_model(adata, filename, *, key='gedi', compression='gzip')
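Save GEDI model parameters to file.
Saves the GEDI model parameters stored in adata.uns[key] to a separate file for later loading.
- Parameters:
adata: AnnData object containing a fitted GEDI model.
filename: Output path for the model file.
key: Key in adata.uns holding the model (default 'gedi').
compression: Compression to use when writing (default 'gzip').
- Return type:
None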
Examples
>>> import gedi2py as gd
>>> gd.tl.gedi(adata, batch_key="sample")
>>> gd.io.save_model(adata, "gedi_model.npz")
Save just the GEDI model parameters, which is more compact than saving the entire AnnData object.
Example
import gedi2py as gd

adata = gd.read_h5ad("data.h5ad")

# Run GEDI (assumes adata.obs has a "sample" column)
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# Save model
gd.io.save_model(adata, "gedi_model.h5")
Saved Parameters
Z matrix (shared metagenes)
D vector (scaling factors)
Bi matrices (sample-specific factors)
Qi matrices (sample-specific deviations)
Offset vectors (o, oi, si)
sigma2 (noise variance)
Convergence tracking data
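save_model persists only the contents of adata.uns[key], which is why the file is more compact than a full H5AD. The round trip can be sketched with the standard library, using gzip-compressed JSON as a stand-in container (the examples on this page use both .npz and .h5 extensions, so no particular on-disk format is assumed). The helpers save_params and load_params are hypothetical, as are the shapes and per-sample nesting under "model"; the values are toy placeholders, not a real fit, and convergence data is omitted.

```python
import gzip
import json
import os
import tempfile


def save_params(params, filename):
    """Write a nested dict of lists and floats as gzip-compressed JSON (hypothetical)."""
    with gzip.open(filename, "wt", encoding="utf-8") as fh:
        json.dump(params, fh)


def load_params(filename):
    """Read back a dict written by save_params (hypothetical)."""
    with gzip.open(filename, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Toy placeholder values; keys follow the "Saved Parameters" list, but the
# shapes and per-sample nesting are illustrative assumptions only.
uns = {
    "gedi": {
        "model": {
            "Z": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
            "D": [1.0, 0.5],
            "Bi": {"sample_A": [[0.1, 0.2], [0.3, 0.4]]},
            "Qi": {"sample_A": [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0]]},
            "o": [0.0, 0.1, 0.2],
            "oi": {"sample_A": [0.0, 0.0, 0.0]},
            "si": {"sample_A": [0.5, 0.5]},
            "sigma2": 0.25,
        }
    }
}

path = os.path.join(tempfile.mkdtemp(), "gedi_model.json.gz")
save_params(uns["gedi"], path)          # analogous to save_model(..., key="gedi")
restored = {"gedi": load_params(path)}  # analogous to load_model(..., key="gedi")
print(restored == uns)  # True
```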
- gedi2py.io.load_model(adata, filename, *, key='gedi')
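Load GEDI model parameters from file.
Loads previously saved GEDI model parameters and stores them in adata.uns[key].
- Parameters:
adata: AnnData object to load the model into.
filename: Path to a file written by save_model().
key: Key in adata.uns under which to store the model (default 'gedi').
- Return type:
None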
Examples
>>> import gedi2py as gd
>>> gd.io.load_model(adata, "gedi_model.npz")
>>> adata.uns["gedi"]["model"]["Z"]  # Loaded metagenes
Load a saved GEDI model into an AnnData object.
Example
import gedi2py as gd

# Load into a new AnnData object
adata = gd.read_h5ad("data.h5ad")
gd.io.load_model(adata, "gedi_model.h5")

# Model parameters are now available under adata.uns["gedi"]
Z = adata.uns["gedi"]["model"]["Z"]
Note
The AnnData object must have the same genes and samples as when the model was saved.
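This requirement can be checked before loading. A minimal sketch, assuming gene order matters (model matrices such as Z are indexed by gene position) while samples need only match as a set; check_model_compatible is hypothetical, and the saved gene and sample lists would have to come from wherever the model file records them.

```python
def check_model_compatible(saved_genes, saved_samples, genes, samples):
    """Raise ValueError if genes or samples differ from those seen at save time.

    Assumptions: genes must match in order (model matrices are indexed by gene
    position); samples only need to match as a set.
    """
    saved_genes, genes = list(saved_genes), list(genes)
    if saved_genes != genes:
        missing = set(saved_genes) - set(genes)
        extra = set(genes) - set(saved_genes)
        if missing or extra:
            raise ValueError(f"gene mismatch: {len(missing)} missing, {len(extra)} extra")
        raise ValueError("gene mismatch: same genes in a different order")
    if set(saved_samples) != set(samples):
        raise ValueError(
            f"sample mismatch: saved {sorted(set(saved_samples))}, "
            f"current {sorted(set(samples))}"
        )


# Same genes in the same order and the same samples: passes silently.
check_model_compatible(["ACTB", "GAPDH"], ["s1", "s2"], ["ACTB", "GAPDH"], ["s2", "s1"])

# Reordered genes are rejected.
try:
    check_model_compatible(["ACTB", "GAPDH"], ["s1"], ["GAPDH", "ACTB"], ["s1"])
except ValueError as err:
    print(err)  # gene mismatch: same genes in a different order
```

With an AnnData object, the current values would be list(adata.var_names) and adata.obs["sample"].unique() when the model was fit with batch_key="sample".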
Convenience Functions
The read and write functions are also available at the top level of the package:
import gedi2py as gd
# These are equivalent
adata = gd.read_h5ad("data.h5ad")
adata = gd.io.read_h5ad("data.h5ad")
gd.write_h5ad(adata, "output.h5ad")
gd.io.write_h5ad(adata, "output.h5ad")
File Formats
H5AD
The H5AD format is the standard file format for AnnData objects, based on HDF5. It efficiently stores:
Expression matrices (dense or sparse)
Cell metadata (obs)
Gene metadata (var)
Embeddings (obsm)
Loadings (varm)
Unstructured data (uns)
10X Formats
10X Genomics provides two main formats:
H5: Single HDF5 file with all data
MTX: Directory with matrix.mtx, barcodes.tsv, features.tsv
gedi2py can read both formats and convert to AnnData.
Workflow Example
import gedi2py as gd
import scanpy as sc
# Load raw data
adata = gd.read_10x_h5("raw_feature_bc_matrix.h5")
# Preprocess
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
# Save preprocessed data
gd.write_h5ad(adata, "preprocessed.h5ad")
# Run GEDI (assumes adata.obs["sample"] holds each cell's sample label)
gd.tl.gedi(adata, batch_key="sample", n_latent=10)
# Save model separately (smaller file)
gd.io.save_model(adata, "gedi_model.h5")
# Save full results
gd.write_h5ad(adata, "with_gedi.h5ad")
# Later: reload just the model
adata2 = gd.read_h5ad("preprocessed.h5ad")
gd.io.load_model(adata2, "gedi_model.h5")