
Constructs sparse gene expression matrices from one or more directories containing 10X Genomics-style output. The function supports barcode filtering using either an external whitelist or the internally provided filtered barcode file.

Usage

make_gene_count(
  expression_dirs,
  sample_ids,
  whitelist_barcodes = NULL,
  use_internal_whitelist = TRUE
)

Arguments

expression_dirs

A character vector or list of strings. Each element must be a path to a directory containing the gene expression matrix files: matrix.mtx, barcodes.tsv, and features.tsv (or genes.tsv).
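As a rough illustration of the directory layout described above, the sketch below writes a toy 10X-style folder to a temporary location and reads it back with the Matrix package. The barcodes, gene IDs, and counts are made up, and using the first column of features.tsv as row names is an assumption; this is not the internal implementation of make_gene_count.

```r
library(Matrix)

# Build a toy 10X-style directory in a temp location (illustrative data only).
dir_path <- file.path(tempdir(), "toy_10x")
dir.create(dir_path, showWarnings = FALSE)

# 3 genes x 3 barcodes, four non-zero counts.
counts <- sparseMatrix(
  i = c(1, 2, 3, 1),
  j = c(1, 1, 2, 3),
  x = c(5, 1, 2, 7),
  dims = c(3, 3)
)
invisible(writeMM(counts, file.path(dir_path, "matrix.mtx")))
writeLines(c("BC1-1", "BC2-1", "BC3-1"), file.path(dir_path, "barcodes.tsv"))
writeLines(
  c("ENSG01\tGeneA\tGene Expression",
    "ENSG02\tGeneB\tGene Expression",
    "ENSG03\tGeneC\tGene Expression"),
  file.path(dir_path, "features.tsv")
)

# Read the three files back and assemble a labelled sparse matrix.
m <- as(readMM(file.path(dir_path, "matrix.mtx")), "CsparseMatrix")
barcodes <- readLines(file.path(dir_path, "barcodes.tsv"))
features <- read.delim(file.path(dir_path, "features.tsv"),
                       header = FALSE, stringsAsFactors = FALSE)

# Assumption: gene IDs (first column) label the rows.
dimnames(m) <- list(features[[1]], barcodes)
```

A directory built this way could then be passed as one element of expression_dirs.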

sample_ids

A character vector or list of unique sample identifiers, one for each element in expression_dirs. These are used to name outputs in the returned list when multiple samples are provided.

whitelist_barcodes

A list of character vectors. Each list element corresponds to one sample and contains the barcodes to retain for that sample. If NULL (default) and use_internal_whitelist is TRUE, the function attempts to use the internal filtered barcode file (e.g., barcodes.tsv or barcodes_filtered.tsv) if one is available.

use_internal_whitelist

Logical (default TRUE). If TRUE and whitelist_barcodes is NULL, the function attempts to use the default filtered barcode list from the input directory. If FALSE, no internal filtering is applied; barcodes are filtered only when a whitelist is explicitly provided.

Value

If a single sample is provided, returns a sparse matrix of class "dgCMatrix" with genes as rows and barcodes as columns. If multiple samples are provided, returns a named list of sparse matrices, one per sample ID.
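Because the return type depends on the number of samples, downstream code may need to handle both shapes. The sketch below mimics the documented behaviour with a hypothetical helper, shape_result, and made-up matrices; it does not call make_gene_count itself.

```r
library(Matrix)

# Hypothetical stand-ins for per-sample count matrices.
mats <- list(
  sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(3, 4), dims = c(2, 2)),
  sparseMatrix(i = 1, j = 1, x = 9, dims = c(2, 1))
)
sample_ids <- c("sampleA", "sampleB")

# Hypothetical helper mirroring the documented return shape:
# a bare matrix for one sample, a named list for several.
shape_result <- function(mats, sample_ids) {
  if (length(mats) == 1L) {
    return(mats[[1]])
  }
  stats::setNames(mats, sample_ids)
}

res_multi <- shape_result(mats, sample_ids)
res_single <- shape_result(mats[1], sample_ids[1])

# Normalise to a list so later code can always iterate over samples.
as_list <- if (is.list(res_single)) res_single else list(sampleA = res_single)
```

Wrapping a single-sample result in a one-element list, as in the last line, is one way to keep later per-sample loops uniform.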

Details

The function is designed for bulk or single-cell gene expression processing from 10X-style output folders. Each input directory should contain the standard matrix.mtx, features.tsv/genes.tsv, and barcodes.tsv files. Barcodes can be filtered either with a user-provided whitelist or with the filtered barcode files produced by tools such as Cell Ranger.

If neither an external whitelist nor an internal filtered barcode file is available, all barcodes from the raw matrix will be retained.
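The precedence described above (external whitelist first, then the internal filtered file, then all barcodes) can be sketched as a small function. The helper name select_barcodes and its arguments are hypothetical, and dropping whitelist entries that do not appear in the matrix is an assumption rather than documented behaviour.

```r
# Hypothetical helper: choose which barcodes to keep for one sample.
select_barcodes <- function(all_barcodes,
                            whitelist = NULL,
                            internal_filtered = NULL,
                            use_internal_whitelist = TRUE) {
  # 1. An explicit external whitelist takes priority.
  if (!is.null(whitelist)) {
    return(intersect(all_barcodes, whitelist))
  }
  # 2. Otherwise fall back to the internal filtered list, if enabled and present.
  if (use_internal_whitelist && !is.null(internal_filtered)) {
    return(intersect(all_barcodes, internal_filtered))
  }
  # 3. With no filter available, keep every barcode from the raw matrix.
  all_barcodes
}

all_bc <- c("BC1-1", "BC2-1", "BC3-1")

kept_external <- select_barcodes(all_bc, whitelist = c("BC2-1", "BC9-1"))
kept_internal <- select_barcodes(all_bc, internal_filtered = c("BC1-1", "BC3-1"))
kept_all <- select_barcodes(all_bc)
kept_off <- select_barcodes(all_bc, internal_filtered = "BC1-1",
                            use_internal_whitelist = FALSE)
```

The selected barcodes could then be used to subset the columns of the count matrix, for example m[, kept, drop = FALSE].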

Dependencies

Requires the Matrix package for sparse matrix handling and may use data.table for efficient file I/O.