Constructs sparse gene expression matrices from one or more directories containing 10X Genomics-style output. The function supports barcode filtering using either an external whitelist or the internally provided filtered barcode file.
Usage
make_gene_count(
  expression_dirs,
  sample_ids,
  whitelist_barcodes = NULL,
  use_internal_whitelist = TRUE
)
Arguments
- expression_dirs
A character vector or list of strings. Each element must be a path to a directory containing the gene expression matrix files: matrix.mtx, barcodes.tsv, and features.tsv (or genes.tsv).
- sample_ids
A character vector or list of unique sample identifiers, one for each element in expression_dirs. These are used to name outputs in the returned list when multiple samples are provided.
- whitelist_barcodes
A list of character vectors. Each list element corresponds to a sample and contains the barcodes to retain for that sample. If NULL (default), the function will attempt to use the internal filtered barcode file (e.g., barcodes.tsv or barcodes_filtered.tsv) if available.
- use_internal_whitelist
Logical (default TRUE). If TRUE and whitelist_barcodes is NULL, the function will attempt to use the default filtered barcode list from the input directory. If FALSE, no internal filtration will be applied unless a whitelist is explicitly provided.
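The argument shapes described above can be checked before calling the function. The following is a minimal sketch, not part of the package: check_inputs is a hypothetical helper, the paths are placeholders, and the requirement that whitelist_barcodes has exactly one element per sample is an assumption drawn from the argument description.

```r
# Hypothetical helper (not part of the package): verifies that the inputs
# have the shapes described in the Arguments section.
check_inputs <- function(expression_dirs, sample_ids, whitelist_barcodes = NULL) {
  # Accept either character vectors or lists of strings.
  expression_dirs <- unlist(expression_dirs)
  sample_ids <- unlist(sample_ids)
  stopifnot(
    is.character(expression_dirs),
    is.character(sample_ids),
    length(sample_ids) == length(expression_dirs),  # one ID per directory
    anyDuplicated(sample_ids) == 0                  # IDs must be unique
  )
  if (!is.null(whitelist_barcodes)) {
    stopifnot(
      is.list(whitelist_barcodes),
      # Assumption: one character vector of barcodes per sample.
      length(whitelist_barcodes) == length(expression_dirs),
      all(vapply(whitelist_barcodes, is.character, logical(1)))
    )
  }
  TRUE
}

# Placeholder paths and barcodes, for illustration only.
expression_dirs <- c("data/sample1", "data/sample2")
sample_ids <- c("sample1", "sample2")
whitelist_barcodes <- list(
  c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1"),
  c("AAACCTGAGAAACCTA-1")
)
inputs_ok <- check_inputs(expression_dirs, sample_ids, whitelist_barcodes)
```

Inputs that pass these checks can then be handed to make_gene_count() in the same order.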
Value
If a single sample is provided, returns a sparse matrix of class "dgCMatrix" with genes as rows and barcodes as columns.
If multiple samples are provided, returns a named list of sparse matrices, one per sample ID.
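To make the return shapes concrete, here is a rough sketch that writes a toy 10X-style directory and reads it back with the Matrix package. It is not the package's implementation: read_10x_dir and collect_results are hypothetical helpers, the barcodes and gene IDs are made up, and using the first column of features.tsv as row names is an assumption.

```r
library(Matrix)

# Write a toy 10X-style directory (3 genes x 4 barcodes) for illustration.
toy_dir <- file.path(tempdir(), "toy_10x_sample")
dir.create(toy_dir, showWarnings = FALSE)
toy_counts <- sparseMatrix(
  i = c(1, 3, 2, 1), j = c(1, 2, 3, 4), x = c(5, 2, 1, 7), dims = c(3, 4)
)
writeMM(toy_counts, file.path(toy_dir, "matrix.mtx"))
writeLines(c("AAAC-1", "AAAG-1", "CCCA-1", "GGGT-1"),
           file.path(toy_dir, "barcodes.tsv"))
writeLines(paste0("GENE", 1:3, "\tGene", 1:3, "\tGene Expression"),
           file.path(toy_dir, "features.tsv"))

# Hypothetical reader: returns a dgCMatrix with genes as rows and
# barcodes as columns.
read_10x_dir <- function(dir) {
  feature_file <- file.path(dir, "features.tsv")
  if (!file.exists(feature_file)) feature_file <- file.path(dir, "genes.tsv")
  features <- read.delim(feature_file, header = FALSE, stringsAsFactors = FALSE)
  barcodes <- readLines(file.path(dir, "barcodes.tsv"))
  counts <- as(readMM(file.path(dir, "matrix.mtx")), "CsparseMatrix")
  # Assumption: the first features column (gene ID) names the rows.
  dimnames(counts) <- list(features[[1]], barcodes)
  counts
}

# Hypothetical wrapper mirroring the documented return shape:
# a bare matrix for one sample, a named list for several.
collect_results <- function(mats) {
  if (length(mats) == 1) mats[[1]] else mats
}

single <- collect_results(lapply(c(sample1 = toy_dir), read_10x_dir))
multi <- collect_results(
  lapply(c(sample1 = toy_dir, sample2 = toy_dir), read_10x_dir)
)
```

In this sketch single is one sparse matrix, while multi is a list whose names are the sample IDs.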
Details
The function is designed for bulk or single-cell gene expression processing from 10X-style output folders.
Each input directory should contain the standard matrix.mtx, features.tsv/genes.tsv, and barcodes.tsv
files. Barcodes can be filtered using either a provided whitelist or by relying on the filtered barcode files
output by tools such as Cell Ranger.
If neither an external whitelist nor an internal filtered barcode file is available, all barcodes from the raw matrix will be retained.
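The barcode filtering precedence described above can be sketched as a small function. This is an illustration of the documented behaviour under stated assumptions, not the package's code: select_barcodes and its internal_filtered argument (the barcodes read from a filtered barcode file, or NULL if none exists) are hypothetical names.

```r
# Hypothetical helper illustrating the documented precedence:
# external whitelist, then internal filtered barcodes, then all barcodes.
select_barcodes <- function(raw_barcodes,
                            whitelist = NULL,
                            internal_filtered = NULL,
                            use_internal_whitelist = TRUE) {
  if (!is.null(whitelist)) {
    # 1. An explicitly provided whitelist is used whenever it is given.
    return(raw_barcodes[raw_barcodes %in% whitelist])
  }
  if (use_internal_whitelist && !is.null(internal_filtered)) {
    # 2. Otherwise use the filtered barcode file, if one was found.
    return(raw_barcodes[raw_barcodes %in% internal_filtered])
  }
  # 3. No filter available or requested: retain every raw barcode.
  raw_barcodes
}

raw <- c("AAAC-1", "AAAG-1", "CCCA-1")
kept_external <- select_barcodes(raw, whitelist = c("CCCA-1", "AAAC-1"))
kept_internal <- select_barcodes(raw, internal_filtered = "AAAG-1")
kept_all <- select_barcodes(raw, internal_filtered = "AAAG-1",
                            use_internal_whitelist = FALSE)
```

The retained barcodes keep the order of the raw matrix columns, so they can be used directly to subset the count matrix by column name.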
