
Find Variable Genes Using Variance or Deviance-Based Metrics
Source: R/feature_selection.R
Identifies highly variable genes from a sparse gene expression matrix using one of two methods: variance-stabilizing transformation (VST) or deviance-based modeling. The VST method uses a C++-accelerated approach to compute standardized variance, while the deviance-based method models gene variability across libraries using negative binomial deviances.
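To make "standardized variance" concrete, here is a minimal base-R sketch of the Seurat-style VST calculation on a small dense matrix. It assumes the "vst" method follows the usual Seurat recipe (a loess fit of log10 variance on log10 mean, clipping at the square root of the number of cells); this page does not confirm those details, and the sketch is not the package's C++ implementation. All object names and the simulated counts are hypothetical.

```r
# Illustrative sketch only: simulated counts, dense matrix, base R.
set.seed(42)
n_genes <- 200
n_cells <- 60
gene_mu <- exp(seq(log(1), log(50), length.out = n_genes))
counts <- matrix(
  rnbinom(n_genes * n_cells, mu = gene_mu, size = 2),  # row i is drawn with mean gene_mu[i]
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)

gene_mean <- rowMeans(counts)
gene_var <- apply(counts, 1, var)

# Fit the mean-variance trend on the log10 scale, skipping zero-variance genes
has_var <- gene_var > 0
trend_data <- data.frame(
  log_mean = log10(gene_mean[has_var]),
  log_var = log10(gene_var[has_var])
)
fit <- loess(log_var ~ log_mean, data = trend_data, span = 0.3)

expected_sd <- rep(NA_real_, n_genes)
expected_sd[has_var] <- sqrt(10^fitted(fit))

# Standardize each gene by its expected SD, clip large values, then take the variance
clip_max <- sqrt(n_cells)
z <- pmin((counts - gene_mean) / expected_sd, clip_max)
std_var <- rowSums(z^2) / (n_cells - 1)
std_var[is.na(std_var)] <- 0  # zero-variance genes get a score of 0

head(sort(std_var, decreasing = TRUE))
```

Genes whose observed variance sits well above the fitted trend receive scores above 1, which is why sorting by this metric in decreasing order surfaces the most variable genes.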
Usage
find_variable_genes(
  gene_expression_matrix,
  method = "vst",
  n_threads = 1,
  verbose = TRUE,
  ...
)
Arguments
- gene_expression_matrix
A sparse gene expression matrix (of class Matrix) with gene names as row names.
- method
Character string, either "vst" or "sum_deviance". The default is "vst". "vst" uses a variance-stabilizing transformation to identify variable genes. "sum_deviance" computes per-library deviances and combines them with a row variance metric.
- n_threads
Number of threads to use (default 1). If OpenMP is available on your system, a value greater than 1 enables multi-threaded processing for faster computation. Multi-threading applies only to the "sum_deviance" method.
- verbose
Logical. If TRUE (default), prints progress and informational messages.
- ...
Additional arguments (currently unused).
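As an illustration of the expected input, the sketch below builds a small sparse matrix with gene names as row names using the Matrix package. The gene IDs, cell names, and counts are made up, and how libraries are identified for the "sum_deviance" method is not described on this page, so the column names here are placeholders only.

```r
library(Matrix)

# Hypothetical toy counts: 4 genes x 3 cells, given as (row, column, value) triplets
toy_counts <- sparseMatrix(
  i = c(1, 2, 2, 4, 3),
  j = c(1, 1, 2, 3, 3),
  x = c(5, 1, 3, 2, 7),
  dims = c(4, 3),
  dimnames = list(
    c("geneA", "geneB", "geneC", "geneD"),
    c("cell1", "cell2", "cell3")
  )
)

# Checks mirroring the documented requirements: a Matrix object with row names
stopifnot(methods::is(toy_counts, "sparseMatrix"), !is.null(rownames(toy_counts)))

# An existing dense matrix can be converted instead
dense_counts <- matrix(
  c(0, 2, 0, 1, 0, 4),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), NULL)
)
sparse_from_dense <- Matrix(dense_counts, sparse = TRUE)
```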
Value
A data.table with gene names in the events column plus the computed metrics: standardize_variance for the "vst" method, or sum_deviance and variance for the "sum_deviance" method.
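A common next step is to keep only the top-ranked genes. The sketch below shows one way to do that with data.table, using a stand-in table whose rows are copied from the "vst" example output on this page. The cutoff n_top is a hypothetical choice, not a package default.

```r
library(data.table)

# Stand-in for the returned table; values copied from the vst example output
hvg_vst <- data.table(
  events = c("ENSMUSG00000037625", "ENSMUSG00000031425", "ENSMUSG00000029373",
             "ENSMUSG00000027375", "ENSMUSG00000004366"),
  standardize_variance = c(4.94429186, 7.67655024, 0, 6.70624612, 6.45592352)
)

n_top <- 3  # hypothetical cutoff

# Sort by the metric in decreasing order and keep the first n_top gene IDs
top_genes <- head(hvg_vst[order(-standardize_variance)], n_top)$events
print(top_genes)
```

The same pattern applies to the "sum_deviance" result by ordering on sum_deviance instead.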
Examples
library(Matrix)
# loading the toy dataset
toy_obj <- load_toy_M1_M2_object()
# getting highly variable genes
HVG_VST <- find_variable_genes(toy_obj$gene_expression, method = "vst") # vst method
#> The method we are using is vst (Seurat)...
HVG_DEV <- find_variable_genes(toy_obj$gene_expression, method = "sum_deviance") # sum_deviance method
#> The method we are using is like deviance summarion per library...
#> There are 11 libraries detected...
#> Calculating the deviances for sample A08 has been completed!
#> Calculating the deviances for sample E01 has been completed!
#> Calculating the deviances for sample F08 has been completed!
#> Calculating the deviances for sample B08 has been completed!
#> Calculating the deviances for sample A01 has been completed!
#> Calculating the deviances for sample B01 has been completed!
#> Calculating the deviances for sample H12 has been completed!
#> Calculating the deviances for sample G08 has been completed!
#> Calculating the deviances for sample C01 has been completed!
#> Calculating the deviances for sample G12 has been completed!
#> Calculating the deviances for sample F01 has been completed!
# Using multi-threading for faster computation (sum_deviance method only)
HVG_DEV_MT <- find_variable_genes(toy_obj$gene_expression,
                                  method = "sum_deviance",
                                  n_threads = 4) # 4 threads
#> The method we are using is like deviance summarion per library...
#> There are 11 libraries detected...
#> Calculating the deviances for sample A08 has been completed!
#> Calculating the deviances for sample E01 has been completed!
#> Calculating the deviances for sample F08 has been completed!
#> Calculating the deviances for sample B08 has been completed!
#> Calculating the deviances for sample A01 has been completed!
#> Calculating the deviances for sample B01 has been completed!
#> Calculating the deviances for sample H12 has been completed!
#> Calculating the deviances for sample G08 has been completed!
#> Calculating the deviances for sample C01 has been completed!
#> Calculating the deviances for sample G12 has been completed!
#> Calculating the deviances for sample F01 has been completed!
# printing the results
print(HVG_VST[order(-standardize_variance)])
#> events standardize_variance
#> <char> <num>
#> 1: ENSMUSG00000031425 7.67655024
#> 2: ENSMUSG00000027375 6.70624612
#> 3: ENSMUSG00000004366 6.45592352
#> 4: ENSMUSG00000037625 4.94429186
#> 5: ENSMUSG00000006782 4.63958586
#> ---
#> 496: ENSMUSG00000109113 0.16409767
#> 497: ENSMUSG00000037683 0.12806885
#> 498: ENSMUSG00000024754 0.08901653
#> 499: ENSMUSG00000029373 0.00000000
#> 500: ENSMUSG00000056379 0.00000000
print(HVG_DEV[order(-sum_deviance)])
#> events sum_deviance variance
#> <char> <num> <num>
#> 1: ENSMUSG00000036192 2611.20890 3.474114e+02
#> 2: ENSMUSG00000078591 2348.13417 1.284587e+03
#> 3: ENSMUSG00000049630 2204.72564 1.128642e+02
#> 4: ENSMUSG00000032036 2034.30586 6.802108e+02
#> 5: ENSMUSG00000059187 2030.86970 2.585974e+03
#> ---
#> 494: ENSMUSG00000026697 10.84818 4.997500e-04
#> 495: ENSMUSG00000026358 10.42433 4.997500e-04
#> 496: ENSMUSG00000018930 10.38825 4.997500e-04
#> 497: ENSMUSG00000035783 10.16459 4.997500e-04
#> 498: ENSMUSG00000038805 10.16459 4.997500e-04
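The two printed tables have different row counts (500 for "vst", 498 for "sum_deviance"), so comparing the methods requires joining on the events column. The following is a minimal sketch of that comparison with made-up gene IDs and values. The table names and the top-2 cutoff are hypothetical.

```r
library(data.table)

# Made-up stand-ins for the two result tables
vst_res <- data.table(
  events = c("geneA", "geneB", "geneC", "geneD"),
  standardize_variance = c(3.2, 0.4, 1.8, 2.5)
)
dev_res <- data.table(
  events = c("geneA", "geneB", "geneC"),
  sum_deviance = c(900, 15, 40),
  variance = c(120, 0.01, 2.2)
)

# Inner join on gene ID; genes reported by only one method are dropped
both <- merge(vst_res, dev_res, by = "events")

# Rank genes within each metric (1 = most variable)
both[, rank_vst := frank(-standardize_variance)]
both[, rank_dev := frank(-sum_deviance)]

# Genes that fall in the top 2 of both rankings
shared_top <- both[rank_vst <= 2 & rank_dev <= 2, events]
print(shared_top)
```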