
Identifies highly variable genes from a sparse gene expression matrix using one of two methods: variance-stabilizing transformation (VST) or deviance-based modeling. The VST method uses a C++-accelerated approach to compute standardized variance, while the deviance-based method models gene variability across libraries using negative binomial deviances.
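For intuition about the "standardized variance" reported by the VST method, here is a minimal base-R sketch of a Seurat-style VST. It assumes the general recipe: fit log10(variance) against log10(mean) with loess, standardize each gene by its expected standard deviation, clip at sqrt(number of cells), and take the variance of the clipped values. It is not the package's C++ implementation. The function name vst_standardized_variance() and the span of 0.3 are illustrative assumptions.

```r
# Hypothetical helper: a Seurat-style VST sketch, not find_variable_genes()'s own code.
vst_standardized_variance <- function(counts, span = 0.3) {
  counts <- as.matrix(counts)                 # dense copy, for simplicity only
  n_cells <- ncol(counts)
  gene_mean <- rowMeans(counts)
  gene_var <- apply(counts, 1, var)
  out <- setNames(numeric(nrow(counts)), rownames(counts))
  ok <- gene_var > 0                          # constant genes keep a value of 0

  # Model the mean-variance trend on the log10 scale
  trend <- data.frame(x = log10(gene_mean[ok]), y = log10(gene_var[ok]))
  fit <- loess(y ~ x, data = trend, span = span)
  expected_sd <- sqrt(10^fitted(fit))

  # Standardize each gene by its expected SD (row-wise recycling), clip, take variance
  z <- (counts[ok, , drop = FALSE] - gene_mean[ok]) / expected_sd
  clip <- sqrt(n_cells)
  z[z > clip] <- clip
  out[ok] <- apply(z, 1, var)
  out
}
```

Genes whose spread exceeds what the fitted trend predicts for their mean expression receive large values, which is why ranking by this score surfaces highly variable genes.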

Usage

find_variable_genes(
  gene_expression_matrix,
  method = "vst",
  n_threads = 1,
  verbose = TRUE,
  ...
)

Arguments

gene_expression_matrix

A sparse gene expression matrix (of class Matrix) with gene names as row names.

method

Character string, either "vst" or "sum_deviance". The default is "vst", as shown in the usage. "vst" uses a variance-stabilizing transformation to identify variable genes. "sum_deviance" computes deviances per library, sums them across libraries, and also reports a row variance metric.
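The idea behind "sum_deviance" can be illustrated with a minimal base-R sketch. The sketch makes several assumptions: a negative binomial deviance with a fixed dispersion theta (100 here), a null model in which each gene takes a constant share of every cell's library size within a library, and a library_id vector giving each cell's library. The package's exact model and how it combines the deviance with the variance may differ. The helpers nb_deviance() and sum_deviance_sketch() are hypothetical.

```r
# Hypothetical helpers: an illustration of per-library NB deviance summation.
nb_deviance <- function(y, mu, theta) {
  # Unit NB deviance; the y * log(y / mu) term is defined as 0 when y == 0
  term1 <- ifelse(y > 0, y * log(y / mu), 0)
  term2 <- (y + theta) * log((y + theta) / (mu + theta))
  2 * (term1 - term2)
}

sum_deviance_sketch <- function(counts, library_id, theta = 100) {
  counts <- as.matrix(counts)
  total <- numeric(nrow(counts))
  for (lib in unique(library_id)) {
    y <- counts[, library_id == lib, drop = FALSE]
    if (sum(y) == 0) next                  # an empty library contributes nothing
    size <- colSums(y)                     # per-cell library size
    p <- rowSums(y) / sum(y)               # per-gene share within this library
    mu <- outer(p, size)                   # expected counts under the null model
    total <- total + rowSums(nb_deviance(y, mu, theta))
  }
  data.frame(events = rownames(counts),
             sum_deviance = total,
             variance = apply(counts, 1, var),
             stringsAsFactors = FALSE)
}
```

Because the null model is fitted separately inside each library, differences in sequencing depth between libraries do not inflate a gene's deviance; only within-library heterogeneity does.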

n_threads

Integer. Number of threads to use for parallel computation when OpenMP is available on your system (default 1). Only used by the "sum_deviance" method.

verbose

Logical. If TRUE (default), prints progress and informational messages.

...

Additional arguments (currently unused).

Value

A data.table containing gene names (in the column events) and the computed metrics. For the "vst" method, this is the standardize_variance column; for the "sum_deviance" method, it includes the sum_deviance and variance columns.
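A common next step is to keep the top-ranked genes by one of these metrics. A minimal sketch, assuming only the column names listed above: the helper top_variable_genes() is hypothetical, not part of the package, and uses base-R indexing so it works on both a data.table and a plain data.frame.

```r
# Hypothetical helper: return the n gene names with the highest value of `metric`.
top_variable_genes <- function(result, metric, n = 2000) {
  stopifnot(metric %in% names(result))
  ranked <- result[["events"]][order(-result[[metric]])]
  head(ranked, n)                          # returns all genes if n exceeds the total
}
```

For example, top_variable_genes(HVG_DEV, "sum_deviance", n = 2000) or top_variable_genes(HVG_VST, "standardize_variance", n = 2000).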

Examples

library(Matrix)
# loading the toy dataset
toy_obj <- load_toy_M1_M2_object()

# identifying highly variable genes
HVG_VST <- find_variable_genes(toy_obj$gene_expression, method = "vst") # vst method
#> The method we are using is vst (Seurat)...
HVG_DEV <- find_variable_genes(toy_obj$gene_expression, method = "sum_deviance") # sum_deviance method
#> The method we are using is like deviance summarion per library...
#> There are 11 libraries detected...
#> Calculating the deviances for sample A08 has been completed!
#> Calculating the deviances for sample E01 has been completed!
#> Calculating the deviances for sample F08 has been completed!
#> Calculating the deviances for sample B08 has been completed!
#> Calculating the deviances for sample A01 has been completed!
#> Calculating the deviances for sample B01 has been completed!
#> Calculating the deviances for sample H12 has been completed!
#> Calculating the deviances for sample G08 has been completed!
#> Calculating the deviances for sample C01 has been completed!
#> Calculating the deviances for sample G12 has been completed!
#> Calculating the deviances for sample F01 has been completed!

# Using multi-threading for faster computation (sum_deviance method only)
HVG_DEV_MT <- find_variable_genes(toy_obj$gene_expression, 
                                  method = "sum_deviance", 
                                  n_threads = 4) # 4 threads
#> The method we are using is like deviance summarion per library...
#> There are 11 libraries detected...
#> Calculating the deviances for sample A08 has been completed!
#> Calculating the deviances for sample E01 has been completed!
#> Calculating the deviances for sample F08 has been completed!
#> Calculating the deviances for sample B08 has been completed!
#> Calculating the deviances for sample A01 has been completed!
#> Calculating the deviances for sample B01 has been completed!
#> Calculating the deviances for sample H12 has been completed!
#> Calculating the deviances for sample G08 has been completed!
#> Calculating the deviances for sample C01 has been completed!
#> Calculating the deviances for sample G12 has been completed!
#> Calculating the deviances for sample F01 has been completed!

# printing the results
print(HVG_VST[order(-standardize_variance)])
#>                  events standardize_variance
#>                  <char>                <num>
#>   1: ENSMUSG00000031425           7.67655024
#>   2: ENSMUSG00000027375           6.70624612
#>   3: ENSMUSG00000004366           6.45592352
#>   4: ENSMUSG00000037625           4.94429186
#>   5: ENSMUSG00000006782           4.63958586
#>  ---                                        
#> 496: ENSMUSG00000109113           0.16409767
#> 497: ENSMUSG00000037683           0.12806885
#> 498: ENSMUSG00000024754           0.08901653
#> 499: ENSMUSG00000029373           0.00000000
#> 500: ENSMUSG00000056379           0.00000000
print(HVG_DEV[order(-sum_deviance)])
#>                  events sum_deviance     variance
#>                  <char>        <num>        <num>
#>   1: ENSMUSG00000036192   2611.20890 3.474114e+02
#>   2: ENSMUSG00000078591   2348.13417 1.284587e+03
#>   3: ENSMUSG00000049630   2204.72564 1.128642e+02
#>   4: ENSMUSG00000032036   2034.30586 6.802108e+02
#>   5: ENSMUSG00000059187   2030.86970 2.585974e+03
#>  ---                                             
#> 494: ENSMUSG00000026697     10.84818 4.997500e-04
#> 495: ENSMUSG00000026358     10.42433 4.997500e-04
#> 496: ENSMUSG00000018930     10.38825 4.997500e-04
#> 497: ENSMUSG00000035783     10.16459 4.997500e-04
#> 498: ENSMUSG00000038805     10.16459 4.997500e-04