Compute Average Silhouette Width with Logging — get_silhouette

Computes the average silhouette width for a clustering solution using Euclidean distance, and prints progress messages while it runs.

Usage

get_silhouette_mean(X, cluster_assignments, n_threads = 1)

Arguments

X: A numeric matrix where rows are observations and columns are features.
cluster_assignments: An integer vector of cluster assignments, which must be the same length as the number of rows in X.
n_threads: Number of threads to use for parallel processing. Defaults to 1 (single-threaded).
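
As a minimal sketch of inputs that meet these requirements, the snippet below builds X and cluster_assignments from the built-in iris data set. The choice of data and the explicit checks are illustrative assumptions; they are not checks the function is known to perform itself.

```r
# Numeric feature matrix: rows are observations, columns are features.
X <- as.matrix(iris[, 1:4])

# Integer cluster labels, one per row of X (species used as stand-in clusters).
cluster_assignments <- as.integer(iris$Species)

# Sanity checks mirroring the argument descriptions above.
stopifnot(
  is.numeric(X),
  is.integer(cluster_assignments),
  length(cluster_assignments) == nrow(X)
)
```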

Value

A single numeric value: the average silhouette width across all observations, which lies between -1 and 1.
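
To make the returned quantity concrete, here is a small single-threaded reference sketch in base R that follows the standard silhouette definition. The name silhouette_mean_reference is hypothetical, and this is not the package's implementation; it assumes at least two clusters and assigns a width of 0 to singleton clusters, which is a common convention.

```r
# Hypothetical reference helper, for illustration only.
silhouette_mean_reference <- function(X, cluster_assignments) {
  labels <- sort(unique(cluster_assignments))
  stopifnot(
    is.matrix(X),
    length(cluster_assignments) == nrow(X),
    length(labels) >= 2
  )

  # Full n x n matrix of pairwise Euclidean distances.
  d <- as.matrix(dist(X, method = "euclidean"))
  widths <- numeric(nrow(X))

  for (i in seq_len(nrow(X))) {
    own <- cluster_assignments[i]
    same <- cluster_assignments == own
    n_same <- sum(same)

    if (n_same == 1L) {
      widths[i] <- 0  # assumed convention for singleton clusters
      next
    }

    # a: mean distance to the other members of the same cluster
    # (d[i, i] is 0, so dividing by n_same - 1 excludes the point itself).
    a <- sum(d[i, same]) / (n_same - 1)

    # b: smallest mean distance to the members of any other cluster.
    b <- min(vapply(
      setdiff(labels, own),
      function(k) mean(d[i, cluster_assignments == k]),
      numeric(1)
    ))

    widths[i] <- (b - a) / max(a, b)
  }

  mean(widths)
}

# Two tight, well-separated clusters give a value close to 1.
X_demo <- rbind(c(0, 0), c(0, 1), c(10, 0), c(10, 1))
labels_demo <- c(1L, 1L, 2L, 2L)
demo_score <- silhouette_mean_reference(X_demo, labels_demo)
```

Because the sketch materialises the full distance matrix, it is only practical for small inputs, but it can serve as a cross-check on a subsample.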

Note

This computation can be very slow for large matrices when run on a single thread, because the silhouette depends on the distances between every pair of observations. Setting n_threads above 1 runs the work in parallel and can be significantly faster.
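
One way to choose a thread count is sketched below. Leaving one core free is an assumed convention rather than a package recommendation, and the fallback covers platforms where the core count cannot be determined.

```r
# detectCores() can return NA when the core count is unknown.
available <- parallel::detectCores(logical = FALSE)

# Fall back to a single thread, otherwise keep one physical core free.
n_threads <- if (is.na(available)) 1L else max(1L, available - 1L)
```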

Examples

# Preparing the inputs
set.seed(42)
pc_matrix <- matrix(data = rnorm(n = 10000 * 15, sd = 2), nrow = 10000, ncol = 15)
# as.integer() truncates, so the random labels fall in 1 through 9
cluster_numbers <- as.integer(runif(n = 10000, min = 1, max = 10))

# Getting the mean silhouette score
n_threads <- parallel::detectCores()
score <- get_silhouette_mean(pc_matrix, cluster_numbers, n_threads)
#> [silhouette_avg] Starting computation...
#> [silhouette_avg] Using 16 threads...
print(score)
#> [1] -0.006122089