cluster_elements() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and identifies clusters in the data.
cluster_elements(.data, method, of_samples = TRUE, transform = log1p, ...)
# S4 method for class 'SummarizedExperiment'
cluster_elements(.data, method, of_samples = TRUE, transform = log1p, ...)
# S4 method for class 'RangedSummarizedExperiment'
cluster_elements(.data, method, of_samples = TRUE, transform = log1p, ...)
.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)).
method: A character string naming the clustering algorithm to use, e.g. "kmeans" for k-means.
of_samples: A logical. Whether the elements to cluster are samples (TRUE, the default) or transcripts (FALSE).
transform: A function used to transform the counts before clustering. The default, `log1p`, suits RNA sequencing data; use `identity` to skip the transformation.
...: Further arguments passed to `kmeans()`, such as `centers`.
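As a rough illustration of how `of_samples` and `transform` interact, the base-R sketch below builds the matrix a clustering step would receive from a feature-by-sample count matrix. The helper `prepare_for_clustering()` and the toy counts are hypothetical and not part of tidybulk; the sketch only assumes that the rows of the returned matrix are the elements being clustered.

```r
# Hypothetical helper, not part of tidybulk: returns the matrix a clustering
# step would receive, with the elements to be clustered in the rows.
prepare_for_clustering <- function(counts, of_samples = TRUE, transform = log1p) {
  x <- transform(counts)         # log1p by default; pass identity to skip it
  if (of_samples) t(x) else x    # samples in rows when clustering samples
}

# Toy feature x sample count matrix (3 genes, 4 samples), filled column-wise.
counts <- matrix(
  c(0, 10, 100,
    5, 12, 90,
    200, 1, 3,
    180, 0, 4),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)

m_samples  <- prepare_for_clustering(counts)                 # 4 x 3, log1p-transformed
m_features <- prepare_for_clustering(counts, of_samples = FALSE,
                                     transform = identity)   # 3 x 4, unchanged
```

Transposing rather than copying keeps the feature and sample names attached to the right dimension, so cluster labels can later be matched back by name.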
A `tbl` object with additional columns containing the cluster labels.
A `SummarizedExperiment` object (for both the `SummarizedExperiment` and the `RangedSummarizedExperiment` methods).
Lifecycle: maturing
This function identifies clusters in the data, normally of samples, and returns a tibble with additional columns for the cluster annotation. At the moment only k-means (DOI: 10.2307/2346830) and SNN clustering (DOI: 10.1016/j.cell.2019.05.031) are supported; more clustering methods are planned.
Underlying method for k-means: kmeans(.data, iter.max = 1000, ...)
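The k-means path can be sketched end to end in base R, assuming long-format input with `sample`, `feature` and `abundance` columns: pivot to a matrix, apply `log1p`, call `stats::kmeans()` with `iter.max = 1000`, and join the labels back as a new column. This is a sketch under those assumptions, not tidybulk's implementation; the column name `cluster_kmeans` and the toy data are hypothetical.

```r
# Long-format input: one row per (sample, feature) pair (toy data).
long <- data.frame(
  sample    = rep(paste0("s", 1:4), each = 3),
  feature   = rep(paste0("g", 1:3), times = 4),
  abundance = c(0, 10, 100,   5, 12, 90,   200, 1, 3,   180, 0, 4)
)

# Pivot to a feature x sample matrix; tapply sums any duplicated pairs.
mat <- tapply(long$abundance, list(long$feature, long$sample), sum)

# Cluster samples (rows of the transposed matrix) on log1p-transformed values,
# using the same iter.max as the kmeans call described above.
set.seed(42)
km <- kmeans(t(log1p(mat)), centers = 2, iter.max = 1000)

# Attach the labels back to the long table as a new column.
# `cluster_kmeans` is a hypothetical column name chosen for this sketch.
labels <- data.frame(sample = names(km$cluster),
                     cluster_kmeans = factor(km$cluster))
annotated <- merge(long, labels, by = "sample")
```

k-means starts from random centres, so fixing the seed (or raising `nstart`, which can be passed through `...`) makes the labels reproducible.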
Underlying method for SNN:
.data |>
  Seurat::CreateSeuratObject() |>
  Seurat::ScaleData(display.progress = TRUE, num.cores = 4, do.par = TRUE) |>
  Seurat::FindVariableFeatures(selection.method = "vst") |>
  Seurat::RunPCA(npcs = 30) |>
  Seurat::FindNeighbors() |>
  Seurat::FindClusters(method = "igraph", ...)
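The `FindNeighbors()` step in the SNN pipeline builds a shared-nearest-neighbour graph. As a minimal base-R sketch of that idea (not Seurat's implementation), each point's neighbourhood is taken to be itself plus its `k` nearest points, and the edge weight between two points is the Jaccard overlap of their neighbourhoods. The function `snn_graph()` and its `k` argument are hypothetical; Seurat additionally prunes weak edges and then runs modularity-based clustering in `FindClusters()`, which this sketch does not reproduce.

```r
# Hypothetical sketch: shared-nearest-neighbour (SNN) weights for the rows of x,
# using the Jaccard overlap of each pair's k-nearest-neighbour sets.
snn_graph <- function(x, k = 2) {
  d <- as.matrix(dist(x))                 # Euclidean distances between rows
  n <- nrow(d)
  # Neighbourhood of point i: itself (distance 0) plus its k nearest points.
  nn <- lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k + 1)])
  w <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      shared <- length(intersect(nn[[i]], nn[[j]]))
      # Jaccard index: |A and B| / |A or B|, with |A| = |B| = k + 1.
      w[i, j] <- shared / (2 * (k + 1) - shared)
    }
  }
  w
}

# Two well-separated groups of three points each (toy data).
x <- rbind(a1 = c(0, 0),   a2 = c(0, 1),   a3 = c(1, 0),
           b1 = c(10, 10), b2 = c(10, 11), b3 = c(11, 10))
w <- snn_graph(x, k = 2)   # within-group weights are 1, between-group weights 0
```

Weighting edges by shared neighbours rather than raw distance makes the graph less sensitive to differences in local density, which is the motivation for SNN-based clustering.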
Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1(14), 281-297. doi:10.1007/978-3-642-05177-7_26
Butler, A., Hoffman, P., Smibert, P., Papalexi, E., & Satija, R. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5), 411-420. doi:10.1038/nbt.4096
## Load airway dataset for examples
data('airway', package = 'airway')
# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex
if (FALSE) { # \dontrun{
cluster_elements(airway, centers = 2, method="kmeans")
} # }