Get clusters of elements (e.g., samples or transcripts)

cluster_elements() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and identifies clusters in the data.

cluster_elements(.data, method, of_samples = TRUE, transform = log1p, ...)

# S4 method for class 'SummarizedExperiment'
cluster_elements(.data, method, of_samples = TRUE, transform = log1p, ...)

# S4 method for class 'RangedSummarizedExperiment'
cluster_elements(.data, method, of_samples = TRUE, transform = log1p, ...)

Arguments

.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
method: A character string. The clustering algorithm to use, e.g. "kmeans". K-means and SNN clustering are currently supported (see Details).
of_samples: A boolean. If the input is a tidybulk object, it indicates whether the elements to cluster are taken from the sample column (TRUE) or the transcript column (FALSE).
transform: A function used to transform the counts. The default is log1p, which suits RNA sequencing data; to avoid any transformation, use identity.
...: Further parameters passed to the underlying clustering function (kmeans for k-means; see Details for the SNN call)
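
As a rough illustration of what `of_samples` and `transform` control, the sketch below uses base R only. It is not the tidybulk implementation: the toy data and the helper `element_matrix()` are hypothetical, and it assumes that clustering transcripts instead of samples amounts to transposing the abundance matrix.

```r
# Hypothetical long-format input: one row per sample/transcript pair.
# expand.grid() varies `sample` fastest, so counts are listed per transcript.
long <- expand.grid(
  sample = paste0("s", 1:4),
  transcript = paste0("t", 1:3),
  stringsAsFactors = FALSE
)
long$count <- c(10, 12, 900, 950,  # t1 across s1..s4
                0, 1, 400, 420,    # t2 across s1..s4
                5, 4, 700, 650)    # t3 across s1..s4

# Wide matrix with samples in rows and transcripts in columns
wide <- tapply(long$count, list(long$sample, long$transcript), sum)

# Hypothetical helper (not part of tidybulk): transform the counts, then
# orient the matrix so that the elements to cluster are in rows.
element_matrix <- function(m, of_samples = TRUE, transform = log1p) {
  m <- transform(m)
  if (of_samples) m else t(m)
}

samples_mat     <- element_matrix(wide)                        # rows are samples
transcripts_mat <- element_matrix(wide, of_samples = FALSE)    # rows are transcripts
raw_mat         <- element_matrix(wide, transform = identity)  # counts left as-is
```

Passing `identity` leaves the counts untouched, which may be useful when the abundances have already been transformed upstream.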

Value

A tbl object with additional columns containing the cluster labels

A `SummarizedExperiment` object

Details

Lifecycle: maturing

Identifies clusters in the data, normally of samples. This function returns a tibble with additional columns for the cluster annotation. At the moment only k-means (DOI: 10.2307/2346830) and SNN clustering (DOI: 10.1016/j.cell.2019.05.031) are supported; the plan is to introduce more clustering methods.

Underlying method for kmeans: kmeans(.data, iter.max = 1000, ...)
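
To make the k-means step concrete, here is a minimal sketch with `stats::kmeans()` on a made-up count matrix. It assumes the input has already been reshaped so that the elements to cluster (here, samples) are in rows. The data, object names and the choice of `centers = 2` are illustrative only, and the code does not reproduce tidybulk internals.

```r
# Toy count matrix: 6 samples (rows) x 4 transcripts (columns).
# All values are simulated for illustration.
set.seed(42)
counts <- rbind(
  matrix(rpois(12, lambda = 5),   nrow = 3),  # low-abundance group
  matrix(rpois(12, lambda = 500), nrow = 3)   # high-abundance group
)
rownames(counts) <- paste0("sample_", 1:6)

# Apply the default transform, then cluster the rows with k-means.
# `centers` stands in for an argument that would be forwarded through `...`.
fit <- kmeans(log1p(counts), centers = 2, iter.max = 1000)

# One integer cluster label per sample
cluster_labels <- fit$cluster
print(cluster_labels)
```

The label numbers themselves are arbitrary (cluster 1 and cluster 2 can swap between runs); only the grouping of samples is meaningful.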

Underlying method for SNN: .data |> Seurat::CreateSeuratObject() |> Seurat::ScaleData(display.progress = TRUE, num.cores = 4, do.par = TRUE) |> Seurat::FindVariableFeatures(selection.method = "vst") |> Seurat::RunPCA(npcs = 30) |> Seurat::FindNeighbors() |> Seurat::FindClusters(method = "igraph", ...)

References

Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1(14), 281-297. doi:10.1007/978-3-642-05177-7_26

Butler, A., Hoffman, P., Smibert, P., Papalexi, E., & Satija, R. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5), 411-420. doi:10.1038/nbt.4096

Examples

## Load airway dataset for examples
data('airway', package = 'airway')

# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex

if (FALSE) { # \dontrun{
    cluster_elements(airway, centers = 2, method = "kmeans")
} # }