R/reduce_dimensions.R
reduce_dimensions-methods.Rd
reduce_dimensions() takes as input A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) and calculates the reduced dimensional space of the transcript abundance.
reduce_dimensions(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
...,
log_transform = NULL
)
# S4 method for class 'SummarizedExperiment'
reduce_dimensions(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
...,
log_transform = NULL
)
# S4 method for class 'RangedSummarizedExperiment'
reduce_dimensions(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
...,
log_transform = NULL
)
A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
The name of the element column (normally samples).
The name of the feature column (normally transcripts/genes)
The name of the column including the numerical value the clustering is based on (normally transcript abundance)
A character string. The dimension reduction algorithm to use (PCA, MDS, tSNE).
An integer. The number of dimensions your are interested in (e.g., 4 for returning the first four principal components).
An integer. How many top genes to select for dimensionality reduction
A boolean. In case the input is a tidybulk object, it indicates Whether the element column will be sample or transcript column
A function that will tranform the counts, by default it is log1p for RNA sequencing data, but for avoinding tranformation you can use identity
A boolean for method="PCA", this will be passed to the `prcomp` function. It is not included in the ... argument because although the default for `prcomp` if FALSE, it is advisable to set it as TRUE.
Further parameters passed to the function prcomp if you choose method="PCA" or Rtsne if you choose method="tSNE", or uwot::tumap if you choose method="umap"
DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data)
A tbl object with additional columns for the reduced dimensions
A `SummarizedExperiment` object
A `SummarizedExperiment` object
`r lifecycle::badge("maturing")`
This function reduces the dimensions of the transcript abundances. It can use multi-dimensional scaling (MDS; DOI.org/10.1186/gb-2010-11-3-r25), principal component analysis (PCA), or tSNE (Jesse Krijthe et al. 2018)
Underlying method for PCA: prcomp(scale = scale, ...)
Underlying method for MDS: limma::plotMDS(ndim = .dims, plot = FALSE, top = top)
Underlying method for tSNE: Rtsne::Rtsne(data, ...)
Underlying method for UMAP:
df_source = .data |>
# Filter NA symbol filter(!!.feature |> is.na() |> not()) |>
# Prepare data frame distinct(!!.feature,!!.element,!!.abundance) |>
# Filter most variable genes keep_variable_transcripts(top) |> reduce_dimensions(method="PCA", .dims = calculate_for_pca_dimensions ) |> as_matrix(rownames = quo_name(.element)) |> uwot::tumap(...)
Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7
Krijthe, J. H. (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation. R package version 0.15.
McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
## Load airway dataset for examples
data('airway', package = 'airway')
# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex
counts.MDS =
airway |>
identify_abundant() |>
reduce_dimensions( method="MDS", .dims = 3)
#> Warning: All samples appear to belong to the same group.
#> Getting the 500 most variable genes
#> [1] "MDS result_df colnames: sample, 1, 2, 3"
#> tidybulk says: to access the raw results do `metadata(.)$tidybulk$MDS`
counts.PCA =
airway |>
identify_abundant() |>
reduce_dimensions(method="PCA", .dims = 3)
#> Warning: All samples appear to belong to the same group.
#> Getting the 500 most variable genes
#> Fraction of variance explained by the selected principal components
#> # A tibble: 3 × 2
#> `Fraction of variance` PC
#> <dbl> <int>
#> 1 0.0650 1
#> 2 0.0453 2
#> 3 0.0233 3
#> tidybulk says: to access the raw results do `metadata(.)$tidybulk$PCA`