R/methods.R, R/methods_SE.R
reduce_dimensions-methods.Rd
reduce_dimensions() takes as input a `tbl` (with at least three columns for sample, feature, and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and calculates the reduced dimensional space of the transcript abundance.
reduce_dimensions(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
action = "add",
...,
log_transform = NULL
)
# S4 method for class 'spec_tbl_df'
reduce_dimensions(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
action = "add",
...,
log_transform = NULL
)
# S4 method for class 'tbl_df'
reduce_dimensions(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
action = "add",
...,
log_transform = NULL
)
# S4 method for class 'tidybulk'
reduce_dimensions(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
action = "add",
...,
log_transform = NULL
)
# S4 method for class 'SummarizedExperiment'
reduce_dimensions(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
action = "add",
...,
log_transform = NULL
)
# S4 method for class 'RangedSummarizedExperiment'
reduce_dimensions(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
.dims = 2,
top = 500,
of_samples = TRUE,
transform = log1p,
scale = TRUE,
action = "add",
...,
log_transform = NULL
)
`.data`: A `tbl` (with at least three columns for sample, feature, and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)).
`.element`: The name of the element column (normally samples).
`.feature`: The name of the feature column (normally transcripts/genes).
`.abundance`: The name of the column containing the numerical value the dimensionality reduction is based on (normally transcript abundance).
`method`: A character string. The dimension reduction algorithm to use (PCA, MDS, tSNE, or UMAP).
`.dims`: An integer. The number of dimensions you are interested in (e.g., 4 for returning the first four principal components).
`top`: An integer. How many of the most variable genes to select for dimensionality reduction.
`of_samples`: A boolean. If the input is a tidybulk object, it indicates whether the element column is the sample column (TRUE) or the transcript column (FALSE).
`transform`: A function that transforms the counts. The default is log1p, which suits RNA sequencing data; to avoid transformation, use identity.
`scale`: A boolean for method="PCA"; it is passed to the `prcomp` function. It is not included in the ... argument because, although the default for `prcomp` is FALSE, setting it to TRUE is advisable.
`action`: A character string. Whether to join the new information to the input tbl (add), or just get the non-redundant tbl with the new information (get).
`...`: Further parameters passed to prcomp if you choose method="PCA", to Rtsne if you choose method="tSNE", or to uwot::tumap if you choose method="umap".
`log_transform`: DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data).
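The preprocessing implied by `transform` and `top` can be illustrated with a minimal base-R sketch. It assumes a toy genes-by-samples count matrix invented for illustration, and it assumes that variable genes are ranked on the transformed scale; it is not tidybulk's internal code, and the object names are hypothetical.

```r
# Hypothetical toy data: 6 genes x 4 samples of raw counts (not from tidybulk).
set.seed(1)
counts <- matrix(
  rpois(24, lambda = rep(c(5, 50, 500, 5, 50, 500), times = 4)),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4))
)

# transform = log1p computes log(1 + x), which is defined for zero counts.
transformed <- log1p(counts)

# transform = identity returns the counts unchanged.
untransformed <- identity(counts)

# top = 3 (illustrative): keep the 3 genes with the highest variance
# across samples. Ranking on the transformed scale is an assumption here.
gene_variance <- apply(transformed, 1, var)
top_genes <- names(sort(gene_variance, decreasing = TRUE))[1:3]
selected <- transformed[top_genes, , drop = FALSE]

dim(selected)
```

Passing `identity` instead of `log1p` is the choice to make when the abundance column is already on a log or otherwise normalised scale.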
A tbl object with additional columns for the reduced dimensions
A `SummarizedExperiment` object
Lifecycle: maturing
This function reduces the dimensions of the transcript abundances. It can use multidimensional scaling (MDS; DOI.org/10.1186/gb-2010-11-3-r25), principal component analysis (PCA), tSNE (Jesse Krijthe et al. 2018), or UMAP (via uwot::tumap).
Underlying method for PCA: prcomp(scale = scale, ...)
Underlying method for MDS: limma::plotMDS(ndim = .dims, plot = FALSE, top = top)
Underlying method for tSNE: Rtsne::Rtsne(data, ...)
Underlying method for UMAP:
df_source =
  .data |>
  # Filter NA symbol
  filter(!!.feature |> is.na() |> not()) |>
  # Prepare data frame
  distinct(!!.feature, !!.element, !!.abundance) |>
  # Filter most variable genes
  keep_variable_transcripts(top) |>
  reduce_dimensions(method = "PCA", .dims = calculate_for_pca_dimensions, action = "get") |>
  as_matrix(rownames = quo_name(.element)) |>
  uwot::tumap(...)
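To make the underlying calls more concrete, here is a hedged base-R sketch on a synthetic matrix. It shows `prcomp` with scaling switched on and, as a stand-in for `limma::plotMDS`, classical MDS via `stats::cmdscale` on Euclidean distances. The stand-in uses a different distance from limma's, the sample-in-rows orientation is an assumption, and all object names are hypothetical.

```r
set.seed(42)

# Hypothetical log-scale abundances: 50 genes (rows) x 8 samples (columns).
log_abundance <- matrix(
  rnorm(50 * 8, mean = 5, sd = 2),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("sample", 1:8))
)

n_dims <- 3  # plays the role of .dims

# PCA of samples: prcomp treats rows as observations, so transpose
# (this orientation is an assumption of the sketch).
# scale. = TRUE gives every gene unit variance before the decomposition.
pca <- prcomp(t(log_abundance), scale. = TRUE)
pca_coords <- pca$x[, seq_len(n_dims)]

# Fraction of variance explained by the retained components.
variance_fraction <- (pca$sdev^2 / sum(pca$sdev^2))[seq_len(n_dims)]

# Classical MDS on Euclidean distances between samples.
# This is NOT limma::plotMDS; it only illustrates the idea of MDS.
mds_coords <- cmdscale(dist(t(log_abundance)), k = n_dims)

dim(pca_coords)
dim(mds_coords)
```

Both results have one row per sample and `n_dims` columns, which is the shape of the reduced-dimension columns described under the return value.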
counts.MDS =
tidybulk::se_mini |>
identify_abundant() |>
reduce_dimensions(method="MDS", .dims = 3)
#> No group or design set. Assuming all samples belong to one group.
#> Getting the 182 most variable genes
#> tidybulk says: to access the raw results do `attr(..., "internals")$MDS`
counts.PCA =
tidybulk::se_mini |>
identify_abundant() |>
reduce_dimensions(method="PCA", .dims = 3)
#> No group or design set. Assuming all samples belong to one group.
#> Getting the 182 most variable genes
#> Fraction of variance explained by the selected principal components
#> # A tibble: 3 × 2
#>   `Fraction of variance`    PC
#>                    <dbl> <int>
#> 1                 0.598      1
#> 2                 0.302      2
#> 3                 0.0852     3
#> tidybulk says: to access the raw results do `attr(..., "internals")$PCA`