Dimension reduction of the transcript abundance data — reduce

reduce_dimensions() takes as input A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) and calculates the reduced dimensional space of the transcript abundance.

reduce_dimensions(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  .dims = 2,
  top = 500,
  of_samples = TRUE,
  transform = log1p,
  scale = TRUE,
  action = "add",
  ...,
  log_transform = NULL
)

# S4 method for class 'spec_tbl_df'
reduce_dimensions(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  .dims = 2,
  top = 500,
  of_samples = TRUE,
  transform = log1p,
  scale = TRUE,
  action = "add",
  ...,
  log_transform = NULL
)

# S4 method for class 'tbl_df'
reduce_dimensions(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  .dims = 2,
  top = 500,
  of_samples = TRUE,
  transform = log1p,
  scale = TRUE,
  action = "add",
  ...,
  log_transform = NULL
)

# S4 method for class 'tidybulk'
reduce_dimensions(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  .dims = 2,
  top = 500,
  of_samples = TRUE,
  transform = log1p,
  scale = TRUE,
  action = "add",
  ...,
  log_transform = NULL
)

# S4 method for class 'SummarizedExperiment'
reduce_dimensions(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  .dims = 2,
  top = 500,
  of_samples = TRUE,
  transform = log1p,
  scale = TRUE,
  action = "add",
  ...,
  log_transform = NULL
)

# S4 method for class 'RangedSummarizedExperiment'
reduce_dimensions(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  .dims = 2,
  top = 500,
  of_samples = TRUE,
  transform = log1p,
  scale = TRUE,
  action = "add",
  ...,
  log_transform = NULL
)

Arguments

.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
.element: The name of the element column (normally samples).
.feature: The name of the feature column (normally transcripts/genes)
.abundance: The name of the column including the numerical value the clustering is based on (normally transcript abundance)
method: A character string. The dimension reduction algorithm to use (PCA, MDS, tSNE).
.dims: An integer. The number of dimensions your are interested in (e.g., 4 for returning the first four principal components).
top: An integer. How many top genes to select for dimensionality reduction
of_samples: A boolean. In case the input is a tidybulk object, it indicates Whether the element column will be sample or transcript column
transform: A function that will tranform the counts, by default it is log1p for RNA sequencing data, but for avoinding tranformation you can use identity
scale: A boolean for method="PCA", this will be passed to the `prcomp` function. It is not included in the ... argument because although the default for `prcomp` if FALSE, it is advisable to set it as TRUE.
action: A character string. Whether to join the new information to the input tbl (add), or just get the non-redundant tbl with the new information (get).
...: Further parameters passed to the function prcomp if you choose method="PCA" or Rtsne if you choose method="tSNE", or uwot::tumap if you choose method="umap"
log_transform: DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data)

Value

A tbl object with additional columns for the reduced dimensions

A `SummarizedExperiment` object

Details

`r lifecycle::badge("maturing")`

This function reduces the dimensions of the transcript abundances. It can use multi-dimensional scaling (MDS; DOI.org/10.1186/gb-2010-11-3-r25), principal component analysis (PCA), or tSNE (Jesse Krijthe et al. 2018)

Underlying method for PCA: prcomp(scale = scale, ...)

Underlying method for MDS: limma::plotMDS(ndim = .dims, plot = FALSE, top = top)

Underlying method for tSNE: Rtsne::Rtsne(data, ...)

Underlying method for UMAP:

df_source = .data |>

# Filter NA symbol filter(!!.feature |> is.na() |> not()) |>

# Prepare data frame distinct(!!.feature,!!.element,!!.abundance) |>

# Filter most variable genes keep_variable_transcripts(top) |> reduce_dimensions(method="PCA", .dims = calculate_for_pca_dimensions, action="get" ) |> as_matrix(rownames = quo_name(.element)) |> uwot::tumap(...)

Examples




counts.MDS =
 tidybulk::se_mini |>
 identify_abundant() |>
 reduce_dimensions( method="MDS", .dims = 3)
#> No group or design set. Assuming all samples belong to one group.
#> Getting the 182 most variable genes
#> tidybulk says: to access the raw results do `attr(..., "internals")$MDS`


counts.PCA =
 tidybulk::se_mini |>
 identify_abundant() |>
 reduce_dimensions(method="PCA", .dims = 3)
#> No group or design set. Assuming all samples belong to one group.
#> Getting the 182 most variable genes
#> Fraction of variance explained by the selected principal components
#> # A tibble: 3 × 2
#>   `Fraction of variance`    PC
#>                    <dbl> <int>
#> 1                 0.598      1
#> 2                 0.302      2
#> 3                 0.0852     3
#> tidybulk says: to access the raw results do `attr(..., "internals")$PCA`