scale_abundance() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and scales transcript abundance, compensating for sequencing depth (e.g., with the TMM algorithm, Robinson and Oshlack, doi.org/10.1186/gb-2010-11-3-r25).

scale_abundance(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  method = "TMM",
  reference_sample = NULL,
  .subset_for_scaling = NULL,
  action = "add",
  reference_selection_function = NULL
)

# S4 method for spec_tbl_df
scale_abundance(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  method = "TMM",
  reference_sample = NULL,
  .subset_for_scaling = NULL,
  action = "add",
  reference_selection_function = NULL
)

# S4 method for tbl_df
scale_abundance(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  method = "TMM",
  reference_sample = NULL,
  .subset_for_scaling = NULL,
  action = "add",
  reference_selection_function = NULL
)

# S4 method for tidybulk
scale_abundance(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  method = "TMM",
  reference_sample = NULL,
  .subset_for_scaling = NULL,
  action = "add",
  reference_selection_function = NULL
)

# S4 method for SummarizedExperiment
scale_abundance(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  method = "TMM",
  reference_sample = NULL,
  .subset_for_scaling = NULL,
  action = NULL,
  reference_selection_function = NULL
)

# S4 method for RangedSummarizedExperiment
scale_abundance(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  method = "TMM",
  reference_sample = NULL,
  .subset_for_scaling = NULL,
  action = NULL,
  reference_selection_function = NULL
)

Arguments

.data

A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))

.sample

The name of the sample column

.transcript

The name of the transcript/gene column

.abundance

The name of the transcript/gene abundance column

method

A character string. The scaling method passed to the back-end function edgeR::calcNormFactors. One of "TMM", "TMMwsp", "RLE" or "upperquartile".

reference_sample

A character string. The name of the reference sample. If NULL, the sample with the highest total read count is selected as the reference.
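
The default choice of reference can be illustrated in base R. This is a minimal sketch on a made-up count matrix, not the package's internal code; the object `counts` and its gene and sample names are hypothetical.

```r
# Hypothetical count matrix: rows are genes, columns are samples.
counts <- cbind(
  s1 = c(10, 200, 35),
  s2 = c(20, 400, 60),
  s3 = c(5, 150, 20)
)
rownames(counts) <- c("geneA", "geneB", "geneC")

# Total read count (library size) per sample.
library_size <- colSums(counts)

# The sample with the largest library size plays the role of the reference.
reference_sample <- names(which.max(library_size))
print(reference_sample)
```

Passing a sample name explicitly through `reference_sample` replaces this automatic choice.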

.subset_for_scaling

A gene-wise quosure condition. This will be used to filter rows (features/genes) of the dataset.

action

A character string, either "add" (default) or "only". "add" joins the new information to the input tbl; "only" returns a non-redundant tbl with just the new information.

reference_selection_function

DEPRECATED. Please use reference_sample instead.

Value

A tbl object with additional columns with scaled data as `<NAME OF COUNT COLUMN>_scaled`

A `SummarizedExperiment` object

Details

Lifecycle: maturing

Scales transcript abundance, compensating for sequencing depth (e.g., with the TMM algorithm, Robinson and Oshlack, doi.org/10.1186/gb-2010-11-3-r25). Lowly transcribed transcripts/genes (defined with the minimum_counts and minimum_proportion parameters of identify_abundant()) are excluded from the estimation of the scaling factors. The estimated factors are then applied back to the full data set, including the excluded transcripts/genes.
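
The filter-then-apply logic can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: plain library-size scaling stands in for TMM, the abundance rule (at least 10 counts in every sample) is made up for the example, and `counts`, `abundant` and `multiplier` are hypothetical names.

```r
# Hypothetical counts: rows are genes, columns are samples.
counts <- cbind(
  s1 = c(100, 200, 1, 0),
  s2 = c(200, 400, 0, 3),
  s3 = c(50, 100, 2, 0)
)
rownames(counts) <- c("geneA", "geneB", "geneC", "geneD")

# Step 1: flag abundant genes (illustrative rule, not the package's rule).
abundant <- apply(counts, 1, function(x) all(x >= 10))

# Step 2: estimate one factor per sample from the abundant genes only.
# Simple depth scaling towards the largest sample stands in for TMM here.
lib_size <- colSums(counts[abundant, , drop = FALSE])
multiplier <- max(lib_size) / lib_size

# Step 3: apply the factors to every gene, including those left out in step 1.
counts_scaled <- sweep(counts, 2, multiplier, `*`)
print(counts_scaled)
```

Estimating the factors on abundant genes only keeps noisy, near-zero features from distorting the per-sample factors, while the final multiplication still returns scaled values for every row.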

Underlying method: edgeR::calcNormFactors(.data, method = c("TMM","TMMwsp","RLE","upperquartile"))
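
For orientation, the back-end function can also be called directly on a count matrix. The sketch below uses simulated counts and assumes edgeR is installed; the link between `reference_sample` and `refColumn` is an assumption made for illustration, not something this page states.

```r
# Simulated counts (hypothetical data): 50 genes in rows, 4 samples in columns.
set.seed(1)
counts <- matrix(
  rpois(200, lambda = 50),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("s", 1:4))
)

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Use the sample with the largest library size as the reference column.
  ref <- which.max(colSums(counts))

  # TMM normalisation factors, one per sample.
  # Assumption: refColumn plays the role that reference_sample plays above.
  nf <- edgeR::calcNormFactors(counts, method = "TMM", refColumn = ref)
  print(nf)
}
```

The returned factors adjust the library sizes rather than the counts themselves; `scale_abundance()` takes care of turning them into scaled abundances.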

Examples

tidybulk::se_mini |>
  identify_abundant() |>
  scale_abundance()
#> No group or design set. Assuming all samples belong to one group.
#> tidybulk says: the sample with largest library size SRR1740035 was chosen as reference for scaling
#> class: SummarizedExperiment 
#> dim: 527 5 
#> metadata(0):
#> assays(2): count count_scaled
#> rownames(527): ABCB4 ABCB9 ... ZNF324 ZNF442
#> rowData names(2): entrez .abundant
#> colnames(5): SRR1740034 SRR1740035 SRR1740043 SRR1740058 SRR1740067
#> colData names(7): Cell.type time ... TMM multiplier