scale_abundance() takes as input A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) and Scales transcript abundance compansating for sequencing depth (e.g., with TMM algorithm, Robinson and Oshlack doi.org/10.1186/gb-2010-11-3-r25).
scale_abundance(
.data,
abundance = assayNames(.data)[1],
method = "TMM",
reference_sample = NULL,
.subset_for_scaling = NULL,
suffix = "_scaled",
reference_selection_function = NULL,
...,
.abundance = NULL
)
# S4 method for class 'SummarizedExperiment'
scale_abundance(
.data,
abundance = assayNames(.data)[1],
method = "TMM",
reference_sample = NULL,
.subset_for_scaling = NULL,
suffix = "_scaled",
reference_selection_function = NULL,
...,
.abundance = NULL
)
# S4 method for class 'RangedSummarizedExperiment'
scale_abundance(
.data,
abundance = assayNames(.data)[1],
method = "TMM",
reference_sample = NULL,
.subset_for_scaling = NULL,
suffix = "_scaled",
reference_selection_function = NULL,
...,
.abundance = NULL
)
A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
The name of the transcript/gene abundance column (character, preferred)
A character string. The scaling method passed to the back-end function (i.e., edgeR::calcNormFactors; "TMM","TMMwsp","RLE","upperquartile")
A character string. The name of the reference sample. If NULL the sample with highest total read count will be selected as reference.
A gene-wise quosure condition. This will be used to filter rows (features/genes) of the dataset. For example
A character string to append to the scaled abundance column name. Default is "_scaled".
DEPRECATED. please use reference_sample.
Further arguments.
DEPRECATED. The name of the transcript/gene abundance column (symbolic, for backward compatibility)
A tbl object with additional columns with scaled data as `<NAME OF COUNT COLUMN>_scaled`
A `SummarizedExperiment` object
A `SummarizedExperiment` object
`r lifecycle::badge("maturing")`
Scales transcript abundance compensating for sequencing depth (e.g., with TMM algorithm, Robinson and Oshlack doi.org/10.1186/gb-2010-11-3-r25). Lowly transcribed transcripts/genes (defined with minimum_counts and minimum_proportion parameters) are filtered out from the scaling procedure. The scaling inference is then applied back to all unfiltered data.
Underlying method edgeR::calcNormFactors(.data, method = c("TMM","TMMwsp","RLE","upperquartile"))
Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7
Robinson, M. D., & Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11(3), R25. doi:10.1186/gb-2010-11-3-r25
## Load airway dataset for examples
data('airway', package = 'airway')
# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex
airway |>
identify_abundant() |>
scale_abundance()
#> Warning: All samples appear to belong to the same group.
#> tidybulk says: the sample with largest library size SRR1039517 was chosen as reference for scaling
#> # A SummarizedExperiment-tibble abstraction: 509,416 × 28
#> # Features=63677 | Samples=8 | Assays=counts, counts_scaled
#> .feature .sample counts counts_scaled SampleName cell dex albut Run
#> <chr> <chr> <int> <dbl> <fct> <fct> <fct> <fct> <fct>
#> 1 ENSG00000000… SRR103… 679 940. GSM1275862 N613… untrt untrt SRR1…
#> 2 ENSG00000000… SRR103… 0 0 GSM1275862 N613… untrt untrt SRR1…
#> 3 ENSG00000000… SRR103… 467 646. GSM1275862 N613… untrt untrt SRR1…
#> 4 ENSG00000000… SRR103… 260 360. GSM1275862 N613… untrt untrt SRR1…
#> 5 ENSG00000000… SRR103… 60 83.1 GSM1275862 N613… untrt untrt SRR1…
#> 6 ENSG00000000… SRR103… 0 0 GSM1275862 N613… untrt untrt SRR1…
#> 7 ENSG00000000… SRR103… 3251 4500. GSM1275862 N613… untrt untrt SRR1…
#> 8 ENSG00000001… SRR103… 1433 1984. GSM1275862 N613… untrt untrt SRR1…
#> 9 ENSG00000001… SRR103… 519 718. GSM1275862 N613… untrt untrt SRR1…
#> 10 ENSG00000001… SRR103… 394 545. GSM1275862 N613… untrt untrt SRR1…
#> # ℹ 40 more rows
#> # ℹ 19 more variables: avgLength <int>, Experiment <fct>, Sample <fct>,
#> # BioSample <fct>, condition <fct>, TMM <dbl>, multiplier <dbl>,
#> # gene_id <chr>, gene_name <chr>, entrezid <int>, gene_biotype <chr>,
#> # gene_seq_start <int>, gene_seq_end <int>, seq_name <chr>, seq_strand <int>,
#> # seq_coord_system <int>, symbol <chr>, .abundant <lgl>, GRangesList <list>