R/adjust_abundance.R
adjust_abundance-methods.Rd
adjust_abundance() takes as input A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) and returns a consistent object (to the input) with an additional adjusted abundance column. This method uses scaled counts if present.
adjust_abundance(
.data,
.formula = NULL,
.factor_unwanted = NULL,
.factor_of_interest = NULL,
abundance = assayNames(.data)[1],
.abundance = NULL,
method = "combat_seq",
...,
log_transform = NULL,
transform = NULL,
inverse_transform = NULL
)
# S4 method for class 'SummarizedExperiment'
adjust_abundance(
.data,
.formula = NULL,
.factor_unwanted = NULL,
.factor_of_interest = NULL,
abundance = assayNames(.data)[1],
.abundance = NULL,
method = "combat_seq",
...,
log_transform = NULL,
transform = NULL,
inverse_transform = NULL
)
# S4 method for class 'RangedSummarizedExperiment'
adjust_abundance(
.data,
.formula = NULL,
.factor_unwanted = NULL,
.factor_of_interest = NULL,
abundance = assayNames(.data)[1],
.abundance = NULL,
method = "combat_seq",
...,
log_transform = NULL,
transform = NULL,
inverse_transform = NULL
)
A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
DEPRECATED - A formula with no response variable, representing the desired linear model where the first covariate is the factor of interest and the second covariate is the unwanted variation (of the kind ~ factor_of_interest + batch)
A tidy select, e.g. column names without double quotation. c(batch, country) These are the factor that we want to adjust for, including unwanted batcheffect, and unwanted biological effects.
A tidy select, e.g. column names without double quotation. c(treatment) These are the factor that we want to preserve.
The name of the transcript/gene abundance column (character, preferred)
DEPRECATED. The name of the transcript/gene abundance column (symbolic, for backward compatibility)
A character string. Methods include combat_seq (default), combat and limma_remove_batch_effect.
Further parameters passed to the function sva::ComBat
DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data)
DEPRECATED - A function that will tranform the counts, by default it is log1p for RNA sequencing data, but for avoinding tranformation you can use identity
DEPRECATED - A function that is the inverse of transform (e.g. expm1 is inverse of log1p). This is needed to tranform back the counts after analysis.
A consistent object (to the input) with additional columns for the adjusted counts as `<COUNT COLUMN>_adjusted`
A `SummarizedExperiment` object
A `SummarizedExperiment` object
`r lifecycle::badge("maturing")`
This function adjusts the abundance for (known) unwanted variation. At the moment just an unwanted covariate is allowed at a time using Combat (DOI: 10.1093/bioinformatics/bts034)
Underlying method: sva::ComBat(data, batch = my_batch, mod = design, prior.plots = FALSE, ...)
Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7
Zhang, Y., Parmigiani, G., & Johnson, W. E. (2020). ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genomics and Bioinformatics, 2(3), lqaa078. doi:10.1093/nargab/lqaa078
Johnson, W. E., Li, C., & Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1), 118–127. doi:10.1093/biostatistics/kxj037
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47. doi:10.1093/nar/gkv007
## Load airway dataset for examples
data('airway', package = 'airway')
# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- as.factor(SummarizedExperiment::colData(airway)$dex)
cm = airway
# Create a balanced two-level batch within each condition to avoid confounding
cond <- SummarizedExperiment::colData(cm)$condition
cm$batch <- rep(NA_character_, ncol(cm))
for (lev in unique(cond)) {
idx <- which(cond == lev)
cm$batch[idx] <- rep(c('A','B'), length.out = length(idx))
}
cm$batch <- as.factor(cm$batch)
cm |>
identify_abundant() |>
adjust_abundance( .factor_unwanted = batch, .factor_of_interest = condition, method="combat_seq" )
#> Warning: All samples appear to belong to the same group.
#> Found 2 batches
#> Using null model in ComBat-seq.
#> Adjusting for 1 covariate(s) or covariate level(s)
#> Estimating dispersions
#> Fitting the GLM model
#> Shrinkage off - using GLM estimates for parameters
#> Adjusting the data
#> # A SummarizedExperiment-tibble abstraction: 509,416 × 27
#> # Features=63677 | Samples=8 | Assays=counts, counts_adjusted
#> .feature .sample counts counts_adjusted SampleName cell dex albut Run
#> <chr> <chr> <int> <dbl> <fct> <fct> <fct> <fct> <fct>
#> 1 ENSG000000… SRR103… 679 694 GSM1275862 N613… untrt untrt SRR1…
#> 2 ENSG000000… SRR103… 0 0 GSM1275862 N613… untrt untrt SRR1…
#> 3 ENSG000000… SRR103… 467 455 GSM1275862 N613… untrt untrt SRR1…
#> 4 ENSG000000… SRR103… 260 256 GSM1275862 N613… untrt untrt SRR1…
#> 5 ENSG000000… SRR103… 60 56 GSM1275862 N613… untrt untrt SRR1…
#> 6 ENSG000000… SRR103… 0 0 GSM1275862 N613… untrt untrt SRR1…
#> 7 ENSG000000… SRR103… 3251 3879 GSM1275862 N613… untrt untrt SRR1…
#> 8 ENSG000000… SRR103… 1433 1465 GSM1275862 N613… untrt untrt SRR1…
#> 9 ENSG000000… SRR103… 519 569 GSM1275862 N613… untrt untrt SRR1…
#> 10 ENSG000000… SRR103… 394 374 GSM1275862 N613… untrt untrt SRR1…
#> # ℹ 40 more rows
#> # ℹ 18 more variables: avgLength <int>, Experiment <fct>, Sample <fct>,
#> # BioSample <fct>, condition <fct>, batch <fct>, gene_id <chr>,
#> # gene_name <chr>, entrezid <int>, gene_biotype <chr>, gene_seq_start <int>,
#> # gene_seq_end <int>, seq_name <chr>, seq_strand <int>,
#> # seq_coord_system <int>, symbol <chr>, .abundant <lgl>, GRangesList <list>