adjust_abundance() takes as input A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) and returns a consistent object (to the input) with an additional adjusted abundance column. This method uses scaled counts if present.

adjust_abundance(
  .data,
  .formula = NULL,
  .factor_unwanted = NULL,
  .factor_of_interest = NULL,
  abundance = assayNames(.data)[1],
  .abundance = NULL,
  method = "combat_seq",
  ...,
  log_transform = NULL,
  transform = NULL,
  inverse_transform = NULL
)

# S4 method for class 'SummarizedExperiment'
adjust_abundance(
  .data,
  .formula = NULL,
  .factor_unwanted = NULL,
  .factor_of_interest = NULL,
  abundance = assayNames(.data)[1],
  .abundance = NULL,
  method = "combat_seq",
  ...,
  log_transform = NULL,
  transform = NULL,
  inverse_transform = NULL
)

# S4 method for class 'RangedSummarizedExperiment'
adjust_abundance(
  .data,
  .formula = NULL,
  .factor_unwanted = NULL,
  .factor_of_interest = NULL,
  abundance = assayNames(.data)[1],
  .abundance = NULL,
  method = "combat_seq",
  ...,
  log_transform = NULL,
  transform = NULL,
  inverse_transform = NULL
)

Arguments

.data

A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))

.formula

DEPRECATED - A formula with no response variable, representing the desired linear model where the first covariate is the factor of interest and the second covariate is the unwanted variation (of the kind ~ factor_of_interest + batch)

.factor_unwanted

A tidy select, e.g. column names without double quotation. c(batch, country) These are the factor that we want to adjust for, including unwanted batcheffect, and unwanted biological effects.

.factor_of_interest

A tidy select, e.g. column names without double quotation. c(treatment) These are the factor that we want to preserve.

abundance

The name of the transcript/gene abundance column (character, preferred)

.abundance

DEPRECATED. The name of the transcript/gene abundance column (symbolic, for backward compatibility)

method

A character string. Methods include combat_seq (default), combat and limma_remove_batch_effect.

...

Further parameters passed to the function sva::ComBat

log_transform

DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data)

transform

DEPRECATED - A function that will tranform the counts, by default it is log1p for RNA sequencing data, but for avoinding tranformation you can use identity

inverse_transform

DEPRECATED - A function that is the inverse of transform (e.g. expm1 is inverse of log1p). This is needed to tranform back the counts after analysis.

Value

A consistent object (to the input) with additional columns for the adjusted counts as `<COUNT COLUMN>_adjusted`

A `SummarizedExperiment` object

A `SummarizedExperiment` object

Details

`r lifecycle::badge("maturing")`

This function adjusts the abundance for (known) unwanted variation. At the moment just an unwanted covariate is allowed at a time using Combat (DOI: 10.1093/bioinformatics/bts034)

Underlying method: sva::ComBat(data, batch = my_batch, mod = design, prior.plots = FALSE, ...)

References

Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7

Zhang, Y., Parmigiani, G., & Johnson, W. E. (2020). ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genomics and Bioinformatics, 2(3), lqaa078. doi:10.1093/nargab/lqaa078

Johnson, W. E., Li, C., & Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1), 118–127. doi:10.1093/biostatistics/kxj037

Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47. doi:10.1093/nar/gkv007

Examples

## Load airway dataset for examples

  data('airway', package = 'airway')
  # Ensure a 'condition' column exists for examples expecting it

    SummarizedExperiment::colData(airway)$condition <- as.factor(SummarizedExperiment::colData(airway)$dex)





cm = airway
# Create a balanced two-level batch within each condition to avoid confounding
cond <- SummarizedExperiment::colData(cm)$condition
cm$batch <- rep(NA_character_, ncol(cm))
for (lev in unique(cond)) {
  idx <- which(cond == lev)
  cm$batch[idx] <- rep(c('A','B'), length.out = length(idx))

}
cm$batch <- as.factor(cm$batch)

cm |>
identify_abundant() |>
adjust_abundance(  .factor_unwanted = batch, .factor_of_interest =  condition, method="combat_seq"  )
#> Warning: All samples appear to belong to the same group.
#> Found 2 batches
#> Using null model in ComBat-seq.
#> Adjusting for 1 covariate(s) or covariate level(s)
#> Estimating dispersions
#> Fitting the GLM model
#> Shrinkage off - using GLM estimates for parameters
#> Adjusting the data
#> # A SummarizedExperiment-tibble abstraction: 509,416 × 27
#> # Features=63677 | Samples=8 | Assays=counts, counts_adjusted
#>    .feature    .sample counts counts_adjusted SampleName cell  dex   albut Run  
#>    <chr>       <chr>    <int>           <dbl> <fct>      <fct> <fct> <fct> <fct>
#>  1 ENSG000000… SRR103…    679             694 GSM1275862 N613… untrt untrt SRR1…
#>  2 ENSG000000… SRR103…      0               0 GSM1275862 N613… untrt untrt SRR1…
#>  3 ENSG000000… SRR103…    467             455 GSM1275862 N613… untrt untrt SRR1…
#>  4 ENSG000000… SRR103…    260             256 GSM1275862 N613… untrt untrt SRR1…
#>  5 ENSG000000… SRR103…     60              56 GSM1275862 N613… untrt untrt SRR1…
#>  6 ENSG000000… SRR103…      0               0 GSM1275862 N613… untrt untrt SRR1…
#>  7 ENSG000000… SRR103…   3251            3879 GSM1275862 N613… untrt untrt SRR1…
#>  8 ENSG000000… SRR103…   1433            1465 GSM1275862 N613… untrt untrt SRR1…
#>  9 ENSG000000… SRR103…    519             569 GSM1275862 N613… untrt untrt SRR1…
#> 10 ENSG000000… SRR103…    394             374 GSM1275862 N613… untrt untrt SRR1…
#> # ℹ 40 more rows
#> # ℹ 18 more variables: avgLength <int>, Experiment <fct>, Sample <fct>,
#> #   BioSample <fct>, condition <fct>, batch <fct>, gene_id <chr>,
#> #   gene_name <chr>, entrezid <int>, gene_biotype <chr>, gene_seq_start <int>,
#> #   gene_seq_end <int>, seq_name <chr>, seq_strand <int>,
#> #   seq_coord_system <int>, symbol <chr>, .abundant <lgl>, GRangesList <list>