Perform differential expression testing using edgeR quasi-likelihood (QLF), edgeR likelihood-ratio (LRT), limma-voom, limma-voom-with-quality-weights or DESeq2

test_differential_expression() is an alias for test_differential_abundance(). It takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and returns an object consistent with the input, with additional columns for the statistics from the hypothesis test.

test_differential_expression(
  .data,
  .formula,
  abundance = assayNames(.data)[1],
  contrasts = NULL,
  method = c("edgeR_quasi_likelihood", "edgeR_likelihood_ratio",
    "edger_robust_likelihood_ratio", "DESeq2", "limma_voom", "limma_voom_sample_weights",
    "glmmseq_lme4", "glmmseq_glmmtmb"),
  test_above_log2_fold_change = NULL,
  scaling_method = "TMM",
  omit_contrast_in_colnames = FALSE,
  prefix = "",
  ...,
  significance_threshold = NULL,
  fill_missing_values = NULL,
  .contrasts = NULL,
  .abundance = NULL
)

# S4 method for class 'SummarizedExperiment'
test_differential_expression(
  .data,
  .formula,
  abundance = assayNames(.data)[1],
  contrasts = NULL,
  method = c("edgeR_quasi_likelihood", "edgeR_likelihood_ratio",
    "edger_robust_likelihood_ratio", "DESeq2", "limma_voom", "limma_voom_sample_weights",
    "glmmseq_lme4", "glmmseq_glmmtmb"),
  test_above_log2_fold_change = NULL,
  scaling_method = "TMM",
  omit_contrast_in_colnames = FALSE,
  prefix = "",
  ...,
  significance_threshold = NULL,
  fill_missing_values = NULL,
  .contrasts = NULL,
  .abundance = NULL
)

# S4 method for class 'RangedSummarizedExperiment'
test_differential_expression(
  .data,
  .formula,
  abundance = assayNames(.data)[1],
  contrasts = NULL,
  method = c("edgeR_quasi_likelihood", "edgeR_likelihood_ratio",
    "edger_robust_likelihood_ratio", "DESeq2", "limma_voom", "limma_voom_sample_weights",
    "glmmseq_lme4", "glmmseq_glmmtmb"),
  test_above_log2_fold_change = NULL,
  scaling_method = "TMM",
  omit_contrast_in_colnames = FALSE,
  prefix = "",
  ...,
  significance_threshold = NULL,
  fill_missing_values = NULL,
  .contrasts = NULL,
  .abundance = NULL
)

Arguments

.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
.formula: A formula representing the desired linear model. If there is more than one factor, they should be in the order factor of interest + additional factors.
abundance: The name of the transcript/gene abundance column (character, preferred)
contrasts: This parameter takes the format of the contrast parameter of the method of choice. For edgeR and limma-voom, it is a character vector. For DESeq2, it is a list including a character vector of length three. The first covariate is the one the model is tested against (e.g., ~ factor_of_interest).
method: A character vector. Available methods are "edgeR_quasi_likelihood" (i.e., QLF), "edgeR_likelihood_ratio" (i.e., LRT), "edger_robust_likelihood_ratio", "DESeq2", "limma_voom", "limma_voom_sample_weights", "glmmseq_lme4", "glmmseq_glmmtmb". Only one method can be specified at a time.
test_above_log2_fold_change: A positive real value. This works for edgeR and limma_voom methods. It uses the `treat` function, which tests whether the difference in abundance is larger than this threshold rather than zero (https://pubmed.ncbi.nlm.nih.gov/19176553).
scaling_method: A character string. The scaling method passed to the back-end functions: edgeR and limma-voom (i.e., edgeR::calcNormFactors; "TMM", "TMMwsp", "RLE", "upperquartile"). Setting the parameter to "none" skips the compensation for sequencing depth for the edgeR and limma_voom methods.
omit_contrast_in_colnames: If only one contrast is specified, you can choose to omit the contrast label from the column names.
prefix: A character string. The prefix you would like to add to the result columns. It is useful if you want to compare several methods.
...: Further arguments passed to some of the internal experimental functions. For example, for glmmSeq it is possible to pass the .dispersion and .scaling_factor columns (tidyeval) to skip the calculation of dispersion and scaling and use precalculated values. This is helpful if you want to calculate those quantities on many genes and do DE testing on fewer genes. .scaling_factor is the TMM value that can be obtained with tidybulk::scale_abundance.
significance_threshold: DEPRECATED - A real between 0 and 1 (usually 0.05).
fill_missing_values: DEPRECATED - A boolean. Whether to fill missing sample/transcript values with the median of the transcript. This is rarely needed.
.contrasts: DEPRECATED - This parameter takes the format of the contrast parameter of the method of choice. For edgeR and limma-voom, it is a character vector. For DESeq2, it is a list including a character vector of length three. The first covariate is the one the model is tested against (e.g., ~ factor_of_interest).
.abundance: DEPRECATED. The name of the transcript/gene abundance column (symbolic, for backward compatibility)
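
To make the `contrasts` format for edgeR and limma-voom concrete, the following base-R sketch shows where column names such as "conditiontrt" come from. The `metadata` data frame is a made-up stand-in for the sample annotation, and `contrast_vector` is an illustrative name, not a tidybulk object.

```r
# Hypothetical sample annotation with a two-level factor of interest
metadata <- data.frame(
  condition = factor(c("trt", "untrt", "trt", "untrt"))
)

# A no-intercept design has one column per level, named <variable><level>
design <- model.matrix(~ 0 + condition, data = metadata)
print(colnames(design))

# The contrast string "conditiontrt - conditionuntrt" corresponds to
# these weights on the design columns (illustrative only)
contrast_vector <- c(conditiontrt = 1, conditionuntrt = -1)

# The contrast must refer to existing design column names
stopifnot(identical(names(contrast_vector), colnames(design)))
```

For DESeq2 the same comparison is written as a list holding a length-three character vector, as in `list(c("condition", "trt", "untrt"))` in the Examples.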

Value

An object consistent with the input, with additional columns for the statistics from the test (e.g., log fold change, p-value and false discovery rate).

A `SummarizedExperiment` object

Details

Lifecycle: maturing

This function provides the option to use edgeR https://doi.org/10.1093/bioinformatics/btp616, limma-voom https://doi.org/10.1186/gb-2014-15-2-r29, limma_voom_sample_weights https://doi.org/10.1093/nar/gkv412 or DESeq2 https://doi.org/10.1186/s13059-014-0550-8 to perform the testing. All methods use raw counts, regardless of whether scale_abundance or adjust_abundance has been run; it is therefore essential to include covariates such as batch effects (if applicable) in the formula.
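
As a small illustration of placing the factor of interest first and a batch covariate after it, the base-R sketch below builds the design matrix for such a formula. The `metadata` values and the `batch` column are hypothetical.

```r
# Hypothetical annotation: factor of interest plus a batch covariate
metadata <- data.frame(
  condition = factor(rep(c("trt", "untrt"), times = 4)),
  batch     = factor(rep(c("A", "B"), each = 4))
)

# Factor of interest first, additional covariates after it
design <- model.matrix(~ condition + batch, data = metadata)
print(colnames(design))
```

With this ordering the coefficient for the factor of interest sits in the second design column, directly after the intercept.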

Underlying method for edgeR framework:

.data |>

  # Filter
  keep_abundant(
    factor_of_interest = !!(as.symbol(parse_formula(.formula)[1])),
    minimum_counts = minimum_counts,
    minimum_proportion = minimum_proportion
  ) |>

  # Format
  select(!!.transcript, !!.sample, !!.abundance) |>
  spread(!!.sample, !!.abundance) |>
  as_matrix(rownames = !!.transcript) |>

  # edgeR
  edgeR::DGEList(counts = .) |>
  edgeR::calcNormFactors(method = scaling_method) |>
  edgeR::estimateDisp(design) |>

  # Fit
  edgeR::glmQLFit(design) |>                            # or glmFit according to choice
  edgeR::glmQLFTest(coef = 2, contrast = my_contrasts)  # or glmLRT according to choice
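
The edgeR outline can be tried directly on simulated data. The following is a minimal sketch that calls edgeR on its own, assuming edgeR is installed; the simulated `counts` matrix and all object names are made up, and the filtering step is omitted for brevity. It is not tidybulk's internal code.

```r
library(edgeR)

set.seed(1)
n_genes <- 200
n_samples <- 6

# Simulated raw counts (negative binomial), genes in rows, samples in columns
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 5),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("sample", seq_len(n_samples))
  )
)
condition <- factor(rep(c("trt", "untrt"), each = 3))
design <- model.matrix(~ condition)

# Normalisation factors and dispersion estimation
y <- DGEList(counts = counts)
y <- calcNormFactors(y, method = "TMM")
y <- estimateDisp(y, design)

# Quasi-likelihood fit and F-test on the second design column
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef = 2)

# Full results table, one row per gene
de_table <- topTags(qlf, n = Inf)$table
head(de_table)
```

The statistic columns of `de_table` correspond to the ones appended by the edgeR quasi-likelihood method in the Examples (logFC, logCPM, F, PValue, FDR).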

Underlying method for DESeq2 framework:

keep_abundant(
  factor_of_interest = !!as.symbol(parse_formula(.formula)[[1]]),
  minimum_counts = minimum_counts,
  minimum_proportion = minimum_proportion
) |>

# DESeq2
DESeq2::DESeqDataSet(design = .formula) |>
DESeq2::DESeq() |>
DESeq2::results()
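
A comparable standalone sketch for the DESeq2 steps, assuming DESeq2 is installed. It uses DESeq2's own simulated dataset helper, whose `condition` factor has levels "A" and "B", rather than real data, and it is not tidybulk's internal code.

```r
library(DESeq2)

set.seed(1)

# Simulated dataset with 500 genes and 8 samples; the design is ~ condition
dds <- makeExampleDESeqDataSet(n = 500, m = 8)

# Size factors, dispersions, model fitting and testing
dds <- DESeq(dds, quiet = TRUE)

# Results for B versus A, using the length-three contrast form
res <- results(dds, contrast = c("condition", "B", "A"))
head(as.data.frame(res))
```

The columns of `res` are the ones appended by the DESeq2 method in the Examples (baseMean, log2FoldChange, lfcSE, stat, pvalue, padj).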

Underlying method for glmmSeq framework:

counts = .data |> assay(my_assay)

# Create design matrix for dispersion, removing random effects
design = model.matrix(
  object = .formula |> lme4::nobars(),
  data = metadata
)

dispersion = counts |> edgeR::estimateDisp(design = design)

glmmSeq(
  .formula,
  countdata = counts,
  metadata = metadata |> as.data.frame(),
  dispersion = dispersion,
  progress = TRUE,
  method = method |> str_remove("(?i)^glmmSeq_")
)
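
The step that removes random effects before building the dispersion design can be seen in isolation. This sketch assumes lme4 is installed; the `metadata` data frame and its `cell` labels are hypothetical.

```r
library(lme4)

# Mixed-model formula with a random intercept and slope per cell line
mixed_formula <- ~ condition + (1 + condition | cell)

# nobars() drops the random-effect terms and keeps the fixed part
fixed_formula <- nobars(mixed_formula)
print(fixed_formula)

# Hypothetical sample annotation
metadata <- data.frame(
  condition = factor(rep(c("trt", "untrt"), times = 4)),
  cell      = factor(rep(c("c1", "c2", "c3", "c4"), each = 2))
)

# Design matrix built from the fixed effects only
design <- model.matrix(fixed_formula, data = metadata)
print(colnames(design))
```

Only the fixed-effect design is used for dispersion estimation; the full formula, including the random effects, is still the one passed to glmmSeq.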

References

Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7

McCarthy, D. J., Chen, Y., & Smyth, G. K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research, 40(10), 4288-4297. doi:10.1093/nar/gks042

Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. doi:10.1186/s13059-014-0550-8

Law, C. W., Chen, Y., Shi, W., & Smyth, G. K. (2014). voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2), R29. doi:10.1186/gb-2014-15-2-r29

Examples

## Load airway dataset for examples

data('airway', package = 'airway')

# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex



 # edgeR (default method)

airway |>
  identify_abundant() |>
  test_differential_expression(~ condition, method = "edgeR_quasi_likelihood")
#> Warning: All samples appear to belong to the same group.
#> Warning: The `.abundance` argument of `test_differential_abundance()` is deprecated as
#> of tidybulk 2.0.0.
#> ℹ Please use the `abundance` argument instead.
#> ℹ The deprecated feature was likely used in the tidybulk package.
#>   Please report the issue at <https://github.com/stemangiola/tidybulk/issues>.
#> tidybulk says: The design column names are "(Intercept), conditionuntrt"
#> tidybulk says: to access the DE object do `metadata(.)$tidybulk$edgeR_quasi_likelihood_object`
#> tidybulk says: to access the raw results (fitted GLM) do `metadata(.)$tidybulk$edgeR_quasi_likelihood_fit`
#> # A SummarizedExperiment-tibble abstraction: 113,792 × 30
#> # Features=14224 | Samples=8 | Assays=counts
#>    .feature        .sample   counts SampleName cell  dex   albut Run   avgLength
#>    <chr>           <chr>      <int> <fct>      <fct> <fct> <fct> <fct>     <int>
#>  1 ENSG00000000003 SRR10395…    679 GSM1275862 N613… untrt untrt SRR1…       126
#>  2 ENSG00000000419 SRR10395…    467 GSM1275862 N613… untrt untrt SRR1…       126
#>  3 ENSG00000000457 SRR10395…    260 GSM1275862 N613… untrt untrt SRR1…       126
#>  4 ENSG00000000460 SRR10395…     60 GSM1275862 N613… untrt untrt SRR1…       126
#>  5 ENSG00000000971 SRR10395…   3251 GSM1275862 N613… untrt untrt SRR1…       126
#>  6 ENSG00000001036 SRR10395…   1433 GSM1275862 N613… untrt untrt SRR1…       126
#>  7 ENSG00000001084 SRR10395…    519 GSM1275862 N613… untrt untrt SRR1…       126
#>  8 ENSG00000001167 SRR10395…    394 GSM1275862 N613… untrt untrt SRR1…       126
#>  9 ENSG00000001460 SRR10395…    172 GSM1275862 N613… untrt untrt SRR1…       126
#> 10 ENSG00000001461 SRR10395…   2112 GSM1275862 N613… untrt untrt SRR1…       126
#> # ℹ 40 more rows
#> # ℹ 21 more variables: Experiment <fct>, Sample <fct>, BioSample <fct>,
#> #   condition <fct>, gene_id <chr>, gene_name <chr>, entrezid <int>,
#> #   gene_biotype <chr>, gene_seq_start <int>, gene_seq_end <int>,
#> #   seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
#> #   .abundant <lgl>, logFC <dbl>, logCPM <dbl>, F <dbl>, PValue <dbl>,
#> #   FDR <dbl>, GRangesList <list>

 # You can also explicitly specify the method
airway |>
  identify_abundant() |>
  test_differential_expression(~ condition, method = "edgeR_quasi_likelihood")
#> Warning: All samples appear to belong to the same group.
#> tidybulk says: The design column names are "(Intercept), conditionuntrt"
#> tidybulk says: to access the DE object do `metadata(.)$tidybulk$edgeR_quasi_likelihood_object`
#> tidybulk says: to access the raw results (fitted GLM) do `metadata(.)$tidybulk$edgeR_quasi_likelihood_fit`
#> # A SummarizedExperiment-tibble abstraction: 113,792 × 30
#> # Features=14224 | Samples=8 | Assays=counts
#>    .feature        .sample   counts SampleName cell  dex   albut Run   avgLength
#>    <chr>           <chr>      <int> <fct>      <fct> <fct> <fct> <fct>     <int>
#>  1 ENSG00000000003 SRR10395…    679 GSM1275862 N613… untrt untrt SRR1…       126
#>  2 ENSG00000000419 SRR10395…    467 GSM1275862 N613… untrt untrt SRR1…       126
#>  3 ENSG00000000457 SRR10395…    260 GSM1275862 N613… untrt untrt SRR1…       126
#>  4 ENSG00000000460 SRR10395…     60 GSM1275862 N613… untrt untrt SRR1…       126
#>  5 ENSG00000000971 SRR10395…   3251 GSM1275862 N613… untrt untrt SRR1…       126
#>  6 ENSG00000001036 SRR10395…   1433 GSM1275862 N613… untrt untrt SRR1…       126
#>  7 ENSG00000001084 SRR10395…    519 GSM1275862 N613… untrt untrt SRR1…       126
#>  8 ENSG00000001167 SRR10395…    394 GSM1275862 N613… untrt untrt SRR1…       126
#>  9 ENSG00000001460 SRR10395…    172 GSM1275862 N613… untrt untrt SRR1…       126
#> 10 ENSG00000001461 SRR10395…   2112 GSM1275862 N613… untrt untrt SRR1…       126
#> # ℹ 40 more rows
#> # ℹ 21 more variables: Experiment <fct>, Sample <fct>, BioSample <fct>,
#> #   condition <fct>, gene_id <chr>, gene_name <chr>, entrezid <int>,
#> #   gene_biotype <chr>, gene_seq_start <int>, gene_seq_end <int>,
#> #   seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
#> #   .abundant <lgl>, logFC <dbl>, logCPM <dbl>, F <dbl>, PValue <dbl>,
#> #   FDR <dbl>, GRangesList <list>

  # The function `test_differential_expression` operates with contrasts too

airway |>
  identify_abundant(factor_of_interest = condition) |>
  test_differential_expression(
    ~ 0 + condition,
    contrasts = c("conditiontrt - conditionuntrt"),
    method = "edgeR_quasi_likelihood"
  )
#> tidybulk says: The design column names are "conditiontrt, conditionuntrt"
#> tidybulk says: to access the DE object do `metadata(.)$tidybulk$edgeR_quasi_likelihood_object`
#> tidybulk says: to access the raw results (fitted GLM) do `metadata(.)$tidybulk$edgeR_quasi_likelihood_fit`
#> # A SummarizedExperiment-tibble abstraction: 127,408 × 30
#> # Features=15926 | Samples=8 | Assays=counts
#>    .feature        .sample   counts SampleName cell  dex   albut Run   avgLength
#>    <chr>           <chr>      <int> <fct>      <fct> <fct> <fct> <fct>     <int>
#>  1 ENSG00000000003 SRR10395…    679 GSM1275862 N613… untrt untrt SRR1…       126
#>  2 ENSG00000000419 SRR10395…    467 GSM1275862 N613… untrt untrt SRR1…       126
#>  3 ENSG00000000457 SRR10395…    260 GSM1275862 N613… untrt untrt SRR1…       126
#>  4 ENSG00000000460 SRR10395…     60 GSM1275862 N613… untrt untrt SRR1…       126
#>  5 ENSG00000000971 SRR10395…   3251 GSM1275862 N613… untrt untrt SRR1…       126
#>  6 ENSG00000001036 SRR10395…   1433 GSM1275862 N613… untrt untrt SRR1…       126
#>  7 ENSG00000001084 SRR10395…    519 GSM1275862 N613… untrt untrt SRR1…       126
#>  8 ENSG00000001167 SRR10395…    394 GSM1275862 N613… untrt untrt SRR1…       126
#>  9 ENSG00000001460 SRR10395…    172 GSM1275862 N613… untrt untrt SRR1…       126
#> 10 ENSG00000001461 SRR10395…   2112 GSM1275862 N613… untrt untrt SRR1…       126
#> # ℹ 40 more rows
#> # ℹ 21 more variables: Experiment <fct>, Sample <fct>, BioSample <fct>,
#> #   condition <fct>, gene_id <chr>, gene_name <chr>, entrezid <int>,
#> #   gene_biotype <chr>, gene_seq_start <int>, gene_seq_end <int>,
#> #   seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
#> #   .abundant <lgl>, `logFC___conditiontrt - conditionuntrt` <dbl>,
#> #   `logCPM___conditiontrt - conditionuntrt` <dbl>, …

# DESeq2 (usage is equivalent for limma-voom)

my_se_mini = airway
my_se_mini$condition  = factor(my_se_mini$condition)

# demonstrating with `fitType` that you can pass any argument on to DESeq()
my_se_mini |>
  identify_abundant(factor_of_interest = condition) |>
  test_differential_expression(~ condition, method = "deseq2", fitType = "local")
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> tidybulk says: to access the DE object do `metadata(.)$tidybulk$deseq2_object`
#> tidybulk says: to access the raw results (fitted GLM) do `metadata(.)$tidybulk$deseq2_fit`
#> # A SummarizedExperiment-tibble abstraction: 127,408 × 31
#> # Features=15926 | Samples=8 | Assays=counts
#>    .feature        .sample   counts SampleName cell  dex   albut Run   avgLength
#>    <chr>           <chr>      <int> <fct>      <fct> <fct> <fct> <fct>     <int>
#>  1 ENSG00000000003 SRR10395…    679 GSM1275862 N613… untrt untrt SRR1…       126
#>  2 ENSG00000000419 SRR10395…    467 GSM1275862 N613… untrt untrt SRR1…       126
#>  3 ENSG00000000457 SRR10395…    260 GSM1275862 N613… untrt untrt SRR1…       126
#>  4 ENSG00000000460 SRR10395…     60 GSM1275862 N613… untrt untrt SRR1…       126
#>  5 ENSG00000000971 SRR10395…   3251 GSM1275862 N613… untrt untrt SRR1…       126
#>  6 ENSG00000001036 SRR10395…   1433 GSM1275862 N613… untrt untrt SRR1…       126
#>  7 ENSG00000001084 SRR10395…    519 GSM1275862 N613… untrt untrt SRR1…       126
#>  8 ENSG00000001167 SRR10395…    394 GSM1275862 N613… untrt untrt SRR1…       126
#>  9 ENSG00000001460 SRR10395…    172 GSM1275862 N613… untrt untrt SRR1…       126
#> 10 ENSG00000001461 SRR10395…   2112 GSM1275862 N613… untrt untrt SRR1…       126
#> # ℹ 40 more rows
#> # ℹ 22 more variables: Experiment <fct>, Sample <fct>, BioSample <fct>,
#> #   condition <fct>, gene_id <chr>, gene_name <chr>, entrezid <int>,
#> #   gene_biotype <chr>, gene_seq_start <int>, gene_seq_end <int>,
#> #   seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
#> #   .abundant <lgl>, baseMean <dbl>, log2FoldChange <dbl>, lfcSE <dbl>,
#> #   stat <dbl>, pvalue <dbl>, padj <dbl>, GRangesList <list>

# testing above a log2 threshold, passes along value to lfcThreshold of results()
res <- my_se_mini |>
  identify_abundant(factor_of_interest = condition) |>
  test_differential_expression(
    ~ condition,
    method = "deseq2",
    fitType = "local",
    test_above_log2_fold_change = 4
  )
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> tidybulk says: to access the DE object do `metadata(.)$tidybulk$deseq2_object`
#> tidybulk says: to access the raw results (fitted GLM) do `metadata(.)$tidybulk$deseq2_fit`

# Use random intercept and random effect models

 airway[1:50,] |>
  identify_abundant(factor_of_interest = condition) |>
  test_differential_expression(
    ~ condition + (1 + condition | cell),
    method = "glmmseq_lme4", cores = 1
  )
#> 
#> n = 8 samples, 4 individuals
#> Time difference of 32.3986 secs
#> tidybulk says: to access the DE object do `metadata(.)$tidybulk$glmmseq_lme4_object`
#> tidybulk says: to access the raw results (fitted GLM) do `metadata(.)$tidybulk$glmmseq_lme4_fit`
#> # A SummarizedExperiment-tibble abstraction: 336 × 61
#> # Features=42 | Samples=8 | Assays=counts
#>    .feature        .sample   counts SampleName cell  dex   albut Run   avgLength
#>    <chr>           <chr>      <int> <fct>      <fct> <fct> <fct> <fct>     <int>
#>  1 ENSG00000000003 SRR10395…    679 GSM1275862 N613… untrt untrt SRR1…       126
#>  2 ENSG00000000419 SRR10395…    467 GSM1275862 N613… untrt untrt SRR1…       126
#>  3 ENSG00000000457 SRR10395…    260 GSM1275862 N613… untrt untrt SRR1…       126
#>  4 ENSG00000000460 SRR10395…     60 GSM1275862 N613… untrt untrt SRR1…       126
#>  5 ENSG00000000971 SRR10395…   3251 GSM1275862 N613… untrt untrt SRR1…       126
#>  6 ENSG00000001036 SRR10395…   1433 GSM1275862 N613… untrt untrt SRR1…       126
#>  7 ENSG00000001084 SRR10395…    519 GSM1275862 N613… untrt untrt SRR1…       126
#>  8 ENSG00000001167 SRR10395…    394 GSM1275862 N613… untrt untrt SRR1…       126
#>  9 ENSG00000001460 SRR10395…    172 GSM1275862 N613… untrt untrt SRR1…       126
#> 10 ENSG00000001461 SRR10395…   2112 GSM1275862 N613… untrt untrt SRR1…       126
#> # ℹ 32 more rows
#> # ℹ 52 more variables: Experiment <fct>, Sample <fct>, BioSample <fct>,
#> #   condition <fct>, gene_id <chr>, gene_name <chr>, entrezid <int>,
#> #   gene_biotype <chr>, gene_seq_start <int>, gene_seq_end <int>,
#> #   seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
#> #   .abundant <lgl>, Dispersion <dbl>, AIC <dbl>, logLik <dbl>, meanExp <dbl>,
#> #   `(Intercept)` <dbl>, conditionuntrt <dbl>, …

# confirm that lfcThreshold was used (uses `res` from the log2 threshold example above)

if (FALSE) { # \dontrun{
    res |>
        mcols() |>
        DESeq2::DESeqResults() |>
        DESeq2::plotMA()
} # }

# The function `test_differential_expression` operates with contrasts for DESeq2 too

my_se_mini |>
  identify_abundant() |>
  test_differential_expression(
    ~ 0 + condition,
    contrasts = list(c("condition", "trt", "untrt")),
    method = "deseq2",
    fitType = "local"
  )
#> Warning: All samples appear to belong to the same group.
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> tidybulk says: to access the DE object do `metadata(.)$tidybulk$deseq2_object`
#> tidybulk says: to access the raw results (fitted GLM) do `metadata(.)$tidybulk$deseq2_fit`
#> # A SummarizedExperiment-tibble abstraction: 113,792 × 31
#> # Features=14224 | Samples=8 | Assays=counts
#>    .feature        .sample   counts SampleName cell  dex   albut Run   avgLength
#>    <chr>           <chr>      <int> <fct>      <fct> <fct> <fct> <fct>     <int>
#>  1 ENSG00000000003 SRR10395…    679 GSM1275862 N613… untrt untrt SRR1…       126
#>  2 ENSG00000000419 SRR10395…    467 GSM1275862 N613… untrt untrt SRR1…       126
#>  3 ENSG00000000457 SRR10395…    260 GSM1275862 N613… untrt untrt SRR1…       126
#>  4 ENSG00000000460 SRR10395…     60 GSM1275862 N613… untrt untrt SRR1…       126
#>  5 ENSG00000000971 SRR10395…   3251 GSM1275862 N613… untrt untrt SRR1…       126
#>  6 ENSG00000001036 SRR10395…   1433 GSM1275862 N613… untrt untrt SRR1…       126
#>  7 ENSG00000001084 SRR10395…    519 GSM1275862 N613… untrt untrt SRR1…       126
#>  8 ENSG00000001167 SRR10395…    394 GSM1275862 N613… untrt untrt SRR1…       126
#>  9 ENSG00000001460 SRR10395…    172 GSM1275862 N613… untrt untrt SRR1…       126
#> 10 ENSG00000001461 SRR10395…   2112 GSM1275862 N613… untrt untrt SRR1…       126
#> # ℹ 40 more rows
#> # ℹ 22 more variables: Experiment <fct>, Sample <fct>, BioSample <fct>,
#> #   condition <fct>, gene_id <chr>, gene_name <chr>, entrezid <int>,
#> #   gene_biotype <chr>, gene_seq_start <int>, gene_seq_end <int>,
#> #   seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
#> #   .abundant <lgl>, `baseMean___condition trt-untrt` <dbl>,
#> #   `log2FoldChange___condition trt-untrt` <dbl>, …