keep_variable() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and returns an object of the same type as the input, filtered to keep only the most variable features.

keep_variable(
  .data,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = TRUE
)

# S4 method for class 'SummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)

# S4 method for class 'RangedSummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)

Arguments

.data

A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))

.abundance

The name of the transcript/gene abundance column

top

Integer. Number of top variable transcripts to keep

transform

A function that will transform the counts. The default is log1p, which suits RNA sequencing data; to avoid transformation, use identity

log_transform

DEPRECATED. Use transform instead.
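The choice of `transform` can change which features rank as most variable. The base-R sketch below illustrates this on a toy count matrix. It assumes the transform is applied to the counts before the variance ranking; `top_variable()` and the gene names are hypothetical helpers for illustration, not part of tidybulk.

```r
# Hypothetical helper (not a tidybulk function): rank rows by their
# population variance after applying `transform`, and return the top names.
top_variable <- function(x, top, transform = log1p) {
  x <- transform(x)
  # Subtracting rowMeans(x) centres each row (recycling runs down columns)
  s <- rowMeans((x - rowMeans(x))^2)
  rownames(x)[order(s, decreasing = TRUE)[seq_len(top)]]
}

# Toy counts: one highly expressed gene with a small relative change,
# one lowly expressed gene that switches on and off.
counts <- rbind(
  geneHigh = c(1000, 1100, 1000, 1100),
  geneLow  = c(0, 20, 0, 20)
)

on_log <- top_variable(counts, top = 1, transform = log1p)
on_raw <- top_variable(counts, top = 1, transform = identity)

print(on_log)  # ranking on log1p-transformed counts
print(on_raw)  # ranking on raw counts
```

On the log1p scale the on/off gene dominates, while on raw counts the highly expressed gene does, because raw variance scales with expression level.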

Value

An object of the same type as the input, filtered to the `top` most variable features.

Underlying method:

  s <- rowMeans((x - rowMeans(x)) ^ 2)
  o <- order(s, decreasing = TRUE)
  x <- x[o[1L:top], , drop = FALSE]
  variable_trancripts = rownames(x)
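To make the underlying method concrete, here is a minimal, self-contained sketch of the same ranking on a toy matrix, using base R only. The matrix, its gene and sample names, and the explicit `log1p` step are assumptions for illustration; the code does not call tidybulk.

```r
# Toy matrix: 5 features (rows) x 4 samples (columns); names are hypothetical.
x <- rbind(
  geneA = c(10, 10, 10, 10),
  geneB = c(0, 100, 0, 100),
  geneC = c(5, 6, 5, 6),
  geneD = c(1, 50, 1, 50),
  geneE = c(20, 22, 19, 21)
)
colnames(x) <- paste0("sample", 1:4)
top <- 2L

# Assumed pre-processing: apply the default transform before ranking
x_t <- log1p(x)

# Per-feature population variance: mean squared deviation from the row mean
s <- rowMeans((x_t - rowMeans(x_t))^2)

# Order features from most to least variable and keep the first `top`
o <- order(s, decreasing = TRUE)
x_top <- x_t[o[seq_len(top)], , drop = FALSE]
variable_features <- rownames(x_top)

print(variable_features)
```

`seq_len(top)` is used here instead of `1L:top` so that `top = 0` yields an empty selection rather than indexing rows 1 and 0.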

A `SummarizedExperiment` object

Details

Lifecycle: maturing

At the moment this function uses edgeR https://doi.org/10.1093/bioinformatics/btp616

References

Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7

Examples

## Load airway dataset for examples
library(tidybulk)
data('airway', package = 'airway')

# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex

# Keep the 500 most variable features
keep_variable(airway, top = 500)
#> Warning: tidybulk says: highly abundant transcripts were not identified (i.e. identify_abundant()) or filtered (i.e., keep_abundant), therefore this operation will be performed on unfiltered data. In rare occasions this could be wanted. In standard whole-transcriptome workflows is generally unwanted.
#> Getting the 500 most variable genes
#> # A SummarizedExperiment-tibble abstraction: 4,000 × 24
#> # Features=500 | Samples=8 | Assays=counts
#>    .feature        .sample   counts SampleName cell  dex   albut Run   avgLength
#>    <chr>           <chr>      <int> <fct>      <fct> <fct> <fct> <fct>     <int>
#>  1 ENSG00000129824 SRR10395…   4846 GSM1275862 N613… untrt untrt SRR1…       126
#>  2 ENSG00000229807 SRR10395…      0 GSM1275862 N613… untrt untrt SRR1…       126
#>  3 ENSG00000114374 SRR10395…   1358 GSM1275862 N613… untrt untrt SRR1…       126
#>  4 ENSG00000067048 SRR10395…   1507 GSM1275862 N613… untrt untrt SRR1…       126
#>  5 ENSG00000131002 SRR10395…    676 GSM1275862 N613… untrt untrt SRR1…       126
#>  6 ENSG00000012817 SRR10395…   1094 GSM1275862 N613… untrt untrt SRR1…       126
#>  7 ENSG00000184674 SRR10395…    392 GSM1275862 N613… untrt untrt SRR1…       126
#>  8 ENSG00000183878 SRR10395…    330 GSM1275862 N613… untrt untrt SRR1…       126
#>  9 ENSG00000109906 SRR10395…      4 GSM1275862 N613… untrt untrt SRR1…       126
#> 10 ENSG00000213058 SRR10395…    152 GSM1275862 N613… untrt untrt SRR1…       126
#> # ℹ 40 more rows
#> # ℹ 15 more variables: Experiment <fct>, Sample <fct>, BioSample <fct>,
#> #   condition <fct>, gene_id <chr>, gene_name <chr>, entrezid <int>,
#> #   gene_biotype <chr>, gene_seq_start <int>, gene_seq_end <int>,
#> #   seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
#> #   GRangesList <list>