keep_variable() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if viewed as a tibble with `library(tidySummarizedExperiment)`) and returns a consistent object (to the input) that retains only the `top` most variable transcripts.
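"Consistent object" means the output has the same format as the input, only with fewer features. The minimal base-R sketch below illustrates that idea for a long (tidy) table with sample, transcript and abundance columns. The helper `keep_variable_long()` is hypothetical and is not tidybulk's implementation. It assumes exactly one abundance value per sample/transcript pair and ranks transcripts by the per-row mean squared deviation of transformed counts.

```r
# Hypothetical helper, not part of tidybulk: keep the `top` most variable
# transcripts of a long table and return a table with the same columns.
keep_variable_long <- function(long, top = 500, transform = log1p) {
  # Reshape to a transcript x sample matrix (assumes one value per pair)
  mat <- tapply(long$abundance, list(long$transcript, long$sample), sum)
  x <- transform(mat)
  # Mean squared deviation from each row's mean
  s <- rowMeans((x - rowMeans(x))^2)
  keep <- names(sort(s, decreasing = TRUE))[seq_len(min(top, length(s)))]
  # Same columns as the input, only the rows of retained transcripts
  long[long$transcript %in% keep, , drop = FALSE]
}

long <- data.frame(
  sample = rep(c("s1", "s2", "s3"), times = 3),
  transcript = rep(c("geneA", "geneB", "geneC"), each = 3),
  abundance = c(10, 10, 10, 1, 100, 1000, 5, 6, 7)
)

filtered <- keep_variable_long(long, top = 1)  # keeps the three geneB rows
```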

keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = TRUE
)

# S4 method for spec_tbl_df
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = NULL
)

# S4 method for tbl_df
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = NULL
)

# S4 method for tidybulk
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = NULL
)

# S4 method for SummarizedExperiment
keep_variable(.data, top = 500, transform = log1p)

# S4 method for RangedSummarizedExperiment
keep_variable(.data, top = 500, transform = log1p)



`.data`: A `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if viewed as a tibble with `library(tidySummarizedExperiment)`).


`.sample`: The name of the sample column.


`.transcript`: The name of the transcript/gene column.


`.abundance`: The name of the transcript/gene abundance column.


`top`: Integer. Number of most variable transcripts to keep.


`transform`: A function used to transform the counts. The default, `log1p`, suits RNA sequencing data; to avoid any transformation, use `identity`.


`log_transform`: DEPRECATED. A boolean indicating whether the values should be log-transformed (e.g., TRUE for RNA sequencing data).
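The choice of `transform` can change which transcripts count as "most variable". The toy comparison below uses made-up counts and ranks rows by the mean squared deviation used in the underlying method. `row_msd()` is a hypothetical helper, and passing the raw matrix is equivalent to `transform = identity`. On raw counts the highly expressed gene dominates because of its large absolute spread. After `log1p`, the low-expressed gene with a 100-fold change ranks first.

```r
# Hypothetical helper: per-row mean squared deviation from the row mean
row_msd <- function(x) rowMeans((x - rowMeans(x))^2)

counts <- matrix(
  c(1000, 1200, 1400,  # high expression, modest relative change
       1,   10,  100), # low expression, 100-fold change
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneX", "geneY"), c("s1", "s2", "s3"))
)

top_raw <- names(which.max(row_msd(counts)))         # "geneX" (no transform)
top_log <- names(which.max(row_msd(log1p(counts))))  # "geneY" (log1p)
```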


A consistent object (to the input) containing only the `top` most variable transcripts.

Underlying method:

    s <- rowMeans((x - rowMeans(x))^2)
    o <- order(s, decreasing = TRUE)
    x <- x[o[1L:top], , drop = FALSE]
    variable_transcripts <- rownames(x)
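The selection step above can be wrapped into a self-contained base-R function for experimentation. This is a minimal sketch, not tidybulk's code: the name `select_top_variable()` is hypothetical, and the `min(top, nrow(x))` guard is an addition so that asking for more rows than exist does not index past the end of `o`. `x - rowMeans(x)` subtracts each row's mean because R recycles the length-`nrow` vector down each column.

```r
# Hypothetical helper: return the names of the `top` rows with the largest
# mean squared deviation of transformed counts.
select_top_variable <- function(counts, top = 500, transform = log1p) {
  x <- transform(counts)
  # Per-row mean squared deviation (variance with divisor n)
  s <- rowMeans((x - rowMeans(x))^2)
  o <- order(s, decreasing = TRUE)
  # Guard: never request more rows than the matrix has
  keep <- o[seq_len(min(top, nrow(x)))]
  rownames(x)[keep]
}

counts <- matrix(
  c(10, 10, 10,    # constant: zero variance
    1, 100, 1000,  # widely varying
    5, 6, 7),      # mildly varying
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

select_top_variable(counts, top = 2)  # returns c("geneB", "geneC")
```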


For `SummarizedExperiment` and `RangedSummarizedExperiment` inputs: a `SummarizedExperiment` object.


Lifecycle: maturing

At present, this function uses edgeR.


      top = 500
#> Warning: tidybulk says: highly abundant transcripts were not identified (i.e. identify_abundant()) or filtered (i.e., keep_abundant), therefore this operation will be performed on unfiltered data. In rare occasions this could be wanted. In standard whole-transcriptome workflows is generally unwanted.
#> Getting the 500 most variable genes
#> class: SummarizedExperiment 
#> dim: 500 5 
#> metadata(0):
#> assays(1): count
#> rownames(500): IGKC TCL1A ... ATXN8OS CCL13
#> rowData names(1): entrez
#> colnames(5): SRR1740034 SRR1740035 SRR1740043 SRR1740058 SRR1740067
#> colData names(5): Cell.type time condition days dead