Keep variable transcripts — keep

keep_variable() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) and returns a consistent object (to the input) containing only the most variable transcripts, that is, the `top` transcripts ranked by variance.

keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = TRUE
)

# S4 method for class 'spec_tbl_df'
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = NULL
)

# S4 method for class 'tbl_df'
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = NULL
)

# S4 method for class 'tidybulk'
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = NULL
)

# S4 method for class 'SummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)

# S4 method for class 'RangedSummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)

Arguments

.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
.sample: The name of the sample column
.transcript: The name of the transcript/gene column
.abundance: The name of the transcript/gene abundance column
top: Integer. Number of top transcripts to consider
transform: A function that will transform the counts. The default is log1p, which suits RNA sequencing data; to avoid transformation, use identity
log_transform: DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data)
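
The effect of the `transform` argument can be illustrated with a small base R sketch. This is not the tidybulk implementation: it assumes the transformation is applied to the counts before the per-transcript variance is computed, and the matrix `counts` and the helpers `row_variance()` and `rank_rows()` are hypothetical names introduced only for this illustration.

```r
# Made-up counts: rows are transcripts, columns are samples.
counts <- rbind(
  high_expr = c(1000, 1100, 900, 1000),
  low_expr  = c(0, 10, 0, 10)
)

# Hypothetical helper: mean squared deviation from the row mean.
row_variance <- function(x) rowMeans((x - rowMeans(x))^2)

# Hypothetical helper: transcript names ordered from most to least variable
# after applying `transform` to the counts (assumed order of operations).
rank_rows <- function(x, transform) {
  s <- row_variance(transform(x))
  names(sort(s, decreasing = TRUE))
}

rank_identity <- rank_rows(counts, identity)
rank_log1p <- rank_rows(counts, log1p)

print(rank_identity)
print(rank_log1p)
```

On the raw scale the highly expressed transcript ranks first because its absolute spread is larger; after log1p the lowly expressed transcript ranks first because its relative change is larger. This is one reason a log-like transformation is a common default for count data.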

Value

A consistent object (to the input) containing only the `top` most variable transcripts.

Underlying method:

  s <- rowMeans((x - rowMeans(x)) ^ 2)
  o <- order(s, decreasing = TRUE)
  x <- x[o[1L:top], , drop = FALSE]
  variable_trancripts = rownames(x)

A `SummarizedExperiment` object
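
The underlying method quoted in this section can be tried on a toy matrix. The sketch below runs the same steps in base R; the matrix values and the names `x_top` and `variable_transcripts` are made up for illustration, and `x` is assumed to be a numeric matrix with transcripts as rows and samples as columns.

```r
# Toy matrix: rows are transcripts, columns are samples (made-up values).
x <- rbind(
  geneA = c(1, 1, 1, 1),
  geneB = c(0, 10, 0, 10),
  geneC = c(2, 4, 2, 4),
  geneD = c(5, 5, 6, 5)
)
top <- 2

# Per-row variance, computed as the mean squared deviation from the row mean.
s <- rowMeans((x - rowMeans(x))^2)

# Row indices ordered from most to least variable.
o <- order(s, decreasing = TRUE)

# Keep the `top` most variable rows; drop = FALSE preserves the matrix shape.
x_top <- x[o[1L:top], , drop = FALSE]
variable_transcripts <- rownames(x_top)

print(variable_transcripts)
```

Here `geneB` and `geneC` are retained, in that order, because they have the two largest row variances (25 and 1).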

Details

Lifecycle: maturing

At the moment this function uses edgeR https://doi.org/10.1093/bioinformatics/btp616

Examples




  keep_variable(tidybulk::se_mini, top = 500)
#> Warning: tidybulk says: highly abundant transcripts were not identified (i.e. identify_abundant()) or filtered (i.e., keep_abundant), therefore this operation will be performed on unfiltered data. In rare occasions this could be wanted. In standard whole-transcriptome workflows is generally unwanted.
#> Getting the 500 most variable genes
#> class: SummarizedExperiment 
#> dim: 500 5 
#> metadata(0):
#> assays(1): count
#> rownames(500): IGKC TCL1A ... ATXN8OS CCL13
#> rowData names(1): entrez
#> colnames(5): SRR1740034 SRR1740035 SRR1740043 SRR1740058 SRR1740067
#> colData names(5): Cell.type time condition days dead