keep_variable() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and returns a consistent object (to the input) restricted to the `top` most variable features.
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = TRUE
)
# S4 method for class 'spec_tbl_df'
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = NULL
)
# S4 method for class 'tbl_df'
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = NULL
)
# S4 method for class 'tidybulk'
keep_variable(
  .data,
  .sample = NULL,
  .transcript = NULL,
  .abundance = NULL,
  top = 500,
  transform = log1p,
  log_transform = NULL
)
# S4 method for class 'SummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)
# S4 method for class 'RangedSummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)
.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment))
.sample: The name of the sample column
.transcript: The name of the transcript/gene column
.abundance: The name of the transcript/gene abundance column
top: Integer. Number of top transcripts to consider
transform: A function that will transform the counts. The default is log1p, suitable for RNA sequencing data; to avoid transformation, use identity
log_transform: DEPRECATED. A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data)
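The choice of transformation can change which features rank as most variable. The following is a minimal base R sketch of that effect, assuming the transformation is applied to the counts before the variance ranking; the matrix `counts` and the helper `rank_by_variance()` are hypothetical and not part of the package.

```r
# Hypothetical toy counts: rows are features, columns are samples.
counts <- rbind(
  high_expr = c(1000, 1400, 1000, 1400),  # large counts, modest relative change
  low_expr  = c(0, 20, 0, 20)             # small counts, large relative change
)

# Hypothetical helper: transform the counts, then rank features by their
# variance across samples (highest first).
rank_by_variance <- function(x, transform) {
  x <- transform(x)
  s <- rowMeans((x - rowMeans(x))^2)
  names(sort(s, decreasing = TRUE))
}

rank_raw <- rank_by_variance(counts, identity)  # no transformation
rank_log <- rank_by_variance(counts, log1p)     # log1p transformation

print(rank_raw)
print(rank_log)
```

On raw counts the highly expressed feature ranks first, whereas after log1p the feature with the larger relative change does. This is one reason log1p is a sensible default for count data.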
A consistent object (to the input) containing only the `top` most variable features.
Underlying method:
s <- rowMeans((x - rowMeans(x)) ^ 2)
o <- order(s, decreasing = TRUE)
x <- x[o[1L:top], , drop = FALSE]
variable_trancripts = rownames(x)
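The ranking above can be tried on a small toy matrix. This is a minimal base R sketch, not the package's internal code: the matrix `x`, its row names and the helper `top_variable()` are hypothetical, and the values are assumed to be already transformed. Note that `s` is the per-feature variance with denominator n (the number of samples) rather than n - 1, which does not affect the ordering.

```r
# Hypothetical toy matrix: rows are features, columns are samples.
x <- rbind(
  geneA = c(1, 1, 1, 1),    # constant across samples
  geneB = c(0, 10, 0, 10),  # strongly variable
  geneC = c(4, 5, 6, 5),    # mildly variable
  geneD = c(2, 2, 3, 2)     # slightly variable
)

# Hypothetical helper mirroring the steps listed above.
top_variable <- function(x, top) {
  top <- min(top, nrow(x))              # guard added for this sketch
  s <- rowMeans((x - rowMeans(x))^2)    # per-feature variance across samples
  o <- order(s, decreasing = TRUE)      # most variable features first
  rownames(x[o[seq_len(top)], , drop = FALSE])
}

variable_transcripts <- top_variable(x, top = 2)
print(variable_transcripts)
```

The subtraction `x - rowMeans(x)` relies on R recycling the vector of row means down the columns, so each row is centred on its own mean.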
A `SummarizedExperiment` object
Lifecycle: maturing
At the moment this function uses edgeR https://doi.org/10.1093/bioinformatics/btp616
keep_variable(tidybulk::se_mini, top = 500)
#> Warning: tidybulk says: highly abundant transcripts were not identified (i.e. identify_abundant()) or filtered (i.e., keep_abundant), therefore this operation will be performed on unfiltered data. In rare occasions this could be wanted. In standard whole-transcriptome workflows is generally unwanted.
#> Getting the 500 most variable genes
#> class: SummarizedExperiment
#> dim: 500 5
#> metadata(0):
#> assays(1): count
#> rownames(500): IGKC TCL1A ... ATXN8OS CCL13
#> rowData names(1): entrez
#> colnames(5): SRR1740034 SRR1740035 SRR1740043 SRR1740058 SRR1740067
#> colData names(5): Cell.type time condition days dead