keep_variable() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and returns an object of the same type as the input that keeps only the `top` most variable transcripts.
keep_variable(
.data,
.abundance = NULL,
top = 500,
transform = log1p,
log_transform = TRUE
)
# S4 method for class 'SummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)
# S4 method for class 'RangedSummarizedExperiment'
keep_variable(.data, top = 500, transform = log1p)
`.data`: A `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment))
`.abundance`: The name of the transcript/gene abundance column
`top`: Integer. Number of most variable transcripts to keep
`transform`: A function used to transform the counts. The default, log1p, suits RNA sequencing data; use identity to skip the transformation.
`log_transform`: DEPRECATED. Use `transform` instead.
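To see why the choice of `transform` matters, the base-R sketch below applies the variance formula from the underlying method to two made-up features that share the same two-fold difference between sample groups but differ a thousand-fold in abundance. The data and the `row_var()` helper are hypothetical illustrations, not part of tidybulk.

```r
# Hypothetical helper: per-row variance as the mean squared deviation
# from the row mean (the same formula as the underlying method)
row_var <- function(x) rowMeans((x - rowMeans(x))^2)

# Made-up counts: both features double between the two sample pairs,
# but "high" is 1000 times more abundant than "low"
counts <- rbind(
  low  = c(10, 10, 20, 20),
  high = c(10000, 10000, 20000, 20000)
)

# transform = identity: variance grows with abundance, so "high" dominates
raw_var <- row_var(counts)
# transform = log1p: both features end up with a similar variance
log_var <- row_var(log1p(counts))

raw_var["high"] / raw_var["low"]
log_var["high"] / log_var["low"]
```

With untransformed counts the ratio is about a million, so a top-N selection would mostly pick the most abundant genes; after log1p the two features have comparable variances, in line with log1p being the documented default for RNA sequencing data.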
An object of the same type as the input, restricted to the `top` most variable transcripts.
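Underlying method, where x is the feature-by-sample abundance matrix (after `transform`):
# variance of each feature across samples
s <- rowMeans((x - rowMeans(x)) ^ 2)
# rank features from most to least variable
o <- order(s, decreasing = TRUE)
# keep the `top` most variable features
x <- x[o[1L:top], , drop = FALSE]
variable_transcripts <- rownames(x)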
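The underlying method can be wrapped into a standalone function and tried on a toy matrix without tidybulk. The sketch below is a hypothetical `top_variable_rows()` helper, not the package's internal code; it assumes a numeric feature-by-sample matrix with row names and, unlike the snippet, caps `top` at the number of rows so the index never runs past the end.

```r
# Hypothetical helper (not tidybulk's internal code): names of the `top`
# rows of a feature-by-sample matrix with the largest variance after `transform`
top_variable_rows <- function(counts, top = 500, transform = log1p) {
  x <- transform(counts)
  # Per-row variance as the mean squared deviation from the row mean
  s <- rowMeans((x - rowMeans(x))^2)
  # Assumption: never request more rows than the matrix has
  top <- min(top, nrow(x))
  # Rank rows from most to least variable and keep the first `top`
  o <- order(s, decreasing = TRUE)
  x <- x[o[seq_len(top)], , drop = FALSE]
  rownames(x)
}

counts <- matrix(
  c(10, 10, 10, 10,    # gene_a: constant, zero variance
    1, 100, 1, 100,    # gene_b: strongly variable
    5, 8, 5, 8),       # gene_c: mildly variable
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), paste0("s", 1:4))
)

top_variable_rows(counts, top = 2)
# [1] "gene_b" "gene_c"
```

Because gene_a is constant it has zero variance and is the first feature dropped.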
Both S4 methods (for `SummarizedExperiment` and `RangedSummarizedExperiment`) return a `SummarizedExperiment` object.
Lifecycle: maturing
At the moment this function uses edgeR https://doi.org/10.1093/bioinformatics/btp616
Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7
## Load airway dataset for examples
data('airway', package = 'airway')
# Ensure a 'condition' column exists for examples expecting it
SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex
keep_variable(airway, top = 500)
#> Warning: tidybulk says: highly abundant transcripts were not identified (i.e. identify_abundant()) or filtered (i.e., keep_abundant), therefore this operation will be performed on unfiltered data. In rare occasions this could be wanted. In standard whole-transcriptome workflows is generally unwanted.
#> Getting the 500 most variable genes
#> # A SummarizedExperiment-tibble abstraction: 4,000 × 24
#> # Features=500 | Samples=8 | Assays=counts
#> .feature .sample counts SampleName cell dex albut Run avgLength
#> <chr> <chr> <int> <fct> <fct> <fct> <fct> <fct> <int>
#> 1 ENSG00000129824 SRR10395… 4846 GSM1275862 N613… untrt untrt SRR1… 126
#> 2 ENSG00000229807 SRR10395… 0 GSM1275862 N613… untrt untrt SRR1… 126
#> 3 ENSG00000114374 SRR10395… 1358 GSM1275862 N613… untrt untrt SRR1… 126
#> 4 ENSG00000067048 SRR10395… 1507 GSM1275862 N613… untrt untrt SRR1… 126
#> 5 ENSG00000131002 SRR10395… 676 GSM1275862 N613… untrt untrt SRR1… 126
#> 6 ENSG00000012817 SRR10395… 1094 GSM1275862 N613… untrt untrt SRR1… 126
#> 7 ENSG00000184674 SRR10395… 392 GSM1275862 N613… untrt untrt SRR1… 126
#> 8 ENSG00000183878 SRR10395… 330 GSM1275862 N613… untrt untrt SRR1… 126
#> 9 ENSG00000109906 SRR10395… 4 GSM1275862 N613… untrt untrt SRR1… 126
#> 10 ENSG00000213058 SRR10395… 152 GSM1275862 N613… untrt untrt SRR1… 126
#> # ℹ 40 more rows
#> # ℹ 15 more variables: Experiment <fct>, Sample <fct>, BioSample <fct>,
#> # condition <fct>, gene_id <chr>, gene_name <chr>, entrezid <int>,
#> # gene_biotype <chr>, gene_seq_start <int>, gene_seq_end <int>,
#> # seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
#> # GRangesList <list>
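The warning in the output above recommends filtering lowly abundant transcripts first. A minimal sketch of that workflow, assuming the installed tidybulk version exports keep_abundant() (the function the warning names) and that its default arguments work on a `SummarizedExperiment`:

```r
library(tidybulk)
library(SummarizedExperiment)

data("airway", package = "airway")

# Filter lowly abundant genes first, as the warning suggests,
# then keep the 500 most variable of the remaining genes
airway_variable <- airway |>
  keep_abundant() |>
  keep_variable(top = 500)

# Gene IDs of the retained features
head(rownames(airway_variable))
```

Because the result is still a `SummarizedExperiment`, rownames() returns the IDs of the retained genes, and the warning about unfiltered data should no longer be emitted.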