vignettes/introduction.Rmd
introduction.Rmd
Brings SummarizedExperiment to the tidyverse!
website: stemangiola.github.io/tidySummarizedExperiment/
Please also have a look at
tidySummarizedExperiment provides a bridge between Bioconductor SummarizedExperiment (Morgan et al. 2020) and the tidyverse (Wickham et al. 2019). It creates an invisible layer that enables viewing the Bioconductor SummarizedExperiment object as a tidyverse tibble, and provides SummarizedExperiment-compatible dplyr, tidyr, ggplot and plotly functions. This allows users to get the best of both Bioconductor and tidyverse worlds.
SummarizedExperiment-compatible Functions | Description |
---|---|
all |
After all tidySummarizedExperiment is a
SummarizedExperiment object, just better |
tidyverse Packages | Description |
---|---|
dplyr |
Almost all dplyr APIs like for any tibble |
tidyr |
Almost all tidyr APIs like for any tibble |
ggplot2 |
ggplot like for any tibble |
plotly |
plot_ly like for any tibble |
Utilities | Description |
---|---|
as_tibble |
Convert cell-wise information to a tbl_df
|
if (!requireNamespace("BiocManager", quietly=TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("tidySummarizedExperiment")
From Github (development)
devtools::install_github("stemangiola/tidySummarizedExperiment")
Load libraries used in the examples.
tidySummarizedExperiment
, the best of both
worlds!
This is a SummarizedExperiment object but it is evaluated as a tibble. So it is fully compatible both with SummarizedExperiment and tidyverse APIs.
pasilla_tidy <- tidySummarizedExperiment::pasilla
It looks like a tibble
pasilla_tidy
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
## 2 FBgn0000008 untrt1 92 untreated single_end
## 3 FBgn0000014 untrt1 5 untreated single_end
## 4 FBgn0000015 untrt1 0 untreated single_end
## 5 FBgn0000017 untrt1 4664 untreated single_end
## 6 FBgn0000018 untrt1 583 untreated single_end
## 7 FBgn0000022 untrt1 0 untreated single_end
## 8 FBgn0000024 untrt1 10 untreated single_end
## 9 FBgn0000028 untrt1 0 untreated single_end
## 10 FBgn0000032 untrt1 1446 untreated single_end
## # ℹ 40 more rows
But it is a SummarizedExperiment object after all
assays(pasilla_tidy)
## List of length 1
## names(1): counts
We can use tidyverse commands to explore the tidy SummarizedExperiment object.
We can use slice
to choose rows by position, for example
to choose the first row.
## # A SummarizedExperiment-tibble abstraction: 1 × 5
## # Features=1 | Samples=1 | Assays=counts
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
We can use filter
to choose rows by criteria.
## # A SummarizedExperiment-tibble abstraction: 58,396 × 5
## # Features=14599 | Samples=4 | Assays=counts
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
## 2 FBgn0000008 untrt1 92 untreated single_end
## 3 FBgn0000014 untrt1 5 untreated single_end
## 4 FBgn0000015 untrt1 0 untreated single_end
## 5 FBgn0000017 untrt1 4664 untreated single_end
## 6 FBgn0000018 untrt1 583 untreated single_end
## 7 FBgn0000022 untrt1 0 untreated single_end
## 8 FBgn0000024 untrt1 10 untreated single_end
## 9 FBgn0000028 untrt1 0 untreated single_end
## 10 FBgn0000032 untrt1 1446 untreated single_end
## # ℹ 40 more rows
We can use select
to choose columns.
## # A tibble: 102,193 × 1
## .sample
## <chr>
## 1 untrt1
## 2 untrt1
## 3 untrt1
## 4 untrt1
## 5 untrt1
## 6 untrt1
## 7 untrt1
## 8 untrt1
## 9 untrt1
## 10 untrt1
## # ℹ 102,183 more rows
We can use count
to count how many rows we have for each
sample.
## # A tibble: 7 × 2
## .sample n
## <chr> <int>
## 1 trt1 14599
## 2 trt2 14599
## 3 trt3 14599
## 4 untrt1 14599
## 5 untrt2 14599
## 6 untrt3 14599
## 7 untrt4 14599
We can use distinct
to see what distinct sample
information we have.
## # A tibble: 7 × 3
## .sample condition type
## <chr> <chr> <chr>
## 1 untrt1 untreated single_end
## 2 untrt2 untreated single_end
## 3 untrt3 untreated paired_end
## 4 untrt4 untreated paired_end
## 5 trt1 treated single_end
## 6 trt2 treated paired_end
## 7 trt3 treated paired_end
We could use rename
to rename a column. For example, to
modify the type column name.
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
## .feature .sample counts condition sequencing
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single_end
## 2 FBgn0000008 untrt1 92 untreated single_end
## 3 FBgn0000014 untrt1 5 untreated single_end
## 4 FBgn0000015 untrt1 0 untreated single_end
## 5 FBgn0000017 untrt1 4664 untreated single_end
## 6 FBgn0000018 untrt1 583 untreated single_end
## 7 FBgn0000022 untrt1 0 untreated single_end
## 8 FBgn0000024 untrt1 10 untreated single_end
## 9 FBgn0000028 untrt1 0 untreated single_end
## 10 FBgn0000032 untrt1 1446 untreated single_end
## # ℹ 40 more rows
We could use mutate
to create a column. For example, we
could create a new type column that contains single and paired instead
of single_end and paired_end.
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
## .feature .sample counts condition type
## <chr> <chr> <int> <chr> <chr>
## 1 FBgn0000003 untrt1 0 untreated single
## 2 FBgn0000008 untrt1 92 untreated single
## 3 FBgn0000014 untrt1 5 untreated single
## 4 FBgn0000015 untrt1 0 untreated single
## 5 FBgn0000017 untrt1 4664 untreated single
## 6 FBgn0000018 untrt1 583 untreated single
## 7 FBgn0000022 untrt1 0 untreated single
## 8 FBgn0000024 untrt1 10 untreated single
## 9 FBgn0000028 untrt1 0 untreated single
## 10 FBgn0000032 untrt1 1446 untreated single
## # ℹ 40 more rows
We could use unite
to combine multiple columns into a
single column.
## # A SummarizedExperiment-tibble abstraction: 102,193 × 4
## # Features=14599 | Samples=7 | Assays=counts
## .feature .sample counts group
## <chr> <chr> <int> <chr>
## 1 FBgn0000003 untrt1 0 untreated_single_end
## 2 FBgn0000008 untrt1 92 untreated_single_end
## 3 FBgn0000014 untrt1 5 untreated_single_end
## 4 FBgn0000015 untrt1 0 untreated_single_end
## 5 FBgn0000017 untrt1 4664 untreated_single_end
## 6 FBgn0000018 untrt1 583 untreated_single_end
## 7 FBgn0000022 untrt1 0 untreated_single_end
## 8 FBgn0000024 untrt1 10 untreated_single_end
## 9 FBgn0000028 untrt1 0 untreated_single_end
## 10 FBgn0000032 untrt1 1446 untreated_single_end
## # ℹ 40 more rows
We can also combine commands with the tidyverse pipe
%>%
.
For example, we could combine group_by
and
summarise
to get the total counts for each sample.
## # A tibble: 7 × 2
## .sample total_counts
## <chr> <int>
## 1 trt1 18670279
## 2 trt2 9571826
## 3 trt3 10343856
## 4 untrt1 13972512
## 5 untrt2 21911438
## 6 untrt3 8358426
## 7 untrt4 9841335
We could combine group_by
, mutate
and
filter
to get the transcripts with mean count > 0.
## # A tibble: 86,513 × 6
## # Groups: .feature [12,359]
## .feature .sample counts condition type mean_count
## <chr> <chr> <int> <chr> <chr> <dbl>
## 1 FBgn0000003 untrt1 0 untreated single_end 0.143
## 2 FBgn0000008 untrt1 92 untreated single_end 99.6
## 3 FBgn0000014 untrt1 5 untreated single_end 1.43
## 4 FBgn0000015 untrt1 0 untreated single_end 0.857
## 5 FBgn0000017 untrt1 4664 untreated single_end 4672.
## 6 FBgn0000018 untrt1 583 untreated single_end 461.
## 7 FBgn0000022 untrt1 0 untreated single_end 0.143
## 8 FBgn0000024 untrt1 10 untreated single_end 7
## 9 FBgn0000028 untrt1 0 untreated single_end 0.429
## 10 FBgn0000032 untrt1 1446 untreated single_end 1085.
## # ℹ 86,503 more rows
my_theme <-
list(
scale_fill_brewer(palette="Set1"),
scale_color_brewer(palette="Set1"),
theme_bw() +
theme(
panel.border=element_blank(),
axis.line=element_line(),
panel.grid.major=element_line(size=0.2),
panel.grid.minor=element_line(size=0.1),
text=element_text(size=12),
legend.position="bottom",
aspect.ratio=1,
strip.background=element_blank(),
axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))
)
)
We can treat pasilla_tidy
as a normal tibble for
plotting.
Here we plot the distribution of counts per sample.
pasilla_tidy %>%
tidySummarizedExperiment::ggplot(aes(counts + 1, group=.sample, color=`type`)) +
geom_density() +
scale_x_log10() +
my_theme
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] tidySummarizedExperiment_1.11.4 ttservice_0.2.2
## [3] SummarizedExperiment_1.28.0 Biobase_2.58.0
## [5] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
## [7] IRanges_2.32.0 S4Vectors_0.36.2
## [9] BiocGenerics_0.44.0 MatrixGenerics_1.10.0
## [11] matrixStats_0.63.0 ggplot2_3.4.2
## [13] knitr_1.42 BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] lattice_0.21-8 tidyr_1.3.0 rprojroot_2.0.3
## [4] digest_0.6.31 utf8_1.2.3 R6_2.5.1
## [7] evaluate_0.20 highr_0.10 httr_1.4.5
## [10] pillar_1.9.0 zlibbioc_1.44.0 rlang_1.1.0
## [13] lazyeval_0.2.2 data.table_1.14.8 jquerylib_0.1.4
## [16] Matrix_1.5-4 rmarkdown_2.21 pkgdown_2.0.7
## [19] labeling_0.4.2 textshaping_0.3.6 desc_1.4.2
## [22] stringr_1.5.0 htmlwidgets_1.6.2 RCurl_1.98-1.12
## [25] munsell_0.5.0 DelayedArray_0.24.0 compiler_4.2.3
## [28] xfun_0.38 pkgconfig_2.0.3 systemfonts_1.0.4
## [31] htmltools_0.5.5 tidyselect_1.2.0 tibble_3.2.1
## [34] GenomeInfoDbData_1.2.9 bookdown_0.33 viridisLite_0.4.1
## [37] fansi_1.0.4 dplyr_1.1.1 withr_2.5.0
## [40] bitops_1.0-7 grid_4.2.3 jsonlite_1.8.4
## [43] gtable_0.3.3 lifecycle_1.0.3 magrittr_2.0.3
## [46] scales_1.2.1 cli_3.6.1 stringi_1.7.12
## [49] cachem_1.0.7 farver_2.1.1 XVector_0.38.0
## [52] fs_1.6.1 bslib_0.4.2 ellipsis_0.3.2
## [55] ragg_1.2.5 generics_0.1.3 vctrs_0.6.2
## [58] RColorBrewer_1.1-3 tools_4.2.3 glue_1.6.2
## [61] purrr_1.0.1 fastmap_1.1.1 yaml_2.3.7
## [64] colorspace_2.1-0 BiocManager_1.30.20 memoise_2.0.1
## [67] plotly_4.10.1 sass_0.4.5