biocasia2020tidybulk

Instructor names and contact information

  • Maria Doyle <Maria.Doyle at petermac.org>
  • Stefano Mangiola <mangiola.s at wehi.edu.au>

Syllabus

Material web page.

This material was created for the BioC Asia 2020 conference workshop but it can also be used for self-learning.

More details on the workshop are below.

Workshop package installation

This is necessary in order to reproduce the code shown in the workshop. The workshop is designed for R 4.0 and can be installed using one of the two ways below.

Via Docker image

If you’re familiar with Docker you could use the Docker image which has all the software pre-configured to the correct versions.

docker run -e PASSWORD=abc -p 8787:8787 stemangiola/biocasia2020_tidytranscriptomics:biocasia2020

Once running, navigate to http://localhost:8787/ and then login with Username:rstudio and Password:abc.

You should see the Rmarkdown file with all the workshop code which you can run.

Workshop Description

This workshop will present how to perform analysis of RNA sequencing count data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each observation is a row, and data is manipulated using an easy-to-understand vocabulary. Most importantly, the data structure remains consistent across manipulation and analysis functions.

This can be achieved for RNA sequencing count data with the tidybulk, tidySCE, tidyHeatmap and tidyverse packages. The tidybulk package provides a tidy data structure and a modular framework for bulk transcriptional analyses, the tidySCE package provides similar for single-cell transcriptional analyses, and tidyHeatmap provides a tidy implementation of ComplexHeatmap. These packages are part of the tidytranscriptomics suite that introduces a tidy approach to RNA sequencing data.

Pre-requisites

  • Basic knowledge of RStudio
  • Familiarity with tidyverse syntax

Recommended Background Reading Introduction to R for Biologists

Workshop Participation

The workshop format is a 55 min session consisting of hands-on demos with Q&A.

R / Bioconductor packages used

  • tidyverse
  • tidybulk
  • tidySCE
  • tidyHeatmap
  • limma
  • edgeR
  • DESeq2
  • SummarizedExperiment
  • SingleCellExperiment
  • scater
  • scran
  • SingleR
  • igraph
  • airway
  • org.Hs.eg.db
  • dittoSeq
  • ggrepel
  • GGally
  • plotly

Time outline

Rough guide

Activity Time
Part 1 Bulk RNA-seq with tidybulk 25m
Part 2 Single-cell RNA-seq with tidySCE 30m
Total 55m

Workshop goals and objectives

In exploring and analysing RNA sequencing count data, there are a number of key concepts, such as filtering, scaling, dimensionality reduction, hypothesis testing, clustering and visualisation, that need to be understood. These concepts can be intuitively explained to new users, however, (i) the use of a heterogeneous vocabulary and jargon by methodologies/algorithms/packages, (ii) the complexity of data wrangling, and (iii) the coding burden, impede effective learning of the statistics and biology underlying an informed RNA sequencing analysis.

The tidytranscriptomics approach to RNA sequencing data analysis abstracts out the coding-related complexity and provides tools that use an intuitive and jargon-free vocabulary, enabling focus on the statistical and biological challenges.

Learning goals

  • To understand the key concepts and steps of RNA sequencing count data analysis
  • To approach data representation and analysis though a tidy data paradigm, integrating tidyverse with tidybulk, tidySCE and tidyHeatmap.

Learning objectives

  • Recall the key concepts of RNA sequencing count data analysis
  • Apply the concepts to publicly available data
  • Create plots that summarise the information content of the data and analysis results