

Brings Seurat to the tidyverse!


Please also have a look at

  • tidyseurat for tidy single-cell RNA sequencing analysis
  • tidySummarizedExperiment for tidy bulk RNA sequencing analysis
  • tidybulk for tidy bulk RNA-seq analysis
  • nanny for tidy high-level data analysis and manipulation
  • tidygate for adding custom gate information to your tibble
  • tidyHeatmap for heatmaps produced with tidy principles


tidyseurat provides a bridge between the Seurat single-cell package (Butler et al. 2018; Stuart et al. 2019) and the tidyverse (Wickham et al. 2019). It creates an invisible layer that enables viewing the Seurat object as a tidyverse tibble, and provides Seurat-compatible dplyr, tidyr, ggplot and plotly functions.

Functions/utilities available

Seurat-compatible Functions    Description

tidyverse Packages    Description
dplyr                 All dplyr APIs like for any tibble
tidyr                 All tidyr APIs like for any tibble
ggplot2               ggplot like for any tibble
plotly                plot_ly like for any tibble

Utilities          Description
tidy               Add tidyseurat invisible layer over a Seurat object
as_tibble          Convert cell-wise information to a tbl_df
join_features      Add feature-wise information, returns a tbl_df
aggregate_cells    Aggregate cell gene-transcription abundance as pseudobulk tissue




From Github (development)
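
A development install might look like the following sketch. It assumes the remotes package is already installed and that the development version is hosted in the stemangiola/tidyseurat GitHub repository; adjust if your setup differs.

```r
# Assumption: remotes is installed and the repository is stemangiola/tidyseurat
remotes::install_github("stemangiola/tidyseurat")
```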


Create tidyseurat, the best of both worlds!

This is a Seurat object, but it is evaluated as a tibble, so it is fully compatible with both the Seurat and tidyverse APIs.
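
The examples in this document assume that several packages are attached. The list below is a sketch inferred from the "other attached packages" entry in the Session Info and from the functions called in the examples; treat it as an assumption rather than the package's official setup.

```r
# Assumed setup: attach the packages used throughout this document
library(dplyr)      # filter, mutate, select, count, left_join
library(tidyr)      # nest, unnest, pivot_longer, unite
library(purrr)      # map
library(magrittr)   # %>%
library(ggplot2)    # plotting
library(Seurat)     # single-cell analysis functions
library(tidyseurat) # tidy layer over Seurat objects
```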

pbmc_small = SeuratObject::pbmc_small

It looks like a tibble

## # A Seurat-tibble abstraction: 80 × 15
## # Features=230 | Cells=80 | Active assay=RNA | Assays=RNA
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # ℹ 70 more rows
## # ℹ 8 more variables: RNA_snn_res.1 <fct>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>,
## #   PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

But it is a Seurat object after all

## $RNA
## Assay data with 230 features for 80 cells
## Top 10 variable features:

Preliminary plots

Set colours and theme for plots.

# Use colourblind-friendly colours
friendly_cols <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499", "#44AA99", "#999933", "#882255", "#661100", "#6699CC")

# Set theme
my_theme <-
  list(
    scale_fill_manual(values = friendly_cols),
    scale_color_manual(values = friendly_cols),
    theme_bw() +
      theme(
        panel.border = element_blank(),
        axis.line = element_line(),
        # linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)
        panel.grid.major = element_line(linewidth = 0.2),
        panel.grid.minor = element_line(linewidth = 0.1),
        text = element_text(size = 12),
        legend.position = "bottom",
        aspect.ratio = 1,
        strip.background = element_blank(),
        axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
        axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10))
      )
  )

We can treat pbmc_small effectively as a normal tibble for plotting.

Here we plot number of features per cell.

pbmc_small %>%
  ggplot(aes(nFeature_RNA, fill = groups)) +
  geom_histogram() +
  my_theme

Here we plot the total RNA count per cell, by group.

pbmc_small %>%
  ggplot(aes(groups, nCount_RNA, fill = groups)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  my_theme

Here we plot abundance of two features for each group.

pbmc_small %>%
  join_features(features = c("HLA-DRA", "LYZ")) %>%
  ggplot(aes(groups, .abundance_RNA + 1, fill = groups)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(size = nCount_RNA), alpha = 0.5, width = 0.2) +
  scale_y_log10() +
  my_theme

Preprocess the dataset

You can also treat the object as a Seurat object and proceed with data processing.

pbmc_small_pca <-
  pbmc_small %>%
  SCTransform(verbose = FALSE) %>%
  FindVariableFeatures(verbose = FALSE) %>%
  RunPCA(verbose = FALSE)

## # A Seurat-tibble abstraction: 80 × 17
## # Features=220 | Cells=80 | Active assay=SCT | Assays=RNA, SCT
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # ℹ 70 more rows
## # ℹ 10 more variables: RNA_snn_res.1 <fct>, nCount_SCT <dbl>,
## #   nFeature_SCT <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
## #   PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

If a tool is not included in the tidyseurat collection, we can use as_tibble to permanently convert the tidyseurat object into a tibble.

pbmc_small_pca %>%
  as_tibble() %>%
  select(contains("PC"), everything()) %>%
  GGally::ggpairs(columns = 1:5, ggplot2::aes(colour = groups)) +
  my_theme

Identify clusters

We proceed with cluster identification with Seurat.

pbmc_small_cluster <-
  pbmc_small_pca %>%
  FindNeighbors(verbose = FALSE) %>%
  FindClusters(method = "igraph", verbose = FALSE)

## # A Seurat-tibble abstraction: 80 × 19
## # Features=220 | Cells=80 | Active assay=SCT | Assays=RNA, SCT
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # ℹ 70 more rows
## # ℹ 12 more variables: RNA_snn_res.1 <fct>, nCount_SCT <dbl>,
## #   nFeature_SCT <int>, SCT_snn_res.0.8 <fct>, seurat_clusters <fct>,
## #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
## #   tSNE_2 <dbl>

Now we can interrogate the object as if it were a regular tibble data frame.

pbmc_small_cluster %>%
  count(groups, seurat_clusters)
## # A tibble: 6 × 3
##   groups seurat_clusters     n
##   <chr>  <fct>           <int>
## 1 g1     0                  23
## 2 g1     1                  17
## 3 g1     2                   4
## 4 g2     0                  17
## 5 g2     1                  13
## 6 g2     2                   6

We can identify cluster markers using Seurat.

# Identify top 10 markers per cluster
# (logfc.threshold is the current name of the former thresh.use argument)
markers <-
  pbmc_small_cluster %>%
  FindAllMarkers(only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25) %>%
  group_by(cluster) %>%
  top_n(10, avg_log2FC)

# Plot heatmap
pbmc_small_cluster %>%
  DoHeatmap(
    features = markers$gene,
    group.colors = friendly_cols
  )

Reduce dimensions

We can calculate the first 3 UMAP dimensions using the Seurat framework.

pbmc_small_UMAP <-
  pbmc_small_cluster %>%
  RunUMAP(reduction = "pca", dims = 1:15, n.components = 3L)

We can then plot them in 3D using plotly.

pbmc_small_UMAP %>%
  plot_ly(
    x = ~`UMAP_1`,
    y = ~`UMAP_2`,
    z = ~`UMAP_3`,
    color = ~seurat_clusters,
    colors = friendly_cols[1:4]
  )

Cell type prediction

We can infer cell type identities using SingleR (Aran et al. 2019) and manipulate the output using tidyverse.

# Get cell type reference data
blueprint <- celldex::BlueprintEncodeData()

# Infer cell identities
cell_type_df <-
  GetAssayData(pbmc_small_UMAP, slot = 'counts', assay = "SCT") %>%
  log1p() %>%
  Matrix::Matrix(sparse = TRUE) %>%
  SingleR::SingleR(
    ref = blueprint,
    labels = blueprint$label.main,
    method = "single"
  ) %>%
  as.data.frame() %>%
  # Name the identifier column .cell to match the cell column printed above
  as_tibble(rownames = ".cell") %>%
  select(.cell, first.labels)

# Join UMAP and cell type info
pbmc_small_cell_type <-
  pbmc_small_UMAP %>%
  left_join(cell_type_df, by = ".cell")

# Reorder columns
pbmc_small_cell_type %>%
  select(.cell, first.labels, everything())

We can easily summarise the results. For example, we can see how cell type classification overlaps with cluster classification.

pbmc_small_cell_type %>%
  count(seurat_clusters, first.labels)

We can easily reshape the data for building information-rich faceted plots.

pbmc_small_cell_type %>%

  # Reshape and add classifier column
  pivot_longer(
    cols = c(seurat_clusters, first.labels),
    names_to = "classifier", values_to = "label"
  ) %>%

  # UMAP plots for cell type and cluster
  ggplot(aes(UMAP_1, UMAP_2, color = label)) +
  geom_point() +
  facet_wrap(~classifier) +
  my_theme
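
The reshaping step can be tried on an ordinary tibble first. The following is a minimal sketch with made-up values; the names toy and toy_long are hypothetical, and the columns only mimic those of the real object. It shows how the two classification columns collapse into a classifier/label pair, giving one row per cell and classifier.

```r
library(dplyr)
library(tidyr)

# Toy cell-level table (hypothetical values)
toy <- tibble(
  cell = c("c1", "c2", "c3"),
  UMAP_1 = c(0.1, -1.2, 2.3),
  UMAP_2 = c(1.0, 0.4, -0.7),
  seurat_clusters = factor(c("0", "1", "0")),
  first.labels = c("Monocytes", "B-cells", "Monocytes")
)

# Stack the two classification columns into classifier/label pairs
toy_long <- toy %>%
  pivot_longer(
    cols = c(seurat_clusters, first.labels),
    names_to = "classifier", values_to = "label"
  )

toy_long
```

Each cell now appears twice, once per classifier, which is what allows facet_wrap(~classifier) to draw one panel per classification.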

We can easily plot gene correlation per cell category, adding multi-layer annotations.

pbmc_small_cell_type %>%

  # Add simulated mitochondrial abundance values (random, for illustration only)
  mutate(mitochondrial = rnorm(n())) %>%

  # Plot correlation
  join_features(features = c("CST3", "LYZ"), shape = "wide") %>%
  ggplot(aes(CST3 + 1, LYZ + 1, color = groups, size = mitochondrial)) +
  geom_point() +
  facet_wrap(~first.labels, scales = "free") +
  scale_x_log10() +
  scale_y_log10() +
  my_theme

Nested analyses

A powerful tool we can use with tidyseurat is nest. We can easily perform independent analyses on subsets of the dataset. First, we classify cell types as lymphoid or myeloid; then we nest based on the new classification.

pbmc_small_nested <-
  pbmc_small_cell_type %>%
  filter(first.labels != "Erythrocytes") %>%
  mutate(cell_class = if_else(`first.labels` %in% c("Macrophages", "Monocytes"), "myeloid", "lymphoid")) %>%
  nest(data = -cell_class)
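
The nest, map, unnest pattern can be sketched on an ordinary tibble. This is a toy illustration with made-up values, not the tidyseurat workflow itself; toy_cells, n_count and scaled are hypothetical names. Each subset is processed independently inside map, just as each nested Seurat object would be.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy cell-level table (hypothetical values)
toy_cells <- tibble(
  cell = paste0("cell_", 1:6),
  cell_class = rep(c("myeloid", "lymphoid"), each = 3),
  n_count = c(70, 85, 87, 127, 173, 70)
)

# One row per class; the remaining columns are stored in a list-column
toy_nested <- toy_cells %>%
  nest(data = -cell_class)

# Process each subset independently (here: scale counts within the class)
toy_reanalysed <- toy_nested %>%
  mutate(data = map(data, ~ .x %>% mutate(scaled = n_count / max(n_count))))

# Return to one row per cell
toy_result <- toy_reanalysed %>%
  unnest(data)
```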


Now, independently for the lymphoid and myeloid subsets, we can (i) find variable features, (ii) reduce dimensions, and (iii) cluster, using both tidyverse and Seurat seamlessly.

pbmc_small_nested_reanalysed <-
  pbmc_small_nested %>%
  mutate(data = map(
    data, ~ .x %>%
      FindVariableFeatures(verbose = FALSE) %>%
      RunPCA(npcs = 10, verbose = FALSE) %>%
      FindNeighbors(verbose = FALSE) %>%
      FindClusters(method = "igraph", verbose = FALSE) %>%
      RunUMAP(reduction = "pca", dims = 1:10, n.components = 3L, verbose = FALSE)
  ))


Now we can unnest and plot the new classification.

pbmc_small_nested_reanalysed %>%

  # Convert to tibble otherwise Seurat drops reduced dimensions when unifying data sets.
  mutate(data = map(data, ~ .x %>% as_tibble())) %>%
  unnest(data) %>%

  # Define unique clusters
  unite("cluster", c(cell_class, seurat_clusters), remove = FALSE) %>%

  # Plotting
  ggplot(aes(UMAP_1, UMAP_2, color = cluster)) +
  geom_point() +
  facet_wrap(~cell_class) +
  my_theme

Aggregating cells

Sometimes it is necessary to aggregate the gene-transcript abundance from a group of cells into a single value, for example when comparing groups of cells across different samples with fixed-effect models.

In tidyseurat, cell aggregation can be achieved using the aggregate_cells function.

pbmc_small %>%
  aggregate_cells(groups, assays = "RNA")
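
Conceptually, pseudobulk aggregation combines the counts of all cells in a group into one value per gene. The following base R sketch assumes aggregation by summation and uses a made-up count matrix; it is not the implementation of aggregate_cells, and counts, cell_group and pseudobulk are hypothetical names.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns); values are made up.
# matrix() fills column by column, so each triplet below is one cell.
counts <- matrix(
  c(1, 0, 2,
    3, 1, 0,
    0, 4, 1,
    2, 2, 2),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Group membership of each cell, in column order
cell_group <- c("g1", "g1", "g2", "g2")

# rowsum() sums rows by group, so transpose to cells x genes and back
pseudobulk <- t(rowsum(t(counts), group = cell_group))

pseudobulk
```

The result has one column per group and one row per gene, which is the shape typically passed on to bulk-style differential analysis.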

Session Info

## R Under development (unstable) (2024-02-28 r85999)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.3 LTS
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/ 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/;  LAPACK version 3.10.0
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## time zone: UTC
## tzcode source: system (glibc)
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## other attached packages:
##  [1] tidyseurat_0.8.1   ttservice_0.4.0    Seurat_5.0.2       SeuratObject_5.0.1
##  [5] sp_2.1-3           ggplot2_3.5.0      magrittr_2.0.3     purrr_1.0.2       
##  [9] tidyr_1.3.1        dplyr_1.1.4        knitr_1.45        
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     jsonlite_1.8.8         spatstat.utils_3.0-4  
##   [4] farver_2.1.1           rmarkdown_2.25         fs_1.6.3              
##   [7] ragg_1.2.7             vctrs_0.6.5            ROCR_1.0-11           
##  [10] memoise_2.0.1          spatstat.explore_3.2-6 htmltools_0.5.7       
##  [13] sass_0.4.8             sctransform_0.4.1      parallelly_1.37.1     
##  [16] KernSmooth_2.23-22     bslib_0.6.1            htmlwidgets_1.6.4     
##  [19] desc_1.4.3             ica_1.0-3              plyr_1.8.9            
##  [22] plotly_4.10.4          zoo_1.8-12             cachem_1.0.8          
##  [25] igraph_2.0.2           mime_0.12              lifecycle_1.0.4       
##  [28] pkgconfig_2.0.3        Matrix_1.6-5           R6_2.5.1              
##  [31] fastmap_1.1.1          fitdistrplus_1.1-11    future_1.33.1         
##  [34] shiny_1.8.0            digest_0.6.34          GGally_2.2.1          
##  [37] colorspace_2.1-0       patchwork_1.2.0        tensor_1.5            
##  [40] RSpectra_0.16-1        irlba_2.3.5.1          textshaping_0.3.7     
##  [43] labeling_0.4.3         progressr_0.14.0       fansi_1.0.6           
##  [46] spatstat.sparse_3.0-3  httr_1.4.7             polyclip_1.10-6       
##  [49] abind_1.4-5            compiler_4.4.0         withr_3.0.0           
##  [52] ggstats_0.5.1          fastDummies_1.7.3      highr_0.10            
##  [55] MASS_7.3-60.2          tools_4.4.0            lmtest_0.9-40         
##  [58] httpuv_1.6.14          future.apply_1.11.1    goftest_1.2-3         
##  [61] glue_1.7.0             nlme_3.1-164           promises_1.2.1        
##  [64] grid_4.4.0             Rtsne_0.17             cluster_2.1.6         
##  [67] reshape2_1.4.4         generics_0.1.3         gtable_0.3.4          
##  [70] spatstat.data_3.0-4    data.table_1.15.2      utf8_1.2.4            
##  [73] spatstat.geom_3.2-9    RcppAnnoy_0.0.22       ggrepel_0.9.5         
##  [76] RANN_2.6.1             pillar_1.9.0           stringr_1.5.1         
##  [79] spam_2.10-0            RcppHNSW_0.6.0         later_1.3.2           
##  [82] splines_4.4.0          lattice_0.22-5         survival_3.5-8        
##  [85] deldir_2.0-4           tidyselect_1.2.0       miniUI_0.1.1.1        
##  [88] pbapply_1.7-2          gridExtra_2.3          scattermore_1.2       
##  [91] xfun_0.42              matrixStats_1.2.0      stringi_1.8.3         
##  [94] lazyeval_0.2.2         yaml_2.3.8             evaluate_0.23         
##  [97] codetools_0.2-19       tibble_3.2.1           cli_3.6.2             
## [100] uwot_0.1.16            xtable_1.8-4           reticulate_1.35.0     
## [103] systemfonts_1.0.5      munsell_0.5.0          jquerylib_0.1.4       
## [106] Rcpp_1.0.12            globals_0.16.2         spatstat.random_3.2-3 
## [109] png_0.1-8              parallel_4.4.0         ellipsis_0.3.2        
## [112] pkgdown_2.0.7          dotCall64_1.1-1        listenv_0.9.1         
## [115] viridisLite_0.4.2      scales_1.3.0           ggridges_0.5.6        
## [118] leiden_0.4.3.1         rlang_1.1.3            cowplot_1.1.3


Aran, Dvir, Agnieszka P Looney, Leqian Liu, Esther Wu, Valerie Fong, Austin Hsu, Suzanna Chak, et al. 2019. “Reference-Based Analysis of Lung Single-Cell Sequencing Reveals a Transitional Profibrotic Macrophage.” Nature Immunology 20 (2): 163–72.
Butler, Andrew, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. 2018. “Integrating Single-Cell Transcriptomic Data Across Different Conditions, Technologies, and Species.” Nature Biotechnology 36 (5): 411–20.
Stuart, Tim, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi, William M Mauck III, Yuhan Hao, Marlon Stoeckius, Peter Smibert, and Rahul Satija. 2019. “Comprehensive Integration of Single-Cell Data.” Cell 177 (7): 1888–1902.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686.