Get cell type proportions from samples — deconvolve

deconvolve_cellularity() takes as input A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) and returns a consistent object (to the input) with the estimated cell type abundance for each sample

deconvolve_cellularity(
  .data,
  .abundance = NULL,
  reference = NULL,
  method = "cibersort",
  prefix = "",
  feature_column = NULL,
  ...
)

# S4 method for class 'SummarizedExperiment'
deconvolve_cellularity(
  .data,
  .abundance = NULL,
  reference = NULL,
  method = "cibersort",
  prefix = "",
  feature_column = NULL,
  ...
)

# S4 method for class 'RangedSummarizedExperiment'
deconvolve_cellularity(
  .data,
  .abundance = NULL,
  reference = NULL,
  method = "cibersort",
  prefix = "",
  feature_column = NULL,
  ...
)

Arguments

.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
.abundance: The name of the transcript/gene abundance column
reference: A data frame. The methods cibersort and llsr can accept a custom rectangular dataframe with genes as rows names, cell types as column names and gene-transcript abundance as values. For exampler tidybulk::X_cibersort. The transcript/cell_type data frame of integer transcript abundance. If NULL, the default reference for each algorithm will be used. For llsr will be LM22.
method: A character string. The method to be used. Available methods: "cibersort", "llsr", "epic", "mcp_counter", "quantiseq", "xcell". If a vector is provided, an error will be thrown. Default is all available methods.
prefix: A character string. The prefix you would like to add to the result columns. It is useful if you want to reshape data.
feature_column: A character string. The name of a column in rowData to use as feature names instead of rownames. If NULL (default), rownames are used.
...: Further parameters passed to the function Cibersort

Value

A consistent object (to the input) including additional columns for each cell type estimated

A `SummarizedExperiment` object

Details

`r lifecycle::badge("maturing")`

This function infers the cell type composition of our samples (with the algorithm Cibersort; Newman et al., 10.1038/nmeth.3337).

Underlying method: CIBERSORT(Y = data, X = reference, ...)

References

Mangiola, S., Molania, R., Dong, R., Doyle, M. A., & Papenfuss, A. T. (2021). tidybulk: an R tidy framework for modular transcriptomic data analysis. Genome Biology, 22(1), 42. doi:10.1186/s13059-020-02233-7

Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., Hoang, C. D., Diehn, M., & Alizadeh, A. A. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nature Methods, 12(5), 453-457. doi:10.1038/nmeth.3337

Racle, J., de Jonge, K., Baumgaertner, P., Speiser, D. E., & Gfeller, D. (2017). Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. eLife, 6, e26476. doi:10.7554/eLife.26476

Examples

## Load airway dataset for examples

  data('airway', package = 'airway')
  # Ensure a 'condition' column exists for examples expecting it

    SummarizedExperiment::colData(airway)$condition <- SummarizedExperiment::colData(airway)$dex




# Map ENSEMBL rownames to SYMBOLs for compatibility with reference signatures
 
library(tidySummarizedExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#> 
#>     findMatches
#> The following objects are masked from ‘package:base’:
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: Seqinfo
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#> 
#>     rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     anyMissing, rowMedians
#> 
#> Attaching package: ‘tidySummarizedExperiment’
#> The following object is masked from ‘package:generics’:
#> 
#>     tidy
library(org.Hs.eg.db)
#> Loading required package: AnnotationDbi
#> 
#> Attaching package: ‘AnnotationDbi’
#> The following object is masked from ‘package:dplyr’:
#> 
#>     select

if (FALSE) { # \dontrun{
airway |>
mutate(gene_symbol = AnnotationDbi::mapIds(
  org.Hs.eg.db::org.Hs.eg.db,
  keys = .feature,
  keytype = 'ENSEMBL',
  column = 'SYMBOL',
  multiVals = 'first'
)) |>
mutate(gene_symbol = ifelse(is.na(gene_symbol) | gene_symbol == '', .feature, gene_symbol)) |>
deconvolve_cellularity(feature_column = 'gene_symbol', cores = 1)
} # }

# Alternatively, if you already have a feature column in rowData
# se_with_features <- airway
# rowData(se_with_features)$gene_symbol <- rownames(se_with_features)
# se_with_features |> deconvolve_cellularity(feature_column = 'gene_symbol', cores = 1)