ensembl_to_symbol() takes a `tbl` (with at least three columns: sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient when viewed as a tibble via library(tidySummarizedExperiment)) and returns an object of the same class as the input, with an additional transcript symbol column.

ensembl_to_symbol(.data, .ensembl, action = "add")

# S4 method for spec_tbl_df
ensembl_to_symbol(.data, .ensembl, action = "add")

# S4 method for tbl_df
ensembl_to_symbol(.data, .ensembl, action = "add")

# S4 method for tidybulk
ensembl_to_symbol(.data, .ensembl, action = "add")

Arguments

.data

A `tbl` (with at least three columns: sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient when viewed as a tibble via library(tidySummarizedExperiment)).

.ensembl

A character string. The column that contains the Ensembl gene IDs.

action

A character string. Whether to join the new information to the input tbl ("add"), or to return only the non-redundant tbl with the new information ("get").

Value

An object of the same class as the input, with additional columns for the transcript symbol.

Details

[Questioning]

This is useful because some resources use Ensembl IDs while others use gene symbols. At the moment this works for human (genes and transcripts) and mouse (genes) data.
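To make the two `action` values concrete, here is a minimal sketch of the kind of lookup the description implies, written with plain dplyr joins. It is not the package's implementation: `toy_mapping`, `toy_counts` and the column name `ens` are hypothetical, and the real mapping table ships with the package.

```r
library(dplyr)

# Hypothetical lookup table (assumption): Ensembl gene ID -> symbol.
toy_mapping <- tibble(
  ens        = c("ENSG00000141510", "ENSG00000012048"),
  transcript = c("TP53", "BRCA1"),
  ref_genome = "hg38"
)

# Toy long-format counts: one row per sample/feature pair.
toy_counts <- tibble(
  sample = c("s1", "s1", "s2", "s2"),
  ens    = c("ENSG00000141510", "ENSG00000012048",
             "ENSG00000141510", "ENSG00000012048"),
  count  = c(10, 25, 7, 31)
)

# Roughly what "get" describes: the non-redundant mapping
# for the IDs present in the input, one row per ID.
got <- toy_counts |>
  distinct(ens) |>
  left_join(toy_mapping, by = "ens")

# Roughly what "add" describes: the mapping joined back
# onto every row of the input.
added <- toy_counts |>
  left_join(got, by = "ens")
```

In this sketch `got` has one row per distinct ID, while `added` keeps every input row and gains the `transcript` and `ref_genome` columns. IDs absent from the lookup table would come through the left join as `NA`.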

Examples


library(dplyr)

# This function was designed for data.frame
# Convert from SummarizedExperiment for this example. It is NOT recommended.

tidybulk::counts_SE |> tidybulk() |> as_tibble() |> ensembl_to_symbol(.feature)
#> # A tibble: 408,624 × 10
#>    .feature .sample    count Cell.…¹ time  condi…² batch facto…³ trans…⁴ ref_g…⁵
#>    <chr>    <chr>      <dbl> <fct>   <fct> <lgl>   <fct> <lgl>   <chr>   <chr>  
#>  1 A1BG     SRR1740034   153 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#>  2 A1BG-AS1 SRR1740034    83 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#>  3 AAAS     SRR1740034   868 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#>  4 AACS     SRR1740034   222 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#>  5 AAGAB    SRR1740034   590 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#>  6 AAMDC    SRR1740034    48 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#>  7 AAMP     SRR1740034  1257 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#>  8 AANAT    SRR1740034   284 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#>  9 AAR2     SRR1740034   379 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#> 10 AARS2    SRR1740034   685 b_cell  0 d   TRUE    0     TRUE    NA      NA     
#> # … with 408,614 more rows, and abbreviated variable names ¹​Cell.type,
#> #   ²​condition, ³​factor_of_interest, ⁴​transcript, ⁵​ref_genome