Add a transcript symbol column from Ensembl IDs for human and mouse data

ensembl_to_symbol() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) and returns an object of the same type as the input, with an additional transcript symbol column.

ensembl_to_symbol(.data, .ensembl, action = "add")

# S4 method for class 'spec_tbl_df'
ensembl_to_symbol(.data, .ensembl, action = "add")

# S4 method for class 'tbl_df'
ensembl_to_symbol(.data, .ensembl, action = "add")

# S4 method for class 'tidybulk'
ensembl_to_symbol(.data, .ensembl, action = "add")

Arguments

.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment))
.ensembl: A character string. The column that contains the Ensembl gene IDs
action: A character string. Whether to join the new information to the input tbl (add), or just get the non-redundant tbl with the new information (get).

Value

An object of the same type as the input, with additional columns for the transcript symbol

Details

This is useful because some resources use Ensembl IDs while others use gene symbols. At the moment this works for human (genes and transcripts) and mouse (genes) data.
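
The difference between action = "add" and action = "get" can be pictured with a small base R sketch. This is only a conceptual illustration under stated assumptions, not tidybulk's implementation: the `counts` and `lookup` tables are hypothetical toy data, the "hg38" label is an assumed value for the `ref_genome` column, and the package uses its own internal mapping.

```r
# Toy tidy input: one row per sample/feature pair (counts are made up).
counts <- data.frame(
  sample  = c("s1", "s1", "s2", "s2"),
  feature = c("ENSG00000141510", "ENSG00000012048",
              "ENSG00000141510", "ENSG00000012048"),
  count   = c(120, 35, 98, 41),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table (assumption: not the mapping shipped with tidybulk).
lookup <- data.frame(
  ensembl_id = c("ENSG00000141510", "ENSG00000012048"),
  transcript = c("TP53", "BRCA1"),
  ref_genome = c("hg38", "hg38"),  # assumed label
  stringsAsFactors = FALSE
)

# "add"-like behaviour: keep every input row and append the symbol columns.
idx <- match(counts$feature, lookup$ensembl_id)
added <- cbind(counts, lookup[idx, c("transcript", "ref_genome")])
rownames(added) <- NULL

# "get"-like behaviour: keep only the non-redundant feature-level information.
got <- unique(added[, c("feature", "transcript", "ref_genome")])
```

Here `added` has one row per sample/feature pair, as in the input, while `got` has one row per feature. Features missing from the lookup table would receive `NA` in the new columns in this sketch.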

Examples




# This function was designed for data frames.
# Convert from SummarizedExperiment for this example. It is NOT recommended.

tidybulk::se_mini |> tidybulk() |> as_tibble() |> ensembl_to_symbol(.feature)
#> # A tibble: 2,635 × 11
#>    .feature .sample    count Cell.type time  condition  days  dead entrez
#>    <chr>    <chr>      <dbl> <chr>     <chr> <lgl>     <dbl> <dbl> <chr> 
#>  1 ABCB4    SRR1740034  1035 b_cell    0 d   TRUE          1     1 5244  
#>  2 ABCB9    SRR1740034    45 b_cell    0 d   TRUE          1     1 23457 
#>  3 ACAP1    SRR1740034  7151 b_cell    0 d   TRUE          1     1 9744  
#>  4 ACHE     SRR1740034     2 b_cell    0 d   TRUE          1     1 43    
#>  5 ACP5     SRR1740034  2278 b_cell    0 d   TRUE          1     1 54    
#>  6 ADAM28   SRR1740034 11156 b_cell    0 d   TRUE          1     1 10863 
#>  7 ADAMDEC1 SRR1740034    72 b_cell    0 d   TRUE          1     1 27299 
#>  8 ADAMTS3  SRR1740034     0 b_cell    0 d   TRUE          1     1 9508  
#>  9 ADRB2    SRR1740034   298 b_cell    0 d   TRUE          1     1 154   
#> 10 AIF1     SRR1740034     8 b_cell    0 d   TRUE          1     1 199   
#> # ℹ 2,625 more rows
#> # ℹ 2 more variables: transcript <chr>, ref_genome <chr>