Source: R/methods.R (documented in ensembl_to_symbol-methods.Rd)
ensembl_to_symbol() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient when abstracted to a tibble with library(tidySummarizedExperiment)), and returns an object consistent with (of the same type as) the input, with an additional transcript symbol column.
ensembl_to_symbol(.data, .ensembl, action = "add")
# S4 method for spec_tbl_df
ensembl_to_symbol(.data, .ensembl, action = "add")
# S4 method for tbl_df
ensembl_to_symbol(.data, .ensembl, action = "add")
# S4 method for tidybulk
ensembl_to_symbol(.data, .ensembl, action = "add")
.data: A `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient when abstracted to a tibble with library(tidySummarizedExperiment)).
.ensembl: A character string. The column that contains the Ensembl gene IDs.
action: A character string. Whether to join the new information to the input tbl ("add"), or to return only the non-redundant tbl with the new information ("get").
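The difference between the two `action` modes can be sketched in base R. This is a minimal illustration under stated assumptions, not tidybulk's implementation: the lookup table `mapping`, its `ensembl` and `transcript` columns, the placeholder IDs such as `"ENSG_A"`, and the helpers `add_symbol()` and `get_symbol()` are all hypothetical.

```r
# Minimal base-R sketch of the two `action` modes (hypothetical, not tidybulk's code).

# Hypothetical lookup table; placeholder IDs and symbols, not real annotation.
mapping <- data.frame(
  ensembl    = c("ENSG_A", "ENSG_B"),
  transcript = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Long-format counts: one row per sample-feature pair.
# "ENSG_X" has no entry in `mapping`, so its new column will be NA.
counts <- data.frame(
  sample  = rep(c("s1", "s2"), each = 3),
  feature = rep(c("ENSG_A", "ENSG_B", "ENSG_X"), times = 2),
  count   = c(10, 20, 5, 30, 40, 6),
  stringsAsFactors = FALSE
)

# action = "add": left-join the new column onto every input row,
# so the result keeps one row per sample-feature pair.
add_symbol <- function(data, mapping, id_col) {
  merge(data, mapping, by.x = id_col, by.y = "ensembl",
        all.x = TRUE, sort = FALSE)
}

# action = "get": keep only the non-redundant feature-level table
# (one row per distinct ID) with the new column attached.
get_symbol <- function(data, mapping, id_col) {
  ids <- unique(data[id_col])
  merge(ids, mapping, by.x = id_col, by.y = "ensembl",
        all.x = TRUE, sort = FALSE)
}

added <- add_symbol(counts, mapping, "feature")  # 6 rows, one per input row
got   <- get_symbol(counts, mapping, "feature")  # 3 rows, one per feature
print(got)
```

The "add" result keeps every sample-feature row, while the "get" result collapses to one row per ID; IDs without a match in the lookup table get NA, much like the NA columns in the example output.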
An object consistent with (of the same type as) the input, including additional columns for the transcript symbol.
This is useful because some resources use Ensembl IDs while others use gene symbols. At the moment, this works for human (genes and transcripts) and mouse (genes) data.
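library(dplyr)
library(tidybulk)
# This function was designed for data.frame input.
# The SummarizedExperiment is converted to a tibble for this example only; this is NOT recommended.
# counts_SE stores gene symbols (not Ensembl IDs) in .feature, so the added
# transcript and ref_genome columns are NA in the output below.
tidybulk::counts_SE |> tidybulk() |> as_tibble() |> ensembl_to_symbol(.feature)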
#> # A tibble: 408,624 × 10
#> .feature .sample count Cell.…¹ time condi…² batch facto…³ trans…⁴ ref_g…⁵
#> <chr> <chr> <dbl> <fct> <fct> <lgl> <fct> <lgl> <chr> <chr>
#> 1 A1BG SRR1740034 153 b_cell 0 d TRUE 0 TRUE NA NA
#> 2 A1BG-AS1 SRR1740034 83 b_cell 0 d TRUE 0 TRUE NA NA
#> 3 AAAS SRR1740034 868 b_cell 0 d TRUE 0 TRUE NA NA
#> 4 AACS SRR1740034 222 b_cell 0 d TRUE 0 TRUE NA NA
#> 5 AAGAB SRR1740034 590 b_cell 0 d TRUE 0 TRUE NA NA
#> 6 AAMDC SRR1740034 48 b_cell 0 d TRUE 0 TRUE NA NA
#> 7 AAMP SRR1740034 1257 b_cell 0 d TRUE 0 TRUE NA NA
#> 8 AANAT SRR1740034 284 b_cell 0 d TRUE 0 TRUE NA NA
#> 9 AAR2 SRR1740034 379 b_cell 0 d TRUE 0 TRUE NA NA
#> 10 AARS2 SRR1740034 685 b_cell 0 d TRUE 0 TRUE NA NA
#> # … with 408,614 more rows, and abbreviated variable names ¹Cell.type,
#> # ²condition, ³factor_of_interest, ⁴transcript, ⁵ref_genome