R/methods.R, R/methods_SE.R
remove_redundancy-methods.Rd
remove_redundancy() takes as input a `tbl` (with at least three columns for sample, feature and transcript abundance) or a `SummarizedExperiment` (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) for the correlation method, or a table with columns | <DIMENSION 1> | <DIMENSION 2> | <...> | for the reduced_dimensions method, and returns an object of the same class as the input with redundant elements (e.g., samples) dropped.
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column,
Dim_b_column,
log_transform = NULL
)
# S4 method for spec_tbl_df
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
# S4 method for tbl_df
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
# S4 method for tidybulk
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
# S4 method for SummarizedExperiment
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
# S4 method for RangedSummarizedExperiment
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
A `tbl` (with at least three columns for sample, feature and transcript abundance) or `SummarizedExperiment` (more convenient if abstracted to tibble with library(tidySummarizedExperiment))
The name of the element column (normally samples).
The name of the feature column (normally transcripts/genes).
The name of the column containing the numerical value the calculation is based on (normally transcript abundance).
A character string. The method to use; "correlation" and "reduced_dimensions" are available. The latter eliminates one sample from each of the closest pairs of samples in PCA-reduced dimensions.
A boolean. If the input is a tidybulk object, indicates whether the element column is the sample column (TRUE) or the transcript column (FALSE).
A real number between 0 and 1. The correlation threshold used by the correlation method.
An integer. How many top genes to select for the correlation method.
A function that will transform the counts. The default in the usage above is identity (no transformation); for RNA sequencing data, log1p is a common choice.
A character string. For the reduced_dimensions method. The column of one principal component.
A character string. For the reduced_dimensions method. The column of another principal component.
DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data)
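To make the role of `correlation_threshold` more concrete, here is a minimal base R sketch of correlation-based redundancy removal. It is an illustration only and is not tidybulk's implementation: the helper `drop_correlated()` and the toy matrix are hypothetical, and the greedy "keep the first sample of each correlated group" rule is an assumption made for the example.

```r
# Hypothetical helper (not part of tidybulk): keep a sample only if its
# correlation with every sample kept so far does not exceed the threshold.
drop_correlated <- function(mat, correlation_threshold = 0.9) {
  # mat: numeric matrix with features in rows and named samples in columns
  cors <- stats::cor(mat)
  kept <- character(0)
  for (s in colnames(mat)) {
    if (length(kept) == 0 || all(cors[s, kept] <= correlation_threshold)) {
      kept <- c(kept, s)
    }
  }
  mat[, kept, drop = FALSE]
}

# Toy data: s2 is a near copy of s1, s3 is unrelated to both
toy <- cbind(
  s1 = c(1, 2, 3, 4, 5, 6),
  s2 = c(1.1, 1.9, 3.2, 3.8, 5.1, 6.0),
  s3 = c(3, 1, 4, 1, 5, 2)
)

reduced <- drop_correlated(toy, correlation_threshold = 0.9)
colnames(reduced)  # s1 and s3 are kept; s2 is dropped as redundant
```

Lowering the threshold makes the filter stricter, so more samples are treated as redundant.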
A tbl object with redundant elements (e.g., samples) dropped.
A `SummarizedExperiment` object
tidybulk::se_mini |>
identify_abundant() |>
remove_redundancy(
.element = sample,
.feature = transcript,
.abundance = count,
method = "correlation"
)
#> No group or design set. Assuming all samples belong to one group.
#> Getting the 182 most variable genes
#> class: SummarizedExperiment
#> dim: 527 4
#> metadata(0):
#> assays(1): count
#> rownames(527): ABCB4 ABCB9 ... ZNF324 ZNF442
#> rowData names(2): entrez .abundant
#> colnames(4): SRR1740035 SRR1740043 SRR1740058 SRR1740067
#> colData names(5): Cell.type time condition days dead
counts.MDS <-
tidybulk::se_mini |>
identify_abundant() |>
reduce_dimensions(method = "MDS", .dims = 3)
#> No group or design set. Assuming all samples belong to one group.
#> Getting the 182 most variable genes
#> tidybulk says: to access the raw results do `attr(..., "internals")$MDS`
remove_redundancy(
counts.MDS,
Dim_a_column = `Dim1`,
Dim_b_column = `Dim2`,
.element = sample,
method = "reduced_dimensions"
)
#> class: SummarizedExperiment
#> dim: 527 3
#> metadata(0):
#> assays(1): count
#> rownames(527): ABCB4 ABCB9 ... ZNF324 ZNF442
#> rowData names(2): entrez .abundant
#> colnames(3): SRR1740035 SRR1740058 SRR1740067
#> colData names(8): Cell.type time ... Dim2 Dim3
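The idea behind the reduced_dimensions method, dropping one member of the closest pair of samples in two reduced dimensions, can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `drop_closest()` and the toy coordinates are hypothetical, and which member of the pair gets dropped is an arbitrary choice here.

```r
# Hypothetical helper (not part of tidybulk): remove one sample from the
# closest pair of samples in a two-dimensional embedding.
drop_closest <- function(coords) {
  # coords: numeric matrix, one named row per sample, two columns
  d <- as.matrix(stats::dist(coords))  # pairwise Euclidean distances
  diag(d) <- Inf                       # ignore self-distances
  pair <- which(d == min(d), arr.ind = TRUE)
  # Drop the sample in the row position of the first match (arbitrary)
  coords[-pair[1, "row"], , drop = FALSE]
}

# Toy embedding: A and B nearly overlap, C and D are far away
coords <- rbind(A = c(0, 0), B = c(0.1, 0), C = c(5, 5), D = c(-4, 3))
colnames(coords) <- c("Dim1", "Dim2")

remaining <- drop_closest(coords)
rownames(remaining)  # A, C and D remain; B is dropped
```

In the example output above, the sample count goes from 4 to 3, which is consistent with one member of a close pair being removed.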