arrange() orders the rows of a data frame by the values of selected columns.
Unlike other dplyr verbs, arrange() largely ignores grouping; you need to explicitly mention grouping variables (or use .by_group = TRUE) in order to group by them, and functions of variables are evaluated once per data frame, not once per group.
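The effect of .by_group can be seen on a small made-up tibble. This is a minimal sketch using plain dplyr (assumed to be version 1.0 or later) rather than a tidySingleCellExperiment object; the columns g and x are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: a grouping column g and a value column x
df <- tibble(g = c("b", "a", "b", "a"), x = c(1, 4, 3, 2))

# arrange() ignores the grouping: rows are sorted by x alone
by_x <- df %>% group_by(g) %>% arrange(x)
by_x$x              # 1 2 3 4

# .by_group = TRUE sorts by the grouping variable first, then by x
by_group_then_x <- df %>% group_by(g) %>% arrange(x, .by_group = TRUE)
by_group_then_x$g   # "a" "a" "b" "b"
by_group_then_x$x   # 2 4 1 3
```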
filter() retains the rows where the conditions you provide are TRUE. Note that, unlike base subsetting with [, rows where the condition evaluates to NA are dropped.
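A minimal sketch of the NA difference, using a made-up one-column data frame and plain dplyr:

```r
library(dplyr)

df <- data.frame(x = c(1, NA, 3))

# filter() treats an NA condition like FALSE and drops the row
kept_dplyr <- filter(df, x > 1)
nrow(kept_dplyr)   # 1

# Base [ with a logical index turns the NA into a row of missing values
kept_base <- df[df$x > 1, , drop = FALSE]
nrow(kept_base)    # 2
```

To keep rows where the condition is NA, say so explicitly, for example filter(df, x > 1 | is.na(x)).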
Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
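A short sketch of how grouping is added, replaced and removed, on a made-up tibble with hypothetical columns g, h and v. It also uses the .add argument described in the argument list, which assumes dplyr 1.0 or later.

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), h = c("x", "y", "x"), v = 1:3)

grouped <- group_by(df, g)
group_vars(grouped)              # "g"

# By default a second group_by() replaces the existing groups...
replaced <- group_by(grouped, h)
group_vars(replaced)             # "h"

# ...whereas .add = TRUE appends to them
added <- group_by(grouped, h, .add = TRUE)
group_vars(added)                # "g" "h"

# ungroup() removes all grouping
group_vars(ungroup(added))       # character(0)
```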
summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
summarise() and summarize() are synonyms.
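A minimal sketch contrasting the ungrouped and grouped cases, using plain dplyr on a made-up tibble with hypothetical columns g and x:

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 6))

# No grouping: a single row summarising every observation
overall <- summarise(df, mean_x = mean(x))
overall$mean_x      # 3

# One grouping variable: one row per group, plus the grouping column
per_group <- df %>% group_by(g) %>% summarise(mean_x = mean(x))
names(per_group)    # "g" "mean_x"
per_group$mean_x    # 1.5 6

# summarize() is the same function under another spelling
same <- summarize(df, mean_x = mean(x))
```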
mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. Variables can be removed by setting their value to NULL.
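The three rules above (add, overwrite, remove with NULL) in a minimal sketch on a made-up tibble, using plain dplyr:

```r
library(dplyr)

df <- tibble(x = 1:3, y = c(10, 20, 30))

# mutate(): adds z, keeps x, and drops y because it is set to NULL
mutated <- mutate(df, z = x * 2, y = NULL)
names(mutated)       # "x" "z"

# transmute(): keeps only the columns it creates
transmuted <- transmute(df, z = x * 2)
names(transmuted)    # "z"

# A new variable with an existing name overwrites it in place
overwritten <- mutate(df, y = y / 10)
overwritten$y        # 1 2 3
```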
Rename individual variables using new_name = old_name syntax.
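A one-call sketch of the new_name = old_name form on a made-up tibble (column names a and b are hypothetical):

```r
library(dplyr)

df <- tibble(a = 1, b = 2)

# Only the named column changes; column order is preserved
renamed <- rename(df, alpha = a)
names(renamed)   # "alpha" "b"
```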
See this repository for alternative ways to perform row-wise operations.
slice() lets you index rows by their (integer) locations. It allows you to select, remove, and duplicate rows. It is accompanied by a number of helpers for common use cases:
slice_head() and slice_tail() select the first or last rows.
slice_sample() randomly selects rows.
slice_min() and slice_max() select rows with the lowest or highest values of a variable.
If .data is a grouped_df, the operation will be performed on each group, so that (e.g.) slice_head(df, n = 5) will select the first five rows in each group.
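A minimal sketch of slice() and two of its helpers, ungrouped and grouped, on a made-up tibble with hypothetical columns g and x (plain dplyr, assumed 1.0 or later):

```r
library(dplyr)

df <- tibble(g = c("a", "a", "a", "b", "b"), x = c(3, 1, 2, 5, 4))

dup       <- slice(df, c(1, 1, 4))   # rows can be repeated
dropped   <- slice(df, -1)           # negative indices remove rows
first_two <- slice_head(df, n = 2)   # first two rows overall

# On a grouped data frame the helpers work within each group
lowest          <- df %>% group_by(g) %>% slice_min(x, n = 1)
first_per_group <- df %>% group_by(g) %>% slice_head(n = 1)

dup$x              # 3 3 5
lowest$x           # 1 4
first_per_group$x  # 3 5
```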
Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right). You can also use predicate functions like is.numeric to select variables based on their properties.
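A minimal sketch of the three selection styles on a made-up tibble. It assumes dplyr 1.0 or later, where predicate functions are expected to be wrapped in where(); passing a bare is.numeric is deprecated in recent tidyselect versions.

```r
library(dplyr)

df <- tibble(a = 1, b = "text", c = 2, d = 3)

names(select(df, a:c))                # "a" "b" "c"
names(select(df, where(is.numeric)))  # "a" "c" "d"
names(select(df, first = a, d))       # "first" "d"  (rename while selecting)
```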
sample_n() and sample_frac() have been superseded in favour of slice_sample(). While they will not be deprecated in the near future, retirement means that we will only perform critical bug fixes, so we recommend moving to the newer alternative.
These functions were superseded because we realised it was more convenient to have two mutually exclusive arguments to one function, rather than two separate functions. This also made it possible to clean up a few other smaller design issues with sample_n()/sample_frac():
The connection to slice() was not obvious.
The name of the first argument, tbl, is inconsistent with other single table verbs which use .data.
The size argument uses tidy evaluation, which is surprising and undocumented.
It was easier to remove the deprecated .env argument.
... was in a suboptimal position.
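A minimal sketch of the translation from the superseded functions to slice_sample(), using a made-up ten-row data frame; the seed is only there to make the random draw reproducible.

```r
library(dplyr)

df <- data.frame(x = 1:10)
set.seed(42)

# sample_n(df, 3) becomes slice_sample(df, n = 3)
three_rows <- slice_sample(df, n = 3)

# sample_frac(df, 0.5) becomes slice_sample(df, prop = 0.5)
half_rows <- slice_sample(df, prop = 0.5)

# replace = TRUE allows the same row to be drawn more than once
with_replacement <- slice_sample(df, n = 20, replace = TRUE)
```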
count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()). Supply wt to perform weighted counts, switching the summary from n = n() to n = sum(wt).
add_count() and add_tally() are equivalents to count() and tally() but use mutate() instead of summarise() so that they add a new column with group-wise counts.
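The counting helpers side by side, as a minimal sketch on a made-up tibble with a hypothetical grouping column g and weight column w:

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), w = c(1, 2, 5))

counts   <- count(df, g)                     # n = 2, 1: rows per group
weighted <- count(df, g, wt = w)             # n = 3, 5: sum(w) per group
tallied  <- df %>% group_by(g) %>% tally()   # same n as count(df, g)
added    <- add_count(df, g)                 # keeps all 3 rows, n = 2, 2, 1
```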
pull() is similar to $. It is mostly useful because it looks a little nicer in pipes; it also works with remote data frames, and it can optionally name the output.
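A minimal sketch of pull() on a made-up tibble (columns id and x are hypothetical); the name argument assumes dplyr 1.0 or later.

```r
library(dplyr)

df <- tibble(id = c("c1", "c2"), x = c(10, 20))

v        <- pull(df, x)             # same as df$x
named    <- pull(df, x, name = id)  # values of x, named by id
last_col <- pull(df)                # with no var, pulls the last column

names(named)   # "c1" "c2"
```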
bind_rows(..., .id = NULL, add.cell.ids = NULL)
bind_cols(..., .id = NULL)
For use by methods.
Data frame identifier.
When .id is supplied, a new column of identifiers is created to link each row to its original data frame. The labels are taken from the named arguments to bind_rows(). When a list of data frames is supplied, the labels are taken from the names of the list. If no names are found a numeric sequence is used instead.
From SingleCellExperiment 3.0. A character vector of length(x = c(x, y)). Appends the corresponding values to the start of each object's cell names.
If TRUE, will sort first by grouping variable. Applies to grouped data frames only.
If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values. (See dplyr)
When FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise it is kept as is.
When FALSE, the default, group_by() will override existing groups. To add to the existing groups, use .add = TRUE.
This argument was previously called add, but that prevented creating a new grouping variable called add, and conflicts with our naming conventions.
Input data frame.
tbls to join. (See dplyr)
A character vector of variables to join by. (See dplyr)
If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. (See dplyr)
If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2. (See dplyr)
A data.frame.
<tidy-select>
For sample_n(), the number of rows to select. For sample_frac(), the fraction of rows to select. If tbl is grouped, size applies to each group.
Sample with or without replacement?
<tidy-select> Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.
DEPRECATED.
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).
<data-masking> Frequency weights. Can be NULL or a variable:
If NULL (the default), counts the number of rows in each group.
If a variable, computes sum(wt) for each group.
If TRUE, will show the largest groups at the top.
For count(): if FALSE, will include counts for empty groups (i.e. for levels of factors that don't exist in the data). Deprecated in add_count() since it didn't actually affect the output.
An optional parameter that specifies the column to be used as names for a named vector. Specified in a similar manner as var.
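The .id behaviour described above can be sketched with dplyr's own bind_rows() on two made-up data frames. This calls the plain dplyr function, not the tidySingleCellExperiment method, so add.cell.ids is not involved.

```r
library(dplyr)

a <- data.frame(x = 1)
b <- data.frame(x = 2)

# Labels come from the argument names...
named <- bind_rows(first = a, second = b, .id = "source")
named$source     # "first" "second"

# ...and an unnamed list falls back to a numeric sequence
unnamed <- bind_rows(list(a, b), .id = "source")
unnamed$source   # "1" "2"
```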
An object of the same type as .data.
All rows appear in the output, but (usually) in a different place.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.
A tidySingleCellExperiment object
An object of the same type as .data.
Rows are a subset of the input, but appear in the same order.
Columns are not modified.
The number of groups may be reduced (if .preserve is not TRUE).
Data frame attributes are preserved.
A grouped data frame, unless the combination of ... and .add yields an empty set of grouping columns, in which case a regular (ungrouped) data frame is returned.
An object usually of the same type as .data.
The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
If x is grouped by more than one variable, the output will be another grouped_df with the right-most group removed.
If x is grouped by one variable, or is not grouped, the output will be a tibble.
Data frame attributes are not preserved, because summarise() fundamentally creates a new data frame.
An object of the same type as .data.
For mutate():
Rows are not affected.
Existing columns will be preserved unless explicitly modified.
New columns will be added to the right of existing columns.
Columns given value NULL will be removed.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.
For transmute():
Rows are not affected.
Apart from grouping variables, existing columns will be removed unless explicitly kept.
Column order matches order of expressions.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.
An object of the same type as .data.
Rows are not affected.
Column names are changed; column order is preserved.
Data frame attributes are preserved.
Groups are updated to reflect new names.
A tbl
A tbl
A tidySingleCellExperiment object
A tidySingleCellExperiment object
A tidySingleCellExperiment object
A tidySingleCellExperiment object
An object of the same type as .data. The output has the following properties:
Each row may appear 0, 1, or many times in the output.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.
An object of the same type as .data. The output has the following properties:
Rows are not affected.
Output columns are a subset of input columns, potentially with a different order. Columns will be renamed if new_name = old_name form is used.
Data frame attributes are preserved.
Groups are maintained; you can't select off grouping variables.
A tidySingleCellExperiment object
An object of the same type as .data. count() and add_count() group transiently, so the output has the same groups as the input.
A vector the same size as .data.
The sort order for character vectors will depend on the collating sequence of the locale in use: see locales().
Unlike base sorting with sort(), NA values are:
always sorted to the end for local data, even when wrapped with desc().
treated differently for remote data, depending on the backend.
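A minimal sketch of the NA placement for local data, compared with base sort(), on a made-up data frame:

```r
library(dplyr)

df <- data.frame(x = c(2, NA, 1))

arrange(df, x)$x        # 1 2 NA
arrange(df, desc(x))$x  # 2 1 NA  (NA still last)
sort(df$x)              # 1 2     (base sort() drops NA by default)
```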
dplyr is not yet smart enough to optimise the filtering operation on grouped datasets that don't need grouped calculations. For this reason, filtering is often considerably faster on ungroup()ed data.
rowwise() is used for the results of do() when you create list-variables. It is also useful to support arbitrarily complex operations that need to be applied to each row.
Currently, rowwise grouping only works with data frames. Its main impact is to allow you to work with list-variables in summarise() and mutate() without having to use [[1]]. This makes summarise() on a rowwise tbl effectively equivalent to plyr::ldply().
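A minimal sketch of both uses described above, on made-up tibbles and assuming dplyr 1.0 or later: a per-row computation, and a list-column whose elements are used directly without [[1]].

```r
library(dplyr)

df <- tibble(a = c(1, 5), b = c(4, 2))

# Without rowwise(), max() sees whole columns and returns 5 for every row
colwise <- mutate(df, m = max(a, b))

# With rowwise(), max() is evaluated once per row
rowwise_max <- df %>% rowwise() %>% mutate(m = max(a, b)) %>% ungroup()

# List-column: each row's element is passed to sum() directly
lists <- tibble(x = list(1:3, 4:6)) %>%
  rowwise() %>%
  mutate(total = sum(x)) %>%
  ungroup()

rowwise_max$m   # 4 5
lists$total     # 6 15
```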
Slice does not work with relational databases because they have no intrinsic notion of row order. If you want to perform the equivalent operation, use filter() and row_number().
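On a local data frame the two forms select the same rows, which makes the equivalence easy to check. This is a minimal sketch on made-up data; how row_number() translates for a particular database backend is not shown here.

```r
library(dplyr)

df <- data.frame(x = c(10, 20, 30, 40))

via_slice  <- slice(df, 2:3)
via_filter <- filter(df, row_number() %in% 2:3)

via_filter$x   # 20 30
```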
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages:
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
slice(): no methods found.
slice_head(): no methods found.
slice_tail(): no methods found.
slice_min(): no methods found.
slice_max(): no methods found.
slice_sample(): no methods found.
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:
The former keeps rows with mass greater than the global average whereas the latter keeps rows with mass greater than the gender average.
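A minimal sketch of the contrast on a made-up four-row tibble; the gender and mass values are hypothetical, not real data.

```r
library(dplyr)

df <- tibble(gender = c("f", "f", "m", "m"), mass = c(50, 60, 80, 100))

# Ungrouped: compare against the global mean (72.5)
ungrouped <- filter(df, mass > mean(mass))

# Grouped: compare against each gender's own mean (55 and 90)
grouped <- df %>% group_by(gender) %>% filter(mass > mean(mass))

ungrouped$mass   # 80 100
grouped$mass     # 60 100
```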
Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate:
With the grouped equivalent:
The former normalises mass by the global average whereas the latter normalises by the averages within gender levels.
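A minimal sketch of the same normalisation done globally and within groups, on a made-up tibble with hypothetical gender and mass values:

```r
library(dplyr)

df <- tibble(gender = c("f", "f", "m", "m"), mass = c(40, 60, 80, 120))

# Ungrouped: every mass is divided by the global mean (75)
global <- mutate(df, rel = mass / mean(mass))

# Grouped: each mass is divided by its own group's mean (50 and 100)
within <- df %>%
  group_by(gender) %>%
  mutate(rel = mass / mean(mass)) %>%
  ungroup()

within$rel   # 0.8 1.2 0.8 1.2
```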
The data frame backend supports creating a variable and using it in the same summary. This means that previously created summary variables can be further transformed or combined within the summary, as in mutate().
However, it also means that summary variables with the same names as previous variables overwrite them, making those variables unavailable to later summary variables.
This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.
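A minimal sketch of both sides of this behaviour on a made-up data frame: reusing an earlier summary, and the pitfall of overwriting an input column.

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3, 6))

# A summary can reuse one created earlier in the same call
chained <- summarise(df, m = mean(x), m2 = m * 2)
chained$m2     # 6

# Reusing the input name: sd() now sees the single summarised value
clobbered <- summarise(df, x = mean(x), s = sd(x))
clobbered$s    # NA

# Fresh names keep the original column available to later summaries
safe <- summarise(df, mean_x = mean(x), sd_x = sd(x))
```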
Use the three scoped variants (rename_all(), rename_if(), rename_at()) to rename a set of variables with a function.
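A minimal sketch of the scoped variants on a made-up tibble. In dplyr 1.0 and later they are superseded by rename_with(), shown last for comparison.

```r
library(dplyr)

df <- tibble(a = 1, b = 2, c = "x")

names(rename_all(df, toupper))                           # "A" "B" "C"
names(rename_if(df, is.numeric, toupper))                # "A" "B" "c"
names(rename_at(df, vars(a), toupper))                   # "A" "b" "c"
names(rename_with(df, toupper, .cols = where(is.numeric)))  # "A" "B" "c"
```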
See also filter_all(), filter_if() and filter_at().
`%>%` <- magrittr::`%>%`
pbmc_small %>%
arrange(nFeature_RNA)
#> # A SingleCellExperiment-tibble abstraction: 80 × 17
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 CATG… SeuratPro… 51 26 0 A g2
#> 2 GGCA… SeuratPro… 172 29 0 A g1
#> 3 AGTC… SeuratPro… 157 29 0 A g1
#> 4 GACG… SeuratPro… 202 30 0 A g2
#> 5 GGAA… SeuratPro… 150 30 0 A g2
#> 6 AGGT… SeuratPro… 62 31 0 A g2
#> 7 CTTC… SeuratPro… 41 32 0 A g2
#> 8 GTAA… SeuratPro… 67 33 0 A g2
#> 9 GTCA… SeuratPro… 210 33 0 A g2
#> 10 TGGT… SeuratPro… 64 36 0 A g1
#> # ℹ 70 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> # tSNE_2 <dbl>
`%>%` <- magrittr::`%>%`
pbmc_small %>%
distinct(groups)
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 2 × 1
#> groups
#> <chr>
#> 1 g2
#> 2 g1
`%>%` <- magrittr::`%>%`
pbmc_small %>%
filter(groups == "g1")
#> # A SingleCellExperiment-tibble abstraction: 44 × 17
#> # Features=230 | Cells=44 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 CATG… SeuratPro… 85 52 0 A g1
#> 2 TCTG… SeuratPro… 70 48 0 A g1
#> 3 TGGT… SeuratPro… 64 36 0 A g1
#> 4 GCAG… SeuratPro… 72 45 0 A g1
#> 5 GATA… SeuratPro… 52 36 0 A g1
#> 6 AATG… SeuratPro… 100 41 0 A g1
#> 7 AGAG… SeuratPro… 191 61 0 A g1
#> 8 CTAA… SeuratPro… 168 44 0 A g1
#> 9 TTGG… SeuratPro… 135 45 0 A g1
#> 10 CATC… SeuratPro… 79 43 0 A g1
#> # ℹ 34 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> # tSNE_2 <dbl>
# Learn more in ?dplyr_tidy_eval
`%>%` <- magrittr::`%>%`
pbmc_small %>%
group_by(groups)
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 80 × 31
#> # Groups: groups [2]
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 ATGC… SeuratPro… 70 47 0 A g2
#> 2 CATG… SeuratPro… 85 52 0 A g1
#> 3 GAAC… SeuratPro… 87 50 1 B g2
#> 4 TGAC… SeuratPro… 127 56 0 A g2
#> 5 AGTC… SeuratPro… 173 53 0 A g2
#> 6 TCTG… SeuratPro… 70 48 0 A g1
#> 7 TGGT… SeuratPro… 64 36 0 A g1
#> 8 GCAG… SeuratPro… 72 45 0 A g1
#> 9 GATA… SeuratPro… 52 36 0 A g1
#> 10 AATG… SeuratPro… 100 41 0 A g1
#> # ℹ 70 more rows
#> # ℹ 24 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, PC_6 <dbl>,
#> # PC_7 <dbl>, PC_8 <dbl>, PC_9 <dbl>, PC_10 <dbl>, PC_11 <dbl>, PC_12 <dbl>,
#> # PC_13 <dbl>, PC_14 <dbl>, PC_15 <dbl>, PC_16 <dbl>, PC_17 <dbl>,
#> # PC_18 <dbl>, PC_19 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>
`%>%` <- magrittr::`%>%`
pbmc_small %>%
summarise(mean(nCount_RNA))
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 1 × 1
#> `mean(nCount_RNA)`
#> <dbl>
#> 1 245.
`%>%` <- magrittr::`%>%`
pbmc_small %>%
mutate(nFeature_RNA=1)
#> # A SingleCellExperiment-tibble abstraction: 80 × 17
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <dbl> <fct> <fct> <chr>
#> 1 ATGC… SeuratPro… 70 1 0 A g2
#> 2 CATG… SeuratPro… 85 1 0 A g1
#> 3 GAAC… SeuratPro… 87 1 1 B g2
#> 4 TGAC… SeuratPro… 127 1 0 A g2
#> 5 AGTC… SeuratPro… 173 1 0 A g2
#> 6 TCTG… SeuratPro… 70 1 0 A g1
#> 7 TGGT… SeuratPro… 64 1 0 A g1
#> 8 GCAG… SeuratPro… 72 1 0 A g1
#> 9 GATA… SeuratPro… 52 1 0 A g1
#> 10 AATG… SeuratPro… 100 1 0 A g1
#> # ℹ 70 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> # tSNE_2 <dbl>
`%>%` <- magrittr::`%>%`
pbmc_small %>%
rename(s_score=nFeature_RNA)
#> # A SingleCellExperiment-tibble abstraction: 80 × 17
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA s_score RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 ATGCCAGAA… SeuratPro… 70 47 0 A g2
#> 2 CATGGCCTG… SeuratPro… 85 52 0 A g1
#> 3 GAACCTGAT… SeuratPro… 87 50 1 B g2
#> 4 TGACTGGAT… SeuratPro… 127 56 0 A g2
#> 5 AGTCAGACT… SeuratPro… 173 53 0 A g2
#> 6 TCTGATACA… SeuratPro… 70 48 0 A g1
#> 7 TGGTATCTA… SeuratPro… 64 36 0 A g1
#> 8 GCAGCTCTG… SeuratPro… 72 45 0 A g1
#> 9 GATATAACA… SeuratPro… 52 36 0 A g1
#> 10 AATGTTGAC… SeuratPro… 100 41 0 A g1
#> # ℹ 70 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> # tSNE_2 <dbl>
`%>%` <- magrittr::`%>%`
tt <- pbmc_small
tt %>% left_join(tt %>% distinct(groups) %>% mutate(new_column=1:2))
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> Joining with `by = join_by(groups)`
#> # A SingleCellExperiment-tibble abstraction: 80 × 18
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 ATGC… SeuratPro… 70 47 0 A g2
#> 2 CATG… SeuratPro… 85 52 0 A g1
#> 3 GAAC… SeuratPro… 87 50 1 B g2
#> 4 TGAC… SeuratPro… 127 56 0 A g2
#> 5 AGTC… SeuratPro… 173 53 0 A g2
#> 6 TCTG… SeuratPro… 70 48 0 A g1
#> 7 TGGT… SeuratPro… 64 36 0 A g1
#> 8 GCAG… SeuratPro… 72 45 0 A g1
#> 9 GATA… SeuratPro… 52 36 0 A g1
#> 10 AATG… SeuratPro… 100 41 0 A g1
#> # ℹ 70 more rows
#> # ℹ 11 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # new_column <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
#> # PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>
`%>%` <- magrittr::`%>%`
tt <- pbmc_small
tt %>% inner_join(tt %>% distinct(groups) %>% mutate(new_column=1:2) %>% slice(1))
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> Joining with `by = join_by(groups)`
#> # A SingleCellExperiment-tibble abstraction: 36 × 18
#> # Features=230 | Cells=36 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 ATGC… SeuratPro… 70 47 0 A g2
#> 2 GAAC… SeuratPro… 87 50 1 B g2
#> 3 TGAC… SeuratPro… 127 56 0 A g2
#> 4 AGTC… SeuratPro… 173 53 0 A g2
#> 5 AGGT… SeuratPro… 62 31 0 A g2
#> 6 GGGT… SeuratPro… 101 41 0 A g2
#> 7 CATG… SeuratPro… 51 26 0 A g2
#> 8 TACG… SeuratPro… 99 45 0 A g2
#> 9 GTAA… SeuratPro… 67 33 0 A g2
#> 10 TACA… SeuratPro… 109 41 0 A g2
#> # ℹ 26 more rows
#> # ℹ 11 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # new_column <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
#> # PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>
`%>%` <- magrittr::`%>%`
tt <- pbmc_small
tt %>% right_join(tt %>% distinct(groups) %>% mutate(new_column=1:2) %>% slice(1))
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> Joining with `by = join_by(groups)`
#> # A SingleCellExperiment-tibble abstraction: 36 × 18
#> # Features=230 | Cells=36 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 ATGC… SeuratPro… 70 47 0 A g2
#> 2 GAAC… SeuratPro… 87 50 1 B g2
#> 3 TGAC… SeuratPro… 127 56 0 A g2
#> 4 AGTC… SeuratPro… 173 53 0 A g2
#> 5 AGGT… SeuratPro… 62 31 0 A g2
#> 6 GGGT… SeuratPro… 101 41 0 A g2
#> 7 CATG… SeuratPro… 51 26 0 A g2
#> 8 TACG… SeuratPro… 99 45 0 A g2
#> 9 GTAA… SeuratPro… 67 33 0 A g2
#> 10 TACA… SeuratPro… 109 41 0 A g2
#> # ℹ 26 more rows
#> # ℹ 11 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # new_column <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
#> # PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>
`%>%` <- magrittr::`%>%`
tt <- pbmc_small
tt %>% full_join(tibble::tibble(groups="g1", other=1:4))
#> Joining with `by = join_by(groups)`
#> tidySingleCellExperiment says: This operation lead to duplicated cell names. A data frame is returned for independent data analysis.
#> # A tibble: 212 × 32
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 ATGC… SeuratPro… 70 47 0 A g2
#> 2 CATG… SeuratPro… 85 52 0 A g1
#> 3 CATG… SeuratPro… 85 52 0 A g1
#> 4 CATG… SeuratPro… 85 52 0 A g1
#> 5 CATG… SeuratPro… 85 52 0 A g1
#> 6 GAAC… SeuratPro… 87 50 1 B g2
#> 7 TGAC… SeuratPro… 127 56 0 A g2
#> 8 AGTC… SeuratPro… 173 53 0 A g2
#> 9 TCTG… SeuratPro… 70 48 0 A g1
#> 10 TCTG… SeuratPro… 70 48 0 A g1
#> # ℹ 202 more rows
#> # ℹ 25 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, PC_6 <dbl>,
#> # PC_7 <dbl>, PC_8 <dbl>, PC_9 <dbl>, PC_10 <dbl>, PC_11 <dbl>, PC_12 <dbl>,
#> # PC_13 <dbl>, PC_14 <dbl>, PC_15 <dbl>, PC_16 <dbl>, PC_17 <dbl>,
#> # PC_18 <dbl>, PC_19 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>, other <int>
`%>%` <- magrittr::`%>%`
pbmc_small %>%
slice(1)
#> # A SingleCellExperiment-tibble abstraction: 1 × 17
#> # Features=230 | Cells=1 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 ATGCC… SeuratPro… 70 47 0 A g2
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> # tSNE_2 <dbl>
`%>%` <- magrittr::`%>%`
pbmc_small %>%
select(cell, orig.ident)
#> Warning: tidySingleCellExperiment says: from version 1.3.1, the special columns including cell id (colnames(se)) has changed to ".cell". This dataset is returned with the old-style vocabulary (cell), however we suggest to update your workflow to reflect the new vocabulary (.cell)
#> # A SingleCellExperiment-tibble abstraction: 80 × 9
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#> cell orig.ident PC_1 PC_2 PC_3 PC_4 PC_5 tSNE_1 tSNE_2
#> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ATGCCAGAACGACT SeuratProj… -0.774 -0.900 -0.249 0.559 0.465 0.868 -8.10
#> 2 CATGGCCTGTGCAT SeuratProj… -0.0260 -0.347 0.665 0.418 0.585 -7.39 -8.77
#> 3 GAACCTGATGAACC SeuratProj… -0.457 0.180 1.32 2.01 -0.482 -28.2 0.241
#> 4 TGACTGGATTCTCA SeuratProj… -0.812 -1.38 -1.00 0.139 -1.60 16.3 -11.2
#> 5 AGTCAGACTGCACA SeuratProj… -0.774 -0.900 -0.249 0.559 0.465 1.91 -11.2
#> 6 TCTGATACACGTGT SeuratProj… -0.774 -0.900 -0.249 0.559 0.465 3.15 -9.94
#> 7 TGGTATCTAAACAG SeuratProj… -0.460 -1.19 -0.312 0.716 -1.65 17.9 -9.90
#> 8 GCAGCTCTGTTTCT SeuratProj… -0.900 -0.388 0.693 0.404 0.536 -6.49 -8.39
#> 9 GATATAACACGCAT SeuratProj… -0.774 -0.900 -0.249 0.559 0.465 1.33 -9.68
#> 10 AATGTTGACAGTCA SeuratProj… -0.488 -1.16 -0.306 0.702 -1.47 17.0 -9.43
#> # ℹ 70 more rows
`%>%` <- magrittr::`%>%`
pbmc_small %>%
sample_n(50)
#> # A SingleCellExperiment-tibble abstraction: 50 × 17
#> # Features=230 | Cells=50 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 AGTC… SeuratPro… 157 29 0 A g1
#> 2 CTAA… SeuratPro… 189 53 0 A g1
#> 3 ACTC… SeuratPro… 231 49 1 B g2
#> 4 CTTC… SeuratPro… 41 32 0 A g2
#> 5 CATG… SeuratPro… 51 26 0 A g2
#> 6 TACA… SeuratPro… 109 41 0 A g2
#> 7 AGAG… SeuratPro… 191 61 0 A g1
#> 8 GCTC… SeuratPro… 139 61 0 A g2
#> 9 ACAG… SeuratPro… 151 59 0 A g1
#> 10 TGGT… SeuratPro… 64 36 0 A g1
#> # ℹ 40 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> # tSNE_2 <dbl>
pbmc_small %>%
sample_frac(0.1)
#> # A SingleCellExperiment-tibble abstraction: 8 × 17
#> # Features=230 | Cells=8 | Assays=counts, logcounts
#> .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#> <chr> <fct> <dbl> <int> <fct> <fct> <chr>
#> 1 ATTAC… SeuratPro… 463 77 1 B g1
#> 2 AGAGA… SeuratPro… 191 61 0 A g1
#> 3 CCATC… SeuratPro… 224 50 1 B g2
#> 4 TTGGT… SeuratPro… 135 45 0 A g1
#> 5 AATGC… SeuratPro… 389 73 1 B g1
#> 6 AAATT… SeuratPro… 327 62 1 B g2
#> 7 CATGC… SeuratPro… 443 81 0 A g1
#> 8 CTTGA… SeuratPro… 233 76 1 B g1
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> # tSNE_2 <dbl>
`%>%` <- magrittr::`%>%`
pbmc_small %>%
count(groups)
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 2 × 2
#> groups n
#> <chr> <int>
#> 1 g1 44
#> 2 g2 36
`%>%` <- magrittr::`%>%`
pbmc_small %>%
pull(groups)
#> [1] "g2" "g1" "g2" "g2" "g2" "g1" "g1" "g1" "g1" "g1" "g2" "g1" "g2" "g2" "g2"
#> [16] "g1" "g2" "g1" "g1" "g2" "g1" "g1" "g2" "g2" "g1" "g2" "g2" "g2" "g2" "g1"
#> [31] "g1" "g1" "g1" "g2" "g1" "g1" "g2" "g1" "g1" "g2" "g1" "g2" "g2" "g2" "g1"
#> [46] "g2" "g1" "g2" "g1" "g2" "g1" "g2" "g2" "g2" "g1" "g1" "g1" "g1" "g2" "g1"
#> [61] "g1" "g1" "g1" "g1" "g1" "g2" "g2" "g1" "g1" "g1" "g2" "g1" "g2" "g2" "g1"
#> [76] "g1" "g2" "g1" "g2" "g1"