Arrange rows by column values

arrange() orders the rows of a data frame by the values of selected columns.

Unlike other dplyr verbs, arrange() largely ignores grouping; you need to explicitly mention grouping variables (or use .by_group = TRUE) in order to group by them, and functions of variables are evaluated once per data frame, not once per group.
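As a minimal sketch of the grouping behaviour described above, the following uses a small toy tibble rather than a SingleCellExperiment object. The column names groups and n_feat and their values are made up for illustration, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Toy stand-in for cell metadata (hypothetical columns and values)
df <- tibble(
  groups = c("g2", "g1", "g2", "g1"),
  n_feat = c(30, 50, 10, 20)
)

# arrange() ignores the grouping: rows are ordered by n_feat alone
ignoring_groups <- df %>%
  group_by(groups) %>%
  arrange(n_feat)

# .by_group = TRUE sorts by the grouping variable first, then by n_feat
within_groups <- df %>%
  group_by(groups) %>%
  arrange(n_feat, .by_group = TRUE)

ignoring_groups$n_feat
within_groups$n_feat
```

In the second result the g1 rows come first, each group being sorted internally.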

filter() retains the rows where the conditions you provide are TRUE. Note that, unlike base subsetting with [, rows where the condition evaluates to NA are dropped.
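The difference in NA handling can be illustrated with a short sketch on a plain data.frame. The columns id and mass are hypothetical; the same idea should carry over to other inputs, but only the local data frame case is shown here.

```r
library(dplyr)

# Hypothetical data with one missing value
df <- data.frame(id = 1:4, mass = c(10, NA, 30, 5))

# filter() drops the row where the condition evaluates to NA
kept <- df %>% filter(mass > 8)

# Base subsetting with [ keeps that position as a row filled with NA
base_subset <- df[df$mass > 8, ]

nrow(kept)
nrow(base_subset)
```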

Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.

summarise() and summarize() are synonyms.
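A small sketch of the row and column layout described above, assuming a toy tibble with a hypothetical grouping column groups and a numeric column n_count:

```r
library(dplyr)

df <- tibble(
  groups  = c("g1", "g1", "g2"),
  n_count = c(10, 20, 60)
)

# No grouping variables: a single row summarising all observations
overall <- df %>% summarise(mean_count = mean(n_count))

# One row per group, one column per grouping variable and per summary
per_group <- df %>%
  group_by(groups) %>%
  summarise(mean_count = mean(n_count), n = n())

overall
per_group
```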

mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. Variables can be removed by setting their value to NULL.
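The four behaviours in this paragraph (adding, overwriting, removing, and keeping only new variables) can be sketched as follows. The tibble and its columns a and b are invented for the example.

```r
library(dplyr)

df <- tibble(a = 1:3, b = c(2, 4, 6))

# New variable is appended; existing ones are preserved
added <- df %>% mutate(ratio = b / a)

# A new variable with an existing name overwrites it
overwritten <- df %>% mutate(b = b * 10)

# Setting a variable to NULL removes it
removed <- df %>% mutate(b = NULL)

# transmute() keeps only the variables it creates
only_new <- df %>% transmute(ratio = b / a)

names(added)
names(removed)
names(only_new)
```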

Rename individual variables using new_name=old_name syntax.

See this repository for alternative ways to perform row-wise operations.

slice() lets you index rows by their (integer) locations. It allows you to select, remove, and duplicate rows. It is accompanied by a number of helpers for common use cases:

slice_head() and slice_tail() select the first or last rows.
slice_sample() randomly selects rows.
slice_min() and slice_max() select rows with the lowest or highest values of a variable.

If .data is a grouped_df, the operation will be performed on each group, so that (e.g.) slice_head(df, n=5) will select the first five rows in each group.
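To make the per-group behaviour of the slice helpers concrete, here is a sketch on a toy tibble; the columns groups and value are hypothetical.

```r
library(dplyr)

df <- tibble(
  groups = c("g1", "g1", "g1", "g2", "g2"),
  value  = c(5, 1, 3, 9, 7)
)
grouped <- df %>% group_by(groups)

# First row of each group
first_per_group <- grouped %>% slice_head(n = 1)

# Row with the lowest value in each group
smallest_per_group <- grouped %>% slice_min(value, n = 1)

# Without grouping, the two highest values overall
largest_overall <- df %>% slice_max(value, n = 2)

first_per_group$value
smallest_per_group$value
largest_overall$value
```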

Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right). You can also use predicate functions like is.numeric to select variables based on their properties.

sample_n() and sample_frac() have been superseded in favour of slice_sample(). While they will not be deprecated in the near future, retirement means that we will only perform critical bug fixes, so we recommend moving to the newer alternative.

These functions were superseded because we realised it was more convenient to have two mutually exclusive arguments to one function, rather than two separate functions. This also made it possible to clean up a few other smaller design issues with sample_n()/sample_frac():

The connection to slice() was not obvious.
The name of the first argument, tbl, is inconsistent with other single table verbs which use .data.
The size argument uses tidy evaluation, which is surprising and undocumented.
It was easier to remove the deprecated .env argument.
... was in a suboptimal position.

count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n=n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n=n()). Supply wt to perform weighted counts, switching the summary from n=n() to n=sum(wt).

add_count() and add_tally() are equivalents to count() and tally() but use mutate() instead of summarise() so that they add a new column with group-wise counts.
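The equivalences described for count() and add_count() can be checked on a toy tibble. The column names a and w are placeholders chosen for this sketch.

```r
library(dplyr)

df <- tibble(a = c("x", "x", "y"), w = c(2, 3, 10))

# count() and the explicit group_by() + summarise() give the same counts
counted <- df %>% count(a)
by_hand <- df %>% group_by(a) %>% summarise(n = n())

# Supplying wt switches the summary to sum(wt)
weighted <- df %>% count(a, wt = w)

# add_count() keeps every row and adds the group-wise count as a column
with_counts <- df %>% add_count(a)

counted
weighted
with_counts
```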

pull() is similar to $. It's mostly useful because it looks a little nicer in pipes, it also works with remote data frames, and it can optionally name the output.
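A brief sketch of pull() on a toy tibble (columns cell and score are hypothetical), including the optional naming of the output:

```r
library(dplyr)

df <- tibble(cell = c("c1", "c2"), score = c(0.5, 1.5))

# Equivalent to df$score, but reads naturally at the end of a pipe
scores <- df %>% pull(score)

# The name argument takes another column to use as names
named_scores <- df %>% pull(score, name = cell)

scores
named_scores
```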

bind_rows(..., .id = NULL, add.cell.ids = NULL)

bind_cols(..., .id = NULL)

Arguments

...

For use by methods.

.id

Data frame identifier.

When .id is supplied, a new column of identifiers is created to link each row to its original data frame. The labels are taken from the named arguments to bind_rows(). When a list of data frames is supplied, the labels are taken from the names of the list. If no names are found a numeric sequence is used instead.
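The labelling rules for .id can be sketched with dplyr's bind_rows() on plain tibbles. This is an illustration under that assumption only; the method for SingleCellExperiment objects documented here may differ in detail. The names first, second, and source are made up.

```r
library(dplyr)

a <- tibble(x = 1:2)
b <- tibble(x = 3L)

# Labels taken from the argument names
labelled <- bind_rows(first = a, second = b, .id = "source")

# Unnamed list: a numeric sequence is used as the label
from_list <- bind_rows(list(a, b), .id = "source")

labelled
from_list
```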

add.cell.ids

From SingleCellExperiment 3.0: a character vector of length(x=c(x, y)). Appends the corresponding values to the start of each object's cell names.

.by_group

If TRUE, will sort first by grouping variable. Applies to grouped data frames only.

.keep_all

If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values. (See dplyr)

.preserve

When FALSE (the default), the grouping structure is recalculated based on the resulting data; otherwise it is kept as is.

.add

When FALSE, the default, group_by() will override existing groups. To add to the existing groups, use .add=TRUE.

This argument was previously called add, but that prevented creating a new grouping variable called add, and conflicts with our naming conventions.
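A short sketch of the difference between overriding and adding groups, using a toy tibble whose columns g1, g2, and v are invented for the example:

```r
library(dplyr)

df <- tibble(
  g1 = c("a", "a", "b"),
  g2 = c("x", "y", "x"),
  v  = 1:3
)

# Default (.add = FALSE): the second group_by() replaces the first
overridden <- df %>% group_by(g1) %>% group_by(g2)

# .add = TRUE: the new variable is added to the existing groups
added <- df %>% group_by(g1) %>% group_by(g2, .add = TRUE)

group_vars(overridden)
group_vars(added)
```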

.data

Input data frame.

y

tbls to join. (See dplyr)

by

A character vector of variables to join by. (See dplyr)

copy

If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. (See dplyr)

suffix

If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2. (See dplyr)

tbl

A data.frame.

size

<tidy-select> For sample_n(), the number of rows to select. For sample_frac(), the fraction of rows to select. If tbl is grouped, size applies to each group.

replace

Sample with or without replacement?

weight

<tidy-select> Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.

.env

DEPRECATED.

x

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

wt

<data-masking> Frequency weights. Can be NULL or a variable:

If NULL (the default), counts the number of rows in each group.
If a variable, computes sum(wt) for each group.

sort

If TRUE, will show the largest groups at the top.

.drop

For count(): if FALSE will include counts for empty groups (i.e. for levels of factors that don't exist in the data). Deprecated in add_count() since it didn't actually affect the output.

name

An optional parameter that specifies the column to be used as names for a named vector. Specified in a similar manner as var.

Value

An object of the same type as .data.

All rows appear in the output, but (usually) in a different place.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.

A tidySingleCellExperiment object

An object of the same type as .data.

Rows are a subset of the input, but appear in the same order.
Columns are not modified.
The number of groups may be reduced (if .preserve is not TRUE).
Data frame attributes are preserved.

A grouped data frame, unless the combination of ... and .add yields an empty set of grouping columns, in which case a regular (ungrouped) data frame is returned.

An object usually of the same type as .data.

The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
If x is grouped by more than one variable, the output will be another grouped_df with the right-most group removed.
If x is grouped by one variable, or is not grouped, the output will be a tibble.
Data frame attributes are not preserved, because summarise() fundamentally creates a new data frame.

An object of the same type as .data.

For mutate():

Rows are not affected.
Existing columns will be preserved unless explicitly modified.
New columns will be added to the right of existing columns.
Columns given value NULL will be removed.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.

For transmute():

Rows are not affected.
Apart from grouping variables, existing columns will be removed unless explicitly kept.
Column order matches order of expressions.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.

An object of the same type as .data.

Rows are not affected.
Column names are changed; column order is preserved.
Data frame attributes are preserved.
Groups are updated to reflect new names.

A tbl

A tidySingleCellExperiment object

An object of the same type as .data. The output has the following properties:

Each row may appear 0, 1, or many times in the output.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.

An object of the same type as .data. The output has the following properties:

Rows are not affected.
Output columns are a subset of input columns, potentially with a different order. Columns will be renamed if new_name=old_name form is used.
Data frame attributes are preserved.
Groups are maintained; you can't select off grouping variables.

A tidySingleCellExperiment object

An object of the same type as .data. count() and add_count() group transiently, so the output has the same groups as the input.

A vector the same size as .data.

Details

Locales

The sort order for character vectors will depend on the collating sequence of the locale in use: see locales().

Missing values

Unlike base sorting with sort(), NA are:

always sorted to the end for local data, even when wrapped with desc().
treated differently for remote data, depending on the backend.
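For local data, the placement of NA can be sketched as follows; the tibble and its column x are made up for illustration.

```r
library(dplyr)

df <- tibble(x = c(2, NA, 3, 1))

# NA is placed last in ascending order
ascending <- df %>% arrange(x)

# NA stays last even when the column is wrapped in desc()
descending <- df %>% arrange(desc(x))

ascending$x
descending$x
```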

dplyr is not yet smart enough to optimise filtering on grouped datasets that do not need grouped calculations. For this reason, filtering is often considerably faster on ungroup()ed data.

rowwise() is used for the results of do() when you create list-variables. It is also useful to support arbitrary complex operations that need to be applied to each row.

Currently, rowwise grouping only works with data frames. Its main impact is to allow you to work with list-variables in summarise() and mutate() without having to use [[1]]. This makes summarise() on a rowwise tbl effectively equivalent to plyr::ldply().

slice() does not work with relational databases because they have no intrinsic notion of row order. If you want to perform the equivalent operation, use filter() and row_number().
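The filter() and row_number() combination can be sketched on a local tibble, where it gives the same rows as slice(). The columns id and value are hypothetical, and this local example does not show the explicit ordering a database backend would likely also need.

```r
library(dplyr)

df <- tibble(id = 1:5, value = c(10, 20, 30, 40, 50))

# Positional selection with slice()
by_position <- df %>% slice(2:3)

# The same rows expressed with filter() and row_number()
by_filter <- df %>% filter(row_number() %in% 2:3)

by_filter
```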

Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages:

These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages:

slice(): no methods found.
slice_head(): no methods found.
slice_tail(): no methods found.
slice_min(): no methods found.
slice_max(): no methods found.
slice_sample(): no methods found.

The following methods are currently available in loaded packages: no methods found.

Useful filter functions

==, >, >= etc
&, |, !, xor()
is.na()
between(), near()

Grouped tibbles

Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare ungrouped filtering with the grouped equivalent: the former keeps rows with mass greater than the global average, whereas the latter keeps rows with mass greater than the gender average.
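The contrast between ungrouped and grouped filtering can be sketched on a toy tibble. The gender and mass values below are invented for the example and are not taken from any real dataset.

```r
library(dplyr)

df <- tibble(
  gender = c("f", "f", "m", "m"),
  mass   = c(50, 70, 80, 120)
)

# Ungrouped: mean(mass) is the global average (80)
global <- df %>% filter(mass > mean(mass))

# Grouped: mean(mass) is computed within each gender (60 and 100)
by_gender <- df %>%
  group_by(gender) %>%
  filter(mass > mean(mass))

global$mass
by_gender$mass
```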

Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare an ungrouped mutate with the grouped equivalent: the former normalises mass by the global average, whereas the latter normalises by the averages within gender levels.
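A sketch of the same contrast for mutate(), again on an invented tibble with hypothetical gender and mass values; the new column name rel is also a placeholder.

```r
library(dplyr)

df <- tibble(
  gender = c("f", "f", "m", "m"),
  mass   = c(50, 70, 80, 120)
)

# Ungrouped: every mass is divided by the global average (80)
global_norm <- df %>% mutate(rel = mass / mean(mass))

# Grouped: each mass is divided by the average of its own gender level
group_norm <- df %>%
  group_by(gender) %>%
  mutate(rel = mass / mean(mass))

global_norm$rel
group_norm$rel
```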

Useful functions

Center: mean(), median()
Spread: sd(), IQR(), mad()
Range: min(), max(), quantile()
Position: first(), last(), nth()
Count: n(), n_distinct()
Logical: any(), all()

Backend variations

The data frame backend supports creating a variable and using it in the same summary. This means that previously created summary variables can be further transformed or combined within the summary, as in mutate(). However, it also means that summary variables with the same names as previous variables overwrite them, making those variables unavailable to later summary variables.

This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.
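The overwriting pitfall described above can be sketched with the local data frame backend. The tibble and the names mean_x and spread are chosen for this illustration only.

```r
library(dplyr)

df <- tibble(x = c(1, 2, 3, 6))

# Reusing the name x: sd() now sees the one-value summary, not the column
reused <- df %>% summarise(x = mean(x), spread = sd(x))

# A new name leaves the original column available to later summaries
safe <- df %>% summarise(mean_x = mean(x), spread = sd(x))

reused
safe
```

In the first call spread is NA, because the standard deviation of a single value is undefined.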

Useful mutate functions

+, -, log(), etc., for their usual mathematical meanings
lead(), lag()
dense_rank(), min_rank(), percent_rank(), row_number(), cume_dist(), ntile()
cumsum(), cummean(), cummin(), cummax(), cumany(), cumall()
na_if(), coalesce()
if_else(), recode(), case_when()

Scoped selection and renaming

Use the three scoped variants (rename_all(), rename_if(), rename_at()) to rename a set of variables with a function.
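A minimal sketch of the three scoped variants on a toy tibble. The column names and the "_raw" suffix are hypothetical, and the example runs on a plain tibble rather than a SingleCellExperiment object.

```r
library(dplyr)

df <- tibble(cell = c("c1", "c2"), n_count = c(10, 20), n_feat = c(3L, 4L))

# rename_all(): apply the function to every column name
all_upper <- df %>% rename_all(toupper)

# rename_if(): only columns satisfying the predicate
numeric_upper <- df %>% rename_if(is.numeric, toupper)

# rename_at(): only the columns selected with vars()
chosen_suffixed <- df %>%
  rename_at(vars(n_count), function(nm) paste0(nm, "_raw"))

names(all_upper)
names(numeric_upper)
names(chosen_suffixed)
```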

Examples

`%>%` <- magrittr::`%>%`
pbmc_small %>%
    arrange(nFeature_RNA)
#> # A SingleCellExperiment-tibble abstraction: 80 × 17
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#>    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#>  1 CATG… SeuratPro…         51           26 0               A             g2    
#>  2 GGCA… SeuratPro…        172           29 0               A             g1    
#>  3 AGTC… SeuratPro…        157           29 0               A             g1    
#>  4 GACG… SeuratPro…        202           30 0               A             g2    
#>  5 GGAA… SeuratPro…        150           30 0               A             g2    
#>  6 AGGT… SeuratPro…         62           31 0               A             g2    
#>  7 CTTC… SeuratPro…         41           32 0               A             g2    
#>  8 GTAA… SeuratPro…         67           33 0               A             g2    
#>  9 GTCA… SeuratPro…        210           33 0               A             g2    
#> 10 TGGT… SeuratPro…         64           36 0               A             g1    
#> # ℹ 70 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> #   tSNE_2 <dbl>

`%>%` <- magrittr::`%>%`
pbmc_small %>%
    distinct(groups)
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 2 × 1
#>   groups
#>   <chr> 
#> 1 g2    
#> 2 g1    


`%>%` <- magrittr::`%>%`
pbmc_small %>%
    filter(groups == "g1")
#> # A SingleCellExperiment-tibble abstraction: 44 × 17
#> # Features=230 | Cells=44 | Assays=counts, logcounts
#>    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#>  1 CATG… SeuratPro…         85           52 0               A             g1    
#>  2 TCTG… SeuratPro…         70           48 0               A             g1    
#>  3 TGGT… SeuratPro…         64           36 0               A             g1    
#>  4 GCAG… SeuratPro…         72           45 0               A             g1    
#>  5 GATA… SeuratPro…         52           36 0               A             g1    
#>  6 AATG… SeuratPro…        100           41 0               A             g1    
#>  7 AGAG… SeuratPro…        191           61 0               A             g1    
#>  8 CTAA… SeuratPro…        168           44 0               A             g1    
#>  9 TTGG… SeuratPro…        135           45 0               A             g1    
#> 10 CATC… SeuratPro…         79           43 0               A             g1    
#> # ℹ 34 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> #   tSNE_2 <dbl>

# Learn more in ?dplyr_tidy_eval

`%>%` <- magrittr::`%>%`
pbmc_small %>%
    group_by(groups)
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 80 × 31
#> # Groups:   groups [2]
#>    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#>  1 ATGC… SeuratPro…         70           47 0               A             g2    
#>  2 CATG… SeuratPro…         85           52 0               A             g1    
#>  3 GAAC… SeuratPro…         87           50 1               B             g2    
#>  4 TGAC… SeuratPro…        127           56 0               A             g2    
#>  5 AGTC… SeuratPro…        173           53 0               A             g2    
#>  6 TCTG… SeuratPro…         70           48 0               A             g1    
#>  7 TGGT… SeuratPro…         64           36 0               A             g1    
#>  8 GCAG… SeuratPro…         72           45 0               A             g1    
#>  9 GATA… SeuratPro…         52           36 0               A             g1    
#> 10 AATG… SeuratPro…        100           41 0               A             g1    
#> # ℹ 70 more rows
#> # ℹ 24 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, PC_6 <dbl>,
#> #   PC_7 <dbl>, PC_8 <dbl>, PC_9 <dbl>, PC_10 <dbl>, PC_11 <dbl>, PC_12 <dbl>,
#> #   PC_13 <dbl>, PC_14 <dbl>, PC_15 <dbl>, PC_16 <dbl>, PC_17 <dbl>,
#> #   PC_18 <dbl>, PC_19 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

`%>%` <- magrittr::`%>%`
pbmc_small %>%
    summarise(mean(nCount_RNA))
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 1 × 1
#>   `mean(nCount_RNA)`
#>                <dbl>
#> 1               245.

`%>%` <- magrittr::`%>%`
pbmc_small %>%
    mutate(nFeature_RNA=1)
#> # A SingleCellExperiment-tibble abstraction: 80 × 17
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#>    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>    <chr> <fct>           <dbl>        <dbl> <fct>           <fct>         <chr> 
#>  1 ATGC… SeuratPro…         70            1 0               A             g2    
#>  2 CATG… SeuratPro…         85            1 0               A             g1    
#>  3 GAAC… SeuratPro…         87            1 1               B             g2    
#>  4 TGAC… SeuratPro…        127            1 0               A             g2    
#>  5 AGTC… SeuratPro…        173            1 0               A             g2    
#>  6 TCTG… SeuratPro…         70            1 0               A             g1    
#>  7 TGGT… SeuratPro…         64            1 0               A             g1    
#>  8 GCAG… SeuratPro…         72            1 0               A             g1    
#>  9 GATA… SeuratPro…         52            1 0               A             g1    
#> 10 AATG… SeuratPro…        100            1 0               A             g1    
#> # ℹ 70 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> #   tSNE_2 <dbl>

`%>%` <- magrittr::`%>%`
pbmc_small %>%
    rename(s_score=nFeature_RNA)
#> # A SingleCellExperiment-tibble abstraction: 80 × 17
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#>    .cell      orig.ident nCount_RNA s_score RNA_snn_res.0.8 letter.idents groups
#>    <chr>      <fct>           <dbl>   <int> <fct>           <fct>         <chr> 
#>  1 ATGCCAGAA… SeuratPro…         70      47 0               A             g2    
#>  2 CATGGCCTG… SeuratPro…         85      52 0               A             g1    
#>  3 GAACCTGAT… SeuratPro…         87      50 1               B             g2    
#>  4 TGACTGGAT… SeuratPro…        127      56 0               A             g2    
#>  5 AGTCAGACT… SeuratPro…        173      53 0               A             g2    
#>  6 TCTGATACA… SeuratPro…         70      48 0               A             g1    
#>  7 TGGTATCTA… SeuratPro…         64      36 0               A             g1    
#>  8 GCAGCTCTG… SeuratPro…         72      45 0               A             g1    
#>  9 GATATAACA… SeuratPro…         52      36 0               A             g1    
#> 10 AATGTTGAC… SeuratPro…        100      41 0               A             g1    
#> # ℹ 70 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> #   tSNE_2 <dbl>

`%>%` <- magrittr::`%>%`

tt <- pbmc_small
tt %>% left_join(tt %>% distinct(groups) %>% mutate(new_column=1:2))
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> Joining with `by = join_by(groups)`
#> # A SingleCellExperiment-tibble abstraction: 80 × 18
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#>    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#>  1 ATGC… SeuratPro…         70           47 0               A             g2    
#>  2 CATG… SeuratPro…         85           52 0               A             g1    
#>  3 GAAC… SeuratPro…         87           50 1               B             g2    
#>  4 TGAC… SeuratPro…        127           56 0               A             g2    
#>  5 AGTC… SeuratPro…        173           53 0               A             g2    
#>  6 TCTG… SeuratPro…         70           48 0               A             g1    
#>  7 TGGT… SeuratPro…         64           36 0               A             g1    
#>  8 GCAG… SeuratPro…         72           45 0               A             g1    
#>  9 GATA… SeuratPro…         52           36 0               A             g1    
#> 10 AATG… SeuratPro…        100           41 0               A             g1    
#> # ℹ 70 more rows
#> # ℹ 11 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   new_column <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
#> #   PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

`%>%` <- magrittr::`%>%`

tt <- pbmc_small
tt %>% inner_join(tt %>% distinct(groups) %>% mutate(new_column=1:2) %>% slice(1))
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> Joining with `by = join_by(groups)`
#> # A SingleCellExperiment-tibble abstraction: 36 × 18
#> # Features=230 | Cells=36 | Assays=counts, logcounts
#>    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#>  1 ATGC… SeuratPro…         70           47 0               A             g2    
#>  2 GAAC… SeuratPro…         87           50 1               B             g2    
#>  3 TGAC… SeuratPro…        127           56 0               A             g2    
#>  4 AGTC… SeuratPro…        173           53 0               A             g2    
#>  5 AGGT… SeuratPro…         62           31 0               A             g2    
#>  6 GGGT… SeuratPro…        101           41 0               A             g2    
#>  7 CATG… SeuratPro…         51           26 0               A             g2    
#>  8 TACG… SeuratPro…         99           45 0               A             g2    
#>  9 GTAA… SeuratPro…         67           33 0               A             g2    
#> 10 TACA… SeuratPro…        109           41 0               A             g2    
#> # ℹ 26 more rows
#> # ℹ 11 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   new_column <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
#> #   PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

`%>%` <- magrittr::`%>%`

tt <- pbmc_small
tt %>% right_join(tt %>% distinct(groups) %>% mutate(new_column=1:2) %>% slice(1))
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> Joining with `by = join_by(groups)`
#> # A SingleCellExperiment-tibble abstraction: 36 × 18
#> # Features=230 | Cells=36 | Assays=counts, logcounts
#>    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#>  1 ATGC… SeuratPro…         70           47 0               A             g2    
#>  2 GAAC… SeuratPro…         87           50 1               B             g2    
#>  3 TGAC… SeuratPro…        127           56 0               A             g2    
#>  4 AGTC… SeuratPro…        173           53 0               A             g2    
#>  5 AGGT… SeuratPro…         62           31 0               A             g2    
#>  6 GGGT… SeuratPro…        101           41 0               A             g2    
#>  7 CATG… SeuratPro…         51           26 0               A             g2    
#>  8 TACG… SeuratPro…         99           45 0               A             g2    
#>  9 GTAA… SeuratPro…         67           33 0               A             g2    
#> 10 TACA… SeuratPro…        109           41 0               A             g2    
#> # ℹ 26 more rows
#> # ℹ 11 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   new_column <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
#> #   PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

`%>%` <- magrittr::`%>%`

tt <- pbmc_small
tt %>% full_join(tibble::tibble(groups="g1", other=1:4))
#> Joining with `by = join_by(groups)`
#> tidySingleCellExperiment says: This operation lead to duplicated cell names. A data frame is returned for independent data analysis.
#> # A tibble: 212 × 32
#>    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#>  1 ATGC… SeuratPro…         70           47 0               A             g2    
#>  2 CATG… SeuratPro…         85           52 0               A             g1    
#>  3 CATG… SeuratPro…         85           52 0               A             g1    
#>  4 CATG… SeuratPro…         85           52 0               A             g1    
#>  5 CATG… SeuratPro…         85           52 0               A             g1    
#>  6 GAAC… SeuratPro…         87           50 1               B             g2    
#>  7 TGAC… SeuratPro…        127           56 0               A             g2    
#>  8 AGTC… SeuratPro…        173           53 0               A             g2    
#>  9 TCTG… SeuratPro…         70           48 0               A             g1    
#> 10 TCTG… SeuratPro…         70           48 0               A             g1    
#> # ℹ 202 more rows
#> # ℹ 25 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, PC_6 <dbl>,
#> #   PC_7 <dbl>, PC_8 <dbl>, PC_9 <dbl>, PC_10 <dbl>, PC_11 <dbl>, PC_12 <dbl>,
#> #   PC_13 <dbl>, PC_14 <dbl>, PC_15 <dbl>, PC_16 <dbl>, PC_17 <dbl>,
#> #   PC_18 <dbl>, PC_19 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>, other <int>


`%>%` <- magrittr::`%>%`
pbmc_small %>%
    slice(1)
#> # A SingleCellExperiment-tibble abstraction: 1 × 17
#> # Features=230 | Cells=1 | Assays=counts, logcounts
#>   .cell  orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>   <chr>  <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#> 1 ATGCC… SeuratPro…         70           47 0               A             g2    
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> #   tSNE_2 <dbl>

`%>%` <- magrittr::`%>%`
pbmc_small %>%
    select(cell, orig.ident)
#> Warning: tidySingleCellExperiment says: from version 1.3.1, the special columns including cell id (colnames(se)) has changed to ".cell". This dataset is returned with the old-style vocabulary (cell), however we suggest to update your workflow to reflect the new vocabulary (.cell)
#> # A SingleCellExperiment-tibble abstraction: 80 × 9
#> # Features=230 | Cells=80 | Assays=counts, logcounts
#>    cell           orig.ident     PC_1   PC_2   PC_3  PC_4   PC_5  tSNE_1  tSNE_2
#>    <chr>          <fct>         <dbl>  <dbl>  <dbl> <dbl>  <dbl>   <dbl>   <dbl>
#>  1 ATGCCAGAACGACT SeuratProj… -0.774  -0.900 -0.249 0.559  0.465   0.868  -8.10 
#>  2 CATGGCCTGTGCAT SeuratProj… -0.0260 -0.347  0.665 0.418  0.585  -7.39   -8.77 
#>  3 GAACCTGATGAACC SeuratProj… -0.457   0.180  1.32  2.01  -0.482 -28.2     0.241
#>  4 TGACTGGATTCTCA SeuratProj… -0.812  -1.38  -1.00  0.139 -1.60   16.3   -11.2  
#>  5 AGTCAGACTGCACA SeuratProj… -0.774  -0.900 -0.249 0.559  0.465   1.91  -11.2  
#>  6 TCTGATACACGTGT SeuratProj… -0.774  -0.900 -0.249 0.559  0.465   3.15   -9.94 
#>  7 TGGTATCTAAACAG SeuratProj… -0.460  -1.19  -0.312 0.716 -1.65   17.9    -9.90 
#>  8 GCAGCTCTGTTTCT SeuratProj… -0.900  -0.388  0.693 0.404  0.536  -6.49   -8.39 
#>  9 GATATAACACGCAT SeuratProj… -0.774  -0.900 -0.249 0.559  0.465   1.33   -9.68 
#> 10 AATGTTGACAGTCA SeuratProj… -0.488  -1.16  -0.306 0.702 -1.47   17.0    -9.43 
#> # ℹ 70 more rows

`%>%` <- magrittr::`%>%`
pbmc_small %>%
    sample_n(50)
#> # A SingleCellExperiment-tibble abstraction: 50 × 17
#> # Features=230 | Cells=50 | Assays=counts, logcounts
#>    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#>  1 AGTC… SeuratPro…        157           29 0               A             g1    
#>  2 CTAA… SeuratPro…        189           53 0               A             g1    
#>  3 ACTC… SeuratPro…        231           49 1               B             g2    
#>  4 CTTC… SeuratPro…         41           32 0               A             g2    
#>  5 CATG… SeuratPro…         51           26 0               A             g2    
#>  6 TACA… SeuratPro…        109           41 0               A             g2    
#>  7 AGAG… SeuratPro…        191           61 0               A             g1    
#>  8 GCTC… SeuratPro…        139           61 0               A             g2    
#>  9 ACAG… SeuratPro…        151           59 0               A             g1    
#> 10 TGGT… SeuratPro…         64           36 0               A             g1    
#> # ℹ 40 more rows
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> #   tSNE_2 <dbl>
pbmc_small %>%
    sample_frac(0.1)
#> # A SingleCellExperiment-tibble abstraction: 8 × 17
#> # Features=230 | Cells=8 | Assays=counts, logcounts
#>   .cell  orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
#>   <chr>  <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
#> 1 ATTAC… SeuratPro…        463           77 1               B             g1    
#> 2 AGAGA… SeuratPro…        191           61 0               A             g1    
#> 3 CCATC… SeuratPro…        224           50 1               B             g2    
#> 4 TTGGT… SeuratPro…        135           45 0               A             g1    
#> 5 AATGC… SeuratPro…        389           73 1               B             g1    
#> 6 AAATT… SeuratPro…        327           62 1               B             g2    
#> 7 CATGC… SeuratPro…        443           81 0               A             g1    
#> 8 CTTGA… SeuratPro…        233           76 1               B             g1    
#> # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
#> #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
#> #   tSNE_2 <dbl>


`%>%` <- magrittr::`%>%`
pbmc_small %>%
    count(groups)
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 2 × 2
#>   groups     n
#>   <chr>  <int>
#> 1 g1        44
#> 2 g2        36

`%>%` <- magrittr::`%>%`
pbmc_small %>%
    pull(groups)
#>  [1] "g2" "g1" "g2" "g2" "g2" "g1" "g1" "g1" "g1" "g1" "g2" "g1" "g2" "g2" "g2"
#> [16] "g1" "g2" "g1" "g1" "g2" "g1" "g1" "g2" "g2" "g1" "g2" "g2" "g2" "g2" "g1"
#> [31] "g1" "g1" "g1" "g2" "g1" "g1" "g2" "g1" "g1" "g2" "g1" "g2" "g2" "g2" "g1"
#> [46] "g2" "g1" "g2" "g1" "g2" "g1" "g2" "g2" "g2" "g1" "g1" "g1" "g1" "g2" "g1"
#> [61] "g1" "g1" "g1" "g1" "g1" "g2" "g2" "g1" "g1" "g1" "g2" "g1" "g2" "g2" "g1"
#> [76] "g1" "g2" "g1" "g2" "g1"
