seplyr

John Mount

2019-07-24

The R package seplyr supplies improved standard evaluation interfaces for some common data data plying tasks.

To install this packing in R please either install from CRAN with:

   install.packages('seplyr')

or from GitHub:

   devtools::install_github('WinVector/seplyr')

In dplyr if you know the names of columns when you are writing code you can write code such as the following.

suppressPackageStartupMessages(library("dplyr"))
packageVersion("dplyr")
## [1] '0.8.1'
datasets::mtcars %>% 
  arrange(cyl, desc(gear)) %>% 
  head()
##    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 2 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 3 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 5 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 6 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1

In dplyr 0.7.* if the names of the columns are coming from a variable set elsewhere you would to need to use a tool to substitute those names in. One such tool is rlang/tidyeval (though we strongly prefer seplyr and wrapr::let())). rlang/tidyeval works as follows (for comparison only, this is not our suggested workflow).

# Assume this is set elsewhere,
# supplied by a user, function argument, or control file.
orderTerms <- c('cyl', 'desc(gear)')

# Now convert into splice-able types, the idea is the user
# supplies variable names that we later convert to "quosures"
# for use in `dplyr` 0.7.* generic code.
# This code is near the pipe under the rule:
# "If you are close enough to form a quosure, 
#  you are close enough to re-code the analysis"
orderQs <- lapply(orderTerms,
                  function(si) { rlang::parse_expr(si) })
# pipe
datasets::mtcars %>% 
  arrange(!!!orderQs) %>% 
  head()
##    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 2 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 3 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 5 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 6 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1

If you don’t want to try and digest entire theory of quasi-quoting and splicing (the !!! operator) then you can use seplyr which conveniently and legibly wraps the operations as follows:

library("seplyr")

datasets::mtcars %.>% 
  arrange_se(., orderTerms) %>% 
  head(.)
##    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 2 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 3 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 5 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 6 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1

The idea is: the above code looks very much like simple dplyr code used running an analysis, and yet is very easy to parameterize and re-use in a script or package.


seplyr::arrange_se() performs the wrapping for you without you having to work through the details of rlang. If you are interested in the details seplyr itself is a good tutorial. For example you can examine seplyr’s implementation to see the necessary notations (using a command such as print(arrange_se)). And, of course, we try to supply some usable help entries, such as: help(arrange_se). Some more discussion of the ideas can be found here.

The current set of SE adapters includes (all commands of the form NAME_se() being adapters for a dplyr::NAME() method):

Only two of the above are completely redundant. seplyr::group_by_se() essentially works as dplyr::group_by_at() and seplyr::select_se() essentially works as dplyr::select_at(). The others either have different semantics or currently (as of dplyr 0.7.1) no matching dplyr::*_at() method. Roughly all seplyr is trying to do is give a uniform first-class standard interface to all of the primary deprecated underscore suffixed verbs (such as dplyr::arrange_).

We also have a few methods that work around a few of the minor inconvenience of working with variable names as strings:

Here is a example using seplyr::summarize_se().

datasets::iris %.>%
  group_by_se(., "Species") %.>%
  summarize_se(., c("Mean.Sepal.Length" := "mean(Sepal.Length)", 
                    "Mean.Sepal.Width" := "mean(Sepal.Width)"))
## # A tibble: 3 x 3
##   Species    Mean.Sepal.Length Mean.Sepal.Width
##   <fct>                  <dbl>            <dbl>
## 1 setosa                  5.01             3.43
## 2 versicolor              5.94             2.77
## 3 virginica               6.59             2.97

In addition to the series of adapters we also supply a number of useful new verbs including:

seplyr is designed to be a thin package that passes work to dplyr. If you want a package that works around dplyr implementation differences on different data sources I suggest trying our own replyr package. Another alternative is using wrapr::let().

seplyr methods are short and have examples in their help, so always try both help and printing the method (for example: help(select_se) and print(select_se)). Printing methods can show you how to use dplyr directly with rlang/tidyeval methodology (allowing you to skip seplyr).

Some inspiration comes from Sebastian Kranz’s s_dplyr. Please see help("%.>%", package="wrapr") for details on “dot pipe.”