🚧 Deprecation notice 🚧

As of safejoin version 0.2.0 the package has been deprecated. As of version 1.1.1 dplyr has a relationship argument that provides the same functionality that safejoin was created for. See the dplyr docs https://dplyr.tidyverse.org/reference/mutate-joins.html for complete details.

Please use dplyr::left_join() with the relationship argument instead.

About

The goal of safejoin is to guarantee that when performing joins that extra rows are not added to your data. safejoin is a wrapper around the dplyr::left_join function.

Installation

You can install the released version of safejoin from CRAN with:

install.packages("safejoin")

Install development version from GitHub:

devtools::install_github("SamEdwardes/safejoin", ref = "dev")

Example

Depending on your need safejoin can raise an error, a warning, or a message. By default safejoin will raise an error.

Error:

library(safejoin)
x <- data.frame(key = c("a", "b"), value_x = c(1, 2))
y <- data.frame(key = c("a", "a"), value_y = c(1, 1))
safe_left_join(x, y, by = "key")
#> Warning: The `relationship` argument of `safe_left_join()` relationship-type as of
#> safejoin 0.2.0.
#> ℹ Please use `dplyr::left_join()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Error in safe_left_join(x, y, by = "key"): Input data x had 2 rows. After performing the join the data has 3 rows.

Warning:

safe_left_join(x, y, by = "key", action="warning")
#> Warning in safe_left_join(x, y, by = "key", action = "warning"): Input data x
#> had 2 rows. After performing the join the data has 3 rows.
#>   key value_x value_y
#> 1   a       1       1
#> 2   a       1       1
#> 3   b       2      NA

Message:

safe_left_join(x, y, by = "key", action="message")
#> Input data x had 2 rows. After performing the join the data has 3 rows.
#>   key value_x value_y
#> 1   a       1       1
#> 2   a       1       1
#> 3   b       2      NA

When a join is “safe” safe_left_join will have the exact same behavior as dplyr::left_join.

x <- data.frame(key = c("a", "b"), value_x = c(1, 2))
y <- data.frame(key = c("a", "b"), value_y = c(1, 1))
safe_left_join(x, y, by = "key")
#>   key value_x value_y
#> 1   a       1       1
#> 2   b       2       1

Other useful packages

There are other packages that help solve similar problems. Most notably https://github.com/cynkra/dm provides great features to treat data frames like a data base.

Reference and Attribution

safejoin is created and maintained by Sam Edwardes.