The goal of safejoin is to guarantee that joins do not add extra rows to your data. safejoin is a wrapper around the dplyr::left_join function.
You can install the released version of safejoin from CRAN with:
install.packages("safejoin")
Note that as of 2021-04-18, safejoin has been submitted to CRAN but has not yet been accepted.
Alternatively, install the development version from GitHub:
devtools::install_github("SamEdwardes/safejoin")
Depending on your needs, safejoin can raise an error, a warning, or a message when a join adds rows. By default, it raises an error.
Error:
library(safejoin)
x <- data.frame(key = c("a", "b"), value_x = c(1, 2))
y <- data.frame(key = c("a", "a"), value_y = c(1, 1))
safe_left_join(x, y, by = "key")
#> Error in safe_left_join(x, y, by = "key"): Input data x had 2 rows. After performing the join the data has 3 rows.
Warning:
safe_left_join(x, y, by = "key", action = "warning")
#> Warning in safe_left_join(x, y, by = "key", action = "warning"): Input data x had 2 rows. After performing the join
#> the data has 3 rows.
#>   key value_x value_y
#> 1   a       1       1
#> 2   a       1       1
#> 3   b       2      NA
Message:
safe_left_join(x, y, by = "key", action = "message")
#> Input data x had 2 rows. After performing the join the data has 3 rows.
#>   key value_x value_y
#> 1   a       1       1
#> 2   a       1       1
#> 3   b       2      NA
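In the examples above, the extra row appears because the key "a" occurs twice in y. As a rough illustration (base R only, not a safejoin feature), the duplicated keys in the right-hand table can be listed before joining. The data frame `y_example` and the variable `dup_rows` are hypothetical names used only for this sketch.

```r
# Hypothetical right-hand table with one duplicated key ("a").
y_example <- data.frame(key = c("a", "a", "b"), value_y = c(1, 1, 2))

# Flag every row whose key appears more than once.
# duplicated() alone skips the first occurrence, so the scan is
# repeated from the end to catch that one as well.
is_dup <- duplicated(y_example$key) | duplicated(y_example$key, fromLast = TRUE)

# Rows that would multiply matching rows of x in a left join.
dup_rows <- y_example[is_dup, ]
print(dup_rows)
```

If `dup_rows` is empty, a left join on `key` cannot add rows.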
When a join is “safe”, meaning it adds no rows, safe_left_join behaves exactly like dplyr::left_join.
x <- data.frame(key = c("a", "b"), value_x = c(1, 2))
y <- data.frame(key = c("a", "b"), value_y = c(1, 1))
safe_left_join(x, y, by = "key")
#>   key value_x value_y
#> 1   a       1       1
#> 2   b       2       1
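The behavior described above can be approximated with a small wrapper that compares row counts before and after the join. The following is a minimal sketch, assuming dplyr is installed. The function name `checked_left_join` is hypothetical, and this is not the safejoin source code; only the `action` values and the message text are taken from the examples above.

```r
# Hypothetical helper, NOT the safejoin implementation: a left join that
# reports when the result has more rows than the left-hand input.
checked_left_join <- function(x, y, by = NULL,
                              action = c("error", "warning", "message")) {
  action <- match.arg(action)  # defaults to "error", the first choice

  result <- dplyr::left_join(x, y, by = by)

  # A left join never drops rows of x, so any difference means rows were added.
  if (nrow(result) > nrow(x)) {
    msg <- sprintf(
      "Input data x had %d rows. After performing the join the data has %d rows.",
      nrow(x), nrow(result)
    )
    switch(action,
      error   = stop(msg, call. = FALSE),
      warning = warning(msg, call. = FALSE),
      message = message(msg)
    )
  }

  result
}

# Usage: same data as the examples above.
x <- data.frame(key = c("a", "b"), value_x = c(1, 2))
y <- data.frame(key = c("a", "a"), value_y = c(1, 1))
joined <- checked_left_join(x, y, by = "key", action = "message")
```

With `action = "message"` or `action = "warning"` the joined data is still returned; with the default `action = "error"` execution stops before anything is returned.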
There are other packages that help solve similar problems. Most notably, dm (https://github.com/krlmlr/dm) provides great features for treating data frames like a database.