# Mean Relative Indifference

### A Comparison of R Comparison Functions

Most R object comparison functions are good at telling you that objects are different, but less so at conveying how they are different. I wrote diffobj to provide an “aha, that’s how they are different” comparison. In this vignette I will compare diffPrint to all.equal and to testthat::compare.

Disclaimer: I picked the examples here to showcase diffobj capabilities, not to carry out a fair and balanced comparison of these comparison functions. Nonetheless, I hope you will find the examples representative of common situations where comparison of R objects is useful.

## Vectors

I defined four pairs of numeric vectors for us to compare. For the first three pairs I purposefully hid the definitions to simulate a comparison of unknown objects.

### Stage 1

```r
all.equal(A1, B1)
## [1] "Mean relative difference: 0.1"
```

The objects are different… At this point I would normally print both A1 and B1 to work out where the difference lies, since the “mean relative difference” is unhelpful.

```r
testthat::compare(A1, B1)
## 1/10 mismatches
## [10] 10 - 11 == -1
```

testthat::compare does a better job, but I still feel the need to look at A1 and B1.

```r
diffPrint(A1, B1)
```
```
< A1                                  > B1
@@ 1 @@                               @@ 1 @@
<  [1]  1  2  3  4  5  6  7  8  9 10  >  [1]  1  2  3  4  5  6  7  8  9 11
```

Aha, that’s how they are different!
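Since the definitions are hidden, here is a plausible reconstruction of A1 and B1 read off the diff (my assumption, not necessarily the code behind this vignette). It also shows where the 0.1 comes from: by default (`countEQ = FALSE`) all.equal computes the mean relative difference only over the elements that differ, so one changed element becomes a single number with no position attached.

```r
# Plausible reconstruction of the hidden vectors (assumption, read off the diff)
A1 <- 1:10
B1 <- c(1:9, 11)

all.equal(A1, B1)
## [1] "Mean relative difference: 0.1"

# The same number by hand: sum(|target - current|) / sum(|target|),
# restricted to the mismatched elements (all.equal's default behavior)
differs <- A1 != B1
rel_diff <- sum(abs(A1[differs] - B1[differs])) / sum(abs(A1[differs]))
rel_diff
## [1] 0.1

# The position of the change, which the summary discards
which(differs)
## [1] 10
```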

### Stage 2

Let’s up the difficulty a little bit:

```r
testthat::compare(A2, B2)
## 20/20 mismatches (average diff: 1.9)
## [1] 1 - 20 == -19
## [2] 2 -  1 ==   1
## [3] 3 -  2 ==   1
## [4] 4 -  3 ==   1
## [5] 5 -  4 ==   1
## [6] 6 -  5 ==   1
## [7] 7 -  6 ==   1
## [8] 8 -  7 ==   1
## [9] 9 -  8 ==   1
## ...
```

If you look closely you will see that despite the reported 20/20 mismatches, the two vectors are actually similar, at least in the visible part of the output. With diffPrint it is obvious that B2 is the same as A2, except that the last value has been moved to the first position:

```r
diffPrint(A2, B2)
```
```
< A2                                     > B2
@@ 1,2 @@                                @@ 1,2 @@
<  [1]  1  2  3  4  5  6  7  8  9 10 11  >  [1] 20  1  2  3  4  5  6  7  8  9 10
< [12] 12 13 14 15 16 17 18 19 20        > [12] 11 12 13 14 15 16 17 18 19
```
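A plausible reconstruction of this pair (again an assumption read off the diff) makes it clear why a position-by-position comparison reports every element as a mismatch: moving one value to the front shifts all the others by one slot. The numbers below agree with testthat's "20/20 mismatches (average diff: 1.9)": one difference of 19 plus nineteen differences of 1, averaged over 20 elements.

```r
# Plausible reconstruction (assumption): A2 with its last element moved to the front
A2 <- 1:20
B2 <- c(A2[length(A2)], A2[-length(A2)])

# Element-by-element comparison flags every position
mismatches <- sum(A2 != B2)
mismatches
## [1] 20

# Average absolute difference: (19 + 19 * 1) / 20
mean(abs(A2 - B2))
## [1] 1.9
```

diffPrint avoids this by aligning the two sequences before reporting, so a single moved value shows up as one deletion and one insertion.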

### Stage 3

testthat::compare throws in the towel as soon as lengths are unequal:

```r
testthat::compare(A3, B3)
## Lengths differ: 20 is not 21
```

all.equal does the same. diffPrint is unfazed:

```r
diffPrint(A3, B3)
```
```
< A3                                     > B3
@@ 1,2 @@                                @@ 1,2 @@
<  [1]  1  2  3  4  5  6  7  8  9 10 11  >  [1] 20 21  1  2  3  4  5  6  7  8  9
< [12] 12 13 14 15 16 17 18 19 20        > [12] 10 11 12 13 14 15 16 17 18 19
```
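With a plausible reconstruction of A3 and B3 (an assumption read off the diff), all.equal stops at the length check. Base R set operations recover a fragment of the story, the extra value 21, but nothing about where it sits or that 20 moved to the front.

```r
# Plausible reconstruction (assumption): 20 and 21 prepended to 1:19
A3 <- 1:20
B3 <- c(20, 21, 1:19)

# all.equal() reports only the length mismatch
all.equal(A3, B3)
## [1] "Numeric: lengths (20, 21) differ"

# Order-blind set differences: every value of A3 appears in B3...
setdiff(A3, B3)
## integer(0)

# ...and B3 has one extra value
setdiff(B3, A3)
## [1] 21
```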

diffPrint also produces useful output for largish vectors:

```r
A4 <- 1:1e4
B4 <- c(1e4 + 1, A4[-c(4:7, 9e3)])
diffPrint(A4, B4)
```
```
< A4                                     > B4
@@ 1,4 @@                                @@ 1,4 @@
<     [1]     1     2     3     4     5  >    [1] 10001     1     2     3     8
<     [6]     6     7     8     9    10  >    [6]     9    10    11    12    13
     [11]    11    12    13    14    15      [11]    14    15    16    17    18
     [16]    16    17    18    19    20      [16]    19    20    21    22    23
@@ 1798,5 @@                             @@ 1798,5 @@
   [8986]  8986  8987  8988  8989  8990    [8986]  8989  8990  8991  8992  8993
   [8991]  8991  8992  8993  8994  8995    [8991]  8994  8995  8996  8997  8998
<  [8996]  8996  8997  8998  8999  9000  > [8996]  8999  9001  9002  9003  9004
   [9001]  9001  9002  9003  9004  9005    [9001]  9005  9006  9007  9008  9009
   [9006]  9006  9007  9008  9009  9010    [9006]  9010  9011  9012  9013  9014
```

Do note that the comparison algorithm scales with the square of the number of differences, so large vectors with many differences will be slow to process.
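To get a feel for this cost on your own machine, here is a rough timing sketch. `make_pair()` is a hypothetical helper (not part of diffobj) that builds two length-10,000 vectors differing at `k` evenly spaced positions. Timings depend on hardware and diffobj version, so no numbers are shown, and the measurement includes rendering because `capture.output()` prints the diff.

```r
# Hypothetical helper (not part of diffobj): a pair of integer vectors of
# length n that differ at k evenly spaced positions (sign flipped in `b`)
make_pair <- function(n, k) {
  a <- seq_len(n)
  b <- a
  idx <- unique(round(seq(1, n, length.out = k)))
  b[idx] <- -b[idx]
  list(a = a, b = b)
}

ks <- c(50, 100, 200, 400)
elapsed <- vapply(ks, function(k) {
  p <- make_pair(1e4, k)
  # Skip timing when diffobj is not installed
  if (!requireNamespace("diffobj", quietly = TRUE)) return(NA_real_)
  # capture.output() prints (and thus renders) the diff; rendering time is included
  system.time(capture.output(diffobj::diffPrint(p$a, p$b)))[["elapsed"]]
}, numeric(1))

data.frame(changed_elements = ks, seconds = elapsed)
```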

## Objects

R Core and package authors put substantial effort into print and show methods. diffPrint takes advantage of this. Compare:

```r
all.equal(iris, iris[-60,])
## [1] "Attributes: < Component \"row.names\": Numeric: lengths (150, 149) differ >"
## [2] "Component \"Sepal.Length\": Numeric: lengths (150, 149) differ"
## [3] "Component \"Sepal.Width\": Numeric: lengths (150, 149) differ"
## [4] "Component \"Petal.Length\": Numeric: lengths (150, 149) differ"
## [5] "Component \"Petal.Width\": Numeric: lengths (150, 149) differ"
##  [ reached getOption("max.print") -- omitted 3 entries ]
```

to:

```r
diffPrint(iris, iris[-60,])
```
```
< iris
> iris[-60, ]
@@ 59,5 / 59,4 @@
~     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
  58           4.9         2.4          3.3         1.0 versicolor
  59           6.6         2.9          4.6         1.3 versicolor
< 60           5.2         2.7          3.9         1.4 versicolor
  61           5.0         2.0          3.5         1.0 versicolor
  62           5.9         3.0          4.2         1.5 versicolor
```
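Part of why the diffPrint output reads so naturally is that it works from what print already shows. As a much cruder base R illustration of that idea (a sketch, not how diffobj is implemented), one can capture both printouts line by line and take a set difference. It does find the dropped row, but unlike a real diff it ignores line order and offers no surrounding context.

```r
# Capture each object's printed representation, one element per line
iris_lines <- capture.output(print(iris))
iris_60_lines <- capture.output(print(iris[-60, ]))

# Naive set difference: lines in the first printout missing from the second
only_in_iris <- setdiff(iris_lines, iris_60_lines)
only_in_iris
```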

And:

```r
all.equal(lm(hp ~ disp, mtcars), lm(hp ~ cyl, mtcars))
## [1] "Component \"coefficients\": Names: 1 string mismatch"
## [2] "Component \"coefficients\": Mean relative difference: 2.778944"
## [3] "Component \"residuals\": Mean relative difference: 0.7074011"
## [4] "Component \"effects\": Names: 1 string mismatch"
## [5] "Component \"effects\": Mean relative difference: 0.5907086"
##  [ reached getOption("max.print") -- omitted 9 entries ]
```

to:

```r
diffPrint(lm(hp ~ disp, mtcars), lm(hp ~ cyl, mtcars))
```
```
< lm(hp ~ disp, mtcars)                   > lm(hp ~ cyl, mtcars)
@@ 1,8 @@                                 @@ 1,8 @@

  Call:                                     Call:
< lm(formula = hp ~ disp, data = mtcars)  > lm(formula = hp ~ cyl, data = mtcars)

  Coefficients:                             Coefficients:
< (Intercept)         disp                > (Intercept)          cyl
<     45.7345       0.4376                >      -51.05        31.96
```

In these examples I limited all.equal output to five lines for brevity. Also, since testthat::compare falls back to all.equal output for more complex objects, I omit it from this comparison.
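The "[ reached getOption("max.print") ... ]" notes in the all.equal output point to R's `max.print` option as the source of the five-entry cap. A minimal way to reproduce it, assuming the cap was set this way, is to change the option temporarily and restore it afterwards:

```r
# Temporarily cap printed output at five entries
old_opts <- options(max.print = 5)
all.equal(iris, iris[-60, ])
# Restore the previous setting
options(old_opts)
```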

## Parting Thoughts

Another candidate comparison function is compare::compare. I omitted it from this vignette because it focuses more on similarities than on differences. Additionally, the print methods of testthat::compare and compare::compare conflict, so the two cannot be used together.

For a more thorough exploration of diffobj methods and their features please see the primary diffobj vignette.