
I want to use comparison operators between vectors and dataframes. Say, for example, I have a vector vector_test defined in R as

vector_test = c(1, 2, 3)

and a corresponding dataframe A_test defined as

A_test = data.frame(
               x1 = c(1, 2, 3, 4, 5),
               x2 = c(2, 3, 4, 5, 6),
               x3 = c(3, 4, 5, 6, 7)
)

I want to use vector_test to find which elements of A_test are greater than or equal to the elements in vector_test. I want the output to be something like

A_test >= vector_test

> TRUE  TRUE        .
  TRUE. TRUE.       .
  TRUE. TRUE.       .
  TRUE.  ...        .
  TRUE.  ...        TRUE


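But instead I got a result in which the first entry of column x2 is FALSE and every other entry is TRUE (shown as a screenshot in the original post).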

It sounds dumb, I know, but I can't figure out (a) what I'm doing wrong and (b) what comparison R is making.

3 Answers


You need to transpose the data frame first, compare, then transpose back.

> t(t(A_test) >= vector_test)
       x1   x2   x3
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE TRUE
[3,] TRUE TRUE TRUE
[4,] TRUE TRUE TRUE
[5,] TRUE TRUE TRUE

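This way you can make proper use of recycling: R recycles the shorter vector down the columns of a matrix, and after transposing, each column of t(A_test) holds one row of A_test, so its three entries line up with the three entries of vector_test.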

It's similar to other arithmetic operations such as addition, multiplication, etc. Let's demonstrate addition on a matrix of zeros with the same dimensions as A_test.

> (M <- array(0, dim=dim(A_test)))
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    0    0    0
[5,]    0    0    0

What you did is similar to:

> M + vector_test
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3
[3,]    3    2    1
[4,]    1    3    2
[5,]    2    1    3

What you want is:

> t(t(M) + vector_test)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
[4,]    1    2    3
[5,]    1    2    3
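
As an alternative to the double transpose, base R's sweep() can apply an operator along a chosen margin. The following is only a sketch under the assumption that the goal is to compare column j of A_test with vector_test[j]; the names by_sweep and by_transpose are made up for illustration.

```r
vector_test <- c(1, 2, 3)
A_test <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 3, 4, 5, 6),
  x3 = c(3, 4, 5, 6, 7)
)

# MARGIN = 2 tells sweep() that vector_test runs across the columns,
# so column j is compared with vector_test[j].
by_sweep <- sweep(as.matrix(A_test), 2, vector_test, FUN = ">=")

# The double-transpose version, for comparison.
by_transpose <- t(t(A_test) >= vector_test)

# Both approaches should agree element by element.
all(by_sweep == by_transpose)
```

Converting to a matrix first with as.matrix() keeps the sketch independent of how data frames handle operators.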

Data:

> dput(vector_test)
c(1, 2, 3)
> dput(A_test)
structure(list(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 3, 4, 5, 6), 
    x3 = c(3, 4, 5, 6, 7)), class = "data.frame", row.names = c(NA, 
-5L))

What appears to be happening is that the dataframe is coerced to a matrix which is then coerced to a vector for comparison, and the result is converted back to a matrix.

The vector version of A_test is

c(1, 2, 3, 4, 5, 
  2, 3, 4, 5, 6,
  3, 4, 5, 6, 7)

When you compare that to a length 3 vector, the vector is first recycled to length 15, giving this:

c(1, 2, 3, 1, 2,
  3, 1, 2, 3, 1,
  2, 3, 1, 2, 3)

and then elements are compared to the vector from the dataframe. The only FALSE in A_test >= vector_test comes in the 6th entry. When converted back to a matrix, that's the first entry in the 2nd column, as you saw.
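
The steps described above can be reproduced by hand. This is a rough sketch of the behaviour as explained in this answer, not a copy of R's internal code; flat, recycled, cmp and manual are illustrative names.

```r
vector_test <- c(1, 2, 3)
A_test <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 3, 4, 5, 6),
  x3 = c(3, 4, 5, 6, 7)
)

# Flatten the data frame column by column (x1, then x2, then x3).
flat <- unlist(A_test, use.names = FALSE)

# Recycle the length-3 vector to the same length (15).
recycled <- rep_len(vector_test, length(flat))

# Element-wise comparison of the two length-15 vectors.
cmp <- flat >= recycled
which(!cmp)  # position of the only FALSE in the flat vector: 6

# Map that flat position back to a (row, column) pair: row 1, column 2.
arrayInd(which(!cmp), dim(A_test))

# Refill a 5 x 3 matrix column by column and check it against R's own result.
manual <- matrix(cmp, nrow = nrow(A_test),
                 dimnames = list(NULL, names(A_test)))
all(manual == (A_test >= vector_test))
```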

  • Ahh that makes sense. Man I was wracking my brain trying to understand what was happening! Thank you so much!
    – JerBear
    Commented Nov 4, 2023 at 19:02

I think you are looking for row-wise comparisons of your data frame to your vector. Perhaps you want

t(apply(A_test, 1, `>=`, vector_test))
#>        x1   x2   x3
#> [1,] TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE
#> [4,] TRUE TRUE TRUE
#> [5,] TRUE TRUE TRUE
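
If you prefer to think column by column rather than row by row, something like the following may also work. It is a sketch that pairs each column of A_test with the matching element of vector_test via mapply(); the argument names column and threshold and the variables by_column and by_row are only illustrative.

```r
vector_test <- c(1, 2, 3)
A_test <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 3, 4, 5, 6),
  x3 = c(3, 4, 5, 6, 7)
)

# Walk over the columns of A_test and the elements of vector_test in parallel:
# x1 is compared with 1, x2 with 2, x3 with 3.
by_column <- mapply(function(column, threshold) column >= threshold,
                    A_test, vector_test)

# The row-wise apply() version, for comparison.
by_row <- t(apply(A_test, 1, `>=`, vector_test))

# Both should give the same 5 x 3 logical matrix.
all(by_column == by_row)
```

This version needs no transpose, because mapply() already returns one column of results per column of the data frame.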

What R was doing was automatically recycling your length 3 vector 5 times, then comparing this to the three columns of your data frame stacked as one big vector. If we do this explicitly, we'll see we get the same as your initial result:

vector_test_long <- rep(vector_test, 5)

vector_test_long
#>  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

A_test >= vector_test_long
#>        x1    x2   x3
#> [1,] TRUE FALSE TRUE
#> [2,] TRUE  TRUE TRUE
#> [3,] TRUE  TRUE TRUE
#> [4,] TRUE  TRUE TRUE
#> [5,] TRUE  TRUE TRUE
  • Ah, that makes sense! What comparisons was R doing?
    – JerBear
    Commented Nov 4, 2023 at 18:51
  • @JerBear recycling - see my update
    Commented Nov 4, 2023 at 19:02
