This is the glimpse() of my dataframe DF:

Observations: 221184
Variables:
$ Epsilon    (fctr) 96002.txt, 96002.txt, 96004.txt, 96004.txt, 96005.txt, 960...
$ Value   (int) 61914, 61887, 61680, 61649, 61776, 61800, 61753, 61725, 616...

I want to filter (remove) all the observations with the first two levels of Epsilon using dplyr.

I mean:

DF %>% filter(Epsilon != "96002.txt" & Epsilon != "96004.txt")

However, I don't want to use the string values (i.e., "96002.txt" and "96004.txt") but the level orders (i.e., 1 and 2), because it should be a general instruction independent of the level values.

  • Is filter(as.numeric(Epsilon)>2) what you are looking for?
    – nicola
    Commented May 5, 2015 at 11:46
  • @nicola Great, it is! Please rewrite it as an answer (not a comment) and I will accept it. Commented May 5, 2015 at 11:49
  • As commented by nicola, you can convert factors to their numeric/integer representation just by applying as.numeric or as.integer to them (which often causes confusion when it's not intended).
    – talat
    Commented May 5, 2015 at 11:50

1 Answer

You can easily convert a factor into an integer and then use conditions on it. Just replace your filter statement with:

 filter(as.integer(Epsilon)>2)
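
To see what this does, here is a minimal sketch on a toy data frame. The data below is made up for illustration (it is not the asker's DF), and `kept` is just a hypothetical name for the result. It assumes the factor uses R's default level order, which is the sorted order of the distinct values.

```r
library(dplyr)

# Toy stand-in for DF (hypothetical data, not the asker's real data).
DF <- data.frame(
  Epsilon = factor(c("96002.txt", "96002.txt", "96004.txt",
                     "96005.txt", "96007.txt")),
  Value   = c(61914L, 61887L, 61680L, 61776L, 61800L)
)

# as.integer() on a factor returns the level codes, not the labels.
codes <- as.integer(DF$Epsilon)

# Keep only rows whose level code is above 2, i.e. drop the first two levels.
kept <- DF %>% filter(as.integer(Epsilon) > 2)
```

Note that the codes follow the order of levels(DF$Epsilon), so if the factor was built with a custom `levels =` argument, "first two levels" means the first two in that custom order.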

More generally, if you have a vector of level indices you want to eliminate, you can try:

 #some random levels we don't want
 nonWantedLevels<-c(5,6,9,12,13)
 #just the filter part
 filter(!as.integer(Epsilon) %in% nonWantedLevels)
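
As a rough, self-contained illustration of the general case: the toy data and the names `byCode`, `byName` and `cleaned` below are assumptions for the example only. It also shows an equivalent formulation that looks the labels up through levels(), plus droplevels(), since filtering rows does not remove the now-unused levels from the factor.

```r
library(dplyr)

# Toy data (hypothetical); default level order is a.txt, b.txt, c.txt, d.txt.
DF <- data.frame(
  Epsilon = factor(c("a.txt", "b.txt", "c.txt", "d.txt", "b.txt")),
  Value   = 1:5
)

# Level positions we don't want (here: the first two levels).
nonWantedLevels <- c(1, 2)

# Filter on the integer level codes, as in the answer.
byCode <- DF %>% filter(!as.integer(Epsilon) %in% nonWantedLevels)

# Same result, looking up the level labels by position instead.
byName <- DF %>% filter(!Epsilon %in% levels(Epsilon)[nonWantedLevels])

# Filtering keeps the unused levels in the factor; drop them if needed.
cleaned <- byCode %>% mutate(Epsilon = droplevels(Epsilon))
```

Whether to call droplevels() depends on what comes next: after it, the level codes are renumbered, so any later filter by position refers to the new order.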
  • Is as.integer() better/safer than as.numeric() here? Commented Dec 7, 2019 at 11:02
  • Very slightly more efficient, since a factor is stored internally as an integer vector, while as.numeric() coerces it to a double (floating-point) vector.
    – nicola
    Commented Dec 9, 2019 at 11:39
