
I have this toy data as df:

df <- structure(list(Product_Name = c("Delicious Chips", "Creamy Tomato Soup", 
"Cheesy Macaroni", "Savory Meatballs", "Crispy Chicken Tenders"
), Ingredients = c("Potato Slices | Vegetable Oil | Salt | Seasoning Blend", 
"Tomatoes | Water | Cream | Onions | Salt | Spices", "Macaroni | Cheese Sauce | Milk | Butter | Salt | Pepper", 
"Ground Meat | Breadcrumbs | Onions | Garlic | Spices", "Chicken Tenders | Breading Mix | Vegetable Oil | Salt | Pepper"
)), row.names = c(NA, 5L), class = "data.frame")

Here I want to find which rows contain "Salt" in the Ingredients variable.

Using library(tidyverse), initially I try df %>% str_detect(Ingredients, "Salt") but I get Error: object 'Ingredients' not found.

But when I change it to df %>% filter(str_detect(Ingredients, "Salt")) it returns a data frame with the products matching the string.

I thought str_detect needs a character vector or something coercible to one and I thought that Ingredients fit that because when I do class(df$Ingredients) it returns character. Why won't it take Ingredients as an argument and what changes when it is wrapped into filter()?

  • I'm not sure what your intention is with the str_detect, but the function str_detect goes inside mutate/filter or a similar function, not on its own. E.g. df %>% mutate(salt_flag = str_detect(Ingredients, "Salt")) Commented Aug 17, 2023 at 23:18
  • Thank you, that's good to know, but (again for my own learning) why does this work: fruit <- c("apple", "banana", "pear", "pineapple") str_detect(fruit, "a") which is from the str_detect documentation -- and if it should always go into a mutate() or similar, how do I learn that or find it out?!
    – Jay Bee
    Commented Aug 17, 2023 at 23:20
  • mutate evaluates the expressions passed through its ... argument with the columns of df available as if they were separate objects, so str_detect can pick them up when it is nested inside mutate. str_detect only has arguments for string= and pattern= and no ..., so it needs a string passed in directly. So you could do something like df %>% pull(Ingredients) %>% str_detect(pattern="Salt") to sort it out as well. Commented Aug 17, 2023 at 23:24
  • Also worth mentioning: without the complication of %>% you could also do str_detect(df$Ingredients, "Salt") by selecting the column explicitly from the df object in the global environment/workspace. And you could assign that back using base R logic then too: df$salt_flag <- str_detect(df$Ingredients, "Salt") Commented Aug 17, 2023 at 23:46
  • I could be persuaded that this should be opened again, but I expect any answer people give to be along the same lines as the ones above: you can't use something that only works inside mutate or filter outside of that context, because most functions don't work like that. str_detect takes a string and a pattern as arguments, not a dataframe, a string, and a pattern. While this could be the context for explaining that, it seems like a fairly universal point, and I'd be surprised if it hasn't been answered in some way before.
    – Mark
    Commented Aug 18, 2023 at 14:07

1 Answer

Many tidyverse functions (e.g., those in dplyr) use data masking, which lets you refer to the columns of a data frame by their unquoted names as if they were variables in the environment. We can see this when we use dplyr::filter:

library(dplyr)

df |> 
  filter(Product_Name == "Savory Meatballs")
#>       Product_Name                                          Ingredients
#> 1 Savory Meatballs Ground Meat | Breadcrumbs | Onions | Garlic | Spices

Here filter is looking for and using the variable "Product_Name" within df, not within your global environment.
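
To make the two lookups concrete, here is a minimal sketch. It rebuilds the toy df from the question so that it runs by itself, and it assumes that nothing named Ingredients exists in your workspace. The names salty and piped_failed are only illustrative.

```r
library(dplyr)
library(stringr)

# Toy data from the question
df <- data.frame(
  Product_Name = c("Delicious Chips", "Creamy Tomato Soup", "Cheesy Macaroni",
                   "Savory Meatballs", "Crispy Chicken Tenders"),
  Ingredients = c(
    "Potato Slices | Vegetable Oil | Salt | Seasoning Blend",
    "Tomatoes | Water | Cream | Onions | Salt | Spices",
    "Macaroni | Cheese Sauce | Milk | Butter | Salt | Pepper",
    "Ground Meat | Breadcrumbs | Onions | Garlic | Spices",
    "Chicken Tenders | Breading Mix | Vegetable Oil | Salt | Pepper"
  )
)

# Assumption: there is no standalone object called Ingredients
exists("Ingredients")
#> [1] FALSE

# Inside filter(), Ingredients is resolved as a column of df (data masking)
salty <- df |> filter(str_detect(Ingredients, "Salt"))
salty$Product_Name
#> [1] "Delicious Chips"        "Creamy Tomato Soup"     "Cheesy Macaroni"
#> [4] "Crispy Chicken Tenders"

# Without a data-masking verb, the pipe only places df in the first argument,
# so this is the call str_detect(df, Ingredients, "Salt"). Ingredients is then
# looked up in the calling environment, where it does not exist.
piped_failed <- tryCatch(
  {
    df |> str_detect(Ingredients, "Salt")
    FALSE
  },
  error = function(e) TRUE
)
piped_failed
#> [1] TRUE
```

The pipe never changes where names are looked up; it only rewrites the call. Whether a bare column name works depends entirely on the function that receives it.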

However, str_detect, and most of the other functions from the stringr package, do not have this capability. As others have noted, you can nest your str_detect call within mutate or filter to see these results. But if you wanted to just pass along Ingredients to str_detect you can use the with function (more info about with on r-bloggers). This is what that looks like:

library(stringr)

df |>
  with(str_detect(Ingredients, "Salt"))
#> [1]  TRUE  TRUE  TRUE FALSE  TRUE

with() does something very similar to what those dplyr functions do behind the scenes. Rather than looking for a variable named "Ingredients" in your global environment (where it is not defined), it evaluates the expression using its first argument (df) as the environment, so "Ingredients" is found among the columns of df.
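
As a rough illustration of that lookup, the sketch below compares with() against two base R spellings of the same idea. The df here is cut down to just the Ingredients column from the question, and the via_* names are only illustrative. The eval() line approximates what with() does for a data frame; it is not a copy of its source.

```r
library(stringr)

# Only the Ingredients column of the toy data is needed here
df <- data.frame(
  Ingredients = c(
    "Potato Slices | Vegetable Oil | Salt | Seasoning Blend",
    "Tomatoes | Water | Cream | Onions | Salt | Spices",
    "Macaroni | Cheese Sauce | Milk | Butter | Salt | Pepper",
    "Ground Meat | Breadcrumbs | Onions | Garlic | Spices",
    "Chicken Tenders | Breading Mix | Vegetable Oil | Salt | Pepper"
  )
)

# with(): evaluate the expression with df's columns visible as variables
via_with <- with(df, str_detect(Ingredients, "Salt"))

# Roughly the same thing spelled out: evaluate the unevaluated call in df
via_eval <- eval(quote(str_detect(Ingredients, "Salt")), df)

# No special evaluation at all: hand str_detect the column explicitly
via_dollar <- str_detect(df$Ingredients, "Salt")

via_with
#> [1]  TRUE  TRUE  TRUE FALSE  TRUE
identical(via_with, via_eval)
#> [1] TRUE
identical(via_with, via_dollar)
#> [1] TRUE
```

All three return a plain logical vector rather than a data frame, which is the main practical difference from wrapping str_detect in filter() or mutate().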
