I have a dataframe of drug names. There are multiple doses for each type of drug. For instance, I have:

 x <- data.frame(c("DrugX 10 mg", "DrugX 20 mg", "DrugX 30mg", "DrugX 2% Cream", "DrugX 10% Gel", "DrugY 20 mg", "DrugY 30 mg"))

 x[,1] <- as.character(x[,1])

I would like to delete the dose, that is, the first numeric value and everything after it. So I would like a new dataframe that looks like this:

 xnew <- data.frame(c("DrugX", "DrugX", "DrugX", "DrugX", "DrugX", "DrugY", "DrugY"))

at which point I would like to take the 'uniques'

 xnew2 <- unique(xnew)

so my final product would be

 xnew2 <- c("DrugX", "DrugY")

Thanks for the help in advance!

2 Answers

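You can try sub. The pattern matches any whitespace, then the first run of digits, then everything after it, and replaces the whole match with an empty string: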

v1 <- sub('\\s*\\d+.*$', '', x[,1])
v1
#[1] "DrugX" "DrugX" "DrugX" "DrugX" "DrugX" "DrugY" "DrugY"

unique(v1)
#[1] "DrugX" "DrugY"
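
If it helps to keep the cleaned names alongside the original strings, the same pattern can be written into a new column. This is only a sketch: the column names drug and name are my own choice, not something from the question.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the column as character.
# The column names "drug" and "name" are assumptions made for this example.
x <- data.frame(drug = c("DrugX 10 mg", "DrugX 20 mg", "DrugX 30mg",
                         "DrugX 2% Cream", "DrugX 10% Gel",
                         "DrugY 20 mg", "DrugY 30 mg"),
                stringsAsFactors = FALSE)

# \\s*  optional whitespace before the dose
# \\d+  the first run of digits
# .*$   everything from there to the end of the string
x$name <- sub("\\s*\\d+.*$", "", x$drug)

xnew2 <- unique(x$name)
xnew2
#[1] "DrugX" "DrugY"
```

Note that this cuts at the first digit, so a drug name that itself contains a digit would be truncated.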

A sneaky possibility:

unique(gsub(' .*','\\1',x[,1]))
#[1] "DrugX" "DrugY"
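
One caveat: this cuts at the first space, so it only works while every drug name is a single word. The rough comparison below uses a made-up multi-word name ("Drug Z", not in the question's data) and an empty replacement string in place of '\\1', since the pattern has no capture group. sub is enough here because ' .*' already runs to the end of the string.

```r
# Drug names from the question plus one hypothetical multi-word name ("Drug Z")
drugs <- c("DrugX 10 mg", "DrugX 2% Cream", "DrugY 30 mg", "Drug Z 5 mg")

# Cut at the first space, using "" as the replacement
by_space <- unique(sub(" .*", "", drugs))

# Cut at the first run of digits (the pattern from the other answer)
by_digit <- unique(sub("\\s*\\d+.*$", "", drugs))

by_space
#[1] "DrugX" "DrugY" "Drug"
by_digit
#[1] "DrugX"  "DrugY"  "Drug Z"
```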
  • Why do you need \\1? I didn't find any capture groups
    – akrun
    Commented Jul 1, 2015 at 19:55
