
I have a column in my dataframe:

Colname
20151102
19920311
20130204
>=70
60-69
20-29

I wish to split this column into two columns like:

Col1         Col2
20151102
19920311
20130204
            >=70
            60-69
            20-29

How can I achieve this result?

  • Try do.call(cbind, split(df, cumsum(grepl('>', df$Colname)))) if you want them as two separate columns.
    – akrun
    Commented Feb 4, 2015 at 8:56
  • @akrun I like that one. You should post it.
    Commented Feb 4, 2015 at 9:03
  • @DavidArenburg But, it seems that is not the OP wanted :-)
    – akrun
    Commented Feb 4, 2015 at 9:04
  • Or an option to get the desired result would be indx <- cumsum(grepl('>', df$Colname)) > 0; df1 <- data.frame(Col1=df$Colname, Col2=df$Colname, stringsAsFactors=FALSE); df1[indx,1] <- ''; df1[!indx,2] <- ''
    – akrun
    Commented Feb 4, 2015 at 9:07
  • The column can also have its data placed arbitrarily. >=70 can be the 2nd entry of the column!
    – Abhishek
    Commented Feb 4, 2015 at 9:30

3 Answers


Without the need of any package:

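# Create the two target columns, initially empty
df[,c("Col1", "Col2")] <- ""

# TRUE where the value can be parsed as a number (conversion warnings are silenced)
isnum <- suppressWarnings(!is.na(as.numeric(df$colname)))

# Numeric-looking values go to Col1, everything else to Col2
df$Col1[isnum] <- df$colname[isnum]
df$Col2[!isnum] <- df$colname[!isnum]

# Drop the original column
df <- df[,!(names(df) %in% "colname")]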

Data:

df = data.frame(colname=c("20151102","19920311","20130204",">=70","60-69","20-29"), stringsAsFactors=FALSE)
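
Because each row is tested on its own, the position of the values in the column does not matter. As a rough sketch of the same idea wrapped in a function: the name split_mixed is hypothetical, and a digits-only regular expression stands in here for the as.numeric test (so a value such as "1e5" would end up in Col2 rather than Col1).

```r
# Hypothetical helper (not part of the original answer): route all-digit
# entries to Col1 and everything else to Col2, row by row.
split_mixed <- function(x) {
  x <- as.character(x)               # also works if the column is a factor
  is_digits <- grepl("^[0-9]+$", x)  # TRUE for values such as "20151102"
  data.frame(
    Col1 = ifelse(is_digits, x, ""),
    Col2 = ifelse(is_digits, "", x),
    stringsAsFactors = FALSE
  )
}

# Values deliberately interleaved to show that position is irrelevant
mixed <- c("20151102", ">=70", "19920311", "60-69", "20130204", "20-29")
res <- split_mixed(mixed)
print(res)
```

The result has the same number of rows as the input, with exactly one of the two columns filled in each row.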
  • This approach is not yielding results.
    – Abhishek
    Commented Feb 4, 2015 at 11:59
  • My bad, I forgot to specify stringsAsFactors=FALSE when I created the data frame. It works now.
    – Jean Paul
    Commented Feb 4, 2015 at 14:22

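One possible solution is to use extract from tidyr. The idea is to mark the boundary between the two target columns with a delimiter and then split on it. Note that the delimiter I chose (the dot) must not appear anywhere in your initial data.frame.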

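library(magrittr)
library(tidyr)  # loaded after magrittr so that tidyr::extract is not masked

# Prefix range-like values with "." and suffix everything else with ".",
# so the dot marks the boundary between the two target columns
df$colname = df$colname %>%
             grepl("[>=|-]+", .) %>%
             ifelse(paste0(".", df$colname), paste0(df$colname, "."))

# Split on the dot; the namespace prefix avoids any masking by magrittr::extract
tidyr::extract(df, colname, c("col1","col2"), "(.*)\\.(.*)")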
#     col1  col2
#1  222222      
#2 1111111      
#3          >=70
#4         60-69
#5         20-29

Data:

df = data.frame(colname=c("222222","1111111",">=70","60-69","20-29"))
  • You should swap the library() calls, otherwise your code won't work because tidyr::extract is masked by magrittr::extract
    – alex23lemm
    Commented Feb 4, 2015 at 11:09
  • This approach looks promising and might work, but I am unable to use tidyr.
    – Abhishek
    Commented Feb 4, 2015 at 11:10
  • The error message says "Error in library(tidyr) : there is no package called ‘tidyr’"
    – Abhishek
    Commented Feb 4, 2015 at 11:11
  • Maybe you first need to run install.packages("tidyr")?
    Commented Feb 4, 2015 at 12:22

Here is a single-statement solution. read.pattern captures the two field types separately in the parts of the regular expression surrounded by parentheses. (format can be omitted if the Colname column is already of class "character". Also, if you want the first column to be numeric, omit the colClasses argument.)

library(gsubfn)
read.pattern(text = format(DF$Colname), pattern = "(^\\d+$)|(.*)", 
                   col.names = c("Col1", "Col2"), colClasses = "character")

giving:

      Col1     Col2
1 20151102         
2 19920311         
3 20130204         
4          >=70    
5          60-69   
6          20-29   

Note: the regular expression used is:

(^\d+$)|(.*)

[Figure: regular expression visualization (Debuggex Demo)]
