How to split contents in a single column into two separate columns in R?

Question

I have a column in my dataframe:

Colname
20151102
19920311
20130204
>=70
60-69
20-29

I wish to split this column into two columns like:

Col1         Col2
20151102
19920311
20130204
            >=70
            60-69
            20-29

How can I achieve this result?

Try do.call(cbind, split(df, cumsum(grepl('>', df$Colname)))) if you want as two separate columns. — akrun, Commented Feb 4, 2015 at 8:56
Or an option to get the desired result would be indx <- cumsum(grepl('>', df$Colname)); df1 <- data.frame(Col1=df$Colname, Col2=df$Colname, stringsAsFactors=FALSE); df1[!indx,1] <- ''; df1[indx,2] <- '' — akrun, Commented Feb 4, 2015 at 9:07
The data in the column can also be placed arbitrarily. >=70 can be the 2nd entry of the column! — Abhishek, Commented Feb 4, 2015 at 9:30
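Since the values can appear in any order, a position-based split (such as the cumsum idea above) may not be enough. As a minimal sketch, each value could instead be routed by its own content. The helper name split_by_digits and the rule "digit-only strings go to Col1" are assumptions made for illustration, not something stated in the thread:

```r
# Hypothetical helper (not from the thread): route each value to Col1 or Col2
# based on its own content, so row order does not matter.
split_by_digits <- function(x) {
  x <- as.character(x)                 # also handles factor input
  is_digits <- grepl("^[0-9]+$", x)    # TRUE for digit-only strings such as "20151102"
  data.frame(
    Col1 = ifelse(is_digits, x, ""),   # digit-only values, blank otherwise
    Col2 = ifelse(is_digits, "", x),   # everything else, blank otherwise
    stringsAsFactors = FALSE
  )
}

# ">=70" is deliberately placed second to mimic arbitrary ordering
df <- data.frame(Colname = c("20151102", ">=70", "19920311", "60-69"),
                 stringsAsFactors = FALSE)
res <- split_by_digits(df$Colname)
print(res)
```

Because the test is applied row by row, the result should not depend on where the range-like entries sit in the column.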

Jean Paul · Accepted Answer · 2015-02-04 14:19:26Z

3

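Without the need for any package (using the data defined below):

# Create two empty character columns
df[, c("Col1", "Col2")] <- ""

# TRUE where the value can be parsed as a number
isnum <- suppressWarnings(!is.na(as.numeric(df$colname)))

# Copy each value into the column it belongs to
df$Col1[isnum] <- df$colname[isnum]
df$Col2[!isnum] <- df$colname[!isnum]

# Drop the original column
df <- df[, !(names(df) %in% "colname")]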

Data:

df = data.frame(colname=c("20151102","19920311","20130204",">=70","60-69","20-29"), stringsAsFactors=FALSE)
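One caveat worth checking against your own data: as.numeric() accepts any string R can parse as a number, which includes scientific notation, hexadecimal literals, and values padded with whitespace. The sketch below compares that test with a stricter digit-only regular expression. The sample values and the variable names isnum_parse and isnum_digits are made up for illustration:

```r
# Made-up sample values, including a few edge cases
x <- c("20151102", "1e5", "0x1A", " 42 ", ">=70", "60-69")

# Test from the answer: TRUE whenever as.numeric() can parse the string
isnum_parse <- suppressWarnings(!is.na(as.numeric(x)))

# Stricter alternative (assumption: only digit-only strings belong in Col1)
isnum_digits <- grepl("^[0-9]+$", x)

print(data.frame(x, isnum_parse, isnum_digits))
```

If the column only ever holds plain digit strings and range-like labels, as in the question, both tests should give the same split.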

This approach is not yielding results.
– Abhishek
Commented Feb 4, 2015 at 11:59

My bad, I forgot to specify stringsAsFactors=FALSE when I created the data frame. It works now.
– Jean Paul
Commented Feb 4, 2015 at 14:22


Colonel Beauvel · Accepted Answer · 2015-02-04 12:28:16Z

3

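One possible solution is to use extract from tidyr: first add a delimiter to each value (in front of range-like entries, after numeric ones), then split on that delimiter. Note that the delimiter I chose (the dot) must not appear in your initial data.frame.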

library(magrittr)
library(tidyr)

df$colname = df$colname %>% 
             grepl("[>=|-]+", .) %>% 
             ifelse(paste0(".", df$colname), paste0(df$colname, ".")) 

extract(df, colname, c("col1","col2"), "(.*)\\.(.*)")
#     col1  col2
#1  222222      
#2 1111111      
#3          >=70
#4         60-69
#5         20-29

Data:

df = data.frame(colname=c("222222","1111111",">=70","60-69","20-29"))

You should swap the library() calls, otherwise your code won't work because tidyr::extract is masked by magrittr::extract.
– alex23lemm
Commented Feb 4, 2015 at 11:09
This approach looks promising and might work, but I am unable to use tidyr.
– Abhishek
Commented Feb 4, 2015 at 11:10
The error message says "Error in library(tidyr) : there is no package called ‘tidyr’"
– Abhishek
Commented Feb 4, 2015 at 11:11
Maybe you first need to run install.packages("tidyr")?
– Colonel Beauvel
Commented Feb 4, 2015 at 12:22


G. Grothendieck · Accepted Answer · 2015-02-04 11:11:03Z

Here is a single statement solution. read.pattern captures the two field types separately in the parts of the regular expression surrounded by parentheses. (format can be omitted if the Colname column is already of class "character". Also, if it were desired to have the first column numeric then omit the colClasses argument.)

library(gsubfn)
read.pattern(text = format(DF$Colname), pattern = "(^\\d+$)|(.*)", 
                   col.names = c("Col1", "Col2"), colClasses = "character")

giving:

      Col1     Col2
1 20151102         
2 19920311         
3 20130204         
4          >=70    
5          60-69   
6          20-29

Note: The regular expression used is:

(^\d+$)|(.*)

The first group matches a value consisting only of digits; the second group matches anything else.
