Coerce multiple columns to factors at once

Question

I have a sample data frame like below:

data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

I want to know how I can select multiple columns and convert them to factors in one step. I usually do it like this: data$A = as.factor(data$A). But when the data frame is very large and contains lots of columns, doing this column by column is very time-consuming. Does anyone know of a better way to do it?

All answers here are using function factor not as.factor (as you did). In fact, using as.factor is preferred: Why use as.factor() instead of just factor() — Zheyuan Li, Commented Sep 11, 2018 at 14:55
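
As a minimal sketch of one practical difference between the two functions mentioned in the comment above (the vectors f and x are made up for illustration): when the input is already a factor, as.factor() returns it unchanged, whereas factor() rebuilds it and drops unused levels. For non-factor input, such as the integer columns in the question, both give the same levels.

```r
# Hypothetical factor that carries an unused level "c"
f <- factor(c("a", "b", "c"))[1:2]

levels(as.factor(f))  # "a" "b" "c"  (input returned as is)
levels(factor(f))     # "a" "b"      (unused level dropped)

# Hypothetical integer input: both calls produce the same levels
x <- c(3L, 1L, 3L)
levels(factor(x))     # "1" "3"
levels(as.factor(x))  # "1" "3"
```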

Rich Scriven · Accepted Answer · 2018-11-29 04:04:56Z

157

Choose some columns to coerce to factors:

cols <- c("A", "C", "D", "H")

Use lapply() to coerce and replace the chosen columns:

data[cols] <- lapply(data[cols], factor)  ## as.factor() could also be used

Check the result:

sapply(data, class)
#        A         B         C         D         E         F         G 
# "factor" "integer"  "factor"  "factor" "integer" "integer" "integer" 
#        H         I         J 
# "factor" "integer" "integer"
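
A note on the subsetting used in this approach, as a small sketch (set.seed(1) and the copies a and b are only there for illustration): data[cols] is a list-style subset that always returns a data frame, so lapply() iterates over columns. The matrix-style data[, cols] behaves the same way for several columns, but with a single column it drops to a plain vector unless drop = FALSE is given.

```r
set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

a <- data
a[cols] <- lapply(a[cols], factor)        # list-style subset

b <- data
b[, cols] <- lapply(b[, cols], factor)    # matrix-style subset

# Compare the column classes produced by the two styles
all(sapply(a, class) == sapply(b, class))

# With a single column the two styles differ
is.data.frame(data["A"])                  # TRUE
is.data.frame(data[, "A"])                # FALSE (plain integer vector)
is.data.frame(data[, "A", drop = FALSE])  # TRUE
```

For that reason the list-style form is the safer choice when cols might contain only one name.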

edited Nov 29, 2018 at 4:04

answered Oct 16, 2015 at 22:07

Rich Scriven


1

Wouldn't it need to be data[,cols] <- lapply(data[,cols], factor) (with the leading comma for columns)?
– TayTay
Commented Oct 16, 2015 at 23:23
8

@Tgsmith61591- It could be either. With the comma is a matrix-type subset, without the comma is a list subset. Data frames can be subsetted by either one so either way would work.
– Rich Scriven
Commented Oct 16, 2015 at 23:26
1

How can this solution be expanded to include factor levels and labels?
– Ben
Commented Aug 31, 2018 at 13:23
@Ben - It's probably best to ask a new question
– Rich Scriven
Commented Sep 26, 2018 at 19:46
2

@Ben you can specify labels and levels by extending the answer: data[cols] <- lapply(data[cols], factor, levels=c("val1", "val2", ...), labels=c("label1", "label2", ...)). Be careful with this though: all of the variables will use the same levels and labels you provide.
– Brian D
Commented Aug 7, 2019 at 14:51
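
If each column needs its own levels and labels, one possible sketch is to pair the columns with a per-column specification using Map(). The data frame df, the list specs and all level/label values below are hypothetical and only serve to illustrate the idea.

```r
# Hypothetical data, for illustration only
df <- data.frame(size  = c("s", "m", "l", "m"),
                 grade = c(1, 3, 2, 1),
                 other = 1:4,
                 stringsAsFactors = FALSE)

# One entry per column to convert; the names must match column names
specs <- list(
  size  = list(levels = c("s", "m", "l"),
               labels = c("small", "medium", "large")),
  grade = list(levels = 1:3,
               labels = c("low", "mid", "high"))
)

cols <- names(specs)

# Map() walks over the columns and their specifications in parallel
df[cols] <- Map(function(x, s) factor(x, levels = s$levels, labels = s$labels),
                df[cols], specs)

levels(df$size)         # "small" "medium" "large"
as.character(df$grade)  # "low" "high" "mid" "low"
```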


akrun · Accepted Answer · 2015-10-17 06:08:12Z

Here is an option using dplyr. The %<>% operator from magrittr updates the lhs object with the resulting value.

library(magrittr)
library(dplyr)
cols <- c("A", "C", "D", "H")

data %<>%
       mutate_each_(funs(factor(.)),cols)
str(data)
#'data.frame':  4 obs. of  10 variables:
# $ A: Factor w/ 4 levels "23","24","26",..: 1 2 3 4
# $ B: int  15 13 39 16
# $ C: Factor w/ 4 levels "3","5","18","37": 2 1 3 4
# $ D: Factor w/ 4 levels "2","6","28","38": 3 1 4 2
# $ E: int  14 4 22 20
# $ F: int  7 19 36 27
# $ G: int  35 40 21 10
# $ H: Factor w/ 4 levels "11","29","32",..: 1 4 3 2
# $ I: int  17 1 9 25
# $ J: int  12 30 8 33

Or if we are using data.table, either use a for loop with set

library(data.table)
setDT(data)
for(j in cols){
  set(data, i=NULL, j=j, value=factor(data[[j]]))
}

Or we can specify the 'cols' in .SDcols and assign (:=) the rhs to 'cols'

setDT(data)[, (cols):= lapply(.SD, factor), .SDcols=cols]

Community · Accepted Answer · 2021-09-01 18:38:14Z

40

The more recent tidyverse way is to use the mutate_at function:

library(tidyverse)
library(magrittr)
set.seed(88)

data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

data %<>% mutate_at(cols, factor)
str(data)
# $ A: Factor w/ 4 levels "5","17","18",..: 2 1 4 3
# $ B: int  36 35 2 26
# $ C: Factor w/ 4 levels "22","31","32",..: 1 2 4 3
# $ D: Factor w/ 4 levels "1","9","16","39": 3 4 1 2
# $ E: int  3 14 30 38
# $ F: int  27 15 28 37
# $ G: int  19 11 6 21
# $ H: Factor w/ 4 levels "7","12","20",..: 1 3 4 2
# $ I: int  23 24 13 8
# $ J: int  10 25 4 33

edited Sep 1, 2021 at 18:38


answered Apr 7, 2017 at 14:56

Yun Ching


11

You don't even need to use funs if you only perform one transformation; mutate_at(cols, factor) is sufficient.
– cbrnr
Commented Jun 4, 2018 at 12:19


GuedesBF · Accepted Answer · 2023-01-16 15:55:08Z

24

As of 2021 (still current in early 2023), the recommended tidyverse/dplyr approach is to use across() with a <tidy-select> specification.

library(dplyr)

data %>% mutate(across(<tidy-select>, <function>))

across(<tidy-select>) allows very consistent and easy selection of columns to transform. Some examples:

data %>% mutate(across(c(A, B, C, E), as.factor)) # select columns A to C, and E (by name)

data %>% mutate(across(where(is.character), as.factor)) # select character columns

data %>% mutate(across(1:5, as.factor)) # select first 5 columns (by index)
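
When the column names are stored in a character vector, as with cols in the question, a sketch along the following lines should work: the tidy-select helper all_of() turns the vector into a selection. The seed is arbitrary and only makes the example reproducible.

```r
library(dplyr)

set.seed(88)
data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# all_of() selects exactly the columns named in the character vector
data <- data %>% mutate(across(all_of(cols), as.factor))

sapply(data, class)
```

all_of() raises an error if one of the names is missing from the data, which tends to be preferable to silently skipping it.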

edited Jan 16, 2023 at 15:55

answered Dec 2, 2021 at 23:55

GuedesBF


can you add your citation for why we need/should use 'across'? I don't see it in R4DS or the ?dplyr page
– Casey Jayne
Commented Dec 29, 2021 at 19:26
2

dplyr.tidyverse.org/reference/across.html "across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all()."
– GuedesBF
Commented Dec 30, 2021 at 1:38


neves · Accepted Answer · 2019-03-26 04:19:51Z

You can use mutate_if (dplyr):

For example, to coerce integer columns to factors:

mydata=structure(list(a = 1:10, b = 1:10, c = c("a", "a", "b", "b", 
"c", "c", "c", "c", "c", "c")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 10 x 3
       a     b c    
   <int> <int> <chr>
 1     1     1 a    
 2     2     2 a    
 3     3     3 b    
 4     4     4 b    
 5     5     5 c    
 6     6     6 c    
 7     7     7 c    
 8     8     8 c    
 9     9     9 c    
10    10    10 c

Use the function:

library(dplyr)

mydata%>%
    mutate_if(is.integer,as.factor)

# A tibble: 10 x 3
       a     b c    
   <fct> <fct> <chr>
 1     1     1 a    
 2     2     2 a    
 3     3     3 b    
 4     4     4 b    
 5     5     5 c    
 6     6     6 c    
 7     7     7 c    
 8     8     8 c    
 9     9     9 c    
10    10    10 c
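
For comparison, the same predicate-based selection can be sketched in base R without dplyr. The data below is a plain data.frame stand-in for the tibble above, and is_int is a helper name chosen for this example.

```r
# Plain data.frame with two integer columns and one character column
mydata <- data.frame(a = 1:10, b = 1:10,
                     c = rep(c("a", "b", "c"), times = c(2, 2, 6)),
                     stringsAsFactors = FALSE)

# Logical vector marking the integer columns
is_int <- vapply(mydata, is.integer, logical(1))

# Convert only those columns, leaving the others untouched
mydata[is_int] <- lapply(mydata[is_int], as.factor)

sapply(mydata, class)  # a and b are "factor", c stays "character"
```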

Community · Accepted Answer · 2017-05-23 12:10:43Z

7

And, for completeness, with regard to this question about changing string columns only, there's mutate_if:

library(dplyr)

data <- cbind(stringVar = sample(c("foo","bar"),10,replace=TRUE),
              data.frame(matrix(sample(1:40), 10, 10, dimnames = list(1:10, LETTERS[1:10]))),stringsAsFactors=FALSE)

## funs() is deprecated in newer dplyr versions; passing factor directly is equivalent here
factoredData = data %>% mutate_if(is.character, factor)

edited May 23, 2017 at 12:10


answered Apr 26, 2017 at 15:34

Janna Maas


Kayle Sawyer · Accepted Answer · 2020-10-02 02:15:02Z

4

Here is a data.table example. I used grep in this example because that's how I often select many columns by using partial matches to their names.

library(data.table)
data <- data.table(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

factorCols <- grep(pattern = "A|C|D|H", x = names(data), value = TRUE)

data[, (factorCols) := lapply(.SD, as.factor), .SDcols = factorCols]
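
One thing to watch with partial matching: an unanchored pattern such as "A|C|D|H" matches any name that merely contains one of those letters. The sketch below uses a hypothetical vector of column names (nms) to show the difference between partial and exact matching.

```r
# Hypothetical column names, for illustration only
nms <- c("A", "B", "C", "AB", "HEIGHT")

# Unanchored: matches every name containing A, C, D or H
grep("A|C|D|H", nms, value = TRUE)      # "A" "C" "AB" "HEIGHT"

# Anchored: matches only names that are exactly A, C, D or H
grep("^(A|C|D|H)$", nms, value = TRUE)  # "A" "C"

# For exact names, plain set intersection avoids regular expressions
intersect(c("A", "C", "D", "H"), nms)   # "A" "C"
```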

edited Oct 2, 2020 at 2:15

answered Mar 16, 2019 at 0:24

Kayle Sawyer


Paul Roub · Accepted Answer · 2021-05-18 17:50:48Z

1

A simple and updated solution

data <- data %>%
    mutate_at(cols, list(~factor(.)))

edited May 18, 2021 at 17:50

Paul Roub


answered May 18, 2021 at 17:47

cmoshe


user9333657 · Accepted Answer · 2018-02-08 14:26:29Z

0

If you instead want to pick the columns to convert based on their type, you can try the following approach with data.table:

### pre processing: find the names of the character columns
### (bigm.train is assumed to be an existing data.table)
ind <- names(which(sapply(bigm.train, is.character)))
### Convert multiple columns to factor
bigm.train[,(ind):=lapply(.SD,factor),.SDcols=ind]

This selects only the columns of type character and then converts them to factors.

answered Feb 8, 2018 at 14:26

user9333657


Todd · Accepted Answer · 2020-08-04 14:27:33Z

Here is another tidyverse approach using the modify_at() function from the purrr package.

library(purrr)

# Data frame with only integer columns
data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

# Modify specified columns to a factor class
data_with_factors <- data %>%
    purrr::modify_at(c("A", "C", "E"), factor)


# Check the results:
str(data_with_factors)
# 'data.frame':   4 obs. of  10 variables:
#  $ A: Factor w/ 4 levels "8","12","33",..: 1 3 4 2
#  $ B: int  25 32 2 19
#  $ C: Factor w/ 4 levels "5","15","35",..: 1 3 4 2
#  $ D: int  11 7 27 6
#  $ E: Factor w/ 4 levels "1","4","16","20": 2 3 1 4
#  $ F: int  21 23 39 18
#  $ G: int  31 14 38 26
#  $ H: int  17 24 34 10
#  $ I: int  13 28 30 29
#  $ J: int  3 22 37 9

John Karuitha · Accepted Answer · 2020-08-28 03:08:11Z

It appears that using sapply() on a data.frame to convert all variables to factors at once does not work, because it returns a matrix/array. My approach is to use lapply() instead, as follows.

library(magrittr)  ## provides the %>% pipe used below

## let us create a data.frame here

class <- c("7", "6", "5", "3")

cash <- c(100, 200, 300, 150)

height <- c(170, 180, 150, 165)

people <- data.frame(class, cash, height)

class(people) ## This is a dataframe 

## We now apply lapply to the data.frame as follows.

bb <- lapply(people, as.factor) %>% data.frame() 

## The lapply part returns a list which we coerce back to a data.frame

class(bb) ## A data.frame

##Now let us check the classes of the variables 

class(bb$class)

class(bb$height)

class(bb$cash) ## as expected, are all factors.
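
A compact sketch of the same point, using a data frame with the same values as above: sapply() simplifies its result to a matrix, whereas assigning the lapply() result into people[] keeps the data.frame and replaces every column, without needing a pipe. The name m is chosen only for this example.

```r
people <- data.frame(class  = c("7", "6", "5", "3"),
                     cash   = c(100, 200, 300, 150),
                     height = c(170, 180, 150, 165))

# sapply() simplifies the list of factors to a matrix
m <- sapply(people, as.factor)
is.matrix(m)          # TRUE

# The empty brackets keep the data.frame structure and replace all columns
people[] <- lapply(people, as.factor)
is.data.frame(people) # TRUE
sapply(people, class) # all three columns are "factor"
```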

Mehmet Yildirim · Accepted Answer · 2023-10-21 12:23:51Z

0

Here is a solution if you are trying to convert multiple columns with a matching pattern in data:

library(dplyr)

data <- data.frame(matrix(sample(0:1, 40, replace = TRUE), 4, 10, 
                   dimnames = list(1:4, LETTERS[1:10])))
colnames(data) <- c(LETTERS[1:5], paste0(rep("binary_", 5), LETTERS[6:10]))

data <- data %>% 
  mutate_if(grepl("binary", colnames(.)), as.factor)
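
Since the scoped variants such as mutate_if() have been superseded by across(), the same pattern-based conversion could also be sketched with a tidy-select helper. This assumes the columns of interest share the prefix "binary_", as in the example data; the seed is arbitrary.

```r
library(dplyr)

set.seed(42)
data <- data.frame(matrix(sample(0:1, 40, replace = TRUE), 4, 10))
colnames(data) <- c(LETTERS[1:5], paste0("binary_", LETTERS[6:10]))

# starts_with() selects every column whose name begins with the prefix
data <- data %>%
  mutate(across(starts_with("binary_"), as.factor))

sapply(data, class)
```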

edited Oct 21, 2023 at 12:23

answered Jul 26, 2023 at 21:13

Mehmet Yildirim
