
Say I have a data.frame:

df <- data.frame(A=c(10,20,30),B=c(11,22,33), C=c(111,222,333))
  A  B  C
1 10 11 111
2 20 22 222
3 30 33 333

If I select two (or more) columns I get a data.frame:

x <- df[,1:2]
   A  B
 1 10 11
 2 20 22
 3 30 33

This is what I want. However, if I select only one column I get a numeric vector:

x <- df[,1]
[1] 10 20 30

I have tried as.data.frame(), which does not change the result for two or more columns. It does return a data.frame in the case of one column, but does not retain the column name:

x <- as.data.frame(df[,1])
  df[, 1]
1      10
2      20
3      30

I don't understand why it behaves like this. In my mind it should not make a difference whether I extract one, two, or ten columns. It should either always return a vector (or matrix) or always return a data.frame (with the correct names). What am I missing? Thanks!

Note: This is not a duplicate of the question about matrices, as matrix and data.frame are fundamentally different data types in R, and can work differently with dplyr. There are several answers that work with data.frame but not matrix.

  • This is not a duplicate, as matrix and data.frame can work differently with dplyr.
    – qwr
    Commented Jan 23, 2019 at 0:49
  • For data.frame, the tidy way with dplyr::select: mtcars %>% dplyr::select("wt")
    – qwr
    Commented Feb 6, 2019 at 19:50

3 Answers

137

Use drop=FALSE

> x <- df[,1, drop=FALSE]
> x
   A
1 10
2 20
3 30

The documentation (see ?"[") says:

If drop=TRUE the result is coerced to the lowest possible dimension.
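
A quick way to see the difference, using the df from the question. This is a minimal sketch in base R; the variable names v and x are only for illustration:

```r
df <- data.frame(A = c(10, 20, 30), B = c(11, 22, 33), C = c(111, 222, 333))

# Default (drop = TRUE): a single column collapses to a plain vector
v <- df[, 1]
is.data.frame(v)   # FALSE
is.numeric(v)      # TRUE

# drop = FALSE keeps the data.frame structure and the column name
x <- df[, 1, drop = FALSE]
is.data.frame(x)   # TRUE
names(x)           # "A"
dim(x)             # 3 1
```

With two or more columns there is nothing to collapse, so the result is a data.frame either way.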

39

Omit the comma:

x <- df[1]

   A
1 10
2 20
3 30

From the help page of ?"[":

Indexing by [ is similar to atomic vectors and selects a list of the specified element(s).

A data frame is a list. The columns are its elements.
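
To make the list view concrete, here is a small sketch (base R only; one_col and col_vec are illustrative names) comparing list-style [ with [[ on the df from the question:

```r
df <- data.frame(A = c(10, 20, 30), B = c(11, 22, 33), C = c(111, 222, 333))

is.list(df)         # TRUE: a data.frame is a list of columns

one_col <- df[1]    # list-style [ returns a one-column data.frame
col_vec <- df[[1]]  # [[ extracts the element itself, a numeric vector

class(one_col)      # "data.frame"
class(col_vec)      # "numeric"

# Same values and names as the drop = FALSE form
all.equal(one_col, df[, 1, drop = FALSE])  # TRUE
```

So df[1] and df[, 1, drop = FALSE] give the same one-column data.frame, while df[[1]] and df[, 1] both give the bare vector.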

  • or alternatively: df["A"]
    – maxbre
    Commented Jul 13, 2021 at 7:43
1

You can also use subset:

subset(df, select = 1) # by index
subset(df, select = A) # by name
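
A short check that both forms keep the data.frame and its column name. This is a sketch in base R; by_index and by_name are illustrative names:

```r
df <- data.frame(A = c(10, 20, 30), B = c(11, 22, 33), C = c(111, 222, 333))

by_index <- subset(df, select = 1)
by_name  <- subset(df, select = A)

is.data.frame(by_index)  # TRUE
names(by_index)          # "A"
names(by_name)           # "A"
nrow(by_name)            # 3
```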

As mentioned in the comments you can also use dplyr::select, but you do not need to quote the variable name:

library(dplyr)

# by name
df %>% 
  select(A)

# by index
df %>% 
  select(1)
