
Here's an example of what I am trying to do. I am starting with a dataframe in "wide" format, like below.

# sample dataframe; 801 has no sibling, so its _2 columns are NA
id_1 <- c(260, 500, 640, 720, 801, 440)
id_2 <- c(261, 501, 641, 721, NA, 441)
sleep_1 <- c(7, 3, 10, 6, 8, 4)
sleep_2 <- c(8, 9, 1, 4, NA, 9)
eat_1 <- c(6, 8, 4, 2, 5, 3)
eat_2 <- c(8, 1, 3, 8, NA, 6)
df <- data.frame(id_1, id_2, sleep_1, sleep_2, eat_1, eat_2)

> df
  id_1 id_2 sleep_1 sleep_2 eat_1 eat_2
1  260  261       7       8     6     8
2  500  501       3       9     8     1
3  640  641      10       1     4     3
4  720  721       6       4     2     8
5  801   NA       8      NA     5    NA
6  440  441       4       9     3     6

We can think of id_1 and id_2 as denoting pairs of siblings: 260 and 261 are a pair, 500 and 501 are a pair, etc. I would like to convert this dataframe to "long" format, like below. In doing so, I would also like to handle cases in which only one member of the pair is present (like 801) and the corresponding sibling is NA (as shown above).

    id sleep eat
1  260     7   6
2  261     8   8
3  500     3   8
4  501     9   1
5  640    10   4
6  641     1   3
7  720     6   2
8  721     4   8
9  801     8   5
10 440     4   3
11 441     9   6

3 Answers

2

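One option is to use the special ".value" name in the names_to argument of pivot_longer() (see https://tidyr.tidyverse.org/reference/pivot_longer.html). It tells pivot_longer() that the part of each column name captured by names_pattern should become an output column name, so id_1 and id_2 both feed an id column, sleep_1 and sleep_2 a sleep column, and so on, e.g.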

library(tidyverse)

id_1 <- c(260, 500, 640, 720)
id_2 <- c(261, 501, 641, 721)
sleep_1 <- c(7, 3, 10, 6)
sleep_2 <- c(8, 9, 1, 4)
eat_1 <- c(6,8,4,2)
eat_2 <- c(8,1,3,8)
df <- data.frame(id_1, id_2, sleep_1, sleep_2, eat_1, eat_2)

df %>%
  pivot_longer(everything(),
               names_pattern = "(\\w+)_\\d+",
               names_to = ".value")
#> # A tibble: 8 × 3
#>      id sleep   eat
#>   <dbl> <dbl> <dbl>
#> 1   260     7     6
#> 2   261     8     8
#> 3   500     3     8
#> 4   501     9     1
#> 5   640    10     4
#> 6   641     1     3
#> 7   720     6     2
#> 8   721     4     8

Created on 2024-05-23 with reprex v2.1.0
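A variant that also records which sibling each row came from puts a second capture group around the digits and gives names_to a second name next to ".value". This is a sketch using the six-row df printed in the question; the column name sibling is an arbitrary choice, not something tidyr requires. Because sibling is never NA, the missing sibling of 801 is dropped by testing id rather than every column.

```r
library(tidyr)
library(dplyr)

# Six-row example printed in the question; 801 has no sibling
df <- data.frame(
  id_1    = c(260, 500, 640, 720, 801, 440),
  id_2    = c(261, 501, 641, 721, NA, 441),
  sleep_1 = c(7, 3, 10, 6, 8, 4),
  sleep_2 = c(8, 9, 1, 4, NA, 9),
  eat_1   = c(6, 8, 4, 2, 5, 3),
  eat_2   = c(8, 1, 3, 8, NA, 6)
)

long <- df %>%
  pivot_longer(everything(),
               # group 1 -> ".value" (id, sleep or eat becomes a column)
               # group 2 -> new "sibling" column holding "1" or "2"
               names_pattern = "(\\w+)_(\\d+)",
               names_to = c(".value", "sibling")) %>%
  # sibling is never NA, so drop the missing sibling by its id
  filter(!is.na(id))

long
```

The result has the columns sibling, id, sleep and eat, with the two siblings of each pair on consecutive rows.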


Regarding the NAs: pivoting turns the missing sibling of 801 into a row that is NA in every column, which you can drop with filter():

library(tidyverse)

id_1 <- c(260, 500, 640, 720, 801, 901, 902)
id_2 <- c(261, 501, 641, 721, NA, 444, 555)
sleep_1 <- c(7, 3, 10, 6, 8, 10, 12)
sleep_2 <- c(8, 9, 1, 4, NA, 6, 7)
eat_1 <- c(6,8,4,2,5,6,7)
eat_2 <- c(8,1,3,8,NA,5,6)
df <- data.frame(id_1, id_2, sleep_1, sleep_2, eat_1, eat_2)

df %>%
  pivot_longer(everything(),
               names_pattern = "(\\w+)_\\d+",
               names_to = ".value")
#> # A tibble: 14 × 3
#>       id sleep   eat
#>    <dbl> <dbl> <dbl>
#>  1   260     7     6
#>  2   261     8     8
#>  3   500     3     8
#>  4   501     9     1
#>  5   640    10     4
#>  6   641     1     3
#>  7   720     6     2
#>  8   721     4     8
#>  9   801     8     5
#> 10    NA    NA    NA
#> 11   901    10     6
#> 12   444     6     5
#> 13   902    12     7
#> 14   555     7     6

df %>%
  pivot_longer(everything(),
               names_pattern = "(\\w+)_\\d+",
               names_to = ".value") %>%
  filter(!if_all(everything(), is.na)) # drop the all-NA row (801's missing sibling)
#> # A tibble: 13 × 3
#>       id sleep   eat
#>    <dbl> <dbl> <dbl>
#>  1   260     7     6
#>  2   261     8     8
#>  3   500     3     8
#>  4   501     9     1
#>  5   640    10     4
#>  6   641     1     3
#>  7   720     6     2
#>  8   721     4     8
#>  9   801     8     5
#> 10   901    10     6
#> 11   444     6     5
#> 12   902    12     7
#> 13   555     7     6

# or use na.omit(), which drops rows containing any NA (stricter than the filter above)
df %>%
  pivot_longer(everything(),
               names_pattern = "(\\w+)_\\d+",
               names_to = ".value") %>%
  na.omit()
#> # A tibble: 13 × 3
#>       id sleep   eat
#>    <dbl> <dbl> <dbl>
#>  1   260     7     6
#>  2   261     8     8
#>  3   500     3     8
#>  4   501     9     1
#>  5   640    10     4
#>  6   641     1     3
#>  7   720     6     2
#>  8   721     4     8
#>  9   801     8     5
#> 10   901    10     6
#> 11   444     6     5
#> 12   902    12     7
#> 13   555     7     6

Created on 2024-05-23 with reprex v2.1.0
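The two approaches above agree here only because the missing sibling is NA on every variable. They differ when a sibling is present but has one missing measurement: filter(!if_all(...)) keeps that row, while na.omit() drops it. A small sketch with made-up data (the pair 990/991 is hypothetical, for illustration only):

```r
library(tidyr)
library(dplyr)

# Hypothetical data: 801 has no sibling; 991 exists but has no eat value
df_na <- data.frame(
  id_1    = c(801, 990),
  id_2    = c(NA, 991),
  sleep_1 = c(8, 5),
  sleep_2 = c(NA, 6),
  eat_1   = c(5, 7),
  eat_2   = c(NA_real_, NA_real_)
)

long_na <- df_na %>%
  pivot_longer(everything(),
               names_pattern = "(\\w+)_\\d+",
               names_to = ".value")

# Removes only rows where every column is NA, so 991 survives
kept_partial <- long_na %>% filter(!if_all(everything(), is.na))

# Removes rows with any NA, so 991 is lost as well
complete_only <- na.omit(long_na)

kept_partial$id   # 801 990 991
complete_only$id  # 801 990
```

So if a sibling with a partially missing record should still appear in the long data, the if_all() filter is the safer choice.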

  • Hello, my apologies for the delayed comment. I found your second piece of code to be useful. Could you explain to me what's going on exactly? I understand that it's basically identifying the columns with "_" in the names_pattern part. But how is it converting it to long exactly? Maybe it's this part "names_to = ".value" that I don't understand. Thanks again!
    – wooden05
    Commented Jun 5 at 23:17
  • Hi @wooden05, further details and an example are provided in the link I included (tidyr.tidyverse.org/reference/pivot_longer.html) - search for ".value" on that webpage and it will help you understand what's going on. After reading through the docs/example, if you have additional questions please leave another comment and I'll try to help. Commented Jun 6 at 1:36
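For the question in the comment above, one rough way to see what names_to = ".value" does is to reproduce it in two explicit steps: first make the data fully long (one row per cell), then spread the variable names back into columns. This is a conceptual sketch, not how tidyr implements it internally; the helper columns pair and variable are names made up for illustration.

```r
library(tidyr)
library(dplyr)

df <- data.frame(
  id_1    = c(260, 500, 640, 720, 801, 440),
  id_2    = c(261, 501, 641, 721, NA, 441),
  sleep_1 = c(7, 3, 10, 6, 8, 4),
  sleep_2 = c(8, 9, 1, 4, NA, 9),
  eat_1   = c(6, 8, 4, 2, 5, 3),
  eat_2   = c(8, 1, 3, 8, NA, 6)
)

# Step 1: one row per cell. "pair" remembers which original row each value
# came from; names_sep splits "sleep_1" into variable = "sleep", sibling = "1".
step1 <- df %>%
  mutate(pair = row_number()) %>%
  pivot_longer(-pair,
               names_to = c("variable", "sibling"),
               names_sep = "_")

# Step 2: turn the variable names back into columns, giving one row per
# pair/sibling combination. With names_to = ".value", pivot_longer()
# performs this step for you in the same call.
two_step <- step1 %>%
  pivot_wider(names_from = variable, values_from = value)

two_step
```

two_step holds the same id, sleep and eat values as the one-call .value version, plus the helper columns pair and sibling.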
1

This code might be a solution for your problem:

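library(tidyverse)

# Columns of the first sibling (id_1, sleep_1, eat_1)
id1 <- df %>% 
  select(ends_with("_1"))
colnames(id1) <- c("id", "sleep", "eat")

# Columns of the second sibling (id_2, sleep_2, eat_2)
id2 <- df %>% 
  select(ends_with("_2"))
colnames(id2) <- c("id", "sleep", "eat")

# Stack the two halves; bind_rows() matches columns by name
df_long <- id1 %>% bind_rows(id2)
df_long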
   id sleep eat
1 260     7   6
2 500     3   8
3 640    10   4
4 720     6   2
5 261     8   8
6 501     9   1
7 641     1   3
8 721     4   8
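The colnames() assignment above relies on select() returning the columns in the order id, sleep, eat. A sketch that derives the names instead, assuming every column follows the <variable>_<sibling> pattern, strips the suffix with rename_with(). The helper stack_sibling is hypothetical, written only for this example, and the data is the six-row df printed in the question.

```r
library(dplyr)

df <- data.frame(
  id_1    = c(260, 500, 640, 720, 801, 440),
  id_2    = c(261, 501, 641, 721, NA, 441),
  sleep_1 = c(7, 3, 10, 6, 8, 4),
  sleep_2 = c(8, 9, 1, 4, NA, 9),
  eat_1   = c(6, 8, 4, 2, 5, 3),
  eat_2   = c(8, 1, 3, 8, NA, 6)
)

# Hypothetical helper: select one sibling's columns and strip the "_<n>" suffix
stack_sibling <- function(data, n) {
  suffix <- paste0("_", n, "$")          # e.g. "_1$" matches id_1, sleep_1, eat_1
  data %>%
    select(matches(suffix)) %>%
    rename_with(~ sub(suffix, "", .x))   # "sleep_1" -> "sleep"
}

df_long <- bind_rows(stack_sibling(df, 1), stack_sibling(df, 2)) %>%
  filter(!is.na(id)) %>%   # drop the missing sibling of 801
  arrange(id)              # optional: put consecutive ids next to each other

df_long
```

Anchoring the pattern with "$" also keeps a column such as id_10 from being picked up as a first-sibling column.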
0

Staying in base R without using reshape():

# df = 
rbind(setNames(df[i<-grep("_1", ocn)], ncn<-unique(gsub("_.*", "", ocn<-names(df)))), 
      setNames(df[-i], ncn)) |> sort_by(~id) # cosmetics; sort_by() requires R >= 4.4.0

   id sleep eat
1 260     7   6
5 261     8   8
2 500     3   8
6 501     9   1
3 640    10   4
7 641     1   3
4 720     6   2
8 721     4   8
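The one-liner above assigns i, ncn and ocn inside function arguments, which is compact but hard to follow. An unpacked sketch of the same rbind()/setNames() idea, using the six-row df printed in the question, might look like this. The variable names are illustrative; the pattern is anchored as "_1$", the NA sibling is dropped, and order() is used so the code does not depend on sort_by() (added in R 4.4.0).

```r
# Base R only
df <- data.frame(
  id_1    = c(260, 500, 640, 720, 801, 440),
  id_2    = c(261, 501, 641, 721, NA, 441),
  sleep_1 = c(7, 3, 10, 6, 8, 4),
  sleep_2 = c(8, 9, 1, 4, NA, 9),
  eat_1   = c(6, 8, 4, 2, 5, 3),
  eat_2   = c(8, 1, 3, 8, NA, 6)
)

old_names <- names(df)                          # "id_1" "id_2" "sleep_1" ...
first     <- grep("_1$", old_names)             # positions of the first sibling's columns
new_names <- unique(sub("_.*", "", old_names))  # "id" "sleep" "eat"

# Give both halves identical names so rbind() can stack them
long <- rbind(setNames(df[first], new_names),
              setNames(df[-first], new_names))

long <- long[!is.na(long$id), ]   # drop the missing sibling of 801
long <- long[order(long$id), ]    # sort by id
rownames(long) <- NULL
long
```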
