Converting dataframe from "wide" to "long" format with pairs of ID variables

Question

Here's an example of what I am trying to do. I am starting with a dataframe in "wide" format, like below.

#sample dataframe
id_1 <- c(260, 500, 640, 720, 801, 440)
id_2 <- c(261, 501, 641, 721, NA, 441)
sleep_1 <- c(7, 3, 10, 6, 8, 4)
sleep_2 <- c(8, 9, 1, 4, NA, 9)
eat_1 <- c(6, 8, 4, 2, 5, 3)
eat_2 <- c(8, 1, 3, 8, NA, 6)
df <- data.frame(id_1, id_2, sleep_1, sleep_2, eat_1, eat_2)

> df
  id_1 id_2 sleep_1 sleep_2 eat_1 eat_2
1  260  261       7       8     6     8
2  500  501       3       9     8     1
3  640  641      10       1     4     3
4  720  721       6       4     2     8
5  801   NA       8      NA     5    NA
6  440  441       4       9     3     6

Think of id_1 and id_2 as denoting pairs of siblings: 260 and 261 are a pair, 500 and 501 are a pair, and so on. I would like to convert this dataframe to "long" format, with one row per sibling, like below. I would also like to handle cases in which only one member of a pair is present (like 801) and the corresponding sibling is NA (as shown above).

    id sleep eat
1  260     7   6
2  261     8   8
3  500     3   8
4  501     9   1
5  640    10   4
6  641     1   3
7  720     6   2
8  721     4   8
9  801     8   5
10 440     4   3
11 441     9   6

possible duplicates: stackoverflow.com/q/59253987/16421247; stackoverflow.com/q/71359531/16421247 — nightstand, Commented May 23 at 1:09
pivot_longer(df, everything(), names_to = c('.value', NA), names_sep = '_') — Onyambu, Commented May 23 at 3:33
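
The names_sep variant suggested in the comment above can be expanded into a runnable sketch. It assumes the six-row data frame printed in the question; the column name "sibling" and the filter on id are illustrative choices, not part of the original comment (which discards the suffix with NA instead of keeping it).

```r
library(tidyr)
library(dplyr)

# The six-row data frame printed in the question (801 has no sibling)
df <- data.frame(
  id_1    = c(260, 500, 640, 720, 801, 440),
  id_2    = c(261, 501, 641, 721, NA, 441),
  sleep_1 = c(7, 3, 10, 6, 8, 4),
  sleep_2 = c(8, 9, 1, 4, NA, 9),
  eat_1   = c(6, 8, 4, 2, 5, 3),
  eat_2   = c(8, 1, 3, 8, NA, 6)
)

long <- df %>%
  pivot_longer(
    everything(),
    names_sep = "_",                    # split "sleep_1" into "sleep" and "1"
    names_to  = c(".value", "sibling")  # "sibling" is an assumed column name
  ) %>%
  filter(!is.na(id))                    # drop the placeholder row for 801's missing sibling

long
```

Keeping the suffix in its own column makes it possible to tell the first sibling from the second after reshaping; replace "sibling" with NA in names_to to discard it, as in the comment.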

jared_mamrot · Accepted Answer · 2024-05-23 02:04:35Z

One option is to use the special ".value" sentinel in the names_to argument of pivot_longer(): the part of each column name captured by names_pattern becomes the name of an output column (per https://tidyr.tidyverse.org/reference/pivot_longer.html), e.g.

library(tidyverse)

id_1 <- c(260, 500, 640, 720)
id_2 <- c(261, 501, 641, 721)
sleep_1 <- c(7, 3, 10, 6)
sleep_2 <- c(8, 9, 1, 4)
eat_1 <- c(6,8,4,2)
eat_2 <- c(8,1,3,8)
df <- data.frame(id_1, id_2, sleep_1, sleep_2, eat_1, eat_2)

df %>%
  pivot_longer(everything(),
               names_pattern = "(\\w+)_\\d+",
               names_to = ".value")
#> # A tibble: 8 × 3
#>      id sleep   eat
#>   <dbl> <dbl> <dbl>
#> 1   260     7     6
#> 2   261     8     8
#> 3   500     3     8
#> 4   501     9     1
#> 5   640    10     4
#> 6   641     1     3
#> 7   720     6     2
#> 8   721     4     8

Created on 2024-05-23 with reprex v2.1.0

Regarding NAs, you could drop the rows for the 'missing' siblings using filter():

library(tidyverse)

id_1 <- c(260, 500, 640, 720, 801, 901, 902)
id_2 <- c(261, 501, 641, 721, NA, 444, 555)
sleep_1 <- c(7, 3, 10, 6, 8, 10, 12)
sleep_2 <- c(8, 9, 1, 4, NA, 6, 7)
eat_1 <- c(6,8,4,2,5,6,7)
eat_2 <- c(8,1,3,8,NA,5,6)
df <- data.frame(id_1, id_2, sleep_1, sleep_2, eat_1, eat_2)

df %>%
  pivot_longer(everything(),
               names_pattern = "(\\w+)_\\d+",
               names_to = ".value")
#> # A tibble: 14 × 3
#>       id sleep   eat
#>    <dbl> <dbl> <dbl>
#>  1   260     7     6
#>  2   261     8     8
#>  3   500     3     8
#>  4   501     9     1
#>  5   640    10     4
#>  6   641     1     3
#>  7   720     6     2
#>  8   721     4     8
#>  9   801     8     5
#> 10    NA    NA    NA
#> 11   901    10     6
#> 12   444     6     5
#> 13   902    12     7
#> 14   555     7     6

df %>%
  pivot_longer(everything(),
               names_pattern = "(\\w+)_\\d+",
               names_to = ".value") %>%
  filter(!if_all(everything(), is.na)) # drop rows where every column is NA (the missing sibling of 801)
#> # A tibble: 13 × 3
#>       id sleep   eat
#>    <dbl> <dbl> <dbl>
#>  1   260     7     6
#>  2   261     8     8
#>  3   500     3     8
#>  4   501     9     1
#>  5   640    10     4
#>  6   641     1     3
#>  7   720     6     2
#>  8   721     4     8
#>  9   801     8     5
#> 10   901    10     6
#> 11   444     6     5
#> 12   902    12     7
#> 13   555     7     6

# or use na.omit(); note that it drops rows with an NA in any column, not only all-NA rows
df %>%
  pivot_longer(everything(),
               names_pattern = "(\\w+)_\\d+",
               names_to = ".value") %>%
  na.omit()
#> # A tibble: 13 × 3
#>       id sleep   eat
#>    <dbl> <dbl> <dbl>
#>  1   260     7     6
#>  2   261     8     8
#>  3   500     3     8
#>  4   501     9     1
#>  5   640    10     4
#>  6   641     1     3
#>  7   720     6     2
#>  8   721     4     8
#>  9   801     8     5
#> 10   901    10     6
#> 11   444     6     5
#> 12   902    12     7
#> 13   555     7     6

Created on 2024-05-23 with reprex v2.1.0
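
The two NA-handling options above give the same result for this data, but they are not equivalent in general. A minimal sketch of the difference, using a small hypothetical long-format data frame (not from the question) in which one sibling has an id and a sleep value but no eat value:

```r
library(dplyr)

# Hypothetical data: row 2 is entirely missing, row 3 is only partly missing
long <- data.frame(
  id    = c(260, NA, 901),
  sleep = c(7, NA, 10),
  eat   = c(6, NA, NA)
)

# Drops only rows where every column is NA: keeps 260 and 901
kept_partial <- long %>% filter(!if_all(everything(), is.na))

# Drops rows with an NA in any column: keeps only 260
kept_complete <- long %>% na.omit()

nrow(kept_partial)
#> [1] 2
nrow(kept_complete)
#> [1] 1
```

If a sibling with partly missing measurements should stay in the data, the filter() version is probably the safer choice.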

Hello, my apologies for the delayed comment. I found your second piece of code to be useful. Could you explain to me what's going on exactly? I understand that it's basically identifying the columns with "_" in the names_pattern part. But how is it converting it to long exactly? Maybe it's this part "names_to = ".value" that I don't understand. Thanks again! — wooden05, Commented Jun 5 at 23:17
Hi @wooden05, further details and an example are provided in the link I included (tidyr.tidyverse.org/reference/pivot_longer.html) - search for ".value" on that webpage and it will help you understand what's going on. After reading through the docs/example, if you have additional questions please leave another comment and I'll try to help. — jared_mamrot, Commented Jun 6 at 1:36
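
As a rough illustration of what the question in the comments is asking about, the capture group in names_pattern can be inspected with base R. This is only a sketch of the idea, not tidyr's internal implementation:

```r
# Column names from the question's data frame
nms <- c("id_1", "id_2", "sleep_1", "sleep_2", "eat_1", "eat_2")

# The pattern has one capture group, (\\w+); extract what it captures
value_part <- sub("(\\w+)_\\d+", "\\1", nms)
value_part
#> [1] "id"    "id"    "sleep" "sleep" "eat"   "eat"

# names_to = ".value" asks pivot_longer() to use these captured strings as
# the output column names, so the six wide columns collapse into three
unique(value_part)
#> [1] "id"    "sleep" "eat"
```

Columns that share a captured name (for example sleep_1 and sleep_2) are stacked into one output column, which is why each input row yields two output rows, one per suffix.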

Pedro Faria · Accepted Answer · 2024-05-23 00:44:51Z

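Another option is to split the columns by suffix, give both halves the same column names, and stack them. Note that the result lists all first siblings followed by all second siblings rather than interleaving each pair; the output below is for the four complete pairs:

library(tidyverse)

# columns for the first sibling of each pair
id1 <- df %>%
  select(ends_with("_1"))
colnames(id1) <- c("id", "sleep", "eat")

# columns for the second sibling of each pair
id2 <- df %>%
  select(ends_with("_2"))
colnames(id2) <- c("id", "sleep", "eat")

# stack the two halves
df_long <- id1 %>% bind_rows(id2)
df_long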

   id sleep eat
1 260     7   6
2 500     3   8
3 640    10   4
4 720     6   2
5 261     8   8
6 501     9   1
7 641     1   3
8 721     4   8
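
If the pair order requested in the question matters, the split-and-stack approach can be adjusted to remember which row each sibling came from. This is a sketch under assumptions: the helper one_sibling() and the columns pair and sibling are hypothetical names introduced here for illustration.

```r
library(dplyr)

# The four complete pairs from the question
df <- data.frame(
  id_1 = c(260, 500, 640, 720), id_2 = c(261, 501, 641, 721),
  sleep_1 = c(7, 3, 10, 6),     sleep_2 = c(8, 9, 1, 4),
  eat_1 = c(6, 8, 4, 2),        eat_2 = c(8, 1, 3, 8)
)

# Hypothetical helper: pick the columns for one sibling and strip the suffix
one_sibling <- function(data, suffix) {
  data %>%
    mutate(pair = row_number()) %>%                  # remember the source row
    select(pair, ends_with(suffix)) %>%
    rename_with(~ sub(paste0(suffix, "$"), "", .x))  # "id_1" becomes "id"
}

long <- bind_rows(one_sibling(df, "_1"), one_sibling(df, "_2"), .id = "sibling") %>%
  arrange(pair, sibling) %>%   # sibling 1, then sibling 2, within each pair
  select(id, sleep, eat)

long$id
#> [1] 260 261 500 501 640 641 720 721
```

Sorting on the source row rather than on id keeps siblings together even when their ids are not consecutive.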

Friede · Accepted Answer · 2024-05-23 08:37:00Z

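Staying in base R without using reshape() (sort_by() requires R >= 4.4.0; output shown for the four complete pairs):

ocn <- names(df)                       # original column names
ncn <- unique(sub("_\\d+$", "", ocn))  # new column names: id, sleep, eat
i   <- grep("_1$", ocn)                # columns of the first sibling
# df = 
rbind(setNames(df[i], ncn), setNames(df[-i], ncn)) |> sort_by(~id) # cosmetics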

   id sleep eat
1 260     7   6
5 261     8   8
2 500     3   8
6 501     9   1
3 640    10   4
7 641     1   3
4 720     6   2
8 721     4   8
