5

Each week, I get a raw dataset from which I need to produce a report. I would like to write one R script that will work every week. Unfortunately, the raw data has slightly different sets of columns each week depending on what specimens were collected. Here's an example of what it looks like week to week.

library(readr)
w1 <- read_csv("species, males, females - fed, females - unfed
a,2,0,3
b,5,7,2
c,8,4,9")
w2 <- read_csv("species, males, females - mixed
a,2,0
b,5,7
c,8,4")

> w1
# A tibble: 3 x 4
  species males `females - fed` `females - unfed`
  <chr>   <dbl>           <dbl>             <dbl>
1 a           2               0                 3
2 b           5               7                 2
3 c           8               4                 9
> w2
# A tibble: 3 x 3
  species males `females - mixed`
  <chr>   <dbl>             <dbl>
1 a           2                 0
2 b           5                 7
3 c           8                 4

Here's how I normally use pivot_longer:

library(tidyr)
w1 %>% pivot_longer(cols = c(males, `females - fed`, `females - unfed`),
             names_to = c("sex","feeding_status"),
             names_sep = " - ",
             values_to = "quantity")

# A tibble: 9 x 4
  species sex     feeding_status quantity
  <chr>   <chr>   <chr>             <dbl>
1 a       males   NA                    2
2 a       females fed                   0
3 a       females unfed                 3
4 b       males   NA                    5
5 b       females fed                   7
6 b       females unfed                 2
7 c       males   NA                    8
8 c       females fed                   4
9 c       females unfed                 9

How can I write code for pivot_longer that will work on w1, w2 and w3? (edited to include w3 -- see below)

I have tried (select(., starts_with("females")) but can't figure out the right syntax. The pivot_longer documentation mentions names_pattern() and extract(), which seem promising but I don't know how to use them. Thank you!

EDIT: in response to akrun's answer, I realized I have to provide slightly more complex sample data. The code also needs to work with a column that occasionally appears in the datasets called "unknown sex", like so:

w3 <- read_csv("species, males, females - mixed, unknown sex
a,2,0,4
b,5,7,0
c,8,4,23")

> w3
# A tibble: 3 x 4
  species males `females - mixed` `unknown sex`
  <chr>   <dbl>             <dbl>         <dbl>
1 a           2                 0             4
2 b           5                 7             0
3 c           8                 4            23

The code suggested below by akrun to address w1 and w2 causes doubled values in the "unknown sex" column of w3:

w3 %>%
  pivot_longer(cols = c(males, starts_with('females')),
               names_to = c("sex", "feeding_status"), names_sep=" - ")

# A tibble: 6 x 5
  species `unknown sex` sex     feeding_status value
  <chr>           <dbl> <chr>   <chr>          <dbl>
1 a                   4 males   NA                 2
2 a                   4 females mixed              0
3 b                   0 males   NA                 5
4 b                   0 females mixed              7
5 c                  23 males   NA                 8
6 c                  23 females mixed              4
3
  • What is the issue with my first solution. It also works with w3. if you do cols = -species
    – akrun
    Commented Feb 26, 2020 at 21:23
  • I checked your w3. It is working fine for me w3 %>% pivot_longer(cols = -species, names_to = c("sex","feeding_status"), names_sep = " - ", values_to = "quantity")
    – akrun
    Commented Feb 26, 2020 at 21:24
  • If you check my updated code, it is not the case
    – akrun
    Commented Feb 26, 2020 at 21:35

1 Answer 1

5

The columns that are common are 'species', so we can use -

library(dplyr)
library(tidyr)
 w1 %>% 
    pivot_longer(cols = -species, 
       names_to = c("sex","feeding_status"),
          names_sep = " - ",
          values_to = "quantity")

Or one of the select_helpers

w1 %>%
    pivot_longer(cols = c(males, starts_with('females')),
      names_to = c("sex", "feeding_status"), names_sep=" - ")
# A tibble: 9 x 4
#  species sex     feeding_status value
#  <chr>   <chr>   <chr>          <dbl>
#1 a       males   <NA>               2
#2 a       females fed                0
#3 a       females unfed              3
#4 b       males   <NA>               5
#5 b       females fed                7
#6 b       females unfed              2
#7 c       males   <NA>               8
#8 c       females fed                4
#9 c       females unfed              9

If we want to include multiple cases, then matches in another option

w3 %>% 
   pivot_longer(cols = c(males, matches('^(females|unknown)')), 
   names_to = c("sex", "feeding_status"), names_sep=" - ")
# A tibble: 9 x 4
#  species sex         feeding_status value
#  <chr>   <chr>       <chr>          <dbl>
#1 a       males       <NA>               2
#2 a       females     mixed              0
#3 a       unknown sex <NA>               4
#4 b       males       <NA>               5
#5 b       females     mixed              7
#6 b       unknown sex <NA>               0
#7 c       males       <NA>               8
#8 c       females     mixed              4
#9 c       unknown sex <NA>              23
2
  • Hi akrun, thank you for this quick answer! Unfortunately, this code caused unexpected errors in a column I didn't include called "unknown sex." The output caused doubled values in that column. I have edited my question to include it. Thanks again! Commented Feb 26, 2020 at 21:22
  • 1
    The code for multiple cases with matches seems to be working! I'm running some tests to make sure I didn't miss anything before I mark it answered. Thank you!! Commented Feb 26, 2020 at 21:53

Not the answer you're looking for? Browse other questions tagged or ask your own question.