
I am using reshape() to turn a long data table into a wide one (note that reshape() comes from base R's stats package, not dplyr), but when I do, I seem to be losing some observations. The data table, called "data", is a record of observations of 6 different bird species over 20 years. I would like to make it wide, with one row per location/date combination and 0s where a species was not observed during a survey.

head(data)
  COMMON.NAME    LOCALITY.ID OBSERVATION.DATE OBSERVATION.COUNT
1 Bonellis_Eagle L1210237    12/17/2007       1
2 Boreal_Owl     L11834228   9/3/2020         1
3 Saker_Falcon   L12137171   6/27/2021        1
4 Saker_Falcon   L1218263    4/27/2004        1
5 Brown_Fish_Owl L13864707   2/26/2021        1
6 Bonellis_Eagle L16000115   8/6/2021         2

table(data$COMMON.NAME, data$OBSERVATION.COUNT)

                         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  17  20  21  22  26  27  28  31  35  39  40  41  42  50  51  60  62  64  94 100
Bonellis_Eagle         137  51   4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Boreal_Owl              18   2   2   5   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Brown_Fish_Owl          51  29  11   6   4   0   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Great_Bustard           38  13   7   9   5   2   2   6   2   5   1   3   0   2   2   1   3   1   1   2   3   1   1   0   1   1   1   1   1   1   1   1   1   1   0
Lanner_Falcon           56  12   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Pin_tailed_Sandgrouse   17   7   2   6   2   4   3   5   1   0   0   1   1   1   1   0   0   0   0   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0   1
Saker_Falcon            61   2   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

I should have 69 total Saker Falcons (61 × 1 + 2 × 2 + 1 × 4 = 69).

Then I reshape the data table using LOCALITY.ID and OBSERVATION.DATE as the id variables, with timevar = 'COMMON.NAME', to get a wide data table for all 6 species:

d_sens = reshape(data, idvar=c('LOCALITY.ID','OBSERVATION.DATE'), timevar = 'COMMON.NAME', direction='wide')

Finally, I rename the columns and replace the NAs with 0s:

colnames(d_sens) = c('location','date','BE','BO','SF','BFO','GB','PS','LF') 
d_sens$BE[is.na(d_sens$BE)] = 0
d_sens$BO[is.na(d_sens$BO)] = 0
d_sens$SF[is.na(d_sens$SF)] = 0
d_sens$BFO[is.na(d_sens$BFO)] = 0
d_sens$GB[is.na(d_sens$GB)] = 0
d_sens$PS[is.na(d_sens$PS)] = 0
d_sens$LF[is.na(d_sens$LF)] = 0

head(d_sens)
  location  date       BE BO SF BFO GB PS LF
1 L1210237  12/17/2007  1  0  0   0  0  0  0
2 L11834228 9/3/2020    0  1  0   0  0  0  0
3 L12137171 6/27/2021   0  0  1   0  0  0  0
4 L1218263  4/27/2004   0  0  1   0  0  0  0
5 L13864707 2/26/2021   0  0  0   1  0  0  0
6 L16000115 8/6/2021    2  0  0   0  0  0  0

This is the type of data table that I want.

table(d_sens$SF)

  0   1   2   4 
545  57   2   1 

When I compare the number of Saker Falcon observations between the wide ("d_sens") and long ("data") tables, I've lost 4 of the occasions on which 1 was observed (57 instead of 61), and I don't have any missing values in "data". Any suggestions would be much appreciated.

I tried using the base-R reshape() command:

d_sens = reshape(data, idvar=c('LOCALITY.ID','OBSERVATION.DATE'), timevar = 'COMMON.NAME', direction='wide')

d_sens looks like the type of table I want, but values are missing.

I also tried the tidyr pivot_wider() function (it was suggested in an earlier post), but I can't get that to work:

data %>% pivot_wider(names_from=c(LOCALITY.ID,OBSERVATION.DATE), values_from=OBSERVATION.COUNT)

But I get this warning:

Warning message:
Values from `OBSERVATION.COUNT` are not uniquely identified; output will contain list-cols.

1 Answer

It could be a problem with duplicates in your original data: for example, the same location, the same day, and the same species:

COMMON.NAME     LOCALITY.ID OBSERVATION.DATE    OBSERVATION.COUNT
Bonellis_Eagle  L1218263    03/01/2023          1
Bonellis_Eagle  L1218263    03/01/2023          3
  • reshape() will keep only the first element when pivoting (with a warning that multiple rows match).
  • pivot_wider() will keep all the duplicated values as list-columns, with the warning "Values from XXX are not uniquely identified". Also, you didn't use the appropriate columns when you called it: names_from should be COMMON.NAME, with the location/date columns left as identifiers.
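Both behaviours can be reproduced with a minimal sketch (hardcoding the two-row duplicate above):

```r
library(tidyr)

# Hypothetical duplicate: same species, same location, same date, two counts
dup <- data.frame(
  COMMON.NAME       = c("Bonellis_Eagle", "Bonellis_Eagle"),
  LOCALITY.ID       = c("L1218263", "L1218263"),
  OBSERVATION.DATE  = c("03/01/2023", "03/01/2023"),
  OBSERVATION.COUNT = c(1, 3)
)

# reshape() silently keeps only the first count (1), with a warning that
# multiple rows match the same idvar/timevar combination
wide1 <- reshape(dup, idvar = c("LOCALITY.ID", "OBSERVATION.DATE"),
                 timevar = "COMMON.NAME", direction = "wide")

# pivot_wider() keeps both counts, but wraps them in a list-column
# and emits the "not uniquely identified" warning
wide2 <- pivot_wider(dup, names_from = COMMON.NAME,
                     values_from = OBSERVATION.COUNT)
```

This is exactly the asker's symptom: reshape() drops the second count of each duplicate, which is where the "missing" birds went.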

Assuming "data" is the name of your data frame, it may be a good idea to remove duplicates before pivoting:

library(dplyr)
data = data %>% distinct(LOCALITY.ID, OBSERVATION.DATE, COMMON.NAME, .keep_all = TRUE)

Or, if you want to sum the observation counts of duplicated rows into one:

data = data %>%
  group_by(LOCALITY.ID, OBSERVATION.DATE, COMMON.NAME) %>%
  summarise(OBSERVATION.COUNT = sum(OBSERVATION.COUNT), .groups = "drop")

After this step you should have unique rows (1 row = one location, one day, one species).
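Alternatively, pivot_wider() can aggregate the duplicates itself via its values_fn argument, and fill unobserved species with 0 via values_fill, so no separate summarise()/replace_na() steps are needed. A self-contained sketch with hypothetical data:

```r
library(dplyr)
library(tidyr)

# Tiny hypothetical example with one duplicated location/date/species row
obs <- tibble(
  COMMON.NAME       = c("Saker_Falcon", "Saker_Falcon", "Boreal_Owl"),
  LOCALITY.ID       = c("L1", "L1", "L1"),
  OBSERVATION.DATE  = c("01/01/2023", "01/01/2023", "01/01/2023"),
  OBSERVATION.COUNT = c(1, 3, 2)
)

# values_fn = sum collapses duplicates during the pivot itself,
# and values_fill = 0 inserts 0 where a species was not observed
wide <- obs %>%
  pivot_wider(names_from  = COMMON.NAME,
              values_from = OBSERVATION.COUNT,
              values_fn   = sum,
              values_fill = 0)
# wide$Saker_Falcon is 4 (1 + 3); wide$Boreal_Owl is 2
```

Whether distinct(), summarise(sum), or values_fn = sum is right depends on whether the duplicated rows are genuine repeat counts or data-entry errors.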

Toy dataset:

data=structure(list(COMMON.NAME = c("Bonellis_Eagle", "Lanner_Falcon", 
"Saker_Falcon", "Great_Bustard", "Brown_Fish_Owl", "Pin_tailed_Sandgrouse", 
"Boreal_Owl", "Brown_Fish_Owl", "Lanner_Falcon", "Lanner_Falcon", 
"Brown_Fish_Owl", "Great_Bustard", "Saker_Falcon", "Saker_Falcon", 
"Brown_Fish_Owl", "Boreal_Owl", "Lanner_Falcon", "Bonellis_Eagle", 
"Lanner_Falcon", "Saker_Falcon", "Saker_Falcon", "Brown_Fish_Owl", 
"Bonellis_Eagle", "Pin_tailed_Sandgrouse", "Great_Bustard", "Pin_tailed_Sandgrouse", 
"Saker_Falcon", "Great_Bustard", "Boreal_Owl", "Brown_Fish_Owl", 
"Lanner_Falcon", "Bonellis_Eagle", "Bonellis_Eagle", "Great_Bustard", 
"Lanner_Falcon", "Boreal_Owl", "Bonellis_Eagle", "Lanner_Falcon", 
"Brown_Fish_Owl", "Boreal_Owl", "Saker_Falcon", "Boreal_Owl", 
"Pin_tailed_Sandgrouse", "Brown_Fish_Owl", "Saker_Falcon", "Bonellis_Eagle", 
"Pin_tailed_Sandgrouse", "Boreal_Owl", "Lanner_Falcon", "Lanner_Falcon", 
"Lanner_Falcon", "Pin_tailed_Sandgrouse", "Pin_tailed_Sandgrouse", 
"Boreal_Owl", "Lanner_Falcon", "Bonellis_Eagle", "Bonellis_Eagle", 
"Brown_Fish_Owl", "Pin_tailed_Sandgrouse", "Boreal_Owl", "Lanner_Falcon", 
"Saker_Falcon", "Pin_tailed_Sandgrouse", "Saker_Falcon", "Great_Bustard", 
"Saker_Falcon", "Boreal_Owl", "Saker_Falcon", "Great_Bustard", 
"Saker_Falcon", "Brown_Fish_Owl", "Pin_tailed_Sandgrouse", "Boreal_Owl", 
"Lanner_Falcon", "Pin_tailed_Sandgrouse", "Bonellis_Eagle", "Boreal_Owl", 
"Great_Bustard", "Saker_Falcon", "Bonellis_Eagle", "Pin_tailed_Sandgrouse", 
"Pin_tailed_Sandgrouse"), LOCALITY.ID = c("L1218263", "L11834228", 
"L12137171", "L1210237", "L16000115", "L13864707", "L1210237", 
"L11834228", "L16000115", "L1210237", "L1210237", "L16000115", 
"L16000115", "L13864707", "L16000115", "L12137171", "L13864707", 
"L16000115", "L1210237", "L1210237", "L1210237", "L13864707", 
"L13864707", "L11834228", "L12137171", "L1218263", "L11834228", 
"L13864707", "L13864707", "L1210237", "L13864707", "L1210237", 
"L12137171", "L1210237", "L1210237", "L1210237", "L16000115", 
"L1210237", "L11834228", "L1218263", "L1218263", "L1210237", 
"L13864707", "L12137171", "L13864707", "L11834228", "L1218263", 
"L1210237", "L16000115", "L1218263", "L13864707", "L1210237", 
"L1218263", "L13864707", "L16000115", "L1210237", "L11834228", 
"L12137171", "L16000115", "L16000115", "L13864707", "L12137171", 
"L12137171", "L13864707", "L13864707", "L16000115", "L11834228", 
"L1210237", "L1210237", "L1210237", "L11834228", "L12137171", 
"L12137171", "L16000115", "L13864707", "L12137171", "L11834228", 
"L1218263", "L11834228", "L12137171", "L11834228", "L11834228"
), OBSERVATION.DATE = c("03/01/2023", "06/01/2023", "05/01/2023", 
"06/01/2023", "07/01/2023", "06/01/2023", "06/01/2023", "06/01/2023", 
"03/01/2023", "07/01/2023", "01/01/2023", "06/01/2023", "02/01/2023", 
"02/01/2023", "03/01/2023", "01/01/2023", "02/01/2023", "05/01/2023", 
"06/01/2023", "03/01/2023", "02/01/2023", "07/01/2023", "02/01/2023", 
"03/01/2023", "06/01/2023", "01/01/2023", "02/01/2023", "01/01/2023", 
"02/01/2023", "06/01/2023", "05/01/2023", "06/01/2023", "03/01/2023", 
"03/01/2023", "04/01/2023", "01/01/2023", "04/01/2023", "03/01/2023", 
"04/01/2023", "04/01/2023", "07/01/2023", "07/01/2023", "03/01/2023", 
"06/01/2023", "01/01/2023", "01/01/2023", "02/01/2023", "05/01/2023", 
"06/01/2023", "04/01/2023", "01/01/2023", "01/01/2023", "03/01/2023", 
"03/01/2023", "02/01/2023", "02/01/2023", "06/01/2023", "07/01/2023", 
"01/01/2023", "04/01/2023", "07/01/2023", "04/01/2023", "05/01/2023", 
"04/01/2023", "05/01/2023", "01/01/2023", "04/01/2023", "05/01/2023", 
"05/01/2023", "06/01/2023", "02/01/2023", "04/01/2023", "06/01/2023", 
"04/01/2023", "02/01/2023", "05/01/2023", "02/01/2023", "01/01/2023", 
"05/01/2023", "04/01/2023", "06/01/2023", "01/01/2023"), OBSERVATION.COUNT = c(1, 
5, 2, 2, 4, 2, 5, 1, 1, 2, 0, 1, 1, 5, 3, 3, 1, 3, 1, 1, 0, 3, 
0, 0, 3, 5, 5, 5, 4, 2, 5, 4, 1, 4, 5, 1, 3, 4, 5, 0, 0, 0, 4, 
2, 5, 0, 2, 2, 0, 1, 0, 5, 3, 3, 3, 2, 0, 4, 4, 2, 1, 5, 0, 2, 
4, 1, 4, 2, 2, 3, 0, 5, 1, 4, 3, 2, 1, 3, 4, 0, 1, 2)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -82L))

Pivoting (plus renaming the columns, replacing the NAs, and sorting the data frame):

library(dplyr)
library(tidyr)
output = data %>%
  pivot_wider(names_from = 'COMMON.NAME', values_from = 'OBSERVATION.COUNT') %>%
  rename('location' = 'LOCALITY.ID',
         'date' = 'OBSERVATION.DATE',
         'BE' = 'Bonellis_Eagle',
         'LF' = 'Lanner_Falcon',
         'SF' = 'Saker_Falcon',
         'GB' = 'Great_Bustard',
         'BFO' = 'Brown_Fish_Owl',
         'PS' = 'Pin_tailed_Sandgrouse',
         'BO' = 'Boreal_Owl') %>%
  # only the numeric species columns can contain NAs, so restrict replace_na to them
  mutate(across(where(is.numeric), ~ replace_na(.x, 0))) %>%
  arrange(location, date) %>%
  select(location, date, BE, BFO, BO, GB, LF, PS, SF)

Output:

# A tibble: 38 × 9
   location  date          BE   BFO    BO    GB    LF    PS    SF
   <chr>     <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 L11834228 01/01/2023     0     0     0     0     0     2     0
 2 L11834228 02/01/2023     0     0     1     0     0     0     5
 3 L11834228 03/01/2023     0     0     0     0     0     0     0
 4 L11834228 04/01/2023     0     5     4     0     0     0     0
 5 L11834228 05/01/2023     0     0     0     0     0     0     4
 6 L11834228 06/01/2023     0     1     0     0     5     1     0
 7 L1210237  01/01/2023     0     0     1     0     0     5     0
 8 L1210237  02/01/2023     2     0     0     0     0     0     0
 9 L1210237  03/01/2023     0     0     0     4     4     0     1
10 L1210237  04/01/2023     0     0     0     0     5     0     0
# ℹ 28 more rows
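As a final sanity check, the per-species totals should be identical before and after pivoting (this is how the asker noticed the missing Saker Falcons). A self-contained sketch with hypothetical data:

```r
library(dplyr)
library(tidyr)

# Hypothetical long records, including one duplicated row
obs <- tibble(
  COMMON.NAME       = c("Saker_Falcon", "Saker_Falcon", "Boreal_Owl", "Saker_Falcon"),
  LOCALITY.ID       = c("L1", "L1", "L2", "L2"),
  OBSERVATION.DATE  = c("01/01/2023", "01/01/2023", "01/01/2023", "02/01/2023"),
  OBSERVATION.COUNT = c(1, 2, 4, 1)
)

# Aggregate duplicates, then pivot wide with 0 for unobserved species
wide <- obs %>%
  group_by(LOCALITY.ID, OBSERVATION.DATE, COMMON.NAME) %>%
  summarise(OBSERVATION.COUNT = sum(OBSERVATION.COUNT), .groups = "drop") %>%
  pivot_wider(names_from = COMMON.NAME, values_from = OBSERVATION.COUNT,
              values_fill = 0)

# Column sums of the wide table must equal the long per-species totals:
# Saker_Falcon 1 + 2 + 1 = 4, Boreal_Owl = 4
stopifnot(sum(wide$Saker_Falcon) == 4, sum(wide$Boreal_Owl) == 4)
```

Running the same comparison on the real data (e.g. tapply() on the long table vs colSums() on the species columns of the wide one) will confirm that no observations were lost.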
  • Hi E.Wiest, I think you're right about the problem with duplicates. When I run your code (for either reshape or pivot_wider) I get a different, but still strange, result: 545 0s, 53 1s, 6 2s, and 1 4. There should be 61 1s and 2 2s. Thank you for the help. Commented May 17 at 23:26
  • You're right. There was a mistake with the renaming of the columns. The post has been edited. Don't forget to remove duplicates (the code has been updated) before pivoting.
    – E.Wiest
    Commented May 18 at 0:34
  • That's it, thank you. Now I need to think about whether I should be including the duplicate values or not (a more philosophical question), but you found the missing birds I was searching for after pivoting. Thanks again. Commented May 21 at 0:11
