i am using reshape() (which is in base R's stats package, not dplyr) to turn a long data table into a wide one, but when i do, i seem to lose some observations. the data table, called "data", is a record of observations of 7 different bird species over 20 years. i would like to make it a wide data table with one row per location/date survey and one column per species, with 0s where a species was not observed during a survey.
head(data)
     COMMON.NAME LOCALITY.ID OBSERVATION.DATE OBSERVATION.COUNT
1 Bonellis_Eagle    L1210237       12/17/2007                 1
2     Boreal_Owl   L11834228         9/3/2020                 1
3   Saker_Falcon   L12137171        6/27/2021                 1
4   Saker_Falcon    L1218263        4/27/2004                 1
5 Brown_Fish_Owl   L13864707        2/26/2021                 1
6 Bonellis_Eagle   L16000115         8/6/2021                 2
table(data$COMMON.NAME, data$OBSERVATION.COUNT)
(columns are OBSERVATION.COUNT values) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 20 21 22 26 27 28 31 35 39 40 41 42 50 51 60 62 64 94 100
Bonellis_Eagle 137 51 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Boreal_Owl 18 2 2 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Brown_Fish_Owl 51 29 11 6 4 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Great_Bustard 38 13 7 9 5 2 2 6 2 5 1 3 0 2 2 1 3 1 1 2 3 1 1 0 1 1 1 1 1 1 1 1 1 1 0
Lanner_Falcon 56 12 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Pin_tailed_Sandgrouse 17 7 2 6 2 4 3 5 1 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1
Saker_Falcon 61 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i should have 69 total Saker Falcons (61:1s, 2:2s, and 1:4)
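to make the arithmetic explicit: 69 is the number of individual birds, while the number of Saker Falcon records (rows) is 64. a small check, with the frequencies copied from the Saker_Falcon row of the table above (the variable names are only for illustration):

```r
# Frequencies from the Saker_Falcon row:
# 61 records with a count of 1, 2 records with a count of 2, 1 record with a count of 4
sf_freq <- c("1" = 61, "2" = 2, "4" = 1)

# Number of records (rows in the long table)
n_records <- sum(sf_freq)

# Number of individual birds (each count value times how often it occurs)
n_birds <- sum(as.numeric(names(sf_freq)) * sf_freq)

n_records  # 64
n_birds    # 69
```

the table(d_sens$SF) output further down counts records, so it is the 64 that it should be compared against.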
then i reshape the data table using LOCALITY.ID & OBSERVATION.DATE as the idvar, and "COMMON.NAME" as the timevar, to get a wide data table for all 7 species:
d_sens = reshape(data, idvar=c('LOCALITY.ID','OBSERVATION.DATE'), timevar = 'COMMON.NAME', direction='wide')
finally, i rename the columns and replace the NAs with 0s:
colnames(d_sens) = c('location','date','BE','BO','SF','BFO','GB','PS','LF')
d_sens$BE[is.na(d_sens$BE)] = 0
d_sens$BO[is.na(d_sens$BO)] = 0
d_sens$SF[is.na(d_sens$SF)] = 0
d_sens$BFO[is.na(d_sens$BFO)] = 0
d_sens$GB[is.na(d_sens$GB)] = 0
d_sens$PS[is.na(d_sens$PS)] = 0
d_sens$LF[is.na(d_sens$LF)] = 0
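the seven replacement lines above could probably be collapsed into one assignment over all the count columns. a minimal sketch on a made-up two-species table (d_toy and count_cols are hypothetical names, and the values are invented):

```r
# Toy wide table (hypothetical values) laid out like d_sens, with only two species
d_toy <- data.frame(
  location = c("L1", "L2"),
  date     = c("4/27/2004", "6/27/2021"),
  BE       = c(1, NA),
  SF       = c(NA, 2)
)

# In the real table this would list all seven species columns
count_cols <- c("BE", "SF")

# Replace every NA in the count columns with 0 in a single step
d_toy[count_cols][is.na(d_toy[count_cols])] <- 0

d_toy
```

restricting the replacement to count_cols leaves the location and date columns alone, in case those ever contain NAs that should stay visible.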
head(d_sens)
   location       date BE BO SF BFO GB PS LF
1  L1210237 12/17/2007  1  0  0   0  0  0  0
2 L11834228   9/3/2020  0  1  0   0  0  0  0
3 L12137171  6/27/2021  0  0  1   0  0  0  0
4  L1218263  4/27/2004  0  0  1   0  0  0  0
5 L13864707  2/26/2021  0  0  0   1  0  0  0
6 L16000115   8/6/2021  2  0  0   0  0  0  0
this is the type of data table that i want.
table(d_sens$SF)
  0   1   2   4
545  57   2   1
when i compare the Saker Falcon observations between the wide "d_sens" and the long "data" tables, i've lost 4 of the records where a count of 1 was observed (57 in the wide table instead of 61). i don't have any missing values in "data". any suggestions would be much appreciated.
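one possible cause worth checking (an assumption, not something confirmed from the real data) is that some species/location/date combinations occur more than once in "data". a wide table has only one cell per combination, and as far as i know reshape() keeps the first matching row in that case and normally warns that multiple rows match. a sketch of the check on made-up rows (toy, key_cols and is_dup are hypothetical names):

```r
# Toy data (hypothetical values): the first two rows share species, location and date
toy <- data.frame(
  COMMON.NAME       = c("Saker_Falcon", "Saker_Falcon", "Boreal_Owl", "Saker_Falcon"),
  LOCALITY.ID       = c("L1", "L1", "L1", "L2"),
  OBSERVATION.DATE  = c("4/27/2004", "4/27/2004", "4/27/2004", "6/27/2021"),
  OBSERVATION.COUNT = c(1, 1, 1, 2)
)

key_cols <- c("COMMON.NAME", "LOCALITY.ID", "OBSERVATION.DATE")

# Flag every row whose key appears more than once (first and later occurrences alike)
is_dup <- duplicated(toy[key_cols]) | duplicated(toy[key_cols], fromLast = TRUE)
toy[is_dup, ]

# Number of records that cannot get their own cell in a one-cell-per-key wide table
n_extra <- sum(duplicated(toy[key_cols]))
n_extra
```

if the same check on the real "data" gives 4 extra Saker_Falcon rows, that would account for the missing records.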
what i tried: the base R reshape() function (the same call as above; it is not a dplyr function):
d_sens = reshape(data, idvar=c('LOCALITY.ID','OBSERVATION.DATE'), timevar = 'COMMON.NAME', direction='wide')
d_sens looks like the type of table i want, but values are missing.
i also tried the tidyr pivot_wider function (it was suggested in an earlier post), but i can't get it to work:
data %>% pivot_wider(names_from=c(LOCALITY.ID,OBSERVATION.DATE), values_from=OBSERVATION.COUNT)
but i get this warning:
Warning message:
Values from OBSERVATION.COUNT are not uniquely identified; output will contain list-cols.
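for the layout described at the top (one row per location/date, one column per species), the species column would presumably go in names_from and the location/date columns in id_cols. the sketch below uses made-up rows and makes one loud assumption: that repeated records for the same species, location and date should be added together (values_fn = sum). if they are really double entries of the same sighting, max or a de-duplication step would be the better choice.

```r
library(tidyr)

# Toy data (hypothetical values): the first two rows share species, location and date
toy <- data.frame(
  COMMON.NAME       = c("Saker_Falcon", "Saker_Falcon", "Boreal_Owl", "Saker_Falcon"),
  LOCALITY.ID       = c("L1", "L1", "L1", "L2"),
  OBSERVATION.DATE  = c("4/27/2004", "4/27/2004", "4/27/2004", "6/27/2021"),
  OBSERVATION.COUNT = c(1, 1, 1, 2)
)

wide <- pivot_wider(
  toy,
  id_cols     = c(LOCALITY.ID, OBSERVATION.DATE),  # one row per survey
  names_from  = COMMON.NAME,                       # one column per species
  values_from = OBSERVATION.COUNT,
  values_fn   = sum,   # ASSUMPTION: duplicate records are summed
  values_fill = 0      # species not seen in a survey get 0 instead of NA
)

wide
```

with values_fn supplied, each cell holds a single number, so the list-column warning should no longer appear, and the column totals stay equal to the totals in the long table.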