0

I have cross-sectional data (for four different time periods) and I am trying to create a pseudo-panel based on four different variables. However, after trying the code (see below) I get more observations than I had in the beginning (from 4687 observations across the different datasets, now I have 202245 observations), which does not make sense as individuals with the same values on the four variables ("sex", "age_group", "education", "sector_eu") should appear in one row.

You can find a shorter version of my datasets here (as I can not post them directly in this forum as with less than 100 observations per dataset the code wouldn't work).

With only 100 observations per dataset, the resulting number of the pseudo-panel is of 12 observations, so in this case we don't have the problem of more observations (as described above).

A similar question to this one was asked in this post, from which I have the code snippet that I use to create the pseudo-panel:

list_df <- list(DT15_sl, dt17_sl,DATA17_sl,dt18_sl)

result <- Reduce(function(x, y)
  merge(x, y, by=c("sex", "age_group", "education", "sector_eu")), 
  lapply(list_df, function(x) 
  {names(x)[10:14] <- paste(names(x)[10:14], x$year_month[4], sep = '_');x[-4]})) 

result

This is what result looks like:

   sex age_group education               sector_eu ESTU.x PROV.x     date.x     sector_eu_without.x
1    1     35-39         4    Non-tradable sectors   3113      8 2015-09-30                    <NA>
2    1     35-39         4    Non-tradable sectors   3113      8 2015-09-30                    <NA>
3    2     25-29         4    Non-tradable sectors   3113      8 2015-09-30                    <NA>
4    2     25-29         4    Non-tradable sectors   3113      8 2015-09-30                    <NA>
5    2       80+         1 Import-oriented sectors   3113      8 2015-09-30 Import-oriented sectors
6    2       80+         1 Import-oriented sectors   3113      8 2015-09-30 Import-oriented sectors
7    2       80+         1 Import-oriented sectors   3113      8 2015-09-30 Import-oriented sectors
8    2       80+         1 Import-oriented sectors   3113      8 2015-09-30 Import-oriented sectors
9    2       80+         1 Import-oriented sectors   3113      8 2015-09-30 Import-oriented sectors
10   2       80+         1 Import-oriented sectors   3113      8 2015-09-30 Import-oriented sectors
11   2       80+         1 Import-oriented sectors   3113      8 2015-09-30 Import-oriented sectors
12   2       80+         1 Import-oriented sectors   3113      8 2015-09-30 Import-oriented sectors
   indep_2015-09 ideology_2015-09 identity_2015-09 income_2015-09 work_sit_fact_2015-09 ESTU.y PROV.y
1              0                5                3              7     Currently working   3191      8
2              0                5                3              3     Currently working   3191      8
3              0                5                3              7     Currently working   3191      8
4              0                5                3              7     Currently working   3191      8
5              0                3                1              6               Retired   3191      8
6              0                3                1              6               Retired   3191      8
7              0               NA                4              4               Retired   3191      8
8              0               NA                4              4               Retired   3191      8
9              1               NA                3              5               Retired   3191      8
10             1               NA                3              5               Retired   3191      8
11             0                6                1             NA               Retired   3191      8
12             0                6                1             NA               Retired   3191      8
       date.y     sector_eu_without.y indep_2017-10 ideology_2017-10 identity_2017-10 income_2017-10
1  2017-10-02                    <NA>             0                4                3              4
2  2017-10-02                    <NA>             0                4                3              4
3  2017-10-02                    <NA>            NA               NA                5              3
4  2017-10-02                    <NA>            NA               NA                5              3
5  2017-10-02 Import-oriented sectors             1                6                3             NA
6  2017-10-02 Import-oriented sectors             1                6                4              4
7  2017-10-02 Import-oriented sectors             1                6                3             NA
8  2017-10-02 Import-oriented sectors             1                6                4              4
9  2017-10-02 Import-oriented sectors             1                6                3             NA
10 2017-10-02 Import-oriented sectors             1                6                4              4
11 2017-10-02 Import-oriented sectors             1                6                3             NA
12 2017-10-02 Import-oriented sectors             1                6                4              4
   work_sit_fact_2017-10 ESTU.x PROV.x     date.x     sector_eu_without.x indep_2017-12 ideology_2017-12
1             Unemployed   3202      8 2017-12-28                    <NA>             0                1
2             Unemployed   3202      8 2017-12-28                    <NA>             0                1
3      Currently working   3202      8 2017-12-28                    <NA>             0                5
4      Currently working   3202      8 2017-12-28                    <NA>             1                2
5                Retired   3202      8 2017-12-28 Import-oriented sectors            NA               NA
6                Retired   3202      8 2017-12-28 Import-oriented sectors            NA               NA
7                Retired   3202      8 2017-12-28 Import-oriented sectors            NA               NA
8                Retired   3202      8 2017-12-28 Import-oriented sectors            NA               NA
9                Retired   3202      8 2017-12-28 Import-oriented sectors            NA               NA
10               Retired   3202      8 2017-12-28 Import-oriented sectors            NA               NA
11               Retired   3202      8 2017-12-28 Import-oriented sectors            NA               NA
12               Retired   3202      8 2017-12-28 Import-oriented sectors            NA               NA
   identity_2017-12 income_2017-12 work_sit_fact_2017-12 ESTU.y PROV.y     date.y
1                 3              9     Currently working   3226      8 2018-10-01
2                 3              9     Currently working   3226      8 2018-10-01
3                 3             NA     Currently working   3226      8 2018-10-01
4                 3             NA     Currently working   3226      8 2018-10-01
5                 3             NA               Retired   3226      8 2018-10-01
6                 3             NA               Retired   3226      8 2018-10-01
7                 3             NA               Retired   3226      8 2018-10-01
8                 3             NA               Retired   3226      8 2018-10-01
9                 3             NA               Retired   3226      8 2018-10-01
10                3             NA               Retired   3226      8 2018-10-01
11                3             NA               Retired   3226      8 2018-10-01
12                3             NA               Retired   3226      8 2018-10-01
       sector_eu_without.y indep_2018-10 ideology_2018-10 identity_2018-10 income_2018-10
1                     <NA>            NA                5                3             NA
2                     <NA>            NA                5                3             NA
3                     <NA>            NA               NA                3             NA
4                     <NA>            NA               NA                3             NA
5  Import-oriented sectors             0                3                5             NA
6  Import-oriented sectors             0                3                5             NA
7  Import-oriented sectors             0                3                5             NA
8  Import-oriented sectors             0                3                5             NA
9  Import-oriented sectors             0                3                5             NA
10 Import-oriented sectors             0                3                5             NA
11 Import-oriented sectors             0                3                5             NA
12 Import-oriented sectors             0                3                5             NA
   work_sit_fact_2018-10
1      Currently working
2      Currently working
3      Currently working
4      Currently working
5                Retired
6                Retired
7                Retired
8                Retired
9                Retired
10               Retired
11               Retired
12               Retired

As you can see, the first two rows should be one individual as they have the same values on "sex", "age_group", "education" and "sector_eu". Looking at the values of these four variables, "result" should only contain 3 observations (instead of 12) as there are only 3 different unique combinations of the four variables.

What am I doing wrong or missing?

Further, if I had two additional variables in two of the four datasets (e.g. in DATA17_sl and DT15_sl but not in dt18_sl and dt17_sl), let's say in position 15 and 16, and I wanted to include them in the pseudo-panel as well, how would I have to change my code?

I thank you in advance for every help you can provide.

7
  • A minimal working example makes it much easier for us to help you. And I beg you, please do not post images of your data...
    – RoB
    Commented Jul 2 at 7:19
  • You said: "... the first two rows should be one individual as they have the same values on "sex", "age_group", "education" and "sector_eu" ". I can see that there are two different individuals with those same values because their values for income_2015-09 differ. The merge will therefore return two rows, not one. A similar logic can be applied to the other "two" individuals.
    – Edward
    Commented Jul 2 at 7:54
  • But as I am not merging on income, I thought that the values on other variables would not be considered when grouping individuals.
    – Ronald
    Commented Jul 2 at 8:39
  • You're not grouping anything, you're merging multiple datasets using many-many relationships. It's not surprising that you are getting more observations than you want.
    – Edward
    Commented Jul 2 at 9:10
  • How do I have to change the code, so that I can group individuals based on their values on the variables "sex", "age_group", "education" and "sector_eu" ? As this is my goal.
    – Ronald
    Commented Jul 2 at 9:44

0

Browse other questions tagged or ask your own question.