The layout (you say "format") is fine. You just need some renaming first. Furthermore, the code misses the point that reshape long
needs stub names, not variable names. That is, variables to be stacked vertically will ideally have the same prefix, as when foo1
and foo2
have the same prefix foo
, but foo
and bar
need renaming. (There are other acceptable inputs, but your names don't match that either.)
Stata tip: the spreadsheet terminology of rows and columns is better replaced by talking of observations and variables.
You've presented your date variable as if it were string. It may well be, and certainly should be, a numeric variable with a date display format.
* Example generated by -dataex-. For more info, type help dataex
clear
input str5(casea controlb1 controlb2 controlb3 controlb4) str6 indexdate
"ID_1" "ID_2" "ID_3" "ID_4" "ID_5" "1/1/01"
"ID_6" "ID_7" "ID_8" "ID_9" "ID_10" "1/2/01"
"ID_11" "ID_12" "ID_13" "ID_14" "ID_15" "1/5/01"
end
rename (casea-controlb4) (id=)
reshape long id, i(indexdate) j(which) string
gen case_control = strpos(which, "case") > 0
list
. * Example generated by -dataex-. For more info, type help dataex
. clear
. input str5(casea controlb1 controlb2 controlb3 controlb4) str6 indexdate
casea controlb1 controlb2 controlb3 controlb4 indexdate
1. "ID_1" "ID_2" "ID_3" "ID_4" "ID_5" "1/1/01"
2. "ID_6" "ID_7" "ID_8" "ID_9" "ID_10" "1/2/01"
3. "ID_11" "ID_12" "ID_13" "ID_14" "ID_15" "1/5/01"
4. end
.
. rename (casea-controlb4) (id=)
.
. reshape long id, i(indexdate) j(which) string
(j = casea controlb1 controlb2 controlb3 controlb4)
Data Wide -> Long
-----------------------------------------------------------------------------
Number of observations 3 -> 15
Number of variables 6 -> 3
j variable (5 values) -> which
xij variables:
idcasea idcontrolb1 ... idcontrolb4 -> id
-----------------------------------------------------------------------------
.
. gen case_control = strpos(which, "case") > 0
.
. list
+-----------------------------------------+
| indexd~e which id case_c~l |
|-----------------------------------------|
1. | 1/1/01 casea ID_1 1 |
2. | 1/1/01 controlb1 ID_2 0 |
3. | 1/1/01 controlb2 ID_3 0 |
4. | 1/1/01 controlb3 ID_4 0 |
5. | 1/1/01 controlb4 ID_5 0 |
|-----------------------------------------|
6. | 1/2/01 casea ID_6 1 |
7. | 1/2/01 controlb1 ID_7 0 |
8. | 1/2/01 controlb2 ID_8 0 |
9. | 1/2/01 controlb3 ID_9 0 |
10. | 1/2/01 controlb4 ID_10 0 |
|-----------------------------------------|
11. | 1/5/01 casea ID_11 1 |
12. | 1/5/01 controlb1 ID_12 0 |
13. | 1/5/01 controlb2 ID_13 0 |
14. | 1/5/01 controlb3 ID_14 0 |
15. | 1/5/01 controlb4 ID_15 0 |
+-----------------------------------------+
EDIT 5 July
This is a reshape long
of your modified data. Whether the duplicates are problematic otherwise is a substantive question: you have some built-in dependence.
* Example generated by -dataex-. For more info, type help dataex
clear
input str4(casea controlb1 controlb2 controlb3 controlb4) str6 indexdate
"ID_x" "ID_1" "ID_2" "ID_3" "ID_4" "1/1/01"
"ID_y" "ID_5" "ID_2" "ID_4" "ID_6" "1/2/01"
"ID_z" "ID_1" "ID_5" "ID_7" "ID_8" "1/5/01"
end
reshape long controlb , i(casea indexdate) j(which)
list, sepby(casea)
+-------------------------------------+
| casea indexd~e which controlb |
|-------------------------------------|
1. | ID_x 1/1/01 1 ID_1 |
2. | ID_x 1/1/01 2 ID_2 |
3. | ID_x 1/1/01 3 ID_3 |
4. | ID_x 1/1/01 4 ID_4 |
|-------------------------------------|
5. | ID_y 1/2/01 1 ID_5 |
6. | ID_y 1/2/01 2 ID_2 |
7. | ID_y 1/2/01 3 ID_4 |
8. | ID_y 1/2/01 4 ID_6 |
|-------------------------------------|
9. | ID_z 1/5/01 1 ID_1 |
10. | ID_z 1/5/01 2 ID_5 |
11. | ID_z 1/5/01 3 ID_7 |
12. | ID_z 1/5/01 4 ID_8 |
+-------------------------------------+