I recently reshaped my dataset from wide to long to create a balanced panel dataset. Clyde Schechter has been very gracious in helping me, and this is a follow-up to my post "Creating a year identifier for pre-post analysis to use for diff-in-diff", focused on the reshaping aspect of the response.
I am using balanced panel data with the county as the unit of analysis. The variables have observations for years from 2008-2018, but my period of interest is 2011-2017. I reshaped the data from wide to long. Some of the variables report data for each year (for example, the # of FQHCs reported for each year 2011-2018) and some hold 5-year estimates (for example, veteran and non-veteran education level 2012-2016, population by race and gender 2011-2017). I also have variables that reflect a count/percentage/total # of observations over a period of time (for example, the number of black females).
Here is the dataset before the reshape:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long povertystat12 byte ruralclinic10 long vet_hs12 int vet_nohs12
 54598 0  4778 254
196640 0 18324 942
 23641 1  1532 208
 20603 2  1111 159
 57099 2  3933 527
 10154 1   351  55
 19977 2  1172 160
112690 0  9421 814
 33503 0  2028 304
 25465 1  1923 303
 43301 0  2682 288
 13091 1   819 107
 24448 5  1125 150
 13106 0   890 197
 14726 0   602 136
 50255 0  5786 398
 53910 0  3740 429
 12622 0   728  92
 10497 1   810  71
 37016 2  2729 308
 13662 1   786 111
 80126 1  4900 604
 48338 0  5918 309
 40895 0  1930 196
 70108 2  3308 399
end
label var povertystat12 "# Pers w/Pov Status Determined 2012-16"
label var ruralclinic10 "# Rural Health Clinics 2010"
label var vet_hs12 "Veterans 25+ w/HS Dipl or more 2012-16"
label var vet_nohs12 "Veterans 25+ w/< HS Diploma 2012-16"
Code:
tabstat vet_hs12, stat(sum) format(%14.0fc) c(v)

   stats |   vet_hs12
---------+-----------
     sum | 18,018,157
---------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte year int vetedu_hs byte ruralclinic long vetedu_hsplus
12  132 0   981
12   59 0   585
12  197 0  1608
12    1 1   159
12   16 2   314
12    7 0    52
12  311 0  2752
12  840 0 12952
12  157 0  1208
12  143 0  1498
12    2 0    48
12   87 3  1200
12 2140 0 29503
12   78 0   341
end
label var year "Reshaped Year variable panel data"
label var vetedu_hs "Veterans 25+ w/< HS Diploma 2012-16"
label var ruralclinic "# of rural health clinics 2010-17"
label var vetedu_hsplus "Veterans 25+ w/HS Diploma+ 2012-16"
Code:
. tabstat vetedu_hs, by (year) stat(sum)

Summary for variables: vetedu_hs
     by categories of: year (Reshaped Year variable panel data)

    year |       sum
---------+----------
      11 |         0
      12 |   1328412
      13 |         0
      14 |         0
      15 |         0
      16 |         0
      17 |         0
      18 |         0
---------+----------
   Total |   1328412
--------------------
It is clear something is missing, and I am befuddled. I think I may need to loop over the reshaped variables and populate the values across a range of years. I discussed this in post #4 of "Creating a categorical variable from multiple numeric variables"; however, I do not think this is the correct approach. Should I have left the -vet_hs12- variable out of the reshape? If so, I am not sure how I would build my model without capturing the fact that my panel data include 5-year estimates.
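To make the "populate across a range of time" idea concrete, here is a minimal sketch of what I have in mind. It assumes a county identifier named -county- (a hypothetical name, not shown in the extract above), that year is coded 11-18 as in the tabstat output, and that the 2012-16 estimate sits only in the year == 12 row. I am not certain this is the right way to handle 5-year estimates in a panel.

```stata
* Sketch only: -county- is an assumed identifier name.
* Copy the 2012-16 estimate stored in the year==12 row to every row
* of the same county, then blank it outside the 2012-2016 window.
bysort county: egen vetedu_hs_5yr = max(cond(year == 12, vetedu_hs, .))
replace vetedu_hs_5yr = . if !inrange(year, 12, 16)

* Check: each year from 12 to 16 should now show the same sum.
tabstat vetedu_hs_5yr, by(year) stat(sum)
```

This repeats one 5-year value in each covered year rather than creating new annual data, so the repeated values should not be summed across years.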
Hope this is clear.
Thanks
Rene
Stata 12 on MAC OS (but also have access to Stata 15 on Windows)