Good day,

I recently reshaped my dataset from wide to long into balanced panel dataset. Clyde Schechter has been so gracious in helping me and I have a follow up on my post on Creating a year identifier for pre-post analysis to use for diff-in-diff but focused on the reshaping aspect of the response.

I am using panel-data (balanced data) with unit of analysis is the county level. Variables have observations in years from 2008-2018 but my period of interest is 2011-2017. I reshaped the data from wide too long. Some of the variables include observations reporting data for each year (example # of FQHCs reported for each year 2011-2018) and some with data for 5-year estimates (for example: veteran and non-veteran education level 2012-2016, population by race and gender 2011-2017). I also have variables that reflect count/percentage/total# of observations over a period of time (example number of black females).

Here is the dataset before the reshaped:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long povertystat12 byte ruralclinic10 long vet_hs12 int vet_nohs12
 54598 0  4778 254
196640 0 18324 942
 23641 1  1532 208
 20603 2  1111 159
 57099 2  3933 527
 10154 1   351  55
 19977 2  1172 160
112690 0  9421 814
 33503 0  2028 304
 25465 1  1923 303
 43301 0  2682 288
 13091 1   819 107
 24448 5  1125 150
 13106 0   890 197
 14726 0   602 136
 50255 0  5786 398
 53910 0  3740 429
 12622 0   728  92
 10497 1   810  71
 37016 2  2729 308
 13662 1   786 111
 80126 1  4900 604
 48338 0  5918 309
 40895 0  1930 196
 70108 2  3308 399
end
label var povertystat12 "# Pers w/Pov Status Determined 2012-16" 
label var ruralclinic10 "# Rural Health Clinics 2010" 
label var vet_hs12 "Veterans 25+ w/HS Dipl or more 2012-16" 
label var vet_nohs12 "Veterans 25+ w/< HS Diploma 2012-16"
When running a tabstat of the data looking at the total number of persons with veteran education

Code:
 tabstat vet_hs12, stat(sum) format(%14.0fc) c(v)

   stats |  vet_hs12
---------+----------
     sum |    18,018,157
--------------------
After reshaping the data, the data looks like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte year int vetedu_hs byte ruralclinic long vetedu_hsplus
12  132  0   981
12   59  0   585
12  197  0  1608
12    1  1   159
12   16  2   314
12    7  0    52
12  311  0  2752
12  840  0 12952
12  157  0  1208
12  143  0  1498
12    2  0    48
12   87  3  1200
12 2140  0 29503
12   78  0   341
end
label var year "Reshaped Year variable panel data" 
label var vetedu_hs "Veterans 25+ w/< HS Diploma 2012-16" 
label var ruralclinic "# of rural health clinics 2010-17" 
label var vetedu_hsplus "Veterans 25+ w/HS Diploma+ 2012-16"
Here is the results of sum of variable vetedu_hs by year - which reflects the number of veteran persons with high school education by year. What I know is wrong is that the original variable noted that the vet_hs12 was the number of veterans with high school diploma 2012-2016 but when reshaped it only populated in year 2012

Code:
. tabstat vetedu_hs, by (year) stat(sum)
Summary for variables: vetedu_hs
     by categories of: year (Reshaped Year variable panel data)

    year |       sum
---------+----------
      11 |         0
      12 |   1328412
      13 |         0
      14 |         0
      15 |         0
      16 |         0
      17 |         0
      18 |         0
---------+----------
   Total |   1328412
--------------------

It is clear something is missing. I am befuddled. I consider I need to do some loops of the reshaped and populate across a range of time. I talked about this Creating a categorical variable from multiple numeric variables in post #4, however I do not think this is correct approach. Should I have not included the -vet_hs12- variable in the reshaped but I am not sure how I would have been able to build my model if i did not capture the fact that in my panel data I have 5-year estimate data?

Hope this is clear.

Thanks

Rene
Stata 12 on MAC OS (but also have access to Stata 15 on Windows)