I am working with a large data set (SAMHSA TEDS-A). It has about 22 million observations before I create the study samples, and the studies are run on approximately 5-19 million observations, depending on the study.
I am using a difference-in-differences (DiD) study design.
I have a categorical outcome variable for days waiting for an appointment, and all of my independent variables are categorical. The treatment is Medicaid expansion.
I am currently using an LPM estimator, for which I create short-wait and long-wait "binary" variables.
These are my variable transformations:
gen wait1 = .
replace wait1 = 1 if DAYWAIT==0 | DAYWAIT==1
replace wait1 = 2 if DAYWAIT==2 | DAYWAIT==3 | DAYWAIT==4
replace wait1 = . if DAYWAIT==.
gen wait2 = .
replace wait2 = 1 if DAYWAIT==0 | DAYWAIT== 1 | DAYWAIT==2
replace wait2 = 2 if DAYWAIT==3 | DAYWAIT==4
replace wait2 = . if DAYWAIT==.
gen hisplat = 0
replace hisplat = 1 if ETHNIC==1 | ETHNIC==2 | ETHNIC==3 | ETHNIC==5
replace hisplat = 0 if ETHNIC==4
replace hisplat = . if ETHNIC==.
gen race = 0
replace race = 1 if RACE==5
replace race = 2 if RACE==4
replace race = 3 if hisplat ==1
replace race = 4 if RACE==1 | RACE==2 | RACE==3 | RACE==6 | RACE==7 |RACE==8 | RACE==9
replace race = . if RACE==.
gen married = 0
replace married = 1 if MARSTAT==2
replace married = 0 if MARSTAT==1 | MARSTAT==3 | MARSTAT==4
replace married = . if MARSTAT==.
gen educ = 0
replace educ = 1 if EDUC==1 | EDUC==2
replace educ = 2 if EDUC==3
replace educ = 3 if EDUC==4 | EDUC==5
replace educ = . if EDUC==.
gen employed = 0
replace employed = 1 if EMPLOY==1 | EMPLOY==2
replace employed = 0 if EMPLOY==3| EMPLOY==4
replace employed = . if EMPLOY==.
gen roofovrhd = 0
replace roofovrhd = 0 if LIVARAG==1
replace roofovrhd = 1 if LIVARAG==2 | LIVARAG==3
replace roofovrhd = . if LIVARAG==.
gen primsub = 0
replace primsub = 1 if SUB1 ==1
replace primsub = 2 if SUB1 ==2
replace primsub = 3 if SUB1 ==3
replace primsub = 4 if SUB1 == 4
replace primsub = 5 if SUB1 ==5 | SUB1 ==6 | SUB1 ==7
replace primsub = 10 if SUB1 ==10
replace primsub = 27 if SUB1 ==8 | SUB1 ==9 | SUB1 ==11 | SUB1 ==12 | SUB1 ==13 | SUB1 ==14 | SUB1 ==15 | SUB1 ==16 | SUB1 ==17 | SUB1 ==18 | SUB1 ==19
replace primsub = . if SUB1 ==.
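As a side note, the two wait-time indicators above can be written more compactly with recode. This is only a sketch of an equivalent coding, assuming DAYWAIT takes the values 0 through 4 or system missing, as in the transformations above. The names wait1_alt and wait2_alt are hypothetical and are used only so that the original variables are not overwritten.

```stata
* Split after category 1: 0 or 1-7 days vs. 8 or more days
recode DAYWAIT (0 1 = 1) (2 3 4 = 2) (else = .), generate(wait1_alt)

* Split after category 2: 0-14 days vs. 15 or more days
recode DAYWAIT (0 1 2 = 1) (3 4 = 2) (else = .), generate(wait2_alt)

* Check that the compact coding reproduces the original variables
assert wait1_alt == wait1 & wait2_alt == wait2
```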
After setting up my sample for the LPM (I cull the sample based on some missing-data thresholds in the dependent variable, as well as on patterns of missing pre-post data by state (STFIPS)), I currently do the following:
foreach var in GENDER AGE race educ employed wait1 wait2 DAYWAIT primsub roofovrhd SERVICES married REGION PSOURCE {
tab(`var'), gen(`var'fe_)
}
gen admissions = CASEID
replace admissions = 1
collapse (mean) DAYWAITfe_* wait1fe_* wait2fe_* GENDERfe_* AGEfe_* racefe_* educfe_* employedfe_* SERVICESfe_* REGIONfe_* primsubfe_* PSOURCEfe_* (sum)admissions, by(ADMYR STFIPS)
This is how I specify treatment to evaluate the "main effect":
gen expand = 0
foreach state in 4 5 8 10 15 17 19 21 24 25 26 32 33 35 36 38 39 40 41 44 50 54 {
replace expand = 1 if STFIPS==`state' & ADMYR >=2014
}
foreach state in 6 9 11 27 34 53 {
replace expand = 1 if STFIPS==`state' & ADMYR>=2011
}
replace expand = 1 if STFIPS==2 & ADMYR>=2015
replace expand = 1 if STFIPS==42 & ADMYR>=2015
replace expand = 1 if STFIPS==18 & ADMYR>=2016
replace expand = 1 if STFIPS==30 & ADMYR>=2016
replace expand = 1 if STFIPS==22 & ADMYR>=2016
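The same treatment indicator could also be built from a single adoption-year variable, which makes the staggered timing explicit and yields an event-time variable as a by-product. A minimal sketch, using only the state codes and years listed above. The names first_year, expand_alt, and event_time are hypothetical.

```stata
* Year in which each state is coded as treated (missing = never treated)
gen first_year = .
replace first_year = 2011 if inlist(STFIPS, 6, 9, 11, 27, 34, 53)
replace first_year = 2014 if inlist(STFIPS, 4, 5, 8, 10, 15, 17, 19, 21, 24, 25, 26)
replace first_year = 2014 if inlist(STFIPS, 32, 33, 35, 36, 38, 39, 40, 41, 44, 50, 54)
replace first_year = 2015 if inlist(STFIPS, 2, 42)
replace first_year = 2016 if inlist(STFIPS, 18, 22, 30)

* Post-expansion indicator; never-treated states stay at 0
gen expand_alt = (ADMYR >= first_year) & !missing(first_year)

* Years relative to expansion (missing for never-treated states)
gen event_time = ADMYR - first_year

* Should agree with the loop-based coding above
assert expand_alt == expand
```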
LPM Estimator
reg wait2fe_2 expand GENDERfe_* racefe_* AGEfe_* educfe_* employedfe_* SERVICESfe_* i.ADMYR i.STFIPS [aw=admissions], cluster(STFIPS)
And I get some output.
Below is where I am asking for help on non-linear categorical models - guidance on specification and coding:
I would like to use a fixed-effects ordered logit (or multinomial logit) model, with all DAYWAIT categories as the dependent variable, or a probit model (to check the LPM against), and report the associated treatment effect for each category. I have also run some feologit models, and these seem to do okay.
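To make the goal concrete, here is a rough sketch of one way such a model might be specified on the individual-level (uncollapsed) data: an ordered logit with state and year indicator variables, followed by margins to get the change in the predicted probability of each DAYWAIT category. It assumes that the recoded covariates and expand already exist at the admission level. It is not a claim that this is the right estimator for this design, and with millions of observations the margins step may be slow.

```stata
* Ordered logit on admission-level data with state and year dummies
ologit DAYWAIT i.expand i.GENDER i.race i.AGE i.educ i.employed i.SERVICES ///
    i.STFIPS i.ADMYR, vce(cluster STFIPS)

* Average effect of expand on the probability of each wait category (0-4)
forvalues k = 0/4 {
    margins, dydx(expand) predict(outcome(`k'))
}
```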
I am having a difficult time using xtset on this data set, even when I restrict it to a "balanced panel" (all states reporting in every year). I am not sure whether I need to collapse (sum) by state-year for these non-linear models. I think I might need to, since at least I can xtset the data after doing so. I would xtset with STFIPS as the panel variable and ADMYR (years) as the time variable.
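On the collapse question, one option that keeps the ordinal outcome intact, rather than summing category codes within state-year, might be to reduce the data to one row per distinct combination of the model variables with a count, and then fit the model with frequency weights. This is a sketch under the assumptions that only the variables listed here enter the model and that expand has been created on the admission-level data first. The name n is a hypothetical name for the cell count.

```stata
* One row per state-year-outcome-covariate cell; n = number of admissions
contract STFIPS ADMYR DAYWAIT expand GENDER race AGE, freq(n)

* Ordered logit on the cell-level data, weighting each cell by its count
ologit DAYWAIT i.expand i.GENDER i.race i.AGE i.STFIPS i.ADMYR [fweight = n], ///
    vce(cluster STFIPS)
```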
I am also unsure whether I need to use the "i." prefix for the categorical independent variables and for the expand variable. It does not seem to make a difference in the feologit models I have run: the coefficient is the same with i.expand as with expand.
So I presume I would do everything the same as for the LPM above, except that I would do this:
gen wait1 = .
replace wait1 = 1 if DAYWAIT==0 | DAYWAIT==1
replace wait1 = 2 if DAYWAIT==2 | DAYWAIT==3 | DAYWAIT==4
replace wait1 = . if DAYWAIT==.
gen wait2 = .
replace wait2 = 1 if DAYWAIT==0 | DAYWAIT== 1 | DAYWAIT==2
replace wait2 = 2 if DAYWAIT==3 | DAYWAIT==4
replace wait2 = . if DAYWAIT==.
gen hisplat = 0
replace hisplat = 1 if ETHNIC==1 | ETHNIC==2 | ETHNIC==3 | ETHNIC==5
replace hisplat = 0 if ETHNIC==4
replace hisplat = . if ETHNIC==.
gen race = 0
replace race = 1 if RACE==5
replace race = 2 if RACE==4
replace race = 3 if hisplat ==1
replace race = 4 if RACE==1 | RACE==2 | RACE==3 | RACE==6 | RACE==7 |RACE==8 | RACE==9
replace race = . if RACE==.
gen married = 0
replace married = 1 if MARSTAT==2
replace married = 0 if MARSTAT==1 | MARSTAT==3 | MARSTAT==4
replace married = . if MARSTAT==.
gen educ = 0
replace educ = 1 if EDUC==1 | EDUC==2
replace educ = 2 if EDUC==3
replace educ = 3 if EDUC==4 | EDUC==5
replace educ = . if EDUC==.
gen employed = 0
replace employed = 1 if EMPLOY==1 | EMPLOY==2
replace employed = 0 if EMPLOY==3| EMPLOY==4
replace employed = . if EMPLOY==.
gen roofovrhd = 0
replace roofovrhd = 0 if LIVARAG==1
replace roofovrhd = 1 if LIVARAG==2 | LIVARAG==3
replace roofovrhd = . if LIVARAG==.
gen primsub = 0
replace primsub = 1 if SUB1 ==1
replace primsub = 2 if SUB1 ==2
replace primsub = 3 if SUB1 ==3
replace primsub = 4 if SUB1 == 4
replace primsub = 5 if SUB1 ==5 | SUB1 ==6 | SUB1 ==7
replace primsub = 10 if SUB1 ==10
replace primsub = 27 if SUB1 ==8 | SUB1 ==9 | SUB1 ==11 | SUB1 ==12 | SUB1 ==13 | SUB1 ==14 | SUB1 ==15 | SUB1 ==16 | SUB1 ==17 | SUB1 ==18 | SUB1 ==19
replace primsub = . if SUB1 ==.
collapse (sum) DAYWAIT wait1 wait2 GENDER AGE race educ employed SERVICES REGION primsub PSOURCE admissions, by(ADMYR STFIPS)
gen expand = 0
foreach state in 4 5 8 10 15 17 19 21 24 25 26 32 33 35 36 38 39 40 41 44 50 54 {
replace expand = 1 if STFIPS==`state' & ADMYR >=2014
}
foreach state in 6 9 11 27 34 53 {
replace expand = 1 if STFIPS==`state' & ADMYR>=2011
}
replace expand = 1 if STFIPS==2 & ADMYR>=2015
replace expand = 1 if STFIPS==42 & ADMYR>=2015
replace expand = 1 if STFIPS==18 & ADMYR>=2016
replace expand = 1 if STFIPS==30 & ADMYR>=2016
replace expand = 1 if STFIPS==22 & ADMYR>=2016
Here is where I am just experimenting at the moment to see whether the models run. I am not sure that I am specifying (or coding) them correctly, and therefore not sure whether my effect estimates are in line with what I should expect from a correct setup:
xtset STFIPS ADMYR
(this works after the collapse (sum), but the panel is unbalanced and has gaps)
Panel variable: STFIPS (unbalanced)
Time variable: ADMYR, 2009 to 2020, but with gaps
Delta: 1 unit
feologit DAYWAIT expand i.ADMYR, cluster(STFIPS) or
ologit DAYWAIT expand SERVICES GENDER race AGE employed REGION, vce(cluster STFIPS)
ologit DAYWAIT expand SERVICES i.ADMYR, vce(cluster STFIPS)
Here is my data (200 lines) - not collapse-summed
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte STFIPS int ADMYR byte DAYWAIT float wait2 byte GENDER float race byte(AGE REGION) float(admissions expand)
1 2015 0 1 1 1 3 3 1 0
2 2010 0 1 2 4 7 4 1 0
2 2014 0 1 1 4 5 4 1 0
5 2009 0 1 1 1 6 3 1 0
5 2014 0 1 1 1 7 3 1 1
5 2015 0 1 1 1 4 3 1 1
5 2018 0 1 1 2 11 3 1 1
5 2020 0 1 1 2 7 3 1 1
6 2009 0 1 1 1 3 4 1 0
6 2009 0 1 1 1 6 4 1 0
6 2009 0 1 2 1 9 4 1 0
6 2010 3 2 1 4 5 4 1 0
6 2010 1 1 1 4 3 4 1 0
6 2012 1 1 1 1 8 4 1 1
6 2012 0 1 2 2 10 4 1 1
6 2013 0 1 2 1 8 4 1 1
6 2013 0 1 1 1 7 4 1 1
6 2013 1 1 1 1 3 4 1 1
6 2013 1 1 1 1 5 4 1 1
6 2013 0 1 1 4 8 4 1 1
6 2014 0 1 1 1 4 4 1 1
6 2014 1 1 1 1 6 4 1 1
6 2014 0 1 2 1 9 4 1 1
6 2014 0 1 1 4 7 4 1 1
6 2015 0 1 2 2 7 4 1 1
6 2015 2 1 1 1 9 4 1 1
6 2016 0 1 2 1 5 4 1 1
6 2016 0 1 1 1 6 4 1 1
6 2016 0 1 2 3 7 4 1 1
6 2017 1 1 2 4 6 4 1 1
6 2018 0 1 2 4 3 4 1 1
6 2018 0 1 2 4 6 4 1 1
6 2018 3 2 . 4 4 4 1 1
6 2018 0 1 1 2 11 4 1 1
6 2018 0 1 1 4 7 4 1 1
6 2018 0 1 1 4 10 4 1 1
6 2018 0 1 2 1 9 4 1 1
6 2019 1 1 2 1 5 4 1 1
6 2019 0 1 2 1 5 4 1 1
6 2019 0 1 2 4 7 4 1 1
6 2019 0 1 1 3 7 4 1 1
6 2019 1 1 1 4 6 4 1 1
6 2020 0 1 1 4 5 4 1 1
6 2020 0 1 1 4 6 4 1 1
12 2009 0 1 1 1 5 3 1 0
12 2009 0 1 2 1 9 3 1 0
12 2010 0 1 2 1 6 3 1 0
12 2010 0 1 2 2 8 3 1 0
12 2011 0 1 2 4 8 3 1 0
12 2011 0 1 1 1 5 3 1 0
12 2012 0 1 1 1 6 3 1 0
12 2012 0 1 2 1 6 3 1 0
12 2013 0 1 1 1 7 3 1 0
12 2013 0 1 1 1 10 3 1 0
12 2014 0 1 1 1 10 3 1 0
12 2016 0 1 1 1 7 3 1 0
12 2016 3 2 1 2 4 3 1 0
12 2016 0 1 1 1 4 3 1 0
12 2017 0 1 2 4 5 3 1 0
12 2017 0 1 1 1 5 3 1 0
12 2017 0 1 1 1 6 3 1 0
12 2017 0 1 2 1 5 3 1 0
12 2017 0 1 1 4 11 3 1 0
16 2012 0 1 2 1 4 4 1 0
16 2018 0 1 2 1 4 4 1 0
17 2009 0 1 1 2 5 2 1 0
17 2009 0 1 1 2 6 2 1 0
17 2009 0 1 1 3 9 2 1 0
17 2009 1 1 1 3 6 2 1 0
17 2010 0 1 1 2 9 2 1 0
17 2011 3 2 1 2 3 2 1 0
17 2011 1 1 1 1 8 2 1 0
17 2011 1 1 1 2 7 2 1 0
17 2012 0 1 1 4 5 2 1 0
17 2013 3 2 1 1 11 2 1 0
17 2014 1 1 2 1 10 2 1 1
17 2014 2 1 1 2 4 2 1 1
17 2019 0 1 2 1 8 2 1 1
19 2009 3 2 2 1 6 2 1 0
19 2009 3 2 1 4 6 2 1 0
19 2010 3 2 2 1 4 2 1 0
19 2010 2 1 1 1 9 2 1 0
19 2012 0 1 1 1 4 2 1 0
19 2012 1 1 1 2 8 2 1 0
19 2014 0 1 2 1 6 2 1 1
19 2015 1 1 1 1 6 2 1 1
19 2019 1 1 2 1 6 2 1 1
19 2020 3 2 1 1 7 2 1 1
20 2012 0 1 2 1 6 2 1 0
20 2013 0 1 2 1 10 2 1 0
24 2009 0 1 2 2 8 3 1 0
24 2010 0 1 2 2 8 3 1 0
24 2012 1 1 1 1 4 3 1 0
24 2015 1 1 2 1 7 3 1 1
24 2015 0 1 1 4 11 3 1 1
24 2016 0 1 2 1 8 3 1 1
24 2016 0 1 1 . 5 3 1 1
24 2016 0 1 1 2 10 3 1 1
24 2016 0 1 1 1 9 3 1 1
24 2016 1 1 1 . 9 3 1 1
24 2017 0 1 1 2 10 3 1 1
24 2018 0 1 1 . 7 3 1 1
24 2018 0 1 2 1 6 3 1 1
24 2018 0 1 1 1 5 3 1 1
26 2009 0 1 1 2 5 2 1 0
26 2010 1 1 1 1 6 2 1 0
26 2012 0 1 1 1 11 2 1 0
26 2013 0 1 2 1 8 2 1 0
26 2015 0 1 1 2 10 2 1 1
26 2015 0 1 1 1 8 2 1 1
26 2016 2 1 1 1 6 2 1 1
26 2018 0 1 1 2 7 2 1 1
26 2019 1 1 1 1 6 2 1 1
26 2020 1 1 2 1 5 2 1 1
29 2012 2 1 1 1 9 2 1 0
29 2013 2 1 1 1 6 2 1 0
29 2017 1 1 1 1 6 2 1 0
29 2018 0 1 2 1 4 2 1 0
29 2019 4 2 2 1 5 2 1 0
29 2020 0 1 2 1 7 2 1 0
29 2020 2 1 1 1 3 2 1 0
30 2010 0 1 2 4 6 4 1 0
30 2015 0 1 1 1 3 4 1 0
32 2010 0 1 1 1 7 4 1 0
32 2018 2 1 1 3 9 4 1 1
33 2009 4 2 1 1 5 1 1 0
34 2009 0 1 1 3 9 1 1 0
34 2009 1 1 1 3 7 1 1 0
34 2010 0 1 1 1 5 1 1 0
34 2011 2 1 2 1 8 1 1 1
34 2011 1 1 2 1 4 1 1 1
34 2011 1 1 2 2 4 1 1 1
34 2011 2 1 1 1 6 1 1 1
34 2013 1 1 1 2 5 1 1 1
34 2014 3 2 2 2 4 1 1 1
34 2014 1 1 1 1 3 1 1 1
34 2015 0 1 1 2 7 1 1 1
34 2016 1 1 2 1 7 1 1 1
34 2016 1 1 1 3 5 1 1 1
34 2016 1 1 1 2 10 1 1 1
34 2017 1 1 1 1 8 1 1 1
34 2017 1 1 2 1 9 1 1 1
34 2017 1 1 1 2 7 1 1 1
34 2019 1 1 1 3 7 1 1 1
34 2019 1 1 2 1 9 1 1 1
34 2019 1 1 2 1 6 1 1 1
34 2020 0 1 2 1 6 1 1 1
34 2020 0 1 2 1 7 1 1 1
35 2014 . . 1 3 3 4 1 1
35 2018 0 1 1 . 3 4 1 1
35 2019 0 1 1 4 11 4 1 1
38 2016 3 2 1 4 7 2 1 1
38 2017 1 1 2 1 3 2 1 1
39 2009 0 1 1 2 6 2 1 0
39 2009 . . 1 . 7 2 1 0
39 2010 0 1 1 1 6 2 1 0
39 2010 4 2 1 2 4 2 1 0
39 2010 3 2 1 1 4 2 1 0
39 2011 0 1 1 2 7 2 1 0
39 2012 0 1 1 1 5 2 1 0
39 2014 1 1 1 3 5 2 1 1
39 2014 0 1 1 1 4 2 1 1
39 2015 0 1 1 1 6 2 1 1
39 2015 . . 1 1 7 2 1 1
39 2016 1 1 2 1 8 2 1 1
39 2016 0 1 1 1 9 2 1 1
39 2016 0 1 1 1 6 2 1 1
39 2018 1 1 2 1 7 2 1 1
39 2018 0 1 2 1 3 2 1 1
39 2019 0 1 2 1 9 2 1 1
39 2020 1 1 1 2 11 2 1 1
39 2020 0 1 1 . 6 2 1 1
39 2020 0 1 1 2 4 2 1 1
45 2010 3 2 1 1 7 3 1 0
45 2012 1 1 1 2 6 3 1 0
45 2013 1 1 2 1 5 3 1 0
45 2015 0 1 1 2 7 3 1 0
45 2017 0 1 2 2 7 3 1 0
45 2018 0 1 2 2 4 3 1 0
46 2015 0 1 2 4 6 2 1 0
46 2015 0 1 1 4 7 2 1 0
46 2019 0 1 1 1 4 2 1 0
47 2010 4 2 1 1 5 3 1 0
47 2011 0 1 1 1 4 3 1 0
47 2015 0 1 1 1 8 3 1 0
47 2015 0 1 1 1 6 3 1 0
47 2016 1 1 1 1 11 3 1 0
48 2009 0 1 2 1 10 3 1 0
48 2010 0 1 2 2 6 3 1 0
48 2011 0 1 1 2 9 3 1 0
48 2014 0 1 1 1 6 3 1 0
48 2015 0 1 1 3 4 3 1 0
48 2016 0 1 1 1 5 3 1 0
48 2019 0 1 1 1 11 3 1 0
49 2009 0 1 2 4 3 4 1 0
49 2011 0 1 1 1 9 4 1 0
49 2014 0 1 1 4 10 4 1 0
49 2016 0 1 1 1 6 4 1 0
49 2018 0 1 1 1 10 4 1 0
49 2020 0 1 1 4 6 4 1 0
end
label values STFIPS STFIPS
label def STFIPS 1 "1. Alabama", modify
label def STFIPS 2 "2. Alaska", modify
label def STFIPS 5 "5. Arkansas", modify
label def STFIPS 6 "6. California", modify
label def STFIPS 12 "12. Florida", modify
label def STFIPS 16 "16. Idaho", modify
label def STFIPS 17 "17. Illinois", modify
label def STFIPS 19 "19. Iowa", modify
label def STFIPS 20 "20. Kansas", modify
label def STFIPS 24 "24. Maryland", modify
label def STFIPS 26 "26. Michigan", modify
label def STFIPS 29 "29. Missouri", modify
label def STFIPS 30 "30. Montana", modify
label def STFIPS 32 "32. Nevada", modify
label def STFIPS 33 "33. New Hampshire", modify
label def STFIPS 34 "34. New Jersey", modify
label def STFIPS 35 "35. New Mexico", modify
label def STFIPS 38 "38. North Dakota", modify
label def STFIPS 39 "39. Ohio", modify
label def STFIPS 45 "45. South Carolina", modify
label def STFIPS 46 "46. South Dakota", modify
label def STFIPS 47 "47. Tennessee", modify
label def STFIPS 48 "48. Texas", modify
label def STFIPS 49 "49. Utah", modify
label values ADMYR ADMYR
label def ADMYR 2009 "2009. 2009", modify
label def ADMYR 2010 "2010. 2010", modify
label def ADMYR 2011 "2011. 2011", modify
label def ADMYR 2012 "2012. 2012", modify
label def ADMYR 2013 "2013. 2013", modify
label def ADMYR 2014 "2014. 2014", modify
label def ADMYR 2015 "2015. 2015", modify
label def ADMYR 2016 "2016. 2016", modify
label def ADMYR 2017 "2017. 2017", modify
label def ADMYR 2018 "2018. 2018", modify
label def ADMYR 2019 "2019. 2019", modify
label def ADMYR 2020 "2020. 2020", modify
label values DAYWAIT DAYWAIT
label def DAYWAIT 0 "0. 0", modify
label def DAYWAIT 1 "1. 1-7", modify
label def DAYWAIT 2 "2. 8-14", modify
label def DAYWAIT 3 "3. 15-30", modify
label def DAYWAIT 4 "4. 31 or more", modify
label values GENDER GENDER
label def GENDER 1 "1. Male", modify
label def GENDER 2 "2. Female", modify
label values AGE AGE
label def AGE 3 "3. 18-20 years", modify
label def AGE 4 "4. 21-24 years", modify
label def AGE 5 "5. 25-29 years", modify
label def AGE 6 "6. 30-34 years", modify
label def AGE 7 "7. 35-39 years", modify
label def AGE 8 "8. 40-44 years", modify
label def AGE 9 "9. 45-49 years", modify
label def AGE 10 "10. 50-54 years", modify
label def AGE 11 "11. 55-64 years", modify
label values REGION REGION
label def REGION 1 "1. Northeast", modify
label def REGION 2 "2. Midwest", modify
label def REGION 3 "3. South", modify
label def REGION 4 "4. West", modify
label var STFIPS "Census state FIPS code"
label var ADMYR "Year of admission"
label var DAYWAIT "Days waiting to enter substance use treatment"
label var GENDER "Gender"
label var AGE "Age at admission"
label var REGION "Census region"
I realize that I may need to clarify, and thanks much in advance for help and guidance with respect to the non-linear categorical models.
William "Cam" Bigler