Hello all. I am using Stata/MP 17. I have searched here and elsewhere and found some useful information, but I am still a bit stuck because of my data set.

I am working with a large data set (SAMHSA TEDS-A): ~22MM observations before I create the study samples, and the studies are run on approximately 5-19MM observations, depending on the study.

I am using a difference-in-differences (DiD) study design.

I have a categorical outcome variable for days waiting for an appointment, and all independent variables are categorical. The treatment is Medicaid expansion.

I am currently doing the following using an LPM estimator: I create short-wait and long-wait "binary" variables for this.

These are my variable transformations:

gen wait1 = .
replace wait1 = 1 if DAYWAIT==0 | DAYWAIT==1
replace wait1 = 2 if DAYWAIT==2 | DAYWAIT==3 | DAYWAIT==4
replace wait1 = . if DAYWAIT==.

gen wait2 = .
replace wait2 = 1 if DAYWAIT==0 | DAYWAIT== 1 | DAYWAIT==2
replace wait2 = 2 if DAYWAIT==3 | DAYWAIT==4
replace wait2 = . if DAYWAIT==.

gen hisplat = 0
replace hisplat = 1 if ETHNIC==1 | ETHNIC==2 | ETHNIC==3 | ETHNIC==5
replace hisplat = 0 if ETHNIC==4
replace hisplat = . if ETHNIC==.

gen race = 0
replace race = 1 if RACE==5
replace race = 2 if RACE==4
replace race = 3 if hisplat ==1
replace race = 4 if RACE==1 | RACE==2 | RACE==3 | RACE==6 | RACE==7 |RACE==8 | RACE==9
replace race = . if RACE==.

gen married = 0
replace married = 1 if MARSTAT==2
replace married = 0 if MARSTAT==1 | MARSTAT==3 | MARSTAT==4
replace married = . if MARSTAT==.

gen educ = 0
replace educ = 1 if EDUC==1 | EDUC==2
replace educ = 2 if EDUC==3
replace educ = 3 if EDUC==4 | EDUC==5
replace educ = . if EDUC==.

gen employed = 0
replace employed = 1 if EMPLOY==1 | EMPLOY==2
replace employed = 0 if EMPLOY==3| EMPLOY==4
replace employed = . if EMPLOY==.

gen roofovrhd = 0
replace roofovrhd = 0 if LIVARAG==1
replace roofovrhd = 1 if LIVARAG==2 | LIVARAG==3
replace roofovrhd = . if LIVARAG==.

gen primsub = 0
replace primsub = 1 if SUB1 ==1
replace primsub = 2 if SUB1 ==2
replace primsub = 3 if SUB1 ==3
replace primsub = 4 if SUB1 == 4
replace primsub = 5 if SUB1 ==5 | SUB1 ==6 | SUB1 ==7
replace primsub = 10 if SUB1 ==10
replace primsub = 27 if SUB1 ==8 | SUB1 ==9 | SUB1 ==11 | SUB1 ==12 | SUB1 ==13 | SUB1 ==14 | SUB1 ==15 | SUB1 ==16 | SUB1 ==17 | SUB1 ==18 | SUB1 ==19
replace primsub = . if SUB1 ==.
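
For reference, two of the recodes above could be written more compactly. This is only a sketch: the names wait2_alt and educ_alt are hypothetical, and it assumes missing values are stored as the system missing value (.) rather than as extended missing codes.

```stata
* Sketch only: compact equivalents of the wait2 and educ recodes above
* (wait2_alt and educ_alt are illustrative names, not part of my do-file)

* inlist() is false for a missing DAYWAIT, so wait2_alt stays missing there
gen byte wait2_alt = 1 if inlist(DAYWAIT, 0, 1, 2)
replace wait2_alt = 2 if inlist(DAYWAIT, 3, 4)

* rules apply left to right; any other nonmissing code becomes 0,
* and missing EDUC stays missing
recode EDUC (1 2 = 1) (3 = 2) (4 5 = 3) (nonmissing = 0), gen(educ_alt)

* confirm that the compact versions match the originals
assert wait2_alt == wait2
assert educ_alt == educ
```

The assert lines stop with an error if the two codings ever disagree, which makes them a cheap safeguard before collapsing.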


I am currently doing the following after I set up my sample for the LPM model (I cull the sample based on some missingness thresholds in the dependent variable, as well as on patterns of missing pre-post data by state (STFIPS)):

foreach var in GENDER AGE race educ employed wait1 wait2 DAYWAIT primsub roofovrhd SERVICES married REGION PSOURCE {
tab(`var'), gen(`var'fe_)
}

* constant 1 per admission, so that (sum) in collapse counts admissions
gen admissions = 1

collapse (mean) DAYWAITfe_* wait1fe_* wait2fe_* GENDERfe_* AGEfe_* racefe_* educfe_* employedfe_* SERVICESfe_* REGIONfe_* primsubfe_* PSOURCEfe_* (sum)admissions, by(ADMYR STFIPS)

This is how I specify treatment to evaluate the "main effect":
gen expand = 0
foreach state in 4 5 8 10 15 17 19 21 24 25 26 32 33 35 36 38 39 40 41 44 50 54 {
replace expand = 1 if STFIPS==`state' & ADMYR >=2014
}

foreach state in 6 9 11 27 34 53 {
replace expand = 1 if STFIPS==`state' & ADMYR>=2011
}

replace expand = 1 if STFIPS==2 & ADMYR>=2015
replace expand = 1 if STFIPS==42 & ADMYR>=2015
replace expand = 1 if STFIPS==18 & ADMYR>=2016
replace expand = 1 if STFIPS==30 & ADMYR>=2016
replace expand = 1 if STFIPS==22 & ADMYR>=2016

LPM estimator:
reg wait2fe_2 expand GENDERfe_* racefe_* AGEfe_* educfe_* employedfe_* SERVICESfe_* i.ADMYR i.STFIPS [aw=admissions], cluster(STFIPS)

And I get some output.

Below is where I am asking for help on non-linear categorical models - guidance on specification and coding:

I would like to use a fixed-effects ordered logit (so that all DAYWAIT categories serve as the dependent variable) or a probit model (to check the LPM against), and to report the associated treatment effect for each category. I have run some feologit models as well, and these seem to do okay.
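
To make concrete what I mean by a treatment effect for each category, here is a minimal sketch that I have not validated on the full sample. It assumes that an ordinary ologit with state and year indicators is an acceptable stand-in for the fixed-effects model, and that it is run on the admission-level data (before any collapse) with expand and the recoded covariates already created. As I understand it, in Stata 14 and later margins after ologit reports results for every outcome category by default.

```stata
* Sketch only: ordered logit on admission-level data (not collapsed)
* Assumes expand (0/1) and the recoded covariates already exist
ologit DAYWAIT i.expand i.GENDER i.race i.AGE i.ADMYR i.STFIPS, vce(cluster STFIPS)

* Average change in Pr(DAYWAIT = k), for each category k,
* when expand goes from 0 to 1
margins, dydx(expand)
```

With expand entered as i.expand, margins treats it as a discrete 0-to-1 change rather than as a derivative. On millions of observations the margins step may be slow.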

I am having a difficult time xtsetting this data set, even when I restrict it to a "balanced panel" (all states reporting in every year). I am not sure whether I need to collapse (sum) by state-year for these non-linear models. I am thinking I might need to, since I can at least xtset after I do this. I would xtset STFIPS (panel indicator) ADMYR (years).
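
My guess, and it is only an assumption on my part, is that the xtset problem arises because the admission-level data hold many records per state-year, while xtset with a time variable needs each panel-time pair to be unique. A minimal sketch of how I would check this:

```stata
* Sketch: check whether state-year pairs repeat in the admission-level data
duplicates report STFIPS ADMYR

* With repeats, declaring the time variable should fail
* with a "repeated time values within panel" error
capture noisily xtset STFIPS ADMYR

* Declaring only the panel variable does not require unique state-years
xtset STFIPS
```

If the last line works on the uncollapsed data, then the collapse may not be needed just to declare the panel.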

I am also unsure whether I need to use the "i." prefix for the categorical independent variables as well as for the expand variable. It does not seem to make a difference in the feologit models I have run: the coefficient is the same with i.expand as with expand.
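
A small sketch of the comparison I mean, shown with the LPM on the collapsed state-year data because that model is already set up. The scalar names b_plain and b_factor are hypothetical. For a 0/1 variable the two coefficients should agree, since i.expand simply creates one indicator for expand==1.

```stata
* Sketch: compare expand with i.expand in the LPM (collapsed state-year data)
quietly reg wait2fe_2 expand i.ADMYR i.STFIPS [aw=admissions], cluster(STFIPS)
scalar b_plain = _b[expand]

quietly reg wait2fe_2 i.expand i.ADMYR i.STFIPS [aw=admissions], cluster(STFIPS)
scalar b_factor = _b[1.expand]

* the two estimates should match up to rounding error
display "plain: " b_plain "   factor: " b_factor
assert reldif(b_plain, b_factor) < 1e-8
```

The prefix would still matter for multi-category codes such as race or AGE, which would otherwise enter the model as a single linear term.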

So I presume I would do everything the same as above for the LPM, except that I would do this:

gen wait1 = .
replace wait1 = 1 if DAYWAIT==0 | DAYWAIT==1
replace wait1 = 2 if DAYWAIT==2 | DAYWAIT==3 | DAYWAIT==4
replace wait1 = . if DAYWAIT==.

gen wait2 = .
replace wait2 = 1 if DAYWAIT==0 | DAYWAIT== 1 | DAYWAIT==2
replace wait2 = 2 if DAYWAIT==3 | DAYWAIT==4
replace wait2 = . if DAYWAIT==.

gen hisplat = 0
replace hisplat = 1 if ETHNIC==1 | ETHNIC==2 | ETHNIC==3 | ETHNIC==5
replace hisplat = 0 if ETHNIC==4
replace hisplat = . if ETHNIC==.

gen race = 0
replace race = 1 if RACE==5
replace race = 2 if RACE==4
replace race = 3 if hisplat ==1
replace race = 4 if RACE==1 | RACE==2 | RACE==3 | RACE==6 | RACE==7 |RACE==8 | RACE==9
replace race = . if RACE==.

gen married = 0
replace married = 1 if MARSTAT==2
replace married = 0 if MARSTAT==1 | MARSTAT==3 | MARSTAT==4
replace married = . if MARSTAT==.

gen educ = 0
replace educ = 1 if EDUC==1 | EDUC==2
replace educ = 2 if EDUC==3
replace educ = 3 if EDUC==4 | EDUC==5
replace educ = . if EDUC==.

gen employed = 0
replace employed = 1 if EMPLOY==1 | EMPLOY==2
replace employed = 0 if EMPLOY==3| EMPLOY==4
replace employed = . if EMPLOY==.

gen roofovrhd = 0
replace roofovrhd = 0 if LIVARAG==1
replace roofovrhd = 1 if LIVARAG==2 | LIVARAG==3
replace roofovrhd = . if LIVARAG==.

gen primsub = 0
replace primsub = 1 if SUB1 ==1
replace primsub = 2 if SUB1 ==2
replace primsub = 3 if SUB1 ==3
replace primsub = 4 if SUB1 == 4
replace primsub = 5 if SUB1 ==5 | SUB1 ==6 | SUB1 ==7
replace primsub = 10 if SUB1 ==10
replace primsub = 27 if SUB1 ==8 | SUB1 ==9 | SUB1 ==11 | SUB1 ==12 | SUB1 ==13 | SUB1 ==14 | SUB1 ==15 | SUB1 ==16 | SUB1 ==17 | SUB1 ==18 | SUB1 ==19
replace primsub = . if SUB1 ==.



collapse (sum) DAYWAIT wait1 wait2 GENDER AGE race educ employed SERVICES REGION primsub PSOURCE admissions, by(ADMYR STFIPS)

gen expand = 0
foreach state in 4 5 8 10 15 17 19 21 24 25 26 32 33 35 36 38 39 40 41 44 50 54 {
replace expand = 1 if STFIPS==`state' & ADMYR >=2014
}

foreach state in 6 9 11 27 34 53 {
replace expand = 1 if STFIPS==`state' & ADMYR>=2011
}

replace expand = 1 if STFIPS==2 & ADMYR>=2015
replace expand = 1 if STFIPS==42 & ADMYR>=2015
replace expand = 1 if STFIPS==18 & ADMYR>=2016
replace expand = 1 if STFIPS==30 & ADMYR>=2016
replace expand = 1 if STFIPS==22 & ADMYR>=2016

Here is where I am just experimenting at the moment to see if the models run. I am not sure whether I am specifying (or coding) them correctly, and therefore not sure whether my effect estimates are in line with what I should expect from a correct setup:

xtset STFIPS ADMYR   // this works after collapse-summing, but with gaps, and the panel is unbalanced

Panel variable: STFIPS (unbalanced)
Time variable: ADMYR, 2009 to 2020, but with gaps
Delta: 1 unit

feologit DAYWAIT expand i.ADMYR, cluster(STFIPS) or
ologit DAYWAIT expand SERVICES GENDER race AGE employed REGION, vce(cluster STFIPS)
ologit DAYWAIT expand SERVICES i.ADMYR, vce(cluster STFIPS)

Here is my data (200 lines), not collapse-summed:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte STFIPS int ADMYR byte DAYWAIT float wait2 byte GENDER float race byte(AGE REGION) float(admissions expand)
 1 2015 0 1 1 1  3 3 1 0
 2 2010 0 1 2 4  7 4 1 0
 2 2014 0 1 1 4  5 4 1 0
 5 2009 0 1 1 1  6 3 1 0
 5 2014 0 1 1 1  7 3 1 1
 5 2015 0 1 1 1  4 3 1 1
 5 2018 0 1 1 2 11 3 1 1
 5 2020 0 1 1 2  7 3 1 1
 6 2009 0 1 1 1  3 4 1 0
 6 2009 0 1 1 1  6 4 1 0
 6 2009 0 1 2 1  9 4 1 0
 6 2010 3 2 1 4  5 4 1 0
 6 2010 1 1 1 4  3 4 1 0
 6 2012 1 1 1 1  8 4 1 1
 6 2012 0 1 2 2 10 4 1 1
 6 2013 0 1 2 1  8 4 1 1
 6 2013 0 1 1 1  7 4 1 1
 6 2013 1 1 1 1  3 4 1 1
 6 2013 1 1 1 1  5 4 1 1
 6 2013 0 1 1 4  8 4 1 1
 6 2014 0 1 1 1  4 4 1 1
 6 2014 1 1 1 1  6 4 1 1
 6 2014 0 1 2 1  9 4 1 1
 6 2014 0 1 1 4  7 4 1 1
 6 2015 0 1 2 2  7 4 1 1
 6 2015 2 1 1 1  9 4 1 1
 6 2016 0 1 2 1  5 4 1 1
 6 2016 0 1 1 1  6 4 1 1
 6 2016 0 1 2 3  7 4 1 1
 6 2017 1 1 2 4  6 4 1 1
 6 2018 0 1 2 4  3 4 1 1
 6 2018 0 1 2 4  6 4 1 1
 6 2018 3 2 . 4  4 4 1 1
 6 2018 0 1 1 2 11 4 1 1
 6 2018 0 1 1 4  7 4 1 1
 6 2018 0 1 1 4 10 4 1 1
 6 2018 0 1 2 1  9 4 1 1
 6 2019 1 1 2 1  5 4 1 1
 6 2019 0 1 2 1  5 4 1 1
 6 2019 0 1 2 4  7 4 1 1
 6 2019 0 1 1 3  7 4 1 1
 6 2019 1 1 1 4  6 4 1 1
 6 2020 0 1 1 4  5 4 1 1
 6 2020 0 1 1 4  6 4 1 1
12 2009 0 1 1 1  5 3 1 0
12 2009 0 1 2 1  9 3 1 0
12 2010 0 1 2 1  6 3 1 0
12 2010 0 1 2 2  8 3 1 0
12 2011 0 1 2 4  8 3 1 0
12 2011 0 1 1 1  5 3 1 0
12 2012 0 1 1 1  6 3 1 0
12 2012 0 1 2 1  6 3 1 0
12 2013 0 1 1 1  7 3 1 0
12 2013 0 1 1 1 10 3 1 0
12 2014 0 1 1 1 10 3 1 0
12 2016 0 1 1 1  7 3 1 0
12 2016 3 2 1 2  4 3 1 0
12 2016 0 1 1 1  4 3 1 0
12 2017 0 1 2 4  5 3 1 0
12 2017 0 1 1 1  5 3 1 0
12 2017 0 1 1 1  6 3 1 0
12 2017 0 1 2 1  5 3 1 0
12 2017 0 1 1 4 11 3 1 0
16 2012 0 1 2 1  4 4 1 0
16 2018 0 1 2 1  4 4 1 0
17 2009 0 1 1 2  5 2 1 0
17 2009 0 1 1 2  6 2 1 0
17 2009 0 1 1 3  9 2 1 0
17 2009 1 1 1 3  6 2 1 0
17 2010 0 1 1 2  9 2 1 0
17 2011 3 2 1 2  3 2 1 0
17 2011 1 1 1 1  8 2 1 0
17 2011 1 1 1 2  7 2 1 0
17 2012 0 1 1 4  5 2 1 0
17 2013 3 2 1 1 11 2 1 0
17 2014 1 1 2 1 10 2 1 1
17 2014 2 1 1 2  4 2 1 1
17 2019 0 1 2 1  8 2 1 1
19 2009 3 2 2 1  6 2 1 0
19 2009 3 2 1 4  6 2 1 0
19 2010 3 2 2 1  4 2 1 0
19 2010 2 1 1 1  9 2 1 0
19 2012 0 1 1 1  4 2 1 0
19 2012 1 1 1 2  8 2 1 0
19 2014 0 1 2 1  6 2 1 1
19 2015 1 1 1 1  6 2 1 1
19 2019 1 1 2 1  6 2 1 1
19 2020 3 2 1 1  7 2 1 1
20 2012 0 1 2 1  6 2 1 0
20 2013 0 1 2 1 10 2 1 0
24 2009 0 1 2 2  8 3 1 0
24 2010 0 1 2 2  8 3 1 0
24 2012 1 1 1 1  4 3 1 0
24 2015 1 1 2 1  7 3 1 1
24 2015 0 1 1 4 11 3 1 1
24 2016 0 1 2 1  8 3 1 1
24 2016 0 1 1 .  5 3 1 1
24 2016 0 1 1 2 10 3 1 1
24 2016 0 1 1 1  9 3 1 1
24 2016 1 1 1 .  9 3 1 1
24 2017 0 1 1 2 10 3 1 1
24 2018 0 1 1 .  7 3 1 1
24 2018 0 1 2 1  6 3 1 1
24 2018 0 1 1 1  5 3 1 1
26 2009 0 1 1 2  5 2 1 0
26 2010 1 1 1 1  6 2 1 0
26 2012 0 1 1 1 11 2 1 0
26 2013 0 1 2 1  8 2 1 0
26 2015 0 1 1 2 10 2 1 1
26 2015 0 1 1 1  8 2 1 1
26 2016 2 1 1 1  6 2 1 1
26 2018 0 1 1 2  7 2 1 1
26 2019 1 1 1 1  6 2 1 1
26 2020 1 1 2 1  5 2 1 1
29 2012 2 1 1 1  9 2 1 0
29 2013 2 1 1 1  6 2 1 0
29 2017 1 1 1 1  6 2 1 0
29 2018 0 1 2 1  4 2 1 0
29 2019 4 2 2 1  5 2 1 0
29 2020 0 1 2 1  7 2 1 0
29 2020 2 1 1 1  3 2 1 0
30 2010 0 1 2 4  6 4 1 0
30 2015 0 1 1 1  3 4 1 0
32 2010 0 1 1 1  7 4 1 0
32 2018 2 1 1 3  9 4 1 1
33 2009 4 2 1 1  5 1 1 0
34 2009 0 1 1 3  9 1 1 0
34 2009 1 1 1 3  7 1 1 0
34 2010 0 1 1 1  5 1 1 0
34 2011 2 1 2 1  8 1 1 1
34 2011 1 1 2 1  4 1 1 1
34 2011 1 1 2 2  4 1 1 1
34 2011 2 1 1 1  6 1 1 1
34 2013 1 1 1 2  5 1 1 1
34 2014 3 2 2 2  4 1 1 1
34 2014 1 1 1 1  3 1 1 1
34 2015 0 1 1 2  7 1 1 1
34 2016 1 1 2 1  7 1 1 1
34 2016 1 1 1 3  5 1 1 1
34 2016 1 1 1 2 10 1 1 1
34 2017 1 1 1 1  8 1 1 1
34 2017 1 1 2 1  9 1 1 1
34 2017 1 1 1 2  7 1 1 1
34 2019 1 1 1 3  7 1 1 1
34 2019 1 1 2 1  9 1 1 1
34 2019 1 1 2 1  6 1 1 1
34 2020 0 1 2 1  6 1 1 1
34 2020 0 1 2 1  7 1 1 1
35 2014 . . 1 3  3 4 1 1
35 2018 0 1 1 .  3 4 1 1
35 2019 0 1 1 4 11 4 1 1
38 2016 3 2 1 4  7 2 1 1
38 2017 1 1 2 1  3 2 1 1
39 2009 0 1 1 2  6 2 1 0
39 2009 . . 1 .  7 2 1 0
39 2010 0 1 1 1  6 2 1 0
39 2010 4 2 1 2  4 2 1 0
39 2010 3 2 1 1  4 2 1 0
39 2011 0 1 1 2  7 2 1 0
39 2012 0 1 1 1  5 2 1 0
39 2014 1 1 1 3  5 2 1 1
39 2014 0 1 1 1  4 2 1 1
39 2015 0 1 1 1  6 2 1 1
39 2015 . . 1 1  7 2 1 1
39 2016 1 1 2 1  8 2 1 1
39 2016 0 1 1 1  9 2 1 1
39 2016 0 1 1 1  6 2 1 1
39 2018 1 1 2 1  7 2 1 1
39 2018 0 1 2 1  3 2 1 1
39 2019 0 1 2 1  9 2 1 1
39 2020 1 1 1 2 11 2 1 1
39 2020 0 1 1 .  6 2 1 1
39 2020 0 1 1 2  4 2 1 1
45 2010 3 2 1 1  7 3 1 0
45 2012 1 1 1 2  6 3 1 0
45 2013 1 1 2 1  5 3 1 0
45 2015 0 1 1 2  7 3 1 0
45 2017 0 1 2 2  7 3 1 0
45 2018 0 1 2 2  4 3 1 0
46 2015 0 1 2 4  6 2 1 0
46 2015 0 1 1 4  7 2 1 0
46 2019 0 1 1 1  4 2 1 0
47 2010 4 2 1 1  5 3 1 0
47 2011 0 1 1 1  4 3 1 0
47 2015 0 1 1 1  8 3 1 0
47 2015 0 1 1 1  6 3 1 0
47 2016 1 1 1 1 11 3 1 0
48 2009 0 1 2 1 10 3 1 0
48 2010 0 1 2 2  6 3 1 0
48 2011 0 1 1 2  9 3 1 0
48 2014 0 1 1 1  6 3 1 0
48 2015 0 1 1 3  4 3 1 0
48 2016 0 1 1 1  5 3 1 0
48 2019 0 1 1 1 11 3 1 0
49 2009 0 1 2 4  3 4 1 0
49 2011 0 1 1 1  9 4 1 0
49 2014 0 1 1 4 10 4 1 0
49 2016 0 1 1 1  6 4 1 0
49 2018 0 1 1 1 10 4 1 0
49 2020 0 1 1 4  6 4 1 0
end
label values STFIPS STFIPS
label def STFIPS 1 "1. Alabama", modify
label def STFIPS 2 "2. Alaska", modify
label def STFIPS 5 "5. Arkansas", modify
label def STFIPS 6 "6. California", modify
label def STFIPS 12 "12. Florida", modify
label def STFIPS 16 "16. Idaho", modify
label def STFIPS 17 "17. Illinois", modify
label def STFIPS 19 "19. Iowa", modify
label def STFIPS 20 "20. Kansas", modify
label def STFIPS 24 "24. Maryland", modify
label def STFIPS 26 "26. Michigan", modify
label def STFIPS 29 "29. Missouri", modify
label def STFIPS 30 "30. Montana", modify
label def STFIPS 32 "32. Nevada", modify
label def STFIPS 33 "33. New Hampshire", modify
label def STFIPS 34 "34. New Jersey", modify
label def STFIPS 35 "35. New Mexico", modify
label def STFIPS 38 "38. North Dakota", modify
label def STFIPS 39 "39. Ohio", modify
label def STFIPS 45 "45. South Carolina", modify
label def STFIPS 46 "46. South Dakota", modify
label def STFIPS 47 "47. Tennessee", modify
label def STFIPS 48 "48. Texas", modify
label def STFIPS 49 "49. Utah", modify
label values ADMYR ADMYR
label def ADMYR 2009 "2009. 2009", modify
label def ADMYR 2010 "2010. 2010", modify
label def ADMYR 2011 "2011. 2011", modify
label def ADMYR 2012 "2012. 2012", modify
label def ADMYR 2013 "2013. 2013", modify
label def ADMYR 2014 "2014. 2014", modify
label def ADMYR 2015 "2015. 2015", modify
label def ADMYR 2016 "2016. 2016", modify
label def ADMYR 2017 "2017. 2017", modify
label def ADMYR 2018 "2018. 2018", modify
label def ADMYR 2019 "2019. 2019", modify
label def ADMYR 2020 "2020. 2020", modify
label values DAYWAIT DAYWAIT
label def DAYWAIT 0 "0. 0", modify
label def DAYWAIT 1 "1. 1-7", modify
label def DAYWAIT 2 "2. 8-14", modify
label def DAYWAIT 3 "3. 15-30", modify
label def DAYWAIT 4 "4. 31 or more", modify
label values GENDER GENDER
label def GENDER 1 "1. Male", modify
label def GENDER 2 "2. Female", modify
label values AGE AGE
label def AGE 3 "3. 18-20 years", modify
label def AGE 4 "4. 21-24 years", modify
label def AGE 5 "5. 25-29 years", modify
label def AGE 6 "6. 30-34 years", modify
label def AGE 7 "7. 35-39 years", modify
label def AGE 8 "8. 40-44 years", modify
label def AGE 9 "9. 45-49 years", modify
label def AGE 10 "10. 50-54 years", modify
label def AGE 11 "11. 55-64 years", modify
label values REGION REGION
label def REGION 1 "1. Northeast", modify
label def REGION 2 "2. Midwest", modify
label def REGION 3 "3. South", modify
label def REGION 4 "4. West", modify
label var STFIPS "Census state FIPS code"
label var ADMYR "Year of admission"
label var DAYWAIT "Days waiting to enter substance use treatment"
label var GENDER "Gender"
label var AGE "Age at admission"
label var REGION "Census region"

I realize that I may need to clarify some of this. Thanks very much in advance for any help and guidance on the non-linear categorical models.

William "Cam" Bigler