Dear forum,

I'm an economics PhD student experimenting with the new discrete choice features in Stata 16. We are learning about nested logit from Hansen's textbook, so this evening I was working with the CPS09 dataset: https://www.ssc.wisc.edu/~bhansen/ec...s/cps09mar.dta

I wanted to estimate a basic nested logit of marital status as a function of age for the women in the sample.

Code:
/* semicolons end each command below */
#delimit ;

/* bring in the cps09 data and create a copy */
use cps09mar;
frame copy default q1;

/* now we try to estimate the nested logit */
frame change q1;
keep if female == 1;
keep marital age;

/* reshape the data as necessary */
gen id = _n;
tabulate marital, generate(m);
reshape long m, i(id) j(chosen);
cmset id chosen;
cmtab, choice(m);

/* a baseline logit model with no nesting -> works fine */
cmclogit m, casevars(age);

/* set up two levels here */
nlogitgen middle = marital(together: 1 | 2 | 3, widowed: 4, separated: 5 | 6, single: 7);
nlogitgen top = marital(together: 1 | 2 | 3 | 4 | 5 | 6, single: 7);

/* we get a warning here; what does it mean?
   "variable marital has replicate levels for one or more cases; this is not allowed" */
nlogittree marital middle top, case(id);
 
tree structure specified for the nested logit model

 top        N        middle      N       marital   N  
-------------------------------------------------------
 together 120050 --- together  89516 --- 1       86730
                  |                   |- 2        1015
                  |                   +- 3        1771
                  |- widowed    3815 --- 4        3815
                  +- separated 26719 --- 5       22253
                                      +- 6        4466
 single    31164 --- single    31164 --- 7       31164
-------------------------------------------------------
                                         total  151214

N = number of observations at each level

Note: At least one case has replicated alternatives; nlogit will not allow this.
Note: At least one case has only one alternative; nlogit will drop these cases.

/* run the estimation */
nlogit m || top: || middle: || marital:, case(id);
 
note: branch 2 of level 1 is degenerate and the associated dissimilarity parameter  is
       not defined; see help nlogit for details

variable marital has replicate levels for one or more cases; this is not allowed
r(459);
Here is the data at the end of that sequence of commands:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id byte chosen double(age marital) byte m int(middle top)
1 1 41 1 1 1 1
1 2 41 1 0 1 1
1 3 41 1 0 1 1
1 4 41 1 0 1 1
1 5 41 1 0 1 1
1 6 41 1 0 1 1
1 7 41 1 0 1 1
2 1 66 5 0 3 1
2 2 66 5 0 3 1
2 3 66 5 0 3 1
2 4 66 5 0 3 1
2 5 66 5 1 3 1
2 6 66 5 0 3 1
2 7 66 5 0 3 1
3 1 49 1 1 1 1
3 2 49 1 0 1 1
3 3 49 1 0 1 1
3 4 49 1 0 1 1
3 5 49 1 0 1 1
3 6 49 1 0 1 1
3 7 49 1 0 1 1
4 1 52 1 1 1 1
4 2 52 1 0 1 1
4 3 52 1 0 1 1
4 4 52 1 0 1 1
4 5 52 1 0 1 1
4 6 52 1 0 1 1
4 7 52 1 0 1 1
end
label values middle lb_middle
label def lb_middle 1 "together", modify
label def lb_middle 3 "separated", modify
label values top lb_top
label def lb_top 1 "together", modify
I have looked at the examples in Microeconometrics Using Stata and in the new Choice Models reference manual for Stata 16, to no avail. I have no idea what "replicate levels for one or more cases" means.

Thanks for your help.

Stephen