I have a dataset of about 3,800 observations in long form: about 760 individuals, each with five choice alternatives (760*5=3800), where the alternatives are hours of labour in a week. With a mixed logit model, I have obtained a predicted probability for each hours alternative for each individual (idcode). I want to check the accuracy of these predictions against the hours the individuals actually worked, and I want to do this in 3 steps, listed below.

My problem lies at step 2: when I generate the match dummy with the line starting with by idperson, Stata generates a missing value for all 3,800 observations. I have made dummies before and have tried to create this one in different ways as well, but each time the Stata output says: (3,800 missing values generated).

Any help is greatly appreciated,
Olivier


1. With the following code, I created a variable that lists the maximum probability across the 5 labour quantity alternatives for each idcode:

Code:
bysort idcode: egen max_prediction=max(pr)
where pr is the predicted probability generated by the mixed logit model.

2. Now I want to create a dummy "match" that equals 1 when the predicted probability for a row (pr) equals max_prediction. In other words, a dummy that equals 1 for the choice alternative with the highest probability across the 5 options.

Code:
by idperson: gen match = 1 if max_prediction==pr
replace match=0 if match==.
3. Count how often "match" equals 1 in the rows where the variable "choice" equals 1, choice being a dummy that is 1 when the individual's true hours worked is that choice alternative.
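
For reference, the three steps could be sketched as follows. This is a sketch under assumptions, not a confirmed fix: it assumes pr may be stored in a different precision (for example double) than a max_prediction created by egen without a storage type, which would make an exact == comparison fail. The names max_prediction2 and match2 are hypothetical, chosen only so that the existing variables are not overwritten.

```stata
* Sketch only: assumes pr, idcode and choice exist as described above.

* Check the storage types of the two variables being compared
describe pr max_prediction

* Step 1: store the per-person maximum in double precision, so that no
* rounding is introduced when the maximum of pr is saved
bysort idcode: egen double max_prediction2 = max(pr)

* Step 2: build the 0/1 dummy in one line; the logical expression
* evaluates to 1 or 0, and rows with missing pr stay missing
generate byte match2 = (pr == max_prediction2) if !missing(pr)

* Step 3: cross-tabulate the predicted alternative against the actual one
tabulate match2 choice
count if match2 == 1 & choice == 1
```

If two alternatives tie for the highest probability within an idcode, match2 would equal 1 for both rows, so the number of rows with match2 == 1 can exceed the number of individuals.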