I have a question about how the PPML command checks for overfitting. I haven't found anything on this in Statalist or any other source, so I'm posting my question here.
Background:
I'm a PhD student and currently use a gravity model to analyze dairy trade flows at a disaggregated level (Harmonized System, 6-digit level). The bilateral trade flows of the products are organized in individual panel data sets of 29 countries over 16 years. Because the data are very disaggregated, the data sets contain many zero trade flows (more or fewer depending on the product). Because of these zero trade flows and the different levels of trade between country pairs, I chose the PPML command to estimate the gravity models. For some estimations I get the warning
Code:
"WARNING: The model appears to overfit some observations with `y'=0"
My problem/question:
My question is not about when and why overfitting occurs; it concerns how overfitting is detected within the PPML command. As I understand it, overfitting occurs when there is perfect collinearity for subsamples of the data with positive observations. In such a case estimates should not be available. If the algorithm nevertheless finds an estimate, it is spurious and the coefficients shouldn't/can't be interpreted.
Looking into the ado-file of the PPML command, I found the important overfit-checking part at the end of the code:
Code:
qui su `y' if (`y'>0)&(`touse')
local _lbp=r(min)
qui su `yh' if (`y'==0)&(`touse')
if (r(min)<1e-6*`_lbp') di as error "WARNING: The model appears to overfit some observations with `y'=0"
What I don't understand is why the smallest value of `yh' (which I take to be the fitted values) among observations with y==0, "r(min)", is compared to 1e-6 times "_lbp", which, reading the code, is the smallest positive observed value of y. Does this somehow reflect the perfect collinearity in the subsamples, or the "perfect fit" for observations with y==0? If yes, why does this comparison capture it?
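To make my reading of the check concrete, here is a minimal sketch that repeats the same steps on made-up numbers. The variable names y, yh and touse, the local names _minfit and flag, and all values are my own illustration; they are not taken from the ado-file or from my data, and yh is just typed in rather than produced by an actual PPML fit.

```stata
* Toy data, purely illustrative: y is the observed flow, yh stands in for fitted values
clear
input double(y yh)
0 1e-9
0 0.5
2 2.1
5 4.8
end
generate byte touse = 1

* Same steps as the ado-file excerpt
quietly summarize y if (y>0)&(touse)
local _lbp = r(min)
quietly summarize yh if (y==0)&(touse)
local _minfit = r(min)

* flag is 1 when the smallest yh among y==0 is below 1e-6 times the smallest positive y
local flag = (`_minfit' < 1e-6*`_lbp')
display "smallest positive y              = `_lbp'"
display "smallest yh where y==0           = `_minfit'"
display "threshold 1e-6*_lbp              = " 1e-6*`_lbp'
display "overfit flag                     = `flag'"
```

With these numbers the smallest yh among the zeros (1e-9) is below the threshold 1e-6*2 = 2e-6, so the flag is 1. If I read it correctly, the check therefore fires whenever some observation with y==0 has a value of yh that is essentially zero relative to the scale of the positive flows, but I don't see how that connects to the collinearity argument.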
I hope you can help me solve this mystery of mine!

Kind regards,
Marvin