I noticed some strange behaviour of dfbeta after regress. When I correct for outliers using a single categorical variable (with the i. prefix in the regression), the dfbetas for these outliers are sometimes very strange: I would expect them to be zero or missing, but in some cases they are quite large (e.g. 31.8 in one of the examples below). The behaviour looks random and seems to depend on the seed.
If I create separate dummies instead, the dfbetas behave as expected. See the code below for an example.
Does anyone have an idea of what's going wrong?
Code:
clear all
set obs 50
set seed 1
gen x = rnormal()
gen y = 1 + 1.2*x + rnormal()
replace y = 3*y if _n==1 | _n==3 | _n==5 | _n==7 // create some extreme values --> outliers
gen outliers = 0
replace outliers = _n if _n==1 | _n==3 | _n==5 | _n==7 // get outlier indicators

* Create separate dummies for outliers
gen D1 = 0
replace D1 = 1 in 1
gen D3 = 0
replace D3 = 1 in 3
gen D5 = 0
replace D5 = 1 in 5
gen D7 = 0
replace D7 = 1 in 7

reg y x i.outliers // regression with extreme-case dummies
dfbeta // weird??? dfbeta's
/*
for obs 1: all zero
for obs 3: 4/5 dfbeta small but not zero, HOWEVER: _dfbeta_3 = 31.8 ??
for obs 5: all missing
for obs 7: all missing
*/

reg y x D1 D3 D5 D7 // same regression with separate dummies
dfbeta // all dfbeta's for obs 1, 3, 5 and 7 missing (what I would expect)

***
* Same procedure with different seed
clear all
set obs 50
set seed 2
gen x = rnormal()
gen y = 1 + 1.2*x + rnormal()
replace y = 3*y if _n==1 | _n==3 | _n==5 | _n==7 // create some extreme values --> outliers
gen outliers = 0
replace outliers = _n if _n==1 | _n==3 | _n==5 | _n==7 // get outlier indicators

* Create separate dummies for outliers
gen D1 = 0
replace D1 = 1 in 1
gen D3 = 0
replace D3 = 1 in 3
gen D5 = 0
replace D5 = 1 in 5
gen D7 = 0
replace D7 = 1 in 7

reg y x i.outliers // regression with extreme-case dummies
dfbeta // weird??? dfbeta's
/*
Dfbeta's
for obs 1: one missing, 3 small but not zero, HOWEVER: _dfbeta_2 = 4 ??
for obs 3: all missing
for obs 5: all missing
for obs 7: 1 missing and 4 zeroes
*/

reg y x D1 D3 D5 D7 // same regression with separate dummies
dfbeta // all dfbeta's for obs 1, 3, 5 and 7 missing (what I would expect)
***
Thank you very much,
Mike