Hello Statalisters,

I noticed some strange behaviour of dfbeta after regress. When I correct for outliers using a single categorical variable (with the i. prefix in the regression), the dfbetas for these outliers are sometimes very strange: I would expect them to be zero or missing, since each outlier is fitted exactly by its own indicator, but in some cases they are quite large (e.g. 31.8 in one of the examples below). This behaviour seems to be random, depending on the seed?
If I create separate dummies instead, the dfbetas behave as expected. See the code below for an example.
Does anyone have an idea of what's going wrong?

Code:
clear all
set obs 50
set seed 1
gen x=rnormal()
gen y = 1+1.2*x+rnormal()

replace y =3*y if _n==1 | _n==3 |_n==5 |_n==7 //create some extreme values --> outliers

gen outliers=0
replace outliers=_n if _n==1 | _n==3 |_n==5 |_n==7 //get outlier indicators
* Create separate dummies for outliers
gen D1=0
replace D1=1 in 1
gen D3=0
replace D3=1 in 3
gen D5=0
replace D5=1 in 5
gen D7=0
replace D7=1 in 7

reg y x i.outliers //regression with extreme-case dummies

dfbeta //weird??? dfbeta's 
/* 
for obs 1 : all zero
for obs 3: 4/5 dfbeta small but not zero, HOWEVER: _dfbeta_3 = 31.8 ??
for obs 5: all missing 
for obs 7: all missing
*/
reg y x D1 D3 D5 D7 // same regression with separate dummies
dfbeta // all dfbeta's for obs 1, 3, 5 and 7 missing (what I would expect)

***
* Same procedure with different seed

clear all
set obs 50
set seed 2
gen x=rnormal()
gen y = 1+1.2*x+rnormal()

replace y =3*y if _n==1 | _n==3 |_n==5 |_n==7 //create some extreme values --> outliers

gen outliers=0
replace outliers=_n if _n==1 | _n==3 |_n==5 |_n==7 //get outlier indicators
* Create separate dummies for outliers
gen D1=0
replace D1=1 in 1
gen D3=0
replace D3=1 in 3
gen D5=0
replace D5=1 in 5
gen D7=0
replace D7=1 in 7

reg y x i.outliers //regression with extreme-case dummies

dfbeta //weird??? dfbeta's 
/* Dfbeta's 
for obs 1 : one missing, 3 small but not zero, HOWEVER: _dfbeta_2 = 4 ??
for obs 3: all missing 
for obs 5: all missing 
for obs 7: 1 missing and 4 zeroes
*/
reg y x D1 D3 D5 D7 // same regression with separate dummies
dfbeta // all dfbeta's for obs 1, 3, 5 and 7 missing (what I would expect)
***
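
To look at the values summarised in the comments above, something like the following can be run right after each dfbeta call. This is only a minimal sketch: it assumes the generated variables follow the _dfbeta_# naming seen above, and it will also pick up any _dfbeta_ variables left over from an earlier dfbeta call in the same session.

Code:
* variable labels show which coefficient each DFBETA belongs to
describe _dfbeta_*
* show the DFBETAs for the four flagged observations only
list _dfbeta_* if inlist(_n, 1, 3, 5, 7)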

Thank you very much,
Mike