Greetings,

I'm running Stata 15.1 on macOS and working with survey data that has been merged with census data (matched on respondents' self-reported county and zip code of residence). I'm trying to determine whether the means of a continuous outcome across political groups (party3, a categorical variable) are significantly different from zero:

Code:
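. reg segindex i.party3 if white==1 [pweight=weight_pre], cluster(inputstate)
(sum of wgt is 39,638.3774779992)

Linear regression                               Number of obs     =     41,878
                                                F(2, 50)          =       9.09
                                                Prob > F          =     0.0004
                                                R-squared         =     0.0066
                                                Root MSE          =     .98795

                             (Std. Err. adjusted for 51 clusters in inputstate)
------------------------------------------------------------------------------
             |               Robust
    segindex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      party3 |
          2  |  -.1022216   .0369211    -2.77   0.008    -.1763797   -.0280635
          3  |  -.1774221   .0444548    -3.99   0.000    -.2667122    -.088132
             |
       _cons |   .0772096   .1093238     0.71   0.483    -.1423737    .2967929
------------------------------------------------------------------------------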


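. margins i.party3, post

Adjusted predictions                            Number of obs     =     41,878
Model VCE    : Robust

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      party3 |
          1  |   .0772096   .1093238     0.71   0.483    -.1423737    .2967929
          2  |   -.025012   .1101313    -0.23   0.821    -.2462172    .1961932
          3  |  -.1002126   .1150351    -0.87   0.388    -.3312674    .1308423
------------------------------------------------------------------------------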

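. margins, coeflegend

Adjusted predictions                            Number of obs     =     41,878
Model VCE    : Robust

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |     Margin  Legend
-------------+----------------------------------------------------------------
      party3 |
          1  |   .0772096  _b[1bn.party3]
          2  |   -.025012  _b[2.party3]
          3  |  -.1002126  _b[3.party3]
------------------------------------------------------------------------------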

. test _b[3.party3]=_b[1bn.party3]

 ( 1)  - 1bn.party3 + 3.party3 = 0

       F(  1,    50) =   15.93
            Prob > F =    0.0002

What confuses me is that the test reports a p-value below 0.001 (p = 0.0002), yet the 95% confidence intervals for the group means overlap considerably. I understand that two means whose confidence intervals overlap can still differ significantly at the p < 0.05 level, but at the p < 0.001 level? That just doesn't make sense to me.

Can anyone tell me what's going on here, or what I'm missing? Thanks in advance for your time.
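A quick arithmetic check on the numbers printed above may help locate the discrepancy. This is a sketch that uses only the rounded estimates shown in the output; it does not read Stata's stored e(V) matrix, so the covariance it reports is back-solved and approximate. It relies on two facts about this model: the margin for party3 == 1 is _cons, and the margin for party3 == 3 is _cons + the 3.party3 coefficient, so their difference is exactly that coefficient, whose standard error the regression table already reports.

```python
import math

# Estimates copied from the Stata output above (rounded as printed).
b3, se_b3 = -0.1774221, 0.0444548   # 3.party3 coefficient from -regress-
m1, se_m1 = 0.0772096, 0.1093238    # margin for party3 == 1 (equals _cons)
m3, se_m3 = -0.1002126, 0.1150351   # margin for party3 == 3 (_cons + b3)

# The difference in margins reproduces the regression coefficient on 3.party3.
diff = m3 - m1

# A Wald test of one linear restriction: F(1, df) is the squared t statistic.
f_stat = (diff / se_b3) ** 2

# Var(m3 - m1) = Var(m1) + Var(m3) - 2*Cov(m1, m3); solve for the covariance
# implied by the printed standard errors (approximate, because of rounding).
cov_implied = (se_m1**2 + se_m3**2 - se_b3**2) / 2
corr_implied = cov_implied / (se_m1 * se_m3)

# SE and t of the difference if the two margins were independent (Cov = 0),
# which is what comparing the two confidence intervals by eye implicitly assumes.
se_naive = math.sqrt(se_m1**2 + se_m3**2)
t_naive = diff / se_naive

print(f"difference of margins: {diff:.7f}")
print(f"F(1, 50):              {f_stat:.2f}")
print(f"implied corr(m1, m3):  {corr_implied:.3f}")
print(f"naive SE / naive t:    {se_naive:.4f} / {t_naive:.2f}")
```

Under these assumptions the squared t statistic for 3.party3 reproduces the F of 15.93, and the implied correlation between the two margins is strongly positive (roughly 0.9). That is consistent with both margins sharing the large clustered uncertainty in _cons, which cancels when they are differenced. In Stata itself, `matrix list e(V)` after `margins ..., post` should display that covariance directly rather than back-solving it.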