I want to run an OLS regression on my dataset and I need advice on whether I should use areg, reghdfe, or xtreg.
Data: I'm using a subsample of this data. Each observation is one birth, with information on the baby, the mom, and the dad, plus the birth time, state, county, etc. There's no ID for the mother, so I can't tell whether the same moms appear in different years. I'm assuming they are different moms.
I'm asked to run the following regression:
Regression: Y_icmy = b0 + b1*EmploymentRate_cy + b2*X_i + u_ym + e_icmy
i: each birth
c: mom's residing county
m: birth month
y: birth year
Y_icmy is one of 4 different outcomes: the baby's birth weight indicator, delivery method, prenatal care visits, and pre-term birth.
EmploymentRate_cy is the employment rate of the county
X_i is a group of dummies (e.g., mom's education, race, ...)
u_ym is a set of year-by-month fixed effects for the conception time
Standard errors are clustered on the mom's county
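As a rough sketch of how the specification above might map onto a single Stata command: the model as written has year-by-month fixed effects and no county fixed effects, so the fixed effects would be absorbed and the county would enter only through the clustering. This assumes that `conceptmodate` identifies the conception year-month cell and that the user-written `reghdfe` package is installed; neither is confirmed by the data description.

```stata
* Sketch only, not a verified specification.
* Assumption: conceptmodate takes one value per conception year-month cell.
* Assumption: reghdfe is installed (ssc install reghdfe).
foreach yvar in birthwt pretermbir csec nprevis {
    reghdfe `yvar' emptopop i.dummage i.dmeduc i.dummrace i.dmar i.csex i.dumbirorder, ///
        absorb(conceptmodate) vce(cluster cntyrfip)
}
```

With a single absorbed variable, `areg ..., absorb(conceptmodate) vce(cluster cntyrfip)` estimates the same coefficients; `reghdfe` mainly matters when more than one set of fixed effects has to be absorbed.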
Stata Results: I tried xtreg, areg, and reghdfe and the code & results are as below:
xtset cntyrfip
foreach yvar in birthwt pretermbir csec nprevis {
    xtreg `yvar' emptopop i.dummage i.dmeduc i.dummrace i.dmar i.csex i.dumbirorder conceptmodate, fe cluster(cntyrfip)
}
       panel variable:  cntyrfip (unbalanced)

Fixed-effects (within) regression               Number of obs     =    415,258
Group variable: cntyrfip                        Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.0132                                         min =        536
     between = 0.3507                                         avg =   14,830.6
     overall = 0.0133                                         max =     97,957

                                                F(27,27)          =          .
corr(u_i, Xb)  = -0.1707                        Prob > F          =          .

                              (Std. Err. adjusted for 28 clusters in cntyrfip)
-------------------------------------------------------------------------------
              |               Robust
      birthwt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     emptopop |   .0188474   .0422724     0.45   0.659    -.0678885    .1055833
              |
      dummage |
           2  |  -.0234066   .0014085   -16.62   0.000    -.0262965   -.0205166
           3  |  -.0197053   .0022694    -8.68   0.000    -.0243617   -.0150488
           4  |  -.0005011   .0025157    -0.20   0.844    -.0056628    .0046606
              |
       dmeduc |
           1  |  -.0060045    .012484    -0.48   0.634    -.0316195    .0196105
           2  |   -.009585   .0083465    -1.15   0.261    -.0267106    .0075406
           3  |  -.0201655    .009807    -2.06   0.050    -.0402877   -.0000433
           4  |  -.0212308   .0099151    -2.14   0.041    -.0415749   -.0008867
           5  |  -.0136463   .0039725    -3.44   0.002    -.0217972   -.0054953
           6  |  -.0267736    .004405    -6.08   0.000    -.0358119   -.0177353
           7  |  -.0163685   .0086613    -1.89   0.070      -.03414     .001403
           8  |   -.016259   .0078926    -2.06   0.049    -.0324533   -.0000647
           9  |  -.0179093   .0059974    -2.99   0.006    -.0302151   -.0056036
          10  |  -.0159399    .006219    -2.56   0.016    -.0287002   -.0031796
          11  |  -.0152933   .0070564    -2.17   0.039    -.0297719   -.0008147
          12  |  -.0236959   .0065388    -3.62   0.001    -.0371124   -.0102793
          13  |  -.0281477   .0068223    -4.13   0.000    -.0421458   -.0141495
          14  |  -.0305678   .0065424    -4.67   0.000    -.0439918   -.0171439
          15  |  -.0319048   .0066587    -4.79   0.000    -.0455673   -.0182422
          16  |   -.036879   .0070394    -5.24   0.000    -.0513228   -.0224353
          17  |  -.0372261   .0077217    -4.82   0.000    -.0530698   -.0213824
              |
     dummrace |
           2  |   .0545832   .0013128    41.58   0.000     .0518895    .0572769
           3  |  -.0051098   .0015808    -3.23   0.003    -.0083533   -.0018662
           4  |    .011673   .0025248     4.62   0.000     .0064926    .0168535
              |
       2.dmar |   .0237415   .0017259    13.76   0.000     .0202003    .0272827
       2.csex |   .0097996   .0006279    15.61   0.000     .0085112     .011088
4.dumbirorder |  -.0609698   .0063748    -9.56   0.000    -.0740499   -.0478897
conceptmodate |   .0001519   .0000231     6.58   0.000     .0001045    .0001992
        _cons |    .024313   .0177508     1.37   0.182    -.0121087    .0607347
--------------+----------------------------------------------------------------
      sigma_u |  .00738704
      sigma_e |  .23620757
          rho |  .00097708   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

Fixed-effects (within) regression               Number of obs     =    415,258
Group variable: cntyrfip                        Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.0146                                         min =        536
     between = 0.0797                                         avg =   14,830.6
     overall = 0.0096                                         max =     97,957

                                                F(27,27)          =          .
corr(u_i, Xb)  = -0.6292                        Prob > F          =          .

                              (Std. Err. adjusted for 28 clusters in cntyrfip)
-------------------------------------------------------------------------------
              |               Robust
   pretermbir |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     emptopop |   .2171136   .0674441     3.22   0.003     .0787297    .3554975
              |
      dummage |
           2  |   -.043691   .0026736   -16.34   0.000    -.0491767   -.0382052
           3  |  -.0487423   .0033546   -14.53   0.000    -.0556255   -.0418592
           4  |  -.0188399   .0034741    -5.42   0.000    -.0259681   -.0117117
              |
       dmeduc |
           1  |    .013965   .0137682     1.01   0.319    -.0142851     .042215
           2  |    .009339   .0068673     1.36   0.185    -.0047516    .0234296
           3  |  -.0030966   .0064848    -0.48   0.637    -.0164022     .010209
           4  |  -.0130372   .0096425    -1.35   0.188     -.032822    .0067475
           5  |  -.0129679   .0050734    -2.56   0.017    -.0233778   -.0025581
           6  |  -.0282894   .0070967    -3.99   0.000    -.0428506   -.0137283
           7  |  -.0197083    .008737    -2.26   0.032     -.037635   -.0017815
           8  |   -.014648   .0061302    -2.39   0.024    -.0272261     -.00207
           9  |  -.0230106   .0072043    -3.19   0.004    -.0377926   -.0082286
          10  |   -.019993   .0058661    -3.41   0.002    -.0320293   -.0079568
          11  |  -.0193685   .0073412    -2.64   0.014    -.0344315   -.0043055
          12  |  -.0248285   .0069535    -3.57   0.001    -.0390959   -.0105612
          13  |  -.0286812   .0071548    -4.01   0.000    -.0433618   -.0140007
          14  |  -.0267416   .0076261    -3.51   0.002    -.0423891    -.011094
          15  |  -.0294642   .0061119    -4.82   0.000    -.0420048   -.0169236
          16  |  -.0382608    .007095    -5.39   0.000    -.0528185   -.0237031
          17  |  -.0387698   .0078232    -4.96   0.000    -.0548217   -.0227179
              |
     dummrace |
           2  |   .0682115   .0034626    19.70   0.000     .0611069     .075316
           3  |   .0084065   .0036133     2.33   0.028     .0009926    .0158204
           4  |   .0143703   .0030856     4.66   0.000     .0080393    .0207014
              |
       2.dmar |   .0324672     .00117    27.75   0.000     .0300665    .0348679
       2.csex |  -.0087205   .0008703   -10.02   0.000    -.0105062   -.0069348
4.dumbirorder |  -.1525281   .0091189   -16.73   0.000    -.1712385   -.1338176
conceptmodate |   .0002491   .0000394     6.32   0.000     .0001682      .00033
        _cons |    -.01957   .0248677    -0.79   0.438    -.0705943    .0314542
--------------+----------------------------------------------------------------
      sigma_u |  .02551021
      sigma_e |  .30501554
          rho |  .00694635   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

Fixed-effects (within) regression               Number of obs     =    415,258
Group variable: cntyrfip                        Number of groups  =         28

R-sq:                                           Obs per group:
     within  = .                                              min =        536
     between = .                                              avg =   14,830.6
     overall = .                                              max =     97,957

                                                F(0,27)           =          .
corr(u_i, Xb)  = .                              Prob > F          =          .

                              (Std. Err. adjusted for 28 clusters in cntyrfip)
-------------------------------------------------------------------------------
              |               Robust
         csec |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     emptopop |          0  (omitted)
              |
      dummage |
           2  |          0  (omitted)
           3  |          0  (omitted)
           4  |          0  (omitted)
              |
       dmeduc |
           1  |          0  (omitted)
           2  |          0  (omitted)
           3  |          0  (omitted)
           4  |          0  (omitted)
           5  |          0  (omitted)
           6  |          0  (omitted)
           7  |          0  (omitted)
           8  |          0  (omitted)
           9  |          0  (omitted)
          10  |          0  (omitted)
          11  |          0  (omitted)
          12  |          0  (omitted)
          13  |          0  (omitted)
          14  |          0  (omitted)
          15  |          0  (omitted)
          16  |          0  (omitted)
          17  |          0  (omitted)
              |
     dummrace |
           2  |          0  (omitted)
           3  |          0  (omitted)
           4  |          0  (omitted)
              |
       2.dmar |          0  (omitted)
       2.csex |          0  (omitted)
4.dumbirorder |          0  (omitted)
conceptmodate |          0  (omitted)
        _cons |          0  (omitted)
--------------+----------------------------------------------------------------
      sigma_u |          0
      sigma_e |          0
          rho |          .   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

Fixed-effects (within) regression               Number of obs     =    415,258
Group variable: cntyrfip                        Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.1525                                         min =        536
     between = 0.5075                                         avg =   14,830.6
     overall = 0.1613                                         max =     97,957

                                                F(27,27)          =          .
corr(u_i, Xb)  = 0.0055                         Prob > F          =          .

                              (Std. Err. adjusted for 28 clusters in cntyrfip)
-------------------------------------------------------------------------------
              |               Robust
      nprevis |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     emptopop |  -1.909821   3.595874    -0.53   0.600    -9.287944    5.468302
              |
      dummage |
           2  |    .819195   .0677186    12.10   0.000     .6802479    .9581422
           3  |   1.592506   .0703909    22.62   0.000     1.448076    1.736936
           4  |   1.838432   .0741365    24.80   0.000     1.686316    1.990547
              |
       dmeduc |
           1  |   .9760593   .3967149     2.46   0.021     .1620674    1.790051
           2  |   .7211828   .1885137     3.83   0.001     .3343847    1.107981
           3  |   .8592792   .4071324     2.11   0.044     .0239126    1.694646
           4  |   .9825091   .3128487     3.14   0.004     .3405965    1.624422
           5  |   .9184139   .3094473     2.97   0.006     .2834804    1.553347
           6  |   1.243555    .459368     2.71   0.012     .3010099    2.186101
           7  |   1.986289   .5034672     3.95   0.001     .9532593    3.019318
           8  |   2.233033   .5824491     3.83   0.001     1.037947     3.42812
           9  |   2.199378   .5549521     3.96   0.000      1.06071    3.338045
          10  |   2.760089   .5764386     4.79   0.000     1.577334    3.942843
          11  |   2.874429   .6213346     4.63   0.000     1.599556    4.149303
          12  |   3.712736   .6371179     5.83   0.000     2.405478    5.019994
          13  |   4.105504   .6737145     6.09   0.000     2.723156    5.487852
          14  |   4.233067   .6717071     6.30   0.000     2.854838    5.611296
          15  |   4.240404   .6982558     6.07   0.000     2.807701    5.673106
          16  |   4.405252    .696915     6.32   0.000       2.9753    5.835203
          17  |   4.456904   .7533609     5.92   0.000     2.911135    6.002673
              |
     dummrace |
           2  |  -.7893785   .0684183   -11.54   0.000    -.9297612   -.6489959
           3  |  -1.095532   .1876042    -5.84   0.000    -1.480464      -.7106
           4  |  -1.006129   .0555952   -18.10   0.000    -1.120201   -.8920574
              |
       2.dmar |  -1.592861   .0685017   -23.25   0.000    -1.733415   -1.452307
       2.csex |   .0508665   .0097061     5.24   0.000     .0309511    .0707818
4.dumbirorder |  -1.412481   .4557176    -3.10   0.004    -2.347536   -.4774254
conceptmodate |   .0170811   .0034858     4.90   0.000     .0099288    .0242334
        _cons |   1.180458   1.121609     1.05   0.302    -1.120894     3.48181
--------------+----------------------------------------------------------------
      sigma_u |  .73037114
      sigma_e |  4.1227098
          rho |  .03042992   (fraction of variance due to u_i)
-------------------------------------------------------------------------------
foreach yvar in birthwt pretermbir csec nprevis {
    areg `yvar' emptopop i.dummage i.dmeduc i.dummrace i.dmar i.csex i.dumbirorder, absorb(conceptmodate) cluster(cntyrfip)
}
Linear regression, absorbing indicators         Number of obs     =    415,258
                                                F(  27,     27)   =    1328.66
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2888
                                                Adj R-squared     =     0.2853
                                                Root MSE          =     0.2011

                              (Std. Err. adjusted for 28 clusters in cntyrfip)
-------------------------------------------------------------------------------
              |               Robust
      birthwt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     emptopop |   .0008996   .0070898     0.13   0.900    -.0136474    .0154466
              |
      dummage |
           2  |  -.0047123   .0012823    -3.67   0.001    -.0073435   -.0020812
           3  |   .0000784   .0015011     0.05   0.959    -.0030015    .0031583
           4  |   .0076676   .0019832     3.87   0.001     .0035984    .0117368
              |
       dmeduc |
           1  |  -.0038257    .010759    -0.36   0.725    -.0259014      .01825
           2  |  -.0139574   .0073547    -1.90   0.068     -.029048    .0011331
           3  |  -.0135079    .006974    -1.94   0.063    -.0278174    .0008015
           4  |  -.0108141   .0083696    -1.29   0.207     -.027987    .0063588
           5  |  -.0042134   .0051607    -0.82   0.421    -.0148022    .0063754
           6  |  -.0105367   .0046365    -2.27   0.031      -.02005   -.0010234
           7  |  -.0050185   .0071526    -0.70   0.489    -.0196944    .0096575
           8  |  -.0067927   .0069436    -0.98   0.337    -.0210397    .0074544
           9  |  -.0041783   .0052189    -0.80   0.430    -.0148867    .0065301
          10  |  -.0031763   .0052651    -0.60   0.551    -.0139794    .0076267
          11  |  -.0041207   .0052407    -0.79   0.439    -.0148738    .0066323
          12  |  -.0102089   .0056143    -1.82   0.080    -.0217285    .0013108
          13  |  -.0136341   .0058264    -2.34   0.027    -.0255888   -.0016794
          14  |  -.0156871   .0052049    -3.01   0.006    -.0263667   -.0050076
          15  |  -.0164624   .0066196    -2.49   0.019    -.0300447   -.0028802
          16  |  -.0187018   .0059169    -3.16   0.004    -.0308422   -.0065615
          17  |  -.0186256   .0060682    -3.07   0.005    -.0310765   -.0061747
              |
     dummrace |
           2  |   .0223933   .0011831    18.93   0.000     .0199659    .0248207
           3  |  -.0082914   .0009647    -8.59   0.000    -.0102708    -.006312
           4  |   .0055149   .0019515     2.83   0.009     .0015107    .0095191
              |
       2.dmar |   .0104538    .001398     7.48   0.000     .0075853    .0133223
       2.csex |   .0135626   .0006175    21.96   0.000     .0122956    .0148296
4.dumbirorder |  -.0188707   .0078462    -2.41   0.023    -.0349698   -.0027716
        _cons |   .0634819   .0058863    10.78   0.000     .0514042    .0755596
--------------+----------------------------------------------------------------
conceptmodate |   absorbed                                  (1985 categories)

Linear regression, absorbing indicators         Number of obs     =    415,258
                                                F(   0,     27)   =          .
                                                Prob > F          =          .
                                                R-squared         =     1.0000
                                                Adj R-squared     =     1.0000
                                                Root MSE          =     0.0000

                              (Std. Err. adjusted for 28 clusters in cntyrfip)
-------------------------------------------------------------------------------
              |               Robust
   pretermbir |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     emptopop |          0  (omitted)
              |
      dummage |
           2  |          0  (omitted)
           3  |          0  (omitted)
           4  |          0  (omitted)
              |
       dmeduc |
           1  |          0  (omitted)
           2  |          0  (omitted)
           3  |          0  (omitted)
           4  |          0  (omitted)
           5  |          0  (omitted)
           6  |          0  (omitted)
           7  |          0  (omitted)
           8  |          0  (omitted)
           9  |          0  (omitted)
          10  |          0  (omitted)
          11  |          0  (omitted)
          12  |          0  (omitted)
          13  |          0  (omitted)
          14  |          0  (omitted)
          15  |          0  (omitted)
          16  |          0  (omitted)
          17  |          0  (omitted)
              |
     dummrace |
           2  |          0  (omitted)
           3  |          0  (omitted)
           4  |          0  (omitted)
              |
       2.dmar |          0  (omitted)
       2.csex |          0  (omitted)
4.dumbirorder |          0  (omitted)
        _cons |   .1056115          .        .       .            .           .
--------------+----------------------------------------------------------------
conceptmodate |   absorbed                                  (1985 categories)

Linear regression, absorbing indicators         Number of obs     =    415,258
                                                F(   0,     27)   =          .
                                                Prob > F          =          .
                                                Root MSE          =     0.0000

                              (Std. Err. adjusted for 28 clusters in cntyrfip)
-------------------------------------------------------------------------------
              |               Robust
         csec |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     emptopop |          0  (omitted)
              |
      dummage |
           2  |          0  (omitted)
           3  |          0  (omitted)
           4  |          0  (omitted)
              |
       dmeduc |
           1  |          0  (omitted)
           2  |          0  (omitted)
           3  |          0  (omitted)
           4  |          0  (omitted)
           5  |          0  (omitted)
           6  |          0  (omitted)
           7  |          0  (omitted)
           8  |          0  (omitted)
           9  |          0  (omitted)
          10  |          0  (omitted)
          11  |          0  (omitted)
          12  |          0  (omitted)
          13  |          0  (omitted)
          14  |          0  (omitted)
          15  |          0  (omitted)
          16  |          0  (omitted)
          17  |          0  (omitted)
              |
     dummrace |
           2  |          0  (omitted)
           3  |          0  (omitted)
           4  |          0  (omitted)
              |
       2.dmar |          0  (omitted)
       2.csex |          0  (omitted)
4.dumbirorder |          0  (omitted)
        _cons |          0  (omitted)
--------------+----------------------------------------------------------------
conceptmodate |   absorbed                                  (1985 categories)

Linear regression, absorbing indicators         Number of obs     =    415,258
                                                F(  27,     27)   = 7839401.62
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1825
                                                Adj R-squared     =     0.1785
                                                Root MSE          =     4.1466

                              (Std. Err. adjusted for 28 clusters in cntyrfip)
-------------------------------------------------------------------------------
              |               Robust
      nprevis |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     emptopop |   .3913594   .8524499     0.46   0.650    -1.357723    2.140442
              |
      dummage |
           2  |   .6713209   .0607689    11.05   0.000     .5466333    .7960085
           3  |   1.440154   .0822093    17.52   0.000     1.271474    1.608833
           4  |    1.77644   .0759003    23.40   0.000     1.620705    1.932174
              |
       dmeduc |
           1  |   .9436234   .3958303     2.38   0.024     .1314466      1.7558
           2  |   .7093487   .2250707     3.15   0.004     .2475418    1.171156
           3  |   .7419543   .4287262     1.73   0.095    -.1377193    1.621628
           4  |   .8791356   .3429686     2.56   0.016     .1754221    1.582849
           5  |   .7503109   .3432982     2.19   0.038     .0459211    1.454701
           6  |   .9775896   .5125862     1.91   0.067    -.0741503     2.02933
           7  |   1.878452   .5276921     3.56   0.001     .7957174    2.961187
           8  |   2.154712   .6022926     3.58   0.001       .91891    3.390515
           9  |   2.084571   .5881248     3.54   0.001     .8778384    3.291303
          10  |   2.711758    .595407     4.55   0.000     1.490084    3.933432
          11  |   2.848627      .6411     4.44   0.000     1.533199    4.164056
          12  |   3.694534   .6486478     5.70   0.000     2.363619    5.025449
          13  |   4.072212   .6832652     5.96   0.000     2.670267    5.474156
          14  |   4.190809   .6825376     6.14   0.000     2.790358    5.591261
          15  |   4.169677   .7214367     5.78   0.000     2.689411    5.649943
          16  |   4.374855   .6988821     6.26   0.000     2.940867    5.808842
          17  |   4.464445   .7446155     6.00   0.000      2.93662     5.99227
              |
     dummrace |
           2  |  -.7008378   .0501316   -13.98   0.000    -.8036994   -.5979761
           3  |  -1.135264   .3336207    -3.40   0.002    -1.819797   -.4507306
           4  |  -.9966281   .0519124   -19.20   0.000    -1.103144   -.8901126
              |
       2.dmar |  -1.558454   .0886083   -17.59   0.000    -1.740264   -1.376645
       2.csex |   .0359697   .0105536     3.41   0.002     .0143155    .0576239
4.dumbirorder |  -1.361785   .4532831    -3.00   0.006    -2.291845   -.4317251
        _cons |   6.885657   .6308294    10.92   0.000     5.591302    8.180012
--------------+----------------------------------------------------------------
conceptmodate |   absorbed                                  (1985 categories)
Now, my questions are:
1. Given my data and model, should I choose xtreg or areg? If I choose xtreg, how should I set up my panel variable and time variable in xtset?
2. Neither xtreg nor areg generates ideal results: how do I deal with the omitted variables and insignificant coefficients? I understand this is common, but if it is caused by mishandling the data or misusing the commands in Stata, what could the possible reasons be and how should I debug? I can't change the model.
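For the debugging part, a few checks that might help narrow down why every coefficient is omitted for some outcomes. This is only a sketch: it assumes the variables are numeric, and `sd_preterm` and `sd_csec` are hypothetical helper names introduced here for illustration.

```stata
* Sketch only: basic diagnostics for the "(omitted)" results.

* Storage type and overall variation of each outcome.
describe birthwt pretermbir csec nprevis
summarize birthwt pretermbir csec nprevis
tabulate csec, missing

* Number of distinct values in the absorbed variable.
codebook conceptmodate, compact

* Within-cell variation of the outcome. If the standard deviation is 0
* (or missing) in every cell, the absorbed indicators explain the outcome
* completely and nothing is left to identify the other coefficients.
bysort conceptmodate: egen sd_preterm = sd(pretermbir)
bysort cntyrfip: egen sd_csec = sd(csec)
summarize sd_preterm sd_csec
```

If `summarize` reports a maximum of 0 for one of these helper variables, the problem lies in how the outcome or the grouping variable was constructed rather than in the choice between xtreg and areg.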
Thanks, and any input is greatly appreciated!