Today I ran into a strange situation: the number of observations shrinks when I expand the sample.
In particular, I count the non-missing observations of x1 and x2 for UNITEDS in my sample with:
count if x1 != . & inlist(GEOGN, "UNITEDS")
count if x2 != . & inlist(GEOGN, "UNITEDS")
The result is the same for both variables.
Then I run the regression of x1 on x2 for this country alone (UNITEDS):
Code:
. reghdfe x1 x2 if inlist(GEOGN, "UNITEDS"), a(TYPE2 INDC32#yr)
(dropped 1013 singleton observations)
note: x2 is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
(MWFE estimator converged in 14 iterations)
note: x2 omitted because of collinearity

HDFE Linear regression                            Number of obs   =     54,409
Absorbing 2 HDFE groups                           F(   0,  47843) =          .
                                                  Prob > F        =          .
                                                  R-squared       =     0.8063
                                                  Adj R-squared   =     0.7797
                                                  Within R-sq.    =     0.0000
                                                  Root MSE        =     0.3916

------------------------------------------------------------------------------
          x1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x2 |          0  (omitted)
       _cons |   1.307023   .0016788   778.54   0.000     1.303733    1.310314
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       TYPE2 |      6131           0        6131     |
   INDC32#yr |       450          15         435     |
-----------------------------------------------------+
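Next, I run the same regression on an expanded sample of 18 countries that includes UNITEDS, and the number of observations drops from 54,409 to 22,689:
Code: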
. reghdfe x1 x2 if inlist(GEOGN, "CHINA" "UNITEDS" "INDONESIA" "RUSSIAN" "MEXICO" "JAPAN" "PHILIPPINES" "VIETNAM" "SOUTHKOREA") | inlist(GEOGN,"COLOMBIA" "CANADA" "P
> ERU" "MALAYSIA" "AUSTRALIA" "CHILE" "ECUADOR" "SINGAPORE" "NEWZEALAND"), a(TYPE2 INDC32#yr)
(dropped 194 singleton observations)
(MWFE estimator converged in 14 iterations)

HDFE Linear regression                            Number of obs   =     22,689
Absorbing 2 HDFE groups                           F(   1,  18715) =       0.07
                                                  Prob > F        =     0.7857
                                                  R-squared       =     0.7423
                                                  Adj R-squared   =     0.6876
                                                  Within R-sq.    =     0.0000
                                                  Root MSE        =     0.2734

------------------------------------------------------------------------------
          x1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x2 |   .0160276    .058948     0.27   0.786    -.0995158    .1315709
       _cons |   .7591069   .0458817    16.54   0.000     .6691746    .8490393
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       TYPE2 |      3614           0        3614     |
   INDC32#yr |       374          15         359     |
-----------------------------------------------------+
Following Ken Chui's suggestion, I then tried another way to select the subsample of countries (https://www.statalist.org/forums/for...st2-in-my-code).
With this approach, the number of observations for the expanded sample is much larger:
Code:
gen include = 0
foreach ctry in CHINA UNITEDS INDONESIA RUSSIAN MEXICO JAPAN PHILIPPINES ///
    VIETNAM SOUTHKOREA COLOMBIA CANADA PERU MALAYSIA AUSTRALIA ///
    CHILE ECUADOR SINGAPORE NEWZEALAND {
    replace include = 1 if GEOGN == "`ctry'"
}
reghdfe x1 x2 if include == 1, a(TYPE2 INDC32#yr)
Code:
. reghdfe x1 x2 if include == 1, a(TYPE2 INDC32#yr)
(dropped 2165 singleton observations)
(MWFE estimator converged in 13 iterations)

HDFE Linear regression                            Number of obs   =    232,994
Absorbing 2 HDFE groups                           F(   1, 209389) =      88.97
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.8183
                                                  Adj R-squared   =     0.7978
                                                  Within R-sq.    =     0.0004
                                                  Root MSE        =     0.3176

------------------------------------------------------------------------------
          x1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x2 |   .0282004   .0029897     9.43   0.000     .0223407    .0340601
       _cons |   1.079796   .0023016   469.15   0.000     1.075285    1.084307
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       TYPE2 |     23169           0       23169     |
   INDC32#yr |       450          15         435     |
-----------------------------------------------------+
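One difference between the two approaches stands out: the inlist() calls in the 18-country regression list the country names without commas between them, whereas the documented syntax is inlist(z, a, b, ...), with at most 10 arguments in total when the arguments are strings. Assuming that is where the two samples diverge, a minimal check is to build a comma-separated flag and cross-tabulate it against the include flag from the loop. The variable name sel_commas is hypothetical, and the /// continuations assume the code is run from a do-file.

```stata
* Sketch only: sel_commas is a hypothetical name, not part of the original code.
* Each inlist() call stays within the 10-argument limit for strings
* (GEOGN plus nine country names).
gen byte sel_commas = ///
    inlist(GEOGN, "CHINA", "UNITEDS", "INDONESIA", "RUSSIAN", "MEXICO", ///
                  "JAPAN", "PHILIPPINES", "VIETNAM", "SOUTHKOREA") | ///
    inlist(GEOGN, "COLOMBIA", "CANADA", "PERU", "MALAYSIA", "AUSTRALIA", ///
                  "CHILE", "ECUADOR", "SINGAPORE", "NEWZEALAND")

* Compare with the loop-built flag; off-diagonal cells mark disagreements.
tabulate sel_commas include

* List which countries the comma-separated condition actually keeps.
tabulate GEOGN if sel_commas
```

If the two flags agree on every row, running reghdfe with if sel_commas should select the same sample as if include == 1, which would point to the missing commas as the source of the smaller sample.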