My question is: if you write a program containing regressions that are run on different samples, does Stata's bootstrap only resample from the smallest sample (or the intersection of the samples) used in the program? Here is my problem in detail:
I am interested in the "difference" generated in the program below:
Code:
gen year_entry=year-years_teaching
gen young=(years_teaching<=2)

cap program drop diff
program diff, rclass
    // 1. The effect of the policy on resid0
    areg resid0 incentivized_instrument i.year i.years_teaching, ab(year_entry)
    local effect_resid0=_b[incentivized_instrument]
    // 2. The effect of the policy on other outcomes
    areg `1' incentivized_instrument i.year i.years_teaching, ab(year_entry)
    local effect_other=_b[incentivized_instrument]
    // 3. Cross-sectional relationships
    reg `1' VA if years_teaching<=2
    local corr=_b[VA]
    return scalar difference=`corr'*`effect_resid0'-`effect_other'
end
Code:
. egen iden=group(year young year_entry)
. egen num_strata=nvals(iden),by(young)
. tab num_strata young

           |         young
num_strata |         0          1 |     Total
-----------+----------------------+----------
        27 |         0     29,311 |    29,311
       331 |    86,749          0 |    86,749
-----------+----------------------+----------
     Total |    86,749     29,311 |   116,060

set seed 1231
bootstrap difference_fresid=r(difference), saving(bsdata,replace) strata(year year_entry young) reps(1000): diff fresid

Bootstrap results                               Number of strata  =         24
                                                Number of obs     =     18,345
                                                Replications      =      1,000

      command:  diff2 fresid
difference_~d:  r(difference)

-----------------------------------------------------------------------------------
                  |   Observed   Bootstrap                         Normal-based
                  |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
difference_fresid |   .0307855           .        .       .            .           .
-----------------------------------------------------------------------------------
But bootstrap reports Number of strata = 24, which is slightly fewer than the 27 strata in the "young==1" sample. So my guess is that when the regressions in my program use different samples, Stata's bootstrap only resamples observations from the smallest sample (or from the intersection of all the samples). That would also explain why I get the same difference in every bootstrap sample and therefore no standard error. Any idea why this happens and how I can get around it? I am not sure how Stata's bootstrap command works internally.
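One thing that may be worth trying, as an untested sketch rather than a confirmed fix: according to the bootstrap documentation, observations outside e(sample) are dropped before resampling unless the nodrop option is given. Because the last estimation command inside diff is the reg restricted to years_teaching<=2, e(sample) after the program runs would cover only that subsample, which would be consistent with the 18,345 observations reported. The call below assumes the program diff and the variables above are already defined, and that keeping all observations is what the two areg steps need.

```stata
* Sketch only: ask bootstrap not to drop observations outside e(sample),
* so each replication resamples from the full data within strata.
* Assumes program diff and the variables from this post are in memory.
set seed 1231
bootstrap difference_fresid=r(difference), nodrop ///
    saving(bsdata, replace) strata(year year_entry young) reps(1000): diff fresid
```

If the reported number of observations then matches the full 116,060, the sample restriction was coming from e(sample) rather than from the strata definition.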
Thank you!