My question is: if you write a program containing regressions that are run on different samples, does Stata bootstrap only on the smallest sample (or the intersection of the samples) used in the program? Here is my problem in detail:
I am interested in the "difference" generated in the program below:
Code:
gen year_entry=year-years_teaching
gen young=(years_teaching<=2)
cap program drop diff
program diff, rclass
    // 1. The effect of the policy on resid0
    areg resid0 incentivized_instrument i.year i.years_teaching, ab(year_entry)
    local effect_resid0=_b[incentivized_instrument]
    // 2. The effect of the policy on other outcomes
    areg `1' incentivized_instrument i.year i.years_teaching, ab(year_entry)
    local effect_other=_b[incentivized_instrument]
    // 3. Cross-sectional relationships
    reg `1' VA if years_teaching<=2
    local corr=_b[VA]
    return scalar difference=`corr'*`effect_resid0'-`effect_other'
end
Code:
. egen iden=group(year young year_entry)
. egen num_strata=nvals(iden),by(young)
. tab num_strata young
           |         young
num_strata |         0          1 |     Total
-----------+----------------------+----------
        27 |         0     29,311 |    29,311
       331 |    86,749          0 |    86,749
-----------+----------------------+----------
     Total |    86,749     29,311 |   116,060
set seed 1231
bootstrap difference_fresid=r(difference), saving(bsdata,replace) strata(year year_entry young) reps(1000): diff fresid
Bootstrap results
Number of strata   =        24                  Number of obs     =     18,345
                                                Replications      =      1,000
      command:  diff2 fresid
difference_~d:  r(difference)
-----------------------------------------------------------------------------------
                  |   Observed   Bootstrap                         Normal-based
                  |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
difference_fresid |   .0307855          .        .       .            .           .
-----------------------------------------------------------------------------------
But bootstrap reports that the number of strata is 24, which is slightly smaller than the 27 strata in the sample "young==1". So my guess is that when the regressions in my program use different samples, Stata's bootstrap only resamples observations from the smallest sample (or from the intersection of all the samples) in my program. That would also explain why I get the same difference in every bootstrap sample and therefore no standard error. Any idea why this happens and how I can get around it? I am not sure how Stata's bootstrap command works internally.
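A possible way to test this guess is sketched below. It rests on an assumption that is not confirmed here: that bootstrap restricts resampling to the e(sample) left behind by the last estimation command in the program (the reg on years_teaching<=2). The sketch first counts that sample, then reruns the same bootstrap with the nodrop option, which asks bootstrap not to drop observations outside e(sample) before resampling. The data and the diff program defined above are assumed to be in memory.

```stata
* Sketch only: assumes the data and the diff program above are already loaded.

* Run the program once. The last estimation command inside it is the
* reg restricted to years_teaching<=2, so e(sample) flags that sample.
diff fresid
count if e(sample)

* Rerun the bootstrap with nodrop, so that observations outside
* e(sample) are not dropped before the data are resampled.
set seed 1231
bootstrap difference_fresid=r(difference), strata(year year_entry young) ///
    reps(1000) nodrop saving(bsdata, replace): diff fresid
```

If the count matches the "Number of obs" shown in the bootstrap header, that would be consistent with the guess; if it does not, the cause probably lies elsewhere.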
Thank you!