Hello,

My question is: if you write a program containing regressions that are run on different samples, does Stata bootstrap only on the smallest sample (or the intersection of the samples) used in the program? Here is my problem in detail:

I am interested in the "difference" generated in the program below:

Code:
* year_entry = year - years_teaching
gen young=(years_teaching<=2)

cap program drop diff
program diff, rclass
    // 1. The effect of the policy on resid0
    areg resid0 incentivized_instrument i.year i.years_teaching, ab(year_entry)
    local effect_resid0=_b[incentivized_instrument]

    // 2. The effect of the policy on other outcomes
    areg `1' incentivized_instrument i.year i.years_teaching, ab(year_entry)
    local effect_other=_b[incentivized_instrument]

    // 3. Cross-sectional relationships
    reg `1' VA if years_teaching<=2
    local corr=_b[VA]

    return scalar difference=`corr'*`effect_resid0'-`effect_other'
end

In this program, the sample for the third regression differs from the sample for the other two. The number of strata by young and my bootstrap call are shown below:

Code:
. egen iden=group(year young year_entry)
. egen num_strata=nvals(iden),by(young)
. tab num_strata young

           |         young
num_strata |         0          1 |     Total
-----------+----------------------+----------
        27 |         0     29,311 |    29,311
       331 |    86,749          0 |    86,749
-----------+----------------------+----------
     Total |    86,749     29,311 |   116,060



set seed 1231
bootstrap difference_fresid=r(difference), saving(bsdata,replace) strata(year year_entry young) reps(1000): diff fresid

Bootstrap results
Number of strata   =        24                  Number of obs     =     18,345
                                                Replications      =      1,000

      command:  diff2 fresid
difference_~d:  r(difference)

-----------------------------------------------------------------------------------
                  |   Observed   Bootstrap                         Normal-based
                  |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
difference_fresid |   .0307855          .        .       .            .           .
-----------------------------------------------------------------------------------

But bootstrap reports Number of strata = 24, which is slightly smaller than the 27 strata in the sample "young==1". So my guess is that when the regressions in my program use different samples, Stata's bootstrap resamples observations only from the smallest sample (or from the intersection of all the samples). That would also explain why I get the same difference in every bootstrap sample and therefore no standard error. Any idea why this happens and how I can get around it? I am not sure how Stata's bootstrap command works internally.
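
For reference, here is a minimal sketch of one way I could test this guess. It rests on an assumption I have not confirmed: that bootstrap first runs the command once on the full data and then resamples only the observations flagged by e(sample), which here would be set by the last regression in diff. nodrop is the documented bootstrap option that keeps observations outside e(sample); bsdata_nodrop is just a hypothetical file name, and the data and the diff program above are assumed to be in memory.

```stata
* Run the program once on the full data, then count the observations
* flagged by e(sample); the last regression in diff is the one that sets it.
diff fresid
count if e(sample)

* Rerun the bootstrap with nodrop so that observations outside e(sample)
* are not dropped before resampling (assumption: this is what shrinks the sample).
set seed 1231
bootstrap difference_fresid=r(difference), nodrop saving(bsdata_nodrop, replace) strata(year year_entry young) reps(1000): diff fresid
```

If the count matches the 18,345 observations reported above, that would support the guess, and the nodrop run should then report the full number of observations and strata.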

Thank you!