Dear all,

I am examining the effect of maternal education on low birth weight using a fuzzy regression discontinuity design. Specifically, I exploit exposure to the education reform as an IV for maternal education. The IV estimates show that an additional year of maternal education reduces the likelihood of low birth weight by 69%, which is large compared with previous studies. It has been argued that IV estimates of the impact of maternal education on child health indicators with low mean values (e.g., low birth weight and child mortality) may be biased unless the sample is sufficiently large. So, my questions are:

1) How can I justify that the sample is sufficiently large in this case? My analysis sample (repeated cross-sectional data) consists of 8,000 observations (control: 5,500 observations and treatment: 3,500 observations). The sample mean of the outcome is 0.03 (240 cases of low birth weight).
2) Assuming that the analysis sample described above could raise a concern about biased IV estimates, are there any methods I can use to test this concern? I am thinking of simulation but, to be honest, I have never used it before. I would greatly appreciate it if anyone could give me a start by providing example Stata code for a simulation based on my issue.
Data: lbw is the outcome (low birth weight), yrschool is years of education, T is the treatment indicator, run is the running variable measured in years of birth, T_run is the interaction between T and run, and location and reg6 indicate where respondents reside (e.g., rural vs. urban).
Code:
[CODE]
clear
input float(lbw T run T_run reg6 yrschool location)
0 1  23 23 0 13 1
0 0 -12  0 0 12 1
0 1  11 11 0 13 1
0 1  43 43 0 13 1
0 1  32 32 0 13 1
0 1  49 49 0 13 1
0 1  56 56 0 13 1
0 0 -62  0 0 13 1
0 0 -24  0 0 13 1
0 1  63 63 0 11 1
0 0 -37  0 0 12 1
0 1   8  8 0 13 1
0 1  37 37 0 13 1
0 1  15 15 0 13 1
0 1  16 16 0 13 1
0 1  26 26 0 13 1
0 1   4  4 0 13 1
0 1  26 26 0 13 1
0 1  41 41 0 13 1
1 0 -46  0 0 13 1
0 1  18 18 0 13 1
0 1   9  9 0 13 1
0 1  37 37 0 13 1
0 1  54 54 0 13 1
0 0 -13  0 0 13 1
0 1  62 62 0  9 1
0 0 -26  0 0 12 1
0 1  59 59 0 13 1
0 0 -18  0 0  5 1
0 1   7  7 0 13 1
0 1  51 51 0 13 1
0 1  53 53 0 13 1
0 0 -33  0 0 10 1
0 1  49 49 0 13 1
0 1  24 24 0 13 1
0 0 -25  0 0 12 1
0 1  53 53 0 13 1
0 0 -24  0 0 13 1
0 0  -9  0 0 12 1
0 1  13 13 0 13 1
0 1   6  6 0 13 1
0 1  51 51 0 13 1
0 1   8  8 0 13 1
0 1   8  8 0 13 1
0 0 -28  0 0 13 1
0 1  69 69 0  9 1
0 1  15 15 0 13 1
0 0 -29  0 0 12 1
0 1  58 58 0 13 1
0 0  -1  0 0 12 1
0 1  17 17 0 12 1
0 1  12 12 0 13 1
0 1   1  1 0 13 1
0 0 -25  0 0 13 1
0 0 -37  0 0 13 1
0 1  71 71 0  9 1
0 1  24 24 0 13 1
0 0 -10  0 0 10 1
0 0 -37  0 0 12 1
0 1  64 64 0 10 1
0 1  18 18 0 13 1
0 0 -35  0 0 12 1
0 1  51 51 0 13 1
0 1   8  8 0 13 1
0 1  20 20 0 13 1
0 0 -49  0 0  9 1
0 1  64 64 0 10 1
0 0  -4  0 0 13 1
0 1  59 59 0 13 1
0 1  56 56 0 13 1
0 0 -66  0 0 13 1
0 1  20 20 0 13 1
0 0 -44  0 0 12 1
0 0 -47  0 0 12 1
0 1  50 50 0 13 1
0 1  18 18 0 13 1
0 1  48 48 0 13 1
0 0 -12  0 0 13 1
0 0 -50  0 0 13 1
0 1   6  6 0 13 1
0 1  15 15 0 13 1
0 1  18 18 0 13 1
0 1  43 43 0 13 1
0 1  41 41 0 13 1
0 0 -38  0 0 12 1
0 1  63 63 0 11 1
0 0 -40  0 0 12 1
0 0 -20  0 0 13 1
0 1  28 28 0 13 1
0 0 -48  0 0  9 1
0 1  39 39 0 13 1
0 0 -55  0 0 13 1
0 1  37 37 0 13 1
0 1  60 60 0 11 1
0 0 -37  0 0 13 1
0 1  12 12 0 13 1
0 1  31 31 0 13 1
0 1   2  2 0 13 1
0 0 -27  0 0 12 1
0 1  33 33 0 12 1
end
[/CODE]

Command:
ivregress 2sls lbw (yrschool = T) run T_run i.location i.reg6, vce(cluster run)
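
To make question 2 concrete, a minimal Monte Carlo sketch of this kind of check might look like the following. It is only a starting point under assumed values: the program name sim_fuzzy_rd, the data-generating process, the true effect (beta), the size of the first-stage jump (jump), and the range of run are all hypothetical and not taken from my data. The location and reg6 controls are left out for brevity. The idea is to generate many samples of 8,000 observations with a rare binary outcome (mean around 0.03) and a known true effect, run the same 2SLS specification on each, and then compare the average estimate with the true value.

```stata
clear all
set seed 12345

* Hypothetical data-generating process; every parameter value is an assumption.
capture program drop sim_fuzzy_rd
program define sim_fuzzy_rd, rclass
    syntax [, n(integer 8000) beta(real -0.003) jump(real 0.5)]
    drop _all
    set obs `n'
    generate run      = runiformint(-70, 70)   // running variable, centred at the cutoff
    generate T        = run >= 0               // exposure to the reform
    generate T_run    = T * run                // lets the slope differ after the cutoff
    generate u        = rnormal()              // unobserved confounder
    * first stage: schooling jumps by `jump' years at the cutoff
    generate yrschool = 11.5 + `jump'*T + 0.005*run + u + rnormal()
    * linear probability of low birth weight with a baseline of about 3%
    generate p        = 0.03 + `beta'*(yrschool - 12) - 0.005*u
    replace  p        = max(0, min(1, p))      // keep probabilities in [0,1]
    generate lbw      = runiform() < p
    * same specification as above, without i.location and i.reg6
    ivregress 2sls lbw (yrschool = T) run T_run, vce(cluster run)
    return scalar b  = _b[yrschool]
    return scalar se = _se[yrschool]
end

* 500 replications at the assumed sample size and true effect
simulate b = r(b) se = r(se), reps(500) nodots: ///
    sim_fuzzy_rd, n(8000) beta(-0.003) jump(0.5)

* share of replications whose 95% confidence interval covers the true effect
generate covered = abs(b - (-0.003)) <= invnormal(0.975) * se
summarize b se covered
```

The mean of b relative to the assumed true effect of -0.003 gives a rough sense of bias, and the mean of covered shows how often the 95% confidence interval contains the true value. Rerunning with a smaller jump() or n() would show how the estimates behave when the first stage is weaker or the sample is smaller. Note that clipping p to [0,1] means the true effect is only approximately equal to beta.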