Hi statalist members,
This is my first post here so pardon me if I deviate from the established etiquette for the forum. I shall cut right to the chase.
I am trying to understand and replicate the analysis in Abadie, Athey, Imbens, and Wooldridge (2017) (https://arxiv.org/abs/1710.02926), particularly what was presented at the Chamberlain Seminar last year (https://www.google.com/url?q=https%3...xpGV5v9dU8jDBi). I am running into issues setting up the Monte Carlo simulation.
The regression is of the outcome on only a constant and a treatment assignment variable (W). The outcome is generated by drawing from a normal distribution, with mean alpha for control units and alpha + tau for treated units. Alpha and tau vary across clusters, with means 9.9 and 0.4 and variances 0.15 and 0.12 respectively.
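To make my setup concrete, here is how I generate the cluster-level parameters (a sketch in Python rather than Stata; the 52-cluster count and the normality of alpha and tau across clusters are my assumptions, not something I am certain the paper specifies):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed number of clusters (52, matching the vector length I describe below).
n_clusters = 52

# Cluster-level intercepts and treatment effects:
# alpha_c has mean 9.9 and variance 0.15; tau_c has mean 0.4 and variance 0.12.
alpha = rng.normal(9.9, np.sqrt(0.15), size=n_clusters)
tau = rng.normal(0.4, np.sqrt(0.12), size=n_clusters)
```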
Firstly, the treatment assignment variable (W). This should be drawn from a binomial distribution, since we want W to be a binary variable with mean 0.55. My understanding is that W1 should be a 52x1 vector containing the mean of W in each cluster. W1 then generates the data for W in each cluster by drawing from a binomial distribution with probability W1_i, where i runs from 1 to 52. The sigmaK that Abadie et al vary should be the standard deviation of W1. To put it simply, the assignment probabilities across clusters should have mean 0.55 and standard deviation sigmaK. My problem is that I draw W1 from a normal distribution with mean 0.55 and standard deviation sigmaK. When sigmaK is less than roughly 0.23, the draws all fall within (0,1). We need the draws to lie between 0 and 1 because they serve as the probability parameters of the binomial distribution. However, Abadie et al have a case of highly correlated assignment probabilities where sigmaK = 0.6, which lets draws from the normal distribution fall outside (0,1). So my question is: what should I be doing to get the correct form of W?
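To illustrate the problem numerically, here is a Python sketch of my current approach (the exact counts depend on the seed):

```python
import numpy as np

rng = np.random.default_rng(0)
n_clusters = 52

# For each sigma_k, count how many cluster-level assignment probabilities
# drawn from N(0.55, sigma_k^2) fall outside the valid interval (0, 1).
outside = {}
for sigma_k in (0.2, 0.6):
    w1 = rng.normal(0.55, sigma_k, size=n_clusters)
    outside[sigma_k] = int(((w1 <= 0) | (w1 >= 1)).sum())

# Draws outside (0, 1) cannot be used as binomial probabilities.
print(outside)
```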
Secondly, the simulation results report the true standard deviation. I would naively take this to be the standard error from an OLS regression using the entire population as the sample. But that does not sit right, since the standard error (variance) is a function of q (the proportion of observed clusters) and should vary accordingly. What would they consider the true standard error to be?
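For reference, the alternative definition I would reach for is the spread of the OLS estimate of tau across simulation replications. A minimal sketch of that definition, with placeholder DGP values (iid errors, no clustering) rather than the paper's exact design:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_replication(n=1000, p=0.55):
    # OLS of Y on a constant and W reduces to the difference in means.
    w = rng.binomial(1, p, size=n)
    y = 9.9 + 0.4 * w + rng.normal(0.0, 1.0, size=n)
    return y[w == 1].mean() - y[w == 0].mean()

# "True" sd under this definition: the standard deviation of the estimator
# across many Monte Carlo replications.
estimates = np.array([one_replication() for _ in range(2000)])
true_sd = estimates.std(ddof=1)
```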
Thirdly, in generating the outcome variable, they mention that it is drawn from a distribution whose variance is estimated on the original data. This might be a long shot, but I do not have (or know) the exact data they use. Could there be a workaround? For now I draw the outcome variable from a multivariate normal with variances of 1 and covariances of 0.5.
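Concretely, my workaround for one cluster looks like this (Python sketch; the cluster size m is a placeholder). With unit variances and common covariance 0.5, this is the same as giving each error a shared cluster component: eps_i = sqrt(0.5)*c + sqrt(0.5)*u_i, with c and the u_i standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 30  # placeholder cluster size

# Equicorrelated covariance matrix: variance 1 on the diagonal,
# covariance 0.5 between any two units in the same cluster.
cov = np.full((m, m), 0.5)
np.fill_diagonal(cov, 1.0)

# Outcome errors for one cluster, drawn jointly.
eps = rng.multivariate_normal(np.zeros(m), cov)
```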
Any help in correcting my understanding would be highly appreciated.
Regards,
Abbas