I am involved in a project analyzing the effect of Environmental Impact Assessments (EIA) on whether wind-power plant applications are granted, the expectation being that a higher EIA score (reflecting a more negative assessment) decreases the chances of an application being granted. We have a dataset of 86 observations with a dichotomous dependent variable which has 32 0's and 60 1's, and we are using Stata SE 15.1. Since this is the whole population of applications for the relevant period (Norway 1999-2018), collecting more data is not an option. Our aim is simply to test the hypothesis that a worse EIA reduces the chances of a concession being granted. We will at least try to publish a paper on the data set, which is quite innovative, but we also want to test our most basic hypothesis. What we ask for is your opinion on our approach. Any comments, critical or constructive, are very welcome.
We plan to do the following:
1. Dichotomize the 8-category EIA-variable so as to reduce chances of perfect separation/sparse cells. The 2x2 table with the dependent variable looks like this:
 | EIA = 0 | EIA = 1 | Total |
Rejected | 12 | 18 | 30 |
Granted | 47 | 9 | 56 |
Total | 59 | 27 | 86 |
2. When we run regressions, never use more than 2 covariates in addition to the IV of interest, and be more than normally wary of separation issues, collinearity, and instability between specifications and when dropping cases. We have done some preliminary analysis on this, and the estimate for the IV of interest changes little when introducing one control at a time.
3. Since no model is optimal for such a small dataset, we have opted for using several estimators: logit, logit with robust std. err., rare events logit (King & Zeng's ReLogit), firthlogit, exact logit and (mainly to probe some assumptions) the good old LPM. I trust the Firth logit and exact logit the most, but from what I understand these have different strengths and weaknesses: Firth logit has the most reliable point estimate whereas its standard error can be misleading, and, if I got it right, the converse holds true for exact logit. As far as I can judge (see below), our preliminary analyses show little difference in the magnitude and standard errors across models (the exception being OLS, which is on a different scale and of course not in odds ratios). However, I have some qualms as to how reliable any model would be in such a small sample and how much can be done, in particular when it comes to judging substantive impact.
4. Here's our code
Code:
*Model 1. Logistic
logistic conc_1 revKU_nat if included == 1

*Model 2. Logistic robust std err
logistic conc_1 revKU_nat if included == 1, robust

*Model 3. ReLogit
relogit conc_1 revKU_nat if included == 1

*Model 4. OLS
reg conc_1 revKU_nat if included == 1

*Model 5. Firth logit
firthlogit conc_1 revKU_nat if included == 1, or
/*Obtaining reliable significance values for coeff of interest a la Heinze and Schemper (2002):
  a lr-test of the nested vs. full model BUT constrains the variable of interest to zero*/
estimates store Full
constraint 1 revKU_nat = 0
firthlogit conc_1 revKU_nat if included == 1, constraint(1)
estimates store Constrained
lrtest Full Constrained
*lr-test: testval/p-val = 16.92/0.0000

*Model 6. Exact logit
exlogistic conc_1 revKU_nat if included == 1, memory(2g) test(prob)
 | Model 1. Logistic regression | Model 2. Logistic regression (robust SE) | Model 3. ReLogit | Model 4. OLS | Model 5. Firth logit | Model 6. Exact logit |
EIA | 0.128*** | 0.128*** | 0.135*** | -0.463*** | 0.135*** | 0.132*** |
 | (0.0665) | (0.0669) | (0.0693) | (0.1000) | (0.0690) | NA (see test statistic) |
Constant | 3.917*** | 3.917*** | 3.797*** | 0.797*** | 3.800*** | NA |
 | (1.267) | (1.274) | (1.207) | (0.0560) | (1.208) | NA |
Observations | 86 | 86 | 86 | 86 | 86 | 86 |
prob-test | | | | | | 0.000041/0.0001 |
lr-test | | | | | 16.92/0.0000 | |
R-squared | | | | 0.204 | | |
Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
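As a sanity check on the estimates, here is a minimal sketch that re-enters the four cell counts from the 2x2 table in point 1 and reproduces the plain logit odds ratio and the LPM slope from them. The variable names granted, eia and n are made up for this illustration and are not the names in our data set (conc_1, revKU_nat); only built-in Stata commands are used.

```stata
* Hypothetical variable names; the counts are copied from the 2x2 table in point 1
clear
input granted eia n
0 0 12
0 1 18
1 0 47
1 1  9
end

* Rebuild the cross-tabulation; frequency weights treat each row as n applications
tabulate granted eia [fweight = n], exact

* Odds ratio by hand: odds of a grant when EIA = 1 over odds when EIA = 0
display "OR by hand = " %6.4f (9/18) / (47/12)

* Plain logit on the weighted cells (saturated with a single binary regressor)
logistic granted eia [fweight = n]

* LPM: the slope is the difference in grant rates, the constant the rate at EIA = 0
regress granted eia [fweight = n]
display "Difference in grant rates = " %6.4f 9/27 - 47/59
```

With a single binary regressor, the ordinary logit odds ratio is just the cross-product ratio of the table and the logit constant is the odds of a grant when EIA = 0 (47/12), so this only checks Models 1 and 4 against the raw counts. It says nothing about the Firth, ReLogit or exact estimates, which differ from the plain logit by construction.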
Again, thanks for your comments!
Best,
Ole Magnus Theisen