I'm trying to estimate the effect of 'economic equality' (X) on whether people want to have a foreigner as a neighbor (Y), which is measured with a dummy variable.
I'm using a repeated cross-sectional dataset with 36 countries from 1986-2000. My question is about the trade-off between losing observations and controlling for variables (potential confounders). I get the following output from Stata using logit models:
Model | A | B | C | D | E |
Economic equality (coefficient) | 1,534*** | 1,59** | 1,55* | 1,776** | 1,704** |
Control for political beliefs | No | Yes | No | Yes | No |
Control for gender | No | No | Yes | Yes | No |
Other control variables | Same | Same | Same | Same | Same |
Observations (N) | 120,821 | 87,001 | 78,169 | 31,013 | 31,013 (same obs. as model D) |
As you can see in models A-D, I'm losing data when I control for additional variables. (I know this happens because Stata drops any observation that has missing data on one of the variables in the model.) The data are missing because some questions weren't included in specific country surveys. This means that the missing data aren't random: whether an observation is dropped depends on which country survey it comes from.
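To make concrete what I mean by observations being dropped, here is a minimal sketch in Python. It is not my actual Stata code or data: the records, the country labels (`c1`, `c2`, `c3`) and the variable names are all made up for illustration. It only shows how keeping complete cases shrinks the sample, and how whole countries can disappear when a question was not asked there.

```python
# Hypothetical toy records; None marks a question that was not asked in that survey.
rows = [
    {"country": "c1", "y": 1, "equality": 0.3, "politics": 4, "gender": 1},
    {"country": "c1", "y": 0, "equality": 0.3, "politics": 2, "gender": 0},
    {"country": "c2", "y": 0, "equality": 0.5, "politics": None, "gender": 1},
    {"country": "c2", "y": 1, "equality": 0.5, "politics": None, "gender": 0},
    {"country": "c3", "y": 1, "equality": 0.4, "politics": 3, "gender": None},
]


def complete_cases(records, variables):
    """Keep only records with no missing value on any of the listed variables."""
    return [r for r in records if all(r[v] is not None for v in variables)]


base = ["y", "equality"]
# Variable lists mirroring models A-D in the table (names are assumptions).
specs = {
    "A": base,
    "B": base + ["politics"],
    "C": base + ["gender"],
    "D": base + ["politics", "gender"],
}

# Number of usable observations per model specification.
sample_sizes = {name: len(complete_cases(rows, v)) for name, v in specs.items()}

# Countries that still contribute observations to model D.
countries_in_d = sorted({r["country"] for r in complete_cases(rows, specs["D"])})

print(sample_sizes)
print(countries_in_d)
```

In this toy example only one of the three countries is left in model D, which is the kind of non-random sample reduction I'm worried about.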
My first question is this: How do I decide which control variables to include?
- I have strong theoretical reasons to control for political beliefs and gender, and not doing so will cause omitted variable bias. But I'm also afraid of sample selection bias because of the reduction in sample size. What should I do? In my opinion, missing data imputation is above my level of econometric sophistication. Can I just use theory-based argumentation to decide which model gives the most precise estimate?
-----
In model E I've estimated the effect of X on Y using the same observations as in model D. In my understanding, the difference between model D and model E (1,776 - 1,704 = 0,072) can be attributed to the fact that model D controls for political beliefs and gender, and the difference between model A and model E (1,704 - 1,534 = 0,170) is caused by the reduction of the sample.
My second question is thus: Can I control for the sample size reduction in model D by deducting the difference between models A and E from the model D coefficient (1,776 - 0,170 = 1,606), thus giving me a more "precise" estimate of the effect of economic equality on attitudes towards foreigners?
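For clarity, the arithmetic behind this proposal can be written out as a short Python sketch. The variable names are my own labels, and the coefficients are taken from the table above (written with decimal points). The sketch only reproduces the subtraction; whether that subtraction is statistically justified is exactly what I'm asking.

```python
from decimal import Decimal

# Coefficients on economic equality from the table (decimal commas written as points).
coef_a = Decimal("1.534")  # model A: full sample, no extra controls
coef_d = Decimal("1.776")  # model D: reduced sample, both controls
coef_e = Decimal("1.704")  # model E: reduced sample, no extra controls

# Change attributed to adding the controls (same sample, D vs. E).
control_effect = coef_d - coef_e

# Change attributed to the smaller sample (same specification, E vs. A).
sample_effect = coef_e - coef_a

# Proposed "corrected" estimate: model D minus the sample-reduction difference.
adjusted = coef_d - sample_effect

print(control_effect, sample_effect, adjusted)  # 0.072 0.170 1.606
```

Note that the result is the same as adding the D-E difference to the model A coefficient (1,534 + 0,072 = 1,606).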
Kind regards
Johan