I am using a generalized Diff-in-Diff in Stata 15.1 on unbalanced data to estimate the impact of migration (post) on migrants (treatment group).
My model (a simplified version) looks as follows:
logit employed i.post i.treat i.year c.family_size c.family_size_age25 i.young_child i.muslim i.christian i.other_religion i.source, cluster(ident)
- post - migration indicator (switches to 1 for the treatment group after migration; remains 0 for the control group)
- treat - 1 for the treatment group (migrants); 0 for the control group (non-migrants). This is time-invariant.
- employed - a binary variable (1 for years in which the respondent was employed; 0 otherwise)
- family_size - respondents' number of children in each year
- family_size_age25 - respondents' number of children when they were aged 25, which I am using as an approximation (albeit not an ideal one) to control for family size before treatment. I assume this can affect not only the dependent variable (employment), but also whether someone migrates.
- young_child - a binary variable for whether respondents had a young child in each year
- muslim / christian / other_religion - binary variables for respondents' religion
- source - country of origin
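To make the coding of post and treat concrete, here is a minimal sketch of how they could be generated. It assumes a hypothetical variable mig_year (year of migration, missing for non-migrants) that is not part of the model above, and it assumes post switches on in the migration year itself.

```stata
* mig_year is a hypothetical variable: year of migration, missing for non-migrants
generate byte treat = !missing(mig_year)

* year >= mig_year evaluates to 0 when mig_year is missing,
* so post stays 0 in every year for the control group
generate byte post = treat & year >= mig_year if !missing(year)

* Sanity check: post should never equal 1 when treat is 0
tabulate treat post
```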
Question no. 1: Is there a good justification for using only family size at age 25 and not the actual family size? For instance, it appears to me that having young children is a more important predictor of respondents' employment outcomes than how many children (regardless of age) they have.
Question no. 2: I am controlling for respondents' religion (Muslim, Christian, other) in addition to their country of origin. My problem is that one of the three countries in the sample is predominantly Muslim (over 90% of respondents from this country said they were Muslim), whereas the share of Muslims in the remaining two countries is less than 5%. Consequently, one could argue that country of origin is tantamount to religion in my dataset. At the same time, post-estimation tests suggest a strong preference for the model with both religion and country of origin. Moreover, regressing the outcome of interest on each religion and on country of origin separately shows that each is highly significant and - importantly for my research question - that they affect the dependent variable in opposite directions. If one stood in for the other, I would expect coming from a predominantly Muslim country to have the same impact on the outcome of interest as being Muslim, but that is not the case - which is the reason I would like to keep both. So the question is: is keeping both justifiable?
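For concreteness, the overlap between religion and country of origin, and the joint contribution of each set of dummies once the other is controlled for, could be checked roughly as follows. This is only a sketch using the variable names from the model above, and not necessarily the post-estimation tests mentioned in the question. It uses Wald tests because likelihood-ratio tests are not appropriate with cluster-robust standard errors.

```stata
* Run from a do-file (the /// line continuation does not work interactively)

* How closely does religion track country of origin?
tabulate source muslim, row

* Full model with both religion and country of origin
logit employed i.post i.treat i.year c.family_size c.family_size_age25 ///
    i.young_child i.muslim i.christian i.other_religion i.source, cluster(ident)

* Joint Wald test of the religion dummies, holding country of origin fixed
testparm i.muslim i.christian i.other_religion

* Joint Wald test of the country dummies, holding religion fixed
testparm i.source
```

If both joint tests reject and neither set of coefficients has inflated standard errors in the full model, that would be some indication that religion and country of origin carry separate information in this sample.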
Thank you very much in advance!