Unfortunately I'm not able to download the dataex package, since I'm working on an external server. I hope you can still understand my query and are willing to help me out! I know some questions have been asked about simulation before, but none of the posts really matches what I'm looking for.
I am investigating the relationship between employment and crime at the individual, monthly level. I have data on around 1 million individuals over 96 time periods (8 years, 12 months per year). For each person-month I know whether they were employed, whether they committed an offence, their monthly income, and some other control variables.
My original dataset looks approximately like this:
id | time | emp | crime | income | age | crimehist |
1 | 1 | 1 | 0 | 2000 | 19 | 0 |
1 | 2 | 1 | 0 | 1800 | 19 | 0 |
1 | 3 | 0 | 1 | 0 | 19 | 0 |
1 | 4 | 0 | 0 | 0 | 20 | 1 |
1 | 5 | 1 | 0 | 1400 | 20 | 1 |
2 | 1 | 1 | 0 | 1500 | 24 | 3 |
2 | 2 | 1 | 1 | 1100 | 24 | 3 |
2 | 3 | 1 | 1 | 1400 | 24 | 4 |
2 | 4 | 0 | 0 | 0 | 24 | 5 |
2 | 5 | 0 | 0 | 0 | 25 | 5 |
I want to carry out a logistic regression to see whether there is a relationship in the following way:
Code:
xtlogit crime emp age age2 crimehist, fe
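For reference, the command above presupposes that the panel structure has been declared and that age2 exists. A minimal sketch of that setup, using only the ten example rows shown above and assuming that age2 is meant to be age squared (the post does not define it). The model itself is not run here, because ten observations are far too few for it:

```stata
clear
input id time emp crime income age crimehist
1 1 1 0 2000 19 0
1 2 1 0 1800 19 0
1 3 0 1    0 19 0
1 4 0 0    0 20 1
1 5 1 0 1400 20 1
2 1 1 0 1500 24 3
2 2 1 1 1100 24 3
2 3 1 1 1400 24 4
2 4 0 0    0 24 5
2 5 0 0    0 25 5
end

* declare the panel structure: id is the person, time is the month
xtset id time

* assumption: age2 is the square of age
gen age2 = age^2

* within- and between-person variation in the outcome and the regressor
xtsum crime emp
```

Fixed-effects xtlogit only uses individuals whose outcome changes at least once over time, so xtsum is a quick way to see how much within-person variation there is.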
Although there is some documentation on simulation studies online, I have not been able to find suitable code for this kind of simulation study.
I think I first need to know what kind of distribution my variables have. How do I find out? For example, income does not seem to be perfectly normally distributed (see picture). Should my simulated independent variable then have a distribution similar to the real data, or can I assume a normal distribution?
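One way to look at the shape of a variable is to inspect its moments and percentiles and to run a skewness/kurtosis test, both on the raw scale and on the log scale. The sketch below does this on a hypothetical stand-in for the real data: the variable names emp and income follow the table above, but the values are made up (a right-skewed income for the employed, zero otherwise). With the real data in memory, only the last four commands would be needed.

```stata
clear
set seed 12345
set obs 5000

* hypothetical stand-in data: 70% employed, lognormal income if employed
gen byte emp = runiform() < 0.7
gen income = cond(emp, exp(rnormal(7.7, 0.5)), 0)

* mean, sd, percentiles, skewness and kurtosis among the employed
summarize income if emp == 1, detail

* the same for log income, which is often closer to normal
gen ln_income = ln(income) if emp == 1
summarize ln_income, detail

* joint skewness/kurtosis test of normality for both versions
sktest income ln_income
```

Note that the logit model is specified conditional on the regressors and makes no assumption about their distribution, so matching the real distribution mainly matters for how realistic the simulation is.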
If I assume a normal distribution for all my independent variables, what would the next steps be?
For generating the income variable I first used this code:
Code:
gen sim_inc = 0
replace sim_inc = 2416 + 1226 * invnorm(uniform()) if sim_emp != 0
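The same draw can be written with rnormal(), and it can be compared with a lognormal draw that has the same mean and standard deviation but cannot go below zero. This is only a sketch: it treats 2416 and 1226 as the mean and standard deviation of income among the employed (an assumption about where those numbers come from), and the 70% employment rate used to create sim_emp is made up.

```stata
clear
set seed 12345
set obs 100000

* hypothetical employment indicator (70% employed)
gen byte sim_emp = runiform() < 0.7

* normal income for the employed, zero otherwise
gen sim_inc = 0
replace sim_inc = rnormal(2416, 1226) if sim_emp != 0

* a normal draw with these parameters produces some negative incomes
count if sim_inc < 0

* lognormal with the same mean m and sd s:
* sigma^2 = ln(1 + s^2/m^2), mu = ln(m) - sigma^2/2
scalar s2 = ln(1 + (1226/2416)^2)
scalar mu = ln(2416) - scalar(s2)/2
gen sim_inc_ln = 0
replace sim_inc_ln = exp(rnormal(scalar(mu), sqrt(scalar(s2)))) if sim_emp != 0

* compare the two versions among the employed
summarize sim_inc sim_inc_ln if sim_emp != 0, detail
```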
Once I have generated all independent variables, how do I create the dependent variable?
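A common way to create a binary outcome for a logit simulation is to build a linear index from the regressors and an individual effect, turn it into a probability with invlogit(), and compare that probability with a uniform draw. The sketch below illustrates this under assumed values: the panel size, the "true" coefficients (-0.5 on employment, -0.05 on age) and the distribution of the individual effect alpha are all hypothetical. crimehist is left out because it is built from past values of the outcome and would have to be generated recursively.

```stata
clear
set seed 12345
set obs 2000                            // hypothetical number of individuals
gen long id = _n
gen alpha = rnormal(-1.5, 1)            // assumed individual effect
gen age0 = 18 + floor(13*runiform())    // assumed age in the first month

* 24 monthly records per person
expand 24
bysort id: gen int time = _n
gen age = age0 + floor((time - 1)/12)

* employment is correlated with the individual effect
gen byte sim_emp = runiform() < invlogit(1 + 0.5*alpha)

* assumed true coefficients
scalar b_emp = -0.5
scalar b_age = -0.05

* linear index, then a Bernoulli draw with probability invlogit(xb)
gen xb = alpha + scalar(b_emp)*sim_emp + scalar(b_age)*age
gen byte sim_crime = runiform() < invlogit(xb)

xtset id time
tabulate sim_crime sim_emp, column
```

Letting alpha affect both employment and crime is a design choice: it creates the kind of unobserved heterogeneity that a fixed-effects model is meant to handle.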
And how do I then run the regression, and check whether the logistic model leads to consistent parameter estimates?
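One way to check this is a Monte Carlo experiment: wrap the data generation and the estimation in an rclass program, repeat it many times with simulate, and compare the average estimate with the coefficient used to generate the data. The following is a minimal sketch under assumptions, not a definitive design: the program name sim_xtlogit, the panel size (500 individuals, 24 months), the true coefficient of -0.5 and the 200 replications are all hypothetical choices kept small so that it runs quickly.

```stata
capture program drop sim_xtlogit
program define sim_xtlogit, rclass
    drop _all
    set obs 500
    gen long id = _n
    gen alpha = rnormal(-2.5, 1)          // assumed individual effect
    expand 24
    bysort id: gen int time = _n
    * employment depends on alpha, so pooled logit would be biased
    gen byte sim_emp = runiform() < invlogit(1 + 0.5*alpha)
    * true coefficient on employment is -0.5 (assumed)
    gen byte sim_crime = runiform() < invlogit(alpha - 0.5*sim_emp)
    xtset id time
    xtlogit sim_crime sim_emp, fe
    return scalar b_emp  = _b[sim_emp]
    return scalar se_emp = _se[sim_emp]
end

* run the experiment 200 times and keep the estimate and its standard error
simulate b_emp = r(b_emp) se_emp = r(se_emp), reps(200) seed(12345) nodots: sim_xtlogit

* the mean of b_emp can be compared with the true value of -0.5,
* and the sd of b_emp with the mean of se_emp
summarize b_emp se_emp
```

Repeating the experiment with more individuals shows whether the spread of b_emp around the true value shrinks as the sample grows, which is what consistency implies.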
Thanks a lot in advance for your help!