I have some code that works, but it seems extremely inelegant to me and I find it hard to believe there isn't a more efficient way. Given that I will be running this code repeatedly on sequential quarterly datasets, I'd like it to be cleaner if it can be! (I also want to attach new variables for other statistics, e.g. the employment-to-population ratio. I've left out the code for these since it's structurally the same, but that also makes me want to streamline the code as much as I can.)
At present my approach:
1. Creates two dummies, equal to 1 if the person is employed / unemployed (respectively), and missing otherwise.
2. Uses egen count() with by, to create two new variables recording the raw number of employed / unemployed people in the region.
3. Creates a fifth new variable, the unemployment rate I actually want, by calculation from the raw numbers.
4. Drops the variables used only for this process.
Originally I thought I could skip at least one step by using egen count() with an if qualifier. But if that's possible I haven't been able to figure out how, and if there's some other more efficient way of doing it I haven't figured that out either. Any advice appreciated!
Here's a sample of my code - the original variable empl_stat records 1 for employed, 2 for unemployed, other values for various types of inactivity.
Code:
gen unempl=1 if empl_stat==2
gen empl=1 if empl_stat==1
/*By region*/
bysort region: egen reg_unempl = count(unempl)
bysort region: egen reg_empl = count(empl)
gen reg_unempl_r = reg_unempl/(reg_empl+reg_unempl)
drop empl unempl reg_empl reg_unempl
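For comparison, here is a minimal sketch of one possible shortening, not a definitive answer. It assumes the same coding as above (empl_stat is 1 for employed, 2 for unemployed) and reuses the variable names from the code above. The idea is to put the condition inside the egen function as a logical expression rather than in an if qualifier: with an if qualifier, egen only fills in the result for the observations that satisfy the condition, which may be why that route did not seem to work.

```stata
* Sketch only: assumes empl_stat == 1 means employed and == 2 means unemployed.
* A logical expression evaluates to 1 (true) or 0 (false), so total() of it
* counts the matching observations within each region. No dummies are needed.
bysort region: egen reg_unempl = total(empl_stat == 2)
bysort region: egen reg_empl   = total(empl_stat == 1)
gen reg_unempl_r = reg_unempl / (reg_empl + reg_unempl)
drop reg_empl reg_unempl

* Alternative in a single step: the mean of a 0/1 indicator that is set to
* missing for anyone outside the labour force. mean() ignores missing values,
* so the result is unemployed / (employed + unemployed) within each region.
* reg_unempl_r2 is a hypothetical name, used here only to avoid a clash.
bysort region: egen reg_unempl_r2 = mean(cond(inlist(empl_stat, 1, 2), empl_stat == 2, .))
```

Both versions should give a missing rate for any region with nobody employed or unemployed, the same as the original code. The same pattern would carry over to other ratios by changing the condition that defines the numerator and the one that defines the denominator.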