Hi all,

I have two individual-level datasets that share the same categorical variables (e.g., race, sex, age_groups). I want to create one dataset containing the frequency of each category within each variable (e.g., the number of males and females, the number of black and white individuals, etc.). See the example below:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 sex str5 race str11 age_groups
"male"   "white" "0-5 years"  
"male"   "black" "over 18"    
"female" "white" "over 18"    
"female" "white" "0-5 years"  
"female" "black" "13-18 years"
"male"   "black" "13-18 years"
"male"   "white" "over 18"    
end
Both datasets look like the one above, but the frequencies for each variable differ. The goal is to create one dataset that compares the frequencies for each variable across the two individual-level datasets, something like the below:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 sex float(freq_sex_dataset1 freq_sex_dataset2) str5 race float(freq_race_dataset1 freq_race_dataset2) str13 age_groups float(freq_age_dataset1 freq_age_dataset2)
"male"   4 2 "white" 4 2 "0-5 years"     2 3
"female" 3 7 "black" 3 2 "13-18 years"   2 3
""       . . ""      . . "over 18 years" 3 4
end
FYI, I made up the frequencies for all of the *dataset2 counts, but the *dataset1 counts reflect the first example dataset. I'm not sure of the best approach for this. I had been thinking about doing something like the below, but I'm not sure it's right:

Code:
use "dataset1.dta", clear

* Frequencies of sex
preserve
contract sex
rename _freq freq_sex_dataset1
tempfile sex_dataset1
save `sex_dataset1', replace
restore

* Frequencies of race
preserve
contract race
rename _freq freq_race_dataset1
tempfile race_dataset1
save `race_dataset1', replace
restore

* Frequencies of age_groups
preserve
contract age_groups
rename _freq freq_age_dataset1
tempfile age_dataset1
save `age_dataset1', replace
restore
My plan was to do something similar for dataset2 and then merge the results somehow, but I'm lost at this point. Also, there are dozens of variables I need to do this for, not just the three above. Any help or input on how best to do this would be greatly appreciated, as always!
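
To make the idea concrete, here is a minimal sketch of how the contract-and-merge steps might be looped over many variables and both files. It is only a sketch under several assumptions: the file names dataset1.dta and dataset2.dta, the variable list in the local macro vars, and the new variable names varname and category are all placeholders. It also assumes every categorical variable is a string (as in the example) and that category labels are spelled identically in both datasets. It produces a long layout (one row per variable-category pair) rather than the side-by-side layout shown above, since that seems easier to scale to dozens of variables.

```stata
* Sketch only: file names, variable list, and new variable names are assumptions
local vars sex race age_groups

forvalues d = 1/2 {
    tempfile counts`d'
    local first = 1
    foreach v of local vars {
        use "dataset`d'.dta", clear
        contract `v'                      // one row per category, count in _freq
        rename `v' category
        rename _freq freq_dataset`d'
        generate varname = "`v'"          // record which variable the row describes
        if !`first' {
            append using `counts`d''      // stack onto the counts collected so far
        }
        save `counts`d'', replace
        local first = 0
    }
}

* Combine the two sets of counts; unmatched categories get a missing frequency
use `counts1', clear
merge 1:1 varname category using `counts2', nogenerate
order varname category
sort varname category
list, sepby(varname)
```

Categories that appear in only one dataset would end up with a missing frequency in the other column; whether those should be recoded to 0 depends on the comparison intended.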

Thanks Statalisters

Nick