Stata 16
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long region str6 gender float age
8 "male" 4
2 "male" 5
13 "male" 5
10 "male" 3
4 "male" 5
14 "female" 4
2 "male" 4
12 "male" 3
13 "male" 4
1 "male" 5
8 "male" 1
11 "female" 3
14 "female" 4
5 "male" 5
8 "male" 3
5 "female" 4
7 "male" 2
9 "male" 4
11 "male" 5
8 "female" 5
4 "female" 4
2 "male" 4
6 "male" 5
9 "male" 3
2 "male" 4
2 "male" 2
11 "female" 5
9 "male" 5
8 "male" 4
13 "female" 1
2 "female" 1
9 "male" 3
2 "male" 5
2 "male" 4
9 "male" 5
9 "male" 4
2 "female" 4
2 "male" 4
2 "male" 5
4 "male" 4
2 "male" 5
11 "female" 5
2 "female" 5
5 "female" 4
11 "male" 5
8 "male" 5
10 "male" 3
11 "female" 1
2 "female" 5
2 "male" 4
4 "male" 3
13 "male" 5
8 "male" 4
1 "male" 5
5 "male" 3
2 "male" 1
5 "female" 3
10 "male" 2
14 "male" 4
12 "male" 4
2 "male" 5
6 "female" 3
2 "male" 4
11 "female" 1
11 "male" 5
12 "female" 4
2 "male" 2
14 "male" 3
2 "female" 4
1 "female" 3
12 "female" 4
9 "female" 4
2 "female" 3
8 "male" 3
2 "female" 4
11 "male" 3
8 "female" 4
5 "male" 5
9 "male" 4
10 "female" 5
13 "male" 4
10 "male" 4
13 "female" 5
2 "female" 4
13 "male" 3
5 "male" 1
4 "male" 4
13 "male" 1
8 "male" 3
1 "male" 3
8 "male" 1
13 "male" 5
2 "male" 4
8 "male" 3
13 "female" 4
8 "male" 3
1 "female" 3
4 "female" 1
7 "female" 1
2 "male" 5
end
label values region periph_res
label def periph_res 1 "Anatolikis Macedonias kai Thrakis", modify
label def periph_res 2 "Attikis", modify
label def periph_res 4 "Dytikis Elladas", modify
label def periph_res 5 "Dytikis Makedonias", modify
label def periph_res 6 "Ionion Nison", modify
label def periph_res 7 "Ipeirou", modify
label def periph_res 8 "Kentrikis Makedonias", modify
label def periph_res 9 "Kritis", modify
label def periph_res 10 "Notiou Aigaiou", modify
label def periph_res 11 "Peloponnisou", modify
label def periph_res 12 "Stereas Elladas", modify
label def periph_res 13 "Thessalias", modify
label def periph_res 14 "Voreiou Aigaiou", modify
label values age agel
label def agel 1 "18-24", modify
label def agel 2 "25-34", modify
label def agel 3 "35-44", modify
label def agel 4 "45-54", modify
label def agel 5 "55 plus", modify
Hello!
I would like to ask for help with balancing my dataset under several constraints simultaneously.
I have collected a sample of 5946 respondents, with the following demographic characteristics: region, gender and age.
The sample is not nationally representative with respect to these characteristics. The percentages in the real population (from the census) and in my sample are:

REGION CODE | CENSUS (Population) (%) | Sample (%) |
R1 | 5.62 | 5.8 |
R2 | 35.40 | 33.54 |
R3 | 6.28 | 4.44 |
R4 | 2.62 | 2.24 |
R5 | 1.92 | 1.66 |
R6 | 3.11 | 3.21 |
R7 | 17.40 | 19.07 |
R8 | 5.76 | 5.89 |
R9 | 2.86 | 3.55 |
R10 | 5.34 | 4.89 |
R11 | 5.06 | 4.86 |
R12 | 6.77 | 6.63 |
R13 | 1.84 | 4.22 |

GENDER | Males | Females |
CENSUS (Population) (%) | 49.03 | 50.97 |
Sample (%) | 55.63 | 44.37 |

AGE GROUP | 18-24 | 25-34 | 35-44 | 45-54 | 55 over |
CENSUS (Population) (%) | 9.63 | 17.32 | 18.43 | 16.58 | 38.03 |
Sample (%) | 9.2 | 14.3 | 24 | 26.86 | 25.65 |

I would like to balance my sample so that it matches all the population quotas in the tables above. If more than one dataset (i.e., subsample) satisfies these conditions, I would like to keep the largest one. If several such subsamples are of equal size, I would like to keep all of them.
I only need the sample to be balanced at the national level, not at the regional level. For instance, I need 49% males in the overall subsample, but I don't necessarily need 49% males in each region. That means that regions with a surplus of women can compensate for regions with a deficit of women, and so on.
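To make the national versus regional distinction concrete, here is a minimal sketch. It uses only the first eight rows of the dataex listing in this post so that it runs on its own; the one-way tables are the margins that should match the census, while the two-way table is shown only to illustrate what does not have to match.

```stata
* Excerpt: first eight rows of the -dataex- listing in this post.
clear
input long region str6 gender float age
8 "male" 4
2 "male" 5
13 "male" 5
10 "male" 3
4 "male" 5
14 "female" 4
2 "male" 4
12 "male" 3
end

* National margins: these are the shares that should match the census.
tabulate gender
tabulate age
tabulate region

* Gender split within each region: this does not need to match the census.
tabulate region gender, row
```

With the full data in memory, the same four tabulate commands apply unchanged.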
The constraints are not really that strict. For example, I don’t need to get exactly 49.03% males. I could do 49.04%, or any other percentage that would not be significantly different from 49.03%.
The test we use to assess significance is a two-sample test of proportions (prtesti in Stata).
Is there any way to do this? I considered gsample, but I am not sure how to proceed with it, or whether it is even relevant to my case.
Thank you in advance for your time.
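As a rough illustration of the kind of check involved, the sketch below compares the male share in the full sample (55.63% of 5946) with the census share (49.03%). Note the assumption: it uses the one-sample form of prtesti and treats the census share as a fixed value, because the two-sample form also needs the size of the second sample, which is not stated here.

```stata
* One-sample form: prtesti #obs1 #p1 #p2
* 5946 respondents, 55.63% male in the sample, 49.03% male in the census.
* Assumption: the census share is treated as a known constant.
prtesti 5946 0.5563 0.4903

* z statistic left behind by prtesti
display "z = " %6.2f r(z)

* The two-sample form is: prtesti #obs1 #p1 #obs2 #p2
* It additionally needs the second sample size, which is not given here.
```

The same command could be rerun with the size and male share of any candidate subsample in place of 5946 and 0.5563.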
Best,
Eleni