I am stuck trying to organize a large dataset of participants with ethnicity records.
There are multiple entries per participant, and each participant's ethnicity records fall into one of the following cases:
a) multiple identical records
b) multiple discrepant records
c) no records
In cases a and c, I want the dataset left unchanged. Where there are multiple discrepant records (case b), I would like Stata to choose the most frequent record or, when no record is more frequent than the others, to choose one of the recorded codes at random.
I would very much appreciate any help.
To illustrate, the dataset looks like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float patid str7 code str11 group
1 "9t09.00" "Pakistani"
1 "" ""
1 "" ""
1 "9t09.00" "Pakistani"
1 "9t09.00" "Pakistani"
1 "9SA7.00" "Indian"
1 "9iA2.00" "Kashmiri"
1 "9t09.00" "Pakistani"
2 "" ""
2 "" ""
2 "" ""
2 "" ""
3 "" ""
3 "9t14.00" "Asian white"
4 "9SA9.00" "Scotish"
4 "9SA9.00" "Scotish"
4 "9t20.00" "British"
4 "9t20.00" "British"
5 "" ""
5 "" ""
5 "9SA9.00" "Scotish"
end
code holds the ethnicity codes.
group holds the categories I created to classify the codes.
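To make the rule concrete, here is a rough sketch of what I think the logic could look like in Stata. It is only an assumption about one possible approach, not something I have confirmed is correct or the best way. The new variables code_chosen and group_chosen and the helper variables n_code, u and neg_n are names I made up for illustration, the seed is arbitrary, and I am assuming that blank records should be ignored when counting and that the original code and group variables should be kept as they are.

```stata
set seed 12345    // arbitrary seed so the random tie-breaks are reproducible

* Count how often each non-blank code occurs within a participant
bysort patid code: gen long n_code = _N if code != ""

* Draw one random number per (patid, code) pair, used only to break ties
bysort patid code: gen double u = runiform() if _n == 1 & code != ""
bysort patid code (u): replace u = u[1]    // copy the draw to every row of the pair

* Sort each participant's rows by descending count, then by the random draw;
* missing values sort last, so blank records never win unless all are blank
gen long neg_n = -n_code
bysort patid (neg_n u): gen str7 code_chosen = code[1]
by patid: gen str11 group_chosen = group[1]

drop n_code u neg_n
```

With the example data, participant 1 should end up with 9t09.00 (it occurs four times), participant 4 has a two-way tie between 9SA9.00 and 9t20.00 that would be settled by the random draw, and participant 2 should keep blank values because there is nothing to choose from.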
Thank you.