I have a problem looping over string values. My data set contains duplicate observations, and I would like to keep exactly one observation from each group of duplicates. However, the other variables make it matter which duplicate I keep. There are two other variables, gender and province. For some groups, one of the duplicates contains the true province value and the other contains a false value. I would like to keep the one with the true province value. Here is how the data look:
stid       gender2  province2
E05528498  Male     دیپارتمنتاداریودیپلوماسیپوهنحیحقوقپوهنتونالبیرونی
E05528498  Male     کاپيسا
E05528502  Male     کاپيسا
E05528502  Male     دیپارتمنتفقهوقانونپوهنحیشرعیاتپوهنتونالبیرونی
My question is: how can I write a program that, for each group of duplicates, keeps the observation with the true province value?
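To make the desired result concrete, here is a minimal sketch of the logic, written in Python purely for illustration. It assumes that a true province value can be recognized because it appears in a known list of province names. That list (`KNOWN_PROVINCES`) and the function name `keep_true_province` are hypothetical and not part of my data set.

```python
# Assumption: a "true" province is one that appears in a known list of names.
# This list is hypothetical; in practice it would hold every valid province.
KAPISA = "کاپيسا"
KNOWN_PROVINCES = {KAPISA}

# The four example rows shown above: (stid, gender2, province2).
rows = [
    ("E05528498", "Male", "دیپارتمنتاداریودیپلوماسیپوهنحیحقوقپوهنتونالبیرونی"),
    ("E05528498", "Male", KAPISA),
    ("E05528502", "Male", KAPISA),
    ("E05528502", "Male", "دیپارتمنتفقهوقانونپوهنحیشرعیاتپوهنتونالبیرونی"),
]


def keep_true_province(rows):
    """For each stid, keep the first row whose province is a known name."""
    kept = {}
    for stid, gender, province in rows:
        if province in KNOWN_PROVINCES and stid not in kept:
            kept[stid] = (stid, gender, province)
    return list(kept.values())


result = keep_true_province(rows)
for row in result:
    print(row)
```

Note that in this sketch a group with no recognizable province value is dropped entirely, which may or may not be what is wanted.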
A further complication is that two variables are mixed up. As above, province2 contains both true and false values. In addition, gender2 contains either a true gender indicator or an empty cell. For each group of duplicates, I want to keep the gender indicator. The problem is that the true province value and the true gender value are not in the same row. Here is an example from the data:
stid       gender2  province2
F01722690  Male     دیپارتمنتتعلیماتاسلامیپوهنحیشرعیاتپوهنتونکابلبرایذکورواناث
F01722690           کابل
F01722815           کابل
F01722815  Female   دیپارتمنتگرافیکپوهنحیهنرهایزیباپوهنتونکابل
My question in this case is: for each group, how can I fill the empty gender cell with the gender indicator from the other row and then keep the observation with the true province value?
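The two steps can be sketched as follows, again in Python purely for illustration. As before, it assumes that a true province value is one found in a known list of names. `KNOWN_PROVINCES` and `fill_gender_keep_province` are hypothetical names, and an empty gender cell is represented as an empty string.

```python
# Assumption: a "true" province is one that appears in a known list of names.
# This list is hypothetical; in practice it would hold every valid province.
KABUL = "کابل"
KNOWN_PROVINCES = {KABUL}

# The four example rows shown above: (stid, gender2, province2).
# An empty gender cell is represented here as "".
rows = [
    ("F01722690", "Male", "دیپارتمنتتعلیماتاسلامیپوهنحیشرعیاتپوهنتونکابلبرایذکورواناث"),
    ("F01722690", "", KABUL),
    ("F01722815", "", KABUL),
    ("F01722815", "Female", "دیپارتمنتگرافیکپوهنحیهنرهایزیباپوهنتونکابل"),
]


def fill_gender_keep_province(rows):
    """Fill empty gender cells within each stid, then keep the true-province row."""
    # Step 1: record the first non-empty gender seen for each stid.
    gender_by_id = {}
    for stid, gender, _ in rows:
        if gender:
            gender_by_id.setdefault(stid, gender)

    # Step 2: keep the first row with a known province, filling in the gender.
    kept = {}
    for stid, gender, province in rows:
        if province in KNOWN_PROVINCES and stid not in kept:
            filled = gender or gender_by_id.get(stid, "")
            kept[stid] = (stid, filled, province)
    return list(kept.values())


result = fill_gender_keep_province(rows)
for row in result:
    print(row)
```

The gender is collected in a first pass because the row that holds it may come before or after the row with the true province value.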
With numeric values, it is easy to generate another variable using the max, min, and group functions. But I am struggling to do the same with strings.
You will save me a lot of time if you can help me with this.
Thanks!