Hi. My data has one row per id, plus several string variables that list the languages spoken for that id. The same language may be repeated across the language variables, sometimes with a trailing space ("Hindi " vs. "Hindi"). I want to keep each language only in the first variable in which it occurs and replace all later duplicates with "NA". Is there a way to write a loop that runs over all the language variables, looks for duplicates within each row, and replaces them with "NA"? For example, in the second row below, language1 would stay "Hindi" and all the others would become "NA".
Code:
clear
input double performer_id str8(language1 language2 language3 language4)
81016542895 "Kannada" "NA" "NA" "NA"
81013244989 "Hindi " "Hindi" "Hindi" "Hindi "
81013267181 "Hindi" "Hindi" "NA" "NA"
81009910893 "Hindi" "Gujarati" "Marathi" "Punjabi"
81013228751 "Punjabi" "Punjabi" "NA" "NA"
81015437069 "Hindi " "Hindi" "Gujarati" "Marathi"
81010739006 "Hindi" "Hindi " "Gujarati" "Marathi"
81015437069 "Hindi " "Hindi" "Gujarati" "Marathi"
81011361270 "Hindi" "Hindi" "Hindi " "Kannada"
81013269951 "Kannada" "Gujarati" "NA" "NA"
81029734638 "Hindi" "NA" "NA" "NA"
end
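
One way this could be done is with a nested loop that compares each language variable against every variable before it in the same row. The following is only a sketch, not a tested solution. It assumes the variables are named language1 through language4 as in the example, that the string "NA" marks an empty slot, and that values differing only by leading or trailing spaces ("Hindi " and "Hindi") count as the same language. strtrim() needs Stata 14 or later; trim() should do the same job in older versions.

```stata
* Sketch: keep the first occurrence of each language within a row and
* replace later repeats with "NA".
* Assumptions: variables are language1-language4, and "NA" means "no language".
forvalues j = 2/4 {
    local prev = `j' - 1
    forvalues i = 1/`prev' {
        * Compare trimmed values so that "Hindi " matches "Hindi";
        * slots that are already "NA" are left alone.
        replace language`j' = "NA" ///
            if strtrim(language`j') == strtrim(language`i') ///
            & strtrim(language`j') != "NA"
    }
}

list performer_id language1-language4, noobs
```

Because the outer loop works from left to right, a value is only overwritten when an identical value sits in an earlier variable, so the first occurrence always survives. The surviving values keep any trailing spaces they had; if those are unwanted, each variable could additionally be passed through strtrim().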