I have a large dataset that consolidates identical observations from 3 sources that are stored under different identifiers. I used the Stata program -dtalink- to identify most of the matching observations. However, due to data errors some observations could not be matched by the Code and have thus been manually linked. Matching observations are marked with a "1" in the variable Match and the ID of the matching observation in the respective other database was put in the field ID_OtherSource (cross-linked).
Now I would like to link those two observations (by assigning an identical MatchID if this is possible) and either collapse the two lines into one or vice-versa replace the missings in one line with the values from the other and then drop one of the two lines (the lines always complement each other, so that replacing all the missings would lead to identical lines except the Source variable). So what I am trying to do is to say if ID_A or ID_B or ID_C match a value in ID_OtherSource, those two lines are identical observations and all the missings in the first observations can be replaced by values from the second observation.
Code:
*The data I have clear input str5 ID_A str5 ID_B str5 ID_C str5 Source Match str5 ID_OtherSource str5 Var1_A str5 Var1_B str5 Var1_C Var2_A Var2_B Var2_C "6789f" "6789f" "" "A B" 1 "714tt" "Yes" "Yes" "" 56 56 . "" "" "714tt" "C" 1 "6789f" "" "" "Yes" . . 56 end list * How it should look in the end clear input str5 ID_A str5 ID_B str5 ID_C str5 Source Match MatchID str5 ID_OtherSource str5 Var1_A str5 Var1_B str5 Var1_C Var2_A Var2_B Var2_C "6789f" "6789f" "714tt" "A B" 1 54 "714tt" "Yes" "Yes" "Yes" 56 56 56 "6789f" "6789f" "714tt" "C" 1 54 "6789f" "Yes" "Yes" "Yes" 56 56 56 end list |
0 Response to Match cross-linked observations
Post a Comment