I have two datasets I’d like to merge. Both contain the numeric variables stckcd and year, along with other variables. Observations in dataset 1 are uniquely identified by stckcd and year; observations in dataset 2 are not.
I want to merge the two datasets on stckcd and year so that, whenever a stckcd-year combination appears more than once in dataset 2, the matching values of the other variables from dataset 1 are repeated on each of those rows.
Here’s a simple example.
Dataset 1:
stckcd | year | A |
1 | 2000 | 1 |
1 | 2001 | 1 |
2 | 2000 | 2 |
Dataset 2:
stckcd | year | B |
1 | 2000 | w |
1 | 2000 | x |
1 | 2001 | y |
2 | 2000 | z |
Here's what I'd like the merged dataset to look like:
stckcd | year | A | B |
1 | 2000 | 1 | w |
1 | 2000 | 1 | x |
1 | 2001 | 1 | y |
2 | 2000 | 2 | z |
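To make the intended behavior concrete, here is a minimal sketch in plain Python (standard library only) of a one-to-many merge on stckcd and year, using the example data above. It is an illustration under assumptions, not Stata code: the function name merge_one_to_many and the list-of-dicts representation are hypothetical. In Stata, this is the pattern a one-to-many merge is meant for, e.g. merge 1:m stckcd year using dataset2 with dataset 1 in memory (dataset2 is a placeholder file name).

```python
from typing import Any

Row = dict[str, Any]


def merge_one_to_many(left: list[Row], right: list[Row], keys: tuple[str, ...]) -> list[Row]:
    """Attach each right-hand row to the single left-hand row sharing its key.

    The left table must be uniquely identified by `keys` (the "1" side);
    the right table may repeat key values (the "m" side). Right-hand rows
    with no match are dropped in this sketch, i.e. it behaves like an inner join.
    """
    # Index the "1" side by its key, refusing duplicate keys.
    lookup: dict[tuple, Row] = {}
    for row in left:
        k = tuple(row[name] for name in keys)
        if k in lookup:
            raise ValueError(f"key {k} is not unique in the left table")
        lookup[k] = row

    merged: list[Row] = []
    for row in right:
        k = tuple(row[name] for name in keys)
        if k in lookup:
            # Copy the left-hand values onto every matching right-hand row.
            merged.append({**lookup[k], **row})
    return merged


# Example data taken from the tables above.
dataset1 = [
    {"stckcd": 1, "year": 2000, "A": 1},
    {"stckcd": 1, "year": 2001, "A": 1},
    {"stckcd": 2, "year": 2000, "A": 2},
]
dataset2 = [
    {"stckcd": 1, "year": 2000, "B": "w"},
    {"stckcd": 1, "year": 2000, "B": "x"},
    {"stckcd": 1, "year": 2001, "B": "y"},
    {"stckcd": 2, "year": 2000, "B": "z"},
]

if __name__ == "__main__":
    for r in merge_one_to_many(dataset1, dataset2, ("stckcd", "year")):
        print(r["stckcd"], r["year"], r["A"], r["B"])
    # Prints:
    # 1 2000 1 w
    # 1 2000 1 x
    # 1 2001 1 y
    # 2 2000 2 z
```

One design difference worth noting: Stata's merge keeps unmatched observations from both datasets by default and records the match status in _merge, whereas this sketch keeps only matched rows.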
My problem seems similar to the one described here: https://www.statalist.org/forums/for...the-duplicates, but I’m not entirely sure what that user wanted the final dataset to look like.
Apologies in advance if this question is not phrased clearly enough. I am new to Statalist.