Dear Statalist,

I would like to merge two files. My data are about the posts and comments of an online community. The first file includes the post ID, the name of each commenter on the post, the comment ID (unique for each comment), and the comment date.

The second file includes the post ID, the blogger's name (the person who made the post), the post time, the commenter name for each post, the comment time, and the categories the focal post belongs to (1 means the post belongs to that category and 0 means it does not; some posts belong to more than one category).

The tricky thing here is that all the commenters in my dataset are also bloggers, who may have posted before or after making a given comment. For those who posted before they commented, I would like to find out whether they had previously posted in the same category or categories as the focal post.

Therefore, I first want to merge the two files to get a new one that includes: post ID, post time, blogger name, focal post categories 1-5, commenter name, comment time, the commenter's past post ID, the commenter's past post time, and the commenter's past post categories 1-5. Second, I need to check whether the focal post and the past posts have overlapping categories.

I tried all the merge types, but Stata either reports that the key variable (commenter name) does not uniquely identify observations, or a lot of observations are left unmatched.
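To illustrate the uniqueness problem, here is a minimal sketch on a few rows of the first example. It assumes that a commenter appears at most once per post, which holds in the example but may not hold in the full data. It uses -duplicates report- and -isid- to check which variables can serve as a merge key.

```stata
* Sketch: check which variables uniquely identify rows in the comment file
clear
input str2 PostID str7 Commentername str3 commentID int Commenttime
"AB" "Micheal" "001" 20435
"AB" "Joey"    "002" 20438
"AD" "Joey"    "008" 21747
"AD" "Micheal" "009" 21749
end

* Commentername alone repeats across posts, so it cannot be a 1:1 merge key
duplicates report Commentername

* isid exits with an error if the listed variables do not uniquely identify rows;
* capture lets the do-file continue and leaves the return code in _rc
capture isid Commentername
display "isid Commentername return code (nonzero means not unique): " _rc

* These run silently when the key is unique
isid PostID Commentername
isid commentID
```

If -isid PostID Commentername- fails on the full data, the same commenter has commented more than once on a post, and Commenttime (or commentID, if it can be added to the second file) would have to be part of the key.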

Could you help me figure out how to merge the files, or whether there is another way to do this match?

Thanks very much!


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 PostID str7 Commentername str3 commentID int Commenttime
"AB" "Micheal" "001" 20435
"AB" "Joey"    "002" 20438
"AB" "Skyline" "003" 20440
"AC" "Orcaman" "004" 21994
"AC" "Bricks"  "005" 21996
"AD" "KEY"     "006" 21746
"AD" "Marten"  "007" 21746
"AD" "Joey"    "008" 21747
"AD" "Micheal" "009" 21749
end
format %tdnn/dd/CCYY Commenttime

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 PostID str11 Bloggername int Posttime str7 Commentername int Commenttime byte(Postcategory1 Postcategory2 Postcategory3)
"AB" "James Brick" 20434 "Micheal" 20435 1 0 1
"AB" "James Brick" 20435 "Joey"    20438 1 0 1
"AB" "James Brick" 20436 "Skyline" 20440 1 0 1
"AC" "topstar001"  21994 "Orcaman" 21994 0 0 1
"AC" "topstar002"  21995 "Bricks"  21996 0 0 1
"AD" "Skyline"     21016 "KEY"     21746 1 0 0
"AD" "Skyline"     21017 "Marten"  21746 1 0 0
"AD" "Skyline"     21018 "Joey"    21747 1 0 0
"AD" "Skyline"     21019 "Micheal" 21749 1 0 0
end
format %tdnn/dd/CCYY Posttime
format %tdnn/dd/CCYY Commenttime
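
One possible way to approach the two steps described above, as a sketch rather than a tested solution. It assumes that PostID and Commentername together identify a comment, that the earliest Posttime within a PostID is the post's time (Posttime varies within PostID in the example), and that only the three category variables shown exist. The names PastPostID, PastPosttime, PastPostcategory1-3, before, overlap and anyoverlap are made up for illustration. The idea is to use -joinby- instead of -merge- so that each comment is paired with every post written by that commenter.

```stata
clear
input str2 PostID str7 Commentername str3 commentID int Commenttime
"AB" "Micheal" "001" 20435
"AB" "Joey"    "002" 20438
"AB" "Skyline" "003" 20440
"AC" "Orcaman" "004" 21994
"AC" "Bricks"  "005" 21996
"AD" "KEY"     "006" 21746
"AD" "Marten"  "007" 21746
"AD" "Joey"    "008" 21747
"AD" "Micheal" "009" 21749
end
tempfile comments
save `comments'

clear
input str2 PostID str11 Bloggername int Posttime str7 Commentername int Commenttime byte(Postcategory1 Postcategory2 Postcategory3)
"AB" "James Brick" 20434 "Micheal" 20435 1 0 1
"AB" "James Brick" 20435 "Joey"    20438 1 0 1
"AB" "James Brick" 20436 "Skyline" 20440 1 0 1
"AC" "topstar001"  21994 "Orcaman" 21994 0 0 1
"AC" "topstar002"  21995 "Bricks"  21996 0 0 1
"AD" "Skyline"     21016 "KEY"     21746 1 0 0
"AD" "Skyline"     21017 "Marten"  21746 1 0 0
"AD" "Skyline"     21018 "Joey"    21747 1 0 0
"AD" "Skyline"     21019 "Micheal" 21749 1 0 0
end

* Step 1: attach commentID; PostID + Commentername is unique in both examples
merge 1:1 PostID Commentername using `comments', keepusing(commentID) nogenerate
tempfile focal
save `focal'

* Step 2: build a file of candidate past posts, one row per post
* (assumption: keep the earliest Posttime recorded for each PostID)
bysort PostID (Posttime): keep if _n == 1
keep PostID Bloggername Posttime Postcategory*
rename (PostID Posttime) (PastPostID PastPosttime)
rename Postcategory* PastPostcategory*
rename Bloggername Commentername   // the blogger is matched to the commenter name
tempfile pastposts
save `pastposts'

* Step 3: pair every comment with all posts written by that commenter;
* comments whose author never posted are kept with missing Past* values
use `focal', clear
joinby Commentername using `pastposts', unmatched(master)

* Step 4: flag posts made strictly before the comment, excluding the focal post
generate byte before = PastPosttime < Commenttime & PastPostID != PostID ///
    if !missing(PastPosttime)

* Overlap = focal post and past post share at least one category
generate byte overlap = 0 if before == 1
forvalues k = 1/3 {
    replace overlap = 1 if before == 1 & Postcategory`k' == 1 & PastPostcategory`k' == 1
}

* One summary value per comment: any earlier post in an overlapping category?
bysort commentID: egen byte anyoverlap = max(overlap)
format %tdnn/dd/CCYY Posttime Commenttime PastPosttime
```

In the example data only Skyline appears as both a commenter and a blogger, and Skyline's post AD is dated after the comment on AB, so the example alone does not produce any earlier posts to compare.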