I am using Stata 13.1 and trying to match two datasets in the following logic:
The first dataset contains investments made in companies (variable: company) by individuals (variable: person_id) in a specific year (variable: invest_year).
The goal is to add an additional column (person_company) to this dataset that indicates which company/companies the individual worked for during the respective invest_year.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int person_id str22 company int invest_year 100 "Plx Technology" 1986 100 "Unity Systems" 1986 100 "Kyte" 2006 101 "Artificial Linguistics" 1990 101 "Vontu" 2002 101 "Hyper9" 2009 102 "Power Center" 1996 102 "Freeloader" 1996 102 "CrowdRise Inc." 2014 end
The second dataset contains work data for each person_id:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int(person_id workexp01_start workexp01_end) str3 workexp01_company int(workexp02_start workexp02_end) str2 workexp02_company int(workexp03_start workexp03_end) str3 workexp03_company 100 . . "" 1985 2018 "DA" 1985 2018 "DFJ" 101 . . "" 1988 2013 "VR" . . "" 102 2003 2018 "USV" 1996 2018 "FP" 1987 1991 "EP" end
Let me give an example: person_id 102 invested in "Power Center" in 1996. The second dataset indictaes that person_id 102 in 1996 worked at "FP". Thus in the additional column (person_company) in the first dataset, we would show "FP".
Note that in some cases, the person worked at several companies in parallel, for example: person_id 100 invested in "Kyte" in 2006 and worked at "DA" and "DFJ" during this time. Thus, would probably need to have several addition columns: person_company_1, person_company_2, person_company_3 (the max. are 3 positions in parallel)
I hope the explanation and example help to clarify the issue - would very much appreciate your help! Thank you very much in advance.
0 Response to Matching 2 datasets based on multiple criteria
Post a Comment