With respect to a portion of the code below:
cross using `temp'
I get the following error, r(459): "sum of expand values exceed 2,147,483,620. The dataset may not contain more than 2,147,483,620 observations." I think this is because firms are duplicated. The data record each firm's deal activity by year (1990-2016), so a firm can be listed up to 27 times (in the example below, firm_id = 3 is listed 4 times). There are 373,088 unique firms but 1,210,053 observations in total, and I believe this is where the error is coming from.
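As a rough check on the scale involved: `cross` forms every pairwise combination, so pairing a dataset of N observations with itself yields N^2 rows before anything is dropped. The sketch below only plugs in the two counts quoted above and compares them with the limit from the error message. Both squares exceed that limit, which suggests that removing duplicates would reduce the problem but might not resolve it on its own.

```stata
// Rough size check: crossing a dataset with itself gives N^2 rows.
// The counts are the ones quoted in the post; the limit is from the error text.
display %20.0fc 1210053^2          // all observations crossed with themselves
display %20.0fc 373088^2           // one row per unique firm, crossed
display 373088^2 > 2147483620      // 1 if even the deduplicated cross is too large
```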
Code:
// Example data
clear
input firm_id M_Acq_Naic str2 M_Acq_Reg
1 511210 "JP"
2 236116 "EU"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
4 441110 "AM"
4 441110 "AM"
5 811310 "EU"
6 221119 "EU"
7 813212 "JP"
end
// NAIC codes should be strings, especially for current purposes.
tostring M_Acq_Naic*, replace
// Make a file to pair with itself.
preserve
tempfile temp
rename * *_2  
save `temp'
restore
//
rename * *_1
cross using `temp'  // the workhorse here
//
drop if  (firm_id_1 == firm_id_2) // no self pairs
// Create indicator variables for digit matches on NAIC codes
forval i = 1/6 {
    gen PairSameDigit`i' = substr(M_Acq_Naic_1, `i', 1) == substr(M_Acq_Naic_2, `i', 1)
}
// Drop duplicate firm pairs
gen min = min(firm_id_1, firm_id_2)
gen max = max(firm_id_1, firm_id_2)
bysort min max: keep if _n ==1
drop min max
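One possible way to shrink the input before the `cross` step is to keep a single row per firm first. This is only a sketch: it assumes the NAIC code and region are constant within `firm_id`, which holds in the example data but may not hold across years in the full data. It uses a small subset of the example rows above, and the final line checks whether the squared row count stays under the limit quoted in the error message.

```stata
// Sketch: reduce to one observation per firm before pairing.
// Assumption: M_Acq_Naic and M_Acq_Reg do not vary within firm_id.
clear
input firm_id M_Acq_Naic str2 M_Acq_Reg
1 511210 "JP"
3 451120 "AM"
3 451120 "AM"
4 441110 "AM"
4 441110 "AM"
end
tostring M_Acq_Naic, replace

// Keep the first observation for each firm and discard the repeats.
bysort firm_id: keep if _n == 1

// 1 if a cross of this dataset with itself would fit under the limit.
display _N^2 < 2147483620
```

On the full data, the same check could be run just before `cross`; if it displays 0, deduplication alone is not enough and the pairing would need to be done in smaller pieces.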