With respect to a portion of the code below:
cross using `temp'
I get the following error, r(459): "sum of expand values exceed 2,147,483,620. The dataset may not contain more than 2,147,483,620 observations." I believe this is because there are duplicate firms. The data track each firm's deal activity by year (1990-2016), so a firm can be listed up to 27 times (in the example below, firm_id=3 is listed 4 times). There are 373,088 unique firms but 1,210,053 observations in total.
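A quick size check may help locate the problem. cross produces one observation for every pairing of rows, so the result has N1 x N2 observations. The sketch below simply plugs in the two counts quoted above; the local macro names are made up for illustration.

```stata
// Rough size check: -cross- yields (rows in memory) x (rows in using file).
local n_obs   = 1210053      // total firm-year observations (from the post)
local n_firms = 373088       // unique firms (from the post)
local limit   = 2147483620   // limit quoted in the r(459) message

display %20.0fc `n_obs'^2    // pairs when every row is crossed with every row
display %20.0fc `n_firms'^2  // pairs when only one row per firm is kept
display %20.0fc `limit'
```

Both products are far above the limit (roughly 1.46 trillion and 139 billion pairs), so removing duplicate firms shrinks the problem a lot but would probably not be enough on its own.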
Code:
// Example data
clear
input firm_id M_Acq_Naic str2 M_Acq_Reg
1 511210 "JP"
2 236116 "EU"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
4 441110 "AM"
4 441110 "AM"
5 811310 "EU"
6 221119 "EU"
7 813212 "JP"
end
// NAIC codes should be strings, especially for current purposes.
tostring M_Acq_Naic*, replace
// Make a file to pair with itself.
preserve
tempfile temp
rename * *_2
save `temp'
restore
//
rename * *_1
cross using `temp' // the workhorse here
//
drop if (firm_id_1 == firm_id_2) // no self pairs
// Create indicator variables for digit matches on NAIC codes
forval i = 1/6 {
    gen PairSameDigit`i' = substr(M_Acq_Naic_1,`i',1) == substr(M_Acq_Naic_2,`i', 1)
}
// Drop duplicate firm pairs
gen min = min(firm_id_1, firm_id_2)
gen max = max(firm_id_1, firm_id_2)
bysort min max: keep if _n ==1
drop min max
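One possible adjustment, sketched here on the same example data, is to keep a single row per firm before building the pairs. This assumes M_Acq_Naic and M_Acq_Reg are constant within firm_id, as they are in the example; if they vary by year in the real data, a different rule for choosing the row would be needed. The number of pairs still grows with the square of the firm count, so with hundreds of thousands of firms this step may not by itself bring the result under the observation limit.

```stata
// Sketch: collapse to one row per firm before pairing.
// Assumption: NAIC code and region do not vary within firm_id.
clear
input firm_id M_Acq_Naic str2 M_Acq_Reg
1 511210 "JP"
2 236116 "EU"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
4 441110 "AM"
4 441110 "AM"
5 811310 "EU"
6 221119 "EU"
7 813212 "JP"
end

bysort firm_id: keep if _n == 1   // drop repeated firm-year rows
count                             // firms that would enter -cross-
```

The rest of the code (tostring, preserve/rename/save, cross, and the digit comparisons) can then run on this reduced file unchanged.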