I have a list of firms that I am comparing to each other using their NAIC code (a 6-digit industry ID). I want to compare the NAIC code of each firm_id with that of each firm_id2, digit by digit across the 6-digit code. If a given digit is the same, I call the pair similar at the "X-level."

With respect to a portion of the code below:
cross using `temp'

I get the following error, r(459): "sum of expand values exceed 2,147,483,620. The dataset may not contain more than 2,147,483,620 observations." I think this is because firms are duplicated: the data record each firm's deal activity by year (1990-2016), so a firm may be listed up to 26 times (in the example below, firm_id = 3 is listed 4 times). There are 373,088 unique firms but 1,210,053 observations in total. I believe this is where the error is coming from.
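
For scale, here is a rough check of how many observations cross would try to create, using the counts quoted above. This is only a sketch: it assumes cross pairs every observation in memory with every observation in the saved copy, which is my understanding of the command.

```stata
// Rows implied by crossing the full dataset (1,210,053 obs) with itself
display %20.0fc 1210053^2

// Rows implied even if each of the 373,088 firms appeared only once
display %20.0fc 373088^2

// The observation limit quoted in the r(459) message, for comparison
display %20.0fc 2147483620
```

Both products are far above the limit (roughly 1.46 trillion and 139 billion rows), so the duplicates make the problem worse, but the number of unique firms may already be too large for a full cross.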

Code:
// Example data
clear
input firm_id M_Acq_Naic str2 M_Acq_Reg
1 511210 "JP"
2 236116 "EU"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
4 441110 "AM"
4 441110 "AM"
5 811310 "EU"
6 221119 "EU"
7 813212 "JP"
end

// NAIC codes should be strings, especially for current purposes.
tostring M_Acq_Naic*, replace

// Make a file to pair with itself.
preserve
tempfile temp
rename * *_2  
save `temp'
restore
//
rename * *_1
cross using `temp'  // the workhorse here
//
drop if  (firm_id_1 == firm_id_2) // no self pairs

// Create indicator variables for digit matches on NAIC codes
forval i = 1/6 {
    gen PairSameDigit`i' = substr(M_Acq_Naic_1,`i',1) == substr(M_Acq_Naic_2,`i', 1)
}

// Drop duplicate firm pairs
gen min = min(firm_id_1, firm_id_2)
gen max = max(firm_id_1, firm_id_2)
bysort min max: keep if _n ==1
drop min max
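
One thing that might help is reducing the data to one row per firm before the pairing step, rather than dropping duplicate pairs afterwards. The following is a minimal sketch on a small made-up subset of the example data. It assumes that the NAIC code does not vary within firm_id across years, which should be checked before anything is dropped. It shrinks the input to cross, although the number of pairs can still be very large.

```stata
// Small illustrative subset of the example data (firm-year duplicates)
clear
input firm_id M_Acq_Naic str2 M_Acq_Reg
3 451120 "AM"
3 451120 "AM"
4 441110 "AM"
4 441110 "AM"
5 811310 "EU"
end

// Check the assumption: stops with an error if the NAIC code
// differs across rows of the same firm
bysort firm_id (M_Acq_Naic): assert M_Acq_Naic[1] == M_Acq_Naic[_N]

// Keep a single observation per firm; force is needed because
// only firm_id is used to identify duplicates
duplicates drop firm_id, force

// Number of firms left to pair
count
```
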
Thanks in advance for your help!