I have a question about how accurate a merge is when the key is a string. In dataset A, firm_id is stored as a string. Most values contain only digits; the ones that contain other characters look like the following:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str8 patent_id
"RE43814"
"RE43864"
"RE43868"
"RE43956"
"RE43986"
"RE43997"
"RE44164"
"RE44215"
"RE44861"
"RE44874"
"RE44924"
"RE44930"
"RE44958"
"RE44993"
"RE45248"
"RE45348"
"RE45418"
"RE45473"
"RE45539"
"RE45733"
"RE45782"
"RE45804"
"RE45956"
"RE45962"
"RE45990"
"RE46020"
"RE46089"
"RE46096"
"RE46176"
"RE46193"
"RE46351"
"RE46409"
"RE46436"
"RE46436"
"RE46436"
"RE46473"
"RE46488"
"RE46518"
"RE46558"
"RE46564"
"RE46630"
"RE46686"
"RE46703"
"RE46746"
"RE46850"
"RE46891"
"RE47055"
"RE47257"
"RE47341"
"RE47342"
"RE47351"
"RE47425"
"RE47487"
"RE47553"
"RE47663"
"RE47698"
"RE47715"
"RE47736"
"RE47737"
"RE47761"
"RE47763"
"RE47813"
"RE47857"
"RE47949"
"RE48267"
"RE48274"
"RE48308"
"RE48359"
"RE48378"
"RE48446"
"RE48524"
"RE48532"
"RE48599"
"RE48641"
"RE48695"
"RE48702"
"T100501"
"T958006"
"T962010"
"T964006"
"T965001"
"T988005"
end
Now I want to merge the datasets using firm_id as the key. I have 2 options:
1. Convert the str key to long
2. Convert the long key to str
For the 1st option, I think merging on a long key will be more accurate, but I would have to drop the firms whose IDs contain non-digit characters. I also don't know how to test which IDs contain such characters so that I can drop them.
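One possible way to test for non-digit characters is sketched below. It uses a small made-up dataset in place of dataset A, and the flag variable has_char is a name chosen for illustration only. regexm() with the pattern "[^0-9]" returns 1 for any value that contains a character other than a digit, which also catches stray spaces.

```stata
* Hypothetical toy data standing in for dataset A (firm_id is a string key).
clear
input str8 firm_id
"1234567"
"RE43814"
"T100501"
"7654321"
end

* Flag IDs that contain at least one character that is not a digit.
generate byte has_char = regexm(firm_id, "[^0-9]")

* Inspect the flagged IDs before deciding to drop anything.
list firm_id if has_char

* Option 1: drop the flagged IDs, then convert the remaining IDs to numeric.
drop if has_char
drop has_char
destring firm_id, replace
```

Listing the flagged values first is probably worth the extra step, because it shows exactly which observations option 1 would give up.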
For the 2nd option, I don't need to give up any observations, but I wonder how accurate a merge on a str key is. Will it match correctly when most of the IDs contain only digits?
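A minimal sketch of option 2 follows, again with made-up data: the second dataset, the variable rd_spend, and the tempfile name b are assumptions for illustration. String keys are compared character by character, so a merge on a str key matches exactly; the main risk is the conversion step, since "00123" and "123" are different strings and so are values with leading or trailing blanks. Here tostring is given an explicit integer format so the long IDs are written out as plain digits.

```stata
* Hypothetical dataset B: firm_id stored as long, plus one made-up variable.
clear
input long firm_id double rd_spend
1234567 10.5
7654321 20.25
end

* Convert the numeric key to string with an explicit integer format.
tostring firm_id, replace format(%12.0f)
tempfile b
save `b'

* Hypothetical dataset A: firm_id already stored as a string.
clear
input str8 firm_id
"1234567"
"RE43814"
"7654321"
end

* Merge on the string key and check which observations matched.
merge 1:1 firm_id using `b'
tabulate _merge
```

In this toy example the two all-digit IDs match and "RE43814" stays as master-only, so no observations are dropped. The 1:1 here assumes the key is unique in both files; if an ID repeats in one of them, m:1 or 1:m would be the merge type to consider instead.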
Thanks!