Dear all,

My data has about 200,000 observations and it has two distinct ids (Rank and NCTId), Date, Location Country, Location Zip, and other variables. I have interest in counting "United States" zip codes by year.

I think the cases with only one US zip code is no problem. But some observations have multiple locations as seen below. There are quite various cases: US, US|US... (divided by '|'), US|US|other country (or multiple countries; in some cases, more than 10 countries are combined). In those cases, I would like to separate or split the line into additional lines with separate zip codes.

For example, the first line in the below data would be divided into two separate lines with different (Location country and) Location Zip while sharing other variables such as Rank, NCTId, OrgStudyId, Date, and etc. Another example, I would like to make the fourth line into separate two lines of US and Canada zip codes. Then, I could count the US zip codes.

Thank you.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 Rank str11 NCTId str32 OrgStudyId str18 StudyFirstPostDate strL(LocationCountry2 LocationZip)
"364383" "NCT00002293" "026B"              "August 31, 2001"   "United States|United States" "35233|085434000"   
"199477" "NCT02142465" "LAZ-012-0002SA"    "May 20, 2014"      ""                            ""                  
"206224" "NCT02054260" "SCHBC_IRB_2013-40" "February 4, 2014"  ""                            "420-767"           
"342857" "NCT00253617" "AXCAN-PHOCCC04-01" "November 15, 2005" "United States|Canada"        "90095-1781|J3H 6C4"
"255087" "NCT01414270" "1108-V19_PT03-01"  "August 11, 2011"   ""                            ""                  
"262783" "NCT01313195" "C-10-028"          "March 11, 2011"    ""                            ""                  
"208166" "NCT02028897" "SAIRB-13-0040"     "January 7, 2014"   "United States"               "89117"             
end