Dear Statalist community,

As part of my research, I am trying to merge two data sets (WERS and ASHE) using firms' unique identifiers (IDBR numbers) in combination with their postcodes to establish a unique workplace identifier. The problem is that UK postcodes are not constructed in a completely systematic way. They range from 5 to 7 characters, not counting the space. Before the space, they sometimes have one letter followed by one digit, sometimes one letter followed by two digits, sometimes two letters followed by two digits, and sometimes two letters, a digit and a further letter.

Examples:

G2 5HN
B37 5TT
BS14 0TJ
EC3A 2BE

I have followed the discussion and comments on this topic (https://www.stata.com/statalist/arch.../msg00144.html). Unfortunately, deleting the space did not change the type of the variable: it is still a string variable, and I cannot combine it with the IDBR number to generate the unique identifier. I always receive an error message saying that the two variables mismatch, even when using: . gen identifier = IDBR + string(postcode,"%02.0f").
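
To make the goal concrete, here is a minimal sketch of the kind of result I am after. It assumes that IDBR is stored as a numeric variable and postcode as a string variable. The names postcode_clean, identifier and workplace_id are placeholders, and the %12.0f format is only a guess at the width of the IDBR number. Under those assumptions, string() would be applied to the numeric IDBR rather than to postcode, since string() converts numbers to strings and postcode is already a string.

```stata
* Sketch only: assumes IDBR is numeric and postcode is a string variable.
* postcode_clean, identifier and workplace_id are hypothetical names.

* Remove all spaces and force upper case so that the same postcode
* is always written the same way.
generate postcode_clean = upper(subinstr(postcode, " ", "", .))

* Convert the numeric IDBR to a string, then concatenate.
* The %12.0f format is an assumption about the length of the IDBR number.
generate identifier = string(IDBR, "%12.0f") + postcode_clean

* Alternative that does not depend on the storage types:
* one integer code per distinct IDBR-postcode combination.
egen workplace_id = group(IDBR postcode_clean)
```

If IDBR turns out to be a string variable as well, the second command would simply be generate identifier = IDBR + postcode_clean.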

Any help is greatly appreciated.

Felix