Hello,

I have a dataset from the UK Land Registry containing thousands of housing transactions. I have the postcode variable for each house and have created a postcode area variable. I now need to create two new variables: postcode district and postcode sector.

Just to be clear, UK postcodes take this form: "The first one or two letters indicate the postcode area, followed by one or two digits signifying a district within that area. This is followed by a space and then a number denoting a sector within said district, and finally by two letters which are allocated to streets or sides of a street."

I have managed to create the postcode area variable using:

generate area = substr(postcode, 1, 2 - inrange(substr(postcode, 2, 1), "0", "9"))

This gives, for example, "SE" and "E" from "SE25 6AS" and "E7 9NB" respectively.

I now need to create two more variables from the original postcode variable: postcode district and postcode sector (e.g. from "SE25 6AS", I would need to obtain "SE25" and "SE25 6" respectively). I do not really know how to go about this, as UK postcodes vary in length from 5 to 7 characters (not counting the space).
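
To make the target concrete, here is a minimal sketch of one possible approach. It assumes every postcode contains exactly one space, placed just before the sector digit, as in the format quoted above. The variable names district and sector are placeholders, and the two rows are the example postcodes from this post.

```stata
* Toy data: the two example postcodes from the post
clear
input str8 postcode
"SE25 6AS"
"E7 9NB"
end

* Assumption: exactly one space separates the district from the sector digit
generate space_pos = strpos(postcode, " ")

* District: everything before the space
generate district = substr(postcode, 1, space_pos - 1)

* Sector: district, the space, and the single digit that follows it
generate sector = substr(postcode, 1, space_pos + 1)

list postcode district sector
```

Locating the space with strpos() avoids depending on the overall length of the postcode. Rows with no space would have space_pos equal to 0 and would need separate handling.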

Thank you