Good evening - I am appending / pooling recent surveys from a number of countries. I realized that the lengths of the key stratification variables, idhspsu and idhsstrata, differ between countries.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double sample str20 samplestr double(country idhspsu idhsstrata v021 v022)
 2401 " 2401"  24  2401000092  240100027  92 27
10803 "10803" 108 10803000035 1080300022  35 22
77701 "77701" 777    77701151     777018 151  8
99901 "99901" 999     9990140     999015  40  5
66601 "66601" 666    66601330    6660111 330 11
end
label values sample sample_lbl
label def sample_lbl 2401 "Angola 2015", modify
label def sample_lbl 10803 "Burundi 2016", modify
label def sample_lbl 66601 "Togo 2013", modify
label def sample_lbl 77701 "SierraLeone 2013", modify
label def sample_lbl 99901 "Gabon 2012", modify
label values country country_lbl
label def country_lbl 24 "Angola", modify
label def country_lbl 108 "Burundi", modify
label def country_lbl 666 "Togo", modify
label def country_lbl 777 "SierraLeone", modify
label def country_lbl 999 "Gabon", modify


According to the data documentation,

idhspsu is an 11-digit variable created by combining samplestr + v021
idhsstrata is a 10-digit variable created by combining samplestr + v022

Note:
v021 is a 6-digit variable
v022 is a 5-digit variable, while

samplestr is a 5-digit variable for almost all countries - there are a few (such as Angola) with only 4 digits. To get idhspsu and idhsstrata with the required length, v021 and v022 are then padded with leading 0s.

The problem is that some of the resulting idhspsu and idhsstrata values have different lengths. Take Angola, for example: its idhspsu values have only 10 digits (instead of 11), while its idhsstrata values have 9 digits (instead of 10). I believe this happens because too few 0s were padded onto v021 and v022, which may in turn be related to the fact that Angola's samplestr has 4 digits instead of 5.
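The length differences can be tabulated directly. This is a minimal sketch that assumes the -dataex- example above is in memory; len_psu and len_strata are hypothetical names introduced only for this check.

```stata
* Sketch: count the digits of the existing identifiers, by sample.
* len_psu and len_strata are hypothetical helper variables.
* string() with a %15.0f format writes the full integer without
* scientific notation, so strlen() returns the number of digits.
gen byte len_psu    = strlen(string(idhspsu, "%15.0f"))
gen byte len_strata = strlen(string(idhsstrata, "%15.0f"))

* One row per sample, one column per observed length
tabulate sample len_psu
tabulate sample len_strata
```

Any sample whose row falls outside the 11-digit (idhspsu) or 10-digit (idhsstrata) column is one that needs rebuilding.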

So, to resolve these problems, I want to recreate the idhspsu and idhsstrata variables. This would involve:

1. checking first whether samplestr is a 5-digit variable and, if not, padding its end with a 0 to make it a 5-digit variable.
2. checking whether v021 and v022 are of the required length and, if not, padding each of them with a sufficient number of leading zeros before combining them with the samplestr variable.

This command
Code:
gen str6 v021s = string(v021, "%06.0f")
gen newpsu = samplestr + v021s
generates the new v021s variable and the corresponding psu variable, but I am not sure how to check whether samplestr, v021 and v022 have the required number of characters before padding them with zeros.
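A minimal sketch of the two steps, again assuming the -dataex- example above is in memory. The names samplestr5, v021s, v022s, newpsu and newstrata are hypothetical. The sketch also assumes that samplestr may carry leading blanks (as in the listing) and that a short samplestr is padded at its end with "0", as described in step 1. A zero-padded format such as %06.0f adds leading zeros only to values that are shorter than the requested width, so a separate length check before padding may not be needed.

```stata
* Sketch only: all new variable names below are hypothetical.
capture drop v021s newpsu   // in case the earlier attempt is still in memory

* Step 1: strip blanks, then pad samplestr at its end to 5 characters.
* strtrim() needs Stata 14 or later; older versions use trim().
gen samplestr5 = strtrim(samplestr)
replace samplestr5 = samplestr5 + substr("00000", 1, 5 - strlen(samplestr5)) ///
    if strlen(samplestr5) < 5

* Step 2: zero-padded formats add leading zeros only where needed.
gen v021s = string(v021, "%06.0f")
gen v022s = string(v022, "%05.0f")

* Combine the pieces into the new identifiers.
gen newpsu    = samplestr5 + v021s
gen newstrata = samplestr5 + v022s

* Stop with an error if any identifier has the wrong length
* (for example, because of missing or over-long inputs).
assert strlen(newpsu) == 11
assert strlen(newstrata) == 10
```

The new identifiers are kept as strings here; whether they should be converted back to numeric, and whether trailing padding of samplestr is the right choice for the 4-digit samples, are left open.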

I would appreciate some assistance in this regard.


Thanks - cY