Hi,
I am working with the National Longitudinal Survey of Youth 1997 (NLSY97). Each variable in the yempid set encodes the year in which an employer was first recorded, followed by the order (loop) number of that employer within that year. For example, a value of 201102 in a given survey year means that this employer was first picked up in 2011 as employer 02.

I would like to create two variables from each yempid variable: one for the year and one for the employer loop. I can then match these to another variable that identifies whether the employer is in the public or private sector. That variable is captured only once, when the employer is first recorded, which is why I am trying to separate out the year and loop in which each employer was first recorded.

I first used tostring and then created a variable holding the length of each yempid value, since there are both str6 and str4 variables. Then I tried extracting the first four or the first two characters, depending on that length. The problem is that this appears to wipe out all values in the yempid variables and in the new variables that are supposed to contain the year.

I am not sure whether converting the variables to strings, rather than keeping them numeric, is causing this problem. In the following data example I have omitted the public vs. private sector variables, but I am happy to provide them if that would help.

Thanks in advance for your help on this.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4(yempid_1997 yempid_1998) str6(yempid_1999 yempid_2000 yempid_2001 yempid_2002 yempid_2003 yempid_2004 yempid_2005 yempid_2006 yempid_2007 yempid_2008 yempid_2009 yempid_2010 yempid_2011 yempid_2013 yempid_2015 yempid_2017)
"9701" "9701" "199902" "199902" "199902" "199902" "200103" "200103" "200103" "200103" "200702" "200702" "200702" "200702" "200702" "200702" "200702" "."     
"."    "."    "."      "200002" "200102" "200202" "200301" "200402" "200402" "."      "."      "200802" "200802" "200802" "200802" "200802" "200802" "200802"
"."    "."    "199901" "200001" "200001" "200001" "200001" "200402" "."      "."      "."      "."      "200903" "."      "201102" "201102" "."      "201703"
"."    "9801" "9801"   "200001" "200104" "200202" "200202" "200202" "200502" "200502" "200502" "200802" "200802" "200802" "201102" "200802" "200802" "200802"
"."    "."    "199901" "200001" "."      "200201" "200201" "200201" "200402" "200602" "200602" "200602" "200602" "200602" "200602" "200602" "200602" "200602"
"."    "."    "."      "200002" "200104" "200203" "200202" "200402" "200402" "200402" "200402" "200402" "200402" "200402" "200402" "201302" "201302" "201701"
"."    "."    "."      "."      "200101" "200201" "."      "."      "200501" "200601" "."      "."      "200904" "200904" "."      "."      "201502" "."     
"9701" "9701" "9701"   "9701"   "200102" "200201" "200201" "200302" "200502" "200502" "200502" "."      "200901" "."      "201102" "."      "201503" "."     
"."    "."    "."      "."      "200101" "200201" "200201" "200201" "200501" "200601" "200601" "200601" "200601" "200601" "201102" "201102" "201102" "201702"
"."    "."    "."      "."      "."      "."      "."      "."      "."      "200601" "200702" "200702" "200702" "."      "."      "."      "."      "201703"
end
* tostring needs either generate() or replace; variables that are already string are left unchanged
tostring yempid_1997 yempid_1998 yempid_1999 yempid_2000 yempid_2001 yempid_2002 yempid_2003 yempid_2004 yempid_2005 yempid_2006 yempid_2007 yempid_2008 yempid_2009 yempid_2010 yempid_2011 yempid_2013 yempid_2015 yempid_2017, replace

local employerid "yempid_1997 yempid_1998 yempid_1999 yempid_2000 yempid_2001 yempid_2002 yempid_2003 yempid_2004 yempid_2005 yempid_2006 yempid_2007 yempid_2008 yempid_2009 yempid_2010 yempid_2011 yempid_2013 yempid_2015 yempid_2017"


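* Record the string length of each ID (4 for two-digit years, 6 for four-digit years)
foreach x of local employerid {
    gen len`x' = strlen(`x')
}

* Year: first four characters of six-character IDs, first two of four-character IDs
foreach x of local employerid {
    gen uidyear`x' = substr(`x',1,4) if len`x' == 6
    replace uidyear`x' = substr(`x',1,2) if len`x' == 4
}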
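
For comparison, a numeric route that avoids the string-length step might look like the sketch below. It is only an illustration on two toy rows, not a tested solution for the full NLSY97 extract. The names num_, yr_ and loop_ are made up for this example, and it assumes that every two-digit year belongs to the 1900s.

```stata
* Sketch only: toy data; num_, yr_ and loop_ are hypothetical names
clear
input str6(yempid_1998 yempid_2011)
"9801" "201102"
"."    "200802"
end

foreach x of varlist yempid_* {
    local s = substr("`x'", 8, .)                    // survey-year suffix, e.g. 1998
    gen double num_`s' = real(`x')                   // "." becomes numeric missing
    gen yr_`s' = floor(num_`s' / 100)                // 9801 -> 98, 201102 -> 2011
    replace yr_`s' = yr_`s' + 1900 if yr_`s' < 100   // assumption: two-digit years are 19xx
    gen loop_`s' = mod(num_`s', 100)                 // last two digits are the employer loop
    drop num_`s'
}

list
```

Because the year and loop come from integer division and the remainder, the same two lines handle both the four-digit and the six-digit IDs, and missing values stay missing.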


Many thanks
Karen