Hi all,

I am trying to create a variable that identifies whether a person's second-to-last work history entry was employment (as opposed to being unemployed or abroad). The problem I am running into is that I do not have a traditional panel data set: each person's work history entries are identified by their start dates, which differ across people over a period of 30 years. I want to identify each person's most recent and second most recent work history entries using the lag operator (L.).

So far, I have sorted my data by worker id (fwid) and work history entry start date (start_date), and I have generated a variable that numbers each person's work history entries as a running sum of ones (called file_num) within each individual, after sorting. My idea was to use xtset with the worker's id as the panel variable and file_num as the time variable. Then I was going to identify the maximum value of file_num and use the L. operator to identify the second-to-last entry for each person.

The weird thing is that when I run this exact same code from start to finish, my variable of interest (separated_to_employv2) winds up with different summary statistics every time, and I cannot figure out why. Any help you can provide would be greatly appreciated, as this issue is preventing me from replicating my regression results when I re-run the code.


use "C:\Users\Zach\Dropbox\H-2A\Generated Data Files\NAWS Workgrid with Main File Merged 1989-2018.dta", clear
rename *, lower
encode c06, gen(work_type)
gen abroad = work_type==1
gen farm_work = work_type==2
gen non_farm_work = work_type==3
gen non_employed = work_type==4
gen file_num = .
gen ones = 1
sort fwid start_date
by fwid: replace file_num = sum(ones)
xtset fwid file_num
gen separated_to_employv2 = (l.farm_work==1 | l.non_farm_work==1) & l.end_date<start_date
sum separated_to_employv2, d

Here are the summary stats for the variable separated_to_employv2 from two separate runs of the code from start to finish. Note the difference in the means.

[Two attached screenshots of the summary output, one per run; images not available]
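
One possible cause that may be worth ruling out, assuming some workers have more than one entry with the same start_date: sort breaks ties in an unspecified order, so a counter built after sorting on fwid and start_date alone could number the tied rows differently from run to run. A minimal sketch of how that might be checked, on made-up toy data (the values and the n_ties variable are hypothetical and not taken from the NAWS file):

```stata
* Toy data (hypothetical values): worker 1 has two entries with the same start_date
clear
input fwid start_date farm_work
1 100 1
1 200 0
1 200 1
2 150 1
2 300 0
end

* Report how many (fwid, start_date) combinations appear more than once
duplicates report fwid start_date

* Flag the tied rows; n_ties is a name made up for this sketch
bysort fwid start_date: gen n_ties = _N
count if n_ties > 1

* The stable option keeps tied rows in the order they already have,
* so the entry counter built afterwards is the same on every run
sort fwid start_date, stable
by fwid: gen file_num = _n
list, sepby(fwid)
```

If ties do turn up, adding a tie-breaking variable to the sort (for example end_date, if it distinguishes the tied entries) would be an alternative to relying on the stable option.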