Hello, I am trying to generate a variable based on the lagged values of other variables, where the time variable is the individual's work history entry number in the data set rather than a genuine time variable. In my data, people are identified by the variable fwid, and most people have multiple work history entries (some have only one). Each person with multiple entries has a different start date for each work entry (farm work, non-farm work, non-employed, and abroad).

I am trying to generate a variable that numbers each individual's work history entries consecutively by start date, so that I can use the L. operator to pull values from the previous work history entry. My goal is to identify the most recent work history entry for each person and determine whether they were employed (in either farm work or non-farm work) in their second most recent entry in the database. So far, I have sorted the data by worker id and start date and generated a time variable called "file_num", set equal to the running sum, within each worker, of a variable that consists only of ones.

However, every time I run this code, I get a different mean for the variable I am interested in (separated_to_employv2), so my regression coefficients are always slightly different, although they are usually very close to each other. Can someone tell me what I am doing wrong or help me resolve this? The summary statistics for two separate runs of the exact same code are shown below; note the different means. I am not sure why I get different summary statistics for this variable when I run the exact same code multiple times. Any help would be greatly appreciated. Thanks in advance.
use "C:\Users\Zach\Dropbox\H-2A\Generated Data Files\NAWS Workgrid with Main File Merged 1989-2018.dta", clear
rename *, lower
encode c06, gen(work_type)
gen abroad = work_type==1
gen farm_work = work_type==2
gen non_farm_work = work_type==3
gen non_employed = work_type==4
gen start_date = c09a
gen end_date = c09b
gen file_num = .
gen ones = 1
sort fwid start_date
by fwid: replace file_num = sum(ones)
xtset fwid file_num
gen separated_to_employv2 = (l.farm_work==1 | l.non_farm_work==1) & l.end_date<start_date
sum separated_to_employv2, d
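One possible explanation, offered only as a sketch: if some workers have two or more entries with the same start_date, then "sort fwid start_date" does not fully determine the order of those entries, because Stata orders tied observations randomly unless the stable option is given. In that case file_num, and therefore every L. lag, could differ from run to run. The lines below check for such ties and build a reproducible entry number. Using end_date as a tiebreaker and the name file_num2 are assumptions made for illustration, not part of the original code.

```stata
* Report how many fwid/start_date combinations appear more than once
duplicates report fwid start_date

* isid errors out if fwid and start_date do not uniquely identify observations;
* capture noisily shows the message without stopping the do-file
capture noisily isid fwid start_date

* Reproducible ordering: break ties with end_date (assumed tiebreaker),
* and keep the existing order for any ties that still remain
sort fwid start_date end_date, stable
by fwid: gen file_num2 = _n

* Declare the panel using the reproducible entry number (hypothetical name)
xtset fwid file_num2
```

For this to be reproducible, the stable sort would need to replace the earlier "sort fwid start_date" rather than follow it, since the stable option only preserves whatever order the data are already in. If the duplicates report shows no ties, the cause lies elsewhere.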
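[Attachment: summary statistics for separated_to_employv2, first run]
[Attachment: summary statistics for separated_to_employv2, second run]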