Hi All,

I am trying to create a variable that identifies whether a person's second-to-last work history entry was employment (as opposed to being unemployed or abroad). The problem is that I do not have a traditional panel data set: each person's work history entries are identified by their start dates, which fall at different points over a 30-year period. I want to identify each person's most recent and second most recent work history entries using the L. (lag) operator.

So far, I have sorted my data by worker id (fwid) and work history entry start date (start_date), and I have generated an entry number for each individual as a running sum of ones (file_num) after sorting. My idea was to use xtset with fwid as the panel variable and file_num as the time variable, then find the maximum value of file_num for each person and use the L. operator to identify the second-to-last entry.

The weird thing is that when I run exactly the same code from start to finish, my variable of interest (separated_to_employv2) winds up with different summary statistics every time, and I cannot figure out why. Any help resolving this would be greatly appreciated, as it is preventing me from replicating my regression results when I re-run the code.
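* Load the merged work-history file and lower-case all variable names
use "C:\Users\Zach\Dropbox\H-2A\Generated Data Files\NAWS Workgrid with Main File Merged 1989-2018.dta", clear
rename *, lower
* Encode the work-type string (c06) and build one indicator per category
encode c06, gen(work_type)
gen abroad = work_type==1
gen farm_work = work_type==2
gen non_farm_work = work_type==3
gen non_employed = work_type==4
* Number each worker's entries in start-date order
gen file_num = .
gen ones = 1
sort fwid start_date
by fwid: replace file_num = sum(ones)
* Declare the panel, then flag entries whose previous entry was employment that ended before this one began
xtset fwid file_num
gen separated_to_employv2 = (l.farm_work==1 | l.non_farm_work==1) & l.end_date<start_date
sum separated_to_employv2, d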
Here are the summary stats for the variable separated_to_employv2 from two separate runs of the code from start to finish. Note the difference in the means.
[Two summary-statistics outputs were attached here as images; they are not reproduced.]
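One possible explanation, offered as a guess rather than a diagnosis: if fwid and start_date do not uniquely identify observations, sort orders the tied rows randomly, so file_num (and therefore every L. value) can differ from run to run. The sketch below uses made-up toy data, not the NAWS file, to show how one might check for ties and break them deterministically. Using end_date as the extra sort key is an assumption; in the real data a different or additional key may be needed.

```stata
* Toy data (hypothetical values): worker 1 has two entries with the same start_date
clear
input fwid start_date end_date farm_work non_farm_work
1 100 150 1 0
1 200 250 0 0
1 200 260 1 0
1 300 350 0 1
2 100 180 0 1
2 200 240 1 0
end

* Check whether fwid and start_date uniquely identify observations
duplicates report fwid start_date
duplicates tag fwid start_date, gen(tie)
list if tie > 0, sepby(fwid)

* Confirm that the extended key is unique (isid exits with an error if it is not)
* end_date as a tie-breaker is an assumption made for this toy example
isid fwid start_date end_date

* Number the entries within worker using the full, unique sort key
bysort fwid (start_date end_date): gen file_num = _n

* Same definition as in the code above, now built on a reproducible ordering
xtset fwid file_num
gen separated_to_employv2 = (L.farm_work==1 | L.non_farm_work==1) & L.end_date<start_date
sum separated_to_employv2, d
```

In this toy data, swapping the order of worker 1's two tied rows changes how many observations are flagged, which is the kind of run-to-run difference described above. If no natural tie-breaker exists, sort fwid start_date, stable keeps tied rows in their existing order, which is reproducible only as long as the incoming order of the data is itself the same on every run.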