Hi All, I am trying to make a variable that identifies whether a person's second-to-last work history entry was employment (as opposed to being unemployed or abroad). The problem I am running into is that I do not have a traditional panel data set: each person's work history entries are identified by different start dates over a period of 30 years. I want to identify each person's most recent work history entry and their second most recent work history entry using the L. (lag) operator.
So far, I have sorted my data by worker id (fwid) and work history entry start date, and I have generated a variable that numbers each person's work history entries as a running sum of ones (called file_num) within each individual, after sorting the data. My idea was to use the xtset command with the worker's id as the panel variable and file_num as the time variable. Then I was going to identify the max value of file_num and use the L. operator to identify the second-to-last entry for each person.
The weird thing is that when I run this exact same code from start to finish, my variable of interest (separated_to_employv2) winds up with different summary statistics every time, and I cannot figure out why. Any help you can provide would be greatly appreciated, as this issue is preventing me from replicating my regression results when I re-run the code.
use "C:\Users\Zach\Dropbox\H-2A\Generated Data Files\NAWS Workgrid with Main File Merged 1989-2018.dta", clear
rename *, lower
encode c06, gen(work_type)
gen abroad = work_type==1
gen farm_work = work_type==2
gen non_farm_work = work_type==3
gen non_employed = work_type==4
gen file_num = .
gen ones = 1
sort fwid start_date
by fwid: replace file_num = sum(ones)
xtset fwid file_num
gen separated_to_employv2 = (l.farm_work==1 | l.non_farm_work==1) & l.end_date<start_date
sum separated_to_employv2, d
Here are the summary stats for the variable "separated_to_employv2" from two separate runs of the code from start to finish. Note the difference in the means.
[Two screenshots of the summary output, one per run; images not preserved.]
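One possible cause worth checking is ties in the sort key. If a worker has two or more entries with the same start_date, `sort fwid start_date` may order the tied rows differently on each run, so file_num (and anything built on it with L.) can change between runs. The following is a minimal sketch of that check, not a confirmed diagnosis: the toy data are hypothetical, and using end_date as a tie-breaker is an assumption about what would uniquely order entries in the real data.

```stata
* Hypothetical toy data: worker 1 has two entries with the same start_date
clear
input fwid start_date end_date farm_work
1 100 120 1
1 130 150 0
1 130 160 1
2 200 210 1
2 220 230 0
end

* Report whether fwid and start_date uniquely identify observations
duplicates report fwid start_date

* Tag and inspect the tied entries (tie > 0 means the key is shared)
duplicates tag fwid start_date, generate(tie)
list fwid start_date end_date farm_work if tie > 0

* Break ties with an extra key (assumption: end_date differs among tied rows)
sort fwid start_date end_date

* Entry number within worker; equivalent to the running sum of ones
by fwid: gen file_num = _n

* Stored on each worker's last entry: was the second-to-last entry farm work?
by fwid: gen byte prev_farm = farm_work[_N-1] == 1 if _n == _N & _N > 1
```

If `duplicates report` shows no surplus observations in the real data, ties are not the explanation and the cause lies elsewhere. If ties remain even after adding end_date, more keys would be needed before the ordering is reproducible.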