I want to study the impact of a migrant's language proficiency and of their best friend's origin on how long it takes to find the first job after migration. I'm using Stata 14.2 and have panel data from two waves, and I want to do a survival analysis using Cox regression.
My variables (among others) are:
ACT – employment status
WK_RC – whether the respondent ever worked in the receiving country (RC) after migration
IMDATE_op – date of migration, only asked in wave 1
JBSTART_RC_op – date of job start in RC, only asked in wave 1
CURRJBSTART_op – date of job start in RC, only asked in wave 2
SAMEJB – whether the job reported in CURRJBSTART_op is the same as the one in JBSTART_RC_op, only asked in wave 2
FR1CB – background of best friend
LRCSPK – RC language proficiency
What I have done so far:
use datawave1
append using datawave2
sort ID wave
Then I excluded persons who dropped out in wave 2:
egen occurences=count(_n), by(ID)
drop if occurences < 2
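A sketch of a more direct way to keep only respondents observed in both waves, using `_N` (the number of observations in each `by` group) and ending with an `assert` as a check. The check may be worthwhile here because the -dataex- listing below still shows several ids with a single wave (for example 4, 13, 18, 24 and 26). The tiny dataset and the IDs "A" to "C" are made up for illustration.

```stata
* Made-up example: respondent "C" took part in wave 1 only
clear
input str1 ID byte wave
"A" 1
"A" 2
"B" 1
"B" 2
"C" 1
end

* _N is the number of observations within each ID group
bysort ID (wave): drop if _N < 2

* Check: every remaining respondent contributes exactly two waves
bysort ID: assert _N == 2
```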
Because ID is a string variable, I created a new numeric identifier:
egen id= group(ID)
list id
Now the data looks like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id byte(wave ACT WK_RC) str23(IMDATE_op JBSTART_RC_op CURRJBSTART_op) byte(SAMEJB FR1CB LRCSPK)
 2 1 2   2 "10/2009" "-99 (filtered)" ""                 .   1   3
 3 1 1 -99 "10/2009" "05/2010"        ""                 .   1   2
 3 2 1 -99 ""        ""               "04/2011"          2   1   2
 4 1 2   2 "05/2010" "-99 (filtered)" ""                 .   1   3
 8 1 1 -99 "10/2009" "10/2009"        ""                 .   1   4
 8 2 1 -99 ""        ""               "11/2009"          1   1   4
 9 1 1 -99 "10/2009" "10/2009"        ""                 .   1   1
 9 2 1 -99 ""        ""               "11/2011"          2   1   1
11 1 1 -99 "07/2010" "06/2010"        ""                 . -99   3
11 2 1 -99 ""        ""               "07/2010"          1   2   3
12 1 1 -99 "03/2010" "04/2010"        ""                 .   1   4
12 2 1 -99 ""        ""               "04/2009"          1   3   3
13 1 1 -99 "06/2010" "06/2010"        ""                 . -99   3
14 1 1 -99 "01/2010" "01/2010"        ""                 .   2   2
14 2 1 -99 ""        ""               "10/2010"          1   1   2
16 1 1 -99 "04/2010" "04/2010"        ""                 .   1   3
16 2 1 -99 ""        ""               "12/2010"          1   2   3
17 1 1 -99 "10/2009" "12/2009"        ""                 . -99   3
17 2 1 -99 ""        ""               "-52/2010"         1   1   2
18 2 1 -99 ""        ""               "-99 (filtered)" -99 -99 -99
20 1 1 -99 "12/2006" "02/2007"        ""                 .   2   2
20 2 1 -99 ""        ""               "02/2009"          1   1   2
21 1 1 -99 "08/2010" "08/2010"        ""                 .   2   2
21 2 1 -99 ""        ""               "09/2010"          1   1   3
22 1 1 -99 "08/2010" "08/2010"        ""                 .   2   1
22 2 1 -99 ""        ""               "08/2007"          1   2   1
23 1 1 -99 "10/2009" "-99 (filtered)" ""                 . -99   3
23 2 1 -99 ""        ""               "-99 (filtered)" -99 -99 -99
24 1 1 -99 "09/2009" "02/2010"        ""                 .   1   2
25 1 1 -99 "10/2009" "04/2010"        ""                 .   1   3
25 2 1 -99 ""        ""               "10/2012"          1   1   3
26 2 1 -99 ""        ""               "03/2012"          2 -99   1
28 1 2   1 "04/2010" "09/2010"        ""                 . -99   2
28 2 1 -99 ""        ""               "07/2011"          2   1   2
29 1 1 -99 "01/2010" "02/2010"        ""                 .   1   3
29 2 1 -99 ""        ""               "02/2010"          1   1   3
30 1 2   2 "09/2010" "-99 (filtered)" ""                 . -99   2
30 2 1 -99 ""        ""               "10/2011"          1   1   1
32 1 2   2 "08/2010" "-99 (filtered)" ""                 .   1   2
32 2 2   1 ""        ""               "-52/2010"         1   1   2
end
label values ACT Con38
label def Con38 1 "working", modify
label def Con38 2 "unemployed", modify
label values WK_RC Con3
label def Con3 -99 "filtered", modify
label def Con3 1 "yes", modify
label def Con3 2 "no", modify
label values SAMEJB Con3_7
label def Con3_7 -99 "filtered", modify
label def Con3_7 1 "yes", modify
label def Con3_7 2 "no", modify
label values FR1CB Con4
label def Con4 -99 "filtered", modify
label def Con4 1 "[in CO]", modify
label def Con4 2 "[RC]", modify
label def Con4 3 "other", modify
label values LRCSPK Con19
label def Con19 -99 "filtered", modify
label def Con19 1 "very well", modify
label def Con19 2 "well", modify
label def Con19 3 "not well", modify
label def Con19 4 "not at all", modify
So far I have not recoded answers such as "don't know" or "refused" to missing (.), because whenever a question was asked in only one wave, its values in the other wave are already coded as missing (.), and I was afraid I would mix things up if I also recoded the genuine missing answers.
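One way to keep the two kinds of missingness apart, sketched below: besides the system missing value (.), Stata has the extended missing values .a to .z, so the -99 codes can be turned into .a with mvdecode while the . for "not asked in this wave" stays untouched. Both count as missing for missing() and are excluded from estimation, but they can still be told apart with `== .` and `== .a`. The variable names come from the post; the three rows are invented.

```stata
* Made-up rows: -99 = filtered / no answer, . = not asked in this wave
clear
input byte(wave SAMEJB FR1CB)
1   . -99
2   1   1
2 -99 -99
end

* Turn the -99 codes into the extended missing value .a;
* the system missing value (.) is left untouched
mvdecode SAMEJB FR1CB, mv(-99=.a)

* Not asked in this wave
count if SAMEJB == .
* Filtered / no answer
count if SAMEJB == .a
* missing() is true for both kinds
count if missing(SAMEJB)
```

If "don't know" and "refused" have their own numeric codes, each can be mapped to its own letter in the same mv() option, separating the pairs with a backslash.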
Now my questions are:
1. How do I create the time variable for the survival analysis, that is, the time from the date of migration to the start of the first job in the RC?
I know I somehow have to combine JBSTART_RC_op and CURRJBSTART_op (and maybe also SAMEJB) before subtracting IMDATE_op, but I don't know how to do it, especially since these variables contain so many spurious missing values because each was asked in only one wave.
2. How do I create the failure indicator (employed: yes/no) while correctly accounting for respondents who are on maternity or paternity leave?
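A minimal sketch of how both steps could be done, using rows for ids 2, 3, 28 and 30 copied from the -dataex- example. It converts the "MM/YYYY" strings to Stata monthly dates with monthly(), copies each date that was asked in only one wave to both rows of a person with egen max(), and then declares the data with stset. Three assumptions are built in and marked in the comments: (a) for someone who had not worked by wave 1, the job held at wave 2 is treated as the first job, which is only right if there was no other job in between; (b) failure means that a first job start was observed, not that ACT says "working", so a person on maternity or paternity leave from a job already started still counts as having found one; (c) people without a job are censored at lastint_s, a hypothetical last-interview month that is not among the listed variables and is filled with made-up values here. The other new names (migdate, firstjob, gotjob, endmonth) are illustrative only.

```stata
* Rows copied from the -dataex- listing; lastint_s is a HYPOTHETICAL
* last-interview month with made-up values
clear
input float id byte wave str14(IMDATE_op JBSTART_RC_op CURRJBSTART_op lastint_s)
 2 1 "10/2009" "-99 (filtered)" ""        "06/2010"
 3 1 "10/2009" "05/2010"        ""        ""
 3 2 ""        ""               "04/2011" "12/2012"
28 1 "04/2010" "09/2010"        ""        ""
28 2 ""        ""               "07/2011" "12/2012"
30 1 "09/2010" "-99 (filtered)" ""        ""
30 2 ""        ""               "10/2011" "12/2012"
end

* "MM/YYYY" strings -> Stata monthly dates; non-dates such as
* "-99 (filtered)" or "" become missing
foreach v in IMDATE_op JBSTART_RC_op CURRJBSTART_op lastint_s {
    gen m_`v' = monthly(`v', "MY")
    format m_`v' %tm
}

* Each date was asked in one wave only: copy it to all rows of the person
* (egen max() ignores missing values)
egen migdate = max(m_IMDATE_op), by(id)
egen job_w1  = max(m_JBSTART_RC_op), by(id)
egen job_w2  = max(m_CURRJBSTART_op), by(id)
egen lastint = max(m_lastint_s), by(id)

* First job: wave-1 start date if present; ASSUMPTION (a): otherwise the
* start of the job held at wave 2
gen firstjob = cond(missing(job_w1), job_w2, job_w1)

* ASSUMPTION (b): failure = a first job start is observed
gen byte gotjob = !missing(firstjob)

* ASSUMPTION (c): end of follow-up is the first job start, else lastint
gen endmonth = cond(gotjob, firstjob, lastint)
format migdate firstjob lastint endmonth %tm

* One record per person; flag durations below zero as data problems
bysort id (wave): keep if _n == 1
count if endmonth < migdate

* Analysis time = months since migration
stset endmonth, failure(gotjob) origin(time migdate) id(id)
list id migdate endmonth gotjob _t _d
```

A few points to check with the full data. Several people in the example (e.g. ids 8, 14 and 21) started a job in the month they migrated; stset excludes records whose exit is not after the origin, so these zero-month durations need a decision, for instance treating them as half a month. id 11 reports a job start (06/2010) one month before migrating (07/2010), which the count line would flag, and id 12 reports the same job (SAMEJB = 1) starting 04/2010 in wave 1 but 04/2009 in wave 2. Values such as "-52/2010" are not valid months and should also come out missing, so it is worth finding out what that code means. Finally, factor-variable notation rejects negative values, so the -99 codes in FR1CB and LRCSPK have to be set to missing before fitting something like `stcox i.FR1CB i.LRCSPK`.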
Kind regards,
Anna