Good morning all,
Hope you're well. Apologies if this is a very basic enquiry, but I am still finding my feet in Stata and am completely new to panel data analysis.
I am trying to use a standard fixed-effects balanced panel data model to estimate the effect of lockdown on individual workers' well-being (i.e. using within-person variation). I am using primary survey data collected from approximately 700 individuals in two waves. Wave 1 took place pre-lockdown (data was collected between Nov 2019 and Feb 2020). Wave 2 was collected during and immediately after lockdown (May to June). The data has been reshaped into long panel format using the individual id and the time variable wave (i.e. in the original wide data all variables are suffixed with 1 or 2 to indicate whether they come from wave 1 or 2, e.g. parentalstatus1, parentalstatus2 etc.).
My dependent variables are various outcomes, e.g. job satisfaction ("ws"). My main independent variable is a dummy ("wave2") indicating whether an observation comes from wave 1 (pre-lockdown, wave2 = 0) or wave 2 (during lockdown, wave2 = 1). wave2 was generated as follows:
gen wave2=0 if wave==1
replace wave2=1 if wave==2
label var wave2 "wave dummy indicating whether survey was taken pre or during c19"
My basic model specification is: xtreg ws wave2, fe vce(cluster id)
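For reference, here is a sketch of what this specification estimates, in standard one-way fixed-effects notation (alpha_i, epsilon_it, the set B and the count N_2 are notation introduced here for illustration, not taken from the Stata output). alpha_i is the individual effect that the fe option removes by subtracting each person's mean, and B is the set of N_2 people observed in both waves. With only two waves and wave2 as the sole regressor, the within estimate of beta reduces to the average within-person change in ws; people observed only once become all zeros after demeaning and do not contribute to beta.

```latex
\begin{aligned}
ws_{it} &= \beta\,\mathit{wave2}_{it} + \alpha_i + \varepsilon_{it}, \qquad t \in \{1, 2\} \\
% within transformation: subtract person i's mean over his or her observed waves
ws_{it} - \overline{ws}_i &= \beta\,\bigl(\mathit{wave2}_{it} - \overline{\mathit{wave2}}_i\bigr) + \bigl(\varepsilon_{it} - \bar{\varepsilon}_i\bigr) \\
% for i in B the demeaned wave2 is -1/2 in wave 1 and +1/2 in wave 2, hence
\hat{\beta}_{FE} &= \frac{1}{N_2} \sum_{i \in \mathcal{B}} \bigl(ws_{i2} - ws_{i1}\bigr)
\end{aligned}
```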
I have two questions:
1) How do I get the model to take account of the fact that there is a non-uniform time gap between the two surveys, i.e. worker1 may have answered Survey 1 in Nov 2019 and Survey 2 in June 2020, whereas worker2 may have answered Survey 1 in Feb 2020 and Survey 2 in July 2020?
I attempted to generate a duration variable as follows:
gen surveydate1=date(date1,"YMD###")
gen surveydate2=date(date2,"YMD###")
format surveydate1 %td
format surveydate2 %td
gen timebetweensurveys=surveydate2-surveydate1
label var timebetweensurveys "no of days between completing surveys 1 and 2"
where surveydate1 is the date on which worker i completed survey 1, and so on. The variable 'works' in that it gives the number of days between surveys, which is what I am trying to capture, BUT when I try to use it in my xtreg regression (e.g. xtreg ws wave2 timebetweensurveys, fe vce(cluster id)) it is, of course, omitted because of collinearity, so I am back to square one! I thought about adding i.surveydate to the regression instead, but that seems to mess up all my results (e.g. it changes the sign of the main coefficient). I assume this is because my main independent variable essentially is time variation, so by introducing a time fixed effect I am using up all the variation I need to estimate the model? So my question is: how do I account for duration in my model?
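The collinearity described above follows directly from the within transformation that xtreg, fe applies. Writing d_i for timebetweensurveys (notation introduced here for illustration), the variable takes the same value in every row belonging to person i, so subtracting the person mean leaves a column of zeros, which the individual fixed effects absorb completely:

```latex
\begin{aligned}
d_i &= \mathit{surveydate2}_i - \mathit{surveydate1}_i
  \quad \text{(the same value in each of person } i\text{'s rows, so } \bar{d}_i = d_i\text{)} \\
d_{it} - \bar{d}_i &= d_i - d_i = 0 \qquad \text{for every } i \text{ and } t
\end{aligned}
```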
2) My second question is more general. When I run my basic model, for some (not all) of my outcome variables I get a Prob > F that is not close to zero (i.e. not statistically significant) and a very low within R-squared. See below:
. xtreg ws wave2, fe vce (cluster id)
Fixed-effects (within) regression               Number of obs     =      1,238
Group variable: id                              Number of groups  =        621

R-sq:                                           Obs per group:
     within  = 0.0042                                         min =          1
     between = 0.0028                                         avg =        2.0
     overall = 0.0010                                         max =          2

                                                F(1,620)          =       2.60
corr(u_i, Xb)  = 0.0020                         Prob > F          =     0.1077

                                   (Std. Err. adjusted for 621 clusters in id)
------------------------------------------------------------------------------
             |               Robust
          ws |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       wave2 |   .1296596   .0804776     1.61   0.108    -.0283821    .2877014
       _cons |   5.978086   .0401738   148.81   0.000     5.899193    6.056979
-------------+----------------------------------------------------------------
     sigma_u |  1.9463651
     sigma_e |  1.4129581
         rho |  .65487919   (fraction of variance due to u_i)
------------------------------------------------------------------------------
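As a check on what Prob > F measures in this output: with wave2 as the only regressor, the overall F test is the same hypothesis test as the t test on wave2 (H0: the coefficient on wave2 is zero). F therefore equals t squared, and Prob > F (0.1077) is the same p-value as P>|t| (0.108 after rounding). Using the reported coefficient and standard error:

```latex
\begin{aligned}
t &= \frac{\hat{\beta}_{\mathit{wave2}}}{\widehat{\mathrm{SE}}(\hat{\beta}_{\mathit{wave2}})}
   = \frac{0.1296596}{0.0804776} \approx 1.611 \\
F(1, 620) &= t^{2} \approx 1.611^{2} \approx 2.60
\end{aligned}
```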
Similarly, when I just run reg ws wave2 i.id, I get missing values for F and Prob > F, although the R-squared is quite high. The beta for wave2 is the same under both models.
. reg ws wave2 i.id,vce (cluster id)
Linear regression                               Number of obs     =      1,238
                                                F(0, 620)         =          .
                                                Prob > F          =          .
                                                R-squared         =     0.7924
                                                Root MSE          =      1.413

                                               (Std. Err. adjusted for 621 clusters in id)
------------------------------------------------------------------------------------------
                         |               Robust
                      ws |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
                   wave2 |   .1296596   .1139971     1.14   0.256    -.0942077    .3535269
                         |
                      id |
 543e85adfdf99b735690~90 |        1.5   9.82e-14  1.5e+13   0.000          1.5         1.5
 546aa9acfdf99b3f01f12~4 |        4.5   9.82e-14  4.6e+13   0.000          4.5         4.5
 547a4f58fdf99b5321ba5~4 |          2   9.82e-14  2.0e+13   0.000            2           2
 54876fe7fdf99b03e~64374 |          1   9.82e-14  1.0e+13   0.000            1           1
 54b8ea6cfdf99b34ce257~5 |         -1   9.82e-14 -1.0e+13   0.000           -1          -1
 54d35e1afdf99b68c74dd~c |          1   9.82e-14  1.0e+13   0.000            1           1
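The identical wave2 coefficient in the two outputs is expected: reg ws wave2 i.id estimates the dummy-variable (LSDV) form of the same model, and by the Frisch-Waugh-Lovell theorem, partialling the id dummies out of ws and wave2 is exactly the person-demeaning that xtreg, fe performs. The much higher R-squared (0.7924 vs a within R-squared of 0.0042) counts the variation explained by the id dummies themselves. The clustered standard errors differ because reg counts all 622 estimated parameters (wave2, 620 id dummies, constant) in its finite-sample correction (N - 1)/(N - K), while xtreg, fe counts only wave2 and the constant when the panels are nested within the clusters; the common G/(G - 1) cluster factor cancels. A sketch of the arithmetic, with N = 1,238 taken from the output:

```latex
\begin{aligned}
\text{LSDV:}\quad ws_{it} &= \beta\,\mathit{wave2}_{it} + \sum_{j} \gamma_j\,\mathbf{1}[i = j] + \varepsilon_{it}
  \;\Longrightarrow\; \hat{\beta}_{\mathrm{LSDV}} = \hat{\beta}_{\mathrm{FE}} = 0.1296596 \\
\frac{\widehat{\mathrm{SE}}_{\texttt{reg}}}{\widehat{\mathrm{SE}}_{\texttt{xtreg}}}
  &= \sqrt{\frac{(N-1)/(N-622)}{(N-1)/(N-2)}} = \sqrt{\frac{1236}{616}} \approx 1.4165,
  \qquad 0.0804776 \times 1.4165 \approx 0.1140
\end{aligned}
```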
Also, I am concerned that my model is unstable: when I add the control variable "wwbpriority" (a 0-10 rating of the organisation's prioritisation of well-being), the main coefficient switches sign from positive (which is what I would expect, i.e. the mean value of ws DOES increase between surveys 1 and 2, albeit not significantly) to negative, and Prob > F drops to 0.0000.
. xtreg ws wave2 wwbpriority, fe vce (cluster id)
Fixed-effects (within) regression               Number of obs     =      1,234
Group variable: id                              Number of groups  =        621

R-sq:                                           Obs per group:
     within  = 0.1127                                         min =          1
     between = 0.3198                                         avg =        2.0
     overall = 0.2680                                         max =          2

                                                F(2,620)          =      30.59
corr(u_i, Xb)  = 0.2239                         Prob > F          =     0.0000

                                   (Std. Err. adjusted for 621 clusters in id)
------------------------------------------------------------------------------
             |               Robust
          ws |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       wave2 |  -.0523467   .0779918    -0.67   0.502    -.2055069    .1008134
 wwbpriority |   .3048209   .0393393     7.75   0.000     .2275665    .3820752
       _cons |   4.313767   .2212248    19.50   0.000     3.879326    4.748208
-------------+----------------------------------------------------------------
     sigma_u |  1.6614572
     sigma_e |  1.3367479
         rho |  .60704568   (fraction of variance due to u_i)
------------------------------------------------------------------------------
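One way to read the sign change is through the omitted-variable identity, which holds exactly for OLS, and hence for the within estimator, when both regressions use the same sample. Here delta (notation introduced for illustration) is the wave2 coefficient from a within regression of wwbpriority on wave2, which is not shown in the post; with two waves it equals the average within-person change in wwbpriority between surveys. Plugging in the reported coefficients gives only a rough value, because the two regressions use slightly different samples (1,238 vs 1,234 observations). On that approximate reading, the switch from +0.13 to -0.05 is what one would see if wwbpriority rose by about 0.6 points on average between the surveys:

```latex
\begin{aligned}
% tilde = without the control, hat = with wwbpriority included
\tilde{\beta}_{\mathit{wave2}} &= \hat{\beta}_{\mathit{wave2}} + \hat{\gamma}_{\mathit{wwbpriority}}\,\tilde{\delta} \\
\tilde{\delta} &\approx \frac{0.1296596 - (-0.0523467)}{0.3048209} \approx 0.60
\end{aligned}
```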
Does this mean that my model is unstable and could be misspecified? I am worried that I am doing something basic wrong here that I need to correct before I go any further and try to introduce further controls etc.
Thanks in advance for taking the time to read this message. Any help or advice that you can provide would be very much appreciated.
Diane