I’m currently working as a research assistant with my supervisor’s code, which uses employee-level data from a firm that “de-trashes” stock coming into its warehouse, i.e., removes the transit packaging.
The code is designed to estimate productivity, measured in units de-trashed per minute (upm). It uses the reghdfe command, which runs a linear regression while absorbing multiple sets of fixed effects. The independent variable is PLANNED_UPH, a productivity target: workers who reach it are paid a bonus.
The fixed effects used in the regression equation are:
- fe3_j (SKU code i.e., product fixed effects)
- fe3_i (worker fixed effects)
- fe3_t (date fixed effects)
- fe3_dow (day of week fixed effects)
- fe3_shift (shift type fixed effects i.e., day, early or late shift)
- fe3_h (hour of the day fixed effects)
- fe3_handle (handling class fixed effects)
- fe3_station (warehouse workstation fixed effects)
- fe3_group (group of workers fixed effects)
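Written out, the specification estimated by the code below is roughly the following (my own notation, not taken from the supervisor's code), where $n$ indexes an observation and each subscript function maps it to its category:

$$
\text{uph}_{n} = \alpha + \beta\,\text{PLANNED\_UPH}_{n} + \gamma_{j(n)} + \delta_{i(n)} + \tau_{t(n)} + \omega_{d(n)} + \sigma_{s(n)} + \eta_{h(n)} + \kappa_{c(n)} + \lambda_{k(n)} + \mu_{g(n)} + \varepsilon_{n}
$$

Here $j(n)$ is the SKU, $i(n)$ the worker, $t(n)$ the date, $d(n)$ the day of week, $s(n)$ the shift type, $h(n)$ the hour of the day, $c(n)$ the handling class, $k(n)$ the workstation and $g(n)$ the worker group of observation $n$. $\beta$ is the coefficient reported on PLANNED_UPH, $\alpha$ is the reported _cons, and each `newvar=absvar` term in `absorb()` (e.g. `fe3_i=user_code`) saves the corresponding estimated effects ($\delta$ for workers, and so on) as a new variable.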
```stata
reghdfe uph PLANNED_UPH, ///
    absorb(fe3_j=SKU_ID fe3_i=user_code fe3_t=date_code fe3_dow=dow fe3_shift=shift_type fe3_h=HourDay1 ///
           fe3_handle=HANDLING_CLASS fe3_station=STATION_ID fe3_group=GROUP_ID)
quietly estadd local controls "Yes"
quietly estadd local FE_t "Yes"
quietly estadd local FE_i "Yes"
quietly estadd local FE_j "Yes"
est store H3
```
The output (H3) is as follows:
```text
HDFE Linear regression                            Number of obs   =  2,480,900
Absorbing 9 HDFE groups                           F(   1,2454358) =       1.66
                                                  Prob > F        =     0.1971
                                                  R-squared       =     0.5447
                                                  Adj R-squared   =     0.5398
                                                  Within R-sq.    =     0.0000
                                                  Root MSE        =     0.2292

------------------------------------------------------------------------------
         uph |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 PLANNED_UPH |  -2.25e-06   1.75e-06    -1.29   0.197    -5.68e-06    1.17e-06
       _cons |   .4962852    .002311   214.75   0.000     .4917558    .5008146
------------------------------------------------------------------------------

Absorbed degrees of freedom:
---------------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
-----------------+---------------------------------------|
          SKU_ID |     25692           0         25692   |
       user_code |       567           1           566   |
       date_code |       232           1           231   |
             dow |         7           7             0   |
      shift_type |         3           1             2   |
        HourDay1 |         9           1             8   |
  HANDLING_CLASS |         2           2             0   |
      STATION_ID |        38           1            37   |
        GROUP_ID |         7           2             5   |
---------------------------------------------------------+
```
What I have been asked to do is, first, split the data in half by date (I did this by creating two binary dummies, split1 and split2, marking observations from the first and second halves of the year, respectively). I then have to rerun the same regression on the first half only and copy the estimated fixed-effect coefficients into the second-half subset of the data.
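A minimal sketch of how such a split could be built in Stata, assuming `date_code` numbers the days in chronological order (the post does not say how it is coded). The local macro `cut` is a hypothetical name, and the sketch cuts at the midpoint of the observed date codes rather than at a calendar date, so `cut` would need adjusting if the halves must follow the calendar year exactly:

```stata
* Assumption: date_code runs in chronological order (lower = earlier day)
capture drop split1 split2                      // allow the sketch to be re-run

summarize date_code, meanonly
local cut = floor((r(min) + r(max)) / 2)        // midpoint of the date codes

generate byte split1 = (date_code <= `cut') if !missing(date_code)
generate byte split2 = (date_code >  `cut') if !missing(date_code)

* Each dated observation should fall in exactly one half
assert split1 + split2 == 1 if !missing(date_code)
```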
To run the regression on the first half of the data, I thought of adding if-statements so that the regression would only run where split1==1. Then, for each user ID (worker), I could somehow copy the coefficients from the split1 observations to the split2 observations and run the code only for split2. However, wherever I place the if-statements in the code, Stata returns errors. I’m grateful for any ideas, thanks.
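In Stata, the `if` needed here is a qualifier: it goes right after the variable list and before the comma that introduces options such as `absorb()`. The `if` command (`if split1 == 1 { ... }`) is different: it looks at `split1` only in the first observation and then runs or skips the whole block. `estadd` and `estimates store` need no qualifier, since they act on the active estimation results. Below is a minimal sketch of the first-half regression and of carrying estimated effects into the second half. The names `fe1_*`, `fe_*_h1` and `H3_half1` are hypothetical, chosen so they do not clash with the `fe3_*` variables the full-sample run already created. It also assumes, as I understand reghdfe's behaviour, that the saved fixed-effect variables are filled in only for observations in the estimation sample. Date effects cannot be carried over this way, because the two halves share no dates.

```stata
* Hypothetical names: fe1_* (first-half FEs) and fe_*_h1 (carried-over values),
* chosen so they do not clash with the fe3_* variables from the full-sample run.

* 1. First half only: the -if- qualifier sits after the varlist, BEFORE the comma.
reghdfe uph PLANNED_UPH if split1 == 1, ///
    absorb(fe1_j=SKU_ID fe1_i=user_code fe1_t=date_code fe1_dow=dow ///
           fe1_shift=shift_type fe1_h=HourDay1 fe1_handle=HANDLING_CLASS ///
           fe1_station=STATION_ID fe1_group=GROUP_ID)
estimates store H3_half1

* 2. The fe1_* variables are filled only where the regression ran (first half).
*    Spread each category's estimate to every row with that category, so
*    second-half rows pick up the first-half value (max() ignores missings).
egen fe_worker_h1  = max(fe1_i),       by(user_code)
egen fe_sku_h1     = max(fe1_j),       by(SKU_ID)
egen fe_station_h1 = max(fe1_station), by(STATION_ID)

* 3. Second-half rows whose worker never appeared in the first half stay missing.
count if split2 == 1 & missing(fe_worker_h1)
```

Because each worker's estimate is constant within `user_code` in the first half, taking the group maximum simply copies that single value to all of the worker's rows; the same pattern applies to any other effect whose categories recur in both halves.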