Hello, bit of a complex one here:

I’m currently working as a research assistant, using my supervisor’s code. It uses employee-level data for a firm that “de-trashes” stock coming into its warehouse, i.e., removes transit packaging.
The code is designed to estimate productivity, measured in units [de-trashed] per minute (upm). It uses the reghdfe command, a linear regression that absorbs multiple layers of fixed effects. It also uses an independent variable called PLANNED_UPH, a target that earns workers a bonus if they reach it.
The fixed effects used in the regression equation are:
  • fe3_j (SKU code i.e., product fixed effects)
  • fe3_i (worker fixed effects)
  • fe3_t (date fixed effects)
  • fe3_dow (day of week fixed effects)
  • fe3_shift (shift type fixed effects i.e., day, early or late shift)
  • fe3_h (hour of the day fixed effects)
  • fe3_handle (handling class fixed effects)
  • fe3_station (warehouse workstation fixed effects)
  • fe3_group (group of workers fixed effects)
The code is as follows:

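* Productivity regression with nine sets of absorbed fixed effects;
* each fe3_*=varname pair saves the estimated fixed effect as fe3_*.
reghdfe uph PLANNED_UPH, ///
    absorb(fe3_j=SKU_ID fe3_i=user_code fe3_t=date_code fe3_dow=dow ///
           fe3_shift=shift_type fe3_h=HourDay1 fe3_handle=HANDLING_CLASS ///
           fe3_station=STATION_ID fe3_group=GROUP_ID)
* Attach indicator strings to the results, then store them as H3
quietly estadd local controls "Yes"
quietly estadd local FE_t "Yes"
quietly estadd local FE_i "Yes"
quietly estadd local FE_j "Yes"
est store H3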

The output (H3) is as follows:
HDFE Linear regression                       Number of obs   = 2,480,900
Absorbing 9 HDFE groups                      F(1, 2454358)   = 1.66
                                             Prob > F        = 0.1971
                                             R-squared       = 0.5447
                                             Adj R-squared   = 0.5398
                                             Within R-sq.    = 0
                                             Root MSE        = 0.2292

         uph |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
-------------+------------------------------------------------------------------
 PLANNED_UPH |  -2.25e-06    1.75e-06    -1.29    0.197    -5.68e-06   1.17e-06
       _cons |   .4962852     .002311   214.75    0.000     .4917558   .5008146

Absorbed degrees of freedom:
    Absorbed FE | Categories   Redundant   Num. Coefs
----------------+--------------------------------------
         SKU_ID |      25692           0        25692
      user_code |        567           1          566
      date_code |        232           1          231
            dow |          7           7            0
     shift_type |          3           1            2
       HourDay1 |          9           1            8
 HANDLING_CLASS |          2           2            0
     STATION_ID |         38           1           37
       GROUP_ID |          7           2            5

I have been asked to first split the data in half by date (I did this by creating binary dummies called split1 and split2 to represent data from the first and second halves of the year, respectively). I then have to run the same regression for just the first half and copy the estimated fixed-effect coefficients into the data subset from the second half.
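For reference, the split dummies can be built roughly as follows. This is only a sketch: it assumes date_code is numeric and increases with calendar time, and the local macro cutoff is a name made up for illustration.

```stata
* Assumption: date_code is numeric and ordered chronologically.
* Skip this step if split1 and split2 already exist in the data.
quietly summarize date_code, meanonly

* Midpoint of the observed date range (hypothetical local name)
local cutoff = (r(min) + r(max)) / 2

* split1 = first half of the period, split2 = second half;
* rows with a missing date_code are left missing in both dummies.
generate byte split1 = (date_code <= `cutoff') if !missing(date_code)
generate byte split2 = 1 - split1
```

This cuts at the midpoint of the date range, so the two halves need not contain the same number of observations.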

To run the regression on the first half of the data, I thought of adding if-statements so that the regression would only run if split1==1. Then, for each user ID (worker), I could somehow copy the coefficients from split1 to split2 and run the code only for split2. However, wherever I place the if-statements in the code, it returns errors. I’d be grateful for any ideas, thanks.
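One possible shape for this, offered as a sketch rather than a tested solution: in Stata syntax the if qualifier goes after the variable list and before the comma that starts the options. It also assumes that reghdfe fills the saved fixed-effect variables only for the estimation sample, so they have to be spread to the second-half rows afterwards. The names H3_split1 and fe3_i_h1 are made up for illustration.

```stata
* The fe3_* variables from the earlier full-sample run must not exist
* when reghdfe tries to create them again. Note this drops the old ones.
capture drop fe3_*

* First-half regression: -if- sits before the comma, not inside absorb()
reghdfe uph PLANNED_UPH if split1 == 1, ///
    absorb(fe3_j=SKU_ID fe3_i=user_code fe3_t=date_code fe3_dow=dow ///
           fe3_shift=shift_type fe3_h=HourDay1 fe3_handle=HANDLING_CLASS ///
           fe3_station=STATION_ID fe3_group=GROUP_ID)
est store H3_split1

* fe3_i is constant within worker and (by assumption) missing where
* split1 != 1. egen's max() ignores missing values, so this copies each
* worker's first-half estimate onto all of that worker's rows.
bysort user_code: egen double fe3_i_h1 = max(fe3_i)

* Workers who appear only in the second half have no first-half estimate
count if split2 == 1 & missing(fe3_i_h1)
```

The same bysort/egen step could be repeated for the other fixed effects (SKU, station, and so on). The date fixed effects are the exception: no second-half date appears in the first half, so there is no fe3_t estimate to carry over.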