Dear all,

I am investigating how employees are affected when they are outsourced from public to private employment. My dataset is in long format with seven time periods (years). I cannot share my data, as it is on a confidential server, but I have made an example dataset with 14 individuals (my actual data has more than 300 treated individuals and more than 70,000 matched control individuals). Hopefully, the example data contains enough data points.

In the example dataset, my treatment group is outsourced at time 100. I have used a Coarsened Exact Matching (CEM) procedure to identify a group that is similar in terms of job type but still consists of public employees at time 100 (and at the prior time, 99). I match on a variable with 1,200 different job types, so the matches are quite detailed.

I have a few questions that need answering.

First of all, my cem_weights are obtained at time 100, when the outsourcing occurred. As a consequence, the remaining time periods in each panel do not contain any weights. I could use the following command to fill in the remaining time periods:
Code:
bysort id (cem_weights): gen cem_weights_all = cem_weights[1]
However, I am not sure whether this is the proper procedure for a panel data analysis with CEM weights.

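As a cross-check on the fill step above, here is a minimal sketch that does not rely on the sort order. It assumes that cem_weights is non-missing in at most one period per id. The names n_weights and cem_w_check are hypothetical helper names, not variables in my data:

```stata
* Count the non-missing weights per individual; more than one would be unexpected
bysort id: egen n_weights = count(cem_weights)
assert n_weights <= 1

* Copy the single non-missing weight to every period of the panel
bysort id: egen cem_w_check = max(cem_weights)

* The result should agree with the sort-based fill
assert cem_w_check == cem_weights_all
```

If both assert commands pass silently, the weight is constant within each panel, which is my understanding of what weighted panel estimators generally require.
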
Secondly, I want to model the effect of the treatment. I have no problem interpreting the following model:
Code:
xtreg salary i.treatment##i.time
margins treatment, at(time=(97(1)103))
marginsplot
So hopefully, my setup is not entirely wrong. I want to add control variables like gender and educational background. Is there a need in this case for a first difference estimator (or fixed effects)? And if so, how would I go about it?

In the case of first differences, I have tried the following code:
Code:
xtreg d.salary i.treatment##i.time [aw = cem_weights_all], nocons
But I receive the error “weights not allowed”, and if I remove the weights, I get “option nocons not allowed”. In all honesty, I have just today figured out how to run first-differences with the “d.-operator”, so my specification might not be correct.
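
For reference, this is a sketch of what I think a weighted specification could look like. It is not tested on my real data. It assumes the panel is declared with xtset id time and that cem_weights_all is constant within id. As far as I can tell, regress accepts both aweights and the d. operator, whereas the default random-effects estimator of xtreg does not accept weights. Clustering on id is my own assumption:

```stata
* Declare the panel so that the d. operator knows the time ordering
xtset id time

* Weighted first differences via regress, standard errors clustered on individuals
regress d.salary i.treatment##i.time [aw = cem_weights_all], vce(cluster id)

* Alternative: weighted fixed effects (weights must be constant within id);
* the time-invariant main effect of treatment is absorbed by the individual effects
xtreg salary i.treatment##i.time [aw = cem_weights_all], fe vce(cluster id)
```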

I have also tried the areg command as it allows for weights, but the results are also puzzling as the treatment variable is omitted:
Code:
areg d.salary i.treatment##i.time [aw = cem_weights_all], a(id)
So, my question is: do I need the first-difference estimator? And if so, how do I obtain it?

I hope somebody out there can help me out.


The example data contains the following variables:
Code:
id
time
treatment (0 = control, 1 = treatment)
cem_weights
cem_weights_all
outsourcing_year (1 indicates when the outsourcing occurred)
company_id
sector (0 = public, 1 = private)
job_code (used to match treatment and control)
salary
cem_strata
cem_matched
matched_all