Hello everyone!

For my research paper, I am trying to use the difference-in-differences (DiD) method to estimate the causal effect of a particular social welfare program on low-income individuals' labour supply. The program was restructured and expanded at the beginning of the 2019 taxation year, which created a quasi-experimental setting in which treatment is as good as randomly assigned, conditional on individual circumstances.

Since the program was enhanced in the 2019 tax year, I will define 2018 as the pre-treatment period and 2019 as the post-treatment period (I decided not to use 2020 because COVID-19 affected labour supply). I want to use the monthly Labour Force Survey (LFS) data from 2018 and 2019, and pool the 12 monthly files for each year (January to December 2018 LFS data + January to December 2019 LFS data) to generate the "before" and "after" samples. But due to my limited experience working with Stata, I am not sure how to do this.

FYI, my DiD regression model can be expressed as:

LabourSupply = b0 + b1*Treatment + b2*Post + b3*(Treatment*Post) + X*b + e

where LabourSupply is the binary outcome variable that equals one when an individual is participating in the labour force, zero otherwise. The Treatment variable is a binary variable that equals one if the individual receives treatment, zero otherwise. The variable Post is also binary and equals one if the individual is observed in 2019, zero if observed in 2018. Treatment*Post is the interaction term, whose coefficient b3 is the DiD estimate, and X is a vector of control variables.
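
In Stata terms, I assume the estimation would look roughly like the sketch below once the pooled file exists. All names here are placeholders I made up for illustration (lfs_2018_2019.dta, labsupply, treatment, post, and the controls age and educ), and the robust standard errors are just my guess at a sensible default for a binary outcome estimated by OLS.

```stata
* Sketch only: file and variable names are placeholders, not actual LFS names
use "lfs_2018_2019.dta", clear

* ## expands to both main effects plus the interaction;
* the coefficient on 1.treatment#1.post corresponds to b3 in the model above
regress labsupply i.treatment##i.post age i.educ, vce(robust)
```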

I have 24 monthly LFS data files for 2018 and 2019, named as in the attached picture (e.g., lfs2018_2 = February 2018 LFS data & lfs2019_5 = May 2019 LFS data). My guess is that I will have to write a loop that reads each monthly file, prepares it for analysis (e.g., imposing sample restrictions, creating variables) & then writes the now much smaller monthly dataset to an intermediate file. Then I would read in the first intermediate file and, in a second loop, read in each subsequent file and "append" it to the growing combined file. But again, this is my first time doing my own empirical work using Stata, so I am not sure how to do this.
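
To make the idea concrete, here is a rough, untested sketch of the kind of loop I have in mind. It assumes the monthly files are Stata datasets (.dta) in the current working directory and follow the lfsYYYY_M naming pattern. The sample restriction and the output file name lfs_2018_2019.dta are placeholders. To keep it short, it appends each month to a single temporary file as it goes instead of saving 24 separate intermediate files first.

```stata
* Sketch only: assumes files lfs2018_1.dta ... lfs2019_12.dta in the working directory
clear all
tempfile combined
local first = 1

forvalues y = 2018/2019 {
    forvalues m = 1/12 {
        use "lfs`y'_`m'.dta", clear

        * sample restrictions would go here (hypothetical variable and cut-offs)
        * keep if inrange(age, 25, 54)

        * record the survey year and month, and flag the post-treatment year
        gen int  year  = `y'
        gen byte month = `m'
        gen byte post  = (`y' == 2019)

        if `first' {
            * first month: start the combined file
            save `combined'
            local first = 0
        }
        else {
            * later months: add what has been accumulated so far, then re-save
            append using `combined'
            save `combined', replace
        }
    }
}

* the data in memory now hold all 24 months
save "lfs_2018_2019.dta", replace
```

If the restrictions shrink each month a lot, I guess the repeated saves should be cheap, but I do not know whether this is the recommended way to do it.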

Thank you in advance.