Hi Statalist. This may be a long one.

I am assessing the effects of a change in state legislation on two outcomes: employment and wages. To do this, I have opted to run difference-in-differences (DD) and triple-difference (DDD) regressions. I am using microdata from the Current Population Survey, specifically IPUMS-CPS.

I understand the basic intuition behind DD and DDD. What I am unsure about are the questions below, which I explain one by one. There are quite a few, so I have grouped them by topic. The longer questions include an explanation of the problem for clarity.



  • How do I decide which control variables to include in my basic DD regression? I have data for education, age, work experience, marital status, and race. I'd imagine including race would be unproductive since it doesn't change between groups or across time.

  • What should be my unit of TIME given that the CPS is conducted monthly? Should one period be a year or a month?
  • Should I use a cross-sectional data set or a panel data set?
My policy was implemented on a specific date (November 12, 2008), and the CPS is conducted every month. In this case, what should be my unit of TIME? Should I use 12-month windows (November 2007 to October 2008 as the pre-treatment period; November 2008 to October 2009 as the post-treatment period)? This would mean using repeated cross-sections. I understand that this requires the composition of the two cross-sections to be similar, but I don't think that holds here, since survey respondents constantly enter and exit the sample.
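To make the cross-sectional option concrete, here is a minimal sketch of how I would build the two 12-month windows. It runs on a tiny toy data set rather than my real extract; year and month are the IPUMS-CPS names, while mdate and post are names I made up for illustration.

```stata
* Toy rows standing in for an IPUMS-CPS extract (one row per survey month)
clear
input int year byte month
2007 10
2007 11
2008 10
2008 11
2009 10
2009 11
end

* Monthly date built from the survey year and month
gen mdate = ym(year, month)
format mdate %tm

* Keep the 12 months before and the 12 months from November 2008 onward
keep if inrange(mdate, ym(2007, 11), ym(2009, 10))

* post = 1 from November 2008 onward (my assumed cutoff, not a settled choice)
gen byte post = mdate >= ym(2008, 11)
list year month mdate post
```

Whether November 2008 itself belongs in the post period is part of what I am asking, since the policy took effect mid-month.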

Alternatively, I could take advantage of the CPS 4-8-4 rotation design. In a given monthly sample, 1/8 of the sample enters the survey as month-in-sample 1 (MIS1) and is then surveyed for the next three months as well (as MIS2 in the next month, MIS3 in the following, and so on). After four months of surveys, they are not surveyed for 8 months before re-entering the survey (a year after they first entered) as MIS5. I could use this design to construct a panel data set, because in a given month, say October 2008 (right before the policy change), MIS1 to MIS4 (half the sample) will reappear exactly 12 months later in the October 2009 sample as MIS5 to MIS8. Would this be a better route to take? If I do go down this route, am I still able to use OLS estimation?
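Here is a rough sketch of the panel construction I have in mind, again on toy data. I am assuming that the IPUMS-CPS person identifier cpsidp and month-in-sample variable mish can be used to link respondents across the two Octobers; pre_half, post_half and post are hypothetical names of my own, and I have not checked how reliable the links are in practice.

```stata
* Toy rows: person id, survey year/month, and month-in-sample
clear
input double cpsidp int year byte month byte mish
1 2008 10 1
1 2009 10 5
2 2008 10 4
2 2009 10 8
3 2008 10 3
4 2009 10 6
5 2008 10 6
end

* October 2008 respondents in MIS1-4 and October 2009 respondents in MIS5-8
gen byte pre_half  = year == 2008 & month == 10 & inrange(mish, 1, 4)
gen byte post_half = year == 2009 & month == 10 & inrange(mish, 5, 8)
keep if pre_half | post_half

gen byte post = (year == 2009)

* Keep only people observed in both months, exactly four MIS steps apart
bysort cpsidp (post): keep if _N == 2 & mish[2] == mish[1] + 4
list cpsidp year month mish post
```

In the toy data only persons 1 and 2 survive, which is the kind of attrition I would expect to have to deal with in the real sample.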


  • What does it mean to include multiple periods? How can I extend the basic DD model to include multiple instances of the policy when the same policy was implemented in different states and at different times?
November 2008 is not the only instance of the policy being implemented. There are seven other instances (December 1998, March 2003, and so on) in which the same policy was implemented in other states. How can I modify my model to reflect this? What would my control group be?
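My tentative understanding is that this is handled with a single policy indicator that switches on in each state at its own adoption date, combined with state and month fixed effects. The sketch below builds such an indicator on toy data; the state codes and their pairing with adoption dates are hypothetical, and adopt_m and policy are names I made up. I would appreciate confirmation that this is the right idea.

```stata
* Toy adoption dates: three adopting states and one never-adopter
clear
input byte statefip int adopt_year byte adopt_month
1 1998 12
2 2003 3
3 2008 11
4 . .
end

* Monthly adoption date (missing for states that never adopt)
gen adopt_m = ym(adopt_year, adopt_month)

* Build a toy state-by-month panel covering 2008m1 to 2009m12
expand 24
bysort statefip: gen mdate = ym(2008, 1) + _n - 1
format mdate adopt_m %tm

* policy = 1 once a state has adopted, 0 before adoption and for never-adopters
gen byte policy = mdate >= adopt_m & !missing(adopt_m)
tabulate statefip policy

* With an outcome such as lnwage in memory, the regression I have in mind is:
* regress lnwage i.policy i.statefip i.mdate, vce(cluster statefip)
```

If this is right, then the control group at any point in time would be the states that have not (yet) adopted, but I am not sure that is the correct way to think about it.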


  • Since employment is a binary dependent variable at the individual level, should I use a linear probability model (LPM), logit, or probit regression? I have heard that interaction terms in non-linear models are difficult to interpret, but is that a legitimate reason not to use them?

  • When should I adjust (cluster) my standard errors to account for serial correlation and heteroskedasticity?



I have run a preliminary DD regression. Here it is if you'd like to take a look:
Code:
. regress lnwage i.time##i.treated, robust

Linear regression                               Number of obs     =      7,850
                                                F(3, 7846)        =      29.75
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0110
                                                Root MSE          =     .49671

------------------------------------------------------------------------------
             |               Robust
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.time |   .0487005   .0165828     2.94   0.003     .0161938    .0812072
   1.treated |  -.0858513   .0158602    -5.41   0.000    -.1169415    -.054761
             |
time#treated |
        1 1  |   -.023486   .0223784    -1.05   0.294    -.0673536    .0203817
             |
       _cons |   2.601396   .0119016   218.57   0.000     2.578065    2.624726
------------------------------------------------------------------------------
As you can see, my DD estimate (the time#treated coefficient, -0.023) is not statistically significant (p = 0.294).