My data consist of two cohorts (a 2005 cohort and a 2015 cohort).
The first cohort runs from 2005 to 2007.
The second cohort runs from 2015 to 2017.
I appended these two panel datasets; a sample of the result is below.
Code:
. list pid year peducost male cohort if 6904 <= pid & pid <= 10005, sep(15)

       +-----------------------------------------+
       |   pid   year   peducost   male   cohort |
       |-----------------------------------------|
20710. |  6904   2005          0      1     2005 |
20711. |  6904   2006          0      1     2005 |
20712. |  6904   2007          0      1     2005 |
20713. |  6905   2005         30      1     2005 |
20714. |  6905   2006         50      1     2005 |
20715. |  6905   2007         58      1     2005 |
20716. |  6906   2005         12      1     2005 |
20717. |  6906   2006         27      1     2005 |
20718. |  6906   2007         22      1     2005 |
20719. |  6907   2005         18      1     2005 |
20720. |  6907   2006         27      1     2005 |
20721. |  6907   2007         18      1     2005 |
20722. |  6908   2005          0      1     2005 |
20723. |  6908   2006         75      1     2005 |
20724. |  6908   2007         26      1     2005 |
       |-----------------------------------------|
20725. | 10001   2015          0      0     2015 |
20726. | 10001   2016          0      0     2015 |
20727. | 10001   2017          0      0     2015 |
20728. | 10002   2015          9      0     2015 |
20729. | 10002   2016          0      0     2015 |
20730. | 10002   2017          0      0     2015 |
20731. | 10003   2015          0      0     2015 |
20732. | 10003   2016          0      0     2015 |
20733. | 10003   2017         34      0     2015 |
20734. | 10004   2015          0      1     2015 |
20735. | 10004   2016          0      1     2015 |
20736. | 10004   2017          0      1     2015 |
20737. | 10005   2015          0      0     2015 |
20738. | 10005   2016          0      0     2015 |
20739. | 10005   2017          0      0     2015 |
       +-----------------------------------------+
That is, I am using two panel datasets simultaneously (the 2005 cohort set and the 2015 cohort set).
Here, I want to know whether the partial effect of gender on private education cost differs between the two cohorts.
So I ran a regression with an interaction term, as below.
Code:
. xtset pid year
       panel variable:  pid (unbalanced)
        time variable:  year, 2005 to 2017, but with gaps
                delta:  1 unit

. global ctrlvar "dadage dadagesq momage momagesq i.dadedu i.momedu"

.

. gen dummy_2015 = (cohort == 2015)

. xtreg peducost 1.male#1.dummy_2015 male $ctrlvar i.urbrur b2005.year i.dummy_2015, re vce(cl pid)
note: 1.dummy_2015 omitted because of collinearity

Random-effects GLS regression                   Number of obs     =     31,735
Group variable: pid                             Number of groups  =      6,836

R-sq:                                           Obs per group:
     within  = 0.1524                                         min =          1
     between = 0.2032                                         avg =        4.6
     overall = 0.1707                                         max =          6

                                                Wald chi2(18)     =    3477.54
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                (Std. Err. adjusted for 6,836 clusters in pid)
---------------------------------------------------------------------------------
                |               Robust
       peducost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
male#dummy_2015 |
           1 1  |   -.720143    .891734    -0.81   0.419     -2.46791    1.027624
                |
           male |   1.362174   .5742116     2.37   0.018     .2367403    2.487609
         dadage |   1.108523   .4977059     2.23   0.026     .1330374    2.084009
       dadagesq |  -.0124439   .0051745    -2.40   0.016    -.0225856   -.0023022
         momage |   1.653206   .4077752     4.05   0.000     .8539813    2.452431
       momagesq |  -.0164883   .0043175    -3.82   0.000    -.0249505   -.0080262
                |
         dadedu |
   high_school  |   2.382997   .8907487     2.68   0.007     .6371611    4.128832
    university  |   10.57592   .9752616    10.84   0.000     8.664444     12.4874
                |
         momedu |
   high_school  |   3.838404   .8931592     4.30   0.000     2.087844    5.588964
    university  |   12.82762   1.061476    12.08   0.000     10.74717    14.90807
                |
         urbrur |
      big_city  |  -8.737641   .8099345   -10.79   0.000    -10.32508   -7.150199
          city  |    -9.8842   .7329264   -13.49   0.000    -11.32071   -8.447691
         rural  |  -15.79478   .8363283   -18.89   0.000    -17.43395   -14.15561
                |
           year |
          2006  |   3.182023   .2959997    10.75   0.000     2.601874    3.762171
          2007  |   11.30476    .491985    22.98   0.000     10.34049    12.26903
          2015  |   11.79945   .6385644    18.48   0.000     10.54789    13.05101
          2016  |   13.24131   .6740927    19.64   0.000     11.92011    14.56251
          2017  |   15.41846   .7222268    21.35   0.000     14.00292      16.834
                |
   1.dummy_2015 |          0  (omitted)
          _cons |  -54.29597   10.39154    -5.23   0.000    -74.66301   -33.92893
----------------+----------------------------------------------------------------
        sigma_u |  12.830406
        sigma_e |  23.443425
            rho |  .23049036   (fraction of variance due to u_i)
---------------------------------------------------------------------------------
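For reference, in this specification the coefficient on 1.male#1.dummy_2015 is the difference in the male coefficient between the 2015 and 2005 cohorts, so a sketch like the following, run right after the xtreg above, would report that difference and the implied 2015-cohort effect. It uses only the coefficient names shown in the output. Whether this is a valid comparison when 1.dummy_2015 itself is omitted is the part I am unsure about.

```stata
* Run immediately after the xtreg command above.
* Wald test of H0: the male coefficient is the same in both cohorts.
* With a single restriction this should agree with the z test
* already reported for the interaction row.
test 1.male#1.dummy_2015

* Implied partial effect of male in the 2015 cohort:
* 2005-cohort effect (male) plus the cohort difference (interaction).
lincom male + 1.male#1.dummy_2015
```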
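The problem is that dummy_2015 (which equals one if a person is in the 2015 cohort) is omitted.
I think dummy_2015 and the time dummies cannot be used together because of perfect multicollinearity: every 2015-cohort observation falls in 2015-2017 and every 2005-cohort observation in 2005-2007, so dummy_2015 equals the sum of the 2015, 2016, and 2017 year dummies.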
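A quick way to check this on the appended data, as a sketch only: it assumes each cohort is observed only in its own three years (2005-2007 or 2015-2017), and late_years is a variable name made up here for illustration.

```stata
* Hypothetical helper variable: 1 in the 2015-2017 survey years, 0 otherwise
gen byte late_years = inlist(year, 2015, 2016, 2017)

* If every row agrees, dummy_2015 is an exact linear combination of the
* 2015, 2016 and 2017 year dummies, which is why xtreg drops it.
* assert stops with an error if any observation disagrees.
assert dummy_2015 == late_years

* Cross-tabulation showing which years each cohort is observed in
tabulate year cohort
```

If the assert passes, the omission comes from the design of the two cohorts rather than from a coding mistake in dummy_2015.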
One solution would be to use only cross-sectional data (for example, combining the 2005 and 2015 waves).
However, for my own reasons, I want to use the two panel datasets simultaneously.
In this case, how can I test whether the partial effect of gender differs between the two cohorts?
Thank you for taking the time to read this question.