My data consist of two cohorts: a 2005 cohort and a 2015 cohort.
The first cohort starts in 2005 and ends in 2007.
The second cohort starts in 2015 and ends in 2017.
I appended these two panel datasets; part of the result is listed below.
Code:
. list pid year peducost male cohort if 6904 <= pid & pid <= 10005, sep(15)
       +-----------------------------------------+
       |   pid   year   peducost   male   cohort |
       |-----------------------------------------|
20710. |  6904   2005          0      1     2005 |
20711. |  6904   2006          0      1     2005 |
20712. |  6904   2007          0      1     2005 |
20713. |  6905   2005         30      1     2005 |
20714. |  6905   2006         50      1     2005 |
20715. |  6905   2007         58      1     2005 |
20716. |  6906   2005         12      1     2005 |
20717. |  6906   2006         27      1     2005 |
20718. |  6906   2007         22      1     2005 |
20719. |  6907   2005         18      1     2005 |
20720. |  6907   2006         27      1     2005 |
20721. |  6907   2007         18      1     2005 |
20722. |  6908   2005          0      1     2005 |
20723. |  6908   2006         75      1     2005 |
20724. |  6908   2007         26      1     2005 |
       |-----------------------------------------|
20725. | 10001   2015          0      0     2015 |
20726. | 10001   2016          0      0     2015 |
20727. | 10001   2017          0      0     2015 |
20728. | 10002   2015          9      0     2015 |
20729. | 10002   2016          0      0     2015 |
20730. | 10002   2017          0      0     2015 |
20731. | 10003   2015          0      0     2015 |
20732. | 10003   2016          0      0     2015 |
20733. | 10003   2017         34      0     2015 |
20734. | 10004   2015          0      1     2015 |
20735. | 10004   2016          0      1     2015 |
20736. | 10004   2017          0      1     2015 |
20737. | 10005   2015          0      0     2015 |
20738. | 10005   2016          0      0     2015 |
20739. | 10005   2017          0      0     2015 |
       +-----------------------------------------+

That is, I am using two panel datasets simultaneously (the 2005 cohort set and the 2015 cohort set).
I want to know whether the partial effect of gender on private education costs differs between the two cohorts.
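For reference, here is what "the partial effect differs by cohort" means in a linear model with an interaction. The symbols below are labels introduced for this sketch, not names from the output: d_i is the 2015-cohort indicator (dummy_2015), β_1 is the coefficient on male, β_2 the coefficient on male × d_i, and x_it collects all other regressors (controls, urbrur, year dummies, constant). Because male is binary, its partial effect is the difference in expected cost between males and females, holding the other regressors fixed:

```latex
\text{peducost}_{it} = \beta_1\,\text{male}_i + \beta_2\,(\text{male}_i \times d_i) + \mathbf{x}_{it}'\boldsymbol{\gamma} + u_i + \varepsilon_{it},
\qquad d_i = \mathbf{1}\{\text{cohort}_i = 2015\}

E[\text{peducost}_{it} \mid \text{male}_i = 1, d_i, \mathbf{x}_{it}] - E[\text{peducost}_{it} \mid \text{male}_i = 0, d_i, \mathbf{x}_{it}]
  = \beta_1 + \beta_2\, d_i =
  \begin{cases}
    \beta_1 & \text{2005 cohort } (d_i = 0) \\
    \beta_1 + \beta_2 & \text{2015 cohort } (d_i = 1)
  \end{cases}
```

Under this specification, equal partial effects in the two cohorts correspond to the single restriction β_2 = 0.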
So I ran a regression with an interaction between male and the cohort dummy, as below.
Code:
. xtset pid year
panel variable: pid (unbalanced)
time variable: year, 2005 to 2017, but with gaps
delta: 1 unit
. global ctrlvar "dadage dadagesq momage momagesq i.dadedu i.momedu"
.
. gen dummy_2015 = (cohort == 2015)
. xtreg peducost 1.male#1.dummy_2015 male $ctrlvar i.urbrur b2005.year i.dummy_2015, re vce(cl pid)
note: 1.dummy_2015 omitted because of collinearity
Random-effects GLS regression                   Number of obs     =     31,735
Group variable: pid                             Number of groups  =      6,836
R-sq:                                           Obs per group:
     within  = 0.1524                                         min =          1
     between = 0.2032                                         avg =        4.6
     overall = 0.1707                                         max =          6
                                                Wald chi2(18)     =    3477.54
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                                   (Std. Err. adjusted for 6,836 clusters in pid)
---------------------------------------------------------------------------------
| Robust
peducost | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
male#dummy_2015 |
1 1 | -.720143 .891734 -0.81 0.419 -2.46791 1.027624
|
male | 1.362174 .5742116 2.37 0.018 .2367403 2.487609
dadage | 1.108523 .4977059 2.23 0.026 .1330374 2.084009
dadagesq | -.0124439 .0051745 -2.40 0.016 -.0225856 -.0023022
momage | 1.653206 .4077752 4.05 0.000 .8539813 2.452431
momagesq | -.0164883 .0043175 -3.82 0.000 -.0249505 -.0080262
|
dadedu |
high_school | 2.382997 .8907487 2.68 0.007 .6371611 4.128832
university | 10.57592 .9752616 10.84 0.000 8.664444 12.4874
|
momedu |
high_school | 3.838404 .8931592 4.30 0.000 2.087844 5.588964
university | 12.82762 1.061476 12.08 0.000 10.74717 14.90807
|
urbrur |
big_city | -8.737641 .8099345 -10.79 0.000 -10.32508 -7.150199
city | -9.8842 .7329264 -13.49 0.000 -11.32071 -8.447691
rural | -15.79478 .8363283 -18.89 0.000 -17.43395 -14.15561
|
year |
2006 | 3.182023 .2959997 10.75 0.000 2.601874 3.762171
2007 | 11.30476 .491985 22.98 0.000 10.34049 12.26903
2015 | 11.79945 .6385644 18.48 0.000 10.54789 13.05101
2016 | 13.24131 .6740927 19.64 0.000 11.92011 14.56251
2017 | 15.41846 .7222268 21.35 0.000 14.00292 16.834
|
1.dummy_2015 | 0 (omitted)
_cons | -54.29597 10.39154 -5.23 0.000 -74.66301 -33.92893
----------------+----------------------------------------------------------------
sigma_u | 12.830406
sigma_e | 23.443425
rho | .23049036 (fraction of variance due to u_i)
---------------------------------------------------------------------------------

The problem is that dummy_2015 (equal to one if a person belongs to the 2015 cohort) is omitted.
I think dummy_2015 and the year dummies cannot be used together because of perfect multicollinearity: every 2015-cohort observation falls in 2015, 2016, or 2017, so dummy_2015 is exactly the sum of the 2015, 2016, and 2017 year dummies.
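The collinearity can be checked on a small design matrix. The sketch below is an illustration in plain Python rather than Stata, using a hypothetical toy panel (two people per cohort, each observed in the cohort's three survey years) instead of the actual survey data. It builds a constant plus year dummies with 2005 as the base (mirroring `b2005.year`), optionally appends the cohort dummy, and compares matrix ranks computed by exact Gaussian elimination. All names (`obs`, `design_row`, `rank`, `DUMMY_YEARS`) are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy panel: two people per cohort, each observed in the
# cohort's three survey years. Mirrors the structure of the appended data,
# not its actual values.
YEARS_2005 = [2005, 2006, 2007]
YEARS_2015 = [2015, 2016, 2017]
obs = [
    (cohort, year)
    for cohort, years in ((2005, YEARS_2005), (2015, YEARS_2015))
    for _person in range(2)
    for year in years
]

# Year dummies with 2005 as the omitted base category (like b2005.year).
DUMMY_YEARS = [2006, 2007, 2015, 2016, 2017]


def design_row(cohort, year, with_cohort_dummy):
    """Constant, year dummies, and optionally the 2015-cohort dummy."""
    row = [1] + [int(year == y) for y in DUMMY_YEARS]
    if with_cohort_dummy:
        row.append(int(cohort == 2015))
    return row


def rank(matrix):
    """Matrix rank via exact Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row.
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r


X_years = [design_row(c, y, False) for c, y in obs]
X_full = [design_row(c, y, True) for c, y in obs]

# Columns: 0 = constant, 1-2 = 2006/2007, 3-5 = 2015/2016/2017, 6 = cohort.
# The cohort dummy equals the sum of the 2015-2017 year dummies in every row.
assert all(row[6] == row[3] + row[4] + row[5] for row in X_full)

print("columns with cohort dummy:", len(X_full[0]), "rank:", rank(X_full))
print("columns without cohort dummy:", len(X_years[0]), "rank:", rank(X_years))
```

Adding the cohort dummy raises the column count from 6 to 7 but leaves the rank at 6, so one column is redundant and has to be dropped; in the regression above, Stata drops 1.dummy_2015.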
One solution would be to use only cross-sectional data (for example, pooling the 2005 and 2015 waves).
However, for personal reasons, I want to use both panel datasets together.
In this case, how can I test whether the partial effect of gender differs between the two cohorts?
Thank you for taking the time to read this question.