Hi all,

This is a somewhat borderline question, so apologies if it is a bit off topic.

The situation is the following: I have two datasets, call them P1 and P2. Both are strongly balanced panels of 262 individuals. In the first one (P1), the individuals are observed for 7 time periods (1997 to 2003):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float idatc3 int Year float dummy_3
 1 1997 0
 1 1998 0
 1 1999 0
 1 2000 0
 1 2001 1
 1 2002 0
 1 2003 0
 3 1997 0
 3 1998 0
 3 1999 0
 3 2000 0
 3 2001 0
 3 2002 1
 3 2003 0
 4 1997 0
 4 1998 0
 4 1999 0
 4 2000 0
 4 2001 0
 4 2002 0
 4 2003 0
 5 1997 0
 5 1998 0
 5 1999 0
 5 2000 0
 5 2001 0
 5 2002 0
 5 2003 0
 6 1997 0
 6 1998 0
 6 1999 0
 6 2000 0
 6 2001 0
 6 2002 0
 6 2003 0
 7 1997 0
 7 1998 0
 7 1999 0
 7 2000 0
 7 2001 0
 7 2002 0
 7 2003 0
 8 1997 0
 8 1998 0
 8 1999 0
 8 2000 0
 8 2001 0
 8 2002 0
 8 2003 0
 9 1997 0
 9 1998 0
 9 1999 0
 9 2000 0
 9 2001 0
 9 2002 0
 9 2003 0
10 1997 0
10 1998 0
10 1999 0
10 2000 0
10 2001 0
10 2002 0
10 2003 0
11 1997 0
11 1998 0
11 1999 0
11 2000 0
11 2001 0
11 2002 0
11 2003 0
12 1997 0
12 1998 0
12 1999 0
12 2000 0
12 2001 0
12 2002 0
12 2003 0
13 1997 0
13 1998 0
13 1999 0
13 2000 0
13 2001 0
13 2002 0
13 2003 0
14 1997 0
14 1998 0
14 1999 0
14 2000 0
14 2001 0
14 2002 0
14 2003 0
15 1997 0
15 1998 0
15 1999 0
15 2000 0
15 2001 0
15 2002 0
15 2003 0
end

In the second one (P2), the individuals are observed for 10 time periods (2004 to 2013):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(idatc3 Year dummy_3)
 1 2004 0
 1 2005 0
 1 2006 0
 1 2007 0
 1 2008 1
 1 2009 0
 1 2010 0
 1 2011 0
 1 2012 0
 1 2013 0
 2 2004 0
 2 2005 0
 2 2006 0
 2 2007 0
 2 2008 0
 2 2009 0
 2 2010 0
 2 2011 0
 2 2012 0
 2 2013 0
 3 2004 0
 3 2005 0
 3 2006 0
 3 2007 0
 3 2008 0
 3 2009 0
 3 2010 0
 3 2011 0
 3 2012 0
 3 2013 0
 4 2004 0
 4 2005 0
 4 2006 0
 4 2007 0
 4 2008 0
 4 2009 0
 4 2010 0
 4 2011 0
 4 2012 0
 4 2013 0
 5 2004 0
 5 2005 0
 5 2006 0
 5 2007 0
 5 2008 0
 5 2009 0
 5 2010 0
 5 2011 0
 5 2012 0
 5 2013 0
 6 2004 0
 6 2005 0
 6 2006 0
 6 2007 0
 6 2008 0
 6 2009 0
 6 2010 0
 6 2011 0
 6 2012 0
 6 2013 0
 7 2004 0
 7 2005 0
 7 2006 0
 7 2007 0
 7 2008 0
 7 2009 0
 7 2010 0
 7 2011 0
 7 2012 0
 7 2013 0
 8 2004 0
 8 2005 0
 8 2006 0
 8 2007 0
 8 2008 0
 8 2009 0
 8 2010 0
 8 2011 0
 8 2012 0
 8 2013 0
 9 2004 0
 9 2005 0
 9 2006 0
 9 2007 0
 9 2008 0
 9 2009 0
 9 2010 0
 9 2011 0
 9 2012 0
 9 2013 0
10 2004 0
10 2005 0
10 2006 0
10 2007 0
10 2008 0
10 2009 0
10 2010 0
10 2011 0
10 2012 0
10 2013 0
end

Taken separately, the two datasets produce the expected estimates. In particular, the coefficients on dummy_3 and its lag are expected to be negative (the significance level does not matter at this stage). In P1 and P2 the coefficients are as expected, but when I append P1 and P2 (forming, say, P1P2), the coefficients are not a weighted average of the P1 and P2 coefficients, as I would have expected. Instead, they change sign. Columns (1), (2) and (3) below are P1, P2 and P1P2, respectively:

Code:
. esttab, drop( _cons)

------------------------------------------------------------
                      (1)             (2)             (3)  
                        y               y               y  
------------------------------------------------------------
dummy_3           -0.0914         -0.0194           0.150  
                  (-1.27)         (-0.35)          (1.53)  

L.dummy_3          -0.175*       -0.00719          0.0822  
                  (-2.16)         (-0.10)          (0.95)  

1998.Year               0                               0  
                      (.)                             (.)  

1999.Year          0.0820**                        0.0770**
                   (3.03)                          (2.84)  

2000.Year          0.0997**                        0.0948**
                   (2.90)                          (2.71)  

2001.Year           0.234***                        0.232***
                   (5.89)                          (5.70)  

2002.Year           0.339***                        0.341***
                   (6.23)                          (6.21)  

2003.Year           0.382***                        0.389***
                   (5.50)                          (5.61)  

2005.Year                               0           1.516***
                                      (.)         (18.40)  

2006.Year                         -0.0172           1.496***
                                  (-0.31)         (17.43)  

2007.Year                         0.00242           1.514***
                                   (0.04)         (15.32)  

2008.Year                          0.0621           1.576***
                                   (0.88)         (15.56)  

2009.Year                          0.0839           1.597***
                                   (1.11)         (14.93)  

2010.Year                           0.106           1.617***
                                   (1.28)         (14.13)  

2011.Year                           0.111           1.621***
                                   (1.25)         (13.62)  

2012.Year                          0.0929           1.606***
                                   (1.16)         (14.06)  

2013.Year                           0.115           1.629***
                                   (1.31)         (13.57)  

2004.Year                                           1.478***
                                                  (18.51)  
------------------------------------------------------------
N                    1572            2358            4192

Note that the number of observations seems correct, since each individual loses its first year to the lag: (262x7)-262 = 1572 for P1, (262x10)-262 = 2358 for P2, and [(262x7)+(262x10)]-262 = 4192 for P1P2. The dependent variable is the natural log of sales. My guess is that the problem lies in the junction of P1 and P2, and in particular in the dependent variable, which shows a different trend in the two joined periods (1997-2003 and 2004-2013), as shown in the attached file.
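
A minimal sketch of how the junction could be checked, assuming the two files are saved as P1.dta and P2.dta (placeholder names) and that the dependent variable is called y, as in the table above:

```stata
* Sketch only: P1.dta and P2.dta are placeholder file names; y is the
* dependent variable (log sales) as labelled in the table above.
use P1, clear
append using P2
xtset idatc3 Year

* Only 1997 is lost to the lag in the appended panel, so the usable
* sample should be 262*(7 + 10) - 262 observations.
display 262*(7 + 10) - 262
count if !missing(y, dummy_3, L.dummy_3)

* Level of y by year: a break between 2003 and 2004 would show up here.
tabstat y, by(Year) statistics(mean sd n)

* Within-individual change in y: compare 2004 with the other years.
generate dy = D.y
tabstat dy, by(Year) statistics(mean sd n)
```

If the mean of dy in 2004 is far from its value in the other years, that would point to a level shift at the junction rather than only a change in trend. This would be consistent with the year dummies in column (3) jumping from about 0.39 in 2003 to about 1.48 in 2004.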

The question, therefore, is: is there a way to correct for the different trends? Or is there, in your opinion, another reason for these unexpected results?

Thank you in advance,

Federico