Hi all,

I am running a simple regression on products. My aim is to see whether, at a more aggregate level (categories of products called "atc3"), a recall has a negative impact on the sales of that aggregate category. Of course, this is the expected result.
However, starting from an unbalanced panel dataset, I need to make it balanced before running the regression, which I do as follows:

Code:
bysort idatc3 (Year): gen byte panelsize = _N
sum panelsize
drop if panelsize < r(max)
Now, this operation drops, among others, a specific atc3 called A3G (again, the meaning does not matter; it is just a category of products), which has a recall in 2000.
What I then estimate is:
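For reference, here is a minimal sketch of how the categories removed by the balancing step could be inspected before they are dropped. It assumes it is run on the unbalanced data (before the `drop` above); the variable names `npanel`, `incomplete` and `tag` are hypothetical helpers introduced only for illustration.

```stata
* Sketch: flag incomplete panels instead of dropping them right away
bysort idatc3 (Year): gen byte npanel = _N
summarize npanel, meanonly
gen byte incomplete = npanel < r(max)

* Tag one row per category, then count complete vs. incomplete categories
egen byte tag = tag(idatc3)
tabulate incomplete if tag

* Recall activity within the categories that balancing would remove
tabstat recalls_normalized if incomplete, by(idatc3) statistics(n mean max)
```

This only describes which categories are lost and how much recall variation they carry; it does not change the estimation sample.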

Code:
eststo clear
eststo: xtreg y recalls_normalized L.recalls_normalized L2.recalls_normalized i.Year, fe vce(cluster idatc3)
eststo: xtreg y recalls_normalized L.recalls_normalized avg_prd_sq mean_agefirm_byatc mean_agefirm_squared hhi share_generics i.Year average_age_prodbyatc3, fe vce(cluster idatc3)
The problem is in the results. In particular, when I run the code after dropping every atc3 whose panel size is below the maximum panel size (i.e. when I balance the panel), recalls are mostly not significant. The output for current, once-lagged and twice-lagged recalls is as follows:

Code:
. esttab, drop(*Year _cons)

------------------------------------------------------------
                      (1)             (2)             (3)   
                        y               y               y   
------------------------------------------------------------
recalls_no~d      -0.0102*       -0.00787         -0.0106   
                  (-2.16)         (-1.28)         (-0.95)   

L.recalls_~d      -0.0144         -0.0102         -0.0184   
                  (-1.88)         (-1.32)         (-1.82)   

L2.recalls~d     -0.00521                        -0.00920   
                  (-0.72)                         (-1.02)   

avg_prd_sq                       -0.00321        -0.00585*  
                                  (-1.48)         (-1.98)   

mean_agefi~c                       0.0107         0.00670   
                                   (0.61)          (0.39)   

mean_agefi~d                    -0.000160       -0.000109   
                                  (-1.03)         (-0.74)   

hhi                                 0.662           0.253   
                                   (1.25)          (0.40)   

share_gene~1                        0.195         -0.0413   
                                   (0.57)         (-0.12)   

average_ag~3                        0.112           0.184   
                                   (1.56)          (1.94)   
------------------------------------------------------------
N                    1355            1598            1332   
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
When I instead add back only the 4 observations for A3G (observed from 2000 to 2003), everything changes noticeably in the expected direction. In particular, running the same code with A3G included, I get much more significant results:

Code:
. esttab, drop(*Year _cons)

------------------------------------------------------------
                      (1)             (2)             (3)   
                        y               y               y   
------------------------------------------------------------
recalls_no~d      -0.0126*        -0.0103         -0.0121   
                  (-2.15)         (-1.46)         (-1.06)   

L.recalls_~d      -0.0165*        -0.0429***      -0.0198*  
                  (-2.23)         (-3.97)         (-2.10)   

L2.recalls~d      -0.0157***                      -0.0160***
                  (-3.68)                         (-4.33)   

avg_prd_sq                       -0.00328        -0.00587*  
                                  (-1.51)         (-1.99)   

mean_agefi~c                       0.0118         0.00687   
                                   (0.68)          (0.40)   

mean_agefi~d                    -0.000161       -0.000111   
                                  (-1.03)         (-0.75)   

hhi                                 0.660           0.250   
                                   (1.25)          (0.40)   

share_gene~1                        0.183         -0.0329   
                                   (0.54)         (-0.09)   

average_ag~3                        0.115           0.185   
                                   (1.59)          (1.95)   
------------------------------------------------------------
N                    1357            1601            1334   
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
I am quite puzzled by this. Can a single atc3 category really change the estimates this much? Please note that I have a total of 295 atc3 categories observed over a maximum of 7 years, from 1997 to 2003; in balancing I lose 24 of them (those with a panel size below 7), among which A3G.
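To make the comparison reproducible, here is a minimal sketch of how both samples can be estimated from the same unbalanced dataset, so that the only difference between the two columns is A3G. It assumes the data are already `xtset` on `idatc3` and `Year`, and that a string variable `atc3` holding the category code exists; `atc3`, `n_years` and `is_a3g` are hypothetical names used only for illustration.

```stata
* Assumption: atc3 is a string variable with the category code (hypothetical name)
gen byte is_a3g = (atc3 == "A3G")

* Panel length per category, computed on the unbalanced data
bysort idatc3 (Year): gen byte n_years = _N
summarize n_years, meanonly
local T = r(max)

* The handful of observations that distinguish the two samples
list idatc3 Year y recalls_normalized if is_a3g, sepby(idatc3)

* Same specification: balanced sample, then balanced sample plus A3G
eststo clear
eststo: xtreg y recalls_normalized L.recalls_normalized L2.recalls_normalized i.Year ///
    if n_years == `T', fe vce(cluster idatc3)
eststo: xtreg y recalls_normalized L.recalls_normalized L2.recalls_normalized i.Year ///
    if n_years == `T' | is_a3g, fe vce(cluster idatc3)
esttab, drop(*Year _cons)
```

Because the lags are evaluated before the `if` condition is applied, only the A3G years with the required lags available enter the second estimation sample.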

Thank you,

Federico