Dear All

I have a peculiar issue I wish to share with you, in the hope of receiving some advice.
I have the following unbalanced panel dataset with four variables:

Industry-code  Year  Industry-sale  Number of firms in industry
12             2001  34014          5
12             2002  35402          4
12             2003  29473          5
12             2004  .              5
12             2005  29044          7
12             2006  31024          7
12             2007  32209          10
12             2008  33218          9
13             2004  5162           5
13             2005  .
13             2006  5234           6
…              …     …              …

I need to run this regression:

Industry-sale = a + b*year + error
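For a single window this is just a plain regress. The sketch below uses the industry 12 rows for 2003 to 2007 from the table above (the window for focal year 2008). The variable name sale is a hypothetical stand-in for Industry-sale, because hyphens are not allowed in Stata variable names. It shows that regress silently drops the year with a missing sale and estimates on the remaining 4 years.

```stata
* Hypothetical variable name: sale = Industry-sale.
* Industry 12, window 2003-2007 (focal year 2008), rows from the example data.
clear
input int year double sale
2003 29473
2004     .
2005 29044
2006 31024
2007 32209
end

* regress drops the observation with a missing sale without complaint.
regress sale year
display "observations used: " e(N) "   SE of year: " _se[year]
```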

Specifically, I want this regression to be run for each year using data from the previous 5 years. In other words, I want a rolling regression for each industry-year, computed as follows and subject to the following conditions:
  • within each industry and for each year, estimate the regression industry-sale = a + b*year + error.
Note: the estimation must use only the 5 years before the focal year. For example, for 2008 the regression uses data from 2003, 2004, 2005, 2006 and 2007. The window width stays fixed at 5 years as the focal year moves ahead (e.g., 2002, 2003, etc.).
  • condition A: if any of the previous 5 years has a missing value in either the dependent or the independent variable, no estimate should be produced; that is, Stata should return a missing value.
  • condition B: if in any of the previous 5 years the number of firms in the industry is below 5 (i.e., 4 or fewer), no estimate should be produced; that is, Stata should return a missing value.
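The procedure above could be sketched as follows. This is a minimal sketch, not the only way to do it: instead of rolling, it loops over observations and runs regress only when the whole window t-5 to t-1 passes both conditions, storing _se[year] in a new variable. The names industry, sale, nfirms, ok, window_ok and se_year are hypothetical. It assumes that a year absent from an unbalanced panel counts as missing under condition A. The data are the rows shown in the post; the number of firms for industry 13 in 2005 is not shown there, so it is entered as missing.

```stata
* Hypothetical names: industry = Industry-code, sale = Industry-sale,
* nfirms = Number of firms in industry.
clear
input int industry int year double sale int nfirms
12 2001 34014  5
12 2002 35402  4
12 2003 29473  5
12 2004     .  5
12 2005 29044  7
12 2006 31024  7
12 2007 32209 10
12 2008 33218  9
13 2004  5162  5
13 2005     .  .
13 2006  5234  6
end

xtset industry year

* A year is usable only if sale is observed and there are at least 5 firms.
* Stata treats missing as larger than any number, so nfirms >= 5 alone
* would be true for a missing nfirms; hence the explicit !missing() check.
generate byte ok = !missing(sale) & !missing(nfirms) & nfirms >= 5

* The window t-5 ... t-1 is valid only if all five years are usable.
* With xtset data, L#.ok is missing when that year is absent from the
* panel, and (missing == 1) is false, so gaps also invalidate the window.
generate byte window_ok = L1.ok == 1 & L2.ok == 1 & L3.ok == 1 ///
    & L4.ok == 1 & L5.ok == 1

generate double se_year = .

* Run the regression only for industry-years whose window is valid.
forvalues i = 1/`=_N' {
    if window_ok[`i'] == 1 {
        local ind = industry[`i']
        local t   = year[`i']
        quietly regress sale year if industry == `ind' & inrange(year, `t' - 5, `t' - 1)
        quietly replace se_year = _se[year] in `i'
    }
}

list industry year sale nfirms ok window_ok se_year, sepby(industry)
```

With these example rows every se_year stays missing. Each complete 5-year window for industry 12 contains 2004 (missing sale), and those ending before 2004 also contain 2002 (4 firms). Industry 13 has no complete window. The explicit loop is simple to check but may be slow on a large dataset.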
Once these regressions have been run for the whole dataset, I would like the standard error of the coefficient on year to be stored as a variable in the dataset. I am aware of the command rolling. For example, the command:

rolling regress_SE = _se[year], window(5): regress industry_sale year

does most of this job. However, the problem with this command is that it does not account for missing values. If there is a missing value, it calculates the regression over the remaining 4 years, whereas I want Stata not to calculate or store an estimate (condition A above). Likewise, if the number of firms is below 5, the command still runs the regression, whereas I want no estimate to be calculated (condition B above).

Can anyone help me?
Thanks