I have a peculiar issue I wish to share with you, in the hope of receiving some advice.
I have the following panel dataset (unbalanced), with four variables:
Industry-code  Year  Industry-sale  Number of firms in industry
12             2001  34014          5
12             2002  35402          4
12             2003  29473          5
12             2004  .              5
12             2005  29044          7
12             2006  31024          7
12             2007  32209          10
12             2008  33218          9
13             2004  5162           5
13             2005  .
13             2006  5234           6
…              …     …              …
I have to run this regression:
Industry-sale = a + year + error
Specifically, I want to run this regression for each year, using data from the previous 5 years. In other words, I want a rolling regression for each industry-year, subject to the following conditions:
- within each industry, for each year, estimate the regression: industry-sale = a + year + error
- condition A: if any of the previous 5 years has a missing value in either the dependent or the independent variable, no estimate is given; that is, Stata should return a missing value;
- condition B: if in any of the previous 5 years the number of firms in the industry is below 5 (i.e., 4 or fewer), no estimate is given; that is, Stata should return a missing value.
The command
rolling regress_SE = _se[year], window(5): regress industry_sale year
almost does this job. However, it does not account for missing values: if a value is missing within the window, it estimates the regression on the remaining 4 years, whereas I want no estimate to be calculated or stored (condition A above). Likewise, if the number of firms is below 5, it still runs the regression, whereas I want it not to (condition B above).
Can anyone help me?
Thanks
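
One possible way to enforce both conditions is sketched below; it is an illustration under assumptions, not a tested solution. It assumes the variables are named industry, year, industry_sale and n_firms (hypothetical names, since Stata variable names cannot contain hyphens), and that "the previous 5 years" means the 5-year window ending in the current year, which is how window(5) behaves. The idea is to blank out sales in years with fewer than 5 firms, store the number of observations each window's regression actually used (e(N)), and discard any estimate based on fewer than 5 observations.

```stata
* Assumed (hypothetical) variable names: industry, year, industry_sale, n_firms
xtset industry year

* Condition B: treat sales as missing when the industry has fewer than 5 firms
* (or when the firm count itself is missing)
generate double sale_ok = industry_sale if !missing(n_firms) & n_firms >= 5

* Rolling regression; also keep the number of observations used in each window.
* The clear option replaces the data in memory with the results,
* so save the original dataset before running this.
rolling regress_SE = (_se[year]) n_obs = (e(N)), window(5) clear: regress sale_ok year

* Conditions A and B: keep an estimate only if all 5 years entered the regression
replace regress_SE = . if n_obs < 5
```

The resulting dataset should have one row per industry and window, identified by the variables start and end. Merging it back onto the original data by industry and end (matched to year) would attach each estimate to its industry-year. If the window should instead cover only strictly earlier years, the estimate would be matched to end + 1.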