I have daily panel data on stock returns with the following variables: firm, year, day, return, and number of trades. I'm trying to remove observations based on the number of trades variable.

Criterion: I want to keep only firms that have at least 30 days of trading activity within a year, that is, at least 30 non-missing, non-zero values of number of trades for each firm-year combination.

If a firm-year passes the criterion, meaning it has at least 30 days with non-missing, non-zero values in the number of trades column, I want to keep all daily observations for that firm-year.

If a firm-year does not pass the criterion, I want to remove all daily observations for that firm-year. To clarify: if a firm fails the criterion in one year but passes it in other years, I want to remove only the daily observations for the year that fails.
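A minimal sketch of the filter, assuming the daily data can be loaded as a list of records: it first counts the days with a non-missing, non-zero number of trades for each (firm, year) pair, then keeps every daily row of the firm-years that reach 30 and drops all rows of the others. The field names (firm, year, day, ret, num_trades) and the helper names are hypothetical, blanks are assumed to arrive as None or NaN, and the sample data is made up purely to exercise the logic.

```python
from collections import Counter
import math

MIN_TRADING_DAYS = 30  # threshold from the question

def is_trading_day(num_trades):
    """True if num_trades is non-missing (not None/NaN) and non-zero."""
    if num_trades is None:
        return False
    if isinstance(num_trades, float) and math.isnan(num_trades):
        return False  # NaN != 0 is True, so NaN must be excluded explicitly
    return num_trades != 0

def filter_firm_years(rows, min_days=MIN_TRADING_DAYS):
    """Keep all daily rows of firm-years with at least min_days trading days."""
    # Pass 1: count qualifying trading days per (firm, year)
    counts = Counter(
        (r["firm"], r["year"]) for r in rows if is_trading_day(r["num_trades"])
    )
    # Pass 2: keep every row (including zero/missing days) of passing firm-years;
    # Counter returns 0 for firm-years that never traded
    return [r for r in rows if counts[(r["firm"], r["year"])] >= min_days]

# Made-up sample data (hypothetical)
rows = []
for day in range(1, 31):   # A-2000: 30 trading days -> kept
    rows.append({"firm": "A", "year": 2000, "day": day, "ret": 0.01, "num_trades": 5})
for day in range(1, 41):   # A-2001: only 29 trading days -> dropped
    rows.append({"firm": "A", "year": 2001, "day": day, "ret": 0.0,
                 "num_trades": 3 if day <= 29 else 0})
for day in range(1, 61):   # B-2000: 30 trading days out of 60 -> all 60 kept
    rows.append({"firm": "B", "year": 2000, "day": day, "ret": 0.0,
                 "num_trades": 7 if day % 2 == 0 else float("nan")})

kept = filter_firm_years(rows)
print(len(kept))                                         # 90
print(sorted({(r["firm"], r["year"]) for r in kept}))    # [('A', 2000), ('B', 2000)]
```

The same two steps (a grouped count, then a merge back onto the daily rows) carry over to whatever tool holds the CRSP data. Because the count is computed before anything is removed, days with zero or missing trades are still kept inside a firm-year that passes.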

I have been trying to write code for this, but I'm not very experienced and haven't managed to get it working. Any help would be greatly appreciated!



Extra context: I'm using Compustat (compd.funda) merged with CRSP (crspa.dsf) through the link table (crsp.ccmxpf_linktable) to recreate the R^2 measure from Chen, Goldstein and Jiang (2007). This measure comes from a regression of daily firm returns on market returns and industry returns. The authors filter the data by removing firm-years with fewer than 30 days of trading activity, and I'm struggling to replicate this filter.
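For reference, the regression described above can be written roughly as follows. The notation is illustrative rather than the paper's, and running one regression per firm-year is an assumption suggested by the firm-year filter; the exact specification should be checked against Chen, Goldstein and Jiang (2007). Here r_{i,t} is firm i's return on day t, r_{m,t} the market return, and r_{j,t} the return of firm i's industry j.

```latex
r_{i,t} = \beta_{i,0} + \beta_{i,m}\, r_{m,t} + \beta_{i,j}\, r_{j,t} + \varepsilon_{i,t},
\qquad
R^2_i = 1 - \frac{\sum_t \hat{\varepsilon}_{i,t}^{\,2}}{\sum_t \left( r_{i,t} - \bar{r}_i \right)^2}
```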


Thanks,


Lucas Balaminut