Hi Statalist,

So, I have a problem constructing two variables. I briefly describe my data below: the dataset consists of firms that make products from molecules. Each firm can have several products, and each product uses one molecule. The variable data_lancio is the launch date of the product (and of its associated molecule), while Year is the year in which the other variables (not shown here) are available. Here is a -dataex- excerpt:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(idfirm idproduct) float(id_molecule Year) int data_lancio
1  287 1104 2013 2013
1  287 1104 2013 2013
1  287 1104 2013 2013
1  287 1104 2013 2013
1  287 1104 2013 2013
1  287 1104 2013 2013
1  287 1104 2013 2013
1  287 1104 2013 2013
1  287 1104 2014 2013
1  287 1104 2014 2013
1  287 1104 2014 2013
1  287 1104 2014 2013
1  287 1104 2014 2013
1  287 1104 2014 2013
1  287 1104 2014 2013
1  287 1104 2014 2013
1  287 1104 2015 2013
1  287 1104 2015 2013
1  287 1104 2015 2013
1  287 1104 2015 2013
1  287 1104 2015 2013
1  287 1104 2015 2013
1  287 1104 2015 2013
1  287 1104 2015 2013
1  287 1104 2015 2013
1 1474  895 2013 2013
1 1474  895 2013 2013
1 1474  895 2014 2013
1 1474  895 2014 2013
1 1474  895 2014 2013
1 1474  895 2014 2013
1 1474  895 2014 2013
1 1474  895 2014 2013
1 1474  895 2014 2013
1 1474  895 2014 2013
1 1474  895 2015 2013
1 1474  895 2015 2013
1 1474  895 2015 2013
1 1474  895 2015 2013
1 1474  895 2015 2013
1 1474  895 2015 2013
1 1474  895 2015 2013
1 3026  301 2014 2014
1 3026  301 2014 2014
1 3026  301 2014 2014
1 3026  301 2014 2014
1 3026  301 2014 2014
1 3026  301 2014 2014
1 3026  301 2014 2014
1 3026  301 2014 2014
1 3026  301 2014 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3026  301 2015 2014
1 3051  301 2008 2008
1 3051  301 2008 2008
1 3051  301 2008 2008
1 3051  301 2008 2008
1 3051  301 2008 2008
1 3051  301 2008 2008
1 3051  301 2008 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2009 2008
1 3051  301 2010 2008
1 3051  301 2010 2008
1 3051  301 2010 2008
1 3051  301 2010 2008
1 3051  301 2010 2008
1 3051  301 2010 2008
1 3051  301 2010 2008
end
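Both dummies depend on the structure just described: each product should have exactly one molecule and one launch date, and each firm-product-year is repeated on several rows. A quick check of that structure may be worth running first. This is a minimal sketch on a few rows copied from the excerpt above; mol_varies and launch_varies are hypothetical helper names.

```stata
* Sketch: check that each product has one molecule and one launch date,
* and count how often each firm-product-year is repeated.
* The rows are a small subset of the -dataex- excerpt above.
clear
input double(idfirm idproduct) float(id_molecule Year) int data_lancio
1  287 1104 2013 2013
1  287 1104 2013 2013
1  287 1104 2014 2013
1 3026  301 2014 2014
1 3051  301 2008 2008
1 3051  301 2009 2008
end

* Within each product the sort puts the smallest value first and the largest
* last, so first != last means the value varies within the product
bysort idproduct (id_molecule): gen byte mol_varies = id_molecule[1] != id_molecule[_N]
bysort idproduct (data_lancio): gen byte launch_varies = data_lancio[1] != data_lancio[_N]
assert mol_varies == 0 & launch_varies == 0

* Number of copies of each idfirm-idproduct-Year combination
duplicates report idfirm idproduct Year
```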
Now, my aim is to generate two dummies. The first takes the value 1 if the molecule is observed for the first time in the whole sample and 0 otherwise (for instance, molecule 736 is first observed in the sample in 1986, so I would like a 1 for molecule 736 only at that launch date). The second takes the value 1 if the molecule is observed for the first time in a given firm (that is, the first time firm i uses the molecule) and 0 otherwise. I tried, UNSUCCESSFULLY, to do this with the following:

Code:
bysort id_molecule (Year): gen counter_new_marke = _n == 1 // this should be for new to the market
bysort idfirm id_molecule (idpr Year): gen counter_new_fir = _n == 1 // this should be for new to the firm
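One possible way to build both dummies without depending on which row happens to come first. `_n == 1` flags a single observation per group, and when several rows share the smallest Year (as the duplicated rows here do) the order of tied observations is arbitrary unless the sort is made stable; it also orders by Year rather than by the launch date. The sketch below instead compares each row's data_lancio with the earliest launch date of its molecule, in the whole sample for the market dummy and within the firm for the firm dummy, using `egen min()` with `by()`. Assumptions: each product has one molecule, and the dummy should be 1 only in the launch year (drop the `Year == data_lancio` condition to keep it in every year of the product). first_launch_mkt, first_launch_firm, new_to_market and new_to_firm are hypothetical names.

```stata
* Sketch: dummies for molecules new to the market and new to the firm.
* Assumption: the flag is wanted only in the launch year.
* The rows are a small subset of the -dataex- excerpt above.
clear
input double(idfirm idproduct) float(id_molecule Year) int data_lancio
1  287 1104 2013 2013
1  287 1104 2013 2013
1  287 1104 2014 2013
1 3026  301 2014 2014
1 3026  301 2015 2014
1 3051  301 2008 2008
1 3051  301 2008 2008
1 3051  301 2009 2008
end

* Earliest launch date of each molecule across the whole sample
egen first_launch_mkt = min(data_lancio), by(id_molecule)
* Earliest launch date of each molecule within each firm
egen first_launch_firm = min(data_lancio), by(idfirm id_molecule)

* 1 on every row of the launch year of the product(s) that first use the
* molecule; products launched in the same first year are all flagged
gen byte new_to_market = (data_lancio == first_launch_mkt) & (Year == data_lancio)
gen byte new_to_firm   = (data_lancio == first_launch_firm) & (Year == data_lancio)
```

Here molecule 301 is first launched with product 3051 in 2008, so product 3026 (launched 2014) gets 0 on both dummies. With only one firm in the excerpt the two dummies coincide; they differ once another firm adopts a molecule that is already on the market.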
Another complication is that, after creating the counters, I need to collapse the dataset twice, first by idfirm, idproduct and Year:
Code:
collapse (sum) salesmnf counter_new_fir counter_new_marke (min) anno_numeric data_lancio (first) id_molecule firstposition molecule atc4 crp prd internationalp seq (last) sequence, by(idfirm idprod Year)
and the second time by idfirm and Year:
Code:
collapse (mean) avsales avsales_existing avsales_new (first) agepr molecule idprod salesmnf data_lancio anno_numeric firstpos atc4 crp prd internationalp (max) numero_nuovi, by(idfirm Year)
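For the dummies to survive the collapses, the choice of statistic matters. Each idfirm-idproduct-Year cell appears on several rows (eight rows for product 287 in 2013 in the excerpt), so summing a flag that is set on every row of the launch year returns the number of copies rather than 1, whereas `(max)` keeps a 0/1 dummy. At the firm-year level, `(max)` says whether the firm introduced at least one new molecule that year, and `(sum)` counts the products that did. A minimal sketch on a few rows of the excerpt, with the two flags already typed in (1 on every row of the launch year of a molecule's first appearance); new_to_market, new_to_firm, any_new_mkt, n_new_mkt and the like are hypothetical names, and only the variables needed for the flags are kept.

```stata
* Sketch: carry the dummies through both collapses.
* Rows are a subset of the -dataex- excerpt; flags typed in by hand.
clear
input double(idfirm idproduct) float(Year) int data_lancio byte(new_to_market new_to_firm)
1  287 2013 2013 1 1
1  287 2013 2013 1 1
1  287 2014 2013 0 0
1 1474 2013 2013 1 1
1 1474 2013 2013 1 1
1 3026 2014 2014 0 0
1 3051 2008 2008 1 1
1 3051 2009 2008 0 0
end

* Step 1 (firm-product-year): rows are duplicated within each cell, so (sum)
* would count copies (2 for product 287 in 2013 here); (max) keeps a 0/1 dummy
collapse (max) new_to_market new_to_firm (min) data_lancio, by(idfirm idproduct Year)

* Step 2 (firm-year): (max) = at least one product with a new molecule that
* year; (sum) = number of such products (products, not distinct molecules)
collapse (max) any_new_mkt=new_to_market any_new_firm=new_to_firm ///
         (sum) n_new_mkt=new_to_market n_new_firm=new_to_firm, by(idfirm Year)
```

In the two collapse commands above this would mean listing the flags under `(max)` rather than `(sum)` in the first one and adding them to the second, for example next to `(max) numero_nuovi`. The `(sum)` count is per product, so two products of the same firm introducing the same new molecule in the same year would be counted twice.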
Can someone please help me create the dummy variables for molecules new to the market and new to the firm, and keep them through both collapses?