Consider the following example data:
identifier  type  sales  unit
00.00.01    1     400    kg
00.00.01    0     50     kg
00.00.01    0     100    kg
00.01.01    1     500    number
01.01.01    1     200    sqft
01.01.01    0     200    sqft
01.01.02    1     300    kg
01.01.02    0     25     kg
01.01.02    0     50     kg



In the actual data (which I'm not sure I'd be allowed to share a sample of), there are about 4000 observations, with ~2100 unique identifier values. My overall goal is to figure out what proportion of sales is accounted for by type 1.

Normally, I could just run prop or something, or simply find the sum of sales for each type and compute the overall proportion directly. However, the fact that the sales are in different, often incomparable units makes this complicated. So what I am thinking is that, within each identifier group, I would find the proportion of sales accounted for by type 1, and then compute the mean of these proportions across identifier groups. For the example here, the proportion of sales accounted for by type 1 is .727 (400/550) for 00.00.01; 1 for 00.01.01; .5 for 01.01.01; and .8 (300/375) for 01.01.02. The mean of these would then be (.727 + 1 + .5 + .8)/4 = .757.

The first step, I imagine, would be
Code:
 collapse (sum) sales, by(identifier type)
so that the means aren't weighted strangely.

But I am getting stuck with the rest and would appreciate any advice.
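
For reference, here is a rough sketch of one way the remaining steps might go. It assumes the variables are named identifier, type, sales and unit as in the example above, and that units are consistent within each identifier. The helper variables total, type1, share1 and tag are names made up for illustration. It skips the collapse and works on the raw rows instead, so identifiers with no type 1 rows count as a share of 0 rather than dropping out.

```stata
* Sketch only: rebuild the example rows so the block runs on its own
clear
input str8 identifier byte type double sales str6 unit
"00.00.01" 1 400 "kg"
"00.00.01" 0  50 "kg"
"00.00.01" 0 100 "kg"
"00.01.01" 1 500 "number"
"01.01.01" 1 200 "sqft"
"01.01.01" 0 200 "sqft"
"01.01.02" 1 300 "kg"
"01.01.02" 0  25 "kg"
"01.01.02" 0  50 "kg"
end

* Total sales within each identifier, over all types
egen double total = total(sales), by(identifier)

* Type 1 sales within each identifier (0 if the identifier has no type 1 rows)
egen double type1 = total(sales * (type == 1)), by(identifier)

* Share of each identifier's sales that is type 1
* (missing if an identifier's total sales are 0)
gen double share1 = type1 / total

* Flag one row per identifier so each identifier counts once in the mean
egen byte tag = tag(identifier)

* Unweighted mean of the per-identifier shares
summarize share1 if tag
```

For the example rows, the four shares are .727, 1, .5 and .8, so the mean reported by summarize should be about .757. Whether an unweighted mean across identifiers is the right summary is a separate question, since it gives a small identifier the same weight as a large one.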