identifier | type | sales | unit |
00.00.01 | 1 | 400 | kg |
00.00.01 | 0 | 50 | kg |
00.00.01 | 0 | 100 | kg |
00.01.01 | 1 | 500 | number |
01.01.01 | 1 | 200 | sqft |
01.01.01 | 0 | 200 | sqft |
01.01.02 | 1 | 300 | kg |
01.01.02 | 0 | 25 | kg |
01.01.02 | 0 | 50 | kg |
In the actual data (which I'm not sure I'd be allowed to share a sample of), there are about 4000 observations, with ~2100 unique identifier values. My overall goal is to figure out what proportion of sales is accounted for by type 1.
Normally, I could just run prop or something, or simply find the sum of sales for each type and compute the overall proportion directly. However, the fact that the sales are in different, often incomparable units makes this complicated. So, what I am thinking is that, for each identifier group, I would find the proportion of sales accounted for by each type, and then compute the mean of these proportions across identifier groups. That is, for example here, the proportion of sales for 00.00.01 accounted for by type 1 is 400/550 = .727; for 00.01.01 it is 1; for 01.01.01 it is .5; for 01.01.02 it is 300/375 = .8. The mean of these four values would then be about .757.
The first step, I imagine, would be
Code:
collapse (sum) sales, by(identifier type)
But I am getting stuck with the rest and would appreciate any advice.
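For reference, one possible way to carry out the remaining steps is sketched below. It is only a sketch under some assumptions: identifier is stored as a string, units are consistent within each identifier (as they are in the example), and the variable names total_sales, type1_sales, share_type1 and first_obs are made up for illustration. It also skips the collapse step and works with egen on the original rows instead, so identifiers that have no type 0 rows need no special handling.

```stata
* Rebuild the example data from the post (identifier kept as a string).
clear
input str8 identifier byte type double sales str6 unit
"00.00.01" 1 400 "kg"
"00.00.01" 0  50 "kg"
"00.00.01" 0 100 "kg"
"00.01.01" 1 500 "number"
"01.01.01" 1 200 "sqft"
"01.01.01" 0 200 "sqft"
"01.01.02" 1 300 "kg"
"01.01.02" 0  25 "kg"
"01.01.02" 0  50 "kg"
end

* Total sales and type-1 sales within each identifier.
* (type == 1) evaluates to 1 or 0, so only type-1 rows contribute.
bysort identifier: egen double total_sales = total(sales)
bysort identifier: egen double type1_sales = total(sales * (type == 1))

* Share of each identifier's sales accounted for by type 1.
generate double share_type1 = type1_sales / total_sales

* Flag one row per identifier so each group counts once in the mean.
egen byte first_obs = tag(identifier)
summarize share_type1 if first_obs
```

On the example data the summarize line should report 4 observations with a mean of about .757. Note that this weights every identifier equally regardless of how large its sales are, which is what averaging the per-group proportions implies.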