Hello Statalisters,

I'm using Stata 16.1, and I have panel data in which the unit of observation is the region (called czone). I have a few measures of interest, each of which represents a region’s share of the national total for a given indicator in a given year. Each of these variable names ends in 'shr'. Here is a random sample of my data, generated through dataex.

Code:
year czone  normshr lesshr leoshr leshr
1960  1202 .0003638664 .00004861347 .00023602384            0
1990 10000  .000834548  .0003118703  .0005555756  .0002700112
1990 10102 .0006462617   .000217258  .0004091604  .0000675028
1970 17501 .0015207013  .0008979761   .001310687  .0004139073
2010 18400 .0006034978 .00019348213  .0004174155 .00013348313
2000 26201 .0007506197  .0003190925  .0006227999  .0003383522
2019 26203 .0006399278  .0001512621  .0004416599 .00013742084
1970 27603 .0005867054  .0002417628  .0006869808            0
1960 35002 .0003008295  .0000763926    .00029739 .00022237047
1950 36404 .0007740941   .000380904  .0007541478            0

For each year in the data, and within each year for each 'shr' variable, I want to generate a variable that counts the number of regions it takes, starting from the largest share, to reach a cumulative sum of 0.8. Ultimately, I want a summary dataset in which the unit of observation is the year, and which looks like:


Code:
year leshr80 lesshr80 leoshr80 normshr80
where the variables ending in 80 display the number of regions it took to get to an 80 percent cumulative sum on the underlying share variable. I don't care about the naming - just trying to make my goal clear.

I’m struggling to figure out how to do this. I see three issues - I am not sure how to solve any of them.

First, I need cumulative counts from data sorted in descending order within each year, but ‘gsort’ cannot be combined with the ‘by’ prefix. I could chop the data up into years, but this seems clumsy.
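
One possible way around the sorting problem, sketched here for leshr only: negate the share and sort ascending on the negated copy inside bysort. The names neg_leshr and rank_leshr are hypothetical helpers, not variables in my data, and the sketch assumes one observation per czone-year.

```stata
* Sketch: rank regions from largest to smallest leshr within each year.
* neg_leshr and rank_leshr are hypothetical helper names.
gen double neg_leshr = -leshr

* Ascending order on the negated share is descending order on the share.
* czone is added as a tiebreaker so the ordering is reproducible.
bysort year (neg_leshr czone): gen long rank_leshr = _n
```

Missing shares, if any, stay missing after negation and so sort to the end of each year.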

Second, I tried dropping all but one year to see if I could make a go of it year by year, but I’m still stumped, this time by the (seemingly simple) task of generating a descending cumulative sum of the shares (highest to lowest). As an example, using the share indicator called leshr, I tried the following:

Code:
gsort -leshr            
g cumut=0
replace cumut=leshr[_n] + cumut[_n-1]
But this returned all missing values. I think I see why, but I'm not sure what to do about it.
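
My reading of the failure: in the first observation, cumut[_n-1] refers to observation 0, which is missing, so cumut becomes missing there. Because replace works through the observations in order, each later observation then adds to an already missing value. A possible fix for the single-year case, sketched with the built-in running-sum function sum():

```stata
* Sketch for a dataset that holds one year only.
gsort -leshr

* sum() returns the running sum and treats missing values as zero,
* so there is no reference to a nonexistent observation 0.
gen double cumut = sum(leshr)
```

Storing the result as double is a precaution against rounding when many small shares are added up.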

Third, if I could fix the above problems, I'm still not sure how to efficiently create a summary variable for each 'shr' variable that tells me how many observations it takes to reach the cumulative total of 0.8.
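
To make the goal concrete, here is a rough sketch of the kind of loop I imagine, offered as a guess rather than a tested solution. It assumes one observation per czone-year, and neg, cum, and hit are hypothetical helper names. Where a year never reaches 0.8 (as would happen in a small sample), the result is left missing.

```stata
* Sketch only: neg, cum, and hit are hypothetical helper variables.
foreach v of varlist leshr lesshr leoshr normshr {
    gen double neg = -`v'
    * Largest share first within year; czone breaks ties.
    bysort year (neg czone): gen double cum = sum(`v')
    * Position of every region at or past the 0.8 threshold.
    by year: gen long hit = _n if cum >= 0.8
    * The smallest such position is the number of regions needed.
    by year: egen long `v'80 = min(hit)
    drop neg cum hit
}

* Each count is constant within year, so keep one row per year.
collapse (first) leshr80 lesshr80 leoshr80 normshr80, by(year)
```

The collapse step replaces the data in memory, so it would need to be wrapped in preserve and restore if the region-level data are still needed afterwards.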

I hope I'm making sense. Thanks in advance.

Tom