I'm using Stata 16.1, and I have panel data in which the unit of observation is the region (called czone). I have a few measures of interest that, for a given indicator, represent each region’s share of the national total in a given year. Each of these variable names ends in 'shr'. Here is a random sample of my data, generated through dataex.
Code:
year czone normshr     lesshr       leoshr       leshr
1960  1202 .0003638664 .00004861347 .00023602384 0
1990 10000 .000834548  .0003118703  .0005555756  .0002700112
1990 10102 .0006462617 .000217258   .0004091604  .0000675028
1970 17501 .0015207013 .0008979761  .001310687   .0004139073
2010 18400 .0006034978 .00019348213 .0004174155  .00013348313
2000 26201 .0007506197 .0003190925  .0006227999  .0003383522
2019 26203 .0006399278 .0001512621  .0004416599  .00013742084
1970 27603 .0005867054 .0002417628  .0006869808  0
1960 35002 .0003008295 .0000763926  .00029739    .00022237047
1950 36404 .0007740941 .000380904   .0007541478  0
For each year in the data, and within that for each 'shr' variable, what I want is to generate a variable that counts the number of regions it takes to reach a cumulative sum of 0.8. Ultimately, what I was thinking was that I want a summary dataset where the unit of observation is the year, that looks like:
Code:
year leshr80 lesshr80 leoshr80 normshr80
I’m struggling to figure out how to do this. I see three issues - I am not sure how to solve any of them.
First, I need cumulative counts from data sorted in descending order within each year, but ‘gsort’ cannot be combined with the ‘by’ prefix. I could split the data into separate years, but that seems clumsy.
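One possible way around this, as a minimal sketch: sorting ascending on the negated share is equivalent to sorting descending on the share itself, and bysort accepts extra sort keys in parentheses. The names negleshr and pos below are hypothetical helpers, and leshr stands in for any of the 'shr' variables.

```stata
* Hypothetical helper: the negated share, so an ascending sort puts the largest share first
generate double negleshr = -leshr

* Within each year, order regions from largest to smallest leshr
* and record each region's position in that ordering
bysort year (negleshr): generate long pos = _n
```

Regions with a missing share sort last within each year, because the negated value is also missing.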
Second, I tried dropping all but one year to see if I could make a go of it year by year, but I’m still stumped, this time by the (seemingly simple) task of generating a cumulative sum of the shares in descending order (highest to lowest). As an example, using the share indicator called leshr, I tried the following:
Code:
gsort -leshr
g cumut=0
replace cumut=leshr[_n] + cumut[_n-1]
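A likely reason this attempt fails: in the first observation, cumut[_n-1] refers to observation 0, which does not exist and evaluates to missing, and because replace works through the observations in order, that missing value carries into every later row. A hedged sketch of an alternative for the single-year case uses Stata's built-in running-sum function sum(), which treats missing values as zero:

```stata
* Single-year data assumed (all other years already dropped)
gsort -leshr

* sum() returns the running total down the current sort order;
* double storage limits rounding error when adding many small shares
generate double cumut = sum(leshr)
```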
Third, even if I could fix the above problems, I'm still not sure how to efficiently create a summary variable for each 'shr' variable that tells me how many observations it takes to reach the cumulative total of 0.8.
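Putting the pieces together, something along these lines might work. This is only a sketch under assumptions: the four variable names come from the sample above, the temporary variables are illustrative, and no other variable in the dataset is assumed to end in 80. For each share variable it sorts descending within year, builds a running sum, and takes the smallest position at which the running sum reaches 0.8.

```stata
foreach v of varlist normshr lesshr leoshr leshr {
    tempvar neg cum pos
    * ascending sort on the negated share = descending sort on the share
    generate double `neg' = -`v'
    bysort year (`neg'): generate double `cum' = sum(`v')
    by year: generate long `pos' = _n
    * smallest position at which the cumulative share is at least 0.8;
    * missing if the shares in that year never reach 0.8
    by year: egen `v'80 = min(cond(`cum' >= 0.8, `pos', .))
    drop `neg' `cum' `pos'
}

* one row per year, keeping only the four counts
collapse (first) *80, by(year)
```

Because collapse replaces the data in memory, it may be worth wrapping the whole thing in preserve and restore, or saving the result to a separate file. The comparison against 0.8 is also sensitive to floating-point rounding when a cumulative sum lands almost exactly on the threshold.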
I hope I'm making sense. Thanks in advance.
Tom