Hello Statalisters,

I have a dataset with 42 variables and 682 observations containing starting-level employment counts for every industry in New York State, by county and year. I am using it to determine the top ten industries (by race) in the state over the entire period. Here is an excerpt:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 county int year long(accommodationan administrativea) int agriculturefore long(artsentertainme construction11 educationalserv)
"Albany"   2008 6672  7421 19 472  904 6363
"Albany"   2009 6348  6815 36 432  846 6546
"Albany"   2010 6456  6681 43 404  784 6609
"Albany"   2011 6694  6712 42 461  920 7247
"Albany"   2012 6652  7275 27 444  947 7146
"Albany"   2013 6847  7312 35 464  962 7158
"Albany"   2014 6912  7497 33 558 1257 7211
"Albany"   2015 7532  8531 47 677 1451 7344
"Albany"   2016 8448 12286 51 748 1317 7644
"Albany"   2017 8727 12950 74 656 1376 8191
"Albany"   2018 9725 12794 65 653 1418 8144
"Allegany" 2008  181    40  0   0   18  364
"Allegany" 2009  228    39  .   3    6  371
"Allegany" 2010  237    22  .   0   17  413
"Allegany" 2011  263    21  0   0   13  456
"Allegany" 2012  278    22  0   0    3  456
"Allegany" 2013  286    30  0   0    3  483
"Allegany" 2014  388    36  0   3    8  535
"Allegany" 2015  370    45  0   0    0  541
"Allegany" 2016  391    29  .   0   13  536
end
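A possible loop-free sketch for the statewide ranking follows. It assumes that county and year are the only non-industry variables. The race dimension mentioned above is not in this excerpt, so the sketch ignores it. The steps are: give every industry variable a common stub, reshape to long form, total each industry over all counties and years, then sort. It is run here on a four-row, three-industry excerpt of the data above. With the full data in memory, start at the rename lines. Note that this replaces the data in memory.

```stata
* Four-row excerpt of the -dataex- data above (three industries only)
clear
input str11 county int year long(accommodationan administrativea) int agriculturefore
"Albany"   2008 6672 7421 19
"Albany"   2009 6348 6815 36
"Allegany" 2008  181   40  0
"Allegany" 2009  228   39  .
end

* Prefix every variable with emp_, then restore the two ID variable names
* (assumes county and year are the only non-industry variables)
rename * emp_*
rename (emp_county emp_year) (county year)

* Stack the industries: one row per county-year-industry
reshape long emp_, i(county year) j(industry) string
rename emp_ emp

* Statewide total for each industry over the whole period (missing counts as 0)
collapse (sum) emp, by(industry)

* Each industry's share of all statewide employment over the period
egen double state_total = total(emp)
generate double state_pct = 100 * emp / state_total

* Rank from largest to smallest and list (at most) the top ten
gsort -emp
list industry emp state_pct if _n <= 10
```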
I have the following code saved from a similar earlier project:
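* Sum of agriculturefore within each county-year
bysort county year: egen yr_agri = sum(agriculturefore)
* The same sum again: same variable, same grouping
bysort county year: egen agrisum = sum(agriculturefore)
* Ratio of the two sums, times 100
generate agripct = (agrisum / yr_agri)*100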
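The following minimal check shows what those three lines compute, using a four-row excerpt of agriculturefore from the data above. The data already have one row per county-year, so both egen calls sum the same variable over the same one-row groups. The ratio can therefore only be 100, or missing (0/0) where the total is 0. The egen function sum() is the older name for total() and treats missing values as 0. The assert lines are illustrative checks, not part of the original code.

```stata
* Four-row excerpt of agriculturefore from the -dataex- data above
clear
input str11 county int year int agriculturefore
"Albany"   2008 19
"Albany"   2009 36
"Allegany" 2008  0
"Allegany" 2009  .
end

* The original three lines
bysort county year: egen yr_agri = sum(agriculturefore)
bysort county year: egen agrisum = sum(agriculturefore)
generate agripct = (agrisum / yr_agri)*100

* Numerator and denominator are identical...
assert yr_agri == agrisum
* ...so the share is 100 wherever the total is positive
assert agripct == 100 if yr_agri > 0
* ...and missing (0/0) wherever the total is 0
assert missing(agripct) if yr_agri == 0

* Averaging by year therefore gives 100 in every year
collapse (mean) agripct, by(year)
list
```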
I assumed that after running this code I could simply use collapse (mean) var-var, by(year), but every variable ends up with exactly the same percentage across the data. Where am I going wrong? I am not very comfortable with loops, so a brute-force way would be great: first sum employment across all sectors and all years for each county, then compute each sector's overall percentage.
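One loop-free sketch of that two-step calculation follows. It assumes county and year are the only non-industry variables. The full 42-variable file may contain others that would need to be excluded first. The idea is to give every industry variable a common stub, reshape to long form, and let collapse and egen do the summing. It is shown on a four-row excerpt of the data above. With the full data in memory, start at the rename lines. It replaces the data in memory, so preserve or save a copy first if you need the wide layout.

```stata
* Four-row excerpt of the -dataex- data above (three industries only)
clear
input str11 county int year long(accommodationan administrativea) int agriculturefore
"Albany"   2008 6672 7421 19
"Albany"   2009 6348 6815 36
"Allegany" 2008  181   40  0
"Allegany" 2009  228   39  .
end

* Prefix every variable with emp_, then restore the two ID variable names
* (assumes county and year are the only non-industry variables)
rename * emp_*
rename (emp_county emp_year) (county year)

* One row per county-year-industry; no loops needed
reshape long emp_, i(county year) j(industry) string
rename emp_ emp

* Sum each industry over all years within each county (missing counts as 0)
collapse (sum) emp, by(county industry)

* Total employment across all sectors and years, per county
bysort county: egen double county_total = total(emp)

* Each industry's overall share of its county's employment
generate double pct = 100 * emp / county_total
list, sepby(county)
```

Because the percentages are computed in long form, one generate line covers every industry at once. This is what makes a loop over the industry variables unnecessary.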