I have a table of aggregated data that I downloaded from Census. The first three years of the data are included below. I would like to make a stacked bar chart with year on the horizontal axis and the share of firms in each of five new age bins on the vertical axis, so that every bar sums to 100%. The aim is to show how the share of firms by age has changed over time.
There are many examples on this site and elsewhere that use twoway bar and tabplot to draw charts like this, but I have been unsuccessful in adapting any of that code to already-aggregated data like mine.
In the course of trying to use twoway bar and tabplot, I generated four of the seven variables in the data example below (agegroup and the last three). The ones I made are:
- totalfirms, a sum of the total firms by year
- agegroup, because I'd like to use different age bins from what the data came with
- countfirms_agegroup, which counts the firms in each age group
- percentfirms_agegroup, which is 100 * countfirms_agegroup/totalfirms, i.e. a percentage. This is what I'm trying to graph.
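For reference, the four derived variables listed above could be rebuilt roughly as follows. This is a sketch, not necessarily the exact commands originally used: it assumes a dataset holding only year, fage4 and firms, and the bin assignments (including placing "Left Censored" and "26+" in the 11+ bin) are read off the data example.

```stata
* Sketch only: rebuild the derived variables from year, fage4 and firms.
* Bin assignments are inferred from the data example, not from Census documentation.
gen byte agegroup = 5                                // default bin: 11+ (also Left Censored, 26+)
replace agegroup = 1 if inlist(fage4, "0", "1")
replace agegroup = 2 if inlist(fage4, "2", "3")
replace agegroup = 3 if inlist(fage4, "4", "5")
replace agegroup = 4 if fage4 == "6 to 10"

* Total firms per year, and total firms per year within each new age bin
bysort year: egen totalfirms = total(firms)
bysort year agegroup: egen countfirms_agegroup = total(firms)

* Share of each age bin in the year's total, expressed as a percentage
gen percentfirms_agegroup = 100 * countfirms_agegroup / totalfirms
```

For 2001 and the 0-1 bin this gives 374542 + 471211 = 845753 firms out of 4881601, which matches the 17.33% shown in the data example.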
I know Stata is not really built for data like what I have downloaded because this does not have individual observations. However, I think the biggest problem is that I'm not used to working with already aggregated data in Stata so it is hard for me to think "outside the box". Thank you for any suggestions.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year str16 fage4 float agegroup long firms float(totalfirms countfirms_agegroup percentfirms_agegroup)
2001 "1" 1 374542 4881601 845753 17.32532
2001 "0" 1 471211 4881601 845753 17.32532
2001 "2" 2 327127 4881601 625529 12.814013
2001 "3" 2 298402 4881601 625529 12.814013
2001 "5" 3 231767 4881601 492267 10.08413
2001 "4" 3 260500 4881601 492267 10.08413
2001 "6 to 10" 4 863051 4881601 863051 17.67967
2001 "11 to 15" 5 634854 4881601 2055001 42.09687
2001 "21 to 25" 5 285459 4881601 2055001 42.09687
2001 "Left Censored" 5 659119 4881601 2055001 42.09687
2001 "16 to 20" 5 475569 4881601 2055001 42.09687
2002 "1" 1 368030 4908740 864168 17.604681
2002 "0" 1 496138 4908740 864168 17.604681
2002 "3" 2 287290 4908740 607289 12.371586
2002 "2" 2 319999 4908740 607289 12.371586
2002 "4" 3 263886 4908740 498119 10.147593
2002 "5" 3 234233 4908740 498119 10.147593
2002 "6 to 10" 4 874913 4908740 874913 17.823576
2002 "Left Censored" 5 603782 4908740 2064251 42.05256
2002 "21 to 25" 5 348478 4908740 2064251 42.05256
2002 "11 to 15" 5 626335 4908740 2064251 42.05256
2002 "16 to 20" 5 485656 4908740 2064251 42.05256
2003 "0" 1 500847 4963081 877087 17.672228
2003 "1" 1 376240 4963081 877087 17.672228
2003 "3" 2 282181 4963081 598962 12.06835
2003 "2" 2 316781 4963081 598962 12.06835
2003 "5" 3 239564 4963081 497553 10.025084
2003 "4" 3 257989 4963081 497553 10.025084
2003 "6 to 10" 4 895085 4963081 895085 18.034866
2003 "Left Censored" 5 576707 4963081 2094394 42.19947
2003 "21 to 25" 5 338754 4963081 2094394 42.19947
2003 "16 to 20" 5 499208 4963081 2094394 42.19947
2003 "26+" 5 65894 4963081 2094394 42.19947
2003 "11 to 15" 5 613831 4963081 2094394 42.19947
end
label values agegroup agegrouplabel
label def agegrouplabel 1 "0-1", modify
label def agegrouplabel 2 "2-3", modify
label def agegrouplabel 3 "4-5", modify
label def agegrouplabel 4 "6-10", modify
label def agegrouplabel 5 "11+", modify
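One direction that may work with data in this shape is to let graph bar do the summing and the percentage calculation itself, so that only year, agegroup and firms are needed. The following is a sketch under that assumption, not a tested solution; the axis and legend titles are placeholders.

```stata
* Sketch: sum firms within each year x agegroup cell and stack as percentages.
* asyvars treats the agegroup categories as separate bars to stack;
* percentages rescales them so that each year's bar totals 100.
graph bar (sum) firms, over(agegroup) over(year) asyvars stack percentages ///
    ytitle("Share of firms (%)") legend(title("Firm age (years)"))

* Alternative using the precomputed percentage: keep one row per
* year x agegroup cell, because the value is repeated on every row of the cell.
egen byte onepercell = tag(year agegroup)
graph bar (asis) percentfirms_agegroup if onepercell, ///
    over(agegroup) over(year) asyvars stack ///
    ytitle("Share of firms (%)")
```

The first form avoids the derived variables entirely, which may be the simpler choice if the age bins are likely to change again; onepercell is a hypothetical helper name introduced only for this illustration.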
PS. Through reading many threads on stacked bar charts on this site, I can see that people almost always recommend a sort of "spaced out" bar chart, like the second graph posted by Maarten Buis here: https://www.statalist.org/forums/for...ked-bar-charts. I am certainly open to exploring that and other ways of charting this as traditional bar charts do have their drawbacks, but am taking this one step at a time.