I then thought it might be more efficient to create a single preterm birth variable, pretermrate2 in my code, which is the preterm birth rate for each individual observation. I would then graph the mean of pretermrate2 using each observation's total deliveries (delinst) as the frequency weight. In effect, that should give me a weighted average. If this works, I can cut two lines of code and create fewer new variables.
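For reference, the shortcut rests on the following identity, sketched here as plain algebra rather than as a statement about Stata internals. Write p_i for pregpreterm, d_i for delinst and r_i = 100 p_i / d_i for pretermrate2, and let S be the set of observations in a district that actually enter the calculation. Assuming d_i > 0 for every observation (true in the sample data), the delinst-weighted mean is:

```latex
\bar{r}_w
  = \frac{\sum_{i \in S} d_i \, r_i}{\sum_{i \in S} d_i}
  = \frac{\sum_{i \in S} d_i \cdot 100\, p_i / d_i}{\sum_{i \in S} d_i}
  = 100 \, \frac{\sum_{i \in S} p_i}{\sum_{i \in S} d_i}
```

On this reading, the weighted mean equals pretermrate1 only when both methods sum over the same set S. pretermrate2 is missing wherever pregpreterm is missing, so those observations drop out of S for the weighted mean, while egen's total(delinst) still adds their deliveries to totaldels. The two denominators would therefore coincide only in districts with no missing pregpreterm values.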
The problem is that when I run both versions of this code, the final graphs show slightly different preterm birth rates. In most districts the two numbers differ by between 0.05 and 0.5 percentage points, and only one district shows the same number in both graphs. I suspect the problem lies in the ado file for Stata weights, but I'm not sure how to check whether that's true. Running this method on a different dataset gave the same numbers in both graphs.
If anyone knows why the two versions of the code produce different numbers, I would greatly appreciate an explanation!
* Note: with the sample data below, the final graphed averages are a bit farther apart than they are with the full, unedited dataset.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 district int(pregpreterm delinst)
"York" 10 158
"York" 18 272
"York" 11 155
"York" 13 153
"York" 12 206
"York" 14 321
"York" 12 215
"York" 12 222
"York" 14 194
"York" 18 208
"Jersey" 15 220
"Jersey" 18 299
"Jersey" 12 146
"Jersey" 16 175
"Jersey" 10 181
"Jersey" 13 179
"Jersey" 12 175
"Jersey" 15 274
"Jersey" 17 189
"Jersey" 9 160
"Jersey" 12 139
"Jersey" 16 210
"Jersey" 14 171
"Jersey" 14 207
"Jersey" . 114
"Jersey" . 84
"Jersey" . 69
"Jersey" . 88
"Jersey" . 75
"Guernsey" 1 89
"Guernsey" 1 138
"Guernsey" . 55
"Guernsey" . 96
"Guernsey" . 59
"Guernsey" . 102
"Guernsey" . 66
"Guernsey" 1 76
"Guernsey" 1 92
"Guernsey" 1 114
"Guernsey" . 67
"Guernsey" 1 72
"Guernsey" 1 103
"Guernsey" . 44
"Guernsey" . 122
"Guernsey" . 117
"Guernsey" . 135
"Guernsey" . 57
"Guernsey" 1 73
"Mersey" . 35
"Mersey" . 59
"Mersey" . 31
"Mersey" 1 37
"Mersey" . 46
"Mersey" . 37
"Mersey" 1 32
"Mersey" 1 37
"Mersey" . 46
"Mersey" . 40
"Mersey" . 35
"Mersey" . 48
"Mersey" . 34
"Mersey" . 53
"Mersey" . 50
"Mersey" . 44
"Mersey" . 35
"Mersey" . 52
"Mersey" 1 41
"Mersey" 1 21
"Mersey" . 32
"Mersey" 1 41
"Mersey" . 56
"Mersey" . 20
"Mersey" . 94
"Mersey" 5 145
"Mersey" . 117
"Mersey" 5 107
"Mersey" 0 83
"Mersey" . 106
"Mersey" 2 78
"Mersey" 3 83
"Mersey" 2 101
"Mersey" 3 152
"Percy" 2 102
"Percy" 0 61
"Percy" 0 152
"Percy" 5 192
"Percy" 5 95
"Percy" . 97
"Percy" 5 103
"Percy" 3 132
"Percy" 3 67
"Percy" 3 64
"Percy" 3 65
"Percy" . 128
"Percy" 5 138
"Percy" 5 92
"Percy" 0 45
"Percy" . 40
"Percy" . 49
"Percy" . 53
end

tempfile g1
tempfile g2

* Method 1: district totals of preterm births and deliveries, then their ratio
bys district: egen pretermtotal=total(pregpreterm), missing
bys district: egen totaldels=total(delinst), missing
gen pretermrate1=100*pretermtotal/totaldels

* n for the graph note: total deliveries across all observations
sum delinst, d
local myn `r(sum)'

graph hbar pretermrate1, over(district) blabel(bar, ///
    size(small)) title("Preterm birth rate by district") ytitle("rate of preterm births") ///
    note("Source: MP HMIS data for FY '17-'18 and '18-'19, n = `:di %-12.0fc `myn''") ///
    bargap(40) saving(`g1', replace)

* Method 2: per-observation rate, graphed as a delinst-weighted mean
gen pretermrate2=100* pregpreterm/delinst

graph hbar (mean) pretermrate2 [fw=delinst], over(district) blabel(bar, ///
    size(small)) title("Preterm birth rate by district") ytitle("rate of preterm births") ///
    note("Source: MP HMIS data for FY '17-'18 and '18-'19, n = `:di %-12.0fc `myn''") ///
    bargap(40) saving(`g2', replace)

graph combine "`g1'" "`g2'" // compare output from both methods
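A hand calculation for one district may help locate where the two methods part ways. This is arithmetic done on the sample rows above, not Stata output, so it is worth re-running. In Guernsey, 8 observations have pregpreterm = 1 and 757 deliveries between them, and 11 observations have pregpreterm missing and 920 deliveries between them. The two formulas then give:

```latex
\text{pretermrate1} = 100 \times \frac{8}{757 + 920} = \frac{800}{1677} \approx 0.48
\qquad
\bar{r}_w = 100 \times \frac{8}{757} = \frac{800}{757} \approx 1.06
```

In the sample data, York is the only district with no missing pregpreterm values, so it is the only district where both calculations use exactly the same observations.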