This is my first post to Statalist. Please let me know if there is anything missing in terms of posting protocol.
I need a little help creating Gini coefficients for census tracts using categorical income data.
I’m building a longitudinal census tract-level dataset to study the impact of segregation and inequality on housing markets across the country, and I'm having trouble constructing Gini coefficients for each census tract. I have block group data nested in census tracts at three time points: the 2000 decennial census and the 2008-2012 and 2013-2017 ACS 5-year estimates. My household income data are counts of the individuals in each income bracket within each geographical area, which I have collapsed into 6 brackets: inc1, inc2, inc3, inc4, inc5, inc6.
While the ACS provides Gini coefficients, the 2000 decennial census and earlier censuses do not. So I was planning to construct my own Gini coefficient for each census tract from the income brackets, so that it is comparable to what the US Census provides with ACS data. Following Fan et al. (2017) [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4684591/], since the brackets are ordinal-categorical data rather than individual-level data, my coefficients will be underestimated. Therefore, I’ll need to estimate them for all 3 time points to be consistent. Since I have the tract-level Gini coefficients produced by the US Census, I can compare my estimates with those.
My problem is how exactly to compute them. I’ve emailed Fan et al. and am still waiting to hear back about how they structured their data and wrote their syntax. They use Whitehead’s “relsgini” command, but there is very little information in the Stata ado file, and “relsgini” accepts only one variable, which makes me think that I’ll need to reshape the income brackets from wide to long. But when I do that, it gives me only a single overall Gini coefficient:
Code:
. relsgini inc [fw=pop]
Donaldson-Weymark relative S-Gini inequality measures of inc
------------------------------------------------------------------------------
delta = 2 .55093967
Does anyone have any ideas? I’ve also tried a number of other user-written Stata programs, i.e., Reardon’s seg, inequal7, and ineqdeco, but haven't had any luck. Reardon’s seg command says there are too many values:
Code:
. seg inc1 inc2 inc3 inc4 inc5, g by(tractidn) u(blkgrpidn) gen(g gini i index)
Code:
Note: Some by-groups have fewer units than groups. Multigroup indices for these by-groups should be interpreted with caution.
Group Variables: inc1 inc2 inc3 inc4 inc5
Total Counts and Diversity Measures
too many values
r(134);
With inequal7 I seem to be getting closer, but it doesn’t create any new variables holding the coefficients (it reports only one value), won't let me enter more than 5 brackets, and it’s unclear how to specify that I want the Gini coefficients for census tracts.
Code:
.
. inequal7 inc1 inc2 inc3 inc4 inc5 [fw=pop] ,returnscalars
Warning: inc1 has 2408 values == 0 *used* in calculations
(except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).
Warning: inc2 has 1671 values == 0 *used* in calculations
(except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).
Warning: inc3 has 3557 values == 0 *used* in calculations
(except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).
Warning: inc4 has 14690 values == 0 *used* in calculations
(except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).
Warning: inc5 has 43386 values == 0 *used* in calculations
(except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).
----------------------------------------------------------------------------------
Inequality measures | inc1 inc2 inc3 inc4 inc5
-----------------------------------------+----------------------------------------
Relative mean deviation | 0.30234 0.27124 0.30575 0.37661 0.45320
Coefficient of variation | 0.91858 0.86620 0.95843 1.19158 1.46555
Standard deviation of logs | 0.87052 0.72200 0.83326 0.99858 1.06416
Gini coefficient | 0.42406 0.38228 0.42749 0.51864 0.61085
Mehran measure | 0.56660 0.50508 0.56279 0.67280 0.77520
Piesch measure | 0.35279 0.32088 0.35984 0.44156 0.52867
Kakwani measure | 0.15556 0.12848 0.15750 0.22505 0.30583
Theil index (GE(a), a = 1) | 0.30843 0.26063 0.31756 0.43934 0.52786
Mean Log Deviation (GE(a), a = 0) | 0.33585 0.25528 0.32601 0.46683 0.55369
Entropy index (GE(a), a = -1) | 0.60970 0.37482 0.54366 0.84968 1.00210
Half (Coeff.Var. squared) (GE(a), a = 2) | 0.42189 0.37515 0.45929 0.70993 1.07392
----------------------------------------------------------------------------------
Code:
. ineqdeco inc [fw=pop], by (tractidn)
too many values
I saw this response from Stephen P. Jenkins, but I can't figure out how to adapt his loop to my data structure: ://www.stata.com/statalist/archive/2004-03/msg00287.html
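A minimal sketch of one way a per-tract calculation might be set up without user-written commands, in case it helps frame the question. It is not Fan et al.'s method or Jenkins's loop. It assumes one row per block group with tractidn and the counts inc1-inc6, and it assigns every bracket a single representative income. The dollar midpoints below are hypothetical placeholders (the top bracket is open-ended, so its value is a guess), and the variable names freq, mid, cumn, cumy, p, L, piece, area, and gini are made up for illustration. Because everyone in a bracket is treated as having the same income, the result captures only between-bracket inequality and will understate the true Gini.

```stata
* Assumed layout: one row per block group, with tractidn and counts inc1-inc6.
* Sum the block-group counts up to the tract level.
collapse (sum) inc1 inc2 inc3 inc4 inc5 inc6, by(tractidn)

* One row per tract x bracket; bracket runs from 1 to 6.
reshape long inc, i(tractidn) j(bracket)
rename inc freq

* Hypothetical representative incomes per bracket: replace with your own.
recode bracket (1 = 7500) (2 = 20000) (3 = 35000) ///
               (4 = 55000) (5 = 85000) (6 = 150000), generate(mid)

* Cumulative population and income shares within each tract (Lorenz curve).
bysort tractidn (bracket): generate double cumn = sum(freq)
by tractidn: generate double cumy = sum(freq * mid)
by tractidn: generate double p = cumn / cumn[_N]
by tractidn: generate double L = cumy / cumy[_N]

* Trapezoid rule: Gini = 1 - sum over brackets of (p_k - p_{k-1})(L_k + L_{k-1}).
by tractidn: generate double piece = ///
    (p - cond(_n == 1, 0, p[_n-1])) * (L + cond(_n == 1, 0, L[_n-1]))

* The missing option leaves empty tracts missing instead of giving them Gini = 1.
by tractidn: egen double area = total(piece), missing
generate double gini = 1 - area

* Keep one row per tract.
by tractidn: keep if _n == 1
keep tractidn gini
```

Run once per time point, this leaves one gini value per tract, which could then be merged back and compared against the Census-published tract Ginis for the ACS periods.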
I’d very much appreciate any help or a pointer in the right direction.
I am using Stata 14.1.
Thanks. Best,
Kasey