Hey everyone,

This is my first post to Statalist. Please let me know if there is anything missing in terms of posting protocol.

I need a little help creating Gini coefficients for census tracts using categorical income data.

I’m building a longitudinal census tract-level dataset to study the impact of segregation and inequality on housing markets across the country, and I'm having trouble constructing Gini coefficients for each census tract. I have block group data nested in census tracts at three time points: the 2000 decennial census and the 2008-2012 and 2013-2017 ACS 5-year estimates. My household income data are counts of the individuals in each income bracket within each geographic area, which I have collapsed into 6 brackets: inc1, inc2, inc3, inc4, inc5, inc6.

While the ACS provides Gini coefficients, the 2000 decennial census and earlier censuses do not. So I was planning to construct my own Gini coefficient for each census tract from the income brackets, so that it is comparable to what the US Census provides with ACS data. Following Fan et al. (2017) [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4684591/], I expect my coefficients to be underestimated, since the brackets are ordinal-categorical data rather than individual-level data. Therefore, I’ll need to estimate them for all 3 time points to be consistent. Since I have the tract-level Gini coefficients produced by the US Census for the ACS periods, I can compare my estimates with those.

My problem is how exactly to compute them. I’ve emailed Fan et al. and am still waiting to hear back about how they structured their data and wrote their syntax. They use Whitehead’s “relsgini” command, but there is very little information in the Stata ado-file, and “relsgini” accepts only one variable, which makes me think that I’ll need to reshape the income brackets from wide to long. But when I do that, it gives me only a single overall Gini coefficient:

Code:
. relsgini inc [fw=pop] 

Donaldson-Weymark relative S-Gini inequality measures of inc
------------------------------------------------------------------------------
delta = 2                              .55093967

Also, there’s very little documentation on how to specify the distributional sensitivity parameters, and it’s unclear to me how to request Gini coefficients by tract.
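
To make the data structure concrete, here is a rough sketch of what I think the wide-to-long step and a by-tract calculation would look like. This is only my own attempt using the standard grouped-data (trapezoidal Lorenz curve) formula, not Fan et al.'s method and not what relsgini does internally. The example rows and the bracket midpoints are made-up placeholders (my real cut points differ, and the midpoint for the open-ended top bracket is an arbitrary assumption).

```stata
* Hypothetical example data: block groups nested in tracts, counts per bracket
clear
input long tractidn long blkgrpidn inc1 inc2 inc3 inc4 inc5 inc6
1001 1 10 20 30 20 10  5
1001 2  5 15 25 30 15 10
1002 1 40 30 15 10  3  2
end

* Sum block-group counts up to the tract level
collapse (sum) inc1-inc6, by(tractidn)

* Wide to long: one row per tract x bracket
reshape long inc, i(tractidn) j(bracket)
rename inc n

* ASSUMED bracket midpoints in dollars (placeholders only)
local mids 7500 20000 35000 62500 112500 200000
gen double mid = .
forvalues b = 1/6 {
    local m : word `b' of `mids'
    replace mid = `m' if bracket == `b'
}

* Population and income totals per tract
gen double inctot = n * mid
bysort tractidn (bracket): egen double N = total(n)
bysort tractidn (bracket): egen double Y = total(inctot)

* Population share p and cumulative income share L, lowest bracket first
bysort tractidn (bracket): gen double p = n / N
bysort tractidn (bracket): gen double L = sum(inctot) / Y
bysort tractidn (bracket): gen double Llag = cond(_n == 1, 0, L[_n-1])

* Gini = 1 - sum of p * (L + lagged L), computed within tract
gen double piece = p * (L + Llag)
bysort tractidn (bracket): egen double gini = total(piece)
replace gini = 1 - gini

* One Gini per tract
list tractidn gini if bracket == 1
```

Because this treats everyone in a bracket as having the same income, I would expect it to understate the true Gini, which is the underestimation issue mentioned above.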

Does anyone have any ideas? I’ve also tried several other user-written Stata programs, namely Reardon’s seg, inequal7, and ineqdeco, but haven't had any luck. Reardon’s seg command says there are too many values:
Code:
. seg inc1 inc2 inc3 inc4 inc5, g by(tractidn) u(blkgrpidn) gen(g gini i index)
Code:
Note: Some by-groups have fewer units than groups. Multigroup indices for these by-groups should be interpreted with caution.

Group Variables:   inc1 inc2 inc3 inc4 inc5

Total Counts and Diversity Measures
too many values
r(134);


inequal7 seems to get me closer, but it doesn’t create any new variables with coefficients (it gives only one value per variable), won't let me enter more than 5 brackets, and it’s unclear how to request Gini coefficients by census tract.
Code:
. 

. inequal7 inc1 inc2 inc3 inc4 inc5 [fw=pop] ,returnscalars 
Warning: inc1 has 2408 values == 0 *used* in calculations
    (except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).
Warning: inc2 has 1671 values == 0 *used* in calculations
    (except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).
Warning: inc3 has 3557 values == 0 *used* in calculations
    (except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).
Warning: inc4 has 14690 values == 0 *used* in calculations
    (except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).
Warning: inc5 has 43386 values == 0 *used* in calculations
    (except for SD logs, GE(-1), GE(0) (Mean log-deviation) and GE(1) (Theil)).

----------------------------------------------------------------------------------
                     Inequality measures |    inc1    inc2    inc3    inc4    inc5
-----------------------------------------+----------------------------------------
                 Relative mean deviation | 0.30234 0.27124 0.30575 0.37661 0.45320
                Coefficient of variation | 0.91858 0.86620 0.95843 1.19158 1.46555
              Standard deviation of logs | 0.87052 0.72200 0.83326 0.99858 1.06416
                        Gini coefficient | 0.42406 0.38228 0.42749 0.51864 0.61085
                          Mehran measure | 0.56660 0.50508 0.56279 0.67280 0.77520
                          Piesch measure | 0.35279 0.32088 0.35984 0.44156 0.52867
                         Kakwani measure | 0.15556 0.12848 0.15750 0.22505 0.30583
              Theil index (GE(a), a = 1) | 0.30843 0.26063 0.31756 0.43934 0.52786
       Mean Log Deviation (GE(a), a = 0) | 0.33585 0.25528 0.32601 0.46683 0.55369
           Entropy index (GE(a), a = -1) | 0.60970 0.37482 0.54366 0.84968 1.00210
Half (Coeff.Var. squared) (GE(a), a = 2) | 0.42189 0.37515 0.45929 0.70993 1.07392
----------------------------------------------------------------------------------

Finally, when I use ineqdeco, it says that there are too many values and won't calculate anything:
Code:
 . ineqdeco inc [fw=pop], by (tractidn)
too many values



I saw this reply from Stephen P. Jenkins, but I can't figure out how to adapt his loop to my data structure: ://www.stata.com/statalist/archive/2004-03/msg00287.html

I’d very much appreciate any help or a pointer in the right direction.

I am using Stata 14.1.

Thanks. Best,
Kasey