Hi All,

I'm working with a data set that measures the black-white disparity in the proportion of the population below the poverty line. To do this, I take tract-level population data for every state from the ACS and create a variable called PovertyIndex, defined as the proportion of African Americans in a census tract below the poverty line divided by the proportion of Whites below the poverty line in the same tract. The data set has over 72,000 observations, and the population of each tract is small, between 2,000 and 8,000. Over 99% of observations have PovertyIndex < 27, but there are some major outliers, some as large as 600, due to the small population in each observation. Do you have any recommendations for dealing with these outliers, and for tools in Stata that will accomplish this?

We plan to use this index alongside other variables that measure segregation and economic achievement in order to measure geographical racism. Eventually we will aggregate our data to the state level to avoid these small populations, but for now we want to use a random forest to measure variable importance, so we want the larger sample size to improve its accuracy.

Attached is example data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte state int county long tract int(blacktotalpop blackbelow year) long whitetotalpop int whitebelow float(whiteprop blackprop PovertyIndexRaw)
1  1  20100  293   30 2010  1424  145  .10182584  .10238907 1.0055313
1  1  20200 1173  294 2010   777    0          0  .25063938         .
1  1  20300  588  175 2010  2896  109  .03763812  .29761904  7.907383
1  1  20400  112    0 2010  4543  285  .06273387          0         0
1  1  20500 1167  104 2010  7968  488  .06124498   .0891174  1.455097
1  1  20600  566  226 2010  2679  132  .04927212   .3992933  8.103839
1  1  20700  628  270 2010  1970  156  .07918782   .4299363  5.429324
1  1  20801  175   46 2010  2560  128        .05  .26285714  5.257143
1  1  20802 1642  470 2010  8114  553 .068153806   .2862363 4.1998577
1  1  20900  599  134 2010  4753  391  .08226383  .22370617 2.7193744
1  1  21000  624  325 2010  2203  257   .1166591   .5208333  4.464575
1  1  21100 1913  535 2010  1319   59  .04473086  .27966544  6.252182
1  3  10100  607   34 2010  2818   67 .023775727  .05601318 2.3558977
1  3  10200  215   12 2010  2276  126  .05536028  .05581395 1.0081949
1  3  10300 1567  333 2010  5644  163  .02888023    .212508  7.358252
1  3  10400  262  236 2010  4370  436  .09977116   .9007633  9.028294
1  3  10500  232   50 2010  3067  179  .05836322  .21551724  3.692689
1  3  10600 2483 1126 2010  1118  130  .11627907   .4534837   3.89996
1  3  10701  256    0 2010  8357  254  .03039368          0         0
1  3  10703  625  223 2010 10790  471  .04365153      .3568  8.173826
1  3  10704  382    0 2010  4625  181  .03913514          0         0
1  3  10705  774   16 2010  6420  676  .10529595 .020671835  .1963213
1  3  10800 2137  843 2010  4844  359   .0741123   .3944782  5.322709
1  3  10903  679  344 2010  3533  337  .09538636   .5066274  5.311319
1  3  10904  191   74 2010  6025 1179  .19568464   .3874345 1.9798925
1  3  10905  308    0 2010  5527  311  .05626922          0         0
1  3  10906   86    0 2010  3903  338  .08660005          0         0
1  3  11000  195   58 2010  3242  503  .15515114   .2974359  1.917072
1  3  11101  293  190 2010  8314  606 .072889104   .6484641  8.896585
1  3  11102   46    0 2010  3367  482  .14315414          0         0
1  3  11201   74    0 2010  4450  403   .0905618          0         0
1  3  11202  849  149 2010  3527  236  .06691239   .1755006 2.6228414
1  3  11300  185    0 2010  3666  301  .08210584          0         0
1  3  11401  725   85 2010  7815  978  .12514396  .11724138  .9368521
1  3  11403  237   34 2010  6029  477   .0791176  .14345992 1.8132492
1  3  11405    0    0 2010  3659  171  .04673408          .         .
1  3  11406   30    0 2010  2844  136  .04781997          0         0
1  3  11407    0    0 2010  5002  871  .17413035          .         .
1  3  11408    0    0 2010   674   40  .05934718          .         .
1  3  11501  493  400 2010  3791  365  .09628066    .811359   8.42702
1  3  11502 1981 1033 2010  5938  393   .0661839   .5214538  7.878861
1  3  11601   17    0 2010  5719  515   .0900507          0         0
1  3  11602   26    0 2010  5173  391  .07558477          0         0
1  3 990000    0    0 2010     0    0          .          .         .
1  5 950100 2015  629 2010  1403  100  .07127584   .3121588  4.379588
1  5 950200 1640  703 2010   731   24 .032831736  .42865855 13.056226
1  5 950300 1040  521 2010   734  160  .21798365  .50096154  2.298161
1  5 950400  913  278 2010  1484  246   .1657682   .3044907 1.8368462
1  5 950500  968  121 2010  2220  384  .17297298       .125  .7226563
1  5 950600  692  481 2010  1262  144   .1141046   .6950867  6.091662
1  5 950700  769  200 2010   822  118  .14355232    .260078 1.8117298
1  5 950800  895  218 2010  1233   99  .08029197  .24357542  3.033621
1  5 950900 2432 1332 2010  2035  106  .05208845  .54769737 10.514756
1  7  10001   73   21 2010  2911  401  .13775335  .28767124 2.0883067
1  7  10002  304   38 2010  5919  569   .0961311       .125 1.3003076
1  7  10003  833  141 2010  3963  172  .04340146   .1692677  3.900046
1  7  10004 2151  650 2010  5988  788  .13159652    .302185 2.2962995
1  9  50101  350  114 2010  5465  642  .11747484   .3257143   2.77263
1  9  50102  215   74 2010  5178  481  .09289301    .344186 3.7051876
1  9  50200    0    0 2010  3306  308  .09316394          .         .
1  9  50300   21    0 2010  4720  653  .13834746          0         0
1  9  50400    0    0 2010  4200 1025   .2440476          .         .
1  9  50500   32   32 2010  6626  849   .1281316          1  7.804476
1  9  50601    0    0 2010  3288  153  .04653285          .         .
1  9  50602   61    0 2010  8598  579 .067341246          0         0
1  9  50700   37   10 2010  8829 1249  .14146562  .27027026 1.9105014
1 11 952100 1428  240 2010   224    2 .008928572  .16806723 18.823528
1 11 952200 4406 1950 2010   939   10 .010649627   .4425783  41.55811
1 11 952500 2094  373 2010  1161   14  .01205857    .178128   14.7719
1 13 952700  892  509 2010  1311  253  .19298245   .5706278 2.9568896
1 13 952800  335    0 2010  1338   92 .068759345          0         0
1 13 952900 1220  571 2010   653   37  .05666156   .4680328  8.260146
1 13 953000  404   96 2010   869   86  .09896433  .23762377  2.401105
1 13 953100 2163  500 2010   565   61   .1079646  .23116043  2.141076
1 13 953200 1785  699 2010  2401  298  .12411495  .39159665 3.1551125
1 13 953300  115    0 2010  1843  200  .10851872          0         0
1 13 953400 1513  731 2010  1031  352   .3414161   .4831461  1.415124
1 13 953500  403  191 2010  1214  326  .26853377   .4739454  1.764938
1 15    200 1764  292 2010  1299  149  .11470362   .1655329 1.4431357
1 15    300 2616 1431 2010   451   77   .1707317  .54701835  3.203965
1 15    400 1870  590 2010  1043  173   .1658677    .315508  1.902167
1 15    500 1438  781 2010   172   61  .35465115  .54311544 1.5314075
1 15    600 1682  979 2010   404  163   .4034654   .5820452  1.442615
1 15    700 1282  521 2010  1295  214  .16525097   .4063963  2.459267
1 15    800  463  248 2010   282   19  .06737588  .53563714  7.949983
1 15    900  481   89 2010  2616  121  .04625382   .1850312  4.000344
1 15   1000 1221  167 2010  4165  197  .04729892  .13677314  2.891676
1 15   1100  807  309 2010  4524  357  .07891247   .3828996  4.852207
1 15   1201 1170  511 2010  1762  139  .07888763   .4367521  5.536383
1 15   1202  502   85 2010  3134  543    .173261   .1693227  .9772696
1 15   1300    0    0 2010  2396  834   .3480801          .         .
1 15   1400  996  183 2010  2314  136  .05877269  .18373494  3.126196
1 15   1500  106    0 2010  4798  437  .09107962          0         0
1 15   1600  313  124 2010  2915  404  .13859348  .39616615  2.858476
1 15   1700 1247  111 2010  5236  469   .0895722  .08901364  .9937642
1 15   1800  510  306 2010  5360  753  .14048508         .6 4.2709165
1 15   2000  462  172 2010  6359  795  .12501965   .3722944  2.977887
1 15   2101  802  755 2010  1450  683   .4710345   .9413965 1.9985723
1 15   2102  260   82 2010  2490  106  .04257028   .3153846  7.408563
1 15   2103 1505  450 2010  4335 1098   .2532872   .2990033 1.1804913
end
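
In case it helps, here is a minimal sketch of how the index is built from the counts, following the definition above. It re-enters three tracts from the example (count columns only), and the `_chk` variable names are purely illustrative and do not exist in my data set.

```stata
* Sketch only: rebuild the index from tract-level counts.
* The three rows are copied from the example data above.
clear
input byte state int county long tract int(blacktotalpop blackbelow) long whitetotalpop int whitebelow
1 1 20100  293  30 1424 145
1 1 20200 1173 294  777   0
1 3 11405    0   0 3659 171
end

* Share of each group below the poverty line (hypothetical *_chk names)
generate double whiteprop_chk = whitebelow / whitetotalpop
generate double blackprop_chk = blackbelow / blacktotalpop

* Ratio of the two shares; Stata returns missing (.) on division by zero
generate double PovertyIndex_chk = blackprop_chk / whiteprop_chk

list tract blackprop_chk whiteprop_chk PovertyIndex_chk
count if missing(PovertyIndex_chk)
```

Because Stata returns missing when dividing by zero, tracts with no Black residents or with no Whites below the poverty line end up with a missing index, which is where the `.` entries in PovertyIndexRaw come from.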