I'm working with a data set that measures the black-white disparity in the proportion of the population below the poverty line. I take population data from the ACS for census tracts in every state and create a variable, PovertyIndex, defined as the proportion of African Americans in a census tract below the poverty line divided by the proportion of Whites below the poverty line in the same tract. The data set has over 72,000 observations, and the population of each tract is small, between 2,000 and 8,000. Over 99% of observations have PovertyIndex < 27, but there are some major outliers, some as large as 600, due to the small population in each observation. Do you have any recommendations for dealing with these outliers, and for tools in Stata that will accomplish this?
We plan to use this index, together with other variables that measure segregation and economic achievement, to measure geographical racism. In the end, we plan to move our data to the state level to avoid these small populations, but for now we want to use a random forest to measure variable importance, so we want the larger tract-level sample size to improve its accuracy.
Attached is example data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte state int county long tract int(blacktotalpop blackbelow year) long whitetotalpop int whitebelow float(whiteprop blackprop PovertyIndexRaw)
1 1 20100 293 30 2010 1424 145 .10182584 .10238907 1.0055313
1 1 20200 1173 294 2010 777 0 0 .25063938 .
1 1 20300 588 175 2010 2896 109 .03763812 .29761904 7.907383
1 1 20400 112 0 2010 4543 285 .06273387 0 0
1 1 20500 1167 104 2010 7968 488 .06124498 .0891174 1.455097
1 1 20600 566 226 2010 2679 132 .04927212 .3992933 8.103839
1 1 20700 628 270 2010 1970 156 .07918782 .4299363 5.429324
1 1 20801 175 46 2010 2560 128 .05 .26285714 5.257143
1 1 20802 1642 470 2010 8114 553 .068153806 .2862363 4.1998577
1 1 20900 599 134 2010 4753 391 .08226383 .22370617 2.7193744
1 1 21000 624 325 2010 2203 257 .1166591 .5208333 4.464575
1 1 21100 1913 535 2010 1319 59 .04473086 .27966544 6.252182
1 3 10100 607 34 2010 2818 67 .023775727 .05601318 2.3558977
1 3 10200 215 12 2010 2276 126 .05536028 .05581395 1.0081949
1 3 10300 1567 333 2010 5644 163 .02888023 .212508 7.358252
1 3 10400 262 236 2010 4370 436 .09977116 .9007633 9.028294
1 3 10500 232 50 2010 3067 179 .05836322 .21551724 3.692689
1 3 10600 2483 1126 2010 1118 130 .11627907 .4534837 3.89996
1 3 10701 256 0 2010 8357 254 .03039368 0 0
1 3 10703 625 223 2010 10790 471 .04365153 .3568 8.173826
1 3 10704 382 0 2010 4625 181 .03913514 0 0
1 3 10705 774 16 2010 6420 676 .10529595 .020671835 .1963213
1 3 10800 2137 843 2010 4844 359 .0741123 .3944782 5.322709
1 3 10903 679 344 2010 3533 337 .09538636 .5066274 5.311319
1 3 10904 191 74 2010 6025 1179 .19568464 .3874345 1.9798925
1 3 10905 308 0 2010 5527 311 .05626922 0 0
1 3 10906 86 0 2010 3903 338 .08660005 0 0
1 3 11000 195 58 2010 3242 503 .15515114 .2974359 1.917072
1 3 11101 293 190 2010 8314 606 .072889104 .6484641 8.896585
1 3 11102 46 0 2010 3367 482 .14315414 0 0
1 3 11201 74 0 2010 4450 403 .0905618 0 0
1 3 11202 849 149 2010 3527 236 .06691239 .1755006 2.6228414
1 3 11300 185 0 2010 3666 301 .08210584 0 0
1 3 11401 725 85 2010 7815 978 .12514396 .11724138 .9368521
1 3 11403 237 34 2010 6029 477 .0791176 .14345992 1.8132492
1 3 11405 0 0 2010 3659 171 .04673408 . .
1 3 11406 30 0 2010 2844 136 .04781997 0 0
1 3 11407 0 0 2010 5002 871 .17413035 . .
1 3 11408 0 0 2010 674 40 .05934718 . .
1 3 11501 493 400 2010 3791 365 .09628066 .811359 8.42702
1 3 11502 1981 1033 2010 5938 393 .0661839 .5214538 7.878861
1 3 11601 17 0 2010 5719 515 .0900507 0 0
1 3 11602 26 0 2010 5173 391 .07558477 0 0
1 3 990000 0 0 2010 0 0 . . .
1 5 950100 2015 629 2010 1403 100 .07127584 .3121588 4.379588
1 5 950200 1640 703 2010 731 24 .032831736 .42865855 13.056226
1 5 950300 1040 521 2010 734 160 .21798365 .50096154 2.298161
1 5 950400 913 278 2010 1484 246 .1657682 .3044907 1.8368462
1 5 950500 968 121 2010 2220 384 .17297298 .125 .7226563
1 5 950600 692 481 2010 1262 144 .1141046 .6950867 6.091662
1 5 950700 769 200 2010 822 118 .14355232 .260078 1.8117298
1 5 950800 895 218 2010 1233 99 .08029197 .24357542 3.033621
1 5 950900 2432 1332 2010 2035 106 .05208845 .54769737 10.514756
1 7 10001 73 21 2010 2911 401 .13775335 .28767124 2.0883067
1 7 10002 304 38 2010 5919 569 .0961311 .125 1.3003076
1 7 10003 833 141 2010 3963 172 .04340146 .1692677 3.900046
1 7 10004 2151 650 2010 5988 788 .13159652 .302185 2.2962995
1 9 50101 350 114 2010 5465 642 .11747484 .3257143 2.77263
1 9 50102 215 74 2010 5178 481 .09289301 .344186 3.7051876
1 9 50200 0 0 2010 3306 308 .09316394 . .
1 9 50300 21 0 2010 4720 653 .13834746 0 0
1 9 50400 0 0 2010 4200 1025 .2440476 . .
1 9 50500 32 32 2010 6626 849 .1281316 1 7.804476
1 9 50601 0 0 2010 3288 153 .04653285 . .
1 9 50602 61 0 2010 8598 579 .067341246 0 0
1 9 50700 37 10 2010 8829 1249 .14146562 .27027026 1.9105014
1 11 952100 1428 240 2010 224 2 .008928572 .16806723 18.823528
1 11 952200 4406 1950 2010 939 10 .010649627 .4425783 41.55811
1 11 952500 2094 373 2010 1161 14 .01205857 .178128 14.7719
1 13 952700 892 509 2010 1311 253 .19298245 .5706278 2.9568896
1 13 952800 335 0 2010 1338 92 .068759345 0 0
1 13 952900 1220 571 2010 653 37 .05666156 .4680328 8.260146
1 13 953000 404 96 2010 869 86 .09896433 .23762377 2.401105
1 13 953100 2163 500 2010 565 61 .1079646 .23116043 2.141076
1 13 953200 1785 699 2010 2401 298 .12411495 .39159665 3.1551125
1 13 953300 115 0 2010 1843 200 .10851872 0 0
1 13 953400 1513 731 2010 1031 352 .3414161 .4831461 1.415124
1 13 953500 403 191 2010 1214 326 .26853377 .4739454 1.764938
1 15 200 1764 292 2010 1299 149 .11470362 .1655329 1.4431357
1 15 300 2616 1431 2010 451 77 .1707317 .54701835 3.203965
1 15 400 1870 590 2010 1043 173 .1658677 .315508 1.902167
1 15 500 1438 781 2010 172 61 .35465115 .54311544 1.5314075
1 15 600 1682 979 2010 404 163 .4034654 .5820452 1.442615
1 15 700 1282 521 2010 1295 214 .16525097 .4063963 2.459267
1 15 800 463 248 2010 282 19 .06737588 .53563714 7.949983
1 15 900 481 89 2010 2616 121 .04625382 .1850312 4.000344
1 15 1000 1221 167 2010 4165 197 .04729892 .13677314 2.891676
1 15 1100 807 309 2010 4524 357 .07891247 .3828996 4.852207
1 15 1201 1170 511 2010 1762 139 .07888763 .4367521 5.536383
1 15 1202 502 85 2010 3134 543 .173261 .1693227 .9772696
1 15 1300 0 0 2010 2396 834 .3480801 . .
1 15 1400 996 183 2010 2314 136 .05877269 .18373494 3.126196
1 15 1500 106 0 2010 4798 437 .09107962 0 0
1 15 1600 313 124 2010 2915 404 .13859348 .39616615 2.858476
1 15 1700 1247 111 2010 5236 469 .0895722 .08901364 .9937642
1 15 1800 510 306 2010 5360 753 .14048508 .6 4.2709165
1 15 2000 462 172 2010 6359 795 .12501965 .3722944 2.977887
1 15 2101 802 755 2010 1450 683 .4710345 .9413965 1.9985723
1 15 2102 260 82 2010 2490 106 .04257028 .3153846 7.408563
1 15 2103 1505 450 2010 4335 1098 .2532872 .2990033 1.1804913
end
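For reference, here is a minimal sketch of how the index in the listing above could be rebuilt from the raw counts, followed by a quick look at which tracts sit in the upper tail. It assumes the example data are in memory. The names whiteprop2, blackprop2, index2 and extreme are made up for illustration, and the cutoff of 27 is simply the 99% figure quoted in the question, not a recommended threshold.

```stata
* Rebuild the two poverty proportions and their ratio from the raw counts.
* Division by zero returns missing (.) in Stata, so tracts with no black
* residents, or with no Whites below the poverty line, get a missing index.
generate double whiteprop2 = whitebelow / whitetotalpop
generate double blackprop2 = blackbelow / blacktotalpop
generate double index2     = blackprop2 / whiteprop2

* Count rows where the rebuilt index disagrees with the stored one.
* The stored variables are floats, so compare with a tolerance.
count if reldif(index2, PovertyIndexRaw) > 1e-5 & !missing(index2, PovertyIndexRaw)

* Describe the tail, then compare the group sizes behind the extreme and
* non-extreme values (illustrative cutoff of 27, taken from the question).
summarize PovertyIndexRaw, detail
generate byte extreme = PovertyIndexRaw >= 27 if !missing(PovertyIndexRaw)
tabstat blacktotalpop whitebelow, by(extreme) statistics(n min p50 max)
```

In the example, the one value above 27 (tract 952200) has only 10 Whites below the poverty line, so the small count in the denominator drives the large ratio.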