Hi all,
I am looking to clean a data set I have. I think what I want to achieve should only take a few lines of code, but I can't seem to figure out what they should be! My data set has population density buckets for every county in the US. However, most counties have at least two different population densities within one county boundary. For any county that has more than one row of population density buckets, I want to drop all rows for that county except the one with the highest density. For example, Adair county has population densities of Under 2 and 2 to 6, and I want to drop the row where Adair is Under 2. I have 2,000 observations, so is there some sort of loop that can do this for me?
The second factor that makes this data cleaning difficult is that some county names appear in multiple states. For example, the county name Alleghany (shown at the bottom of the data example) appears in 2 different states (STATEFP 37 and 51). I want to make sure that when lower population densities are dropped, they are dropped only within the specific state, without deleting the observation of a separate county that happens to share the name. In other words, I want to drop the observation where NAME=Alleghany, STATEFP=37 and Pop_density_string=Under 2, and separately the observation where NAME=Alleghany, STATEFP=51 and Pop_density_string=Under 2, while keeping the 6 to 18 row for each state.
My data looks like this:
STATEFP  NAME       Pop_density_string  Population_Density
45       Abbeville  6 to 18             5
51       Accomack   18 to 45            17
21       Adair      Under 2             0
21       Adair      2 to 6              1
17       Adams      Under 2             0
18       Adams      Under 2             0
28       Adams      Under 2             0
39       Adams      Under 2             0
42       Adams      18 to 45            17
55       Adams      Under 2             0
50       Addison    2 to 6              1
50       Addison    6 to 18             5
45       Aiken      6 to 18             5
45       Aiken      2 to 6              1
27       Aitkin     Under 2             0
37       Alamance   6 to 18             5
37       Alleghany  6 to 18             5
37       Alleghany  Under 2             0
51       Alleghany  6 to 18             5
51       Alleghany  Under 2             0
STATEFP is an ID number for each state, NAME is the name of a specific county, Pop_density_string gives the population density bucket, and Population_Density is a number I generated to match each text bucket in Pop_density_string (a higher number means a denser bucket).
Any advice on how to carry out this task? Thank you!
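To make the rule above concrete, here is a minimal sketch in plain Python. It is only one possible approach, not a definitive solution. It assumes the rows are already loaded as tuples in the column order shown above, and the function name keep_highest_density is made up for illustration. The key idea is to treat the pair (STATEFP, NAME) as the county's identity, so Alleghany in state 37 and Alleghany in state 51 are handled separately.

```python
# A few rows copied from the sample above, as
# (STATEFP, NAME, Pop_density_string, Population_Density) tuples.
rows = [
    (45, "Abbeville", "6 to 18", 5),
    (21, "Adair", "Under 2", 0),
    (21, "Adair", "2 to 6", 1),
    (37, "Alleghany", "6 to 18", 5),
    (37, "Alleghany", "Under 2", 0),
    (51, "Alleghany", "6 to 18", 5),
    (51, "Alleghany", "Under 2", 0),
]


def keep_highest_density(rows):
    """Keep one row per (STATEFP, NAME): the one with the largest Population_Density.

    Hypothetical helper for illustration; assumes the fourth field is numeric.
    """
    best = {}
    for row in rows:
        statefp, name, _bucket, density = row
        key = (statefp, name)  # state + county name identifies one county
        # Store the row if this county is new, or if it beats the stored density.
        if key not in best or density > best[key][3]:
            best[key] = row
    return list(best.values())


cleaned = keep_highest_density(rows)
for row in cleaned:
    print(row)
```

If two rows for the same county tie on Population_Density, this sketch keeps whichever appears first. The same grouping idea should carry over to other tools: group by STATEFP and NAME together, then keep the row with the maximum Population_Density in each group.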