I have a dataset with two geographical units of observation: districts and labor market regions. Labor market regions are simply groups of districts, and each district belongs to exactly one labor market region. Shapefiles (polygon files that can be imported into a GIS program) exist for districts but not for labor market regions. I also have satellite raster data in which each raster cell holds one of about 20 possible soil types (the dominant soil type within that cell). Using QGIS, I calculated zonal statistics to count the cells of each soil type within each district (e.g., a district containing 92 cells has 40 cells with soil type 6111 and 52 cells with soil type 6210). I transferred this data into Stata, where I want to aggregate it and find the mode of the soil variable within each labor market region. The file looks something like this:
Code:
District | Labor Market Region | soil_6111 | soil_6210 | soil_5210 | rasters |
1 | 1 | 40 | 52 | 0 | 92 |
2 | 1 | 560 | 890 | 340 | 1790 |
3 | 1 | 30 | 600 | 1000 | 1630 |
4 | 2 | 800 | 500 | 100 | 1400 |
5 | 2 | 400 | 300 | 200 | 900 |
6 | 2 | 100 | 50 | 20 | 170 |
7 | 3 | 340 | 300 | 660 | 1300 |
8 | 3 | 200 | 100 | 200 | 500 |
Now the question is how to detect the dominant soil type within each labor market region. I simplified the example; there are actually about 20 soil types, but I don't think this should make a difference for the solution. Any help would be much appreciated.
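To make the intended computation concrete, here is a rough sketch of one way it might be done in Stata, not a tested solution. It assumes the region identifier is stored in a variable called `lmr` and that the soil counts are named `soil_<code>` as in the listing; `lmr`, `soiltype`, and `cells` are hypothetical names chosen for illustration. The `rasters` total is left out because it is not needed to find the mode.

```stata
clear
* Example data from the listing above (district totals omitted)
input district lmr soil_6111 soil_6210 soil_5210
1 1  40  52    0
2 1 560 890  340
3 1  30 600 1000
4 2 800 500  100
5 2 400 300  200
6 2 100  50   20
7 3 340 300  660
8 3 200 100  200
end

* Sum the cell counts of every soil type over the districts of each region
collapse (sum) soil_*, by(lmr)

* Long layout: one row per region and soil type
reshape long soil_, i(lmr) j(soiltype)
rename soil_ cells

* Keep the soil type with the largest cell count in each region.
* Ties are broken arbitrarily here (the highest soil code wins).
bysort lmr (cells soiltype): keep if _n == _N

list lmr soiltype cells, noobs
```

Because `collapse` and `reshape` work on the `soil_*` pattern, the same commands should carry over to the full set of about 20 soil types without changes. Note that this counts raster cells, so larger districts weigh more heavily in the regional mode; if ties matter, the last `keep` step would need a different rule.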