Hello,

I have a dataset with two geographical units of observation: districts and labor market regions. Labor market regions are simply aggregations of districts, and each district belongs to exactly one labor market region. Shapefiles (polygon files that can be imported into a GIS program) exist for districts but not for labor market regions. I also have satellite raster data in which each raster cell is assigned one of about 20 specific soil types (the dominant soil type within that cell). Using zonal statistics in QGIS, I counted the raster cells of each soil type within each district (e.g., a district with 92 raster cells contains 40 cells of soil type 6111 and 52 cells of soil type 6210). I transferred these data into Stata, where I want to aggregate them and find the modal (most frequent) soil type within each labor market region. The file looks something like this:

District  Labor Market Region  soil_6111  soil_6210  soil_5210  rasters
       1                    1         40         52          0       92
       2                    1        560        890        340     1790
       3                    1         30        600       1000     1630
       4                    2        800        500        100     1400
       5                    2        400        300        200      900
       6                    2        100         50         20      170
       7                    3        340        300        660     1300
       8                    3        200        100        200      500

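In this example, soil type 6210 is dominant in districts 1 and 2, soil type 5210 is dominant in district 3, and soil type 6210 is dominant in labor market region 1 because it is the most frequent type there (summed over the three districts: 630 rasters of 6111, 1542 of 6210 and 1340 of 5210). Because districts vary in size, the dominant type of a labor market region need not be the type that dominates most of its districts: two districts may both be dominated by, e.g., soil type 6111, but if the third district is much larger and dominated by another soil type, that other type can be dominant in the labor market region as a whole. So the raster counts have to be added up across districts rather than simply taking the mode of the district-level dominant types.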
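To make the size effect concrete, here is a small sketch in plain Python (not Stata, only to show the logic) that compares the two aggregation rules. The three districts A, B and C and their counts are made up for illustration and do not come from the table above; `dominant` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical raster counts for three districts of one labor market region.
# A and B are small and dominated by 6111; C is large and dominated by 5210.
districts = {
    "A": {6111: 60, 5210: 40},
    "B": {6111: 55, 5210: 45},
    "C": {6111: 100, 5210: 900},
}

def dominant(counts):
    """Return the soil code with the largest raster count."""
    return max(counts, key=counts.get)

# Naive rule: the mode of the district-level dominant types.
district_modes = [dominant(c) for c in districts.values()]
mode_of_modes = Counter(district_modes).most_common(1)[0][0]

# Count-weighted rule: sum raster counts over districts, then take the maximum.
totals = Counter()
for c in districts.values():
    totals.update(c)  # adds the counts key by key
region_dominant = dominant(totals)

print(mode_of_modes, region_dominant)  # 6111 5210
```

Two of the three districts are dominated by 6111, but after summing, 5210 has 985 rasters against 215 for 6111, so it is the dominant type of the region.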
My question is how to determine the dominant soil type within each labor market region. I simplified the example: the actual data contain about 20 soil types, but I don't think this should make a difference for the solution. Any help would be much appreciated.
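One possible approach, sketched in plain Python rather than Stata to show the logic only: sum each soil-type count over the districts of a labor market region, then pick the soil type with the largest total. In Stata this would presumably correspond to a `collapse (sum)` of the soil variables by labor market region followed by a comparison across the summed variables, but that translation is an untested assumption. The names `SOIL_TYPES`, `rows` and `dominant_by_region` are hypothetical; the counts are copied from the example table (the rasters column is left out because it is just the row total).

```python
from collections import defaultdict

SOIL_TYPES = [6111, 6210, 5210]

# (district, labor market region, soil_6111, soil_6210, soil_5210)
rows = [
    (1, 1, 40, 52, 0),
    (2, 1, 560, 890, 340),
    (3, 1, 30, 600, 1000),
    (4, 2, 800, 500, 100),
    (5, 2, 400, 300, 200),
    (6, 2, 100, 50, 20),
    (7, 3, 340, 300, 660),
    (8, 3, 200, 100, 200),
]

def dominant_by_region(rows, soil_types):
    """Sum soil counts over districts per region and return the modal type(s).

    Returns a dict region -> list of soil codes, so ties stay visible
    instead of being broken arbitrarily.
    """
    totals = defaultdict(lambda: [0] * len(soil_types))
    for _district, region, *counts in rows:
        for i, n in enumerate(counts):
            totals[region][i] += n
    result = {}
    for region, sums in sorted(totals.items()):
        top = max(sums)
        result[region] = [s for s, n in zip(soil_types, sums) if n == top]
    return result

print(dominant_by_region(rows, SOIL_TYPES))
# {1: [6210], 2: [6111], 3: [5210]}
```

Returning a list per region keeps ties visible: district 8 on its own, for instance, has 200 rasters each of 6111 and 5210. With about 20 soil types, only `SOIL_TYPES` and the length of each row change; the logic stays the same.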