I am currently trying to add several macro indicators from the Eurostat database to cross-sectional survey data (the European Social Survey).
However, I am stuck at the pre-processing stage.
The Excel files have already been imported into Stata successfully.
The macro indicators are differentiated by the NUTS regional classification of European regions. The concept of NUTS classification is that NUTS 3 can be aggregated to NUTS 2, and correspondingly NUTS 2 to NUTS 1 data.
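To illustrate the nesting, here is a minimal sketch on made-up data. It assumes the NUTS code itself is available as a string (the hypothetical variable `nuts3_code`, which is not in my files) and relies on the fact that NUTS codes are hierarchical: a two-letter country code plus one character per level. The areas are invented for the example.

```stata
* Hypothetical toy data: three NUTS 3 regions with invented areas
clear
input str5 nuts3_code long area
"BE211" 100
"BE212" 200
"BE221" 300
end

* The first 4 characters identify the NUTS 2 region, the first 3 the NUTS 1 region
gen nuts2_code = substr(nuts3_code, 1, 4)
gen nuts1_code = substr(nuts3_code, 1, 3)

* Area is additive, so NUTS 3 areas can simply be summed within each NUTS 2 region
collapse (sum) area, by(nuts2_code nuts1_code)
list
```

Summing only makes sense for additive quantities such as area or population counts; rates such as unemployment would need a weighted mean instead.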
The issue is that some countries, such as Germany and the UK, only have NUTS level 1 data, meaning rather broad regions (e.g. federal states), whereas other countries have finer spatial units at NUTS level 2 and NUTS level 3.
My aim is to use the smallest possible scale of regional differentiation. In practice this is limited to NUTS level 2, because the macro indicators in Eurostat (e.g. regional unemployment numbers, population density) are only available down to that level.
So I have two data files containing the area in km²: one with NUTS level 2 data (19 countries) and the other with NUTS level 1 data (Germany and UK).
Appending results in the following:
Code:
use "$area\reg_area3_nuts1.dta", clear
append using "$area\reg_area3_nuts2.dta", nolabel gen(source)

. describe

Contains data from ...\Data\Macro-indicators\Area in square kilometers\reg_area3_nuts1.dta
  obs:           447
 vars:             4                          7 Jul 2019 15:56
 size:        83,589
------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------
nuts_label      str61   %61s                  NUTS label
area            long    %8.0g      area       Area in km2
nuts1label      str61   %61s                  NUTS 1 label
nuts2label      str61   %61s                  NUTS 2 label
1. The values of the variable "area" are apparently sorted by their leading digits rather than in ascending numerical order. How is that possible?
Code:
. sort area
. list in 1/10

     +--------------------------------------+
     |   area           nuts_label   source |
     |--------------------------------------|
  1. | 100450           Basilicata        1 |
  2. | 100450               Ísland        0 |
  3. |  11952        Île de France        0 |
  4. |  11952               Ísland        1 |
  5. |  12998   West Midlands (UK)        0 |
     +--------------------------------------+
Code:
. list in 17/21

     +-------------------------------------+
     |  area           nuts_label   source |
     |-------------------------------------|
 17. | 15408              Abruzzo        1 |
 18. | 15408   Schleswig-Holstein        0 |
 19. |  1553           Nord-Norge        1 |
 20. |  1553                Åland        0 |
 21. | 15623   East Midlands (UK)        0 |
     +-------------------------------------+
2. The values of the variable "area" from the appended file with NUTS level 2 data have changed and now look rather arbitrary.
Two examples for illustration: Prov. Antwerpen has 2804 km² as its correct, original value in "reg_area3_nuts2.dta" but now shows 149 km², and Prov. Limburg (BE) has 2390 km² as its original value and now shows 124 km².
What went wrong? Is it the storage type?
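One thing I noticed is that `describe` shows a value label named "area" attached to the variable. The sketch below is only a guess at the mechanism, assuming the areas arrived as text and were turned into a numeric variable with `encode`; it uses invented toy data and hypothetical variable names (`area_str`, `area_enc`, `area_num`), not my actual files.

```stata
* Toy reproduction, not the original data: area stored as text
clear
input str6 area_str
"100450"
"11952"
"2804"
end

* encode assigns integer codes 1, 2, 3 in alphabetical order of the text
* and stores the original text as a value label
encode area_str, gen(area_enc)
sort area_enc
list area_enc            // shows the labels, ordered by leading digits
list area_enc, nolabel   // shows the underlying codes 1, 2, 3

* destring converts the text itself, which gives true numbers
destring area_str, gen(area_num)
sort area_num
list area_num
```

If this is what happened, `list area, nolabel` and `label list area` on the real data should reveal small integer codes behind the displayed areas, and the `nolabel` option of `append` would mean that the codes from the NUTS 2 file are displayed through the NUTS 1 file's label.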
Best regards
Thomas