Dear Stata list members,

I am currently trying to add several macro indicators from the Eurostat database towards the cross-sectional study data (European Social Survey).
However, I am struggling at a pre-processing stage.

The excel files were already successfully imported into Stata.

The macro indicators are differentiated by the NUTS regional classification of European regions. The concept of NUTS classification is that NUTS 3 can be aggregated to NUTS 2, and correspondingly NUTS 2 to NUTS 1 data.
Now, I do have the issue that some countries like Germany and UK have only NUTS level 1 data, meaning that they have rather broad regions(Federal states), whereas some other countries have smaller spatial data like NUTS level 2 and NUTS level 3.

My aim is to use the smallest possible scale of regional differentiation. This is only limited to NUTS level 2 due to the availability of macro indicators up to that level in Eurostat, e.g. regional unemployment numbers, population density etc.

So, I do have two data files containing the spatial area in km² , one with NUTS level 2 data (19 countries) and the other with NUTS level 1 data (Germany and UK).

Appending results in the following:

Code:
use "$area\reg_area3_nuts1.dta", clear
append using "$area\reg_area3_nuts2.dta", nolabel gen(source)

. describe

Contains data from ...\Data\Macro-indicators\Area in square kilometers\reg_area3_nuts1.dta
  obs:           447                          
 vars:             4                          7 Jul 2019 15:56
 size:        83,589                          
------------------------------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------------------------------------------------
nuts_label      str61  %61s                   NUTS label
area           long   %8.0g       area       Area in km2
nuts1label      str61  %61s                   NUTS 1 label
nuts2label      str61  %61s                   NUTS 2 label
Two things strike me as odd:
1. The values for the variable "area" are apparently sorted by the first digit oft the value and not in an ascending manner. How is that possible?


Code:
. sort area
. list in 1/10
+--------------------------------------+
     |   area           nuts_label   source |
     |--------------------------------------|
  1. | 100450           Basilicata        1 |
  2. | 100450               Ísland        0 |
  3. |  11952        Île de France        0 |
  4. |  11952               Ísland        1 |
  5. |  12998   West Midlands (UK)        0 |
     +--------------------------------------+
Code:
. list in 17/21

+-------------------------------------+
area           nuts_label   source
-------------------------------------
17.  15408              Abruzzo        1
18.  15408   Schleswig-Holstein        0
19.   1553           Nord-Norge        1
20.   1553                Åland        0
21.  15623   East Midlands (UK)        0
+-------------------------------------+
And the second issue that occurs is:

The values for the variable "area" of the appended file with NUTS 2 level data have changed and seem to have rather arbitrary values.
Two examples for illustration purposes: Prov. Antwerpen has 2804 km² as correct and original value in "reg_area3_nuts2.dta" but now has changed to 149 km²
and Prov. Limburg (BE) has 2390 km² as original value and now is 124 km².

What went wrong? Is it the storage type?

Best regards
Thomas