Hey everyone,
I am preparing for my Master's dissertation, and this is my first time doing data analysis with Stata. My questions are therefore very basic, for which I apologise in advance.
I am interested in looking at correlates of life satisfaction (LS) in different countries over time.
The life satisfaction data is from the Eurobarometer gathered and harmonised within The Mannheim Eurobarometer Trend File, 1970-2002.
The dataset contains answers from about 1,000-2,000 respondents per country/year. LS is an ordinal variable with possible responses of 1 - very satisfied, 2 - fairly satisfied, 3 - not very satisfied and 4 - not at all satisfied.
I want to calculate both country-level averages and the percentage of respondents in each category. However, I am not sure how to work with the sampling weights. I plan to work with the nation2 classification, so I think wnation is the relevant weighting variable. Since I am not interested in EU-level results, I think weuro is not applicable.
In the Codebook, it is stated:
3.2. Weighting Variables
Weighting variables adjust distributions of social-structural characteristics in a sample to those in a universe. Those adjustments have been made in a country-specific idiosyncratic way during the years when Eurobarometers and the preceding ECS studies were carried out by Gallup (Europe). When INRA took over in 1989 (starting with EB321), a consistent representative weighting routine has been applied allover. Six socio-demographic criteria were used (age, sex, size of household, regional affiliation, size of community). In order to harmonise weighting factors throughout the Eurobarometer series, ZEUS initiated a recalculation of weights for the earlier Eurobarometers (which was technically realised by INRA, Brussels). Therefore the three weighting variables WSAMPLE, WNATION, and WEURO that are included here are established upon common and uniform criteria for all Eurobarometer studies. The original weighting factors of the pre-1989 surveys are not included in the Mannheim Eurobarometer Trend File.
Further, it is stated:
WSAMPLE is the most basic of all three weighting factors. It adjusts distributions in the samples to those in the respective universe. WSAMPLE should be used together with id variable NATION1. In the case of Eurobarometer 41.0 the oversample of non-national European Union citizens (in total 373 cases from all countries) is excluded from analysis when weighting by wsample. This oversample was included in Eurobarometer 41.0 for test purposes; starting with Eurobarometer 41.1 the EB target population is the population of any nationality of an European Union member country. Note: The application of WSAMPLE for the Irish sample in ECS73 produces unexpected regional distributions for Ireland. In the case of Eurobarometer 5 the application of WSAMPLE excludes 17 Italian cases from analysis.
WNATION is used to adjust distributions to those in the respective nations. Note that WSAMPLE and WNATION only differ for the United Kingdom and, in later surveys, for Germany. WNATION should be used together with id variable NATION2.
WEURO adjusts the size of national (resp., subnational) samples to the size of national populations relative to one another. This weight is used when the European Community/European Unions is analysed as an entity rather than its constituent nations or regions.
Thus, these are the variables which I think are relevant for my purpose:
Observations: 1,134,384
Variables: 6
--------------------------------------------------------------------------------
Variable    Storage   Display    Value
name        type      format     label       Variable label
--------------------------------------------------------------------------------
id          long      %12.0g                 TECH VAR IDENTIFICATION NUMBER
year        int       %8.0g                  TECH VAR EUROBAROMETER YEAR
nation2     byte      %8.0g      NATION2     TECH VAR NATION2 IDENTIFIER
wnation     double    %10.0g                 TECH VAR WEIGHT NATION
weuro       double    %10.0g                 TECH VAR WEIGHT EUROPEAN UNION
satislfe    byte      %8.0g      SATISLFE    SATISFACTION LIFE
However, I am unsure how to use these variables in the svyset command or which commands to use in general.
To get the average per country, I used the following command without running svyset first, since I was unsure what the PSU and the strata are in this case:
collapse (mean) meansatislfe = satislfe [pw=wnation], by(year nation2)
I found only very small deviations between the means I obtain and the means reported in the supplementary materials of a paper published in Nature.
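As a check on the collapse result, I believe the same weighted means can be obtained for a single year with mean, which accepts pweights directly and also reports standard errors. This is only a sketch: 1973 is just an example year, and it assumes that non-response on satislfe is stored as Stata missing values rather than as extra numeric codes.

```stata
* Weighted mean of life satisfaction by country for one year, without svyset.
* Assumption: "don't know"/no-answer codes in satislfe are Stata missing values.
* The point estimates should match the collapse output for the same year.
mean satislfe [pw=wnation] if year == 1973, over(nation2)
```

Unlike collapse, this leaves the dataset in memory unchanged.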
To calculate the proportions, I think I need to run svyset first.
Would this be the correct way to set it up?
svyset id [pw=wnation], strata(nation2)
I highly doubt this is correct since it leads to unrealistic population size values.
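To see where those population sizes come from, this is the check I had in mind. As far as I understand, the population size that svy reports is simply the sum of the weights, so if wnation is scaled to average about 1 within a country, the "population" would be roughly the sample size. Whether wnation is actually scaled that way is my assumption, not something I found in the codebook.

```stata
* Declare the design as in my attempt above, then inspect it.
svyset id [pw=wnation], strata(nation2)

* Lists the number of strata, sampling units and observations per stratum.
svydescribe

* Distribution of the weights: a mean near 1 would suggest the weights are
* scaled to the sample size rather than to the national population.
summarize wnation, detail
```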
Accordingly, to get the proportion of answers for the life satisfaction question, I thought about using this command (adapted for all relevant years):
svy: proportion satislfe if year == 1973, over(nation2)
However, I find some differences (up to 1%) between the values I get and those in the variable reports of the respective Eurobarometer. Thus, I am unsure whether I am using the svyset command correctly.
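One cross-check that does not depend on svyset is a plain weighted cross-tabulation. tabulate does not accept pweights, so this sketch uses aweights instead. As far as I know, the row percentages are the same either way, and only the standard errors (which tabulate does not report) would differ. Again, 1973 is just an example year.

```stata
* Weighted share of each life-satisfaction category within each country.
* row    = show row percentages (shares within each nation2 value)
* nofreq = suppress the weighted cell frequencies
tabulate nation2 satislfe [aw=wnation] if year == 1973, row nofreq
```

If these percentages agree with the svy: proportion output, the remaining gap to the Eurobarometer reports would have to come from something other than the svyset declaration.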
I apologise for the long question and appreciate any help you can provide!