Dear Statalist users,

I'm a PhD student and a Stata beginner, so my question may seem silly to many of you, but I have spent two days on this issue without solving it. I use a survey that records the income decile and the electoral choice of every respondent. I have a score, computed in a previous analysis, that I need to use to "weight" the actual percentages of votes for each party: a respondent's vote no longer counts as 1 but as the score of the party he/she voted for.

My objective is to draw a line graph with the weighted percentage of votes for several parties combined on the y axis and the income deciles on the x axis. My code stops working once the score is applied.

To draw the unweighted graph for, say, the following two parties together (M5S and LN), I wrote this code, where the vote for each party is stored in a dummy variable (1 = yes; 0 = no):


**** LN

egen LN_income = total(LN), by(income)
egen LN_income_norm = count(LN), by(income)
gen LN_new = LN_income / LN_income_norm
sort income

** error bar
gen LN_new_error = sqrt(LN_new*(1 - LN_new)/LN_income_norm)

** to plot
gen LN_new_low = LN_new - LN_new_error
gen LN_new_high = LN_new + LN_new_error


********* M5S

egen M5S_income = total(M5S), by(income)
egen M5S_income_norm = count(M5S), by(income)
gen M5S_new = M5S_income / M5S_income_norm
sort income

** error bar
gen M5S_new_error = sqrt(M5S_new*(1 - M5S_new)/M5S_income_norm)

** to plot
gen M5S_new_low = M5S_new - M5S_new_error
gen M5S_new_high = M5S_new + M5S_new_error


**** sum of the two parties

gen sum_M5S_LN = (M5S_income / M5S_income_norm) + (LN_income / LN_income_norm)
gen sum_M5S_LN_income_norm = M5S_income_norm + LN_income_norm
sort income
** error bar
gen sum_M5S_LN_error = sqrt(sum_M5S_LN*(1 - sum_M5S_LN)/sum_M5S_LN_income_norm)
** to plot
gen sum_M5S_LN_low = sum_M5S_LN - sum_M5S_LN_error
gen sum_M5S_LN_high = sum_M5S_LN + sum_M5S_LN_error

*** final plot (basic form with no options)

line sum_M5S_LN income || rcap sum_M5S_LN_low sum_M5S_LN_high income
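
As a sanity check on the unweighted steps above, I believe that total()/count() of a 0/1 dummy within a decile should simply equal the mean of that dummy within the decile. A minimal sketch of that check, where LN_check and M5S_check are names I made up for illustration and that are not used anywhere else:

```stata
* Sanity check only; LN_check and M5S_check are hypothetical names.
* For a 0/1 dummy, total()/count() by decile should equal mean() by decile.
egen LN_check  = mean(LN),  by(income)
egen M5S_check = mean(M5S), by(income)

* reldif() allows for rounding differences in float storage
assert reldif(LN_check,  LN_new)  < 1e-6 if !missing(LN_new)
assert reldif(M5S_check, M5S_new) < 1e-6 if !missing(M5S_new)
```

If both assert lines pass silently, the unweighted shares are at least what I intended them to be.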




When I apply the score, the code becomes:


******************************** LN

egen LN_income = total(LN), by(income)
egen LN_income_norm = count(LN), by(income)
gen LN_new = (LN_income / LN_income_norm) * 0.78
sort income

** error bar
gen LN_new_error = sqrt(LN_new*(1 - LN_new)/LN_income_norm)

** to plot
gen LN_new_low = LN_new - LN_new_error
gen LN_new_high = LN_new + LN_new_error


***************************** M5S

egen M5S_income = total(M5S), by(income)
egen M5S_income_norm = count(M5S), by(income)
gen M5S_new = (M5S_income / M5S_income_norm) * 0.56
sort income

** error bar
gen M5S_new_error = sqrt(M5S_new*(1 - M5S_new)/M5S_income_norm)

** to plot
gen M5S_new_low = M5S_new - M5S_new_error
gen M5S_new_high = M5S_new + M5S_new_error

**************** sum of the two parties
gen sum_M5S_LN = (M5S_income / M5S_income_norm) + (LN_income / LN_income_norm)
gen sum_M5S_LN_income_norm = M5S_income_norm + LN_income_norm
sort income

** error bar
gen sum_M5S_LN_error = sqrt(sum_M5S_LN*(1 - sum_M5S_LN)/sum_M5S_LN_income_norm)
** to plot
gen sum_M5S_LN_low = sum_M5S_LN - sum_M5S_LN_error
gen sum_M5S_LN_high = sum_M5S_LN + sum_M5S_LN_error

*** final plot (basic form)

line sum_M5S_LN income || rcap sum_M5S_LN_low sum_M5S_LN_high income




The problem is with the normalisation: although the graph has a credible shape, the normalised values are wrong. My reasoning is that I am missing an extra step, namely normalising by the sum of the two parties * mean_score after computing percentage * score, but I got lost on how to do this (assuming I'm right).
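
To make the "percentage * score" idea concrete, here is a minimal sketch of how I understand the weighting at the respondent level. The names wvote, w_mean, w_sd, w_n, w_se, w_low, w_high and pick are made up for illustration. The sketch assumes that LN and M5S are 0/1 dummies and that nobody voted for both parties, and it starts from the raw data, so none of the variables generated above are needed. I am not claiming this is the correct normalisation: whether a further division by the mean score is needed is exactly what I am unsure about.

```stata
* Sketch only, under the assumptions stated above; all new names are hypothetical.
* Each respondent's vote counts as the score of the party voted for (0 otherwise).
gen double wvote = 0.78*LN + 0.56*M5S

* Mean scored vote per income decile
egen double w_mean = mean(wvote), by(income)

* Standard error of a mean (sd / sqrt(n)) instead of the binomial formula,
* because wvote is no longer a 0/1 variable
egen double w_sd = sd(wvote), by(income)
egen w_n = count(wvote), by(income)
gen double w_se = w_sd / sqrt(w_n)

* Bounds for the error bars
gen double w_low  = w_mean - w_se
gen double w_high = w_mean + w_se

* Keep one point per decile so the line is drawn through 10 points only
egen byte pick = tag(income)
line w_mean income if pick, sort || rcap w_low w_high income if pick
```

In this version the sample size in the error term is the number of respondents in the decile, counted once, rather than the sum of the two party counts.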

I thank you in advance for your help,

J.