I would like to generate a variable equal to the simple average of some other variables. Problem is, I have an unbalanced dataset, like this one:
Code:
input str2 country int year str4 varName double value "AA" 2008 "varA" 4 "AA" 2009 "varA" . "AA" 2010 "varA" . "AA" 2008 "varB" 6.5 "AA" 2009 "varB" 8 "AA" 2010 "varB" 4.7 "AA" 2008 "varC" 3.2 "AA" 2009 "varC" 5.4 "AA" 2010 "varC" . "BB" 2008 "varA" 7 "BB" 2009 "varA" 8.2 "BB" 2010 "varA" . "BB" 2008 "varB" 7.3 "BB" 2009 "varB" 0 "BB" 2010 "varB" 9 "BB" 2008 "varC" 6 "BB" 2009 "varC" 5 "BB" 2010 "varC" . end
What I would like to achieve is the following:
1) calculate a simple average, by country, of varA, varB and varC taking the value from the latest year available for each variable. So it should take the average of the value of varA and varC in 2009 and the value of varB in 2010.
2) Moreover, note that country AA has its last observation for varA in 2008. However, the last observation
in the sample
for varA is 2009. Therefore, for country AA I would like to remove varA from the calculation of the average, so that for country AA we only take the average of varB in 2010 and varC in 2009.
In other words, the new variable should be calculated as:
for country AA: (4.7 + 5.4)/2 = 5.05
for country BB: (8.2 + 9 + 5)/3 = 7.4
In the dataset, it should
appear like this:
Code:
input str2 country int year str4 varName double value
"AA" 2010 "aver" 5.05 "BB" 2010 "aver" 7.4
end
i.e. it's enough to insert it just for the last year in the sample.
Point 1) is crucial to me. If implementing point 2) is too cumbersome, I could live with just 1).
0 Response to Calculating average of variables from different years
Post a Comment