This might be easily answered: which of the following options should I choose for a "correct" bivariate regression/scatter plot?
n, R² and RMSE differ between option A and B.
Option A: Variable containing mean values (in a full dataset with different ids)
Starting with a panel dataset, I computed the mean of a variable for each year and applied aaplot by Nick Cox, available from SSC.
The dataset is sorted by year (called "relyear"), and the newly generated variable contains the same mean for every id within a year, while the means differ across years.
Code:
bys relyear: egen relgdp = mean(gdpcapPPP11)
gen lnrelgdp = ln(relgdp)
aaplot lnrelgdp relyear
Option B: Dataset containing only the unique mean values (together with the variable they were averaged by)
Instead of plotting the variables of interest in my "full" dataset, I collapsed the dataset and then applied aaplot (see above) again for comparison.
Code:
preserve
collapse (mean) gdpcapPPP11, by(relyear)
gen ln_gdp = ln(gdpcapPPP11)
aaplot ln_gdp relyear
restore
Results B:
R²=92.7% n=181 RMSE=0.3360293
I tried this with other variables, too, and R² does not always fall from A to B. Of course, the variables ln_gdp and lnrelgdp contain the same values. Yet the results are different. So: A or B?
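One check that might narrow down where the different n comes from is to count how many observations each approach feeds into the fit. In Option A each year's mean is repeated once for every id present in that year, so years with more ids enter more often, whereas Option B has exactly one row per year. The following is only a minimal sketch, assuming the full panel from Option A (with relyear and lnrelgdp) is in memory. The variable name oneperyear is made up for this check, and regress is used here in place of aaplot just to keep the output short.

```stata
* Sketch only: assumes the Option A variables relyear and lnrelgdp already exist.
* oneperyear is a hypothetical helper variable introduced for this comparison.

* Tag exactly one observation within each distinct value of relyear
egen byte oneperyear = tag(relyear)

* Observations available in Option A (every id-year row carries the yearly mean)
count if !missing(lnrelgdp)

* Observations available when each year is counted once, as in Option B
count if oneperyear & !missing(lnrelgdp)

* Option A: each yearly mean enters once per id observed in that year
regress lnrelgdp relyear

* One row per year: should closely reproduce the collapsed dataset of Option B
regress lnrelgdp relyear if oneperyear
```

If the second regression reproduces the Option B results (up to float/double rounding), the difference between A and B comes purely from how often each year is counted, not from the values themselves.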
Thank you!
***************************
If needed:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(relyear id) double gdpcapPPP11 float lnrelgdp
-40 75 . 7.214939
-40 2 . 7.214939
-40 135 . 7.214939
-40 17 238.8545623609962 7.214939
-40 214 . 7.214939
-40 41 1429.997975378749 7.214939
-40 201 . 7.214939
-40 68 4377.438877160824 7.214939
-40 40 1038.932546183966 7.214939
-40 173 376.0451764821386 7.214939
-40 189 696.2769275188422 7.214939
-40 176 . 7.214939
-39 40 903.1257682167955 7.279641
-39 189 692.4529850621859 7.279641
-39 135 . 7.279641
-39 2 . 7.279641
-39 68 4974.194553731857 7.279641
-39 176 . 7.279641
-39 214 . 7.279641
-39 201 . 7.279641
-39 17 245.198582661663 7.279641
-39 75 . 7.279641
-39 41 1510.258194599068 7.279641
-39 173 377.5727516869297 7.279641
-38 135 . 7.341884
-38 17 256.7771744027178 7.341884
-38 176 . 7.341884
-38 201 . 7.341884
-38 214 . 7.341884
-38 75 . 7.341884
-38 68 5285.614365754863 7.341884
-38 173 389.0707432639699 7.341884
-38 41 1547.834960294038 7.341884
-38 189 715.3433022316342 7.341884
-38 2 . 7.341884
-38 40 1067.059379239463 7.341884
-37 68 5537.479760591066 7.357561
-37 214 . 7.357561
-37 75 . 7.357561
-37 201 . 7.357561
-37 41 1446.477872611546 7.357561
-37 2 . 7.357561
-37 135 . 7.357561
-37 189 690.1190825834448 7.357561
-37 40 1094.006461863386 7.357561
-37 17 250.1178520375817 7.357561
-37 173 389.8434767407718 7.357561
-37 176 . 7.357561
-36 68 5706.515462131701 7.370363
-36 189 659.7443148239921 7.370363
-36 41 1461.712348810607 7.370363
-36 17 252.2618318136373 7.370363
-36 214 . 7.370363
-36 2 . 7.370363
-36 201 . 7.370363
-36 75 . 7.370363
-36 40 1039.402910306577 7.370363
-36 176 . 7.370363
-36 135 . 7.370363
-36 173 409.6221260042836 7.370363
-35 17 257.9150775352896 7.036851
-35 30 . 7.036851
-35 3 . 7.036851
-35 40 1021.608540068463 7.036851
-35 189 651.0954092153226 7.036851
-35 14 214.1369544700441 7.036851
-35 176 . 7.036851
-35 190 349.0495545398963 7.036851
-35 110 1273.484095797152 7.036851
-35 129 . 7.036851
-35 173 432.2722889993854 7.036851
-35 147 267.0736069476994 7.036851
-35 75 . 7.036851
-35 108 . 7.036851
-35 73 . 7.036851
-35 2 . 7.036851
-35 201 . 7.036851
-35 68 6080.802586051083 7.036851
-35 41 1474.627346963577 7.036851
-35 143 1299.949248550549 7.036851
-35 135 . 7.036851
-35 150 . 7.036851
-35 167 331.5780066816225 7.036851
-35 214 . 7.036851
-34 30 . 7.04516
-34 68 6237.257473927168 7.04516
-34 2 . 7.04516
-34 17 255.2815312488197 7.04516
-34 3 . 7.04516
-34 135 . 7.04516
-34 176 . 7.04516
-34 129 . 7.04516
-34 73 . 7.04516
-34 201 . 7.04516
-34 143 1276.285209772381 7.04516
-34 110 1276.445370545105 7.04516
-34 190 387.3569051296755 7.04516
-34 173 432.7347275014108 7.04516
-34 167 310.6909401923209 7.04516
-34 41 1454.059022341608 7.04516
end
format %ty relyear
label values id id
label def id 2 "AFG", modify
label def id 3 "AGO", modify
label def id 14 "BDI", modify
label def id 17 "BFA", modify
label def id 30 "BTN", modify
label def id 40 "COD", modify
label def id 41 "COG", modify
label def id 68 "GAB", modify
label def id 73 "GIN", modify
label def id 75 "GNB", modify
label def id 108 "LAO", modify
label def id 110 "LBR", modify
label def id 129 "MLI", modify
label def id 135 "MOZ", modify
label def id 143 "NGA", modify
label def id 147 "NPL", modify
label def id 150 "OMN", modify
label def id 167 "RWA", modify
label def id 173 "SLE", modify
label def id 176 "SOM", modify
label def id 189 "TCD", modify
label def id 190 "TGO", modify
label def id 201 "UGA", modify
label def id 214 "YEM", modify