But while I was tinkering around to trace that problem, I discovered something else a bit peculiar: When I compute a point-biserial correlation via -esize-, the sign of the correlation is reversed compared to what I get via -pwcorr- or -corrci-. Also, the 95% CIs from -esize- and -corrci- are not the same. Both of these things puzzle me. Perhaps someone can explain what is going on!
I'll paste below some of my output and the code that generated it.
Cheers,
Bruce
Code:
. sysuse auto, clear
(1978 automobile data)

. * r_pb = Pearson r for one dichotomous and one metric variable.
. * Therefore, it can be obtained via any command that computes Pearson r.
. * First, show that -pbis- gives the wrong result.
. pwcorr foreign mpg

             |  foreign      mpg
-------------+------------------
     foreign |   1.0000
         mpg |   0.3934   1.0000

. pbis foreign mpg
(obs= 74)
Np= 22  p= 0.30
Nq= 52  q= 0.70
------------------+------------------+------------------+------------------+
Coef.= 0.3907   t= 3.6018   P>|t| = 0.0006   df= 72

. * I believe that -pbis- uses the sample SD in an equation that
. * requires the descriptive (or population) SD. If so, I should
. * be able to duplicate that incorrect result.
. * Use stored results from the -ttest- command
. quietly ttest mpg, by(foreign)
. local M1 = r(mu_2)
. local n1 = r(N_2)
. local M0 = r(mu_1)
. local n0 = r(N_1)
. local n = r(N_1) + r(N_2)
. local sigma = r(sd)*sqrt((`n'-1)/`n') // descriptive SD
. local s = r(sd) // sample SD
. local rpb1 = ((`M1'-`M0')/`sigma')*sqrt(`n1'*`n0'/`n'^2)
. local rpb2 = ((`M1'-`M0')/`s')*sqrt(`n1'*`n0'/`n'^2)
. display _newline ///
> "Using descriptive SD, r_pb = " `rpb1' _newline ///
> "Using the sample SD, r_pb = " `rpb2'

Using descriptive SD, r_pb = .39339742
Using the sample SD, r_pb = .39073028

. * The first result matches what I get with -pwcorr-;
. * the second result matches what I get with -pbis-.
.
. * Now try -esize-
. esize twosample mpg, by(foreign) pbcorr

Effect size based on mean comparison

                               Obs per group:
                                    Domestic =         52
                                     Foreign =         22
---------------------------------------------------------
        Effect size |   Estimate     [95% conf. interval]
--------------------+------------------------------------
   Point-biserial r |  -.3933974     -.555367   -.1821459
---------------------------------------------------------

. * -esize- gives the same magnitude, but with the opposite sign.
. * If I reverse-code, presumably, I'll get the same sign too.
. generate byte domestic = !foreign
. esize twosample mpg, by(domestic) pbcorr

Effect size based on mean comparison

                               Obs per group:
                                 domestic==0 =         22
                                 domestic==1 =         52
---------------------------------------------------------
        Effect size |   Estimate     [95% conf. interval]
--------------------+------------------------------------
   Point-biserial r |   .3933974     .1821459     .555367
---------------------------------------------------------

. * Finally, use -corrci- to get the 95% CI
. corrci mpg foreign

(obs=74)

                  correlation and 95% limits
mpg     foreign      0.393    0.181    0.571

. matrix list r(corr) // show more decimals

symmetric r(corr)[2,2]
               mpg    foreign
    mpg          1
foreign  .39339742          1

. * The correlation matches what I get from -corrci-, but the CI does not.
. * I do not understand why the CIs are different.
Code:
sysuse auto, clear
* r_pb = Pearson r for one dichotomous and one metric variable.
* Therefore, it can be obtained via any command that computes Pearson r.
* First, show that -pbis- gives the wrong result.
pwcorr foreign mpg
pbis foreign mpg
* I believe that -pbis- uses the sample SD in an equation that
* requires the descriptive (or population) SD. If so, I should
* be able to duplicate that incorrect result.
* Use stored results from the -ttest- command
quietly ttest mpg, by(foreign)
local M1 = r(mu_2)
local n1 = r(N_2)
local M0 = r(mu_1)
local n0 = r(N_1)
local n = r(N_1) + r(N_2)
local sigma = r(sd)*sqrt((`n'-1)/`n') // descriptive SD
local s = r(sd) // sample SD
local rpb1 = ((`M1'-`M0')/`sigma')*sqrt(`n1'*`n0'/`n'^2)
local rpb2 = ((`M1'-`M0')/`s')*sqrt(`n1'*`n0'/`n'^2)
display _newline ///
    "Using descriptive SD, r_pb = " `rpb1' _newline ///
    "Using the sample SD, r_pb = " `rpb2'
* The first result matches what I get with -pwcorr-;
* the second result matches what I get with -pbis-.

* Now try -esize-
esize twosample mpg, by(foreign) pbcorr
* -esize- gives the same magnitude, but with the opposite sign.
* If I reverse-code, presumably, I'll get the same sign too.
generate byte domestic = !foreign
esize twosample mpg, by(domestic) pbcorr

* Finally, use -corrci- to get the 95% CI
corrci mpg foreign
matrix list r(corr) // show more decimals
* The correlation matches what I get from -corrci-, but the CI does not.
* I do not understand why the CIs are different.
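One further check that might help narrow down the CI discrepancy, offered as a sketch and not as a confirmed explanation: a hand-computed Fisher z interval for the same correlation. The assumption here is that -corrci- builds its default limits from Fisher's z transformation with standard error 1/sqrt(n-3). If so, the limits below should land close to the 0.181 and 0.571 that -corrci- reports, which would suggest that -esize- gets its limits some other way. I have not verified how -esize- does this. The local macro names are mine, for illustration only.

```stata
sysuse auto, clear

* Pearson r and n from -correlate-
quietly correlate mpg foreign
local r = r(rho)
local n = r(N)

* Fisher z interval (ASSUMPTION: this is the method behind the -corrci- default)
local z     = atanh(`r')           // Fisher z transform of r
local se    = 1/sqrt(`n' - 3)      // approximate SE of z
local zcrit = invnormal(0.975)     // two-sided 95% critical value
local lo    = tanh(`z' - `zcrit'*`se')
local hi    = tanh(`z' + `zcrit'*`se')

display "r = " %6.4f `r' "   Fisher z 95% CI: " %6.4f `lo' " to " %6.4f `hi'
```

If these limits agree with -corrci- but not with -esize-, then the two commands are simply using different interval methods for the same point estimate.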