But while I was tinkering around to trace that problem, I discovered something else a bit peculiar: When I compute a point-biserial correlation via -esize-, the sign of the correlation is reversed compared to what I get via -pwcorr- or -corrci-. Also, the 95% CIs from -esize- and -corrci- are not the same. Both of these things puzzle me. Perhaps someone can explain what is going on!
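Here is one hedged reading of the sign reversal. It is inferred only from the output below, not from the -esize- documentation. -pwcorr- correlates mpg with the 0/1 variable, so its sign follows M_1 − M_0: the mean mpg for the group coded 1 minus the mean for the group coded 0. That is the same difference used for rpb1 in the code. The -esize- output is consistent with the opposite ordering, first group minus second group, which is also how -ttest- defines its diff. Here σ is the descriptive SD, and n_1 and n_0 are the group sizes, as defined in the code.

$$
\begin{aligned}
r_{pb}^{\,\text{pwcorr}} &= \frac{M_1 - M_0}{\sigma}\sqrt{\frac{n_1 n_0}{n^2}} \approx +0.3934 \\
r_{pb}^{\,\text{esize, by(foreign)}} &= \frac{M_0 - M_1}{\sigma}\sqrt{\frac{n_1 n_0}{n^2}} = -\,r_{pb}^{\,\text{pwcorr}} \approx -0.3934
\end{aligned}
$$

Under this reading, by(domestic) puts the foreign cars (domestic==0) first. M_0 − M_1 then becomes foreign minus domestic, and the sign turns positive, which matches the reverse-coded run.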
I'll paste below some of my output and the code that generated it.
Cheers,
Bruce
Code:
. sysuse auto, clear
(1978 automobile data)
. * r_pb = Pearson r for one dichotomous and one metric variable.
. * Therefore, it can be obtained via any command that computes Pearson r.
. * First, show that -pbis- gives the wrong result.
. pwcorr foreign mpg
             |  foreign      mpg
-------------+------------------
     foreign |   1.0000
         mpg |   0.3934   1.0000
. pbis foreign mpg
(obs= 74)
Np= 22  p= 0.30
Nq= 52  q= 0.70
------------------+------------------+------------------+------------------+
Coef.= 0.3907          t= 3.6018        P>|t| = 0.0006        df=     72
. * I believe that -pbis- uses the sample SD in an equation that
. * requires the descriptive (or population) SD.  If so, I should
. * be able to duplicate that incorrect result.
. * Use stored results from the -ttest- command
. quietly ttest mpg, by(foreign)
. local M1 = r(mu_2)
. local n1 = r(N_2)
. local M0 = r(mu_1)
. local n0 = r(N_1)
. local n = r(N_1) + r(N_2)
. local sigma = r(sd)*sqrt((`n'-1)/`n') // descriptive SD
. local s = r(sd)                       // sample SD
. local rpb1 = ((`M1'-`M0')/`sigma')*sqrt(`n1'*`n0'/`n'^2)
. local rpb2 = ((`M1'-`M0')/`s')*sqrt(`n1'*`n0'/`n'^2)
. display _newline ///
> "Using descriptive SD, r_pb = " `rpb1' _newline ///
> "Using the sample SD,  r_pb = " `rpb2'
Using descriptive SD, r_pb = .39339742
Using the sample SD,  r_pb = .39073028
. * The first result matches what I get with -pwcorr-;
. * the second result matches what I get with -pbis-.
.
. * Now try -esize-
. esize twosample mpg, by(foreign) pbcorr
Effect size based on mean comparison
                               Obs per group:
                                    Domestic =         52
                                     Foreign =         22
---------------------------------------------------------
        Effect size |   Estimate     [95% conf. interval]
--------------------+------------------------------------
   Point-biserial r |  -.3933974     -.555367   -.1821459
---------------------------------------------------------
. * -esize- gives the same magnitude, but with the opposite sign.
. * If I reverse-code, presumably, I'll get the same sign too.
. generate byte domestic = !foreign
. esize twosample mpg, by(domestic) pbcorr
Effect size based on mean comparison
                               Obs per group:
                                 domestic==0 =         22
                                 domestic==1 =         52
---------------------------------------------------------
        Effect size |   Estimate     [95% conf. interval]
--------------------+------------------------------------
   Point-biserial r |   .3933974     .1821459     .555367
---------------------------------------------------------
. * Finally, use -corrci- to get the 95% CI
. corrci mpg foreign  
(obs=74)
                  correlation and 95% limits
mpg     foreign      0.393    0.181    0.571
. matrix list r(corr) // show more decimals
symmetric r(corr)[2,2]
               mpg    foreign
    mpg          1
foreign  .39339742          1
. * The correlation matches what I get from -esize-, but the CI does not.
. * I do not understand why the CIs are different.
Code:
sysuse auto, clear
* r_pb = Pearson r for one dichotomous and one metric variable.
* Therefore, it can be obtained via any command that computes Pearson r.
* First, show that -pbis- gives the wrong result.
pwcorr foreign mpg
pbis foreign mpg
* I believe that -pbis- uses the sample SD in an equation that
* requires the descriptive (or population) SD.  If so, I should
* be able to duplicate that incorrect result.
* Use stored results from the -ttest- command
quietly ttest mpg, by(foreign)
local M1 = r(mu_2)
local n1 = r(N_2)
local M0 = r(mu_1)
local n0 = r(N_1)
local n = r(N_1) + r(N_2)
local sigma = r(sd)*sqrt((`n'-1)/`n') // descriptive SD
local s = r(sd)                       // sample SD
local rpb1 = ((`M1'-`M0')/`sigma')*sqrt(`n1'*`n0'/`n'^2)
local rpb2 = ((`M1'-`M0')/`s')*sqrt(`n1'*`n0'/`n'^2)
display _newline ///
    "Using descriptive SD, r_pb = " `rpb1' _newline ///
    "Using the sample SD,  r_pb = " `rpb2'
* The first result matches what I get with -pwcorr-;
* the second result matches what I get with -pbis-.

* Now try -esize-
esize twosample mpg, by(foreign) pbcorr
* -esize- gives the same magnitude, but with the opposite sign.
* If I reverse-code, presumably, I'll get the same sign too.
generate byte domestic = !foreign
esize twosample mpg, by(domestic) pbcorr
* Finally, use -corrci- to get the 95% CI
corrci mpg foreign
matrix list r(corr) // show more decimals
* The correlation matches what I get from -esize-, but the CI does not.
* I do not understand why the CIs are different.
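Here are two hand checks on the numbers in the log, offered as a sketch rather than a definitive account. First, the code defines σ = s√((n−1)/n). The two r_pb formulas therefore differ only by that factor, so the -pbis- value should be the Pearson value multiplied by √(73/74) when n = 74:

$$
\begin{aligned}
r_{pb,2} &= \frac{M_1 - M_0}{s}\sqrt{\frac{n_1 n_0}{n^2}}
          = r_{pb,1}\,\frac{\sigma}{s}
          = r_{pb,1}\sqrt{\frac{n-1}{n}} \\
         &\approx 0.39339742 \times \sqrt{73/74} \approx 0.39339742 \times 0.99322 \approx 0.39073
\end{aligned}
$$

Second, assume -corrci- uses the usual Fisher z interval. That is an assumption about its method, but it reproduces the printed limits to three decimals from r = .39339742 and n = 74:

$$
\begin{aligned}
z &= \operatorname{artanh}(r) = \tfrac{1}{2}\ln\frac{1+r}{1-r} \approx 0.4158,
\qquad \mathrm{SE}(z) = \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{71}} \approx 0.1187 \\
z \pm 1.96\,\mathrm{SE}(z) &\approx (0.1832,\ 0.6484)
\;\Rightarrow\; (\tanh 0.1832,\ \tanh 0.6484) \approx (0.181,\ 0.571)
\end{aligned}
$$

The -esize- limits (.1821459, .555367) do not agree with this Fisher z interval. -esize- therefore presumably computes its CI some other way, and this check does not identify which method it uses.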