Methods for computing the point-biserial correlation

After reading this question on ResearchGate the other day, I installed -pbis- (net describe sg20, from(http://www.stata.com/stb/stb17)) and tried it. I was surprised to find that it gave a different result than I got using -pwcorr- and -corrci- (net sj 21-3 pr0041_4). I became a bit obsessed with figuring out why that was happening. To make a long story short, I believe that the -pbis- package uses the sample standard deviation (with n-1 in the denominator) in an equation that requires the descriptive (or population) SD (with n in the denominator).
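To spell out the suspected discrepancy, here is the textbook formula for the point-biserial correlation, written with the same symbols as the local macros in the code further down (M1, M0, n1, n0, n; sigma for the descriptive SD of all n observations, s for the sample SD). This is only a sketch of the algebra, not a reading of the -pbis- source code.

```latex
% Point-biserial correlation with the descriptive (population) SD
r_{pb} = \frac{M_1 - M_0}{\sigma}\,\sqrt{\frac{n_1 n_0}{n^2}},
\qquad
\sigma = s\,\sqrt{\frac{n-1}{n}}

% Substituting the sample SD s for \sigma shrinks the result by a fixed factor
r^{*} = \frac{M_1 - M_0}{s}\,\sqrt{\frac{n_1 n_0}{n^2}}
      = r_{pb}\,\sqrt{\frac{n-1}{n}}

% With n = 74 and r_{pb} = 0.39339742:
r^{*} = 0.39339742 \times \sqrt{73/74} \approx 0.3907
```

If that is what is happening, the discrepancy should shrink as n grows, because sqrt((n-1)/n) approaches 1.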

But while I was tinkering around to trace that problem, I discovered something else a bit peculiar: When I compute a point-biserial correlation via -esize-, the sign of the correlation is reversed compared to what I get via -pwcorr- or -corrci-. Also, the 95% CIs from -esize- and -corrci- are not the same. Both of these things puzzle me. Perhaps someone can explain what is going on!

I'll paste below some of my output and the code that generated it.

Cheers,
Bruce

Code:

. sysuse auto, clear
(1978 automobile data)

. * r_pb = Pearson r for one dichotomous and one metric variable.
. * Therefore, it can be obtained via any command that computes Pearson r.
. * First, show that -pbis- gives the wrong result.
. pwcorr foreign mpg

             |  foreign      mpg
-------------+------------------
     foreign |   1.0000
         mpg |   0.3934   1.0000

. pbis foreign mpg

(obs= 74)
Np= 22  p= 0.30
Nq= 52  q= 0.70
------------------+------------------+------------------+------------------+
Coef.= 0.3907          t= 3.6018        P>|t| = 0.0006        df=     72

. * I believe that -pbis- uses the sample SD in an equation that
. * requires the descriptive (or population) SD.  If so, I should
. * be able to duplicate that incorrect result.
. * Use stored results from the -ttest- command
. quietly ttest mpg, by(foreign)

. local M1 = r(mu_2)

. local n1 = r(N_2)

. local M0 = r(mu_1)

. local n0 = r(N_1)

. local n = r(N_1) + r(N_2)

. local sigma = r(sd)*sqrt((`n'-1)/`n') // descriptive SD

. local s = r(sd)                       // sample SD

. local rpb1 = ((`M1'-`M0')/`sigma')*sqrt(`n1'*`n0'/`n'^2)

. local rpb2 = ((`M1'-`M0')/`s')*sqrt(`n1'*`n0'/`n'^2)

. display _newline ///
> "Using descriptive SD, r_pb = " `rpb1' _newline ///
> "Using the sample SD,  r_pb = " `rpb2'

Using descriptive SD, r_pb = .39339742
Using the sample SD,  r_pb = .39073028

. * The first result matches what I get with -pwcorr-;
. * the second result matches what I get with -pbis-.
.
. * Now try -esize-
. esize twosample mpg, by(foreign) pbcorr

Effect size based on mean comparison

                               Obs per group:
                                    Domestic =         52
                                     Foreign =         22
---------------------------------------------------------
        Effect size |   Estimate     [95% conf. interval]
--------------------+------------------------------------
   Point-biserial r |  -.3933974     -.555367   -.1821459
---------------------------------------------------------

. * -esize- gives the same magnitude, but with the opposite sign.
. * If I reverse-code, presumably, I'll get the same sign too.
. generate byte domestic = !foreign

. esize twosample mpg, by(domestic) pbcorr

Effect size based on mean comparison

                               Obs per group:
                                 domestic==0 =         22
                                 domestic==1 =         52
---------------------------------------------------------
        Effect size |   Estimate     [95% conf. interval]
--------------------+------------------------------------
   Point-biserial r |   .3933974     .1821459     .555367
---------------------------------------------------------

. * Finally, use -corrci- to get the 95% CI
. corrci mpg foreign  

(obs=74)

                  correlation and 95% limits
mpg     foreign      0.393    0.181    0.571

. matrix list r(corr) // show more decimals

symmetric r(corr)[2,2]
               mpg    foreign
    mpg          1
foreign  .39339742          1

. * The correlation matches what I get from -esize- (after reverse-coding), but the CI does not.
. * I do not understand why the CIs are different.

Code:

sysuse auto, clear
* r_pb = Pearson r for one dichotomous and one metric variable.
* Therefore, it can be obtained via any command that computes Pearson r.
* First, show that -pbis- gives the wrong result.
pwcorr foreign mpg
pbis foreign mpg
* I believe that -pbis- uses the sample SD in an equation that
* requires the descriptive (or population) SD.  If so, I should
* be able to duplicate that incorrect result.
* Use stored results from the -ttest- command
quietly ttest mpg, by(foreign)
local M1 = r(mu_2)
local n1 = r(N_2)
local M0 = r(mu_1)
local n0 = r(N_1)
local n = r(N_1) + r(N_2)
local sigma = r(sd)*sqrt((`n'-1)/`n') // descriptive SD
local s = r(sd)                       // sample SD
local rpb1 = ((`M1'-`M0')/`sigma')*sqrt(`n1'*`n0'/`n'^2)
local rpb2 = ((`M1'-`M0')/`s')*sqrt(`n1'*`n0'/`n'^2)
display _newline ///
"Using descriptive SD, r_pb = " `rpb1' _newline ///
"Using the sample SD,  r_pb = " `rpb2'
* The first result matches what I get with -pwcorr-;
* the second result matches what I get with -pbis-.

* Now try -esize-
esize twosample mpg, by(foreign) pbcorr
* -esize- gives the same magnitude, but with the opposite sign.
* If I reverse-code, presumably, I'll get the same sign too.
generate byte domestic = !foreign
esize twosample mpg, by(domestic) pbcorr
* Finally, use -corrci- to get the 95% CI
corrci mpg foreign  
matrix list r(corr) // show more decimals
* The correlation matches what I get from -esize- (after reverse-coding), but the CI does not.
* I do not understand why the CIs are different.
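One check that may help narrow down the CI question: the -corrci- limits shown above (0.181 and 0.571) look consistent with the usual Fisher z interval, that is, atanh(r) plus or minus a normal critical value times 1/sqrt(n-3), back-transformed with tanh(). The sketch below computes that interval by hand from the stored results of -correlate-. It assumes nothing about the internals of either -corrci- or -esize-; it only shows which set of limits a Fisher z interval reproduces.

```stata
sysuse auto, clear
* Pearson r and n from official Stata's -correlate-
quietly correlate mpg foreign
local r = r(rho)
local n = r(N)
* Fisher z transform, its approximate SE, and the normal critical value
local z    = atanh(`r')
local se   = 1/sqrt(`n' - 3)
local crit = invnormal(0.975)
* Back-transform the limits to the correlation scale
local lb = tanh(`z' - `crit'*`se')
local ub = tanh(`z' + `crit'*`se')
display "r = " %9.8f `r' "  Fisher z 95% CI: " %6.4f `lb' " to " %6.4f `ub'
```

If these limits agree with -corrci- but not with -esize-, then -esize- must be building its interval some other way. I have not confirmed how; one possibility is an interval based on the noncentral t distribution for the standardized mean difference, converted to the correlation scale.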
