Hi

I am working on statistical inference with instrumental variables (IV) following Wooldridge (2016) Introductory Econometrics, Ch. 15. I am using the Card data set (like the book), with wages as outcome (y), education as a endogenous continuous treatment (x) and distance to college as a binary IV (z).

I want to calculate the standard errors manually, and preferably additionally in matrix form using Mata. So far, I am able to calculate coefficients but I can't seem to obtain the correct standard errors and would be happy for input on this.

I obtain the point estimate for βiv with the Wald-estimator:

βiv = E[y|z=1] - E[y|z=1] / E[x|z=1] - E[x|z=1]

Code:
. use http://pped.org/card.dta, clear // Load Card data set

. keep nearc4 educ lwage id 

. rename nearc4 z

. rename educ x

. rename lwage y

. bysort z: sum y x

-----------------------------------------------------------------------------------------------------------------
-> z = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |        957    6.155494    .4328417    4.60517   7.474772
           x |        957    12.69801    2.791523          1         18

-----------------------------------------------------------------------------------------------------------------
-> z = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |      2,053    6.311401    .4402214    4.60517   7.784889
           x |      2,053    13.52703    2.580455          2         18


. di (6.311401-6.155494)/(13.52703-12.69801)
.18806181
The point estimates can be cross checked by -ivregress-
Code:
. ivregress 2sls y (x=z), nohe
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1880626   .0262826     7.16   0.000     .1365496    .2395756
       _cons |   3.767472   .3487458    10.80   0.000     3.083943    4.451001
------------------------------------------------------------------------------
I now want to proceed by calculating the standard errors. Wooldridge (2016, p. 466) writes that standard errors for βiv is obtained by using the square root of the estimated asymptotic variance, where the latter is obtained by

Var(βiv) = sigma^2/(SSTx*R^2x,z)

First, the total sum of squares for x (SSTx), is obtained by
Code:
. egen x_bar = mean(x)

. gen SSTx = (x-x_bar)^2

. quiet sum SSTx

. di r(sum)
21562.08
Second, R^2x,z is the squared population correlation between x and z, obtained from the regression output
Code:
. reg x z, nohe
------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    .829019   .1036988     7.99   0.000     .6256912    1.032347
       _cons |   12.69801   .0856416   148.27   0.000     12.53009    12.86594
------------------------------------------------------------------------------

. di .829^2
.687241
Finally, sigma^2 is the error variance. Wooldridge says to use the IV residuals û in calculating the error variance

sigma^2 = 1/(n-2)*sum(û^2)

Code:
. quiet reg x z

. predict x_hat
(option xb assumed; fitted values)

. quiet reg y x_hat, nohe

. predict iv_resid
(option xb assumed; fitted values)

. quiet sum iv_resid

. di r(sum)
18848.115

. di (18848.114)^2
3.553e+08

. gen sigma_squared = 3.553e+08

. tabstat sigma_squared, format(%20.2f)

    variable |      mean
-------------+----------
sigma_squa~d |        355300000.00
------------------------

. di (1/(3010-2))*355300000
118118.35
Thus, when finally substituting my values into the formula for Var(βiv) and standard error I get
Code:
. di 118118.35/(21562.08*.687241)
7.971089

. * sigma 
. di sqrt(7.971089)
2.8233117

. * se(βiv)
. di 2.8233117/sqrt(21562.08)
.01922709
My main question is: Does someone see where I have miscalculated and can anyone provide the correct calculation?

I have also worked out IV in Mata, although I am stuck at variance-estimation here as well:
Code:
. mata
: 
: y=st_data(.,"y")

: 
: X=st_data(.,"x")

: 
: Z=st_data(.,"z")

: 
: X = X, J(rows(X),1,1) // Add constant

: 
: Z = Z, J(rows(Z),1,1) // Add constant

: 
: b_iv = luinv(Z'*X)*Z'*y

: 
: b_iv
                 1
    +---------------+
  1 |  .1880626042  |
  2 |  3.767472015  |
    +---------------+

: end
(I have posted at stats.stackexchange as well)