Hello Stata experts,

at the moment I'm working on a project that requires the use of 2SLS method with fixed-effects included. I'm struggling to make sense of the differences in the estimation results produced by Stata commands: ivregress, reghdfe, and ivreghdfe, and then to make a decision on which one should be used.
More specifically, I regress a production function with Y (measures firm performance) as the explained variable and regressors include production factors (x1 - x4) and (endogenous) city factors cityX5 and cityX6 (says, log of city population and immigration rate, they are my variables of interest). I employ a cross-sectional firm-level data set of a country.

To deal with the endogeneity of city factors, I use city-level variables z1 z2 to instrument for X5, and z3 and z4 to instrument for X6. I add the dummies of sate_industry to control region_industry fixed effects (each state has many cities).

Results produced by ivregress (with some unimportant results are dropped to keep the space):
HTML Code:
. ivregress 2sls Y x1 x2 x3 x4 (cityX5 cityX6= z1 z2 z3 z4) i.state_industry_pairs if sample==1, vce(cluster state_id)
note: 43.state_industry_pairs identifies no observations in the sample
note: 44.state_industry_pairs identifies no observations in the sample
note: 48.state_industry_pairs identifies no observations in the sample
note: 60.state_industry_pairs identifies no observations in the sample
.....
Instrumental variables (2SLS) regression          Number of obs   =    164,343
                                                  Wald chi2(1031) =  181743.21
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.3889
                                                  Root MSE        =     1.1872

                                      (Std. Err. adjusted for 63 clusters in state_id)
--------------------------------------------------------------------------------------
                     |               Robust
                   Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
              cityX5 |   .0329823   .0152646     2.16   0.031     .0030642    .0629004
              cityX6 |   .1052269   .2336223     0.45   0.652    -.3526644    .5631183
                  x1 |   .2926192   .0102266    28.61   0.000     .2725754     .312663
                  x2 |   .0316666   .0350465     0.90   0.366    -.0370233    .1003565
                  x3 |   .0694911   .0182037     3.82   0.000     .0338126    .1051696
                  x4 |   .6947005   .0578016    12.02   0.000     .5814115    .8079895
                     |
state_industry_pairs |
                  2  |   1.686815   .0173291    97.34   0.000      1.65285    1.720779
                  3  |   1.727269   .0193602    89.22   0.000     1.689324    1.765214
                  4  |    .918509   .0260839    35.21   0.000     .8673854    .9696326
                  5  |   1.665119   .0484384    34.38   0.000     1.570181    1.760056
                  6  |   2.387751   .0417277    57.22   0.000     2.305967    2.469536
                  7  |   1.261545   .0535408    23.56   0.000     1.156607    1.366484
                  8  |   .7721015   .0317133    24.35   0.000     .7099446    .8342585
                  9  |   1.387479    .062664    22.14   0.000      1.26466    1.510298
....
                     |
               _cons |    2.09025   .2286741     9.14   0.000     1.642057    2.538443
--------------------------------------------------------------------------------------
Instrumented:  cityX5 cityX6
Instruments:   x1 x2 x3 x4 2.state_industry_pairs 3.state_industry_pairs
               4.state_industry_pairs 5.state_industry_pairs 6.state_industry_pairs
               7.state_industry_pairs 8.state_industry_pairs 9.state_industry_pairs
               10.state_industry_pairs 11.state_industry_pairs
               12.state_industry_pairs 13.state_industry_pairs
               14.state_industry_pairs 15.state_industry_pairs
               16.state_industry_pairs 17.state_industry_pairs
               18.state_industry_pairs 19.state_industry_pairs
               20.state_industry_pairs 21.state_industry_pairs
...
Results produced by ivreghdfe:
HTML Code:
. ivreghdfe Y x1 x2 x3 x4 (cityX5 cityX6= z1 z2 z3 z4) if sample==1, absorb(state_industry_pairs) vce(cluster state_id)
(dropped 55 singleton observations)
(MWFE estimator converged in 1 iterations)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =   164288
                                                      F(  6,164281) =  4595.61
                                                      Prob > F      =   0.0000
Total (centered) SS     =  270763.1577                Centered R2   =   0.1445
Total (uncentered) SS   =  270763.1577                Uncentered R2 =   0.1445
Residual SS             =  231627.2165                Root MSE      =    1.187

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cityX5 |   .0329823   .0067151     4.91   0.000     .0198209    .0461438
      cityX6 |   .1052269    .072774     1.45   0.148    -.0374086    .2478625
          x1 |   .2926192   .0020022   146.15   0.000     .2886949    .2965435
          x2 |   .0316666   .0015753    20.10   0.000     .0285791    .0347541
          x3 |   .0694911   .0070562     9.85   0.000     .0556611    .0833211
          x4 |   .6947005    .013997    49.63   0.000     .6672667    .7221343
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic):         4.3e+04
                                                   Chi-sq(3) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):              1.4e+04
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    11.04
                                         10% maximal IV relative bias     7.56
                                         20% maximal IV relative bias     5.57
                                         30% maximal IV relative bias     4.73
                                         10% maximal IV size             16.87
                                         15% maximal IV size              9.93
                                         20% maximal IV size              7.54
                                         25% maximal IV size              6.28
Source: Stock-Yogo (2005).  Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):           0.064
                                                   Chi-sq(2) P-val =    0.9687
------------------------------------------------------------------------------
Instrumented:         cityX5 cityX6
Included instruments: x1 x2 x3 x4
Excluded instruments: z1 z2 z3 z4
Partialled-out:       _cons
                      nb: total SS, model F and R2s are after partialling-out;
                          any small-sample adjustments include partialled-out
                          variables in regressor count K
------------------------------------------------------------------------------

Absorbed degrees of freedom:
--------------------------------------------------------------+
          Absorbed FE | Categories  - Redundant  = Num. Coefs |
----------------------+---------------------------------------|
 state_industry_pairs |       971         971           0    *|
--------------------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
Results produced by my manual calculation with reghdfe, I run first-stage and second-stage regressions then correct standard errors of estimated coefficients in the second-stage regression with bootstrap:
HTML Code:
program tsls_test
  1. quietly: reghdfe cityX5 z1 z2 z3 z4 x1 x2 x3 x4  if sample==1, absorb(state_industry_pairs) vce(cluster state_id)
  2. predict cityX5n, xb
  3. quietly: reghdfe cityX6 z1 z2 z3 z4 x1 x2 x3 x4  if sample==1, absorb(state_industry_pairs) vce(cluster state_id)
  4. predict cityX6n, xb
  5. quietly: reghdfe y x1 x2 x3 x4 cityX5n cityX6n if sample==1, absorb(state_industry_pairs) vce(cluster state_id)
  6. drop cityX5n cityX6n
  7. end

bootstrap, cluster(state_id) reps(250): tsls_test
(running tsls_test on estimation sample)

Bootstrap replications (250)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250

HDFE Linear regression                          Number of obs     =    164,288
Absorbing 1 HDFE group                          Replications      =        250
                                                Wald chi2(6)      =    4202.52
                                                Prob > chi2       =     0.0000
                                                R-squared         =     0.3821
                                                Adj R-squared     =     0.3784
                                                Root MSE          =     1.1916

                               (Replications based on 63 clusters in state_id)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .2926192   .0104536    27.99   0.000     .2721305    .3131079
          x2 |   .0316666    .036746     0.86   0.389    -.0403543    .1036874
          x3 |   .0694911   .0176274     3.94   0.000      .034942    .1040402
          x4 |   .6947005   .0529963    13.11   0.000     .5908297    .7985713
     cityX5n |   .0329823    .015012     2.20   0.028     .0035593    .0624054
     cityX6n |    .105227   .3638874     0.29   0.772    -.6079792    .8184331
       _cons |   4.080161   .2575816    15.84   0.000     3.575311    4.585012
------------------------------------------------------------------------------

Absorbed degrees of freedom:
--------------------------------------------------------------+
          Absorbed FE | Categories  - Redundant  = Num. Coefs |
----------------------+---------------------------------------|
 state_industry_pairs |       971         971           0    *|
--------------------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
As you can see, coefficients of cityX5n and cityX6n are similar across three commands, but SEs are different.
The SEs produced by ivregress are quite close but a bit higher than those with manual calculation (esp cityX6n, SE is .3638874 with manual calculation, but only 0.2336223 through reghdfe, while SEs produced by ivreghdfe seem to be very low, which seems to be suspicious, it seems that they have problem with cluster option.

My questions are:
1. Do you know why many manual calculation does not give strictly similar results to what they are with ivregress?
2. Do you know what is the problem with ivreghdfe in this case? If you share my suspicion of the option cluster, do you suggest any solution?

The point is that, I would like to replace state_industry_pairs with dummies of state and industries at the finer level (more digit in the industry classification) to improve the accuracy of estimation, so there will be more pairs to estimate. In that case, ivregress could not handle the regression due to too many dummies (more than 3,000). The manual calculation can handle any types of dummies but it tends to produce higher SEs which can influence my final verdict on the role of cityX5 and cityX6, in addition, the bootstrap command is running quite slowly.... Finally, ivreghdfe is quickest, produces automatically important identification test, handle dummies very well, but gives values of SEs that are too good (in terms of expectation from theory) to be true. Hence, I really need to understand what is going on the differences in SEs and how to deal with it, because they should be all the same.

I look forward to receiving answers or advice from you to make sense out of this problem. Thanks in advance.

Best regards,
Cuong