Hello all,
I am estimating a first-differences model of the following form: ΔY_it = β0 + β1 ΔX_it + ε_it.
In one of the regressions, the dependent variable is wages. As I am more interested in the percentage change than in the level change, I want to use the logarithm of the dependent variable.
Do I have to use log(wage_it+1 - wage_it) or log(wage_it+1) - log(wage_it) as my dependent variable?
Thank you!
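A minimal sketch, with hypothetical variable names id, year, wage, and x: the log difference ln(wage_t) - ln(wage_t-1) is the change in log wages and approximates the percentage change, which is usually what a "percentage change" specification uses.
Code:
* hypothetical names; assumes the panel is declared with xtset
xtset id year
gen lwage = ln(wage)
regress D.lwage D.x, vce(robust)   // D.lwage = ln(wage_t) - ln(wage_t-1)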
Friday, December 31, 2021
joint significance of the dummy variables
Hi, how do I test for the joint significance of the dummy variables using Stata commands?
Denis
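A minimal sketch, with hypothetical names y, x, and a categorical variable group whose dummies enter via factor notation: testparm runs a Wald test that all the dummies are jointly zero.
Code:
regress y x i.group
testparm i.group     // joint significance of the group dummies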
Bootstrap post-estimation Craggit model
Hi there, I need to use Cragg's double-hurdle model and the craggit command. Crucially, I am using Burke's (2009) treatment of the command (APEboot) to estimate my actual parameters of interest and to bootstrap standard errors.
But I don't know how to access the coefficients and standard errors from my bootstrapping to create a table for export! My code is below. I would love to know how I might get the estimates for tabulation, but I am also open to other ways of generating the standard errors.
program define APEboot, rclass
preserve
*generating the parameter estimates used to calculate the APE from the overall model for the E(y|y>0) and Pr(y>0)
craggit dum_attend treat, second(prop_attend treat) vce(robust)
predict bsx1g, eq(Tier1)
predict bsx2b, eq(Tier2)
predict bssigma, eq(sigma)
generate bsIMR = normalden(bsx2b/bssigma)/normal(bsx2b/bssigma)
*The estimates for each model below
gen bsdPw1_dtreat = [Tier1]_b[treat]*normalden(bsx1g)
gen bsdEyyx2_dtreat = [Tier2]_b[treat]*(1-bsIMR*(bsx2b/bssigma+bsIMR))
gen bsdEy_dtreat = ///
[Tier1]_b[treat]*normalden(bsx1g)*(bsx2b+bssigma*bsIMR) ///
+[Tier2]_b[treat]*normal(bsx1g)*(1-bsIMR*(bsx2b/bssigma+bsIMR))
*creating the ape matrices for bootstrapping
su bsdPw1_dtreat
return scalar ape_Pw1_dtreat = r(mean)
matrix ape_Pw1_dtreat = r(ape_Pw1_dtreat)
su bsdEyyx2_dtreat
return scalar ape_Eyyx2_dtreat = r(mean)
matrix ape_Eyyx2_dtreat = r(ape_Eyyx2_dtreat)
su bsdEy_dtreat
return scalar ape_dEy_dtreat = r(mean)
matrix ape_dEy_dtreat = r(ape_dEy_dtreat)
restore
end
*generating the ape estimates using bootstrapping
bootstrap ape_Pw1_dtreat = r(ape_Pw1_dtreat), reps(100): APEboot
bootstrap ape_Eyyx2_dtreat = r(ape_Eyyx2_dtreat), reps(100): APEboot
bootstrap ape_dEy_dtreat = r(ape_dEy_dtreat), reps(100): APEboot
program drop APEboot
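A hedged sketch of one way to get the estimates and bootstrap standard errors into a table: return all three APEs from a single bootstrap call (the matrix ... = r(...) lines inside the program are not needed for this), then read e(b) and e(se), or export with esttab (from the user-written estout package on SSC).
Code:
bootstrap ape_Pw1 = r(ape_Pw1_dtreat) ape_Eyyx2 = r(ape_Eyyx2_dtreat) ape_dEy = r(ape_dEy_dtreat), ///
    reps(100) seed(12345): APEboot
estat bootstrap, all        // normal, percentile, and bias-corrected CIs
matrix b  = e(b)            // bootstrap point estimates
matrix se = e(se)           // bootstrap standard errors
estimates store apes
esttab apes using apes.rtf, se replace    // simple exported table (needs -estout-)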
Difference between DiD and two-way fixed effects model
I am currently writing my master's thesis, in which I analyze the effect of hurricanes on the stock market. I have an unbalanced panel dataset of stock returns over several days before and after each hurricane, over a time frame of several years. I created a dummy variable "hurricane" taking the value 1 if the stock is affected by a hurricane that day, and 0 otherwise. I included time fixed effects as well as firm fixed effects and clustered the standard errors at the firm level. I have run the following regression as a baseline model:
xtreg RET i.hurricane i.date, fe vce(cluster PERMNO)
Some stocks are never affected, whereas others are affected once or even multiple times. The dummy is intermittent (switches "on and off")
Question:
My understanding is that this variable can be seen as the interaction term post*treated in the diff-in-diff model; is that correct? Is the TWFE model equivalent to the generalized DiD?
Any help would be highly appreciated!! Have a great NYE!
Interpretation of constant in xtreg
Hello everyone!
I am currently writing my master's thesis, in which I analyze the effect of hurricanes on the stock market. I have an unbalanced panel dataset of stock returns over several days before and after each hurricane, over a time frame of several years. I created a dummy variable "hurricane" taking the value 1 if the stock is affected by a hurricane that day, and 0 otherwise. I included time fixed effects as well as firm fixed effects and clustered the standard errors at the firm level. I have run the following regression as a baseline model:
xtreg RET i.hurricane i.date, fe vce(cluster PERMNO)
The regression output shows the coefficient for the hurricane dummy variable, many coefficients for all the days (which I know I don't have to look at further) and the constant.
Question:
Does the constant reflect the intercept (i.e., the expected return when the hurricane dummy is 0)? If so, do I retrieve the expected return when a stock is affected by a hurricane (dummy = 1) by adding the coefficient on the hurricane dummy to the constant?
I am confused because I have read that I cannot interpret the constant in the fixed effects model (https://www.stata.com/support/faqs/s...effects-model/).
I would very much appreciate it if someone could help!!
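A minimal sketch, using the baseline model from the post, of getting the two conditional predictions directly instead of adding coefficients by hand:
Code:
xtreg RET i.hurricane i.date, fe vce(cluster PERMNO)
margins hurricane     // average linear prediction with the dummy at 0 and at 1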
DID estimation
Hello everyone, how can I deal with the problem when I see the error "years not nested in countryid" or "countryid not a control"?
Double Clustering in a Multi-country Data Set up
Dear Stata Members
First, a heartfelt advance New Year Wishes to All. I wish all a prosperous New Year
I am dealing with a cross-country dataset in which the lowest units are firms. Firms aggregate into industries, and the broadest level is the country. I have 22 countries, 18 industries, 17,252 firms, and 22 years.
For panel data I usually cluster at a single level, the firm level. However, some articles cluster at both the firm and year levels in cross-country setups.
What does double clustering (firm and year) mean?
Clustering, as far as I know in the panel context, accounts for correlation within units. For instance, if the residual of the outcome variable is likely to be correlated within, say, industry, one should cluster the standard errors by industry. But in the context of double clustering by firm and year, does it make sense to cluster the SEs within these unique firm-year pairs?
Similarly, I have seen in a post that clustering with fewer than 30 clusters is not advisable (https://www.statalist.org/forums/for...72#post1603472). Does this apply to double clustering, where my number of years is < 30?
Code:
. xtset id year Panel variable: id (unbalanced) Time variable: year, 1999 to 2020, but with gaps Delta: 1 unit . reghdfe dividends risk roa_w size_w lev_w sg_w cash_ta1_w tangib_w age mb_w, absorb(id year) cluster (id ) (dropped 1846 singleton observations) (MWFE estimator converged in 8 iterations) HDFE Linear regression Number of obs = 92,159 Absorbing 2 HDFE groups F( 9, 10505) = 192.48 Statistics robust to heteroskedasticity Prob > F = 0.0000 R-squared = 0.3821 Adj R-squared = 0.3024 Within R-sq. = 0.0373 Number of clusters (id) = 10,506 Root MSE = 0.1669 (Std. err. adjusted for 10,506 clusters in id) ------------------------------------------------------------------------------ | Robust dividends | Coefficient std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- risk | .0089675 .003735 2.40 0.016 .0016462 .0162888 roa_w | -.7158403 .0192201 -37.24 0.000 -.7535152 -.6781653 size_w | .0051734 .0023954 2.16 0.031 .000478 .0098688 lev_w | -.0614293 .0088244 -6.96 0.000 -.0787268 -.0441318 sg_w | -.0029462 .0003515 -8.38 0.000 -.0036352 -.0022572 cash_ta1_w | -.0693444 .010555 -6.57 0.000 -.0900342 -.0486545 tangib_w | -.0245404 .0092626 -2.65 0.008 -.0426969 -.006384 age | .0165564 .0036146 4.58 0.000 .0094712 .0236417 mb_w | -.0006307 .0001642 -3.84 0.000 -.0009526 -.0003089 _cons | .2522908 .0239573 10.53 0.000 .2053299 .2992517 ------------------------------------------------------------------------------ Absorbed degrees of freedom: -----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. Coefs | -------------+---------------------------------------| id | 10506 10506 0 *| year | 21 0 21 | -----------------------------------------------------+ * = FE nested within cluster; treated as redundant for DoF computation . reghdfe dividends risk roa_w size_w lev_w sg_w cash_ta1_w tangib_w age mb_w, absorb(id year) cluster (id year ) (dropped 1846 singleton observations) (MWFE estimator converged in 8 iterations) Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied. HDFE Linear regression Number of obs = 92,159 Absorbing 2 HDFE groups F( 9, 20) = 94.49 Statistics robust to heteroskedasticity Prob > F = 0.0000 R-squared = 0.3821 Adj R-squared = 0.3024 Number of clusters (id) = 10,506 Within R-sq. = 0.0373 Number of clusters (year) = 21 Root MSE = 0.1669 (Std. err. adjusted for 21 clusters in id year) ------------------------------------------------------------------------------ | Robust dividends | Coefficient std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- risk | .0089675 .0122294 0.73 0.472 -.0165426 .0344776 roa_w | -.7158403 .046722 -15.32 0.000 -.8133007 -.6183799 size_w | .0051734 .004472 1.16 0.261 -.004155 .0145018 lev_w | -.0614293 .01249 -4.92 0.000 -.087483 -.0353757 sg_w | -.0029462 .0006372 -4.62 0.000 -.0042754 -.001617 cash_ta1_w | -.0693444 .0108852 -6.37 0.000 -.0920505 -.0466382 tangib_w | -.0245404 .0096214 -2.55 0.019 -.0446104 -.0044705 age | .0165564 .0060575 2.73 0.013 .0039207 .0291922 mb_w | -.0006307 .0002081 -3.03 0.007 -.0010648 -.0001967 _cons | .2522908 .0807704 3.12 0.005 .0838068 .4207749 ------------------------------------------------------------------------------ Absorbed degrees of freedom: -----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. 
Coefs | -------------+---------------------------------------| id | 10506 10506 0 *| year | 21 21 0 *| -----------------------------------------------------+ * = FE nested within cluster; treated as redundant for DoF computation
In the double-clustered run, the standard errors are adjusted for 21 clusters (id year). But the significance levels have also changed. What could be the reason for this drop in significance when moving from single clustering to double clustering?
Any thoughts or suggestions would be helpful, as this is for my general learning.
Xtabond2 command system GMM.
Hello everyone, I am studying the effect of uncertainty on saving, and in one part of my robustness checks I want to use xtabond2. I have read David Roodman's material on xtabond2, but I am confused. I have panel data from 1996 to 2017, and:
saving(i,t) = b0 + b1*saving(i,t-1) + b2*uncertainty(i,t-1) + b3*X(i,t-1) + v_t + v_i + e(i,t)
where X(i,t-1) is a vector of controls, which in the baseline model includes only human capital and per capita income.
With xtabond2 I want to address the possible endogeneity between economic uncertainty and saving by instrumenting with suitable lagged variables. To obtain valid System GMM estimates, I need evidence of first-order autocorrelation in the residuals, while second-order autocorrelation must be rejected. Then we run the Sargan test to check for over-identification problems.
The more I read, the more confused I become about how to do this.
Thank you very much in advance.
Regards,
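A hedged sketch of a System GMM specification along these lines, assuming hypothetical variable names country, year, saving, uncertainty, humancap, and pcincome, and that xtabond2 (SSC) is installed; the lag ranges and instrument choices are illustrative only, not a recommendation.
Code:
xtset country year
tabulate year, generate(yr)
xtabond2 saving L.saving L.uncertainty L.humancap L.pcincome yr*, ///
    gmm(saving uncertainty, lag(2 4) collapse)                    ///
    iv(humancap pcincome yr*)                                     ///
    twostep robust small
* xtabond2 reports the Arellano-Bond AR(1)/AR(2) tests (you want rejection at order 1,
* no rejection at order 2) and the Hansen/Sargan overidentification tests automatically.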
Thursday, December 30, 2021
ivreg2 warning covariance matrix of moment conditions not of full rank
Dear Stata Users,
I was trying to estimate an IV regression (2SLS).
I have a few dummies in my model; the dependent variable is a binary indicator of a benefit claim, and I am instrumenting the benefit amount. I use weights to correct for sampling issues.
I got the warning below at the end of my second-stage results table.
I don't know what to do about this warning. The results show the coefficient on the instrumented variable ben_amt, -.007962, to be significant, though only weakly so (p-value 0.095). The two instruments are strongly significant in the first-stage results (both p-values 0.000).
Code:
ivreg2 nclaim age age2 married ib3.edu i.eu_nat i.pgemplst_gen i.howner i.singlep i.dis_dummy i.female i.east ib2.citysize i.haskids (ben_amt= prvtransfers needs) [pw=hweight] if head==1, first

-----------------------------------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):            146.799
                                                   Chi-sq(2) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):             1545.588
                         (Kleibergen-Paap rk Wald F statistic):        165.936
Stock-Yogo weak ID test critical values: 10% maximal IV size            19.93
                                         15% maximal IV size            11.59
                                         20% maximal IV size             8.75
                                         25% maximal IV size             7.25
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Warning: estimated covariance matrix of moment conditions not of full rank.
         overidentification statistic not reported, and standard errors and
         model tests should be interpreted with caution.
Possible causes:
         singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
         partial option may address problem.
------------------------------------------------------------------------------
Instrumented:         ben_amt
Included instruments: age age2 married 0.edu1 1.edu2 2.edu3 4.edu4 1.howner
                      1.singlep 1.dis_dummy 1.female 1.east 1.citysize
                      3.citysize 1.haskids
Excluded instruments: prvtransfers needs
------------------------------------------------------------------------------
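A hedged sketch of how one might follow the warning's own hints: check whether any dummy is a (near-)singleton in the estimation sample, and either drop it or partial it out with ivreg2's partial() option. Which dummy (if any) is the culprit depends on your data; dis_dummy is used below purely for illustration.
Code:
foreach v of varlist married howner singlep dis_dummy female east haskids {
    quietly count if `v' == 1 & head == 1
    display "`v': " r(N) " ones in the estimation sample"
}
* if, say, dis_dummy were the near-singleton (an assumption for illustration):
ivreg2 nclaim age age2 married ib3.edu i.eu_nat i.pgemplst_gen i.howner i.singlep ///
    dis_dummy i.female i.east ib2.citysize i.haskids                              ///
    (ben_amt = prvtransfers needs) [pw=hweight] if head==1, first partial(dis_dummy)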
help on stacking matrices
Hi, Stata gurus,
this seems to be easy, once you know the right Mata commands.
I have N matrices, same number of columns, but different number of rows. They are named matrix1 .. matrixN.
I would like to stack them all, dropping the original ones, using a loop.
Any clue? Thanks in advance.
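A minimal sketch for Stata matrices named matrix1 ... matrixN (N set to 4 here only for illustration): the \ operator stacks matrices vertically as long as the column counts match, and a second loop drops the originals. The same \ operator also works inside Mata if the matrices live there instead.
Code:
local N 4                       // set to your actual number of matrices
matrix stacked = matrix1
forvalues i = 2/`N' {
    matrix stacked = stacked \ matrix`i'
}
forvalues i = 1/`N' {
    matrix drop matrix`i'
}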
2SLS with polynomial terms of birth cohort
Hello,
I am trying to estimate a fuzzy RD. Fuzzy RD is equivalent to 2SLS (Hahn et al. 2001). So I am trying to run 2SLS. (I am not allowed to use rd or rdrobust commands.)
Compulsory schooling in Turkey increased from 5 to 8 years in 1997. Birth cohorts born after 1986 (not inclusive) are affected by the reform. I want to control for birth-cohort polynomials up to fourth order, and I also include age dummies. However, when I run the command below, birth_year2, birth_year3, and birth_year4 are omitted because of multicollinearity. Could you help me with my code?
xi: ivregress 2sls lwage (yrs_school = reform) birth_year birth_year2 birth_year3 birth_year4 i.age , first
Thank you very much
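A hedged sketch of one thing worth trying: center the running variable at the 1986 cutoff before building the powers, since uncentered year polynomials (values in the millions and beyond) are often dropped for numerical near-collinearity. Note, though, that if age and birth cohort are perfectly collinear in your sample (for example, a single survey year), Stata will still omit terms and one of the two sets has to go.
Code:
gen bc  = birth_year - 1986    // centered running variable
gen bc2 = bc^2
gen bc3 = bc^3
gen bc4 = bc^4
ivregress 2sls lwage bc bc2 bc3 bc4 i.age (yrs_school = reform), first vce(robust)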
Choice of regression model
My dependent variable is interval-scaled but bounded between 5 and 80. All values are known within the accuracy of the experiment, and it is not possible for values to occur outside these bounds. I had intended to analyse the data within a regression framework, but I am uncertain which regression model to use.
Would tobit or truncreg be appropriate, or is there another model; or is the fact that the data do not follow the OLS assumption of an unbounded dependent variable not that important?
Eddy
Problem with bysort command as it is always showing not sorted!
Dear Stata Members
I would like to calculate the industry sales growth (isg) and following is my sample data
Why is Stata repeatedly saying not sorted?
Code:
clear input str9 firm int year byte ind_dum float netsales "4DS.AX" 2015 31 7.70e-06 "PRS.AX" 2015 21 . "HRN.AX" 2015 21 . "VIC.AX" 2015 21 .000139 "PVW.AX" 2015 21 0 "ALX.AX" 2015 23 1.88932 "FBR.AX" 2015 31 .174572 "IPT.AX" 2015 21 .927405 "ODM.AX" 2015 21 .018043 "ABV.AX" 2015 31 5.31421 "SRK.AX" 2015 21 .193432 "KLO.AX" 2015 23 13.2431 "GLA.AX" 2015 21 .009091 "EOS.AX" 2015 31 22.1893 "BLY.AX" 2015 21 735.158 "AGR.AX" 2015 21 .030962 "DRE.AX" 2015 21 .015323 "LLC.AX" 2015 23 10231.6 "NZS.AX" 2015 31 .477879 "TPD.AX" 2015 21 .298108 "HGO.AX" 2015 21 101.487 "AWV.AX" 2015 21 .016317 "ALT.AX" 2015 31 .862368 "IMC.AX" 2015 31 .865704 "SYA.AX" 2015 21 .016271 "TNR.AX" 2015 21 0 "FLC.AX" 2015 31 .00118 "NWC.AX" 2015 21 .034922 "ADR.AX" 2015 31 2.21078 "AAP.AX" 2015 11 2.6729 "HPP.AX" 2015 31 54.7183 "CIO.AX" 2015 31 0 "AGD.AX" 2015 21 62.4951 "BMN.AX" 2015 21 .05778 "TGN.AX" 2015 21 .120552 "PXX.AX" 2015 21 .002889 "RIO.AX" 2015 21 34829 "AJQ.AX" 2015 21 .075345 "PDI.AX" 2015 21 .007142 "PSA.AX" 2015 21 1.613 "GGG.AX" 2015 21 .141135 "CCE.AX" 2015 22 1.25232 "PDZ.AX" 2015 21 .026895 "CRR.AX" 2015 21 .14729 "PPG.AX" 2015 31 187.885 "AQX.AX" 2015 21 .000917 "EUR.AX" 2015 21 .003143 "SVL.AX" 2015 21 .002712 "CCZ.AX" 2015 21 .004715 "FNT.AX" 2015 21 .015054 "LRL.AX" 2015 21 .001579 "TMS.AX" 2015 21 . "SIH.AX" 2015 21 .001418 "APG.AX" 2015 21 . "CNJ.AX" 2015 21 .000855 "EMU.AX" 2015 21 .166837 "XAM.AX" 2015 21 .026918 "FIN.AX" 2015 21 . "CLA.AX" 2015 21 .002042 "GUD.AX" 2015 31 382.961 "DYL.AX" 2015 21 .097517 "DEV.AX" 2015 21 . "AAU.AX" 2015 21 46.6495 "JAL.AX" 2015 21 .012966 "AAR.AX" 2015 21 0 "GLV.AX" 2015 21 .164472 "AZS.AX" 2015 21 .275887 "LCL.AX" 2015 21 .003237 "SFX.AX" 2015 21 .164811 "LEG.AX" 2015 21 .418429 "LCD.AX" 2015 21 .008343 "ADD.AX" 2015 21 0 "BSR.AX" 2015 21 .015278 "BKW.AX" 2015 31 528.38 "NVA.AX" 2015 21 0 "IDA.AX" 2015 21 2.86434 "PPY.AX" 2015 31 .068049 "EQR.AX" 2015 21 .040662 "HE8.AX" 2015 21 . "KRR.AX" 2015 21 .003197 "ARD.AX" 2015 21 . "SES.AX" 2015 31 13.8294 "SAN.AX" 2015 21 .047 "BNR.AX" 2015 21 .021363 "MSV.AX" 2015 21 19.3945 "CEL.AX" 2015 21 .044144 "MPR.AX" 2015 22 41.8527 "DEM.AX" 2015 22 .528381 "AS1.AX" 2015 11 1.25869 "LTR.AX" 2015 21 .383828 "VML.AX" 2015 21 .035893 "WNR.AX" 2015 31 2.20463 "MLM.AX" 2015 21 .552067 "MGT.AX" 2015 21 .056362 "DDD.AX" 2015 21 .00785 "XRF.AX" 2015 31 15.9248 "A3D.AX" 2015 31 . 
"GLN.AX" 2015 21 .021841 "OKJ.AX" 2015 31 .011024 "FWD.AX" 2015 31 210.239 "RIC.AX" 2015 31 699.422 "ZEU.AX" 2015 21 .137385 "RFG.AX" 2015 31 190.477 "MNB.AX" 2015 21 .00235 "MXC.AX" 2015 31 .004538 "GMRDD.AX" 2015 21 .019976 "ERA.AX" 2015 21 253.359 "NXM.AX" 2015 21 0 "WC8.AX" 2015 21 .001402 "CZR.AX" 2015 21 .007365 "ESS.AX" 2015 21 .201089 "AYM.AX" 2015 21 .037542 "MYL.AX" 2015 21 .166144 "KOV.AX" 2015 31 48.5543 "ZGL.AX" 2015 31 92.6665 "AUH.AX" 2015 21 .013012 "VMS.AX" 2015 21 .134612 "BUY.AX" 2015 21 1.49285 "GAP.AX" 2015 31 114.014 "CRB.AX" 2015 21 .065337 "BRI.AX" 2015 31 123.384 "GW1.AX" 2015 21 .098519 "BUX.AX" 2015 21 .011232 "ECS.AX" 2015 31 4.54599 "WSI.AX" 2015 23 .000116 "88E.AX" 2015 21 .010549 "AJY.AX" 2015 21 1.93232 "E25.AX" 2015 21 .334176 "ERL.AX" 2015 21 .003112 "SI6.AX" 2015 21 .055345 "CSL.AX" 2015 31 5628 "ATL.AX" 2015 31 72.7194 "BKL.AX" 2015 31 363.331 "MBK.AX" 2015 21 .004946 "PBX.AX" 2015 21 .038497 "MVL.AX" 2015 21 0 "ANW.AX" 2015 21 .001009 "EGR.AX" 2015 31 .038173 "NWF.AX" 2015 21 .05033 "PEN.AX" 2015 21 .153 "WPL.AX" 2015 21 5030 "DRX.AX" 2015 21 .111577 "SGC.AX" 2015 21 .270732 "GLL.AX" 2015 21 .615548 "IBG.AX" 2015 21 .196452 "EMP.AX" 2015 21 .287859 "SKS.AX" 2015 23 2.71821 "PEK.AX" 2015 21 .029606 "AWN.AX" 2015 22 3.6738 "M7T.AX" 2015 31 .128718 "AUZ.AX" 2015 21 . "TIE.AX" 2015 21 . "EVN.AX" 2015 21 513.053 "COI.AX" 2015 21 .593977 "TON.AX" 2015 21 .000546 "AOP.AX" 2015 21 .016479 "FHS.AX" 2015 21 .000755 "PNV.AX" 2015 31 .10841 "AEV.AX" 2015 21 .550018 "BXB.AX" 2015 31 5440.5 "CGB.AX" 2015 31 .131854 "RSG.AX" 2015 21 381.81 "ASQ.AX" 2015 21 1.22065 "FFI.AX" 2015 31 23.669 "EL8.AX" 2015 21 .289801 "MWY.AX" 2015 31 109.614 "AQI.AX" 2015 21 .01681 "PRX.AX" 2015 21 .302281 "RR1.AX" 2015 21 .005855 "EGY.AX" 2015 31 8.32588 "RNX.AX" 2015 21 .012149 "FML.AX" 2015 21 1.82021 "RPG.AX" 2015 23 0 "CHK.AX" 2015 31 .010146 "MCE.AX" 2015 31 110.995 "AFR.AX" 2015 21 .0705 "HLX.AX" 2015 21 .055592 "GSN.AX" 2015 21 .030639 "VRX.AX" 2015 21 .047102 "GED.AX" 2015 21 .131484 "GOR.AX" 2015 21 .770753 "BMG.AX" 2015 21 .003821 "BLG.AX" 2015 31 2.72176 "GIB.AX" 2015 21 .07533 "ADN.AX" 2015 21 .036463 "ECT.AX" 2015 21 .009676 "ZIM.AX" 2015 21 408.391 "CCJ.AX" 2015 21 .011348 "MAT.AX" 2015 21 1.51972 "WMC.AX" 2015 21 .009707 "BAS.AX" 2015 21 .057037 "TNP.AX" 2015 21 .004053 "TZN.AX" 2015 21 0 "KAR.AX" 2015 21 1.6671 "TTT.AX" 2015 31 0 "CAE.AX" 2015 21 .005971 "AIS.AX" 2015 21 167.395 "STO.AX" 2015 21 2442 "CZN.AX" 2015 21 1.08749 "E79.AX" 2015 21 .074582 "CGN.AX" 2015 21 .04392 "MGX.AX" 2015 21 252.577 "FMG.AX" 2015 21 8574 "AVA.AX" 2015 31 14.371 "RAN.AX" 2015 31 .34746 "CYC.AX" 2015 31 9.1874 "ENR.AX" 2015 21 1.23003 "TOE.AX" 2015 21 .382333 "KPO.AX" 2015 22 .775005 "SYR.AX" 2015 21 .291 "GCR.AX" 2015 21 .030816 "AKN.AX" 2015 21 .031552 "MBH.AX" 2015 31 .007951 "RDN.AX" 2015 21 43.7964 "FAU.AX" 2015 21 .004569 "CHN.AX" 2015 21 .468602 "MML.AX" 2015 21 123.172 "NCR.AX" 2015 21 .396755 "BLU.AX" 2015 21 .161784 "EYE.AX" 2015 31 48.2909 "GLB.AX" 2015 31 106.131 "OBM.AX" 2015 21 .001872 "ARR.AX" 2015 21 .003944 "SKY.AX" 2015 21 .244236 "ICG.AX" 2015 21 .026116 "SGQ.AX" 2015 21 .016047 "INA.AX" 2015 23 58.5664 "STX.AX" 2015 21 2.07699 "AGS.AX" 2015 21 .170589 "TNG.AX" 2015 21 . "OKU.AX" 2015 21 .505905 "CMM.AX" 2015 21 .214625 "PO3.AX" 2015 22 0 "AME.AX" 2015 21 .026317 "NMT.AX" 2015 21 .323205 "BDC.AX" 2015 21 .086285 "MAN.AX" 2015 21 2.41912 "PGM.AX" 2015 21 .013397 "KGM.AX" 2015 21 .007165 "DAF.AX" 2015 21 .001071 "ING.AX" 2015 11 1740.1 "RXM.AX" 2015 21 . 
"KFE.AX" 2015 21 .01047 "PTX.AX" 2015 31 .030793 "ENT.AX" 2015 21 .012742 "GES.AX" 2015 21 .00077 "VAR.AX" 2015 21 .206112 "CYL.AX" 2015 21 .015516 "CMP.AX" 2015 31 25.8045 "SIX.AX" 2015 31 .973876 end
Code:
encode firm, gen(id)
xtset id year
Panel variable: id (strongly balanced)
 Time variable: year, 2015 to 2015
         Delta: 1 unit

by ind_dum year, sort: gen isg = d.netsales / l.netsales
not sorted
r(5);

* I also tried
bysort ind_dum (year): gen isg = d.netsales / l.netsales
not sorted
r(5);
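A hedged sketch of an alternative route: aggregate sales to the industry-year level first, compute the growth rate there, and merge it back. The d. and l. operators follow the xtset id/year declaration, which appears to be why combining them with by ind_dum year trips the "not sorted" error.
Code:
preserve
collapse (sum) ind_sales = netsales, by(ind_dum year)
xtset ind_dum year
gen isg = D.ind_sales / L.ind_sales   // needs more than one year of data, unlike the excerpt
tempfile industry
save `industry'
restore
merge m:1 ind_dum year using `industry', keepusing(isg) nogenerate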
propensity score matching
I want to do 1:2 propensity score matching. I am unable to find a source for this; any help?
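A minimal sketch with hypothetical names (outcome y, treatment dummy treat, covariates x1-x3): official teffects psmatch does 1:k matching on the estimated propensity score via nneighbor().
Code:
teffects psmatch (y) (treat x1 x2 x3), nneighbor(2) atet   // 1:2 nearest-neighbour matching, ATET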
Wednesday, December 29, 2021
set data out of range as missing
Dear statalist,
I have a variable, index, whose value should be between 0 and 1, but there are some values outside this range, so I want to set those values to missing (.).
I tried
Code:
replace index==. if index<0|index>1
but that doesn't work. replace index=. doesn't work either.
It seems I can replace index with any real value, but not with missing (.).
Any help will be greatly appreciated!
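A minimal sketch of the fix: replace uses a single = for assignment, while == is the equality test that belongs inside the if condition.
Code:
replace index = . if index < 0 | index > 1
* equivalently
replace index = . if !inrange(index, 0, 1) & !missing(index)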
Stata forecast does not support mixed effects models
As part of a reviewer's requests/recommendations, I am trying to compare the forecast accuracy of (almost) the same model estimated with fixed effects, random effects, and mixed effects. My data are panel data. I see that forecast supports xtreg but does not support xtmixed. Has anyone succeeded in using Stata to forecast from xtmixed output?
If anyone from Stata is reading, can you briefly explain, under the hood, why forecast does not support xtmixed models? Is this a feature that could be added in a future Stata version? If forecast cannot support xtmixed models for now, what is a workaround for forecasting from xtmixed output in Stata?
Thanks for taking your time reading and/or answering in advance!
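A minimal sketch of a manual workaround, assuming a hypothetical model of y on x1 and x2 with random intercepts by id: fit the model, then build the predictions yourself with predict rather than forecast.
Code:
mixed y x1 x2 || id:
predict yhat_fixed, xb        // fixed-portion prediction (usable out of sample if the x's exist)
predict yhat_fitted, fitted   // fixed portion plus predicted random effects (in sample)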
Endogeneity and RESET Test with PPML
Hello,
I'm trying to estimate trade potential for the BRICS group of countries - as in, intra-BRICS trade potential - using a gravity model and the PPML estimator. I'm generating predicted values, and then calculating trade potential by (predict-actual). My dependent variable is exports (in $USD MN), while my independents are ln(gdp) of exporter and importer, standard gravity covariates and membership in RTAs. I'm using exporter-time and importer-year fixed effects. I run this for aggregate merchandise trade and then replicate it for several products of interest. I'm using COMTRADE and CEPII gravity database to source my data.
I have three questions here:
1. How can I test for endogeneity in the context of the PPML estimator?
2. How does one test for misspecification, for example with a RESET test? As far as I know, RESET tests whether the linear functional form is suitable for the regression. Does it also work when we no longer use OLS (as in my case here)?
3. Should I run my regression on all the relevant products of interest in one combined regression? Will that provide me with some additional insight?
This is my code (after importing dataset):
Code:
egen exporter_year = group(exporter year)
tabulate exporter_year, generate(EXPORTER_YEAR_FE)
egen importer_year = group(importer year)
tabulate importer_year, generate(IMPORTER_YEAR_FE)
drop if exp < 0
ppml exp lngdp_ex lngdp_im lndistw contig comlang_off comcol col45 rta EXPORTER_YEAR_FE* IMPORTER_YEAR_FE*
predict fitted
gen potential = fitted - exp
(252 missing values generated)
Many thanks for your time and attention. Let me know if I should provide some extra information.
Regards,
Saunok
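A hedged sketch of a RESET-type misspecification check that works outside OLS: official poisson with robust standard errors gives the same point estimates as PPML, its linear index is easy to obtain with predict, and the check is simply whether the squared index adds explanatory power (insignificance is reassuring about the functional form).
Code:
poisson exp lngdp_ex lngdp_im lndistw contig comlang_off comcol col45 rta ///
    EXPORTER_YEAR_FE* IMPORTER_YEAR_FE*, vce(robust)
predict xb_hat, xb
gen xb_hat2 = xb_hat^2
poisson exp lngdp_ex lngdp_im lndistw contig comlang_off comcol col45 rta ///
    xb_hat2 EXPORTER_YEAR_FE* IMPORTER_YEAR_FE*, vce(robust)
test xb_hat2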
Charlson index for dataset that contains both ICD9 and ICD10 for different years
Hello,
I am working on a hospital administrative dataset combining multiple years between 2004 and 2018. Cases in the earlier years use ICD-9, while 2016 onward use ICD-10. I am trying to calculate the Charlson comorbidity index for each case in the combined dataset. The available charlson command accommodates either ICD-9 or ICD-10; is there a way I can combine both? The code I used is pasted below.
charlson DX1-DX30 I10_DX1-I10_DX40, index(c) assign0 wtchrl cmorb     *for ICD 9*
or
charlson DX1-DX30 I10_DX1-I10_DX40, index(10) assign0 wtchrl cmorb    *for ICD 10*
I look forward to your help with cheerful optimism.
Olowu
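A hedged sketch of one workaround, assuming a year variable identifies the coding era and that charlson creates the same output variable names in both runs: score each era separately with its own ICD version and variable list, then recombine.
Code:
gen byte icd10_era = (year >= 2016)
preserve
keep if icd10_era == 0
charlson DX1-DX30, index(c) assign0 wtchrl cmorb           // ICD-9 years
tempfile icd9part
save `icd9part'
restore
keep if icd10_era == 1
charlson I10_DX1-I10_DX40, index(10) assign0 wtchrl cmorb  // ICD-10 years
append using `icd9part'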
attempt at simple xtline plot gives syntax error
Hi everybody,
I am using Stata 15 and want to show the oil price development over the time period I am looking at, since it is part of my independent variable.
The data is correctly xtset, and I want (for starters) a simple graph. After all I have read this should work, but it doesn't: Stata gives me an 'invalid syntax' error.
Here is a dataex excerpt of the data:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int id_2 long yrmo int target_sidea double oil_price_t0 float OPS_t . 200312 . 27.85 . . 200912 . 75.48 . . 200304 . 23.43 . . 200907 . 64.96 . . 200305 . 24.25 . . 200905 . 57.39 . . 200401 . 28.68 . . 200906 . 69.21 . . 200307 . 26.62 . . 200303 . 27.42 . . 200311 . 27.5 . . 200309 . 25.26 . . 200302 . 30.19 . . 200908 . 71.32 . . 200911 . 77.62 . . 200308 . 27.61 . . 200310 . 27.13 . . 200306 . 25.49 . . 200910 . 73.27 . . 200903 . 45.57 . . 200904 . 50.18 . . 200301 . 28.05 . . 200909 . 67.9 . 1 200806 27 127.58 .07012443 1 200709 57 73.25 .08620504 1 200405 35 34.47 .10416858 1 200803 28 96.77 .07297191 1 200511 76 51.31 -.04478218 1 200605 221 64.9 .013183548 1 200606 128 65.08 .002769635 1 200802 37 89.96 .031504914 1 200510 105 53.66 -.05228052 1 200609 184 59.77 -.14040913 1 200901 18 44.96 .09244503 1 200807 21 131.22 .02813167 1 200406 52 33.4 -.031533517 1 200702 185 55.68 .06818504 1 200503 118 45.57 .1073863 1 200608 149 68.78 -.003773055 1 200407 54 34.48 .0318235 1 200610 245 56.5 -.05626323 1 200509 96 56.54 -.0014139086 1 200404 43 31.06 .0195065 1 200801 42 87.17 .01642415 1 200811 34 51.38 -.28933507 1 200410 87 37.57 .05751856 1 200701 216 52.01 -.12032206 1 200703 139 59.05 .05876352 1 200412 107 34.26 -.017648337 1 200808 36 113.21 -.14763081 1 200501 162 37.81 .09859517 1 200812 39 40.99 -.22592087 1 200602 145 57.57 -.01277203 1 200707 114 69.45 .0547474 1 200409 12 35.47 -.07650152 1 200505 179 45 -.04561048 1 200508 112 56.62 .06890459 1 200611 136 56.81 .005471728 1 200704 186 63.83 .07783877 1 200804 30 103.46 .06684807 1 200607 172 69.04 .05906874 1 200403 32 30.46 .0703774 1 200604 180 64.05 .10544727 1 200603 138 57.64 .0012151777 1 200506 122 50.97 .12457474 1 200512 29 53.12 .034667812 1 200902 18 43.13 -.04155437 1 200402 14 28.39 -.010163056 1 200507 117 52.85 .036220465 1 200411 231 34.87 -.074579 1 200504 123 47.1 .033023402 1 200706 108 65.75 .01857447 1 200809 30 95.96 -.16531305 1 200810 31 68.62 -.3353474 1 200711 52 86.73 .1171779 1 200601 163 58.31 .09322012 1 200805 32 118.94 .13943411 1 200712 57 85.75 -.011363798 1 200612 124 58.66 .032045674 1 200502 101 40.93 .07928964 1 200408 78 38.29 .10480934 1 200705 123 64.54 .011061858 1 200710 45 77.14 .0517437 1 200708 77 67.2 -.03293378 2 200601 293 58.31 .09322012 2 200604 208 64.05 .10544727 2 200509 185 56.54 -.0014139086 2 200512 75 53.12 .034667812 2 200703 339 59.05 .05876352 2 200610 378 56.5 -.05626323 2 200902 16 43.13 -.04155437 2 200808 29 113.21 -.14763081 2 200510 295 53.66 -.05228052 2 200708 199 67.2 -.03293378 2 200607 278 69.04 .05906874 2 200612 350 58.66 .032045674 2 200704 360 63.83 .07783877 2 200806 48 127.58 .07012443 2 200609 302 59.77 -.14040913 2 200506 149 50.97 .12457474 end format %tm yrmo
It is xtset with:
Code:
xtset id_2 yrmo, monthly
The graph command that fails:
Code:
xtline oil_price_t0, overlay
I'd appreciate any hint.
Best
Marvin
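A hedged guess at the cause, judging from the dataex excerpt: yrmo seems to hold YYYYMM integers (e.g. 200312) rather than elapsed-month %tm dates, and some rows have a missing id_2, either of which can break xtset/xtline. A minimal sketch under those assumptions:
Code:
gen mdate = ym(floor(yrmo/100), mod(yrmo, 100))   // convert YYYYMM to a real monthly date
format mdate %tm
drop if missing(id_2)
xtset id_2 mdate
xtline oil_price_t0, overlay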
Data cleaning - checking correct encoding of variables
Hello,
I'm very new to Stata and am trying to complete some data cleaning. I have a dataset with 5 variables and around 200 million observations. The variables are all numeric, and I would like to check that three of them have been encoded correctly, as they were originally categorical (string) variables. For example, I would like to know if the numerical code captures distinct countries for the country variable (there may be typos in the original categories, for instance).
The original string variables are not available. Stata shows the country names when I browse (via the value labels) but treats the variable as numeric in the data editor. Is there any way to check what the equivalences between the two are?
Thank you in advance for any help you might be able to give me!
Best wishes,
Clara
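A minimal sketch, assuming the encoded variable is called country (a hypothetical name) and carries a value label, of how to inspect the code-to-name mapping without the original strings:
Code:
describe country                            // shows which value label is attached
label list                                  // full code-to-name mapping for every label
labelbook, problems                         // flags suspicious labels (duplicates, inconsistencies)
decode country, gen(country_str)            // recover the names as a string variable
list country country_str in 1/20, nolabel   // spot-check codes against names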
Hausman test: p=value of 1
Hi, I am trying to carry out a Hausman test on panel data (for males and females) to decide whether I should use a fixed-effects or random-effects model. However, when I do this I get a p-value of 1 (which I assume means something has gone wrong). I have tried "sigmamore" and "xtoverid", but neither seems to have any impact. If anyone has any ideas, they would be much appreciated!
Small observation in multinomial logistic regression
Hello
I am trying to find out which region is more attractive for people, so I use region (a categorical variable with five levels) as the dependent variable.
People's characteristics are education (high, low, medium), country of origin (EU, non-EU, US, Africa, Asia), and type (family, student, worker, refugee). All are categorical.
I also aim to interact education with type and origin. But as there are only two low- and medium-educated students and five low-educated EU respondents in one specific region, the relative risk ratios for these groups in the interaction are extremely high, with empty confidence intervals. So I removed them, and after rerunning my MNL the results are much better.
But since, e.g., low-educated students are removed, for students I have two levels of education while for the other groups I have three, so the low-educated student category is zero while high-educated students are also omitted! Basically, how should I exclude some observations in an MNL?
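A minimal sketch, with hypothetical variable and level names, of excluding the sparse cells from estimation with an if condition rather than deleting them from the data:
Code:
mlogit region i.education##i.type i.origin ///
    if !(type == 2 & inlist(education, 1, 2)), rrr baseoutcome(1)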
Weighted least squares
Hello!
I have estimated an ordinary least squares model, and as a sensitivity analysis I also estimated weighted least squares, since I have heteroskedastic standard errors. I wonder what the advantage of WLS is and how I should interpret and compare my two sets of results?
Best regards,
Klaudia
Tuesday, December 28, 2021
Generalized propensity score matching for multilevel treatment
Hi,
I am a first-time user of the generalized propensity score. I have 3 treatment levels that are qualitatively different. I used mlogit and predict commands to estimate three sets of gpscore. But I am not sure how to match the pairs of treatments. I have read in several papers that the researchers used nearest neighbour matching to match the treated and control units. Can I use the same if I have a vector of propensity scores, and can I do it in Stata? If not, is there any other way to match the units based on the gpscores in Stata?
Will appreciate any help on this matter.
Thanks,
Nadia
Generate two new variables in Stata representing the change in net-pay and the change in well-being between june and october
Hi, I'm new to Stata; how can I implement the above statement in Stata?
I would really appreciate any advice on how to solve this in Stata.
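A minimal sketch, assuming hypothetical wide-format variables netpay_june, netpay_oct, wellbeing_june, and wellbeing_oct (one observation per person):
Code:
gen change_netpay    = netpay_oct    - netpay_june
gen change_wellbeing = wellbeing_oct - wellbeing_june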
error: depvar may not be a factor variable
Hey everybody,
I want to run a regression with binary variables but I am getting this error: depvar may not be a factor variable.
I don't know why "no_ma_deals_binary" might not be a factor variable. It only contains 0 and 1.
Here is a dataex excerpt of the two variables:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(no_ma_deals_binary CTO_presence) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 end
I hope someone can help me.
Kind regards,
Jana
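A hedged guess at the cause: this error appears when the dependent variable itself is given a factor-variable prefix. Enter the outcome plainly and keep i. for the regressors only, e.g.:
Code:
logit no_ma_deals_binary i.CTO_presence
* or, as a linear probability model:
regress no_ma_deals_binary i.CTO_presence, vce(robust)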

Create variable using expansion factor
Dear all, I would like to ask a question. In a database I have identified a group of people who should receive a transfer, but I have no idea how to handle this while taking the expansion factors (survey weights) into account. That is, if I give the subsidy to the 10,000 people identified in the sample, the expansion factor makes the amount increase. Do you know of any way to correct for this so that the number of people stays the same?
Best regards and thanks in advance.
Converting Date-Month-Year (string) to year
Dear Stata Members
I have an issue which I know is among the most-answered questions on this forum; I tried some past posts but to no avail. My issue is reproduced below. incorp is the variable from which I need to extract the year: if incorp is 05-May-86, I want the calendar year 1986. How do I do that?
Code:
describe incorp

Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------
incorp          str9    %9s
Code:
list incorp in 1/40 +-----------+ | incorp | |-----------| 1. | | 2. | | 3. | | 4. | | 5. | | |-----------| 6. | | 7. | | 8. | | 9. | | 10. | | |-----------| 11. | | 12. | | 13. | | 14. | | 15. | | |-----------| 16. | | 17. | | 18. | | 19. | | 20. | | |-----------| 21. | | 22. | | 23. | 05-May-86 | 24. | 05-May-86 | 25. | 05-May-86 | |-----------| 26. | 05-May-86 | 27. | 05-May-86 | 28. | 05-May-86 | 29. | 05-May-86 | 30. | 05-May-86 | |-----------| 31. | 05-May-86 | 32. | 05-May-86 | 33. | 05-May-86 | 34. | 05-May-86 | 35. | 05-May-86 | |-----------| 36. | 05-May-86 | 37. | 05-May-86 | 38. | 05-May-86 | 39. | 05-May-86 | 40. | 05-May-86 | +-----------+
I tried using
Code:
generate numyear = date(incorp, "MDY")
(408,405 missing values generated)
but everything comes back missing:
Code:
. mdesc numyear

    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
        numyear |     408,405        408,405          100.00
----------------+-----------------------------------------------
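A minimal sketch of the fix: the string is day-month-year, so the mask should be "DMY", and because the year has only two digits a topyear is needed so that "86" is read as 1986 rather than 2086 (2020 below is an assumption; pick whatever cutoff suits your data).
Code:
generate numdate = date(incorp, "DMY", 2020)
format numdate %td
generate year_incorp = year(numdate)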
Showing a different mean in tabstat
I tabulated the error cases in my data by the number of children. After requesting sum and N along with mean in tabstat, I realized this is not the mean calculation I want.
Code:
tabstat b_err, by( kidsnum) stat(sum N mean)
Ideally, I need the mean calculated relative to the total number of error cases, which is shown by the overall sum. So, for example, for the no-children category it should be 144/445 instead of 144/818 as calculated here. How can I adjust this?
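A hedged sketch of that calculation: divide each group's total number of error cases by the overall total, rather than by the group's number of observations.
Code:
egen total_err = total(b_err)
egen group_err = total(b_err), by(kidsnum)
gen  err_share = group_err / total_err        // e.g. 144/445 for the no-children category
tabstat err_share, by(kidsnum) stat(mean)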
Categorical Variables
Hi, I'm an ultra-beginner to Stata and everything related to it. Nevertheless, I have to fix this problem.
I work on a large-scale assessment with 42,000 cases. I have 5 self-concept variables, one for each subject (e.g. sskmat, sskdeu), and 5 grade variables, one for each subject (e.g. tnotemat, tnotedeu), each with 6 grade levels (1 is bad; 6 is very good), plus the dummy variable female.
In a first step I basically just need two grade groups (a German-and-Math-high group and a Math-high group) and their respective self-concept means for female and non-female students. I z-standardized the variables as well, but didn't use them for this analysis so far because I needed the "real" grades (due to a lack of skills).
So far I have managed it this way, but it's not really convenient. I would also need a Cohen's d (I guess?):
mean sskmat if tnotemat>=5 & tnotedeu==4 & female
mean sskmat if tnotemat>=5 & tnotedeu>=5 & female
How can I do this properly?
Thanks a million!!!
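A hedged sketch, using the variable names from the post, of getting the group means and a Cohen's d in one place (the grade cutoffs below just mirror the commands above):
Code:
* mean math self-concept in the "high in both subjects" group, separately by gender
mean sskmat if tnotemat >= 5 & tnotedeu >= 5, over(female)
* Cohen's d for the female vs. non-female difference within that group
esize twosample sskmat if tnotemat >= 5 & tnotedeu >= 5, by(female) cohensd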
Monday, December 27, 2021
Ho, ho, ho! Wild bootstrap tests for everyone!
I've translated the guts of my boottest program from Mata to Julia, producing the Julia package WildBootTests.jl. And at the page just linked, I've posted examples of calling the package from Julia, R, Python, and Stata. The Stata example is convoluted: using Stata 16 or 17, you go into Python. From there you use the Python package PyJulia to link to Julia. For it to work, you need to have installed both Python and Julia, along with PyJulia in Python and WildBootTests in Julia.
While I doubt this will be of much use to Stata users (boottest is damn fast and easier to use), I think the project is interesting in a number of ways:
- It offers a model of cross-platform development of statistical software. One need invest in building a feature set and optimizing code just once. Tailored front-ends can be written for each platform. In fact Alexander Fischer is writing one for R.
- Julia promises the plasticity of Python and the speed of C, roughly speaking, by way of just-in-time compilation. In my experience, fully realizing this potential takes a lot of work, at least when climbing learning curves. You have to dig into the innards of type inference and compilation and stare at pages of arcane output (from @code_warntype or SnoopCompile). Partly that is a comment on the immaturity of Julia. It has the well-recognized problem of long "time to first plot," meaning that there's a long lag the first time a package is used in a session. On my machine, the wildboottest() function often takes 12 seconds to run the first time, and it was a struggle to get it that low.
- Nevertheless, the promise is real. A programmer can achieve much higher performance than with Mata, yet without having to bother with manually compiling code for multiple operating systems and CPUs, the way you do with C plug-ins for Stata. An example below shows WildBootTests 10x faster than boottest even when calling from Stata.
- Julia could be more directly integrated into Stata, making the link easier and more reliable. I've already suggested that Stata corp do this, the way they have for Python. Or maybe a user could lead the way, as James Fiedler did for Python.
Code:
infile coll merit male black asian year state chst using regm.raw, clear
qui xi: regress coll merit male black asian i.year i.state, cluster(state)
generate individual = _n  // unique ID for each observation

timer clear
timer on 1
boottest merit, nogr reps(9999) bootcluster(individual)  // subcluster bootstrap
timer off 1

timer on 2
mat b = e(b)[1,1..colsof(e(b))-1]  // drop constant term
global vars: colnames b  // get right-side variable names
python
from julia import WildBootTests as wbt
import numpy as np
from sfi import Data
R = np.concatenate(([1], np.zeros(`=colsof(b)'))).reshape(1,-1)  # put null in Rβ = r form
r = np.array([0])
resp = np.asarray(Data.get('coll'))  # get response variable
predexog = np.c_[np.asarray(Data.get('$vars')), np.ones(resp.size)]  # get exogenous predictor variables + constant
clustid = np.asarray(Data.get('individual state')).astype(int)  # get clustering variables
test = wbt.wildboottest(R, r, resp=resp, predexog=predexog, clustid=clustid, nbootclustvar=1, nerrclustvar=1, reps=9999)  # do test
wbt.teststat(test)  # show results
wbt.p(test)
wbt.CI(test)
end
timer off 2
timer list
On my machine, I get
Code:
. timer list
   1:     22.64 /        1 =      22.6360
   2:      2.10 /        1 =       2.1040
...meaning the new version is 10x faster.
One source of speed-up is that by default wildboottest() does all computations in single precision (32-bit floats) rather than double, something that is not possible in Mata, but I think is typically fine for a bootstrap-based test.
How to calculate Sigma (standard deviation of weekly stock return)
Hello all Statalist members.
I would like to calculate SIGMA, the standard deviation of weekly stock returns in year T, based on weekly stock return data. When I group by the weekly dates I get missing values (weekReturn1_sd in the data below), but when I group by year I get the values I am after (weekReturnVolatility_yearlynwe, e.g. .0355006 for 2002). I need some suggestions on how to get the sigma variable from the following data. These are the commands I tried:
Code:
by code week_start year , sort : egen float weekReturn1_sd= sd(wretwd)
Code:
by code year, sort : egen float weekReturnVolatility_yearlynwe= sd(wretwd)
A friend also recommended the command below, but it gives me a "not sorted" error. Please guide me on which Stata command I can use to calculate sigma.
Code:
by code week_start: gen sigma = sd(wretwd)
Here is an excerpt of the data:
Code:
[Data listing: weekly observations for stock code 2 from January 2002 through December 2004, with variables code, trdwnt, wretwd, week_start, week_end, count, weekReturn1_sd, and weekReturnVolatility_yearly.]
entropy balancing commands
Dear all,
I hope you are well.
I am using the entropy balancing method for PSM. I have read the paper by Hainmueller and Xu (2013) and follow it to set up the model. However, I need to make sure that I am doing it the right way (using the right commands).
Here are my commands:
If I am wrong, please kindly guide me to do it the right way.
Code:
ebalance treat Control variables, targets(1)
svyset [pweight=_webal]
svy: reg Dependent Var treat Control variables
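For reference, a minimal end-to-end sketch of that workflow, assuming the community-contributed ebalance package (ssc install ebalance) and hypothetical variable names y (outcome), treat (treatment dummy), and x1 x2 x3 (covariates):
Code:
* entropy balancing on the first moments of the covariates
ebalance treat x1 x2 x3, targets(1)          // creates the weight variable _webal

* weighted outcome regression using the balancing weights
svyset [pweight=_webal]
svy: regress y treat x1 x2 x3

* a simpler equivalent without -svyset-
regress y treat x1 x2 x3 [pweight=_webal], vce(robust)
The two regressions at the end should give the same point estimates; the svy: route mainly changes how the standard errors are computed and labeled.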
Generate a new variable based on the values of a second variable
Hi,
I'm trying to generate a continuous variable in a comparative dataset: for each country, I want the share of highly educated respondents within that country, so that the countries can be sorted on it. Does anyone know which command is best to use? I'm a bit confused - sorry for a beginner question :-)
All the best,
Marcus
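A minimal sketch of one way to do this, assuming hypothetical variable names country (country identifier) and high_educ (a 0/1 indicator for being highly educated):
Code:
* share of highly educated respondents within each country
bysort country: egen share_high_educ = mean(high_educ)

* optional: one rank per country based on that share (1 = lowest share)
egen tagged = tag(country)
egen rank_tmp = rank(share_high_educ) if tagged, unique
bysort country (rank_tmp): gen country_rank = rank_tmp[1]
drop tagged rank_tmp
If sorting is all that is needed, gsort -share_high_educ also orders the data from the highest to the lowest country share.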
Time fixed effect and State fixed effect
Dear all,
I am pretty new to Stata. I am currently working on my master's thesis: I want to investigate the effect of Vehicle Miles Travelled (VMT) on PM2.5 concentrations, using a panel dataset of VMT and PM2.5 for US cities from 2012 to 2016. I am struggling with the Stata commands for a time fixed effect and an entity fixed effect at the "State" level. I have tried -xtset ID STATE_ID_N-, where the variable "ID" is the id of each US city and "STATE_ID_N" is the state in which the city is located. However, Stata reports "repeated time values within panel". How can I deal with this, and what are the correct commands to
(1) regress VMT on PM2.5 with Time fixed effect,
(2) regress VMT on PM2.5 with Entity fixed effect at "State" level, and
(3) regress VMT on PM2.5 with both Time and Entity fixed.
Thank you
Rus
Here is my dataset (attachment).
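A minimal sketch of what the three specifications could look like, assuming hypothetical variable names pm25, vmt, and year, and treating PM2.5 as the dependent variable (since the stated aim is the effect of VMT on PM2.5); the panel should be declared as city by year, not city by state:
Code:
* declare the panel as city x year; this avoids "repeated time values within panel"
xtset ID year

* (1) time fixed effects only
regress pm25 vmt i.year, vce(cluster ID)

* (2) entity (state) fixed effects only
regress pm25 vmt i.STATE_ID_N, vce(cluster STATE_ID_N)

* (3) both time and state fixed effects
regress pm25 vmt i.year i.STATE_ID_N, vce(cluster STATE_ID_N)
With many cities, the fixed effects could also be absorbed with areg or the community-contributed reghdfe instead of listing them as dummies.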
DID two way fixed effects
Dear Statalist Members,
PS: I have attached a small portion of my data.
Using repeated cross-sectional data (Demographic and Health Survey, rounds 2008, 2013, and 2018), I am trying to estimate the effect of a refugee shock (influx of refugees) on children's health status. I appended the DHS 2008, 2013, and 2018 rounds.
The health indicator of children's health is Height-for-age-Z-Score (HAZ).
*The refugee inflow to the country of interest (the treatment) started in 2011.
*The country of interest has 81 cities, and the arrivals of the refugees increased year by year.
*For these 81 cities, I have population data from 2008 to 2018, and the number of refugees in each city from 2013 to 2018. Thus, I have created a refugee ratio (Refugee Ratio = Number of Refugees in city c in year y / Population of city c in year y). Since the number of refugees in the cities is only available from 2013 onwards, the refugee ratio takes the value zero before 2013. I have merged the refugee ratio (ranging from 0 to 1) with the child's city of residence so that I can tell whether child i is exposed to a refugee shock in the survey year.
Please note that each DHS round includes only children born in the 5 years preceding the survey year. To be more precise:
*DHS 2008 only includes children born between 2003 and 2008. I will use the DHS-2008 to look at the placebo effect.
*DHS-2013 includes children born between 2008-2013.
*DHS-2018 includes children born between 2013-2018.
I am having trouble creating the "time" and "treatment" variables, and hence the DID variable (treatment*time).
In DHS, I can see the children's year of birth.
1) For the "time" variable, I run this code:
replace time=1 if child_birth_year>=2011
replace time=0 if child_birth_year<=2010. But this is problematic because it is like assuming that those born before 2010 are never treated. But they may be exposed to the presence of refugees later in life. That is way I need to consider the Two Way FE model which is a generalization of DID.
2) My model should look like this:
HAZ_{ihct} = \beta (Refugees/Population)_{ct} + \alpha_c + \alpha_t + \epsilon_{ihct}
where i indexes the individual in household h, in city c, at time t. The treatment variable could also be lagged, i.e. (Refugees/Population)_{c,t-1}.
My question is how to define the timing of the treatment. I would start by the year of birth. But I cannot define it properly. I would be more than happy if you can help me.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str15 CASEID double haz06 float child_birth_year byte(ref_ratio2008 ref_ratio2009 ref_ratio2010 ref_ratio2011 ref_ratio2012) double(ref_ratio2013 ref_ratio2014 ref_ratio2015 ref_ratio2016 ref_ratio2017 ref_ratio2018) float survey_year
[observation rows omitted: CASEID, haz06, child_birth_year, the yearly refugee ratios for 2008-2018, and survey_year for a sample of children from the 2008, 2013, and 2018 DHS rounds]
end
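A minimal sketch of the two-way fixed-effects specification described above, assuming hypothetical variable names haz (outcome), ref_ratio (the refugee ratio matched to the child's city, lagged if desired), city, and survey_year:
Code:
* two-way FE with a continuous treatment, clustering at the city level
regress haz ref_ratio i.city i.survey_year, vce(cluster city)

* the same model with the community-contributed reghdfe (ssc install reghdfe)
reghdfe haz ref_ratio, absorb(city survey_year) vce(cluster city)
Whether the time dimension is the survey year or the child's year of birth is a modeling choice; the key requirement is that the refugee ratio varies at the city-year level matched to each child.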
Export codebook descriptives to LaTeX
Hello, I have a very simple query that unfortunately I am unable to resolve.
I am trying to export my descriptive statistics to a LaTeX table. I can easily generate the descriptives with the codebook and sum commands, but I am having trouble understanding how to export them.
Does anyone know how I can do this?
Here is a replication of the commands:
Code:
sysuse auto, clear
codebook make price mpg foreign, compact
sum make price mpg foreign
In sum, I would like a LaTeX table that replicates the codebook table, but I would also like to include the standard deviation among the summary statistics.
Thanks a lot for your help.
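One common route is the community-contributed estout package, which can post summary statistics as an estimation result and write them to a .tex file. A minimal sketch (assuming ssc install estout has been run; the option choices are illustrative):
Code:
sysuse auto, clear

* post the summary statistics, then write them to a LaTeX table
estpost summarize price mpg foreign
esttab using descriptives.tex, replace ///
    cells("mean(fmt(2)) sd(fmt(2)) min max count") ///
    nomtitle nonumber label booktabs
The resulting descriptives.tex file can then be \input{} directly into a LaTeX document.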
Control variables in interaction function
Hello fellow Stata users!
I have a more theoretical question rather than a question about how to use Stata. I was wondering whether it is okay to add control variables to a model that includes an interaction term?
best wishes,
Klaudia
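In case a concrete example helps: in Stata, controls simply enter the model alongside the interaction. A minimal sketch with hypothetical variables y, x, z (the interacted pair) and w1 w2 (controls):
Code:
* interaction plus controls, using factor-variable notation
regress y c.x##c.z w1 w2, vce(robust)

* marginal effect of x at selected values of z, averaging over the controls
margins, dydx(x) at(z = (0 1 2))
The ## operator includes both main effects and the interaction, which keeps the marginal effects interpretable.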
Suspected bug in -confirm-
The title is self-explanatory. Here is a reproducible example:
Code:
clear
sysuse auto
confirm variable , exact
There is no error where I think there should be one.
I think the above should be equivalent to
Code:
clear
sysuse auto
confirm variable
but it is not.
In case you are wondering, this is relevant in situations where you pass to confirm a local macro that might contain a variable name (or a variable list) but might also be empty. Something like
Code:
confirm `could_be_a_varname_or_empty' , exact
I am (still) using Stata 16.1, fully updated, on Windows 11.
Edit:
Here is my workaround that I would like to share:
Code:
novarabbrev confirm variable ...
cmp-random effects
Good day,
I am trying to estimate the impact of health status expectations (Exp_h) on consumption expectations. I fitted a conditional mixed process model with -cmp-, because the model is nonlinear and I have reason to think that the regressor Exp_h (which is binary) is endogenous.
Then, I tried to estimate the model allowing for random effects.
Code:
cmp (cons = Exp_h x2 x3 || id:) (Exp_h = z1 z2 x2 x3 || id:), ind($cmp_oprobit $cmp_probit) cl(id) cov(indep unstruct)
cons = ordinal dependent variable
Exp_h = endogenous dummy variable
z1 z2 = dummy instrumental variables
I get the results of the first- and second-stage equations, and finally this last table, which I have difficulty interpreting.
Any suggestions would be welcome. Thank you very much.
Code:
Random effects parameters          Estimate   Std. Err.   [95% Conf. Interval]
Level: id
  Cons: Standard deviations
    _cons                          .6071207   .0526769    .5121775   .7196637
  Exp_h: Standard deviations
    _cons                          .8759736   .0612121    .7638533   1.004551
Level: Observations
  Standard deviations
    Cons                           1          (constrained)
    Exp_h                          1          (constrained)
  Cross-eq correlation
    Cons, Exp_h                    .4120378   .0875992    .2272259   .5682024
How to set the X axis in the DASP package in Stata
I want to use the DASP plug-in in Stata and the Lorenz curve to analyze the fairness of the distribution of inter-provincial health resources. The vertical axis is set to health resources, but how do I set the horizontal axis? I want to set it to the cumulative percentage of population, the cumulative percentage of geographic area, or the cumulative percentage of GDP - how do I do that? I also don't quite understand the meaning of the ranking variable.
Thank you very much
Sunday, December 26, 2021
Business calendar: Get omitted dates using bofd()
I have a list of dates for specific company events of listed US companies. This variable (reguldate) may include weekends as well as holidays.
For each of these companies, I have a dataset containing daily stock price data, from which I created a business calendar:
Code:
bcal create biscal, from(date) replace
I then transformed the regular event dates using:
Code:
gen businessdate = bofd("biscal", reguldate)
The help file states:
"Function bofd() returns missing when the date does not appear on the specified calendar."
However, if reguldate is a non-trading date (according to the business calendar I created), I would like to get the previous trading date instead of a missing value, so something like businessdate - 1.
Is there a way to include those values that are omitted according to my .stbcal file?
Thanks
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int regulardate float businessdate
14607 104
14570  79
14571  80
14574  82
14572  81
14577  83
14578  84
14579  85
14578  84
14579  85
14577  83
14580  86
14580  86
14580  86
14585  89
14584  88
14573   .
14584  88
14585  89
14585  89
end
format %tdDD/NN/CCYY regulardate
format %tbbiscal businessdate
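One way to get the behaviour described (fall back to the previous trading day when the event date is not on the calendar) is to step the date back until bofd() finds it. A minimal sketch, assuming the biscal calendar and the variables from the post, and that a trading day occurs within a week of any event date:
Code:
* start from the raw event date and map it to the business calendar
gen evdate = reguldate
gen businessdate2 = bofd("biscal", evdate)

* for non-trading dates, roll back one calendar day at a time and retry
forvalues k = 1/7 {
    replace evdate = evdate - 1 if missing(businessdate2)
    replace businessdate2 = bofd("biscal", evdate) if missing(businessdate2)
}
format businessdate2 %tbbiscal
Dates that fall outside the range covered by the .stbcal file will still end up missing, so those cases are worth checking separately.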
Optimal lags and Error-correction model
Good morning everyone;
I want to perform a cointegration with the Error correction model.
Everything works fine. My two series, with optimal lags 1 and 2, are integrated of the same order, I(1): they are stationary in first differences.
However, here is the code I use to choose the optimal lag order for the first-differenced series:
Code:
varsoc dloga
* Optimal lag = 2 by HQIC, SBIC, AIC
varsoc dlogb
* Optimal lag = 0 by HQIC, SBIC, AIC
As you can see, the optimal lag for the first difference of the second variable, b, is 0 under all criteria. I wanted to know whether this will be a problem later when estimating the coefficients of the error-correction model.
Thanks in advance.
Pita
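For orientation, a minimal sketch of the Johansen/VECM route with Stata's built-in commands, assuming loga and logb are the two tsset series; the lag and rank choices here are purely illustrative:
Code:
* choose the lag order of the VAR in levels
varsoc loga logb

* test for the cointegration rank (Johansen)
vecrank loga logb, lags(2) trend(constant)

* estimate the vector error-correction model if rank = 1
vec loga logb, lags(2) rank(1)
A lag of 0 selected for one differenced series taken on its own is not necessarily a problem; what usually matters for the VECM is the lag order chosen for the system in levels.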
How to evaluate the following hypotheses?
Hello all!
I am currently working on a research project and I have a dataset of bankrupt and non-bankrupt European companies.
First of all, before I can conduct this research, I have to think about the methodology I will use to evaluate the hypotheses.
My knowledge of Stata is still limited; any help will be highly appreciated.
I need to construct a default prediction model using a number of variables.
I use R&D as the independent variable and some control variables (age, size, leverage, liquidity, Z-score, industry, and acquisitions).
The model I use is the logit model.
My first basic hypothesis is: R&D spending has a positive impact on the prediction of bankruptcy because it improves the financial performance of a firm.
The regression specification that I use is: Pr(failure = 1) = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + error term
- X1 = R&D
- X2 = Leverage
- X3 = Size
- X4 = Liquidity
- X5 = Z-score
- X6 = age
- X7 = industry
The regression specification that I use is: Pr(failure = 1) = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + error term
- X1 = R&D
- X2 = Size
- X3 = Age
- X4 = Liquidity
- X5 = Z-score
- X6 = leverage
- X7 = industry
By not controlling for size and age, I can deduce whether the prediction model improves by looking at the significance of the variables and the adjusted R-squared of the model. I'm not sure about this yet; can someone confirm, please?
The final hypothesis: There is a non-linear U-shaped relationship between R&D spending and the probability of bankruptcy.
Here I include a quadratic term and check whether the quadratic term of R&D is significant. Is this a correct method? In this hypothesis, age and size will be control variables as in the original specification.
The regression specification that I use is: Pr(failure = 1) = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + error term
- X1 = R&D
- X2 = R&D2
- X3 = Leverage
- X4 = Size
- X5 = Liquidity
- X6 = Z-score
- X7 = age
- X8 = industry
Kind regards,
Chun H
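A minimal sketch of how the baseline and the U-shape specifications could be coded, assuming hypothetical variable names failure (0/1), rd, leverage, size, liquidity, zscore, age, and industry:
Code:
* baseline logit for the bankruptcy indicator
logit failure rd leverage size liquidity zscore age i.industry, vce(robust)
estat ic                         // AIC/BIC for comparing specifications

* U-shape check: add the squared R&D term via factor-variable notation
logit failure c.rd##c.rd leverage size liquidity zscore age i.industry, vce(robust)
estat ic
A significant coefficient on c.rd#c.rd is only a necessary condition for a U shape; it is usually recommended to also check that the implied turning point lies inside the observed range of R&D (or to use a formal test such as the community-contributed utest).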
Problem with parallel: parallelize a loop
Hello to all,
I am not very strong in Stata, but I wrote a program. I wanted to parallelize a loop, but while searching I did not find a command that parallelizes a loop directly, so I wrapped the loop in a program that I then parallelized.
The objective is to run the regression on several sub-samples selected with the variable "rp_partner_rank".
Then I need to retrieve information about my variable of interest, lag_fdi_in_all, and store it in variables.
Code:
program define savereg
    qui {
        sum rp_partner_rank
        forvalues i = `r(min)' (1) `r(max)' {
            xi: reg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder i.year i.pccountry i.rpcountry if rp_partner_rank <= `i', vce(cl id)
            lincom _b[lag_fdi_in_all], l(95)
            replace beta = r(estimate) in `i'
            replace se = r(se) in `i'
            replace i_pos = `i' in `i'
            replace lb_95 = r(lb) in `i'
            replace ub_95 = r(ub) in `i'
            lincom _b[lag_fdi_in_all], l(90)
            replace lb_90 = r(lb) in `i'
            replace ub_90 = r(ub) in `i'
            lincom _b[lag_fdi_in_all], l(99)
            replace lb_99 = r(lb) in `i'
            replace ub_99 = r(ub) in `i'
        }
    }
end
preserve
drop beta se i_pos lb_* ub_*
g i_pos = .
g beta = .
g se = .
g lb_90 = .
g lb_95 = .
g lb_99 = .
g ub_90 = .
g ub_95 = .
g ub_99 = .
parallel prog(savereg): savereg
*parallel: savereg
*parallel do "$Code\saveregdo.do"
*drop if beta==.
keep beta se i_pos lb_* ub_*
parallel append, do(savereg) prog(savereg) e(3)
twoway rarea ub_90 lb_90 i_pos , astyle(ci) || ///
    line beta i_pos
save "$DataCreated\asup", replace
restore
But the code doesn't work. I have tested several approaches, including the -parallel do- command with the loop saved in a do-file, but the results are not what I expect. How should I parallelize this loop? Please, can you help me?
Here are the error results:
Code:
. parallel prog(savereg): savereg
--------------------------------------------------------------------------------
Parallel Computing with Stata (by GVY)
Clusters    : 4
pll_id      : l9d9vacs17
Running at  : G:\Etude et biblio\Université\tours\Master2 IE\Econometrie avancee\memoire\donnees\Code
Randtype    : datetime
Waiting for the clusters to finish...
cluster 0002 Exited with error -199- while running the command/dofile (view log)...
cluster 0003 Exited with error -199- while running the command/dofile (view log)...
cluster 0004 Exited with error -199- while running the command/dofile (view log)...
cluster 0001 Exited with error -199- while running the command/dofile (view log)...
--------------------------------------------------------------------------------
Enter -parallel printlog #- to checkout logfiles.
--------------------------------------------------------------------------------
.
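Not a fix for -parallel- itself, but if the goal is simply to collect the coefficient and confidence bounds for each sub-sample, one alternative sketch is to post the results with -postfile- instead of writing them into the data in memory (variable and macro names are taken from the post; untested on the actual data):
Code:
tempname results
postfile `results' i_pos beta se lb_95 ub_95 using betas.dta, replace
quietly summarize rp_partner_rank
forvalues i = `r(min)'(1)`r(max)' {
    quietly regress rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder ///
        i.year i.pccountry i.rpcountry if rp_partner_rank <= `i', vce(cluster id)
    quietly lincom _b[lag_fdi_in_all], level(95)
    post `results' (`i') (r(estimate)) (r(se)) (r(lb)) (r(ub))
}
postclose `results'
use betas.dta, clear
This avoids the replace ... in `i' bookkeeping; the 90% and 99% bounds can be added with two more lincom calls and extra columns in the postfile.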