Hello all,
I am estimating a first-differences model of the following form: ΔY_it = β0 + β1 ΔX_it + ε_it.
In one of the regressions, the dependent variable is wages. As I am more interested in the percentage change than in the level change, I want to use the logarithm of the dependent variable.
Do I have to use log(wage_it+1 - wage_it) or log(wage_it+1) - log(wage_it) as my dependent variable?
Thank you!
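A minimal sketch, with hypothetical variable names id, year, wage, and x: the log difference ln(wage_t) - ln(wage_t-1) is the change in log wages and approximates the percentage change, which is usually what a "percentage change" specification uses.
Code:
* hypothetical names; assumes the panel is declared with xtset
xtset id year
gen lwage = ln(wage)
regress D.lwage D.x, vce(robust)   // D.lwage = ln(wage_t) - ln(wage_t-1)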
Friday, December 31, 2021
joint significance of the dummy variables
Hi, how do I test for the joint significance of the dummy variables using Stata commands?
Denis
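A minimal sketch, with hypothetical names y, x, and a categorical variable group whose dummies enter via factor notation: testparm runs a Wald test that all the dummies are jointly zero.
Code:
regress y x i.group
testparm i.group     // joint significance of the group dummies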
Bootstrap post-estimation Craggit model
Hi there, I need to use Cragg's double-hurdle model and the craggit command. Crucially, I am using Burke's (2009) treatment of the command (APEboot) to estimate my actual parameters of interest and to bootstrap standard errors.
But I don't know how to access the coefficients and standard errors from my bootstrapping to create a table for export! My code is below. I would love to know how I might get the estimates for tabulation, but I am also open to other ways of generating the standard errors.
program define APEboot, rclass
preserve
*generating the parameter estimates used to calculate the APE from the overall model for the E(y|y>0) and Pr(y>0)
craggit dum_attend treat, second(prop_attend treat) vce(robust)
predict bsx1g, eq(Tier1)
predict bsx2b, eq(Tier2)
predict bssigma, eq(sigma)
generate bsIMR = normalden(bsx2b/bssigma)/normal(bsx2b/bssigma)
*The estimates for each model below
gen bsdPw1_dtreat = [Tier1]_b[treat]*normalden(bsx1g)
gen bsdEyyx2_dtreat = [Tier2]_b[treat]*(1-bsIMR*(bsx2b/bssigma+bsIMR))
gen bsdEy_dtreat = ///
[Tier1]_b[treat]*normalden(bsx1g)*(bsx2b+bssigma*bsIMR) ///
+[Tier2]_b[treat]*normal(bsx1g)*(1-bsIMR*(bsx2b/bssigma+bsIMR))
*creating the ape matrices for bootstrapping
su bsdPw1_dtreat
return scalar ape_Pw1_dtreat = r(mean)
matrix ape_Pw1_dtreat = r(ape_Pw1_dtreat)
su bsdEyyx2_dtreat
return scalar ape_Eyyx2_dtreat = r(mean)
matrix ape_Eyyx2_dtreat = r(ape_Eyyx2_dtreat)
su bsdEy_dtreat
return scalar ape_dEy_dtreat = r(mean)
matrix ape_dEy_dtreat = r(ape_dEy_dtreat)
restore
end
*generating the ape estimates using bootstrapping
bootstrap ape_Pw1_dtreat = r(ape_Pw1_dtreat), reps(100): APEboot
bootstrap ape_Eyyx2_dtreat = r(ape_Eyyx2_dtreat), reps(100): APEboot
bootstrap ape_dEy_dtreat = r(ape_dEy_dtreat), reps(100): APEboot
program drop APEboot
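A hedged sketch of one way to get the estimates and bootstrap standard errors into a table: return all three APEs from a single bootstrap call (the matrix ... = r(...) lines inside the program are not needed for this), then read e(b) and e(se), or export with esttab (from the user-written estout package on SSC).
Code:
bootstrap ape_Pw1 = r(ape_Pw1_dtreat) ape_Eyyx2 = r(ape_Eyyx2_dtreat) ape_dEy = r(ape_dEy_dtreat), ///
    reps(100) seed(12345): APEboot
estat bootstrap, all        // normal, percentile, and bias-corrected CIs
matrix b  = e(b)            // bootstrap point estimates
matrix se = e(se)           // bootstrap standard errors
estimates store apes
esttab apes using apes.rtf, se replace    // simple exported table (needs -estout-)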
Difference between DiD and two-way fixed effects model
I am currently writing my master's thesis, in which I analyze the effect of hurricanes on the stock market. I have an unbalanced panel dataset of stock returns over several days before and after each hurricane, over a time frame of several years. I created a dummy variable "hurricane" taking the value 1 if the stock is affected by a hurricane that day, and 0 otherwise. I included time fixed effects as well as firm fixed effects and clustered the standard errors at the firm level. I have run the following regression as a baseline model:
xtreg RET i.hurricane i.date, fe vce(cluster PERMNO)
Some stocks are never affected, whereas others are affected once or even multiple times. The dummy is intermittent (switches "on and off")
Question:
My understanding is that this variable can be seen as the interaction term post*treated in the diff-in-diff model; is that correct? Is the TWFE model equivalent to the generalized DiD?
Any help would be highly appreciated!! Have a great NYE!
Interpretation of constant in xtreg
Hello everyone!
I am currently writing my master's thesis, in which I analyze the effect of hurricanes on the stock market. I have an unbalanced panel dataset of stock returns over several days before and after each hurricane, over a time frame of several years. I created a dummy variable "hurricane" taking the value 1 if the stock is affected by a hurricane that day, and 0 otherwise. I included time fixed effects as well as firm fixed effects and clustered the standard errors at the firm level. I have run the following regression as a baseline model:
xtreg RET i.hurricane i.date, fe vce(cluster PERMNO)
The regression output shows the coefficient for the hurricane dummy variable, many coefficients for all the days (which I know I don't have to look at further) and the constant.
Question:
Does the constant reflect the intercept (i.e., the expected return when the hurricane dummy is 0)? If so, do I retrieve the expected return when a stock is affected by a hurricane (dummy = 1) by adding the coefficient on the hurricane dummy to the constant?
I am confused because I have read that I cannot interpret the constant in the fixed effects model (https://www.stata.com/support/faqs/s...effects-model/).
I would very much appreciate it if someone could help!!
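A minimal sketch, using the baseline model from the post, of getting the two conditional predictions directly instead of adding coefficients by hand:
Code:
xtreg RET i.hurricane i.date, fe vce(cluster PERMNO)
margins hurricane     // average linear prediction with the dummy at 0 and at 1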
DID estimation
Hello everyone, how can I deal with the problem when I see the error "years not nested in countryid" or "countryid not a control"?
Double Clustering in a Multi-country Data Set up
Dear Stata Members
First, a heartfelt advance New Year Wishes to All. I wish all a prosperous New Year
I am dealing with a cross-country dataset in which the lowest units are firms. Firms aggregate into industries, and the broadest level is the country. I have 22 countries, 18 industries, 17,252 firms, and 22 years.
For panel data I usually cluster at a single level, the firm level. However, some articles cluster at both the firm and year levels in cross-country setups.
What does double clustering (firm and year) mean?
Clustering, as far as I know in the panel context, accounts for correlation within units. For instance, if the residual of the outcome variable is likely to be correlated within, say, industry, one should cluster the standard errors by industry. But in the context of double clustering by firm and year, does it make sense to cluster the SEs within these unique firm-year pairs?
Similarly, I have seen in a post that clustering with fewer than 30 clusters is not advisable (https://www.statalist.org/forums/for...72#post1603472). Does this apply to double clustering, where my number of years is < 30?
Code:
. xtset id year Panel variable: id (unbalanced) Time variable: year, 1999 to 2020, but with gaps Delta: 1 unit . reghdfe dividends risk roa_w size_w lev_w sg_w cash_ta1_w tangib_w age mb_w, absorb(id year) cluster (id ) (dropped 1846 singleton observations) (MWFE estimator converged in 8 iterations) HDFE Linear regression Number of obs = 92,159 Absorbing 2 HDFE groups F( 9, 10505) = 192.48 Statistics robust to heteroskedasticity Prob > F = 0.0000 R-squared = 0.3821 Adj R-squared = 0.3024 Within R-sq. = 0.0373 Number of clusters (id) = 10,506 Root MSE = 0.1669 (Std. err. adjusted for 10,506 clusters in id) ------------------------------------------------------------------------------ | Robust dividends | Coefficient std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- risk | .0089675 .003735 2.40 0.016 .0016462 .0162888 roa_w | -.7158403 .0192201 -37.24 0.000 -.7535152 -.6781653 size_w | .0051734 .0023954 2.16 0.031 .000478 .0098688 lev_w | -.0614293 .0088244 -6.96 0.000 -.0787268 -.0441318 sg_w | -.0029462 .0003515 -8.38 0.000 -.0036352 -.0022572 cash_ta1_w | -.0693444 .010555 -6.57 0.000 -.0900342 -.0486545 tangib_w | -.0245404 .0092626 -2.65 0.008 -.0426969 -.006384 age | .0165564 .0036146 4.58 0.000 .0094712 .0236417 mb_w | -.0006307 .0001642 -3.84 0.000 -.0009526 -.0003089 _cons | .2522908 .0239573 10.53 0.000 .2053299 .2992517 ------------------------------------------------------------------------------ Absorbed degrees of freedom: -----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. Coefs | -------------+---------------------------------------| id | 10506 10506 0 *| year | 21 0 21 | -----------------------------------------------------+ * = FE nested within cluster; treated as redundant for DoF computation . reghdfe dividends risk roa_w size_w lev_w sg_w cash_ta1_w tangib_w age mb_w, absorb(id year) cluster (id year ) (dropped 1846 singleton observations) (MWFE estimator converged in 8 iterations) Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied. HDFE Linear regression Number of obs = 92,159 Absorbing 2 HDFE groups F( 9, 20) = 94.49 Statistics robust to heteroskedasticity Prob > F = 0.0000 R-squared = 0.3821 Adj R-squared = 0.3024 Number of clusters (id) = 10,506 Within R-sq. = 0.0373 Number of clusters (year) = 21 Root MSE = 0.1669 (Std. err. adjusted for 21 clusters in id year) ------------------------------------------------------------------------------ | Robust dividends | Coefficient std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- risk | .0089675 .0122294 0.73 0.472 -.0165426 .0344776 roa_w | -.7158403 .046722 -15.32 0.000 -.8133007 -.6183799 size_w | .0051734 .004472 1.16 0.261 -.004155 .0145018 lev_w | -.0614293 .01249 -4.92 0.000 -.087483 -.0353757 sg_w | -.0029462 .0006372 -4.62 0.000 -.0042754 -.001617 cash_ta1_w | -.0693444 .0108852 -6.37 0.000 -.0920505 -.0466382 tangib_w | -.0245404 .0096214 -2.55 0.019 -.0446104 -.0044705 age | .0165564 .0060575 2.73 0.013 .0039207 .0291922 mb_w | -.0006307 .0002081 -3.03 0.007 -.0010648 -.0001967 _cons | .2522908 .0807704 3.12 0.005 .0838068 .4207749 ------------------------------------------------------------------------------ Absorbed degrees of freedom: -----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. 
Coefs | -------------+---------------------------------------| id | 10506 10506 0 *| year | 21 21 0 *| -----------------------------------------------------+ * = FE nested within cluster; treated as redundant for DoF computation
In the double-clustered run, the standard errors are adjusted for 21 clusters (id year). But the significance levels have also changed. What could be the reason for this drop in significance when moving from single clustering to double clustering?
Any thoughts or suggestions would be helpful, as this is for my general learning.
Xtabond2 command system GMM.
Hello everyone, I am studying the effect of uncertainty on saving, and in one part of my robustness checks I want to use xtabond2. I have read David Roodman's material on xtabond2, but I am confused. I have panel data from 1996 to 2017, and:
saving(i,t) = b0 + b1*saving(i,t-1) + b2*uncertainty(i,t-1) + b3*X(i,t-1) + v_t + v_i + e(i,t)
where X(i,t-1) is a vector of controls, which in the baseline model includes only human capital and per capita income.
With xtabond2 I want to address the possible endogeneity between economic uncertainty and saving by instrumenting with suitable lagged variables. To obtain valid System GMM estimates, I need evidence of first-order autocorrelation in the residuals, while second-order autocorrelation must be rejected. Then we run the Sargan test to check for over-identification problems.
The more I read, the more confused I become about how to do this.
Thank you very much in advance.
Regards,
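A hedged sketch of a System GMM specification along these lines, assuming hypothetical variable names country, year, saving, uncertainty, humancap, and pcincome, and that xtabond2 (SSC) is installed; the lag ranges and instrument choices are illustrative only, not a recommendation.
Code:
xtset country year
tabulate year, generate(yr)
xtabond2 saving L.saving L.uncertainty L.humancap L.pcincome yr*, ///
    gmm(saving uncertainty, lag(2 4) collapse)                    ///
    iv(humancap pcincome yr*)                                     ///
    twostep robust small
* xtabond2 reports the Arellano-Bond AR(1)/AR(2) tests (you want rejection at order 1,
* no rejection at order 2) and the Hansen/Sargan overidentification tests automatically.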
Thursday, December 30, 2021
ivreg2 warning covariance matrix of moment conditions not of full rank
Dear Stata Users,
I was trying to estimate an IV regression (2SLS).
I have a few dummies in my model; the dependent variable is a binary indicator of a benefit claim, and I am instrumenting the benefit amount. I use weights to correct for sampling issues.
I got the warning below at the end of my second-stage results table.
I don't know what to do about this warning. The results show the coefficient on the instrumented variable ben_amt, -.007962, to be significant, though only weakly so (p-value 0.095). The two instruments are strongly significant in the first-stage results (both p-values 0.000).
Code:
ivreg2 nclaim age age2 married ib3.edu i.eu_nat i.pgemplst_gen i.howner i.singlep i.dis_dummy i.female i.east ib2.citysize i.haskids (ben_amt= prvtransfers needs) [pw=hweight] if head==1, first

-----------------------------------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):            146.799
                                                   Chi-sq(2) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):             1545.588
                         (Kleibergen-Paap rk Wald F statistic):        165.936
Stock-Yogo weak ID test critical values: 10% maximal IV size            19.93
                                         15% maximal IV size            11.59
                                         20% maximal IV size             8.75
                                         25% maximal IV size             7.25
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Warning: estimated covariance matrix of moment conditions not of full rank.
         overidentification statistic not reported, and standard errors and
         model tests should be interpreted with caution.
Possible causes:
         singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
         partial option may address problem.
------------------------------------------------------------------------------
Instrumented:         ben_amt
Included instruments: age age2 married 0.edu1 1.edu2 2.edu3 4.edu4 1.howner
                      1.singlep 1.dis_dummy 1.female 1.east 1.citysize
                      3.citysize 1.haskids
Excluded instruments: prvtransfers needs
------------------------------------------------------------------------------
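A hedged sketch of how one might follow the warning's own hints: check whether any dummy is a (near-)singleton in the estimation sample, and either drop it or partial it out with ivreg2's partial() option. Which dummy (if any) is the culprit depends on your data; dis_dummy is used below purely for illustration.
Code:
foreach v of varlist married howner singlep dis_dummy female east haskids {
    quietly count if `v' == 1 & head == 1
    display "`v': " r(N) " ones in the estimation sample"
}
* if, say, dis_dummy were the near-singleton (an assumption for illustration):
ivreg2 nclaim age age2 married ib3.edu i.eu_nat i.pgemplst_gen i.howner i.singlep ///
    dis_dummy i.female i.east ib2.citysize i.haskids                              ///
    (ben_amt = prvtransfers needs) [pw=hweight] if head==1, first partial(dis_dummy)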
help on stacking matrices
Hi, Stata gurus,
this seems to be easy, once you know the right Mata commands.
I have N matrices, same number of columns, but different number of rows. They are named matrix1 .. matrixN.
I would like to stack them all, dropping the original ones, using a loop.
Any clue? Thanks in advance.
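A minimal sketch for Stata matrices named matrix1 ... matrixN (N set to 4 here only for illustration): the \ operator stacks matrices vertically as long as the column counts match, and a second loop drops the originals. The same \ operator also works inside Mata if the matrices live there instead.
Code:
local N 4                       // set to your actual number of matrices
matrix stacked = matrix1
forvalues i = 2/`N' {
    matrix stacked = stacked \ matrix`i'
}
forvalues i = 1/`N' {
    matrix drop matrix`i'
}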
2SLS with polynomial terms of birth cohort
Hello,
I am trying to estimate a fuzzy RD. Fuzzy RD is equivalent to 2SLS (Hahn et al. 2001). So I am trying to run 2SLS. (I am not allowed to use rd or rdrobust commands.)
Compulsory schooling in Turkey increased from 5 to 8 years in 1997. Birth cohorts born after 1986 (not inclusive) are affected by the reform. I want to control for birth-cohort polynomials up to fourth order, and I also include age dummies. However, when I run the command below, birth_year2, birth_year3, and birth_year4 are omitted because of multicollinearity. Could you help me with my code?
xi: ivregress 2sls lwage (yrs_school = reform) birth_year birth_year2 birth_year3 birth_year4 i.age , first
Thank you very much
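A hedged sketch of one thing worth trying: center the running variable at the 1986 cutoff before building the powers, since uncentered year polynomials (values in the millions and beyond) are often dropped for numerical near-collinearity. Note, though, that if age and birth cohort are perfectly collinear in your sample (for example, a single survey year), Stata will still omit terms and one of the two sets has to go.
Code:
gen bc  = birth_year - 1986    // centered running variable
gen bc2 = bc^2
gen bc3 = bc^3
gen bc4 = bc^4
ivregress 2sls lwage bc bc2 bc3 bc4 i.age (yrs_school = reform), first vce(robust)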
Choice of regression model
My dependent variable is interval-scaled but bounded between 5 and 80. All values are known within the accuracy of the experiment, and it is not possible for values to occur outside these bounds. I had intended to analyse the data within a regression framework, but I am uncertain which regression model to use.
Would tobit or truncreg be appropriate, or is there another model; or is the fact that the data do not follow the OLS assumption of an unbounded dependent variable not that important?
Eddy
Problem with bysort command as it is always showing not sorted!
Dear Stata Members
I would like to calculate the industry sales growth (isg) and following is my sample data
Why is Stata repeatedly saying not sorted?
Code:
clear input str9 firm int year byte ind_dum float netsales "4DS.AX" 2015 31 7.70e-06 "PRS.AX" 2015 21 . "HRN.AX" 2015 21 . "VIC.AX" 2015 21 .000139 "PVW.AX" 2015 21 0 "ALX.AX" 2015 23 1.88932 "FBR.AX" 2015 31 .174572 "IPT.AX" 2015 21 .927405 "ODM.AX" 2015 21 .018043 "ABV.AX" 2015 31 5.31421 "SRK.AX" 2015 21 .193432 "KLO.AX" 2015 23 13.2431 "GLA.AX" 2015 21 .009091 "EOS.AX" 2015 31 22.1893 "BLY.AX" 2015 21 735.158 "AGR.AX" 2015 21 .030962 "DRE.AX" 2015 21 .015323 "LLC.AX" 2015 23 10231.6 "NZS.AX" 2015 31 .477879 "TPD.AX" 2015 21 .298108 "HGO.AX" 2015 21 101.487 "AWV.AX" 2015 21 .016317 "ALT.AX" 2015 31 .862368 "IMC.AX" 2015 31 .865704 "SYA.AX" 2015 21 .016271 "TNR.AX" 2015 21 0 "FLC.AX" 2015 31 .00118 "NWC.AX" 2015 21 .034922 "ADR.AX" 2015 31 2.21078 "AAP.AX" 2015 11 2.6729 "HPP.AX" 2015 31 54.7183 "CIO.AX" 2015 31 0 "AGD.AX" 2015 21 62.4951 "BMN.AX" 2015 21 .05778 "TGN.AX" 2015 21 .120552 "PXX.AX" 2015 21 .002889 "RIO.AX" 2015 21 34829 "AJQ.AX" 2015 21 .075345 "PDI.AX" 2015 21 .007142 "PSA.AX" 2015 21 1.613 "GGG.AX" 2015 21 .141135 "CCE.AX" 2015 22 1.25232 "PDZ.AX" 2015 21 .026895 "CRR.AX" 2015 21 .14729 "PPG.AX" 2015 31 187.885 "AQX.AX" 2015 21 .000917 "EUR.AX" 2015 21 .003143 "SVL.AX" 2015 21 .002712 "CCZ.AX" 2015 21 .004715 "FNT.AX" 2015 21 .015054 "LRL.AX" 2015 21 .001579 "TMS.AX" 2015 21 . "SIH.AX" 2015 21 .001418 "APG.AX" 2015 21 . "CNJ.AX" 2015 21 .000855 "EMU.AX" 2015 21 .166837 "XAM.AX" 2015 21 .026918 "FIN.AX" 2015 21 . "CLA.AX" 2015 21 .002042 "GUD.AX" 2015 31 382.961 "DYL.AX" 2015 21 .097517 "DEV.AX" 2015 21 . "AAU.AX" 2015 21 46.6495 "JAL.AX" 2015 21 .012966 "AAR.AX" 2015 21 0 "GLV.AX" 2015 21 .164472 "AZS.AX" 2015 21 .275887 "LCL.AX" 2015 21 .003237 "SFX.AX" 2015 21 .164811 "LEG.AX" 2015 21 .418429 "LCD.AX" 2015 21 .008343 "ADD.AX" 2015 21 0 "BSR.AX" 2015 21 .015278 "BKW.AX" 2015 31 528.38 "NVA.AX" 2015 21 0 "IDA.AX" 2015 21 2.86434 "PPY.AX" 2015 31 .068049 "EQR.AX" 2015 21 .040662 "HE8.AX" 2015 21 . "KRR.AX" 2015 21 .003197 "ARD.AX" 2015 21 . "SES.AX" 2015 31 13.8294 "SAN.AX" 2015 21 .047 "BNR.AX" 2015 21 .021363 "MSV.AX" 2015 21 19.3945 "CEL.AX" 2015 21 .044144 "MPR.AX" 2015 22 41.8527 "DEM.AX" 2015 22 .528381 "AS1.AX" 2015 11 1.25869 "LTR.AX" 2015 21 .383828 "VML.AX" 2015 21 .035893 "WNR.AX" 2015 31 2.20463 "MLM.AX" 2015 21 .552067 "MGT.AX" 2015 21 .056362 "DDD.AX" 2015 21 .00785 "XRF.AX" 2015 31 15.9248 "A3D.AX" 2015 31 . 
"GLN.AX" 2015 21 .021841 "OKJ.AX" 2015 31 .011024 "FWD.AX" 2015 31 210.239 "RIC.AX" 2015 31 699.422 "ZEU.AX" 2015 21 .137385 "RFG.AX" 2015 31 190.477 "MNB.AX" 2015 21 .00235 "MXC.AX" 2015 31 .004538 "GMRDD.AX" 2015 21 .019976 "ERA.AX" 2015 21 253.359 "NXM.AX" 2015 21 0 "WC8.AX" 2015 21 .001402 "CZR.AX" 2015 21 .007365 "ESS.AX" 2015 21 .201089 "AYM.AX" 2015 21 .037542 "MYL.AX" 2015 21 .166144 "KOV.AX" 2015 31 48.5543 "ZGL.AX" 2015 31 92.6665 "AUH.AX" 2015 21 .013012 "VMS.AX" 2015 21 .134612 "BUY.AX" 2015 21 1.49285 "GAP.AX" 2015 31 114.014 "CRB.AX" 2015 21 .065337 "BRI.AX" 2015 31 123.384 "GW1.AX" 2015 21 .098519 "BUX.AX" 2015 21 .011232 "ECS.AX" 2015 31 4.54599 "WSI.AX" 2015 23 .000116 "88E.AX" 2015 21 .010549 "AJY.AX" 2015 21 1.93232 "E25.AX" 2015 21 .334176 "ERL.AX" 2015 21 .003112 "SI6.AX" 2015 21 .055345 "CSL.AX" 2015 31 5628 "ATL.AX" 2015 31 72.7194 "BKL.AX" 2015 31 363.331 "MBK.AX" 2015 21 .004946 "PBX.AX" 2015 21 .038497 "MVL.AX" 2015 21 0 "ANW.AX" 2015 21 .001009 "EGR.AX" 2015 31 .038173 "NWF.AX" 2015 21 .05033 "PEN.AX" 2015 21 .153 "WPL.AX" 2015 21 5030 "DRX.AX" 2015 21 .111577 "SGC.AX" 2015 21 .270732 "GLL.AX" 2015 21 .615548 "IBG.AX" 2015 21 .196452 "EMP.AX" 2015 21 .287859 "SKS.AX" 2015 23 2.71821 "PEK.AX" 2015 21 .029606 "AWN.AX" 2015 22 3.6738 "M7T.AX" 2015 31 .128718 "AUZ.AX" 2015 21 . "TIE.AX" 2015 21 . "EVN.AX" 2015 21 513.053 "COI.AX" 2015 21 .593977 "TON.AX" 2015 21 .000546 "AOP.AX" 2015 21 .016479 "FHS.AX" 2015 21 .000755 "PNV.AX" 2015 31 .10841 "AEV.AX" 2015 21 .550018 "BXB.AX" 2015 31 5440.5 "CGB.AX" 2015 31 .131854 "RSG.AX" 2015 21 381.81 "ASQ.AX" 2015 21 1.22065 "FFI.AX" 2015 31 23.669 "EL8.AX" 2015 21 .289801 "MWY.AX" 2015 31 109.614 "AQI.AX" 2015 21 .01681 "PRX.AX" 2015 21 .302281 "RR1.AX" 2015 21 .005855 "EGY.AX" 2015 31 8.32588 "RNX.AX" 2015 21 .012149 "FML.AX" 2015 21 1.82021 "RPG.AX" 2015 23 0 "CHK.AX" 2015 31 .010146 "MCE.AX" 2015 31 110.995 "AFR.AX" 2015 21 .0705 "HLX.AX" 2015 21 .055592 "GSN.AX" 2015 21 .030639 "VRX.AX" 2015 21 .047102 "GED.AX" 2015 21 .131484 "GOR.AX" 2015 21 .770753 "BMG.AX" 2015 21 .003821 "BLG.AX" 2015 31 2.72176 "GIB.AX" 2015 21 .07533 "ADN.AX" 2015 21 .036463 "ECT.AX" 2015 21 .009676 "ZIM.AX" 2015 21 408.391 "CCJ.AX" 2015 21 .011348 "MAT.AX" 2015 21 1.51972 "WMC.AX" 2015 21 .009707 "BAS.AX" 2015 21 .057037 "TNP.AX" 2015 21 .004053 "TZN.AX" 2015 21 0 "KAR.AX" 2015 21 1.6671 "TTT.AX" 2015 31 0 "CAE.AX" 2015 21 .005971 "AIS.AX" 2015 21 167.395 "STO.AX" 2015 21 2442 "CZN.AX" 2015 21 1.08749 "E79.AX" 2015 21 .074582 "CGN.AX" 2015 21 .04392 "MGX.AX" 2015 21 252.577 "FMG.AX" 2015 21 8574 "AVA.AX" 2015 31 14.371 "RAN.AX" 2015 31 .34746 "CYC.AX" 2015 31 9.1874 "ENR.AX" 2015 21 1.23003 "TOE.AX" 2015 21 .382333 "KPO.AX" 2015 22 .775005 "SYR.AX" 2015 21 .291 "GCR.AX" 2015 21 .030816 "AKN.AX" 2015 21 .031552 "MBH.AX" 2015 31 .007951 "RDN.AX" 2015 21 43.7964 "FAU.AX" 2015 21 .004569 "CHN.AX" 2015 21 .468602 "MML.AX" 2015 21 123.172 "NCR.AX" 2015 21 .396755 "BLU.AX" 2015 21 .161784 "EYE.AX" 2015 31 48.2909 "GLB.AX" 2015 31 106.131 "OBM.AX" 2015 21 .001872 "ARR.AX" 2015 21 .003944 "SKY.AX" 2015 21 .244236 "ICG.AX" 2015 21 .026116 "SGQ.AX" 2015 21 .016047 "INA.AX" 2015 23 58.5664 "STX.AX" 2015 21 2.07699 "AGS.AX" 2015 21 .170589 "TNG.AX" 2015 21 . "OKU.AX" 2015 21 .505905 "CMM.AX" 2015 21 .214625 "PO3.AX" 2015 22 0 "AME.AX" 2015 21 .026317 "NMT.AX" 2015 21 .323205 "BDC.AX" 2015 21 .086285 "MAN.AX" 2015 21 2.41912 "PGM.AX" 2015 21 .013397 "KGM.AX" 2015 21 .007165 "DAF.AX" 2015 21 .001071 "ING.AX" 2015 11 1740.1 "RXM.AX" 2015 21 . 
"KFE.AX" 2015 21 .01047 "PTX.AX" 2015 31 .030793 "ENT.AX" 2015 21 .012742 "GES.AX" 2015 21 .00077 "VAR.AX" 2015 21 .206112 "CYL.AX" 2015 21 .015516 "CMP.AX" 2015 31 25.8045 "SIX.AX" 2015 31 .973876 end
Code:
encode firm, gen(id)
xtset id year
Panel variable: id (strongly balanced)
 Time variable: year, 2015 to 2015
         Delta: 1 unit

by ind_dum year, sort: gen isg = d.netsales / l.netsales
not sorted
r(5);

* I also tried
bysort ind_dum (year): gen isg = d.netsales / l.netsales
not sorted
r(5);
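A hedged sketch of an alternative route: aggregate sales to the industry-year level first, compute the growth rate there, and merge it back. The d. and l. operators follow the xtset id/year declaration, which appears to be why combining them with by ind_dum year trips the "not sorted" error.
Code:
preserve
collapse (sum) ind_sales = netsales, by(ind_dum year)
xtset ind_dum year
gen isg = D.ind_sales / L.ind_sales   // needs more than one year of data, unlike the excerpt
tempfile industry
save `industry'
restore
merge m:1 ind_dum year using `industry', keepusing(isg) nogenerate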
propensity score matching
I want to do 1:2 propensity score matching. I am unable to find a source for this; any help?
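A minimal sketch with hypothetical names (outcome y, treatment dummy treat, covariates x1-x3): official teffects psmatch does 1:k matching on the estimated propensity score via nneighbor().
Code:
teffects psmatch (y) (treat x1 x2 x3), nneighbor(2) atet   // 1:2 nearest-neighbour matching, ATET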
Wednesday, December 29, 2021
set data out of range as missing
Dear statalist,
I have a variable, index, whose value should be between 0 and 1, but there are some values outside this range, so I want to set those values to missing (.).
I tried
Code:
replace index==. if index<0|index>1
but that doesn't work. replace index=. doesn't work either.
It seems I can replace index with any real value, but not with missing (.).
Any help will be greatly appreciated!
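A minimal sketch of the fix: replace uses a single = for assignment, while == is the equality test that belongs inside the if condition.
Code:
replace index = . if index < 0 | index > 1
* equivalently
replace index = . if !inrange(index, 0, 1) & !missing(index)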
Stata forecast does not support mixed effects models
As part of a reviewer's requests/recommendations, I am trying to compare the forecast accuracy of (almost) the same model estimated with fixed effects, random effects, and mixed effects. My data are panel data. I see that forecast supports xtreg but does not support xtmixed. Has anyone succeeded in using Stata to forecast from xtmixed output?
If anyone from Stata is reading, can you briefly explain, under the hood, why forecast does not support xtmixed models? Is this a feature that could be added in a future Stata version? If forecast cannot support xtmixed models for now, what is a workaround for forecasting from xtmixed output in Stata?
Thanks for taking your time reading and/or answering in advance!
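A minimal sketch of a manual workaround, assuming a hypothetical model of y on x1 and x2 with random intercepts by id: fit the model, then build the predictions yourself with predict rather than forecast.
Code:
mixed y x1 x2 || id:
predict yhat_fixed, xb        // fixed-portion prediction (usable out of sample if the x's exist)
predict yhat_fitted, fitted   // fixed portion plus predicted random effects (in sample)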
Endogeneity and RESET Test with PPML
Hello,
I'm trying to estimate trade potential for the BRICS group of countries - as in, intra-BRICS trade potential - using a gravity model and the PPML estimator. I'm generating predicted values, and then calculating trade potential by (predict-actual). My dependent variable is exports (in $USD MN), while my independents are ln(gdp) of exporter and importer, standard gravity covariates and membership in RTAs. I'm using exporter-time and importer-year fixed effects. I run this for aggregate merchandise trade and then replicate it for several products of interest. I'm using COMTRADE and CEPII gravity database to source my data.
I have three questions here:
1. How can I test for endogeneity in the context of the PPML estimator?
2. How does one test for misspecification, for example with a RESET test? As far as I know, RESET tests whether the linear functional form is suitable for the regression. Does it also work when we no longer use OLS (as in my case here)?
3. Should I run my regression on all the relevant products of interest in one combined regression? Will that provide me with some additional insight?
This is my code (after importing dataset):
Code:
egen exporter_year = group(exporter year)
tabulate exporter_year, generate(EXPORTER_YEAR_FE)
egen importer_year = group(importer year)
tabulate importer_year, generate(IMPORTER_YEAR_FE)
drop if exp < 0
ppml exp lngdp_ex lngdp_im lndistw contig comlang_off comcol col45 rta EXPORTER_YEAR_FE* IMPORTER_YEAR_FE*
predict fitted
gen potential = fitted - exp
(252 missing values generated)
Many thanks for your time and attention. Let me know if I should provide some extra information.
Regards,
Saunok
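A hedged sketch of a RESET-type misspecification check that works outside OLS: official poisson with robust standard errors gives the same point estimates as PPML, its linear index is easy to obtain with predict, and the check is simply whether the squared index adds explanatory power (insignificance is reassuring about the functional form).
Code:
poisson exp lngdp_ex lngdp_im lndistw contig comlang_off comcol col45 rta ///
    EXPORTER_YEAR_FE* IMPORTER_YEAR_FE*, vce(robust)
predict xb_hat, xb
gen xb_hat2 = xb_hat^2
poisson exp lngdp_ex lngdp_im lndistw contig comlang_off comcol col45 rta ///
    xb_hat2 EXPORTER_YEAR_FE* IMPORTER_YEAR_FE*, vce(robust)
test xb_hat2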
Charlson index for dataset that contains both ICD9 and ICD10 for different years
Hello,
I am working on a hospital administrative dataset combining multiple years between 2004 and 2018. Cases in the earlier years use ICD-9, while 2016 onward use ICD-10. I am trying to calculate the Charlson comorbidity index for each case in the combined dataset. The available charlson command accommodates either ICD-9 or ICD-10; is there a way I can combine both? The code I used is pasted below.
charlson DX1-DX30 I10_DX1-I10_DX40, index(c) assign0 wtchrl cmorb     *for ICD 9*
or
charlson DX1-DX30 I10_DX1-I10_DX40, index(10) assign0 wtchrl cmorb    *for ICD 10*
I look forward to your help with cheerful optimism.
Olowu
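A hedged sketch of one workaround, assuming a year variable identifies the coding era and that charlson creates the same output variable names in both runs: score each era separately with its own ICD version and variable list, then recombine.
Code:
gen byte icd10_era = (year >= 2016)
preserve
keep if icd10_era == 0
charlson DX1-DX30, index(c) assign0 wtchrl cmorb           // ICD-9 years
tempfile icd9part
save `icd9part'
restore
keep if icd10_era == 1
charlson I10_DX1-I10_DX40, index(10) assign0 wtchrl cmorb  // ICD-10 years
append using `icd9part'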
attempt at simple xtline plot gives syntax error
Hi everybody,
I am using Stata 15 and want to show the oil price development over the time period I am looking at, since it is part of my independent variable.
The data is correctly xtset, and I want (for starters) a simple graph. After all I have read this should work, but it doesn't: Stata gives me an 'invalid syntax' error.
Here is a dataex excerpt of the data:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int id_2 long yrmo int target_sidea double oil_price_t0 float OPS_t . 200312 . 27.85 . . 200912 . 75.48 . . 200304 . 23.43 . . 200907 . 64.96 . . 200305 . 24.25 . . 200905 . 57.39 . . 200401 . 28.68 . . 200906 . 69.21 . . 200307 . 26.62 . . 200303 . 27.42 . . 200311 . 27.5 . . 200309 . 25.26 . . 200302 . 30.19 . . 200908 . 71.32 . . 200911 . 77.62 . . 200308 . 27.61 . . 200310 . 27.13 . . 200306 . 25.49 . . 200910 . 73.27 . . 200903 . 45.57 . . 200904 . 50.18 . . 200301 . 28.05 . . 200909 . 67.9 . 1 200806 27 127.58 .07012443 1 200709 57 73.25 .08620504 1 200405 35 34.47 .10416858 1 200803 28 96.77 .07297191 1 200511 76 51.31 -.04478218 1 200605 221 64.9 .013183548 1 200606 128 65.08 .002769635 1 200802 37 89.96 .031504914 1 200510 105 53.66 -.05228052 1 200609 184 59.77 -.14040913 1 200901 18 44.96 .09244503 1 200807 21 131.22 .02813167 1 200406 52 33.4 -.031533517 1 200702 185 55.68 .06818504 1 200503 118 45.57 .1073863 1 200608 149 68.78 -.003773055 1 200407 54 34.48 .0318235 1 200610 245 56.5 -.05626323 1 200509 96 56.54 -.0014139086 1 200404 43 31.06 .0195065 1 200801 42 87.17 .01642415 1 200811 34 51.38 -.28933507 1 200410 87 37.57 .05751856 1 200701 216 52.01 -.12032206 1 200703 139 59.05 .05876352 1 200412 107 34.26 -.017648337 1 200808 36 113.21 -.14763081 1 200501 162 37.81 .09859517 1 200812 39 40.99 -.22592087 1 200602 145 57.57 -.01277203 1 200707 114 69.45 .0547474 1 200409 12 35.47 -.07650152 1 200505 179 45 -.04561048 1 200508 112 56.62 .06890459 1 200611 136 56.81 .005471728 1 200704 186 63.83 .07783877 1 200804 30 103.46 .06684807 1 200607 172 69.04 .05906874 1 200403 32 30.46 .0703774 1 200604 180 64.05 .10544727 1 200603 138 57.64 .0012151777 1 200506 122 50.97 .12457474 1 200512 29 53.12 .034667812 1 200902 18 43.13 -.04155437 1 200402 14 28.39 -.010163056 1 200507 117 52.85 .036220465 1 200411 231 34.87 -.074579 1 200504 123 47.1 .033023402 1 200706 108 65.75 .01857447 1 200809 30 95.96 -.16531305 1 200810 31 68.62 -.3353474 1 200711 52 86.73 .1171779 1 200601 163 58.31 .09322012 1 200805 32 118.94 .13943411 1 200712 57 85.75 -.011363798 1 200612 124 58.66 .032045674 1 200502 101 40.93 .07928964 1 200408 78 38.29 .10480934 1 200705 123 64.54 .011061858 1 200710 45 77.14 .0517437 1 200708 77 67.2 -.03293378 2 200601 293 58.31 .09322012 2 200604 208 64.05 .10544727 2 200509 185 56.54 -.0014139086 2 200512 75 53.12 .034667812 2 200703 339 59.05 .05876352 2 200610 378 56.5 -.05626323 2 200902 16 43.13 -.04155437 2 200808 29 113.21 -.14763081 2 200510 295 53.66 -.05228052 2 200708 199 67.2 -.03293378 2 200607 278 69.04 .05906874 2 200612 350 58.66 .032045674 2 200704 360 63.83 .07783877 2 200806 48 127.58 .07012443 2 200609 302 59.77 -.14040913 2 200506 149 50.97 .12457474 end format %tm yrmo
It is xtset with:
Code:
xtset id_2 yrmo, monthly
The graph command that fails:
Code:
xtline oil_price_t0, overlay
I'd appreciate any hint.
Best
Marvin
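A hedged guess at the cause, judging from the dataex excerpt: yrmo seems to hold YYYYMM integers (e.g. 200312) rather than elapsed-month %tm dates, and some rows have a missing id_2, either of which can break xtset/xtline. A minimal sketch under those assumptions:
Code:
gen mdate = ym(floor(yrmo/100), mod(yrmo, 100))   // convert YYYYMM to a real monthly date
format mdate %tm
drop if missing(id_2)
xtset id_2 mdate
xtline oil_price_t0, overlay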
Data cleaning - checking correct encoding of variables
Hello,
I'm very new to Stata and am trying to complete some data cleaning. I have a dataset with 5 variables and around 200 million observations. The variables are all numeric, and I would like to check that three of them have been encoded correctly, as they were originally categorical (string) variables. For example, I would like to know if the numerical code captures distinct countries for the country variable (there may be typos in the original categories, for instance).
The original string variables are not available. Stata shows the country names when I browse (via the value labels) but treats the variable as numeric in the data editor. Is there any way to check what the equivalences between the two are?
Thank you in advance for any help you might be able to give me!
Best wishes,
Clara
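A minimal sketch, assuming the encoded variable is called country (a hypothetical name) and carries a value label, of how to inspect the code-to-name mapping without the original strings:
Code:
describe country                            // shows which value label is attached
label list                                  // full code-to-name mapping for every label
labelbook, problems                         // flags suspicious labels (duplicates, inconsistencies)
decode country, gen(country_str)            // recover the names as a string variable
list country country_str in 1/20, nolabel   // spot-check codes against names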
Hausman test: p=value of 1
Hi, I am trying to carry out a Hausman test on panel data (for males and females) to decide whether I should use a fixed-effects or random-effects model. However, when I do this I get a p-value of 1 (which I assume means something has gone wrong). I have tried "sigmamore" and "xtoverid", but neither seems to have any impact. If anyone has any ideas, they would be much appreciated!
Small observation in multinomial logistic regression
Hello
I am trying to find out which region is more attractive for people, so I use region (a categorical variable with five levels) as the dependent variable.
People's characteristics are education (high, low, medium), country of origin (EU, non-EU, US, Africa, Asia), and type (family, student, worker, refugee). All are categorical.
I also aim to interact education with type and origin. But as there are only two low- and medium-educated students and five low-educated EU respondents in one specific region, the relative risk ratios for these groups in the interaction are extremely high, with empty confidence intervals. So I removed them, and after rerunning my MNL the results are much better.
But since, e.g., low-educated students are removed, for students I have two levels of education while for the other groups I have three, so the low-educated student category is zero while high-educated students are also omitted! Basically, how should I exclude some observations in an MNL?
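A minimal sketch, with hypothetical variable and level names, of excluding the sparse cells from estimation with an if condition rather than deleting them from the data:
Code:
mlogit region i.education##i.type i.origin ///
    if !(type == 2 & inlist(education, 1, 2)), rrr baseoutcome(1)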
Weighted least squares
Hello!
I have estimated an ordinary least squares model, and as a sensitivity analysis I also estimated weighted least squares, since I have heteroskedastic standard errors. I wonder what the advantage of WLS is and how I should interpret and compare my two sets of results?
Best regards,
Klaudia
Tuesday, December 28, 2021
Generalized propensity score matching for multilevel treatment
Hi,
I am a first-time user of the generalized propensity score. I have 3 treatment levels that are qualitatively different. I used mlogit and predict commands to estimate three sets of gpscore. But I am not sure how to match the pairs of treatments. I have read in several papers that the researchers used nearest neighbour matching to match the treated and control units. Can I use the same if I have a vector of propensity scores, and can I do it in Stata? If not, is there any other way to match the units based on the gpscores in Stata?
Will appreciate any help on this matter.
Thanks,
Nadia
Generate two new variables in Stata representing the change in net-pay and the change in well-being between june and october
Hi, I'm new to Stata; how can I implement the above statement in Stata?
I would really appreciate any advice on how to solve this in Stata.
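A minimal sketch, assuming hypothetical wide-format variables netpay_june, netpay_oct, wellbeing_june, and wellbeing_oct (one observation per person):
Code:
gen change_netpay    = netpay_oct    - netpay_june
gen change_wellbeing = wellbeing_oct - wellbeing_june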
error: depvar may not be a factor variable
Hey everybody,
I want to run a regression with binary variables but I am getting this error: depvar may not be a factor variable.
I don't know why "no_ma_deals_binary" might not be a factor variable. It only contains 0 and 1.
Here is a dataex excerpt of the two variables:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(no_ma_deals_binary CTO_presence) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 end
I hope someone can help me.
Kind regards,
Jana
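A hedged guess at the cause: this error appears when the dependent variable itself is given a factor-variable prefix. Enter the outcome plainly and keep i. for the regressors only, e.g.:
Code:
logit no_ma_deals_binary i.CTO_presence
* or, as a linear probability model:
regress no_ma_deals_binary i.CTO_presence, vce(robust)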

Create variable using expansion factor
Dear all, I would like to ask a question. In a database I have identified a group of people who should receive a transfer, but I have no idea how to handle this while taking the expansion factors (survey weights) into account. That is, if I give the subsidy to the 10,000 people identified in the sample, the expansion factor makes the amount increase. Do you know of any way to correct for this so that the number of people stays the same?
Best regards and thanks in advance.
Converting Date-Month-Year (string) to year
Dear Stata Members
I have an issue which I know is among the most-answered questions on this forum; I tried some past posts but to no avail. My issue is reproduced below. incorp is the variable from which I need to extract the year: if incorp is 05-May-86, I want the calendar year 1986. How do I do that?
Code:
describe incorp

Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------
incorp          str9    %9s
Code:
list incorp in 1/40 +-----------+ | incorp | |-----------| 1. | | 2. | | 3. | | 4. | | 5. | | |-----------| 6. | | 7. | | 8. | | 9. | | 10. | | |-----------| 11. | | 12. | | 13. | | 14. | | 15. | | |-----------| 16. | | 17. | | 18. | | 19. | | 20. | | |-----------| 21. | | 22. | | 23. | 05-May-86 | 24. | 05-May-86 | 25. | 05-May-86 | |-----------| 26. | 05-May-86 | 27. | 05-May-86 | 28. | 05-May-86 | 29. | 05-May-86 | 30. | 05-May-86 | |-----------| 31. | 05-May-86 | 32. | 05-May-86 | 33. | 05-May-86 | 34. | 05-May-86 | 35. | 05-May-86 | |-----------| 36. | 05-May-86 | 37. | 05-May-86 | 38. | 05-May-86 | 39. | 05-May-86 | 40. | 05-May-86 | +-----------+
I tried using
Code:
generate numyear = date(incorp, "MDY")
(408,405 missing values generated)
but everything comes back missing:
Code:
. mdesc numyear

    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
        numyear |     408,405        408,405          100.00
----------------+-----------------------------------------------
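A minimal sketch of the fix: the string is day-month-year, so the mask should be "DMY", and because the year has only two digits a topyear is needed so that "86" is read as 1986 rather than 2086 (2020 below is an assumption; pick whatever cutoff suits your data).
Code:
generate numdate = date(incorp, "DMY", 2020)
format numdate %td
generate year_incorp = year(numdate)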
Showing a different mean in tabstat
I tabulated the error cases in my data by the number of children. After requesting sum and N along with mean in tabstat, I realized this is not the mean calculation I want.
Code:
tabstat b_err, by( kidsnum) stat(sum N mean)
Ideally, I need the mean calculated relative to the total number of error cases, which is shown by the overall sum. So, for example, for the no-children category it should be 144/445 instead of 144/818 as calculated here. How can I adjust this?
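A hedged sketch of that calculation: divide each group's total number of error cases by the overall total, rather than by the group's number of observations.
Code:
egen total_err = total(b_err)
egen group_err = total(b_err), by(kidsnum)
gen  err_share = group_err / total_err        // e.g. 144/445 for the no-children category
tabstat err_share, by(kidsnum) stat(mean)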
Categorical Variables
Hi, I'm an ultra-beginner to Stata and everything related to it. Nevertheless, I have to fix this problem.
I work on a large-scale assessment with 42,000 cases. I have 5 self-concept variables, one for each subject (e.g. sskmat, sskdeu), and 5 grade variables, one for each subject (e.g. tnotemat, tnotedeu), each with 6 grade levels (1 is bad; 6 is very good), plus the dummy variable female.
In a first step I basically just need two grade groups (a German-and-Math-high group and a Math-high group) and their respective self-concept means for female and non-female students. I z-standardized the variables as well, but didn't use them for this analysis so far because I needed the "real" grades (due to a lack of skills).
So far I have managed it this way, but it's not really convenient. I would also need a Cohen's d (I guess?):
mean sskmat if tnotemat>=5 & tnotedeu==4 & female
mean sskmat if tnotemat>=5 & tnotedeu>=5 & female
How can I do this properly?
Thanks a million!!!
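A hedged sketch, using the variable names from the post, of getting the group means and a Cohen's d in one place (the grade cutoffs below just mirror the commands above):
Code:
* mean math self-concept in the "high in both subjects" group, separately by gender
mean sskmat if tnotemat >= 5 & tnotedeu >= 5, over(female)
* Cohen's d for the female vs. non-female difference within that group
esize twosample sskmat if tnotemat >= 5 & tnotedeu >= 5, by(female) cohensd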
Monday, December 27, 2021
Ho, ho, ho! Wild bootstrap tests for everyone!
I've translated the guts of my boottest program from Mata to Julia, producing the Julia package WildBootTests.jl. And at the page just linked, I've posted examples of calling the package from Julia, R, Python, and Stata. The Stata example is convoluted: using Stata 16 or 17, you go into Python. From there you use the Python package PyJulia to link to Julia. For it to work, you need to have installed both Python and Julia, along with PyJulia in Python and WildBootTests in Julia.
While I doubt this will be of much use to Stata users (boottest is damn fast and easier to use), I think the project is interesting in a number of ways:
- It offers a model of cross-platform development of statistical software. One need invest in building a feature set and optimizing code just once. Tailored front-ends can be written for each platform. In fact Alexander Fischer is writing one for R.
- Julia promises the plasticity of Python and the speed of C, roughly speaking, by way of just-in-time compilation. In my experience, fully realizing this potential takes a lot of work, at least when climbing learning curves. You have to dig into the innards of type inference and compilation and stare at pages of arcane output (from @code_warntype or SnoopCompile). Partly that is a comment on the immaturity of Julia. It has the well-recognized problem of long "time to first plot," meaning that there's a long lag the first time a package is used in a session. On my machine, the wildboottest() function often takes 12 seconds to run the first time, and it was a struggle to get it that low.
- Nevertheless, the promise is real. A programmer can achieve much higher performance than with Mata, yet without having to bother with manually compiling code for multiple operating systems and CPUs, the way you do with C plug-ins for Stata. An example below shows WildBootTests 10x faster than boottest even when calling from Stata.
- Julia could be more directly integrated into Stata, making the link easier and more reliable. I've already suggested that Stata corp do this, the way they have for Python. Or maybe a user could lead the way, as James Fiedler did for Python.
Code:
infile coll merit male black asian year state chst using regm.raw, clear
qui xi: regress coll merit male black asian i.year i.state, cluster(state)
generate individual = _n  // unique ID for each observation

timer clear
timer on 1
boottest merit, nogr reps(9999) bootcluster(individual)  // subcluster bootstrap
timer off 1

timer on 2
mat b = e(b)[1,1..colsof(e(b))-1]  // drop constant term
global vars: colnames b  // get right-side variable names
python
from julia import WildBootTests as wbt
import numpy as np
from sfi import Data
R = np.concatenate(([1], np.zeros(`=colsof(b)'))).reshape(1,-1)  # put null in Rβ = r form
r = np.array([0])
resp = np.asarray(Data.get('coll'))  # get response variable
predexog = np.c_[np.asarray(Data.get('$vars')), np.ones(resp.size)]  # get exogenous predictor variables + constant
clustid = np.asarray(Data.get('individual state')).astype(int)  # get clustering variables
test = wbt.wildboottest(R, r, resp=resp, predexog=predexog, clustid=clustid, nbootclustvar=1, nerrclustvar=1, reps=9999)  # do test
wbt.teststat(test)  # show results
wbt.p(test)
wbt.CI(test)
end
timer off 2
timer list
On my machine, I get
Code:
. timer list
   1:     22.64 /        1 =      22.6360
   2:      2.10 /        1 =       2.1040
...meaning the new version is 10x faster.
One source of speed-up is that by default wildboottest() does all computations in single precision (32-bit floats) rather than double, something that is not possible in Mata, but I think is typically fine for a bootstrap-based test.
How to calculate Sigma (standard deviation of weekly stock return)
Hello all Statalist members.
I would like to calculate SIGMA, the standard deviation of weekly stock returns in year T, based on weekly stock return data. When I group by the weekly dates I get missing values (weekReturn1_sd in the data below), but when I group by year I get the values I am after (weekReturnVolatility_yearlynwe, e.g. .0355006 for 2002). I need some suggestions on how to get the sigma variable from the following data. These are the commands I tried:
Code:
by code week_start year , sort : egen float weekReturn1_sd= sd(wretwd)
Code:
by code year, sort : egen float weekReturnVolatility_yearlynwe= sd(wretwd)
A friend also recommended the command below, but it gives me a "not sorted" error. Please guide me on which Stata command I can use to calculate sigma.
Code:
by code week_start: gen sigma = sd(wretwd)
Here is an excerpt of the data:
Code:
[Data listing: weekly observations for stock code 2 from January 2002 through December 2004, with variables code, trdwnt, wretwd, week_start, week_end, count, weekReturn1_sd, and weekReturnVolatility_yearly.]
entropy balancing commands
Dear all,
I hope you are well.
I am using the entropy balancing method for PSM. I have read the paper by Hainmueller and Xu (2013) and follow it to set up the model. However, I need to make sure that I am doing it the right way (using the right commands).
Here are my commands:
If I am wrong, please kindly guide me to do it the right way.
Code:
ebalance treat Control variables, targets(1)
svyset [pweight=_webal]
svy: reg Dependent Var treat Control variables
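For reference, a minimal end-to-end sketch of that workflow, assuming the community-contributed ebalance package (ssc install ebalance) and hypothetical variable names y (outcome), treat (treatment dummy), and x1 x2 x3 (covariates):
Code:
* entropy balancing on the first moments of the covariates
ebalance treat x1 x2 x3, targets(1)          // creates the weight variable _webal

* weighted outcome regression using the balancing weights
svyset [pweight=_webal]
svy: regress y treat x1 x2 x3

* a simpler equivalent without -svyset-
regress y treat x1 x2 x3 [pweight=_webal], vce(robust)
The two regressions at the end should give the same point estimates; the svy: route mainly changes how the standard errors are computed and labeled.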
Generate a new variable based on the values of a second variable
Hi,
I'm trying to generate a continuous variable in a comparative dataset: for each country, I want the share of highly educated respondents within that country, so that the countries can be sorted on it. Does anyone know which command is best to use? I'm a bit confused - sorry for a beginner question :-)
All the best,
Marcus
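A minimal sketch of one way to do this, assuming hypothetical variable names country (country identifier) and high_educ (a 0/1 indicator for being highly educated):
Code:
* share of highly educated respondents within each country
bysort country: egen share_high_educ = mean(high_educ)

* optional: one rank per country based on that share (1 = lowest share)
egen tagged = tag(country)
egen rank_tmp = rank(share_high_educ) if tagged, unique
bysort country (rank_tmp): gen country_rank = rank_tmp[1]
drop tagged rank_tmp
If sorting is all that is needed, gsort -share_high_educ also orders the data from the highest to the lowest country share.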
Time fixed effect and State fixed effect
Dear all,
I am pretty new to Stata. I am currently working on my master's thesis: I want to investigate the effect of Vehicle Miles Travelled (VMT) on PM2.5 concentrations, using a panel dataset of VMT and PM2.5 for US cities from 2012 to 2016. I am struggling with the Stata commands for a time fixed effect and an entity fixed effect at the "State" level. I have tried -xtset ID STATE_ID_N-, where the variable "ID" is the id of each US city and "STATE_ID_N" is the state in which the city is located. However, Stata reports "repeated time values within panel". How can I deal with this, and what are the correct commands to
(1) regress VMT on PM2.5 with Time fixed effect,
(2) regress VMT on PM2.5 with Entity fixed effect at "State" level, and
(3) regress VMT on PM2.5 with both Time and Entity fixed.
Thank you
Rus
Here is my dataset (attachment).
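A minimal sketch of what the three specifications could look like, assuming hypothetical variable names pm25, vmt, and year, and treating PM2.5 as the dependent variable (since the stated aim is the effect of VMT on PM2.5); the panel should be declared as city by year, not city by state:
Code:
* declare the panel as city x year; this avoids "repeated time values within panel"
xtset ID year

* (1) time fixed effects only
regress pm25 vmt i.year, vce(cluster ID)

* (2) entity (state) fixed effects only
regress pm25 vmt i.STATE_ID_N, vce(cluster STATE_ID_N)

* (3) both time and state fixed effects
regress pm25 vmt i.year i.STATE_ID_N, vce(cluster STATE_ID_N)
With many cities, the fixed effects could also be absorbed with areg or the community-contributed reghdfe instead of listing them as dummies.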
DID two way fixed effects
Dear Statalist Members,
PS: I have attached a small portion of my data.
Using repeated cross-sectional data (Demographic and Health Survey, rounds 2008, 2013, and 2018), I am trying to estimate the effect of a refugee shock (influx of refugees) on children's health status. I appended the DHS 2008, 2013, and 2018 rounds.
The health indicator of children's health is Height-for-age-Z-Score (HAZ).
*The refugee inflow to the country of interest (the treatment) started in 2011.
*The country of interest has 81 cities, and the arrivals of the refugees increased year by year.
*For these 81 cities, I have population data from 2008 to 2018, and the number of refugees in each city from 2013 to 2018. Thus, I have created a refugee ratio (Refugee Ratio = Number of Refugees in city c in year y / Population of city c in year y). Since the number of refugees in the cities is only available from 2013 onwards, the refugee ratio takes the value zero before 2013. I have merged the refugee ratio (ranging from 0 to 1) with the child's city of residence so that I can tell whether child i is exposed to a refugee shock in the survey year.
Please note that each DHS round includes only children born in the 5 years preceding the survey year. To be more precise:
*DHS 2008 only includes children born between 2003 and 2008. I will use the DHS-2008 to look at the placebo effect.
*DHS-2013 includes children born between 2008-2013.
*DHS-2018 includes children born between 2013-2018.
I am having trouble creating the "time" and "treatment" variables, and hence the DID variable (treatment*time).
In DHS, I can see the children's year of birth.
1) For the "time" variable, I run this code:
replace time=1 if child_birth_year>=2011
replace time=0 if child_birth_year<=2010. But this is problematic because it is like assuming that those born before 2010 are never treated. But they may be exposed to the presence of refugees later in life. That is way I need to consider the Two Way FE model which is a generalization of DID.
2) My model should look like this:
HAZ_{ihct} = \beta (Refugees/Population)_{ct} + \alpha_c + \alpha_t + \epsilon_{ihct}
where i indexes the individual in household h, in city c, at time t. The treatment variable could also be lagged, i.e. (Refugees/Population)_{c,t-1}.
My question is how to define the timing of the treatment. I would start by the year of birth. But I cannot define it properly. I would be more than happy if you can help me.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str15 CASEID double haz06 float child_birth_year byte(ref_ratio2008 ref_ratio2009 ref_ratio2010 ref_ratio2011 ref_ratio2012) double(ref_ratio2013 ref_ratio2014 ref_ratio2015 ref_ratio2016 ref_ratio2017 ref_ratio2018) float survey_year
[observation rows omitted: CASEID, haz06, child_birth_year, the yearly refugee ratios for 2008-2018, and survey_year for a sample of children from the 2008, 2013, and 2018 DHS rounds]
end
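A minimal sketch of the two-way fixed-effects specification described above, assuming hypothetical variable names haz (outcome), ref_ratio (the refugee ratio matched to the child's city, lagged if desired), city, and survey_year:
Code:
* two-way FE with a continuous treatment, clustering at the city level
regress haz ref_ratio i.city i.survey_year, vce(cluster city)

* the same model with the community-contributed reghdfe (ssc install reghdfe)
reghdfe haz ref_ratio, absorb(city survey_year) vce(cluster city)
Whether the time dimension is the survey year or the child's year of birth is a modeling choice; the key requirement is that the refugee ratio varies at the city-year level matched to each child.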
Export codebook descriptives to LaTeX
Hello, I have a very simple query that unfortunately I am unable to resolve.
I am trying to export my descriptive statistics to a LaTeX table. I can easily generate the descriptives with the codebook and sum commands, but I am having trouble understanding how to export them.
Does anyone know how I can do this?
Here is a replication of the commands:
Code:
sysuse auto, clear
codebook make price mpg foreign, compact
sum make price mpg foreign
In sum, I would like a LaTeX table that replicates the codebook table, but I would also like to include the standard deviation among the summary statistics.
Thanks a lot for your help.
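One common route is the community-contributed estout package, which can post summary statistics as an estimation result and write them to a .tex file. A minimal sketch (assuming ssc install estout has been run; the option choices are illustrative):
Code:
sysuse auto, clear

* post the summary statistics, then write them to a LaTeX table
estpost summarize price mpg foreign
esttab using descriptives.tex, replace ///
    cells("mean(fmt(2)) sd(fmt(2)) min max count") ///
    nomtitle nonumber label booktabs
The resulting descriptives.tex file can then be \input{} directly into a LaTeX document.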
Control variables in interaction function
Hello fellow Stata users!
I have a more theoretical question rather than a question about how to use Stata. I was wondering whether it is okay to add control variables to a model that includes an interaction term?
best wishes,
Klaudia
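In case a concrete example helps: in Stata, controls simply enter the model alongside the interaction. A minimal sketch with hypothetical variables y, x, z (the interacted pair) and w1 w2 (controls):
Code:
* interaction plus controls, using factor-variable notation
regress y c.x##c.z w1 w2, vce(robust)

* marginal effect of x at selected values of z, averaging over the controls
margins, dydx(x) at(z = (0 1 2))
The ## operator includes both main effects and the interaction, which keeps the marginal effects interpretable.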
Suspected bug in -confirm-
The title is self-explanatory. Here is a reproducible example:
Code:
clear
sysuse auto
confirm variable , exact
There is no error where I think there should be one.
I think the above should be equivalent to
Code:
clear
sysuse auto
confirm variable
but it is not.
In case you are wondering, this is relevant in situations where you pass to confirm a local macro that might contain a variable name (or a variable list) but might also be empty. Something like
Code:
confirm `could_be_a_varname_or_empty' , exact
I am (still) using Stata 16.1, fully updated, on Windows 11.
Edit:
Here is my workaround that I would like to share:
Code:
novarabbrev confirm variable ...
cmp-random effects
Good day,
I am trying to estimate the impact of health status expectations (Exp_h) on consumption expectations. I fitted a conditional mixed process model with -cmp-, because the model is nonlinear and I have reason to think that the regressor Exp_h (which is binary) is endogenous.
Then, I tried to estimate the model allowing for random effects.
Code:
cmp (cons = Exp_h x2 x3 || id:) (Exp_h = z1 z2 x2 x3 || id:), ind($cmp_oprobit $cmp_probit) cl(id) cov(indep unstruct)
cons = ordinal dependent variable
Exp_h = endogenous dummy variable
z1 z2 = dummy instrumental variables
I get the results of the first- and second-stage equations, and finally this last table, which I have difficulty interpreting.
Any suggestions would be welcome. Thank you very much.
Code:
Random effects parameters          Estimate   Std. Err.   [95% Conf. Interval]
Level: id
  Cons: Standard deviations
    _cons                          .6071207   .0526769    .5121775   .7196637
  Exp_h: Standard deviations
    _cons                          .8759736   .0612121    .7638533   1.004551
Level: Observations
  Standard deviations
    Cons                           1          (constrained)
    Exp_h                          1          (constrained)
  Cross-eq correlation
    Cons, Exp_h                    .4120378   .0875992    .2272259   .5682024
How to set the X axis in the DASP package in Stata
I want to use the DASP plug-in in Stata and the Lorenz curve to analyze the fairness of the distribution of inter-provincial health resources. The vertical axis is set to health resources, but how do I set the horizontal axis? I want to set it to the cumulative percentage of population, the cumulative percentage of geographic area, or the cumulative percentage of GDP - how do I do that? I also don't quite understand the meaning of the ranking variable.
Thank you very much
Sunday, December 26, 2021
Business calendar: Get omitted dates using bofd()
I have a list of dates for specific company events of listed US companies. This variable (reguldate) may include weekends as well as holidays.
For each of these companies, I have a dataset containing daily stock price data, from which I created a business calendar:
Code:
bcal create biscal, from(date) replace
I then transformed the regular event dates using:
Code:
gen businessdate = bofd("biscal", reguldate)
The help file states:
"Function bofd() returns missing when the date does not appear on the specified calendar."
However, if reguldate is a non-trading date (according to the business calendar I created), I would like to get the previous trading date instead of a missing value, so something like businessdate - 1.
Is there a way to include those values that are omitted according to my .stbcal file?
Thanks
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int regulardate float businessdate
14607 104
14570  79
14571  80
14574  82
14572  81
14577  83
14578  84
14579  85
14578  84
14579  85
14577  83
14580  86
14580  86
14580  86
14585  89
14584  88
14573   .
14584  88
14585  89
14585  89
end
format %tdDD/NN/CCYY regulardate
format %tbbiscal businessdate
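One way to get the behaviour described (fall back to the previous trading day when the event date is not on the calendar) is to step the date back until bofd() finds it. A minimal sketch, assuming the biscal calendar and the variables from the post, and that a trading day occurs within a week of any event date:
Code:
* start from the raw event date and map it to the business calendar
gen evdate = reguldate
gen businessdate2 = bofd("biscal", evdate)

* for non-trading dates, roll back one calendar day at a time and retry
forvalues k = 1/7 {
    replace evdate = evdate - 1 if missing(businessdate2)
    replace businessdate2 = bofd("biscal", evdate) if missing(businessdate2)
}
format businessdate2 %tbbiscal
Dates that fall outside the range covered by the .stbcal file will still end up missing, so those cases are worth checking separately.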
Optimal lags and Error-correction model
Good morning everyone;
I want to perform a cointegration with the Error correction model.
Everything works fine. My two series, with optimal lags 1 and 2, are integrated of the same order, I(1): they are stationary in first differences.
However, here is the code I use to choose the optimal lag order for the first-differenced series:
Code:
varsoc dloga
* Optimal lag = 2 by HQIC, SBIC, AIC
varsoc dlogb
* Optimal lag = 0 by HQIC, SBIC, AIC
As you can see, the optimal lag for the first difference of the second variable, b, is 0 under all criteria. I wanted to know whether this will be a problem later when estimating the coefficients of the error-correction model.
Thanks in advance.
Pita
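For orientation, a minimal sketch of the Johansen/VECM route with Stata's built-in commands, assuming loga and logb are the two tsset series; the lag and rank choices here are purely illustrative:
Code:
* choose the lag order of the VAR in levels
varsoc loga logb

* test for the cointegration rank (Johansen)
vecrank loga logb, lags(2) trend(constant)

* estimate the vector error-correction model if rank = 1
vec loga logb, lags(2) rank(1)
A lag of 0 selected for one differenced series taken on its own is not necessarily a problem; what usually matters for the VECM is the lag order chosen for the system in levels.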
How to evaluate the following hypotheses?
Hello all!
I am currently working on a research project and I have a dataset of bankrupt and non-bankrupt European companies.
First of all, before I can conduct this research, I have to think about the methodology I will use to evaluate the hypotheses.
My knowledge of Stata is still limited; any help will be highly appreciated.
I need to construct a default prediction model using a number of variables.
I use R&D as the independent variable and some control variables (age, size, leverage, liquidity, Z-score, industry, and acquisitions).
The model I use is the logit model.
My first basic hypothesis is: R&D spending has a positive impact on the prediction of bankruptcy because it improves the financial performance of a firm.
The regression specification that I use is: Pr(failure = 1) = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + error term
- X1 = R&D
- X2 = Leverage
- X3 = Size
- X4 = Liquidity
- X5 = Z-score
- X6 = age
- X7 = industry
The regression specification that I use is: Pr(failure = 1) = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + error term
- X1 = R&D
- X2 = Size
- X3 = Age
- X4 = Liquidity
- X5 = Z-score
- X6 = leverage
- X7 = industry
By not controlling for size and age, I can deduce whether the prediction model improves by looking at the significance of the variables and the adjusted R-squared of the model. I'm not sure about this yet; can someone confirm, please?
The final hypothesis: There is a non-linear U-shaped relationship between R&D spending and the probability of bankruptcy.
Here I include a quadratic term and check whether the quadratic term of R&D is significant. Is this a correct method? In this hypothesis, age and size will be control variables as in the original specification.
The regression specification that I use is: Pr(failure = 1) = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + error term
- X1 = R&D
- X2 = R&D2
- X3 = Leverage
- X4 = Size
- X5 = Liquidity
- X6 = Z-score
- X7 = age
- X8 = industry
Kind regards,
Chun H
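A minimal sketch of how the baseline and the U-shape specifications could be coded, assuming hypothetical variable names failure (0/1), rd, leverage, size, liquidity, zscore, age, and industry:
Code:
* baseline logit for the bankruptcy indicator
logit failure rd leverage size liquidity zscore age i.industry, vce(robust)
estat ic                         // AIC/BIC for comparing specifications

* U-shape check: add the squared R&D term via factor-variable notation
logit failure c.rd##c.rd leverage size liquidity zscore age i.industry, vce(robust)
estat ic
A significant coefficient on c.rd#c.rd is only a necessary condition for a U shape; it is usually recommended to also check that the implied turning point lies inside the observed range of R&D (or to use a formal test such as the community-contributed utest).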
Problem with parallel: parallelize a loop
Hello to all,
I am not very strong in Stata, but I wrote a program. I wanted to parallelize a loop, but while searching I did not find a command that parallelizes a loop directly, so I wrapped the loop in a program that I then parallelized.
The objective is to run the regression on several sub-samples selected with the variable "rp_partner_rank".
Then I need to retrieve information about my variable of interest, lag_fdi_in_all, and store it in variables.
Code:
program define savereg
    qui {
        sum rp_partner_rank
        forvalues i = `r(min)' (1) `r(max)' {
            xi: reg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder i.year i.pccountry i.rpcountry if rp_partner_rank <= `i', vce(cl id)
            lincom _b[lag_fdi_in_all], l(95)
            replace beta = r(estimate) in `i'
            replace se = r(se) in `i'
            replace i_pos = `i' in `i'
            replace lb_95 = r(lb) in `i'
            replace ub_95 = r(ub) in `i'
            lincom _b[lag_fdi_in_all], l(90)
            replace lb_90 = r(lb) in `i'
            replace ub_90 = r(ub) in `i'
            lincom _b[lag_fdi_in_all], l(99)
            replace lb_99 = r(lb) in `i'
            replace ub_99 = r(ub) in `i'
        }
    }
end
preserve
drop beta se i_pos lb_* ub_*
g i_pos = .
g beta = .
g se = .
g lb_90 = .
g lb_95 = .
g lb_99 = .
g ub_90 = .
g ub_95 = .
g ub_99 = .
parallel prog(savereg): savereg
*parallel: savereg
*parallel do "$Code\saveregdo.do"
*drop if beta==.
keep beta se i_pos lb_* ub_*
parallel append, do(savereg) prog(savereg) e(3)
twoway rarea ub_90 lb_90 i_pos , astyle(ci) || ///
    line beta i_pos
save "$DataCreated\asup", replace
restore
But the code doesn't work. I have tested several approaches, including the -parallel do- command with the loop saved in a do-file, but the results are not what I expect. How should I parallelize this loop? Please, can you help me?
Here are the error results:
Code:
. parallel prog(savereg): savereg
--------------------------------------------------------------------------------
Parallel Computing with Stata (by GVY)
Clusters    : 4
pll_id      : l9d9vacs17
Running at  : G:\Etude et biblio\Université\tours\Master2 IE\Econometrie avancee\memoire\donnees\Code
Randtype    : datetime
Waiting for the clusters to finish...
cluster 0002 Exited with error -199- while running the command/dofile (view log)...
cluster 0003 Exited with error -199- while running the command/dofile (view log)...
cluster 0004 Exited with error -199- while running the command/dofile (view log)...
cluster 0001 Exited with error -199- while running the command/dofile (view log)...
--------------------------------------------------------------------------------
Enter -parallel printlog #- to checkout logfiles.
--------------------------------------------------------------------------------
.
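Not a fix for -parallel- itself, but if the goal is simply to collect the coefficient and confidence bounds for each sub-sample, one alternative sketch is to post the results with -postfile- instead of writing them into the data in memory (variable and macro names are taken from the post; untested on the actual data):
Code:
tempname results
postfile `results' i_pos beta se lb_95 ub_95 using betas.dta, replace
quietly summarize rp_partner_rank
forvalues i = `r(min)'(1)`r(max)' {
    quietly regress rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder ///
        i.year i.pccountry i.rpcountry if rp_partner_rank <= `i', vce(cluster id)
    quietly lincom _b[lag_fdi_in_all], level(95)
    post `results' (`i') (r(estimate)) (r(se)) (r(lb)) (r(ub))
}
postclose `results'
use betas.dta, clear
This avoids the replace ... in `i' bookkeeping; the 90% and 99% bounds can be added with two more lincom calls and extra columns in the postfile.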