I am writing the following code:
local variables= "taxratesales taxrateVAT"
estpost tabstat `variables', statistics(mean p50 sd min max n) columns(statistics)
esttab using temp, replace cells("Mean Median SD Min Max N") nomtitle nonumber
In the Stata window it shows me the following output:
Summary statistics: mean p50 sd min max count
for variables: taxratesales taxrateVAT
| e(mean) e(p50) e(sd) e(min) e(max) e(count)
-------------+------------------------------------------------------------------
taxratesales | 8.288876 8 4.568087 0 115 88935
taxrateVAT | 7.30818 5 4.8678 0 65 93709
. esttab using temp, replace cells("Mean Median SD Min Max N") nomtitle nonumber
(note: file temp.txt not found)
(output written to temp.txt)
But when I open the .txt file that is generated, I see the following:
------------------------------------------------------------------------------------------
Mean Median SD Min Max N
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
N 85992
------------------------------------------------------------------------------------------
Can anyone explain why this happened?
Thanks
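A hedged sketch of one possible fix, based on the estout package's documented behaviour: -esttab-'s cells() option expects the names of the statistics stored by -estpost tabstat- (the e(mean), e(p50), ... shown above), while the display headings go in collabels(). Something along these lines may produce the intended table:
Code:
estpost tabstat taxratesales taxrateVAT, statistics(mean p50 sd min max n) columns(statistics)
esttab using temp.txt, replace cells("mean p50 sd min max count") ///
    collabels("Mean" "Median" "SD" "Min" "Max" "N") nomtitle nonumber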
Tuesday, April 30, 2019
marginal effect of latent class logit model
I am trying to obtain the marginal effect of the membership variable in a latent class logit model.
For example, if age is higher, what would be the probability of being in class 1 and in class 2? I can't find any command for this.
Propensity score matching
Hi all,
I am currently trying to run PSM in Stata 14 to compare the effect of treatment 1 vs treatment 2 (control). I have a few questions before attempting the analysis:
1. How do I start the matching? Or am I right to say the teffects command would match treatment and control groups?
2. To work out the standardised difference, I presume I have to work out the means for each covariate prior to matching and then run the pstest command to obtain the means and standardised bias/difference post-matching?
3. After the analysis, for example with teffects, an unmatched group is displayed. However, I am more interested in the control group. Are they the same? If not, how do I obtain that?
4. How do I obtain the number of observations in both treatment and control groups post-matching?
Sorry for all the questions. I am relatively new to Stata, so I would appreciate it if you could help guide me with this.
Regards
Ken
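A hedged sketch of one way to start, with hypothetical variable names (outcome, treat, x1-x3): -teffects psmatch- does the matching itself, -tebalance summarize- reports standardised differences before and after matching, and e(sample) identifies the observations used.
Code:
teffects psmatch (outcome) (treat x1 x2 x3), atet
tebalance summarize          // standardised differences, raw vs matched (see its help if it complains after psmatch)
tabulate treat if e(sample)  // observations used, by treatment group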
Understanding Stata's default yscale() choice
Hi everyone,
I am trying to find out how Stata chooses the exact default range of the y-axis. The article "Stata tip 23: Regaining control over axis range" (The Stata Journal (2005), 5, Number 3, pp. 467-468) notes that "to determine the range of an axis, Stata begins with the minimum and maximum of the data. Then it will widen (but never narrow) the axis range as instructed by range(). Finally, it will widen the axis if necessary to accommodate any axis labels."
Does anyone have additional insights on how Stata determines how much white space to leave below the minimum and above the maximum respectively for a simple plot command without additional user-specifications?
Many thanks,
Christina
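Not an answer on the internals, but a hedged illustration of taking explicit control, using the auto data shipped with Stata: the final extent is driven by the data range, any range() widening, and the ylabel() values.
Code:
sysuse auto, clear
scatter mpg weight, yscale(range(10 45)) ylabel(10(5)45)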
Error message on assigning different methods of obtaining start values for latent class analysis
Hi all,
I am running a 3-class latent class model on 17 categorical indicators and keep getting error messages. The total sample size is 2771.
Initially I ran the following code:
gsem (gn10_category gn11_category gn12_category gn13_category gn16_category gn18_category gn19_category gn20_category gn21_category gn22_category gn23_category gn25_category gn27_category gn39_category gn40_category gn41_category gn44_category <-) if complete == 1, ologit lclass(C 3)
and got the error message "initial value not feasible". I then went back to the Stata manual ([SEM] intro 12 — Convergence problems and how to solve them) and learned to use the -startvalues()- option to assign a different method for obtaining initial values. So I typed:
gsem (gn10_category gn11_category gn12_category gn13_category gn16_category gn18_category gn19_category gn20_category gn21_category gn22_category gn23_category gn25_category gn27_category gn39_category gn40_category gn41_category gn44_category <-) if complete == 1, ologit lclass(C 3) startvalues(iv)
But I got the error message "option startvalues() invalid; method iv not allowed". I cannot spot the syntax error in my code and would greatly appreciate any clues to help resolve the issue.
Many thanks in advance.
Mengmeng
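A hedged sketch of an alternative worth trying: for latent class models the set of startvalues() methods differs from other gsem models, and random class assignment with several draws is one documented choice (please check [SEM] intro 12 and the gsem documentation before relying on this).
Code:
gsem (gn10_category gn11_category gn12_category gn13_category gn16_category gn18_category gn19_category gn20_category gn21_category gn22_category gn23_category gn25_category gn27_category gn39_category gn40_category gn41_category gn44_category <-) ///
    if complete == 1, ologit lclass(C 3) startvalues(randomid, draws(5) seed(15))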
Adding controls, insignificant treatment var turns significant
Dear friends,
I am running a fixed-effects regression to test the effect of a policy on firms' patent applications. The outcome variable is the number of patents.
I first run the simple model with FEs but without any controls:
Code:
Conditional fixed-effects Poisson regression Number of obs = 47,536
Group variable: firm_id Number of groups = 7,670
Obs per group:
min = 2
avg = 6.2
max = 9
Wald chi2(9) = 469.47
Log pseudolikelihood = -36500.224 Prob > chi2 = 0.0000
(Std. Err. adjusted for clustering on firm_id)
----------------------------------------------------------------------------------
| Robust
application_num | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
treated | -.0541731 .2172701 -0.25 0.803 -.4800147 .3716685
|
application_year |
1999 | .5463735 .1321019 4.14 0.000 .2874586 .8052884
2000 | 1.042662 .2174868 4.79 0.000 .6163956 1.468928
2001 | 1.35629 .3345565 4.05 0.000 .7005712 2.012008
2002 | 2.180516 .3422223 6.37 0.000 1.509772 2.851259
2003 | 2.626014 .3407459 7.71 0.000 1.958164 3.293864
2004 | 2.883888 .3663026 7.87 0.000 2.165948 3.601827
2005 | 3.217897 .3785679 8.50 0.000 2.475918 3.959877
2006 | 3.594803 .4207512 8.54 0.000 2.770146 4.41946
----------------------------------------------------------------------------------
Here the Treated variable is statistically insignificant.
I then add controls:
Code:
Conditional fixed-effects Poisson regression Number of obs = 45,948
Group variable: firm_id Number of groups = 7,583
Obs per group:
min = 2
avg = 6.1
max = 9
Wald chi2(13) = 1555.79
Log pseudolikelihood = -22604.768 Prob > chi2 = 0.0000
(Std. Err. adjusted for clustering on firm_id)
----------------------------------------------------------------------------------
| Robust
application_num | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
treated | .1731523 .0770954 2.25 0.025 .0220481 .3242564
total_assets_log | .324462 .0806509 4.02 0.000 .1663891 .4825349
total_profit_log | -.6414265 .1401608 -4.58 0.000 -.9161365 -.3667165
cum_claims_log | 1.192533 .0713113 16.72 0.000 1.052765 1.3323
age_log | -.1982098 .1659591 -1.19 0.232 -.5234837 .1270642
|
application_year |
1999 | .2394592 .0855993 2.80 0.005 .0716876 .4072308
2000 | .2442759 .0859471 2.84 0.004 .0758226 .4127292
2001 | .1151367 .1044195 1.10 0.270 -.0895217 .3197951
2002 | .1297499 .1482964 0.87 0.382 -.1609057 .4204054
2003 | .0002627 .1531101 0.00 0.999 -.2998277 .300353
2004 | -.2282316 .1756614 -1.30 0.194 -.5725216 .1160585
2005 | -.4450677 .2097618 -2.12 0.034 -.8561931 -.0339422
2006 | -.7040352 .2517939 -2.80 0.005 -1.197542 -.2105283
----------------------------------------------------------------------------------
As you can see, after adding controls, the Treated variable turns statistically significant.
Can anyone help me understand why this is the case? I know that adding controls often turns a significant variable insignificant, as the controls can absorb some explanatory power. But I just can't figure out why, in my case, an insignificant variable becomes significant once controls are added.
And which result should I trust? Does the policy really have a significant impact on firms' patent applications?
Thank you very much!
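One hedged check that may help, using the variable names shown in the output and assuming the data are already xtset on firm_id: the two models are estimated on different samples (47,536 vs 45,948 observations), so refitting the no-controls model on the full model's estimation sample separates the effect of adding controls from the effect of dropping observations with missing controls.
Code:
quietly xtpoisson application_num treated i.application_year total_assets_log total_profit_log cum_claims_log age_log, fe vce(robust)
generate byte insample = e(sample)
xtpoisson application_num treated i.application_year if insample, fe vce(robust)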
CAPM OLS event study: calculating CAR for multiple companies at the same time
Hello Stata friends,
I am conducting a CAPM event study with 34 events and eight different companies. I would like to compare the different CARs for each of the 34 events.
All the data needed to compute the CARs are in Stata. The events are numbered from 1 to 34.
Is there any convenient and elegant way to compute this?
Is there a possibility to compute at least the CAR for all eight companies per event?
Any help would be highly appreciated and will lead to infinite thankfulness.
All the best,
Lulas
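A hedged sketch with hypothetical variable names (eventid running from 1 to 34, firmid, eventtime, and ar for the abnormal return): each line cumulates the abnormal returns over one event window for every firm-event pair.
Code:
egen double car_m1p1 = total(cond(inrange(eventtime, -1, 1), ar, .)), by(eventid firmid)
egen double car_m5p5 = total(cond(inrange(eventtime, -5, 5), ar, .)), by(eventid firmid)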
Time Dummy Variables in stcox
Hi,
I am analyzing the factors that affect the time-on-market when selling a house using -stcox-.
Because my data set includes houses that were listed anytime between 2012 and 2018, I'd like to allow the baseline hazard to vary for each year. I considered stratifying the regression by the -strata()- option but as part of my research I want to observe the "time" effects.
I then constructed dummy variables for each year (2012 = 0) and ran -stcox-, but got a very low hazard ratio for the last year (2018).
My questions are:
- Is my approach to include the time dummy variables correct?
- Is there any reason why I got such a low hazard ratio for 2018? Could it be related to the fact that my data set includes right-censored observations?
Code:
stcox log_size log_price house_age i.year
failure _d: isSold == 1
analysis time _t: NumOfMntsOnMarket
id: HouseID
Iteration 0: log likelihood = -48285.125
Iteration 1: log likelihood = -47533.344
Iteration 2: log likelihood = -47472.034
Iteration 3: log likelihood = -47466.515
Iteration 4: log likelihood = -47466.406
Iteration 5: log likelihood = -47466.406
Refining estimates:
Iteration 0: log likelihood = -47466.406
Cox regression -- Breslow method for ties
No. of subjects = 6925 Number of obs = 6925
No. of failures = 5921
Time at risk = 18419
LR chi2(9) = 1637.44
Log likelihood = -47466.406 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
log_size | .7465455 .0093898 -23.24 0.000 .7283668 .7651779
log_price | .732612 .0354403 -6.43 0.000 .6663416 .8054732
house_age | 1.001557 .0006235 2.50 0.012 1.000336 1.00278
|
year |
2013 | 1.186201 .04516 4.49 0.000 1.10091 1.278099
2014 | 1.173349 .0468697 4.00 0.000 1.08499 1.268904
2015 | 1.263186 .0541346 5.45 0.000 1.161418 1.373872
2016 | 1.123802 .0608042 2.16 0.031 1.01073 1.249524
2017 | .4400602 .030792 -11.73 0.000 .3836644 .5047458
2018 | .151731 .01692 -16.91 0.000 .1219422 .1887968
------------------------------------------------------------------------------
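A hedged diagnostic related to the second question: a quick tabulation of the failure indicator _d (created by stset, as shown above) by year shows how right-censoring is distributed across listing years, for example whether 2018 listings are disproportionately censored because they were still on the market when the data end.
Code:
tabulate year _d, row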
Why is the mixed command slow? Can it be sped up?
I'm attempting to do a simulation comparing results from a random-effects model with a random intercept to some other regression models, but the -mixed- command slows my simulation down too much to be useful. In my experience, running mixed-effects commands in SAS is relatively quick, and the lmer command in R (from lme4) is relatively quick. Why is -mixed- so slow, and can it be sped up?
For context, I'm using Stata 14. I have a very simple model that I run with "mixed y T || cluster_id:".
Thanks.
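A hedged suggestion: for a model this simple (one random intercept, no random slopes), -xtreg, mle- fits the same two-level variance-components model and is usually much faster than -mixed-; if -mixed- itself is required inside the simulation, running it quietly and capping iterations helps a little.
Code:
xtset cluster_id
xtreg y T, mle
* or, staying with -mixed-:
quietly mixed y T || cluster_id:, iterate(50)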
Population Variable Significance
I am currently analysing a log-lin model, in which I am adding the independent variable LN Pop to test its significance within the regression model.
The regression output in Stata states that the p-value is greater than the 0.05 significance level, therefore I do not reject the null hypothesis, implying LN Pop has significance.
However, the Stata output for the lincom command regarding LN DPI and LN Pop (where DPI is consumers' real income = total personal expenditure + real savings) provides a p-value of 0.000.
The null hypothesis I am testing is: H0: LogPop + Beta_y (income elasticity) = 1.
The Stata output states I should reject the null, with population having no significance. I am asking for help on how population can be both significant and insignificant, what the implication of this is, and whether further regression is needed using per capita variables rather than aggregates.
Apologies for any confusion in the question; I am new and unfamiliar with Stata and econometric regression, and any help is appreciated.
Thanks,
J
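A hedged illustration with hypothetical variable names (ln_pop, ln_dpi): the p-value in the regression output tests H0: coefficient on LN Pop = 0, while the lincom result tests the different hypothesis H0: LogPop + income elasticity = 1, so the two p-values can disagree without any contradiction.
Code:
lincom _b[ln_pop] + _b[ln_dpi] - 1
test _b[ln_pop] + _b[ln_dpi] = 1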
Regression on unbalanced panel
Hi,
I'm currently working on the impact of risk-taking behavior on firm growth, and here is my panel (the numbers aren't real):
| NAME | YEAR | GROWTH | INCOME | AGE | SIZE | RISK | PERFORMANCE |
| A | 2000 | 12 | 8 | 6 | 6 | 3 | 4 |
| A | 2001 | 14 | 6 | 7 | 6 | 2 | 3 |
| A | 2002 | 15 | 9 | 7 | 6 | 2 | 2 |
| A | 2003 | 16 | 5 | 6 | 4 | 3 | 3 |
| B | 2000 | 14 | 3 | 4 | 3 | ||
| B | 2001 | 17 | 2 | 3 | 4 | ||
| B | 2002 | 13 | 5 | 2 | 2 | ||
| B | 2003 | 12 | 3 | 5 | 9 | ||
| C | 2000 | 22 | 2 | 6 | 3 | ||
| C | 2001 | 17 | 7 | 7 | 4 | ||
| C | 2002 | 22 | 4 | 4 | 5 | ||
| C | 2003 | 34 | 4 | 3 | 7 |
My panel is unbalanced.
According to Xu Peng's paper, "Risk taking and firm growth":
RISK: the standard deviation of EBITDA(t)/Assets(t) over 4 years.
Performance: sum of EBITDA(t)/Assets(t) over the 4 years 2000-2003.
So I calculated EBITDA/Assets of each firm each year.
Performance of 2000= sum(Ebitda/assets firm A 2000, ebitda/assets firm B 2000 and so on)
RISK 2000= stdev.p(Ebitda/assets firm A 2000, ebitda/assets firm B 2000 and so on)
That gives 4 rows: 4 years of RISK and PERFORMANCE values.
So my questions are:
1. Did I calculate RISK and PERFORMANCE right?
2. How can I regress those, with
growth= risk + control variables
performance=risk +control variable
risk = age + size + ownership+ leverage + income
3. What about the Wooldridge autocorrelation test, the White test, and the variance inflation factor (VIF) test; can they be run as usual?
I'm a beginner and I am doing this research as a requirement, so I don't know much about this. Thanks for your time; I appreciate any help.
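A hedged sketch with hypothetical variable names (name, year, ebitda, assets): following the paper's "over 4 years" wording, RISK and PERFORMANCE would be computed within each firm across 2000-2003, which differs from the across-firm, per-year calculation described above.
Code:
generate double ratio = ebitda/assets
egen double risk        = sd(cond(inrange(year, 2000, 2003), ratio, .)), by(name)
egen double performance = total(cond(inrange(year, 2000, 2003), ratio, .)), by(name)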
Principal Component Analysis Index
Hi everyone.
I am working with data from 126 schools in rural Angola. I want to create an index of school infrastructure and use it in my regressions. My data look as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long school_id float pc15 byte chalkboards15 float bathrooms15 int I_desks15 byte I_classrooms15 1 0 17 1 21 9 2 0 16 2 18 8 3 0 21 0 5 13 4 0 9 9 5 0 5 1 10 0 6 5 6 1 12 8 16 12 7 1 11 4 15 9 8 0 7 2 120 3 9 1 2 2 2 2 10 1 28 4 12 23 11 0 6 1 9 6 12 0 36 2 7 7 13 1 13 4 4 11 14 1 7 3 3 3 15 0 13 1 6 2 16 0 10 1 8 10 17 0 8 0 5 3 18 0 10 0 5 3 19 0 34 2 17 13 20 0 4 2 6 2 21 0 25 0 3 3 22 1 16 2 14 11 23 0 5 0 100 3 24 1 9 2 8 5 25 0 5 0 119 3 26 0 14 0 1 2 27 0 4 0 3 2 28 0 3 2 120 3 29 0 12 4 20 9 30 0 0 0 0 0 31 0 2 0 82 2 32 0 3 2 3 3 33 0 20 1 14 9 34 0 10 2 14 6 35 0 8 2 15 8 36 0 6 4 6 6 37 0 6 2 3 5 38 1 13 0 3 10 39 0 7 2 5 5 40 0 14 0 139 4 41 0 8 0 2 1 42 0 6 0 1 0 43 0 15 2 2 3 44 0 3 2 3 3 45 0 13 2 300 13 46 0 6 0 4 3 47 0 7 2 5 6 48 0 3 2 0 3 49 0 3 2 4 3 50 0 6 0 2 3 51 0 6 2 3 3 52 0 8 2 1 3 53 0 5 2 4 3 54 0 3 2 0 3 55 0 9 0 3 3 56 0 4 2 0 2 57 0 7 2 3 3 58 0 4 2 62 2 59 0 5 2 2 3 60 0 4 2 4 2 61 0 12 2 404 11 62 0 5 0 3 3 63 0 2 2 1 2 64 0 2 2 0 2 65 0 8 2 4 2 66 0 6 0 4 0 67 0 6 1 3 2 68 0 6 1 5 3 69 0 12 2 9 3 70 0 3 2 1 2 71 0 10 2 6 5 72 0 3 0 1 0 73 0 5 2 4 3 74 0 8 2 2 6 75 0 6 0 3 0 76 0 6 2 270 6 77 0 6 0 2 0 78 0 7 0 5 0 79 0 9 0 0 1 80 0 4 1 4 4 81 1 8 2 25 7 82 0 8 0 1 7 83 0 16 2 8 3 84 0 4 2 5 3 85 0 6 2 1 5 86 0 4 0 120 3 87 0 18 4 8 18 88 0 11 0 5 3 89 0 9 2 9 7 90 0 8 1 7 6 91 0 6 2 6 6 92 0 13 4 13 11 93 0 14 4 7 4 94 0 2 0 75 2 95 0 7 0 3 2 96 0 3 2 5 3 97 0 10 2 6 8 98 0 3 0 120 3 99 0 8 2 7 4 100 0 2 0 0 1 end
I standardized all the measures of school infrastructure that I want to include, and I used the -predict- command to create my index. Some of the variables included are dummy variables, but since I standardized them all, they are all centered at zero. However, I am new to the concept of PCA and I am not sure that what I am doing in Stata is correct. I am using the following code:
local measures "std_I_water15 std_I_electricity15 std_bathrooms15 std_I_chairs15 std_I_classrooms15"
pca measures
predict indexpca15
pca std_I_water15 std_I_electricity15 std_bathrooms15 std_I_chairs15 std_I_classrooms15
Principal components/correlation Number of obs = 126
Number of comp. = 5
Trace = 5
Rotation: (unrotated = principal) Rho = 1.0000
--------------------------------------------------------------------------
Component | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Comp1 | 1.6917 .398078 0.3383 0.3383
Comp2 | 1.29363 .479866 0.2587 0.5971
Comp3 | .813761 .12548 0.1628 0.7598
Comp4 | .688281 .175654 0.1377 0.8975
Comp5 | .512627 . 0.1025 1.0000
--------------------------------------------------------------------------
Principal components (eigenvectors)
------------------------------------------------------------------------------
Variable | Comp1 Comp2 Comp3 Comp4 Comp5 | Unexplained
-------------+--------------------------------------------------+-------------
std_I_wat~15 | 0.2529 0.6278 0.4259 0.5311 -0.2802 | 0
std_I_ele~15 | 0.5307 0.2801 0.3070 -0.6114 0.4146 | 0
std_bathr~15 | 0.5189 0.0835 -0.6795 0.3800 0.3430 | 0
std_I_cha~15 | 0.2406 -0.6324 0.5076 0.4074 0.3442 | 0
std_I_cla~15 | 0.5720 -0.3472 -0.0702 -0.1837 -0.7166 | 0
------------------------------------------------------------------------------
Q1. Do I need to -rotate- the PCA; if yes, what is the interpretation for the rotation?
Q2. Once I run the code, the unexplained variance is always equal to zero; does this make sense, or am I doing something incorrect?
Q3. Would you suggest a different procedure to obtain the PCA?
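A hedged note on the syntax shown above: a local macro has to be expanded with `measures' when it is used (otherwise -pca measures- looks for a variable literally called measures), and -predict- after -pca- takes the score statistic; the sketch below stores the first principal component as the index.
Code:
local measures "std_I_water15 std_I_electricity15 std_bathrooms15 std_I_chairs15 std_I_classrooms15"
pca `measures'
predict indexpca15, score   // score of the first principal component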
latent class analysis: Can Stata fit 2 latent variables (categorical) in one model
Hi Statalists,
I am trying to do a latent class analysis with 2 categorical latent variables within the same dataset and within the same model.
For example, latent variable c1 will be determined by v1-v4, while latent variable c2 will be determined by v5-v8. I wonder if Stata could handle this?
Any thought would be much appreciated!
Thank-you in advance!
Yingyi
Importing a dataset with long variable names into Stata through loop statements
Dear Stata Users,
I am trying to import the dataset below into Stata from a CSV file (attached as an image), but some variable names are too long and are separated by '/', so they come out as v23, v24, v25, etc. I seek guidance on how I can write a loop statement in Stata to import this dataset without leaving the variables named v23, v24, and so on. I would prefer to start reading from the right, stop at the first '/', and take that as the variable name. I believe this is possible through loops, and I would appreciate any guidance on how to fix it; see the attached data format for reference.
Thank you.
Collins
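A hedged sketch of one approach (the filename is a placeholder): read the CSV without treating the header row as variable names, build each name from the text after the last '/' in the first observation, then drop that row and convert the columns back to numeric.
Code:
import delimited using "mydata.csv", varnames(nonames) clear
foreach v of varlist v* {
    local raw = `v'[1]                                      // original header text
    local piece = substr("`raw'", strrpos("`raw'", "/") + 1, .)
    capture rename `v' `=strtoname("`piece'")'              // skip if name clashes
}
drop in 1
destring, replace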
Interpreting main effects model with omitted interaction terms
Suppose I have two regressors, task availability (Xa) and task participation (Xp), and a DV Y. One can only participate in a task if it is available, but one can choose not to participate even if a task is available. The baseline model Y ~ Xa should give the total effect of task availability. Y ~ Xa + Xa*Xp adds the effect of participation. Now, if the interaction model is the true model, would Y ~ Xa be biased because of an omitted variable problem? That is, can the total effect not be estimated using the baseline model?
graph hbox qs
Hey guys,
I have a question about the relabel() option; here is my command:
graph hbox age, over(morethan50k, relabel(1 "lots of money" 2 "less money")) over(sex, relabel(1 "male" 2 "female")) name(graph1, replace) nooutsides asy
My question is: when using the relabel option, I have to refer to the groups as 1 and 2; those are the values that worked. But I coded morethan50k as 0 and 1 (1 being more than 50k). Why won't Stata relabel based on the values stored in the variable; why 1 and 2 and not 0 and 1?
Does my question make sense?
thanks
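A hedged alternative to relabel(): attach value labels to the 0/1 variable itself, and the over() groups pick up those labels directly (relabel() refers to the position of the groups shown, not to the underlying values, which is why 1 and 2 work).
Code:
label define money 0 "less money" 1 "lots of money"
label values morethan50k money
graph hbox age, over(morethan50k) over(sex) name(graph2, replace) nooutsides asyvars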
Multinomial fractional response model in panel data
Hello all!
I am mainly working in the context of rural non-farm sector diversification, so I want to model the diversification strategies of farmers. My dependent variable is the share of income from a particular category in the total income of the household. Thus, I want to estimate the diversification decisions of the household using a multinomial fractional response model. However, I have panel data. How can the fmlogit command be extended in this case to account for panel data? I am really stuck because of the Stata command. Any reply would be appreciated.
Help Interpreting Results from regress function with unbalanced panel data
Hello! First post here, will do my best to follow all FAQ rules.
I am working on a project with a dataset that has 162,000 observations and 52 variables. Each observation is one firm's results for a given year. Overall, I am seeking to determine the effect of immigration in a given Norwegian municipality on individual firm performance.
variables of interest are:
imm_share : % of workforce in a given municipality in a given year that is classified as an immigrant
ROA: Return on Assets, Firm Profit divided by Firm Assets in a given year
aar: year dummy
industry: dummy for the industry the firm operates in
log_ansatte: log of number of employees at a firm in a given year
log_firmage: log of firm age in a given year
the employees and firm age are meant to be proxies for firm size.
example of dataset:
. dataex ROA imm_share aar log_firmage log_salg
----------------------- copy starting from the next line -----------------------
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ROA imm_share) int aar float(log_firmage log_salg)
.0858681 .02696629 2001 3.0910425 9.262743
.04753989 .05016723 2001 1.94591 9.31722
.16474044 .036985237 2001 2.1972246 9.242129
.04280008 .04942902 2001 3.178054 9.332735
.06279306 .029482344 2001 4.204693 11.091865
.036365848 .031799663 2001 2.833213 11.284744
our estimation and results:
reg ROA imm_share i.aar i.industry log_firmage log_ans if e(sample),vce(cluster cid)
(regression output attached as an image)
MY QUESTION:
When we run this regress command with several variations of control variables, we always get p-values between 0.000 and 0.012. We wouldn't expect this level of significance. Does anyone have steps to correct this, or a possible explanation? What would this result signify?
We are stumped as how to best explain this part of the results.
Thank you so much for any insight you can provide.
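One hedged robustness check (kommune stands in for a municipality identifier, a hypothetical name): since imm_share varies only at the municipality-year level while the observations are firms, clustering the standard errors at the municipality level instead of the firm level is a common way to see whether the very small p-values survive.
Code:
reg ROA imm_share i.aar i.industry log_firmage log_ansatte, vce(cluster kommune)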
Clustering across Individuals/Households AND Areas
Dear Statalisters,
I am currently working on a dataset from China on individual spending. I have panel data and run something like the following:
Code:
xtset ID year
xtreg Spending HomicideRateCity PersonalIncomeCharacteristics Gender Age HouseholdCharacteristics, fe vce(XXXXXX)
Basically, I have a panel data set with numerous individuals and a measure of the homicide rate in the respective city each individual lived in at time t. I therefore want to cluster at the city level, which I think makes sense. I also have variables on individual characteristics, like height, age, gender, and personal income.
Then I have more characteristics across households: the survey asks things about the household, e.g. 'How many children live in this household?', 'To how many cars does this household have access?', etc. I also include these in my individual-level regression, where the values are the same for each member of the household. From my understanding, I should now also cluster at the household level.
Is this thinking correct? --> I essentially want to cluster at City- AND at Household-level.
Many thanks in advance,
Andreas
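A hedged sketch of one way to get multi-way clustered standard errors, using the community-contributed -reghdfe- package; CityID and HouseholdID are hypothetical identifier names, and the individual and year fixed effects are absorbed rather than estimated.
Code:
* requires the community-contributed ftools and reghdfe packages:
*   ssc install ftools
*   ssc install reghdfe
reghdfe Spending HomicideRateCity PersonalIncomeCharacteristics Gender Age HouseholdCharacteristics, ///
    absorb(ID year) vce(cluster CityID HouseholdID)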
Interaction term in survival analysis (streg)
Hello,
I am posting my first question here to ask how to interpret the interaction term in survival analysis regression.
I'm working on a survival analysis using the exponential model.
I would like to include an interaction term between my main variable (work1, which is time-varying) and calendar year, to see how the effects of work1 vary with time.
The 'work1' variable is a categorical variable with 3 categories: employed (reference), never employed, and previously employed.
'calyear' is also a categorical variable with 7 categories: 1980-1984 (reference), 1985-1989, and so on.
So I used the command:
streg i.work1 i.calyear work1#calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust
and got the result shown in the attached image (I cropped the results for the rest of the variables, as the output is too long).
So, I have two questions regarding this result.
The first question is how to interpret the hazard ratio of each category for the interaction term.
Since the interaction categories that include either reference category are omitted, it is not clear to me what the hazard ratio means.
For example, what does the coefficient of 'never employed#1985-1989' mean? It might be a relative risk, but compared to what?
And the second question: my prime interest is to see how the hazard ratio for each category of 'work1' changes over time, so I am wondering if there is a way to get the hazard ratio for every category of the interaction term.
I tried (1) the -margins- command after running the regression, and found that -margins- is not suitable for getting what I want in the survival-analysis context,
and (2) including only the interaction term, without the main effects, to make Stata show all the categories, by trying this command:
streg i.work1#i.calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust
but excluding the main effects themselves might not be appropriate.
So if anyone has an answer, I would be very grateful if you could share your knowledge.
Thank you so much for your attention!
I will be looking forward to hearing from you!
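A hedged sketch for combining the main effect and the interaction into one hazard ratio, assuming never employed is stored as level 2 of work1 and 1985-1989 as level 2 of calyear (adjust the level codes to your own coding); the interaction coefficient alone compares the work1 contrast in that period with the same contrast in the reference period.
Code:
lincom _b[2.work1] + _b[2.work1#2.calyear], hr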

How to impute missing values for one year in a pooled cross-section dataset?
Hello everyone, I am new to Statalist and I will try to be brief and concise.
I am working with the Ethiopian Medium and Large Manufacturing Census for the years 1998-2009, carried out by the Central Statistical Agency of Ethiopia (at the following link you can find the metadata for the year 2009 http://catalog.ihsn.org/index.php/ca...ata_dictionary).
My problem is that in 2005 a survey was conducted (instead of the whole census), and this feature seems to have an influence on the outcome of my analysis.
I think this survey was not carried out using a random sampling approach, because my summary statistics on the share of private and public firms in 2005 are fairly different from those of the other years (in particular, the share of public firms seems to be higher than in the rest of the years). The summary statistics on my main variables of interest (i.e. wages and number of workers) also seem to be somewhat biased for the year 2005.
Is there any way to impute the values for the year 2005 instead of using the survey? I thought about calculating an average of the values of the variables in the years 2004 and 2006, even though I am aware that this is not a very precise approach. Any other advice?
I am posting the table of my summary statistics (in the original, red underlines the values that look a bit "weird") so you can see the problem:
| Evolution of Ethiopian manufacturing sector, average values | |||||||||||||
| 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | ||
| Number of firms | 725 | 739 | 731 | 765 | 883 | 939 | 997 | 991 | 1153 | 1339 | 1734 | 1948 | |
| Share of Private | 0.81 | 0.81 | 0.82 | 0.83 | 0.85 | 0.86 | 0.86 | 0.64 | 0.88 | 0.91 | 0.43 | 0.95 | |
| Share of Public | 0.19 | 0.19 | 0.18 | 0.17 | 0.15 | 0.14 | 0.14 | 0.36 | 0.12 | 0.09 | 0.57 | 0.05 | |
| Median employment | 20 | 21 | 21 | 23 | 18 | 20 | 23 | 28 | 24 | 20 | 17 | 16 | |
| Share of firms located in the capital | 0.66 | 0.63 | 0.63 | 0.60 | 0.61 | 0.58 | 0.55 | 0.46 | 0.53 | 0.50 | 0.44 | 0.39 | |
| Exported value added | 0.0206 | 0.0217 | 0.0233 | 0.0238 | 0.0194 | 0.0227 | 0.0208 | 0.0262 | 0.0205 | 0.0195 | 0.0151 | 0.0166 | |
| Capital intensity (capital/worker) '000 Birr | 26.46 | 23.94 | 38.48 | 121.03 | 68.09 | 69.16 | 79.73 | 102.11 | 89.90 | 84.89 | 114.28 | 122.19 | |
| Gender pay gap (Wm-Wf)/Wm | 0.16 | 0.13 | 0.16 | 0.13 | 0.15 | 0.17 | 0.13 | -0.25 | 0.05 | 0.11 | 0.12 | 0.02 | |
| Gender gap in workers comp (Nm-Nf)/Nm | 0.50 | 0.45 | 0.45 | 0.43 | 0.43 | 0.45 | 0.48 | 0.37 | 0.40 | 0.42 | 0.41 | 0.30 | |
| Technology level of the industry, ISIC classification, share | |||||||||||||
| 1 | 0.50 | 0.50 | 0.51 | 0.52 | 0.51 | 0.49 | 0.50 | 0.41 | 0.47 | 0.44 | 0.40 | 0.39 | |
| 2 | 0.20 | 0.19 | 0.20 | 0.20 | 0.21 | 0.22 | 0.23 | 0.18 | 0.25 | 0.29 | 0.36 | 0.38 | |
| 3 | 0.26 | 0.26 | 0.24 | 0.23 | 0.23 | 0.24 | 0.23 | 0.13 | 0.24 | 0.24 | 0.21 | 0.20 | |
| 4 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | |
| . | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.03 | 0.27 | 0.04 | 0.03 | 0.03 | 0.03 | |
| Share of firms in each industry | |||||||||||||
| Food and Beverage | 0.28 | 0.28 | 0.29 | 0.31 | 0.30 | 0.29 | 0.29 | 0.21 | 0.29 | 0.26 | 0.25 | 0.25 | |
| Textile and Garments | 0.07 | 0.07 | 0.07 | 0.07 | 0.06 | 0.06 | 0.07 | 0.06 | 0.06 | 0.05 | 0.03 | 0.04 | |
| Leather and Footwear | 0.08 | 0.07 | 0.06 | 0.07 | 0.06 | 0.06 | 0.06 | 0.06 | 0.05 | 0.05 | 0.04 | 0.04 | |
| Wood and Furniture | 0.03 | 0.03 | 0.03 | 0.02 | 0.03 | 0.02 | 0.02 | 0.02 | 0.02 | 0.03 | 0.03 | 0.02 | |
| Printing and Paper | 0.07 | 0.08 | 0.09 | 0.08 | 0.08 | 0.08 | 0.07 | 0.08 | 0.07 | 0.07 | 0.06 | 0.05 | |
| Chemical and Plastic | 0.09 | 0.09 | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 | 0.09 | 0.10 | 0.09 | 0.08 | 0.07 | |
| Non Metal | 0.11 | 0.11 | 0.10 | 0.11 | 0.11 | 0.12 | 0.12 | 0.07 | 0.12 | 0.20 | 0.26 | 0.29 | |
| Metal and Machinery | 0.27 | 0.27 | 0.27 | 0.26 | 0.27 | 0.28 | 0.28 | 0.17 | 0.29 | 0.24 | 0.23 | 0.23 | |
| Explained share | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.77 | 1.00 | 1.00 | 1.00 | 1.00 | |
Thank you in advance!
PS: I know, the year 2008 is also not so nice when it comes to the summary statistics...
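A hedged sketch of the averaging idea, with hypothetical variable names (firm_id, year, wage): replace a firm's 2005 value with the mean of its own 2004 and 2006 values where both neighbouring years are observed. Whether this is defensible is a separate question, since the 2005 survey may cover different firms.
Code:
bysort firm_id (year): replace wage = (wage[_n-1] + wage[_n+1])/2 if year == 2005 & year[_n-1] == 2004 & year[_n+1] == 2006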
generate yearly date variable
Hi,
very simple question:
I want to generate a yearly time variable running from 1870 to 2017 in my Stata dataset. I have not been able to find the code for this yet.
many thanks
C
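A minimal sketch, assuming you want one observation per year in an otherwise empty dataset (148 years from 1870 to 2017); if the 148 observations already exist, only the generate line is needed.
Code:
clear
set obs 148
generate int year = 1869 + _n
tsset year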
for loop
Hi there,
I have the following code in Stata:
*2018
*Form 4
generate Form4= 1 if Performance2018 ==1 & form== "4"
replace Form4= 0 if Performance2018==0 & form=="4"
replace Form4= 3 if Performance2018==3 & form=="4"
replace Form4= 5 if Performance2018==5 & form=="4"
label values Form4 Einteilung
*Distinct
egen tag2018 = tag(Form4 cik)
generate distinctform2018= 1 if tag2018 ==1 & Form4 == 1 & year ==2018
replace distinctform2018 = 3 if tag2018 ==1 & Form4 == 3 & year ==2018
replace distinctform2018 = 0 if tag2018 ==1 & Form4 == 0 & year ==2018
replace distinctform2018 = 5 if tag2018 ==1 & Form4 == 5 & year ==2018
label values distinctform2018 Einteilung
drop tag2018
*Average
egen tag2018 = tag(distinctform2018 Form4)
bysort cik Form4: gen Form4forms2018 = _N
generate Form4_gute2018= Form4forms2018 if Form4 ==1 & distinctform2018 == 1
generate Form4_schlechte2018= Form4forms2018 if Form4 ==0 & distinctform2018 == 0
drop Form4 distinctform2018 tag2018 Form4forms2018
I want to repeat this code for every year back to 2009. Is it possible to write a loop in which only the year (2018, 2017, 2016, 2015, ...) changes? Otherwise I have to copy and paste the whole code and change the year manually.
Thank you a lot in advance!
Cheers,
Sergej
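A hedged sketch of such a loop, assuming variables Performance2009-Performance2018 exist and the logic is identical for every year; the local macro `y' replaces every literal 2018 in the original block.
Code:
forvalues y = 2018(-1)2009 {
    generate Form4 = 1 if Performance`y' == 1 & form == "4"
    replace  Form4 = 0 if Performance`y' == 0 & form == "4"
    replace  Form4 = 3 if Performance`y' == 3 & form == "4"
    replace  Form4 = 5 if Performance`y' == 5 & form == "4"
    label values Form4 Einteilung
    egen tag`y' = tag(Form4 cik)
    generate distinctform`y' = 1 if tag`y' == 1 & Form4 == 1 & year == `y'
    replace  distinctform`y' = 3 if tag`y' == 1 & Form4 == 3 & year == `y'
    replace  distinctform`y' = 0 if tag`y' == 1 & Form4 == 0 & year == `y'
    replace  distinctform`y' = 5 if tag`y' == 1 & Form4 == 5 & year == `y'
    label values distinctform`y' Einteilung
    drop tag`y'
    egen tag`y' = tag(distinctform`y' Form4)
    bysort cik Form4: gen Form4forms`y' = _N
    generate Form4_gute`y' = Form4forms`y' if Form4 == 1 & distinctform`y' == 1
    generate Form4_schlechte`y' = Form4forms`y' if Form4 == 0 & distinctform`y' == 0
    drop Form4 distinctform`y' tag`y' Form4forms`y'
}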
Unsure if SCARs are correct (event study)
Dear Statalist community,
I'm working on an event study and I want to use standardized cumulative abnormal returns, however I'm unsure if my results are correct. I want to use SCAR(-1, +1) and SCAR(-2, +2) in my study.
I'm using the following data:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int eventtime double(return marketreturn) float(firmid NR_MMM AR_MMM) -171 -.027066 -.025666 1 -.02592428 -.0011417216 -170 -.000902 .001233 1 .001208978 -.0021109781 -169 .019445 .019076 1 .01920737 .00023763232 -168 .001948 .001974 1 .0019564312 -8.431195e-06 -167 .015204 .014315 1 .014404905 .000799095 -166 .018788 .01829 1 .018414522 .0003734784 -165 -.003079 -.003588 1 -.0036540066 .00057500665 -164 .009817 .008036 1 .008071223 .0017457767 -163 .008534 .008818 1 .008860034 -.0003260339 -162 .001437 .000725 1 .000696554 .000740446 -161 -.000226 .001276 1 .0012523525 -.0014783525 -160 -.007799 -.006825 1 -.006919197 -.0008798032 -159 -.004298 -.004716 1 -.00479183 .0004938302 -158 .014726 .014853 1 .01494759 -.00022159042 -157 .003398 .00457 1 .004575039 -.0011770388 -156 -.000676 .000271 1 .0002386002 -.0009146002 -155 -.001174 -.001421 1 -.001468135 .0002941349 -154 -.00803 -.005825 1 -.005910488 -.0021195116 -153 .013958 .016628 1 .016738048 -.002780048 -152 .009512 .01103 1 .011091297 -.0015792975 -151 -.002833 -.001913 1 -.0019644196 -.0008685804 -150 -.004484 -.002554 1 -.002611002 -.001872998 -149 .013662 .01184 1 .011908351 .0017536485 -148 -.00221 -.00045 1 -.00048867875 -.0017213213 -147 -.004013 -.00481 1 -.004886649 .0008736486 -146 .012462 .011874 1 .011942647 .0005193526 -145 .003109 .002728 1 .0027169974 .0003920026 -144 -.003391 -.003545 1 -.003610632 .00021963214 -143 -.001398 -.001132 1 -.001176618 -.00022138184 -142 -.000379 -.000348 1 -.0003857905 6.790481e-06 -141 -.009617 -.009823 1 -.009943306 .0003263055 -140 .00129 .001511 1 .001489399 -.000199399 -139 -.004136 -.003228 1 -.0032908716 -.0008451284 -138 -.014517 -.01399 1 -.014146594 -.0003704057 -137 -.009785 -.011207 1 -.011339358 .001554358 -136 .013755 .014903 1 .014998026 -.0012430262 -135 -.001368 -.001339 1 -.0013854208 .000017420794 -134 .015289 .016162 1 .01626799 -.0009789907 -133 -.000995 -.001123 1 -.0011675397 .00017253975 -132 .003012 .00381 1 .0038084204 -.0007964204 -131 -.000676 -.001235 1 -.001280515 .000604515 -130 .002372 .001222 1 .0011978822 .0011741178 -129 .001165 -.000129 1 -.0001648833 .0013298832 -128 .00078 .000594 1 .00056441315 .00021558686 -127 -.003856 -.004641 1 -.004716177 .000860177 -126 .009524 .010681 1 .010739258 -.0012152576 -125 -.010729 -.010996 1 -.01112652 .000397521 -124 -.01433 -.014374 1 -.01453394 .00020393883 -123 .016455 .020526 1 .020669995 -.004214995 -122 -.009795 -.00699 1 -.007085634 -.002709366 -121 -.006114 -.00649 1 -.00658128 .0004672795 -120 -.0066 -.007739 1 -.007841157 .001241157 -119 .00205 .002251 1 .0022358433 -.00018584334 -118 -.019851 -.019423 1 -.01962691 -.00022409014 -117 .001597 .004756 1 .004762659 -.003165659 -116 .011313 .010619 1 .010676718 .0006362817 -115 .014906 .014515 1 .014606647 .0002993528 -114 -.014615 -.015041 1 -.015206748 .0005917477 -113 -.015068 -.017797 1 -.017986748 .0029187484 -112 .007152 .007778 1 .007810976 -.0006589764 -111 .008839 .008817 1 .008859024 -.000020024383 -110 .013685 .012418 1 .012491385 .0011936155 -109 -.000831 -.001599 1 -.001647685 .0008166851 -108 -.003328 -.002179 1 -.0022327362 -.0010952638 -107 .010021 .01063 1 .010687814 -.0006668141 -106 -.007376 -.007217 1 -.007314611 -.00006138924 -105 -.008202 -.009412 1 -.009528726 .0013267263 -104 -.0149 -.015304 1 -.015472038 .0005720377 -103 .001397 .002012 1 .001994762 -.0005977621 -102 -.013822 -.013115 1 -.013263974 -.00055802567 -101 -.02396 -.0237 1 -.023941156 -.00001884448 
-100 -.010925 -.010838 1 -.010967145 .00004214474 -99 -.001716 .000853 1 .0008256687 -.002541669 -98 .006078 .007803 1 .007836194 -.001758194 -97 -.025859 -.024965 1 -.025217174 -.0006418264 -96 .015461 .016696 1 .016806642 -.0013456416 -95 -.021697 -.021599 1 -.02182186 .00012485836 -94 -.002418 .000532 1 .0005018732 -.002919873 -93 -.010609 -.011694 1 -.0118306 .001221599 -92 .005685 .005195 1 .005205482 .0004795182 -91 .022497 .020284 1 .02042589 .0020711122 -90 -.016696 -.015638 1 -.015808947 -.0008870526 -89 .015938 .014144 1 .014232416 .0017055843 -88 -.010554 -.010863 1 -.010992362 .00043836216 -87 .00508 .005529 1 .00554239 -.0004623905 -86 .024569 .02476 1 .024940867 -.000371867 -85 -.000359 -.000443 1 -.0004816178 .00012261781 -84 -.020012 -.018743 1 -.018940987 -.0010710129 -83 .005759 .004992 1 .005000714 .0007582858 -82 .003651 .001527 1 .0015055384 .0021454617 -81 -.019618 -.018481 1 -.018676706 -.0009412944 -80 -.016957 -.014154 1 -.014312023 -.0026449766 -79 -.003064 -.000664 1 -.0007045424 -.0023594575 -78 .000261 -.000189 1 -.0002254058 .0004864058 -77 -.011979 -.012301 1 -.012442886 .0004638859 -76 .020146 .019518 1 .019653216 .000492784 -75 .017685 .016517 1 .016626082 .0010589176 -74 .018118 .01648 1 .01658876 .0015292395 -73 -.004207 -.004666 1 -.004741395 .0005343949 -72 -.000227 -.000026 1 -.00006098628 -.0001660137 end
With the following code:
Code:
quietly describe
bys eventtime: gen N = _N
preserve
drop if eventtime>-11
collapse (sd) AR_MMM,by(firmid)
rename AR_MMM si
keep firmid si
save "M:\tmp"
restore
merge m:1 firmid using "M:\tmp"
erase "M:\tmp.dta"
drop if eventtime <-2
drop if eventtime >2
// Generate SAR
gen SAR = AR_MMM/si
** Create cumulative abnormal returns
sort firmid eventtime
by firmid: gen CAR_MMM =sum(AR_MMM)
** Create cumulative standardized abnormal returns
by firmid: gen CSAR_MMM = sum(SAR)
My results for SCAR(-2, +2) are as follows (results attached as an image).
Thank you in advance!
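A hedged check on the last step: one common convention scales the cumulated standardized abnormal return by the square root of the window length, so for the (-2, +2) window SCAR = CSAR / sqrt(5); the line below assumes the data are still sorted by firmid and eventtime as in the code above.
Code:
by firmid: generate SCAR_MMM = CSAR_MMM / sqrt(5) if eventtime == 2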
Calculating MHHI with triple summation
Hi everybody!
I'm looking to create a Modified Herfindahl-Hirschmann Index (MHHI) in Stata from the attached formula. It creates a market index that also takes into account cross-ownership of the companies in the sector. I have all the variables I need to execute the formula; I simply have no idea how to do it in Stata.
At first I thought I should use loops to make Stata go through each company (as the formula describes), but I don't know how to make Stata aware of the relationship between the owners and companies, creating the kind of sumproduct that the formula prescribes for each owner in each company.
All input will be well appreciated!
Below is some of my sample data (each variable name includes the corresponding symbol from the formula):
input byte(Company_number_j Owner_number_i) double(Owner_share_gamma Market_Share_s)
1 1 .29 .2996783902352667
1 2 .12 .2996783902352667
1 3 .29 .2996783902352667
1 4 .29 .299678390235267
2 5 .5 .0466357542171158
2 5 .5 .0466357542171158
3 1 .29 .48362543478072323
3 2 .12 .48362543478072323
3 3 .29 .48362543478072323
3 4 .29 .48362543478072323
4 6 .05 .17006042076689426
4 7 .11 .17006042076689426
4 8 .73 .17006042076689426
4 9 .11 .17006042076689426
The formula:
(formula attached as an image)
k indexes the other companies, so when j = 1, k runs over 2, 3, and 4.
The formula requires a triple summation, but I think breaking it down into single summations will be an advantage.
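Since the attached image is not reproduced here, the sketch below assumes the formula is the standard cross-ownership (O'Brien-Salop) term, MHHI delta = sum over j, sum over k != j, of s_j * s_k * (sum over i of gamma_ij * gamma_ik) / (sum over i of gamma_ij^2), with gamma = Owner_share_gamma and s = Market_Share_s; adjust it if the attached formula differs. Rather than writing explicit triple loops, it pairs companies that share owners with joinby and then collapses:
Code:
* assumes the dataex data above are in memory, one record per owner-company pair
* (if an owner-company pair can appear twice, collapse those records to one first)
rename Owner_share_gamma gamma_j
rename Market_Share_s s_j

* denominator: sum over owners of gamma_ij^2, one value per company j
bysort Company_number_j: egen double denom_j = total(gamma_j^2)

* a second copy of the data, relabelled as the "other" company k
tempfile pairs
preserve
drop denom_j
rename (Company_number_j gamma_j s_j) (Company_number_k gamma_k s_k)
save `pairs'
restore

* pair every (owner, company j) record with the same owner's stakes in companies k
joinby Owner_number_i using `pairs'
drop if Company_number_k == Company_number_j   // k runs over the other companies only

* numerator: sum over owners of gamma_ij * gamma_ik within each (j, k) pair
gen double cross = gamma_j * gamma_k
collapse (sum) num_jk = cross (first) s_j s_k denom_j, by(Company_number_j Company_number_k)

gen double mhhi_jk = s_j * s_k * num_jk / denom_j
quietly summarize mhhi_jk
display "MHHI (cross-ownership term) = " r(sum)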
Generating a new variable but missing values have been included
Hi,
My code is:
Code:
generate var_66 = 1 if var_33 >= 2
However it has put a 1 in the new variable where there are missing values in var_33, rather than just where there is a score of 2 or greater in var_33.
Any advice?
Thanks
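Stata treats a missing numeric value as larger than any nonmissing number, so var_33 >= 2 is also true wherever var_33 is missing. A minimal sketch of the usual fix, which excludes the missing values explicitly:
Code:
* only flag genuine scores of 2 or more; observations with var_33 missing stay missing in var_66
generate byte var_66 = 1 if var_33 >= 2 & !missing(var_33)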
Trying to create a new variable out of the response of the respondents in the survey ! Your help will be much appreciated
Hello Intelligent People,
The version of Stata I am using is 14.2 on Windows 10.
Here is a glimpse of what my data looks like:
| methods1 | method2 | method3 | method4 | method5 |
| yes | no | yes | yes | yes |
| no | no | no | yes | no |
| no | yes | yes | yes | yes |
| no | no | no | no | no |
| yes | no | no | yes | yes |
| no | no | no | no | no |
| no | no | no | no | no |
| yes | yes | no | yes | no |
| no | no | no | no | no |
| yes | yes | no | no | yes |
So basically I am trying to create a new variable, no_method, which can be derived from the respondents' answers: if a respondent has used none of the methods, that observation should be counted as "no_method".
From the data above, there are 4 respondents who have used no method at all.
I would be most obliged if you could please let me know how to create the new variable no_method.
Kind Regards
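A minimal sketch of one possible approach, assuming the method variables are strings coded "yes"/"no" with the names method1-method5 (adjust to your actual names, e.g. methods1):
Code:
* flag respondents who did not answer "yes" to any of the five methods
gen byte no_method = 1
foreach v of varlist method1-method5 {
    replace no_method = 0 if `v' == "yes"
}

* how many respondents used no method at all
count if no_method == 1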
Regression gives different results depending on the order of the independent variables
I am running what is essentially a difference in differences regression on a large dataset with a lot of fixed effects. Bizarrely (maddeningly, even), I get a slightly different coefficient on my main treatment effect depending on the order I provide the list of independent variables. The regression has individual level data, with county fixed effects, month fixed effects, and state-specific trends. The variable "treatment" is equal to 1 if a state-level policy has gone into effect in the person's state as of the current month. There is a separate set of trends for New York City, as NYC implemented its own policy.
Here's my code and output:
Code:
. local conditions if month<=695 & state_cd <= 56 & birthyear_last!=.

. qui xtreg success_14 treated i.month i.state#c.open_dt 1.NYC#c.open_dt i.state#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt i.state#c.open_dt#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt#c.open_dt sc RACE Hispanic EDUCATION Median_income i.bankrank birthyear_last `conditions' , fe vce(cluster state_cd)

. disp _b[treated]
.0064373

. qui xtreg success_14 treated sc RACE Hispanic EDUCATION Median_income i.bankrank i.month i.state#c.open_dt 1.NYC#c.open_dt i.state#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt i.state#c.open_dt#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt#c.open_dt birthyear_last `conditions' , fe vce(cluster state_cd)

. disp _b[treated]
.00803661
I realize those are long regression commands, but if you look closely you'll see that they both have the same list of variables, just in a different order. Both versions drop a few factor levels of i.month for collinearity, but they both drop the same ones. I get the same result using areg instead of xtreg. The regressions each take more than an hour to run, so trying different variations is cumbersome. The problem doesn't replicate if I use a random 0.5% subsample of my data. I'm running out of ideas here--anyone know what's going on? Really just want to know which version is more likely to be the "right" coefficient.
I'm running Stata/MP 15.1 on a Linux server with Red Hat 6.
Create categorical variable*
Hi! I'm new to Stata and in a stats class. I have a dataset that has individual race dummy variables. Is there a way for me to merge all of the dummy variables into one new categorical variable?
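A minimal sketch of one common approach, assuming the dummies are mutually exclusive 0/1 indicators with the hypothetical names white, black, asian and other:
Code:
* combine mutually exclusive dummies into one labelled categorical variable
generate byte race = .
replace race = 1 if white == 1
replace race = 2 if black == 1
replace race = 3 if asian == 1
replace race = 4 if other == 1
label define racelbl 1 "White" 2 "Black" 3 "Asian" 4 "Other"
label values race racelbl
tabulate race, missing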
tobit regression with collinearity
Hi, I am running a tobit regression on data across two years, 2007-2008.
My variables include 10 log price categories for alcohol types, on-trade and off-trade: l_p_wine_on, l_p_wine_off, etc.
I also have a log income variable: logincome
My dependent variables are the expenditure shares of each alcohol type, i.e. expenditure on that alcohol type divided by total expenditure: e.g. expshare_wine_on, expshare_wine_off
I am looking at how the own-price and cross-price elasticities of demand vary across each alcohol type, and across socio-economic groups, government regions and gender.
My prices for alcohols are constant throughout the year (I am using the average yearly price); however, they vary between years.
Here is a dataex for some of my variables:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float(l_p_wine_on l_p_beer_on l_p_spirits_on l_p_wine_off l_p_spirits_off > l_p_beer_off expshare_wine_on expshare_beer_off logincome) byte(socio_group > gor) int year byte sexhrp .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.433789 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .01142119 > 5.898746 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.898213 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0550356 .015000853 > 6.399842 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .0016348386 > 5.584012 3 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .015073973 > 7.020905 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.225338 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.911331 3 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.219934 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.533279 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.2492094 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .00609936 > 6.168564 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.835587 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.940566 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .006249688 > 5.331317 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.786775 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .003858888 > 7.201894 2 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.476967 2 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.009435 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .010382757 > 6.377679 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.982862 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.11283 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .0023888294 > 6.279646 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .001813489 > 6.294915 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .005435922 > 6.704463 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.747566 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .005957043 > 6.11456 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .014408222 .016718158 > 6.605068 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .018981254 0 > 6.019785 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.088818 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.779476 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .008590408 > 6.514719 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .005628793 .018012136 > 6.960443 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .005657709 > 6.424075 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.920457 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .008473212 0 > 6.898255 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .02177079 > 5.623837 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.812526 6 2 2007 2 .60158 -.010050327 .662688 
-.9416085 -.967584 -.967584 0 0 > 6.182973 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.514611 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.109314 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.362559 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.30903 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.26414 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.3593974 3 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .017695729 0 > 4.77104 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.069847 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .01336186 > 6.690271 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.80408 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.628306 5 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .016276948 > 6.522627 5 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.519619 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .035966147 > 6.422951 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.557673 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.602438 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.402017 3 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0029820926 0 > 7.401286 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .015186014 > 7.176426 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.746554 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.474176 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .01607261 > 6.874416 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .010095213 > 6.662046 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .024986824 .0423164 > 6.069906 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .001250104 0 > 7.438652 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .03693495 0 > 7.021414 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .02268917 > 6.5658 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.958667 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.192117 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0016070686 0 > 5.815264 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.34921 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.279 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.516609 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.554516 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.347932 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .01880577 0 > 5.93925 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .07133046 > 6.985651 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .005728897 .005415315 > 7.12227 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .01053234 .021376746 > 6.8088 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0019776237 0 > 6.519822 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.490757 6 2 2007 1 .60158 
-.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.787439 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.457868 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.921752 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 7.098411 2 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.400603 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .013181653 0 > 6.857086 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .003744323 0 > 6.710182 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.136498 6 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .02200635 0 > 7.438652 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0019496685 0 > 6.911319 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .0006786454 > 6.854755 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 7.438652 2 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 6.609726 1 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .004789272 0 > 6.868133 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.182907 4 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.823194 1 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 4.812526 6 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .04808098 > 5.530222 4 2 2007 2 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 > 5.793585 3 2 2007 1 .60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .11235794 > 6.436151 1 2 2007 2 end label values gor gor label def gor 2 "north west", modify label values sexhrp sexhrp label def sexhrp 1 "male", modify label def sexhrp 2 "female", modify
I am then running a tobit regression as follows. I have censored the data at zero, since some households report no consumption of alcohol.
Code:
tobit expshare_wine_on l_p_wine_on l_p_beer_on l_p_cider_on l_p_spirits_on l_p_alcopops_on l_p_wine_off l_p_beer_off l_p_spirits_off l_p_cider_off l_p_alcopops_off logincome i.socio_group i.gor i.year i.sexhrp , ll(0)
However my results are as follows:
Code:
. tobit expshare_wine_on l_p_wine_on l_p_beer_on l_p_cider_on l_p_spirits_on l_p_alcopops_on l_p_wine_off
> l_p_beer_off l_p_spirits_off l_p_cider_off l_p_alcopops_off logincome i.socio_group i.gor i.year i.sex
> hrp , ll(0)
note: l_p_beer_on omitted because of collinearity
note: l_p_cider_on omitted because of collinearity
note: l_p_spirits_on omitted because of collinearity
note: l_p_alcopops_on omitted because of collinearity
note: l_p_wine_off omitted because of collinearity
note: l_p_beer_off omitted because of collinearity
note: l_p_spirits_off omitted because of collinearity
note: l_p_cider_off omitted because of collinearity
note: l_p_alcopops_off omitted because of collinearity
note: 2008.year omitted because of collinearity
Refining starting values:
Grid node 0: log likelihood = -5976.9775
Fitting full model:
Iteration 0: log likelihood = -5976.9775
Iteration 1: log likelihood = -640.92644
Iteration 2: log likelihood = 1103.185
Iteration 3: log likelihood = 1808.8673
Iteration 4: log likelihood = 1909.2432
Iteration 5: log likelihood = 1910.562
Iteration 6: log likelihood = 1910.5625
Iteration 7: log likelihood = 1910.5625
Tobit regression Number of obs = 11,962
Uncensored = 2,312
Limits: lower = 0 Left-censored = 9,650
upper = +inf Right-censored = 0
LR chi2(14) = 927.97
Prob > chi2 = 0.0000
Log likelihood = 1910.5625 Pseudo R2 = -0.3207
-------------------------------------------------------------------------------------------
expshare_wine_on | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
l_p_wine_on | -.028118 .0162136 -1.73 0.083 -.0598992 .0036632
l_p_beer_on | 0 (omitted)
l_p_cider_on | 0 (omitted)
l_p_spirits_on | 0 (omitted)
l_p_alcopops_on | 0 (omitted)
l_p_wine_off | 0 (omitted)
l_p_beer_off | 0 (omitted)
l_p_spirits_off | 0 (omitted)
l_p_cider_off | 0 (omitted)
l_p_alcopops_off | 0 (omitted)
logincome | .0125922 .0006706 18.78 0.000 .0112778 .0139066
|
socio_group |
2 | .0014811 .0010997 1.35 0.178 -.0006745 .0036368
3 | -.0078991 .0012672 -6.23 0.000 -.0103829 -.0054152
4 | -.0098159 .003836 -2.56 0.011 -.0173351 -.0022968
5 | .0065436 .0035439 1.85 0.065 -.0004031 .0134903
6 | -.0027114 .0010429 -2.60 0.009 -.0047556 -.0006672
|
gor |
north west | -.0004291 .0014892 -0.29 0.773 -.0033481 .00249
merseyside | -.0009579 .0014654 -0.65 0.513 -.0038303 .0019145
yorkshire and the humber | .0017352 .0014609 1.19 0.235 -.0011284 .0045987
east midlands | -.0012567 .0021903 -0.57 0.566 -.00555 .0030366
west midlands | -.0024181 .0018331 -1.32 0.187 -.0060112 .0011751
eastern | -.0016493 .0017609 -0.94 0.349 -.005101 .0018023
|
year |
2008 | 0 (omitted)
|
sexhrp |
female | .0011694 .000803 1.46 0.145 -.0004046 .0027435
_cons | -.0853166 .0110156 -7.75 0.000 -.106909 -.0637242
--------------------------+----------------------------------------------------------------
var(e.expshare_wine_on)| .0007865 .0000268 .0007357 .0008408
------------------------------------------------------------------------------------------
Q1. Why are my price variables, other than the price of the same dependent variable, omitted (as I am trying to work out cross-price elasticities of demand)? I understand it is due to collinearity, but what is causing this and how do I overcome it?
Q2. Why is the year dummy variable omitted?
I am following a model which has done close to the same thing, and they didn't have this problem.
Thanks so much in advance
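One thing worth checking, given that the prices are yearly averages: with only two years in the data, each log price takes a single value per year and is therefore an exact linear function of the 2008 dummy. Once one price and the constant are in the model, the remaining prices and the year dummy carry no independent variation, which is consistent with the omissions in the output above. A quick way to see this, using the variable names from the post:
Code:
* each price should show exactly one distinct value per year if prices are yearly averages
tabulate year, summarize(l_p_wine_on)

* with two years, any two year-constant variables are perfectly correlated
quietly tabulate year, generate(yr)
correlate l_p_wine_on l_p_beer_on l_p_spirits_on l_p_wine_off yr2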
Logic for using matched sample versus unmatched sample
Hi all,
I am stuck on a problem of when to use a matched versus unmatched sample. I have a list of companies and I am trying to figure out the effect of a treatment (a policy change, in this case) on company financial performance (the dependent variable). I will use difference-in-differences for the analysis.
I do not know when (generally speaking) a matched sample design is better/worse than an unmatched sample design. To my understanding, matching "controls" for confounding variables before running regression models. Alternatively, using an unmatched sample, one would add the control variables into the regression model to "control" for these factors.
But broadly speaking, can someone please help me decide if I should use a matched sample with this case (pros/cons) and if so/not, why?
Roger C.
no observations error event study
Dear Stata community,
I'm trying to run a loop to generate the normal returns for my event study; however, I run into the following error: no observations, r(2000).
A sample of the data that I'm using:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long(firmid date) int eventtime double(return marketreturn) 10104 20150928 -171 -.027066 -.025666 10104 20150929 -170 -.000902 .001233 10104 20150930 -169 .019445 .019076 10104 20151001 -168 .001948 .001974 10104 20151002 -167 .015204 .014315 10104 20151005 -166 .018788 .01829 10104 20151006 -165 -.003079 -.003588 10104 20151007 -164 .009817 .008036 10104 20151008 -163 .008534 .008818 10104 20151009 -162 .001437 .000725 10104 20151012 -161 -.000226 .001276 10104 20151013 -160 -.007799 -.006825 10104 20151014 -159 -.004298 -.004716 10104 20151015 -158 .014726 .014853 10104 20151016 -157 .003398 .00457 10104 20151019 -156 -.000676 .000271 10104 20151020 -155 -.001174 -.001421 10104 20151021 -154 -.00803 -.005825 10104 20151022 -153 .013958 .016628 10104 20151023 -152 .009512 .01103 10104 20151026 -151 -.002833 -.001913 10104 20151027 -150 -.004484 -.002554 10104 20151028 -149 .013662 .01184 10104 20151029 -148 -.00221 -.00045 10104 20151030 -147 -.004013 -.00481 10104 20151102 -146 .012462 .011874 10104 20151103 -145 .003109 .002728 10104 20151104 -144 -.003391 -.003545 10104 20151105 -143 -.001398 -.001132 10104 20151106 -142 -.000379 -.000348 10104 20151109 -141 -.009617 -.009823 10104 20151110 -140 .00129 .001511 10104 20151111 -139 -.004136 -.003228 10104 20151112 -138 -.014517 -.01399 10104 20151113 -137 -.009785 -.011207 10104 20151116 -136 .013755 .014903 10104 20151117 -135 -.001368 -.001339 10104 20151118 -134 .015289 .016162 10104 20151119 -133 -.000995 -.001123 10104 20151120 -132 .003012 .00381 10104 20151123 -131 -.000676 -.001235 10104 20151124 -130 .002372 .001222 10104 20151125 -129 .001165 -.000129 10104 20151127 -128 .00078 .000594 10104 20151130 -127 -.003856 -.004641 10104 20151201 -126 .009524 .010681 10104 20151202 -125 -.010729 -.010996 10104 20151203 -124 -.01433 -.014374 10104 20151204 -123 .016455 .020526 10104 20151207 -122 -.009795 -.00699 10104 20151208 -121 -.006114 -.00649 10104 20151209 -120 -.0066 -.007739 10104 20151210 -119 .00205 .002251 10104 20151211 -118 -.019851 -.019423 10104 20151214 -117 .001597 .004756 10104 20151215 -116 .011313 .010619 10104 20151216 -115 .014906 .014515 10104 20151217 -114 -.014615 -.015041 10104 20151218 -113 -.015068 -.017797 10104 20151221 -112 .007152 .007778 10104 20151222 -111 .008839 .008817 10104 20151223 -110 .013685 .012418 10104 20151224 -109 -.000831 -.001599 10104 20151228 -108 -.003328 -.002179 10104 20151229 -107 .010021 .01063 10104 20151230 -106 -.007376 -.007217 10104 20151231 -105 -.008202 -.009412 10104 20160104 -104 -.0149 -.015304 10104 20160105 -103 .001397 .002012 10104 20160106 -102 -.013822 -.013115 10104 20160107 -101 -.02396 -.0237 10104 20160108 -100 -.010925 -.010838 10104 20160111 -99 -.001716 .000853 10104 20160112 -98 .006078 .007803 10104 20160113 -97 -.025859 -.024965 10104 20160114 -96 .015461 .016696 10104 20160115 -95 -.021697 -.021599 10104 20160119 -94 -.002418 .000532 10104 20160120 -93 -.010609 -.011694 10104 20160121 -92 .005685 .005195 10104 20160122 -91 .022497 .020284 10104 20160125 -90 -.016696 -.015638 10104 20160126 -89 .015938 .014144 10104 20160127 -88 -.010554 -.010863 10104 20160128 -87 .00508 .005529 10104 20160129 -86 .024569 .02476 10104 20160201 -85 -.000359 -.000443 10104 20160202 -84 -.020012 -.018743 10104 20160203 -83 .005759 .004992 10104 20160204 -82 .003651 .001527 10104 20160205 -81 -.019618 -.018481 10104 20160208 -80 -.016957 -.014154 10104 20160209 -79 -.003064 -.000664 10104 20160210 
-78 .000261 -.000189 10104 20160211 -77 -.011979 -.012301 10104 20160212 -76 .020146 .019518 10104 20160216 -75 .017685 .016517 10104 20160217 -74 .018118 .01648 10104 20160218 -73 -.004207 -.004666 10104 20160219 -72 -.000227 -.000026 end
My current code is looking as follows:
Code:
su firmid
local N `r(max)'
//Generate normal returns variable
gen NR_MMM =.
//Creating a loop to generate normal returns
forvalues i = 1/`N'{
display `i'/`N'
** Market Model method (MMM)
quietly reg return marketreturn if (firmid==`i' & eventtime<=-1)
quietly predict r if firmid==`i'
quietly replace NR_MMM = r if firmid==`i'
quietly drop r
}
gen AR_MMM = return - NR_MMM
Thanks in advance
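The r(2000) error most likely comes from the loop bounds rather than the regression itself: summarize firmid returns the largest firmid (here on the order of 10104), so forvalues i = 1/`N' also tries many firmid values that do not exist in the data, and reg then finds no observations. A minimal sketch that loops only over the firm ids actually present:
Code:
//Generate normal returns variable
gen NR_MMM = .
//Loop over the distinct firmid values instead of 1/`N'
levelsof firmid, local(firms)
foreach f of local firms {
    ** Market Model method (MMM), estimated on the pre-event window
    quietly reg return marketreturn if firmid == `f' & eventtime <= -1
    quietly predict r if firmid == `f'
    quietly replace NR_MMM = r if firmid == `f'
    quietly drop r
}
gen AR_MMM = return - NR_MMM
Alternatively, egen newid = group(firmid) creates a consecutive 1, 2, ..., N identifier that works with the original forvalues loop.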
GMM estimation inside a do loop with foreach
Dear Friends,
I am trying to use GMM to estimate a model that is nonlinear in its parameters.
The model has 3 parameters: rho, sigma1, and sigma2.
I would like to obtain model coefficients, standard errors, and the value of the objective function for different combinations of starting values.
To accomplish this, I set up three do loops, one within the other, and let rho vary according the counter "j", sigma1 vary according to the counter, "k", and sigma2 vary according to the counter "l."
The code is below. The code works. However, I have two questions:
a. How can I get the values of j, k, and l to be displayed after each step in the loop?
For example, when I include display j in the code below, I get "j not found" and the program exits.
b. Sometimes GMM does not converge and I get the message "flat or discontinuous region encountered", and the program exits the do loop. Is there some way I can include
a line which causes the program to go to the next loop increment, rather than exiting? In Fortran, I could do this with an error-handling condition.
Thanks so much!
Srinivasan Rangan
Code:
foreach j in 0.1 0.2 0.3 0.4 0.5 {
foreach k in 5 10 {
foreach l in 5 10 {
gmm (Eq1:abn_ret2_unwinsored_stage1_w - (1 + {rho}*{sigma1})*ue_p_scaled_centered_w - ({sigma2})*(sqrt(1-{rho}^2))*tq2_w_centered) if sample_to_use == 3, instruments(ue_p_scaled_centered_w tq2_centered_w lag2mret_w lag2chusd_w) winitial(identity) vce(cluster cnum) from(rho `j' sigma1 `k' sigma2 `l') technique(bfgs) wmatrix(unadjusted)
mat list e(b)
mat list e(V)
display e(Q)
}
}
}
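For what it's worth, both issues have standard fixes in Stata: foreach counters are local macros, so they are referenced with macro quotes as `j' rather than j, and wrapping the estimation in capture noisily lets the loop carry on (while still showing the output) when gmm stops with "flat or discontinuous region encountered". A sketch built on the same loop:
Code:
foreach j in 0.1 0.2 0.3 0.4 0.5 {
    foreach k in 5 10 {
        foreach l in 5 10 {
            * foreach counters are locals, so quote them: `j', `k', `l'
            display "starting values: rho = `j', sigma1 = `k', sigma2 = `l'"
            * capture noisily shows gmm's output but returns control to the loop if gmm errors out
            capture noisily gmm (Eq1:abn_ret2_unwinsored_stage1_w - (1 + {rho}*{sigma1})*ue_p_scaled_centered_w - ({sigma2})*(sqrt(1-{rho}^2))*tq2_w_centered) if sample_to_use == 3, instruments(ue_p_scaled_centered_w tq2_centered_w lag2mret_w lag2chusd_w) winitial(identity) vce(cluster cnum) from(rho `j' sigma1 `k' sigma2 `l') technique(bfgs) wmatrix(unadjusted)
            if _rc {
                display "no convergence for this combination -- moving on"
                continue
            }
            mat list e(b)
            mat list e(V)
            display e(Q)
        }
    }
}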
cross classified multilevel structure
Dear Stata Forum,
I am using mixed to estimate a cross-classified multilevel model with three levels, where I have approximately 100,000 observations divided into 12 cohorts cross-classified with 8 periods, both nested in 9 countries. When I run the code:
Code:
mixed DV IV1 i.IV2 || _all: R.cohort || period: || country: , variance
------------------------------------------------------------
| No. of Observations per Group
Group Variable | Groups Minimum Average Maximum
----------------+--------------------------------------------
_all | 1 114,788 114,788.0 114,788
essround | 8 13,509 14,348.5 14,887
country | 72 1,164 1,594.3 2,289
it seems Stata also cross-classifies the country and period variables (8 periods * 9 countries = 72 groups), instead of simply estimating country as a third level above the second (cohort-period cross-classified) level.
What would be the right way to write the syntax so that Stata would not cross-classify country with period?
Thank you very much in advance for your answers!
Idiosyncratic Volatility - Rolling Window
Hello everyone!
I am trying to compute idiosyncratic volatility on a rolling basis of 24 months with monthly data. I have created a month/year variable called ymdate.
This has been my code so far:
bys perm : asreg Excess_USD_w MKT SMB HML, wind(ymdate 24)
gen residuals = Excess_USD_w - _b_cons - _b_MKT*MKT - _b_SMB*SMB - _b_HML*HML
bys perm: egen IVOL=sd (residuals)
In the end, I want to look at each year. However, my min and max values are the same for multiple years, not just for the two years covered by my rolling window. In addition, my mean value just keeps on increasing.
Would anyone know what I have done wrong? Thank you very much in advance!
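It looks like the last line (bys perm: egen IVOL = sd(residuals)) computes a single standard deviation over each firm's entire series, so IVOL is constant within a firm, which would explain the identical minima and maxima across years. If the goal is one IVOL per 24-month window, asreg may be able to return window-specific residuals or RMSEs directly (see help asreg); otherwise, here is a slow but dependency-free sketch using only built-in commands, assuming perm and ymdate uniquely identify observations and ymdate is a Stata monthly date:
Code:
* for each firm-month, fit the factor model over the previous 24 months
* and store the residual standard deviation as that month's IVOL
gen double IVOL24 = .
quietly levelsof perm, local(firms)
foreach f of local firms {
    quietly levelsof ymdate if perm == `f', local(months)
    foreach m of local months {
        capture reg Excess_USD_w MKT SMB HML if perm == `f' & inrange(ymdate, `m' - 23, `m')
        if _rc == 0 & e(N) >= 24 {
            quietly predict double res if e(sample), residuals
            quietly summarize res
            quietly replace IVOL24 = r(sd) if perm == `f' & ymdate == `m'
            drop res
        }
    }
}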
Mean if respondents have answered at least 2 out of 3 variables
Hi, I would really be grateful if anyone can help with providing code for this.
I have 3 variables relating to the same thing. I was wondering what code I can use to calculate the mean of these variables when the respondent answered at least 2 of them. Many thanks
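A minimal sketch using egen's row functions, assuming the three variables have the hypothetical names q1, q2 and q3:
Code:
* count how many of the three items each respondent answered
egen byte n_answered = rownonmiss(q1 q2 q3)

* row mean of the answered items, but only where at least 2 of the 3 were answered
egen mean_q = rowmean(q1 q2 q3) if n_answered >= 2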
New to STATA | Linear regression on STATA
Hi,
I am new to STATA and I am trying to do a linear regression analysis for a college project.
I am getting the error message, "matrix not positive definite" when I run the 'reg' command
Further, when I run the 'vif' command to check for multicollinearity the error message reads, "not appropriate after regress, nocons;
use option uncentered to get uncentered VIFs"
Please help
cross-correlations in long format
Hi guys, I am currently working on a dataset in long format and I need the average cross-correlation of all variables. I have tried to convert the dataset from long to wide using reshape, but Stata returns 'values of variable t not unique within firmid'. I also tried a loop with the command xcorr, but I cannot find the correct way to do it. Can someone please help me?
The dataset looks like this:
firmid t return
1 -20 0,02
1 -19 -0,1
1 -18 0,014
1 -17 etc.
2 -20
2 -19
2 -18
2 -17
etc...
So I need the average cross-correlation at time -20, the average cross-correlation at time -19 etc stored as a new variable.
Huub
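The reshape error means that some firmid-t combinations occur more than once, so the first step is to find and resolve those duplicates; after that, the data can go wide so that each firm's return series sits in its own column. A sketch (how the resulting firm-by-firm correlations should be averaged into an "average cross-correlation at time t" depends on exactly what that average is meant to be):
Code:
* find the firmid-t combinations that block the reshape
duplicates report firmid t
duplicates tag firmid t, generate(dup)
list firmid t return if dup > 0, sepby(firmid)

* once firmid-t pairs are unique, put each firm's returns in its own column
reshape wide return, i(t) j(firmid)

* inspect the firm-by-firm correlation matrix of the return series
correlate return*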
Converting XY coordinates into lat-long
Hi,
I have a data set that records the pick-up and drop-off locations of a particular item. The coordinates of these locations are in the form of XY coordinates, and I would like to calculate the distance, in kms, between the two locations using "geodist." To do the same, I have to convert the XY coordinates to Lat-Long coordinates. Appreciate any help to figure out how to convert XY coordinates to Lat-Long coordinates.
Thank you!
Independence assumption in Cross-Classified Multilevel Models?
Dear everyone,
In cross-classified (multilevel) models, the cross-classified variance components are generally assumed to be independent. Does anyone know of any (published or unpublished) discussion of this assumption or how deviations from independence may affect the results?
Best regards,
Are Skeie Hermansen
University of Oslo
Monday, April 29, 2019
How to choose which duplicates to drop/keep?
Hi there,
I'm working on data from 2012-2018 concerning HIV-positive pregnant women in treatment and loss to follow-up. For those who have been in treatment more than once in relation to having more children, their ID occurs two, three or four times. The information attached to the duplicate IDs includes, among other things, the start and end date of treatment (some end dates are missing). I need to keep the "latest" IDs and date variables for those that occur more than once, in order to trace their latest contact with the clinic (among those where the end date is missing) and register whether or not they are LTFU. Each woman is only supposed to occur once during the study period. If I make a simple "duplicates drop idp, force", I will have to use the start date of their first treatment and their latest contact with the clinic from later visits to calculate their follow-up time, which will then be too long.
Any thoughts? (I'm using Stata 14 on a Mac)
Thank you!
Best regards, Laura
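A minimal sketch of one way to keep each woman's most recent record while carrying along her first treatment start date, assuming idp identifies the woman and start_date is a Stata date variable (a hypothetical name; substitute your own):
Code:
* copy the earliest start date onto every record for the same woman
bysort idp (start_date): gen first_start = start_date[1]
format first_start %td

* then keep only the most recent record per woman
bysort idp (start_date): keep if _n == _N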
Replacing values of multiple variables at once
I have 9 variables (v1-v9) with binary responses coded as yes = 1, no = 2.
I want to replace all the values of no (2) with 0 for all variables at once.
Can anyone help me with the code?
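A minimal sketch, assuming v1-v9 are numeric and coded yes = 1, no = 2:
Code:
* turn every 2 (no) into 0 across all nine variables in one go
recode v1-v9 (2 = 0)
Or, equivalently, with an explicit loop:
Code:
foreach v of varlist v1-v9 {
    replace `v' = 0 if `v' == 2
}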