Tuesday, April 30, 2019

Esttab Output File not generated properly

I am writing the following code:

local variables= "taxratesales taxrateVAT"
estpost tabstat `variables', statistics(mean p50 sd min max n) columns(statistics)
esttab using temp, replace cells("Mean Median SD Min Max N") nomtitle nonumber


The Stata Results window shows me the following output:


Summary statistics: mean p50 sd min max count
for variables: taxratesales taxrateVAT

| e(mean) e(p50) e(sd) e(min) e(max) e(count)
-------------+------------------------------------------------------------------
taxratesales | 8.288876 8 4.568087 0 115 88935
taxrateVAT | 7.30818 5 4.8678 0 65 93709


. esttab using temp, replace cells("Mean Median SD Min Max N") nomtitle nonumber
(note: file temp.txt not found)
(output written to temp.txt)


But when I open the .txt file that is generated, I see the following:

------------------------------------------------------------------------------------------
Mean Median SD Min Max N
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
N 85992
------------------------------------------------------------------------------------------


Can anyone explain why this happened?
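For reference, I suspect cells() wants the names of the statistics stored by estpost tabstat (the e() rows shown above: mean, p50, sd, min, max, count) rather than display labels; something like this is what I have in mind (untested), with column headers changed via collabels() if needed:

Code:
esttab using temp, replace cells("mean p50 sd min max count") nomtitle nonumber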

Thanks

marginal effect of latent class logit model

I am trying to obtain the marginal effect of the membership variable in a latent class logit model (output attached).

(e.g., if age increases, what would be the probability of being in class 1 and class 2?) I can't find any command for this.

Propensity score matching

Hi all,

I am currently trying to run PSM in Stata 14 to compare the effect of treatment 1 vs treatment 2 (control). I have a few questions before attempting the analysis:

1. How do I start the matching? Or am I right to say the -teffects- command would match treatment and control groups (see the sketch after this list)?
2. To work out the standardised difference, I presume I have to work out the means for each covariate prior to matching and then run the -pstest- command to obtain the means and standardised bias/difference post-matching?
3. After the analysis, for example with -teffects-, an unmatched group is displayed. However, I am more interested in the control group. Are they the same? If not, how do I obtain that?
4. How do I obtain the number of observations in both treatment and control groups post-matching?
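A rough sketch of what I have in mind for question 1 (the outcome and covariate names are placeholders):

Code:
teffects psmatch (outcome) (treatment age sex baseline_score), atet
* for question 2, I believe -tebalance summarize- reports standardised differences
* after -teffects- (with psmatch it may require teffects' generate() option)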

Sorry for all the questions. I am relatively new to Stata, so I would appreciate any guidance you can give me.

Regards
Ken

Understanding Stata's default yscale() choice

Hi everyone,

I am trying to find out how Stata chooses the exact default range of the y-axis. The article "Stata tip 23: Regaining control over axis range" (The Stata Journal (2005), 5, Number 3, pp. 467-468) notes that "to determine the range of an axis, Stata begins with the minimum and maximum of the data. Then it will widen (but never narrow) the axis range as instructed by range(). Finally, it will widen the axis if necessary to accommodate any axis labels."

Does anyone have additional insights on how Stata determines how much white space to leave below the minimum and above the maximum respectively for a simple plot command without additional user-specifications?

Many thanks,
Christina


Error message on assigning different methods of obtaining start values for latent class analysis

Hi all,

I am running a 3-class latent class model on 17 categorical indicators and keep getting error messages. The total sample size is 2771.

Initially I ran the following code:
gsem (gn10_category gn11_category gn12_category gn13_category gn16_category gn18_category gn19_category gn20_category gn21_category gn22_category gn23_category gn25_category gn27_category gn39_category gn40_category gn41_category gn44_category <-) if complete == 1, ologit lclass(C 3)
And got the error message "initial value not feasible". Then I went back to the Stata manual ([SEM] intro 12 — Convergence problems and how to solve them) and learned to use the -startvalues()- option to request a different method for obtaining initial values. Then I typed the following code:
gsem (gn10_category gn11_category gn12_category gn13_category gn16_category gn18_category gn19_category gn20_category gn21_category gn22_category gn23_category gn25_category gn27_category gn39_category gn40_category gn41_category gn44_category <-) if complete == 1, ologit lclass(C 3) startvalues(iv)
But I got the error message "option startvalues() invalid; method iv not allowed". I could not find the syntax error in my code. I would greatly appreciate any clues to help resolve the issue.
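For reference, my understanding is that startvalues(iv) applies only to models without lclass(); for latent class models the documented methods are things like randomid, randompr, classid, classpr, and jitter(), so I may try something along these lines (untested):

Code:
gsem (gn10_category gn11_category gn12_category gn13_category gn16_category ///
      gn18_category gn19_category gn20_category gn21_category gn22_category ///
      gn23_category gn25_category gn27_category gn39_category gn40_category ///
      gn41_category gn44_category <-) if complete == 1, ///
      ologit lclass(C 3) startvalues(randomid, draws(5) seed(15))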

Many thanks in advance.

Mengmeng

Adding controls, insignificant treatment var turns significant

Dear friends,

I am running a fixed-effects Poisson regression to test the effect of a policy on firms' patent applications. The outcome variable is the number of patents.
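For reference, the command behind the first set of results below is of this form (a reconstruction from the output, not the verbatim command):

Code:
xtset firm_id application_year
xtpoisson application_num treated i.application_year, fe vce(robust)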

I first run the simple model with FEs but without any controls:

Code:
Conditional fixed-effects Poisson regression    Number of obs     =     47,536
Group variable: firm_id                         Number of groups  =      7,670

                                                Obs per group:
                                                              min =          2
                                                              avg =        6.2
                                                              max =          9

                                                Wald chi2(9)      =     469.47
Log pseudolikelihood  = -36500.224              Prob > chi2       =     0.0000

                                    (Std. Err. adjusted for clustering on firm_id)
----------------------------------------------------------------------------------
                 |               Robust
 application_num |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
         treated |  -.0541731   .2172701    -0.25   0.803    -.4800147    .3716685
                 |
application_year |
           1999  |   .5463735   .1321019     4.14   0.000     .2874586    .8052884
           2000  |   1.042662   .2174868     4.79   0.000     .6163956    1.468928
           2001  |    1.35629   .3345565     4.05   0.000     .7005712    2.012008
           2002  |   2.180516   .3422223     6.37   0.000     1.509772    2.851259
           2003  |   2.626014   .3407459     7.71   0.000     1.958164    3.293864
           2004  |   2.883888   .3663026     7.87   0.000     2.165948    3.601827
           2005  |   3.217897   .3785679     8.50   0.000     2.475918    3.959877
           2006  |   3.594803   .4207512     8.54   0.000     2.770146     4.41946
----------------------------------------------------------------------------------
Now the Treated variable is statistically insignificant.
I then add controls:

Code:
Conditional fixed-effects Poisson regression    Number of obs     =     45,948
Group variable: firm_id                         Number of groups  =      7,583

                                                Obs per group:
                                                              min =          2
                                                              avg =        6.1
                                                              max =          9

                                                Wald chi2(13)     =    1555.79
Log pseudolikelihood  = -22604.768              Prob > chi2       =     0.0000

                                    (Std. Err. adjusted for clustering on firm_id)
----------------------------------------------------------------------------------
                 |               Robust
 application_num |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
         treated |   .1731523   .0770954     2.25   0.025     .0220481    .3242564
total_assets_log |    .324462   .0806509     4.02   0.000     .1663891    .4825349
total_profit_log |  -.6414265   .1401608    -4.58   0.000    -.9161365   -.3667165
  cum_claims_log |   1.192533   .0713113    16.72   0.000     1.052765      1.3323
         age_log |  -.1982098   .1659591    -1.19   0.232    -.5234837    .1270642
                 |
application_year |
           1999  |   .2394592   .0855993     2.80   0.005     .0716876    .4072308
           2000  |   .2442759   .0859471     2.84   0.004     .0758226    .4127292
           2001  |   .1151367   .1044195     1.10   0.270    -.0895217    .3197951
           2002  |   .1297499   .1482964     0.87   0.382    -.1609057    .4204054
           2003  |   .0002627   .1531101     0.00   0.999    -.2998277     .300353
           2004  |  -.2282316   .1756614    -1.30   0.194    -.5725216    .1160585
           2005  |  -.4450677   .2097618    -2.12   0.034    -.8561931   -.0339422
           2006  |  -.7040352   .2517939    -2.80   0.005    -1.197542   -.2105283
----------------------------------------------------------------------------------
As you can see, after adding controls, the Treated variable turns statistically significant.

Can anyone help me understand why this is the case? I know that adding controls often turns a significant variable insignificant, as the controls can absorb some explanatory power. But I just can't figure out why, in my case, an insignificant variable becomes significant once controls are added.

And which result should I trust? Does the policy really have a significant impact on firms' patent applications?

Thank you very much!

CAPM OLS event study: calculating CARs for multiple companies at the same time

Hello Stata friends,

I am conducting a CAPM event study with 34 events and eight different companies. I would like to compare the different CARs for each of the 34 events.
All the data to compute the CARs is in Stata. The events are numbered from 1 to 34.


Is there any convenient and elegant way to compute this?
Is there a possibility to compute at least the CAR for all eight companies per event?
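For reference, a sketch of what I have in mind, with placeholder variable names (event numbered 1-34, firmid, eventtime, and abnormal returns AR):

Code:
sort event firmid eventtime
by event firmid: generate CAR = sum(AR)          // running CAR within each firm-event
by event firmid: generate CAR_window = CAR[_N]   // CAR over the full event window
* or keep one row per firm-event:  collapse (last) CAR, by(event firmid)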


Any help would be highly appreciated and will lead to infinite thankfulness.

All the best,
Lulas

Time Dummy Variables in stcox

Hi,

I am analyzing the factors that affect the time-on-market when selling a house using -stcox-.

Because my data set includes houses that were listed anytime between 2012 and 2018, I'd like to allow the baseline hazard to vary for each year. I considered stratifying the regression by the -strata()- option but as part of my research I want to observe the "time" effects.

I then constructed dummy variables for each year (2012 = 0) and ran -stcox-, but got a very low hazard ratio for the last year (2018).

My questions are:
  1. Is my approach to include the time dummy variables correct?
  2. Is there any reason why I got such a low hazard ratio for 2018? Could it be related to the fact that my data set includes right-censored observations?
Here is my code:
Code:
stcox log_size log_price house_age i.year

         failure _d:  isSold == 1
   analysis time _t:  NumOfMntsOnMarket
                 id:  HouseID

Iteration 0:   log likelihood = -48285.125
Iteration 1:   log likelihood = -47533.344
Iteration 2:   log likelihood = -47472.034
Iteration 3:   log likelihood = -47466.515
Iteration 4:   log likelihood = -47466.406
Iteration 5:   log likelihood = -47466.406
Refining estimates:
Iteration 0:   log likelihood = -47466.406

Cox regression -- Breslow method for ties

No. of subjects =         6925                     Number of obs   =      6925
No. of failures =         5921
Time at risk    =        18419
                                                   LR chi2(9)      =   1637.44
Log likelihood  =   -47466.406                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    log_size |   .7465455   .0093898   -23.24   0.000     .7283668    .7651779
   log_price |    .732612   .0354403    -6.43   0.000     .6663416    .8054732
   house_age |   1.001557   .0006235     2.50   0.012     1.000336     1.00278
             |
        year |
       2013  |   1.186201     .04516     4.49   0.000      1.10091    1.278099
       2014  |   1.173349   .0468697     4.00   0.000      1.08499    1.268904
       2015  |   1.263186   .0541346     5.45   0.000     1.161418    1.373872
       2016  |   1.123802   .0608042     2.16   0.031      1.01073    1.249524
       2017  |   .4400602    .030792   -11.73   0.000     .3836644    .5047458
       2018  |    .151731     .01692   -16.91   0.000     .1219422    .1887968
------------------------------------------------------------------------------

Why is the mixed command slow? Can it be sped up?

I'm attempting to do a simulation comparing results from a random-effects model with a random intercept to some other regression models, but the -mixed- command slows down my simulation too much to be useful. In my experience, running mixed-effects commands in SAS is relatively quick, and the lmer function in R is relatively quick. Why is -mixed- so slow, and can it be sped up?

For context, I'm using Stata 14. I have a very simple model that I run with "mixed y T || cluster_id:".
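One workaround I'm considering, since a single random intercept can also be fit by maximum likelihood with -xtreg- (which may be faster for this special case), is roughly:

Code:
xtset cluster_id
xtreg y T, mle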

Thanks.

Population Variable Significance

I am currently analysing a log-lin model, in which I am adding the independent variable LN Pop to test its significance within the regression model.
The regression output in Stata shows a p-value above the 0.05 significance level, so I do not reject the null hypothesis, implying LN Pop has significance.
However, the Stata output for the -lincom- command involving LN DPI and LN Pop (where DPI is consumers' real income = total personal expenditure + real savings) gives a p-value of 0.000.

The null hypothesis I am testing is: H0: coefficient on LN Pop + income elasticity (coefficient on LN DPI) = 1.
The Stata output says I should reject the null, with population having no significance. I am asking for help on how population can be both significant and insignificant, what the implication of this is, and whether a further regression is needed using per capita variables rather than aggregates.
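For reference, a sketch of how I set up the test (the variable names here are placeholders):

Code:
regress ln_expenditure ln_dpi ln_pop
lincom ln_dpi + ln_pop - 1        // H0: _b[ln_dpi] + _b[ln_pop] = 1
* equivalently: test ln_dpi + ln_pop = 1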

Apologies for any confusion in the question, I am new and unfamiliar to stata and econometric regression, any help is appreciated.

Thanks,

J

Regression on unbalanced panel

Hi,

I'm currently working on the impact of risk-taking behavior on firm growth, and here is my panel (the numbers are not real):
NAME YEAR GROWTH INCOME AGE SIZE RISK PERFORMANCE
A 2000 12 8 6 6 3 4
A 2001 14 6 7 6 2 3
A 2002 15 9 7 6 2 2
A 2003 16 5 6 4 3 3
B 2000 14 3 4 3
B 2001 17 2 3 4
B 2002 13 5 2 2
B 2003 12 3 5 9
C 2000 22 2 6 3
C 2001 17 7 7 4
C 2002 22 4 4 5
C 2003 34 4 3 7
My panel is unbalanced.

According to XU Peng's paper, Risk taking and firm growth
RISK: the standard deviation of EBITDA(t)/Assets(t) over 4 years.
Performance: the sum of EBITDA(t)/Assets(t) over the 4 years 2000-2003.

So I calculated EBITDA/Assets of each firm each year.
Performance of 2000= sum(Ebitda/assets firm A 2000, ebitda/assets firm B 2000 and so on)
RISK 2000= stdev.p(Ebitda/assets firm A 2000, ebitda/assets firm B 2000 and so on)

This gives 4 rows, i.e., 4 years of RISK and PERFORMANCE.

So my questions are:
1. Did I calculate RISK and PERFORMANCE correctly?
2. How can I run these regressions (see the sketch after this list)?
growth = risk + control variables
performance = risk + control variables
risk = age + size + ownership + leverage + income
3. How about the Wooldridge autocorrelation test, the White test and the variance inflation factor (VIF) test: can they be run as usual?
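A sketch for question 2, assuming one row per firm-year with the variables named as above:

Code:
encode NAME, generate(firm_id)
xtset firm_id YEAR
xtreg GROWTH RISK INCOME AGE SIZE, fe vce(robust)
xtreg PERFORMANCE RISK INCOME AGE SIZE, fe vce(robust)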

I'm a beginner and am doing this research as a requirement, so I don't know much about this. Thanks for your time; I appreciate any help.

Principal Component Analysis Index

Hi everyone.

I am working with data from 126 schools in rural Angola. I want to create an index for school infrastructure and use it in my regressions. My data looks as follows:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long school_id float pc15 byte chalkboards15 float bathrooms15 int I_desks15 byte I_classrooms15
  1 0 17 1  21  9
  2 0 16 2  18  8
  3 0 21 0   5 13
  4 0  9 9   5  0
  5 1 10 0   6  5
  6 1 12 8  16 12
  7 1 11 4  15  9
  8 0  7 2 120  3
  9 1  2 2   2  2
 10 1 28 4  12 23
 11 0  6 1   9  6
 12 0 36 2   7  7
 13 1 13 4   4 11
 14 1  7 3   3  3
 15 0 13 1   6  2
 16 0 10 1   8 10
 17 0  8 0   5  3
 18 0 10 0   5  3
 19 0 34 2  17 13
 20 0  4 2   6  2
 21 0 25 0   3  3
 22 1 16 2  14 11
 23 0  5 0 100  3
 24 1  9 2   8  5
 25 0  5 0 119  3
 26 0 14 0   1  2
 27 0  4 0   3  2
 28 0  3 2 120  3
 29 0 12 4  20  9
 30 0  0 0   0  0
 31 0  2 0  82  2
 32 0  3 2   3  3
 33 0 20 1  14  9
 34 0 10 2  14  6
 35 0  8 2  15  8
 36 0  6 4   6  6
 37 0  6 2   3  5
 38 1 13 0   3 10
 39 0  7 2   5  5
 40 0 14 0 139  4
 41 0  8 0   2  1
 42 0  6 0   1  0
 43 0 15 2   2  3
 44 0  3 2   3  3
 45 0 13 2 300 13
 46 0  6 0   4  3
 47 0  7 2   5  6
 48 0  3 2   0  3
 49 0  3 2   4  3
 50 0  6 0   2  3
 51 0  6 2   3  3
 52 0  8 2   1  3
 53 0  5 2   4  3
 54 0  3 2   0  3
 55 0  9 0   3  3
 56 0  4 2   0  2
 57 0  7 2   3  3
 58 0  4 2  62  2
 59 0  5 2   2  3
 60 0  4 2   4  2
 61 0 12 2 404 11
 62 0  5 0   3  3
 63 0  2 2   1  2
 64 0  2 2   0  2
 65 0  8 2   4  2
 66 0  6 0   4  0
 67 0  6 1   3  2
 68 0  6 1   5  3
 69 0 12 2   9  3
 70 0  3 2   1  2
 71 0 10 2   6  5
 72 0  3 0   1  0
 73 0  5 2   4  3
 74 0  8 2   2  6
 75 0  6 0   3  0
 76 0  6 2 270  6
 77 0  6 0   2  0
 78 0  7 0   5  0
 79 0  9 0   0  1
 80 0  4 1   4  4
 81 1  8 2  25  7
 82 0  8 0   1  7
 83 0 16 2   8  3
 84 0  4 2   5  3
 85 0  6 2   1  5
 86 0  4 0 120  3
 87 0 18 4   8 18
 88 0 11 0   5  3
 89 0  9 2   9  7
 90 0  8 1   7  6
 91 0  6 2   6  6
 92 0 13 4  13 11
 93 0 14 4   7  4
 94 0  2 0  75  2
 95 0  7 0   3  2
 96 0  3 2   5  3
 97 0 10 2   6  8
 98 0  3 0 120  3
 99 0  8 2   7  4
100 0  2 0   0  1
end
I standardized all the measures of school infrastructure that I want to include, and I used the command -predict- in order to create my index. Some of the variables included are dummy variables, but since I standardized them all, they are all centered at zero. However, I am new to the concept of PCA and I am not sure what I am doing in Stata is correct. I am using the following code:

local measures "std_I_water15 std_I_electricity15 std_bathrooms15 std_I_chairs15 std_I_classrooms15"
pca `measures'
predict indexpca15


pca std_I_water15 std_I_electricity15 std_bathrooms15 std_I_chairs15 std_I_classrooms15

Principal components/correlation                 Number of obs    =        126
                                                 Number of comp.  =          5
                                                 Trace            =          5
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.6917      .398078             0.3383       0.3383
           Comp2 |     1.29363      .479866             0.2587       0.5971
           Comp3 |     .813761       .12548             0.1628       0.7598
           Comp4 |     .688281      .175654             0.1377       0.8975
           Comp5 |     .512627            .             0.1025       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors)

    ------------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5 | Unexplained
    -------------+--------------------------------------------------+-------------
    std_I_wat~15 |   0.2529    0.6278    0.4259    0.5311   -0.2802 |           0
    std_I_ele~15 |   0.5307    0.2801    0.3070   -0.6114    0.4146 |           0
    std_bathr~15 |   0.5189    0.0835   -0.6795    0.3800    0.3430 |           0
    std_I_cha~15 |   0.2406   -0.6324    0.5076    0.4074    0.3442 |           0
    std_I_cla~15 |   0.5720   -0.3472   -0.0702   -0.1837   -0.7166 |           0
    ------------------------------------------------------------------------------


Q1. Do I need to -rotate- the PCA; if yes, what is the interpretation of the rotation?

Q2. Once I run the code, the unexplained variance is always equal to zero; does this make sense, or am I doing something incorrect?

Q3. Would you suggest a different procedure to obtain the PCA index?


latent class analysis: Can Stata fit 2 latent variables (categorical) in one model

Hi Statalists,

I am trying to do a latent class analysis with 2 categorical latent variables within the same dataset and within the same model.

For example, latent variable c1 will be determined by v1-v4, while latent variable c2 will be determined by v5-v8. I wonder if Stata could handle this?

Any thought would be much appreciated!

Thank-you in advance!
Yingyi

importing a dataset with long variable names into Stata through loop statements

Dear Stata Users,

I am trying to import the dataset below into Stata from a csv file, but some variable names are too long and are separated by '/', so they come out as v23, v24, v25, etc. I seek guidance on how I can write a loop in Stata to import this dataset without the variables ending up named v23, v24, and so on. I would prefer to read each name from the right and stop at the first '/', picking that segment as the variable name. I believe this is possible with loops. I would appreciate any guidance on how to fix this; see the attached data format for guidance.
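A sketch of the kind of loop I have in mind (untested; the file name "survey.csv" and the header layout are assumptions):

Code:
import delimited using "survey.csv", varnames(nonames) clear   // read the header row as data
foreach v of varlist _all {
    local full = `v'[1]                                         // original long header text
    local name = substr("`full'", strrpos("`full'", "/") + 1, .)
    rename `v' `=strtoname("`name'")'                           // keep the text after the last "/"
}
drop in 1                                                       // drop the header row
destring, replace                                               // convert back to numeric where possible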

Thank you.
Collins

Interpreting main effects model with omitted interaction terms

Suppose I have two regressors, task availability (Xa) and task participation (Xp), and a DV Y. One can only participate in a task if it is available, but one can choose not to participate even if a task is available. The baseline model, Y ~ Xa, should give the total effect of task availability. Y ~ Xa + Xa*Xp adds the effect of participation. Now, if the interaction model is the true model, would Y ~ Xa be biased because of an omitted-variable problem? That is, can the total effect not be estimated from the baseline model?
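For reference, the standard omitted-variable algebra I have in mind, assuming the interaction model is the true one:

\[ Y = \beta_0 + \beta_1 X_a + \beta_2 (X_a X_p) + \varepsilon \]

\[ \operatorname{plim}\ \hat{\beta}_a^{short} = \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(X_a,\ X_a X_p)}{\operatorname{Var}(X_a)} \]

so the coefficient from the short regression combines the direct effect and the participation term.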

graph hbox qs

Hey guys,

I have a question about the relabel option; here is my command:

graph hbox age, over(morethan50k, relabel(1 "lots of money" 2 "less money")) over(sex, relabel(1 "male" 2 "female")) name(graph1, replace) nooutsides asy
My question is: when I use the relabel option, 1 and 2 are the values that work, but I coded morethan50k as 0 and 1 (1 being more than 50k). Why won't Stata relabel based on the values stored in the variable, i.e., why 1 and 2 and not 0 and 1?
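For reference, the workaround I'm considering is attaching value labels to the variables instead of using relabel (a sketch, assuming sex is coded 1/2):

Code:
label define money50k 0 "less money" 1 "lots of money"
label values morethan50k money50k
label define sexlbl 1 "male" 2 "female"
label values sex sexlbl
graph hbox age, over(morethan50k) over(sex) name(graph1, replace) nooutsides asyvars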

does my question make sense?

thanks

Multinomial fractional response model in panel data


Hello all!
I am mainly working in the context of rural non-farm sector diversification. Thus, I want to model the diversification strategies of farmers. My dependent variable is the share of income from a particular category in the total income of the household. Thus, I want to estimate the diversification decisions of the household using a multinomial fractional response model. However, I have panel data. How can the fmlogit command be extended in this case to account for panel data? I am really stuck on the Stata command. Any reply would be appreciated.





Help Interpreting Results from regress function with unbalanced panel data

Hello! First post here, will do my best to follow all FAQ rules.

I am working on a project with a dataset that has 162,000 observations and 52 variables. Each observation is a firm's results for a given year. Overall, I am seeking to determine the effect of immigration in a given Norwegian municipality on individual firm performance.

variables of interest are:

imm_share : % of workforce in a given municipality in a given year that is classified as an immigrant
ROA: Return on Assets, Firm Profit divided by Firm Assets in a given year
aar: year dummy
industry: dummy for the industry the firm operates in
log_ansatte: log of number of employees at a firm in a given year
log_firmage: log of firm age in a given year

The number of employees and firm age are meant to be proxies for firm size.

example of dataset:
. dataex ROA imm_share aar log_firmage log_salg

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ROA imm_share) int aar float(log_firmage log_salg)
.0858681 .02696629 2001 3.0910425 9.262743
.04753989 .05016723 2001 1.94591 9.31722
.16474044 .036985237 2001 2.1972246 9.242129
.04280008 .04942902 2001 3.178054 9.332735
.06279306 .029482344 2001 4.204693 11.091865
.036365848 .031799663 2001 2.833213 11.284744
end

our estimation and results:

reg ROA imm_share i.aar i.industry log_firmage log_ans if e(sample),vce(cluster cid)


[regression output attached as an image]


MY QUESTION:
When we run this regress command with several variations of the control variables, we always get p-values between 0.00 and 0.012. We wouldn't expect this level of significance. Does anyone have steps to correct this, or a possible explanation? What would this result signify?

We are stumped as to how best to explain this part of the results.

Thank you so much for any insight you can provide.

Clustering across Individuals/Households AND Areas

Dear Statalisters,

I am currently working on a dataset in China regarding Individual spending. I have panel data and run something like the following:

Code:
xtset ID year
xtreg Spending HomicideRateCity PersonalIncomeCharacteristics Gender Age HouseholdCharacteristics, fe vce(XXXXXX)
Basically, I have a panel data set with numerous individuals. I have a measure of the homicide rate in the city the individual lived in at time t. I therefore want to cluster at the city level; I think this should make sense. Furthermore, I have variables on individual characteristics, like height, age, gender, and personal income characteristics.
Then I have more characteristics across households: the survey basically asks things about the household, e.g. 'How many children live in this household', 'How many cars does this household have access to', etc. I also include these in my individual-level regression, where the values are all the same for each member of the household. From my understanding, I now also should cluster at the household level.

Is this thinking correct? --> I essentially want to cluster at City- AND at Household-level.
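For reference, what I have in mind is two-way clustering; since xtreg's vce(cluster) takes only one variable, I am considering the community-contributed reghdfe (from SSC), roughly like this (CityID and HouseholdID are placeholders for my cluster identifiers):

Code:
* ssc install reghdfe
reghdfe Spending HomicideRateCity PersonalIncomeCharacteristics Gender Age HouseholdCharacteristics, ///
    absorb(ID) vce(cluster CityID HouseholdID)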

Many thanks in advance,
Andreas

Interaction term in survival analysis (streg)

Hello,


I am posting my first question here to ask how to interpret the interaction term in survival analysis regression.

I'm working on a survival analysis, using the exponential model.
I would like to include an interaction term between my main variable (work1, time-varying) and calendar year, to see how the effects of 'work1' vary with time.

'work1' is a categorical variable with 3 categories: employed (reference), never employed, and previously employed.
And 'calyear' is also a categorical variable with 7 categories: 1980-1984 (reference), 1985-1989, and so on.


So I used the command:

streg i.work1 i.calyear work1#calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust

and got a result like this (I cropped the results for the rest of the variables, as the output is too long):

[regression output attached as an image]

So, I have two questions regarding this result.

The first question is how to interpret the hazard ratio of each category for the interaction term.

Since the interaction categories that include either reference category are omitted,
it is not clear to me what each hazard ratio means.

For example, what does the coefficient of 'never employed#1985-1989' mean?
It might be a relative risk, but compared to what?


And the second question is:
my prime interest is to see how the hazard ratio of each category of the 'work1' variable changes over time.
So I am wondering if there is a way to get the hazard ratio for every category of the interaction term?

I tried (1) the -margins- command after running the regression, and I found -margins- is not suitable for getting what I want in the survival-analysis context,

and (2) including only the interaction term, without the main effects, to make Stata show all the categories, by trying this command:

streg i.work1#i.calyear workex workex2 i.educ2 i.agegroup i.marital3, dist(exponential) robust

but excluding the main effects themselves might not be appropriate.
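For reference, I also wondered whether combining coefficients with -lincom- gives what I want, e.g. the hazard ratio for 'never employed' versus 'employed' within 1985-1989 (a sketch; the level numbers 2 are assumptions about my coding):

Code:
lincom 2.work1 + 2.work1#2.calyear, hr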


So if anyone has an answer, I would be very grateful if you could share your knowledge.
Thank you so much for your attention!

I will be looking forward to hearing from you!

How to impute missing values for one year in a pooled cross-section dataset?

Hello everyone, I am new to Statalist and I will try to be brief and concise.

I am working with the Ethiopian Medium and Large Manufacturing Census for the years 1998-2009, carried out by the Central Statistical Agency of Ethiopia (at the following link you can find the metadata for the year 2009 http://catalog.ihsn.org/index.php/ca...ata_dictionary).

My problem is that in 2005 a survey was conducted (instead of the whole census), and this feature seems to have an influence on the outcome of my analysis.
I think that this survey was not carried out using a random sampling approach, because my summary statistics on the shares of private and public firms in 2005 are fairly different from the other years (in particular, the share of public firms seems to be higher than in the rest of the years). The summary statistics on my main variables of interest (i.e. wages and number of workers) also seem to be somewhat biased for the year 2005.

Is there any way to impute the values for the year 2005 rather than using the survey? I thought about averaging the values of the variables over the years 2004 and 2006, even though I am aware that this is not a very precise approach. Any other advice?

I am posting the table of my summary statistics, with red color to underline the things which are a bit "weird", in order to let you see the problem:
Evolution of Ethiopian manufacturing sector, average values
1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
Number of firms 725 739 731 765 883 939 997 991 1153 1339 1734 1948
Share of Private 0.81 0.81 0.82 0.83 0.85 0.86 0.86 0.64 0.88 0.91 0.43 0.95
Share of Public 0.19 0.19 0.18 0.17 0.15 0.14 0.14 0.36 0.12 0.09 0.57 0.05
Median employment 20 21 21 23 18 20 23 28 24 20 17 16
Share of firms located in the capital 0.66 0.63 0.63 0.60 0.61 0.58 0.55 0.46 0.53 0.50 0.44 0.39
Exported value added 0.0206 0.0217 0.0233 0.0238 0.0194 0.0227 0.0208 0.0262 0.0205 0.0195 0.0151 0.0166
Capital intensity (capital/worker) '000 Birr 26.46 23.94 38.48 121.03 68.09 69.16 79.73 102.11 89.90 84.89 114.28 122.19
Gender pay gap (Wm-Wf)/Wm 0.16 0.13 0.16 0.13 0.15 0.17 0.13 -0.25 0.05 0.11 0.12 0.02
Gender gap in workers comp (Nm-Nf)/Nm 0.50 0.45 0.45 0.43 0.43 0.45 0.48 0.37 0.40 0.42 0.41 0.30
Technology level of the industry, ISIC classification, share
1 0.50 0.50 0.51 0.52 0.51 0.49 0.50 0.41 0.47 0.44 0.40 0.39
2 0.20 0.19 0.20 0.20 0.21 0.22 0.23 0.18 0.25 0.29 0.36 0.38
3 0.26 0.26 0.24 0.23 0.23 0.24 0.23 0.13 0.24 0.24 0.21 0.20
4 0.00 0.00 0.01 0.01 0.01 0.01 0.01 0.01 0.00 0.00 0.00 0.00
. 0.04 0.04 0.04 0.04 0.04 0.04 0.03 0.27 0.04 0.03 0.03 0.03
Share of firms in each industry
Food and Beverage 0.28 0.28 0.29 0.31 0.30 0.29 0.29 0.21 0.29 0.26 0.25 0.25
Textile and Garments 0.07 0.07 0.07 0.07 0.06 0.06 0.07 0.06 0.06 0.05 0.03 0.04
Leather and Footwear 0.08 0.07 0.06 0.07 0.06 0.06 0.06 0.06 0.05 0.05 0.04 0.04
Wood and Furniture 0.03 0.03 0.03 0.02 0.03 0.02 0.02 0.02 0.02 0.03 0.03 0.02
Printing and Paper 0.07 0.08 0.09 0.08 0.08 0.08 0.07 0.08 0.07 0.07 0.06 0.05
Chemical and Plastic 0.09 0.09 0.08 0.08 0.08 0.08 0.08 0.09 0.10 0.09 0.08 0.07
Non Metal 0.11 0.11 0.10 0.11 0.11 0.12 0.12 0.07 0.12 0.20 0.26 0.29
Metal and Machinery 0.27 0.27 0.27 0.26 0.27 0.28 0.28 0.17 0.29 0.24 0.23 0.23
Explained share 0.99 1.00 1.00 1.00 1.00 1.00 1.00 0.77 1.00 1.00 1.00 1.00

Thank you in advance!


PS: I know, the year 2008 is also not so nice when it comes to summary statistics...

generate yearly date variable

Hi,

very simple question:

I want to generate a yearly time variable running from 1870 to 2017 in my Stata dataset. I could not find the code for this yet.
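A minimal sketch of what I think is needed, assuming I want one observation per year in an otherwise empty dataset:

Code:
clear
set obs 148                      // 2017 - 1870 + 1 years
generate int year = 1869 + _n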

many thanks
C

for loop

Hi there,

I have the following code in Stata:

*2018
*Form 4
generate Form4= 1 if Performance2018 ==1 & form== "4"
replace Form4= 0 if Performance2018==0 & form=="4"
replace Form4= 3 if Performance2018==3 & form=="4"
replace Form4= 5 if Performance2018==5 & form=="4"
label values Form4 Einteilung

*Distinct
egen tag2018 = tag(Form4 cik)
generate distinctform2018= 1 if tag2018 ==1 & Form4 == 1 & year ==2018
replace distinctform2018 = 3 if tag2018 ==1 & Form4 == 3 & year ==2018
replace distinctform2018 = 0 if tag2018 ==1 & Form4 == 0 & year ==2018
replace distinctform2018 = 5 if tag2018 ==1 & Form4 == 5 & year ==2018
label values distinctform2018 Einteilung
drop tag2018

*Average
egen tag2018 = tag(distinctform2018 Form4)
bysort cik Form4: gen Form4forms2018 = _N
generate Form4_gute2018= Form4forms2018 if Form4 ==1 & distinctform2018 == 1
generate Form4_schlechte2018= Form4forms2018 if Form4 ==0 & distinctform2018 == 0

drop Form4 distinctform2018 tag2018 Form4forms2018


My concern is to repeat this code for every year back to 2009. Is it possible, with a loop, to change only the year (2018, 2017, 2016, 2015, ...)? Otherwise I have to copy and paste the whole code and change the year manually.
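A condensed sketch of the loop I have in mind, assuming Performance2009-Performance2018 all exist (the three replace lines are folded into an inner loop over the codes 0, 3, and 5, and the distinctform block is collapsed to one line):

Code:
forvalues y = 2009/2018 {
    generate Form4 = 1 if Performance`y' == 1 & form == "4"
    foreach code in 0 3 5 {
        replace Form4 = `code' if Performance`y' == `code' & form == "4"
    }
    label values Form4 Einteilung

    egen tag`y' = tag(Form4 cik)
    generate distinctform`y' = Form4 if tag`y' == 1 & year == `y'
    label values distinctform`y' Einteilung
    drop tag`y'

    egen tag`y' = tag(distinctform`y' Form4)
    bysort cik Form4: gen Form4forms`y' = _N
    generate Form4_gute`y'      = Form4forms`y' if Form4 == 1 & distinctform`y' == 1
    generate Form4_schlechte`y' = Form4forms`y' if Form4 == 0 & distinctform`y' == 0

    drop Form4 distinctform`y' tag`y' Form4forms`y'
}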

Thank you a lot in advance!

Cheers,

Sergej

Unsure if SCARs are correct (event study)

Dear Statalist community,

I'm working on an event study and I want to use standardized cumulative abnormal returns; however, I'm unsure whether my results are correct. I want to use SCAR(-1, +1) and SCAR(-2, +2) in my study.

I'm using the following data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int eventtime double(return marketreturn) float(firmid NR_MMM AR_MMM)
-171 -.027066 -.025666 1    -.02592428   -.0011417216
-170 -.000902  .001233 1    .001208978   -.0021109781
-169  .019445  .019076 1     .01920737   .00023763232
-168  .001948  .001974 1   .0019564312  -8.431195e-06
-167  .015204  .014315 1    .014404905     .000799095
-166  .018788   .01829 1    .018414522    .0003734784
-165 -.003079 -.003588 1  -.0036540066   .00057500665
-164  .009817  .008036 1    .008071223    .0017457767
-163  .008534  .008818 1    .008860034   -.0003260339
-162  .001437  .000725 1    .000696554     .000740446
-161 -.000226  .001276 1   .0012523525   -.0014783525
-160 -.007799 -.006825 1   -.006919197   -.0008798032
-159 -.004298 -.004716 1    -.00479183    .0004938302
-158  .014726  .014853 1     .01494759  -.00022159042
-157  .003398   .00457 1    .004575039   -.0011770388
-156 -.000676  .000271 1   .0002386002   -.0009146002
-155 -.001174 -.001421 1   -.001468135    .0002941349
-154  -.00803 -.005825 1   -.005910488   -.0021195116
-153  .013958  .016628 1    .016738048    -.002780048
-152  .009512   .01103 1    .011091297   -.0015792975
-151 -.002833 -.001913 1  -.0019644196   -.0008685804
-150 -.004484 -.002554 1   -.002611002    -.001872998
-149  .013662   .01184 1    .011908351    .0017536485
-148  -.00221  -.00045 1 -.00048867875   -.0017213213
-147 -.004013  -.00481 1   -.004886649    .0008736486
-146  .012462  .011874 1    .011942647    .0005193526
-145  .003109  .002728 1   .0027169974    .0003920026
-144 -.003391 -.003545 1   -.003610632   .00021963214
-143 -.001398 -.001132 1   -.001176618  -.00022138184
-142 -.000379 -.000348 1  -.0003857905   6.790481e-06
-141 -.009617 -.009823 1   -.009943306    .0003263055
-140   .00129  .001511 1    .001489399    -.000199399
-139 -.004136 -.003228 1  -.0032908716   -.0008451284
-138 -.014517  -.01399 1   -.014146594   -.0003704057
-137 -.009785 -.011207 1   -.011339358     .001554358
-136  .013755  .014903 1    .014998026   -.0012430262
-135 -.001368 -.001339 1  -.0013854208  .000017420794
-134  .015289  .016162 1     .01626799   -.0009789907
-133 -.000995 -.001123 1  -.0011675397   .00017253975
-132  .003012   .00381 1   .0038084204   -.0007964204
-131 -.000676 -.001235 1   -.001280515     .000604515
-130  .002372  .001222 1   .0011978822    .0011741178
-129  .001165 -.000129 1  -.0001648833    .0013298832
-128   .00078  .000594 1  .00056441315   .00021558686
-127 -.003856 -.004641 1   -.004716177     .000860177
-126  .009524  .010681 1    .010739258   -.0012152576
-125 -.010729 -.010996 1    -.01112652     .000397521
-124  -.01433 -.014374 1    -.01453394   .00020393883
-123  .016455  .020526 1    .020669995    -.004214995
-122 -.009795  -.00699 1   -.007085634    -.002709366
-121 -.006114  -.00649 1    -.00658128    .0004672795
-120   -.0066 -.007739 1   -.007841157     .001241157
-119   .00205  .002251 1   .0022358433  -.00018584334
-118 -.019851 -.019423 1    -.01962691  -.00022409014
-117  .001597  .004756 1    .004762659    -.003165659
-116  .011313  .010619 1    .010676718    .0006362817
-115  .014906  .014515 1    .014606647    .0002993528
-114 -.014615 -.015041 1   -.015206748    .0005917477
-113 -.015068 -.017797 1   -.017986748    .0029187484
-112  .007152  .007778 1    .007810976   -.0006589764
-111  .008839  .008817 1    .008859024 -.000020024383
-110  .013685  .012418 1    .012491385    .0011936155
-109 -.000831 -.001599 1   -.001647685    .0008166851
-108 -.003328 -.002179 1  -.0022327362   -.0010952638
-107  .010021   .01063 1    .010687814   -.0006668141
-106 -.007376 -.007217 1   -.007314611  -.00006138924
-105 -.008202 -.009412 1   -.009528726    .0013267263
-104   -.0149 -.015304 1   -.015472038    .0005720377
-103  .001397  .002012 1    .001994762   -.0005977621
-102 -.013822 -.013115 1   -.013263974  -.00055802567
-101  -.02396   -.0237 1   -.023941156  -.00001884448
-100 -.010925 -.010838 1   -.010967145   .00004214474
 -99 -.001716  .000853 1   .0008256687    -.002541669
 -98  .006078  .007803 1    .007836194    -.001758194
 -97 -.025859 -.024965 1   -.025217174   -.0006418264
 -96  .015461  .016696 1    .016806642   -.0013456416
 -95 -.021697 -.021599 1    -.02182186   .00012485836
 -94 -.002418  .000532 1   .0005018732    -.002919873
 -93 -.010609 -.011694 1     -.0118306     .001221599
 -92  .005685  .005195 1    .005205482    .0004795182
 -91  .022497  .020284 1     .02042589    .0020711122
 -90 -.016696 -.015638 1   -.015808947   -.0008870526
 -89  .015938  .014144 1    .014232416    .0017055843
 -88 -.010554 -.010863 1   -.010992362   .00043836216
 -87   .00508  .005529 1     .00554239   -.0004623905
 -86  .024569   .02476 1    .024940867    -.000371867
 -85 -.000359 -.000443 1  -.0004816178   .00012261781
 -84 -.020012 -.018743 1   -.018940987   -.0010710129
 -83  .005759  .004992 1    .005000714    .0007582858
 -82  .003651  .001527 1   .0015055384    .0021454617
 -81 -.019618 -.018481 1   -.018676706   -.0009412944
 -80 -.016957 -.014154 1   -.014312023   -.0026449766
 -79 -.003064 -.000664 1  -.0007045424   -.0023594575
 -78  .000261 -.000189 1  -.0002254058    .0004864058
 -77 -.011979 -.012301 1   -.012442886    .0004638859
 -76  .020146  .019518 1    .019653216     .000492784
 -75  .017685  .016517 1    .016626082    .0010589176
 -74  .018118   .01648 1     .01658876    .0015292395
 -73 -.004207 -.004666 1   -.004741395    .0005343949
 -72 -.000227 -.000026 1 -.00006098628   -.0001660137
end
With the following code:

Code:
quietly describe
bys eventtime: gen N = _N

 // firm-level SD of abnormal returns over the estimation window (eventtime <= -11)
 preserve
 drop if eventtime > -11
 collapse (sd) AR_MMM, by(firmid)
 rename AR_MMM si
 keep firmid si
 save "M:\tmp"
 restore
 
 merge m:1 firmid using "M:\tmp"
 erase "M:\tmp.dta"
 
 drop if eventtime <-2
 drop if eventtime >2
 
 // Generate SAR
 gen SAR = AR_MMM/si
 
  ** Create cumulative abnormal returns 
    sort firmid eventtime
    by firmid: gen CAR_MMM =sum(AR_MMM) 
 
  ** Create cumulative standardized abnormal returns
    by firmid: gen CSAR_MMM = sum(SAR)
My results are as follows for SCAR(-2, +2)

[results table attached as an image]

Thank you in advance!


Calculating MHHI with triple summation

Hi everybody!
I'm looking to create a Modified Herfindahl-Hirschmann Index (MHHI) in STATA from the attached formula. It creates a market index that also take into account cross ownership of the companies in the sector. I have all the variables I need to execute the formula, I just simply have no idea how to do it in STATA.
At first I thought is should use loops to make STATA go through each company (as the formula describes) but i don't know how to make STATA aware of the relationship between the owners and companies, creating that kind of sumproduct that the formula prescribes for each owner in the company.
All inpust will be well appreciated!

Below is some of my sample data (the variable names include the corresponding symbols from the formula):

input byte(Company_number_j Owner_number_i) double(Owner_share_gamma Market_Share_s)
1 1 .29 .2996783902352667
1 2 .12 .2996783902352667
1 3 .29 .2996783902352667
1 4 .29 .299678390235267
2 5 .5 .0466357542171158
2 5 .5 .0466357542171158
3 1 .29 .48362543478072323
3 2 .12 .48362543478072323
3 3 .29 .48362543478072323
3 4 .29 .48362543478072323
4 6 .05 .17006042076689426
4 7 .11 .17006042076689426
4 8 .73 .17006042076689426
4 9 .11 .17006042076689426

The formula: [attached as an image]
k indexes the other companies, so when j = 1, k = 2, then 3, then 4.
The formula requires a triple summation, but I think breaking it down into single summations will be an advantage.
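A sketch of how this could be looped, under the assumption that the attached formula is the usual MHHI = sum_j sum_k s_j * s_k * (sum_i g_ij * g_ik) / (sum_i g_ij * g_ij), where g_ij is owner i's share in company j; adjust if the formula separates control weights from ownership weights:

Code:
* collapse duplicate owner-company rows first (adjust if duplicates should not be summed)
collapse (sum) Owner_share_gamma (first) Market_Share_s, by(Company_number_j Owner_number_i)

mata:
    d  = st_data(., ("Owner_number_i", "Company_number_j", "Owner_share_gamma"))
    m  = st_data(., ("Company_number_j", "Market_Share_s"))
    nI = max(d[,1])
    nJ = max(d[,2])
    G  = J(nI, nJ, 0)                    // owner-by-company ownership matrix
    for (r = 1; r <= rows(d); r++) G[d[r,1], d[r,2]] = d[r,3]
    s  = J(nJ, 1, 0)                     // one market share per company
    for (r = 1; r <= rows(m); r++) s[m[r,1]] = m[r,2]
    mhhi = 0
    for (j = 1; j <= nJ; j++) {
        for (k = 1; k <= nJ; k++) {
            mhhi = mhhi + s[j]*s[k]*sum(G[,j]:*G[,k])/sum(G[,j]:*G[,j])
        }
    }
    mhhi
end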

Generating a new variable but missing values have been included

Hi,

My code is:

Code:
generate var_66 = 1 if var_33 >=2
However, it has put a 1 in the new variable where there are missing values in var_33, rather than only where there is a score of 2 or greater in var_33.
Any advice?
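A sketch of the fix I'm considering, on the assumption that the culprit is Stata treating missing values as larger than any number:

Code:
generate var_66 = 1 if var_33 >= 2 & !missing(var_33)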
Thanks

Trying to create a new variable out of the response of the respondents in the survey ! Your help will be much appreciated

Hello Intelligent People,

The version of Stata I am using is 14.2, on Windows 10.

Here is a glimpse of what my data looks like
methods1 method2 method3 method4 method5
yes no yes yes yes
no no no yes no
no yes yes yes yes
no no no no no
yes no no yes yes
no no no no no
no no no no no
yes yes no yes no
no no no no no
yes yes no no yes
So basically I am trying to create a new variable, no_method. I believe that this variable can be derived from the responses of the respondents: if a respondent has used none of the methods, it will be counted as an observation of "no_method".
So from the data above, there are 4 respondents who have used no method at all.
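A sketch of what I have in mind, assuming the variables are strings coded "yes"/"no" and named method1-method5 (adjust to the actual names and coding):

Code:
generate byte no_method = 1
foreach v of varlist method1-method5 {
    replace no_method = 0 if `v' == "yes"
}
count if no_method == 1          // should report 4 in the example above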

I would be most obliged to you if you could please let me know how to create a new variable no_method.

Kind Regards

Regression gives different results depending on the order of the independent variables

I am running what is essentially a difference in differences regression on a large dataset with a lot of fixed effects. Bizarrely (maddeningly, even), I get a slightly different coefficient on my main treatment effect depending on the order I provide the list of independent variables. The regression has individual level data, with county fixed effects, month fixed effects, and state-specific trends. The variable "treatment" is equal to 1 if a state-level policy has gone into effect in the person's state as of the current month. There is a separate set of trends for New York City, as NYC implemented its own policy.

Here's my code and output

Code:
. local conditions if month<=695 & state_cd <= 56 & birthyear_last!=.

. qui xtreg success_14 treated i.month  i.state#c.open_dt 1.NYC#c.open_dt i.state#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt i.state#c.open_dt#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt#c.open_dt sc RACE Hispanic EDUCATION Median_income i.bankrank birthyear_last `conditions' , fe vce(cluster state_cd)

. disp _b[treated]
.0064373

. qui xtreg success_14 treated  sc RACE Hispanic EDUCATION Median_income i.bankrank i.month  i.state#c.open_dt 1.NYC#c.open_dt i.state#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt i.state#c.open_dt#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt#c.open_dt  birthyear_last `conditions' , fe vce(cluster state_cd)

. disp _b[treated]
.00803661
I realize those are long regression commands, but if you look closely you'll see that they both have the same list of variables, just in a different order. Both versions drop a few factor levels of i.month for collinearity, but they both drop the same ones. I get the same result using areg instead of xtreg. The regressions each take more than an hour to run, so trying different variations is cumbersome. The problem doesn't replicate if I use a random 0.5% subsample of my data. I'm running out of ideas here -- anyone know what's going on? Really, I just want to know which version is more likely to give the "right" coefficient.

I'm running Stata/MP 15.1 on a Linux server with Red Hat 6.

Create categorical variable

Hi! New to Stata/in a stats class... I have a dataset that has individual race dummy variables. Is there a way for me to combine all of the dummy variables into one new categorical variable?
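A sketch of one way to do it, with hypothetical dummy names (race_white, race_black, race_asian, race_other); adjust to the actual variables:

Code:
generate byte race = 1 if race_white == 1
replace  race = 2 if race_black == 1
replace  race = 3 if race_asian == 1
replace  race = 4 if race_other == 1
label define racelbl 1 "White" 2 "Black" 3 "Asian" 4 "Other"
label values race racelbl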

tobit regression with collinearity

Hi, I am running a tobit regression on data across 2 years, 2007-2008.
My variables include 10 log price categories for alcohol types, on-trade and off-trade: l_p_wine_on, l_p_wine_off, etc.
I also have a log income variable: logincome.
My dependent variables are the expenditure shares of each alcohol type, i.e. expenditure on that type divided by total expenditure: e.g. expshare_wine_on, expshare_wine_off.
I am looking at how the own-price and cross-price elasticities of demand vary across alcohol types, and across socio-economic groups, government regions and gender.

My prices for alcohol are constant throughout each year (I am using the average yearly price); however, they vary between years.

Here is a dataex for some of my variables:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(l_p_wine_on l_p_beer_on l_p_spirits_on l_p_wine_off l_p_spirits_off l_p_beer_off expshare_wine_on expshare_beer_off logincome) byte(socio_group gor) int year byte sexhrp
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.433789 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .01142119 5.898746 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.898213 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0550356 .015000853 6.399842 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .0016348386 5.584012 3 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .015073973 7.020905 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.225338 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.911331 3 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.219934 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.533279 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.2492094 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .00609936 6.168564 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.835587 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.940566 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .006249688 5.331317 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.786775 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .003858888 7.201894 2 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.476967 2 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.009435 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .010382757 6.377679 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.982862 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.11283 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .0023888294 6.279646 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .001813489 6.294915 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .005435922 6.704463 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.747566 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .005957043 6.11456 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .014408222 .016718158 6.605068 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .018981254 0 6.019785 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.088818 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.779476 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .008590408 6.514719 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .005628793 .018012136 6.960443 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .005657709 6.424075 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.920457 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .008473212 0 6.898255 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .02177079 5.623837 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.812526 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.182973 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.514611 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.109314 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.362559 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.30903 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.26414 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.3593974 3 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .017695729 0 4.77104 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.069847 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .01336186 6.690271 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.80408 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.628306 5 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .016276948 6.522627 5 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.519619 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .035966147 6.422951 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.557673 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.602438 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.402017 3 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0029820926 0 7.401286 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .015186014 7.176426 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.746554 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.474176 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .01607261 6.874416 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .010095213 6.662046 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .024986824 .0423164 6.069906 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .001250104 0 7.438652 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .03693495 0 7.021414 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .02268917 6.5658 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.958667 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.192117 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0016070686 0 5.815264 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.34921 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.279 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.516609 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.554516 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.347932 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .01880577 0 5.93925 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .07133046 6.985651 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .005728897 .005415315 7.12227 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .01053234 .021376746 6.8088 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0019776237 0 6.519822 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.490757 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.787439 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.457868 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.921752 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 7.098411 2 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.400603 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .013181653 0 6.857086 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .003744323 0 6.710182 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.136498 6 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .02200635 0 7.438652 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .0019496685 0 6.911319 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .0006786454 6.854755 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 7.438652 2 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 6.609726 1 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 .004789272 0 6.868133 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.182907 4 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.823194 1 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 4.812526 6 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .04808098 5.530222 4 2 2007 2
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 0 5.793585 3 2 2007 1
.60158 -.010050327 .662688 -.9416085 -.967584 -.967584 0 .11235794 6.436151 1 2 2007 2
end
label values gor gor
label def gor 2 "north west", modify
label values sexhrp sexhrp
label def sexhrp 1 "male", modify
label def sexhrp 2 "female", modify
I am then running a tobit regression as follows:

Code:
 tobit expshare_wine_on l_p_wine_on l_p_beer_on l_p_cider_on l_p_spirits_on l_p_alcopops_on l_p_wine_off l_p_beer_off l_p_spirits_off l_p_cider_off l_p_alcopops_off logincome i.socio_group i.gor i.year i.sexhrp , ll(0)
I have censored the data at zero since some households report no consumption of alcohol

However my results are as follows:
Code:
. tobit expshare_wine_on l_p_wine_on l_p_beer_on l_p_cider_on l_p_spirits_on l_p_alcopops_on l_p_wine_off l_p_beer_off l_p_spirits_off l_p_cider_off l_p_alcopops_off logincome i.socio_group i.gor i.year i.sexhrp , ll(0)
note: l_p_beer_on omitted because of collinearity
note: l_p_cider_on omitted because of collinearity
note: l_p_spirits_on omitted because of collinearity
note: l_p_alcopops_on omitted because of collinearity
note: l_p_wine_off omitted because of collinearity
note: l_p_beer_off omitted because of collinearity
note: l_p_spirits_off omitted because of collinearity
note: l_p_cider_off omitted because of collinearity
note: l_p_alcopops_off omitted because of collinearity
note: 2008.year omitted because of collinearity

Refining starting values:

Grid node 0:   log likelihood = -5976.9775

Fitting full model:

Iteration 0:   log likelihood = -5976.9775  
Iteration 1:   log likelihood = -640.92644  
Iteration 2:   log likelihood =   1103.185  
Iteration 3:   log likelihood =  1808.8673  
Iteration 4:   log likelihood =  1909.2432  
Iteration 5:   log likelihood =   1910.562  
Iteration 6:   log likelihood =  1910.5625  
Iteration 7:   log likelihood =  1910.5625  

Tobit regression                                Number of obs     =     11,962
                                                   Uncensored     =      2,312
Limits: lower = 0                                  Left-censored  =      9,650
        upper = +inf                               Right-censored =          0

                                                LR chi2(14)       =     927.97
                                                Prob > chi2       =     0.0000
Log likelihood =  1910.5625                     Pseudo R2         =    -0.3207

-------------------------------------------------------------------------------------------
         expshare_wine_on |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
              l_p_wine_on |   -.028118   .0162136    -1.73   0.083    -.0598992    .0036632
              l_p_beer_on |          0  (omitted)
             l_p_cider_on |          0  (omitted)
           l_p_spirits_on |          0  (omitted)
          l_p_alcopops_on |          0  (omitted)
             l_p_wine_off |          0  (omitted)
             l_p_beer_off |          0  (omitted)
          l_p_spirits_off |          0  (omitted)
            l_p_cider_off |          0  (omitted)
         l_p_alcopops_off |          0  (omitted)
                logincome |   .0125922   .0006706    18.78   0.000     .0112778    .0139066
                          |
              socio_group |
                       2  |   .0014811   .0010997     1.35   0.178    -.0006745    .0036368
                       3  |  -.0078991   .0012672    -6.23   0.000    -.0103829   -.0054152
                       4  |  -.0098159    .003836    -2.56   0.011    -.0173351   -.0022968
                       5  |   .0065436   .0035439     1.85   0.065    -.0004031    .0134903
                       6  |  -.0027114   .0010429    -2.60   0.009    -.0047556   -.0006672
                          |
                      gor |
              north west  |  -.0004291   .0014892    -0.29   0.773    -.0033481      .00249
              merseyside  |  -.0009579   .0014654    -0.65   0.513    -.0038303    .0019145
yorkshire and the humber  |   .0017352   .0014609     1.19   0.235    -.0011284    .0045987
           east midlands  |  -.0012567   .0021903    -0.57   0.566      -.00555    .0030366
           west midlands  |  -.0024181   .0018331    -1.32   0.187    -.0060112    .0011751
                 eastern  |  -.0016493   .0017609    -0.94   0.349     -.005101    .0018023
                          |
                     year |
                    2008  |          0  (omitted)
                          |
                   sexhrp |
                  female  |   .0011694    .000803     1.46   0.145    -.0004046    .0027435
                    _cons |  -.0853166   .0110156    -7.75   0.000     -.106909   -.0637242
--------------------------+----------------------------------------------------------------
   var(e.expshare_wine_on)|   .0007865   .0000268                      .0007357    .0008408
------------------------------------------------------------------------------------------
Q1. Why are all of my price variables, other than the own price corresponding to the dependent variable, omitted? I am trying to work out cross-price elasticities of demand, so I need them in the model. I understand the omission is due to collinearity, but what is causing it and how do I overcome it?
Q2. Why is the year dummy variable omitted?

I am following a model that does close to the same thing, and its authors did not have this problem.

Thanks so much in advance
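
One possibility worth ruling out (only a guess, since the full data are not shown) is that the log prices vary only over time, e.g. one national price per year. With only two years in the estimation sample, a single price plus the constant already spans that variation, so the remaining prices and the 2008 dummy would be dropped as collinear. A quick diagnostic sketch, reusing the variable names from the command above:

Code:
* are the log prices effectively constant within each year?
bysort year: summarize l_p_wine_on l_p_beer_on l_p_cider_on l_p_spirits_on l_p_alcopops_on
* check for exact linear dependence among the price regressors
_rmcoll l_p_wine_on l_p_beer_on l_p_cider_on l_p_spirits_on l_p_alcopops_on l_p_wine_off l_p_beer_off l_p_spirits_off l_p_cider_off l_p_alcopops_off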

Logic for using matched sample versus unmatched sample

Hi all,

I am stuck on a problem of when to use a matched versus an unmatched sample. I have a list of companies and I am trying to figure out the effect of a treatment (a policy change in this case) on company financial performance (the dependent variable). I will use difference-in-differences for the analysis.

I do not know when (generally speaking) a matched sample design is better/worse than an unmatched sample design. To my understanding, matching "controls" for confounding variables before running regression models. Alternatively, using an unmatched sample, one would add the control variables into the regression model to "control" for these factors.

But broadly speaking, can someone please help me decide whether I should use a matched sample in this case (pros and cons), and why or why not?

Roger C.

no observations error event study

Dear Stata community,

I'm trying to run a loop to generate the normal returns for my event study; however, I run into the following error: no observations r(2000);

A sample of the data that I'm using:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(firmid date) int eventtime double(return marketreturn)
10104 20150928 -171 -.027066 -.025666
10104 20150929 -170 -.000902  .001233
10104 20150930 -169  .019445  .019076
10104 20151001 -168  .001948  .001974
10104 20151002 -167  .015204  .014315
10104 20151005 -166  .018788   .01829
10104 20151006 -165 -.003079 -.003588
10104 20151007 -164  .009817  .008036
10104 20151008 -163  .008534  .008818
10104 20151009 -162  .001437  .000725
10104 20151012 -161 -.000226  .001276
10104 20151013 -160 -.007799 -.006825
10104 20151014 -159 -.004298 -.004716
10104 20151015 -158  .014726  .014853
10104 20151016 -157  .003398   .00457
10104 20151019 -156 -.000676  .000271
10104 20151020 -155 -.001174 -.001421
10104 20151021 -154  -.00803 -.005825
10104 20151022 -153  .013958  .016628
10104 20151023 -152  .009512   .01103
10104 20151026 -151 -.002833 -.001913
10104 20151027 -150 -.004484 -.002554
10104 20151028 -149  .013662   .01184
10104 20151029 -148  -.00221  -.00045
10104 20151030 -147 -.004013  -.00481
10104 20151102 -146  .012462  .011874
10104 20151103 -145  .003109  .002728
10104 20151104 -144 -.003391 -.003545
10104 20151105 -143 -.001398 -.001132
10104 20151106 -142 -.000379 -.000348
10104 20151109 -141 -.009617 -.009823
10104 20151110 -140   .00129  .001511
10104 20151111 -139 -.004136 -.003228
10104 20151112 -138 -.014517  -.01399
10104 20151113 -137 -.009785 -.011207
10104 20151116 -136  .013755  .014903
10104 20151117 -135 -.001368 -.001339
10104 20151118 -134  .015289  .016162
10104 20151119 -133 -.000995 -.001123
10104 20151120 -132  .003012   .00381
10104 20151123 -131 -.000676 -.001235
10104 20151124 -130  .002372  .001222
10104 20151125 -129  .001165 -.000129
10104 20151127 -128   .00078  .000594
10104 20151130 -127 -.003856 -.004641
10104 20151201 -126  .009524  .010681
10104 20151202 -125 -.010729 -.010996
10104 20151203 -124  -.01433 -.014374
10104 20151204 -123  .016455  .020526
10104 20151207 -122 -.009795  -.00699
10104 20151208 -121 -.006114  -.00649
10104 20151209 -120   -.0066 -.007739
10104 20151210 -119   .00205  .002251
10104 20151211 -118 -.019851 -.019423
10104 20151214 -117  .001597  .004756
10104 20151215 -116  .011313  .010619
10104 20151216 -115  .014906  .014515
10104 20151217 -114 -.014615 -.015041
10104 20151218 -113 -.015068 -.017797
10104 20151221 -112  .007152  .007778
10104 20151222 -111  .008839  .008817
10104 20151223 -110  .013685  .012418
10104 20151224 -109 -.000831 -.001599
10104 20151228 -108 -.003328 -.002179
10104 20151229 -107  .010021   .01063
10104 20151230 -106 -.007376 -.007217
10104 20151231 -105 -.008202 -.009412
10104 20160104 -104   -.0149 -.015304
10104 20160105 -103  .001397  .002012
10104 20160106 -102 -.013822 -.013115
10104 20160107 -101  -.02396   -.0237
10104 20160108 -100 -.010925 -.010838
10104 20160111  -99 -.001716  .000853
10104 20160112  -98  .006078  .007803
10104 20160113  -97 -.025859 -.024965
10104 20160114  -96  .015461  .016696
10104 20160115  -95 -.021697 -.021599
10104 20160119  -94 -.002418  .000532
10104 20160120  -93 -.010609 -.011694
10104 20160121  -92  .005685  .005195
10104 20160122  -91  .022497  .020284
10104 20160125  -90 -.016696 -.015638
10104 20160126  -89  .015938  .014144
10104 20160127  -88 -.010554 -.010863
10104 20160128  -87   .00508  .005529
10104 20160129  -86  .024569   .02476
10104 20160201  -85 -.000359 -.000443
10104 20160202  -84 -.020012 -.018743
10104 20160203  -83  .005759  .004992
10104 20160204  -82  .003651  .001527
10104 20160205  -81 -.019618 -.018481
10104 20160208  -80 -.016957 -.014154
10104 20160209  -79 -.003064 -.000664
10104 20160210  -78  .000261 -.000189
10104 20160211  -77 -.011979 -.012301
10104 20160212  -76  .020146  .019518
10104 20160216  -75  .017685  .016517
10104 20160217  -74  .018118   .01648
10104 20160218  -73 -.004207 -.004666
10104 20160219  -72 -.000227 -.000026
end

My current code looks as follows:

Code:
 su firmid
 local N `r(max)'
 
 //Generate normal returns variable
 gen NR_MMM =.
 
 //Creating a loop to generate normal returns
 forvalues i = 1/`N'{

display `i'/`N'

** Market Model method (MMM)
quietly reg return marketreturn if (firmid==`i' & eventtime<=-1)
quietly predict r if firmid==`i'
quietly replace NR_MMM = r if firmid==`i'
quietly drop r

}
gen AR_MMM = return - NR_MMM
Thanks in advance
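
In case it helps whoever reads this later: judging from the example, firmid takes values such as 10104 rather than running 1, 2, 3, and so on, so most passes of forvalues 1/`N' select no observations and the first quietly reg then fails with r(2000). A sketch of the same loop over the IDs that actually occur (keeping the original variable names):

Code:
* loop over the firm IDs present in the data instead of over 1/max
levelsof firmid, local(firms)
gen NR_MMM = .
foreach f of local firms {
    display "firm `f'"
    ** Market Model method (MMM): estimate on the pre-event window
    quietly regress return marketreturn if firmid == `f' & eventtime <= -1
    quietly predict double r if firmid == `f'
    quietly replace NR_MMM = r if firmid == `f'
    drop r
}
gen AR_MMM = return - NR_MMM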

GMM estimation inside a do loop with foreach

Dear Friends,

I am trying to use GMM to estimate a model that is nonlinear in its parameters.

The model has 3 parameters: rho, sigma1, and sigma2.

I would like to obtain model coefficients, standard errors, and the value of the objective function for different combinations of starting values.

To accomplish this, I set up three do loops, one within the other, and let rho vary according to the counter "j", sigma1 vary according to the counter "k", and sigma2 vary according to the counter "l".

The code is below. The code works. However, I have two questions:

a. How can I get the values of j, k, and l to be displayed after each step in the loop?

For example, when I include display j, in the code below, I get "j not found" and the program exits.

b. Sometimes GMM does not converge, I get the message "flat or discontinuous region encountered", and the program exits the do loop. Is there some way I can include a line that causes the program to go to the next loop increment rather than exiting the program? In Fortran, I could do this with an error-handling condition.

Thanks so much!

Srinivasan Rangan

Code:

foreach j in 0.1 0.2 0.3 0.4 0.5 {
foreach k in 5 10 {
foreach l in 5 10 {
gmm (Eq1:abn_ret2_unwinsored_stage1_w - (1 + {rho}*{sigma1})*ue_p_scaled_centered_w - ({sigma2})*(sqrt(1-{rho}^2))*tq2_w_centered) if sample_to_use == 3, instruments(ue_p_scaled_centered_w tq2_centered_w lag2mret_w lag2chusd_w) winitial(identity) vce(cluster cnum) from(rho `j' sigma1 `k' sigma2 `l') technique(bfgs) wmatrix(unadjusted)
mat list e(b)
mat list e(V)
display e(Q)
}
}
}
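
On both points, a sketch of the same loop (untested, keeping the gmm call exactly as posted): inside foreach the counters are local macros, so they have to be dereferenced as `j', `k', and `l' rather than typed as bare names, and capture traps a non-convergence error so the loop can move on, with _rc indicating whether the call succeeded.

Code:
foreach j in 0.1 0.2 0.3 0.4 0.5 {
    foreach k in 5 10 {
        foreach l in 5 10 {
            display "starting values: rho = `j', sigma1 = `k', sigma2 = `l'"
            capture noisily gmm (Eq1:abn_ret2_unwinsored_stage1_w - (1 + {rho}*{sigma1})*ue_p_scaled_centered_w - ({sigma2})*(sqrt(1-{rho}^2))*tq2_w_centered) if sample_to_use == 3, instruments(ue_p_scaled_centered_w tq2_centered_w lag2mret_w lag2chusd_w) winitial(identity) vce(cluster cnum) from(rho `j' sigma1 `k' sigma2 `l') technique(bfgs) wmatrix(unadjusted)
            if _rc == 0 {
                mat list e(b)
                mat list e(V)
                display e(Q)
            }
            else display "no convergence for this combination (rc = " _rc "); moving on"
        }
    }
}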

cross classified multilevel structure

Dear Stata Forum,

I am using mixed to estimate a cross-classified multilevel model with three levels, where I have approximately 100,000 observations divided into 12 cohorts cross-classified with 8 periods, both nested in 9 countries. When I run the code:
Code:
mixed DV IV1 i.IV2 || _all: R.cohort || period: || country: , variance

------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1    114,788  114,788.0    114,788
       essround |          8     13,509   14,348.5     14,887
        country |         72      1,164    1,594.3      2,289
it seems Stata also cross-classifies the country and period variables (8 periods * 9 countries = 72 groups), instead of simply treating country as a third level above the second, cohort-period cross-classified level.
What would be the right way to write the syntax so that Stata would not cross-classify country with period?

Thank you very much in advance for your answers!

Idiosyncratic Volatility - Rolling Window

Hello everyone!

I am trying to compute idiosyncratic volatility on a rolling basis of 24 months with monthly data. I have created a month/year variable called ymdate.

This has been my code so far:

Code:
bys perm : asreg Excess_USD_w MKT SMB HML, wind(ymdate 24)
gen residuals = Excess_USD_w - _b_cons - _b_MKT*MKT - _b_SMB*SMB - _b_HML*HML
bys perm: egen IVOL = sd(residuals)

In the end, I want to look at each year. However, my minimum and maximum values are the same across multiple years, not just across the two years spanned by my rolling window. In addition, my mean value just keeps increasing.

Would anyone know what I have done wrong? Thank you very much in advance!
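
If I read the code correctly, the egen sd() step pools the residuals over a firm's entire history, which would explain why the values do not change from window to window. One alternative, assuming asreg's rmse option works as documented in help asreg, is to let asreg itself report the root MSE of each 24-month window regression, a common proxy for idiosyncratic volatility:

Code:
* _rmse holds the root MSE of the regression estimated in each 24-month window
bys perm : asreg Excess_USD_w MKT SMB HML, wind(ymdate 24) rmse
rename _rmse IVOL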


Mean if respondents have answered at least 2 out of 3 variables

Hi, I would really be grateful if anyone could help by providing code for this.

I have 3 sets of variables relating to the same thing. I was wondering what code I can use to calculate the mean of these variables if the respondents answered at least 2 of them. Many Thanks
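
A sketch of one way to do this, assuming the three variables are called q1, q2, and q3 (substitute the actual names):

Code:
* count how many of the three items each respondent answered
egen n_answered = rownonmiss(q1 q2 q3)
* mean of the answered items, but only where at least 2 of the 3 are non-missing
egen mean_score = rowmean(q1 q2 q3) if n_answered >= 2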

New to STATA | Linear regression on STATA

Hi,

I am new to STATA and I am trying to do a linear regression analysis for a college project.

I am getting the error message, "matrix not positive definite" when I run the 'reg' command

Further, when I run the 'vif' command to check for multicollinearity the error message reads, "not appropriate after regress, nocons;
use option uncentered to get uncentered VIFs"

Please help
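
Without the regression command and output it is hard to be specific, but "matrix not positive definite" after regress can point to regressors that are constant or exact linear combinations of one another, and the vif message suggests the model ended up without a constant (nocons), which is worth double-checking. A diagnostic sketch, where y and x1-x3 are placeholder names for the actual variables:

Code:
* look for regressors with (near-)zero variance
summarize y x1 x2 x3
* look for pairs of regressors that are perfectly correlated
correlate x1 x2 x3
* ask Stata which regressors it would drop as collinear
_rmcoll x1 x2 x3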

cross-correlations in long format

Hi guys, I am currently working on a dataset in long format and I need the average cross-correlation of all variables. I have tried to convert the dataset from long to wide using the reshape command, but Stata returns 'values of variable t not unique within firmid'. I also tried a loop with the command xcorr, but I cannot find the correct way to do it. Can someone please help me?
The dataset looks like this:

firmid t return
1 -20 0,02
1 -19 -0,1
1 -18 0,014
1 -17 etc.
2 -20
2 -19
2 -18
2 -17
etc...

So I need the average cross-correlation at time -20, the average cross-correlation at time -19, and so on, stored as a new variable.

Huub
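
A sketch of how one might first track down the reshape error, since going from long to wide requires each (firmid, t) pair to be unique (variable names taken from the example above):

Code:
* find the (firmid, t) combinations that occur more than once
duplicates report firmid t
duplicates tag firmid t, generate(dup)
list firmid t return if dup > 0, sepby(firmid)
* once the duplicates are resolved, go wide with one column per firm
reshape wide return, i(t) j(firmid)
* full-sample correlation matrix across firms
correlate return*

Note that correlate produces one correlation matrix over the whole sample; averaging its off-diagonal entries gives a single average cross-correlation, while a value "at" each t would need a window of observations around that period.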

Converting XY coordinates into lat-long

Hi,

I have a data set that records the pick-up and drop-off locations of a particular item. The coordinates of these locations are given as XY coordinates, and I would like to calculate the distance, in km, between the two locations using "geodist." To do so, I need to convert the XY coordinates to latitude-longitude coordinates. I would appreciate any help in figuring out how to do this conversion.

Thank you!

Independence assumption in Cross-Classified Multilevel Models?

Dear everyone,

In cross-classified (multilevel) models, the cross-classified variance components are generally assumed to be independent. Does anyone know of any (published or unpublished) discussion of this assumption or how deviations from independence may affect the results?

Best regards,
Are Skeie Hermansen

University of Oslo

Monday, April 29, 2019

How to choose which duplicates to drop/keep?

Hi there,

I'm working on data from 2012-2018 concerning HIV-positive pregnant women in treatment and loss to follow-up (LTFU). For women who have been in treatment more than once because they had more children, their ID occurs two, three, or four times. The information attached to the duplicate IDs includes, among other things, the start and end dates of treatment (some end dates are missing). I need to keep the latest record and date variables for the women who occur more than once, in order to trace their latest contact with the clinic (among those whose end date is missing) and register whether or not they are LTFU. Each woman is only supposed to occur once during the study period. If I simply run "duplicates drop idp, force", I will have to use the start date of their first treatment and their latest clinic contact from later visits to calculate their follow-up time, which will then be too long.

Any thoughts? (I'm using Stata 14 on a Mac)

Thank you!

Best regards, Laura

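A sketch of one common approach, assuming the identifier is idp and the treatment dates are called startdate and enddate (adjust to the actual names): copy the date of the first treatment episode onto every record before keeping only the most recent row per woman, so the original start date survives the drop.

Code:
* within each woman, sort records by treatment start date
bysort idp (startdate): gen first_start = startdate[1]
* keep only the latest record per woman; first_start still holds the
* start of the first episode for computing follow-up time
* (assumes startdate is never missing)
bysort idp (startdate): keep if _n == _N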

Replacing values of multiple variables at once

I have 9 variables (v1-v9) with binary responses coded as yes = 1, no = 2.

I want to replace all the values of no with 0 for all variables at once.
Can anyone help me with the code?
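
A sketch, assuming the variables really are named v1 through v9 and that 2 always means no:

Code:
* recode 2 ("no") to 0 for all nine variables in one command
recode v1-v9 (2 = 0)
* or, equivalently, loop over the variables
foreach v of varlist v1-v9 {
    replace `v' = 0 if `v' == 2
}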