Friday, November 30, 2018

What stats are appropriate to assess model fit for nested logit models if Wald is not possible for clustered data?

I used vce(cluster) to account for clustering within 9 groups in a set of nested logistic regression models. Stata won't give me Wald chi2 statistics because I have too many variables in the model relative to the number of clusters, and I have used up my df. Stata also said both Wald and lrtest would be misleading. So, what *wouldn't* be misleading to report to describe fit and to compare fit among nested models? Are pseudo-R-squared, AIC, BIC, and log-likelihood values still meaningful to interpret? Or are there other statistics I don't know about?

Thanks in advance!

How do I combine observations within a dataset?

Hi all,

I appended two datasets based on ID. I was not able to merge these datasets, because ID did not identify individual observations. So, I ended up with a dataset that looks like the one below.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID float(date1 date2 duplicate)
1 20737     . 1
1     . 20438 2
2     . 20775 1
2 20930     . 2
2 21129     . 3
3 20796     . 1
3 21157     . 2
4     . 20873 1
4     . 20180 2
4 20858     . 3
end
format %td date1
format %td date2
I would like to modify the dataset above, to look like the one below.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID float(date1 date2 duplicate)
1 20737 20438 1
2 20930 20775 2
2 21129 20775 3
4 20858 20873 1
4 20858 20180 2
end
format %td date1
format %td date2
In other words, I would like to keep only observations for IDs that have both date1 and date2, but the combinations of ID, date1, and date2 vary.
ID 1 has one date1 and one date2. These just need to be combined into one observation.
ID 2 has two date1 values and one date2. In this case I need to combine the observations into two distinct observations, the first with one date1 and the date2, and the second with the other date1 and the same date2 again.
ID 3 only has date1; these observations must be dropped.
ID 4 has one date1 and two date2 values. Similar to ID 2, in this case I need to combine the observations into two distinct observations, the first with date1 and one date2, and the second with the same date1 and the other date2.
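A sketch of one possible approach, using -joinby- to form all within-ID pairs of the two dates (IDs appearing in only one of the two pieces, such as ID 3, are dropped automatically):

Code:
* split the two date columns apart, then pair them within ID
preserve
keep if !missing(date2)
keep ID date2
tempfile d2
save `d2'
restore
keep if !missing(date1)
keep ID date1
joinby ID using `d2'        // all date1 x date2 combinations per ID
bysort ID (date1 date2): gen duplicate = _n
format %td date1 date2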

Thank you very much!

Difference between catplot and tabplot

Dear Stata users,

I have a dataset comprising two variables: foobarx is the variable I am concerned with, and type is an indicator of different types of foobarx. I use -catplot- and -tabplot- (both from SSC) to generate three plots, g1, g2 and g3. The g1 plot seems to be identical to the g2 plot, but what does the g3 plot mean in this case? Thank you!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int type double foobarx
1    1
1  1.3
1  1.5
1  1.2
1    1
1  2.4
1  1.5
1    1
1  2.1
1  1.4
1    3
1  1.2
1    2
1    1
1    1
1  1.2
1    2
1   .8
1  2.1
1   .1
1    1
1    2
1  1.4
1    2
1    2
1  1.7
1  1.3
1    2
1  1.5
1   .8
1  .83
1  2.2
1   .8
1   .8
1 1.46
1   .8
1   .5
1    2
1  2.5
1  1.2
1   .7
1   .3
1  1.5
1   .8
1   .8
1  .03
1   .4
1  1.4
1    1
1  1.4
2    .4
2   2.6
2 4.375
2   1.2
2     1
2   2.4
2   1.5
2    .5
2   2.1
2   .66
2     4
2   1.5
2     2
2     1
2     1
2   1.2
2  1.66
2    .8
2   2.1
2   .07
2    .5
2     2
2   1.4
2     2
2   1.6
2   1.7
2   1.3
2  1.69
2   1.5
2   .32
2  .416
2     2
2    .3
2     1
2   1.2
2    .8
2    .5
2    .8
2   2.5
2   1.2
2    .7
2    .3
2    .7
2  .667
2    .8
2   .02
2  1.25
2   1.4
2    .5
2   1.4
end

graph drop _all
catplot type foobarx, recast(bar) asyvars var2opts(label(labsize(tiny))) legend(order(1 "Type==1" 2 "Type==2")) name(g1)
tabplot type foobarx, separate(type) xlabel(, labsize(tiny)) name(g2)
tabplot foobarx, separate(type) xlabel(, labsize(tiny)) name(g3)

I created a YouTube video on using CODE delimiters & dataex – looking for feedback on making it better

Hi everyone,

It seems like using the CODE delimiters and -dataex- is a hurdle for new posters (particularly -dataex-). I've tried to help people along by creating a short tutorial on using them and why they are important.

I'm looking for your feedback on ways I could improve it or things that I should change in the next go-round (I consider the current posting a "rough draft"). The video is longer than I expected and has too many “ums” and pauses (and you’ll notice a phone call I tried to edit out around 5:20). Definitely listen to it at 1.5× speed or greater.

Also note: While I am awesome with Excel and pretty good with Stata, I am a novice video editor, and so if anyone wanted to help me edit some of that stuff out, that would be appreciated. I recorded this using Screencast-o-matic (yes, that’s its real name). It’s low budget (I paid $20 for 3 yrs), but I haven’t sat down to learn Camtasia or any of the Adobe products.

So if you say, "Why don't you add in some awesome intro music like the video put out by StataCorp" I will likely respond, "Thanks - show me how to do that."

The YouTube video is here https://youtu.be/bXfaRCAOPbI

The Statalist post on converting incident data to weekly rates is here


All the best,
--David

Predicting residuals of an EGARCH model for panel data using rangerun/rangestat

Hello Statalisters,

My data is daily panel data that consists of 379 firms:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int Timestamp long Company float(id resid)
18535 1 1  -.5671907
18536 1 1  -.6045171
18539 1 1  -.6121998
18540 1 1  -.6303514
18541 1 1   -.669826
18542 1 1   -.702639
18543 1 1   -.719861
18546 1 1  -.6933156
18547 1 1   -.731853
18548 1 1  -.7365888
18549 1 1  -.7747283
18550 1 1  -.7631773
18553 1 1  -.7770729
18554 1 1  -.8012897
18555 1 1  -.8964902
18556 1 1   -.913472
18557 1 1  -.9855964
18560 1 1   -.999239
18561 1 1  -.9881337
18562 1 1 -1.0019863
18563 1 1 -1.0206912
18564 1 1 -1.0217553
18567 1 1 -1.0474454
18568 1 1  -1.045138
18569 1 1 -1.0861173
18570 1 1 -1.0813621
18571 1 1 -1.1092516
18574 1 1 -1.0936003
18575 1 1   -1.11229
18576 1 1 -1.1059849
18577 1 1 -1.1025311
18578 1 1 -1.1074014
18581 1 1 -1.1109506
18582 1 1 -1.1378505
18583 1 1 -1.0972226
18584 1 1  -1.081413
18585 1 1 -1.0957986
18588 1 1 -1.2032365
18589 1 1 -1.1156594
18590 1 1 -1.1093435
18591 1 1  -1.119484
18592 1 1 -1.1405734
18595 1 1 -1.1548105
18596 1 1 -1.1471893
18597 1 1 -1.1450335
18598 1 1 -1.2197285
18599 1 1 -1.2260885
18602 1 1  -1.254092
18603 1 1 -1.2256622
18604 1 1 -1.2525495
18605 1 1  -1.260203
18606 1 1 -1.2446553
18609 1 1 -1.2434903
18610 1 1 -1.2611834
18611 1 1 -1.2991477
18612 1 1 -1.3018358
18613 1 1  -1.300265
18616 1 1 -1.2987148
18617 1 1  -1.289396
18618 1 1 -1.3019747
18619 1 1 -1.3019783
18623 1 1 -1.3298627
18624 1 1 -1.3500524
18625 1 1 -1.3450558
18626 1 1 -1.3278207
18630 1 1 -1.3412284
18631 1 1 -1.3589562
18632 1 1 -1.3617107
18634 1 1 -1.4429946
18637 1 1  -1.425543
18638 1 1  -1.430764
18639 1 1 -1.4498658
18640 1 1  -1.439363
18641 1 1  -1.480617
18644 1 1  -1.472839
18645 1 1  -1.471289
18646 1 1  -1.559699
18647 1 1   -1.55628
18648 1 1 -1.5369538
18651 1 1 -1.5414712
18652 1 1 -1.5362016
18653 1 1 -1.5512855
18654 1 1  -1.596582
18655 1 1 -1.6015798
18658 1 1 -1.6236287
18659 1 1   -1.62002
18660 1 1  -1.633923
18661 1 1 -1.6823018
18662 1 1 -1.7491394
18665 1 1  -1.698604
18666 1 1 -1.7063757
18667 1 1 -1.6978794
18668 1 1 -1.7265545
18669 1 1  -1.708142
18672 1 1 -1.7848433
18673 1 1 -1.7568833
18674 1 1  -1.757743
18675 1 1 -1.7850047
18676 1 1 -1.7637664
18679 1 1  -1.782558
end
format %tdnn/dd/CCYY Timestamp
label values Company Company
label def Company 1 "AAK.ST", modify
id is basically a firm id; I have 379 ids, one for each of my 379 firms. resid is my main variable.
I am trying to run an EGARCH model for each company separately and save the residuals before running the next EGARCH model for id == 2, and so on, using resid as my dependent variable with no independent variables. So this is my model:

Code:
arch resid , earch(1/1) egarch(1/1)
It is possible to run a panel EGARCH model as follows, but the problem is saving each regression's residuals before running the next regression:

Code:
by id: arch resid , earch(1/1) egarch(1/1)
I thought of the community-contributed rangerun command and did something like:

Code:
program define EGARCH
    arch resid , earch(1/1) egarch(1/1)
    predict IV, residuals
    exit
end
rangerun EGARCH, interval( Timestamp 0 0) by(id) verbose
but Stata wanted a time variable although I had already declared my data:

Code:
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname ...
time variable not set, use tsset varname .
Thus, I did this

Code:
program define EGARCH
    xtset id Timestamp, daily
    arch resid , earch(1/1) egarch(1/1)
    predict IV, residuals
    exit
end
rangerun EGARCH, interval( Timestamp 0 0) by(id) verbose
and I got

Code:
insufficient observations
       panel variable:  id (strongly balanced)
        time variable:  Timestamp, 16oct2013 to 16oct2013
                delta:  1 day
insufficient observations
       panel variable:  id (strongly balanced)
        time variable:  Timestamp, 17oct2013 to 17oct2013
                delta:  1 day
insufficient observations
       panel variable:  id (strongly balanced)
        time variable:  Timestamp, 18oct2013 to 18oct2013
                delta:  1 day
insufficient observations
       panel variable:  id (strongly balanced)
        time variable:  Timestamp, 21oct2013 to 21oct2013
                delta:  1 day
insufficient observations
       panel variable:  id (strongly balanced)
        time variable:  Timestamp, 22oct2013 to 22oct2013
                delta:  1 day
insufficient observations
       panel variable:  id (strongly balanced)
        time variable:  Timestamp, 23oct2013 to 23oct2013
                delta:  1 day
insufficient observations
       panel variable:  id (strongly balanced)
        time variable:  Timestamp, 24oct2013 to 24oct2013
                delta:  1 day
insufficient observations
       panel variable:  id (strongly balanced)
        time variable:  Timestamp, 25oct2013 to 25oct2013
                delta:  1 day
insufficient observations
I know that I have sufficient observations to run the EGARCH model because I have already run the EGARCH model without saving the residuals and had no issues there. I must be doing something wrong. I might not have set up the right interval in rangerun with 0 0. My understanding is that 0 0 takes the full period of id == 1:
Code:
rangerun EGARCH, interval( Timestamp 0 0) by(id) verbose
I was also reading about Mr. Cox's rangestat command, but I was not able to incorporate the arch/garch models in rangestat.
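For reference: interval(Timestamp 0 0) restricts each call of the program to observations sharing the same Timestamp value, which is why -arch- reports insufficient observations. A minimal alternative sketch uses -runby- (by Robert Picard and Daniel Klein, from SSC), which hands each firm's full series to the program:

Code:
capture program drop EGARCH
program define EGARCH
    tsset Timestamp        // each firm's series is the whole dataset here
    arch resid, earch(1/1) egarch(1/1)
    predict IV, residuals
end
runby EGARCH, by(id) verbose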

I would appreciate any help or tips in solving my issue.

Workshop on interpreting interaction effects

Workshop: Interpreting Interaction Effects with ICALC
March 29-30, 2019 Philadelphia

Instructor: Robert L. (Bob) Kaufman
Hosted by: Department of Sociology, Temple University

The workshop is based on Interaction Effects in Linear and Generalized Linear Models (Kaufman, 2019), a comprehensive and accessible text providing a unified approach to interpreting interaction effects. The book develops the statistical basis for the general principles of a set of interpretive tools, introduces the ICALC Toolkit for Stata, and offers start-to-finish examples applying ICALC to show how to interpret interaction effects for a variety of different techniques of analysis.

The workshop provides a foundation in the principles of interpretation and training in the use of the ICALC Toolkit for Stata to produce the calculations, tables and graphics needed to help understand and explain your results.

Register at http://icalcrlk.com/workshop/ . Space is limited.


threshold, gaps not allowed

Hi everyone,

I'm fitting a threshold model with the command "threshold" in Stata 15.1. My data are strongly balanced, time is continuous, and there are no missing values, but Stata reports "gaps not allowed" (please see attached). I hope anyone who has used this command can help me with this problem, thank you very much!

Best,
Yan

2019 German Users Group Meeting

_________________________________

2019 GERMAN USERS GROUP MEETING
_________________________________


Date: May 24, 2019
Venue: Ludwig-Maximilians-Universität Munich
Cost: Meeting only: 45 EUR (students 35 EUR)
Workshop only: 65 EUR
Workshop and Meeting: 85 EUR
Submission deadline: February 1, 2019


Call for Presentations
======================

We would like to announce the 17th German Stata Users Group meeting to
be held Friday, May 24, 2019 at:

LMU Munich
Seidlvilla e.V.
Nikolaiplatz 1b
80802 München

All Stata users, from Germany and elsewhere, or those interested in
learning about Stata, are invited to attend.

Presentations are sought on topics that include the following:

- User-written Stata programs
- Case studies of research or teaching using Stata
- Discussions of data management problems
- Reviews of analytic issues
- Surveys or critiques of Stata facilities in specific fields, etc.

The conference language will be English, due to the international
nature of the meeting and the participation of non-German guest
speakers.


Submission guidelines
=====================

If you are interested in presenting a paper, please submit an abstract
by email to stata@soziologie.uni-muenchen.de (max 200 words). The
deadline for submissions is February 1, 2019. Presentations should be
20 minutes or shorter.


Registration
============

Participants are asked to travel at their own expense. There will be a
small conference fee to cover costs for refreshments and lunch. There
will also be an optional informal meal at a restaurant in Munich on
Friday evening at additional cost. You can enroll by contacting Peter
Stenveld or Elena Tsittser by email or by writing or phoning.

DPC Software GmbH
Prinzenstraße 2
42697 Solingen

Tel: +49 212 26066-44
Email: peter.stenveld@dpc-software.de, elena.tsittser@dpc-software.de

The final program will be circulated in March 2019.


Organizers
==========

Scientific Organizers
~~~~~~~~~~~~~~~~~~~~~

Katrin Auspurg
Ludwig-Maximilians-Universität München
katrin.auspurg@lmu.de

Josef Brüderl
Ludwig-Maximilians-Universität München
bruederl@lmu.de

Johannes Giesecke
Humboldt University Berlin
johannes.giesecke@hu-berlin.de

Ulrich Kohler
University of Potsdam
ulrich.kohler@uni-potsdam.de


Logistics Organizer
~~~~~~~~~~~~~~~~~~~

DPC software (dpc-software.de), the distributor of Stata in several
countries, including Germany, the Netherlands, Austria, the Czech
Republic, and Hungary.


Finding a mean when values are repeated

Hello,

I have data involving auctions and contracts. My task is to find the average of the winning bids. So, I used this code to get a formula for the max bid by contract:
Code:
egen maxbid = max(bid), by(contractnum)
My problem is that the values of maxbid are repeated every time a certain contract is mentioned. For example, contract 08-492904 has 7 bidders, so the winning bid is listed 7 times. As you can see, this would produce an incorrect mean if I simply ran -sum maxbid-. So, how can I fix this?
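A sketch of one way around this: tag a single observation per contract with -egen, tag()- and summarize over the tagged rows only:

Code:
egen maxbid = max(bid), by(contractnum)
egen byte onetag = tag(contractnum)   // 1 on exactly one row per contract
summarize maxbid if onetag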

does an independent variable predict dependent variable 1 or 2 better

Dear all,

I am running the below 2 models in Stata. They have different dependent variables, and some similar independent variables (certain independent variables were removed since they were statistically insignificant). Below please see the code and results.

I am interested in a variable that is in both models, dar. In the first model, the coefficient on dar is -.062 (P<0.001); in the second model it is 0.058 (P=0.001). I am trying to evaluate whether dar “explains” the first dependent variable or the second dependent variable better. In the first model, values of the dependent variable are mostly between 0 and 100, but with some values above 100 and below 0. In the second model, the values of the dependent variable range from 0 to 100.

Can I simply compare the magnitudes of the coefficients on dar to determine this? For example, since the magnitude is larger in the first model, can I conclude that dar “explains” dependent variable 1 better? Is there a better way to do this, for example having dar as the only independent variable and comparing BIC?
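One rough device for comparing effects across outcomes measured on different scales is to standardize the outcomes first, so the dar coefficients are in standard-deviation units of each dependent variable. A minimal sketch (whether such a comparison is substantively meaningful is a separate question):

Code:
egen zdep1 = std(dep1)
egen zdep2 = std(dep2)
mixed zdep1 dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml
mixed zdep2 dar alp mup zol alpbymup || _all: R.bor || _all: R.mol || _all: R.pop, reml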


Thanks!!!


Code:
. mixed dep1 dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log restricted-likelihood = -37391.644  
Iteration 1:   log restricted-likelihood = -37391.644  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =      8,476
Group variable: _all                            Number of groups  =          1

                                                Obs per group:
                                                              min =      8,476
                                                              avg =    8,476.0
                                                              max =      8,476

                                                Wald chi2(5)      =     226.97
Log restricted-likelihood = -37391.644          Prob > chi2       =     0.0000

------------------------------------------------------------------------------
        dep1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         dar |  -.0624097   .0153719    -4.06   0.000     -.092538   -.0322814
         alp |   8.150469   1.526269     5.34   0.000     5.159037     11.1419
         mup |   8.152124   2.433408     3.35   0.001     3.382732    12.92152
         zol |   6.486658   .7586855     8.55   0.000     4.999661    7.973654
    mupbyzol |  -1.694464    .915423    -1.85   0.064     -3.48866    .0997322
       _cons |   81.06523   2.629083    30.83   0.000     75.91233    86.21814
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity               |
                  var(R.bor) |   63.10153   13.75592      41.16052    96.73842
-----------------------------+------------------------------------------------
_all: Identity               |
                  var(R.mol) |   18.28294   7.468788      8.209574     40.7166
-----------------------------+------------------------------------------------
_all: Identity               |
                  var(R.pop) |   30.08839   6.280492       19.9859    45.29751
-----------------------------+------------------------------------------------
               var(Residual) |   381.8593   5.912967      370.4442    393.6262
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 1810.17               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

.
. mixed dep2 dar alp mup zol alpbymup || _all: R.bor || _all: R.mol || _all: R.pop, reml

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log restricted-likelihood = -38246.114  
Iteration 1:   log restricted-likelihood = -38246.114  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =      8,463
Group variable: _all                            Number of groups  =          1

                                                Obs per group:
                                                              min =      8,463
                                                              avg =    8,463.0
                                                              max =      8,463

                                                Wald chi2(5)      =     246.42
Log restricted-likelihood = -38246.114          Prob > chi2       =     0.0000

------------------------------------------------------------------------------
        dep2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         dar |   .0576079   .0172886     3.33   0.001     .0237229    .0914928
         alp |  -9.863186   2.068844    -4.77   0.000    -13.91805   -5.808327
         mup |  -9.944255   3.193489    -3.11   0.002    -16.20338   -3.685131
         zol |  -6.253207   .4719679   -13.25   0.000    -7.178247   -5.328167
    alpbymup |  -2.215597   1.016369    -2.18   0.029    -4.207643   -.2235513
       _cons |   49.90762   3.607263    13.84   0.000     42.83752    56.97773
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity               |
                  var(R.bor) |   185.1551   38.18656        123.59    277.3884
-----------------------------+------------------------------------------------
_all: Identity               |
                  var(R.mol) |   34.39981   14.81115      14.79325    79.99239
-----------------------------+------------------------------------------------
_all: Identity               |
                  var(R.pop) |   49.99171   10.22253      33.48424    74.63724
-----------------------------+------------------------------------------------
               var(Residual) |   470.1817   7.287957      456.1124     484.685
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 2918.94               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

Optimize() on a segment

Hello,

I would like to maximize my function over a specific segment, let's say [a,b]. Does anyone know how to do this in Mata?
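One standard trick, sketched below with a hypothetical objective: optimize() has no built-in bound constraints, but an unconstrained parameter t can be mapped onto [a,b] via x = a + (b-a)*invlogit(t):

Code:
mata:
void eval(real scalar todo, real rowvector t,
          real scalar v, real rowvector g, real matrix H)
{
    real scalar a, b, x
    a = 0                             // hypothetical lower endpoint
    b = 5                             // hypothetical upper endpoint
    x = a + (b - a)*invlogit(t[1])    // x is confined to [a,b]
    v = -(x - 2)^2                    // hypothetical objective, max at x = 2
}
S = optimize_init()
optimize_init_evaluator(S, &eval())
optimize_init_evaluatortype(S, "d0")
optimize_init_params(S, 0)
t = optimize(S)
0 + (5 - 0)*invlogit(t[1])            // recover the maximizer on [a,b]
end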

Best regards,
Olga

ICC in mi estimate meqrlogit

Hi all,

I have successfully imputed values and run -mi estimate: meqrlogit- for a multilevel analysis, but now I am having trouble investigating the variance.

To investigate the variance with the un-imputed dataset I used the -estat icc- command to get the intraclass correlations, but this command does not work with the imputed dataset. How can I assess the variance?
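If the pooled level-2 intercept variance can be read off the -mi estimate- output, the latent-scale ICC for a two-level logit model can be computed by hand as var_u/(var_u + pi^2/3). A minimal sketch with a hypothetical value:

Code:
scalar var_u = .25                    // hypothetical pooled level-2 variance
display "latent-scale ICC = " var_u/(var_u + c(pi)^2/3)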

Can someone help me, please?

fixed effects and clustered residuals

Hello all,

I have a panel of prices of 4 products, in 200 towns, posted by 20 companies over 12 months.

I would like to have fixed effects on products and towns. I assume that the residuals of the prices of each product in each town in each month are not iid, so I would like to cluster.

Would fixed effects on product and town plus clustering on companies be correct for that?
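A sketch of that specification using the community-contributed -reghdfe- (SSC); the variable names are assumptions, and the right-hand side here is only a placeholder:

Code:
ssc install reghdfe
reghdfe price i.month, absorb(product town) vce(cluster company)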

thank you!


3SLS with non-continuous endogenous variables

Hello Everyone,
Much thanks for the opportunity provided for clarification of doubts and guidance on data analysis.

I am specifying a system of 3 equations using the (reg3 command) for which the two endogenous variables in the system are not continuous. I would like to know if there is a command that can handle this situation. My system of equations appear as follows:

Y1 = Y2 + X2 + X3 + X4
Y2 = X1 + Z1 + Z2
X1 = V1 + V3 + V

Y1 (the dependent variable in the final outcome equation) is a continuous dependent variable, whereas Y2 and X1 are endogenous in the system but non-continuous. The standard reg3 command treats the dependent variables in all three stages/equations as continuous. It is actually giving me some interesting estimates, but I think that is not right, since Y2 in equation 2 above is a limited dependent variable (an index bounded between 0 and 1) and X1 in equation 3 is a binary variable (0=No and 1=Yes).
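One candidate is the community-contributed -cmp- (SSC), which fits systems of equations with mixed outcome types; a sketch, treating the fractional Y2 as continuous for illustration (a fractional-outcome link is not directly available):

Code:
ssc install cmp
cmp setup
cmp (Y1 = Y2 X2 X3 X4) (Y2 = X1 Z1 Z2) (X1 = V1 V3), ///
    indicators($cmp_cont $cmp_cont $cmp_probit)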

Kindly help me. I will really appreciate.
Thanks!!



Interpretation of Interaction terms

Dear Statalisters

I run the following regression for panel data on children, PPVT being a test score, treatment a treatment they receive, and sex being =1 if male and =2 if female.

Code:
xtreg PPVT treatment##sex (control variables) , fe cluster(ID)

---------------------------------------------------------------------------------------------
                            |               Robust
                    ppvtraw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
                1.treatment |   5.594682    1.81707     3.08   0.002     2.031623    9.157742
                            |
                        sex |
                    female  |          0  (omitted)
                            |
              treatment#sex |
                  1#female  |  -8.300998   2.405033    -3.45   0.001    -13.01698   -3.585014

                       _cons |   80.75674   19.60288     4.12   0.000     42.31781    119.1957
----------------------------+----------------------------------------------------------------
How do I interpret this?
Can I read it like this:

1. males not getting treatment have an avg. score of 80.75
2. males getting treatment have an avg. score of 86.34 (80.75 + 5.59)
3. females getting treatment have an avg. score of 72.45 (80.75 - 8.30)
4. females not getting treatment have an avg. score of ??

or is it like this:

1. males not getting treatment have an avg. score of 80.75
2. males getting treatment have an avg. score of 86.34
3. females getting treated have an avg. score of 78.05 (80.75 + 5.59 - 8.30)
4. females not getting treatment have an avg. score of ??

Does the interaction coefficient say how much treatment changes the score for females or does it say how much the change differs from the male's change?
And how do I find the coefficient for being just a female without treatment?
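A sketch that settles this kind of cell arithmetic directly: with factor-variable notation, -margins- reports the predicted score for each treatment-by-sex cell (keeping in mind that the time-invariant sex main effect is absorbed by the fixed effects, so the female no-treatment cell is not separately identified here):

Code:
xtreg PPVT i.treatment##i.sex, fe cluster(ID)   // control variables omitted
margins treatment#sex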
Thank you so much for your help!


Best
Arto Arman

Sts graph truncate

Ciao, I am conducting a survival analysis and need to truncate the x-axis. The survival-time data I have obtained start at a time of 5, but sts graph starts at 0, and there are no data before 5. What do I do?


Code:
sts graph
is the code I use.
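A sketch of two possibilities for starting the plot at analysis time 5 (option availability may depend on the Stata version):

Code:
sts graph, tmin(5)
* or adjust the axis labels (the curve itself still starts at 0):
sts graph, xlabel(5(5)40)             // hypothetical tick values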



regression of panel data: dependent variable and independent variable are the same variable in different groups

Hi, I am now running a panel regression, for example: return in year 2018 = c + returns in years 2010-2017 + e. Years 2010-2017 are categorized as group 1 and year 2018 as group 0, as shown below.
year group return
2010 1 10
2011 1 11
2012 1 12
2013 1 13
2014 1 14
2015 1 15
2016 1 16
2017 1 17
2018 0 18
how should I do this?
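It is hard to run this as a regression on a single series, since there would be only one observation per year; a sketch assuming a cross-section of units (say, firms) identified by a hypothetical variable id:

Code:
reshape wide return, i(id) j(year)
reg return2018 return2010-return2017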

Thank you and best,
Ivy

help me with a Stata command

dear statalist
this is my first project with Stata, and I'm not very experienced


I am working on panel data to understand how the introduction of a bonus has changed the consumption of the population following its application, taking into account the year 2012, when the treatment had not yet been introduced, and the year 2014, when the treatment was in place. I have treated and untreated groups, and the other variables that I took into consideration are age, the region of residence, and the number of family members.


Is this Stata command OK?

Code:
reg c bonus_renzi ncomp ireg eta post T_Post, vce(cluster nquest)


Is it better to divide ireg (north, south, center) and run three different regressions? How can I interpret these results?
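For reference, a sketch of the usual interaction form of the same difference-in-differences model, assuming T marks the treated group and post marks 2014:

Code:
reg c i.T##i.post ncomp eta i.ireg, vce(cluster nquest)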





Standardised independent variables using logistic regression

I am running a model in which I want to check the substantive effect of the different independent variables. I know my corevar is statistically significant and it confirms my theoretical expectation, but I'd like to engage in a discussion of its importance in comparison with other determinants well established to be important.

To that end, I re-estimate my model after standardising the continuous variables (using the -center- command) and leave my categorical and dummy variables untouched. I am surprised that there is no change in the size of the coefficient values, which makes me think I’m running the standardised model incorrectly. The code I am using is attached below, as is example data.

I assume that I shouldn't be standardising the dummies or the country and year fixed effects since the size of the "magnitude" is captured from the baseline.
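For what it is worth: -center- (SSC, by Ben Jann) only subtracts the mean by default, which changes the constant but leaves slope coefficients untouched; its standardize option also divides by the standard deviation, which is what rescales the coefficients. A sketch (weights omitted here):

Code:
center age eduyrs swi religious swd econview politic_interest lrscale ///
    if fulldata==1 & identifier==1, standardize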

Code:
logit voted i.corevar i.gender age i.ethnic eduyrs swi religious i.domicil swd econview politic_interest i.extreme lrscale i.cntryID i.year if fulldata==1 & identifier==1 [pw=weight], robust cluster(cntry)

center age eduyrs swi religious swd econview politic_interest lrscale [pw=weight] if fulldata==1 & identifier==1

logit voted i.corevar i.gender c_age i.ethnic c_eduyrs c_swi c_religious i.domicil c_swd c_econview c_politic_interest i.extreme c_lrscale i.cntryID i.year if fulldata==1 & identifier==1 [pw=weight], robust cluster(cntry)
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(voted corevar) double(gender agea) float ethnic double(eduyrs swi religious domicil swd econview politic_interest) float extreme double lrscale long cntryID float(year fulldata) double identifier float weight str2 cntry
0 0 0 64 0 12 3  1 3  7  8 1 0  5 1 2002 0 1  .2420854 "AT"
1 0 1 76 0  8 2  8 5  0  0 1 1  . 1 2002 0 1  .2095612 "AT"
1 0 0 63 .  9 3  7 3  5  7 3 0  5 1 2002 0 1 .25660998 "AT"
0 0 0 62 0 12 2  1 3  3  0 3 1  . 1 2002 0 1 .24710792 "AT"
1 0 0 33 0 12 3  5 3  5  4 3 0  5 1 2002 0 0 .12852219 "AT"
1 0 0 38 0 15 3  9 1  5  6 4 0  4 1 2002 0 0 .15444924 "AT"
1 0 1 68 0  6 3  5 4  5  5 2 1  . 1 2002 0 1  .1719059 "AT"
0 0 1  . 0 16 3  8 4  8  6 3 1  1 1 2002 0 1 .27422953 "AT"
1 0 0 31 0 13 4  2 3  6  7 2 0  7 1 2002 0 1  .2577774 "AT"
1 0 1 38 0 22 3  7 1  9  6 4 0  5 1 2002 0 1  .2987177 "AT"
1 0 1 48 0 12 2  5 1  8  6 1 0  4 1 2002 0 1  .3828788 "AT"
1 0 1 56 0  9 3  7 5  6  6 2 0  6 1 2002 0 1  .4570764 "AT"
1 0 1 73 0 11 3  8 4  9  7 2 0  5 1 2002 0 1  .2252532 "AT"
1 0 1 44 0 18 4  7 2  7  3 3 0  5 1 2002 0 1 .29472685 "AT"
1 0 0 24 0 11 4  7 4  7  5 4 0  6 1 2002 0 0 .29372233 "AT"
1 0 0 50 0 10 4  0 4  7  8 3 0  6 1 2002 0 1  .2444745 "AT"
1 0 1 76 0 11 3  8 1  8  8 2 0  6 1 2002 0 0 .13989751 "AT"
1 0 1 43 0 13 4  7 3  3  4 4 0  5 1 2002 0 1  .2252532 "AT"
1 0 0 51 . 12 4  5 4  0  0 2 1 10 1 2002 0 0 .09931013 "AT"
1 0 0 42 0 10 2  7 1  2  5 3 1  . 1 2002 0 1  .3089256 "AT"
1 0 1 77 0 11 3  5 4 10  2 1 1  9 1 2002 0 1 .18759787 "AT"
1 0 0 50 0 11 3  5 3  5  4 3 0  5 1 2002 0 1  .6195073 "AT"
1 0 1 30 0 12 3  5 4  4  2 1 1  . 1 2002 0 0 .50263196 "AT"
1 0 0 48 0 13 3  7 4  4  4 2 1  . 1 2002 0 1  .4655739 "AT"
1 0 1 74 0 18 4  7 1  .  7 4 0  7 1 2002 0 0 .13873011 "AT"
1 0 0 37 0 13 4  5 4  4  5 3 0  3 1 2002 0 1 .29616573 "AT"
0 0 0 41 0 11 3  8 4  5  . 2 1  0 1 2002 0 0 .09431476 "AT"
1 0 0 63 1 12 3  8 3  6  6 3 0  5 1 2002 0 0 .12102913 "AT"
1 0 0 24 0 13 3  2 3  9  6 3 0  4 1 2002 0 1  .2420854 "AT"
1 0 0 84 0  8 3  9 2  5  4 2 0  5 1 2002 0 0 .14730912 "AT"
1 0 1 67 0 14 4  5 3  4  7 3 0  7 1 2002 0 1 .27420238 "AT"
. 0 0 49 1 12 2 10 2 10 10 2 0  6 1 2002 0 1  .5663772 "AT"
1 0 0 65 0 12 3  5 4  5  6 2 1  2 1 2002 0 1   .331622 "AT"
. 0 0 17 0 11 4  3 5  6  4 2 0  5 1 2002 0 0  .3150069 "AT"
1 0 0 42 0  9 3 10 3  5  0 2 0  5 1 2002 0 1  .3771776 "AT"
1 0 0 59 0 19 3  5 4  1  4 3 1  . 1 2002 0 1 .18759787 "AT"
0 0 0 25 0 17 4  3 1  5  5 2 0  6 1 2002 0 0 .13101988 "AT"
1 0 0 48 0 12 3  5 2  5  4 4 0  7 1 2002 0 1 .25525254 "AT"
1 0 1 62 0 10 3  3 2  9  7 4 1  2 1 2002 0 0 .12219653 "AT"
1 0 1 47 0 15 4  8 4  5  6 3 0  5 1 2002 0 1 .32146835 "AT"
1 0 1 66 0 12 1  7 4  5  5 3 0  5 1 2002 0 1  .4328325 "AT"
1 0 1 44 0 13 1  7 1  9  5 1 0  3 1 2002 0 1  .2620669 "AT"
1 0 0 47 0 16 3  1 4  5  3 2 1  2 1 2002 0 0  .2439858 "AT"
1 0 0 45 0 19 2  6 4  2  3 3 1  8 1 2002 0 0 .20920826 "AT"
1 0 0 68 0  8 3  5 4 10  7 2 1  8 1 2002 0 1 .16520014 "AT"
1 0 1 21 0 12 3  7 4  0  . 3 1  0 1 2002 0 0  .4336199 "AT"
1 0 1 33 1 22 3  5 1  8  7 3 0  5 1 2002 0 0 .12762627 "AT"
. 0 0 18 1 12 3  5 3  4  7 3 0  5 1 2002 0 0  .3036045 "AT"
1 0 1 52 0 11 4  3 3  1  2 2 0  3 1 2002 0 1 .20280117 "AT"
0 0 1 38 0 14 4  0 2  6  8 3 0  5 1 2002 0 1  .2845732 "AT"
1 0 1 21 0 14 3  4 4  5  6 2 0  3 1 2002 0 0  .8539368 "AT"
1 0 1 20 0 12 2  1 4  8  4 1 1  . 1 2002 0 0 .12662177 "AT"
1 0 1 61 0  8 3  8 4  6  5 4 1  . 1 2002 0 1 .18447576 "AT"
1 0 1 53 0 11 3  6 4  7  7 4 0  5 1 2002 0 1  .3045004 "AT"
1 0 0 59 0  9 2  5 4  3  3 2 1  . 1 2002 0 0 .09882145 "AT"
1 0 1 61 0 14 3  6 4  8  6 3 0  6 1 2002 0 1  .2063305 "AT"
0 0 0 21 0 14 4  7 2  7  6 1 0  4 1 2002 0 0   .426344 "AT"
1 0 1 65 0  9 3  6 4  9  9 2 1  8 1 2002 0 1  .5275817 "AT"
1 0 1 63 1 18 3  5 1  6  5 3 0  3 1 2002 0 1  .4127696 "AT"
0 0 0 84 0  8 1  9 2  .  3 1 0  5 1 2002 0 0 .12561727 "AT"
1 0 0 85 0 10 3  8 4  7  5 2 1  . 1 2002 0 0 .09966306 "AT"
. 0 1 33 1  6 1  8 2  .  . 2 1  . 1 2002 0 1 .25424805 "AT"
0 0 0 20 0 12 4  1 1  4  1 1 0  4 1 2002 0 0  .3042832 "AT"
1 0 1 41 0 12 3  8 2  4  3 3 0  5 1 2002 0 0 .10712897 "AT"
1 0 1 41 0 16 3  0 4  8  8 2 0  5 1 2002 0 1  .3798653 "AT"
1 0 1 65 0 12 3  3 2  0  0 2 0  4 1 2002 0 1 .27979502 "AT"
1 0 0 35 0 20 4  0 1 10  8 4 0  4 1 2002 0 1  .2677681 "AT"
1 0 1 56 0  . 3  5 4  7  7 3 0  5 1 2002 0 1 .20641196 "AT"
0 0 1 23 0 13 4  0 4  .  . 1 1 10 1 2002 0 0 .09936443 "AT"
1 0 1 70 0  8 2  9 4  6  5 1 1  . 1 2002 0 1  .6423666 "AT"
1 0 1 46 0 16 3  0 3  7  5 4 0  5 1 2002 0 0 .13710119 "AT"
0 0 0 50 0  9 3  2 4  6  3 1 0  5 1 2002 0 1  .3150069 "AT"
1 0 1 55 0  8 2  0 4  3  8 2 0  5 1 2002 0 1 .21960625 "AT"
1 0 1 29 1 12 2 10 1  7  8 3 1  . 1 2002 0 0 .13101988 "AT"
. 0 0 35 0 16 2  5 3  5  7 4 0  5 1 2002 0 1  .3101473 "AT"
1 0 0 23 0 12 1  3 3  5  3 1 0  4 1 2002 0 0 .12852219 "AT"
1 0 0 63 0 12 2  5 1  4  3 3 1  2 1 2002 0 0 .15154433 "AT"
0 0 1 53 0 19 3  5 1  3  7 4 0  5 1 2002 0 1  .6062316 "AT"
1 0 1 38 0 12 4  3 3  7  6 3 0  5 1 2002 0 1 .25392225 "AT"
1 0 1 25 0 18 4  3 1  .  5 2 1  8 1 2002 0 0 .13873011 "AT"
1 0 0 42 0  9 3  4 3  8  8 2 0  6 1 2002 0 0 .24846536 "AT"
1 0 1 73 0  9 3  2 1  7  6 2 0  4 1 2002 0 1  .3089256 "AT"
1 0 0 62 0 13 3  5 1  0  0 3 0  5 1 2002 0 0 .13194293 "AT"
1 0 0 79 1  8 3  8 2  7  8 2 1  . 1 2002 0 0 .13989751 "AT"
1 0 1 40 0 12 3  6 4 10  5 3 0  5 1 2002 0 1  .3045004 "AT"
1 0 1 55 0 18 4  0 3  8  9 1 1  8 1 2002 0 1 .25126168 "AT"
1 0 0 55 0 10 3  4 1  8  7 2 1  2 1 2002 0 0 .15444924 "AT"
1 0 0 36 1  9 3  7 4  0  3 3 0  6 1 2002 0 1  .2479224 "AT"
1 0 1 56 0 15 3  7 4  6  5 2 0  4 1 2002 0 1  .2479767 "AT"
1 0 1 71 0 18 4  7 3  6  6 4 0  7 1 2002 0 0 .13710119 "AT"
1 0 1 60 0 12 4  8 4  8  6 4 0  6 1 2002 0 1  .4959534 "AT"
1 0 1 68 0  8 4  4 2  7  7 4 1  2 1 2002 0 1 .25424805 "AT"
1 0 0 24 0 16 4  4 1  5  3 3 0  3 1 2002 0 1  .3139753 "AT"
0 0 1 62 0 12 3  2 4  4  4 3 0  7 1 2002 0 1  .2179773 "AT"
0 0 1 23 0  9 3  4 4  7  7 3 1  0 1 2002 0 0 .13951743 "AT"
1 0 0 63 0 20 3 10 2  2  4 4 1  0 1 2002 0 1  .3042832 "AT"
0 0 0 58 0 11 4  8 1  0  7 1 0  5 1 2002 0 0  .1422866 "AT"
1 0 1 66 0 11 2  8 3  .  5 3 0  4 1 2002 0 0 .09670385 "AT"
1 0 0 38 0 13 3  5 3  5  7 3 0  5 1 2002 0 0 .12778917 "AT"
1 0 0 53 0 11 4  0 1  4  6 2 0  7 1 2002 0 1  .3042832 "AT"
end
label values gender gender
label def gender 0 "Female", modify
label def gender 1 "Male", modify
label values agea agea
label values eduyrs eduyrs
label values swi swi
label def swi 1 "Very difficult on present income", modify
label def swi 2 "Difficult on present income", modify
label def swi 3 "Coping on present income", modify
label def swi 4 "Living comfortably on present income", modify
label values religious rlgdgr
label def rlgdgr 0 "Not at all religious", modify
label def rlgdgr 1 "1", modify
label def rlgdgr 2 "2", modify
label def rlgdgr 3 "3", modify
label def rlgdgr 4 "4", modify
label def rlgdgr 5 "5", modify
label def rlgdgr 6 "6", modify
label def rlgdgr 7 "7", modify
label def rlgdgr 8 "8", modify
label def rlgdgr 9 "9", modify
label def rlgdgr 10 "Very religious", modify
label values domicil domicil
label def domicil 1 "A big city", modify
label def domicil 2 "Suburbs or outskirts of big city", modify
label def domicil 3 "Town or small city", modify
label def domicil 4 "Country village", modify
label def domicil 5 "Farm or home in countryside", modify
label values swd stfdem
label def stfdem 0 "Extremely dissatisfied", modify
label def stfdem 1 "1", modify
label def stfdem 2 "2", modify
label def stfdem 3 "3", modify
label def stfdem 4 "4", modify
label def stfdem 5 "5", modify
label def stfdem 6 "6", modify
label def stfdem 7 "7", modify
label def stfdem 8 "8", modify
label def stfdem 9 "9", modify
label def stfdem 10 "Extremely satisfied", modify
label values econview stfeco
label def stfeco 0 "Extremely dissatisfied", modify
label def stfeco 1 "1", modify
label def stfeco 2 "2", modify
label def stfeco 3 "3", modify
label def stfeco 4 "4", modify
label def stfeco 5 "5", modify
label def stfeco 6 "6", modify
label def stfeco 7 "7", modify
label def stfeco 8 "8", modify
label def stfeco 9 "9", modify
label def stfeco 10 "Extremely satisfied", modify
label values politic_interest politic_interest
label def politic_interest 1 "Not at all interested", modify
label def politic_interest 2 "Hardly Interested", modify
label def politic_interest 3 "Quite Interested", modify
label def politic_interest 4 "Very interested", modify
label values lrscale lrscale
label def lrscale 0 "Left", modify
label def lrscale 1 "1", modify
label def lrscale 2 "2", modify
label def lrscale 3 "3", modify
label def lrscale 4 "4", modify
label def lrscale 5 "5", modify
label def lrscale 6 "6", modify
label def lrscale 7 "7", modify
label def lrscale 8 "8", modify
label def lrscale 9 "9", modify
label def lrscale 10 "Right", modify
label values cntryID cntryID
label def cntryID 1 "AT", modify
label values identifier partner
label def partner 1 "Lives with husband/wife/partner at household grid", modify

Rescaling

Hi Statlist!

Is there a way to rescale the x and y axes in order to show them in millions of dollars rather than in the format shown in the following figure?

What I did is basically a simple two-way scatter:
Code:
twoway(scatter avsales_new_no_outliers lagged_tot_sales)(lfit avsales_new_no_outliers lagged_tot_sales)
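A sketch of one fix: divide both variables by one million before plotting and say so in the axis titles (variable names as in the post):

Code:
gen avsales_m  = avsales_new_no_outliers/1e6
gen lagsales_m = lagged_tot_sales/1e6
twoway (scatter avsales_m lagsales_m) (lfit avsales_m lagsales_m), ///
    ytitle("Average sales of new products (USD millions)") ///
    xtitle("Lagged total sales (USD millions)")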
This is a data example:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(Year idfirm) str18 crp float(avsales avsales_existing avsales_new) double nprod float(birthfirm firstyear agefirm) double tot_sales
2008  1 "21ST CENTURY LABS"  94701.45         .  94701.45   4 2008 2008  0 211591.55354704778
2009  1 "21ST CENTURY LABS"  83603.08  83603.08         .   4 2008 2008  1  308746.0587639413
2010  1 "21ST CENTURY LABS"  91092.75  113848.5   69.7861   5 2008 2008  2  407237.1453954251
2011  1 "21ST CENTURY LABS" 118792.94 165043.63  3166.214   7 2008 2008  3  598549.0481277383
2012  1 "21ST CENTURY LABS"  104542.8 156613.64  401.1374   9 2008 2008  4  664061.6305459269
2013  1 "21ST CENTURY LABS" 73541.766 113564.58  1500.705  14 2008 2008  5    608140.67505112
2014  1 "21ST CENTURY LABS"  48557.22  61105.52 13421.997  19 2008 2008  6  715163.0758130773
2015  1 "21ST CENTURY LABS"  52369.71  52369.71         .  18 2008 2008  7  768412.7578732732
2004  2 "3M"                2039558.8 2039558.8         .   6 1975 2004 29  12237352.38962967
2005  2 "3M"                  2109991   2109991         .   6 1975 2004 30 11513177.507807251
2006  2 "3M"                2050668.5 2050668.5         .   6 1975 2004 31 12304011.532418355
2007  2 "3M"                  6330808   6330808         .   6 1975 2004 32 11406777.403632266
2008  2 "3M"                2734569.5   2849051   2047681   7 1975 2004 33 12355367.684264863
2009  2 "3M"                1841746.4 1841746.4         .   7 1975 2004 34 12892224.398947667
2010  2 "3M"                  5212612   5212612         .   6 1975 2004 35 10897763.444727022
2011  2 "3M"                  5262992   5262992         .   6 1975 2004 36  10580189.86100467
2012  2 "3M"                2037856.5 2037856.5         .   6 1975 2004 37 10967562.283051893
2013  2 "3M"                1823603.5 1823603.5         .   6 1975 2004 38  9124157.813647183
2014  2 "3M"                  1785749   1785749         .   4 1975 2004 39  7133930.997182059
2015  2 "3M"                  1356981   1356981         .   4 1975 2004 40  5427924.711631365
2004  3 "A-A SPECTRUM"       53702.81  60055.78  1608.419  46 1995 2004  9 2464566.2638161103
2005  3 "A-A SPECTRUM"       81622.65  117128.7 4692.8643  57 1995 2004 10   4406664.46751826
2006  3 "A-A SPECTRUM"       67388.88  85411.82  2806.715  55 1995 2004 11  1843225.499704768
2007  3 "A-A SPECTRUM"       11123.49 12000.184  3759.274  47 1995 2004 12  476002.8259390887
2008  3 "A-A SPECTRUM"       7850.393  8111.578   3279.65  37 1995 2004 13  160865.5648153848
2009  3 "A-A SPECTRUM"       6033.619  6586.994 2021.6517  33 1995 2004 14   82650.9899064962
2010  3 "A-A SPECTRUM"      4410.1416  6201.939 1065.4524  43 1995 2004 15 126687.58501765266
2011  3 "A-A SPECTRUM"      2412.5247 2653.7156   482.997  27 1995 2004 16  35362.49685707843
2012  3 "A-A SPECTRUM"       1075.685   1263.63 261.25616  16 1995 2004 17 16035.562916347304
2013  3 "A-A SPECTRUM"        3456.24  3290.421 4340.6104  19 1995 2004 18 22272.424034809985
2014  3 "A-A SPECTRUM"      1496.9843 1496.9843         .  15 1995 2004 19  21090.64112449751
2015  3 "A-A SPECTRUM"       4020.239  4886.911 120.21213  11 1995 2004 20  23837.82400447101
2004  4 "A-S MEDICATION"     56691.86  63403.79 17030.424 152 2000 2004  4  8398246.263306491
2005  4 "A-S MEDICATION"     78010.12  90833.17 12908.477 158 2000 2004  5  8251045.135936161
2006  4 "A-S MEDICATION"     94482.98 105549.44 18526.848 173 2000 2004  6  9847010.316997783
2007  4 "A-S MEDICATION"    126469.87 141906.94  54159.39 216 2000 2004  7 13503137.133153718
2008  4 "A-S MEDICATION"     78780.68  90365.45  8950.199 253 2000 2004  8 13768126.595234673
2009  4 "A-S MEDICATION"    73256.164  80078.37  8707.616 272 2000 2004  9  13834060.86932386
2010  4 "A-S MEDICATION"     62012.55  64500.59 18223.094 279 2000 2004 10 12531817.194234656
2011  4 "A-S MEDICATION"      70322.5  72763.29  3689.045 283 2000 2004 11 12433251.178176137
2012  4 "A-S MEDICATION"     64410.49  65984.35   38950.9 292 2000 2004 12 11206517.411851741
2013  4 "A-S MEDICATION"        79104  84296.03  7899.004 309 2000 2004 13 14594676.742348071
2014  4 "A-S MEDICATION"    119126.19  129064.8   4067.63 327 2000 2004 14 26481277.906628065
2015  4 "A-S MEDICATION"     176931.8 179580.45  56984.34 324 2000 2004 15  29946995.67406721
2007  5 "A.J. BART, INC."   157.76617         . 157.76617   1 2007 2007  0   39.4415442669969
2004  6 "AAIPHARMA"          28066686  28066686         .   1 1975 2004 29  28066686.19154623
2005  6 "AAIPHARMA"           5704382   5704382         .   1 1975 2004 30  5704381.338495776
2006  6 "AAIPHARMA"          496491.9  496491.9         .   1 1975 2004 31 496491.87314146105
2007  6 "AAIPHARMA"          9421.785  9421.785         .   1 1975 2004 32  9421.785319502253
2008  6 "AAIPHARMA"          1577.368  1577.368         .   1 1975 2004 33  394.3420099042942
2009  6 "AAIPHARMA"          5520.829  5520.829         .   1 1975 2004 34 1380.2071011962253
2007  7 "AARON INDUSTRIES"   35459.79         .  35459.79   1 2007 2007  0  35459.78814655074
2008  7 "AARON INDUSTRIES"   42076.49  42076.49         .   1 2007 2007  1  42076.49109553711
2009  7 "AARON INDUSTRIES"   381988.5  381988.5         .   1 2007 2007  2   381988.536163157
2010  7 "AARON INDUSTRIES"  1804061.4 1804061.4         .   1 2007 2007  3  451015.3511721381
2011  7 "AARON INDUSTRIES"   270585.6  270585.6         .   1 2007 2007  4 270585.60116440145
2012  7 "AARON INDUSTRIES"    1318229   1318229         .   1 2007 2007  5 329557.26253078564
2013  7 "AARON INDUSTRIES"  1220001.8 1220001.8         .   1 2007 2007  6  305000.4474947418
2014  7 "AARON INDUSTRIES"   693862.8  693862.8         .   1 2007 2007  7 173465.68589452517
2015  7 "AARON INDUSTRIES"  31625.557 31625.557         .   1 2007 2007  8  31625.55690904084
2004  8 "ABBOTT"             11277370  11535204  104568.8 133 1961 2004 43  1499855855.865098
2005  8 "ABBOTT"             18729456  19310292 578339.75 129 1961 2004 44 1689540069.7170644
2006  8 "ABBOTT"             18472182  18746220   8880865 108 1961 2004 45  1598121054.119387
2007  8 "ABBOTT"             32214094  32569138  25255262 103 1961 2004 46  1656992400.409758
2008  8 "ABBOTT"             36879372  38127580   7234408  99 1961 2004 47   2142910992.52344
2009  8 "ABBOTT"             34457748  34848544  67797.15  89 1961 2004 48 2201020370.9059067
2010  8 "ABBOTT"             35879268  37502196   4232198  82 1961 2004 49  2181284539.324025
2011  8 "ABBOTT"             32816464  34031788  2651.776  84 1961 2004 50 2029940306.3960855
2012  8 "ABBOTT"             27333772  31581784  865396.8  94 1961 2004 51 2225894374.1270456
2013  8 "ABBOTT"             46580236  48500412  976053.5  99 1961 2004 52  2727305730.021949
2014  8 "ABBOTT"             33949468  35378588  7886.491  99 1961 2004 53 2359259573.3842134
2015  8 "ABBOTT"             29031148  29661780  22129.54  94 1961 2004 54  2370530389.480409
2004  9 "ABBVIE"            300541632 300541632         .  33 1963 2004 41   9917874207.75519
2005  9 "ABBVIE"            398063520 398063520         .  33 1963 2004 42 10607521709.816193
2006  9 "ABBVIE"            711789248 711789248         .  32 1963 2004 43 11125455014.939121
2007  9 "ABBVIE"            485468992 485468992         .  32 1963 2004 44 10999908825.891459
2008  9 "ABBVIE"            471612384 499406368  26908488  34 1963 2004 45 11020835162.138296
2009  9 "ABBVIE"            619878528 619878528         .  34 1963 2004 46 11052308521.725805
2010  9 "ABBVIE"            775569536 775569536         .  34 1963 2004 47 11844314656.874342
2011  9 "ABBVIE"            471647968 471647968         .  34 1963 2004 48  12248782762.78322
2012  9 "ABBVIE"            423355936 423355936         .  33 1963 2004 49 12198970579.756456
2013  9 "ABBVIE"            670351680 691815744  47894516  30 1963 2004 50 11978041077.652853
2014  9 "ABBVIE"            574183808 630412352  11898461  33 1963 2004 51 13100594403.908373
2015  9 "ABBVIE"            532608064 548018368  24068394  34 1963 2004 52 16695333854.076557
2004 10 "ABER PHARM"        10196.007 10196.007         .   2 2003 2004  1 20392.013641854945
2008 11 "ABKIT"              830263.1         .  830263.1   2 2008 2008  0 1020956.3147491836
2009 11 "ABKIT"             1215518.4 1215518.4         .   2 2008 2008  1  2431036.790986024
2010 11 "ABKIT"             302000.13 302000.13         .   2 2008 2008  2  514778.3255435499
2011 11 "ABKIT"             133911.64 133911.64         .   2 2008 2008  3  66955.82008718849
2012 11 "ABKIT"             1857.2793 1857.2793         .   1 2008 2008  4 1857.2792628005145
2013 11 "ABKIT"             229202.33 229202.33         .   1 2008 2008  5  57300.58370722538
2014 11 "ABKIT"             124664.38 124664.38         .   1 2008 2008  6  31166.09542619379
2012 12 "ABL MEDICAL"        5597.375         .  5597.375   1 2012 2012  0 1399.3437130821144
2013 12 "ABL MEDICAL"        46875.85  46875.85         .   1 2012 2012  1 46875.852646417305
2014 12 "ABL MEDICAL"        92413.03  92413.03         .   1 2012 2012  2  92413.03425338621
2015 12 "ABL MEDICAL"       120786.42 120786.42         .   1 2012 2012  3 120786.41920184325
2004 13 "ACCENTIA PHARM"      3551802   3551802         .   1 2002 2004  2 3551802.0606248565
2005 13 "ACCENTIA PHARM"      5632205   5632205         .   1 2002 2004  3  5632204.502071029
2006 13 "ACCENTIA PHARM"      3682722   6138993 1226450.6   2 2002 2004  4 6445605.7851592805
2007 13 "ACCENTIA PHARM"     11139598  11139598         .   2 2002 2004  5  5607216.736637451
end

the exponential model, using the NLS, poisson and gamma QML estimators

Dear all,

I have a question about the estimation of the exponential model, related to this thread:
https://www.statalist.org/forums/for...sion-nl-vs-reg

The recommendation was to use the Poisson (or gamma) quasi-MLEs. If I got it right, this is because of efficiency gains when estimating the parameters.

The issue is that my model is not fully multiplicative; it has a sum. There is an extra parameter "s", which means I cannot just add the independent variables. A simplified version of my structural equation would be:

Y = [ X1^(s-1) X2^(s) + X3^(s-1) X4^(s) ]^b1 [ X1^(-s) X2^(s) + X3^(-s) X4^(s) ]^b2

I could try calibrating "s" (then I can compute the sums) in order to use the Poisson or gamma quasi-MLEs. However, it is not clear this is a superior approach just because there is a gain in efficiency, and I would lose "s". I had a look at the commands -ppml- and -glm-, and they cannot fit this equation.

Is there still an alternative to NLS to estimate my equation? Any feedback would be most welcomed.
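For concreteness, a minimal -nl- sketch of the simplified equation that keeps s free (this is still NLS rather than a quasi-MLE, but it avoids calibrating s; starting values are arbitrary):

Code:
nl (Y = ( X1^({s=0.5}-1)*X2^{s} + X3^({s}-1)*X4^{s} )^{b1=1} * ///
        ( X1^(-{s})*X2^{s} + X3^(-{s})*X4^{s} )^{b2=1} )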

Best,

Paulo

PS: A less simplified version of my equation is

Y_i = [ (p_ij)^(s-1) (q_j)^(s) + (p_ik)^(s-1) (q_k)^(s) ]^b1 [ (p_ij)^(-s) (q_j)^(s) + (p_ik)^(-s) (q_k)^(s) ]^b2

where i, j and k index the three individuals in my model. I think renaming these variables as X1-X4 does not change anything, but just in case.

Thursday, November 29, 2018

Sampling using sample command

Hello Everyone,

I have a dataset which has data for 100 households from each of 3 cities. Further, 5 members from each household are listed in the dataset. So in total, I have 1500 (3*100*5) observations. The household members are divided into 20 groups based on certain characteristics (each member is assigned a group number between 1 and 20).

Let's call the variables city, household, member, and group.

I want to select (using sample command or any other efficient method) 20 members from each city (one from each group). My condition is that only one member can be selected from each household.

When I run the following command:

Code:
bysort city group: sample 1, count

I get one member sampled from each group within each city, but this command selects (in some cases) more than one member from the same household.
What I want is: if one member is selected from some group in household1, then no other member in that particular city should be sampled from household1, and one member from each group should still be selected from each city.

Kindly advise how I can achieve this.
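A sketch of a greedy draw that respects both constraints (assuming city, group, and household are numeric); it is not guaranteed to fill all 20 groups for an unlucky ordering, so re-run with a new seed if a group comes up empty:

Code:
set seed 12345
gen double u = runiform()
sort city group u
gen byte pick = 0
quietly forvalues i = 1/`=_N' {
    * take this member only if the (city, group) cell has no pick yet
    * and the (city, household) pair has not been used
    count if pick & city == city[`i'] & group == group[`i']
    local filled = r(N)
    count if pick & city == city[`i'] & household == household[`i']
    local used = r(N)
    if `filled' == 0 & `used' == 0 replace pick = 1 in `i'
}
keep if pick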

Thank you!

Amit

Incidence rate

Dear All,
I used survival analysis to estimate the incidence of mother-to-child transmission of HIV, using the command "stptime" after setting the "stset". I would like to know how to write about this calculation in the methodology section of my paper. Please share if anyone has already written about this in any research paper.

thanks and regards,

Rajaram S

Inquiry about uniform random-number generation

The textbook (microeconometrics using stata) said,
For reproducibility of results, however, it is best to actually set the initial seed by using
Code:
set seed
Then if the program is rerun at a later time or by a different researcher, the same results will be obtained.
However,
Code:
set seed 10101
scalar u = runiform()
display u
My displayed value is different from the textbook's.
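One likely cause, sketched below: the default random-number generator changed in Stata 14 (from the 32-bit KISS to the 64-bit Mersenne Twister), so the same seed yields different draws across versions. In Stata 14 or newer, the older generator can be selected to match older textbook output:

Code:
set rng kiss32
set seed 10101
scalar u = runiform()
display u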

Returns to education regression

Hi everyone, I am new to Stata and have a question about a returns-to-education regression I would like to run. My goal is to find out whether returns to education differ for U.S. citizens vs. non-citizens. This is the functional form I think would be most appropriate: log of wages = B0 + B1 years of education + B2 male + B3 experience + B4 experience^2 + B5 citizenship. What are your thoughts? Do you recommend I add or subtract anything? Also, I was thinking about adding age and age^2 terms, but since I calculate experience as (age - years of education - 6), I thought that would cause collinearity between age and experience. I would be appreciative of any feedback.
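For reference, a sketch of the specification with hypothetical variable names; note that for returns to *differ* by citizenship, an education-by-citizenship interaction is needed rather than a citizenship intercept alone:

Code:
gen exper  = age - educ - 6
gen exper2 = exper^2
gen lwage  = ln(wage)
reg lwage c.educ##i.citizen i.male exper exper2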

URGENT help needed with university data analysis coursework

Here's the deal. I am an absolute noob when it comes to using Stata, having used it only once in my life. My coursework is due very soon and I really need help with the questions. It requires me to include each Stata command that would be used to answer each question, and an explanation of how I managed to answer it.

I would be extremely grateful if anyone with even the slightest knowledge of using Stata could contribute to this post and help me out!

The questions are attached!


How to analyze intergroup interaction in subgroup analysis of meta-analysis (metan, network)

Hi,

I would like to conduct a subgroup analysis of meta-analysis and check intergroup interaction.

I'm using metan or network for the analysis.

Does anyone know if we have an appropriate command or option to get the results of the interaction of subgroup analysis?

I'm using Stata 15.
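A sketch with -metan- (SSC) and hypothetical variable names; the by() option produces subgroup estimates, and recent versions also report a between-subgroup heterogeneity (interaction) test:

Code:
metan logeffect selogeffect, by(subgroup) random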

Thank you

Yoshibobu Kondo

Heckman model

Dear Stata Users,

I am applying the two-step Heckman model, but I am having a problem.

After completing the Heckman model, as a further check I tried to fit the probit model (the selection equation of the Heckman procedure) on its own.

However, even though I use exactly the same variables, the coefficients and p-values of some independent variables are completely different from those obtained with Heckman.

I have used Heckman in the past, but it is the first time I have met this problem.
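A sketch of the check (hypothetical names): with the twostep option the selection equation is an ordinary probit on the full sample, so it should coincide with a standalone -probit-; under the default full maximum likelihood the two equations are estimated jointly and their coefficients can legitimately differ:

Code:
heckman y x1 x2, select(s = z1 z2 x1) twostep
probit s z1 z2 x1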

If someone can help, I will strongly appreciate

thanks in advance

Nicola



statistical significance

Hi, how can I check the statistical significance of the relationship between price and country of origin in Stata's auto dataset?
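A sketch using the foreign (Domestic/Foreign) indicator, which is the origin variable in auto.dta:

Code:
sysuse auto, clear
ttest price, by(foreign)
* or, equivalently, as a regression:
reg price i.foreign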

How to test correlation between two variables – Panel data

Hi all!

I wonder how it is possible to test for correlation between two variables in panel data. What is the usual practice in this case? Do I run a regular panel regression between these two variables, or something else?
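A sketch of two common starting points (hypothetical variable names; the second assumes the data are already xtset):

Code:
pwcorr x y, sig
xtreg y x, fe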


Thanks!

xtoprobit vs xtreg, fe

I want to run a fixed-effects regression model in Stata using panel data to examine the change in individuals' responses over time. My dependent variable is ordinal, so I was planning on using xtoprobit; however, I realize this would be using random effects.

Should I use xtoprobit even though it uses random effects? Or would it be best to run a linear regression with fixed effects (xtreg, fe)? Thank you

Add Confidence intervals to median spline

I would like to add 95% confidence intervals to a median spline. This does not appear to be an option of -mspline-, and the other confidence-interval shading options seem to be tied to a regression fit.

for example,

Code:
use http://www.stata-press.com/data/r13/auto, clear
tw (qfitci mpg weight, nofit fintensity(10)) (scatter mpg weight, msize(*.5)) (mspline mpg weight)
Is there a way to have the shading determined by the spline as opposed to the quadratic regression?
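One possible workaround, sketched below: -twoway lpolyci- draws a confidence band around a local-polynomial smooth, which follows the data much like a median spline does:

Code:
use http://www.stata-press.com/data/r13/auto, clear
tw (lpolyci mpg weight) (scatter mpg weight, msize(*.5))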


Using maps to present incidence rates

Hi everyone

I have calculated incidence rates (per 100,000 person-years) for disease X by region in Norway and want to present them on a map using different colours. I know that this is possible but have no clue how to do it.
Any suggestion is welcomed.
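A sketch using the community-contributed -spmap- and -shp2dta- (both SSC); the shapefile and dataset names are hypothetical:

Code:
ssc install spmap
ssc install shp2dta
shp2dta using norway_regions.shp, database(nodb) coordinates(nocoord) genid(id)
use nodb, clear
merge 1:1 id using rates.dta, nogenerate   // rates.dta holds variable rate
spmap rate using nocoord.dta, id(id) fcolor(Blues) clnumber(5)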


Gerhard

Test: latex

If there are \(\pm 1\%\)

Hausman test

Hey, when running the Hausman test, should I include all variables? So dependent, independent, moderator (how? with #?), and control variables?
Do I need to define them somehow?
Because if I run the test like that, I get the note: the rank of the differenced variance matrix (4) does not equal the number of coefficients being tested (5); be sure this is what you expect, or there may be problems computing the test. Examine the output of your estimators for anything unexpected and possibly consider scaling your variables so that the coefficients are on a similar scale.
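For reference, a sketch of the usual pattern with hypothetical names (the same full specification, interaction included via #, goes into both models):

Code:
xtreg y x1 c.x2#c.x3 x4, fe
estimates store fixed
xtreg y x1 c.x2#c.x3 x4, re
estimates store random
hausman fixed random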

Thank you!

Use e(selected0) of Lasso Regression in another regression

Hello,

I ran a Lasso Regression and found the best regressors:

rlasso a b c d e f

Assume b, c, and d are selected as the best regressors. I can access the list with di e(selected0). My question is how I can run another regression using those selected regressors.

reg a e(selected0) returns an error.
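
A sketch of the likely fix: e(selected0) is a stored result, so it has to be expanded as a macro before regress sees the variable names.

Code:
rlasso a b c d e f
local sel `e(selected0)'
reg a `sel'
* or inline: reg a `e(selected0)'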

Thanks.

Sam

Extract Year from Date (11985 - 122018)

Hello, I am working with the BRFS data. I am interested in using a variable - Recent HIV testing.


The variable is recorded in a combined month-and-year format, as in 11985 (January 1985) and 122018 (December 2018).

I want to create a new variable that just records the years from 1985 thru 2018.

I tried the todate command:

Code:
todate HIVTSTD3, gen(hivtestdate) p(mmyyyy)
But this only gives me a monthly date such as 1983m12, not the year alone.
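
A sketch of two routes, assuming the values really are a month followed by a four-digit year: the year is the last four digits, so mod() strips the month, or the year can be pulled from the monthly date that todate produced.

Code:
gen hivtestyear = mod(HIVTSTD3, 10000)        // 11985 -> 1985, 122018 -> 2018
gen hivtestyear2 = year(dofm(hivtestdate))    // from the todate result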

I will appreciate any help to accomplish this.


thanks - cY

Inconsistency of data in national survey

Good morning. I am dealing with an inconsistency between the data I generated and the figures in the official report, broken down by region. My goal is to present a map by department, but my results differ from the report's by about +/- 1%, and consequently the intensity classes on my map differ from the standard used in the report. How should this data inconsistency be handled?

Sorry for writing this point here, and I appreciate your response.

Generating data to balance a panel dataset

Hi,

I have a panel from Compustat like the below in Table 1 (table 1 is a subset of the data to show 3 different example issues), where gvkey is the firm-specific identifier, fyear is the reporting year, emp is employment, and dlrsn is the reason the firm dropped out of the dataset.

Table 1
gvkey fyear emp dlrsn
001 1996 2 02
001 1997 3 02
001 1998 2 02
001 1999 1 02
002 1996 4 06
002 1997 5 06
002 1998 4 06
002 1999 3 06
002 2000 3 06
002 2001 3 06
002 2002 3 06
002 2003 3 06
003 1996 7 .
003 1997 8 .
. . . .
. . . .
. . . .
003 2016 14 .

I need employment data for each firm all the way up to and including 2016 (as shown in Table 2). However, many firms drop out of the dataset (e.g., because of bankruptcy). For such firms, I want to generate employment numbers for all years from the last date they reported, going up to and including 2016, using the following methodology:
  1. If dlrsn is 02 or 03, set the employment number to zero from the first year after the last reporting year up to and including 2016. For example, in Table 2, firm 001 reports up to 1999; I would like to generate data that has fyears 2000-2016 with employment set to 0, because dlrsn is 02.
  2. If dlrsn is 01, 04, 05, 06, 07, 09, 10, or 20, use the last reported employment number for all years after the last reported year. For example, in Table 2, firm 002 reports up to 2003; I would like to generate data that has fyears 2004-2016 with employment set equal to the last available employment number (i.e., 4), because dlrsn is 06.
  3. If the firm does not drop out of the dataset, nothing should change.
Table 2
gvkey fyear emp dlrsn
001 1996 2 02
001 1997 3 02
001 1998 2 02
001 1999 1 02
001* 2000* 0* 02*
001* 2001* 0* 02*
001* 2002* 0* 02*
. . . .
. . . .
. . . .
001 2016 0* 02
002 1996 4 06
002 1997 5 06
002 1998 4 06
002 1999 3 06
002 2000 3 06
002 2001 2 06
002 2002 3 06
002 2003 4 06
002* 2004* 4* 06*
002* 2005* 4* 06*
. . . .
. . . .
. . . .
002* 2016* 4* 06*
003 1996 7 .
003 1997 8 .
. . . .
. . . .
. . . .
003 2016 14 .
Essentially, I want to get from Table 1 to Table 2 and would very much appreciate any advice (note there are thousands of firms with different dates; the above is just an example). I tried the following code but got a bit stuck:

gen dldteyear = year(dldte)                    // dldte assumed to be a daily deletion date
bysort gvkey: egen lastdate = max(fyear)
gen yeardiff = 2017 - lastdate                 // copies needed to reach 2016, incl. the original row
expand yeardiff if fyear == lastdate


The code was able to duplicate the last reported date in Table 1 the correct number of times, but then I was stuck on the next step. I was thinking of replacing the years, because they would have to be made consecutive, and then replacing employment, but this started to get messy. I am sure there is probably a better approach than the one I am taking, which seems rather mechanical.
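
A hedged sketch of one way to get from Table 1 to Table 2, assuming dlrsn is stored as a string (use numeric codes in inlist() if it is numeric) and at most one row per gvkey-fyear:

Code:
bysort gvkey (fyear): egen lastyear = max(fyear)
* copy the final row enough times to cover lastyear through 2016
expand 2017 - fyear if fyear == lastyear & fyear < 2016 & !missing(dlrsn), gen(filled)
* turn the copies into consecutive years
bysort gvkey (fyear filled): replace fyear = fyear[_n-1] + 1 if filled
* rule 1: exit reasons 02/03 get zero employment in the generated years
replace emp = 0 if filled & inlist(dlrsn, "02", "03")
* rule 2: other exit reasons keep the last reported emp, already copied by expand
drop lastyear filled

The replace step relies on replace processing observations in sort order, so each generated row takes the year of the row just above it plus one.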

Thanks in advance for all support.


Best,
Ali


Getting different results from running same exact code - why?

I am dropping observations based on various if statements written in a do file. I have all the code in the do file highlighted and run it. All I do is "clear" and "log close" in the command window, and hit run again without touching the highlighted code or even reselecting the code. I've run the code 10 times in a row this way, and sometimes I get n=1620 and sometimes I get n=1621. No code has changed between each time I run it, so I am positive that Stata is producing different results when given the exact same input. Why is this happening?
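
For what it's worth, one common cause is an unstable sort. Stata deliberately randomizes the order of tied observations when you sort on a key that does not uniquely order the data, so any subsequent duplicate-dropping or row selection can act on different rows each run; sorting on a unique key (or using sort's stable option) restores reproducibility. A hypothetical illustration on a shipped dataset, not the poster's code:

Code:
sysuse auto, clear
bysort rep78: keep if _n == 1   // which tied row survives is not reproducible
drop if price > 6000            // a later condition then bites differently
count                           // so the count can change from run to run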


Running Diebold Mariano test using panel data.

Hi everyone,

I am totally new here and very new to Stata; nice to meet you all. This may be a very simple question, and I am sorry if it makes you feel you are wasting your time; your help would be very much appreciated. I have panel data with 320 financial products and 20 quarters of data. There are two sets of pricing predictions and the real price trend for each product. The goal of the test is to see which pricing function predicted the real price better in sample. As far as I know, the DM test can only be run on time-series data, so I tried to run the test separately for each product as a time series, using a loop and the statsby statement. The code is like the following; Stata says there are repeated time values in the sample, but I double-checked that there are no duplicated times within each id.

Code:
forvalues i = 1(1)320 {
if id == `i' {
tsset quarter
dmariano ln_price ln_VC ln_VT
}
}
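
A hedged sketch of a repair: the command-level if above examines only the first observation, and tsset quarter fails on the full panel because many products share each quarter. Restricting the data to one product at a time avoids both problems (dmariano is the community-contributed Diebold-Mariano command).

Code:
forvalues i = 1/320 {
    preserve
    keep if id == `i'
    tsset quarter
    dmariano ln_price ln_VC ln_VT
    restore
}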

Can someone point me in the right direction? Thanks.


Boheng


Latent class and standard errors

Dear Stata users,
I am running a latent class analysis with the gsem command. I had some convergence problems, reported in a previous post, but I've solved them.
Now my problem is the following: my model has 6 classes, 11 variables, and 6 covariates in the membership functions. Everything works fine and the model converges quite quickly. The only problem is that for one class, and only for one variable of the membership function (a very important dummy equal to 1 if the person is retired), the standard error is missing. The coefficient is -702.55, very large in absolute terms; I suppose this means that for this item the probability is very close to 0.
In this situation, are the values of all the other parameters reliable or not? I mean, can I still use my results?
Thank you

hybrid model pseudo-panel.

hello!

this might sound like a silly question, but I was wondering whether one can fit a hybrid model (xthybrid) when dealing with a pooled cross-section. Specifically, I have individuals within countries over three yearly waves (the individuals are not the same across waves).
I can run an FE model using dummies for countries and years, but would like to do a bit more.

I tried using
Code:
 xtset country year
which leads nowhere, because country-year pairs do not uniquely identify observations in a repeated cross-section.
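
A hedged sketch, assuming xthybrid (SSC) and hypothetical variable names: xthybrid takes the cluster variable directly through its clusterid() option, so the failing xtset is not needed for a repeated cross-section clustered in countries.

Code:
ssc install xthybrid
xthybrid y x1 x2, clusterid(country)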

Dynamic panel data model with xtabond2/ivreg2/Difference-in-Hansen test

Dear all,
I am using an unbalanced panel data with T=11 and N=200,000 to estimate a dynamic panel data model with the xtabond2 command (with Stata 14.1).

So far, I have assumed that the lag of my dependent variable (L.y) is endogenous and all of my other control variables are exogenous. When I estimate the model using xtabond2 and a two-step system GMM with the underlying assumptions, the coefficients are alright but the AR(2)-test is pretty weak and the Hansen-test doesn’t show the desired result. I dropped the first two years manually and I used the suboption orthogonal since I have gaps in my data.
Do you think that my specification in xtabond2 is correct, or did I make some mistakes that I didn't recognize so far?
Code:
xtabond2 y L.y x1 x2 x3 x4 x5 x6 x7 yr3-yr11, twostep robust orthogonal ///
> gmm(L.y, lag(2 6) equation(both)) ///
> iv(x1 x2 x2 x4 x5 x6 x7, equation(both)) ///
> iv(yr3-yr11, equation(level))
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
             |              Corrected
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .9530541   .0045813   208.03   0.000     .9440749    .9620332
             |
          x1 |   .0009905   .0030752     0.32   0.747    -.0050369    .0070178
          x2 |   -.001352   .0025792    -0.52   0.600    -.0064072    .0037032
          x3 |   .0043922   .0084974     0.52   0.605    -.0122625    .0210468
          x4 |  -.0003227   .0000352    -9.18   0.000    -.0003917   -.0002538
          x5 |  -.0012368   .0007791    -1.59   0.112    -.0027639    .0002903
          x6 |   .0751949   .0200101     3.76   0.000     .0359758    .1144139
          x7 |  -.0035094   .0006449    -5.44   0.000    -.0047734   -.0022453
         yr3 |   .3537909   .0098877    35.78   0.000     .3344114    .3731705
         yr4 |   .3586882   .0052821    67.91   0.000     .3483354     .369041
         yr5 |   .1584298   .0051346    30.86   0.000     .1483661    .1684935
         yr6 |   .1583574   .0041391    38.26   0.000     .1502449    .1664698
         yr7 |   .1061737   .0038656    27.47   0.000     .0985972    .1137502
         yr8 |   .1504396   .0037814    39.78   0.000     .1430281     .157851
         yr9 |   .1798849   .0034038    52.85   0.000     .1732137    .1865561
       yr10 |   .1689228   .0034978    48.29   0.000     .1620672    .1757784
       yr11 |    .0115624   .0032083     3.60   0.000     .0052743    .0178506
    _cons |    .1823155   .0733113     2.49   0.013      .038628    .3260029
 ------------------------------------------------------------------------------
Instruments for orthogonal deviations equation
  Standard
    FOD.(x1 x2 x2 x4 x5 x6 x7)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/6).L.y
Instruments for levels equation
  Standard
    yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 y11
    x1 x2 x2 x4 x5 x6 x7
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.L.y
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -58.80  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -1.57  Pr > z =  0.117
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(36)   = 313.40  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(36)   = 292.69  Prob > chi2 =  0.000
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(28)   = 277.21  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(8)    =  15.48  Prob > chi2 =  0.050
  iv(x1 x2 x2 x4 x5 x6 x7)
    Hansen test excluding group:     chi2(30)   = 275.27  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(6)    =  17.42  Prob > chi2 =  0.008
  iv(yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11, eq(level))
    Hansen test excluding group:     chi2(27)   = 188.63  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(9)    = 104.06  Prob > chi2 =  0.000
When I specify my iv’s separately, I get similar results for the coefficients but different results for the Difference-in-Hansen test:
Code:
Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(29)   = 279.04  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(8)    =  14.31  Prob > chi2 =  0.074
  iv(x1)
    Hansen test excluding group:     chi2(36)   = 292.79  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(1)    =   0.56  Prob > chi2 =  0.454
  iv(x2)
    Hansen test excluding group:     chi2(36)   = 292.54  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(1)    =   0.81  Prob > chi2 =  0.368
  iv(x3)
    Hansen test excluding group:     chi2(36)   = 292.69  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(1)    =   0.66  Prob > chi2 =  0.417
  iv(x4)
    Hansen test excluding group:     chi2(36)   = 293.14  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(1)    =   0.21  Prob > chi2 =  0.643
  iv(x5)
    Hansen test excluding group:     chi2(36)   = 293.26  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(1)    =   0.09  Prob > chi2 =  0.766
  iv(x6)
    Hansen test excluding group:     chi2(36)   = 293.26  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(1)    =   0.10  Prob > chi2 =  0.757
  iv(x7)
    Hansen test excluding group:     chi2(36)   = 293.35  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(1)    =   0.00  Prob > chi2 =  0.982
  iv(yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11, eq(level))
    Hansen test excluding group:     chi2(28)   = 188.71  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(9)    = 104.64  Prob > chi2 =  0.000
I am struggling with the correct interpretation of the Difference-in-Hansen test since the 'Hansen test excluding group' is always highly rejected whereas the 'Difference' test is not.

I am also not sure if all of my controls are really exogenous or if they're predetermined or even endogenous.
Can I use the ivreg2 command to test if my controls are exogenous? So for example:
Code:
* ivreg2's endogeneity-test option is endog(), not endogtest()
xi: ivreg2 y L.y x2 x3 x4 x5 x6 x7 (x1=L.x1) i.yr, gmm2s robust cluster(ID) endog(x1)
Do you have any suggestions how I could fix my regression command so that the AR(2) and Hansen test show the desired results?
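
As an aside, the iv() list in the command above types x2 twice and omits x3 (the instrument listing echoes this), which is worth correcting first. Beyond that, a hedged sketch of one commonly suggested adjustment rather than a definitive fix: collapse the GMM instrument matrix to curb instrument proliferation, and treat the controls as predetermined rather than strictly exogenous.

Code:
xtabond2 y L.y x1 x2 x3 x4 x5 x6 x7 yr3-yr11, twostep robust orthogonal ///
    gmm(L.y, lag(2 6) collapse) ///
    gmm(x1 x2 x3 x4 x5 x6 x7, lag(1 6) collapse) ///
    iv(yr3-yr11, equation(level))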
Thanks a lot in advance for your help, any help is highly appreciated.

Kind regards,
Ferdi

How to create a time dimension in one cross-sectional dataset with individual ID, more than one job, each with a starting and ending year

How can I create a time dimension in a single cross-sectional dataset in which each individual ID has more than one job, each job with a starting year and an ending year, and jobs possibly overlapping in years?
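
A hypothetical sketch, assuming one row per id-job with variables startyr and endyr: expanding to one row per person-job-year creates the time dimension, and overlapping jobs simply yield several rows for the same person-year.

Code:
expand endyr - startyr + 1
bysort id job: gen year = startyr + _n - 1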

Why would I have different numbers of cases between models with MI'd data (possibly because of collinear dependencies of one variable)?


Hi Statalist,
I am running three logistic models with the same DV with Stata 12.
Model 1 has all variables but the control variables.
Model 2 has all variables.
Model 3 has only the significant variables (only 3).

The regression results say models 1 and 2 have 145 observations, but model 3 has 149 observations (which are all my cases). I wish to know why.
I did multiple imputation on all variables with missing data, so I do not think listwise deletion due to missing data should be the issue.
But one thing I notice that differs between models 1 and 2 on the one hand and model 3 on the other is that model 3 does not include one binary variable that, in the model 1 and 2 output, gets a logistic coefficient of 0 and an odds ratio of 1 with both SEs omitted. I know this indicates that the variable is collinear with another and so is dropped, but I wish to know why this would reduce my number of cases, and whether there is anything I can do about it.
Or maybe I have a different number of cases between the models for another reason?
I know that it is best practice to have the same number of observations in all models, so it would be great to get advice on how to resolve this.

I provide my output below:
model 1
. mi estimate, or: logistic passportdenied ethnicmin foreign intervention democracy social religion independence

Multiple-imputation estimates Imputations = 40
Logistic regression Number of obs = 145
Average RVI = 0.0000
Largest FMI = 0.0000
DF adjustment: Large sample DF: min = 1.68e+67
avg = 1.68e+67
max = .
Model F test: Equal FMI F( 6, 1.2e+69)= 1.15
Within VCE type: OIM Prob > F = 0.3295

--------------------------------------------------------------------------------
passportdenied | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
ethnicmin | 2.262649 2.020946 0.91 0.361 .3929555 13.02839
foreign | .5196286 .353581 -0.96 0.336 .1369284 1.971935
intervention | 4.023013 2.457557 2.28 0.023 1.214994 13.32076
democracy | 1.264409 1.129343 0.26 0.793 .2195902 7.280522
social | .7929084 .5783657 -0.32 0.750 .1898178 3.312143
religion | 1.325323 1.68382 0.22 0.825 .109868 15.98718
independence | 1 (omitted)
_cons | .0730259 .0703768 -2.72 0.007 .0110447 .4828366
--------------------------------------------------------------------------------

. *model 2 full model: including control variables + variables of interest
. mi estimate, or: logistic passportdenied bardate numdetained ageatbar male ethnicmin educyear foreign democracy social religion independence

Multiple-imputation estimates Imputations = 40
Logistic regression Number of obs = 145
Average RVI = 0.0447
Largest FMI = 0.1887
DF adjustment: Large sample DF: min = 1111.90
avg = 301944.31
max = 1676960.32
Model F test: Equal FMI F( 10,183109.0)= 0.72
Within VCE type: OIM Prob > F = 0.7038

--------------------------------------------------------------------------------
passportdenied | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
bardate | .9810666 .024959 -0.75 0.452 .9333471 1.031226
numdetained | 1.253062 .4282628 0.66 0.509 .6412875 2.448456
ageatbar | .9462083 .0273604 -1.91 0.056 .894019 1.001444
male | .9228291 .7386526 -0.10 0.920 .1922208 4.430391
ethnicmin | 2.227631 2.148039 0.83 0.406 .3365465 14.74489
educyear | 1.046962 .0832463 0.58 0.564 .8957871 1.223648
foreign | .9347841 .5921365 -0.11 0.915 .2700964 3.235219
democracy | 2.196149 2.257036 0.77 0.444 .2929911 16.4615
social | 1.096905 .8634727 0.12 0.906 .2344802 5.131356
religion | 2.74065 3.729275 0.74 0.459 .1903649 39.45665
independence | 1 (omitted)
_cons | 6.38e+15 3.23e+17 0.72 0.473 4.24e-28 9.59e+58
--------------------------------------------------------------------------------


. *model 3, simplified best-fit model: only intervention and ageatbar
. mi estimate, or: logistic passportdenied intervention ageatbar

Multiple-imputation estimates Imputations = 40
Logistic regression Number of obs = 149
Average RVI = 0.0651
Largest FMI = 0.1566
DF adjustment: Large sample DF: min = 1612.71
avg = 421957.67
max = 1261997.47
Model F test: Equal FMI F( 2, 9397.0) = 3.87
Within VCE type: OIM Prob > F = 0.0209

--------------------------------------------------------------------------------
passportdenied | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
intervention | 3.246078 1.791845 2.13 0.033 1.100253 9.57691
ageatbar | .9524412 .0250589 -1.85 0.064 .9045364 1.002883
_cons | .4047996 .3786314 -0.97 0.334 .0646604 2.534206
--------------------------------------------

Problem of handling missing data

Hi everyone,
I am currently working on the impact of intellectual property rights on the Indian pharmaceutical industry. I have a panel data set (secondary data from CMIE) of 350 firms across 28 time periods. However, I am facing a big problem with missing data. Almost all the variables I need in the model (e.g., R&D = f(patents, exports, imported tech, etc.)) have missing data ranging from 10% to 30%. How best would you suggest I handle this problem before undertaking any analysis? Listwise deletion in Stata reduces the number of firms to 68, drastically reducing the sample size.

Is multiple imputation of data when all variables have some missing values a possibility in Stata?
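
A minimal sketch with hypothetical variable names: chained equations (mi impute chained) can impute several variables that all contain missing values, using each other as predictors.

Code:
mi set wide
mi register imputed rd pat exports imptech
mi impute chained (regress) rd pat exports imptech, add(20) rseed(12345)
mi estimate: regress rd pat exports imptech
* panel estimators additionally need mi xtset before mi estimate: xtreg ..., fe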
Thank you in advance!