I used vce(cluster) to account for clustering within 9 groups in a set of nested logistic regression models. Stata doesn't want to give me Wald chi2 stats because I have too many variables in the model in relation to # of clusters, and used up my df. Stata also said both Wald and lrtest would be misleading. So, what *wouldn't* be misleading to report to describe fit and compare fit among nested models? Are pseudo-R square, AIC, BIC, and log likelihood #s still meaningful to interpret? Or are there other stats I don't know about?
Thanks in advance!
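In case it helps while better answers come in, a minimal sketch (placeholder names y, x1-x3, and clustvar) of pieces that are still available: -estat ic- reports AIC and BIC after the clustered logit, and McFadden's pseudo-R2 can be built from e(ll) and e(ll_0). With vce(cluster) these are based on the log pseudolikelihood, so they are probably best treated as descriptive comparisons across the nested models rather than formal tests.
Code:
logit y x1 x2 x3, vce(cluster clustvar)
estat ic                                        // AIC and BIC, computed from the pseudolikelihood
display "McFadden pseudo-R2 = " 1 - e(ll)/e(ll_0)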
Friday, November 30, 2018
How do I combine observations within a dataset?
Hi all,
I appended two datasets based on ID. I was not able to merge them, because ID did not uniquely identify observations. So I ended up with a dataset that looks like the one below.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID float(date1 date2 duplicate)
1 20737     . 1
1     . 20438 2
2     . 20775 1
2 20930     . 2
2 21129     . 3
3 20796     . 1
3 21157     . 2
4     . 20873 1
4     . 20180 2
4 20858     . 3
end
format %td date1
format %td date2
I would like to modify the dataset above to look like the one below. In other words, I would like to keep the observations for IDs that have both date1 and date2, but the combinations of ID, date1, and date2 vary.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID float(date1 date2 duplicate)
1 20737 20438 1
2 20930 20775 2
2 21129 20775 3
4 20858 20873 1
4 20858 20180 2
end
format %td date1
format %td date2
ID 1 has one date1 and one date2. These just need to be combined into one observation.
ID 2 has two date1 and one date2. In this case I need to combine the observations into two distinct observations: the first with one date1 and date2, and the second with the other date1 and the same date2 again.
ID 3 only has date1, these must be dropped.
ID 4 has one date1 and two date2. Similar to ID 2, I need to combine the observations into two distinct observations: the first with date1 and one date2, and the second with the same date1 and the other date2.
Thank you very much!
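A minimal sketch of one way to get from the first dataset to the second, using -joinby-: split the date1 and date2 records into two files and then form every within-ID combination; IDs present in only one of the two files (such as ID 3) drop out automatically. Variable names follow the -dataex- example above.
Code:
preserve
keep if !missing(date2)
keep ID date2
tempfile d2
save `d2'
restore
keep if !missing(date1)
keep ID date1
joinby ID using `d2'                 // all date1 x date2 combinations within ID
sort ID date1 date2
by ID: generate duplicate = _n       // simple within-ID counter
format %td date1 date2
list, sepby(ID) noobs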
Difference between catplot and tabplot
Dear Stata users,
I have a dataset comprised of two variables: foobarx is the variable I am concerned with, and type is an indicator of the different types of foobarx. I use -catplot- and -tabplot- (both from SSC) to generate three plots, g1, g2 and g3. The g1 plot seems to be identical to the g2 plot, but what does the g3 plot mean in this case? Thank you!
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int type double foobarx
1 1 1 1.3 1 1.5 1 1.2 1 1 1 2.4 1 1.5 1 1 1 2.1 1 1.4 1 3 1 1.2 1 2 1 1 1 1 1 1.2 1 2 1 .8 1 2.1 1 .1 1 1 1 2 1 1.4 1 2 1 2 1 1.7 1 1.3 1 2 1 1.5 1 .8 1 .83 1 2.2 1 .8 1 .8 1 1.46 1 .8 1 .5 1 2 1 2.5 1 1.2 1 .7 1 .3 1 1.5 1 .8 1 .8 1 .03 1 .4 1 1.4 1 1 1 1.4
2 .4 2 2.6 2 4.375 2 1.2 2 1 2 2.4 2 1.5 2 .5 2 2.1 2 .66 2 4 2 1.5 2 2 2 1 2 1 2 1.2 2 1.66 2 .8 2 2.1 2 .07 2 .5 2 2 2 1.4 2 2 2 1.6 2 1.7 2 1.3 2 1.69 2 1.5 2 .32 2 .416 2 2 2 .3 2 1 2 1.2 2 .8 2 .5 2 .8 2 2.5 2 1.2 2 .7 2 .3 2 .7 2 .667 2 .8 2 .02 2 1.25 2 1.4 2 .5 2 1.4
end
graph drop _all
catplot type foobarx, recast(bar) asyvars var2opts(label(labsize(tiny))) legend(order(1 "Type==1" 2 "Type==2")) name(g1)
tabplot type foobarx, separate(type) xlabel(, labsize(tiny)) name(g2)
tabplot foobarx, separate(type) xlabel(, labsize(tiny)) name(g3)
I created a YouTube video on using CODE delimiters & dataex – looking for feedback on making it better
Hi everyone,
It seems like using the CODE delimiters and -dataex- is a hurdle for new posters (particularly -dataex-). I've tried to help people along by creating a short tutorial on using them and why they are important.
I'm looking for your feedback on ways I could improve it or things I should change in the next go-round (I consider the current posting a "rough draft"). The video is longer than I expected and has too many "ums" and pauses (and you'll notice a phone call I tried to edit out around 5:20). Definitely listen to it at 1.5× speed or greater.
Also note: while I am awesome with Excel and pretty good with Stata, I am a novice video editor, so if anyone wanted to help me edit some of that stuff out, that would be appreciated. I recorded this using Screencast-o-matic (yes, that's its real name). It's low budget (I paid $20 for 3 years), but I haven't sat down to learn Camtasia or any of the Adobe products.
So if you say, "Why don't you add in some awesome intro music like the video put out by StataCorp" I will likely respond, "Thanks - show me how to do that."
The YouTube video is here https://youtu.be/bXfaRCAOPbI
The Statalist post on converting incident data to weekly rates is here
All the best,
--David
Predicting residuals of an EGARCH model for panel data using rangerun/rangestat
Hello Statalisters,
My data are a daily panel of 379 firms. id is a firm identifier (one for each of the 379 firms), and resid is my main variable:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int Timestamp long Company float(id resid) 18535 1 1 -.5671907 18536 1 1 -.6045171 18539 1 1 -.6121998 18540 1 1 -.6303514 18541 1 1 -.669826 18542 1 1 -.702639 18543 1 1 -.719861 18546 1 1 -.6933156 18547 1 1 -.731853 18548 1 1 -.7365888 18549 1 1 -.7747283 18550 1 1 -.7631773 18553 1 1 -.7770729 18554 1 1 -.8012897 18555 1 1 -.8964902 18556 1 1 -.913472 18557 1 1 -.9855964 18560 1 1 -.999239 18561 1 1 -.9881337 18562 1 1 -1.0019863 18563 1 1 -1.0206912 18564 1 1 -1.0217553 18567 1 1 -1.0474454 18568 1 1 -1.045138 18569 1 1 -1.0861173 18570 1 1 -1.0813621 18571 1 1 -1.1092516 18574 1 1 -1.0936003 18575 1 1 -1.11229 18576 1 1 -1.1059849 18577 1 1 -1.1025311 18578 1 1 -1.1074014 18581 1 1 -1.1109506 18582 1 1 -1.1378505 18583 1 1 -1.0972226 18584 1 1 -1.081413 18585 1 1 -1.0957986 18588 1 1 -1.2032365 18589 1 1 -1.1156594 18590 1 1 -1.1093435 18591 1 1 -1.119484 18592 1 1 -1.1405734 18595 1 1 -1.1548105 18596 1 1 -1.1471893 18597 1 1 -1.1450335 18598 1 1 -1.2197285 18599 1 1 -1.2260885 18602 1 1 -1.254092 18603 1 1 -1.2256622 18604 1 1 -1.2525495 18605 1 1 -1.260203 18606 1 1 -1.2446553 18609 1 1 -1.2434903 18610 1 1 -1.2611834 18611 1 1 -1.2991477 18612 1 1 -1.3018358 18613 1 1 -1.300265 18616 1 1 -1.2987148 18617 1 1 -1.289396 18618 1 1 -1.3019747 18619 1 1 -1.3019783 18623 1 1 -1.3298627 18624 1 1 -1.3500524 18625 1 1 -1.3450558 18626 1 1 -1.3278207 18630 1 1 -1.3412284 18631 1 1 -1.3589562 18632 1 1 -1.3617107 18634 1 1 -1.4429946 18637 1 1 -1.425543 18638 1 1 -1.430764 18639 1 1 -1.4498658 18640 1 1 -1.439363 18641 1 1 -1.480617 18644 1 1 -1.472839 18645 1 1 -1.471289 18646 1 1 -1.559699 18647 1 1 -1.55628 18648 1 1 -1.5369538 18651 1 1 -1.5414712 18652 1 1 -1.5362016 18653 1 1 -1.5512855 18654 1 1 -1.596582 18655 1 1 -1.6015798 18658 1 1 -1.6236287 18659 1 1 -1.62002 18660 1 1 -1.633923 18661 1 1 -1.6823018 18662 1 1 -1.7491394 18665 1 1 -1.698604 18666 1 1 -1.7063757 18667 1 1 -1.6978794 18668 1 1 -1.7265545 18669 1 1 -1.708142 18672 1 1 -1.7848433 18673 1 1 -1.7568833 18674 1 1 -1.757743 18675 1 1 -1.7850047 18676 1 1 -1.7637664 18679 1 1 -1.782558 end format %tdnn/dd/CCYY Timestamp label values Company Company label def Company 1 "AAK.ST", modify
I am trying to run an EGARCH model for each company separately and save the residuals before running the next EGARCH model for id ==2 and so on, using resid as my dependent variable with no independent variables. So this is my model:
Code:
arch resid , earch(1/1) egarch(1/1)
It is possible to run a panel EGARCH model as follows, but the problem is saving each regression's residuals before running the next regression:
Code:
by id: arch resid , earch(1/1) egarch(1/1)
I thought of Mr. Cox's -rangerun- command and did something like:
Code:
program define EGARCH
arch resid , earch(1/1) egarch(1/1)
predict IV, residuals
exit
end
rangerun EGARCH, interval( Timestamp 0 0) by(id) verbose
but Stata wanted a time variable, although I had already declared my data:
Code:
time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname ... time variable not set, use tsset varname .
Thus, I did this:
Code:
program define EGARCH
xtset id Timestamp, daily
arch resid , earch(1/1) egarch(1/1)
predict IV, residuals
exit
end
rangerun EGARCH, interval( Timestamp 0 0) by(id) verbose
and I got:
Code:
insufficient observations
panel variable: id (strongly balanced)
time variable: Timestamp, 16oct2013 to 16oct2013
delta: 1 day
insufficient observations
panel variable: id (strongly balanced)
time variable: Timestamp, 17oct2013 to 17oct2013
delta: 1 day
insufficient observations
panel variable: id (strongly balanced)
time variable: Timestamp, 18oct2013 to 18oct2013
delta: 1 day
insufficient observations
panel variable: id (strongly balanced)
time variable: Timestamp, 21oct2013 to 21oct2013
delta: 1 day
insufficient observations
panel variable: id (strongly balanced)
time variable: Timestamp, 22oct2013 to 22oct2013
delta: 1 day
insufficient observations
panel variable: id (strongly balanced)
time variable: Timestamp, 23oct2013 to 23oct2013
delta: 1 day
insufficient observations
panel variable: id (strongly balanced)
time variable: Timestamp, 24oct2013 to 24oct2013
delta: 1 day
insufficient observations
panel variable: id (strongly balanced)
time variable: Timestamp, 25oct2013 to 25oct2013
delta: 1 day
insufficient observations
I know that I have sufficient observations to run the EGARCH model, because I have already run the EARCH model without saving the residuals and had no issues there. I must be doing something wrong; perhaps I have not set up the right interval in -rangerun- with 0 0. My understanding is that 0 0 takes the full period of id == 1:
Code:
rangerun EGARCH, interval( Timestamp 0 0) by(id) verbose
I was also reading about Mr. Cox's -rangestat- command, but I was not able to incorporate the arch/garch models in rangestat. I would appreciate any help or tips in solving my issue.
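If I read the -rangerun- help correctly, interval(Timestamp 0 0) builds each estimation sample from observations with the same Timestamp, which would explain why every call sees a single day and reports insufficient observations. For a run-once-per-firm task, -runby- (also community-contributed; ssc install runby) may be a better fit, since it executes the program once per by() group and keeps whatever data the program leaves behind. A minimal, untested sketch along those lines:
Code:
capture program drop EGARCH
program define EGARCH
    sort Timestamp
    generate long t = _n             // consecutive index, in case calendar gaps trip up -arch-
    tsset t
    arch resid, earch(1/1) egarch(1/1)
    predict IV, residuals            // residuals kept for every observation of this firm
    drop t
end

runby EGARCH, by(id) verbose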
Workshop on interpreting interaction effects
Workshop: Interpreting Interaction Effects with ICALC
March 29-30, 2019 Philadelphia
Instructor: Robert L. (Bob) Kaufman
Hosted by: Department of Sociology, Temple University
The workshop is based on Interaction Effects in Linear and Generalized Linear Models (Kaufman, 2019), a comprehensive and accessible text providing a unified approach to interpreting interaction effects. The book develops the statistical basis for the general principles of a set of interpretive tools, introduces the ICALC Toolkit for Stata, and offers start-to-finish examples applying ICALC to show how to interpret interaction effects for a variety of different techniques of analysis.
The workshop provides a foundation in the principles of interpretation and training in the use of the ICALC Toolkit for Stata to produce the calculations, tables and graphics needed to help understand and explain your results.
Register at http://icalcrlk.com/workshop/ . Space is limited.
threshold, gaps not allowed
Hi everyone,
I'm fitting a threshold model with the -threshold- command in Stata 15.1. My data are strongly balanced, time is continuous, and there are no missing values, but it reports "gaps not allowed" (please see attached). I hope anyone who has used this command can help me with this problem. Thank you very much!
Best,
Yan
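While waiting for replies, a minimal sketch of a check that sometimes explains this message (placeholder names time, y, x, and q, since the actual variables are not shown): -threshold- expects -tsset- time-series data with no gaps, so calendar dates that skip weekends or holidays can trigger "gaps not allowed" even when no values are missing.
Code:
tsset time
tsreport                             // shows whether Stata detects gaps in the declared time variable
* if the dates skip days, a consecutive index removes the spurious gaps;
* then re-run the -threshold- command on the re-tsset data
sort time
generate long t = _n
tsset t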

2019 German Users Group Meeting
_________________________________
2019 GERMAN USERS GROUP MEETING
_________________________________
Date: May 24, 2019
Venue: Ludwig-Maximilians-Universität Munich
Cost: Meeting only: 45 EUR (students 35 EUR)
Workshop only: 65 EUR
Workshop and Meeting: 85 EUR
Submission deadline: February 1, 2019
Call for Presentations
======================
We would like to announce the 17th German Stata Users Group meeting to
be held Friday, May 24, 2019 at:
LMU Munich
Seidlvilla e.V.
Nikolaiplatz 1b
80802 München
All Stata users, from Germany and elsewhere, or those interested in
learning about Stata, are invited to attend.
Presentations are sought on topics that include the following:
- User-written Stata programs
- Case studies of research or teaching using Stata
- Discussions of data management problems
- Reviews of analytic issues
- Surveys or critiques of Stata facilities in specific fields, etc.
The conference language will be English, due to the international
nature of the meeting and the participation of non-German guest
speakers.
Submission guidelines
=====================
If you are interested in presenting a paper, please submit an abstract
by email to stata@soziologie.uni-muenchen.de (max 200 words). The
deadline for submissions is February 1, 2019. Presentations should be
20 minutes or shorter.
Registration
============
Participants are asked to travel at their own expense. There will be a
small conference fee to cover costs for refreshments and lunch. There
will also be an optional informal meal at a restaurant in Munich on
Friday evening at additional cost. You can enroll by contacting Peter
Stenveld or Elena Tsittser by email or by writing or phoning.
DPC Software GmbH
Prinzenstraße 2
42697 Solingen
Tel: +49 212 26066-44
Email: peter.stenveld@dpc-software.de, elena.tsittser@dpc-software.de
The final program will be circulated in March 2019.
Organizers
==========
Scientific Organizers
~~~~~~~~~~~~~~~~~~~~~
Katrin Auspurg
Ludwig-Maximilians-Universität München
katrin.auspurg@lmu.de
Josef Brüderl
Ludwig-Maximilians-Universität München
bruederl@lmu.de
Johannes Giesecke
Humboldt University Berlin
johannes.giesecke@hu-berlin.de
Ulrich Kohler
University of Potsdam
ulrich.kohler@uni-potsdam.de
Logistics Organizer
~~~~~~~~~~~~~~~~~~~
DPC software (dpc-software.de), the distributor of Stata in several
countries, including Germany, the Netherlands, Austria, the Czech
Republic, and Hungary.
Finding a mean when values are repeated
Hello,
I have data involving auctions and contracts. My task is to find the average of the winning bids, so I used this code to get the maximum bid by contract:
egen maxbid = max(bid), by(contractnum)
My problem is that the value of maxbid is repeated every time a certain contract appears. For example, contract 08-492904 has 7 bidders, so the winning bid is listed 7 times. As you can see, this would give an incorrect mean if I simply ran -summarize maxbid-. How can I fix this?
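A minimal sketch of one way around the repetition: tag exactly one observation per contract and summarize the winning bid over the tagged observations only (variable names follow the post).
Code:
egen maxbid = max(bid), by(contractnum)
egen onepercontract = tag(contractnum)   // 1 for one observation per contract, 0 for the rest
summarize maxbid if onepercontract       // mean of winning bids, counting each contract once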
does an independent variable predict dependent variable 1 or 2 better
Dear all,
I am running the below 2 models in Stata. They have different dependent variables, and some similar independent variables (certain independent variables were removed since they were statistically insignificant). Below please see the code and results.
I am interested in a variable that is in both models, dar. In the first model the coefficient on dar is -0.062 (P<0.001), and in the second model it is 0.058 (P=0.001). I am trying to evaluate whether dar "explains" the first dependent variable or the second dependent variable better. In the first model, the values of the dependent variable are mostly between 0 and 100, but with some values above 100 and below 0. In the second model, the dependent variable ranges from 0 to 100.
Can I simply compare the magnitudes of the coefficients on dar to determine this? For example, since the magnitude is larger in the first model, can I conclude that dar "explains" dependent variable 1 better? Or is there a better way to do this, for example using dar as the only independent variable and comparing BIC?
Thanks!!!
Code:
. mixed dep1 dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = -37391.644
Iteration 1: log restricted-likelihood = -37391.644
Computing standard errors:
Mixed-effects REML regression Number of obs = 8,476
Group variable: _all Number of groups = 1
Obs per group:
min = 8,476
avg = 8,476.0
max = 8,476
Wald chi2(5) = 226.97
Log restricted-likelihood = -37391.644 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
dep1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dar | -.0624097 .0153719 -4.06 0.000 -.092538 -.0322814
alp | 8.150469 1.526269 5.34 0.000 5.159037 11.1419
mup | 8.152124 2.433408 3.35 0.001 3.382732 12.92152
zol | 6.486658 .7586855 8.55 0.000 4.999661 7.973654
mupbyzol | -1.694464 .915423 -1.85 0.064 -3.48866 .0997322
_cons | 81.06523 2.629083 30.83 0.000 75.91233 86.21814
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity |
var(R.bor) | 63.10153 13.75592 41.16052 96.73842
-----------------------------+------------------------------------------------
_all: Identity |
var(R.mol) | 18.28294 7.468788 8.209574 40.7166
-----------------------------+------------------------------------------------
_all: Identity |
var(R.pop) | 30.08839 6.280492 19.9859 45.29751
-----------------------------+------------------------------------------------
var(Residual) | 381.8593 5.912967 370.4442 393.6262
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 1810.17 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
.
. mixed dep2 dar alp mup zol alpbymup || _all: R.bor || _all: R.mol || _all: R.pop, reml
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = -38246.114
Iteration 1: log restricted-likelihood = -38246.114
Computing standard errors:
Mixed-effects REML regression Number of obs = 8,463
Group variable: _all Number of groups = 1
Obs per group:
min = 8,463
avg = 8,463.0
max = 8,463
Wald chi2(5) = 246.42
Log restricted-likelihood = -38246.114 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
dep2 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dar | .0576079 .0172886 3.33 0.001 .0237229 .0914928
alp | -9.863186 2.068844 -4.77 0.000 -13.91805 -5.808327
mup | -9.944255 3.193489 -3.11 0.002 -16.20338 -3.685131
zol | -6.253207 .4719679 -13.25 0.000 -7.178247 -5.328167
alpbymup | -2.215597 1.016369 -2.18 0.029 -4.207643 -.2235513
_cons | 49.90762 3.607263 13.84 0.000 42.83752 56.97773
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity |
var(R.bor) | 185.1551 38.18656 123.59 277.3884
-----------------------------+------------------------------------------------
_all: Identity |
var(R.mol) | 34.39981 14.81115 14.79325 79.99239
-----------------------------+------------------------------------------------
_all: Identity |
var(R.pop) | 49.99171 10.22253 33.48424 74.63724
-----------------------------+------------------------------------------------
var(Residual) | 470.1817 7.287957 456.1124 484.685
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 2918.94 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
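Because dep1 and dep2 are on different scales, the raw dar coefficients are hard to compare directly. One hedged possibility is to refit the same models with dar and each outcome standardized, so both dar coefficients are in standard-deviation units; whether that settles which outcome dar "explains" better is still a substantive judgment.
Code:
foreach v in dep1 dep2 dar {
    quietly summarize `v'
    generate z_`v' = (`v' - r(mean)) / r(sd)   // z-score each variable
}
mixed z_dep1 z_dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml
mixed z_dep2 z_dar alp mup zol alpbymup || _all: R.bor || _all: R.mol || _all: R.pop, reml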
Optimize() on a segment
Hello,
I would like to maximize my function of a specific segment, let's say [a,b]. Does anyone know how to do this in mata?
Best regards,
Olga
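optimize() has no built-in box constraints, so one common workaround is to optimize over an unconstrained parameter and map it into [a,b] with a logistic transform. A minimal sketch with an illustrative objective (the real function would replace the marked line; a = 0 and b = 2 are just example bounds):
Code:
mata:
void myeval(todo, t, real scalar a, real scalar b, y, g, H)
{
    real scalar x
    x = a + (b - a) * invlogit(t)     // maps unconstrained t into the open interval (a,b)
    y = -(x - 1.3)^2                  // replace with the function to be maximized
}

a = 0
b = 2
S = optimize_init()
optimize_init_evaluator(S, &myeval())
optimize_init_evaluatortype(S, "d0")
optimize_init_argument(S, 1, a)
optimize_init_argument(S, 2, b)
optimize_init_params(S, 0)            // starting value for t
t = optimize(S)
a + (b - a) * invlogit(t)             // the maximizer, mapped back into [a,b]
end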
ICC in mi estimate meqrlogit
Hi all,
I have successfully imputed values and run -mi estimate: meqrlogit- for a multilevel analysis, but now I am having trouble investigating the variance.
To investigate the variance with the un-imputed data set I used the -estat icc- command to get the intraclass correlations, but this command does not work with the imputed data set. How can I assess the variance?
Can someone help me, please?
Fixed effects and clustering residuals
Hello all,
I have a panel of prices of 4 products, in 200 towns, posted by 20 companies, over 12 months.
I would like fixed effects on products and towns. I assume that the residuals of the prices of each product in each town in each month are not iid, so I would like to cluster.
Would fixed effects on product and town, plus clustering on companies, be correct for that?
thank you!
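A hedged sketch of how the described specification could be coded (placeholder names price and x; -reghdfe- is community-contributed, ssc install reghdfe); whether clustering by company is the right level is a judgment the sketch does not settle.
Code:
* price = dependent variable, x = regressor(s) of interest
* product and town fixed effects, standard errors clustered by company
reghdfe price x, absorb(product town) vce(cluster company)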
3SLS with non-continuous endogenous variables
Hello Everyone,
Much thanks for the opportunity provided for clarification of doubts and guidance on data analysis.
I am specifying a system of 3 equations using the -reg3- command, for which the two endogenous variables in the system are not continuous. I would like to know if there is a command that can handle this situation. My system of equations appears as follows:
Y1 = Y2 + X2 + X3 + X4
Y2 = X1 + Z1 + Z2
X1 = V1 + V3 + V
Y1 (the dependent variable in the final outcome equation) is continuous, whereas Y2 and X1 are endogenous in the system but non-continuous. The standard reg3 command treats the dependent variables in all three stages/equations as continuous. It is actually giving me some interesting estimates, but I think that is not right, since Y2 in equation 2 above is a limited dependent variable (an index bounded between 0 and 1) and X1 in equation 3 is a binary variable (0=No, 1=Yes).
Kindly help me. I will really appreciate it.
Thanks!!
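One community-contributed possibility worth a look (hedged, and not a judgment on the econometrics) is -cmp- from SSC (Roodman's conditional mixed-process estimator), which lets each equation in the system have its own type. A fractional 0-1 outcome has no exact counterpart among its equation types, so in the sketch below Y2 is treated as continuous purely for illustration, X1 is modeled as a probit, and equation 3 is written exactly as in the post.
Code:
* ssc install cmp
cmp (Y1 = Y2 X2 X3 X4) ///
    (Y2 = X1 Z1 Z2)    ///
    (X1 = V1 V3 V),    ///
    indicators($cmp_cont $cmp_cont $cmp_probit)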
Interpretation of Interaction terms
Dear Statalisters
I run the following regression on panel data of children, where PPVT is a test score, treatment indicates whether they receive the treatment, and sex is =1 if male and =2 if female.
Code:
xtreg PPVT treatment##sex (control variables) , fe cluster(ID)
---------------------------------------------------------------------------------------------
| Robust
ppvtraw | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
1.treatment | 5.594682 1.81707 3.08 0.002 2.031623 9.157742
|
sex |
female | 0 (omitted)
|
treatment#sex |
1#female | -8.300998 2.405033 -3.45 0.001 -13.01698 -3.585014
_cons | 80.75674 19.60288 4.12 0.000 42.31781 119.1957
----------------------------+----------------------------------------------------------------
How do I interpret this? Can I read it like this:
1. males not getting treatment have an avg. score of 80.75
2. males getting treatment have an avg. score of 86.34 (80.75 + 5.59)
3. females getting treatment have an avg. score of 72.45 (80.75 - 8.30)
4. females not getting treatment have an avg. score of ??
or is it like this:
1. males not getting treatment have an avg. score of 80.75
2. males getting treatment have an avg. score of 86.34
3. females getting treated have an avg. score of 78.05 (80.75 + 5.59 - 8.30)
4. females not getting treatment have an avg. score of ??
Does the interaction coefficient say how much treatment changes the score for females or does it say how much the change differs from the male's change?
And how do I find the coefficient for being just a female without treatment?
Thank you so much for your help!
Best
Arto Arman
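A hedged note and a small sketch: because sex is time-invariant, its main effect is absorbed by the individual fixed effects (hence the omitted female row), so the level for untreated females is not separately identified; the interaction coefficient is how much the female treatment effect differs from the male one. -margins- can do the arithmetic directly after the same -xtreg-:
Code:
margins sex, dydx(treatment)   // treatment effect by sex: 5.59 for males, 5.59 - 8.30 for females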
Sts graph truncate
Ciao, I am conducting a survival analysis and need to truncate the x-axis. The survival-time data I have start at a time of 5, but sts graph starts the plot at 0, even though no observations start before 5. What do I do?
This is the code I use:
Code:
sts graph
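One option to try (hedged, from memory of the -sts graph- options): -noorigin- starts the curve at the first exit time instead of t = 0, and the axis labels can then be made to start at 5 as well; the label range below is only an example.
Code:
sts graph, noorigin xlabel(5(5)40)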
Panel regression where the dependent and independent variables are the same variable in different groups
Hi, I am running a panel regression of the form: return in year 2018 = c + return in years 2010-2017 + e. Years 2010-2017 are categorized as group 1 and year 2018 as group 0, as shown below.
How should I do this?
Thank you and best,
Ivy
| year | group | return |
| 2010 | 1 | 10 |
| 2011 | 1 | 11 |
| 2012 | 1 | 12 |
| 2013 | 1 | 13 |
| 2014 | 1 | 14 |
| 2015 | 1 | 15 |
| 2016 | 1 | 16 |
| 2017 | 1 | 17 |
| 2018 | 0 | 18 |
Help me with a Stata command
dear statalist
this is my first project with Stata, and I'm not very experienced.
I am working on panel data to understand how the introduction of a bonus has changed the consumption of the population following its application, comparing the year 2012, when the treatment had not yet been introduced, with the year 2014, when the treatment was introduced. I have treated and untreated groups, and the other variables I took into consideration are age, the region of residence, and the number of family members.
Is this Stata command OK?
reg c bonus_renzi ncomp ireg eta post T_Post , vce(cluster nquest)
Is it better to split by ireg (north, south, center) and run three separate regressions? How can I interpret these results?
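A hedged sketch, assuming bonus_renzi marks the treated households and post marks the 2014 wave: writing the difference-in-differences with factor-variable notation makes the interaction explicit (equivalent to including T_Post by hand) and enters the region as a categorical variable.
Code:
reg c i.bonus_renzi##i.post ncomp i.ireg eta, vce(cluster nquest)
* the coefficient on 1.bonus_renzi#1.post is the difference-in-differences estimate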
Standardised independent variables using logistic regression
I am running a model in which I want to check the substantive effect of the different independent variables. I know my corevar is statistically significant and it confirms my theoretical expectation but I'd like to engage in a discussion regarding its importance in comparison to other determinants well established to be important.
To that end, I re-estimate my model after standardising the continuous variables (using the -center- command) and leave my categorical and dummy variables untouched. I am surprised that there is no change in the size of the coefficients, which makes me think I'm running the standardised model incorrectly. The code I am using is attached below, as is example data.
I assume that I shouldn't be standardising the dummies or the country and year fixed effects, since the size of the "magnitude" is captured relative to the baseline.
Code:
logit voted i.corevar i.gender age i.ethnic eduyrs swi religious i.domicil swd econview politic_interest i.extreme lrscale i.cntryID i.year if fulldata==1 & identifier==1 [pw=weight], robust cluster(cntry)

center age eduyrs swi religious swd econview politic_interest lrscale [pw=weight] if fulldata==1 & identifier==1

logit voted i.corevar i.gender c_age i.ethnic c_eduyrs c_swi c_religious i.domicil c_swd c_econview c_politic_interest i.extreme c_lrscale i.cntryID i.year if fulldata==1 & identifier==1 [pw=weight], robust cluster(cntry)
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float(voted corevar) double(gender agea) float ethnic double(eduyrs swi religious domicil swd econview politic_interest) float extreme double lrscale long cntryID float(year fulldata) double identifier float weight str2 cntry 0 0 0 64 0 12 3 1 3 7 8 1 0 5 1 2002 0 1 .2420854 "AT" 1 0 1 76 0 8 2 8 5 0 0 1 1 . 1 2002 0 1 .2095612 "AT" 1 0 0 63 . 9 3 7 3 5 7 3 0 5 1 2002 0 1 .25660998 "AT" 0 0 0 62 0 12 2 1 3 3 0 3 1 . 1 2002 0 1 .24710792 "AT" 1 0 0 33 0 12 3 5 3 5 4 3 0 5 1 2002 0 0 .12852219 "AT" 1 0 0 38 0 15 3 9 1 5 6 4 0 4 1 2002 0 0 .15444924 "AT" 1 0 1 68 0 6 3 5 4 5 5 2 1 . 1 2002 0 1 .1719059 "AT" 0 0 1 . 0 16 3 8 4 8 6 3 1 1 1 2002 0 1 .27422953 "AT" 1 0 0 31 0 13 4 2 3 6 7 2 0 7 1 2002 0 1 .2577774 "AT" 1 0 1 38 0 22 3 7 1 9 6 4 0 5 1 2002 0 1 .2987177 "AT" 1 0 1 48 0 12 2 5 1 8 6 1 0 4 1 2002 0 1 .3828788 "AT" 1 0 1 56 0 9 3 7 5 6 6 2 0 6 1 2002 0 1 .4570764 "AT" 1 0 1 73 0 11 3 8 4 9 7 2 0 5 1 2002 0 1 .2252532 "AT" 1 0 1 44 0 18 4 7 2 7 3 3 0 5 1 2002 0 1 .29472685 "AT" 1 0 0 24 0 11 4 7 4 7 5 4 0 6 1 2002 0 0 .29372233 "AT" 1 0 0 50 0 10 4 0 4 7 8 3 0 6 1 2002 0 1 .2444745 "AT" 1 0 1 76 0 11 3 8 1 8 8 2 0 6 1 2002 0 0 .13989751 "AT" 1 0 1 43 0 13 4 7 3 3 4 4 0 5 1 2002 0 1 .2252532 "AT" 1 0 0 51 . 12 4 5 4 0 0 2 1 10 1 2002 0 0 .09931013 "AT" 1 0 0 42 0 10 2 7 1 2 5 3 1 . 1 2002 0 1 .3089256 "AT" 1 0 1 77 0 11 3 5 4 10 2 1 1 9 1 2002 0 1 .18759787 "AT" 1 0 0 50 0 11 3 5 3 5 4 3 0 5 1 2002 0 1 .6195073 "AT" 1 0 1 30 0 12 3 5 4 4 2 1 1 . 1 2002 0 0 .50263196 "AT" 1 0 0 48 0 13 3 7 4 4 4 2 1 . 1 2002 0 1 .4655739 "AT" 1 0 1 74 0 18 4 7 1 . 7 4 0 7 1 2002 0 0 .13873011 "AT" 1 0 0 37 0 13 4 5 4 4 5 3 0 3 1 2002 0 1 .29616573 "AT" 0 0 0 41 0 11 3 8 4 5 . 2 1 0 1 2002 0 0 .09431476 "AT" 1 0 0 63 1 12 3 8 3 6 6 3 0 5 1 2002 0 0 .12102913 "AT" 1 0 0 24 0 13 3 2 3 9 6 3 0 4 1 2002 0 1 .2420854 "AT" 1 0 0 84 0 8 3 9 2 5 4 2 0 5 1 2002 0 0 .14730912 "AT" 1 0 1 67 0 14 4 5 3 4 7 3 0 7 1 2002 0 1 .27420238 "AT" . 0 0 49 1 12 2 10 2 10 10 2 0 6 1 2002 0 1 .5663772 "AT" 1 0 0 65 0 12 3 5 4 5 6 2 1 2 1 2002 0 1 .331622 "AT" . 0 0 17 0 11 4 3 5 6 4 2 0 5 1 2002 0 0 .3150069 "AT" 1 0 0 42 0 9 3 10 3 5 0 2 0 5 1 2002 0 1 .3771776 "AT" 1 0 0 59 0 19 3 5 4 1 4 3 1 . 1 2002 0 1 .18759787 "AT" 0 0 0 25 0 17 4 3 1 5 5 2 0 6 1 2002 0 0 .13101988 "AT" 1 0 0 48 0 12 3 5 2 5 4 4 0 7 1 2002 0 1 .25525254 "AT" 1 0 1 62 0 10 3 3 2 9 7 4 1 2 1 2002 0 0 .12219653 "AT" 1 0 1 47 0 15 4 8 4 5 6 3 0 5 1 2002 0 1 .32146835 "AT" 1 0 1 66 0 12 1 7 4 5 5 3 0 5 1 2002 0 1 .4328325 "AT" 1 0 1 44 0 13 1 7 1 9 5 1 0 3 1 2002 0 1 .2620669 "AT" 1 0 0 47 0 16 3 1 4 5 3 2 1 2 1 2002 0 0 .2439858 "AT" 1 0 0 45 0 19 2 6 4 2 3 3 1 8 1 2002 0 0 .20920826 "AT" 1 0 0 68 0 8 3 5 4 10 7 2 1 8 1 2002 0 1 .16520014 "AT" 1 0 1 21 0 12 3 7 4 0 . 3 1 0 1 2002 0 0 .4336199 "AT" 1 0 1 33 1 22 3 5 1 8 7 3 0 5 1 2002 0 0 .12762627 "AT" . 0 0 18 1 12 3 5 3 4 7 3 0 5 1 2002 0 0 .3036045 "AT" 1 0 1 52 0 11 4 3 3 1 2 2 0 3 1 2002 0 1 .20280117 "AT" 0 0 1 38 0 14 4 0 2 6 8 3 0 5 1 2002 0 1 .2845732 "AT" 1 0 1 21 0 14 3 4 4 5 6 2 0 3 1 2002 0 0 .8539368 "AT" 1 0 1 20 0 12 2 1 4 8 4 1 1 . 1 2002 0 0 .12662177 "AT" 1 0 1 61 0 8 3 8 4 6 5 4 1 . 1 2002 0 1 .18447576 "AT" 1 0 1 53 0 11 3 6 4 7 7 4 0 5 1 2002 0 1 .3045004 "AT" 1 0 0 59 0 9 2 5 4 3 3 2 1 . 
1 2002 0 0 .09882145 "AT" 1 0 1 61 0 14 3 6 4 8 6 3 0 6 1 2002 0 1 .2063305 "AT" 0 0 0 21 0 14 4 7 2 7 6 1 0 4 1 2002 0 0 .426344 "AT" 1 0 1 65 0 9 3 6 4 9 9 2 1 8 1 2002 0 1 .5275817 "AT" 1 0 1 63 1 18 3 5 1 6 5 3 0 3 1 2002 0 1 .4127696 "AT" 0 0 0 84 0 8 1 9 2 . 3 1 0 5 1 2002 0 0 .12561727 "AT" 1 0 0 85 0 10 3 8 4 7 5 2 1 . 1 2002 0 0 .09966306 "AT" . 0 1 33 1 6 1 8 2 . . 2 1 . 1 2002 0 1 .25424805 "AT" 0 0 0 20 0 12 4 1 1 4 1 1 0 4 1 2002 0 0 .3042832 "AT" 1 0 1 41 0 12 3 8 2 4 3 3 0 5 1 2002 0 0 .10712897 "AT" 1 0 1 41 0 16 3 0 4 8 8 2 0 5 1 2002 0 1 .3798653 "AT" 1 0 1 65 0 12 3 3 2 0 0 2 0 4 1 2002 0 1 .27979502 "AT" 1 0 0 35 0 20 4 0 1 10 8 4 0 4 1 2002 0 1 .2677681 "AT" 1 0 1 56 0 . 3 5 4 7 7 3 0 5 1 2002 0 1 .20641196 "AT" 0 0 1 23 0 13 4 0 4 . . 1 1 10 1 2002 0 0 .09936443 "AT" 1 0 1 70 0 8 2 9 4 6 5 1 1 . 1 2002 0 1 .6423666 "AT" 1 0 1 46 0 16 3 0 3 7 5 4 0 5 1 2002 0 0 .13710119 "AT" 0 0 0 50 0 9 3 2 4 6 3 1 0 5 1 2002 0 1 .3150069 "AT" 1 0 1 55 0 8 2 0 4 3 8 2 0 5 1 2002 0 1 .21960625 "AT" 1 0 1 29 1 12 2 10 1 7 8 3 1 . 1 2002 0 0 .13101988 "AT" . 0 0 35 0 16 2 5 3 5 7 4 0 5 1 2002 0 1 .3101473 "AT" 1 0 0 23 0 12 1 3 3 5 3 1 0 4 1 2002 0 0 .12852219 "AT" 1 0 0 63 0 12 2 5 1 4 3 3 1 2 1 2002 0 0 .15154433 "AT" 0 0 1 53 0 19 3 5 1 3 7 4 0 5 1 2002 0 1 .6062316 "AT" 1 0 1 38 0 12 4 3 3 7 6 3 0 5 1 2002 0 1 .25392225 "AT" 1 0 1 25 0 18 4 3 1 . 5 2 1 8 1 2002 0 0 .13873011 "AT" 1 0 0 42 0 9 3 4 3 8 8 2 0 6 1 2002 0 0 .24846536 "AT" 1 0 1 73 0 9 3 2 1 7 6 2 0 4 1 2002 0 1 .3089256 "AT" 1 0 0 62 0 13 3 5 1 0 0 3 0 5 1 2002 0 0 .13194293 "AT" 1 0 0 79 1 8 3 8 2 7 8 2 1 . 1 2002 0 0 .13989751 "AT" 1 0 1 40 0 12 3 6 4 10 5 3 0 5 1 2002 0 1 .3045004 "AT" 1 0 1 55 0 18 4 0 3 8 9 1 1 8 1 2002 0 1 .25126168 "AT" 1 0 0 55 0 10 3 4 1 8 7 2 1 2 1 2002 0 0 .15444924 "AT" 1 0 0 36 1 9 3 7 4 0 3 3 0 6 1 2002 0 1 .2479224 "AT" 1 0 1 56 0 15 3 7 4 6 5 2 0 4 1 2002 0 1 .2479767 "AT" 1 0 1 71 0 18 4 7 3 6 6 4 0 7 1 2002 0 0 .13710119 "AT" 1 0 1 60 0 12 4 8 4 8 6 4 0 6 1 2002 0 1 .4959534 "AT" 1 0 1 68 0 8 4 4 2 7 7 4 1 2 1 2002 0 1 .25424805 "AT" 1 0 0 24 0 16 4 4 1 5 3 3 0 3 1 2002 0 1 .3139753 "AT" 0 0 1 62 0 12 3 2 4 4 4 3 0 7 1 2002 0 1 .2179773 "AT" 0 0 1 23 0 9 3 4 4 7 7 3 1 0 1 2002 0 0 .13951743 "AT" 1 0 0 63 0 20 3 10 2 2 4 4 1 0 1 2002 0 1 .3042832 "AT" 0 0 0 58 0 11 4 8 1 0 7 1 0 5 1 2002 0 0 .1422866 "AT" 1 0 1 66 0 11 2 8 3 . 
5 3 0 4 1 2002 0 0 .09670385 "AT" 1 0 0 38 0 13 3 5 3 5 7 3 0 5 1 2002 0 0 .12778917 "AT" 1 0 0 53 0 11 4 0 1 4 6 2 0 7 1 2002 0 1 .3042832 "AT" end label values gender gender label def gender 0 "Female", modify label def gender 1 "Male", modify label values agea agea label values eduyrs eduyrs label values swi swi label def swi 1 "Very difficult on present income", modify label def swi 2 "Difficult on present income", modify label def swi 3 "Coping on present income", modify label def swi 4 "Living comfortably on present income", modify label values religious rlgdgr label def rlgdgr 0 "Not at all religious", modify label def rlgdgr 1 "1", modify label def rlgdgr 2 "2", modify label def rlgdgr 3 "3", modify label def rlgdgr 4 "4", modify label def rlgdgr 5 "5", modify label def rlgdgr 6 "6", modify label def rlgdgr 7 "7", modify label def rlgdgr 8 "8", modify label def rlgdgr 9 "9", modify label def rlgdgr 10 "Very religious", modify label values domicil domicil label def domicil 1 "A big city", modify label def domicil 2 "Suburbs or outskirts of big city", modify label def domicil 3 "Town or small city", modify label def domicil 4 "Country village", modify label def domicil 5 "Farm or home in countryside", modify label values swd stfdem label def stfdem 0 "Extremely dissatisfied", modify label def stfdem 1 "1", modify label def stfdem 2 "2", modify label def stfdem 3 "3", modify label def stfdem 4 "4", modify label def stfdem 5 "5", modify label def stfdem 6 "6", modify label def stfdem 7 "7", modify label def stfdem 8 "8", modify label def stfdem 9 "9", modify label def stfdem 10 "Extremely satisfied", modify label values econview stfeco label def stfeco 0 "Extremely dissatisfied", modify label def stfeco 1 "1", modify label def stfeco 2 "2", modify label def stfeco 3 "3", modify label def stfeco 4 "4", modify label def stfeco 5 "5", modify label def stfeco 6 "6", modify label def stfeco 7 "7", modify label def stfeco 8 "8", modify label def stfeco 9 "9", modify label def stfeco 10 "Extremely satisfied", modify label values politic_interest politic_interest label def politic_interest 1 "Not at all interested", modify label def politic_interest 2 "Hardly Interested", modify label def politic_interest 3 "Quite Interested", modify label def politic_interest 4 "Very interested", modify label values lrscale lrscale label def lrscale 0 "Left", modify label def lrscale 1 "1", modify label def lrscale 2 "2", modify label def lrscale 3 "3", modify label def lrscale 4 "4", modify label def lrscale 5 "5", modify label def lrscale 6 "6", modify label def lrscale 7 "7", modify label def lrscale 8 "8", modify label def lrscale 9 "9", modify label def lrscale 10 "Right", modify label values cntryID cntryID label def cntryID 1 "AT", modify label values identifier partner label def partner 1 "Lives with husband/wife/partner at household grid", modify
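If the puzzle is that the slopes did not move, that is expected: centering only subtracts the mean, which shifts the intercept but leaves every slope coefficient unchanged. A hedged sketch, assuming the -center- in use is the SSC command (which has a -standardize- option); with its default c_ prefix the second -logit- line can stay exactly as it is.
Code:
* subtract the mean AND divide by the standard deviation
center age eduyrs swi religious swd econview politic_interest lrscale [pw=weight] if fulldata==1 & identifier==1, standardize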
Rescaling
Hi Statlist!
Is there a way to rescale the x and y axes so that they are shown in millions of dollars rather than in the raw values they currently have?
What I did is basically a simple two-way scatter:
Code:
twoway(scatter avsales_new_no_outliers lagged_tot_sales)(lfit avsales_new_no_outliers lagged_tot_sales)
This is a data example:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double(Year idfirm) str18 crp float(avsales avsales_existing avsales_new) double nprod float(birthfirm firstyear agefirm) double tot_sales 2008 1 "21ST CENTURY LABS" 94701.45 . 94701.45 4 2008 2008 0 211591.55354704778 2009 1 "21ST CENTURY LABS" 83603.08 83603.08 . 4 2008 2008 1 308746.0587639413 2010 1 "21ST CENTURY LABS" 91092.75 113848.5 69.7861 5 2008 2008 2 407237.1453954251 2011 1 "21ST CENTURY LABS" 118792.94 165043.63 3166.214 7 2008 2008 3 598549.0481277383 2012 1 "21ST CENTURY LABS" 104542.8 156613.64 401.1374 9 2008 2008 4 664061.6305459269 2013 1 "21ST CENTURY LABS" 73541.766 113564.58 1500.705 14 2008 2008 5 608140.67505112 2014 1 "21ST CENTURY LABS" 48557.22 61105.52 13421.997 19 2008 2008 6 715163.0758130773 2015 1 "21ST CENTURY LABS" 52369.71 52369.71 . 18 2008 2008 7 768412.7578732732 2004 2 "3M" 2039558.8 2039558.8 . 6 1975 2004 29 12237352.38962967 2005 2 "3M" 2109991 2109991 . 6 1975 2004 30 11513177.507807251 2006 2 "3M" 2050668.5 2050668.5 . 6 1975 2004 31 12304011.532418355 2007 2 "3M" 6330808 6330808 . 6 1975 2004 32 11406777.403632266 2008 2 "3M" 2734569.5 2849051 2047681 7 1975 2004 33 12355367.684264863 2009 2 "3M" 1841746.4 1841746.4 . 7 1975 2004 34 12892224.398947667 2010 2 "3M" 5212612 5212612 . 6 1975 2004 35 10897763.444727022 2011 2 "3M" 5262992 5262992 . 6 1975 2004 36 10580189.86100467 2012 2 "3M" 2037856.5 2037856.5 . 6 1975 2004 37 10967562.283051893 2013 2 "3M" 1823603.5 1823603.5 . 6 1975 2004 38 9124157.813647183 2014 2 "3M" 1785749 1785749 . 4 1975 2004 39 7133930.997182059 2015 2 "3M" 1356981 1356981 . 4 1975 2004 40 5427924.711631365 2004 3 "A-A SPECTRUM" 53702.81 60055.78 1608.419 46 1995 2004 9 2464566.2638161103 2005 3 "A-A SPECTRUM" 81622.65 117128.7 4692.8643 57 1995 2004 10 4406664.46751826 2006 3 "A-A SPECTRUM" 67388.88 85411.82 2806.715 55 1995 2004 11 1843225.499704768 2007 3 "A-A SPECTRUM" 11123.49 12000.184 3759.274 47 1995 2004 12 476002.8259390887 2008 3 "A-A SPECTRUM" 7850.393 8111.578 3279.65 37 1995 2004 13 160865.5648153848 2009 3 "A-A SPECTRUM" 6033.619 6586.994 2021.6517 33 1995 2004 14 82650.9899064962 2010 3 "A-A SPECTRUM" 4410.1416 6201.939 1065.4524 43 1995 2004 15 126687.58501765266 2011 3 "A-A SPECTRUM" 2412.5247 2653.7156 482.997 27 1995 2004 16 35362.49685707843 2012 3 "A-A SPECTRUM" 1075.685 1263.63 261.25616 16 1995 2004 17 16035.562916347304 2013 3 "A-A SPECTRUM" 3456.24 3290.421 4340.6104 19 1995 2004 18 22272.424034809985 2014 3 "A-A SPECTRUM" 1496.9843 1496.9843 . 
15 1995 2004 19 21090.64112449751 2015 3 "A-A SPECTRUM" 4020.239 4886.911 120.21213 11 1995 2004 20 23837.82400447101 2004 4 "A-S MEDICATION" 56691.86 63403.79 17030.424 152 2000 2004 4 8398246.263306491 2005 4 "A-S MEDICATION" 78010.12 90833.17 12908.477 158 2000 2004 5 8251045.135936161 2006 4 "A-S MEDICATION" 94482.98 105549.44 18526.848 173 2000 2004 6 9847010.316997783 2007 4 "A-S MEDICATION" 126469.87 141906.94 54159.39 216 2000 2004 7 13503137.133153718 2008 4 "A-S MEDICATION" 78780.68 90365.45 8950.199 253 2000 2004 8 13768126.595234673 2009 4 "A-S MEDICATION" 73256.164 80078.37 8707.616 272 2000 2004 9 13834060.86932386 2010 4 "A-S MEDICATION" 62012.55 64500.59 18223.094 279 2000 2004 10 12531817.194234656 2011 4 "A-S MEDICATION" 70322.5 72763.29 3689.045 283 2000 2004 11 12433251.178176137 2012 4 "A-S MEDICATION" 64410.49 65984.35 38950.9 292 2000 2004 12 11206517.411851741 2013 4 "A-S MEDICATION" 79104 84296.03 7899.004 309 2000 2004 13 14594676.742348071 2014 4 "A-S MEDICATION" 119126.19 129064.8 4067.63 327 2000 2004 14 26481277.906628065 2015 4 "A-S MEDICATION" 176931.8 179580.45 56984.34 324 2000 2004 15 29946995.67406721 2007 5 "A.J. BART, INC." 157.76617 . 157.76617 1 2007 2007 0 39.4415442669969 2004 6 "AAIPHARMA" 28066686 28066686 . 1 1975 2004 29 28066686.19154623 2005 6 "AAIPHARMA" 5704382 5704382 . 1 1975 2004 30 5704381.338495776 2006 6 "AAIPHARMA" 496491.9 496491.9 . 1 1975 2004 31 496491.87314146105 2007 6 "AAIPHARMA" 9421.785 9421.785 . 1 1975 2004 32 9421.785319502253 2008 6 "AAIPHARMA" 1577.368 1577.368 . 1 1975 2004 33 394.3420099042942 2009 6 "AAIPHARMA" 5520.829 5520.829 . 1 1975 2004 34 1380.2071011962253 2007 7 "AARON INDUSTRIES" 35459.79 . 35459.79 1 2007 2007 0 35459.78814655074 2008 7 "AARON INDUSTRIES" 42076.49 42076.49 . 1 2007 2007 1 42076.49109553711 2009 7 "AARON INDUSTRIES" 381988.5 381988.5 . 1 2007 2007 2 381988.536163157 2010 7 "AARON INDUSTRIES" 1804061.4 1804061.4 . 1 2007 2007 3 451015.3511721381 2011 7 "AARON INDUSTRIES" 270585.6 270585.6 . 1 2007 2007 4 270585.60116440145 2012 7 "AARON INDUSTRIES" 1318229 1318229 . 1 2007 2007 5 329557.26253078564 2013 7 "AARON INDUSTRIES" 1220001.8 1220001.8 . 1 2007 2007 6 305000.4474947418 2014 7 "AARON INDUSTRIES" 693862.8 693862.8 . 1 2007 2007 7 173465.68589452517 2015 7 "AARON INDUSTRIES" 31625.557 31625.557 . 1 2007 2007 8 31625.55690904084 2004 8 "ABBOTT" 11277370 11535204 104568.8 133 1961 2004 43 1499855855.865098 2005 8 "ABBOTT" 18729456 19310292 578339.75 129 1961 2004 44 1689540069.7170644 2006 8 "ABBOTT" 18472182 18746220 8880865 108 1961 2004 45 1598121054.119387 2007 8 "ABBOTT" 32214094 32569138 25255262 103 1961 2004 46 1656992400.409758 2008 8 "ABBOTT" 36879372 38127580 7234408 99 1961 2004 47 2142910992.52344 2009 8 "ABBOTT" 34457748 34848544 67797.15 89 1961 2004 48 2201020370.9059067 2010 8 "ABBOTT" 35879268 37502196 4232198 82 1961 2004 49 2181284539.324025 2011 8 "ABBOTT" 32816464 34031788 2651.776 84 1961 2004 50 2029940306.3960855 2012 8 "ABBOTT" 27333772 31581784 865396.8 94 1961 2004 51 2225894374.1270456 2013 8 "ABBOTT" 46580236 48500412 976053.5 99 1961 2004 52 2727305730.021949 2014 8 "ABBOTT" 33949468 35378588 7886.491 99 1961 2004 53 2359259573.3842134 2015 8 "ABBOTT" 29031148 29661780 22129.54 94 1961 2004 54 2370530389.480409 2004 9 "ABBVIE" 300541632 300541632 . 33 1963 2004 41 9917874207.75519 2005 9 "ABBVIE" 398063520 398063520 . 33 1963 2004 42 10607521709.816193 2006 9 "ABBVIE" 711789248 711789248 . 
32 1963 2004 43 11125455014.939121 2007 9 "ABBVIE" 485468992 485468992 . 32 1963 2004 44 10999908825.891459 2008 9 "ABBVIE" 471612384 499406368 26908488 34 1963 2004 45 11020835162.138296 2009 9 "ABBVIE" 619878528 619878528 . 34 1963 2004 46 11052308521.725805 2010 9 "ABBVIE" 775569536 775569536 . 34 1963 2004 47 11844314656.874342 2011 9 "ABBVIE" 471647968 471647968 . 34 1963 2004 48 12248782762.78322 2012 9 "ABBVIE" 423355936 423355936 . 33 1963 2004 49 12198970579.756456 2013 9 "ABBVIE" 670351680 691815744 47894516 30 1963 2004 50 11978041077.652853 2014 9 "ABBVIE" 574183808 630412352 11898461 33 1963 2004 51 13100594403.908373 2015 9 "ABBVIE" 532608064 548018368 24068394 34 1963 2004 52 16695333854.076557 2004 10 "ABER PHARM" 10196.007 10196.007 . 2 2003 2004 1 20392.013641854945 2008 11 "ABKIT" 830263.1 . 830263.1 2 2008 2008 0 1020956.3147491836 2009 11 "ABKIT" 1215518.4 1215518.4 . 2 2008 2008 1 2431036.790986024 2010 11 "ABKIT" 302000.13 302000.13 . 2 2008 2008 2 514778.3255435499 2011 11 "ABKIT" 133911.64 133911.64 . 2 2008 2008 3 66955.82008718849 2012 11 "ABKIT" 1857.2793 1857.2793 . 1 2008 2008 4 1857.2792628005145 2013 11 "ABKIT" 229202.33 229202.33 . 1 2008 2008 5 57300.58370722538 2014 11 "ABKIT" 124664.38 124664.38 . 1 2008 2008 6 31166.09542619379 2012 12 "ABL MEDICAL" 5597.375 . 5597.375 1 2012 2012 0 1399.3437130821144 2013 12 "ABL MEDICAL" 46875.85 46875.85 . 1 2012 2012 1 46875.852646417305 2014 12 "ABL MEDICAL" 92413.03 92413.03 . 1 2012 2012 2 92413.03425338621 2015 12 "ABL MEDICAL" 120786.42 120786.42 . 1 2012 2012 3 120786.41920184325 2004 13 "ACCENTIA PHARM" 3551802 3551802 . 1 2002 2004 2 3551802.0606248565 2005 13 "ACCENTIA PHARM" 5632205 5632205 . 1 2002 2004 3 5632204.502071029 2006 13 "ACCENTIA PHARM" 3682722 6138993 1226450.6 2 2002 2004 4 6445605.7851592805 2007 13 "ACCENTIA PHARM" 11139598 11139598 . 2 2002 2004 5 5607216.736637451 end
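A minimal sketch of the simplest route (variable names from the -twoway- call above; the axis titles are only illustrative): divide both variables by one million before plotting, since xlabel()/ylabel() formatting alone only changes how the original values are printed.
Code:
generate avsales_m  = avsales_new_no_outliers / 1e6
generate lagsales_m = lagged_tot_sales / 1e6
twoway (scatter avsales_m lagsales_m) (lfit avsales_m lagsales_m), ///
    ytitle("Average sales, new products (millions)") xtitle("Lagged total sales (millions)")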
the exponential model, using the NLS, poisson and gamma QML estimators
Dear all,
I have a question related to the estimation of the exponential model. This is related to this thread:
https://www.statalist.org/forums/for...sion-nl-vs-reg
The recommendation was to use the Poisson (or gamma quasi-MLEs). If I got it right, this is because of efficiency gains when estimating the parameters.
The issue is that my model is not fully multiplicative; it has a sum. There is an extra parameter "s", which means I cannot just add the independent variables. A simplified version of my structural equation would be:
Y = [ X1^(s-1) X2^(s) + X3^(s-1) X4^(s) ]^b1 [ X1^(-s) X2^(s) + X3^(-s) X4^(s) ]^b2
I could try calibrating "s" (then I can compute the sums) so as to use the Poisson or gamma quasi-MLEs. However, it is not clear this is a superior approach just because there is a gain in efficiency, and I would lose "s". I had a look at the -ppml- and -glm- commands, and they cannot fit this equation.
Is there still an alternative to NLS for estimating my equation? Any feedback would be most welcome.
Best,
Paulo
PS: A less simplified version of my equation is
Y_i = [ (p_ij)^(s-1) (q_j)^(s) + (p_ik)^(s-1) (q_k)^(s) ]^b1 [ (p_ij)^(-s) (q_j)^(s) + (p_ik)^(-s) (q_k)^(s) ]^b2
where i, j and k can be the three individuals in my model. I think renaming these variables as X1-X4 does not change anything, but just in case.
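For what it is worth, a minimal sketch of how the simplified equation could be handed to -nl- directly, with robust standard errors; the starting values (0.5 and 1) are arbitrary placeholders and the variable names are those used above:
Code:
* a sketch, not a recommendation: NLS on the simplified structural equation,
* with heteroskedasticity-robust standard errors; starting values are guesses
nl (Y = ( X1^({s=0.5}-1) * X2^{s} + X3^({s}-1) * X4^{s} )^{b1=1} * ///
        ( X1^(-{s}) * X2^{s} + X3^(-{s}) * X4^{s} )^{b2=1} ), vce(robust)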
Thursday, November 29, 2018
Sampling using sample command
Hello Everyone,
I have a dataset which has data for 100 households from each of 3 cities. Further, 5 members from each household are listed in the dataset. So in total, I have 1500 (3*100*5) observations. The household members are divided into 20 groups based on certain characteristics (each member is assigned a group number between 1 and 20).
Let's call the variables city, household, member and group.
I want to select (using the sample command or any other efficient method) 20 members from each city (one from each group). My condition is that only one member can be selected from each household.
When I run the following command:
Code:
bysort city group: sample 1, count
I get one member sampled from each group within each city, but this command selects (in some cases) more than one member from the same household.
What I want is: if one member is selected from some group in household1, then no other member in that city should be sampled from household1, while one member from each group should still be selected from each city.
Kindly advise how I can achieve this.
Thank you!
Amit
Incidence rate
Dear All,
I used survival analysis to estimate the incidence of mother-to-child transmission of HIV. I used the command -stptime- after setting the data with -stset-. I would like to know how to write about this calculation in the methodology section of my paper using the survival analysis. Please share if anyone has already written about this in any research paper.
thanks and regards,
Rajaram S
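For readers looking for the computation itself, a minimal sketch of the kind of setup described, with hypothetical variable names (followup_years for follow-up time, hiv_transmitted as the event indicator):
Code:
* hypothetical variable names; rate reported per 100 person-years
stset followup_years, failure(hiv_transmitted)
stptime, per(100)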
Inquire about uniform random-number generation
The textbook (Microeconometrics Using Stata) says that for reproducibility of results it is best to actually set the initial seed by using set seed #; then, if the program is rerun at a later time or by a different researcher, the same results will be obtained.
However, my displayed value is different from the textbook's when I run:
Code:
set seed 10101
scalar u = runiform()
display u
Returns to education regression
Hi everyone, I am new to Stata and have a question about a returns-to-education regression I would like to run. My goal is to find out whether returns to education differ for U.S. citizens vs. non-citizens. This is the functional form that I think would be most appropriate: log of wages = B0 + B1*years of education + B2*male + B3*experience + B4*experience^2 + B5*citizenship. What are your thoughts? Do you recommend I add or subtract anything? Also, I was thinking about adding age and age^2 terms, but the way in which I am calculating experience is (age - years of education - 6); therefore, I thought that would cause collinearity between age and experience. I would be appreciative of any feedback.
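Not a definitive specification, just a sketch of how this could be coded in Stata, with hypothetical variable names (wage, educ, age, male, citizen); the c.educ##i.citizen interaction is one way to let the return to education differ by citizenship:
Code:
* hypothetical variable names; experience built as age - education - 6
gen exper = age - educ - 6
gen lwage = ln(wage)
regress lwage c.educ##i.citizen i.male c.exper##c.exper, vce(robust)
* the educ-by-citizen interaction term measures how the return to education
* differs between citizens and non-citizens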
URGENT help needed with university data analysis coursework
Here's the deal. I am an absolute noob when it comes to using Stata, having used it only once in my life. My coursework is due very soon and I really need help with the questions. It requires me to include each stata command that would be used to answer each question, and an explanation as to how I managed to answer the question.
I would be extremely grateful if anyone with even the slightest knowledge of using Stata could contribute to this post and help me out!
The questions are attached!
How to analyze intergroup interaction in subgroup analysis of meta-analysis (metan, network)
Hi,
I would like to conduct a subgroup analysis of meta-analysis and check intergroup interaction.
I'm using metan or network for the analysis.
Does anyone know if we have an appropriate command or option to get the results of the interaction of subgroup analysis?
I'm using Stata 15.
Thank you
Yoshibobu Kondo
Heckman model
Dear Stata Users,
I am applying the two-step Heckman model, but I am having a problem.
After completing the Heckman model, as a further check I tried to fit the probit model (the selection equation of the Heckman procedure) on its own.
However, even though I use exactly the same variables, the coefficients of some independent variables and the p-values are completely different from those obtained with Heckman.
I have used Heckman in the past, but it is the first time I have met this problem.
If someone can help, I will strongly appreciate it.
thanks in advance
Nicola
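A minimal sketch of the comparison being described, with hypothetical variable names (z1 and z2 stand for the exclusion restrictions in the selection equation), in case it helps others reproduce the issue:
Code:
* hypothetical variable names: y observed only when s == 1
heckman y x1 x2, select(s = x1 x2 z1 z2) twostep
probit s x1 x2 z1 z2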
statistical significance
Hi, how do I check in Stata the statistical significance of the relationship between price and country of origin (foreign) in the auto dataset shipped with Stata?
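A minimal sketch of two simple ways to test this in the shipped auto dataset, where foreign records the car's origin:
Code:
sysuse auto, clear
ttest price, by(foreign)      // two-group mean comparison
regress price i.foreign       // the same idea as a regression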
How to test correlation between two variables – Panel data
Hi all!
I wonder how it is possible to test for correlation between two variables in panel data. What is the usual practice in this case? Do I run a regular panel regression between these two variables, or something else?
Thanks!
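A minimal sketch of both options, assuming hypothetical panel identifiers id and year and variables x and y:
Code:
xtset id year
pwcorr x y, sig                   // pooled (overall) correlation
xtreg y x, fe vce(cluster id)     // within-panel association from a fixed-effects regression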
xtoprobit vs xtreg, fe
I want to run a fixed effects regression model in stata using panel data to examine the change in individuals' responses over time. My dependent variable is ordinal so I was planning on using xtoprobit, however, I realize this would be using random effects.
Should I use xtoprobit even though it uses random effects? Or would it be best to run a linear regression with fixed effects (xtreg, fe)? Thank you
Add Confidence intervals to median spline
I would like to add 95% confidence intervals to a median spline. It does not appear to be an option of -mspline-, and the other confidence-interval shading options seem to be determined by a regression plot, for example:
Code:
use http://www.stata-press.com/data/r13/auto, clear
tw (qfitci mpg weight, nofit fintensity(10)) ///
   (scatter mpg weight, msize(*.5)) ///
   (mspline mpg weight)
Is there a way to have the shading determined by the spline as opposed to the quadratic regression?
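Not exactly the same as a median spline, but one possibility (shown here only as a sketch) is -lpolyci-, whose shaded band comes from the local-polynomial smoother itself rather than from a quadratic fit:
Code:
use http://www.stata-press.com/data/r13/auto, clear
tw (lpolyci mpg weight) (scatter mpg weight, msize(*.5))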
Using maps to present incidence rates
Hi everyone
I have calculated incidence rates (per 100,000 person-years) for disease X by region in Norway and want to present them in a map with different colours. I know that this is possible but have no clue how to do it.
Any suggestion is welcome.
Gerhard
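A minimal sketch of the usual -spmap- workflow (spmap and shp2dta are from SSC); the shapefile name, region identifier and rates dataset below are hypothetical placeholders:
Code:
* convert a (hypothetical) shapefile of Norwegian regions, merge in rates, and map them
shp2dta using norway_regions.shp, database(nor_db) coordinates(nor_coord) genid(id) replace
use nor_db, clear
merge 1:1 id using incidence_rates   // hypothetical dataset holding a variable "incidence"
spmap incidence using nor_coord, id(id) fcolor(Reds) clmethod(quantile)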
Hausman test
Hey, when running the Hausman test, should I include all variables? So the dependent, independent, moderator (how? with #?) and control variables?
Do I need to define them somehow?
Because if I run the test like that I get the note: the rank of the differenced variance matrix (4) does not equal the number of coefficients being tested (5); be sure this is what you expect, or there may be problems computing the test. Examine the output of your estimators for anything unexpected and possibly consider scaling your variables so that the coefficients are on a similar scale.
Thank you!
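A minimal sketch of the usual sequence, with hypothetical variable names; the moderator enters through a # interaction, and the same full set of regressors (including controls) goes into both the FE and RE runs:
Code:
xtreg y x1 x2 c.x1#c.moderator control1 control2, fe
estimates store fixed
xtreg y x1 x2 c.x1#c.moderator control1 control2, re
estimates store random
hausman fixed random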
Use e(selected0) of Lasso Regression in another regression
Hello,
I ran a Lasso Regression and found the best regressors:
rlasso a b c d e f
Assume b, c and d are selected as the best regressors. I can access the list with -di e(selected0)-. My question is how I can run another regression and use those selected regressors there;
reg a e(selected0)
returns an error.
Thanks.
Sam
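One way that may work is to expand the stored result before passing it to -regress-, for example (a sketch, assuming -rlasso- from lassopack really does leave the selected list in e(selected0)):
Code:
rlasso a b c d e f
local selected "`e(selected0)'"   // copy the stored list into a local macro
display "`selected'"
regress a `selected'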
Extract Year from Date (11985 - 122018)
Hello, I am working with the BRFS data. I am interested in using a variable - Recent HIV testing.
The variable is recorded in a month-and-year format, as in 11985 (for January 1985) and 122018 (for December 2018).
I want to create a new variable that just records the years from 1985 thru 2018.
I tried the todate command:
Code:
todate HIVTSTD3, gen(hivtestdate) p(mmyyyy)
But this only gives me the year with m1, m3, etc., as in 1983m12.
I will appreciate any help to accomplish this.
thanks - cY
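If hivtestdate is the monthly (%tm) date created by -todate-, a minimal sketch of one way to pull out the calendar year is:
Code:
* convert the monthly date to a daily date, then extract the year
gen hivtestyear = year(dofm(hivtestdate))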
Inconsistency of data in national survey
Good morning. Sorry, I am dealing with an inconsistency between the data generated by me and the data reported in the official report, by region. My goal is to present a map by department, but my results differ from the report's by +/- 1%, and the degree of intensity shown using the report's standard differs from mine by that amount. How could this data inconsistency be treated?
Sorry for writing this point here, and I appreciate your response.
Generating data to balance a panel dataset
Hi,
I have a panel from Compustat like the one below in Table 1 (Table 1 is a subset of the data to show three different example issues), where gvkey is the firm-specific identifier, fyear is the reporting year, emp is employment, and dlrsn is the reason the firm dropped out of the dataset.
Table 1
| gvkey | fyear | emp | dlrsn |
| 001 | 1996 | 2 | 02 |
| 001 | 1997 | 3 | 02 |
| 001 | 1998 | 2 | 02 |
| 001 | 1999 | 1 | 02 |
| 002 | 1996 | 4 | 06 |
| 002 | 1997 | 5 | 06 |
| 002 | 1998 | 4 | 06 |
| 002 | 1999 | 3 | 06 |
| 002 | 2000 | 3 | 06 |
| 002 | 2001 | 3 | 06 |
| 002 | 2002 | 3 | 06 |
| 002 | 2003 | 3 | 06 |
| 003 | 1996 | 7 | . |
| 003 | 1997 | 8 | . |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
| 003 | 2016 | 14 | . |
I need employment data for each firm all the way up to and including 2016 (as shown in Table 2). However, many firms drop out of the dataset (e.g., because of bankruptcy). For such firms, I want to generate employment numbers for all years from the last date they reported, going up to and including 2016, using the following methodology:
- If dlrsn is 02 or 03, set the employment number to zero from the first year after the last reporting year, going up to and including 2016. For example, in Table 2, firm 001 reports up to 1999; I would like to generate data for fyears 2000-2016 with employment set to 0, because dlrsn is 02.
- If dlrsn is 01, 04, 05, 06, 07, 09, 10 or 20, use the last reported employment number for all years after the last reported year. For example, in Table 2, firm 002 reports up to 2003; I would like to generate data for fyears 2004-2016 with employment set equal to the last available employment number (i.e., 4), because dlrsn is 06.
- If the firm does not drop out of the dataset, nothing should change.
Table 2
| gvkey | fyear | emp | dlrsn |
| 001 | 1996 | 2 | 02 |
| 001 | 1997 | 3 | 02 |
| 001 | 1998 | 2 | 02 |
| 001 | 1999 | 1 | 02 |
| 001* | 2000* | 0* | 02* |
| 001* | 2001* | 0* | 02* |
| 001* | 2002* | 0* | 02* |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
| 001 | 2016 | 0* | 02 |
| 002 | 1996 | 4 | 06 |
| 002 | 1997 | 5 | 06 |
| 002 | 1998 | 4 | 06 |
| 002 | 1999 | 3 | 06 |
| 002 | 2000 | 3 | 06 |
| 002 | 2001 | 2 | 06 |
| 002 | 2002 | 3 | 06 |
| 002 | 2003 | 4 | 06 |
| 002* | 2004* | 4* | 06* |
| 002* | 2005* | 4* | 06* |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
| 002* | 2016* | 4* | 06* |
| 003 | 1996 | 7 | . |
| 003 | 1997 | 8 | . |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
| 003 | 2016 | 14 | . |
Essentially, I want to get from Table 1 to Table 2 and would very much appreciate any advice (note there are thousands of firms with different dates; the above is just an example). I tried the following code but got a bit stuck:
Code:
gen dldteyear=year(dldte)
bysort gvkey: egen lastdate=max(fyear)
expand yeardiff if fyear==lastdate
The code was able to duplicate the last reported date in Table 1 the correct number of times, but then I was a bit stuck on how to do the next step. I was thinking of replacing the years (because they would have to be made consecutive) and then replacing employment, but this started to get a bit messy. I am sure there is probably a better approach than the one I am taking, which seems rather mechanical.
Thanks in advance for all support.
Best,
Ali
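Not Ali's code, just a minimal sketch of one way to get from Table 1 to Table 2 with -expand-, assuming dlrsn is numeric (2, 3, 6, ...) and missing for firms that never drop out; adjust the tests if it is stored as a string such as "02":
Code:
* flag each firm's last reported year
bysort gvkey (fyear): gen byte islast = (_n == _N)
* duplicate that row once for every missing year through 2016
expand 2016 - fyear + 1 if islast & fyear < 2016 & !missing(dlrsn), gen(newobs)
* give the duplicated rows consecutive years after the last reported one
bysort gvkey newobs (fyear): replace fyear = fyear + _n if newobs
* bankruptcy-type drop reasons: employment goes to zero in the generated rows
replace emp = 0 if newobs & inlist(dlrsn, 2, 3)
* other drop reasons keep the last reported emp, which expand has already copied
drop islast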
Getting different results from running same exact code - why?
I am dropping observations based on various if statements written in a do file. I have all the code in the do file highlighted and run it. All I do is "clear" and "log close" in the command window, and hit run again without touching the highlighted code or even reselecting the code. I've run the code 10 times in a row this way, and sometimes I get n=1620 and sometimes I get n=1621. No code has changed between each time I run it, so I am positive that Stata is producing different results when given the exact same input. Why is this happening?
Running Diebold Mariano test using panel data.
Hi everyone,
I am totally new here and very new to Stata; nice to meet you all. This may be a very simple question, and I am sorry if it makes you feel you are wasting your time; your help would be very much appreciated. I have panel data with 320 financial products and 20 quarters of data. There are two sets of pricing predictions and the real price trend for each product. The goal of the test is to see which pricing function predicted the real price better in sample. As far as I know and have tried, the DM test can only run on time-series data, so I tried to run the test separately for each product as a time series, using a loop and the statsby statement. The code is like the following; Stata says there are repeated time values in sample, but I double-checked that there is no duplicated time for each id.
Code:
forvalues i = 1(1)320 {
    if id == `i' {
        tsset quarter
        dmariano ln_price ln_VC ln_VT
    }
}
Can someone point me in the right direction? Thanks.
Boheng
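This is only a sketch, but one likely fix is to make sure -tsset- and -dmariano- only ever see one product's observations at a time, for example by preserving and keeping each id in turn (dmariano is from SSC; variable names as in the post):
Code:
forvalues i = 1/320 {
    preserve
    keep if id == `i'
    tsset quarter
    dmariano ln_price ln_VC ln_VT
    restore
}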
Latent class and standard errors
Dear Stata users,
I am running a latent class analysis with gsem command. I had some problems on convergence reported in a previous post, but I've solved them.
Now my problem is the following: my model has 6 classes, 11 variables and 6 covariates in the membership functions. Everything works fine, and the model converges quite quickly. The only problem is that for one class, and only for one variable of the membership function (a very important dummy variable equal to 1 if the person is retired), I have no value for the standard error. The coefficient is -702.55, so very high in absolute terms. I suppose this means that for this item the probability is very close to 0.
In this situation, are the values of all the other parameters reliable or not? I mean, can I still use my results?
Thank you
hybrid model pseudo-panel.
hello!
this might sound like a silly question, but I was wondering if one might be able to fit a hybrid model (xthybrid) also when dealing with a pooled cross-section. Specifically, I have individuals within countries over three yearly waves (the individuals are not the same).
I can run a FE model using dummies for countries and years, but would like to do a bit more.
I tried using
Code:
xtset country year
which leads nowhere.
Dynamic panel data model with xtabond2/ivreg2/Difference-in-Hansen test
Dear all,
I am using an unbalanced panel data with T=11 and N=200,000 to estimate a dynamic panel data model with the xtabond2 command (with Stata 14.1).
So far, I have assumed that the lag of my dependent variable (L.y) is endogenous and all of my other control variables are exogenous. When I estimate the model using xtabond2 and a two-step system GMM with the underlying assumptions, the coefficients are alright but the AR(2)-test is pretty weak and the Hansen-test doesn’t show the desired result. I dropped the first two years manually and I used the suboption orthogonal since I have gaps in my data.
Do you think that my specification in xtabond2 is correct, or did I make some mistakes that I didn't recognize so far?
Code:
xtabond2 y L.y x1 x2 x3 x4 x5 x6 x7 yr3-yr11, twostep robust orthogonal ///
    gmm(L.y, lag(2 6) equation(both)) ///
    iv(x1 x2 x2 x4 x5 x6 x7, equation(both)) ///
    iv(yr3-yr11, equation(level))
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
Difference-in-Sargan/Hansen statistics may be negative.
Dynamic panel-data estimation, two-step system GMM ------------------------------------------------------------------------------
| Corrected
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y |
L1. | .9530541 .0045813 208.03 0.000 .9440749 .9620332
|
x1 | .0009905 .0030752 0.32 0.747 -.0050369 .0070178
x2 | -.001352 .0025792 -0.52 0.600 -.0064072 .0037032
x3 | .0043922 .0084974 0.52 0.605 -.0122625 .0210468
x4 | -.0003227 .0000352 -9.18 0.000 -.0003917 -.0002538
x5 | -.0012368 .0007791 -1.59 0.112 -.0027639 .0002903
x6 | .0751949 .0200101 3.76 0.000 .0359758 .1144139
x7 | -.0035094 .0006449 -5.44 0.000 -.0047734 -.0022453
yr3 | .3537909 .0098877 35.78 0.000 .3344114 .3731705
yr4 | .3586882 .0052821 67.91 0.000 .3483354 .369041
yr5 | .1584298 .0051346 30.86 0.000 .1483661 .1684935
yr6 | .1583574 .0041391 38.26 0.000 .1502449 .1664698
yr7 | .1061737 .0038656 27.47 0.000 .0985972 .1137502
yr8 | .1504396 .0037814 39.78 0.000 .1430281 .157851
yr9 | .1798849 .0034038 52.85 0.000 .1732137 .1865561
yr10 | .1689228 .0034978 48.29 0.000 .1620672 .1757784
yr11 | .0115624 .0032083 3.60 0.000 .0052743 .0178506
_cons | .1823155 .0733113 2.49 0.013 .038628 .3260029
------------------------------------------------------------------------------
Instruments for orthogonal deviations equation
Standard
FOD.(x1 x2 x2 x4 x5 x6 x7)
GMM-type (missing=0, separate instruments for each period unless collapsed)
L(2/6).L.y
Instruments for levels equation
Standard
yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 y11
x1 x2 x2 x4 x5 x6 x7
_cons
GMM-type (missing=0, separate instruments for each period unless collapsed)
DL.L.y
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -58.80 Pr > z = 0.000
Arellano-Bond test for AR(2) in first differences: z = -1.57 Pr > z = 0.117
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(36) = 313.40 Prob > chi2 = 0.000
(Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(36) = 292.69 Prob > chi2 = 0.000
(Robust, but weakened by many instruments.)
Difference-in-Hansen tests of exogeneity of instrument subsets:
GMM instruments for levels
Hansen test excluding group: chi2(28) = 277.21 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(8) = 15.48 Prob > chi2 = 0.050
iv(x1 x2 x2 x4 x5 x6 x7)
Hansen test excluding group: chi2(30) = 275.27 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(6) = 17.42 Prob > chi2 = 0.008
iv(yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11, eq(level))
Hansen test excluding group: chi2(27) = 188.63 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(9) = 104.06 Prob > chi2 = 0.000
When I specify my IVs separately, I get similar results for the coefficients but different results for the Difference-in-Hansen test:
Code:
Difference-in-Hansen tests of exogeneity of instrument subsets:
GMM instruments for levels
Hansen test excluding group: chi2(29) = 279.04 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(8) = 14.31 Prob > chi2 = 0.074
iv(x1)
Hansen test excluding group: chi2(36) = 292.79 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(1) = 0.56 Prob > chi2 = 0.454
iv(x2)
Hansen test excluding group: chi2(36) = 292.54 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(1) = 0.81 Prob > chi2 = 0.368
iv(x3)
Hansen test excluding group: chi2(36) = 292.69 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(1) = 0.66 Prob > chi2 = 0.417
iv(x4)
Hansen test excluding group: chi2(36) = 293.14 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(1) = 0.21 Prob > chi2 = 0.643
iv(x5)
Hansen test excluding group: chi2(36) = 293.26 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(1) = 0.09 Prob > chi2 = 0.766
iv(x6)
Hansen test excluding group: chi2(36) = 293.26 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(1) = 0.10 Prob > chi2 = 0.757
iv(x7)
Hansen test excluding group: chi2(36) = 293.35 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(1) = 0.00 Prob > chi2 = 0.982
iv(yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11, eq(level))
Hansen test excluding group: chi2(28) = 188.71 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(9) = 104.64 Prob > chi2 = 0.000
I am struggling with the correct interpretation of the Difference-in-Hansen test, since the 'Hansen test excluding group' is always highly rejected whereas the 'Difference' test is not.
I am also not sure if all of my controls are really exogenous or if they're predetermined or even endogenous.
Can I use the ivreg2 command to test if my controls are exogenous? So for example:
Code:
xi: ivreg2 y L.y x2 x3 x4 x5 x6 x7 (x1=L.x1) i.yr, gmm2s robust cluster(ID) endogtest(x1)
Do you have any suggestions for how I could fix my regression command so that the AR(2) and Hansen tests show the desired results?
Thanks a lot in advance for your help; any help is highly appreciated.
Kind regards,
Ferdi
How to create a time dimension in one cross-sectional dataset with individual ID, more than one job, each with a starting and ending year
How do I create a time dimension in a cross-sectional dataset with an individual ID and more than one job per person, each job with a starting year and an ending year, and possible overlap in years between jobs?
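A minimal sketch of the usual -expand- approach, with hypothetical variable names (id, jobnum, startyr, endyr), which turns each job spell into one observation per calendar year:
Code:
* one row per job becomes one row per job-year
expand endyr - startyr + 1
bysort id jobnum: gen year = startyr + _n - 1
Overlapping jobs then show up as more than one record for the same person-year, which can be flagged or collapsed afterwards.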
Why would I have different numbers of cases between models with MI'd data (possibly because of collinear dependencies of one variable)?
Hi Statalist,
I am running three logistic models with the same DV with Stata 12.
Model 1 has all variables but the control variables.
Model 2 has all variables.
Model 3 has only the significant variables (only 3).
The regression results say Model 1 and 2 have 145 observations, but model 3 has 149 observations (which are all my cases). I wish to know why.
I did multiple imputation on all variable with missing data, so I do not think listwise deletion due to missing data should be the issue.
But one thing I notice that differs between (a) models 1 and 2 and (b) model 3 is that model 3 does not include one binary variable which, in the model 1 and 2 output, has a logistic coefficient of 0 and an odds ratio of 1 with both SEs omitted. I know this indicates that the variable is collinear with another and so is dropped, but I wish to know why this would reduce my number of cases and whether there is anything I can do about it.
Or maybe I have a different number of cases between the models for another reason?
I know that it is best practice to have the same number of observations in all models, so it would be great to get advice on how to resolve this.
I provide my output below:
model 1
. mi estimate, or: logistic passportdenied ethnicmin foreign intervention democracy social religion independence
Multiple-imputation estimates Imputations = 40
Logistic regression Number of obs = 145
Average RVI = 0.0000
Largest FMI = 0.0000
DF adjustment: Large sample DF: min = 1.68e+67
avg = 1.68e+67
max = .
Model F test: Equal FMI F( 6, 1.2e+69)= 1.15
Within VCE type: OIM Prob > F = 0.3295
--------------------------------------------------------------------------------
passportdenied | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
ethnicmin | 2.262649 2.020946 0.91 0.361 .3929555 13.02839
foreign | .5196286 .353581 -0.96 0.336 .1369284 1.971935
intervention | 4.023013 2.457557 2.28 0.023 1.214994 13.32076
democracy | 1.264409 1.129343 0.26 0.793 .2195902 7.280522
social | .7929084 .5783657 -0.32 0.750 .1898178 3.312143
religion | 1.325323 1.68382 0.22 0.825 .109868 15.98718
independence | 1 (omitted)
_cons | .0730259 .0703768 -2.72 0.007 .0110447 .4828366
--------------------------------------------------------------------------------
. *model 2 full model: including control variables + variables of interest
. mi estimate, or: logistic passportdenied bardate numdetained ageatbar male ethnicmin educyear foreign democracy social religion independence
Multiple-imputation estimates Imputations = 40
Logistic regression Number of obs = 145
Average RVI = 0.0447
Largest FMI = 0.1887
DF adjustment: Large sample DF: min = 1111.90
avg = 301944.31
max = 1676960.32
Model F test: Equal FMI F( 10,183109.0)= 0.72
Within VCE type: OIM Prob > F = 0.7038
--------------------------------------------------------------------------------
passportdenied | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
bardate | .9810666 .024959 -0.75 0.452 .9333471 1.031226
numdetained | 1.253062 .4282628 0.66 0.509 .6412875 2.448456
ageatbar | .9462083 .0273604 -1.91 0.056 .894019 1.001444
male | .9228291 .7386526 -0.10 0.920 .1922208 4.430391
ethnicmin | 2.227631 2.148039 0.83 0.406 .3365465 14.74489
educyear | 1.046962 .0832463 0.58 0.564 .8957871 1.223648
foreign | .9347841 .5921365 -0.11 0.915 .2700964 3.235219
democracy | 2.196149 2.257036 0.77 0.444 .2929911 16.4615
social | 1.096905 .8634727 0.12 0.906 .2344802 5.131356
religion | 2.74065 3.729275 0.74 0.459 .1903649 39.45665
independence | 1 (omitted)
_cons | 6.38e+15 3.23e+17 0.72 0.473 4.24e-28 9.59e+58
--------------------------------------------------------------------------------
. * In my simplified, best-fit final model (model 3), I include only intervention and age at bar
. mi estimate, or: logistic passportdenied intervention ageatbar
Multiple-imputation estimates Imputations = 40
Logistic regression Number of obs = 149
Average RVI = 0.0651
Largest FMI = 0.1566
DF adjustment: Large sample DF: min = 1612.71
avg = 421957.67
max = 1261997.47
Model F test: Equal FMI F( 2, 9397.0) = 3.87
Within VCE type: OIM Prob > F = 0.0209
--------------------------------------------------------------------------------
passportdenied | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
intervention | 3.246078 1.791845 2.13 0.033 1.100253 9.57691
ageatbar | .9524412 .0250589 -1.85 0.064 .9045364 1.002883
_cons | .4047996 .3786314 -0.97 0.334 .0646604 2.534206
--------------------------------------------
Problem of handling missing data
Hi everyone,
I am currently working on looking at the impact of intellectual property rights on the Indian pharmaceutical industry. I have a panel data set (secondary data from CMIE) of 350 firms across 28 time periods. However, I am facing a big problem with regard to missing data. Almost all the variables I need to consider in the model (e.g., R&D = f(patents, exports, imported tech, etc.)) have missing data ranging from 10% to 30%. How best would you suggest I handle this problem before undertaking any analysis? Listwise deletion in Stata reduces the number of firms to 68, drastically reducing the sample size.
Is multiple imputation of data when all variables have some missing values a possibility in Stata?
Thank you in advance!
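Yes, -mi impute chained- allows every variable in the imputation model to have missing values. A minimal sketch, with hypothetical variable names and ignoring the panel structure for brevity:
Code:
mi set wide
mi register imputed rd patents exports imported_tech
mi impute chained (regress) rd patents exports imported_tech, add(20) rseed(12345)
mi estimate: regress rd patents exports imported_tech
With panel data it is usually worth adding time indicators (and possibly firm-level means) as predictors in the imputation model, but that goes beyond this sketch.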