Hi, I am doing a repeated cross-sectional analysis over 10 years (discharge_year) of a categorical measure (oc_mix) which has 3 categories (weak, strong, both). I started with a descriptive analysis: I plotted a stacked bar chart and calculated the number and proportion in each category each year. I now want to test whether the upward, downward, or fluctuating trend seen in each category is statistically significant, but I am not sure which test to use. Is it the chi-square test for trend, ologit, or something else, and how do I use it in Stata?
discharge_year   strong   weak   both
2010               1390    500    200
2011               1450    600    300
2012               1500    679    400
2013               1600    800    100
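A minimal sketch of two starting points, assuming the data are in long form (one row per discharge) and that oc_mix is numerically coded; the coding 2 = strong below is hypothetical. Note that an ordered-logit route would require a defensible ordering of weak/strong/both, which these categories may not have.
Code:
* overall association between year and category
tab discharge_year oc_mix, chi2

* trend in one category at a time: model membership in that
* category as a function of year (repeat for weak and both)
gen byte is_strong = (oc_mix == 2)
logit is_strong discharge_year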
Wednesday, November 30, 2022
how to create the start and end date for a year
Dear statalist,
This might be an easy question, but I didn't figure out how to do this. Say I have a set of years, 2010, 2014, 2015, 2019, 2020 etc, I want to create two dates, start date and end date, to account for the first day and last day of the year, e.g., 01jan2010 and 31dec2010, how to create these two variables? Thanks a lot for your help.
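A minimal sketch, assuming the years are stored in a numeric variable named year: the mdy() function builds daily dates from month, day, and year.
Code:
gen startdate = mdy(1, 1, year)
gen enddate   = mdy(12, 31, year)
format startdate enddate %td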
Perform a paired t-test on two subsets of the same variable
I have a dataset that looks like this:
For each variable (var1, var2, var3, etc.), I need to perform a t-test comparing its mean when attribute = 1 against the mean of the whole variable. How can I do this?
var1 | var2 | var3 | attribute |
1 | 0.93 | 0.88 | 1 |
1 | 0.76 | 0.20 | 1 |
1 | 0.40 | 0.18 | 0 |
0 | 0.34 | 0.91 | 1 |
0 | 0.09 | 0.51 | 0 |
... | ... | ... | ... |
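One way to read this is as a one-sample t-test of each variable within the attribute == 1 subset against that variable's overall mean. A minimal sketch (note the subset is part of the whole, so the two means are not independent; a genuinely paired test would need a different setup):
Code:
foreach v of varlist var1 var2 var3 {
    summarize `v', meanonly
    local mu = r(mean)
    ttest `v' == `mu' if attribute == 1
}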
OLS Regression - explanation of coefficients with control variables
Good morning,
I'm doing an OLS regression where the dependent variable is having a health number (Health_Number), a dummy that takes the value 1 if a migrant has the health number and 0 if not, and the independent variable is years since migrating (YSM). The aim is to see whether the years since arrival in the destination country affect possession of a health number, which allows access to healthcare.
For that I used the controls of Destination Network, having a health visa, being a female, age, having completed at least 12 years of schooling and being employed.
I am, however, having a hard time understanding the main coefficient, since it does not change much across specifications (0.070, 0.071, 0.072, 0.071).
Does this mean that the control variables are not explaining much of what's going on?
Or does it mean the controls are cancelling each other out?
Besides, what does it mean if, in some of the columns where I added the different specifications, the main coefficient (the relation between YSM and the dependent variable) loses significance? For instance, instead of 1% significance it has 5% significance, although the value is similar?
Code:
regress Health_Number YSM
outreg2 using Regression0, excel append ctitle(Basic) dec(3)

regress Health_Number YSM Dest_Network
outreg2 using Regression0, excel append ctitle(Network) dec(3)

regress Health_Number YSM Dest_Network Health_Visa
outreg2 using Regression0, excel append ctitle(Having a Health Visa) dec(3)

regress Health_Number YSM Dest_Network Health_Visa Female Age AtLeast_CompletedSecondaryEduc Employed
outreg2 using Regression0, excel append ctitle(Migrant Controls) dec(3)
Thank you
How to write this with a loop (repetitive coding with numerical values that do not follow each other)
Hello everyone,
How can I write this faster?
Any suggestions, please? Is it possible to write a loop, even though the numerical values of the "gan1" variable do not follow each other?
Code:
svy: prop rech if form1==2 & gan1==14
svy: prop rech if form1==2 & gan1==12
svy: prop rech if form1==2 & gan1==8
svy: prop rech if form1==2 & gan1==6
svy: prop rech if form1==2 & gan1==5
svy: prop rech if form1==2 & gan1==4
svy: prop rech if form1==2 & gan1==3
svy: prop rech if form1==2 & gan1==2
svy: prop rech if form1==2 & gan1==1
svy: prop rech if form1==3 & gan1==14
svy: prop rech if form1==3 & gan1==12
svy: prop rech if form1==3 & gan1==8
svy: prop rech if form1==3 & gan1==6
svy: prop rech if form1==3 & gan1==5
svy: prop rech if form1==3 & gan1==4
svy: prop rech if form1==3 & gan1==3
svy: prop rech if form1==3 & gan1==2
svy: prop rech if form1==3 & gan1==1
svy: prop rech if form1==1 & gan1==14
svy: prop rech if form1==1 & gan1==12
svy: prop rech if form1==1 & gan1==8
svy: prop rech if form1==1 & gan1==6
svy: prop rech if form1==1 & gan1==5
svy: prop rech if form1==1 & gan1==4
svy: prop rech if form1==1 & gan1==3
svy: prop rech if form1==1 & gan1==2
svy: prop rech if form1==1 & gan1==1
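A minimal sketch: because the gan1 values are a fixed, non-consecutive set, they can simply be listed in a numlist; the loop runs the same 27 estimations in the same order as above.
Code:
foreach f of numlist 2 3 1 {
    foreach g of numlist 14 12 8 6 5 4 3 2 1 {
        svy: prop rech if form1 == `f' & gan1 == `g'
    }
}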
Thank you so much.
Michael
Two way graph with multiple lines in a loop
Hello everyone, I am trying to draw a graph with multiple lines. What I want is a single graph that shows diffmain, diff4, diff5, diff21, and diff34, as in the image. I am trying to do it with a loop, but it does not work. Does anyone have an idea of how to run it?
Code:
levelsof state if (state==4 | state==5 | state==21 | state==34), local(levels)
4 5 21 34

use "PS2_data", clear
levelsof state if (state==4 | state==5 | state==21 | state==34), local(levels)
use "total_results34", clear
foreach l of local levels {
    drop if rmspe`l' >= 2*rmspemain
    twoway (line diffmain _time, color(black)), name(gmain, replace)
    twoway ((line diff`l' _time, color(gray)), ///
        xline(1989, lcolor(gray) lpattern(dash)) ///
        yline(0, lcolor(gray) lpattern(dash))), name(g`l', replace)
    graph combine gmain g`l'
}
Code:
_time rmspemain diffmain rmspe4 diff4 rmspe5 diff5 rmspe21 diff21 rmspe34 diff34) 1970 1.954279 4.857199 2.680468 7.661901 2.1965363 6.316099 3.5513296 10.3319 3.185257 9.161399 1971 1.954279 1.8468013 2.680468 4.467302 2.1965363 1.678701 3.5513296 4.8542 3.185257 4.4762006 1972 1.954279 -.8392038 2.680468 2.714297 2.1965363 .7133963 3.5513296 -2.419502 3.185257 -2.767302 1973 1.954279 -2.0618958 2.680468 2.5514026 2.1965363 -1.788496 3.5513296 -1.9223986 3.185257 -1.2888986 1974 1.954279 -.3828028 2.680468 1.7265977 2.1965363 -.2369025 3.5513296 -.5969042 3.185257 .1664961 1975 1.954279 .5749984 2.680468 .24679898 2.1965363 .3920981 3.5513296 1.0028981 3.185257 1.4219983 1976 1.954279 .3586017 2.680468 .6195006 2.1965363 -.09239785 3.5513296 -.3425991 3.185257 .9643013 1977 1.954279 1.5313005 2.680468 1.0875012 2.1965363 1.366001 3.5513296 -.9009992 3.185257 .35590065 1978 1.954279 2.3699 2.680468 1.039699 2.1965363 2.0144997 3.5513296 .8982986 3.185257 1.6601986 1979 1.954279 -1.4309988 2.680468 -2.4031 2.1965363 -1.730199 3.5513296 -1.1398989 3.185257 -.5653989 1980 1.954279 -.13320343 2.680468 .1253957 2.1965363 -.3608031 3.5513296 -1.229802 3.185257 -1.349701 1981 1.954279 -2.0308006 2.680468 -1.513 2.1965363 -2.757701 3.5513296 -2.3394022 3.185257 -2.7693014 1982 1.954279 -.9976991 2.680468 -.5988988 2.1965363 -1.8019996 3.5513296 -2.633698 3.185257 -2.3449986 1983 1.954279 -.9086969 2.680468 1.0149046 2.1965363 -1.1320968 3.5513296 -4.579699 3.185257 -4.761999 1984 1.954279 1.3141013 2.680468 3.783601 2.1965363 1.093001 3.5513296 -4.6354966 3.185257 -3.2002964 1985 1.954279 -.8733967 2.680468 .3187043 2.1965363 -1.5047966 3.5513296 -3.321697 3.185257 -2.688096 1986 1.954279 -1.2375056 2.680468 .05029415 2.1965363 -1.1422058 3.5513296 -3.9199016 3.185257 -3.229702 1987 1.954279 -3.6921985 2.680468 -2.598498 2.1965363 -4.066399 3.5513296 -3.0094006 3.185257 -2.0427008 1988 1.954279 -2.3528 2.680468 -2.7908006 2.1965363 -1.5935994 3.5513296 -2.8035014 3.185257 -1.9126003 1989 1.954279 -7.688198 2.680468 -9.372597 2.1965363 -7.157198 3.5513296 -5.6707 3.185257 -4.5439997 1990 1.954279 -9.518498 2.680468 -9.311898 2.1965363 -9.742398 3.5513296 -10.726597 3.185257 -11.082798 1991 1.954279 -13.776502 2.680468 -13.709203 2.1965363 -13.9996 3.5513296 -18.012802 3.185257 -16.981604 1992 1.954279 -13.3233 2.680468 -14.2893 2.1965363 -13.5949 3.5513296 -18.1126 3.185257 -17.167501 1993 1.954279 -17.057299 2.680468 -17.5239 2.1965363 -17.500498 3.5513296 -21.837196 3.185257 -20.791197 1994 1.954279 -20.9162 2.680468 -21.6492 2.1965363 -21.8273 3.5513296 -26.5969 3.185257 -27.2637 1995 1.954279 -19.8731 2.680468 -22.1855 2.1965363 -20.6644 3.5513296 -24.5541 3.185257 -23.0643 1996 1.954279 -21.0376 2.680468 -23.1864 2.1965363 -21.874 3.5513296 -24.3403 3.185257 -22.0078 1997 1.954279 -21.4709 2.680468 -24.2357 2.1965363 -22.5902 3.5513296 -24.3264 3.185257 -21.6234 1998 1.954279 -19.1829 2.680468 -20.5654 2.1965363 -20.3489 3.5513296 -26.0876 3.185257 -25.0846 1999 1.954279 -24.5438 2.680468 -26.1011 2.1965363 -25.6143 3.5513296 -27.8567 3.185257 -27.7196 2000 1.954279 -24.2594 2.680468 -25.1009 2.1965363 -25.0719 3.5513296 -27.6924 3.185257 -26.4853 end
using newey west for heteroskedasticity and autocorrelation in vecm
So, I am trying to run a regression that requires me to log the variables and take first differences to render them stationary. That is why I am using a VECM.
I ran the diagnostic tests, and everything is fine except that I have heteroskedasticity and autocorrelation. So I tried Newey-West SEs in Stata, and that fixed the problem. However, I have one question. Should I run, say,
newey d.log(y) d.log(x1) d.log(x2) lag() or only newey log(y) log(x1) log(x2) lag() to follow the VECM?
Thank you in advance!
Tuesday, November 29, 2022
creating a dummy variable based on percentage
In the following sample dataset, US House election results are given for 2002-2020. candidatevotes indicates how many votes the candidate representing the party variable (democrat, republican, green, independent) received, and the totalvotes variable indicates how many votes were cast in that state-district.
I want to create an indicator of the incumbent House Representative being of the same party as the President (the democrat_pres variable tells whether the year has a Democrat president or not).
Also, I want to create a competitive indicator which will hold 1 if the Democratic vote share is 40-45%, 2 if it is 46-50%, 3 if it is 51-55%, 4 if it is 55-60%, and 5 if it is above 60%.
Can anyone kindly guide me on how I can do the above?
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int year str20 state byte(district state_fips) str47 party long(candidatevotes totalvotes) float democrat_pres 2016 "ALABAMA" 1 1 "REPUBLICAN" 208083 215893 1 2016 "ALABAMA" 1 1 "" 7810 215893 1 2016 "ALABAMA" 2 1 "REPUBLICAN" 134886 276584 1 2016 "ALABAMA" 2 1 "DEMOCRAT" 112089 276584 1 2016 "ALABAMA" 2 1 "" 29609 276584 1 2016 "ALABAMA" 3 1 "DEMOCRAT" 94549 287104 1 2016 "ALABAMA" 3 1 "REPUBLICAN" 192164 287104 1 2016 "ALABAMA" 3 1 "" 391 287104 1 2016 "ALABAMA" 4 1 "REPUBLICAN" 235925 239444 1 2016 "ALABAMA" 4 1 "" 3519 239444 1 2016 "ALABAMA" 5 1 "REPUBLICAN" 205647 308326 1 2016 "ALABAMA" 5 1 "DEMOCRAT" 102234 308326 1 2002 "CALIFORNIA" 6 6 "REPUBLICAN" 62052 209563 0 2002 "CALIFORNIA" 6 6 "LIBERTARIAN" 4936 209563 0 2002 "CALIFORNIA" 7 6 "REPUBLICAN" 36584 138376 0 2002 "CALIFORNIA" 7 6 "DEMOCRAT" 97849 138376 0 2002 "CALIFORNIA" 7 6 "LIBERTARIAN" 3943 138376 0 2002 "CALIFORNIA" 8 6 "REPUBLICAN" 20063 160441 0 2002 "CALIFORNIA" 8 6 "LIBERTARIAN" 2659 160441 0 2002 "CALIFORNIA" 8 6 "GREEN" 10033 160441 0 2002 "CALIFORNIA" 8 6 "DEMOCRAT" 127684 160441 0 2002 "CALIFORNIA" 8 6 "" 2 160441 0 2002 "CALIFORNIA" 9 6 "DEMOCRAT" 135893 166917 0 2002 "CALIFORNIA" 9 6 "" 6 166917 0 2002 "CALIFORNIA" 9 6 "LIBERTARIAN" 5685 166917 0 2002 "CALIFORNIA" 9 6 "REPUBLICAN" 25333 166917 0 2002 "CALIFORNIA" 10 6 "DEMOCRAT" 126390 167197 0 2002 "CALIFORNIA" 10 6 "LIBERTARIAN" 40807 167197 0 2020 "ARIZONA" 1 4 "REPUBLICAN" 176709 365178 0 2020 "ARIZONA" 1 4 "DEMOCRAT" 188469 365178 0 2020 "ARIZONA" 2 4 "DEMOCRAT" 209945 381054 0 2020 "ARIZONA" 2 4 "REPUBLICAN" 170975 381054 0 2020 "ARIZONA" 2 4 "WRITE-IN (COMMON SENSE MODERATE)" 35 381054 0 2020 "ARIZONA" 2 4 "WRITE-IN (INDEPENDENT)" 99 381054 0 2020 "ARIZONA" 3 4 "REPUBLICAN" 95594 269837 0 2020 "ARIZONA" 3 4 "DEMOCRAT" 174243 269837 0 2020 "ARIZONA" 4 4 "WRITE-IN (INDEPENDENT)" 39 398623 0 2020 "ARIZONA" 4 4 "WRITE-IN (LIBERTARIAN)" 67 398623 0 2020 "ARIZONA" 4 4 "DEMOCRAT" 120484 398623 0 2020 "ARIZONA" 4 4 "WRITE-IN (REPUBLICAN)" 5 398623 0 2020 "ARIZONA" 4 4 "WRITE-IN (INDEPENDENT)" 7 398623 0 2020 "ARIZONA" 4 4 "WRITE-IN (DEMOCRATIC)" 19 398623 0 2020 "ARIZONA" 4 4 "REPUBLICAN" 278002 398623 0 2020 "ARIZONA" 5 4 "REPUBLICAN" 262414 445657 0 end
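A minimal sketch, under several assumptions worth checking: the winner of each district-year is the candidate with the most votes, a non-Democrat president is Republican, and "above 60%" is the intended top bracket. All variable names created below (dem_share, is_winner, same_party, competitive) are illustrative.
Code:
* Democratic vote share per district-year (carried on the DEMOCRAT row)
gen double dem_share = 100 * candidatevotes / totalvotes if party == "DEMOCRAT"

* flag the winning candidate within each district-year
bysort year state district (candidatevotes): gen byte is_winner = (_n == _N)
* winner's party matches the president's party; spread to all rows of the district-year
gen byte match = is_winner & ((party == "DEMOCRAT") == (democrat_pres == 1))
egen same_party = max(match), by(year state district)

* competitive indicator from the Democratic vote share
gen byte competitive = .
replace competitive = 1 if inrange(dem_share, 40, 45)
replace competitive = 2 if dem_share > 45 & dem_share <= 50
replace competitive = 3 if dem_share > 50 & dem_share <= 55
replace competitive = 4 if dem_share > 55 & dem_share <= 60
replace competitive = 5 if dem_share > 60 & !missing(dem_share)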
local macro text with line break
Hi there
Does anyone know if there is a way to spread the text contents of a local macro across multiple lines?
The desired result is to display:
First line
Second line
Code:
local lines """First line" "Second line"""
disp "`lines'"
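One possibility, as a hedged sketch: embed a linefeed character, char(10), in the macro so that display outputs the contents across two lines.
Code:
local lines = "First line" + char(10) + "Second line"
display "`lines'"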
Editing graph with a TIFF file
Hi everyone,
I am trying to find out whether there is a way to edit my graph using Stata's Graph Editor with just the TIF file. The graph was produced in Stata a few months back by my colleague, and I no longer have the data to reproduce and edit it.
Is there a way to import the TIFF file into Stata just so I could edit the graph?
Thanks
pweight with melogit
I have a panel dataset with 260,647 data points from respondents within 43,400 households, observed over between 1 and 11 survey waves. Using this dataset, I am trying to run a logistic random-effects regression. So far I am using xtlogit, which works fine except that it doesn't allow me to use the survey weights needed for valid statistical inference. Additionally, I can't account for the household level in the model, which is why I am currently using robust standard errors. I would prefer to specify a model that allows me to include my survey weight with [pweight] and to explicitly model both the panel structure and the multilevel structure. The model looks like this:
(I am using the union data here because I am not allowed to share the actual data that I am using)
The multilevel structure can be added using xtmelogit, but as I understand it, it is no longer part of official Stata, which makes me a bit suspicious; also, it doesn't allow me to use pweights (and it takes a really long time to run even for the empty model).
A good option seems to be melogit, but for some reason it doesn't converge, even when fitting the empty model without the household level, as soon as I include the weights. The model looks like this (of course, there is no weighting variable in the union dataset; to illustrate, I have created a weighting variable pw = 1, but with this the model runs, so this only shows what my code looks like and unfortunately can't replicate the problem):
With my data, the output from that model looks like this:
Does anyone have an idea what could be the issue here, even without looking at the data? Or maybe there is another way to specify xtlogit-like models in Stata that allows both pweights and another level?
Code:
webuse union.dta, clear
xtset idcode year
xtlogit union, vce(robust) re
Code:
gen pw = 1
melogit union [pw=pw] || idcode:
Fitting fixed-effects model:
Iteration 0: log likelihood = -4.512e+08
Iteration 1: log likelihood = -4.500e+08
Iteration 2: log likelihood = -4.500e+08
Iteration 3: log likelihood = -4.500e+08
Refining starting values:
Grid node 0: log likelihood = .
Grid node 1: log likelihood = .
Grid node 2: log likelihood = .
Grid node 3: log likelihood = .
(note: Grid search failed to find values that will yield a log likelihood value.)
Fitting full model:
initial values not feasible
r(1400);
Monday, November 28, 2022
Formal tests of volatility in STATA17
Dear Statalist:
I have longitudinal data on employees across multiple years and organizations. I've been making a series of line graphs to visualize the trends in employee size across time by organization. These are intuitive, but I wonder if there are formal tests/commands that test volatility. My goal is to see which organizations had the greatest fluctuation in employee size (and salary) over the years. Thanks in advance; here's an example of my data, where "agy" is the organization and "adjbasicpay" the salary.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float year str2 agy long adjbasicpay 1973 "AG" 18080 1973 "AG" 36000 1973 "AG" 34992 1973 "AG" 26671 1973 "AG" 36000 1973 "AG" 33915 1973 "AG" 12056 1973 "AG" 24247 1973 "AG" 30147 1973 "AG" 28267 1973 "AG" 34971 1973 "AG" 36000 1973 "AG" 9235 1973 "AG" 11961 1973 "AG" . 1973 "AG" 15009 1973 "AG" 33899 1973 "AG" 9144 1973 "AG" 36000 1973 "AG" 30486 1973 "AG" 30147 1973 "AG" 16609 1973 "AG" 12634 1973 "AG" 34074 1973 "AG" 9969 1973 "AG" 9874 1973 "AG" 24247 1973 "AG" 21014 1973 "AG" 40000 1973 "AG" 7198 1973 "AG" 14053 1973 "AG" 24247 1973 "AG" . 1973 "AG" 15331 1973 "AG" 36000 1973 "AG" 10002 1973 "AG" 31089 1973 "AG" 14671 1973 "AG" 16609 1973 "AG" 26898 1973 "AG" 13379 1973 "AG" 36000 1973 "AG" 32973 1973 "AG" 17497 1973 "AG" 10528 1973 "AG" 11961 1973 "AG" 8757 1973 "AG" 15331 1973 "AG" 34971 1973 "AG" 38000 1973 "AG" 12373 1973 "AG" 24247 1973 "AG" 15609 1973 "AG" 12501 1973 "AG" 33177 1973 "AG" 12283 1973 "AG" 8591 1973 "AG" 17605 1973 "AG" 15649 1973 "AG" 28267 1973 "AG" 13687 1973 "AG" 31089 1973 "AG" 12283 1973 "AG" 31089 1973 "AG" 34971 1973 "AG" 21671 1973 "AG" 34965 1973 "AG" 13996 1973 "AG" 8299 1973 "AG" 19246 1973 "AG" 23858 1973 "AG" 10234 1973 "AG" 25055 1973 "AG" 13336 1973 "AG" 11297 1973 "AG" 12979 1973 "AG" 33899 1973 "AG" 8299 1973 "AG" 36000 1973 "AG" 12056 1973 "AG" 14603 1973 "AG" 12634 1973 "AG" 31383 1973 "AG" 14928 1973 "AG" 36000 1973 "AG" 9493 1973 "AG" 36000 1973 "AG" 38000 1973 "AG" 10860 1973 "AG" 28267 1973 "AG" 29205 1973 "AG" 34971 1973 "AG" 36000 1973 "AG" 36000 1973 "AG" 33177 1973 "AG" 16609 1973 "AG" 12985 1973 "AG" 23088 1973 "AG" 15945 1973 "AG" 15009 end
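There is no single canonical volatility test here, but a simple descriptive route is to summarize each organization's year-to-year dispersion, for example as a coefficient of variation. A minimal sketch, assuming headcount can be proxied by the number of salary records per agy-year (all created names are illustrative):
Code:
preserve
* one row per organization-year: headcount and mean salary
collapse (count) nemp = adjbasicpay (mean) meanpay = adjbasicpay, by(agy year)
* dispersion across years within organization
collapse (mean) mu_n = nemp (sd) sd_n = nemp ///
         (mean) mu_pay = meanpay (sd) sd_pay = meanpay, by(agy)
gen cv_n   = sd_n / mu_n
gen cv_pay = sd_pay / mu_pay
gsort -cv_n
list agy cv_n cv_pay
restore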
Color changes with PNG export.
On the left, we have the graph I actually want (please ignore the slight transparency difference between legend and data colors). On the right, we have the PNG output from the export command shown in the first code block below, run immediately after the production of the graph. I tried the printcolor setting in the second code block, but this had no effect. I have had no problem producing PNG files with these colors previously. I have also tried exporting to PDF, which gives different colors again than either of these.
Code:
graph export XXXX.png, replace
Code:
set printcolor asis
Generating the "opposing" variable in a long dataset
Hi all,
I am using Stata 17/SE on Mac and I am having trouble generating a variable using another observation within a group.
For context, I am working with tennis data. I have two rank variables: a singles ranking (rank_single) and a doubles ranking (rank_dbls).
The singles rank reflects the player's own rank, while the doubles rank is the average of the team's doubles rankings: egen var = mean(var), by(i).
i refers to the match number; j is the player number, where 1-2 is team 1 and 3-4 is team 2; the Ranking_* variables refer to the original ranking data (MS = men's singles, MD = men's doubles, etc.).
My question is: is there a way to generate the opposing player/team's ranking in this long dataset? (I have this for the tournament seed variable, where t_ refers to the player and o_ to the opponent.)
Any help is appreciated!
Code:
input float i byte(j team p_pos t_tourn_seed o_tourn_seed) int(Ranking_MS Ranking_MD Ranking_WS Ranking_WD) float(rank_single rank_dbls) 367 1 1 1 4 . . . . . . . 367 3 2 1 . 4 . . 1326 . 1326 . 368 1 1 1 . 5 . . 1028 638 1028 638 368 3 2 1 5 . . . 536 626 536 626 369 1 1 1 . 3 . . . . . . 369 3 2 1 3 . . . 484 587 484 587 370 1 1 1 . 5 . . . . . . 370 3 2 1 5 . . . 536 626 536 626 371 1 1 1 . . . . . . . . 371 3 2 1 . . . . . . . . 372 1 1 1 4 . . . . . . . 372 3 2 1 . 4 . . . . . . 373 1 1 1 . . . . 692 . 692 . 373 3 2 1 . . . . . . . . 374 1 1 1 . 7 . . 1326 . 1326 . 374 3 2 1 7 . . . 612 620 612 620 375 1 1 1 4 2 . . . . . . 375 3 2 1 2 4 . . 326 324 326 324 376 1 1 1 . . . . 1326 . 1326 . 376 3 2 1 . . . . 986 . 986 . 377 1 1 1 . 3 . . 999 . 999 . 377 3 2 1 3 . . . 484 587 484 587 378 1 1 1 6 . . . 631 454 631 454 378 3 2 1 . 6 . . 938 . 938 . 379 1 1 1 1 4 . . 187 405 187 405 379 3 2 1 4 1 . . . . . . 380 1 1 1 . . . . 1168 . 1168 . 380 3 2 1 . . . . 825 852 825 852 381 1 1 1 1 5 . . 187 405 187 405 381 3 2 1 5 1 . . 536 626 536 626 382 1 1 1 . 3 . . . . . . 382 3 2 1 3 . . . 484 587 484 587 383 1 1 1 6 . . . 631 454 631 454 383 3 2 1 . 6 . . 915 891 915 891 384 1 1 1 1 . . . 187 405 187 405 384 3 2 1 . 1 . . . . . . 385 1 1 1 . . . . 692 . 692 . 385 3 2 1 . . . . 999 . 999 . 386 1 1 1 . . . . . . . . 386 3 2 1 . . . . 1307 . 1307 . 387 1 1 1 1 . . . 187 405 187 405 387 3 2 1 . 1 . . 925 . 925 . 388 1 1 1 . 2 . . . . . . 388 3 2 1 2 . . . 326 324 326 324 389 1 1 1 . 2 . . 1168 . 1168 . 389 3 2 1 2 . . . 326 324 326 324 390 1 1 1 . 2 . . 999 . 999 . 390 3 2 1 2 . . . 326 324 326 324 391 1 1 1 6 2 . . 631 454 631 454 391 3 2 1 2 6 . . 326 324 326 324 392 1 1 1 . . . . 1028 638 1028 638 392 3 2 1 . . . . 752 . 752 . 393 1 1 1 . 7 . . 1153 . 1153 . 393 3 2 1 7 . . . 612 620 612 620 394 1 1 1 . . . . . . . . 394 3 2 1 . . . . 1119 744 1119 744 395 1 1 1 . . . . 999 . 999 . 395 3 2 1 . . . . . . . . 396 1 1 1 4 . . . . . . . 396 3 2 1 . 4 . . . . . . 397 1 1 1 . . . . 1294 . 1294 . 397 3 2 1 . . . . 938 . 938 . 398 1 1 1 . . . . . . . 907 398 2 1 2 . . . . 1383 907 1383 907 398 3 2 1 . . . . 1101 927 1101 927 398 4 2 2 . . . . . . . 927 399 1 1 1 . . . . . . . . 399 2 1 2 . . . . . . . . 399 3 2 1 . . . . . . . . 399 4 2 2 . . . . . . . . 400 1 1 1 . 2 . . 915 891 915 891 400 2 1 2 . 2 . . 1168 . 1168 891 400 3 2 1 2 . . . . . . . 400 4 2 2 2 . . . . . . . 401 1 1 1 1 3 . . 187 405 187 364.5 401 2 1 2 1 3 . . 326 324 326 364.5 401 3 2 1 3 1 . . . . . 620 401 4 2 2 3 1 . . 612 620 612 620 402 1 1 1 3 . . . . . . 620 402 2 1 2 3 . . . 612 620 612 620 402 3 2 1 . 3 . . . . . . 402 4 2 2 . 3 . . . . . . 403 1 1 1 1 . . . 187 405 187 364.5 403 2 1 2 1 . . . 326 324 326 364.5 403 3 2 1 . 1 . . 536 626 536 626 403 4 2 2 . 1 . . 752 . 752 626 404 1 1 1 4 2 . . 1028 638 1028 638 404 2 1 2 4 2 . . . . . 638 404 3 2 1 2 4 . . . . . . 404 4 2 2 2 4 . . . . . . 405 1 1 1 1 . . . 187 405 187 364.5 405 2 1 2 1 . . . 326 324 326 364.5 405 3 2 1 . 1 . . 999 . 999 . 405 4 2 2 . 1 . . 986 . 986 . 406 1 1 1 . . . . . . . 454 406 2 1 2 . . . . 631 454 631 454 406 3 2 1 . . . . 536 626 536 626 406 4 2 2 . . . . 752 . 752 626 407 1 1 1 . 4 . . 1119 744 1119 665.5 407 2 1 2 . 4 . . 484 587 484 665.5 end
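A minimal sketch for the team-level case, assuming rank_dbls is the value to mirror and that every match has teams coded 1 and 2: collapse to one row per match-team, flip the team identifier so each row describes the other team, and merge back. The name o_rank_dbls is hypothetical.
Code:
preserve
* one row per match-team with the team's doubles rank
collapse (mean) o_rank_dbls = rank_dbls, by(i team)
* flip the team id so each row now carries the opposing team's rank
replace team = cond(team == 1, 2, 1)
tempfile opp
save `opp'
restore
merge m:1 i team using `opp', nogenerate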
convert string yyyy-mm-dd hh:mm:ss to %td format
Hello,
As the title of this question suggests, I have a set of data with the variable "Qtm" in string format, e.g., 2010-01-07 19:16:33. I would like to convert it to 07jan2010 format and ignore the time of day. How can I do this? Thanks a lot for any kind help.
Some data here:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str19 Qtm
"2010-01-07 19:16:33"
"2010-01-13 15:25:27"
"2010-03-02 15:29:59"
"2010-03-15 11:30:11"
"2010-05-08 11:12:47"
"2010-06-01 03:23:08"
"2010-06-02 13:56:21"
"2010-06-12 12:48:23"
"2010-06-29 12:49:02"
"2010-08-11 11:14:59"
"2010-09-01 08:58:56"
"2010-09-16 14:34:23"
"2010-11-16 16:17:54"
"2010-01-05 09:04:30"
"2010-01-05 14:08:51"
"2010-01-16 21:34:55"
"2010-02-02 15:32:56"
"2010-03-03 17:54:07"
"2010-03-04 15:24:07"
"2010-03-04 16:08:45"
end
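A minimal sketch: parse the string into a millisecond clock value, take the day part with dofc(), and apply a %td display format (qclock and qdate are illustrative names).
Code:
gen double qclock = clock(Qtm, "YMDhms")
gen qdate = dofc(qclock)
format qdate %td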
Stata command for weighted M-estimator.
Dear all,
Does Stata have commands or packages that can estimate parameters for weighted M-estimators? For example, how can I estimate beta with the estimating equations below?
[estimating equations shown as an image in the original post]
Best,
Liang
double-hurdle model not feasible
Hi,
I'm using a panel double-hurdle model via the xtdhreg command, but I get the following error:
Your help is appreciated. Thanks,
Code:
Obtaining starting values for full model:

Iteration 0:  log likelihood = 230358.78
Iteration 1:  log likelihood = 230440.79
Iteration 2:  log likelihood = 230440.81

Fitting full model:

initial values not feasible
r(1400);
How I find the wrong data?
Dear Statalist,
I get the note "multiple positive outcomes within groups encountered" after running:
clogit Y asc fish_d2 fish_d3 vet_d2 vet_d3 alc_d2 alc_d3 smk_d2 smk_d3 pqt_d2 pqt_d3, group(ncs) cluster(ID)
Please help me find the IDs that have multiple positive outcomes.
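A minimal sketch: count the positive outcomes per group, then list the observations in the offending groups (npos is an illustrative name).
Code:
* total positive outcomes within each group
egen npos = total(Y), by(ncs)
* show the IDs in groups with more than one positive outcome
list ID ncs Y if npos > 1 & Y == 1, sepby(ncs)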
Best Wishes,
Sukunta
Create Twoway Line Graph Forcing Gaps for Missing Periods
Hi all,
My data:
I would like to plot these data using a twoway line, with two y-axes; one for new positions, one for unemployment. The x-axis should be for time (Datum_n).
I ran the following code:
When I run the code above, both lines are continuous, however I do not want them to be.
The thing is, as is visible from the data extract, there are gaps in Datum_n, and I would like these gaps to be reflected in the line of the following variable: sum_newpositions_bymonth. Basically, I would want the line of sum_newpositions_bymonth to be discontinuous, meaning that it should be "interrupted" whenever there is no "Datum_n" (e.g. over the period 727 to 732) and then, after the gap, start again at the next available date for Datum_n.
Could anyone please let know how to adapt the code to show this discontinuity?
Many thanks in advance!
My data:
Code:
input float(Datum_n total_unem_bymonth sum_newpositions_bymonth)
723 148245 2261
724 150673 4089
725 144790  855
726 143049 5430
727 145249 5507
732 164182 4655
733 162495 5044
734 152841 5753
735 146375 4993
736 138150 4628
737 127136 3637
738 123275 3318
739 121203 3301
740 115404 3811
744 117633 3418
745 113398 4188
746 105133 3700
747  99974 3164
749  87939 3584
end
I ran the following code:
Code:
twoway line sum_newpositions_bymonth Datum_n, yaxis(2) ytitle("Monthly Total New Dual VET Positions") ///
    || line total_unem_bymonth Datum_n, yaxis(1) ytitle("Number of Registered Unemployed Individuals (Monthly Total)")
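A minimal sketch: declare Datum_n as the time variable, use tsfill to insert the missing periods as observations with missing values, and tell each line plot not to connect across missings via cmissing(n).
Code:
tsset Datum_n
tsfill
twoway line sum_newpositions_bymonth Datum_n, yaxis(2) cmissing(n) ///
    || line total_unem_bymonth Datum_n, yaxis(1) cmissing(n)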
Treatment Variation Plot
Hello Statalist!
I was wondering if it would be possible for Stata to plot this kind of variation figure.
This figure shows how the treatment was assigned over time for each unit. What command can I use to make this kind of plot?
Thank you in advance.
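One built-in route, as a hedged sketch with hypothetical variable names unit, year, and a 0/1 treat indicator: plot the panel as a grid of colored squares, one marker per unit-year. (The user-written panelview package on SSC is purpose-built for treatment-variation plots of this kind.)
Code:
twoway (scatter unit year if treat == 0, msymbol(square) mcolor(gs13)) ///
       (scatter unit year if treat == 1, msymbol(square) mcolor(navy)), ///
       legend(order(1 "Untreated" 2 "Treated")) ytitle("Unit") xtitle("Year")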
Sunday, November 27, 2022
Create a matrix or data set from current variables in stata
Hi everyone!
I have a dataset with three variables: province_code (11 province codes), newnsector (14 sector codes), and r_output. In my Stata file, newnsector is shown as "SA", "SB", ..., "SP", but when I use the dataex command, SA is shown as "1", SB as "2", and so on (I do not know why). Now I want to create an 11x14 matrix (dataset) whose values are r_output. The first row of this new matrix holds the r_output values of province code 01 in sector order (SA, SB, SC, SD, SE, SF, SG, SH, SK, SL, SM, SN, SO, SP); similarly, the second row is for province 02, and so on up to province code 17. There is one difficulty: province code 01 has values for all 14 sectors, but province code 02 and some other provinces do not have r_output for all 14 sectors. Therefore, I need to fill in the missing sector codes in those provinces with the value 0 to make sure I get an 11x14 matrix. Currently I am doing this manually in Excel by exporting the file and then copying, pasting, transposing, and filling in the missing sectors with 0; it takes a lot of time. I am wondering whether Stata has a command for this. I tried to do it myself, but without success.
The reason I need the 11x14 matrix is that I have to multiply it by a 14x14 matrix (an input-output table). My true dataset has 63 provinces and 14 sectors. Can you suggest code for the sample dataset and show me how to modify it for the true dataset of 63 provinces? I really appreciate your help. Thank you so much!
Code:
Example generated by -dataex-. To install: ssc install dataex clear input str2 province_code int newnsector float r_output "01" 14 .05285969 "01" 3 .00626575 "01" 13 .17437804 "01" 9 .09589774 "01" 4 .05790508 "01" 6 .0009028614 "01" 7 .0517438 "01" 8 .05596996 "01" 11 .04816558 "01" 5 .15201323 "01" 10 .05520221 "01" 2 .0496197 "01" 1 .02801064 "01" 12 .159773 "02" 9 .0003512017 "02" 5 .000280617 "02" 12 7.206216e-06 "02" 4 .0023777916 "02" 14 5.672023e-06 "02" 7 1.238403e-06 "02" 1 .000064757674 "02" 13 .0001931883 "02" 2 3.4551356e-06 "02" 10 .0001658643 "04" 5 .00004467173 "04" 1 .0002179262 "04" 9 .000660103 "04" 14 9.0871945e-06 "04" 4 .0006858561 "04" 10 .0030890896 "06" 10 .0006495666 "06" 14 4.08849e-06 "06" 5 .00023494294 "06" 9 .00027400945 "06" 2 .000032480897 "06" 12 9.435477e-08 "06" 7 2.440203e-09 "06" 4 .00018620113 "06" 1 .000015656562 "08" 5 .01765398 "08" 2 .0006631739 "08" 4 .006475695 "08" 8 3.717457e-06 "08" 7 .000011507996 "08" 14 1.726268e-06 "08" 10 .003837108 "08" 1 .0005638853 "08" 9 .003737794 "10" 14 6.785504e-06 "10" 12 .00039115755 "10" 1 .0003236186 "10" 4 .0019831005 "10" 2 1.968738e-07 "10" 10 .000279764 "10" 9 .0006842943 "10" 5 .0001658596 "10" 7 .03733616 "11" 9 .0013166956 "11" 14 .00008154935 "11" 10 .00006602735 "11" 1 .00001089103 "11" 5 .00005325684 "11" 4 .00007391685 "12" 4 .0004276246 "12" 10 .000023588744 "12" 1 1.3328968e-06 "12" 5 5.277554e-06 "12" 8 6.246046e-06 "12" 9 .00018962457 "12" 14 1.3077788e-07 "14" 5 .000132034 "14" 12 .000013313354 "14" 10 .00004675482 "14" 14 9.341277e-08 "14" 1 .0027883044 "14" 9 .0024881726 "14" 7 .00022676193 "14" 4 .00009249443 "14" 2 .00008890821 "15" 10 .00013244175 "15" 7 .0003468687 "15" 2 .0005966048 "15" 1 .0002936966 "15" 8 .00008438806 "15" 12 .00007266354 "15" 9 .013388124 "15" 14 .000016993652 "15" 4 .003852513 "15" 5 .002465888 "17" 3 .0005053952 "17" 12 .00006140318 "17" 14 .000034275014 "17" 1 .0008872058 "17" 7 .0007581466 "17" 10 .0011180259 "17" 5 .00037424185 "17" 8 .00012182483 "17" 2 .003683278 "17" 4 .013263932 "17" 13 .0009010206 end label values newnsector nsector label def nsector 1 "SA", modify label def nsector 2 "SB", modify label def nsector 3 "SC", modify label def nsector 4 "SD", modify label def nsector 5 "SE", modify label def nsector 6 "SF", modify label def nsector 7 "SG", modify label def nsector 8 "SH", modify label def nsector 9 "SK", modify label def nsector 10 "SL", modify label def nsector 11 "SM", modify label def nsector 12 "SN", modify label def nsector 13 "SO", modify label def nsector 14 "SP", modify
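A minimal sketch: fillin completes the province-by-sector grid, the new cells are set to 0, and after a wide reshape mkmat builds the matrix. With 63 provinces the same code yields a 63x14 matrix; nothing needs to change.
Code:
* complete the grid; fillin adds missing combinations with r_output missing
fillin province_code newnsector
replace r_output = 0 if missing(r_output)
drop _fillin
* one row per province, one column per sector (j = 1..14, i.e. SA..SP)
reshape wide r_output, i(province_code) j(newnsector)
* build the province-by-sector matrix
mkmat r_output1-r_output14, matrix(R)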
Reshape data
Hi
I have the following data with the variables lat, level, lon, time, air, dup, and time2. The level variable takes only two values, 1000 and 925. I want to convert the data to lat level_1000 level_925 lon time air dup time2.
How can I do this, please?
Thanks
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float lat int level float lon str19 time float(air dup) str8 time2 35 1000 0 "2021-01-01 00:00:00" 283.39996 1 "00:00:00" 35 925 0 "2021-01-01 00:00:00" 277.69998 2 "00:00:00" 35 925 0 "2021-01-02 00:00:00" 277.69998 1 "00:00:00" 35 1000 0 "2021-01-02 00:00:00" 282.69998 2 "00:00:00" 35 1000 0 "2021-01-03 00:00:00" 280.19998 1 "00:00:00" 35 925 0 "2021-01-03 00:00:00" 275.39996 2 "00:00:00" 35 925 0 "2021-01-04 00:00:00" 277.4 1 "00:00:00" 35 1000 0 "2021-01-04 00:00:00" 282.39996 2 "00:00:00" 35 925 0 "2021-01-05 00:00:00" 277.99997 1 "00:00:00" 35 1000 0 "2021-01-05 00:00:00" 283.89996 2 "00:00:00" 35 925 0 "2021-01-06 00:00:00" 279.89996 1 "00:00:00" 35 1000 0 "2021-01-06 00:00:00" 284.69998 2 "00:00:00" 35 925 0 "2021-01-07 00:00:00" 283.6 1 "00:00:00" 35 1000 0 "2021-01-07 00:00:00" 289.7 2 "00:00:00" 35 1000 0 "2021-01-08 00:00:00" 290.49997 1 "00:00:00" 35 925 0 "2021-01-08 00:00:00" 285 2 "00:00:00" 35 925 0 "2021-01-09 00:00:00" 287.2 1 "00:00:00" 35 1000 0 "2021-01-09 00:00:00" 292.19998 2 "00:00:00" 35 925 0 "2021-01-10 00:00:00" 279.99997 1 "00:00:00" 35 1000 0 "2021-01-10 00:00:00" 285.3 2 "00:00:00" 35 925 0 "2021-01-11 00:00:00" 277.69998 1 "00:00:00" 35 1000 0 "2021-01-11 00:00:00" 282.39996 2 "00:00:00" 35 925 0 "2021-01-12 00:00:00" 275.39996 1 "00:00:00" 35 1000 0 "2021-01-12 00:00:00" 280 2 "00:00:00" 35 1000 0 "2021-01-13 00:00:00" 280.99997 1 "00:00:00" 35 925 0 "2021-01-13 00:00:00" 277.7 2 "00:00:00" 35 925 0 "2021-01-14 00:00:00" 279.89996 1 "00:00:00" 35 1000 0 "2021-01-14 00:00:00" 283.19998 2 "00:00:00" 35 925 0 "2021-01-15 00:00:00" 278 1 "00:00:00" 35 1000 0 "2021-01-15 00:00:00" 282.39996 2 "00:00:00" 35 925 0 "2021-01-16 00:00:00" 280.1 1 "00:00:00" 35 1000 0 "2021-01-16 00:00:00" 283.89996 2 "00:00:00" 35 1000 0 "2021-01-17 00:00:00" 282.89996 1 "00:00:00" 35 925 0 "2021-01-17 00:00:00" 278.49997 2 "00:00:00" 35 1000 0 "2021-01-18 00:00:00" 287.69998 1 "00:00:00" 35 925 0 "2021-01-18 00:00:00" 285.8 2 "00:00:00" 35 1000 0 "2021-01-19 00:00:00" 288.7 1 "00:00:00" 35 925 0 "2021-01-19 00:00:00" 285 2 "00:00:00" 35 1000 0 "2021-01-20 00:00:00" 284.4 1 "00:00:00" 35 925 0 "2021-01-20 00:00:00" 282.6 2 "00:00:00" 35 925 0 "2021-01-21 00:00:00" 283.3 1 "00:00:00" 35 1000 0 "2021-01-21 00:00:00" 288.49997 2 "00:00:00" 35 925 0 "2021-01-22 00:00:00" 282.8 1 "00:00:00" 35 1000 0 "2021-01-22 00:00:00" 287.2 2 "00:00:00" 35 925 0 "2021-01-23 00:00:00" 282.49997 1 "00:00:00" 35 1000 0 "2021-01-23 00:00:00" 287.2 2 "00:00:00" 35 925 0 "2021-01-24 00:00:00" 282.8 1 "00:00:00" 35 1000 0 "2021-01-24 00:00:00" 287.9 2 "00:00:00" 35 925 0 "2021-01-25 00:00:00" 284.8 1 "00:00:00" 35 1000 0 "2021-01-25 00:00:00" 289 2 "00:00:00" 35 1000 0 "2021-01-26 00:00:00" 287.89996 1 "00:00:00" 35 925 0 "2021-01-26 00:00:00" 283.8 2 "00:00:00" 35 1000 0 "2021-01-27 00:00:00" 288.7 1 "00:00:00" 35 925 0 "2021-01-27 00:00:00" 284.89996 2 "00:00:00" 35 925 0 "2021-01-28 00:00:00" 287.8 1 "00:00:00" 35 1000 0 "2021-01-28 00:00:00" 291.99997 2 "00:00:00" 35 925 0 "2021-01-29 00:00:00" 289.89996 1 "00:00:00" 35 1000 0 "2021-01-29 00:00:00" 291.1 2 "00:00:00" 35 925 0 "2021-01-30 00:00:00" 288.69998 1 "00:00:00" 35 1000 0 "2021-01-30 00:00:00" 292 2 "00:00:00" 35 925 0 "2021-01-31 00:00:00" 285.59998 1 "00:00:00" 35 1000 0 "2021-01-31 00:00:00" 290.4 2 "00:00:00" 35 1000 0 "2021-02-01 00:00:00" 290 1 "00:00:00" 35 925 0 "2021-02-01 00:00:00" 284.9 2 "00:00:00" 35 925 0 "2021-02-02 00:00:00" 285.6 1 "00:00:00" 35 1000 0 
"2021-02-02 00:00:00" 289.7 2 "00:00:00" 35 925 0 "2021-02-03 00:00:00" 288.19998 1 "00:00:00" 35 1000 0 "2021-02-03 00:00:00" 291.09998 2 "00:00:00" 35 1000 0 "2021-02-04 00:00:00" 291.7 1 "00:00:00" 35 925 0 "2021-02-04 00:00:00" 287.4 2 "00:00:00" 35 1000 0 "2021-02-05 00:00:00" 293.19998 1 "00:00:00" 35 925 0 "2021-02-05 00:00:00" 290.19998 2 "00:00:00" 35 1000 0 "2021-02-06 00:00:00" 293.89996 1 "00:00:00" 35 925 0 "2021-02-06 00:00:00" 289.2 2 "00:00:00" 35 925 0 "2021-02-07 00:00:00" 281.59998 1 "00:00:00" 35 1000 0 "2021-02-07 00:00:00" 286.4 2 "00:00:00" 35 925 0 "2021-02-08 00:00:00" 280.1 1 "00:00:00" 35 1000 0 "2021-02-08 00:00:00" 284.6 2 "00:00:00" 35 925 0 "2021-02-09 00:00:00" 283.2 1 "00:00:00" 35 1000 0 "2021-02-09 00:00:00" 289.09998 2 "00:00:00" 35 1000 0 "2021-02-10 00:00:00" 291.39996 1 "00:00:00" 35 925 0 "2021-02-10 00:00:00" 285.7 2 "00:00:00" 35 925 0 "2021-02-11 00:00:00" 283.69998 1 "00:00:00" 35 1000 0 "2021-02-11 00:00:00" 288.8 2 "00:00:00" 35 925 0 "2021-02-12 00:00:00" 287.9 1 "00:00:00" 35 1000 0 "2021-02-12 00:00:00" 292.8 2 "00:00:00" 35 925 0 "2021-02-13 00:00:00" 283.99997 1 "00:00:00" 35 1000 0 "2021-02-13 00:00:00" 289.59998 2 "00:00:00" 35 1000 0 "2021-02-14 00:00:00" 285.99997 1 "00:00:00" 35 925 0 "2021-02-14 00:00:00" 282.19998 2 "00:00:00" 35 925 0 "2021-02-15 00:00:00" 279.69998 1 "00:00:00" 35 1000 0 "2021-02-15 00:00:00" 284.8 2 "00:00:00" 35 925 0 "2021-02-16 00:00:00" 281.49997 1 "00:00:00" 35 1000 0 "2021-02-16 00:00:00" 284.7 2 "00:00:00" 35 925 0 "2021-02-17 00:00:00" 284.39996 1 "00:00:00" 35 1000 0 "2021-02-17 00:00:00" 285.3 2 "00:00:00" 35 925 0 "2021-02-18 00:00:00" 282.89996 1 "00:00:00" 35 1000 0 "2021-02-18 00:00:00" 284.99997 2 "00:00:00" 35 1000 0 "2021-02-19 00:00:00" 288.69998 1 "00:00:00" 35 925 0 "2021-02-19 00:00:00" 284.8 2 "00:00:00" end
Loop with a synthetic control method
Hi everyone,
I have a dataset at the state level with 39 states, and I want to run a synthetic control regression (as shown) only for states 4, 5, 21, and 34, using the command levelsof ..., local() in a loop. Note that at the end of each regression I have to save the results in a dataset whose name contains the state number: for state 4 the results are saved in dataset ps4, and the loop should likewise save ps5, ps21, and ps34.
Many thanks for your help!
Code:
tsset state year
synth cigsale beer lnincome(1980&1985) age15to24 retprice cigsale(1988) cigsale(1980) cigsale(1975), ///
    trunit(3) trperiod(1989) fig keep(ps4, replace)
Code:
state year cigsale lnincome beer retprice age15to24 29 1970 123.9 . . 39.3 .1831579 32 1970 99.8 . . 39.9 .1780438 10 1970 134.6 . . 30.6 .17651588 21 1970 189.5 . . 38.9 .1615542 14 1970 115.9 . . 34.3 .1851852 27 1970 108.4 . . 38.4 .17545916 22 1970 265.7 . . 31.4 .1707317 25 1970 93.8 . . 37.3 .184466 2 1970 100.3 . . 36.7 .16900676 36 1970 124.3 . . 28.8 .18942162 9 1970 124.8 . . 41.4 .1669667 31 1970 92.7 . . 38.5 .17867868 34 1970 65.5 . . 34.6 .20207743 7 1970 109.9 . . 34.3 .1874455 17 1970 93.4 . . 36.2 .18313035 4 1970 124.8 . . 29.4 .19095023 16 1970 104.3 . . 39.1 .1747241 3 1970 123 . . 38.8 .17815833 33 1970 106.4 . . 40.4 .18314135 13 1970 155.8 . . 28.3 .18131015 15 1970 128.5 . . 38 .1690141 24 1970 172.4 . . 27.3 .1935484 19 1970 111.2 . . 34 .1757925 35 1970 122.6 . . 37.7 .1797753 11 1970 108.5 . . 37.7 .16884956 5 1970 120 . . 42.2 .16292876 12 1970 114 . . 34.2 .18052468 6 1970 155 . . 39 .17335767 38 1970 106.4 . . 38.5 .174287 8 1970 102.4 . . 33.8 .1781206 23 1970 90 . . 39.7 .18485743 37 1970 114.5 . . 33.7 .17259175 28 1970 107.3 . . 38.4 .163376 30 1970 103.6 . . 32.5 .20030876 26 1970 121.6 . . 36.6 .1732195 20 1970 108.1 . . 33.9 .17373738 18 1970 121.3 . . 36 .167593 1 1970 89.8 . . 39.6 .1788618 39 1970 132.2 . . 34.1 .1746988 9 1971 125.6 . . 41.4 .1689976 22 1971 278 . . 34.1 .1723339 31 1971 96.7 . . 38.5 .18049243 13 1971 163.5 . . 30.1 .1822996 39 1971 131.7 . . 34.4 .17722893 19 1971 115.6 . . 34.7 .1771459 17 1971 105.4 . . 37.5 .18437305 12 1971 102.8 . . 38.9 .18155004 6 1971 161.1 . . 41.3 .1758872 34 1971 67.7 . . 36.6 .20206134 27 1971 115.4 . . 39.8 .1765248 15 1971 133.2 . . 38.8 .1703349 7 1971 115.7 . . 35.8 .18786626 36 1971 128.4 . . 30.2 .1898735 18 1971 127.6 . . 36.8 .16925956 1 1971 95.4 . . 42.7 .17992784 25 1971 98.5 . . 38.9 .18638696 24 1971 187.6 . . 29.4 .1936767 8 1971 108.5 . . 33.6 .17609245 23 1971 92.6 . . 41.7 .1860954 28 1971 106.3 . . 44.7 .1650846 2 1971 104.1 . . 38.8 .16995385 32 1971 106.3 . . 41.6 .17881927 20 1971 108.6 . . 34.7 .17521714 33 1971 108.9 . . 42 .18430856 26 1971 124.6 . . 38.1 .1745399 38 1971 105.4 . . 40.2 .17634407 11 1971 108.4 . . 38.5 .170839 37 1971 111.5 . . 41.6 .17312744 3 1971 121 . . 39.7 .17929636 4 1971 125.5 . . 31.1 .1916476 30 1971 115 . . 34.3 .2004893 21 1971 190.5 . . 44 .16377378 10 1971 139.3 . . 32.2 .17797175 16 1971 116.4 . . 40.1 .1767316 14 1971 119.8 . . 39.3 .1867808 29 1971 123.2 . . 40.2 .1838495 35 1971 124.4 . . 39.5 .1813672 5 1971 117.6 . . 45.5 .1646539 7 1972 117 9.63889 . 40.9 .188287 34 1972 71.3 9.601122 . 37.2 .20204525 13 1972 179.4 9.547482 . 30.6 .18328904 31 1972 103 9.630849 . 39.1 .1823062 15 1972 136.5 9.59714 . 41.5 .1716557 26 1972 124.4 9.779579 . 38.4 .17586027 32 1972 111.5 9.569716 . 41.6 .1795947 20 1972 104.9 9.746475 . 41.1 .1766969 29 1972 134.4 9.770211 . 41.6 .1845411 33 1972 108.6 9.675209 . 46.9 .18547577 22 1972 296.2 9.736376 . 36.1 .17393607 27 1972 121.7 9.625027 . 39.8 .17759047 28 1972 109 9.784274 . 44.7 .16679317 8 1972 126.1 9.651839 . 33.7 .1740643 11 1972 109.4 9.730265 . 41.9 .17282845 19 1972 122.2 9.694641 . 40.1 .1784993 4 1972 134.3 9.805548 . 31.2 .19234496 35 1972 138 9.673069 . 40 .18295917 3 1972 123.5 9.930814 . 39.9 .1804344 30 1972 118.7 9.509309 . 34.1 .2006698 21 1972 198.6 9.944233 . 40.6 .16599335 12 1972 111 9.768118 . 
38.8 .1825754 end label values state state label def state 1 "Alabama", modify label def state 2 "Arkansas", modify label def state 3 "California", modify label def state 4 "Colorado", modify label def state 5 "Connecticut", modify label def state 6 "Delaware", modify label def state 7 "Georgia", modify label def state 8 "Idaho", modify label def state 9 "Illinois", modify label def state 10 "Indiana", modify label def state 11 "Iowa", modify label def state 12 "Kansas", modify label def state 13 "Kentucky", modify label def state 14 "Louisiana", modify label def state 15 "Maine", modify label def state 16 "Minnesota", modify label def state 17 "Mississippi", modify label def state 18 "Missouri", modify label def state 19 "Montana", modify label def state 20 "Nebraska", modify label def state 21 "Nevada", modify label def state 22 "New Hampshire", modify label def state 23 "New Mexico", modify label def state 24 "North Carolina", modify label def state 25 "North Dakota", modify label def state 26 "Ohio", modify label def state 27 "Oklahoma", modify label def state 28 "Pennsylvania", modify label def state 29 "Rhode Island", modify label def state 30 "South Carolina", modify label def state 31 "South Dakota", modify label def state 32 "Tennessee", modify label def state 33 "Texas", modify label def state 34 "Utah", modify label def state 35 "Vermont", modify label def state 36 "Virginia", modify label def state 37 "West Virginia", modify label def state 38 "Wisconsin", modify label def state 39 "Wyoming", modify
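A minimal sketch: since the four state numbers are known, a numlist avoids levelsof entirely; the loop index feeds both trunit() and the keep() file name. The synth options simply mirror the command in the post.
Code:
foreach s of numlist 4 5 21 34 {
    synth cigsale beer lnincome(1980&1985) age15to24 retprice ///
        cigsale(1988) cigsale(1980) cigsale(1975), ///
        trunit(`s') trperiod(1989) fig keep(ps`s', replace)
}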
Method for deriving the second-level weight for a metobit model
I would like to analyse my data with a multilevel tobit (metobit), applying two-level weights. In a multilevel linear mixed model, we obtain the first-level weight [pweight=IPW] by calculating a propensity weight, and Stata automatically estimates the level-2 weight "size" for the cluster level. The Stata command for the linear mixed model is "mixed unemp i.year inc edu [pweight=IPW] || year:, pwscale(size)". My unemployment variable has a censored distribution, so my data would fit better with metobit. But metobit does not automatically estimate the "size" weight. The Stata command for metobit is "metobit unemp inc edu [pweight=IPW] || year:, pweight(wvar2) ll(0)". The wvar2 in pweight(wvar2) is the second-level weight, similar to "size". I do not know how to derive this weight from my first-level estimate. The Stata manual explains the meaning of the weight but not the calculation method. Can you please help me by a) describing the second-level weight derivation method, as used when Stata calculates the "size" weight, and b) providing reading materials on calculating the weight?
Leading zero
Hi statalist community,
What I need is to put a leading zero on single-digit numbers.
The needed output looks like this:
RTR
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input RTR
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
end
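A minimal sketch of two options: a display format with a leading zero, or a string version of the variable (RTR_s is an illustrative name).
Code:
* display-only: show RTR with a leading zero
format RTR %02.0f
* or create a string variable with the zero built in
gen str2 RTR_s = string(RTR, "%02.0f")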
Multiple random slopes in a mixed linear model
Dear all,
I have unbalanced panel data with repeated sampling over 6 years, hence I am employing mixed linear regressions. I want to incorporate various covariates, which differ either at level 1 (observation level) or level 2 (individual level). I want to assess the effect of reading on cognitive test scores in later life. My level 1 predictors are, among others, age, wealth group, and the spread between observations due to the unbalanced nature; the level 1 predictors may change over time within each individual. My level 2 predictors are constant for each individual: education level and gender. My code for the mixed model with the random intercept is shown in the first code block below.
I now also want to add random slopes. My first inclination was to set the spread between observations as a random slope, as in the second code block below.
Adding the spread as a random slope significantly improves the fit of my model. However, my question now is whether it is possible (and whether it makes sense) to add all level 1 predictors as random slopes. I tried the third code block below, but even after an hour of computation time no output was produced.
1. Is it possible to add multiple random slopes?
2. Does it make sense to add multiple slopes?
3. If I cannot/should not add multiple random slopes, how do I deal with the fact that my level 1 predictors vary over time?
Thanks all!
Code:
mixed testscore reading age wealthgroup spread education gender || mergeid_num:,
Code:
mixed testscore reading age wealthgroup spread education gender || mergeid_num: spread, covariance(unstructured)
Code:
mixed testscore reading age wealthgroup spread education gender || mergeid_num: spread age wealthgroup, covariance(unstructured)
Saturday, November 26, 2022
Optimal lag selection in Granger Causality tests
I use [TS] varsoc to obtain the optimum lag length for the Granger causality test. This command reports the optimal number of lags based on different criteria such as Akaike's information criterion (AIC).
Is there any way to store the optimal lag number (obtained based on AIC) in a variable and use it in the next command to estimate causality? Something like this:
Lag= varsoc X Y
tvgc X Y, p(Lag) d(Lag) trend window(30) prefix(_) graph
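A minimal sketch, assuming varsoc leaves its statistics in the r(stats) matrix with an "AIC" column (worth verifying with matrix list r(stats) after running varsoc): scan the rows for the smallest AIC and pass the corresponding lag to tvgc through a local macro.
Code:
quietly varsoc X Y
matrix S = r(stats)
local c = colnumb(S, "AIC")
local best = 0
local minaic = S[1, `c']              // row 1 corresponds to lag 0
forvalues r = 2/`=rowsof(S)' {
    if S[`r', `c'] < `minaic' {
        local minaic = S[`r', `c']
        local best = `r' - 1          // row r corresponds to lag r-1
    }
}
display "AIC-optimal lag: `best'"
tvgc X Y, p(`best') d(`best') trend window(30) prefix(_) graph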
Main effects of two independent variables across five groups
Hi,
I have unbalanced panel data for 160 companies from 5 different subgroups (g1,g2,g3,g4,g5) where group id is defined by business activity type, over 14 years for which I run the following baseline regression: (CVs: additional 7 control variables, l: lagged variable, X1 and X2: continuous independent variables, Y: continuous dependent variable)
xtreg Y l.X1 l.X2 l.CVs i.year,r
I want to check if the main effects of X1 and X2 on Y vary across 5 groups where all are lagged except for Y. For that reason, I ran the following sample regression;
xtreg Y l.X1 l.X2 l.g2 l.g3 l.g4 l.g5 l.c.X1#l.i.g2 l.c.X1#l.i.g3 l.c.X1#l.i.g4 l.c.X1#l.i.g5 l.c.X2#l.i.g2 l.c.X2#l.i.g3 l.c.X2#l.i.g4 l.c.X2#l.i.g5 l.CVs i.year, r
5 times in a row by omitting one different group at a time (g1 is omitted in the first one above).
Is this the correct approach? How about dropping all g2,g3,g4,g5 observations and running the regression for g1 companies only (5 times in total keeping the observations of only one different group at a time)?
My second model is baseline + interaction between X1 and X2:
xtreg Y l.X1 l.X2 l.c.X1#l.c.X2 l.CVs i.year,r
Should I rerun the aforementioned 5 regressions (this time with the interaction included) to observe the differences in the main effects of X1 and X2 across groups, as in the following (g1 is omitted):
xtreg Y l.X1 l.X2 l.c.X1#l.c.X2 l.g2 l.g3 l.g4 l.g5 l.c.X1#l.i.g2 l.c.X1#l.i.g3 l.c.X1#l.i.g4 l.c.X1#l.i.g5 l.c.X2#l.i.g2 l.c.X2#l.i.g3 l.c.X2#l.i.g4 l.c.X2#l.i.g5 l.c.X1#l.c.X2#l.i.g2 l.c.X1#l.c.X2#l.i.g3 l.c.X1#l.c.X2#l.i.g4 l.c.X1#l.c.X2#l.i.g5 l.CVs i.year, r
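An alternative to five reruns, sketched under the assumption that the five dummies can be collapsed into one categorical variable (say group, coded 1-5): fit one fully interacted model and read the group-specific slopes off margins, with contrast testing whether they differ:
Code:
xtreg Y c.L.X1##i.group c.L.X2##i.group l.CVs i.year, re vce(robust)
margins group, dydx(L.X1)      // slope of lagged X1 within each group
margins group, dydx(L.X2)
contrast group#c.L.X1          // joint test that the X1 slopes differ across groups
contrast group#c.L.X2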
Best,
Lutfi
How to draw overlayed coefplot with only one regression
Suppose I have run the following regression
reg wage i.year#i.gender controls
where gender takes two values. What I want is to use coefplot such that the x axis is years and, for each year, the estimates for the two values of gender are drawn overlaid. How can I achieve that?
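coefplot (Ben Jann, SSC) can draw two series from the same stored estimates, so one regression is enough. A sketch; the keep() and rename() patterns are guesses at how the interaction coefficients are named (check with regress, coeflegend):
Code:
reg wage i.year#i.gender controls
estimates store m1
coefplot (m1, keep(*.year#1.gender) label("Gender 1")) ///
         (m1, keep(*.year#2.gender) label("Gender 2")), ///
    vertical rename(([0-9]+)\.year#[0-9]\.gender = \1, regex)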
Formatted IQR using Collect
Hi,
I am using the excellent Example 3 in the "Stata Customizable Tables" manual to help me build a table with frequency (percent) for categorical variables, and mean (sd) for continuous variables. Some of my continuous variables are, however, very skewed (age data for infants). For those variables I would like to report age in months as median (IQR). I would like to format the IQR as (p25 – p75). I think that what I need to do is combine both p25 and p75 into a single level of the dimension result, but I'm not sure how to do that.
I am aware that table1_mc does this. I have used and loved table1_mc, but I really want to learn the collect system.
For this post, below are data for age in months and sex. What I'm after is:
| Male | Female |
Age (months) | median (p25 – p75) | median (p25 – p75) |
Here are my data for sex and age (months):
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float age_adm_m byte sex 15.342916 1 54.80082 1 19.74538 1 75.663246 1 19.81109 1 46.45585 1 32.29569 1 3.3839836 2 67.15401 2 14.554415 1 69.650925 2 83.58111 1 15.014374 1 46.75154 2 36.172485 1 35.876797 1 7.096509 1 39.85216 1 7.523614 1 43.6961 1 3.12115 1 36.99384 2 55.78645 2 30.12731 2 52.46817 2 3.613963 1 17.87269 2 31.507187 2 30.58727 1 18.431211 1 43.63039 2 15.967146 2 50.7269 1 32.492813 2 16.689938 1 18.89117 1 30.45585 2 3.581109 2 19.876797 1 82.95688 1 71.29363 2 62.0616 2 30.45585 1 51.44969 2 14.52156 2 11.498973 1 1.4784395 2 28.64887 2 51.58111 1 72.24641 2 31.802876 1 42.48049 1 2.1026695 2 127.5729 1 40.21355 2 8.936345 1 3.876797 2 30.390144 1 44.71458 2 11.17043 1 10.61191 1 39.09651 1 14.52156 2 78.91581 1 16.328543 1 42.21766 1 11.039015 1 80.16427 1 150.70226 2 3.022587 1 59.07187 1 38.40657 1 57.49487 1 59.00616 2 19.58111 2 2.792608 2 79.50719 2 122.71047 2 92.09035 1 2.562628 2 46.02875 1 95.77002 2 34.49692 2 6.702259 1 48 2 43.13758 2 125.40452 2 . 1 76.38604 1 11.334702 1 43.23614 1 59.59753 1 55.88501 1 6.537988 1 82.16838 1 43.00616 1 54.17659 2 25.23203 1 54.2423 1 17.87269 1 end label values sex sex_lbl label def sex_lbl 1 "Male", modify label def sex_lbl 2 "Female", modify
Here is my code for what I'm after using mean and SD:
Code:
table (var) (shortsite), statistic(fvfrequency sex) statistic(fvpercent sex) nototals append collect style header result, level(hide) collect style row stack, nobinder spacer collect style cell border_block, border(right, pattern(nil)) collect layout (sex[1]) (shortsite#result) collect style cell result[fvpercent], sformat("%s%%")
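One route, sketched for the posted data (Stata 17's collect system; the composite name iqr, the delimiter, and the layout are my choices): request the median, p25, and p75 from table, then glue p25 and p75 into one result with collect composite define:
Code:
table (var) (sex), statistic(median age_adm_m) ///
    statistic(p25 age_adm_m) statistic(p75 age_adm_m) nototals
collect composite define iqr = p25 p75, delimiter(" – ") trim
collect style cell result[iqr], sformat("(%s)")
collect layout (var) (sex#result[median iqr])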
Ryan
About Foreach or Forvalue
Hi! I am trying to create seven summary variables named den_1 to den_7 to simplify the results I have from 34 variables named total1-total34. Although there are 34 different groups, each observation actually belongs to at most 7 groups. The number of groups each observation belongs to is shown by the variable group_n.
Enquiry:
Is it possible for a -foreach- or -forvalues- loop to search over the 34 variables, take out the values, and put them into a new set of variables (den_1-den_7)?
Thank you all very much!!
Data example skipping variables for simplification:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(total1 total2 total3 total32 total33 total34 group_n) 208 207 . . . . 4 208 207 . . . . 4 . . . . . . 0 . . . . . . 1 208 207 . . . . 4 . . . . . . 1 . . . . . . 4 . . . . . . 0 . . . . . . 2 . . . . . . 1 . . . . . . 0 . . . . . . 2 . . . . . . 0 . . . . . . 1 . . . . . . 0 208 207 . . . . 4 . . . . . . 2 . . . . . . 3 . . . . . . 0 . . . . . . 0 . . . . . . 2 . . . . . . 0 . . . . . . 0 . . . . 5 . 2 208 207 . . . . 5 . . . . . . 0 . . . . . . 3 . . . . . . 0 . . . . . . 0 . . . . . . 2 . . . . . . 2 . . . . . . 0 . . . . . . 0 . . . . . . 2 . . . . . . 0 208 207 50 . . . 5 . . . . . . 1 . . . . . . 3 . . . . . . 0 . . . . . . 0 . . . . . . 0 . . . . . . 0 208 207 . . . . 5 . . . . . . 2 . . . . . . 2 . . . . . . 1 . . . . . . 1 . . . . . . 2 208 207 . . . . 5 . . . . . . 0 . . . . . . 0 . . . . . . 1 . . . . . . 0 . . . . . . 0 . . . . . . 3 . . . . . . 2 . . . . . . 0 . . . . . . 2 . . . . . . 0 208 207 . . . . 5 . . . . . . 0 . . . . . . 0 . . . . . . 1 . . . . . . 2 208 207 . . . . 6 . . . . . . 4 . . . . . . 2 208 207 . . . . 6 . . . . . . 2 . . . . . . 2 . . . . . . 3 208 207 . . . . 5 . . . . . . 3 . . 50 . . . 2 . . . . . . 0 . . . . . . 2 . . . . . . 0 . . . . . . 3 . . . . . . 0 . . . . . . 0 . . . . . . 0 . . 50 . . . 2 . . 50 . . . 3 . . . . . . 0 . . . . . . 3 . . . . . . 0 . . . . . . 0 208 207 . . . . 5 . . . . . . 0 . . . . . . 0 . . . . . . 2 208 207 . . . . 4 . . . 5 . . 3 . . . . . . 2 . . . . . . 3 . . . . . . 2 . . . . . . 5 . . . . . . 0 . . . . . . 0 . . . . . . 0 end
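A sketch of one way to pack the nonmissing totals into den_1-den_7 from left to right (an assumption about what "take out the values" means): a running counter tracks how many slots each observation has filled so far:
Code:
forvalues j = 1/7 {
    generate den_`j' = .
}
generate filled = 0
forvalues i = 1/34 {
    quietly replace filled = filled + 1 if !missing(total`i')
    forvalues j = 1/7 {
        quietly replace den_`j' = total`i' ///
            if filled == `j' & missing(den_`j') & !missing(total`i')
    }
}
drop filled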
Friday, November 25, 2022
GLM binomial logit model gof - Deviance
Good day,
I'm using Stata 16 and trying to do a goodness-of-fit check for a GLM logit model, but the results show a lot of missing data. My commands are as follows:
Code:
glm obese $X [aw=wt], family(binomial) link(logit)
predict mu_logit
predict dr_logit, deviance
qui glm obese $X [aw=wt], family(binomial) link(cloglog)
predict mu_cl
predict dr_cl, d
format mu_logit dr_logit mu_cl dr_cl %9.5f
list mu_logit dr_logit mu_cl dr_cl, sep(4)
The list output is almost entirely missing, for example:
216617. | . . . . |
(and similarly, all four predicted values are missing for every listed observation through 216648)
My dataex
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float obese byte(wquintile fprogram_1 fprogram_2 gender_1 gender_2 race_1 race_2 race_3 race_4 geo_1 geo_2 mbmi_1 mbmi_2 gradem_1 gradem_2 employed_1 employed_2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . . . . . . . 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 0 1 0 1 0 0 1 0 . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 0 0 1 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 1 0 . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 . . . . . . . 1 . . 1 0 0 1 0 0 1 0 . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 . . 0 1 0 1 0 0 1 0 . . . . . . 0 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 0 1 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 0 1 0 1 0 0 1 0 . . . . . . . 1 1 0 1 0 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 1 0 . . . . . . . . . . . . 1 0 1 0 1 0 0 1 . . 1 0 0 1 0 0 1 0 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 1 1 0 0 1 . 1 . . 0 1 0 1 0 0 1 0 0 1 1 0 0 1 . . . . . . . . . . . . . . . . . . 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 1 . . . . . . . . . . . . 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 1 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . 0 1 0 1 0 1 0 1 0 0 1 0 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . 1 0 1 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . 1 0 1 1 0 0 1 0 0 1 0 1 0 1 0 1 0 end label values obese obese5 label def obese5 0 "0. Not_Obese", modify label def obese5 1 "1. Obese", modify
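A diagnostic sketch, for what it's worth: predict returns missing wherever any covariate in $X (or the weight) is missing, and the dataex above shows that most rows have missing dummies, so mostly-missing predictions are expected rather than a glm failure:
Code:
egen nmiss = rowmiss($X)
count if nmiss > 0 | missing(wt)    // rows where predict must return missing
misstable summarize $X wt           // which covariates drive the missingness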
Please assist.
Regards
Nthato
Help: weights not allowed r(101)
Hi! I ran into an error with the following commands:
egen x2 = group(sex re age educ)
gen wm = sex == 1 & re == 1
egen wminc = mean(incwage) [pw=x2] if wm == 1, by(age)
The first two lines worked, but the third line failed with the error: weights not allowed r(101)
Could you please help me solve it?
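egen functions do not accept weights, hence the r(101). Note also that x2 above is a group identifier, not a weight. If a genuine weight variable is available (called w below, an assumption), the weighted group mean can be built by hand; a sketch:
Code:
generate double num = incwage * w if wm == 1
generate double den = w if wm == 1 & !missing(incwage)
egen double sumnum = total(num), by(age)
egen double sumden = total(den), by(age)
generate double wminc = sumnum / sumden
drop num den sumnum sumden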
Estimating latent class models in Stata using both categorical and continuous indicator variables
Hi Statalist community,
I am trying to run latent class models and I am fairly new to this type of analysis. I have continuous and categorical indicators. I am using the following link as a reference. In the second example within the link, there is an example to run latent class models where there are both continuous and categorical indicators.
https://stats.oarc.ucla.edu/mplus/se...ixture-models/
However, the analysis is conducted using Mplus which is quite expensive and I am trying to replicate the example using Stata. I was wondering if there is a native Stata command or a user-generated Stata package available for me to replicate the example. In addition, if you have any papers that you could direct me to, I would really appreciate it. Thank you so much.
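Stata has had this natively since version 15: gsem with the lclass() option mixes Gaussian and logit (or other family) indicators in one latent class model. A minimal sketch with hypothetical variable names (y1, y2 continuous; b1, b2 binary):
Code:
gsem (y1 y2 <- _cons) (b1 b2 <- _cons, logit), lclass(C 2)
estat lcprob    // class membership probabilities
estat lcmean    // class-specific means of the indicators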
xtivreg2: endogeneity and overidentification
Hello everyone,
I have an issue with interpreting the results of xtivreg2:
- Overidentification is significant
- Endogeneity test is not significant
Please see the attached picture.
I don't know how I should interpret this result. The endogeneity test indicates that there is no endogeneity issue, which implies no need for the IV. But then does the overidentification test suggest the IV is a valid one?
Thank you
Household Fixed Effect
Hello,
Is there an issue with using HH fixed effects or individual fixed effects when the death of a family member is the treatment in a DID framework? Is it feasible just to stick to state or district-fixed effects?
Problems with command "etable"
Hi there,
My intention is to run "etable" to create a results table of two models.
My code is:
[screenshot of the code omitted]
However, for some reason (even though I followed the instructions in the Stata manual) it continues to create this unwanted table. I would like the results of the second model to be at the same row level as the first model.
[screenshots of the resulting tables omitted]
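Since the screenshots are not visible, here is a minimal sketch of the pattern that usually puts two models side by side in one etable call (Stata 17+); the model and variable names are hypothetical, and column(estimates) is my assumption about how to label the columns by model:
Code:
quietly regress y x1 x2
estimates store m1
quietly regress y x1 x2 x3
estimates store m2
etable, estimates(m1 m2) column(estimates)   // one column per stored model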
I would highly appreciate some help. Thanks in advance.
Kind regards,
Antonio
Which test should i use on Stata ?
Hello,
I would like to know which test to use on Stata according to my configuration.
I want to compare 3 ways to collect data.
I have 3 groups, and for each I collect a Y variable according to method A, B, or C; for each person I also collect the Y variable according to the reference method (called y_base).
It looks like this with randomly generated data (in my real database, I have something like 500 people in each group).
Which test in Stata should I use to check which method is closest to my y_base?
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float id str1 module float(y_base y) 1 "A" 6 2 2 "A" 7 3 3 "A" 6 4 4 "A" 4 4 5 "A" 6 2 6 "A" 4 2 7 "A" 4 5 8 "A" 3 2 9 "A" 6 2 10 "A" 4 6 11 "A" 7 3 12 "A" 5 1 13 "A" 3 3 14 "A" 5 5 15 "A" 1 5 16 "A" 6 4 17 "A" 4 4 18 "A" 3 3 19 "A" 3 5 20 "A" 4 5 21 "B" 4 5 22 "B" 5 3 23 "B" 2 3 24 "B" 3 5 25 "B" 1 5 26 "B" 4 5 27 "B" 1 8 28 "B" 4 6 29 "B" 7 3 30 "B" 8 5 31 "B" 2 5 32 "B" 8 4 33 "B" 7 4 34 "B" 8 1 35 "B" 4 5 36 "B" 6 4 37 "B" 5 7 38 "B" 5 2 39 "B" 5 6 40 "B" 5 5 41 "C" 6 3 42 "C" 8 4 43 "C" 6 3 44 "C" 5 4 45 "C" 6 4 46 "C" 6 4 47 "C" 6 4 48 "C" 3 2 49 "C" 6 3 50 "C" 7 2 51 "C" 5 2 52 "C" 7 4 53 "C" 7 4 54 "C" 4 2 55 "C" 5 2 56 "C" 5 5 57 "C" 6 2 58 "C" 3 5 59 "C" 4 1 60 "C" 5 3 end
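One possible sketch, based on my interpretation of "closer to y_base" rather than a definitive recommendation: look at each method's deviation from the reference and compare the absolute errors across the three methods:
Code:
generate diff = y - y_base
generate absdiff = abs(diff)
tabstat diff absdiff, by(module) statistics(mean sd n)
encode module, generate(modnum)     // oneway needs a numeric group variable
oneway absdiff modnum, tabulate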
Thank you
Thursday, November 24, 2022
Understanding type of panel
Hi everyone,
I have individual level data for 20 regions for 18 years. The question is to examine the impact of x on y (binary variable). However, there is no individual id. Basically the data looks like this:
It is not the same individuals that are tracked every year. My questions are:
region | year | x | y | age | gender |
1 | 1993 | 20 | 1 | 20 | 1 |
1 | 1993 | 26 | 0 | 25 | 1 |
1 | 1993 | 12 | 1 | 40 | 1 |
1 | 1994 | 13 | 1 | 21 | 0 |
1 | 1994 | 20 | 1 | 30 | 1 |
2 | 1993 | 25 | 0 | 25 | 1 |
- Is this still a panel despite me not knowing anything about individuals?
- I have run a regression using the following command: logit y x gender age i.region i.year, vce(cluster region). Is this the correct way to include region and year fixed effects?
- Should I define it as panel data using xtset and then run the xtreg command? I read in a different post that when you have multiple observations under a particular region and year, that might not be the right way.
- What if I want to run a non-parametric regression on this? Will the fact that the dependent variable and some of the controls are binary impact anything?
How to calculate days and hours between two dates
Hi,
I am calculating days and hours between two dates (admission date/time and discharge date/time), thanks for any suggestions.
Example attached:
clear
input int id str9 admi_date str9 admi_time str9 dischage_date str9 dischage_time
1 11jan2019 1154 12jan2019 0716
2 15feb2019 0217 08oct2018 0934
3 01dec2019 2314 09feb2020 0817
end
The final results should have two new variables, one denotes the total days between two dates, and another one indicates the total hours between two dates.
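A sketch: build %tc datetimes from the date and hhmm strings, then rescale the millisecond difference with msofhours() (the substr() step assumes the times are always four digits):
Code:
generate st_admi  = admi_date + " " + substr(admi_time, 1, 2) + ":" + substr(admi_time, 3, 2)
generate st_disch = dischage_date + " " + substr(dischage_time, 1, 2) + ":" + substr(dischage_time, 3, 2)
generate double admit = clock(st_admi, "DMY hm")
generate double disch = clock(st_disch, "DMY hm")
format admit disch %tc
generate double hours = (disch - admit) / msofhours(1)
generate double days  = (disch - admit) / msofhours(24)
list id days hours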
Best,
Zichun
Problem of missing observations in CAPM estimation
I have the following variables from raw data and I want to estimate the return of stock PG using the CAPM model. I found that there are a lot of missing observations due to date gaps, since trading does not occur on weekends and during public holidays. How can I fix this problem? Should I use a new date variable?
Moreover, after fixing the dates, I use the following commands to get the estimation result:
tsset date, daily
gen lnPG=ln(PG)
gen rPG=100*(lnPG-L.lnPG)
gen rirf=rPG-rfr
reg rirf r_market
Is it appropriate to use gen rPG=100*(lnPG-L.lnPG) to generate the return of stock PG from its daily close price?
*just ignore hml and smb because I just use CAPM, not the Fama-French three-factor model
where r_market stands for the return of market portfolio, rfr stands for daily return for the Treasury bills (risk-free rate of return), PG is the adjusted close price of stock PG
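On the return formula: yes, 100*(lnPG - L.lnPG) is the standard continuously compounded daily return. On the date gaps, one option is a business calendar, so that L. means "previous trading day" rather than "previous calendar day"; a sketch (the calendar name is arbitrary):
Code:
bcal create pgcal, from(date) generate(bdate) replace
tsset bdate
generate lnPG = ln(PG)
generate rPG  = 100*(lnPG - L.lnPG)
generate rirf = rPG - rfr
regress rirf r_market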
egen for numeric variables
Hi, I'm trying to concatenate two numeric variables with the egen command. I know the result is a string variable, but when I try to destring it, a "nonnumeric characters" error appears and the variable isn't replaced. I need to do math operations with this variable, so I can't leave it as a string. If you know the solution, I would be very grateful for your help.
e.g. I have two weight variables: p1005peso_2 for the integer part and p1005peso_1 for the fractional part of the weight
+----------+ +----------+
| p1005p~2 | | p100~o_1 |
|----------| |----------|
1. | 103 | 1. | 6 |
2. | 54 | 2. | 9 |
3. | 62 | 3. | 8 |
4. | 61 | 4. | 5 |
5. | 70 | 5. | 4 |
+----------+ +----------+
**egen peso_con = concat(p1005peso_2 p1005peso_1), punct(.)
. list peso_con in 1/5
+----------+ this is how I need the variable shown, but numeric
| peso_con |
|----------|
1. | 103.6 |
2. | 54.9 |
3. | 62.8 |
4. | 61.5 |
5. | 70.4 |
+----------+
For missing values peso_con shows ". . ." (concat turns each missing numeric into a dot), which is why destring complains about nonnumeric characters:
** destring peso_con, replace
peso_con: contains nonnumeric characters; no replace
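Two sketches. The arithmetic route avoids strings entirely (it assumes p1005peso_1 is always a single decimal digit); the concat route works if the all-dot missing cases are blanked out before destring:
Code:
* arithmetic route
generate double peso = p1005peso_2 + p1005peso_1/10

* concat route
egen peso_con = concat(p1005peso_2 p1005peso_1), punct(.)
replace peso_con = "" if missing(p1005peso_2) | missing(p1005peso_1)
destring peso_con, replace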
DiD
Dear All,
I have run into several general issues while starting to work with DiD.
The data that I have is repeated cross-sections covering 12 cities, of which 3 are in the treatment group and 9 are in the control group. I want to estimate the effect of a student grant on enrollment. I attach a sample of my dataset below.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long pid int year byte city float income byte female float(age ability) byte educ int grant byte enrolled float d 1204550 2010 3 30835.06 0 17.654997 103.11912 10 0 1 0 319284 2014 3 31932.877 0 18.130793 103.68096 14 0 1 0 889613 2013 3 36795.105 1 17.410234 105.94727 18 0 1 0 978314 2014 11 34354.633 1 18.826042 109.90023 13 0 1 1 522308 2018 12 39353.02 0 16.946407 94.4239 11 0 1 0 188084 2014 7 34108.605 0 16.74036 105.9109 12 0 1 0 807206 2016 12 28902.074 0 16.833967 113.8728 13 0 0 0 320333 2013 7 27654.77 0 17.068989 102.17348 8 0 1 0 450150 2010 12 32522.38 1 17.06066 99.98117 13 0 0 0 18346 2016 2 36964.234 1 17.876476 107.94476 15 0 1 0 365140 2010 7 32668.863 1 17.623026 92.24822 14 0 1 0 890188 2018 7 34312.56 0 17.601595 109.98123 13 0 1 0 646758 2018 1 33627.96 0 18.01731 120.2906 10 0 1 0 459585 2015 7 29370.793 1 17.249372 87.39782 6 0 0 0 935203 2013 12 32445.84 0 17.77647 75.570305 18 0 1 0 787581 2011 10 30065.26 1 18.051292 99.88745 5 0 1 1 498375 2015 7 35290.19 1 16.93023 93.36523 13 0 1 0 174629 2019 6 30601.344 0 16.877481 75.047035 9 0 0 0 611611 2011 5 30250.344 0 17.525812 90.92487 14 0 0 0 838826 2016 7 30751.756 0 17.068407 107.4832 14 0 1 0 1083592 2012 6 32664.69 1 17.389822 106.0466 11 0 1 0 242215 2015 7 30082.6 0 17.077396 72.571175 12 0 1 0 510105 2015 5 36238.57 0 17.273191 93.56672 14 0 1 0 1171168 2018 1 27488.486 1 17.905424 104.80125 8 0 1 0 505633 2013 10 38493.77 1 18.317528 109.59848 10 0 1 1 1073150 2010 4 31469.16 0 17.581125 98.7337 10 0 1 0 287828 2018 1 30329.86 1 17.092825 85.97654 17 0 0 0 891757 2017 12 35884.47 0 18.140705 111.96175 12 0 1 0 913149 2019 5 35003.055 0 16.631374 122.54363 11 0 1 0 991665 2015 6 33503.066 1 17.866404 70.98935 18 0 1 0 243071 2011 11 31379.88 1 17.084621 103.64464 14 0 1 1 250937 2017 9 35780.746 0 16.826647 86.37576 13 0 1 0 1027638 2018 12 33101.484 1 17.773394 91.35767 10 0 1 0 22953 2013 10 29485.53 1 17.464046 96.355 13 0 1 1 153108 2018 7 29512.227 1 17.275154 94.25189 12 0 1 0 658013 2013 2 27794.816 1 16.913126 82.87864 10 0 1 0 1180143 2013 5 32402.127 1 16.620207 92.83389 10 0 0 0 312950 2010 7 28712.87 1 17.36845 89.96732 12 0 0 0 13933 2013 2 33346.375 0 17.85618 99.07447 12 0 1 0 461750 2010 12 31624.84 0 18.187187 119.1078 12 0 1 0 1067041 2011 3 28147.123 1 17.787865 108.14761 18 0 0 0 807595 2015 9 31974.695 0 16.861078 114.08148 10 0 1 0 598028 2018 9 35101.645 1 18.408142 100.17933 15 0 0 0 898812 2012 7 34147.723 1 18.142551 88.43004 12 0 1 0 498185 2015 12 33769.85 1 18.160706 76.81758 11 0 1 0 284335 2015 7 27923.943 1 17.350409 105.08566 18 0 0 0 528101 2011 9 30789.766 1 17.574682 111.9395 17 0 0 0 120755 2015 3 38303.02 1 17.399488 102.59693 11 0 1 0 931243 2013 5 26780.69 1 17.589457 112.46512 16 0 1 0 13793 2013 8 33066.164 1 17.885649 78.38871 6 0 0 1 1089259 2019 7 31339.92 0 17.514992 120.0189 17 0 1 0 1157963 2013 3 34078.273 0 17.390617 83.69234 16 0 1 0 864745 2015 12 31167.443 0 17.64933 97.17146 9 0 0 0 597375 2015 11 32886.715 1 17.705078 90.70517 13 0 1 1 652215 2015 11 28142.48 0 17.882828 88.85712 11 0 1 1 228472 2012 12 29800.863 1 17.13096 93.62843 18 0 1 0 138830 2010 3 32122.283 1 18.19804 106.25028 7 0 1 0 744471 2011 11 32415.04 0 17.548534 104.27207 18 0 1 1 598536 2016 2 34121.395 1 16.673536 89.31037 15 0 1 0 1052614 2014 8 27814.113 1 17.010475 108.49964 10 0 1 1 649531 2011 7 27315.727 1 18.158386 81.48443 18 0 1 0 1021262 2012 7 30865.465 1 18.37912 91.87194 16 0 1 0 745194 2014 7 
29216.043 1 17.223442 98.18081 15 0 0 0 1178861 2011 8 27306.32 0 17.345772 88.04616 16 0 1 1 1139135 2015 4 31997.08 0 17.50336 82.63182 18 0 1 0 549748 2018 12 31119.97 0 16.786953 90.79367 18 0 1 0 749490 2010 7 25319.2 0 17.834885 85.50574 12 0 0 0 1050823 2013 3 31714.156 1 17.77309 106.7881 12 0 0 0 719246 2016 2 30775.043 0 17.12436 94.05766 7 0 1 0 1005210 2010 8 30388.896 1 17.395172 98.40136 13 0 0 1 447412 2012 9 31698.453 0 17.66549 111.54143 9 0 1 0 1083881 2011 11 35093.87 1 17.310112 102.2826 14 0 0 1 1017055 2015 12 30806.945 1 18.521137 104.66325 11 0 1 0 992772 2012 12 27001.717 0 17.590246 112.75533 14 0 1 0 890703 2013 8 26494.21 1 16.931662 95.20155 12 0 1 1 281129 2019 6 37380.133 0 16.40155 91.67627 18 0 1 0 668979 2019 3 36621.348 1 18.151688 95.8932 14 0 0 0 967675 2015 4 34753.12 0 17.897005 113.03803 18 0 1 0 348630 2010 5 32139.91 1 17.538477 104.08703 10 0 1 0 539534 2014 9 32938.26 0 18.574429 87.49246 11 0 1 0 125183 2013 8 31783.246 1 18.586412 115.38096 16 0 1 1 1083222 2012 8 32871.15 1 17.557425 98.71842 16 0 0 1 808739 2019 4 31418.37 1 16.910383 73.486244 18 0 0 0 972730 2010 3 27586.594 1 17.133444 102.77448 15 0 1 0 177327 2017 9 32548.914 0 16.684296 93.75323 15 0 0 0 103276 2016 1 30006.35 0 16.82226 103.4052 8 0 1 0 856478 2018 7 34106.258 0 17.852732 107.36507 14 0 1 0 923792 2012 6 35160.816 0 17.40168 96.59897 14 0 1 0 1064345 2015 12 34054.652 1 17.536524 101.16731 16 0 1 0 522532 2012 3 32600.11 1 17.823885 95.66457 16 0 1 0 407449 2019 1 29082.324 1 17.626932 94.40669 14 0 0 0 819533 2013 3 30923.36 0 18.071976 100.04102 13 0 1 0 315863 2013 1 30462.123 0 16.875221 93.62828 16 0 1 0 56267 2017 7 31048.93 1 17.411005 99.2887 17 0 0 0 174233 2013 7 28117.346 0 17.959694 112.06963 11 0 1 0 459831 2011 7 34452.543 0 18.144615 105.93893 15 0 1 0 515840 2010 8 32178.18 0 17.602175 89.60943 12 0 1 1 186889 2019 7 32696.75 0 17.649858 95.25651 16 0 1 0 672054 2014 3 35514.21 1 17.791594 93.148 6 0 1 0 900622 2012 3 29506.32 0 17.503372 93.44817 18 0 1 0 end
These are the questions I have for DiD.
1. In order to run an appropriate model: from what I've found in Scott Cunningham's "Causal Inference", most of the time the errors are correlated within groups. First, is there a way to test this (something like a White test), and if not, should clustered errors be used regardless?
2. For DiD the most important assumption is parallel trends. However, is there still any use in finding out the degree to which both groups are balanced, and if so, what would be the appropriate code for that?
3. Most important question: which method would give me the extent to which the student grant matters? (I want to find out if there is an intensive effect of student grants on enrollment.)
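On question 3, a minimal two-way DiD sketch for repeated cross-sections, clustering at the treatment (city) level; which cities are treated and when the grant starts are hypothetical placeholders, not something read off the data:
Code:
generate treated = inlist(city, 1, 2, 3)   // placeholder: the 3 treated cities
generate post    = year >= 2015            // placeholder: year the grant starts
generate did     = treated * post
regress enrolled did female age i.city i.year, vce(cluster city)
The city and year fixed effects absorb the treated and post main effects. With only 12 cities, conventional cluster-robust standard errors can be unreliable; a wild cluster bootstrap (for example the community-contributed boottest) is a common complement.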
Kindest regards
Variable not recognized after modification
Hi everyone,
For company data analysis for my thesis, I wanted to create industry dummy variables based on the company's SIC Code 2's. Yesterday I successfully managed to do that by writing this code (and similar ones with different industry names):
generate Ind_Manu=.
replace Ind_Manu=1 if inrange(Sic_code_2,2000,3999)
replace Ind_Manu=0 if Sic_code_2>3999
replace Ind_Manu=0 if Sic_code_2<2000
As a result I got a beautiful dummy variable that indicated a 1 if the company's SIC Code 2 was between 2000-3999, and a value of 0 if not.
Today I tried patching the missing SIC Code 2 values through an alternative database. I basically copied the SIC-Code from the database and inserted it into my dataset at the place of the missing value of my Sic_code_2 variable.
The value was still displayed in black (not red), so I assumed it would still be recognized as a numerical value.
However, now I wanted to update the Dummy variable with the new Sic Code 2 data, by running the following line again:
replace Ind_Manu=1 if inrange(Sic_code_2,2000,3999)
Yesterday this line worked perfectly fine, but somehow now I get the error: "no variables defined"
I did not change the name of the variable, so I think it has something to do with the fact that I manually entered data into the Sic Code 2 variable.
Does anyone know how to resolve this issue?
Thanks in advance!
Ruben
Unable to report & understand relogit marginal effects
Dear all,
In my (regression) analysis I am trying to see when companies are more likely to demand aid from governments, considering five major factors (firm size, revenue, unemployment in the economy, imports, and GDP growth) and my main explanatory variable, "ownership", that is, whether a company is owned by a large multinational corporation.
Some variables vary over time (such as revenue "lrev" and employee size "lnempl") but others, such as MNC ownership, do not.
My outcome variable is binary: whether or not a firm has applied for aid ("ad"). My main IV is also binary: whether a firm is "mnc_owned" or not. The 5 other variables are continuous.
Since the event I am interested in is quite rare, I run a rare-events logistic regression using the relogit command (OLS reveals similar results as well).
With all the variables in the model, my results are the following,
relogit ad mnc_owned lrev lnempl unemployment lnimport gdp_growth sector year
Corrected logit estimates Number of obs = 371157
------------------------------------------------------------------------------
| Robust
ad | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
mnc_owned | 2.580213 .083656 30.84 0.000 2.416251 2.744176
lrev | .4812687 .0308463 15.60 0.000 .4208109 .5417264
lnempl | -.1178756 .0324088 -3.64 0.000 -.1813956 -.0543555
unemployment | .0195259 .0905168 0.22 0.829 -.1578838 .1969356
lnimport | -.1665573 .9645765 -0.17 0.863 -2.057093 1.723978
gdp_growth | -.0019636 .0226464 -0.09 0.931 -.0463498 .0424226
sector | .0005393 .0000406 13.29 0.000 .0004598 .0006188
year | .0095729 .0405276 0.24 0.813 -.0698597 .0890055
_cons | -28.80196 95.22616 -0.30 0.762 -215.4418 157.8379
------------------------------------------------------------------------------
The results bear out my theoretical expectations.
Now I am trying to provide some visuals with a margins command but I have 2 issues I cannot seem to resolve,
When I try to understand the impact of my main IV (mnc ownership), the response is the following:
margins mnc_owned
factor mnc_owned not found in list of covariates
r(322);
So I decided to use the dydx() option instead, which seems to work:
margins, dydx (mnc_owned)
Average marginal effects Number of obs = 371,157
Model VCE: Robust
Expression: Linear prediction, predict()
dy/dx wrt: mnc_owned
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
mnc_owned | 2.580213 .083656 30.84 0.000 2.416251 2.744176
------------------------------------------------------------------------------
But now I cannot seem to see the impact of other continuous variables, such as revenue or size, at different levels.
For revenue, for instance, when I try,
margins, dydx(lrev) at(lrev=1)
Average marginal effects Number of obs = 371,157
Model VCE: Robust
Expression: Linear prediction, predict()
dy/dx wrt: lrev
At: lrev = 1
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
lrev | .4812687 .0308463 15.60 0.000 .4208109 .5417264
------------------------------------------------------------------------------
But for margins, dydx(lrev) at(lrev=6) the result is also identical – which does not make any sense… the level of revenue between 1 and 6 should most certainly not reveal identical results...
1) I cannot seem to see the difference in the different levels of this variable – how can I see the marginal effect of revenue at 1 versus revenue at 7?
2) Is it also possible to do this by keeping another variable in a given value, such as
margins, dydx(mnc_owned) at(lrev=1)
When I try this, the results do not change whether revenue (lrev) is 1 or 3 or 5 or 6 ... I am trying to understand what I am doing wrong.
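A likely explanation, sketched as a workaround rather than a definitive diagnosis: relogit does not accept factor-variable notation, so margins neither recognizes mnc_owned as a factor nor gets a nonlinear prediction to work with. Its output says Expression: Linear prediction, and the derivative of a linear prediction with respect to a regressor is just that regressor's coefficient, identical at every at() value. Refitting with ordinary logit and factor variables gives level-dependent effects on the probability scale:
Code:
logit ad i.mnc_owned lrev lnempl unemployment lnimport gdp_growth sector year, vce(robust)
margins mnc_owned                          // predicted Pr(ad) by ownership
margins, dydx(lrev) at(lrev=(1 6))         // now differs across lrev levels
margins, dydx(mnc_owned) at(lrev=(1 3 5))  // and across settings of lrev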
I hope I was able to formulate these two issues.
Any help is appreciated.
Best,
Aydin
Wednesday, November 23, 2022
Rearranging columns and rows to make it country-level database
I want to create two columns:
Column A with all the countries listed in the photo below
Column B with one of 3 options (Advanced Economies, Emerging Market Economies, Low-Income Developing Countries) to match whichever category it is placed under in the photo below.
So ideally, we'd end up with something that looks like:
Country | Type |
Australia | Advanced Economies |
Austria | Advanced Economies |
Albania | Emerging Market Economies |
Afghanistan | Low-Income Developing Countries |
I can do this easily on excel by moving cells around, etc.
But how do I do this with purely Stata? I realize it's not a wide-to-long format reshaping matter.
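A sketch of the Stata route, assuming the photo's layout comes in as three string columns, one per grouping (the variable names here are hypothetical):
Code:
clear
input str20(advanced emerging lowincome)
"Australia" "Albania" "Afghanistan"
"Austria"   "Algeria" "Bangladesh"
end
rename (advanced emerging lowincome) (country1 country2 country3)
generate long row = _n
reshape long country, i(row) j(type)
drop if country == ""
label define typelbl 1 "Advanced Economies" 2 "Emerging Market Economies" ///
    3 "Low-Income Developing Countries"
label values type typelbl
keep country type
sort type country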
Thank you in advance,
Side-by-side boxplots with markers for means
Hello,
I have country-level panel data for GDP per capita for the past 10 years that I wish to represent in a side-by-side boxplot (by years. So my x-axis would be years, y-axis would be GDP per capita).
I also have categories of countries. Some countries are classified as "fragile" and others are "non-fragile".
Is there a way to overlay the mean GDP value of "fragile" countries per year on the boxplot graphs? I am aware that others have attempted to do this using the "twoway rbar" method, but I was wondering if there is a more efficient way, preferably sticking to the "graph box" syntax.
Creating a scatterplot with two different variables
How do I create a scatterplot with two different variables, one being dependent and one being independent?
the original question is:
Create a scatterplot with “salary” as the dependent variable and “unemployed” as the independent variable.
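In Stata's twoway syntax the dependent variable comes first (y axis) and the independent variable second (x axis); a one-line sketch:
Code:
scatter salary unemployed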
Callagain command by Behaghel et al.
Hi
I'm currently trying to run the command "callagain" by Behaghel et al. (to be found here). Unfortunately, the command does not seem to work. One file just opens the help page in Stata for the command, and the ado-file cannot be run through, as there are several error messages from line 433 on. Has anybody ever had a similar problem? Is there a quick fix for this?
Thanks a lot for the help!
Best,
Arto
Writing a formula in which stata chooses a value or 1
Hi!
I'm trying to use this formula to calculate eGFR: eGFR = 142 * min(standardized Scr/K, 1)^α * max(standardized Scr/K, 1)^-1.200 * 0.9938^Age * 1.012 [if female]
Scr = creatinine, e.g. creatinine = 2.4.
In this formula, min(standardized Scr/K, 1) means that you choose whichever is smaller: the creatinine value divided by kappa, or 1. The same goes for the max term.
I'm looking for advice on how to write the command so Stata realizes this. Grateful for any help!
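Stata's min() and max() functions take several arguments and return the smallest/largest, which is exactly what the formula needs. A sketch, assuming variables scr, age, and a female dummy, with the sex-specific kappa and alpha constants of the 2021 CKD-EPI equation (0.7/0.9 and -0.241/-0.302; check these against your reference):
Code:
generate kappa = cond(female, 0.7, 0.9)
generate alpha = cond(female, -0.241, -0.302)
generate ratio = scr / kappa
generate egfr  = 142 * min(ratio, 1)^alpha * max(ratio, 1)^(-1.200) ///
    * 0.9938^age * cond(female, 1.012, 1)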
Adding rows under a variable
What is the Stata command for adding rows (say, 10 rows) under a variable that has a fixed number of observations (like 40) for each unique id?
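A sketch of one reading of this: append 10 extra rows, with only the id filled in, for each unique id:
Code:
preserve
bysort id: keep if _n == 1      // one row per id
keep id
expand 10                       // 10 id-only skeleton rows per id
tempfile extra
save `extra'
restore
append using `extra'
sort id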
Tuesday, November 22, 2022
please help. making bar graph more simple with "over" options
Hi, guys.
I'm a newcomer to Stata, and I'm having some trouble making a bar graph.
My data has 46 observations and a numeric variable for displaying year-month.
Here is the code I ran:
Code:
graph bar fluct, over(year, label(labsize(vsmall) angle(45))) ///
    blabel(bar, size(vsmall) format(%9.1f)) ///
    ytitle("Fluctuation") graphregion(color(white))
and the graph it produces: [bar chart image omitted]
Four problems I want to solve:
1) There are too many x-axis labels, so I want to display only some of them (e.g. 201910, 202010, 202110, 202210). The graph aims to show the fluctuation in a specific month across years.
2) Similar to problem 1), I want to display only some of the bar labels (the "blabel" option).
3) Some bar labels overlap. I want to show them without overlapping.
4) I want to color only the last bar.
I attach the data here. Please help.
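A sketch of one common workaround for problems 1) and 4): switch to twoway bar, where selective x labels and a single highlighted bar are straightforward. labmask comes from the labutil package on SSC, and the xlabel positions are guesses at where the Octobers fall in these 46 observations:
Code:
generate t = _n
labmask t, values(year)            // ssc install labutil
twoway (bar fluct t if t < _N) ///
       (bar fluct t if t == _N, color(red)), ///
    xlabel(4(12)46, valuelabel angle(45) labsize(vsmall)) ///
    legend(off) ytitle("Fluctuation") graphregion(color(white))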
Please help cant work out how to t-test. Im new to stata
Hi guys, I'm struggling to compare 2 different regions with a t-test.
As seen from the screenshots I have multiple regions and have generated new separate variables for those 2 regions I want to test.
One of my dependent variables is "patience".
However, when I run the ttest it compares the region I select in the GroupVariable name with all the other existing regions. I am only looking to test patience on those 2 specific regions.
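There is no need to generate separate variables: an if restriction keeps the original region variable usable directly (the codes 1 and 2 below are placeholders for the two regions of interest):
Code:
ttest patience if inlist(region, 1, 2), by(region)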
Can someone help me please? I'm a beginner.
Thanks
Pseudo Panel Data and Mediation Analysis
Dear Stata Experts
I'm a PhD student, and I'd like to know what code to use to perform a mediation analysis with pseudo-panel data.
Using cii proportions with loop
Hello,
I am trying to compute confidence intervals for proportions (number of cases / total) on each observations of a simple database using the ci proportions / cii proportions command.
Here is my code and the database:
Code:
input str7 continent cases total
Africa 544 863
America 43 172
Asia 372 734
Oceania 19 25
end

local contlist Africa America Asia Oceania
foreach continent of local contlist {
    ci proportions total cases if continent == "`continent'"
}
This gives an output with empty values. I have also tried cii proportions, which was also unsuccessful. I can get the results one by one, by typing the following command (for Africa, for instance), but I would like to have it automatized:
Code:
cii proportions 863 544
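Since ci proportions expects 0/1 variables rather than aggregated counts, one sketch is to loop over the observations and feed the counts to cii directly (on Stata 14 the proportions keyword may need to be dropped, i.e. cii 863 544):
Code:
forvalues i = 1/`=_N' {
    display as text continent[`i'] ":"
    cii proportions `=total[`i']' `=cases[`i']'
}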
Does anyone have an idea on how to solve this?
I am using Stata 14.
Many thanks in anticipation
Monday, November 21, 2022
LSDV and collinearity
Hi everyone,
I am having trouble implementing a simple least square dummy variable (LSDV) model.
The model I am implementing is the following:
reg log_wage i.vet_yes c.age i.vet_yes#c.age vetcountry female $control daustria dcanada dczechrepublic ddenmark destonia dfinland dfrance dgermany direland djapan dkorea dnetherlands dnorway dpoland dslovakrepublic dspain dsweden duk dusa [pw=weight_adjusted] , vce(robust)
In particular, vetcountry takes value of one if the country is a vocational oriented country, zero otherwise.
Now the main problem is that when running this regression Stata drops two variables (dcanada and dgermany: a vocational and a general oriented country).
Am I right to believe that the coefficient for vetcountry is only estimated because two country dummies were dropped from the model? Given that, does it mean that the estimated coefficient for vetcountry might not be reliable? If so, do you have any suggestions on how to overcome this problem, as I really need to estimate this essential independent variable?
Many thanks
Space between axis and line chart
Hi, I have a dataset starting in February 2020, but when I create a chart, for some reason the x axis starts on the first of January, so there is a gap between the line and the axis. Any help on how to remove it would be much appreciated!
My code is:
Code:
twoway line varx1 date_num
and my data is:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(date_num varx2) 21960 .03993094 21961 .028612074 21962 .035981826 21963 .03093639 21964 .023211634 21965 .05790542 21966 .07603507 21967 .08387616 21968 .1548472 21969 .17819597 21970 .3043231 21971 .24888514 21972 .4006798 21973 .6627359 21974 .6639794 21975 1.4217416 21976 1.7526433 21977 1.692854 21978 1.842995 21979 2.2927668 21980 3.126146 21981 3.62297 21982 4.1054153 21983 6.309479 21984 8.616345 21985 8.994414 21986 11.120467 21987 14.858947 21988 15.002494 21989 15.87091 21990 19.247196 21991 17.635626 21992 18.665474 21993 19.430426 21994 22.93767 21995 16.207178 21996 14.968385 21997 18.667145 21998 15.427457 21999 15.471897 22000 13.691607 22001 13.549828 22002 11.70698 22003 9.430447 22004 12.200086 22005 9.846559 22006 12.64052 22007 10.324313 22008 9.74395 22009 8.157842 22010 6.826304 22011 8.218202 22012 6.821047 22013 7.32387 22014 6.679968 22015 6.459072 22016 5.239533 22017 4.6077657 22018 4.909679 22019 4.310663 22020 4.1643333 22021 4.0630765 22022 3.8334265 22023 3.1395385 22024 2.461292 22025 3.1197665 22026 2.5037735 22027 3.111698 22028 2.3318615 22029 2.1955478 22030 1.6780353 22031 1.0646594 22032 1.4131098 22033 1.661886 22034 1.460008 22035 1.4889643 22036 1.4995496 22037 1.194172 22038 .8829122 22039 1.0871441 22040 .5983202 22041 .6902591 22042 .58275956 22043 .6993674 22044 .5696148 22045 .6091068 22046 1.35688 22047 1.0096726 22048 1.0628527 22049 1.0090808 22050 .9965445 22051 .6764844 22052 .7071733 22053 .9780415 22054 1.0201133 22055 .7530233 22056 .749878 22057 .9125931 22058 .7052158 22059 .4717451 end format %td date_num
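A sketch of one fix: the gap usually comes from the default axis labels starting at a round date before the data, so anchoring xlabel() at the first observed date removes it (varx2 is the variable name in the data example):
Code:
quietly summarize date_num
local first = r(min)
local last  = r(max)
twoway line varx2 date_num, xlabel(`first'(20)`last', format(%td) angle(45))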