Hi, I am doing a repeated cross-sectional analysis over 10 years (discharge_year) of a categorical measure (oc_mix) which has 3 categories (weak, strong, both). I started with a descriptive analysis: I plotted a stacked bar chart and calculated the number and proportion in each category each year. I now want to test whether the upward, downward, or fluctuating trend seen in each category is statistically significant, but I am not sure which test to use. Is it the chi-square test for trend, ologit, or something else, and how do I use it in Stata?
discharge_year   strong   weak   both
2010               1390    500    200
2011               1450    600    300
2012               1500    679    400
2013               1600    800    100
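A minimal sketch of two starting points, assuming the data are in long form (one row per discharge) and that oc_mix is numerically coded; the coding 2 = strong below is hypothetical. Note that an ordered-logit route would require a defensible ordering of weak/strong/both, which these categories may not have.
Code:
* overall association between year and category
tab discharge_year oc_mix, chi2

* trend in one category at a time: model membership in that
* category as a function of year (repeat for weak and both)
gen byte is_strong = (oc_mix == 2)
logit is_strong discharge_year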
Wednesday, November 30, 2022
how to create the start and end date for a year
Dear statalist,
This might be an easy question, but I didn't figure out how to do this. Say I have a set of years, 2010, 2014, 2015, 2019, 2020 etc, I want to create two dates, start date and end date, to account for the first day and last day of the year, e.g., 01jan2010 and 31dec2010, how to create these two variables? Thanks a lot for your help.
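A minimal sketch, assuming the years are stored in a numeric variable named year: the mdy() function builds daily dates from month, day, and year.
Code:
gen startdate = mdy(1, 1, year)
gen enddate   = mdy(12, 31, year)
format startdate enddate %td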
Perform a paired t-test on two subsets of the same variable
I have a dataset that looks like this:
For each variable (var1, var2, var3, etc.), I need to perform a t-test comparing its mean when attribute = 1 against the mean of the whole variable. How can I do this?
var1 | var2 | var3 | attribute |
1 | 0.93 | 0.88 | 1 |
1 | 0.76 | 0.20 | 1 |
1 | 0.40 | 0.18 | 0 |
0 | 0.34 | 0.91 | 1 |
0 | 0.09 | 0.51 | 0 |
... | ... | ... | ... |
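One way to read this is as a one-sample t-test of each variable within the attribute == 1 subset against that variable's overall mean. A minimal sketch (note the subset is part of the whole, so the two means are not independent; a genuinely paired test would need a different setup):
Code:
foreach v of varlist var1 var2 var3 {
    summarize `v', meanonly
    local mu = r(mean)
    ttest `v' == `mu' if attribute == 1
}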
OLS Regression - explanation of coefficients with control variables
Good morning,
I'm doing an OLS regression where the dependent variable is having a health number (Health_Number), a dummy that takes the value 1 if a migrant has the health number and 0 if not, and the independent variable is years since migrating (YSM). The aim is to see whether the years since arrival in the destination country affect possession of a health number, which allows access to healthcare.
For that I used the controls of Destination Network, having a health visa, being a female, age, having completed at least 12 years of schooling and being employed.
I am, however, having a hard time understanding the main coefficient, since it does not change much across specifications (0.070, 0.071, 0.072, 0.071).
Does this mean that the control variables are not explaining much of what's going on?
Or does it mean the controls are cancelling each other out?
Besides, what does it mean if, in some of the columns where I added the different specifications, the main coefficient (the relation between YSM and the dependent variable) loses significance? For instance, instead of 1% significance it has 5% significance, although the value is similar?
Code:
regress Health_Number YSM
outreg2 using Regression0, excel append ctitle(Basic) dec(3)

regress Health_Number YSM Dest_Network
outreg2 using Regression0, excel append ctitle(Network) dec(3)

regress Health_Number YSM Dest_Network Health_Visa
outreg2 using Regression0, excel append ctitle(Having a Health Visa) dec(3)

regress Health_Number YSM Dest_Network Health_Visa Female Age AtLeast_CompletedSecondaryEduc Employed
outreg2 using Regression0, excel append ctitle(Migrant Controls) dec(3)
Thank you
How to write this with a loop (repetitive coding with numerical values that do not follow each other)
Hello everyone,
How can I write this faster?
Any suggestions, please? Is it possible to write a loop, even though the numerical values of the "gan1" variable do not follow each other?
Code:
svy: prop rech if form1==2 & gan1==14
svy: prop rech if form1==2 & gan1==12
svy: prop rech if form1==2 & gan1==8
svy: prop rech if form1==2 & gan1==6
svy: prop rech if form1==2 & gan1==5
svy: prop rech if form1==2 & gan1==4
svy: prop rech if form1==2 & gan1==3
svy: prop rech if form1==2 & gan1==2
svy: prop rech if form1==2 & gan1==1
svy: prop rech if form1==3 & gan1==14
svy: prop rech if form1==3 & gan1==12
svy: prop rech if form1==3 & gan1==8
svy: prop rech if form1==3 & gan1==6
svy: prop rech if form1==3 & gan1==5
svy: prop rech if form1==3 & gan1==4
svy: prop rech if form1==3 & gan1==3
svy: prop rech if form1==3 & gan1==2
svy: prop rech if form1==3 & gan1==1
svy: prop rech if form1==1 & gan1==14
svy: prop rech if form1==1 & gan1==12
svy: prop rech if form1==1 & gan1==8
svy: prop rech if form1==1 & gan1==6
svy: prop rech if form1==1 & gan1==5
svy: prop rech if form1==1 & gan1==4
svy: prop rech if form1==1 & gan1==3
svy: prop rech if form1==1 & gan1==2
svy: prop rech if form1==1 & gan1==1
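A minimal sketch: because the gan1 values are a fixed, non-consecutive set, they can simply be listed in a numlist; the loop runs the same 27 estimations in the same order as above.
Code:
foreach f of numlist 2 3 1 {
    foreach g of numlist 14 12 8 6 5 4 3 2 1 {
        svy: prop rech if form1 == `f' & gan1 == `g'
    }
}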
Thank you so much.
Michael
Two way graph with multiple lines in a loop
Hello everyone, I am trying to draw a graph with multiple lines. What I want is a single graph that shows diffmain, diff4, diff5, diff21, and diff34, as in the image. I am trying to do it with a loop, but it does not work. Does anyone have an idea of how to run it?
Code:
levelsof state if (state==4 | state==5 | state==21 | state==34), local(levels)
4 5 21 34

use "PS2_data", clear
levelsof state if (state==4 | state==5 | state==21 | state==34), local(levels)
use "total_results34", clear
foreach l of local levels {
    drop if rmspe`l' >= 2*rmspemain
    twoway (line diffmain _time, color(black)), name(gmain, replace)
    twoway ((line diff`l' _time, color(gray)), ///
        xline(1989, lcolor(gray) lpattern(dash)) ///
        yline(0, lcolor(gray) lpattern(dash))), name(g`l', replace)
    graph combine gmain g`l'
}
Code:
_time rmspemain diffmain rmspe4 diff4 rmspe5 diff5 rmspe21 diff21 rmspe34 diff34) 1970 1.954279 4.857199 2.680468 7.661901 2.1965363 6.316099 3.5513296 10.3319 3.185257 9.161399 1971 1.954279 1.8468013 2.680468 4.467302 2.1965363 1.678701 3.5513296 4.8542 3.185257 4.4762006 1972 1.954279 -.8392038 2.680468 2.714297 2.1965363 .7133963 3.5513296 -2.419502 3.185257 -2.767302 1973 1.954279 -2.0618958 2.680468 2.5514026 2.1965363 -1.788496 3.5513296 -1.9223986 3.185257 -1.2888986 1974 1.954279 -.3828028 2.680468 1.7265977 2.1965363 -.2369025 3.5513296 -.5969042 3.185257 .1664961 1975 1.954279 .5749984 2.680468 .24679898 2.1965363 .3920981 3.5513296 1.0028981 3.185257 1.4219983 1976 1.954279 .3586017 2.680468 .6195006 2.1965363 -.09239785 3.5513296 -.3425991 3.185257 .9643013 1977 1.954279 1.5313005 2.680468 1.0875012 2.1965363 1.366001 3.5513296 -.9009992 3.185257 .35590065 1978 1.954279 2.3699 2.680468 1.039699 2.1965363 2.0144997 3.5513296 .8982986 3.185257 1.6601986 1979 1.954279 -1.4309988 2.680468 -2.4031 2.1965363 -1.730199 3.5513296 -1.1398989 3.185257 -.5653989 1980 1.954279 -.13320343 2.680468 .1253957 2.1965363 -.3608031 3.5513296 -1.229802 3.185257 -1.349701 1981 1.954279 -2.0308006 2.680468 -1.513 2.1965363 -2.757701 3.5513296 -2.3394022 3.185257 -2.7693014 1982 1.954279 -.9976991 2.680468 -.5988988 2.1965363 -1.8019996 3.5513296 -2.633698 3.185257 -2.3449986 1983 1.954279 -.9086969 2.680468 1.0149046 2.1965363 -1.1320968 3.5513296 -4.579699 3.185257 -4.761999 1984 1.954279 1.3141013 2.680468 3.783601 2.1965363 1.093001 3.5513296 -4.6354966 3.185257 -3.2002964 1985 1.954279 -.8733967 2.680468 .3187043 2.1965363 -1.5047966 3.5513296 -3.321697 3.185257 -2.688096 1986 1.954279 -1.2375056 2.680468 .05029415 2.1965363 -1.1422058 3.5513296 -3.9199016 3.185257 -3.229702 1987 1.954279 -3.6921985 2.680468 -2.598498 2.1965363 -4.066399 3.5513296 -3.0094006 3.185257 -2.0427008 1988 1.954279 -2.3528 2.680468 -2.7908006 2.1965363 -1.5935994 3.5513296 -2.8035014 3.185257 -1.9126003 1989 1.954279 -7.688198 2.680468 -9.372597 2.1965363 -7.157198 3.5513296 -5.6707 3.185257 -4.5439997 1990 1.954279 -9.518498 2.680468 -9.311898 2.1965363 -9.742398 3.5513296 -10.726597 3.185257 -11.082798 1991 1.954279 -13.776502 2.680468 -13.709203 2.1965363 -13.9996 3.5513296 -18.012802 3.185257 -16.981604 1992 1.954279 -13.3233 2.680468 -14.2893 2.1965363 -13.5949 3.5513296 -18.1126 3.185257 -17.167501 1993 1.954279 -17.057299 2.680468 -17.5239 2.1965363 -17.500498 3.5513296 -21.837196 3.185257 -20.791197 1994 1.954279 -20.9162 2.680468 -21.6492 2.1965363 -21.8273 3.5513296 -26.5969 3.185257 -27.2637 1995 1.954279 -19.8731 2.680468 -22.1855 2.1965363 -20.6644 3.5513296 -24.5541 3.185257 -23.0643 1996 1.954279 -21.0376 2.680468 -23.1864 2.1965363 -21.874 3.5513296 -24.3403 3.185257 -22.0078 1997 1.954279 -21.4709 2.680468 -24.2357 2.1965363 -22.5902 3.5513296 -24.3264 3.185257 -21.6234 1998 1.954279 -19.1829 2.680468 -20.5654 2.1965363 -20.3489 3.5513296 -26.0876 3.185257 -25.0846 1999 1.954279 -24.5438 2.680468 -26.1011 2.1965363 -25.6143 3.5513296 -27.8567 3.185257 -27.7196 2000 1.954279 -24.2594 2.680468 -25.1009 2.1965363 -25.0719 3.5513296 -27.6924 3.185257 -26.4853 end
using newey west for heteroskedasticity and autocorrelation in vecm
So, I am trying to run a regression that requires me to log the variables and take first differences to render them stationary. That is why I am using a VECM.
I ran the diagnostic tests, and everything is fine except that I have heteroskedasticity and autocorrelation. So I tried Newey-West SEs in Stata, and that fixed the problem. However, I have one question. Should I run, say,
newey d.log(y) d.log(x1) d.log(x2) lag() or only newey log(y) log(x1) log(x2) lag() to follow the VECM?
Thank you in advance!
Tuesday, November 29, 2022
creating a dummy variable based on percentage
In the following sample dataset, US House election results are given for 2002-2020. candidatevotes indicates how many votes the candidate representing the party variable (democrat, republican, green, independent) received, and the totalvotes variable indicates how many votes were cast in that state-district.
I want to create an indicator of the incumbent House Representative being of the same party as the President (the democrat_pres variable tells whether the year has a Democrat president or not).
Also, I want to create a competitive indicator which will hold 1 if the Democratic vote share is 40-45%, 2 if it is 46-50%, 3 if it is 51-55%, 4 if it is 55-60%, and 5 if it is above 60%.
Can anyone kindly guide me on how I can do the above?
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int year str20 state byte(district state_fips) str47 party long(candidatevotes totalvotes) float democrat_pres 2016 "ALABAMA" 1 1 "REPUBLICAN" 208083 215893 1 2016 "ALABAMA" 1 1 "" 7810 215893 1 2016 "ALABAMA" 2 1 "REPUBLICAN" 134886 276584 1 2016 "ALABAMA" 2 1 "DEMOCRAT" 112089 276584 1 2016 "ALABAMA" 2 1 "" 29609 276584 1 2016 "ALABAMA" 3 1 "DEMOCRAT" 94549 287104 1 2016 "ALABAMA" 3 1 "REPUBLICAN" 192164 287104 1 2016 "ALABAMA" 3 1 "" 391 287104 1 2016 "ALABAMA" 4 1 "REPUBLICAN" 235925 239444 1 2016 "ALABAMA" 4 1 "" 3519 239444 1 2016 "ALABAMA" 5 1 "REPUBLICAN" 205647 308326 1 2016 "ALABAMA" 5 1 "DEMOCRAT" 102234 308326 1 2002 "CALIFORNIA" 6 6 "REPUBLICAN" 62052 209563 0 2002 "CALIFORNIA" 6 6 "LIBERTARIAN" 4936 209563 0 2002 "CALIFORNIA" 7 6 "REPUBLICAN" 36584 138376 0 2002 "CALIFORNIA" 7 6 "DEMOCRAT" 97849 138376 0 2002 "CALIFORNIA" 7 6 "LIBERTARIAN" 3943 138376 0 2002 "CALIFORNIA" 8 6 "REPUBLICAN" 20063 160441 0 2002 "CALIFORNIA" 8 6 "LIBERTARIAN" 2659 160441 0 2002 "CALIFORNIA" 8 6 "GREEN" 10033 160441 0 2002 "CALIFORNIA" 8 6 "DEMOCRAT" 127684 160441 0 2002 "CALIFORNIA" 8 6 "" 2 160441 0 2002 "CALIFORNIA" 9 6 "DEMOCRAT" 135893 166917 0 2002 "CALIFORNIA" 9 6 "" 6 166917 0 2002 "CALIFORNIA" 9 6 "LIBERTARIAN" 5685 166917 0 2002 "CALIFORNIA" 9 6 "REPUBLICAN" 25333 166917 0 2002 "CALIFORNIA" 10 6 "DEMOCRAT" 126390 167197 0 2002 "CALIFORNIA" 10 6 "LIBERTARIAN" 40807 167197 0 2020 "ARIZONA" 1 4 "REPUBLICAN" 176709 365178 0 2020 "ARIZONA" 1 4 "DEMOCRAT" 188469 365178 0 2020 "ARIZONA" 2 4 "DEMOCRAT" 209945 381054 0 2020 "ARIZONA" 2 4 "REPUBLICAN" 170975 381054 0 2020 "ARIZONA" 2 4 "WRITE-IN (COMMON SENSE MODERATE)" 35 381054 0 2020 "ARIZONA" 2 4 "WRITE-IN (INDEPENDENT)" 99 381054 0 2020 "ARIZONA" 3 4 "REPUBLICAN" 95594 269837 0 2020 "ARIZONA" 3 4 "DEMOCRAT" 174243 269837 0 2020 "ARIZONA" 4 4 "WRITE-IN (INDEPENDENT)" 39 398623 0 2020 "ARIZONA" 4 4 "WRITE-IN (LIBERTARIAN)" 67 398623 0 2020 "ARIZONA" 4 4 "DEMOCRAT" 120484 398623 0 2020 "ARIZONA" 4 4 "WRITE-IN (REPUBLICAN)" 5 398623 0 2020 "ARIZONA" 4 4 "WRITE-IN (INDEPENDENT)" 7 398623 0 2020 "ARIZONA" 4 4 "WRITE-IN (DEMOCRATIC)" 19 398623 0 2020 "ARIZONA" 4 4 "REPUBLICAN" 278002 398623 0 2020 "ARIZONA" 5 4 "REPUBLICAN" 262414 445657 0 end
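A minimal sketch, under several assumptions worth checking: the winner of each district-year is the candidate with the most votes, a non-Democrat president is Republican, and "above 60%" is the intended top bracket. All variable names created below (dem_share, is_winner, same_party, competitive) are illustrative.
Code:
* Democratic vote share per district-year (carried on the DEMOCRAT row)
gen double dem_share = 100 * candidatevotes / totalvotes if party == "DEMOCRAT"

* flag the winning candidate within each district-year
bysort year state district (candidatevotes): gen byte is_winner = (_n == _N)
* winner's party matches the president's party; spread to all rows of the district-year
gen byte match = is_winner & ((party == "DEMOCRAT") == (democrat_pres == 1))
egen same_party = max(match), by(year state district)

* competitive indicator from the Democratic vote share
gen byte competitive = .
replace competitive = 1 if inrange(dem_share, 40, 45)
replace competitive = 2 if dem_share > 45 & dem_share <= 50
replace competitive = 3 if dem_share > 50 & dem_share <= 55
replace competitive = 4 if dem_share > 55 & dem_share <= 60
replace competitive = 5 if dem_share > 60 & !missing(dem_share)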
local macro text with line break
Hi there
Does anyone know if there is a way to spread the text contents of a local macro across multiple lines?
The desired result is to display:
First line
Second line
Code:
local lines """First line" "Second line"""
disp "`lines'"
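One possibility, as a hedged sketch: embed a linefeed character, char(10), in the macro so that display outputs the contents across two lines.
Code:
local lines = "First line" + char(10) + "Second line"
display "`lines'"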
Editing graph with a TIFF file
Hi everyone,
I am trying to find out whether there is a way to edit my graph using Stata's Graph Editor with just the TIF file. The graph was produced in Stata a few months back by my colleague, and I no longer have the data to reproduce and edit it.
Is there a way to import the TIFF file into Stata just so I could edit the graph?
Thanks
pweight with melogit
I have a panel dataset with 260,647 data points from respondents within 43,400 households, observed over between 1 and 11 survey waves. Using this dataset, I am trying to run a logistic random-effects regression. So far I am using xtlogit, which works fine except that it doesn't allow me to use the survey weights needed for valid statistical inference. Additionally, I can't account for the household level in the model, which is why I am currently using robust standard errors. I would prefer to specify a model that allows me to include my survey weight with [pweight] and to explicitly model both the panel structure and the multilevel structure. The model looks like this:
(I am using the union data here because I am not allowed to share the actual data that I am using)
The multilevel structure can be added using xtmelogit, but as I understand it, it is no longer part of official Stata, which makes me a bit suspicious; also, it doesn't allow me to use pweights (and it takes a really long time to run even for the empty model).
A good option seems to be melogit, but for some reason it doesn't converge, even when fitting the empty model without the household level, as soon as I include the weights. The model looks like this (of course, there is no weighting variable in the union dataset; to illustrate, I have created a weighting variable pw = 1, but with this the model runs, so this only shows what my code looks like and unfortunately can't replicate the problem):
With my data, the output from that model looks like this:
Does anyone have an idea what could be the issue here, even without looking at the data? Or maybe there is another way to specify xtlogit-like models in Stata that allows both pweights and another level?
Code:
webuse union.dta, clear
xtset idcode year
xtlogit union, vce(robust) re
Code:
gen pw = 1
melogit union [pw=pw] || idcode:
Fitting fixed-effects model:
Iteration 0: log likelihood = -4.512e+08
Iteration 1: log likelihood = -4.500e+08
Iteration 2: log likelihood = -4.500e+08
Iteration 3: log likelihood = -4.500e+08
Refining starting values:
Grid node 0: log likelihood = .
Grid node 1: log likelihood = .
Grid node 2: log likelihood = .
Grid node 3: log likelihood = .
(note: Grid search failed to find values that will yield a log likelihood value.)
Fitting full model:
initial values not feasible
r(1400);
Monday, November 28, 2022
Formal tests of volatility in STATA17
Dear Statalist:
I have longitudinal data on employees across multiple years and organizations. I've been making a series of line graphs to visualize the trends in employee size across time by organization. These are intuitive, but I wonder if there are formal tests/commands that test volatility. My goal is to see which organizations had the greatest fluctuation in employee size (and salary) over the years. Thanks in advance; here's an example of my data, where "agy" is the organization and "adjbasicpay" the salary.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float year str2 agy long adjbasicpay 1973 "AG" 18080 1973 "AG" 36000 1973 "AG" 34992 1973 "AG" 26671 1973 "AG" 36000 1973 "AG" 33915 1973 "AG" 12056 1973 "AG" 24247 1973 "AG" 30147 1973 "AG" 28267 1973 "AG" 34971 1973 "AG" 36000 1973 "AG" 9235 1973 "AG" 11961 1973 "AG" . 1973 "AG" 15009 1973 "AG" 33899 1973 "AG" 9144 1973 "AG" 36000 1973 "AG" 30486 1973 "AG" 30147 1973 "AG" 16609 1973 "AG" 12634 1973 "AG" 34074 1973 "AG" 9969 1973 "AG" 9874 1973 "AG" 24247 1973 "AG" 21014 1973 "AG" 40000 1973 "AG" 7198 1973 "AG" 14053 1973 "AG" 24247 1973 "AG" . 1973 "AG" 15331 1973 "AG" 36000 1973 "AG" 10002 1973 "AG" 31089 1973 "AG" 14671 1973 "AG" 16609 1973 "AG" 26898 1973 "AG" 13379 1973 "AG" 36000 1973 "AG" 32973 1973 "AG" 17497 1973 "AG" 10528 1973 "AG" 11961 1973 "AG" 8757 1973 "AG" 15331 1973 "AG" 34971 1973 "AG" 38000 1973 "AG" 12373 1973 "AG" 24247 1973 "AG" 15609 1973 "AG" 12501 1973 "AG" 33177 1973 "AG" 12283 1973 "AG" 8591 1973 "AG" 17605 1973 "AG" 15649 1973 "AG" 28267 1973 "AG" 13687 1973 "AG" 31089 1973 "AG" 12283 1973 "AG" 31089 1973 "AG" 34971 1973 "AG" 21671 1973 "AG" 34965 1973 "AG" 13996 1973 "AG" 8299 1973 "AG" 19246 1973 "AG" 23858 1973 "AG" 10234 1973 "AG" 25055 1973 "AG" 13336 1973 "AG" 11297 1973 "AG" 12979 1973 "AG" 33899 1973 "AG" 8299 1973 "AG" 36000 1973 "AG" 12056 1973 "AG" 14603 1973 "AG" 12634 1973 "AG" 31383 1973 "AG" 14928 1973 "AG" 36000 1973 "AG" 9493 1973 "AG" 36000 1973 "AG" 38000 1973 "AG" 10860 1973 "AG" 28267 1973 "AG" 29205 1973 "AG" 34971 1973 "AG" 36000 1973 "AG" 36000 1973 "AG" 33177 1973 "AG" 16609 1973 "AG" 12985 1973 "AG" 23088 1973 "AG" 15945 1973 "AG" 15009 end
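There is no single canonical volatility test here, but a simple descriptive route is to summarize each organization's year-to-year dispersion, for example as a coefficient of variation. A minimal sketch, assuming headcount can be proxied by the number of salary records per agy-year (all created names are illustrative):
Code:
preserve
* one row per organization-year: headcount and mean salary
collapse (count) nemp = adjbasicpay (mean) meanpay = adjbasicpay, by(agy year)
* dispersion across years within organization
collapse (mean) mu_n = nemp (sd) sd_n = nemp ///
         (mean) mu_pay = meanpay (sd) sd_pay = meanpay, by(agy)
gen cv_n   = sd_n / mu_n
gen cv_pay = sd_pay / mu_pay
gsort -cv_n
list agy cv_n cv_pay
restore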
Color changes with PNG export.
On the left, we have the graph I actually want (please ignore the slight transparency difference between legend and data colors). On the right, we have the PNG output from the export command shown in the first code block below, run immediately after the production of the graph. I tried the printcolor setting in the second code block, but this had no effect. I have had no problem producing PNG files with these colors previously. I have also tried exporting to PDF, which gives different colors again than either of these.
Code:
graph export XXXX.png, replace
Code:
set printcolor asis
Generating the "opposing" variable in a long dataset
Hi all,
I am using Stata 17/SE on Mac and I am having trouble generating a variable using another observation within a group.
For context, I am working with tennis data. I have two rank variables: a singles ranking (rank_single) and a doubles ranking (rank_dbls).
The singles rank reflects the player's own rank, while the doubles rank is the average of the team's doubles rankings: egen var = mean(var), by(i).
i refers to the match number; j is the player number, where 1-2 is team 1 and 3-4 is team 2; the Ranking_* variables refer to the original ranking data (MS = men's singles, MD = men's doubles, etc.).
My question is: is there a way to generate the opposing player/team's ranking in this long dataset? (I have this for the tournament seed variable, where t_ refers to the player and o_ to the opponent.)
Any help is appreciated!
Code:
input float i byte(j team p_pos t_tourn_seed o_tourn_seed) int(Ranking_MS Ranking_MD Ranking_WS Ranking_WD) float(rank_single rank_dbls) 367 1 1 1 4 . . . . . . . 367 3 2 1 . 4 . . 1326 . 1326 . 368 1 1 1 . 5 . . 1028 638 1028 638 368 3 2 1 5 . . . 536 626 536 626 369 1 1 1 . 3 . . . . . . 369 3 2 1 3 . . . 484 587 484 587 370 1 1 1 . 5 . . . . . . 370 3 2 1 5 . . . 536 626 536 626 371 1 1 1 . . . . . . . . 371 3 2 1 . . . . . . . . 372 1 1 1 4 . . . . . . . 372 3 2 1 . 4 . . . . . . 373 1 1 1 . . . . 692 . 692 . 373 3 2 1 . . . . . . . . 374 1 1 1 . 7 . . 1326 . 1326 . 374 3 2 1 7 . . . 612 620 612 620 375 1 1 1 4 2 . . . . . . 375 3 2 1 2 4 . . 326 324 326 324 376 1 1 1 . . . . 1326 . 1326 . 376 3 2 1 . . . . 986 . 986 . 377 1 1 1 . 3 . . 999 . 999 . 377 3 2 1 3 . . . 484 587 484 587 378 1 1 1 6 . . . 631 454 631 454 378 3 2 1 . 6 . . 938 . 938 . 379 1 1 1 1 4 . . 187 405 187 405 379 3 2 1 4 1 . . . . . . 380 1 1 1 . . . . 1168 . 1168 . 380 3 2 1 . . . . 825 852 825 852 381 1 1 1 1 5 . . 187 405 187 405 381 3 2 1 5 1 . . 536 626 536 626 382 1 1 1 . 3 . . . . . . 382 3 2 1 3 . . . 484 587 484 587 383 1 1 1 6 . . . 631 454 631 454 383 3 2 1 . 6 . . 915 891 915 891 384 1 1 1 1 . . . 187 405 187 405 384 3 2 1 . 1 . . . . . . 385 1 1 1 . . . . 692 . 692 . 385 3 2 1 . . . . 999 . 999 . 386 1 1 1 . . . . . . . . 386 3 2 1 . . . . 1307 . 1307 . 387 1 1 1 1 . . . 187 405 187 405 387 3 2 1 . 1 . . 925 . 925 . 388 1 1 1 . 2 . . . . . . 388 3 2 1 2 . . . 326 324 326 324 389 1 1 1 . 2 . . 1168 . 1168 . 389 3 2 1 2 . . . 326 324 326 324 390 1 1 1 . 2 . . 999 . 999 . 390 3 2 1 2 . . . 326 324 326 324 391 1 1 1 6 2 . . 631 454 631 454 391 3 2 1 2 6 . . 326 324 326 324 392 1 1 1 . . . . 1028 638 1028 638 392 3 2 1 . . . . 752 . 752 . 393 1 1 1 . 7 . . 1153 . 1153 . 393 3 2 1 7 . . . 612 620 612 620 394 1 1 1 . . . . . . . . 394 3 2 1 . . . . 1119 744 1119 744 395 1 1 1 . . . . 999 . 999 . 395 3 2 1 . . . . . . . . 396 1 1 1 4 . . . . . . . 396 3 2 1 . 4 . . . . . . 397 1 1 1 . . . . 1294 . 1294 . 397 3 2 1 . . . . 938 . 938 . 398 1 1 1 . . . . . . . 907 398 2 1 2 . . . . 1383 907 1383 907 398 3 2 1 . . . . 1101 927 1101 927 398 4 2 2 . . . . . . . 927 399 1 1 1 . . . . . . . . 399 2 1 2 . . . . . . . . 399 3 2 1 . . . . . . . . 399 4 2 2 . . . . . . . . 400 1 1 1 . 2 . . 915 891 915 891 400 2 1 2 . 2 . . 1168 . 1168 891 400 3 2 1 2 . . . . . . . 400 4 2 2 2 . . . . . . . 401 1 1 1 1 3 . . 187 405 187 364.5 401 2 1 2 1 3 . . 326 324 326 364.5 401 3 2 1 3 1 . . . . . 620 401 4 2 2 3 1 . . 612 620 612 620 402 1 1 1 3 . . . . . . 620 402 2 1 2 3 . . . 612 620 612 620 402 3 2 1 . 3 . . . . . . 402 4 2 2 . 3 . . . . . . 403 1 1 1 1 . . . 187 405 187 364.5 403 2 1 2 1 . . . 326 324 326 364.5 403 3 2 1 . 1 . . 536 626 536 626 403 4 2 2 . 1 . . 752 . 752 626 404 1 1 1 4 2 . . 1028 638 1028 638 404 2 1 2 4 2 . . . . . 638 404 3 2 1 2 4 . . . . . . 404 4 2 2 2 4 . . . . . . 405 1 1 1 1 . . . 187 405 187 364.5 405 2 1 2 1 . . . 326 324 326 364.5 405 3 2 1 . 1 . . 999 . 999 . 405 4 2 2 . 1 . . 986 . 986 . 406 1 1 1 . . . . . . . 454 406 2 1 2 . . . . 631 454 631 454 406 3 2 1 . . . . 536 626 536 626 406 4 2 2 . . . . 752 . 752 626 407 1 1 1 . 4 . . 1119 744 1119 665.5 407 2 1 2 . 4 . . 484 587 484 665.5 end
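A minimal sketch for the team-level case, assuming rank_dbls is the value to mirror and that every match has teams coded 1 and 2: collapse to one row per match-team, flip the team identifier so each row describes the other team, and merge back. The name o_rank_dbls is hypothetical.
Code:
preserve
* one row per match-team with the team's doubles rank
collapse (mean) o_rank_dbls = rank_dbls, by(i team)
* flip the team id so each row now carries the opposing team's rank
replace team = cond(team == 1, 2, 1)
tempfile opp
save `opp'
restore
merge m:1 i team using `opp', nogenerate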
convert string yyyy-mm-dd hh:mm:ss to %td format
Hello,
As the title of this question suggests, I have a set of data with the variable "Qtm" in string format, e.g., 2010-01-07 19:16:33. I would like to convert it to 07jan2010 format and ignore the time of day. How can I do this? Thanks a lot for any kind help.
Some data here:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str19 Qtm
"2010-01-07 19:16:33"
"2010-01-13 15:25:27"
"2010-03-02 15:29:59"
"2010-03-15 11:30:11"
"2010-05-08 11:12:47"
"2010-06-01 03:23:08"
"2010-06-02 13:56:21"
"2010-06-12 12:48:23"
"2010-06-29 12:49:02"
"2010-08-11 11:14:59"
"2010-09-01 08:58:56"
"2010-09-16 14:34:23"
"2010-11-16 16:17:54"
"2010-01-05 09:04:30"
"2010-01-05 14:08:51"
"2010-01-16 21:34:55"
"2010-02-02 15:32:56"
"2010-03-03 17:54:07"
"2010-03-04 15:24:07"
"2010-03-04 16:08:45"
end
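A minimal sketch: parse the string into a millisecond clock value, take the day part with dofc(), and apply a %td display format (qclock and qdate are illustrative names).
Code:
gen double qclock = clock(Qtm, "YMDhms")
gen qdate = dofc(qclock)
format qdate %td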
Stata command for weighted M-estimator.
Dear all,
Does Stata have commands or packages that can estimate parameters for weighted M-estimators? For example, how can I estimate beta with the estimating equations below?
[estimating equations shown as an image in the original post]
Best,
Liang
double-hurdle model not feasible
Hi,
I'm using a panel double-hurdle model via the xtdhreg command, but I get the following error:
Your help is appreciated. Thanks,
Code:
Obtaining starting values for full model:

Iteration 0:  log likelihood = 230358.78
Iteration 1:  log likelihood = 230440.79
Iteration 2:  log likelihood = 230440.81

Fitting full model:

initial values not feasible
r(1400);
How I find the wrong data?
Dear Statalist,
I get the note "multiple positive outcomes within groups encountered" after running:
clogit Y asc fish_d2 fish_d3 vet_d2 vet_d3 alc_d2 alc_d3 smk_d2 smk_d3 pqt_d2 pqt_d3, group(ncs) cluster(ID)
Please help me find the IDs that have multiple positive outcomes.
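A minimal sketch: count the positive outcomes per group, then list the observations in the offending groups (npos is an illustrative name).
Code:
* total positive outcomes within each group
egen npos = total(Y), by(ncs)
* show the IDs in groups with more than one positive outcome
list ID ncs Y if npos > 1 & Y == 1, sepby(ncs)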
Best Wishes,
Sukunta
Create Twoway Line Graph Forcing Gaps for Missing Periods
Hi all,
My data:
I would like to plot these data using a twoway line, with two y-axes; one for new positions, one for unemployment. The x-axis should be for time (Datum_n).
I ran the following code:
When I run the code above, both lines are continuous, however I do not want them to be.
The thing is, as is visible from the data extract, there are gaps in Datum_n, and I would like these gaps to be reflected in the line of the following variable: sum_newpositions_bymonth. Basically, I would want the line of sum_newpositions_bymonth to be discontinuous, meaning that it should be "interrupted" whenever there is no "Datum_n" (e.g. over the period 727 to 732) and then, after the gap, start again at the next available date for Datum_n.
Could anyone please let know how to adapt the code to show this discontinuity?
Many thanks in advance!
My data:
Code:
input float(Datum_n total_unem_bymonth sum_newpositions_bymonth)
723 148245 2261
724 150673 4089
725 144790  855
726 143049 5430
727 145249 5507
732 164182 4655
733 162495 5044
734 152841 5753
735 146375 4993
736 138150 4628
737 127136 3637
738 123275 3318
739 121203 3301
740 115404 3811
744 117633 3418
745 113398 4188
746 105133 3700
747  99974 3164
749  87939 3584
end
I ran the following code:
Code:
twoway line sum_newpositions_bymonth Datum_n, yaxis(2) ytitle("Monthly Total New Dual VET Positions") ///
    || line total_unem_bymonth Datum_n, yaxis(1) ytitle("Number of Registered Unemployed Individuals (Monthly Total)")
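A minimal sketch: declare Datum_n as the time variable, use tsfill to insert the missing periods as observations with missing values, and tell each line plot not to connect across missings via cmissing(n).
Code:
tsset Datum_n
tsfill
twoway line sum_newpositions_bymonth Datum_n, yaxis(2) cmissing(n) ///
    || line total_unem_bymonth Datum_n, yaxis(1) cmissing(n)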
Treatment Variation Plot
Hello Statalist!
I was wondering if it would be possible for Stata to plot this kind of variation figure.
This figure shows how the treatment was assigned over time for each unit. What command can I use to make this kind of plot?
Thank you in advance.
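One built-in route, as a hedged sketch with hypothetical variable names unit, year, and a 0/1 treat indicator: plot the panel as a grid of colored squares, one marker per unit-year. (The user-written panelview package on SSC is purpose-built for treatment-variation plots of this kind.)
Code:
twoway (scatter unit year if treat == 0, msymbol(square) mcolor(gs13)) ///
       (scatter unit year if treat == 1, msymbol(square) mcolor(navy)), ///
       legend(order(1 "Untreated" 2 "Treated")) ytitle("Unit") xtitle("Year")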
Sunday, November 27, 2022
Create a matrix or data set from current variables in stata
Hi everyone!
I have a dataset with three variables: province_code (11 province codes), newnsector (14 sector codes), and r_output. In my Stata file, newnsector is shown as "SA", "SB", ..., "SP", but when I use the dataex command, SA is shown as "1", SB as "2", and so on (I do not know why). Now I want to create an 11x14 matrix (dataset) whose values are r_output. The first row of this new matrix holds the r_output values of province code 01 in sector order (SA, SB, SC, SD, SE, SF, SG, SH, SK, SL, SM, SN, SO, SP); similarly, the second row is for province 02, and so on up to province code 17. There is one difficulty: province code 01 has values for all 14 sectors, but province code 02 and some other provinces do not have r_output for all 14 sectors. Therefore, I need to fill in the missing sector codes in those provinces with the value 0 to make sure I get an 11x14 matrix. Currently I am doing this manually in Excel by exporting the file and then copying, pasting, transposing, and filling in the missing sectors with 0; it takes a lot of time. I am wondering whether Stata has a command for this. I tried to do it myself, but without success.
The reason I need the 11x14 matrix is that I have to multiply it by a 14x14 matrix (an input-output table). My true dataset has 63 provinces and 14 sectors. Can you suggest code for the sample dataset and show me how to modify it for the true dataset of 63 provinces? I really appreciate your help. Thank you so much!
Code:
Example generated by -dataex-. To install: ssc install dataex clear input str2 province_code int newnsector float r_output "01" 14 .05285969 "01" 3 .00626575 "01" 13 .17437804 "01" 9 .09589774 "01" 4 .05790508 "01" 6 .0009028614 "01" 7 .0517438 "01" 8 .05596996 "01" 11 .04816558 "01" 5 .15201323 "01" 10 .05520221 "01" 2 .0496197 "01" 1 .02801064 "01" 12 .159773 "02" 9 .0003512017 "02" 5 .000280617 "02" 12 7.206216e-06 "02" 4 .0023777916 "02" 14 5.672023e-06 "02" 7 1.238403e-06 "02" 1 .000064757674 "02" 13 .0001931883 "02" 2 3.4551356e-06 "02" 10 .0001658643 "04" 5 .00004467173 "04" 1 .0002179262 "04" 9 .000660103 "04" 14 9.0871945e-06 "04" 4 .0006858561 "04" 10 .0030890896 "06" 10 .0006495666 "06" 14 4.08849e-06 "06" 5 .00023494294 "06" 9 .00027400945 "06" 2 .000032480897 "06" 12 9.435477e-08 "06" 7 2.440203e-09 "06" 4 .00018620113 "06" 1 .000015656562 "08" 5 .01765398 "08" 2 .0006631739 "08" 4 .006475695 "08" 8 3.717457e-06 "08" 7 .000011507996 "08" 14 1.726268e-06 "08" 10 .003837108 "08" 1 .0005638853 "08" 9 .003737794 "10" 14 6.785504e-06 "10" 12 .00039115755 "10" 1 .0003236186 "10" 4 .0019831005 "10" 2 1.968738e-07 "10" 10 .000279764 "10" 9 .0006842943 "10" 5 .0001658596 "10" 7 .03733616 "11" 9 .0013166956 "11" 14 .00008154935 "11" 10 .00006602735 "11" 1 .00001089103 "11" 5 .00005325684 "11" 4 .00007391685 "12" 4 .0004276246 "12" 10 .000023588744 "12" 1 1.3328968e-06 "12" 5 5.277554e-06 "12" 8 6.246046e-06 "12" 9 .00018962457 "12" 14 1.3077788e-07 "14" 5 .000132034 "14" 12 .000013313354 "14" 10 .00004675482 "14" 14 9.341277e-08 "14" 1 .0027883044 "14" 9 .0024881726 "14" 7 .00022676193 "14" 4 .00009249443 "14" 2 .00008890821 "15" 10 .00013244175 "15" 7 .0003468687 "15" 2 .0005966048 "15" 1 .0002936966 "15" 8 .00008438806 "15" 12 .00007266354 "15" 9 .013388124 "15" 14 .000016993652 "15" 4 .003852513 "15" 5 .002465888 "17" 3 .0005053952 "17" 12 .00006140318 "17" 14 .000034275014 "17" 1 .0008872058 "17" 7 .0007581466 "17" 10 .0011180259 "17" 5 .00037424185 "17" 8 .00012182483 "17" 2 .003683278 "17" 4 .013263932 "17" 13 .0009010206 end label values newnsector nsector label def nsector 1 "SA", modify label def nsector 2 "SB", modify label def nsector 3 "SC", modify label def nsector 4 "SD", modify label def nsector 5 "SE", modify label def nsector 6 "SF", modify label def nsector 7 "SG", modify label def nsector 8 "SH", modify label def nsector 9 "SK", modify label def nsector 10 "SL", modify label def nsector 11 "SM", modify label def nsector 12 "SN", modify label def nsector 13 "SO", modify label def nsector 14 "SP", modify
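A minimal sketch: fillin completes the province-by-sector grid, the new cells are set to 0, and after a wide reshape mkmat builds the matrix. With 63 provinces the same code yields a 63x14 matrix; nothing needs to change.
Code:
* complete the grid; fillin adds missing combinations with r_output missing
fillin province_code newnsector
replace r_output = 0 if missing(r_output)
drop _fillin
* one row per province, one column per sector (j = 1..14, i.e. SA..SP)
reshape wide r_output, i(province_code) j(newnsector)
* build the province-by-sector matrix
mkmat r_output1-r_output14, matrix(R)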
Reshape data
Hi
I have the following data with the variables lat, level, lon, time, air, dup, and time2. The level variable takes only two values, 1000 and 925. I want to convert the data to lat level_1000 level_925 lon time air dup time2.
How can I do this, please?
Thanks
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float lat int level float lon str19 time float(air dup) str8 time2 35 1000 0 "2021-01-01 00:00:00" 283.39996 1 "00:00:00" 35 925 0 "2021-01-01 00:00:00" 277.69998 2 "00:00:00" 35 925 0 "2021-01-02 00:00:00" 277.69998 1 "00:00:00" 35 1000 0 "2021-01-02 00:00:00" 282.69998 2 "00:00:00" 35 1000 0 "2021-01-03 00:00:00" 280.19998 1 "00:00:00" 35 925 0 "2021-01-03 00:00:00" 275.39996 2 "00:00:00" 35 925 0 "2021-01-04 00:00:00" 277.4 1 "00:00:00" 35 1000 0 "2021-01-04 00:00:00" 282.39996 2 "00:00:00" 35 925 0 "2021-01-05 00:00:00" 277.99997 1 "00:00:00" 35 1000 0 "2021-01-05 00:00:00" 283.89996 2 "00:00:00" 35 925 0 "2021-01-06 00:00:00" 279.89996 1 "00:00:00" 35 1000 0 "2021-01-06 00:00:00" 284.69998 2 "00:00:00" 35 925 0 "2021-01-07 00:00:00" 283.6 1 "00:00:00" 35 1000 0 "2021-01-07 00:00:00" 289.7 2 "00:00:00" 35 1000 0 "2021-01-08 00:00:00" 290.49997 1 "00:00:00" 35 925 0 "2021-01-08 00:00:00" 285 2 "00:00:00" 35 925 0 "2021-01-09 00:00:00" 287.2 1 "00:00:00" 35 1000 0 "2021-01-09 00:00:00" 292.19998 2 "00:00:00" 35 925 0 "2021-01-10 00:00:00" 279.99997 1 "00:00:00" 35 1000 0 "2021-01-10 00:00:00" 285.3 2 "00:00:00" 35 925 0 "2021-01-11 00:00:00" 277.69998 1 "00:00:00" 35 1000 0 "2021-01-11 00:00:00" 282.39996 2 "00:00:00" 35 925 0 "2021-01-12 00:00:00" 275.39996 1 "00:00:00" 35 1000 0 "2021-01-12 00:00:00" 280 2 "00:00:00" 35 1000 0 "2021-01-13 00:00:00" 280.99997 1 "00:00:00" 35 925 0 "2021-01-13 00:00:00" 277.7 2 "00:00:00" 35 925 0 "2021-01-14 00:00:00" 279.89996 1 "00:00:00" 35 1000 0 "2021-01-14 00:00:00" 283.19998 2 "00:00:00" 35 925 0 "2021-01-15 00:00:00" 278 1 "00:00:00" 35 1000 0 "2021-01-15 00:00:00" 282.39996 2 "00:00:00" 35 925 0 "2021-01-16 00:00:00" 280.1 1 "00:00:00" 35 1000 0 "2021-01-16 00:00:00" 283.89996 2 "00:00:00" 35 1000 0 "2021-01-17 00:00:00" 282.89996 1 "00:00:00" 35 925 0 "2021-01-17 00:00:00" 278.49997 2 "00:00:00" 35 1000 0 "2021-01-18 00:00:00" 287.69998 1 "00:00:00" 35 925 0 "2021-01-18 00:00:00" 285.8 2 "00:00:00" 35 1000 0 "2021-01-19 00:00:00" 288.7 1 "00:00:00" 35 925 0 "2021-01-19 00:00:00" 285 2 "00:00:00" 35 1000 0 "2021-01-20 00:00:00" 284.4 1 "00:00:00" 35 925 0 "2021-01-20 00:00:00" 282.6 2 "00:00:00" 35 925 0 "2021-01-21 00:00:00" 283.3 1 "00:00:00" 35 1000 0 "2021-01-21 00:00:00" 288.49997 2 "00:00:00" 35 925 0 "2021-01-22 00:00:00" 282.8 1 "00:00:00" 35 1000 0 "2021-01-22 00:00:00" 287.2 2 "00:00:00" 35 925 0 "2021-01-23 00:00:00" 282.49997 1 "00:00:00" 35 1000 0 "2021-01-23 00:00:00" 287.2 2 "00:00:00" 35 925 0 "2021-01-24 00:00:00" 282.8 1 "00:00:00" 35 1000 0 "2021-01-24 00:00:00" 287.9 2 "00:00:00" 35 925 0 "2021-01-25 00:00:00" 284.8 1 "00:00:00" 35 1000 0 "2021-01-25 00:00:00" 289 2 "00:00:00" 35 1000 0 "2021-01-26 00:00:00" 287.89996 1 "00:00:00" 35 925 0 "2021-01-26 00:00:00" 283.8 2 "00:00:00" 35 1000 0 "2021-01-27 00:00:00" 288.7 1 "00:00:00" 35 925 0 "2021-01-27 00:00:00" 284.89996 2 "00:00:00" 35 925 0 "2021-01-28 00:00:00" 287.8 1 "00:00:00" 35 1000 0 "2021-01-28 00:00:00" 291.99997 2 "00:00:00" 35 925 0 "2021-01-29 00:00:00" 289.89996 1 "00:00:00" 35 1000 0 "2021-01-29 00:00:00" 291.1 2 "00:00:00" 35 925 0 "2021-01-30 00:00:00" 288.69998 1 "00:00:00" 35 1000 0 "2021-01-30 00:00:00" 292 2 "00:00:00" 35 925 0 "2021-01-31 00:00:00" 285.59998 1 "00:00:00" 35 1000 0 "2021-01-31 00:00:00" 290.4 2 "00:00:00" 35 1000 0 "2021-02-01 00:00:00" 290 1 "00:00:00" 35 925 0 "2021-02-01 00:00:00" 284.9 2 "00:00:00" 35 925 0 "2021-02-02 00:00:00" 285.6 1 "00:00:00" 35 1000 0 
"2021-02-02 00:00:00" 289.7 2 "00:00:00" 35 925 0 "2021-02-03 00:00:00" 288.19998 1 "00:00:00" 35 1000 0 "2021-02-03 00:00:00" 291.09998 2 "00:00:00" 35 1000 0 "2021-02-04 00:00:00" 291.7 1 "00:00:00" 35 925 0 "2021-02-04 00:00:00" 287.4 2 "00:00:00" 35 1000 0 "2021-02-05 00:00:00" 293.19998 1 "00:00:00" 35 925 0 "2021-02-05 00:00:00" 290.19998 2 "00:00:00" 35 1000 0 "2021-02-06 00:00:00" 293.89996 1 "00:00:00" 35 925 0 "2021-02-06 00:00:00" 289.2 2 "00:00:00" 35 925 0 "2021-02-07 00:00:00" 281.59998 1 "00:00:00" 35 1000 0 "2021-02-07 00:00:00" 286.4 2 "00:00:00" 35 925 0 "2021-02-08 00:00:00" 280.1 1 "00:00:00" 35 1000 0 "2021-02-08 00:00:00" 284.6 2 "00:00:00" 35 925 0 "2021-02-09 00:00:00" 283.2 1 "00:00:00" 35 1000 0 "2021-02-09 00:00:00" 289.09998 2 "00:00:00" 35 1000 0 "2021-02-10 00:00:00" 291.39996 1 "00:00:00" 35 925 0 "2021-02-10 00:00:00" 285.7 2 "00:00:00" 35 925 0 "2021-02-11 00:00:00" 283.69998 1 "00:00:00" 35 1000 0 "2021-02-11 00:00:00" 288.8 2 "00:00:00" 35 925 0 "2021-02-12 00:00:00" 287.9 1 "00:00:00" 35 1000 0 "2021-02-12 00:00:00" 292.8 2 "00:00:00" 35 925 0 "2021-02-13 00:00:00" 283.99997 1 "00:00:00" 35 1000 0 "2021-02-13 00:00:00" 289.59998 2 "00:00:00" 35 1000 0 "2021-02-14 00:00:00" 285.99997 1 "00:00:00" 35 925 0 "2021-02-14 00:00:00" 282.19998 2 "00:00:00" 35 925 0 "2021-02-15 00:00:00" 279.69998 1 "00:00:00" 35 1000 0 "2021-02-15 00:00:00" 284.8 2 "00:00:00" 35 925 0 "2021-02-16 00:00:00" 281.49997 1 "00:00:00" 35 1000 0 "2021-02-16 00:00:00" 284.7 2 "00:00:00" 35 925 0 "2021-02-17 00:00:00" 284.39996 1 "00:00:00" 35 1000 0 "2021-02-17 00:00:00" 285.3 2 "00:00:00" 35 925 0 "2021-02-18 00:00:00" 282.89996 1 "00:00:00" 35 1000 0 "2021-02-18 00:00:00" 284.99997 2 "00:00:00" 35 1000 0 "2021-02-19 00:00:00" 288.69998 1 "00:00:00" 35 925 0 "2021-02-19 00:00:00" 284.8 2 "00:00:00" end
Loop with a synthetic control method
Hi everyone,
I have a dataset at the state level with 39 states, and I want to run a synthetic control regression (as shown) only for states 4, 5, 21, and 34, using the command levelsof ..., local() in a loop. Note that at the end of each regression I have to save the results in a dataset whose name contains the state number: for state 4 the results are saved in dataset ps4, and the loop should likewise save ps5, ps21, and ps34.
Many thanks for your help!
Code:
tsset state year
synth cigsale beer lnincome(1980&1985) age15to24 retprice cigsale(1988) cigsale(1980) cigsale(1975), ///
    trunit(3) trperiod(1989) fig keep(ps4, replace)
Code:
state year cigsale lnincome beer retprice age15to24 29 1970 123.9 . . 39.3 .1831579 32 1970 99.8 . . 39.9 .1780438 10 1970 134.6 . . 30.6 .17651588 21 1970 189.5 . . 38.9 .1615542 14 1970 115.9 . . 34.3 .1851852 27 1970 108.4 . . 38.4 .17545916 22 1970 265.7 . . 31.4 .1707317 25 1970 93.8 . . 37.3 .184466 2 1970 100.3 . . 36.7 .16900676 36 1970 124.3 . . 28.8 .18942162 9 1970 124.8 . . 41.4 .1669667 31 1970 92.7 . . 38.5 .17867868 34 1970 65.5 . . 34.6 .20207743 7 1970 109.9 . . 34.3 .1874455 17 1970 93.4 . . 36.2 .18313035 4 1970 124.8 . . 29.4 .19095023 16 1970 104.3 . . 39.1 .1747241 3 1970 123 . . 38.8 .17815833 33 1970 106.4 . . 40.4 .18314135 13 1970 155.8 . . 28.3 .18131015 15 1970 128.5 . . 38 .1690141 24 1970 172.4 . . 27.3 .1935484 19 1970 111.2 . . 34 .1757925 35 1970 122.6 . . 37.7 .1797753 11 1970 108.5 . . 37.7 .16884956 5 1970 120 . . 42.2 .16292876 12 1970 114 . . 34.2 .18052468 6 1970 155 . . 39 .17335767 38 1970 106.4 . . 38.5 .174287 8 1970 102.4 . . 33.8 .1781206 23 1970 90 . . 39.7 .18485743 37 1970 114.5 . . 33.7 .17259175 28 1970 107.3 . . 38.4 .163376 30 1970 103.6 . . 32.5 .20030876 26 1970 121.6 . . 36.6 .1732195 20 1970 108.1 . . 33.9 .17373738 18 1970 121.3 . . 36 .167593 1 1970 89.8 . . 39.6 .1788618 39 1970 132.2 . . 34.1 .1746988 9 1971 125.6 . . 41.4 .1689976 22 1971 278 . . 34.1 .1723339 31 1971 96.7 . . 38.5 .18049243 13 1971 163.5 . . 30.1 .1822996 39 1971 131.7 . . 34.4 .17722893 19 1971 115.6 . . 34.7 .1771459 17 1971 105.4 . . 37.5 .18437305 12 1971 102.8 . . 38.9 .18155004 6 1971 161.1 . . 41.3 .1758872 34 1971 67.7 . . 36.6 .20206134 27 1971 115.4 . . 39.8 .1765248 15 1971 133.2 . . 38.8 .1703349 7 1971 115.7 . . 35.8 .18786626 36 1971 128.4 . . 30.2 .1898735 18 1971 127.6 . . 36.8 .16925956 1 1971 95.4 . . 42.7 .17992784 25 1971 98.5 . . 38.9 .18638696 24 1971 187.6 . . 29.4 .1936767 8 1971 108.5 . . 33.6 .17609245 23 1971 92.6 . . 41.7 .1860954 28 1971 106.3 . . 44.7 .1650846 2 1971 104.1 . . 38.8 .16995385 32 1971 106.3 . . 41.6 .17881927 20 1971 108.6 . . 34.7 .17521714 33 1971 108.9 . . 42 .18430856 26 1971 124.6 . . 38.1 .1745399 38 1971 105.4 . . 40.2 .17634407 11 1971 108.4 . . 38.5 .170839 37 1971 111.5 . . 41.6 .17312744 3 1971 121 . . 39.7 .17929636 4 1971 125.5 . . 31.1 .1916476 30 1971 115 . . 34.3 .2004893 21 1971 190.5 . . 44 .16377378 10 1971 139.3 . . 32.2 .17797175 16 1971 116.4 . . 40.1 .1767316 14 1971 119.8 . . 39.3 .1867808 29 1971 123.2 . . 40.2 .1838495 35 1971 124.4 . . 39.5 .1813672 5 1971 117.6 . . 45.5 .1646539 7 1972 117 9.63889 . 40.9 .188287 34 1972 71.3 9.601122 . 37.2 .20204525 13 1972 179.4 9.547482 . 30.6 .18328904 31 1972 103 9.630849 . 39.1 .1823062 15 1972 136.5 9.59714 . 41.5 .1716557 26 1972 124.4 9.779579 . 38.4 .17586027 32 1972 111.5 9.569716 . 41.6 .1795947 20 1972 104.9 9.746475 . 41.1 .1766969 29 1972 134.4 9.770211 . 41.6 .1845411 33 1972 108.6 9.675209 . 46.9 .18547577 22 1972 296.2 9.736376 . 36.1 .17393607 27 1972 121.7 9.625027 . 39.8 .17759047 28 1972 109 9.784274 . 44.7 .16679317 8 1972 126.1 9.651839 . 33.7 .1740643 11 1972 109.4 9.730265 . 41.9 .17282845 19 1972 122.2 9.694641 . 40.1 .1784993 4 1972 134.3 9.805548 . 31.2 .19234496 35 1972 138 9.673069 . 40 .18295917 3 1972 123.5 9.930814 . 39.9 .1804344 30 1972 118.7 9.509309 . 34.1 .2006698 21 1972 198.6 9.944233 . 40.6 .16599335 12 1972 111 9.768118 . 
38.8 .1825754 end label values state state label def state 1 "Alabama", modify label def state 2 "Arkansas", modify label def state 3 "California", modify label def state 4 "Colorado", modify label def state 5 "Connecticut", modify label def state 6 "Delaware", modify label def state 7 "Georgia", modify label def state 8 "Idaho", modify label def state 9 "Illinois", modify label def state 10 "Indiana", modify label def state 11 "Iowa", modify label def state 12 "Kansas", modify label def state 13 "Kentucky", modify label def state 14 "Louisiana", modify label def state 15 "Maine", modify label def state 16 "Minnesota", modify label def state 17 "Mississippi", modify label def state 18 "Missouri", modify label def state 19 "Montana", modify label def state 20 "Nebraska", modify label def state 21 "Nevada", modify label def state 22 "New Hampshire", modify label def state 23 "New Mexico", modify label def state 24 "North Carolina", modify label def state 25 "North Dakota", modify label def state 26 "Ohio", modify label def state 27 "Oklahoma", modify label def state 28 "Pennsylvania", modify label def state 29 "Rhode Island", modify label def state 30 "South Carolina", modify label def state 31 "South Dakota", modify label def state 32 "Tennessee", modify label def state 33 "Texas", modify label def state 34 "Utah", modify label def state 35 "Vermont", modify label def state 36 "Virginia", modify label def state 37 "West Virginia", modify label def state 38 "Wisconsin", modify label def state 39 "Wyoming", modify
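A minimal sketch: since the four state numbers are known, a numlist avoids levelsof entirely; the loop index feeds both trunit() and the keep() file name. The synth options simply mirror the command in the post.
Code:
foreach s of numlist 4 5 21 34 {
    synth cigsale beer lnincome(1980&1985) age15to24 retprice ///
        cigsale(1988) cigsale(1980) cigsale(1975), ///
        trunit(`s') trperiod(1989) fig keep(ps`s', replace)
}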
Method for deriving the second-level weight for a metobit model
I would like to analyse my data with a multilevel tobit (metobit), applying two-level weights. In a multilevel linear mixed model, we obtain the first-level weight [pweight=IPW] by calculating a propensity weight, and Stata automatically estimates the level-2 weight "size" for the cluster level. The Stata command for the linear mixed model is "mixed unemp i.year inc edu [pweight=IPW] || year:, pwscale(size)". My unemployment variable has a censored distribution, so my data would fit better with metobit. But metobit does not automatically estimate the "size" weight. The Stata command for metobit is "metobit unemp inc edu [pweight=IPW] || year:, pweight(wvar2) ll(0)". The wvar2 in pweight(wvar2) is the second-level weight, similar to "size". I do not know how to derive this weight from my first-level estimate. The Stata manual explains the meaning of the weight but not the calculation method. Can you please help me by a) describing the second-level weight derivation method, as used when Stata calculates the "size" weight, and b) providing reading materials on calculating the weight?
Leading zero
Hi statalist community,
What I need is to put a leading zero on single-digit numbers.
The needed output looks like this:
RTR
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input RTR
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
end
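A minimal sketch of two options: a display format with a leading zero, or a string version of the variable (RTR_s is an illustrative name).
Code:
* display-only: show RTR with a leading zero
format RTR %02.0f
* or create a string variable with the zero built in
gen str2 RTR_s = string(RTR, "%02.0f")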
Multiple random slopes in a mixed linear model
Dear all,
I have unbalanced panel data with repeated sampling over 6 years, hence I am employing mixed linear regressions. I want to incorporate various covariates, which differ either at level 1 (observation level) or level 2 (individual level). I want to assess the effect of reading on cognitive test scores in later life. My level 1 predictors are, among others, age, wealth group, and the spread between observations due to the unbalanced nature; the level 1 predictors may change over time within each individual. My level 2 predictors are constant for each individual: education level and gender. My code for the mixed model with the random intercept is shown in the first code block below.
I now also want to add random slopes. My first inclination was to set the spread between observations as a random slope, as in the second code block below.
Adding the spread as a random slope significantly improves the fit of my model. However, my question now is whether it is possible (and whether it makes sense) to add all level 1 predictors as random slopes. I tried the third code block below, but even after an hour of computation time no output was produced.
1. Is it possible to add multiple random slopes?
2. Does it make sense to add multiple slopes?
3. If I cannot/should not add multiple random slopes, how do I deal with the fact that my level 1 predictors vary over time?
Thanks all!
Code:
mixed testscore reading age wealthgroup spread education gender || mergeid_num:,
Code:
mixed testscore reading age wealthgroup spread education gender || mergeid_num: spread, covariance(unstructured)
Code:
mixed testscore reading age wealthgroup spread education gender || mergeid_num: spread age wealthgroup, covariance(unstructured)
Saturday, November 26, 2022
Optimal lag selection in Granger Causality tests
I use [TS] varsoc to obtain the optimum lag length for the Granger causality test. This command reports the optimal number of lags based on different criteria such as Akaike's information criterion (AIC).
Is there any way to store the optimal lag number (obtained based on AIC) in a variable and use it in the next command to estimate causality? Something like this:
Lag= varsoc X Y
tvgc X Y, p(Lag) d(Lag) trend window(30) prefix(_) graph
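A minimal sketch, assuming varsoc leaves its statistics in the r(stats) matrix with an "AIC" column (worth verifying with matrix list r(stats) after running varsoc): scan the rows for the smallest AIC and pass the corresponding lag to tvgc through a local macro.
Code:
quietly varsoc X Y
matrix S = r(stats)
local c = colnumb(S, "AIC")
local best = 0
local minaic = S[1, `c']              // row 1 corresponds to lag 0
forvalues r = 2/`=rowsof(S)' {
    if S[`r', `c'] < `minaic' {
        local minaic = S[`r', `c']
        local best = `r' - 1          // row r corresponds to lag r-1
    }
}
display "AIC-optimal lag: `best'"
tvgc X Y, p(`best') d(`best') trend window(30) prefix(_) graph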
Main effects of two independent variables across five groups
Hi,
I have unbalanced panel data for 160 companies from 5 different subgroups (g1,g2,g3,g4,g5) where group id is defined by business activity type, over 14 years for which I run the following baseline regression: (CVs: additional 7 control variables, l: lagged variable, X1 and X2: continuous independent variables, Y: continuous dependent variable)
xtreg Y l.X1 l.X2 l.CVs i.year,r
I want to check if the main effects of X1 and X2 on Y vary across 5 groups where all are lagged except for Y. For that reason, I ran the following sample regression;
xtreg Y l.X1 l.X2 l.g2 l.g3 l.g4 l.g5 l.c.X1#l.i.g2 l.c.X1#l.i.g3 l.c.X1#l.i.g4 l.c.X1#l.i.g5 l.c.X2#l.i.g2 l.c.X2#l.i.g3 l.c.X2#l.i.g4 l.c.X2#l.i.g5 l.CVs i.year, r
5 times in a row by omitting one different group at a time (g1 is omitted in the first one above).
Is this the correct approach? How about dropping all g2,g3,g4,g5 observations and running the regression for g1 companies only (5 times in total keeping the observations of only one different group at a time)?
My second model is baseline + interaction between X1 and X2:
xtreg Y l.X1 l.X2 l.c.X1#l.c.X2 l.CVs i.year,r
Should I rerun the aforementioned 5 regressions (this time with the interaction included) to observe the differences in the main effects of X1 and X2 across groups, as in the following (g1 is omitted):
xtreg Y l.X1 l.X2 l.c.X1#l.c.X2 l.g2 l.g3 l.g4 l.g5 l.c.X1#l.i.g2 l.c.X1#l.i.g3 l.c.X1#l.i.g4 l.c.X1#l.i.g5 l.c.X2#l.i.g2 l.c.X2#l.i.g3 l.c.X2#l.i.g4 l.c.X2#l.i.g5 l.c.X1#l.c.X2#l.i.g2 l.c.X1#l.c.X2#l.i.g3 l.c.X1#l.c.X2#l.i.g4 l.c.X1#l.c.X2#l.i.g5 l.CVs i.year, r
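An alternative to five reruns, sketched under the assumption that the five dummies can be collapsed into one categorical variable (say group, coded 1-5): fit one fully interacted model and read the group-specific slopes off margins, with contrast testing whether they differ:
Code:
xtreg Y c.L.X1##i.group c.L.X2##i.group l.CVs i.year, re vce(robust)
margins group, dydx(L.X1)      // slope of lagged X1 within each group
margins group, dydx(L.X2)
contrast group#c.L.X1          // joint test that the X1 slopes differ across groups
contrast group#c.L.X2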
Best,
Lutfi
How to draw overlayed coefplot with only one regression
Suppose I have run the following regression
reg wage i.year#i.gender controls
where gender takes two values. What I want is to use coefplot such that the x axis is years and, for each year, the estimates for the two values of gender are drawn overlaid. How can I achieve that?
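coefplot (Ben Jann, SSC) can draw two series from the same stored estimates, so one regression is enough. A sketch; the keep() and rename() patterns are guesses at how the interaction coefficients are named (check with regress, coeflegend):
Code:
reg wage i.year#i.gender controls
estimates store m1
coefplot (m1, keep(*.year#1.gender) label("Gender 1")) ///
         (m1, keep(*.year#2.gender) label("Gender 2")), ///
    vertical rename(([0-9]+)\.year#[0-9]\.gender = \1, regex)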
Formatted IQR using Collect
Hi,
I am using the excellent Example 3 in the "Stata Customizable Tables" manual to help me build a table with frequency (percent) for categorical variables, and mean (sd) for continuous variables. Some of my continuous variables are, however, very skewed (age data for infants). For those variables I would like to report age in months as median (IQR). I would like to format the IQR as (p25 – p75). I think that what I need to do is combine both p25 and p75 into a single level of the dimension result, but I'm not sure how to do that.
I am aware that table1_mc does this. I have used and loved table1_mc, but I really want to learn the collect system.
For this post, below are data for age in months and sex. What I'm after is:
| Male | Female |
Age (months) | median (p25 – p75) | median (p25 – p75) |
Here are my data for sex and age (months):
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float age_adm_m byte sex 15.342916 1 54.80082 1 19.74538 1 75.663246 1 19.81109 1 46.45585 1 32.29569 1 3.3839836 2 67.15401 2 14.554415 1 69.650925 2 83.58111 1 15.014374 1 46.75154 2 36.172485 1 35.876797 1 7.096509 1 39.85216 1 7.523614 1 43.6961 1 3.12115 1 36.99384 2 55.78645 2 30.12731 2 52.46817 2 3.613963 1 17.87269 2 31.507187 2 30.58727 1 18.431211 1 43.63039 2 15.967146 2 50.7269 1 32.492813 2 16.689938 1 18.89117 1 30.45585 2 3.581109 2 19.876797 1 82.95688 1 71.29363 2 62.0616 2 30.45585 1 51.44969 2 14.52156 2 11.498973 1 1.4784395 2 28.64887 2 51.58111 1 72.24641 2 31.802876 1 42.48049 1 2.1026695 2 127.5729 1 40.21355 2 8.936345 1 3.876797 2 30.390144 1 44.71458 2 11.17043 1 10.61191 1 39.09651 1 14.52156 2 78.91581 1 16.328543 1 42.21766 1 11.039015 1 80.16427 1 150.70226 2 3.022587 1 59.07187 1 38.40657 1 57.49487 1 59.00616 2 19.58111 2 2.792608 2 79.50719 2 122.71047 2 92.09035 1 2.562628 2 46.02875 1 95.77002 2 34.49692 2 6.702259 1 48 2 43.13758 2 125.40452 2 . 1 76.38604 1 11.334702 1 43.23614 1 59.59753 1 55.88501 1 6.537988 1 82.16838 1 43.00616 1 54.17659 2 25.23203 1 54.2423 1 17.87269 1 end label values sex sex_lbl label def sex_lbl 1 "Male", modify label def sex_lbl 2 "Female", modify
Here is my code for what I'm after using mean and SD:
Code:
table (var) (shortsite), statistic(fvfrequency sex) statistic(fvpercent sex) nototals append collect style header result, level(hide) collect style row stack, nobinder spacer collect style cell border_block, border(right, pattern(nil)) collect layout (sex[1]) (shortsite#result) collect style cell result[fvpercent], sformat("%s%%")
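One route, sketched for the posted data (Stata 17's collect system; the composite name iqr, the delimiter, and the layout are my choices): request the median, p25, and p75 from table, then glue p25 and p75 into one result with collect composite define:
Code:
table (var) (sex), statistic(median age_adm_m) ///
    statistic(p25 age_adm_m) statistic(p75 age_adm_m) nototals
collect composite define iqr = p25 p75, delimiter(" – ") trim
collect style cell result[iqr], sformat("(%s)")
collect layout (var) (sex#result[median iqr])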
Ryan
About Foreach or Forvalue
Hi! I am trying to create seven summary variables named den_1 to den_7 to simplify the results I have from 34 variables named total1-total34. Although there are 34 different groups, each observation actually belongs to at most 7 groups. The number of groups each observation belongs to is shown by the variable group_n.
Enquiry:
Is it possible for a -foreach- or -forvalues- loop to search over the 34 variables, take out the values, and put them into a new set of variables (den_1-den_7)?
Thank you all very much!!
Data example skipping variables for simplification:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(total1 total2 total3 total32 total33 total34 group_n) 208 207 . . . . 4 208 207 . . . . 4 . . . . . . 0 . . . . . . 1 208 207 . . . . 4 . . . . . . 1 . . . . . . 4 . . . . . . 0 . . . . . . 2 . . . . . . 1 . . . . . . 0 . . . . . . 2 . . . . . . 0 . . . . . . 1 . . . . . . 0 208 207 . . . . 4 . . . . . . 2 . . . . . . 3 . . . . . . 0 . . . . . . 0 . . . . . . 2 . . . . . . 0 . . . . . . 0 . . . . 5 . 2 208 207 . . . . 5 . . . . . . 0 . . . . . . 3 . . . . . . 0 . . . . . . 0 . . . . . . 2 . . . . . . 2 . . . . . . 0 . . . . . . 0 . . . . . . 2 . . . . . . 0 208 207 50 . . . 5 . . . . . . 1 . . . . . . 3 . . . . . . 0 . . . . . . 0 . . . . . . 0 . . . . . . 0 208 207 . . . . 5 . . . . . . 2 . . . . . . 2 . . . . . . 1 . . . . . . 1 . . . . . . 2 208 207 . . . . 5 . . . . . . 0 . . . . . . 0 . . . . . . 1 . . . . . . 0 . . . . . . 0 . . . . . . 3 . . . . . . 2 . . . . . . 0 . . . . . . 2 . . . . . . 0 208 207 . . . . 5 . . . . . . 0 . . . . . . 0 . . . . . . 1 . . . . . . 2 208 207 . . . . 6 . . . . . . 4 . . . . . . 2 208 207 . . . . 6 . . . . . . 2 . . . . . . 2 . . . . . . 3 208 207 . . . . 5 . . . . . . 3 . . 50 . . . 2 . . . . . . 0 . . . . . . 2 . . . . . . 0 . . . . . . 3 . . . . . . 0 . . . . . . 0 . . . . . . 0 . . 50 . . . 2 . . 50 . . . 3 . . . . . . 0 . . . . . . 3 . . . . . . 0 . . . . . . 0 208 207 . . . . 5 . . . . . . 0 . . . . . . 0 . . . . . . 2 208 207 . . . . 4 . . . 5 . . 3 . . . . . . 2 . . . . . . 3 . . . . . . 2 . . . . . . 5 . . . . . . 0 . . . . . . 0 . . . . . . 0 end
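A sketch of one way to pack the nonmissing totals into den_1-den_7 from left to right (an assumption about what "take out the values" means): a running counter tracks how many slots each observation has filled so far:
Code:
forvalues j = 1/7 {
    generate den_`j' = .
}
generate filled = 0
forvalues i = 1/34 {
    quietly replace filled = filled + 1 if !missing(total`i')
    forvalues j = 1/7 {
        quietly replace den_`j' = total`i' ///
            if filled == `j' & missing(den_`j') & !missing(total`i')
    }
}
drop filled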
Friday, November 25, 2022
GLM binomial logit model gof - Deviance
Good day,
I'm using Stata 16 and trying to do a goodness-of-fit check for a GLM logit model, but the results show a lot of missing data. My commands are as follows:
Code:
glm obese $X [aw=wt], family(binomial) link(logit)
predict mu_logit
predict dr_logit, deviance
qui glm obese $X [aw=wt], family(binomial) link(cloglog)
predict mu_cl
predict dr_cl, d
format mu_logit dr_logit mu_cl dr_cl %9.5f
list mu_logit dr_logit mu_cl dr_cl, sep(4)
The list output is almost entirely missing, for example:
216617. | . . . . |
(and similarly, all four predicted values are missing for every listed observation through 216648)
My dataex
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float obese byte(wquintile fprogram_1 fprogram_2 gender_1 gender_2 race_1 race_2 race_3 race_4 geo_1 geo_2 mbmi_1 mbmi_2 gradem_1 gradem_2 employed_1 employed_2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . . . . . . . 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 0 1 0 1 0 0 1 0 . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 0 0 1 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 . . . . . . . . . . . . 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 1 0 . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 . . . . . . . 1 . . 1 0 0 1 0 0 1 0 . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 . . 0 1 0 1 0 0 1 0 . . . . . . 0 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 0 1 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 0 1 0 1 0 0 1 0 . . . . . . . 1 1 0 1 0 0 1 0 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 1 0 . . . . . . . . . . . . 1 0 1 0 1 0 0 1 . . 1 0 0 1 0 0 1 0 1 0 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 1 1 0 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 1 1 0 0 1 . 1 . . 0 1 0 1 0 0 1 0 0 1 1 0 0 1 . . . . . . . . . . . . . . . . . . 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 1 . . . . . . . . . . . . 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 1 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . 0 1 0 1 0 1 0 1 0 0 1 0 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . 1 0 1 0 1 0 . . . . . . . . . . . . 1 0 1 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . 0 1 1 0 1 0 . . . . . . . . . . . . . . . . . . . 1 0 1 1 0 0 1 0 0 1 0 1 0 1 0 1 0 end label values obese obese5 label def obese5 0 "0. Not_Obese", modify label def obese5 1 "1. Obese", modify
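A diagnostic sketch, for what it's worth: predict returns missing wherever any covariate in $X (or the weight) is missing, and the dataex above shows that most rows have missing dummies, so mostly-missing predictions are expected rather than a glm failure:
Code:
egen nmiss = rowmiss($X)
count if nmiss > 0 | missing(wt)    // rows where predict must return missing
misstable summarize $X wt           // which covariates drive the missingness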
Please assist.
Regards
Nthato
Help: weights not allowed r(101)
Hi! I ran into an error with the following commands:
egen x2 = group(sex re age educ)
gen wm = sex == 1 & re == 1
egen wminc = mean(incwage) [pw=x2] if wm == 1, by(age)
The first two lines worked, but the third line failed with the error: weights not allowed r(101)
Could you please help me solve it?
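egen functions do not accept weights, hence the r(101). Note also that x2 above is a group identifier, not a weight. If a genuine weight variable is available (called w below, an assumption), the weighted group mean can be built by hand; a sketch:
Code:
generate double num = incwage * w if wm == 1
generate double den = w if wm == 1 & !missing(incwage)
egen double sumnum = total(num), by(age)
egen double sumden = total(den), by(age)
generate double wminc = sumnum / sumden
drop num den sumnum sumden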
Estimating latent class models in Stata using both categorical and continuous indicator variables
Hi Statalist community,
I am trying to run latent class models and I am fairly new to this type of analysis. I have continuous and categorical indicators. I am using the following link as a reference. In the second example within the link, there is an example to run latent class models where there are both continuous and categorical indicators.
https://stats.oarc.ucla.edu/mplus/se...ixture-models/
However, the analysis is conducted using Mplus which is quite expensive and I am trying to replicate the example using Stata. I was wondering if there is a native Stata command or a user-generated Stata package available for me to replicate the example. In addition, if you have any papers that you could direct me to, I would really appreciate it. Thank you so much.
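Stata has had this natively since version 15: gsem with the lclass() option mixes Gaussian and logit (or other family) indicators in one latent class model. A minimal sketch with hypothetical variable names (y1, y2 continuous; b1, b2 binary):
Code:
gsem (y1 y2 <- _cons) (b1 b2 <- _cons, logit), lclass(C 2)
estat lcprob    // class membership probabilities
estat lcmean    // class-specific means of the indicators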
xtivreg2: endogeneity and overidentification
Hello everyone,
I have an issue with interpreting the results of xtivreg2:
- Overidentification is significant
- Endogeneity test is not significant
Please see the attached picture.
I don't know how I should interpret this result. The endogeneity test indicates that there is no endogeneity issue, which implies no need for the IV. But then does the overidentification test suggest the IV is a valid one?
Thank you
Household Fixed Effect
Hello,
Is there an issue with using HH fixed effects or individual fixed effects when the death of a family member is the treatment in a DID framework? Is it feasible just to stick to state or district-fixed effects?
Problems with command "etable"
Hi there,
My intention is to run "etable" to create a results table of two models.
My code is:
[screenshot of the code omitted]
However, for some reason (even though I followed the instructions in the Stata manual) it continues to create this unwanted table. I would like the results of the second model to be at the same row level as the first model.
[screenshots of the resulting tables omitted]
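Since the screenshots are not visible, here is a minimal sketch of the pattern that usually puts two models side by side in one etable call (Stata 17+); the model and variable names are hypothetical, and column(estimates) is my assumption about how to label the columns by model:
Code:
quietly regress y x1 x2
estimates store m1
quietly regress y x1 x2 x3
estimates store m2
etable, estimates(m1 m2) column(estimates)   // one column per stored model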
I would highly appreciate some help. Thanks in advance.
Kind regards,
Antonio
Which test should i use on Stata ?
Hello,
I would like to know which test to use on Stata according to my configuration.
I want to compare 3 ways to collect data.
I have 3 groups, and for each I collect a Y variable according to method A, B, or C; for each person I also collect the Y variable according to the reference method (called y_base).
It looks like this with randomly generated data (in my real database, I have something like 500 people in each group).
Which test in Stata should I use to check which method is closest to my y_base?
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float id str1 module float(y_base y) 1 "A" 6 2 2 "A" 7 3 3 "A" 6 4 4 "A" 4 4 5 "A" 6 2 6 "A" 4 2 7 "A" 4 5 8 "A" 3 2 9 "A" 6 2 10 "A" 4 6 11 "A" 7 3 12 "A" 5 1 13 "A" 3 3 14 "A" 5 5 15 "A" 1 5 16 "A" 6 4 17 "A" 4 4 18 "A" 3 3 19 "A" 3 5 20 "A" 4 5 21 "B" 4 5 22 "B" 5 3 23 "B" 2 3 24 "B" 3 5 25 "B" 1 5 26 "B" 4 5 27 "B" 1 8 28 "B" 4 6 29 "B" 7 3 30 "B" 8 5 31 "B" 2 5 32 "B" 8 4 33 "B" 7 4 34 "B" 8 1 35 "B" 4 5 36 "B" 6 4 37 "B" 5 7 38 "B" 5 2 39 "B" 5 6 40 "B" 5 5 41 "C" 6 3 42 "C" 8 4 43 "C" 6 3 44 "C" 5 4 45 "C" 6 4 46 "C" 6 4 47 "C" 6 4 48 "C" 3 2 49 "C" 6 3 50 "C" 7 2 51 "C" 5 2 52 "C" 7 4 53 "C" 7 4 54 "C" 4 2 55 "C" 5 2 56 "C" 5 5 57 "C" 6 2 58 "C" 3 5 59 "C" 4 1 60 "C" 5 3 end
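One possible sketch, based on my interpretation of "closer to y_base" rather than a definitive recommendation: look at each method's deviation from the reference and compare the absolute errors across the three methods:
Code:
generate diff = y - y_base
generate absdiff = abs(diff)
tabstat diff absdiff, by(module) statistics(mean sd n)
encode module, generate(modnum)     // oneway needs a numeric group variable
oneway absdiff modnum, tabulate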
Thank you
Thursday, November 24, 2022
Understanding type of panel
Hi everyone,
I have individual level data for 20 regions for 18 years. The question is to examine the impact of x on y (binary variable). However, there is no individual id. Basically the data looks like this:
It is not the same individuals that are tracked every year. My questions are:
region | year | x | y | age | gender |
1 | 1993 | 20 | 1 | 20 | 1 |
1 | 1993 | 26 | 0 | 25 | 1 |
1 | 1993 | 12 | 1 | 40 | 1 |
1 | 1994 | 13 | 1 | 21 | 0 |
1 | 1994 | 20 | 1 | 30 | 1 |
2 | 1993 | 25 | 0 | 25 | 1 |
- Is this still a panel despite me not knowing anything about individuals?
- I have run a regression using the following command: logit y x gender age i.region i.year, vce(cluster region). Is this the correct way to include region and year fixed effects?
- Should I define it as panel data using xtset and then run the xtreg command? I read in a different post that when you have multiple observations under a particular region and year, that might not be the right way.
- What if I want to run a non-parametric regression on this? Will the fact that the dependent variable and some of the controls are binary impact anything?
How to calculate days and hours between two dates
Hi,
I am calculating days and hours between two dates (admission date/time and discharge date/time), thanks for any suggestions.
Example attached:
clear
input int id str9 admi_date str9 admi_time str9 dischage_date str9 dischage_time
1 11jan2019 1154 12jan2019 0716
2 15feb2019 0217 08oct2018 0934
3 01dec2019 2314 09feb2020 0817
end
The final results should have two new variables, one denotes the total days between two dates, and another one indicates the total hours between two dates.
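A sketch: build %tc datetimes from the date and hhmm strings, then rescale the millisecond difference with msofhours() (the substr() step assumes the times are always four digits):
Code:
generate st_admi  = admi_date + " " + substr(admi_time, 1, 2) + ":" + substr(admi_time, 3, 2)
generate st_disch = dischage_date + " " + substr(dischage_time, 1, 2) + ":" + substr(dischage_time, 3, 2)
generate double admit = clock(st_admi, "DMY hm")
generate double disch = clock(st_disch, "DMY hm")
format admit disch %tc
generate double hours = (disch - admit) / msofhours(1)
generate double days  = (disch - admit) / msofhours(24)
list id days hours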
Best,
Zichun
Problem of missing observations in CAPM estimation
I have the following variables from raw data and I want to estimate the return of stock PG using the CAPM model. I found that there are a lot of missing observations due to date gaps, since trading does not occur on weekends and during public holidays. How can I fix this problem? Should I use a new date variable?
Moreover, after fixing the dates, I use the following commands to get the estimation result:
tsset date, daily
gen lnPG=ln(PG)
gen rPG=100*(lnPG-L.lnPG)
gen rirf=rPG-rfr
reg rirf r_market
Is it appropriate to use gen rPG=100*(lnPG-L.lnPG) to generate the return of stock PG from its daily close price?
*just ignore hml and smb because I just use CAPM, not the Fama-French three-factor model
where r_market stands for the return of market portfolio, rfr stands for daily return for the Treasury bills (risk-free rate of return), PG is the adjusted close price of stock PG
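On the return formula: yes, 100*(lnPG - L.lnPG) is the standard continuously compounded daily return. On the date gaps, one option is a business calendar, so that L. means "previous trading day" rather than "previous calendar day"; a sketch (the calendar name is arbitrary):
Code:
bcal create pgcal, from(date) generate(bdate) replace
tsset bdate
generate lnPG = ln(PG)
generate rPG  = 100*(lnPG - L.lnPG)
generate rirf = rPG - rfr
regress rirf r_market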
egen for numeric variables
Hi, I'm trying to concatenate two numeric variables with the egen command. I know the result is a string variable, but when I try to destring it, a "nonnumeric characters" error appears and the variable isn't replaced. I need to do math operations with this variable, so I can't leave it as a string. If you know the solution, I would be very grateful for your help.
e.g. I have two weight variables: p1005peso_2 for the integer part and p1005peso_1 for the fractional part of the weight
+----------+ +----------+
| p1005p~2 | | p100~o_1 |
|----------| |----------|
1. | 103 | 1. | 6 |
2. | 54 | 2. | 9 |
3. | 62 | 3. | 8 |
4. | 61 | 4. | 5 |
5. | 70 | 5. | 4 |
+----------+ +----------+
**egen peso_con = concat(p1005peso_2 p1005peso_1), punct(.)
. list peso_con in 1/5
+----------+ this is how I need the variable shown, but numeric
| peso_con |
|----------|
1. | 103.6 |
2. | 54.9 |
3. | 62.8 |
4. | 61.5 |
5. | 70.4 |
+----------+
For missing values peso_con shows ". . ." (concat turns each missing numeric into a dot), which is why destring complains about nonnumeric characters:
** destring peso_con, replace
peso_con: contains nonnumeric characters; no replace
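Two sketches. The arithmetic route avoids strings entirely (it assumes p1005peso_1 is always a single decimal digit); the concat route works if the all-dot missing cases are blanked out before destring:
Code:
* arithmetic route
generate double peso = p1005peso_2 + p1005peso_1/10

* concat route
egen peso_con = concat(p1005peso_2 p1005peso_1), punct(.)
replace peso_con = "" if missing(p1005peso_2) | missing(p1005peso_1)
destring peso_con, replace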
DiD
Dear All,
I have run into several general issues while starting to work with DiD.
The data that I have is repeated cross-sections covering 12 cities, of which 3 are in the treatment group and 9 are in the control group. I want to estimate the effect of a student grant on enrollment. I attach a sample of my dataset below.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long pid int year byte city float income byte female float(age ability) byte educ int grant byte enrolled float d 1204550 2010 3 30835.06 0 17.654997 103.11912 10 0 1 0 319284 2014 3 31932.877 0 18.130793 103.68096 14 0 1 0 889613 2013 3 36795.105 1 17.410234 105.94727 18 0 1 0 978314 2014 11 34354.633 1 18.826042 109.90023 13 0 1 1 522308 2018 12 39353.02 0 16.946407 94.4239 11 0 1 0 188084 2014 7 34108.605 0 16.74036 105.9109 12 0 1 0 807206 2016 12 28902.074 0 16.833967 113.8728 13 0 0 0 320333 2013 7 27654.77 0 17.068989 102.17348 8 0 1 0 450150 2010 12 32522.38 1 17.06066 99.98117 13 0 0 0 18346 2016 2 36964.234 1 17.876476 107.94476 15 0 1 0 365140 2010 7 32668.863 1 17.623026 92.24822 14 0 1 0 890188 2018 7 34312.56 0 17.601595 109.98123 13 0 1 0 646758 2018 1 33627.96 0 18.01731 120.2906 10 0 1 0 459585 2015 7 29370.793 1 17.249372 87.39782 6 0 0 0 935203 2013 12 32445.84 0 17.77647 75.570305 18 0 1 0 787581 2011 10 30065.26 1 18.051292 99.88745 5 0 1 1 498375 2015 7 35290.19 1 16.93023 93.36523 13 0 1 0 174629 2019 6 30601.344 0 16.877481 75.047035 9 0 0 0 611611 2011 5 30250.344 0 17.525812 90.92487 14 0 0 0 838826 2016 7 30751.756 0 17.068407 107.4832 14 0 1 0 1083592 2012 6 32664.69 1 17.389822 106.0466 11 0 1 0 242215 2015 7 30082.6 0 17.077396 72.571175 12 0 1 0 510105 2015 5 36238.57 0 17.273191 93.56672 14 0 1 0 1171168 2018 1 27488.486 1 17.905424 104.80125 8 0 1 0 505633 2013 10 38493.77 1 18.317528 109.59848 10 0 1 1 1073150 2010 4 31469.16 0 17.581125 98.7337 10 0 1 0 287828 2018 1 30329.86 1 17.092825 85.97654 17 0 0 0 891757 2017 12 35884.47 0 18.140705 111.96175 12 0 1 0 913149 2019 5 35003.055 0 16.631374 122.54363 11 0 1 0 991665 2015 6 33503.066 1 17.866404 70.98935 18 0 1 0 243071 2011 11 31379.88 1 17.084621 103.64464 14 0 1 1 250937 2017 9 35780.746 0 16.826647 86.37576 13 0 1 0 1027638 2018 12 33101.484 1 17.773394 91.35767 10 0 1 0 22953 2013 10 29485.53 1 17.464046 96.355 13 0 1 1 153108 2018 7 29512.227 1 17.275154 94.25189 12 0 1 0 658013 2013 2 27794.816 1 16.913126 82.87864 10 0 1 0 1180143 2013 5 32402.127 1 16.620207 92.83389 10 0 0 0 312950 2010 7 28712.87 1 17.36845 89.96732 12 0 0 0 13933 2013 2 33346.375 0 17.85618 99.07447 12 0 1 0 461750 2010 12 31624.84 0 18.187187 119.1078 12 0 1 0 1067041 2011 3 28147.123 1 17.787865 108.14761 18 0 0 0 807595 2015 9 31974.695 0 16.861078 114.08148 10 0 1 0 598028 2018 9 35101.645 1 18.408142 100.17933 15 0 0 0 898812 2012 7 34147.723 1 18.142551 88.43004 12 0 1 0 498185 2015 12 33769.85 1 18.160706 76.81758 11 0 1 0 284335 2015 7 27923.943 1 17.350409 105.08566 18 0 0 0 528101 2011 9 30789.766 1 17.574682 111.9395 17 0 0 0 120755 2015 3 38303.02 1 17.399488 102.59693 11 0 1 0 931243 2013 5 26780.69 1 17.589457 112.46512 16 0 1 0 13793 2013 8 33066.164 1 17.885649 78.38871 6 0 0 1 1089259 2019 7 31339.92 0 17.514992 120.0189 17 0 1 0 1157963 2013 3 34078.273 0 17.390617 83.69234 16 0 1 0 864745 2015 12 31167.443 0 17.64933 97.17146 9 0 0 0 597375 2015 11 32886.715 1 17.705078 90.70517 13 0 1 1 652215 2015 11 28142.48 0 17.882828 88.85712 11 0 1 1 228472 2012 12 29800.863 1 17.13096 93.62843 18 0 1 0 138830 2010 3 32122.283 1 18.19804 106.25028 7 0 1 0 744471 2011 11 32415.04 0 17.548534 104.27207 18 0 1 1 598536 2016 2 34121.395 1 16.673536 89.31037 15 0 1 0 1052614 2014 8 27814.113 1 17.010475 108.49964 10 0 1 1 649531 2011 7 27315.727 1 18.158386 81.48443 18 0 1 0 1021262 2012 7 30865.465 1 18.37912 91.87194 16 0 1 0 745194 2014 7 
29216.043 1 17.223442 98.18081 15 0 0 0 1178861 2011 8 27306.32 0 17.345772 88.04616 16 0 1 1 1139135 2015 4 31997.08 0 17.50336 82.63182 18 0 1 0 549748 2018 12 31119.97 0 16.786953 90.79367 18 0 1 0 749490 2010 7 25319.2 0 17.834885 85.50574 12 0 0 0 1050823 2013 3 31714.156 1 17.77309 106.7881 12 0 0 0 719246 2016 2 30775.043 0 17.12436 94.05766 7 0 1 0 1005210 2010 8 30388.896 1 17.395172 98.40136 13 0 0 1 447412 2012 9 31698.453 0 17.66549 111.54143 9 0 1 0 1083881 2011 11 35093.87 1 17.310112 102.2826 14 0 0 1 1017055 2015 12 30806.945 1 18.521137 104.66325 11 0 1 0 992772 2012 12 27001.717 0 17.590246 112.75533 14 0 1 0 890703 2013 8 26494.21 1 16.931662 95.20155 12 0 1 1 281129 2019 6 37380.133 0 16.40155 91.67627 18 0 1 0 668979 2019 3 36621.348 1 18.151688 95.8932 14 0 0 0 967675 2015 4 34753.12 0 17.897005 113.03803 18 0 1 0 348630 2010 5 32139.91 1 17.538477 104.08703 10 0 1 0 539534 2014 9 32938.26 0 18.574429 87.49246 11 0 1 0 125183 2013 8 31783.246 1 18.586412 115.38096 16 0 1 1 1083222 2012 8 32871.15 1 17.557425 98.71842 16 0 0 1 808739 2019 4 31418.37 1 16.910383 73.486244 18 0 0 0 972730 2010 3 27586.594 1 17.133444 102.77448 15 0 1 0 177327 2017 9 32548.914 0 16.684296 93.75323 15 0 0 0 103276 2016 1 30006.35 0 16.82226 103.4052 8 0 1 0 856478 2018 7 34106.258 0 17.852732 107.36507 14 0 1 0 923792 2012 6 35160.816 0 17.40168 96.59897 14 0 1 0 1064345 2015 12 34054.652 1 17.536524 101.16731 16 0 1 0 522532 2012 3 32600.11 1 17.823885 95.66457 16 0 1 0 407449 2019 1 29082.324 1 17.626932 94.40669 14 0 0 0 819533 2013 3 30923.36 0 18.071976 100.04102 13 0 1 0 315863 2013 1 30462.123 0 16.875221 93.62828 16 0 1 0 56267 2017 7 31048.93 1 17.411005 99.2887 17 0 0 0 174233 2013 7 28117.346 0 17.959694 112.06963 11 0 1 0 459831 2011 7 34452.543 0 18.144615 105.93893 15 0 1 0 515840 2010 8 32178.18 0 17.602175 89.60943 12 0 1 1 186889 2019 7 32696.75 0 17.649858 95.25651 16 0 1 0 672054 2014 3 35514.21 1 17.791594 93.148 6 0 1 0 900622 2012 3 29506.32 0 17.503372 93.44817 18 0 1 0 end
These are the questions I have for DiD.
1. In order to run an appropriate model: from what I've found in Scott Cunningham's "Causal Inference", most of the time the errors are correlated within groups. First, is there a way to test this (something like a White test), and if not, should clustered errors be used regardless?
2. For DiD the most important assumption is parallel trends. However, is there still any use in finding out the degree to which both groups are balanced, and if so, what would be the appropriate code for that?
3. Most important question: which method would give me the extent to which the student grant matters? (I want to find out if there is an intensive effect of student grants on enrollment.)
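On question 3, a minimal two-way DiD sketch for repeated cross-sections, clustering at the treatment (city) level; which cities are treated and when the grant starts are hypothetical placeholders, not something read off the data:
Code:
generate treated = inlist(city, 1, 2, 3)   // placeholder: the 3 treated cities
generate post    = year >= 2015            // placeholder: year the grant starts
generate did     = treated * post
regress enrolled did female age i.city i.year, vce(cluster city)
The city and year fixed effects absorb the treated and post main effects. With only 12 cities, conventional cluster-robust standard errors can be unreliable; a wild cluster bootstrap (for example the community-contributed boottest) is a common complement.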
Kindest regards
Variable not recognized after modification
Hi everyone,
For company data analysis for my thesis, I wanted to create industry dummy variables based on the company's SIC Code 2's. Yesterday I successfully managed to do that by writing this code (and similar ones with different industry names):
generate Ind_Manu=.
replace Ind_Manu=1 if inrange(Sic_code_2,2000,3999)
replace Ind_Manu=0 if Sic_code_2>3999
replace Ind_Manu=0 if Sic_code_2<2000
As a result I got a beautiful dummy variable that indicated a 1 if the company's SIC Code 2 was between 2000-3999, and a value of 0 if not.
Today I tried patching the missing SIC Code 2 values through an alternative database. I basically copied the SIC-Code from the database and inserted it into my dataset at the place of the missing value of my Sic_code_2 variable.
The value was still displayed in black (not red), so I assumed it would still be recognized as a numerical value.
However, now I wanted to update the Dummy variable with the new Sic Code 2 data, by running the following line again:
replace Ind_Manu=1 if inrange(Sic_code_2,2000,3999)
Yesterday this line worked perfectly fine, but somehow now I get the error: "no variables defined"
I did not change the name of the variable, so I think it has something to do with the fact that I manually entered data into the Sic Code 2 variable.
Does anyone know how to resolve this issue?
Thanks in advance!
Ruben
Unable to report & understand relogit marginal effects
Dear all,
In my (regression) analysis I am trying to see when companies are more likely to demand aid from governments, considering five major factors (firm size, revenue, unemployment in the economy, imports, and GDP growth) and my main explanatory variable, "ownership", that is, whether a company is owned by a large multinational corporation.
Some variables vary over time (such as revenue "lrev" and employee size "lnempl") but others, such as MNC ownership, do not.
My outcome variable is binary: whether or not a firm has applied for aid ("ad"). My main IV is also binary: whether a firm is "mnc_owned" or not. The 5 other variables are continuous.
Since the event I am interested in is quite rare, I run a rare-events logistic regression using the relogit command (OLS reveals similar results as well).
With all the variables in the model, my results are the following,
relogit ad mnc_owned lrev lnempl unemployment lnimport gdp_growth sector year
Corrected logit estimates Number of obs = 371157
------------------------------------------------------------------------------
| Robust
ad | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
mnc_owned | 2.580213 .083656 30.84 0.000 2.416251 2.744176
lrev | .4812687 .0308463 15.60 0.000 .4208109 .5417264
lnempl | -.1178756 .0324088 -3.64 0.000 -.1813956 -.0543555
unemployment | .0195259 .0905168 0.22 0.829 -.1578838 .1969356
lnimport | -.1665573 .9645765 -0.17 0.863 -2.057093 1.723978
gdp_growth | -.0019636 .0226464 -0.09 0.931 -.0463498 .0424226
sector | .0005393 .0000406 13.29 0.000 .0004598 .0006188
year | .0095729 .0405276 0.24 0.813 -.0698597 .0890055
_cons | -28.80196 95.22616 -0.30 0.762 -215.4418 157.8379
------------------------------------------------------------------------------
The results bear out my theoretical expectations.
Now I am trying to provide some visuals with a margins command but I have 2 issues I cannot seem to resolve,
When I try to understand the impact of my main IV (mnc ownership), the response is the following:
margins mnc_owned
factor mnc_owned not found in list of covariates
r(322);
So I decided to use the dydx() option instead, which seems to work:
margins, dydx (mnc_owned)
Average marginal effects Number of obs = 371,157
Model VCE: Robust
Expression: Linear prediction, predict()
dy/dx wrt: mnc_owned
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
mnc_owned | 2.580213 .083656 30.84 0.000 2.416251 2.744176
------------------------------------------------------------------------------
But now I cannot seem to see the impact of other continuous variables, such as revenue or size, at different levels.
For revenue, for instance, when I try,
margins, dydx(lrev) at(lrev=1)
Average marginal effects Number of obs = 371,157
Model VCE: Robust
Expression: Linear prediction, predict()
dy/dx wrt: lrev
At: lrev = 1
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
lrev | .4812687 .0308463 15.60 0.000 .4208109 .5417264
------------------------------------------------------------------------------
But for margins, dydx(lrev) at(lrev=6) the result is also identical – which does not make any sense… the level of revenue between 1 and 6 should most certainly not reveal identical results...
1) I cannot seem to see the difference in the different levels of this variable – how can I see the marginal effect of revenue at 1 versus revenue at 7?
2) Is it also possible to do this by keeping another variable in a given value, such as
margins, dydx(mnc_owned) at(lrev=1)
When I try this, the results do not change whether revenue (lrev) is 1 or 3 or 5 or 6 ... I am trying to understand what I am doing wrong.
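A likely explanation, sketched as a workaround rather than a definitive diagnosis: relogit does not accept factor-variable notation, so margins neither recognizes mnc_owned as a factor nor gets a nonlinear prediction to work with. Its output says Expression: Linear prediction, and the derivative of a linear prediction with respect to a regressor is just that regressor's coefficient, identical at every at() value. Refitting with ordinary logit and factor variables gives level-dependent effects on the probability scale:
Code:
logit ad i.mnc_owned lrev lnempl unemployment lnimport gdp_growth sector year, vce(robust)
margins mnc_owned                          // predicted Pr(ad) by ownership
margins, dydx(lrev) at(lrev=(1 6))         // now differs across lrev levels
margins, dydx(mnc_owned) at(lrev=(1 3 5))  // and across settings of lrev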
I hope I was able to formulate these two issues.
Any help is appreciated.
Best,
Aydin
Wednesday, November 23, 2022
Rearranging columns and rows to make it country-level database
I want to create two columns:
Column A with all the countries listed in the photo below
Column B with one of 3 options (Advanced Economies, Emerging Market Economies, Low-Income Developing Countries) to match whichever category it is placed under in the photo below.
So ideally, we'd end up with something that looks like:
Country | Type |
Australia | Advanced Economies |
Austria | Advanced Economies |
Albania | Emerging Market Economies |
Afghanistan | Low-Income Developing Countries |
I can do this easily on excel by moving cells around, etc.
But how do I do this with purely Stata? I realize it's not a wide-to-long format reshaping matter.
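A sketch of the Stata route, assuming the photo's layout comes in as three string columns, one per grouping (the variable names here are hypothetical):
Code:
clear
input str20(advanced emerging lowincome)
"Australia" "Albania" "Afghanistan"
"Austria"   "Algeria" "Bangladesh"
end
rename (advanced emerging lowincome) (country1 country2 country3)
generate long row = _n
reshape long country, i(row) j(type)
drop if country == ""
label define typelbl 1 "Advanced Economies" 2 "Emerging Market Economies" ///
    3 "Low-Income Developing Countries"
label values type typelbl
keep country type
sort type country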
Thank you in advance,
Side-by-side boxplots with markers for means
Hello,
I have country-level panel data for GDP per capita for the past 10 years that I wish to represent in a side-by-side boxplot (by years. So my x-axis would be years, y-axis would be GDP per capita).
I also have categories of countries. Some countries are classified as "fragile" and others are "non-fragile".
Is there a way to overlay the mean GDP value of "fragile" countries per year on the boxplot graphs? I am aware that others have attempted to do this using the "twoway rbar" method, but I was wondering if there is a more efficient way, preferably sticking to the "graph box" syntax.
Creating a scatterplot with two different variables
How do I create a scatterplot with two different variables, one being dependent and one being independent?
the original question is:
Create a scatterplot with “salary” as the dependent variable and “unemployed” as the independent variable.
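In Stata's twoway syntax the dependent variable comes first (y axis) and the independent variable second (x axis); a one-line sketch:
Code:
scatter salary unemployed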
Callagain command by Behaghel et al.
Hi
I'm currently trying to run the command "callagain" by Behaghel et al. (to be found here). Unfortunately, the command does not seem to work. One file just opens the help page in Stata for the command, and the ado-file cannot be run through, as there are several error messages from line 433 on. Has anybody ever had a similar problem? Is there a quick fix for this?
Thanks a lot for the help!
Best,
Arto
Writing a formula in which stata chooses a value or 1
Hi!
I'm trying to use this formula to calculate eGFR: eGFR = 142 * min(standardized Scr/K, 1)^α * max(standardized Scr/K, 1)^-1.200 * 0.9938^Age * 1.012 [if female]
Scr = creatinine, e.g. creatinine = 2.4.
In this formula, min(standardized Scr/K, 1) means that you choose whichever is smaller: the creatinine value divided by kappa, or 1. The same goes for the max term.
I'm looking for advice on how to write the command so Stata realizes this. Grateful for any help!
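Stata's min() and max() functions take several arguments and return the smallest/largest, which is exactly what the formula needs. A sketch, assuming variables scr, age, and a female dummy, with the sex-specific kappa and alpha constants of the 2021 CKD-EPI equation (0.7/0.9 and -0.241/-0.302; check these against your reference):
Code:
generate kappa = cond(female, 0.7, 0.9)
generate alpha = cond(female, -0.241, -0.302)
generate ratio = scr / kappa
generate egfr  = 142 * min(ratio, 1)^alpha * max(ratio, 1)^(-1.200) ///
    * 0.9938^age * cond(female, 1.012, 1)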
Adding rows under a variable
What is the Stata command for adding rows (say, 10 rows) under a variable that has a fixed number of observations (like 40) for each unique id?
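A sketch of one reading of this: append 10 extra rows, with only the id filled in, for each unique id:
Code:
preserve
bysort id: keep if _n == 1      // one row per id
keep id
expand 10                       // 10 id-only skeleton rows per id
tempfile extra
save `extra'
restore
append using `extra'
sort id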
Tuesday, November 22, 2022
please help. making bar graph more simple with "over" options
Hi, guys.
I'm a newcomer to Stata, and I'm having some trouble making a bar graph.
My data has 46 observations and a numeric variable for displaying year-month.
Here is the code I ran:
Code:
graph bar fluct, over(year, label(labsize(vsmall) angle(45))) ///
    blabel(bar, size(vsmall) format(%9.1f)) ///
    ytitle("Fluctuation") graphregion(color(white))
and the graph it produces: [bar chart image omitted]
Four problems I want to solve:
1) There are too many x-axis labels, so I want to display only some of them (e.g. 201910, 202010, 202110, 202210). The graph aims to show the fluctuation in a specific month across years.
2) Similar to problem 1), I want to display only some of the bar labels (the "blabel" option).
3) Some bar labels overlap. I want to show them without overlapping.
4) I want to color only the last bar.
I attach the data here. Please help.
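A sketch of one common workaround for problems 1) and 4): switch to twoway bar, where selective x labels and a single highlighted bar are straightforward. labmask comes from the labutil package on SSC, and the xlabel positions are guesses at where the Octobers fall in these 46 observations:
Code:
generate t = _n
labmask t, values(year)            // ssc install labutil
twoway (bar fluct t if t < _N) ///
       (bar fluct t if t == _N, color(red)), ///
    xlabel(4(12)46, valuelabel angle(45) labsize(vsmall)) ///
    legend(off) ytitle("Fluctuation") graphregion(color(white))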
Please help cant work out how to t-test. Im new to stata
Hi guys, I'm struggling to compare 2 different regions with a t-test.
As seen from the screenshots I have multiple regions and have generated new separate variables for those 2 regions I want to test.
One of my dependent variables is "patience".
However, when I run the ttest it compares the region I select in the GroupVariable name with all the other existing regions. I am only looking to test patience on those 2 specific regions.
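There is no need to generate separate variables: an if restriction keeps the original region variable usable directly (the codes 1 and 2 below are placeholders for the two regions of interest):
Code:
ttest patience if inlist(region, 1, 2), by(region)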
Can someone help me please? I'm a beginner.
Thanks
Pseudo Panel Data and Mediation Analysis
Dear Stata Experts
I'm a PhD student, and I'd like to know what code to use to perform a mediation analysis with pseudo-panel data.
Using cii proportions with loop
Hello,
I am trying to compute confidence intervals for proportions (number of cases / total) on each observations of a simple database using the ci proportions / cii proportions command.
Here is my code and the database:
Code:
input str7 continent cases total
Africa 544 863
America 43 172
Asia 372 734
Oceania 19 25
end

local contlist Africa America Asia Oceania
foreach continent of local contlist {
    ci proportions total cases if continent == "`continent'"
}
This gives an output with empty values. I have also tried cii proportions, which was also unsuccessful. I can get the results one by one, by typing the following command (for Africa, for instance), but I would like to have it automatized:
Code:
cii proportions 863 544
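Since ci proportions expects 0/1 variables rather than aggregated counts, one sketch is to loop over the observations and feed the counts to cii directly (on Stata 14 the proportions keyword may need to be dropped, i.e. cii 863 544):
Code:
forvalues i = 1/`=_N' {
    display as text continent[`i'] ":"
    cii proportions `=total[`i']' `=cases[`i']'
}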
Does anyone have an idea on how to solve this?
I am using Stata 14.
Many thanks in anticipation
Monday, November 21, 2022
LSDV and collinearity
Hi everyone,
I am having trouble implementing a simple least square dummy variable (LSDV) model.
The model I am implementing is the following:
reg log_wage i.vet_yes c.age i.vet_yes#c.age vetcountry female $control daustria dcanada dczechrepublic ddenmark destonia dfinland dfrance dgermany direland djapan dkorea dnetherlands dnorway dpoland dslovakrepublic dspain dsweden duk dusa [pw=weight_adjusted] , vce(robust)
In particular, vetcountry takes value of one if the country is a vocational oriented country, zero otherwise.
Now the main problem is that when running this regression Stata drops two variables (dcanada and dgermany: a vocational and a general oriented country).
Am I right to believe that the coefficient for vetcountry is only estimated because two country dummies were dropped from the model? Given that, does it mean that the estimated coefficient for vetcountry might not be reliable? If so, do you have any suggestions on how to overcome this problem, as I really need to estimate this essential independent variable?
Many thanks
Space between axis and line chart
Hi, I have a dataset starting in February 2020, but when I create a chart, for some reason the x axis starts on the first of January, so there is a gap between the line and the axis. Any help on how to remove it would be much appreciated!
My code is:
Code:
twoway line varx1 date_num
and my data is:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(date_num varx2) 21960 .03993094 21961 .028612074 21962 .035981826 21963 .03093639 21964 .023211634 21965 .05790542 21966 .07603507 21967 .08387616 21968 .1548472 21969 .17819597 21970 .3043231 21971 .24888514 21972 .4006798 21973 .6627359 21974 .6639794 21975 1.4217416 21976 1.7526433 21977 1.692854 21978 1.842995 21979 2.2927668 21980 3.126146 21981 3.62297 21982 4.1054153 21983 6.309479 21984 8.616345 21985 8.994414 21986 11.120467 21987 14.858947 21988 15.002494 21989 15.87091 21990 19.247196 21991 17.635626 21992 18.665474 21993 19.430426 21994 22.93767 21995 16.207178 21996 14.968385 21997 18.667145 21998 15.427457 21999 15.471897 22000 13.691607 22001 13.549828 22002 11.70698 22003 9.430447 22004 12.200086 22005 9.846559 22006 12.64052 22007 10.324313 22008 9.74395 22009 8.157842 22010 6.826304 22011 8.218202 22012 6.821047 22013 7.32387 22014 6.679968 22015 6.459072 22016 5.239533 22017 4.6077657 22018 4.909679 22019 4.310663 22020 4.1643333 22021 4.0630765 22022 3.8334265 22023 3.1395385 22024 2.461292 22025 3.1197665 22026 2.5037735 22027 3.111698 22028 2.3318615 22029 2.1955478 22030 1.6780353 22031 1.0646594 22032 1.4131098 22033 1.661886 22034 1.460008 22035 1.4889643 22036 1.4995496 22037 1.194172 22038 .8829122 22039 1.0871441 22040 .5983202 22041 .6902591 22042 .58275956 22043 .6993674 22044 .5696148 22045 .6091068 22046 1.35688 22047 1.0096726 22048 1.0628527 22049 1.0090808 22050 .9965445 22051 .6764844 22052 .7071733 22053 .9780415 22054 1.0201133 22055 .7530233 22056 .749878 22057 .9125931 22058 .7052158 22059 .4717451 end format %td date_num
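A sketch of one fix: the gap usually comes from the default axis labels starting at a round date before the data, so anchoring xlabel() at the first observed date removes it (varx2 is the variable name in the data example):
Code:
quietly summarize date_num
local first = r(min)
local last  = r(max)
twoway line varx2 date_num, xlabel(`first'(20)`last', format(%td) angle(45))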