Saturday, July 31, 2021

Between subjects error term for three-way repeated measures anova

I have been having some difficulty trying to run a three-way, repeated-measures ANOVA. I have made several attempts to run this successfully, and have read the FAQ on repeated measures, past forum posts, the Stata manual, and the help entries within Stata, but I continually get an error message stating "could not determine between-subject basic unit; use bseunit() option r(422)".

The syntax I have predominantly used is as follows:

Code:
anova responsetime id congruency / congruency|id cue / cue|id bias / bias|id ///
    congruency#cue / congruency|cue|id congruency#bias / congruency|bias|id ///
    cue#bias / cue|bias|id congruency#cue#bias, repeated(congruency cue)
I have also run the analysis through Stata's point-and-click menus, and have tried to alter the syntax based on readings/research, but I still receive the same error.

As an overview of my data: I have 99 subjects who have each completed 816 trials (split between 2 sessions of 408 trials each), for a total of 80,784 trials. In both sessions the subjects' response time (ms) is recorded (DV), and subjects are measured on three independent variables. The first is a categorical, within-subjects variable ('Congruency'): subjects in both sessions are exposed to an equal number of 'congruent' and 'incongruent' trials. The second is a categorical, within-subjects variable ('Cue'): subjects in both sessions are asked to make judgements on either the 'top' or 'bottom' half of the face. The third is a categorical variable ('Bias') with three levels: 'Top Bias', 'Bottom Bias' and 'No Bias'. In 1 of the 2 sessions (either the first or the second), all participants are exposed to the 'No Bias' level of this variable. In the remaining session, participants are exposed to either a 'Top Bias' or a 'Bottom Bias', but not both.

How these variables exist in the data set:
DV (Response Time): Numeric Variable (ms)
IV1 (Congruency): Categorical Variable coded as 0 (incongruent) and 1 (congruent)
IV2 (Cue): Categorical Variable coded as 1 (top half) and 2 (bottom half)
IV3 (Bias): Categorical Variable coded as 0 (no bias), 1 (top bias) or 2 (bottom bias).
ID: Participant ID (1, 2, 3 etc.)

The data is in long format.

I am using the most recent version of Stata on a MacBook Pro, if that impacts your suggestions.

Does anyone have any advice on why this may be happening?
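
For what it's worth, the error message itself points at the bseunit() option. A minimal, hedged sketch of supplying it (whether id really is the correct between-subject basic unit depends on the design):

Code:
anova responsetime id congruency / congruency|id cue / cue|id bias / bias|id ///
    congruency#cue / congruency|cue|id congruency#bias / congruency|bias|id ///
    cue#bias / cue|bias|id congruency#cue#bias, ///
    repeated(congruency cue) bseunit(id)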

Panel Data Regression and ANOVA

Hi,

I am using panel data for US manufacturing companies for the period 2000-19. I am focusing on the impact of international diversification (Geographic Segment Diversification, i.e. GSD) on performance (EBIT_ROA). I have divided my data into 3 eras: the pre-crisis period 2001-06 (era=1), the crisis period 2007-09 (era=2), and the post-crisis period (era=3). Based on my analysis, the marginal impact of GSD on performance does not differ significantly across the 3 eras.

I would now like to check whether the level of GSD itself varies across the 3 eras. I did this analysis using xtreg as shown below. My interpretation is that GSD varies significantly across the 3 eras. I would like to check whether there is any way to do this analysis for panel data using ANOVA. Thank you.

Code:
. xtreg Ln_GSD l1.era2 l1.era3 if CoAge>=0 & NATION=="UNITED STATES" & NATIONCODE==840
>     & FSTS>=10 & GENERALINDUSTRYCLASSIFICATION==1 & Year_<2020 & Year_<YearInactive
>     & Discr_GS_Rev!=1, fe cluster(n_CUSIP)

Fixed-effects (within) regression               Number of obs     =     26,796
Group variable: n_CUSIP                         Number of groups  =      3,563

R-sq:                                           Obs per group:
     within  = 0.0203                                         min =          1
     between = 0.0000                                         avg =        7.5
     overall = 0.0022                                         max =         19

                                                F(2,3562)         =      54.03
corr(u_i, Xb)  = -0.0417                        Prob > F          =     0.0000

                            (Std. Err. adjusted for 3,563 clusters in n_CUSIP)
------------------------------------------------------------------------------
             |               Robust
      Ln_GSD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        era2 |
         L1. |   .0624621   .0080676     7.74   0.000     .0466444    .0782798
             |
        era3 |
         L1. |   .0999507    .009785    10.21   0.000     .0807659    .1191355
             |
       _cons |  -.4679418    .004862   -96.24   0.000    -.4774745   -.4584091
-------------+----------------------------------------------------------------
     sigma_u |  .60796916
     sigma_e |  .26823125
         rho |  .83706486   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test l1.era2 l1.era3

 ( 1)  L.era2 = 0
 ( 2)  L.era3 = 0

       F(  2,  3562) =   54.03
            Prob > F =    0.0000
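
For readers wondering about the ANOVA analogue: a two-way (firm, era) ANOVA is essentially the fixed-effects regression in different clothing. A hedged sketch, ignoring the lag structure for simplicity and using areg to absorb the many firm effects (note this F test does not carry the cluster-robust adjustment used above):

Code:
areg Ln_GSD i.era, absorb(n_CUSIP)
testparm i.era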

Find value starting with a pattern (similar to the LIKE operator in SQL)

Hello ALL,

I am new to Stata and have a small question; any help would be greatly appreciated.

TABLE1 (.dta file)
Column1 Column2 Column3 Column4
ZA112234 CA52167 PK09146 VF90103
ZA114234 CA52564 K091743 VF90115
ZA115432 CA52734 JI090803 CA52012
ZA116444 CA52923 KL02134 CA52173
ZA117642 CA52064 SD13463 CA52212
ZA118543 CA52231 IU46366 VF90122
ZA119654 CA52341 GHJ3454 ZA11199
HK233543 LK23426 LK34534 ZA11275
ZA111298 CA52086 UJ54352 KL678925
ZA112897 CA52112 PK07762 ZA11375
ZA112777 CA52175 PK09123 VF90139
ZA119057 CA52187 PK00264 ZA11455
ZK012397 GF32431 PK09132 VF90149
ZA112532 CA52999 PK09431 VF90150
KK110998 BB52563 JJ09567 FFF9016E

TABLE2 (.dta file)
DATA_VALUE
ZA11
VF90
CA52
GF32
LK
IU

I have two .dta files. I want to check TABLE2's DATA_VALUE entries against all the elements in TABLE1, create a new column (Present) in TABLE1, and show 1 or 0 depending on whether a match is found (and, at the same time, set elements with no match to NULL).
Column1 Column2 Column3 Column4 Present
ZA112234 CA52167 null VF90103 1
ZA114234 CA52564 null VF90115 1
ZA115432 CA52734 null CA52012 1
ZA116444 CA52923 null CA52173 1
ZA117642 CA52064 null CA52212 1
ZA118543 CA52231 IU46366 VF90122 1
ZA119654 CA52341 null ZA11199 1
null LK23426 LK34534 ZA11275 1
ZA111298 CA52086 null null 1
ZA112897 CA52112 null ZA11375 1
ZA112777 CA52175 null VF90139 1
ZA119057 CA52187 null ZA11455 1
ZK012397 GF32431 null VF90149 1
ZA112532 CA52999 null VF90150 1
null null null null 0

In SQL, we use the LIKE operator to find similar strings.
Example: if we have ZA112234,
we can write LIKE "ZA11%" → this will match all the values that start with ZA11 (ZA11xxxx).

But I don't know how to do this in Stata 16.

Any help would be greatly appreciated,
Thanks in advance
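
A hedged sketch of one way to do this with strpos(), whose value is 1 exactly when the string starts with the pattern (Stata's analogue of LIKE "pattern%"); file and variable names follow the description above:

Code:
* collect the patterns from TABLE2 into a local macro
use TABLE2, clear
levelsof DATA_VALUE, local(patterns)

use TABLE1, clear
gen byte Present = 0
foreach v of varlist Column1-Column4 {
    gen byte match_`v' = 0
    foreach p of local patterns {
        * strpos() == 1 means the cell starts with the pattern, i.e. LIKE "`p'%"
        replace match_`v' = 1 if strpos(`v', "`p'") == 1
    }
    replace Present = 1 if match_`v'
    replace `v' = "" if !match_`v'   // blank out (NULL) cells with no match
    drop match_`v'
}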

mata mlib create

I'm having trouble with mata mlib create. I can't seem to get a Mata library set up anywhere.
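
For reference, a minimal sketch of the create-and-add workflow, assuming a throwaway function and the PERSONAL directory as the target (library names must begin with "l"):

Code:
mata:
// a trivial function, just to have something to store
function hello() return("hello")
// create the library in PERSONAL, add the compiled function, rebuild the index
mata mlib create lmylib, dir(PERSONAL) replace
mata mlib add lmylib hello()
mata mlib index
end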

Changes in Control/Treatment groups after PSM

Dear friends,

Hi. As in the previous post, quoted below:

Hi All,

In case anyone viewing this thread is interested, here's the code to do what Navid asked, with a few additions to display both graphs side-by-side with the y-axes having common scales.

sysuse auto, clear
psmatch2 foreign mpg, out(price)

// compare _pscores before matching & save graph to disk
twoway (kdensity _pscore if _treated==1) (kdensity _pscore if _treated==0, ///
lpattern(dash)), legend( label( 1 "treated") label( 2 "control" ) ) ///
xtitle("propensity scores BEFORE matching") saving(before, replace)

// compare _pscores *after* matching & save graph to disk
gen match=_n1
replace match=_id if match==.
duplicates tag match, gen(dup)
twoway (kdensity _pscore if _treated==1) (kdensity _pscore if _treated==0 ///
& dup>0, lpattern(dash)), legend( label( 1 "treated") label( 2 "control" )) ///
xtitle("propensity scores AFTER matching") saving(after, replace)

// combine these two graphs that were saved to disk
// put both graphs on y axes with common scales
graph combine before.gph after.gph, ycommon


Richard
I wondered whether the dummy variable for the treatment/control groups after PSM should now be changed as below,

Code:
 
replace _treat = 1 if _treat == 1 & _weight != .
replace _treat = 0 if _treat == 0 & dup > 0
I want to run regressions after PSM. Do I need to update the dummy variable for the treatment/control groups?

Thank you in advance.

Trying to calculate time length for hospital episodes of multiple consecutive rows, but also multiple nested within one 'person'

Hi STATALIST,
First time poster here. I am hoping someone can assist with my hospital record data calculations of nested episode length of stay. My dataset has many separations (rows) per person (ppn), and I need to calculate the length of stay for contiguous separations that make up one whole episode of care; there can be many of these 'whole episodes of care' for one person over the 10 years of data.

Here is an example. 'ppn' identifies the person, who is admitted to hospital (admitdttm) on a given date/time and separated (dischdttm) on a given date/time. I have calculated 'cont_stay', which flags separations where the gap to the next admission does not exceed 1 day (using [_n-1] etc.). The 'obs' variable is a continuous count of all separations by ppn over the 10 years of data, and 'episode_seq_no' shows the chronological order of separations within one unique whole episode of care.

What I want to do, but can't figure out how, is to sum 'los_sep_days' (days, rounded to 2 decimal places) across the separations within one whole episode of care (i.e., episode_seq_no starts at 1, and the episode finishes when cont_stay is no longer == 1). I hope this explains it in words; here is a snip of the data:
[screenshot of the example data was attached in the original post]


THANK YOU in advance if you can help!! I have spent hours and hours on this and am going round in circles.
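
A hedged sketch of one way to do this, assuming the variables described above (ppn, obs, episode_seq_no, los_sep_days) and that episode_seq_no == 1 marks the start of each whole episode of care:

Code:
* number the whole episodes within each person, then total the stay lengths within them
sort ppn obs
by ppn: gen episode_id = sum(episode_seq_no == 1)
bysort ppn episode_id: egen total_los = total(los_sep_days)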

Set colors

Sorry, I have asked this before. But now I have two computers, and I have spent an hour with the help files and the manuals and can't remember how to set the background colors for the main and other menus. Any help would be appreciated.

Ric Uslaner

local macro and loop

Hi Statalist, I am trying to use a local macro with a foreach loop to clean up some vars. Here is my code:

Code:
local yn1 q1 q2 q7 q8 q10
foreach var of local yn1 {
    replace `var' = upper(`var')
}
I'm not getting any error messages, but it also doesn't seem to be working. I can provide sample data if needed, but I am assuming that there is just some very obvious typo (or some other error/thing I'm leaving out) that can be pointed out without any data...
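
One classic, hedged thing to check: local macros vanish between runs, so if the local definition and the loop are executed as two separate selections from the do-file editor, the loop iterates over nothing and silently does no work. Running the block as one piece and echoing the local makes that visible:

Code:
local yn1 q1 q2 q7 q8 q10
display `"yn1 contains: `yn1'"'
foreach var of local yn1 {
    replace `var' = upper(`var')
}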

Modelling temporal dependence in within-between "hybrid" models

Dear Statalisters,

I would like to know how to properly specify a temporal variable in within-between "hybrid" random effects models.

My model, a logistic regression concerning inter-state conflict initiation, takes the following form:

y: binary dependent variable (1 = conflict initiation)
z1: time-variant predictor
z2: time-variant predictor
t t2 t3 : cubic polynomial of years since conflict initiation

Following Schunck (2013), the within-between variables are constructed:
Code:
by cluster, sort: center z1, prefix(w) mean(b)
by cluster, sort: center z2, prefix(w) mean(b)
And an interaction between z1 and z2 may take the following form
Code:
gen wz1Xbz2 = wz1*bz2
by cluster, sort: center wz1Xbz2, prefix(w_) mean(b_)
And so the model takes the following form:
Code:
logit y wz1 bz1 wz2 bz2  w_wz1Xbz2  b_wz1Xbz2
The next step is to include t t2 t3 in the model, but I am unsure about how to do so. Specifically, is it correct to simply include these variables in the model like so:
Code:
logit y wz1 bz1 wz2 bz2  w_wz1Xbz2  b_wz1Xbz2  t t2 t3
Or do I instead need to generate mean and centred versions of these time variables as well? (I am tending toward the former, since it does not make much sense to me to model peace years as a cluster average or a cluster deviation.)

Further to this, I would also like to include an interaction between, say, wz1 and t t2 t3. But, of course, to do so, I first need to know how these time variables should be specified in the model.

Finally, I should add that t t2 t3 - as a running count of the number of years since conflict - are distinct from the actual time-series variable, which is simply the year of observation.
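
For what it's worth, a hedged sketch of the former option (the peace-years polynomial entered raw, in the spirit of Carter and Signorino's cubic-polynomial approach), with the interaction of wz1 and the time terms built via factor-variable notation:

Code:
logit y wz1 bz1 wz2 bz2 w_wz1Xbz2 b_wz1Xbz2 c.t c.t2 c.t3 ///
    c.wz1#c.t c.wz1#c.t2 c.wz1#c.t3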

Any advice would be hugely appreciated.

Matthew

Marginal effects for group comparisons in probit regression

Hi all,

I am trying to use a probit model to analyze a number of interaction effects for a binary dependent variable. My dependent variable is callback, taking the value of 1 if a subject receives a job interview and 0 if not. At the moment, I have 4 independent variables, three of which are binary: black (1 if black, 0 if not), woman (1 if woman, 0 if not), parent (1 if parent, 0 if not), and occupation (which can take 6 values).

As is widely known, marginal effects for interaction terms can't be calculated. For this reason, I am exploring other ways of making group comparisons.

As an example, I want to know what the marginal effect of being a woman is, by black status, with respect to the probability of receiving a callback; i.e., I want to compare black men with black women.

To do this, I generated the following code:

Code:
margins black, dydx(woman)
I have three questions:

1. is this code appropriate for the described purpose?

2. In the following table, how should I interpret the base outcome? Should I read it as the effect of being a woman raising the probability of a callback by .0098 for black subjects and by .0104 for white subjects?

[results table was attached in the original post as an image]

3. should I include interaction terms in my initial probit regression? I have combined the previous line of code with the following two regressions, yielding different results:

Code:
probit callback i.occupation##i.black##i.woman##i.parent
Code:
probit callback i.occupation i.black i.parent i.woman


Thanks for your time!
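
For what it's worth, a hedged sketch of the factor-variable route: when the interaction is declared with ##, margins accounts for it (and for the probit nonlinearity) when computing the effects, so the two specifications above will, and should, give different answers:

Code:
probit callback i.occupation i.black##i.woman i.parent
margins black, dydx(woman)      // effect of woman within each black group
margins r.black, dydx(woman)    // contrast: difference in the woman effect across groups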

Idiosyncratic Skewness

I run the code below on an appended dataset to calculate monthly idiosyncratic skewness, defined as the skewness of the residuals obtained by regressing the previous year of daily returns (rt) on daily market returns (mkt) and squared daily market returns (mkt2). The code, however, hangs and does not produce any results. I can't figure out the issue. Need help.

Code:
xtset mdate stock_id

capture program drop one_regression
program define one_regression
    // regress daily returns on the market return and its square,
    // then store the skewness, count, and mean of the residuals
    regress rt mkt mkt2
    predict resid, resid
    summ resid, detail
    gen double skewness = r(skewness)
    gen obs = r(N)
    gen double mean = r(mean)
end

bysort stock_id mdate (date): gen high = cond(_n==1, mdate, -11)
rangerun one_regression, by(stock_id) interval(mdate -11 0)

collapse (mean) mdate reg_nobs stock_id b_mkt2, by(stock year month)
order stock year month mdate reg_nobs stock_id b_mkt2
replace b_mkt2 = . if reg_nobs < 200

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float stock_id str52 stock float(date mdate) byte month int year float(rt mkt mkt2)
1 "3M India Ltd." 15886 521 6 2003  .009533716  .007865   .0000619
1 "3M India Ltd." 15883 521 6 2003 -.012074256  .009992   .0000998
1 "3M India Ltd." 15882 521 6 2003  .011034168  .003893   .0000152
1 "3M India Ltd." 15881 521 6 2003 -.013975303  .016005 .000256166
1 "3M India Ltd." 15880 521 6 2003 -.003102842  .002033   4.13e-06
1 "3M India Ltd." 15879 521 6 2003  .004186865 -.013336 .000177845
1 "3M India Ltd." 15876 521 6 2003  .004656811  .008685   .0000754
1 "3M India Ltd." 15875 521 6 2003  .001214485  .006526   .0000426
1 "3M India Ltd." 15874 521 6 2003  .031898383  .006009   .0000361
1 "3M India Ltd." 15873 521 6 2003 -.007203266  .021803 .000475352
1 "3M India Ltd." 15872 521 6 2003 -.004266642 -.001889   3.57e-06
1 "3M India Ltd." 15869 521 6 2003  -.01446385   .00344   .0000118
1 "3M India Ltd." 15868 521 6 2003 -.007322823  .010436 .000108913
1 "3M India Ltd." 15867 521 6 2003  .019095726  .008985   .0000807
1 "3M India Ltd." 15866 521 6 2003  -.02163825 -.018968 .000359791
1 "3M India Ltd." 15865 521 6 2003   .09973572  .006161    .000038
1 "3M India Ltd." 15862 521 6 2003 -.017246207  .006592   .0000435
1 "3M India Ltd." 15861 521 6 2003  .004897146  .000296   8.74e-08
1 "3M India Ltd." 15860 521 6 2003 -.006100614  .014723 .000216768
1 "3M India Ltd." 15859 521 6 2003 -.031379614  .001928   3.72e-06
1 "3M India Ltd." 15858 521 6 2003    .0157434    .0028   7.84e-06
1 "3M India Ltd." 15855 520 5 2003  .015344273  .016388 .000268563
1 "3M India Ltd." 15854 520 5 2003 -.007012919  .009597   .0000921
1 "3M India Ltd." 15853 520 5 2003 -.005384392  .010709 .000114693
1 "3M India Ltd." 15852 520 5 2003  -.03143467  -.00696   .0000484
1 "3M India Ltd." 15851 520 5 2003  .009376593  .019336 .000373887
1 "3M India Ltd." 15848 520 5 2003   .02838536  .009681   .0000937
1 "3M India Ltd." 15847 520 5 2003  .006974769  .003643   .0000133
1 "3M India Ltd." 15846 520 5 2003  .009649982  .003658   .0000134
1 "3M India Ltd." 15845 520 5 2003  .006767674  .012953 .000167783
1 "3M India Ltd." 15844 520 5 2003  .008480898 -.007435   .0000553
1 "3M India Ltd." 15841 520 5 2003  -.02464387  .015005 .000225152
1 "3M India Ltd." 15840 520 5 2003  .029448276  .013856 .000191978
1 "3M India Ltd." 15839 520 5 2003 -.010313845  .010457 .000109344
1 "3M India Ltd." 15838 520 5 2003  .015578233  .008679   .0000753
1 "3M India Ltd." 15837 520 5 2003 -.001125183  .007556   .0000571
1 "3M India Ltd." 15834 520 5 2003 -.008068571 -.001759   3.09e-06
1 "3M India Ltd." 15833 520 5 2003  .006875818  -.00786   .0000618
1 "3M India Ltd." 15832 520 5 2003 -.000124182 -.000197   3.88e-08
1 "3M India Ltd." 15831 520 5 2003 -.018608289  .006877   .0000473
1 "3M India Ltd." 15830 520 5 2003   .02503203  .012248  .00015002
1 "3M India Ltd." 15827 520 5 2003   .04601726  .007976   .0000636
1 "3M India Ltd." 15825 519 4 2003  .036240544  .004647   .0000216
1 "3M India Ltd." 15824 519 4 2003   .03760189  .000261   6.83e-08
1 "3M India Ltd." 15823 519 4 2003  .003653921  .006984   .0000488
1 "3M India Ltd." 15820 519 4 2003 -.017259795 -.004785   .0000229
1 "3M India Ltd." 15819 519 4 2003 -.033946905 -.002703   7.31e-06
1 "3M India Ltd." 15818 519 4 2003  .010775134 -.009299   .0000865
1 "3M India Ltd." 15817 519 4 2003   .06061496 -.002435   5.93e-06
1 "3M India Ltd." 15816 519 4 2003  .004707749  .008126    .000066
1 "3M India Ltd." 15812 519 4 2003  -.02267936 -.013345  .00017808
1 "3M India Ltd." 15811 519 4 2003  .016032182  .005756   .0000331
1 "3M India Ltd." 15810 519 4 2003   .06006362  .002192   4.81e-06
1 "3M India Ltd." 15806 519 4 2003   .02069287 -.007982   .0000637
1 "3M India Ltd." 15805 519 4 2003  -.05525858 -.033738 .001138256
1 "3M India Ltd." 15804 519 4 2003  -.02321739 -.009597   .0000921
1 "3M India Ltd." 15803 519 4 2003  .003712956 -.008123    .000066
1 "3M India Ltd." 15802 519 4 2003   .03585195  .015533 .000241275
1 "3M India Ltd." 15799 519 4 2003   .03719635  .009709   .0000943
1 "3M India Ltd." 15798 519 4 2003  .014588794  .010645 .000113306
1 "3M India Ltd." 15797 519 4 2003  .011566542  .013855 .000191964
1 "3M India Ltd." 15796 519 4 2003  -.02304087  .009605   .0000923
1 "3M India Ltd." 15795 518 3 2003  -.07255269 -.021219 .000450234
1 "3M India Ltd." 15792 518 3 2003   .03068691 -.001764   3.11e-06
1 "3M India Ltd." 15791 518 3 2003  .008472209 -.005611   .0000315
1 "3M India Ltd." 15790 518 3 2003 -.003758863 -.000948   8.98e-07
1 "3M India Ltd." 15789 518 3 2003 -.000158863 -.001227   1.51e-06
1 "3M India Ltd." 15788 518 3 2003 -.016294243 -.019509 .000380588
1 "3M India Ltd." 15786 518 3 2003  .005381303  .005288    .000028
1 "3M India Ltd." 15785 518 3 2003 -.013049488  .004289   .0000184
1 "3M India Ltd." 15784 518 3 2003  .023841137  .015823  .00025036
1 "3M India Ltd." 15783 518 3 2003 -.019766705  .006066   .0000368
1 "3M India Ltd." 15781 518 3 2003           . -.007097   .0000504
1 "3M India Ltd." 15777 518 3 2003           . -.001023   1.05e-06
1 "3M India Ltd." 15776 518 3 2003  -.00558522 -.009575   .0000917
1 "3M India Ltd." 15775 518 3 2003  -.01541847  .005431   .0000295
1 "3M India Ltd." 15774 518 3 2003  .007541014 -.009745    .000095
1 "3M India Ltd." 15771 518 3 2003  -.03000204 -.014516 .000210729
1 "3M India Ltd." 15770 518 3 2003   .02274947 -.011087 .000122929
1 "3M India Ltd." 15769 518 3 2003 -.013333968 -.005572   .0000311
1 "3M India Ltd." 15768 518 3 2003   .02991119 -.011461 .000131357
1 "3M India Ltd." 15767 518 3 2003  -.06965129  .000489   2.39e-07
1 "3M India Ltd." 15764 517 2 2003 -.004470147  .001888   3.56e-06
1 "3M India Ltd." 15763 517 2 2003   -.0729779  .008887    .000079
1 "3M India Ltd." 15762 517 2 2003 -.004800923 -.006453   .0000416
1 "3M India Ltd." 15761 517 2 2003  .002670975 -.011124 .000123735
1 "3M India Ltd." 15760 517 2 2003 -.010527073   .00276   7.62e-06
1 "3M India Ltd." 15757 517 2 2003  .012343297  .000805   6.48e-07
1 "3M India Ltd." 15756 517 2 2003 -.000156703 -.000028   7.77e-10
1 "3M India Ltd." 15755 517 2 2003  .006554706  .005188   .0000269
1 "3M India Ltd." 15754 517 2 2003  .023021424  .002122   4.50e-06
1 "3M India Ltd." 15753 517 2 2003  .003809609  .021327  .00045486
1 "3M India Ltd." 15750 517 2 2003 -.023386864 -.009117   .0000831
1 "3M India Ltd." 15748 517 2 2003   .00662512  -.00495   .0000245
1 "3M India Ltd." 15747 517 2 2003   .00925954 -.000978   9.56e-07
1 "3M India Ltd." 15746 517 2 2003  -.01114852 -.008573   .0000735
1 "3M India Ltd." 15743 517 2 2003  .006835646 -.006041   .0000365
1 "3M India Ltd." 15742 517 2 2003  -.00540445  .011588 .000134277
1 "3M India Ltd." 15741 517 2 2003 -.007051407 -.004897    .000024
1 "3M India Ltd." 15740 517 2 2003  .008163984  .000116   1.34e-08
end
format %td date
format %tm mdate

Likelihood ratio test showing error

Dear colleagues, can someone kindly assist me in identifying the problem with the likelihood-ratio test on these data? I am getting the error message "df(unrestricted) = df(restricted) = 3".
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int yoa double midyrpop str3 spn_serotype byte n str12 type double(time vaxera postslope) byte(_est_a _est_b)
1999              33620 "13" 0 "non-vaccine"  1 0 0 1 1
2000              34505 "13" 1 "non-vaccine"  2 0 0 1 1
2001              35411 "13" 1 "non-vaccine"  3 0 0 1 1
2002              36340 "13" 0 "non-vaccine"  4 0 0 1 1
2003              39747 "13" 0 "non-vaccine"  5 0 0 1 1
2004              41985 "13" 0 "non-vaccine"  6 0 0 1 1
2005              42738 "13" 0 "non-vaccine"  7 0 0 1 1
2006              43916 "13" 0 "non-vaccine"  8 0 0 1 1
2007              44536 "13" 0 "non-vaccine"  9 0 0 1 1
2008              44820 "13" 0 "non-vaccine" 10 0 0 1 1
2009              46343 "13" 0 "non-vaccine" 11 0 0 1 1
2010              47714 "13" 0 "non-vaccine" 12 0 0 1 1
2012  40730.05464480874 "13" 0 "non-vaccine" 13 1 1 1 1
2013 43214.202739726024 "13" 0 "non-vaccine" 14 1 2 1 1
2014              47807 "13" 0 "non-vaccine" 15 1 3 1 1
2015              47921 "13" 0 "non-vaccine" 16 1 4 1 1
2016  42048.96721311475 "13" 0 "non-vaccine" 17 1 5 1 1
2017  21958.98082191781 "13" 0 "non-vaccine" 18 1 6 1 1
2018              46210 "13" 0 "non-vaccine" 19 1 7 1 1
2019              45972 "13" 2 "non-vaccine" 20 1 8 1 1
end
My code is:
Code:
tsset time
glm n vaxera time postslope, link(log) family(nbinomial) vce(hac nwest 2) exp(midyrpop) eform
estimates store a
glm n vaxera time, link(log) family(nbinomial) vce(hac nwest 2) exp(midyrpop) eform
estimates store b
lrtest a b
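
A hedged diagnostic: that message usually means the two stored models ended up with the same number of estimated parameters, for instance because postslope was dropped from model a (e.g., as collinear); comparing the stored fits side by side makes this visible. Note also that lrtest is generally considered invalid after vce(hac) estimation:

Code:
estimates table a b, stats(N ll rank)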

DSGE model relating-questions

Christiano (2017) states in the abstract:
Macroeconomic policy questions involve trade-offs between competing forces in the economy. The problem is how to assess the strength of those forces for the particular policy question at hand. One strategy is to perform experiments on actual economies. Unfortunately, this strategy is not available to social scientists. The only place that we can do experiments is in dynamic stochastic general equilibrium (DSGE) models
I am quite new to this model, so I have some questions as below:

1> What does "competing forces in the economy" mean? I googled it but have found no results so far.

2> Why can we do experiments in dynamic stochastic general equilibrium (DSGE) models?

I did a search on DSGE models, but I did not fully and intuitively understand them from this description.

Combine two datasets and matching

Hello Everyone,

I have a problem with two different datasets in which names are not spelled exactly the same, which makes them hard to match. For example, the first dataset has the bank name "Chace bank"; the second has "Chase BK".

My question is: how can I match on inexact names? The match also has to be within the same country.
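
One widely used tool for this is the community-contributed matchit (ssc install matchit), which scores string similarity between a master and a using file. A hedged sketch with placeholder file and variable names; the same-country requirement can then be enforced on the scored candidate pairs:

Code:
* banks1.dta (id1, name1, country) and banks2.dta (id2, name2, country) are placeholders
use banks1, clear
matchit id1 name1 using banks2.dta, idusing(id2) txtusing(name2)
gsort -similscore      // best candidate pairs first
* next: merge each side's country back in and keep only same-country pairs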

Creating spline variables for regression

I am hoping someone can help me.
I have a time variable coded as months from Jan 2016 to Feb 2020.
I am hoping to fit some splines to my regression model to analyse trends over time.
For this I need two variables, one "before" and one "after", which reflect time before and after the implementation of a policy.
I need "before" to run as 0, 1, 2, 3, 4, etc. up to March 2019, and "after" to run 0, 1, 2, 3, 4, 5 from April 2019.
Does anyone know some quick code which can help with this?

Thank you
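
A hedged sketch, assuming the time variable is a monthly Stata date (%tm) called mdate, that "before" should stop increasing after March 2019, and that "after" should be 0 through April 2019 and count up thereafter:

Code:
* before: 0, 1, 2, ... up to March 2019, then held constant
gen before = min(mdate, tm(2019m3)) - tm(2016m1)
* after: 0 up to and including April 2019, then 1, 2, 3, ...
gen after = max(0, mdate - tm(2019m4))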

Fuzzy Regression Discontinuity estimates and plots do not match

I'm using the SSC packages rdrobust and rdplot to estimate and plot the fuzzy RD estimates of a regression of expenditure on population, used as the running variable. Basically, the estimate obtained through rdrobust is positive, while when I plot the apparently same regression using rdplot the jump is negative. I am matching the options as far as possible: the same kernel, the same bandwidth, etc. However, the results do not match, which is odd given that the two commands come from the same package.

Do you have any idea why this is the case? I am also willing to use different packages or commands if you know some that work better for my case.
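
Two hedged things to check: rdrobust stores its chosen bandwidths in e(h_l) and e(h_r), and rdplot will not reproduce them unless told; moreover, rdplot displays the reduced-form jump in the outcome, not the fuzzy (ratio) estimate, so the signs can differ when the first stage is negative. A sketch with placeholder variable names:

Code:
rdrobust expenditure population, fuzzy(treatment) p(1) kernel(triangular)
rdplot expenditure population, p(1) kernel(triangular) h(`e(h_l)' `e(h_r)')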

Test with panel data

Hi,

This is my first post in this forum, hoping to get some help. I am analyzing panel data from 2000-2020, which should be organised into 2 categories: 2000-2010 and 2011-2020. I want to see whether my data vary between these 2 categories. I have run regressions on the 2000-2010 and 2011-2020 data separately, but had to do that in two separate Stata sessions. Is there any other way I could do this, using the same session and the same dataset?

Thanks
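
A hedged sketch of doing both in one session and one dataset, with placeholder variable names y and x; the interaction version additionally tests whether the coefficient differs across the two periods:

Code:
xtset id year
xtreg y x if inrange(year, 2000, 2010), fe
estimates store early
xtreg y x if inrange(year, 2011, 2020), fe
estimates store late
estimates table early late

* or estimate once and test the difference directly
gen byte late2 = year >= 2011
xtreg y c.x##i.late2, fe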

Categorical variable as an instrument

Hello,

We are running a model with a binary endogenous X variable and a continuous Y variable. We potentially want to instrument our X variable with an unordered categorical variable Z. The model runs fine in Stata and the first-stage F-stats are well above 10, so the instrument fares well.

Is there any problem with using a categorical variable as an instrument? Since it is an unordered categorical variable, we are not sure how to interpret the first-stage coefficients of the impact of Z on X.

We would really appreciate any advice on this. Thank you.
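
For what it's worth, the standard way to use an unordered categorical instrument is to let its categories enter the first stage as dummies via factor-variable notation; each first-stage coefficient is then a shift relative to Z's base category. A hedged sketch with placeholder names (W1, W2 are exogenous controls):

Code:
ivregress 2sls Y W1 W2 (X = i.Z), first vce(robust)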

Cluster standard errors in SUR estimation - sureg or gsem?

Hi,

I am trying to estimate vote shares of different parties. So, I have 3 parties, each having its own column in the data set. Hence, the sum of vote shares is 1. So, I can estimate only two equations, because otherwise I will get a singular matrix.

So, I have a system of 3 equations and I am using --sureg-- to estimate it.

Code:
sureg (voteshare_party1 voteshare_party2 = X_variables i.fixedeffectvariable1), isure
However, I want to cluster the standard errors at the area level, but I understand from previous posts on this forum that this is not possible. If it is possible, please let me know how I should proceed.

So, from suggestions on the forum, I used --gsem-- command (cannot use --sem-- because I need to add fixed effects), and my code now is:

Code:
gsem (voteshare_party1 <- X_variables i.fixedeffectvariable1) (voteshare_party2 <- X_variables i.fixedeffectvariable1), covstruct(e.voteshare_party1 e.voteshare_party2, unstructured) nocapslatent vce(cluster Area_level_variable)
I had to use -nocapslatent- because I did not want to include any variables in the estimation apart from those mentioned in the regression.

However, this kept running and showing the same likelihood value flagged "(not concave)", so I had to use iterate(20). Does stopping the iterations early give correct results?

My questions:
  1. Can we cluster standard errors in --sureg--?
  2. Will --gsem-- be a more efficient way to estimate the equations in my case?
  3. Does stopping the iteration in-between give correct results?
I am relatively new to Stata, and so I would really appreciate your guidance on this.
I am happy to share more context if I haven't explained myself clearly.

Thank You
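
On question 1, one commonly suggested workaround (a sketch, not a guarantee) is a cluster (block) bootstrap wrapped around sureg, resampling whole areas rather than observations:

Code:
bootstrap, cluster(Area_level_variable) reps(500): ///
    sureg (voteshare_party1 voteshare_party2 = X_variables i.fixedeffectvariable1), isure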

Friday, July 30, 2021

ttesti: using se as input instead of sd

Hi everyone,
I am trying to calculate an independent-samples t-test using ttesti, which requires the mean, sd, and sample size. Given that I have the semean rather than the sd, I need to convert the semean to sd using the formula semean*(10^1.5) (i.e., semean*sqrt(1000)). My question is how I can specify this multiplication within the ttesti command. I tried the code below but got the error: "'31.41644 0.181502' found where number expected". Obviously, one inefficient solution would be to use the display command to obtain the result of the multiplication first, but I wonder whether there is any way to do the job within the ttesti command itself.
Thanks,
NM

Code:
ttesti 1000 31.41644 0.181502*(10^1.5) 1000 28.6431 0.3995219*(10^1.5)
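
ttesti is an immediate command and expects literal numbers, so expressions such as 0.181502*(10^1.5) are not evaluated in place. A minimal sketch, computing the sd's in locals first (note that 10^1.5 = sqrt(1000), i.e. sd = semean*sqrt(n)):

Code:
local sd1 = 0.181502*sqrt(1000)
local sd2 = 0.3995219*sqrt(1000)
ttesti 1000 31.41644 `sd1' 1000 28.6431 `sd2'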

do we have simpler syntax to achieve this goal?

Code:
if lnEmployee1Y ~=. & lnCash1Y~=. & lnTtlAsset1Y~=. & stdRevenue1Y~=. & stdNetProfit1Y~=. & lnRDExp1Y~=.
Thanks so much!
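
A hedged suggestion: missing() accepts several arguments and returns 1 if any of them is missing, so the whole chain collapses to a single call:

Code:
if !missing(lnEmployee1Y, lnCash1Y, lnTtlAsset1Y, stdRevenue1Y, stdNetProfit1Y, lnRDExp1Y)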

PPML and fixed effects commands

Good evening, guys. I'm currently working on a gravity model with the PPML estimator. Here is the issue: I'm using the book by Yotov et al. that has a formula for fixed effects. I'm only using exporter fixed effects and am getting the same results for all the variables (the sum equals 100). Can someone please tell me if this is correct?

the commands for PPML: ppml exp area dist population gdp gdpc mercosul landlock border

commands for fixed effects: egen exp_time = group(exp year) ; tabulate exp_time, generate(EX_FE)

this is part of the tabulate result (exp_time | Freq. Percent Cum.):

832 | 1 0.04 30.78
833 | 1 0.04 30.82
834 | 1 0.04 30.85
835 | 1 0.04 30.89
836 | 1 0.04 30.92
837 | 1 0.04 30.96
838 | 1 0.04 30.99
839 | 1 0.04 31.03
840 | 1 0.04 31.07
841 | 1 0.04 31.10
842 | 1 0.04 31.14
843 | 1 0.04 31.17
844 | 1 0.04 31.21
845 | 1 0.04 31.25
846 | 1 0.04 31.28
847 | 1 0.04 31.32

The sum equals 100. Shouldn't the fixed effects be different for each year? Thanks in advance!
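
For comparison, a hedged sketch of how exporter-time fixed effects usually enter the estimation itself, with placeholder names (exporter for the exporter id, trade for the flow); tabulate only lists the groups, it does not put them in the model:

Code:
egen exp_time = group(exporter year)
tabulate exp_time, generate(EX_FE)
* the dummies must appear in the estimation itself, e.g.
ppml trade dist population gdp gdpc mercosul landlock border EX_FE*
* or, more practically with many groups, absorb them via the
* community-contributed ppmlhdfe (ssc install ppmlhdfe)
ppmlhdfe trade dist population gdp gdpc mercosul landlock border, absorb(exp_time)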

reshape error

Hi, I am trying to reshape from wide to long, but Stata keeps saying that "variable id does not uniquely identify the observations". I am not sure why the identifier does not uniquely identify the observations, as I generated it using the code:

Code:
gen seq = _n
In full, this is what I coded and the error:
Code:
gen seq = _n
foreach v of varlist exalus-exvzus {
rename `v' u_`v'
}

reshape long u_, i(seq) j(country_pair_exchange) string
(note: j = exalus exauus exbeus exbzus excaus exchus exdnus execus exeuus exfnus exfrus exgeus exgrus exhkus exinus exirus exitus exjpus exkous exmaus exmxus exneus exnous exnzus expous exsdus exsfus exsius exslus exspus exszus extaus exthus exukus exusal exusec exuseu exusir exusnz exusuk exvzus)

variable id does not uniquely identify the observations
Your data are currently wide.  You are performing a reshape long. You
specified i(seq) and j(country_pair_exchange).  In the current wide form,
variable seq should uniquely identify the observations.  Remember this picture:

long                                wide

        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+

    Type reshape error for a list of the problem observations.

r(9);
Below are the data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double date float year str32 hcomnam double(ret sprtrn u_exusal u_exalus u_exbzus u_excaus u_exchus u_exdnus u_exhkus u_exinus u_exjpus)
14612 2000 "SELECT SECTOR SPDR TRUST" -.019221041351556778 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "CBIZ INC" -.014814814552664757 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "P G & E CORP" -.03658536449074745 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "E C I TELECOM LTD" -.003952569328248501 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "ISHARES INC" .01937984488904476 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "P P L CORP" -.030054645612835884 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "TARRAGON CORP NEV" -.012195121496915817 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "VICOR CORPORATION" -.027006173506379128 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "D R S TECHNOLOGIES INC" -.03896103799343109 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "PLAINS RESOURCES INC" -.05999999865889549 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "WEST COAST BANCORP ORE NEW" .004629629664123058 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "NATIONAL BANKSHARES INC" .012658228166401386 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "AMERICAN WATER WORKS INC" -.0117647061124444 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "CALIFORNIA WATER SERVICE GROUP" -.07422680407762527 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "BANK ONE CORP" -.02734375 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "AMWEST INSURANCE GROUP INC" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "IVOW INC" .17499999701976776 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "MODTECH HOLDINGS INC" .02083333395421505 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "IMPATH INC" .007371007464826107 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "MOORE WALLACE INC" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "PRIVATE MEDIA GROUP INC" .0347222238779068 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "HEARTPORT INC" .07894736528396606 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "STRATEGIC GLOBAL INCOME FUND INC" .03125 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "CENTER TRUST INC" -.05806451663374901 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "AVIGEN INC" -.0030241934582591057 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "JO ANN STORES INC" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "STONEX GROUP INC" -.024390242993831635 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "TECH DATA CORP" -.0276497695595026 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "P C C GROUP INC" .23076923191547394 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "PLEXUS CORP" -.017045455053448677 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "STATEWIDE FINANCIAL CORP" -.024752475321292877 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "MORGAN STANLEY TRUSTS" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "SOUTHDOWN INC" -.04116222634911537 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "R P M INTERNATIONAL INC" -.030674846842885017 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "MCRAE INDUSTRIES INC" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "RENAISSANCE LEARNING INC" .044692736119031906 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "POTTERS FINANCIAL CORP" .05263157933950424 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "KRATOS DEFENSE & SECUR SOLS INC" .03295128792524338 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "PACIFIC GATEWAY PPTYS INC MD" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "INTEGRITY MEDIA INC" .03999999910593033 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "SELECT SECTOR SPDR TRUST" -.012211669236421585 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "ABERDEEN ASIA PACIFIC INCOME FD" .02469135820865631 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "THERMO INSTRUMENT SYSTEMS INC" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "FIRST REGIONAL BANCORP" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "MORTON INDUSTRIAL GROUP INC" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "USURF AMERICA INC" .10294117778539658 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "ARCH CAPITAL GROUP LTD NEW" -.0891089141368866 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "CAPSULE COMMUNICATIONS INC" .02500000037252903 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "FEDDERS CORP" .022727273404598236 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "M C N ENERGY GROUP INC" -.021052632480859756 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "FIELDWORKS INCORPORATED" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "C T O REALTY GROWTH INC" -.019607843831181526 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "EMVELCO CORP" .0810810774564743 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "L V M H MOET HENNESSY VUITTON" -.012362637557089329 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "C B L & ASSOCIATES PPTYS INC" -.0060606058686971664 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "BANKUNITED FINANCIAL CORP" -.10236220806837082 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "NASHUA CORP" .01666666753590107 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "AMERITRANS CAPITAL CORP" -.05511811003088951 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "THORATEC CORP" -.06410256773233414 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "JOURNAL REGISTER CO" -.04048582911491394 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "NORTHERN TRUST CORP" -.04775943234562874 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "STOLT NIELSEN S A" .035087719559669495 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "CAROLINA SOUTHERN BK SPARTANBURG" .03365384787321091 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "WORLD FUEL SERVICES CORP" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "HOME STAKE OIL & GAS CO" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "CAPTEC NET LEASE REALTY INC" .03333333507180214 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "ACMAT CORP" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "BLACKROCK MUNIHLDGS NJ QLTY FD" .005434782709926367 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "CADILLAC FAIRVIEW CORP" .010869565419852734 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "ATRION CORP" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "T J INTERNATIONAL INC" -.0007440476329065859 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "SYNTHETECH INC" .05000000074505806 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "D I S C INC" -.03030303120613098 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "CONCERO INC" .0031347961630672216 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "TRIPOS INC" -.03804347664117813 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "MARKETING SPECIALISTS CORP" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "KOLLMORGEN CORP" -.025380710139870644 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "A D C TELECOMMUNICATIONS INC" -.000861326465383172 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "INTERLEAF INC" -.00929368007928133 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "EVANS & SUTHERLAND COMPUTER CORP" .08743169158697128 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "CINTAS CORP" -.016470588743686676 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "AZUREL LTD" .06060606241226196 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "COHERENT INC" -.02570093423128128 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "T ROWE PRICE GROUP INC" -.049069374799728394 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "E D P ENERGIAS DE PORTUGAL S A" -.00358422938734293 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "INSIGHT HEALTH SERVICES CORP" .1428571492433548 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "RELIANCE GROUP HOLDINGS INC" .06603773683309555 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "RICHARDSON ELECTRONICS LTD" -.03333333507180214 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "ENGLOBAL CORP" .0714285746216774 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "BANK GRANITE CORP" -.06104651093482971 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "EASTERN ENTERPRISES" -.007616974879056215 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "ZIEGLER COMPANY INC" -.012552301399409771 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "MERCHANTS NEW YORK BANCORP INC" -.007299270015209913 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "PENN AMERICA GROUP INC" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "PAIRGAIN TECHNOLOGIES INC" -.04845815151929855 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "BROOKDALE LIVING COMM INC" -.04040404036641121 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "D V I INC" -.02469135820865631 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "LIBERTY BANCORP INC NJ" 0 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "ERICSSON" .03615604341030121 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
14612 2000 "COMVERSE TECHNOLOGY INC" .06347150355577469 -.009549090000000001 .6591 1.5172204521316948 1.805 1.4465 8.2798 7.329 7.7765 43.55 101.7
end
format %td date
I typed reshape error for the list of problem observations and it says "16928310 of 37899830 observations have duplicate i values". However, even after dropping the duplicates, it is still not working.

I am trying to produce an output file at the firm-year-host-country-exchange-rate-coefficient level, and I was recommended to reshape the data from wide to long first. Am I writing the -reshape- command incorrectly?
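
Two hedged checks that may help localize the problem: confirm that seq is unique at the moment of the reshape (any append or interim step run after seq was generated can break that), and inspect the duplicates directly:

Code:
isid seq              // errors out if seq does not uniquely identify observations
duplicates report seq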

Exact matching on Propensity Score Matching

Dear Stata Forum,

I'm trying to perform propensity score matching with exact matching, but most examples I found do exact matching on years (a double), which is not my case.
I have a dataset of 2,345 children. I want to match exactly on a binary variable, gender, and then do propensity score matching on a series of confounders X based on Treatment.
Can anyone help me with the code?
Can anyone help me with the code?
Just a plain example; for PSM alone it would be:
Code:
logit treat X
predict ps

or

Code:
pscore treat X, pscore(ps)
psmatch2 treat, out(results) pscore(ps) common

but how to add the binary gender variable as the criterion for exact matching is really confusing me.


Also, how can I obtain the pstest results for females and males separately? Does it allow an "if" qualifier?

thank you all very much!
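
One common workaround (a sketch, not a definitive recipe) is to estimate a single propensity score but force matches within gender by calling psmatch2 separately for each gender with an if qualifier; pstest can then be run right after each call to get the balance results by sex:

Code:
* X1-X3 are placeholders for the confounders
logit treat X1 X2 X3
predict ps
psmatch2 treat if gender == 1, outcome(results) pscore(ps) common
pstest X1 X2 X3, both    // balance among, say, females
psmatch2 treat if gender == 0, outcome(results) pscore(ps) common
pstest X1 X2 X3, both    // balance among males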

Converting Stata version 16 to Stata version 12 with Stat/Transfer in Linux

Dear Forum Members,

I have some Stata version 16 data files that I would like to convert to Stata version 12 files. I know that I could use the saveold command in Stata 16 to do this, but our license has expired. I am trying to use Stat/Transfer on Linux instead, but I don't know the specific commands for converting Stata 16 files to Stata 12. Your advice is greatly appreciated. Thanks.

post and postfile help after regression

I am running a simple regression repeated for ~1,000 variables in a loop. I would like to capture the beta coefficients, SEs, and p-values for my exposure and 3 adjustment variables and output them to a csv file. I have used the following code for mixed models and cannot figure out how to alter it for this simple model. I get the following error:

equation xylose not found
post: above message corresponds to expression 2, variable ratio

I am using v16.1. Below are my sample data and code.

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int subjectid byte(timepoint sex age) float(aibwkg ratio xylose xylitol valine)
20001 3 1 39 74.58166 .09547353   -.7932054 -1.0131693   4.88012
20001 2 1 39 74.58166 .09547353   .57306993  -.8202816 4.3152866
20001 1 1 39 74.58166 .09547353  -.50978696 -1.1714463  4.004311
20002 3 1 35 74.30232 .06258648    1.187921 -1.0057454 4.0414968
20002 2 1 35 74.30232 .06258648  -.57098293  -.9478416   3.50136
20002 1 1 35 74.30232 .06258648  -.54130113  -.7070154  4.680255
20003 3 0 42 71.09388   .028846     -.87649  .57306993   5.20874
20003 2 0 42 71.09388   .028846      .29774     .27476   4.66839
20003 1 0 42 71.09388   .028846     -.39757    -.36489  4.004348
20004 3 0 36  59.1862 .04826844   -.4515804 -.06558012  5.631982
20004 2 0 36  59.1862 .04826844  -.50320864  -.7149013  4.967944
20004 1 0 36  59.1862 .04826844    -.588939   -.788257  5.110322
20005 3 1 51 58.46099 .04752519   -.9756348 -1.1163021  5.163349
20005 2 1 51 58.46099 .04752519   -.6698709  -.8362016  4.571539
20005 1 1 51 58.46099 .04752519   -.8011949 -1.2734073  5.349926
20006 3 1 19 76.56038 .05422749   -.1484685 -1.1551343  3.649079
20006 2 1 19 76.56038 .05422749   -.7596757  -1.080953  5.256241
20006 1 1 19 76.56038 .05422749 -.002501656 -1.0889778  3.596983
end
Code:
capture postutil clear
tempfile metaboliteresults
postfile handle str32 Metabolite float ratio se_ratio ///
    age se_age sex se_sex aibwkg se_aibwkg using `metaboliteresults'

foreach var of varlist xylose - valine {

    reg `var' ratio age sex aibwkg if timepoint==1
    
    local topost ("`var'")
    foreach x in ratio age sex aibwkg  {
        local topost `topost' (_b[`var':`x']) (_se[`var':`x'])
    }
    
    // POST IT
    post handle `topost'
}

postclose handle

use `metaboliteresults', clear

foreach v of varlist ratio age sex aibwkg {
   gen t_`v' = `v'/se_`v'
   gen p_`v' = 2*normal(-abs(t_`v'))
   order t_`v' p_`v', after(se_`v')
}
export delimited using my_metaboliteresults, replace
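
The error message hints at the likely culprit: mixed results carry equation names (e.g., xylose:ratio), but after regress there is a single unnamed equation, so the `var': prefix has nothing to refer to. A hedged sketch of the inner loop without the equation prefix:

Code:
foreach x in ratio age sex aibwkg {
    local topost `topost' (_b[`x']) (_se[`x'])
}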

Question about "spxtregress, re sarpanel" and serial correlation

I recently saw a tweet from Jeffrey Wooldridge that made me question my understanding of what spxtregress, re sarpanel models are capable of. Wooldridge essentially said that Stata's in-house spatial model package is incapable of properly fitting a regression model to data that are plagued by serial correlation.

Does the sarpanel estimator for Stata's spxtregress command not account for time-wise autocorrelation at all?

If not, is there anything we can use instead of the spxtregress command or in conjunction with the spxtregress command?

My understanding was that this particular specification of Stata's in-house spatial panel data models was capable of handling serial correlation within panels because the Stata documentation states the "sarpanel estimator was originally developed by Kapoor, Kelejian, and Prucha (2007)." Also, Kapoor, Kelejian, and Prucha stated in that cited paper that the assumptions underlying their estimator “imply that the model disturbances are…both spatially and time-wise autocorrelated.”

I'm aware of xsmle but is there anything I can use that has more flexibility than xsmle or can be used in conjunction with spxtregress?

I'm fairly invested, given how much time I've spent diving into how spxtregress works, and would like to still be able to use it if possible. I don't really have time to learn a whole new suite of commands, and I need to get this study done.

Dropping unnecessary variables from BRFSS data to save space

Is there any efficient way to keep the variables I need and drop all the unnecessary ones? In BRFSS, I need to clean year by year, and the files have hardly any consistency across years. Any suggestion for removing the unnecessary variables would help me save space.

Thanks in advance!
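
One option worth knowing: use can read just a subset of variables straight from disk, so the unneeded ones are never loaded at all. A sketch with placeholder variable and file names:

Code:
* read only the needed variables from one year's (hypothetically named) file
use var1 var2 var3 using "brfss2019.dta", clear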

fe and first-difference estimator with controls and weight in baseline year

I have been learning from Statalist for some years now. Thank you for being there for us! This is the first time I ask a question, as I normally find solutions by reading other posts.

I have a wide dataset (40 observations) and an unbalanced panel (40 countries x 17 years).

My dependent variable (EMP1) is an employment share multiplied by 100
My main regressor (tech) is tech imports per million USD of total intermediates

Using my WIDE DATASET, I estimated long differences using a set of controls (C1...C6) at year t-1, as all of them can be outcomes of the main regressor.
So, I run regressions of the form:

Code:
 reg EMP1 tech(t1,t+15) C11995 C21995 C31995 C41995 [aweight=W1995], r
Then, I run regressions of the form

Code:
 ivreg EMP1 C11995 C21995 C31995 C41995 (tech=Z) [aweight=W1995], r
where EMP1 = EMP(t+15) - EMP(t), tech = the sum of tech from t up to t+15, and the weight is taken at its year t-1 value (W1995).

I am struggling to run first-difference and fixed-effects regressions with the same variables in the panel dataset. I have data for my DV and controls from 1995 to 2011, and for my main regressor from 1996 to 2011. I cannot manage to tell Stata to use the set of controls and the weight at their values in the baseline year, 1995.

My panel dataset has the following variables, with observations for 40 countries and 17 years

Code:
              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
country         str30   %30s                  Country name
year            int     %10.0g                
tech            float   %9.0g                 tech imports per million USD of total intermediates
AE              float   %9.0g                 Advanced Economy
EE              float   %9.0g                 Emerging Economy
C1              float   %9.0g                 offshoring share imported/total intermediates
C2              double  %10.0g                Exported intermediates in thousand USD
C3              float   %9.0g                 Output per capita in millions 2011 USD
C4              float   %9.0g                 Mean of population with full college
C5              float   %9.0g                 population 55-79 over population 20-54
C6              double  %10.0g                dependency ratio
W               double  %10.0g                Manufacturing value added
EMP1            float   %9.0g                 100 * shareKI/sharenFAB workers
EMP2            float   %9.0g                 100 * shareKI workers
EMP3            float   %9.0g                 100 * sharenFAB workers
EMP4            float   %9.0g                 share MGT workers
EMP5            float   %9.0g                 share RD workers
EMP6            float   %9.0g                 100 * shareKI/sharebFAB workers
EMP7            float   %9.0g                 100 * sharebFAB workers
country_id      float   %14.0g     country_id
                                              group(country)
I want to compare the estimates from using fixed-effects and first-difference estimators.
Additionally, I want to see the changing effect of tech on EMP1, for which I want to run first-difference regressions using 5-year intervals.

I have looked at xtreg and areg (which was suggested to me), but I do not know how to tell Stata to use my controls (C1...C6) and the weight W at their 1995 values.
I would be grateful for any help, as I am stuck here.
David
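
Not from the thread, but one way the baseline-year variables might be built is sketched below, using the posted variable names. Note that time-invariant 1995 controls would be absorbed by the country fixed effects in xtreg, fe unless interacted with time, which is why the sketch shows the first-difference variant:

Code:
xtset country_id year

* carry each control's 1995 value (and the weight's) across all years
foreach v of varlist C1 C2 C3 C4 W {
    gen double `v'_b = `v' if year == 1995
    bysort country_id: egen double `v'1995 = max(`v'_b)
    drop `v'_b
}

* first differences with baseline controls and baseline weights
reg D.EMP1 D.tech c.C11995 c.C21995 c.C31995 c.C41995 i.year ///
    [aweight = W1995], vce(cluster country_id)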

How to force vertical lines to be above of all horizontal lines using twoway

Dear all,

How do I force the gray area to be above the yellow areas in the graph below?

Code:
clear
set obs 100
gene x = runiform()
gene y = runiform()
twoway (scatter x y), xline(0.5(0.001)0.6, lcolor(black*0.25))  yline(1(0.001)2 0.5(0.001)0.6, lcolor(gold*0.40))
I have been searching for similar questions, but the closest one was this one:

https://stackoverflow.com/questions/...ots-in-a-graph

Any suggestions? Perhaps more efficient ways to create the shaded area?

All the best,

Tiago
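
One possible workaround (a sketch, simplified to two bands inside the plot region): build the bands as rarea plots instead of added lines, since twoway draws plots in the order listed, with later plots on top:

Code:
clear
set obs 101
gen x = (_n - 1)/100
gen y = runiform()
gen ylo = 0.5                      // horizontal (yellow) band limits
gen yhi = 0.6
gen y0 = 0                         // vertical (gray) band spans the y range
gen y1 = 1
twoway (rarea ylo yhi x, color(gold*0.40)) ///
       (rarea y0 y1 x if inrange(x, 0.5, 0.6), color(black*0.25)) ///
       (scatter y x), legend(off)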

Drop if in range

Hello,

how can I drop a group if there is a missing value in any variable within the group? In the example below, my groups are identified by "id". I would like to delete all observations with id 1, because there is one missing value for this group. A further condition is that the group should be excluded only if the missing value for Price occurs while var5 is within the range (x, y).

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str10 date float(Price dividend id var5)
"01/01/2006" 5 . 1 -3
"02/01/2006" 3 . 1 -2
"03/01/2006" . 3 1 -1
"04/01/2006" 5 . 1  0
"05/01/2006" 2 . 1  1
"01/01/2006" 5 . 2 -2
"02/01/2006" 4 . 2 -1
"03/01/2006" 2 . 2  0
"04/01/2006" 1 2 2  1
"05/01/2006" 4 . 2  2
end
Thank you!
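
A minimal sketch using the posted example, taking the range to be var5 in [-1, 1] (substitute your own x and y):

Code:
gen byte bad = missing(Price) & inrange(var5, -1, 1)
bysort id: egen byte dropgroup = max(bad)
drop if dropgroup
drop bad dropgroup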

Marginal effects at representative values

Dear all,

I am working with Stata's margins command and have some difficulty interpreting the meaning of "marginal effects at representative values".
If I understand correctly, I fix one covariate at certain values while the others remain as observed.
Stata then returns the marginal effect of a chosen variable. Is this an average marginal effect, and how should I interpret it? That is, is it the difference in the probability of y, averaged over all observations, given the fixed representative values of one variable with the others as observed?

Thanks a lot for your help

Daniel
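
A generic sketch of the setup in question (hypothetical names): here, as far as I understand it, the reported effect of x1 at each at() value is averaged over the observed values of all other covariates:

Code:
logit y x1 x2 x3
margins, dydx(x1) at(x2 = (0 1 2))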

Generating sequence of observations.

Hi, I have the "have" variable and based on that "have" variable, I want to generate the two variables, i.e., "want1" & "want2".

Please see the example below.


Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double(id have want1 want2)
1 0 5 0
1 0 4 0
1 0 3 0
1 0 2 0
1 1 1 1
1 0 0 2
1 0 0 3
1 0 0 4
1 0 0 5
1 0 0 6
2 0 3 0
2 0 2 0
2 1 1 1
2 0 0 2
2 0 0 3
2 0 0 4
2 0 0 5
2 0 0 6
end
the two "want" variables are basically sequence of numbers that are based on value of "have" variable.

Thanks.
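
A minimal sketch with the posted example: locate the row where have == 1 within each id, then count down to it and up from it:

Code:
bysort id: gen long seq = _n
bysort id: egen long pos = max(cond(have == 1, seq, .))
gen long want1 = cond(seq <= pos, pos - seq + 1, 0)
gen long want2 = cond(seq >= pos, seq - pos + 1, 0)
drop seq pos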

NA's in sureg()

Hi,

How are missing values (NAs) in the dependent variables handled by sureg?
I want to estimate a system of 4 equations using sureg, where all the Y-variables have some missing values, and they sum to 1 approximately.

Thank you so much in advance!

r(198) invalid varname

Hello, I do not know why Stata tells me "r(198) invalid varname." I have checked it several times. I hope to receive your assistance soon. Thank you so much to all.

Code:
foreach var of varlist lnemployment_total{ 


egen lnemployment_total_mean = mean(lnemployment_total )
egen lnemployment_total_sd = sd(lnemployment_total )
bysort ifscode (year): gen lnemployment_total_norm = (lnemployment_total - lnemployment_total_mean)/lnemployment_total_sd
bysort ifscode (year): gen lnemployment_total_low = exp(-3.5*lnemployment_total_norm)/(1+exp(-3.5*lnemployment_total_norm))
bysort ifscode (year): gen lnemployment_total_high = 1 - lnemployment_total_low

*to allow exact comparison with baseline sample - similar results also without replacing
replace lnemployment_total_high=0.5 if lnemployment_total_high==.
replace lnemployment_total_low=0.5 if lnemployment_total_low==.

bysort ifscode (year): gen lnemployment_total_h_dum_recession_sm=lnemployment_total_high*dum_recession
bysort ifscode (year): gen lnemployment_total_l_dum_recession_sm=lnemployment_total_low*dum_recession

bysort ifscode (year): gen ferlmemp_lp_h_dum_recession_sm=lnemployment_total_high*lnfertility
bysort ifscode (year): gen ferlmemp_lp_l_dum_recession_sm=lnemployment_total_low*lnfertility
}
Code:
lnemployment_total_h_dum_recession_sm invalid varname
r(198);
thank you so much.
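
One plausible cause (my assumption, not confirmed in the thread): Stata variable names are limited to 32 characters, and lnemployment_total_h_dum_recession_sm is 37 characters long. Shorter generated names should clear the r(198), for example:

Code:
* shorter names (my own choice) stay under the 32-character limit
bysort ifscode (year): gen lnemp_h_dum_recession_sm = lnemployment_total_high*dum_recession
bysort ifscode (year): gen lnemp_l_dum_recession_sm = lnemployment_total_low*dum_recession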

How to generate the column before a column

Sorry if this has been asked before.
I have a dataset containing var1 and var2.
I want to generate a new variable (var3) that matches the column immediately before a specific column (var2 in the following case), so that var3 = var1:
var1 var2 var3
123 AB 123
124 BB 124
Thanks in advance
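
A sketch of one way to do this programmatically: look up which variable precedes var2 in the dataset's column order, then clone it (clonevar copies the storage type and labels too):

Code:
ds
local allvars `r(varlist)'
local pos : list posof "var2" in allvars
local prev : word `=`pos' - 1' of `allvars'
clonevar var3 = `prev'
order var3, after(var2)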

Forloop: too many variables specified r(103) error

Hello,

I've been around this community for a year, but this is my first post. I am running a foreach loop, and it gives me this error: too many variables specified r(103);.

Here is my code:

Code:
levelsof uscity if us == 1, local(uscitylist)
foreach city of local uscitylist {
    g `city'_NS = .
    replace `city'_NS = .
}
There are 9 items in local uscitylist. The code works for the first 5 items, and then gives the error.

How can I make the forloop work?


Thank you
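
A plausible fix, assuming the problem is that some city names contain spaces or punctuation (which are not legal in variable names, so g `city'_NS is parsed as several variables): strtoname() converts each value into a legal name:

Code:
levelsof uscity if us == 1, local(uscitylist)
foreach city of local uscitylist {
    local newvar = strtoname(`"`city'_NS"')
    gen `newvar' = .
}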


Repeatedly obtain values after a certain condition holds and ignore observations inbetween

Hi all! I would really appreciate your help with the following:

I have obtained buy signals from stock close prices at 5-minute intervals. If a buy signal is triggered, I want to generate a sell signal 4 periods later and ignore the signals in between (even if at t=4 the raw signal is "BUY", it needs to be "SELL"). If the buy condition still holds directly after the sell signal, there should be a buy signal again, then a sell signal 4 periods later, and so on. If there is no buy signal directly after the sell (e.g. for a few periods), I want to wait for the next buy signal and place the sell 4 periods after that, and so on. I have tried using the generate/replace commands, but that sadly does not work all the time (still some errors). Is there another way to do this? In addition, if I wanted to require the buy condition to hold for 3 periods before the buy signal is triggered, how would that change the code?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double CLOSE str3 buy_signal
4346.03 ""
4344.01 ""
4360 "BUY"
4377.84 "BUY"
4377.59 "BUY"
4340.01 ""
4345.31 "BUY"
4354.9 "BUY"
4340 ""
4340 ""
4340 ""
4345.95 "BUY"
4337 ""
4320.07 ""
4344.95 "BUY"
4320.03 ""
4320.01 ""
4320 ""
4318.51 ""
4325 "BUY"
end

Below is an example of how I suppose it would look with the correct signals (after "//"):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double CLOSE str3 buy_signal
4346.03 ""
4344.01 ""
4360 "BUY" // "BUY"
4377.84 "BUY"
4377.59 "BUY"
4340.01 ""
4345.31 "BUY" // "SELL"
4354.9 "BUY" // "BUY"
4340 ""
4340 ""
4340 ""
4345.95 "BUY" // "SELL"
4337 ""
4320.07 ""
4344.95 "BUY" // "BUY"
4320.03 ""
4320.01 ""
4320 ""
4318.51 "" // "SELL"
4325 "BUY" // "BUY"
end

Thank you in advance!
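
A single pass over the rows can implement the rule as I read it (a sketch that reproduces the expected output above; the 4-period count comes from the post):

Code:
gen str4 signal = ""
local open 0                        // row where the current trade opened; 0 = none
forvalues i = 1/`=_N' {
    if `open' == 0 & buy_signal[`i'] == "BUY" {
        quietly replace signal = "BUY" in `i'
        local open `i'
    }
    else if `open' > 0 & `i' == `open' + 4 {
        quietly replace signal = "SELL" in `i'
        local open 0                // free to re-enter at the next BUY row
    }
}

For the variant where the buy condition must hold for 3 periods first, the entry test would instead require buy_signal[`i'], buy_signal[`i'-1], and buy_signal[`i'-2] to all equal "BUY".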


What is the required sample size to use Panel VAR in Stata?


Dear researchers, I would like to ask this question: I have a yearly dataset covering 35 countries with 3 variables. The idea is to find a tripartite relationship between the three variables, e.g. whether A=B=C. Similarly, I want to test whether or not A Granger-causes B and C, in that order. I want to know whether my sample size (N=35, T=14) is too small to use panel VAR. I have read pieces of the literature, and it seems my sample may be too small, so I have been thinking about the options to consider. Please, can you tell me the requirements on N and T for using panel VAR with yearly data? Thank you.

Thursday, July 29, 2021

Calibration plot for logistic regression models

I want to produce a calibration plot for several logistic regression models (like the one attached), but I have not found how to do it with Stata.
Thanks in advance
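
One possible approach (a sketch with hypothetical names; I believe there is also a user-written pmcalplot package on SSC worth checking): bin the predicted risk into deciles and plot observed against predicted proportions, with the 45-degree line as reference:

Code:
logit outcome x1 x2 x3
predict phat, pr
xtile grp = phat, nq(10)
preserve
collapse (mean) observed = outcome predicted = phat, by(grp)
twoway (scatter observed predicted) (function y = x, range(0 1)), ///
    xtitle("Predicted probability") ytitle("Observed proportion") legend(off)
restore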

Problem with Merging

Hi,

I am trying to merge two datasets using the key ID (Neighborhood_Code).
However, I'm getting the error:
" Each key variable -- the variables on which observations are matched -- must be of the same generic type in the master and using datasets. Same generic
type means both numeric or both string."

The key ID is a combination of numbers like '110111110901'.

In the first dataset, the ID format is %18.0g, and in the second it is %12s.
I want both to be stored as strings before merging.
How should I do that?
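
A sketch (the filenames and the 1:1 match are assumptions): convert the numeric key to string with an explicit format so no digits are lost. Note that a 12-digit ID needs to have been stored as a long or double, since a float cannot hold it exactly:

Code:
use dataset1, clear
tostring Neighborhood_Code, replace format(%12.0f)
merge 1:1 Neighborhood_Code using dataset2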

Obtain very different results using -xtreg- and -margins- / can I store the results from -margins- in a table

When I run the interaction between a dummy (0/1) variable (at3) and a categorical variable (vtype), the results from -xtreg- are very different to those from -margins-. I would like to know:

1) Why are the results from -xtreg- and -margins- so different?

2) is there a way to combine the results from -margins- to create a table using -esttab- or some other command?

The following code only creates a table from estimates obtained from -xtreg-:
Code:
xtreg c.finr i.at3##i.vtype
margins i.faith2##i.at3
estimates store s1

xtreg c.nonfinr i.at3##i.vtype
margins i.faith2##i.at3
estimates store s2

xtreg c.assetr i.at3##i.vtype
margins i.faith2##i.at3
estimates store s3

xtreg c.debt i.at3##i.vtype
margins i.faith2##i.at3
estimates store s4

xtreg c.wealth2r i.at3##i.vtype
margins i.faith2##i.at3
estimates store s5

esttab s1 s2 s3 s4 s5 using interactions.rtf, b(%10.0fc) star(* 0.10 ** 0.05 *** 0.01) varwidth(35) not nogaps label stats(N r2) /// 
compress replace nonumbers title("Interaction effects" ) mtitle("""")
I'm unable to provide an example at the moment but can do later if needed.

Stata v.15.1. Using panel data.
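
On question 2, a sketch of the usual route, using the model's own interaction terms (note that the posted margins calls refer to faith2, which does not appear in the models shown, and that mismatch may itself explain part of the discrepancy): margins, post replaces e(b) and e(V) with the margins results, so estimates store and esttab can tabulate them:

Code:
xtreg c.finr i.at3##i.vtype
margins at3#vtype, post
estimates store s1
esttab s1 using margins_table.rtf, replace label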

How to group several columns?

I am dealing with data containing information on different candidates; the results for each candidate are broken down by class 1, 2, 3, 4:
Candidate A Name | Sex | Votes |Candidate B Name | Sex | Votes | Candidate C Name | Sex | Votes | Candidate D Name | Sex | Votes
Class 1
Class 2
Class 3
Class 4
Firstly, I wish to group the columns of the information of a candidate, such that:
Group A [contains Candidate A Name | Sex | Votes of Candidate A]
Group B [contains Candidate B Name | Sex | Votes of Candidate B]
Group C [contains Candidate C Name | Sex | Votes of Candidate C]

After that, I wish to compare the number of votes of each candidate and generate the name of the candidate with the highest result into a new column.

For example, if Candidate D obtains the highest number of votes, the name of Candidate D will be generated in a new column.

Thank you in advance

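
A hypothetical sketch, assuming one row per class and variables named nameA, sexA, votesA through nameD, sexD, votesD: reshape so each candidate is a row, then tag the winner within each class (ties, if any, are broken arbitrarily here):

Code:
gen long class = _n
reshape long name sex votes, i(class) j(cand) string
bysort class (votes): gen winner = name[_N]
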
Breusch and Pagan LM test for random effects

Dear community,

I am deciding among pooled OLS, fixed effects, and random effects panel models in the presence of first-order autocorrelation (the null hypothesis of no first-order autocorrelation in the Wooldridge test for panel data is rejected). I have found that the command xtregar gives reliable estimates in the presence of AR(1). So I fit the FE and RE models (e.g. xtregar, re) and then run the default Hausman test, hausman fixed random. The null hypothesis is rejected, meaning FE is to be chosen.

1) Am I doing everything correctly so far?

Now I also want to rule out pooled OLS by running the Breusch and Pagan Lagrangian multiplier test for random effects. Usually this is done with the command xttest0, but it returns the error message last estimates not found r(301).

2) Which command/procedure would be appropriate for ruling out pooled OLS?
I should mention that there is no heteroskedasticity problem; only autocorrelation is at issue.

Thank you a lot for support!

Kind Regards
Farid
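
For what it's worth, a sketch of why the r(301) may appear (my understanding: xttest0 reads the last estimates and expects them to come from xtreg, re, which xtregar does not leave behind; variable names are hypothetical):

Code:
xtreg y x1 x2, re
xttest0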

Puzzling Post Hoc Power Calculation

I was trying to show a colleague that post hoc power calculations are nonsensical with the following example:

Code:
. /* Congruent Results */
.
. power twoproportions 0.146, n1(254426) n2(255237) alpha(.05) power(0.8) test(chi2) effect(ratio)

Performing iteration ...

Estimated experimental-group proportion for a two-sample proportions test
Pearson's chi-squared test
H0: p2 = p1  versus  Ha: p2 != p1; p2 > p1

Study parameters:

        alpha =    0.0500
        power =    0.8000
            N =   509,663
           N1 =   254,426
           N2 =   255,237
        N2/N1 =    1.0032
           p1 =    0.1460

Estimated effect size and experimental-group proportion:

        delta =    1.0191  (ratio)
           p2 =    0.1488

. display "MDE (in %)) " 100*(r(p2) - r(p1))/r(p1)
MDE (in %)) 1.9056807

.
. prtesti 254426 0.146 255237 .1488, level(95) // use MDE from above

Two-sample test of proportions                     x: Number of obs =   254426
                                                   y: Number of obs =   255237
------------------------------------------------------------------------------
             |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |       .146      .0007                      .1446279    .1473721
           y |      .1488   .0007044                      .1474193    .1501807
-------------+----------------------------------------------------------------
        diff |     -.0028   .0009931                     -.0047465   -.0008535
             |  under H0:   .0009931    -2.82   0.005
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =  -2.8193
    H0: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0024         Pr(|Z| > |z|) = 0.0048          Pr(Z > z) = 0.9976

. display "Estimated ratio = "  100*(r(P2) - r(P1))/r(P1)
Estimated ratio = 1.9178082
This worked exactly as I anticipated.

But then he gave me a counterexample, where a statistically significant difference is smaller than the MDE:

Code:
. /* Incongruent Results From Experiments */

. prtesti 254426 0.146 255237 .148, level(95)

Two-sample test of proportions                     x: Number of obs =   254426
                                                   y: Number of obs =   255237
------------------------------------------------------------------------------
             |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |       .146      .0007                      .1446279    .1473721
           y |       .148   .0007029                      .1466224    .1493776
-------------+----------------------------------------------------------------
        diff |      -.002    .000992                     -.0039443   -.0000557
             |  under H0:    .000992    -2.02   0.044
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =  -2.0161
    H0: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0219         Pr(|Z| > |z|) = 0.0438          Pr(Z > z) = 0.9781

. display "Estimated ratio = "  100*(r(P2) - r(P1))/r(P1) // SS diff < MDE
Estimated ratio = 1.369863
I am having a hard time wrapping my brain around this. Why can I reject the null when the observed difference is smaller than the MDE from the power calculation?

Applying the Mundlak approach in 3-level hierarchical model. Which is the correct mean of a level-2 variable?

Hello!

We have a question about implementing the ‘Mundlak’ approach in a multilevel (3-levels) nested hierarchical model. We have employees (level 1), nested within year-cohorts (level 2), nested within firms (level 3).

In terms of data structure, the dependent variable is employee satisfaction (an ordinal measure) at employee level (i) over time (t) and across firms (j) (let's call this Y_itj), noting that we have repeated cross-sections with different individuals observed in every period. As regressors, we are mainly interested in the impact of a firm-level, time-variant but employee-invariant variable (let's call it X_tj). We apply a 3-level ordered probit model (meoprobit in Stata).

We are concerned with endogeneity issues of the X_tj variable, which we hope to (at least partially) resolve by using some form of a Mundlak approach, by including firm specific averages for X_tj, as well as for all other time-varying explanatory variables. The idea is that if the firm-specific averages are added as additional control variables, then the coefficients of the original variables represent the ‘within effect’, i.e. how changing X_tj affects Y_itj (employee satisfaction).

However, we are not sure whether approach 1 or 2 below is more appropriate, because X_tj is a level 2 (firm level) variable.
  1. The firm-specific averages of X_tj (as well as of the other explanatory variables measured at level 2) are calculated by averaging over individuals, even though the variable itself is a level-2 variable (it varies only over time for each firm). That is, in Stata: bysort firm_id: egen mean_X = mean(X). As our data set is unbalanced (the number of observations per firm varies over time), these means are driven by the time periods with more observations. For example, in a 2-period model, if a company has many employee reviews in t=1 but very few in t=2, the observations in t=1 will dominate this mean.
  2. Alternatively, as X_tj is a level-2 variable, the firm-specific averages are calculated by averaging over time periods. That is, we first create a tag that is 1 only for the first observation in the sample per firm/year, and then do: bysort firm_id: egen mean_X = mean(X) if tag==1. This gives equal weight to each time period, irrespective of how many employee-level observations we have in that period. For example, although a company has many employee reviews in t=1 and very few in t=2, the firm-specific mean will treat the two periods as equally important.
The two means are different, and we are unsure which approach is the correct one (and which mean produces the ‘true’ contextual effect of X_tj on Y_itj). We have been unable to locate in the literature a detailed treatment of the issue for 3-level models (as opposed to 2-level models where the situation is straightforward). Any advice/suggestions on the above would be very much appreciated.
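
For concreteness, the two candidate means might be computed as below (a sketch using the post's own names; the cond() version of (2) fills the firm-year mean on every row, whereas the if tag==1 version leaves untagged rows missing):

Code:
* (1) mean over employee-level rows: periods with more reviews weigh more
bysort firm_id: egen mean_X_emp = mean(X)

* (2) mean over firm-years: each period weighs equally
egen byte tag = tag(firm_id year)
bysort firm_id: egen mean_X_year = mean(cond(tag, X, .))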

Error exporting dataset to excel using foreach loop

Hello,
I am trying to export a dataset to Excel using a foreach loop in Stata 13.1, but Stata throws the error observations must be between 1 and 1048576
r(198);

I have changed the Excel extension from xls to xlsx, but I still get the above error.
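
A sketch of two workarounds, assuming the dataset simply has more rows than an .xlsx sheet can hold (the limit is 1,048,576 rows per sheet; filenames are hypothetical): export to CSV, which has no such cap, or split the export across sheets:

Code:
* CSV has no sheet-size limit
export delimited using mydata.csv, replace

* or split across sheets
export excel using mydata.xlsx in 1/1000000, firstrow(variables) sheet("part1") replace
export excel using mydata.xlsx in 1000001/l, firstrow(variables) sheet("part2") sheetmodify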

combining two graphs

Hello,

Is there a way to combine the scatter plot with the fitted line from the regression with an indicator variable?

I am trying to combine something like this:

Code:
twoway scatter var1 var2 || lfit var1 i.var2
where var2 is an indicator variable taking values 1-5. Thank you.
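
A sketch of one workaround (my understanding is that lfit does not accept factor variables): fit the regression first, predict, and overlay the fitted values on the scatter:

Code:
regress var1 i.var2
predict fitted, xb
twoway (scatter var1 var2) (line fitted var2, sort)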

sureg with fixed effects and clustered errors

I want to estimate a system of 4 equations, where the Y-Variables sum to one approximately, and so the error terms are correlated. The data set contains ~180 variables and ~1600 observations. I tried running sureg with i.variable in the specification to add in fixed effects. So there are three issues that I am facing:

1. The coefficients from the systemfit package in R and from Stata are different. (They were the same when I ran felm() in R and reghdfe in Stata.) So I am not able to tell which one is consistent.
2. I want to add one more factor variable, which is correlated with another X-variable.
3. I want to add clustered standard errors to sureg.

Could you please help me with these issues, or point me to some research papers which used SUR with fixed effects?

I am relatively new to data work, so any help on this front would be very valuable to me.

I am happy to discuss the data problem in more detail in personal chat.

Thank You
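
On issue 3, a sketch of one possible route (the equation specifications and the cluster variable are hypothetical; as far as I know, sureg has no vce(cluster) option, so a cluster bootstrap is wrapped around it instead):

Code:
bootstrap, cluster(firm_id) reps(400) seed(12345): ///
    sureg (y1 x1 x2 i.industry) (y2 x1 x2 i.industry)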

Triple Difference and Kernel-based PSM

Dear all,

I run difference-in-differences regressions with a kernel-based PSM adjustment using the command diff.
I would like to check for a heterogeneous treatment effect with a triple diff. This, however, does not work with kernel PSM in the diff command.
I have panel data from 2 waves and I use clusters.
Any suggestions for another command?

Thanks
Doro

Import value labels for each variable in a loop in frequency tables generated using putdocx

Hello,

I have been browsing manuals and forums but I am stuck on this issue and hoping to get some advice. I managed to get frequency tables for 60 categorical variables into an MS Word document using the code below in Stata 16.1. I end up with a Word document with 60 tables, each with three columns (Categories, Frequency, Percent) and a number of rows equal to the levels of the categorical variable plus one for the totals. However, I cannot figure out how to import the value labels for each variable rather than the numeric codes:

Code:
putdocx clear
            putdocx begin, pagesize(A4) font("Times New Roman", 11, black)
            putdocx paragraph, font("", 11) halign (right) 
            putdocx text (c(current_date)), bold    

            foreach varname of varlist {
                sum `varname'
                scalar num_`varname' = r(N)
                scalar mean_`varname' = r(mean)
                tab `varname', matcell(cell) matrow(row)
                matrix percent = (cell/r(N))*100
                matrix tab_`varname' = (row, cell, percent)
                matrix total = (.,r(N),100)
                matrix tab_`varname' = tab_`varname'\total
                local varlabel : variable label `varname'
                matrix colnames tab_`varname' = Categories Frequency Percent
                putdocx paragraph, font ("Times New Roman", 12, black) halign (left)
                putdocx text ("`varlabel'"), bold
                putdocx table oneway = matrix(tab_`varname'), colnames nformat(%9.2f)
            }
            putdocx save "Title", replace
I have tried adding lines:

Code:
    local valuelabel : value label `varname'
                matrix rownames tab_`varname' = "`valuelabel'"
                putdocx text ("`varlabel'"), bold
                putdocx table oneway = matrix(tab_`varname'), colnames rownames nformat(%9.2f)
            }
But this way I get tables with 4 columns, of which the first one has the variable label in all rows and the second one still has the numerical codes for each level. Any advice?

Many thanks,

Silvia
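
A sketch of one possible fix, inserted inside the loop after the tab line: pull the label text for each level that tab, matrow() stored in row, and attach those as the matrix row names (strtoname() guards against spaces and other characters that are not legal in matrix row names; with rownames shown, you may also want to build the matrix without the codes column):

Code:
local rnames
forvalues r = 1/`=rowsof(row)' {
    local val = row[`r', 1]
    local lbl : label (`varname') `val'
    local rnames `rnames' `=strtoname("`lbl'")'
}
matrix rownames tab_`varname' = `rnames' Total
putdocx table oneway = matrix(tab_`varname'), colnames rownames nformat(%9.2f)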

Contrast statement after multiple imputation

Hi Statalist

I am trying to analyse some data collected from a randomised controlled trial using a mixed model. The outcome score is collected at multiple time points, however there are quite a bit of missing data so I performed a multiple imputation. I performed my mixed model and included an interaction between randomised group and time point in my model so I could obtain the mean difference between the two randomised groups at each point.

Code:
mi estimate: mixed outcome_score i.group i.time i.group##i.time c.outcome_score_baseline || ID:, covariance(unstructured)

Usually I would use a contrast statement to obtain the mean difference between the groups at each time point; however, Stata will not allow a contrast statement, or even the lincom command, to be run after mi estimate, and mi estimate: contrast will not work.

Code:
contrast ib2.group@i.time, effect // Group A versus Group B

I was wondering if anyone knows of a way to obtain the mean difference between the groups at each time point from a mixed model when using mi estimate. I have tried mi test but that just seems to produce P-values and not the effect.

Thanks in advance
Sam
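
One route worth a look (a sketch; the coefficient names are assumptions, so check them against the mixed output): mi estimate can pool named linear combinations specified up front, which covers what lincom would normally do:

Code:
mi estimate (diff_t1: _b[2.group]) ///
    (diff_t2: _b[2.group] + _b[2.group#2.time]): ///
    mixed outcome_score i.group##i.time c.outcome_score_baseline ///
    || ID:, covariance(unstructured)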

Graph bar

Hello,

I need to reproduce a bar graph similar to the one attached. It is basically a histogram, but with various subgroups, so I have to do it through graph bar. I am looking for help with the commands.

Thank you in advance
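
The attached graph is not visible here, so only a generic sketch of subgrouped bars (the variable names are hypothetical):

Code:
graph bar (count), over(subgroup) over(group) asyvars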

Interpreting interaction estimation results from regressions

Hi all,

My name is Gráinne Gibson, and I am currently doing my masters dissertation on the effect of single-sex schooling on adolescents' non-cognitive and well-being outcomes in Ireland. I have attached a screenshot of my results, and I just want to make sure I have the correct interpretation. If you take the coefficient for the interaction between gender and single_sex (gender being 1=female, 0=otherwise), my estimated coefficient is -1.1052387. Do I interpret this as "attending a single-sex school in Ireland decreases a student's competitiveness by 1.105 percentage points"? Later I run the predictive margins, look at the difference between boys in single-sex schools and boys in co-ed schools, and obtain a positive and significant coefficient (0.947482), which again I would like to interpret correctly: "boys in single-sex schools are on average 0.947 more competitive than their co-ed counterparts"? For females, it is negative and insignificant.

I would appreciate any advice

Time-varying interaction term with year fixed effects

Hi, I am investigating the effect of a commodity price on conflict. I have created a grid of cells with georeferenced data. I am using xtreg in Stata 16.1, have run xtset cell year, and am using cell FE, year FE, and state-year FE. My dependent variable is deaths.

To investigate the possible effect of a policy, I have created an interaction term: log(price) * commodity dummy * post-2010 dummy
where post-2010 is a dummy for if the year is after 2010.

My question: do I still need to include the term "log(price) * post-2010 dummy", or is it perfectly captured by the year FE?
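
For what it's worth, a sketch of the specification in factor-variable notation (names hypothetical): if log(price) varies only by year, I would expect Stata to omit the two-way term as collinear with the year dummies, which settles the question in practice:

Code:
xtreg deaths c.lnprice#i.commodity#i.post2010 ///
    c.lnprice#i.post2010 i.year, fe vce(cluster cell)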

Quantile Panel Regression

Hello,
I just wanted to ask: how exactly do I know when to use quantile panel regression?

multiple values for same value label

Hi everyone,

I am working with a categorical variable called plnowm, which records the month in which each individual moved into their present address. The categories of the variable are the following:

Code:
 Month moved |
  to present |
   address   |      Freq.     Percent        Cum.
-------------+-----------------------------------
     missing |      1,822        0.76        0.76
inapplicable |    196,551       82.24       83.00
     refusal |         27        0.01       83.01
  don't know |      1,404        0.59       83.60
     January |      2,204        0.92       84.52
    February |      2,389        1.00       85.52
       March |      2,572        1.08       86.60
       April |      2,764        1.16       87.76
         May |      2,790        1.17       88.92
        June |      3,511        1.47       90.39
        July |      3,690        1.54       91.94
      August |      4,102        1.72       93.65
   September |      5,570        2.33       95.98
     October |      3,933        1.65       97.63
    November |      2,925        1.22       98.85 
    December |      2,732        1.14      100.00
      Winter |          1        0.00      100.00
      Spring |          5        0.00      100.00
      Summer |          2        0.00      100.00
      Autumn |          2        0.00      100.00
-------------+-----------------------------------
       Total |    238,996      100.00
And the corresponding values for every category are the following:

Code:
Month moved |
 to present |
  address   |      Freq.     Percent        Cum.
------------+-----------------------------------
         -9 |      1,822        0.76        0.76
         -8 |    196,551       82.24       83.00
         -2 |         27        0.01       83.01
         -1 |      1,404        0.59       83.60
          1 |      2,204        0.92       84.52
          2 |      2,389        1.00       85.52
          3 |      2,572        1.08       86.60
          4 |      2,764        1.16       87.76
          5 |      2,790        1.17       88.92
          6 |      3,511        1.47       90.39
          7 |      3,690        1.54       91.94
          8 |      4,102        1.72       93.65
          9 |      5,570        2.33       95.98
         10 |      3,933        1.65       97.63
         11 |      2,925        1.22       98.85
         12 |      2,732        1.14      100.00
         13 |          1        0.00      100.00
         14 |          5        0.00      100.00
         15 |          2        0.00      100.00
         16 |          2        0.00      100.00
------------+-----------------------------------
      Total |    238,996      100.00
My question is then the following: how can I make more than one value map to the same category? E.g. I want the "inapplicable" label attached not just to all observations equal to -8, but also to observations equal to 13, 14, 15 and 16 (which are now linked to the categories "winter", "spring", "summer" and "autumn" respectively). Also, I would like to do this without having to change the values of the observations (e.g. setting every "14" equal to "-8"), in order to maintain the integrity of the dataset.

I appreciate any kind of help, and let me know if I was not clear enough!
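
A minimal sketch: a value label can map many values to the same text, so the seasonal codes can be relabelled without recoding anything (the label name plnowm_lbl is an assumption; check the actual name with describe plnowm):

Code:
label define plnowm_lbl 13 "inapplicable" 14 "inapplicable" ///
    15 "inapplicable" 16 "inapplicable", modify
label values plnowm plnowm_lbl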