Sunday, June 30, 2019

Taking the average of observations within a specific date range

Hi,


I want to compute the average of a variable (teamsize) for observations within a specific time period, using a date variable (formatted %td). Specifically, I have data on companies and their teams at different points in time. I want to calculate the average team size for a specific company within a given time period: always between the current date and 365 days prior to the observation.


. bysort company: egen mean(teamsize) if inrange()

is my best guess (sorry, new to Stata and related programs in general!). I do not know how to specify inrange so that it takes the average of all observations with a date (the variable is DATE, formatted as %td) in the range of the observation date and the 365 previous days. For example, if the teamsize was 55 on June 1st 2011, I want to create a variable with a mean that takes into account all teamsizes from June 1st 2010 to June 1st 2011, including the team size of June 1st 2011.

It would be awesome if someone could help me out!

Best,
Julian
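
One possible approach (a sketch, untested) is rangestat from SSC: for every observation it computes the mean of teamsize over the same company's observations dated within the previous 365 days, including the current one. It assumes DATE is a numeric %td variable.

Code:
ssc install rangestat
rangestat (mean) teamsize, interval(DATE -365 0) by(company)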

Create a two-way line graph/bar chart

Hi all

I want to create a line graph/bar chart of the consumption of 3 main fertilizers - UREA, TSP and MOP - over a 3-year period across two seasons. I want to include all three fertilizers in the same graph.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int Sur_yr str10 Cult_idHhidVdsid byte(Season Name_mat) str1 Unit_mat double Qty_mat
2012 "BBG12A0005" 1 22 "3"  5
2012 "BBG12A0007" 4 22 "3" 14
2012 "BBG12A0007" 1 25 "3"  5
2012 "BBG12A0007" 4 25 "3"  5
2012 "BBG12A0007" 4 22 "3" 10
2012 "BBG12A0007" 4 22 "3" 10
2012 "BBG12A0007" 4 22 "3" 14
2012 "BBG12A0007" 1 26 "3"  5
2012 "BBG12A0007" 1 22 "3"  2
2012 "BBG12A0007" 1 25 "3" 21
2012 "BBG12A0007" 1 26 "3" 20
2012 "BBG12A0007" 1 22 "3" 10
2012 "BBG12A0008" 1 26 "3" 12
2012 "BBG12A0008" 1 22 "3" 10
2012 "BBG12A0008" 4 25 "3" 10
2012 "BBG12A0008" 1 22 "3"  8
2012 "BBG12A0008" 4 22 "3" 10
2012 "BBG12A0008" 4 22 "3" 10
2012 "BBG12A0008" 1 25 "3"  7
2012 "BBG12A0008" 4 22 "3" 24
2012 "BBG12A0008" 1 25 "3" 14
2012 "BBG12A0008" 1 22 "3" 15
2012 "BBG12A0008" 1 22 "3"  7
2012 "BBG12A0008" 1 25 "3" 24
2012 "BBG12A0008" 4 26 "3"  8
2012 "BBG12A0008" 4 22 "3"  8
2012 "BBG12A0008" 4 22 "3" 12
2012 "BBG12A0008" 1 22 "3" 12
2012 "BBG12A0008" 1 22 "3" 15
2012 "BBG12A0010" 4 22 "3" 15
2012 "BBG12A0010" 1 22 "3" 15
2012 "BBG12A0010" 4 22 "3"  8
2012 "BBG12A0011" 1 25 "3" 25
2012 "BBG12A0011" 4 26 "3" 15
2012 "BBG12A0011" 1 26 "3" 20
2012 "BBG12A0011" 4 22 "3"  8
2012 "BBG12A0011" 4 22 "3" 10
2012 "BBG12A0013" 1 22 "3"  8
2012 "BBG12A0013" 1 25 "3" 15
2012 "BBG12A0013" 4 25 "3"  5
2012 "BBG12A0013" 4 26 "3"  5
2012 "BBG12A0013" 1 22 "3"  5
2012 "BBG12A0015" 1 26 "3" 10
2012 "BBG12A0015" 1 22 "3" 10
2012 "BBG12A0015" 4 22 "3" 10
2012 "BBG12A0015" 4 26 "3"  7
2012 "BBG12A0015" 1 22 "3"  8
2012 "BBG12A0015" 4 26 "3"  4
2012 "BBG12A0015" 1 25 "3"  7
2012 "BBG12A0015" 1 22 "3" 12
2012 "BBG12A0015" 1 25 "3" 10
2012 "BBG12A0016" 4 22 "3"  5
2012 "BBG12A0019" 1 25 "3"  3
2012 "BBG12A0019" 1 26 "3"  2
2012 "BBG12A0019" 1 22 "3"  3
2012 "BBG12A0019" 4 26 "3"  5
2012 "BBG12A0019" 4 22 "3"  8
end
label values Season Seasonname
label def Seasonname 1 "RABI", modify
label def Seasonname 4 "KHARIF", modify
label values Name_mat inputnames
label def inputnames 22 "UREA", modify
label def inputnames 25 "TSP", modify
label def inputnames 26 "MP", modify
How should I go about with it? Any help would be appreciated.

Thanks!
Enakshi
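
One possible starting point (a sketch, untested; it assumes the quantities in Qty_mat are in comparable units and that totals per year, season and fertilizer are wanted):

Code:
preserve
collapse (sum) Qty_mat, by(Sur_yr Season Name_mat)
graph bar (asis) Qty_mat, over(Name_mat) over(Season) over(Sur_yr) asyvars
restore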

interaction effects: Poisson or Double-limit Tobit

Hi All,

I am working with DHS data across five countries, looking at the relationship between women's empowerment and children's dietary diversity after adjusting for some important exogenous variables. So, my DV is Food Groups (ranging from 1 to 7) and my IVs include 3 different domains of women's empowerment, wealth index, location, age of child, seasonal droughts, etc. I am also trying to examine whether the effect of women's empowerment on the number of food groups consumed differs across socioeconomic groups, hence I am investigating an interaction between women's empowerment and wealth index. Please find the code I used below and my Stata outputs. I ran both DL-Tobit and Poisson and ran the margins command afterwards, but I am struggling to interpret the margins. I think DL-Tobit might be the better model since the DV is a count with a lower and upper limit. Please, what am I doing wrong, and how can I interpret the interactions after running the -margins- command?

Code:
tobit food_group c.att_score#b4 c.att_score#v190 b19 v025 built_population_2014 growing_season_length irrigation drought_episodes [pw=sample_weight] if ID=="MOZ", ll(1) ul(7) cluster (dhsclust)
Code:
Tobit regression                                Number of obs     =      2,303
                                                F(12, 2291)       =      64.43
                                                Prob > F          =     0.0000
Log pseudolikelihood = -4916.4141               Pseudo R2         =     0.0107

                              (Std. Err. adjusted for 576 clusters in dhsclust)
--------------------------------------------------------------------------------
                |               Robust
     food_group |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
----------------+---------------------------------------------------------------
 b4#c.att_score |
           male |   .4682713   .2814429     1.66   0.096   -.0836382    1.020181
         female |   .3964391   .2830491     1.40   0.161   -.1586201    .9514983
                |
v190#c.att_score|
        poorest |  -.1941122   .3194032    -0.61   0.543   -.8204618    .4322374
         poorer |  -.2820354   .2949845    -0.96   0.339   -.8604999    .2964291
         middle |  -.3193704   .3110293    -1.03   0.305   -.9292989     .290558
         richer |  -.6558261    .301876    -2.17   0.030   -1.247805   -.0638473
        richest |          0  (omitted)
                |
            b19 |   .0724426   .0110004     6.59   0.000    .0508708    .0940143
           v025 |  -.2106143   .1878946    -1.12   0.262   -.5790755    .1578469
      built_pop |   .0001429   .0000382     3.74   0.000     .000068    .0002178
     grwing_sea |    .000125   .0000238     5.25   0.000    .0000783    .0001716
     irrigation |  -.0000334    .000032    -1.04   0.297   -.0000962    .0000294
    drought_epi |  -.0000109   .0000295    -0.37   0.712   -.0000688    .0000471
          _cons |   2.417986   .3833341     6.31   0.000    1.666268    3.169704
----------------+---------------------------------------------------------------
         /sigma |   2.035057   .0595378                    1.918304    2.151811
--------------------------------------------------------------------------------
          330  left-censored observations at food_group <= 1
        1,859  uncensored observations
          114  right-censored observations at food_group >= 7
Code:
margins, dydx(*) atmeans predict(e(1,.))
Code:
        
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
b4           |
      female |  -.0234546   .0444658    -0.53   0.598     -.110606    .0636968
   att_score |   .0905979   .0646847     1.40   0.161    -.0361817    .2173775
             |
v190         |
      poorer |  -.0288767   .0680193    -0.42   0.671     -.162192    .1044386
      middle |   -.041052   .0765348    -0.54   0.592    -.1910574    .1089534
      richer |  -.1484396   .0759134    -1.96   0.051     -.297227    .0003479
     richest |   .0647657    .106857     0.61   0.544      -.14467    .2742015
             |
         b19 |   .0467903   .0069769     6.71   0.000     .0331159    .0604647
        v025 |  -.1360349   .1214412    -1.12   0.263    -.3740553    .1019856
   built_pop |   .0000923   .0000248     3.72   0.000     .0000436     .000141
  grwng_seas |   .0000807   .0000155     5.21   0.000     .0000504    .0001111
  irrigation |  -.0000216   .0000207    -1.04   0.297    -.0000621     .000019
 drought_epi |  -7.04e-06   .0000191    -0.37   0.713    -.0000445    .0000304
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Code:
poisson food_group c.att_score#b4 c.att_score#v190 b19 v025 built_population_2014 growing_season_length irrigation drought_episodes [pw=sample_weight] if ID=="RWA", irr cluster (dhsclust)
Code:
margins, dydx(*)
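
Since the model interacts c.att_score with v190 (and with b4), one way to see how the effect of the empowerment score differs across groups is to ask margins for its derivative at each level (a sketch, mirroring the predict(e(1,.)) used with the tobit margins above):

Code:
margins v190, dydx(att_score) predict(e(1,.))
margins b4, dydx(att_score) predict(e(1,.))

Each row is then the marginal effect of att_score for that wealth quintile (or sex) on the expected outcome conditional on being above the lower limit.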

Estimation output for Svy: meologit and melogit commands

Hi all,

I am using Stata 14.1 and I am getting an odd bit of output.

I am estimating a series of mixed effect ordered logit and binary logit using the svy prefix.

My results give me very small values for my random effect (10^-34), which makes me suspect the model is estimating poorly, partly because there are no random effects (likely because the model is trying to control for the relevant variables).

So, in order to examine this, I want to inspect the log to see if the results are non-concave (a standard check), but I was shocked to find I had no iteration log in my log files. I then issued the following command:

Code:
 svy: meologit dep_var ind_var fixed_effects || id: , log
But I still don't get any estimation output. Is it not possible with the svy prefix, or have I missed something?

Thanks in advance



How to filter dates within an interval from a set of observation

Hello,

I have a dataset of around 900,000 observations from about 130,000 ids. The variables I am working with are mostly dates. I want to take just the first event date that happened during the 1-year period after the interview date. Some of the recorded dates come from events that happened before the interview date.

my data would look like this
ID intdate effdate1 effdate2 effdate3 dureffdate1 dureffdate2
1 8/01/2010 28-Apr-10 28-Jul-11 110 566
2 30/08/2010 20-Dec-11 477
3 6/01/2010 31-Jul-10 206
4 13/01/2010 16-Apr-10 93
5 4/08/2010 27-Jul-10 14-Apr-11 -8 253
6 11/03/2010 22-Dec-10 286
7 20/10/2010 23-Sep-10 -27
8 3/11/2010 8-Dec-11 400
9 16/06/2010 6-Sep-11 447
10 29/06/2010 6-Dec-10 25-Jan-11 160 210
11 1/11/2010 6-Oct-11 339
12 15/02/2010 9-Aug-10 175
13 12/11/2010 27-Sep-11 319
14 2/02/2010 10-Mar-11 401
15 14/02/2010 8-Aug-11 540
From intdate and effdate1 to effdate8, I calculated the intervals, which resulted in dureffdate1 to dureffdate8.

I want to take just the first event date that falls within the 1-year period after intdate.
How do I do that without deleting other observations, as I need them for other analyses?

Thank you.
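
One possible approach (a sketch; it assumes intdate and effdate1-effdate8 have been converted to numeric %td dates) creates a new variable holding the earliest event date in the year after the interview, leaving all observations in place:

Code:
gen first_eff = .
format first_eff %td
forvalues j = 1/8 {
    replace first_eff = min(first_eff, effdate`j') ///
        if inrange(effdate`j', intdate, intdate + 365)
}

Because min() ignores missings, first_eff ends up as the earliest qualifying event date, or missing if no event falls in the window.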

how to get x-standardized coefficient in logistic regression, i.e., reproduce results from "listcoef" command

I was trying to get standardized coefficients from an ologit model. "listcoef" can produce the table nicely, but I want to reproduce the regression table with just x-standardized coefficients.
So what I did was: first, standardize the X variables; then, regress Y on the standardized Xs. But the standardized coefficients from my model differ from the "listcoef" results for the categorical variables. Standardizing the categorical variables would literally yield the same thing, so I kept using
Code:
i.foreign i.hdroom
rather than standardized variables. Please look at my code below:

Code:
sysuse auto,clear

* drop missing data
drop if rep78==.

* recode headroom into three categories
recode headroom (1.5 2 2.5 =1 "small") ///
(3 3.5 =2 "medium") ///
(4 4.5 5 =3 "large"), ///
gen(hdroom) label(headroom)
tab hdroom,mi

* unstandardized coefficient
ologit rep78 price i.foreign i.hdroom

* get standardized coefficient using "listcoef"
listcoef, std help

* standardize X variables
foreach v in price foreign hdroom {
egen std`v' = std(`v')
}

* get x-standardized coefficient
ologit rep78 stdprice i.foreign i.hdroom

Did I misunderstand x-standardized coefficients? How does "listcoef" work?

reshape - wide connections

I have variables in wide format that belong together in groups (e.g. var1a and var2a).
I would like to reshape into long form while assigning the value of the variable institution to each obs/id. The data example should then become 4 observations long.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int id str24 institution str17 var1a str36 var2a str17 var1b str36 var2b
43 "hus 1" "1 kvinde" "Bil" "1 kvinde" "Bil"
44 "hus 2" "1 mand"   "Bil" "1 kvinde" "Bil"
end
hope someone can help me.
Best regards
Lars


If the above is unclear, this is the data structure I am aiming for:
id institution var1 var2
43-1 hus1 1 kvinde Bil
43-2 hus1 1 kvinde Bil
44-1 hus2 1 mand Bil
44-2 hus2 1 kvinde Bil
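
Since the a/b suffixes already distinguish the two copies of each variable, reshape can use them directly as the j() index (a sketch):

Code:
reshape long var1 var2, i(id) j(copy) string

This yields 4 observations (two per id, with copy = "a" or "b"), and institution, being constant within id, is carried along automatically.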

Cox regression assumption

I am doing research to examine the association between sleep and all-cause mortality. The average follow-up time is only 5 years for both men and women.

I was thinking of running a Cox regression since my outcome is death. However, after checking the assumption, unfortunately, I have evidence of non-proportional hazards for almost all covariates.
Do you have any suggestion of what to do next, or should I run a different type of regression?


Thank you

SEM, error coefficients output.

Hi,
How can I obtain the outputs for the "path coefficient" of the error terms to each item of a latent variable? The output only shows the variance of the error terms at the end of the table. I need to fix the path coefficients of the error terms to their respective items for running another model. However, I don't know how to obtain them first.

For example, I know I can fix the path coefficient of each error term, e.g. (e.a1@1->a1) (Latent->a1 a2 a3). How can I obtain the error path coefficient value for e.a2 -> a2 or e.a3 -> a3?

Please help.

New Stata 16 versions of dolog, dotex and dologx on SSC

Thanks as always to Kit Baum, new versions of the dolog, dotex and dologx packages are now available for download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have old versions of these packages.

The packages dolog, dotex and dologx are described as below on my website. The new versions have been updated to Stata Version 16. However, users of older Stata versions can still download old versions of these packages from my website by typing, in Stata,

net from http://www.rogernewsonresources.org.uk/

and selecting the subfolder for the user's own Stata version, where the appropriate old version can be found.

Best wishes

Roger


---------------------------------------------------------------------------------------
package dolog from http://www.rogernewsonresources.org.uk/stata16
---------------------------------------------------------------------------------------

TITLE
dolog: Execute commands from a do-file, creating a text or SMCL log file

DESCRIPTION/AUTHOR(S)
dolog and dosmcl (like do) cause Stata to execute the commands stored in
filename.do as if they were entered from the keyboard, and echo the commands
as they execute them, creating a text log file filename.log (in the case of
dolog) or a SMCL log file filename.smcl (in the case of dosmcl). If filename
is specified without an extension, then filename.do is assumed. If filename
is specified with an extension other than .do, or with no extension, then the
log file will still have .log or .smcl as its extension, so dolog and dosmcl
will not overwrite the original do-file. Arguments are allowed (as with do or
run).
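
For example (with a hypothetical do-file name), typing, in Stata,

dolog myanalysis

executes myanalysis.do and creates the log file myanalysis.log.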

Author: Roger Newson
Distribution-Date: 27june2019
Stata-Version: 16

INSTALLATION FILES
dolog.ado
dosmcl.ado
dolog.sthlp
dosmcl.sthlp
---------------------------------------------------------------------------------------


---------------------------------------------------------------------------------------
package dotex from http://www.rogernewsonresources.org.uk/stata16
---------------------------------------------------------------------------------------

TITLE
dotex: Execute a do-file generating a SJ LaTeX log

DESCRIPTION/AUTHOR(S)
dotex is a version of the dolog package, and causes Stata to execute the
commands stored in a do-file named filename.do as if they were entered from
the keyboard, and echoes the commands as it executes them, creating a log file
filename.tex written in Stata Journal (SJ) LaTeX. The file filename.tex (or
parts of it) can then be included in the stlog environment of SJ LaTeX. The
dotex package was derived by hybridising the dolog package with the SJ's
logopen and logclose, which also create log files written in SJ LaTeX for
inclusion in the stlog environment. The dotex package has the advantage that
a user can run the same do-file using dotex, dolog and do, creating an SJ
LaTeX log file, a text log file, and no log file, respectively.

Author: Roger Newson
Distribution-Date: 27june2019
Stata-Version: 16

INSTALLATION FILES
dotex.ado
dotex.sthlp
---------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------
package dologx from http://www.rogernewsonresources.org.uk/stata16
---------------------------------------------------------------------------------------

TITLE
dologx: Multiple versions of dolog for executing certification scripts

DESCRIPTION/AUTHOR(S)
dologx (like dolog) causes Stata to execute the commands stored in a do-file
named filename.do as if they were entered from the keyboard, and echoes the
commands as it executes them, creating a log file filename.log. The dologx
package contains multiple versions of dolog, written in multiple Stata
versions, intended for use when running certification scripts. Usually, a
do-file should contain a version statement at or near the top, so it will
still run in the Stata version in which it was written, even if the user runs
it under a later version of Stata. Certification scripts are an exception to
this rule, because they are run under multiple Stata versions, to certify
that the package being tested works under all of these versions. A
certification script therefore should not contain a version statement at the
top. The version of Stata under which it runs will therefore be the version in
force in the program that calls it, even if that program is dolog. The
standard version of dolog should therefore not be used to run a certification
script, and the user should use the dologx package instead, using dolog6 to
run it under Stata 6, dolog7 to run it under Stata 7, and so on.
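
For example (with a hypothetical certification script mycert.do), typing, in Stata,

dolog14 mycert

runs mycert.do under Stata 14 version control, creating mycert.log.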

Author: Roger Newson
Distribution-Date: 27june2019
Stata-Version: 16

INSTALLATION FILES
dolog6.ado
dolog7.ado
dolog8.ado
dolog9.ado
dolog10.ado
dolog11.ado
dolog12.ado
dolog13.ado
dolog14.ado
dolog15.ado
dolog16.ado
dolog6.sthlp
dolog7.sthlp
dolog8.sthlp
dolog9.sthlp
dolog10.sthlp
dolog11.sthlp
dolog12.sthlp
dolog13.sthlp
dolog14.sthlp
dolog15.sthlp
dolog16.sthlp
dologx.sthlp
---------------------------------------------------------------------------------------


Gravity model of migration: ppml vs. ppmlhdfe

Dear all,

I'm working on a gravity model of migration for my thesis. My data contains 28 EU countries as destination countries, 130 non-EU countries as origin countries and spans 10 years.
My dependent variable is the number of first-time issued residence permits for employment for any given country pair (as I'm focusing on work-related migration) and I want to interpret the influence of different migration policies. Due to the large number of zeros in the dependent variable, I want to use the PPML estimator as described by Santos Silva & Tenreyro (2006) and used in most gravity-related literature in recent years.

I want to run the following command:
Code:
ppml permits_all lgdp_o lgdp_d ldist contig comlang_off colony comcol market_test shortage_list point_system job_offer dFE_ot*, cluster(dist)
market_test, shortage_list, point_system and job_offer are dummy variables for the different migration policies; dFE_ot* are origin*time fixed effects that I want to include.

As I'm using Stata 15.1 IC, I can't run the code because 1,300 origin*time dummies are too many for the IC version. Before buying a new licence (poor student here), I checked for alternatives and found the following three commands:
1. xtpoisson, fe
2. ppml_panel_sg
3. ppmlhdfe

I'm (theoretically) able to use xtpoisson, fe and ppml_panel_sg, but since they are fixed in regard to the kinds of fixed effects they use, they are not practical for me.
I like ppmlhdfe, as I can decide which fixed effects to add (as with ppml) but don't need to inflate my model with manually added FE dummy variables (and thus can run the regression in Stata IC).

Now to my question:
Is my understanding correct that the following two commands would produce the same output?

Code:
ppml permits_all lgdp_o lgdp_d ldist contig comlang_off colony comcol market_test shortage_list point_system job_offer dFE_ot*, cluster(dist)
Code:
ppmlhdfe permits_all lgdp_o lgdp_d ldist contig comlang_off colony comcol market_test shortage_list point_system job_offer, a(i.origin#i.year)
Best regards,
Sarah

Probit Marginal Effect

Hello Everyone,
I am working on a probit regression for my research. I want to report the marginal effects, but I can't seem to figure out the command to export my results to Excel.
I initially tried "margins, dydx(*)" to get the marginal effects and "outreg2 using myreg, replace excel" to export the result, but the outreg2 command exports my regression output, not the marginal effects output.
Please, can anyone recommend a command for me?
Thanks
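
One pattern that may work (a sketch with hypothetical variable names): the post option of margins replaces the stored estimates with the marginal effects, so outreg2 then exports those rather than the probit coefficients:

Code:
probit y x1 x2
margins, dydx(*) post
outreg2 using myreg, replace excel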

Recoding in panel data

How do I recode values by groups in panel data using Stata commands? For example, the score for id 1 at wave 1 is 3, and I want the score at waves 2 to 4 to be 3 as well.
id wave score
1 1 3
1 2
1 3
1 4
2 1 1
2 2
2 3
2 4
3 1 2
3 2
3 3
3 4

Ideal:
id wave score
1 1 3
1 2 3
1 3 3
1 4 3
2 1 1
2 2 1
2 3 1
2 4 1
3 1 2
3 2 2
3 3 2
3 4 2
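
One way to fill these in (a sketch, assuming the score is always recorded at wave 1):

Code:
bysort id (wave): replace score = score[1] if missing(score)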

Propensity Score matching: how to define the dependent variable

Hello everyone, we are trying to use propensity score matching (PSM) to find a suitable control group. Our data set is at the exporter (China)-product (HS6)-destination (EU)-year level for 2000-2013 and contains over 650,000 units of observation. The treatment group is the products that are subject to EU antidumping (AD) measures: between 2000 and 2013, 160 products at the HS6 level exported from China to the EU faced EU AD measures. We want to use PSM to find the control group; as the products that were subject to the EU AD measures are not random, using PSM could avoid this selection bias. We use a logit model to estimate the probability of a product being subject to an AD measure, based on a set of observable characteristics. The estimation equation is as follows:
Pr(AD=1)_p = beta_0 + beta_1*IP(China)_pt-1 + beta_2*GDP_t + beta_3*RER_t + year FE + error term

IP(China)_pt-1 is lagged import penetration, defined as the share of imports from China in total EU imports at the HS6 level;
GDP_t is the GDP growth rate in the EU in year t;
RER_t is the log real exchange rate in terms of Euro per Chinese RMB;
I also include year fixed effects.


My question is how we should define the dependent variable (DV). More specifically, should we define the dependent variable as AD_p=1 if the product is subject to EU AD and 0 otherwise? With this definition, the DV does not change over time; it only varies across products. Alternatively, we can define the dependent variable as AD_pt=1 if the product is subject to EU AD in year t, remaining 1 while the measure is in force, and 0 otherwise. For instance, if a product had an AD measure imposed in 2005 and the measure stayed in force until 2010, then AD_pt=1 between 2005 and 2010; for the years before the treatment AD_pt=0, and for the years after the measure is revoked (i.e., after 2010) AD_pt=0. If the DV indeed needs to vary along the time dimension, my question is how PSM could find a control for the treated product before the treatment. Specifically, if the product was treated between 2005 and 2010 and takes 0 in all other years, how could PSM find a control group for the treated product in, say, 2004 or 2011?

I am very confused about the right definition of the DV, and I would really appreciate your help and suggestions.

Create fiscal years from years and months

Dear Statalist,


I would like to create a fiscal year variable that takes values like 2005-06, 2006-07 and so on up to 2018-19, from the variables year and month as given below.
Code:
year    month
2006    Jan
2006    Feb
2006    Mar
2006    Apr
2006    May
2006    Jun
2006    Jul
2006    Aug
2006    Sep
2006    Oct
2006    Nov
2006    Dec
2007    Jan
2007    Feb
2007    Mar
2007    Apr
2007    May
2007    Jun
2007    Jul
---------
---------
2019   Mar
2019   Apr
Can someone help me code this?
Thanks.
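
A possible sketch, assuming an April-March fiscal year (which the 2005-06 pattern suggests), a numeric year, and three-letter month abbreviations:

Code:
gen m = monthly(month + string(year), "MY")
gen fy = cond(month(dofm(m)) >= 4, year, year - 1)
gen fiscalyear = string(fy) + "-" + string(mod(fy + 1, 100), "%02.0f")

So Jan 2006 maps to "2005-06" and Apr 2006 to "2006-07".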

Values of multiple columns in one column using a loop

Hi everyone,

I'd like to place the values of multiple columns into one column. I tried stack inside a loop:

Code:
foreach id of local temp {
            stack cum_abnormal_return_short`id', into(cum_abnormal_return_short)
            stack cum_abnormal_return_med`id', into(cum_abnormal_return_med)
            stack cum_abnormal_return_long`id', into(cum_abnormal_return_long)
            }
Unfortunately, an error occurs because stack changes the dataset in memory. Is there any other way?

Thanks,

Paul
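
One alternative (a sketch, with a hypothetical output file name): run each stack once over the whole variable family, on a preserved copy, rather than inside a loop that destroys the data after its first iteration:

Code:
preserve
stack cum_abnormal_return_short*, into(cum_abnormal_return_short) clear
save car_short, replace
restore

and repeat for the _med and _long families.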

Variable Labels


Hello dear forum members, I am new here and am currently learning Stata. I am a beginner and have only one question; maybe there is already a thread on it, and I apologize if there is a similar one. Now to my problem: I have imported an Excel file with a list of buildings (number of floors, year of construction and buildings on the property). The task is to delete all observations with more than 10 floors. Here is my code
Code:
drop if STORY> 10
:D So far so good - no science. But why are they not deleted? When I go to the "35" I still see a "5". Do you see what I mean? The AGE is also wrong from front to back: it should be computed as 2019 - YRB, but with AGE I do not get the right values. Maybe the pros can help me a bit and give me a hint about what I'm doing wrong. I would be very thankful. Yours sincerely
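
A diagnostic sketch (assuming STORY and YRB are numeric): display the raw values, since value labels can mask them, and guard the drop against missing values, which Stata treats as larger than any number:

Code:
browse STORY YRB, nolabel
drop if STORY > 10 & !missing(STORY)
gen AGE = 2019 - YRB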

Problems creating Herfindahl-Hirschman Index

Hello everyone,

I want to create the Herfindahl-Hirschman Index for my data. I have quarterly data for different banks, and I want the HHI for market share.

I tried the command

hhi marketshare, by(bank_id, date)

However, it only gives me 1 for every observation. Leaving out the date gives me results, but I want the date included (the same happens with the hhi5 command). When I try the entropyetc command, it says matsize is too small (even after I increased matsize).

Does anyone have a suggestion?

Thank you in advance
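
If each bank appears once per quarter, grouping by both bank_id and date puts a single bank in every group, so the index is trivially 1. Computing concentration across banks within each quarter would instead be (a sketch, assuming hhi's by() takes a plain varlist):

Code:
hhi marketshare, by(date)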

Code of Patell Z-statistic

Good morning everyone,

I'd like to derive a Z-statistic by means of the Patell test (https://www.eventstudytools.com/sign...e-tests#Patell). Unfortunately, I did not find an option within Stata to perform the test, nor any material on the forum. Does anyone have either a command to use or code to execute? I've tried the code below, but it does not seem to produce good values (CAR: cumulative abnormal returns / AR: abnormal return / CAAR: cumulative average abnormal return).

Thanks in advance,

Paul



Code:
gen temp123=1
                levelsof temp123, local(123)
                foreach id of local 123{
                    by ticker: egen s2_short_`id' = sd(logret) if estimation_window_short`id'==1
                    by ticker: egen L_short_`id' = count(logret) if estimation_window_short`id'==1
                    by ticker: egen aRM_short_`id' = mean(logret) if estimation_window_short`id'==1
                    by ticker: egen R_short_`id' = max(logret) if eventid==`id' &estimation_window_short`id'==1
                    }
                            
                foreach id of local 123{
                    by ticker: gen Rsum_short_t_`id' = logret-aRM_short_`id' if estimation_window_short`id'==1
                    }
                                                
                foreach id of local 123{
                    by ticker: egen Rsum_short_`id' = sum(Rsum_short_t_`id'^2) if estimation_window_short`id'==1
                    }
                    
                foreach id of local 123{
                    drop Rsum_short_t_`id'
                    }    
                                
            //3.10.1.2 Part-by-part calculation of statistic                        
                //(Rmt-ARm)^2
                foreach id of local 123{
                    gen part1_s_`id' = R_short_`id'- aRM_short_`id' if estimation_window_short`id'==1
                    replace part1_s_`id' = part1_s_`id'^2  if estimation_window_short`id'==1
                    }
                
                //(Rmt-ARm)^2/sum(Rmt-ARm)^2
                foreach id of local 123{
                    gen part2_s_`id' = part1_s_`id'/Rsum_short_`id' if estimation_window_short`id'==1
                    }
                
                //1+(1/L)+(part2)
                foreach id of local 123{
                    gen part3_s_`id' = 1+(1/L_short_`id')+part2_s_`id' if estimation_window_short`id'==1
                    }
                
                //s2(part3)^0.5
                foreach id of local 123{
                    gen part4_s_`id' = sqrt(s2_short_`id'*part3_s_`id') if estimation_window_short`id'==1
                    }
                
                foreach id of local 123{
                    by ticker: replace part4_s_`id' = part4_s_`id'[_n-1] if missing(part4_s_`id')&event_window_short`id'==1 
                    }
                            
            //3.10.1.3 Standardize abnormal returns Patell short
                foreach id of local 123{
                    gen AR_sd_s_pat`id' =. if estimation_window_short`id'==1
                    }    
                    
                foreach id of local 123{
                    replace AR_sd_s_pat`id' = abnormal_return_short`id'/part4_s_`id' if event_window_short`id'==1
                    }
                        
            //3.10.1.4 Drop variables
            foreach id of local 123{
                drop R_short_`id' aRM_short_`id' L_short_`id' s2_short_`id' part1_s_`id' part2_s_`id' part3_s_`id' part4_s_`id' 
                }
                
            save temp11.dta, replace
            
        //3.10.2 Cummulate standardized Patell abnormal returns short
            use temp11.dta, clear
            foreach id of local 123 {
                    by ticker: egen CAR_pat_t_s`id'= sum(AR_sd_s_pat`id') if event_window_short`id'==1
                    }    
                    
            foreach id of local 123 {
                    by ticker: gen CAR_pat_s`id' = CAR_pat_t_s`id'*(1/(sqrt(3))) if event_window_short`id'==1
                    }    
                        
        //3.10.3 Derive Patell Z-statistic
            foreach id of local 123 {
                    egen CAAR_P_s`id'= mean(CAR_pat_s`id') if eventid==`id'
                    }
                                    
            foreach id of local 123 {
                    egen No_CAAR_P_s`id'= count(CAAR_P_s`id')
                    replace No_CAAR_P_s`id'= sqrt(No_CAAR_P_s`id')
                    }                        
                                                
            foreach id of local 123 {
                    gen test_P_s`id'= CAAR_P_s`id'* No_CAAR_P_s`id' 
                    }

renaming variables using values they hold

Hey everyone,

Is there a simple command (or few lines of code) to rename variables based on a specific string value they hold?

In more detail: I have an election results data set in which every variable is a "party", and the values it holds represent the number of votes for that party. The problem is that the variables are named by the letters representing each party on the ballots. I would like to rename the variables so that they indicate the parties' names. To do that, I have another small table that has the same unwanted names (the representing letters) as variable names, but under each of them there is a string value with the party's real name.

Any ideas how to deal with this issue?

Much appreciation,
Ben
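
One possible approach (a sketch; lookup.dta is a hypothetical file name for the small table, assumed to hold one observation whose values are the party names): read the new names off the lookup table into locals, then rename in one go. strtoname() turns the party names into valid variable names.

Code:
preserve
use lookup.dta, clear
local old
local new
foreach v of varlist _all {
    local old `old' `v'
    local new `new' `=strtoname(`v'[1])'
}
restore
rename (`old') (`new')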

How to display the P value of the mediation with khb command?

Hi all,

Recently I read a paper published in JAMA Pediatrics (doi:10.1001/jamapediatrics.2019.1212) in which the authors provide the P values of the mediating variables using the -khb- command in Stata (Table 2, listed below).



I want to estimate the P value in the Summary of confounding part (Conf_Pct column) and in the Components of Difference part (P_Reduced column), as illustrated in Table 2, but I do not know how.

-khb- is a user-written program and can be installed with:

HTML Code:
. net sj 13-1 st0236_2
. net install st0236_2   // INSTALLATION FILES 
. net get st0236_2       // ANCILLARY FILES, including dlsy_khb.dta and khb.do
Below are my code and results; can anyone offer any clue?

HTML Code:
. use dlsy_khb.dta

. khb logit univ fses || abil intact boy, disentangle summary verbose

(omitted)

Logistic regression                             Number of obs     =      1,896
                                                LR chi2(4)        =     216.87
                                                Prob > chi2       =     0.0000
Log likelihood = -468.31516                     Pseudo R2         =     0.1880

------------------------------------------------------------------------------
        univ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fses |   .3817324   .0778061     4.91   0.000     .2292353    .5342295
        abil |   1.065516    .106775     9.98   0.000     .8562405    1.274791
      intact |    1.08391   .7386558     1.47   0.142    -.3638292    2.531648
         boy |   .9821406   .1848351     5.31   0.000     .6198704    1.344411
       _cons |  -4.462997   .7479123    -5.97   0.000    -5.928878   -2.997116
------------------------------------------------------------------------------

(omitted)

Logistic regression                             Number of obs     =      1,896
                                                LR chi2(4)        =     216.87
                                                Prob > chi2       =     0.0000
Log likelihood = -468.31516                     Pseudo R2         =     0.1880

------------------------------------------------------------------------------
        univ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fses |   .5805281   .0786111     7.38   0.000     .4264531    .7346031
    __000001 |   1.065516    .106775     9.98   0.000     .8562405    1.274791
    __000002 |    1.08391   .7386558     1.47   0.142    -.3638292    2.531648
    __000003 |   .9821406   .1848351     5.31   0.000     .6198704    1.344411
       _cons |  -2.945969    .124697   -23.63   0.000    -3.190371   -2.701568
------------------------------------------------------------------------------

Decomposition using the KHB-Method

Model-Type:  logit                                 Number of obs     =    1896
Variables of Interest: fses                        Pseudo R2         =    0.19
Z-variable(s): abil intact boy
------------------------------------------------------------------------------
        univ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
fses         |
     Reduced |   .5805281   .0786111     7.38   0.000     .4264531    .7346031
        Full |   .3817324   .0778061     4.91   0.000     .2292353    .5342295
        Diff |   .1987956   .0359394     5.53   0.000     .1283557    .2692355
------------------------------------------------------------------------------

Summary of confounding

        Variable | Conf_ratio    Conf_Pct   Resc_Fact  
    -------------+-------------------------------------
            fses |  1.5207722       34.24   1.1317064  
    ---------------------------------------------------

Components of Difference

      Z-Variable |      Coef    Std_Err     P_Diff  P_Reduced  
    -------------+---------------------------------------------
    fses         |                                            
            abil |  .1661177   .0301003      83.56      28.61  
          intact |   .020142   .0144611      10.13       3.47  
             boy |  .0125359    .011524       6.31       2.16  
    -----------------------------------------------------------
Thank you all in advance!

The user-written program -khb- was created by Ulrich Kohler, Kristian Bernt Karlson, and Anders Holm, and is detailed in the following article:
Kohler, U., K.B. Karlson, and A. Holm. 2011. "Comparing Coefficients of Nested Nonlinear Probability Models." Stata Journal, 11(3): 420-38.
https://journals.sagepub.com/doi/pdf...867X1101100306

Sample size estimation for comparison of three proportions: Control, Treatment1 and Treatment2

Dear StataList,
How can I calculate the sample size requirements for proportions in 3 (three) groups, namely Control, Treatment1, and Treatment2, when the outcome proportions typically range between 2% and 20%? Indeed, is it necessary to compare all 3 groups jointly, or is it sufficient to compare Treatment1 vs Control, Treatment2 vs Control, and Treatment1 vs Treatment2? I wish to establish that at least one of the treatment groups differs statistically from the Control group, and that Treatment1 differs significantly from Treatment2. Ideally, I would like to detect statistically significant differences between Controls and each of the treatment groups, and between Treatment1 and Treatment2.
I will appreciate your expert comments.
Dora Pearce
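
For any one pairwise contrast, power twoproportions gives the required sample size; with three contrasts, a Bonferroni-style adjustment of alpha is one conservative option (a sketch, using 2% vs 20% as illustrative proportions):

Code:
power twoproportions 0.02 0.20, alpha(0.0167) power(0.9)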

Random Effect Autocorrelation Test

Dear Good People,

I am currently shifting to using random effects for my data, and I would like to know if there is a specific command to run an autocorrelation test with random effects. Previously I used the command "xtserial y1 x1 x2". Also, I use robust standard errors for my random-effects model; is there any difference in the command? Thank you!

Saturday, June 29, 2019

Average calculation issues

Hi there,

I would like to calculate average accounts receivable using netaccountsreceivable and lag_net_account_receivable.

However, the average value seems to be incorrect. For example, it is 276860608 for stkcd 1 in 1995; shouldn't it be (309246048+244475184)/2 = 276860616?

Could someone please tell me why this happens and how to avoid it?

Thanks

code I used:
Code:
gen avg_account_receivable= (netaccountsreceivable+lag_net_account_receivable)/2
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long stkcd int year float(netaccountsreceivable lag_net_account_receivable avg_account_receivable)
1 1990           .           .            .
1 1991           .           .            .
1 1992           .           .            .
1 1993           .           .            .
1 1994   244475184           .            .
1 1995   309246048   244475184    276860608
1 1996   469556608   309246048    389401344
1 1997   534341664   469556608    501949120
1 1998   516883264   534341664    525612480
1 1999   378508704   516883264    447696000
1 2000  -396003232   378508704     -8747264
1 2001    11202902  -396003232   -192400160
1 2002  -762710976    11202902   -375754048
1 2003  -833153216  -762710976   -797932096
1 2004  -874245760  -833153216   -853699456
1 2005  -763604032  -874245760   -818924928
1 2006           .  -763604032            .
1 2007           .           .            .
1 2008 15109592064           .            .
1 2009 35209261056 15109592064  25159426048
1 2010 32229515264 35209261056  33719388160
1 2011 1.84321e+11 32229515264 108275253248
1 2012 99201843200 1.84321e+11 141761413120
1 2013 1.91714e+11 99201843200 145457922048
1 2014 2.56183e+11 1.91714e+11 2.239485e+11
1 2015 3.14259e+11 2.56183e+11  2.85221e+11
1 2016 4.19846e+11 3.14259e+11 3.670525e+11
1 2017 4.25209e+11 4.19846e+11 4.225275e+11
1 2018           . 4.25209e+11            .
2 1991   142333296           .            .
2 1992   125544736   142333296    133939016
2 1993   158023472   125544736    141784096
2 1994   386629024   158023472    272326240
2 1995   650885952   386629024    518757504
2 1996   691276416   650885952    671081216
2 1997   775456832   691276416    733366656
2 1998   528862240   775456832    652159552
2 1999   535368512   528862240    532115392
2 2000   516286848   535368512    525827680
2 2001   477500384   516286848    496893632
2 2002   302297120   477500384    389898752
2 2003   365968992   302297120    334133056
2 2004   838037504   365968992    602003264
2 2005  1082277376   838037504    960157440
2 2006  1035615424  1082277376   1058946432
2 2007   864883008  1035615424    950249216
2 2008   922774848   864883008    893828928
2 2009   713191936   922774848    817983360
2 2010  1594024576   713191936   1153608192
2 2011  1514813824  1594024576   1554419200
2 2012  1886548480  1514813824   1700681216
2 2013  3078969856  1886548480   2482759168
2 2014  1894071808  3078969856   2486520832
2 2015  2510653184  1894071808   2202362368
2 2016  2075256832  2510653184   2292955136
2 2017  1432733952  2075256832   1753995392
2 2018  1586180736  1432733952   1509457408
3 1991    96595616           .            .
3 1992   183449968    96595616    140022784
3 1993   389976800   183449968    286713376
3 1994   533173248   389976800    461575040
3 1995   652192256   533173248    592682752
3 1996   949415296   652192256    800803776
3 1997   710684608   949415296    830049920
3 1998   566756352   710684608    638720512
3 1999   711459392   566756352    639107840
3 2000   522822464   711459392    617140928
3 2001   344030464   522822464    433426464
4 1991    18389532           .            .
4 1992    53118160    18389532     35753848
4 1993   203273392    53118160    128195776
4 1994   191898672   203273392    197586032
4 1995   279380832   191898672    235639744
4 1996   118328432   279380832    198854624
4 1997   126063984   118328432    122196208
4 1998    93838936   126063984    109951456
4 1999    55299972    93838936     74569456
4 2000   222196064    55299972    138748016
4 2001    31434708   222196064    126815384
4 2002    62747196    31434708     47090952
4 2003    88416992    62747196     75582096
4 2004    98330824    88416992     93373904
4 2005    76457512    98330824     87394168
4 2006    62786196    76457512     69621856
4 2007     7229633    62786196     35007916
4 2008     4995476     7229633      6112554
4 2009     6860687     4995476      5928081
4 2010   4296549.5     6860687      5578618
4 2011     4540338   4296549.5      4418444
4 2012     8129105     4540338      6334721
4 2013     3760743     8129105      5944924
4 2014     2742071     3760743      3251407
4 2015   1726513.8     2742071    2234292.5
4 2016     1706211   1726513.8    1716362.5
4 2017     9456606     1706211      5581409
4 2018    27268320     9456606     18362464
5 1992   155349056           .            .
5 1993   166755648   155349056    161052352
5 1994   200326880   166755648    183541264
5 1995   127329032   200326880    163827952
end
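
The stored variables are floats, which carry only about 7 significant decimal digits, so the correct result 276860616 gets rounded to the nearest representable float, 276860608. Creating the new variable as a double avoids this (a sketch):

Code:
gen double avg_account_receivable = (netaccountsreceivable + lag_net_account_receivable)/2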

Expression builder dialog

Is there any way to make the Expression builder dialog larger? The lists are too small to be usable and require constant scrolling, both vertical and horizontal.
(Stata 15 on Windows 10):


Panel Data with same values/data for independent variables across multiple countries

I am analyzing the impact of US monetary policy variables on capital flows to emerging markets and intend to use panel data analysis. I have data on capital flows for 20 countries individually from 2000-2018 and US monetary policy variables for the same period.

What I don't understand is that the US monetary policy variables (inflation, industrial production, spread) - the independent variables - would remain the same for every country from 2000-2018.

Is it correct to use panel regression in this case?

Roommates' max expenditure with parents' educations

Dear All, I was asked this question here. The data set is
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id exp) long roomnumber byte(feduc meduc)
53 1700 105111 4 5
43 1800 105111 8 3
57 1500 105211 5 6
56 2000 105211 5 6
60 2100 105211 3 3
58 1321 105211 4 4
63 2500 105211 7 7
59  900 105212 6 5
62 1200 105212 6 3
72 1200 105212 5 7
end
Firstly, for each room (`roomnumber'), there are varying numbers of roommates (each with a different `id'). We want to obtain a new variable for each roommate, say `max_exp', which is the maximum of `exp' within the `roomnumber', excluding the roommate himself. I have done this by (ssc install asrol)
Code:
bys roomnumber: asrol exp, stat(max) xf(focal)
with result as
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id exp) long roomnumber byte(feduc meduc) double max_exp
53 1700 105111 4 5 1800
43 1800 105111 8 3 1700
57 1500 105211 5 6 2500
56 2000 105211 5 6 2500
60 2100 105211 3 3 2500
58 1321 105211 4 4 2500
63 2500 105211 7 7 2100
59  900 105212 6 5 1200
62 1200 105212 6 3 1200
72 1200 105212 5 7 1200
end
Secondly, I want to generate two additional variables, say `feduc1' and `meduc1', which are the `feduc' and `meduc' of the roommate with the maximum expenditure, taking the average of `feduc' and `meduc' if there are ties in the maximum expenditure. Any suggestions? Thanks.
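
One possible approach (a sketch): build all roommate pairs with joinby, keep each student's maximum-spending roommate(s), average the parental education over ties, and merge the result back:

Code:
preserve
keep roomnumber id exp feduc meduc
rename (id exp feduc meduc) (mate_id mate_exp mate_feduc mate_meduc)
tempfile mates
save `mates'
restore

preserve
joinby roomnumber using `mates'
drop if id == mate_id
bysort id: egen double maxexp = max(mate_exp)
keep if mate_exp == maxexp
collapse (mean) feduc1=mate_feduc meduc1=mate_meduc, by(id)
tempfile eds
save `eds'
restore
merge 1:1 id using `eds', nogenerate

For id 59, whose two roommates tie at 1200, this gives feduc1 = 5.5 and meduc1 = 5.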

Combining local macros in Stata

I am trying to create a loop that copies specific files (which have a specific identifier at the end of the filename) from a list of folders to another folder using Windows shell commands. However, I am having trouble working with local macros in Stata. I have tried troubleshooting this in a variety of ways but keep running into issues.

1) When I run the commands below:
local folder "folder"
local version "version10"
di "C:\...\`folder'\*`version'.*"

Stata shows this:
C:\...`folder'\*version10.*

It is unclear to me why `folder' does not return the value assigned to the local macro, and also why the backslash somehow disappears.

2) I tried to get around this by combining the local macros as follows:
local folder "folder"
local version "version10"
local path = "C:\..." + "`folder'" + "\*" + "`version'" + ".*"
di "`path'"

This works and returns the following path:
C:\...\folder\*version10.*

However, when I tried to create a loop to do this for all the folders as follows:
local names "a" "b" "c" (These would be the folder names)
local version "version10"
foreach folder of local names {
local path = "C:\..." + "`folder'" + "\*" + "`version'" + ".*"
di "`path'"
}

I get a "too few quotes" error, even though this is the same exact command that I ran previously that returns the correct path.

I would really appreciate any help with this.
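
What happens in 1) is that the backslash immediately before the backquote escapes the macro-expansion character, so `folder' is left unexpanded and the backslash itself is consumed. Forward slashes, which Stata accepts on Windows, avoid this; and defining the folder list without nested quotes avoids the "too few quotes" error in the loop. A sketch:

Code:
local names a b c
local version "version10"
foreach folder of local names {
    di "C:/.../`folder'/*`version'.*"
}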

Can Stata 16 frames be appended?

Fellow Statalisters (especially StataCorp),

Many thanks to StataCorp for Stata 16. The frames are an especially welcome addition, and I have been playing with them over the weekend so far. I definitely hope to frameify my resultsset-generating programs to make resultsframes.

One immediate query: is it possible to append frames (as we append datasets using the append command)? I would find such a possibility very useful for the Stata 16 version of the parmby module of my parmest package. I have looked in the help for frames and for append, but have so far found nothing about appending frames. (Am I just not looking in the right place?)

Best wishes

Roger
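
One workaround in the meantime (a sketch, with hypothetical frame names f1 and f2 holding append-compatible data) is to route through a tempfile:

Code:
tempfile t
frame f2: save "`t'"
frame f1: append using "`t'"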

Mixed-effects logistic regression for panel data

Hello, I am using mixed-effects logistic regression for panel data in Stata 15, and I was wondering if my commands are correct.

My DV is a binary variable, and each respondent was surveyed once a year for five years. So, each respondent has five repeated measures. In the data, the respondent identifier is the variable ID. I also have time-varying covariates IV1, IV2, and time-invariant covariates IV3 and IV4, and a time variable Year. I want to use a mixed-effects logistic regression as follows:

Code:
melogit DV IV1 IV2 IV3 IV4 Year || ID: Year, cov(un)
But, this is kind of like a growth curve model except for the binary DV. I was wondering if my command is correct, and what is the difference between a growth curve model and a mixed-effects logistic model for panel data. Thank you very much!

Ereturn Scalar on MI Estimate?

Hi All,

I currently am trying to perform a bootstrap on mi estimate: logistic. However, when I run my code, I get the following error:

Bootstrap replications (20)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
xxxxxxxxxxxxxxxxxxxx
insufficient observations to compute bootstrap standard errors
no results will be saved


My code is as follows:

Code:
mi set flong
mi register imputed asd age sex bmi smoke numlevels dosym
    
set seed 54321
    
    capture program drop myboot
    program myboot, eclass properties(mi)
        mi impute chained (logit) asd sex smoke (pmm) bmi dosym (ologit) numlevels, augment force add(60) 
        mi estimate: logistic asd age sex bmi smoke numlevels dosym
            
        ereturn scalar b_a = el(e(b_mi),1,1)
        ereturn scalar b_b = el(e(b_mi),1,2)
        ereturn scalar b_c = el(e(b_mi),1,3)
        ereturn scalar b_d = el(e(b_mi),1,4)
        ereturn scalar b_e = el(e(b_mi),1,5)
        ereturn scalar b_f = el(e(b_mi),1,6)
        ereturn scalar b_g = el(e(b_mi),1,7)
        
    end

bootstrap b_age=e(b_a) b_sex=e(b_b) b_bmi=e(b_c) b_smoke=e(b_d) b_lvls=e(b_e) b_dosym=e(b_f) b_int=e(b_g), reps(20) nodrop :myboot
I tried to look into why this might be happening and believe it is due to ereturn not recognizing my mi estimates as eclass results for the subsequent bootstrap command.

Any advice on how to address this or what else may be the issue? Thanks for your help in advance!

2x2 AB/BA crossover trial analysis resources/advice

Hi all.

I would like to conduct the analysis of a trial comparing two drugs (A and B) on reducing pain associated with a procedure.

Subjects are randomly exposed to either drug A or B, then undergo a procedure and are asked to score the pain associated with that procedure from 0-10. After a washout period, they are exposed to drug B or A (the one they were not initially exposed to) and then undergo the same procedure, again being asked to score their pain from 0-10. Subjects are also asked to score their satisfaction with each drug, again on a 0-10 scale, and to state their overall preference (drug A or B).

Although I can find resources on crossover trials (such as the textbook by Senn 2002, as well as some online explanations of crossover trials), I am unable to find a decent resource with regard to the analysis of one in Stata.

Using "help pkcross" provides some information, but is somewhat brief.

Can anyone help? Indeed, can pkcross be used for experiments such as this or is it more designed for pharmacokinetic experiments as the name suggests?

Regards

Summarizing special characters of a string variable

Is there any command with which I can summarize the special characters (e.g. *, -, .) of a string variable?
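
One possible sketch (assuming the variable is named svar and "special" means anything other than letters, digits and spaces):

Code:
gen n_special = ustrlen(svar) - ustrlen(ustrregexra(svar, "[^A-Za-z0-9 ]", ""))
tabulate n_special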

Observations in descriptive and multivariate analyses

Hi, I am just writing to ask something about the observations in the descriptive and multivariate analyses in my paper. Do I need to make them the same across all of the tests? For example, if I have 2,000 observations in the regression, do I need to add if e(sample) when I run the descriptive analyses?
Thank you in advance!
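
If the descriptives are meant to describe the estimation sample, restricting them with if e(sample) after the regression does exactly that (a sketch with hypothetical names):

Code:
regress y x1 x2 x3
summarize y x1 x2 x3 if e(sample)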

PCA vs MCA for index with survey weights

Dear All,

I am trying to calculate a wealth index using Principal Component Analysis. The quintiles derived from it are to be used as proxy for household socioeconomic status in a regression later.

I am using household income separately as an explanatory variable--my own reasoning being wealth and income are different ideas and it is also in accordance with the literature I am following.

Now, the variables for the index are all binary except for one ordinal categorical variable.

It is suggested that if only ordinal and nominal categorical data is used, multiple correspondence analysis is the apparent method.

However, I am using survey data, and while -pca- accommodates aweights, -mca- does not.

What would be the best approach under these circumstances? Would going forward with -pca- be appropriate?

Constrained systems of sfcross regressions!

Hi everyone,

I'm trying to do something which I'm not sure is possible with Stata.

Basically, I'm trying to replicate a paper on rent control (Caudill 1993) which modeled prices of controlled rental units as the inefficient output of a production function (where the inputs are the hedonic characteristics of the units) and prices of uncontrolled units as the inefficient output of a cost function; the "frontier" level would be the equilibrium price in the absence of controls. The comparison stems from the fact that controlled rents should be lower than the hypothetical equilibrium price in the absence of controls (lower than the "frontier"), while uncontrolled rents should be higher.

My dataset contains rent and housing characteristics on a sample of housing units, and contains both controlled and uncontrolled units.
So I need to estimate a stochastic frontier production function and a stochastic frontier cost function simultaneously on different groups of my dataset (but the Y variable and the X variables are the same!), imposing coefficients of the two models to be equal. The dependent variable Y is the rent and the independent variables X1 X2 X3 X4 are characteristics of the units. Units are controlled if C=1 and uncontrolled if C=0.
So to estimate the separate frontiers for the two groups (the production SF for controlled units and cost SF for uncontrolled) I'd do:

Code:
sfcross Y X1 X2 X3 X4 if c==1

sfcross Y X1 X2 X3 X4 if c==0, cost
Is there a way to estimate the two frontiers simultaneously, imposing the beta coefficients for the two models to be equal?

Or, as a second-best option, is there a way to estimate the first model first, and then estimate the second imposing the coefficients of the first one?

Hope it is clear enough.
Aurora




mixed models xtreg xtmixed

Hi All,

I have a question about xtreg and xtmixed.

If I enter the following command I get this result.

xtset interven2

xtreg difftot i.gender2


Random-effects GLS regression Number of obs = 85
Group variable: interven2 Number of groups = 3

R-sq: within = 0.0015 Obs per group: min = 13
between = 0.1840 avg = 28.3
overall = 0.0007 max = 55

Wald chi2(2) = 0.06
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.9722

----------------------------------------------------------------------------------
difftot | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
gender2 |
Female | .0294211 .151446 0.19 0.846 -.2674076 .3262497
Non-bin | -.0499999 .4162483 -0.12 0.904 -.8658316 .7658317
|
_cons | .281 .0724595 3.88 0.000 .1389819 .4230181
-----------------+----------------------------------------------------------------
sigma_u | 0
sigma_e | .55706407
rho | 0 (fraction of variance due to u_i)
----------------------------------------------------------------------------------


However if I request mle with xtreg I get the following result.

xtreg difftot i.gender2, mle

Fitting constant-only model:
Iteration 0: log likelihood = -72.359585
Iteration 1: log likelihood = -72.330535
Iteration 2: log likelihood = -72.33045

Fitting full model:
Iteration 0: log likelihood = -72.427795
Iteration 1: log likelihood = -72.067824
Iteration 2: log likelihood = -72.017598
Iteration 3: log likelihood = -72.013335
Iteration 4: log likelihood = -72.013311

Random-effects ML regression Number of obs = 85
Group variable: interven2 Number of groups = 3

Random effects u_i ~ Gaussian Obs per group: min = 13
avg = 28.3
max = 55

LR chi2(2) = 0.63
Log likelihood = -72.013311 Prob > chi2 = 0.7282

----------------------------------------------------------------------------------
difftot | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
gender2 |
Female | -.0463684 .1571585 -0.30 0.768 -.3543934 .2616566
Non-bin | -.3585661 .4490142 -0.80 0.425 -1.238618 .5214855
|
_cons | .3761473 .1436128 2.62 0.009 .0946713 .6576232
-----------------+----------------------------------------------------------------
/sigma_u | .1827952 .1268277 .0469224 .7121132
/sigma_e | .5517788 .0434 .4729487 .6437482
rho | .0988951 .1260109 .0033984 .5520696
----------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 1.44 Prob>=chibar2 = 0.115



Why do the coefficients and the sigma_u variance change?
The same thing happens with xtmixed, as can be seen in the results further below.

I suspect it has to do with the constant-only model mentioned in the first lines of the mle output above, but I don't understand how.

Also, the xtmixed results seem to mirror the xtreg results in a puzzling way: when I specify noconstant for the random effects in xtmixed, the results are identical to those from the first xtreg model above.

An explanation of why this is happening would be much appreciated.

xtmixed difftot i.gender2 || interven2:, mle

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log likelihood = -72.013312
Iteration 1: log likelihood = -72.013311

Computing standard errors:

Mixed-effects ML regression Number of obs = 85
Group variable: interven2 Number of groups = 3

Obs per group: min = 13
avg = 28.3
max = 55


Wald chi2(2) = 0.77
Log likelihood = -72.013311 Prob > chi2 = 0.6805

----------------------------------------------------------------------------------
difftot | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
gender2 |
Female | -.0463684 .1537466 -0.30 0.763 -.3477063 .2549695
Non-bin|Diff-ID | -.3585661 .4175263 -0.86 0.390 -1.176903 .4597703
|
_cons | .3761473 .1364022 2.76 0.006 .1088039 .6434906
----------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
interven2: Identity |
sd(_cons) | .1827952 .1268308 .0469209 .7121365
-----------------------------+------------------------------------------------
sd(Residual) | .5517788 .0434 .4729487 .6437482
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 1.44 Prob >= chibar2 = 0.1151


xtmixed difftot i.gender2 || interven2:, mle noconstant

Note: all random-effects equations are empty; model is linear regression

Mixed-effects ML regression Number of obs = 85

Wald chi2(2) = 0.06
Log likelihood = -72.733388 Prob > chi2 = 0.9712

----------------------------------------------------------------------------------
difftot | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
gender2 |
Female | .0294211 .1487494 0.20 0.843 -.2621224 .3209645
Non-bin|Diff-ID | -.0499999 .4088367 -0.12 0.903 -.8513052 .7513053
|
_cons | .281 .0711693 3.95 0.000 .1415107 .4204894
----------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
sd(Residual) | .5693547 .0436675 .4898902 .6617091
------------------------------------------------------------------------------


Thanks in advance,

Don

Different markers with rcapsym

Hi,

I would like to have two different markers for the starting points and the end points. For example, the 07 starting score should be a hollow circle and the 18 end score a solid circle. The code that I have right now gives me the same circle at both ends:

twoway (rcapsym v07 v18 indic, lwidth(thick) msize(medlarge) msymbol(o)) if ccode == 679, ylabel(-4(1)4, labsize(huge)) ymtick(, labsize(vlarge)) xlabel(, labsize(huge)) xmtick(, labsize(vlarge)) title(Yemen, size(huge)) xsize(1.5) ysize(4)


However, I have tried different approaches, such as adding more marker specifications to msymbol(), and it does not work:

twoway (rcapsym v07 v18 indic, lwidth(thick) msize(medlarge) msymbol(o oh)) if ccode == 679, ylabel(-4(1)4, labsize(huge)) ymtick(, labsize(vlarge)) xlabel(, labsize(huge)) xmtick(, labsize(vlarge)) title(Yemen, size(huge)) xsize(1.5) ysize(4)


Is there even a way to do this?
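One possible workaround (an untested sketch, on the assumption that -rcapsym- applies a single msymbol() to both ends) would be to draw the spikes with -rcap- and overlay one -scatter- per endpoint, each with its own marker:

Code:
* rcap draws the capped spikes; each scatter adds one endpoint's marker
twoway (rcap v07 v18 indic if ccode == 679, lwidth(thick)) ///
       (scatter v07 indic if ccode == 679, msymbol(Oh) msize(medlarge)) ///
       (scatter v18 indic if ccode == 679, msymbol(O) msize(medlarge)), ///
       ylabel(-4(1)4, labsize(huge)) title(Yemen, size(huge)) ///
       xsize(1.5) ysize(4) legend(off)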

Thank you,

Steffi




Friday, June 28, 2019

Regarding the question of the spmatrix import command

I ran -spmatrix import- and it returned an error (the original post included a screenshot of the output, not reproduced here).

I built the matrix file by modifying the output of -spmatrix export-, so I am sure the matrix is correct, because I did not change its format.
I don't know why this error happens.
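For reference, the sequence is along these lines (a sketch; W, W2, and the file name are placeholders):

Code:
spmatrix export W using "W.txt", replace
* edit W.txt externally without changing its layout
spmatrix import W2 using "W.txt", replace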

Thanks for any help.

How to drop duplicate ID observation series with different variable values

Hi all,

I have a panel dataset of firms and stock returns between 2005 and 2015. However, for some observations (sorted by FirmID and Date) I have duplicates with differing closing prices.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(DailyObservation FirmID) long Date double ClosingPrice float dup_obs
675 23 16439 7.46 1
  1 23 16439 2.95 2
  2 23 16440 2.96 1
676 23 16440 7.59 2
  3 23 16441 2.95 1
677 23 16441 7.58 2
  4 23 16442 2.96 1
678 23 16442 7.75 2
679 23 16443 7.76 1
  5 23 16443 3.25 2
680 23 16446 7.84 0
681 23 16447    8 1
  6 23 16447 2.95 2
682 23 16448  7.8 0
683 23 16449 7.72 1
  7 23 16449 2.95 2
684 23 16450 7.84 0
685 23 16453 7.93 1
  8 23 16453 2.95 2
  9 23 16454 2.95 1
686 23 16454 7.83 2
 10 23 16455 2.95 1
687 23 16455 7.85 2
 11 23 16456 2.95 1
688 23 16456 7.79 2
 12 23 16457 2.95 1
689 23 16457 7.78 2
 13 23 16460 3.02 1
690 23 16460 7.68 2
691 23 16461 7.68 1
 14 23 16461 2.95 2
692 23 16462 7.77 1
 15 23 16462 2.95 2
 16 23 16463 2.95 1
693 23 16463 7.85 2
694 23 16464 7.84 1
 17 23 16464 2.95 2
695 23 16467  7.9 1
 18 23 16467 2.95 2
696 23 16468 8.09 1
 19 23 16468 2.93 2
697 23 16469 8.07 1
 20 23 16469  3.1 2
 21 23 16470    3 1
698 23 16470 7.96 2
 22 23 16471 3.01 1
699 23 16471 8.02 2
700 23 16474 8.28 0
701 23 16475 8.36 1
 23 23 16475 2.94 2
 24 23 16476 3.31 1
702 23 16476 8.85 2
 25 23 16477 3.35 1
703 23 16477 8.76 2
704 23 16478 8.78 1
 26 23 16478  3.3 2
 27 23 16481 3.03 1
705 23 16481 8.59 2
 28 23 16482 3.29 1
706 23 16482 8.71 2
 29 23 16483 3.02 1
707 23 16483 8.74 2
 30 23 16484 3.03 1
708 23 16484  8.8 2
709 23 16485 8.57 1
 31 23 16485  3.2 2
710 23 16488 8.51 1
 32 23 16488 3.02 2
 33 23 16489 3.02 1
711 23 16489 8.36 2
 34 23 16490 3.02 1
712 23 16490 8.26 2
713 23 16491 8.23 0
 35 23 16492 3.02 1
714 23 16492 8.38 2
715 23 16495 8.46 0
716 23 16496 8.27 1
 36 23 16496 3.02 2
717 23 16497 8.39 1
 37 23 16497 3.03 2
718 23 16498 8.23 1
 38 23 16498 3.02 2
 39 23 16499 3.05 1
719 23 16499 8.31 2
 40 23 16502 3.02 1
720 23 16502 8.41 2
 41 23 16503 3.02 1
721 23 16503 8.51 2
 42 23 16504 2.93 1
722 23 16504 8.52 2
 43 23 16505 2.95 1
723 23 16505 8.42 2
 44 23 16506  3.2 1
724 23 16506 8.62 2
725 23 16509  8.4 1
 45 23 16509  3.2 2
726 23 16510 8.04 1
 46 23 16510  3.3 2
727 23 16511 7.96 0
 47 23 16512 3.35 1
end
format %d Date
dup_obs was derived using an ado-file, with the code:
Code:
dup FirmID Date
I would like to drop one of the two duplicate series, either the series with the lower closing prices or the series that does not start with DailyObservation #1. I am having trouble coming up with the code to drop one of the series. I have tried:
Code:
dup FirmID Date, drop
However, this drops every observation whose dup_obs is not equal to 0, which causes parts of both duplicate series to be dropped.
Using code such as
Code:
drop if dup_obs!=0 & DailyObservation>2865
also does not help, because the panel is unbalanced and not all duplicate series reach such a high observation number.
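A sketch of one possibility, assuming (as in the example above) that the unwanted series always has the lower closing price on any given date:

Code:
* within each FirmID-Date pair, keep only the higher-priced observation
bysort FirmID Date (ClosingPrice): keep if _n == _N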

appending a string and a time variable

Two of the variables in my dataset need to be combined into a new variable. The first represents a date in this format (a string variable): 19Nov2019; the other a time in this format: 04:00. I need to create a new variable out of these two: the date and time of sessions. For example, values would be 19Nov2019 04:00, then 19Nov2019 05:00, etc. How should I do it? Thanks a lot in advance.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 delibdate double delibtime
"26Nov2018"   -1.8933948e+12
"26Nov2018"   -1.8934002e+12
"03Dec2018"   -1.8934056e+12
"08March2019" -1.8933957e+12
"26Nov2018"   -1.8933948e+12
"15Feb2019"   -1.8933912e+12
"15Feb2019"   -1.8933912e+12
"03Dec2018"   -1.8934056e+12
"04March2019" -1.8933966e+12
"26Nov2018"   -1.8934002e+12
"26Nov2018"   -1.8934002e+12
"03Dec2018"   -1.8933948e+12
"04March2019" -1.8933966e+12
"15Feb2019"   -1.8933912e+12
"08March2019" -1.8933957e+12
"04March2019" -1.8934011e+12
"08March2019" -1.8933912e+12
""                         .
"08March2019" -1.8933912e+12
"08March2019" -1.8933957e+12
""                         .
""                         .
"04March2019" -1.8933966e+12
"03Dec2018"   -1.8934056e+12
"04March2019" -1.8933966e+12
"03Dec2018"   -1.8934056e+12
""                         .
"04March2019" -1.8933966e+12
"15Feb2019"   -1.8933957e+12
"08March2019" -1.8933957e+12
"26Nov2018"   -1.8933948e+12
"03Dec2018"   -1.8934002e+12
"08March2019" -1.8933912e+12
"03Dec2018"   -1.8934002e+12
"08March2019" -1.8933957e+12
"03Dec2018"   -1.8933948e+12
""                         .
""                         .
""                         .
"26Nov2018"   -1.8934056e+12
"26Nov2018"   -1.8934056e+12
"08March2019" -1.8933912e+12
"15Feb2019"   -1.8933957e+12
"26Nov2018"   -1.8934056e+12
""                         .
"26Nov2018"   -1.8934002e+12
"03Dec2018"   -1.8934056e+12
"26Nov2018"   -1.8934002e+12
"04March2019" -1.8934011e+12
"26Nov2018"   -1.8934056e+12
"04March2019" -1.8933966e+12
""                         .
"08March2019" -1.8933912e+12
"04March2019" -1.8933966e+12
"26Nov2018"   -1.8933948e+12
""                         .
""                         .
"03Dec2018"   -1.8933948e+12
"15Feb2019"   -1.8933912e+12
"26Nov2018"   -1.8933948e+12
""                         .
"26Nov2018"   -1.8933948e+12
""                         .
"26Nov2018"   -1.8934056e+12
"15Feb2019"   -1.8933912e+12
""                         .
"03Dec2018"   -1.8934056e+12
"03Dec2018"   -1.8933948e+12
"03Dec2018"   -1.8934002e+12
""                         .
""                         .
"04March2019" -1.8933966e+12
"04March2019" -1.8934011e+12
"08March2019" -1.8933957e+12
"15Feb2019"   -1.8933957e+12
"26Nov2018"   -1.8933948e+12
""                         .
""                         .
"15Feb2019"   -1.8933957e+12
"04March2019" -1.8934011e+12
"15Feb2019"   -1.8933957e+12
""                         .
""                         .
"26Nov2018"   -1.8934002e+12
"26Nov2018"   -1.8933948e+12
"03Dec2018"   -1.8933948e+12
"08March2019" -1.8933957e+12
"26Nov2018"   -1.8934056e+12
""                         .
"26Nov2018"   -1.8933948e+12
"03Dec2018"   -1.8933948e+12
"03Dec2018"   -1.8934056e+12
"04March2019" -1.8934011e+12
"03Dec2018"   -1.8934056e+12
"15Feb2019"   -1.8933912e+12
end
format %tchh:MM delibtime
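A sketch of one approach, assuming delibtime is a clock (%tc) value whose only meaningful parts are the hour and minute (as the %tchh:MM display format suggests): convert the string date to a daily date, lift the hour and minute out of delibtime, and add the pieces up into a single datetime.

Code:
* combine the string date and the clock's hour/minute into one %tc value
gen double sessiontime = cofd(date(delibdate, "DMY")) ///
    + msofhours(hh(delibtime)) + msofminutes(mm(delibtime))
format sessiontime %tcDDmonCCYY_HH:MM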

Error code r(601)

Hello,

I am trying to import an Excel file into Stata but I receive error r(601), "file not found". I am using the correct syntax and path. I actually used the same syntax and path earlier this week and it worked perfectly. Why would it suddenly start giving me this error?
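Since r(601) means Stata could not find the file, a quick diagnostic (the path below is a placeholder) is to check the file and the current working directory directly:

Code:
* does Stata see the file where I think it is?
confirm file "C:\data\myfile.xlsx"
* and which directory is Stata actually working in?
pwd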

Problems with -collapse- command

Hi all,

I have a problem with the -collapse- command. In the code below, no matter whether my two-category variable takes the values (0 and 1), (1 and 2), or any other pair of numbers, the collapsed result only ever contains one of the two categories (only 0, if my two categories are 0 and 1).

To explain better: suppose I have two variables X and Z, where Z is a numeric categorical variable with two values, say 2 and 3. When I run
collapse X, by(z)
the result contains only one value of Z, namely 2. I do not understand why.

I should have obtained results for both Z = 2 and Z = 3.
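For reference, a minimal sketch of how -collapse- should behave with a two-category by() variable (note also that Stata variable names are case sensitive, so by(z) and by(Z) refer to different variables):

Code:
* minimal example: collapse keeps one row per value of the by() variable
clear
input float X byte Z
1 2
2 2
3 3
4 3
end
collapse (mean) X, by(Z)
list    // expect two rows: Z=2 and Z=3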

My code is the following

Code:
* Creamos los labels para los loops
local trat_agrup agrupado
local trat_byciu byciudad

* Create the labels for the loops // EMPLOYMENT //
preserve
use "$data\labels_var.dta", replace
rename name_var variables
keep variables
duplicates drop variables, force
levelsof variables, local(variables)
disp `variables'
count    
restore

* Group the treatments
foreach base in `trat_agrup' `trat_byciu' {

* Load the dataset
    quietly use "$data\EMPSAL_synth_byciudad.dta", clear 
    quietly joinby cvemun using "$data/SYNTH_cvemun_`base'_weight.dta", unmatched(both)
    quietly tab _merge, mis
    
    quietly keep emp_* salprom_* salmed_* year month Weight_* norte_tax cvemun

* Split off the treated units
    quietly preserve
    quietly keep if norte_tax==1
    quietly collapse (mean) emp_* salprom_* salmed_* (sum) Weight_* (first) norte_tax, by(year month)
    quietly tempfile base_did_1
    quietly save `base_did_1', replace
    quietly restore

* Combine the control units with the aggregate of the treated units
    quietly preserve
    quietly keep if norte_tax==0
    quietly tempfile base_did_2
    quietly save `base_did_2', replace
    quietly restore

    quietly use `base_did_2', clear
    quietly append using `base_did_1'
    quietly replace cvemun=33000 if cvemun==.

* Regenerate the index for the variables
foreach var in `variables' {
            quietly gen Log`var'=log(`var')
            quietly gen byte baseyearmonth=1 if (year==2018 & month==9)
            quietly by cvemun (baseyearmonth), sort: gen Index`var' = Log`var' - Log`var'[1]
            quietly drop baseyearmonth
}    

    quietly drop norte_tax
    quietly gen norte_tax=(cvemun==33000)

    
* Create the labels for the loops // EMPLOYMENT //
    quietly preserve
    quietly use "$data\labels_var.dta", replace
    quietly keep if empsal=="emp"
    quietly rename name_var variables
    quietly keep variables
    quietly duplicates drop variables, force
    quietly levelsof variables, local(variables1)
    quietly disp `variables1'
    quietly count    
    quietly restore

foreach var in `variables1' {

* Generate only the trends for the control and treatment groups
            quietly preserve

            quietly collapse (mean) Index`var' [aw=Weight_`var'], by(year month norte_tax)
        
            quietly reshape wide Index`var',  i(year month) j(norte_tax)
        
            quietly gen date = ym(year,month)
            quietly tsset  date, monthly
            quietly rename (date Index`var'0 Index`var'1) (_time _Y_synthetic _Y_treated)

* Convert the estimated values to percentages
            quietly replace _Y_treated=_Y_treated*100 + 100
            quietly replace _Y_synthetic=_Y_synthetic*100 +100
            quietly gen dif=_Y_treated - _Y_synthetic
        
            quietly twoway     (line _Y_treated _time, lwidth(medthick) lpattern(dash) lcolor(green) sort)  //// 
                            (line _Y_synthetic _time, lwidth(medthick) lpattern(solid) lcolor(blue) sort) ////
                            (line dif _time, lwidth(medthick) lpattern(shortdash) lcolor(black) sort yaxis(2)), ////
                             xtitle("", /*margin(medium) height(12)*/ size(medsmall)) ////
                             xlabel(`=tm(2015m1)'(4)`=tm(2019m10)', angle(45) grid glwidth(medthin) glpattern(dash) labsize(medsmall)) ////
                             xline(`=tm(2019m1)', lwidth(medthin) lpattern(dash) lcolor(red)) ////
                             ytitle("Índice (%)", margin(medium) /*height(12)*/ size(medsmall)) ////
                             ylabel(70(10)125, angle(0) /*format(%03.2f)*/ grid glwidth(medthin) glpattern(dash) labsize(medsmall)) ////
                             ytitle("Diferencia T - C", axis(2) margin(medium) /*height(12)*/ size(medsmall)) ////
                             ylabel(-4(4)16, axis(2) angle(0) /*format(%03.2f) grid glwidth(medthin) glpattern(dash)*/ labsize(medsmall)) ///
                             yline(0, axis(2) lwidth(medthin) lpattern(solid) lcolor(gs9)) ///
                             text(125 `=tm(2019m1)' "Incremento" "ene. 2019", size(medium) color(grey) place(w)) ////
                             legend(order(1 "Tratamiento (Municipios Zona Norte)" ///
                                           2 "Control sintético (Municipios NO Zona Norte)" ///
                                           3 "Tratamiento - Control") row(3) size(medsmall) symxsize(*0.6) /*span region(lcolor(white))*/) ////
                             title("Variable: `var'", margin(medlarge)) ////
                             subtitle("`base'", height(-12)) ///
                             caption("NOTA1: Índice con base en septiembre 2018" ///
                                     "NOTA2: Estimaciones hasta mes de Mayo" size(small))       ////             
                             graphregion(fcolor(white))
            quietly graph export "$graph\synth_manu`base'_`var'.emf", replace font("Times New Roman")
            quietly restore
}
}
I do not know how to post a dataset to Statalist; I would appreciate an explanation of how to share one (for example, with -dataex-).

Thanks,

Alexis Rodas

power analysis after ttest

I'm trying to feed the means and standard deviations from a two-sample ttest into power twomeans, but the power command seems to accept only typed-in numbers, not r()-style references to results. Am I doing something wrong, or is this a limitation of the power command?

Here is what I tried:

run the ttest
. ttest s_wellbeing if gysj_inexper , by (gysj_yn)
... output deleted

confirm that I can retrieve means and standard deviations using r() notation
. dis r(mu_1) " " r(mu_2) " " r(sd_1) " " r(sd_2)
3.3537638 3.4037035 1.0217724 .9781072

power returns an error when I use the results from ttest in r() form
power twomeans r(mu_1)r(mu_2), sd1(r(sd_1)) sd2(r(sd_2))
means must contain numbers
r(198);


But it works fine if I type in the numbers directly
power twomeans 3.3537638 3.4037035, sd1(1.0217724) sd2(.9781072)

Performing iteration ...
... results deleted
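A sketch of the usual workaround: since power expects literal numbers, store the r() results in local macros right after the ttest and substitute them into the command.

Code:
ttest s_wellbeing if gysj_inexper, by(gysj_yn)
* capture the r() results before another command overwrites them
local m1 = r(mu_1)
local m2 = r(mu_2)
local s1 = r(sd_1)
local s2 = r(sd_2)
power twomeans `m1' `m2', sd1(`s1') sd2(`s2')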

xtqreg and reghdfe

Hello,

My question can be summarized as: why do xtqreg and reghdfe produce similar results when my data set has large outliers at both ends?

When I just use reghdfe with firm and year fixed effects, the estimated coefficients do not make sense at all due to the large outliers; they are far too big. Thus, I decided to estimate a quantile regression with fixed effects so that the estimates are less sensitive to large outliers.

I understand that qregpd and xtqreg estimate very different models, so they should produce different estimates. When I use qregpd, the scale of the estimates is sensible. However, when I use xtqreg, the scale of the estimates looks very similar to the ones produced by reghdfe. I am a little confused, because the median and the mean of my data are very different, so I expected xtqreg to produce different results from reghdfe. Could anyone help me understand why this might be happening?

Thank you so much in advance!

Sincerely,
Soyoung

How to control for something within a Diff in Diff Graph

Hello everybody,

I am currently trying to create the usual diff-in-diff event-study graph, e.g.:

[example graph omitted from the original post]

My setting:
  • I have panel data on about 4 million German companies.
  • I regress the yearly log total assets growth of those companies on a dummy for a business tax levy increase in the municipality where the firm is located. The dummy is one if the municipality increased the business tax levy in that year.
  • I expect a negative effect on a firm's log total assets growth in the year of the municipal tax increase and in the years after it.
  • I have 16 years, 11,000 municipalities, and 1,500 tax increase events, so I have to standardize the x-axis to event time instead of calendar years.
I receive the following results in my regression:

[regression results table omitted from the original post]

Using the following Code:

Code:
//Firm  and state  control variables
    xtset bvd_id year
    local independent "F3.hebesatzIncreaseDummy F2.hebesatzIncreaseDummy F1.hebesatzIncreaseDummy hebesatzIncreaseDummy L1.hebesatzIncreaseDummy L2.hebesatzIncreaseDummy L3.hebesatzIncreaseDummy"
    local firmControls "L1.assets_total_million L1.ratio_leverage L1.ratio_leverage_change age" // assets_total_log age
    local stateControls "L1.gspGrowthRate L1.gspGrowthRate_change L1.unemploymentRate L1.unemploymentRate_change" //i.municipalAssetsTotalQuantil population 
        
//Regression    
    qui eststo spezifikation1: reg growth_assets `independent' `firmControls' `stateControls', vce(cluster statekz)
    qui eststo spezifikation2: reghdfe growth_assets `independent' `firmControls' `stateControls', absorb(i.year i.industry_sic_2_digit) vce(cluster statekz)
    qui eststo spezifikation3: reghdfe growth_assets `independent' `firmControls' `stateControls', absorb(i.year##i.industry_sic_2_digit) vce(cluster statekz)
    qui eststo spezifikation4: reghdfe growth_assets `independent' `firmControls' `stateControls', absorb(i.year##i.industry_sic_2_digit i.municipalId) vce(cluster statekz)

//Regression output    
    esttab spezifikation1 spezifikation2 spezifikation3 spezifikation4, b("%-8.5f") t ///
    stats(N r2_a, labels("N" "Adj. R-Square") fmt("%-8.0fc" "%-8.3f")) ///
    varwidth(22)  ///
    nonumbers mtitles("No FE" "Year Ind" "Year##Ind" "Year##Ind Mun" "Model 5" "Model 6" "Model 7" "Model 8") ///
    nonotes addnote("t-values werden in Klammern angegeben.")


But somehow my Diff in Diff graph looks like this:

[diff-in-diff graph omitted from the original post]

I am trying to understand why the mean log total assets growth of the control group also decreases after the event. I am using the following code to generate the graph. I am new to Stata, so please forgive me if there are more effective ways to achieve the same result (and please tell me if there are).

Code:
foreach group in 1 0 {
    foreach time in "L5" "L4" "L3" "L2" "L1" "L0" "F1" "F2" "F3" "F4" "F5" { 
       qui: sum `time'.growth_assets if hebesatzIncreaseDummy == `group', de
        scalar group`group'Time`time' = r(mean)
    }    
}

//Clear Dataset
drop _all

//Create Graph Dataset out of Scalars
set obs 22

gen treated = 0
gen eventtime = 0
gen growth_assets_mean = 0

scalar obs = 1
foreach group in 1 0 {
    foreach time in "L5" "L4" "L3" "L2" "L1" "L0" "F1" "F2" "F3" "F4" "F5" {

        replace treated = `group' in `=obs'
        
        replace eventtime = real(substr("`time'", 2, 2)) in `=obs'
        if (substr("`time'", 1, 1) == "L") {
            replace eventtime = eventtime *-1 in `=obs'
        }
            
        replace growth_assets_mean = group`group'Time`time' in `=obs'
        
        scalar obs = `=obs' + 1
    }    
}

//Graph
twoway (line growth_assets_mean eventtime if treated == 1) (line growth_assets_mean eventtime if treated == 0), legend(lab (1 "Treated firms") lab(2 "Non treated firms")) ylabel(#6) xlabel(#11, grid)
I think I have to somehow control for calendar years, since most of the tax increases happened in more recent years and mean log total assets growth also declined in more recent years, but I don't know how to do that.
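A sketch of one way to net out calendar-year effects before plotting, assuming reghdfe's residuals() option: residualize growth on year dummies and compute the event-time means from the residuals rather than from the raw variable.

Code:
* residualize growth on year fixed effects; then use growth_assets_res
* in place of growth_assets in the -sum- loop above
reghdfe growth_assets, absorb(year) residuals(growth_assets_res)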

I would be very thankful for any help!

Best regards,
Andres