Wednesday, June 30, 2021

replicating an instrumented model in eregress and ivreg2 (or ivregress)

Hi Stata people,

I would like to replicate a model I am using in eregress (Stata 15 and up) with ivreg2 (the Baum et al. user-written command from SSC).
I am using Stata 16.
This is all just for robustness checks of the eregress model, so I wanted to run the ivreg2 for the post-estimation diagnostics (which are fantastic, by the way). However, I would be happy with a solution in which the results of eregress = ivregress.

The model I *actually* am using in eregress is this:
Code:
eregress outcome_rate i.category##i.moderator controldummy growthrate, endog(moderator = c.instrument i.category controldummy growthrate, nomain) vce(robust)
The complication is that my binary endogenous variable (var name = "moderator") is also used in an interaction with a categorical variable with 5 levels.
In the simulated data below, there are 3 possible categories and I don't have enough instruments. In the real data, things are worse because I have 5 possible values for the categorical variable and would need around 10 instruments.

As a workaround, I created a brute force interaction so that I wouldn't need 8 extra instruments (yes, I know that is not how one should handle interactions, but I ask that you please keep playing along here).
THE PROBLEM is that I can't get the two commands to give the same results with the interaction.

Starting with a plain model, these give equivalent results (coefficients are nearly identical):
Code:
ivreg2 outcome_rate i.category (moderator = c.instrument) controldummy growthrate, robust
Code:
eregress outcome_rate i.category i.moderator controldummy growthrate, endog(moderator = c.instrument i.category controldummy growthrate, nomain) vce(robust)
Then, I create the fake-interaction term:
Code:
gen interaction= category*moderator
And introduce it into the models:
Code:
ivreg2 outcome_rate   i.category ( moderator interaction  =  c.instrument   c.instrument#interaction ) controldummy growthrate , robust
(note that 'category' is the direct effect, while 'interaction' is actually category*moderator)

In eregress, however, I either have to include the whole interaction or leave it out, whereas what I want is to put just the endogenous part of the interaction into the endog() part of the model.
None of these next models is equivalent to the ivreg2:
Code:
eregress outcome_rate i.category moderator interaction controldummy growthrate, endog(moderator = c.instrument i.category interaction controldummy growthrate, nomain) vce(robust)
Code:
eregress outcome_rate i.category moderator interaction controldummy growthrate, endog(moderator = c.instrument i.category controldummy growthrate, nomain) vce(robust)
Code:
eregress outcome_rate i.category moderator interaction controldummy growthrate, endog(moderator = c.instrument i.category interaction controldummy growthrate, nomain) endog(interaction = c.instrument i.category controldummy growthrate, nomain) vce(robust)


Help! I'm hoping the Statalisters might see something I have not---either how to make ivreg2 = eregress, OR how to get around the lack of instruments when you have an endogenous variable that must be interacted with a 5-level category.

Thanks so much!
-Renee


Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float outcome_rate byte(category moderator controldummy) float(growthrate instrument)
  .0512572 1 0 0  -1.158599  1.200202
 1.4415318 2 1 1 -1.6130273  .8802252
  .4398367 2 0 0 -1.5614135 1.1793866
 2.0684404 1 0 1 -1.1937082  .9026076
.020747656 2 0 0  -.8504853  .8710138
 .04264894 2 0 0 -1.1334147  .8710138
 2.1645558 1 0 0  -.7279454  .8710138
-1.0435052 1 1 0  -.9729756   1.18916
 1.2788826 2 0 0  .25533304 1.2002015
  .4057586 2 0 0  -1.416769  .9069632
 1.1645558 1 0 0  -.7279454  .8710138
 1.0435052 1 1 0  -.9729756   1.18916
  .2788826 1 1 1  .25533304 1.2002015
 -.4057586 2 1 1  -1.416769  .9069632
  .0512572 3 1 0  -1.158599  1.200202
 1.4415318 3 1 1 -1.6130273  .8802252
  .4398367 3 0 1 -1.5614135 1.1793866
 2.0684404 3 1 1 -1.1937082  .9026076
.020747656 3 1 0  -.8504853  .8710138
 .04264894 3 0 0 -1.1334147  .8710138
end
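A minimal sketch of the brute-force route with properly matched instruments, assuming one hand-built instrument per non-base category level is acceptable (variable names follow the simulated data above):

Code:
* one endogenous interaction and one excluded instrument per non-base category
forvalues k = 2/3 {
    gen mod_cat`k'  = moderator  * (category == `k')
    gen inst_cat`k' = instrument * (category == `k')
}
ivreg2 outcome_rate i.category controldummy growthrate ///
    (moderator mod_cat2 mod_cat3 = instrument inst_cat2 inst_cat3), robust

With 5 categories in the real data, the loop runs over 2/5, giving a just-identified model from a single underlying instrument.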

Stata 17 ptrends

Hi, I'm a student working on a diff-in-diff using Stata 17's new didregress. I used the postestimation command estat ptrends and it confirmed that I had pre-treatment parallel trends. My question is how that could be determined using an F test. Please help me!
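For intuition, a hand-rolled analogue of that test fits group-specific linear trends on the pre-treatment observations and F-tests whether the slopes coincide; y, treat, and time below are hypothetical names, and 2015 stands in for the treatment year:

Code:
* group-specific pre-treatment trends; the F test asks whether the slopes differ
reg y i.treat##c.time if time < 2015
test 1.treat#c.time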

How to create a two-axis graph by using coefficient plotting?

When I run this regression for coefficient plotting purposes

Code:
reghdfe wNET_PRO_MAR dkt_4 dkt_3 dkt_2 dkt_1 dkt_0 dkt1 dkt2 dkt3 dkt4  pt wFIRM_SIZE LNGDP UNEMPLOYMENT IMP_OVER_GDP INFLATION wLEVERAGE, a(TYPE2 yr)
(dkt_4 is at year -4, dkt_3 is at year -3,...,dkt_0 is the adoption year...,dkt4 is at year 4 around the adoption year)

I have the result as below: [image attachment]

Could you please let me know if there is any code to plot the coefficients of dkt_4 dkt_3 dkt_2 dkt_1 dkt_0 dkt1 dkt2 dkt3 dkt4 on a two-axis graph, where the vertical axis is the magnitude of the coefficients and the horizontal axis is the year, similar to the figure below: [image attachment]
Following this website ( http://repec.sowi.unibe.ch/stata/coe...g-started.html ), I tried to run the code

Code:
coefplot, vertical yline(0) drop(_cons wFIRM_SIZE LNGDP UNEMPLOYMENT INFLATION wTANGIBILITY wLAG_SAL_GRO FCF wLOG_MVE wCAP_INT REA_INT_RAT wPROFIT wLEVERAGE wINVESTMENT wDIVIDEND median_mtbv Herfindahl)

The result looks like this: [image attachment]
I am wondering how to change the names of the variables on the horizontal axis, for example dkt_4 to -4, or dkt4 to 4.
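A sketch using coefplot's coeflabels() option, which relabels coefficients on the axis (keep() restricts the plot to the event-time dummies):

Code:
coefplot, vertical yline(0) keep(dkt_4 dkt_3 dkt_2 dkt_1 dkt_0 dkt1 dkt2 dkt3 dkt4) ///
    coeflabels(dkt_4 = "-4" dkt_3 = "-3" dkt_2 = "-2" dkt_1 = "-1" dkt_0 = "0" ///
               dkt1 = "1" dkt2 = "2" dkt3 = "3" dkt4 = "4")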

Outfile to raw without approximating values of data

Hi, how can I export data into a raw file preserving the original values of large numbers?

I am using outfile with the comma and wide options, and large numeric values in my data change. For example, the value 14543102 changes to 14500000.

Thanks!
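A sketch of the likely fix: outfile writes numbers using their display format, and the default %9.0g format rounds large integers on output even though the stored values are intact. myvar and mydata.raw are hypothetical names:

Code:
* widen the display format so outfile writes every digit
format myvar %15.0f
outfile using mydata.raw, comma wide replace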

How to copy an output table from Stata to Excel in cell-by-cell format?

When running this code,

Code:
esttab m1 m2 m3 m4 m5 m6,star(* 0.1 ** 0.05 *** 0.01) ar2 title(" dependent variable ") mtitle("w/o contr" "f&c_contr" "ind_contr" "ind*year_fe" "reg*year_fe" "non_US firms") keep(pt wFIRM_SIZE LNGDP UNEMPLOYMENT IMP_OVER_GDP INFLATION wLEVERAGE median_mtbv Herfindahl)

I got the result as below

[image attachment]


I am wondering how to copy and paste it into the Excel worksheet cell by cell; that is, the table in Excel should have 7 columns and 30 rows. I read this document about esttab (http://repec.org/bocode/e/estout/hlp_esttab.html) and tried options like csv, plain, and fixed, but it does not work: with these options (or the default smcl), the whole output of one table row lands in a single Excel cell when pasted.
For example, pt -0.0108*** -0.00383*** -0.00372*** -0.00336*** 0.000616 -0.00206 is displayed in one cell.
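A sketch of one workaround: writing the table directly to a .csv file with the csv option, rather than copying from the Results window, usually gives one Excel cell per table cell (results.csv is a hypothetical filename):

Code:
esttab m1 m2 m3 m4 m5 m6 using results.csv, csv replace ///
    star(* 0.1 ** 0.05 *** 0.01) ar2 title(" dependent variable ") ///
    mtitle("w/o contr" "f&c_contr" "ind_contr" "ind*year_fe" "reg*year_fe" "non_US firms") ///
    keep(pt wFIRM_SIZE LNGDP UNEMPLOYMENT IMP_OVER_GDP INFLATION wLEVERAGE median_mtbv Herfindahl)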


Neighboring States

Hello Statalisters,

I am running into a problem with some code. I am writing a policy diffusion paper and need to create a variable that codes neighboring states as 1 (0 if not a neighboring state). Someone (thankfully) has already done this; however, I keep getting an r(198) error when I attempt to use it. A data example is below:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long state_abbrv float state_fips long state_name str2 stateabbr str20 statename
2 1 1 "AL" "Alabama"
2 1 1 "AL" "Alabama"
2 1 1 "AL" "Alabama"
2 1 1 "AL" "Alabama"
2 1 1 "AL" "Alabama"
2 1 1 "AL" "Alabama"
2 1 1 "AL" "Alabama"
2 1 1 "AL" "Alabama"
2 1 1 "AL" "Alabama"
2 1 1 "AL" "Alabama"
end
label values state_abbrv state_abbrv
label def state_abbrv 2 "AL", modify
label values state_name statename
label def statename 1 "Alabama", modify
------------------ copy up to and including the previous line ------------------

Listed 10 out of 24900 observations

The code I'm using comes from this website: http://theincidentaleconomist.com/wo...th-stata-code/ (super helpful for this, by the way). However, when I enter the code I get this error:


Code:
. gen sAKn = inlist(stateabbr,”WA”)
”WA” invalid name
r(198);
This also happens if I use gen sAKn = inlist(stateabbr==”WA”). I know stateabbr is a string variable, so I cannot for the life of me figure out why this isn't working.

This happens even if I use the state_abbrv variable (although that's currently a long variable). I prefer using the state abbreviations because I don't know the state_fips codes off the top of my head. Any assistance would be greatly appreciated.
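For reference, the quotes in the copied code are typographic curly quotes (” ”) rather than straight ASCII quotes ("), and Stata does not recognise curly quotes as string delimiters, hence the r(198). Retyped with straight quotes:

Code:
gen sAKn = inlist(stateabbr, "WA")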

Curiosity about the default symbols of Stata?

Today when I read about the esttab command (http://repec.org/bocode/e/estout/hlp_esttab.html), I saw the star options:
[image attachment: the help file's default is star(* 0.05 ** 0.01 *** 0.001)]
I know how to convert it to * for p<0.1, ** for p<0.05, *** for p<0.01 (the commonly used p-value cutoffs for significance levels). But I am wondering why the default was set like that; is there anything important that I missed?

Testing Differences in Pearson's Correlations in Survey Data

Hello everyone:

I am trying to compare the correlation between the same two numerical variables (x1 and x2) in two different populations (n1 and n2) in NHANES data (survey data). I have been using corr_svy to calculate the correlations, but I want to test whether the correlation between x1 and x2 differs significantly in n1 as opposed to n2. I have seen the user command cortesti, but it does not take weights. Is it necessary to use weights when comparing these correlations? I also see the user command corrcii (which does take weights); I could examine whether the confidence intervals overlap, but I am wondering if it is possible to get an actual p-value. Does anyone have any thoughts or suggestions?

I appreciate any assistance, thank you. I am using Stata 16.1.
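As a rough sketch that ignores the survey design, the classic Fisher z test for two independent correlations does yield a p-value; the correlations and (effective) sample sizes below are made-up placeholders:

Code:
* r1 = .45 on n1 = 500 vs r2 = .30 on n2 = 650 (hypothetical values)
scalar z = (atanh(.45) - atanh(.30)) / sqrt(1/(500-3) + 1/(650-3))
display "z = " z ", two-sided p = " 2*normal(-abs(z))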

Subsetting a single CSV file into multiple CSV files

Hello,

I am trying to subset one CSV file into multiple CSV files. I wrote the code below using preserve and restore for one CSV file with 1150 records.

Code:
import delimited "10_Exercise session.csv", clear
count
scalar file_num = ceil(r(N)/100)

preserve
drop if _n>100
export delimited "10_Exercise session1.csv", replace
restore

preserve
keep if _n > 100 & _n <= 200
export delimited "10_Exercise session2.csv", replace
restore

...

preserve
keep if _n > 1100
export delimited "10_Exercise session12.csv", replace
restore
However, I have other CSV files with different numbers of records (e.g., 500 or 700 records), so the number of output CSV files will differ from file to file.
I tried to store r(N) in a scalar and then use a foreach loop, but failed. I don't think I have enough coding skill for this yet, so I would appreciate advice on how to write code for the task.

Thank you.
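A sketch of the looped version, using count to put the number of records in r(N) (the filename is the one from the post):

Code:
import delimited "10_Exercise session.csv", clear
count
local nfiles = ceil(r(N)/100)
forvalues i = 1/`nfiles' {
    preserve
    keep if _n > (`i'-1)*100 & _n <= `i'*100
    export delimited "10_Exercise session`i'.csv", replace
    restore
}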



Lags in Bayesian vars

I am on Stata 17 trying to run a Bayesian VAR. Does anyone here know how lags are determined in Bayesian VARs? I mean, how do I find the lag length for a Bayesian VAR?

Combining tables with identical variables, long format

Hey guys,

Is there an easy way to combine two tables with identical variables in a long format? I know the way to do it in R, but I'm relatively new to Stata.

something like


Code:
country year Mtradevalues Xtradevalues reporter
DE 2006 1210361 19155.856 BGR
FR 2006 1930305 23593.872 BGR
PT 2006 2396189 15831.303 BGR
HU 2006 2180976 38385.497 BGR
DE 2007 4231964 28908270 BGR
FR 2007 4605479 51528.886 BGR
PT 2007 5494027 32150.922 BGR
HU 2007 6568278 22475.737 BGR

with

Code:
country year Mtradevalues Xtradevalues reporter
DE 2006 51321281 22475.737 CAN
FR 2006 890429 16259.211 CAN
PT 2006 5754045 15831.303 CAN
HU 2006 6443940 15831.303 CAN
DE 2007 8168050 16259.211 CAN
FR 2007 1321281 22475.737 CAN
PT 2007 1598655 23593.872 CAN
HU 2007 1300962 19155.856 CAN

to


Code:
country year Mtradevalues Xtradevalues reporter
DE 2006 1210361 19155.856 BGR
FR 2006 1930305 23593.872 BGR
PT 2006 2396189 15831.303 BGR
HU 2006 2180976 38385.497 BGR
DE 2007 4231964 28908270 BGR
FR 2007 4605479 51528.886 BGR
PT 2007 5494027 32150.922 BGR
HU 2007 6568278 22475.737 BGR
DE 2006 51321281 22475.737 CAN
FR 2006 890429 16259.211 CAN
PT 2006 5754045 15831.303 CAN
HU 2006 6443940 15831.303 CAN
DE 2007 8168050 16259.211 CAN
FR 2007 1321281 22475.737 CAN
PT 2007 1598655 23593.872 CAN
HU 2007 1300962 19155.856 CAN

Many thanks in advance
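A sketch, assuming the two tables live in bgr_trade.dta and can_trade.dta (hypothetical filenames); append stacks the second dataset under the first, which is exactly the long combination shown:

Code:
use bgr_trade, clear
append using can_trade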

Improving performance of large excel tables with tabout, append - suggestions?

Dear Statalist,

I have been an avid reader of the list for years and it has helped me tremendously in many situations. I know that people are not crazy about providing feedback on user-written commands. Still, I was wondering whether anybody had any suggestions for me. I am not looking for a solution (I don't think there is a simple one), but rather for suggestions for optimisation.

I like using tabout (SSC) to create nicely formatted tables in Excel right from Stata. However, for some projects I have to create a lot of descriptive statistics tables (sometimes 100+ across multiple sheets) that ideally should be in one output file; otherwise I am afraid things get messy and version control becomes difficult.

tabout generally works great for me. However, once I start adding many tables across multiple sheets in Excel by combining loops with the tabout, append option (say, 30+ tables), it starts slowing down massively.

Has anybody else experienced similar issues when creating many tables using tabout, append? Does anybody have any suggestions for improved performance?

Smith et al (2004) Panel Unit Root Test

Hello everyone,

I would like to compare first- and second-generation panel unit root test results by applying the Smith et al. (2004) tests (Min, Max, WS), but I don't know how to apply them.
Could anyone help me with the command, please?

Thanks.

Need help generating new variable to capture 18th observation onwards for each data group

Dear all,
I have generated a new variable (Ambtemp2oC_01) to capture temperature readings above two degrees Celsius for different sensors. Now I need to generate a new variable from Ambtemp2oC_01 that captures the 18th observation onwards within each data block (that is, each run of observations before a blank gap) for each sensor. Below is a sample of the data.
Observations SerialNumber Ambtemp2oC_01
1 1
2 1
3 1
4 1 6.89
5 1 9.56
6 1 12
7 1 11.94
8 1 12.22
9 1 12.89
10 1 13.61
11 1 14.44
12 1 15.56
13 1 18.78
14 1 17.28
15 1 16.33
16 1 16.67
17 1 17.17
18 1 17.83
19 1 18.44
20 1 18.83
21 1 19.11
22 1 19.33
23 1 19.56
24 1 19.72
25 1 19.83
26 1 19.89
27 1 20
28 1 20.11
29 1 20.22
30 1 20.28
31 1 20.33
32 1 20.39
33 1 20.44
34 1 20.56
35 1 20.67
36 1 20.72
37 1 20.83
38 1 22.72
39 1 23.17
40 1 23.17
41 1 23.11
42 1 23.06
43 1 23.17
44 2 15.94
45 2 15.11
46 2 15.06
47 2 15.06
48 2 15.06
49 2 15
50 2 15
51 2 15
52 2 15
53 2 15.06
54 2 15.06
55 2 15.17
56 2 15.5
57 2 16.11
58 2 16.5
59 2 16.78
60 2 17.11
61 2 17.39
62 2 17.67
63 2 17.94
64 2 18.22
65 2 18.44
66 2 18.72
67 2 18.94
68 2 19.17
69 2 19.39
70 2
71 2
72 2
73 2
74 2 2.06
75 2 2.11
76 2 2.11
77 2 2.17
78 2 2.22
79 2 2.28
80 2 2.33
81 2 2.33
82 2 2.39
83 2 2.44
84 2 2.44
85 2 2.5
86 2 2.56
87 2 2.56
88 2 2.61
89 2 2.67
90 2 2.67
91 2 2.72
92 2 2.78
93 2 2.78
94 2 2.83
95 2 2.83
96 2 2.89
97 2 2.89
98 2 2.94
99 2 2.94
100 2 3
101 2 3
102 2 3.06
103 2 3.06
104 2
105 2
106 2
Thanks for your assistance.
Roland
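A sketch, assuming blocks are runs of non-missing Ambtemp2oC_01 within SerialNumber, ordered by Observations:

Code:
gen byte ok = !missing(Ambtemp2oC_01)
* number the runs: a new block starts at the first non-missing row after a gap
bysort SerialNumber (Observations): gen block = sum(ok & (_n == 1 | !ok[_n-1]))
* position of each reading within its block
bysort SerialNumber block ok (Observations): gen seq = _n if ok
* keep readings from the 18th observation of each block onwards
gen Ambtemp_18plus = Ambtemp2oC_01 if seq >= 18 & !missing(seq)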

Tsfill with multiple observations per id & per year

Hi dear statalist,

I have an unbalanced panel with gaps in years, with worker level information about occupation (code) and firm for several years.

A given worker may be observed in several firms within a year if she moves, and even twice in the same firm if she changes occupation, but she can only be observed once per year in a given occupation code within a given firm. Hence, I am using as a unique identifier (id) a combination of person_id, occupation, and firm.

I have some gap years and I was using tsfill, but then realized this was giving me some problems.

To illustrate why, consider my dataset as follows for a given worker who changes occupation within the same firm only, and where I have used tsfill:




Person_ID Firm_ID Estab_id year Occupation Team_id id
035 553 353 2003 1222 571254 14
2004 14
2005 14
2006 14
2007 14
2008 14
2009 14
2010 14
2011 14
2012 14
2013 14
035 553 353 2014 1222 571254 14
035 553 353 2002 2145 571267 15
2003 15
035 553 353 2004 2145 571267 15
035 553 353 2005 2145 571267 15
2006 15
2007 15
035 553 353 2008 2145 571267 15
035 553 353 2009 2145 571267 15
2010 15
035 553 353 2011 2145 571267 15
035 553 353 2012 2145 571267 15



In 2002 she was working in occupation 2145 and in 2003 changed to occupation 1222. The following year she went back to occupation 2145, up until 2014, when she returned to occupation 1222.
Because she is observed in both 2003 and 2014 in the same occupation (1222), Stata creates the missing years in between, even though we know she was employed in a different occupation in those years.
I know this happens because of the way I created the unique identifier, but since I have multiple observations per worker per year, this was the only way to create a unique id.

Any idea on how to solve this problem with tsfill?

Thanks a lot!





xtdpdsys and xtabond2 GMM results

Dear Stata community,

I read a number of posts regarding GMM, xtdpdsys, and xtabond2, but unfortunately I am not achieving the expected outcomes.
Description of the situation:
  1. I developed my GMM models using the following xtdpdsys code:
. xtdpdsys MTBV PFD CEOD PID LNBS LNTA DUMFOREIGNSTOCK DUMFOREIGNOWN NUMBLOCKHOLDERS, lags(1) maxldep(2) maxlags(1) twostep vce(robust) artests(2)
  2. Following some posts I read, I tried to obtain the Hansen test using the following xtabond2 code:
. xtabond2 L(0/1).MTBV PFD CEOD PID LNBS LNTA DUMFOREIGNSTOCK DUMFOREIGNOWN NUMBLOCKHOLDERS, gmm(MTBV, lag(2 3)) iv(PFD CEOD PID LNBS LNTA DUMFOREIGNSTOCK DUMFOREIGNOWN NUMBLOCKHOLDERS, eq(diff)) h(2) robust twostep

In fact, I obtained the Hansen test but, despite the same number of instruments, the regression results from the two codes are quite different.
Could you please help me correct the xtabond2 code so that it reproduces the results from xtdpdsys?
Thank you in advance!

Repeated cross sections

Hi everybody,
I have a dataset where each year parties (from different countries) running for elections are evaluated and assigned a score based on their programme. So the dataset contains repeated cross sections.
I want to run a regression with year fixed effects and country fixed effects, and I am using the following code:

Code:
xtset country_num year
xi: reg score i.year i.country_num

where country_num is the variable identifying the country.
I found this suggested in an old post, but I wanted to be sure it also works with repeated cross sections (and not just with panel data). Also, if it's not the right code, which commands should I use?
Thanks in advance
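A sketch: with pooled cross sections there is nothing to xtset, and factor variables make the old xi: prefix unnecessary; clustering the standard errors by country is one common (here hypothetical) choice:

Code:
reg score i.year i.country_num, vce(cluster country_num)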


ROC curves in Stata

Hi guys,

I am quite new to the topic; however, I constructed the toy-example ROC curve displayed in the figure below. ROC curves trace the sensitivity and specificity of an estimator as its threshold is varied. Stata provides several commands for them: roctab, roccomp, rocfit, rocgold, rocreg, and rocregplot.
Nonetheless, I constructed the figure below using a simple twoway connected command (because I pre-computed the FPR and TPR in Matlab).

[image attachment]

Now the problem is that I would like to compute the Area Under the Curve (AUC) of this ROC curve (the diagonal line marks the AUC = 0.5 reference of a random classifier). How can I do that?

Thank you
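A sketch of trapezoidal integration over the pre-computed points, assuming variables fpr and tpr (hypothetical names) and that the points include the (0,0) and (1,1) endpoints:

Code:
sort fpr
gen trap = (fpr - fpr[_n-1]) * (tpr + tpr[_n-1]) / 2
summarize trap, meanonly
display "AUC = " r(sum)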

omitted because of collinearity

Dear Stata friends

I wanted to run the following regression with fixed effects:

Code:
areg Arbeitslosenrate_ln avg i.numvar if num<=224, absorb(code)
Unfortunately, Stata tells me that the coefficient is omitted because of collinearity. What can I do?


----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float code int numvar float Arbeitslosenrate_ln str2 anos2 byte anos2_n float avg_GAP_c str3 estu str2 nuts1 byte nuts1_n str2 sexo byte sexo_n float(Anzahl_Betroffene post) long estu_n float indicate
 1 214  -.8803588 "05" 5 .007879958 "1" "1" 1 "1" 1  2 0 1          0
 1 218 -1.1574527 "05" 5 .007879958 "1" "1" 1 "1" 1  2 0 1          0
 1 222  -1.609438 "05" 5 .007879958 "1" "1" 1 "1" 1  2 0 1          0
 1 226  -.9734491 "05" 5 .007879958 "1" "1" 1 "1" 1  2 0 1          0
 1 230  -.6690497 "05" 5 .007879958 "1" "1" 1 "1" 1  2 0 1          0
 1 234 -1.2039728 "05" 5 .007879958 "1" "1" 1 "1" 1  2 1 1 .007879958
 1 238 -1.6486586 "05" 5 .007879958 "1" "1" 1 "1" 1  2 1 1 .007879958
 1 240  -1.299283 "05" 5 .007879958 "1" "1" 1 "1" 1  2 1 1 .007879958
 2 214  -.8472978 "05" 5 .022296773 "1" "1" 1 "6" 6  4 0 1          0
 2 218 -1.0986123 "05" 5 .022296773 "1" "1" 1 "6" 6  4 0 1          0
 2 222  -.8556661 "05" 5 .022296773 "1" "1" 1 "6" 6  4 0 1          0
 2 226 -1.1895841 "05" 5 .022296773 "1" "1" 1 "6" 6  4 0 1          0
 2 230 -1.0033021 "05" 5 .022296773 "1" "1" 1 "6" 6  4 0 1          0
 2 234 -1.1451323 "05" 5 .022296773 "1" "1" 1 "6" 6  4 1 1 .022296773
 2 238  -.6931472 "05" 5 .022296773 "1" "1" 1 "6" 6  4 1 1 .022296773
 2 240  -.9162907 "05" 5 .022296773 "1" "1" 1 "6" 6  4 1 1 .022296773
 3 214  -.7884573 "05" 5 .015599324 "1" "2" 2 "1" 1  4 0 1          0
 3 218  -.7932307 "05" 5 .015599324 "1" "2" 2 "1" 1  4 0 1          0
 3 222 -1.1486228 "05" 5 .015599324 "1" "2" 2 "1" 1  4 0 1          0
 3 226  -.8690379 "05" 5 .015599324 "1" "2" 2 "1" 1  4 0 1          0
 3 230 -1.2039728 "05" 5 .015599324 "1" "2" 2 "1" 1  4 0 1          0
 3 234  -1.609438 "05" 5 .015599324 "1" "2" 2 "1" 1  4 1 1 .015599324
 3 238  -.9808292 "05" 5 .015599324 "1" "2" 2 "1" 1  4 1 1 .015599324
 3 240  -.9903987 "05" 5 .015599324 "1" "2" 2 "1" 1  4 1 1 .015599324
 4 214  -.6632943 "05" 5  .03912635 "1" "2" 2 "6" 6  7 0 1          0
 4 218  -.6131045 "05" 5  .03912635 "1" "2" 2 "6" 6  7 0 1          0
 4 222 -1.5260563 "05" 5  .03912635 "1" "2" 2 "6" 6  7 0 1          0
 4 226  -.7239188 "05" 5  .03912635 "1" "2" 2 "6" 6  7 0 1          0
 4 230  -.4054651 "05" 5  .03912635 "1" "2" 2 "6" 6  7 0 1          0
 4 234  -.6359887 "05" 5  .03912635 "1" "2" 2 "6" 6  7 1 1  .03912635
 4 238  -.6418539 "05" 5  .03912635 "1" "2" 2 "6" 6  7 1 1  .03912635
 4 240 -1.3217558 "05" 5  .03912635 "1" "2" 2 "6" 6  7 1 1  .03912635
 5 214  -.7884573 "05" 5  .07623795 "1" "3" 3 "1" 1  6 0 1          0
 5 218 -1.1631508 "05" 5  .07623795 "1" "3" 3 "1" 1  6 0 1          0
 5 222  -.6359887 "05" 5  .07623795 "1" "3" 3 "1" 1  6 0 1          0
 5 226 -1.1786549 "05" 5  .07623795 "1" "3" 3 "1" 1  6 0 1          0
 5 230 -1.0116009 "05" 5  .07623795 "1" "3" 3 "1" 1  6 0 1          0
 5 234 -1.0116009 "05" 5  .07623795 "1" "3" 3 "1" 1  6 1 1  .07623795
 5 238  -.7884573 "05" 5  .07623795 "1" "3" 3 "1" 1  6 1 1  .07623795
 5 240 -1.0986123 "05" 5  .07623795 "1" "3" 3 "1" 1  6 1 1  .07623795
 6 214 -1.0296193 "05" 5  .13680671 "1" "3" 3 "6" 6  9 0 1          0
 6 218  -.9808292 "05" 5  .13680671 "1" "3" 3 "6" 6  9 0 1          0
 6 222  -.6286086 "05" 5  .13680671 "1" "3" 3 "6" 6  9 0 1          0
 6 226 -1.3862944 "05" 5  .13680671 "1" "3" 3 "6" 6  9 0 1          0
 6 230 -1.0986123 "05" 5  .13680671 "1" "3" 3 "6" 6  9 0 1          0
 6 234  -1.466337 "05" 5  .13680671 "1" "3" 3 "6" 6  9 1 1  .13680671
 6 238  -.8109302 "05" 5  .13680671 "1" "3" 3 "6" 6  9 1 1  .13680671
 6 240  -.3364722 "05" 5  .13680671 "1" "3" 3 "6" 6  9 1 1  .13680671
 7 214  -.6800752 "05" 5  .04406996 "1" "4" 4 "1" 1 12 0 1          0
 7 218  -.7852625 "05" 5  .04406996 "1" "4" 4 "1" 1 12 0 1          0
 7 222  -.7716188 "05" 5  .04406996 "1" "4" 4 "1" 1 12 0 1          0
 7 226  -.8472978 "05" 5  .04406996 "1" "4" 4 "1" 1 12 0 1          0
 7 230 -1.0266387 "05" 5  .04406996 "1" "4" 4 "1" 1 12 0 1          0
 7 234 -1.2321438 "05" 5  .04406996 "1" "4" 4 "1" 1 12 1 1  .04406996
 7 238 -1.0560527 "05" 5  .04406996 "1" "4" 4 "1" 1 12 1 1  .04406996
 7 240 -1.2110903 "05" 5  .04406996 "1" "4" 4 "1" 1 12 1 1  .04406996
 8 214  -.6253246 "05" 5  .07526647 "1" "4" 4 "6" 6  9 0 1          0
 8 218  -.5925036 "05" 5  .07526647 "1" "4" 4 "6" 6  9 0 1          0
 8 222  -.6225296 "05" 5  .07526647 "1" "4" 4 "6" 6  9 0 1          0
 8 226  -.6480267 "05" 5  .07526647 "1" "4" 4 "6" 6  9 0 1          0
 8 230 -.55431074 "05" 5  .07526647 "1" "4" 4 "6" 6  9 0 1          0
 8 234  -.8407832 "05" 5  .07526647 "1" "4" 4 "6" 6  9 1 1  .07526647
 8 238  -.8708284 "05" 5  .07526647 "1" "4" 4 "6" 6  9 1 1  .07526647
 8 240    -.71562 "05" 5  .07526647 "1" "4" 4 "6" 6  9 1 1  .07526647
 9 214  -.7731899 "05" 5 .010951692 "1" "5" 5 "1" 1  5 0 1          0
 9 218  -.8556661 "05" 5 .010951692 "1" "5" 5 "1" 1  5 0 1          0
 9 222  -.8630462 "05" 5 .010951692 "1" "5" 5 "1" 1  5 0 1          0
 9 226 -1.1093076 "05" 5 .010951692 "1" "5" 5 "1" 1  5 0 1          0
 9 230   -.995428 "05" 5 .010951692 "1" "5" 5 "1" 1  5 0 1          0
 9 234 -1.1325138 "05" 5 .010951692 "1" "5" 5 "1" 1  5 1 1 .010951692
 9 238  -1.446919 "05" 5 .010951692 "1" "5" 5 "1" 1  5 1 1 .010951692
 9 240 -1.2089604 "05" 5 .010951692 "1" "5" 5 "1" 1  5 1 1 .010951692
10 214  -.8788643 "05" 5  .01166011 "1" "5" 5 "6" 6  7 0 1          0
10 218 -.59019923 "05" 5  .01166011 "1" "5" 5 "6" 6  7 0 1          0
10 222 -1.0414538 "05" 5  .01166011 "1" "5" 5 "6" 6  7 0 1          0
10 226  -.7817006 "05" 5  .01166011 "1" "5" 5 "6" 6  7 0 1          0
10 230  -.8415672 "05" 5  .01166011 "1" "5" 5 "6" 6  7 0 1          0
10 234 -1.1592369 "05" 5  .01166011 "1" "5" 5 "6" 6  7 1 1  .01166011
10 238 -1.2750688 "05" 5  .01166011 "1" "5" 5 "6" 6  7 1 1  .01166011
10 240  -.8754688 "05" 5  .01166011 "1" "5" 5 "6" 6  7 1 1  .01166011
11 214 -.56661695 "05" 5 .006574537 "1" "6" 6 "1" 1  7 0 1          0
11 218  -.5630941 "05" 5 .006574537 "1" "6" 6 "1" 1  7 0 1          0
11 222  -.5887441 "05" 5 .006574537 "1" "6" 6 "1" 1  7 0 1          0
11 226  -.7330385 "05" 5 .006574537 "1" "6" 6 "1" 1  7 0 1          0
11 230   -.827625 "05" 5 .006574537 "1" "6" 6 "1" 1  7 0 1          0
11 234  -.8729403 "05" 5 .006574537 "1" "6" 6 "1" 1  7 1 1 .006574537
11 238  -.9966131 "05" 5 .006574537 "1" "6" 6 "1" 1  7 1 1 .006574537
11 240  -1.377926 "05" 5 .006574537 "1" "6" 6 "1" 1  7 1 1 .006574537
12 214  -.4133083 "05" 5 .009388856 "1" "6" 6 "6" 6  4 0 1          0
12 218  -.4388015 "05" 5 .009388856 "1" "6" 6 "6" 6  4 0 1          0
12 222  -.6099968 "05" 5 .009388856 "1" "6" 6 "6" 6  4 0 1          0
12 226  -.4836299 "05" 5 .009388856 "1" "6" 6 "6" 6  4 0 1          0
12 230  -.7595223 "05" 5 .009388856 "1" "6" 6 "6" 6  4 0 1          0
12 234   -.674798 "05" 5 .009388856 "1" "6" 6 "6" 6  4 1 1 .009388856
12 238  -.8167611 "05" 5 .009388856 "1" "6" 6 "6" 6  4 1 1 .009388856
12 240  -.8397506 "05" 5 .009388856 "1" "6" 6 "6" 6  4 1 1 .009388856
13 214  -.7457909 "05" 5 .013999105 "1" "7" 7 "1" 1  2 0 1          0
13 218   -.645138 "05" 5 .013999105 "1" "7" 7 "1" 1  2 0 1          0
13 222  -.6567796 "05" 5 .013999105 "1" "7" 7 "1" 1  2 0 1          0
13 226  -.6678293 "05" 5 .013999105 "1" "7" 7 "1" 1  2 0 1          0
end
format %tq numvar
label values estu_n estu_n
label def estu_n 1 "1", modify
------------------ copy up to and including the previous line ------------------
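A diagnostic sketch, assuming avg corresponds to the avg_GAP_c shown in the data extract: a regressor that never varies within the absorbed groups is collinear with the fixed effects, which would explain the omission:

Code:
* is the regressor constant within each value of code?
bysort code (avg_GAP_c): gen byte varies_within = avg_GAP_c[1] != avg_GAP_c[_N]
tab varies_within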

I would be very thankful if someone could help me. Have a great day.

Felix

Change in the sign of a variable in a regression output

Dear all,

I am currently working on a regression. My aim is to assess whether ESG scores affect the cost of bank loans (the variable "AllinDrawn", which is the spread of the loan interest rate over LIBOR plus any annual (or facility) fee paid to the lender, measured in basis points per dollar drawn). Below you can find my model:

AllinDrawn_{i,t} = f(ESGEnvironmentalPillarScore_{i,t-1}, ESGSocialPillarScore_{i,t-1}, ESGGovernancePillarScore_{i,t-1}, Book_value_of_borrower_assets_{i,t-1}, Price-to-book_ratio_{i,t-1}, Debt-equity_ratio_{i,t-1}, Altman_z-score_{i,t-1}, EBIT_{i,t-1}, ROE_{i,t-1}, Interest_Coverage_ratio_{i,t-1}, CurrentAssets/CurrentLiabilities_{i,t-1}, PPE/TotalAssets_{i,t-1}, Book_value_of_lender_assets_{i,t-1}, Dealsize_t, Maturity_t, Number_of_lenders_t, dummy_collateral_t, dummy_loantype_t, dummy_corporate_purpose_t, dummy_country, dummy_time)

My dataset is composed of 1,354 observations over the period 2010-2020. I do not have a balanced panel, as different loans are granted to different firms every year; thus, it may occur that the same firm is present throughout the whole period or for one year only. For this reason, I was thinking of using pooled OLS as the estimation method. In order to understand which is the best option, I have used:
1. the command reg combined with the command robust
2. the command reg combined with vce(cluster borrower company id) to account for the cluster effect
3. the command reg combined with vce(robust)

I was also thinking of applying a log transformation to the variables AllinDrawn, Book_value_of_borrower_assets, Maturity, and Dealsize, as they are positively skewed.
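A sketch of that transformation, using the variable names from the data extracts below (FacilityAmt standing in for deal size; the new names are arbitrary):

Code:
gen lnAllinDrawn = ln(AllinDrawn)
gen lnAssets     = ln(BookvalueoftotalassetsFirm)
gen lnMaturity   = ln(Maturity)
gen lnDealsize   = ln(FacilityAmt)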
Code:
 
* Example generated by -dataex-. For more info, type help dataex
clear
input double(AllinDrawn ESGEnvironmentalPillarScore ESGSocialPillarScore ESGGovernancePillarScore)
250 77.5557011795543 64.7579193328731 88.7326388888888
630 5.41978129948686 3.53632478632478 0
275 18.564126394052 65.9194214876033 13.5558069381598
250 31.8097413602534 17.4276859504132 23.1316137566137
275 31.8097413602534 17.4276859504132 23.1316137566137
300 10.7844598613613 6.05716253443526 2.51572327044024
460 21.18 31.11 8.33
200 17.24 31.16 25.01
250 17.5230674087816 74.7279614325069 0
225 44.7532467532467 50.2441860465116 0
225 44.7532467532467 50.2441860465116 0
325 14.8953267935558 4.64531680440771 0
300 69.7174447174446 42.1959358617987 31.4429824561403
325 69.7174447174446 42.1959358617987 31.4429824561403
300 69.7174447174446 42.1959358617987 31.4429824561403
300 69.7174447174446 42.1959358617987 31.4429824561403
325 69.7174447174446 42.1959358617987 31.4429824561403
300 69.7174447174446 42.1959358617987 31.4429824561403
325 69.7174447174446 42.1959358617987 31.4429824561403
37.5 27.3032002127565 36.7112304037709 18.9411157024793
end
Code:
 
* Example generated by -dataex-. For more info, type help dataex
clear
input double(BookvalueoftotalassetsFirm Pricetobookratio DebttoEquityLeverage Altmanzscore EBIT ROE InterestCoverageRatio BVofCABVofCLLiquidity PPETotalassetsTangibility)
24032 1.0432595133933986 1.406099290780142 1.280729860186418 1247 .06594186576534986 3.681214421252372 2.1813494732122845 .07739680426098536
1572.123 1.447634142903065 .07418778253489806 1.2572250390077622 181.934 .07680436156491312 10.343469568047611 3.861882603848577 .677739591622284
786.401 6.312142016730666 .015752023662174805 .5796776708066242 87.059 .06793264871184676 24.618894123481276 4.380098988998617 .20942369096682228
8094.3 1.2947419015853134 1.4887629546248495 1.1027599668902808 525.9 .029708127253270204 2.4519056261343013 1.8862742241002477 .193617730995886
8094.3 1.2947419015853134 1.4887629546248495 1.1027599668902808 525.9 .029708127253270204 2.4519056261343013 1.8862742241002477 .193617730995886
5937.156 2.7769083937119223 .33365353539984816 .36842434323773876 673.24 .040207092608939246 9.660145073783433 .6675808980320502 .0398877846564921
4645.943 1.997013829529877 2.4163164860084185 .13777250388134338 41.896 .030533362893849964 2.247503143654155 .4604915937554856 .8303532781181344
359.4 -6.364957170420625 -.16689280868385345 2.136366165831942 71.5 .09742114432565924 29.866666666666664 1.4602230483271377 .22120200333889817
5234.318 4.109158671361969 .7731561886119267 2.084586320510141 884.737 .08095669371134429 12.320208982852689 1.1410925133890246 .06600095752684496
5407.3 1.427857787689333 1.0857068426536993 2.886581103323285 386.9 .06020710056697605 5.900889453621346 1.2077922077922076 .3197899136352708
5407.3 1.427857787689333 1.0857068426536993 2.886581103323285 386.9 .06020710056697605 5.900889453621346 1.2077922077922076 .3197899136352708
1200.269 6.296495347642471 .7488521977848601 1.183230259216892 150.992 .02562144501510765 9.576944571457602 2.059774358974359 .024478679362709525
2339.679 1.7450898699218107 .34194769086211746 2.1581339576924865 269.709 .07939763211947916 9.374060374656962 2.7416232869728647 .07157990476471345
2339.679 1.7450898699218107 .34194769086211746 2.1581339576924865 269.709 .07939763211947916 9.374060374656962 2.7416232869728647 .07157990476471345
2339.679 1.7450898699218107 .34194769086211746 2.1581339576924865 269.709 .07939763211947916 9.374060374656962 2.7416232869728647 .07157990476471345
2339.679 1.7450898699218107 .34194769086211746 2.1581339576924865 269.709 .07939763211947916 9.374060374656962 2.7416232869728647 .07157990476471345
2339.679 1.7450898699218107 .34194769086211746 2.1581339576924865 269.709 .07939763211947916 9.374060374656962 2.7416232869728647 .07157990476471345
2339.679 1.7450898699218107 .34194769086211746 2.1581339576924865 269.709 .07939763211947916 9.374060374656962 2.7416232869728647 .07157990476471345
2339.679 1.7450898699218107 .34194769086211746 2.1581339576924865 269.709 .07939763211947916 9.374060374656962 2.7416232869728647 .07157990476471345
16081.984 2.736890126343609 .3551863619917197 2.244584617171612 1491.547 .039283398572413544 12.497946481111873 2.0011704596990167 .13285220281278728
end
Code:
 
* Example generated by -dataex-. For more info, type help dataex
clear
input double FacilityAmt int Maturity
3.000e+09 10
1.500e+08 36
50000000 36
6.500e+08 41
3.000e+08 65
7.500e+08 36
1.000e+08 40
240496000 48
1.700e+09 40
581582051.61 51
479417948.39 51
1.275e+09 72
2.650e+08 60
167407404.52 60
9521089.21 60
3.677e+08 60
126560000 60
1.0031e+09 72
379680000 72
1.000e+09 12
end
I have first run a regression of AllinDrawn on ESGEnvironmentalPillarScore_{i,t-1}, ESGSocialPillarScore_{i,t-1}, and ESGGovernancePillarScore_{i,t-1}.
Code:
reg AllinDrawn ESGEnvironmentalPillarScore ESGSocialPillarScore ESGGovernancePillarScore, robust

Linear regression                               Number of obs     =      1,354
                                                F(3, 1350)        =      36.61
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0698
                                                Root MSE          =      98.88

---------------------------------------------------------------------------------------------
                            |               Robust
                 AllinDrawn | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
----------------------------+----------------------------------------------------------------
ESGEnvironmentalPillarScore |  -.0753313    .118659    -0.63   0.526    -.3081074    .1574448
       ESGSocialPillarScore |  -.3126749   .1829552    -1.71   0.088    -.6715823    .0462324
   ESGGovernancePillarScore |   -1.01193   .1307176    -7.74   0.000    -1.268362   -.7554986
                      _cons |   238.8086   7.892187    30.26   0.000     223.3263    254.2909
When I then add more variables, the coefficient of ESGEnvironmentalPillarScore_{i,t-1} becomes positive, contrary to my expectations. Could it be an issue of omitted variables? Or is there any other possible explanation? This is my first time using Stata, so it could very well be that I have made other mistakes when constructing my model.

Code:
reg AllinDrawn ESGEnvironmentalPillarScore ESGSocialPillarScore ESGGovernancePillarScore BookvalueoftotalassetsFirm Pricetobookratio
>  DebttoEquityLeverage Altmanzscore EBIT ROE InterestCoverageRatio BVofCABVofCLLiquidity PPETotalassetsTangibility, robust

Linear regression                               Number of obs     =      1,322
                                                F(12, 1309)       =      28.71
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2597
                                                Root MSE          =     86.596

---------------------------------------------------------------------------------------------
                            |               Robust
                 AllinDrawn | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
----------------------------+----------------------------------------------------------------
ESGEnvironmentalPillarScore |   .0739704   .1128945     0.66   0.512    -.1475035    .2954443
       ESGSocialPillarScore |  -.1242622   .1538227    -0.81   0.419    -.4260281    .1775037
   ESGGovernancePillarScore |   -.801108   .1166373    -6.87   0.000    -1.029924   -.5722915
 BookvalueoftotalassetsFirm |   .0001103   .0000908     1.21   0.225    -.0000678    .0002884
           Pricetobookratio |  -.8398775   .2880863    -2.92   0.004    -1.405039   -.2747162
       DebttoEquityLeverage |   1.745794   .9277623     1.88   0.060    -.0742692    3.565858
               Altmanzscore |  -14.10079   2.155519    -6.54   0.000    -18.32944   -9.872142
                       EBIT |  -.0065537   .0015351    -4.27   0.000    -.0095652   -.0035421
                        ROE |  -6.562555   2.130634    -3.08   0.002    -10.74239   -2.382725
      InterestCoverageRatio |   .0207107   .0416538     0.50   0.619    -.0610048    .1024262
      BVofCABVofCLLiquidity |  -3.882811   2.132499    -1.82   0.069    -8.066301    .3006782
  PPETotalassetsTangibility |   5.673791   11.57927     0.49   0.624    -17.04216    28.38974
                      _cons |   251.7368    8.69042    28.97   0.000     234.6881    268.7855
---------------------------------------------------------------------------------------------


Thanks a lot in advance for your time.

Kind regards.
Giulia Bonacina

How to create a dummy variable that tracks the repetition of other variables

Hello all,

I have a dataset on user ratings of golf courses and I am trying to create a dummy variable called "compared_rating". Here is what I want the variable to look like:
  • compared_rating = 0 if that review was made on a day when that user made no other reviews
  • compared_rating = 1 if that review was made on a day when that user made more than one review
In the data that I have, users can either create a new review or update an older one. With the compared_rating variable I am primarily interested in tracking whether there is a creation plus one or more updates on the same day, or two or more updates on the same day. It would be ideal to also include users who created two or more reviews on the same day, but this is less important.

This is my initial intuition for creating this variable:
Code:
sort userid updatedat createdat
quietly by userid updatedat createdat: gen dup = cond(_N == 1, 0, _n)
gen compared_rating = (dup != 0)
Unfortunately, the code above yields only about 1,700 observations with compared_rating = 1, which makes me believe I am doing something wrong, since this is a dataset with over 1.3 million total entries. I have attached a data example to give a better idea of the data I am working with. I appreciate all insights!
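One likely culprit, as a hedged guess: updatedat holds full ISO timestamps, so grouping on it only matches reviews identical down to the millisecond. A sketch that groups on the calendar day instead:

Code:
* extract the calendar day from timestamps like "2020-08-20T15:26:13.310Z"
gen upd_day = date(substr(updatedat, 1, 10), "YMD")
format upd_day %td
bysort userid upd_day: gen byte compared_rating = (_N > 1)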

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str17 userid strL(updatedat createdat) double rating
"001544VKwW" "2020-08-20T15:26:13.310Z" "2020-08-20T15:26:13.310Z"   1
"001544VKwW" "2020-10-10T19:16:36.043Z" "2020-10-10T19:16:36.043Z"   5
"007pm3BK9b" "2019-09-08T22:14:07.698Z" "2019-09-08T22:14:07.698Z"   2
"007pm3BK9b" "2019-09-09T23:19:32.089Z" "2019-09-09T23:19:32.089Z" 3.5
"007pm3BK9b" "2019-09-14T20:39:25.065Z" "2019-09-14T20:39:25.065Z"   4
"007pm3BK9b" "2019-09-20T21:14:52.105Z" "2019-09-20T21:14:52.105Z"   4
"007pm3BK9b" "2019-09-20T22:58:02.317Z" "2019-09-20T22:58:02.317Z"   5
"007pm3BK9b" "2019-09-26T22:54:42.854Z" "2019-09-26T22:54:42.854Z"   4
"007pm3BK9b" "2019-10-04T15:46:39.163Z" "2019-10-04T15:46:39.163Z"   4
"007pm3BK9b" "2019-10-04T18:36:17.445Z" "2019-10-04T18:36:17.445Z" 3.5
"007pm3BK9b" "2019-10-07T22:58:08.831Z" "2019-10-07T22:58:08.831Z"   3
"007pm3BK9b" "2019-10-08T20:25:50.499Z" "2019-10-08T20:25:50.499Z" 3.5
"007pm3BK9b" "2019-10-08T22:38:23.154Z" "2019-10-08T22:38:23.154Z"   4
"007pm3BK9b" "2019-10-09T17:27:58.500Z" "2019-10-09T17:27:58.500Z"   5
"007pm3BK9b" "2019-10-09T20:01:01.182Z" "2019-10-09T20:01:01.182Z"   5
"007pm3BK9b" "2019-10-15T17:09:36.862Z" "2019-10-15T17:09:36.862Z" 3.5
"007pm3BK9b" "2019-10-15T18:51:09.289Z" "2019-10-15T18:51:09.289Z"   3
"007pm3BK9b" "2019-10-18T18:05:30.405Z" "2019-10-18T18:05:30.405Z"   3
"007pm3BK9b" "2019-10-20T21:17:03.182Z" "2019-10-20T21:17:03.182Z"   4
"007pm3BK9b" "2019-10-23T20:28:24.449Z" "2019-10-23T20:28:24.449Z"   5
"007pm3BK9b" "2019-11-18T01:31:13.733Z" "2019-11-18T01:31:13.733Z"   2
"009FmWLEeE" "2020-08-26T22:00:42.076Z" "2020-08-26T22:00:42.076Z"   5
"00BLGIWzcJ" "2019-05-15T00:30:45.612Z" "2019-05-15T00:30:45.612Z"   3
"00BLGIWzcJ" "2019-05-16T03:19:49.164Z" "2019-05-16T03:19:49.164Z"   4
"00BLGIWzcJ" "2019-05-31T00:18:37.680Z" "2019-05-31T00:18:37.680Z" 4.5
"00BLGIWzcJ" "2019-05-31T22:56:08.884Z" "2019-05-31T22:56:08.884Z"   5
"00BLGIWzcJ" "2019-06-06T01:18:30.047Z" "2019-06-06T01:18:30.047Z"   4
"00BLGIWzcJ" "2019-06-12T18:43:49.985Z" "2019-06-12T18:43:49.985Z" 1.5
"00BLGIWzcJ" "2020-02-17T21:00:45.484Z" "2020-02-17T21:00:45.484Z"   4
"00BLGIWzcJ" "2020-09-19T21:17:02.268Z" "2020-09-19T21:17:02.268Z" 4.5
"00BLGIWzcJ" "2020-10-15T01:46:14.302Z" "2020-10-15T01:46:14.302Z" 2.5
"00BLGIWzcJ" "2020-10-28T01:36:04.510Z" "2020-10-28T01:36:04.510Z"   4
"00BLGIWzcJ" "2020-11-01T16:20:27.406Z" "2020-11-01T16:20:27.406Z" 4.5
"00BLGIWzcJ" "2020-11-02T02:53:17.199Z" "2020-11-02T02:53:17.199Z"   5
"00BLGIWzcJ" "2020-11-02T02:54:47.077Z" "2020-11-02T02:54:47.077Z" 3.5
"00BLGIWzcJ" "2020-11-17T16:50:39.625Z" "2020-11-17T16:50:39.625Z"   5
"00BLGIWzcJ" "2020-11-17T20:04:32.899Z" "2020-11-17T20:04:32.899Z"   4
"00BLGIWzcJ" "2020-11-17T23:00:34.687Z" "2020-11-17T23:00:34.687Z"   4
"00BLGIWzcJ" "2020-11-22T18:51:21.183Z" "2020-11-22T18:51:21.183Z"   4
"00BLGIWzcJ" "2020-12-21T00:01:09.474Z" "2020-12-21T00:01:09.474Z" 3.5
"00BLGIWzcJ" "2020-12-22T04:52:20.770Z" "2020-12-22T04:52:20.770Z" 3.5
"00D0EXepFD" "2020-06-21T11:27:02.659Z" "2020-06-21T11:27:02.659Z"   5
"00D0EXepFD" "2020-07-02T15:26:11.556Z" "2020-07-02T15:26:11.556Z"   3
"00D0EXepFD" "2020-07-05T14:38:53.997Z" "2020-07-05T14:38:53.997Z"   4
"00D0EXepFD" "2020-07-06T11:05:02.099Z" "2020-07-06T11:05:02.099Z"   2
"00D0EXepFD" "2020-07-07T11:04:25.031Z" "2020-07-07T11:04:25.031Z"   4
"00D0EXepFD" "2020-07-10T10:07:39.704Z" "2020-07-10T10:07:39.704Z"   4
"00D0EXepFD" "2020-09-19T08:09:47.819Z" "2020-05-09T16:17:29.435Z"   4
"00FqmmCeLF" "2020-11-03T20:59:07.579Z" "2020-11-03T20:59:07.579Z"   5
"00FqmmCeLF" "2020-11-06T22:58:37.928Z" "2020-11-06T22:58:37.928Z"   5
"00GM0zPWrU" "2020-02-15T21:37:23.713Z" "2020-02-15T21:37:23.713Z" 4.5
"00GM0zPWrU" "2020-03-01T21:47:38.896Z" "2020-03-01T21:47:38.896Z"   4
"00GM0zPWrU" "2020-03-20T20:03:22.536Z" "2020-03-20T20:03:22.536Z"   4
"00GM0zPWrU" "2020-09-25T01:34:35.463Z" "2020-09-25T01:34:35.463Z"   5
"00GM0zPWrU" "2020-09-26T19:33:44.294Z" "2020-09-26T19:33:44.294Z"   4
"00GM0zPWrU" "2020-10-10T14:54:53.694Z" "2020-10-10T14:54:53.694Z"   4
"00GRD0DXsA" "2020-07-04T14:22:33.072Z" "2020-07-04T14:22:33.072Z" 3.5
"00GRD0DXsA" "2020-07-06T17:23:02.309Z" "2020-07-06T17:23:02.309Z"   4
"00GRD0DXsA" "2020-07-08T14:41:02.262Z" "2020-07-08T14:41:02.262Z"   4
"00GRD0DXsA" "2020-07-09T16:06:46.700Z" "2020-07-09T16:06:46.700Z"   4
"00I4u5rBjQ" "2017-10-17T12:54:28.220Z" "2017-09-10T01:07:47.692Z"   4
"00I4u5rBjQ" "2018-06-25T03:38:34.679Z" "2017-09-10T01:07:56.427Z" 2.5
"00IonREp6n" "2018-10-14T18:58:58.867Z" "2018-10-14T18:58:58.867Z"   5
"00LskfiGlk" "2020-05-19T22:54:16.392Z" "2020-05-19T22:54:16.392Z"   4
"00M2bF7aW1" "2020-06-30T17:06:22.024Z" "2020-06-30T17:06:22.024Z"   5
"00M2bF7aW1" "2020-07-15T23:53:41.959Z" "2020-07-15T23:53:41.959Z"   5
"00M2bF7aW1" "2020-08-17T22:15:15.140Z" "2020-08-17T22:15:15.140Z"   5
"00M9ISXXrC" "2018-11-27T15:41:18.681Z" "2018-11-27T15:41:18.681Z" 3.5
"00M9ISXXrC" "2018-12-08T19:59:12.883Z" "2018-12-08T19:59:12.883Z" 2.5
"00M9ISXXrC" "2018-12-13T20:04:55.256Z" "2018-12-13T20:04:55.256Z" 3.5
"00M9ISXXrC" "2018-12-13T21:58:36.894Z" "2018-12-13T21:58:36.894Z"   3
"00M9ISXXrC" "2018-12-28T18:56:12.788Z" "2018-12-28T18:56:12.788Z" 3.5
"00M9ISXXrC" "2019-07-14T18:51:41.578Z" "2019-07-14T18:51:41.578Z"   5
"00M9ISXXrC" "2019-11-07T18:32:29.926Z" "2019-11-07T18:32:29.926Z"   5
"00M9ISXXrC" "2020-05-26T17:51:07.521Z" "2020-05-26T17:51:07.521Z"   5
"00M9ISXXrC" "2020-06-17T09:48:45.849Z" "2018-05-28T17:44:51.120Z"  .5
"00M9ISXXrC" "2020-08-27T23:23:25.689Z" "2018-09-20T00:03:45.160Z" 3.5
"00M9ISXXrC" "2020-11-29T17:32:09.222Z" "2020-11-29T17:32:09.222Z"   4
"00OFXMYGwM" "2020-09-11T15:28:47.021Z" "2020-09-11T15:28:47.021Z" 4.5
"00OFXMYGwM" "2020-10-21T14:29:34.250Z" "2020-07-25T01:05:27.775Z"   5
"00OqHKXhlZ" "2020-09-10T02:03:45.776Z" "2020-09-10T02:03:45.776Z"   3
"00P0wOQrpc" "2020-08-14T12:03:49.138Z" "2020-08-14T12:03:49.138Z"   5
"00P0wOQrpc" "2020-09-03T20:13:28.203Z" "2020-09-03T20:13:28.203Z"   4
"00P0wOQrpc" "2020-09-26T17:06:49.386Z" "2020-06-15T12:48:28.137Z" 3.5
"00P0wOQrpc" "2020-09-27T12:06:28.200Z" "2020-09-27T12:06:28.200Z"   3
"00P0wOQrpc" "2020-10-16T12:07:26.127Z" "2020-10-16T12:07:26.127Z" 4.5
"00QpTCKkvk" "2018-05-21T16:09:36.829Z" "2018-05-21T16:09:36.829Z"   4
"00ROkedpRn" "2020-10-11T21:57:36.182Z" "2020-10-11T21:57:36.182Z"   4
"00ROkedpRn" "2020-10-12T22:52:53.616Z" "2020-10-12T22:52:53.616Z" 4.5
"00ROkedpRn" "2020-11-20T18:44:56.250Z" "2020-11-20T18:44:56.250Z"   4
"00ROkedpRn" "2020-12-22T19:01:57.296Z" "2020-12-22T19:01:57.296Z"   4
"00ROkedpRn" "2020-12-23T18:35:37.114Z" "2020-12-23T18:35:37.114Z"   4
"00ROkedpRn" "2020-12-27T20:04:22.402Z" "2020-12-27T20:04:22.402Z"   4
"00ROkedpRn" "2020-12-28T19:25:49.071Z" "2020-12-28T19:25:49.071Z"   5
"00SGfJeiLC" "2020-05-24T21:41:50.669Z" "2020-05-24T21:41:50.669Z"   5
"00SGfJeiLC" "2020-07-04T19:45:34.921Z" "2020-07-04T19:45:34.921Z"   5
"00SGfJeiLC" "2020-07-18T16:37:40.389Z" "2020-07-18T16:37:40.389Z"   5
"00SGfJeiLC" "2020-08-22T00:30:23.378Z" "2020-08-22T00:30:23.378Z"   5
"00TAAjSt8W" "2019-07-27T20:47:41.636Z" "2019-07-27T20:47:41.636Z"   4
"00TAAjSt8W" "2019-07-28T17:26:58.591Z" "2019-07-28T17:26:58.591Z"   3
end

How to flag observations that are repeated after a pre-specified length of time?

Hi,

Please consider this example data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 id float(year surgerydate dischargedate)
"251546/3"  2007 17258 17280
"251546/3"  2007 17258 17280
"991523/01" 2007 17259 17266
"622810/3"  2007 17259 17265
"992424/01" 2007 17260 17272
"991880/01" 2007 17260 17267
"992424/01" 2007 17260 17272
"960950/2"  2007 17260 17268
"990923/01" 2007 17261 17270
"697025/4"  2007 17261 17274
"108844/01" 2007 17261 17263
"820882/4"  2007 17261 17261
"127408/2"  2007 17261 17269
"992102/01" 2007 17261 17265
"185089/3"  2007 17261 17265
"991653/01" 2007 17261 17266
"992890/01" 2007 17261 17269
"127407/2"  2007 17262 17265
"473752/01" 2007 17262 17269
"992952/01" 2007 17262 17269
"517830/2"  2007 17262 17270
"990896/01" 2007 17262 17263
"697023/3"  2007 17262 17269
"990933/01" 2007 17262 17269
"137595/3"  2007 17262 17272
"619651/01" 2007 17262 17263
"992852/01" 2007 17262 17264
"992836/01" 2007 17262 17265
"992243/01" 2007 17262 17266
"386580/3"  2007 17262 17269
"334471/4"  2007 17262 17265
"622842/2"  2007 17262 17263
"499304/01" 2007 17262 17268
"991488/01" 2007 17262 17263
"864925/6"  2007 17262 17268
"991095/01" 2007 17262 17270
"385971/01" 2007 17262 17263
"991188/01" 2007 17262 17269
"992416/01" 2007 17263 17265
"359888/4"  2007 17263 17274
"991753/01" 2007 17263 17273
"992966/01" 2007 17263 17276
"517153/2"  2007 17263 17270
"75960/3"   2007 17263 17265
"829202/2"  2007 17263 17273
"992737/01" 2007 17263 17274
"991309/01" 2007 17263 17293
"1291261/2" 2007 17263 17270
"992975/01" 2007 17264 17268
"991938/01" 2007 17264 17273
"762746/4"  2007 17264 17271
"97798/2"   2007 17264 17272
"992863/01" 2007 17265 17273
"692341/2"  2007 17265 17268
"992437/01" 2007 17265 17276
"993024/01" 2007 17265 17273
"991346/01" 2007 17265 17270
"638566/3"  2007 17265 17274
"496736/2"  2007 17265 17270
"992789/01" 2007 17265 17273
"904146/2"  2007 17265 17307
"991852/01" 2007 17265 17268
"147289/3"  2007 17265 17274
"993075/01" 2007 17265 17266
"21468/2"   2007 17265 17272
"991254/01" 2007 17265 17272
"990941/01" 2007 17265 17274
"871/2"     2007 17265 17275
"804281/2"  2007 17265 17280
"361267/3"  2007 17265 17273
"127406/3"  2007 17265 17275
"61523/2"   2007 17266 17274
"991345/01" 2007 17266 17270
"991926/01" 2007 17266 17275
"987499/5"  2007 17266 17273
"991771/01" 2007 17266 17271
"990869/01" 2007 17266 17276
"323668/5"  2007 17266 17272
"359889/3"  2007 17266 17274
"447549/2"  2007 17266 17269
"992210/01" 2007 17266 17274
"991225/01" 2007 17266 17273
"990939/01" 2007 17266 17271
"991543/01" 2007 17266 17270
"517152/4"  2007 17266 17274
"991913/01" 2007 17266 17270
"757203/2"  2007 17266 17275
"992706/01" 2007 17266 17275
"108843/2"  2007 17266 17277
"992343/01" 2007 17266 17272
"138302/2"  2007 17266 17274
"991431/01" 2007 17266 17274
"1260671/2" 2007 17266 17273
"61523/2"   2007 17266 17274
"108843/2"  2007 17266 17277
"992902/01" 2007 17266 17275
"991562/01" 2007 17266 17287
"510164/2"  2007 17266 17273
"991011/01" 2007 17266 17275
"992474/01" 2007 17266 17270
end
format %td surgerydate
format %td dischargedate
I want to look at readmission within 30 days of discharge from the first admission. So, I want to flag IDs that reappear within 30 days of a discharge date. For example, if an ID was discharged on 24 April 2007 and appears again with a surgery date of 15 May 2007, I want to flag that ID.
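A sketch comparing each surgery date with the same id's previous discharge date (both are daily %td dates, so the difference is a number of days):

Code:
bysort id (surgerydate): gen byte readmit30 = _n > 1 & ///
    surgerydate - dischargedate[_n-1] <= 30
bysort id: egen byte flag_id = max(readmit30)   // mark every row of a flagged id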

Any help in this regard would be greatly appreciated.

Thanks,

Tuesday, June 29, 2021

Using Reshape to sort wide into long

Hello Everyone,

I want to convert my wide currency data into long panel data.

Currently the data look like this:

Type Date Australia Austria Belgium ........
Spot 01/01/1980 0.72 1.25 2.5
Forward 01/01/1980 0.73 1.26 2.55
Spot 01/02/1980 0.75 1.19 2.53
Forward 01/02/1980 0.76 1.20 2.57


I want the data to look like the following panel format:

Country Date Spot Forward

Australia 01/01/1980 0.72 0.73
Australia 01/02/1980 0.75 0.76
Austria 01/01/1980 1.25 1.26
Austria 01/02/1980 1.19 1.20
Belgium 01/01/1980 2.5 2.55
Belgium 01/02/1980 2.53 2.57



The dataex output is below (not formatted for spacing).

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str7 type int date double(australia austria belgium brazil bulgaria)
"Spot"    7549  .8564603240548401 12.652675529705027 28.707926879933527 0 .
"Spot"    7579  .8506900878293601  12.80008364700962 28.954412379757425 0 .
"Spot"    7610  .8532842234499693 13.585013300593413 30.673214651115206 0 .
"Spot"    7640  .8595445839540328 13.749725473505002 31.142796339646736 0 .
"Forward" 7671                  . 13.237130907570053 31.681304893350063 . .
"Spot"    7671  .8471350899205353  13.27896277708072  31.76495190296947 0 .
"Spot"    7702  .8561089825457642 15.025542784163473  34.26990208599404 0 .
"Forward" 9010                  . 20.381811204911745 58.653100537221796 . .
"Forward" 9040                  . 21.392921212121212  61.84645656565656 . .
"Spot"    9040             1.2042 21.462626262626262 61.838367676767675 0 .
"Forward" 9071                  .  21.05748987854251  60.08906072874494 . .
"Spot"    9071             1.1743 21.121441295546557  60.08097165991902 0 .
"Forward" 9101                  .  21.75939014202172 62.552213868003335 . .
"Spot"    9101             1.1685 21.791963241436925  62.48954887218044 0 .
end
format %td date
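A sketch of the standard double reshape: first go long over the country columns, then wide over type (the rename loop just gives the country columns a common stub):

Code:
foreach v of varlist australia-bulgaria {
    rename `v' rate`v'
}
reshape long rate, i(date type) j(country) string
reshape wide rate, i(country date) j(type) string
rename (rateSpot rateForward) (spot forward)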



Thank you,
Afak Nazim

Is adoupdate, update stalling for anyone else? Advice?

For the first time, I'm having trouble with adoupdate, update today. It gets through identifying what needs to be updated just fine, but when it starts downloading packages, it stalls, showing progress at 0%. Stata ultimately just dies and becomes unresponsive. I'm running Stata 17 MP4 on macOS Big Sur.

Has anyone had a similar experience or can anyone offer advice?

Thanks.

How to Clean String variable with errors in data entry

Hi Stata Forum,

I am a relatively new user of Stata. I am working on a project that uses data scraped from a website into which people have manually entered information. I have a string variable that is supposed to contain a simple one-phrase description, e.g. "آخر جلسة" ("last session").

Quite a few entries contain information that should not be there, i.e. dates or duplications of the entry:

المتابعة (إنشاء الملف) : [12008/2102/2020 آخر جلسة 2021-03-31 09:00:00] [42/2201/2020 آخر جلسة 2020-11-04 12:00:00]
المتابعة (إنشاء الملف) : [13/2114/2020 آخر جلسة 2021-02-02 13:00:00]

Most of the data has been entered correctly and the mistakes are not consistent, so I can't simply delete a fixed set of leading digits.

One of my ideas is to split the variable on spaces, then drop the incorrect values, and then work all the correct values back into a single column with if conditions and replace. Does this sound reasonable, and are there any commands that could make this easier?
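A sketch of a regular-expression alternative, assuming the junk always sits inside square brackets as in the examples (rawvar is a hypothetical variable name):

Code:
* strip every [...] segment, then collapse the leftover whitespace
gen clean = ustrregexra(rawvar, "\[[^\]]*\]", "")
replace clean = strtrim(stritrim(clean))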

Kind regards,
Mathew Toll

Using loop to replace values in a variable

Dear All,

I am new to loops. Is there a way I can replace the missing values with the number 1 each time an observation is selected in a random draw? I was thinking we could use the command randomselect with a loop. The share selected is 4% of the total for each year. The unique id is the identification number (unique for each person), so for each year I will have to randomly select 4% and assign them the number 1. It is possible that the same person is selected twice. How can I do that using a loop and randomselect? My data are in long format.

Any help would be useful since I am quite new to looping in Stata.

A sample of my dataset is below. Some variables have been renamed since these are highly confidential data.


----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double empid float var1 str14 sublevel float year
2001001 1 "A" 2007
2001001 . "A" 2008
2001001 . "A" 2009
2001001 . "A" 2010
2001001 . "A" 2011
2001001 . "A" 2012
2001001 . "A" 2013
2001001 . "A" 2014
2001001 . "A" 2015
2001001 . "A" 2016
2001001 . "A" 2017
2001001 . "A" 2018
2001001 . "A" 2019
2001001 . "A" 2020
2001001 . "A" 2021
2001001 . "A" 2022
2001001 . "A" 2023
2001001 . "A" 2024
2001001 . "A" 2025
2001001 . "A" 2026
2001004 1 "A" 2007
2001004 . "A" 2008
2001004 . "A" 2009
2001004 . "A" 2010
2001004 . "A" 2011
2001004 . "A" 2012
2001004 . "A" 2013
2001004 . "A" 2014
2001004 . "A" 2015
2001004 . "A" 2016
2001004 . "A" 2017
2001004 . "A" 2018
2001004 . "A" 2019
2001004 . "A" 2020
2001004 . "A" 2021
2001004 . "A" 2022
2001004 . "A" 2023
2001004 . "A" 2024
2001004 . "A" 2025
2001004 . "A" 2026
2001008 1 "A" 2007
2001008 . "A" 2008
2001008 . "A" 2009
2001008 . "A" 2010
2001008 . "A" 2011
2001008 . "A" 2012
2001008 . "A" 2013
2001008 . "A" 2014
2001008 . "A" 2015
2001008 . "A" 2016
2001008 . "A" 2017
2001008 . "A" 2018
2001008 . "A" 2019
2001008 . "A" 2020
2001008 . "A" 2021
2001008 . "A" 2022
2001008 . "A" 2023
2001008 . "A" 2024
2001008 . "A" 2025
2001008 . "A" 2026
2001009 1 "A" 2007
2001009 . "A" 2008
2001009 . "A" 2009
2001009 . "A" 2010
2001009 . "A" 2011
2001009 . "A" 2012
2001009 . "A" 2013
2001009 . "A" 2014
2001009 . "A" 2015
2001009 . "A" 2016
2001009 . "A" 2017
2001009 . "A" 2018
2001009 . "A" 2019
2001009 . "A" 2020
2001009 . "A" 2021
2001009 . "A" 2022
2001009 . "A" 2023
2001009 . "A" 2024
2001009 . "A" 2025
2001009 . "A" 2026
2001010 1 "A" 2007
2001010 . "A" 2008
2001010 . "A" 2009
2001010 . "A" 2010
2001010 . "A" 2011
2001010 . "A" 2012
2001010 . "A" 2013
2001010 . "A" 2014
2001010 . "A" 2015
2001010 . "A" 2016
2001010 . "A" 2017
2001010 . "A" 2018
2001010 . "A" 2019
2001010 . "A" 2020
2001010 . "A" 2021
2001010 . "A" 2022
2001010 . "A" 2023
2001010 . "A" 2024
2001010 . "A" 2025
2001010 . "A" 2026
end
label values var1 positionl
------------------ copy up to and including the previous line ------------------
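A hand-rolled sketch that avoids randomselect: draw a uniform random number per row and mark roughly the top 4% within each year (the seed is arbitrary; rows already set to 1 are left alone):

Code:
set seed 12345
gen double u = runiform()
bysort year (u): replace var1 = 1 if _n <= ceil(0.04 * _N) & missing(var1)
drop u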

Can I do two mean difference tests simultaneously?

Dear Statalists,

My dataset has 4 variables: A, B, C, and D. I would like to test the mean difference of A and B simultaneously with the mean difference of C and D; specifically, (mean A = mean B) & (mean C = mean D). Can I do this with Stata or other software?

Thank you so much for any suggestion!
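A sketch: mean is an estimation command, so test can impose both equalities jointly on its coefficients afterwards:

Code:
mean A B C D
test (A = B) (C = D)   // joint Wald test of both mean equalities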

Why are the variable names in the table presented by esttab different from the variable names in the estimation results?

My code is as below

Code:
reghdfe wNET_PRO_MAR pt , a(TYPE2 yr)
est sto m1


reghdfe wNET_PRO_MAR pt wFIRM_SIZE LNGDP UNEMPLOYMENT IMP_OVER_GDP INFLATION wLEVERAGE , a(TYPE2 yr)
est sto m2


reghdfe wNET_PRO_MAR pt wFIRM_SIZE LNGDP UNEMPLOYMENT IMP_OVER_GDP INFLATION wLEVERAGE median_mtbv Herfindahl , a(TYPE2 yr)
est sto m3

esttab m1 m2 m3 ,star(* 0.1 ** 0.05 *** 0.01) ar2 label title(" **wNET_PRO_MAR -reghdfe** ") mtitle("w/o contr" "f&c_contr" "ind_contr" "ind*year_fe" "reg*year_fe" "non_US firms") keep(pt wFIRM_SIZE LNGDP UNEMPLOYMENT IMP_OVER_GDP INFLATION wLEVERAGE median_mtbv Herfindahl)

The result looks like this: [image attachment]

As can be seen, the appearance of the variable names is quite strange. Can I ask how to deal with this problem? Thank you in advance.

Generating Variables According to Multiple Conditions and Earliest Occurrence

I am working with a dataset that contains names, the frequency of a name by gender, and the year in which the data was recorded. It looks like this: [image attachment]

My problem is that I want to create a variable that simultaneously looks for the earliest occurrence in which a name is recorded and checks that both male and female are above zero. I've done a bit of searching beforehand and know how to do each step individually, but not together. That is to say, I know that I can generate a variable recording the earliest occurrence using something like

Code:
 egen earliestyear = min(year), by(name)
I know I can also generate a variable recording if female and male meet certain conditions by writing something along the lines of

Code:
gen examplevariable = whatever if male > 0 & female > 0
But how can I tell Stata to generate a variable for the earliest year in which a name is recorded with the gender conditions met? I've tried using conditionals within the egen function, but am getting nonsense results.

Code:
egen examplevariable2 = min(year & female > 0 & male > 0), by(name)
is probably telling Stata to find the minimum of the combined logical expression, which is not what I'm looking to do. All help is appreciated.
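A sketch of the usual idiom: feed egen's min() a cond() expression that returns year only where the gender condition holds and missing elsewhere (missings are ignored by min):

Code:
egen earliestboth = min(cond(male > 0 & female > 0, year, .)), by(name)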