Hello,
I am a newbie in Stata and I have a problem when I try to use an if condition.
In my dataset I have a "country" variable that takes the value "Germany", but when I run the regression
"oprobit adl iadl depression_scale chronicw2 if country==Germany", Stata says "Germany not found",
or
"oprobit adl iadl depression_scale if "country"=="Germany"", Stata says "no observations".
How can I fix it?
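For what it is worth, a sketch of the syntax Stata expects here, assuming country is a string variable (the label list step is only relevant if country turns out to be numeric with value labels):
Code:
* if country is a string variable, quote the value, never the variable name
oprobit adl iadl depression_scale chronicw2 if country == "Germany"

* if country is instead numeric with value labels, list the labels to find the right code
label list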
Thank you
Tuesday, June 30, 2020
Matching and generating hourly data
Dear Stata Users,
I am working on hourly data sets. The datasets have the "zipcode" variable as the common variable to merge/joinby on. In data-A I have the hourly data, which should be merged with data-B.
Now, in both datasets each zipcode is repeated more than once, so merge (m:1 or 1:m) is not possible; I would use "joinby" to combine the datasets.
I would like some help with generating a variable (such as Uhrzeit) so that data-B has an hourly time (Uhrzeit) for each zipcode for each year.
Finally, do I have to generate the hourly time before or after joining (joinby)?
If you need further clarification, I would be happy to provide it.
data-A:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long zipcode str11 stationcode double no2 str7 Uhrzeit float(Datum datum_tag datum_monat datum_jahr) 63741 "DEBY005" 30.49 "01:00" 20089 1 1 2015 91522 "DEBY001" 33.75 "01:00" 20089 1 1 2015 63741 "DEBY005" 30.25 "02:00" 20089 1 1 2015 91522 "DEBY001" 32.98 "02:00" 20089 1 1 2015 63741 "DEBY005" 32.94 "03:00" 20089 1 1 2015 91522 "DEBY001" 30.58 "03:00" 20089 1 1 2015 63741 "DEBY005" 30.54 "04:00" 20089 1 1 2015 91522 "DEBY001" 27.21 "04:00" 20089 1 1 2015 63741 "DEBY005" 29.84 "05:00" 20089 1 1 2015 91522 "DEBY001" 28.95 "05:00" 20089 1 1 2015 63741 "DEBY005" 28.39 "06:00" 20089 1 1 2015 91522 "DEBY001" 27.5 "06:00" 20089 1 1 2015 63741 "DEBY005" 28.89 "07:00" 20089 1 1 2015 91522 "DEBY001" 26.32 "07:00" 20089 1 1 2015 63741 "DEBY005" 31.39 "08:00" 20089 1 1 2015 91522 "DEBY001" 27.57 "08:00" 20089 1 1 2015 63741 "DEBY005" 33.36 "09:00" 20089 1 1 2015 91522 "DEBY001" 25.35 "09:00" 20089 1 1 2015 63741 "DEBY005" 28.75 "10:00" 20089 1 1 2015 91522 "DEBY001" 21.24 "10:00" 20089 1 1 2015 63741 "DEBY005" 27.13 "11:00" 20089 1 1 2015 91522 "DEBY001" 20.72 "11:00" 20089 1 1 2015 63741 "DEBY005" 23.34 "12:00" 20089 1 1 2015 91522 "DEBY001" 25.09 "12:00" 20089 1 1 2015 63741 "DEBY005" 30.87 "13:00" 20089 1 1 2015 91522 "DEBY001" 21.51 "13:00" 20089 1 1 2015 63741 "DEBY005" 35.34 "14:00" 20089 1 1 2015 91522 "DEBY001" 23.25 "14:00" 20089 1 1 2015 63741 "DEBY005" 38.24 "15:00" 20089 1 1 2015 91522 "DEBY001" 28 "15:00" 20089 1 1 2015 63741 "DEBY005" 34.68 "16:00" 20089 1 1 2015 91522 "DEBY001" 21.67 "16:00" 20089 1 1 2015 63741 "DEBY005" 42.17 "17:00" 20089 1 1 2015 91522 "DEBY001" 24.92 "17:00" 20089 1 1 2015 63741 "DEBY005" 43.97 "18:00" 20089 1 1 2015 91522 "DEBY001" 29.73 "18:00" 20089 1 1 2015 63741 "DEBY005" 33.09 "19:00" 20089 1 1 2015 91522 "DEBY001" 27.7 "19:00" 20089 1 1 2015 63741 "DEBY005" 20.34 "20:00" 20089 1 1 2015 91522 "DEBY001" 23.79 "20:00" 20089 1 1 2015 63741 "DEBY005" 17.08 "21:00" 20089 1 1 2015 91522 "DEBY001" 18.47 "21:00" 20089 1 1 2015 63741 "DEBY005" 18.25 "22:00" 20089 1 1 2015 91522 "DEBY001" 18.46 "22:00" 20089 1 1 2015 63741 "DEBY005" 18.69 "23:00" 20089 1 1 2015 91522 "DEBY001" 17.73 "23:00" 20089 1 1 2015 63741 "DEBY005" 21.54 "24:00" 20089 1 1 2015 91522 "DEBY001" 13.44 "24:00" 20089 1 1 2015 63741 "DEBY005" 15.73 "01:00" 20090 2 1 2015 91522 "DEBY001" 14.07 "01:00" 20090 2 1 2015 63741 "DEBY005" 13.33 "02:00" 20090 2 1 2015 91522 "DEBY001" 15.11 "02:00" 20090 2 1 2015 63741 "DEBY005" 12.15 "03:00" 20090 2 1 2015 91522 "DEBY001" 18.38 "03:00" 20090 2 1 2015 63741 "DEBY005" 15.88 "04:00" 20090 2 1 2015 91522 "DEBY001" 21.81 "04:00" 20090 2 1 2015 63741 "DEBY005" 39.34 "05:00" 20090 2 1 2015 91522 "DEBY001" 23.83 "05:00" 20090 2 1 2015 63741 "DEBY005" 36.97 "06:00" 20090 2 1 2015 91522 "DEBY001" 21.01 "06:00" 20090 2 1 2015 63741 "DEBY005" 29.83 "07:00" 20090 2 1 2015 91522 "DEBY001" 40.23 "07:00" 20090 2 1 2015 63741 "DEBY005" 30.41 "08:00" 20090 2 1 2015 91522 "DEBY001" 45.88 "08:00" 20090 2 1 2015 63741 "DEBY005" 23.77 "09:00" 20090 2 1 2015 91522 "DEBY001" 53.98 "09:00" 20090 2 1 2015 63741 "DEBY005" 35.29 "10:00" 20090 2 1 2015 91522 "DEBY001" 39.31 "10:00" 20090 2 1 2015 63741 "DEBY005" 19.62 "11:00" 20090 2 1 2015 91522 "DEBY001" 44.46 "11:00" 20090 2 1 2015 63741 "DEBY005" 15.99 "12:00" 20090 2 1 2015 91522 "DEBY001" 44.58 "12:00" 20090 2 1 2015 63741 "DEBY005" 20.42 "13:00" 20090 2 1 2015 91522 "DEBY001" 33.5 "13:00" 20090 2 1 2015 63741 "DEBY005" 19.29 "14:00" 20090 2 1 2015 91522 
"DEBY001" 35.01 "14:00" 20090 2 1 2015 63741 "DEBY005" 22.52 "15:00" 20090 2 1 2015 91522 "DEBY001" 34.49 "15:00" 20090 2 1 2015 63741 "DEBY005" 25.37 "16:00" 20090 2 1 2015 91522 "DEBY001" 33.18 "16:00" 20090 2 1 2015 63741 "DEBY005" 25.44 "17:00" 20090 2 1 2015 91522 "DEBY001" 32.39 "17:00" 20090 2 1 2015 63741 "DEBY005" 32.56 "18:00" 20090 2 1 2015 91522 "DEBY001" 35.52 "18:00" 20090 2 1 2015 63741 "DEBY005" 35.9 "19:00" 20090 2 1 2015 91522 "DEBY001" 29.65 "19:00" 20090 2 1 2015 63741 "DEBY005" 23.61 "20:00" 20090 2 1 2015 91522 "DEBY001" 23.87 "20:00" 20090 2 1 2015 63741 "DEBY005" 33.62 "21:00" 20090 2 1 2015 91522 "DEBY001" 22.03 "21:00" 20090 2 1 2015 63741 "DEBY005" 38.14 "22:00" 20090 2 1 2015 91522 "DEBY001" 17.59 "22:00" 20090 2 1 2015 63741 "DEBY005" 32 "23:00" 20090 2 1 2015 91522 "DEBY001" 14.65 "23:00" 20090 2 1 2015 63741 "DEBY005" 30.47 "24:00" 20090 2 1 2015 91522 "DEBY001" 11.15 "24:00" 20090 2 1 2015 63741 "DEBY005" 45.94 "01:00" 20091 3 1 2015 91522 "DEBY001" 10.94 "01:00" 20091 3 1 2015 63741 "DEBY005" 53.28 "02:00" 20091 3 1 2015 91522 "DEBY001" 10.15 "02:00" 20091 3 1 2015 end format %td Datum
data-B:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long zipcode str13 city float population str5 areakm2 str10 county str14 street_type str11 location str7 speedlimit 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Bundesstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Bundesstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Kreisstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "Vz325" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "Vz325" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "60" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "60" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "70" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "Vz325" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 
"99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "60" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "Schritt" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "70" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" end
Best,
Ami
Pathways / event history
Dear Statalist users
I am hoping you can solve a query I have.
I want to look at pathways for young people aged between 15 and 30 in and out of various activities: employment, full-time; employment, part-time; unemployment; not in the labour force; home-making and study; and study only. I want to make comparisons between young people pre-Global Financial Crisis and young people post-Global Financial Crisis, as well as comparisons between men and women. I'm thinking event-history analysis, but I'm not quite sure how to go about it. I want to look at pathways (movements between the various activities) and maybe (although not necessarily) how long they stayed in each activity before moving.
If event history is indeed the right way to go about it, how would I make the required duration variables?
I have longitudinal panel data spanning from 2001-2017.
Can anyone assist?
best
Brendan
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float(activity GFC) byte sex 3 0 2 10 0 2 10 0 2 10 1 2 10 1 2 5 1 2 5 1 2 12 1 2 12 1 2 3 1 2 1 1 2 1 1 2 1 0 2 3 1 2 11 1 2 1 1 2 10 1 2 11 1 2 3 1 2 11 1 2 3 1 2 10 1 2 10 1 2 10 0 2 10 0 2 7 0 2 10 0 2 11 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 12 1 2 12 1 2 12 1 2 12 1 2 5 1 2 3 1 2 3 0 2 11 0 2 12 1 2 3 1 2 3 1 2 3 1 2 3 1 2 10 1 2 10 1 2 10 1 2 10 1 2 3 1 2 10 1 2 10 1 2 10 1 2 3 1 2 3 1 2 3 1 2 3 1 1 3 1 1 3 1 1 3 1 1 3 1 1 1 0 1 3 0 1 1 0 1 1 0 1 1 0 1 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1 end label values activity activity label def activity 1 "[1] Employed, full-time", modify label def activity 3 "[3] Employed, part-time", modify label def activity 5 "[5] Unemployed", modify label def activity 7 "[7] Not in the labour force", modify label def activity 10 "[10] Home-making / caring", modify label def activity 11 "[11] Work and study", modify label def activity 12 "[12] Study", modify label values sex QSEX label def QSEX 1 "[1] Male", modify label def QSEX 2 "[2] Female", modify
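For reference, a sketch of how spell and duration variables are often built from person-wave data; id and wave are assumed variable names that do not appear in the extract above:
Code:
* number consecutive runs of the same activity within person, then count each
* run's length in waves (id and wave are assumed variable names)
sort id wave
by id: gen spell = sum(activity != activity[_n-1])
by id spell, sort: gen duration = _N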
Generating a variable
I want to generate a variable indicating whether individuals continued to participate in a program at age 72 or after. For example, if an individual participated in the program at ages 69 and 73, a value of 1 should be given; if an individual participated only before age 72, a value of 2 should be given; and if an individual participated only at age 72 or after, a value of 3 should be given.
A sample of data structure is below.
e.g.
ID age Participation_date
B001 68 05nov2012
B001 70 07may2015
B001 72 09jun2017
B002 67 28nov2011
B002 68 22oct2012
B002 69 25nov2013
B002 70 10nov2014
B002 71 14dec2015
B002 72 12dec2016
B003 73 25feb2012
B003 75 08oct2013
B003 77 12feb2016
B004 76 16jun2012
B004 78 22may2014
B005 68 17nov2012
B006 76 12mar2013
B006 78 29apr2015
B007 72 22jun2012
B007 74 04aug2014
B008 71 29jan2013
B008 73 04mar2015
B008 75 30mar2017
B009 72 28jan2015
B010 71 28feb2012
B010 74 03jun2014
B011 73 04feb2013
B011 76 17sep2015
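One possible approach (a sketch, assuming one row per participation event, with the variable names from the listing above):
Code:
* flag, within each ID, participation before 72 and participation at 72 or after
bysort ID: egen before72 = max(age < 72)
bysort ID: egen from72 = max(age >= 72)
gen byte group = 1 if before72 & from72
replace group = 2 if before72 & !from72
replace group = 3 if !before72 & from72
drop before72 from72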
Thanks for the help
Problem with SVY : poisson regression
Hi all,
I'm trying to estimate the effect of containment measures on the evolution of the number of COVID-19 cases (in one city).
The thing is, I only have aggregated numbers of cases over 4 different periods of time, from March 11 to June 26.
I also have the number of tests performed, age categories, and the proportion of males.
As the first 3 measures were implemented more than 3 weeks ago, I'm not sure how to define my measure variables. I tried different forms (Time1 = 105 days from implementation to this day; Time2 = 95 days of implementation, etc.) but none worked.
I tried to define svyset with pweight and singleunit(centered), and I plan to use Poisson regression.
Does anyone have any suggestions for me, please?
svy:poisson Logcase Tests SexR Time1 Time2 etc...
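For reference, a minimal sketch of that kind of survey declaration (psu, weight, and cases are placeholder names, and exposure() is only one possible way of bringing the number of tests in; this is not necessarily the right model for these data):
Code:
* a sketch with placeholder variable names
svyset psu [pweight = weight], singleunit(centered)
svy: poisson cases SexR Time1 Time2, exposure(Tests)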
thanks
Regressing growth rates of a variable on its initial level for separate decades
Dear all,
I am currently trying to regress the growth rate of a variable on its initial level and an intercept. Specifically, the compound annual growth rate of GDP per capita is the dependent variable, and the regressor is the initial level of GDP. The growth rate is estimated over periods of 10 years, so the initial level of GDP would be the GDP prevailing in the first year of each 10-year period. An example of the data is provided below.
Here is where I get stuck and do not know how to proceed. First, I do not want to run this regression over successive years. Rather, I would like to run it for separate decades, so per country I should have only 3 data points in the regression: the periods 1991-2001, 2001-2011 and 2007-2017. I realize that the last data point overlaps with the second, but I do not have data for 2018, 2019 and 2020, so at least this way I can calculate the most recent compound annual growth rate. The problem is, I do not know how to write this down in a command to achieve this result. Moreover, I would like to control for period (decade) effects, so I created a decadal identifier for the regression. I am currently using gen decade=10*floor(year/10) to create decadal identifiers, but when I include it in my regression as reg gdp_growth initial_gdp i.decade, vce(cluster cc), the only decade dummy I see is 2010, whereas I think I should have at least 1 more. So I figured maybe I am not writing the code for the decade dummies properly, and perhaps there is a better way to write a command that controls for decade fixed effects.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str3 countrycode int year float gdp "DEU" 1991 33836.418 "DEU" 1992 34227.57 "DEU" 1993 33670.29 "DEU" 1994 34358.34 "DEU" 1995 34783.29 "DEU" 1996 34967.477 "DEU" 1997 35539.133 "DEU" 1998 36251.195 "DEU" 1999 36913.184 "DEU" 2000 37930.484 "DEU" 2001 38509.605 "DEU" 2002 38368.617 "DEU" 2003 38073.76 "DEU" 2004 38535.17 "DEU" 2005 38835.383 "DEU" 2006 40362.29 "DEU" 2007 41622.36 "DEU" 2008 42102.85 "DEU" 2009 39804.92 "DEU" 2010 41531.93 "DEU" 2011 43969.26 "DEU" 2012 44070.92 "DEU" 2013 44139.03 "DEU" 2014 44933.72 "DEU" 2015 45321.4 "DEU" 2016 45959.57 "DEU" 2017 46916.82 "FIN" 1991 31250.535 "FIN" 1992 30043.363 "FIN" 1993 29692.36 "FIN" 1994 30727.863 "FIN" 1995 31901.844 "FIN" 1996 32963.473 "FIN" 1997 34947.375 "FIN" 1998 36756.746 "FIN" 1999 38277.61 "FIN" 2000 40403.55 "FIN" 2001 41363.69 "FIN" 2002 41967.97 "FIN" 2003 42706.92 "FIN" 2004 44283.16 "FIN" 2005 45358.56 "FIN" 2006 47004.62 "FIN" 2007 49285.27 "FIN" 2008 49440.97 "FIN" 2009 45231.96 "FIN" 2010 46459.97 "FIN" 2011 47423.21 "FIN" 2012 46538.58 "FIN" 2013 45906.8 "FIN" 2014 45550.5 "FIN" 2015 45655.22 "FIN" 2016 46720.56 "FIN" 2017 48033.29 "FRA" 1991 32683.35 "FRA" 1992 33041.363 "FRA" 1993 32691.684 "FRA" 1994 33338.34 "FRA" 1995 33917.926 "FRA" 1996 34275.605 "FRA" 1997 34952.523 "FRA" 1998 36073.637 "FRA" 1999 37116.41 "FRA" 2000 38309.44 "FRA" 2001 38786.086 "FRA" 2002 38942.28 "FRA" 2003 38985.535 "FRA" 2004 39794.64 "FRA" 2005 40152.69 "FRA" 2006 40850.36 "FRA" 2007 41582.8 "FRA" 2008 41456.48 "FRA" 2009 40058.68 "FRA" 2010 40638.34 "FRA" 2011 41329.04 "FRA" 2012 41258.27 "FRA" 2013 41282.99 "FRA" 2014 41480.77 "FRA" 2015 41793.54 "FRA" 2016 42141.84 "FRA" 2017 43001.59 "NLD" 1991 36286.313 "NLD" 1992 36627.406 "NLD" 1993 36830.414 "NLD" 1994 37693.047 "NLD" 1995 38676.07 "NLD" 1996 39844.98 "NLD" 1997 41356.45 "NLD" 1998 43019.19 "NLD" 1999 44885.09 "NLD" 2000 46435.21 "NLD" 2001 47158.42 "NLD" 2002 46960.18 "NLD" 2003 46811.89 "NLD" 2004 47575.48 "NLD" 2005 48437.88 "NLD" 2006 50033.88 "NLD" 2007 51808.77 "NLD" 2008 52727.52 "NLD" 2009 50533.51 "NLD" 2010 50950.04 "NLD" 2011 51499.6 "NLD" 2012 50780.7 "NLD" 2013 50565.3 "NLD" 2014 51100.84 "NLD" 2015 51871.58 "NLD" 2016 52727.1 "NLD" 2017 53942.09 "SWE" 1991 36791.93 "SWE" 1992 36152.99 "SWE" 1993 35201.15 "SWE" 1994 36340.152 "SWE" 1995 37595.406 "SWE" 1996 38141.777 "SWE" 1997 39301.33 "SWE" 1998 40953.22 "SWE" 1999 42685.54 "SWE" 2000 44694.43 "SWE" 2001 45228.91 "SWE" 2002 46071.99 "SWE" 2003 46931.17 "SWE" 2004 48769.29 "SWE" 2005 49981.3 "SWE" 2006 51988.43 "SWE" 2007 53374.82 "SWE" 2008 52832.31 "SWE" 2009 50164.93 "SWE" 2010 52817.44 "SWE" 2011 54020.13 "SWE" 2012 53283.64 "SWE" 2013 53408.79 "SWE" 2014 54334.29 "SWE" 2015 56139.5 "SWE" 2016 56776.29 "SWE" 2017 57367.43 end
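One way to set this up (a sketch using the dataex variables above; the three windows 1991-2001, 2001-2011, and 2007-2017 are hard-coded, and H is just a postfile handle name):
Code:
* build a country-by-window dataset of initial GDP and compound annual growth
tempfile windows
postfile H str3 countrycode int start double initial_gdp double growth using `windows'
levelsof countrycode, local(ccs)
foreach s in 1991 2001 2007 {
    foreach c of local ccs {
        quietly summarize gdp if countrycode == "`c'" & year == `s', meanonly
        local g0 = r(mean)
        quietly summarize gdp if countrycode == "`c'" & year == `s' + 10, meanonly
        local g1 = r(mean)
        post H ("`c'") (`s') (`g0') ((`g1'/`g0')^(1/10) - 1)
    }
}
postclose H

use `windows', clear              // replaces the data in memory with the window-level data
encode countrycode, gen(cc)
regress growth initial_gdp i.start, vce(cluster cc)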
I hope I have explained my question clearly, and I am grateful for any advice you may have regarding this problem.
Thank you in advance.
Best,
Satya
Is it possible to obtain ultra-precision in calculating p-values in Stata?
Dear Statalisters,
What would be your suggestion to compute p-values as low as 1e-25?
For example:
One-sided test
gene double p = 1-normal(10.42045)
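One workaround (a sketch): normal() is accurate in the far tail, so using the symmetry of the normal distribution avoids the cancellation in 1 - normal(), which rounds p-values this small to zero in double precision:
Code:
* one-sided upper-tail p-value computed via symmetry
gen double p = normal(-10.42045)
display %12.0g normal(-10.42045)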
All the best,
Tiago
Difference between meologit with sampling weights and svy: meologit?
Hi, I have a general question about the difference between 2 commands. I have multilevel survey data and an ordered 3-level dependent variable (outcome). I'm wondering what the difference is between a) survey-setting the data and running meologit as an svy command vs. b) running meologit and adding in sampling weights. Does anyone have thoughts on that? (Note: I didn't use dataex because I'm not worried about replicating this or how it is running, just whether there are differences between these 2 commands.)
Many thanks to any who might reply!
--Ann
Examples below; outcome = ckdu_3cat and independent var = stunting
Example A:
svydescribe
Survey: Describing stage 1 sampling units
pweight: w_ind_norm
VCE: linearized
Single unit: missing
Strata 1: site
SU 1: hogar
FPC 1: <zero>
. svy: meologit ckdu_3cat stunting
(running meologit on estimation sample)
Survey: Ordered logistic regression
Number of strata = 2 Number of obs = 773
Number of PSUs = 314 Population size = 814.402204
Design df = 312
F( 1, 312) = 2.55
Prob > F = 0.1115
------------------------------------------------------------------------------
| Linearized
ckdu_3cat | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
stunting | .4876653 .3055801 1.60 0.112 -.113593 1.088924
-------------+----------------------------------------------------------------
/cut1 | 2.569918 .207242 2.162149 2.977686
/cut2 | 6.545487 .6893908 5.189044 7.90193
------------------------------------------------------------------------------
Example B:
meologit ckdu_3cat stunting [pweight = w_ind_norm] || hogar:
Fitting fixed-effects model:
Iteration 0: log likelihood = -241.89419
Iteration 1: log likelihood = -237.86994
Iteration 2: log likelihood = -237.30416
Iteration 3: log likelihood = -237.3033
Iteration 4: log likelihood = -237.3033
Refining starting values:
Grid node 0: log likelihood = -232.26489
Fitting full model:
Iteration 0: log pseudolikelihood = -232.26489
Iteration 1: log pseudolikelihood = -227.83392
Iteration 2: log pseudolikelihood = -227.57108
Iteration 3: log pseudolikelihood = -227.56916
Iteration 4: log pseudolikelihood = -227.56916
Mixed-effects ologit regression Number of obs = 773
Group variable: hogar Number of groups = 313
Obs per group:
min = 1
avg = 2.5
max = 10
Integration method: mvaghermite Integration pts. = 7
Wald chi2(1) = 2.57
Log pseudolikelihood = -227.56916 Prob > chi2 = 0.1091
(Std. Err. adjusted for 313 clusters in hogar)
------------------------------------------------------------------------------
| Robust
ckdu_3cat | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
stunting | .5356627 .3342881 1.60 0.109 -.1195299 1.190855
-------------+----------------------------------------------------------------
/cut1 | 3.25844 .3152659 2.64053 3.87635
/cut2 | 7.401028 .7907553 5.851176 8.95088
-------------+----------------------------------------------------------------
hogar |
var(_cons)| 1.625323 .5789674 .808588 3.267021
------------------------------------------------------------------------------
Constant weight variable in panel data
Dear users,
I am working with a static panel dataset including variables at the household level for three years. In the first wave only rural households were included in the survey; in the second, urban households were added, increasing the number of households from around 4,000 to 4,500.
The cross-sectional datasets provide a weight value per household for each year. I run a fixed-effects regression at the household level, and therefore I need a weight variable that is constant within HH_ID.
Unfortunately, I do not know how to construct it. Can I just take the average of the weights for the three years?
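If you do go with the within-household average you describe, a minimal sketch (weight, y, x, and year are placeholder names, and HH_ID is assumed numeric; note that -xtreg, fe- requires the pweight to be constant within panel, which is exactly what the averaging achieves):
Code:
* a sketch with placeholder variable names
bysort HH_ID: egen wt_mean = mean(weight)
xtset HH_ID year
xtreg y x [pweight = wt_mean], fe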
I thank you very much in advance.
Systems of simultaneous equations with non-linear constraints
Dear Stata users,
I have the following issue: I have to estimate a system of simultaneous equations with non-linear constraints of the form 0 < a < 1, using OLS.
Example 2 in https://www.stata.com/support/faqs/s...nstraints/#ex2 shows how to impose non-linear constraints of the form 0 < a < 1 using the nl command, which makes use of the inverse logit function.
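Roughly, the single-equation idea from that FAQ looks like this (a sketch with placeholder names): the inverse logit maps any real number into (0, 1), so the coefficient on x is bounded by construction.
Code:
* a sketch: invlogit({theta}) lies strictly between 0 and 1 for any value of theta
nl (y = {b0} + invlogit({theta}) * x)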
In my case, I have to run a pair of regressions simultaneously with a non-linear constraint of the form 0 < [eqn1]beta + [eqn2]beta < 1.
Originally, I used reg3 and imposed a linear constraint of the form [eqn1]beta + [eqn2]beta = 1, but sometimes I got negative coefficients because I could not also impose positivity of the estimated coefficients.
So, I would like to know whether there is a trick or a way you can suggest to run simultaneous regression equations with a non-linear constraint of the form 0 < a < 1, without using any reparameterization of the coefficients as nl does.
Thank you,
Anna
egen maxdate gives dates 01/01/2500
Hello everyone
I am trying to get the difference between two dates to calculate follow-up per patient.
I am using a long-format dataset with multiple observations per patient id and no missing values.
I can't spot why, for some patients, I get 01/01/2500 as the maxdate, which is wrong.
edate is a float with format %dM_d,_CY; is this an issue?
This is what I'm using:
bysort patid : egen maxdate = max(edate)
Does anyone know why this might be happening?
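One quick check (a sketch): -egen, max()- simply returns the largest non-missing edate per patient, so a future-dated placeholder such as 01jan2500 in the raw data would propagate straight into maxdate. This lists where such values occur:
Code:
* check whether 01jan2500 already exists in edate itself
count if edate == td(01jan2500)
list patid edate if edate == td(01jan2500), sepby(patid)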
Thank you,
Louisa
Excel Table split by socio-demographics using Putexcel
Hello fellow Stata community,
I'm looking for a way to automate my analysis output to Excel using putexcel. I already found this code by eric_a_booth, which helps me create summary tables for each of the four example variables, each on its own sheet:
foreach j of varlist var1 var2 d_var1 d_var2{
    di `"`j'"'
    su `j'
    putexcel A3=rscalars using `"test.xlsx"', sheet("S_`j'") modify keepcellf
    putexcel A1=(`" Example for `j' "') using `"test.xlsx"', sheet("S_`j'") modify keepcellf
}
di `"{browse `"test.xlsx"': Click to open `"test.xlsx"' }"'
What I'm now looking for would be an extension of this. My goal is to create automated tables like the one attached, displaying the values of each variable over the values of a socio-demographic variable. In a shortened attempt, I programmed the following:
sum var1
*return list
putexcel A1=("Table 1") B3=("Obs") C3=("Total") D3=("Female") E3=("Male") F3=("Divers") using results, replace
putexcel C4=matrix(r(mean)*100) using results, modify
sum var1 if gender==1
putexcel D4=matrix(r(mean)*100) using results, modify
sum var1 if gender==2
putexcel E4=matrix(r(mean)*100) using results, modify
sum var1 if gender==3
putexcel F4=matrix(r(mean)*100) using results, modify
...
Since I want to run this over dozens of variables, it seems inevitable to create some kind of loop, as sketched below.
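A sketch of such a loop, in the same pre-Stata-14 putexcel syntax used above (the variable list, the file name results, and the 1/2/3 coding of gender are all placeholders):
Code:
* one row per variable: overall mean in C, group means in D-F (all in percent)
putexcel A1=("Table 1") B3=("Obs") C3=("Total") D3=("Female") E3=("Male") F3=("Divers") using results, replace
local row = 4
foreach v of varlist var1 var2 d_var1 d_var2 {
    quietly summarize `v'
    putexcel A`row'=("`v'") B`row'=(r(N)) C`row'=(r(mean)*100) using results, modify
    forvalues g = 1/3 {
        local c : word `g' of D E F
        quietly summarize `v' if gender == `g'
        putexcel `c'`row'=(r(mean)*100) using results, modify
    }
    local ++row
}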
I hope the example table helps you understand where I'm trying to head with this; otherwise I'll try to specify my challenge further.
Thanks in advance for any help; I'd be more than thankful for any advice.
Regards
Marvin
Exporting detailed summary statistics to Word
Hello helpful Stata users,
I am trying to generate a Word export of the summary of my dependent variable, produced by the summarize command with the detail option.
Should basically look like this in the end:
[attached screenshot: the output of -summarize, detail- as displayed in Stata]
Now I have tried the asdoc export, which somehow seems to transform it into a one-line table, dropping all the useful percentile information and only reporting Obs, Mean, and so on:
asdoc sum Exports_USD_Hundred_Million, detail
I have also tried using esttab; however, I am unfamiliar with the command and couldn't produce what I was looking for.
Is there an option for asdoc that just exports the table as it normally appears in Stata without the asdoc prefix? Or any other way I can generate output such as in the above picture?
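One alternative worth trying (a sketch using the community-contributed estout package, which would need to be installed; the statistics listed in cells() are just an example to edit):
Code:
* ssc install estout    // if not already installed
estpost summarize Exports_USD_Hundred_Million, detail
esttab using summary.rtf, cells("count mean sd min max p1 p25 p50 p75 p99") noobs nonumber replace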
Thank you very much for any help and advice!
Kind regards,
Till
ppmlhdfe Pseudo R2 0.99+
Here is the model to estimate the impact of free trade agreements (fta) and preferential trade agreements (pta) on exports (exports), for annual data from 1990-2018 at 4-year intervals:
ppmlhdfe exports fta pta , a(im#year ex#year im#ex) cluster(im#ex)
The pseudo R2 remains as high as 0.99+, and it stays there across various model specifications. Should this make me doubt something? Is there some alternative measure I could additionally calculate?
Bernanke Sims Decomposition Restriction for VAR IRFs
Hi all,
I'm wondering if anyone knows whether Stata has the functionality to do a Bernanke-Sims decomposition when producing IRFs from VAR models, such that each variable can only affect the others after a lag of one, but each variable can affect itself contemporaneously. I know there is the Cholesky decomposition, but this doesn't prevent contemporaneous influence other than through the ordering.
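For reference, a sketch of the kind of non-recursive (Bernanke/Sims-style) restriction that -svar- can impose, with three placeholder variables; missing entries in the constraint matrices are left free, so a diagonal B means no variable responds to another variable's shock within the period:
Code:
* a sketch with placeholder variable names y1 y2 y3 (assumes the data are tsset)
matrix A = I(3)                            // own contemporaneous coefficients fixed at 1
matrix B = (., 0, 0 \ 0, ., 0 \ 0, 0, .)   // off-diagonal contemporaneous impacts set to 0
svar y1 y2 y3, lags(1/2) aeq(A) beq(B)
irf set myirfs, replace
irf create diagonal, step(8) replace
irf graph sirf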
Thanks in advance!
Importance of misspecification test vs. R-sq. and consequences of xtsktest
Dear Statalist Members,
I am analyzing a balanced panel of around 2400 firms over 12 years (Stata 13). The output I am able to present here is based on test data, as I am not allowed (or able) to extract the original files. The only differences are the number of firms, which is higher in the original dataset, and that most of my explanatory variables turn out to be significant, unlike in this sample data. The F-statistic in the original is F(11, 13432), Prob > F = 0.0000, and the overall R-sq. is 0.9639.
My goal is to analyze the effect of investments in computers (investict) and of product and process innovations on the demand for high-skilled workers. Controls include the size of the firm in terms of employees (total), the industry, a dummy for West Germany (west), a dummy for a collective bargaining agreement (collective), the state of the art of the production equipment (tech), whether the firm deals with R&D, and some more.
I have used xtserial and xttest3, which led me to include cluster-robust standard errors. Using xtoverid made me decide to use fixed effects, and -testparm- made me include year fixed effects. So my regression is now the one shown in the first code block below.
I originally intended to use the share of high-skilled employees as my dependent variable, but after reading the paper by Kronmal (1993) and several posts in this forum concerning the problems with ratios, I have switched to using the absolute number of high-skilled employees (highskill) and including the total number of employees as a control. This has increased my R-squared a lot (it was only 0.016 before).
On the other hand, I tested my model specification using the check shown in the second code block below.
The p-value was 0.8 before, when using the share; now it is significant (0.0000), telling me my model is misspecified. Now my question is whether the test I used for misspecification is the right thing to do here and, if yes, what else I can do now concerning my specification. Or is a high R-sq. enough to argue that my model fits?
Also, I don't understand why the dummy for west would be omitted; none of the regressors are highly correlated.
I have read many posts in this forum and run several tests that led me to this fixed-effects regression model, so I am confused about the result of the specification test. I have also tried -areg, absorb(idnum) vce(cluster idnum)-, which has slightly different coefficients and a higher R-sq. (as is normal) than -xtreg, fe-, but it gives the same result in the misspecification test.
Testing for normality, I ran the random-effects regression shown in the third code block below (re, because it is not possible with fe) and then -xtsktest-, whose output is in the last code block.
Could this mean I should transform my data using logs, as there are issues with normality? Or what are the consequences?
I appreciate any input on my issues, thanks in advance,
Helen
Code (the fixed-effects regression):
xtreg highskill investict product_inno process_inno total west industry collective exportshare investment turnover rnd t > ech i.year, fe vce(cluster idnum) note: west omitted because of collinearity Fixed-effects (within) regression Number of obs = 4344 Group variable: idnum Number of groups = 498 R-sq: within = 0.1005 Obs per group: min = 1 between = 0.5034 avg = 8.7 overall = 0.4393 max = 11 F(21,497) = 2.60 corr(u_i, Xb) = 0.3892 Prob > F = 0.0001 (Std. Err. adjusted for 498 clusters in idnum) ------------------------------------------------------------------------------ | Robust highskill | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- investict | .7032893 .2711382 2.59 0.010 .170571 1.236008 product_inno | .2723859 .6988765 0.39 0.697 -1.100731 1.645503 process_inno | -.3938082 .4501978 -0.87 0.382 -1.278334 .4907173 total | .101938 .0245108 4.16 0.000 .0537805 .1500954 west | 0 (omitted) industry | .1624997 .1911486 0.85 0.396 -.2130592 .5380586 collective | -.2838042 .5861356 -0.48 0.628 -1.435413 .8678049 exportshare | .8483747 2.351452 0.36 0.718 -3.771638 5.468387 investment | 1.44e-06 5.98e-07 2.41 0.016 2.68e-07 2.62e-06 turnover | -1.99e-07 1.39e-07 -1.43 0.153 -4.73e-07 7.46e-08 rnd | -1.103514 .9824249 -1.12 0.262 -3.033732 .8267042 tech | -.6756037 .2828397 -2.39 0.017 -1.231313 -.1198947 | year | 2008 | .0310991 .3815399 0.08 0.935 -.7185309 .7807291 2009 | .4981931 .3197414 1.56 0.120 -.1300184 1.126405 2010 | .7890588 .4913133 1.61 0.109 -.1762483 1.754366 2011 | 1.109093 .5630923 1.97 0.049 .0027585 2.215428 2012 | 1.189345 .5407669 2.20 0.028 .126874 2.251816 2013 | .0965383 .7094676 0.14 0.892 -1.297387 1.490464 2014 | .4120097 .6609871 0.62 0.533 -.8866637 1.710683 2015 | -.1867301 .7267681 -0.26 0.797 -1.614647 1.241187 2016 | .1137137 .5447759 0.21 0.835 -.956634 1.184061 2017 | -.4267298 .7349041 -0.58 0.562 -1.870632 1.017172 | _cons | 4.706464 2.350515 2.00 0.046 .0882924 9.324636 -------------+---------------------------------------------------------------- sigma_u | 22.632204 sigma_e | 7.5596268 rho | .89962854 (fraction of variance due to u_i) ------------------------------------------------------------------------------
Code (the specification test):
predict fitted, xb
g sq_fitted=fitted^2
xtreg highskill fitted sq_fitted
test sq_fitted
Code (the random-effects regression used for the normality test):
xtreg highskill investict product_inno process_inno total west industry collective exportshare investment turnover rnd tech, re vce(cluster idnum)
Code (the xtsktest output):
xtsktest (running _xtsktest_calculations on estimation sample) Bootstrap replications (50) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 Tests for skewness and kurtosis Number of obs = 4344 Replications = 50 (Replications based on 498 clusters in idnum) ------------------------------------------------------------------------------ | Observed Bootstrap Normal-based | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- Skewness_e | -1805.438 1230.613 -1.47 0.142 -4217.396 606.5195 Kurtosis_e | 456552.4 194447.7 2.35 0.019 75441.97 837662.8 Skewness_u | 12182.3 2960.393 4.12 0.000 6380.038 17984.56 Kurtosis_u | 1510700 274557.2 5.50 0.000 972577.4 2048822 ------------------------------------------------------------------------------ Joint test for Normality on e: chi2(2) = 7.67 Prob > chi2 = 0.0217 Joint test for Normality on u: chi2(2) = 47.21 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------
Testing for weak instruments in 2SLS regression using robust SE (just-identified model)
Hi,
I'm currently struggling to test for weak instruments when conducting a 2SLS regression using robust standard errors (vce(robust)). My model is just-identified (i.e., one instrument and one endogenous variable).
estat firststage provides me with the robust F statistic. However, I do not know what to compare it to. Does the threshold of 10 apply here?
Using weakivtest provides me with an effective F statistic that equals the robust F statistic, since my model is just-identified.
Using ivreg2 provides me with weak identification tests:
- Cragg-Donald Wald F statistic: is this statistic valid when using vce(robust)? What can I compare it to? Does the threshold of 10 apply here?
- Kleibergen-Paap rk Wald F statistic: this statistic again equals the robust F statistic.
My main question is:
- Does the rule of thumb robust F statistic / effective F statistic / Kleibergen-Paap rk Wald F statistic > 10 apply in this context?
Export summary statistics by 5 groups
Dear all,
Recently I presented my research project to faculty. One of the comments for improvement concerned the presentation of the descriptive statistics.
Indeed, I presented my summary statistics by 5 groups in this way:
Group1 _____Mean Std.dev
- var1
- var2
- var3
Group2
-var1
-var2
-var3
Nevertheless, they suggested that it would be better to present them in this way:
________Group1 ________Group2
_______Mean. Std.dev__ Mean.Std.dev
var1
var2
var3
Do you have any suggestions on how to solve it?
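One way to get that layout (a sketch using the community-contributed estout package; group stands for the 5-group variable and var1-var3 for your variables):
Code:
* ssc install estout    // if not already installed
estpost tabstat var1 var2 var3, by(group) statistics(mean sd) columns(statistics) nototal
esttab, main(mean) aux(sd) unstack nostar noobs nonote nonumber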
Generating a new variable with standardized values compared to a healthy control group mean and SD (z scores)
I'm working with cross-sectional test data with a selection of test results (all continuous variables) in both a patient and a healthy control (HC) group. I have made separate variables for the two groups' test results, e.g. test1_controls, test1_patients, test2_controls, test2_patients, etc. There are 95 patients and 48 healthy controls. I've attached the data for test 1.
In order to compare the patient group to the healthy control group, I want to make a new variable for each test with z-scores, generating a new value for each patient that compares them to the mean of the HC group, so I can see how far above or below the "normal" data they fall. So if the test 1 z-score for patient number 1 was -2.3, that would mean they were 2.3 standard deviations below the mean of the healthy control group data for that test. What code would you use for this?
I can generate a standardized value using egen test1_patients_z = std(test1_patients),
but these standardized values are only based on the mean and SD of the patient group, and I haven't been able to find any options that give me the result I need comparing to the healthy control mean.
I tried egen test1_patients_z = std(test1_patients), mean(#) std(#)
with # being the manually entered mean and SD of test1_controls (which I got from codebook), but that did not work (it just shifted the whole scale upwards by the HC mean rather than centering it at the usual z-score mean of 0).
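One workaround (a sketch, assuming the wide layout described above): summarize stores the control group's mean and SD in r(), which can then be used directly:
Code:
* z-scores for patients relative to the healthy-control mean and SD
foreach t in test1 test2 {
    quietly summarize `t'_controls
    generate `t'_patients_z = (`t'_patients - r(mean)) / r(sd)
}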
Thanks for your help!
Linear mixed effects models with random effects for subject
I have a dataset of 16 patients with 10 variables. The dependent variable is "bmizpre". It is a longitudinal study with a total of 3 time points (the time variable is "point"). The independent variables are "gender", "drug", "bmicategory" and "diseasetype". The variable identifying patients is "ptid". I would like to get results as the mean difference (95% CI) in bmizpre for the different covariates.
I am interested in analyzing bmizpre over time by gender, drug, "bmicategory" and "diseasetype", with random effects for subject.
My question is: do I have to run a separate univariate model for each covariate, including time, the covariate, and the time-by-covariate interaction? Do I have to repeat this for all covariates?
This is the command that I used:
Code:
mixed bmizpre gender##c.point || ptid: point
Following is the output
Code:
mixed bmizpre gender##c.point || ptid: point

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:   log likelihood = -46.530546
Iteration 1:   log likelihood = -45.773561
Iteration 2:   log likelihood = -45.698352
Iteration 3:   log likelihood = -45.698038
Iteration 4:   log likelihood = -45.698038
Computing standard errors:

Mixed-effects ML regression                     Number of obs     =         32
Group variable: ptid                            Number of groups  =         16
                                                Obs per group:
                                                              min =          2
                                                              avg =        2.0
                                                              max =          2
                                                Wald chi2(3)      =       1.46
Log likelihood = -45.698038                     Prob > chi2       =     0.6918

--------------------------------------------------------------------------------
       bmizpre |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      1.gender |      .1375   .5429129     0.25   0.800    -.9265897     1.20159
         point |    .225625   .1912063     1.18   0.238    -.1491324    .6003824
               |
gender#c.point |
            1  |   -.179375   .2704065    -0.66   0.507     -.709362     .350612
               |
         _cons |    -.21125   .3838974    -0.55   0.582    -.9636751     .541175
--------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ptid: Independent            |
                  var(point) |   1.06e-18   1.40e-17      6.07e-30    1.86e-07
                  var(_cons) |   .5940602   .3300578      .1999428    1.765042
-----------------------------+------------------------------------------------
               var(Residual) |   .5849574   .2068139      .2925356    1.169687
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 4.69                  Prob > chi2 = 0.0960
Note: LR test is conservative and provided only for reference.
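An alternative to running one model per covariate is to put all covariates and their interactions with time into a single mixed model; a minimal sketch using the variable names described above (whether this is preferable to separate univariate models is a modelling choice, and the very small sample limits how many terms can sensibly be estimated):
Code:
mixed bmizpre i.gender##c.point i.drug##c.point i.bmicategory##c.point ///
    i.diseasetype##c.point || ptid: point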
Unbalanced panel data with a gap in the time series
I have data in the following format, where there is a gap in the time series.
I want to estimate a GMM model from these data. Can I run a GMM model when the time series is broken? For example, will the observations for var1 (4 obs) be included in the analysis given that the observation in the third period is missing?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte period str1 country byte(var1 var2) float var3
1 "A" 11  . 1.2
2 "A" 12 23 1.5
3 "A"  . 25 1.4
4 "A" 13 19 1
5 "A" 14 18 1.6
1 "B" 14 12 2.1
2 "B" 15 16 1.6
3 "B" 13 15 1.8
4 "B" 15 14 1.4
5 "B" 15  . 13
end
Stata keybindings (keyboard shortcuts) using Hammerspoon (almost perfect, except for one thing)
I've installed Hammerspoon on macOS.
As a result, I can now use Emacs keybindings in Stata as well.
It's very useful.
Unfortunately, Ctrl-b doesn't work anymore, so I can't re-run the old command freely.
I can't use Ctrl-r and Ctrl-b to select freely when running it.
That's the only thing that's really disappointing.
Does anyone know of a better way to do this?
I could find keywords such as terminfo and termcap, but I don't know how to set them up.
Thank you in advance.
Somebody please give me some advice.
Please refer to the following URL for hammerspoon.
https://gist.github.com/justintanner...5d5196cf22e98a
Monday, June 29, 2020
How to test coefficient differences across two groups
Hi there,
I have a problem in my research project, where I want to test the coefficient difference across two groups, as shown in the attached pictures (not reproduced here). Could you tell me how to test this in Stata, please?
Moreover, could you also tell me other ways to test the difference in Stata, apart from suest, bootstrap, and the Chow test?
Thanks for your help!
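Beyond suest, bootstrap, and the Chow test, a fully interacted single regression also tests whether a coefficient differs across the two groups; a minimal sketch with hypothetical names y, x, and a 0/1 indicator group:
Code:
regress y c.x##i.group, vce(robust)
test 1.group#c.x
The test on the interaction term is the test of equality of the coefficient on x across the two groups.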
Residuals as Dependent Variable - Interpretation of Coefficients?
Hi there,
I am trying to model daily stock market trading volume with a set of independent variables such as stock returns, stock price volatility, etc., in a time-series regression model.
As a first step, I estimate a rolling AR(1) model of trading volume over a ten-day window in order to use the residuals of this model, as "shocks" to volume, as the dependent variable in my main regression setup.
Is there any way to compute the effect of an independent variable on volume given that choice of the dependent variable?
I mean, can you make a meaningful statement like: "Given the estimated coefficient of independent variable X: if one increases X by one unit (one percent), c.p. the effect on stock market volume amounts to ..."?
Thanks!
Saving a file takes forever
Hi, I have a problem with saving an appended file. I have a couple of files to append, but it takes hours until the file is saved! Sorry if my question is very simple. Any suggestions?
Measuring Knowledge Level based on survey answers
Hi!
I asked my respondents a few questions with four options each.
For example,
"Which one is the largest state in area? Texas, Alaska, Delaware, Maine"
"Which state is located in the West coast? California, Nevada, New York, Illinois"
...
...
I want to develop a measure of "Geography Knowledge". What is the standard practice or tools that are used frequently? If you can provide some link to published work that will do too.
I am doing some obvious one such as,
1) Dummy for correct answer - adding up the correct-answer dummies for each respondent.
2) Rank of correct answers - such as for the first question - Alaska (4), Texas (3), Maine (2), Delaware (1).
The problem is that I am assuming equal distances between the "correctness" of the answers. For example, I want to give more points to Texas (more than 3) and fewer points to someone who answered Delaware as the largest state (less than 1).
PS. I do know the correct answers to each question.
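For the simple additive score in option 1, correct-answer dummies can be built and summed directly; a minimal sketch, assuming the responses are stored in numeric variables q1, q2, ... with the codings noted in the comments (all names and codings are assumptions):
Code:
generate byte correct1 = (q1 == 2)   // assuming option 2 codes "Alaska"
generate byte correct2 = (q2 == 1)   // assuming option 1 codes "California"
egen knowledge_score = rowtotal(correct1 correct2)
For a weighted rather than equal-interval score, item response theory models (for example Stata's irt commands in recent versions) are a common alternative to ad hoc point schemes.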
How to draw a twoway picture like these
How can I draw a twoway graph like these two pictures? Thanks.
[attached example graphs not shown]
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year double(telephone growth)
2005  57.22 14.4
2006  63.39 10.8
2007  69.45  9.600000000000001
2008  74.29  7
2009  79.89  7.5
2010  86.41  8.200000000000001
2011  94.81  9.700000000000001
2012 103.1   8.700000000000001
2013 109.95  6.6000000000000005
2014 112.26  2.1
2015 109.3  -2.6
2016 110.55  1.1
end
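A common way to reproduce that style of chart (a bar series with a line on a second axis) is to overlay two twoway plots; a minimal sketch using the variables in the example data above, with the axis titles as assumptions:
Code:
twoway (bar telephone year, yaxis(1)) ///
       (connected growth year, yaxis(2) lcolor(red)), ///
       ytitle("Telephone subscriptions", axis(1)) ///
       ytitle("Growth (%)", axis(2)) xlabel(2005(2)2016)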
Convert SIF date to year
Hi all,
Below are a few observations from my large panel dataset, which I got after linking several datasets together. My question is about the variable mth, of type int. I thought it was in Stata's internal format (SIF) for time, but now I am not sure. What I need is to generate a year variable out of this mth variable.
Code:
clear
input int mth str5 ticker
285 "A"
286 "A"
287 "A"
312 "A"
313 "A"
314 "A"
315 "A"
317 "A"
318 "A"
end
I thought mth was an SIF time value because, in the earlier merging process across 3 Stata datasets, I used
Code:
gen mth=month(date) // I assume this is a month stored SIF time
Please let me know if a year variable can be created from mth.
Thank you,
Rochelle
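If mth really is a monthly SIF date (display format %tm), the year can be recovered by converting to a daily date first; a minimal sketch, with a display line to sanity-check what a value like 285 would mean:
Code:
display %tm 285              // shows 1983m10 if mth is a monthly SIF value
generate year = year(dofm(mth))
Note that gen mth = month(date) only returns the calendar month (1 to 12), so values such as 285 suggest mth was actually created some other way, for example as a monthly SIF with mofd().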
Descriptive Statistics and Matrices
I have the code below. I want to export the results from the code below to Excel, and I also want to add standard deviations for the two groups, SCTP and NonSCTP.
Code:
local vars "HAZ WAZ WHZ Female melevel region twins children_U5 water_source helevel year district AdjHAZ newcage newcage_6 newcage_11 newreligion newwealthsco newhhsex newwage newhhweight newchild_weight agediff"
matrix means = J(23, 3, -99)
matrix colnames means = NonSCTP SCTP t-value
matrix rownames means = `vars'
local irow = 0
qui {
foreach var of varlist `vars' {
local ++irow
sum `var' if SCTP == 0
matrix means[`irow',1] = r(mean)
sum `var' if SCTP == 1
matrix means[`irow',2] = r(mean)
ttest `var', by(SCTP)
matrix means[`irow',3] = r(t)
}
}
matrix list means, format(%15.4f)
Code:
local vars "HAZ WAZ WHZ Female melevel region twins children_U5 water_source helevel year district AdjHAZ newcage newcage_6 newcage_11 newreligion newwealthsco newhhsex newwage newhhweight newchild_weight agediff"
matrix means = J(23, 3, -99)
matrix colnames means = NonSCTP SCTP t-value
matrix rownames means = `vars'
local irow = 0
qui {
foreach var of varlist `vars' {
local ++irow
sum `var' if SCTP == 0
matrix means[`irow',1] = r(mean)
sum `var' if SCTP == 1
matrix means[`irow',2] = r(mean)
ttest `var', by(SCTP)
matrix means[`irow',3] = r(t)
}
}
matrix list means, format(%15.4f)
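One way to add the standard deviations and export is to widen the matrix and write it out with putexcel (Stata 14 or newer); a minimal sketch reusing the `vars' local defined above, with the file name results.xlsx as an assumption:
Code:
matrix means = J(23, 5, -99)
matrix colnames means = NonSCTP_mean NonSCTP_sd SCTP_mean SCTP_sd t_value
matrix rownames means = `vars'
local irow = 0
qui {
    foreach var of varlist `vars' {
        local ++irow
        sum `var' if SCTP == 0
        matrix means[`irow',1] = r(mean)
        matrix means[`irow',2] = r(sd)
        sum `var' if SCTP == 1
        matrix means[`irow',3] = r(mean)
        matrix means[`irow',4] = r(sd)
        ttest `var', by(SCTP)
        matrix means[`irow',5] = r(t)
    }
}
putexcel set results.xlsx, replace
putexcel A1 = matrix(means), names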
xtreg Y X1 X2...Xn i.country, fe- Is it possible/ correct/ feasible?
I declared the panel data as xtset year (instead of xtset panelid year):
xtset YEAR
panel variable: YEAR (balanced)
. xtreg logXijt logGDPitGDPjt logPCGDPitPCGDPjt logPCGDPDijt TRGDPit TRGDPjt logDistanceij Borderij i.PanelIDIndoASEAN, fe
Is this the correct way of computing country-wise fixed effects?
.reg logXijt logGDPitGDPjt logPCGDPitPCGDPjt logPCGDPDijt TRGDPit TRGDPjt logDistanceij Borderij i.PanelIDIndoASEAN. This is also a country-wise fixed-effects command.
I was told to use the command in point 2 for country-wise fixed effects, and now I am confused about whether that is possible. Please answer this query.
Regards
Saba Gulnaz
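For reference, the usual country-fixed-effects setup declares the panel identifier (not the year) as the panel variable and lets xtreg absorb it; a minimal sketch with the variable names above. Note that time-invariant regressors such as logDistanceij and Borderij are dropped by the within transformation:
Code:
xtset PanelIDIndoASEAN YEAR
xtreg logXijt logGDPitGDPjt logPCGDPitPCGDPjt logPCGDPDijt TRGDPit TRGDPjt i.YEAR, fe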
Limitation of arguments in mata
Cheers together,
Currently, I am trying to find the internal rate of return, or cost of equity, for companies over time. I have many companies and, for each company, a time series of data. Hence, I am trying to find the cost of equity for each year for a given company. This requires solving an equation like: Price = function(cost of equity, earnings0, earnings1, earnings2, ...). Here the cost of equity is the only unknown parameter, while I have 13 input parameters.
In this context I wrote some Mata code which loops over all observations and determines the cost of equity (here k). Nevertheless, optimize() allows only 9 additional input arguments...
Does anyone have an idea how to increase the number of arguments, or can you recommend any other command that allows for at least 13 arguments?
Thanks a lot for your help and consideration!
Best,
Dominik
Code:
mata: mata clear
mata
P = st_data(.,("me"))
GL = st_data(.,("g_l"))
B0 = st_data(.,("be"))
B1 = st_data(.,("be_1"))
B2 = st_data(.,("be_2"))
B3 = st_data(.,("be_3"))
B4 = st_data(.,("be_4"))
B5 = st_data(.,("be_5"))
E1 = st_data(.,("E_1"))
E2 = st_data(.,("E_2"))
E3 = st_data(.,("E_3"))
E4 = st_data(.,("E_4"))
E5 = st_data(.,("E_5"))
void eval0(todo, k, p, gl, b0, b1, b2, b3, b4, b5, e1, e2, e3, e4, e5, v, g, H) {
v = (p :- b0 :- ((e1 :- k :* b0) :/ (1 :+ k)) :- ((e2 :- k :* b1) :/ (1 :+ k)^2) :- ((e3 :- k :* b2) :/ (1 :+ k)^3) :- ((e4 :- k :* b3) :/ (1 :+ k)^4) :- ((e5 :- k :* b4) :/ (1 :+ k)^5) :- (((e5 :- k :* b4) :* (1+gl)) :/ ((1 :+ k)^5) :* (k-gl)))^2
}
S = optimize_init()
optimize_init_which(S, "min")
optimize_init_conv_ptol(S, 1e-12)
optimize_init_conv_vtol(S, 1e-12)
optimize_init_evaluator(S, &eval0())
optimize_init_params(S, (0))
for(i=1;i<=235313;i++) {
p =P[i..i,1..1]
gl =GL[i..i,1..1]
b0 =B0[i..i,1..1]
b1 =B1[i..i,1..1]
b2 =B2[i..i,1..1]
b3 =B3[i..i,1..1]
b4 =B4[i..i,1..1]
b5 =B5[i..i,1..1]
e1 =E1[i..i,1..1]
e2 =E2[i..i,1..1]
e3 =E3[i..i,1..1]
e4 =E4[i..i,1..1]
e5 =E5[i..i,1..1]
optimize_init_argument(S, 1, p)
optimize_init_argument(S, 2, gl)
optimize_init_argument(S, 3, b0)
optimize_init_argument(S, 4, b1)
optimize_init_argument(S, 5, b2)
optimize_init_argument(S, 6, b3)
optimize_init_argument(S, 7, b4)
optimize_init_argument(S, 8, b5)
optimize_init_argument(S, 9, E1)
optimize_init_argument(S, 10, E2)
optimize_init_argument(S, 11, E3)
optimize_init_argument(S, 12, E4)
optimize_init_argument(S, 13, E5)
k= optimize(S)
k
ri = k
ri
st_matrix("r"+strofreal(i),ri)
if (i == 1) R = st_matrix( "r1")
if (i >= 2) R = R \ ri
}
R
end
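One workaround for the 9-argument limit of optimize_init_argument() is to bundle all 13 inputs for observation i into a single row vector and pass that one vector as the only extra argument. Below is a minimal sketch of the idea, reusing the data vectors (P, GL, B0, ..., E5) created in the code above; only part of the valuation formula is spelled out, so the evaluator body would need to be completed with the full expression:
Code:
mata:
// evaluator receives one data vector X = (p, gl, b0..b5, e1..e5) instead of 13 scalars
void eval1(todo, k, X, v, g, H)
{
    p  = X[1]
    gl = X[2]
    b0 = X[3]
    e1 = X[9]
    // ... unpack the remaining elements the same way and use the full formula
    v = (p - b0 - (e1 - k*b0)/(1 + k))^2
}

S = optimize_init()
optimize_init_which(S, "min")
optimize_init_evaluator(S, &eval1())
optimize_init_params(S, (0))
// inside the loop over observations, build the bundled argument for row i
i = 1
X = (P[i], GL[i], B0[i], B1[i], B2[i], B3[i], B4[i], B5[i], E1[i], E2[i], E3[i], E4[i], E5[i])
optimize_init_argument(S, 1, X)
k = optimize(S)
end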
How to import several excels
Dear Statalist,
I need to create a database using several hundred Excel files like the one shown in the attached screenshot below. The problem is that I do not know how to do it, since the names of the files do not follow a consecutive order; the first file could be NACE0113 and the second one NACE0119...
Another problem is that there are several rows at the beginning that I do not need; in fact, I only need row 11 onwards. However, I also need to distinguish between when x1 is the median (rows 14 to 36) and when x1 is the mean (rows 39 to 61) for the different years. So an optimal solution (as far as I can see) would be to store the medians first as variables (mex1 mex2...) and then the means as different variables (mnx1 mnx2...) for the different years.
I do not know if this is possible, but I am pretty lost here.
Do you know of any way that help to solve this, or any other solution?
Thanks in advance. [attached screenshot not shown]
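A loop over all Excel files in a folder does not need consecutive file names; the dir extended macro function lists whatever files are there. A minimal sketch, where the folder name, the cellrange, and the variable handling are all assumptions that would need adjusting to the real layout (in particular the median versus mean rows):
Code:
clear
tempfile building
save `building', emptyok
local files : dir "excels" files "*.xlsx"
foreach f of local files {
    import excel using "excels/`f'", cellrange(A11) clear
    generate source = "`f'"          // keep track of which file each row came from
    append using `building'
    save `building', replace
}
use `building', clear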
wildcard in string variable
I'm trying to link different business locations to appropriate zip codes using a do-file. I have business names in a name variable and created a new variable to hold the zipcode. Some of the business names are similar but they have a location indicator at the end such as "Office - Allendale" and "Office-Grand Rapids". I tried using a replace command with an if statement but it doesn't appear that I can use wildcards in that. Is there a way for me to do this? Below is how I'm currently doing it...
replace zip="49401" if locationname==" Office - Allendale"
This works for just a few instances, but I have a health system with hundreds of entries, so I am trying to avoid having to individually manage each one since they have the embedded location descriptor. Thank you!
replace zip="49401" if locationname==" Office - Allendale"
This works for just a few instances but I have a health system with hundreds of entries so trying to avoid having to individually manage each one since they have the embedded location descriptor" Thank you!
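Wildcard-style matching is available through string functions such as strmatch() or strpos() inside the if condition; a minimal sketch (the second zip code is a hypothetical illustration):
Code:
replace zip = "49401" if strmatch(locationname, "*Allendale*")
replace zip = "49503" if strmatch(locationname, "*Grand Rapids*")   // hypothetical zip for illustration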
How to organise my data, I'm a beginner!
Hi
I would like to know what commands to use to organise my data,
I currently have a dataset with over 1000 observations and would like to condense it all.
My data vary by country and there are many different variables, but these have been collected in different years. I would like to combine all the years into one observation.
For example:
I have something like this
Country Year Var1 Var2 Var3 Var4
1 2001 5
1 2002 7 2
1 2003 4
and I would like to collapse all the info by country into one single observation point (regardless of the date), such that I get:
Country Var1 Var2 Var3 Var4
1 5 7 2 4
Any ideas as to how i can do this?
Also, sorry I haven't used the dataex example thing; like I say, I'm new to Stata and that didn't really work - hope this is easy enough for you to help!
Thank you
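If each variable is recorded in only one year per country, collapse can pull the non-missing value into a single row per country; a minimal sketch with the column names from the example (firstnm keeps the first non-missing value):
Code:
collapse (firstnm) Var1 Var2 Var3 Var4, by(Country)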

estimating sample size for cohort studies
I am trying to compute the sample size for a cohort study: pregnant women beyond 20 weeks of gestation with normal blood pressure (<130/80 mmHg) versus pregnant women beyond 20 weeks of gestation with subclinical elevation of blood pressure (130-139/80-89 mmHg). The outcome of interest is the onset (incidence) of pregnancy-induced hypertension as per the existing threshold (140/90 mmHg). In a recent cohort study of 2,090 normotensive women, 1,318 (63.0%) remained normotensive for their entire antenatal course prior to delivery admission and 772 (37.0%) had new-onset blood pressure elevations between 130-139/80-89 mmHg. The incidence of pregnancy-induced hypertension in the normotensive group was 11.6% vs 32% in the blood pressure elevation group. https://www.ajog.org/article/S0002-9378(20)30635-9/pdf
Therefore, using 0.116 as the proportion with the outcome in the normotensive group, I would like to calculate the sample size for a cohort study powered at 80%, with a type I error of 5%.
Will the power twoproportions command in Stata suffice? How would I incorporate loss to follow-up (dropout) into the said syntax?
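power twoproportions handles exactly this two-group comparison; a minimal sketch using the two incidences quoted above, with a 10% loss to follow-up inflation shown purely as an illustrative assumption:
Code:
power twoproportions 0.116 0.32, power(0.8) alpha(0.05)
display "n allowing for 10% dropout: " ceil(r(N) / (1 - 0.10))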
Counting up within ID
Hi everyone,
Quick question that may be quite simple to answer but I am having trouble wrapping my head around it.
Below I have pasted some example data.
We have our participant ID and a variable called numberofsample.
Code:
clear
input float(ID numberofsample)
1 1
2 1
3 3
3 3
3 3
4 3
4 3
4 3
5 1
6 3
6 3
6 3
7 2
7 2
8 3
8 3
8 3
9 3
9 3
9 3
end
We can see that participant ID 1 was included in 1 sample, so it has one record. Participant ID 3 was included in 3 samples, so it has 3 records, etc.
I was wondering if there was any script that could essentially create a new var (lets call it experiment), that would make the data look like this (below)?
Code:
clear
input float(ID numberofsample experiment)
1 1 1
2 1 1
3 3 1
3 3 2
3 3 3
4 3 1
4 3 2
4 3 3
5 1 1
6 3 1
6 3 2
6 3 3
7 2 1
7 2 2
8 3 1
8 3 2
8 3 3
9 3 1
9 3 2
9 3 3
end
Basically, for any ID where numberofsample is 1, experiment would equal 1; if numberofsample is 2, then the first record within the ID would have experiment = 1 and the second record experiment = 2;
if numberofsample is 3, then the first record within the ID would have experiment = 1, the second record experiment = 2, and the third record experiment = 3.
Any help would be super appreciated!
Kind regards,
Ryan
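Numbering records within each ID is a one-liner with bysort and _n; a minimal sketch (if a specific order inside each ID matters, add the sorting variable inside the parentheses):
Code:
bysort ID: generate experiment = _n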
Producing publication-quality tables for the loneway command
Dear Stata users,
I computed different intraclass correlations using the loneway command. Which command can I use to produce publication-quality tables for these results?
Best, Anne
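There is no canned table command tied to loneway, but its results are left in r(), so they can be collected into a matrix and exported; a minimal sketch with hypothetical outcome and grouping variable names, assuming r(rho), r(lb), and r(ub) hold the ICC and its confidence bounds:
Code:
matrix icc = J(2, 3, .)
matrix colnames icc = rho lower upper
matrix rownames icc = outcome1 outcome2
local i = 0
foreach v in outcome1 outcome2 {
    local ++i
    loneway `v' groupvar
    matrix icc[`i',1] = r(rho)
    matrix icc[`i',2] = r(lb)
    matrix icc[`i',3] = r(ub)
}
putexcel set icc_table.xlsx, replace
putexcel A1 = matrix(icc), names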
standard deviation of a variable
Hi everyone,
I have a dataset covering 10 years. I want to calculate the standard deviation of one variable over years y-2 to y (a three-year window) in Stata 16. Can anybody say which command I have to use?
Thanks in advance.
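The community-contributed rangestat command (from SSC) computes statistics over a moving window defined in terms of the time variable; a minimal sketch, assuming a panel identifier id, a year variable year, and a variable x (all hypothetical names):
Code:
ssc install rangestat
rangestat (sd) sd3_x = x, interval(year -2 0) by(id)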
Nelson aalen cumulative hazard function
Hi all!
Hope you are doing well. I have a question about the Nelson-Aalen cumulative hazard function. I made a graph of the chance of discharge after a certain surgery.
As you can see in the exported graph, the curve begins at day 5. How can I change it so that it begins at 0 with a probability of 0? I can change the labels of the axis, but there will be no line drawn from 0.
The graph looks the way it does because there is no chance of discharge in the first 4 days. I hope you understand my question! Let me know if you know how to help
Kind regards,
Daniel
[exported graph not shown]
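One way to force the curve to start at (0, 0) is to save the Nelson-Aalen estimate with sts generate, append an artificial time-zero point, and draw the step function with twoway; a minimal sketch of that idea (axis titles are assumptions):
Code:
sts generate na_hat = na
generate t_plot = _t
local new = _N + 1
set obs `new'
replace t_plot = 0 in `new'
replace na_hat = 0 in `new'
sort t_plot
twoway line na_hat t_plot, connect(stairstep) ///
    ytitle("Cumulative hazard") xtitle("Days since surgery")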
Treatment Effect in Panel Data with Many Time Periods and Various Treatment Starts
Dear Statalist,
I have a question regarding the identification of a treatment effect in a large panel dataset (800,000 observations), spanning many time periods (monthly values for each pixel for 17 years, 2000-2017). The data are on a land management project, and the objective is to identify whether the project has had an observable effect on vegetation as measured by satellite data. Treatment happened in different treatment areas at different times between 2009 and 2015, so there is no "clean" pre-treatment and post-treatment period.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long id float(yrmo year month critws_implemstart ndvi ndvi_mean ndvi_an treatmentyear) 10869164 624 2012 1 . 1.2073 1.2279375 -1.6806648 0 10869164 672 2016 1 . 1.2025 1.2279375 -2.071561 0 10869164 636 2013 1 . 1.21 1.2279375 -1.4607764 0 10869164 480 2000 1 . . 1.2279375 . 0 10869164 528 2004 1 . 1.2136 1.2279375 -1.1676018 0 10869164 660 2015 1 . 1.232 1.2279375 .33084205 0 10869164 588 2009 1 . 1.1974 1.2279375 -2.486893 0 10869164 576 2008 1 . 1.252 1.2279375 1.959588 0 10869164 504 2002 1 . 1.2442 1.2279375 1.3243778 0 10869164 612 2011 1 . 1.2146 1.2279375 -1.0861704 0 10869164 600 2010 1 . 1.1985 1.2279375 -2.3973064 0 10869164 492 2001 1 . 1.2364 1.2279375 .6891677 0 10869164 684 2017 1 . 1.204 1.2279375 -1.949404 0 10869164 516 2003 1 . 1.2128 1.2279375 -1.2327528 0 10869164 540 2005 1 . 1.2132 1.2279375 -1.2001822 0 10869164 564 2007 1 . 1.2301 1.2279375 .1761145 0 10869164 648 2014 1 . 1.2241 1.2279375 -.3125132 0 10869164 552 2006 1 . 1.2212 1.2279375 -.5486819 0 10869165 492 2001 1 . 1.2211 1.2083875 1.0520201 0 10869165 624 2012 1 . 1.1841 1.2083875 -2.0099068 0 10869165 516 2003 1 . 1.193 1.2083875 -1.273394 0 10869165 684 2017 1 . 1.2126 1.2083875 .34860495 0 10869165 528 2004 1 . 1.1913 1.2083875 -1.4140712 0 10869165 540 2005 1 . 1.1849 1.2083875 -1.9437017 0 10869165 588 2009 1 . 1.1823 1.2083875 -2.1588707 0 10869165 636 2013 1 . 1.1968 1.2083875 -.9589226 0 10869165 648 2014 1 . 1.2069 1.2083875 -.1230974 0 10869165 480 2000 1 . . 1.2083875 . 0 10869165 672 2016 1 . 1.2016 1.2083875 -.56170213 0 10869165 612 2011 1 . 1.1981 1.2083875 -.8513431 0 10869165 564 2007 1 . 1.213 1.2083875 .3817124 0 10869165 504 2002 1 . 1.2263 1.2083875 1.482348 0 10869165 600 2010 1 . 1.1814 1.2083875 -2.2333524 0 10869165 576 2008 1 . 1.2369 1.2083875 2.3595476 0 10869165 660 2015 1 . 1.2336 1.2083875 2.0864604 0 10869165 552 2006 1 . 1.2006 1.2083875 -.6444511 0 10869166 480 2000 1 . . 1.2400625 . 0 10869166 648 2014 1 . 1.2619 1.2400625 1.7609978 0 10869166 684 2017 1 . 1.2456 1.2400625 .4465509 0 10869166 528 2004 1 . 1.2176 1.2400625 -1.8113997 0 10869166 540 2005 1 . 1.2199 1.2400625 -1.625923 0 10869166 564 2007 1 . 1.2531 1.2400625 1.0513633 0 10869166 552 2006 1 . 1.2385 1.2400625 -.1259998 0 10869166 492 2001 1 . 1.2311 1.2400625 -.7227468 0 10869166 600 2010 1 . 1.2223 1.2400625 -1.4323813 0 10869166 624 2012 1 . 1.198 1.2400625 -3.391968 0 10869166 516 2003 1 . 1.2231 1.2400625 -1.367877 0 10869166 672 2016 1 . 1.2359 1.2400625 -.335663 0 10869166 504 2002 1 . 1.2566 1.2400625 1.333606 0 10869166 576 2008 1 . 1.2806 1.2400625 3.268987 0 10869166 660 2015 1 . 1.3035 1.2400625 5.115676 0 10869166 636 2013 1 . 1.2509 1.2400625 .8739523 0 10869166 588 2009 1 . 1.2284 1.2400625 -.9404755 0 10869166 612 2011 1 . 1.2305 1.2400625 -.7711299 0 10869167 516 2003 1 . 1.2405 1.2541875 -1.0913434 0 10869167 684 2017 1 . 1.2396 1.2541875 -1.1631054 0 10869167 492 2001 1 . 1.244 1.2541875 -.8122794 0 10869167 552 2006 1 . 1.2647 1.2541875 .8381993 0 10869167 480 2000 1 . . 1.2541875 . 0 10869167 588 2009 1 . 1.2456 1.2541875 -.6847046 0 10869167 624 2012 1 . 1.2141 1.2541875 -3.1962895 0 10869167 648 2014 1 . 1.2474 1.2541875 -.5411806 0 10869167 540 2005 1 . 1.2301 1.2541875 -1.9205605 0 10869167 504 2002 1 . 1.2826 1.2541875 2.2654173 0 10869167 660 2015 1 . 1.2837 1.2541875 2.3531191 0 10869167 576 2008 1 . 1.2796 1.2541875 2.026217 0 10869167 612 2011 1 . 1.2445 1.2541875 -.7724063 0 10869167 636 2013 1 . 
1.2521 1.2541875 -.16644034 0 10869167 564 2007 1 . 1.2523 1.2541875 -.1504911 0 10869167 672 2016 1 . 1.2464 1.2541875 -.6209172 0 10869167 528 2004 1 . 1.2397 1.2541875 -1.1551307 0 10869167 600 2010 1 . 1.229 1.2541875 -2.0082717 0 10869185 684 2017 1 . 1.2134 1.227675 -1.1627634 0 10869185 564 2007 1 . 1.2184 1.227675 -.7554898 0 10869185 552 2006 1 . 1.1851 1.227675 -3.467938 0 10869185 540 2005 1 . 1.2016 1.227675 -2.123934 0 10869185 600 2010 1 . 1.1986 1.227675 -2.3682904 0 10869185 612 2011 1 . 1.2374 1.227675 .7921554 0 10869185 648 2014 1 . 1.2048 1.227675 -1.8632742 0 10869185 624 2012 1 . 1.2223 1.227675 -.437812 0 10869185 528 2004 1 . 1.2314 1.227675 .3034233 0 10869185 516 2003 1 . 1.2204 1.227675 -.59258235 0 10869185 660 2015 1 . 1.2069 1.227675 -1.69222 0 10869185 576 2008 1 . 1.2483 1.227675 1.6800046 0 10869185 480 2000 1 . . 1.227675 . 0 10869185 672 2016 1 . 1.1799 1.227675 -3.891495 0 10869185 636 2013 1 . 1.2167 1.227675 -.8939666 0 10869185 492 2001 1 . 1.219 1.227675 -.7066185 0 10869185 588 2009 1 . 1.1958 1.227675 -2.5963724 0 10869185 504 2002 1 . 1.2972 1.227675 5.663144 0 10869188 576 2008 1 . 1.2204 1.212275 .6702231 0 10869188 528 2004 1 . 1.2308 1.212275 1.528119 0 10869188 636 2013 1 . 1.2004 1.212275 -.979566 0 10869188 600 2010 1 . 1.1956 1.212275 -1.3755126 0 10869188 660 2015 1 . 1.2162 1.212275 .3237686 0 10869188 516 2003 1 . 1.2004 1.212275 -.979566 0 10869188 672 2016 1 . 1.196 1.212275 -1.342521 0 10869188 684 2017 1 . 1.2042 1.212275 -.6661029 0 10869188 612 2011 1 . 1.2205 1.212275 .6784735 0 10869188 504 2002 1 . 1.239 1.212275 2.2045274 0 end format %tm yrmo
At the moment I am employing a very simple type of difference-in-difference style estimation looking like this:
Code:
reg ndvi_an treatmentyear i.year i.month, r
Now, excuse me if the question is broad, but I simply wonder: isn't there a better way to test for the treatment effect in this case? Ideally I would want to see whether treatment areas deviate from their pre-treatment trend differently than control areas (and by doing so also somehow test the parallel-trends assumption), but I am a bit out of my econometric/coding depth here. I've played around with xtset and xtreg for a few days, but I can't really figure out if and how that would help.
As always thankful for any assistance,
Lars
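With staggered treatment timing, a common next step beyond the pooled regression is a two-way fixed-effects (generalized difference-in-differences) specification with pixel and time fixed effects and clustered standard errors; a minimal sketch using the variable names in the excerpt, assuming treatmentyear is a 0/1 indicator that switches on once an area is treated:
Code:
xtset id yrmo
xtreg ndvi_an treatmentyear i.yrmo, fe vce(cluster id)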
serial/cross sectional autocorrelation and heteroscedasticity in panel data
Hello everyone,
I am working with a panel dataset with N=170 and T=5. I want to test for heteroskedasticity/autocorrelation, but I get a bit lost in all the different commands. I know xttest2 tests for autocorrelation and xttest3 for heteroskedasticity. However, I am wondering why the BP test cannot be used in this case; is the BP test only for single-wave (cross-sectional) data? Furthermore, I am struggling to understand the difference between serial autocorrelation and cross-sectional autocorrelation. It seems that xttest2 tests for cross-sectional autocorrelation, but given that I am working with time series, I feel serial correlation is the relevant concern.
Kind regards,
Timea
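For a short panel like this, the tests most people reach for are the Wooldridge test for serial correlation (xtserial, from SSC) and the modified Wald test for groupwise heteroskedasticity after a fixed-effects regression (xttest3, from SSC); a minimal sketch with hypothetical variable names:
Code:
ssc install xtserial
ssc install xttest3
xtserial y x1 x2
xtreg y x1 x2, fe
xttest3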
hcost: error occurred while loading hcost.ado
Good day all,
I need to run the hcost command to calculate cost estimates based on censored data [https://www.stata-journal.com/articl...article=st0399].
Attached is a part of the data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(case_id castage6) int(treatmentduration days_post1year) byte censurepost1y double(hctotal hctotal_post1y)
 1 2  916  551 0   4899.58  1088.48
 2 2 1465 1100 0  10278.75   6236.8
 3 2  618  253 0    4245.7    889.3
 4 2  727  362 0  5500.966   1500.9
 5 2 1527 1162 0   3763.54   1721.7
 6 1 1197  832 0  48420.82   4053.8
 7 2 2670 2305 0 14814.779999999999 7231.4
 8 2 3909 3544 0  18826.22    14630
 9 1 1827 1462 0  61879.55    15288
10 2  435   70 0   2811.76      240
11 2  471  106 0   3628.82    377.6
12 2 3485 3120 0   5627.92     3683
13 2 4136 3771 0  15821.55  11648.6
14 2  741  375 0   9464.49   1245.5
15 2 1220  855 0   4998.21   1815.1
16 2  962  597 0      8375   1375.6
17 2  588  223 0   4292.17    874.5
18 1  544  179 0  53026.91   3578.5
19 2 2001 1636 0   7544.65     2450
20 2  438   73 0   3730.01      240
end
I managed to run the analysis the first time. However, I am not able to rerun the test, or run it on a different variable with hcost, as it continues to show an error.
My command:
Code:
stset days_post1year , id(case_id) failure(censurepost1y)
hcost case_id hctotal_post1y , l(365) method(0) group(castage6)
It shows:
Code:
_sum_table() already in library
(1 line skipped)
(error occurred while loading hcost.ado)
r(110)
I believe this is because the calculation is stored in a temporary file that is preventing me from running a similar analysis.
If I try to repeat the test by saving to a different file to run the analysis, it still shows a similar error.
Command:
Code:
sjlog using hcost10, replace
stset days_post1year , id(case_id) failure(censurepost1y)
hcost case_id hctotal_post1y , l(365) method(0) group(castage6)
I still get the error:
Code:
_sum_table() already in library
(1 line skipped)
(error occurred while loading hcost.ado)
r(110);
Is there a way for me to clear this temporary file, or to save it under a different temporary file for the analysis? Or are there any other solutions to resolve the problem? Any help is much appreciated.
Regards.
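The message suggests that a Mata function defined by hcost is already loaded in memory when the ado-file is re-run, rather than a problem with a temporary data file. One possible (not guaranteed) workaround is to clear Mata's workspace and Stata's program cache before calling hcost again; a minimal sketch:
Code:
mata: mata clear
discard
stset days_post1year , id(case_id) failure(censurepost1y)
hcost case_id hctotal_post1y , l(365) method(0) group(castage6)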
Hurdle model with nehurdle command in Stata 12
I am using the nehurdle command in Stata 12. My dependent variable is education expenditure (which has many zeros), hence I am using a two-part regression. I am interested in elasticity estimates for the intensity (value) equation. Should I use the exponential option with nehurdle, or is it better to use margins?
year dummies in xtreg and pooled OLS
Hello,
I ran both a fixed effects model and a pooled OLS model on the same dataset. I am now wondering whether including year dummies, and thus time effects, is comparable in both regressions. In the literature they mention "time fixed effects" to control for variables that are constant across firms but change over time in the fixed effects model, and "aggregate time effects" in the pooled OLS model. I am now wondering whether the regression treats these time dummies differently, and if so, what the difference is.
Kind regards,
Timea
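Mechanically, the year dummies enter both estimators in the same way (i.year added to the regressor list); what differs is whether firm effects are also swept out. A minimal sketch of the two specifications with hypothetical variable names:
Code:
xtset firm_id year
xtreg y x1 x2 i.year, fe vce(cluster firm_id)   // firm fixed effects plus year dummies
regress y x1 x2 i.year, vce(cluster firm_id)    // pooled OLS with the same year dummies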
Elasticity estimates in Tobit regression
Dear Statalisters
This is my first post so please excuse if the question is not posed correctly.
While running a tobit regression (with censoring at 0), my dependent variable captures spending on education (hence there are many zeros), and I would like an estimate of proportional changes in expenditure with respect to the regressors. In a simple regression model this is achieved by taking the log of the dependent variable; however, here the dependent variable has zeros, and taking the log results in missing values since the log of zero is not defined.
Could anyone suggest how to get around this issue? Will the margins command help here?
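One route that avoids logging the dependent variable is to fit the tobit in levels and ask margins for semi-elasticities of the expected outcome; a minimal sketch with hypothetical variable names, where ystar(0,.) is the expected value accounting for the censoring at zero:
Code:
tobit spending x1 x2, ll(0)
margins, eydx(x1) predict(ystar(0,.))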
Sunday, June 28, 2020
Help/advice on importing large number of text files into Stata
Dear all,
I have a collection of around 2,400 PDFs of parliamentary debate transcriptions that I would like to import into Stata. Having found no easy solution to directly importing PDFs into Stata, I have batch converted them to text files to import them.
I have tried using multimport (multimport delimited, extensions (txt) clear) as a way to bring all of the text files in. However, this command by itself is incorrect because it returns only 200 observations, when there should be around 1 million. I have read the help file and tried to look at alternative approaches (for example a loop involving import delimited) but couldn't solve this issue.
The attraction of multimport is that I can potentially record the filename as a new variable, which would be helpful in later processing.
I have two questions based on this:
1. Is conversion of PDFs to text files before importing the appropriate way to approach this problem?
2. If multimport is the correct command, does anyone have any insight on how to tailor the command to get the appropriate output?
Thanks,
Nate
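As an alternative to multimport, a plain loop over the converted text files with import delimited keeps full control and can record the file name as a variable; a minimal sketch, assuming the .txt files sit in the current folder and that each line of text should become one observation (using char(1) as the delimiter is a trick to stop lines being split into columns):
Code:
clear
tempfile building
save `building', emptyok
local files : dir . files "*.txt"
foreach f of local files {
    import delimited v1 using "`f'", delimiters("`=char(1)'") clear
    generate source = "`f'"
    append using `building'
    save `building', replace
}
use `building', clear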
Tabout Error - Conformability r(3200)
Hello
I am running a series of crosstab tables using the following tabout command:
Code:
tabout sex v501_marital_stat_r religion v106_education_r v190a_wealthquintiles v025_urbanrural ///
condomless2partners if hivstatusoutcome==2 using EmmaDHS_2B.txt, replace c(row) svy f(1) ///
style(tab) stats(chi2) font(bold) npos(col) percent pop
However, I am getting a sort of conformability error - produced below:
Code:
build_ncol(): 3200 conformability error
do_output(): - function returned error
<istmt>: - function returned error
r(3200);
end of do-file
r(3200);
I am not sure how to resolve this - I have changed the order of the variables and even reduced them to as few as 5, as presented in the example data included here, but the error still appears.
I am wondering if you could give me assistance in resolving this. I am including some scratch data below.
I look forward to any assistance - cheers, cY
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float sex byte v501_marital_stat_r int religion byte(v106_education_r v190a_wealthquintiles v025_urbanrural) float(condomless2partners hivstatusoutcome)
0 1 2 0 . 2 . 1
1 1 3 0 4 2 . 1
1 0 2 1 . 2 . 1
1 1 1 2 . 2 . 1
1 1 1 0 . 1 . .
1 0 2 2 . 2 . 1
1 1 1 1 . 1 . .
0 1 2 2 . 1 . 1
1 1 3 0 . 2 . .
1 1 3 0 3 2 . 1
1 1 2 2 . 2 . 1
1 1 2 1 . 2 . .
0 1 3 1 . 1 . 1
0 0 . 1 . 2 . 1
0 1 3 2 . 1 1 1
1 0 2 1 . 1 . 1
1 0 3 1 . 2 . 1
0 1 . 1 . 2 . 1
0 1 2 1 . 2 . 1
0 1 1 2 . 1 . 1
0 0 2 2 . 1 . 1
1 1 2 0 . 1 . .
1 0 2 2 . 2 . 1
1 1 2 0 . 2 . 1
1 1 1 1 . 2 . .
1 1 3 0 . 2 . 1
1 1 2 0 . 1 1 1
1 1 1 2 . 2 . .
1 0 2 2 . 1 . 1
0 0 2 2 . 2 . .
1 1 2 2 . 2 . 3
1 0 3 2 . 2 . .
0 0 3 0 2 1 . 1
1 0 . 1 . 2 . 1
1 1 2 0 . 2 . 1
1 1 3 0 . 2 . 1
1 1 2 2 . 2 . 1
0 1 2 2 . 2 1 1
1 0 3 0 . 2 . .
0 0 2 1 . 2 . 1
1 0 3 1 . 2 . .
0 1 1 0 . 2 . 1
0 1 2 1 . 2 . 1
0 0 2 1 . 2 . 1
1 1 3 0 . 1 . 1
1 0 2 2 . 1 0 1
0 0 3 0 1 1 1 1
1 1 3 0 . 2 . 1
0 1 . 1 . 1 . 1
1 0 1 0 . 2 . 1
end
label values sex gender
label def gender 0 "male", modify
label def gender 1 "female", modify
label values v501_marital_stat_r married
label def married 0 "not married/living together", modify
label def married 1 "married/living together", modify
label values religion religion
label def religion 1 "Catholic", modify
label def religion 2 "ChristianPentecostal", modify
label def religion 3 "Others(TradMuslim)", modify
label values v106_education_r edu
label def edu 0 "no education", modify
label def edu 1 "primary", modify
label def edu 2 "secondary or higher", modify
label values v190a_wealthquintiles MV190A
label def MV190A 1 "poorest", modify
label def MV190A 2 "poorer", modify
label def MV190A 3 "middle", modify
label def MV190A 4 "richer", modify
label values v025_urbanrural URBAN
label def URBAN 1 "urban", modify
label def URBAN 2 "rural", modify
label values condomless2partners yesno
label values hivstatusoutcome hivstatuso
label def hivstatuso 1 "hiv negative", modify
label def hivstatuso 3 "hiv positive and unaware", modify
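It is hard to pin down the conformability error from the call alone, but one way to narrow it down is to make sure the survey design is declared (the svy option requires svyset) and to run the crosstabs one variable at a time so the offending variable identifies itself. A diagnostic sketch, not a guaranteed fix, with hypothetical design variables:
Code:
svyset psu [pweight = wgt], strata(stratum)   // psu, wgt, stratum are hypothetical
foreach v in sex v501_marital_stat_r religion v106_education_r v190a_wealthquintiles v025_urbanrural {
    tabout `v' condomless2partners if hivstatusoutcome==2 using check_`v'.txt, ///
        replace c(row) svy f(1) style(tab) stats(chi2) npos(col) percent
}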
Calculating effect sizes after MI and mixed
Hello All,
I hope this message finds you well!
After running the following code, is there a way to calculate effect sizes?
Code:
mi estimate : mixed dnT1_sum SR_Fall age_month edu mvpa_all light_sed MVPA_FallSR c_sex1 || classid:
I came across a Stata blog on effect sizes, but the recommendations there are not feasible with my analysis above. I received the following error after trying "estat esize":
Code:
. estat esize
estat esize not valid
r(321);
Thank you for your time and help,
Patrick
Help needed to adjust the outcome of bysort command
Dear Stata community members,
I have created count variables k and p based on the following syntax:
by ID Illness (Year), sort: gen k = _n
by ID Illness Year (k), sort: replace k = k[1]
by ID ClassofIllness (Year), sort: gen p = _n
by ID ClassofIllness Year (p), sort: replace p = p[1]
Here is an example of my data, including the desired counters k1 and p1:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str23 ID str10 Illness float(YearofContracting DosageOfAntibiotic ClassofIllness k k1 p p1)
"Patient 1" "Malaria"    2001 100 1 1 1  1 1
"Patient 1" "Typhoid"    2001  15 1 1 1  1 1
"Patient 1" "Typhoid"    2002  26 1 2 2  3 3
"Patient 1" "Common flu" 2003   0 1 1 0  4 0
"Patient 1" "Allergy"    2004  26 1 1 1  5 4
"Patient 1" "Allergy"    2004  10 1 2 2  5 5
"Patient 1" "Allergy"    2005   0 1 3 0  7 0
"Patient 1" "Common flu" 2006   0 1 2 0  8 0
"Patient 1" "Common flu" 2007   0 1 3 0  9 0
"Patient 1" "Common flu" 2008   0 1 4 0 10 0
"Patient 1" "Common flu" 2009   0 1 5 0 11 0
"Patient 1" "Common flu" 2010   0 1 6 0 12 0
"Patient 1" "Allergy"    2012   9 1 4 3 13 6
"Patient 1" "Typhoid"    2013  18 1 3 3 14 7
"Patient 1" "Malaria"    2014  13 1 2 2 15 8
"Patient 1" "Allergy"    2015   0 1 5 0 16 0
"Patient 1" "Common flu" 2016  60 1 7 1 17 9
end
However, the issue is that I want the counter to look like k1 and p1. That is, in rows where DosageOfAntibiotic == 0, the counter should be zero, and the next time (as per the bysort conditions) the counter should pick up from where it left off before the zero.
Please help.
Regards.
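A running count that skips the zero-dosage rows can be built with sum() over an indicator and then zeroed out on the zero rows; a minimal sketch that reproduces k1 in the example above (p1 follows the same idea grouped on ClassofIllness, though ties within a year may need extra handling):
Code:
bysort ID Illness (YearofContracting): generate k1_new = sum(DosageOfAntibiotic > 0)
replace k1_new = 0 if DosageOfAntibiotic == 0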
Can I select Fixed-effects model?
Dear,
First of all, I really thank everyone here for sparing their efforts to help people like me.
- Fixed-effects model is proven valid by F-test. (The null hypothesis is rejected by the F-test)
- Random-effects model is not proven by LM test. (The null hypothesis is not rejected by the Lagrange multiplier method)
- Fixed-effects model is chosen by Hausman test. (The null hypothesis is rejected by Hausman test)
In this case,
Can I select Fixed-effects model?
Thanks in advance for your help!
Generating summary statistics and exporting to excel
I have a table that looks like this:
Country Name | Country Code | Series Name | Series Code | 2016 [YR2016] | 2017 [YR2017] | 2018 [YR2018] |
Afghanistan | AFG | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 77869931554 | 79945392646 | 80769357876 |
Albania | ALB | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 36239313232 | 37624046896 | 39183653161 |
Algeria | DZA | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 4.71936E+11 | 4.78071E+11 | 4.84764E+11 |
American Samoa | ASM | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | .. | .. | .. |
Andorra | AND | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | .. | .. | .. |
Angola | AGO | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 2.18309E+11 | 2.17987E+11 | 2.13337E+11 |
Antigua and Barbuda | ATG | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1835521960 | 1893259104 | 2033155752 |
Arab World | ARB | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 5.88323E+12 | 5.95482E+12 | 6.09747E+12 |
Argentina | ARG | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1.01085E+12 | 1.03782E+12 | 1.01207E+12 |
Armenia | ARM | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 33187468758 | 35676528915 | 37531708419 |
Aruba | ABW | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 3531919864 | 3578912448 | .. |
Australia | AUS | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1.17534E+12 | 1.20316E+12 | 1.23854E+12 |
Austria | AUT | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 4.69057E+11 | 4.80673E+11 | 4.92304E+11 |
Azerbaijan | AZE | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1.39546E+11 | 1.39153E+11 | 1.41119E+11 |
Bahamas, The | BHS | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 13470395241 | 13479375208 | 13690429864 |
Bahrain | BHR | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 68590652908 | 71199513662 | 72464535612 |
Bangladesh | BGD | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 6.19293E+11 | 6.64404E+11 | 7.1665E+11 |
Barbados | BRB | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 4537574770 | 4529721747 | 4507144307 |
Belarus | BLR | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1.69342E+11 | 1.7363E+11 | 1.79098E+11 |
Belgium | BEL | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 5.6641E+11 | 5.77535E+11 | 5.85958E+11 |
Belize | BLZ | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 2633432198 | 2671282228 | 2752366426 |
Benin | BEN | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 32196993426 | 34023063768 | 36301676625 |
Bermuda | BMU | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | .. | .. | .. |
I want to generate a table of summary statistics (average, median, standard deviation and number of observations) by country. I want the results to be saved directly to Excel.
I tried using the sumstats command.
Code:
sumstats ///
    (2016[YR2016]) ///
    using "test.xlsx", replace stats(mean p50 sd)
I get errors with this code. Is there an alternative way to obtain summary statistics without the sumstats command?
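Built-in commands can produce the same table without sumstats; a minimal sketch, assuming the 2016-2018 columns have been imported as numeric variables yr2016-yr2018 and the country name is in countryname (all hypothetical names, and ".." read as missing):
Code:
rename (yr2016 yr2017 yr2018) (gdp2016 gdp2017 gdp2018)
reshape long gdp, i(countryname) j(year)
collapse (mean) mean=gdp (median) p50=gdp (sd) sd=gdp (count) n=gdp, by(countryname)
export excel using "summary_by_country.xlsx", firstrow(variables) replace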
How do you maintain your library of code snippets?
This question is for anyone who has programmed for long enough to find that keeping snippets of code to hand is useful for remembering interesting or useful patterns, keeping track of rare use cases, and so on. If you do maintain a personal library, can you share details of how you organize it? Do you use an app? Do you have a loose collection of text files? What are your likes or grumbles?
New version of -ranktest- available on SSC
With thanks to Kit Baum, a new version of ranktest by Kleibergen-Schaffer-Windmeijer, version 2.0.03, is now available on SSC Archives.
Tests of rank have various practical applications; in econometrics probably the most common is the test of the requirement in a linear IV/GMM model that the matrix E(z_i x_i') is full rank, where z_i is the vector of instruments and x_i is the vector of endogenous regressors.
The updates to ranktest are extensive and include the addition of GMM-based J-type rank tests as proposed by Windmeijer (2018); see https://ideas.repec.org/p/bri/uobdis/18-696.html.
ranktest is a required component for ivreg2 and related packages. Please note, however, that the new features of ranktest require Stata 13 or higher. If called under version control or by an earlier version of Stata, ranktest will call a new program also included in the package, ranktest11. ranktest11 is essentially the previous version of ranktest version 1.4.01. Because ivreg2 runs under version control, the results reported by the current version of ivreg2 will be unaffected by this update. Similar remarks apply to related packages.