Hello,
I am a newbie in Stata and I have a problem when I try to use an if condition.
In my dataset I have a "country" variable that takes the value "Germany", but when I run the regression
"oprobit adl iadl depression_scale chronicw2 if country==Germany", Stata says "Germany not found",
or
"oprobit adl iadl depression_scale if "country"=="Germany"", Stata says "no observations".
How can I fix it?
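For what it is worth, a sketch of the syntax Stata expects here, assuming country is a string variable (the label list step is only relevant if country turns out to be numeric with value labels):
Code:
* if country is a string variable, quote the value, never the variable name
oprobit adl iadl depression_scale chronicw2 if country == "Germany"

* if country is instead numeric with value labels, list the labels to find the right code
label list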
Thank you
Tuesday, June 30, 2020
Matching and generating hourly data
Dear Stata Users,
I am working on hourly data sets. The datasets have the "zipcode" variable as the common variable to merge/joinby on. In data-A I have the hourly data, which should be merged with data-B.
Now, in both datasets each zipcode is repeated more than once, so merge (m:1 or 1:m) is not possible; I would use "joinby" to combine the datasets.
I would like some help with generating a variable (such as Uhrzeit) so that data-B has an hourly time (Uhrzeit) for each zipcode for each year.
Finally, do I have to generate the hourly time before or after joining (joinby)?
If you need further clarification, I would be happy to provide it.
data-A:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long zipcode str11 stationcode double no2 str7 Uhrzeit float(Datum datum_tag datum_monat datum_jahr) 63741 "DEBY005" 30.49 "01:00" 20089 1 1 2015 91522 "DEBY001" 33.75 "01:00" 20089 1 1 2015 63741 "DEBY005" 30.25 "02:00" 20089 1 1 2015 91522 "DEBY001" 32.98 "02:00" 20089 1 1 2015 63741 "DEBY005" 32.94 "03:00" 20089 1 1 2015 91522 "DEBY001" 30.58 "03:00" 20089 1 1 2015 63741 "DEBY005" 30.54 "04:00" 20089 1 1 2015 91522 "DEBY001" 27.21 "04:00" 20089 1 1 2015 63741 "DEBY005" 29.84 "05:00" 20089 1 1 2015 91522 "DEBY001" 28.95 "05:00" 20089 1 1 2015 63741 "DEBY005" 28.39 "06:00" 20089 1 1 2015 91522 "DEBY001" 27.5 "06:00" 20089 1 1 2015 63741 "DEBY005" 28.89 "07:00" 20089 1 1 2015 91522 "DEBY001" 26.32 "07:00" 20089 1 1 2015 63741 "DEBY005" 31.39 "08:00" 20089 1 1 2015 91522 "DEBY001" 27.57 "08:00" 20089 1 1 2015 63741 "DEBY005" 33.36 "09:00" 20089 1 1 2015 91522 "DEBY001" 25.35 "09:00" 20089 1 1 2015 63741 "DEBY005" 28.75 "10:00" 20089 1 1 2015 91522 "DEBY001" 21.24 "10:00" 20089 1 1 2015 63741 "DEBY005" 27.13 "11:00" 20089 1 1 2015 91522 "DEBY001" 20.72 "11:00" 20089 1 1 2015 63741 "DEBY005" 23.34 "12:00" 20089 1 1 2015 91522 "DEBY001" 25.09 "12:00" 20089 1 1 2015 63741 "DEBY005" 30.87 "13:00" 20089 1 1 2015 91522 "DEBY001" 21.51 "13:00" 20089 1 1 2015 63741 "DEBY005" 35.34 "14:00" 20089 1 1 2015 91522 "DEBY001" 23.25 "14:00" 20089 1 1 2015 63741 "DEBY005" 38.24 "15:00" 20089 1 1 2015 91522 "DEBY001" 28 "15:00" 20089 1 1 2015 63741 "DEBY005" 34.68 "16:00" 20089 1 1 2015 91522 "DEBY001" 21.67 "16:00" 20089 1 1 2015 63741 "DEBY005" 42.17 "17:00" 20089 1 1 2015 91522 "DEBY001" 24.92 "17:00" 20089 1 1 2015 63741 "DEBY005" 43.97 "18:00" 20089 1 1 2015 91522 "DEBY001" 29.73 "18:00" 20089 1 1 2015 63741 "DEBY005" 33.09 "19:00" 20089 1 1 2015 91522 "DEBY001" 27.7 "19:00" 20089 1 1 2015 63741 "DEBY005" 20.34 "20:00" 20089 1 1 2015 91522 "DEBY001" 23.79 "20:00" 20089 1 1 2015 63741 "DEBY005" 17.08 "21:00" 20089 1 1 2015 91522 "DEBY001" 18.47 "21:00" 20089 1 1 2015 63741 "DEBY005" 18.25 "22:00" 20089 1 1 2015 91522 "DEBY001" 18.46 "22:00" 20089 1 1 2015 63741 "DEBY005" 18.69 "23:00" 20089 1 1 2015 91522 "DEBY001" 17.73 "23:00" 20089 1 1 2015 63741 "DEBY005" 21.54 "24:00" 20089 1 1 2015 91522 "DEBY001" 13.44 "24:00" 20089 1 1 2015 63741 "DEBY005" 15.73 "01:00" 20090 2 1 2015 91522 "DEBY001" 14.07 "01:00" 20090 2 1 2015 63741 "DEBY005" 13.33 "02:00" 20090 2 1 2015 91522 "DEBY001" 15.11 "02:00" 20090 2 1 2015 63741 "DEBY005" 12.15 "03:00" 20090 2 1 2015 91522 "DEBY001" 18.38 "03:00" 20090 2 1 2015 63741 "DEBY005" 15.88 "04:00" 20090 2 1 2015 91522 "DEBY001" 21.81 "04:00" 20090 2 1 2015 63741 "DEBY005" 39.34 "05:00" 20090 2 1 2015 91522 "DEBY001" 23.83 "05:00" 20090 2 1 2015 63741 "DEBY005" 36.97 "06:00" 20090 2 1 2015 91522 "DEBY001" 21.01 "06:00" 20090 2 1 2015 63741 "DEBY005" 29.83 "07:00" 20090 2 1 2015 91522 "DEBY001" 40.23 "07:00" 20090 2 1 2015 63741 "DEBY005" 30.41 "08:00" 20090 2 1 2015 91522 "DEBY001" 45.88 "08:00" 20090 2 1 2015 63741 "DEBY005" 23.77 "09:00" 20090 2 1 2015 91522 "DEBY001" 53.98 "09:00" 20090 2 1 2015 63741 "DEBY005" 35.29 "10:00" 20090 2 1 2015 91522 "DEBY001" 39.31 "10:00" 20090 2 1 2015 63741 "DEBY005" 19.62 "11:00" 20090 2 1 2015 91522 "DEBY001" 44.46 "11:00" 20090 2 1 2015 63741 "DEBY005" 15.99 "12:00" 20090 2 1 2015 91522 "DEBY001" 44.58 "12:00" 20090 2 1 2015 63741 "DEBY005" 20.42 "13:00" 20090 2 1 2015 91522 "DEBY001" 33.5 "13:00" 20090 2 1 2015 63741 "DEBY005" 19.29 "14:00" 20090 2 1 2015 91522 
"DEBY001" 35.01 "14:00" 20090 2 1 2015 63741 "DEBY005" 22.52 "15:00" 20090 2 1 2015 91522 "DEBY001" 34.49 "15:00" 20090 2 1 2015 63741 "DEBY005" 25.37 "16:00" 20090 2 1 2015 91522 "DEBY001" 33.18 "16:00" 20090 2 1 2015 63741 "DEBY005" 25.44 "17:00" 20090 2 1 2015 91522 "DEBY001" 32.39 "17:00" 20090 2 1 2015 63741 "DEBY005" 32.56 "18:00" 20090 2 1 2015 91522 "DEBY001" 35.52 "18:00" 20090 2 1 2015 63741 "DEBY005" 35.9 "19:00" 20090 2 1 2015 91522 "DEBY001" 29.65 "19:00" 20090 2 1 2015 63741 "DEBY005" 23.61 "20:00" 20090 2 1 2015 91522 "DEBY001" 23.87 "20:00" 20090 2 1 2015 63741 "DEBY005" 33.62 "21:00" 20090 2 1 2015 91522 "DEBY001" 22.03 "21:00" 20090 2 1 2015 63741 "DEBY005" 38.14 "22:00" 20090 2 1 2015 91522 "DEBY001" 17.59 "22:00" 20090 2 1 2015 63741 "DEBY005" 32 "23:00" 20090 2 1 2015 91522 "DEBY001" 14.65 "23:00" 20090 2 1 2015 63741 "DEBY005" 30.47 "24:00" 20090 2 1 2015 91522 "DEBY001" 11.15 "24:00" 20090 2 1 2015 63741 "DEBY005" 45.94 "01:00" 20091 3 1 2015 91522 "DEBY001" 10.94 "01:00" 20091 3 1 2015 63741 "DEBY005" 53.28 "02:00" 20091 3 1 2015 91522 "DEBY001" 10.15 "02:00" 20091 3 1 2015 end format %td Datum
data-B:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long zipcode str13 city float population str5 areakm2 str10 county str14 street_type str11 location str7 speedlimit 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Bundesstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Bundesstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Kreisstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "Vz325" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "Vz325" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "60" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "60" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "70" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "Vz325" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 
"99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts" "60" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" " innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts" "50" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "Schritt" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Ortsstraße" "innerorts" "30" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "100" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts" "70" 91522 "Ansbach" 41.847 "99,91" "kreisfreie" "Sonstige" "innerorts" "30" end
Best,
Ami
Pathways / event history
Dear Statalist users
I am hoping you can solve a query I have.
I want to look at pathways for young people aged between 15 and 30 in and out of various activities: employment, full-time; employment, part-time; unemployment; not in the labour force; home-making and study; and study only. I want to make comparisons between young people pre-Global Financial Crisis and young people post-Global Financial Crisis, as well as comparisons between men and women. I'm thinking event-history analysis, but I'm not quite sure how to go about it. I want to look at pathways (movements between the various activities) and maybe (although not necessarily) how long they stayed in each activity before moving.
If event history is indeed the right way to go about it, how would I make the required duration variables?
I have longitudinal panel data spanning from 2001-2017.
Can anyone assist?
best
Brendan
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float(activity GFC) byte sex 3 0 2 10 0 2 10 0 2 10 1 2 10 1 2 5 1 2 5 1 2 12 1 2 12 1 2 3 1 2 1 1 2 1 1 2 1 0 2 3 1 2 11 1 2 1 1 2 10 1 2 11 1 2 3 1 2 11 1 2 3 1 2 10 1 2 10 1 2 10 0 2 10 0 2 7 0 2 10 0 2 11 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 12 1 2 12 1 2 12 1 2 12 1 2 5 1 2 3 1 2 3 0 2 11 0 2 12 1 2 3 1 2 3 1 2 3 1 2 3 1 2 10 1 2 10 1 2 10 1 2 10 1 2 3 1 2 10 1 2 10 1 2 10 1 2 3 1 2 3 1 2 3 1 2 3 1 1 3 1 1 3 1 1 3 1 1 3 1 1 1 0 1 3 0 1 1 0 1 1 0 1 1 0 1 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1 end label values activity activity label def activity 1 "[1] Employed, full-time", modify label def activity 3 "[3] Employed, part-time", modify label def activity 5 "[5] Unemployed", modify label def activity 7 "[7] Not in the labour force", modify label def activity 10 "[10] Home-making / caring", modify label def activity 11 "[11] Work and study", modify label def activity 12 "[12] Study", modify label values sex QSEX label def QSEX 1 "[1] Male", modify label def QSEX 2 "[2] Female", modify
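For reference, a sketch of how spell and duration variables are often built from person-wave data; id and wave are assumed variable names that do not appear in the extract above:
Code:
* number consecutive runs of the same activity within person, then count each
* run's length in waves (id and wave are assumed variable names)
sort id wave
by id: gen spell = sum(activity != activity[_n-1])
by id spell, sort: gen duration = _N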
Generating a variable
I want to generate a variable indicating whether individuals continued to participate in a program at age 72 or after. For example, if an individual participated in the program at ages 69 and 73, a value of 1 should be given; if an individual participated only before age 72, a value of 2 should be given; and if an individual participated only at age 72 or after, a value of 3 should be given.
A sample of data structure is below.
e.g.
ID age Participation_date
B001 68 05nov2012
B001 70 07may2015
B001 72 09jun2017
B002 67 28nov2011
B002 68 22oct2012
B002 69 25nov2013
B002 70 10nov2014
B002 71 14dec2015
B002 72 12dec2016
B003 73 25feb2012
B003 75 08oct2013
B003 77 12feb2016
B004 76 16jun2012
B004 78 22may2014
B005 68 17nov2012
B006 76 12mar2013
B006 78 29apr2015
B007 72 22jun2012
B007 74 04aug2014
B008 71 29jan2013
B008 73 04mar2015
B008 75 30mar2017
B009 72 28jan2015
B010 71 28feb2012
B010 74 03jun2014
B011 73 04feb2013
B011 76 17sep2015
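One possible approach (a sketch, assuming one row per participation event, with the variable names from the listing above):
Code:
* flag, within each ID, participation before 72 and participation at 72 or after
bysort ID: egen before72 = max(age < 72)
bysort ID: egen from72 = max(age >= 72)
gen byte group = 1 if before72 & from72
replace group = 2 if before72 & !from72
replace group = 3 if !before72 & from72
drop before72 from72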
Thanks for the help
Problem with SVY : poisson regression
Hi all,
I'm trying to estimate the effect of containment measures on the evolution of the number of COVID-19 cases (in one city).
The thing is, I only have aggregated numbers of cases over 4 different periods of time, from March 11 to June 26.
I also have the number of tests performed, age categories, and the proportion of males.
As the first 3 measures were implemented more than 3 weeks ago, I'm not sure how to define my measure variables. I tried different forms (Time1 = 105 days from implementation to this day; Time2 = 95 days of implementation, etc.) but none worked.
I tried to define svyset with pweight and singleunit(centered), and I plan to use Poisson regression.
Does anyone have any suggestions for me, please?
svy:poisson Logcase Tests SexR Time1 Time2 etc...
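For reference, a minimal sketch of that kind of survey declaration (psu, weight, and cases are placeholder names, and exposure() is only one possible way of bringing the number of tests in; this is not necessarily the right model for these data):
Code:
* a sketch with placeholder variable names
svyset psu [pweight = weight], singleunit(centered)
svy: poisson cases SexR Time1 Time2, exposure(Tests)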
thanks
Regressing growth rates of a variable on its initial level for separate decades
Dear all,
I am currently trying to regress the growth rate of a variable on its initial level and an intercept. Specifically, the compound annual growth rate of GDP per capita is the dependent variable, and the regressor is the initial level of GDP. The growth rate is estimated over periods of 10 years, so the initial level of GDP would be the GDP prevailing in the first year of each 10-year period. An example of the data is provided below.
Here is where I get stuck and do not know how to proceed. First, I do not want to run this regression over successive years. Rather, I would like to run it for separate decades, so per country I should have only 3 data points in the regression: the periods 1991-2001, 2001-2011 and 2007-2017. I realize that the last data point overlaps with the second, but I do not have data for 2018, 2019 and 2020, so at least this way I can calculate the most recent compound annual growth rate. The problem is, I do not know how to write this down in a command to achieve this result. Moreover, I would like to control for period (decade) effects, so I created a decadal identifier for the regression. I am currently using gen decade=10*floor(year/10) to create decadal identifiers, but when I include it in my regression as reg gdp_growth initial_gdp i.decade, vce(cluster cc), the only decade dummy I see is 2010, whereas I think I should have at least 1 more. So I figured maybe I am not writing the code for the decade dummies properly, and perhaps there is a better way to write a command that controls for decade fixed effects.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str3 countrycode int year float gdp "DEU" 1991 33836.418 "DEU" 1992 34227.57 "DEU" 1993 33670.29 "DEU" 1994 34358.34 "DEU" 1995 34783.29 "DEU" 1996 34967.477 "DEU" 1997 35539.133 "DEU" 1998 36251.195 "DEU" 1999 36913.184 "DEU" 2000 37930.484 "DEU" 2001 38509.605 "DEU" 2002 38368.617 "DEU" 2003 38073.76 "DEU" 2004 38535.17 "DEU" 2005 38835.383 "DEU" 2006 40362.29 "DEU" 2007 41622.36 "DEU" 2008 42102.85 "DEU" 2009 39804.92 "DEU" 2010 41531.93 "DEU" 2011 43969.26 "DEU" 2012 44070.92 "DEU" 2013 44139.03 "DEU" 2014 44933.72 "DEU" 2015 45321.4 "DEU" 2016 45959.57 "DEU" 2017 46916.82 "FIN" 1991 31250.535 "FIN" 1992 30043.363 "FIN" 1993 29692.36 "FIN" 1994 30727.863 "FIN" 1995 31901.844 "FIN" 1996 32963.473 "FIN" 1997 34947.375 "FIN" 1998 36756.746 "FIN" 1999 38277.61 "FIN" 2000 40403.55 "FIN" 2001 41363.69 "FIN" 2002 41967.97 "FIN" 2003 42706.92 "FIN" 2004 44283.16 "FIN" 2005 45358.56 "FIN" 2006 47004.62 "FIN" 2007 49285.27 "FIN" 2008 49440.97 "FIN" 2009 45231.96 "FIN" 2010 46459.97 "FIN" 2011 47423.21 "FIN" 2012 46538.58 "FIN" 2013 45906.8 "FIN" 2014 45550.5 "FIN" 2015 45655.22 "FIN" 2016 46720.56 "FIN" 2017 48033.29 "FRA" 1991 32683.35 "FRA" 1992 33041.363 "FRA" 1993 32691.684 "FRA" 1994 33338.34 "FRA" 1995 33917.926 "FRA" 1996 34275.605 "FRA" 1997 34952.523 "FRA" 1998 36073.637 "FRA" 1999 37116.41 "FRA" 2000 38309.44 "FRA" 2001 38786.086 "FRA" 2002 38942.28 "FRA" 2003 38985.535 "FRA" 2004 39794.64 "FRA" 2005 40152.69 "FRA" 2006 40850.36 "FRA" 2007 41582.8 "FRA" 2008 41456.48 "FRA" 2009 40058.68 "FRA" 2010 40638.34 "FRA" 2011 41329.04 "FRA" 2012 41258.27 "FRA" 2013 41282.99 "FRA" 2014 41480.77 "FRA" 2015 41793.54 "FRA" 2016 42141.84 "FRA" 2017 43001.59 "NLD" 1991 36286.313 "NLD" 1992 36627.406 "NLD" 1993 36830.414 "NLD" 1994 37693.047 "NLD" 1995 38676.07 "NLD" 1996 39844.98 "NLD" 1997 41356.45 "NLD" 1998 43019.19 "NLD" 1999 44885.09 "NLD" 2000 46435.21 "NLD" 2001 47158.42 "NLD" 2002 46960.18 "NLD" 2003 46811.89 "NLD" 2004 47575.48 "NLD" 2005 48437.88 "NLD" 2006 50033.88 "NLD" 2007 51808.77 "NLD" 2008 52727.52 "NLD" 2009 50533.51 "NLD" 2010 50950.04 "NLD" 2011 51499.6 "NLD" 2012 50780.7 "NLD" 2013 50565.3 "NLD" 2014 51100.84 "NLD" 2015 51871.58 "NLD" 2016 52727.1 "NLD" 2017 53942.09 "SWE" 1991 36791.93 "SWE" 1992 36152.99 "SWE" 1993 35201.15 "SWE" 1994 36340.152 "SWE" 1995 37595.406 "SWE" 1996 38141.777 "SWE" 1997 39301.33 "SWE" 1998 40953.22 "SWE" 1999 42685.54 "SWE" 2000 44694.43 "SWE" 2001 45228.91 "SWE" 2002 46071.99 "SWE" 2003 46931.17 "SWE" 2004 48769.29 "SWE" 2005 49981.3 "SWE" 2006 51988.43 "SWE" 2007 53374.82 "SWE" 2008 52832.31 "SWE" 2009 50164.93 "SWE" 2010 52817.44 "SWE" 2011 54020.13 "SWE" 2012 53283.64 "SWE" 2013 53408.79 "SWE" 2014 54334.29 "SWE" 2015 56139.5 "SWE" 2016 56776.29 "SWE" 2017 57367.43 end
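One way to set this up (a sketch using the dataex variables above; the three windows 1991-2001, 2001-2011, and 2007-2017 are hard-coded, and H is just a postfile handle name):
Code:
* build a country-by-window dataset of initial GDP and compound annual growth
tempfile windows
postfile H str3 countrycode int start double initial_gdp double growth using `windows'
levelsof countrycode, local(ccs)
foreach s in 1991 2001 2007 {
    foreach c of local ccs {
        quietly summarize gdp if countrycode == "`c'" & year == `s', meanonly
        local g0 = r(mean)
        quietly summarize gdp if countrycode == "`c'" & year == `s' + 10, meanonly
        local g1 = r(mean)
        post H ("`c'") (`s') (`g0') ((`g1'/`g0')^(1/10) - 1)
    }
}
postclose H

use `windows', clear              // replaces the data in memory with the window-level data
encode countrycode, gen(cc)
regress growth initial_gdp i.start, vce(cluster cc)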
I hope I have explained my question clearly, and I am grateful for any advice you may have regarding this problem.
Thank you in advance.
Best,
Satya
Is it possible to obtain ultra-precision in calculating p-values in Stata?
Dear Statalisters,
What would be your suggestion to compute p-values as low as 1e-25?
For example:
One-sided test
gene double p = 1-normal(10.42045)
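One workaround (a sketch): normal() is accurate in the far tail, so using the symmetry of the normal distribution avoids the cancellation in 1 - normal(), which rounds p-values this small to zero in double precision:
Code:
* one-sided upper-tail p-value computed via symmetry
gen double p = normal(-10.42045)
display %12.0g normal(-10.42045)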
All the best,
Tiago
Difference between meologit with sampling weights and svy: meologit?
Hi, I have a general question about the difference between 2 commands. I have multilevel survey data and an ordered 3-level dependent variable (outcome). I'm wondering what the difference is between a) survey-setting the data and running meologit as an svy command vs. b) running meologit and adding in sampling weights. Does anyone have thoughts on that? (Note: I didn't use dataex because I'm not worried about replicating this or how it is running, just whether there are differences between these 2 commands.)
Many thanks to any who might reply!
--Ann
Examples below; outcome = ckdu_3cat and independent var = stunting
Example A:
svydescribe
Survey: Describing stage 1 sampling units
pweight: w_ind_norm
VCE: linearized
Single unit: missing
Strata 1: site
SU 1: hogar
FPC 1: <zero>
. svy: meologit ckdu_3cat stunting
(running meologit on estimation sample)
Survey: Ordered logistic regression
Number of strata = 2 Number of obs = 773
Number of PSUs = 314 Population size = 814.402204
Design df = 312
F( 1, 312) = 2.55
Prob > F = 0.1115
------------------------------------------------------------------------------
| Linearized
ckdu_3cat | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
stunting | .4876653 .3055801 1.60 0.112 -.113593 1.088924
-------------+----------------------------------------------------------------
/cut1 | 2.569918 .207242 2.162149 2.977686
/cut2 | 6.545487 .6893908 5.189044 7.90193
------------------------------------------------------------------------------
Example B:
meologit ckdu_3cat stunting [pweight = w_ind_norm] || hogar:
Fitting fixed-effects model:
Iteration 0: log likelihood = -241.89419
Iteration 1: log likelihood = -237.86994
Iteration 2: log likelihood = -237.30416
Iteration 3: log likelihood = -237.3033
Iteration 4: log likelihood = -237.3033
Refining starting values:
Grid node 0: log likelihood = -232.26489
Fitting full model:
Iteration 0: log pseudolikelihood = -232.26489
Iteration 1: log pseudolikelihood = -227.83392
Iteration 2: log pseudolikelihood = -227.57108
Iteration 3: log pseudolikelihood = -227.56916
Iteration 4: log pseudolikelihood = -227.56916
Mixed-effects ologit regression Number of obs = 773
Group variable: hogar Number of groups = 313
Obs per group:
min = 1
avg = 2.5
max = 10
Integration method: mvaghermite Integration pts. = 7
Wald chi2(1) = 2.57
Log pseudolikelihood = -227.56916 Prob > chi2 = 0.1091
(Std. Err. adjusted for 313 clusters in hogar)
------------------------------------------------------------------------------
| Robust
ckdu_3cat | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
stunting | .5356627 .3342881 1.60 0.109 -.1195299 1.190855
-------------+----------------------------------------------------------------
/cut1 | 3.25844 .3152659 2.64053 3.87635
/cut2 | 7.401028 .7907553 5.851176 8.95088
-------------+----------------------------------------------------------------
hogar |
var(_cons)| 1.625323 .5789674 .808588 3.267021
------------------------------------------------------------------------------
Constant weight variable in panel data
Dear users,
I am working with a static panel dataset including variables at the household level for three years. In the first wave only rural households were included in the survey; in the second, urban households were added, increasing the number of households from around 4,000 to 4,500.
The cross-sectional datasets provide a weight value per household for each year. I run a fixed-effects regression at the household level, and therefore I need a weight variable that is constant within HH_ID.
Unfortunately, I do not know how to construct it. Can I just take the average of the weights for the three years?
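If you do go with the within-household average you describe, a minimal sketch (weight, y, x, and year are placeholder names, and HH_ID is assumed numeric; note that -xtreg, fe- requires the pweight to be constant within panel, which is exactly what the averaging achieves):
Code:
* a sketch with placeholder variable names
bysort HH_ID: egen wt_mean = mean(weight)
xtset HH_ID year
xtreg y x [pweight = wt_mean], fe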
I thank you very much in advance.
Systems of simultaneous equations with non-linear constraints
Dear Stata users,
I have the following issue: I have to estimate a system of simultaneous equations with non-linear constraints of the form 0 < a < 1, using OLS.
Example 2 in https://www.stata.com/support/faqs/s...nstraints/#ex2 shows how to impose non-linear constraints of the form 0 < a < 1 using the nl command, which makes use of the inverse logit function.
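Roughly, the single-equation idea from that FAQ looks like this (a sketch with placeholder names): the inverse logit maps any real number into (0, 1), so the coefficient on x is bounded by construction.
Code:
* a sketch: invlogit({theta}) lies strictly between 0 and 1 for any value of theta
nl (y = {b0} + invlogit({theta}) * x)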
In my case, I have to run a pair of regressions simultaneously with a non-linear constraint of the form 0 < [eqn1]beta + [eqn2]beta < 1.
Originally, I used reg3 and imposed a linear constraint of the form [eqn1]beta + [eqn2]beta = 1, but sometimes I got negative coefficients because I could not also impose positivity of the estimated coefficients.
So, I would like to know whether there is a trick or a way you can suggest to run simultaneous regression equations with a non-linear constraint of the form 0 < a < 1, without using any reparameterization of the coefficients as nl does.
Thank you,
Anna
egen maxdate gives dates 01/01/2500
Hello everyone
I am trying to get the difference between two dates to calculate follow-up per patient.
I am using a long-format dataset with multiple observations per patient id and no missing values.
I can't spot why, for some patients, I get 01/01/2500 as the maxdate, which is wrong.
edate is a float with format %dM_d,_CY; is this an issue?
This is what I'm using:
bysort patid : egen maxdate = max(edate)
Does anyone know why this might be happening?
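One quick check (a sketch): -egen, max()- simply returns the largest non-missing edate per patient, so a future-dated placeholder such as 01jan2500 in the raw data would propagate straight into maxdate. This lists where such values occur:
Code:
* check whether 01jan2500 already exists in edate itself
count if edate == td(01jan2500)
list patid edate if edate == td(01jan2500), sepby(patid)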
Thank you,
Louisa
Excel Table split by socio-demographics using Putexcel
Hello fellow Stata community,
I'm looking for a way to automate my analysis output to Excel using putexcel. I already found this code by eric_a_booth, which helps me create summary tables for each of the four example variables, each on its own sheet:
foreach j of varlist var1 var2 d_var1 d_var2{
    di `"`j'"'
    su `j'
    putexcel A3=rscalars using `"test.xlsx"', sheet("S_`j'") modify keepcellf
    putexcel A1=(`" Example for `j' "') using `"test.xlsx"', sheet("S_`j'") modify keepcellf
}
di `"{browse `"test.xlsx"': Click to open `"test.xlsx"' }"'
What I'm now looking for would be an extension of this. My goal is to create automated tables like the one attached, displaying the values of each variable over the values of a socio-demographic variable. In a shortened attempt, I programmed the following:
sum var1
*return list
putexcel A1=("Table 1") B3=("Obs") C3=("Total") D3=("Female") E3=("Male") F3=("Divers") using results, replace
putexcel C4=matrix(r(mean)*100) using results, modify
sum var1 if gender==1
putexcel D4=matrix(r(mean)*100) using results, modify
sum var1 if gender==2
putexcel E4=matrix(r(mean)*100) using results, modify
sum var1 if gender==3
putexcel F4=matrix(r(mean)*100) using results, modify
...
Since I want to run this over dozens of variables, it seems inevitable to create some kind of loop, as sketched below.
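A sketch of such a loop, in the same pre-Stata-14 putexcel syntax used above (the variable list, the file name results, and the 1/2/3 coding of gender are all placeholders):
Code:
* one row per variable: overall mean in C, group means in D-F (all in percent)
putexcel A1=("Table 1") B3=("Obs") C3=("Total") D3=("Female") E3=("Male") F3=("Divers") using results, replace
local row = 4
foreach v of varlist var1 var2 d_var1 d_var2 {
    quietly summarize `v'
    putexcel A`row'=("`v'") B`row'=(r(N)) C`row'=(r(mean)*100) using results, modify
    forvalues g = 1/3 {
        local c : word `g' of D E F
        quietly summarize `v' if gender == `g'
        putexcel `c'`row'=(r(mean)*100) using results, modify
    }
    local ++row
}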
I hope the example table helps you understand where I'm trying to head with this; otherwise I'll try to specify my challenge further.
Thanks in advance for any help; I'd be more than thankful for any advice.
Regards
Marvin
Exporting detailed summary statistics to Word
Hello helpful Stata users,
I am trying to generate a Word export of the summary of my dependent variable, produced by the summarize command with the detail option.
Should basically look like this in the end:
[attached screenshot: the output of -summarize, detail- as displayed in Stata]
Now I have tried the asdoc export, which somehow seems to transform it into a one-line table, dropping all the useful percentile information and only reporting Obs, Mean, and so on:
asdoc sum Exports_USD_Hundred_Million, detail
I have also tried using esttab; however, I am unfamiliar with the command and couldn't produce what I was looking for.
Is there an option for asdoc that just exports the table as it normally appears in Stata without the asdoc prefix? Or any other way I can generate output such as in the above picture?
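One alternative worth trying (a sketch using the community-contributed estout package, which would need to be installed; the statistics listed in cells() are just an example to edit):
Code:
* ssc install estout    // if not already installed
estpost summarize Exports_USD_Hundred_Million, detail
esttab using summary.rtf, cells("count mean sd min max p1 p25 p50 p75 p99") noobs nonumber replace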
Thank you very much for any help and advice!
Kind regards,
Till
ppmlhdfe Pseudo R2 0.99+
Here is the model to estimate the impact of free trade agreements (fta) and preferential trade agreements (pta) on exports (exports), for annual data from 1990-2018 at 4-year intervals:
ppmlhdfe exports fta pta , a(im#year ex#year im#ex) cluster(im#ex)
The pseudo R2 remains as high as 0.99+, and it stays there across various model specifications. Should this make me doubt something? Is there some alternative measure I could additionally calculate?
Bernanke Sims Decomposition Restriction for VAR IRFs
Hi all,
I'm wondering if anyone knows whether Stata has the functionality to do a Bernanke-Sims decomposition when producing IRFs from VAR models, such that each variable can only affect the others after a lag of one, but each variable can affect itself contemporaneously. I know there is the Cholesky decomposition, but this doesn't prevent contemporaneous influence other than through the ordering.
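For reference, a sketch of the kind of non-recursive (Bernanke/Sims-style) restriction that -svar- can impose, with three placeholder variables; missing entries in the constraint matrices are left free, so a diagonal B means no variable responds to another variable's shock within the period:
Code:
* a sketch with placeholder variable names y1 y2 y3 (assumes the data are tsset)
matrix A = I(3)                            // own contemporaneous coefficients fixed at 1
matrix B = (., 0, 0 \ 0, ., 0 \ 0, 0, .)   // off-diagonal contemporaneous impacts set to 0
svar y1 y2 y3, lags(1/2) aeq(A) beq(B)
irf set myirfs, replace
irf create diagonal, step(8) replace
irf graph sirf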
Thanks in advance!
Importance of misspecification test vs. R-sq. and consequences of xtsktest
Dear Statalist Members,
I am analyzing a balanced panel of around 2400 firms over 12 years (Stata 13). The output I am able to present here is based on test data, as I am not allowed (or able) to extract the original files. The only differences are the number of firms, which is higher in the original dataset, and that most of my explanatory variables turn out to be significant, unlike in this sample data. The F-statistic in the original is F(11, 13432), Prob > F = 0.0000, and the overall R-sq. is 0.9639.
My goal is to analyze the effect of investments in computers (investict) and of product and process innovations on the demand for high-skilled workers. Controls include the size of the firm in terms of employees (total), the industry, a dummy for West Germany (west), a dummy for a collective bargaining agreement (collective), the state of the art of the production equipment (tech), whether the firm deals with R&D, and some more.
I have used xtserial and xttest3, which led me to include cluster-robust standard errors. Using xtoverid made me decide to use fixed effects, and -testparm- made me include year fixed effects. So my regression is now the one shown in the first code block below.
I originally intended to use the share of high-skilled employees as my dependent variable, but after reading the paper by Kronmal (1993) and several posts in this forum concerning the problems with ratios, I have switched to using the absolute number of high-skilled employees (highskill) and including the total number of employees as a control. This has increased my R-squared a lot (it was only 0.016 before).
On the other hand, I tested my model specification using the check shown in the second code block below.
The p-value was 0.8 before, when using the share; now it is significant (0.0000), telling me my model is misspecified. Now my question is whether the test I used for misspecification is the right thing to do here and, if yes, what else I can do now concerning my specification. Or is a high R-sq. enough to argue that my model fits?
Also, I don't understand why the dummy for west would be omitted; none of the regressors are highly correlated.
I have read many posts in this forum and run several tests that led me to this fixed-effects regression model, so I am confused about the result of the specification test. I have also tried -areg, absorb(idnum) vce(cluster idnum)-, which has slightly different coefficients and a higher R-sq. (as is normal) than -xtreg, fe-, but it gives the same result in the misspecification test.
Testing for normality, I ran the random-effects regression shown in the third code block below (re, because it is not possible with fe) and then -xtsktest-, whose output is in the last code block.
Could this mean I should transform my data using logs, as there are issues with normality? Or what are the consequences?
I appreciate any input on my issues, thanks in advance,
Helen
Code (the fixed-effects regression):
xtreg highskill investict product_inno process_inno total west industry collective exportshare investment turnover rnd t > ech i.year, fe vce(cluster idnum) note: west omitted because of collinearity Fixed-effects (within) regression Number of obs = 4344 Group variable: idnum Number of groups = 498 R-sq: within = 0.1005 Obs per group: min = 1 between = 0.5034 avg = 8.7 overall = 0.4393 max = 11 F(21,497) = 2.60 corr(u_i, Xb) = 0.3892 Prob > F = 0.0001 (Std. Err. adjusted for 498 clusters in idnum) ------------------------------------------------------------------------------ | Robust highskill | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- investict | .7032893 .2711382 2.59 0.010 .170571 1.236008 product_inno | .2723859 .6988765 0.39 0.697 -1.100731 1.645503 process_inno | -.3938082 .4501978 -0.87 0.382 -1.278334 .4907173 total | .101938 .0245108 4.16 0.000 .0537805 .1500954 west | 0 (omitted) industry | .1624997 .1911486 0.85 0.396 -.2130592 .5380586 collective | -.2838042 .5861356 -0.48 0.628 -1.435413 .8678049 exportshare | .8483747 2.351452 0.36 0.718 -3.771638 5.468387 investment | 1.44e-06 5.98e-07 2.41 0.016 2.68e-07 2.62e-06 turnover | -1.99e-07 1.39e-07 -1.43 0.153 -4.73e-07 7.46e-08 rnd | -1.103514 .9824249 -1.12 0.262 -3.033732 .8267042 tech | -.6756037 .2828397 -2.39 0.017 -1.231313 -.1198947 | year | 2008 | .0310991 .3815399 0.08 0.935 -.7185309 .7807291 2009 | .4981931 .3197414 1.56 0.120 -.1300184 1.126405 2010 | .7890588 .4913133 1.61 0.109 -.1762483 1.754366 2011 | 1.109093 .5630923 1.97 0.049 .0027585 2.215428 2012 | 1.189345 .5407669 2.20 0.028 .126874 2.251816 2013 | .0965383 .7094676 0.14 0.892 -1.297387 1.490464 2014 | .4120097 .6609871 0.62 0.533 -.8866637 1.710683 2015 | -.1867301 .7267681 -0.26 0.797 -1.614647 1.241187 2016 | .1137137 .5447759 0.21 0.835 -.956634 1.184061 2017 | -.4267298 .7349041 -0.58 0.562 -1.870632 1.017172 | _cons | 4.706464 2.350515 2.00 0.046 .0882924 9.324636 -------------+---------------------------------------------------------------- sigma_u | 22.632204 sigma_e | 7.5596268 rho | .89962854 (fraction of variance due to u_i) ------------------------------------------------------------------------------
Code (the specification test):
predict fitted, xb
g sq_fitted=fitted^2
xtreg highskill fitted sq_fitted
test sq_fitted
Code (the random-effects regression used for the normality test):
xtreg highskill investict product_inno process_inno total west industry collective exportshare investment turnover rnd tech, re vce(cluster idnum)
Code (the xtsktest output):
xtsktest (running _xtsktest_calculations on estimation sample) Bootstrap replications (50) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 Tests for skewness and kurtosis Number of obs = 4344 Replications = 50 (Replications based on 498 clusters in idnum) ------------------------------------------------------------------------------ | Observed Bootstrap Normal-based | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- Skewness_e | -1805.438 1230.613 -1.47 0.142 -4217.396 606.5195 Kurtosis_e | 456552.4 194447.7 2.35 0.019 75441.97 837662.8 Skewness_u | 12182.3 2960.393 4.12 0.000 6380.038 17984.56 Kurtosis_u | 1510700 274557.2 5.50 0.000 972577.4 2048822 ------------------------------------------------------------------------------ Joint test for Normality on e: chi2(2) = 7.67 Prob > chi2 = 0.0217 Joint test for Normality on u: chi2(2) = 47.21 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------
Testing for weak instruments in 2SLS regression using robust SE (just-identified model)
Hi,
I'm currently struggling to test for weak instruments when conducting a 2SLS regression using robust standard errors (vce(robust)). My model is just-identified (i.e., one instrument and one endogenous variable).
estat firststage provides me with the robust F statistic. However, I do not know what to compare it to. Does the threshold of 10 apply here?
Using weakivtest provides me with an effective F statistic that equals the robust F statistic, since my model is just-identified.
Using ivreg2 provides me with weak identification tests:
- Cragg-Donald Wald F statistic: is this statistic valid when using vce(robust)? What can I compare it to? Does the threshold of 10 apply here?
- Kleibergen-Paap rk Wald F statistic: this statistic again equals the robust F statistic.
My main question is:
- Does the rule of thumb robust F statistic / effective F statistic / Kleibergen-Paap rk Wald F statistic > 10 apply in this context?
Export summary statistics by 5 groups
Dear all,
Recently I presented my research project to faculty. One of the comments for improvement concerned the presentation of the descriptive statistics.
Indeed, I presented my summary statistics by 5 groups in this way:
Group1 _____Mean Std.dev
- var1
- var2
- var3
Group2
-var1
-var2
-var3
Nevertheless, they suggested that it would be better to present them in this way:
________Group1 ________Group2
_______Mean. Std.dev__ Mean.Std.dev
var1
var2
var3
Do you have any suggestions on how to solve it?
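One way to get that layout (a sketch using the community-contributed estout package; group stands for the 5-group variable and var1-var3 for your variables):
Code:
* ssc install estout    // if not already installed
estpost tabstat var1 var2 var3, by(group) statistics(mean sd) columns(statistics) nototal
esttab, main(mean) aux(sd) unstack nostar noobs nonote nonumber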
Generating a new variable with standardized values compared to a healthy control group mean and SD (z scores)
I'm working with cross-sectional test data with a selection of test results (all continuous variables) in both a patient and a healthy control (HC) group. I have made separate variables for the two groups' test results, e.g. test1_controls, test1_patients, test2_controls, test2_patients, etc. There are 95 patients and 48 healthy controls. I've attached the data for test 1.
In order to compare the patient group to the healthy control group, I want to make a new variable for each test with z-scores, generating a new value for each patient that compares them to the mean of the HC group, so I can see how far above or below the "normal" data they fall. So if the test 1 z-score for patient number 1 was -2.3, that would mean they were 2.3 standard deviations below the mean of the healthy control group data for that test. What code would you use for this?
I can generate a standardized value using egen test1_patients_z = std(test1_patients),
but these standardized values are only based on the mean and SD of the patient group, and I haven't been able to find any options that give me the result I need comparing to the healthy control mean.
I tried egen test1_patients_z = std(test1_patients), mean(#) std(#)
with # being the manually entered mean and SD of test1_controls (which I got from codebook), but that did not work (it just shifted the whole scale upwards by the HC mean rather than centering it at the usual z-score mean of 0).
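One workaround (a sketch, assuming the wide layout described above): summarize stores the control group's mean and SD in r(), which can then be used directly:
Code:
* z-scores for patients relative to the healthy-control mean and SD
foreach t in test1 test2 {
    quietly summarize `t'_controls
    generate `t'_patients_z = (`t'_patients - r(mean)) / r(sd)
}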
Thanks for your help!
Linear mixed effects models with random effects for subject
I have a dataset of 16 patients with 10 variables. The dependent variable is "bmizpre". It is a longitudinal study with a total of 3 time points (the time variable is "point"). The independent variables are "gender", "drug", "bmicategory" and "diseasetype". The variable identifying patients is "ptid". I would like to get results as the mean difference (95% CI) in bmizpre for the different covariates.
I am interested in analyzing bmizpre over time by gender, drug, "bmicategory" and "diseasetype", with random effects for subject.
My question is: do I have to run a separate univariate model for each covariate, including time, the covariate, and the time-by-covariate interaction? Do I have to repeat this for all covariates?
This is the command that I used:
Code:
mixed bmizpre gender##c.point || ptid: point
Following is the output
Code:
mixed bmizpre gender##c.point || ptid: point

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:   log likelihood = -46.530546
Iteration 1:   log likelihood = -45.773561
Iteration 2:   log likelihood = -45.698352
Iteration 3:   log likelihood = -45.698038
Iteration 4:   log likelihood = -45.698038
Computing standard errors:

Mixed-effects ML regression                     Number of obs     =         32
Group variable: ptid                            Number of groups  =         16
                                                Obs per group:
                                                              min =          2
                                                              avg =        2.0
                                                              max =          2
                                                Wald chi2(3)      =       1.46
Log likelihood = -45.698038                     Prob > chi2       =     0.6918

--------------------------------------------------------------------------------
       bmizpre |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      1.gender |      .1375   .5429129     0.25   0.800    -.9265897     1.20159
         point |    .225625   .1912063     1.18   0.238    -.1491324    .6003824
               |
gender#c.point |
            1  |   -.179375   .2704065    -0.66   0.507     -.709362     .350612
               |
         _cons |    -.21125   .3838974    -0.55   0.582    -.9636751     .541175
--------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ptid: Independent            |
                  var(point) |   1.06e-18   1.40e-17      6.07e-30    1.86e-07
                  var(_cons) |   .5940602   .3300578      .1999428    1.765042
-----------------------------+------------------------------------------------
               var(Residual) |   .5849574   .2068139      .2925356    1.169687
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 4.69                  Prob > chi2 = 0.0960
Note: LR test is conservative and provided only for reference.
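An alternative to running one model per covariate is to put all covariates and their interactions with time into a single mixed model; a minimal sketch using the variable names described above (whether this is preferable to separate univariate models is a modelling choice, and the very small sample limits how many terms can sensibly be estimated):
Code:
mixed bmizpre i.gender##c.point i.drug##c.point i.bmicategory##c.point ///
    i.diseasetype##c.point || ptid: point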
Unbalanced panel data with a gap in the time series
I have data in the following format, where there is a gap in the time series.
I want to estimate a GMM model from these data. Can I run a GMM model when the time series is broken? For example, will the observations for var1 (4 obs) be included in the analysis given that the observation in the third period is missing?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte period str1 country byte(var1 var2) float var3
1 "A" 11  . 1.2
2 "A" 12 23 1.5
3 "A"  . 25 1.4
4 "A" 13 19 1
5 "A" 14 18 1.6
1 "B" 14 12 2.1
2 "B" 15 16 1.6
3 "B" 13 15 1.8
4 "B" 15 14 1.4
5 "B" 15  . 13
end
Stata keybindings (keyboard shortcuts) using Hammerspoon (almost perfect, except for one thing)
I've installed Hammerspoon on macOS.
As a result, I can now use Emacs keybindings in Stata as well.
It's very useful.
Unfortunately, Ctrl-b doesn't work anymore, so I can't re-run the old command freely.
I can't use Ctrl-r and Ctrl-b to select freely when running it.
That's the only thing that's really disappointing.
Does anyone know of a better way to do this?
I could find keywords such as terminfo and termcap, but I don't know how to set them up.
Thank you in advance.
Somebody please give me some advice.
Please refer to the following URL for hammerspoon.
https://gist.github.com/justintanner...5d5196cf22e98a
Monday, June 29, 2020
How to test coefficient differences across two groups
Hi there,
I have a problem in my research project, where I want to test the coefficient difference across two groups, as shown in the attached pictures (not reproduced here). Could you tell me how to test this in Stata, please?
Moreover, could you also tell me other ways to test the difference in Stata, apart from suest, bootstrap, and the Chow test?
Thanks for your help!
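Beyond suest, bootstrap, and the Chow test, a fully interacted single regression also tests whether a coefficient differs across the two groups; a minimal sketch with hypothetical names y, x, and a 0/1 indicator group:
Code:
regress y c.x##i.group, vce(robust)
test 1.group#c.x
The test on the interaction term is the test of equality of the coefficient on x across the two groups.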
Residuals as Dependent Variable - Interpretation of Coefficients?
Hi there,
I am trying to model daily stock market trading volume with a set of independent variables such as stock returns, stock price volatility, etc., in a time-series regression model.
As a first step, I estimate a rolling AR(1) model of trading volume over a ten-day window in order to use the residuals of this model, as "shocks" to volume, as the dependent variable in my main regression setup.
Is there any way to compute the effect of an independent variable on volume given that choice of the dependent variable?
I mean, can you make a meaningful statement like: "Given the estimated coefficient of independent variable X: if one increases X by one unit (one percent), c.p. the effect on stock market volume amounts to ..."?
Thanks!
Saving a file takes forever
Hi, I have a problem with saving an appended file. I have a couple of files to append, but it takes hours until the file is saved! Sorry if my question is very simple. Any suggestions?
Measuring Knowledge Level based on survey answers
Hi!
I asked my respondents a few questions with four options each.
For example,
"Which one is the largest state in area? Texas, Alaska, Delaware, Maine"
"Which state is located in the West coast? California, Nevada, New York, Illinois"
...
...
I want to develop a measure of "Geography Knowledge". What is the standard practice or tools that are used frequently? If you can provide some link to published work that will do too.
I am doing some obvious one such as,
1) Dummy for correct answer - adding up the correct-answer dummies for each respondent.
2) Rank of correct answers - such as for the first question - Alaska (4), Texas (3), Maine (2), Delaware (1).
The problem is that I am assuming equal distances between the "correctness" of the answers. For example, I want to give more points to Texas (more than 3) and fewer points to someone who answered Delaware as the largest state (less than 1).
PS. I do know the correct answers to each question.
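For the simple additive score in option 1, correct-answer dummies can be built and summed directly; a minimal sketch, assuming the responses are stored in numeric variables q1, q2, ... with the codings noted in the comments (all names and codings are assumptions):
Code:
generate byte correct1 = (q1 == 2)   // assuming option 2 codes "Alaska"
generate byte correct2 = (q2 == 1)   // assuming option 1 codes "California"
egen knowledge_score = rowtotal(correct1 correct2)
For a weighted rather than equal-interval score, item response theory models (for example Stata's irt commands in recent versions) are a common alternative to ad hoc point schemes.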
How to draw a twoway picture like these
How can I draw a twoway graph like these two pictures? Thanks.
[attached example graphs not shown]
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year double(telephone growth)
2005  57.22 14.4
2006  63.39 10.8
2007  69.45  9.600000000000001
2008  74.29  7
2009  79.89  7.5
2010  86.41  8.200000000000001
2011  94.81  9.700000000000001
2012 103.1   8.700000000000001
2013 109.95  6.6000000000000005
2014 112.26  2.1
2015 109.3  -2.6
2016 110.55  1.1
end
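A common way to reproduce that style of chart (a bar series with a line on a second axis) is to overlay two twoway plots; a minimal sketch using the variables in the example data above, with the axis titles as assumptions:
Code:
twoway (bar telephone year, yaxis(1)) ///
       (connected growth year, yaxis(2) lcolor(red)), ///
       ytitle("Telephone subscriptions", axis(1)) ///
       ytitle("Growth (%)", axis(2)) xlabel(2005(2)2016)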
Convert SIF date to year
Hi all,
Below are a few observations from my large panel dataset, which I got after linking several datasets together. My question is about the variable mth, of type int. I thought it was in Stata's internal format (SIF) for time, but now I am not sure. What I need is to generate a year variable out of this mth variable.
Code:
clear
input int mth str5 ticker
285 "A"
286 "A"
287 "A"
312 "A"
313 "A"
314 "A"
315 "A"
317 "A"
318 "A"
end
I thought mth was an SIF time value because, in the earlier merging process across 3 Stata datasets, I used
Code:
gen mth=month(date) // I assume this is a month stored SIF time
Please let me know if a year variable can be created from mth.
Thank you,
Rochelle
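If mth really is a monthly SIF date (display format %tm), the year can be recovered by converting to a daily date first; a minimal sketch, with a display line to sanity-check what a value like 285 would mean:
Code:
display %tm 285              // shows 1983m10 if mth is a monthly SIF value
generate year = year(dofm(mth))
Note that gen mth = month(date) only returns the calendar month (1 to 12), so values such as 285 suggest mth was actually created some other way, for example as a monthly SIF with mofd().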
Descriptive Statistics and Matrices
I have the code below. I want to export the results from the code below to Excel, and I also want to add standard deviations for the two groups, SCTP and NonSCTP.
Code:
local vars "HAZ WAZ WHZ Female melevel region twins children_U5 water_source helevel year district AdjHAZ newcage newcage_6 newcage_11 newreligion newwealthsco newhhsex newwage newhhweight newchild_weight agediff"
matrix means = J(23, 3, -99)
matrix colnames means = NonSCTP SCTP t-value
matrix rownames means = `vars'
local irow = 0
qui {
foreach var of varlist `vars' {
local ++irow
sum `var' if SCTP == 0
matrix means[`irow',1] = r(mean)
sum `var' if SCTP == 1
matrix means[`irow',2] = r(mean)
ttest `var', by(SCTP)
matrix means[`irow',3] = r(t)
}
}
matrix list means, format(%15.4f)
Code:
local vars "HAZ WAZ WHZ Female melevel region twins children_U5 water_source helevel year district AdjHAZ newcage newcage_6 newcage_11 newreligion newwealthsco newhhsex newwage newhhweight newchild_weight agediff"
matrix means = J(23, 3, -99)
matrix colnames means = NonSCTP SCTP t-value
matrix rownames means = `vars'
local irow = 0
qui {
foreach var of varlist `vars' {
local ++irow
sum `var' if SCTP == 0
matrix means[`irow',1] = r(mean)
sum `var' if SCTP == 1
matrix means[`irow',2] = r(mean)
ttest `var', by(SCTP)
matrix means[`irow',3] = r(t)
}
}
matrix list means, format(%15.4f)
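One way to add the standard deviations and export is to widen the matrix and write it out with putexcel (Stata 14 or newer); a minimal sketch reusing the `vars' local defined above, with the file name results.xlsx as an assumption:
Code:
matrix means = J(23, 5, -99)
matrix colnames means = NonSCTP_mean NonSCTP_sd SCTP_mean SCTP_sd t_value
matrix rownames means = `vars'
local irow = 0
qui {
    foreach var of varlist `vars' {
        local ++irow
        sum `var' if SCTP == 0
        matrix means[`irow',1] = r(mean)
        matrix means[`irow',2] = r(sd)
        sum `var' if SCTP == 1
        matrix means[`irow',3] = r(mean)
        matrix means[`irow',4] = r(sd)
        ttest `var', by(SCTP)
        matrix means[`irow',5] = r(t)
    }
}
putexcel set results.xlsx, replace
putexcel A1 = matrix(means), names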
xtreg Y X1 X2...Xn i.country, fe- Is it possible/ correct/ feasible?
I declared the panel data as xtset year (instead of xtset panelid year):
xtset YEAR
panel variable: YEAR (balanced)
. xtreg logXijt logGDPitGDPjt logPCGDPitPCGDPjt logPCGDPDijt TRGDPit TRGDPjt logDistanceij Borderij i.PanelIDIndoASEAN, fe
Is this the correct way of computing country-wise fixed effects?
.reg logXijt logGDPitGDPjt logPCGDPitPCGDPjt logPCGDPDijt TRGDPit TRGDPjt logDistanceij Borderij i.PanelIDIndoASEAN. This is also a country-wise fixed-effects command.
I was told to use the command in point 2 for country-wise fixed effects, and now I am confused about whether that is possible. Please answer this query.
Regards
Saba Gulnaz
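For reference, the usual country-fixed-effects setup declares the panel identifier (not the year) as the panel variable and lets xtreg absorb it; a minimal sketch with the variable names above. Note that time-invariant regressors such as logDistanceij and Borderij are dropped by the within transformation:
Code:
xtset PanelIDIndoASEAN YEAR
xtreg logXijt logGDPitGDPjt logPCGDPitPCGDPjt logPCGDPDijt TRGDPit TRGDPjt i.YEAR, fe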
Limitation of arguments in mata
Cheers together,
Currently, I am trying to find the internal rate of return, or cost of equity, for companies over time. I have many companies and, for each company, a time series of data. Hence, I am trying to find the cost of equity for each year for a given company. This requires solving an equation like: Price = function(cost of equity, earnings0, earnings1, earnings2, ...). Here the cost of equity is the only unknown parameter, while I have 13 input parameters.
In this context I wrote some Mata code which loops over all observations and determines the cost of equity (here k). Nevertheless, optimize() allows only 9 additional input arguments...
Does anyone have an idea how to increase the number of arguments, or can you recommend any other command that allows for at least 13 arguments?
Thanks a lot for your help and consideration!
Best,
Dominik
Code:
mata: mata clear
mata
P = st_data(.,("me"))
GL = st_data(.,("g_l"))
B0 = st_data(.,("be"))
B1 = st_data(.,("be_1"))
B2 = st_data(.,("be_2"))
B3 = st_data(.,("be_3"))
B4 = st_data(.,("be_4"))
B5 = st_data(.,("be_5"))
E1 = st_data(.,("E_1"))
E2 = st_data(.,("E_2"))
E3 = st_data(.,("E_3"))
E4 = st_data(.,("E_4"))
E5 = st_data(.,("E_5"))
void eval0(todo, k, p, gl, b0, b1, b2, b3, b4, b5, e1, e2, e3, e4, e5, v, g, H) {
v = (p :- b0 :- ((e1 :- k :* b0) :/ (1 :+ k)) :- ((e2 :- k :* b1) :/ (1 :+ k)^2) :- ((e3 :- k :* b2) :/ (1 :+ k)^3) :- ((e4 :- k :* b3) :/ (1 :+ k)^4) :- ((e5 :- k :* b4) :/ (1 :+ k)^5) :- (((e5 :- k :* b4) :* (1+gl)) :/ ((1 :+ k)^5) :* (k-gl)))^2
}
S = optimize_init()
optimize_init_which(S, "min")
optimize_init_conv_ptol(S, 1e-12)
optimize_init_conv_vtol(S, 1e-12)
optimize_init_evaluator(S, &eval0())
optimize_init_params(S, (0))
for(i=1;i<=235313;i++) {
p =P[i..i,1..1]
gl =GL[i..i,1..1]
b0 =B0[i..i,1..1]
b1 =B1[i..i,1..1]
b2 =B2[i..i,1..1]
b3 =B3[i..i,1..1]
b4 =B4[i..i,1..1]
b5 =B5[i..i,1..1]
e1 =E1[i..i,1..1]
e2 =E2[i..i,1..1]
e3 =E3[i..i,1..1]
e4 =E4[i..i,1..1]
e5 =E5[i..i,1..1]
optimize_init_argument(S, 1, p)
optimize_init_argument(S, 2, gl)
optimize_init_argument(S, 3, b0)
optimize_init_argument(S, 4, b1)
optimize_init_argument(S, 5, b2)
optimize_init_argument(S, 6, b3)
optimize_init_argument(S, 7, b4)
optimize_init_argument(S, 8, b5)
optimize_init_argument(S, 9, E1)
optimize_init_argument(S, 10, E2)
optimize_init_argument(S, 11, E3)
optimize_init_argument(S, 12, E4)
optimize_init_argument(S, 13, E5)
k= optimize(S)
k
ri = k
ri
st_matrix("r"+strofreal(i),ri)
if (i == 1) R = st_matrix( "r1")
if (i >= 2) R = R \ ri
}
R
end
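One workaround for the 9-argument limit of optimize_init_argument() is to bundle all 13 inputs for observation i into a single row vector and pass that one vector as the only extra argument. Below is a minimal sketch of the idea, reusing the data vectors (P, GL, B0, ..., E5) created in the code above; only part of the valuation formula is spelled out, so the evaluator body would need to be completed with the full expression:
Code:
mata:
// evaluator receives one data vector X = (p, gl, b0..b5, e1..e5) instead of 13 scalars
void eval1(todo, k, X, v, g, H)
{
    p  = X[1]
    gl = X[2]
    b0 = X[3]
    e1 = X[9]
    // ... unpack the remaining elements the same way and use the full formula
    v = (p - b0 - (e1 - k*b0)/(1 + k))^2
}

S = optimize_init()
optimize_init_which(S, "min")
optimize_init_evaluator(S, &eval1())
optimize_init_params(S, (0))
// inside the loop over observations, build the bundled argument for row i
i = 1
X = (P[i], GL[i], B0[i], B1[i], B2[i], B3[i], B4[i], B5[i], E1[i], E2[i], E3[i], E4[i], E5[i])
optimize_init_argument(S, 1, X)
k = optimize(S)
end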
How to import several excels
Dear Statalist,
I need to create a database using several hundred Excel files like the one shown in the attached screenshot below. The problem is that I do not know how to do it, since the names of the files do not follow a consecutive order; the first file could be NACE0113 and the second one NACE0119...
Another problem is that there are several rows at the beginning that I do not need; in fact, I only need row 11 onwards. However, I also need to distinguish between when x1 is the median (rows 14 to 36) and when x1 is the mean (rows 39 to 61) for the different years. So an optimal solution (as far as I can see) would be to store the medians first as variables (mex1 mex2...) and then the means as different variables (mnx1 mnx2...) for the different years.
I do not know if this is possible, but I am pretty lost here.
Do you know of any way that help to solve this, or any other solution?
Thanks in advance. [attached screenshot not shown]
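A loop over all Excel files in a folder does not need consecutive file names; the dir extended macro function lists whatever files are there. A minimal sketch, where the folder name, the cellrange, and the variable handling are all assumptions that would need adjusting to the real layout (in particular the median versus mean rows):
Code:
clear
tempfile building
save `building', emptyok
local files : dir "excels" files "*.xlsx"
foreach f of local files {
    import excel using "excels/`f'", cellrange(A11) clear
    generate source = "`f'"          // keep track of which file each row came from
    append using `building'
    save `building', replace
}
use `building', clear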
wildcard in string variable
I'm trying to link different business locations to appropriate zip codes using a do-file. I have business names in a name variable and created a new variable to hold the zipcode. Some of the business names are similar but they have a location indicator at the end such as "Office - Allendale" and "Office-Grand Rapids". I tried using a replace command with an if statement but it doesn't appear that I can use wildcards in that. Is there a way for me to do this? Below is how I'm currently doing it...
replace zip="49401" if locationname==" Office - Allendale"
This works for just a few instances, but I have a health system with hundreds of entries, so I am trying to avoid having to individually manage each one since they have the embedded location descriptor. Thank you!
replace zip="49401" if locationname==" Office - Allendale"
This works for just a few instances but I have a health system with hundreds of entries so trying to avoid having to individually manage each one since they have the embedded location descriptor" Thank you!
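Wildcard-style matching is available through string functions such as strmatch() or strpos() inside the if condition; a minimal sketch (the second zip code is a hypothetical illustration):
Code:
replace zip = "49401" if strmatch(locationname, "*Allendale*")
replace zip = "49503" if strmatch(locationname, "*Grand Rapids*")   // hypothetical zip for illustration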
How to organise my data, I'm a beginner!
Hi
I would like to know what commands to use to organise my data,
I currently have a dataset with over 1000 observations and would like to condense it all.
My data vary by country and there are many different variables, but these have been collected in different years. I would like to combine all the years into one observation.
For example:
I have something like this
Country Year Var1 Var2 Var3 Var4
1 2001 5
1 2002 7 2
1 2003 4
and I would like to collapse all the info by country into one single observation point (regardless of the date), such that I get:
Country Var1 Var2 Var3 Var4
1 5 7 2 4
Any ideas as to how i can do this?
Also, sorry I haven't used the dataex example thing; like I say, I'm new to Stata and that didn't really work - hope this is easy enough for you to help!
Thank you
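If each variable is recorded in only one year per country, collapse can pull the non-missing value into a single row per country; a minimal sketch with the column names from the example (firstnm keeps the first non-missing value):
Code:
collapse (firstnm) Var1 Var2 Var3 Var4, by(Country)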

estimating sample size for cohort studies
I am trying to compute the sample size for a cohort study: pregnant women beyond 20 weeks of gestation with normal blood pressure (<130/80 mmHg) versus pregnant women beyond 20 weeks of gestation with subclinical elevation of blood pressure (130-139/80-89 mmHg). The outcome of interest is the onset (incidence) of pregnancy-induced hypertension as per the existing threshold (140/90 mmHg). In a recent cohort study of 2,090 normotensive women, 1,318 (63.0%) remained normotensive for their entire antenatal course prior to delivery admission and 772 (37.0%) had new-onset blood pressure elevations between 130-139/80-89 mmHg. The incidence of pregnancy-induced hypertension in the normotensive group was 11.6% vs 32% in the blood pressure elevation group. https://www.ajog.org/article/S0002-9378(20)30635-9/pdf
Therefore, using 0.116 as the proportion with the outcome in the normotensive group, I would like to calculate the sample size for a cohort study powered at 80%, with a type I error of 5%.
Will the power twoproportions command in Stata suffice? How would I incorporate loss to follow-up (dropout) into the said syntax?
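power twoproportions handles exactly this two-group comparison; a minimal sketch using the two incidences quoted above, with a 10% loss to follow-up inflation shown purely as an illustrative assumption:
Code:
power twoproportions 0.116 0.32, power(0.8) alpha(0.05)
display "n allowing for 10% dropout: " ceil(r(N) / (1 - 0.10))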
Counting up within ID
Hi everyone,
Quick question that may be quite simple to answer but I am having trouble wrapping my head around it.
Below I have pasted some example data.
We have our participant ID and a variable called numberofsample.
Code:
clear
input float(ID numberofsample)
1 1
2 1
3 3
3 3
3 3
4 3
4 3
4 3
5 1
6 3
6 3
6 3
7 2
7 2
8 3
8 3
8 3
9 3
9 3
9 3
end
We can see that participant ID 1 was included in 1 sample, so it has one record. Participant ID 3 was included in 3 samples, so it has 3 records, etc.
I was wondering if there was any script that could essentially create a new var (lets call it experiment), that would make the data look like this (below)?
Code:
clear
input float(ID numberofsample experiment)
1 1 1
2 1 1
3 3 1
3 3 2
3 3 3
4 3 1
4 3 2
4 3 3
5 1 1
6 3 1
6 3 2
6 3 3
7 2 1
7 2 2
8 3 1
8 3 2
8 3 3
9 3 1
9 3 2
9 3 3
end
Basically, for any ID where numberofsample is 1, experiment would equal 1; if numberofsample is 2, then the first record within the ID would have experiment = 1 and the second record experiment = 2;
if numberofsample is 3, then the first record within the ID would have experiment = 1, the second record experiment = 2, and the third record experiment = 3.
Any help would be super appreciated!
Kind regards,
Ryan
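Numbering records within each ID is a one-liner with bysort and _n; a minimal sketch (if a specific order inside each ID matters, add the sorting variable inside the parentheses):
Code:
bysort ID: generate experiment = _n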
Producing publication-quality tables for the loneway command
Dear Stata users,
I computed different intraclass correlations using the loneway command. Which command can I use to produce publication-quality tables for these results?
Best, Anne
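There is no canned table command tied to loneway, but its results are left in r(), so they can be collected into a matrix and exported; a minimal sketch with hypothetical outcome and grouping variable names, assuming r(rho), r(lb), and r(ub) hold the ICC and its confidence bounds:
Code:
matrix icc = J(2, 3, .)
matrix colnames icc = rho lower upper
matrix rownames icc = outcome1 outcome2
local i = 0
foreach v in outcome1 outcome2 {
    local ++i
    loneway `v' groupvar
    matrix icc[`i',1] = r(rho)
    matrix icc[`i',2] = r(lb)
    matrix icc[`i',3] = r(ub)
}
putexcel set icc_table.xlsx, replace
putexcel A1 = matrix(icc), names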
standard deviation of a variable
Hi everyone,
I have a dataset covering 10 years. I want to calculate the standard deviation of one variable over years y-2 to y (a three-year window) in Stata 16. Can anybody say which command I have to use?
Thanks in advance.
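The community-contributed rangestat command (from SSC) computes statistics over a moving window defined in terms of the time variable; a minimal sketch, assuming a panel identifier id, a year variable year, and a variable x (all hypothetical names):
Code:
ssc install rangestat
rangestat (sd) sd3_x = x, interval(year -2 0) by(id)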
Nelson aalen cumulative hazard function
Hi all!
Hope you are doing well. I have a question about the Nelson-Aalen cumulative hazard function. I made a graph of the chance of discharge after a certain surgery.
As you can see in the exported graph, the curve begins at day 5. How can I change it so that it begins at 0 with a probability of 0? I can change the labels of the axis, but there will be no line drawn from 0.
The graph looks the way it does because there is no chance of discharge in the first 4 days. I hope you understand my question! Let me know if you know how to help
Kind regards,
Daniel
[exported graph not shown]
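One way to force the curve to start at (0, 0) is to save the Nelson-Aalen estimate with sts generate, append an artificial time-zero point, and draw the step function with twoway; a minimal sketch of that idea (axis titles are assumptions):
Code:
sts generate na_hat = na
generate t_plot = _t
local new = _N + 1
set obs `new'
replace t_plot = 0 in `new'
replace na_hat = 0 in `new'
sort t_plot
twoway line na_hat t_plot, connect(stairstep) ///
    ytitle("Cumulative hazard") xtitle("Days since surgery")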
Treatment Effect in Panel Data with Many Time Periods and Various Treatment Starts
Dear Statalist,
I have a question regarding the identification of a treatment effect in a large panel dataset (800,000 observations), spanning many time periods (monthly values for each pixel for 17 years, 2000-2017). The data are on a land management project, and the objective is to identify whether the project has had an observable effect on vegetation as measured by satellite data. Treatment happened in different treatment areas at different times between 2009 and 2015, so there is no "clean" pre-treatment and post-treatment period.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long id float(yrmo year month critws_implemstart ndvi ndvi_mean ndvi_an treatmentyear) 10869164 624 2012 1 . 1.2073 1.2279375 -1.6806648 0 10869164 672 2016 1 . 1.2025 1.2279375 -2.071561 0 10869164 636 2013 1 . 1.21 1.2279375 -1.4607764 0 10869164 480 2000 1 . . 1.2279375 . 0 10869164 528 2004 1 . 1.2136 1.2279375 -1.1676018 0 10869164 660 2015 1 . 1.232 1.2279375 .33084205 0 10869164 588 2009 1 . 1.1974 1.2279375 -2.486893 0 10869164 576 2008 1 . 1.252 1.2279375 1.959588 0 10869164 504 2002 1 . 1.2442 1.2279375 1.3243778 0 10869164 612 2011 1 . 1.2146 1.2279375 -1.0861704 0 10869164 600 2010 1 . 1.1985 1.2279375 -2.3973064 0 10869164 492 2001 1 . 1.2364 1.2279375 .6891677 0 10869164 684 2017 1 . 1.204 1.2279375 -1.949404 0 10869164 516 2003 1 . 1.2128 1.2279375 -1.2327528 0 10869164 540 2005 1 . 1.2132 1.2279375 -1.2001822 0 10869164 564 2007 1 . 1.2301 1.2279375 .1761145 0 10869164 648 2014 1 . 1.2241 1.2279375 -.3125132 0 10869164 552 2006 1 . 1.2212 1.2279375 -.5486819 0 10869165 492 2001 1 . 1.2211 1.2083875 1.0520201 0 10869165 624 2012 1 . 1.1841 1.2083875 -2.0099068 0 10869165 516 2003 1 . 1.193 1.2083875 -1.273394 0 10869165 684 2017 1 . 1.2126 1.2083875 .34860495 0 10869165 528 2004 1 . 1.1913 1.2083875 -1.4140712 0 10869165 540 2005 1 . 1.1849 1.2083875 -1.9437017 0 10869165 588 2009 1 . 1.1823 1.2083875 -2.1588707 0 10869165 636 2013 1 . 1.1968 1.2083875 -.9589226 0 10869165 648 2014 1 . 1.2069 1.2083875 -.1230974 0 10869165 480 2000 1 . . 1.2083875 . 0 10869165 672 2016 1 . 1.2016 1.2083875 -.56170213 0 10869165 612 2011 1 . 1.1981 1.2083875 -.8513431 0 10869165 564 2007 1 . 1.213 1.2083875 .3817124 0 10869165 504 2002 1 . 1.2263 1.2083875 1.482348 0 10869165 600 2010 1 . 1.1814 1.2083875 -2.2333524 0 10869165 576 2008 1 . 1.2369 1.2083875 2.3595476 0 10869165 660 2015 1 . 1.2336 1.2083875 2.0864604 0 10869165 552 2006 1 . 1.2006 1.2083875 -.6444511 0 10869166 480 2000 1 . . 1.2400625 . 0 10869166 648 2014 1 . 1.2619 1.2400625 1.7609978 0 10869166 684 2017 1 . 1.2456 1.2400625 .4465509 0 10869166 528 2004 1 . 1.2176 1.2400625 -1.8113997 0 10869166 540 2005 1 . 1.2199 1.2400625 -1.625923 0 10869166 564 2007 1 . 1.2531 1.2400625 1.0513633 0 10869166 552 2006 1 . 1.2385 1.2400625 -.1259998 0 10869166 492 2001 1 . 1.2311 1.2400625 -.7227468 0 10869166 600 2010 1 . 1.2223 1.2400625 -1.4323813 0 10869166 624 2012 1 . 1.198 1.2400625 -3.391968 0 10869166 516 2003 1 . 1.2231 1.2400625 -1.367877 0 10869166 672 2016 1 . 1.2359 1.2400625 -.335663 0 10869166 504 2002 1 . 1.2566 1.2400625 1.333606 0 10869166 576 2008 1 . 1.2806 1.2400625 3.268987 0 10869166 660 2015 1 . 1.3035 1.2400625 5.115676 0 10869166 636 2013 1 . 1.2509 1.2400625 .8739523 0 10869166 588 2009 1 . 1.2284 1.2400625 -.9404755 0 10869166 612 2011 1 . 1.2305 1.2400625 -.7711299 0 10869167 516 2003 1 . 1.2405 1.2541875 -1.0913434 0 10869167 684 2017 1 . 1.2396 1.2541875 -1.1631054 0 10869167 492 2001 1 . 1.244 1.2541875 -.8122794 0 10869167 552 2006 1 . 1.2647 1.2541875 .8381993 0 10869167 480 2000 1 . . 1.2541875 . 0 10869167 588 2009 1 . 1.2456 1.2541875 -.6847046 0 10869167 624 2012 1 . 1.2141 1.2541875 -3.1962895 0 10869167 648 2014 1 . 1.2474 1.2541875 -.5411806 0 10869167 540 2005 1 . 1.2301 1.2541875 -1.9205605 0 10869167 504 2002 1 . 1.2826 1.2541875 2.2654173 0 10869167 660 2015 1 . 1.2837 1.2541875 2.3531191 0 10869167 576 2008 1 . 1.2796 1.2541875 2.026217 0 10869167 612 2011 1 . 1.2445 1.2541875 -.7724063 0 10869167 636 2013 1 . 
1.2521 1.2541875 -.16644034 0 10869167 564 2007 1 . 1.2523 1.2541875 -.1504911 0 10869167 672 2016 1 . 1.2464 1.2541875 -.6209172 0 10869167 528 2004 1 . 1.2397 1.2541875 -1.1551307 0 10869167 600 2010 1 . 1.229 1.2541875 -2.0082717 0 10869185 684 2017 1 . 1.2134 1.227675 -1.1627634 0 10869185 564 2007 1 . 1.2184 1.227675 -.7554898 0 10869185 552 2006 1 . 1.1851 1.227675 -3.467938 0 10869185 540 2005 1 . 1.2016 1.227675 -2.123934 0 10869185 600 2010 1 . 1.1986 1.227675 -2.3682904 0 10869185 612 2011 1 . 1.2374 1.227675 .7921554 0 10869185 648 2014 1 . 1.2048 1.227675 -1.8632742 0 10869185 624 2012 1 . 1.2223 1.227675 -.437812 0 10869185 528 2004 1 . 1.2314 1.227675 .3034233 0 10869185 516 2003 1 . 1.2204 1.227675 -.59258235 0 10869185 660 2015 1 . 1.2069 1.227675 -1.69222 0 10869185 576 2008 1 . 1.2483 1.227675 1.6800046 0 10869185 480 2000 1 . . 1.227675 . 0 10869185 672 2016 1 . 1.1799 1.227675 -3.891495 0 10869185 636 2013 1 . 1.2167 1.227675 -.8939666 0 10869185 492 2001 1 . 1.219 1.227675 -.7066185 0 10869185 588 2009 1 . 1.1958 1.227675 -2.5963724 0 10869185 504 2002 1 . 1.2972 1.227675 5.663144 0 10869188 576 2008 1 . 1.2204 1.212275 .6702231 0 10869188 528 2004 1 . 1.2308 1.212275 1.528119 0 10869188 636 2013 1 . 1.2004 1.212275 -.979566 0 10869188 600 2010 1 . 1.1956 1.212275 -1.3755126 0 10869188 660 2015 1 . 1.2162 1.212275 .3237686 0 10869188 516 2003 1 . 1.2004 1.212275 -.979566 0 10869188 672 2016 1 . 1.196 1.212275 -1.342521 0 10869188 684 2017 1 . 1.2042 1.212275 -.6661029 0 10869188 612 2011 1 . 1.2205 1.212275 .6784735 0 10869188 504 2002 1 . 1.239 1.212275 2.2045274 0 end format %tm yrmo
At the moment I am employing a very simple type of difference-in-difference style estimation looking like this:
Code:
reg ndvi_an treatmentyear i.year i.month, r
Now, excuse me if the question is broad, but I simply wonder: isn't there a better way to test for the treatment effect in this case? Ideally I would want to see whether treatment areas deviate from their pre-treatment trend differently than control areas (and by doing so also somehow test the parallel-trends assumption), but I am a bit out of my econometric/coding depth here. I've played around with xtset and xtreg for a few days, but I can't really figure out if and how that would help.
As always thankful for any assistance,
Lars
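With staggered treatment timing, a common next step beyond the pooled regression is a two-way fixed-effects (generalized difference-in-differences) specification with pixel and time fixed effects and clustered standard errors; a minimal sketch using the variable names in the excerpt, assuming treatmentyear is a 0/1 indicator that switches on once an area is treated:
Code:
xtset id yrmo
xtreg ndvi_an treatmentyear i.yrmo, fe vce(cluster id)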
serial/cross sectional autocorrelation and heteroscedasticity in panel data
Hello everyone,
I am working with a panel dataset with N=170 and T=5. I want to test for heteroskedasticity/autocorrelation, but I get a bit lost in all the different commands. I know xttest2 tests for autocorrelation and xttest3 for heteroskedasticity. However, I am wondering why the BP test cannot be used in this case; is the BP test only for single-wave (cross-sectional) data? Furthermore, I am struggling to understand the difference between serial autocorrelation and cross-sectional autocorrelation. It seems that xttest2 tests for cross-sectional autocorrelation, but given that I am working with time series, I feel serial correlation is the relevant concern.
Kind regards,
Timea
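For a short panel like this, the tests most people reach for are the Wooldridge test for serial correlation (xtserial, from SSC) and the modified Wald test for groupwise heteroskedasticity after a fixed-effects regression (xttest3, from SSC); a minimal sketch with hypothetical variable names:
Code:
ssc install xtserial
ssc install xttest3
xtserial y x1 x2
xtreg y x1 x2, fe
xttest3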
hcost: error occurred while loading hcost.ado
Good day all,
I need to run the hcost command to calculate cost estimates based on censored data [https://www.stata-journal.com/articl...article=st0399].
Attached is a part of the data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(case_id castage6) int(treatmentduration days_post1year) byte censurepost1y double(hctotal hctotal_post1y)
 1 2  916  551 0   4899.58  1088.48
 2 2 1465 1100 0  10278.75   6236.8
 3 2  618  253 0    4245.7    889.3
 4 2  727  362 0  5500.966   1500.9
 5 2 1527 1162 0   3763.54   1721.7
 6 1 1197  832 0  48420.82   4053.8
 7 2 2670 2305 0 14814.779999999999 7231.4
 8 2 3909 3544 0  18826.22    14630
 9 1 1827 1462 0  61879.55    15288
10 2  435   70 0   2811.76      240
11 2  471  106 0   3628.82    377.6
12 2 3485 3120 0   5627.92     3683
13 2 4136 3771 0  15821.55  11648.6
14 2  741  375 0   9464.49   1245.5
15 2 1220  855 0   4998.21   1815.1
16 2  962  597 0      8375   1375.6
17 2  588  223 0   4292.17    874.5
18 1  544  179 0  53026.91   3578.5
19 2 2001 1636 0   7544.65     2450
20 2  438   73 0   3730.01      240
end
I managed to run the analysis the first time. However, I am not able to rerun the test, or run it on a different variable with hcost, as it continues to show an error.
My command:
Code:
stset days_post1year , id(case_id) failure(censurepost1y)
hcost case_id hctotal_post1y , l(365) method(0) group(castage6)
It shows:
Code:
_sum_table() already in library
(1 line skipped)
(error occurred while loading hcost.ado)
r(110)
I believe this is because the calculation is stored in a temporary file that is preventing me from running a similar analysis.
If I try to repeat the test by saving to a different file to run the analysis, it still shows a similar error.
Command:
Code:
sjlog using hcost10, replace
stset days_post1year , id(case_id) failure(censurepost1y)
hcost case_id hctotal_post1y , l(365) method(0) group(castage6)
I still get the error:
Code:
_sum_table() already in library
(1 line skipped)
(error occurred while loading hcost.ado)
r(110);
Is there a way for me to clear this temporary file, or to save it under a different temporary file for the analysis? Or are there any other solutions to resolve the problem? Any help is much appreciated.
Regards.
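The message suggests that a Mata function defined by hcost is already loaded in memory when the ado-file is re-run, rather than a problem with a temporary data file. One possible (not guaranteed) workaround is to clear Mata's workspace and Stata's program cache before calling hcost again; a minimal sketch:
Code:
mata: mata clear
discard
stset days_post1year , id(case_id) failure(censurepost1y)
hcost case_id hctotal_post1y , l(365) method(0) group(castage6)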
Hurdle model with nehurdle command in Stata 12
I am using the nehurdle command in Stata 12. My dependent variable is education expenditure (which has many zeros), hence I am using a two-part regression. I am interested in elasticity estimates for the intensity (value) equation. Should I use the exponential option with nehurdle, or is it better to use margins?
year dummies in xtreg and pooled OLS
Hello,
I ran both a fixed effects model and a pooled OLS model on the same dataset. I am now wondering whether including year dummies, and thus time effects, is comparable in both regressions. In the literature they mention "time fixed effects" to control for variables that are constant across firms but change over time in the fixed effects model, and "aggregate time effects" in the pooled OLS model. I am now wondering whether the regression treats these time dummies differently, and if so, what the difference is.
Kind regards,
Timea
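Mechanically, the year dummies enter both estimators in the same way (i.year added to the regressor list); what differs is whether firm effects are also swept out. A minimal sketch of the two specifications with hypothetical variable names:
Code:
xtset firm_id year
xtreg y x1 x2 i.year, fe vce(cluster firm_id)   // firm fixed effects plus year dummies
regress y x1 x2 i.year, vce(cluster firm_id)    // pooled OLS with the same year dummies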
Elasticity estimates in Tobit regression
Dear Statalisters
This is my first post so please excuse if the question is not posed correctly.
While running a tobit regression (with censoring at 0), my dependent variable captures spending on education (hence there are many zeros), and I would like an estimate of proportional changes in expenditure with respect to the regressors. In a simple regression model this is achieved by taking the log of the dependent variable; however, here the dependent variable has zeros, and taking the log results in missing values since the log of zero is not defined.
Could anyone suggest how to get around this issue? Will the margins command help here?
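One route that avoids logging the dependent variable is to fit the tobit in levels and ask margins for semi-elasticities of the expected outcome; a minimal sketch with hypothetical variable names, where ystar(0,.) is the expected value accounting for the censoring at zero:
Code:
tobit spending x1 x2, ll(0)
margins, eydx(x1) predict(ystar(0,.))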
Sunday, June 28, 2020
Help/advice on importing large number of text files into Stata
Dear all,
I have a collection of around 2,400 PDFs of parliamentary debate transcriptions that I would like to import into Stata. Having found no easy solution to directly importing PDFs into Stata, I have batch converted them to text files to import them.
I have tried using multimport (multimport delimited, extensions (txt) clear) as a way to bring all of the text files in. However, this command by itself is incorrect because it returns only 200 observations, when there should be around 1 million. I have read the help file and tried to look at alternative approaches (for example a loop involving import delimited) but couldn't solve this issue.
The attraction of multimport is that I can potentially record the filename as a new variable, which would be helpful in later processing.
I have two questions based on this:
1. Is conversion of PDFs to text files before importing the appropriate way to approach this problem?
2. If multimport is the correct command, does anyone have any insight on how to tailor the command to get the appropriate output?
Thanks,
Nate
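As an alternative to multimport, a plain loop over the converted text files with import delimited keeps full control and can record the file name as a variable; a minimal sketch, assuming the .txt files sit in the current folder and that each line of text should become one observation (using char(1) as the delimiter is a trick to stop lines being split into columns):
Code:
clear
tempfile building
save `building', emptyok
local files : dir . files "*.txt"
foreach f of local files {
    import delimited v1 using "`f'", delimiters("`=char(1)'") clear
    generate source = "`f'"
    append using `building'
    save `building', replace
}
use `building', clear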
Tabout Error - Conformability r(3200)
Hello
I am running a series of crosstab tables using the following tabout command:
Code:
tabout sex v501_marital_stat_r religion v106_education_r v190a_wealthquintiles v025_urbanrural ///
condomless2partners if hivstatusoutcome==2 using EmmaDHS_2B.txt, replace c(row) svy f(1) ///
style(tab) stats(chi2) font(bold) npos(col) percent pop
However, I am getting a sort of conformability error - produced below:
Code:
build_ncol(): 3200 conformability error
do_output(): - function returned error
<istmt>: - function returned error
r(3200);
end of do-file
r(3200);
I am not sure how to resolve this - I have changed the order of the variables and even reduced them to as few as 5, as presented in the example data included here, but the error still appears.
I am wondering if you could give me assistance in resolving this. I am including some scratch data below.
I look forward to any assistance - cheers, cY
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float sex byte v501_marital_stat_r int religion byte(v106_education_r v190a_wealthquintiles v025_urbanrural) float(condomless2partners hivstatusoutcome)
0 1 2 0 . 2 . 1
1 1 3 0 4 2 . 1
1 0 2 1 . 2 . 1
1 1 1 2 . 2 . 1
1 1 1 0 . 1 . .
1 0 2 2 . 2 . 1
1 1 1 1 . 1 . .
0 1 2 2 . 1 . 1
1 1 3 0 . 2 . .
1 1 3 0 3 2 . 1
1 1 2 2 . 2 . 1
1 1 2 1 . 2 . .
0 1 3 1 . 1 . 1
0 0 . 1 . 2 . 1
0 1 3 2 . 1 1 1
1 0 2 1 . 1 . 1
1 0 3 1 . 2 . 1
0 1 . 1 . 2 . 1
0 1 2 1 . 2 . 1
0 1 1 2 . 1 . 1
0 0 2 2 . 1 . 1
1 1 2 0 . 1 . .
1 0 2 2 . 2 . 1
1 1 2 0 . 2 . 1
1 1 1 1 . 2 . .
1 1 3 0 . 2 . 1
1 1 2 0 . 1 1 1
1 1 1 2 . 2 . .
1 0 2 2 . 1 . 1
0 0 2 2 . 2 . .
1 1 2 2 . 2 . 3
1 0 3 2 . 2 . .
0 0 3 0 2 1 . 1
1 0 . 1 . 2 . 1
1 1 2 0 . 2 . 1
1 1 3 0 . 2 . 1
1 1 2 2 . 2 . 1
0 1 2 2 . 2 1 1
1 0 3 0 . 2 . .
0 0 2 1 . 2 . 1
1 0 3 1 . 2 . .
0 1 1 0 . 2 . 1
0 1 2 1 . 2 . 1
0 0 2 1 . 2 . 1
1 1 3 0 . 1 . 1
1 0 2 2 . 1 0 1
0 0 3 0 1 1 1 1
1 1 3 0 . 2 . 1
0 1 . 1 . 1 . 1
1 0 1 0 . 2 . 1
end
label values sex gender
label def gender 0 "male", modify
label def gender 1 "female", modify
label values v501_marital_stat_r married
label def married 0 "not married/living together", modify
label def married 1 "married/living together", modify
label values religion religion
label def religion 1 "Catholic", modify
label def religion 2 "ChristianPentecostal", modify
label def religion 3 "Others(TradMuslim)", modify
label values v106_education_r edu
label def edu 0 "no education", modify
label def edu 1 "primary", modify
label def edu 2 "secondary or higher", modify
label values v190a_wealthquintiles MV190A
label def MV190A 1 "poorest", modify
label def MV190A 2 "poorer", modify
label def MV190A 3 "middle", modify
label def MV190A 4 "richer", modify
label values v025_urbanrural URBAN
label def URBAN 1 "urban", modify
label def URBAN 2 "rural", modify
label values condomless2partners yesno
label values hivstatusoutcome hivstatuso
label def hivstatuso 1 "hiv negative", modify
label def hivstatuso 3 "hiv positive and unaware", modify
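It is hard to pin down the conformability error from the call alone, but one way to narrow it down is to make sure the survey design is declared (the svy option requires svyset) and to run the crosstabs one variable at a time so the offending variable identifies itself. A diagnostic sketch, not a guaranteed fix, with hypothetical design variables:
Code:
svyset psu [pweight = wgt], strata(stratum)   // psu, wgt, stratum are hypothetical
foreach v in sex v501_marital_stat_r religion v106_education_r v190a_wealthquintiles v025_urbanrural {
    tabout `v' condomless2partners if hivstatusoutcome==2 using check_`v'.txt, ///
        replace c(row) svy f(1) style(tab) stats(chi2) npos(col) percent
}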
Calculating effect sizes after MI and mixed
Hello All,
I hope this message finds you well!
After running the following code, is there a way to calculate effect sizes?
Code:
mi estimate : mixed dnT1_sum SR_Fall age_month edu mvpa_all light_sed MVPA_FallSR c_sex1 || classid:
I came across a Stata blog on effect sizes, but the recommendations there are not feasible with my analysis above. I received the following error after trying "estat esize":
Code:
. estat esize
estat esize not valid
r(321);
Thank you for your time and help,
Patrick
Help needed to adjust the outcome of bysort command
Dear Stata community members,
I have created count variables k and p based on the following syntax:
by ID Illness (Year), sort: gen k = _n
by ID Illness Year (k), sort: replace k = k[1]
by ID ClassofIllness (Year), sort: gen p = _n
by ID ClassofIllness Year (p), sort: replace p = p[1]
Here is an example of my data, including the desired counters k1 and p1:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str23 ID str10 Illness float(YearofContracting DosageOfAntibiotic ClassofIllness k k1 p p1)
"Patient 1" "Malaria"    2001 100 1 1 1  1 1
"Patient 1" "Typhoid"    2001  15 1 1 1  1 1
"Patient 1" "Typhoid"    2002  26 1 2 2  3 3
"Patient 1" "Common flu" 2003   0 1 1 0  4 0
"Patient 1" "Allergy"    2004  26 1 1 1  5 4
"Patient 1" "Allergy"    2004  10 1 2 2  5 5
"Patient 1" "Allergy"    2005   0 1 3 0  7 0
"Patient 1" "Common flu" 2006   0 1 2 0  8 0
"Patient 1" "Common flu" 2007   0 1 3 0  9 0
"Patient 1" "Common flu" 2008   0 1 4 0 10 0
"Patient 1" "Common flu" 2009   0 1 5 0 11 0
"Patient 1" "Common flu" 2010   0 1 6 0 12 0
"Patient 1" "Allergy"    2012   9 1 4 3 13 6
"Patient 1" "Typhoid"    2013  18 1 3 3 14 7
"Patient 1" "Malaria"    2014  13 1 2 2 15 8
"Patient 1" "Allergy"    2015   0 1 5 0 16 0
"Patient 1" "Common flu" 2016  60 1 7 1 17 9
end
However, the issue is that I want the counter to look like k1 and p1. That is, in rows where DosageOfAntibiotic == 0, the counter should be zero, and the next time (as per the bysort conditions) the counter should pick up from where it left off before the zero.
Please help.
Regards.
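A running count that skips the zero-dosage rows can be built with sum() over an indicator and then zeroed out on the zero rows; a minimal sketch that reproduces k1 in the example above (p1 follows the same idea grouped on ClassofIllness, though ties within a year may need extra handling):
Code:
bysort ID Illness (YearofContracting): generate k1_new = sum(DosageOfAntibiotic > 0)
replace k1_new = 0 if DosageOfAntibiotic == 0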
Can I select Fixed-effects model?
Dear,
First of all, I really thank everyone here for sparing their efforts to help people like me.
- Fixed-effects model is proven valid by F-test. (The null hypothesis is rejected by the F-test)
- Random-effects model is not proven by LM test. (The null hypothesis is not rejected by the Lagrange multiplier method)
- Fixed-effects model is chosen by Hausman test. (The null hypothesis is rejected by Hausman test)
In this case,
Can I select Fixed-effects model?
Thanks in advance for your help!
Generating summary statistics and exporting to excel
I have a table that looks like this:
Country Name | Country Code | Series Name | Series Code | 2016 [YR2016] | 2017 [YR2017] | 2018 [YR2018] |
Afghanistan | AFG | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 77869931554 | 79945392646 | 80769357876 |
Albania | ALB | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 36239313232 | 37624046896 | 39183653161 |
Algeria | DZA | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 4.71936E+11 | 4.78071E+11 | 4.84764E+11 |
American Samoa | ASM | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | .. | .. | .. |
Andorra | AND | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | .. | .. | .. |
Angola | AGO | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 2.18309E+11 | 2.17987E+11 | 2.13337E+11 |
Antigua and Barbuda | ATG | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1835521960 | 1893259104 | 2033155752 |
Arab World | ARB | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 5.88323E+12 | 5.95482E+12 | 6.09747E+12 |
Argentina | ARG | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1.01085E+12 | 1.03782E+12 | 1.01207E+12 |
Armenia | ARM | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 33187468758 | 35676528915 | 37531708419 |
Aruba | ABW | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 3531919864 | 3578912448 | .. |
Australia | AUS | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1.17534E+12 | 1.20316E+12 | 1.23854E+12 |
Austria | AUT | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 4.69057E+11 | 4.80673E+11 | 4.92304E+11 |
Azerbaijan | AZE | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1.39546E+11 | 1.39153E+11 | 1.41119E+11 |
Bahamas, The | BHS | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 13470395241 | 13479375208 | 13690429864 |
Bahrain | BHR | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 68590652908 | 71199513662 | 72464535612 |
Bangladesh | BGD | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 6.19293E+11 | 6.64404E+11 | 7.1665E+11 |
Barbados | BRB | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 4537574770 | 4529721747 | 4507144307 |
Belarus | BLR | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 1.69342E+11 | 1.7363E+11 | 1.79098E+11 |
Belgium | BEL | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 5.6641E+11 | 5.77535E+11 | 5.85958E+11 |
Belize | BLZ | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 2633432198 | 2671282228 | 2752366426 |
Benin | BEN | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | 32196993426 | 34023063768 | 36301676625 |
Bermuda | BMU | GDP, PPP (constant 2017 international $) | NY.GDP.MKTP.PP.KD | .. | .. | .. |
I want to generate a table of summary statistics (average, median, standard deviation and number of observations) by country. I want the results to be saved directly to Excel.
I tried using the sumstats command.
Code:
sumstats ///
    (2016[YR2016]) ///
    using "test.xlsx", replace stats(mean p50 sd)
I get errors with this code. Is there an alternative way to obtain summary statistics without the sumstats command?
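Built-in commands can produce the same table without sumstats; a minimal sketch, assuming the 2016-2018 columns have been imported as numeric variables yr2016-yr2018 and the country name is in countryname (all hypothetical names, and ".." read as missing):
Code:
rename (yr2016 yr2017 yr2018) (gdp2016 gdp2017 gdp2018)
reshape long gdp, i(countryname) j(year)
collapse (mean) mean=gdp (median) p50=gdp (sd) sd=gdp (count) n=gdp, by(countryname)
export excel using "summary_by_country.xlsx", firstrow(variables) replace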
How do you maintain your library of code snippets?
This question is for anyone who has programmed for long enough to find that keeping snippets of code to hand is useful for remembering interesting or useful patterns, keeping track of rare use cases, and so on. If you do maintain a personal library, can you share details of how you organize it? Do you use an app? Do you have a loose collection of text files? What are your likes or grumbles?
New version of -ranktest- available on SSC
With thanks to Kit Baum, a new version of ranktest by Kleibergen-Schaffer-Windmeijer, version 2.0.03, is now available on SSC Archives.
Tests of rank have various practical applications; in econometrics probably the most common is the test of the requirement in a linear IV/GMM model that the matrix E(z_i x_i') is full rank, where z_i is the vector of instruments and x_i is the vector of endogenous regressors.
The updates to ranktest are extensive and include the addition of GMM-based J-type rank tests as proposed by Windmeijer (2018); see https://ideas.repec.org/p/bri/uobdis/18-696.html.
ranktest is a required component for ivreg2 and related packages. Please note, however, that the new features of ranktest require Stata 13 or higher. If called under version control or by an earlier version of Stata, ranktest will call a new program also included in the package, ranktest11. ranktest11 is essentially the previous version of ranktest version 1.4.01. Because ivreg2 runs under version control, the results reported by the current version of ivreg2 will be unaffected by this update. Similar remarks apply to related packages.