Sunday, April 30, 2023

Double Reshape With Excel File?

Hey everyone. I'm working with quite an interesting Excel file. Let's load it, shall we?
Code:
clear *
cls
import excel "https://www.beerinstitute.org/wp-content/uploads/2021/12/2021-September-The-Brewers-Almanac-Beer-Institute-2021.xlsx", ///
sheet("Beer Shipments by State") cellrange(A3:MV55) clear
keep A CX-MV

qui foreach v of var CX-MV {
    loc year : di `v'[1]
    loc month : di `v'[2]

    rename `v' shipments`year'_`month'
}
drop in 1/2
br
Okay, so we have total U.S. state imports of beer from 2000 to 2021. I want the monthly panel in long format, where each state is indexed to the year and month. But how? So far I've named the outcomes shipments_year_month... So I figured I might need to reshape twice? How might I make this a proper panel dataset?
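A minimal sketch of one possible route, assuming column A holds the state name and the renamed variables end up looking like shipments2000_1 (numeric year and month in the suffix): a single reshape long with a string suffix is enough, after which the suffix can be split into year and month.
Code:
rename A state
reshape long shipments, i(state) j(yearmonth) string
split yearmonth, parse("_") destring
rename (yearmonth1 yearmonth2) (year month)
gen mdate = ym(year, month)
format mdate %tm
encode state, gen(state_id)
xtset state_id mdate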

Cannot evaluate a nlcom ratio after an mlogit estimation

Hi all,

I am trying to evaluate a nonlinear expression of my estimated parameters from an mlogit model:

. mlogit OS3 A1 North Central South Biol TG Mod Maj ASC ASCConcern Q3Concern, noconstant vce(robust)

The model is estimated and the results are tabulated. Then I want to assess the ratio of the coefficient of TG to that of A1, namely the coefficient of TG over the coefficient of A1.
I used the postestimation dialog and the command

.nlcom (WTP: _b[TG]/_b[A1]), post

Stata does not provide statistics for the ratio but reports the following:

expression (_b[TG]/_b[A1]) evaluates to missing
r(498);

It seems to me that the coefficients are not stored, since display _b[TG] yields 0.

Any help will be appreciated.

Peter
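A minimal sketch, not a definitive fix: after -mlogit- the coefficients live in named equations (one per outcome), and _b[TG] without an equation prefix refers to the first equation, which is all zeros when that outcome is the base, so the ratio evaluates to missing. Replaying the model with coeflegend shows the exact names to use; the "2" below is only a placeholder for the outcome level of interest.
Code:
mlogit OS3 A1 North Central South Biol TG Mod Maj ASC ASCConcern Q3Concern, noconstant vce(robust)
mlogit, coeflegend
nlcom (WTP: _b[2:TG]/_b[2:A1]), post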

Generating ith day of year variable from day/month/year date variable?

Hi all,

I am wondering if it is possible to generate a variable in the format of ith day of the year from a date variable. I have been reading about working with date variables in Stata, but have yet to find a solution to this problem.

So, using a small subset of the data for an example, I have:

date
07oct2003
01apr2005
07oct2003
30oct2003
01apr2005
01apr2005
29dec2003
30oct2003
30oct2003
01apr2005


And I am hoping to create a variable with values of:

day_of_year
280
91
280
303
91
91
363
303
303
91


I understand that this is complicated by leap years (2000, 2004, etc.), but I am hoping that Stata can automatically account for that somehow...
Any advice you might be able to offer would be greatly appreciated!

Best wishes,
Matt
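A minimal sketch, assuming date is a numeric daily (%td) variable: the doy() function returns the day of the year directly and accounts for leap years.
Code:
gen day_of_year = doy(date)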

How to identify the minimum year of membership to the agreement between the pairs that are symmetric

Dear All,
I have data on trade agreements and membership to the WTO between country (i) and country (j). My data looks something like:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str3(country_i country_j) int year byte(gatt pta)
"AUS" "BRA" 1986 0 0
"AUS" "BRA" 1987 0 0
"AUS" "BRA" 1988 0 1
"AUS" "BRA" 1989 0 1
"AUS" "BRA" 1990 0 1
"AUS" "BRA" 1991 0 1
"AUS" "BRA" 1992 1 1
"AUS" "BRA" 1993 1 1
"AUS" "BRA" 1994 1 1
"BRA" "AUS" 1986 0 0
"BRA" "AUS" 1987 0 0
"BRA" "AUS" 1988 1 0
"BRA" "AUS" 1989 1 0
"BRA" "AUS" 1990 1 1
"BRA" "AUS" 1991 1 1
"BRA" "AUS" 1992 1 1
"BRA" "AUS" 1993 1 1
"BRA" "AUS" 1994 1 1
end

The GATT membership year for the AUS-BRA pair is 1992, whereas for the BRA-AUS pair it is 1988. I want to pick the minimum year (the earliest year between the symmetric pairs). How can I do that in Stata?
Similarly, the PTA year for the AUS-BRA pair is 1988, whereas for the BRA-AUS pair it is 1990. Again, I want to pick the minimum year of agreement signing between country pairs that are symmetric.

Thank you
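A minimal sketch of one approach: build an order-free pair identifier and take, within each pair, the earliest year in which the agreement indicator equals 1.
Code:
gen pair = cond(country_i < country_j, country_i + "_" + country_j, country_j + "_" + country_i)
egen gatt_year = min(cond(gatt == 1, year, .)), by(pair)
egen pta_year  = min(cond(pta  == 1, year, .)), by(pair)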


xtdidregress problem: not all my data is used

Hello,

I am writing my master's thesis and I have to find the causal relationship between the Belgian gender quota for boards of directors of listed firms and firm performance. I use the DiD method and, more specifically, xtdidregress in Stata 17. Below is a small piece of the data. The variables are Year, CompanyID (just a number per company), Age, LnSales, ROA, BoardMembers, PercentageIndependentDirectors, and CriteriaMet (the year the company met the criteria of the law).


+-----------------------------------------------------------------------------+
| Year Compan~D Age LnSales ROA BoardM~s Percen~s Criter~t |
|-----------------------------------------------------------------------------|
1. | 2008 100 43 8.7498276 14.03 4 0.25 2019 |
2. | 2009 100 44 8.8176829 13.59 4 0.25 2019 |
3. | 2010 100 45 8.8928999 13.31 4 0.25 2019 |
4. | 2011 100 46 8.967963 12.08 4 0.25 2019 |
5. | 2012 100 47 9.0254074 11.16 5 0.40 2019 |
6. | 2013 100 48 9.0655458 11.21 5 0.40 2019 |
7. | 2014 100 49 9.0956924 9.81 5 0.40 2019 |
8. | 2015 100 50 9.1245101 8.98 5 0.40 2019 |
9. | 2016 100 51 9.1583626 9.54 5 0.40 2019 |
10. | 2017 100 52 9.1083741 9.46 5 0.40 2019 |
11. | 2018 100 53 9.1520649 9.22 5 0.40 2019 |
12. | 2019 100 54 9.1675372 9.39 5 0.40 2019 |
13. | 2020 100 55 9.2033862 9.95 6 0.50 2019 |
|-----------------------------------------------------------------------------|
14. | 2008 101 19 9.1235439 4.8 10 0.70 2017 |
15. | 2009 101 20 8.8446859 3.13 10 0.60 2017 |
16. | 2010 101 21 9.1789641 8.53 10 0.60 2017 |
17. | 2011 101 22 9.5805885 9.73 10 0.50 2017 |
18. | 2012 101 23 9.4373177 6.68 10 0.50 2017 |
19. | 2013 101 24 9.1921005 5.27 10 0.40 2017 |
20. | 2014 101 25 9.0857418 4.9 9 0.33 2017 |
21. | 2015 101 26 9.1796425 4.58 9 0.56 2017 |
22. | 2016 101 27 9.253739 3.48 11 0.55 2017 |
23. | 2017 101 28 9.3882576 5.09 10 0.60 2017 |
24. | 2018 101 29 9.526372 6.35 10 0.60 2017 |
25. | 2019 101 30 9.7691032 5.05 10 0.60 2017 |
26. | 2020 101 31 9.9383776 2.32 9 0.67 2017 |
|-----------------------------------------------------------------------------|
27. | 2008 102 78 8.6845703 12.13 13 0.54 2008 |
28. | 2009 102 79 8.6864295 13.6 14 0.57 2008 |
29. | 2010 102 80 8.7875256 17.45 14 0.50 2008 |
30. | 2011 102 81 8.7579409 10.16 14 0.50 2008 |
31. | 2012 102 82 8.7663943 8.73 14 0.50 2008 |
32. | 2013 102 83 8.7385752 8.72 12 0.58 2008 |
33. | 2014 102 84 8.6929935 8.78 13 0.46 2008 |
34. | 2015 102 85 8.6901376 6.91 13 0.54 2008 |
35. | 2016 102 86 8.6706007 7.24 13 0.54 2008 |
36. | 2017 102 87 8.6550403 6.83 11 0.64 2008 |
37. | 2018 102 88 8.659387 6.28 14 0.50 2008 |
38. | 2019 102 89 8.6372847 4.62 14 0.50 2008 |
39. | 2020 102 90 8.6020857 6.78 13 0.54 2008 |
|-----------------------------------------------------------------------------|
40. | 2008 103 203 8.7236869 3.29 11 0.18 2020 |
41. | 2009 103 204 8.7434838 4.85 12 0.00 2020 |
42. | 2010 103 205 8.8612934 5.82 13 0.23 2020 |
43. | 2011 103 206 8.6957242 8.11 13 0.23 2020 |
44. | 2012 103 207 8.6151363 6.63 13 0.23 2020 |
45. | 2013 103 208 8.6071253 4.28 13 0.15 2020 |
46. | 2014 103 209 8.6200385 1.02 13 0.23 2020 |
47. | 2015 103 210 8.7053974 4.94 10 0.30 2020 |
48. | 2016 103 211 8.7751941 2.21 8 0.38 2020 |
49. | 2017 103 212 8.1476067 2.86 8 0.38 2020 |
50. | 2018 103 213 8.1825872 2.62 8 0.50 2020 |
51. | 2019 103 214 8.2424405 2.81 11 0.36 2020 |
52. | 2020 103 215 8.1071175 4.26 11 0.36 2020 |
|-----------------------------------------------------------------------------|
53. | 2008 104 13 7.3348064 24 12 0.25 2018 |
54. | 2009 104 14 7.3570318 22.45 12 0.25 2018 |
55. | 2010 104 15 7.4173521 21.7 12 0.33 2018 |
56. | 2011 104 16 7.413114 17.01 12 0.33 2018 |
57. | 2012 104 17 7.4088184 14.08 12 0.33 2018 |
58. | 2013 104 18 7.2870817 6.59 12 0.33 2018 |
59. | 2014 104 19 7.1302522 3.42 12 0.33 2018 |
60. | 2015 104 20 7.1015368 5.51 12 0.33 2018 |
61. | 2016 104 21 7.1023849 5.4 13 0.31 2018 |
62. | 2017 104 22 7.1125955 3.02 12 0.33 2018 |
63. | 2018 104 23 7.1087015 2.47 12 0.33 2018 |
64. | 2019 104 24 7.162661 2.41 12 0.33 2018 |
65. | 2020 104 25 7.1580135 3.29 12 0.33 2018 |
|-----------------------------------------------------------------------------|
66. | 2008 105 4 10.065054 3.75 13 0.31 2019 |
67. | 2009 105 5 10.512111 6.62 13 0.31 2019 |
68. | 2010 105 6 10.49949 5.86 12 0.33 2019 |
69. | 2011 105 7 10.572496 6.84 12 0.33 2019 |
70. | 2012 105 8 10.590566 7.96 11 0.27 2019 |
71. | 2013 105 9 10.67348 12.5 10 0.30 2019 |
72. | 2014 105 10 10.759242 7.48 11 0.27 2019 |
73. | 2015 105 11 10.682904 7.33 13 0.23 2019 |
74. | 2016 105 12 10.725841 1.88 15 0.20 2019 |
75. | 2017 105 13 10.941004 4.54 15 0.20 2019 |
76. | 2018 105 14 10.908137 2.96 15 0.20 2019 |
77. | 2019 105 15 10.865306 5.39 15 0.20 2019 |
78. | 2020 105 16 10.755368 .75 15 0.20 2019 |
|-----------------------------------------------------------------------------|
79. | 2008 106 13 6.7526313 17.93 10 0.30 2019 |
80. | 2009 106 14 6.1340161 1.49 10 0.30 2019 |
81. | 2010 106 15 6.2635411 3.91 9 0.33 2019 |
82. | 2011 106 16 5.9775101 -.95 10 0.30 2019 |
83. | 2012 106 17 6.0178655 -2.44 9 0.44 2019 |
84. | 2013 106 18 5.9937145 -1.4 11 0.27 2019 |
85. | 2014 106 19 6.1611757 .47 11 0.27 2019 |
86. | 2015 106 20 6.7411185 13.14 13 0.31 2019 |
87. | 2016 106 21 6.5283453 7.74 11 0.55 2019 |
88. | 2017 106 22 6.2409929 1.34 8 0.62 2019 |
89. | 2018 106 23 6.3969697 -1.2 8 0.62 2019 |
90. | 2019 106 24 6.8377372 4.92 9 0.67 2019 |
91. | 2020 106 25 7.115379 13.95 6 0.83 2019 |
|-----------------------------------------------------------------------------|
92. | 2008 107 74 6.5865688 3.64 11 0.45 2021 |
93. | 2009 107 75 6.4584417 -8.87 10 0.50 2021 |
94. | 2010 107 76 6.7990547 7.35 9 0.56 2021 |
95. | 2011 107 77 6.9481714 10.78 9 0.56 2021 |
96. | 2012 107 78 7.0527072 11.86 9 0.44 2021 |
97. | 2013 107 79 7.0544626 6.53 8 0.62 2021 |
98. | 2014 107 80 6.8116496 2.74 8 0.62 2021 |
99. | 2015 107 81 6.9362028 2.08 9 0.67 2021 |
100. | 2016 107 82 7.0051923 1.27 8 0.88 2021 |
101. | 2017 107 83 6.9890643 2.59 10 0.60 2021 |
102. | 2018 107 84 6.9358868 7.72 10 0.50 2021 |
103. | 2019 107 85 6.9870931 9.25 7 0.43 2021 |
104. | 2020 107 86 6.6464983 -.23 7 0.43 2021 |
|-----------------------------------------------------------------------------|
105. | 2008 108 79 6.3838495 5.47 24 0.33 2008 |
106. | 2009 108 80 6.5338324 3.92 24 0.46 2008 |
107. | 2010 108 81 6.4887717 8.09 23 0.35 2008 |
108. | 2011 108 82 6.5650635 4.47 17 0.41 2008 |
109. | 2012 108 83 6.4398391 4.1 20 0.40 2008 |
110. | 2013 108 84 6.4298388 3.53 20 0.40 2008 |
111. | 2014 108 85 6.3891495 3.13 20 0.40 2008 |
112. | 2015 108 86 6.3853816 3.18 20 0.40 2008 |
113. | 2016 108 87 6.3220894 2.67 20 0.40 2008 |
114. | 2017 108 88 6.3571904 3.55 19 0.37 2008 |
115. | 2018 108 89 6.4126045 2.67 19 0.37 2008 |
116. | 2019 108 90 6.4153866 3.28 20 0.40 2008 |
117. | 2020 108 91 6.3758371 3.73 15 0.53 2008 |

As you can see, I have data for all companies from 2008 until 2020. The year the law was passed is 2011.
Now I will show you the code I gave Stata.

. xtset CompanyID Year, yearly

Panel variable: CompanyID (strongly balanced)
Time variable: Year, 2008 to 2020
Delta: 1 year

. generate Time=(Year>2011)

. generate Treated=.
(663 missing values generated)

. replace Treated=0 if CriteriaMet<=2011
(65 real changes made)

. replace Treated=1 if CriteriaMet>2011
(598 real changes made)

. replace Treated=0 if CriteriaMet<=2011 & Year<2011
(0 real changes made)

. generate TimeTreated=Time*Treated

. xtdidregress (ROA Age LnSales BoardMembers PercentageIndependentDirectors)(TimeTreated), group(CompanyID) time(Year)

The outcome:

xtdidregress (ROA Age LnSales BoardMembers PercentageIndependentDirectors)(TimeTreated), group(CompanyID) time(Year)
note: 2020.Year omitted because of collinearity.

Number of groups and treatment time

Time variable: Year
Control: TimeTreated = 0
Treatment: TimeTreated = 1
-----------------------------------
| Control Treatment
-------------+---------------------
Group |
CompanyID | 5 46
-------------+---------------------
Time |
Minimum | 2008 2012
Maximum | 2008 2012

-----------------------------------

Difference-in-differences regression Number of obs = 663
Data type: Longitudinal

(Std. err. adjusted for 51 clusters in CompanyID)
------------------------------------------------------------------------------------------------
| Robust
ROA | Coefficient std. err. t P>|t| [95% conf. interval]
-------------------------------+----------------------------------------------------------------
ATET |
TimeTreated |
(1 vs 0) | -2.682235 2.406841 -1.11 0.270 -7.516517 2.152048
------------------------------------------------------------------------------------------------
Note: ATET estimate adjusted for covariates, panel effects, and time effects.
I don't understand why it would only use 2008-2012. I have a wider range of data and it is all there. Did I do something wrong?

Thanks in advance for your help
Kind regards
Marie De Tollenaere




Saturday, April 29, 2023

Understanding and interpreting the large coefficients of control variables in my results table

Hi,

I am running a panel regression with a DiD setup and fixed effects. I have obtained significant results for the years that I am interested in. However, there is one thing that worries me. Although my DiD estimators (the interaction between post and treated) have negative and significant coefficients (which supports my hypotheses), their impact might be dwarfed by the larger impact of the control variables (GDP Growth, INFLATION, UNEMPLOYMENT, RATES). For instance, the coefficient on rates is about 17, which seems very large compared to my DiD estimator. My dependent variable is loan growth.

I made sure that each of them is scaled the same way (i.e., loan growth = 0.02, GDP growth = 0.01, etc.).

Now, while writing this post, I realized what the issue might be. My dependent variable is loan growth, which is calculated as the annual % change between years. But the control variable interest rates is not: it is the average 10Y Government Bond yield in a given year (e.g., 2016 = 1%, 2017 = 2%); it is not calculated as a % change from year to year. On the other hand, the coefficient on GDP growth is rather small, perhaps because it is calculated as the growth in GDP between years and is therefore consistent with the dependent variable.

I am confused about interpreting this result and would like to receive your valuable insights.
Dependent variable: loan growth      (1) 2019        (2) 2020        (3) 2021
did_estimator_2019                  -0.0487***
                                    (0.0161)
post2019                             0.0186*
                                    (0.0112)
GDP Growth                           0.234***        0.230***        0.237***
                                    (0.0765)        (0.0763)        (0.0756)
UNEMPLOYMENT                         2.644***        2.491***        3.071***
                                    (0.343)         (0.351)         (0.360)
INFLATION                            2.148***        3.240***        1.828***
                                    (0.293)         (0.609)         (0.288)
INTEREST RATES                      17.55***        16.18***        17.33***
                                    (1.024)         (0.751)         (0.696)
did_estimator_2020                                  -0.0353***
                                                    (0.0119)
post2020                                            -0.0140
                                                    (0.0111)
did_estimator_2021                                                  -0.0224*
                                                                    (0.0121)
post2021                                                             0.0416***
                                                                    (0.00482)
Constant                            -0.418***       -0.389***       -0.434***
                                    (0.0213)        (0.0127)        (0.0129)
Observations                         2,227           2,227           2,227
R-squared                            0.535           0.535           0.543
Number of ID                         373             373             373

Standard errors with new -xthdidregress- command in Stata 18

I am trying to understand how -xthdidregress- estimates standard errors for difference-in-differences with two-way fixed effects (TWFE). The help manual suggests standard errors are clustered by treatment level by default. However, using the -regress- command gives me different results. Does anyone know why?

Reproducible example below using a panel of three states from 2001 to 2005. State 1's treatment begins in 2003. The point estimates of interest using -xthdidregress- and -regress- are identical, but their standard errors differ.

Code:
clear all

input state year gdp post2003 treatmentGroup treated
1 2001 100 0 1 0
1 2002 115 0 1 0
1 2003 95 1 1 1
1 2004 87 1 1 1
1 2005 73 1 1 1
2 2001 113 0 0 0
2 2002 117 0 0 0
2 2003 121 1 0 0
2 2004 125 1 0 0
2 2005 129 1 0 0
3 2001 47 0 0 0
3 2002 53 0 0 0
3 2003 59 1 0 0
3 2004 62 1 0 0
3 2005 66 1 0 0
end

*Set panel
xtset state year, yearly
Code:
*xthdidregress using default VCE

xthdidregress twfe (gdp) (treated), group(state) hettype(time)

note: variable _did_cohort, containing cohort indicators formed by treatment variable treated and group variable state, was added to the dataset.

Computing ATETs using margins ...

Treatment and time information

Time variable: year
Time interval: 2001 to 2005
Control:       _did_cohort = 0
Treatment:     _did_cohort > 0
-------------------------------
                  | _did_cohort
------------------+------------
Number of cohorts |           2
------------------+------------
Number of obs     |
    Never treated |          10
             2003 |           5
-------------------------------

Heterogeneous-treatment-effects regression               Number of obs    = 15
                                                         Number of panels =  3
Estimator:       Two-way fixed effects
Panel variable:  state
Treatment level: state
Control group:   Never treated
Heterogeneity:   Time

                                  (Std. err. adjusted for 3 clusters in state)
------------------------------------------------------------------------------
             |               Robust
        Time |       ATET   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |
       2003  |        -20   1.984313   -10.08   0.010    -28.53781   -11.46219
       2004  |      -31.5   1.322876   -23.81   0.002    -37.19187   -25.80813
       2005  |      -49.5   1.322876   -37.42   0.001    -55.19187   -43.80813
------------------------------------------------------------------------------
Code:
*xthdidregress explicitly specifying default VCE

xthdidregress twfe (gdp) (treated), group(state) hettype(time) vce(cluster state)

note: variable _did_cohort, containing cohort indicators formed by treatment variable treated and group variable state, was added to the dataset.

Computing ATETs using margins ...

Treatment and time information

Time variable: year
Time interval: 2001 to 2005
Control:       _did_cohort = 0
Treatment:     _did_cohort > 0
-------------------------------
                  | _did_cohort
------------------+------------
Number of cohorts |           2
------------------+------------
Number of obs     |
    Never treated |          10
             2003 |           5
-------------------------------

Heterogeneous-treatment-effects regression               Number of obs    = 15
                                                         Number of panels =  3
Estimator:       Two-way fixed effects
Panel variable:  state
Treatment level: state
Control group:   Never treated
Heterogeneity:   Time

                                  (Std. err. adjusted for 3 clusters in state)
------------------------------------------------------------------------------
             |               Robust
        Time |       ATET   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |
       2003  |        -20   1.984313   -10.08   0.010    -28.53781   -11.46219
       2004  |      -31.5   1.322876   -23.81   0.002    -37.19187   -25.80813
       2005  |      -49.5   1.322876   -37.42   0.001    -55.19187   -43.80813
------------------------------------------------------------------------------
Code:
*regress

reg gdp treated#i(2003/2005).year post2003 treatmentGroup i.state i.year, vce(cluster state)

note: 3.state omitted because of collinearity.
note: 2005.year omitted because of collinearity.

Linear regression                               Number of obs     =         15
                                                F(2, 2)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.9967
                                                Root MSE          =     2.7544

                                    (Std. err. adjusted for 3 clusters in state)
--------------------------------------------------------------------------------
               |               Robust
           gdp | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
  treated#year |
       1 2003  |        -20   2.173707    -9.20   0.012     -29.3527    -10.6473
       1 2004  |      -31.5   1.449138   -21.74   0.002    -37.73514   -25.26486
       1 2005  |      -49.5   1.449138   -34.16   0.001    -55.73514   -43.26486
               |
      post2003 |   19.16667   3.392803     5.65   0.030     4.568614    33.76472
treatmentGroup |       56.8   1.014396    55.99   0.000      52.4354     61.1646
               |
         state |
            2  |       63.6   2.70e-14  2.4e+15   0.000         63.6        63.6
            3  |          0  (omitted)
               |
          year |
         2002  |   8.333333   5.660781     1.47   0.279    -16.02304    32.68971
         2003  |       -7.5   .7245688   -10.35   0.009    -10.61757   -4.382432
         2004  |         -4   2.36e-08 -1.7e+08   0.000           -4          -4
         2005  |          0  (omitted)
               |
         _cons |   46.53333   3.165456    14.70   0.005     32.91348    60.15319
--------------------------------------------------------------------------------

Choosing the Correct Panel Data Model

Hi there, I am estimating a fiscal reaction function in dynamic form (with a lagged dependent variable) and trying to choose the right model. My dataset has N=13 and T=18. A previous study used system GMM, but I am doubtful about that method because, following Roodman (2009), system GMM is dubious for N<20; it was also developed under the assumption of large N and small T. My data show signs of endogeneity, cross-sectional correlation, heteroskedasticity, and autocorrelation. Another study on the same region uses panel FE IV regressions, but this only takes care of endogeneity and not the other issues. I am thinking that a panel model with 2 lags of the endogenous variable and panel-corrected standard errors should suffice? Maybe also with FGLS?

New to Statalist.
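A minimal sketch of the two estimators mentioned, with placeholder names (y for the fiscal variable, x1 x2 for regressors, country/year for the panel); note that neither estimator by itself deals with the endogeneity of the lagged dependent variable.
Code:
xtset country year
xtpcse y L(1/2).y x1 x2, correlation(ar1)
xtgls  y L(1/2).y x1 x2, panels(heteroskedastic) corr(ar1)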

Baseline Balance - T-Test - Missing Values

I want to create a balance table in Stata. It is a baseline dataset collected as part of an RCT. However, some of the variables have missing values; the number of missing values varies from 18 to 97 across variables. N is 2000, and the data are at the household level. How do I account for these missing values while running the t-tests? What command do I use in Stata?

Thanks!
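A minimal sketch with hypothetical names (treatment for the assignment indicator, x1 for a baseline covariate): -ttest- simply drops observations that are missing on the tested variable, so each test is run on its own available sample, and -misstable- documents how many values are missing.
Code:
misstable summarize x1
ttest x1, by(treatment)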

Help with stcrreg / competing regression

Dear All,

I am trying to perform a competing-risks regression analysis but am receiving the following error.

'option compete(): competing risks events must be stset as censored r(459);'

Code as below:
stset FU_MA_YEARS, failure(CVE_MAJ_AMP==1) scale(1)
stcrreg IA, compete(Death==1)

I believe the error may be related to the fact that 'Death' and 'CVE_MAJ_AMP' are separate variables.
Would this be correct? If so, what would be the best way to address this? How can we incorporate both outcomes in the same variable? Thank you.
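A minimal sketch of one common setup, assuming Death and CVE_MAJ_AMP are separate 0/1 variables: combine the outcomes into a single event variable, declare only the event of interest as a failure in stset (so the competing event is stset as censored), and pass the competing level to compete().
Code:
gen byte event = 0
replace event = 1 if CVE_MAJ_AMP == 1
replace event = 2 if Death == 1 & CVE_MAJ_AMP != 1
stset FU_MA_YEARS, failure(event == 1) scale(1)
stcrreg IA, compete(event == 2)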

bar chart multiple global ifs

Hello,

I am working on my master's thesis and am having some trouble visualizing the data as I would like. I have 15 binary (0/1) variables, cat_1 to cat_15, and I would like to plot them on one single bar graph, but only for when the value is 1, not 0. At the moment I have these 15 separate graphs:

graph bar (count) if cat_1 == 1, over(year)
graph bar (count) if cat_2 == 1, over(year)
graph bar (count) if cat_3 == 1, over(year)
graph bar (count) if cat_4 == 1, over(year)
graph bar (count) if cat_5 == 1, over(year)
graph bar (count) if cat_6 == 1, over(year)
graph bar (count) if cat_7 == 1, over(year)
graph bar (count) if cat_8 == 1, over(year)
graph bar (count) if cat_9 == 1, over(year)
graph bar (count) if cat_10 == 1, over(year)
graph bar (count) if cat_11 == 1, over(year)
graph bar (count) if cat_12 == 1, over(year)
graph bar (count) if cat_13 == 1, over(year)
graph bar (count) if cat_14 == 1, over(year)
graph bar (count) if cat_15 == 1, over(year)

Any tips as to how to get these onto the same plot?

Thanks
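A minimal sketch: because cat_1 to cat_15 are 0/1, the (sum) statistic counts the 1s, so a single call covers all fifteen categories in one graph.
Code:
graph bar (sum) cat_1-cat_15, over(year)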

freduse not working in Stata 18

I just installed Stata 18 and when I use freduse I get this error:

. freduse CTNA, clear
note: https://research.stlouisfed.org/fred...ddata/CTNA.txt redirected to
https://fred.stlouisfed.org/series/C...ddata/CTNA.txt
header entry of unknown type encountered
unknown header type = <!DOCTYPE_HTML_PUBLIC_"-//IETF//DTD_HTML_2.0//EN">
value of unknown header =
no variables defined
r(111);

I am able to import the data using import delimited using "https://ift.tt/oXiUIjZ", clear but it would be great if my old code still worked.

The freduse command is working in Stata 17 (still installed on my computer).


Does CSDID store weight (dripw)?

Hi everyone,

As we know, it is standard practice to show covariate balance before and after weighting.

I am using CSDID with the dripw option to balance my treatment and control groups, and I want to show that dripw improves covariate balance. Does anyone know if CSDID stores the weights? Or are there any other ways to show that dripw improves covariate balance?

Thank you so much,
Alex


For example, if we are using the CSDID example:

use https://friosavila.github.io/playing...rdid/mpdta.dta, clear
csdid lemp lpop , ivar(countyreal) time(year) gvar(first_treat) method(dripw)

tab first_treat
tab year

*here we generate treat and control groups to show balance for the 2004 treated group before weighting:

gen treat2004=1 if first_treat==2004
replace treat2004=0 if first_treat==0
ttest lpop if year<2004,by(treat2004)

*How do I show that the difference in lpop between treated and control groups improve after using dripw?

Friday, April 28, 2023

Dr

I am trying to import an Excel file, but Stata shows "<istmt>: 3499 import_excel_import_file() not found". How can I import the file?

Importing JSON files.

I need to import thousands of JSON files saved on my local hard drive and read all the variables as strings. The JSON files have a very simple structure, as follows.


dealing with a time varying variable in stcox

I am analyzing survival in a group of respiratory patients.
It turns out that one of my variables (FVC, a measure of respiratory function expressed as a percentage of the predicted value, that is, the higher the better) has a time-varying effect on survival. If I do
stcox fvc
I get a highly significant effect:
HR .969788, 95% CI .9642006 .9754077
but estat phtest is unhappy:
Test of proportional-hazards assumption, Global test chi2 18.03, df 1, p = 0.0000

What apparently is happening (plotting the curve of patients above or below the median fvc), is that survival in patients with a better FVC begins to decrease with a delay of about two years compared with those with poorer respiratory function, which makes sense.

From what I understand from the Stata survival analysis reference manual, there are two possible (apparently equivalent) solutions:
a) use the tvc and tvexp options, that is:
stcox fvc, tvc(fvc) tvexp(_t>=2)
b) stsplit the data at time 2 and build an interaction variable with time:
stsplit twoyears, at(2)
gen fvc2=fvc*(twoyears==2)
stcox fvc fvc2

I did try both solutions, and they give very similar but slightly different results:
solution a gives:
main HR .9502671 (95% CI .939542 .9611146)
tvc HR 1.008867 (1.004696 1.013055)


while solution b:
-------------+----------------------------------------------------------------
fvc HR .9590344 ( .9508496 .9672896)
fvc2 HR 1.020984 ( 1.009306 1.032797)


Reassuringly, this solution made estat phtest, detail happy:

fvc rho 0.05118, chi2 1.29, df 1, p= 0.2566
fvc2 rho -0.02088, chi2 0.20, df 1, p= 0.6525
Global test chi2 1.64, df 2, p= 0.4409


My questions are:
a) Am I correct?
b) Should I care about the very slight difference?
c) What should I do with the coefficient of the interaction variable?
Thanks!





Forcing Stata to Post a Singular Variance-Covariance Matrix ("variance matrix is nonsymmetric or highly singular")

Hello all,


I am trying to get around the "variance matrix is nonsymmetric or highly singular" error and force Stata to give me standard errors. Is this possible in Stata? I have looked at other threads, but haven't found a way to force standard errors to be given in them.

Context:
In the project I am working on with my colleague, he is running a regression with a CBSA fixed effect; this means that there are a huge number of regressors and that most of them are sparse indicator variables. When non-clustered standard errors are requested, there is no problem. When clustered standard errors are requested, we get the "variance matrix is nonsymmetric or highly singular" error. This is expected – when the number of regressors exceeds the number of clusters (which is the case with our project), the var-cov matrix is rank-deficient, and valid statistical inference on a limited number of coefficients (but not jointly on all of them at once) can still be conducted. So, we would like to get around the "variance matrix is nonsymmetric or highly singular" error and have Stata give us the var-cov matrix and standard errors anyway, despite the singular nature of the var-cov matrix.

Thursday, April 27, 2023

Margins plots curve lines other than straight line

After using "margins" command, I use "marginsplot" command to graph the margins plots. I get straight line while I find in some other papers, they present curve. How can I get curves other than straight lines?

The graph I get like this:



The graph in other papers like this:


Thanks.
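A minimal sketch with hypothetical variables y and x: the curvature in a margins plot comes from the fitted model itself (for example a squared term), evaluated over a grid of values with the at() option.
Code:
regress y c.x##c.x
margins, at(x = (0(1)10))
marginsplot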

How to find lag and lead time

Hello,

I have time series data looking at whether food contamination in a school district led to hospitalizations. I have already conducted other analyses, but I am now looking to find the lag time between peaks and troughs of food contamination and the incidence of hospitalizations. Does anyone know how to do this easily? I was using the code below, which is not at all giving me what I want.


tsset id
dfuller numberofincidents, trend lag(2)
varbasic numberofincidents foodcontamination , lag(1)



Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int WeekOf byte id float(numberofincidents foodcontamination)
22283   1   0 102.41084
22290   2  20 74.084435
22297   3   0 61.01071
22304   4  20 34.863262
22311   5   0 17.431631
22318   6   0 15.252678
22325   7  20 17.431631
22332   8   0 10.89477
22339   9   0 10.89477
22346  10   0 13.073724
22353  11   0 6.536862
22360  12   0 13.073724
22367  13   0 19.610586
22374  14  20 30.505356
22381  15   0 28.3264
22388  16  40 28.3264
22395  17   0 15.252678
22402  18  20 6.536862
22409  19   0 6.536862
22416  20   0 8.715816
22423  21  20 10.89477
22430  22  20 2.178954
22437  23   0 2.178954
22444  24   0 4.357908
22451  25   0 6.536862
22458  26   0 6.536862
22465  27   0 15.252678
22472  28  20 17.431631
22479  29  20 74.084435
22486  30  40 78.44234
22493  31  80 56.65281
22500  32  20 69.726524
22507  33  60 71.90548
22514  34  40 54.47385
22521  35   0 71.90548
22528  36  20 98.05293
22535  37   0 80.6213
22542  38   0 87.15816
22549  39   0 50.11594
22556  40   0 32.68431
22563  41   0 32.68431
22570  42   0 30.505356
22577  43   0 41.40013
22584  44   0 56.65281
22591  45   0 74.084435
22598  46  20 61.01071
22605  47   0 135.09515
22612  48   0 93.69502
22619  49  20 93.69502
22626  50  20 296.33774
22633  51  20 745.2023
22640  52  60 1056.7927
22647  53   0 858.5079
22654  54  60 623.1808
22661  55  60 437.9698
22668  56  60 274.54822
22675  57  60 189.569
22682  58   0 102.41084
22689  59  20 78.44234
22696  60   0 65.36862
22703  61   0 47.93699
22710  62  20 50.11594
22717  63  20 61.01071
22724  64   0 93.69502
22731  65   0 132.9162
22738  66   0 204.82167
22745  67   0 211.35854
22752  68  40 222.2533
22759  69   0 392.2117
22766  70  20 394.3907
22773  71   0 416.1802
22780  72  60 418.3592
22787  73  20 337.7379
22794  74  20 385.6749
22801  75  40 366.0643
22808  76  40 383.4959
22815  77  80 372.6011
22822  78  20 337.7379
22829  79  20 268.01135
22836  80  60 350.8116
22843  81  20 324.66415
22850  82  60 344.2747
22857  83  60 300.69565
22864  84  40 300.69565
22871  85  20 246.2218
22878  86  40 211.35854
22885  87   0 220.07436
22892  88  20 265.8324
22899  89  40 246.2218
22906  90   0 209.1796
22913  91  60 180.8532
22920  92  20 185.2111
22927  93  20 185.2111
22934  94  60 169.9584
22941  95 160 165.6005
22948  96  40 165.6005
22955  97  20 102.41084
22962  98 120 159.06364
22969  99 120 119.84247
22976 100  20 176.49527
end
format %tdnn/dd/CCYY WeekOf
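A minimal sketch of one way to look at lead/lag structure, reusing the existing id variable (which already runs 1, 2, 3, ... with one value per week, so it serves as a gap-free time index): the cross-correlogram from -xcorr- shows at which lag the two series are most strongly correlated; the choice of 10 lags is arbitrary.
Code:
tsset id
xcorr foodcontamination numberofincidents, lags(10) table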

Inference from initial statistics to get a sense of the coefficients of interest

Code:
 
                Earning       employment    Establishment
Treat         27,425.41       19,458.09         3,829.57
            (10,636.87)     (35,378.12)       (4,147.39)
Control       33,612.52       22,614.83         2,640.76
            (14,158.23)     (40,015.26)       (3,051.18)

Code:
 
              Earn (OLS)   Earn (IV I)   Earn (IV II)   Emp (OLS)   Emp (IV I)   Emp (IV II)   Est (OLS)   Est (IV I)   Est (IV II)
Coefficient     -.157***     -1.116***      -1.080***     -.031**      -.365**       -.307**      .044***       .256**        .240**
Std. error        0.038         0.336          0.415        0.014        0.173         0.142        0.009         0.113         0.118
This is my first time working with a new dataset, which focuses on the retail industry at the US county-year level.

I first created summary statistics to get a sense of the whole dataset, so that I can judge whether the significant results I am getting in my regressions are really consistent with the data.

In my summary statistics table, the standard deviation of earnings is not very high for the retail sector. [Standard deviations are given in parentheses for the treated and control units.]

Despite not having a high standard deviation, my standard errors for the IV estimates are quite high compared to OLS.

However, even though my OLS regressions show comparatively smaller coefficients, my IV coefficients and standard errors are both on the higher side.

Normally the IV coefficient can behave this way compared to OLS, but at the same time, do you think this could have something to do with my handling of the data?

Calculate average of the (pairwise) sample cross-correlations

Dear forum,
since this is my first post here and since I am new to Stata, I hope my post fulfills the forum requirements. I want to calculate the pairwise correlations across the (abnormal) returns of 80 firms, for an event study test statistic, and I need the average over all these pairwise correlations. I have read that I might achieve this using a loop, but unfortunately I am unable to program such a loop. In the last column I have stored a value I also need to construct the test statistic, but that value is irrelevant for the correlations.

Below you can find an excerpt of my data. Any help would be highly appreciated!!
Jonas


input long Date float(resid_MarketModel_1 resid_MarketModel_2 resid_MarketModel_3 BMP_t)
22715 .028857823 -.012030476 .011276288 -17.066809
22718 .001680778 .015519869 .04662051 -17.066809
22719 -.013552702 -.0013050697 -.010423742 -17.066809
22720 .011290512 .0247944 .022345623 -17.066809
22721 -.017967667 .0013580762 -.02512785 -17.066809
22722 -.009187383 -.011011245 -.0087083 -17.066809
22725 .003578504 -.0034927686 -.00557095 -17.066809
22726 -.005407726 .010930496 .006511768 -17.066809
22727 -.013838122 -.01001027 -.015077838 -17.066809
22728 -.00786587 -.006517837 -.007176251 -17.066809
22729 .0239404 .003767889 .027022535 -17.066809
22732 -.011907137 -.014157334 -.02665999 -17.066809
22733 .005552202 -.008957997 -.009840348 -17.066809
22734 -.023531053 .00018480085 -.02001087 -17.066809
22735 .0040255897 -.017226135 -.009717827 -17.066809
22736 -.014861178 -.010953325 -.007508885 -17.066809
22739 -.012034866 -.003144226 -.01649329 -17.066809
22740 -.0007985968 -.001498356 .007198072 -17.066809
22741 -.0037047905 -.006126278 -.0025296016 -17.066809
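A minimal sketch, assuming the 80 residual series share the stub resid_MarketModel_ as in the excerpt: -correlate- leaves the correlation matrix in r(C), and the average pairwise correlation is the mean of its off-diagonal elements (note that -correlate- uses casewise deletion across all variables listed).
Code:
correlate resid_MarketModel_*
matrix C = r(C)
local k = rowsof(C)
scalar total = 0
forvalues i = 1/`k' {
    forvalues j = 1/`k' {
        if `i' != `j' scalar total = total + C[`i', `j']
    }
}
display "average pairwise correlation = " total / (`k' * (`k' - 1))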

urlencoding() usage

Hi all,

Does anyone understand the usage of urlencode()?
string scalar urlencode(string scalar s, real scalar useplus)
string scalar urldecode(string scalar s)
I have never managed to use this function.
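A minimal sketch based on the signatures quoted above: these look like Mata functions, so they are called from within Mata (or via a one-line mata: statement) rather than as Stata commands; the useplus argument presumably controls whether spaces are encoded as "+".
Code:
mata:
s = urlencode("two words & more", 0)
s
urldecode(s)
end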

Requiring the sampled firms to have at least five observations in the data set

Please, I need to require the sampled firms to have at least five observations in the data set. What code do I need to add to my regression?

reg pensionassetallocationequitywh csopresence firmsizewh ROAwh operationcashflowvolatilitywh leveragewh dividendpayoutwh boardsizewh boardindependencewh ///
contributionwh pension_sizewh durationwh planreturnswh discountratewh mergersaquisitions i.sic i.year
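A minimal sketch, assuming the panel has a firm identifier (firm_id is a placeholder name): count each firm's observations and restrict the regression to firms with at least five. Strictly speaking, the count could also be restricted to observations with nonmissing values of the regression variables.
Code:
bysort firm_id: gen nobs = _N
reg pensionassetallocationequitywh csopresence firmsizewh ROAwh operationcashflowvolatilitywh leveragewh dividendpayoutwh boardsizewh boardindependencewh ///
contributionwh pension_sizewh durationwh planreturnswh discountratewh mergersaquisitions i.sic i.year if nobs >= 5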

What should I do to achieve the following result?

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str3 code str29 value
"A " "test1test1test1test1。"
"A " "test2test2test2test2。"
"A " "test3test3test3test3。"
"B " "test1test1test1test1。"
"B " "test2test2test2test2。"
"B " "test3test3test3test3。"
"B " "test4test4test4test4。"
"C " "test1test1test1test1。"
"C "  "test2test2test2test2。"
end

The results I want to achieve are:
code value
A test1test1test1test1。test2test2test2test2。test3test3test3test3。
B test1test1test1test1。test2test2test2test2。test3test3test3test3。test4test4test4test4。
C test1test1test1test1。test2test2test2test2。
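A minimal sketch of one way to do this: chain the values within each code and keep one row per group (the order of the values within code is taken as given by the sort).
Code:
bysort code (value): gen combined = value[1]
by code: replace combined = combined[_n-1] + value if _n > 1
by code: keep if _n == _N
keep code combined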




Wednesday, April 26, 2023

Create new variables using two categorical variables

I have three variables: gender, sector, and wf. Gender and sector are categorical variables, where gender takes the values male and female and sector takes the values rural and urban. wf is a binary variable that takes the value 1 if the individual is in the workforce and zero otherwise. I want to create four new variables, rural_male_wf, rural_female_wf, urban_male_wf, and urban_female_wf, that take their value from wf if an individual is, say, from the rural sector and male, and so on. Can someone please help me with this?

clear
input str5(sector gender) float(wf rural_male_wf rural_female_wf urban_male_wf urban_female_wf)
"rural" "male" 0 0 . . .
"rural" "femal" 1 . 1 . .
"urban" "male" 1 . . 1 .
"rural" "male" 1 1 . . .
"urban" "femal" 0 . . . 0
"urban" "femal" 0 . . . 0
"rural" "femal" 1 . 1 . .
"rural" "male" 0 0 . . .
"urban" "femal" 1 . . . 1
"urban" "male" 0 . . 0 .
"urban" "male" 1 . . 1 .
"rural" "femal" 0 . 0 . .
"rural" "male" 1 1 . . .
"urban" "male" 0 . . 0 .
end
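A minimal sketch using the example data above (note that the gender variable is str5, so "female" is stored as "femal"); the capture drop line is only there so the sketch can be run on the example data, where the result columns already exist.
Code:
capture drop rural_male_wf rural_female_wf urban_male_wf urban_female_wf
gen rural_male_wf   = wf if sector == "rural" & gender == "male"
gen rural_female_wf = wf if sector == "rural" & gender == "femal"
gen urban_male_wf   = wf if sector == "urban" & gender == "male"
gen urban_female_wf = wf if sector == "urban" & gender == "femal"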

assign variable label to variable name

Hi,
I have a dataset with some variables, say X1-X10. The corresponding variable labels are 2018:q1, 2018q2, and so on. I want to rename the variables using their labels, such that the new variable names are something like var_2018_q1, var_2018_q2, etc. I would appreciate it if anyone can help.

Thanks,
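A minimal sketch, assuming the labels look like "2018:q1" or "2018q2" as described: read each label, clean it into a legal suffix, and rename.
Code:
foreach v of varlist X1-X10 {
    local lab : variable label `v'
    local lab = subinstr("`lab'", ":", "", .)
    local lab = subinstr("`lab'", "q", "_q", .)
    rename `v' var_`lab'
}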

how to calculate output of r(alpha) (short example included)?

My deepest apologies if this is difficult to work with. I'm not a Stata user; I'm just trying to decipher existing code.

Input
year etype category value avg_freq
1996 1 1 25526966.0 8
1997 1 1 3266403.0 8
1998 1 1 1870628.0 8


Code:
import excel "\\inputdata.xlsx", firstrow case(lower)


levelsof category, l(smoothingcategory)

foreach i of local smoothingcategory {
    keep if etype==1 & value != . & category==`i'

    tsset year
    tssmooth exponential smoothed=value, samp0(2) forecast(1)
    gen parameter=r(alpha)

    if (parameter > .5 & avg_freq > 50) {
        drop smoothed parameter
        tssmooth exponential smoothed=value, parms(.5) samp0(2) forecast(1)
        gen parameter=r(alpha)
    }
    else if (avg_freq < 50 & parameter > .1) {
        drop smoothed parameter
        tssmooth exponential smoothed=value, parms(.1) samp0(2) forecast(1)
        gen parameter=r(alpha)
    }

    replace category=`i' if missing(category)
}

Output
Can someone explain how "smoothed" and "parameter" are calculated? Does anyone have a formula they can share?
year etype category value avg_freq smoothed parameter
1996 1 1 25526966 8 14396684 0.0001305
1997 1 1 3266403 8 14398137 0.0001305
1998 1 1 1870628 8 14396684 0.0001305
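As I read the -tssmooth exponential- documentation and the output above: the smoothed value at time t is the one-step-ahead forecast built from data through t-1, i.e. smoothed_t = alpha*value_(t-1) + (1-alpha)*smoothed_(t-1), where samp0(2) sets the starting value to the mean of the first two observations. When parms() is not specified, alpha is chosen to minimize the in-sample sum of squared forecast errors and is returned in r(alpha), which the code above simply copies into the variable parameter. A small replication sketch for the fixed alpha shown in the output (0.0001305), assuming the data are sorted by year:
Code:
local alpha = 0.0001305
sort year
gen double check = (value[1] + value[2])/2 in 1
replace    check = `alpha'*value[_n-1] + (1 - `alpha')*check[_n-1] in 2/l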

Creating an Abortion scale

Hello Everyone,


I am trying to create a 7-point scale for attitudes towards abortion for every decade using the GSS from 1980 to 2021. The command I used for the 1980s is

gen scale_1980s= (abany*2 + abdefect*.3 + abnomore*1.5 + abhlth*.1 + abpoor*1.5 + abrape*.3 + absingle*1.3) if year==1980 | year==1982 | year==1983| year==1984 | year==1985 | year==1987 | year==1988 | year==1989

I have given values to each variable based on response frequency: if a variable has a higher response rate, it has a lower value. For example, 81.94 percent supported abortion for rape (abrape), so it has a value of 0.2.

Is this the correct way of doing it?
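This does not speak to whether the weighting scheme itself is sound, but as a minimal formatting sketch, the year condition in the same line can be shortened with inlist() (same variables and weights as above):
Code:
gen scale_1980s = (abany*2 + abdefect*.3 + abnomore*1.5 + abhlth*.1 + abpoor*1.5 + abrape*.3 + absingle*1.3) ///
    if inlist(year, 1980, 1982, 1983, 1984, 1985, 1987, 1988, 1989)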


Stata 18 and Statalist

With the most welcome release of Stata 18, the FAQ Advice has, as is customary, been tweaked slightly. Members are asked to please take special care to flag the version they are using if it seems possible that answers depend on the version being used.


Interpret log-linear coefficient of a percentage type

Extremely trivial question but I just want to ensure I have the right interpretation.

I have the following model: ln(y) = b0 + b1 X1

where X1 is a percentage. My coefficient is b1 = 0.0096, so would this be interpreted as:

a 1% change in X1 will increase y by 0.0096%, or am I mistaken?

Thank you
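A hedged worked note, assuming X1 is measured in percentage points (so one unit = one percentage point): in a log-linear model a one-unit increase in X1 multiplies y by exp(b1) = exp(0.0096), which is about 1.0096, i.e. it raises y by roughly 0.96%, not 0.0096%.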

Tuesday, April 25, 2023

Solving a system of nonlinear equations with summations

This is my first time posting in the forum, so my apologies if this question isn't clear enough or doesn't meet all the standards. I did read the advice on posting.

I want to solve the following equation (see attached image) in Stata for each w_j from j=1 to 188. I have data on L_{Fj}, d_{ij}, and L_{Ri} as well as the constant value for \varepsilon. My data is at the i, j level.

dest_nb_workers_pre is L_{Fj}
origin_nb_workers_pre is L_{Ri}
time_pre is d_{ij}

Any suggestions on what commands I should use or just what strategies in general? I aim to have a new variable (say, wages) with the w_j for each destination_id. Thanks for any help!


Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int(origin_id destination_id) long(dest_nb_workers_pre origin_nb_workers_pre) float time_pre
 11 1 11568 16923 12.509804
155 1 11568 10075  25.47222
 27 1 11568 29488 19.314816
119 1 11568  9281 26.979166
108 1 11568 35021  18.75926
161 1 11568  9769  25.40476
107 1 11568 33708 18.383333
  2 1 11568 25126  21.73684
136 1 11568 31699  19.12745
 87 1 11568 18090 30.285715
 79 1 11568 12699 32.433334
 39 1 11568 33048 15.880953
101 1 11568 20942 13.071428
143 1 11568 11053 25.142857
 36 1 11568 30324 11.254386
186 1 11568  8450  24.03333
134 1 11568  8053 21.666666
  3 1 11568 13570 20.203703
 68 1 11568 10037 33.696968
124 1 11568 12811  41.85417
125 1 11568 37678     21.75
 60 1 11568 22955  29.40476
 32 1 11568 15031      7.15
168 1 11568 13771   21.4375
 29 1 11568 48756 13.308642
 24 1 11568 20245 21.526316
129 1 11568 18135 19.210976
138 1 11568 38623 15.458427
113 1 11568 21499  20.17857
 95 1 11568 26565 21.145834
 16 1 11568 27891 14.953704
 91 1 11568 20309 26.083334
 83 1 11568 10129  31.59524
117 1 11568  6483      32.9
 22 1 11568 22016  15.50926
 23 1 11568 15647  19.61111
 70 1 11568 13890   27.6875
130 1 11568 24611  23.58974
170 1 11568 11920 29.020834
 76 1 11568 19319  30.79487
102 1 11568 31378 12.261905
 55 1 11568 11421 33.729168
 96 1 11568 28450 18.833334
140 1 11568 19779 30.641666
120 1 11568  7435 30.574074
 57 1 11568 18720  31.02778
 15 1 11568 22508 12.287828
 43 1 11568 18596 21.927084
183 1 11568  7033 22.366667
158 1 11568 28382     28.38
133 1 11568 13059  21.61111
152 1 11568 11078 26.291666
181 1 11568  7346 24.583334
 12 1 11568 34221 12.904762
 88 1 11568 33010  24.78125
 35 1 11568 28795  10.73077
180 1 11568  7636 28.166666
157 1 11568 23929     26.64
150 1 11568  7652  28.25904
 72 1 11568 19315 29.960945
 44 1 11568 11533  20.72222
182 1 11568  7307      33.5
 42 1 11568 36102  22.61111
 82 1 11568 11991 29.703703
 33 1 11568 18661      9.25
 46 1 11568 18178 11.766666
 38 1 11568 23380 16.461538
 89 1 11568 21118 27.166666
105 1 11568 17771    9.8125
144 1 11568  5836        26
 78 1 11568 19862    36.075
146 1 11568  8054 29.805555
148 1 11568  7251  29.72222
166 1 11568 10311  18.78833
 71 1 11568 14571 30.020834
126 1 11568 12807 18.958334
154 1 11568 17941  26.51282
 18 1 11568 23908  5.236111
 75 1 11568  9317 32.633335
  8 1 11568 11364 15.242424
128 1 11568 30389 18.759226
  7 1 11568 10443 19.020834
 25 1 11568  6855 13.962963
178 1 11568 12847 24.469696
 97 1 11568 18639 17.656862
  1 1 11568 11568         0
156 1 11568 23595  24.46667
141 1 11568 13920  26.39394
153 1 11568  9940  17.13889
167 1 11568 39253 19.731884
 14 1 11568 19615  5.569445
175 1 11568  8301 37.416668
122 1 11568 10140 32.541668
 99 1 11568 23426 14.309524
 41 1 11568 21956 18.255556
114 1 11568  9324 14.166667
110 1 11568 22293  22.19697
 34 1 11568  8062  9.595238
  5 1 11568  7015    19.875
151 1 11568  6818  25.90476
end

Newey West SE for a sample with same date in different quintiles

Here is my data example. For one date, I have different values of term in different quintiles (quint).
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long date float term double sprtrn float quint
12425 .013541103 .005951362 1
12425  .01908227 .005951362 2
12425  .02498053 .005951362 3
12425 .030166345 .005951362 4
12425  .05550182 .005951362 5
12425  .03056108 .005951362 .
end
format %td date
That is why, when I use tsset, the error message is "repeated time values in sample":
Code:
tsset date
repeated time values in sample
Therefore, I cannot use newey (or newey2) to compute Newey-West standard errors.

Could you please show me how to deal with this problem?
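A minimal sketch of one workaround: run Newey-West separately within each quintile, using a gap-free time index so that -tsset- accepts the data (the lag length of 4 is only a placeholder choice).
Code:
forvalues q = 1/5 {
    preserve
    keep if quint == `q'
    sort date
    gen t = _n
    tsset t
    newey term sprtrn, lag(4)
    restore
}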

Problem with prediction after mlogit

Hi,

I am working with a Household survey for 2022, where each row is an individual. I have information on demographics as well as employment variables. I want to simulate each individual's employment status and sector for 2024. For simplicity, assume we are working with 2 sectors. What I have done so far is to create the following categorical variable (I called it sector_all): 1 if employed in sector A, 2 if employed in sector B, 3 if unemployed, and 4 if out of the labor force. Using this as a dependent variable, I run the following multinomial logit regression:
mlogit sector_all gender married children indigenous c.age i.educ i.rural hh_size i.region

which I then used to predict the probability that each individual falls in each of the four categories of sector_all:
predict p1 p2 p3 p4, pr

Now, I would like to use these probabilities to create a simulated version of sector_all, but for 2024. The caveat is that I would like the distribution of workers in 2024 to follow macro growth data in each sector. Let's imagine that sector A is projected to grow by 5% over that period and sector B to shrink by 3%; then I would like the number of workers in sectors A and B to reflect those growth rates.

I am having a lot of trouble finding a way to do this. So far, I have obtained for each person the highest probability across all four categories and the sector it corresponds to (i.e. the most likely sector they would move to); see my code below.
egen highest_p = rowmax(p1-p4) /*Highest probability*/
forval i = 1/8 {
gen aux`i' = `i' if p`i'== highest_p
}
egen pred_sector_all = rowmax(aux*) /*Predicted sector*/



I have tried generating a random number from a uniform distribution, comparing it to this probability, and deciding whether an individual moves based on that comparison, but it never converges to the numbers I need.
gen sector_form2024 = sector_form
gen u = .

loc y_sectorA = 0
loc y_sectorB = 0

while `pred_sectorA' != `y_sectorA' | `pred_sectorB' != `y_sectorB' {

replace sector_form2024 = sector_form
replace u = runiform()

replace sector_form2024 = pred_sector_form if u > highest_p

count if sector_all2024 == 1
loc y_sectorA = r(N)
count if sector_form2024 == 2
loc y_sectorB = r(N)
}

(here pred_sectorA and pred_sectorB are the target number of workers in each corresponding sector after using the growth rates mentioned before)


Any ideas? Any help would be much appreciated.

using coefplot with xtreg

I am using xtreg to run an event study model and would like to use coefplot with it. I have used the regress command to generate coefficient plots with coefplot with no issues. However, now I have the following regression:
Code:
xtreg outcome ib9.time_indicator##ib0.treatment [w = weight], fe robust cluster(state)
which allows me to use analytic weights so that estimates reflect the sample. When using this specification with coefplot, e.g.
Code:
coefplot, drop(`vars') vertical yline(0)
it is clear that the coefficient estimates visualized with coefplot don't match the values in the weighted regression. Given the proprietary nature of my data, I can't share an extract, but wanted to ask: have folks used weights with coefplot effectively, so that estimates from the weighted regression and coefplot align?

Monday, April 24, 2023

IBES Analyst Mean Forecast (EPS) for Year 1-3

Dear all,
I have downloaded analyst forecast data from IBES and now want to create panel data (using CUSIP and firm year). In particular, I want to create variables for the mean EPS forecasts (MEASURE = "EPS") at forecast horizons (FPI) 1, 2, and 3. I need your guidance in preparing the panel data using the IBES data below:

TICKER CUSIP OFTIC CNAME STATPERS MEASURE FISCALP FPI NUMEST MEANEST FPEDATS ANNDATS_ACT ANNTIMS_ACT
0001 26878510 EPE EP ENGR CORP 20dec2018 EPS ANN 1 11 -.19 31dec2018 14mar2019 18:00:00
0001 26878510 EPE EP ENGR CORP 17jan2019 EPS ANN 1 11 -.2 31dec2018 14mar2019 18:00:00
0001 26878510 EPE EP ENGR CORP 14feb2019 EPS ANN 1 9 -.23 31dec2018 14mar2019 18:00:00
0001 26878510 EPE EP ENGR CORP 14mar2019 EPS ANN 1 11 -.67 31dec2019 25mar2020 16:50:00
0001 26878510 EPE EP ENGR CORP 18apr2019 EPS ANN 1 9 -.5 31dec2019 25mar2020 16:50:00
0001 26878510 EPE EP ENGR CORP 16may2019 EPS ANN 1 7 -.59 31dec2019 25mar2020 16:50:00
0001 26878510 EPEG EP ENGR CORP 20jun2019 EPS ANN 1 4 -.55 31dec2019 25mar2020 16:50:00
0001 26878510 EPEG EP ENGR CORP 18jul2019 EPS ANN 1 3 -.64 31dec2019 25mar2020 16:50:00
0001 26878510 EPEG EP ENGR CORP 15aug2019 EPS ANN 1 3 -.63 31dec2019 25mar2020 16:50:00
0001 26878510 EPEG EP ENGR CORP 19sep2019 EPS ANN 1 3 -.66 31dec2019 25mar2020 16:50:00
0001 26878510 EPEGQ EP ENGR CORP 17oct2019 EPS ANN 1 1 -.53 31dec2019 25mar2020 16:50:00
0001 26878510 EPE EP ENGR CORP 20dec2018 EPS ANN 2 12 -.31 31dec2019 25mar2020 16:50:00
0001 26878510 EPE EP ENGR CORP 17jan2019 EPS ANN 2 12 -.52 31dec2019 25mar2020 16:50:00
0001 26878510 EPE EP ENGR CORP 14feb2019 EPS ANN 2 10 -.59 31dec2019 25mar2020 16:50:00
0001 26878510 EPE EP ENGR CORP 14mar2019 EPS ANN 2 10 -.57 31dec2020
0001 26878510 EPE EP ENGR CORP 18apr2019 EPS ANN 2 9 -.55 31dec2020
0001 26878510 EPE EP ENGR CORP 16may2019 EPS ANN 2 8 -.54 31dec2020
0001 26878510 EPEG EP ENGR CORP 20jun2019 EPS ANN 2 4 -.66 31dec2020
0001 26878510 EPEG EP ENGR CORP 18jul2019 EPS ANN 2 3 -.75 31dec2020
0001 26878510 EPEG EP ENGR CORP 15aug2019 EPS ANN 2 3 -.75 31dec2020
0001 26878510 EPEG EP ENGR CORP 19sep2019 EPS ANN 2 3 -.91 31dec2020
0001 26878510 EPEGQ EP ENGR CORP 17oct2019 EPS ANN 2 1 -.48 31dec2020
0001 26878510 EPE EP ENGR CORP 20dec2018 EPS ANN 3 10 -.23 31dec2020
0001 26878510 EPE EP ENGR CORP 17jan2019 EPS ANN 3 10 -.4 31dec2020
0001 26878510 EPE EP ENGR CORP 14feb2019 EPS ANN 3 8 -.53 31dec2020
0001 26878510 EPE EP ENGR CORP 14mar2019 EPS ANN 3 1 -.41 31dec2021
0001 26878510 EPE EP ENGR CORP 18apr2019 EPS ANN 3 4 -.43 31dec2021
0001 26878510 EPE EP ENGR CORP 16may2019 EPS ANN 3 6 -.56 31dec2021
0001 26878510 EPEG EP ENGR CORP 20jun2019 EPS ANN 3 3 -.91 31dec2021
0001 26878510 EPEG EP ENGR CORP 18jul2019 EPS ANN 3 2 -1.14 31dec2021
0001 26878510 EPEG EP ENGR CORP 15aug2019 EPS ANN 3 2 -.97 31dec2021
0001 26878510 EPEG EP ENGR CORP 19sep2019 EPS ANN 3 2 -1.27 31dec2021


Thanks
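A minimal sketch, assuming one consensus row per CUSIP, statistical period (STATPERS), and horizon (FPI): reshaping wide puts the FPI 1-3 mean forecasts side by side, one row per firm and statistical period; mapping statistical periods into firm-years (e.g. via the fiscal-period end date FPEDATS) would be the next step.
Code:
keep CUSIP STATPERS FPI MEANEST
reshape wide MEANEST, i(CUSIP STATPERS) j(FPI)
rename (MEANEST1 MEANEST2 MEANEST3) (meanest_fy1 meanest_fy2 meanest_fy3)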

Problem with xtunitroot llc: throws error r(2000) with strongy balanced panel

I am using the Levin-Lin-Chu test in Stata 17 to test for stationarity in a panel of 115 countries, from 1995 to 2015 with a delta of five years. This means N(115) * T(5) = 575 obs.


. xtunitroot llc Trade_to_GDP
no observations
r(2000);


I ran summarize and describe on the variables and all are numeric. Also, the panel is strongly balanced, so I don't understand why it says no observations.

Any help is appreciated, thank you!
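A minimal sketch of one thing worth checking, with a placeholder panel identifier (country_id): if the panel was declared with the default delta of one year, each five-year step looks like four gaps to Stata, which can leave no usable observations for the test.
Code:
xtset country_id year, delta(5)
xtunitroot llc Trade_to_GDP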


Twoway line graph with non-integer intervals

Hi,
I want to create a two-way line graph with the line restricted between two year values. The problem is that the "if" qualifier doesn't do what I want, since I want the line to start in the middle of a year, whereas my year variable only contains integer values (2011, 2012, 2013, ...). I want the line to start and end at decimal values (e.g., 2013.5, 2015.5), i.e. in the middle of the interval, but the two-way graph seems to round the range so that it is contained within the observed integer years.

Below is a simple replication example:

Code:
clear 
set obs 10
gen year = _n + 2010
gen value = 2
twoway (line value year if year >= 2013.5 & year <= 2015.5), xlabel(2011(1)2020)

The line is between 2014 and 2016 whereas I want it to between 2013.5 and 2015.5.
Thank you!
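A minimal sketch continuing the replication example: -if- can only select existing data points, so one option is to add rows at the half-year positions (the series here is flat at 2, so the interpolated value is simply 2) and let -line- sort on x.
Code:
set obs `=_N + 2'
replace year  = 2013.5 in -2
replace year  = 2015.5 in -1
replace value = 2      in -2/-1
twoway (line value year if year >= 2013.5 & year <= 2015.5, sort), xlabel(2011(1)2020)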






Error in post command: unknown function () post: above message corresponds to expression 1, variable



I'm getting an error in my code when using the post command, but I don't know why. The code is reproducible with the auto dataset:


Code:
clear all

tempfile locations
tempname locat

postfile `locat' str32(x1 x2 x3 x4) using `locations', replace

sysuse auto

gen population = 1

gen domestico = 1 if foreign == 0
gen externo = 1 if foreign == 1

gen alto = 1 if gear_ratio <= 2
gen bajo = 1 if gear_ratio > 2

gen cost_h = 1 if price >= 10000
gen cost_l = 1 if price < 10000


local place "domestico externo"
local gear_l "alto bajo"
local cost "cost_h cost_l"


foreach p of local place{
    foreach g of local gear_l{
        foreach c of local cost{
                
        summ population  if `c'== 1 & `g' == 1 & `p' == 1
        local r = r(N)
        
        dis "`p'" "`g'" "`c'"  "`r'"
    
        post `locat' ("`c'")("`g'")("`p'")("`r'")
    
        }
    }        
}
Code:
unknown function () post:  above message corresponds to expression 1, variable x1
I've tried changing the variable types in str32() but nothing works. Thanks.

unable to export mean and sd using tabstat

Hello,

I am using the code below to export my descriptive statistics. All is good except that I get empty columns for the mean and sd. Can anyone please let me know what I have missed in the code?

estpost tabstat ABWC Restate_dummy GC_dummy if BIG4 == 0, stat(N mean p25 median p75 sd) col(stat)
esttab . using Descriptives.rtf, cells("count mean(fmt(a2)) p25(fmt(a2)) p50(fmt(a2)) p75(fmt(a2)) sd(fmt(a2))") append

Thank you so much!

Different kinds of missings

Dear stata-community,

I collected data on about 2000 participants via an online survey. I am now in the process of cleaning that data (with about 100 variables) and have come across an issue in coding missings. There are two possible scenarios of missings in my data set.

1. someone skipped a question and thus did not answer that single question (see example ID 1)
2. someone answered the first few questions and then stopped, thus discontinued the survey (see example ID 2)
Both missings are coded as "." right now.
A single observation could include both kinds of missings in different variables. (see example ID 3, here Var 2 was not answered and then the survey was discontinued after Var 3)

Example
ID Var 1 Var 2 Var 3 Var 4 Var 5
1 Yes . No No No
2 Yes . . . .
3 Yes . No . .

I would now like to replace "." with "-1" for scenario 1 and with "-2" for scenario 2. Does anyone have an idea of how I could do that using a loop or some other form of automation?
I would like to avoid having to go through all observations manually.

Any help is greatly appreciated.
Thanks in advance.
Maike
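A minimal sketch, assuming the survey items are numeric and ordered var1-var100 in questionnaire order (placeholder names): find each respondent's last answered item; missings after it are breakoffs (-2) and missings before it are skips (-1).
Code:
unab items : var1-var100
gen last_answered = 0
local pos = 0
foreach v of local items {
    local ++pos
    replace last_answered = `pos' if !missing(`v')
}
local pos = 0
foreach v of local items {
    local ++pos
    replace `v' = -1 if missing(`v') & `pos' < last_answered
    replace `v' = -2 if missing(`v') & `pos' > last_answered
}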

Specification of triple difference-in-differences with 2 treatment groups and 1 control group

Hello everyone,

I have an outcome which I measure between two dates for three different cohorts. The treatment whose effect I want to measure is COVID-19.
My control group is observed in 2016 (t=0) and 2019 (t=1), my first treatment group in 2017 (t=0) and 2020 (t=1), and my second treatment group in 2018 (t=0) and 2021 (t=1).

Group t=0 t=1
Control 2016 2019
Treated 1 2017 2020
Treated 2 2018 2021

I tried to apply a triple difference regression using the following code, but I seem to be specifying something wrong...

Code:
clear
set obs 9999
egen id = seq(), from(1) to(9999) block(1)
gen outcome = 1 + rnormal()
generate year=2016 if id<=3333
replace year=2017 if id>3333 & id <=6666
replace year=2018 if id>6666 & id <=9999
generate t=0
save samplet0, replace
replace year=2019 if id<=3333
replace year=2020 if id>3333 & id <=6666
replace year=2021 if id>6666 & id <=9999
replace t=1
append using samplet0
generate treated1=0
replace treated1=1 if year==2020
generate treated2=0
replace treated2=1 if year==2021

reg outcome t##treated1##treated2
Does anyone have an idea? Thank you in advance!
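One thing that stands out (offered as a hedged observation, not a definitive diagnosis) is that treated1 and treated2 are switched on only in the post-period years, so they conflate group membership with the post indicator; group membership should normally be constant over time. A minimal sketch of the more conventional setup, using the same simulated data, where each cohort is identified by its own pair of years:

Code:
* time-constant cohort indicators; the control cohort (2016/2019) is the base
gen group1 = inlist(year, 2017, 2020)   // first treated cohort
gen group2 = inlist(year, 2018, 2021)   // second treated cohort

* each treated cohort compared with the control cohort in a standard 2x2 design
reg outcome i.t##i.group1 i.t##i.group2, vce(robust)

The coefficients on 1.t#1.group1 and 1.t#1.group2 are then the two difference-in-differences estimates, and the difference between them can be tested afterwards with lincom or test.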

Ivreg2

Hi,

I am writing a thesis on the effect of higher stringency on Covid-19 mortality in US counties. I am comparing all counties that border each other across state borders, and aiming to understand whether stringency had an effect. After I have run these regressions, I want to try to understand the causality, as a rise in mortality may cause stringency to increase. To do this I am using political party as my instrument; however, I am a bit confused about how to use ivreg2. Below is the code line I wrote, and the output it gave. Just wondering if this is correct, and if it is, how do I interpret these results?

ivreg2 WeightedMortality StringencyIndex (Party=Pairs)

This is the regression. The thing I am confused about is which way around Party and Pairs go. If they are this way round, then these are the results, where it says Party has been instrumented.

However, if I switch Pairs and Party around (Pairs refers to the pair of counties that border each other across state borders, so Escambia, Florida and Baldwin, Alabama are pair 1, the next two bordering counties are pair 2, and so on), I run:

ivreg2 WeightedMortality StringencyIndex (Pairs=Party)

The results are different and it says Party is an excluded instrument. Which way around is correct? How do I interpret these results?
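As a point of reference on the syntax only (not on the identification strategy): in ivreg2, as in official ivregress, the parentheses hold the endogenous regressor on the left of the equals sign and its excluded instrument(s) on the right. If the worry is that mortality feeds back into stringency, then StringencyIndex is the endogenous regressor to be instrumented by Party, and Pairs, being a matching identifier rather than an instrument, would more naturally enter as exogenous border-pair controls or fixed effects (an assumption about the intended design). A minimal sketch:

Code:
* StringencyIndex treated as endogenous, Party as the excluded instrument;
* -first- reports the first-stage regression, -robust- gives robust standard errors
ivreg2 WeightedMortality (StringencyIndex = Party), first robust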

Many thanks to anyone who is able to help!!

Best wishes

Charles





Sunday, April 23, 2023

stset+sttocc matching - nested case-control study

Hi all,

I am designing a nested case-control study using cohort data and would like some help with matching using the stset + sttocc command.

My failure event is the development of a disease (brain cancer).
I have worked out how to match for "sex" and "birth year" using stset and sttocc.

gen xitdate=mdy(xitmo, xitdy, xityr)

stset xitdate, failure(cns==1) id(id) scale(365.25)
set seed 9123456
sttocc, match (sex byr) n(5) nodots
list _set id _case _time sex byr, sepby(_set)


However, I would like my controls to be matched not only on sex and birth year but also on date of diagnosis (i.e., each control should be sampled at the same date of diagnosis as their matched case).
Is there a way to do this?
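A hedged observation rather than a definitive answer: because the stset above specifies no origin(), analysis time runs on the calendar scale, so sttocc's risk-set sampling already draws each case's controls from the subjects still at risk on that case's diagnosis date. One way to see this is to convert the matching time back to a calendar date after sampling:

Code:
stset xitdate, failure(cns==1) id(id) scale(365.25)
set seed 9123456
sttocc, match(sex byr) n(5) nodots

* _time is the case's failure time shared by the whole matched set; on this
* calendar timescale it is the diagnosis date expressed in years since 01jan1960
gen diagnosis_date = round(_time * 365.25)
format diagnosis_date %td
list _set id _case _time diagnosis_date sex byr, sepby(_set)

If the intention is instead to match on attained age, adding origin(time date_of_birth) to the stset command would switch the timescale to age (with date_of_birth a hypothetical Stata date variable holding the date of birth).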

Many thanks,

Ye Jin Bang.

Creating a gamma distribution function to estimate likelihoods

I have created a dataset from the following:

set obs 10
gen goals=_n-1
gen alpha=1.4
gen beta=1

I want to use a gamma distribution function to estimate the likelihood of each observation in goals, given the alpha and beta parameters defined. I believe I need to do this in Mata, but I have never used that programming language and am at a loss as to how to create the function in Mata and then access it to generate a new variable.
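If the aim is simply to evaluate the gamma density at each value of goals, Mata may not be needed at all: Stata's gammaden(a,b,g,x) function returns the gamma density with shape a, scale b, and location g evaluated at x. A minimal sketch under that reading, treating beta as the scale parameter (an assumption about the intended parameterisation):

Code:
clear
set obs 10
gen goals = _n - 1
gen alpha = 1.4
gen beta  = 1

* gamma density with shape alpha, scale beta, location 0, evaluated at goals
gen likelihood = gammaden(alpha, beta, 0, goals)
list goals likelihood

If beta is meant to be a rate rather than a scale, the second argument would be 1/beta instead.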

Giulio DiDiodato

Ordered Probit: discontinuous region with missing values encountered when estimating Marginal effect

Dear Statalist

I am using an ordered probit model. While calculating the marginal effects, Stata shows the error "discontinuous region with missing values encountered". My dependent variable has 10 categories. I have used the following command:
Code:
xtoprobit A170 LnNSDPpcC X003 X011 marriage employ gender dharm1 dharm2 dharm3  health2 health3 edu2 edu3 edu4 class1 class2 So2 No2 BOD formalscindex2 formalscindex3 informalSc2 informalSc3 , vce(robust)
Code:
 margins, dydx(*)
Kindly help me.
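Without knowing which of these regressors are categorical, one commonly suggested first step (offered as an assumption, not a guaranteed fix for this particular error) is to declare the categorical regressors with factor-variable notation, so that margins computes discrete changes for them rather than numerical derivatives. A sketch, with the i. prefixes applied only for illustration:

Code:
* apply i. only to regressors that are genuinely categorical in your data
xtoprobit A170 LnNSDPpcC X003 X011 i.marriage i.employ i.gender ///
    i.dharm1 i.dharm2 i.dharm3 i.health2 i.health3 i.edu2 i.edu3 i.edu4 ///
    i.class1 i.class2 So2 No2 BOD i.formalscindex2 i.formalscindex3 ///
    i.informalSc2 i.informalSc3, vce(robust)
margins, dydx(*)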

Ordered logistic regression with panel data and an interaction term between two binary independent variables

Hi Users,

I have a panel data set covering the years 2011-18 but mainly focusing on 2015-16. I am attempting to use the xtologit command in Stata 17.0.

I have an ordinal dependent variable which is the number of visits to the GP (general practitioner doctor surgery in the UK) in the last 12 months, categorised as follows: 1 (very poor health, 10+ visits); 2 (poor health, 4-6 visits); 3 (average health, 2-3 visits); 4 (good health, 1-2 visits); 5 (perfect health, 0 visits). This variable is a proxy for health - those in a low category (1 or 2) are visiting the GP more often and are believed to be in worse health.

In my dataset I have a group of individuals who are 'treated' (by receiving an exogenous increase in income) and a group who are 'control' (who are the untreated / counterfactual). I have control variables for female (1= female; 0=male); furthereduc (1= achieved higher education; 0= did not) and age (continuous). I am using a two period difference-in-differences approach to determine if the increase in income has a causal effect on decreasing the number of GP visits - which is why I am most interested in the interpretation of the estimate on the interaction term.

Below is my code; I am struggling to understand how to get the marginal effect of the interaction between post2016 and treated2016_basecase - as I understand it, there is little value in reporting the odds ratios or standalone coefficients.

My intention is to be able to make the following comments from the results: "The ATET (being treated and in the post period) means an individual is X% less likely to be in category 1 of visit_gp (i.e. very poor health), Y% less likely to be in category 2 (i.e. poor health), and Z% more likely to be in category 5 (i.e. perfect health)."

Code:
clear 
use Dataset_Regressions.dta
xtset pidp year
keep if year==2015 | year==2016 
*Remove observations which have missing values
drop if treated2016_basecase==. | visit_gp==. | age==. | agesq==. | furthereduc==. | female==. | married==. | homeowner==. | child==. | extraminch==. | permanent==. | commute==. | smoker==. | numcigs==.
*Any individual that only appears once must be dropped because otherwise regression may capture effects of leaving the labour market
bysort pidp: drop if _N==1
*Check to ensure that the size of treatment and control groups is the same in both the pre and post intervention periods
tab year treated2016_basecase

*Ordered logistic regression for GP visits
eststo drop *
*Regression
eststo: xtologit visit_gp i.post2016##i.treated2016_basecase c.age i.furthereduc i.female
*Marginal effects
eststo: margins, dydx(treated2016_basecase) at(treated2016_basecase=1) at(post2016=1)


The output from the above regression is posted below. I believe that after the regression the coefficients can be interpreted as "being treated in the post-intervention period, there is a higher chance of being in one of the lower categories".

With the subsequent margins analysis I am trying to isolate the marginal effect of treatment when treated2016_basecase = 1 and post2016 = 1, but this output does not seem correct, as I would expect it to be:

1 1 1
2 1 1
3 1 1
4 1 1
5 1 1

Please can someone help clarify the approach, in particular how to correctly specify the margins command for my purpose.
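For what it is worth, one common way to recover the DiD-style quantity from a nonlinear model like this (a sketch of one defensible choice, not the only one) is to ask margins for the effect of treatment on each outcome probability separately in the pre and post periods and then contrast the two, rather than fixing treated2016_basecase and post2016 in separate at() statements:

Code:
xtologit visit_gp i.post2016##i.treated2016_basecase c.age i.furthereduc i.female

* discrete effect of treatment on Pr(each visit_gp category), by period
margins, dydx(treated2016_basecase) at(post2016=(0 1))

* post-minus-pre difference in that effect for each of the five categories,
* i.e. the contrast that mirrors the difference-in-differences logic
margins, dydx(treated2016_basecase) at(post2016=(0 1)) contrast(atcontrast(r._at))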

Thank you!

Code:
Fitting comparison model:

Iteration 0:   log likelihood = -2355.6421  
Iteration 1:   log likelihood = -2337.7227  
Iteration 2:   log likelihood = -2337.6938  
Iteration 3:   log likelihood = -2337.6938  

Refining starting values:

Grid node 0:   log likelihood = -2286.9491

Fitting full model:

Iteration 0:   log likelihood = -2286.9491  
Iteration 1:   log likelihood = -2209.1284  
Iteration 2:   log likelihood = -2204.8774  
Iteration 3:   log likelihood = -2204.8211  
Iteration 4:   log likelihood = -2204.8211  

Random-effects ordered logistic regression           Number of obs    =  1,690
Group variable: pidp                                 Number of groups =    845

Random effects u_i ~ Gaussian                        Obs per group:
                                                                  min =      2
                                                                  avg =    2.0
                                                                  max =      2

Integration method: mvaghermite                      Integration pts. =     12

                                                     Wald chi2(6)     =  28.67
Log likelihood = -2204.8211                          Prob > chi2      = 0.0001

-----------------------------------------------------------------------------------------------
                     visit_gp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
------------------------------+----------------------------------------------------------------
                              |
                     post2016 |
           Post Intervention  |  -.1433743   .1195572    -1.20   0.230    -.3777021    .0909535
                              |
         treated2016_basecase |
                     Treated  |   .3348412   .2164677     1.55   0.122    -.0894277    .7591101
                              |
post2016#treated2016_basecase |
   Post Intervention#Treated  |  -.3885202   .2085959    -1.86   0.063    -.7973605    .0203202
                              |
                          age |   -.007706   .0080193    -0.96   0.337    -.0234236    .0080116
                              |
                  furthereduc |
  Achieved Further Education  |    .244885   .1809999     1.35   0.176    -.1098683    .5996383
                              |
                       female |
                      female  |  -.6936186   .1887752    -3.67   0.000    -1.063611   -.3236261
------------------------------+----------------------------------------------------------------
                        /cut1 |  -5.400528   .4615378                     -6.305126   -4.495931
                        /cut2 |   -3.87221   .4405636                     -4.735699   -3.008721
                        /cut3 |  -1.822361     .42432                     -2.654013   -.9907095
                        /cut4 |   1.126816   .4201844                      .3032702    1.950363
------------------------------+----------------------------------------------------------------
                    /sigma2_u |    4.43263   .5114511                      3.535469    5.557453
-----------------------------------------------------------------------------------------------
LR test vs. ologit model: chibar2(01) = 265.75        Prob >= chibar2 = 0.0000
(est1 stored)

. *Marginal effects
. eststo: margins, dydx(treated2016_basecase) at(treated2016_basecase=1) at(post2016=1)

Average marginal effects                                 Number of obs = 1,690
Model VCE: OIM

dy/dx wrt: 1.treated2016_basecase

1._predict: Pr(1.visit_gp), predict(pr outcome(1))
2._predict: Pr(2.visit_gp), predict(pr outcome(2))
3._predict: Pr(3.visit_gp), predict(pr outcome(3))
4._predict: Pr(4.visit_gp), predict(pr outcome(4))
5._predict: Pr(5.visit_gp), predict(pr outcome(5))

1._at: treated2016_basecase = 1
2._at: post2016             = 1

-----------------------------------------------------------------------------------------
                        |            Delta-method
                        |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
------------------------+----------------------------------------------------------------
0.treated2016_basecase  |  (base outcome)
------------------------+----------------------------------------------------------------
1.treated2016_basecase  |
           _predict#_at |
                   1 1  |  -.0040163   .0062129    -0.65   0.518    -.0161933    .0081607
                   1 2  |   .0019696   .0078703     0.25   0.802    -.0134559    .0173951
                   2 1  |  -.0052691   .0074747    -0.70   0.481    -.0199193    .0093811
                   2 2  |    .002248   .0089558     0.25   0.802    -.0153049     .019801
                   3 1  |  -.0087369   .0112432    -0.78   0.437    -.0307732    .0132995
                   3 2  |   .0031155   .0123593     0.25   0.801    -.0211083    .0273393
                   4 1  |   .0007934    .003308     0.24   0.810    -.0056903     .007277
                   4 2  |  -.0013864   .0056086    -0.25   0.805     -.012379    .0096062
                   5 1  |   .0172289   .0218619     0.79   0.431    -.0256197    .0600775
                   5 2  |  -.0059467   .0235787    -0.25   0.801      -.05216    .0402666
-----------------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
(est2 stored)

A snippet of my data (via dataex) is:

Code:
clear
input long pidp int year float(post2016 treated2016_basecase female furthereduc age)
68017687 2015 0 1 1 0 36
68017687 2016 1 1 1 1 37
68116287 2015 0 0 1 0 33
68116287 2016 1 0 1 0 34
68150967 2015 0 0 0 0 54
68150967 2016 1 0 0 0 55
68150975 2015 0 0 0 0 27
68150975 2016 1 0 0 0 29
68155047 2015 0 0 1 0 59
68155047 2016 1 0 1 0 60
68166607 2015 0 1 1 1 60
68166607 2016 1 1 1 1 61
68197895 2015 0 1 0 0 24
68197895 2016 1 1 0 0 25
68244807 2015 0 0 1 0 38
68244807 2016 1 0 1 0 38
68273371 2015 0 1 1 0 49
68273371 2016 1 1 1 0 50
68293099 2015 0 0 0 1 25
68293099 2016 1 0 0 1 26
68317567 2015 0 1 0 0 36
68317567 2016 1 1 0 0 37
68347487 2015 0 1 0 0 58
68347487 2016 1 1 0 0 59
68351571 2015 0 0 0 1 41
68351571 2016 1 0 0 1 42
68356331 2015 0 0 0 0 32
68356331 2016 1 0 0 0 33
68439289 2015 0 0 1 0 53
68439289 2016 1 0 1 0 54
68469887 2015 0 0 1 1 48
68469887 2016 1 0 1 1 49
68474647 2015 0 1 1 0 41
68474647 2016 1 1 1 0 42
68569173 2015 0 0 1 1 24
68569173 2016 1 0 1 1 25
68573253 2015 0 0 1 1 25
68573253 2016 1 0 1 1 26
68622207 2015 0 0 1 0 57
68622207 2016 1 0 1 0 58
68636487 2015 0 0 1 0 43
68636487 2016 1 0 1 0 44
68757527 2015 0 0 0 1 53
68757527 2016 1 0 0 1 55
69007091 2015 0 0 0 1 44
69007091 2016 1 0 0 1 45
69078487 2015 0 0 1 1 36
69078487 2016 1 0 1 1 37
69224007 2015 0 0 0 1 37
69224007 2016 1 0 0 1 39
69264127 2015 0 0 1 0 39
69264127 2016 1 0 1 0 40
69269571 2015 0 0 0 0 55
69269571 2016 1 0 0 0 56
69280447 2015 0 1 0 1 62
69280447 2016 1 1 0 1 63
69625205 2015 0 0 1 1 45
69625205 2016 1 0 1 1 46
69831249 2015 0 1 1 0 47
69831249 2016 1 1 1 0 48
70069925 2015 0 0 1 0 37
70069925 2016 1 0 1 0 38
70625485 2015 0 0 1 0 45
70625485 2016 1 0 1 0 46
70786645 2015 0 1 1 0 54
70786645 2016 1 1 1 0 55
71148405 2015 0 1 0 0 50
71148405 2016 1 1 0 0 51
71148409 2015 0 0 1 0 48
71148409 2016 1 0 1 0 49
71434005 2015 0 1 1 1 39
71434005 2016 1 1 1 1 40
71461209 2015 0 1 1 1 26
71461209 2016 1 1 1 1 27
77031164 2015 0 1 0 0 29
77031164 2016 1 1 0 0 30
81959729 2015 0 0 0 0 48
81959729 2016 1 0 0 0 49
82171885 2015 0 1 1 0 48
82171885 2016 1 1 1 0 49
88785569 2015 0 0 0 1 57
88785569 2016 1 0 0 1 58
88916809 2015 0 1 1 1 38
88916809 2016 1 1 1 1 39
89099729 2015 0 1 1 0 33
89099729 2016 1 1 1 0 34
89300329 2015 0 0 1 1 51
89300329 2016 1 0 1 1 52
89358125 2015 0 0 0 0 52
89358125 2016 1 0 0 0 53
95278889 2015 0 0 0 1 39
95278889 2016 1 0 0 1 40
95416925 2015 0 0 1 0 41
95416925 2016 1 0 1 0 42
95575369 2015 0 0 0 0 43
95575369 2016 1 0 0 0 44
96029605 2015 0 0 0 1 46
96029605 2016 1 0 0 1 47
96029609 2015 0 0 1 0 47
96029609 2016 1 0 1 0 48
end
label values year b_pregno
label values post2016 post2016
label def post2016 0 "Pre Intervention", modify
label def post2016 1 "Post Intervention", modify
label values treated2016_basecase treated2016_basecase
label def treated2016_basecase 0 "Control", modify
label def treated2016_basecase 1 "Treated", modify
label values female female
label def female 0 "male", modify
label def female 1 "female", modify
label values furthereduc furthereduc
label def furthereduc 0 "No Further Education", modify
label def furthereduc 1 "Achieved Further Education", modify