Hello,
I am stuck on plotting the marginal effect of the factor variable i.Left_right_placement from the following regression:
xi: reg index_EU_support budgetbalance15 membership_length Political_discussions_index i.Gender Age Education i.Left_right_placement i.Community c.budgetbalance15#i.Left_right_placement i.country, vce(robust)
I tried this using the grinter command
grinter budgetbalance15, inter(c.budgetbalance15#i.Left_right_placement) const02(i.Left_right_placement)
but I am having trouble as i.Left_right_placement is coded
1: being politically orientated toward the left-wing,
2: being orientated politically toward the center and
3: being orientated politically toward the right-wing.
The prefix i. is causing trouble. (For my interactions of dummies with continuous variables, the grinter command worked perfectly!)
Does anyone have a hint for me on how to do this? I am grateful for any advice.
Thank you
Clara
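One alternative worth trying (a sketch only, reusing the variable names from the post): with Stata's factor-variable notation the xi: prefix is not needed, and the marginal effect of budgetbalance15 at each level of Left_right_placement can be obtained with -margins- and -marginsplot- instead of -grinter-.
Code:
reg index_EU_support c.budgetbalance15##i.Left_right_placement membership_length ///
    Political_discussions_index i.Gender Age Education i.Community i.country, vce(robust)
margins Left_right_placement, dydx(budgetbalance15)
marginsplot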
Wednesday, July 31, 2019
Hansen issue
Why am I not getting the Hansen statistic?
The command is as follows:
xtabond2 fg_ta l.fg_ta size age ia debt_ratio TobinQ fac roa cashratio i.sector_1, gmm(fg_ta, collapse) iv(l.(size age ia debt_ratio TobinQ fac roa cashratio)) robust small
Dynamic panel-data estimation, one-step system GMM
------------------------------------------------------------------------------
Group variable: id_new Number of obs = 5347
Time variable : year Number of groups = 414
Number of instruments = 38 Obs per group: min = 0
F(37, 413) = 0.07 avg = 12.92
Prob > F = 1.000 max = 19
------------------------------------------------------------------------------
| Robust
fg_ta | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fg_ta |
L1. | .000316 .0003189 0.99 0.322 -.0003109 .0009428
|
size | -6.194766 4.592842 -1.35 0.178 -15.22303 2.833497
age | .8060387 .5726115 1.41 0.160 -.3195579 1.931635
ia | -6.45e-07 1.79e-06 -0.36 0.719 -4.17e-06 2.88e-06
debt_ratio | 9.890391 9.390873 1.05 0.293 -8.569479 28.35026
TobinQ | .9944456 .9018276 1.10 0.271 -.778299 2.76719
fac | -1.61e-07 1.59e-07 -1.01 0.312 -4.72e-07 1.51e-07
roa | .0270614 .0651213 0.42 0.678 -.1009491 .1550719
cashratio | 47.05278 35.69258 1.32 0.188 -23.109 117.2146
|
sector_1 |
1 | 0 (empty)
2 | 0 (omitted)
3 | 0 (omitted)
4 | 58.01809 362.6928 0.16 0.873 -654.936 770.9722
5 | 0 (omitted)
6 | 0 (omitted)
8 | 0 (omitted)
9 | 0 (omitted)
10 | 0 (omitted)
11 | 0 (omitted)
12 | 0 (omitted)
13 | -210.308 290.1488 -0.72 0.469 -780.6606 360.0446
14 | 178.4741 201.4681 0.89 0.376 -217.5567 574.505
15 | 0 (omitted)
16 | 0 (omitted)
17 | 29.43242 269.6478 0.11 0.913 -500.6209 559.4858
18 | 335.7025 560.6141 0.60 0.550 -766.3105 1437.715
19 | 0 (omitted)
20 | .0138768 350.6006 0.00 1.000 -689.1703 689.198
21 | 0 (omitted)
22 | -53.83718 614.8621 -0.09 0.930 -1262.487 1154.812
23 | 64.63455 93.24389 0.69 0.489 -118.6573 247.9264
24 | 0 (omitted)
25 | -110.1165 202.9899 -0.54 0.588 -509.1387 288.9057
26 | 0 (omitted)
27 | 0 (omitted)
28 | 0 (omitted)
29 | 0 (omitted)
|
_cons | 45.31213 82.45595 0.55 0.583 -116.7735 207.3978
------------------------------------------------------------------------------
Instruments for first differences equation
Standard
D.(L.size L.age L.ia L.debt_ratio L.TobinQ L.fac L.roa L.cashratio)
GMM-type (missing=0, separate instruments for each period unless collapsed)
L(1/29).fg_ta collapsed
Instruments for levels equation
Standard
L.size L.age L.ia L.debt_ratio L.TobinQ L.fac L.roa L.cashratio
_cons
GMM-type (missing=0, separate instruments for each period unless collapsed)
D.fg_ta collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -1.25 Pr > z = 0.211
Arellano-Bond test for AR(2) in first differences: z = -0.81 Pr > z = 0.418
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(0) = 20.88 Prob > chi2 = .
(Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(0) = 2.98 Prob > chi2 = .
(Robust, but weakened by many instruments.)
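The chi2(0) in both overidentification tests is the clue: with a single collapsed GMM-style instrument set for fg_ta plus the listed IV-style instruments, the instrument count (38) evidently equals the number of estimated parameters, so there are zero overidentifying restrictions and no Hansen or Sargan p-value can be computed. Below is a sketch of one way to obtain an overidentified (and hence testable) specification; whether size really should be treated as predetermined is a substantive modelling choice, not a recommendation.
Code:
* illustration only: treating one regressor (here size) as predetermined adds a second
* collapsed GMM-style instrument set, so instruments exceed parameters and the Hansen
* test has positive degrees of freedom
xtabond2 fg_ta l.fg_ta size age ia debt_ratio TobinQ fac roa cashratio i.sector_1, ///
    gmm(fg_ta, collapse) gmm(size, collapse) ///
    iv(l.(age ia debt_ratio TobinQ fac roa cashratio)) robust small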
Capture change in one variable while different variable is constant
Dear Statalisters,
I am working with a longitudinal dataset. I'm trying to create a variable to capture change between two periods in one variable, but only when a different variable remains constant.
Essentially, my goal is to create a single row for each "wave" by creating a new variable that will represent a specific change in the said variable, and then delete the left-over row for that year.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long id_num byte(wave jobcens)
101  1 3
101  2 1
101  3 1
101  5 2
101  5 3
101  6 2
101  8 3
101  9 1
101 10 2
101 10 3
101 11 1
end
From the above code, the cases below represent the type of cases I mean to capture and what I'd like them to be (6, in this case):
101 5 2
101 5 3 --> 101 5 6
101 10 2
101 10 3 --> 101 10 6
I've used the following code in the past to create a change variable:
gen change_var = .
replace change_var = 6 if x1[_n-1] == 1 & x1 == 2 & id_num[_n-1] == id_num
However, the added stipulation I face is that this should only happen when the "wave" is the same. I'm wondering whether I will need to reshape the dataset wide to do this? And if so, any suggestions on how I might still achieve the same goal?
Thank you in advance!
Much appreciation,
Wyatt
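A sketch that follows the example above (two rows with values 2 and 3 within the same id and wave collapse into a single row coded 6), without reshaping to wide:
Code:
* variable names are taken from the dataex excerpt above
bysort id_num wave (jobcens): gen newjob = jobcens
by id_num wave: replace newjob = 6 if _N == 2 & jobcens[1] == 2 & jobcens[2] == 3
by id_num wave: keep if _n == _N    // keep a single row per id-wave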
calculating distance in Stata
Dear All,
I would like to know how I can calculate the distance in Stata between two cities in a country, say Manchester and Nottingham in the UK. I have never done this before, so please advise on how I can go about it. I have information on postcodes and names of local municipalities for the UK in my dataset. I would like to generate a matrix that gives the distance between one city and each of the other cities within the dataset.
Grateful for your help.
Best,
Bridget
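A possible route (a sketch only, assuming you can first attach latitude and longitude to each postcode or municipality from an external lookup such as an ONS postcode file; the variable names below are hypothetical):
Code:
* geodist is a user-written command:  ssc install geodist
* start from one observation per city with variables city, lat, lon
rename (city lat lon) (city1 lat1 lon1)
tempfile cities
save `cities'
rename (city1 lat1 lon1) (city2 lat2 lon2)
cross using `cities'                              // all pairwise combinations of cities
geodist lat1 lon1 lat2 lon2, generate(dist_km)    // great-circle distance in km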
tabm and percentages
Hello,
This question has come up after my last couple of posts on Statalist and is now one I have more generally, because I feel it may be useful in the future as well as for my current dilemma.
The general question is: when using -tabm-, is there a way to force Stata to include percentages in the output table?
For the sake of an example, I have this code to generate results for question responses that can be either "1" or "2":
The code and the table it produces are shown below. Is there a way, using -tabm-, to get that table to include the percentage of "1" responses out of the total? For q10a, for example, the new percentage column would show 65.16%. I do not see any guidance on percentages in the -help tabm- file. Thank you.
Code:
preserve
keep q10a q10b q10c q10d q10e q10f
findname, all(inlist(@, 1, 2, .))
tabm `r(varlist)'
restore
| variable | 1 | 2 | Total |
| q10a | 101 | 54 | 155 |
| q10b | 111 | 20 | 131 |
| q10c | 90 | 30 | 120 |
| q10d | 150 | 6 | 156 |
| q10e | 125 | 26 | 151 |
| q10f | 105 | 15 | 120 |
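If -tabm- itself offers no percentage option, one workaround (a sketch, using the variable names from the post) is to compute the share of "1" responses directly:
Code:
foreach v of varlist q10a q10b q10c q10d q10e q10f {
    quietly count if `v' == 1
    local n1 = r(N)
    quietly count if inlist(`v', 1, 2)
    display "`v': " %5.2f 100*`n1'/r(N) " percent answered 1"
}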
Panel Data - Which model to use? Fama macbeth or panel data another model
Hello,
I am currently writing my thesis in Finance.
I want to compare the difference in turnover rate between male and female investors, and also between home-based and foreign-based investors.
Basically my model is as follows:
Turnover rate = Alpha + male/female + Home/Foreign
Turnover rate is a continuous variable, and male/female and Home/Foreign are dummy variables.
I have 300 investors in my sample over a 24-month period (it is an unbalanced panel, however, because I do not have the turnover for all of the months).
My question is:
Can I use a Fama-MacBeth regression to test for these differences, or is there a more suitable regression?
so far I have used
xtfmb Turn Residence_Dumm1 Gender_Dumm1.
Thank you.
Yud
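A simple point of comparison (a sketch; investor_id is a hypothetical panel identifier): because the regressors of interest are time-invariant dummies, a pooled OLS with standard errors clustered by investor can be run alongside the Fama-MacBeth estimates from -xtfmb-.
Code:
regress Turn i.Residence_Dumm1 i.Gender_Dumm1, vce(cluster investor_id)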
If qualifier for n variables equal to zero
Dear Stata community, dear Nick
I am looking for a command which allows me to create an if-statement which refers to about 40 variables.
I have got yearly bilateral FDI data for 40 different classes of foreign direct investment.
I want to delete an observation if a country reports only zeros on all FDI classes.
Thus, something similar to:
Code:
drop if var1-var40=0
I already tried that line, but it dropped fewer observations than using:
Code:
drop if var1 == 0 & var2 == 0 ... & var40 == 0
You can see my data below, where the ID* abbreviations represent different classes of FDI (var1-var40):
input str36 name str38 cpname int timeperiod double(IDA_G_DV IDA_G IDL_G IDL_G_DV)
"AfghanistanIslamic Republic of" "Germany" 2016 0 0 0 0
"AfghanistanIslamic Republic of" "Germany" 2017 0 0 0 0
"AfghanistanIslamic Republic of" "Italy" 2009 0 0 0 0
"AfghanistanIslamic Republic of" "Italy" 2010 0 0 0 421572.76040272
"AfghanistanIslamic Republic of" "Italy" 2011 0 0 0 1801631.46384125
"AfghanistanIslamic Republic of" "Italy" 2012 0 0 0 0
"AfghanistanIslamic Republic of" "Italy" 2013 0 0 0 0
Thank you very much for your time
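One way to do this without writing out all 40 variable names (a sketch, assuming the classes really are named var1-var40; note that missing values do not count as zeros, so rows with missings are kept):
Code:
egen byte n_zero = anycount(var1-var40), values(0)
drop if n_zero == 40
drop n_zero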
How to plot of linear time trend normalized to zero and one at the beginning and end of the sample
Dear fellow stata users,
After conducting a cross-sectional regression on the panel data for each year, I collected the coefficients of each year's regression.
Now I want to plot a linear time trend that is normalized to zero and one at the beginning and end of the sample, so that the intercept measures the level in the beginning period and the slope measures the cumulative change over the full sample period.
But I don't know how to normalize the time variable in the trend model.
I would appreciate it if someone could give me some hints.
Thank you in advance!
I want to generate the time trend for b5, b3, sFPE3 and sFPE5 separately. Here is part of the data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double year float(b5 b3 b1 sFPE1 sFPE3 sFPE5)
1964 .01210736 .007341923 .0035780396 .003844916 .009996476 .01090701
1965 .015396264 .008618028 -.003051655 .003801919 .009884687 .01078504
1966 .006149841 .014598425 .006788644 .0040203054 .010452474 .011404544
1967 .011944 .009383247 .0106511 .0041352836 .010751409 .011730708
1968 .014137417 .004013681 .002383717 .0041071908 .01067837 .011651016
1969 .02031038 .012030845 .00366233 .003894486 .010125353 .011047628
1970 .013024523 .0208625 .006350213 .004410404 .0114667 .01251115
1971 .015139028 .011371393 .010445192 .0044373223 .011536685 .01258751
1972 .01243061 .01062778 .007894311 .004719735 .012270935 .01338864
1973 .017844478 .016503459 -.008085154 .004959575 .0128945 .014069005
end
(Sorry for my bad English grammar; I'm not a native speaker.)
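A sketch of the normalization itself: rescale the year variable so that it equals 0 in the first sample year and 1 in the last one, then regress each coefficient series on it.
Code:
summarize year
generate double trend = (year - r(min)) / (r(max) - r(min))   // 0 at start, 1 at end
* the intercept is then the level at the start of the sample and the slope the
* cumulative change over the full sample period
regress b5 trend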
question on matching
Hello,
I have a "CEO" variable for firms on panel data and the CEO names are not inputted the same due to spelling errors. For example, it can be "Jaeyoung Song" for one year and "Jaeyong Song" for another year. I hope to make the information to be consistent despite the typos. Also, other times, it is spelled as "Song Jaeyoung" instead of "Jaeyoung Song" due to first-last name ordering differences in other countries than the US. Is there any way that I can match them and give the same values to these names?
Ultimately, I hope to give the same values for the same CEO of the same firm ID. Let's say
Firm 1 has years from 1992-2000 and its CEO has changed as follows:
1992 Jaeyoung Song
1993 Jaeyoung Song
1994 Jaeyong Song
1995 Song Jaeyoung
1996 Taeho Kim
1997 Taeho Kim
1998 Sunhwa Han
1999 Sunha Han
2000 Sunhwa Han
So the CEO changes twice over the years, and I would like to give the same CEO the same ID within the firm. Could you help me with how to match slightly different spellings and also assign different values to different people within the same firm ID?
Thank you for your help in advance!
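A rough sketch (assuming variables firm_id, year and ceo_name): this handles the first-name/last-name order problem for two-word names by building an order-insensitive key, but not the spelling typos; those could be reviewed by hand within each firm or with a fuzzy-matching tool such as -matchit- from SSC.
Code:
generate name_clean = lower(itrim(trim(ceo_name)))
split name_clean, generate(tok)
generate name_key = cond(tok2 == "", tok1, ///
    cond(tok1 <= tok2, tok1 + " " + tok2, tok2 + " " + tok1))
egen long ceo_id = group(firm_id name_key)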
I have a "CEO" variable for firms on panel data and the CEO names are not inputted the same due to spelling errors. For example, it can be "Jaeyoung Song" for one year and "Jaeyong Song" for another year. I hope to make the information to be consistent despite the typos. Also, other times, it is spelled as "Song Jaeyoung" instead of "Jaeyoung Song" due to first-last name ordering differences in other countries than the US. Is there any way that I can match them and give the same values to these names?
Ultimately, I hope to give the same values for the same CEO of the same firm ID. Let's say
Firm 1 has years from 1992-2000 and its CEO has changed as follows:
1992 Jaeyoung Song
1993 Jaeyoung Song
1994 Jaeyong Song
1995 Song Jaeyoung
1996 Taeho Kim
1997 Taeho Kim
1998 Sunhwa Han
1999 Sunha Han
2000 Sunhwa Han
Then CEO changes twice over the years and I hope to give the same CEO the same ID within the firm. Could you help me on how to match slightly different information and also give different values for different people within the same ID?
Thank you for your help in advance!
Marginal effects after bivariate probit and parameter homogeneity test.
Hello people,
I am replicating an empirical economic study in which the authors conduct a bivariate probit regression with marginal effects at the sample means. They report two columns with marginal effects (one for each of the 2 dependent variables). The results show opposite effects for the two DV, which confirms their theory. In addition, they run a parameter homogeneity test for each independent variable to check if the differences between the effects are significant.
If I run a biprobit in stata, I also get 2 tables: One for each of both DV. However if I run mfx afterward, I get only ONE table. I also tried all 4 combinations, i.e. p(00), p(10), p(01), p(11) but all effects are completely different compared to the study.
Does anyone have an idea what the authors did in econometric terms? How do I get the two coefficient tables as two mfx tables? And what could be the command for a parameter homogeneity test?
Maybe it helps to have a bit of the theoretical background:
They use a biprobit because of the following reason: The data comes from a questionnaire in which every observation is a firm (i.e. the data is at firm level). Each firm has a dummy variable for "patent application" and for "trade secrecy". Both can be 1 or 0, independently, so there are 4 possible combinations. The theoretical model that is tested, however, considers one innovation that can either be patented or kept secret (i.e. the model is at innovation level/product level). At first they run a probit regression where "patent application" is the DV. However, as in the sample there are firms that have a value of 1 for patenting AND secrecy, they conduct a biprobit. The results show opposite effects of the independent variables on patenting and secrecy, which is a likable result that does not contradict the results of the first probit.
I know this is a tough and long question but I don't know who to ask anymore.
Thank you very much and best regards!
I am replicating an empirical economic study in which the authors conduct a bivariate probit regression with marginal effects at the sample means. They report two columns with marginal effects (one for each of the 2 dependent variables). The results show opposite effects for the two DV, which confirms their theory. In addition, they run a parameter homogeneity test for each independent variable to check if the differences between the effects are significant.
If I run a biprobit in stata, I also get 2 tables: One for each of both DV. However if I run mfx afterward, I get only ONE table. I also tried all 4 combinations, i.e. p(00), p(10), p(01), p(11) but all effects are completely different compared to the study.
Does anyone have an idea what the authors did in econometric terms? How do I get the two coefficient tables as two mfx tables? And what could be the command for a parameter homogeneity test?
Maybe it helps to have a bit of the theoretical background:
They use a biprobit because of the following reason: The data comes from a questionnaire in which every observation is a firm (i.e. the data is at firm level). Each firm has a dummy variable for "patent application" and for "trade secrecy". Both can be 1 or 0, independently, so there are 4 possible combinations. The theoretical model that is tested, however, considers one innovation that can either be patented or kept secret (i.e. the model is at innovation level/product level). At first they run a probit regression where "patent application" is the DV. However, as in the sample there are firms that have a value of 1 for patenting AND secrecy, they conduct a biprobit. The results show opposite effects of the independent variables on patenting and secrecy, which is a likable result that does not contradict the results of the first probit.
I know this is a tough and long question but I don't know who to ask anymore.
Thank you very much and best regards!
giving the same values to the same ID's variable
Hello,
I'm manipulating panel data and have a question regarding some variables. I have firms as the unit of analysis, and each of them has an ID. While the information in their "entry year" variable should be the same within an ID, some errors exist - e.g. the entry year is 1992 in some rows and 1993 in others for the same firm - which I believe are typos. I would like to align them and make the information consistent within each ID. What command do I need to use? Or, at the least, can I tag them so I can check?
Thank you for your help in advance!
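A sketch (assuming variables firm_id and entry_year): first tag the firms whose entry year is not constant so they can be inspected, then, as one possible fix, take the most frequent value within each firm.
Code:
bysort firm_id (entry_year): generate byte inconsistent = entry_year[1] != entry_year[_N]
list firm_id entry_year if inconsistent, sepby(firm_id) noobs
* one possible fix: the most frequent entry year within the firm (ties go to the later year)
bysort firm_id entry_year: generate freq = _N
bysort firm_id (freq entry_year): generate entry_year_fix = entry_year[_N]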
Error message "VCE is not positive definite" while performing the multiple imputation
Dear Stata users,
I am running the multiple-imputation code shown below, but it fails with the error message that follows. Please help me sort out this issue.
HTML Code:
mi impute chained (logit, augment)a4_0 a5_1_0 g4_a a4_12 a5_1_12 a4_24 a5_1_24 a4_36 ///
a5_1_36 a4_48 a5_1_48 a4_60 a5_1_60 (ologit, augment)a1_0 a1_12 a1_24 a1_36 a1_48 ///
a1_60 bmi_c_0 bmi_c_12 bmi_c_24 bmi_c_36 bmi_c_48 bmi_c_60 (mlogit, augment)g3 g8 (ologit, ascontinuous)a2 ///
= age sex cluster, add(50) rseed (53421) savetrace(trace1,replace)
HTML Code:
Performing chained iterations ...
mi impute: VCE is not positive definite
The posterior distribution from which mi impute drew the imputations for a5_1_60
is not proper when the VCE estimated from the observed data is not positive
definite. This may happen, for example, when the number of parameters exceeds the
number of observations. Choose an alternate imputation model.
error occurred during imputation of a4_0 a5_1_0 g4_a a4_12 a5_1_12 a4_24 a5_1_24 a4_36
a5_1_36 a4_48 a5_1_48 a4_60 a5_1_60 a1_0 a1_12 a1_24 a1_36 a1_48 a1_60 bmi_c_0
bmi_c_12 bmi_c_24 bmi_c_36 bmi_c_48 bmi_c_60 g3 g8 a2 on m = 1
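A diagnostic sketch, not a fix (variable names are taken from the post): the message points at a5_1_60, so a first step is to see how much observed data that variable has and whether its univariate imputation model can be fit at all on the complete cases.
Code:
misstable summarize a5_1_60
logit a5_1_60 age sex cluster if !missing(a5_1_60)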
Stata word in Latex
Dear all,
I was wondering whether there is a LaTeX package for adding a nice-looking Stata wordmark to a document. In my paper I mention Stata several times, and I would like it to look nice, similar to what typing \LaTeX does for the LaTeX brand.
Thank you so much.
Pablo.
Analysis of survey rounds: OLS or logit?
Hello community!
I am conducting research on UK public attitudes towards immigration.
For my data I have combined 8 UK surveys conducted from 2002 to 2016.
The feature of these data is that they are neither longitudinal nor a time series: each survey has a different number of respondents, and each survey's respondent sample is unique to that survey.
Thus, I have data on the UK from 2002 to 2016 with unique respondents for each year.
My dataset has both individual and national indicators assigned to each respondent, and the national indicators (like the GDP growth rate) are the same for all respondents in a given year.
My question is: is it scientifically acceptable to run an OLS or logistic regression on the combined dataset (using the large pooled sample of respondents across all years) without separating the regression results by year?
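An illustrative sketch (hypothetical variable names): one common approach to stacked repeated cross-sections is a pooled model with survey-year dummies. Individual covariates vary within years, while national indicators such as GDP growth vary only across the 8 survey years, so they cannot be included together with a full set of year dummies.
Code:
logit pro_immigration age i.education i.gender i.year, vce(robust)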
Two fixed effects regressions using two different subsamples of same dataset.
Hi all,
I have a question regarding comparing coefficients of two regressions (fe) which are subsamples of a larger dataset.
Context: I am looking at determinants of bank liquidity across European countries.
I have decided to split the countries into subsamples: Developed European Countries and Emerging European Countries (no overlap) to see whether my independent variables change significantly. Liquidity Coverage Ratio - Dependent Variable. Bank Size, Net interest Margin, Return on Av.Assets, Leverage and GdpGrowth - Independent Variables
The stata code I am using for panel data is:
xtset id year
Regression 1:
xtreg LCR Size nim roaa lev gdpg if developed==1, fe robust
Regression 2:
xtreg LCR Size nim roaa lev gdpg if emerging==1, fe robust
Is there a way of comparing the results of the two regressions against each other? E.g. the coefficient on NIM is positive and statistically significant in developed countries, but smaller, positive and insignificant in emerging countries; therefore the model suggests that NIM has a greater influence on LCR in developed countries than in emerging countries.
Thanks,
(Sorry if this has already been asked; I can't find any posts on it.)
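One common alternative (a sketch, reusing the variable names from the post and assuming every country is either developed==1 or emerging==1, so developed==0 identifies the emerging group): pool the two groups and interact the regressors with the group dummy, so the differences between the developed and emerging coefficients can be tested directly.
Code:
xtset id year
xtreg LCR c.Size c.nim c.roaa c.lev c.gdpg ///
    i.developed#(c.Size c.nim c.roaa c.lev c.gdpg), fe vce(robust)
* e.g. does the NIM coefficient differ between developed and emerging countries?
test 1.developed#c.nim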
ppml_panel_sg and PPML have different predict valuea
hello,
I am using ppml_panel_sg to regress trade on some gravity variables and an RTA dummy, and I find that ppml_panel_sg does not give me the predicted values I expect.
I then simplified the data and, using the same data, obtained predicted values from both ppml_panel_sg and ppml; the predictions differ, as shown below.
Code:
ppml trade PAIR_FE1-PAIR_FE$NTij_1 EXPORTER_TIME_FE* IMPORTER_TIME_FE1-IMPORTER_TIME_FE$NT_yr RTA, noconst
predict trade1,mu
ppml_panel_sg trade RTA,ex(exporter) im(importer) y(year)
predict trade2
sum trade trade1 trade2
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
trade | 512 18695.41 149875.9 0 2795426
trade1 | 512 18695.41 150052.1 .0022326 2802068
trade2 | 512 .0350129 .089555 0 .2636268
Apparently, trade2 does not contain the right predicted values.
I have 2 questions:
1) Is it right to obtain the predicted values as above?
2) I also retrieved the fixed effects from ppml_panel_sg via the genD option and found that they differ from the ppml results as well.
Can someone give me some hints or an explanation? Thank you!
"omitted because of collinearity" issue
Dear community,
I am conducting an analysis of survey data, specifically a logistic regression analysis.
I have individual level indicators which are pretty much unique for each respondent and national level indicators, which are the same for each respondent.
Thus my dataset looks like the table below (in a very simplified way).
The feature is that the national-level indicators take the same value for every respondent, and if I run any sort of regression all of the national-level indicators are reported as "omitted because of collinearity".
Could you please advise me on whether I am doing something wrong?
| id | income_sat | income_eval | gdp_growth | unemp_r | inflation |
| 1 | 0 | 1 | 0.025 | 0.08 | 0.01 |
| 2 | 1 | 1 | 0.025 | 0.08 | 0.01 |
| 3 | 0 | 0 | 0.025 | 0.08 | 0.01 |
| 4 | 0 | 1 | 0.025 | 0.08 | 0.01 |
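A quick way to see the problem (a sketch, using the column names from the table): a regressor that is constant across the whole estimation sample has zero standard deviation, carries no information beyond the constant term, and is therefore dropped as collinear.
Code:
summarize gdp_growth unemp_r inflation
* if Std. Dev. is 0 within the estimation sample, the variable cannot be identified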
using factor variables in command mtreatreg
Dear all,
I want to use the mtreatreg command and I have categorical variables. The command I use is:
mtreatreg Y X1 ib1.X2, mtreat(treatment = X1 ib1.X2 X3) sim(20) density(normal) basecat(1)
But i get this error message:
factor variables and time-series operators not allowed
r(101);
I need help to resolve this problem.
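A possible workaround (a sketch): since the error message says factor variables are not allowed, the indicator variables can be created by hand. This assumes X2 has three categories, with category 1 as the base (as in ib1.X2).
Code:
tabulate X2, generate(X2_)
mtreatreg Y X1 X2_2 X2_3, mtreat(treatment = X1 X2_2 X2_3 X3) sim(20) density(normal) basecat(1)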
Birth cohorts- household level pseudo panel
Hello everyone,
I'm using 3 rounds of cross-sectional NSSO data which is at the household level and I would like to construct a pseudo panel. I want to aggregate this household-level data into group-level averages on the basis of household-head's birth year.
tab birthyear
birthyear | Freq. Percent Cum.
------------+-----------------------------------
1897 | 1 0.00 0.00
1898 | 1 0.00 0.00
1900 | 1 0.00 0.00
1903 | 1 0.00 0.00
1904 | 2 0.00 0.00
1905 | 5 0.00 0.00
1906 | 6 0.00 0.01
1907 | 6 0.00 0.01
1908 | 2 0.00 0.01
1909 | 15 0.00 0.01
1910 | 15 0.00 0.02
1911 | 14 0.00 0.02
1912 | 15 0.00 0.03
1913 | 14 0.00 0.03
1914 | 20 0.01 0.04
1915 | 109 0.03 0.07
1916 | 34 0.01 0.08
1917 | 37 0.01 0.09
1918 | 29 0.01 0.10
1919 | 130 0.04 0.14
1920 | 246 0.08 0.22
1921 | 139 0.04 0.26
1922 | 90 0.03 0.29
1923 | 213 0.07 0.35
1924 | 286 0.09 0.44
1925 | 739 0.23 0.67
1926 | 340 0.10 0.77
1927 | 429 0.13 0.90
1928 | 248 0.08 0.98
1929 | 864 0.27 1.24
1930 | 1,259 0.39 1.63
1931 | 992 0.30 1.93
1932 | 452 0.14 2.07
1933 | 1,106 0.34 2.41
1934 | 1,174 0.36 2.77
1935 | 3,107 0.95 3.72
1936 | 1,422 0.44 4.16
1937 | 1,827 0.56 4.72
1938 | 1,003 0.31 5.03
1939 | 3,338 1.02 6.05
1940 | 4,335 1.33 7.38
1941 | 3,765 1.15 8.54
1942 | 1,637 0.50 9.04
1943 | 3,453 1.06 10.10
1944 | 4,442 1.36 11.46
1945 | 5,776 1.77 13.23
1946 | 4,866 1.49 14.73
1947 | 4,619 1.42 16.14
1948 | 2,431 0.75 16.89
1949 | 7,584 2.33 19.22
1950 | 6,514 2.00 21.21
1951 | 7,446 2.28 23.50
1952 | 3,004 0.92 24.42
1953 | 6,645 2.04 26.46
1954 | 6,264 1.92 28.38
1955 | 9,721 2.98 31.36
1956 | 6,513 2.00 33.36
1957 | 7,944 2.44 35.80
1958 | 3,759 1.15 36.95
1959 | 10,637 3.26 40.21
1960 | 10,215 3.13 43.35
1961 | 10,302 3.16 46.51
1962 | 4,444 1.36 47.87
1963 | 9,853 3.02 50.89
1964 | 9,434 2.89 53.79
1965 | 11,403 3.50 57.28
1966 | 9,522 2.92 60.20
1967 | 9,305 2.85 63.06
1968 | 4,368 1.34 64.40
1969 | 13,543 4.15 68.55
1970 | 9,190 2.82 71.37
1971 | 11,612 3.56 74.93
1972 | 4,478 1.37 76.31
1973 | 9,221 2.83 79.14
1974 | 7,637 2.34 81.48
1975 | 8,195 2.51 83.99
1976 | 7,152 2.19 86.19
1977 | 6,466 1.98 88.17
1978 | 3,172 0.97 89.14
1979 | 7,799 2.39 91.54
1980 | 3,649 1.12 92.65
1981 | 6,309 1.94 94.59
1982 | 2,405 0.74 95.33
1983 | 3,763 1.15 96.48
1984 | 2,608 0.80 97.28
1985 | 2,185 0.67 97.95
1986 | 1,899 0.58 98.54
1987 | 1,384 0.42 98.96
1988 | 787 0.24 99.20
1989 | 893 0.27 99.48
1990 | 453 0.14 99.61
1991 | 525 0.16 99.78
1992 | 272 0.08 99.86
1993 | 278 0.09 99.94
1994 | 100 0.03 99.97
1995 | 83 0.03 100.00
------------+-----------------------------------
Total | 325,990 100.00
As you can see, some birth years have a lot of observations, and I am hoping someone can advise me on what strategy I should use to create, say, 10 birth cohorts. Since the data start in 1897, I could form one cohort for 1897-1927 and another one from 1987-2007, but I am not sure about the ones in between.
I am aware that the number of cohorts must be sufficiently large for the within estimator to be consistent. Any advice would be very useful.
Thank you so much for your help,
Samira.
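Two mechanical ways of cutting birthyear into 10 cohorts (a sketch; the substantive choice of cut-points is of course yours):
Code:
* (a) 10 cohorts with roughly equal numbers of households
xtile cohort = birthyear, nq(10)
* (b) fixed 10-year bands; households born before 1900 would become missing here
*     and need to be folded into the first band by hand
egen cohort2 = cut(birthyear), at(1900(10)2000)
* check the resulting cohort sizes and spans
tabstat birthyear, by(cohort) statistics(min max n)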
GLLAMM or melogit?
Dear Statalist,
Thanks a lot in advance for your help.
I would like to ask you about two things for which I need your help:
- I am working with a multilevel model with three levels (region, firm, time), and I am interested in determining whether it is relevant to include the region as a level. Following what is usually done in the literature, I fit an empty model and check the ICC; however, I have some problems with the command. First, when I run the melogit command, it gives the following error:
Code:
melogit y ||region_id: ||id: , or intpoints(30)

Fitting fixed-effects model:
Iteration 0:   log likelihood = -25315.955
Iteration 1:   log likelihood = -25274.806
Iteration 2:   log likelihood = -25274.769
Iteration 3:   log likelihood = -25274.769

Refining starting values:
Grid node 0:   log likelihood = -20408.483

Fitting full model:
initial values not feasible
r(1400);
Next, I add the evaltype(gf0) option to the command, and then it converges:
Code:
melogit y ||region_id: ||id: , or evaltype(gf0) intpoints(30)

Integration method: mvaghermite                 Integration pts.  =         30
                                                Wald chi2(0)      =          .
Log likelihood = -19512.201                     Prob > chi2       =          .
------------------------------------------------------------------------------
           y |       Odds   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .0830499   .0114779   -18.00   0.000     .0633431    .1088879
-------------+----------------------------------------------------------------
region_id    |
  var(_cons) |   .2556345   .1207838                      .1012601    .6453577
-------------+----------------------------------------------------------------
region_id>id |
  var(_cons) |   4.913874    .194306                      4.547424    5.309852
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
LR test vs. logistic model: chi2(2) = 11525.14            Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.

. estat icc

Intraclass correlation
------------------------------------------------------------------------------
                       Level |        ICC   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
                   region_id |   .0302191   .0138461      .0121931    .0729267
                id|region_id |    .611098   .0106392      .5900539    .6317361
------------------------------------------------------------------------------
However, if I do the same estimation in GLLAMM (which I once read is more rigorous, though more time-consuming), the estimates differ depending on whether I use the adaptive quadrature option, especially the ICC, as you can see next.
Code:
gllamm y , i(id region_id) family(binomial) link(logit) nrf(1 1) eq(inter inter) nip(30) adapt eform

gllamm model
log likelihood = -19512.201
------------------------------------------------------------------------------
           y |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .0830527   .0114777   -18.01   0.000      .063346      .10889
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.

Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (id)
    var(1): 4.9138833 (.19430896)
***level 3 (region_id)
    var(1): .25556668 (.12077743)
------------------------------------------------------------------------------

. end of do-file

. di .25556668/(.25556668+4.9138833+3.29)
.03021079

. gllamm y , i(id region_id) family(binomial) link(logit) nrf(1 1) eq(inter inter) nip(30) eform

gllamm model
log likelihood = -19564.722
------------------------------------------------------------------------------
           y |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   4.878615   2.387978     3.24   0.001     1.869182    12.73332
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.

Variances and covariances of random effects
------------------------------------------------------------------------------
***level 2 (id)
    var(1): 4.8848945 (.17888784)
***level 3 (region_id)
    var(1): 7.1880202 (1.8145096)
------------------------------------------------------------------------------

. di 7.1880202/(7.1880202+4.8848945+3.29)
.46788128
My first question is: what am I actually doing when I use the evaltype(gf0) option?
Why is there such a difference between the two estimations with GLLAMM?
Which estimation should I use to determine how relevant the regional context is: GLLAMM, or melogit with the evaltype(gf0) option?
- Let's say that I am forced to use GLLAMM for such a model (in the hypothetical case that the melogit command gives some other error). If I wanted the margins with the melogit command, I would do the following:
Code:
melogit y x1 x2##c.z1 ||region_id: ||id: , or intpoints(30)
estat icc
margins, dydx(x2) at(z1 = (0(5)15))
marginsplot
margins, at(z1 = (0(5)15))
marginsplot
How to obtain the margins and plot them from a GLLAMM estimation?
Code:
gllamm y x1 x2 x2Xz1, i(id region_id) family(binomial) link(logit) nrf(1 1) eq(inter inter) nip(30) adapt eform
Here are the descriptives of the variables, in case they help (the dataset is too large to post with dataex).
Code:
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
y | 47,411 .2249478 .4175523 0 1
x1 | 32,122 .3176639 .4655752 0 1
x2 | 47,306 .218133 .4129826 0 1
z1 | 98,165 5.869033 5.543298 0 16.89129
x2Xz1 | 47,306 1.285256 3.41989 0 16.89129
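On the margins question: -margins- is generally not available after gllamm, but a rough substitute (a sketch only, assuming the gllamm model above has just been fit and that gllapred from the gllamm package is installed) is to compute population-averaged predicted probabilities with gllapred and average them over bins of z1. This approximates predictions at observed z1 values rather than the counterfactual at() predictions that -margins- produces.
Code:
gllapred phat, mu marginal
preserve
egen z1bin = cut(z1), at(0(5)20)
collapse (mean) phat, by(z1bin x2)
twoway connected phat z1bin, by(x2)
restore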
Why are there three Constant Terms in Time Series FMOLS Estimation Results?
Why do I have three constant terms in the results table? Can anyone help with that?
qui cointreg y1 x1 x2 x3 x4, est(fmols) eqtrend(0) eqdet(sdummymac1)
qui est store sfmolsmacs1
qui cointreg y1 x1 x2 x3 x4, est(fmols) eqtrend(1) eqdet(tdummymac1)
qui est store tfmolsmacs1
estimates table sfmolsmacs1 tfmolsmacs1, b(%6.3f) se(%6.3f) p stats(N r2 r2_a rmse lrse) style(oneline)
| Variable | sfmol~1 | tfmol~1 |
| ------------ | -------- | ------------ |
| x1 | 0.008 | -0.014 |
| 0.006 | 0.005 | |
| 0.1598 | 0.003 | |
| x2 | -0.043 | -0.3 |
| 0.037 | 0.036 | |
| 0.2413 | 0 | |
| x3 | -0.002 | 0.001 |
| 0.001 | 0.001 | |
| 0.004 | 0.076 | |
| x4 | -0.009 | -0.009 |
| 0.003 | 0.002 | |
| 0.0045 | 0.0001 | |
| _cons | 3.843 | |
| 0.549 | ||
| 0 | ||
| linear | 0.191 | |
| 0.04 | ||
| 0 | ||
| _cons | 4.585 | |
| 0.397 | ||
| 0 | ||
| _cons | -0.304 | -0.009 |
| 0.042 | 0.001 | |
| 0 | 0 | |
| ------------ | -------- | ------------ |
| N | 107 | 107 |
| r2 | 0.197 | 0.414 |
| r2_a | 0.158 | 0.378 |
| rmse | 0.167 | 0.127 |
| lrse | 0.11 | 0.076 |
cmp models with zero-inflation
I have recently started using cmp models, which I find very useful for building multivariate models with different distributions and sample selection. I would like to know whether there is a way to add zero inflation to the corresponding models, such as an ordered probit, using cmp, so that we can estimate multivariate models with zero inflation.
Graph by categories and groups
Trying to plot a graph (scatter plot with lines) using a variable (rip) that takes different values by fyear, state and item. See a sample of the data below.
I want code similar to the -scatter- command shown after the data excerpt.
Since I have 3 commodities under the categorical variable "item" and there is a single rip variable for all three, each small graph returns 3 observations for each value of fyear. Is there any way I could colour-code the points for each item so that they appear as three distinct lines on each graph? I also do not want any line connecting dots that belong to different categories of "item".
Also, is there any way to fit rip values of one of these items on a secondary axis as it has values very different from the other two categories in "item". Scale differences makes the graph pretty bad and it will be good to show the rip values of that particular category on a different y axis.
Can someone help me with it?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(rip fyear) str16 state str9 item
. 201011 "ANDHRA PRADESH" "bidi"
.09099022 201112 "ANDHRA PRADESH" "bidi"
.10854694 201213 "ANDHRA PRADESH" "bidi"
.10353264 201314 "ANDHRA PRADESH" "bidi"
.0962819 201415 "ANDHRA PRADESH" "bidi"
.08711275 201516 "ANDHRA PRADESH" "bidi"
.08109602 201617 "ANDHRA PRADESH" "bidi"
.08437795 201718 "ANDHRA PRADESH" "bidi"
.08440902 201819 "ANDHRA PRADESH" "bidi"
. 201011 "ANDHRA PRADESH" "chewing"
.52594 201112 "ANDHRA PRADESH" "chewing"
.573869 201213 "ANDHRA PRADESH" "chewing"
.4906687 201314 "ANDHRA PRADESH" "chewing"
.507094 201415 "ANDHRA PRADESH" "chewing"
.4349617 201516 "ANDHRA PRADESH" "chewing"
.379727 201617 "ANDHRA PRADESH" "chewing"
.3401141 201718 "ANDHRA PRADESH" "chewing"
.3209431 201819 "ANDHRA PRADESH" "chewing"
. 201011 "ANDHRA PRADESH" "cigarette"
.6422687 201112 "ANDHRA PRADESH" "cigarette"
.7280357 201213 "ANDHRA PRADESH" "cigarette"
.7079984 201314 "ANDHRA PRADESH" "cigarette"
.679833 201415 "ANDHRA PRADESH" "cigarette"
.6249001 201516 "ANDHRA PRADESH" "cigarette"
.6212281 201617 "ANDHRA PRADESH" "cigarette"
.5683537 201718 "ANDHRA PRADESH" "cigarette"
.5830745 201819 "ANDHRA PRADESH" "cigarette"
. 201011 "ASSAM" "bidi"
.07124937 201112 "ASSAM" "bidi"
.072014794 201213 "ASSAM" "bidi"
.07024458 201314 "ASSAM" "bidi"
.06895715 201415 "ASSAM" "bidi"
.063124955 201516 "ASSAM" "bidi"
.06579541 201617 "ASSAM" "bidi"
.07138462 201718 "ASSAM" "bidi"
.06889646 201819 "ASSAM" "bidi"
. 201011 "ASSAM" "chewing"
.08735093 201112 "ASSAM" "chewing"
.07451655 201213 "ASSAM" "chewing"
.06969457 201314 "ASSAM" "chewing"
.0770348 201415 "ASSAM" "chewing"
.09772397 201516 "ASSAM" "chewing"
.09071058 201617 "ASSAM" "chewing"
.08697958 201718 "ASSAM" "chewing"
.08091222 201819 "ASSAM" "chewing"
. 201011 "ASSAM" "cigarette"
.5504718 201112 "ASSAM" "cigarette"
.5141999 201213 "ASSAM" "cigarette"
.57849634 201314 "ASSAM" "cigarette"
.57627475 201415 "ASSAM" "cigarette"
.53460586 201516 "ASSAM" "cigarette"
.5026038 201617 "ASSAM" "cigarette"
.48206 201718 "ASSAM" "cigarette"
.4520759 201819 "ASSAM" "cigarette"
. 201011 "BIHAR" "bidi"
.0886173 201112 "BIHAR" "bidi"
.0779648 201213 "BIHAR" "bidi"
.08331867 201314 "BIHAR" "bidi"
.0885797 201415 "BIHAR" "bidi"
.10049088 201516 "BIHAR" "bidi"
.08846604 201617 "BIHAR" "bidi"
.07779509 201718 "BIHAR" "bidi"
.07439497 201819 "BIHAR" "bidi"
. 201011 "BIHAR" "chewing"
.10634077 201112 "BIHAR" "chewing"
.09940512 201213 "BIHAR" "chewing"
.10277797 201314 "BIHAR" "chewing"
.10430764 201415 "BIHAR" "chewing"
.11484671 201516 "BIHAR" "chewing"
.10868685 201617 "BIHAR" "chewing"
.09779953 201718 "BIHAR" "chewing"
.09777625 201819 "BIHAR" "chewing"
. 201011 "BIHAR" "cigarette"
1.1298708 201112 "BIHAR" "cigarette"
1.1109984 201213 "BIHAR" "cigarette"
1.1648171 201314 "BIHAR" "cigarette"
1.3054184 201415 "BIHAR" "cigarette"
1.342271 201516 "BIHAR" "cigarette"
1.2638005 201617 "BIHAR" "cigarette"
1.1409205 201718 "BIHAR" "cigarette"
1.232831 201819 "BIHAR" "cigarette"
. 201011 "CHANDIGARH" "bidi"
.02962901 201112 "CHANDIGARH" "bidi"
.02758915 201213 "CHANDIGARH" "bidi"
.02607511 201314 "CHANDIGARH" "bidi"
.023134753 201415 "CHANDIGARH" "bidi"
.022966146 201516 "CHANDIGARH" "bidi"
.025267234 201617 "CHANDIGARH" "bidi"
.0391226 201718 "CHANDIGARH" "bidi"
.04377725 201819 "CHANDIGARH" "bidi"
. 201011 "CHANDIGARH" "cigarette"
.27653742 201112 "CHANDIGARH" "cigarette"
.3126252 201213 "CHANDIGARH" "cigarette"
.3424058 201314 "CHANDIGARH" "cigarette"
.3550999 201415 "CHANDIGARH" "cigarette"
.3633941 201516 "CHANDIGARH" "cigarette"
.3496068 201617 "CHANDIGARH" "cigarette"
.3621924 201718 "CHANDIGARH" "cigarette"
.3684403 201819 "CHANDIGARH" "cigarette"
. 201011 "CHHATISGARH" "bidi"
end
Code:
scatter rip fyear, by(state) connect(l)
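One way to get a separate coloured line per item within each state panel, with "cigarette" on a secondary axis because its scale is so different (a sketch, using the variable names from the data excerpt):
Code:
twoway (connected rip fyear if item == "bidi") ///
       (connected rip fyear if item == "chewing") ///
       (connected rip fyear if item == "cigarette", yaxis(2)), ///
       by(state) legend(order(1 "bidi" 2 "chewing" 3 "cigarette (right axis)"))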
Tuesday, July 30, 2019
Dynamic panel data, with small N and large T
I have panel data with N=17 and T=46. The model has a dynamic specification, as it includes a lagged dependent variable. It looks something like the equation below:
Code:
y_it = a*y_i,t-1 + b1*D1_it + b2*D2_it + b3*x_it + e_it
Where y is my dependent variable, x a vector of covariates and Ds are dummy variables.
A dynamic model is usually estimated using the GMM method, however as my N is smaller than my T in this case it is not feasible.
Some papers talk about (1) running a separate regression for each group and averaging the coefficients over groups; (2) combining the data, defining a common slope, allowing for fixed or random intercepts, and estimating pooled regressions (Mairesse & Griliches 1988); (3) taking the data average over groups and estimating the aggregate time-series regressions (Pesaran, Pierse & Kumar 1989, Lee, Pesaran & Pierse 1990); and (4) averaging the data over time and estimating a cross-section regression on group means (Barro 1991).
I have also gone through some earlier posts, but still lack clarity. My questions are:
What method can be used to estimate a dynamic model with small N and large T?
Is xtivreg appropriate in this situation?
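Not a definitive answer, but a minimal sketch of an Anderson-Hsiao-type first-differenced IV estimator via xtivreg, assuming the panel identifiers are called id and year and using the placeholder names from the equation above:
Code:
xtset id year
* first-difference the model and instrument the lagged dependent variable
* with its second lag in levels (Anderson-Hsiao)
xtivreg y D1 D2 x (L.y = L2.y), fd
* with T = 46 the Nickell bias of the within estimator is of order 1/T,
* so the plain fixed-effects regression with the lag is also often considered
xtreg y L.y D1 D2 x, fe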
Stata Matrix error
Hi. I am doing a regression based decomposition analysis. This is my code
I keep getting the error "/ not found" and then "matrix Aaa not found". I have tried to fix it but I am stuck. Could someone please help me?
Code:
global X "Education respondents_current_age Education_husband Resp_age_at_birth wq1 wq3"
regr prenatal_care $X [aw=weight]
sum prenatal_care [aw=weight]
sca m_prenatal_care=r(mean)
foreach x of global X {
qui {
sca b_`x'=_b[`x']
corr rank `x' [aw=weight], c
sca cov_`x'=r(cov_12)
sum `x' [aw=weight]
sca elas_`x'=(b_`x'*r(mean))/m_prenatal_care
sca CI_`x'=2*cov_`x'/r(mean)
sca con_`x'=elas_`x'*CI_`x'
sca prcnt_`x'=(con_`x'/CI_`x')*100
matrix Aaa = nullmat(Aaa) \ ///
(elas_`x', CI_`x', con_`x', prcnt_`x')
}
di "`x' elasticity:", elas_`x'
di "`x' concentration index:", CI_`x'
di "`x' contribution:", con_`x'
di "`x' percentage contribution:", prcnt_`x'
}
matrix rownames Aaa= $X
matrix colnames Aaa = "Elasticity""CI""Absolute""%"
matrix list Aaa, format(%8.4f)
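One possible cause, offered tentatively: the /// line continuation only works inside a do-file, so if these lines are run one at a time from the command window the continued matrix statement fails ("/ not found") and Aaa is never created, which then triggers "matrix Aaa not found". A sketch that keeps the statement on a single line:
Code:
* keep the row-append on one line so it also works when commands are
* executed one at a time
matrix Aaa = nullmat(Aaa) \ (elas_`x', CI_`x', con_`x', prcnt_`x')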
Changing the date format
Currently, I have the variable, calldate, which is in the format 09jul2019, 10jul2019, etc.
Is there any way I could change it into the format 20190709, 20190710?
I tried the following code, but it said type mismatch in both the lines:
Thanks a lot!
Code:
gen date2 = date(calldate, "DMY")
format date2 %td
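A minimal sketch of one way to get a YYYYMMDD version, assuming calldate is a string such as "09jul2019" (the new variable names are placeholders):
Code:
gen double date2 = date(calldate, "DMY")
format date2 %td
gen long call_ymd = year(date2)*10000 + month(date2)*100 + day(date2)   // numeric 20190709
gen call_str = string(date2, "%tdCYND")                                 // string "20190709"
* if calldate is already a numeric %td date (a common cause of "type mismatch"),
* skip the date() step and apply year(), month(), day() to calldate directly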
stata doesn't recognize missings in string variable
I want to delete observations that have missing data for a particular variable, but Stata does not recognize the empty cells as missing when I use the command:
drop if fishing_site==" "
I thought it might be a problem with spaces, so I tried trimming them, but that did not change anything and the missings are still not recognized.
What do I need to do?
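A minimal sketch, assuming fishing_site is a string variable: an empty cell is the empty string "" (nothing between the quotes), not " " (a single space), so trimming first and then testing against "" may help:
Code:
replace fishing_site = strtrim(fishing_site)   // remove leading/trailing blanks
drop if fishing_site == ""                     // "" is the empty string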
Update to -iscogen- available from SSC
An update to the iscogen package is now available from SSC. Type
Code:
. ssc install iscogen, replace
to install the update.
New features:
- egp11(): Translation of ISCO-88 and ISCO-68 to EGP classes based on the original SPSS scripts by Harry Ganzeboom. egp11() provides an alternative to egp() (which is based on the Stata adaptation of Ganzeboom's scripts by John Hendrickx). The main difference between egp() and egp11() is that egp11() distinguishes between classes IIIa and IIIb whereas egp() makes no such distinction.
- mpg(): Translation of ISCO-88 to the German Magnitude Prestige Scale (MPS).
Update to -kmatch- available from SSC
A maintenance update to kmatch is now available from SSC. Type
Code:
. ssc install kmatch, replace
to install the update. kmatch requires moremata and kdens to be installed, so possibly you will also have to type
Code:
. ssc install moremata, replace
. ssc install kdens, replace
Changes:
- kmatch eb (entropy balancing) returned balancing weights that were scaled in terms of the number of observations instead of the sum of weights if pweights or iweights were specified; this is fixed. The wrong scaling did not affect the results of treatment effect estimation, but it led to erroneous balancing diagnostics in the case of ATE.
Adding 'proportion' to esttab command.
Dear All,
I am estimating a logistic fixed effects regression to estimate the effect of a policy change on a binary outcome variable. In the results table that I create with the esttab command I want to include the baseline distribution of the binary outcome variable in the pre-treatment period (the -prop- output shown further below).
Eventually, I want to add the Bsl statistic to my esttab command as follows:
esttab using "F.csv", keep(post) margin b(a4) se(4) nogaps nolabel star(* 0.10 ** 0.05 *** 0.01) ///
label title(Impact of intervention on prescription opioid poisonings, 2011-2019.) ///
mtitles( "ALL Presc. Opioids") stats(N Bsl) coeflabels(post "Impact Intervention" ) replace
The idea is to express the treatment effect in comparison to the proportion at baseline.
Code:
prop opioidpois if post==0 & treat==1
Proportion estimation Number of obs = 802
--------------------------------------------------------------
| Proportion Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
opioidpois |
0 | .8029925 .0140534 .7739456 .8291311
1 | .1970075 .0140534 .1708689 .2260544
--------------------------------------------------------------
. local base=e(cmd)
. eststo raw1: xtlogit opioidpois i.post did age, fe
note: multiple positive outcomes within groups encountered.
note: 609 groups (1,218 obs) dropped because of all positive or
all negative outcomes.
Iteration 0: log likelihood = -481.84658
Iteration 1: log likelihood = -384.481
Iteration 2: log likelihood = -383.05408
Iteration 3: log likelihood = -383.05124
Iteration 4: log likelihood = -383.05124
Conditional fixed-effects logistic regression Number of obs = 1,706
Group variable: studypersonid Number of groups = 853
Obs per group:
min = 2
avg = 2.0
max = 2
LR chi2(3) = 416.41
Log likelihood = -383.05124 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
opioidpois | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.post | .6571757 .2257941 2.91 0.004 .2146274 1.099724
did | 1.332142 .1859222 7.17 0.000 .967741 1.696543
age | .0764154 .0991872 0.77 0.441 -.1179879 .2708186
------------------------------------------------------------------------------
. eststo margin1: margins, dydx(did) post
Average marginal effects Number of obs = 1,706
Model VCE : OIM
Expression : Pr(opioidpois|fixed effect is 0), predict(pu0)
dy/dx w.r.t. : did
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
did | .0565268 .1390073 0.41 0.684 -.2159225 .3289761
------------------------------------------------------------------------------
. estadd mat Bsl = `base'
proportion not found
r(111);
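For what it is worth, e(cmd) only contains the command name ("proportion"), which is why estadd complains. A minimal sketch of storing the baseline proportion itself (assuming estadd/esttab from the estout package are installed; check the column order of e(b) with matrix list B):
Code:
prop opioidpois if post==0 & treat==1
matrix B = e(b)                // vector of proportions from -prop-
scalar bsl = B[1, 2]           // proportion with opioidpois == 1
eststo raw1: xtlogit opioidpois i.post did age, fe
margins, dydx(did) post
estadd scalar Bsl = bsl
eststo margin1
* stats(N Bsl) in -esttab- will then pick up the added scalar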
I will be grateful for your help.
Sincerely,
Sumedha.
Equivalent of 'exposure' variable for xtlogit regression for diff-in-diff
Dear All,
I am running a logistic FE regression to estimate the effect of a 'treatment' using a generalized difference-in-differences model and a binary outcome variable. It's a two-period model - pre and post treatment.
In my estimation I want to take into account the different treatment exposure periods. If this were a count data model I would include an 'exposure' variable to account for different durations of exposure to treatment. How should I do that in an xtlogit regression? Right now I have estimated the following:
Different individuals are treated at different calendar times; thus, the 'exposure' variable captures, for each individual, the length in days that they are in the treated group.
My data looks as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double studypersonid float(treat post exposure) double opioidpois float age 103336 1 0 2319 0 35.666668 103336 1 1 730 1 37.416668 103338 1 0 2137 0 45.41667 103338 1 1 912 0 47.16667 103342 1 0 1834 0 37 103342 1 1 1215 0 37.75 103344 0 0 3049 0 29.875 103344 0 1 3049 0 31.916666 103345 0 0 3049 0 58.75 103345 0 1 3049 1 59.10417 103346 1 0 2021 0 56 103346 1 1 1028 1 58.75 103347 0 0 3049 0 30 103347 0 1 3049 1 31.25 103350 0 0 3049 0 56.95025 103350 0 1 3049 0 59.62333 103351 0 0 3049 0 44.83333 103351 0 1 3049 1 46.72222 103355 0 0 3049 0 43.31452 103355 0 1 3049 1 44.97222 103365 1 0 1969 0 57 103365 1 1 1080 1 59.66667 103370 1 0 1842 0 35.583332 103370 1 1 1207 0 39.16667 103374 0 0 3049 1 31.166666 103374 0 1 3049 0 31.75 103375 1 0 1961 0 37.583332 103375 1 1 1088 1 38.64583 103389 0 0 3049 0 25.875 103389 0 1 3049 0 28.114584 103391 1 0 1901 1 34.666668 103391 1 1 1148 1 35.875 103394 0 0 3049 1 43.33333 103394 0 1 3049 0 45 103396 0 0 3049 0 21.916666 103396 0 1 3049 1 24.833334 103399 1 0 1906 1 55.55555 103399 1 1 1143 0 57.66667 103401 1 0 2173 0 35.666668 103401 1 1 876 0 37.583336 103402 1 0 2064 1 47.66667 103402 1 1 985 0 50.38095 103403 0 0 3049 0 36.795597 103403 0 1 3049 1 40.27778 103404 0 0 3049 0 38.75 103404 0 1 3049 0 40.25 103408 0 0 3049 0 32.666668 103408 0 1 3049 0 34.5 103413 0 0 3049 0 57.86111 103413 0 1 3049 1 59.98214 103418 0 0 3049 0 21.916666 103418 0 1 3049 1 24.02778 103420 0 0 3049 1 35.814816 103420 0 1 3049 1 39.32143 103421 1 0 1932 0 32.416668 103421 1 1 1117 0 33.333332 103423 0 0 3049 0 24.25 103423 0 1 3049 0 25.166666 103425 1 0 2390 0 21.416666 103425 1 1 659 1 23.625 103428 0 0 3049 0 27.75 103428 0 1 3049 1 29.819445 103430 1 0 2082 0 40.25 103430 1 1 967 1 41.66667 103435 1 0 2053 0 30.416666 103435 1 1 996 0 31.375 103440 1 0 1905 0 20.916666 103440 1 1 1144 0 22 103441 0 0 3049 0 55.51042 103441 0 1 3049 0 56.85049 103442 0 0 3049 1 36.166668 103442 0 1 3049 0 37.5 103449 0 0 3049 1 19.583334 103449 0 1 3049 0 20.75 103454 1 0 1898 0 49.52778 103454 1 1 1151 1 51.83333 103456 1 0 2030 0 52.33334 103456 1 1 1019 1 54.08334 103462 0 0 3049 1 53.31723 103462 0 1 3049 0 57.53472 103472 1 0 1880 0 38.44444 103472 1 1 1169 1 39.19445 103475 0 0 3049 0 24.680555 103475 0 1 3049 0 27.29167 103484 0 0 3049 0 35.333332 103484 0 1 3049 0 36.25 103488 1 0 2062 0 34.5 103488 1 1 987 0 35.083332 103489 0 0 3049 0 36.229168 103489 0 1 3049 1 38 103492 0 0 3049 1 29.666666 103492 0 1 3049 1 30.423077 103497 0 0 3049 0 56.02778 103497 0 1 3049 0 56.58334 103498 0 0 3049 0 48.33333 103498 0 1 3049 0 48.75 103500 0 0 3049 0 31.5 103500 0 1 3049 0 34 103505 1 0 2126 0 33.583332 103505 1 1 923 1 36.041668 end
Code:
. eststo raw1: xtlogit opioidpois i.post did age, fe
note: multiple positive outcomes within groups encountered.
note: 609 groups (1,218 obs) dropped because of all positive or
all negative outcomes.
Iteration 0: log likelihood = -481.84658
Iteration 1: log likelihood = -384.481
Iteration 2: log likelihood = -383.05408
Iteration 3: log likelihood = -383.05124
Iteration 4: log likelihood = -383.05124
Conditional fixed-effects logistic regression Number of obs = 1,706
Group variable: studypersonid Number of groups = 853
Obs per group:
min = 2
avg = 2.0
max = 2
LR chi2(3) = 416.41
Log likelihood = -383.05124 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
opioidpois | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.post | .6571757 .2257941 2.91 0.004 .2146274 1.099724
did | 1.332142 .1859222 7.17 0.000 .967741 1.696543
age | .0764154 .0991872 0.77 0.441 -.1179879 .2708186
------------------------------------------------------------------------------
. eststo margin1: margins, dydx(did) post
Average marginal effects Number of obs = 1,706
Model VCE : OIM
Expression : Pr(opioidpois|fixed effect is 0), predict(pu0)
dy/dx w.r.t. : did
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
did | .0565268 .1390073 0.41 0.684 -.2159225 .3289761
------------------------------------------------------------------------------
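A minimal sketch, not a recommendation: xtlogit has no exposure() option, but to my understanding it does accept offset(), so one mechanical analogue of the count-data approach is to enter the log of the exposure days with its coefficient constrained to 1; alternatively the log exposure can simply be added as a covariate:
Code:
gen double ln_exp = ln(exposure)
* offset: coefficient on ln_exp constrained to 1
xtlogit opioidpois i.post did age, fe offset(ln_exp)
* or let the data determine its coefficient
xtlogit opioidpois i.post did age ln_exp, fe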
Sincerely,
Sumedha.
Import Excel Loop issues
Hello everyone,
I'm trying to write code to import certain worksheets from an excel workbook, but I can't quite get it to work properly. Here's the code I have written for it so far:
cd "$pathin\Bank Spreadsheets\XLS spreadsheets"
foreach sheet in "Ex 1" "Ex 2" "Ex 3" "Ex 5" "Ex 6" "Ex 8" "Ex 9" "EX 10" "EX 11" "EX 12" {
import excel using "$pathin\Bank Spreadsheets\XLS spreadsheets\Caliber Home Loans-30992_FUP Findings Exhibits.xls", sheet("`sheet'") cellrange(A5) firstrow clear
save file_`sheet', replace
}
Unfortunately the syntax error that keeps popping up says "invalid '1'". I've tried checking the spacing and the naming of the sheets, but nothing seems to work. Any help would be great, thanks!
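A tentative guess, sketched below: the sheet names contain spaces, so save file_Ex 1 is read as a filename plus a stray "1", which would produce exactly an "invalid '1'" error; quoting the filename (or stripping the space) may resolve it. Paths and sheet names are taken from the post:
Code:
foreach sheet in "Ex 1" "Ex 2" "Ex 3" "Ex 5" "Ex 6" "Ex 8" "Ex 9" "EX 10" "EX 11" "EX 12" {
    import excel using "$pathin\Bank Spreadsheets\XLS spreadsheets\Caliber Home Loans-30992_FUP Findings Exhibits.xls", sheet("`sheet'") cellrange(A5) firstrow clear
    local fname = subinstr("`sheet'", " ", "_", .)   // "Ex 1" -> "Ex_1"
    save "file_`fname'", replace
}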
Coupled forestplot
Dear Stata user,
I would like to create a coupled forest plot by subgroup. I could create one with the midas command using the following code: midas tp fp fn tn , by(subgroup) texts(0.50) bfor(dss) id(studyid) ford fors
The problem with midas is that I do not manage to create a "nice" output. I have to modify the graph by hand to add the labels for each subgroup and customize the forest plot obtained.
Would you know how to customize midas forest plots using Stata options?
Spatial Panel analysis for SYS-GMM
Hello, I have a problem. I have estimated my SYS-GMM model (levels and differences) using xtabond2. If I want to do a spatial panel analysis, there is no way to do it with SYS-GMM: if I use the spregdpd command I can only run xtabond, and the same goes for spregxt. I don't know what to do.
My GMM model is like this:
xtabond2 dependent l.dependent explanatories control tau2003-tau2011, ///
gmm(dependent explanatories, lag(2 2) eq(diff)) gmm(dependent explanatories, lag(1 1) eq(lev)) ///
iv (controls tau2003-tau2011, eq(both)) h(2) ar(2) two robust noconst
I have 103 Italian Province and 10 years (2003-2012)... with GMM I'm using 2003-2011
Many thanks
Tabulation only for freq>=5
Hello,
I am running this tabulate command to display some studentized residuals:
tabulate cname if rstudent_1974>=1 & year>=1974 & year<=1988 & fh_ipolity2>=8 & bmr_demdur>=5 & pwt_pop>=1, sum(rstudent_1974)
The tabulation is shown in the attached picture. Is there any way to only show countries with frequencies of >=5?
Thank you.
Martin
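A minimal sketch of one way to restrict the table to countries with at least 5 observations in the same sample (the condition is stored in a local so it is only typed once):
Code:
local samp rstudent_1974>=1 & year>=1974 & year<=1988 & fh_ipolity2>=8 & bmr_demdur>=5 & pwt_pop>=1
bysort cname: egen n_in_sample = total(cond(`samp', 1, 0))
tabulate cname if n_in_sample >= 5 & `samp', sum(rstudent_1974)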
Difference in Difference Estimation
Dear All,
I am trying to implement a difference-in-differences estimation technique.
My goal is to find out the impact of migration on several firm level performance measures, such as wage. Here is my data:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int(id year) byte(sector wage ccode Density) float(meanID Treatment Time TT) 1 2000 10 43 5 10 40.56667 0 0 0 1 2001 10 14 5 41 40.56667 0 0 0 1 2002 10 10 5 96 40.56667 0 0 0 1 2003 10 44 5 33 40.56667 0 1 0 1 2004 10 78 5 6 40.56667 0 1 0 1 2005 10 59 5 24 40.56667 0 1 0 2 2000 10 27 5 3 40.56667 0 0 0 2 2001 10 52 5 28 40.56667 0 0 0 2 2002 10 21 5 22 40.56667 0 0 0 2 2003 10 55 5 69 40.56667 0 1 0 2 2004 10 34 5 39 40.56667 0 1 0 2 2005 10 23 5 72 40.56667 0 1 0 3 2000 10 34 5 24 40.56667 0 0 0 3 2001 10 9 5 9 40.56667 0 0 0 3 2002 10 20 5 68 40.56667 0 0 0 3 2003 10 57 5 22 40.56667 0 1 0 3 2004 10 9 5 51 40.56667 0 1 0 3 2005 10 85 5 80 40.56667 0 1 0 4 2000 10 67 5 33 40.56667 0 0 0 4 2001 10 66 5 50 40.56667 0 0 0 4 2002 10 54 5 49 40.56667 0 0 0 4 2003 10 53 5 16 40.56667 0 1 0 4 2004 10 76 5 36 40.56667 0 1 0 4 2005 10 27 5 63 40.56667 0 1 0 5 2000 10 98 5 42 40.56667 0 0 0 5 2001 10 99 5 36 40.56667 0 0 0 5 2002 10 24 5 20 40.56667 0 0 0 5 2003 10 6 5 30 40.56667 0 1 0 5 2004 10 13 5 80 40.56667 0 1 0 5 2005 10 15 5 65 40.56667 0 1 0 6 2000 10 89 6 24 42.43333 0 0 0 6 2001 10 11 6 33 42.43333 0 0 0 6 2002 10 84 6 7 42.43333 0 0 0 6 2003 10 76 6 61 42.43333 0 1 0 6 2004 10 90 6 19 42.43333 0 1 0 6 2005 10 14 6 80 42.43333 0 1 0 7 2000 10 25 6 21 42.43333 0 0 0 7 2001 10 42 6 15 42.43333 0 0 0 7 2002 10 5 6 26 42.43333 0 0 0 7 2003 10 21 6 12 42.43333 0 1 0 7 2004 10 31 6 0 42.43333 0 1 0 7 2005 10 28 6 36 42.43333 0 1 0 8 2000 10 72 6 96 42.43333 0 0 0 8 2001 10 22 6 19 42.43333 0 0 0 8 2002 10 62 6 24 42.43333 0 0 0 8 2003 10 59 6 51 42.43333 0 1 0 8 2004 10 81 6 93 42.43333 0 1 0 8 2005 10 70 6 25 42.43333 0 1 0 9 2000 10 32 6 52 42.43333 0 0 0 9 2001 10 65 6 27 42.43333 0 0 0 9 2002 10 35 6 89 42.43333 0 0 0 9 2003 10 47 6 54 42.43333 0 1 0 9 2004 10 73 6 90 42.43333 0 1 0 9 2005 10 22 6 10 42.43333 0 1 0 10 2000 10 8 6 100 42.43333 0 0 0 10 2001 10 40 6 20 42.43333 0 0 0 10 2002 10 49 6 68 42.43333 0 0 0 10 2003 10 2 6 8 42.43333 0 1 0 10 2004 10 63 6 22 42.43333 0 1 0 10 2005 10 59 6 91 42.43333 0 1 0 11 2000 10 57 7 55 53.96667 1 0 0 11 2001 10 23 7 9 53.96667 1 0 0 11 2002 10 26 7 100 53.96667 1 0 0 11 2003 10 44 7 10 53.96667 1 1 1 11 2004 10 72 7 12 53.96667 1 1 1 11 2005 10 76 7 96 53.96667 1 1 1 12 2000 10 12 7 95 53.96667 1 0 0 12 2001 10 4 7 81 53.96667 1 0 0 12 2002 10 86 7 35 53.96667 1 0 0 12 2003 10 15 7 52 53.96667 1 1 1 12 2004 10 27 7 56 53.96667 1 1 1 12 2005 10 3 7 16 53.96667 1 1 1 13 2000 10 78 7 2 53.96667 1 0 0 13 2001 10 31 7 80 53.96667 1 0 0 13 2002 10 79 7 14 53.96667 1 0 0 13 2003 10 65 7 49 53.96667 1 1 1 13 2004 10 54 7 78 53.96667 1 1 1 13 2005 10 42 7 69 53.96667 1 1 1 14 2000 10 70 7 5 53.96667 1 0 0 14 2001 10 86 7 69 53.96667 1 0 0 14 2002 10 88 7 96 53.96667 1 0 0 14 2003 10 6 7 76 53.96667 1 1 1 14 2004 10 44 7 32 53.96667 1 1 1 14 2005 10 37 7 75 53.96667 1 1 1 15 2000 10 24 7 61 53.96667 1 0 0 15 2001 10 14 7 97 53.96667 1 0 0 15 2002 10 96 7 50 53.96667 1 0 0 15 2003 10 70 7 18 53.96667 1 1 1 15 2004 10 94 7 78 53.96667 1 1 1 15 2005 10 54 7 53 53.96667 1 1 1 16 2000 11 86 8 77 46.7 0 0 0 16 2001 11 58 8 93 46.7 0 0 0 16 2002 11 98 8 14 46.7 0 0 0 16 2003 11 52 8 26 46.7 0 1 0 16 2004 11 51 8 15 46.7 0 1 0 16 2005 11 91 8 72 46.7 0 1 0 17 2000 11 37 8 80 46.7 0 0 0 17 2001 11 45 8 4 46.7 0 0 0 17 2002 11 69 8 99 46.7 0 0 0 17 2003 11 86 8 64 46.7 0 1 0 end
I define post and pre-treatment periods. The immigration starts at year 2003. I have data for the period 2000-2005. I define post treatment period as 2003-2005, and pre-treatment period in 2000-2002.
- I have immigration density (Density) calculated at the city level and defined yearly as the total migrants divided by the total population. I use these densities to define control and treatment groups. More specifically, I define a city as belonging to the treatment group if its average immigration density (2003-2005) is above 51. Under this assumption a given city belongs to the treatment region for the whole 2003-2005 period. However, I am then not able to exploit yearly changes in immigration densities. How can I do this? (A sketch of two options follows at the end of this post.)
- My first hypothesis is the firms in the treatment region should face lower wages due to the abundance in the low skill labor provided by the migrants. In order to do that I run the following codes. All of them give the same results
xtreg wage i.Treatment##i.Time i.year,fe
xtreg wage TT i.year, fe
- I also want to control for the industry. Can I do this by simply adding industry fixed effects to the firm level regression like the following:
- I also want to test the following hypotheses:
-The labour-intensive firms in the labour-intensive industries benefit more than the labour-intensive firms in the capital-intensive industry.
Many thanks in advance.
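As referenced above, a minimal sketch of two common ways to exploit yearly variation, using the variable names from the dataex excerpt (Treatment is time-invariant within firms, so its main effect will be absorbed by the firm fixed effects and reported as omitted):
Code:
xtset id year
* (i) event-study style: treatment group interacted with each year, base 2002
xtreg wage i.Treatment##ib2002.year, fe vce(cluster ccode)
* (ii) continuous treatment: the yearly city-level density entered directly
xtreg wage c.Density i.year, fe vce(cluster ccode)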
Putexcel command for logistic regression
Hello there
I'm trying to make a nice table for my logistic regression results using putexcel command but I couldn't find the scalars I need. The framework would be Odds ratio, 95% Confidence intervals and p-value.
Now I ran the command:
logit top i.gest_c, base level(95)
and then ereturn list:
scalars:
e(rank) = 2
e(N) = 1474
e(ic) = 4
e(k) = 3
e(k_eq) = 1
e(k_dv) = 1
e(converged) = 1
e(rc) = 0
e(ll) = -889.8136445574683
e(k_eq_model) = 1
e(ll_0) = -922.7348412688314
e(df_m) = 1
e(chi2) = 65.84239342272622
e(p) = 4.88461896337e-16
e(N_cdf) = 0
e(N_cds) = 0
e(r2_p) = .0356778515766175
macros:
e(cmdline) : "logit top i.gest_c, base level(95)"
e(cmd) : "logit"
e(estat_cmd) : "logit_estat"
e(predict) : "logit_p"
e(marginsok) : "default Pr"
e(marginsnotok) : "stdp DBeta DEviance DX2 DDeviance Hat Number Residuals RStandard SCore"
e(title) : "Logistic regression"
e(chi2type) : "LR"
e(opt) : "moptimize"
e(vce) : "oim"
e(user) : "mopt__logit_d2()"
e(ml_method) : "d2"
e(technique) : "nr"
e(which) : "max"
e(depvar) : "top"
e(properties) : "b V"
matrices:
e(b) : 1 x 3
e(V) : 3 x 3
e(Cns) : 1 x 4
e(mns) : 1 x 3
e(rules) : 1 x 4
e(ilog) : 1 x 20
e(gradient) : 1 x 3
I couldn't find what I need to export the results.
thanks
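The odds ratios, confidence limits, and p-values are not stored as e() scalars but sit in the matrix r(table) that estimation commands leave behind; a minimal sketch (the column numbers are placeholders that depend on how gest_c is coded, so check them with matrix list T; the output file name is made up):
Code:
logit top i.gest_c, base level(95)
matrix T = r(table)
scalar or1 = exp(T[1, 2])    // row 1 = b (log odds), column 2 = first non-base level
scalar p1  = T[4, 2]         // row 4 = pvalue
scalar ll1 = exp(T[5, 2])    // row 5 = lower confidence limit
scalar ul1 = exp(T[6, 2])    // row 6 = upper confidence limit
putexcel set "logit_results.xlsx", replace
putexcel A1 = "OR" B1 = "95% LL" C1 = "95% UL" D1 = "p-value"
putexcel A2 = or1 B2 = ll1 C2 = ul1 D2 = p1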
interflex command failed
Hello everyone,
I have a problem with the results of the interflex command.
I tried to check whether there is common support in my data when I generated an interaction variable (between D and X). To do so, I used interflex.
The problem is that it fails to display the real values of the moderator (X), which is a binary variable.
I cannot tell whether there is common support or not, and why it looks like that at the beginning.
[attached interflex plot]
I hope someone can help me with that and thanks for any help.
Gal.
Converting Excel
I need to import 3 excel files with different columns/rows into Stata.
Excel #1 looks like this:
Code:
Date       StockCode  SharesRepurchased
1-1-2015   US10       40
1-1-2015   US30       10
2-1-2015   US20       30
2-1-2015   US40       40
2-1-2015   US50       40
2-1-2015   US10       10
3-1-2015   US20       20
4-1-2015   US20       10
4-1-2015   US30       10
Excel #2 looks like this:
Code:
Date       US10_Price US10_Volume US20_Price US20_Volume US30_Price US30_Volume US40_Price US40_Volume US50_Price US50_Volume
1-1-2015   300        6000        200        8000        100        7000        150        6000        300        8000
2-1-2015   500        4000        400        5000        300        3000        350        6000        800        7000
3-1-2015   200        7000        400        3000        200        2000        350        5000        700        3000
4-1-2015   100        8000        600        3000        300        6000        250        4000        200        5000
Excel #3 looks like this:
Code:
Date       US10_Value US20_Value US30_Value US40_Value US50_Value
1-1-2015   0          0          3          0          4
2-1-2015   0          1          0          0          4
3-1-2015   0          3          0          4          0
4-1-2015   4          0          0          0          0
I need to run a regression on these variables, and in order to do this I think I need a Stata file like this:
Code:
StockCode  Date       SharesRepurchased  Price  Volume  Value
US10       1-1-2015   40                 300    6000    0
US10       2-1-2015   10                 500    4000    0
US10       3-1-2015   0                  200    7000    0
US10       4-1-2015   0                  100    8000    4
US20       1-1-2015   0                  200    8000    0
US20       2-1-2015   30                 400    5000    1
US20       3-1-2015   20                 400    3000    3
US20       4-1-2015   10                 600    3000    0
US30       1-1-2015   10                 100    7000    3
US30       2-1-2015   0                  300    3000    0
US30       3-1-2015   0                  200    2000    0
US30       4-1-2015   10                 300    6000    0
US40       1-1-2015   0                  150    6000    0
US40       2-1-2015   40                 350    6000    0
US40       3-1-2015   0                  350    5000    4
US40       4-1-2015   0                  250    4000    0
US50       1-1-2015   0                  300    8000    4
US50       2-1-2015   40                 800    7000    4
US50       3-1-2015   0                  700    3000    0
US50       4-1-2015   0                  200    5000    0
How can I convert the excel files such that I can get a Stata file like this? Also, in reality Excel file #2 contains too many variables, so do I need to split this excel file up?
Thank you for your help in advance
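A minimal sketch of one way to get from the three sheets to the long file, assuming each sheet can be imported as its own file; the file names are placeholders and Date is assumed to import identically in all three:
Code:
* Excel #1: repurchases, already long
import excel using "excel1.xlsx", firstrow clear
save repurchases, replace
* Excel #2: prices and volumes, wide to long (@ marks where the stock number sits)
import excel using "excel2.xlsx", firstrow clear
reshape long US@_Price US@_Volume, i(Date) j(stocknum)
rename (US_Price US_Volume) (Price Volume)
gen StockCode = "US" + string(stocknum)
save prices, replace
* Excel #3: values, wide to long
import excel using "excel3.xlsx", firstrow clear
reshape long US@_Value, i(Date) j(stocknum)
rename US_Value Value
gen StockCode = "US" + string(stocknum)
* combine and fill in zeros for dates without repurchases
merge 1:1 StockCode Date using prices, nogenerate
merge 1:1 StockCode Date using repurchases, nogenerate
replace SharesRepurchased = 0 if missing(SharesRepurchased)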
update available of abm_grid: A Mata class for managing a square grid for agent based models
An update of the abm_grid class is now available on GitHub: https://github.com/maartenteaches/abm_grid . The abm_grid class is intended to be used by people who want to create an Agent Based Model (ABM) on a square grid (like a chessboard) in Mata. An ABM is a simulation in which agents that each follow simple rules interact with one another and thus produce an often surprising outcome at the macro level. The purpose of an ABM is to explore mechanisms through which actions of the individual agents add up to a macro outcome by varying the rules that agents have to follow or varying the environment in which they live.
Implementing a new ABM will always require programming, but a lot of the tasks will be similar across ABMs. For example, in many ABMs the agents live on a square grid, and can only interact with their neighbors. abm_grid contains a set of functions that will do tasks like finding neighbors, adding, moving, and removing agents, etc., and someone can import them into their own ABM. I presented various examples at the last German Stata Users' meeting: http://www.maartenbuis.nl/presentations/munich19.html
This update adds functions for finding cells on a straight line between two cells and the distance between two cells.
I hope some of you will find this useful.
F test values are missing, I tried to understand it from the previous post but could not get it.
I am very new to Stata, so please pardon me if I sound naive. I ran a regression with xtreg fixed effects with robust standard errors, but I am not getting any F values. Please advise me on how I can correct this.
following is the code and output:
xtreg nim to fo to_fo deposits_ta ilnplgl eta noniigr log_ta cir gdp inf i.country_enc##i.year, fe vce(robust)
note: 2.country_enc omitted because of collinearity
note: 3.country_enc omitted because of collinearity
note: 4.country_enc omitted because of collinearity
note: 5.country_enc omitted because of collinearity
note: 6.country_enc omitted because of collinearity
note: 2.country_enc#2015.year omitted because of collinearity
note: 6.country_enc#2012.year omitted because of collinearity
note: 6.country_enc#2013.year omitted because of collinearity
note: 6.country_enc#2014.year omitted because of collinearity
note: 6.country_enc#2015.year omitted because of collinearity
Fixed-effects (within) regression Number of obs = 664
Group variable: indexnumber Number of groups = 91
R-sq: Obs per group:
within = 0.4285 min = 1
between = 0.0301 avg = 7.3
overall = 0.0177 max = 9
F(52,90) = .
corr(u_i, Xb) = -0.9825 Prob > F = .
(Std. Err. adjusted for 91 clusters in indexnumber)
Robust
nim Coef. Std. Err. t P>t [95% Conf. Interval]
to 1.255876 .5382938 2.33 0.022 .1864617 2.325291
fo 52.96379 21.99281 2.41 0.018 9.271228 96.65636
to_fo -.6382428 .276699 -2.31 0.023 -1.187954 -.0885319
deposits_ta .0237399 .0133812 1.77 0.079 -.0028442 .0503241
ilnplgl .0781159 .0590603 1.32 0.189 -.0392176 .1954494
eta .0877079 .0388741 2.26 0.026 .0104778 .1649381
noniigr -.0295204 .0103156 -2.86 0.005 -.0500141 -.0090267
log_ta .9934426 .6449867 1.54 0.127 -.2879361 2.274821
cir .0006089 .0030228 0.20 0.841 -.0053965 .0066143
gdp -.2868968 .1162933 -2.47 0.016 -.5179337 -.05586
inf -.725627 .2901574 -2.50 0.014 -1.302075 -.1491786
country_enc
BH 0 (omitted)
KW 0 (omitted)
OM 0 (omitted)
QA 0 (omitted)
SA 0 (omitted)
year
2008 3.681293 1.57233 2.34 0.021 .5575856 6.805
2009 -4.802538 1.76613 -2.72 0.008 -8.311264 -1.293812
2010 -2.986004 1.018095 -2.93 0.004 -5.008628 -.9633809
2011 1.681272 1.153676 1.46 0.149 -.6107067 3.97325
2012 5.014927 2.676075 1.87 0.064 -.3015633 10.33142
2013 5.453898 2.911113 1.87 0.064 -.3295357 11.23733
2014 5.992145 3.198673 1.87 0.064 -.3625764 12.34687
2015 9.797602 4.86684 2.01 0.047 .1287741 19.46643
country_enc#year
BH#2008 -1.590107 1.116315 -1.42 0.158 -3.807862 .6276489
BH#2009 6.762221 2.849444 2.37 0.020 1.101302 12.42314
BH#2010 4.718013 2.151318 2.19 0.031 .4440452 8.991981
BH#2011 11.80396 5.579555 2.12 0.037 .7192008 22.88872
BH#2012 14.00803 6.468143 2.17 0.033 1.157937 26.85813
BH#2013 15.62937 7.140021 2.19 0.031 1.444476 29.81427
BH#2014 10.44497 4.645631 2.25 0.027 1.215609 19.67432
BH#2015 0 (omitted)
KW#2008 -.9938968 .6607788 -1.50 0.136 -2.306649 .3188556
KW#2009 2.016586 .8360217 2.41 0.018 .3556827 3.677489
KW#2010 -3.592654 1.777485 -2.02 0.046 -7.123937 -.061371
KW#2011 -5.220124 2.617954 -1.99 0.049 -10.42115 -.019102
KW#2012 -11.93304 5.444058 -2.19 0.031 -22.74861 -1.117467
KW#2013 -12.22739 5.582161 -2.19 0.031 -23.31733 -1.137451
KW#2014 -14.28755 6.450839 -2.21 0.029 -27.10327 -1.471832
KW#2015 -17.3921 7.822955 -2.22 0.029 -32.93377 -1.850436
OM#2008 1.422949 .6512566 2.18 0.031 .1291136 2.716783
OM#2009 .639456 .4175157 1.53 0.129 -.1900119 1.468924
OM#2010 3.273371 1.404909 2.33 0.022 .4822747 6.064467
OM#2011 -.1204118 .5244058 -0.23 0.819 -1.162235 .9214118
OM#2012 -4.497804 2.026926 -2.22 0.029 -8.524646 -.4709624
OM#2013 -5.284617 2.096766 -2.52 0.013 -9.450209 -1.119024
OM#2014 -9.347664 3.947663 -2.37 0.020 -17.19039 -1.504943
OM#2015 -15.73905 6.828109 -2.31 0.023 -29.30428 -2.173821
QA#2008 -4.89923 2.117082 -2.31 0.023 -9.105184 -.6932768
QA#2009 -15.64697 6.491666 -2.41 0.018 -28.54379 -2.75014
QA#2010 -11.4837 4.863722 -2.36 0.020 -21.14633 -1.821064
QA#2011 -12.17413 5.056877 -2.41 0.018 -22.22049 -2.127756
QA#2012 -16.73995 6.711824 -2.49 0.014 -30.07416 -3.405741
QA#2013 -17.85967 7.228353 -2.47 0.015 -32.22006 -3.499289
QA#2014 -19.40116 7.929745 -2.45 0.016 -35.15498 -3.647341
QA#2015 -25.97281 10.74476 -2.42 0.018 -47.31914 -4.626475
SA#2008 .5516365 .3903445 1.41 0.161 -.2238509 1.327124
SA#2009 8.842405 3.810767 2.32 0.023 1.271651 16.41316
SA#2010 10.24259 4.422587 2.32 0.023 1.456345 19.02883
SA#2011 5.881292 2.505673 2.35 0.021 .9033342 10.85925
SA#2012 0 (omitted)
SA#2013 0 (omitted)
SA#2014 0 (omitted)
SA#2015 0 (omitted)
_cons -113.7222 45.00304 -2.53 0.013 -203.1286 -24.31586
sigma_u 10.70542
sigma_e .83403192
rho .99396705 (fraction of variance due to u_i)
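One possible reason, offered tentatively: with vce(robust) (which here clusters on the panel variable), the overall model F is reported as missing when the cluster-robust VCE cannot support a joint test of all model coefficients (for example because of the many omitted, collinear interaction terms relative to the 91 clusters). Individual coefficients are still valid, and a joint test of the key regressors can be requested after estimation, for example:
Code:
test to fo to_fo deposits_ta ilnplgl eta noniigr log_ta cir gdp inf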
Difference in Difference Parallel Trends test
Dear Statalisters,
I am running a Difference in Difference analysis. In order to test whether treatment and control units have common trends, I am doing the following exercise.
During the period before treatment, I run the regression below, where Treatment_i is a dummy taking value 1 if the unit i is in treatment, 0 otherwise; Untreatment_i is a dummy with value 1 if unit i is not treated, 0 otherwise; Time_t is a time variable. Further, I add fixed effects by each time period and unit, and a number of observed factors - which I do also include in the Difference in Difference regression.
Outcome = a + b (Treatment_i * Time_t) + c (Untreatment_i * Time_t) + FixedEffectsUnits_i + FixedEffectsTime_t + Controls_it + e_it
After estimating the equation above, I can test whether the coefficients b and c are statistically different through an F test. If I can reject the null hypothesis that the coefficients are the same, this means treatment and untreatment groups fundamentally have different pre-treatment trends.
To see how the test performs, I have run the test in three model specifications: (1) without fixed effects and controls; (2) with fixed effects; (3) with fixed effects and controls. One would expect that, the more we control for observed variables and unobserved fixed factors, the more we should pass the test (e.g. we account for things that may drive different pre-treatment trends, hence making the requirement of parallel trends more flexible). So far, however, I am having the following results:
- Model (1) fails the test
- Model (2) passes the test
- Model (3) fails the test
Thanks.
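A minimal sketch of the comparison described above for specification (1), with Outcome, Treatment, Time, unit, and pre as placeholder names; the same test statement works after the richer specifications as well:
Code:
regress Outcome c.Time#i.Treatment if pre == 1, vce(cluster unit)
* F test that the treated and untreated pre-treatment slopes are equal
test 0.Treatment#c.Time = 1.Treatment#c.Time
* note: once full time fixed effects are added, one of the two group-specific
* linear trends can be dropped as collinear with the time dummies, which may
* affect how the test behaves across specifications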
Pre post analysis with ANOVA
Good morning,
I was asked to perform a pre-post analysis with ANOVA in order to see the effect of the treatment on the treated. I thought about running a one-way ANOVA with the difference between pre and post values as the dependent variable and the treatment dummy as the independent variable, but the difference between the means of the two groups is negligible and I don't obtain any significant results. I saw on YouTube that it can be done through ANCOVA, but I have only found the procedure for running it in SPSS, not in Stata.
Thanks in advance for your help!
Tommaso
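For what it is worth, the ANCOVA version can be run directly in Stata; a minimal sketch with placeholder variable names (post-treatment score on the treatment dummy, adjusting for the pre-treatment score as a covariate):
Code:
anova post_score treatment c.pre_score
* or equivalently as a regression
regress post_score i.treatment c.pre_score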
Using cmp command for simultaneous equation model with an ordered endogenous variable
Hello everyone,
I’m running a simultaneous equation model with 2 equations as following :
Y1 = f(Y2, X1) ; probit
Y2 = f(X1, X2) ; oprobit
Where,
Y1 is a binary variable
Y2 is an ordered endogenous variable
X1 is a set of exogenous variables that affect Y1 and Y2
X2 is a set of exogenous variables that affect only Y2.
To do this, I'm using the cmp command like this:
cmp (Y1 = Y2# X1) (Y2 = X1 X2), ind($cmp_probit $cmp_oprobit)
At first, I'm not sure if it's the best solution. What is the exact difference between using the latent variable (Y2#) and directly introducing the categories of Y2 (Y2_1 Y2_2, for example)? Which solution is better, and why?
And how can I interpret the coefficient associated with the latent variable Y2 in my first equation?
Thanks.
Monday, July 29, 2019
asclogit or clogit with treatment and control?
Dear all,
I have conducted a labelled choice experiment with labels=engine type (gasoline, EV, HEV), and two attributes: availability of vehicle (in minutes from current location) and price of the vehicle.
I am interested in how a treatment randomly allocated to part of the sample influences their choice of engine.
To find probability of choosing EV or HEV over gasoline (the base category) I ran an asclogit on Stata with engine type as the ASC. However, I am not sure how to include the treatment: should it be interacted with the attribute levels or with the demographics (i.e. the case-specific variables)?
Otherwise, I could run a simple clogit and include the ASCs for EV and HEV and then interact these terms with the treatment, but then is there a way of retrieving probability of choosing an EV or HEV from these coefficients/their WTP?
I hope my question is clear.
Thanks to everyone who will answer!
Dana
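A minimal sketch of one way the treatment could enter an asclogit as a case-specific variable, alongside the alternative-specific attributes (variable names are placeholders except for the design described above):
Code:
asclogit choice availability price, case(choiceid) alternatives(engine) casevars(treatment)
* predicted probability of choosing each engine type, which can then be
* summarized by treatment status
predict p_engine, pr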
Data Envelopment Analysis to generate TFP with macro level panel data
Hi, I'm relatively new to stata and struggling with the dea (data envelopment analysis) command.
My questions are: 1. How do I specify input and output variables before running the dea command?
2. Can the dea command be used to generate TFP for country-level data? What I see is that dea is mostly applied to firm-level micro data.
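On question 1, a minimal sketch assuming the user-written dea command from the Stata Journal (Ji and Lee), where inputs go to the left of the equals sign and outputs to the right; the variable names are placeholders:
Code:
* inputs = labor capital, output = gdp; variable returns to scale, output oriented
dea labor capital = gdp, rts(vrs) ort(out)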
Why am I not getting Hansen statistics?
. xtabond2 fg_ta l.fg_ta size age ia_w debt_ratio_w TobinQ fac roa cashratio i.sector_1, gmm(fg_ta) iv(l.(size age ia_w debt_ratio_w Tobin
> Q fac roa cashratio i.sector_1)) small
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Dynamic panel-data estimation, one-step system GMM
------------------------------------------------------------------------------
Group variable: id_new Number of obs = 5347
Time variable : year Number of groups = 414
Number of instruments = 406 Obs per group: min = 0
F(37, 5309) = 3.05 avg = 12.92
Prob > F = 0.000 max = 19
------------------------------------------------------------------------------
fg_ta | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fg_ta |
L1. | -.0000606 .0000375 -1.61 0.106 -.0001342 .000013
|
size | -.0038107 .0093966 -0.41 0.685 -.022232 .0146105
age | .0018514 .0007455 2.48 0.013 .00039 .0033129
ia_w | 2.33e-08 2.58e-08 0.90 0.366 -2.73e-08 7.39e-08
debt_ratio_w | .6209697 .0966938 6.42 0.000 .4314102 .8105292
TobinQ | .0016039 .014735 0.11 0.913 -.0272827 .0304906
fac | -6.01e-10 8.22e-10 -0.73 0.465 -2.21e-09 1.01e-09
roa | .0089528 .0022868 3.92 0.000 .0044698 .0134359
cashratio | 2.786667 .418544 6.66 0.000 1.966149 3.607186
|
sector_1 |
1 | 0 (empty)
2 | .3958966 .1144011 3.46 0.001 .1716233 .6201698
3 | .3690125 .114708 3.22 0.001 .1441377 .5938873
4 | .4866837 .098741 4.93 0.000 .2931107 .6802567
5 | .3044108 .0910152 3.34 0.001 .1259836 .482838
6 | .357386 .1038887 3.44 0.001 .1537214 .5610505
8 | .2341531 .1230389 1.90 0.057 -.0070537 .4753599
9 | .3214858 .0934618 3.44 0.001 .1382624 .5047093
10 | .3903854 .1173698 3.33 0.001 .1602924 .6204784
11 | .4766692 .1541836 3.09 0.002 .174406 .7789323
12 | .1961415 .1273304 1.54 0.124 -.0534784 .4457614
13 | .7539128 .1084891 6.95 0.000 .5412297 .966596
14 | .0578078 .1439086 0.40 0.688 -.2243121 .3399278
15 | .2145243 .1155814 1.86 0.064 -.0120627 .4411113
16 | .5095201 .1162163 4.38 0.000 .2816883 .7373518
17 | .2013789 .1250231 1.61 0.107 -.0437177 .4464756
18 | .5156745 .1104714 4.67 0.000 .2991051 .7322439
19 | .0607167 .1339227 0.45 0.650 -.2018268 .3232602
20 | .3742212 .0901035 4.15 0.000 .1975813 .5508611
21 | .4252176 .1091614 3.90 0.000 .2112163 .6392188
22 | .2452838 .1166464 2.10 0.036 .0166089 .4739588
23 | .4224144 .0930485 4.54 0.000 .2400011 .6048277
24 | .4857119 .1778337 2.73 0.006 .1370847 .834339
25 | .4390616 .0952022 4.61 0.000 .2524261 .6256971
26 | .3052786 .1157192 2.64 0.008 .0784214 .5321358
27 | .339258 .1612758 2.10 0.035 .0230911 .6554248
28 | .1758941 .1766032 1.00 0.319 -.1703207 .522109
29 | .2169359 .1399078 1.55 0.121 -.0573409 .4912127
|
_cons | -.7754556 .18524 -4.19 0.000 -1.138602 -.412309
------------------------------------------------------------------------------
Instruments for first differences equation
Standard
D.(L.size L.age L.ia_w L.debt_ratio_w L.TobinQ L.fac L.roa L.cashratio
1bL.sector_1 2L.sector_1 3L.sector_1 4L.sector_1 5L.sector_1 6L.sector_1
8L.sector_1 9L.sector_1 10L.sector_1 11L.sector_1 12L.sector_1
13L.sector_1 14L.sector_1 15L.sector_1 16L.sector_1 17L.sector_1
18L.sector_1 19L.sector_1 20L.sector_1 21L.sector_1 22L.sector_1
23L.sector_1 24L.sector_1 25L.sector_1 26L.sector_1 27L.sector_1
28L.sector_1 29L.sector_1)
GMM-type (missing=0, separate instruments for each period unless collapsed)
L(1/29).fg_ta
Instruments for levels equation
Standard
L.size L.age L.ia_w L.debt_ratio_w L.TobinQ L.fac L.roa L.cashratio
1bL.sector_1 2L.sector_1 3L.sector_1 4L.sector_1 5L.sector_1 6L.sector_1
8L.sector_1 9L.sector_1 10L.sector_1 11L.sector_1 12L.sector_1
13L.sector_1 14L.sector_1 15L.sector_1 16L.sector_1 17L.sector_1
18L.sector_1 19L.sector_1 20L.sector_1 21L.sector_1 22L.sector_1
23L.sector_1 24L.sector_1 25L.sector_1 26L.sector_1 27L.sector_1
28L.sector_1 29L.sector_1
_cons
GMM-type (missing=0, separate instruments for each period unless collapsed)
D.fg_ta
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -64.64 Pr > z = 0.000
Arellano-Bond test for AR(2) in first differences: z = -0.97 Pr > z = 0.332
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(368) =4379.38 Prob > chi2 = 0.000
(Not robust, but not weakened by many instruments.)
Difference-in-Sargan tests of exogeneity of instrument subsets:
GMM instruments for levels
Sargan test excluding group: chi2(349) =3135.60 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(19) =1243.78 Prob > chi2 = 0.000
iv(L.size L.age L.ia_w L.debt_ratio_w L.TobinQ L.fac L.roa L.cashratio 1bL.sector_1 2L.sector_1 3L.sector_1 4L.sector_1 5L.sector_1 6L.s
> ector_1 8L.sector_1 9L.sector_1 10L.sector_1 11L.sector_1 12L.sector_1 13L.sector_1 14L.sector_1 15L.sector_1 16L.sector_1 17L.sector_1
> 18L.sector_1 19L.sector_1 20L.sector_1 21L.sector_1 22L.sector_1 23L.sector_1 24L.sector_1 25L.sector_1 26L.sector_1 27L.sector_1 28L.se
> ctor_1 29L.sector_1)
Sargan test excluding group: chi2(333) =3085.86 Prob > chi2 = 0.000
Difference (null H = exogenous): chi2(35) =1293.52 Prob > chi2 = 0.000
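A tentative observation: the Hansen J is the overidentification statistic based on the robust weight matrix, so it is typically only reported after robust or two-step estimation; also, with 406 instruments for 414 groups it would in any case carry little information. A sketch that adds those options and collapses the instrument matrix:
Code:
xtabond2 fg_ta l.fg_ta size age ia_w debt_ratio_w TobinQ fac roa cashratio i.sector_1, ///
    gmm(fg_ta, collapse) iv(l.(size age ia_w debt_ratio_w TobinQ fac roa cashratio i.sector_1)) ///
    twostep robust small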
Two variables combined fixed effect with clustered standard errors of another variable
Hi Stata community,
I have an unbalanced panel data set for the years 1992-2018, covering firms from various industries classified by 4-digit SIC codes. I tried running firm, year and industry fixed effects models separately using the -areg- command, which gave me decent results. Now my supervisor wants me to run a fixed effects model with the two key variables combined (year and industry), with standard errors clustered by firm. Could anyone give me an idea of what code would be appropriate?
I ran the following commands for the single-variable (firm, year and industry) fixed effects models individually:
xtset company_id fiscalyear
areg DV IV1 IV2 IV3, absorb(fiscalyear)
areg DV IV1 IV2 IV3, absorb(company_id)
areg DV IV1 IV2 IV3, absorb(sic)
Looking forward to your suggestions! Cheers!
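A minimal sketch of one possible approach, assuming the variable names from the original commands (DV, the IVs, fiscalyear, sic and company_id): build a combined year-by-industry group identifier, absorb it, and cluster the standard errors on the firm identifier.
Code:
* Sketch only: year_industry is a new, illustrative variable name.
egen year_industry = group(fiscalyear sic)
areg DV IV1 IV2 IV3, absorb(year_industry) vce(cluster company_id)
* The community-contributed -reghdfe- fits the same model more directly:
* reghdfe DV IV1 IV2 IV3, absorb(fiscalyear#sic) vce(cluster company_id)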
How to test for multiple instruments in etregress or eregress
Dear all:
I am using Stata 15 to run a linear regression with an endogenous treatment, aiming to do an instrumental-variable analysis with a dummy endogenous variable.
I have two potential instruments and would like to test whether I should include one of them or both.
My basic model is this:
etregress y1 c.X1 i.X2 , treat(y2 = c.z1 c.z2 c.X1 i.X2)
where y1 is the continuous outcome, X1 and X2 are vectors of independent variables, y2 is the dummy endogenous variable, and z1 and z2 are the potential instruments.
I can also estimate this as
eregress y1 c.X1 i.X2, endogenous(y2 = c.z1 c.z2 c.X1 i.X2, probit), which gives the same results.
Any help on how to test which instrument(s) to use would be very much appreciated.
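One possible way to probe this, sketched below under the assumption that the model and variable names are exactly as in the post: etregress/eregress have no built-in overidentification test, so this only checks whether each instrument contributes to the treatment equation, via a Wald test of joint significance and a likelihood-ratio comparison of the nested models. The -lrtest- step assumes the default maximum-likelihood estimates without vce(robust).
Code:
* Sketch only; the equation name [y2] follows the treatment variable, as shown
* in the etregress output header.
etregress y1 c.X1 i.X2, treat(y2 = c.z1 c.z2 c.X1 i.X2)
estimates store both
test [y2]z1 [y2]z2              // joint relevance of the two instruments
etregress y1 c.X1 i.X2, treat(y2 = c.z1 c.X1 i.X2)
estimates store z1only
lrtest both z1only              // does adding z2 improve the treatment equation?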
snapshot with descriptive names instead of numbers
I just upgraded to Stata 16 today and I was experimenting with the new frames feature. Frames let you make changes to your data and then restore the original dataset, which is useful for, e.g., robustness tests in which you want to drop observations or switch an indicator, run a regression, and then restore the original data.
Working with frames helped me come up with some code I personally found useful and which I didn't find online anywhere else, so I thought I'd share it here in the hope that it shows up in a Google search for anyone else it might help.
I was attracted by the fact that you can refer to a frame by a descriptive name, so that you aren't likely to accidentally load the wrong data.
But I quickly realized a shortcoming in frames: any changes to the working data actually change the data in the frame. That is, the frame is mutable. So you have to make sure to constantly copy one frame to a new one and make changes to the new frame. It's easy to accidentally make changes to the current frame and thus affect all subsequent regressions in unintentional ways.
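A minimal sketch of that copy-first workflow, for anyone who still prefers frames; the frame name robustness and the drop condition are purely illustrative.
Code:
* Illustrative only: work on a copy of the default frame so the original data
* stay untouched.
frame copy default robustness, replace
frame change robustness
drop if flag == 1            // hypothetical modification for a robustness test
* ... run the regression on the modified copy ...
frame change default         // the data in the default frame are unchanged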
So that brought me back to snapshot, which is immutable. Once you make a snapshot, that snapshot is read-only. If you change any data, you are changing only the working data, not the snapshot. So if you restore the snapshot, all your changes are undone.
The one shortcoming of snapshot is that you have to refer to snapshots by number, not by name. This means it is easy to accidentally load the wrong snapshot. This can happen if you modify your code to create new snapshots and you forget to update all the numbers in your code.
But that helped me realize a way to refer to snapshots by name, not by number. This way, you get the immutability of snapshots but the descriptive names of frames. I thought I'd share the code, in case it helps anyone else.
The most important part is using "quietly snapshot list" followed by "global descriptive_name = r(snapshot)". r(snapshot) is the number of snapshots, so if you've just created a new snapshot, its number is equal to r(snapshot). Subsequently, you can restore the snapshot by executing "snapshot restore $descriptive_name" instead of "snapshot restore #".
Code:
* First snapshot
clear
cd "insert your path here"
use "insert your data file here"
* do various data processing, as required
snapshot save, label("First snapshot)
quietly snapshot list
global first_snapshot = r(snapshot)
* Second snapshot
clear
cd "insert your path here"
use "insert your data file here"
* do various data processing, as required
snapshot save, label("Second snapshot)
quietly snapshot list
global second_snapshot = r(snapshot)
* Visual inspection to verify
snapshot list
macro list
. . .
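* Restore snapshots by name rather than by number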
clear
snapshot restore $first_snapshot
clear
snapshot restore $second_snapshot