Saturday, February 29, 2020

Merging two panel datasets

Hello everyone,
I have searched the forum but did not manage to find help for my problem. I'm new to Stata and still trying to learn, so I apologize if my questions are inappropriate.

I'm trying to merge two panel datasets like the ones in the examples below, but I get the following error message: "variables Country year do not uniquely identify observations in the master data".

I have tried to use the datasets the other way around and I get this error: "variables Country year do not uniquely identify observations in the using data".

The code I type is: merge 1:1 Country year using "\\Client\C$\Users\1\Desktop\xxxx.dta", nogenerate force

I would very much appreciate any answers, and again I am very sorry if my questions are out of place or inappropriate.



The first dataset is:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte ID str1 Country int year byte(var1 var2 var3)
1 "A" 2013  6  6  7
1 "A" 2013  8  1  3
2 "A" 2013  4  8  9
2 "A" 2013 10 14 15
1 "A" 2014  1  2  3
1 "A" 2014  1  5  4
2 "A" 2014  6  8  5
2 "A" 2014  5  7  5
1 "B" 2013  7  3  7
1 "B" 2013  8  5  8
end
The second one looks like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 Country int year byte(var1 var2 var3)
"A" 2013 1 4 6
"A" 2014 2 4 6
"A" 2015 2 4 6
"B" 2013 3 4 6
"B" 2014 1 2 3
end
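
Since the first dataset has several observations per Country and year (one or more per ID), a 1:1 merge on Country and year cannot succeed from either direction; the second dataset, however, is unique on Country-year, so a many-to-one merge with the first dataset in memory should run. A minimal sketch, keeping the placeholder path from the post (force is usually best avoided, since it papers over real problems rather than fixing them):

Code:
* first dataset (the ID-Country-year panel) in memory as the master
merge m:1 Country year using "\\Client\C$\Users\1\Desktop\xxxx.dta", nogenerate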

Help needed to fix GMM iteration problem

I am trying to write a wrapper using GMM. When I do it manually, it converges very fast and gives me coefficients and standard errors. However, when I write it with a loop, it does not converge. Could anyone help me, please? The following are my manual and looped programs:

Manual one:


HTML Code:
clear all
program define svywt
syntax varlist, wtvar(varlist)
    
    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'
    local p: word count `wtvar'
    local first1(`:word 1 of `wtvar''/(1+{xb: `wtvar'}))
    
    local first2(`:word 2 of `wtvar''/(1+{xb: }))
    
    local last(last:(`depvar' - ({`depvar': `indepvars' _cons}))/    (1+{xb: }))
    local last_inst instruments(last:`indepvars')
    
    gmm `first1'`-'`first`p'' `last', `last_inst' onestep winitial(unadjusted, independent)
        
    end

sysuse auto, clear
egen y = mean(price)

gen price_1 = price -y

egen x = mean(mpg)

gen mpg_1=mpg-x

svywt price weight mpg, wtvar(price_1 mpg_1)

With a loop:


HTML Code:
clear all
program define svywt
syntax varlist, WTvar(varlist)
    
    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'
    
    local p: word count `wtvar' 
     
    forvalues i = 2(1)`p'{
     local first_1(first_1:`:word `i' of `wtvar''/(1+{xb: `wtvar'} )) //this one is done manually to declare linear combination `xb'
     local first_`i'(first_`i':`:word `i' of `wtvar''/(1+ {xb:}) )
    }
    local third(third:(`depvar' - ({`depvar': `indepvars' _cons}))/    (1+{xb:} ))
    local third_inst instruments(third: `indepvars')
    
    gmm `first_1'`-'`first_`p'' `third', `third_inst' onestep winitial(unadjusted, independent)
        
    end

sysuse auto, clear
egen y = mean(price)

gen price_1 = price -y

egen x = mean(mpg)

gen mpg_1=mpg-x

svywt price weight mpg, wtvar(price_1 mpg_1)
I need a loop so that, if I have 100 equations, I can accommodate them easily. Thank you so much.

Rabiul
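
For what it is worth, a minimal sketch of one way to build the full set of equations in a loop is to accumulate every equation into a single local macro and pass that macro to gmm, instead of relying on `first_1'`-'`first_`p'', which only expands the first and last pieces. The program name and structure below mirror the manual version above and are untested; whether the model then converges is a separate question:

Code:
clear all
program define svywt2
    syntax varlist, WTvar(varlist)

    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'
    local p : word count `wtvar'

    * equation 1 declares the linear combination {xb:}; the others reuse it
    local eqs (first_1:`:word 1 of `wtvar''/(1+{xb: `wtvar'}))
    forvalues i = 2/`p' {
        local eqs `eqs' (first_`i':`:word `i' of `wtvar''/(1+{xb:}))
    }
    local eqs `eqs' (last:(`depvar' - ({`depvar': `indepvars' _cons}))/(1+{xb:}))

    gmm `eqs', instruments(last:`indepvars') onestep winitial(unadjusted, independent)
end

* same call as before:
* svywt2 price weight mpg, wtvar(price_1 mpg_1)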

Margins for Interaction Term

I have run the following regression and am trying to analyse whether a country's initial level of income changes the effect of inequality on growth (i.e., the relationship could be non-linear across developed and developing countries). I have included an interaction term to do this; since the inequality coefficient is negative and the interaction term is positive, inequality negatively affects the growth of a country with a lower initial income.

However, I am trying to find out whether this effect weakens as income per capita increases, or even turns positive for higher-income countries, and hence believe the
Code:
margins
command is appropriate. But I am unsure of the exact code required. I have the following:

Code:
. xtdpdgmm growth_rate c.l.gini_disp##c.l.ln_Income l.EFW l.ln_pl_i l.fyr_sch_sec l.myr_sch_sec,
>  model(fod) gmm(l.(c.gini_disp##c.ln_Income EFW ln_pl_i fyr_sch_sec myr_sch_sec), lag(0 0)) gm
> m(l.(c.gini_disp##c.ln_Income EFW ln_pl_i fyr_sch_sec myr_sch_sec), lag(0 0) diff model(level)
> ) two w(ind) teffects
note: standard errors can be severely biased in finite samples

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  .00142514
Step 2         f(b) =  .88647575

Group variable: ncountry                     Number of obs         =       708
Time variable: period                        Number of groups      =       112

Moment conditions:     linear =     156      Obs per group:    min =         1
                    nonlinear =       0                        avg =  6.321429
                        total =     156                        max =        11

-------------------------------------------------------------------------------------------
              growth_rate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
                gini_disp |
                      L1. |  -.0092847   .0008463   -10.97   0.000    -.0109434    -.007626
                          |
                ln_Income |
                      L1. |  -.0522122   .0039198   -13.32   0.000    -.0598949   -.0445296
                          |
cL.gini_disp#cL.ln_Income |   .0009942    .000096    10.36   0.000     .0008062    .0011823
                          |
                      EFW |
                      L1. |   .0078056   .0006158    12.68   0.000     .0065987    .0090124
                          |
                  ln_pl_i |
                      L1. |  -.0054779   .0009282    -5.90   0.000    -.0072971   -.0036586
                          |
              fyr_sch_sec |
                      L1. |   .0001479   .0015346     0.10   0.923    -.0028598    .0031556
                          |
              myr_sch_sec |
                      L1. |   .0119233   .0016257     7.33   0.000      .008737    .0151097
                          |
                   period |
                    1970  |   .0047695   .0008684     5.49   0.000     .0030676    .0064715
                    1975  |  -.0028515   .0012079    -2.36   0.018    -.0052189    -.000484
                    1980  |  -.0152241   .0011472   -13.27   0.000    -.0174725   -.0129757
                    1985  |  -.0064175   .0021772    -2.95   0.003    -.0106847   -.0021503
                    1990  |  -.0024537   .0019021    -1.29   0.197    -.0061817    .0012744
                    1995  |  -.0099781   .0023262    -4.29   0.000    -.0145373   -.0054189
                    2000  |   -.017659   .0021014    -8.40   0.000    -.0217777   -.0135404
                    2005  |  -.0088494   .0021685    -4.08   0.000    -.0130995   -.0045992
                    2010  |   -.013366   .0027469    -4.87   0.000    -.0187499   -.0079821
                    2015  |  -.0326874   .0030746   -10.63   0.000    -.0387135   -.0266612
                          |
                    _cons |   .4437157   .0332909    13.33   0.000     .3784667    .5089647
-------------------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(fodev):
   1965:L.gini_disp 1970:L.gini_disp 1975:L.gini_disp 1980:L.gini_disp
   1985:L.gini_disp 1990:L.gini_disp 1995:L.gini_disp 2000:L.gini_disp
   2005:L.gini_disp 2010:L.gini_disp 2020:L.gini_disp 2025:L.gini_disp
   1965:L.ln_Income 1970:L.ln_Income 1975:L.ln_Income 1980:L.ln_Income
   1985:L.ln_Income 1990:L.ln_Income 1995:L.ln_Income 2000:L.ln_Income
   2010:L.ln_Income 2015:L.ln_Income 2020:L.ln_Income 2025:L.ln_Income
   1965:cL.gini_disp#cL.ln_Income 1970:cL.gini_disp#cL.ln_Income
   1975:cL.gini_disp#cL.ln_Income 1980:cL.gini_disp#cL.ln_Income
   1985:cL.gini_disp#cL.ln_Income 1990:cL.gini_disp#cL.ln_Income
   2000:cL.gini_disp#cL.ln_Income 2005:cL.gini_disp#cL.ln_Income
   2010:cL.gini_disp#cL.ln_Income 2015:cL.gini_disp#cL.ln_Income
   2020:cL.gini_disp#cL.ln_Income 2025:cL.gini_disp#cL.ln_Income 1965:L.EFW
   1970:L.EFW 1975:L.EFW 1980:L.EFW 1990:L.EFW 1995:L.EFW 2000:L.EFW
   2005:L.EFW 2010:L.EFW 2015:L.EFW 2020:L.EFW 2025:L.EFW 1965:L.ln_pl_i
   1970:L.ln_pl_i 1980:L.ln_pl_i 1985:L.ln_pl_i 1990:L.ln_pl_i 1995:L.ln_pl_i
   2000:L.ln_pl_i 2005:L.ln_pl_i 2010:L.ln_pl_i 2015:L.ln_pl_i 2020:L.ln_pl_i
   2025:L.ln_pl_i 1970:L.fyr_sch_sec 1975:L.fyr_sch_sec 1980:L.fyr_sch_sec
   1985:L.fyr_sch_sec 1990:L.fyr_sch_sec 1995:L.fyr_sch_sec 2000:L.fyr_sch_sec
   2005:L.fyr_sch_sec 2010:L.fyr_sch_sec 2015:L.fyr_sch_sec
 2, model(level):
   1970:D.L.gini_disp 1975:D.L.gini_disp 1980:D.L.gini_disp 1985:D.L.gini_disp
   1990:D.L.gini_disp 1995:D.L.gini_disp 2000:D.L.gini_disp 2005:D.L.gini_disp
   2010:D.L.gini_disp 2015:D.L.gini_disp 2020:D.L.gini_disp 2025:D.L.gini_disp
   1965:D.L.ln_Income 1970:D.L.ln_Income 1975:D.L.ln_Income 1980:D.L.ln_Income
   1985:D.L.ln_Income 1990:D.L.ln_Income 1995:D.L.ln_Income 2000:D.L.ln_Income
   2005:D.L.ln_Income 2015:D.L.ln_Income 2020:D.L.ln_Income 2025:D.L.ln_Income
   1965:D.cL.gini_disp#cL.ln_Income 1970:D.cL.gini_disp#cL.ln_Income
   1975:D.cL.gini_disp#cL.ln_Income 1980:D.cL.gini_disp#cL.ln_Income
   1985:D.cL.gini_disp#cL.ln_Income 1990:D.cL.gini_disp#cL.ln_Income
   1995:D.cL.gini_disp#cL.ln_Income 2000:D.cL.gini_disp#cL.ln_Income
   2005:D.cL.gini_disp#cL.ln_Income 2010:D.cL.gini_disp#cL.ln_Income
   2015:D.cL.gini_disp#cL.ln_Income 2020:D.cL.gini_disp#cL.ln_Income
   2025:D.cL.gini_disp#cL.ln_Income 1965:D.L.EFW 1970:D.L.EFW 1975:D.L.EFW
   1980:D.L.EFW 1985:D.L.EFW 1990:D.L.EFW 1995:D.L.EFW 2000:D.L.EFW
   2005:D.L.EFW 2010:D.L.EFW 2015:D.L.EFW 2020:D.L.EFW 2025:D.L.EFW
   1965:D.L.ln_pl_i 1970:D.L.ln_pl_i 1975:D.L.ln_pl_i 1980:D.L.ln_pl_i
   1985:D.L.ln_pl_i 1990:D.L.ln_pl_i 1995:D.L.ln_pl_i 2000:D.L.ln_pl_i
   2005:D.L.ln_pl_i 2010:D.L.ln_pl_i 2015:D.L.ln_pl_i 2020:D.L.ln_pl_i
   2025:D.L.ln_pl_i 1965:D.L.fyr_sch_sec 1970:D.L.fyr_sch_sec
   1975:D.L.fyr_sch_sec 1980:D.L.fyr_sch_sec 1985:D.L.fyr_sch_sec
   1990:D.L.fyr_sch_sec 1995:D.L.fyr_sch_sec 2000:D.L.fyr_sch_sec
   2005:D.L.fyr_sch_sec 2010:D.L.fyr_sch_sec 2015:D.L.fyr_sch_sec
   2020:D.L.fyr_sch_sec
 3, model(level):
   1970bn.period 1975.period 1980.period 1985.period 1990.period 1995.period
   2000.period 2005.period 2010.period 2015.period
 4, model(level):
   _cons
Code:
. margins, dydx(l.gini_disp) at(l.ln_Income = (0(0.1)10))

Average marginal effects                        Number of obs     =        708
Model VCE    : Conventional

Expression   : Linear prediction, predict()
dy/dx w.r.t. : L.gini_disp

1._at        : L.ln_Income     =           0

2._at        : L.ln_Income     =          .1

  (... _at values 3 to 100 continue in steps of .1 ...)

101._at      : L.ln_Income     =          10

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
L.gini_disp  |
         _at |
          1  |  -.0092847   .0008463   -10.97   0.000    -.0109434    -.007626
          2  |  -.0091853   .0008367   -10.98   0.000    -.0108253   -.0075453
          3  |  -.0090859   .0008272   -10.98   0.000    -.0107072   -.0074645
          4  |  -.0089864   .0008177   -10.99   0.000    -.0105891   -.0073837
          5  |   -.008887   .0008082   -11.00   0.000     -.010471    -.007303
          6  |  -.0087876   .0007987   -11.00   0.000     -.010353   -.0072222
          7  |  -.0086882   .0007892   -11.01   0.000    -.0102349   -.0071414
          8  |  -.0085887   .0007797   -11.02   0.000    -.0101169   -.0070606
          9  |  -.0084893   .0007702   -11.02   0.000    -.0099988   -.0069798
         10  |  -.0083899   .0007607   -11.03   0.000    -.0098807    -.006899
         11  |  -.0082905   .0007511   -11.04   0.000    -.0097627   -.0068182
         12  |   -.008191   .0007416   -11.04   0.000    -.0096447   -.0067374
         13  |  -.0080916   .0007322   -11.05   0.000    -.0095266   -.0066566
         14  |  -.0079922   .0007227   -11.06   0.000    -.0094086   -.0065758
         15  |  -.0078928   .0007132   -11.07   0.000    -.0092906    -.006495
         16  |  -.0077934   .0007037   -11.08   0.000    -.0091725   -.0064142
         17  |  -.0076939   .0006942   -11.08   0.000    -.0090545   -.0063333
         18  |  -.0075945   .0006847   -11.09   0.000    -.0089365   -.0062525
         19  |  -.0074951   .0006752   -11.10   0.000    -.0088185   -.0061717
         20  |  -.0073957   .0006657   -11.11   0.000    -.0087005   -.0060908
         21  |  -.0072962   .0006563   -11.12   0.000    -.0085825     -.00601
         22  |  -.0071968   .0006468   -11.13   0.000    -.0084645   -.0059291
         23  |  -.0070974   .0006373   -11.14   0.000    -.0083465   -.0058482
         24  |   -.006998   .0006279   -11.15   0.000    -.0082286   -.0057674
         25  |  -.0068985   .0006184   -11.16   0.000    -.0081106   -.0056865
         26  |  -.0067991    .000609   -11.17   0.000    -.0079927   -.0056056
         27  |  -.0066997   .0005995   -11.18   0.000    -.0078747   -.0055247
         28  |  -.0066003   .0005901   -11.19   0.000    -.0077568   -.0054438
         29  |  -.0065009   .0005806   -11.20   0.000    -.0076389   -.0053628
         30  |  -.0064014   .0005712   -11.21   0.000    -.0075209   -.0052819
         31  |   -.006302   .0005618   -11.22   0.000     -.007403    -.005201
         32  |  -.0062026   .0005523   -11.23   0.000    -.0072851     -.00512
         33  |  -.0061032   .0005429   -11.24   0.000    -.0071673   -.0050391
         34  |  -.0060037   .0005335   -11.25   0.000    -.0070494   -.0049581
         35  |  -.0059043   .0005241   -11.27   0.000    -.0069315   -.0048771
         36  |  -.0058049   .0005147   -11.28   0.000    -.0068137   -.0047961
         37  |  -.0057055   .0005053   -11.29   0.000    -.0066959   -.0047151
         38  |   -.005606   .0004959   -11.30   0.000    -.0065781    -.004634
         39  |  -.0055066   .0004866   -11.32   0.000    -.0064603    -.004553
         40  |  -.0054072   .0004772   -11.33   0.000    -.0063425   -.0044719
         41  |  -.0053078   .0004678   -11.35   0.000    -.0062247   -.0043908
         42  |  -.0052084   .0004585   -11.36   0.000     -.006107   -.0043097
         43  |  -.0051089   .0004492   -11.37   0.000    -.0059893   -.0042286
         44  |  -.0050095   .0004398   -11.39   0.000    -.0058716   -.0041474
         45  |  -.0049101   .0004305   -11.40   0.000    -.0057539   -.0040663
         46  |  -.0048107   .0004212   -11.42   0.000    -.0056363   -.0039851
         47  |  -.0047112   .0004119   -11.44   0.000    -.0055186   -.0039038
         48  |  -.0046118   .0004027   -11.45   0.000     -.005401   -.0038226
         49  |  -.0045124   .0003934   -11.47   0.000    -.0052835   -.0037413
         50  |   -.004413   .0003842   -11.49   0.000    -.0051659     -.00366
         51  |  -.0043135    .000375   -11.50   0.000    -.0050485   -.0035786
         52  |  -.0042141   .0003658   -11.52   0.000     -.004931   -.0034972
         53  |  -.0041147   .0003566   -11.54   0.000    -.0048136   -.0034158
         54  |  -.0040153   .0003474   -11.56   0.000    -.0046962   -.0033343
         55  |  -.0039158   .0003383   -11.58   0.000    -.0045789   -.0032528
         56  |  -.0038164   .0003292   -11.59   0.000    -.0044616   -.0031712
         57  |   -.003717   .0003201   -11.61   0.000    -.0043444   -.0030896
         58  |  -.0036176   .0003111   -11.63   0.000    -.0042272   -.0030079
         59  |  -.0035182    .000302   -11.65   0.000    -.0041101   -.0029262
         60  |  -.0034187   .0002931   -11.67   0.000    -.0039931   -.0028443
         61  |  -.0033193   .0002841   -11.68   0.000    -.0038762   -.0027624
         62  |  -.0032199   .0002752   -11.70   0.000    -.0037593   -.0026804
         63  |  -.0031205   .0002664   -11.71   0.000    -.0036426   -.0025983
         64  |   -.003021   .0002576   -11.73   0.000    -.0035259   -.0025161
         65  |  -.0029216   .0002489   -11.74   0.000    -.0034094   -.0024338
         66  |  -.0028222   .0002402   -11.75   0.000     -.003293   -.0023514
         67  |  -.0027228   .0002316   -11.76   0.000    -.0031767   -.0022688
         68  |  -.0026233   .0002231   -11.76   0.000    -.0030606    -.002186
         69  |  -.0025239   .0002147   -11.76   0.000    -.0029447   -.0021031
         70  |  -.0024245   .0002064   -11.75   0.000     -.002829     -.00202
         71  |  -.0023251   .0001982   -11.73   0.000    -.0027135   -.0019366
         72  |  -.0022257   .0001901   -11.71   0.000    -.0025983    -.001853
         73  |  -.0021262   .0001822   -11.67   0.000    -.0024834   -.0017691
         74  |  -.0020268   .0001745   -11.62   0.000    -.0023687   -.0016849
         75  |  -.0019274   .0001669   -11.55   0.000    -.0022545   -.0016003
         76  |   -.001828   .0001596   -11.46   0.000    -.0021407   -.0015152
         77  |  -.0017285   .0001525   -11.34   0.000    -.0020274   -.0014297
         78  |  -.0016291   .0001457   -11.18   0.000    -.0019146   -.0013436
         79  |  -.0015297   .0001392   -10.99   0.000    -.0018025   -.0012568
         80  |  -.0014303   .0001331   -10.74   0.000    -.0016912   -.0011693
         81  |  -.0013308   .0001275   -10.44   0.000    -.0015807    -.001081
         82  |  -.0012314   .0001223   -10.07   0.000    -.0014711   -.0009917
         83  |   -.001132   .0001177    -9.62   0.000    -.0013627   -.0009013
         84  |  -.0010326   .0001137    -9.08   0.000    -.0012554   -.0008097
         85  |  -.0009332   .0001104    -8.45   0.000    -.0011496   -.0007167
         86  |  -.0008337   .0001079    -7.73   0.000    -.0010452   -.0006223
         87  |  -.0007343   .0001061    -6.92   0.000    -.0009424   -.0005263
         88  |  -.0006349   .0001053    -6.03   0.000    -.0008412   -.0004286
         89  |  -.0005355   .0001053    -5.09   0.000    -.0007418   -.0003291
         90  |   -.000436   .0001061    -4.11   0.000     -.000644    -.000228
         91  |  -.0003366   .0001078    -3.12   0.002     -.000548   -.0001252
         92  |  -.0002372   .0001104    -2.15   0.032    -.0004535   -.0000209
         93  |  -.0001378   .0001137    -1.21   0.225    -.0003605     .000085
         94  |  -.0000383   .0001176    -0.33   0.744    -.0002689    .0001922
         95  |   .0000611   .0001222     0.50   0.617    -.0001785    .0003007
         96  |   .0001605   .0001274     1.26   0.208    -.0000892    .0004102
         97  |   .0002599    .000133     1.95   0.051    -8.34e-07    .0005207
         98  |   .0003593   .0001391     2.58   0.010     .0000867     .000632
         99  |   .0004588   .0001456     3.15   0.002     .0001734    .0007441
        100  |   .0005582   .0001524     3.66   0.000     .0002595    .0008569
        101  |   .0006576   .0001595     4.12   0.000     .0003451    .0009702
------------------------------------------------------------------------------

. marginsplot

  Variables that uniquely identify margins: L.ln_Income
(marginsplot graph not shown)

But, I am not sure if this is the appropriate code for the information I am trying to get. Any help understanding the appropriate code would be greatly appreciated.
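
That said, the same information can be read straight off the coefficients: the marginal effect of L.gini_disp is _b[L.gini_disp] + _b[cL.gini_disp#cL.ln_Income] * L.ln_Income, so it changes sign at L.ln_Income = -_b[L.gini_disp]/_b[cL.gini_disp#cL.ln_Income]. A minimal sketch of obtaining that turning point with a delta-method confidence interval, assuming it is run directly after the xtdpdgmm estimation above:

Code:
* the value of L.ln_Income at which the marginal effect of L.gini_disp crosses zero
nlcom turning_point: -_b[L.gini_disp] / _b[cL.gini_disp#cL.ln_Income]

With the estimates shown, this lands a little above 9.3, consistent with where the dy/dx column in the margins output switches from negative to positive (between 9.3 and 9.4).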

What kind of binary regression model should I choose if I work with pooled or panel data?

Hello!

I have a longitudinal dataset containing observations on individuals between 2008 and 2017. I want to examine only a one-year period, but it contains too few observations, which is why I have decided to pool all the years. However, I am now not sure what kind of regression to choose in order to examine the relationship between the dependent and independent variables. The types of variables are listed below:
Response variable: dummy
Variable of interest: dummy
Control variables: various (dummy, float, etc.)
1) Could you please recommend what kind of model to choose?
2) May I just use a probit regression that ignores the years in the pooled sample (similar to the one I would use with cross-sectional data)? See the sketch below.
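
For reference only, the two candidate specifications might look like the following sketch; y, x, the controls and id are placeholder names, not from the post. The pooled probit includes year dummies and clusters the standard errors on the individual; xtprobit is the random-effects panel alternative:

Code:
* pooled probit over all years, year dummies, SEs clustered on the person
probit y x control1 control2 i.year, vce(cluster id)

* random-effects panel probit
xtset id year
xtprobit y x control1 control2 i.year, re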

Difference between quietly gen, quietly test and quietly regress?

Hi guys, I'm trying to replicate a paper but could not understand the meaning of the following commands. I tried googling it but still, the meaning is not clear to me.

Code:
foreach x in N_baseline {
    * split the baseline variable by treatment status (missing for the other group)
    qui gen `x'_T1 = `x' if vote==1
    qui gen `x'_T2 = `x' if vote==0

    * group mean and SD, first as constant variables, then copied into locals
    foreach k in T1 T2 {
        egen m_`k' = mean(`x'_`k')
        local m_`k' = m_`k'
        egen sd_`k' = sd(`x'_`k')
        local sd_`k' = sd_`k'
    }

    * regress the baseline variable on treatment with robust SEs (output suppressed),
    * test that the treatment coefficient is zero, and store the p-value from r(p)
    quietly reg `x' vote, r
    quietly test vote = 0
    gen p_1 = r(p)
    local p_1 = p_1
    * (the closing } of the outer foreach presumably follows later in the original do-file)

Combining newly constructed dummy variables into one categorical variable for analysis/refcat

Dear Statalist,

I have created four variables describing family status: 1) parents living together, 2) shared custody, 3) lone parent, 4) living with a step-parent/new family. They act as four dummies, where 0 = different status and 1 = actual status. My challenge is to combine them into one variable.

Talking to my advisor, she told me it would be wiser to group them together and use one of them as a reference category. I assume the best way to do this is to enter the final, combined categorical variable as a factor variable and write b1.familystatus in the regression. After much googling and Statalisting, I have not managed to find exactly what I am looking for.

So far I have tried the egen group() function (egen newvar = group(var1 var2 var3 var4)), but this gives me more categories than expected (a combination of the combinations, I guess).

Has anyone come across this dilemma before?


Thanks a lot in advance.


Best regards,
Jonas.



If it was unclear: I have four pie charts that each just show 0 or 1 (a 50/50 split), while I want a single, more nuanced pie chart with all the family-status categories (a four-way split).
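
Assuming the four dummies are mutually exclusive and exhaustive (exactly one of them equals 1 for every observation), a minimal sketch of collapsing them into a single factor variable and setting the reference category is below; the variable names are placeholders, not from the post:

Code:
* sanity check on mutual exclusivity
assert together + shared + lone + stepfamily == 1

* one categorical variable from the four dummies
gen byte familystatus = 1*together + 2*shared + 3*lone + 4*stepfamily
label define famstat 1 "Parents together" 2 "Shared custody" 3 "Lone parent" 4 "Step family"
label values familystatus famstat

* "parents together" as the reference category
regress outcome ib1.familystatus control1 control2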

Computing IPW estimator by hand

Hello everyone,

For my project, I am trying to find out whether a certain type of policy ("pol", a dummy) has an impact on credit growth ("cred"). The problem is that there is an endogeneity bias, because variations in "cred" affect the implementation of "pol". To alleviate this endogeneity bias, I estimate the IPW score, because it gives less weight to observations that are easy to predict and more weight to observations that are unanticipated.

I want to calculate the IPW by hand, following the methodology in Richter et al (2019).

As a first step, I run xtlogit in order to obtain the predicted probabilities for the IPW score.

Code:
xtlogit pol l(1/4).pol l(1/4).cred l(1/4).control1 l(1/4).control2, fe

predict pred
Following Richter et al (2019), I use this formula for the IPW score:

Code:
gen IPW = (pol / pred) + (1- pol)/(1 - pred)
In a second step, I want to estimate the response function of the credit growth using local projections à la Jordà (2005).

Code:
lp cred l(1/4).pol  l(1/4).cred  l(1/4).control1  l(1/4).control2
The problem is that the command
Code:
lp
doesn't work with the option
Code:
 [pweight= IPW]
I am struggling to understand how I can weight my observations. Is it possible to do it by hand?
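
On the purely mechanical point, one workaround is to run each local-projection horizon as an ordinary weighted regression, since regress accepts pweights even where the lp wrapper does not. A sketch under my own naming assumptions (the data are assumed to be xtset with panel variable id; horizon 0 is just cred itself):

Code:
* local projections by hand with the IPW weights, horizons 1 to 4
forvalues h = 1/4 {
    quietly gen y_h`h' = F`h'.cred
    regress y_h`h' l(1/4).pol l(1/4).cred l(1/4).control1 l(1/4).control2 ///
        [pweight = IPW], vce(cluster id)
    drop y_h`h'
}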

Also, I do not clearly understand how the IPW estimator (and other weighting methodologies) work:

- Does it only weight the treatment, with something like IPW*pol?
- Or does it weight all the variables: IPW*pol, IPW*cred, IPW*control1, etc.?
- Or neither of these?

Thanks for your answers.


Reference:
- Richter, B., Schularick, M., & Shim, I. (2019). The costs of macroprudential policy. Journal of International Economics, 118, 263-282.
- Jordà, Ò. (2005). Estimation and inference of impulse responses by local projections. American economic review, 95(1), 161-182.


New to SSC mbitobit - Bivariate Tobit model

Dear all,
Thanks to Professor Baum, a new command, -mbitobit-, is available from SSC.
It re-addresses an old problem: the estimation of bivariate tobit models when the dependent variables are censored at zero.
While there are other commands that do the same (mvtobit, bitobit and cmp), I wrote this one to make it easier to obtain the most common predictions (and marginal effects) from the model.
I also added a -sim- option to simulate data based on an already estimated model.
Comments and questions are welcome.

Best Regards,

Fernando

hausman test problem

Dear all,


I am working with panel data and when I use the command: hausman fe re, force, I get the following:

chi2(55) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= -0.10 chi2<0 ==> model fitted on these
data fails to meet the asymptotic
assumptions of the Hausman test;

Does anyone know how I can overcome this issue? I cannot find a clear answer.


Kind Regards,
Katerina
Stata/SE 16.0
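
Two things that are often tried in this situation (general suggestions on my part, not specific to this dataset):

Code:
* (a) evaluate both covariance matrices on a common estimate of the error variance
hausman fe re, sigmamore

* (b) the cluster-robust alternative: an overidentification test via the
*     community-contributed xtoverid (ssc install xtoverid); y, x1, x2, id are placeholders
xtreg y x1 x2, re vce(cluster id)
xtoverid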

Bar Chart Percentiles

Hello,

I plotted a bar graph of education level against the mean of the average pay. Because of outliers, this graph is useless. This is why I want to plot the mean of the average pay for every education level, using the 99th percentile of pay within each education level as a cutoff. How could I do that?
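
A minimal sketch of one way to do this; educ and avgpay are placeholder names for the education level and the average pay:

Code:
* each education group's own 99th percentile of pay
bysort educ: egen p99 = pctile(avgpay), p(99)

* bar chart of the group means, trimmed at the group-specific 99th percentile
graph bar (mean) avgpay if avgpay <= p99, over(educ)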

Kolmogorov–Smirnov Test for panel database

Hello!

I want to perform the Kolmogorov–Smirnov test on a panel database, given that it is a frequently employed method in the literature. I have two simple questions:
1) My panel covers about 70% of the whole population of firms with more than 200 workers, but less than 5% of firms with fewer than 200 workers. Can I use this database?
2) I want to check whether exporting firms are more productive than non-exporting firms. However, firm j can be a non-exporter in year t, become an exporter in year t+1, and become a non-exporter again in year t+2. Can I compare productivity between the group of exporters and the group of non-exporters if firms switch status over time? If so, do I need any specific command for the K-S test? (A sketch of the command itself is below.)

Thank you very much for your time.
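
For what it is worth, the mechanics of the two-sample test itself are a single line; tfp and exporter are placeholder names for productivity and the exporter dummy:

Code:
ksmirnov tfp, by(exporter)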

Create a new variable based on observations within a group

Hello,

my dataset is structured as follows:

clear
input float (group_id won_close_dummy loss_close_dummy)
1 0 1
2 1 0
2 1 0
3 1 0
6 1 0
7 0 1
9 0 0
13 0 1
13 1 0
13 0 1
13 0 1
17 1 0
17 0 1
18 0 1
20 1 0
21 1 0
24 1 0
24 1 0
25 0 1
28 0 1
33 1 0
34 1 0
35 1 0
38 0 0
42 1 0
43 1 0
43 1 0
43 1 0
43 1 0
43 1 0
45 1 0
45 0 1
46 0 1
46 1 0
47 1 0
47 0 1
end

I now want to create a new binary variable "donated" which takes the value 0 if, within a group, there is at least one 1 for both "won_close_dummy" and "loss_close_dummy". That is, if within a group (e.g. group_id==13 or group_id==17) both variables have at least one 1, the new variable "donated" should be 0 for the whole group.
If within a group (e.g. group_id==2 or group_id==43) only one of the two variables has a 1, the new variable "donated" should be 1 for the whole group.
This means it should look like this in the end:

input float(group_id won_close_dummy loss_close_dummy donated)
1 0 1 1
2 1 0 1
2 1 0 1
3 1 0 1
6 1 0 1
7 0 1 1
9 0 0 1
13 0 1 0
13 1 0 0
13 0 1 0
13 0 1 0
17 1 0 0
17 0 1 0
18 0 1 1
20 1 0 1
21 1 0 1
24 1 0 1
24 1 0 1
25 0 1 1
28 0 1 1
33 1 0 1
34 1 0 1
35 1 0 1
38 0 0 1
42 1 0 1
43 1 0 1
43 1 0 1
43 1 0 1
43 1 0 1
43 1 0 1
45 1 0 0
45 0 1 0
46 0 1 0
46 1 0 0
47 1 0 0
47 0 1 0
end

I hope you can help me solve this problem. Thanks in advance!

Greetings
Marcel
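
A minimal sketch of one way to do this with egen max() within group (my own approach, so please check it against the desired output above):

Code:
bysort group_id: egen any_won  = max(won_close_dummy)
bysort group_id: egen any_loss = max(loss_close_dummy)
gen byte donated = !(any_won & any_loss)   // 0 only if the group has at least one 1 of each kind
drop any_won any_loss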

How to load in the vertex attributes in network

Hello everyone, I am struggling to find out how to load vertex attributes into a network. I am currently using the "nwcommands" package. I have searched for the whole afternoon but have no idea how to do it, and I have not found a similar thread on this question. Could someone help me with this? Any suggestions will be appreciated. Thank you in advance.

How to add a constraint when matching using teffects psmatch, cem or psmatch2?

Hello dear statalists,
I am trying to match houses sold in year t with similar houses sold in years t-1, t-2, ..., t-5, respectively. That means that, apart from controlling for housing characteristics, I also need to add a constraint on trade_year. How can I do that with teffects psmatch, cem, psmatch2, or other Stata commands?
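
One way to impose the year constraint with any of these commands is to restrict the sample so that each run only compares year-t sales (as the treated group) with sales from a single earlier year t-k as potential controls. A sketch with teffects psmatch; the year 2015, the outcome lnprice and the housing characteristics are placeholders of mine:

Code:
local t 2015
forvalues k = 1/5 {
    local tk = `t' - `k'
    gen byte treated = (trade_year == `t')
    teffects psmatch (lnprice) (treated sqft rooms age) ///
        if trade_year == `t' | trade_year == `tk'
    drop treated
}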

How to generate the list of variables for nodes using the nw2fromedge command when the number of nodes exceeds 1000

Hello, could someone help me with this question? I am using the network-analysis command "nw2fromedge" to generate a one-mode network. As shown in the attached picture, the command simultaneously creates a list of variables named "var2 var3 var4 var5", one for each node. In my case, where the number of nodes is more than 1000, I can successfully generate the network; however, it does not simultaneously generate the variables named "var1 var2 var3 var4 ... var1000 var1001 ...". I notice that 1000 is the boundary: when the number of nodes is no more than 1000, nw2fromedge generates "var1 var2 var3 var4 ... var1000", but when the number of nodes exceeds 1000 it does not.

When I run "nwsummarize, mat" I can see the matrix in the Results window, but I don't know how to operate on it or generate the varlist from it.

I need these variables to exist explicitly as a varlist, since I will use them to merge with other data and do some further analysis. I think I may need to use Mata for this, but I am new to Mata. I would appreciate your suggestions or advice; thanks in advance.


The example code to generate the dataset in the picture:

nwclear
use "https://ift.tt/32MP4fn", clear

nw2fromedge person institution, project(1)


Friday, February 28, 2020

Define nested loops for dependent and independent variables

I have a panel dataset from 1970 to 2014 on yields and prices for four different crops. I need to regress yield on prices and controls for all four crops separately, i.e., run four regressions. The controls are common to all four crop regressions, but the prices are crop-specific.
For this I need a loop that repeats the regression for the four crops and includes the price specific to the crop whose yield is the dependent variable.
Following several posts on Statalist, I was able to run the following loop:

Code:
local m=1
foreach Yield in Crop1 Crop2 Crop3 Crop4 {
                foreach Price in Price1 Price2 Price3 Price4 {
                local Control Control1 Control2 Control3
                eststo model_`m': xtabond2 `Yield' L(1/2).`Yield' L.`Price'  `Control' i.Year, gmm(L.Yield `Price', eq(diff)laglimits(2 6)collapse) ///
                gmm(L.Yield L.Price, eq(level)laglimits(1 4)) iv(L.Year, equation(level)) twostep small robust
                local m `++m'
                }
 }
esttab model_*  , se(4) b(4) r(2)wide
However, this executes 16 regressions: for each of the four dependent variables it cycles through all four prices, so I end up with 16 regressions instead of four.
Evidently something needs to be adjusted in how my loop specifies the price variables. Any suggestions?
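
A sketch of one way to pair each crop with its own price through a single loop index, so that only four regressions are run; it assumes the yield variables are literally Crop1-Crop4 and the prices Price1-Price4, as in the loop above, and is otherwise untested:

Code:
local Control Control1 Control2 Control3
forvalues i = 1/4 {
    eststo model_`i': xtabond2 Crop`i' L(1/2).Crop`i' L.Price`i' `Control' i.Year, ///
        gmm(L.Crop`i' Price`i', eq(diff) laglimits(2 6) collapse) ///
        gmm(L.Crop`i' L.Price`i', eq(level) laglimits(1 4)) ///
        iv(L.Year, equation(level)) twostep small robust
}
esttab model_*, se(4) b(4) wide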

Two Stage Least Squares with multiple Fixed Effects (using reghdfe)

Dear Statalist,

I am trying to estimate a two-stage least squares model where both stages contain multiple fixed effects and interactions between age and district factors. I used reghdfe in combination with the xi command for the first-stage regression and attempted to use the stages(first) option, but I keep getting an error saying "invalid options: stages(first)". Does anyone know why this is happening, or have any ideas as to a different way to proceed?

Here is my code and the response:

xi: reghdfe yrschl i.age74|intensity, stages(first) absorb(age district) vce(cluster district)

i.age74           _Iage74_2-23    (naturally coded; _Iage74_12 omitted)
i.age74|inten~y   _IageXinte_#    (coded as above)

invalid options: stages(first)
r(9);



Any input would be greatly valued. Thank you
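
Two thoughts, offered as a sketch rather than a definitive fix: stages(first) appears to come from the older, IV-enabled reghdfe (and the ivreg2 side of things), which would explain why the current reghdfe rejects it; and since the command here contains no IV syntax anyway, the first stage can simply be run as its own regression. The factor-variable version of the xi term is my translation; reghdfe drops anything collinear with the absorbed fixed effects:

Code:
* the first stage as its own regression, factor-variable syntax instead of xi
reghdfe yrschl i.age74 i.age74#c.intensity, absorb(age district) vce(cluster district)

* for the full 2SLS with first-stage output, the community-contributed ivreghdfe
* (which wraps ivreg2) is the usual route, e.g. with a placeholder outcome:
* ivreghdfe outcome (yrschl = i.age74#c.intensity), absorb(age district) cluster(district) first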

Dropping repeated columns from a matrix

Dear statalist,

I have a matrix in which two or more columns are exactly the same. Is there some way of keeping one of them and dropping all the others?

For example, I have the matrix

1 4 2 1 1
2 7 0 2 2
3 3 5 3 3

I want to drop the last two columns, since they are the same as the first column.

Thank you in advance.

Shoummo Sen Gupta
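
A minimal Mata sketch of one way to do it (my own approach): keep the first occurrence of each distinct column of a Stata matrix A and store the result as A_unique.

Code:
mata:
A = st_matrix("A")
keep = J(1, 0, .)                               // column indices to keep
for (j = 1; j <= cols(A); j++) {
    dup = 0
    for (k = 1; k <= cols(keep); k++) {
        if (A[., j] == A[., keep[k]]) dup = 1   // == compares the whole columns
    }
    if (!dup) keep = (keep, j)
}
st_matrix("A_unique", A[., keep])
end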

Subsetting twice with independent group t-tests

I'm looking to use an independent group t-test to compare the difference between students' grade estimates and their assigned grades (variable: gradempre) in one course against that of another course (variable: CourseID).

I've tried using:
ttest gradempre if CourseID==1, by(gradempre if CourseID==2)

to no avail. Of course I could assign the two courses' gradempre values to different variables, but I'm looking to do this kind of comparison a number of times (and it feels like a good thing to know how to do!).

Any advice is much appreciated! Thank you.

System info:
Stata 15.1 for Windows (running Win10)
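
A minimal sketch of the usual way to do this: keep just the two courses in the estimation sample and let by() split the groups.

Code:
ttest gradempre if inlist(CourseID, 1, 2), by(CourseID)

(Adding the unequal option relaxes the equal-variance assumption if that is a concern.)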

Combining shapefiles and household survey data

Hello!
I am working with a household survey whose household geocodes are displaced within a 10 km buffer, plus an external shapefile. I converted the shapefile from a raster file using QGIS and also used the Stata command “shp2dta” to convert the shapefile into .dta files. I would like to match the information stored in the variable “DN” to the nearest household. I understand that the command “geonear” can be used to calculate the nearest distance and then merge the data.

However, I am facing some difficulty in executing the task.
(1) When I merge the files tz_data and tz_coordinates created by shp2dta, a lot of missing values are generated.

(2) If I try to use the command “geonear ea_id lat_modified lon_modified using "Tz_coordinates.dta", n(_ID _X _Y)”, the following error appears: “lat_modified or lon_modified not constant within ea_id group”. I also tried the command “geoinpoly lat_dd_mod lon_dd_mod using "Tanzania_coordinates.dta"”, and Stata gives me the following error: Y (latitude) must be between -90 and 90 in coordinates file.
Could you please assist me with how to go about this?
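
For what it is worth, a sketch of the direction I would look first (these are my assumptions, not a tested solution): geonear needs exactly one coordinate pair per base id, so either collapse to one row per ea_id or use the household id as the base id; and the _X/_Y values in the shapefile look like projected metres rather than decimal degrees, so the polygon centroids would have to be re-projected to latitude/longitude (for example in QGIS) before geonear or geoinpoly will accept them. With a hypothetical re-projected centroid file Tz_centroids.dta containing _ID, clat and clon:

Code:
preserve
collapse (first) lat_dd_mod lon_dd_mod, by(ea_id)
geonear ea_id lat_dd_mod lon_dd_mod using "Tz_centroids.dta", neighbors(_ID clat clon)
restore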

Here is a sample of the data I am using: the first and second blocks are the coordinates and database files generated by “shp2dta”, and the last block is the household survey data.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long _ID double _X
1                  .
1 -647509.5603336413
1 -647259.5711087444
1 -647259.5711087444
1 -647509.5603336413
1 -647509.5603336413
2                  .
2 -646009.6249842601
2 -645759.6357593632
2 -645759.6357593632
2 -646009.6249842601
2 -646009.6249842601
3                  .
3 -637260.0021128698
3 -636760.0236630761
3 -636760.0236630761
3 -637260.0021128698
3 -637260.0021128698
4                  .
4 -636010.0559883855
4 -635260.0883136949
4 -635260.0883136949
4 -636010.0559883855
4 -636010.0559883855
5                  .
5 -633010.1852896231
5 -632760.1960647262
5 -632760.1960647262
5 -633010.1852896231
5 -633010.1852896231
end

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long _ID byte DN
 1 51
 2 51
 3 22
 4 22
 5 51
 6 44
 7 44
 8 51
 9 51
10 51
11 58
12 58
13 32
14 58
15 69
16 29
17 51
18 58
19 44
20 25
21 58
22 58
23 58
24 58
25 76
26 76
27 69
28 58
29 58
30 58
end






Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 y3_hhid double(lat_dd_mod lon_dd_mod) int(crops01 crops02)
"0001-001" -5.08575059837 35.8543891943 557 370
"0002-001" -5.08575059837 35.8543891943 557 370
"0003-001" -5.08575059837 35.8543891943 557 370
"0003-010" -6.86088773309 39.2519840813 756 458
"0005-001" -5.08575059837 35.8543891943 557 370
"0006-001" -5.08575059837 35.8543891943 557 370
"0007-001" -5.08575059837 35.8543891943 557 370
"0008-001" -5.08575059837 35.8543891943 557 370
"0009-001" -5.08575059837 35.8543891943 557 370
"0010-001" -6.45661888376 36.7228609724 619 442
"0011-001" -6.45661888376 36.7228609724 619 442
"0011-004" -6.40602581971 36.8375470998 645 429
"0012-001" -6.45661888376 36.7228609724 619 442
"0013-001" -6.45661888376 36.7228609724 619 442
"0014-001" -6.45661888376 36.7228609724 619 442
"0014-007" -6.45661888376 36.7228609724 619 442
"0015-001" -6.98727510065 36.8989578038 646 430
"0016-001" -6.45661888376 36.7228609724 619 442
"0017-001" -6.45661888376 36.7228609724 645 429
"0018-001" -6.45661888376 36.7228609724 621 435
"0019-001"  -6.6140094868  36.508662736 539 397
"0019-003"  -6.6140094868  36.508662736 539 397
"0020-001" -6.65111878938  36.385741194 524 388
"0021-001"  -6.6140094868  36.508662736 539 397
"0022-001"  -6.6140094868  36.508662736 539 397
"0023-001"  -6.6140094868  36.508662736 539 397
"0024-001"  -6.6140094868  36.508662736 539 397
"0025-001" -6.71165256022 38.7421347251 828 533
"0026-001"  -6.6140094868  36.508662736 539 397
"0027-001"  -6.6140094868  36.508662736 539 397
end


Inequality analysis

Hi,
I'm new to this forum and I don't know if I posted my question in the right place; this is my first message. I am taking a class on inequality analysis and, as I am new to Stata, I have problems with some of the commands used. I didn't understand the following code:
gen transfer = 0
local total_transfers = 0
local limit = 0
while `total_transfers' < 300 {
    * top up every income below the current limit to that limit
    qui replace transfer = max(0, `limit' - income)
    * total cost of the transfers at this limit
    qui sum transfer
    local total_transfers = r(sum)
    * raise the limit and try again until total transfers reach 300
    local limit = `limit' + 1
}
I understood the first line of the code, but the rest is not clear to me. Could you help me understand it, please? Thanks very much.

Fitting Gompertz distribution using streg

Hello,

I want to know how I can restrict the Gompertz ancillary parameter (γ), which governs how the hazard changes over time, to be strictly positive while fitting the model. The -streg- manual doesn't specify how to achieve this.

My Stata command is: streg if treatment==0, d(gom) nohr nolog

Thank you,

Kind Regards,
Masnoon.

merging different levels of data

I have three datasets that I want to merge for a multilevel regression. The first is individual-level survey data (afro6new). The other two are country-level datasets. When I try to merge, I'm told my 'country' and 'year' variables do not uniquely identify the data. I have tried several reshaping approaches but get only errors, and I could not figure out how to reshape it. Added to that, some of my data are strings and I can't quite destring them all with this command:

foreach var of varlist country countrycode year timecode gdppercapitacurrentusnygdppcapcd gdppercapitapppcurrentinternatio gdppppcurrentinternationalnygdpm cpiatransparencyaccountabilityan cpiapoliciesforsocialinclusioneq cpiaqualityofbudgetaryandfinanci cpiafinancialsectorrating1lowto6 unemploymenttotaloftotallaborfor v13 {
destring `var', replace
}

code and error
use afro6new.dta
. merge 1:1 country year using p4v2018trimmed.dta
variables country year do not uniquely identify observations in the master data
r(459);


Code and error when I try using the other dataset as the master
use wdi.dta
merge 1:1 country year using afro6new.dta
key variable country is str48 in master but byte in using data
Each key variable -- the variables on which observations are matched -- must be of the
same generic type in the master and using datasets. Same generic type means both numeric
or both string.



Please help.
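
A sketch of the two fixes together, assuming the value labels in afro6new spell the country names exactly as wdi.dta does (otherwise the names have to be harmonised first). The survey is the master, so the merge must be m:1, because many respondents share one country-year; and decode turns the labelled numeric country into a string so the key types match. Separately, destring with the force option will turn the ".." entries in the WDI variables into numeric missing.

Code:
use afro6new, clear
decode country, gen(cname)            // labelled numeric -> country name as a string
drop country
rename cname country
merge m:1 country year using wdi.dta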

First dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte country int year str7 RESPNO byte Q27E
1 2015 "ALG0001" 9
1 2015 "ALG0002" 0
1 2015 "ALG0003" 0
1 2015 "ALG0004" 0
1 2015 "ALG0005" 0
1 2015 "ALG0006" 9
1 2015 "ALG0007" 3
1 2015 "ALG0008" 1
1 2015 "ALG0009" 0
1 2015 "ALG0010" 0
1 2015 "ALG0011" 2
1 2015 "ALG0012" 0
1 2015 "ALG0013" 0
1 2015 "ALG0014" 1
1 2015 "ALG0015" 0
1 2015 "ALG0016" 1
1 2015 "ALG0017" 1
1 2015 "ALG0018" 9
1 2015 "ALG0019" 0
1 2015 "ALG0020" 0
1 2015 "ALG0021" 4
1 2015 "ALG0022" 0
1 2015 "ALG0023" 0
1 2015 "ALG0024" 0
1 2015 "ALG0025" 0
1 2015 "ALG0026" 0
1 2015 "ALG0027" 2
1 2015 "ALG0028" 0
1 2015 "ALG0029" 0
1 2015 "ALG0030" 0
1 2015 "ALG0031" 0
1 2015 "ALG0032" 0
1 2015 "ALG0033" 0
1 2015 "ALG0034" 0
1 2015 "ALG0035" 0
1 2015 "ALG0036" 0
1 2015 "ALG0037" 0
1 2015 "ALG0038" 0
1 2015 "ALG0039" 0
1 2015 "ALG0040" 0
1 2015 "ALG0041" 1
1 2015 "ALG0042" 1
1 2015 "ALG0043" 0
1 2015 "ALG0044" 0
1 2015 "ALG0045" 0
1 2015 "ALG0046" 0
1 2015 "ALG0047" 0
1 2015 "ALG0048" 0
1 2015 "ALG0049" 1
1 2015 "ALG0050" 1
1 2015 "ALG0051" 2
1 2015 "ALG0052" 0
1 2015 "ALG0053" 1
1 2015 "ALG0054" 1
1 2015 "ALG0055" 0
1 2015 "ALG0056" 1
1 2015 "ALG0057" 0
1 2015 "ALG0058" 0
1 2015 "ALG0059" 2
1 2015 "ALG0060" 0
1 2015 "ALG0061" 1
1 2015 "ALG0062" 0
1 2015 "ALG0063" 0
1 2015 "ALG0064" 0
1 2015 "ALG0065" 1
1 2015 "ALG0066" 0
1 2015 "ALG0067" 1
1 2015 "ALG0068" 1
1 2015 "ALG0069" 1
1 2015 "ALG0070" 1
1 2015 "ALG0071" 0
1 2015 "ALG0072" 0
1 2015 "ALG0073" 1
1 2015 "ALG0074" 0
1 2015 "ALG0075" 1
1 2015 "ALG0076" 0
1 2015 "ALG0077" 0
1 2015 "ALG0078" 0
1 2015 "ALG0079" 0
1 2015 "ALG0080" 0
1 2015 "ALG0081" 0
1 2015 "ALG0082" 0
1 2015 "ALG0083" 0
1 2015 "ALG0084" 0
1 2015 "ALG0085" 0
1 2015 "ALG0086" 0
1 2015 "ALG0087" 1
1 2015 "ALG0088" 0
1 2015 "ALG0089" 0
1 2015 "ALG0090" 0
1 2015 "ALG0091" 1
1 2015 "ALG0092" 0
1 2015 "ALG0093" 1
1 2015 "ALG0094" 0
1 2015 "ALG0095" 9
1 2015 "ALG0096" 0
1 2015 "ALG0097" 1
1 2015 "ALG0098" 0
1 2015 "ALG0099" 0
1 2015 "ALG0100" 1
end
label values country COUNTRY
label def COUNTRY 1 "Algeria", modify
label values Q27E Q27E
label def Q27E 0 "No, would never do this", modify
label def Q27E 1 "No, but would do if had the chance", modify
label def Q27E 2 "Yes, once or twice", modify
label def Q27E 3 "Yes, several times", modify
label def Q27E 4 "Yes, often", modify
label def Q27E 9 "Don't know", modify




Second dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str48 country int year str16(gdppercapitacurrentusnygdppcapcd gdppercapitapppcurrentinternatio)
"Algeria" 1960 "246.30876252962"  ".."              
"Algeria" 1961 "214.776273516192" ".."              
"Algeria" 1962 "172.245997766356" ".."              
"Algeria" 1963 "226.899988804342" ".."              
"Algeria" 1964 "238.048532020905" ".."              
"Algeria" 1965 "249.883486058815" ".."              
"Algeria" 1966 "235.598111822427" ".."              
"Algeria" 1967 "253.923650753479" ".."              
"Algeria" 1968 "281.925745024389" ".."              
"Algeria" 1969 "302.752306734104" ".."              
"Algeria" 1970 "336.224855584547" ".."              
"Algeria" 1971 "341.388987340498" ".."              
"Algeria" 1972 "442.351878193504" ".."              
"Algeria" 1973 "554.755124929209" ".."              
"Algeria" 1974 "817.988308478181" ".."              
"Algeria" 1975 "936.790025769149" ".."              
"Algeria" 1976 "1037.60703844052" ".."              
"Algeria" 1977 "1192.74388765763" ".."              
"Algeria" 1978 "1456.41939597215" ".."              
"Algeria" 1979 "1782.69798137911" ".."              
"Algeria" 1980 "2202.99736479746" ".."              
"Algeria" 1981 "2237.08632490354" ".."              
"Algeria" 1982 "2210.30192760168" ".."              
"Algeria" 1983 "2312.65561948251" ".."              
"Algeria" 1984 "2467.34642199031" ".."              
"Algeria" 1985 "2582.87958916856" ".."              
"Algeria" 1986 "2757.13052415538" ".."              
"Algeria" 1987 "2807.33029744222" ".."              
"Algeria" 1988 "2417.3766834056"  ".."              
"Algeria" 1989 "2215.84755797478" ".."              
"Algeria" 1990 "2408.68881482248" "6683.94251044679"
"Algeria" 1991 "1731.61127444317" "6661.13959247123"
"Algeria" 1992 "1776.03667438071" "6774.46596213675"
"Algeria" 1993 "1807.32854846258" "6640.1668105834" 
"Algeria" 1994 "1507.86531803627" "6583.17260367457"
"Algeria" 1995 "1452.26944487837" "6844.63963324371"
"Algeria" 1996 "1603.93792062434" "7129.63842232678"
"Algeria" 1997 "1619.80622391421" "7214.87270695396"
"Algeria" 1998 "1596.00371429844" "7553.94658827198"
"Algeria" 1999 "1588.2984525177"  "7797.01507100731"
"Algeria" 2000 "1764.88822213372" "8162.58757272877"
"Algeria" 2001 "1740.56006870277" "8480.07045862817"
"Algeria" 2002 "1781.75857132265" "8981.34926734444"
"Algeria" 2003 "2103.4531050538"  "9682.4760351135" 
"Algeria" 2004 "2609.94560777437" "10234.9309046785"
"Algeria" 2005 "3113.10109432814" "11022.1474669711"
"Algeria" 2006 "3478.81854327462" "11380.0944227252"
"Algeria" 2007 "3950.56160744508" "11897.1038212039"
"Algeria" 2008 "4923.54009872171" "12218.0473865624"
"Algeria" 2009 "3883.37814284703" "12294.6124889925"
"Algeria" 2010 "4480.72453900113" "12655.1374103049"
"Algeria" 2011 "5455.74133764262" "13046.1285113709"
"Algeria" 2012 "5592.32609806051" "13482.7211230765"
"Algeria" 2013 "5499.58148704572" "13823.8282325269"
"Algeria" 2014 "5493.02558996263" "14326.283007894" 
"Algeria" 2015 "4177.86751715913" "14711.2208543312"
"Algeria" 2016 "3946.4214445883"  "15036.3641495916"
"Algeria" 2017 "4044.29837226523" "15207.1791488773"
"Algeria" 2018 "4114.71506136896" "15481.7876195754"
"Algeria" 2019 ".."               ".."              
"Benin"   1960 "93.0225089907107" ".."              
"Benin"   1961 "95.5721547147448" ".."              
"Benin"   1962 "94.4645349843821" ".."              
"Benin"   1963 "99.8591138855553" ".."              
"Benin"   1964 "104.339768039886" ".."              
"Benin"   1965 "110.132793835113" ".."              
"Benin"   1966 "112.940836383512" ".."              
"Benin"   1967 "111.951601925238" ".."              
"Benin"   1968 "116.89506602186"  ".."              
"Benin"   1969 "116.025094341185" ".."              
"Benin"   1970 "114.556596466987" ".."              
"Benin"   1971 "112.570089087637" ".."              
"Benin"   1972 "134.819407935009" ".."              
"Benin"   1973 "161.987373671434" ".."              
"Benin"   1974 "174.014149085561" ".."              
"Benin"   1975 "207.300439745473" ".."              
"Benin"   1976 "208.656153830514" ".."              
"Benin"   1977 "218.4543657692"   ".."              
"Benin"   1978 "263.581057550946" ".."              
"Benin"   1979 "327.821677993792" ".."              
"Benin"   1980 "378.043898303902" ".."              
"Benin"   1981 "337.978194739187" ".."              
"Benin"   1982 "322.77769945337"  ".."              
"Benin"   1983 "271.129240225224" ".."              
"Benin"   1984 "252.869785044788" ".."              
"Benin"   1985 "244.410998858469" ".."              
"Benin"   1986 "303.348897858296" ".."              
"Benin"   1987 "344.503070791789" ".."              
"Benin"   1988 "346.736037681422" ".."              
"Benin"   1989 "311.678303860822" ".."              
"Benin"   1990 "393.686214423531" "949.168690914722"
"Benin"   1991 "385.753616012638" "988.77101457638" 
"Benin"   1992 "317.962855286116" "1005.61460513474"
"Benin"   1993 "411.926030522283" "1052.03860879311"
"Benin"   1994 "279.666504326806" "1059.29204604852"
"Benin"   1995 "367.387341032599" "1109.72406044667"
"Benin"   1996 "387.432924636251" "1142.40629405757"
"Benin"   1997 "361.100269794441" "1192.09566510535"
"Benin"   1998 "379.442353954477" "1216.72738664615"
"Benin"   1999 "403.623703839155" "1262.40524394344"
end

Third Dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str24 country int year byte(polity2 democ)
"Benin"   2013 7 7
"Benin"   2014 7 7
"Benin"   2015 7 7
"Benin"   2016 7 7
"Algeria" 2013 2 3
"Algeria" 2014 2 3
"Algeria" 2015 2 3
"Algeria" 2016 2 3
end

Creating a balanced dataset

Dear all,

I am currently attempting the following task. I have a country-product-year dataset (91 countries, 27 years from 1991-2016) which is unbalanced and contains data on prices and quantities of each item. It looks as follows (dataset is large so I give country-specific examples):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 iso3code int(year itemcode) float p double q
"ARG" 1991   15   84.0463 11036600
"ARG" 1991   27    352.53   347600
"ARG" 1991   44  74.31699   573360
"ARG" 1991   56   89.8857  7684800
"ARG" 1991   71        85    46000
"ARG" 1991   75   85.2388   610000
"ARG" 1991   79  59.32511   136000
"ARG" 1991   83   68.9007  2252400
"ARG" 1991  101  162.1438    42200
"ARG" 1991  108        68    15000
"ARG" 1991  116  79.22397  1749887
"ARG" 1991  122  153.6256   289000
"ARG" 1991  125 136.40302   150000
"ARG" 1991  156        10 18200000
"ARG" 1991  176     516.8   241578
"ARG" 1991  181       170    10000
"ARG" 1991  187   175.762    33200
"ARG" 1991  191       177     2000
"ARG" 1991  201   357.398    25800
"ARG" 1991  210       150      100
"ARG" 1991  221       350      430
"ARG" 1991  222       370     8000
"ARG" 1991  236  165.5112 10862000
"ARG" 1991  242   462.823   310600
"ARG" 1991  260       517   121000
"ARG" 1991  267   133.398  4033400
"ARG" 1991  270       140    17000
"ARG" 1991  275        60    58650
"ARG" 1991  280       133    16500
"ARG" 1991  328    288.99   789400
"ARG" 1991  329     49.13   429600
"ARG" 1991  333   127.807   456800
"ARG" 1991  366  455.9952    71579
"ARG" 1991  367     722.4     4800
"ARG" 1991  388       185   716000
"ARG" 1991  394      76.9   362000
"ARG" 1991  401     336.7    92000
"ARG" 1991  403       179   498450
"ARG" 1991  406       500    74000
"ARG" 1991  414       100     3100
"ARG" 1991  417  362.6322    23474
"ARG" 1991  423       369    38500
"ARG" 1991  426      48.8   217000
"ARG" 1991  463        50   627000
"ARG" 1991  486     153.6   194200
"ARG" 1991  490       245   773900
"ARG" 1991  495       295   550200
"ARG" 1991  497       238   656000
"ARG" 1991  507   169.615   203900
"ARG" 1991  515   102.083  1067500
"ARG" 1991  521   117.396   297830
"ARG" 1991  523     122.4    21000
"ARG" 1991  526     449.3    18900
"ARG" 1991  531    2610.8     5600
"ARG" 1991  534     733.8   240000
"ARG" 1991  536    1215.8    52100
"ARG" 1991  544     907.6     7700
"ARG" 1991  560  395.7692  2081620
"ARG" 1991  567  110.0835   126000
"ARG" 1991  568  200.5408    67000
"ARG" 1991  569  811.7377      911
"ARG" 1991  571  199.2125     1600
"ARG" 1991  572  199.2126     3100
"ARG" 1991  574  183.8859     3558
"ARG" 1991  600 184.45625     1600
"ARG" 1991  667  47.50001    46075
"ARG" 1991  671     576.7   160761
"ARG" 1991  677    1264.1      310
"ARG" 1991  689       540     2900
"ARG" 1991  711   1806.53     2400
"ARG" 1991  723       440     1200
"ARG" 1991  767      1369   323600
"ARG" 1991  773     136.9     1700
"ARG" 1991  821     136.9     1300
"ARG" 1991  826      1010    94504
"ARG" 1991  867  1499.004  2918000
"ARG" 1991  882  126.7586  6121000
"ARG" 1991  944  1499.004  2919280
"ARG" 1991  977   1990.17    84800
"ARG" 1991  987   800.806   125000
"ARG" 1991 1012   1990.17    84700
"ARG" 1991 1017   1753.24     6732
"ARG" 1991 1032 1753.2814     6732
"ARG" 1991 1035   623.802   141585
"ARG" 1991 1055   623.802   141497
"ARG" 1991 1058    1265.6   373549
"ARG" 1991 1062   818.254   297830
"ARG" 1991 1069  665.9197     5855
"ARG" 1991 1070  665.9197     5855
"ARG" 1991 1073 552.58765      519
"ARG" 1991 1077 552.58765      519
"ARG" 1991 1080  543.1813    30921
"ARG" 1991 1087  543.1813    30921
"ARG" 1991 1094    1265.6   361350
"ARG" 1991 1097  667.2581    45600
"ARG" 1991 1120  667.2632    44813
"ARG" 1991 1141    1312.8     7040
"ARG" 1991 1144    1312.8     7018
"ARG" 1991 1163    1265.6    42000
"ARG" 1991 1182  720.2191    54000
end



Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 iso3code int(year itemcode) float p double q
"BRA" 1991   15   .01541883   2916823
"BRA" 1991   27  .029132357   9488007
"BRA" 1991   44   .02844604    111650
"BRA" 1991   56  .017183634  23624340
"BRA" 1991   71   .02791878      6304
"BRA" 1991   75  .017992996    230423
"BRA" 1991   83   .01494276    257516
"BRA" 1991   89   .01493617     47000
"BRA" 1991  116  .032063466   2267035
"BRA" 1991  122   .01958768    622432
"BRA" 1991  125  .005692714  24537505
"BRA" 1991  137  .001635489    216449
"BRA" 1991  156 .0013970905 260887893
"BRA" 1991  176   .06492159   2744711
"BRA" 1991  181   .04172147     29649
"BRA" 1991  187   .06263688      4566
"BRA" 1991  216   .02296445     35838
"BRA" 1991  217   .07183072    185965
"BRA" 1991  222   .08170213      3525
"BRA" 1991  234  .036363635       220
"BRA" 1991  236   .02401203  14937806
"BRA" 1991  242   .04923585    140548
"BRA" 1991  249   .04976159   1276547
"BRA" 1991  254  .005937624    525968
"BRA" 1991  256  .013492499    198703
"BRA" 1991  257    .0269814     69900
"BRA" 1991  265  .018414844    129678
"BRA" 1991  267  .021085715     35000
"BRA" 1991  270   .02377778      9000
"BRA" 1991  275  .007901669      2278
"BRA" 1991  289   .04416667     12000
"BRA" 1991  328   .04568672   2079751
"BRA" 1991  329         .01   1310000
"BRA" 1991  333       .0249     20000
"BRA" 1991  339   .02417072     13777
"BRA" 1991  388   .04773081   2343811
"BRA" 1991  403  .029506786    887728
"BRA" 1991  406   .17351024     85165
"BRA" 1991  463   .01723641   1950000
"BRA" 1991  486  .015177518   5762141
"BRA" 1991  490  .025130564  18936344
"BRA" 1991  495   .02194331    660657
"BRA" 1991  497  .036265902    436057
"BRA" 1991  507   .02545098     51000
"BRA" 1991  515   .03843395    526904
"BRA" 1991  521    .0462519     16475
"BRA" 1991  523  .006391347      6102
"BRA" 1991  534  .074178666     96672
"BRA" 1991  544   .08181818      2200
"BRA" 1991  560  .021938935    648026
"BRA" 1991  567  .021193936    432435
"BRA" 1991  568   .05748722     64136
"BRA" 1991  569   .07383386     23282
"BRA" 1991  571   .02271236    550053
"BRA" 1991  572   .06486438    111340
"BRA" 1991  574  .023184774   1190307
"BRA" 1991  587   .06298519     47662
"BRA" 1991  591  .006801333   1500000
"BRA" 1991  600  .008040813    643716
"BRA" 1991  603  .006801777    475317
"BRA" 1991  656   .05642464   1520382
"BRA" 1991  661   .12393174    320967
"BRA" 1991  667  .015015882     10389
"BRA" 1991  671  .026431374    166431
"BRA" 1991  687   .11173218     83906
"BRA" 1991  767         .14    686000
"BRA" 1991  780   .03088102      3303
"BRA" 1991  782   .06102278     11635
"BRA" 1991  788   .09788723      7999
"BRA" 1991  789  .015612632    233721
"BRA" 1991  821  .016364582     74979
"BRA" 1991  826   .10329096    413831
"BRA" 1991  836   .09310786     48374
"BRA" 1991  839  .034623217       491
"BRA" 1991  867   .20955773   4510800
"BRA" 1991  882  .029630193  15546642
"BRA" 1991  944   .20955777   4506824
"BRA" 1991  977   .24879746     79000
"BRA" 1991  987    .2908874     29300
"BRA" 1991 1012   .24879795     79032
"BRA" 1991 1017    .1636286     35000
"BRA" 1991 1020  .029551463    262119
"BRA" 1991 1032    .1636286     35000
"BRA" 1991 1035     .160595   1200000
"BRA" 1991 1055    .1605946   1199785
"BRA" 1991 1058   .14680062   2627700
"BRA" 1991 1062   .10993301   1315019
"BRA" 1991 1069   .14906645     19924
"BRA" 1991 1070   .14906645     19924
"BRA" 1991 1080   .15272714     56231
"BRA" 1991 1087   .15272714     56231
"BRA" 1991 1091  .034346152     26000
"BRA" 1991 1094   .14680068   2629150
"BRA" 1991 1097   .14689174     12322
"BRA" 1991 1120   .14688045     12037
"BRA" 1991 1141    .1128395      4050
"BRA" 1991 1144    .1128395      4050
"BRA" 1991 1182    .3544033     18668
"BRA" 1991 1185    .3160016     17117
"BRA" 1992   15    .2082048   2795598
end
As the data example shows, for Argentina I have data for itemcode 79, but for Brazil I do not. Conversely, Brazil has data for itemcode 89, while Argentina does not. I would like to transform my dataset so that Brazil also has itemcode 79 as an identifier and Argentina has itemcode 89, then naturally with missing values for price and quantity. In other words, I would like my dataset to contain, for each country, the maximum number of items for which there is data anywhere in my dataset. This would mean that if the maximum number of items any country has data on (possibly missing) is 100, then my dataset should have in total 91 (# of countries) * 27 (# of years) * 100 (# of items) = 245,700 observations. All countries should have the same number of items in the dataset, but with missing data for price and quantity if they do not produce or report data on those items. I hope my question is clear enough with this example, and I would be grateful if anyone here could assist me with this task.
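One possible approach (a minimal sketch, not tested on the full data, assuming both country files have already been appended into a single dataset identified by iso3code, year, and itemcode) is -fillin-, which adds an observation for every combination of the listed variables that appears anywhere in the data, leaving the remaining variables missing:

Code:
* rectangularize: every iso3code-year-itemcode combination present anywhere
* in the data gets a row; p and q are missing in the newly added rows
fillin iso3code year itemcode
* _fillin == 1 flags the added observations
drop _fillin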

Thank you in advance.

Best,

Satya


Testing differences in abnormal returns using estudy

Hello,
I am currently performing an event study. I have a sample of 84 European banks with daily returns calculated from closing stock prices and a widely diversified benchmark portfolio. I have multiple events over a period of two years, where each event occurs for all banks simultaneously and marks the bail-in of a European bank; the events are clustered in time as well. Let T=0 denote the event date, -1 the day prior, and +1 the day after the event. I use an estimation window of exactly 80 days with upper bound -2. I regress each bank's stock return on the benchmark portfolio according to the market model and estimate the normal return as an out-of-sample prediction; the abnormal return therefore equals the observed stock return minus the predicted return. If an event date falls in a later event's estimation window, I simply delete (dummy out) the observations of the earlier event period (-1, 0, +1) to estimate normal performance correctly. I define three different event windows, [-1;-1], [0;0], and [0;+1], to check for anticipatory effects as if they had been the true event day, with [0;+1] displaying the cumulative abnormal return of the event date and the following day. My event identification is very accurate and thus helps prevent Type 2 errors. When testing, I apply the generalized rank test, adjusting for forecast errors and allowing for event-induced variance, cross-sectional correlation, and autocorrelation. I further defined subsamples splitting the banks into systemically important banks or not, and GIIPS-state banks or not, and then reran the estudy command for them. All my test statistics look good and seem to have low Type 1 error compared with other empirical results. The only thing that surprised me is that the GRANK tests differ in significance between the CAAR and the portfolio CAR for a single day in the event window, when they are supposed to be equal by definition. If somebody could explain the difference between portfolio CAR and CAAR in this case, that would be nice.
But now to my real problem: there is no option to test the equality of average abnormal returns using estudy. I want to test whether the differences in abnormal returns between the groups are nonzero, that is, simply aggregating cross-sectionally for the respective event window and testing the average abnormal return on day -1 against day -1, day 0 against day 0, and so on. Applying a two-sample t-test results in over-rejection of the null. For example, the G-SIB sample consists of only 16 banks and the abnormal returns are surely not approximately normally distributed. Using the Wilcoxon rank-sum test does not seem to improve my results; comparing with the 68 non-G-SIB banks, I get absurdly low p-values because the test statistic is not adjusted at all.
My question: I need a proper test statistic to test differences between groups. I thought about something like a sign test with standardized abnormal returns and estimation-window-adjusted standard deviations. However, the estudy command does not display the standardized abnormal returns or a z-statistic, nor does it allow a test of differences between two subsamples. I did the same thing manually without estudy, but I am new to Stata and have not managed to correctly compute SD(SAR) and the SARs. Maybe somebody can help me; I would appreciate it.
Thanks for Your time.
Best regards

prevent pagebreak in table putdocx

Hi
I am making summary tables with putdocx, looping through 100+ cities. The problem is that page breaks occur in the middle of tables. It is not an option to have one table on each page, and "nosplit" seems to work only on rows.

Spmatrix creating empty weighting matrices

Hi,

I'm trying to do a very standard spatial analysis. The aim is to run ML and 2SLS estimations with spregress. I've successfully run this several times, but after I changed one of the Excel files I was using, it suddenly stopped working; or rather, it does run, but it creates empty weighting matrices. Here's my log:

.
. clear all

. import excel "network_contiguity_MLE_FULL.xlsx", sheet("network_contiguity_MLE_FULL") firstrow
(34 vars, 6,633 obs)
. .
. gen y = AIM15 - AIM13
(153 missing values generated)
.
. xtset id
panel variable: id (balanced)

. spset id
Sp dataset
data: cross sectional
spatial-unit id: _ID (equal to id)
coordinates: none
linked shapefile: none

.
. mata

: mata matuse W_network_full_U /*Upload Matrix X and Vector v from file wmatrix*/
(loading X[6650,6650], v[6650,1])

: end
.
. spmatrix spfrommata ZW = X v
(weighting matrix contains 6650 islands)

.mata

:sum(X)
4725

The last line shows that the Mata matrix is not empty, yet spmatrix creates a weighting matrix with as many islands as there are observations, so I assume it is effectively empty?

This code still works with a different Mata matrix which, except for the positions of the 1s and 0s, looks identical to me.
I've been struggling for days with this, and I really cannot figure it out.
Thanks a lot for any help.

Interquartile range

Hello,

Please can anyone advise how I generate the interquartile range? I have a data example for age here:

Code:
. sum age, detail

                     Age at recruitment
-------------------------------------------------------------
      Percentiles      Smallest
 1%           41             40
 5%           44             40
10%           47             40       Obs                 990
25%           53             40       Sum of Wgt.         990

50%           60                      Mean            58.5101
                        Largest       Std. Dev.       7.68262
75%           65             70
90%           68             70       Variance       59.02265
95%           69             70       Skewness      -.5442946
99%           69             70       Kurtosis       2.248327
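A minimal sketch: after -summarize, detail- the 25th and 75th percentiles are stored in r(p25) and r(p75), so the interquartile range can be computed from the returned results:

Code:
quietly summarize age, detail
display "IQR = " r(p75) - r(p25)
* or keep it as a scalar for later use
scalar iqr_age = r(p75) - r(p25)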

Imputation

Hey guys,

I work with panel data and I have the following problem.

I have dummy variables for two years (2002 and 2007), and I need to create the dummy variable for 2004 within each identifier: the new 2004 dummy should be 1 if the 2002 and 2007 dummies are both 1. I have already added the missing years with "tsfill".

How does the described imputation work?
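A minimal sketch, assuming the panel identifier is called id and the existing indicator is called dummy (both names are hypothetical):

Code:
* carry each id's 2002 and 2007 dummy values onto every row of that id
bysort id: egen byte d2002 = max(cond(year == 2002, dummy, .))
bysort id: egen byte d2007 = max(cond(year == 2007, dummy, .))
* the 2004 dummy is 1 only if both the 2002 and 2007 dummies are 1
replace dummy = (d2002 == 1 & d2007 == 1) if year == 2004
drop d2002 d2007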

thank you in advance
Ben







Gologit2 autofit option

Hi, I have fit a gologit2 model with the autofit option. My dependent variable is Prudence, coded as 1 low, 2 neutral, 3 high. I obtained two models, 1 and 2. If an independent variable like income has a negative coefficient in model 1 (low prudence), does that mean that an increase in income increases the likelihood of being in the current or a lower category?

Exogenous variables in Olley and Pakes estimation (opreg)

Hello,

I want to estimate a Cobb-Douglas production function with Stata's opreg command as explained here:
https://journals.sagepub.com/doi/pdf...867X0800800204

Can anyone tell me where to put exogenous regressors? They are very relevant in my study.
I suppose they should appear in equation (7) in the paper cited above along with age and capital, which would mean in the state() option. Is that correct?

Unfortunately, if I try this, most of them are simply omitted from the final output table, which I do not understand. Those regressors contain many zeros, if that has any bearing on opreg.
There are no such problems using any other estimation strategy like OLS, FE or GMM. I can provide further information if needed.

Best regards,

Tim

Matching within certain intervals in the list of matching variables

Hi everyone,
I'm planning to match cases and controls on age (within 3 years), gender, and index date (within 3 years; the index date is the date of the incident treatment prescription). I read about psmatch2 and other Stata commands; however, it seems to me that they match on the exact value of the variable. I'm not sure how I can match within a certain interval for a given variable in the list of matching variables.

Reshape long to wide

Hi everyone,
I'm a newbie in Stata and would really appreciate it if somebody could help me.

I have searched the forum but haven't found an answer to my problem. I would like to reshape my data from long to wide, with a separate column for each service sector and its respective values. I have uploaded a picture of the data I have.


Panel GARCH Model?

Dear all,

I am attempting to estimate a panel EGARCH-M model in Stata, but as of yet I have had no luck. Has anyone been successful in doing so?

Any help would be appreciated.

Thank you!

Year and Month fixed effects

Hi,

I would like to run a regression with year and month fixed effects. My data consist of years and months, but I am unsure how I should include these dummies.

If I wanted to create a dummy for just one month, I would use "gen dj1 = 1 if month == 1", which gives the dummy variable for January. But how am I supposed to do this for both year and month fixed effects?
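Rather than creating the dummies by hand, factor-variable notation can generate the full sets of indicators; a minimal sketch, assuming numeric year and month variables and generic outcome y and regressor x:

Code:
* separate year and month fixed effects
regress y x i.year i.month
* or a full set of year-by-month fixed effects
regress y x i.year#i.month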

Thanks in advance!


Boxplot in complex design survey

Dear Statalist users:

I have a data set with the survey weight setup shown below. How can I run a box plot with weights?

svyset province [pweight=c_weight_final], strata(region) fpc(fpc1) vce(linearized) singleunit(scaled) || village_id, strata(province) fpc(fpc2) || unique_id, strata(agegr)
pweight: c_weight_final
VCE: linearized
Single unit: scaled
Strata 1: region
SU 1: province
FPC 1: fpc1
Strata 2: province
SU 2: village_id
FPC 2: fpc2
Strata 3: agegr
SU 3: unique_id
FPC 3: <zero>
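As far as I know the svy: prefix is not supported for graph commands, but -graph box- accepts pweights directly, so a minimal sketch (with a hypothetical outcome variable called outcome) would be:

Code:
* hypothetical outcome variable; the survey pweight is applied directly
graph box outcome [pweight = c_weight_final], over(region)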

Thank you so much!

cmp multivariate probit

Hi,

I am running a multivariate probit model using the cmp command on 4 binary dependent variables. Yesterday I ran the model several times and it worked just fine. Today, when I try to run it, it returns the error message: initial vector: matrix must be dimension 109

My code is thus:

Code:
cmp setup
cmp (DOWNLOAD = MALE AGE24 AGE39 AGE59 EDUCATION1 EDUCATION2 INCOME1 INCOME2 INCOME3 URBAN RURAL SOCIALMEDIA RECOM_FRIENDS RECOM_INTERNET RADIO TV MUSICTASTE1 MUSICTASTE2 MUSICTASTE3 MUSICTASTE4 MUSICGROUP PAIDSTREAMING FREESTREAMING VIDEO NEWSONLINE)(PIRACY = MALE AGE24 AGE39 AGE59 EDUCATION1 EDUCATION2 INCOME1 INCOME2 INCOME3 URBAN RURAL SOCIALMEDIA RECOM_FRIENDS RECOM_INTERNET RADIO TV MUSICTASTE1 MUSICTASTE2 MUSICTASTE3 MUSICTASTE4 MUSICGROUP PAIDSTREAMING FREESTREAMING VIDEO NEWSONLINE)(CD = MALE AGE24 AGE39 AGE59 EDUCATION1 EDUCATION2 INCOME1 INCOME2 INCOME3 URBAN RURAL SOCIALMEDIA RECOM_FRIENDS RECOM_INTERNET RADIO TV MUSICTASTE1 MUSICTASTE2 MUSICTASTE3 MUSICTASTE4 MUSICGROUP PAIDSTREAMING FREESTREAMING VIDEO NEWSONLINE)(MERCHETC = MALE AGE24 AGE39 AGE59 EDUCATION1 EDUCATION2 INCOME1 INCOME2 INCOME3 URBAN RURAL SOCIALMEDIA RECOM_FRIENDS RECOM_INTERNET RADIO TV MUSICTASTE1 MUSICTASTE2 MUSICTASTE3 MUSICTASTE4 MUSICGROUP PAIDSTREAMING FREESTREAMING VIDEO NEWSONLINE), indicators($cmp_probit $cmp_probit $cmp_probit $cmp_probit) robust difficult
estimates table, star(.1 .05 .01)


When I exclude the difficult and robust options, it tells me "too few variables specified", but I cannot work out why. Please advise.

Many thanks,

Jake

How to correct the alignment of output

I am doing data cleaning, and the output is hard to work with because of how it comes out. For some tables there are a lot of items that I need to rename. How do I make them appear on the same line?

Standard Errors FE IV Regression, Panel Data

Hello,

I am working with Stata/IC 16 and with panel data, where i = 550 and t = 6. To determine the effect of expenditure on test score performance (math4), I want to use fixed-effects IV regression. The coefficient of 13 seems quite plausible, but I really do not understand why the standard error is so high. If I apply a normal IV regression, the standard error is one tenth of it.

Can anyone explain to me the mechanism behind the FE IV regression standard error? The only explanation I have is that the correlation between the endogenous regressor and the instrument might be very small, which would lead to large standard errors; but then the SE in the normal IV model should also be really high?


[ivreg math4 (aexpp = lfound) $control y96 y97 y98, r]

Instrumental variables (2SLS) regression Number of obs = 2,159
F(8, 2150) = 147.21
Prob > F = 0.0000
R-squared = 0.3489
Root MSE = 12.352

------------------------------------------------------------------------------
| Robust
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
aexpp | 16.96577 2.467018 6.88 0.000 12.12778 21.80376







[xtivreg2 math4 (aexpp = lfound) $control $conyear, cluster(distid) fe endog(aexpp)]
Warning - singleton groups detected. 7 observation(s) not used.
Warning - collinearities detected
Vars dropped: y94 y98

FIXED EFFECTS ESTIMATION
------------------------
Number of groups = 543 Obs per group: min = 2
avg = 4.0
max = 4
Warning - collinearities detected
Vars dropped: y94 y98

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on distid

Number of clusters (distid) = 543 Number of obs = 2152
F( 8, 542) = 118.73
Prob > F = 0.0000
Total (centered) SS = 201356.9083 Centered R2 = 0.3801
Total (uncentered) SS = 201356.9083 Uncentered R2 = 0.3801
Residual SS = 124825.1118 Root MSE = 8.808

------------------------------------------------------------------------------
| Robust
math4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
aexpp | 13.30751 22.61193 0.59 0.556 -31.01106 57.62607
lunch | .4525351 .5155055 0.88 0.380 -.5578372 1.462907
lunchsq | -.0022127 .0054569 -0.41 0.685 -.012908 .0084826
lenrol | 81.38056 64.70469 1.26 0.208 -45.4383 208.1994
lenrolsq | -5.737292 4.439372 -1.29 0.196 -14.4383 2.963717
y94 | 0 (omitted)
y95 | -11.71218 2.781055 -4.21 0.000 -17.16295 -6.261413
y96 | -11.70345 1.202484 -9.73 0.000 -14.06027 -9.346621
y97 | -14.54061 .6900335 -21.07 0.000 -15.89305 -13.18817
y98 | 0 (omitted)

------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 54.158
Chi-sq(1) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 151.876
(Kleibergen-Paap rk Wald F statistic): 67.722
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
-endog- option:
Endogeneity test of endogenous regressors: 0.028
Chi-sq(1) P-val = 0.8673
Regressors tested: aexpp
------------------------------------------------------------------------------
Instrumented: aexpp
Included instruments: lunch lunchsq lenrol lenrolsq y95 y96 y97
Excluded instruments: lfound
Dropped collinear: y94 y98
------------------------------------------------------------------------------

GMM on static models

Hi, I was wondering whether it is possible to use GMM (xtabond2) on a static model. I am aware that GMM on dynamic models requires GMM-style instruments to deal with the lagged dependent variable in the model. However, suppose we do not include y(t-1) in the model; can we still use the xtabond2 command? I am also aware that GMM models can be used to treat variables measured with error (using the within transformation rather than differencing).

Thursday, February 27, 2020

qlqc30 error

Can anyone explain/solve this problem?

. qlqc30, filename("mira_feb_2020_kun_baseline.dta") version(3) grp(1)
(ComparisonOfTransana_DATA_NOHDRS_2020-02-27_1318.csv)
_xlshwritestrcol(): 9901 Stata returned error
export_excel_write_file(): - function returned error
export_excel_export_file(): - function returned error
<istmt>: - function returned error

r(9901);

end of do-file

r(9901);

Thanks,
Therese

randomly choosing a sub-group of items

I would like to select 10 variables at random from a set of 42 variables, and I wonder how I could do that in Stata 14.2. Basically, we asked 42 questions measuring similar attitudes and would like to use a random sample of them; otherwise we would have to run many analyses. They do not fit well into factors and the alpha is quite low, so we want to use only a subgroup of them. Is it possible to do this in Stata? Could you please advise a command for it? Thanks a lot.



----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(PostCons28 PostCons31)
3 2
3 3
5 1
4 2
1 1
1 1
5 1
end
------------------ copy up to and including the previous line ------------------
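One possible sketch, assuming (as in the data example) that the 42 items share the PostCons stub so that unab can collect their names; the macro-list loop draws 10 names without replacement:

Code:
* collect the candidate variable names (assumes a common PostCons stub)
unab items : PostCons*
set seed 12345
local chosen
forvalues j = 1/10 {
    local n : word count `items'
    local k = runiformint(1, `n')      // random position in the remaining list
    local pick : word `k' of `items'
    local chosen `chosen' `pick'
    local items : list items - pick    // drop it so it cannot be drawn again
}
display "`chosen'"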

Multiple industry variables

Hello,

I'm using STATA 15, SE.

I'm facing an issue trying to come up with a way to control for industry fixed effects in my regression and other analyses; the problem is that a firm can belong to several industries at the same time. That is, I have this:

Code:
firm_id  Industry1  Industry2   Industry3          Industry4
A        Auto       Mining      .                  .
B        Auto       publishing  Telecommunication  Services
C        Mining     publishing  Services           .
D        High-Tech  Services    .                  .
As you can see, firms can have multiple industry classifications, and once they do, it is in alphabetical order.

I'd like to be able to control for industry, such that each firm that is represented in that industry will be included. Meaning, if I run the regression:

reg y x i.industry

All firms that are in the Auto industry are given a coefficient, all firms in the mining industry are given a coefficient, and so on.

Another way to think of this is to say I run the following command:

bys industry: sum XXX

I couldn't find an answer to this question so I would really appreciate your thoughts on this.
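Since a firm can carry several classifications at once, i.industry will not work directly; one sketch is to build a separate 0/1 indicator for each industry from the four string columns and include those indicators in the regression (the industry spellings below are illustrative and must match the ones used in the data):

Code:
* one dummy per industry, equal to 1 if that name appears in any of the four columns
local industries Auto Mining Publishing Telecommunication Services High-Tech
local i = 0
foreach ind of local industries {
    local ++i
    gen byte ind_`i' = 0
    label variable ind_`i' "`ind'"
    foreach v of varlist Industry1-Industry4 {
        replace ind_`i' = 1 if lower(strtrim(`v')) == lower("`ind'")
    }
}
regress y x ind_*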


Thank you.

no room to add more variables

Hi, I am using the PSID data set, which I believe is very large. Stata tells me:

no room to add more variables
Up to 2,048 variables are allowed with this version of Stata. Versions are
available that allow up to 120,000 variables.

What should I do?
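If only a subset of the PSID variables is actually needed, Stata can read just those variables from the file instead of loading everything; a minimal sketch with hypothetical variable and file names:

Code:
* list the variables stored in the file without loading it
describe using "psid_file.dta"
* load only the variables needed for the analysis (names are hypothetical)
use ER30001 ER30002 age income using "psid_file.dta", clear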

Reshaping data from wide to long panel-data

Good day,

I recently reshaped my dataset from wide to long into balanced panel dataset. Clyde Schechter has been so gracious in helping me and I have a follow up on my post on Creating a year identifier for pre-post analysis to use for diff-in-diff but focused on the reshaping aspect of the response.

I am using panel data (balanced), where the unit of analysis is the county. Variables have observations for the years 2008-2018, but my period of interest is 2011-2017. I reshaped the data from wide to long. Some of the variables report data for each year (for example, the # of FQHCs reported for each year 2011-2018) and some carry 5-year estimates (for example: veteran and non-veteran education level 2012-2016, population by race and gender 2011-2017). I also have variables that reflect the count/percentage/total # of observations over a period of time (for example, the number of black females).

Here is the dataset before the reshaped:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long povertystat12 byte ruralclinic10 long vet_hs12 int vet_nohs12
 54598 0  4778 254
196640 0 18324 942
 23641 1  1532 208
 20603 2  1111 159
 57099 2  3933 527
 10154 1   351  55
 19977 2  1172 160
112690 0  9421 814
 33503 0  2028 304
 25465 1  1923 303
 43301 0  2682 288
 13091 1   819 107
 24448 5  1125 150
 13106 0   890 197
 14726 0   602 136
 50255 0  5786 398
 53910 0  3740 429
 12622 0   728  92
 10497 1   810  71
 37016 2  2729 308
 13662 1   786 111
 80126 1  4900 604
 48338 0  5918 309
 40895 0  1930 196
 70108 2  3308 399
end
label var povertystat12 "# Pers w/Pov Status Determined 2012-16" 
label var ruralclinic10 "# Rural Health Clinics 2010" 
label var vet_hs12 "Veterans 25+ w/HS Dipl or more 2012-16" 
label var vet_nohs12 "Veterans 25+ w/< HS Diploma 2012-16"
When running a tabstat of the data looking at the total number of persons with veteran education

Code:
 tabstat vet_hs12, stat(sum) format(%14.0fc) c(v)

   stats |  vet_hs12
---------+----------
     sum |    18,018,157
--------------------
After reshaping the data, the data looks like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte year int vetedu_hs byte ruralclinic long vetedu_hsplus
12  132  0   981
12   59  0   585
12  197  0  1608
12    1  1   159
12   16  2   314
12    7  0    52
12  311  0  2752
12  840  0 12952
12  157  0  1208
12  143  0  1498
12    2  0    48
12   87  3  1200
12 2140  0 29503
12   78  0   341
end
label var year "Reshaped Year variable panel data" 
label var vetedu_hs "Veterans 25+ w/< HS Diploma 2012-16" 
label var ruralclinic "# of rural health clinics 2010-17" 
label var vetedu_hsplus "Veterans 25+ w/HS Diploma+ 2012-16"
Here are the results of the sum of the variable vetedu_hs by year, which reflects the number of veterans with a high school education by year. What I know is wrong: the original variable vet_hs12 recorded the number of veterans with a high school diploma for 2012-2016, but after the reshape it is only populated in year 12 (2012).

Code:
. tabstat vetedu_hs, by (year) stat(sum)
Summary for variables: vetedu_hs
     by categories of: year (Reshaped Year variable panel data)

    year |       sum
---------+----------
      11 |         0
      12 |   1328412
      13 |         0
      14 |         0
      15 |         0
      16 |         0
      17 |         0
      18 |         0
---------+----------
   Total |   1328412
--------------------

It is clear something is missing; I am befuddled. I suspect I need to loop over the reshaped data and populate the values across a range of years. I talked about this in post #4 of Creating a categorical variable from multiple numeric variables, but I do not think that is the correct approach. Should I not have included the -vet_hs12- variable in the reshape? I am not sure how I would have been able to build my model if I did not capture the fact that my panel data contain 5-year estimates.
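One possible sketch, assuming a county identifier named countyid (hypothetical) and the two-digit year coding shown above: copy the 5-year estimate from the single year where it landed (year 12) to the full range of years it covers.

Code:
* spread the 2012-16 estimate, currently stored only in year 12, across 2012-2016
bysort countyid (year): egen vetedu_hs_5yr = max(cond(year == 12, vetedu_hs, .))
replace vetedu_hs = vetedu_hs_5yr if inrange(year, 12, 16) & missing(vetedu_hs)
drop vetedu_hs_5yr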

Hope this is clear.

Thanks

Rene
Stata 12 on MAC OS (but also have access to Stata 15 on Windows)

Recoding Values of A Variable based on conditions

Hello,
I am stuck with a problem. I hope someone can help me to find the answer.
The first part of the problem is: write code that prints the numbers from 1 to 100.
I have used the following commands to create the sequence
set obs 100
gen seq = int(_n)

The second part of the problem is where I am really stuck. It says: but for multiples of four print "X" instead of the number, and for multiples of five print "Y". For numbers which are multiples of both four and five, print "XY".

I don't know how to solve the second part. Can anyone help me with the second part?
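A minimal sketch of the second part using mod(): hold the number as a string by default, then overwrite it for multiples of 4, of 5, and of both (i.e. multiples of 20), in that order.

Code:
clear
set obs 100
gen seq = _n
gen result = string(seq)
replace result = "X"  if mod(seq, 4) == 0
replace result = "Y"  if mod(seq, 5) == 0
replace result = "XY" if mod(seq, 20) == 0
list result, noobs clean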


Bootstrap with 2 way fixed effects in panel data using reghdfe

Dear Statalist,

I am estimating a model of House Price shocks on divorce rates. I am using the reghdfe command with county and time fixed effects, and clustering standard errors at the county level

Code:
reghdfe divorce_rate HPShock female_labourforce unemployment_rate degree under10 over45hours white weekly_pay, absorb(county time) vce(cluster county)
I want to run a bootstrap because my main variable of interest, the house price shock, is by nature a constructed estimate. However, I know that reghdfe currently supports the bootstrap only with one-way fixed effects.

Does anyone know of a way to work around this?
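One workaround that is sometimes suggested (only a sketch; I have not verified that it behaves correctly with reghdfe and this model) is to run the cluster bootstrap yourself with the bootstrap prefix, resampling whole counties via cluster() and idcluster() and absorbing the resampled county identifier:

Code:
* hand-rolled cluster bootstrap over counties; bs_county identifies resampled clusters
bootstrap, reps(200) seed(12345) cluster(county) idcluster(bs_county): ///
    reghdfe divorce_rate HPShock female_labourforce unemployment_rate degree ///
        under10 over45hours white weekly_pay, absorb(bs_county time)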

Best,
Jamie

Calculate normalized euclidean distance

Hi all,

The Euclidean distance matrix could be obtained by using
Code:
use http://www.stata-press.com/data/r10/iris.dta,clear
matrix diss D = seplen sepwid petlen petwid, L2
How can I calculate the normalized Euclidean distance according to the formula below?
[attached image: the normalized Euclidean distance formula from the paper]
This formula is coming from the paper http://www.public.asu.edu/~huanliu/p...pakdd00clu.pdf page 4.
The tricky thing is that the difference between two observations on a variable k is divided by the range (max_k - min_k) of that variable before the distance is calculated.
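Since dividing each pairwise difference by a variable's range is the same as range-scaling that variable first, one sketch is the following (it implements only the range-scaling step described above; any further scaling in the paper's formula would still have to be added):

Code:
use http://www.stata-press.com/data/r10/iris.dta, clear
* range-scale each variable so that L2 distances on the scaled variables
* equal distances built from (x_i - x_j)/(max - min)
foreach v of varlist seplen sepwid petlen petwid {
    quietly summarize `v'
    generate double z_`v' = `v' / (r(max) - r(min))
}
matrix dissimilarity D = z_seplen z_sepwid z_petlen z_petwid, L2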

Thank you so much for helping!

Best regards,
Jack

Frequency weight in a before and after graph (twoway pccapsym)

Hi.

I have a data set where I look at the number of events before and after a certain time point. I want to show this in a before-and-after graph with different colors for each group. I came a long way using twoway pccapsym. However, many of the observations have the same number of events before and after, and therefore the lines are overlapping. I am looking for a way to visualize how many observations overlap. I thought I might use frequency weights, but I cannot figure out how to use them in a graph made by twoway pccapsym.
In this context, I can't figure out how to make variables that contain the weight information for before and after and add that information to a graph where the markers (dots) and/or lines are weighted according to frequency.

Small example of data set:
id Group before after
1 2 0 2
2 1 3 1
3 3 2 2
4 2 2 2
5 3 1 3
6 1 1 3
7 1 3 3
In the picture you can see how my graph (w. all data) looks without frequency weight. An example of the code for the graph is:

gen byte one = 1
gen byte two = 2
twoway (pccapsym before one after two if group == 1) (pccapsym before one after two if group == 2) (pccapsym before one after two if group == 3)
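One sketch that builds on the code above: tally how many observations share each before/after pattern within a group, keep one row per pattern, and overlay markers whose size is scaled by that frequency through aweights (which -scatter- supports; I have not checked whether pccapsym itself takes weights). Shown for group 1 only; repeat the overlay for the other groups.

Code:
* run after the -gen one/two- lines above
preserve
bysort group before after: gen freq = _N
bysort group before after: keep if _n == 1
twoway (pccapsym before one after two if group == 1)                     ///
       (scatter before one [aweight = freq] if group == 1, msymbol(Oh))  ///
       (scatter after  two [aweight = freq] if group == 1, msymbol(Oh))
restore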



[attached graph: the before/after plot without frequency weights]
I hope you can help me.

Thanx


Creating a new variable based on the value of other variables

Hi all,

I have created a categorical variable (values 1 through 5) in excel based on certain characteristics of US states in a 20 year time period.

I have a large data set with over 2 million rows and I am trying to assign the value of the categorical variable to each row in my data set based on the state and the year.

What I have tried so far is:

gen state_year = 1
replace state_year = 5 if (year==1962 & state==1)

And so on etc.


However, since there are 1,000 combinations of states and years (50 * 20), this is going to take a very long time to code in a do-file.

Is there a way to import my excel file so that it creates a variable and assigns the value of that variable depending on what is in the excel?
I have attached a screenshot of the first few rows of my excel document.
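One sketch: import the Excel lookup sheet once, save it as a temporary dataset, and merge it onto the main data by state and year (the file and column names below are hypothetical):

Code:
* build a lookup dataset from the Excel sheet (hypothetical file/column names)
preserve
import excel using "state_year_categories.xlsx", firstrow clear
keep state year category
tempfile cats
save `cats'
restore
* attach the category to every row of the main data by state and year
merge m:1 state year using `cats', keep(master match) nogenerate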


Any help would be much appreciated. Thanks in advance!

Harry

Selected Graphs for Panel Data Categories

Hi,
I'm new to Stata and I'm trying to get some panel-times series graphs for selected countries.
Specifically, I got a dataset comprising 180 countries and 26 years (1990-2015). Now I'm trying to get a simple line graph displaying one of my variables (tariffs) over time, but not for all countries, just selected ones (say, China, Rwanda, Bangladesh - they also have unique numerical identifiers if that helps).
However, every time I draw a graph, Stata either draws lines for all countries or for just one.
Is there some handy way to get this done?
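A minimal sketch using overlaid line plots restricted with if (the country, year, and tariffs variable names are illustrative):

Code:
* one line per selected country on the same axes
twoway (line tariffs year if country == "China", sort)       ///
       (line tariffs year if country == "Rwanda", sort)      ///
       (line tariffs year if country == "Bangladesh", sort), ///
       legend(order(1 "China" 2 "Rwanda" 3 "Bangladesh"))    ///
       ytitle("Tariff") xtitle("Year")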
Thanks in advance!

attnd

Hi
I am using survey data and I want to use sample weights for estimating attnd and attk after running psmatch2.
How can I do that?
Thanks

Gravity model - PPML fixed-effect estimation

Dear all,

I am doing a study on how Brexit can affect EU exports using the gravity model. My sample is a panel dataset in which 28 EU countries export to all countries (including EU countries) across 41 sectors (as defined by GTAP) from 1988 to 2018. My independent variables are traditional gravity variables such as contiguity, distance, and GDP. My dummy/interest variables are an "EEA" dummy indicating whether the importer is an EEA (European Economic Area) member; an "auto" dummy equal to 1 if the trade is in the automotive industry; and the interaction term EEA_auto.
I use PPML estimator and exporter- and importer-time fixed effects for the estimation. My code is as follows:

egen exp_time = group(exp year)
quietly tabulate exp_time, generate(exp_time_fe)

egen imp_time = group(imp year)
quietly tabulate imp_time, generate(imp_time_fe)

*(1) Fixed effects including GDP variables (Just to compare)
ppml tradevalue gravity_variables_withGDP EEA auto EEA_auto exp_time_fe* imp_time_fe*, cluster (dist)

*(2) Fixed effects without GDP variables (The specification is used for interpretation)
ppml tradevalue gravity_variables_noGDP EEA auto EEA_auto exp_time_fe* imp_time_fe*, cluster (dist)

The coefficient of the EEA dummy changes substantially from (1) to (2), which makes the effect of the EEA dummy so large that I cannot interpret it. I acknowledge that this is probably a problem due to multicollinearity and that I should not interpret it like an ordinary explanatory variable, but it is my variable of interest, so I would like to keep it; however, I do not know exactly how to fix this issue.

Then I also try to include only exporter and importer and year fixed effects instead of exporter- and importer-time fixed effects. My code is:

egen exp_1 = group(exp)
tabulate exp_1, generate(exp_fe)

egen imp_1 = group(imp)
tabulate imp_1, generate(imp_fe)

egen year_1 = group(year)
tabulate year_1, generate(year_fe)

*(3)
ppml tradevalue gravity_variables_withGDP EEA auto EEA_auto ///
exp_fe* imp_fe* year_fe* ///
, cluster(dist)

*(4)
ppml tradevalue gravity_variable_noGDP EEA auto EEA_auto ///
exp_fe* imp_fe* year_fe*, cluster(dist)

All parameters look much more consistent between (3) and (4) and the effects of my interest variables are perfectly fine to interpret.

My question is: Is it appropriate that I control for exporter and importer and year fixed effects instead of exporter- and importer-time fixed effects and use that result to interpret? If not, could you please give me some advice on how to still keep exporter- and importer-time fixed effects and obtain a better result?

Note: I already tried ppml_panel_sg, but it did not work for me since it absorbs my interest variables. Therefore, I stuck with the normal ppml command.

Thank you very much in advance for your time and consideration!