Saturday, November 30, 2019

Help Fixing Bar Graph of GDP and Dummy Variables

Dear all,

I am using Stata 16 on a Mac. I used the following command in Stata: reg lnGDP lnagedpop lnpopulation WestCoast EastCoast, robust. All of the data covers the 50 states and D.C. over a span of three years (2016-2018). lnagedpop represents the (log) population that is 65 and older. The dummy variable WestCoast = 1 if the U.S. state is located on the West Coast and 0 if not; similarly, the dummy variable EastCoast = 1 if the U.S. state is located on the East Coast and 0 if not. Midwest is my reference category.

I am trying to create a graph based on the regression model I estimated: a bar graph with lnGDP on the y-axis and the dummy variables WestCoast and EastCoast on the x-axis. I used the command graph bar (mean) lnGDP, over(WestCoast) over(EastCoast), but my graph doesn't look right. Could someone help me fix my bar graph, or suggest another graph I could use instead so that it looks better?

[attachment: screenshot of the current bar graph]


Thank you in advance for your help


Jason Browen
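
One way to get a cleaner bar graph is to combine the two dummies into a single region variable and plot mean lnGDP over that. A minimal sketch, using only the variables described above:

Code:
* build one categorical region variable from the two coast dummies
gen region = 3                       // Midwest (the reference category)
replace region = 1 if WestCoast == 1
replace region = 2 if EastCoast == 1
label define region 1 "West Coast" 2 "East Coast" 3 "Midwest"
label values region region
graph bar (mean) lnGDP, over(region) ytitle("Mean lnGDP")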

Trouble Generating a Variable Dependent on Period

[attachment: screenshot of a subset of the data]

Above is a picture of a subset of my data. hs10 refers to a specific product being sold and m_val is a value the product takes on. What I need to do is generate a variable that is the log difference of m_val between periods 1 and 2. The problem, however, is that a number of products don't have m_val for both periods. For those products, I want the generated value to simply be the log of m_val.

I am lost with where to start!
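
A minimal sketch of one way to do this, assuming the period variable is called period and there is one observation per product and period (both assumptions, since the screenshot is not reproduced here):

Code:
bysort hs10 (period): gen double ldiff = log(m_val) - log(m_val[_n-1]) if _N == 2 & _n == 2
bysort hs10 (period): replace ldiff = log(m_val) if _N == 1

This puts the log difference on the period-2 row for products observed in both periods, and log(m_val) on the single row for products observed only once.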

Stata returning missing when converting string date

Dear all,
I have dates in two formats in a single variable: one is 20-Apr-19, the other is 21-APRIL-2019. Stata returns missing when converting 20-Apr-19 but converts 21-APRIL-2019 properly. What could be the problem?
I am using this command
Code:
gen date2=date(var1, "DMY")
Thank you,
Oscar
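
A likely explanation: with the mask "DMY", a two-digit year such as 19 is read literally as the year 19, which date() cannot represent, so the result is missing. Adding a century prefix to the mask handles the two-digit years and leaves four-digit years untouched. A sketch:

Code:
gen date2 = date(var1, "DM20Y")
format date2 %td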

Different I-square results for meta summarize and meta forestplot using random empirical Bayes method

Hi

The meta forestplot using random(ebayes) appears to produce a different value for the I2 result compared to the output for meta summarize using the same method. Using different methods (fixed dlaird etc) produces the same I2 in the meta summarize and forestplot but not when using random(ebayes).

Is there an issue with meta forestplot?

Thanks

"Data have changed" dialog box

I searched the forum for this, but I didn't find anything. Sorry if it's a repeat question.

In my workflow, I work from original datasets and alter them with Stata code. As a result, I never save altered Stata datasets. I'm in Windows 10, and every time I close the Stata window, a dialog box informs me that "Data in memory have changed. Do you want to save the changes before exiting?" In previous versions of Stata, there was a keyboard shortcut for the "No" option, so it wasn't too big of a pain to repeatedly deal with that dialog box. But in Stata 16, that shortcut no longer exists, and now it's a big pain because every single time I close a Stata window, I have to mouse over and click "No". It's an unnecessary pain point, and in my workflow it happens a lot. Is there some sort of setting, option, or other method I can use to make Stata stop offering me that dialog box? Or at least restore the keyboard shortcut?
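
One workaround that avoids the dialog entirely: typing exit, clear in the Command window closes Stata and discards the unsaved data in memory without prompting.

Code:
exit, clear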

meta forestplot error in Stata 16

Hello

I am running several meta forestplots for subgroups, which worked fine in the version of Stata 16 I had on 14 Nov.

meta forestplot _id outcome_name trimester _plot _esci _weight, ///
subgroup(sga_order) eform ///
esrefline(lcolor(green) lpattern(dash)) ///
nullrefline(favorsleft("Decreased risk") ///
favorsright("Increased risk")) insidemarker nonotes ///
random(ebayes) crop(. 2)

meta forestplot _id outcome_name trimester _plot _esci _weight, ///
subgroup(outcome) eform ///
esrefline(lcolor(green) lpattern(dash)) ///
nullrefline(favorsleft("Decreased risk") ///
favorsright("Increased risk")) insidemarker nonotes ///
random(ebayes) crop(. 2)

The first command produces a plot; the second, however, produces an error saying that a _meta* variable cannot be found:

Effect-size label: Log Odds-Ratio
Effect size: logor
Std. Err.: se
Study label: author

variable _meta* not found
r(111);

end of do-file

r(111);

. update meta
invalid syntax
r(198);

Any suggestions on what I am doing wrong? Thanks
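
Two hedged diagnostics that may help narrow this down (standard commands, not a confirmed fix): confirm that the data in memory are still declared for meta-analysis, and make sure Stata 16 itself is fully up to date (the relevant command is update all rather than update meta).

Code:
meta query      // shows the current -meta set- / -meta esize- declaration, if any
update all      // updates Stata and its official commands, including the meta suite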

Searching within strings

I have a strL variable (pgttype) that contains results such as:

"Aneuploidy CRMI 13, 15, 16, 17, 18, 21, 22, X, Y"
"Translocation 46,XX,t(12;21)(q24.33;q22.13)"
"Aneuploidy CRMI 13, 15, 16, 17, 18, 21, 22, X, Y, Add Chromosome 14, Translocation 45,XY,der(13;14)(q10;q10), Gender Selection Pt Choice F"

I am trying to find a way to search within these long string values to create a new variable (pgttype1) with discrete categories such as:
1) contains "aneuploidy" but does not contain "translocation"
or
2) contains "translocation" but does not contain "gender selection"

Can't seem to get it to work with regexm or strpos...

Any ideas?

Thanks in advance
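
A minimal sketch of the categorization described above, using strpos() on a lowercased copy of the text so the matching is case-insensitive (variable names taken from the post):

Code:
gen byte pgttype1 = .
replace pgttype1 = 1 if strpos(lower(pgttype), "aneuploidy")    & !strpos(lower(pgttype), "translocation")
replace pgttype1 = 2 if strpos(lower(pgttype), "translocation") & !strpos(lower(pgttype), "gender selection")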

Method for continuous predictor and dichotomous dependent variable

Hi everyone,

I am working on a research project asking: Do lower rates of school satisfaction and school engagement influence students’ suspension and expulsion?

School satisfaction and school engagement are the independent variables and are continuous (Likert scale, strongly disagree to strongly agree).

Suspension and expulsion are ever/never dichotomous variables.

Would I use an ordinal logistic regression? If so how would I go about that?
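
For what it's worth, an ever/never outcome is binary rather than ordinal, so the usual starting point would be a binary logistic regression. A sketch with hypothetical variable names (suspended, satisfaction, and engagement are placeholders, not names from the data):

Code:
logit suspended c.satisfaction c.engagement
margins, dydx(*)    // average marginal effects on the probability of suspension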

simulation study

So I have to write a short program for a simulation study. So far I have written:

program define mysim
drop _all
set obs 19
gen b = inlist(0, 0.1, 0.4)
more
gen u = rnormal(0,1)
Now the part I am stuck on is how to put x_i = i into the simulation.
The question is y_i = B*x_i + u_i, with x_i = i and u_i ~ N(0,1) for i = 1, ..., n,
where B = {0, 0.1, 0.4}.
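
A hedged sketch of one way to set this up, passing B as an argument so the same program can be run for 0, 0.1, and 0.4 (the sample size of 19 is taken from the code above):

Code:
capture program drop mysim
program define mysim
    args b                      // the slope B, passed when calling the program
    drop _all
    set obs 19
    gen x = _n                  // x_i = i
    gen u = rnormal(0,1)        // u_i ~ N(0,1)
    gen y = `b'*x + u           // y_i = B*x_i + u_i
end

mysim 0.1                       // run once with B = 0.1; repeat for 0 and 0.4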

Set Seed Random Sample

I would like to obtain a nationally representative sample from a (non representative) survey data set thereby using every individual from the given survey data set. To do so I am creating a 100% random sample with the corresponding weights by using gsample 100 [aw=weight], percent. I am not expanding the data by using the weights first and then drawing a random sample with other commands (e.g. sample) since the data set that I start with is already quite big. The problem is that I am not getting the same sample every time I draw the sample. I tried to set the seed at various positions, at the beginning of the do-file or right before the gsample command. I also tried to sort the data according to a unique identifier right before drawing the sample and also setting the seed right before that. But nothing has helped to have the same random sample when I draw the sample again and again. Any help would be appreciated. I cannot post my code before using gsample command since it is quite long, but some commands involve sorting the data.
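
A hedged sketch of the usual reproducibility recipe: fix both the random-number seed and the sort seed, and sort on a variable that uniquely identifies observations immediately before sampling (unique_id is a placeholder name):

Code:
set seed 12345
set sortseed 54321
sort unique_id
gsample 100 [aw=weight], percent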

Multiline model name for long variable names using esttab

Hi users,

I am using esttab to generate tables in LaTeX. I have long model names that I want to appear on two lines, because keeping them on one line pushes the last column out of the table frame. Here is my code:

Code:
esttab high low diff Total , ///
cells("mean(pattern(1 1 0 1) fmt(2)) b(star pattern(0 0 1 0) fmt(2))") ///
label mlabels("Mean treated three or more times" "Mean treated one or two times" "Diff" "Total") ///
collabels(none) replace
This produces the following table for me:

Code:
---------------------------------------------------------------------------
                              (1)          (2)          (3)             (4)
                     Mean treat~s Mean treat~s         Diff           Total
---------------------------------------------------------------------------
Teacher with highe~e         0.12         0.11        -0.01            0.11
Teacher per student          2.64         2.24        -0.40            2.40
Avg class size              37.98        38.38         0.40           38.23
Total enrollment           510.48       569.82        59.34          546.52
Total female enrol~t       124.76       216.54        91.78*         180.51
Number of students~o        99.10       109.23        10.14          105.25
School age                  22.71        22.02        -0.70           22.29
Number of shifts             1.71         1.86         0.15            1.80
Average score at t~e       202.50       195.61        -6.89          198.31
---------------------------------------------------------------------------
Observations                   42           65          107             107
---------------------------------------------------------------------------
You can see that the names in columns 1 and 2 are long, which pushes column 4 out. I need to split the names in columns 1 and 2 across two rows. I would appreciate it if anyone could help me with that.
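
One possibility for the LaTeX output, under the assumption that esttab passes the label text through to the .tex file unchanged: wrap the long titles in plain LaTeX \shortstack{} so they break across two lines inside the column. For the text preview shown above, the modelwidth() option widens the columns instead.

Code:
esttab high low diff Total using table.tex, replace ///
cells("mean(pattern(1 1 0 1) fmt(2)) b(star pattern(0 0 1 0) fmt(2))") ///
label collabels(none) ///
mlabels("\shortstack{Mean treated\\three or more times}" "\shortstack{Mean treated\\one or two times}" "Diff" "Total")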

Clustered Errors and fixed effect on the same level

Hello Statalist-Forum,

I was wondering if you could help me with a problem:
I have a continuous outcome variable (a health score) and information about incidents per governorate in a country. I want to describe the relationship between the two (obviously adding more variables at a later point).
My question: Do I include governorate fixed effects in this regression, even though the incidents variable is measured at exactly that level (one dummy per governorate)? Additionally, do I cluster the SEs?
Basic Code:
reg health incidents
1. alternative
reg health incidents i.governorate
2. alternative
reg health incidents i.governorate, vce(cluster gov)

Like I said, the information on the incidents is also at the governorate level (one number for each governorate).
It feels as though this would create a problem, or something circular, but I cannot explain to myself econometrically whether it really is a problem or whether it is fine.
Thanks a lot in advance

Division into two groups

Dear statalists,

I am running an analysis on an unbalanced dataset covering 10 years. I would like to see whether the results differ for two groups of firms based on net income across years. For this reason, I want to run the analysis separately for the dependent variables of two groups of firms: high-income and low-income (based on the lagged median of net income). However, I am stuck, because the command "bysort time: egen MNincome = median(l.netincome)" gives me the error "not sorted", although I sorted the data beforehand.
Could you please kindly advise how to overcome this error? Or maybe there is a different, simpler approach to run that kind of analysis, without creating dependent variables for each group (e.g. "gen employeesgroup1 = numberofemployees if MNincome > 10", where 10 is the median obtained from "sum netincome, detail" as the 50th percentile).

Thank you in advance!
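
A hedged sketch of one way around the error: create the lag first under the panel's own xtset sort, and only then compute the yearly median (firm_id is a placeholder for the panel identifier; time is the variable used in the post):

Code:
xtset firm_id time
gen double lag_netincome = L.netincome
bysort time: egen MNincome = median(lag_netincome)
gen byte highincome = netincome > MNincome if !missing(netincome, MNincome)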

one year survival - data management

good day all

I have a list of children with 2 variables "date of birth" and "date of death". Both are recorded as numeric daily date (int); for example "13894" is "15jan1998".

I would like to make a new variable called "1-year survival" with a binary outcome ("yes"/"no") using the aforementioned variables. Thank you!
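
A minimal sketch, assuming the two variables are named dob and dod and that a missing date of death means the child survived (both are assumptions):

Code:
gen byte survival1yr = .
replace survival1yr = 1 if !missing(dob, dod) & dod - dob >= 365
replace survival1yr = 0 if !missing(dob, dod) & dod - dob < 365
replace survival1yr = 1 if !missing(dob) & missing(dod)   // assumed alive beyond one year
label define yesno 0 "no" 1 "yes"
label values survival1yr yesno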

Comparing two waves with RE-Logit

I am currently using panel data from the SOEP to analyze and compare political interest in 2013 and 2017. My final dataset contains around 10 variables and 21,444 observations from two waves (2013 & 2017). My DV is a binary variable equal to 1 if the individual is interested and 0 otherwise. Below is a description of my variables:

Code:
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
interested      float   %13.0g     interested
                                              Interested(0/1)
polpart         float   %9.0g      polpart    Participates (0/1)
hhinc_eqr       float   %9.0g                 Real Equivalized HH-Inc. in thousand €
female          float   %9.0g      female     Female (0/1)
age             int     %8.0g                 Age of Individual
west            float   %9.0g      west       West-Germany(0/1)
y2013           float   %9.0g      y2013      Pre crisis(0/1)
y2017           float   %9.0g      y2017      Post crisis(0/1)
unemployed      float   %10.0g     unemployed
                                              Unemployed(0/1)
edyears         float   %9.0g                 Number of Years of Education
party_pref      float   %13.0g     party_pref
                                              Party preference(0/1)
worried         float   %11.0g     worried    
hhinc_group     float   %9.0g      hhinc_group
                                              Income Groups
hhsize          byte    %8.0g                 Number of Persons in HH
persnr          long    %12.0g                Unveraenderliche Personennummer (PID)
syear           int     %12.0g                Befragungsjahr
I first ran a pooled logistic regression with a full year-dummy interaction to estimate the changes between 2013 and 2017:
Code:
. logistic interested i.y2017##c.age i.y2017##ib(2).hhinc_group i.y2017##i.west ///
> i.y2017##i.female i.y2017##i.party_pref i.y2017##i.unemployed ///
> i.y2017##i.worried i.y2017##c.edyears, vce(cluster persnr)

Logistic regression                             Number of obs     =     21,444
                                                Wald chi2(19)     =    3116.28
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -11933.143               Pseudo R2         =     0.1783

                                 (Std. Err. adjusted for 15,309 clusters in persnr)
-----------------------------------------------------------------------------------
                  |               Robust
       interested | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
            y2017 |
            2017  |   1.959571   .4097247     3.22   0.001     1.300721    2.952146
              age |   1.028058   .0016503    17.24   0.000     1.024828    1.031297
                  |
      y2017#c.age |
            2017  |   .9953376   .0017759    -2.62   0.009      .991863    .9988244
                  |
      hhinc_group |
            Poor  |    .841905   .0772755    -1.87   0.061     .7032897    1.007841
            Rich  |   1.351389   .0803104     5.07   0.000     1.202805    1.518328
                  |
y2017#hhinc_group |
       2017#Poor  |   1.102225   .1224724     0.88   0.381     .8865227     1.37041
       2017#Rich  |   .8609955   .0585903    -2.20   0.028     .7534891    .9838406
                  |
             west |
            West  |   1.097271   .0653854     1.56   0.119     .9763183    1.233207
                  |
       y2017#west |
       2017#West  |   .9534032   .0610252    -0.75   0.456     .8409944    1.080837
                  |
           female |
          Female  |   .3994588    .020074   -18.26   0.000       .36199    .4408059
                  |
     y2017#female |
     2017#Female  |   1.080646   .0578372     1.45   0.147     .9730302    1.200164
                  |
       party_pref |
      preference  |   3.502175   .1792699    24.49   0.000     3.167863    3.871767
                  |
 y2017#party_pref |
 2017#preference  |   .9004705   .0542628    -1.74   0.082     .8001578    1.013359
                  |
       unemployed |
      unemployed  |   1.166827   .1479902     1.22   0.224      .910013    1.496117
                  |
 y2017#unemployed |
 2017#unemployed  |   .8746221   .1345562    -0.87   0.384     .6469451    1.182425
                  |
          worried |
         worried  |   .8425315   .0449318    -3.21   0.001      .758913    .9353632
                  |
    y2017#worried |
    2017#worried  |   1.066966   .0755296     0.92   0.360      .928741    1.225763
                  |
          edyears |   1.206524   .0125361    18.07   0.000     1.182202    1.231346
                  |
  y2017#c.edyears |
            2017  |   .9956702   .0111172    -0.39   0.698     .9741176      1.0177
                  |
            _cons |    .010937   .0020517   -24.07   0.000     .0075722    .0157971
-----------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
To account for the serial correlation and panel structure of my data, I then ran:

Code:
. xtlogit interested c.age##c.age ib(2).hhinc_group i.west i.female i.party_pref ///
> i.unemployed i.worried edyears i.y2017, re nolog or intpoints(32)

Random-effects logistic regression              Number of obs     =     21,444
Group variable: persnr                          Number of groups  =     15,309

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        1.4
                                                              max =          2

Integration method: mvaghermite                 Integration pts.  =         32

                                                Wald chi2(11)     =    1184.43
Log likelihood  = -11080.549                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
  interested | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.043214   .0135104     3.27   0.001     1.017068    1.070033
             |
 c.age#c.age |   1.000125   .0001226     1.02   0.308     .9998846    1.000365
             |
 hhinc_group |
       Poor  |   .7840031   .0976617    -1.95   0.051     .6141654    1.000807
       Rich  |   1.440353   .1243391     4.23   0.000     1.216154    1.705883
             |
        west |
       West  |    1.22893   .1197768     2.12   0.034     1.015232    1.487609
             |
      female |
     Female  |   .1370047   .0129937   -20.96   0.000     .1137644    .1649927
             |
  party_pref |
 preference  |   10.51701   .9246106    26.76   0.000     8.852339    12.49471
             |
  unemployed |
 unemployed  |   1.141778   .1908279     0.79   0.428     .8228457    1.584327
             |
     worried |
    worried  |   .8122069   .0655849    -2.58   0.010     .6933188    .9514815
     edyears |   1.541916   .0302456    22.08   0.000      1.48376     1.60235
             |
       y2017 |
       2017  |   1.840504   .1101531    10.19   0.000     1.636789    2.069573
       _cons |   .0000852   .0000392   -20.36   0.000     .0000346    .0002101
-------------+----------------------------------------------------------------
    /lnsig2u |   2.411774    .067891                       2.27871    2.544838
-------------+----------------------------------------------------------------
     sigma_u |   3.339721   .1133685                      3.124753    3.569477
         rho |   .7722266   .0119415                      .7479791    .7947813
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Note: _cons estimates baseline odds (conditional on zero random effects).
LR test of rho=0: chibar2(01) = 1719.38                Prob >= chibar2 = 0.000
However, I am not really sure how to estimate changes between 2013 and 2017 in the RE-Logit. My goal is to look at the predicted change in P(interested=1) between 2013 and 2017 for my whole sample, as well as for the three income groups.
Now I am wondering whether I should interpret this change by looking at the marginal effects of my year dummy, or whether I need to estimate a separate RE-Logit for each of the three income groups. AFAIK the year dummies will pick up any variation in the outcome that happens over time and is not attributed to other explanatory variables, BUT does it make sense to estimate it for different subpopulations? If not, what other possibilities do I have to compare my two waves?

Here is what I ran after my RE-Logit:
Code:
. margins, dydx(y2017) over(hhinc_group) coeflegend post

Average marginal effects                        Number of obs     =     21,444
Model VCE    : OIM

Expression   : Pr(interested=1), predict(pr)
dy/dx w.r.t. : 1.y2017
over         : hhinc_group

------------------------------------------------------------------------------
             |      dy/dx  Legend
-------------+----------------------------------------------------------------
0.y2017      |  (base outcome)
-------------+----------------------------------------------------------------
1.y2017      |
 hhinc_group |
       Poor  |   .0464933  _b[1.y2017:1bn.hhinc_group]
     Middle  |   .0518429  _b[1.y2017:2.hhinc_group]
       Rich  |     .05327  _b[1.y2017:3.hhinc_group]
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

.
end of do-file

. do "C:\Users\Lorenz\AppData\Local\Temp\STD1bbc_000000.tmp"

. test _b[1.y2017:1bn.hhinc_group] = _b[1.y2017:2.hhinc_group] = _b[1.y2017:3.hhinc_group]

 ( 1)  [1.y2017]1bn.hhinc_group - [1.y2017]2.hhinc_group = 0
 ( 2)  [1.y2017]1bn.hhinc_group - [1.y2017]3.hhinc_group = 0

           chi2(  2) =   50.06
         Prob > chi2 =    0.0000


Alternatively, I thought about running
Code:
xtlogit interested c.age##c.age ib(2).hhinc_group i.west i.female i.party_pref ///
i.unemployed i.worried edyears i.y2017 if hhinc_group==1, re nolog or intpoints(32)
I am unsure about the appropriateness of each of these approaches and how they differ in results and interpretation. Or should I include interaction terms with the year dummy in my RE-Logit ? I am thinking that in the case of full interaction, which is equal to estimating two separate equations for 2013 and 2017, all the within variation would be lost (?)
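
One possibility, sketched with the same variables as above: keep the year interaction with the income groups inside a single RE model and then ask margins for the group-specific year effects, rather than splitting the sample.

Code:
xtlogit interested i.y2017##ib(2).hhinc_group c.age##c.age i.west i.female ///
    i.party_pref i.unemployed i.worried edyears, re or intpoints(32)
margins, dydx(y2017) over(hhinc_group)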

I am currently at undergrad level and did my best to read all the material available, but right now I cannot find an answer as to what I should use for my analysis.

I would very much appreciate any input.

Best regards,
Lorenz

xtreg, fe robust: xtoverid error(2b) operator invalid when correcting Hausman test (V_b-V_B is not positive definite) due to year dummies

Dear Statalisters,

I am analyzing a panel dataset with year dummies over the period 2000-2018 and, apparently like many other Stata beginners, I came across the issue "(V_b-V_B is not positive definite)" when using the Hausman test to determine whether to use the FE or RE model.

I tried to follow suggestions of using xtoverid, and after installing the package I encountered the following issue:

Code:
. xtoverid
2b:  operator invalid
r(198);
Please find below my Hausman test and error message for xtoverid.

Notes: I already tried
Code:
hausman fe re, sigmamore
hausman fe re, sigmaless
but the "(V_b-V_B is not positive definite)" issue was not resolved.

Code:
 ** Redo HAUSMAN test to check for RE or FE model

. xtreg wROAf1 IV_CSPaggr geoentropy4rg indentropy CF_age CF_size_lnEmp CF_Levg CF_NPM CF_orgslack CC_WGI s_poptotal CC_WDI
> _PopGrowth s_GDP s_GDPpc CC_WDI_GDPPCgrowth s_fdi CC_WDI_ttUnemplRate i.yeardummy, fe
note: 17.yeardummy omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     18,964
Group variable: compid                          Number of groups  =      3,780

R-sq:                                           Obs per group:
     within  = 0.0346                                         min =          1
     between = 0.0089                                         avg =        5.0
     overall = 0.0011                                         max =         16

                                                F(30,15154)       =      18.13
corr(u_i, Xb)  = -0.6065                        Prob > F          =     0.0000

-------------------------------------------------------------------------------------
             wROAf1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
         IV_CSPaggr |  -.0227671   .0064973    -3.50   0.000    -.0355027   -.0100316
      geoentropy4rg |  -.6074356   .4636818    -1.31   0.190    -1.516308    .3014366
         indentropy |  -.0358456   .2862778    -0.13   0.900    -.5969847    .5252934
             CF_age |   .1352476   .0487664     2.77   0.006     .0396596    .2308355
      CF_size_lnEmp |  -1.551527   .1776825    -8.73   0.000    -1.899806   -1.203248
            CF_Levg |  -3.375775   .6100142    -5.53   0.000    -4.571477   -2.180074
             CF_NPM |   .0023169   .0010765     2.15   0.031     .0002068    .0044269
        CF_orgslack |  -.0353077   .0529923    -0.67   0.505     -.139179    .0685636
             CC_WGI |   2.715108    1.34704     2.02   0.044     .0747483    5.355469
         s_poptotal |  -.0184933   .0304806    -0.61   0.544     -.078239    .0412524
   CC_WDI_PopGrowth |   .1152938   .3522301     0.33   0.743    -.5751196    .8057072
              s_GDP |    .632364   .1654363     3.82   0.000     .3080889    .9566391
            s_GDPpc |  -1.391459   .2382413    -5.84   0.000     -1.85844   -.9244769
 CC_WDI_GDPPCgrowth |   .1399394   .0499854     2.80   0.005      .041962    .2379168
              s_fdi |   .0759093   .0944164     0.80   0.421    -.1091582    .2609769
CC_WDI_ttUnemplRate |   .0317622   .0728713     0.44   0.663    -.1110744    .1745988
                    |
          yeardummy |
              2003  |   1.718398    .660478     2.60   0.009     .4237814    3.013015
              2004  |   2.647522   .5736919     4.61   0.000     1.523017    3.772027
              2005  |   3.015123   .5307992     5.68   0.000     1.974692    4.055553
              2006  |   2.572717   .4936521     5.21   0.000     1.605099    3.540335
              2007  |   .1135621   .4749622     0.24   0.811    -.8174212    1.044545
              2008  |  -.1597079   .4313103    -0.37   0.711    -1.005128    .6857123
              2009  |   2.637556   .4797226     5.50   0.000     1.697242     3.57787
              2010  |    1.65895   .3939397     4.21   0.000     .8867807    2.431119
              2011  |   1.438345   .3699788     3.89   0.000     .7131416    2.163548
              2012  |     .55647   .3401479     1.64   0.102    -.1102609    1.223201
              2013  |   .4664979   .3052461     1.53   0.126    -.1318213    1.064817
              2014  |  -1.235379   .2731461    -4.52   0.000    -1.770778   -.6999793
              2015  |  -.5328104   .2561052    -2.08   0.038    -1.034807   -.0308134
              2016  |   .5114611   .2398972     2.13   0.033     .0412337    .9816885
              2017  |          0  (omitted)
                    |
              _cons |   15.21448   5.054748     3.01   0.003     5.306568     25.1224
--------------------+----------------------------------------------------------------
            sigma_u |  13.647115
            sigma_e |  7.0821539
                rho |  .78783095   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------
F test that all u_i=0: F(3779, 15154) = 6.53                 Prob > F = 0.0000

. estimate store fe

. xtreg wROAf1 IV_CSPaggr geoentropy4rg indentropy CF_age CF_size_lnEmp CF_Levg CF_NPM CF_orgslack CC_WGI s_poptotal CC_WDI
> _PopGrowth s_GDP s_GDPpc CC_WDI_GDPPCgrowth s_fdi CC_WDI_ttUnemplRate i.yeardummy, re

Random-effects GLS regression                   Number of obs     =     18,964
Group variable: compid                          Number of groups  =      3,780

R-sq:                                           Obs per group:
     within  = 0.0190                                         min =          1
     between = 0.0983                                         avg =        5.0
     overall = 0.0484                                         max =         16

                                                Wald chi2(31)     =     642.04
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

-------------------------------------------------------------------------------------
             wROAf1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
         IV_CSPaggr |  -.0104024   .0050298    -2.07   0.039    -.0202607   -.0005442
      geoentropy4rg |   .5555316    .326717     1.70   0.089    -.0848219    1.195885
         indentropy |    .040395   .2456707     0.16   0.869    -.4411107    .5219008
             CF_age |   .0183981   .0053646     3.43   0.001     .0078837    .0289125
      CF_size_lnEmp |   .6894453    .093907     7.34   0.000      .505391    .8734995
            CF_Levg |  -3.212073   .5162737    -6.22   0.000     -4.22395   -2.200195
             CF_NPM |   .0021792   .0004408     4.94   0.000     .0013153    .0030432
        CF_orgslack |  -.1645398   .0435363    -3.78   0.000    -.2498694   -.0792102
             CC_WGI |  -.5799745   .4522981    -1.28   0.200    -1.466462    .3065135
         s_poptotal |  -.0020881   .0010709    -1.95   0.051    -.0041871    .0000108
   CC_WDI_PopGrowth |   .1173416   .2527258     0.46   0.642    -.3779918     .612675
              s_GDP |    .013877   .0326286     0.43   0.671    -.0500738    .0778278
            s_GDPpc |  -.4957057   .1248403    -3.97   0.000    -.7403882   -.2510232
 CC_WDI_GDPPCgrowth |   .1474068   .0474048     3.11   0.002      .054495    .2403185
              s_fdi |   .1062363   .0854276     1.24   0.214    -.0611988    .2736714
CC_WDI_ttUnemplRate |   .0051797   .0428637     0.12   0.904    -.0788316     .089191
                    |
          yeardummy |
              2003  |   1.641583   .6825769     2.40   0.016     .3037567    2.979409
              2004  |   2.930763   .6200142     4.73   0.000     1.715558    4.145969
              2005  |   3.252471   .6017448     5.41   0.000     2.073073    4.431869
              2006  |   2.934455   .5998017     4.89   0.000     1.758865    4.110044
              2007  |   .3813269   .6038372     0.63   0.528    -.8021723    1.564826
              2008  |    .252396   .5885435     0.43   0.668    -.9011281     1.40592
              2009  |   3.200676   .6210972     5.15   0.000     1.983348    4.418004
              2010  |   2.266513   .5860811     3.87   0.000     1.117815    3.415211
              2011  |   1.933953   .5919703     3.27   0.001     .7737126    3.094193
              2012  |   1.211612   .5868921     2.06   0.039     .0613248    2.361899
              2013  |    1.19128   .5855447     2.03   0.042     .0436333    2.338926
              2014  |  -.3239355   .5834149    -0.56   0.579    -1.467408    .8195367
              2015  |   .6538636   .5763879     1.13   0.257    -.4758358    1.783563
              2016  |   1.603099   .5801637     2.76   0.006     .4659991    2.740199
              2017  |   1.087371   .5918516     1.84   0.066     -.072637    2.247379
                    |
              _cons |  -.2501154   1.308373    -0.19   0.848     -2.81448    2.314249
--------------------+----------------------------------------------------------------
            sigma_u |   9.609396
            sigma_e |  7.0821539
                rho |  .64801529   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------

. estimate store re

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        wROAf1[compid,t] = Xb + u[compid] + e[compid,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  wROAf1 |   112.5129       10.60721
                       e |    50.1569       7.082154
                       u |   92.34049       9.609396

        Test:   Var(u) = 0
                             chibar2(01) =  6033.36
                          Prob > chibar2 =   0.0000

. xtoverid
2b:  operator invalid
r(198);
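
For what it's worth, xtoverid is an older community-contributed command that predates factor-variable notation, and the "2b: operator invalid" message is typically triggered by the i. terms in the estimation. A common workaround (a sketch, not a guaranteed fix) is to expand the dummies with the old xi: prefix before re-estimating and then calling xtoverid:

Code:
xi: xtreg wROAf1 IV_CSPaggr geoentropy4rg indentropy CF_age CF_size_lnEmp CF_Levg ///
    CF_NPM CF_orgslack CC_WGI s_poptotal CC_WDI_PopGrowth s_GDP s_GDPpc ///
    CC_WDI_GDPPCgrowth s_fdi CC_WDI_ttUnemplRate i.yeardummy, re
xtoverid
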
Furthermore, I tried to follow suggestions from earlier discussions to use the Mundlak device, but I failed to carry out the corresponding test because all of the newly generated mean variables were omitted due to collinearity.

Code:
 ** Mundlak approach
egen CSPmean = mean(IV_CSPaggr)
egen Geoentmean = mean(geoentropy4rg)
egen Indentmean = mean(indentropy)
egen agemean = mean(CF_age)
egen sizemean = mean(CF_size_lnEmp)
egen levgmean = mean(CF_Levg)
egen npmmean = mean(CF_NPM)
egen orgslackmean = mean(CF_orgslack)
egen WGImean = mean(CC_WGI)
egen poptotalmean = mean(s_poptotal)
egen popgrowthmean = mean(CC_WDI_PopGrowth)
egen gdpmean = mean(s_GDP)
egen gdppcmean = mean(s_GDPpc)
egen gdppcgrowthmean = mean(CC_WDI_GDPPCgrowth)
egen fdimean = mean(s_fdi)
egen unemplmean = mean(CC_WDI_ttUnemplRate)

xtreg wROAf1 IV_CSPaggr geoentropy4rg indentropy CF_age CF_size_lnEmp CF_Levg CF_NPM CF_orgslack CC_WGI s_poptotal CC_WDI_PopGrowth s_GDP s_GDPpc CC_WDI_GDPPCgrowth s_fdi CC_WDI_ttUnemplRate CSPmean Geoentmean Indentmean agemean sizemean levgmean npmmean orgslackmean WGImean poptotalmean popgrowthmean gdpmean gdppcmean gdppcgrowthmean fdimean unemplmean i.yeardummy, re vce(robust)
note: CSPmean omitted because of collinearity
note: Geoentmean omitted because of collinearity
note: Indentmean omitted because of collinearity
note: agemean omitted because of collinearity
note: sizemean omitted because of collinearity
note: levgmean omitted because of collinearity
note: npmmean omitted because of collinearity
note: orgslackmean omitted because of collinearity
note: WGImean omitted because of collinearity
note: poptotalmean omitted because of collinearity
note: popgrowthmean omitted because of collinearity
note: gdpmean omitted because of collinearity
note: gdppcmean omitted because of collinearity
note: gdppcgrowthmean omitted because of collinearity
note: fdimean omitted because of collinearity
note: unemplmean omitted because of collinearity

Random-effects GLS regression                   Number of obs     =     18,964
Group variable: compid                          Number of groups  =      3,780

R-sq:                                           Obs per group:
     within  = 0.0190                                         min =          1
     between = 0.0983                                         avg =        5.0
     overall = 0.0484                                         max =         16

                                                Wald chi2(31)     =     403.84
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                    (Std. Err. adjusted for 3,780 clusters in compid)
-------------------------------------------------------------------------------------
                    |               Robust
             wROAf1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
         IV_CSPaggr |  -.0104024    .005792    -1.80   0.072    -.0217545    .0009497
      geoentropy4rg |   .5555316   .4246757     1.31   0.191    -.2768174    1.387881
         indentropy |    .040395   .2513858     0.16   0.872     -.452312    .5331021
             CF_age |   .0183981   .0037449     4.91   0.000     .0110582     .025738
      CF_size_lnEmp |   .6894453   .1634117     4.22   0.000     .3691641    1.009726
            CF_Levg |  -3.212073   1.113767    -2.88   0.004    -5.395016   -1.029129
             CF_NPM |   .0021792   .0011534     1.89   0.059    -.0000814    .0044399
        CF_orgslack |  -.1645398   .0894728    -1.84   0.066    -.3399033    .0108237
             CC_WGI |  -.5799745   .4286284    -1.35   0.176    -1.420071    .2601217
         s_poptotal |  -.0020881   .0008452    -2.47   0.013    -.0037447   -.0004315
   CC_WDI_PopGrowth |   .1173416   .2156586     0.54   0.586    -.3053415    .5400247
              s_GDP |    .013877   .0344897     0.40   0.687    -.0537215    .0814755
            s_GDPpc |  -.4957057   .1313103    -3.78   0.000    -.7530691   -.2383423
 CC_WDI_GDPPCgrowth |   .1474068   .0596172     2.47   0.013     .0305592    .2642544
              s_fdi |   .1062363   .0843957     1.26   0.208    -.0591762    .2716489
CC_WDI_ttUnemplRate |   .0051797   .0422669     0.12   0.902     -.077662    .0880214
            CSPmean |          0  (omitted)
         Geoentmean |          0  (omitted)
         Indentmean |          0  (omitted)
            agemean |          0  (omitted)
           sizemean |          0  (omitted)
           levgmean |          0  (omitted)
            npmmean |          0  (omitted)
       orgslackmean |          0  (omitted)
            WGImean |          0  (omitted)
       poptotalmean |          0  (omitted)
      popgrowthmean |          0  (omitted)
            gdpmean |          0  (omitted)
          gdppcmean |          0  (omitted)
    gdppcgrowthmean |          0  (omitted)
            fdimean |          0  (omitted)
         unemplmean |          0  (omitted)
                    |
          yeardummy |
              2003  |   1.641583   .4706248     3.49   0.000     .7191752     2.56399
              2004  |   2.930763    .580364     5.05   0.000     1.793271    4.068256
              2005  |   3.252471   .5911194     5.50   0.000     2.093898    4.411043
              2006  |   2.934455   .6290839     4.66   0.000     1.701473    4.167436
              2007  |   .3813269   .6928329     0.55   0.582    -.9766007    1.739254
              2008  |    .252396   .5860639     0.43   0.667     -.896268     1.40106
              2009  |   3.200676   .6079538     5.26   0.000     2.009109    4.392244
              2010  |   2.266513   .6160047     3.68   0.000     1.059166     3.47386
              2011  |   1.933953   .6155617     3.14   0.002     .7274743    3.140432
              2012  |   1.211612   .5967539     2.03   0.042      .041996    2.381228
              2013  |    1.19128   .5951258     2.00   0.045     .0248547    2.357705
              2014  |  -.3239355    .605627    -0.53   0.593    -1.510943    .8630717
              2015  |   .6538636   .5871039     1.11   0.265    -.4968388    1.804566
              2016  |   1.603099    .580165     2.76   0.006     .4659965    2.740202
              2017  |   1.087371   .6118433     1.78   0.076      -.11182    2.286562
                    |
              _cons |  -.2501154   1.708782    -0.15   0.884    -3.599267    3.099036
--------------------+----------------------------------------------------------------
            sigma_u |   9.609396
            sigma_e |  7.0821539
                rho |  .64801529   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------
estimates store mundlak


testparm CSPmean Geoentmean Indentmean agemean sizemean levgmean npmmean orgslackmean WGImean poptotalmean popgrowthmean gdpmean gdppcmean gdppcgrowthmean fdimean unemplmean

no such variables;
the specified varlist does not identify any testable coefficients
r(111);
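
One detail worth noting about the Mundlak attempt above: egen ... = mean() without a by prefix computes a single overall mean, which is a constant and therefore collinear with the intercept, so Stata drops it. The Mundlak device needs within-panel means. A sketch, using compid as the panel identifier shown in the output above:

Code:
foreach v in IV_CSPaggr geoentropy4rg indentropy CF_age CF_size_lnEmp CF_Levg ///
    CF_NPM CF_orgslack CC_WGI s_poptotal CC_WDI_PopGrowth s_GDP s_GDPpc ///
    CC_WDI_GDPPCgrowth s_fdi CC_WDI_ttUnemplRate {
    bysort compid: egen m_`v' = mean(`v')
}
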
I would highly appreciate your feedback on my mistakes and support in solving them.

Kind regards,

Jennifer

Medeff command: invalid mediate (error r(198))

Dear Statalists,

Could you please tell me the reason why when I type the following command:

medeff (regress edb51 acled gdpgrowth1 gdppercapita1 trade1 inflation1 exchangerate1 telephone1 cellphone1 education1 i.cid i.year) (regress fdiwdi1 acled edb51 gdpgrowth1 gdppercapita1 trade1 inflation1 exchangerate1 telephone1 cellphone1 education1 i.cid i.year), treat(acled), mediate(edb51) sims (1000) vce (bootstrap)

Stata gives me error r(198) with the message invalid 'mediate'.

I hope someone can help, because it is really urgent!

Thank you very much and kind regards,
Siham Hari
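
For what it's worth, the parse error is most likely coming from the extra commas in the option list: Stata expects a single comma before all of the options, so ", treat(acled), mediate(edb51)" makes medeff stop at 'mediate'. A sketch of the same call with one option list (whether medeff accepts the factor-variable terms i.cid and i.year is a separate question):

Code:
medeff (regress edb51 acled gdpgrowth1 gdppercapita1 trade1 inflation1 exchangerate1 ///
    telephone1 cellphone1 education1 i.cid i.year) ///
    (regress fdiwdi1 acled edb51 gdpgrowth1 gdppercapita1 trade1 inflation1 ///
    exchangerate1 telephone1 cellphone1 education1 i.cid i.year), ///
    treat(acled) mediate(edb51) sims(1000) vce(bootstrap)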


Doubt on Heckprob model results

Hello everybody,

I'm running a Heckman selection model based on heckprob. I have some categorical variables, such as grup_ha, grupo_edad, and niveleducativo, and the rest of them are dichotomous variables.
I can see in my results that only approximately 15% of the sample are uncensored observations, so I'm not sure whether this is capturing the whole behavior, or even whether the model is appropriate.
Because χ² = 57.35, this clearly justifies the Heckman selection equation with these data, but I'm still not sure about it, and I can't find much literature about this kind of model. Could anyone tell me how to interpret these heckprobit results? I suppose I have to interpret them as a probit model.
Thank you!



Code:
. heckprob cred_o_fin_aprob i.regiones_co sexo tenenciapropia cuidadotierrayanim existeinfraestructura acceso_sistderiego accesoenergia desti
> noventa recibir_asistenciaoasesoria grup_ha niveleducativo grupo_edad, select( soli_cred_o_fin2013=grup_ha sexo grupo_edad sabeleeryesc) vc
> e(robust)

Fitting probit model:

Iteration 0:   log pseudolikelihood = -31412.985  
Iteration 1:   log pseudolikelihood = -30745.027  
Iteration 2:   log pseudolikelihood = -30740.017  
Iteration 3:   log pseudolikelihood = -30740.017  

Fitting selection model:

Iteration 0:   log pseudolikelihood = -245658.48  
Iteration 1:   log pseudolikelihood = -242017.34  
Iteration 2:   log pseudolikelihood = -241997.85  
Iteration 3:   log pseudolikelihood = -241997.85  

Fitting starting values:

Iteration 0:   log pseudolikelihood = -61035.075  
Iteration 1:   log pseudolikelihood =  -30953.04  
Iteration 2:   log pseudolikelihood = -30710.317  
Iteration 3:   log pseudolikelihood = -30709.844  
Iteration 4:   log pseudolikelihood = -30709.844  

Fitting full model:

Iteration 0:   log pseudolikelihood = -272920.09  
Iteration 1:   log pseudolikelihood = -272709.04  (not concave)
Iteration 2:   log pseudolikelihood = -272708.68  (backed up)
Iteration 3:   log pseudolikelihood = -272707.97  
Iteration 4:   log pseudolikelihood = -272707.92  
Iteration 5:   log pseudolikelihood = -272707.91  
Iteration 6:   log pseudolikelihood = -272707.91  

Probit model with sample selection              Number of obs     =    571,952
                                                Censored obs      =    483,897
                                                Uncensored obs    =     88,055

                                                Wald chi2(15)     =    1092.15
Log pseudolikelihood = -272707.9                Prob > chi2       =     0.0000

---------------------------------------------------------------------------------------------
                            |               Robust
                            |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
cred_o_fin_aprob            |
                regiones_co |
                    Caribe  |  -.3207418   .0201058   -15.95   0.000    -.3601483   -.2813352
                  Pacifico  |    .075082   .0145369     5.16   0.000     .0465902    .1035738
                  Oriental  |   .1377078   .0150459     9.15   0.000     .1082184    .1671973
          Orinoco-Amazonia  |  -.2023915   .0241538    -8.38   0.000    -.2497322   -.1550508
                            |
                       sexo |  -.0885042   .0142334    -6.22   0.000    -.1164013   -.0606072
             tenenciapropia |   .0716722   .0120028     5.97   0.000     .0481471    .0951974
         cuidadotierrayanim |   .1055903   .0179416     5.89   0.000     .0704253    .1407552
      existeinfraestructura |   .0249654   .0102271     2.44   0.015     .0049206    .0450102
         acceso_sistderiego |   .0202652   .0134314     1.51   0.131    -.0060599    .0465903
              accesoenergia |   .0666022   .0110267     6.04   0.000     .0449903    .0882141
               destinoventa |  -.0020524     .01243    -0.17   0.869    -.0264148    .0223099
recibir_asistenciaoasesoria |    .111292   .0118058     9.43   0.000     .0881531    .1344309
                    grup_ha |  -.8661633   .8533362    -1.02   0.310    -2.538672    .8063449
             niveleducativo |  -.0214992   .0051078    -4.21   0.000    -.0315102   -.0114881
                 grupo_edad |  -.0740636   .0065678   -11.28   0.000    -.0869362    -.061191
                      _cons |   2.724741   .8553536     3.19   0.001     1.048278    4.401203
----------------------------+----------------------------------------------------------------
soli_cred_o_fin2013         |
                    grup_ha |  -.0262168   .4370274    -0.06   0.952    -.8827748    .8303412
                       sexo |   .1801206   .0046461    38.77   0.000     .1710145    .1892267
                 grupo_edad |   .0252196    .002117    11.91   0.000     .0210703    .0293689
               sabeleeryesc |   .4375166   .0060783    71.98   0.000     .4256034    .4494298
                      _cons |  -1.567543   .4371122    -3.59   0.000    -2.424268   -.7108193
----------------------------+----------------------------------------------------------------
                    /athrho |  -.4703892   .0621132    -7.57   0.000    -.5921288   -.3486496
----------------------------+----------------------------------------------------------------
                        rho |  -.4385137   .0501692                     -.5314249   -.3351774
---------------------------------------------------------------------------------------------
Wald test of indep. eqns. (rho = 0): chi2(1) =    57.35   Prob > chi2 = 0.0000
end
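
On interpretation: the coefficients in both equations are on the probit (latent-index) scale, so sign and significance read as in a probit, while magnitudes are easier to communicate as marginal effects. A minimal sketch using the estimator's default predicted probability:

Code:
margins, dydx(*)

The significant negative rho (about -0.44) is what the Wald test at the bottom of the output is picking up: the unobservables in the selection and outcome equations are correlated, which is the usual justification for using the selection model rather than a plain probit.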

Friday, November 29, 2019

Help creating graph for multiple linear regression

Dear all,


I am using Stata 16 on a Mac. I am estimating the regression: reg GDP agedpop population WestCoast EastCoast, robust. agedpop is the population that is 65 and older, WestCoast = 1 if the state is on the West Coast and 0 if not, EastCoast = 1 if the state is on the East Coast and 0 if not, and Midwest is my omitted category. Does anyone have any ideas on some graphs I could use to represent this regression?


Thank you in advance for your help


Jason Browen
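
One option for a regression like this is a coefficient plot, which shows each estimated coefficient with its confidence interval. A sketch, assuming the community-contributed coefplot command (ssc install coefplot) is available:

Code:
reg GDP agedpop population WestCoast EastCoast, robust
coefplot, drop(_cons) xline(0)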

Exporting residuals vs prediction data

Hi,

So, I need to make a residuals-versus-predictor plot. However, the graphics aren't working in Stata (yes, I've tried just about everything). I could easily plot this in another program (such as Excel); I just need to know how to get the raw numbers for this kind of plot, i.e. get the data output, paste it into another program, and make the graph there. How do I go about doing this?

Residuals vs fitted values would also be relevant, by the way, but I assume the process is the same.
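
A minimal sketch of getting the raw numbers out after the regression (the file name is just a placeholder):

Code:
predict yhat, xb            // fitted values
predict resid, residuals    // residuals
export excel yhat resid using "resid_plot_data.xlsx", firstrow(variables) replace

For a residuals-versus-predictor plot, export the predictor variable alongside resid instead of yhat.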

how to delete specific observations

Hello everyone
I need to delete the firms that do not have data for three consecutive years. Please teach me how to do it.

For example:
firm1 has data for five years, but as follows (2005 2007 2008 2010 2012). I don't want this firm; I want to delete it from my sample. I hope you get my point.

thanks in advance
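
A hedged sketch of one approach, assuming the panel and time variables are called firm and year (placeholders): compute the length of each run of consecutive years within a firm and keep firms whose longest run is at least three.

Code:
sort firm year
gen runlen = 1
replace runlen = runlen[_n-1] + 1 if firm == firm[_n-1] & year == year[_n-1] + 1
bysort firm: egen maxrun = max(runlen)
drop if maxrun < 3

The sequential replace works because Stata updates observations in order, so runlen[_n-1] already holds the updated value of the previous row.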

Wagstaff concentration Index for binary outcome

Hi,

I have read in the existing literature that if we are dealing with a binary outcome (i.e. the individual went for prenatal care/did not go for prenatal care), the concentration index is not valid as it is not normalised. Could someone please explain how to normalise the Wagstaff concentration index and what this generally means?

Zahrah
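
For reference, the Wagstaff normalisation rescales the standard concentration index C by the bounds it can attain when the outcome is binary with mean \mu:

$$ W = \frac{C}{1-\mu}, $$

so that the normalised index again ranges over [-1, 1]. (This is only a summary of the usual presentation, not a substitute for Wagstaff's original article.)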

How to generate a new variable based on existing variables

Dear Stata experts,

I have a dataset like below. "tiea"=1 means in a given year the company has a connection to the Senator "a", tiea=0 means the company has no connection to the Senator "a" in the given year. The same applies to Senator "b". Senator b replaced Senator a in the year 2004.

So I wanted to create a new variable called "currenttie" that measures whether the company has a tie to the current Senator in any given year. For example, in 2000 and 2001 the currenttie value should be 1, whereas in 2002 and 2003 it should be 0. From 2004 on, the tie to Senator b is the current one, so currenttie in 2004, 2006, 2007, and 2008 has a value of 1.

Could you please show me how to generate the "currenttie" variable based on the tiea, tieb, and year variables? Thank you very much!


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year byte(tiea tieb currenttie)
2000 1 0 1
2001 1 0 1
2002 0 1 0
2003 0 0 0
2004 1 1 1
2005 0 0 0
2006 0 1 1
2007 0 1 1
2008 0 1 1
end
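
A minimal sketch that reproduces the currenttie column in the example above, under the rule stated in the post that Senator a is the current Senator before 2004 and Senator b from 2004 onwards:

Code:
gen byte currenttie2 = cond(year < 2004, tiea, tieb)
assert currenttie2 == currenttie   // check against the hand-coded example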


ARDL lag coefficients

Hi everyone or anyone who knows this better than me,

I am trying to fit the following short-run equation using ARDL, in a highly simplified form:

X = a1Y + a2Z + a3W
And a long-run equation:

X = a4 + a5Y + a6Z + a7W
To get any results of significance, I have to use the "aic ec1" options of the ardl command:
ardl logesgbr loggdpgbr logrergbr logvolgbr , aic ec1 btest
I do get some significant results with this, but I am questioning the types of lags Stata suggests:

- What are lags such as "LD, L2D, D1... etc"?
- Are their "coef." values anything I could use in my equation?
- And what should I do with the adjustment coefficient?

If anyone could shed some light on this, it would be highly appreciated!

Regards,
James
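
For what it's worth, the lag labels in the ardl output are standard Stata time-series operator notation rather than anything ardl-specific. A quick reference (in comments):

Code:
* D1.x (or D.x)  = x_t - x_{t-1}        first difference
* LD.x           = D.x lagged once      = x_{t-1} - x_{t-2}
* L2D.x          = D.x lagged twice     = x_{t-2} - x_{t-3}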

Interpreting a non-linear relationship with predicted values and margins plot



I am trying to interpret a non-linear relationship in a fixed effects model. Attached is my Stata output. As you can see from the output, both the overhead ratio and the overhead ratio squared are positive and significant.

However, when I graph the predicted values I get a U-shaped relationship, and when I do a margins plot I get an inverted U, so I'm not sure which is accurate. I also thought that if a non-linear relationship is present, one coefficient would be negative and the other positive.

Any help would be much appreciated!
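
One algebraic detail that may help reconcile the two pictures: for a specification y = \beta_1 x + \beta_2 x^2 + \dots, the turning point is at

$$ x^{*} = -\frac{\beta_1}{2\beta_2}, $$

so when both coefficients are positive the turning point lies at a negative value of x, and over a non-negative range of the overhead ratio the fitted curve is simply the rising arm of a U rather than a genuine U or inverted U within the data.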

jackknife loop with wrong number of observations

forvalues i = 1/139 {

reg fawtd fdistockgdp if seqnum != `i', robust

outreg2 using table, append excel
}

I'm running the above loop (to create a jackknife test) on 139 observations. I'm doing this because I have a list of 139 countries and want to know how my regression coefficient changes when each country is dropped individually.
There is data for all observations; no data is missing. I used seqnum to generate a numbered list of countries. I'm also using outreg2 to export the results.

My problem is that the results I get list 138 observations for some regressions and 139 observations for others (seemingly at random). It seems to me that it should be 138 for all, since I'm dropping one country each time. I've attached a screenshot that shows the first 22 results below; you can see how the number of observations varies. [attachment: screenshot of the first 22 results]


I would really appreciate it if anyone has thoughts on what the problem might be. Thank you!
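
A hedged guess about the varying N: a run reports 139 observations whenever the excluded value of i does not match any observation, which happens if seqnum has duplicates, gaps, or missing values. Two quick checks, plus Stata's built-in leave-one-out jackknife as a cross-check:

Code:
duplicates report seqnum                  // a duplicated seqnum value leaves a gap elsewhere in 1-139
count if missing(seqnum)                  // a missing seqnum is never excluded by "seqnum != i"
jackknife _b: regress fawtd fdistockgdp   // built-in jackknife over observations, for comparison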

Calculation of standard errors after predictive margins: how are they computed?

Hello,

I'm struggling to understand how standard errors after -margins- are calculated, e.g. after -logit-. I am able to reproduce the predictive margins themselves, but I am not able to derive the standard errors. They don't seem to be in line with the standard deviations of the predicted probabilities; see the example below.
Code:
sysuse auto
logit foreign price mpg weight length
* Manually predict probabilities at various levels of weight
generate p1500 = invlogit(_b[weight]*1500 + _b[price]*price +_b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p2000 = invlogit(_b[weight]*2000 + _b[price]*price +_b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p2500 = invlogit(_b[weight]*2500 + _b[price]*price +_b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p3000 = invlogit(_b[weight]*3000 + _b[price]*price +_b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p3500 = invlogit(_b[weight]*3500 + _b[price]*price +_b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p4000 = invlogit(_b[weight]*4000 + _b[price]*price +_b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p4500 = invlogit(_b[weight]*4500 + _b[price]*price +_b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p5000 = invlogit(_b[weight]*5000 + _b[price]*price +_b[mpg]*mpg + _b[length]*length + _b[_cons])
* Calculating predictive margins using margins command
margins, at ( weight =(1500(500)5000))
And here is (part of) the output:
Code:
Logistic regression                             Number of obs     =         74
                                                LR chi2(4)        =      55.94
                                                Prob > chi2       =     0.0000
Log likelihood = -17.064729                     Pseudo R2         =     0.6211

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |   .0009392   .0003093     3.04   0.002      .000333    .0015454
         mpg |  -.1155925   .0966509    -1.20   0.232    -.3050248    .0738398
      weight |  -.0078002   .0030342    -2.57   0.010    -.0137471   -.0018534
      length |   .0387482   .0875022     0.44   0.658    -.1327529    .2102493
       _cons |   9.883036   11.26217     0.88   0.380    -12.19042    31.95649
------------------------------------------------------------------------------

. * Calculating predictive margins using margins command
. margins, at ( weight =(1500(500)5000))

Predictive margins                              Number of obs     =         74
Model VCE    : OIM

Expression   : Pr(foreign), predict()

1._at        : weight          =        1500
2._at        : weight          =        2000
3._at        : weight          =        2500
4._at        : weight          =        3000
5._at        : weight          =        3500
6._at        : weight          =        4000
7._at        : weight          =        4500
8._at        : weight          =        5000

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   .9978147   .0039638   251.73   0.000     .9900458    1.005584
          2  |   .9230255   .0393331    23.47   0.000     .8459341    1.000117
          3  |   .5344313   .1674495     3.19   0.001     .2062364    .8626262
          4  |   .2003275     .07624     2.63   0.009     .0508998    .3497552
          5  |   .0874817   .0298857     2.93   0.003     .0289067    .1460567
          6  |   .0129882   .0135501     0.96   0.338    -.0135694    .0395459
          7  |   .0003349   .0007068     0.47   0.636    -.0010504    .0017201
          8  |   6.82e-06   .0000234     0.29   0.771    -.0000391    .0000527
------------------------------------------------------------------------------
Below is the summary of the manual predictions: the means are identical to what is predicted by -margins-.
However, I cannot recreate the standard errors of -margins-: e.g. the Std. Dev. of the manual predictions at
weight values of 2500 and 3000 are close to each other (.3565795 and .3427131), while the
Std. Err. produced by -margins- differs a lot between these values (.1674495 and .07624).

Code:
. sum p1500-p5000

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       p1500 |         74    .9978147    .0039419   .9830204          1
       p2000 |         74    .9230255    .1224789   .5395426   .9999996
       p2500 |         74    .5344313    .3565795   .0231664   .9999796
       p3000 |         74    .2003275    .3427131   .0004798   .9989911
       p3500 |         74    .0874817    .2395264   9.71e-06    .952476
-------------+---------------------------------------------------------
       p4000 |         74    .0129882    .0509261   1.97e-07   .2885812
       p4500 |         74    .0003349     .001379   3.98e-09   .0081432
       p5000 |         74    6.82e-06    .0000281   8.05e-11   .0001661
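For what it is worth, my current guess is that -margins- reports a delta-method standard error for the *average* prediction (so it reflects the sampling variability of the coefficients), which is a different quantity from the standard deviation of the 74 individual predictions. Below is a rough sketch of how I think that standard error is built for weight = 2500; the gradient calculation is my own attempt, so please correct me if it is wrong.
Code:
* sketch: delta-method std. err. of the average prediction, sqrt(g*V*g'),
* where g is the gradient of the mean predicted probability w.r.t. the coefficients
sysuse auto, clear
logit foreign price mpg weight length

preserve
quietly replace weight = 2500
predict double p, pr                      // predicted probabilities at weight = 2500
quietly summarize p, meanonly
display "margin (mean prediction) = " r(mean)

generate double dens = p*(1-p)            // derivative of the logistic cdf at xb
matrix V = e(V)
matrix g = J(1, 5, .)
local j = 1
foreach v in price mpg weight length {
    quietly generate double tmp = dens*`v'
    quietly summarize tmp, meanonly
    matrix g[1, `j'] = r(mean)
    drop tmp
    local ++j
}
quietly summarize dens, meanonly
matrix g[1, 5] = r(mean)                  // gradient with respect to _cons
matrix se2 = g*V*g'
display "delta-method std. err.   = " sqrt(se2[1,1])
restore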
Is there anyone who can help on this?

Thanks a lot,
Mike

Error terms for Time Series Regression with a Panel Variable

Dear all,

For my master thesis, I would like, as a first step, to run a Fama-French three-factor regression and compute idiosyncratic volatility for every common stock as the standard deviation of one, three, six, or twelve months of daily error terms.

For this, I downloaded permno, date, and return of stocks from CRSP and the daily FF 3 factors. I dropped all missing values of return (unbalanced time series data) and merged both datasets based on date.

I created a variable for time :

egen time = group(date)
label variable time "Date Identifier"



My data looks like this (see the attached screenshot):

I declared a time series dataset with a panel variable (permno) :

tsset permno time, generic

I want to save the error terms for each stock every day so that I can later create a loop to compute the standard deviation of that error term based on 1, 3, 6, or 12 months of daily data. How can I do that?


I know I can get an output of factor loadings and rmse for every stock by running:


statsby _b[MktRF] _b[SMB] _b[HML] rmse = e(rmse), by(permno) saving("try reg.dta", replace): regress excessretx MktRF SMB HML

but again I would be losing the time dimension that I need to compute monthly idiosyncratic volatility of every stock.
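One direction I have been considering (a rough sketch; I am not sure it is efficient with many permnos) is to loop over stocks and keep the daily residuals with -predict-, so that the time dimension is preserved:
Code:
* keep the daily residual for every stock (excessretx, MktRF, SMB, HML are the
* variable names from my dataset)
generate double u = .
levelsof permno, local(stocks)
foreach s of local stocks {
    quietly regress excessretx MktRF SMB HML if permno == `s'
    tempvar res
    quietly predict double `res' if permno == `s', residuals
    quietly replace u = `res' if permno == `s'
    drop `res'
}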

How to change display of scientific notation of summary statistics to numeric?

Stata 15.1

I am attempting to create a descriptive statistics table for my thesis. However, I found that variables with large numbers are displayed using scientific notation.
In my attempt to change it, I read many FAQs and other discussions about this problem, but none of the suggested options seemed to work for me.

The data type of the variables is double; please see below.

Code:
sum var1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      var1 |     12,286    1.18e+09    2.21e+10   .0223229   7.71e+11

format var1 %24.0f

sum var1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      var1 |     12,286    1.18e+09    2.21e+10   .0223229   7.71e+11

recast float var1, force
var1:  12072 values changed

sum var1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      var1 |     12,286    1.18e+09    2.21e+10   .0223229   7.71e+11
I saw in a reply by Jorrit to a similar question (https://www.statalist.org/forums/for...tific-notation) the suggestion to string the variable; however, in that case I would be unable to compute summary statistics, right? I cannot test this because I get the following message:
Code:
tostring var1, gen(var1s) format("%17.0f")
var1 cannot be converted reversibly; no generate
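One thing I have not tried yet, based on the help file for -summarize-, is its format option (summarize seems to ignore the variable's display format unless asked to use it). Something like:
Code:
format var1 %24.0fc
summarize var1, format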
Thanks in advance.

Kind regards,
Stan

How to regress categorical variables without a base

Hello,

Currently I am investigating the impact of industry categorization (a categorical variable) on premiums paid (a continuous variable) in acquisitions. I have categorized my data into 48 industry categories and am now trying to regress the paid premiums on the industry indicators. But Stata automatically selects one of my industries as the base. How can I estimate the impact of the different industries on the premiums without a base, since there is no standard industry?
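One option I have come across but am unsure about is suppressing the constant so that all 48 industry indicators can enter (the coefficients are then industry means of the premium rather than contrasts against a base). A sketch, with premium and industry as placeholder names for my variables:
Code:
regress premium ibn.industry, noconstant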

Thank you for your help!

Command problem with counting sample firms smaller than the firm for each country in each year (panel data)

Hi, guys,
I would like to measure, for each country in each year, the percentage of sample firms smaller than a given firm.
But I don't know how to use a loop to count the number of firms with a smaller asset volume than the specific firm.
For example, for company "01COMMUNIQUE LAB" in 1999, I want to know how many firms have assets < 2472.

Data:
Code:
input str20 name year totass
"01COMMUNIQUE LAB" 1999 2472
"01COMMUNIQUE LAB" 2000 13487
"01COMMUNIQUE LAB" 2001 5145
"01COMMUNIQUE LAB" 2002 2375
"01COMMUNIQUE LAB" 2003 635
"01COMMUNIQUE LAB" 2004 859
"01COMMUNIQUE LAB" 2005 703
"01COMMUNIQUE LAB" 2006 707
"01COMMUNIQUE LAB" 2007 2915
"01COMMUNIQUE LAB" 2008 2157
"0373849 B.C. LTD" 1999 4586
"0373849 B.C. LTD" 2000 4106
"0373849 B.C. LTD" 2001 3659
"0373849 B.C. LTD" 2002 3649
"0373849 B.C. LTD" 2003 7523
"0373849 B.C. LTD" 2004 6165
"0373849 B.C. LTD" 2005 5892
"0373849 B.C. LTD" 2006 18235
"0373849 B.C. LTD" 2007 34371
"0373849 B.C. LTD" 2008 4831

Code:
egen yeargroup = group(year)
sort year
foreach var yeargroup{
gen Size == 0;
if totass[_n] > totass[_n+1]{
relsize ==relsize + 1;
}
}


It turns out not to work. Could somebody help me with the commands?
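For reference, this is the direction I have been thinking in (a rough sketch using only year, since my example data has no country variable; with a country variable I would add it to the bysort list), but I am not sure it is right:
Code:
* count, for each firm-year, the other firms in the same year with strictly
* smaller total assets, then turn that into a percentage
bysort year (totass): gen n_smaller = _n - 1
by year totass: replace n_smaller = n_smaller[1]      // firms with equal assets get the same count
by year: gen pct_smaller = 100 * n_smaller / (_N - 1)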
Thanks in advance.
Xiao

Mean of a group minus the observation's value in Panel data

Hi,

I have a data structure like this:

IO   IOarea   Year   index
1    4        1997   0
2    4        1998   2.2267373
3    5        1998   0
4    5        2000   0

I would like to create the mean of the index variable for each IOarea group per year. So I used this code:
by IOarea Year, sort: egen index_byissue=mean(index)

However, now I would like to add a second step where I compute the mean of the group minus that IO's score per year, so that each IO gets the score of index_byissue minus its own score each year. I aim to use the other group members' average score each year as a new variable.
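In case it clarifies what I am after, I think the other members' average can be computed from the group total (a rough sketch; it returns missing when a group-year has only one IO, which I believe is what I want):
Code:
bysort IOarea Year: egen grp_sum = total(index)
by IOarea Year: gen grp_n = _N
gen index_others = (grp_sum - index) / (grp_n - 1)   // average of the other group members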

Thanks so much in advance!



Logging results in forvalues-loop: command lines are not shown in output-logfile

Hello,

I want to save some output results using the -log- command. This works fine if I use -log- as in the example below: the command lines are shown above the results in the smcl file (e.g. sum ... and reg ...) and consequently the output is easily interpretable.
Commands:
Code:
clear
sysuse auto
log using "Output-testA.smcl", replace  nomsg
sum price mpg weight length rep78
reg price mpg weight length i.rep78
log close
translate "Output-testA.smcl" "Output-testA.pdf", translator(smcl2pdf)
Result:
Code:
. sum price mpg weight length rep78

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
       rep78 |         69    3.405797    .9899323          1          5

. reg price mpg weight length i.rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(7, 61)        =      7.25
       Model |   262008114         7  37429730.6   Prob > F        =    0.0000
    Residual |   314788844        61  5160472.86   R-squared       =    0.4542
-------------+----------------------------------   Adj R-squared   =    0.3916
       Total |   576796959        68  8482308.22   Root MSE        =    2271.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -126.8367   84.49819    -1.50   0.138    -295.8012    42.12791
      weight |   5.186695   1.163383     4.46   0.000     2.860367    7.513022
      length |  -124.1544   40.07637    -3.10   0.003     -204.292   -44.01671
             |
       rep78 |
          2  |   1137.284   1803.332     0.63   0.531    -2468.701    4743.269
          3  |   1254.642   1661.545     0.76   0.453    -2067.823    4577.108
          4  |   2267.188   1698.018     1.34   0.187    -1128.208    5662.584
          5  |   3850.759   1787.272     2.15   0.035     276.8886     7424.63
             |
       _cons |   14614.49   6155.842     2.37   0.021     2305.125    26923.86
------------------------------------------------------------------------------

. log close
However, when I want to log results within -forvalues-, the separate commands do not show up in the log file, making it less convenient to read and interpret.
Commands:
Code:
clear
sysuse auto
forvalues i = 1/3 {
    log using "Output-testB`i'.smcl", replace nomsg
    sum price mpg weight length rep78
    reg price mpg weight length i.rep78
    log close
    translate "Output-testB`i'.smcl"  "Output-testB`i'.pdf", translator(smcl2pdf)
    }
Results for 1 of the 3 log-files:
Code:
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
       rep78 |         69    3.405797    .9899323          1          5

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(7, 61)        =      7.25
       Model |   262008114         7  37429730.6   Prob > F        =    0.0000
    Residual |   314788844        61  5160472.86   R-squared       =    0.4542
-------------+----------------------------------   Adj R-squared   =    0.3916
       Total |   576796959        68  8482308.22   Root MSE        =    2271.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -126.8367   84.49819    -1.50   0.138    -295.8012    42.12791
      weight |   5.186695   1.163383     4.46   0.000     2.860367    7.513022
      length |  -124.1544   40.07637    -3.10   0.003     -204.292   -44.01671
             |
       rep78 |
          2  |   1137.284   1803.332     0.63   0.531    -2468.701    4743.269
          3  |   1254.642   1661.545     0.76   0.453    -2067.823    4577.108
          4  |   2267.188   1698.018     1.34   0.187    -1128.208    5662.584
          5  |   3850.759   1787.272     2.15   0.035     276.8886     7424.63
             |
       _cons |   14614.49   6155.842     2.37   0.021     2305.125    26923.86
------------------------------------------------------------------------------
Is there some way to get the command lines (e.g. sum ... and reg ...) into the log file when output is sent to a log file from within -forvalues-?
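The only workaround I have come up with so far is to echo the command text into the log myself before running each command, which feels clumsy (a sketch):
Code:
clear
sysuse auto
forvalues i = 1/3 {
    log using "Output-testB`i'.smcl", replace nomsg
    display as text ". sum price mpg weight length rep78"
    sum price mpg weight length rep78
    display as text ". reg price mpg weight length i.rep78"
    reg price mpg weight length i.rep78
    log close
    translate "Output-testB`i'.smcl" "Output-testB`i'.pdf", translator(smcl2pdf)
}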

Thanks a lot,
Mike

Reshaping Data

Dear members
I am facing an issue with some data that I have in MS Excel in the following format:
Company  ISIN      SALES 2016  SALES 2017  SALES 2018  COGS 2016  COGS 2017  COGS 2018
A        IN123456  344         400         410         300        325        330
B        JP112345  549         520         510         480        478        470
C        AU235678  218         345         490         240        300        400
I would like to convert it into the following format:
Company ISIN Year SALES COGS
A IN123456 2016 344 300
B JP112345 2016 549 480
C AU235678 2016 218 240
A IN123456 2017 400 325
B JP112345 2017 520 478
C AU235678 2017 345 300
A IN123456 2018 410 330
B JP112345 2018 510 470
C AU235678 2018 490 400

Can this be done through Stata commands? If yes, please share possible ways to do it.
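From what I have read, -reshape long- may be what I need once the yearly columns have consistent stub names; a sketch, assuming the Excel columns come into Stata (after -import excel- and some renaming) as SALES2016 ... SALES2018 and COGS2016 ... COGS2018:
Code:
reshape long SALES COGS, i(Company ISIN) j(Year)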

Thanks in advance

method for omitted variable bias cross sectional analysis

Dear all,
I have a cross-sectional analysis to analyse the effect of head circumference (a continuous variable) on cognitive skills (a continuous variable). Can someone please suggest a method I can use to address omitted variable bias? I will be really grateful.

Panel regression model with N>T and serial autocorrelated error.

Dear all,

I've been trying to estimate a panel regression model on a dataset with N>T, where N is the number of cross-sectional units and T is the number of time observations. I want to include a fixed effect.

I ran the Wooldridge test for serial autocorrelation and rejected the null, so the model has serially autocorrelated errors.

Due to the fact that N>T, I understand that I cannot rely on -xtreg-. My question is: given these conditions, is it appropriate to estimate the model with -xtregar- including a fixed effect? Is this estimator consistent with the fact that N>T and that the error is autocorrelated of order 1?
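For concreteness, this is what I have in mind (a sketch with placeholder names id, year, y, x1, x2):
Code:
xtset id year
xtregar y x1 x2, fe     // fixed effects with AR(1) disturbances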


Many thanks to those who can help me

Comparing non nested models with xtmelogit

Dear all,

I am using xtmelogit to run a multilevel logistic regression with PISA data. My data is hierarchical (individuals nested into schools, schools nested into countries).

My dependent variable is the expectation of college graduation among the fifteen-year-old students who were interviewed for PISA. I am regressing my dichotomous dependent variable (expecting college graduation or not) on a number of controls and two key independent variables: the gender of the respondent and the father's education.

The question is which of the two parents is more important for the phenomenon that I am trying to explain; in other words, whether the model with father's education has a better goodness of fit than the model with mother's education, or vice versa. Here are the two models:

Model with father's education (fisced4)
Code:
xtmelogit expect_ISCED5A female ib4.fisced4 || country3: || schoolid:, variance
Model with mother's education (misced4)
Code:
xtmelogit expect_ISCED5A female ib4.misced4 || country3: || schoolid:, variance
Since they are not nested models, I suspect that I should use BIC or AIC in order to compare the goodness of fit of the two models, but I am not sure.

Could I ask you for some guidance regarding the best way of making this model comparison and, if this is the case, how to get the AIC reported after xtmelogit?
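In case it helps to see what I had in mind, I was thinking of storing both sets of estimates and then requesting the information criteria, but I am not sure this is the recommended route:
Code:
xtmelogit expect_ISCED5A female ib4.fisced4 || country3: || schoolid:, variance
estimates store m_father
xtmelogit expect_ISCED5A female ib4.misced4 || country3: || schoolid:, variance
estimates store m_mother
estimates stats m_father m_mother      // reports AIC and BIC for both models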

Many thanks for your attention and your help

Kind regards

Luis Ortiz

LOOP ERROR no observations defined

Dear Stata Users

I have been running a loop for each of 5 years.

Code:
clear all
cd "\\registry\2017"
save appended_2017, emptyok
local filelist: dir . files "*.dta"
foreach f of local filelist {
  use `f', clear
  append using appended_2017.dta
  save appended_2017.dta, replace
  }
It worked well for three years (2014, 2015, 2016). For some reason unknown to me, it produces an error for the years 2017 and 2018.
All files are sitting in the corresponding year folder.

Error:
no variables defined
r(111);
Any idea what is causing the error?

Thank you.

Calculate volatility of daily returns by use of GARCH

Hi,

We are trying to calculate the forecasted volatility of daily returns using the GARCH(1,1) model. So far we don't get any values that are in line with usual volatilities. Can somebody help us? We uploaded the data.
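To make the question concrete, here is a minimal sketch of the GARCH(1,1) setup we have in mind (ret and date are placeholder names for our daily return and date variables):
Code:
tsset date
arch ret, arch(1) garch(1)                  // GARCH(1,1) with a constant-only mean equation
predict double condvar, variance            // fitted conditional variance
generate double condvol = sqrt(condvar)     // conditional (forecast) volatility of daily returns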

Thank you!

Replace observations with another group in Panel data

Hi,
I have panel data and I want to replace observations of one group with the same variables' observations from another group. I have 31 organizations in my dataset, and I would like to use the UN's observations of the variables disasters and damage for the other UN agencies as well (UN agencies have "1" under the UN_system dummy variable). I'm very much open to suggestions on this! Thank you.


Organization   Year   disasters   damage   UN_system
UN             1990   1           2        1
UN             1991   4           3        1
UNHCR          1990   .           .        1
UNHCR          1991   .           .        1
UNICEF         1990   .           .        1
UNICEF         1990   .           .        1
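One direction I have been considering (a sketch, assuming there is exactly one UN row per Year) is the following:
Code:
foreach v in disasters damage {
    gen double `v'_un = `v' if Organization == "UN"
    bysort Year (`v'_un): replace `v'_un = `v'_un[1]   // spread the UN value within the year
    replace `v' = `v'_un if UN_system == 1 & missing(`v')
    drop `v'_un
}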


Interpreting significance - HLM using REML and Kenward-Roger-corr.

Dear Statalisters,

I struggle with interpreting my results from multilevel linear regressions, using restricted maximum likelihood and Kenward-Roger correction. I use Stata 15.0 on a Mac (version 10.14), and I hope I’m using the Code-function right.

My problem is that I do not know how to interpret the significance of the coefficients in my multilevel outputs. I'm doing my master thesis, so I will report significance at the 0.1, 0.05, 0.01, and 0.001 levels in my regression tables.

For example, from reading off P>|t| in the output below, I immediately thought that x1-x3 are statistically significant (x1 at the 0.1 level, x2 at the 0.01 level, x3 at the 0.001 level). However, is that the correct interpretation? The confusion arises because I do not know how to calculate the critical t-value, so that I can compare the t from the output to the critical t-value. I tried using this calculator (http://www.ttable.org/student-t-value-calculator.html), plugging in df=6, but I'm not sure this yields the right value. The critical value for a two-tailed test with significance level 0.01 is calculated to be +/-3.71. However, if x2 is significant at the 0.01 level (which I thought from reading its P>|t|), why is the t-value only 3.43, which would imply significance only at the 0.05 level if the critical t-value is 3.71? Is the critical t-value in fact something else, or is my interpretation of P>|t| wrong?

Code:
mixed CHILDREN x1 x2 x3 || COUNTRY2:, reml dfmethod(kroger)
Code:
Mixed-effects REML regression                   Number of obs     =      3,245
Group variable: COUNTRY2                        Number of groups  =         10

                                                Obs per group:
                                                              min =        108
                                                              avg =      324.5
                                                              max =        466
DF method: Kenward-Roger                        DF:           min =      16.07
                                                              avg =   1,561.11
                                                              max =   3,240.03

                                                F(3,    98.23)    =      21.91
Log restricted-likelihood = -4836.0572          Prob > F          =     0.0000

--------------------------------------------------------------------------------
      CHILDREN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
            x1 |   .2339834   .1167206     2.00   0.058    -.0091296    .4770963
            x2 |    .133747   .0389608     3.43   0.001     .0573566    .2101373
            x3 |   .0837528   .0124587     6.72   0.000     .0593243    .1081814
         _cons |   .5332078   .2389675     2.23   0.040     .0268024    1.039613
--------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
COUNTRY2: Identity           |
                  var(_cons) |   .1383108   .0711551      .0504601    .3791085
-----------------------------+------------------------------------------------
               var(Residual) |   1.136198   .0282628      1.082133    1.192965
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 124.99        Prob >= chibar2 = 0.0000

. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           . |      3,245         .  -4836.057       6    9684.114   9720.624
-----------------------------------------------------------------------------
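In case it matters, this is how I have been trying to obtain a critical value to compare against (I am not sure df = 6 is the right choice; perhaps the per-coefficient Kenward-Roger DF, which -estat df- lists after -mixed-, are what the reported p-values are based on):
Code:
estat df                           // Kenward-Roger degrees of freedom for each coefficient
display invttail(16.07, 0.005)     // two-sided 0.01 critical value at the smallest DF (16.07)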

All help is very much appreciated.


Kind regards,

Frøydis Jensen

Using loops to create local from other locals

I tell Stata that:

Code:
local 1 "a b c"
local 2 "d e f"
local 3 "g h i"
and I want it to create the following local:

Code:
local 10 "a b c d e f g h i"
without having to write it explicitly but by a loop.
Nick replied to my post yesterday and suggested the following:

Code:
forval j = 1/10 {
      foreach v of var `list`j'' {
            ....
      }
}
However, I can't find a way to produce the local I want with it.

Can you think of an elegant and simple way of doing it?
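This is as far as I have gotten on my own (a sketch looping only over the three locals I actually have); it seems to do what I want, but I am not sure it is the intended approach:
Code:
local 10 ""
forvalues j = 1/3 {
    local 10 `"`10' ``j''"'
}
display `"`10'"'      // a b c d e f g h i (with a leading space)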

creating loop

Hi,
I am new to Stata, so I am writing long commands for simple variable generation. I want to shorten the commands below. As you can see, only the numbers after "p" are descending: rtr7p(8,7,6,5,4,3,2,1)e1by and rtr6p(8,7,6,5,4,3,2,1). I have to take values from those specific variables in exactly the same order, not starting from the first one but from the last one down to the first one. Do you have any suggestions for shortening these kinds of repetition? Thank you in advance.

generate yearcp=rtr7p8e1by if rtr6p8==1
replace yearcp=rtr7p7e1by if rtr6p7==1
replace yearcp=rtr7p6e1by if rtr6p6==1
replace yearcp=rtr7p5e1by if rtr6p5==1
replace yearcp=rtr7p4e1by if rtr6p4==1
replace yearcp=rtr7p3e1by if rtr6p3==1
replace yearcp=rtr7p2e1by if rtr6p2==1
replace yearcp=rtr7p1e1by if rtr6p1==1
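To be clear, this is the kind of shortened loop I am hoping exists (a sketch which, I believe, reproduces the same descending order and the same overwriting behaviour as my commands above):
Code:
generate yearcp = .
forvalues i = 8(-1)1 {
    replace yearcp = rtr7p`i'e1by if rtr6p`i' == 1
}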

Thursday, November 28, 2019

Problem in reporting Arellano-Bond test for auto-correlation after xtabond2

Hello dear Stata Users,
I am estimating a dynamic panel (370 firms; T=10) where my dependent variable is leverage and my independent variables are the lagged dependent variable, firm-specific factors affecting leverage, and macro factors.
I use the command: xtabond2
Stata correctly reports the Sargan/Hansen tests but does not report the second-order Arellano-Bond test for autocorrelation:


Arellano-Bond test for AR(2) in first differences: z = . Pr > z = .

My panel is not balanced and I have missing values in one of my regressors (tobinq); could that be the reason?

Does anyone have an idea? I am stuck here.

Thank you all in advance

why categorical variables regression only shows one group result?

Hey,

I am running a regression with categorical variables, but I am quite confused about why it only shows the result for the second group.

Code:
gen byte agegroup = 0 if age>=60 & age<.
replace agegroup = 1 if age>=50 & age<60
replace agegroup = 2 if age<50

reg cash_etr i.agegroup tenure lev PPE cash net_CF size tobin roa sales_growth capx xrd cf_vol for_income ch_nol nol for_op ffi_* fyear_* if age<=50, r
outreg2 using main_regression_age.xls, excel stats(coef tstat) bdec(3) tdec(2) symbol(***, **, *) label cti(Cash_ETR) replace addtext(Industry and year effects, YES) nonotes drop(cash_etr fyear_* ffi_* o.*) adjr2

reg gaap_etr i.agegroup tenure lev PPE cash net_CF size tobin roa sales_growth capx xrd cf_vol for_income ch_nol nol for_op ffi_* fyear_*, r
outreg2 using main_regression_age.xls, excel stats(coef tstat) bdec(3) tdec(2) symbol(***, **, *) label cti(GAAP_ETR) append addtext(Industry and year effects, YES) nonotes drop(gaap_etr fyear_* ffi_* o.*) adjr2

reg pbtd i.agegroup tenure lev PPE cash net_CF size tobin roa sales_growth capx xrd cf_vol for_income ch_nol nol for_op ffi_* fyear_*, r
outreg2 using main_regression_age.xls, excel stats(coef tstat) bdec(3) tdec(2) symbol(***, **, *) label cti(PBTD) append addtext(Industry and year effects, YES) nonotes drop(pbtd fyear_* ffi_* o.*) adjr2

bootstrap test for a multilevel mediation analysis

Dear all,

I'm trying to perform a 2-2-1 multilevel mediation analysis, and my DV is binary. I used the command -melogit- step by step to achieve my goal, but when I tried to perform a bootstrap test, this command did not seem to fit. I know there is a command, -ml_mediation-, for multilevel mediation with a continuous DV that can use bootstrapping, but is there any command for logit models?

Thanks for any help on this.

How to get a variable list for a condition

I wonder if it's possible to get a variable list for a condition, for example:

Code:
clear all
sysuse auto
ds, has(type string)

unab allvars: _all
unab vars_to_exclude: make


replace mpg =. if mpg >33

foreach i in `:list allvars - vars_to_exclude' {
  display "`i'"
  list `i' if `i'==.
}

price
mpg

     +-----+
     | mpg |
     |-----|
 43. |   . |
 57. |   . |
 66. |   . |
 71. |   . |
     +-----+
rep78

     +-------+
     | rep78 |
     |-------|
  3. |     . |
  7. |     . |
 45. |     . |
 51. |     . |
 64. |     . |
     +-------+
headroom
trunk
weight
length
turn
displacement
gear_ratio
foreign
My goal is to get something like:

Code:
*Just an idea:

ds, has(vlist)
mpg rep78

 or

foreach v of local vlist {
    di  "`v'"
}

mpg rep78
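One direction I have been considering (a rough sketch; I am not sure there is a more idiomatic built-in way) is to build the list by counting missing values myself:
Code:
ds, has(type numeric)
local vlist ""
foreach i in `r(varlist)' {
    quietly count if missing(`i')
    if r(N) > 0 local vlist `vlist' `i'
}
display "`vlist'"     // mpg rep78 in the auto example above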

I would be grateful for any advice.
Regards
Rodrigo

average using past group information

Hi all,

I have company-level information as follows. For each company id in a country, I have the current location (current_loc) and the new location (new_loc) that a company will move to next quarter. Now for each company that moved (e.g. id 137 moved from location 5 to location 7), I want to get two variables:
- the average of the variable size using all companies located in the same current area (e.g. for id 137 it's location 5) in the past two years
- the average of the variable size using all companies located in the new area (e.g. for id 137 it's location 7) in the past two years


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 country float company_id byte curent_loc str6 quarter byte new_loc float(size avg_size_current avg_size_new)
"X"   131 5 "2012q3" . 27.443584 . .
"X"   137 5 "2012q4" 7 23.344286 . .
"X"   140 5 "2013q1" . 16.832315 . .
"X"   219 5 "2013q2" . 11.427843 . .
"X"   165 5 "2013q3" .  53.44666 . .
"X" 14685 6 "2012q1" .  2488.442 . .
"X"   134 6 "2012q1" . 13.555255 . .
"X"   127 6 "2012q2" .   26.1684 . .
"X"    81 6 "2012q2" . 37.755157 . .
"X"    66 6 "2012q2" .  53.79955 . .
"X"     2 6 "2012q3" . 20.235474 . .
"X"  5021 6 "2012q3" .  2871.219 . .
"X"    93 6 "2012q3" .  39.22329 . .
"X"   210 6 "2012q4" 5 28.488956 . .
"X"    19 6 "2013q1" .  52.53154 . .
"X"   197 6 "2013q2" . 29.569094 . .
"X"   130 6 "2013q3" . 15.983066 . .
"X" 14427 7 "2012q2" .  2766.468 . .
"X"   146 7 "2012q2" .  44.75117 . .
"X"    92 7 "2012q2" .  44.33076 . .
"X"   164 7 "2012q3" .  56.59673 . .
"X"   158 7 "2012q3" .  32.74441 . .
"X"   186 7 "2012q3" . 13.370055 . .
"X"  1239 7 "2012q4" 5 2251.2556 . .
"X"    42 7 "2013q3" .  58.74424 . .
"X"    85 7 "2013q4" .  46.32192 . .
"X"    53 7 "2014q1" . 12.270756 . .
"X"    76 7 "2014q3" .  47.29833 . .
"X"   171 7 "2016q2" . 34.806293 . .
"X" 10144 7 "2016q3" . 2166.3647 . .
"X"    51 7 "2016q3" .   52.9871 . .
"X"    37 7 "2016q4" . 16.703777 . .
end
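One direction I have been exploring is rangestat (from SSC), assuming that "the past two years" means the previous eight quarters; this gets me the same-current-area average, but I am still unsure how to handle the new-area average (a rough sketch):
Code:
* convert the string quarter to a Stata quarterly date
gen qdate = quarterly(quarter, "YQ")
format qdate %tq
* mean size over firms in the same country and current location during the
* previous eight quarters
rangestat (mean) avg_size_current = size, interval(qdate -8 -1) by(country curent_loc)
* the new-area average would need a similar step after matching new_loc against
* other firms' curent_loc, which is where I am stuck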

Thanks for your help in advance

Analysis of Stock market data in Stata - First Time

Dear All,

I have daily stock market data for 174 firms from 2000 to 2019 (about 4,400 prices per stock). I also have the market indexes for 19 separate indexes, as this is a cross-country analysis. I have attempted to import the data into Stata from Excel and then to convert the data to long form, with the results below.

My data looks like this, from (dataex):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 D_Date str5 stock str17 daily_return
"  1/1/2003" "Co1" ""                 
"  1/1/2004" "Co1" "1947.63"          
"  1/1/2007" "Co1" "5916.02"          
"  1/1/2008" "Co1" "6095.83"          
"  1/1/2009" "Co1" "2425.31"          
"  1/1/2010" "Co1" "3607.04"          
"  1/1/2013" "Co1" "3802.48"          
"  1/1/2014" "Co1" "4151.77"          
"  1/1/2015" "Co1" "3610.1"           
"  1/1/2016" "Co1" "4085.84"          
"  1/1/2018" "Co1" "6158.6"           
"  1/1/2019" "Co1" "5084.71"          
"  1/2/2003" "Co1" ""                 
"  1/2/2004" "Co1" "1977.54"          
"  1/2/2006" "Co1" "4797.02"          
"  1/2/2007" "Co1" "6042.59"          
"  1/2/2008" "Co1" "6090.77"          
"  1/2/2009" "Co1" "2510.8"           
"  1/2/2012" "Co1" "2973.79"          
"  1/2/2013" "Co1" "3921.28"          
"  1/2/2014" "Co1" "4130.940000000001"
"  1/2/2015" "Co1" "3661.88"          
"  1/2/2017" "Co1" "4659.3"           
"  1/2/2018" "Co1" "6202.12"          
"  1/2/2019" "Co1" "5131.13"          
"  1/3/2003" "Co1" ""                 
"  1/3/2005" "Co1" "3143.99"          
"  1/3/2006" "Co1" "4842.7"           
"  1/3/2007" "Co1" "6051.79"          
"  1/3/2008" "Co1" "6033"             
"  1/3/2011" "Co1" "4390.29"          
"  1/3/2012" "Co1" "3000.02"          
"  1/3/2013" "Co1" "3938.14"          
"  1/3/2014" "Co1" "4162.91"          
"  1/3/2017" "Co1" "4712.08"          
"  1/3/2018" "Co1" "6279.7"           
"  1/3/2019" "Co1" "5132.25"          
"  1/4/2005" "Co1" "3167.22"          
"  1/4/2006" "Co1" "4903.25"          
"  1/4/2007" "Co1" "5984.54"          
"  1/4/2008" "Co1" "5887.63"          
"  1/4/2010" "Co1" "3666.94"          
"  1/4/2011" "Co1" "4369.51"          
"  1/4/2012" "Co1" "2969.71"          
"  1/4/2013" "Co1" "3932.07"          
"  1/4/2016" "Co1" "4002.94"          
"  1/4/2017" "Co1" "4707.91"          
"  1/4/2018" "Co1" "6401.52"          
"  1/4/2019" "Co1" "5293.43"          
"  1/5/2004" "Co1" "2015.32"          
"  1/5/2005" "Co1" "3137.83"          
"  1/5/2006" "Co1" "4914.440000000001"
"  1/5/2007" "Co1" "5860.46"          
"  1/5/2009" "Co1" "2534.2"           
"  1/5/2010" "Co1" "3748.82"          
"  1/5/2011" "Co1" "4283.28"          
"  1/5/2012" "Co1" "2897.42"          
"  1/5/2015" "Co1" "3557.63"          
"  1/5/2016" "Co1" "4027.53"          
"  1/5/2017" "Co1" "4731.89"          
"  1/5/2018" "Co1" "6394.02"          
"  1/6/2003" "Co1" ""                 
"  1/6/2004" "Co1" "2015.32"          
"  1/6/2005" "Co1" "3137.83"          
"  1/6/2006" "Co1" "4914.440000000001"
"  1/6/2009" "Co1" "2534.2"           
"  1/6/2010" "Co1" "3748.82"          
"  1/6/2011" "Co1" "4283.28"          
"  1/6/2012" "Co1" "2897.42"          
"  1/6/2014" "Co1" "4162.91"          
"  1/6/2015" "Co1" "3557.63"          
"  1/6/2016" "Co1" "4027.53"          
"  1/6/2017" "Co1" "4731.89"          
"  1/7/2003" "Co1" ""                 
"  1/7/2004" "Co1" "2031.12"          
"  1/7/2005" "Co1" "3173.95"          
"  1/7/2008" "Co1" "5851.84"          
"  1/7/2009" "Co1" "2631.09"          
"  1/7/2010" "Co1" "3744.43"          
"  1/7/2011" "Co1" "4309.83"          
"  1/7/2013" "Co1" "3933.59"          
"  1/7/2014" "Co1" "4287.97"          
"  1/7/2015" "Co1" "3573.88"          
"  1/7/2016" "Co1" "3916.55"          
"  1/7/2019" "Co1" "5357.7"           
"  1/8/2003" "Co1" ""                 
"  1/8/2004" "Co1" "2056.84"          
"  1/8/2007" "Co1" "5807.37"          
"  1/8/2008" "Co1" "5907.91"          
"  1/8/2009" "Co1" "2589.72"          
"  1/8/2010" "Co1" "3741.35"          
"  1/8/2013" "Co1" "3929.04"          
"  1/8/2014" "Co1" "4351.91"          
"  1/8/2015" "Co1" "3626.56"          
"  1/8/2016" "Co1" "3786.08"          
"  1/8/2018" "Co1" "6392.610000000001"
"  1/8/2019" "Co1" "5361.12"          
"  1/9/2003" "Co1" ""                 
"  1/9/2004" "Co1" "2051.39"          
"  1/9/2006" "Co1" "4970.45"          
end

I need to present this data in a format from which I can calculate returns, abnormal returns, and the CAR for an event study.

Can anyone help with the way I am presenting my data? I apologise if this is not in the correct format, but I am very new and have tried my best.
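For what it is worth, this is the kind of cleaning I think I need first (a sketch; daily_return and D_Date came in as strings, and I am not sure the return calculation below treats non-trading-day gaps correctly):
Code:
destring daily_return, replace               // empty strings become missing
generate date = date(D_Date, "MDY")
format date %td
encode stock, generate(stock_id)
xtset stock_id date
* simple daily log return within each stock
bysort stock_id (date): generate ret = ln(daily_return / daily_return[_n-1])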

Any help with this issue would be greatly appreciated.

Callum

Expand data

Dear Statalist Members,

I am looking for help with expanding my dataset below to include a categorical variable that takes the values 1 to 15 for each observation. How can I do it? So I need to have 15 copies of each of the individual observations currently in the dataset.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 nuts318cd str9 ttwa11cd double area_intersection_sqm float(percentage_nuts3_in_ttwa percentage_ttwa_in_nuts3)
"UKC11" "E30000093" 203769975.4     68.5304    26.73737
"UKC11" "E30000199" 31866.05631  .010716955   .00800233
"UKC11" "E30000203" 8120.006566   .00273086   .00047345
"UKC11" "E30000215" 93514618.39    31.45014    92.84214
"UKC11" "E30000246" 16150.06451  .005431469  .000754046
"UKC11" "E30000275" 1721.391752  .000578926  .000566958
"UKC12" "E30000093" 298643393.4    99.99471    39.18604
"UKC12" "E30000147" 15797.86158  .005289595  .003358221
"UKC13" "E30000093" 40310.33369  .020412154  .005289259
"UKC13" "E30000199" 197413336.2    99.96523    49.57521
"UKC13" "E30000203" 14324.15059  .007253395  .000835193
"UKC13" "E30000246" 14039.81523  .007109415  .000655519
"UKC14" "E30000064" 22453.95394  .001006206  .001029967
"UKC14" "E30000093" 434.9438238 .0000194907 .0000570705
"UKC14" "E30000106" 56045.84077  .002511524  .002862576
"UKC14" "E30000199"  47431345.3   2.1254914   11.911146
"UKC14" "E30000203"  1714908898    76.84842    99.99057
"UKC14" "E30000215" 7209707.493    .3230811    7.157861
"UKC14" "E30000245" 318562308.6     14.2754    25.26244
"UKC14" "E30000246" 9913.175135  .000444229  .000462846
"UKC14" "E30000275" 143346252.4    6.423626    47.21256
"UKC21" "E30000064"  2030476941    40.40051    93.13832
"UKC21" "E30000106" 12.28169309 2.44369e-07 6.27295e-07
"UKC21" "E30000173"  1460636223   29.062355    99.99895
"UKC21" "E30000203" 10266.28956  .000204269  .000598593
"UKC21" "E30000245" 563125216.8   11.204532    44.65662
"UKC21" "K01000009" 971549507.2    19.33097    57.50991
"UKC21" "K01000010" 51041.11077  .001015568  .002415779
"UKC21" "S22000067" 20922.50236  .000416296  .001403097
"UKC22" "E30000173" 1066.288223  .000265062 .0000730009
"UKC22" "E30000245" 379294671.4    94.28657     30.0786
"UKC22" "E30000275" 22982843.17    5.713166    7.569636
"UKC23" "E30000203" 5531.766495  .004028287  .000322539
"UKC23" "E30000245" 29468.49058  .021459244  .002336893
"UKC23" "E30000275"   137288062    99.97451    45.21724
"UKD11" "E30000106" 15318.28089  .000738895  .000782391
"UKD11" "E30000163" 77898272.46    3.757515   14.123253
"UKD11" "E30000223"  7257.39641  .000350069  .000672506
"UKD11" "E30000286" 737526859.2   35.575474    99.99664
"UKD11" "E30000290" 851034545.5    41.05065    99.99887
"UKD11" "K01000010" 406650680.9    19.61527   19.246805
"UKD12" "E30000039" 392.9681989 8.27563e-06 .0000333336
"UKD12" "E30000064" 149557684.9    3.149577    6.860236
"UKD12" "E30000076" 22084.66337  .000465087  .003840515
"UKD12" "E30000106"  1957803573    41.22993    99.99604
"UKD12" "E30000163" 473662135.9    9.974983    85.87675
"UKD12" "E30000203" 82740.55843  .001742456  .004824324
"UKD12" "E30000223"  1079124408    22.72558    99.99703
"UKD12" "E30000246" 20307.96607  .000427671  .000948179
"UKD12" "E30000286" 24754.72675  .000521317  .003356338
"UKD12" "E30000290" 9621.292648  .000202617  .001130528
"UKD12" "K01000010"  1088165321    22.91598    51.50294
"UKD12" "S22000067" 27691.31014  .000583159  .001857024
"UKD33" "E30000239" 115595841.7         100    6.289208
"UKD34" "E30000239" 203167544.3    99.99309    11.05371
"UKD34" "E30000284" 14042.44305  .006911277  .001959212
"UKD35" "E30000239" 229213855.8         100    12.47081
"UKD36" "E30000170" 8056.388494  .002456537   .00111413
"UKD36" "E30000239"   172919193    52.72615    9.407992
"UKD36" "E30000255" 24226.78715  .007387181  .002584866
"UKD36" "E30000284" 155005669.1      47.264    21.62651
"UKD37" "E30000029" 7006.580527  .001751785  .001925189
"UKD37" "E30000170" 11991.00274   .00299799  .001658254
"UKD37" "E30000219" 20101.86674  .005025868  .005382483
"UKD37" "E30000239" 399928974.5    99.99023   21.758884
"UKD41" "E30000170" 107194506.6    78.21149   14.824088
"UKD41" "E30000239"    29849325    21.77873   1.6240083
"UKD41" "E30000255" 13401.89378   .00977832  .001429909
"UKD42" "E30000171" 34872029.36         100    16.13467
"UKD44" "E30000039" 12103.43673  .001412412  .001026678
"UKD44" "E30000076" 574996921.4    67.09933    99.99177
"UKD44" "E30000170" 183.1079797 .0000213678 .0000253223
"UKD44" "E30000171"  64195469.6    7.491297    29.70211
"UKD44" "E30000223" 16445.66858  .001919129  .001523937
"UKD44" "E30000255" 217712862.6   25.406025   23.228775
"UKD45" "E30000039" 7479.785688  .000744056  .000634475
"UKD45" "E30000076" 13721.62053  .001364966  .002386185
"UKD45" "E30000170" 422956930.5    42.07388    58.49134
"UKD45" "E30000171" 117063523.2    11.64496    54.16322
"UKD45" "E30000182" 19037.76823  .001893793  .006798087
"UKD45" "E30000255" 465211381.6    46.27716    49.63552
"UKD46" "E30000018" 346.9086397 .0000706419  .000100878
"UKD46" "E30000029"  14486.7692  .002949978   .00398051
"UKD46" "E30000039" 9279.847951  .001889679  .000787166
"UKD46" "E30000170" 192902964.5    39.28132    26.67684
"UKD46" "E30000182" 280009612.2    57.01907    99.98702
"UKD46" "E30000239" 18143955.22      3.6947    .9871558
"UKD47" "E30000170" 17756.33516  .003231584   .00245555
"UKD47" "E30000233" 248511241.2    45.22808    40.97392
"UKD47" "E30000239" 18148.76411  .003303004  .000987417
"UKD47" "E30000255" 254286723.4    46.27919   27.131006
"UKD47" "E30000284" 46628424.33    8.486192    6.505633
"UKD61" "E30000239" 38439.87147   .02128579  .002091393
"UKD61" "E30000284" 180550930.7    99.97871   25.190605
"UKD62" "E30000185" 841.4333672 .0000721424  .000189205
"UKD62" "E30000197" 662756971.6    56.82312     78.0034
"UKD62" "E30000239" 475823055.3    40.79588    25.88804
"UKD62" "E30000262" 13702.54716  .001174822  .001188375
"UKD62" "E30000273" 27661249.03    2.371606   2.5870845
"UKD62" "E30000284" 19827.24759  .001699938  .002766313
end
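For what it is worth, this is what I was thinking of trying (a sketch: make 15 copies of every row and then number the copies), but I am not sure it is the standard approach:
Code:
generate long obs_id = _n               // tag each original observation
expand 15                               // 15 copies of every observation
bysort obs_id: generate category = _n   // categorical variable 1..15 within each original observation
drop obs_id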

Best,

Bridget