Specialized in data processing, data management implementation plans, data collection tools (electronic and paper-based), data cleaning specifications, data extraction, transformation and loading, analytical datasets, and data analysis. BJ Data Tech Solutions teaches the design and development of electronic data collection tools using CSPro and Stata commands for data manipulation, as well as setting up data management systems using modern data technologies such as relational databases, C#, PHP, and Android.
Saturday, November 30, 2019
Help Fixing Bar Graph of GDP and Dummy Variables
I am using Stata 16 on a Mac. I used the following command in Stata: reg lnGDP lnagedpop lnpopulation WestCoast EastCoast, robust. All of the data cover 50 states including D.C. over a span of three years (2016-2018). lnagedpop is the (log) population that is 65 and older. The dummy variable WestCoast = 1 if the U.S. state is located on the West Coast and 0 if not. Similarly, the dummy variable EastCoast = 1 if the state is located on the East Coast and 0 if not. Midwest is my reference category. I am trying to create a graph based on this regression model: a bar graph with lnGDP on the y-axis and the dummy variables WestCoast and EastCoast on the x-axis. I used the command graph bar (mean) lnGDP, over(WestCoast) over(EastCoast), but my graph doesn't look right. Could someone help me fix my bar graph, or suggest another graph that would look better?
Thank you in advance for your help
Jason Browen
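One possible fix (a sketch I am adding, not from the original post): since a state cannot be both West Coast and East Coast, the two dummies can be combined into a single region variable, so graph bar shows one bar per region instead of nested over() groups.
gen region = cond(WestCoast == 1, 1, cond(EastCoast == 1, 2, 3))
label define regionlbl 1 "West Coast" 2 "East Coast" 3 "Midwest"
label values region regionlbl
graph bar (mean) lnGDP, over(region) ytitle("Mean of lnGDP")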
Trouble Generating Variable Dependent on Period
Above is a picture of a subset of my data. hs10 refers to a specific product being sold and m_val is a value the product takes on. What I need to do is generate a variable that is the log difference of m_val between periods 1 and 2. The problem, however, is that a number of products don't have m_val for both periods. For those products, I want the generated value to simply be the log of m_val.
I am lost as to where to start!
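A minimal sketch of one way to do this (my suggestion, not from the post), assuming at most one observation per product (hs10) per period and a period variable named period:
bysort hs10 (period): gen double lm = log(m_val)
by hs10: gen double ldiff = cond(_N == 2, lm[2] - lm[1], lm[1])   // log difference if both periods present, else log(m_val)
drop lm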
Stata returning missing when converting string date
I have dates in two formats in a single variable: one is 20-Apr-19, the other is 21-APRIL-2019. Stata returns missing when converting 20-Apr-19 but properly converts 21-APRIL-2019. What could be the problem?
I am using this command
gen date2=date(var1, "DMY")
Oscar
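A likely explanation (my reading, not confirmed in the thread): date() cannot infer the century of a two-digit year such as 19 unless it is told, so those values come back missing. A sketch of a two-pass conversion:
gen date2 = date(var1, "DMY")                             // converts 21-APRIL-2019
replace date2 = date(var1, "DM20Y") if missing(date2)     // converts 20-Apr-19, assuming 20xx years
format date2 %td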
Different I-square results for meta summarize and meta forestplot using random empirical Bayes method
The meta forestplot using random(ebayes) appears to produce a different value for the I2 statistic than the output of meta summarize using the same method. Using other methods (fixed, dlaird, etc.) produces the same I2 in meta summarize and meta forestplot, but not when using random(ebayes).
Is there an issue with meta forestplot?
Thanks
"Data have changed" dialog box
In my workflow, I work from original datasets and alter them with Stata code. As a result, I never save altered Stata datasets. I'm on Windows 10, and every time I close the Stata window, a dialog box informs me that "Data in memory have changed. Do you want to save the changes before exiting?" In previous versions of Stata, there was a keyboard shortcut for the "No" option, so it wasn't too big of a pain to deal with that dialog box repeatedly. But in Stata 16 that shortcut no longer exists, and now it's a big pain because every single time I close a Stata window I have to mouse over and click "No." It's an unnecessary pain point, and in my workflow it happens a lot. Is there some sort of setting, option, or other method I can use to make Stata stop offering me that dialog box? Or at least restore the keyboard shortcut?
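One workaround (my suggestion, not from the post): closing Stata from the Command window with the clear option skips the prompt entirely.
exit, clear     // exits Stata and discards the data in memory without asking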
meta forestplot error in Stata 16
I am running several meta forestplots for subgroups, which worked fine in the version of Stata 16 as of 14 Nov.
meta forestplot _id outcome_name trimester _plot _esci _weight, ///
subgroup(sga_order) eform ///
esrefline(lcolor(green) lpattern(dash)) ///
nullrefline(favorsleft("Decreased risk") ///
favorsright("Increased risk")) insidemarker nonotes ///
random(ebayes) crop(. 2)
meta forestplot _id outcome_name trimester _plot _esci _weight, ///
subgroup(outcome) eform ///
esrefline(lcolor(green) lpattern(dash)) ///
nullrefline(favorsleft("Decreased risk") ///
favorsright("Increased risk")) insidemarker nonotes ///
random(ebayes) crop(. 2)
The first command produces a plot; however, the second produces an error suggesting that the _meta* system variables cannot be found:
Effect-size label: Log Odds-Ratio
Effect size: logor
Std. Err.: se
Study label: author
variable _meta* not found
r(111);
end of do-file
r(111);
. update meta
invalid syntax
r(198);
Any suggestions on what I am doing wrong? Thanks
Searching within strings
"Aneuploidy CRMI 13, 15, 16, 17, 18, 21, 22, X, Y"
"Translocation 46,XX,t(12;21)(q24.33;q22.13)"
"Aneuploidy CRMI 13, 15, 16, 17, 18, 21, 22, X, Y, Add Chromosome 14, Translocation 45,XY,der(13;14)(q10;q10), Gender Selection Pt Choice F"
I am trying to find a way to search within these long string variables to create a new variable (pgttype1) that categorizes them discretely, such as:
1) contains "aneuploidy" but does not contain "translocation"
or
2) contains "translocation" but does not contain "gender selection"
Can't seem to get it to work with regexm or strpos...
Any ideas?
Thanks in advance
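A sketch of one way this could be coded (my suggestion; the string variable name diagnosis is assumed), using strpos() on a lowercased copy so the matching is case-insensitive:
gen str_lower = lower(diagnosis)
gen byte pgttype1 = .
replace pgttype1 = 1 if strpos(str_lower, "aneuploidy")    & !strpos(str_lower, "translocation")
replace pgttype1 = 2 if strpos(str_lower, "translocation") & !strpos(str_lower, "gender selection")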
Method for continuous predictor and dichotomous dependent variable
I am working on a research project asking: Do lower rates of school satisfaction and school engagement influence students’ suspension and expulsion?
School satisfaction and school engagement are the independent variables and are continuous (Likert scale, strongly disagree to strongly agree).
Suspension and expulsion are ever/never dichotomous variables.
Would I use an ordinal logistic regression? If so how would I go about that?
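Since the outcomes are ever/never (0/1) rather than ordered categories, a binary logistic regression is the usual starting point. A sketch with illustrative variable names:
logit suspended c.satisfaction c.engagement, or     // odds ratios
margins, dydx(satisfaction engagement)              // average marginal effects on Pr(suspended)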
simulation study
program define mysim
drop _all
set obs 19
gen b = inlist(0, 0.1, 0.4)
more
gen u = rnormal(0,1)
The question is: y_i = B*x_i + u_i, where x_i = i and u_i is N(0,1) for i = 1, ..., n,
with B taking the values {0, 0.1, 0.4}.
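A minimal working sketch of such a simulation (my reconstruction of the exercise described above, not the original poster's code; the program and option names are illustrative):
capture program drop mysim
program define mysim, rclass
    syntax [, b(real 0) nobs(integer 19)]
    drop _all
    set obs `nobs'
    gen x = _n                      // x_i = i
    gen u = rnormal(0,1)            // u_i ~ N(0,1)
    gen y = `b'*x + u               // y_i = B*x_i + u_i
    regress y x
    return scalar bhat = _b[x]
end
set seed 12345
foreach b in 0 0.1 0.4 {
    simulate bhat = r(bhat), reps(1000) nodots: mysim, b(`b')
    summarize bhat                  // distribution of the estimated B
}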
Set Seed Random Sample
Multiline model name for long variable names using esttab
I am using esttab to generate tables in LaTeX. I have long model names that I want to appear on two lines, because if I keep them on one line they push the last column out of the table frame. Here is my code:
esttab high low diff Total , ///
cells("mean(pattern(1 1 0 1) fmt(2)) b(star pattern(0 0 1 0) fmt(2))") ///
label mlabels("Mean treated three or more times" "Mean treated one or two times" "Diff" "Total") ///
collabels(none) replace
---------------------------------------------------------------------------
(1) (2) (3) (4)
Mean treat~s Mean treat~s Diff Total
---------------------------------------------------------------------------
Teacher with highe~e 0.12 0.11 -0.01 0.11
Teacher per student 2.64 2.24 -0.40 2.40
Avg class size 37.98 38.38 0.40 38.23
Total enrollment 510.48 569.82 59.34 546.52
Total female enrol~t 124.76 216.54 91.78* 180.51
Number of students~o 99.10 109.23 10.14 105.25
School age 22.71 22.02 -0.70 22.29
Number of shifts 1.71 1.86 0.15 1.80
Average score at t~e 202.50 195.61 -6.89 198.31
---------------------------------------------------------------------------
Observations 42 65 107 107
---------------------------------------------------------------------------
Clustered Errors and fixed effect on the same level
I was wondering if you could help me with a problem:
I have a continuous outcome variable (a health score) and information about incidents per governorate in a country. I want to describe the relationship between those two (obviously adding more variables at another point).
My question: do I include governorate fixed effects in this regression, even though the incidents variable is measured at the exact same level (a dummy for each governorate)? Additionally, do I cluster the SE?
Basic Code:
reg health incidents
1. alternative
reg health incidents i.governorate
2. alternative
reg health incidents i.governorate, vce(cluster gov)
Like I said, the information on the incidents is also at the governorate level (one number for each governorate).
It feels as if this would create a problem, or something circular. However, I cannot explain to myself econometrically whether this really is a problem or whether it is fine.
Thanks a lot in advance
Division into two groups
I am running an analysis with an unbalanced dataset covering 10 years. I would like to see whether the results differ for two groups of firms based on net income across years. For this reason, I want to run the analysis for the dependent variables of two groups of firms: high-income firms and low-income firms (based on the median of lagged net income). However, I am stuck now, as the command "bysort time: egen MNincome = median(l.netincome)" gives me the error "not sorted", although I have sorted the data beforehand.
Could you please kindly advise how to overcome this error? Maybe there is a different, simpler approach to run this kind of analysis (without creating dependent variables for each group, e.g. "gen employeesgroup1 = numberofemployees if MNincome > 10", where 10 is the median obtained from the command "sum netincome, detail" as the 50th percentile)?
Thank you in advance!
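One likely cause (my reading): egen and by-groups do not accept time-series operators such as L., so the lag should be created as its own variable first. A sketch, assuming the panel is declared with a firm identifier firm_id and time variable time:
xtset firm_id time
gen double lag_netincome = L.netincome
bysort time: egen double MNincome = median(lag_netincome)
gen byte highincome = lag_netincome > MNincome if !missing(lag_netincome, MNincome)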
one year survival - data management
I have a list of children with 2 variables "date of birth" and "date of death". Both are recorded as numeric daily date (int); for example "13894" is "15jan1998".
I would like to make a new variable called "1-year survival" with binary outcome "yes" and "no" using the aforementioned variables. Thank you!
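A sketch of one way to build this (my suggestion; the variable names date_birth and date_death are assumed, and children with a missing death date are assumed to have survived the first year):
gen byte surv1yr = missing(date_death) | (date_death - date_birth >= 365)
label define yesno 0 "no" 1 "yes"
label values surv1yr yesno
label variable surv1yr "1-year survival"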
Comparing two waves with RE-Logit
              storage   display    value
variable name   type    format     label        variable label
---------------------------------------------------------------------------------------------
interested      float   %13.0g     interested   Interested (0/1)
polpart         float   %9.0g      polpart      Participates (0/1)
hhinc_eqr       float   %9.0g                   Real Equivalized HH-Inc. in thousand €
female          float   %9.0g      female       Female (0/1)
age             int     %8.0g                   Age of Individual
west            float   %9.0g      west         West-Germany (0/1)
y2013           float   %9.0g      y2013        Pre crisis (0/1)
y2017           float   %9.0g      y2017        Post crisis (0/1)
unemployed      float   %10.0g     unemployed   Unemployed (0/1)
edyears         float   %9.0g                   Number of Years of Education
party_pref      float   %13.0g     party_pref   Party preference (0/1)
worried         float   %11.0g     worried
hhinc_group     float   %9.0g      hhinc_group  Income Groups
hhsize          byte    %8.0g                   Number of Persons in HH
persnr          long    %12.0g                  Unveraenderliche Personennummer (PID)
syear           int     %12.0g                  Befragungsjahr
. logistic interested i.y2017##c.age i.y2017##ib(2).hhinc_group i.y2017##i.west ///
> i.y2017##i.female i.y2017##i.party_pref i.y2017##i.unemployed ///
> i.y2017##i.worried i.y2017##c.edyears, vce(cluster persnr)
Logistic regression Number of obs = 21,444
Wald chi2(19) = 3116.28
Prob > chi2 = 0.0000
Log pseudolikelihood = -11933.143 Pseudo R2 = 0.1783
(Std. Err. adjusted for 15,309 clusters in persnr)
-----------------------------------------------------------------------------------
| Robust
interested | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
------------------+----------------------------------------------------------------
y2017 |
2017 | 1.959571 .4097247 3.22 0.001 1.300721 2.952146
age | 1.028058 .0016503 17.24 0.000 1.024828 1.031297
|
y2017#c.age |
2017 | .9953376 .0017759 -2.62 0.009 .991863 .9988244
|
hhinc_group |
Poor | .841905 .0772755 -1.87 0.061 .7032897 1.007841
Rich | 1.351389 .0803104 5.07 0.000 1.202805 1.518328
|
y2017#hhinc_group |
2017#Poor | 1.102225 .1224724 0.88 0.381 .8865227 1.37041
2017#Rich | .8609955 .0585903 -2.20 0.028 .7534891 .9838406
|
west |
West | 1.097271 .0653854 1.56 0.119 .9763183 1.233207
|
y2017#west |
2017#West | .9534032 .0610252 -0.75 0.456 .8409944 1.080837
|
female |
Female | .3994588 .020074 -18.26 0.000 .36199 .4408059
|
y2017#female |
2017#Female | 1.080646 .0578372 1.45 0.147 .9730302 1.200164
|
party_pref |
preference | 3.502175 .1792699 24.49 0.000 3.167863 3.871767
|
y2017#party_pref |
2017#preference | .9004705 .0542628 -1.74 0.082 .8001578 1.013359
|
unemployed |
unemployed | 1.166827 .1479902 1.22 0.224 .910013 1.496117
|
y2017#unemployed |
2017#unemployed | .8746221 .1345562 -0.87 0.384 .6469451 1.182425
|
worried |
worried | .8425315 .0449318 -3.21 0.001 .758913 .9353632
|
y2017#worried |
2017#worried | 1.066966 .0755296 0.92 0.360 .928741 1.225763
|
edyears | 1.206524 .0125361 18.07 0.000 1.182202 1.231346
|
y2017#c.edyears |
2017 | .9956702 .0111172 -0.39 0.698 .9741176 1.0177
|
_cons | .010937 .0020517 -24.07 0.000 .0075722 .0157971
-----------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
. xtlogit interested c.age##c.age ib(2).hhinc_group i.west i.female i.party_pref ///
> i.unemployed i.worried edyears i.y2017, re nolog or intpoints(32)
Random-effects logistic regression Number of obs = 21,444
Group variable: persnr Number of groups = 15,309
Random effects u_i ~ Gaussian Obs per group:
min = 1
avg = 1.4
max = 2
Integration method: mvaghermite Integration pts. = 32
Wald chi2(11) = 1184.43
Log likelihood = -11080.549 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
interested | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 1.043214 .0135104 3.27 0.001 1.017068 1.070033
|
c.age#c.age | 1.000125 .0001226 1.02 0.308 .9998846 1.000365
|
hhinc_group |
Poor | .7840031 .0976617 -1.95 0.051 .6141654 1.000807
Rich | 1.440353 .1243391 4.23 0.000 1.216154 1.705883
|
west |
West | 1.22893 .1197768 2.12 0.034 1.015232 1.487609
|
female |
Female | .1370047 .0129937 -20.96 0.000 .1137644 .1649927
|
party_pref |
preference | 10.51701 .9246106 26.76 0.000 8.852339 12.49471
|
unemployed |
unemployed | 1.141778 .1908279 0.79 0.428 .8228457 1.584327
|
worried |
worried | .8122069 .0655849 -2.58 0.010 .6933188 .9514815
edyears | 1.541916 .0302456 22.08 0.000 1.48376 1.60235
|
y2017 |
2017 | 1.840504 .1101531 10.19 0.000 1.636789 2.069573
_cons | .0000852 .0000392 -20.36 0.000 .0000346 .0002101
-------------+----------------------------------------------------------------
/lnsig2u | 2.411774 .067891 2.27871 2.544838
-------------+----------------------------------------------------------------
sigma_u | 3.339721 .1133685 3.124753 3.569477
rho | .7722266 .0119415 .7479791 .7947813
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Note: _cons estimates baseline odds (conditional on zero random effects).
LR test of rho=0: chibar2(01) = 1719.38 Prob >= chibar2 = 0.000
Now I am wondering whether I should interpret this change by looking at the marginal effects of my year dummy, or whether I need to estimate a separate RE-logit for each of the three income groups. As far as I know, the year dummies will pick up any variation in the outcome that happens over time and is not attributed to the other explanatory variables, but does it make sense to estimate the model for different subpopulations? If not, what other possibilities do I have to compare my two waves?
Here is what I ran after my RE-Logit:
. margins, dydx(y2017) over(hhinc_group) coeflegend post
Average marginal effects Number of obs = 21,444
Model VCE : OIM
Expression : Pr(interested=1), predict(pr)
dy/dx w.r.t. : 1.y2017
over : hhinc_group
------------------------------------------------------------------------------
| dy/dx Legend
-------------+----------------------------------------------------------------
0.y2017 | (base outcome)
-------------+----------------------------------------------------------------
1.y2017 |
hhinc_group |
Poor | .0464933 _b[1.y2017:1bn.hhinc_group]
Middle | .0518429 _b[1.y2017:2.hhinc_group]
Rich | .05327 _b[1.y2017:3.hhinc_group]
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
.
end of do-file
. do "C:\Users\Lorenz\AppData\Local\Temp\STD1bbc_000000.tmp"
. test _b[1.y2017:1bn.hhinc_group] = _b[1.y2017:2.hhinc_group] = _b[1.y2017:3.hhinc_group]
( 1) [1.y2017]1bn.hhinc_group - [1.y2017]2.hhinc_group = 0
( 2) [1.y2017]1bn.hhinc_group - [1.y2017]3.hhinc_group = 0
chi2( 2) = 50.06
Prob > chi2 = 0.0000
Alternatively, I thought about running
xtlogit interested c.age##c.age ib(2).hhinc_group i.west i.female i.party_pref ///
    i.unemployed i.worried edyears i.y2017 if hhinc_group==1, re nolog or intpoints(32)
I am currently at the undergraduate level and did my best to read all the material available, but right now I cannot find an answer as to what I should use for my analysis.
I would very much appreciate any input.
Best regards,
Lorenz
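One possible way to compare the waves within a single model (a sketch I am adding, not from the thread): interact the year dummy with the income groups in the RE-logit and look at the wave effect per group.
xtlogit interested i.y2017##ib(2).hhinc_group c.age##c.age i.west i.female ///
    i.party_pref i.unemployed i.worried edyears, re or intpoints(32)
margins, dydx(y2017) over(hhinc_group) predict(pu0)   // wave effect by income group, at zero random effect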
xtreg, fe robust: xtoverid error "2b: operator invalid" when correcting the Hausman test (V_b-V_B is not positive definite) due to year dummies
I am analyzing a panel dataset with year dummies over the period 2000-2018 and, apparently like many other Stata beginners, I came across the issue of (V_b-V_B is not positive definite) when using the Hausman test to determine whether to use the FE or RE model.
I tried to follow suggestions of using xtoverid, and after installing the package I encountered the following issue:
. xtoverid
2b: operator invalid
r(198);
Notes: I already tried
hausman fe re, sigmamore
hausman fe re, sigmaless
** Redo HAUSMAN test to check for RE or FE model
. xtreg wROAf1 IV_CSPaggr geoentropy4rg indentropy CF_age CF_size_lnEmp CF_Levg CF_NPM CF_orgslack CC_WGI s_poptotal CC_WDI
> _PopGrowth s_GDP s_GDPpc CC_WDI_GDPPCgrowth s_fdi CC_WDI_ttUnemplRate i.yeardummy, fe
note: 17.yeardummy omitted because of collinearity
Fixed-effects (within) regression Number of obs = 18,964
Group variable: compid Number of groups = 3,780
R-sq: Obs per group:
within = 0.0346 min = 1
between = 0.0089 avg = 5.0
overall = 0.0011 max = 16
F(30,15154) = 18.13
corr(u_i, Xb) = -0.6065 Prob > F = 0.0000
-------------------------------------------------------------------------------------
wROAf1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
IV_CSPaggr | -.0227671 .0064973 -3.50 0.000 -.0355027 -.0100316
geoentropy4rg | -.6074356 .4636818 -1.31 0.190 -1.516308 .3014366
indentropy | -.0358456 .2862778 -0.13 0.900 -.5969847 .5252934
CF_age | .1352476 .0487664 2.77 0.006 .0396596 .2308355
CF_size_lnEmp | -1.551527 .1776825 -8.73 0.000 -1.899806 -1.203248
CF_Levg | -3.375775 .6100142 -5.53 0.000 -4.571477 -2.180074
CF_NPM | .0023169 .0010765 2.15 0.031 .0002068 .0044269
CF_orgslack | -.0353077 .0529923 -0.67 0.505 -.139179 .0685636
CC_WGI | 2.715108 1.34704 2.02 0.044 .0747483 5.355469
s_poptotal | -.0184933 .0304806 -0.61 0.544 -.078239 .0412524
CC_WDI_PopGrowth | .1152938 .3522301 0.33 0.743 -.5751196 .8057072
s_GDP | .632364 .1654363 3.82 0.000 .3080889 .9566391
s_GDPpc | -1.391459 .2382413 -5.84 0.000 -1.85844 -.9244769
CC_WDI_GDPPCgrowth | .1399394 .0499854 2.80 0.005 .041962 .2379168
s_fdi | .0759093 .0944164 0.80 0.421 -.1091582 .2609769
CC_WDI_ttUnemplRate | .0317622 .0728713 0.44 0.663 -.1110744 .1745988
|
yeardummy |
2003 | 1.718398 .660478 2.60 0.009 .4237814 3.013015
2004 | 2.647522 .5736919 4.61 0.000 1.523017 3.772027
2005 | 3.015123 .5307992 5.68 0.000 1.974692 4.055553
2006 | 2.572717 .4936521 5.21 0.000 1.605099 3.540335
2007 | .1135621 .4749622 0.24 0.811 -.8174212 1.044545
2008 | -.1597079 .4313103 -0.37 0.711 -1.005128 .6857123
2009 | 2.637556 .4797226 5.50 0.000 1.697242 3.57787
2010 | 1.65895 .3939397 4.21 0.000 .8867807 2.431119
2011 | 1.438345 .3699788 3.89 0.000 .7131416 2.163548
2012 | .55647 .3401479 1.64 0.102 -.1102609 1.223201
2013 | .4664979 .3052461 1.53 0.126 -.1318213 1.064817
2014 | -1.235379 .2731461 -4.52 0.000 -1.770778 -.6999793
2015 | -.5328104 .2561052 -2.08 0.038 -1.034807 -.0308134
2016 | .5114611 .2398972 2.13 0.033 .0412337 .9816885
2017 | 0 (omitted)
|
_cons | 15.21448 5.054748 3.01 0.003 5.306568 25.1224
--------------------+----------------------------------------------------------------
sigma_u | 13.647115
sigma_e | 7.0821539
rho | .78783095 (fraction of variance due to u_i)
-------------------------------------------------------------------------------------
F test that all u_i=0: F(3779, 15154) = 6.53 Prob > F = 0.0000
. estimate store fe
. xtreg wROAf1 IV_CSPaggr geoentropy4rg indentropy CF_age CF_size_lnEmp CF_Levg CF_NPM CF_orgslack CC_WGI s_poptotal CC_WDI
> _PopGrowth s_GDP s_GDPpc CC_WDI_GDPPCgrowth s_fdi CC_WDI_ttUnemplRate i.yeardummy, re
Random-effects GLS regression Number of obs = 18,964
Group variable: compid Number of groups = 3,780
R-sq: Obs per group:
within = 0.0190 min = 1
between = 0.0983 avg = 5.0
overall = 0.0484 max = 16
Wald chi2(31) = 642.04
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
-------------------------------------------------------------------------------------
wROAf1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
IV_CSPaggr | -.0104024 .0050298 -2.07 0.039 -.0202607 -.0005442
geoentropy4rg | .5555316 .326717 1.70 0.089 -.0848219 1.195885
indentropy | .040395 .2456707 0.16 0.869 -.4411107 .5219008
CF_age | .0183981 .0053646 3.43 0.001 .0078837 .0289125
CF_size_lnEmp | .6894453 .093907 7.34 0.000 .505391 .8734995
CF_Levg | -3.212073 .5162737 -6.22 0.000 -4.22395 -2.200195
CF_NPM | .0021792 .0004408 4.94 0.000 .0013153 .0030432
CF_orgslack | -.1645398 .0435363 -3.78 0.000 -.2498694 -.0792102
CC_WGI | -.5799745 .4522981 -1.28 0.200 -1.466462 .3065135
s_poptotal | -.0020881 .0010709 -1.95 0.051 -.0041871 .0000108
CC_WDI_PopGrowth | .1173416 .2527258 0.46 0.642 -.3779918 .612675
s_GDP | .013877 .0326286 0.43 0.671 -.0500738 .0778278
s_GDPpc | -.4957057 .1248403 -3.97 0.000 -.7403882 -.2510232
CC_WDI_GDPPCgrowth | .1474068 .0474048 3.11 0.002 .054495 .2403185
s_fdi | .1062363 .0854276 1.24 0.214 -.0611988 .2736714
CC_WDI_ttUnemplRate | .0051797 .0428637 0.12 0.904 -.0788316 .089191
|
yeardummy |
2003 | 1.641583 .6825769 2.40 0.016 .3037567 2.979409
2004 | 2.930763 .6200142 4.73 0.000 1.715558 4.145969
2005 | 3.252471 .6017448 5.41 0.000 2.073073 4.431869
2006 | 2.934455 .5998017 4.89 0.000 1.758865 4.110044
2007 | .3813269 .6038372 0.63 0.528 -.8021723 1.564826
2008 | .252396 .5885435 0.43 0.668 -.9011281 1.40592
2009 | 3.200676 .6210972 5.15 0.000 1.983348 4.418004
2010 | 2.266513 .5860811 3.87 0.000 1.117815 3.415211
2011 | 1.933953 .5919703 3.27 0.001 .7737126 3.094193
2012 | 1.211612 .5868921 2.06 0.039 .0613248 2.361899
2013 | 1.19128 .5855447 2.03 0.042 .0436333 2.338926
2014 | -.3239355 .5834149 -0.56 0.579 -1.467408 .8195367
2015 | .6538636 .5763879 1.13 0.257 -.4758358 1.783563
2016 | 1.603099 .5801637 2.76 0.006 .4659991 2.740199
2017 | 1.087371 .5918516 1.84 0.066 -.072637 2.247379
|
_cons | -.2501154 1.308373 -0.19 0.848 -2.81448 2.314249
--------------------+----------------------------------------------------------------
sigma_u | 9.609396
sigma_e | 7.0821539
rho | .64801529 (fraction of variance due to u_i)
-------------------------------------------------------------------------------------
. estimate store re
. xttest0
Breusch and Pagan Lagrangian multiplier test for random effects
wROAf1[compid,t] = Xb + u[compid] + e[compid,t]
Estimated results:
| Var sd = sqrt(Var)
---------+-----------------------------
wROAf1 | 112.5129 10.60721
e | 50.1569 7.082154
u | 92.34049 9.609396
Test: Var(u) = 0
chibar2(01) = 6033.36
Prob > chibar2 = 0.0000
. xtoverid
2b: operator invalid
r(198);
** Mundlak approach
egen CSPmean = mean(IV_CSPaggr)
egen Geoentmean = mean(geoentropy4rg)
egen Indentmean = mean(indentropy)
egen agemean = mean(CF_age)
egen sizemean = mean(CF_size_lnEmp)
egen levgmean = mean(CF_Levg)
egen npmmean = mean(CF_NPM)
egen orgslackmean = mean(CF_orgslack)
egen WGImean = mean(CC_WGI)
egen poptotalmean = mean(s_poptotal)
egen popgrowthmean = mean(CC_WDI_PopGrowth)
egen gdpmean = mean(s_GDP)
egen gdppcmean = mean(s_GDPpc)
egen gdppcgrowthmean = mean(CC_WDI_GDPPCgrowth)
egen fdimean = mean(s_fdi)
egen unemplmean = mean(CC_WDI_ttUnemplRate)
xtreg wROAf1 IV_CSPaggr geoentropy4rg indentropy CF_age CF_size_lnEmp CF_Levg CF_NPM CF_orgslack CC_WGI s_poptotal CC_WDI_PopGrowth s_GDP s_GDPpc CC_WDI_GDPPCgrowth s_fdi CC_WDI_ttUnemplRate CSPmean Geoentmean Indentmean agemean sizemean levgmean npmmean orgslackmean WGImean poptotalmean popgrowthmean gdpmean gdppcmean gdppcgrowthmean fdimean unemplmean i.yeardummy, re vce(robust)
note: CSPmean omitted because of collinearity
note: Geoentmean omitted because of collinearity
note: Indentmean omitted because of collinearity
note: agemean omitted because of collinearity
note: sizemean omitted because of collinearity
note: levgmean omitted because of collinearity
note: npmmean omitted because of collinearity
note: orgslackmean omitted because of collinearity
note: WGImean omitted because of collinearity
note: poptotalmean omitted because of collinearity
note: popgrowthmean omitted because of collinearity
note: gdpmean omitted because of collinearity
note: gdppcmean omitted because of collinearity
note: gdppcgrowthmean omitted because of collinearity
note: fdimean omitted because of collinearity
note: unemplmean omitted because of collinearity
Random-effects GLS regression Number of obs = 18,964
Group variable: compid Number of groups = 3,780
R-sq: Obs per group:
within = 0.0190 min = 1
between = 0.0983 avg = 5.0
overall = 0.0484 max = 16
Wald chi2(31) = 403.84
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
(Std. Err. adjusted for 3,780 clusters in compid)
-------------------------------------------------------------------------------------
| Robust
wROAf1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
IV_CSPaggr | -.0104024 .005792 -1.80 0.072 -.0217545 .0009497
geoentropy4rg | .5555316 .4246757 1.31 0.191 -.2768174 1.387881
indentropy | .040395 .2513858 0.16 0.872 -.452312 .5331021
CF_age | .0183981 .0037449 4.91 0.000 .0110582 .025738
CF_size_lnEmp | .6894453 .1634117 4.22 0.000 .3691641 1.009726
CF_Levg | -3.212073 1.113767 -2.88 0.004 -5.395016 -1.029129
CF_NPM | .0021792 .0011534 1.89 0.059 -.0000814 .0044399
CF_orgslack | -.1645398 .0894728 -1.84 0.066 -.3399033 .0108237
CC_WGI | -.5799745 .4286284 -1.35 0.176 -1.420071 .2601217
s_poptotal | -.0020881 .0008452 -2.47 0.013 -.0037447 -.0004315
CC_WDI_PopGrowth | .1173416 .2156586 0.54 0.586 -.3053415 .5400247
s_GDP | .013877 .0344897 0.40 0.687 -.0537215 .0814755
s_GDPpc | -.4957057 .1313103 -3.78 0.000 -.7530691 -.2383423
CC_WDI_GDPPCgrowth | .1474068 .0596172 2.47 0.013 .0305592 .2642544
s_fdi | .1062363 .0843957 1.26 0.208 -.0591762 .2716489
CC_WDI_ttUnemplRate | .0051797 .0422669 0.12 0.902 -.077662 .0880214
CSPmean | 0 (omitted)
Geoentmean | 0 (omitted)
Indentmean | 0 (omitted)
agemean | 0 (omitted)
sizemean | 0 (omitted)
levgmean | 0 (omitted)
npmmean | 0 (omitted)
orgslackmean | 0 (omitted)
WGImean | 0 (omitted)
poptotalmean | 0 (omitted)
popgrowthmean | 0 (omitted)
gdpmean | 0 (omitted)
gdppcmean | 0 (omitted)
gdppcgrowthmean | 0 (omitted)
fdimean | 0 (omitted)
unemplmean | 0 (omitted)
|
yeardummy |
2003 | 1.641583 .4706248 3.49 0.000 .7191752 2.56399
2004 | 2.930763 .580364 5.05 0.000 1.793271 4.068256
2005 | 3.252471 .5911194 5.50 0.000 2.093898 4.411043
2006 | 2.934455 .6290839 4.66 0.000 1.701473 4.167436
2007 | .3813269 .6928329 0.55 0.582 -.9766007 1.739254
2008 | .252396 .5860639 0.43 0.667 -.896268 1.40106
2009 | 3.200676 .6079538 5.26 0.000 2.009109 4.392244
2010 | 2.266513 .6160047 3.68 0.000 1.059166 3.47386
2011 | 1.933953 .6155617 3.14 0.002 .7274743 3.140432
2012 | 1.211612 .5967539 2.03 0.042 .041996 2.381228
2013 | 1.19128 .5951258 2.00 0.045 .0248547 2.357705
2014 | -.3239355 .605627 -0.53 0.593 -1.510943 .8630717
2015 | .6538636 .5871039 1.11 0.265 -.4968388 1.804566
2016 | 1.603099 .580165 2.76 0.006 .4659965 2.740202
2017 | 1.087371 .6118433 1.78 0.076 -.11182 2.286562
|
_cons | -.2501154 1.708782 -0.15 0.884 -3.599267 3.099036
--------------------+----------------------------------------------------------------
sigma_u | 9.609396
sigma_e | 7.0821539
rho | .64801529 (fraction of variance due to u_i)
-------------------------------------------------------------------------------------
estimates store mundlak
testparm CSPmean Geoentmean Indentmean agemean sizemean levgmean npmmean orgslackmean WGImean poptotalmean popgrowthmean gdpmean gdppcmean gdppcgrowthmean fdimean unemplmean
no such variables;
the specified varlist does not identify any testable coefficients
r(111);
Kind regards,
Jennifer
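Two notes that may explain the output above (my suggestions, not confirmed in the thread): first, xtoverid is an older user-written command that, as far as I know, does not accept factor-variable notation such as i.yeardummy, which is what triggers "2b: operator invalid"; second, for the Mundlak approach the means have to be within-firm means, otherwise egen mean() returns a single overall constant and every *mean variable is dropped as collinear. A sketch:
* explicit year dummies instead of i.yeardummy, so xtoverid can run afterwards
tab yeardummy, gen(yr_)
* within-firm (Mundlak) means, shown here for two of the regressors
bysort compid: egen double CSPmean_w  = mean(IV_CSPaggr)
bysort compid: egen double sizemean_w = mean(CF_size_lnEmp)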
Medeff command: invalid mediate (error r(198))
Could you please tell me the reason why when I type the following command:
medeff (regress edb51 acled gdpgrowth1 gdppercapita1 trade1 inflation1 exchangerate1 telephone1 cellphone1 education1 i.cid i.year) (regress fdiwdi1 acled edb51 gdpgrowth1 gdppercapita1 trade1 inflation1 exchangerate1 telephone1 cellphone1 education1 i.cid i.year), treat(acled), mediate(edb51) sims (1000) vce (bootstrap)
Stata gives me error r(198), saying invalid 'mediate'.
Please, I hope someone can help, because it is really urgent!
Thank you very much and kind regards,
Siham Hari
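A guess at the cause (not confirmed): Stata options after the comma are separated by spaces, not additional commas, so the comma between treat(acled) and mediate(edb51) may be what makes medeff reject 'mediate'. A sketch of the same call without the extra comma (and without spaces before the option parentheses):
medeff (regress edb51 acled gdpgrowth1 gdppercapita1 trade1 inflation1 exchangerate1 telephone1 cellphone1 education1 i.cid i.year) ///
       (regress fdiwdi1 acled edb51 gdpgrowth1 gdppercapita1 trade1 inflation1 exchangerate1 telephone1 cellphone1 education1 i.cid i.year), ///
       treat(acled) mediate(edb51) sims(1000) vce(bootstrap)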
Doubt on Heckprob model results
I'm running a Heckman selection model based on heckprob. I have some categorical variables, such as grup_ha, grupo_edad, and niveleducativo, and the rest are dichotomous variables.
I can see in my results that only approximately 15% of the sample are uncensored observations, so I'm not sure whether this helps to explain the whole behavior, or even whether the model is appropriate.
Because chi2 = 57.35, this clearly justifies the Heckman selection equation with these data, but I'm not sure yet about it, and I can't find much literature on this kind of model. Could anyone tell me how to interpret these heckprobit results? I suppose I have to interpret them as a probit model.
Thank you!
. heckprob cred_o_fin_aprob i.regiones_co sexo tenenciapropia cuidadotierrayanim existeinfraestructura acceso_sistderiego accesoenergia desti
> noventa recibir_asistenciaoasesoria grup_ha niveleducativo grupo_edad, select( soli_cred_o_fin2013=grup_ha sexo grupo_edad sabeleeryesc) vc
> e(robust)
Fitting probit model:
Iteration 0: log pseudolikelihood = -31412.985
Iteration 1: log pseudolikelihood = -30745.027
Iteration 2: log pseudolikelihood = -30740.017
Iteration 3: log pseudolikelihood = -30740.017
Fitting selection model:
Iteration 0: log pseudolikelihood = -245658.48
Iteration 1: log pseudolikelihood = -242017.34
Iteration 2: log pseudolikelihood = -241997.85
Iteration 3: log pseudolikelihood = -241997.85
Fitting starting values:
Iteration 0: log pseudolikelihood = -61035.075
Iteration 1: log pseudolikelihood = -30953.04
Iteration 2: log pseudolikelihood = -30710.317
Iteration 3: log pseudolikelihood = -30709.844
Iteration 4: log pseudolikelihood = -30709.844
Fitting full model:
Iteration 0: log pseudolikelihood = -272920.09
Iteration 1: log pseudolikelihood = -272709.04 (not concave)
Iteration 2: log pseudolikelihood = -272708.68 (backed up)
Iteration 3: log pseudolikelihood = -272707.97
Iteration 4: log pseudolikelihood = -272707.92
Iteration 5: log pseudolikelihood = -272707.91
Iteration 6: log pseudolikelihood = -272707.91
Probit model with sample selection Number of obs = 571,952
Censored obs = 483,897
Uncensored obs = 88,055
Wald chi2(15) = 1092.15
Log pseudolikelihood = -272707.9 Prob > chi2 = 0.0000
---------------------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
cred_o_fin_aprob |
regiones_co |
Caribe | -.3207418 .0201058 -15.95 0.000 -.3601483 -.2813352
Pacifico | .075082 .0145369 5.16 0.000 .0465902 .1035738
Oriental | .1377078 .0150459 9.15 0.000 .1082184 .1671973
Orinoco-Amazonia | -.2023915 .0241538 -8.38 0.000 -.2497322 -.1550508
|
sexo | -.0885042 .0142334 -6.22 0.000 -.1164013 -.0606072
tenenciapropia | .0716722 .0120028 5.97 0.000 .0481471 .0951974
cuidadotierrayanim | .1055903 .0179416 5.89 0.000 .0704253 .1407552
existeinfraestructura | .0249654 .0102271 2.44 0.015 .0049206 .0450102
acceso_sistderiego | .0202652 .0134314 1.51 0.131 -.0060599 .0465903
accesoenergia | .0666022 .0110267 6.04 0.000 .0449903 .0882141
destinoventa | -.0020524 .01243 -0.17 0.869 -.0264148 .0223099
recibir_asistenciaoasesoria | .111292 .0118058 9.43 0.000 .0881531 .1344309
grup_ha | -.8661633 .8533362 -1.02 0.310 -2.538672 .8063449
niveleducativo | -.0214992 .0051078 -4.21 0.000 -.0315102 -.0114881
grupo_edad | -.0740636 .0065678 -11.28 0.000 -.0869362 -.061191
_cons | 2.724741 .8553536 3.19 0.001 1.048278 4.401203
----------------------------+----------------------------------------------------------------
soli_cred_o_fin2013 |
grup_ha | -.0262168 .4370274 -0.06 0.952 -.8827748 .8303412
sexo | .1801206 .0046461 38.77 0.000 .1710145 .1892267
grupo_edad | .0252196 .002117 11.91 0.000 .0210703 .0293689
sabeleeryesc | .4375166 .0060783 71.98 0.000 .4256034 .4494298
_cons | -1.567543 .4371122 -3.59 0.000 -2.424268 -.7108193
----------------------------+----------------------------------------------------------------
/athrho | -.4703892 .0621132 -7.57 0.000 -.5921288 -.3486496
----------------------------+----------------------------------------------------------------
rho | -.4385137 .0501692 -.5314249 -.3351774
---------------------------------------------------------------------------------------------
Wald test of indep. eqns. (rho = 0): chi2(1) = 57.35 Prob > chi2 = 0.0000
end
Friday, November 29, 2019
Help creating graph for multiple linear regression
I am using Stata 16 on a Mac. I am estimating the regression: reg GDP agedpop population WestCoast EastCoast, robust. agedpop (= population that is 65 and older), WestCoast (= 1 if the state is on the West Coast and 0 if not), EastCoast (= 1 if the state is on the East Coast and 0 if not), and Midwest is my omitted category. Does anyone have any ideas on some graphs I could use to represent this regression?
Thank you in advance for your help
Jason Browen
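One option (a sketch I am adding, not from the post): collapse the two dummies into a single region factor, then plot adjusted predictions of GDP by region with margins and marginsplot.
gen region = cond(WestCoast == 1, 1, cond(EastCoast == 1, 2, 3))
label define regionlbl2 1 "West Coast" 2 "East Coast" 3 "Midwest"
label values region regionlbl2
regress GDP agedpop population i.region, robust
margins region
marginsplot, recast(bar)    // adjusted mean GDP by region, other covariates at observed values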
Exporting residuals vs prediction data
So, I need to make a residuals-versus-predictor plot. However, the graphics aren't working in Stata (yes, I've tried just about everything). I could easily plot this in another program such as Excel; I just need to know how to get the raw numbers for this kind of plot, i.e. get the data output, paste it into another program, and make the graph there. How do I go about doing this?
Residuals vs fit would also be relevant btw, but I assume the process is the same.
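A sketch of one way to export the underlying numbers after fitting the model (my suggestion; the model and file name are illustrative):
regress y x1 x2                    // your regression
predict double fitted, xb          // fitted values
predict double resid, residuals    // residuals
export excel x1 fitted resid using "resid_plot_data.xlsx", firstrow(variables) replace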
how to delete specific observations
I need to delete the firms that do not have data for three consecutive years. Please teach me how to do it.
For example:
firm1 has data for five years, but as follows (2005 2007 2008 2010 2012). I don't want this firm; I want to delete it from my sample. I hope you get my point.
thanks in advance
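A sketch of one way to do this (my suggestion; a firm identifier firmid and a year variable year are assumed): build consecutive-year runs per firm and drop firms whose longest run is shorter than three years.
bysort firmid (year): gen spell = sum(year != year[_n-1] + 1)   // new spell whenever a year is skipped
bysort firmid spell (year): gen spell_len = _N                  // length of each consecutive run
bysort firmid: egen max_run = max(spell_len)
drop if max_run < 3
drop spell spell_len max_run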
Wagstaff concentration Index for binary outcome
I have read in the existing literature that if we are dealing with a binary outcome (i.e. the individual went for prenatal care / did not go for prenatal care), the concentration index is not valid unless it is normalised. Could someone please explain how to normalise the Wagstaff concentration index and what this generally means?
Zahrah
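For what it is worth (my summary; please verify against Wagstaff 2005): for a binary outcome with mean mu, the Wagstaff normalisation divides the standard concentration index C by (1 - mu), so the index can again reach its full [-1, 1] range. A rough sketch, assuming the outcome is y and a weighted fractional rank variable rank has already been built:
quietly summarize y
local mu = r(mean)
correlate y rank, covariance
local C = 2*r(cov_12)/`mu'                         // concentration index via the covariance formula
display "Wagstaff-normalised CI = " `C'/(1 - `mu')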
How to generate a new variable based on existing variables
I have a dataset like below. "tiea"=1 means in a given year the company has a connection to the Senator "a", tiea=0 means the company has no connection to the Senator "a" in the given year. The same applies to Senator "b". Senator b replaced Senator a in the year 2004.
So, I wanted to create a new variable called "currenttie" which should be able to measure whether the company has a tie to the current Senator in any given year. For example, in year 2000 and 2001, the currenttie value should be 1, whereas in year 2002 and 2003 the currenttie value should be 0. After year 2004, the tie to Senator b became current. So the currenttie in 2004, 2006,2007 and 2008 has a value of 1.
Could you please show me how to generate the "currenttie" variable based on the tiea, tieb, and year variables? Thank you very much!
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year byte(tiea tieb currenttie)
2000 1 0 1
2001 1 0 1
2002 0 1 0
2003 0 0 0
2004 1 1 1
2005 0 0 0
2006 0 1 1
2007 0 1 1
2008 0 1 1
end
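Given the description above (Senator a is current up to 2003 and Senator b from 2004 on), one sketch that reproduces the example values:
gen byte currenttie = cond(year < 2004, tiea, tieb)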
IKEA Corporate Social Responsibility (IKEA CSR): a brief overview
IKEA Corporate Social Responsibility (CSR) efforts are led by Chief Sustainability Officer Pia Heidenmark Cook. The home improvement and furnishing chain began reporting on CSR under the title People & Planet Positive in 2012.
ARDL lag coefficients
I am trying to fit the following short-run equation using ARDL, in a highly simplified form:
X = a4 + a5Y + a6Z + a7W
And a long-run equation:
X = a1Y + a2Z + a3W
To get any results of significance, I have to use the "aic ec1" options of the ardl command:
ardl logesgbr loggdpgbr logrergbr logvolgbr, aic ec1 btest
I do get some significant results with this, but I am questioning the type of lags Stata suggests:
- What are lags such as "LD, L2D, D1... etc"?
- Are their "coef." values anything I could use in my equation?
- And what should I do with the adjustment coefficient?
If anyone could shed some light on this, it would be highly appreciated!
Regards,
James
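On the first question, my understanding of Stata's time-series operator notation (an illustration I am adding, not from the thread):
* D1.x (or D.x)  = x - L1.x        first difference
* LD.x           = L1.x - L2.x     first difference, lagged once
* L2D.x          = L2.x - L3.x     first difference, lagged twice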
Interpreting a non-linear relationship with predicted values and margins plot
I am trying to interpret a non-linear relationship in a fixed effects model. Attached is my Stata output. As you can see from the output, both the overhead ratio and the overhead ratio squared are positive and significant.
However, when I graph the predicted values I get a U-shaped relationship, and when I do a margins plot I get an inverted U, so I'm not sure which is accurate. I also thought that if a non-linear relationship is present, one coefficient would be negative and the other positive.
Any help would be much appreciated!
jackknife loop with wrong number of observations
forvalues i = 1/139 {    // loop header reconstructed from the description below; the original post omitted it
    reg fawtd fdistockgdp if seqnum != `i', robust
    outreg2 using table, append excel
}
I'm running the above loop (to create a jackknife test) on 139 observations. I'm doing this because I have a list of 139 countries and want to know how my regression coefficient changes when each country is dropped individually.
There is data for all observations, no data is missing. I used seqnum to generate a numbered list of countries. I'm also using outreg2 to export the data.
My problem is that the results list 138 observations for some regressions and 139 observations for others (seemingly at random). It seems to me that it should be 138 for all, since I'm dropping one country each time. I've attached a screenshot that shows the first 22 results below; you can see how the number of observations varies.
I would really appreciate it if anyone has thoughts on what the problem might be. Thank you!
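A possible alternative worth mentioning (my suggestion, not from the post): Stata's built-in jackknife prefix drops one observation at a time and can save the replicate coefficients, which may replace the manual loop.
jackknife _b, saving(jk_coefs, replace): regress fawtd fdistockgdp
use jk_coefs, clear    // one row of coefficients per omitted observation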
Calculation of standard errors after predictive margins: how are they computed?
I'm struggling to understand how standard errors after -margins- are calculated, e.g. after -logit-. I am able to reproduce the predictive margins themselves, but I am not able to derive the standard errors; they don't seem to be in line with the standard deviations of the predicted probabilities. See the example below.
sysuse auto
logit foreign price mpg weight length
* Manually predict probabilities at various levels of weight
generate p1500 = invlogit(_b[weight]*1500 + _b[price]*price + _b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p2000 = invlogit(_b[weight]*2000 + _b[price]*price + _b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p2500 = invlogit(_b[weight]*2500 + _b[price]*price + _b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p3000 = invlogit(_b[weight]*3000 + _b[price]*price + _b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p3500 = invlogit(_b[weight]*3500 + _b[price]*price + _b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p4000 = invlogit(_b[weight]*4000 + _b[price]*price + _b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p4500 = invlogit(_b[weight]*4500 + _b[price]*price + _b[mpg]*mpg + _b[length]*length + _b[_cons])
generate p5000 = invlogit(_b[weight]*5000 + _b[price]*price + _b[mpg]*mpg + _b[length]*length + _b[_cons])
* Calculating predictive margins using margins command
margins, at(weight=(1500(500)5000))
Logistic regression Number of obs = 74
LR chi2(4) = 55.94
Prob > chi2 = 0.0000
Log likelihood = -17.064729 Pseudo R2 = 0.6211
------------------------------------------------------------------------------
foreign | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
price | .0009392 .0003093 3.04 0.002 .000333 .0015454
mpg | -.1155925 .0966509 -1.20 0.232 -.3050248 .0738398
weight | -.0078002 .0030342 -2.57 0.010 -.0137471 -.0018534
length | .0387482 .0875022 0.44 0.658 -.1327529 .2102493
_cons | 9.883036 11.26217 0.88 0.380 -12.19042 31.95649
------------------------------------------------------------------------------
. * Calculating predictive margins using margins command
. margins, at ( weight =(1500(500)5000))
Predictive margins Number of obs = 74
Model VCE : OIM
Expression : Pr(foreign), predict()
1._at : weight = 1500
2._at : weight = 2000
3._at : weight = 2500
4._at : weight = 3000
5._at : weight = 3500
6._at : weight = 4000
7._at : weight = 4500
8._at : weight = 5000
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | .9978147 .0039638 251.73 0.000 .9900458 1.005584
2 | .9230255 .0393331 23.47 0.000 .8459341 1.000117
3 | .5344313 .1674495 3.19 0.001 .2062364 .8626262
4 | .2003275 .07624 2.63 0.009 .0508998 .3497552
5 | .0874817 .0298857 2.93 0.003 .0289067 .1460567
6 | .0129882 .0135501 0.96 0.338 -.0135694 .0395459
7 | .0003349 .0007068 0.47 0.636 -.0010504 .0017201
8 | 6.82e-06 .0000234 0.29 0.771 -.0000391 .0000527
------------------------------------------------------------------------------
However, I cannot recreate the standard errors of -margins-: e.g. the Std. Dev. of the manual predictions at weight values of 2500 and 3000 are close to each other (.3565795 and .3427131), while the Std. Err. produced by -margins- differs a lot for these values (.1674495 and .07624).
. sum p1500-p5000
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
p1500 | 74 .9978147 .0039419 .9830204 1
p2000 | 74 .9230255 .1224789 .5395426 .9999996
p2500 | 74 .5344313 .3565795 .0231664 .9999796
p3000 | 74 .2003275 .3427131 .0004798 .9989911
p3500 | 74 .0874817 .2395264 9.71e-06 .952476
-------------+---------------------------------------------------------
p4000 | 74 .0129882 .0509261 1.97e-07 .2885812
p4500 | 74 .0003349 .001379 3.98e-09 .0081432
p5000 | 74 6.82e-06 .0000281 8.05e-11 .0001661
Thanks a lot,
Mike
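For what it's worth, the Delta-method Std. Err. that -margins- reports is the standard error of the average predicted probability, obtained by propagating the variance of the coefficient estimates through the prediction; it is not the standard deviation of the predicted probabilities across observations. A sketch of how the margin at weight = 2500 might be reproduced by hand (my own illustration, assuming the -logit- model above is the current estimation result):
preserve
quietly replace weight = 2500
predict double p25, pr                                  // p_i with weight set to 2500
matrix V = e(V)                                         // Var(b), order: price mpg weight length _cons
gen double w_ = p25*(1 - p25)
foreach v in price mpg weight length {
    gen double g_`v' = w_*`v'                           // d p_i / d b_v (weight is 2500 here)
}
gen double g_cons = w_
collapse (mean) g_price g_mpg g_weight g_length g_cons  // gradient of the average prediction
mkmat g_price g_mpg g_weight g_length g_cons, matrix(G)
matrix VAR = G*V*G'
display "delta-method SE at weight = 2500: " sqrt(VAR[1,1])
restore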
Error terms for Time Series Regression with a Panel Variable
For my master's thesis, I would like, as a first step, to run a Fama-French three-factor regression and compute idiosyncratic volatility for every common stock as the standard deviation of one, three, six, or twelve months of daily error terms.
For this, I downloaded permno, date, and return of stocks from CRSP and the daily FF 3 factors. I dropped all missing values of return (unbalanced time-series data) and merged both datasets based on date.
I created a variable for time :
egen time = group(date)
label variable time "Date Identifier"
My data looks like this :
I declared a time series dataset with a panel variable (permno) :
tsset permno time, generic
I want to save the error terms for each stock every day so that I can later create a loop to compute standard deviation of that error term based on 1,3,6,12 months of daily data. How can I do that ?
I know I can have an output of factor loadings and rmse for every stock by running :
statsby _b[MktRF] _b[SMB] _b[HML] rmse = e(rmse), by(permno) saving(try reg.dta, replace): regress excessretx MktRF SMB HML
but again I would be losing the time dimension that I need to compute monthly idiosyncratic volatility of every stock.
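A sketch of one way to keep the daily residuals for every stock (my suggestion; with thousands of permnos a faster by-group tool may be preferable, but the logic is the same):
gen double ffresid = .
levelsof permno, local(stocks)
foreach s of local stocks {
    quietly regress excessretx MktRF SMB HML if permno == `s'
    quietly predict double r_ if e(sample), residuals
    quietly replace ffresid = r_ if permno == `s'
    drop r_
}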
How to change the display of summary statistics from scientific notation to numeric?
I am attempting to create a descriptive statistics table for my thesis. However, I found that variables with large numbers are displayed in scientific notation.
In my attempt to change it, I read many FAQs and other discussions about this problem, but none of the options seemed to work for me.
The data type of the variables is
double
sum var1
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
var1 | 12,286 1.18e+09 2.21e+10 .0223229 7.71e+11
format var1 %24.0f
sum var1
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
var1 | 12,286 1.18e+09 2.21e+10 .0223229 7.71e+11
recast float var1, force
var1: 12072 values changed
sum var1
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
var1 | 12,286 1.18e+09 2.21e+10 .0223229 7.71e+11
tostring var1, gen(var1s) format("%17.0f")
var1 cannot be converted reversibly; no generate
Kind regards,
Stan
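One thing that may help (my suggestion, not from the thread): summarize ignores the variables' display format unless asked, so combining a fixed format with the format option of summarize might already give what is needed.
format var1 %20.0fc
summarize var1, format     // uses var1's display format in the output table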
How to regress categorical variables without a base
Currently I am investigating the impact of industry categorization (a categorical variable) on premiums paid (a continuous variable) in acquisitions. I have therefore categorized my data into 48 industry categories and am now regressing the paid premiums on the industries. But Stata automatically selects one of my industries as the base. How can I estimate the effect of each industry on the premiums without a base, since there is no standard industry?
Thank you for your help!
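A sketch of one option (my suggestion): suppress the base level with ibn. and drop the constant, so every industry gets its own coefficient (each coefficient is then that industry's mean premium when no other controls are included).
regress premium ibn.industry, noconstant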
Command problem with counting sample firms smaller than the firm for each country in each year (panel data)
I would like to measure, for each country in each year, the percentage of sample firms that are smaller than a given firm.
But I don't know how to use a loop to count the number of firms with a smaller asset volume than the specific firm.
For example, for company "01COMMUNIQUE LAB" in 1999, I want to know how many firms have assets < 2472.
Data:
input name year totass
"01COMMUNIQUE LAB" 1999 2472
"01COMMUNIQUE LAB" 2000 13487
"01COMMUNIQUE LAB" 2001 5145
"01COMMUNIQUE LAB" 2002 2375
"01COMMUNIQUE LAB" 2003 635
"01COMMUNIQUE LAB" 2004 859
"01COMMUNIQUE LAB" 2005 703
"01COMMUNIQUE LAB" 2006 707
"01COMMUNIQUE LAB" 2007 2915
"01COMMUNIQUE LAB" 2008 2157
"0373849 B.C. LTD" 1999 4586
"0373849 B.C. LTD" 2000 4106
"0373849 B.C. LTD" 2001 3659
"0373849 B.C. LTD" 2002 3649
"0373849 B.C. LTD" 2003 7523
"0373849 B.C. LTD" 2004 6165
"0373849 B.C. LTD" 2005 5892
"0373849 B.C. LTD" 2006 18235
"0373849 B.C. LTD" 2007 34371
"0373849 B.C. LTD" 2008 4831
Code:
egen yeargroup = group(year)
sort year
foreach var yeargroup{
gen Size == 0;
if totass[_n] > totass[_n+1]{
relsize ==relsize + 1;
}
}
It does not work. Could somebody help me with the commands?
Thanks in advance.
Xiao
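A sketch that avoids an explicit loop (my suggestion; the example data has no country variable, so this groups by year only; add the country variable to each bysort if it exists):
bysort year (totass): gen n_smaller = _n - 1             // firms in the same year with smaller totass
by year totass: replace n_smaller = n_smaller[1]         // firms with equal assets get the same count
by year: gen pct_smaller = 100 * n_smaller / (_N - 1)    // missing when a year has only one firm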
Mean of a group minus the observation's value in Panel data
I have a data structure like this:
IO-IOarea-Year-index
1----4-------1997-0
2----4-------1998-2.2267373
3----5-------1998-0
4----5-------2000-0
I would like to create the mean of the index variable for each IOarea group per year. So I used this code:
by IOarea Year, sort: egen index_byissue=mean(index)
However, now I would like to add a second step where each IO gets the group mean excluding its own score each year; that is, I aim to use the other group members' average score each year as a new variable.
Thanks so much by now!
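A sketch of the leave-one-out mean (my suggestion; note this is the average of the other members, which is not simply index_byissue minus the own score):
bysort IOarea Year: egen double grp_total = total(index)
by IOarea Year: gen double loo_mean = (grp_total - index) / (_N - 1)   // missing when the IO is alone in the group-year
drop grp_total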
Logging results in forvalues-loop: command lines are not shown in output-logfile
I want to save some output results using the -log- command. This works fine if I use -log- as in the example below: the command lines are shown above the results in the SMCL file (e.g. sum ... and reg ...), and consequently the output is easily interpretable.
Commands:
clear
sysuse auto
log using "Output-testA.smcl", replace nomsg
sum price mpg weight length rep78
reg price mpg weight length i.rep78
log close
translate "Output-testA.smcl" "Output-testA.pdf", translator(smcl2pdf)
. sum price mpg weight length rep78
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
price | 74 6165.257 2949.496 3291 15906
mpg | 74 21.2973 5.785503 12 41
weight | 74 3019.459 777.1936 1760 4840
length | 74 187.9324 22.26634 142 233
rep78 | 69 3.405797 .9899323 1 5
. reg price mpg weight length i.rep78
Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(7, 61) = 7.25
Model | 262008114 7 37429730.6 Prob > F = 0.0000
Residual | 314788844 61 5160472.86 R-squared = 0.4542
-------------+---------------------------------- Adj R-squared = 0.3916
Total | 576796959 68 8482308.22 Root MSE = 2271.7
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -126.8367 84.49819 -1.50 0.138 -295.8012 42.12791
weight | 5.186695 1.163383 4.46 0.000 2.860367 7.513022
length | -124.1544 40.07637 -3.10 0.003 -204.292 -44.01671
|
rep78 |
2 | 1137.284 1803.332 0.63 0.531 -2468.701 4743.269
3 | 1254.642 1661.545 0.76 0.453 -2067.823 4577.108
4 | 2267.188 1698.018 1.34 0.187 -1128.208 5662.584
5 | 3850.759 1787.272 2.15 0.035 276.8886 7424.63
|
_cons | 14614.49 6155.842 2.37 0.021 2305.125 26923.86
------------------------------------------------------------------------------
. log close
Commands:
clear
sysuse auto
forvalues i = 1/3 {
log using "Output-testB`i'.smcl", replace nomsg
sum price mpg weight length rep78
reg price mpg weight length i.rep78
log close
translate "Output-testB`i'.smcl" "Output-testB`i'.pdf", translator(smcl2pdf)
}
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
price | 74 6165.257 2949.496 3291 15906
mpg | 74 21.2973 5.785503 12 41
weight | 74 3019.459 777.1936 1760 4840
length | 74 187.9324 22.26634 142 233
rep78 | 69 3.405797 .9899323 1 5
Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(7, 61) = 7.25
Model | 262008114 7 37429730.6 Prob > F = 0.0000
Residual | 314788844 61 5160472.86 R-squared = 0.4542
-------------+---------------------------------- Adj R-squared = 0.3916
Total | 576796959 68 8482308.22 Root MSE = 2271.7
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -126.8367 84.49819 -1.50 0.138 -295.8012 42.12791
weight | 5.186695 1.163383 4.46 0.000 2.860367 7.513022
length | -124.1544 40.07637 -3.10 0.003 -204.292 -44.01671
|
rep78 |
2 | 1137.284 1803.332 0.63 0.531 -2468.701 4743.269
3 | 1254.642 1661.545 0.76 0.453 -2067.823 4577.108
4 | 2267.188 1698.018 1.34 0.187 -1128.208 5662.584
5 | 3850.759 1787.272 2.15 0.035 276.8886 7424.63
|
_cons | 14614.49 6155.842 2.37 0.021 2305.125 26923.86
------------------------------------------------------------------------------
Thanks a lot,
Mike
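As far as I know, Stata does not echo the individual commands that run inside a loop, which is why the second log only contains results. One workaround (my suggestion) is to echo the commands yourself:
forvalues i = 1/3 {
    log using "Output-testB`i'.smcl", replace nomsg
    display as input ". sum price mpg weight length rep78"
    sum price mpg weight length rep78
    display as input ". reg price mpg weight length i.rep78"
    reg price mpg weight length i.rep78
    log close
    translate "Output-testB`i'.smcl" "Output-testB`i'.pdf", translator(smcl2pdf)
}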
Reshaping Data
I am facing an issue with some data, which is available to me in MS Excel in the following format:
Can this be done through Stata commands? If yes, please share possible ways to do it.
Thanks in advance
method for omitted variable bias cross sectional analysis
I have a cross-sectional analysis of the effect of head circumference (continuous variable) on cognitive skills (continuous variable). Can someone please suggest a method I can use to address omitted variable bias? I will be really grateful.
Panel regression model with N>T and serial autocorrelated error.
I've been trying to estimate a panel regression model on a dataset with N>T, where N is the number of cross-sectional units and T is number of time observations. I want to include a fixed effect.
I ran the Wooldridge test for serial correlation and rejected the null, so the model has serially correlated errors.
Given that N>T, I understand that I cannot rely on -xtreg-. My question is: under these conditions, is it reasonable to estimate the model with -xtregar- including a fixed effect? Is this model consistent with the facts that N>T and that the error is autocorrelated of order 1?
Many thanks to those who can help me
Comparing non nested models with xtmelogit
I am using xtmelogit to run a multilevel logistic regression with PISA data. My data is hierarchical (individuals nested into schools, schools nested into countries).
My dependent variable is the expectation of college graduation among the fifteen-year-old students who were interviewed for PISA. I am regressing my dichotomous dependent variable (expecting college graduation or not) on a number of controls and two key independent variables: gender of the respondent and father's education.
The question is which of the two parents' education is more important for the phenomenon that I am trying to explain; in other words, whether the model with father's education has a better goodness of fit than the model with mother's education, or vice versa. Here are the two models:
Model with father's education (fisced4):
xtmelogit expect_ISCED5A female ib4.fisced4 || country3: || schoolid:, variance
Model with mother's education (misced4):
xtmelogit expect_ISCED5A female ib4.misced4 || country3: || schoolid:, variance
Could I ask you for some guidance regarding the best way of making this model comparison and, if this is the case, how to get the AIC reported after xtmelogit?
Many thanks for your attention and your help
Kind regards
Luis Ortiz
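Since the two models are non-nested, an information-criterion comparison is the usual route; a sketch of how AIC/BIC could be obtained and compared after xtmelogit:
xtmelogit expect_ISCED5A female ib4.fisced4 || country3: || schoolid:, variance
estimates store m_father
xtmelogit expect_ISCED5A female ib4.misced4 || country3: || schoolid:, variance
estimates store m_mother
estimates stats m_father m_mother     // AIC and BIC for both models side by side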
LOOP ERROR no observations defined
I have been running a loop for each of 5 years.
clear all
cd "\\registry\2017"
save appended_2017, emptyok
local filelist: dir . files "*.dta"
foreach f of local filelist {
use `f', clear
append using appended_2017.dta
save appended_2017.dta, replace
}
All files are sitting in the corresponding year folder.
Error:
r(111);
Thank you.
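An alternative that sidesteps the empty seed dataset and the repeated saves (a sketch of my own; delete any previous appended_2017.dta first so it is not appended to itself):
clear all
cd "\\registry\2017"
local filelist : dir . files "*.dta"
clear
foreach f of local filelist {
    append using "`f'"
}
save appended_2017, replace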
Calculate volatility of daily returns by use of GARCH
We are trying to calculate the forecasted volatility of daily returns using a GARCH(1,1) model. So far we don't get any values that are in line with usual volatilities. Can somebody help us? We uploaded the data.
Thank you!
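A sketch of how a GARCH(1,1) conditional volatility series is typically obtained in Stata (variable names assumed):
tsset date
arch ret, arch(1) garch(1)
predict double condvar, variance        // conditional variance h_t
gen double condvol = sqrt(condvar)      // daily conditional volatility
gen double annvol = sqrt(252)*condvol   // annualised, assuming 252 trading days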
Replace observations with another group in Panel data
I have panel data and I want to replace observations of one group with the same variable's observations for another group. I have 31 organizations in my dataset and I would like to use observations of UN for the variables of disaster and damage for other UN agencies as well (UN agencies have "1" under the UN_system dummy variable). I'm very much open to suggestions on this! Thank you.
Organization---Year----disasters----damage---UN_system
UN----------------1990-------1-------------2-------------1
UN----------------1991-------4------------ 3-------------1
UNHCR----------1990-------.------------. ------------ 1
UNHCR----------1991-------.------------. ------------ 1
UNICEF----------1990-------.------------. ------------ 1
UNICEF--------- 1990 -------.------------. ------------ 1
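A sketch of one way to copy the UN values to the other UN-system organizations, year by year (my suggestion, using the variable names shown above):
foreach v in disasters damage {
    gen double `v'_un = `v' if Organization == "UN"
    bysort Year (`v'_un): replace `v'_un = `v'_un[1]        // spread the UN value within each year
    replace `v' = `v'_un if UN_system == 1 & missing(`v')
    drop `v'_un
}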
Interpreting significance - HLM using REML and Kenward-Roger-corr.
I struggle with interpreting my results from multilevel linear regressions using restricted maximum likelihood and the Kenward-Roger correction. I use Stata 15.0 on a Mac (macOS 10.14), and I hope I'm using the code delimiters right.
My problem is that I do not know how to interpret the significance of the coefficients in my multilevel outputs. I'm doing my master's thesis, so I will report significance at the 0.1, 0.05, 0.01, and 0.001 levels in my regression tables.
For example, from reading off P>|t| in the output below, I immediately thought that x1-x3 are statistically significant (x1 at the 0.1 level, x2 at the 0.01 level, x3 at the 0.001 level). However, is that the correct interpretation? The confusion arises because I do not know how to calculate the critical t-value, so that I can compare the t from the output to the critical value. I tried using this calculator (http://www.ttable.org/student-t-value-calculator.html), plugging in df=6, but I'm not sure this yields the right value. The critical value for a two-tailed test at the 0.01 level comes out as +/-3.71. However, if x2 is significant at the 0.01 level (which I thought from reading its P>|t|), why is its t-value only 3.43, which would only be significant at the 0.05 level if the critical t-value is 3.71? Is the critical t-value in fact something else, or is my interpretation of P>|t| wrong?
mixed CHILDREN x1 x2 x3 || COUNTRY2:, reml dfmethod(kroger)
Mixed-effects REML regression Number of obs = 3,245
Group variable: COUNTRY2 Number of groups = 10
Obs per group:
min = 108
avg = 324.5
max = 466
DF method: Kenward-Roger DF: min = 16.07
avg = 1,561.11
max = 3,240.03
F(3, 98.23) = 21.91
Log restricted-likelihood = -4836.0572 Prob > F = 0.0000
--------------------------------------------------------------------------------
CHILDREN | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
x1 | .2339834 .1167206 2.00 0.058 -.0091296 .4770963
x2 | .133747 .0389608 3.43 0.001 .0573566 .2101373
x3 | .0837528 .0124587 6.72 0.000 .0593243 .1081814
_cons | .5332078 .2389675 2.23 0.040 .0268024 1.039613
--------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
COUNTRY2: Identity |
var(_cons) | .1383108 .0711551 .0504601 .3791085
-----------------------------+------------------------------------------------
var(Residual) | 1.136198 .0282628 1.082133 1.192965
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 124.99 Prob >= chibar2 = 0.0000
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
. | 3,245 . -4836.057 6 9684.114 9720.624
-----------------------------------------------------------------------------
All help is very much appreciated.
Kind regards,
Frøydis Jensen
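One point that may help: with dfmethod(kroger) the degrees of freedom differ across coefficients (they are not 6), and estat df after mixed displays the df actually used for each fixed effect. A minimal sketch of checking a critical value and p-value by hand, under the assumption that the df for x2 is close to the reported average of roughly 1,561:
estat df                         // Kenward-Roger df for each fixed effect
display invttail(1561, 0.005)    // two-tailed 1% critical t at df = 1,561 (about 2.58)
display 2*ttail(1561, 3.43)      // two-tailed p-value for t = 3.43 (about 0.0006)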
Using loops to create local from other locals
local 1 "a b c" local 2 "d e f" local 3 "g h i"
local 10 "a b c d e f g h i"
Nick replied to my post yesterday and suggested the following:
forval j = 1/10 {
foreach v of var `list`j'' {
....
}
}
Can you think of an elegant and simple way of doing it?
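A minimal sketch of one way to build the combined local directly, assuming the source locals really are named 1, 2, and 3 as above:
local 10
forvalues j = 1/3 {
    local 10 `10' ``j''     // ``j'' expands to the contents of local number `j'
}
display "`10'"              // a b c d e f g h i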
creating loop
I am new to Stata, so I am writing long commands for a simple variable-generating task. I want to shorten the commands below. As you can see, only the numbers after "p" are descending: rtr7p(8,7,6,5,4,3,2,1)e1by and rtr6p(8,7,6,5,4,3,2,1). I have to take values from those specific variables in exactly that order, not from the first but from the last down to the first. Do you have any suggestions for shortening this kind of repetition? Thank you in advance

generate yearcp=rtr7p8e1by if rtr6p8==1
replace yearcp=rtr7p7e1by if rtr6p7==1
replace yearcp=rtr7p6e1by if rtr6p6==1
replace yearcp=rtr7p5e1by if rtr6p5==1
replace yearcp=rtr7p4e1by if rtr6p4==1
replace yearcp=rtr7p3e1by if rtr6p3==1
replace yearcp=rtr7p2e1by if rtr6p2==1
replace yearcp=rtr7p1e1by if rtr6p1==1
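A minimal sketch that reproduces the same sequence of commands with a descending forvalues loop (same variable names as above); because the loop runs from 8 down to 1, the lower-numbered conditions overwrite the higher-numbered ones, exactly as in the original commands:
generate yearcp = .
forvalues i = 8(-1)1 {
    replace yearcp = rtr7p`i'e1by if rtr6p`i' == 1
}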
Thursday, November 28, 2019
Problem in reporting Arellano-Bond test for auto-correlation after xtabond2
I am estimating a dynamic panel (370 firms; T=10) where my dependent variable is leverage and my independent variables are the lagged dependent variable, firm-specific factors affecting leverage, and macro factors.
I use the command: xtabond2
Stata correctly reports the Sargan/Hansen tests but does not report the second-order Arellano-Bond test for autocorrelation:
Arellano-Bond test for AR(2) in first differences: z = . Pr > z = .
My panel is not balanced and I have missing values in one of my regressors (tobinq); could that be the reason?
Does anyone have an idea? I am stuck here.
Thank you all in advance
Why does a regression with categorical variables only show one group's result?
I am running a regression with categorical variables, but I am quite confused about why it shows results for only the second group.
Code:
gen byte agegroup = 0 if age>=60 & age<.
replace agegroup = 1 if age>=50 & age<60
replace agegroup = 2 if age<50
reg cash_etr i.agegroup tenure lev PPE cash net_CF size tobin roa sales_growth capx xrd cf_vol for_income ch_nol nol for_op ffi_* fyear_* if age<=50, r
outreg2 using main_regression_age.xls, excel stats(coef tstat) bdec(3) tdec(2) symbol(***, **, *) label cti(Cash_ETR) replace addtext(Industry and year effects, YES) nonotes drop(cash_etr fyear_* ffi_* o.*) adjr2
reg gaap_etr i.agegroup tenure lev PPE cash net_CF size tobin roa sales_growth capx xrd cf_vol for_income ch_nol nol for_op ffi_* fyear_*, r
outreg2 using main_regression_age.xls, excel stats(coef tstat) bdec(3) tdec(2) symbol(***, **, *) label cti(GAAP_ETR) append addtext(Industry and year effects, YES) nonotes drop(gaap_etr fyear_* ffi_* o.*) adjr2
reg pbtd i.agegroup tenure lev PPE cash net_CF size tobin roa sales_growth capx xrd cf_vol for_income ch_nol nol for_op ffi_* fyear_*, r
outreg2 using main_regression_age.xls, excel stats(coef tstat) bdec(3) tdec(2) symbol(***, **, *) label cti(PBTD) append addtext(Industry and year effects, YES) nonotes drop(pbtd fyear_* ffi_* o.*) adjr2
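A quick diagnostic sketch: tabulating agegroup within each sample shows which levels actually enter a regression; a level that is absent, or that serves as the base, gets no coefficient of its own. In the first regression the restriction if age<=50 leaves group 0 empty and group 1 with only the observations at age exactly 50, so only group 2 can be displayed:
tabulate agegroup if age <= 50, missing   // levels available to the first regression
tabulate agegroup, missing                // levels available to the other regressions
If a different base category is wanted, the ib operator (for example ib2.agegroup) sets it explicitly.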
bootstrap test for a multilevel mediation analysis
I'm trying to perform a 2-2-1 multilevel mediation analysis, and my DV is binary. I used the melogit command step by step to achieve my goal, but when I tried to perform a bootstrap test, that command did not seem to fit. I know the ml_mediation command for multilevel mediation with a continuous DV can use bootstrapping, but is there any command for logit models?
Thanks for any help on this.
How to get a variable list for a condition
clear all
sysuse auto
ds, has(type string)
unab allvars: _all
unab vars_to_exclude: make
replace mpg =. if mpg >33
foreach i in `:list allvars - vars_to_exclude' {
display "`i'"
list `i' if `i'==.
}
price
mpg
+-----+
| mpg |
|-----|
43. | . |
57. | . |
66. | . |
71. | . |
+-----+
rep78
+-------+
| rep78 |
|-------|
3. | . |
7. | . |
45. | . |
51. | . |
64. | . |
+-------+
headroom
trunk
weight
length
turn
displacement
gear_ratio
foreign
*Just an idea:
ds, has(vlist)
mpg rep78
or
foreach v of local vlist {
di "`v'"
}
mpg rep78
Please, any advice would be greatly appreciated.
Regards
Rodrigo
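A minimal sketch of building such a local by looping over the candidate variables and keeping only those with at least one missing value (the names allvars, vars_to_exclude, and vlist follow the code above):
local vlist
foreach v in `:list allvars - vars_to_exclude' {
    quietly count if missing(`v')
    if r(N) > 0 local vlist `vlist' `v'
}
display "`vlist'"    // with the changes above: mpg rep78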
average using past group information
I have company-level information as follows. For each company id in a country, I have the current location (current_loc) and new location (new_loc) that a company will move to next quarter. Now for each company that moved (e.g. id 137 moved from location 5 to location 7), I want to get two variables:
- the average of the variable size over all companies located in the same current area (e.g. for id 137 that is location 5) in the past two years
- the average of the variable size over all companies located in the new area (e.g. for id 137 that is location 7) in the past two years (one possible approach is sketched after the example data below)
* Example generated by -dataex-. To install: ssc install dataex clear input str1 country float company_id byte curent_loc str6 quarter byte new_loc float(size avg_size_current avg_size_new) "X" 131 5 "2012q3" . 27.443584 . . "X" 137 5 "2012q4" 7 23.344286 . . "X" 140 5 "2013q1" . 16.832315 . . "X" 219 5 "2013q2" . 11.427843 . . "X" 165 5 "2013q3" . 53.44666 . . "X" 14685 6 "2012q1" . 2488.442 . . "X" 134 6 "2012q1" . 13.555255 . . "X" 127 6 "2012q2" . 26.1684 . . "X" 81 6 "2012q2" . 37.755157 . . "X" 66 6 "2012q2" . 53.79955 . . "X" 2 6 "2012q3" . 20.235474 . . "X" 5021 6 "2012q3" . 2871.219 . . "X" 93 6 "2012q3" . 39.22329 . . "X" 210 6 "2012q4" 5 28.488956 . . "X" 19 6 "2013q1" . 52.53154 . . "X" 197 6 "2013q2" . 29.569094 . . "X" 130 6 "2013q3" . 15.983066 . . "X" 14427 7 "2012q2" . 2766.468 . . "X" 146 7 "2012q2" . 44.75117 . . "X" 92 7 "2012q2" . 44.33076 . . "X" 164 7 "2012q3" . 56.59673 . . "X" 158 7 "2012q3" . 32.74441 . . "X" 186 7 "2012q3" . 13.370055 . . "X" 1239 7 "2012q4" 5 2251.2556 . . "X" 42 7 "2013q3" . 58.74424 . . "X" 85 7 "2013q4" . 46.32192 . . "X" 53 7 "2014q1" . 12.270756 . . "X" 76 7 "2014q3" . 47.29833 . . "X" 171 7 "2016q2" . 34.806293 . . "X" 10144 7 "2016q3" . 2166.3647 . . "X" 51 7 "2016q3" . 52.9871 . . "X" 37 7 "2016q4" . 16.703777 . . end
Thanks for your help in advance
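One possible approach for the first variable, a sketch using rangestat from SSC (ssc install rangestat) and assuming quarter is first converted to a Stata quarterly date; the new-location average would additionally require matching new_loc against other firms' curent_loc, for example by computing location-by-quarter window averages and merging them onto new_loc:
gen qdate = quarterly(quarter, "YQ")
format qdate %tq
capture drop avg_size_current   // drop the placeholder variable from the example data
* mean size of firms in the same current location over the previous 8 quarters
rangestat (mean) avg_size_current = size, interval(qdate -8 -1) by(country curent_loc)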
Analysis of Stock market data in Stata - First Time
I have daily stock market data for 174 firms from 2000 to 2019 (about 4,400 prices per stock). I also have market indexes for 19 separate indexes, as this is a cross-country analysis. I have attempted to import the data into Stata from Excel and then to convert it to long form, with the results below.
My data looks like this, from (dataex):
* Example generated by -dataex-. To install: ssc install dataex dataex D_Date stock daily_return clear input str10 D_Date str5 stock str17 daily_return " 1/1/2003" "Co1" "" " 1/1/2004" "Co1" "1947.63" " 1/1/2007" "Co1" "5916.02" " 1/1/2008" "Co1" "6095.83" " 1/1/2009" "Co1" "2425.31" " 1/1/2010" "Co1" "3607.04" " 1/1/2013" "Co1" "3802.48" " 1/1/2014" "Co1" "4151.77" " 1/1/2015" "Co1" "3610.1" " 1/1/2016" "Co1" "4085.84" " 1/1/2018" "Co1" "6158.6" " 1/1/2019" "Co1" "5084.71" " 1/2/2003" "Co1" "" " 1/2/2004" "Co1" "1977.54" " 1/2/2006" "Co1" "4797.02" " 1/2/2007" "Co1" "6042.59" " 1/2/2008" "Co1" "6090.77" " 1/2/2009" "Co1" "2510.8" " 1/2/2012" "Co1" "2973.79" " 1/2/2013" "Co1" "3921.28" " 1/2/2014" "Co1" "4130.940000000001" " 1/2/2015" "Co1" "3661.88" " 1/2/2017" "Co1" "4659.3" " 1/2/2018" "Co1" "6202.12" " 1/2/2019" "Co1" "5131.13" " 1/3/2003" "Co1" "" " 1/3/2005" "Co1" "3143.99" " 1/3/2006" "Co1" "4842.7" " 1/3/2007" "Co1" "6051.79" " 1/3/2008" "Co1" "6033" " 1/3/2011" "Co1" "4390.29" " 1/3/2012" "Co1" "3000.02" " 1/3/2013" "Co1" "3938.14" " 1/3/2014" "Co1" "4162.91" " 1/3/2017" "Co1" "4712.08" " 1/3/2018" "Co1" "6279.7" " 1/3/2019" "Co1" "5132.25" " 1/4/2005" "Co1" "3167.22" " 1/4/2006" "Co1" "4903.25" " 1/4/2007" "Co1" "5984.54" " 1/4/2008" "Co1" "5887.63" " 1/4/2010" "Co1" "3666.94" " 1/4/2011" "Co1" "4369.51" " 1/4/2012" "Co1" "2969.71" " 1/4/2013" "Co1" "3932.07" " 1/4/2016" "Co1" "4002.94" " 1/4/2017" "Co1" "4707.91" " 1/4/2018" "Co1" "6401.52" " 1/4/2019" "Co1" "5293.43" " 1/5/2004" "Co1" "2015.32" " 1/5/2005" "Co1" "3137.83" " 1/5/2006" "Co1" "4914.440000000001" " 1/5/2007" "Co1" "5860.46" " 1/5/2009" "Co1" "2534.2" " 1/5/2010" "Co1" "3748.82" " 1/5/2011" "Co1" "4283.28" " 1/5/2012" "Co1" "2897.42" " 1/5/2015" "Co1" "3557.63" " 1/5/2016" "Co1" "4027.53" " 1/5/2017" "Co1" "4731.89" " 1/5/2018" "Co1" "6394.02" " 1/6/2003" "Co1" "" " 1/6/2004" "Co1" "2015.32" " 1/6/2005" "Co1" "3137.83" " 1/6/2006" "Co1" "4914.440000000001" " 1/6/2009" "Co1" "2534.2" " 1/6/2010" "Co1" "3748.82" " 1/6/2011" "Co1" "4283.28" " 1/6/2012" "Co1" "2897.42" " 1/6/2014" "Co1" "4162.91" " 1/6/2015" "Co1" "3557.63" " 1/6/2016" "Co1" "4027.53" " 1/6/2017" "Co1" "4731.89" " 1/7/2003" "Co1" "" " 1/7/2004" "Co1" "2031.12" " 1/7/2005" "Co1" "3173.95" " 1/7/2008" "Co1" "5851.84" " 1/7/2009" "Co1" "2631.09" " 1/7/2010" "Co1" "3744.43" " 1/7/2011" "Co1" "4309.83" " 1/7/2013" "Co1" "3933.59" " 1/7/2014" "Co1" "4287.97" " 1/7/2015" "Co1" "3573.88" " 1/7/2016" "Co1" "3916.55" " 1/7/2019" "Co1" "5357.7" " 1/8/2003" "Co1" "" " 1/8/2004" "Co1" "2056.84" " 1/8/2007" "Co1" "5807.37" " 1/8/2008" "Co1" "5907.91" " 1/8/2009" "Co1" "2589.72" " 1/8/2010" "Co1" "3741.35" " 1/8/2013" "Co1" "3929.04" " 1/8/2014" "Co1" "4351.91" " 1/8/2015" "Co1" "3626.56" " 1/8/2016" "Co1" "3786.08" " 1/8/2018" "Co1" "6392.610000000001" " 1/8/2019" "Co1" "5361.12" " 1/9/2003" "Co1" "" " 1/9/2004" "Co1" "2051.39" " 1/9/2006" "Co1" "4970.45" end
I need to present this data in a format from which I can calculate returns, abnormal returns, and the CAR for an event study.
Can anyone help with the way I am presenting my data? I apologise if this is not the correct format, but I am very new and have tried my best.
Any help with this issue would be greatly appreciated.
Callum
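A minimal sketch of a first cleaning pass under the layout shown; the daily_return column looks like a price level stored as a string, so returns here are computed from consecutive prices within each stock (variable names are taken from the dataex output):
gen double tradedate = date(D_Date, "MDY")
format tradedate %td
destring daily_return, replace            // prices were imported as strings
rename daily_return price
encode stock, gen(stock_id)
xtset stock_id tradedate
bysort stock_id (tradedate): gen ret = price/price[_n-1] - 1   // simple return vs. previous available price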
Expand data
I am looking for help with expanding my dataset below to include a categorical variable that takes the values 1 to 15 for each observation. How can I do it? In other words, I need 15 observations for each of the individual observations currently in the dataset.
* Example generated by -dataex-. To install: ssc install dataex clear input str5 nuts318cd str9 ttwa11cd double area_intersection_sqm float(percentage_nuts3_in_ttwa percentage_ttwa_in_nuts3) "UKC11" "E30000093" 203769975.4 68.5304 26.73737 "UKC11" "E30000199" 31866.05631 .010716955 .00800233 "UKC11" "E30000203" 8120.006566 .00273086 .00047345 "UKC11" "E30000215" 93514618.39 31.45014 92.84214 "UKC11" "E30000246" 16150.06451 .005431469 .000754046 "UKC11" "E30000275" 1721.391752 .000578926 .000566958 "UKC12" "E30000093" 298643393.4 99.99471 39.18604 "UKC12" "E30000147" 15797.86158 .005289595 .003358221 "UKC13" "E30000093" 40310.33369 .020412154 .005289259 "UKC13" "E30000199" 197413336.2 99.96523 49.57521 "UKC13" "E30000203" 14324.15059 .007253395 .000835193 "UKC13" "E30000246" 14039.81523 .007109415 .000655519 "UKC14" "E30000064" 22453.95394 .001006206 .001029967 "UKC14" "E30000093" 434.9438238 .0000194907 .0000570705 "UKC14" "E30000106" 56045.84077 .002511524 .002862576 "UKC14" "E30000199" 47431345.3 2.1254914 11.911146 "UKC14" "E30000203" 1714908898 76.84842 99.99057 "UKC14" "E30000215" 7209707.493 .3230811 7.157861 "UKC14" "E30000245" 318562308.6 14.2754 25.26244 "UKC14" "E30000246" 9913.175135 .000444229 .000462846 "UKC14" "E30000275" 143346252.4 6.423626 47.21256 "UKC21" "E30000064" 2030476941 40.40051 93.13832 "UKC21" "E30000106" 12.28169309 2.44369e-07 6.27295e-07 "UKC21" "E30000173" 1460636223 29.062355 99.99895 "UKC21" "E30000203" 10266.28956 .000204269 .000598593 "UKC21" "E30000245" 563125216.8 11.204532 44.65662 "UKC21" "K01000009" 971549507.2 19.33097 57.50991 "UKC21" "K01000010" 51041.11077 .001015568 .002415779 "UKC21" "S22000067" 20922.50236 .000416296 .001403097 "UKC22" "E30000173" 1066.288223 .000265062 .0000730009 "UKC22" "E30000245" 379294671.4 94.28657 30.0786 "UKC22" "E30000275" 22982843.17 5.713166 7.569636 "UKC23" "E30000203" 5531.766495 .004028287 .000322539 "UKC23" "E30000245" 29468.49058 .021459244 .002336893 "UKC23" "E30000275" 137288062 99.97451 45.21724 "UKD11" "E30000106" 15318.28089 .000738895 .000782391 "UKD11" "E30000163" 77898272.46 3.757515 14.123253 "UKD11" "E30000223" 7257.39641 .000350069 .000672506 "UKD11" "E30000286" 737526859.2 35.575474 99.99664 "UKD11" "E30000290" 851034545.5 41.05065 99.99887 "UKD11" "K01000010" 406650680.9 19.61527 19.246805 "UKD12" "E30000039" 392.9681989 8.27563e-06 .0000333336 "UKD12" "E30000064" 149557684.9 3.149577 6.860236 "UKD12" "E30000076" 22084.66337 .000465087 .003840515 "UKD12" "E30000106" 1957803573 41.22993 99.99604 "UKD12" "E30000163" 473662135.9 9.974983 85.87675 "UKD12" "E30000203" 82740.55843 .001742456 .004824324 "UKD12" "E30000223" 1079124408 22.72558 99.99703 "UKD12" "E30000246" 20307.96607 .000427671 .000948179 "UKD12" "E30000286" 24754.72675 .000521317 .003356338 "UKD12" "E30000290" 9621.292648 .000202617 .001130528 "UKD12" "K01000010" 1088165321 22.91598 51.50294 "UKD12" "S22000067" 27691.31014 .000583159 .001857024 "UKD33" "E30000239" 115595841.7 100 6.289208 "UKD34" "E30000239" 203167544.3 99.99309 11.05371 "UKD34" "E30000284" 14042.44305 .006911277 .001959212 "UKD35" "E30000239" 229213855.8 100 12.47081 "UKD36" "E30000170" 8056.388494 .002456537 .00111413 "UKD36" "E30000239" 172919193 52.72615 9.407992 "UKD36" "E30000255" 24226.78715 .007387181 .002584866 "UKD36" "E30000284" 155005669.1 47.264 21.62651 "UKD37" "E30000029" 7006.580527 .001751785 .001925189 "UKD37" "E30000170" 11991.00274 .00299799 .001658254 "UKD37" "E30000219" 20101.86674 .005025868 .005382483 "UKD37" "E30000239" 399928974.5 99.99023 
21.758884 "UKD41" "E30000170" 107194506.6 78.21149 14.824088 "UKD41" "E30000239" 29849325 21.77873 1.6240083 "UKD41" "E30000255" 13401.89378 .00977832 .001429909 "UKD42" "E30000171" 34872029.36 100 16.13467 "UKD44" "E30000039" 12103.43673 .001412412 .001026678 "UKD44" "E30000076" 574996921.4 67.09933 99.99177 "UKD44" "E30000170" 183.1079797 .0000213678 .0000253223 "UKD44" "E30000171" 64195469.6 7.491297 29.70211 "UKD44" "E30000223" 16445.66858 .001919129 .001523937 "UKD44" "E30000255" 217712862.6 25.406025 23.228775 "UKD45" "E30000039" 7479.785688 .000744056 .000634475 "UKD45" "E30000076" 13721.62053 .001364966 .002386185 "UKD45" "E30000170" 422956930.5 42.07388 58.49134 "UKD45" "E30000171" 117063523.2 11.64496 54.16322 "UKD45" "E30000182" 19037.76823 .001893793 .006798087 "UKD45" "E30000255" 465211381.6 46.27716 49.63552 "UKD46" "E30000018" 346.9086397 .0000706419 .000100878 "UKD46" "E30000029" 14486.7692 .002949978 .00398051 "UKD46" "E30000039" 9279.847951 .001889679 .000787166 "UKD46" "E30000170" 192902964.5 39.28132 26.67684 "UKD46" "E30000182" 280009612.2 57.01907 99.98702 "UKD46" "E30000239" 18143955.22 3.6947 .9871558 "UKD47" "E30000170" 17756.33516 .003231584 .00245555 "UKD47" "E30000233" 248511241.2 45.22808 40.97392 "UKD47" "E30000239" 18148.76411 .003303004 .000987417 "UKD47" "E30000255" 254286723.4 46.27919 27.131006 "UKD47" "E30000284" 46628424.33 8.486192 6.505633 "UKD61" "E30000239" 38439.87147 .02128579 .002091393 "UKD61" "E30000284" 180550930.7 99.97871 25.190605 "UKD62" "E30000185" 841.4333672 .0000721424 .000189205 "UKD62" "E30000197" 662756971.6 56.82312 78.0034 "UKD62" "E30000239" 475823055.3 40.79588 25.88804 "UKD62" "E30000262" 13702.54716 .001174822 .001188375 "UKD62" "E30000273" 27661249.03 2.371606 2.5870845 "UKD62" "E30000284" 19827.24759 .001699938 .002766313 end
Best,
Bridget
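A minimal sketch of one way to do this; the row identifier obs_id is introduced only so that each original observation can be numbered 1 to 15 after expanding:
gen long obs_id = _n                     // tag each original observation
expand 15                                // make 15 copies of every row
bysort obs_id: gen byte category = _n    // takes the values 1 to 15 within each original observation
drop obs_id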