Hi, I'm replicating a report, and this is the attached do-file
use wintertime_final, clear
gen snodas_weighted=snodas_max+(.25+.125*west)*(fsnodas_max-snodas_max)
xi i.wk i.sn
cluster2 resort wk2 _I*, tcluster(d) fcluster(sn)
outreg2 using table2, replace excel dec(3) keep(wk2)
cluster2 resort wk2 _I* if snodas_max~=., tcluster(d) fcluster(sn)
outreg2 using table2, excel dec(3) keep(wk2)
cluster2 resort wk2 snodas_max _I*, tcluster(d) fcluster(sn)
outreg2 using table2, excel dec(3) keep(wk2 snodas_max)
cluster2 resort wk2 snodas_weighted _I*, tcluster(d) fcluster(sn)
outreg2 using table2, excel dec(3) keep(wk2 snodas_weighted)
cluster2 resort wk2 fsnodas_max snodas_max lsnodas_max _I*, tcluster(d) fcluster(sn)
outreg2 using table2, excel dec(3) keep(wk2 fsnodas_max snodas_max lsnodas_max)
But I cannot use cluster2 (I installed cluster2.ado); each time I run it, it shows "varlist not allowed".
Can anyone help me with this one?
Thank you!
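If cluster2 keeps refusing the varlist, one alternative worth trying is reghdfe (ssc install reghdfe), which absorbs the week and area dummies and allows two-way clustered standard errors. A minimal sketch using the variable names from the do-file above; this is a different routine for the same regression, not a fix for cluster2 itself:
Code:
ssc install reghdfe
* two-way clustered SEs; wk and sn absorbed in place of the xi dummies
reghdfe resort wk2 snodas_max, absorb(wk sn) vce(cluster d sn)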
Monday, November 30, 2020
Specific Values of Variables and creating a new Variable
Hello,
I am trying to take 3 variables, take a specific value from each variable, and create a new variable with this. I am trying to take LIVING ALONE from each variable and create one variable. But they each have different values: SZFFHERE = 1 for living alone, FFRESIDE = 3 for living alone, and SZFFONLY = 0 for living alone. I have been trying FOR HOURS to combine these three values from the variables and make one variable called LIVINGALONE. Any advice please!


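A minimal sketch of one way to build the combined indicator, assuming a respondent should be flagged as living alone whenever any of the three source variables says so (adjust the rule, and the handling of missings, to your codebook):
Code:
gen LIVINGALONE = 0 if !missing(SZFFHERE) | !missing(FFRESIDE) | !missing(SZFFONLY)
replace LIVINGALONE = 1 if SZFFHERE==1 | FFRESIDE==3 | SZFFONLY==0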
Terrible parallelization performance in Mata
I just ran an example from the SJ paper about boottest ("Fast and Wild") and was surprised at its poor performance. I'm fortunate enough to be working in Stata/MP, and I discovered that the more processors I enabled it to use, the slower it got. I'm using a Dell XPS 17 9700, which has a pretty good cooling system. Its CPU is an Intel i7-10875H, which has 8 cores and hyperthreading. I'm running Stata/MP 12-core 16.1. It's got 64GB of RAM and Windows 10 Pro.
Here is the log from a distilled demonstration. It sets the number of cores to 1, 2, ..., 12. On each iteration it calls a program that creates a 2500 x 1 matrix X and then computes X + X :* X 10,000 times. Simplifying that calculation to X + X or X :* X makes the problem go away.
I'm wondering if anyone else with access to Stata/MP gets similar results, or has insights. Possibly it doesn't happen on all computers. I understand that implementing invisible parallelization in a compiler is a tricky business. But Stata/MP doesn't come cheap!
Output (the full timer listing is below). That's right: using 1 core takes 0.14 seconds. Using 8 cores takes 3.84 seconds. Using 12 (with hyperthreading) takes 5.63 seconds.
Here's the output I get from Stata 15.0, also shown below--it's actually better! But still bad.
I monitored CPU usage during these tests and saw no evidence of throttling.
I'm worried that my Mata-based programs are getting seriously slowed down.
If you've got MP and can run this test, I'd be interested in the results.
Code:
cap mata mata drop demo()
mata
mata set matastrict on
mata set matalnum off
mata set mataoptimize on
void demo() {
real matrix X; real scalar i
X = runiform(2500,1)
for (i=10000; i; i--)
(void) X + X :* X
}
end
timer clear
forvalues p=1/12 {
qui set processors `p'
set seed 1202938431
timer on `p'
mata demo()
timer off `p'
}
timer list
Code:
. timer list
   1:      0.14 /        1 =       0.1390
   2:      0.16 /        1 =       0.1640
   3:      1.63 /        1 =       1.6330
   4:      2.02 /        1 =       2.0150
   5:      2.47 /        1 =       2.4680
   6:      2.92 /        1 =       2.9210
   7:      3.38 /        1 =       3.3780
   8:      3.84 /        1 =       3.8370
   9:      4.26 /        1 =       4.2640
  10:      4.70 /        1 =       4.7040
  11:      5.21 /        1 =       5.2100
  12:      5.63 /        1 =       5.6260
Code:
   1:      0.13 /        1 =       0.1280
   2:      0.14 /        1 =       0.1440
   3:      1.09 /        1 =       1.0900
   4:      1.35 /        1 =       1.3460
   5:      1.55 /        1 =       1.5540
   6:      1.78 /        1 =       1.7810
   7:      2.10 /        1 =       2.1010
   8:      2.35 /        1 =       2.3540
   9:      2.59 /        1 =       2.5920
  10:      2.89 /        1 =       2.8890
  11:      3.16 /        1 =       3.1570
  12:      3.41 /        1 =       3.4120
Replace missing values with those of previous values or those of the partner
Hi Statalist.
I am using annual survey responses on marital status from a panel study (for each partner in a couple - married/cohabitation) for my analysis. Around 0.8% of responses are missing and I want to replace the missing values with lagged values (for the given partner) or those of the other partner if these are also missing - it is not uncommon that only one in the couple answers this question. Sometimes individuals in a couple miss answering this question in a given year(s). Is this approach reasonable, particularly if there are lagged values and future values and these are the same?
I'm sure my draft code does not achieve what I want it to, so help here is appreciated (mrcurr1 = marital status of male partner; mrcurr2 = marital status of female partner).
Sample data:
Code:
bys id (wave): replace mrcurr1 = L.mrcurr1 if missing(mrcurr1)
bys id (wave): replace mrcurr1 = L.mrcurr2 if missing(mrcurr1) // in case the current wave response is missing for both
bys id (wave): replace mrcurr2 = L.mrcurr2 if missing(mrcurr2)
bys id (wave): replace mrcurr2 = L.mrcurr1 if missing(mrcurr2)
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long(id p_id) byte(wave mrcurr1 mrcurr2) 11 61 8 . 2 11 61 12 . 2 11 61 13 . 2 11 61 14 . 2 11 61 15 . 2 11 61 16 . 2 11 61 17 . 2 11 61 18 . 2 13 60 12 . 1 13 60 13 . 1 13 60 14 . 1 13 60 15 . 1 13 60 16 . 1 13 60 17 . 1 13 60 18 . 1 16 64 8 2 . 16 64 9 2 . 16 64 10 2 . 16 64 11 1 . 16 64 12 1 . 16 64 13 1 . 16 64 14 1 . 16 64 15 1 . 16 64 16 1 . 16 64 17 1 . 16 64 18 1 . 23 24 6 . . 23 24 7 . . 23 24 8 . . 23 24 9 . . 23 24 10 . . 23 24 11 . . 23 24 12 . . end
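One likely problem with the draft code is that the lag operator L. requires the panel to be declared with xtset (or tsset) first; bys id (wave) alone is not enough. A minimal sketch of the same logic, assuming id and wave uniquely identify observations and that waves run consecutively (with gaps in wave, mrcurr1[_n-1] under bysort may be preferable to L.):
Code:
xtset id wave
replace mrcurr1 = L.mrcurr1 if missing(mrcurr1)
replace mrcurr1 = L.mrcurr2 if missing(mrcurr1)   // partner's lagged answer if still missing
replace mrcurr2 = L.mrcurr2 if missing(mrcurr2)
replace mrcurr2 = L.mrcurr1 if missing(mrcurr2)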
Lincom for interpreting interactions in logistic regression
Hello,
I am estimating, for example, the following double-difference / difference-in-difference logistic regression where opt_dum, depstat (Mild or None), and treat (HT or Con) are all binary variables.
The output of the model is as follows:
I then use the margins command to obtain the marginal estimates
My question is whether the following lincom code is appropriate for testing the double difference i.e. to test whether the association between depstat and opt_dum varies by the treat variable
Thank you,
Caroline
Code:
logit opt_dum i.depstat##i.treat, or
Iteration 0: log likelihood = -166.14693
Iteration 1: log likelihood = -121.37891
Iteration 2: log likelihood = -120.88007
Iteration 3: log likelihood = -120.87909
Iteration 4: log likelihood = -120.87909
Logistic regression Number of obs = 240
LR chi2(3) = 90.54
Prob > chi2 = 0.0000
Log likelihood = -120.87909 Pseudo R2 = 0.2725
-------------------------------------------------------------------------------
opt_dum | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
depstat |
Mild | .3891403 .1851221 -1.98 0.047 .1531694 .9886448
|
treat |
HT | 19.15126 9.456233 5.98 0.000 7.276179 50.40706
|
depstat#treat |
Mild#HT | .5459965 .3697249 -0.89 0.372 .1448083 2.058668
|
_cons | .3953488 .1132654 -3.24 0.001 .2254839 .693179
-------------------------------------------------------------------------------
Code:
. margins i.depstat##i.treat, post coeflegend
Predictive margins Number of obs = 240
Model VCE : OIM
Expression : Pr(opt_dum), predict()
-------------------------------------------------------------------------------
| Margin Legend
--------------+----------------------------------------------------------------
depstat |
Non | .5833333 _b[1bn.depstat]
Mild | .375 _b[2.depstat]
|
treat |
Con | .2083333 _b[1bn.treat]
HT | .75 _b[2.treat]
|
depstat#treat |
Non#Con | .2833333 _b[1bn.depstat#1bn.treat]
Non#HT | .8833333 _b[1bn.depstat#2.treat]
Mild#Con | .1333333 _b[2.depstat#1bn.treat]
Mild#HT | .6166667 _b[2.depstat#2.treat]
-------------------------------------------------------------------------------
Code:
. lincom ( _b[2.depstat#2.treat]- _b[2.depstat#1bn.treat])-(_b[1bn.depstat#2.treat]-_b[1bn.depstat#1bn.treat])
( 1) 1bn.depstat#1bn.treat - 1bn.depstat#2.treat - 2.depstat#1bn.treat + 2.depstat#2.treat = 0
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | -.1166667 .1047263 -1.11 0.265 -.3219264 .0885931
------------------------------------------------------------------------------
Caroline
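The lincom above tests the double difference of the predicted probabilities. A possibly equivalent one-step check, as a sketch (run directly after the logit, before margins, post), uses contrast operators inside margins:
Code:
logit opt_dum i.depstat##i.treat
margins r.depstat#r.treat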
Plot Transition Matrix on a Graph
Hi all,
I would like to plot in a graph the transition matrix from employment to employment (EtoE), unemployment to employment (UtoE), unemployment to unemployment (UtoU), etc. over the period 2010 to 2014.
I would like to have the year in the x-axis and the transition probability on the y-axis. Below is the code that I wrote:
When I run this code, I obtain 0 for all of the transition probabilities. Can someone tell me where I made a mistake?
My data looks like the following:
Thanks in advance for your help.
Code:
set more off
forvalues i=2010/2014{
gen em1`i'=labor==1 if year==`i'
gen em2`i'=labor==2 if year==`i'
gen em3`i'=labor==3 if year==`i'
}
forvalues i=2010/2013{
local j=`i'+1
gen UtoN`i'=(em2`i'==1 & em3`j'==1) if em2`i'==1
gen UtoE`i'=(em2`i'==1 & em1`j'==1) if em2`i'==1
gen UtoU`i'=(em2`i'==1 & em2`j'==1) if em2`i'==1
gen NtoE`i'=(em3`i'==1 & em1`j'==1) if em3`i'==1
gen NtoU`i'=(em3`i'==1 & em2`j'==1) if em3`i'==1
gen NtoN`i'=(em3`i'==1 & em3`j'==1) if em3`i'==1
gen EtoU`i'=(em1`i'==1 & em2`j'==1) if em1`i'==1
gen EtoN`i'=(em1`i'==1 & em3`j'==1) if em1`i'==1
gen EtoE`i'=(em1`i'==1 & em1`j'==1) if em1`i'==1
}
collapse (mean) UtoN2010-EtoE2013, by(gender state)
reshape long UtoN UtoE UtoU NtoE NtoU NtoN EtoU EtoN EtoE, i(gender state) j(year)
graph twoway ( (line UtoE year,lpattern(dash) lcolor(red)) (line UtoU year,lpattern(dash) lcolor(black))(line EtoE year, lcolor(black)), ///
legend(label(1 "U->E") label(2 "U->U") label(3 "E->E")) by(state))
My data looks like the following:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int ID double(year labor gender state) 11112 2010 3 1 1 11112 2011 1 1 1 11112 2012 3 1 2 11112 2013 2 1 1 11112 2014 1 1 1 11113 2010 2 1 1 11113 2011 2 1 1 11113 2012 1 1 2 11113 2013 2 1 2 11113 2014 1 1 1 11114 2010 1 1 2 11114 2011 3 1 2 11114 2012 3 1 1 11114 2013 2 1 1 11114 2014 1 1 1 11115 2010 2 0 1 11115 2011 2 0 1 11115 2012 1 0 1 11115 2013 3 0 2 11115 2014 1 0 1 11116 2010 1 0 2 11116 2011 1 0 2 11116 2012 3 0 2 11116 2013 2 0 2 11116 2014 3 0 2 end label values labor Employment label def Employment 1 "Employed", modify label def Employment 2 "Unemployed", modify label def Employment 3 "Inactive", modify label values gender Female label def Female 1 "Female", modify label def Female 0 "Male", modify label values state Region label def Region 1 "North", modify label def Region 2 "South", modify
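The zeros most likely arise because em2`i' is non-missing only on year-i rows while em1`j' is non-missing only on year-j rows, so the two can never both equal 1 on the same observation. A minimal sketch of an alternative that works within the panel, assuming ID and year uniquely identify rows (plotted by state; add gender to the collapse if needed):
Code:
xtset ID year
gen EtoE = (F.labor==1) if labor==1 & !missing(F.labor)
gen UtoE = (F.labor==1) if labor==2 & !missing(F.labor)
gen UtoU = (F.labor==2) if labor==2 & !missing(F.labor)
collapse (mean) EtoE UtoE UtoU, by(state year)
twoway (line UtoE year, lpattern(dash) lcolor(red))   ///
       (line UtoU year, lpattern(dash) lcolor(black)) ///
       (line EtoE year, lcolor(black)),               ///
       legend(label(1 "U->E") label(2 "U->U") label(3 "E->E")) by(state)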
Twoway graph with two bars in one diagram
Hello everyone,
I am new to Stata and currently trying to create a bar chart that shows GDP data from two different countries over a certain period of time. I figured out how to create a bar chart with only one country but now I would like to add bars that indicate a value for another country in the same graph. What I would like to show in the graph is the comparison of quarterly GDP values of two different countries. Here is my code:
graph twoway (bar obsvalue qdate if country == "DEU", color(gs8)) ///
(bar obsvalue qdate if country == "ITA", color(black)) ///
ytitle("Gross Value Added in Million US$") xtitle("Period") ///
xlabel(232 "Q1/2018" 233 "Q2/2018" 234 "Q3/2018" 235 "Q4/2018" 236 "Q1/2019" ///
237 "Q2/2019" 238 "Q3/2019" 239 "Q4/2019" 240 "Q1/2020" 241 "Q2/2020", angle(45)) ///
legend(order(1 "Germany" 2 "Italy"))
When I execute this code I always get the error message: ) required. I really cannot figure out where in the code the brackets are missing.
I would really appreciate any advice on how to fix this problem.
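The most likely culprit is a missing comma: in twoway, options such as ytitle(), xlabel(), and legend() must come after a comma that follows the last (bar ...) plot. A sketch of the same command with that comma added:
Code:
graph twoway (bar obsvalue qdate if country == "DEU", color(gs8)) ///
    (bar obsvalue qdate if country == "ITA", color(black)), ///
    ytitle("Gross Value Added in Million US$") xtitle("Period") ///
    xlabel(232 "Q1/2018" 233 "Q2/2018" 234 "Q3/2018" 235 "Q4/2018" 236 "Q1/2019" ///
    237 "Q2/2019" 238 "Q3/2019" 239 "Q4/2019" 240 "Q1/2020" 241 "Q2/2020", angle(45)) ///
    legend(order(1 "Germany" 2 "Italy"))
Note that the two countries' bars will then sit on top of each other at each quarter; narrowing them with barwidth() or switching to graph bar, over() is a separate styling question.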
Sum of dummy variable up until the point of observation
Dear STATALIST forum,
I have a question about generating a variable that depicts the running sum of a dummy variable up until the point of each observation. To help you understand, here is how my data look right now:
| platform | year | month | genre_id | Genre1_dummy | Genre2_dummy | Genre3_dummy |
| microsoft | 2003 | 4 | 1 | 1 | 0 | 0 |
| microsoft | 2003 | 8 | 3 | 0 | 0 | 1 |
| macOS | 2002 | 1 | 1 | 1 | 0 | 0 |
| macOS | 2002 | 4 | 2 | 0 | 1 | 0 |
| macOS | 2003 | 8 | 1 | 1 | 0 | 0 |
And here is how I would like to have it:
| platform | year | month | genre_id | Genre1_dummy | Genre2_dummy | Genre3_dummy | genre_1 | genre_2 | genre_3 |
| microsoft | 2003 | 4 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| microsoft | 2003 | 8 | 3 | 0 | 0 | 1 | 1 | 0 | 1 |
| macOS | 2002 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| macOS | 2002 | 4 | 2 | 0 | 1 | 0 | 1 | 1 | 0 |
| macOS | 2003 | 8 | 1 | 1 | 0 | 0 | 2 | 1 | 0 |
To conclude: I would like to generate a running sum of each genre dummy up to the observation at hand, but not carrying over across platforms.
Thanks for your help. It is very much appreciated.
Best,
Sebastian
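A minimal sketch of one way to get the three running sums, assuming year and month order the observations within each platform:
Code:
bysort platform (year month): gen genre_1 = sum(Genre1_dummy)
bysort platform (year month): gen genre_2 = sum(Genre2_dummy)
bysort platform (year month): gen genre_3 = sum(Genre3_dummy)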
Let values continue
Hello, can someone tell me the command for doing what is shown in the screenshot?
You copy the values of another variable and then pull it down for the same mergeid.
Sorry for the bad Paint mock-up; I hope it makes clear what I want to express. :D
Thanks in advance!
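Without the screenshot this is a guess, but a common pattern for pulling a value down within the same mergeid is a within-group carryforward; the variable names wave and value below are hypothetical:
Code:
bysort mergeid (wave): replace value = value[_n-1] if missing(value)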
Intercepts at knots in mixed
Dear all,
I am currently working on a mixed model with incorporation of knots at specific timepoints to compare the course over time between 5 different groups (cohorts).
Yet, I cannot figure out how I can retrieve the estimates of the intercept at these knots. And how to compare the intercept at these knots between the 5 groups.
Can anyone help me with this?
My mixed model:
mixed DAS28ESR i.cohort##c.years c.Age_incl i.sex ||EACNUMM: c.years, covariance(unstructured) reml
In which DAS28ESR is a measure for disease activity (continuous) and EACNUMM are the individuals.
Thanks in advance!
Kind regards,
Marloes
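A minimal sketch of one way to read off and compare cohort-specific predictions at chosen timepoints with the model as specified (the timepoints 0, 1, 2, and 5 are placeholders). This does not itself add knots; piecewise slopes would usually be built first with mkspline, but the margins ... at() idea carries over:
Code:
mixed DAS28ESR i.cohort##c.years c.Age_incl i.sex || EACNUMM: c.years, covariance(unstructured) reml
margins cohort, at(years=(0 1 2 5))                      // predicted DAS28ESR per cohort at each timepoint
margins cohort, at(years=(0 1 2 5)) pwcompare(effects)   // pairwise cohort differences at each timepoint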
distribution explanatory variable for different birth cohorts
Hi there,
I'm struggling to understand part of an assignment that I need to do.
I have to examine whether education has an effect on the total number of kids a woman has.
The data is taken from a survey, surveyed between 1985 and 2009. Each woman is observed only once in the data, hence the women have a different age and might be observed in a different year.
The women are divided into 10 10-year birth cohorts (e.g. a woman born in 1915 is in birth cohort 1911-1920). Education is measured as years of schooling.
"I have to check whether the distribution of education looks reasonable for different birth cohorts. I should conclude that birth cohorts 1, 2, 3, and 10 should not be used in the analysis."
Below is the distribution of education in years of schooling for the different birth cohorts. I can't figure out on the basis of what I should remove birth cohorts 1, 2, 3, and 10. Could anyone help me out? Thank you in advance (:
[attached graph: distribution of years of schooling by birth cohort]
tabstat of multiple dummy variables
Hello,
I have a large dataset of N = 59,000,000.
I have a set of diagnosis dummy variables, a00_a09 - u_unclassified, and a set of service code dummy variables; the service code dummy variables begin with ncb*.
I need to find how many times a diagnosis code occurs for a particular service code.
I thought something like the below would work, I simply count where there is cost, for each diagnosis code, by service code
foreach var of varlist a00_a09 - u_unclassified {
tabstat total_ss_costs if `var' ==1, s(n) by (ncb*) }
Every record has a cost.
However, I think I need to specify the ncb* differently.
Also, as the code is written above, Stata would produce a new output table for each `var', but would not label the output table with the `var' name. Is it possible to amend the code so that it does?
Can anyone help?
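A minimal sketch of one way to get a labelled count for every diagnosis-by-service pair, looping over the ncb* variables explicitly rather than passing them to by():
Code:
foreach dx of varlist a00_a09 - u_unclassified {
    foreach svc of varlist ncb* {
        quietly count if `dx'==1 & `svc'==1
        display "`dx' x `svc': " r(N) " records"
    }
}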
sample data is below:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float total_ss_costs byte(r69 r50_r68 r47_r49 r40_r46 ncbps29m_hist ncbps29b_hist ncbps27b_hist)
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
7143.572 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
141.99995 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
6460.776 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1
0 0 1 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0
903.4579 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
4346.248 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 1 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
426.3519 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
13727.894 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
284.62573 0 0 0 0 0 0 0
110.00005 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
529 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
end
Confidence-interval plots / any plot: resize possible?
Dear Statalisters
My question, basically, is whether there is a 'correct' way to resize Stata graphs (including the background region), so that the graph becomes narrower.
This is the basic graph I start with, using the command ciplot:
Code:
ciplot female, by(resgroup_march) scheme(lean1)
I'd prefer if this plot were less wide. For example, as shown in the photo below (not my slide):
[attached photo]
One option is to tinker with the aspect ratio of the overall graph. An aspect ratio of 3, for example, makes the graph narrower - but the background region remains the same.
[attached graph]
If I were to
Code:
graph combine
the above plot (aspect ratio 3) with another plot (same aspect ratio of 3), this is the result:
[attached graph]
Although It would certainly do the job, I'd prefer if there was less white space between the two graphs. Basically, I'd prefer to replicate the look of the 5 plots closely next to each other as in my photo.
The 'incorrect' option would be to crop my individual plots in a separate tool (eg. Word) and then paste them next to each other, but I've decided to 'dig deep' into this matter and find out whether there is a 'correct' way to resize this using Stata.
Finally, please note that I've also tried
Code:
statsby mean=r(mean) ub=r(ub) lb=r(lb), by(resgroup_march) clear: ci means female twoway (rcap ub lb resgroup_march) (scatter mean resgroup_march), ytitle(Share of female respondents) ylabel(0.30(0.10)0.70, gmin) xlabel(0(1)1, valuelabel)
This works fine too, but my issue regarding the white space between the two groups remains.
Editing the aspect ratio makes the white space between the groups smaller, but leaves the background region unchanged, similar to the ciplot command.
Any help is much appreciated.
Stata version 16.1
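One option that may help: the overall graph dimensions, background region included, are controlled by the xsize() and ysize() options (in inches). One way that sidesteps what ciplot itself will accept is to redraw with graph display; a sketch, with the sizes as placeholders:
Code:
ciplot female, by(resgroup_march) scheme(lean1)
graph display, xsize(2) ysize(4)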
How to perform NLS estimation with a complex function
Dear Stata users,
I need to use NLS to estimate the following equation describing the so-called Bass diffusion model:
yt = m * [ ((1 - exp (-(p+q) * t)) / (1 + q/p * exp (-(p+q) * t))) - ((1 - exp (-(p+q) * (t-1))) / (1 + q/p * exp (-(p+q) * (t-1)))) ] + ut
where m,p, and q are parameters and t indicates time. As initial values for these parameters, I am to use those obtained with OLS applied to:
yt = β0 + β1 x1, t-1 + β2 x2, t-1 + ϵt
where β0 = m * p
β1 = q - p
β2= - q/m
Note that m = 41302528
p = 0.024
q = 0.194
To perform this NLS estimation, I tried to follow the example that was given in the Stata forum (https://www.stata.com/features/overv...ar-regression/) in order to create a program that defines my function. Then, I used the nl command and specified the initial values.
When I run the command, I get the following message:
verify that nlces is a function evaluator program
r(198);
Would you please help me figure out what the problem is? I am a beginner in Stata software.
Thanks in advance.
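That message typically appears when nl calls the evaluator and the program exits with an error, for example because its syntax or at() handling does not match what nl passes in. A minimal sketch of a function evaluator for the Bass model, following the pattern of the documented nlces example; the names nlbass, y, and t are mine and should be replaced with your own:
Code:
capture program drop nlbass
program nlbass
    version 16
    syntax varlist(min=2 max=2) if, at(name)
    local y : word 1 of `varlist'
    local t : word 2 of `varlist'
    tempname m p q
    scalar `m' = `at'[1,1]
    scalar `p' = `at'[1,2]
    scalar `q' = `at'[1,3]
    tempvar F Flag
    generate double `F'    = (1-exp(-(`p'+`q')*`t'))/(1+(`q'/`p')*exp(-(`p'+`q')*`t')) `if'
    generate double `Flag' = (1-exp(-(`p'+`q')*(`t'-1)))/(1+(`q'/`p')*exp(-(`p'+`q')*(`t'-1))) `if'
    replace `y' = `m'*(`F'-`Flag') `if'
end
nl bass @ y t, parameters(m p q) initial(m 41302528 p 0.024 q 0.194)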
Endogeneity & dropping of constraints
Hi,
I am doing a Durbin–Wu–Hausman test to test my independent variables for endogeneity.
My data is cross sectional and I have a matched pairs design. For the purpose of analysis, I would use either logistic regression or probit or conditional logistic regression.
I am using the following steps to test for endogeneity:
1) regress X1 X2 X3 X4 X5 X6 X7 (X1 to X7 are all independent variables and X1 is being tested for endogeneity)
2) predict X1_res, res
3) regress DV X1 X2 X3 X4 X5 X6 X7 X1_res (DV is the dependent variable)
4) test X1_res
For a lot of the independent variables, STATA is returning the message that the constraint is dropped. For eg: o.CDualy_res = 0
Constraint 1 dropped
Could you please help me understand why the constraints are getting dropped?
Hausman test/Estimating pmg,mg, dfe
Good afternoon
I am estimating a dynamic model with PMG, MG, and DFE. The study applies the Kuznets curve to CO2: co2 = f(primary energy, gdp, gdp^2, trade openness).
I found the optimal lags to be (1,1,0,0,0), in the order of the variables described above. However, I don't know if I'm using the right commands to do the Hausman test and estimate the model.
xtpmg d.lco2pc d.lceppc d.lpibpc d.lpib2pc d.labr, lr( l.lco2pc l.lceppc lpib2pc labr) ec(ECT) replace mg
xtpmg d.lco2pc d.lceppc d.lpibpc d.lpib2pc d.labr, lr( l.lco2pc l.lceppc lpib2pc labr) ec(ECT) replace pmg
hausman mg pmg
best regards
PS: co2=co2
primary energy=ceppc
GDP=pib
GDP^2=pib^2
trade openness=abr
l(var)=ln(var)
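A sketch of the usual store-and-compare pattern, reusing the specification above and adding a DFE fit; under the null of long-run slope homogeneity PMG is efficient while MG remains consistent, which is what hausman mg pmg tests:
Code:
xtpmg d.lco2pc d.lceppc d.lpibpc d.lpib2pc d.labr, lr(l.lco2pc l.lceppc lpib2pc labr) ec(ECT) replace pmg
estimates store pmg
xtpmg d.lco2pc d.lceppc d.lpibpc d.lpib2pc d.labr, lr(l.lco2pc l.lceppc lpib2pc labr) ec(ECT) replace mg
estimates store mg
hausman mg pmg, sigmamore        // H0 not rejected -> PMG preferred over MG
xtpmg d.lco2pc d.lceppc d.lpibpc d.lpib2pc d.labr, lr(l.lco2pc l.lceppc lpib2pc labr) ec(ECT) replace dfe
estimates store dfe
hausman pmg dfe, sigmamore       // analogous comparison of PMG against DFE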
Subsample dataset under multiple conditions
Windows 10
Stata 16
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long region str6 gender float age 8 "male" 4 2 "male" 5 13 "male" 5 10 "male" 3 4 "male" 5 14 "female" 4 2 "male" 4 12 "male" 3 13 "male" 4 1 "male" 5 8 "male" 1 11 "female" 3 14 "female" 4 5 "male" 5 8 "male" 3 5 "female" 4 7 "male" 2 9 "male" 4 11 "male" 5 8 "female" 5 4 "female" 4 2 "male" 4 6 "male" 5 9 "male" 3 2 "male" 4 2 "male" 2 11 "female" 5 9 "male" 5 8 "male" 4 13 "female" 1 2 "female" 1 9 "male" 3 2 "male" 5 2 "male" 4 9 "male" 5 9 "male" 4 2 "female" 4 2 "male" 4 2 "male" 5 4 "male" 4 2 "male" 5 11 "female" 5 2 "female" 5 5 "female" 4 11 "male" 5 8 "male" 5 10 "male" 3 11 "female" 1 2 "female" 5 2 "male" 4 4 "male" 3 13 "male" 5 8 "male" 4 1 "male" 5 5 "male" 3 2 "male" 1 5 "female" 3 10 "male" 2 14 "male" 4 12 "male" 4 2 "male" 5 6 "female" 3 2 "male" 4 11 "female" 1 11 "male" 5 12 "female" 4 2 "male" 2 14 "male" 3 2 "female" 4 1 "female" 3 12 "female" 4 9 "female" 4 2 "female" 3 8 "male" 3 2 "female" 4 11 "male" 3 8 "female" 4 5 "male" 5 9 "male" 4 10 "female" 5 13 "male" 4 10 "male" 4 13 "female" 5 2 "female" 4 13 "male" 3 5 "male" 1 4 "male" 4 13 "male" 1 8 "male" 3 1 "male" 3 8 "male" 1 13 "male" 5 2 "male" 4 8 "male" 3 13 "female" 4 8 "male" 3 1 "female" 3 4 "female" 1 7 "female" 1 2 "male" 5 end label values region periph_res label def periph_res 1 "Anatolikis Macedonias kai Thrakis", modify label def periph_res 2 "Attikis", modify label def periph_res 4 "Dytikis Elladas", modify label def periph_res 5 "Dytikis Makedonias", modify label def periph_res 6 "Ionion Nison", modify label def periph_res 7 "Ipeirou", modify label def periph_res 8 "Kentrikis Makedonias", modify label def periph_res 9 "Kritis", modify label def periph_res 10 "Notiou Aigaiou", modify label def periph_res 11 "Peloponnisou", modify label def periph_res 12 "Stereas Elladas", modify label def periph_res 13 "Thessalias", modify label def periph_res 14 "Voreiou Aigaiou", modify label values age agel label def agel 1 "18-24", modify label def agel 2 "25-34", modify label def agel 3 "35-44", modify label def agel 4 "45-54", modify label def agel 5 "55 plus", modify
Hello!
I would like to ask help on how to balance my dataset under several constraints simultaneously.
I have collected a sample of 5946 respondents, with the following demographic characteristics: region, gender and age.
The sample is not representative on a national level, based on the above characteristics. In fact, these are the percentages in the real population (from the census) and in my sample are:
| REGION CODE | CENSUS (Population) (%) | Sample (%) |
| R1 | 5.62 | 5.8 |
| R2 | 35.40 | 33.54 |
| R3 | 6.28 | 4.44 |
| R4 | 2.62 | 2.24 |
| R5 | 1.92 | 1.66 |
| R6 | 3.11 | 3.21 |
| R7 | 17.40 | 19.07 |
| R8 | 5.76 | 5.89 |
| R9 | 2.86 | 3.55 |
| R10 | 5.34 | 4.89 |
| R11 | 5.06 | 4.86 |
| R12 | 6.77 | 6.63 |
| R13 | 1.84 | 4.22 |
| GENDER | Males | Females |
| CENSUS (Population) (%) | 49.03 | 50.97 |
| Sample (%) | 55.63 | 44.37 |
| AGE GROUP | 18-24 | 25-34 | 35-44 | 45-54 | 55 over |
| CENSUS (Population) (%) | 9.63 | 17.32 | 18.43 | 16.58 | 38.03 |
| Sample (%) | 9.2 | 14.3 | 24 | 26.86 | 25.65 |
I would like to balance my sample, so it matches all the population quotas according to the above tables. In case there is more than one datasets (ie, subsamples) satisfying the above conditions, I would like to keep the largest one. In case there are more than one datasets satisfying the above conditions, and are of equal size, I would like to keep all of them.
I only need the sample to be balanced on a national level, not on a regional level. For instance, I need 49% males in the overall subsample but I don’t necessarily need 49% in each region. That means that regions that have more women can compensate for those who have a deficiency in women etc.
The constraints are not really that strict. For example, I don’t need to get exactly 49.03% males. I could do 49.04%, or any other percentage that would not be significantly different from 49.03%.
The test we use to estimate the significance is a two-sample test of proportions (prtesti in STATA)
Is there any way to do this? I considered gsample, but I am not sure how to move forward with it or if it even relevant to my case.
Thank you for your time in advance
Best,
Eleni
Looping to create a difference score
Hello all,
I am working on a project where I need difference scores. I will have a number of difference scores, so typing the command over-and-over will become tedious. I am trying to develop a loop that will provide me with ONLY the difference scores for my variables of interest. I realize there is probably some process that I am overlooking, so I need some assistance, please. I am working with the following code:
** setting up data **
set obs 10000
set seed 40245
*set trace on
gen x1 = rnormal(25, 10)
gen x2 = rnormal(12, 3)
gen x3 = rnormal(10, 5)
gen x4 = rnormal(15, 8)
gen x5 = rnormal(5, 2)
gen x6 = rnormal(3, 1)
** actual difference scores that I desire **
gen d1 = x2 - x1
gen d2 = x4 - x3
gen d3 = x6 - x5
**loop to replicate the difference scores **
local t1 x1 x3 x5
local t2 x2 x4 x6
local counter = 2
foreach v1 of varlist `t1' {
foreach v2 of varlist `t2' {
local ++counter
gen d`counter' = `v2'-`v1'
}
}
sum
drop *
At the moment, the loop provides me every possible combination between `v1' and `v2'. I only want the three. How do I "tame" the loop? Thank you in advance.
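A sketch of one way to tame it: instead of nesting, walk the two lists in parallel, pulling the k-th word from each:
Code:
local t1 x1 x3 x5
local t2 x2 x4 x6
forvalues k = 1/3 {
    local v1 : word `k' of `t1'
    local v2 : word `k' of `t2'
    gen d`k' = `v2' - `v1'
}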
Cross tabulations based on multiple dummy variables
Hello,
I have a large dataset for N= 59,000,000 observations.
I have 154 dummy variables relating to a set of diagnosis codes, and 173 dummy variables relating to service codes (e.g. chemotherapy, orthopedics, etc.). The service codes all begin with ncb*. I have one string variable that identifies individuals and one cost variable (float).
I need to produce a cross tab of diagnosis code (column) by service code (row) and count the number of individuals in each cell. Some people will have no diagnosis and no service code. I would then like to repeat this with a sum of cost rather than a count of individuals.
In the past I have used collapse and reshape, to get the output I would like, but with columns being dummy variables I am unsure of how to do this.
Ideally I would like to produce a new Stata file as this will be easier to export to excel I think.
Below is a sample of the data but with the id variable not included - because it is a long string. I have included the cost variable. I am assuming I could replace a sum with a count to count the ID rather than sum the cost.
Can anyone advise please?
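A minimal sketch of one way to build such a table as a new Stata file, looping over both sets of dummies and posting a count and a cost sum for each pair. The diagnosis varlist r69-r40_r46 is taken from the sample below and should be replaced with the full list, and note that count tallies records, not distinct individuals; with the ID variable in hand you would deduplicate first:
Code:
tempname h
postfile `h' str32 diagnosis str32 service long n double cost using crosstab, replace
foreach dx of varlist r69-r40_r46 {
    foreach svc of varlist ncb* {
        quietly count if `dx'==1 & `svc'==1
        local n = r(N)
        quietly summarize total_ss_costs if `dx'==1 & `svc'==1, meanonly
        post `h' ("`dx'") ("`svc'") (`n') (r(sum))
    }
}
postclose `h'
use crosstab, clear     // long layout: one row per diagnosis-service pair; reshape wide if needed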
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float total_ss_costs byte(r69 r50_r68 r47_r49 r40_r46 ncbps29m_hist ncbps29b_hist ncbps27b_hist)
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
7143.572 0 1 0 0 1 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
141.99995 1 0 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
6460.776 0 1 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
903.4579 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
4346.248 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0
426.3519 0 1 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
13727.894 0 1 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
284.62573 0 0 0 0 0 0 1
110.00005 0 1 0 0 1 0 0
0 0 0 0 0 0 0 0
529 0 0 1 0 0 0 1
0 0 0 0 0 0 0 0
end
Ordered Probit with 2 continuous interaction term
Hello all,
I am doing an ordered probit regression as follows:
Y takes the value from 1 to 10
X is a continuous variable
Z is also a continuous variable
My regression is: oprobit Y X c.X#c.Z
Please tell me how I can interpret the interaction term c.X#c.Z; since both X and Z are continuous, the command margins X, at(Z=1) does not work.
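A sketch of one way to interpret a continuous-by-continuous interaction after oprobit: refit with the full factorial ## so the main effect of Z is included, then ask margins for the effect of X at a few illustrative values of Z (the values 1, 2, 3 and the outcome level 10 are placeholders):
Code:
oprobit Y c.X##c.Z
margins, dydx(X) at(Z=(1 2 3)) predict(outcome(10))   // effect of X on Pr(Y==10) at chosen Z values
marginsplot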
Interaction terms on robust standard errors result in no F-stat and p-value
Hi all,
I am currently experiencing some difficulties with my model, and I hope you will be able to help me.
When I'm running my regression I get no F-stat and p-value output. The problem occurs when I include dummy variables (to treat missing values, MIM), robust SEs (to address MLR.5, homoscedasticity), and several interaction terms (included to achieve linearity in parameters, per a RESET test) in my regression; in fact, it also appears when the regression has only the interaction terms and robust SEs.
Do you have any apparent explanation for why I do not get any F-stat and p-value output?
Thanks in advance.
Best regards,
Marcus Frellsen.
marginal effects in hurdle model
Dear All
I'm fitting a hurdle model (with the command churdle linear) and then use margins (margins, dydx(CategoricalVariable)). I wonder why the statistical significance is different between the churdle output and the margins output.
Hurdle linear part
SED_wkday | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
SED_wkday |
speaking_wk | -13.15755 16.96346 -0.78 0.438 -46.40532 20.09022
texting_wk | -22.34044 11.64429 -1.92 0.055 -45.16282 .4819424
social_net_wk | -50.51344 8.730203 -5.79 0.000 -67.62433 -33.40256
browsing_wk | -24.73634 11.57269 -2.14 0.033 -47.41838 -2.054286
Margin
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
speaking_wk | -13.60962 10.8538 -1.25 0.210 -34.88268 7.663439
texting_wk | -9.302818 7.627541 -1.22 0.223 -24.25252 5.646887
social_net_wk | -19.89934 5.714631 -3.48 0.000 -31.09981 -8.698867
browsing_wk | -.7922133 7.934503 -0.10 0.920 -16.34355 14.75913
Please could someone help me? What's going on, and why does the statistical significance differ?
In addition, how could I fit an interaction in hurdle model?
Thank you
Ellie
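On the interaction question, a sketch of one way to include one, assuming the lower limit is 0, that the usage variables are 0/1 indicators, and that the same covariates enter the selection equation (all three assumptions are mine); margins can then show how the effect of one variable shifts with the other:
Code:
churdle linear SED_wkday i.social_net_wk##i.texting_wk i.speaking_wk i.browsing_wk, ///
    select(i.social_net_wk##i.texting_wk i.speaking_wk i.browsing_wk) ll(0)
margins, dydx(social_net_wk) at(texting_wk=(0 1))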
Intra-Day Data Differences
Hi,
I am working with the following intra-day data.
I want to take the difference between two successive prices (the intra-day interval is not of fixed length) over a window of 10 days, as in the second step I want to sum up the differences over the same interval. The problem I am facing in taking the differences is that the window is defined over a number of days whereas the data are intra-day. Would appreciate any help in this regard. Thank you.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float id long date double(time price) 1 21823 56100000 102.387 1 21824 33596000 102.288 1 21824 47710000 102.397 1 21825 50009000 102.742 1 21825 60499000 102.702736 1 21826 12119000 102.582 1 21826 32789000 102.565 1 21826 43951000 102.609 1 21826 54412000 102.617 1 21826 54412000 102.55451 1 21829 37797000 102.545 1 21829 45684000 102.522 1 21830 44628000 102.536 1 21831 35067000 102.645 1 21831 37511000 102.542 1 21831 48165000 102.174 1 21831 48165000 102.474 1 21831 64399000 102.4974 1 21831 64399000 102.4974 1 21832 36633000 102.355 1 21832 57553000 102.299 1 21832 57995000 102.338645 1 21833 40237000 102.319 1 21836 57954000 102.080452 1 21837 30188000 102.332 1 21837 32455999.999999996 102.268 1 21837 44701000 102.073 1 21838 32155999.999999996 102.259 1 21838 33861000 102.403 1 21838 33861000 102.403 1 21838 33861000 102.403 1 21839 40823000 102.237 1 21839 40823000 102.237 1 21840 10127000 102.424 1 21840 28777000 102.423 1 21840 42387000 102.453 1 21840 42387000 102.453 1 21843 36383000 102.36 1 21843 42529000 102.327 1 21843 44051000 102.39 1 21843 46214000 102.356 1 21843 46214000 102.206 1 21843 46214000 102.206 1 21843 47160000 102.287 1 21843 47160000 102.137 1 21843 47160000 102.137 1 21843 54017000 102.389 1 21844 42487000 102.378 1 21844 51397000 102.388 1 21844 56163000 102.403 1 21845 31133000 102.336 1 21845 39218000 102.36 1 21845 40921000 102.353 1 21845 57404000 102.357 1 21846 33927000 102.367 1 21846 33927000 102.315821 1 21846 37407000 102.452 1 21846 37795000 102.537 1 21846 40807000 102.437 1 21846 40807000 102.437 1 21846 43021000 102.497 1 21847 21949000 102.467 1 21847 29852000 102.238 1 21850 32089000.000000004 102.146 1 21850 33233000 102.087 1 21851 35148000 102.856 1 21851 35149000 102.189 1 21851 37061000 102.181 1 21851 41673000 102.189 1 21851 63717000 102.268 1 21852 31804000.000000004 102.134 1 21852 31821000 102.095 1 21852 38297000 102.235 1 21852 38310000 102.235 1 21852 40491000 102.155 1 21852 55524000 102.261 1 21853 34598000 102.172 1 21853 34598000 102.272 1 21853 42468000 102.357 1 21854 49875000 102.38718 1 21854 49875000 102.35594 1 21857 35252000 102.308 1 21857 53270000 102.244 1 21858 36819000 102.148 1 21858 40646000 102.207 1 21858 55883000 102.214 1 21858 59147000 102.186285 1 21859 36765000 102.232 1 21859 37575000 102.278 1 21859 39973000 102.165 1 21859 51312000 102.197 1 21860 40842000 102.07 1 21861 54688000 101.9095 1 21861 54688000 102.0095 1 21865 36402000 102.038 1 21865 43839000 101.891 1 21865 43839000 101.954 1 21865 43978000 101.948 1 21865 52278000 102.017 1 21865 58523000 101.968 end format %td date format %tcHH:MM:SS time
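A minimal sketch of one way to do both steps, assuming id, date, and time uniquely order the quotes, and using rangestat from SSC for the 10-day trailing window:
Code:
ssc install rangestat
bysort id (date time): gen dprice = price - price[_n-1]
rangestat (sum) sumd10 = dprice, interval(date -9 0) by(id)   // sum of differences over the past 10 calendar days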
Merging data
Dear all,
I want to merge data from two different datasets. The first has only one observation per firm(CUSIP) for a given year, as shown.
The second data set has multiple observations per firm(CUSIP) per year, as it contains information on the individual directors of the firms in the first dataset.
I tried merging 1:m, but after running -tabstat- it showed the first dataset had increased significantly in observations. How would I go about merging while keeping the observations in the first dataset at their original level, as I only need one observation per year per firm for those variables? The basic idea would be that within one dataset I have observations for all different directors, but for some variables only for the company as a whole.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str12 CUSIP double year "00036020" 2009 "00036020" 2010 "00036020" 2011 "00036020" 2012 "00036020" 2013 "00036020" 2014 "00036020" 2015 "00036020" 2016 "00036020" 2017 "00036020" 2018 "00036110" 2007 "00036110" 2008 "00036110" 2009 "00036110" 2010 "00036110" 2011 "00036110" 2012 "00036110" 2013 "00036110" 2014 "00036110" 2015 "00036110" 2016 "00036110" 2017 "00036110" 2018 "00088630" 2007 "00088630" 2008 "00088630" 2009 "00095710" 2007 "00095710" 2008 "00095710" 2009 "00095710" 2010 "00095710" 2011 "00095710" 2012 "00095710" 2013 "00095710" 2014 "00095710" 2015 "00095710" 2016 "00095710" 2017 "00095710" 2018 "00101J10" 2013 "00101J10" 2014 "00101J10" 2015 "00105510" 2007 "00105510" 2008 "00105510" 2009 "00105510" 2010 "00105510" 2011 "00105510" 2012 "00105510" 2013 "00105510" 2014 "00105510" 2015 "00105510" 2016 "00105510" 2017 "00105510" 2018 "00108410" 2007 "00108410" 2008 "00108410" 2009 "00108410" 2010 "00108410" 2011 "00108410" 2012 "00108410" 2013 "00108410" 2014 "00108410" 2015 "00108410" 2016 "00108410" 2017 "00108410" 2018 "00130H10" 2007 "00130H10" 2008 "00130H10" 2009 "00130H10" 2010 "00130H10" 2011 "00130H10" 2012 "00130H10" 2013 "00130H10" 2014 "00130H10" 2015 "00130H10" 2016 "00130H10" 2017 "00130H10" 2018 "00130H10" 2019 "00154710" 2008 "00154710" 2009 "00154710" 2010 "00154710" 2011 "00154710" 2012 "00154710" 2013 "00154710" 2014 "00154710" 2015 "00154710" 2016 "00154710" 2017 "00154710" 2018 "00163T10" 2007 "00163T10" 2008 "00163T10" 2009 "00163T10" 2010 "00163U10" 2016 "00163U10" 2017 "00163U10" 2018 "00163U10" 2019 "00164V10" 2012 "00164V10" 2013 "00164V10" 2014 "00164V10" 2015 end
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str12 CUSIP double year "00846U101" 2007 "00846U101" 2007 "00846U101" 2007 "00846U101" 2007 "00846U101" 2007 "00846U101" 2007 "00846U101" 2007 "00846U101" 2007 "00846U101" 2008 "00846U101" 2008 "00846U101" 2008 "00846U101" 2008 "00846U101" 2008 "00846U101" 2008 "00846U101" 2008 "00846U101" 2008 "00846U101" 2008 "00846U101" 2009 "00846U101" 2009 "00846U101" 2009 "00846U101" 2009 "00846U101" 2009 "00846U101" 2009 "00846U101" 2009 "00846U101" 2009 "00846U101" 2009 "00846U101" 2010 "00846U101" 2010 "00846U101" 2010 "00846U101" 2010 "00846U101" 2010 "00846U101" 2010 "00846U101" 2010 "00846U101" 2010 "00846U101" 2011 "00846U101" 2011 "00846U101" 2011 "00846U101" 2011 "00846U101" 2011 "00846U101" 2011 "00846U101" 2011 "00846U101" 2011 "00846U101" 2011 "00846U101" 2012 "00846U101" 2012 "00846U101" 2012 "00846U101" 2012 "00846U101" 2012 "00846U101" 2012 "00846U101" 2012 "00846U101" 2012 "00846U101" 2012 "00846U101" 2013 "00846U101" 2013 "00846U101" 2013 "00846U101" 2013 "00846U101" 2013 "00846U101" 2013 "00846U101" 2013 "00846U101" 2013 "00846U101" 2013 "00846U101" 2014 "00846U101" 2014 "00846U101" 2014 "00846U101" 2014 "00846U101" 2014 "00846U101" 2014 "00846U101" 2014 "00846U101" 2014 "00846U101" 2015 "00846U101" 2015 "00846U101" 2015 "00846U101" 2015 "00846U101" 2015 "00846U101" 2015 "00846U101" 2015 "00846U101" 2015 "00846U101" 2015 "00846U101" 2016 "00846U101" 2016 "00846U101" 2016 "00846U101" 2016 "00846U101" 2016 "00846U101" 2016 "00846U101" 2016 "00846U101" 2016 "00846U101" 2016 "00846U101" 2016 "00846U101" 2017 "00846U101" 2017 "00846U101" 2017 "00846U101" 2017 "00846U101" 2017 "00846U101" 2017 "00846U101" 2017 "00846U101" 2017 "00846U101" 2017 "00846U101" 2017 "00846U101" 2018 "00846U101" 2018 end
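With a 1:m merge the firm-year rows are expected to multiply, one copy per director. If the director-level file is the one whose row count should be preserved, a sketch of the reverse direction (file names hypothetical):
Code:
use directors, clear
merge m:1 CUSIP year using firm_year, keep(master match) nogenerate
Each director-year row then carries the single firm-year value. Also note that in the excerpts above the CUSIPs have different lengths (8 vs 9 characters), which would prevent matches until they are put in the same format.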
Anyone want to do my homework?
Anyone want to do a statistics assignment for me? I am struggling. It's an elementary Stata course and I'm supposed to analyze political data and make sense of it. My research question is "are unhappy people more likely to engage in political rallies/marches?". I have census data that I'm supposed to manipulate to answer this, looking at standard deviations, differences of means, standard errors, p-values, etc.
expression too long r(130) while specifying starting values
I am using the sfpanel command and I need to set initial values for the variables before running the regression.
To set the initial values, I am using the svfrontier() option of sfpanel.
Since I'm including user fixed effects (and there are ~2000 users in the dataset), I need to specify around 2000 starting values. This becomes a very long expression!
Command: sfpanel y dummy2 dummy3 i.userid, model(tfe) dist(tnormal) svfrontier(1 2 3 … 2000)
The numbers inside the parentheses (i.e. 1 2 3 …) stand in for the starting values.
I’m getting the following Error:
expression too long
r(130);
Things I've tried: 1) I tried this on a smaller dataset (15 users) and it worked perfectly fine, but with the complete dataset (~2000 users) the expression is much longer and the error appears. 2) I also tried #delimit in the do-file:
#delimit ; sfpanel y dummy2 dummy3 i.userid, model(tfe) dist(tnormal) svfrontier(1 2 3 ... 2000); #delimit cr. This didn't work for the complete dataset either (it did work for the smaller one, so it shouldn't be a syntax error).
Any help will be greatly appreciated! Thanks, Aishvarya
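A minimal sketch of a possible workaround, assuming svfrontier() will accept a numlist held in a local macro (the starting values of 0.5 are placeholders, and macro expansion may still hit the same length limit):
Code:
* build the list of ~2000 starting values programmatically instead of typing it
local sv ""
forvalues i = 1/2000 {
    local sv `sv' 0.5
}
sfpanel y dummy2 dummy3 i.userid, model(tfe) dist(tnormal) svfrontier(`sv')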
Sunday, November 29, 2020
Combining multiple variables
Hello!
I just started learning Stata. I am using a census data set and trying to combine a bunch of different variables to make one "Happy People" vs "Unhappy People" variable, and then see which group is more active in political rallies. The variables I want to use are:
p_psych1 (Self-Described Anxious, rated 1 - 5)
p_mental (Mental Health rated 1 - 5)
p_life_ideal (In most ways my life is ideal rated 1 - 5)
p_life_sat (I am satisfied with my life rated 1 - 5)
If there's a way to combine these I would love to hear your thoughts! I have no idea what I'm doing so anything would help!
Thank you!
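For what it's worth, a minimal sketch of one way to do this, assuming higher values mean more anxiety on p_psych1 and higher values mean better/more satisfied on the other three items (check the codebook; the cutoff below is illustrative):
Code:
* reverse-code the anxiety item so that higher = happier on all four items
gen p_psych1_r = 6 - p_psych1
* average the four items into a single happiness score
egen happy_score = rowmean(p_psych1_r p_mental p_life_ideal p_life_sat)
* split at the scale midpoint (an illustrative cutoff)
gen happy = happy_score > 3 if !missing(happy_score)
label define happylbl 0 "Unhappy" 1 "Happy"
label values happy happylbl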
Split file by group i.e. group 1 (non clinical) vs group 2 (clinical)
Hi there,
Might someone please advise how I can split my Stata dataset by one variable, "Group", which has two levels: 1 (non-clinical) and 2 (clinical)? I would like to be able to compare those two groups.
I think I'm overthinking this - many thanks in advance for your time and expertise.
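For what it's worth, a minimal sketch: there is usually no need to physically split the file, since most commands take a by prefix, an over() option, or an if qualifier (the analysis variable below is illustrative):
Code:
* summary statistics separately for each group
bysort Group: summarize outcome_var
* or compare the groups directly
ttest outcome_var, by(Group)
* or restrict a command to one group
summarize outcome_var if Group == 1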
Control variable that varies across time but not across countries: how to add it to the dataset and regression
Hello everyone,
For my regression I am using the VIX as a proxy for volatility and the TED spread as a proxy for the global risk-free rate. My data consist of around 85 countries over 20 years in long format. These two series vary across time but not between countries: they change from year to year over the 20 years but are the same for all countries in any given year, since they proxy for global factors. I am wondering how to include these two variables in my dataset. For now I have added them by copying them across countries, so they are exactly the same for each country within a year. I was wondering if this is the correct approach, or if there is some other way to add them to the regression or the data.
Kind regards,
Max Verheijen
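For what it's worth, a minimal sketch of the usual merge-based way to attach year-level series to a country-year panel (the file and variable names are illustrative); the result is the same as copying the values across countries, but less error-prone:
Code:
* vix_ted.dta is assumed to hold one row per year with variables year, vix, ted
use country_panel.dta, clear
merge m:1 year using vix_ted.dta, nogenerate
* by construction the values are identical across countries within a year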
Quintile Sorting two variables
I have monthly observations from various funds over a long period of time, but the time series are unbalanced. For each fund I have an average alpha per fund and a turnover ratio per year, and now I want to divide the turnover ratio into quintiles and then sort the mean alphas into the corresponding turnover quintiles, to see in the end which quintile has the best alphas (i.e. do more active funds perform better?).
In the end I need two panels: Panel A, which assigns every observation to a quintile, so that a fund can move between quintiles, and Panel B, where I compute a mean turnover and a mean alpha for each fund and assign each fund to one fixed quintile based on its mean turnover ratio and alpha.
My data in dataex:
clear
input double(portfolioid Turnover) float mean_Turnover double AlphaCAPM float mean_AlphaCAPM
4 1.6770000457763672 1.417865 .0011276676681867825 -.0006093712
4 1.6770000457763672 1.417865 -.00066037569874813 -.0006093712
4 1.6770000457763672 1.417865 -.003203570453383219 -.0006093712
4 1.6770000457763672 1.417865 -.005067735517733929 -.0006093712
4 1.6770000457763672 1.417865 -.004276204678426189 -.0006093712
4 1.6770000457763672 1.417865 -.00264340576862393 -.0006093712
4 1.6770000457763672 1.417865 -.004977918697552257 -.0006093712
4 1.6770000457763672 1.417865 -.0013603183772131176 -.0006093712
4 1.6770000457763672 1.417865 -.0020266099299921167 -.0006093712
4 1.6770000457763672 1.417865 -.0011593780606042707 -.0006093712
4 1.6770000457763672 1.417865 -.0017615356820123058 -.0006093712
4 1.6770000457763672 1.417865 -.0024929980839788132 -.0006093712
4 1.6770000457763672 1.417865 .0028820224147934476 -.0006093712
4 1.6770000457763672 1.417865 .004593126737425821 -.0006093712
4 1.6770000457763672 1.417865 .003565814453837919 -.0006093712
4 1.6770000457763672 1.417865 .0015064737708275246 -.0006093712
4 1.6770000457763672 1.417865 -.001450077625743531 -.0006093712
4 1.3890000581741333 1.417865 .0025399518285950835 -.0006093712
4 1.3890000581741333 1.417865 -.00008314149448057293 -.0006093712
4 1.3890000581741333 1.417865 -.00264706689732443 -.0006093712
4 1.3890000581741333 1.417865 -.0046777288729113395 -.0006093712
4 1.3890000581741333 1.417865 -.00914245381680592 -.0006093712
4 1.3890000581741333 1.417865 -.00700097423044709 -.0006093712
4 1.3890000581741333 1.417865 -.008433296568919339 -.0006093712
4 1.3890000581741333 1.417865 -.013204861540199477 -.0006093712
4 1.3890000581741333 1.417865 -.014364183593742185 -.0006093712
4 1.3890000581741333 1.417865 -.015444051513141611 -.0006093712
4 1.3890000581741333 1.417865 -.01279344256693269 -.0006093712
4 1.3890000581741333 1.417865 -.009466272067912462 -.0006093712
4 1.3890000581741333 1.417865 -.007383238950604474 -.0006093712
4 1.3890000581741333 1.417865 -.006018720126566169 -.0006093712
4 1.3890000581741333 1.417865 -.006023297375925912 -.0006093712
4 1.3890000581741333 1.417865 -.0032789059631669858 -.0006093712
4 1.3890000581741333 1.417865 -.004582010000065449 -.0006093712
4 1.3890000581741333 1.417865 -.0033608506502379766 -.0006093712
4 1.100000023841858 1.417865 .03921126931429379 -.0006093712
4 1.100000023841858 1.417865 .037104918792983395 -.0006093712
4 1.100000023841858 1.417865 .036233238601829657 -.0006093712
4 1.100000023841858 1.417865 .03443272530766962 -.0006093712
4 1.100000023841858 1.417865 .02526996563362867 -.0006093712
4 1.100000023841858 1.417865 .019073450673706516 -.0006093712
4 1.100000023841858 1.417865 .018588612534819627 -.0006093712
4 1.100000023841858 1.417865 .007151904262588452 -.0006093712
4 1.100000023841858 1.417865 .014357413609561769 -.0006093712
4 1.100000023841858 1.417865 .014499444223885 -.0006093712
4 1.100000023841858 1.417865 .013921937092171734 -.0006093712
4 1.100000023841858 1.417865 .01271685737387346 -.0006093712
4 1.5399999618530273 1.417865 .012079164666963182 -.0006093712
4 1.5399999618530273 1.417865 .00716736984339204 -.0006093712
4 1.5399999618530273 1.417865 .00788063315810916 -.0006093712
4 1.5399999618530273 1.417865 .00681633879941004 -.0006093712
4 1.5399999618530273 1.417865 .006783476955592701 -.0006093712
4 1.5399999618530273 1.417865 .0008174012149019787 -.0006093712
4 1.5399999618530273 1.417865 .00277341205796916 -.0006093712
4 1.5399999618530273 1.417865 .0017975961479823905 -.0006093712
4 1.5399999618530273 1.417865 .002657955196104919 -.0006093712
4 1.5399999618530273 1.417865 .0009851211820280752 -.0006093712
4 1.5399999618530273 1.417865 .0048100191602722 -.0006093712
4 1.5399999618530273 1.417865 .005723510274338288 -.0006093712
4 1.3799999952316284 1.417865 .005084054580076671 -.0006093712
4 1.3799999952316284 1.417865 .0074259662326077325 -.0006093712
4 1.3799999952316284 1.417865 .01017841319013133 -.0006093712
4 1.3799999952316284 1.417865 .011344522290153904 -.0006093712
4 1.3799999952316284 1.417865 .0119847233815383 -.0006093712
4 1.3799999952316284 1.417865 .00846514544659567 -.0006093712
4 1.3799999952316284 1.417865 .00872006067702675 -.0006093712
4 1.3799999952316284 1.417865 .007937297081933515 -.0006093712
4 1.3799999952316284 1.417865 .008264231995338047 -.0006093712
4 1.3799999952316284 1.417865 .002626290973518384 -.0006093712
4 1.3799999952316284 1.417865 .004769733071080048 -.0006093712
4 1.3799999952316284 1.417865 .004288828037580048 -.0006093712
4 1.5099999904632568 1.417865 .0030103450348373043 -.0006093712
4 1.5099999904632568 1.417865 .0010676980608890965 -.0006093712
4 1.5099999904632568 1.417865 -.00198054981620948 -.0006093712
4 1.5099999904632568 1.417865 -.0002576709104457238 -.0006093712
4 1.5099999904632568 1.417865 .001719370536849455 -.0006093712
4 1.5099999904632568 1.417865 -.0010946970159981077 -.0006093712
4 1.5099999904632568 1.417865 -.00041203850616485654 -.0006093712
4 1.5099999904632568 1.417865 .0008502249132824116 -.0006093712
4 1.5099999904632568 1.417865 -.0010846949403825468 -.0006093712
4 1.5099999904632568 1.417865 -.003410998505710562 -.0006093712
4 1.5099999904632568 1.417865 -.0031783749851852297 -.0006093712
4 1.5099999904632568 1.417865 -.002886334290051005 -.0006093712
4 1.5499999523162842 1.417865 -.0023420224475766208 -.0006093712
4 1.5499999523162842 1.417865 -.0011567322096284784 -.0006093712
4 1.5499999523162842 1.417865 -.002413825004820118 -.0006093712
4 1.5499999523162842 1.417865 -.003561881440058567 -.0006093712
4 1.5499999523162842 1.417865 -.004437255144742497 -.0006093712
4 1.5499999523162842 1.417865 -.001321013824925759 -.0006093712
4 1.5499999523162842 1.417865 .0009930816087858965 -.0006093712
4 1.5499999523162842 1.417865 -.000030053225495138558 -.0006093712
4 1.5499999523162842 1.417865 -.00025366199256105816 -.0006093712
4 1.5499999523162842 1.417865 .0031576600593439044 -.0006093712
4 1.5499999523162842 1.417865 .0017091193316435445 -.0006093712
4 1.5499999523162842 1.417865 .002972087483720765 -.0006093712
4 1.2799999713897705 1.417865 .0015071261897042498 -.0006093712
4 1.2799999713897705 1.417865 -.000771124694131986 -.0006093712
4 1.2799999713897705 1.417865 -.002834069970186823 -.0006093712
4 1.2799999713897705 1.417865 -.0039409218708400025 -.0006093712
4 1.2799999713897705 1.417865 -.004219105288234017 -.0006093712
end
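A minimal sketch of one way to build the two panels with xtile, using the variable names from the excerpt (the quintile cutoffs here are taken over the whole sample; by-year quintiles would need a loop over years or a community-contributed egen function):
Code:
* Panel A: quintiles of yearly turnover across all observations
xtile turn_q5 = Turnover, nq(5)
tabstat AlphaCAPM, by(turn_q5) statistics(mean)

* Panel B: one fixed quintile per fund, based on its mean turnover
preserve
collapse (mean) mean_Turnover mean_AlphaCAPM, by(portfolioid)
xtile fund_q5 = mean_Turnover, nq(5)
tabstat mean_AlphaCAPM, by(fund_q5) statistics(mean)
restore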
Displaying results in full form - double format
Hi All,
I am using the following command in Stata 13.1:
total subsidyamt, over(fiscal_year1)
to calculate the total amount of subsidy provided in each fiscal year. However, the results display the total subsidy amounts in scientific notation. Is there a way to display the results as full-form integers? The display needs to be in double format.
At present the results are displayed as below
Over Total Std. Err. [95% Conf. Interval]
subsidyamt
1993 566000 12070.11 542342.9 589657.1
1994 4.54e+07 107917.5 4.52e+07 4.56e+07
1995 6.38e+07 128991.8 6.36e+07 6.41e+07
1996 7.34e+07 141826.7 7.31e+07 7.36e+07
1997 8.54e+07 158783.3 8.51e+07 8.58e+07
1998 9.23e+07 168680.5 9.20e+07 9.26e+07
1999 1.10e+08 191806.5 1.09e+08 1.10e+08
2000 1.28e+08 203604 1.28e+08 1.29e+08
2001 1.15e+08 200214.9 1.14e+08 1.15e+08
2002 1.14e+08 197153.4 1.14e+08 1.14e+08
2003 8.37e+07 175729.6 8.33e+07 8.40e+07
2004 1.16e+08 207778.1 1.15e+08 1.16e+08
2005 1.16e+08 208659.8 1.15e+08 1.16e+08
2006 1.29e+08 199145.2 1.28e+08 1.29e+08
2007 1.18e+08 205689.4 1.18e+08 1.19e+08
2008 2.10e+08 218814.2 2.10e+08 2.11e+08
2009 2.56e+08 247226.8 2.56e+08 2.57e+08
2010 2.18e+08 218655 2.17e+08 2.18e+08
2011 1.84e+08 204790.8 1.84e+08 1.84e+08
2012 5.70e+08 534520 5.69e+08 5.71e+08
2013 1.04e+09 720283.5 1.04e+09 1.04e+09
2014 6.10e+08 561786.4 6.09e+08 6.11e+08
2015 3.57e+08 441164.8 3.56e+08 3.58e+08
2016 2.55e+07 214876.5 2.50e+07 2.59e+07
2018 9.39e+07 163531.4 9.36e+07 9.42e+07
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long subsidyamt float fiscal_year1 30000 1953 30000 1969 30000 1982 10000 1993 10000 1993 .... 10000 1993 7000 1993 7000 1993 7000 1993 7000 1994 10000 1994 10000 1994 ... 10000 1994 10000 1994 7000 1994 10000 1994 10000 1994 10000 1994 10000 1994 end
Listed 100 out of 366242 observations
Use the count() option to list more
Thank you.
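For what it's worth, one possible workaround (a sketch, not a tested answer for this dataset): the totals are stored in e(b) after -total-, so they can be listed afterwards with a wide fixed format of your choosing:
Code:
total subsidyamt, over(fiscal_year1)
* display the stored totals without scientific notation (the format is just an example)
matlist e(b), format(%16.0fc)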
Very weird spmap problem: "master data not sorted"
Hello all
Suddenly, I am having an unexplained problem with spmap: syntax that previously worked no longer does, reporting the error "master data not sorted". As far as I know, sorting _ID in the master data should not be required, since I can run spmap regardless of how _ID is sorted in other .dta files that still work. This seems quite random and sudden: some .dta files no longer work with spmap, always reporting this error.
Do you have any idea? I am quite lost now.
Thanks in advance, kind regards.
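For what it's worth, a quick thing to try (a sketch; the variable and file names are illustrative): explicitly sort the master data on the identifier before calling spmap and make sure id() points at it:
Code:
use master_data.dta, clear
sort _ID
spmap myvar using coordinates.dta, id(_ID)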
R2 difference from different regressions
Hello, I'm working on replicating Peters and Taylor (2016).
I want to know how to compare R² values from different regressions, as in their Panel C.
For example:
Panel A regression:
xi: xtreg y x_a i.fyear, fe cluster(gvkey)
Panel B regression:
xi: xtreg y x_b i.fyear, fe cluster(gvkey)
How can I test whether the R² values in Panels A and B are different, taking into account the correlation across regressions and again clustering by firm?
Thanks!
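One possible approach (a sketch, not taken from the original paper): bootstrap the difference in R-squared between the two models, resampling whole firms so the firm-level clustering is respected. Variable names follow the post; the firm dummies stand in for the fixed effects, and with many resampled duplicates of the same firm an idcluster() refinement may be worth adding.
Code:
capture program drop r2diff
program define r2diff, rclass
    regress y x_a i.fyear i.firmid     // Panel A model with year and firm dummies
    local r2a = e(r2)
    regress y x_b i.fyear i.firmid     // Panel B model
    return scalar diff = `r2a' - e(r2)
end

egen firmid = group(gvkey)             // numeric firm identifier for the dummies
bootstrap diff=r(diff), cluster(firmid) reps(500): r2diff
estat bootstrap, percentile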
Year and industry fixed effects
Hello;
I have a panel dataset of the S&P 500 from 2002 to 2019 and need to implement industry and year fixed effects.
However, I run into a problem when I use the xtset command.
My regression command is: xtreg firm_beta esg_single (certain control variables), fe
with xtset company_key year; however, this gives firm fixed effects.
Could you help me out?
Thanks, Patrick!
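A possible sketch (the industry variable and the control list are placeholders for the actual names): because industry does not vary within a firm, industry and year effects can be entered as dummies in a pooled regression with firm-clustered standard errors, or absorbed with a community-contributed command such as reghdfe:
Code:
* pooled OLS with industry and year dummies, clustering by firm
regress firm_beta esg_single controls i.industry i.year, vce(cluster company_key)
* or, if reghdfe (SSC) is installed:
* reghdfe firm_beta esg_single controls, absorb(industry year) vce(cluster company_key)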
An issue using esttab syntax
Hello,
I would like to show the `y' variable in Excel when running each of the following regressions with esttab. In other words, I would like the output to show, for each regression, which y variable was used to divide the sample (I have over 50 y variables).
Thank you very much.
Code:
foreach y of varlist H_* {
    eststo: quietly reg `ylist' `xlist'* if `y'==1, vce(cluster firm)
}
Code:
esttab using AAA, csv label wide onecell numbers scalars (F) replace
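One possible way (a sketch, assuming estadd, which ships with the estout package alongside esttab, is available): store the name of the splitting variable with each set of estimates and report it as an extra row:
Code:
foreach y of varlist H_* {
    eststo: quietly reg `ylist' `xlist'* if `y'==1, vce(cluster firm)
    estadd local splitvar "`y'"          // remember which H_* variable defined the subsample
}
esttab using AAA, csv label wide onecell numbers scalars("splitvar Sample split" F) replace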
How to rename and format variables from appended tempfile?
Hi Statalist,
I'm trying to append, rename, format and generate new variables. My goals are to:
1. rename the variables "SCH_ID", "School_id" and "SchoolID" to "schoolid", since they hold the same content;
2. generate a data-source variable identifying the tempfile each observation came from;
3. destring the variable "PSCGC_MUN", which arrives as a string from some of the sources. My code is below:
clear all
import excel "...\SY2017-2018 EnrollmentDB.xlsx", sheet("ES") cellrange(A2:AU2366) firstrow
tempfile sy2017_18enrol_es
save `sy2017_18enrol_es', replace
clear all
import excel "...\SY2017-2018 EnrollmentDB.xlsx", sheet("SS") cellrange(A2:BC483) firstrow
tempfile sy2017_18enrol_ss
save `sy2017_18enrol_ss', replace
clear all
import excel "...\SY2018-2019 EnrollmentDB.xlsx", sheet("ES") cellrange(A2:BN2320) firstrow
tempfile sy2018_19enrol_es
save `sy2018_19enrol_es', replace
clear
import excel "...\SY2018-2019 EnrollmentDB.xlsx", sheet("SS") cellrange(A2:BF472) firstrow
tempfile sy2018_19enrol_ss
save `sy2018_19enrol_ss', replace
clear
import excel "...\SY 2019-2020.xlsx", sheet("BARMM") cellrange(A5:CA2723) firstrow
tempfile sy2019_20enrol_es_ss
save `sy2019_20enrol_es_ss', replace
clear
set more off
import excel "...\SY 2020-2021 Enrolment.xlsx", sheet("Sheet1") firstrow
tempfile sy2020_21enrol_es_ss
save `sy2020_21enrol_es_ss', replace
clear all
set more off
local forms `sy2017_18enrol_es' `sy2017_18enrol_ss' `sy2018_19enrol_es' `sy2018_19enrol_ss' `sy2019_20enrol_es_ss' `sy2020_21enrol_es_ss'
foreach file of local forms {
local datasource "`file'"
foreach v in "`file'" {
capture confirm var SCH_TYPE
capture confirm var SCH_ID
capture confirm var School_id
capture confirm numeric var PSCGC_MUN
if !_rc {
capture ren SCH_TYPE SchoolType
capture ren SCH_ID SchoolID
capture ren School_id SchoolID
capture destring PSCGC_MUN, replace
}
gen data=`datasource'
append using "`file'" //don't want to use force here
}
}
Thank you in advance,
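For what it's worth, a minimal sketch of one way to harmonize and then append the files (it assumes the tempfiles defined above are still in scope, i.e. this runs in the same do-file; the new variable names are illustrative):
Code:
clear
tempfile combined
save `combined', emptyok

foreach f in sy2017_18enrol_es sy2017_18enrol_ss sy2018_19enrol_es ///
             sy2018_19enrol_ss sy2019_20enrol_es_ss sy2020_21enrol_es_ss {
    use "``f''", clear
    * harmonize the school-id variable, whatever it is called in this file
    capture rename SCH_ID    schoolid
    capture rename School_id schoolid
    capture rename SchoolID  schoolid
    capture rename SCH_TYPE  SchoolType
    * make PSCGC_MUN numeric where it arrived as a string
    capture confirm string variable PSCGC_MUN
    if !_rc destring PSCGC_MUN, replace
    * record which source file the observations came from
    gen data = "`f'"
    append using `combined'
    save `combined', replace
}
use `combined', clear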
Comparing independent variables in different models with the same dependent variable
Hello,
I am running three multiple linear regression models. In all three, the dependent variable is the democracy index score, over several years, of a range of countries that the EU has agreements with. I use three independent variables in each model. Two of those are the same in each model: the sum of imports and exports from and to the EU divided by the country's GDP, and a score on network governance with the EU. The third independent variable is a binary variable indicating whether the country has a particular agreement with the EU in force. In each model this is a different agreement type (there are three possible agreement types: an SAA, an AA, and a CPA; hence three models). A country can only have one of these agreements in force at a time, or no agreement at all. The three models thus look like this:
- M1:
- DV: democracy score
- IV1: SAA in force (binary)
- IV2: (imports+exports)/GDP score
- IV3: network governance score
- M2:
- DV: democracy score
- IV1: AA in force (binary)
- IV2: (imports+exports)/GDP score
- IV3: network governance score
- M3:
- DV: democracy score
- IV1: CPA in force (binary)
- IV2: (imports+exports)/GDP score
- IV3: network governance score
If anything is unclear or you need more information, please let me know.
Best, Harry
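If the goal is to compare the agreement coefficients across the three models, a possible sketch (the variable names democracy, saa, aa, cpa, trade_gdp, netgov and country are placeholders for the actual names): estimate the models, combine them with suest, and test the coefficients against each other while clustering by country.
Code:
regress democracy saa trade_gdp netgov
estimates store m1
regress democracy aa trade_gdp netgov
estimates store m2
regress democracy cpa trade_gdp netgov
estimates store m3

* combine the estimates, cluster by country, and test across models
suest m1 m2 m3, vce(cluster country)
test [m1_mean]saa = [m2_mean]aa
test [m1_mean]saa = [m3_mean]cpa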
Latent class analysis - constraining categorical variables
Hi Statalist!
I am trying to perform a latent class analysis and I am very new to this method, so I hope you will understand if my question is a bit naïve.
I would like to perform the analysis using nine categorical variables (each with five categories) and obtain three classes. Here is the code I am using:
Once I try to estimate the marginal predicted means of the outcome within each latent class (estat lcmean), the command seems to run forever without giving any output. From what I have read on the forum, I suspect the issue is related to one coefficient being above 15 in absolute value. Below you can find part of the output; the problematic value is the cut1 for e4_4b in class 2, at about -24 with a huge standard error.
As I understand it, I should constrain that coefficient to 15 (in absolute value). I tried the following code, but it is not adequate for my categorical variables.
Could anyone please suggest the right specification? Any suggestion is more than welcome.
Kind regards,
Marla
Code:
gsem (e4_1b e4_2b e4_3b e4_4b e4_5b e4_6b e4_7b e4_8b e4_9b <-, ologit), lclass(C 3)
Code:
Class : 2
Response : e4_1b
Family : ordinal
Link : logit
Response : e4_2b
Family : ordinal
Link : logit
Response : e4_3b
Family : ordinal
Link : logit
Response : e4_4b
Family : ordinal
Link : logit
Response : e4_5b
Family : ordinal
Link : logit
Response : e4_6b
Family : ordinal
Link : logit
Response : e4_7b
Family : ordinal
Link : logit
Response : e4_8b
Family : ordinal
Link : logit
Response : e4_9b
Family : ordinal
Link : logit
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
/e4_1b |
cut1 | -6.207387 .2099544 -6.61889 -5.795884
cut2 | -4.585708 .09238 -4.766769 -4.404646
cut3 | -1.894731 .0285634 -1.950714 -1.838748
cut4 | .7498389 .0206396 .7093861 .7902917
-------------+----------------------------------------------------------------
/e4_2b |
cut1 | -3.075159 .0378004 -3.149246 -3.001071
cut2 | -.6201423 .0158303 -.6511692 -.5891154
cut3 | 1.625804 .0206937 1.585246 1.666363
cut4 | 4.059866 .0600725 3.942126 4.177606
-------------+----------------------------------------------------------------
/e4_3b |
cut1 | -4.235327 .0753749 -4.383059 -4.087595
cut2 | -2.243794 .0294187 -2.301453 -2.186134
cut3 | -.249513 .0220957 -.2928197 -.2062063
cut4 | 2.190033 .0348925 2.121645 2.258421
-------------+----------------------------------------------------------------
/e4_4b |
cut1 | -24.39119 2397.723 -4723.841 4675.059
cut2 | -3.252751 .0575151 -3.365479 -3.140024
cut3 | -.6396046 .0260331 -.6906285 -.5885807
cut4 | 2.04866 .0369879 1.976165 2.121155
-------------+----------------------------------------------------------------
/e4_5b |
cut1 | -5.785578 .1817201 -6.141743 -5.429414
cut2 | -2.87042 .0414458 -2.951652 -2.789188
cut3 | -.8560699 .0229513 -.9010535 -.8110862
cut4 | 1.127883 .0237217 1.08139 1.174377
-------------+----------------------------------------------------------------
/e4_6b |
cut1 | -.2159903 .0155038 -.2463771 -.1856035
cut2 | 1.242691 .0190969 1.205261 1.28012
cut3 | 2.667265 .0303306 2.607818 2.726711
cut4 | 5.438793 .1438952 5.156764 5.720823
-------------+----------------------------------------------------------------
/e4_7b |
cut1 | -1.62553 .0205138 -1.665736 -1.585324
cut2 | .0296444 .0151189 .0000119 .0592768
cut3 | 1.490041 .0191115 1.452583 1.527499
cut4 | 3.487356 .0482501 3.392788 3.581925
-------------+----------------------------------------------------------------
/e4_8b |
cut1 | 1.331648 .023303 1.285975 1.377321
cut2 | 2.582377 .0354519 2.512892 2.651861
cut3 | 3.56694 .043742 3.481207 3.652673
cut4 | 6.136812 .1994962 5.745807 6.527817
-------------+----------------------------------------------------------------
/e4_9b |
cut1 | .0988198 .0149272 .0695631 .1280765
cut2 | 1.434078 .0194443 1.395968 1.472188
cut3 | 2.952968 .0329211 2.888444 3.017492
cut4 | 5.747459 .1649939 5.424077 6.070841
------------------------------------------------------------------------------
Code:
gsem (e4_1b e4_2b e4_3b e4_4b e4_5b e4_6b e4_7b e4_8b e4_9b <- ) (2: e4_4b <- _cons@15) , ologit lclass(C 3)
Kind regards,
Marla
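Not a definitive answer, but one way to locate the exact parameter to constrain (a sketch): redisplay the fitted model with coeflegend, then impose a constraint on the offending cutpoint; the parameter name in the comment below is purely illustrative and should be replaced by whatever the legend shows for the class-2 cut of e4_4b.
Code:
* after the gsem run, list the internal parameter names
gsem, coeflegend
* illustrative only -- substitute the name shown in the legend:
* constraint 1 _b[/e4_4b:cut1] = -15
* gsem (e4_1b e4_2b e4_3b e4_4b e4_5b e4_6b e4_7b e4_8b e4_9b <-, ologit), ///
*     lclass(C 3) constraints(1)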
Testing significance for Alphas CAPM
Hello everybody,
I have calculated monthly alphas and betas from monthly fund data for different portfolioids. How can I test the statistical significance of all the calculated alphas? Can I do it with a Fama-MacBeth regression?
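For what it's worth, a minimal sketch of a simple check (the variable name alpha is illustrative): test whether the average alpha differs from zero, clustering by fund if the alphas vary at the fund-month level.
Code:
* one alpha per fund: a one-sample t-test against zero
ttest alpha == 0
* fund-month alphas: regress on a constant only, clustering by fund
regress alpha, vce(cluster portfolioid)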
Survey data- Pool Cross Sectional data with weights- What to do with weights?
Dear all,
I have pooled cross-sectional data (weakly balanced: nine different years of wealth and income data for individuals belonging to four races, two genders, and three categories of education). I have been referring to the Stata documentation on this; there is a whole PDF manual about svy.
What I saw is that when we use svy, we don't need to do any calibration of the variables using the weights given in the data. For instance, we don't need to calculate a weighted mean like (X1*W1 + X2*W2 + ...)/(W1 + W2 + ...). We just specify the survey design using svyset and put "svy:" in front of any estimation command. The data look somewhat like the table below. Each year has multiple individuals in each race category, and there are probably different individuals in each year rather than the same individuals sampled every year. I think this is sampling with replacement (since there is a new set of individuals each year? or not? How do I know?), but I am not sure.
| weight | Year | Race | Gender | Age | Income |
| 1678.4 | 1990 | Black | M | 55 | |
| .2 | 1990 | White | M | 25 | |
| 6546 | 1990 | Black | M | 44 | |
| 151.55 | 1990 | White | F | 56 | |
| 564.55 | 1991 | White | F | 60 | |
| 54.66 | 1991 | White | M | 30 | |
| 1483.08 | 1991 | Black | M | 29 | |
| 452.6 | 1991 | Black | F | 48 | |
| 111.56 | 1992 | White | M | 65 |
My questions are:
1. How do I know whether the data come from sampling with replacement or without replacement?
2. How do I know whether it is a one-stage or a two-stage design? (This is necessary to know when specifying the survey design.)
3. Which approach is better for plotting trends in income: using svyset, running regressions and plotting with marginsplot; or, without svyset, calculating weighted means and doing a line plot; or collapse (p50) income [pw=weight], by(year race)?
4. If we use svyset, how do we know the type of weights (aweight, pweight, etc.)?
5. For svyset PSU [pweight=pw], strata(strata): is it alright if I use year as the strata?
6. What is the primary sampling unit here? I think each individual?
Any help would be greatly appreciated.
Thank you.
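A minimal sketch of the declaration step, using the variable names from the table above (treating individuals as the PSUs and year as the strata here simply mirrors the question; it is not a recommendation):
Code:
* declare the design once, then prefix estimation commands with svy:
svyset _n [pweight=weight], strata(Year)
svy: mean Income, over(Year)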
Saturday, November 28, 2020
Loops to make variables compatible
Hi, I have a variable which has been coded differently in two datasets (the old one for year 2000 and the new one for year 2005).
Old codes are { 2 4 5 7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 3 28 29 31 6 32 17 33 30 }
New codes are { 28 18 10 24 6 2 1 29 32 23 27 14 17 13 21 3 8 11 33 16 9 19 35 12 4 26 7 30 31 15 34 25}
So, for example, the variable (let's call it state) took the value 2 in the old coding, but in the new dataset that category corresponds to 28; similarly 4 maps to 18, and so on.
Both datasets have been appended, with a variable called year taking the value 2000 for the old dataset and 2005 for the new one.
How do I write a loop to convert the old codes of the variable state to the new ones for the year 2000, so that state is comparable across years?
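A minimal sketch of one way to do this, mapping the two code lists positionally (the variable is assumed to be called state, as in the post; writing to a new variable avoids recoding values that have already been recoded):
Code:
local old 2 4 5 7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 3 28 29 31 6 32 17 33 30
local new 28 18 10 24 6 2 1 29 32 23 27 14 17 13 21 3 8 11 33 16 9 19 35 12 4 26 7 30 31 15 34 25
gen state_new = state
forvalues i = 1/32 {
    local o : word `i' of `old'
    local n : word `i' of `new'
    replace state_new = `n' if state == `o' & year == 2000
}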
IPTW - strategies
Hi everyone,
I'm evaluating whether increased dosage (number of completed sessions) of a behaviour-change therapeutic program is associated with the disappearance of that behaviour in the 12 months after the program ends. The outcome is binary: did engage in the behaviour vs. did not engage in the behaviour.
Unfortunately, the program is not an RCT; I only have a naturalistic/observational design. Another group, those who were referred but never commenced the program, will act as a control group. I'm planning to use IPTW to develop propensity scores to weight the groups so that they are balanced on the likelihood of participating in the program in the first place. I am seeking the average treatment effect on the treated (ATT), relative to never having participated in the program.
I have two questions:
1. I was planning to fit an xtmelogit model of program participation, followed by predict prob_participation, and then
gen iptw=1/prob_participation
summ iptw
However, I've also seen people use the -pscore- command to develop propensity scores. Is it okay to use the former method, or is it less efficient than pscore?
2. When choosing the covariates to include in the propensity score, is it better practice to include only personal characteristics of the sample (e.g., age, gender, education), or is it also okay to include some broader operational factors in the model (e.g., program trainer identity, timing of the start of the program)? I've seen it written in some places that choosing more 'operational factors' can be problematic because they are often more related to treatment than to the outcome and may be confounders.
I'm not sure if its helpful information, but my data is in wide format.
Thanks in advance! I've never completed IPTW before so it's very new to me
Marlee
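For what it's worth, a minimal sketch of two routes to ATT-style weights (the variable names participated, outcome, age, gender and educ are placeholders): -teffects ipw- builds and applies the weights internally, while the manual route uses p/(1-p) for the controls rather than 1/p.
Code:
* built-in route: atet requests the average treatment effect on the treated
teffects ipw (outcome) (participated age i.gender i.educ, logit), atet

* manual route: estimate the propensity score, then build ATT weights
logit participated age i.gender i.educ
predict ps, pr
gen iptw_att = cond(participated==1, 1, ps/(1-ps))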
creating a loop
Hi guys:
As you can see, below are four blocks of repetitive code, and the only difference between them is the value of ccode2 (31, 40, 41, 42). So I am wondering if there is any way to write a loop to simplify this repetitive code. Thanks in advance and happy holidays!
Code:
use "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta", clear drop edtc sort ccode1 year by ccode1 year: egen edtc= sum(trade_sha1) if ccode1==2 & ccode2 != 31 sort ccode1 ccode2 year save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta", replace drop if edtc == . sort ccode1 year quietly by ccode1 year: gen dup = cond(_N==1,0,_n) drop if dup>1 drop dup keep ccode1 ccode2 year edtc rename edtc edtc1 replace ccode2 = 31 sort ccode1 ccode2 year save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_edtc.dta", replace merge ccode1 ccode2 year using "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta" drop _merge drop if ccode1_1 ==. save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta", replace drop edtc sort ccode1 year by ccode1 year: egen edtc= sum(trade_sha1) if ccode1==2 & ccode2 != 40 sort ccode1 ccode2 year save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta", replace drop if edtc == . sort ccode1 year quietly by ccode1 year: gen dup = cond(_N==1,0,_n) drop if dup>1 drop dup keep ccode1 ccode2 year edtc rename edtc edtc1 replace ccode2 = 40 sort ccode1 ccode2 year save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_edtc.dta", replace merge ccode1 ccode2 year using "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta" drop _merge drop if ccode1_1 ==. save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta", replace drop edtc sort ccode1 year by ccode1 year: egen edtc= sum(trade_sha1) if ccode1==2 & ccode2 != 41 sort ccode1 ccode2 year save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta", replace drop if edtc == . sort ccode1 year quietly by ccode1 year: gen dup = cond(_N==1,0,_n) drop if dup>1 drop dup keep ccode1 ccode2 year edtc rename edtc edtc1 replace ccode2 = 41 sort ccode1 ccode2 year save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_edtc.dta", replace merge ccode1 ccode2 year using "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta" drop _merge drop if ccode1_1 ==. save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta", replace drop edtc sort ccode1 year by ccode1 year: egen edtc= sum(trade_sha1) if ccode1==2 & ccode2 != 42 sort ccode1 ccode2 year save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta", replace drop if edtc == . sort ccode1 year quietly by ccode1 year: gen dup = cond(_N==1,0,_n) drop if dup>1 drop dup keep ccode1 ccode2 year edtc rename edtc edtc1 replace ccode2 = 42 sort ccode1 ccode2 year save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_edtc.dta", replace merge ccode1 ccode2 year using "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta" drop _merge drop if ccode1_1 ==. save "/Users/jack/Library/Mobile Documents/com~apple~CloudDocs/data/COW_Trade_4.0/cow_trade_4.0_rr.dta", replace
Panel index within group
Hey there, I'd like to create a new column indicating the occurrence of non-missing BAM within each id. For example, a new column named "BAM_panel" should run from 1 to 6 for id 3. Simply put, I want a panel index for BAM within each id, counting only the rows where it is not missing. Would you help out? Thanks very much!
Code:
clear input int id float BAM 3 5 3 2 3 7 3 -6 3 1 3 -4 3 . 4 8 4 21 4 6 4 11 4 2 4 -1 4 -2 4 -15 4 -1 4 4 4 4 4 6 4 11 4 11 4 22 4 7 4 28 4 4 4 8 4 7 4 7 4 35 4 16 4 4 4 7 4 10 4 1 4 -7 4 -5 4 3 4 7 4 8 4 35 4 6 4 10 4 3 4 1 4 -7 4 4 4 -5 4 7 4 -5 4 1 4 -1 4 3 4 -2 4 . 7 . end
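A minimal sketch of one way to do this, preserving the current row order within each id (the helper variable obsno pins down that order):
Code:
gen long obsno = _n
bysort id (obsno): gen BAM_panel = sum(!missing(BAM))
replace BAM_panel = . if missing(BAM)
drop obsno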
Question: Generating summary statistics table with tabstat and table
Dear all,
I have recently attempted to generate a descriptive statistics table with the command
Code:
tabstat govtdebt metrde, statistics( mean median ) by(incomelevel)
The mean and median are currently shown in rows.
I was hoping to change this to a column layout, where the mean and median appear as columns for each variable. For example:
Is tabstat or table able to do this? Otherwise, what command should I use? Thank you.
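For what it's worth, a sketch of one option to try: tabstat's columns() option controls whether variables or statistics go in the columns.
Code:
* statistics in the columns, variables in the rows, within each income level
tabstat govtdebt metrde, statistics(mean median) by(incomelevel) columns(statistics)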
Fitting two lines using aaplot
Dear All,
Is it possible to fit two lines (i.e. more than two variables) using the aaplot command? If yes, will appreciate any possible help.
Thanks,
Dapel
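Not aaplot itself (as far as I know, aaplot from SSC is built around a single y-versus-x fit), but a possible fallback with twoway if two fitted lines are wanted on one graph; the variable names y, x1 and x2 are illustrative:
Code:
twoway (scatter y x1) (lfit y x1) (scatter y x2) (lfit y x2), ///
    legend(order(2 "fit on x1" 4 "fit on x2"))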
Time fixed effects as dummies change significance
Dear Statalist,
I am doing research on using nighttime light (radiance per area) to predict real regional GDP (random-effects model). I have a dataset of 34 provinces in Indonesia for the period 2012-2020 (quarterly data). My mentor suggested I include time fixed effects (i.qdate) and province fixed effects (i.id) as dummies. But when I do so, my variable of interest (ln_nli) becomes insignificant. Here are my results before and after adding the time and province dummies:
. xtreg ln_pdrb ln_nli, re ro
Random-effects GLS regression Number of obs = 1,122
Group variable: id Number of groups = 34
R-sq: Obs per group:
within = 0.3997 min = 33
between = 0.5016 avg = 33.0
overall = 0.4384 max = 33
Wald chi2(1) = 164.50
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
(Std. Err. adjusted for 34 clusters in id)
------------------------------------------------------------------------------
| Robust
ln_pdrb | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_nli | .1363934 .0106343 12.83 0.000 .1155505 .1572363
_cons | 17.57819 .1798055 97.76 0.000 17.22578 17.9306
-------------+----------------------------------------------------------------
sigma_u | .82280392
sigma_e | .09782348
rho | .98606209 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. xtreg ln_pdrb ln_nli i.qdate i.id, re ro
Random-effects GLS regression Number of obs = 1,122
Group variable: id Number of groups = 34
R-sq: Obs per group:
within = 0.8764 min = 33
between = 1.0000 avg = 33.0
overall = 0.9985 max = 33
Wald chi2(33) = .
corr(u_i, X) = 0 (assumed) Prob > chi2 = .
(Std. Err. adjusted for 34 clusters in id)
------------------------------------------------------------------------------
| Robust
ln_pdrb | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ln_nli | .0071749 .0078816 0.91 0.363 -.0082727 .0226225
|
qdate |
210 | .0281132 .0042352 6.64 0.000 .0198124 .036414
211 | .0285663 .012437 2.30 0.022 .0041903 .0529424
212 | .024168 .0056676 4.26 0.000 .0130597 .0352764
213 | .0488249 .0062365 7.83 0.000 .0366016 .0610482
214 | .0875023 .0052677 16.61 0.000 .0771778 .0978268
215 | .0941612 .0085182 11.05 0.000 .0774657 .1108566
216 | .0838755 .006852 12.24 0.000 .0704458 .0973053
217 | .1006991 .0076502 13.16 0.000 .085705 .1156933
218 | .1361169 .007639 17.82 0.000 .1211449 .151089
219 | .1439286 .0098438 14.62 0.000 .1246351 .1632221
220 | .1285819 .0110317 11.66 0.000 .1069601 .1502037
221 | .1559425 .0115845 13.46 0.000 .1332374 .1786476
222 | .1867966 .0139732 13.37 0.000 .1594097 .2141835
223 | .1943102 .0125887 15.44 0.000 .1696368 .2189835
224 | .1819726 .013399 13.58 0.000 .1557111 .2082341
225 | .2012336 .0149958 13.42 0.000 .1718423 .2306248
226 | .2399893 .0161248 14.88 0.000 .2083854 .2715933
227 | .2516112 .0159806 15.74 0.000 .2202897 .2829326
228 | .226623 .0141486 16.02 0.000 .1988923 .2543537
229 | .2448326 .0168339 14.54 0.000 .2118388 .2778264
230 | .2867302 .0181616 15.79 0.000 .2511341 .3223263
231 | .2939583 .0181339 16.21 0.000 .2584166 .3295001
232 | .2795545 .0158775 17.61 0.000 .2484351 .3106738
233 | .3027815 .018904 16.02 0.000 .2657303 .3398327
234 | .3326927 .0195459 17.02 0.000 .2943834 .371002
235 | .3371736 .0176719 19.08 0.000 .3025373 .3718099
236 | .3191344 .0178491 17.88 0.000 .2841509 .3541179
237 | .3431905 .0197575 17.37 0.000 .3044666 .3819144
238 | .378396 .0190506 19.86 0.000 .3410574 .4157345
239 | .3853209 .0203378 18.95 0.000 .3454596 .4251822
240 | .3490189 .0190272 18.34 0.000 .3117262 .3863116
241 | .3074161 .02012 15.28 0.000 .2679816 .3468506
|
id |
2 | 1.349615 .0033258 405.80 0.000 1.343097 1.356134
3 | .2081012 .0001396 1490.41 0.000 .2078276 .2083749
4 | 1.35338 .0031468 430.08 0.000 1.347212 1.359547
5 | .080414 .0020315 39.58 0.000 .0764323 .0843958
6 | .8071257 .0031772 254.04 0.000 .8008985 .8133529
7 | -1.094362 .0033559 -326.10 0.000 -1.100939 -1.087784
8 | .5555607 .0047317 117.41 0.000 .5462867 .5648347
9 | -.9177501 .0002953 -3108.17 0.000 -.9183288 -.9171714
10 | .273506 .0109266 25.03 0.000 .2520903 .2949218
11 | 2.518404 .03778 66.66 0.000 2.444356 2.592451
12 | 2.349943 .0151691 154.92 0.000 2.320212 2.379674
13 | 1.945931 .0119341 163.06 0.000 1.922541 1.969321
14 | -.3214616 .0144193 -22.29 0.000 -.349723 -.2932003
15 | 2.446185 .0140759 173.79 0.000 2.418597 2.473773
16 | 1.158497 .0175069 66.17 0.000 1.124184 1.19281
17 | .1139387 .013109 8.69 0.000 .0882455 .1396319
18 | -.3446538 .0030191 -114.16 0.000 -.3505712 -.3387365
19 | -.6924587 .0047129 -146.93 0.000 -.7016958 -.6832216
20 | -.0098659 .0049347 -2.00 0.046 -.0195378 -.0001941
21 | -.3577425 .0049798 -71.84 0.000 -.3675027 -.3479824
22 | 1.333831 .002178 612.41 0.000 1.329562 1.3381
23 | 1.334234 .0017355 768.79 0.000 1.330832 1.337635
24 | -.8315154 .0046088 -180.42 0.000 -.8405486 -.8224823
25 | -.4790523 .0026143 -183.24 0.000 -.4841762 -.4739284
26 | -.3125605 .0025476 -122.69 0.000 -.3175537 -.3075674
27 | .7999971 .0031723 252.18 0.000 .7937795 .8062147
28 | -.4333453 .0034614 -125.19 0.000 -.4401296 -.4265611
29 | -1.630576 .0010813 -1508.04 0.000 -1.632696 -1.628457
30 | -1.477908 .0055417 -266.69 0.000 -1.48877 -1.467047
31 | -1.512727 .0054251 -278.84 0.000 -1.52336 -1.502094
32 | -.7831341 .0059699 -131.18 0.000 -.794835 -.7714333
33 | -.7823274 .0068561 -114.11 0.000 -.7957651 -.7688898
34 | .1184597 .0086356 13.72 0.000 .1015342 .1353851
|
_cons | 17.0143 .0191901 886.62 0.000 16.97669 17.05191
-------------+----------------------------------------------------------------
sigma_u | 0
sigma_e | .04505634
rho | 0 (fraction of variance due to u_i)
------------------------------------------------------------------------------
.
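Two things in the second output are worth noting. First, sigma_u and rho are reported as 0: once i.id is included, the province dummies absorb all of the between-province variation, so the random effect has nothing left to explain and the re specification is effectively redundant. Second, the quarter dummies soak up the common upward trend that both lights and GDP share, so the drop in significance of ln_nli may simply mean that much of its earlier explanatory power came from that shared trend rather than from within-province variation. A minimal sketch of a more conventional two-way fixed-effects specification, assuming the panel is declared with id and qdate as in the output above:
Code:
* province effects absorbed by fe, quarter dummies entered explicitly,
* standard errors clustered by province as in the earlier output
xtset id qdate
xtreg ln_pdrb ln_nli i.qdate, fe vce(cluster id)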