Thursday, April 30, 2020

Exporting Logistic Regression output table - command using 'svyset'

Dear experts,
I am running a logistic regression model using the 'svyset' command in Stata 15. I am unable to export the logistic regression output table. In this regard, I would like to request the experts' help: is there any command that will export the output table to Excel/Word with significance-level asterisks?
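For reference, a minimal sketch of the kind of export I have in mind (hedged; y, x1, and x2 are placeholder names, and esttab is part of the community-contributed estout package from SSC):
Code:
* sketch only: survey-weighted logistic model exported with significance asterisks
* (assumes -ssc install estout- has been run; variable names are placeholders)
svy: logistic y x1 x2
esttab using results.rtf, se star(* 0.10 ** 0.05 *** 0.01) label replace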

Thanking you

Sincerely,
Jamtsho

Correct for Selection on Independent Variables

Dear Statalists,
I am confused about how to correct for selection on one independent variable.

I want to estimate Y_ft = beta*Certified_ft + Z_ft in a sample from 2000-2013. Here f and t indicate firm and year, respectively. Z are exogenous variables. Certified_ft indicates whether the firm gets a certification in year t. However, Certified_ft is only observed for firms that survived to 2018. (I combined two datasets: one reports Y from 2000-2013; the other reports, for firms that survived to 2018, when they got certified.)

So I face two selection issues: 1) the endogeneity of Certified_ft: factors that affect Certified and Y at the same time; I developed an instrument z1 for it; 2) the survivor bias: I only observe Certified_ft for firms that survived to 2018. I wonder how to correct for these biases. I thought of two possibilities:

1) Semykina & Wooldridge (2010) correct for endogeneity and selection. It is similar to the Heckman two-stage method. However, it applies to selection on the dependent variable, rather than on independent variables.
2) The control function approach in Imbens & Wooldridge (2007) (page 4). First estimate a probit model of Prob(Certified_ft) on instruments z2 (hoping to correct for the survivor bias), obtain its predicted probabilities p2, then estimate Y on Certified, Z and p2, probably using 2SLS (with z1 as the instrument for Certified).

It is a little complicated as I face two layers of selection. Do you think method 2 can help me address this problem? Or what other approach would you recommend?
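To make option 2 concrete, here is a hedged sketch of what I have in mind (all variable names are placeholders, and the standard errors would need correcting, e.g. by bootstrap, because p2 is a generated regressor):
Code:
* sketch of the control-function idea in option 2; names are placeholders
probit Certified z2 Z                                       // survival/certification equation on the instruments
predict p2, pr                                              // predicted probability
ivregress 2sls Y Z p2 (Certified = z1), vce(cluster firm)   // firm is a placeholder id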
Any comments would be appreciated! Thank you.
Best,
K

Replacing missing rows of a variable

Dear All
I have a file with more than 1k observations and two variables. One of the variables, Y, is complete, but the variable Country is not, and it looks like this (I want to replace rows 2 to 19 with row 1, i.e. America, and rows 21 to 39 with row 20, i.e. Germany):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float n str40 country
  1 "         America"
  2 ""                         
  3 ""                         
  4 ""                         
  5 ""                         
  6 ""                         
  7 ""                         
  8 ""                         
  9 ""                         
 10 ""                     
 11 ""                         
 12 ""                         
 13 ""                         
 14 ""                         
 15 ""                         
 16 ""                         
 17 ""                         
 18 ""                         
 19 ""                         
 20 ""       Germany"                
 21 ""                         
 22 ""                         
 23 ""                         
 24 ""                         
 25 ""                         
 26 ""                         
 27 ""                         
 28 ""                         
 29 ""                         
 30 ""                         
 31 ""                         
 32 ""                         
 33 ""                         
 34 ""                         
 35 ""                         
 36 ""                         
 37 ""                         
 38 ""                         
 39 ""                         
 40 ""      Belgium"            
 41 ""                         
 42 ""                         
 43 ""                         
 44 ""                         
 45 ""                         
 46 ""                         
 47 ""                         
 48 ""                         
 49 ""                         
 50 ""                         
 51 ""                         
 52 ""                         
 53 ""                         
 54 ""                         
 55 ""                         
 56 ""                         
 57 ""                         
 58 ""                         
 59 ""                         
 60 ""      Somalia"               
 61 ""                         
 62 "                
 63 ""                         
 64 ""                         
 65 ""                         
 66 ""                         
 67 ""                         
 68 ""                         
 69 ""                         
 70 ""                         
 71 ""                         
 72 ""                         
 73 ""                         
 74 ""                         
 75 ""                         
 76 ""                         
 77 ""                         
 78 ""                         
 79 ""                         
 80 ""      Spain"              
 81 ""                         
 82 ""                         
 83 ""                         
 84 ""                         
 85 ""                         
 86 ""                         
 87 ""                         
 88 ""                         
 89 ""                         
 90 ""                         
 91 ""                         
 92 ""                         
 93 ""                         
 94 ""                         
 95 ""                         
 96 ""                         
 97 ""                         
 98 ""                         
 99 ""                         
100 ""                         
end
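A minimal sketch of the fill-down I am after (hedged; it assumes the blank strings should simply take the last non-blank country above them):
Code:
* sketch: trim stray spaces, then carry the last non-blank country forward
replace country = strtrim(country)
replace country = country[_n-1] if country == ""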

Thanks,
Dapel

How to sort data for distinct IDs with multiple visits and multiple values

Dear all,

I have a dataset that has multiple visits for each person. However, each person had a differing number of visits.
The dataset is demonstrated below

id visit value
1 1 70
1 2 70
1 3 68
1 4 58
2 1 55
2 2 35
3 1 57
4 1 67
4 2 55
5 1 59
5 2 52
5 3 32


I have a few queries about working on this dataset (a rough sketch follows the list):
1. Is there a command that would allow me to identify:
a) the visit with the lowest value for each id (i.e. visit 1 for
b) the value corresponding to the last visit (i.e. visit 4 for id 1, visit 3 for id 5) for each id
c) the change and direction of change between the highest and lowest value for each id (i.e. for id 5, the change would be negative 27 over 3 visits)

2. After establishing the above,
Is there a command that would
a) allow me to drop all other visits apart from the desired visit in question - be it the visit with the lowest value or the last visit -
b) such that the dataset becomes one visit per id?
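A rough sketch of what I have in mind for queries 1 and 2 (hedged; it assumes the data are identified by id and visit as in the example above):
Code:
* sketch only
bysort id (visit): egen minval  = min(value)   // 1a) lowest value per id
bysort id (visit): gen  lastval = value[_N]    // 1b) value at the last visit
bysort id (visit): egen maxval  = max(value)
gen change = minval - maxval                   // 1c) one reading: negative if the value fell
bysort id (visit): keep if _n == _N            // 2) keep one row (here, the last visit) per id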

I would be grateful for your help with these queries.

Thank you very much

Panel VECM in STATA


I am giving a general description of the model in a panel setting: let y, a, b, and z be four variables. There exists a cointegrating vector of order 1.

y=f(a(z), b (z))

a=g(z)

b=t(z)

Can I estimate this system of equations using a panel VECM?

Replace missing values from a different row

I have data that are in long format where one ID has four rows of data. Only one row has information and I want to fill in the other three rows with the same information. Here is a snapshot of my data and the code that I attempted. Any advice would be great - thanks.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(id time in)
1002 0                  .
1002 1                  .
1002 2 1.5033675432969853
1002 3                  .
1003 0                  .
1003 1                  .
1003 2   .632943440174186
1003 3                  .
end
by id, sort: replace in = cond(!missing(in[_n-1]), in[_n-1], in[_n+1]) if missing(in)
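An alternative sketch of what I am trying to achieve (hedged; it assumes there is at most one non-missing value per id, and note that "in" is a reserved word in Stata, so the variable may really need a different name):
Code:
* sketch: copy the single non-missing value to every row of the id
by id, sort: egen in_filled = max(in)
replace in = in_filled if missing(in)
drop in_filled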

Bootstrapped SE with Two-Sample IV: "insufficient observations to compute bootstrap standard errors"

I am not sure if this is kosher but I am trying to calculate bootstrapped standard errors for a two-sample IV. (As an aside, for some reason user-written commands like weaktsiv are not working in my setting, and anyway I don't think it can calculate clustered errors.) I'm not sure if bootstrapping is the way to go, but in any case, I am getting the typical "insufficient observations" error. Here is my code:

Code:
program tsiv
reg yrschl inter bXn_* bX71_* bXws_* if male == 1
predict double xhat, xb
reg tr01 xhat bXn_* bX71_* bXws_* if male == 1
drop xhat
end

bootstrap, reps(2) nodrop: tsiv
My data is two stacked data sets, one which includes the variable yrschl but not tr01, and another which includes tr01 but not yrschl.

According to this discussion, https://www.statalist.org/forums/for...s-will-be-save, bootstrap will keep only the observations from the first regression. I thought "nodrop" would resolve the issue, as suggested in the discussion. But I get the same error with or without this option. What might be going wrong with my code?

The full error is:

Code:

. bootstrap, reps(2): tsiv
(running tsiv on estimation sample)

Bootstrap replications (2)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
xx
insufficient observations to compute bootstrap standard errors
no results will be saved
r(2000);

Code for Marginsplot Interaction Terms - Error/questions

Hi all,

I am trying to determine whether the effect of income (continuous variable) on cash usage depends on credit card ownership (categorical), and also whether the effect of education (categorical variable) on cash usage depends on credit card ownership. I am using the interaction terms income*credit and educat*credit.

When I type the following code, I get an error message saying "incometh: factor variables may not contain noninteger values." Q1) Why might this be happening, and is there a command I can use to fix this?


Code:
reg cashshare age i.credit##c.incometh i.educat##i.credit
margins incometh, at(credit = (0 1))
marginsplot, recastci(rarea)
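For reference, one possibility I am considering for Q1 (a hedged sketch; I am not sure this is the right fix) is to keep incometh inside at() rather than listing it as a factor after margins:
Code:
* sketch: incometh is continuous, so specify its values inside at()
reg cashshare age i.credit##c.incometh i.educat##i.credit
margins, at(incometh = (0(20)150) credit = (0 1))
marginsplot, recastci(rarea)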
Q2) When I run the next code for educat*credit, there is heavy overlap of the lines - is there any recommendation to help resolve this, please?

Code:
reg cashshare age i.credit##c.incometh i.educat##i.credit
margins educat, at(credit = (0 1))
marginsplot, recastci(rarea)

On another note, I thought that running the following code would give a better visualisation of the interactions between income*credit and educat*credit. However, the difference is that it gives the predictive margins of credit card adoption and not of income/education. Q3) Am I correct in saying that the following code will not illustrate the effect of income or education on cash usage given credit card adoption? Instead, I believe it shows the effect of credit card adoption on cash usage given education/income level.

Code:
reg cashshare age i.credit##c.incometh i.educat##i.credit
margins credit, at(incometh = (0(20)150))
marginsplot, recastci(rarea)

margins credit, at(educat = (1 2 3 4))
marginsplot, recastci(rarea)
Many thanks in advance if you are able to help!


Esttab - Compress labels

Hi everyone,

Is there a way to compress labels in the esttab command? If two variables from two regressions have the same variable name, we can save some rows and write everything next to each other. However, I cannot do this if the variable names are different but the labels are the same. Thanks for any help in advance.
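For illustration, a hedged sketch of the situation (y, x_a, x_b, and controls are placeholder names; rename() is an estout/esttab option that should force differently named coefficients onto one row so they share one label):
Code:
* sketch only: report x_b on the same row as x_a
eststo m1: reg y x_a controls
eststo m2: reg y x_b controls
esttab m1 m2, rename(x_b x_a) label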

Ulas

Wilcoxon rank sum test for more than 3 groups?

So Stata is not letting me do a rank sum test for more than 3 groups with the command: ranksum score if outcome~=1, by(outcome). Does anyone know how I can use the ranksum command for more than 3 groups in Stata?
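For reference, a hedged sketch of what I suspect is needed instead: ranksum compares exactly two groups, so for three or more groups the Kruskal-Wallis test may be the way to go:
Code:
* sketch only
kwallis score, by(outcome)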

Filling in missing values in long data

I have some sample data pasted below.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(studentid year schoolid)
1001 1 109
1001 1 109
1001 2 121
1001 2 121
end
I want to create two new variables based on schoolid for each respective year. One new column for a variable representing schoolid1 with values of 109 going down and another column representing schoolid2 with values of 121 going down. The table below is what I'd ideally want.

studentid year schoolid schoolid1 schoolid2
1001 1 109 109 121
1001 1 109 109 121
1001 2 121 109 121
1001 2 121 109 121
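A hedged sketch of how I imagine this could be done (assuming year takes only the values 1 and 2, as in the example):
Code:
* sketch: spread the year-1 and year-2 school ids across all rows of each student
bysort studentid: egen schoolid1 = max(cond(year == 1, schoolid, .))
bysort studentid: egen schoolid2 = max(cond(year == 2, schoolid, .))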

I'd appreciate any suggestions! Thank you!

Hausman Test - "V_b-V_B is not positive definite" appears

Background of question

I am an economics student, currently writing my bachelor thesis, and quite inexperienced with Stata. I would be grateful for any help!

The purpose of my research is to analyse the drivers of export sophistication of Malaysian exports.

The dependent variable is the natural logarithm of the export sophistication index, more specifically the export sophistication of Malaysian exports to 171 countries.

The independent variables are:
  • Foreign Direct Investment (FDI) proxied by the stock and flow of FDI inflow, FDIS and FDIF respectively
  • Research and Development (R&D) proxied by Gross Domestic Expenditure on R&D as a percentage of GDP and Number of researchers per thousand in the labour force, GDE and RES respectively
Control variables are Malaysia’s GDP per capita PPP (current international $) proxying for the level of economic development (GDPc); Malaysia’s total population proxying for the country size (POPc); Malaysia’s gross enrolment ratio of the tertiary education segment proxying for Malaysia’s human capital (HCc); and the rule of law proxying for Malaysia's institutional quality (INSc).

Important here is that the data for the independent and control variables do not vary between the countries (id), only throughout the years since the data is specific to Malaysia.

My question

To check whether I should use a fixed-effects or random-effects model, I did the Hausman test, but the output does not seem right.

The coefficients in the random and fixed effects model are exactly the same. Furthermore, "V_b-V_B is not positive definite" appears.

I also tried adding "hausman fixed random, sigmamore", but that does not change anything in my results.

What would you recommend I do? Does that simply mean I will have to stick to the random-effects model? And should I simply ignore the "V_b-V_B is not positive definite" message?


Please find a screenshot of the test attached.

I would highly appreciate if you could help me.
Please let me know if you need further clarification.
Thank you and kind regards,
Julie

Histogram by groups

Hello,

I would like to know the code to create a bar graph with the values of the receipts of each group by year. Therefore, each year should show 4 bars, with the values for the "scandinavian", "continental", "anglo-saxon" and "mediterranean" groups.
Here is an example of the data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str52 country int year float(mean_receipts groups)
"Belgium"        1995      43.7 2
"Denmark"        1995      47.9 1
"Spain"          1995 33.866665 4
"Ireland"        1995        32 3
"United Kingdom" 1995        32 3
"Austria"        1995      43.7 2
"Greece"         1995 33.866665 4
"Italy"          1995 33.866665 4
"Ireland"        1996      33.6 3
"Austria"        1997  43.03333 2
"Luxembourg"     1997  43.03333 2
"Belgium"        1997  43.03333 2
"Italy"          1998        41 4
"United Kingdom" 1999      33.4 3
"Netherlands"    1999      37.7 1
"Luxembourg"     2000  42.63334 2
"Austria"        2000  42.63334 2
"Ireland"        2000      31.8 3
"Denmark"        2000  47.83333 1
"Italy"          2000 36.166668 4
"Belgium"        2000  42.63334 2
"Finland"        2000  47.83333 1
"Sweden"         2000  47.83333 1
"Spain"          2000 36.166668 4
"Greece"         2000 36.166668 4
"Finland"        2003      45.1 1
"Portugal"       2003      34.9 4
"Greece"         2003      34.9 4
"Sweden"         2003      45.1 1
"Denmark"        2003      45.1 1
"Luxembourg"     2003    42.975 2
"France"         2003    42.975 2
"Spain"          2003      34.9 4
"Italy"          2003      34.9 4
"Ireland"        2003      29.7 3
"Belgium"        2003    42.975 2
"Austria"        2003    42.975 2
"Ireland"        2004     32.45 3
"Spain"          2004    34.825 4
"Belgium"        2004    42.675 2
"Italy"          2004    34.825 4
"Luxembourg"     2004    42.675 2
"United Kingdom" 2004     32.45 3
"Sweden"         2004     42.75 1
"Finland"        2004     42.75 1
"Portugal"       2004    34.825 4
"Greece"         2004    34.825 4
"Austria"        2004    42.675 2
"Denmark"        2004     42.75 1
"France"         2004    42.675 2
"Netherlands"    2004     42.75 1
"Italy"          2005    35.625 4
"France"         2005     42.75 2
"Austria"        2005     42.75 2
"Netherlands"    2005      43.5 1
"Belgium"        2005     42.75 2
"Finland"        2005      43.5 1
"United Kingdom" 2005      32.9 3
"Denmark"        2005      43.5 1
"Sweden"         2005      43.5 1
"Luxembourg"     2005     42.75 2
"Ireland"        2005      32.9 3
"Spain"          2005    35.625 4
"Greece"         2005    35.625 4
"Portugal"       2005    35.625 4
"Netherlands"    2006    43.125 1
"Luxembourg"     2006     42.15 2
"Spain"          2006    36.025 4
"Denmark"        2006    43.125 1
"United Kingdom" 2006     33.65 3
"Ireland"        2006     33.65 3
"Belgium"        2006     42.15 2
"Italy"          2006    36.025 4
"Portugal"       2006    36.025 4
"Austria"        2006     42.15 2
"Finland"        2006    43.125 1
"Sweden"         2006    43.125 1
"Greece"         2006    36.025 4
"France"         2006     42.15 2
"Ireland"        2007      33.4 3
"Greece"         2007      36.7 4
"Finland"        2007     42.55 1
"Italy"          2007      36.7 4
"Spain"          2007      36.7 4
"United Kingdom" 2007      33.4 3
"Belgium"        2007    42.025 2
"Netherlands"    2007     42.55 1
"Luxembourg"     2007    42.025 2
"Denmark"        2007     42.55 1
"France"         2007    42.025 2
"Austria"        2007    42.025 2
"Portugal"       2007      36.7 4
"Sweden"         2007     42.55 1
"Sweden"         2008    41.875 1
"Austria"        2008    42.525 2
"Netherlands"    2008    41.875 1
"Spain"          2008      35.6 4
"France"         2008    42.525 2
"Ireland"        2008     32.95 3
"Italy"          2008      35.6 4
end
format %ty year
label values groups gnames
label def gnames 1 "Scandinavian", modify
label def gnames 2 "Continental", modify
label def gnames 3 "Anglo_Saxon", modify
label def gnames 4 "Mediterranean", modify
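A hedged sketch of the kind of graph I have in mind (using the variables in the example above; I am not sure this is the best layout with so many years):
Code:
* sketch: one bar per group within each year
graph bar (mean) mean_receipts, over(groups) over(year) asyvars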

Thank you in advance.

Renaming variables

Hi,

I am facing a problem trying to rename variables. I am using Stata 13 MP.
This started with me having to reshape long data into a wide format. Now I have unique identifiers in every row but drug variables in 350 columns. These columns are named "genericname*" and I would like to write a loop to rename them all to the respective drug they represent. The drug names are present only within the columns, but not in every row, as some patients might not be on those drugs. I am finding it difficult to write a loop that would scan the entries within a column and use one of them to replace the variable name.

I know it is something along those coding lines but I am not quite getting it:

foreach v of varlist genericname* {
forvalues i=1/77929 {
local name = `v'[`i']
capture rename `v' `name'
}
}
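A hedged sketch of an alternative I am considering (untested; it assumes each genericname* column holds at most one distinct drug name and uses strtoname() to make that name legal):
Code:
* sketch: use the first non-missing entry in each column as the new variable name
foreach v of varlist genericname* {
    levelsof `v', local(names) clean
    local new = strtoname(word(`"`names'"', 1))
    capture rename `v' `new'     // capture skips empty or duplicate names
}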

Hope someone has a solution to this problem.

Thanks in advance,
Moubadda

Code for including confidence intervals on both curves

Hi all,

I am trying to compare linear vs. quadratic predictive margins in one graph. I want to show the confidence intervals for both curves but can't seem to find the correct code. The following code produces curves without confidence intervals:

Code:
reg csh_sh age income
qui margins, at(age=(18(7)90)) noci saving(x1, replace)
reg csh_sh c.age##c.age income
margins, at(age=(18(7)90))  noci saving(x2, replace)
qui combomarginsplot x1 x2 , noci plotdim(_filenumber) label("Linear" "Quadratic") name(fixed, replace)
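For reference, a hedged sketch of what I think should happen if the noci options are dropped, so that margins saves the confidence intervals for combomarginsplot to draw:
Code:
* sketch: same as above but without noci
reg csh_sh age income
qui margins, at(age=(18(7)90)) saving(x1, replace)
reg csh_sh c.age##c.age income
qui margins, at(age=(18(7)90)) saving(x2, replace)
combomarginsplot x1 x2, plotdim(_filenumber) label("Linear" "Quadratic") name(fixed, replace)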
Any help really appreciated. Thanks!

Graphing problem

I collapsed my data for it to include count, mean and a dummy variable for the category they belong to.

My aim was to create a bar graph similar to what:
Code:
  graph bar mean, over(dummy)
would return, but with the number of observations on top. Hence I used a twoway graph with the code:
Code:
 twoway (bar mean dummy) (scatter mean dummy, m(i) mlabel(count) mlabposition(12))
However, the result from the twoway graph has numbers on the x-axis and not the categorical labels. Is there any way I can add labels to it?
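A hedged sketch of what I am after (assuming dummy takes the values 0 and 1 and carries value labels):
Code:
* sketch: show the value labels of dummy on the x axis
twoway (bar mean dummy) ///
       (scatter mean dummy, m(i) mlabel(count) mlabposition(12)), ///
       xlabel(0 1, valuelabel) legend(off)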

Difference-in-Difference analysis after PSM

Dear Statalist,
I need your help.
I am doing work that involves assessing the impact of issuing a specific bond on governance variables (e.g. independent directors, size of the board of directors, etc.).

First of all, I performed the matching with the psmatch2 command, to find the three companies most similar to the issuer of this bond but which are not themselves issuers.

Now I have to do a DID analysis. To do this I encountered two problems:
1. create a specific ID for matched pairs
2. create a PRE_POST vector for treated group and control group

My difficulty is due to the use of panel data and the nature of the data, in which a company may have issued a bond in several different years.
It follows that a control-group company can be matched both with a company treated in 2014 and with another company treated in 2018.

In this case, I struggle to create a specific ID for each matched pair and to create the time vector (PRE_POST).

I thought of duplicating the companies that are paired more than once, so that each pairing has its own ID and a PRE_POST consistent with its treated company. This would enlarge the control group; would that be a problem?
Would it be statistically correct?
Do you have any other ideas?



The code to calculate the propensity score and make the match:
Code:
 
xtlogit Treatment logattivo ROE ROA SIC, fe  /* with a logit function I calculate the propensity score. My covariates are TotalAssets, ROA, ROE, industry */
predict pscore
gen pscore1 = YEAR*10 + SIC1*100 + pscore
psmatch2 TRATTAMENTO2, pscore(pscore1) logit
If duplicating the data is valid, I have three methods, read on this forum, for doing the DID analysis. Which of these is more correct for estimating the impact of a bond issue (considering before and after the issue) on a y (ESG score), compared with the respective control sample? What are their differences?

After using the -xtset ID- command:
1. I do a regression on panel data:
Code:
xtreg ESG.SCORE  i.Treatment##i.PRE_POST, fe    vce(cluster ID)
2. I use the -mixed- command:

Code:
gen interaction  = 1.TRATTAMENTO#1.PRE_POST

foreach v of varlist Treatment PRE_POST interaction {
    by ID, sort: egen b_`v' = mean(`v')
    gen w_`v' = `v' - b_`v'
}

mixed ESG.SCORE  w_* b_* || ID: || issuer:
3. I use the -diff- command:
Code:
diff ESG.SCORE, t(Treatment) p(PRE_POST) id(ID)
My data:


----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte ID int(issuer YEAR) byte(Treatment PRE_POST) double ESG.SCORE
1  56 2016 0 1     .
1   2 2015 1 1  9.18
1 151 2016 0 1     .
1 128 2014 0 1     .
1 128 2015 0 1     .
1 151 2019 0 1     .
1  56 2013 0 0     .
1  56 2017 0 1     .
1  56 2014 0 1     .
1 128 2012 0 0     .
1   2 2019 1 1     .
1  56 2019 0 1     .
1 151 2014 0 1     .
1 128 2010 0 0     .
1 128 2013 0 0     .
1 128 2016 0 1     .
1 128 2018 0 1     .
1 128 2017 0 1     .
1   2 2017 1 1 76.42
1   2 2010 1 0    25
1 151 2018 0 1     .
1  56 2010 0 0     .
1 128 2011 0 0     .
1  56 2018 0 1 39.36
1   2 2018 1 1 75.69
1 151 2011 0 0     .
1 128 2019 0 1     .
1  56 2011 0 0     .
1 151 2010 0 0     .
1  56 2015 0 1     .
1   2 2013 1 0 38.04
1  56 2012 0 0     .
1   2 2014 1 1 26.04
1   2 2012 1 0 32.22
1 151 2017 0 1     .
1   2 2016 1 1 17.35
1 151 2013 0 0     .
1 151 2012 0 0     .
1 151 2015 0 1     .
1   2 2011 1 0  26.6
2 181 2016 0 0     .
2 181 2010 0 0     .
2 181 2014 0 0     .
2 133 2013 0 0     .
2 181 2017 0 0     .
2  36 2019 1 1     .
2  36 2017 1 0     .
2 148 2015 0 0     .
2 133 2014 0 0     .
2 148 2018 0 0     .
2  36 2018 1 0 27.96
2 181 2019 0 1     .
2 181 2012 0 0     .
2 148 2012 0 0     .
2 181 2011 0 0     .
2  36 2015 1 0     .
2 133 2011 0 0     .
2 133 2016 0 0     .
2  36 2011 1 0     .
2  36 2013 1 0     .
2  36 2010 1 0     .
2 133 2017 0 0     .
2 133 2010 0 0     .
2 148 2013 0 0     .
2 148 2010 0 0     .
2 181 2018 0 0     .
2 181 2015 0 0     .
2 133 2015 0 0     .
2 133 2019 0 1     .
2  36 2016 1 0     .
2  36 2014 1 0     .
2 181 2013 0 0     .
2 148 2014 0 0     .
2 148 2016 0 0     .
2 148 2019 0 1     .
2  36 2012 1 0     .
2 148 2011 0 0     .
2 133 2018 0 0 52.97
2 133 2012 0 0     .
2 148 2017 0 0     .
3  44 2013 1 0     .
3 118 2014 0 0     .
3 122 2013 0 0 40.36
3 122 2015 0 0    50
3 122 2016 0 0 82.45
3 210 2014 0 0 83.68
3 210 2011 0 0 79.03
3 210 2016 0 0 85.05
3  44 2016 1 0     .
3  44 2017 1 1     .
3 118 2013 0 0     .
3  44 2011 1 0     .
3 122 2011 0 0 22.94
3  44 2012 1 0     .
3  44 2015 1 0     .
3 122 2012 0 0 26.79
3 210 2013 0 0 89.78
3 210 2018 0 1 92.95
3 210 2012 0 0 83.33
3  44 2018 1 1 10.19
end
------------------ copy up to and including the previous line ------------------

Sorry for the many questions, but I'm a beginner

Combining duplicate names into one

Hello everyone,

I have a dataset containing data about Board Members and it is formatted like this:
Name             Start Date   End Date     Role Name      Ticker
Mr. Jan Appels   21-03-2010   31-12-2010   Board member   AED
Mr. Jan Appels   01-01-2011   20-11-2018   CEO            AED
My question is: Is there a way to combine these two observations into one which contains the first start date and the latest end date based on Name and Ticker, where Role Name could be dropped as it is not a useful variable for my research. So the data would then look like:
Name             Start Date   End Date     Ticker
Mr. Jan Appels   21-03-2010   20-11-2018   AED
I've tried several things but without success, hopefully, someone can help me.
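For illustration, a hedged sketch of the kind of collapse I have in mind (variable names are placeholders and the dates are assumed to be numeric Stata dates):
Code:
* sketch: earliest start and latest end per Name-Ticker pair; Role Name is dropped
collapse (min) start_date (max) end_date, by(name ticker)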
Thanks in advance.

Regards,
Roel

Error calculating margins after melogit (could not calculate numerical derivatives -- discontinuous region with missing values encountered)

Hello all,

I am having trouble calculating margins after running a melogit with 3 levels. The command for the melogit is formulated as:

melogit y x z || level2var: || level3var:

I then try to calculate margins at specific levels of two variables in the FE equation and am presented with the following error message:

could not calculate numerical derivatives -- discontinuous region with missing values
encountered

r(459);

There is no problem obtaining margins when the melogit was calculated with only one level variable, regardless of which level variable I use.

Does anyone have any advice about this issue and how to move past it?

Counting number of people by gender

Hi all,

I am currently using Stata Version 14. In the sample data attached below, I have information on the gender, rank, area of residence and also the year of residence for individuals. Data is also sorted by the rank i.e. rank 1 being the top rank, followed by rank 2 and so on.

In the data attached, if variable male=0, then the individual is a female, and if male=1, then the person is a male. What I would like to do is identify the number of males and females above their rank order (i.e. rank 1 is above/better than rank 2 and so on), but only for females. For example, in area B, the person of rank 2 is a female. The number of males above this individual's rank is 1 (as the person of rank 1 is a male) and the number of females above their rank is zero. Likewise, for area B again, the person of rank 5 is a female. The number of males above their rank order is 3 and the number of females above their rank is 1.

Any suggestions on how to execute this using code would be helpful.

Thanks.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str29 area float(year rank male males_above females_above female)
"A" 1978 1 1 . . 0
"A" 1978 2 1 . . 0
"A" 1978 3 0 2 0 1
"A" 1978 4 1 . . 0
"A" 1978 5 0 3 1 1
"B" 1983 1 1 . . 0
"B" 1983 2 0 1 0 1
"B" 1983 3 1 . . 0
"B" 1983 4 1 . . 0
"B" 1983 5 0 3 1 1
end
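A hedged sketch of what I have tried to describe (it seems to reproduce the males_above and females_above columns in the example, computed within area and year):
Code:
* sketch: running counts of males and females with a better (lower) rank,
* reported only for females
bysort area year (rank): gen m_above = sum(male) - male
bysort area year (rank): gen f_above = sum(female) - female
replace m_above = . if female != 1
replace f_above = . if female != 1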

putexcel

Good day to everybody,

just a couple of quick questions concerning the putexcel command in Stata 16:

1. I'd like to see the results of the xtivreg2 command in the Excel file I generated. After using the putexcel command, the message returned after running xtivreg2 is 'information for the estimation table not found; Please replay estimation results and try again'. Is this because I am using a non-official Stata command?

2. I already generated a log file; is there a way to translate this into an Excel file without having to re-run the .do file?

Thanks in advance,
Kind Regards,
Federico

sem - visualizing interaction between latent variables

Dear all,

I would like to visualize a latent variable interaction after running sem in Stata 14. To interpret the interaction effect, I would like to plot the effects of the latent predictors on the latent dependent variable for different levels of the moderators (i.e., at 1 SD below and above the mean of the respective moderator). I first thought about using margins in combination with xblatent to get the values for drawing the interaction plot. However, I learnt that using margins in combination with xblatent is not possible.

More detailed illustration: Following Marsh et al. 2004 (please find the reference below), the indicators of the latent interaction term were calculated by multiplying pairs of the (mean-centered) indicators involved in the interaction before running sem. The following code illustrates my model:

Code:
sem (A -> a1 a2 a3 a4) ///
(B -> b1 b2 b3 b4)  ///
(C -> c1 c2 c3 c4) /// mean-centered indicator variables
(D -> d1 d2 d3 d4) /// mean-centered indicator variables
(Interaction_C*D -> ia1 ia2 ia3 ia4) /// The indicators ia1-4 were calculated by multiplying matched pairs of the indicators of C & D before running sem (Marsh et al. 2004)
(E -> f1 f2 f3 f4) ///
(F -> g1 g2 g3 g4) ///
(G -> h1 h2 h3 h4) ///
(C <- A B) ///
(E <- A B C D Interaction_C*D) ///
(F <- A B C D) ///
(G <- E F), cov(e.E*e.F) vce(sbentler) standardized
Referring to the code above, what I would like to do is to visualize the effect of the latent variable interaction on E. Is there a way to get the values for the effects of the latent predictors (C & D) on the latent dependent variable (E) when the moderators are at 1 SD below or above their means?

Reference:
MARSH, Herbert W., WEN, Zhonglin, et HAU, Kit-Tai. Structural equation models of latent interactions: evaluation of alternative estimation strategies and indicator construction. Psychological methods, 2004, vol. 9, no 3, p. 275.

merging data files to create a large panel dataset by VDS_ID and SUR_MON_YR

Here is the data below:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 VDS_ID str5 SUR_MON_YR str1 GIFT_REMITTANCES_SOURCE float(GIFT_AMT_GIV GIFT_AMT_REC) str26(LOAN_INSTITUTIONAL LOAN_NON_INSTITUETIONAL) double(LOAN_REPAID LOAN_REC) str24 LOAN_PURPOSE double LOAN_INT
"IAP10A0008" "05/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0008" "07/10" "." . . "Finance companies"    "Shopkeeper"         2400     . ""                 .
"IAP10A0008" "08/10" "." . . "Finance companies"    "Friends/Relatives"  2900 10000 "LOAN REPAYMENT"   .
"IAP10A0008" "08/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0008" "09/10" "." . . "Finance companies"    "Input dealer"       3300     . ""                 .
"IAP10A0008" "10/10" "." . . "Finance companies"    "Friends/Relatives"  1600     . ""                 .
"IAP10A0008" "12/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0010" "07/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0010" "09/10" "." . . "Self help group(SHG)" "Input dealer"          .  6000 "LOAN REPAYMENT"  12
"IAP10A0010" "09/10" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0030" "08/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0034" "01/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0034" "02/11" "." . . "Self help group(SHG)" "Money lender"          . 10000 "LOAN REPAYMENTS"  3
"IAP10A0034" "02/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0034" "05/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0034" "06/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0034" "09/10" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0034" "09/10" "." . . "Finance companies"    "Outsiders"             . 10000 "LOAN REPAYMENT"  18
"IAP10A0034" "10/10" "." . . "Finance companies"    "Shopkeeper"          900     . ""                 .
"IAP10A0034" "10/10" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0034" "11/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0034" "11/10" "." . . "Finance companies"    "Outsiders"           225     . ""                 .
"IAP10A0038" "11/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0043" "07/10" "." . . "Self help group(SHG)" "Friends/Relatives"     . 10000 "AGRICULTURE"     12
"IAP10A0043" "08/10" "." . . "Self help group(SHG)" "Input dealer"          .  6000 "AGRICULTURE"     12
"IAP10A0043" "09/10" "." . . "Self help group(SHG)" "Money lender"          .  6000 "DOMESTIC"        12
"IAP10A0043" "11/10" "." . . ""                     "Input dealer"          .     . ""                 .
"IAP10A0044" "01/11" "." . . "Finance companies"    "Friends/Relatives"  3760     . ""                 .
"IAP10A0044" "02/11" "." . . "Finance companies"    "Dairy"              3760     . ""                 .
"IAP10A0044" "03/11" "." . . "Finance companies"    "Friends/Relatives"  4700     . ""                 .
"IAP10A0044" "04/11" "." . . "Finance companies"    "Friends/Relatives"  4700     . ""                 .
"IAP10A0044" "05/11" "." . . "Finance companies"    "Shopkeeper"         4700     . ""                 .
"IAP10A0044" "05/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0044" "06/11" "." . . "Finance companies"    "Shopkeeper"         4700     . "DOMESTIC"         .
"IAP10A0044" "07/10" "." . . "Self help group(SHG)" "Friends/Relatives"     . 10000 "LIVESTOCK"       12
"IAP10A0044" "07/10" "." . . "Finance companies"    "Outsiders"             . 40000 "LOAN REPAYMENT"  18
"IAP10A0044" "08/10" "." . . "Finance companies"    "Input dealer"       3740     . ""                 .
"IAP10A0044" "09/10" "." . . "Self help group(SHG)" "Friends/Relatives"     . 12000 "AGRICULTURE"     12
"IAP10A0044" "09/10" "." . . "Finance companies"    "Shopkeeper"         3740     . ""                 .
"IAP10A0044" "09/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0044" "10/10" "." . . "Finance companies"    "Friends/Relatives"  1870     . ""                 .
"IAP10A0044" "11/10" "." . . "Finance companies"    "Friends/Relatives"  3760     . ""                 .
"IAP10A0044" "12/10" "." . . "Finance companies"    "Outsiders"          3760     . ""                 .
"IAP10A0046" "01/11" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0046" "03/11" "." . . "Self help group(SHG)" "Outsiders"             .  5000 "AGRICULTURE"     24
"IAP10A0046" "06/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0046" "07/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0046" "09/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0046" "10/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0048" "01/11" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0048" "02/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0048" "07/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0053" "03/11" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0053" "03/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0053" "05/11" "." . . "Grameena Banks"       "Friends/Relatives" 33000 25000 "AGRICULTURE"     12
"IAP10A0053" "06/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0053" "06/11" "." . . "Grameena Banks"       "Friends/Relatives" 62000 60000 "AGRICULTURE"     12
"IAP10A0053" "08/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0053" "09/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0055" "04/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0055" "07/10" "." . . ""                     "Input dealer"          .     . ""                 .
"IAP10A0055" "07/10" "." . . "Grameena Banks"       "Shopkeeper"        35500 50000 "AGRICULTURE"     12
"IAP10A0055" "08/10" "." . . ""                     "Input dealer"          .     . ""                 .
"IAP10A0081" "06/11" "." . . "Self help group(SHG)" "Dairy"                 . 12000 "AGRICULTURE"     12
"IAP10A0081" "10/10" "." . . ""                     "Input dealer"          .     . ""                 .
"IAP10A0082" "01/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0082" "04/11" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0082" "06/11" "." . . "Self help group(SHG)" "Friends/Relatives"     . 13000 "AGRICULTURE"     12
"IAP10A0082" "10/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0082" "12/10" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0202" "02/11" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0202" "02/11" "." . . "Self help group(SHG)" "Friends/Relatives"     . 10000 "MARRIAGE"         3
"IAP10A0202" "02/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0202" "05/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0202" "07/10" "." . . "Finance companies"    "Friends/Relatives"  2160     . ""                 .
"IAP10A0202" "08/10" "." . . "Finance companies"    "Friends/Relatives"  2920  8000 "AGRICULTURE"     18
"IAP10A0202" "08/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0202" "09/10" "." . . "Finance companies"    "Friends/Relatives"  2920     . ""                 .
"IAP10A0202" "09/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0202" "10/10" "." . . "Finance companies"    "Friends/Relatives"  1460     . ""                 .
"IAP10A0202" "10/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0202" "12/10" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0203" "01/11" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0203" "04/11" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0203" "05/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0203" "07/10" "." . . "Finance companies"    "Friends/Relatives"   800     . ""                 .
"IAP10A0203" "08/10" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0203" "08/10" "." . . "Finance companies"    "Friends/Relatives"  1500 10000 "LOAN REPAYMENT"  18
"IAP10A0203" "09/10" "." . . ""                     "Input dealer"          .     . ""                 .
"IAP10A0203" "09/10" "." . . "Finance companies"    "Friends/Relatives"  2240     . ""                 .
"IAP10A0203" "10/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0203" "10/10" "." . . "Finance companies"    "Friends/Relatives"  1250     . ""                 .
"IAP10A0208" "02/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0208" "06/11" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0208" "07/10" "." . . ""                     "Friends/Relatives"     .     . ""                 .
"IAP10A0208" "08/10" "." . . "Self help group(SHG)" "Friends/Relatives"     . 15000 "AGRICULTURE"     12
"IAP10A0209" "05/11" "." . . ""                     "Money lender"          .     . ""                 .
"IAP10A0209" "06/11" "." . . "Cooperatives"         "Shopkeeper"         1900     . "DOMESTIC"         .
"IAP10A0209" "07/10" "." . . "Cooperatives"         "Dairy"             31200 30000 "AGRICULTURE"     12
"IAP10A0209" "07/10" "." . . ""                     "Money lender"          .     . ""                 .
end
The other file is:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 VDS_ID str5 SUR_MON_YR str40 GIFT_REMITTANCES_SOURCE long(GIFT_AMT_GIV GIFT_AMT_REC)
"IAP10A0008" "02/11" "Friends/Relatives"                        50     .
"IAP10A0008" "04/11" "Friends/Relatives"                       200     .
"IAP10A0008" "05/11" "Friends/Relatives"                      2000     .
"IAP10A0008" "07/10" "Friends/Relatives"                       300     .
"IAP10A0008" "09/10" "Friends/Relatives"                       500     .
"IAP10A0008" "12/10" "Friends/Relatives"                     20000     .
"IAP10A0009" "01/11" "Institutions"                              .   600
"IAP10A0009" "02/11" "Friends/Relatives"                         .   400
"IAP10A0009" "03/11" "Friends/Relatives"                         .   600
"IAP10A0009" "04/11" "Friends/Relatives"                         .   700
"IAP10A0009" "05/11" "Friends/Relatives"                         .   800
"IAP10A0009" "06/11" "Friends/Relatives"                         .   400
"IAP10A0009" "12/10" "Friends/Relatives"                         .   800
"IAP10A0010" "01/11" "Friends/Relatives"                       400     .
"IAP10A0010" "02/11" "Friends/Relatives"                       320     .
"IAP10A0010" "03/11" "Friends/Relatives"                       320     .
"IAP10A0010" "04/11" "Friends/Relatives"                       330     .
"IAP10A0010" "05/11" "Friends/Relatives"                       500     .
"IAP10A0010" "06/11" "Friends/Relatives"                        50     .
"IAP10A0010" "07/10" "Friends/Relatives"                         .  2000
"IAP10A0010" "09/10" "Friends/Relatives"                       200     .
"IAP10A0010" "10/10" "Friends/Relatives"                       400     .
"IAP10A0010" "11/10" "Friends/Relatives"                       500     .
"IAP10A0030" "02/11" "Friends/Relatives"                       400     .
"IAP10A0030" "04/11" "Friends/Relatives"                       100     .
"IAP10A0030" "05/11" "Friends/Relatives"                       400     .
"IAP10A0030" "06/11" "Friends/Relatives"                     12500     .
"IAP10A0030" "10/10" "Friends/Relatives"                      1500     .
"IAP10A0030" "11/10" "Friends/Relatives"                       202     .
"IAP10A0034" "01/11" "Friends/Relatives"                      3000     .
"IAP10A0034" "02/11" "Friends/Relatives"                       350     .
"IAP10A0034" "05/11" "Friends/Relatives"                      4900     .
"IAP10A0034" "06/11" "Friends/Relatives-DAUGHTER OPERATION"  70000     .
"IAP10A0034" "09/10" "Friends/Relatives"                       150     .
"IAP10A0034" "11/10" "Friends/Relatives"                      1000     .
"IAP10A0035" "03/11" "Friends/Relatives"                      2000     .
"IAP10A0035" "04/11" "Friends/Relatives"                      3000     .
"IAP10A0035" "05/11" "Friends/Relatives"                      8000     .
"IAP10A0035" "07/10" "Friends/Relatives"                        50     .
"IAP10A0035" "08/10" "Friends/Relatives"                       300   800
"IAP10A0035" "11/10" "Friends/Relatives"                       800     .
"IAP10A0036" "09/10" "Friends/Relatives"                         .  2000
"IAP10A0038" "01/11" "Friends/Relatives"                         .   900
"IAP10A0038" "02/11" "Friends/Relatives"                       200  1000
"IAP10A0038" "03/11" "Friends/Relatives"                       100     .
"IAP10A0038" "04/11" "Friends/Relatives"                       100   500
"IAP10A0038" "05/11" "Friends/Relatives"                       415     .
"IAP10A0038" "06/11" "Friends/Relatives"                       100     .
"IAP10A0038" "07/10" "Friends/Relatives"                       200     .
"IAP10A0038" "08/10" "Friends/Relatives"                       100     .
"IAP10A0038" "11/10" "Friends/Relatives"                      1600     .
"IAP10A0039" "01/11" "Friends/Relatives"                       200     .
"IAP10A0039" "02/11" "Friends/Relatives"                       200     .
"IAP10A0039" "05/11" "Friends/Relatives"                      5000     .
"IAP10A0039" "07/10" "Friends/Relatives"                       200     .
"IAP10A0043" "02/11" "Dowry(Cash & kind)"                   411000     .
"IAP10A0043" "02/11" "Friends/Relatives"                         . 70000
"IAP10A0043" "03/11" "Friends/Relatives"                       100     .
"IAP10A0043" "04/11" "Friends/Relatives"                       300     .
"IAP10A0043" "05/11" "Friends/Relatives"                     15000     .
"IAP10A0043" "06/11" "Friends/Relatives"                       200     .
"IAP10A0043" "08/10" "Friends/Relatives"                      2100     .
"IAP10A0044" "02/11" "Friends/Relatives"                     14500     .
"IAP10A0044" "04/11" "Friends/Relatives"                       400     .
"IAP10A0044" "05/11" "Friends/Relatives"                      2600     .
"IAP10A0044" "07/10" "Friends/Relatives"                       500     .
"IAP10A0044" "08/10" "Friends/Relatives"                       100     .
"IAP10A0044" "09/10" "Institutions"                              .  1500
"IAP10A0046" "02/11" "Friends/Relatives"                       200     .
"IAP10A0046" "05/11" "Friends/Relatives"                       500     .
"IAP10A0046" "07/10" "Friends/Relatives"                       300     .
"IAP10A0048" "02/11" "Friends/Relatives"                      3900     .
"IAP10A0048" "04/11" "Friends/Relatives"                       116     .
"IAP10A0048" "05/11" "Friends/Relatives"                       102     .
"IAP10A0048" "06/11" "Friends/Relatives"                       500     .
"IAP10A0053" "01/11" "Friends/Relatives"                       300     .
"IAP10A0053" "02/11" "Friends/Relatives"                      1200     .
"IAP10A0053" "04/11" "Friends/Relatives"                      1000     .
"IAP10A0053" "05/11" "Friends/Relatives"                       500     .
"IAP10A0053" "06/11" "Friends/Relatives"                       200     .
"IAP10A0053" "08/10" "Friends/Relatives"                       500     .
"IAP10A0053" "09/10" "Friends/Relatives"                       200     .
"IAP10A0055" "02/11" "Friends/Relatives"                       200     .
"IAP10A0055" "03/11" "Friends/Relatives"                       300     .
"IAP10A0055" "04/11" "Friends/Relatives"                      2500     .
"IAP10A0055" "05/11" "Friends/Relatives"                      1500     .
"IAP10A0055" "06/11" "Friends/Relatives"                       400     .
"IAP10A0055" "07/10" "Friends/Relatives"                       300     .
"IAP10A0055" "08/10" "Friends/Relatives"                       200     .
"IAP10A0055" "10/10" "Friends/Relatives"                       200     .
"IAP10A0055" "11/10" "Friends/Relatives"                      2500     .
"IAP10A0058" "04/11" "Friends/Relatives"                       250     .
"IAP10A0058" "05/11" "Friends/Relatives"                       150     .
"IAP10A0058" "06/11" "Friends/Relatives"                       100     .
"IAP10A0058" "07/10" "Friends/Relatives"                      4000     .
"IAP10A0058" "08/10" "Friends/Relatives"                       400     .
"IAP10A0058" "12/10" "Friends/Relatives"                       300     .
"IAP10A0081" "01/11" "Friends/Relatives"                       100     .
"IAP10A0081" "04/11" "Friends/Relatives"                       500     .
"IAP10A0081" "05/11" "Friends/Relatives"                      1500     .
end

I want to merge the two files by VDS_ID and SUR_MON_YR.
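A hedged sketch of one way to combine them (gifts.dta and loans.dta are placeholder file names; because VDS_ID and SUR_MON_YR do not uniquely identify rows in either example, a plain 1:1 merge would fail, and joinby forms all pairwise matches instead, which may or may not be what I want):
Code:
* sketch only; file names are placeholders
use loans, clear
joinby VDS_ID SUR_MON_YR using gifts, unmatched(both)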

Repeated-measures ANOVA - how to interpret contradictory results between ANOVA and multiple pairwise comparisons (post hoc)?

Good morning to everybody.

How should I interpret contradictory results between ANOVA and multiple pairwise comparisons (post hoc)? And how should I write this up in a scientific article?

Thanks a million to everybody

xtabond2 vs xtreg / xtregar

Dear Stata users,

I have always assumed that in the presence of serial autocorrelation and assumptions of endogeneity, a correctly specified GMM model should lead to "poorer" results (in the sense of less significance) than standard xtreg/xtregar because the autocorrelation and/or endogeneity will lead to false significance in the latter (overinflated coefficients, smaller errors). Yet every now and then I see someone report poor xtreg/xtregar results but then five-star GMM results. This seems weird to me. Am I alone? Because GMM is so sensitive to assumptions and instruments and lags etc., my knee-jerk response is to think that in such cases the GMM must be flawed.

So my basic question is: could you imagine xtreg output in which not much is going on in terms of significance but then great GMM results based on the same data, and that the latter results would be "true" and not spurious? Thanks in advance for any replies.

Time Series Graph Percentage Deviations from Trend

Hello,

I am currently working on time series analysis, trying to identify the effect of an increase in income on consumption.
Therefore I have quarterly data from 1990 to 2019.
To get a better overview of the development I want to graph the variables consumption and income as percentage deviations from trend over time.
However I only get this outcome:

I use the command:

twoway (tsline d.y) (tsline d.c)

Does anybody know how I need to change the code in order to get percentage deviations from trend?
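For reference, a hedged sketch of one common approach I am considering (assuming the data are tsset quarterly and the variables are y and c as above): HP-filter the logged series, since the cyclical component of a log series is approximately the percentage deviation from trend.
Code:
* sketch only
gen lny = ln(y)
gen lnc = ln(c)
tsfilter hp y_cyc = lny, smooth(1600)
tsfilter hp c_cyc = lnc, smooth(1600)
twoway (tsline y_cyc) (tsline c_cyc)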

Thank you very much in advance!




Repeated-measures ANOVA - how to interpret contradictory results between ANOVA and multiple pairwise comparisons (post hoc)?

Good morning to everybody.

1) How should I interpret contradictory results between ANOVA and multiple pairwise comparisons (post hoc)? I ran a model and found a non-significant ANOVA with significant multiple pairwise comparisons. How should I write this up in an article? 2) Another question: how can I report the results for Outcome, Timepoint, and the interaction between Outcome and Timepoint? (See the output below, table 1.)

table 1

Source              Partial SS    df   MS           F       Prob>F

Model               .494415648    19   .026021876    6.40   0.000000

Outcome             .063395043     3   .021131681    5.20   0.0016
Timepoint           .294629916     4   .073657479   18.11   0.0000
Outcome#timepoint   .37227184     12   .031022653    7.63   0.0000

Residual            1.27713207   314   .0040673


thanks to everybody


Xtreg with Demeaned variables

I have a panel dataset and my dependent variable and independent variables are demeaned and standardized. Assuming that I do not need to include country fixed effects since the variables are already demeaned, does the command below automatically fit a random-effects model?

xtreg depvar inde.var i.year, vce(robust)

Thank you.

Wednesday, April 29, 2020

Heteroskedasticity or other problem with regression?

Hello everyone. I ran a pooled OLS regression on panel data of log real monthly wage on education, age, sex, marital status, job sector, and more, and got a pretty high R-squared of 0.45, an F-value of 43.19, and a significant p-value. However, with "hettest", heteroskedasticity was detected. I plotted the residuals against the fitted values (residuals-vs-fitted plot attached).

Since a lot of values are far from zero, what does this mean? Should I just reconsider the regression predictors, or is there something wrong with the data? It does have a lot of missing values, but those get eliminated. Sorry, but I'm pretty new.
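For reference, a hedged sketch of the kind of specification I am using, with cluster-robust standard errors as one common response to heteroskedasticity in a pooled OLS wage regression on panel data (ln_wage and hhid are placeholders for my actual variable and cluster names):
Code:
* sketch only; ln_wage and hhid are placeholders
reg ln_wage i.education age i.sex i.maritalstatus i.mainjobcode i.Year, vce(cluster hhid)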
Here is a clip of the data:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int age byte(education sex region maritalstatus urban) long mainjobcode float Year
33 16 2 2 1 0 6 2009
18  . 2 4 5 1 . 2009
34  . 2 4 5 1 . 2009
19  . 1 1 4 0 . 2009
28  . 2 2 1 0 6 2009
19  . 1 4 5 1 . 2009
32 14 2 4 1 0 6 2009
14  . 1 4 5 0 6 2009
17  . 1 4 5 0 6 2009
28 61 1 2 5 1 5 2009
.  . 2 2 . 1 . 2009
.  . 1 4 . 0 . 2009
30 61 1 1 5 1 . 2009
26  . 1 3 5 1 . 2009
14  . 2 2 5 0 . 2009
28  . 1 1 3 0 7 2009
50  . 1 2 1 1 9 2009
42  . 2 1 1 1 . 2009
.  . 2 1 . 1 . 2009
33 51 1 3 1 1 3 2009
30 31 1 1 1 0 9 2009
29 41 1 4 5 1 7 2009
32 14 2 2 1 0 6 2009
16  . 1 2 5 0 6 2009
14  . 2 2 5 0 . 2009
29  . 1 4 5 0 9 2009
23 15 1 3 5 0 6 2009
20 14 1 1 5 1 . 2009
30 31 2 1 1 1 6 2009
21 17 2 4 3 0 9 2009
.  . 1 3 . 0 . 2009
59 12 1 3 1 0 9 2009
21  . 2 1 4 0 . 2009
.  . 2 2 . 0 . 2009
40  . 2 1 1 0 6 2009
24  . 2 2 1 0 . 2009
50  . 2 1 4 0 5 2009
18  . 1 2 5 1 . 2009
.  . 1 4 . 0 . 2009
25  . 2 2 5 1 . 2009
.  . 2 2 . 0 . 2009
21 11 2 4 1 0 6 2009
.  . 2 2 . 0 . 2009
46 17 2 3 4 0 7 2009
28  . 1 2 5 1 5 2009
47 17 1 3 1 0 9 2009
21 13 2 1 3 1 . 2009
21  . 2 3 5 0 . 2009
14  . 2 2 5 0 6 2009
17  . 2 4 5 0 6 2009
43 12 2 2 1 0 6 2009
23 16 2 3 1 0 5 2009
58 15 2 1 3 0 6 2009
23  . 1 1 5 0 . 2009
28 17 2 2 1 0 6 2009
48  . 1 4 1 0 6 2009
41  . 2 2 1 0 6 2009
43  . 2 3 1 1 . 2009
36 17 1 4 1 0 6 2009
15  . 2 3 5 0 6 2009
15  . 2 1 5 0 . 2009
38  . 2 2 1 0 6 2009
15  . 1 1 5 0 . 2009
56 17 1 1 1 0 6 2009
22  . 1 3 5 1 . 2009
63 11 1 1 2 0 6 2009
44  . 2 4 4 0 6 2009
18 15 2 2 1 0 6 2009
27 17 1 4 1 0 6 2009
18  . 1 3 5 0 6 2009
46  . 1 2 1 0 6 2009
52 15 1 4 4 0 6 2009
16  . 1 2 5 0 . 2009
26  . 2 2 1 0 . 2009
.  . 2 2 . 0 . 2009
61  . 2 1 4 0 6 2009
15  . 1 1 5 1 . 2009
47 33 1 2 2 0 . 2009
35 13 2 3 2 0 6 2009
25  . 2 1 1 1 6 2009
16 14 2 1 5 0 6 2009
21  . 1 3 5 0 6 2009
21 17 1 1 5 0 . 2009
.  . 2 1 . 1 . 2009
45 14 1 3 1 1 5 2009
31  . 2 3 2 0 6 2009
14  . 1 1 5 0 6 2009
14  . 1 3 5 1 . 2009
15  . 1 4 5 0 6 2009
38 34 2 3 1 0 6 2009
17  . 2 4 5 1 . 2009
18 14 2 4 5 0 6 2009
29 41 1 4 1 0 9 2009
56 14 2 4 1 0 6 2009
41  . 2 4 1 0 6 2009
30 16 1 1 1 1 8 2009
61 33 2 1 1 0 6 2009
63 16 1 2 1 0 6 2009
16  . 1 2 5 0 6 2009
17  . 1 3 5 1 . 2009
end
label values education h4q7
label def h4q7 11 "Completed P.1", modify
label def h4q7 12 "Completed P.2", modify
label def h4q7 13 "Completed P.3", modify
label def h4q7 14 "Completed P.4", modify
label def h4q7 15 "Completed P.5", modify
label def h4q7 16 "Completed P.6", modify
label def h4q7 17 "Completed P.7", modify
label def h4q7 31 "Completed S.1", modify
label def h4q7 33 "Completed S.3", modify
label def h4q7 34 "Completed S.4", modify
label def h4q7 41 "Completed Post primary Specialized training or Certificate", modify
label def h4q7 51 "Completed Post secondary Specialized training or diploma", modify
label def h4q7 61 "Completed Degree and above", modify
label values sex h2q3
label def h2q3 1 "Male", modify
label def h2q3 2 "Female", modify
label values region region
label def region 1 "Central", modify
label def region 2 "Eastern", modify
label def region 3 "Northern", modify
label def region 4 "Western", modify
label values maritalstatus h2q10
label def h2q10 1 "Married monogamously", modify
label def h2q10 2 "Married polygamously", modify
label def h2q10 3 "Divorced / Separated", modify
label def h2q10 4 "Widow/Widower", modify
label def h2q10 5 "Never married", modify
label values urban urban
label def urban 0 "Rural", modify
label def urban 1 "Urban", modify
label values mainjobcode mainjobcode
label def mainjobcode 3 "Technicians and Associate Professionals", modify
label def mainjobcode 5 "Services and Sales Workers", modify
label def mainjobcode 6 "Skilled Agricultural, Forestry and Fishery Workers", modify
label def mainjobcode 7 "Craft and Related Trades Workers", modify
label def mainjobcode 8 "Plant and Machine Operators and Assemblers", modify
label def mainjobcode 9 "Elementary Occupations", modify

Sum of values of a conditional variable

I am new to using Stata; I have always worked with datasets that were already cleaned.
I have the variables "sales" and "economic_sector". I need to create another variable called "MARKET_SHARE", which is the ratio of each observation's sales to total sales within its economic sector.

I was trying to calculate the value of total sales given the condition like this:

egen float TOTAL_1=rowtotal(cond(economic_sector==1,sales,0))
egen float TOTAL_2=rowtotal(cond(economic_sector==2,sales,0))
egen float TOTAL_3=rowtotal(cond(economic_sector==1,sales,0))

gen marketshare = 0
replace marketshare=sales/TOTAL_1 if economic_sector==1
replace marketshare=sales/TOTAL_2 if economic_sector==2
replace marketshare=sales/TOTAL_3 if economic_sector==3

but replace marketshare=sales/TOTAL_1 if economic_sector==1, gives me "factor variables and time-series operators not allowed"

Any help would be very useful.
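
One possible approach (a sketch using the variable names above): -egen, rowtotal()- sums across variables within an observation, so a by-group total over observations may be what is needed here:

Code:
* total sales within each economic sector, then the share
bysort economic_sector: egen double TOTAL = total(sales)
gen double marketshare = sales / TOTAL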

Performing chi squared with one variable against a group of variables individually

I've done a ton of looking around, and I've likely just not got the right keywords, but my goal is to perform a simple chi-squared analysis of one dependent variable against a group of variables (each one independently) all at once. I'm quite new to Stata and do most of my work in R normally, but I am trying to learn a multitude of platforms, as some of my biomedical group members use Stata exclusively. Thank you.
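
One way this is often done (a sketch; the dependent variable outcome and the list var1-var3 are hypothetical names) is to loop over the variables and run -tabulate, chi2- for each one:

Code:
* chi-squared test of outcome against each variable in turn
foreach v of varlist var1 var2 var3 {
    tabulate outcome `v', chi2
}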

re case control matching

Hi everyone,

Can someone please suggest a method or any packages by which I can derive a case-control dataset (from a larger unmatched dataset of cases and controls) matched on variables of interest that include a combination of continuous and categorical variables (e.g. age or sex)?
Is there also an option to alter the matching ratio (e.g. cases:controls of 1:1, 1:2, or 1:3)? I am using Stata 16.

Many thanks,

Tejas
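
As one possible starting point (a sketch only, with hypothetical variable names; other community-contributed packages may fit the problem better), -psmatch2- from SSC matches on a propensity score estimated from a mix of continuous and categorical covariates, and its neighbor() option controls how many controls are matched to each case:

Code:
ssc install psmatch2
* case is a 1/0 case indicator; match each case to 2 controls on a propensity score from age and sex
psmatch2 case age sex, neighbor(2)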

Foreach loop using macro in numlist

Probably a simple syntax issue, but I can't figure this out after reading the documentation on foreach, macros, and numlist. I run the following example and get an 'invalid syntax' error for the second forvalues loop. It seems like the local macro k is being passed to the second forvalues loop as '322 - 10' instead of '312', but I don't know how to resolve this. When k is set to 312 directly instead of to an expression, the code works. But I want k to change depending on the iteration of the first forvalues loop.

For context, I'm trying to get out-of-sample estimates for a variable window of data--i.e. I have two months of data, I want to omit moving, multi-sized windows of data, run the estimation, and then compare the predicted values to the omitted observed values.

Code:
set seed 12345
set obs 100
gen dayofyear = 265 + _n
gen y = 10* exp( -exp( -0.1 * (dayofyear - 315))) + rnormal()
replace y = 0 if dayofyear < 300 | y < 0 
line y dayofyear

local windowstart 310
local windowend 322
local windowlengthmin 10
local windowlengthmax 11

forvalues i=`windowlengthmin'(1)`windowlengthmax' {

    local k `windowend' - `i'
    //local k 312

    forvalues j=`windowstart'(1)`k' {

        preserve
        keep if dayofyear < `j' | dayofyear >= (`i' + `j')
        nl (y = {b1 = 10} * exp( -exp( -{b2 = 0.1} * (dayofyear - {b3 = 315}))))
        estimates store est1
        
        restore
        preserve
        estimates restore est1        
        gen yhat = _b[b1:_cons] * exp( -exp( - _b[b2:_cons]* (dayofyear -  _b[b3:_cons])))
        keep if dayofyear >= `windowstart' & dayofyear < (`i' + `windowstart')
            
        mkmat yhat, mat(Yhat)
        mkmat y, mat(Y)
        matrix E = (Yhat - Y)' * (Yhat - Y)
        
        if `i'==`windowlengthmin' & `j' == `windowstart' matrix Error1 = E
        else matrix Error1 = (Error1 \ E)
        restore
        
    }
}
        
matrix list Error1
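
The likely culprit (a sketch of the fix): local k `windowend' - `i' stores the literal text "322 - 10", whereas forvalues needs an actual number. Adding an equals sign makes Stata evaluate the expression before storing it:

Code:
local k = `windowend' - `i'    // evaluates to 312 instead of storing the text "322 - 10"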

Quintile

Hi, I am new to the Stata forum. I am currently working on my thesis topic, "poverty and consumption inequality". I need help creating quintile groups in Stata 13 for different variables.
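
A minimal sketch, assuming a hypothetical variable named consumption: -xtile- creates quantile groups directly.

Code:
xtile consumption_q5 = consumption, nq(5)    // 1 = lowest quintile, 5 = highest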

Transforming panel data with different time points

Hello, I am using data collected from 36 different hospitals over the course of 7 quarters (2015q2 - 2016q4). The data consist of assessment scores measured from 0-100 percent (one score per assessment tool). I would like to use the final score for each hospital in a regression model and control for the baseline score. I received the data in long format and generated a variable for the quarter that each assessment score was made. The number of scores per hospital vary between 3 and 7, and the assessment dates also begin at different quarters during this time period. In an effort to use only the first and last assessment score, I kept only these for each hospital and then reshaped the data into wide format. However, due to the different starting and ending points, I am not sure how to make the data align so that I can use it in an analysis. Is there a way to "line up" all of the data to help with subsequent analysis?

These are the steps I have taken so far:

by facID, sort: keep if (_n == 1) | (_n == _N)

reshape wide tool_1_maternal tool_2_maternal tool_3_maternal tool_4_maternal tool_5_maternal tool_6_maternal total_maternal, i(facID) j(qdate)




Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 facID byte(tool_1_maternal tool_2_maternal tool_3_maternal tool_4_maternal) int tool_5_maternal byte(tool_6_maternal total_maternal) float qdate
"01"  40 100  67  88 100 63  84 225
"01"  40  67  17  38  50  0  38 223
"01"   0 100  67  88 100 63  76 227
"01"  40 100  80  88  40 75  68 224
"01"   0  50   0  43  25  0  23 222
"01"   0  83  40  86 100 43  71 226
"02"  40  67  20  43  60 29  46 223
"02"   0  33  17  38  17  0  23 222
"02"  40  67  40  75  60 29  61 224
"02"  60 100  80 100 100 63  88 227
"02"  20 100  80  75  75 43  71 226
"02"  80 100  83  86  80 57  86 225
"03"  60  83  67  75  83 63  77 226
"03"   0  50  33  38  33 13  33 223
"03"  40  83  40  63  40 38  57 224
"03"  60  83  80  75 100 63  81 227
"03"  60 100  83  88  83 75  87 225
"04"  80 100  67  50  67 38  59 226
"04"  40  33   0  25  17  0  23 222
"04"   0  17  17  38  17  0  21 221
"04"  80 100  67  75  67 50  74 227
"04"   0  83   0  38  50 25  33 223
"04"   . 100  60  71  33 13  65 225
"04"  80  50  25  25  25 38  40 224
"05"   0  33  17  38  50 13  26 221
"05" 100 100  67  75  60 63  74 224
"05"  20  83  33  50   0 13  38 223
"05"  20  83  33  38  33  0  36 222
"05" 100 100  50 100  83 63  84 226
"05"  80 100  67 100 100 63  87 227
"05"  80 100  33  63  67 63  64 225
"06"  60  67  50  57  33 38  53 227
"06"  20  50  17  25  20 14  24 222
"06"  60  67  25  43  25 38  42 226
"06"  20  50  25  43  25  0  33 225
"06"  20  50  17  25  17  0  23 224
"06"   0  50  20  29  17 13  22 223
"06"   0  17  17  25  50  0  21 221
"07"   0   0   0  13  17 25   5 224
"07"   0  33   0  13   0  0   8 223
"07" 100 100  83 100 100 75  97 227
"07" 100 100  83 100  80 75  95 226
"07"  80 100  67 100  83 75  82 225
"08"  20  50  33  43  40 13  36 224
"08" 100 100  83 100 100 75  97 227
"08"  60  50   0  29  40 13  33 223
"08" 100 100  83  86  75 75  86 226
"08"   0  67   0  13  17 25  15 222
"08" 100 100  80  75  60 71  76 225
"09"  80 100  80 100  80 75  89 227
"09"   0  67   0  43   0  0  22 223
"09"  60 100  80 100  80 50  83 226
"09"   0  50   0  25  33  0  23 224
"09"  60 100  67  71  60 13  72 225
"10"  60  83  50  50  83 63  68 225
"10"   0  67  40  38  50  0  41 224
"10" 100 100  83  75  67 63  82 227
"10"  60  83  50  57  67 38  65 226
"10"   0  67  50  38  67 38  49 223
end


After reshaping, my data look like this for one of the assessment tool scores:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 facID byte(tool_1_maternal221 tool_1_maternal222 tool_1_maternal223 tool_1_maternal224 tool_1_maternal225 tool_1_maternal226 tool_1_maternal227)
"01" .  .   .  .  40   0   .
"02" .  .  40  .  80   .   .
"03" .  .   .  .  60  60   .
"04" .  .   . 80   .  80   .
"05" 0  .   .  .  80   .   .
"06" 0  .   .  .   .   .  60
"07" .  .   .  0  80   .   .
"08" .  .   . 20 100   .   .
"09" .  .   .  .  60   .  80
"10" .  .   0  .  60   .   .
"13" .  .   .  . 100 100   .
"14" .  .   .  .   . 100  80
"15" .  .   .  . 100 100   .
"16" .  .   . 80   .  80   .
"17" .  .   .  .  60   .  60
"18" . 60   .  .   .   . 100
"19" .  . 100  .   .   . 100
"22" .  .   .  .  40   . 100
"23" .  .   . 40  60   .   .
"25" .  .  80  .  80   .   .
"26" .  .  60  .   . 100   .
"27" .  .   .  .  80  80   .
"28" .  .  80  .   .  60   .
"29" .  .   0  .  40   .   .
"30" 0 20   .  .   .   .   .
"31" .  .   .  .   . 100 100
"32" .  .   0  . 100   .   .
"33" .  0   . 80   .   .   .
"34" 0  .   .  .   .  80   .
"35" .  0  60  .   .   .   .
"36" 0  .   .  .  80   .   .
end
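
One possible way to "line up" the first and last assessments regardless of calendar quarter (a sketch, not necessarily the intended approach) is to reshape on a within-facility sequence number instead of on qdate:

Code:
* keep first and last assessment per facility, then index them 1 (baseline) and 2 (final)
bysort facID (qdate): keep if _n == 1 | _n == _N
bysort facID (qdate): gen period = cond(_n == 1, 1, 2)
drop qdate
reshape wide tool_1_maternal tool_2_maternal tool_3_maternal tool_4_maternal ///
    tool_5_maternal tool_6_maternal total_maternal, i(facID) j(period)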



Problem with Hausman test in xsmle

I was estimating spatial panel data models in Stata; when I tried to perform the Hausman test, the following message appeared:

... estimating fixed-effects model to perform Hausman test
_xsmle_hausman_ml(): 3200 conformability error
<istmt>: - function returned error
r(3200);

I'm a beginner; does someone know how to solve this?

That's the code I'm using:


clear all
set more off

use C:\Users\aliss.LAPTOP-VERCCTIC\Desktop\weights.dta, clear

spmat dta W m*, normalize(row)

spmat summarize W
spmat summarize W, links
spmat summarize W, links detail

use C:\Users\aliss.LAPTOP-VERCCTIC\Desktop\paneldata2.dta, clear

global id ID2
global t YEAR
global ylist TX_MAMA
global xlist U_REF TX_ENV TX_URB PEST MED_ONCO

* Set data as panel data
sort $id $t
xtset $id $t

xtdescribe
xtsum $id $t $ylist $xlist

* Pooled OLS estimator
reg $ylist $xlist
//estat ic -> Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC)
estat ic

//Spatial Durbin Model model (SDM)
//SDM with random-effects
xsmle TX_MAMA U_REF TX_ENV TX_URB PEST MED_ONCO, wmat(W) model(sdm) re vce(cluster ID2) nolog
estat ic
//SDM with spatial fixed-effects
xsmle TX_MAMA U_REF TX_ENV TX_URB PEST MED_ONCO, wmat(W) model(sdm) fe type(ind) vce(cluster ID2) nolog
//SDM with spatial fixed-effects [data transformed according to Lee and Yu (2010)]
xsmle TX_MAMA U_REF TX_ENV TX_URB PEST MED_ONCO, wmat(W) model(sdm) fe type(ind, leeyu) vce(cluster ID2) nolog
estat ic
//SDM with time fixed-effects
xsmle TX_MAMA U_REF TX_ENV TX_URB PEST MED_ONCO, wmat(W) model(sdm) fe type(time) vce(cluster ID2) nolog
estat ic
//SDM with spatial and time fixed-effects
xsmle TX_MAMA U_REF TX_ENV TX_URB PEST MED_ONCO, wmat(W) model(sdm) fe type(both) vce(cluster ID2) nolog
estat ic
//SDM without direct, indirect and total effects
xsmle TX_MAMA U_REF TX_ENV TX_URB PEST MED_ONCO, wmat(W) model(sdm) noeffects nolog
estat ic
//testing the appropriateness of a random-effects variant using the Robust Hausman test
//(example: if Prob>=chi2 = 0.0000 -> p-value lower than one percent -> we strongly reject the null hypothesis -> use fixed-effects)
//SDM Hausman test
xsmle TX_MAMA U_REF TX_ENV TX_URB PEST MED_ONCO, wmat(W) model(sdm) hausman nolog


The last line above is where the problem happens.
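
One possible workaround (a sketch; it does not explain the conformability error itself) is to fit the random- and fixed-effects variants separately, store them, and run Stata's own -hausman- command on the stored results (the clustered VCE is dropped here because -hausman- assumes conventional standard errors):

Code:
xsmle TX_MAMA U_REF TX_ENV TX_URB PEST MED_ONCO, wmat(W) model(sdm) re nolog
estimates store sdm_re
xsmle TX_MAMA U_REF TX_ENV TX_URB PEST MED_ONCO, wmat(W) model(sdm) fe type(ind) nolog
estimates store sdm_fe
hausman sdm_fe sdm_re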



Xtabond2 Newbie question

Hi,
I am quite a newbie to dynamic panels. For my project, I want to run a simple model of government approval rates on a dataset consisting of quarterly approval data for 20 countries over around 30 years (some countries have less data). Because the approval rate in any given quarter also depends on the approval rate in the previous period, I use the following model:
xtabond2 approval l.approval noparties inflation growth coalition c.wars##c.right, gmm(approval inflation growth, lag(1 2) collapse eq(diff)) iv(since2 noparties coalition wars right, eq(diff)) robust

noparties refers to the number of parties, coalition to a coalition government, wars to whether there is a military conflict going on, and right to the ideological orientation of the government. Inflation and growth are yearly macroeconomic indicators.
So I have two questions:
1- Is the way I use xtabond2 correct?
2- If I change lag(1 2) to something like lag(1 1) or lag(0 3), the coefficients of almost all the variables change quite dramatically. Why is this the case? And how can I choose the proper values for the lag intervals?

Averaging values from different "sum, detail outputs"

Hi everyone,

I was wondering whether there is a way to average "sum, detail" outputs over various months.
First, I sorted my dataset by month and then had Stata produce summary statistics for each month via "bysort month: sum variable_of_interest, detail".
Now I would like the time-series average of the different statistics produced, e.g., the mean, standard deviation, 95th percentile, and so on.

Does anybody have an idea how to solve this?

Any help is much appreciated.

Best,
Phil
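
One possible route (a sketch, reusing the variable_of_interest name from above): compute the per-month statistics with -collapse- on a preserved copy of the data and then summarize them across months:

Code:
preserve
collapse (mean) m=variable_of_interest (sd) sd=variable_of_interest ///
    (p95) p95=variable_of_interest, by(month)
summarize m sd p95    // time-series averages of the monthly statistics
restore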

Latent class analysis using gsem - Cross validation

Hi,
I'm trying to learn LCA/LPA using gsem command in Stata by walking myself through Masyn (2013) - cited in SEM example 52 - and trying to replicate the steps mentioned in her empirical examples.

In her article, it is recommended to cross validate the optimal number of classes in large samples.
In particular:
  • Divide the sample in two subsamples A and B.
  • Obtain the optimal number of classes (say a K-class model) in one of the subsamples, say A, using a long procedure explained in the text.
  • Estimate model (1): a K-class model in subsample B fixing all parameters to parameters obtained from a K-class model in subsample A.
  • Estimate model (2): an unrestricted K-class model in subsample B
  • Test Model (1) against Model (2).
My question is: is using the @ sign on each coefficient and equation separately the only way to estimate the restricted model (1)? (This would be time consuming with a large number of indicators.) Or is there another way to do it? Moreover, in the case of LPA, one would need to fix the estimated variances and covariances as well. In particular, how can one restrict the entire e(b) matrix to specific numbers?
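
One trick sometimes used for this kind of restricted fit (a sketch only; the indicators y1-y3 and the subsample indicator insampleA are hypothetical, and I have not verified this against Masyn's exact procedure) is to pass the subsample-A parameter vector as starting values with from() and suppress iteration with iterate(0), so the subsample-B model is evaluated with every parameter held at those values:

Code:
* K-class model in subsample A (hypothetical continuous indicators, K = 3)
gsem (y1 y2 y3 <- _cons) if insampleA, lclass(C 3)
matrix bA = e(b)
* same model evaluated in subsample B with all parameters fixed at bA
gsem (y1 y2 y3 <- _cons) if !insampleA, lclass(C 3) from(bA) iterate(0)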

Thanks in advance,
Emma


Reference:
Masyn, K. E. (2013). 25 latent class analysis and finite mixture modeling. The Oxford handbook of quantitative methods, 551.


Three-year volatility

I have been looking on the forum for a topic about three-year volatility but I didn't find what I wanted.

I have panel data which look like this:
Firm ID  CFO    Fiscal Year
1 0.042 2011
1 0.057 2012
1 0.032 2013
1 0.045 2014
1 0.031 2015
1 0.030 2016
2 0.041 2011
2 0.048 2012
2 0.051 2013
2 0.050 2014
2 0.034 2015
2 0.043 2016
2 0.041 2016


I would like to calculate the three-year volatility (measured as the standard deviation over years t, t-1, and t-2). I have been trying to compute this, but it didn't work.
Does anyone know how to do this?

Thank you in advance!
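
One way to do this (a sketch, assuming the panel is declared with hypothetical names firm_id and year and the variable is cfo) is to build the rolling standard deviation from the current and two lagged values:

Code:
xtset firm_id year
gen mean3 = (cfo + L.cfo + L2.cfo) / 3
gen vol3  = sqrt(((cfo - mean3)^2 + (L.cfo - mean3)^2 + (L2.cfo - mean3)^2) / 2)
drop mean3

(The community-contributed -rangestat- command from SSC is a more general alternative for statistics over moving windows.)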


IV using panel data and fixed effects

Hi,

I am using panel data about women's wellbeing and influencing factors, therefore, I have been using a fixed effects panel regression (xtreg..., fe cluster(pidp)) so far.
To establish causality I am trying to use an IV regression. I assume that I need to account for the panel nature and fixed effects, so, so far, I have used the following regression:
xtivreg $Y1 $X1 ($Y2 = $X2), fe
Where Y1 is the dependent variable (life satisfaction)
Y2 is the endogenous variable (housewife)
X1 are the controls (marital status, children, region, income, year, age)
X2 is the instrument (gender employment ratio by region and year)

I have seen that there is also an IV regression that accounts for the endogenous variable being binary, which mine is, using treatreg $Y1 $X1, treat($Y2 = $X2 $X1), which makes the first stage a probit. However, this does not account for fixed effects. Could someone please provide guidance on which method is best to use?

I have some other job statuses as well (unemployed, part-time, full-time); do these need to be included in the IV regressions? Or are they accounted for as '0' in the binary variable 'housewife'?

Many thanks,

Ash

Computing the consistency ratio and consistency index for analytic hierarchy process with mata

Hey everyone,

I'm completely new to Stata, and to Mata as well. I'm trying to figure out whether it is possible to compute the consistency ratio and consistency index from a matrix using Mata. I guess eigenvalues and eigenvectors play a role here, but that's all I know so far.

Thanks in advance for any help!
Best wishes
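
A minimal Mata sketch of the usual AHP calculation; the pairwise comparison matrix and the n = 3 random index value used here are illustrative assumptions:

Code:
mata:
A = (1, 3, 5 \ 1/3, 1, 3 \ 1/5, 1/3, 1)   // example pairwise comparison matrix
n = rows(A)
X = .
L = .
eigensystem(A, X, L)                      // eigenvectors in X, eigenvalues in L
lambda_max = max(Re(L))                   // principal eigenvalue
CI = (lambda_max - n) / (n - 1)           // consistency index
RI = 0.58                                 // commonly tabulated random index for n = 3
CR = CI / RI                              // consistency ratio
CI, CR
end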

Linear regression using a time variable

Hi all, I'm having some difficulties in doing a linear regression for my research.

My dataset consists of 361 respondents, each of them 'exposed' to two scandals on two different occasions. That means that one respondent has four rows: two from before a scandal and two from after a scandal. We are working with four types of scandals and three response types in order to check the effects on the dependent variable, "Watchingvids". Each respondent gets exposed to two of the 12 possible scenarios (scandals x responses) (see file).

How can I do a linear regression on Watchingvids taking into consideration that we need to check the difference between before and after the scandal?

I know this might sound vague, but please feel free to ask additional questions so that I can make it clearer.

Thanks in advance for your help

Kind regards, Remi Letaief
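
One common starting point for a before/after design with repeated observations per respondent (a sketch only; apart from Watchingvids, all variable names are hypothetical):

Code:
regress Watchingvids i.after##i.scandal_type i.after##i.response_type, vce(cluster respondent_id)

The coefficients on the interaction terms would then capture how the before/after change differs across scandal and response types.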




Forecasting a variable

Hi,

I am using a panel dataset, and for one of my variables I only have data from 2014 to 2017.
For my dataset to be balanced I need to forecast the 2018 values of this variable.
Is there a way to do this in Stata?

Thanks
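
One simple way to do this (a sketch; the names id, year, and x are hypothetical, and any forecasting model could replace the linear trend used here):

Code:
* create an empty 2018 row for every unit by copying its 2017 row
expand 2 if year == 2017, generate(newrow)
replace year = 2018 if newrow == 1
replace x    = .    if newrow == 1
* fill the new rows from a fitted model, here a linear trend with unit fixed effects
xtset id year
xtreg x c.year, fe
predict xhat, xbu
replace x = xhat if newrow == 1
drop newrow xhat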

Estimating asymmetrical confidence intervals for ICC using -nlcom-

Dear all
Out of curiosity I want to reproduce the calculations from:
Code:
cls
use https://www.stata-press.com/data/r16/judges, clear
icc rating target judge, format(%6.3f)
using a mixed regression and possibly -nlcom-:
Code:
mixed rating, reml noheader nolog nofetable ||_all: R.target ||_all: R.judge
nlcom ///
    ( individual: exp(2*_b[lns1_1_1:_cons]) / (exp(2*_b[lns1_1_1:_cons]) + (exp(2*_b[lns1_2_1:_cons]) + exp(2*_b[lnsig_e:_cons]))) ) ///
    ( average: exp(2*_b[lns1_1_1:_cons]) / (exp(2*_b[lns1_1_1:_cons]) + (exp(2*_b[lns1_2_1:_cons]) + exp(2*_b[lnsig_e:_cons])) / 4) )
The point estimates are equal for the two methods. The same is not true for the confidence intervals.

The distribution of ICC is F and it is asymmetrical. This is clearly the problem for -nlcom-.

Without success, I've tried stepwise estimation with -nlcom-, estimating the log variances in a first step with the post option.
Note that I have to estimate the total variance as the sum of the variances of two independent components before (or at the same time as) I estimate the ICC.
Also without success, I've tried building the estimation of the log variances and the ICC into one -nlcom-, like
Code:
nlcom (log_a: ...) (log_b: ...) (ICC: exp(log_a - log_b))
I have the following questions:
  1. Is there a way of tricking -nlcom- into getting more exact confidence intervals? Or is it a lost cause?
  2. Are the approaches I've used with -nlcom- valid?
  3. If so, how do I do it right?
Looking forward to hearing from you
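
One device that sometimes yields better-behaved intervals (a sketch; whether it is adequate here is exactly question 1 above) is to have -nlcom- estimate the logit of the ICC after the -mixed- command above and then back-transform the Wald bounds, so the interval respects the (0,1) range:

Code:
nlcom (logit_ind: logit(exp(2*_b[lns1_1_1:_cons]) / ///
    (exp(2*_b[lns1_1_1:_cons]) + exp(2*_b[lns1_2_1:_cons]) + exp(2*_b[lnsig_e:_cons])))), post
* back-transform the point estimate and the Wald bounds to the ICC scale
display invlogit(_b[logit_ind])
display invlogit(_b[logit_ind] - invnormal(.975)*_se[logit_ind])
display invlogit(_b[logit_ind] + invnormal(.975)*_se[logit_ind])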


LaTeX font on eps figures: cannot get writepsfrag package to work

Hello,

I cannot get writepsfrag to work. I am trying to have the same fonts on my Stata-produced figures and the rest of my LaTeX document.

I am running the example from the writepsfrag help file:

Code:
* ssc install writepsfrag
#delimit;
twoway function y=normalden(x), range(-4 4)
text(0.125 0 "\textbf{\color{blue}{Normal PDF:}}")
text(0.090 0 "\(y=\frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\)")
xlabel(-4 "\(-4\sigma\)" -2 "\(-2\sigma\)" 0 "\(\mu\)" 2 "\(2\sigma\)" 4 "\(4\sigma\)")
xtitle("All text is set in {\LaTeX} font") ytitle("\(y\)");
graph export normal.eps, as(eps);
writepsfrag normal.eps using normal.tex, replace body(figure, caption("Normal Probability Density Function"));
#delimit cr
and adding it to Overleaf using the following code:

Code:
\documentclass[varwidth=true]{standalone}
\usepackage[utf8]{inputenc}
\usepackage{psfrag}

\begin{document}

\begin{figure}[htbp]
\centering
\psfrag{\\textbf{\\color{blue}{Normal PDF:}}}[c][c][1][0]{\normalsize \textbf{\color{blue}{Normal PDF:}}}
\psfrag{\\(y=\\frac{1}{\\sigma\\sqrt{2\\pi}}e^{\\frac{-(x-\\mu)^2}{2\\sigma^2}}\\)}[c][c][1][0]{\normalsize \(y=\frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\)}
\psfrag{0}[c][c][1][0]{\normalsize 0}
\psfrag{.1}[c][c][1][0]{\normalsize .1}
\psfrag{.2}[c][c][1][0]{\normalsize .2}
\psfrag{.3}[c][c][1][0]{\normalsize .3}
\psfrag{.4}[c][c][1][0]{\normalsize .4}
\psfrag{\\(y\\)}[c][c][1][0]{\normalsize \(y\)}
\psfrag{\\(-4\\sigma\\)}[c][c][1][0]{\normalsize \(-4\sigma\)}
\psfrag{\\(-2\\sigma\\)}[c][c][1][0]{\normalsize \(-2\sigma\)}
\psfrag{\\(\\mu\\)}[c][c][1][0]{\normalsize \(\mu\)}
\psfrag{\\(2\\sigma\\)}[c][c][1][0]{\normalsize \(2\sigma\)}
\psfrag{\\(4\\sigma\\)}[c][c][1][0]{\normalsize \(4\sigma\)}
\psfrag{All text is set in {\\LaTeX} font}[c][c][1][0]{\normalsize All text is set in {\LaTeX} font}
\resizebox{1\linewidth}{!}{\includegraphics{normal.eps}}
\caption{Normal Probability Density Function}
\end{figure}

\end{document}
However, the generated figure does not render the way it should (attached screenshot omitted).
Am I doing something wrong? Is there an alternative way to produce eps files in Stata with a font that matches the LaTeX document? I tried downloading LM Roman 10 and setting it as the Stata graph font. It works within Stata, but when I export graphs as eps, they revert to the standard font.

Dropping multiple missing observations

Currently I'm working on a project in which I use item response theory. I have 8 variables from a lot of cases, for which I would like to find the latent variable. Now, the problem is that there is quite a lot of missing data. I already worked out how to impute some of the values, but now I want to make another model in which I drop all the observations that have missing values for more than 4 of the 8 variables, because imputing these can be seen as unreliable. I've looked around in multiple manuals, but can't seem to find what I'm looking for.
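
A minimal sketch, assuming the eight indicators are named item1-item8 (hypothetical names):

Code:
egen nmiss = rowmiss(item1-item8)
drop if nmiss > 4        // drops observations missing more than 4 of the 8 items
drop nmiss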

Which F stats should I look at with ivlasso?

Hi,

I'm running an IV regression with ivlasso in Stata. It reports several different first-stage F statistics; does anyone know which is the right one to look at? I'm clustering standard errors, so it should be one of the last three F statistics. I also wonder why the lasso approach gives such large F statistics; does it calculate them differently than the usual ivreg? Thanks!

Difference-in-Differences Design Model Specification

Hey everyone,

I have a question concerning a model specification for a difference-in-differences design. I have cross-sectional data over 7 time periods in one country. Treatment kicks in at rounds 5, 6, and 7. Treatment is defined at the individual level. I am running the following estimation, controlling for leads and lags:


y_igt = delta_g + alpha_t + Treatment_igt * T + X_igt * sigma + u_igt


i = individual i living in district g at time t.
Delta and alpha are district and time fixed effects.
Treatment is defined at the individual level, taking the value 1 if i was treated and 0 otherwise. Treatment is interacted with a survey-round dummy t, taking the value 1 for rounds 2, 3, 4, 5, 6, and 7. This allows me to control for leads and lags, taking round 1 as the baseline.
X are individual covariates, and u is the error term (standard errors are clustered).

Is this model specification correct? Do I also need to include the interaction effect for round 1?

All the best,

Simeon
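
For what it is worth, one common way such an event-study specification is coded in Stata (a sketch; all variable names here are hypothetical, with round 1 as the omitted baseline):

Code:
regress y i.treated##ib1.round x1 x2 i.district, vce(cluster district)

With treatment starting in round 5, the interaction terms for rounds 2-4 are the leads and those for rounds 5-7 the lags.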




Expanding a time series dataset by one month

Hi,

I have panel data and I would like to expand the dataset by one month. So I would like to expand each "stock" time series to "2020m4". Here is the data I have:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double stock float(date exret)
 1 722           .
 1 721  .016571907
 1 720 -.021690037
 2 722  -.08885346
 2 721   .03352593
 2 720 -.013057826
 3 722   -.1733951
 3 721   .05968564
 3 720  -.13112126
 4 722  .025679555
 4 721  -.04374008
 4 720   .05557474
 5 722 -.023603665
 5 721 -.005209302
 5 720   .05193632
 6 722   .01481354
 6 721  -.04118994
 6 720  .033280447
 7 722   .04653623
 7 721  -.04578867
 7 720   .04011091
 8 722   .07355644
 8 721  -.03574601
 8 720   .08482119
 9 722  -.08222993
 9 721   .06095662
 9 720   -.0952904
10 722  -.05354553
10 721 .0007637492
10 720  -.09510595
end
format %tm date
label values stock unit_id
label def unit_id 1 "130062", modify
label def unit_id 2 "130088", modify
label def unit_id 3 "130298", modify
label def unit_id 4 "130502", modify
label def unit_id 5 "130591", modify
label def unit_id 6 "131844", modify
label def unit_id 7 "132808", modify
label def unit_id 8 "134057", modify
label def unit_id 9 "13466Q", modify
label def unit_id 10 "13471D", modify
I have tried "predict" thinking that it might expand the dataset automatically but that is not the right way to proceed. Please help!



How to check for significant heterogeneity for a categorical variable with 2 interactions

Dear Statlist,

I am estimating whether the elderly age in a healthier way because of the long-term care (LTC) system they are in. I do this by grouping European countries into different LTCsystem groups. I regress health (grip strength) on factors determining health and an interaction of age with the LTC system categorical variable, to see whether the age-health slopes differ statistically between the LTC system groups. I estimate this using random effects. However, I also included age squared, which leaves me with two interactions.

xtreg maxgrip c.age i.LTCsystem c.age#i.LTCsystem c.age#c.age c.age#c.age#i.LTCsystem bmi if female==1, re vce(cluster mergeid_n)

How can I test whether the age-health slopes significantly differ across the four clusters? (I know I can do testparm i.LTCsystem#c.age, or test equality of individual interaction coefficients, but then I can only draw conclusions about the first-order term in age.)

Any help would be greatly appreciated.
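
One possibility (a sketch) is to test the linear and quadratic interaction terms jointly after the -xtreg- command shown above:

Code:
testparm c.age#i.LTCsystem c.age#c.age#i.LTCsystem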


Output, in case you need it:
VARIABLES Max. of grip strength measure
Age at interview (in years) 0.380*
(0.204)
LTCsystem = 2, Cluster 2 12.961
(10.910)
LTCsystem = 3, Cluster 3 14.001
(11.303)
LTCsystem = 4, Cluster 4 34.450**
(13.460)
1b.LTCsystem#co.age
2.LTCsystem#c.age -0.295
(0.288)
3.LTCsystem#c.age -0.386
(0.298)
4.LTCsystem#c.age -0.963***
(0.357)
c.age#c.age -0.005***
(0.001)
1b.LTCsystem#co.age#co.age
2.LTCsystem#c.age#c.age 0.002
(0.002)
3.LTCsystem#c.age#c.age 0.002
(0.002)
4.LTCsystem#c.age#c.age 0.007***
(0.002)
Body mass index 0.077***

Regression on Panel Data

Hello,

I am having a bit of trouble running the xtreg command on my dataset. I have panel data on 24 countries for a 12-year period (2001-2012) on 13 variables (HRlog, DV, dGDP, GDP, CPI, GINI, HDI, dUP, Dpenalty, PM1, PM2, PM3, UEM). I am doing an analysis of homicide rates in the developing world and wish to know whether capital punishment has any deterrent effect on homicide rates. So my dependent variable is HRlog (the log of homicide rates per 100,000 people), while the rest of the socio-economic indicators like the Gini index and HDI are my explanatory variables. DV (legislation on domestic violence) and Dpenalty (capital punishment) are dummy variables taking the value 0 or 1: 0 means no legislation on domestic violence in that country and no death penalty, while 1 means there is legislation on domestic violence and there is capital punishment. I have also created year dummies y2001-y2012.

Before running xtreg, I set up my data as panel data by running the xtset command, and Stata recognizes it as panel data.

So I am running the command xtreg HRlog dGDP GDP GINI HDI CPI PM1 PM2 PM3 UEM dUP NoDeath Death NoLegis Legis y2001 y2002 y2003 y2004 y2005 y2006 y2007 y2008 y2009 y2010 y2011 y2012, fe

But I get the error "no observations r(2000)". Please help me; what am I doing wrong?
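
As a first diagnostic (a sketch using the variables from the command above): r(2000) usually means that no observation is complete across every variable in the model, so it can help to count complete cases and inspect the missingness pattern:

Code:
egen nmiss = rowmiss(HRlog dGDP GDP GINI HDI CPI PM1 PM2 PM3 UEM dUP ///
    NoDeath Death NoLegis Legis)
count if nmiss == 0
misstable summarize HRlog dGDP GDP GINI HDI CPI PM1 PM2 PM3 UEM dUP ///
    NoDeath Death NoLegis Legis
drop nmiss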

How to do Reality check and SPA test by Stata?

Dear Statalist,

I want to know whether there are any user-written Stata commands for performing the Reality Check (White 2000) and the Superior Predictive Ability test (Hansen 2001).
So far I haven't found anything.

Here is the link to White`s paper: White, H. (2000). A reality check for data snooping. Econometrica, 68(5), 1097-1126. http://www.ssc.wisc.edu/~bhansen/718/White2000.pdf

Here is Hansen`s paper: Hansen, P. R. (2001). An unbiased and powerful test for superior predictive ability (No. 2001-06). http://www-siepr.stanford.edu/workp/swp05003.pdf

Thanks a lot in advance for your help!
It is really appreciated!
Sincerely Ning.

Parametric survival analysis


Good morning. I am doing parametric survival analysis with streg. After streg, I obtained the following graph through stcurve (attached graph omitted). What should I do if I want to get the exact hazard ratio at a specific point in time, as shown in the graph?

Question about diff-in-diff with multiple control groups and one treatment group

Hello!

I am running a Difference in Difference (DD) regression to see whether the introduction of a policy affected school enrolment for a particular social group (S1) as compared to three other social groups (S2, S3, S4) and I have data from before and after the policy was implemented.

The command for the regression is:
reg school hh_S2 hh_S3 hh_S4 post post_X_S2 post_X_S3 post_X_S4

where hh_S2 = 1 if the household belongs to social group S2 and 0 otherwise, and so on for S3 and S4,
and post = 1 for years after the policy and 0 for before the policy

I am more interested in seeing the effect of the policy on S1 school enrolment because that is my treatment group. My question is: since I have multiple control groups rather than one treatment and one control group, what is the interpretation of the constant term and of the post coefficient? Is the coefficient on 'post' the effect of the policy on school enrolment for social group S1 after the policy, and is the constant the expected school enrolment for social group S1 before the policy?

Thank you

Tuesday, April 28, 2020

How to model "Sparial Variability" or "Choice of Location"

I have survey data on 10,000 delivery persons. I have the number of deliveries they made in each of 49 neighborhoods (that is, 49 columns plus one for "other neighborhoods") and one column with their earnings per hour (EPH). I am trying to show that a person's EPH depends on his choice of neighborhoods to work in.

For initial exploration, I divided the 10,000 into 10 groups (based on EPH, low to high) and plotted the fraction of deliveries they made in different neighborhoods. It is clear from the graph that high-earning groups pick the rich neighborhoods more. (I arbitrarily picked a neighborhood that I know rich people live in and a neighborhood I know to be poor. Also, I used "fraction of deliveries made in neighborhood X" instead of "total deliveries" because different delivery persons have different years of experience.)

My question is how do I statistically model the variation in EPH due to choice of neighborhoods? What kind of regression model should I be studying? How do I deal with the fact that neighborhoods are spatially correlated?

Additionally,
1) I have data from the delivery company that tell me the true EPH per neighborhood, aggregated over ALL deliveries by ALL delivery persons. That is 125 rows for 125 neighborhoods and 2 columns with EPH and Total Deliveries. How do I incorporate this information into my modelling?
2) I have the Well-Known Text representation of the neighborhoods.

Thanks!