Tuesday, May 31, 2022

Problems corresponding to variable names in stata double-layer loops

I encountered a problem while merging data. I put 5 control variables into 5 sheets of an Excel workbook. After reshaping each sheet to panel (long) format, I want to give the value variable in each sheet its corresponding name. Although the variable does get renamed, the name of the saved .dta file changes on every pass of the inner loop, and all of the generated .dta files end up containing the data from the last sheet. Please help me, thank you very much! (Below is the Stata code I wrote.)
local x cpi house profits population rgdp
forvalues i = 1/5{
    foreach m of local x{
        import excel using control.xlsx,sheet(Sheet`i') first clear
        reshape long M, i(province) j(year)
        rename M `m'
        save `m'.dta,replace
    }
}
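
A minimal sketch of one possible fix (assuming sheet i holds the i-th variable named in the local): drop the inner loop and pick the matching name with the word extended macro function, so each sheet is imported once and saved once under its own name.

Code:
local x cpi house profits population rgdp
forvalues i = 1/5 {
    local m : word `i' of `x'                          // name matched to Sheet`i'
    import excel using control.xlsx, sheet("Sheet`i'") firstrow clear
    reshape long M, i(province) j(year)
    rename M `m'
    save `m'.dta, replace
}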

How to loop through multiple locals instead of all variables inside each local?

Code:
local ipv "angry fear intimacy_idx cont12_idx ev12_idx pv12_idx sv12_idx ipv_ovall_idx"
local ipv_12_entire "cont12_idx ev12_idx pv12_idx sv12_idx cont_idx ev_idx pv_idx sv_idx"

* IPV outcomes
eststo clear
foreach outcome of local ipv {
eststo `outcome'_demo: reg `outcome' arm_free `demo_controls', vce(cluster new_csps)
}

estout * using "${Oleaf}/REG_IPV/panelA_ipv.tex", replace


* Violence indexes (12 months VS entire marriage)
eststo clear
foreach outcome of local ipv_12_entire {
eststo `outcome'_demo: reg `outcome' arm_free `demo_controls', vce(cluster new_csps)
}

estout * using "${Oleaf}/REG_IPV/panelA_ipv_12_entire.tex", replace

QUESTION:
If I do the above by running two separate regressions, it works as intended.
However, I would like to run the two locals "ipv" and "ipv_12_entire" inside a loop.
I did the following.
Instead of looping over two locals as I'd like, the code below is looping every variable inside each local.
Basically, the code below produces 16 .tex files (8 variables in local "ipv" and 8 variables in local "ipv_12_entire").
Is there a way to write a loop with grouping locals? I just want to produce 2 .tex files.


Code:
loc i = 1

eststo clear
foreach outcome of local `ipv' `ipv_12_entire' {

eststo `outcome': reg `outcome' arm_free `demo_controls', vce(cluster new_csps)

estout * using "${Oleaf}/REG_IPV/panelA_`outcome'.tex", replace

loc i = `i' + 1
}

How to loop through two locals instead of all variables inside each local


Code:
local ipv "angry fear intimacy_idx cont12_idx ev12_idx pv12_idx sv12_idx ipv_ovall_idx"
local ipv_12_entire "cont12_idx ev12_idx pv12_idx sv12_idx cont_idx ev_idx pv_idx sv_idx"

*------------------------------------------------------------------------------*
* REGRESSION TABLES: PANEL A
*------------------------------------------------------------------------------*

* IPV outcomes
eststo clear
foreach outcome of local ipv {
    eststo `outcome'_demo: reg `outcome' arm_free `demo_controls', vce(cluster new_csps)
    estadd local demo_ctrl "Yes" : `outcome'_demo
    estadd local fe "None" : `outcome'*
}
estout * using "${Oleaf}/REG_IPV/panelA_ipv.tex", replace

* Violence indexes (12 months VS entire marriage)
eststo clear
foreach outcome of local ipv_12_entire {
    eststo `outcome'_demo: reg `outcome' arm_free `demo_controls', vce(cluster new_csps)
    estadd local demo_ctrl "Yes" : `outcome'_demo
    estadd local fe "None" : `outcome'*
}
estout * using "${Oleaf}/REG_IPV/panelA_ipv_12_entire.tex", replace
If I do the above by running two separate regressions, it works as intended. However, I would like to run the two locals "ipv" and "ipv_12_entire" inside a loop. I did the following, but instead of looping over the two locals as I'd like, the code below loops over every variable inside each local. Is there a way to write a loop that groups the locals?

Code:
loc i = 1

eststo clear
foreach outcome of local `ipv' `ipv_12_entire' {
    eststo `outcome': reg `outcome' arm_free `demo_controls', vce(cluster new_csps)
    estadd local demo_ctrl "Yes" : `outcome'
    estadd local fe "None" : `outcome'
    estout * using "${Oleaf}/REG_IPV/panelA_`outcome'.tex", replace
    loc i = `i' + 1
}
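
One way to get exactly two .tex files (a sketch, not necessarily the final table layout) is to loop over the names of the locals and expand each name inside the loop, so the inner foreach walks the variables of one group at a time:

Code:
foreach group in ipv ipv_12_entire {
    eststo clear
    foreach outcome of local `group' {
        eststo `outcome'_demo: reg `outcome' arm_free `demo_controls', vce(cluster new_csps)
    }
    estout * using "${Oleaf}/REG_IPV/panelA_`group'.tex", replace
}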

Transformation of values

Hi,

I have a data set on tumors with the variables Karnofsky index/KI (categorical with values 40, 60, 80, 90, 100) and gross tumor volume/GTV (continuous, m3), among others. I have done a linear regression showing a significant negative correlation between the two (coefficient -0.16, p = 0.007).

1. I want to visualise the relationship, e.g. with a scatter plot, but as KI is categorical and GTV is continuous, it ends up looking very strange and I cannot see a linear relationship. I have transformed the GTV variable with a cube root because it was very left-skewed, but KI has a normal distribution. Do I still have to transform KI to get a linear relationship? Or convert it to a continuous variable?

[attached graph omitted]

2. I want to calculate Pearson's correlation coefficient, but as far as I know the assumptions are that the variables must be continuous and have a linear relationship. How do I solve this? By transforming one or both variables?

Thanks in advance,
Best regards
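
On the visualisation, one option (a sketch only; KI and GTV stand in for the actual variable names in the dataset) is to jitter the categorical axis so the points do not stack, and to fall back on a rank-based correlation that does not require both variables to be continuous:

Code:
* scatter with the categorical Karnofsky index jittered
twoway scatter GTV KI, jitter(5)

* Spearman's rank correlation as an alternative to Pearson's r
spearman KI GTV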

keep observations if the first 2 characters of the string are capital letters

I have a dataset where the variable x is a string. I need to keep only the observations in which the first 2 characters of the variable x are capital letters. Can anyone tell me how to do it?
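
A minimal sketch, assuming the variable is literally named x and "capital letter" means A-Z:

Code:
* keep rows whose first two characters are both uppercase A-Z
keep if regexm(x, "^[A-Z][A-Z]")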

Latent growth curve modeling

Dear experts,
I am currently trying to fit an unconditional growth curve model (as a first step before looking forward to more complex models). As you can see in the attachments, none of my attempts worked out - no matter whether I took a linear, quadratic or cubic model. I used the following sem commands for computing these curves:

sem (Intercept@1 Slope@0 -> deprivation1) ///
(Intercept@1 Slope@1 -> deprivation2) ///
(Intercept@1 Slope@2 -> deprivation3) ///
(Intercept@1 Slope@3 -> deprivation4) ///
(Intercept@1 Slope@4 -> deprivation5) ///
(Intercept@1 Slope@5 -> deprivation6) ///
(Intercept@1 Slope@6 -> deprivation7) ///
(Intercept@1 Slope@7 -> deprivation8) ///
(Intercept@1 Slope@8 -> deprivation9) ///
(Intercept@1 Slope@9 -> deprivation10) ///
(Intercept@1 Slope@10 -> deprivation11) ///
(Intercept@1 Slope@11 -> deprivation12) ///
(Intercept@1 Slope@12 -> deprivation13), ///
method(mlmv) noconstant means(Intercept Slope)

sem (Intercept@1 Linear@0 Quadratic@0 -> deprivation1) ///
(Intercept@1 Linear@1 Quadratic@1 -> deprivation2) ///
(Intercept@1 Linear@2 Quadratic@4 -> deprivation3) ///
(Intercept@1 Linear@3 Quadratic@9 -> deprivation4) ///
(Intercept@1 Linear@4 Quadratic@16 -> deprivation5) ///
(Intercept@1 Linear@5 Quadratic@25 -> deprivation6) ///
(Intercept@1 Linear@6 Quadratic@36 -> deprivation7) ///
(Intercept@1 Linear@7 Quadratic@49 -> deprivation8) ///
(Intercept@1 Linear@8 Quadratic@64 -> deprivation9) ///
(Intercept@1 Linear@9 Quadratic@81 -> deprivation10) ///
(Intercept@1 Linear@10 Quadratic@100 -> deprivation11) ///
(Intercept@1 Linear@11 Quadratic@121 -> deprivation12) ///
(Intercept@1 Linear@12 Quadratic@144 -> deprivation13), ///
method(mlmv) noconstant means(Intercept Linear Quadratic)

sem (Intercept@1 Linear@0 Quadratic@0 Cubic@0 -> deprivation1) ///
(Intercept@1 Linear@1 Quadratic@1 Cubic@1 -> deprivation2) ///
(Intercept@1 Linear@2 Quadratic@4 Cubic@8 -> deprivation3) ///
(Intercept@1 Linear@3 Quadratic@9 Cubic@27 -> deprivation4) ///
(Intercept@1 Linear@4 Quadratic@16 Cubic@64 -> deprivation5) ///
(Intercept@1 Linear@5 Quadratic@25 Cubic@126 -> deprivation6) ///
(Intercept@1 Linear@6 Quadratic@36 Cubic@216 -> deprivation7) ///
(Intercept@1 Linear@7 Quadratic@49 Cubic@343 -> deprivation8) ///
(Intercept@1 Linear@8 Quadratic@64 Cubic@512 -> deprivation9) ///
(Intercept@1 Linear@9 Quadratic@81 Cubic@729 -> deprivation10) ///
(Intercept@1 Linear@10 Quadratic@100 Cubic@1000 -> deprivation11) ///
(Intercept@1 Linear@11 Quadratic@121 Cubic@1331 -> deprivation12) ///
(Intercept@1 Linear@12 Quadratic@144 Cubic@1728 -> deprivation13), ///
method(mlmv) noconstant means(Intercept Linear Quadratic Cubic)

Here, deprivation* is an index of twenty material-deprivation items, one variable for each of the 13 panel waves.
Do any of you have any ideas which kind of growth function could be more appropriate to my data?

Thanks for your help and best regards
Patrick

Monday, May 30, 2022

Heckman

Hi,

How do I apply weights in Heckman selection model?

Thank you
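
For reference, the ML estimator of heckman accepts weights in the usual bracket syntax (the two-step estimator does not, as far as I recall); a sketch with placeholder variable names and an assumed sampling-weight variable wt:

Code:
* sketch only: outcome equation with a selection equation, pweighted
heckman wage educ age [pweight = wt], select(married children educ age)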

Compare coefficients with xtreg i.year and fe

Hello,

I have two models:

Model A: Y = α1 + β1X1 + Σθ∙C + firm-fixed effect + year-fixed effect

Model B: Y = α1 + β2X2 + Σθ∙C + firm-fixed effect + year-fixed effect

where Σθ∙C is the list of control variables, which are the same for both models A and B.

I want to compare the coefficients β1 and β2.

I searched some related threads. The closest one is here: https://www.stata.com/statalist/arch.../msg00275.html

If there were no time effect in my model, everything would work fine as in the example in the link. However, there is a time fixed effect and I do not know how to deal with it.


Code of each model:

HTML Code:
xtreg y x control1 control2 i.year, fe
Thank you
Phan

adjustrcspline error

I'm trying to fit an stcox model with cubic splines to see the association between mortality and ratiopasit, where ratiopasit is the ratio of PA (mins) to sitting time (hours). When I use the adjustrcspline command I get an error.

mkspline2 ssratiopasit =ratiopasit, cubic nknots (5) displayknots
mat ratiopasitknots=r(knots)
matrix list ratiopasitknots

ratiopasitknots[1,5]
knot1 knot2 knot3 knot4 knot5
ratiopasit 0 1.5155344 4.496788 10.675714 35.928143

stcox ssratiopasit1 ssratiopasit2 ssratiopasit3 ssratiopasit4 drage_05 i.sex_05 i.gq5_edu_05_recoded i.smokstat_05recoded i.gq3_marr__05_recoded i.no_comorbid_recoded waist_05
-------------------------------------------------------------------------------------------
_t | Haz. ratio Std. err. z P>|z| [95% conf. interval]
--------------------------+----------------------------------------------------------------
ssratiopasit1 | .8297227 .0597145 -2.59 0.009 .7205641 .9554178
ssratiopasit2 | 1609080 1.18e+07 1.94 0.052 .8802037 2.94e+12
ssratiopasit3 | 1.96e-10 2.35e-09 -1.86 0.062 1.23e-20 3.144156
ssratiopasit4 | 4733.167 24257.47 1.65 0.099 .2054684 1.09e+08
drage_05 | 1.128093 .0040805 33.32 0.000 1.120124 1.136119
|
sex_05 |
female | .6823007 .0531652 -4.91 0.000 .5856657 .7948805
|
gq5_edu_05_recoded |
high school or more | .9878815 .0885629 -0.14 0.892 .8286959 1.177645
|
smokstat_05recoded |
ex-smoker | .4832277 .0583604 -6.02 0.000 .3813734 .6122845
non-smoker | .4035807 .0482261 -7.59 0.000 .3193122 .5100882
unknown category/missing | .4148576 .0792575 -4.61 0.000 .2852851 .6032801
|
2.gq3_marr__05_recoded | 1.289657 .0951568 3.45 0.001 1.116011 1.490321
|
no_comorbid_recoded |
1 | 1.227457 .1227486 2.05 0.040 1.008984 1.493236
2 | 1.45518 .1514774 3.60 0.000 1.186617 1.784526
3 | 1.782304 .1993574 5.17 0.000 1.431435 2.219176
|
waist_05 | 1.005195 .0028721 1.81 0.070 .9995814 1.01084
-------------------------------------------------------------------------------------------

. adjustrcspline

all variables created in the last call to mkspline2 must be
independent variables in the last estimation command
r(198);

I've looked for this error on Statalist. Since adjustrcspline pulls values from e() saved by the estimation command, these matrices should not be empty:
. matlist e(b)

| 1b. 2.
| ssratio~1 ssratio~2 ssratio~3 ssratio~4 drage_05 sex_05 sex_05
-------------+-----------------------------------------------------------------------------
y1 | -.1866637 14.29117 -22.35132 8.46235 .1205287 0 -.3822848

| 1b. 2. 1b. 2. 3. 4. 1b.
| gq5_edu~d gq5_edu~d smoksta~d smoksta~d smoksta~d smoksta~d gq3_mar~d
-------------+-----------------------------------------------------------------------------
y1 | 0 -.0121925 0 -.7272673 -.9073788 -.8798199 0

| 2. 0b. 1. 2. 3.
| gq3_mar~d no_com~ed no_com~ed no_com~ed no_com~ed waist_05
-------------+------------------------------------------------------------------
y1 | .2543762 0 .2049448 .3751297 .5779068 .0051814


Can someone shed light on why I still get this error? Thanks in advance.

Clarification on Methods used for 95% CI Calculation in sts list Commands

Hello,

Would anyone be able to clarify the following three questions about the methods Stata uses to calculate confidence intervals for the survivor function with respect to the sts list command?

The main sts manual (https://www.stata.com/manuals/ststs.pdf) states on page 17 that while the standard error for survivor functions is calculated using Greenwood's formula, the confidence intervals are calculated using the ln[−ln S(t)] approach:
The standard error reported is given by Greenwood’s formula (Greenwood 1926) ... These standard errors, however, are not used for confidence intervals. Instead, the asymptotic variance of ln[−ln S(t)] ... is used.
1. I presumed that, because this manual applies to confidence intervals generated using the sts graph and sts list commands, the ln[−ln S(t)] approach is used to calculate all confidence intervals for the survivor function using the sts list command. Is this accurate?

The reason I am asking for clarification is that the manual specific to sts list (https://www.stata.com/manuals/ststslist.pdf) states on page 3 that the level option for sts list,
specifies the confidence level, as a percentage, for the Greenwood pointwise confidence interval of the survivor or failure function
In my analyses, I am using the risktable(numlist) option with the sts list command to describe an sts graph Kaplan-Meier curve at 3 specified time points. I am finding that, in some cases, the sts list command and the sts list command with the risktable() option produce different survivor function results at the same time points. Namely, when sts list with risktable() is used, some time points within the extent of my survival data (but with only one participant at risk) are not assigned a survival estimate, while the sts list command alone assigns an estimate. I have attached images of the three outputs I am referring to below.

2. What could be causing the difference in survivor function & confidence interval availability between sts list and sts list, risktable()?
3. Could you explain how I should interpret a missing survivor function result that is within the extent of my data and displays a confidence interval in my Kaplan-Meier graph and sts list, but cannot be assigned a confidence interval according to sts list, risktable()?


I greatly appreciate your time and assistance with this,
Andrew

Code:
sts list, risktable(6 12 24)
[attached output omitted]





Code:
sts list, at(6 12 24)
[attached output omitted]





Code:
sts graph, ci risktable risktable(0(6)24)
[attached output omitted]

Resize graphs for putexcel but obtain good quality

Dear all

I am creating a data report with Excel and use putexcel to transfer the Stata graphs to excel. The height option has to be large for a good quality of the exported png.

Code:
sysuse auto.dta, clear
histogram weight
graph export pic.png, replace height(2000)
putexcel set "pic.xlsx", modify open
putexcel A1 = picture(pic.png)
putexcel save
However, using a large value makes the graph in Excel very large (for instance spanning columns A-Z), and I have to manually resize the exported graph in Excel. Is there a way to resize the exported png before putting it into Excel, or to resize it from Stata afterwards? Unfortunately, I could not find an option of the putexcel ... picture() command that does the resizing.

Best wishes & thanks
Martin

Sunday, May 29, 2022

GLS Fixed Effects regression, Omitted because of Collinearity

Dear Stata community,

Hi, I have a panel data (Company ID, years) independent variables(MTB, size, growth, roa, tangibility, covid19: dummy), and dependent variable (leverage).

I tried to run the command: xtreg leverage MTB size growth roa tangibility covid if year == 2016, fe
However, Stata drops every independent variable due to collinearity even though the only added dummy is covid variable.
So I took covid out and re-run the command but Stata still drops all variables due to the same reason.
So, I ran the command without if: xtreg leverage MTB size growth roa tangibility covid, fe
It perfectly runs fine.

However, I would like to run the xtreg command for each year (from 2016 to 2021). Is there any possible solution for this?

P.S.: I dropped the independent variables one by one but the model still suffered from collinearity, so I believe it is a matter of the year variable.

Any advice is appreciated!

Thanks in advance

[attached output omitted]

generating a new variable based on 2 conditions

hello there
new stata user

i have a dataset that look like this

date_m com date com_ret Date com_cap smb_group groupmean
2018m12 Comp710 12/31/2018 .0148 12/31/2018 52757 1 .0102609
2018m12 Comp712 12/31/2018 .0675 12/31/2018 45732 1 .0102609
2018m12 Comp664 12/31/2018 -.0437 12/31/2018 74315 1 .0102609
2018m12 Comp682 12/31/2018 -.1154 12/31/2018 37400 1 .0102609
2018m12 Comp708 12/31/2018 -.0226 12/31/2018 64900 1 .0102609
2018m12 Comp535 12/31/2018 .0716 12/31/2018 69545 1 .0102609
2018m12 Comp399 12/31/2018 -.0428 12/31/2018 62014 1 .0102609
2018m12 Comp735 12/31/2018 .0709 12/31/2018 36048 1 .0102609
2018m12 Comp679 12/31/2018 .1521 12/31/2018 64328 1 .0102609
2018m12 Comp652 12/31/2018 .0615 12/31/2018 40797 1 .0102609
2018m12 Comp709 12/31/2018 .1286 12/31/2018 61920 1 .0102609
2018m12 Comp651 12/31/2018 -.017 12/31/2018 74000 1 .0102609
2018m12 Comp581 12/31/2018 -.0494 12/31/2018 40706 1 .0102609
2018m12 Comp562 12/31/2018 .1139 12/31/2018 36962 1 .0102609
2018m12 Comp419 12/31/2018 .0445 12/31/2018 44685 1 .0102609
2018m12 Comp744 12/31/2018 -.0351 12/31/2018 23202 1 .0102609
2018m12 Comp592 12/31/2018 .0172 12/31/2018 78455 1 .0102609
2018m12 Comp696 12/31/2018 .5328 12/31/2018 75600 1 .0102609
2018m12 Comp753 12/31/2018 .0586 12/31/2018 27768 1 .0102609
2018m12 Comp691 12/31/2018 .0142 12/31/2018 50693 1 .0102609
2018m12 Comp588 12/31/2018 .1562 12/31/2018 28186 1 .0102609
2018m12 Comp688 12/31/2018 -.045 12/31/2018 67505 1 .0102609
2018m12 Comp620 12/31/2018 -.0985 12/31/2018 63988 1 .0102609
2018m12 Comp742 12/31/2018 .2919 12/31/2018 49545 1 .0102609
2018m12 Comp332 12/31/2018 -.024 12/31/2018 80563 1 .0102609
2018m12 Comp571 12/31/2018 -.0778 12/31/2018 38241 1 .0102609
2018m12 Comp613 12/31/2018 .3841 12/31/2018 69165 1 .0102609
2018m12 Comp724 12/31/2018 .0464 12/31/2018 62542 1 .0102609
2018m12 Comp431 12/31/2018 .015 12/31/2018 80312 1 .0102609
2018m12 Comp668 12/31/2018 .0361 12/31/2018 74600 1 .0102609


I've generated the mean of returns by group (5 groups).
Now I want to get the difference between the groupmean of group 1 and that of group 5 only,
but I can't come up with the code for this.


Here's what should be done:
1- sort the data by date (monthly)
2- keep the data sorted by group as well (1 to 5)
3- get the difference of two numbers; let's call these two numbers A and B
=> A is the groupmean for group 1 for the first month
=> B is the groupmean for group 5 for the first month
=> A-B is the difference for the first month

Repeat this process for the whole dataset (monthly).

I hope I explained well

TIA
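
A sketch of one way to do this with the variables shown in the listing (monthly date date_m, group variable smb_group, and the already computed groupmean):

Code:
* per month, pull out the group-1 and group-5 means and take the difference
egen mean_g1 = mean(cond(smb_group == 1, groupmean, .)), by(date_m)
egen mean_g5 = mean(cond(smb_group == 5, groupmean, .)), by(date_m)
gen diff_1_5 = mean_g1 - mean_g5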

Plot the estimated hazard curve at mean - 1 SD, mean, and mean + 1 SD after fitting a survival model in Stata

Hi, everybody,
I have a dataset like this,
* Example generated by -dataex-.
clear
input int id int time byte injury byte safescore
1 1 1 9
2 1 1 9
3 1 0 6
3 2 0 6
3 3 1 6
4 1 0 6
4 2 0 6
4 3 0 6
5 1 0 6
5 2 1 7
6 1 0 6
6 2 0 6
7 1 0 6
7 2 0 8
8 1 0 6
8 2 0 6
8 3 1 7
9 1 0 6
9 2 1 8
10 1 0 6
10 2 1 7
end

I fitted a simple discrete-time survival model with logit function in Stata.
logit injury safescore,nocons

Now, I want to graph the fitted hazard curve at mean of safescore, mean+SD, and mean-SD in Stata.

For ease of interpretation, I centered the safescore at the mean and created the resulting variable c_safescore.

egen safescore_mean=mean(safescore)
egen safescore_sd=sd(safescore)
gen c_safescore=safescore-safescore_mean
gen c_safescore_upper=c_safescore+1*safescore_sd
gen c_safescore_lower=c_safescore-1*safescore_sd

Can someone help me to do this in Stata?

My own solution is below,

predict c_safescore_m
line c_safescore_m time, sort
||
line c_safescore_lower time, sort
||
line c_safescore_upper time, sort

It seems that something is incorrect.
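
The model as posted has no time structure, so the fitted hazard cannot vary over time. A sketch of one common discrete-time setup (note that this changes the model: time enters as dummies, and margins/marginsplot do the plotting; it is not a drop-in fix for the code above):

Code:
quietly summarize safescore
local lo  = r(mean) - r(sd)
local mid = r(mean)
local hi  = r(mean) + r(sd)

* discrete-time hazard with a dummy for each period and safescore as covariate
logit injury i.time c.safescore, nocons

* fitted hazard at mean - 1 SD, mean, and mean + 1 SD of safescore, by period
margins time, at(safescore = (`lo' `mid' `hi'))
marginsplot, xdimension(time)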

Unbalanced Panel - Structural Breaks

I have an unbalanced panel dataset. I am looking to do structural break analysis and came across the xtbreak and estat sb commands. However, these are for balanced panels and for time series, respectively. Is there any command for unbalanced panel data?

Best,

45-Degree Line with Marginsplot

Dear all, I am trying to add a diagonal/45-degree line to my marginsplot, but (function y=x, ...) does not work. Could anyone tell me whether it's possible, and if so how to add a diagonal line to a marginsplot? Thanks a lot in advance.
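
One route that usually works is to pass the function plot through marginsplot's addplot() option rather than adding it to margins itself. A syntax sketch using the auto data (the model is a placeholder, and range() should match the x axis of your own margins):

Code:
sysuse auto, clear
regress mpg c.weight
margins, at(weight = (2000(1000)5000))
marginsplot, addplot(function y = x, range(2000 5000) lpattern(dash))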

Using CRSP to calculate the Cumulative Abnormal Return

Hello everyone,

This is a very noob question, I am trying to get the quarterly CAR. However, my school's subscription only provides daily data.

I would greatly appreciate it if anyone can help me generate the quarterly CAR.
Can I just add up all the holding period returns (ret) during that quarter to calculate the CAR?

Thank you in advance.

Best.
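
Summing daily abnormal (not raw) returns within the firm-quarter is the usual definition of a CAR. A sketch, assuming daily data identified by permno and date, a daily return ret, and some expected-return benchmark exp_ret that you have already constructed (exp_ret is an assumption, not a CRSP variable):

Code:
gen qtr = qofd(date)                       // quarter of the daily date
format qtr %tq

gen ar = ret - exp_ret                     // daily abnormal return
bysort permno qtr: egen car = total(ar)    // quarterly CAR = sum of daily ARs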

True Random Effect Greene

Dear sir,

Recently I've been trying to use SFA to determine the efficiency of government spending. I'm using Greene's true random effects (TRE) model, with guidance from Belotti et al., "Stochastic frontier analysis using Stata" (sfpanel).

There are several questions that I would like to ask:
1. What is the frontier variable? Can I treat it as a control variable?
2. Is my syntax correct? I have no idea whether it is:
sfpanel lny lnx1 lnx2 (treated as control variables), model(tre) distribution(hnormal) usigma(lnu1 lnu2 (Govt Spending)) difficult rescale nsim(100) simtype(genhalton)

New package -rori- on the SSC

Thanks to Kit Baum, there is a new package -rori- on the SSC.

rori -- Immediate command for estimation of selection bias through relative odds ratio (ROR) with large sample confidence interval

Response bias is an often neglected consideration when evaluating the results of epidemiologic surveys, usually because of a paucity of information about nonrespondents. The effect of nonparticipation was described by a relative odds ratio (ROR), calculated as the OR (participants)/ OR (source population).
The relative odds ratio (ROR) is computed as the cross product ratio of the participation rates in the four exposure by outcome categories.

-rori- calculates the relative odds ratio (ROR) in a 2x2 table of outcome and exposure. -rori- also calculates the Prevalence ratios of the exposure groups. When exposure isn't binary, it is possible to specify N for the target and response populations.

References
  • Austin MA, Criqui MH, Barrett-Connor E, Holdbrook MJ. The effect of response bias on the odds ratio. Am J Epidemiol. 1981;114:137–43
  • Nohr EA, Frydenberg M, Henriksen TB, Olsen J. Does low participation in cohort studies induce bias? Epidemiology. 2006;17:413–8
  • Nohr EA, Liew Z. How to investigate and adjust for selection bias in cohort studies.

Drop observations if not meeting frequency criterion

My dataset consists of time-series observations of US firms with identifiers like gvkey, cusip and a constructed variable cusip_fiscalyear. I use data between 1995 and 2005, but need to drop all observations of a particular firm if it has fewer than 8 years of observations. I suppose I need to generate a kind of frequency variable? And I suppose I can use any one of the three identifiers.
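
A sketch, assuming gvkey identifies the firm and each firm-year is one row:

Code:
bysort gvkey: gen nyears = _N      // number of yearly observations per firm
drop if nyears < 8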

Saturday, May 28, 2022

Creating a 2 Y-axis line using two datasets

Hello Stata community;
I have 2 Stata datasets: one is called Republicans, and the other one is Sunspots.
Example of Republicans data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int Year byte Number_Republicans
1961 36
1963 34
1965 32
1967 36
1969 43
1971 44
1973 42
1975 37
1977 38
1979 41
end

Example of Sunspots data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int Year double Sunspots_Number
1960 112.3
1961  53.9
1962  37.6
1963  27.9
1964  10.2
1965  15.1
1966    47
1967  93.8
1968 105.9
1969 105.5
1970 104.5
1971  66.6
1972  68.9
1973    38
1974  34.5
1975  15.5
1976  12.6
1977  27.5
1978  92.5
1979 155.4
1980 154.6
end
As you can see, both datasets have the same X axis (Year) but different Y variables (Republicans number for the first dataset, Sunspots number for the second). My goal is to draw a two-line graph showing the evolution of the Sunspots number and the Republicans number. Since both datasets have the same X axis, the idea of merging them comes to mind already; I want to know if merging the 2 datasets is correct (I think it is). Second, I would like to get help on how to draw the two-line graph. I would love to have the Sunspots number on the left Y axis and the Republicans number on the right Y axis, and those two Y variables don't have the same scale.

Thanks very much for your help.
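
Merging on Year is indeed the usual approach; a sketch, assuming the two files are saved as sunspots.dta and republicans.dta:

Code:
use sunspots, clear
merge 1:1 Year using republicans, nogenerate

twoway (line Sunspots_Number Year, yaxis(1))           ///
       (line Number_Republicans Year, yaxis(2)),       ///
    ytitle("Sunspots number", axis(1))                  ///
    ytitle("Number of Republicans", axis(2))            ///
    legend(order(1 "Sunspots" 2 "Republicans"))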

sample size-new


From a published study we have the following information. There is only one group (56 preterm infants) in which the variable deltamet (mean change in methylation of the SLC6A4 gene) is measured: mean (SD) = 0.07 (1.23).

The result of the regression of ATL MPR (anterior lobe brain) on NICU-related stress and deltamet is reported below:

deltamet: beta = -0.48, SE = 0.16, t = -3.06, p = 0.01

I tried the sampsi command, this time choosing the case where there is only one group, but here too I had difficulty. Assuming a power of 80% and an alpha of 0.05, what is the sample size? In other words, how many subjects must be taken if the estimate derived from the sample must fall within 2 percentage points of the regression value indicated, with a confidence level (1 - alpha) of 95%? I don't know if sampsi is the right command. Can anybody help me? Thanks in advance, Tommaso

Reference group with multiple dichotomous variables

Hello Statalist

I have a question regarding how to interpret the reference group in my logistic regression. I have four independent variables and one dependent variable, all dichotomous. When I run a logistic regression with X1 and Y, the reference group is when X1 has the value 0. Is that correctly understood? Furthermore, how do I show this in an output table in my assignment? Since the variables are dichotomous, when X1 = 0, each of X2, X3 and X4 is either 1 or 0, because they are mutually exclusive. Should I write that X2, X3 and X4 together are the reference group, or just that X1 = 0 is the reference group?

If I run a multiple regression with all the variables, I can exclude one, which would be the reference group (k - 1).

Any advice or input is much appreciated!

Portfolio average

Hi, I'm currently writing my bachelor thesis and I'm new to Stata. I have monthly data and I want to sort the data monthly in 5 portfolios according to their leverage change. To sort the data I have used the following code:
Code:
bys year month: astile change5 = lvchange_abs, nq(5)
A sample of my data is:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str12 isin int year byte month float lvchange_abs double(ri change5)
"GB0004052071" 1988  7 .003303988   9.566951 2
"GB0001826634" 1988  8 .001834629  -8.154499 1
"GB0009223206" 1988  8 .016871154  -11.46245 3
"GB0030913577" 1988  8 .003004034  -5.144036 1
"GB0007188757" 1988  8  .03863934  -2.752304 4
"GB0033195214" 1988  8   .0211871 -14.440441 4
"GB00B61TVQ02" 1988  8 .009517585 -2.8571427 2
"GB00B1WY2338" 1988  8  .04266421 -10.276688 5
"GB0004052071" 1988  8 .003303988 -3.2786906 2
"GB00B1WY2338" 1988  9  .04266421  14.096916 5
"GB0004052071" 1988  9 .003303988  7.3446393 2
"GB0001826634" 1988  9 .001834629   1.869154 1
"GB0030913577" 1988  9 .003004034  5.2060723 1
"GB0033195214" 1988  9   .0211871  6.4669967 4
"GB00B61TVQ02" 1988  9 .009517585  3.9215684 2
"GB0009223206" 1988  9 .016871154  11.160719 3
"GB0007188757" 1988  9  .03863934   7.679951 4
"GB0001826634" 1988 10 .001834629 -1.8348575 1
"GB00B1WY2338" 1988 10  .04266421   1.544416 5
"GB0030913577" 1988 10 .003004034  1.2371182 1
"GB0033195214" 1988 10   .0211871  2.0080209 4
end
I want to calculate the average return (ri) of each portfolio per month, but I don't know how to do this. It would be something like the mean for each month, for each year, for each portfolio (ranging from 1 to 5). Is there an easy way to do this? I would also like an if condition so that the mean is only calculated if a portfolio has more than 5 companies in it.

Thanks
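
A sketch using the variables shown above; the mean is set to missing whenever a month-portfolio cell has 5 or fewer firms:

Code:
bysort year month change5: egen n_firms = count(ri)
bysort year month change5: egen mean_ri = mean(ri)
replace mean_ri = . if n_firms <= 5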

Sample size


Hello everyone, I ask for advice on the calculation of the sample size. From a previous study (which considered 56 highly preterm infants) it was found that these infants have on average a value of the "slc684" gene of 0.07 (SD = 1.23). We have only one group.
Is this command I wrote to define the sample size correct?

Code:
sampsi 0.07 0.9, sd1(1.23) onesample

Estimated sample size for one-sample comparison of mean to hypothesized value
Test Ho: m = .07, where m is the mean in the population
Assumptions:
         alpha = 0.0500  (two-sided)
         power = 0.9000
 alternative m = .9
            sd = 1.23
Estimated required sample size:
             n = 24

I have entered the value 0.9 as the alternative hypothesis. Is that correct? Thanks to everybody.

Friday, May 27, 2022

Running Multinomial Probit on unbalanced Panel data

I have unbalanced panel data on household cooking energy, with total observations equal to 1762 as follows:
Count \ Year    2010   2013   2016   2019
1                 62     12     17    422
2                 57     55     20      8
3                307    308    302      7
4                 46     46     46     46
Total            472    421    385    483
I would like to use multinomial probit to analyze the choice determinants in Stata 16 (I have 4 alternatives). I set my time variable, Year. First, I need to know if I can still go ahead and use this unbalanced data, or if I should use only the households with count==3, as that is the largest group available. I also thought of analyzing the count==4 data (46 households, 184 observations), just to be able to use the most recent data, but I am not sure if this will be enough for estimation.

Also, I haven't been able to find a command for multinomial probit on panel data in Stata 16. Please assist with this as well.
Thank you

Reshaping long for variables with two indexes

Dear Statalisters,

I am having a little issue with the -reshape long- command, as I'm discovering it for the first time and I'm not quite sure how it works exactly. To put it briefly, I ran a regression with an interaction term between my first variable (var1, 8 values) and my second variable (var2, 7 values) and I generated 8*7 = 56 coefficients named coef_`i'_`j', i being the value of var1 and j being the value of var2. Now, I'd like to reshape my coef variables long, that is, I'd like to have one variable coef whose values align with the appropriate var1 value and the appropriate var2 value. Here's a data example, even if I cannot show you the full picture because of memory restrictions:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float var1 byte var2 float(coef_3_1 coef_4_5 coef_2_7)
1 5 .2391724 -.037122227 .021247435
1 6 .2391724 -.037122227 .021247435
1 5 .2391724 -.037122227 .021247435
1 7 .2391724 -.037122227 .021247435
1 8 .2391724 -.037122227 .021247435
1 7 .2391724 -.037122227 .021247435
1 7 .2391724 -.037122227 .021247435
1 9 .2391724 -.037122227 .021247435
1 9 .2391724 -.037122227 .021247435
1 7 .2391724 -.037122227 .021247435
1 5 .2391724 -.037122227 .021247435
1 5 .2391724 -.037122227 .021247435
1 3 .2391724 -.037122227 .021247435
1 7 .2391724 -.037122227 .021247435
1 9 .2391724 -.037122227 .021247435
1 8 .2391724 -.037122227 .021247435
1 5 .2391724 -.037122227 .021247435
1 9 .2391724 -.037122227 .021247435
1 7 .2391724 -.037122227 .021247435
1 1 .2391724 -.037122227 .021247435
1 5 .2391724 -.037122227 .021247435
1 7 .2391724 -.037122227 .021247435
1 5 .2391724 -.037122227 .021247435
1 2 .2391724 -.037122227 .021247435
1 7 .2391724 -.037122227 .021247435
end

Long story short, I'd like my data to look like this:
var1 var2 coef
1 1 coef_1_1
1 2 coef_1_2
1 3 coef_1_3
... ... ...
2 1 coef_2_1
... ... ...
4 1 coef_4_1
... ... ...
8 7 coef_8_7
If my coef variable had only one index, namely coef_`i', I'd know what to do, but I can't seem to find the appropriate code for a combination of two variables. Any help would be much appreciated!

Regards,

Hugo
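
One common pattern (a sketch, assuming each observation gets a row identifier) is to reshape with a string j() that captures both indexes at once and then split it:

Code:
gen long obs_id = _n
reshape long coef_, i(obs_id) j(spec) string    // spec takes values like "3_1"

split spec, parse(_) destring                   // -> spec1, spec2 as numbers
rename (spec1 spec2) (i_index j_index)

If the goal is then a single coef per row matched to the row's own var1 and var2, a "keep if i_index == var1 & j_index == var2" afterwards would do it.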

Split string variable by the last word that meets character length limit

I am trying to upload a csv to a website; however, it requires that the character length of the field not be greater than 500.

Since the field contains occupations, and splitting at the exact character limit would mean trimming occupation names, is there a way to split the field in a manner that respects the character limit while keeping full occupation names? A text example is below:

Code:
clear
input strL job
Sales/Marketing Officer, Manager/Assistant Manager, Customer Service Officer/Enumerator, Data Entry Operator, Teacher, Accountant/Cashier, Administration/Operations Officer/Clerk, Computer Operator, Receptionist/Front Desk Officer/ Telephone Operator, Supervisor/Controller, Lab Assistant, Software Developer/Graphic Designer/IT Specialist, Doctors/Nurses, Designer, Lawyer, Journalist/Media Officer, Armed Forces - Police, Army, Fireman, Security Guard,etc, Telemarketing Officer/Call Centre Agent, Research and Writing Jobs: Content Writer/Research Assistant/Analyst, Lab Assistant,
end
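
A sketch of one way to cut the field at the last ", " before the 500-character limit. It assumes at most 5 chunks are ever needed and that a ", " always occurs within any 500-character window; the chunk variables job_part1, job_part2, ... are new, made-up names:

Code:
gen strL rest = job
forvalues k = 1/5 {
    gen str500 job_part`k' = ""
    * short remainder: take all of it
    replace job_part`k' = rest if strlen(rest) <= 500
    replace rest = "" if strlen(rest) <= 500
    * long remainder: cut at the last ", " within the first 500 characters
    gen cutpos = strrpos(substr(rest, 1, 500), ", ") if strlen(rest) > 500
    replace job_part`k' = substr(rest, 1, cutpos) if strlen(rest) > 500
    replace rest = strtrim(substr(rest, cutpos + 1, .)) if strlen(rest) > 500
    drop cutpos
}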

Thursday, May 26, 2022

How to count firm number in panel data?

Hello,

I am having some trouble counting the number of firms in my panel data. I have 3741 fiscal-year observations during 1994-2006, and I want to count how many firms are in my sample.

I used the command below:
bysort fyear gvkey
count gvkey

"fyear" is the fiscal year and "gvkey" is the company ID.

But Stata always shows the error ": required r(100);".

So could you please tell me what I am doing wrong? Thank you very much!
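
A sketch: tag one row per firm and count the tags (assumes gvkey identifies the firm):

Code:
egen byte firm_tag = tag(gvkey)
count if firm_tag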




Trailing but not leading zeros in display format

I would like to display a set of probabilities at two decimal places but with no leading zeros. E.g. I would like .95 to display as .95 and .90 to display as .90. I don't want .90 to display as 0.90 or as .9. Is there a display format that accomplishes this or do I need to resort to string manipulation (which is what I've been doing)?
Code:
loc p95=.95
di %3.2f `p95'
di %3.2g `p95'

loc p90=.90
di %3.2f `p90'
di %3.2g `p90'
Results:
Code:
. loc p95=.95

. di %3.2f `p95'
0.95

. di %3.2g `p95'
 .95

.
. loc p90=.90

. di %3.2f `p90'
0.90

. di %3.2g `p90'
  .9
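
For what it is worth, the string route can be a one-liner; a sketch assuming values in [0, 1):

Code:
loc p90 = .90
di substr(string(`p90', "%4.2f"), 2, .)    // "0.90" -> ".90"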

GMM estimation and Agumented taylor rule

How to estimate the four parameters of the Augmented Taylor Rule
i_t=(1-ρ)α+(1-ρ)βπ_(t+n)+(1-ρ)γx_t+ρi_(t-1)+ε_t

using the generalized method of moments (GMM) in Stata? I need help, please.
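
A sketch of the interactive form of Stata's gmm command for this kind of partial-adjustment rule. The variable names ffr, inf_lead and gap, the placeholder time variable t, and the instrument list are all assumptions, and the data are assumed to be tsset:

Code:
tsset t    // t is a placeholder time variable

gmm (ffr - (1 - {rho})*{alpha} - (1 - {rho})*{beta}*inf_lead          ///
         - (1 - {rho})*{gamma}*gap - {rho}*L.ffr),                    ///
    instruments(L(1/4).ffr L(1/4).inf_lead L(1/4).gap) twostep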

Color for figure: twoway (tsline)

Hi all,

I am making a figure and I would like to differentiate the lines by color.
Specifically, I want the K2K CEMs to have a different color than CEM1-CEM4.

Is this possible?

Code:
twoway (tsline CEM1, connect(ascending) cmissing(n)) (tsline CEM2 CEM3 CEM4 CEM1K2K CEM2K2K CEM3K2K CEM4K2K, lcolor(dkgreen)), title("Greenium per conventional CEM and K2K CEM") ytitle(Greenium in %)
Thank you very much!! I am still a beginner but this forum makes me learn much quicker.

Best regards,

Rens
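
One way is to put the two groups in separate tsline layers so that each layer gets its own lcolor(); a sketch with the variable names from the post:

Code:
twoway (tsline CEM1 CEM2 CEM3 CEM4, lcolor(navy navy navy navy))                            ///
       (tsline CEM1K2K CEM2K2K CEM3K2K CEM4K2K, lcolor(dkgreen dkgreen dkgreen dkgreen)),   ///
    title("Greenium per conventional CEM and K2K CEM") ytitle("Greenium in %")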

graph x-axis not legible

I used the code below to generate the attached plot but the values on the x-axis are not legible. Kindly assist.



[attached plot omitted]

coefplot ///
(Depression, mcolor(green) ciopts(lcolor(green))) ///
(Dementia, mcolor(blue) ciopts(lcolor(blue))) ///
(Epilepsy, mcolor(red) ciopts(lcolor(red))) ///
(Other_complaints, mcolor(purple) ciopts(lcolor(purple))) ///
(Psychosis, mcolor(black) ciopts(lcolor(black))), ///
eform drop(_cons) xscale(log range(.5 2)) omitted ///
xline(1, lcolor(black) lwidth(thin) lpattern(dash)) ///
title("Changes in the physical domain of quality of life") xtitle("odds ratio") ///
text(1.8 1.9 "N=103", box fcolor(white) lcolor(black)) ///
msymbol(D) ciopts (recast (rcap))

Formatting datetime from one format to another in same variable

Hello everyone!

I am facing some issues with my timestamp variable. The data that I have received from the field are in 2 formats:

1) 17-04-2022 21:15
2) 2022-04-04:09:00:00.PM

I want to bring all the observations in the variable into a standard format.

Could someone please suggest some ways to go about the same? Thanks for your help!
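
A sketch, assuming the raw timestamps sit in a single string variable named stamp; each format gets its own clock() mask, and the second pass only touches rows the first could not parse:

Code:
gen double ts = clock(stamp, "DMYhm")                                        // 17-04-2022 21:15
replace ts = clock(subinstr(stamp, ".", " ", .), "YMDhms") if missing(ts)    // 2022-04-04:09:00:00.PM
format ts %tc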

Wednesday, May 25, 2022

How to weight the Gini coefficient decomposition?

hi all,

I want the Gini coefficient decomposition using Lerman-Yitzhaki's method,
but "descogini" doesn't support weights.

So I'm trying to use DASP's "diginis";
however, I still haven't found a command that applies weights.

Can you tell me how I can apply weights while using Lerman-Yitzhaki's method?
Thanks,

Accessing Files Created within Subprogram

I am writing a Stata .ado program that performs the same subroutine many times. As such, I would like to turn this subroutine into its own sub-program for efficiency and readability. However, this subroutine involves working with and rewriting multiple intermediate datasets that I now save as tempfiles. From my understanding tempfiles cannot be accessed outside of the program they are created in, so any tempfiles I create in the subroutine would not be accessible in the wider program (and vice-versa).

I know I could just save what I save now as tempfiles as regular datasets (.dta), but since I intend for this command to be used by others I’m not sure if it’s best to save additional (potentially large) files to an arbitrary location. What is the best way to handle this problem? Is there a way to save a dataset to the tempfile directory but not have it delete the file and local path automatically at the conclusion of the program? Is it worth trying to rewrite this subroutine as a program at all? I would like to avoid using Frames so the code can run with slightly earlier versions of stata.

Here is a toy example of the issue at play. I would like to access the intermediate datasets (my_temp1, my_temp2) in the large .ado file the program is called in. The following, as expected, gives an invalid file specification error.

Code:
capture program drop my_prog
program my_prog
    sysuse auto
    tempfile my_temp1
    save `my_temp1' // save intermediate tempfiles
    
    sysuse citytemp
    tempfile my_temp2
    save `my_temp2' // save second intermediate file
    end

clear

my_prog

use `my_temp1'
Thanks!
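
One workaround (a sketch of the general pattern, not the only option) is to let the calling program own the tempfiles and pass their names to the subroutine as arguments; a tempfile is only deleted when the program that declared it exits, so files declared by the caller outlive the sub-program:

Code:
capture program drop my_prog
program my_prog
    args file1 file2
    sysuse auto, clear
    save `"`file1'"', replace      // written into files owned by the caller
    sysuse citytemp, clear
    save `"`file2'"', replace
end

clear
tempfile my_temp1 my_temp2          // declared in the calling context
my_prog `"`my_temp1'"' `"`my_temp2'"'

use `"`my_temp1'"', clear           // still accessible here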

Overlay graphics in Stata / Export graphic with transparent background?

I have 2 graphics (attached, .gph and .png versions of each) that I created with a user-written command (gsa) in Stata 17. They stem from the same estimation, but unfortunately gsa can only plot 10 markers at a time -- so if I have, say, 50 markers to plot, I have to make 5 graphs. But what I really want (for publication) is 1 graph, since they all stem from the same estimation.

Attached are 2 examples of those graphs - everything on the graphic (e.g., the blue lines and dots, the range) is identical except for the placement of the gray markers at the bottom left.

(1) Is there some way to lay these 2 graphics perfectly over top of one another in Stata, to make 1 graphic with all the markers? I have made the background color "None," so if there WAS a way to overlay 2 graphics in Stata, I think you would see both sets markers.

(2) Alternatively, is there some way to export these .gph files such that they have no background / a transparent background in the exported version? If I could do this, I could overlay the graphics in some other program, e.g. Word. But when I export to .png as you can see, or .pdf, etc. there is a blue background. Whereas I need a transparent background, to be able to overlay them.

Thanks,
Leah

Fixed Effects: how to report in a table

Hi all,

I would like to ask a question, please, about how to report a fixed effects regression in a table.

Should I, for a fixed effects regression, report the same parameters as for a normal regression, or should I also report the sigma_u, sigma_e and rho values indicated by Stata?

Context: I want to compare FE and OLS in one table. Therefore not reporting these additional parameters that the FE regression gives would be optimal for comparison reasons, but I am not sure if that is allowed.

Thank you very much for your reply in advance, it is appreciated!

Best regards,

Rens

Tuesday, May 24, 2022

5*5 bivariate dependent sorting for portfolio creation

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str15 companies byte stock_id int date double(Returns Idiosyncraticvolatility Marketcapitalisation)
"3M IN Equity"    1 16648  .22000875  .142986 8969.836
"3M IN Equity"    1 16679  .00594774 .0442478 9023.345
"3M IN Equity"    1 16709  .09604762 .1154085 9933.002
"3M IN Equity"    1 16740 -.18612806 .0927586 8246.053
"3M IN Equity"    1 16770  .04742311 .0760012 8646.527
"3M IN Equity"    1 16801  .10965069 .0958403 9648.558
"3M IN Equity"    1 16832  .07151935 .0495798 10363.89
"3M IN Equity"    1 16860  .34647638 .1465515 14655.33
"3M IN Equity"    1 16891  .11756522 .0883265 16483.66
"3M IN Equity"    1 16921 -.00785582 .1280992 16084.31
"3M IN Equity"    1 16952 -.27436253 .1219167 12430.47
"3M IN Equity"    1 16982  .03979741 .1011703 12935.15
"3M IN Equity"    1 17013 -.03140462 .0735412 12535.24
"3M IN Equity"    1 17044  .01979822 .0827738 12785.89
"3M IN Equity"    1 17074  .14347082 .1185964 14758.41
"3M IN Equity"    1 17105  .03962137  .083371 15354.89
"3M IN Equity"    1 17135  .09906806 .0820645 16953.97
"3M IN Equity"    1 17166  .00860078 .0466454 17100.42
"3M IN Equity"    1 17197  .11708011 .0865417 19224.46
"3M IN Equity"    1 17225  .02308274  .094335 19673.37
"3M IN Equity"    1 17256 -.10203222 .0797749 17765.06
"3M IN Equity"    1 17286  .07274823 .0691696 19105.61
"3M IN Equity"    1 17317  .12631571 .0807489 21677.99
"3M IN Equity"    1 17347  .07532471 .1245477 23373.96
"3M IN Equity"    1 17378 -.13217596 .0606982 20479.95
"3M IN Equity"    1 17409 -.04681604 .0867076 19543.26
"3M IN Equity"    1 17439    .044448 .0637702 20431.33
"3M IN Equity"    1 17470  .21749273 .1530644 25395.25
"3M IN Equity"    1 17500 -.13973865 .0837717 22083.34
"3M IN Equity"    1 17531  .16032942 .1357227 25923.58
"AACL IN Equity"  2 16648  .09542995 .0978906 370.1137
"AACL IN Equity"  2 16679  .19521217 .1228615 449.8987
"AACL IN Equity"  2 16709 -.10371983 .1566531 405.5737
"AACL IN Equity"  2 16740  -.1293248 .1224561  356.373
"AACL IN Equity"  2 16770  .02214113 .0605402 364.3515
"AACL IN Equity"  2 16801  .06818151 .1264464   390.06
"AACL IN Equity"  2 16832  .07864313 .1213397  421.974
"AACL IN Equity"  2 16860  .01873104 .1200355 429.9525
"AACL IN Equity"  2 16891 -.11920157 .0821024 381.6382
"AACL IN Equity"  2 16921  .54170286 .0743389 649.8045
"AACL IN Equity"  2 16952  -.2787134 .1243766   496.44
"AACL IN Equity"  2 16982 -.23889191  .101902 390.9465
"AACL IN Equity"  2 17013  .06368782 .1569841  416.655
"AACL IN Equity"  2 17044  .17520409 .1854097   496.44
"AACL IN Equity"  2 17074  .12095261 .1790345  560.268
"AACL IN Equity"  2 17105  .08489944 .1328724  609.912
"AACL IN Equity"  2 17135  .15394302  .176113 711.4162
"AACL IN Equity"  2 17166  .06336961 .1808397 757.9575
"AACL IN Equity"  2 17197  .19202095  .160783  918.414
"AACL IN Equity"  2 17225 -.12146984 .1425292 813.3637
"AACL IN Equity"  2 17256 -.05373425 .1655384 770.8117
"AACL IN Equity"  2 17286  .18910319 .1126103 931.2682
"AACL IN Equity"  2 17317 -.00764457 .1376465 924.1762
"AACL IN Equity"  2 17347 -.02919915 .0957982 897.5812
"AACL IN Equity"  2 17378 -.00992564 .1348592 888.7162
"AACL IN Equity"  2 17409  .00398209 .0735481 892.2622
"AACL IN Equity"  2 17439  .00495541  .074697 896.6947
"AACL IN Equity"  2 17470  .02441528 .1254344 918.8572
"AACL IN Equity"  2 17500 -.05554229 .1779125 869.2132
"AACL IN Equity"  2 17531  .19574458 .1349978 1057.151
"AAVAS IN Equity" 3 16648          .        .        .
"AAVAS IN Equity" 3 16679          .        .        .
"AAVAS IN Equity" 3 16709          .        .        .
"AAVAS IN Equity" 3 16740          .        .        .
"AAVAS IN Equity" 3 16770          .        .        .
"AAVAS IN Equity" 3 16801          .        .        .
"AAVAS IN Equity" 3 16832          .        .        .
"AAVAS IN Equity" 3 16860          .        .        .
"AAVAS IN Equity" 3 16891          .        .        .
"AAVAS IN Equity" 3 16921          .        .        .
"AAVAS IN Equity" 3 16952          .        .        .
"AAVAS IN Equity" 3 16982          .        .        .
"AAVAS IN Equity" 3 17013          .        .        .
"AAVAS IN Equity" 3 17044          .        .        .
"AAVAS IN Equity" 3 17074          .        .        .
"AAVAS IN Equity" 3 17105          .        .        .
"AAVAS IN Equity" 3 17135          .        .        .
"AAVAS IN Equity" 3 17166          .        .        .
"AAVAS IN Equity" 3 17197          .        .        .
"AAVAS IN Equity" 3 17225          .        .        .
"AAVAS IN Equity" 3 17256          .        .        .
"AAVAS IN Equity" 3 17286          .        .        .
"AAVAS IN Equity" 3 17317          .        .        .
"AAVAS IN Equity" 3 17347          .        .        .
"AAVAS IN Equity" 3 17378          .        .        .
"AAVAS IN Equity" 3 17409          .        .        .
"AAVAS IN Equity" 3 17439          .        .        .
"AAVAS IN Equity" 3 17470          .        .        .
"AAVAS IN Equity" 3 17500          .        .        .
"AAVAS IN Equity" 3 17531          .        .        .
"ABB IN Equity"   4 16648          . .0943647 61875.16
"ABB IN Equity"   4 16679          . .0600698 67658.14
"ABB IN Equity"   4 16709          . .0662682 72445.16
"ABB IN Equity"   4 16740          . .0447659 71160.99
"ABB IN Equity"   4 16770          . .0968998 82337.05
"ABB IN Equity"   4 16801          . .0395569 81582.65
"ABB IN Equity"   4 16832          . .0981462 107191.8
"ABB IN Equity"   4 16860          . .0617232   107925
"ABB IN Equity"   4 16891          . .0691475 124095.7
"ABB IN Equity"   4 16921          . .0517329 131137.5
end
format %tdnn/dd/CCYY date

Hello Sir,
I would like to do a 5*5 bivariate sort to form portfolio return series. I need value-weighted as well as equal-weighted portfolio return series. I have attached my sample data above.
For value-weighted returns:
First of all, at every 30th June, we independently split all companies into five size groups, taking market capitalization as the base and using the 20th, 40th, 60th and 80th percentiles as breakpoints (quintile breakpoints). Further, we divide each group into five subgroups using the 20th, 40th, 60th and 80th percentile values of idiosyncratic volatility as breakpoints. By doing this 5*5 bivariate sort on size and idiosyncratic volatility, we get 25 portfolios. Then, every year from July to June, we calculate the monthly value-weighted return for all 25 portfolios. For weights, we use market capitalization on 30th June of every year, i.e. the portfolio-formation date. The same process is repeated each year. Hence, we get the series of value-weighted monthly returns for these 25 portfolios from the bivariate sort. In this way, we can see the impact of idiosyncratic volatility on stock returns while conditioning on size.
For equal-weighted return series:
Again, the equal weights are based on market capitalization on 30th June of every year.


I got the code below from an earlier post on the Stata forum. In this code, portfolios are created every month, but I would like to regenerate the portfolios at the end of every June, i.e. yearly, as I mentioned above. What would need to change in this code? I really need help.

capture program drop one_mdate
program define one_mdate
    gen cutoffs = .
    _pctile size, nq(5)
    forvalues i = 1/4 {
        replace cutoffs = r(r`i') in `i'
    }
    display `"`cutoffs'"'
    xtile size_quintile = size, cutpoints(cutoffs)
    drop cutoffs
    by size_quintile, sort: egen idiovol_quintile = xtile(idiovol), nq(5)
    tab *_quintile
    exit
end

// DOUBLE SORT EACH FISCAL YEAR
runby one_mdate, by(mdate) status

capture program drop one_weighted_return
program define one_weighted_return
    egen numerator = total(mcap*rt)
    egen denominator = total(mcap)
    gen vw_mean_rt = numerator/denominator
    exit
end

runby one_weighted_return, by(mdate size_quintile idiovol_quintile)

gen int portfolio_num = (size_quintile-1)*5 + idiovol_quintile
by mdate size_quintile idiovol_quintile, sort: egen ew_mean_rt = mean(rt)
keep mdate *_quintile *_mean_rt
by mdate *_quintile, sort: keep if _n == 1

Thank you
Regards
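
One possible adaptation (a sketch only, not a drop-in replacement): keep the monthly quintile code above, but take only the end-June assignments and merge them onto the following July-June months, so the portfolios are re-formed once a year. It assumes a monthly date mdate, a stock identifier stock_id, and that size_quintile and idiovol_quintile have already been created as in the code above:

Code:
* map every month to the end-June formation date that governs it
gen int june_of = cond(month(dofm(mdate)) >= 7, ///
    ym(year(dofm(mdate)), 6), ym(year(dofm(mdate)) - 1, 6))
format june_of %tm

* keep the June quintile assignments only
preserve
keep if month(dofm(mdate)) == 6
keep stock_id mdate size_quintile idiovol_quintile
rename mdate june_of
rename (size_quintile idiovol_quintile) (size_q_june idiovol_q_june)
tempfile formation
save `formation'
restore

* attach the June assignment to every month of the following July-June year
merge m:1 stock_id june_of using `formation', keep(master match) nogenerate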

Prediction interval (not confidence interval) calculations for glm w/ robust VCE model?

In reference to the two threads below, did anybody ever figure out how to get prediction intervals/individual intervals/the stdf command to work with margins in more complex models? I have a glm with poisson family and log link with robust variance estimator. I have been requested by a reviewer to add prediction intervals in one week but cannot find a solution as the predict + stdf command does not appear to work with robust VCE (https://www.statalist.org/forums/for...=1653449067985).


https://www.statalist.org/forums/for...argins-command
https://www.statalist.org/forums/for...able-after-mlr

Difficulty with margins: Wald estimate discrepency between ivreg2 and margins

Hello,
I am trying to estimate the Wald estimand and calculate standard errors using margins and suest, but I'm not able to exactly replicate what ivreg2 gives me or what I get by dividing the reduced-form coefficient by the first-stage coefficient. Does anyone know why?

ivreg2 is available from SSC. Documentation: http://www.repec.org/bocode/i/ivreg2.html

(Note: the reason I'm using margins is: after I get this to work, I'm going to be simulating some other methods which will require margins, such as Edward Norton's (2022, NBER) guidance on calculating marginal effects on an outcome that has been transformed.)

I simulate some data:

Code:
  clear all
  set obs 50000
  set seed 10107
    * randomly draw instrument values, idiosyncratic terms for each individual
  drawnorm Z 
  replace Z=Z>0
  drawnorm U0
  drawnorm U1
    * set parameters which shift potential outcomes
  scalar mu1 = 1
  scalar mu0 = 0
    * Potential outcomes
  gen Y0 = mu0 + U0
  gen Y1 = mu1 + U1
  gen tx = Y1 - Y0 // treatment effect
  sum tx
    * Treatment decision
  * let Utility = Y + Z * 1[A=1]
  * ie, Z reflects additional inducement (or deterence, if negative) for taking treatment
  gen relativeutility1 = Y1 - Y0 + Z
  gen A = relativeutility1 > 0
  tab A, m
    * Outcome 
  gen Y = Y1*A + Y0*(1-A) 

I estimate the local average treatment effect (LATE) using ivreg2:

Code:
 ivreg2 Y (A=Z) , r
  gen insample=e(sample)==1
I confirm by calculating Wald by hand:

Code:
  reg Y Z if insample==1
  estimates store RF
  scalar coef_RF = _b[Z]
  reg A Z if insample==1
  estimates store FS
  scalar coef_FS = _b[Z]
  di coef_RF / coef_FS
So far so good. But then I get a different point estimate and standard error when I use margins:


Code:
  suest RF FS, vce(robust)
    margins if insample==1, dydx(Z) expression(predict(equation(RF_mean)) / predict(equation(FS_mean))) 

I would greatly appreciate it if anyone has any idea why this last technique using margins doesn't yield the same estimate as ivreg2 or dividing the reduced-form coefficient by the first-stage coefficient. Thank You!

Best,
Nate

Problem with Gen & string/numeric variable conversion

Dear forum members,

I am new to Stata and having a problem that I can't seem to find a straightforward answer to. I'm dealing with a data set that has a mix of many string and numeric variables. So when I run code such as "gen X = Y-Z", I get a type mismatch error. I then tried to encode the string variable; the command goes through without an error, but the actual number in X is not correct (not the actual Y-Z calculation outcome). It seems to be a constant problem whenever I use a variable created with the "encode" command.

Could you please advise how to resolve this?

I appreciate your help.

Best,
Nathan
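
For string variables that actually hold numbers, the usual tool is destring (or the real() function) rather than encode, which just assigns arbitrary integer codes to the sorted string values. A sketch with placeholder names Y and Z:

Code:
* add ignore() if units, commas, or other characters are embedded in the strings
destring Y Z, replace
gen X = Y - Z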

Meta analysis using stata 15

How can I test/visualize subgroup differences? I use metan with the by(subgroup) option, but I can't get the difference between subgroups.
Can anyone help?

svy: tabulate seems to misprint the extended missing value .z

Hello Statalist members,

I am using Stata 17 (ver.10May2022). I found that svy: tabulate doesn't seem to work properly, if either the row variable or the column variable contains the extended missing value .z. For example:

Code:
clear all
set obs 10
generate w =1
generate x =rbinomial(3, .5)
replace x =.z if _n==1
replace x =.y if _n==2
 
quietly svyset [pw=w]
 
svy: tabulate x, missing
 
Number of strata =  1       Number of obs   = 10
Number of PSUs   = 10       Population size = 10
                            Design df       =  9
----------------------
        x | proportion
----------+-----------
        0 |         .1
        1 |         .2
        2 |         .5
       .y |         .1
          |
    Total |         .1
----------------------
Key: proportion = Cell proportion
As can be seen from the above table, the proportion of the category .z is printed in the last row with a label of Total. This occurs with svy: tabulate twoway too. Is this a bug, or is the command designed that way?

Thanks for any help on this.

Cheers,
Kirin

Monday, May 23, 2022

Scaling fitted values obtained from first stage of IV

Hi everyone:

I am running the following regression where I instrument my binary endogenous treatment variable adhd_dx with my instrument q.
Code:
svy:ivregress 2sls logtot age_6_10 age_10_13 famsz_0_4 rc2 rc3 rc4 rc5 rg2 rg3 rg4 ins2 ins3 (adhd_dx=q)
. However, the coefficient on adhd_dx is too big and doesn't seem reasonable in the context that I am studying (the dependent variable is log of expenditure)
Code:
------------------------------------------------------------------------------
             |                 BRR
      logtot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     adhd_dx |   4.061351    .654874     6.20   0.000     2.765473    5.357228
This means that the treatment effect is 100*(exp(4.06) - 1) percent!

To make sense of the coefficient, I did the following:

Code:
svy:reg adhd_dx age_6_10 age_10_13 famsz_0_4 rc2 rc3 rc4 rc5 rg2 rg3 rg4 ins2 ins3 q
Then I obtained the fitted values, yhat, and scaled yhat:
Code:
replace yhat=yhat*100
and then used the scaled fitted values in the second stage
Code:
svy:reg logtot age_6_10 age_10_13 famsz_0_4 rc2 rc3 rc4 rc5 rg2 rg3 rg4 ins2 ins3 yhat
I get:
Code:
------------------------------------------------------------------------------
             |                 BRR
      logtot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        yhat |   .0463319   .0059718     7.76   0.000     .0345147     .058149
Now, I have two questions:

1. Is scaling like this okay? I mean, if the original yhat is uncorrelated with unobservables, are the scaled yhats too, since the transformation is linear?

2. How would I interpret the new coefficient? A 1pp increase in adhd_dx leads to a 100*(exp(0.46) - 1) percent increase in expenditure - is this the correct interpretation?



Reducing repeated responses to find actual sample size from a multiple imputed dataset

Hi there
I am hoping someone can advise me on this complex dataset, which is derived from a dual-frame complex sample design provided by the IRS and conducted by the Federal Reserve; the link can be found at https://www.federalreserve.gov/econres/scfindex.htm. Because of the large amount of missing data, the Fed imputed replacement values beforehand and released five replicate datasets that contain these multiply imputed values. Hence, the apparent sample size of approx. 28885 is actually only 5777. They provided a replicate-weight dataset, from which I created an average weight to normalize the population weight to reflect the actual sample. The variable x42001 was given as the population weight (proportions representing the actual population).
I first generated a new weight variable, nwgt, by dividing x42001 by the product of the average of the weights multiplied by 5:
*this nwgt variable is the population normalized version of x42001 - these figures are population weighted
gen nwgt=0
replace nwgt=x42001/(22268.03*5)
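If 22268.03 is the mean of x42001 (an assumption), it can be computed on the fly instead of being hard-coded, replacing the two lines above:
Code:
* compute the average weight rather than typing it in by hand
quietly summarize x42001, meanonly
generate double nwgt = x42001 / (r(mean) * 5)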

However, I am having 3 issues:
1. While I am able to apply the weight in descriptive statistics for categorical variables, I am not able to do so for continuous variables. For example, on home ownership, if I run tab townhome [iweight=nwgt], I get N=5777, which is what I wanted. But when I run tabstat age [iweight=nwgt], stat(count mean sd p50 min max), an error message appears: iweights not allowed.
2. There is also another weight variable, wgt, in the dataset (used as aweight=wgt), since the data oversampled the wealthy and white population. I want to show weighted vs. unweighted results, so how could I combine the iweight and the aweight in the same command?
3. When I run the analysis, in this case a multinomial logit, I can't seem to use iweights either: mlogit risktol age i.gender i.townhome [iweight=nwgt]

Would be grateful for advice.
thank you.
Yours truly
LG

Multiple-failure time analysis, event-specific coefficients?

Dear all,

I am using stcox along with a stratification variable to estimate the hazard functions for a model with recurrent events. However, Stata only displays one coefficient for each covariate, whereas I would like to know the coefficients for each stratum (= each event). In R, it seems to be possible to compute both, but the manual entry for stcox is not clear in this regard. Is there a way to obtain stratum- or event-specific regression coefficients?
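One approach sometimes used for this (a minimal sketch with hypothetical names: stratum marks the event number, x1 and x2 are covariates, and the data are assumed already stset) is to cross each covariate with the stratum variable while still stratifying the baseline hazard, which yields one coefficient per stratum:
Code:
* ibn. keeps every stratum level, so each stratum gets its own slope for x1 and x2
stcox ibn.stratum#c.x1 ibn.stratum#c.x2, strata(stratum)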

Thanks very much for your help, all the best.

Simon Bittmann

How to create control group against fraud firms?

Dear all,
I want to create a control sample of non-fraud firms against firms that committed fraud. I want the matching to be based on the firm's assets (at). Since the year of fraud differs for every firm, I don't know how I can create a control sample for the fraud firms. Kindly advise, using the data below:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double at float fraud_year double fyear
32.335 . 1990
35.559 . 1991
41.976 . 1992
63.997 . 1993
93.811 . 1994
91072 . 1999
12.854 . 1990
16.243 . 1991
18.686 . 1992
18.293 . 1993
17.965 . 1994
17.851 . 1995
17.215 . 1996
18.315 . 1997
20.417 . 1998
22.774 1 1999
21.473 . 2000
22.219 . 2001
17.269 . 2002
18.985 . 2003
18.463 . 2004
20.212 . 2005
17.735 . 2006
21.38 . 2007
17.761 . 2008
21.476 . 2009
89300.7 . 1990
91987.6 . 1991
89928.2 . 1992
100036.7 . 1993
94172.5 . 1994
84323.7 . 1995
92912.9 . 1996
96000.6 . 1997
105148.1 . 1998
112839 . 1999
47445.7 . 2000
43255.1 . 2001
40047.5 . 2002
40950.2 . 2003
42133.7 . 2004
44364.6 . 2005
47626.4 . 2006
50724.7 . 2007
35852.5 . 2008
38550.4 . 2009
2835 . 1990
2767.2 . 1991
2642.7 . 1992
2793.8 . 1993
2945.7 . 1994
2942.4 . 1995
781.415 . 1990
859.281 . 1991
1092.076 . 1992
1006.126 . 1993
1112.929 . 1994
1158.684 . 1995
1072.709 1 1996
1058.928 . 1997
3405.517 . 1998
3132.349 . 1999
2.458 . 1990
1.817 . 1991
28.972 . 1992
21.769 . 1993
16.812 . 1994
137682 . 1990
146441 . 1991
175752 . 1992
94132 . 1993
97006 . 1994
107405 . 1995
108512 . 1996
120003 . 1997
126933 . 1998
148517 . 1999
154423 . 2000
151100 . 2001
157253 . 2002
175001 . 2003
192638 . 2004
113960 . 2005
127853 . 2006
149830 . 2007
126074 . 2008
124088 . 2009
58143.112 . 1990
69389.468 . 1991
79835.182 . 1992
101014.848 . 1993
114346.117 . 1994
134136.398 . 1995
148431.002 . 1996
163970.687 . 1997
194398 . 1998
268238 . 1999
306577 1 2000
492982 . 2001
561229 . 2002
end
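One possible starting point is a nearest-neighbour match on at within the fraud year. The sketch below is only an illustration: it assumes a firm identifier (here called firmid, which the excerpt does not show, so it is hypothetical), treats fraud_year == 1 as the flag for the fraud year, and matches with replacement.
Code:
* flag firms that ever committed fraud (hypothetical identifier: firmid)
bysort firmid: egen byte ever_fraud = max(fraud_year == 1)

* one row per fraud firm: its fraud year and assets in that year
preserve
keep if fraud_year == 1
keep firmid fyear at
rename (firmid at) (fraud_firmid fraud_at)
tempfile frauds
save `frauds'
restore

* candidate controls: firm-years of never-fraud firms in the same fiscal year
keep if !ever_fraud
joinby fyear using `frauds'
gen diff = abs(at - fraud_at)
bysort fraud_firmid (diff): keep if _n == 1   // closest control for each fraud firm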

Estimate GARCH-DCC with asymmetries

Has anyone had experience estimating GARCH-DCC models with asymmetries (GJR for example)? Can this be done within the mgarch command?
Thanks in advance!

Sunday, May 22, 2022

bysort query

Hello, I am trying to use bysort to generate the sequential and total number of lines of treatment by patient and by disease phase (CLL vs RT). The total just does not seem to show with my code. What am I doing wrong? The first line with _n works perfectly.

bysort studyid disease (Lineno): gen tx = _n if !inlist(lot_split, "No", "None", "untrea", "", "N/A", "NA")
bysort studyid disease (Lineno): gen txtot = _N if !missing(tx)

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float studyid str3 disease byte Lineno str102 lot_split
1 "cll" 1 "FCR"                         
1 "cll" 2 " Ibrutinib"                  
1 "cll" 3 ""                            
1 "cll" 4 ""                            
1 "cll" 5 ""                            
1 "cll" 6 ""                            
1 "cll" 7 ""                            
1 "cll" 8 ""                            
1 "rt"  1 "Venetoclax/Rituximab"        
1 "rt"  2 " Venetoclax"                 
1 "rt"  3 " Idelasilib/Rituximab"       
1 "rt"  4 " Obinutuzumab/Idelalisib"    
1 "rt"  5 ""                            
1 "rt"  6 ""                            
1 "rt"  7 ""                            
1 "rt"  8 ""                            
2 "cll" 1 "FCO"                         
2 "cll" 2 ""                            
2 "cll" 3 ""                            
2 "cll" 4 ""                            
2 "cll" 5 ""                            
2 "cll" 6 ""                            
2 "cll" 7 ""                            
2 "cll" 8 ""                            
2 "rt"  1 "ACP"                         
2 "rt"  2 " AVD"                        
2 "rt"  3 " BV-bendamustine"            
2 "rt"  4 " BV"                         
2 "rt"  5 " Pembrolizumab"              
2 "rt"  6 " Pembrolizumab/Acalabrutinib"
2 "rt"  7 ""                            
2 "rt"  8 ""                            
3 "cll" 1 ""                            
3 "cll" 2 ""                            
3 "cll" 3 ""                            
3 "cll" 4 ""                            
3 "cll" 5 ""                            
3 "cll" 6 ""                            
3 "cll" 7 ""                            
3 "cll" 8 ""                            
3 "rt"  1 "R-CHOP"                      
3 "rt"  2 " Ibrutinib"                  
3 "rt"  3 " BR-polatuzumab"             
3 "rt"  4 " Venetoclax"                 
3 "rt"  5 " Obinituzumab/Venetoclax"    
3 "rt"  6 " Venetoclax"                 
3 "rt"  7 ""                            
3 "rt"  8 ""                            
4 "cll" 1 ""                            
4 "cll" 2 ""                            
4 "cll" 3 ""                            
4 "cll" 4 ""                            
4 "cll" 5 ""                            
4 "cll" 6 ""                            
4 "cll" 7 ""                            
4 "cll" 8 ""                            
4 "rt"  1 "CHOP"                        
4 "rt"  2 " R-CHOP"                     
4 "rt"  3 " R-ICE"                      
4 "rt"  4 " Ibrutinib"                  
4 "rt"  5 " RT"                         
4 "rt"  6 ""                            
4 "rt"  7 ""                            
4 "rt"  8 ""                            
5 "cll" 1 ""                            
5 "cll" 2 ""                            
5 "cll" 3 ""                            
5 "cll" 4 ""                            
5 "cll" 5 ""                            
5 "cll" 6 ""                            
5 "cll" 7 ""                            
5 "cll" 8 ""                            
5 "rt"  1 ""                            
5 "rt"  2 ""                            
5 "rt"  3 ""                            
5 "rt"  4 ""                            
5 "rt"  5 ""                            
5 "rt"  6 ""                            
5 "rt"  7 ""                            
5 "rt"  8 ""                            
6 "cll" 1 "Chlorambucil"                
6 "cll" 2 " Rituximab/Chlorambucil"     
6 "cll" 3 " Ibrutinib"                  
6 "cll" 4 " Idelalisib/Rituximab"       
6 "cll" 5 " Venetoclax"                 
6 "cll" 6 ""                            
6 "cll" 7 ""                            
6 "cll" 8 ""                            
6 "rt"  1 "Venetoclax"                  
6 "rt"  2 " Pembrolizumab"              
6 "rt"  3 ""                            
6 "rt"  4 ""                            
6 "rt"  5 ""                            
6 "rt"  6 ""                            
6 "rt"  7 ""                            
6 "rt"  8 ""                            
7 "cll" 1 ""                            
7 "cll" 2 ""                            
7 "cll" 3 ""                            
7 "cll" 4 ""                            
end

Elasticities

Hello,

I would like to estimate the export elasticity and the import elasticity of some countries using Stata.
I have the bilateral exports and imports of these countries for one year.
How can I estimate these elasticities? Which command should I use in Stata?
Which other data do I need to estimate these elasticities?
Sorry about this basic question. Hope you can help me.



Would constant annual variables across firms in a year be fully absorbed by the year dummy?

Hi all,
I am running the following regression using reghdfe written by @Sergio Correia:

reghdfe f.Dependent U c.U#c.Q Q $xlist i.year, absorb(firmcode) vce(cluster firmcode year)

where the dependent variable is the firm’s debt ratio and U and Q are my variables of interest and are constant for all firms in a year, and as you know, c.U#c.Q is the interaction between U and Q. I have 20 years of unbalanced data for 5123 firms.

When I run my regression, one-third of the year dummies (the latest years) are omitted because of collinearity; they are only omitted when I add U, Q, or their interaction.

Is it correct to assume that, because I put i.year at the end after all other variables, Stata will automatically absorb only the part of the year fixed effect that has not already been absorbed by U or Q (or the other annual macro variables like GDP)? Or would U and Q be fully absorbed by the year fixed effects?
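As a toy illustration of the collinearity at work (a minimal sketch using the Grunfeld data, not the original specification; reghdfe is from SSC): a regressor that is constant within year is an exact linear combination of the year dummies, so either it or some year dummies must be dropped.
Code:
webuse grunfeld, clear
bysort year: egen double U = mean(mvalue)      // U varies across years only
reghdfe invest c.U i.year, absorb(company) vce(cluster company)
* Stata/reghdfe reports either U or some year dummies as omitted (exact collinearity)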

I appreciate your help.
Thank you.
Mona

Storing the value of a variable from one observation in a local macro or scalar

Dear Statalist,

I'm using Stata 16.0 trying to store a variable value from one observation in a local macro (if the variable is a string) or in a scalar (if the variable is numeric).

Please see my example below. I cannot get the code to work because Stata does not allow "if" when defining locals and scalars. I also tried to use the list command (e.g., list make if price==4099), but was not sure how to store its output.

I'd appreciate your advice.

Thanks,
Brent



Code:
. *Here's an example
. sysuse auto
(1978 Automobile Data)

. 
. *In the first observation, the make variable is "AMC Concord" and the price variable is 4,099
. list make price if _n==1

     +---------------------+
     | make          price |
     |---------------------|
  1. | AMC Concord   4,099 |
     +---------------------+

. 
. *I would like to store AMC Concord in the local macro x, but this command doesn't work
. local x1 = make if price==4099
if not allowed
r(101);

end of do-file

r(101);

. do "C:\Users\fultonb\AppData\Local\Temp\STD627c_000000.tmp"

. *Also, I would like to store price in a scalar, but this command doesn't work
. scalar x2 = price if make=="AMC Concord"
if not allowed
r(101);

end of do-file

r(101);
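A minimal sketch of one possible workaround: find the observation number that satisfies the condition, then subscript the variable at that observation (if several observations match, this picks the first).
Code:
sysuse auto, clear
gen long obsno = _n
quietly summarize obsno if price == 4099, meanonly
local x1 = make[r(min)]      // string value into a local macro
scalar x2 = price[r(min)]    // numeric value into a scalar
display "`x1'"
display x2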

Collapse variables by country, category, year

Hi!

I have the following data (dataex below) and I just want to confirm if I am doing the right approach given what I want to find. Initially, I want to group all variables in the dataex by country, category (tech_intensity), and year. Therefore, I did the following command

Code:
 collapse Establishments Employment share_emp Wages OutputINDSTAT4 ValueAdded GrossFixed r_valworker r_output_worker lval_per_worker ln_1980  perc_wanted2, by(country1 tech_intensity year)
The main purpose is to later create variables for each of the grouping categories. For instance: r_valworker_high, r_valworker_low, r_valworker_medium, ln_1980_high, ln_1980_low, ln_1980_medium... I do this as follows (shown for the variable share_emp):

Code:
by year country1 (tech_intensity), sort: assert tech_intensity == _n
by year country1 (tech_intensity): gen share_emp_high = share_emp[3]
by year country1 (tech_intensity): gen share_emp_medium = share_emp[2]
by year country1 (tech_intensity): gen share_emp_low = share_emp[1]
And, at the end, I group (collapse) all variables by country1 and year. My end goal is to merge this new dataset with World Development Indicators (WDI) data. Am I taking the right steps? Thank you very much!
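For the second step, a reshape-based route may be worth a look (a minimal sketch, assuming tech_intensity is coded 1 = low, 2 = medium, 3 = high, as the assert above implies; every collapsed measure must be listed in the reshape varlist or be constant within country1 and year):
Code:
* after the collapse by country1 tech_intensity year, spread each measure across
* tech-intensity columns in one step instead of copying by subscript
reshape wide share_emp r_valworker ln_1980, i(country1 year) j(tech_intensity)
rename (share_emp1 share_emp2 share_emp3) (share_emp_low share_emp_medium share_emp_high)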


Code:
 * Example generated by -dataex-. To install: ssc install dataex
clear
input int(country1 year) float(tech_intensity share_emp perc_wanted2 ln_1980 lval_per_worker) long Employment double(Wages OutputINDSTAT4)
4 1973 1  .05658884 . 0 .  1287 . .
4 1973 1          0 . 0 .     0 . .
4 1973 1    .552038 . 0 . 12555 . .
4 1973 1          0 . 0 .     0 . .
4 1973 1  .18999253 . 0 .  4321 . .
4 1973 1  .03187794 . 0 .   725 . .
4 1973 1          . . 0 .     . . .
4 1973 1  .08534494 . 0 .  1941 . .
4 1974 1          0 . 0 .     0 . .
4 1974 1   .5064898 . 0 . 14243 . .
4 1974 1  .04715337 . 0 .  1326 . .
4 1974 1  .02560364 . 0 .   720 . .
4 1974 1          0 . 0 .     0 . .
4 1974 1  .17229117 . 0 .  4845 . .
4 1974 1  .07268589 . 0 .  2044 . .
4 1974 1          . . 0 .     . . .
4 1975 1  .04365534 . 0 .  1447 . .
4 1975 1          0 . 0 .     0 . .
4 1975 1   .5189766 . 0 . 17202 . .
4 1975 1          0 . 0 .     0 . .
4 1975 1   .1539552 . 0 .  5103 . .
4 1975 1  .07002353 . 0 .  2321 . .
4 1975 1          . . 0 .     . . .
4 1975 1  .02636819 . 0 .   874 . .
4 1976 1          0 . 0 .     0 . .
4 1976 1  .56875193 . 0 . 20520 . .
4 1976 1  .02328224 . 0 .   840 . .
4 1976 1  .06422018 . 0 .  2317 . .
4 1976 1  .04060534 . 0 .  1465 . .
4 1976 1          . . 0 .     . . .
4 1976 1  .12583497 . 0 .  4540 . .
4 1976 1          0 . 0 .     0 . .
4 1977 1          0 . 0 .     0 . .
4 1977 1          . . 0 .     . . .
4 1977 1  .05816822 . 0 .  2240 . .
4 1977 1  .16383183 . 0 .  6309 . .
4 1977 1   .5333818 . 0 . 20540 . .
4 1977 1          0 . 0 .     0 . .
4 1977 1  .02389052 . 0 .   920 . .
4 1977 1  .04536602 . 0 .  1747 . .
4 1978 1          0 . 0 .     0 . .
4 1978 1          . . 0 .     . . .
4 1978 1          0 . 0 .     0 . .
4 1978 1  .04207927 . 0 .  1772 . .
4 1978 1    .152288 . 0 .  6413 . .
4 1978 1  .02355679 . 0 .   992 . .
4 1978 1   .5435397 . 0 . 22889 . .
4 1978 1  .06133789 . 0 .  2583 . .
4 1979 1  .05224232 . 0 .  2218 . .
4 1979 1   .5590023 . 0 . 23733 . .
4 1979 1          . . 0 .     . . .
4 1979 1 .021174863 . 0 .   899 . .
4 1979 1  .05949689 . 0 .  2526 . .
4 1979 1   .1414641 . 0 .  6006 . .
4 1979 1          0 . 0 .     0 . .
4 1979 1          0 . 0 .     0 . .
4 1980 1          0 . 0 .     0 . .
4 1980 1          . . 0 .     . . .
4 1980 1   .1470306 . 0 .  5672 . .
4 1980 1  .06897893 . 0 .  2661 . .
4 1980 1          0 . 0 .     0 . .
4 1980 1  .05783239 . 0 .  2231 . .
4 1980 1  .52979755 . 0 . 20438 . .
4 1980 1 .019908236 . 0 .   768 . .
4 1981 1          . . 0 .     . . .
4 1981 1  .01993137 . 0 .   697 . .
4 1981 1  .24252217 . 0 .  8481 . .
4 1981 1          0 . 0 .     0 . .
4 1981 1  .09917071 . 0 .  3468 . .
4 1981 1   .4308836 . 0 . 15068 . .
4 1981 1 .063768946 . 0 .  2230 . .
4 1981 1          0 . 0 .     0 . .
4 1982 1  .22308142 . 0 .  6866 . .
4 1982 1 .018974593 . 0 .   584 . .
4 1982 1  .05880824 . 0 .  1810 . .
4 1982 1   .4349535 . 0 . 13387 . .
4 1982 1  .10432777 . 0 .  3211 . .
4 1982 1          0 . 0 .     0 . .
4 1982 1          . . 0 .     . . .
4 1982 1          0 . 0 .     0 . .
4 1983 1 .033794936 . 0 .   942 . .
4 1983 1          . . 0 .     . . .
4 1983 1 .005632489 . 0 .   157 . .
4 1983 1  .07085456 . 0 .  1975 . .
4 1983 1  .12796871 . 0 .  3567 . .
4 1983 1   .1822487 . 0 .  5080 . .
4 1983 1          0 . 0 .     0 . .
4 1983 1   .3611609 . 0 . 10067 . .
4 1984 1 .005718865 . 0 .   157 . .
4 1984 1  .14854479 . 0 .  4078 . .
4 1984 1   .1889411 . 0 .  5187 . .
4 1984 1 .034568172 . 0 .   949 . .
4 1984 1  .04498598 . 0 .  1235 . .
4 1984 1   .3718355 . 0 . 10208 . .
4 1984 1          0 . 0 .     0 . .
4 1984 1          . . 0 .     . . .
4 1985 1   .3619766 . 0 . 10468 . .
4 1985 1  .06995401 . 0 .  2023 . .
4 1985 1          0 . 0 .     0 . .
4 1985 1          . . 0 .     . . .
end

How would you export ANOVA tables

foreach var in Subclass {
anova ATF4Targetgenes `var'
export ??????????
}


Hello everyone, I have created a foreach loop to make many ANOVA tables at once; however, I am not sure how to export the ANOVA tables to my working directory. For graphs the command would be graph export; what would it be for ANOVA tables?
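One possibility (a minimal sketch, not necessarily the intended output format) is to write each displayed ANOVA table to its own plain-text file via a log:
Code:
* one text file per ANOVA table in the current working directory
foreach var in Subclass {
    log using "anova_`var'.log", replace text
    anova ATF4Targetgenes `var'
    log close
}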

Saturday, May 21, 2022

Criteria to apply xtpcse

Hello everyone. I have a dataset with N = 178 and T = 14.
Is it right to go for a panel-corrected standard errors (xtpcse) model?

Questions about pseudo-strata/psu in complex survey design

Hi:

This is not a Stata-related question, so please forgive me if this is not allowed.

I am dealing with the NHIS database, which has a complex survey design with clustering, stratification, and oversampling of certain sub-populations. The PSUs are mainly counties or contiguous counties, which are later stratified based on MSA status. The NHIS, however, only provides pseudo-stratum and pseudo-PSU codes, for confidentiality reasons. For the survey period that I am interested in, there were 304 strata and 482 PSUs. However, there are 300 pseudo-strata, each containing 2 pseudo-PSUs, so 600 pseudo-PSUs in total. My confusion stems from the fact that, in the manual, they said that the pseudo-PSUs were constructed by collapsing the original PSUs to create bigger clusters, so that it would be more difficult to identify any given cluster. If that is the case, then how come there are more pseudo-PSUs than original ones?

I am trying to include some measure of area-specific fixed effects in my panel regression, and I was thinking of using the pseudo-PSUs as a proxy for geographic area. It says in the paper linked below that "a given geographic area within a given NHIS sample PSU should have the same set of Pseudo-Stratum and Pseudo-PSU codes assignments if it is present in more than one NHIS annual microdata file." Doesn't that imply that the original PSUs are broken down into pseudo-PSUs, which would explain why there are more pseudo-PSUs than original ones? Then why does it say in the manual that the PSUs are merged or collapsed?

I have attached a link to their manual: http://www.asasrms.org/Proceedings/y2007/Files/JSM2007-000353.pdf.

I would be really grateful if any kind soul could help me out!

convergence not achieved in GMM Model

Dear Statalist


I ran the GMM regression of my variables in Stata and got the error "convergence not achieved".

In addition, the robust std. err., z, P>|z|, and [95% conf. interval] of one of my variables cannot be calculated.

Can you please explain why?

Thanks

Base category not displayed

Fellow Stata users, I am having trouble getting Stata to display the base category of the interaction between two variables that are both three-category variables. Ideally, I should be seeing 9 interaction cells, but I'm only seeing 6, and even then the base category is not displayed. When I output the results in an estout table, the base category is shown, but it is not the base I was expecting. See below:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(Accountability_over_Speed Approval_of_Protest Religiousity_3)
 2  0  2
 2  1  2
 0  0  2
 2  1  0
.b  0  1
 0  0  2
 2  1  2
 1  0  1
 2  0  2
 0  0  2
 2  1  2
 2 .b  2
 2  1  2
 0  0  2
 2  0  2
 0  0  2
 0  1  2
 0  1  1
 0  0  2
 0  0  2
 0  1  2
 0  0  2
.b  0  2
 0  1  1
 2  1  2
 2  1  2
 2  0  2
 2  0  0
 0  0  2
 0  0  2
 0  0  0
 2  1  2
 2  0  2
 2 .b  1
 0  1  1
 2  0 .a
 2  0  2
 0  0  2
 0  0  1
 0  1  1
 2  0  1
 2  1  2
 0  1  2
 2  0  1
 2  1  1
 2  0  2
 0  1  2
 0  0  1
 2  1  1
 2  0  2
 2 .b  1
 0  0  2
 0  0  1
 0  0  1
.b  0  1
 0  1  1
 0  0  2
 0  0  2
 2  0  1
 0  0  2
 2  0  1
 0  1  2
 0  1  1
.b  0  1
 2  1  2
 1  0  2
 2  0  1
 0  0  2
 0  1  0
 2  1  1
 0 .b .b
 0  0  2
 0  0  2
 0  0  1
 2  0  1
 2  1  2
 2  0  2
 0  1  2
.b  0  2
 2  1  2
 0  1  1
 0  0  2
 1 .b .b
 0  0  1
 2  0  1
 2  1  2
 0  0  0
 0  0  1
.b .b  2
 2  1  2
 2  0  1
 0  1  2
 0  0  2
 2  1  2
 2  0  1
 0  0  2
 2  0  2
 0  0  2
.b  0  2
 0  1  2
end
label values Accountability_over_Speed Accountability_over_Speed
label def Accountability_over_Speed 0 "Speed", modify
label def Accountability_over_Speed 1 "Unsure", modify
label def Accountability_over_Speed 2 "Accountability", modify
label values Approval_of_Protest Approval_of_Protest
label def Approval_of_Protest 0 "No", modify
label def Approval_of_Protest 1 "Yes", modify
label values Religiousity_3 Religiousity_3
label def Religiousity_3 0 "Not Religious", modify
label def Religiousity_3 1 "Somewhat Religious", modify
label def Religiousity_3 2 "Highly Religious", modify
logit Approval_of_Protest i.Accountability_over_Speed##i.Religiousity_3, base
Results attached
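Two reporting tweaks that may be relevant here (a minimal sketch, not a diagnosis of the missing cells): the allbaselevels option makes Stata list the base categories explicitly, and the ib#. prefix chooses which level is treated as the base; interaction cells with no observations are still omitted, which can reduce the number of displayed terms.
Code:
* show base levels explicitly and set the expected bases (level 0 in both cases)
logit Approval_of_Protest ib0.Accountability_over_Speed##ib0.Religiousity_3, allbaselevels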


Panel Data - Random effect model vs Pooled OLS

Hi Statalist

This is my first post, so bear with me if I make some mistakes.

I am conducting a Panel Data regression by looking at factors which may have impacted the U.S. Stock market during the first wave of COVID-19 (02.01.2020 - 12.06.2020).


First of all, I ran a Hausman test in order to choose between the fixed- and random-effects models, which gave a value of 1, indicating a random-effects model.
After the Hausman test, I then ran xttest0 (the Breusch-Pagan Lagrange multiplier test) in order to choose between random effects and pooled OLS, which indicated a pooled OLS model.
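For reference, a minimal sketch of that test sequence (hypothetical names: y, x1, x2, with panel identifier firm and time variable date):
Code:
xtset firm date
xtreg y x1 x2, fe
estimates store fe
xtreg y x1 x2, re
estimates store re
xttest0          // Breusch-Pagan LM: H0 of no panel-level variance favours pooled OLS
hausman fe re    // H0: the RE estimator is consistent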

However, I have a hard time understanding why my dataset isn't appropriate for a random-effects model. My data vary both across time and across firms (cross-section and time-series data).
Below is a short screenshot of the Excel file I am uploading to Stata.


I hope my question makes sense.

Best regards Victor




ARIMA Model

I am trying to find the ARIMA model that has uncorrelated residuals. I plotted the AC and PAC. I do not observe a decaying pattern in either case (PAC, AC); rather, both series have significant spikes at lags 12, 24, 36, and so on. What ARIMA process could I use?

The PAC and AC are displayed in the attached file.

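Spikes at lags 12, 24, 36 usually point to a seasonal pattern. A minimal sketch of a multiplicative seasonal specification (hypothetical variable y on monthly tsset data; the exact orders would still have to be chosen from the correlograms):
Code:
arima y, arima(1,0,1) sarima(1,0,1,12)   // seasonal AR/MA terms at lag 12
estat ic                                 // compare candidate specifications
predict res, residuals
wntestq res                              // portmanteau test for remaining autocorrelation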

reshaping OECD panel data with not unique within i(var1 var2) problem

Hi, Stata masters. I have a problem with reshaping common panel data: the reshape fails with a "not unique within i(var1 var2)" error on the output screen.

I know there are a lot of solutions on the internet; however, none of them are helpful.

Here's my data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(countryid year) float fdistock str7 fdistock_measure str3 fdistock_subject
1 2009  2124.148 "MLN_USD" "CHN"
1 2010  6829.962 "MLN_USD" "CHN"
1 2011   6524.39 "MLN_USD" "CHN"
1 2012  8969.823 "MLN_USD" "CHN"
1 2013         . "MLN_USD" "CHN"
1 2014 10307.453 "MLN_USD" "CHN"
1 2015 10150.508 "MLN_USD" "CHN"
1 2016  9672.938 "MLN_USD" "CHN"
1 2017  10492.16 "MLN_USD" "CHN"
1 2018  9812.253 "MLN_USD" "CHN"
1 2019 10867.372 "MLN_USD" "CHN"
1 2009  89786.51 "MLN_USD" "USA"
1 2010  97319.85 "MLN_USD" "USA"
1 2011  101819.1 "MLN_USD" "USA"
1 2012 109685.74 "MLN_USD" "USA"
1 2013  110407.8 "MLN_USD" "USA"
1 2014  109386.8 "MLN_USD" "USA"
1 2015  80235.26 "MLN_USD" "USA"
1 2016  89867.59 "MLN_USD" "USA"
1 2017 104398.25 "MLN_USD" "USA"
1 2018   91698.9 "MLN_USD" "USA"
1 2019 102163.52 "MLN_USD" "USA"
2 2012  3288.918 "MLN_USD" "CHN"
2 2013  2785.823 "MLN_USD" "CHN"
2 2014  2445.197 "MLN_USD" "CHN"
2 2015  2481.219 "MLN_USD" "CHN"
2 2016  2615.158 "MLN_USD" "CHN"
2 2017   3445.67 "MLN_USD" "CHN"
2 2018  3764.598 "MLN_USD" "CHN"
2 2019  3756.459 "MLN_USD" "CHN"
2 2020  3568.536 "MLN_USD" "CHN"
2 2012  6683.377 "MLN_USD" "USA"
2 2013  8441.594 "MLN_USD" "USA"
2 2014 10376.913 "MLN_USD" "USA"
2 2015 10890.582 "MLN_USD" "USA"
2 2016 10451.144 "MLN_USD" "USA"
2 2017 11849.364 "MLN_USD" "USA"
2 2018 13234.486 "MLN_USD" "USA"
2 2019 13385.756 "MLN_USD" "USA"
2 2020 13818.873 "MLN_USD" "USA"
2 2012  1.038 "PC_FDI" "CAN"
2 2013   .985 "PC_FDI" "CAN"
2 2014   .643 "PC_FDI" "CAN"
2 2015   .614 "PC_FDI" "CAN"
2 2016   .501 "PC_FDI" "CAN"
2 2017   .219 "PC_FDI" "CAN"
2 2018   .066 "PC_FDI" "CAN"
2 2019   .089 "PC_FDI" "CAN"
2 2020   .104 "PC_FDI" "CAN"
2 2012 14.604 "PC_FDI" "DEU"
2 2013 14.178 "PC_FDI" "DEU"
2 2014 13.046 "PC_FDI" "DEU"
2 2015   12.8 "PC_FDI" "DEU"
2 2016 14.331 "PC_FDI" "DEU"
2 2017 15.647 "PC_FDI" "DEU"
2 2018 14.407 "PC_FDI" "DEU"
2 2019 16.261 "PC_FDI" "DEU"
2 2020 18.732 "PC_FDI" "DEU"
2 2012  1.312 "PC_FDI" "FRA"
2 2013  1.074 "PC_FDI" "FRA"
2 2014  2.093 "PC_FDI" "FRA"
2 2015  2.021 "PC_FDI" "FRA"
2 2016   2.12 "PC_FDI" "FRA"
2 2017  2.113 "PC_FDI" "FRA"
2 2018  1.708 "PC_FDI" "FRA"
2 2019   .928 "PC_FDI" "FRA"
2 2020   .955 "PC_FDI" "FRA"
2 2012  2.445 "PC_FDI" "GBR"
2 2013  3.366 "PC_FDI" "GBR"
2 2014  3.591 "PC_FDI" "GBR"
2 2015  3.712 "PC_FDI" "GBR"
2 2016  4.156 "PC_FDI" "GBR"
2 2017  3.328 "PC_FDI" "GBR"
2 2018  2.663 "PC_FDI" "GBR"
2 2019  2.529 "PC_FDI" "GBR"
2 2020  2.746 "PC_FDI" "GBR"
2 2012  1.719 "PC_FDI" "ITA"
2 2013  1.419 "PC_FDI" "ITA"
2 2014  1.122 "PC_FDI" "ITA"
2 2015  1.203 "PC_FDI" "ITA"
2 2016  1.606 "PC_FDI" "ITA"
2 2017  1.621 "PC_FDI" "ITA"
2 2018  1.951 "PC_FDI" "ITA"
2 2019  1.861 "PC_FDI" "ITA"
2 2020  2.233 "PC_FDI" "ITA"
2 2012    .03 "PC_FDI" "JPN"
2 2013   .032 "PC_FDI" "JPN"
2 2014   .042 "PC_FDI" "JPN"
2 2015   .065 "PC_FDI" "JPN"
2 2016   .089 "PC_FDI" "JPN"
2 2017   .073 "PC_FDI" "JPN"
2 2018   .129 "PC_FDI" "JPN"
2 2019   .144 "PC_FDI" "JPN"
2 2020   .164 "PC_FDI" "JPN"
2 2012   3.19 "PC_FDI" "USA"
2 2013  3.641 "PC_FDI" "USA"
2 2014  4.758 "PC_FDI" "USA"
2 2015  5.306 "PC_FDI" "USA"
2 2016  5.305 "PC_FDI" "USA"
2 2017  5.069 "PC_FDI" "USA"
2 2018  5.755 "PC_FDI" "USA"
2 2019  5.594 "PC_FDI" "USA"
2 2020  5.816 "PC_FDI" "USA"
end
And the following data are in the form that I wish for.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte countryid int year double fdistock_mln_usd str3 fdistock_subject_mln_usd double fdistock_pc_fdi str3 fdistock_subject_pc_fdi
1 2009    2124.14794921875 "CHN"                   . ""  
1 2010     6829.9619140625 "CHN"                   . ""  
1 2011    6524.39013671875 "CHN"                   . ""  
1 2012     8969.8232421875 "CHN"                   . ""  
1 2013                   . "CHN"                   . ""  
1 2014        10307.453125 "CHN"                   . ""  
1 2015       10150.5078125 "CHN"                   . ""  
1 2016     9672.9384765625 "CHN"                   . ""  
1 2017      10492.16015625 "CHN"                   . ""  
1 2018     9812.2529296875 "CHN"                   . ""  
1 2019    10867.3720703125 "CHN"                   . ""  
1 2009       89786.5078125 "USA"                   . ""  
1 2010       97319.8515625 "USA"                   . ""  
1 2011      101819.1015625 "USA"                   . ""  
1 2012      109685.7421875 "USA"                   . ""  
1 2013       110407.796875 "USA"                   . ""  
1 2014       109386.796875 "USA"                   . ""  
1 2015       80235.2578125 "USA"                   . ""  
1 2016         89867.59375 "USA"                   . ""  
1 2017           104398.25 "USA"                   . ""  
1 2018       91698.8984375 "USA"                   . ""  
1 2019      102163.5234375 "USA"                   . ""  
2 2012       3288.91796875 "CHN"                   . ""  
2 2013   2785.822998046875 "CHN"                   . ""  
2 2014   2445.197021484375 "CHN"                   . ""  
2 2015   2481.218994140625 "CHN"                   . ""  
2 2016   2615.157958984375 "CHN"                   . ""  
2 2017      3445.669921875 "CHN"                   . ""  
2 2018   3764.597900390625 "CHN"                   . ""  
2 2019      3756.458984375 "CHN"                   . ""  
2 2020   3568.535888671875 "CHN"                   . ""  
2 2012      6683.376953125 "USA"                   . ""  
2 2013          8441.59375 "USA"                   . ""  
2 2014    10376.9130859375 "USA"                   . ""  
2 2015      10890.58203125 "USA"                   . ""  
2 2016    10451.1435546875 "USA"                   . ""  
2 2017    11849.3642578125 "USA"                   . ""  
2 2018     13234.486328125 "USA"                   . ""  
2 2019     13385.755859375 "USA"                   . ""  
2 2020     13818.873046875 "USA"                   . ""  
2 2012  1.0379999876022339 "CAN"  1.0379999876022339 "CAN"
2 2013   .9850000143051148 "CAN"   .9850000143051148 "CAN"
2 2014   .6430000066757202 "CAN"   .6430000066757202 "CAN"
2 2015   .6140000224113464 "CAN"   .6140000224113464 "CAN"
2 2016   .5009999871253967 "CAN"   .5009999871253967 "CAN"
2 2017  .21899999678134918 "CAN"  .21899999678134918 "CAN"
2 2018  .06599999964237213 "CAN"  .06599999964237213 "CAN"
2 2019  .08900000154972076 "CAN"  .08900000154972076 "CAN"
2 2020  .10400000214576721 "CAN"  .10400000214576721 "CAN"
2 2012  14.604000091552734 "DEU"  14.604000091552734 "DEU"
2 2013  14.178000450134277 "DEU"  14.178000450134277 "DEU"
2 2014  13.045999526977539 "DEU"  13.045999526977539 "DEU"
2 2015  12.800000190734863 "DEU"  12.800000190734863 "DEU"
2 2016  14.331000328063965 "DEU"  14.331000328063965 "DEU"
2 2017  15.647000312805176 "DEU"  15.647000312805176 "DEU"
2 2018  14.406999588012695 "DEU"  14.406999588012695 "DEU"
2 2019   16.26099967956543 "DEU"   16.26099967956543 "DEU"
2 2020   18.73200035095215 "DEU"   18.73200035095215 "DEU"
2 2012   1.312000036239624 "FRA"   1.312000036239624 "FRA"
2 2013  1.0740000009536743 "FRA"  1.0740000009536743 "FRA"
2 2014  2.0929999351501465 "FRA"  2.0929999351501465 "FRA"
2 2015  2.0209999084472656 "FRA"  2.0209999084472656 "FRA"
2 2016   2.119999885559082 "FRA"   2.119999885559082 "FRA"
2 2017    2.11299991607666 "FRA"    2.11299991607666 "FRA"
2 2018  1.7079999446868896 "FRA"  1.7079999446868896 "FRA"
2 2019   .9279999732971191 "FRA"   .9279999732971191 "FRA"
2 2020   .9549999833106995 "FRA"   .9549999833106995 "FRA"
2 2012   2.444999933242798 "GBR"   2.444999933242798 "GBR"
2 2013   3.365999937057495 "GBR"   3.365999937057495 "GBR"
2 2014  3.5910000801086426 "GBR"  3.5910000801086426 "GBR"
2 2015  3.7119998931884766 "GBR"  3.7119998931884766 "GBR"
2 2016   4.156000137329102 "GBR"   4.156000137329102 "GBR"
2 2017   3.328000068664551 "GBR"   3.328000068664551 "GBR"
2 2018  2.6630001068115234 "GBR"  2.6630001068115234 "GBR"
2 2019  2.5290000438690186 "GBR"  2.5290000438690186 "GBR"
2 2020   2.746000051498413 "GBR"   2.746000051498413 "GBR"
2 2012   1.718999981880188 "ITA"   1.718999981880188 "ITA"
2 2013  1.4190000295639038 "ITA"  1.4190000295639038 "ITA"
2 2014   1.121999979019165 "ITA"   1.121999979019165 "ITA"
2 2015  1.2029999494552612 "ITA"  1.2029999494552612 "ITA"
2 2016  1.6059999465942383 "ITA"  1.6059999465942383 "ITA"
2 2017   1.621000051498413 "ITA"   1.621000051498413 "ITA"
2 2018  1.9509999752044678 "ITA"  1.9509999752044678 "ITA"
2 2019  1.8609999418258667 "ITA"  1.8609999418258667 "ITA"
2 2020  2.2330000400543213 "ITA"  2.2330000400543213 "ITA"
2 2012 .029999999329447746 "JPN" .029999999329447746 "JPN"
2 2013  .03200000151991844 "JPN"  .03200000151991844 "JPN"
2 2014 .041999999433755875 "JPN" .041999999433755875 "JPN"
2 2015  .06499999761581421 "JPN"  .06499999761581421 "JPN"
2 2016  .08900000154972076 "JPN"  .08900000154972076 "JPN"
2 2017   .0729999989271164 "JPN"   .0729999989271164 "JPN"
2 2018   .1289999932050705 "JPN"   .1289999932050705 "JPN"
2 2019  .14399999380111694 "JPN"  .14399999380111694 "JPN"
2 2020    .164000004529953 "JPN"    .164000004529953 "JPN"
2 2012   3.190000057220459 "USA"   3.190000057220459 "USA"
2 2013  3.6410000324249268 "USA"  3.6410000324249268 "USA"
2 2014   4.757999897003174 "USA"   4.757999897003174 "USA"
2 2015   5.306000232696533 "USA"   5.306000232696533 "USA"
2 2016   5.304999828338623 "USA"   5.304999828338623 "USA"
2 2017   5.068999767303467 "USA"   5.068999767303467 "USA"
end
Thank you for reading; I would be grateful for any suggestions or feedback.
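One way to avoid the "not unique within i()" error is to include every identifying variable in i(). The sketch below is not exactly the wide layout shown above, but it gives one row per countryid-year-subject with one column per measure:
Code:
* use the measure as the j() variable and keep the subject as part of i()
generate measure = lower(fdistock_measure)
drop fdistock_measure
reshape wide fdistock, i(countryid year fdistock_subject) j(measure) string
rename (fdistockmln_usd fdistockpc_fdi) (fdistock_mln_usd fdistock_pc_fdi)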