Wednesday, August 31, 2022

Computing indices from panel data

I have panel data from 1995 to 2021 for 14 products. I want to normalize var1 by its value in 2008 across time. How can I do that?
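A hedged sketch of one way to do this, assuming the panel identifiers are called product and year (the post does not name them): spread each product's 2008 value across all of its years, then divide.

```stata
* carry each product's 2008 value of var1 to all of its years, then divide
bysort product (year): egen var1_2008 = max(cond(year == 2008, var1, .))
gen var1_index = var1 / var1_2008   // equals 1 in 2008 for every product
```

The cond() expression is var1 in 2008 and missing otherwise, so egen's max() simply picks out the 2008 value within each product.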

Mathematical Optimization in Stata: LP and MILP

Here's a 2013 presentation, posted on https://ideas.repec.org, describing two user-written commands that do mathematical optimization in Stata: lp (linear programming) and milp (mixed-integer linear programming).

They look very handy, but does anyone know where to find them? -findit lp- turns up nothing, and I have so far been unable to reach the author, Choonjoo Lee.

Interaction terms in Stata (without main effect)

Hi,

I am a little confused about the way Stata handles and designates interaction variables. To me (and to all textbooks), an interaction variable is X*W. But in Stata's notation, interaction variables are coded X#W, and they are not equivalent to X*W in a regression framework.

As an example, I would have expected that:
Code:
sysuse nlsw88.dta
gen south_married=south*married
reg wage  south south_married
gives the same results as
Code:
reg wage  i.south i.south#i.married
But they do not. The coefficients for the interaction term are the same but not the main effect (south). How do you interpret both south coefficients then?

Thanks for your help
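A hedged aside for comparison: the hand-built product term and the factor-variable notation do coincide once the married main effect appears in both models, for example:

```stata
sysuse nlsw88, clear
gen south_married = south*married
reg wage south married south_married
reg wage i.south##i.married    // same coefficient estimates as the line above
```

The ## operator expands to both main effects plus the interaction, which is why dropping the married main effect (as in the post's two commands) changes what the south coefficient measures.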


Two time-point measure, subgroups

Hi Statalist, I hope for some ideas - because I am stuck.

I have dyadic (child & caregiver) data with two time points (pre- and post-intervention). The dependent variable is zero-bound continuous (health service use) data.
Covariates are clinical and demographic variables (both categorical and continuous, both constant and time-varying).
I am aiming to look at treatment effects across subgroups of caregivers (male vs. female) of the children, to see whether the treatment effect differs.

Which analysis makes the most sense here, when I am interested in the intervention effect in a subgroup of caregivers? I have been thinking about reg with an interaction term: reg depvar1 i.treatment##i.cgsex##timepoint i.covariable c.covariable, vce(cluster id)

Does this make sense or am I on the very wrong track here?

Kind regards, Maria

metaprop: error on using cimethod(exact)

Hi,
I have been getting the error below when using cimethod(exact). But with cimethod(score) there is no such error. How do I resolve this issue? Are these methods interchangeable?

metaprop rall30 TotalSize, random by(author) cimethod (exact)
variable ul already defined
r(110);

Thanks

ci2 interferes with esttab

Hello,

After hours of troubleshooting code that used to work and stopped doing so, I figured out that the problem was introduced by my new use of the -ci2- command. After I run the ci2 command, the esttab command no longer works. It gives me the error message:
level() invalid -- invalid number, noninteger value
r(126);

Even after I run the commands -clear all- and -eststo clear-, this still happens. The only way I have found to make the problem go away is to close Stata and start a new session.

So, the question is, is there a more specific clear command to use to resolve this without restarting Stata?

I also hope this post helps others avoid the same problem.

Thanks,
Dan

logit and mixed regression

Hello everyone,

I am currently working on the Stata code for my bachelor thesis. I'm quite a beginner and would be very grateful for some help, ideally in as simple words as possible.

The data set I'm working with is a two-level data set, with the person ID as the first level and the household in which they live as the second level. Most of my variables differ only within the household. For example, I investigate the effect of birth order, so every child in a household shares the same household-level characteristics (all children live in a household with one level of assets, one level of parental education, and so on). So I have some sort of panel structure, even though I don't make observations over time.

Now I have two questions:
  1. I want to estimate the effects of the child's gender and of family size in one model. The child's gender varies only within the household and is compatible with household fixed effects, but family size varies between households. I found out that I somehow have to use the mixed command, but I'm not sure how exactly to implement it.
  2. I have two types of outcome variables: dummy variables and variables that can take more than two values. I found out that for the dummy variables I have to use the logit or logistic command. If I type "logit Dummyvaribale sex" or "logistic Dummyvaribale sex" I get the following result: outcome does not vary; remember: 0 = negative outcome, all other non-missing values = positive outcome, r(2000). I don't understand what is wrong here.


Thank you so much for your help - I'm very grateful for any advice!

Here is the relevant (not working) part of my code:
* options:
mixed highest_grade_compleated sex number_siblings|| hh_idn:, mle
mixed highest_grade_compleated sex || hh_idn: number_siblings
* sex alone in a fixed-effects model has a coefficient of 0.60

reg dummycur number_siblings
reg enrollment_age number_siblings
reg highest_grade_compleated number_siblings age
reg schooling_progression number_siblings

xtset hh_idn
xtlogit current_yn sex, fe
* OR
clogit current_yn sex, group(hh_idn)

Tuesday, August 30, 2022

Distinct occurrences of variable value within each individual in a panel

The server I’m working on does not allow me to install -dataex-, so I apologize for the lack of standardized example!

I’m working with a panel. I have individual-level purchase data from a chain of stores, and each person’s purchases were aggregated monthly. I have an id variable for each person (mvp_id) and an indicator variable for month (mthyr). I also have a variable that gives me the store ID where the person spent most money in each month (topstore1). I need to know how many people have (two or more) different top stores across the months in my dataset, versus how many people have the same top store across all months in my dataset.

Cox proportional test

My proportional-hazards assumption test, based on the global test derived from Schoenfeld residuals, was significant:

Code:
----------------------------------------------------------------
            |       chi2       df     Prob>chi2
------------+---------------------------------------------------
global test |     183.92       17       0.0000
----------------------------------------------------------------

What should I do?
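Not part of the post, but a common next step when the global test rejects, sketched here as a suggestion rather than a fix (x1 and x2 are placeholder covariate names): inspect the covariate-level tests, then consider a time-varying effect for the offenders.

```stata
* after stcox: per-covariate Schoenfeld residual tests
estat phtest, detail
* one option for an offending covariate x2: let its effect vary with log time
stcox x1 x2, tvc(x2) texp(ln(_t))
```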

Relative risk ratios and their CIs from teffects psmatch

I am trying to work out relative risk ratios from the ATET result on a binary outcome in teffects psmatch. I'd like to present the results as Relative risk ratio, absolute risk difference and number needed to harm - the public health audience the paper is intended for will have a greater understanding of these than presenting the ATET alone.

My understanding is that for a binary outcome the ATET is equivalent to the absolute risk difference between (i) those treated and (ii) those matched to the treated using PSM.

So to my thinking the ATET for my primary outcome, depression, is .0604, so the absolute risk difference is 6.0%. The proportion of those treated (received a new form of welfare) who were depressed was 51.7%, therefore the proportion among those untreated ought to be 51.7 - 6.0 = 45.7%. From this you can calculate the relative risk as 51.7/45.7 = 1.13.

My questions are:
1. Is this correct? I haven't found anything relating to calculating the proportion of those with the outcome among the untreated, or risk ratios.
2. If it is correct, how do I calculate confidence intervals?


Code:
teffects psmatch (mcs_bin) (switch sex_dv1 i.agecat1 i.hhdi_5_1 health1 i.jbstat2 i.marstat_dv1 smoker1 i.hiqual2 i.region1 i.ethn2 ca1 i.intdaty_dv1), atet nn(3)

proportion mcs_bin if switch==1
Thank you!

Replace an observation with a missing value

I would like to know the command to replace the observations recorded as "refusal" and "don't know" in the variable "resunsafe1_11" with a missing value (.).
I send an example of the dataset below.
Thank you very much in advance.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long(pidp hidp) byte(pno resunsafe1_11 resunsafe2_11 resunsafe3_11)
  68513407   71114404 1  0  0  0
  69394015   75595604 3  0  0  0
  69406247   75629604 1  0  0  0
  69451131   75711204 2  0  0  0
  69854367   76098804 1  0  0  0
 136376727  138264404 1  0  0  0
 136591607  139468004 1  0  0  0
 136591615  139468004 3  0  0  0
 137389927  143949204 1  0  0  0
 137436167  143996804 1  0  0  0
 137779567  144350404 1  0  0  0
 204068691  204462404 2  0  0  1
 204595687  207454404 1  0  0  1
 204595691  207454404 2  0  0  1
 205007775  209712004 1  0  0  0
 205420535  211724804 2  1  0  0
 205604807  211949204 1  0  0  0
 205786367  212119204 1  0  0  0
 205958407  212398004 1  0  0  0
 206224967  212554404 1  0  0  0
 272165931  272979204 2  0  0  0
 340061887  340360404 1  0  0  0
 340434527  342550004 1  0  0  0
 341325331  347330404 2  0  0  0
 341419847  347514004 1  0  0  0
 341537487  347629604 1  0  0  0
 341691167  347888004 1  0  0  0
 341764607  347949204 1  0  0  1
 342008727  348214404 1  0  1  0
 342014847  348228004 1  1  1  0
 342222247  348357204 1  0  0  0
 342222251  348357204 2  0  0  0
 342222259  348357204 4  0  0  0
 408041487  408292404 1  0  0  0
 408205367  409251204 1  0  0  0
 409638807  415854004 1  0  0  0
 409653091  415874404 2  0  0  0
 409693207  415922004 1  0  0  1
 409733327  415996804 1  0  0  0
 409779571  416030804 2  0  0  0
 409847567  416132804 1  0  0  0
 409897211  416200804 2  0  0  0
 410126371  416493204 2  0  0  0
 410131127  416500004 1  0  0  0
 410230407  416568004 1  0  0  0
 476000691  476006804 2  0  0  0
 478404487  484275604 1  0  0  1
 545284531  551337204 2  0  0  0
 545532047  551854004 1  0  0  0
 545532055  551854004 2  0  0  0
 545781607  551996804 1  0  0  0
 546314047  552595204 1  0  0  0
 546524847  552690404 1  0  0  1
 546761499  552942004 2  0  0  0
 546766931  552962404 2  0  0  0
 612193131  613292004 2  0  0  0
 612621527  616059604 1 -1 -1 -1
 612677971  616433604 2  0  0  0
 613438887  620194004 1  0  0  0
 613518455  620234804 3  0  0  0
 613783651  620479604 1  1  1  0
 613806087  620520404 1  0  0  0
 614375931  620982804 2  0  0  0
 680330491  681917604 2  0  0  0
 680954047  685202004 1  0  0  0
 680962211  685276804 2  0  0  0
 681007099  685542004 1  0  0  0
 681551087  687554804 1  0  0  0
 681585767  687561604 1  0  0  0
 681717687  687772404 1  0  0  0
 681717691  687772404 2  0  0  1
 681792487  687847204 1  1  0  0
 748594327  751481604 1  0  0  0
 749419847  755656804 1  0  0  1
 750244007  756336804 1  0  0  0
 750505807  756534004 1  0  0  0
 817670087  824071604 1  0  0  0
 817992411  824642804 2  0  0  0
 884224415  885197484 1  0  0  0
 884536527  886896804 1  0  0  0
 884616767  887325204 1  0  0  0
 885250527  890582404 1  0  0  0
 885392651  891051604 2  0  0  0
 885861167  891650004 1  0  0  0
 886011447  891826804 1  0  0  0
 886012127  891833604 1  0  0  0
 886305207  892166804 1  0  0  0
 886346691  892200804 2  0  0  0
 886428967  892289204 1  0  0  0
 886558847  892384404 1  0  0  0
 952242767  953455204 1  0  0  0
 953100931  958018004 2  0  0  0
 953649019  959643204 4  0  0  0
 953905367  959962804 1  0  0  1
 954174651  960377604 2  0  0  0
 954463647  960608804 1  1  0  1
1022476567 1028574804 1  0  0  0
1089592567 1095439204 1  0  0  0
1089592571 1095439204 2  0  0  0
1089644247 1095507204 1  1  1  0
end
label values resunsafe1_11 c_resunsafe1_11
label def c_resunsafe1_11 -1 "don't know", modify
label def c_resunsafe1_11 0 "not mentioned", modify
label def c_resunsafe1_11 1 "mentioned", modify
label values resunsafe2_11 c_resunsafe2_11
label def c_resunsafe2_11 -1 "don't know", modify
label def c_resunsafe2_11 0 "not mentioned", modify
label def c_resunsafe2_11 1 "mentioned", modify
label values resunsafe3_11 c_resunsafe3_11
label def c_resunsafe3_11 -1 "don't know", modify
label def c_resunsafe3_11 0 "not mentioned", modify
label def c_resunsafe3_11 1 "mentioned", modify
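A hedged sketch: in the example data "don't know" is stored as the negative code -1; "refusal" is assumed here to be another negative code such as -2 (it does not appear in the excerpt). mvdecode converts the listed codes to missing.

```stata
* set "don't know" (-1) and assumed "refusal" (-2) codes to missing
mvdecode resunsafe1_11, mv(-1 -2)
```

The same call with resunsafe1_11 resunsafe2_11 resunsafe3_11 as the varlist would handle all three variables at once.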

Count number of individuals that gave an answer in a questionnaire

I would like to know the code to count the number of individuals (each individual is identified by the number "pidp") who answer "mentioned" in the question represented by the variable "unsafe11" and also answer "mentioned" in the question represented by the variable "resunsafe2_11".
I send an example of the dataset below.
Thank you very much

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long(pidp hidp psu) byte(unsafe1 resunsafe1_11 resunsafe2_11 resunsafe3_11 resunsafe4_11 resunsafe5_11 resunsafe6_11)
  68513407   71114404  2996 0  0  0  0  0  0  0
  69394015   75595604  7105 0  0  0  0  0  0  0
  69406247   75629604  7123 0  0  0  0  0  0  0
  69451131   75711204  7190 0  0  0  0  0  0  0
  69854367   76098804  7792 0  0  0  0  0  1  0
 136376727  138264404  2724 0  0  0  0  0  0  0
 136591607  139468004  3156 0  0  0  0  0  0  0
 136591615  139468004  3156 0  0  0  0  0  0  0
 137389927  143949204  8919 1  0  0  0  0  0  0
 137436167  143996804  8987 0  0  0  0  0  0  0
 137779567  144350404  9495 0  0  0  0  0  0  0
 204068691  204462404  2140 0  0  0  1  0  0  0
 204595687  207454404  3172 0  0  0  1  0  0  0
 204595691  207454404  3172 0  0  0  1  0  0  0
 205007775  209712004  3988 0  0  0  0  0  0  0
 205420535  211724804 10962 0  1  0  0  0  0  0
 205604807  211949204 11244 0  0  0  0  0  0  0
 205786367  212119204 11461 0  0  0  0  0  0  0
 205958407  212398004 11723 0  0  0  0  0  0  0
 206224967  212554404 12119 0  0  0  0  0  0  0
 272165931  272979204  2327 0  0  0  0  0  0  0
 340061887  340360404  2127 0  0  0  0  0  0  0
 340434527  342550004  2847 0  0  0  0  0  0  0
 341325331  347330404  4599 0  0  0  0  0  0  0
 341419847  347514004 14121 0  0  0  0  0  0  0
 341537487  347629604 14294 0  0  0  0  0  0  0
 341691167  347888004 14522 0  0  0  0  0  0  0
 341764607  347949204 14630 0  0  0  1  0  0  0
 342008727  348214404 15006 0  0  1  0  0  1  0
 342014847  348228004 15015 0  1  1  0  0  0  0
 342222247  348357204 15274 0  0  0  0  0  0  0
 342222251  348357204 15274 0  0  0  0  0  0  1
 342222259  348357204 15274 0  0  0  0  0  0  0
 408041487  408292404  2095 0  0  0  0  0  0  0
 408205367  409251204  2407 0  0  0  0  0  0  0
 409638807  415854004 16094 0  0  0  0  0  0  0
 409653091  415874404 16115 0  0  0  0  0  0  0
 409693207  415922004 16174 0  0  0  1  0  0  0
 409733327  415996804 16235 0  0  0  0  0  0  0
 409779571  416030804 16303 0  0  0  0  0  0  0
 409847567  416132804 16403 0  0  0  0  0  0  0
 409897211  416200804 16476 0  0  0  0  0  0  0
 410126371  416493204 16834 0  0  0  0  0  0  0
 410131127  416500004 16841 0  0  0  0  0  0  0
 410230407  416568004 16987 0  0  0  0  0  0  0
 476000691  476006804  2010 0  0  0  0  0  0  0
 478404487  484275604 18940 0  0  0  1  0  0  0
 545284531  551337204  4498 0  0  0  0  0  0  0
 545532047  551854004 19529 0  0  0  0  0  0  0
 545532055  551854004 19529 0  0  0  0  0  0  0
 545781607  551996804 19851 0  0  0  0  0  0  0
 546314047  552595204 20661 0  0  0  0  0  0  0
 546524847  552690404 20974 0  0  0  1  0  0  1
 546761499  552942004 21324 0  0  0  0  0  0  1
 546766931  552962404 21332 0  0  0  0  0  0  0
 612193131  613292004  2378 0  0  0  0  0  0  0
 612621527  616059604  3218 0 -1 -1 -1 -1 -1 -1
 612677971  616433604  3338 0  0  0  0  0  1  0
 613438887  620194004 21493 0  0  0  0  0  0  0
 613518455  620234804 21610 0  0  0  0  0  0  0
 613783651  620479604 22003 0  1  1  0  0  0  0
 613806087  620520404 22036 0  0  0  0  0  0  0
 614375931  620982804 22886 0  0  0  0  0  0  0
 680330491  681917604  2637 0  0  0  0  0  0  0
 680954047  685202004  3933 0  0  0  0  0  0  0
 680962211  685276804  3957 0  0  0  0  0  0  0
 681007099  685542004  4053 0  0  0  0  0  0  0
 681551087  687554804 23449 0  0  0  0  0  0  0
 681585767  687561604 23500 0  0  0  0  0  1  0
 681717687  687772404 23699 0  0  0  0  0  0  0
 681717691  687772404 23699 0  0  0  1  0  0  0
 681792487  687847204 23809 0  1  0  0  0  0  0
 748594327  751481604  3133 0  0  0  0  0  0  0
 749419847  755656804 24858 0  0  0  1  0  0  0
 750244007  756336804 26180 0  0  0  0  0  0  0
 750505807  756534004 26569 0  0  0  0  0  0  0
 817670087  824071604 27171 0  0  0  0  0  0  0
 817992411  824642804 27600 0  0  0  0  0  0  0
 884224415  885197484  2446 0  0  0  0  0  0  0
 884536527  886896804  3046 0  0  0  0  0  0  0
 884616767  887325204  3214 0  0  0  0  0  0  0
 885250527  890582404  4510 0  0  0  0  0  0  0
 885392651  891051604 28589 0  0  0  0  0  0  0
 885861167  891650004 51325 0  0  0  0  0  0  0
 886011447  891826804 29468 0  0  0  0  0  0  0
 886012127  891833604 29469 0  0  0  0  0  0  0
 886305207  892166804 29906 0  0  0  0  0  0  0
 886346691  892200804 29976 0  0  0  0  0  0  0
 886428967  892289204 30103 0  0  0  0  0  0  0
 886558847  892384404 50120 0  0  0  0  0  0  0
 952242767  953455204  2502 0  0  0  0  0  0  0
 953100931  958018004  4134 0  0  0  0  0  0  0
 953649019  959643204 30609 0  0  0  0  0  0  1
 953905367  959962804 30999 0  0  0  1  0  0  1
 954174651  960377604 31425 0  0  0  0  0  0  0
 954463647  960608804 31911 0  1  0  1  0  0  1
1022476567 1028574804 33615 0  0  0  0  0  0  0
1089592567 1095439204 34368 0  0  0  0  0  0  0
1089592571 1095439204 34368 0  0  0  0  0  0  0
1089644247 1095507204 34444 0  1  1  0  0  0  0
end
label values psu c_psu
label values unsafe1 c_unsafe1
label def c_unsafe1 0 "not mentioned", modify
label def c_unsafe1 1 "Mentioned", modify
label values resunsafe1_11 c_resunsafe1_11
label def c_resunsafe1_11 -1 "don't know", modify
label def c_resunsafe1_11 0 "not mentioned", modify
label def c_resunsafe1_11 1 "mentioned", modify
label values resunsafe2_11 c_resunsafe2_11
label def c_resunsafe2_11 -1 "don't know", modify
label def c_resunsafe2_11 0 "not mentioned", modify
label def c_resunsafe2_11 1 "mentioned", modify
label values resunsafe3_11 c_resunsafe3_11
label def c_resunsafe3_11 -1 "don't know", modify
label def c_resunsafe3_11 0 "not mentioned", modify
label def c_resunsafe3_11 1 "mentioned", modify
label values resunsafe4_11 c_resunsafe4_11
label def c_resunsafe4_11 -1 "don't know", modify
label def c_resunsafe4_11 0 "not mentioned", modify
label values resunsafe5_11 c_resunsafe5_11
label def c_resunsafe5_11 -1 "don't know", modify
label def c_resunsafe5_11 0 "not mentioned", modify
label def c_resunsafe5_11 1 "mentioned", modify
label values resunsafe6_11 c_resunsafe6_11
label def c_resunsafe6_11 -1 "don't know", modify
label def c_resunsafe6_11 0 "not mentioned", modify
label def c_resunsafe6_11 1 "mentioned", modify
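A hedged sketch, using the variable names as they appear in the example data (the post writes "unsafe11", but the excerpt contains unsafe1): tag each qualifying person once, in case a pidp appears more than once, then count the tags.

```stata
* one flag per person meeting both "mentioned" conditions
egen both_tag = tag(pidp) if unsafe1 == 1 & resunsafe2_11 == 1
count if both_tag == 1
```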

Specifying a 'fill in the blank' variable in a loop

Hello,

I'm sure there is a function for this, but I'm having trouble finding it:
I'm trying to make a loop that behaves differently for each observation depending on the value of one of the variables.

I have this command right now,

Code:
foreach f in Allergy_Envir Allergy_Skin Allergy_Other Asthma Psoriasis Thyroiditis T1Diabetes Uveitis DownSyndrome {
    egen `f'_sensitivity = rowtotal(ind_`f'_w1 - ind_`f'_w5) if start_of_symptoms == 5
}
My variable, start_of_symptoms, ranges from 1 to 21. I want to create a loop that goes through all the values of start_of_symptoms and adds up the ind_`f'_w* variables accordingly.


I was trying to do this,

Code:
foreach f in Allergy_Envir Allergy_Skin Allergy_Other Asthma Psoriasis Thyroiditis T1Diabetes Uveitis DownSyndrome {
    foreach w in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 {
        replace `f'_sensitivity = rowtotal(ind_`f'_w1 - ind_`f'_w`w') if start_of_symptoms == `w'
    }
}
But I quickly realized you can't use the replace command like that.

Is there an easy way to set this up?


Thank you,
Jon
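A hedged sketch of one way to make the second attempt work: rowtotal() is an egen function, so build each row sum with egen into a temporary variable inside the loop, then copy it across (the disease list is abbreviated here, and waves are assumed to run out to w21).

```stata
foreach f in Allergy_Envir Allergy_Skin Allergy_Other {   // ... and the rest
    gen `f'_sensitivity = .
    forvalues w = 1/21 {
        tempvar rowsum
        * sum of ind_`f'_w1 through ind_`f'_w`w' for every observation
        egen `rowsum' = rowtotal(ind_`f'_w1 - ind_`f'_w`w')
        * keep it only where symptoms started in wave `w'
        replace `f'_sensitivity = `rowsum' if start_of_symptoms == `w'
        drop `rowsum'
    }
}
```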

Loop mixed model over a list of predictors and store the coefficients and P values

Hi,

I want to loop a mixed model over a list of predictors (fixed effects) and store the coefficients and P values in Stata v. 17.

Code:
sysuse auto, clear

xtmixed price mpg || foreign:, covariance(independent) vce(robust)

foreach var of varlist mpg-gear_ratio {
    xtmixed price `var' || foreign:, covariance(exchangeable) vce(robust)
}
Can anyone help me with how to proceed to store the coefficients and P values of the fixed effect for each model?

Thanks in advance,

Sergio
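A hedged sketch using postfile to collect the slope and its p-value from each model (the file name mixed_results is made up; with vce(robust) the reported test is a z-test, so the p-value below uses the normal distribution):

```stata
sysuse auto, clear
tempname memhold
postfile `memhold' str32 predictor double(b se p) using mixed_results, replace
foreach var of varlist mpg-gear_ratio {
    quietly xtmixed price `var' || foreign:, covariance(exchangeable) vce(robust)
    local b  = _b[`var']
    local se = _se[`var']
    post `memhold' ("`var'") (`b') (`se') (2*normal(-abs(`b'/`se')))
}
postclose `memhold'
use mixed_results, clear
list, clean
```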

Monday, August 29, 2022

preparing data for propensity score match

Dear All,

I want to analyze whether the gain in public health insurance due to this policy (Medicaid expansion) has led poor individuals to increase their use of preventive services, using the Medical Expenditure Panel Survey — Household Component (MEPS-HC), conducted by the Agency for Healthcare Research and Quality (AHRQ), collected from 2013 to 2015.

I want to append these two longitudinal data files:

https://www.meps.ahrq.gov/mepsweb/da...fNumber=HC-172 (2013-14)

https://www.meps.ahrq.gov/data_stats...83/h183doc.pdf (2014-15)

I want to merge these two data sets and then use propensity score matching to find the change, from the gain in public health insurance, in the use of preventive services. I will be using the psmatch2 package for my analysis.

I am facing a serious challenge organizing the data set. I am looking for some example files but have been unable to find any.

Any help in this regard would be highly appreciated.

With sincere regards,
Upananda Pani




Matching Actions to Years

I have two different databases. Database 1 has executive actions in specific years, but it is not continuous (there may be an executive action in 2002, then nothing until 2004). Database 2 has country-specific economic performance in specific years. I somehow need to match the executive-action score to the most recent event. Database 1 looks like:

year | governance_score | country
1919 |                5 | USA
1979 |               12 | FRA
1988 |               10 | USA
1988 |                3 | DEU
2001 |                9 | FRA
2005 |               11 | DEU

And Database 2:

year | country | GDP
1990 |     USA | 290
1991 |     USA | 292
1992 |     USA | 294
1995 |     FRA | 143




How can I keep the governance score constant for each year after the action date, until the country gets a subsequent score, and match it to the country?

Thank you in advance for your help!
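One hedged sketch (the file names gdp.dta and actions.dta are made up): stack the two datasets, sort within country by year with action rows ordered first, carry the last observed governance_score forward, and keep the GDP rows.

```stata
use gdp, clear
gen byte from_gdp = 1
append using actions
replace from_gdp = 0 if missing(from_gdp)   // rows from the actions file
* within a country-year, put the action row first so the GDP row can inherit it
sort country year from_gdp
by country: replace governance_score = governance_score[_n-1] if missing(governance_score)
keep if from_gdp == 1   // back to the GDP panel, now with carried-forward scores
```

Because replace processes observations in order, the [_n-1] lookup cascades, filling every year until the next action overwrites it.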

Generate a value indicating how many times a situation happens before a specific date

Hello everyone in the community! Sorry to have problems with the same data again; thanks again, in advance, for all your help!

This is the previous post; I think I just need to build the dummy from before, but I figured out that I also need to build the count variable:
https://www.statalist.org/forums/for...-specific-date

A project (identified by pid) may submit its business plan multiple times (each submission is identified by v1) to investors of different status (identified by investor_h_position: a dummy, 1 for high-status); the submission date is recorded in bp_submission_date.

I would like to create a variable high_position_before_num, which groups submissions with the same pid together and identifies, for each submission, how many times the project was submitted to a high-status investor before that submission date.

Since I've found it difficult to deal with the date data, I hope I can get suggestions and guidance from you. Thanks in advance!

Here is my sample data.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(v1 pid) str10 bp_submission_date float investor_h_position
59536 10134 "2016-07-14" 1
59908 10134 "2016-07-03" 0
59866 10134 "2016-06-27" 0
 7782 10134 "2016-04-26" 1
59929 10134 "2016-08-01" 1
59861 10134 "2016-06-25" 0
60190 10134 "2016-07-29" 1
62622 10134 "2017-03-22" 1
60196 10134 "2016-08-28" 0
59943 10134 "2016-07-12" 0
62228 10137 "2016-12-28" 1
62230 10137 "2017-01-27" 1
62229 10137 "2016-12-30" 1
62231 10137 "2017-01-27" 1
62232 10137 "2017-01-27" 0
62234 10137 "2017-01-27" 1
62403 10137 "2017-02-16" 0
62233 10137 "2017-01-27" 1
53934 10155 "2016-01-29" 1
53942 10155 "2016-01-19" 1
53935 10155 "2016-01-19" 1
54917 10155 "2016-02-17" 1
53936 10155 "2016-01-21" 1
53938 10155 "2016-04-26" 1
62525 10155 "2017-03-16" 1
53937 10155 "2016-04-26" 1
59078 10157 "2016-06-15" 1
61380 10159 "2016-10-31" 0
62738 10161 "2017-04-05" 0
62739 10161 "2017-04-06" 1
60493 10162 "2016-09-10" 0
60559 10162 "2016-09-17" 1
60503 10162 "2016-09-10" 0
60487 10162 "2016-08-12" 1
60542 10162 "2016-09-17" 0
59237 10162 "2016-05-31" 0
60546 10162 "2016-09-17" 1
60502 10162 "2016-09-10" 1
60499 10162 "2016-09-10" 0
59227 10162 "2016-06-01" 0
56144 10162 "2016-03-06" 0
56186 10162 "2016-03-08" 0
60563 10162 "2016-09-17" 1
56143 10162 "2016-03-11" 1
56140 10162 "2016-03-06" 1
56182 10162 "2016-04-26" 1
60508 10162 "2016-09-10" 0
60488 10162 "2016-08-11" 0
60506 10162 "2016-09-10" 0
60548 10162 "2016-09-17" 0
60498 10162 "2016-08-29" 0
56184 10162 "2016-04-26" 0
60507 10162 "2016-09-10" 0
56162 10162 "2016-04-26" 1
56146 10162 "2016-03-07" 0
60565 10162 "2016-09-17" 0
56181 10162 "2016-04-26" 1
56151 10162 "2016-04-16" 1
56157 10162 "2016-03-07" 0
56177 10162 "2016-04-15" 0
59229 10162 "2016-06-12" 1
60485 10162 "2016-08-11" 1
59232 10162 "2016-06-28" 0
56171 10162 "2016-03-13" 0
56164 10162 "2016-04-26" 1
60564 10162 "2016-09-17" 0
60505 10162 "2016-09-10" 1
60495 10162 "2016-09-10" 0
56153 10162 "2016-04-26" 0
60504 10162 "2016-09-10" 1
60551 10162 "2016-09-17" 1
60555 10162 "2016-09-17" 1
60484 10162 "2016-08-11" 0
60501 10162 "2016-09-10" 1
56190 10162 "2016-04-26" 0
56148 10162 "2016-04-26" 1
56169 10162 "2016-04-26" 1
59236 10162 "2016-06-09" 1
56141 10162 "2016-03-06" 1
56189 10162 "2016-04-26" 1
56142 10162 "2016-03-09" 1
56166 10162 "2016-04-26" 1
56172 10162 "2016-04-26" 1
56154 10162 "2016-03-18" 1
60544 10162 "2016-09-17" 1
59235 10162 "2016-06-28" 1
56147 10162 "2016-03-08" 1
59238 10162 "2016-05-31" 0
56188 10162 "2016-04-26" 0
56139 10162 "2016-03-06" 1
56187 10162 "2016-03-06" 1
60483 10162 "2016-08-11" 0
56150 10162 "2016-03-10" 0
56145 10162 "2016-03-07" 0
60494 10162 "2016-09-10" 0
60550 10162 "2016-09-17" 0
56167 10162 "2016-04-26" 1
60489 10162 "2016-08-16" 1
60545 10162 "2016-09-17" 1
59240 10162 "2016-06-28" 0
end
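A hedged, brute-force sketch (slow on large data; a tool like rangestat from SSC would be faster): convert the string date, then for each submission count the same project's earlier high-status submissions.

```stata
gen subdate = daily(bp_submission_date, "YMD")
format subdate %td
gen high_position_before_num = .
* for each observation, count earlier high-status submissions of the same pid
quietly forvalues i = 1/`=_N' {
    count if pid == pid[`i'] & subdate < subdate[`i'] & investor_h_position == 1
    replace high_position_before_num = r(N) in `i'
}
```

Same-day submissions are treated here as not "before"; relax the strict inequality if ties should count.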

rename variables

I am poor at Stata coding.
I would like to rename the variables "logind1-logind84"
to " logcoefstandind101, logcoefstandind102, logcoefstandind105, logcoefstandind106, logcoefstandind107, logcoefstandind108, logcoefstandind111, logcoefstandind112, logcoefstandind131, logcoefstandind132, logcoefstandind133, logcoefstandind141, logcoefstandind151, logcoefstandind161, logcoefstandind162, logcoefstandind171, logcoefstandind179,
logcoefstandind181, logcoefstandind192, logcoefstandind201, logcoefstandind202, logcoefstandind203, logcoefstandind204, logcoefstandind205, logcoefstandind212, logcoefstandind213,
logcoefstandind221, logcoefstandind222, logcoefstandind231, logcoefstandind232, logcoefstandind233, logcoefstandind239, logcoefstandind241, logcoefstandind242, logcoefstandind243,
logcoefstandind251, logcoefstandind259, logcoefstandind261, logcoefstandind262, logcoefstandind263, logcoefstandind264, logcoefstandind265, logcoefstandind272, logcoefstandind273,
logcoefstandind281, logcoefstandind282, logcoefstandind283, logcoefstandind285, logcoefstandind289, logcoefstandind291, logcoefstandind292, logcoefstandind303, logcoefstandind311,
logcoefstandind313, logcoefstandind319, logcoefstandind320, logcoefstandind332, logcoefstandind334, logcoefstandind339, logcoefstandind581, logcoefstandind582, logcoefstandind591,
logcoefstandind602, logcoefstandind612, logcoefstandind620, logcoefstandind631, logcoefstandind639, logcoefstandind641, logcoefstandind649, logcoefstandind651, logcoefstandind661,
logcoefstandind662, logcoefstandind701, logcoefstandind712, logcoefstandind713, logcoefstandind721, logcoefstandind729, logcoefstandind732, logcoefstandind855, logcoefstandind856,
logcoefstandind857, logcoefstandind901, logcoefstandind911, logcoefstandind912"

I tried to write the code below, but it failed. Can anyone fix the problem?

program define XXXXX11111112345
local somevar=(101 102 105 106 107 108 ///
111 112 ///
131 132 133 141 151 161 162 171 179 181 ///
192 201 202 203 204 205 212 213 221 222 ///
231 232 233 239 241 242 243 251 259 261 ///
262 263 264 265 272 273 281 282 283 285 ///
289 291 292 303 311 313 319 320 332 334 ///
339 581 582 591 602 612 620 631 639 641 ///
649 651 661 662 701 712 713 721 729 732 ///
855 856 857 901 911 912)
foreach v of local somevar{
rename (logind1-logind84) (logcoefstandind`v')
}
end
XXXXX11111112345
exit

forvalues i=1/84 {
local j=(102 105 106 107 108 111 112 ///
131 132 133 141 151 161 162 171 179 181 ///
192 201 202 203 204 205 212 213 221 222 ///
231 232 233 239 241 242 243 251 259 261 ///
262 263 264 265 272 273 281 282 283 285 ///
289 291 292 303 311 313 319 320 332 334 ///
339 581 582 591 602 612 620 631 639 641 ///
649 651 661 662 701 712 713 721 729 732 ///
855 856 857 901 911 912)
rename coefstandind`i' logcoefstandind`j'
}
}
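A hedged working sketch: walk the 84 new suffixes (copied from the post) in parallel with logind1-logind84; the order of the list is assumed to match the order of the existing variables.

```stata
local suffixes 101 102 105 106 107 108 111 112 131 132 133 141 151 161 162 ///
    171 179 181 192 201 202 203 204 205 212 213 221 222 231 232 233 239 ///
    241 242 243 251 259 261 262 263 264 265 272 273 281 282 283 285 289 ///
    291 292 303 311 313 319 320 332 334 339 581 582 591 602 612 620 631 ///
    639 641 649 651 661 662 701 712 713 721 729 732 855 856 857 901 911 912
local i = 1
foreach j of local suffixes {
    rename logind`i' logcoefstandind`j'
    local ++i
}
```

The failed attempts put a whole list inside `local name = (...)`, which Stata evaluates as an expression; a plain space-separated local, looped with foreach, avoids that.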

Kaplan Meier survival description output - no median result

hi all,
I am using Stata 16 on a Mac.
I am calculating Kaplan Meier survival curves for 2 groups of patients who do or do not have a genetic mutation.
But after I stset the survival data, I am getting what seems like an error in my descriptive statistics for the survival data: nothing is listed for the 25%, 50%, and 75% survival times.
Is this because my failure rate is low compared to the time at risk?
Is there any way around this to determine the median survival time for each group?
Thank you!


Code:
stsum, by(hyper)

         failure _d: recur2 == 1
   analysis time _t: betweendx2censor

         |                 Incidence   Number of |------ Survival time -----|
hyperm~h | Time at risk         rate    subjects      25%       50%      75%
---------+---------------------------------------------------------------------
      No |      352,899     .0000878         260        .         .        .
     Yes |       87,166     .0001262          63     1943         .        .
---------+---------------------------------------------------------------------
   Total |      440,065     .0000954         323        .         .        .



Interval censored survival using aggregate data

Hello,

I've got data on vials of drosophila infected with a virus. At each time point, I record how many are still alive. I would like to do an interval-censored survival analysis to determine how many are alive at the end.


Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(STRAIN infected startingsample time1 time2 time3 time4 time5 time6) str1 gender byte age
1 1 20 19 19 18 18 18 15 "F" 2
2 0 15 14 14 14 14 12 12 "M" 4
3 1 30 29 29 29 29 29 29 "F" 3
4 1 15 15 15 15 13 12 10 "F" 3
5 0 10 10  9  9  8  7  7 "F" 3
6 1 20 17 16 16 16 14 14 "M" 4
7 0  9  8  5  3  0  0  0 "F" 2
8 0 12 10 10 10 10  9  7 "M" 3
end


I used the code below:

Code:
stintreg i.infected gender , interval(t6 startingsample) distribution(weibull)


This code doesn't allow me to enter all the time points in the interval. Do I only need to enter the last sample recorded, regardless of the other time points and the starting sample? Thank you in advance.

Changing a binary policy variable shifting it to next year for repeated cross section data


I've coded policy as a binary variable which I need to shift one year forward. For example, in my data the policy takes the value 1 for county 1003 in year 2021, but I need to create a new variable where the policy takes the value 1 for county 1003 in year 2022. The same goes for each observation. I have a repeated cross-section dataset.

Can anyone kindly show me how I can execute this? Here is sample data for your convenience.

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float policy double county int year
0 1003 2002
1 1003 2021
0 2012 2002
0 1003 2002
1 1004 2015
0 1004 2004
0 1013 2002
1 1013 2018
1 10003 2004
0 12003 2009
0 10003 2000
1 12003 2014
0 10101 2021
1 12003 2014
end
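Not knowing the full setting, here is one merge-based sketch: collect the (county, year) pairs where the policy is on, push them one year forward, and merge them back (policy_next is a hypothetical name for the new variable):

Code:
preserve
keep if policy == 1
keep county year
duplicates drop                  // one row per county-year with the policy on
replace year = year + 1          // shift forward one year
tempfile lagged
save `lagged'
restore
merge m:1 county year using `lagged', keep(master match)
gen byte policy_next = _merge == 3
drop _merge

County-years whose following year does not appear in the data simply stay 0 under this scheme.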

Selecting a subset of ordered variables without typing them manually

Dear all,

I am trying to execute a fairly simple operation:

Code:
egen duration_desk=rowtotal(t_*_pagesubmit)

The issue I have is that with the code I just showed, I am selecting all the variables whose names match 't_*_pagesubmit', and I am only interested in a few of them, which luckily are ordered. However, in between there are other unrelated variables that do not follow the 't_*_pagesubmit' pattern.

Say I am interested in variables 't_1_pagesubmit', 't_2_pagesubmit', and so on up to 't_10_pagesubmit'. I could use code that selects from 1 to 10, but how can I do that without selecting the variables in between them?

Of course I could simply type all the variables manually but there must be a simpler way around.
I do apologize if this has been answered, I honestly did not know how to search for such a problem.
Thank you for your time.
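One way, sticking with -egen-, is to build the varlist in a loop so that only t_1_pagesubmit through t_10_pagesubmit are included, regardless of what sits between them in the dataset:

Code:
local myvars
forvalues i = 1/10 {
    local myvars `myvars' t_`i'_pagesubmit
}
egen duration_desk = rowtotal(`myvars')

A dash range such as t_1_pagesubmit-t_10_pagesubmit would only work if the variables were physically adjacent, which is exactly what is not the case here.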

A numerical representation of distance from a regression line?

Hello

I am a novice Stata user/statistician. I have created a two-way scatter of aggregated travel distances (y axis) and passenger counts (x axis) for suburbs of Cape Town, built from aggregated data bins from the original dataset using -collapse- (I think this was the correct method). I have a regression line. The data are non-normally distributed. I am trying to identify the 'worst outliers' for travel distance per capita. Is there a way to represent this numerically? I can 'see' them on the graph, but for accuracy I would like to isolate the top 5 locations and describe them in more detail.

Many thanks. Mark
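One numerical representation is the regression residual: the vertical distance of each point from the fitted line. A sketch with hypothetical variable names (traveldist, passengers, suburb):

Code:
regress traveldist passengers
predict double resid, residuals
gsort -resid                      // largest positive distances from the line first
list suburb traveldist passengers resid in 1/5

With non-normal data, studentized residuals (predict newvar, rstudent) may rank the outliers more fairly, since they account for leverage.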

Creating lagged variables with pooled cross sectional data

I am working with a dataset that contains student grade observations in a variety of subjects, in different schools, over ten years. This is not a time-series dataset, and the observations are not random, because the students take different exams each year, and some student/year/exam cells have significantly fewer observations than others.
I want to create a lagged variable that contains the average test score for each test, in each school, in year t-1. For example, the variable t1_lag_grade_avg in an observation with a student's grade on exam no. 831 from school no. 5 would receive the average of students' grades in school no. 5 on exam no. 831 from year t-1. I want to create two lagged variables: t1_lag_grade_avg and t2_lag_grade_avg, the latter lagged to year t-2.
Because this is not panel/time-series data, I can't seem to find a way to add the lagged variable to a given group of grades without collapsing the group and losing observations. This is my current code; notice that it works only for the first observation of each group. I am also having trouble creating the t-2 variable with this method. I would appreciate any help with this issue.

Code:
sort school exam year
egen totalgrade= total(grade), by(school exam year) 
bysort school exam year: gen meangrade = totalgrade/_N 
bysort school exam (year): gen t1_lag_grade_avg = meangrade[_n-1]
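Since the data cannot be -xtset- here, one workaround is to compute the school-exam-year means once with -collapse- in a preserved copy and merge them back at t+1 and t+2. A sketch, assuming the grade variable is named grade:

Code:
preserve
collapse (mean) grade_avg = grade, by(school exam year)
tempfile means
save `means'
restore
foreach k of numlist 1 2 {
    preserve
    use `means', clear
    rename grade_avg t`k'_lag_grade_avg
    replace year = year + `k'        // the year-t mean becomes the t+k lag
    tempfile m`k'
    save `m`k''
    restore
    merge m:1 school exam year using `m`k'', keep(master match) nogenerate
}

Every observation in a given school-exam-year cell then carries the same lagged average, with no observations lost.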

Sunday, August 28, 2022

Interpreting Interaction Term Coefficients

Dear Stata Users,

I intend to study the relationship between ethnicity and COVID-19 on income.

I conduct a log-linear regression with the log of income as the dependent as follows...

Code:
reg LOGNETINCOME MIXED INDIAN PAKISTANI BANGLADESHI CHINESE OTHER_ASIAN BLACK OTHER COVIDDEATHSBYPOP c.COVIDDEATHSBYPOP##i.MIXED c.COVIDDEATHSBYPOP##i.INDIAN c.COVIDDEATHSBYPOP##i.PAKISTANI c.COVIDDEATHSBYPOP##i.BANGLADESHI c.COVIDDEATHSBYPOP##i.CHINESE c.COVIDDEATHSBYPOP##i.OTHER_ASIAN c.COVIDDEATHSBYPOP##i.BLACK c.COVIDDEATHSBYPOP##i.OTHER if FEMALE==0, robust
The regression also includes several other variables, particularly dummy variables (quarter, region, education, industry)

With particular reference to the interaction terms, I am unfortunately struggling to understand their effects therefore I would like to ask...

1) Put simply in words, how does one interpret the interaction between COVID-19 deaths by population (COVIDDEATHSBYPOP) and Ethnic Group?

2) Given the comparatively high (or low in the negative sense) coefficients of the interactions, does this remain possible with the log of income as the y-variable, or have I gone disastrously wrong?

An extract of the regression output is as follows:

LOGNETINCOME_________________Coef. Std. Err. t P>|t| [95% Conf. Interval]
INDIAN#c.COVIDDEATHSBYPOP_ . 157.1472 58.63142 2.68 0.007 42.2305 272.0638
INDIAN _______________________-.0268082 .0246202 -1.09 0.276 -.0750634 .0214471
COVIDDEATHSBYPOP___________58.12169 29.31941 1.98 0.047 .6561067 115.5873

Apologies if any issues have been made in the posting procedure...rookie mistakes I'm sure.

Thanks for any help!

I appreciate your time.

Regards,

Sandeep.
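One device that often makes such interactions concrete is -margins-: with the factor-variable specification above, it can report the slope of COVIDDEATHSBYPOP separately within each level of an ethnicity indicator, e.g.:

Code:
* marginal effect of COVID deaths per population, by Indian ethnicity
margins INDIAN, dydx(COVIDDEATHSBYPOP)

The difference between the two reported slopes is the interaction coefficient, which may be easier to explain in words than the raw coefficient itself.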



HELP! xtline producing different graphs each run

Please help!

I have a simple, balanced longitudinal data set. I xtset the data and then run xtline. Each time I run xtline I am getting a completely different graph that does not reflect the summary data.

Any ideas why this is happening and how I can fix it??

Below is my code:

egen id = group(study_id)
sort id visit_month
drop study_id
xtset id visit_month

bysort arm visit_month: sum hfais_summary
xtline hfais_summary, recast(connected) i(id) t(visit_month)

bysort arm visit_month: sum wealthscore
xtline wealthscore, recast(connected) i(arm) t(visit_month)

bysort arm visit_month: sum social_support_sum
xtline social_support_sum, recast(connected) i(arm) t(visit_month)

Split variable and fill in numeric terms in the string

Dear Statalist,

I have a variable called Name which looks like below. I would like to split Name into Name1 and Name2 at '/' or '-', and then have both of them end with the numeric term.
Name Name1 Name2
Passyunk230kV
NorthBangor34.5kV
Peckville/Varden34.5kV Peckville34.5kV Varden34.5kV
ErieSouth-Warren230kV ErieSouth230kV Warren230kV
Frackville-Hauto#369kV Frackville#369kV Hauto#369kV
I thought about using -split- to separate out the Name variable, but I do not know how to make both parts end with the numeric term.

Thanks!
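One sketch, assuming Stata 14+ for the Unicode regex functions: pull the trailing numeric term (e.g. "230kV", "#369kV") off the end, split the remainder on "/" or "-", and re-append the term to each part:

Code:
gen suffix = ustrregexs(1) if ustrregexm(Name, "([#0-9.]+kV)$")
gen stub = substr(Name, 1, strlen(Name) - strlen(suffix))
split stub, parse("/" "-") gen(part)
gen Name1 = part1 + suffix if !missing(part2)   // blank when there is no split,
gen Name2 = part2 + suffix if !missing(part2)   // matching the table above
drop suffix stub part?

This assumes at most one separator per name; a name with two separators would need an extra part3 line.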

Eliminating leading blanks from a string

Hi all,

It seems there are leading blanks in a string variable state in 2 out of the 5 rounds of datasets I'm working with. Because of these leading blanks, after appending, the apparently identical value of the string appears twice when tabulating the state variable.

I tried
Code:
keep if state=="XY"
with spaces in different positions and could confirm that the space is on left of "XY". I could also identify the rounds in which these spaces occur by
Code:
tab state round
and noting down the sample size under each value.

However when I try
Code:
replace state==trim(state)
on the offending rounds, I get "0 real changes made".

Could someone please help me out?

Thanks,
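Two things may be going on here: -replace- needs a single "=" (an assignment, not the "==" comparison shown), and trim() only strips plain ASCII spaces, so a tab or a non-breaking space would survive and produce "0 real changes made". A sketch, assuming Stata 14 or later:

Code:
replace state = strtrim(state)                               // plain spaces
replace state = strtrim(subinstr(state, uchar(160), "", .))  // non-breaking spaces
replace state = subinstr(state, char(9), "", .)              // tabs

Listing the offending values with -list state if state != strtrim(state)- before and after each step can help pin down which character is actually present.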


De-trending Panel Data

Hi everyone,
I need help with detrending panel data.
I have a panel of firm observations across years and industries. I want to detrend the performance variable (ROA) to remove year and industry effects so I can use the detrended variable on the left-hand side of another model that does not work well with controls, thus requiring ex-ante detrending.

My idea was to simply run the following code, essentially regressing ROA on a set of multiplicative year X industry dummies, and use the residuals as the detrended variable.
Code:
gen indxyear=ind*year
tab indxyear, gen(trenddummy)
reghdfe roa trenddummy*, nosample fastregress residual(res_roa)
I am posting data below. The variables are id, year, ROA and industry designator, in that order. Thanks for the help!

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long id double year float(roa ind)
1380 1992  .020091366 3
1380 1993 -.010184983 3
1380 1994   .04607277 3
1380 1995   .03616318 3
1380 1996   .06704622 3
1380 1997  .029870747 3
1380 1998   -.0048403 3
1408 1992   .11132794 1
1408 1993   .08327315 1
1408 1994    .1268582 1
1408 1995    .1289832 1
1408 1996   .11449675 1
1408 1997   .08191574 1
1408 1998   .08418822 1
1609 1992  .036867816 3
1609 1993    .0833743 3
1609 1994   .06494747 3
1609 1995    .0918747 3
1609 1996   .11155763 3
1609 1997   .10006464 3
1661 1992   .10992254 3
1661 1993   .07423703 3
1661 1994    .0861485 3
1661 1995   .10034772 3
1661 1996   .08848996 3
1661 1997   .12681246 3
1661 1998   .12572937 3
1663 1992   .16850606 1
1663 1993    .1633135 1
1663 1994   .17193583 1
1663 1995   .16928685 1
1663 1996   .19391987 1
1663 1997    .1750646 1
1663 1998   .17023782 1
1678 1992   .06116251 3
1678 1993   .05108807 3
1678 1994   .04647045 3
1678 1995   .04226407 3
1678 1996   .07594678 3
1678 1997   .08104512 3
1678 1998 -.029105654 3
1722 1993   .08783213 1
1722 1994   .08761986 1
1722 1995   .12433536 1
1722 1996   .08750617 1
1722 1997   .07014403 1
1722 1998   .05553664 1
end
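A note in passing: the product ind*year need not give unique industry-year cells in general (different industry-year pairs can multiply to the same number), so -egen group(ind year)- or factor notation inside -absorb()- is safer. If I read the -reghdfe- options correctly, the whole thing collapses to one line:

Code:
* absorb industry-by-year fixed effects directly and keep the residual
reghdfe roa, absorb(i.ind#i.year) residuals(res_roa)

res_roa is then ROA purged of its industry-by-year means, which is what the dummy construction above was aiming for.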

How can I get gr_edit to refer to specific graphs when multiple graphs are open in the graph window?

I'm using a user-created command (-synth2-) that generates multiple graphs as outputs. I want to be able to edit these graphs with code, but there doesn't seem to be an easy way to do this with the synth2 command. Thus, I am instead trying to use the -gr_edit- command.

However, I'm running into a problem where the -gr_edit- command only applies changes to the last graph generated by the synth2 command. I don't know how to get gr_edit to refer to a specific graph.

To demonstrate my issue, consider the following example. Suppose I generate a scatterplot (graph1) and a histogram (graph2).
Code:
sysuse auto, clear
scatter price mpg, name(graph1)
histogram price, name(graph2)
Suppose I want to use gr_edit to make the scatterplot markers orange and the histogram bars blue. How would I do this?

My attempt was:
Code:
gr_edit plotregion1.plot1.style.editstyle marker(fillcolor(orange)) editcopy
gr_edit plotregion1.plot1.style.editstyle marker(linestyle(color(orange))) editcopy

gr_edit plotregion1.plot1.style.editstyle area(shadestyle(color(blue))) editcopy
Since I don't know how to get gr_edit to refer to specific graphs (e.g. "graph1" vs. "graph2"), this command just made the histogram bars blue and didn't do anything to the scatterplot markers.
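One approach that may work: -gr_edit- operates on the current (topmost) graph, and -graph display- can make a named graph current before each batch of edits, e.g.:

Code:
graph display graph1               // make the scatterplot the current graph
gr_edit plotregion1.plot1.style.editstyle marker(fillcolor(orange)) editcopy
gr_edit plotregion1.plot1.style.editstyle marker(linestyle(color(orange))) editcopy

graph display graph2               // make the histogram the current graph
gr_edit plotregion1.plot1.style.editstyle area(shadestyle(color(blue))) editcopy

This assumes the synth2 graphs were created with name() options (or can be renamed with -graph rename-) so that each can be redisplayed by name.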

What components should I pay attention to when buying a new laptop for Stata?

Hello Statalisters,

I don't know if I'm allowed to post this thread, as it is not related to Stata per se but to the best ways to use Stata. For the next few years I may have to spend my days using Stata, and my laptop is currently dying, so I need to replace it. At the moment, my laptop struggles with large datasets and with regressions that need many iterations (they can take several hours to run, with a fan noise that is very worrying and uncomfortable).

Sadly, I don't know anything about computers. So I wanted to ask whether there is any tech fan on this forum who could tell me the main components I should pay attention to when looking for a new laptop. I'm not interested in brands, mainly the components!

Thank you in advance,

Adam

Saturday, August 27, 2022

max # of variables in Cochran's Q test?

Hello - I am trying to evaluate multiple-choice (non-exclusive) binary questions. Example: on a survey, a participant can check the "yes" box for any of 6 medical conditions (diabetes, high cholesterol, stroke, etc.). I wanted to evaluate the relationships and thought I could use Cochran's Q test, but the command is telling me I am entering too many variables. How many can it handle at a time?

Abnormal return for every day in a year

Good day everybody,

I am new to Stata, and I have run into an issue with an event study. I use the command from Princeton to calculate abnormal returns:

Code:
forvalues i = 1(1)N { /*note: replace N with the highest value of id */
    l id company_id if id==`i' & dif==0
    reg ret market_return if id==`i' & estimation_window==1
    predict p if id==`i'
    replace predicted_return = p if id==`i' & event_window==1
    drop p
}
How can I modify this code to calculate abnormal returns for every day in 12 months of data, instead of only for the event window? Bests <3

Logistic Regression Output with Factor Variable i.Year - Issues with Interpretation

Hi all!

I’m having issues interpreting the coefficients in my regression output. My model has a dichotomous dependent variable: Education (Does have a bachelor’s degree=1, 0 otherwise) and a factor variable: i.Year (for the years 2007 to 2011).
The command:
logistic Education i.Year, coef


Logistic regression
Education            Coef.     St.Err.   t-value  p-value  [95% Conf  Interval]  Sig
2007b                0         .         .        .        .          .
2008                 .433      .073      5.95     0        .29        .575       ***
2009                 .602      .075      8.00     0        .454       .749       ***
2010                 .217      .074      2.92     .004     .071       .362       ***
2011                 .225      .077      2.93     .003     .075       .376       ***
Constant             -.381     .052      -7.40    0        -.483      -.28       ***
Mean dependent var   0.477     SD dependent var      0.500
Pseudo r-squared     0.008     Number of obs         7080
Chi-square           75.687    Prob > chi2           0.000
Akaike crit. (AIC)   9734.809  Bayesian crit. (BIC)  9769.134
*** p<.01, ** p<.05, * p<.1
The coefficient of 2008 is 0.433. The standard interpretation of a logistic coefficient is "for a one-unit change in variable X, we expect the log of the odds of the outcome to change by *coefficient* units, holding all other variables constant".

Since my focus is on the trend of "education" between the years 2007 and 2011, it's difficult to follow the standard interpretation, and I'm unsure what 0.433 is supposed to represent in the case of a trend. Would it be correct to interpret the output as "the log odds of graduating with a bachelor's degree increased by 0.433 in 2008 in comparison to 2007; in comparison to 2007, the log odds of graduating with a bachelor's increased by 0.602 in 2009", etc.?

My data are both nonlinear and non-normal, so the standard trend tests are not suitable. The purpose is to have a table that indicates the changes in education between the years 2007 and 2011.

Thank you for your help!

spmap error

Hi,

I am learning how to create maps in Stata using spmap. I was trying out the examples in help spmap, but Stata throws an error that the Italy dataset is not found. How can I fix this? I am unable to access datasets like "Italy-RegionsData.dta" and "Italy-RegionsCoordinates.dta" given in help spmap.


Thanks.
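For what it's worth, the Italy example datasets ship as ancillary files of the SSC package, which a plain -ssc install spmap- does not copy; asking for them explicitly usually fixes this:

Code:
ssc install spmap, all replace    // "all" also downloads the ancillary .dta files
* the files land in the current working directory
use Italy-RegionsData.dta, clear

If the files still are not found, -net get spmap, from(http://fmwww.bc.edu/repec/bocode/s)- is the equivalent manual route.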

Friday, August 26, 2022

difficult to find the problem in MLE coding

Hello everyone, this is the code I am writing to obtain ML estimates. When I check the program with -ml check-, I get "prob1l^ invalid name", and I cannot understand why the name is invalid inside the program. Please help me with this problem.


Test 1: Calling HL_cpt0 to check if it computes log likelihood and
does not alter coefficient vector...
FAILED; HL_cpt0 returned error 198.

Here is a trace of its execution:
------------------------------------------------------------------------------
-> HL_cpt0 __000008 __000009
- `begin'
= capture noisily version 12: HL_cpt0 __000008 __000009
----------------------------------------------------------------------------------------------------------------------------------------- begin HL_cpt0 ---
- args lnf alpha gamma
- tempvar choices prob1l prob2l prob1r prob2r y1 y2 y3 y4 euL euR euDiff tmp
- quietly {
- generate double `tmp' = (($ML_y2^`gamma')+((1-$ML_y2)^`gamma'))
= generate double __00000M = ((prob1l^)+((1-prob1l)^))
prob1l^ invalid name
replace `tmp' = `tmp'^(1/`gamma')
generate double `prob1l' = ($ML_y2^`gamma')/`tmp'
generate double `prob2l' = ((1-$ML_y2)^(`gamma'))/`tmp'
replace `tmp' = (($ML_y3^`gamma')+((1-$ML_y3)^`gamma'))
replace `tmp' = `tmp'^(1/`gamma')
generate double `prob1r' = ($ML_y3^`gamma')/`tmp'
generate double `prob2r' = ((1-$ML_y3)^(`gamma'))/`tmp'
generate double `y1' = ($ML_y4)^(`alpha') if $ML_y4>=0
generate double `y2' = ($ML_y5)^(`alpha') if $ML_y5>=0
generate double `y3' = ($ML_y6)^(`alpha') if $ML_y6>=0
generate double `y4' = ($ML_y7)^(`alpha') if $ML_y7>=0
gen double `euL'=(`prob1l'*`y1')+(`prob2l'*`y2')
gen double `euR'=(`prob1r'*`y3')+((1-`prob2r')*`y4')
generate double `euDiff' = `euR' - `euL'
replace `lnf' = ln(normal( `euDiff')) if $ML_y1==1
replace `lnf' = ln(normal(-`euDiff')) if $ML_y1==0
}
------------------------------------------------------------------------------------------------------------------------------------------- end HL_cpt0 ---
- `end'
= set trace off
------------------------------------------------------------------------------
Fix HL_cpt0.
r(198);

Clustering with imputation

I am using a dataset that uses a school-based sampling design, and thus I want to cluster with school id in my modeling. However, I'm also using multiple imputation. Do I need to include the cluster id variable in the imputation model, or can I just specify it in the ensuing models? And if so, how do I do that given that it's in my dataset as a string--I don't think it would make sense to destring and then convert back to string after imputation, would it?

As of now, without the school cluster id included in the imputation model, I have something like:

Code:
mi set mlong
mi register imputed y x1 x2 x3 
mi impute chained (regress) x1 x2 x3 = y, add(5) rseed(100)

mi estimate: regress y x1 x2 x3, vce(cluster school_id)
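One common approach, sketched here with the hypothetical name school_num: encode the string id once up front, include it in the imputation model as a factor variable, and reuse the same numeric variable for clustering, with no round trip back to string:

Code:
encode school_id, gen(school_num)   // numeric version with value labels
mi register regular school_num
mi impute chained (regress) x1 x2 x3 = y i.school_num, add(5) rseed(100)
mi estimate: regress y x1 x2 x3, vce(cluster school_num)

Whether cluster indicators belong in the imputation model at all is a substantive question, but mechanically this is how the string id can serve both roles.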

Using the coefplot command to plot odds ratios with p-values in multiple imputation estimations – potential bug to fix!

Dear Ben Jann,

I was trying to plot odds ratios of mixed-effects logistic models of the association of four sexual risk-taking behaviours (Sex after substance use "SexDrunkDrugs", infrequent condom use "SexLastUnsafeConsist", multiple sexual partnership "SexPartnerFreq2More", and inequitable sexual partnership "InequitSex") with mobile phone use for health content "XC" and mobile phone use for social media "XM" by gender (boys and girls). I also wanted to show the p-values labels on the graphs.

I used your coefplot command to plot these from multiply imputed data. I am posting the codes here. Please, note that I am showing coefplot on one of the four outcomes here.

Code:
mi set mlong // mi set data for imputation
mi reshape wide SexDrunkDrugs SexLastUnsafeConsist SexPartnerFreq2More InequitSex Access XC XM Use, i(ID) j(j)
* Identifying which variables in the imputation model have missing information.
mi register imputed NecessitiesAll_r HomeTypeInformal SexDrunkDrugs0 SexDrunkDrugs1 InequitSex0 InequitSex1

*** Multiple imputation
*MI using chained equations/MICE (also known as the fully conditional specification or sequential generalized regression) for binary outcomes. In simulation studies (Lee & Carlin, 2010; Van Buuren, 2007), MICE has been shown to produce estimates comparable to the MVN method.
  
* Clear any xtset that is ongoing
mi xtset, clear
mi impute chained  (logit) NecessitiesAll_r HomeTypeInformal SexDrunkDrugs0  ///
SexDrunkDrugs1 InequitSex0 InequitSex1 = Access0 Access1  ///
  XC0 XM0 XC1 XM1 Sex ARTever AgeC Rural Relationship     ///
  SchEnrol, add(10) rseed(20210101) /*skipnonconvergence(5)*/ savetrace(tracedata, replace)

* Bring data back to long format
mi reshape long SexDrunkDrugs SexLastUnsafeConsist SexPartnerFreq2More InequitSex Access XC XM Use, i(ID) j(j)
  
* Check the data to see whether there are any problems in our imputation
summ SexDrunkDrugs SexLastUnsafeConsist SexPartnerFreq2More InequitSex HomeTypeInformal NecessitiesAll_r

* Regressions
local y     SexDrunkDrugs SexLastUnsafeConsist SexPartnerFreq2More InequitSex
local x0    Access
local x1    XC XM
local x     Rural AgeC Sex HomeTypeInformal NecessitiesAll_r ARTever Relationship SchEnrol j
local xsex  Rural AgeC HomeTypeInformal NecessitiesAll_r ARTever Relationship SchEnrol j
  
foreach ys of local y {
   forvalues f = 0/1 {
       cap noi mi estimate, post saving(`ys'`f', replace) cmdok : melogit `ys' `x1' `xsex' if Sex == `f' || ID:, or cov(un)
       cap noi estimate store `ys'`f'
   }
}

* Plot graphs
*set trace on
set scheme s1color
cap drop xpos
gen xpos = 3.5

coefplot (SexLastUnsafeConsist0, label("Boy") msymbol(S) mcolor(black) mfcolor(white))      ///
  (SexLastUnsafeConsist1, label("Girl") msymbol(O) mcolor(black) mfcolor(black)), ///
bylabel("Infrequent condom use") mlabel(cond(@pval<0.001, "{it:p} <0.001", "{it:p} =" + string(@pval,"%9.3f"))) ///
    mlabc(none) addplot(scatter @at xpos, ms(i) mlab(@mlbl) mlabcolor(black) mlabpos(9) mlabgap(-10) mlabsize(small)) ///
    graphr(margin(r=10)) xsc(r(-10 5))
Andrew Musau helped me investigate the issue and suggested a workaround that approximates the p-values using the normal distribution in the cases where they are missing. Andrew believes this is a bug in the coefplot program and recommended that I send you a dataset (attached here) and code that reproduces the issue so that you may have a look. The code for his approximation is highlighted below.

Code:
mi set mlong // mi set data for imputation
mi reshape wide SexDrunkDrugs SexLastUnsafeConsist SexPartnerFreq2More InequitSex Access XC XM Use, i(ID) j(j)
* Identifying which variables in the imputation model have missing information.
mi register imputed NecessitiesAll_r HomeTypeInformal SexDrunkDrugs0 SexDrunkDrugs1 InequitSex0 InequitSex1

*** Multiple imputation
*MI using chained equations/MICE (also known as the fully conditional specification or sequential generalized regression) for binary outcomes. In simulation studies (Lee & Carlin, 2010; Van Buuren, 2007), MICE has been shown to produce estimates comparable to the MVN method.
  
* Clear any xtset that is ongoing
mi xtset, clear
mi impute chained  (logit) NecessitiesAll_r HomeTypeInformal SexDrunkDrugs0  ///
SexDrunkDrugs1 InequitSex0 InequitSex1 = Access0 Access1  ///
  XC0 XM0 XC1 XM1 Sex ARTever AgeC Rural Relationship     ///
  SchEnrol, add(10) rseed(20210101) /*skipnonconvergence(5)*/ savetrace(tracedata, replace)

* Bring data back to long format
mi reshape long SexDrunkDrugs SexLastUnsafeConsist SexPartnerFreq2More InequitSex Access XC XM Use, i(ID) j(j)
  
* Check the data to see whether there are any problems in our imputation
summ SexDrunkDrugs SexLastUnsafeConsist SexPartnerFreq2More InequitSex HomeTypeInformal NecessitiesAll_r

* Regressions
local y     SexDrunkDrugs SexLastUnsafeConsist SexPartnerFreq2More InequitSex
local x0    Access
local x1    XC XM
local x     Rural AgeC Sex HomeTypeInformal NecessitiesAll_r ARTever Relationship SchEnrol j
local xsex  Rural AgeC HomeTypeInformal NecessitiesAll_r ARTever Relationship SchEnrol j
  
foreach ys of local y {
   forvalues f = 0/1 {
       cap noi mi estimate, post saving(`ys'`f', replace) cmdok : melogit `ys' `x1' `xsex' if Sex == `f' || ID:, or cov(un)
       cap noi estimate store `ys'`f'
   }
}

* Plot graphs
*set trace on
set scheme s1color
cap drop xpos
gen xpos = 3.5

coefplot (SexLastUnsafeConsist0, label("Boy") msymbol(S) mcolor(black) mfcolor(white))      ///
  (SexLastUnsafeConsist1, label("Girl") msymbol(O) mcolor(black) mfcolor(black)), ///
  bylabel("Infrequent condom use") mlabel(cond(@pval<0.001, "{it:p} < 0.001", ///
  cond(@pval>0.001 &@pval<., "{it:p} = " + string(@pval,"%9.3f"), ///
  cond(missing(@pval) & 2*normal(-abs(@t))<0.001, "{it:p} < 0.001", "{it:p} = " + string(2*normal(-abs(@t)),"%9.3f")) ))) ///
  mlabc(none) addplot(scatter @at xpos, ms(i) mlab(@mlbl) mlabcolor(black) ///
  mlabpos(9) mlabgap(-10) mlabsize(small)) graphr(margin(r=10)) xsc(r(-10 5))
I hope this helps.

Thank you again Andrew Musau

T-Test Using Three Groups

I am trying to do a t-test where I have three groups (AIC, GIC, Others); accordingly, I created a variable (Test1) which is equal to 1 if AIC, 2 if GIC, and 3 if Others. However, when I run the t-test I can see it only takes two groups.

Is there any way I can compare the three groups, by Ticker and by Year?

Any advice would be highly helpful


Code:
 ttest DA_w, by(Test1) unequal welch

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       1 |   1,105    .0142281    .0045872    .1524843    .0052275    .0232286
       2 |   2,515    -.006903     .003283    .1646403   -.0133406   -.0004654
---------+--------------------------------------------------------------------
Combined |   3,620   -.0004528    .0026809       .1613    -.005709    .0048035
---------+--------------------------------------------------------------------
    diff |             .021131    .0056409                .0100691    .0321929
------------------------------------------------------------------------------
    diff = mean(1) - mean(2)                                      t =   3.7460
H0: diff = 0                             Welch's degrees of freedom =  2265.65

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.0001
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str9 Ticker int Year byte(AIC GIC Others) float(Test1 DA_w)
"TH:2S"    2003 . 1 . 2           .
"TH:2S"    2004 . 1 . 2           .
"TH:2S"    2005 . 1 . 2           .
"TH:2S"    2006 . 1 . 2           .
"TH:2S"    2007 . 1 . 2  -.08454061
"TH:2S"    2008 . 1 . 2  -.25864583
"TH:2S"    2009 . 1 . 2    .6085715
"TH:2S"    2010 . 1 . 2   .09650405
"TH:2S"    2011 . 1 . 2   -.1017216
"TH:2S"    2012 . 1 . 2   .11378127
"TH:2S"    2013 . 1 . 2  -.05965737
"TH:2S"    2014 . 1 . 2  -.04307863
"TH:A"     2003 1 . . 1           .
"TH:A"     2004 1 . . 1    .6278349
"TH:A"     2005 1 . . 1   .06726224
"TH:A"     2006 1 . . 1   -.1024842
"TH:A"     2007 1 . . 1   .09083974
"TH:A"     2008 1 . . 1  -.02267651
"TH:A"     2009 1 . . 1  -.15725097
"TH:A"     2010 1 . . 1    .0957468
"TH:A"     2011 1 . . 1  .032091457
"TH:A"     2012 1 . . 1    .1184249
"TH:A"     2013 1 . . 1    .2663026
"TH:A"     2014 1 . . 1    .2288631
"TH:ABICO" 2003 . 1 . 2   -.1509448
"TH:ABICO" 2004 . 1 . 2   -.7443731
"TH:ABICO" 2005 . 1 . 2    .6278349
"TH:ABICO" 2006 . 1 . 2 -.013820863
"TH:ABICO" 2007 . 1 . 2  .066848315
"TH:ABICO" 2008 . 1 . 2   -.1630021
"TH:ABICO" 2009 . 1 . 2    .1760944
"TH:ABICO" 2010 . 1 . 2  .027172344
"TH:ABICO" 2011 . 1 . 2    .1821229
"TH:ABICO" 2012 . 1 . 2    .2805742
"TH:ABICO" 2013 . 1 . 2   .04285799
"TH:ABICO" 2014 . 1 . 2   .06678031
"TH:ACC"   2003 . 1 . 2   .13844515
"TH:ACC"   2004 . 1 . 2  -.23043732
"TH:ACC"   2005 . 1 . 2   -.1565992
"TH:ACC"   2006 . 1 . 2  -.09132891
"TH:ACC"   2007 . 1 . 2   .04345912
"TH:ACC"   2008 . 1 . 2   .18008757
"TH:ACC"   2009 . 1 . 2           .
"TH:ACC"   2010 . 1 . 2           .
"TH:ACC"   2011 . 1 . 2     .213304
"TH:ACC"   2012 . 1 . 2   .05535654
"TH:ACC"   2013 . 1 . 2  -.17010236
"TH:ACC"   2014 . 1 . 2           .
"TH:AFC"   2003 . 1 . 2   .04718301
"TH:AFC"   2004 . 1 . 2   .05981214
"TH:AFC"   2005 . 1 . 2   .23442598
"TH:AFC"   2006 . 1 . 2   -.1069364
"TH:AFC"   2007 . 1 . 2  -.04298536
"TH:AFC"   2008 . 1 . 2   .05517821
"TH:AFC"   2009 . 1 . 2 -.006168552
"TH:AFC"   2010 . 1 . 2   .01499469
"TH:AFC"   2011 . 1 . 2  .012949838
"TH:AFC"   2012 . 1 . 2   .04469505
"TH:AFC"   2013 . 1 . 2   .05154819
"TH:AFC"   2014 . 1 . 2           .
"TH:AGE"   2003 . 1 . 2           .
"TH:AGE"   2004 . 1 . 2           .
"TH:AGE"   2005 . 1 . 2           .
"TH:AGE"   2006 . 1 . 2           .
"TH:AGE"   2007 . 1 . 2           .
"TH:AGE"   2008 . 1 . 2   .26367554
"TH:AGE"   2009 . 1 . 2   .08931428
"TH:AGE"   2010 . 1 . 2   .25956905
"TH:AGE"   2011 . 1 . 2  -.09252103
"TH:AGE"   2012 . 1 . 2   .04484579
"TH:AGE"   2013 . 1 . 2   .13395208
"TH:AGE"   2014 . 1 . 2  -.27340496
"TH:AH"    2003 1 . . 1   -.2078416
"TH:AH"    2004 1 . . 1   -.1830832
"TH:AH"    2005 1 . . 1 .0010229541
"TH:AH"    2006 1 . . 1  .016149808
"TH:AH"    2007 1 . . 1   -.0128961
"TH:AH"    2008 1 . . 1  -.04499222
"TH:AH"    2009 1 . . 1   -.0351702
"TH:AH"    2010 1 . . 1  -.04360331
"TH:AH"    2011 1 . . 1 -.005175362
"TH:AH"    2012 1 . . 1  .030353934
"TH:AH"    2013 1 . . 1  -.05105714
"TH:AH"    2014 1 . . 1  -.07181978
"TH:AHC"   2003 . 1 . 2  -.04777425
"TH:AHC"   2004 . 1 . 2 -.019984696
"TH:AHC"   2005 . 1 . 2   .04643552
"TH:AHC"   2006 . 1 . 2 -.003667239
"TH:AHC"   2007 . 1 . 2   .01884266
"TH:AHC"   2008 . 1 . 2  -.01925189
"TH:AHC"   2009 . 1 . 2  .027770244
"TH:AHC"   2010 . 1 . 2 -.013661582
"TH:AHC"   2011 . 1 . 2  -.03151839
"TH:AHC"   2012 . 1 . 2  -.02424605
"TH:AHC"   2013 . 1 . 2   .02805965
"TH:AHC"   2014 . 1 . 2  .018061165
"TH:AI"    2003 . 1 . 2  .072685905
"TH:AI"    2004 . 1 . 2  -.10123187
"TH:AI"    2005 . 1 . 2  .012330643
"TH:AI"    2006 . 1 . 2   .04343369
end
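-ttest- is inherently two-sample. With three groups, the usual routes are a one-way ANOVA with pairwise comparisons, or Welch t-tests on pairs of groups:

Code:
oneway DA_w Test1, tabulate bonferroni
* or a pairwise Welch t-test, here AIC (1) versus Others (3)
ttest DA_w if inlist(Test1, 1, 3), by(Test1) unequal welch

Running these by Ticker or by Year would then just be a matter of looping over the relevant -if- subsets.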

Creating a Dummy variable (Time variant )

Dear All,

1. I want to create a dummy variable (Health Status of Individuals), which is either yes or no, from INSURCY2. If yes, I want to code it as 1, and 0 otherwise. The different values of the variable INSURCY2 are given at the following link.

https://meps.ahrq.gov/mepsweb//data_...rName=INSURCY2

I am considering this value as the predictor of the treatment (Individuals who have or not an insurance)

2. I want to combine the above dummy variable with the time-varying variables at time T-1 (lagged variables), such as income, education, marital status, employment, region, and whether in need of care or not.

The description of data is given as below:

The Medical Expenditure Panel Survey — Household Component (MEPS-HC), conducted by the Agency for Healthcare Research and Quality (AHRQ), is longitudinal data collected from 2013 to 2015. The objective of the work is to highlight the changes that happened shortly after the implementation of the policy. The variables we will use are socio-economic and demographic variables and health status indicators. Independent variables: Age, male, white, black, Hispanic, other race, married, northeast, Midwest, south, west, years of education, high risk, employed, union, need specialist, need care, insurance prefer., insurance attitude, family members, limitations, income

Looking forward to your suggestions regarding 1 and 2.

With sincere regards,
Upananda Pani
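Without knowing which INSURCY2 codes mean "insured", here is a hypothetical sketch; the code value 1 and the identifiers personid/year are assumptions to be replaced from the MEPS documentation:

Code:
* 1. insurance dummy (assumes INSURCY2 == 1 means insured; adjust as needed)
gen byte insured = INSURCY2 == 1 if !missing(INSURCY2)

* 2. one-year lags of the time-varying covariates
xtset personid year
foreach v of varlist income education married employed {
    gen L1_`v' = L.`v'
}

The L. operator requires the data to be -xtset-, which the MEPS panel structure should allow.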

Thursday, August 25, 2022

Help trying to determine an Asymmetric effect for dummy variables in a Linear Random Effects Model

Hello this is my first post here.

So I am making a model to try to show the asymmetric effect of my independent dummy variables, but with the model being linear, the coefficients are obviously just mirrors of each other. With my limited econometric knowledge beyond typical inference and linear models, these are my best attempts at rectifying the issue, and I need your judgment on whether they are of any use.

Firstly, PS is the dummy variable for which I am attempting to find an asymmetric effect. My first idea is to treat it as a continuous variable and find the marginal effect when PS = 1, 0.5, and 0. My plan is to use the standard errors instead of the standard deviations (the standard deviations are identical for PS=0 and PS=1, since they are the same variable) and multiply these against the margins to provide a consistent scale. I would then take the difference between the scaled margins at PS=1 and PS=0.5 and subtract the difference between PS=0.5 and PS=0 to determine whether there is an asymmetric effect: i.e., if the result is <0, PS=0 is weighted more than PS=1; if it is >0, the reverse is the case; and if it is 0, there is no asymmetric effect.


Because of the issues I faced trying to upload everything I have attached further explanations and regressions to this pdf file. sorry for any inconvenience.

Thanks for the support.

Reshape wide with no time/order variable

Hello. I am working with a dataset that has multiple observations for most of the identifiers, labeled study_id. I would like to reshape wide, but there is no variable indicating the order. Please see an example of my data below. Is it possible to reshape such that the first medication is labeled med1, the second (if any) med2, and so on?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int study_id str34 med
  4 "cyclobenzaprine"       
  5 "lorazepam"             
  5 "promethazine"          
  5 "trazodone"             
  6 "alprazolam"            
  6 "eszopiclone"           
  7 "cyclobenzaprine"       
  7 "temazepam"             
  8 "citalopram"            
 9 "cyclobenzaprine"       
 22 "amitriptyline"         
 24 "venlafaxine"
end

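One common workaround, sketched here under the assumption that any within-id order is acceptable (the data contain no order variable, so the order below is simply alphabetical): generate a sequence number within study_id and use it as the j() variable.

```stata
* Sketch: number the medications within each study_id, then reshape wide.
sort study_id med
by study_id: gen seq = _n
reshape wide med, i(study_id) j(seq)
* med1, med2, ... now hold each id's medications side by side
```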
Coefplot - multiple models with different dependent variables

All, I am running multiple regressions with the same independent variable and 29 different dependent variables, and then running these same models multiple times for different time points. I would like a plot of the regression coefficients where the 29 dependent variables run along the Y axis, with multiple plots side by side for the different time points. I found a post where one of the comments was getting at something similar, but I can't figure out how to get it the way I need it.
So, using this code:

Code:
sysuse auto, clear  
reg trunk foreign gear_ratio if rep78 == 5  // Outcome = Trunk and sub-group = 5
estimates store trunk_s5
reg trunk foreign gear_ratio if rep78 == 4  // Outcome = Trunk and sub-group = 4
estimates store trunk_s4  

reg mpg foreign gear_ratio if rep78 == 5  // Outcome = MPG and sub-group = 5
estimates store mpg_s5
reg mpg foreign gear_ratio if rep78 == 4  // Outcome = MPG and sub-group = 4
estimates store mpg_s4  

reg turn foreign gear_ratio if rep78 == 5  // Outcome = Turn and sub-group = 5
estimates store turn_s5
reg turn foreign gear_ratio if rep78 == 4  // Outcome = Turn and sub-group = 4
estimates store turn_s4  

coefplot (trunk_s4, keep(foreign) \ mpg_s4 \ turn_s4, keep(gear_ratio) \) ///      
|| (trunk_s5, keep(foreign) \ mpg_s5 \ turn_s5, keep(gear_ratio) \) ///      
|| , drop(_cons) aseq swapnames eqrename(*_s4 = "" *_s5 = "") ///          
order(trunk turn mpg) bylabels("Subgroup 4" "Subgroup 5") ///            
eqlabels(, asheading)
I get the plot shown in the attached image.

What I would REALLY like is for the two trunk estimates (trunk_s4 and trunk_s5) to be on the same line (i.e., side by side), AND to be able to label that line as "Trunk", and the same for MPG, Turn, etc. Is there a way to do that? I have tried everything I can think of and have been at this for several days now.

covariates and distal outcomes in a latent class model

Hi Statalist
Is there a way to implement the 3-step latent class analysis (including covariates and distal outcomes in the model) in Stata?

ivreg2 - despite using partial option I get full rank error

In a multiple-instrument regression using ivreg2 with the gmm2s option, I am using the partial option. Despite this, I get the error shown below.

Code:
ivreg2 dep_var  (endogenous_var= i.year#i.dummyb) male ismarried wasmarried age age2 black asian hispanic lths hsdegree somecollege i.year i.county $SCONTROL , gmm2s robust cluster(county) partial(i.county)
My dependent variable is a dummy, and the endogenous variable is a dummy. The instruments are the interaction of year dummies with another dummy variable, which produces up to 18 instrument dummies.

When I specify the IV as i.year#c.dummyb, the regression runs and gives me the Hansen J statistic. But when I use i.year#i.dummyb as the IV, I receive the following error, with everything else kept the same.


Code:
Error: estimated covariance matrix of moment conditions not of full rank,
       and optimal GMM weighting matrix not unique.
Possible causes:
       singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
       partial option may address problem.
r(506);

end of do-file

r(506);
I got this error after the regression had been running on my laptop for 2.5 hours, so before experimenting further I would appreciate some guidance. Please help if you are familiar with this.
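Not a fix, but a quick diagnostic sketch suggested by the error message's singleton-dummy hint (variable names follow the command above): check whether any year-by-dummyb instrument cell contains only one observation.

```stata
* Sketch: count observations in each year x dummyb cell; cells of size 1
* are the "singleton dummy" case the error message warns about.
egen cell = group(year dummyb)
bysort cell: gen cellsize = _N
list year dummyb if cellsize == 1
```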

optimization with Stata

I am unfamiliar with Stata programming, and I am trying to program some very rudimentary optimization problems.

Suppose I have two variables: age and attitude. I want to find a cutoff in age such that I can minimize the MSE of predicting attitude with just a binary variable indicating whether age is above or below the cutoff.

I thought I could achieve that with the function nl and I wrote the following:

Code:
nl (attitude = {b0} + {b1}*(age > {cutoff})), hasconstant(b0)
but the result shows that both b1 and cutoff are constrained. How can I achieve what I want?

Also, what if I want to find the cutoff with another criterion? One potential idea is finding the cutoff that maximizes the Kolmogorov-Smirnov test statistic between the distributions of attitude in the two age groups.
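Since -nl- needs a smooth objective and (age > {cutoff}) is a step function, one alternative is a plain grid search, sketched here under the assumption that age takes integer values:

```stata
* Sketch: scan integer cutoffs, fit the one-dummy regression at each,
* and keep the cutoff with the smallest mean squared error.
quietly summarize age
local lo = r(min)
local hi = r(max)
local best_mse = .
local best_cut = .
forvalues c = `lo'/`hi' {
    tempvar above
    quietly gen byte `above' = age > `c'
    quietly regress attitude `above'
    local mse = e(rss)/e(N)
    if `mse' < `best_mse' {     // missing (.) compares as larger than any
        local best_mse = `mse'  // number, so the first candidate always wins
        local best_cut = `c'
    }
    drop `above'
}
display "best cutoff: `best_cut'  (MSE = `best_mse')"
```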

Transform a table: making options as variable

Dear all,

I have this type of table:
ID   Date         Culture ID   Culture type   Sample result
1    1 Jan 2022   001          MGIT           Positive
1    1 Jan 2022   001          LJ             Positive
2    1 Jan 2022   001          MGIT           Negative
2    1 Jan 2022   001          LJ             Positive
2    2 Jan 2022   002          MGIT           Positive
I would like to transform it into:
ID   Date         Culture ID   MGIT       LJ
1    1 Jan 2022   001          Positive   Positive
2    1 Jan 2022   001          Negative   Positive
2    2 Jan 2022   002          Positive
Could you please help me with the syntax that I could use?

Thank you in advance.
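Assuming variables named id, date, culture_id, culture_type, and result (placeholders; adjust to your actual names), a reshape sketch:

```stata
* Sketch: spread culture_type into columns; string j() values need the
* -string- option. Afterward resultMGIT/resultLJ are renamed to MGIT/LJ.
reshape wide result, i(id date culture_id) j(culture_type) string
rename result* *
```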

Assuming ICC

Hello everyone, this is my first post here. I need an ICC for a power calculation. My outcome variables are test scores and students' effort to study. Can you point me to any source for assuming an ICC?

I have been told to look at a similar study, so I was checking the data of one. I am trying to find the intraclass correlation for test scores, and then for students' effort, separately (not between them). I have the test scores and student ids, and they are clustered by village id. I also have 2 treatment groups and 1 control; it is a randomised controlled trial.

I have tried the Stata command, but I can't understand who my rater/target is, and I am not sure whether to pick one-way or two-way. For the power calculation I should be using power twomeans 0, cluster k1(75) k2(75) m1(15) m2(15) power(.8 .9) rho(), and I need to assume the ICC/rho. Thanks.
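A sketch of one common way to estimate the ICC directly from the similar study's data (variable names testscore and villageid are placeholders): -loneway- reports the intraclass correlation of an outcome within clusters, which can then feed rho().

```stata
* Sketch: ICC of test scores within villages; repeat with the effort
* variable in place of testscore.
loneway testscore villageid
display r(rho)    // the estimated ICC, usable in rho() for -power-
```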

Wednesday, August 24, 2022

generate t-stat

How can we modify this code to generate the t-statistic of the means instead of the standard deviation (sd), and also to work for 4×4 sorts?

Code:
frame create means_and_sds int (idiovol_group mcap_group) ///
    float(mean_vw sd_vw mean_ew sd_ew)
forvalues iv = 1/5 {
    forvalues mc = 1/5 {
        summ vw_mean_q`mc'_idiovol_q`iv'_rt
        local mean_vw = r(mean)
        local sd_vw = r(sd)
        summ ew_mean_q`mc'_idiovol_q`iv'_rt
        local mean_ew = r(mean)
        local sd_ew = r(sd)
        frame post means_and_sds (`iv') (`mc') (`mean_vw') (`sd_vw') ///
            (`mean_ew') (`sd_ew')
    }
}
frame change means_and_sds
rename (mean* sd*) =_idiovol_
reshape wide *_idiovol_, i(mcap_group) j(idiovol_group)
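A sketch of the requested modification: the t-statistic of a mean against zero is mean/(sd/sqrt(N)), and everything needed is already in r() after -summ-; the loop bounds change to 1/4 for 4×4 sorts.

```stata
* Sketch: same structure as the original, posting t-statistics instead
* of standard deviations.
frame create means_and_ts int (idiovol_group mcap_group) ///
    float(mean_vw t_vw mean_ew t_ew)
forvalues iv = 1/4 {
    forvalues mc = 1/4 {
        summ vw_mean_q`mc'_idiovol_q`iv'_rt
        local mean_vw = r(mean)
        local t_vw = r(mean)/(r(sd)/sqrt(r(N)))
        summ ew_mean_q`mc'_idiovol_q`iv'_rt
        local mean_ew = r(mean)
        local t_ew = r(mean)/(r(sd)/sqrt(r(N)))
        frame post means_and_ts (`iv') (`mc') (`mean_vw') (`t_vw') ///
            (`mean_ew') (`t_ew')
    }
}
```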

How to store p-value and CI in a new variable in prais-winsten regression?

Hi!

I'm new to Stata! I have a database structured as shown below (hypothetical data). When running a Prais-Winsten regression, I would like to store the p-values and confidence intervals in new variables.

The command "statsby, by(id): prais log10_prevame year" stores only the beta values in a new variable. How do I store the p-value and CI as well?

I also tested the regsave command (below), but it stores only the values of the last Prais-Winsten regression, not the full set of regressions by id (my database has approximately 2,000 ids).

Code:
statsby, by(id): prais log10_prevame year
regsave, tstat pval ci


Thanks to all of you for your help.
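A sketch of one way around this (same variable names as above): asking statsby for _b and _se saves both per id, after which t, p, and the CI can be computed by hand. The degrees of freedom used here (N - 2) are an assumption for a two-parameter model.

```stata
* Sketch: one row per id, with _b_year and _se_year saved by statsby.
statsby _b _se N=e(N), by(id) clear: prais log10_prevame year
gen t  = _b_year/_se_year
gen p  = 2*ttail(N - 2, abs(t))
gen lb = _b_year - invttail(N - 2, 0.025)*_se_year
gen ub = _b_year + invttail(N - 2, 0.025)*_se_year
```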

spatial weight matrix

Hello,
As part of my research, I want to build a spatial weight matrix. I want a variable giving the average GVC of each country's contiguous neighbors: for example, what is the average GVC of all of Austria's neighboring countries in 1990, 1991, etc.? How do I create this variable? I don't fully understand the process of building such a matrix.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 A byte B str4 C str10 D str18(E F G)
""         . "Year" "GVC"       "due_to_within_EU28" "due_to_income_EU28" "due_to_pop_EU28"   
"Austria"  1 "1990" "29805700"  ".0481634656111964"  ".9024480458070983"  "-.0191678639953992"
"Austria"  1 "1991" "30938910"  ".1836609116141972"  "1.944608703151403"  "-.0441903522094975"
"Austria"  1 "1992" "34690810"  ".374990608645696"   "2.423186855571203"  "-.0670849539272993"
"Austria"  1 "1993" "30951600"  ".6465782510456961"  "2.387666853502203"  "-.0922915704774994"
"Austria"  1 "1994" "34472510"  ".8342666347497953"  "2.295696995644306"  "-.1160531306933024"
"Austria"  1 "1995" "43642010"  ".9401863633068999"  "1.998167904203711"  "-.1434793089426023"
"Austria"  1 "1996" "45747690"  ".8188593139070974"  "1.78346026440601"   "-.1607268091505034"
"Austria"  1 "1997" "44950790"  ".7875487975271014"  "1.81007552918431"   "-.1758015333879044"
"Austria"  1 "1998" "47853380"  ".8011219821127966"  "1.812754054488806"  "-.1918774365390092"
"Austria"  1 "1999" "48544530"  ".8388342510556939"  "1.901551387258806"  "-.2060778344974139"
"Austria"  1 "2000" "49766170"  ".8847901921023933"  "1.826584293540904"  "-.2273702337304115"
"Austria"  1 "2001" "52696490"  ".9489439828825965"  "1.672656645632905"  "-.2585411221463119"
"Austria"  1 "2002" "56400760"  ".9532259074613947"  "1.385412068788604"  "-.3050060786057145"
"Austria"  1 "2003" "71220730"  "1.076461129929193"  "1.097109321098706"  "-.3292282168301099"
"Austria"  1 "2004" "92070150"  "1.312537125220892"  ".783644244392903"   "-.3445579339966116"
"Austria"  1 "2005" "101081500" "1.408761366994995"  ".5412083125216043"  "-.3630515445323113"
"Austria"  1 "2006" "124432900" "1.469008083114495"  ".2761671932472041"  "-.3780829601737068"
"Austria"  1 "2007" "155928200" "1.527217282333993"  ".0185255577209986"  "-.3913022873326071"
"Austria"  1 "2008" "183194900" "1.622884240942696"  "-.3050427344651965" "-.4098649309953046"
"Austria"  1 "2009" "142540500" "1.649642827765895"  "-.444219273757902"  "-.420585170634709" 
"Austria"  1 "2010" "173373400" "1.754465818836096"  "-.3424530512790014" "-.4334694235139054"
"Austria"  1 "2011" "190063200" "1.820813707667696"  "-.3540547723328018" "-.4506867045009031"
"Austria"  1 "2012" "187674200" "1.930985100447394"  "-.4128715802334995" "-.4611845414062046"
"Austria"  1 "2013" "196076500" "2.022298897724198"  "-.4191095057589962" "-.4715954975592993"
"Austria"  1 "2014" "205255700" "1.9478777170334"    "-.4614612203901984" "-.4817944606082989"
"Austria"  1 "2015" "174853500" "1.921528790290004"  "-.4723106970221949" "-.4926251645836004"
"Austria"  1 "2016" "178600100" "1.945843827191304"  "-.5911460549507979" "-.5033391209945037"
"Austria"  1 "2017" "185883700" "1.841171280076601"  "-.7111840237362017" "-.5109963075212036"
"Belgium"  2 "1990" "97696230"  ".0481634656111964"  ".9024480458070983"  "-.0191678639953992"
"Belgium"  2 "1991" "102237000" ".1836609116141972"  "1.944608703151403"  "-.0441903522094975"
"Belgium"  2 "1992" "116127100" ".374990608645696"   "2.423186855571203"  "-.0670849539272993"
"Belgium"  2 "1993" "105241000" ".6465782510456961"  "2.387666853502203"  "-.0922915704774994"
"Belgium"  2 "1994" "117889000" ".8342666347497953"  "2.295696995644306"  "-.1160531306933024"
"Belgium"  2 "1995" "144896900" ".9401863633068999"  "1.998167904203711"  "-.1434793089426023"
"Belgium"  2 "1996" "147398200" ".8188593139070974"  "1.78346026440601"   "-.1607268091505034"
"Belgium"  2 "1997" "141897800" ".7875487975271014"  "1.81007552918431"   "-.1758015333879044"
"Belgium"  2 "1998" "145804000" ".8011219821127966"  "1.812754054488806"  "-.1918774365390092"
"Belgium"  2 "1999" "146973000" ".8388342510556939"  "1.901551387258806"  "-.2060778344974139"
"Belgium"  2 "2000" "151373300" ".8847901921023933"  "1.826584293540904"  "-.2273702337304115"
"Belgium"  2 "2001" "155192100" ".9489439828825965"  "1.672656645632905"  "-.2585411221463119"
"Belgium"  2 "2002" "165131500" ".9532259074613947"  "1.385412068788604"  "-.3050060786057145"
"Belgium"  2 "2003" "203479000" "1.076461129929193"  "1.097109321098706"  "-.3292282168301099"
"Belgium"  2 "2004" "254260000" "1.312537125220892"  ".783644244392903"   "-.3445579339966116"
"Belgium"  2 "2005" "278069300" "1.408761366994995"  ".5412083125216043"  "-.3630515445323113"
"Belgium"  2 "2006" "328224800" "1.469008083114495"  ".2761671932472041"  "-.3780829601737068"
"Belgium"  2 "2007" "399167100" "1.527217282333993"  ".0185255577209986"  "-.3913022873326071"
"Belgium"  2 "2008" "471904900" "1.622884240942696"  "-.3050427344651965" "-.4098649309953046"
"Belgium"  2 "2009" "377217400" "1.649642827765895"  "-.444219273757902"  "-.420585170634709" 
"Belgium"  2 "2010" "447540900" "1.754465818836096"  "-.3424530512790014" "-.4334694235139054"
"Belgium"  2 "2011" "476072700" "1.820813707667696"  "-.3540547723328018" "-.4506867045009031"
"Belgium"  2 "2012" "471962500" "1.930985100447394"  "-.4128715802334995" "-.4611845414062046"
"Belgium"  2 "2013" "494917700" "2.022298897724198"  "-.4191095057589962" "-.4715954975592993"
"Belgium"  2 "2014" "517249300" "1.9478777170334"    "-.4614612203901984" "-.4817944606082989"
"Belgium"  2 "2015" "462273700" "1.921528790290004"  "-.4723106970221949" "-.4926251645836004"
"Belgium"  2 "2016" "485990500" "1.945843827191304"  "-.5911460549507979" "-.5033391209945037"
"Belgium"  2 "2017" "508085700" "1.841171280076601"  "-.7111840237362017" "-.5109963075212036"
"Bulgaria" 3 "1990" "1897707"   ".0481634656111964"  ".9024480458070983"  "-.0191678639953992"
"Bulgaria" 3 "1991" "1501139"   ".1836609116141972"  "1.944608703151403"  "-.0441903522094975"
"Bulgaria" 3 "1992" "1880353"   ".374990608645696"   "2.423186855571203"  "-.0670849539272993"
"Bulgaria" 3 "1993" "1723912"   ".6465782510456961"  "2.387666853502203"  "-.0922915704774994"
"Bulgaria" 3 "1994" "1887298"   ".8342666347497953"  "2.295696995644306"  "-.1160531306933024"
"Bulgaria" 3 "1995" "2491514"   ".9401863633068999"  "1.998167904203711"  "-.1434793089426023"
"Bulgaria" 3 "1996" "2505765"   ".8188593139070974"  "1.78346026440601"   "-.1607268091505034"
"Bulgaria" 3 "1997" "2769917"   ".7875487975271014"  "1.81007552918431"   "-.1758015333879044"
"Bulgaria" 3 "1998" "2872591"   ".8011219821127966"  "1.812754054488806"  "-.1918774365390092"
"Bulgaria" 3 "1999" "2894217"   ".8388342510556939"  "1.901551387258806"  "-.2060778344974139"
"Bulgaria" 3 "2000" "2948860"   ".8847901921023933"  "1.826584293540904"  "-.2273702337304115"
"Bulgaria" 3 "2001" "2972950"   ".9489439828825965"  "1.672656645632905"  "-.2585411221463119"
"Bulgaria" 3 "2002" "3236181"   ".9532259074613947"  "1.385412068788604"  "-.3050060786057145"
"Bulgaria" 3 "2003" "4059918"   "1.076461129929193"  "1.097109321098706"  "-.3292282168301099"
"Bulgaria" 3 "2004" "5344343"   "1.312537125220892"  ".783644244392903"   "-.3445579339966116"
"Bulgaria" 3 "2005" "5499580"   "1.408761366994995"  ".5412083125216043"  "-.3630515445323113"
"Bulgaria" 3 "2006" "8273654"   "1.469008083114495"  ".2761671932472041"  "-.3780829601737068"
"Bulgaria" 3 "2007" "10386200"  "1.527217282333993"  ".0185255577209986"  "-.3913022873326071"
"Bulgaria" 3 "2008" "12811460"  "1.622884240942696"  "-.3050427344651965" "-.4098649309953046"
"Bulgaria" 3 "2009" "9413788"   "1.649642827765895"  "-.444219273757902"  "-.420585170634709" 
"Bulgaria" 3 "2010" "12147060"  "1.754465818836096"  "-.3424530512790014" "-.4334694235139054"
"Bulgaria" 3 "2011" "14525560"  "1.820813707667696"  "-.3540547723328018" "-.4506867045009031"
"Bulgaria" 3 "2012" "14306560"  "1.930985100447394"  "-.4128715802334995" "-.4611845414062046"
"Bulgaria" 3 "2013" "14811590"  "2.022298897724198"  "-.4191095057589962" "-.4715954975592993"
"Bulgaria" 3 "2014" "15562760"  "1.9478777170334"    "-.4614612203901984" "-.4817944606082989"
"Bulgaria" 3 "2015" "13264430"  "1.921528790290004"  "-.4723106970221949" "-.4926251645836004"
"Bulgaria" 3 "2016" "13868150"  "1.945843827191304"  "-.5911460549507979" "-.5033391209945037"
"Bulgaria" 3 "2017" "14494140"  "1.841171280076601"  "-.7111840237362017" "-.5109963075212036"
"Croatia"  4 "1990" "1292701"   ".0481634656111964"  ".9024480458070983"  "-.0191678639953992"
"Croatia"  4 "1991" "1342475"   ".1836609116141972"  "1.944608703151403"  "-.0441903522094975"
"Croatia"  4 "1992" "1278446"   ".374990608645696"   "2.423186855571203"  "-.0670849539272993"
"Croatia"  4 "1993" "1161347"   ".6465782510456961"  "2.387666853502203"  "-.0922915704774994"
"Croatia"  4 "1994" "1278436"   ".8342666347497953"  "2.295696995644306"  "-.1160531306933024"
"Croatia"  4 "1995" "1499801"   ".9401863633068999"  "1.998167904203711"  "-.1434793089426023"
"Croatia"  4 "1996" "1686822"   ".8188593139070974"  "1.78346026440601"   "-.1607268091505034"
"Croatia"  4 "1997" "1771090"   ".7875487975271014"  "1.81007552918431"   "-.1758015333879044"
"Croatia"  4 "1998" "1875018"   ".8011219821127966"  "1.812754054488806"  "-.1918774365390092"
"Croatia"  4 "1999" "1874454"   ".8388342510556939"  "1.901551387258806"  "-.2060778344974139"
"Croatia"  4 "2000" "2185950"   ".8847901921023933"  "1.826584293540904"  "-.2273702337304115"
"Croatia"  4 "2001" "2262487"   ".9489439828825965"  "1.672656645632905"  "-.2585411221463119"
"Croatia"  4 "2002" "2448941"   ".9532259074613947"  "1.385412068788604"  "-.3050060786057145"
"Croatia"  4 "2003" "3073162"   "1.076461129929193"  "1.097109321098706"  "-.3292282168301099"
"Croatia"  4 "2004" "3927037"   "1.312537125220892"  ".783644244392903"   "-.3445579339966116"
end
------------------ copy up to and including the previous line ------------------

Listed 100 out of 785 observations
Use the count() option to list more


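A sketch that avoids an explicit matrix, under two assumptions: you have (or can build, e.g. by hand or from a contiguity database) a pair list neighbor_pairs.dta with one row per (country, A) contiguity pair, where A names a neighbor; and the GVC values have been destrung into a long file gvc.dta with variables A, Year, and numeric GVC.

```stata
* Sketch: expand the pair list over years via joinby, average the
* neighbors' GVC by country-year, then merge back onto the main panel.
use neighbor_pairs, clear              // variables: country  A (a neighbor)
joinby A using gvc                     // adds every Year and GVC per neighbor
collapse (mean) neighbor_gvc = GVC, by(country Year)
rename country A
merge 1:1 A Year using gvc, nogenerate
```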

Post-estimation tests or heteroskedasticity check to confirm normality of the residuals?

Hi Stata users,

I'd like to ask for your help. I read the Stata manual PDF on postestimation for mixed-effects linear regression models, but I still don't understand which tests to use to check the normality of the residuals (my data is not normally distributed), nor how to read the postestimation results to determine whether the model fits well and whether the non-normally distributed data was okay to use.

Here's my command for the mixed model: mixed health t_in rh_in i.race || Homeid:

Thank you for your help.
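For what it's worth, a sketch of the usual residual checks after -mixed- (same model as above); these are informal diagnostics, not a definitive recipe:

```stata
* Sketch: level-1 residuals and random effects, inspected graphically
* and with Shapiro-Wilk (swilk needs roughly 4-2000 observations).
mixed health t_in rh_in i.race || Homeid:
predict res, residuals        // observation-level residuals
qnorm res                     // points near the line suggest normality
swilk res                     // formal test of normality
predict reff, reffects        // the random intercepts can be checked too
qnorm reff
```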

Tuesday, August 23, 2022

what is the base for -collapse- percent statistic with by()?

The -collapse- command (https://www.stata.com/manuals/dcollapse.pdf) has a -percent- statistic that (according to the documentation) shows the
percentage of nonmissing observations
Here is an example with a by() option:

Code:
. input a b

             a          b
  1. 1 1
  2. 1 1
  3. 1 2
  4. . 2
  5. end

. collapse (percent) a,by(b)

. list

     +--------------+
     | b          a |
     |--------------|
  1. | 1   66.66667 |
  2. | 2   33.33333 |
     +--------------+
I am surprised by this result. I expected the collapsed values of a to be 100 and 50. Are not all the values of a nonmissing for b==1, and half the values nonmissing for b==2? Where does the 3 in the denominator of the -collapse- calculation come from? Is there a way to get the result I was expecting?
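The 66.67/33.33 split matches each group's share of the overall count of nonmissing a (2/3 and 1/3), which suggests the denominator is the total nonmissing count across all groups, not the group size. A sketch of one way to get the within-group percentage instead (100 and 50 in the example):

```stata
* Sketch: count nonmissing a and total observations per group by hand.
gen byte total = 1
collapse (count) nonmiss = a (sum) total, by(b)
gen pct = 100*nonmiss/total
list
```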

Correlating random effects to study interhospital variation

Hi All,

Using data from ~2000 hospitals, I have identified patients who underwent 1 of 3 operations (type A, B, C). I would like to explore whether hospitals with high mortality following operation A also have high mortality following operations B and C.

As an initial approach, I fit 3 multi-level, mixed effects models with mortality as my dependent variable and hospital identifier as the random effect. I then estimated the random effects using the "predict" postestimation command and assessed correlations between the random effects using the "corr" command. Following pairwise comparison, I found no association (r<0.1) between random effects of mortality after all three operations.

To assess the efficacy of my method, I split my sample of Type A operations and fit separate multi-level models to each sample. Although I would expect to see significant correlation in random effects across the two samples, I did not, leading me to believe that there is something wrong with my methodology.

I would appreciate any advice.

Code:
melogit died age i.female if operationtype == 1 || hospital: , or base
predict randomeffect1 if operationtype == 1, reffects

melogit died age i.female if operationtype == 2 || hospital: , or base
predict randomeffect2 if operationtype == 2, reffects

melogit died age i.female if operationtype == 3 || hospital: , or base
predict randomeffect3 if operationtype == 3, reffects

preserve
duplicates drop hospital, force
scatter randomeffect1 randomeffect2
corr randomeffect1 randomeffect2