Wednesday, September 30, 2020

Help with R squared in Cox model with shared frailty

Hi all,

I am fitting a Cox model with shared frailty and I hope to get the (pseudo) R squared for my model. Here is my code:

stcox x1 x2 x3 x4, shared(ID)
display e(r2_p)


Nothing comes out.


However, if I take out the shared() option, I do get the pseudo R-squared:

stcox x1 x2 x3 x4
display e(r2_p)

.05290264


Can anyone help with this problem? Is it even possible to get the (pseudo) R squared in a Cox model with shared frailty?
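
For reference, a minimal sketch of computing a McFadden-style pseudo R-squared by hand, assuming both fits store e(ll) and that a covariate-free frailty model is an acceptable null (check -ereturn list- after each fit to confirm):

Code:
* Hedged sketch: McFadden-style pseudo R-squared computed by hand.
* Assumes e(ll) is stored by both fits; -estimate- forces stcox to fit
* a model with no regressors, keeping only the shared frailty term.
stcox x1 x2 x3 x4, shared(ID)
scalar ll_full = e(ll)
stcox, shared(ID) estimate
scalar ll_null = e(ll)
display "pseudo R2 = " 1 - ll_full/ll_null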

Thanks!

Jasmine

Tabulation of percentages for an outcome variable by gender for each racial group

Hi,
I would appreciate some help with a tabulation issue. I have about 10 causes of death, and I would like a table of percentages by gender for each race. For example, with 2 races the table would contain 5 columns: the first column would list the causes of death; the 2nd would give the percentage for each cause of death for males of race 1; the 3rd, the percentage for females of race 1; the 4th, the percentage for males of race 2; and the 5th, the percentage for females of race 2.
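
For what it's worth, a minimal sketch assuming variables named cod, gender, and race (all hypothetical names): one two-way table per race, with column percentages by gender.

Code:
* Hedged sketch: one cause-of-death by gender table per race,
* showing column percentages only (variable names are placeholders)
bysort race: tabulate cod gender, column nofreq
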
Thank you

How to add multiple regression lines to a marginsplot graph?

Hello,

I would like to use marginsplot to show many regressions on the same graph. My dependent variable is a scale (0, .5, 1, 1.5, 2, 2.5, 3). I would like to graph separate race/sex pairs. I can show the probability of being at each point on the scale by using -if- qualifiers.

Code:
ologit chinese_scale age educ ib2.pid  if black==1 & woman ==1
margins
marginsplot, noci
The code produces this graph: [graph attachment not shown]


However, when I try to use the following code, I can no longer see the scale on the x-axis

Code:
ologit chinese_scale age educ ib2.pid  i.birace i.woman
margins woman#birace
marginsplot, noci
Now the scale is no longer on the x-axis

[graph attachment not shown]

I would greatly appreciate any help or thoughts you can provide.
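
A hedged possibility: -marginsplot- has an xdimension() option, and after -ologit- the default prediction produces one _predict index per outcome category, so putting _predict on the x-axis should restore the scale:

Code:
ologit chinese_scale age educ ib2.pid i.birace i.woman
margins woman#birace
marginsplot, noci xdimension(_predict)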

Crosstabulation Question: Options row vs. col

I know this is a really basic question, but the logic confuses me every time I do crosstabs - even within a few weeks of the last time I did it. Can anyone recommend a trick, like a mnemonic device or a pattern, that helps with knowing:

1) when to use, e.g., tab var1 var2, row (instead of tab var1 var2, col)
2) how to interpret the results (of, e.g., tab var1 var2, row or tab var1 var2, col)

I swear I'm not stupid; I just really get my logic inverted when I look at these tables. If anyone has any advice on how to keep track of when to use & how to interpret row vs column crosstabs, it would be a huge help.
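
One concrete way to anchor the logic, using a built-in dataset: the option names the dimension whose percentages sum to 100.

Code:
sysuse auto, clear
tabulate foreign rep78, row nofreq     // each row sums to 100: "among domestic (or foreign) cars, what share has each repair record?"
tabulate foreign rep78, column nofreq  // each column sums to 100: "among cars with a given repair record, what share is foreign?"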

Thank you,

Tatiana

Which is the correct approach to coding a dummy variable

Hi Statalist.

I want to generate a dummy variable from a categorical variable with values ranging 0-10, where 0-2 is nil-to-low and 3-10 is mid-to-high. Note that I have two such categorical variables: one for responses by husbands and one for wives (relimp1 = importance for the husband, relimp2 = importance for the wife):
Code:
gen byte imp2 = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) & relimp1 < . & relimp2 < .
However as you can see below, "0" was given when relimp1 or relimp2 were 'missing', so I tried:
Code:
gen byte imp4 = 1 if inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) & relimp1 < . & relimp2 < .
replace imp4 = 0 if (relimp12 == 1 & relimp22 == 1) | (relimp12 == 1 & inlist(relimp22, 2, 3)) | (inlist(relimp12, 2, 3) & relimp22 == 1)
which provided "1" when true, "0" when false, and "." when missing - which is what I thought I should get. Based on my reading of https://www.stata.com/support/faqs/d...rue-and-false/ I thought the first piece of code would have given me this outcome.

Given the first piece of code has considerably more "0" than the second piece of code, I believe I should go with the second piece of code (imp4). Am I reading too much into this? Help is appreciated.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave relimp1 relimp2 imp2 imp4)
106 1002 10  .  . 0 .
106 1002 11  .  . 0 .
106 1002 12  .  . 0 .
106 1002 13  .  . 0 .
106 1002 14  0  0 0 0
106 1002 15  .  . 0 .
106 1002 16  .  . 0 .
106 1002 17  .  . 0 .
106 1002 18  0  0 0 0
108  109  1  .  . 0 .
108  109  2  .  . 0 .
108  109  3  .  . 0 .
108  109  4  5  6 1 1
108  109  5  .  . 0 .
108  109  6  .  . 0 .
108  109  7  .  5 0 .
103  104  1  .  . 0 .
103  104  2  .  . 0 .
103  104  3  .  . 0 .
103  104  4 10 10 1 1
103  104  5  .  . 0 .
103  104  6  .  . 0 .
103  104  7 10 10 1 1
103  104  8  .  . 0 .
103  104  9  .  . 0 .
103  104 10 10 10 1 1
103  104 11  .  . 0 .
103  104 12  .  . 0 .
103  104 13  .  . 0 .
103  104 14 10 10 1 1
103  104 15  .  . 0 .
103  104 16  .  . 0 .
103  104 17  .  . 0 .
103  104 18 10 10 1 1
end
Am I correct in my understanding that
Code:
!missing(relimp1, relimp2)
is the same as
Code:
relimp1 < . & relimp2 < .
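
For reference, a hedged sketch of a third construction that propagates missingness directly and may reproduce imp4 without the second -replace- step (worth verifying against your data):

Code:
gen byte imp5 = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) ///
    if !missing(relimp1, relimp2)
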
Stata 15.1

Note this was originally posted at https://www.statalist.org/forums/for...=1601514760045 but is reposted here because the nature of the question differs from that thread.

Knots in nonparametric series regression

Dear All,

I am Maheswaran Kesavan, doing a master's at University College London.

I am fitting a nonparametric series regression using a B-spline basis.
I want to know:
1) Where the knots lie in my data.
2) How to incorporate a knot at a specific point of my choice.
3) How to plot the fitted curve from a nonparametric series regression.
4) What is the minimum number of data points we can use in nonparametric regression models (mine is 100)?
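
A hedged sketch of a starting point, assuming Stata 16+ and hypothetical variables y and x; the exact options for inspecting and placing knots are worth checking in -help npregress series-:

Code:
* Hedged sketch: B-spline series regression, then the fitted curve via
* -margins-/-marginsplot-. knots() is my best guess at the option name
* for setting the number of knots -- verify in the help file.
npregress series y x, spline knots(3)
margins, at(x = (0(10)100))
marginsplot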

Thank you in advance

Predict based on regression model

Hi,

I estimated the following regression model:

Code:
reg lnChild c.lnCash##c.lnWhite   // ## adds both main effects and the interaction
Based on this regression, I want to predict the outcome using the mean values of the independent variables.

For instance, I want to plug the average of Cash into this regression to see how the prediction differs from the actual number of children.

Do you have any ideas how to do this?
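
A hedged sketch of one way to do this with -margins-, which evaluates the fitted model at covariate means:

Code:
reg lnChild c.lnCash##c.lnWhite
margins, atmeans                      // prediction at the means of all covariates
margins, at((mean) lnCash lnWhite)    // equivalent, naming the variables explicitly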

Thank you in advance!

ICC (Intra-class correlation coefficient) vs "9% of this variation in mortality was attributable solely to the surgeon."

I am trying to find out the relationship between (a) the ICC for surgeons and (b) the variation due to surgeons.
In Udyavar et al. (2018, "The impact of individual physicians on outcomes after trauma: is it the system or the surgeon?"), both the ICC ("Surgeons with higher mortality rates were not clustered at specific hospitals, as the intraclass correlation for surgeon-level mortality rates was 0.02") and the quote in the subject above were given. I am doing a systematic review and wonder how the ICC and this particular variation are related (the relationship is not that the ICC is the square of the variation).

The reason is that some papers list the variation due to the surgeon while other papers show the ICC (Intra-class correlation coefficient).

(The ICC is important as even a small ICC can have a substantial design effect - if you cluster a randomized controlled trial by practitioner - surgeon for example - you will need substantially more patients to gain sufficient statistical power than if the ICC is nil).

Does anybody know how the ICC for a practitioner and the variation due to a practitioner are related?
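
For what it's worth, a hedged note on the usual link: in a two-level random-intercept logistic model, the latent-scale ICC is the variance partition coefficient,

$$\mathrm{ICC} = \frac{\sigma^2_{u}}{\sigma^2_{u} + \pi^2/3},$$

where $\sigma^2_{u}$ is the practitioner-level variance and $\pi^2/3$ is the standard logistic residual variance. Under this definition, "percent of variation attributable to the practitioner" and the ICC are the same quantity, so when a paper reports different numbers the percentage is presumably computed on another scale or from a different decomposition (in Udyavar et al., the 0.02 ICC describes clustering of surgeon-level rates within hospitals, a different level of nesting).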

Power analysis for modified Poisson regression

Hello,

I am trying to determine the power of one of my analyses. The outcome is binary and relative risk is the effect estimate, determined using a modified Poisson regression with robust error variance. The exposure variable is categorical with 5 groups (placebo plus 4 levels of treatment). I am struggling with how to determine the power, or which options to choose in Stata, given the 5 levels of exposure. I was wondering if anyone could provide some assistance on how to proceed with this power analysis in Stata? Thanks!
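
One hedged route is simulation, since (to my knowledge) the built-in -power- suite has no canned routine for a modified Poisson design with five arms. Every number below (total N, baseline risk, risk ratio) is an assumption to be replaced with your own design values:

Code:
* Hedged sketch: simulate binary outcomes under hypothesized per-arm
* risks, fit the modified Poisson model, and count joint-test rejections
program define simrr, rclass
    drop _all
    set obs 500                                             // assumed total N
    gen byte arm = mod(_n, 5)                               // 5 arms: 0 = placebo
    gen byte y = runiform() < cond(arm == 0, .20, .20*0.7)  // assumed risks: .20 vs RR 0.7
    poisson y i.arm, vce(robust)                            // modified Poisson
    test 1.arm 2.arm 3.arm 4.arm                            // joint test of treatment
    return scalar reject = (r(p) < .05)
end
simulate reject = r(reject), reps(1000): simrr
summarize reject                                            // mean = estimated power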

"Not sorted"

Dear All,
I have individual level panel data that includes spells, with a 6 month *follow up* period after each spell as follows:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(id month) byte spell float spell_followup double(base_y followup_y) float followupmonth
52   1 0 .                 .                  . .
52   2 0 .                 .                  . .
52   3 0 .                 .                  . .
52   4 0 .                 .                  . .
52   5 0 .                 .                  . .
52   6 0 .                 .                  . .
52   7 0 .                 .                  . .
52   8 0 .                 .                  . .
52   9 0 .                 .                  . .
52  10 0 .                 .                  . .
52  11 0 .                 .                  . .
52  12 0 .                 .                  . .
52  13 0 .                 .                  . .
52  14 0 .                 .                  . .
52  15 0 .                 .                  . .
52  16 0 .                 .                  . .
52  17 1 .                 .                  . .
52  18 1 .                 .                  . .
52  19 1 .                 .                  . .
52  20 1 . 30.35714340209961                  . .
52  21 0 1                 .              21.25 1
52  22 2 1                 .              21.25 2
52  23 2 1                 .              21.25 3
52  24 2 1                 .              21.25 4
52  25 2 1                 .              21.25 5
52  26 2 1             21.25              21.25 6
52  27 0 2                 .                  0 1
52  28 0 2                 .  32.04545593261719 2
52  29 0 2                 .  69.54545593261719 3
52  30 0 2                 .                 75 4
52  31 0 2                 .               37.5 5
52  32 0 2                 .                  0 6
52  33 0 .                 .                  . .
52  34 0 .                 .                  . .
52  35 0 .                 .                  . .
52  36 0 .                 .                  . .
52  37 0 .                 .                  . .
52  38 0 .                 .                  . .
52  39 0 .                 .                  . .
52  40 0 .                 .                  . .
52  41 0 .                 .                  . .
52  42 0 .                 .                  . .
52  43 0 .                 .                  . .
52  44 0 .                 .                  . .
52  45 0 .                 .                  . .
52  46 0 .                 .                  . .
52  47 0 .                 .                  . .
52  48 0 .                 .                  . .
52  49 0 .                 .                  . .
52  50 0 .                 .                  . .
52  51 0 .                 .                  . .
52  52 0 .                 .                  . .
52  53 0 .                 .                  . .
52  54 0 .                 .                  . .
52  55 0 .                 .                  . .
52  56 0 .                 .                  . .
52  57 0 .                 .                  . .
52  58 0 .                 .                  . .
52  59 0 .                 .                  . .
52  60 0 .                 .                  . .
52  61 0 .                 .                  . .
52  62 0 .                 .                  . .
52  63 0 .                 .                  . .
52  64 0 .                 .                  . .
52  65 0 .                 .                  . .
52  66 0 .                 .                  . .
52  67 0 .                 .                  . .
52  68 0 .                 .                  . .
52  69 0 .                 .                  . .
52  70 0 .                 .                  . .
52  71 0 .                 .                  . .
52  72 0 .                 .                  . .
52  73 0 .                 .                  . .
52  74 0 .                 .                  . .
52  75 0 .                 .                  . .
52  76 0 .                 .                  . .
52  77 0 .                 .                  . .
52  78 0 .                 .                  . .
52  79 0 .                 .                  . .
52  80 0 .                 .                  . .
52  81 0 .                 .                  . .
52  82 0 .                 .                  . .
52  83 0 .                 .                  . .
52  84 0 .                 .                  . .
52  85 0 .                 .                  . .
52  86 3 .                 .                  . .
52  87 3 .                 .                  . .
52  88 3 .             23.25                  . .
52  89 0 3                 . 10.576614379882813 1
52  90 0 3                 . 5.2016143798828125 2
52  91 0 3                 .                  0 3
52  92 0 3                 .                 10 4
52  93 0 3                 .             27.375 5
52  94 0 3                 .              34.75 6
52  95 0 .                 .                  . .
52  96 0 .                 .                  . .
52  97 0 .                 .                  . .
52  98 0 .                 .                  . .
52  99 0 .                 .                  . .
52 100 0 .                 .                  . .
end

In each 6-month follow-up period, I want to check whether the deviation from base_y is greater than 15/30/50%. The code I was trying to run is:

Code:
tsset id month
bysort id spell_followup (followupmonth): gen var15=((l1.base_y-followup_y)/l1.base_y>.15) if followupmonth==1
bysort id spell_followup (followupmonth): gen var30=((l1.base_y-followup_y)/l1.base_y>.30) if followupmonth==1
bysort id spell_followup (followupmonth): gen var50=((l1.base_y-followup_y)/l1.base_y>.50) if followupmonth==1

But I get an error "not sorted".

Code:
. bysort id spell_followup (followupmonth): gen var15=((l1.base_y-followup_y)/l1.ba
> se_y>.15) if followupmonth==1
not sorted
r(5);

end of do-file

r(5);
I did tsset the data before using bysort, but I am clearly messing up somewhere; the error doesn't give a lot of detail and I am stuck. I will be grateful for your help.
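
A hedged note on the likely cause, plus a sketch: time-series operators such as l1. require the data to stay sorted on the -tsset- variables, which conflicts with re-sorting by spell_followup and followupmonth inside -bysort-. The sketch below avoids lag operators entirely, assuming base_y always sits in the row immediately before the first follow-up month:

Code:
* Hedged sketch: grab base_y from the row preceding followupmonth==1,
* spread it through the follow-up window, then flag deviations
sort id month
gen double base = base_y[_n-1] if followupmonth == 1 & id == id[_n-1]
bysort id spell_followup (followupmonth): replace base = base[1]
gen byte var15 = (base - followup_y)/base > .15 if !missing(base, followup_y)
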
Sincerely,
Sumedha.

initial values not feasible melogit

Hello -- I am running -melogit- on ~330,000 persons nested in ~1,900 neighborhoods in 47 countries. For my models, I keep getting "initial values not feasible". Below are the recommendations I have tried from other forums. I also randomly selected 50% of my sample, and the model ran then; however, this is not really a feasible solution. Any help is greatly appreciated!!!

Code:
melogit YNipv urban femalemore malemore working Zsurveyyr [pw=dvwgt] || country: || newid: , or nolog

logit YNipv zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt]
mat a = e(b)
mat a1 = (a, 0)
melogit YNipv zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt] || country: ///
|| newid: , or from(a1) intmethod(laplace)

melogit YNipv zfemeduc zage working Zsurveyyr childtot urban || country: || newid: , noestimate
matrix define B = e(b)
matrix define B[1,4] = 1e-8
melogit YNipv zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt] || country: ///
|| newid: , or from(B, skip)

melogit YNipvemo zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt] || country: ///
|| newid: , or startgrid(2)
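
Another hedged possibility, since the -me- commands accept a startvalues() option: let -melogit- derive starting values from the fixed-effects part and begin with the cheaper Laplace approximation.

Code:
melogit YNipv zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt] ///
    || country: || newid: , or startvalues(fixedonly) intmethod(laplace)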

Comparing two datasets by two variables

Hi, I am very new to Stata. I have variables x1, y1, z in data1.dta and x2, y2, N in data2.dta.

I am trying to run an analysis where:
  1. Step 1: x1 and x2 will be matched (merged) first.
  2. Step 2: Within the matched result, y1 and y2 will be matched (merged).
  3. The expected result is the data where y1 and y2 have matched, so I get to see z and N where x1=x2 and, within those matches, y1=y2.
Constraints are:
The data within x1, y1, x2, y2 aren't unique, which is why I can't use a simple merge or append of the datasets.

I was hoping to run this process in a loop.
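
A hedged sketch of one loop-free route: -joinby- forms all pairwise combinations of matching observations, so non-unique keys are allowed (the file names and renaming are assumptions about your setup).

Code:
use data2, clear
rename (x2 y2) (x1 y1)        // align the key names with data1
tempfile d2
save `d2'
use data1, clear
joinby x1 y1 using `d2'       // keeps every x-y pair present in both files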

Thanks in advance.

Log transforming variables

Two questions related to log transforming variables.

I understand that we might want to log transform the dependent variable if its distribution is not normal. However, what if you take the log and you still don't have a normal distribution?

I am not clear on how to determine if you should take the log of independent variables.
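
A quick diagnostic worth knowing: -ladder- and -gladder- try the whole ladder of powers (log included) and show which transform comes closest to normality.

Code:
sysuse auto, clear
ladder price      // normality test for each power transform
gladder price     // histogram of each power transform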

How to assess if there is enough variation in your dependent variable

Is there a simple way to assess if there is enough variation in your dependent variable, or is it best to just run the regression and asses the R-squared value?
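
As a first pass, a minimal sketch (depvar is a placeholder name): inspect the spread directly before fitting anything.

Code:
summarize depvar, detail   // sd, percentiles, extremes
histogram depvar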

Change in variance time series

Hi everyone,

I am analysing a time series (stock returns) and I am trying to check whether variance in the second half of my sample is different from the first half. I assigned a period to the observations. Here is an example (not the real data, but this is what it looks like):
Code:
Period     X            Date
1     .02784243    1/8/2010
1     .01478848    1/15/2010
1    -.04267111    1/22/2010
2    -.011348      1/29/2010
2    -.09616897    2/5/2010

Code:
robvar X, by(Periode)

Summary of X
Periode          Mean      Std. Dev.     Freq.
1            .0000922      .0367802       261
2           .00006544     .02613092       261
Total       .00007882     .03187241       522

W0  = 10.8059198   df(1, 520)   Pr > F = 0.00108013
W50 =  9.6731110   df(1, 520)   Pr > F = 0.0019724
W10 =  9.8870904   df(1, 520)   Pr > F = 0.00175953

I am wondering whether this is a valid method for time series. Can anyone here help me answer this question? If it isn't, is there another method (one that is not too hard for a beginner)? Thanks in advance!!

xtivreg, first fails with "conformability error" r(503)

Dear Readers

xtivreg without "first" runs fine, but fails when I add the "first" option with "conformability error" r(503)

Using "set trace on" the error appears at:

Code:
est repost b=`bw', rename findomitted buildfvinfo
= est repost b=__00008Q, rename findomitted buildfvinfo
conformability error
di
di as text "First-stage within regression"
`vv' xtreg , level(`level') `diopts'
}
}
    end    xtivreg.Estimate    ---
        end xtivreg    ---
r(503);
Any ideas?

Windows 10 Pro, up to date, and Stata 16.1 (29 Sep 2020).

All help most appreciated.

Best wishes

Richard

Split String

Code:
clear
input str4 HAVE str2(WANT1 WANT2)
"AA01" "AA" "01"
"AZ02" "AZ" "02"
"AV03" "AV" "03"
"AA04" "AA" "04"
"AA05" "AA" "05"
"A06"  "A"  "06"
"A07"  "A"  "07"
"A08"  "A"  "08"
"A09"  "A"  "09"
"A1Z0" "AZ" "10"
"B11"  "B"  "11"
"BB12" "BB" "12"
"BQ13" "BQ" "13"
"D14"  "D"  "14"
"F15"  "F"  "15"
"G16"  "G"  "16"
"G17"  "G"  "17"
"H18"  "H"  "18"
"I19"  "I"  "19"
"I20"  "I"  "20"
"I21"  "I"  "21"
"I22"  "I"  "22"
"II23" "II" "23"
end

I have the variable HAVE and wish to get WANT1 and WANT2, where WANT1 is the alpha characters in HAVE and WANT2 is the numeric part. I know how to split a string based on position, for example taking the first or second character, but I do not know how to put the alpha characters in WANT1 and the numeric characters in WANT2.
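
A hedged sketch using regular expressions rather than positions, since values like "A1Z0" interleave letters and digits; ustrregexra() (Stata 14+) strips one character class at a time.

Code:
gen W1 = ustrregexra(HAVE, "[0-9]", "")    // letters only (matches WANT1)
gen W2 = ustrregexra(HAVE, "[^0-9]", "")   // digits only (matches WANT2)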

I have cross-posted this on Stack Overflow.

New version of wridit on SSC

Thanks as always to Kit Baum, a new version of the wridit package is now available for download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have an old version of wridit.

The wridit package is described as below on my website. The new version adds an option handedness(), with possible values center, left and right, specifying the default standard center ridits, left-continuous ridits and right-continuous ridits, respectively. The right-continuous ridit function is also known as the cumulative distribution function.

This version is planned as the final Stata Version 10 version of wridit. I am planning the first Stata Version 16 version of wridit, using data frames.

Best wishes

Roger

---------------------------------------------------------------------------
package wridit from w:\stata10
---------------------------------------------------------------------------

TITLE
wridit: Generate weighted ridits

DESCRIPTION/AUTHOR(S)
wridit inputs a variable and generates its weighted ridits. If no
weights are provided, then all weights are assumed equal to 1, so
unweighted ridits are generated. Zero weights are allowed, and
imply that the ridits calculated for the observations with zero
weights will refer to the distribution of weights in the
observations with nonzero weights.

Author: Roger Newson
Distribution-Date: 28september2020
Stata-Version: 10

INSTALLATION FILES
wridit.ado
wridit.sthlp
---------------------------------------------------------------------------

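For those new to the package, a hedged usage sketch (I am assuming a generate() option here; see -help wridit- for the authoritative syntax):

Code:
sysuse auto, clear
wridit mpg, generate(mpg_ridit) handedness(right)
summarize mpg_ridit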

Compare values of different variables across rows

hello,
I want to compare the values of two variables in different rows.
For example:
vr1 vr2
10 20
20 30
40 50
Here vr1 has the value 20 in observation 2, which vr2 has in observation 1.
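
If the pattern is always "one row earlier", a minimal sketch using explicit subscripts (an assumption about what is wanted):

Code:
gen byte match = vr1 == vr2[_n-1]   // 1 when vr1 equals vr2 from the previous row
list if match
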
thank you

table summarizing 3 categorical variables in string form between two groups (binary variable)

Hello,

I am having a hard time finding examples of summary tables comparing two groups, say students who dropped out vs. those who didn't (a binary variable, 1 = dropped out and 2 = student). I want to know their gender, age (18-88), parents' highest education level (5 choices), and marital status (5 choices).
I have been using the tabulate command to make a two-way table for each demographic (tab gender dropout), but ideally I would like to have them all in one table.
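
A hedged sketch of a quick way to stack the comparisons (variable names are guesses from the post); for a single publication-style table, community-contributed commands such as table1 or tabout (both on SSC) may be closer to what is wanted.

Code:
foreach v of varlist gender parent_educ marital_status {
    tabulate `v' dropout, column
}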

Combining two datasets and keeping specific observations

I have two datasets. Dataset A and Dataset B. Dataset A is my existing data that I have organized and cleaned. Data examples are given below:

Dataset A

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str20(reporter partner) double import int year
"Albania" "Argentina"               515256 2019
"Albania" "Australia"               243387 2019
"Albania" "Austria"                4070764 2019
"Albania" "Bahrain"                    400 2019
"Albania" "Bangladesh"              653907 2019
"Albania" "Belgium"                2439898 2019
"Albania" "Brazil"                 5157799 2019
"Albania" "Bulgaria"               4492959 2019
"Albania" "Cambodia"                228032 2019
"Albania" "Cameroon"                 45297 2019
"Albania" "Canada"                 3393615 2019
"Albania" "Chile"                     3844 2019
"Albania" "China"                 45867296 2019
"Albania" "Colombia"               3427729 2019
"Albania" "Costa Rica"               69864 2019
"Albania" "Croatia, Rep. of"       4308223 2019
"Albania" "Cyprus"                   75552 2019
"Albania" "Czech Rep."             5715907 2019
"Albania" "Denmark"                 711316 2019
"Albania" "Egypt"                  3067760 2019
"Albania" "Finland"                 972839 2019
"Albania" "France"                 6724754 2019
"Albania" "Germany"               26218870 2019
"Albania" "Greece"                36231564 2019
"Albania" "Greenland"               498771 2019
"Albania" "Hong Kong"               202057 2019
"Albania" "Hungary"                3770876 2019
"Albania" "Iceland"                    103 2019
"Albania" "India"                  3752246 2019
"Albania" "Indonesia"              1287032 2019
"Albania" "Iran"                    169407 2019
"Albania" "Iraq"                     28564 2019
"Albania" "Ireland"                1270395 2019
"Albania" "Israel"                 5164563 2019
"Albania" "Italy"                101123701 2019
"Albania" "Japan"                  1325641 2019
"Albania" "Jordan"                   37307 2019
"Albania" "Kenya"                   106517 2019
"Albania" "Kuwait"                    5607 2019
"Albania" "Lithuania"               520594 2019
"Albania" "Luxembourg"               37569 2019
"Albania" "Malaysia"                779891 2019
"Albania" "Mauritius"                31204 2019
"Albania" "Mexico"                  625250 2019
"Albania" "Morocco"                 176254 2019
"Albania" "Netherlands, The"       4212582 2019
"Albania" "New Zealand"               1273 2019
"Albania" "Nigeria"                 101085 2019
"Albania" "Norway"                  648814 2019
"Albania" "Pakistan"               1018572 2019
"Albania" "Panama"                       0 2019
"Albania" "Paraguay"                 47577 2019
"Albania" "Peru"                    154590 2019
"Albania" "Philippines"              57933 2019
"Albania" "Poland, Rep. of"        6135012 2019
"Albania" "Portugal"                821642 2019
"Albania" "Qatar"                   231825 2019
"Albania" "Romania"                3265392 2019
"Albania" "Russian Federation"    10036688 2019
"Albania" "Saudi Arabia"             27167 2019
"Albania" "Serbia, Rep. of"       14200179 2019
"Albania" "Sierra Leone"              3814 2019
"Albania" "Singapore"                 6700 2019
"Albania" "Slovak Rep."             702780 2019
"Albania" "Slovenia, Rep. of"     10444300 2019
"Albania" "South Africa"            136819 2019
"Albania" "South Korea"            1617098 2019
"Albania" "Spain"                  4673846 2019
"Albania" "Sri Lanka"               265608 2019
"Albania" "Sweden"                  711592 2019
"Albania" "Switzerland"           14513361 2019
"Albania" "Taiwan"                  645289 2019
"Albania" "Thailand"                927471 2019
"Albania" "Tunisia"                1280895 2019
"Albania" "Turkey"                35528918 2019
"Albania" "Uganda"                   15271 2019
"Albania" "Ukraine"                4250674 2019
"Albania" "United Arab Emirates"     17129 2019
"Albania" "United Kingdom"         3322489 2019
"Albania" "United States"          8260812 2019
"Albania" "Venezuela"                10681 2019
"Albania" "Vietnam"                1865398 2019
"Algeria" "Albania"                   5720 2019
"Algeria" "Angola"                    9576 2019
"Algeria" "Argentina"             90107125 2019
"Algeria" "Australia"               190070 2019
"Algeria" "Austria"               27912946 2019
"Algeria" "Bahrain"                1844807 2019
"Algeria" "Bangladesh"             1456613 2019
"Algeria" "Belgium"               36570747 2019
"Algeria" "Brazil"                50796873 2019
"Algeria" "Bulgaria"               5689167 2019
"Algeria" "Cambodia"                590320 2019
"Algeria" "Cameroon"                670926 2019
"Algeria" "Canada"                18572061 2019
"Algeria" "Chile"                   380512 2019
"Algeria" "China"                518205282 2019
"Algeria" "Colombia"                163860 2019
"Algeria" "Costa Rica"               76043 2019
"Algeria" "Croatia, Rep. of"       4332027 2019
end
Dataset B

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str40 reporter str42 partner double import float year
"Albania" "Advanced Economies"                312497189 2019
"Albania" "Advanced Economies"                201028905 2019
"Albania" "Advanced Economies"                261178953 2019
"Albania" "Africa"                              2325741 2019
"Albania" "Africa"                              2782719 2019
"Albania" "Africa"                              4857234 2019
"Albania" "Algeria"                             1807390 2019
"Albania" "Algeria"                             1510581 2019
"Albania" "Algeria"                             1369995 2019
"Albania" "Antigua and Barbuda"                      62 2019
"Albania" "Antigua and Barbuda"                      74 2019
"Albania" "Antigua and Barbuda"                  447646 2019
"Albania" "Argentina"                           1027521 2019
"Albania" "Argentina"                            815640 2019
"Albania" "Argentina"                            858781 2019
"Albania" "Australia"                            282861 2019
"Albania" "Australia"                             40745 2019
"Albania" "Australia"                            338440 2019
"Albania" "Austria"                             4004822 2019
"Albania" "Austria"                             4791717 2019
"Albania" "Austria"                             3984711 2019
"Albania" "Azerbaijan, Rep. of"                    6999 2019
"Albania" "Bahrain, Kingdom of"                       . 2019
"Albania" "Bangladesh"                            23529 2019
"Albania" "Bangladesh"                           520849 2019
"Albania" "Bangladesh"                           623189 2019
"Albania" "Belarus, Rep. of"                      51160 2019
"Albania" "Belarus, Rep. of"                      16541 2019
"Albania" "Belarus, Rep. of"                      61212 2019
"Albania" "Belgium"                             4005557 2019
"Albania" "Belgium"                             3347765 2019
"Albania" "Belgium"                             3558505 2019
"Albania" "Bolivia"                                2260 2019
"Albania" "Bolivia"                                1222 2019
"Albania" "Bolivia"                                2703 2019
"Albania" "Bosnia and Herzegovina"              1878627 2019
"Albania" "Bosnia and Herzegovina"              2247752 2019
"Albania" "Bosnia and Herzegovina"              1622352 2019
"Albania" "Brazil"                              3207452 2019
"Albania" "Brazil"                              2352098 2019
"Albania" "Brazil"                              1965837 2019
"Albania" "Bulgaria"                            5100845 2019
"Albania" "Bulgaria"                            5279264 2019
"Albania" "Bulgaria"                            6103094 2019
"Albania" "Cambodia"                             250155 2019
"Albania" "Cambodia"                               2493 2019
"Albania" "Cambodia"                             299307 2019
"Albania" "Cameroon"                              53497 2019
"Albania" "Cameroon"                              44712 2019
"Albania" "Cameroon"                              41696 2019
"Albania" "Canada"                              5997194 2019
"Albania" "Canada"                              3317402 2019
"Albania" "Canada"                              2772619 2019
"Albania" "Chile"                                109403 2019
"Albania" "Chile"                                 70864 2019
"Albania" "Chile"                                130899 2019
"Albania" "China"                              50511440 2019
"Albania" "China"                              28918088 2019
"Albania" "China"                              42216460 2019
"Albania" "China, P.R.: Macao"                     3742 2019
"Albania" "China, P.R.: Macao"                     4478 2019
"Albania" "Colombia"                             261384 2019
"Albania" "Colombia"                             277623 2019
"Albania" "Colombia"                             218459 2019
"Albania" "Congo, Dem. Rep. of the"               51263 2019
"Albania" "Congo, Dem. Rep. of the"               61335 2019
"Albania" "Costa Rica"                            49926 2019
"Albania" "Costa Rica"                            59735 2019
"Albania" "Costa Rica"                           148829 2019
"Albania" "Croatia, Rep. of"                    5025324 2019
"Albania" "Croatia, Rep. of"                    6181352 2019
"Albania" "Croatia, Rep. of"                    5166251 2019
"Albania" "Cuba"                                    423 2019
"Albania" "Cuba"                                    506 2019
"Albania" "Cyprus"                                87233 2019
"Albania" "Cyprus"                               104373 2019
"Albania" "Cyprus"                               471599 2019
"Albania" "Czech Rep."                          4687524 2019
"Albania" "Czech Rep."                          3935415 2019
"Albania" "Czech Rep."                          3917740 2019
"Albania" "Côte d'Ivoire"                            . 2019
"Albania" "Côte d'Ivoire"                            . 2019
"Albania" "Côte d'Ivoire"                        11059 2019
"Albania" "Denmark"                              978386 2019
"Albania" "Denmark"                             1446774 2019
"Albania" "Denmark"                             1209185 2019
"Albania" "Dominican Rep."                        11869 2019
"Albania" "Dominican Rep."                        14201 2019
"Albania" "Dominican Rep."                         5096 2019
"Albania" "Ecuador"                             3735312 2019
"Albania" "Ecuador"                             2943260 2019
"Albania" "Ecuador"                             3521572 2019
"Albania" "Egypt"                                978088 2019
"Albania" "Egypt"                               2966472 2019
"Albania" "Egypt"                               1170269 2019
"Albania" "Emerging and Developing Asia"       61863154 2019
"Albania" "Emerging and Developing Asia"       51703998 2019
"Albania" "Emerging and Developing Asia"       36319937 2019
"Albania" "Emerging and Developing Economies" 162389109 2019
"Albania" "Emerging and Developing Economies" 222863859 2019
end



As you can see, there are observations in Dataset B that are not in Dataset A. I want to combine the two datasets such that only the countries present in Dataset A are kept and the rest of the observations are dropped. I have been doing this manually using the
Code:
drop if reporter == "" | partner ==""
command, but it is very time-consuming.
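
A hedged sketch of a non-manual route: build the list of reporter-partner pairs in Dataset A, then keep only the matching rows of Dataset B (file names are placeholders).

Code:
use datasetA, clear
keep reporter partner
duplicates drop
tempfile keys
save `keys'
use datasetB, clear
merge m:1 reporter partner using `keys', keep(match) nogenerate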

Replacing values from one observation to another

Dear Statalist users,

this may be a trivial question to you, but I am struggling a little with it.

My data are as this example

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float year str3 iso3 str36 cntry double high float share
2010 "CSK" "W"  682448769         .
2012 "CSK" "W" 1113816002         .
2010 ""    "A"          .   .736144
2012 ""    "A"          .  .7545093
2010 ""    "B"          . .26385596
2012 ""    "B"          .  .2454907
end
For countries A and B, I would like the variable high to take the value of country W in the same year multiplied by share, but only where high is missing.

So I should have something like

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float year str3 iso3 str36 cntry double high float share
2010 "CSK" "W"  682448769         .
2012 "CSK" "W" 1113816002         .
2010 ""    "A"          502380567   .736144
2012 ""    "A"         840384531,99  .7545093
2010 ""    "B"      180068175     .26385596
2012 ""    "B"          27338243,6  .2454907
end
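
A hedged sketch, assuming the multiplier is always the same-year value of country W:

Code:
* Copy W's -high- within each year, then fill the gaps as share * total
bysort year: egen double high_W = max(cond(cntry == "W", high, .))
replace high = share * high_W if missing(high)
drop high_W
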
Any help would be appreciated!
Thank you!!

Fixed effects regression is doing something I'm not noticing?

In Stata I fit a fixed-effects conditional logistic regression of a binary outcome (1==Overweight | 0==Not overweight) on a binary predictor (1==Unemployed | 0==Employed), with some controls, in a longitudinal panel of 3 waves.
Code:
  clogit kidsweight i.parentsunemployed i.urban_or_rural i.year i.parents_age_y i.Parents_Educa i.Parents_Marital, cluster (id) group(id) nolog
note: multiple positive outcomes within groups encountered.
note: 9,091 groups (23,274 obs) dropped because of all positive or
      all negative outcomes.

Conditional (fixed-effects) logistic regression

                                                Number of obs     =      5,532
                                                Wald chi2(12)     =     268.06
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -1892.4384               Pseudo R2         =     0.0603

                                                       (Std. Err. adjusted for 1,945 clusters in id)
----------------------------------------------------------------------------------------------------
                                   |               Robust
       kidsweight|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------------------+----------------------------------------------------------------
               1.parentsunemployed |   .2586795   .0991969     2.61   0.009     .0642571    .4531019
                      1.urban_or_rural |   .0284788   .1521921     0.19   0.852    -.2698122    .3267699
                                   |
                              year |
                                1  |    .331549   .0608113     5.45   0.000     .2123611    .4507368
                                2  |  -.5641933   .0786183    -7.18   0.000    -.7182823   -.4101043
                                   |
                       parents_age_y |
                            30-39  |   -.019373   .1321831    -0.15   0.883    -.2784471     .239701
                       40 or more  |   -.131816   .1917598    -0.69   0.492    -.5076582    .2440262
                                   |
               Parents_Educa |
Leaving Certificate to Non Degree  |   .3654921   .2249296     1.62   0.104    -.0753619     .806346
        Primary Degree or greater  |   .4395884   .2934593     1.50   0.134    -.1355812    1.014758
                                   |
                     Parents_Marital |
                                2  |   -.154054   .2966866    -0.52   0.604     -.735549    .4274409
                                3  |  -.4093562   .3844533    -1.06   0.287    -1.162871    .3441584
                                4  |  -.1921434   .1805024    -1.06   0.287    -.5459217    .1616349
                                5  |   .7150017   1.125252     0.64   0.525    -1.490451    2.920455
----------------------------------------------------------------------------------------------------

. margins, dydx(parentsunemployed) post

Average marginal effects                        Number of obs     =      5,532
Model VCE    : Robust

Expression   : Pr(kidsweight |fixed effect is 0), predict(pu0)
dy/dx w.r.t. : 1.parentsunemployed

-----------------------------------------------------------------------------------------------
                              |            Delta-method
                              |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
            1.parentsunemployed|   .0605013   .0229353     2.64   0.008     .0155489    .1054537
-----------------------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
I report the coefficient on the unemployment variable as the effect of unemployment on overweight. So, here I say that if your parent experienced unemployment at any point across the three waves of the study, your probability of being overweight was about 6 percentage points (0.06) higher.

I received a comment that I treat transitions from unemployment to employment on weight similarly to transitions from employment to unemployment on weight.

But, I only ever report the coefficient on parentsunemployed (0.06) and it's my understanding that due to how I set up my binary predictor and outcome I'm only ever considering the effect of a change from employment to unemployment on a change from not overweight to overweight.

So, why were changes from unemployment to employment even mentioned? Is it possible that I am considering this and don't even know it? And how?!

I could really do with some advice!

All the best,

John

Interpreting stset- output

Hello,
I use Stata 15.1 for survival analysis, using a Cox-model (stcox). My master dataset is the UCDP Peace Agreement Dataset V19.1, with three different merged replication datasets. I want to investigate the relationship between gender provisions in peace agreements and the duration of peace agreements.

While stsetting my data, I received the following output: [stset output attachment not shown]
How can I interpret the output?
What does the Probable Error mean?

Kind regards,
Theresa

Suest after fracreg

Hello all,

This seems like a simple question.

I wanted to compare coefficients from two models estimated with the fracreg command (fractional logit).

Code:
fracreg logit quality robots if industry==1
est store ind1
fracreg logit quality robots if industry==2
est store ind2
suest ind1 ind2

I get this error message:

"ind1 was estimated with a nonstandard vce (robust)"


I found that this is because fracreg by default uses vce(robust), while suest does not permit vce(robust), vce(jackknife), or vce(cluster) - the other vce options available with fracreg.
I was wondering if there is a way to run fracreg without the robust option, or an alternative way to compare the coefficients.
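
One hedged alternative that sidesteps -suest- entirely: pool the two industries and interact, then test the interaction term directly (robust standard errors are fine here).

Code:
fracreg logit quality c.robots##i.industry, vce(robust)
test 2.industry#c.robots    // H0: the robots effect is equal across industries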

Regards,

Joseph Bakker

reshaping complex panel data

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 _AIHWperiod int SA3 str20 _AIHWgeoname str61 _AIHWservice str11 _AIHWdemo str42 _AIHWname double _AIHWvalue
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Per cent of people who had the service (%)" 4.79
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Services per 100 people" 21.27
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Medicare benefits per 100 people ($)" 2058
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "No. of patients" 3467
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "No. of services" 15390
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Total Medicare benefits paid ($)" 1489193
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Total provider fees ($)" 1708162
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Per cent of people who had the service (%)" 1.39
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Services per 100 people" 6.81
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Medicare benefits per 100 people ($)" 840
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "No. of patients" 1004
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "No. of services" 4928
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Total Medicare benefits paid ($)" 607614
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Total provider fees ($)" 663148
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Per cent of people who had the service (%)" .63
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Services per 100 people" 2.5
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Medicare benefits per 100 people ($)" 192
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "No. of patients" 453
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "No. of services" 1811
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Total Medicare benefits paid ($)" 139200
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Total provider fees ($)" 162119
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Per cent of people who had the service (%)" 3.01
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Services per 100 people" 11.96
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Medicare benefits per 100 people ($)" 1026
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "No. of patients" 2175
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "No. of services" 8651
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Total Medicare benefits paid ($)" 742379
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Total provider fees ($)" 882895
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Per cent of people who had the service (%)" 31.93
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Services per 100 people" 43.3
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Medicare benefits per 100 people ($)" 2069
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "No. of patients" 23100
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "No. of services" 31329
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Total Medicare benefits paid ($)" 1496811
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Total provider fees ($)" 1539638
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Per cent of people who had the service (%)" 5.56
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Services per 100 people" 14.96
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Medicare benefits per 100 people ($)" 799
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "No. of patients" 4020
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "No. of services" 10824
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Total Medicare benefits paid ($)" 578384
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Total provider fees ($)" 631516
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Per cent of people who had the service (%)" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Services per 100 people" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Medicare benefits per 100 people ($)" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "No. of patients" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "No. of services" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Total Medicare benefits paid ($)" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Total provider fees ($)" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Per cent of people who had the service (%)" .77
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Services per 100 people" 1.19
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Medicare benefits per 100 people ($)" 63
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "No. of patients" 555
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "No. of services" 863
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Total Medicare benefits paid ($)" 45884
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Total provider fees ($)" 54623
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Per cent of people who had the service (%)" .03
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Services per 100 people" .07
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Medicare benefits per 100 people ($)" 5
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "No. of patients" 25
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "No. of services" 52
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Total Medicare benefits paid ($)" 3881
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Total provider fees ($)" 6485
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Per cent of people who had the service (%)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Services per 100 people" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Medicare benefits per 100 people ($)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "No. of patients" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "No. of services" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Total Medicare benefits paid ($)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Total provider fees ($)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Per cent of people who had the service (%)" 4.56
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Services per 100 people" 12.85
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Medicare benefits per 100 people ($)" 684
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "No. of patients" 3296
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "No. of services" 9297
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Total Medicare benefits paid ($)" 494529
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Total provider fees ($)" 528365
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Per cent of people who had the service (%)" .06
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Services per 100 people" .17
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Medicare benefits per 100 people ($)" 12
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "No. of patients" 41
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "No. of services" 124
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Total Medicare benefits paid ($)" 8507
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Total provider fees ($)" 15561
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "Per cent of people who had the service (%)" 5.08
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "Services per 100 people" 16.03
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "Medicare benefits per 100 people ($)" 843
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "No. of patients" 3678
end

Hi,
I am trying to eventually merge this dataset to my master dataset using the "SA3" variable. To do this, I am aware that I need to reshape this long dataset to wide so I can have 1 line per SA3 code. There are 6 SA3 codes in total, e.g.:
SA3 | Freq. Percent Cum.
------------+-----------------------------------
10104 | 1,232 16.67 16.67
10701 | 1,232 16.67 33.33
10703 | 1,232 16.67 50.00
10704 | 1,232 16.67 66.67
11401 | 1,232 16.67 83.33
11402 | 1,232 16.67 100.00
------------+-----------------------------------
Total | 7,392 100.00

As you can see, this dataset is complex: there are 7 variables in total, and within each SA3 area, over 2 time periods (2016-17, 2017-18), there are various services used, each also split by a demographic variable. Essentially I'm trying to get 1 line of all these variables per SA3. I have tried many approaches but keep getting the error below. Any help would be much appreciated!
Code:
reshape wide _AIHWvalue, i(SA3) j(_AIHWperiod) string
(note: j = 2016-17 2017-18)
values of variable _AIHWperiod not unique within SA3
    Your data are currently long. You are performing a reshape wide. You specified i(SA3) and j(_AIHWperiod). There are observations within
    i(SA3) with the same value of j(_AIHWperiod). In the long data, variables i() and j() together must uniquely identify the observations.
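
A hedged sketch of one way through, assuming each SA3 has exactly one row per period-service-demographic-measure combination: collapse those four descriptors into a single j() index so that i(SA3) and j() uniquely identify the observations.

Code:
egen measure = group(_AIHWperiod _AIHWservice _AIHWdemo _AIHWname)
keep SA3 measure _AIHWvalue
reshape wide _AIHWvalue, i(SA3) j(measure)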




Logit model for estimating demand (Berry 1994)

Hello everybody

I have the following exercise to estimate demand allowing for heterogeneous preference shocks:

Supermarkets market

Time (m) = 5
Area (i) = 18
Branches per area (h) = 199; adding the outside option, 200
3 different firms
distance= distance to the city center
employees= number of the employees in the branch


Utility function of the customers:

U(ihm) = β1·income(im) + β2·Firm1(hm) + β3·Firm2(hm) + β4·employees(hm) + β5·distance(ih) + ε(ihm)

As you can note, the utility function doesn't include price, because one assumption is that every supermarket sells at the same price.

I also have the variable AreaShare(ihm), which represents the market share of branch (h) in area (i) in period (m).
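
A hedged sketch of the Berry (1994) inversion for the simple logit, with hypothetical variable names: mean utility is identified as ln(share) minus ln(outside-option share), which can then be regressed (or IV-regressed) on the characteristics.

Code:
* s0: the outside option's share (branch 200) within each area-period
bysort Area Time: egen double s0 = max(cond(Branch == 200, AreaShare, .))
gen double delta = ln(AreaShare) - ln(s0)
regress delta income Firm1 Firm2 employees distance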



If someone have experience estimating that kind of models it would be really helpfull.

Thanks,
Manuel.

Tuesday, September 29, 2020

Deciding equation when analyzing by ppml

Hello
My data are strongly balanced panel data. As I want to know the effect of lpi on export and import, I have chosen export and import as dependent variables and lpi, gdp, distance, and a dummy as independent variables. The summary of my data is as follows:
Code:
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      export |      1,328     833.446    3024.465          0   41549.71
      import |      1,328     837.313    3961.817          0   58532.57
    distance |      1,328    8860.618    4304.137    478.553   19228.99
         gdp |      1,328     4523.65    16524.65    1.97454   196236.7
  landlocked |      1,328    .2108434    .4080611          0          1
-------------+---------------------------------------------------------
        lpi0 |      1,262    2.879012    .5787635   1.598322   4.225967
        lpi1 |      1,262    2.691696    .5987479   1.111111    4.20779
        lpi2 |      1,262    2.754088    .6829058   1.237654   4.439356
        lpi3 |      1,262    2.846396    .5248384   1.362654      4.235
        lpi4 |      1,262    2.828908     .608916   1.394253    4.31065
-------------+---------------------------------------------------------
        lpi5 |      1,262    2.886015    .6297591   1.513605   4.377678
        lpi6 |      1,262    3.253649    .5854234   1.665079   4.795714
Because the cases where the export or import value equals 0 account for about 18% of total obs, I decided to use PPML. My equation becomes:
export = a·ln(gdp) + b·ln(distance) + c·ln(lpi) + d·landlocked
But there are some missing data on lpi, because in some years in some specific countries LPI was not collected:
Code:
 
gen ll1=ln(lpi1)
(66 missing values generated)
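
A hedged sketch with the community-contributed ppml command (ssc install ppml): the dependent variable stays in levels (zeros included), the continuous regressors enter in logs, and the dummy enters as-is.

Code:
gen lgdp  = ln(gdp)
gen ldist = ln(distance)
gen llpi1 = ln(lpi1)
ppml export lgdp ldist llpi1 landlocked
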
So I wonder whether my equation is suitable; if not, what equation should I use?
Please give me advice.
Thanks so much

Non integer weights - problem

Hi.
I'm using weights from the European Social Survey, which has three different weights: design, population, and post-stratification weights. I don't know what kind of weights they are (fw, aw, or pw). I tried to graph (histogram) a variable (worry about climate change) using weights, but Stata said: "may not use noninteger frequency weights". I am aware this means my weights are not integers and that frequency weights must be, but the question is: can I transform my non-integer weights into integer ones?
I attach a brief document that explains the ESS database's weights. (http://www.europeansocialsurvey.org/...ing_data_1.pdf)
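
ESS weights are probability-type rather than frequency weights, so rather than rounding them to integers, a hedged workaround for descriptive graphs is analytic weights (the weight variable name here is my assumption of the usual ESS naming):

Code:
histogram worry [aweight = pspwght]   // pspwght: ESS post-stratification weight
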
Thank you very much
Gab

Mark Highest Recurring Observation

Dear All,

Hope you are well.

I want to generate a variable that marks the most frequently recurring violation type for each business.

There are cases where one business has multiple violation types with the same maximum repeat count, and I cannot determine how to deal with these.

For example, business id 27 has 4 observations and 2 violation types each repeating 2 times, i.e., one business with 2 violation types sharing the same maximum count.

Your help will be appreciated. business_id is the id of the business, violation_type is the type of violation the business committed, and violation_rpt is how many times the business repeated that violation.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long business_id byte violation_type float violation_rpt
26  1  2
26  1  2
26  4  1
26  7  3
26  7  3
26  7  3
26 13  5
26 13  5
26 13  5
26 13  5
26 13  5
27  7  2
27  7  2
27 13  2
27 13  2
28  1  1
28  3  1
28  4 13
28  4 13
28  4 13
28  4 13
28  4 13
28  4 13
28  4 13
28  4 13
28  4 13
28  4 13
28  4 13
28  4 13
28  4 13
28  9  1
28 11  1
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
28 13 18
29  4  4
29  4  4
29  4  4
29  4  4
29  7  1
29  9  6
29  9  6
29  9  6
29  9  6
29  9  6
29  9  6
29 11  1
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
29 13 16
30  1  1
30  3  1
30  4 12
30  4 12
30  4 12
30  4 12
30  4 12
30  4 12
30  4 12
30  4 12
30  4 12
30  4 12
30  4 12
30  4 12
30  7  3
30  7  3
30  7  3
30  9  1
30 10  2
30 10  2
30 11  5
30 11  5
end
label values violation_type violation_type
label def violation_type 1 "Adulteration", modify
label def violation_type 3 "Unhygienic Items", modify
label def violation_type 4 "Uncleanliness", modify
label def violation_type 7 "Overpricing", modify
label def violation_type 9 "Incorrect Weights & Measures", modify
label def violation_type 10 "Non availability of price list", modify
label def violation_type 11 "Violation of regulations", modify
label def violation_type 13 "No Violation", modify
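
A possible sketch with egen, using the variable names from the example above; ties, as for business 27, are all flagged:

Code:
* flag, for each business, the violation type(s) with the highest
* repetition count; tied types are all marked
bysort business_id: egen max_rpt = max(violation_rpt)
gen byte top_violation = violation_rpt == max_rpt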

How to make mother education variable for each observation in a large data

I have data in the following form, where the columns are the parent key (PARENT_KEY), the individual key (KEY_), the individual's line number in the roster (A1_1), completed education (education_completed), and the mother's line number in the roster (A10_1):

PARENT_KEY KEY_ A1_1 education_completed A10_1
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-1 1 higher .
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-2 2 secondary .
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-3 3 higher 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-4 4 higher 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-5 5 secondary 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-6 6 primary 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-7 7 none/pre-shool 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-8 8 none/pre-shool 2
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-1 1 primary .
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-2 2 none/pre-shool .
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-3 3 none/pre-shool 2
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-4 4 primary .
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-5 5 none/pre-shool 4
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-6 6 none/pre-shool 4
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-7 7 none/pre-shool 4
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-8 8 primary 2

Here A10_1 gives the line number on which an individual's real mother appears, with a dot value for those whose mother is not present in the roster. I want to create a mother's-education variable.
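
A possible sketch via a self-merge: build a lookup of each member's education keyed by household and line number, then attach it at the mother's line. Rows with A10_1 missing simply stay unmatched, so mother_education is missing for them:

Code:
* build a household-by-line-number lookup of education, then self-merge
preserve
keep PARENT_KEY A1_1 education_completed
rename (A1_1 education_completed) (A10_1 mother_education)
tempfile mothers
save `mothers'
restore
merge m:1 PARENT_KEY A10_1 using `mothers', keep(master match) nogenerate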

How to use a Cox regression model for time-to-event analysis using different years as control

Good day

Background: I am trying to conduct a retrospective epidemiological study looking at the incidence of a certain disease (on admission) in one year (2020) compared to another (2019).
In this case the year would be the exposure variable (2019 vs. 2020), the outcome being the incidence of disease.

Data: Retrospective admissions data for a pre-defined population covering the period 23/03/2020 to 01/08/2020 were collected. This totals 182 admissions, with 7 incidences of the disease on admission (1 readmission).
Data for the period 23/03/2019 to 01/08/2019 were also collected. This amounts to 218 admissions, with 17 incidences of the disease on admission (1 readmission).

Analysis plan: Use stset command for dataset and perform Cox regression, taking into account the reoccurrence (readmissions) of the disease and non-reoccurrence to perform a time-to-event comparison between 2020 and 2019 (confounders to be adjusted for).

Problem: These are the same time periods (starting 23/03 and ending 01/08) but in different years. Can I compare datasets using stset with different start dates (23/03/2020 vs. 23/03/2019)?

My (rather crude) solution was to simply use 23/03/2020 as the start date for both years (since the spans 23/03/2020-01/08/2020 and 23/03/2019-01/08/2019 are the same length: 131 days),
to create a new variable for year as the exposure (2020 vs. 2019), and to compare time-to-event this way.

Thank you kindly for your help.
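
A sketch of that alignment, with all variable names (year2020, end_date, disease) assumed, measuring analysis time as days since each cohort's own start date:

Code:
* align the two cohorts on days since their own period start
gen start_date   = cond(year2020 == 1, td(23mar2020), td(23mar2019))
gen days_at_risk = end_date - start_date
stset days_at_risk, failure(disease == 1)
stcox i.year2020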



Redirecting All Ado Paths to New Drive & Folder

At some unfortunate time, I named my partitioned drive with the letter "B". I am setting up a new computer, and IT requires me to use C: (without the partition), so I will have to move all of the B-drive data into a new folder on C. All my .do files will then point to an obsolete path when trying to call data, ado-files, etc. E.g., a file previously stored as "B:\project_a\data\dataset1" might now be "C:\db\project_a\data\dataset1".

Although some of my .do files start with a global directory declaration at the top of the file, which could be changed, many do not; instead, the files used for appending, merging, and running other .do files have the path hard-coded before the command.

E.g.,

Code:
cd "B:\project_a\data"
use dataset1, clear
or

Code:
cd "B:\project_a\data"
merge 1:1 id using dataset2
Respectively these would need to be changed to:

Code:
cd "C:\db\project_a\data"
use dataset1, clear
and

Code:
cd "C:\db\project_a\data"
merge 1:1 id using dataset2
Is there any hope of resolving this problem with some bulk find-and-replace method?

Thanks, in advance,

Ben
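
One hedged possibility is filefilter, which performs a find-and-replace on a file's contents; in its from() and to() options a backslash must be written as \BS. The folder and output file names below are assumptions:

Code:
* bulk-rewrite drive paths inside every .do file in a folder
local dofiles : dir "C:\db\project_a" files "*.do"
foreach f of local dofiles {
    filefilter "C:\db\project_a\`f'" "C:\db\project_a\fixed_`f'", ///
        from("B:\BS") to("C:\BSdb\BS")
}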

Time series operators not allowed?

Dear All,
I would like to create a 5 month *follow up period* after a specific variable takes a value of 1. My data looks as follows:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(id month) float followup
52   1 .
52   2 .
52   3 .
52   4 .
52   5 .
52   6 .
52   7 .
52   8 .
52   9 .
52  10 .
52  11 .
52  12 .
52  13 .
52  14 .
52  15 .
52  16 .
52  17 .
52  18 .
52  19 .
52  20 .
52  21 1
52  22 .
52  23 .
52  24 .
52  25 .
52  26 .
52  27 1
52  28 .
52  29 .
52  30 .
52  31 .
52  32 .
52  33 .
52  34 .
52  35 .
52  36 .
52  37 .
52  38 .
52  39 .
52  40 .
52  41 .
52  42 .
52  43 .
52  44 .
52  45 .
52  46 .
52  47 .
52  48 1
52  49 .
52  50 .
52  51 .
52  52 .
52  53 .
52  54 .
52  55 .
52  56 .
52  57 .
52  58 .
52  59 .
52  60 .
52  61 .
52  62 .
52  63 .
52  64 .
52  65 .
52  66 .
52  67 .
52  68 .
52  69 .
52  70 .
52  71 .
52  72 .
52  73 .
52  74 .
52  75 .
52  76 .
52  77 .
52  78 .
52  79 .
52  80 .
52  81 .
52  82 .
52  83 .
52  84 .
52  85 .
52  86 .
52  87 .
52  88 .
52  89 1
52  90 .
52  91 .
52  92 .
52  93 .
52  94 .
52  95 .
52  96 .
52  97 .
52  98 .
52  99 .
52 100 .
end
So each time followup==1, I want to set F1.followup through F5.followup to 1 as well. For instance, after followup==1 in month 21, I would like to replace followup in months 22-26 with 1. But I get an error when I try to do this:

Code:
 bysort id: replace F1.followup=1 if followup==1
factor variables and time-series operators not allowed
r(101);
I am not sure why the time-series operator is not working in this case, as it seems to work elsewhere. I will be grateful for your help.
Sincerely,
Sumedha.
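
For what it's worth, replace does not accept time-series operators on the left-hand side. A sketch that avoids them by carrying the month of the most recent event forward (it assumes a complete monthly panel, as in the example):

Code:
* carry the month of the latest event forward, then flag the next 5 months
bysort id (month): gen last_event = cond(followup == 1, month, .)
bysort id (month): replace last_event = last_event[_n-1] if missing(last_event)
replace followup = 1 if inrange(month - last_event, 1, 5)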

Dropping all companies with no observation at the start of the time period

Hi Statalist,

I have a database of many companies covering 7 years: 01-2013 to 01-2019. However, in order to calculate certain variables, I first need to be sure that the data for every company start in 01-2013, and for some companies the data start in 2015, for example. Could you please tell me the code to drop all companies whose data do not start in 01-2013?

Sidenote: I know deleting all companies that don't make it to the end would cause survivorship bias, but deleting all funds with no data at the beginning of the period will not do any harm, right?

Thank you in advance.

Best regards,
Tom Reinders
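
A sketch, assuming a monthly date variable mdate and an identifier company_id:

Code:
* keep only companies whose earliest observation is January 2013
bysort company_id (mdate): gen first_obs = mdate[1]
drop if first_obs != tm(2013m1)
drop first_obs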

Solve a system of equations

Hello everybody!
I would like to solve this system of equations in Stata (or maybe Mata? I have never used it):

99 - x = (4999 - y)*0.0198
99 - x = (2256.293 - z)*0.0438
x + y + z = 1491.293

The number of unknowns could also be higher, but the concept is always the same: 99 - x equals some constant minus another unknown, multiplied by something else, and all the unknowns sum to a given value.


I would really appreciate some help!

Thank you in advance
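
Since the system is linear in (x, y, z), one sketch is to rewrite it as Ax = b and let Mata's lusolve() do the work:

Code:
* rearranged: x - 0.0198*y = 99 - 4999*0.0198
*             x - 0.0438*z = 99 - 2256.293*0.0438
*             x + y + z    = 1491.293
mata:
A = (1, -0.0198,  0      \
     1,  0,      -0.0438 \
     1,  1,       1)
b = (99 - 4999*0.0198     \
     99 - 2256.293*0.0438 \
     1491.293)
lusolve(A, b)   // displays the column vector (x \ y \ z)
end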

I need help on merging two datasets.

Hello,
I am having trouble merging two datasets for my thesis. To reduce clutter, I have only included 4 different variables. 'gvkey' and 'fyear' are the identifiers for these datasets, and 'debt' and 'PrincipalAmtDbtOutstanding' are used to check whether the sets are merged correctly (they are roughly the same).
I would like the datasets to merge on gvkey and fyear. If a certain gvkey is missing observations in fyear, I would like Stata to create a missing value for either 'debt' or 'PrincipalAmtDbtOutstanding'.

As can be seen from the datasets below, there are more observations on gvkey and fyear in the first dataset than in the second.
I have tried a 1:1 merge, a 1:m merge, and an m:1 merge, but they all give the same error: "variables gvkey fyear do not uniquely identify observations in the master data" r(459).

Thanks in advance!
Kind regards,

Maks van Noort

gvkey fyear debt
001166 2014 0
001166 2015 0
001166 2016 0
001166 2017 0
001166 2018 0
008546 2014 4104
008546 2015 5760
008546 2016 5606
008546 2017 3697
008546 2018 3927
010846 2014 12372
010846 2015 14519
010846 2016 16410
010846 2017 24009
010846 2018 24483
013145 2014 6617
013145 2015 8630
013145 2016 8515
013145 2017 7331
013145 2018 7509
013556 2014 1576
013556 2015 1536.3
013556 2016 1545.1
013556 2017 1660.8
013556 2018 2729.7
013683 2014 56150
013683 2015 56735
013683 2016 56842
013683 2017 52594
013683 2018 52304
013932 2014 65.149
013932 2015 62.781
013932 2016 1875.368
013932 2017 1896.965
013932 2018 1919.5


and


gvkey fyear PrincipalAmtDbtOutstanding
001166 2016 0
001166 2017 0
008546 2014 4135
008546 2015 5796
008546 2016 5637
008546 2017
008546 2018
010846 2014
010846 2014
010846 2016
010846 2017
010846 2018
013145 2014 6617
013145 2015
013145 2016 8515
013145 2017 7331
013556 2014 1559.200000000000045
013556 2015
013556 2016
013556 2017 1660.799999999999955
013683 2014 56769
013683 2015 56734
013683 2016
013683 2017 52707
013932 2014
013932 2014
013932 2015
013932 2016 1939.70900000000006
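
A sketch for diagnosing the blocker: the second listing repeats some gvkey-fyear pairs (e.g. 010846 in 2014), and merge 1:1 requires both files to be unique on the key. The using filename is an assumption:

Code:
* locate the key duplicates that prevent a 1:1 merge
duplicates tag gvkey fyear, gen(dup)
list gvkey fyear if dup > 0
* after resolving the duplicates in both files:
merge 1:1 gvkey fyear using "principal_amounts.dta"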

Choosing between OLS , RE, FE

Hello
I am analyzing the effect of LPI on export and import trade (using panel data). When I ran OLS, the result was as follows:
Code:
.  reg lex lgdp dis ll0 landlocked, cluster(country1)
note: landlocked omitted because of collinearity

Linear regression                               Number of obs     =        156
                                                F(3, 19)          =     125.91
                                                Prob > F          =     0.0000
                                                R-squared         =     0.8903
                                                Root MSE          =     .57966

                              (Std. Err. adjusted for 20 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
         lex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   .8434597    .074269    11.36   0.000      .688013    .9989065
         dis |  -.7162724   .0859011    -8.34   0.000    -.8960654   -.5364794
         ll0 |   1.771018    .588909     3.01   0.007     .5384175    3.003619
  landlocked |          0  (omitted)
       _cons |   3.902149   1.193095     3.27   0.004     1.404974    6.399325
------------------------------------------------------------------------------
The result shows that LPI has an effect on exports, statistically significant at the 1% level.
But when I ran RE and FE, the results were very different from the OLS result: in both RE and FE, LPI showed no effect on exports. Moreover, the F-test in FE and the p-value of the Hausman test indicated that FE was the best choice.
Code:
. xtreg lex lgdp dis ll0 landlocked,fe
note: dis omitted because of collinearity
note: landlocked omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        156
Group variable: country1                        Number of groups  =         20

R-sq:                                           Obs per group:
     within  = 0.2828                                         min =          4
     between = 0.7417                                         avg =        7.8
     overall = 0.6698                                         max =          8

                                                F(2,134)          =      26.41
corr(u_i, Xb)  = -0.9032                        Prob > F          =     0.0000

------------------------------------------------------------------------------
         lex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   2.068337   .2848885     7.26   0.000     1.504877    2.631797
         dis |          0  (omitted)
         ll0 |   -.861858   1.238365    -0.70   0.488    -3.311128    1.587412
  landlocked |          0  (omitted)
       _cons |  -9.631277   2.850162    -3.38   0.001     -15.2684   -3.994153
-------------+----------------------------------------------------------------
     sigma_u |  2.2327644
     sigma_e |   .4243563
         rho |  .96513702   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(19, 134) = 27.17                    Prob > F = 0.0000
Code:
. . xtreg lex lgdp dis ll0 landlocked,re
note: landlocked omitted because of collinearity

Random-effects GLS regression                   Number of obs     =        156
Group variable: country1                        Number of groups  =         20

R-sq:                                           Obs per group:
     within  = 0.2616                                         min =          4
     between = 0.9389                                         avg =        7.8
     overall = 0.8841                                         max =          8

                                                Wald chi2(3)      =     278.22
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
         lex |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   .9661854   .0729139    13.25   0.000     .8232767    1.109094
         dis |  -.7608128   .1180357    -6.45   0.000    -.9921585    -.529467
         ll0 |   .7772783   .7426183     1.05   0.295    -.6782268    2.232783
  landlocked |          0  (omitted)
       _cons |   4.390683   1.316084     3.34   0.001     1.811206     6.97016
-------------+----------------------------------------------------------------
     sigma_u |  .43224225
     sigma_e |   .4243563
         rho |  .50920534   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Code:
. hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
        lgdp |    2.068337     .9661854        1.102151        .2753998
         ll0 |    -.861858     .7772783       -1.639136        .9909921
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       19.30
                Prob>chi2 =      0.0001
What method should I use? Can you give me some advice?
Thanks so much
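
One hedged observation: FE drops time-invariant regressors (hence dis and landlocked are omitted), so the models are not estimating the same set of coefficients. A Mundlak-style correlated random-effects sketch keeps the time-invariant variables while testing the FE-versus-RE assumption:

Code:
* Mundlak / correlated random effects: add group means of the time-varying
* regressors to RE; a joint test on the means mimics the Hausman comparison
bysort country1: egen mlgdp = mean(lgdp)
bysort country1: egen mll0  = mean(ll0)
xtreg lex lgdp ll0 mlgdp mll0 dis, re vce(cluster country1)
test mlgdp mll0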

summarizing in Stata

Hi, I'm a rookie Stata user and I am stuck at this point: I have an issue using the summarize command. I have a dataset of 1,156 observations, and I have encoded my data from string variables to numeric (long) variables. In the Data Editor I therefore now have the original column with string values (colored yellow) and a new column of generated numeric values (colored blue). From earlier experience, the data should be colored white (?).

The numeric variable, now called "nbitprice", was encoded with the following command, because the values were read as strings: -encode bitprice, gen(nbitprice)-

The problem is that when I run the -sum- command on nbitprice, I do not get the mean of the values in the observations, which range from 3,000 to 19,000. Instead I get something like the midpoint of the number of observations: 577.1427, with 1,156 observations. What I want is the mean of the values for each observation over time. I hope I am explaining myself well enough.

When I list the observations there are values for each observation.

From reading some earlier posts, you would probably like some info:

Code:
. describe nbitprice

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------
nbitprice       long    %9.0g      nbitprice  Bitprice

. count
  1,156

. summarize nbitprice, detail

                          Bitprice
-------------------------------------------------------------
      Percentiles      Smallest
 1%           12              1
 5%           58              2
10%          116              3       Obs               1,156
25%        288.5              4       Sum of Wgt.       1,156

50%        577.5                      Mean           577.1427
                        Largest      Std. Dev.       332.9293
75%        865.5           1150
90%         1038           1151       Variance       110841.9
95%         1096           1152       Skewness      -.0003597
99%         1142           1153       Kurtosis        1.79956


Can someone explain what I need to do to get the summary results I need? I would like the mean of the actual values of the 1,156 observations, the standard deviation, and the min and max values.

Thank you for your help in advance.
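
A hedged sketch: encode stores arbitrary category codes rather than the numeric magnitudes, so summarizing nbitprice averages the codes. destring recovers the actual values; the ignore(",") is an assumption about comma separators in the strings:

Code:
* convert the string prices to their numeric values instead of encoding them
destring bitprice, gen(bitprice_num) ignore(",")
summarize bitprice_num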



Comparing categorical variables over time in a randomised cluster trial

Good morning everyone, hope you're all well - and hope that you can help with some confusion.

I have a dataset where I am looking at a categorical variable describing monthly household income, and the dataset has two sources of clustering: over time, and from the randomisation procedure (cluster randomised trial, at health clinic level). The same people responded at baseline and at follow-up, and the dataset is in long format. I want to examine whether the stated household income changed between baseline and follow-up. I'm working my way through the 'Multilevel and Longitudinal Modeling Using Stata' manual and am having difficulty finding the relevant section and code that would account for the clustering over time and for the clustering at the clinic level as well. Can anyone suggest code for examining whether respondents' answers to the household-income category changed between baseline and endline?

My variables are as follows:

hhincomecat:
0 "0-2,000 rand"
1 "2,000-5,000 rand"
2 "5,000-50,000 rand"

time:
0 Baseline
1 Post-lockdown

Health clinic:
categories 1-12 with name of clinic


Happy to give further information. Thanks in advance.
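
A sketch of one possibility, with the person identifier pid assumed: an ordinal model with random intercepts for clinic and for person within clinic:

Code:
* ordinal outcome, two levels of clustering: clinic, and person within clinic
meologit hhincomecat i.time || clinic: || pid: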

Drop duplicate quarterly dates

Hi,

I am working with a dataset consisting of monthly observations of two variables, m1 and m3. I have now converted these monthly dates to quarterly dates using
Code:
 gen tq=qofd(dofm(mdate))
I then used
Code:
 bys tq : egen m_m3 = mean(m3)
and
Code:
 bys tq : egen m_m1 = mean(m1)
to generate quarterly means of m1 and m3.

Now I am left with this
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(m1 m3) float(mdate tq m_m3 m_m1)
675257 1010626 469 156 1017618.7 675961.3
673473 1013138 470 156 1017618.7 675961.3
671683 1027389 471 157 1052953.6 685409.3
686539 1061767 472 157 1052953.6 685409.3
698006 1069705 473 157 1052953.6 685409.3
689486 1063959 474 158 1081558.4   700371
698217 1088696 475 158 1081558.4   700371
713410 1092020 476 158 1081558.4   700371
733974 1117443 477 159   1113815   741235
737779 1119119 478 159   1113815   741235
751952 1104883 479 159   1113815   741235
end
format %tm mdate
format %tq tq
And the only thing left to do is to drop the duplicate quarterly observations, together with the corresponding values for those quarters. That is, I want to drop observations 2, 3, 5, 6, 8, 9, ..., 305, 306.

I have been experimenting with
Code:
 list tq if mod(_n,2)
but I only manage to list every 2nd, 3rd, etc., observation.
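
A sketch: the egen means are constant within tq, so keeping the first observation of each quarter suffices:

Code:
* keep one row per quarter; m_m1 and m_m3 are identical within tq
bysort tq (mdate): keep if _n == 1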

putexcel command - error

Hi all

I have a question about the putexcel command.
When I run the code, sometimes I get the following error message:


file C:\...\reports.xls could not be saved
r(603)


Now, my code is something like the following:

Code:
foreach ctr in `countries' {
    foreach yrs in `years' {
        putexcel set "${path_reports}\report_`ctr'_`yrs'.xls", sheet("reports_`ctr'_`yrs'") replace
        putexcel A1 = ("title A1")
        [and so on and so forth]
    }
}

The strange thing is that the code sometimes works without problems, and sometimes it stops with the above-mentioned error message.
What am I doing wrong?

Thank you all in advance!!
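
For context, r(603) on saving often indicates that the workbook is locked by another program (for example, open in Excel or held by a sync client). A hedged sketch of a single retry inside the loop:

Code:
* retry once after a pause if the file is briefly locked (a guess at the cause)
capture noisily putexcel set "${path_reports}\report_`ctr'_`yrs'.xls", sheet("reports_`ctr'_`yrs'") replace
if _rc == 603 {
    sleep 2000
    putexcel set "${path_reports}\report_`ctr'_`yrs'.xls", sheet("reports_`ctr'_`yrs'") replace
}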

Cumulative event duration with repeated events as a function of follow up time

Dear STATAlist

I have a dataset with a starting date (different for each id) and different events which can occur repeatedly. What I am interested in is the cumulative duration of each event as a function of follow-up time. In the end, this would result in a graph which shows follow-up time on the x-axis and the cumulative duration of each type of event (in this case, being hospitalized) over the entire population on the y-axis.

Example based on code below

For event1, ID 8 has the first occurrence of the event 16 days after the start of follow-up, with only one day of duration (start and end on the same day). So up to day 15 the cumulative event duration for the entire population would be zero, and after this, 1. This would remain 1 until, 27 days after follow-up start, ID 5 experiences an event with a duration of 2 days. So at t = 27 the cumulative event duration would be 2, at t = 28 it would be 3, and so on.

Here I present 2 types of events (event1_x and event2_x); there are more. I would like to calculate and visualize cumulative event durations at specific time points (365 days, 730 days, etc.) and present this in a graph (a) separately for each event and (b) cumulatively over multiple events.

Thank you

Kevin Damman

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double id long(date_fu_start date_start_event1_1 date_end_event1_1 date_start_event1_2 date_end_event1_2 date_start_event2_1 date_end_event2_1 date_start_event2_2 date_end_event2_2)
 1 21024 21722 21728 21954 21957 22057 22064     .     .
 2 19323     .     .     .     .     .     .     .     .
 3 19340     . 19340 20927 20927     .     .     .     .
 4 19558     .     .     .     . 19649 19668 19866 19870
 5 19852 19879 19880 20231 20235     .     .     .     .
 6 19890     .     .     .     .     .     .     .     .
 7 20303 20509 20509     .     . 20328 20359 20425 20425
 8 20493 20509 20509     .     .     .     .     .     .
 9 20521     .     .     .     . 21051 21115     .     .
10 21767     .     .     .     .     .     .     .     .
end
format %tdD_m_Y date_fu_start
format %tdD_m_Y date_start_event1_1
format %tdD_m_Y date_end_event1_1
format %tdD_m_Y date_start_event1_2
format %tdD_m_Y date_end_event1_2
format %tdD_m_Y date_start_event2_1
format %tdD_m_Y date_end_event2_1
format %tdD_m_Y date_start_event2_2
format %tdD_m_Y date_end_event2_2
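
A sketch of one way in, using the stub names from the dataex above: reshape the episodes long, compute per-episode durations (counting start and end days inclusively, so a same-day stay is 1 day) and onset days, then cumulate over the sorted onset days:

Code:
* one row per (id, episode), shown here for event1 only
reshape long date_start_event1_ date_end_event1_ date_start_event2_ date_end_event2_, i(id) j(episode)
gen dur1 = date_end_event1_ - date_start_event1_ + 1
gen day1 = date_start_event1_ - date_fu_start      // follow-up day of onset
sort day1
gen cum_dur1 = sum(dur1)   // cumulative event-1 duration over the population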

Monday, September 28, 2020

Time specification


Dear colleagues,

I am working with 15 years of repeated cross-sectional data. I was wondering whether it is an appropriate strategy to include all interaction terms with higher-order time:

Code:
reg Y X1##c.T X1##c.Tsq X1##c.Tcub Xk##c.T Xk##c.Tsq Xk##c.Tcub ...

Some previous studies simply used a linear specification, while others also include cubic terms. Suppose that we have 10 IVs and I am particularly interested in the influence of the X1 variable on Y over time. If we control for time-varying effects of all other covariates, including cubic terms, isn't that over-control? What is the recommended strategy for time specification? If you know a good reference, please share it. I'd appreciate it.
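
As a notational aside, a sketch of the same cubic specification written with factor variables, so the powers need not be pre-computed (X1 is assumed continuous):

Code:
* cubic time trend plus its interactions with X1, via factor-variable syntax
reg Y c.X1##(c.T c.T#c.T c.T#c.T#c.T)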






Fillin/expand with panel and different dates

Hi All, please, could someone help me?
I have the data below:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id x1 x2) str10 date
1 1 10 "24/09/2013"
1 1 12 "25/09/2013"
2 1 13 "24/09/2013"
2 1 15 "25/09/2013"
3 2 12 "05/10/2014"
3 2 17 "06/10/2014"
4 3 10 "05/10/2014"
4 3  9 "06/10/2014"
5 3  8 "05/10/2015"
5 3 12 "06/10/2015"
end

I need to create missing values for each id: 3 days before its first date and 3 days after its last date. How can I do that, please? Each id would then have 2 observations with x1 non-missing and 6 observations with x1 missing.

Many thanks. :-)
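
A sketch: convert the string dates, build six padding rows per id (three days before the earliest date and three after the latest), and append them; x1 and x2 are automatically missing in the new rows:

Code:
* pad each id with 3 days on either side of its observed date range
gen ddate = daily(date, "DMY")
format %td ddate
preserve
collapse (min) first=ddate (max) last=ddate, by(id)
expand 6
bysort id: gen ddate = cond(_n <= 3, first - _n, last + _n - 3)
keep id ddate
tempfile pad
save `pad'
restore
append using `pad'
sort id ddate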

Inclusion of both Age and Time Indicators in Panel Data Analysis

Dear Colleagues,

I am analyzing panel data that collect information on children every two years since 2010, so the panel has a total of five waves (2010, 12, 14, 16, 18). I want to know the association of family structure (especially parental absence) with a child's mental health (whether children aged 10 or older are depressed or not). I am using the xtlogit command with either the re or fe option to run random- and fixed-effects models.
Besides the family-structure variables, I have included the child's age and its quadratic term as independent variables, and both are significant.
However, when I include the panel-wave indicator dummies (cfps_wave: panel year indicators 2012, 2014, 2016, 2018), the coefficients and significance levels for age and the family-structure variables change considerably.

Code:
. xtlogit depress2cat ib1.race_han_x c.age_self_x##c.age_self_x ib3.tz_4cat i.region3cat, fe nolog or
Code:
-------------------------------------------------------------------------------------------
              depress2cat | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------------+---------------------------------------------------------------
                race_han_x |
        minority ethnicity |          1  (omitted)
                age_self_x |   .6243016   .0433709    -6.78   0.000     .5448294    .7153661
 c.age_self_x#c.age_self_x |   1.011407   .0022313     5.14   0.000     1.007043    1.015789
                   tz_4cat |
         no parent at home |   1.187886   .1047134     1.95   0.051      .999403    1.411917
         only mama at home |   1.022447   .1331064     0.17   0.865     .7921873    1.319635
         only baba at home |    .890598   .1651428    -0.62   0.532     .6192189    1.280912
                region3cat |
                   Central |   .5709849   .2092956    -1.53   0.126     .2783652    1.171208
               West Region |   1.141726   .3581595     0.42   0.673     .6173623    2.111465
-------------------------------------------------------------------------------------------
Code:
-------------------------------------------------------------------------------------------
              depress2cat | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------------+---------------------------------------------------------------
                race_han_x |
        minority ethnicity |          1  (omitted)
                age_self_x |   .7124481   .0749088    -3.22   0.001     .5797696    .8754895
 c.age_self_x#c.age_self_x |   1.010447   .0022359     4.70   0.000     1.006074    1.014839
                   tz_4cat |
         no parent at home |    1.06177    .142238     0.45   0.655      .816584    1.380575
         only mama at home |   1.028602   .1349585     0.21   0.830     .7953619     1.33024
         only baba at home |   .8869098   .1651149    -0.64   0.519     .6157611    1.277458
                 cfps_wave |
                      2012 |    .552281   .0923899    -3.55   0.000     .3978913    .7665769
                      2014 |   .4985043   .1642456    -2.11   0.035     .2613471    .9508676
                      2016 |   .4897406    .244647    -1.43   0.153     .1839727    1.303704
                      2018 |   .4146053   .2762311    -1.32   0.186     .1123365    1.530202
                region3cat |
                   Central |   .5780685   .2126524    -1.49   0.136     .2810931    1.188799
               West Region |   1.178924   .3724338     0.52   0.602     .6347208    2.189721
-------------------------------------------------------------------------------------------
My question is: should I include the panel-year indicators in the model or not? I know that including a time trend is important: as children get older, their mental-health state will change. But since I have already included age, do I also need to include the survey-year indicators? The effect of the survey-year indicators may then be spurious. I would be glad if you could give me some advice or references on this issue.
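
A small sketch of one diagnostic: refit the fuller model and jointly test the wave indicators (command adapted from the one above):

Code:
* joint test of the survey-wave dummies
xtlogit depress2cat ib1.race_han_x c.age_self_x##c.age_self_x ib3.tz_4cat ///
    i.cfps_wave i.region3cat, fe nolog or
testparm i.cfps_wave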