Hi all,
I am fitting a Cox model with shared frailty and I hope to get the (pseudo) R squared for my model. Here is my code:
stcox x1 x2 x3 x4, shared(ID)
display e(r2_p)
Nothing comes out.
However, if I remove the shared() option, I do get the (pseudo) R-squared:
stcox x1 x2 x3 x4
display e(r2_p)
.05290264
Can anyone help with this problem? Is it even possible to get the (pseudo) R squared in a Cox model with shared frailty?
Thanks!
Jasmine
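For what it's worth, if the frailty model stores the overall and null log likelihoods (check -ereturn list- after estimation; whether e(ll0) is saved with shared() is an assumption to verify), McFadden's pseudo R-squared can be computed by hand:
Code:
stcox x1 x2 x3 x4, shared(ID)
ereturn list                  // confirm e(ll) and e(ll0) are both saved
display 1 - e(ll)/e(ll0)      // McFadden's pseudo R2 = 1 - ll(model)/ll(null)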
Wednesday, September 30, 2020
Tabulation of percentages for an outcome variable by gender for each racial group
Hi,
I would like some help with a tabulation issue I am having. I have about 10 causes of death (COD), and I would like a table of the percentages by gender for each race. For example, with 2 races the table would contain 5 columns: the first column lists the causes of death; the 2nd gives the percentage for each cause of death among males of race 1; the 3rd, among females of race 1; the 4th, among males of race 2; and the 5th, among females of race 2.
Thank you
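One sketch that gets close, assuming variables named cod, gender, and race (the names are my assumption from the description): column percentages of a cod-by-gender table, run within each race group.
Code:
bysort race: tabulate cod gender, column nofreq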
How to add multiple regression lines to a marginsplot graph?
Hello,
I would like to use marginsplot to show many regressions on the same graph. My dependent variable is a scale (0, .5, 1, 1.5, 2, 2.5, 3). I would like to graph separate race/sex pairs. I can show the probability of being at each point in the scale for one pair by using if statements:
Code:
ologit chinese_scale age educ ib2.pid if black==1 & woman==1
margins
marginsplot, noci
This code produces a graph with the scale on the x-axis. However, when I try the following code, I can no longer see the scale on the x-axis:
Code:
ologit chinese_scale age educ ib2.pid i.birace i.woman
margins woman#birace
marginsplot, noci
[graph not shown: the scale values no longer appear on the x-axis]
I would greatly appreciate any help or thoughts you can provide.
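One possible direction, sketched here and untested on these data: request each outcome level explicitly in margins, then put the prediction index on the x-axis with marginsplot's xdimension() option (see -help marginsplot-).
Code:
ologit chinese_scale age educ ib2.pid i.birace i.woman
margins woman#birace, predict(outcome(#1)) predict(outcome(#2)) predict(outcome(#3)) ///
    predict(outcome(#4)) predict(outcome(#5)) predict(outcome(#6)) predict(outcome(#7))
marginsplot, xdimension(_predict) noci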
Crosstabulation Question: Options row vs. col
I know this is a really basic question, but the logic confuses me every time I do crosstabs - even within a few weeks of the last time I did it. Can anyone recommend a trick, like a mnemonic device or a pattern, that helps with knowing:
1) when to use, e.g., tab var1 var2, row (instead of tab var1 var2, col)
2) how to interpret the results (of, e.g., tab var1 var2, row or tab var1 var2, col)
I swear I'm not stupid; I just really get my logic inverted when I look at these tables. If anyone has any advice on how to keep track of when to use & how to interpret row vs column crosstabs, it would be a huge help.
Thank you,
Tatiana
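One way to fix the pattern in your head is to run both versions on a dataset you know. With row, the percentages in each row sum to 100, so each cell answers "among this row's category, how is the column variable distributed?"; col is the mirror image. A quick sketch with the auto data:
Code:
sysuse auto, clear
tab foreign rep78, row nofreq   // distribution of rep78 within foreign/domestic
tab foreign rep78, col nofreq   // distribution of foreign/domestic within rep78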
Which is the correct approach to coding a dummy variable?
Hi Statalist.
I want to generate a dummy variable from a categorical variable with values ranging 0-10, where 0-2 is nil to low and 3-10 is mid to high. Note that I have two such variables, one relating to responses by husbands and the other by wives (relimp1 = importance for husband, relimp2 = importance for wife). My first attempt was:
Code:
gen byte imp2 = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) & relimp1 < . & relimp2 < .
However, as you can see in the data example below, "0" was given when relimp1 or relimp2 were missing, so I tried:
Code:
gen byte imp4 = 1 if inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) & relimp1 < . & relimp2 < .
replace imp4 = 0 if (relimp12 == 1 & relimp22 == 1) | (relimp12 == 1 & inlist(relimp22, 2, 3)) | (inlist(relimp12, 2, 3) & relimp22 == 1)
which provided "1" when true, "0" when false, and "." when missing, which is what I thought I should get. Based on my reading of https://www.stata.com/support/faqs/d...rue-and-false/ I thought the first piece of code would have given me this outcome too.
Given that the first piece of code produces considerably more "0"s than the second, I believe I should go with the second piece of code (imp4). Am I reading too much into this? Help is appreciated.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave relimp1 relimp2 imp2 imp4)
106 1002 10  .  . 0 .
106 1002 11  .  . 0 .
106 1002 12  .  . 0 .
106 1002 13  .  . 0 .
106 1002 14  0  0 0 0
106 1002 15  .  . 0 .
106 1002 16  .  . 0 .
106 1002 17  .  . 0 .
106 1002 18  0  0 0 0
108  109  1  .  . 0 .
108  109  2  .  . 0 .
108  109  3  .  . 0 .
108  109  4  5  6 1 1
108  109  5  .  . 0 .
108  109  6  .  . 0 .
108  109  7  .  5 0 .
103  104  1  .  . 0 .
103  104  2  .  . 0 .
103  104  3  .  . 0 .
103  104  4 10 10 1 1
103  104  5  .  . 0 .
103  104  6  .  . 0 .
103  104  7 10 10 1 1
103  104  8  .  . 0 .
103  104  9  .  . 0 .
103  104 10 10 10 1 1
103  104 11  .  . 0 .
103  104 12  .  . 0 .
103  104 13  .  . 0 .
103  104 14 10 10 1 1
103  104 15  .  . 0 .
103  104 16  .  . 0 .
103  104 17  .  . 0 .
103  104 18 10 10 1 1
end
Also, am I correct in my understanding that:
Code:
! missing(relimp1, relimp2) is the same as relimp1 < . & relimp2 < .
Stata 15.1
Note: this was originally posted at https://www.statalist.org/forums/for...=1601514760045, though reposted as the nature of the question differs from that thread.
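For what it's worth, a common one-line idiom that returns 1/0 with missing preserved (a sketch assuming the same relimp1/relimp2 coding as above):
Code:
gen byte imp = inrange(relimp1, 3, 10) & inrange(relimp2, 3, 10) if !missing(relimp1, relimp2)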
Knots in Non parametric series regression
Dear All,
I am Maheswaran Kesavan, doing a master's at University College London.
I am fitting a nonparametric series regression using a B-spline basis.
I want to know:
1) Where the knots lie in my data.
2) How to place a knot at a specific point of my choice.
3) How to plot the fitted curve from a nonparametric series regression (see the sketch below).
4) The minimum number of data points one can use in nonparametric regression models (mine is 100).
Thank you in advance
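On point 3, one sketch for plotting the fitted curve (y and x are placeholder names, and the grid in at() is arbitrary; see -help npregress series- for the basis and knot options):
Code:
npregress series y x
margins, at(x = (0(5)100))   // predictions over a grid of x values
marginsplot                  // plot the estimated mean function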
Predict based on regression model
Hi,
I estimated the following regression model:
Code:
reg lnChild lnCash lnWhite lnCash*lnWhite
Based on this regression, I want to predict the outcome using the mean values of the independent variables. For instance, I want to plug in the average of Cash to see how the prediction differs from the actual number of children.
Do you have any ideas how to do this?
Thank you in advance!
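A sketch of one route via -margins-. Note that margins can only recognize the interaction if the model uses factor-variable notation (c.lnCash##c.lnWhite); lnCash*lnWhite is not interaction syntax in Stata.
Code:
reg lnChild c.lnCash##c.lnWhite
margins, atmeans             // predicted lnChild at the means of all covariates
margins, at((mean) lnCash)   // mean lnCash, other covariates as observed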
ICC (Intra-class correlation coefficient) vs "9% of this variation in mortality was attributable solely to the surgeon."
I am trying to find out the relationship between a. the ICC for surgeons and the b. the variation due to surgeons.
In Udyavar, 2018 ("The impact of individual physicians on outcomes after trauma: is it the system or the surgeon?"), both the ICC ("Surgeons with higher mortality rates were not clustered at specific hospitals, as the intraclass correlation for surgeon-level mortality rates was 0.02") and the quote in the subject above were given. I am doing a systematic review and wonder how the ICC and this particular variation measure are related (the relationship is not that the ICC is the square of the variation).
The reason is that some papers list the variation due to the surgeon while other papers show the ICC (Intra-class correlation coefficient).
(The ICC is important as even a small ICC can have a substantial design effect - if you cluster a randomized controlled trial by practitioner - surgeon for example - you will need substantially more patients to gain sufficient statistical power than if the ICC is nil).
Does anybody know how the ICC for a practitioner and the variation due to a practitioner are related?
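For reference, in a two-level variance-components setup the two quantities are linked directly; in standard notation (my sketch, not taken from the paper):

\mathrm{ICC} = \frac{\sigma^2_{\text{surgeon}}}{\sigma^2_{\text{surgeon}} + \sigma^2_{\text{residual}}}

That is, the ICC is the share of total outcome variance attributable to the surgeon level. One caution: for multilevel logistic models the residual variance is often taken as \pi^2/3 on the latent scale, so a "percentage of variation due to the surgeon" computed on another scale need not match the reported ICC.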
Power analysis for modified Poisson regression
Hello,
I am trying to determine the power of one of my analyses. The outcome is binary, and the relative risk is the effect estimate, determined using a modified Poisson regression with robust error variance. The exposure variable is categorical and has 5 groups (placebo plus 4 levels of treatment). I am struggling with how to determine the power, or which options to choose in Stata, given the 5 levels of exposure. I was wondering if anyone could provide some assistance on how to proceed with this power analysis in Stata? Thanks!
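There is no canned power routine for a five-arm modified Poisson design, but one pragmatic sketch is to power each pairwise placebo-vs-arm comparison of proportions, with a multiplicity-adjusted alpha for the 4 comparisons (all numbers below are placeholders to replace with your own):
Code:
power twoproportions 0.20 0.10, n(500) alpha(0.0125)   // Bonferroni: .05/4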
"Not sorted"
Dear All,
I have individual-level panel data that includes spells, with a 6-month *follow up* period after each spell, as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(id month) byte spell float spell_followup double(base_y followup_y) float followupmonth
52 1 0 . . . .
52 2 0 . . . .
52 3 0 . . . .
52 4 0 . . . .
52 5 0 . . . .
52 6 0 . . . .
52 7 0 . . . .
52 8 0 . . . .
52 9 0 . . . .
52 10 0 . . . .
52 11 0 . . . .
52 12 0 . . . .
52 13 0 . . . .
52 14 0 . . . .
52 15 0 . . . .
52 16 0 . . . .
52 17 1 . . . .
52 18 1 . . . .
52 19 1 . . . .
52 20 1 . 30.35714340209961 . .
52 21 0 1 . 21.25 1
52 22 2 1 . 21.25 2
52 23 2 1 . 21.25 3
52 24 2 1 . 21.25 4
52 25 2 1 . 21.25 5
52 26 2 1 21.25 21.25 6
52 27 0 2 . 0 1
52 28 0 2 . 32.04545593261719 2
52 29 0 2 . 69.54545593261719 3
52 30 0 2 . 75 4
52 31 0 2 . 37.5 5
52 32 0 2 . 0 6
52 33 0 . . . .
52 34 0 . . . .
52 35 0 . . . .
52 36 0 . . . .
52 37 0 . . . .
52 38 0 . . . .
52 39 0 . . . .
52 40 0 . . . .
52 41 0 . . . .
52 42 0 . . . .
52 43 0 . . . .
52 44 0 . . . .
52 45 0 . . . .
52 46 0 . . . .
52 47 0 . . . .
52 48 0 . . . .
52 49 0 . . . .
52 50 0 . . . .
52 51 0 . . . .
52 52 0 . . . .
52 53 0 . . . .
52 54 0 . . . .
52 55 0 . . . .
52 56 0 . . . .
52 57 0 . . . .
52 58 0 . . . .
52 59 0 . . . .
52 60 0 . . . .
52 61 0 . . . .
52 62 0 . . . .
52 63 0 . . . .
52 64 0 . . . .
52 65 0 . . . .
52 66 0 . . . .
52 67 0 . . . .
52 68 0 . . . .
52 69 0 . . . .
52 70 0 . . . .
52 71 0 . . . .
52 72 0 . . . .
52 73 0 . . . .
52 74 0 . . . .
52 75 0 . . . .
52 76 0 . . . .
52 77 0 . . . .
52 78 0 . . . .
52 79 0 . . . .
52 80 0 . . . .
52 81 0 . . . .
52 82 0 . . . .
52 83 0 . . . .
52 84 0 . . . .
52 85 0 . . . .
52 86 3 . . . .
52 87 3 . . . .
52 88 3 . 23.25 . .
52 89 0 3 . 10.576614379882813 1
52 90 0 3 . 5.2016143798828125 2
52 91 0 3 . 0 3
52 92 0 3 . 10 4
52 93 0 3 . 27.375 5
52 94 0 3 . 34.75 6
52 95 0 . . . .
52 96 0 . . . .
52 97 0 . . . .
52 98 0 . . . .
52 99 0 . . . .
52 100 0 . . . .
end
In each 6-month follow-up period, I want to check whether the deviation from base_y is greater than 15/30/40%. The code I was trying to run is:
Code:
tsset id month
bysort id spell_followup (followupmonth): gen var15 = ((l1.base_y - followup_y)/l1.base_y > .15) if followupmonth==1
bysort id spell_followup (followupmonth): gen var30 = ((l1.base_y - followup_y)/l1.base_y > .30) if followupmonth==1
bysort id spell_followup (followupmonth): gen var50 = ((l1.base_y - followup_y)/l1.base_y > .50) if followupmonth==1
But I get an error "not sorted".
Code:
. bysort id spell_followup (followupmonth): gen var15 = ((l1.base_y - followup_y)/l1.base_y > .15) if followupmonth==1
not sorted
r(5);

end of do-file
r(5);

I did try to tsset the data and use bysort, but I am clearly messing up somewhere, and the error doesn't give much detail. I will be grateful for your help.
Sincerely,
Sumedha.
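For what it's worth, one sketch that sidesteps the error: the l1. operator requires the data to be sorted as tsset, which conflicts with the different sort order that bysort id spell_followup (followupmonth) imposes. Explicit subscripts reproduce the same one-month lag without that requirement (this assumes, as in the dataex, that the row holding base_y immediately precedes the first follow-up month):
Code:
sort id month
by id: gen byte var15 = (base_y[_n-1] - followup_y)/base_y[_n-1] > .15 if followupmonth == 1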
initial values not feasible melogit
Hello -- I am running -melogit- on ~330,000 persons nested in ~1,900 neighborhoods in 47 countries. For my models, I keep getting "initial values not feasible". Below are the recommendations I have tried from other forums. I also randomly selected 50% of my sample, and the model ran then; however, this is not really a feasible solution. Any help is greatly appreciated!
Code:
melogit YNipv urban femalemore malemore working Zsurveyyr [pw=dvwgt] || country: || newid: , or nolog

logit YNipv zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt]
mat a = e(b)
mat a1 = (a, 0)
melogit YNipv zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt] || country: ///
    || newid: , or from(B) intmethod(laplace)

melogit YNipv zfemeduc zage working Zsurveyyr childtot urban || country: || newid: , noestimate
matrix define B = e(b)
matrix define B[1,4] = 1e-8
matrix b1 = e(b)
melogit YNipv zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt] || country: ///
    || newid: , or from(B, skip)

melogit YNipvemo zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt] || country: ///
    || newid: , or startgrid(2)
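One further thing that is sometimes worth trying (a sketch; I believe -melogit- accepts a startvalues() maximization option with a fixedonly method, but please verify against -help melogit-):
Code:
melogit YNipv zfemeduc zage working Zsurveyyr childtot urban [pw=dvwgt] ///
    || country: || newid: , or startvalues(fixedonly) intmethod(laplace)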
Comparing two datasets by two variables
Hi, I am very new to Stata. I have variables x1, y1, z in data1.dta and x2, y2, N in data2.dta.
I am trying to run an analysis where:
- Step 1: x1 and x2 are matched (merged) first.
- Step 2: within the matched result, y1 and y2 are matched (merged).
- Expected result: the data where y1 and y2 have finally matched, so I can see z and N where x1 == x2 and, within that, y1 == y2.
Data within x1, y1, x2, y2 aren't unique, which is why I can't simply merge or append the datasets.
I was hoping to run this process in a loop.
Thanks in advance.
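A sketch using -joinby-, which forms every pairwise combination of matching rows and so handles keys that are not unique on either side (the filenames are hypothetical):
Code:
use data2, clear
rename (x2 y2) (x1 y1)
tempfile d2
save `d2'
use data1, clear
joinby x1 y1 using `d2'   // rows where x1==x2 and, within that, y1==y2, carrying z and N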
Log transforming variables
Two questions related to log transforming variables.
I understand that we would want to log transform the dependent variable if its distribution is not normal. However, what if you take the log and still don't have a normal distribution?
I am also not clear on how to determine whether you should take the log of the independent variables.
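A sketch for eyeballing a candidate transformation (y is a placeholder name):
Code:
histogram y, normal
gen ln_y = ln(y)
histogram ln_y, normal
qnorm ln_y   // quantile-normal plot of the transformed variable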
How to assess if there is enough variation in your dependent variable
Is there a simple way to assess whether there is enough variation in your dependent variable, or is it best to just run the regression and assess the R-squared value?
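A minimal sketch for looking at the outcome's spread directly (y is a placeholder name):
Code:
summarize y, detail   // spread, percentiles, min/max
tabulate y            // for discrete outcomes, check the category counts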
Change in variance time series
Hi everyone,
I am analysing a time series (stock returns) and I am trying to check whether variance in the second half of my sample is different from the first half. I assigned a period to the observations. Here is an example (not the real data, but this is what it looks like):
Code:
Period X Date
1 .02784243 1/8/2010
1 .01478848 1/15/2010
1 -.04267111 1/22/2010
2 -.011348 1/29/2010
2 -.09616897 2/5/2010
Code:
robvar Polen, by(Periode)
Summary of X
Periode Mean Std. Dev. Freq.
1 .0000922 .0367802 261
2 .00006544 .02613092 261
Total .00007882 .03187241 522
W0 = 10.8059198 df(1, 520) Pr > F = 0.00108013
W50 = 9.6731110 df(1, 520) Pr > F = 0.0019724
W10 = 9.8870904 df(1, 520) Pr > F = 0.00175953
I am wondering whether this is a valid method for time series. Can anyone help me answer this question? If it isn't, is there another method that is not too hard for a beginner? Thanks in advance!
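For comparison, a simple variance-ratio test on the same split (a sketch reusing the variable names above). Note that, like robvar, it assumes independent observations, which serially dependent returns may violate:
Code:
sdtest Polen, by(Periode)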
xtivreg, first fails with "conformability error" r(503)
Dear Readers
xtivreg without "first" runs fine, but fails when I add the "first" option with "conformability error" r(503)
Using "set trace on" the error appears at:
Code:
est repost b=`bw', rename findomitted buildfvinfo
= est repost b=__00008Q, rename findomitted buildfvinfo
conformability error
di
di as text "First-stage within regression"
`vv' xtreg , level(`level') `diopts'
}
}
end xtivreg.Estimate
--- end xtivreg ---
r(503);
Any ideas?
Windows 10 Pro (up to date) and Stata 16.1, updated 29/9/2020.
All help most appreciated.
Best wishes
Richard
Split String
Code:
clear
input str4 HAVE str2 WANT1 str2 WANT2
"AA01" "AA" "01"
"AZ02" "AZ" "02"
"AV03" "AV" "03"
"AA04" "AA" "04"
"AA05" "AA" "05"
"A06"  "A"  "06"
"A07"  "A"  "07"
"A08"  "A"  "08"
"A09"  "A"  "09"
"A1Z0" "AZ" "10"
"B11"  "B"  "11"
"BB12" "BB" "12"
"BQ13" "BQ" "13"
"D14"  "D"  "14"
"F15"  "F"  "15"
"G16"  "G"  "16"
"G17"  "G"  "17"
"H18"  "H"  "18"
"I19"  "I"  "19"
"I20"  "I"  "20"
"I21"  "I"  "21"
"I22"  "I"  "22"
"II23" "II" "23"
end
I have the variable HAVE and wish to get WANT1 and WANT2, where WANT1 holds the alphabetic characters in HAVE and WANT2 the numeric ones. I know how to split a string based on position, for example taking the first or second character, but I do not know how to put the alphabetic characters in WANT1 and the numeric characters in WANT2.
I have cross-posted this on Stack Overflow.
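A sketch with ustrregexra() (available since Stata 14): strip the digits to get WANT1 and strip the non-digits to get WANT2, which also handles interleaved cases like "A1Z0":
Code:
gen WANT1 = ustrregexra(HAVE, "[0-9]", "")    // remove digits, keep letters
gen WANT2 = ustrregexra(HAVE, "[^0-9]", "")   // remove non-digits, keep digits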
New version of wridit on SSC
Thanks as always to Kit Baum, a new version of the wridit package is now available for download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have an old version of wridit.
The wridit package is described as below on my website. The new version adds an option handedness(), with possible values center, left and right, specifying the default standard center ridits, left-continuous ridits and right-continuous ridits, respectively. The right-continuous ridit function is also known as the cumulative distribution function.
This version is planned as the final Stata Version 10 version of wridit. I am planning the first Stata Version 16 version of wridit, using data frames.
Best wishes
Roger
---------------------------------------------------------------------------
package wridit from w:\stata10
---------------------------------------------------------------------------
TITLE
wridit: Generate weighted ridits
DESCRIPTION/AUTHOR(S)
wridit inputs a variable and generates its weighted ridits. If no
weights are provided, then all weights are assumed equal to 1, so
unweighted ridits are generated. Zero weights are allowed, and
imply that the ridits calculated for the observations with zero
weights will refer to the distribution of weights in the
observations with nonzero weights.
Author: Roger Newson
Distribution-Date: 28september2020
Stata-Version: 10
INSTALLATION FILES
wridit.ado
wridit.sthlp
---------------------------------------------------------------------------
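For readers new to ridits, a hand computation of unweighted center ridits (my sketch, not the package code; egen's default mean ranks average ties exactly as center ridits require):
Code:
sysuse auto, clear
egen double meanrank = rank(mpg)
gen double ridit = (meanrank - 0.5)/_N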
Comparing values of different variables across rows
Hello,
I want to compare the values of two variables in different rows.
For example
vr1 vr2
10 20
20 30
40 50
Here vr1 has the value 20 for observation 2, which is the value of vr2 for observation 1.
Thank you
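If the goal is to flag where vr1 equals the previous observation's vr2, as in the example, explicit subscripts do it (a sketch that relies on the current sort order):
Code:
gen byte matches_prior_vr2 = vr1 == vr2[_n-1] if _n > 1
list vr1 vr2 matches_prior_vr2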
table summarizing 3 categorical variables in string form between two groups (binary variable)
Hello,
I am having a hard time finding examples of summary tables comparing two groups, say students who dropped out vs. those who didn't (a binary variable: 1 = dropped out, 2 = student). I want to know their gender, age (18-88), parents' highest education level (5 choices), and marital status (5 choices).
I have been using the tabulate command to make two-way tables comparing each demographic (tab gender dropout), but ideally I would like to have them all in one table.
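Short of a single combined table, one sketch is to loop tabulate over the demographics (the variable names here are my guesses from the description):
Code:
foreach v of varlist gender parent_educ marital {
    tabulate `v' dropout, column nofreq
}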
Combining two datasets and keeping specific observations
I have two datasets. Dataset A and Dataset B. Dataset A is my existing data that I have organized and cleaned. Data examples are given below:
Dataset A
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str20(reporter partner) double import int year
"Albania" "Argentina" 515256 2019
"Albania" "Australia" 243387 2019
"Albania" "Austria" 4070764 2019
"Albania" "Bahrain" 400 2019
"Albania" "Bangladesh" 653907 2019
"Albania" "Belgium" 2439898 2019
"Albania" "Brazil" 5157799 2019
"Albania" "Bulgaria" 4492959 2019
"Albania" "Cambodia" 228032 2019
"Albania" "Cameroon" 45297 2019
"Albania" "Canada" 3393615 2019
"Albania" "Chile" 3844 2019
"Albania" "China" 45867296 2019
"Albania" "Colombia" 3427729 2019
"Albania" "Costa Rica" 69864 2019
"Albania" "Croatia, Rep. of" 4308223 2019
"Albania" "Cyprus" 75552 2019
"Albania" "Czech Rep." 5715907 2019
"Albania" "Denmark" 711316 2019
"Albania" "Egypt" 3067760 2019
"Albania" "Finland" 972839 2019
"Albania" "France" 6724754 2019
"Albania" "Germany" 26218870 2019
"Albania" "Greece" 36231564 2019
"Albania" "Greenland" 498771 2019
"Albania" "Hong Kong" 202057 2019
"Albania" "Hungary" 3770876 2019
"Albania" "Iceland" 103 2019
"Albania" "India" 3752246 2019
"Albania" "Indonesia" 1287032 2019
"Albania" "Iran" 169407 2019
"Albania" "Iraq" 28564 2019
"Albania" "Ireland" 1270395 2019
"Albania" "Israel" 5164563 2019
"Albania" "Italy" 101123701 2019
"Albania" "Japan" 1325641 2019
"Albania" "Jordan" 37307 2019
"Albania" "Kenya" 106517 2019
"Albania" "Kuwait" 5607 2019
"Albania" "Lithuania" 520594 2019
"Albania" "Luxembourg" 37569 2019
"Albania" "Malaysia" 779891 2019
"Albania" "Mauritius" 31204 2019
"Albania" "Mexico" 625250 2019
"Albania" "Morocco" 176254 2019
"Albania" "Netherlands, The" 4212582 2019
"Albania" "New Zealand" 1273 2019
"Albania" "Nigeria" 101085 2019
"Albania" "Norway" 648814 2019
"Albania" "Pakistan" 1018572 2019
"Albania" "Panama" 0 2019
"Albania" "Paraguay" 47577 2019
"Albania" "Peru" 154590 2019
"Albania" "Philippines" 57933 2019
"Albania" "Poland, Rep. of" 6135012 2019
"Albania" "Portugal" 821642 2019
"Albania" "Qatar" 231825 2019
"Albania" "Romania" 3265392 2019
"Albania" "Russian Federation" 10036688 2019
"Albania" "Saudi Arabia" 27167 2019
"Albania" "Serbia, Rep. of" 14200179 2019
"Albania" "Sierra Leone" 3814 2019
"Albania" "Singapore" 6700 2019
"Albania" "Slovak Rep." 702780 2019
"Albania" "Slovenia, Rep. of" 10444300 2019
"Albania" "South Africa" 136819 2019
"Albania" "South Korea" 1617098 2019
"Albania" "Spain" 4673846 2019
"Albania" "Sri Lanka" 265608 2019
"Albania" "Sweden" 711592 2019
"Albania" "Switzerland" 14513361 2019
"Albania" "Taiwan" 645289 2019
"Albania" "Thailand" 927471 2019
"Albania" "Tunisia" 1280895 2019
"Albania" "Turkey" 35528918 2019
"Albania" "Uganda" 15271 2019
"Albania" "Ukraine" 4250674 2019
"Albania" "United Arab Emirates" 17129 2019
"Albania" "United Kingdom" 3322489 2019
"Albania" "United States" 8260812 2019
"Albania" "Venezuela" 10681 2019
"Albania" "Vietnam" 1865398 2019
"Algeria" "Albania" 5720 2019
"Algeria" "Angola" 9576 2019
"Algeria" "Argentina" 90107125 2019
"Algeria" "Australia" 190070 2019
"Algeria" "Austria" 27912946 2019
"Algeria" "Bahrain" 1844807 2019
"Algeria" "Bangladesh" 1456613 2019
"Algeria" "Belgium" 36570747 2019
"Algeria" "Brazil" 50796873 2019
"Algeria" "Bulgaria" 5689167 2019
"Algeria" "Cambodia" 590320 2019
"Algeria" "Cameroon" 670926 2019
"Algeria" "Canada" 18572061 2019
"Algeria" "Chile" 380512 2019
"Algeria" "China" 518205282 2019
"Algeria" "Colombia" 163860 2019
"Algeria" "Costa Rica" 76043 2019
"Algeria" "Croatia, Rep. of" 4332027 2019
end
Dataset B
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str40 reporter str42 partner double import float year
"Albania" "Advanced Economies" 312497189 2019
"Albania" "Advanced Economies" 201028905 2019
"Albania" "Advanced Economies" 261178953 2019
"Albania" "Africa" 2325741 2019
"Albania" "Africa" 2782719 2019
"Albania" "Africa" 4857234 2019
"Albania" "Algeria" 1807390 2019
"Albania" "Algeria" 1510581 2019
"Albania" "Algeria" 1369995 2019
"Albania" "Antigua and Barbuda" 62 2019
"Albania" "Antigua and Barbuda" 74 2019
"Albania" "Antigua and Barbuda" 447646 2019
"Albania" "Argentina" 1027521 2019
"Albania" "Argentina" 815640 2019
"Albania" "Argentina" 858781 2019
"Albania" "Australia" 282861 2019
"Albania" "Australia" 40745 2019
"Albania" "Australia" 338440 2019
"Albania" "Austria" 4004822 2019
"Albania" "Austria" 4791717 2019
"Albania" "Austria" 3984711 2019
"Albania" "Azerbaijan, Rep. of" 6999 2019
"Albania" "Bahrain, Kingdom of" . 2019
"Albania" "Bangladesh" 23529 2019
"Albania" "Bangladesh" 520849 2019
"Albania" "Bangladesh" 623189 2019
"Albania" "Belarus, Rep. of" 51160 2019
"Albania" "Belarus, Rep. of" 16541 2019
"Albania" "Belarus, Rep. of" 61212 2019
"Albania" "Belgium" 4005557 2019
"Albania" "Belgium" 3347765 2019
"Albania" "Belgium" 3558505 2019
"Albania" "Bolivia" 2260 2019
"Albania" "Bolivia" 1222 2019
"Albania" "Bolivia" 2703 2019
"Albania" "Bosnia and Herzegovina" 1878627 2019
"Albania" "Bosnia and Herzegovina" 2247752 2019
"Albania" "Bosnia and Herzegovina" 1622352 2019
"Albania" "Brazil" 3207452 2019
"Albania" "Brazil" 2352098 2019
"Albania" "Brazil" 1965837 2019
"Albania" "Bulgaria" 5100845 2019
"Albania" "Bulgaria" 5279264 2019
"Albania" "Bulgaria" 6103094 2019
"Albania" "Cambodia" 250155 2019
"Albania" "Cambodia" 2493 2019
"Albania" "Cambodia" 299307 2019
"Albania" "Cameroon" 53497 2019
"Albania" "Cameroon" 44712 2019
"Albania" "Cameroon" 41696 2019
"Albania" "Canada" 5997194 2019
"Albania" "Canada" 3317402 2019
"Albania" "Canada" 2772619 2019
"Albania" "Chile" 109403 2019
"Albania" "Chile" 70864 2019
"Albania" "Chile" 130899 2019
"Albania" "China" 50511440 2019
"Albania" "China" 28918088 2019
"Albania" "China" 42216460 2019
"Albania" "China, P.R.: Macao" 3742 2019
"Albania" "China, P.R.: Macao" 4478 2019
"Albania" "Colombia" 261384 2019
"Albania" "Colombia" 277623 2019
"Albania" "Colombia" 218459 2019
"Albania" "Congo, Dem. Rep. of the" 51263 2019
"Albania" "Congo, Dem. Rep. of the" 61335 2019
"Albania" "Costa Rica" 49926 2019
"Albania" "Costa Rica" 59735 2019
"Albania" "Costa Rica" 148829 2019
"Albania" "Croatia, Rep. of" 5025324 2019
"Albania" "Croatia, Rep. of" 6181352 2019
"Albania" "Croatia, Rep. of" 5166251 2019
"Albania" "Cuba" 423 2019
"Albania" "Cuba" 506 2019
"Albania" "Cyprus" 87233 2019
"Albania" "Cyprus" 104373 2019
"Albania" "Cyprus" 471599 2019
"Albania" "Czech Rep." 4687524 2019
"Albania" "Czech Rep." 3935415 2019
"Albania" "Czech Rep." 3917740 2019
"Albania" "Côte d'Ivoire" . 2019
"Albania" "Côte d'Ivoire" . 2019
"Albania" "Côte d'Ivoire" 11059 2019
"Albania" "Denmark" 978386 2019
"Albania" "Denmark" 1446774 2019
"Albania" "Denmark" 1209185 2019
"Albania" "Dominican Rep." 11869 2019
"Albania" "Dominican Rep." 14201 2019
"Albania" "Dominican Rep." 5096 2019
"Albania" "Ecuador" 3735312 2019
"Albania" "Ecuador" 2943260 2019
"Albania" "Ecuador" 3521572 2019
"Albania" "Egypt" 978088 2019
"Albania" "Egypt" 2966472 2019
"Albania" "Egypt" 1170269 2019
"Albania" "Emerging and Developing Asia" 61863154 2019
"Albania" "Emerging and Developing Asia" 51703998 2019
"Albania" "Emerging and Developing Asia" 36319937 2019
"Albania" "Emerging and Developing Economies" 162389109 2019
"Albania" "Emerging and Developing Economies" 222863859 2019
end
As you can see, there are observations in Dataset B that are not in Dataset A. I want to combine these two datasets such that only the observations (in this case, country pairs) present in Dataset A are kept; the rest are dropped. I am doing it manually using the command below, but it is very time consuming:
Code:
drop if reporter == "" | partner ==""
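A sketch that automates this with -merge- (the filenames are hypothetical): build the list of reporter-partner pairs present in Dataset A, then keep only the matching rows of Dataset B.
Code:
use datasetA, clear
keep reporter partner
duplicates drop            // one row per reporter-partner pair
tempfile keys
save `keys'
use datasetB, clear
merge m:1 reporter partner using `keys', keep(match) nogenerate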
Replacing values from one observation to another
Dear Statalist users,
This may be a trivial question, but I am struggling a little with it.
My data look like this example:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float year str3 iso3 str36 cntry double high float share
2010 "CSK" "W" 682448769 .
2012 "CSK" "W" 1113816002 .
2010 "" "A" . .736144
2012 "" "A" . .7545093
2010 "" "B" . .26385596
2012 "" "B" . .2454907
end
What I would like is, for countries A and B, to set high equal to country W's value of high multiplied by share, but only where high is missing. So I should have something like:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float year str3 iso3 str36 cntry double high float share
2010 "CSK" "W" 682448769 .
2012 "CSK" "W" 1113816002 .
2010 "" "A" 502380567 .736144
2012 "" "A" 840384531.99 .7545093
2010 "" "B" 180068175 .26385596
2012 "" "B" 27338243.6 .2454907
end
Any help would be appreciated! Thank you!!
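A sketch: copy W's value of high across each year, then fill in only the missing cells.
Code:
bysort year: egen double w_high = max(cond(cntry == "W", high, .))
replace high = w_high * share if missing(high)
drop w_high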
Fixed effects regression is doing something I'm not noticing?
In Stata I fit a fixed-effects (conditional) logistic regression of a binary outcome (1 == overweight, 0 == not overweight) on a binary predictor (1 == unemployed, 0 == employed), with some controls, in a longitudinal panel of 3 waves.
I report the coefficient on the unemployment variable as the effect of unemployment on overweight. So here I say: if your parent experienced unemployment at any point across the three waves of the study, your probability of being overweight was 0.06 (about 6 percentage points) higher.
I received a comment that I treat transitions from unemployment to employment on weight similarly to transitions from employment to unemployment on weight.
But, I only ever report the coefficient on parentsunemployed (0.06) and it's my understanding that due to how I set up my binary predictor and outcome I'm only ever considering the effect of a change from employment to unemployment on a change from not overweight to overweight.
So, why were changes from unemployment to employment even mentioned? Is it possible that I am considering this and don't even know it? And how?!
I could really do with some advice!
All the best,
John
Code:
. clogit kidsweight i.parentsunemployed i.urban_or_rural i.year i.parents_age_y ///
      i.Parents_Educa i.Parents_Marital, cluster(id) group(id) nolog

note: multiple positive outcomes within groups encountered.
note: 9,091 groups (23,274 obs) dropped because of all positive or all negative outcomes.

Conditional (fixed-effects) logistic regression   Number of obs   =      5,532
                                                  Wald chi2(12)   =     268.06
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -1892.4384                 Pseudo R2       =     0.0603

                                     (Std. Err. adjusted for 1,945 clusters in id)
-----------------------------------------------------------------------------------------------
                                  |               Robust
                       kidsweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------------------+------------------------------------------------------------
              1.parentsunemployed |   .2586795   .0991969     2.61   0.009     .0642571    .4531019
                 1.urban_or_rural |   .0284788   .1521921     0.19   0.852    -.2698122    .3267699
                                  |
                             year |
                                1 |    .331549   .0608113     5.45   0.000     .2123611    .4507368
                                2 |  -.5641933   .0786183    -7.18   0.000    -.7182823   -.4101043
                                  |
                    parents_age_y |
                            30-39 |   -.019373   .1321831    -0.15   0.883    -.2784471     .239701
                       40 or more |   -.131816   .1917598    -0.69   0.492    -.5076582    .2440262
                                  |
                    Parents_Educa |
Leaving Certificate to Non Degree |   .3654921   .2249296     1.62   0.104    -.0753619     .806346
        Primary Degree or greater |   .4395884   .2934593     1.50   0.134    -.1355812    1.014758
                                  |
                  Parents_Marital |
                                2 |   -.154054   .2966866    -0.52   0.604     -.735549    .4274409
                                3 |  -.4093562   .3844533    -1.06   0.287    -1.162871    .3441584
                                4 |  -.1921434   .1805024    -1.06   0.287    -.5459217    .1616349
                                5 |   .7150017   1.125252     0.64   0.525    -1.490451    2.920455
-----------------------------------------------------------------------------------------------

. margins, dydx(parentsunemployed) post

Average marginal effects                          Number of obs   =      5,532
Model VCE    : Robust

Expression   : Pr(kidsweight|fixed effect is 0), predict(pu0)
dy/dx w.r.t. : 1.parentsunemployed

--------------------------------------------------------------------------------
                    |            Delta-method
                    |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+-----------------------------------------------------------
1.parentsunemployed |   .0605013   .0229353     2.64   0.008     .0155489    .1054537
--------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Interpreting -stset- output
Hello,
I use Stata 15.1 for survival analysis, using a Cox-model (stcox). My master dataset is the UCDP Peace Agreement Dataset V19.1, with three different merged replication datasets. I want to investigate the relationship between gender provisions in peace agreements and the duration of peace agreements.
While stsetting my data, I received the following output: [stset output not shown]
How can I interpret the output?
What does the Probable Error mean?
Kind regards,
Theresa
Suest after fracreg
Hello all,
This seems like a simple question.
I wanted to compare coefficients from two models estimated using fracreg command (fractional logit).
*******
fracreg logit quality robots if industry==1
est store ind1
fracreg logit quality robots if industry=2
est store ind2
suest ind1 ind2
*******
I get this error message:
"ind1 was estimated with a nonstandard vce (robust)"
I found that this is because fracreg uses vce(robust) by default, while suest does not permit vce(robust), vce(jackknife), or vce(cluster), the other vce options available with fracreg.
I was wondering if there is a way to run fracreg with a different vce setting, or an alternative way to compare the coefficients.
Regards,
Joseph Bakker
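One common workaround (a sketch): fit both industries in a single model and test the interaction, which asks the same question while keeping the robust VCE that fracreg requires.
Code:
fracreg logit quality c.robots##i.industry
test 2.industry#c.robots   // equality of the robots coefficient across industries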
reshaping complex panel data
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 _AIHWperiod int SA3 str20 _AIHWgeoname str61 _AIHWservice str11 _AIHWdemo str42 _AIHWname double _AIHWvalue
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Per cent of people who had the service (%)" 4.79
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Services per 100 people" 21.27
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Medicare benefits per 100 people ($)" 2058
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "No. of patients" 3467
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "No. of services" 15390
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Total Medicare benefits paid ($)" 1489193
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Total provider fees ($)" 1708162
"2017-18" 10104 "South Coast" "Allied Health subtotal - Mental Health Care" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Per cent of people who had the service (%)" 1.39
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Services per 100 people" 6.81
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Medicare benefits per 100 people ($)" 840
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "No. of patients" 1004
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "No. of services" 4928
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Total Medicare benefits paid ($)" 607614
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Total provider fees ($)" 663148
"2017-18" 10104 "South Coast" "Clinical Psychologist" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Per cent of people who had the service (%)" .63
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Services per 100 people" 2.5
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Medicare benefits per 100 people ($)" 192
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "No. of patients" 453
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "No. of services" 1811
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Total Medicare benefits paid ($)" 139200
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Total provider fees ($)" 162119
"2017-18" 10104 "South Coast" "Other Allied Mental Health" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Per cent of people who had the service (%)" 3.01
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Services per 100 people" 11.96
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Medicare benefits per 100 people ($)" 1026
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "No. of patients" 2175
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "No. of services" 8651
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Total Medicare benefits paid ($)" 742379
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Total provider fees ($)" 882895
"2017-18" 10104 "South Coast" "Other Psychologist" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Per cent of people who had the service (%)" 31.93
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Services per 100 people" 43.3
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Medicare benefits per 100 people ($)" 2069
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "No. of patients" 23100
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "No. of services" 31329
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Total Medicare benefits paid ($)" 1496811
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Total provider fees ($)" 1539638
"2017-18" 10104 "South Coast" "Allied Health subtotal - Optometry" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Per cent of people who had the service (%)" 5.56
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Services per 100 people" 14.96
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Medicare benefits per 100 people ($)" 799
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "No. of patients" 4020
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "No. of services" 10824
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Total Medicare benefits paid ($)" 578384
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Total provider fees ($)" 631516
"2017-18" 10104 "South Coast" "Allied Health subtotal - Other" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Per cent of people who had the service (%)" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Services per 100 people" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Medicare benefits per 100 people ($)" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "No. of patients" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "No. of services" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Total Medicare benefits paid ($)" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Total provider fees ($)" .
"2017-18" 10104 "South Coast" "Diabetes Education" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Per cent of people who had the service (%)" .77
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Services per 100 people" 1.19
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Medicare benefits per 100 people ($)" 63
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "No. of patients" 555
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "No. of services" 863
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Total Medicare benefits paid ($)" 45884
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Total provider fees ($)" 54623
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Per cent of people who had the service (%)" .03
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Services per 100 people" .07
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Medicare benefits per 100 people ($)" 5
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "No. of patients" 25
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "No. of services" 52
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Total Medicare benefits paid ($)" 3881
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Total provider fees ($)" 6485
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Per cent of people who had the service (%)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Services per 100 people" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Medicare benefits per 100 people ($)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "No. of patients" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "No. of services" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Total Medicare benefits paid ($)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Total provider fees ($)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Per cent of people who had the service (%)" 4.56
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Services per 100 people" 12.85
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Medicare benefits per 100 people ($)" 684
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "No. of patients" 3296
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "No. of services" 9297
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Total Medicare benefits paid ($)" 494529
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Total provider fees ($)" 528365
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Per cent of people who had the service (%)" .06
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Services per 100 people" .17
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Medicare benefits per 100 people ($)" 12
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "No. of patients" 41
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "No. of services" 124
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Total Medicare benefits paid ($)" 8507
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Total provider fees ($)" 15561
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "Per cent of people who had the service (%)" 5.08
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "Services per 100 people" 16.03
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "Medicare benefits per 100 people ($)" 843
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "No. of patients" 3678
end
Hi,
I am trying to eventually merge this dataset to my master dataset using the "SA3" variable. To do this, I am aware that I need to reshape this long dataset to wide so I can have one line per SA3 code. There are 6 SA3 codes in total, e.g.:
SA3 | Freq. Percent Cum.
------------+-----------------------------------
10104 | 1,232 16.67 16.67
10701 | 1,232 16.67 33.33
10703 | 1,232 16.67 50.00
10704 | 1,232 16.67 66.67
11401 | 1,232 16.67 83.33
11402 | 1,232 16.67 100.00
------------+-----------------------------------
Total | 7,392 100.00
As you can see, this dataset is complex: there are 7 variables in total, and within each SA3 area, over 2 time periods (2016-17 and 2017-18), there are various services used, which are also split by a demographic variable. Essentially I'm trying to get one line of all these variables per SA3. I have tried many approaches, but I keep getting the error below. Any help would be much appreciated!
Code:
reshape wide _AIHWvalue, i(SA3) j(_AIHWperiod) string
(note: j = 2016-17 2017-18)
values of variable _AIHWperiod not unique within SA3
Your data are currently long. You are performing a reshape wide. You specified i(SA3) and j(_AIHWperiod). There are observations within
i(SA3) with the same value of j(_AIHWperiod). In the long data, variables i() and j() together must uniquely identify the observations.
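A sketch of one way through: reshape wide needs j() to be unique within i(SA3), so first collapse period, service, demographic, and measure into a single numeric index (the resulting stub names _AIHWvalue1-_AIHWvalue1232 are opaque, so you would want to label or rename them afterwards):
Code:
egen long cell = group(_AIHWperiod _AIHWservice _AIHWdemo _AIHWname)
keep SA3 cell _AIHWvalue
reshape wide _AIHWvalue, i(SA3) j(cell)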
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Total provider fees ($)" 54623
"2017-18" 10104 "South Coast" "Dietetics" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Per cent of people who had the service (%)" .03
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Services per 100 people" .07
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Medicare benefits per 100 people ($)" 5
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "No. of patients" 25
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "No. of services" 52
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Total Medicare benefits paid ($)" 3881
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Total provider fees ($)" 6485
"2017-18" 10104 "South Coast" "Occupational Therapy" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Per cent of people who had the service (%)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Services per 100 people" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Medicare benefits per 100 people ($)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "No. of patients" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "No. of services" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Total Medicare benefits paid ($)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Total provider fees ($)" .
"2017-18" 10104 "South Coast" "Other Allied Health" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Per cent of people who had the service (%)" 4.56
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Services per 100 people" 12.85
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Medicare benefits per 100 people ($)" 684
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "No. of patients" 3296
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "No. of services" 9297
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Total Medicare benefits paid ($)" 494529
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Total provider fees ($)" 528365
"2017-18" 10104 "South Coast" "Podiatry" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Per cent of people who had the service (%)" .06
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Services per 100 people" .17
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Medicare benefits per 100 people ($)" 12
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "No. of patients" 41
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "No. of services" 124
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Total Medicare benefits paid ($)" 8507
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Total provider fees ($)" 15561
"2017-18" 10104 "South Coast" "Speech Pathology" "All persons" "Estimated resident population" 72351
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "Per cent of people who had the service (%)" 5.08
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "Services per 100 people" 16.03
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "Medicare benefits per 100 people ($)" 843
"2017-18" 10104 "South Coast" "Allied Health subtotal - Physical Health Care" "All persons" "No. of patients" 3678
end
Hi,
I am trying to eventually merge this dataset to my master dataset using the "SA3" variable. To do this, I am aware that I need to reshape this long dataset to wide so I can have one line per SA3 code. There are 6 SA3 codes in total, e.g.:
SA3 | Freq. Percent Cum.
------------+-----------------------------------
10104 | 1,232 16.67 16.67
10701 | 1,232 16.67 33.33
10703 | 1,232 16.67 50.00
10704 | 1,232 16.67 66.67
11401 | 1,232 16.67 83.33
11402 | 1,232 16.67 100.00
------------+-----------------------------------
Total | 7,392 100.00
As you can see, this dataset is complex: there are 7 variables in total, and within each SA3 area, over 2 time periods (2016-17, 2017-18), there are various services used, which are also split by a demographic variable. Essentially I'm trying to get 1 line of all these variables per SA3. I have tried many approaches, but keep getting this error. Any help would be much appreciated!
reshape wide _AIHWvalue, i(SA3) j(_AIHWperiod) string
(note: j = 2016-17 2017-18)
values of variable _AIHWperiod not unique within SA3
Your data are currently long. You are performing a reshape wide. You specified i(SA3) and j(_AIHWperiod). There are observations within
i(SA3) with the same value of j(_AIHWperiod). In the long data, variables i() and j() together must uniquely identify the observations.
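The error arises because SA3 and _AIHWperiod alone do not uniquely identify rows: the service, demographic, and measure-name columns all vary within each SA3-period pair, so they must be folded into j() before reshaping. A minimal sketch, assuming the variable names shown in the dataex above:
Code:
* build a single j variable that combines every identifier except SA3
egen measure = group(_AIHWperiod _AIHWservice _AIHWdemo _AIHWname), label
keep SA3 measure _AIHWvalue
reshape wide _AIHWvalue, i(SA3) j(measure)
With 1,232 rows per SA3 this produces 1,232 wide columns, so it may be worth keeping only the periods, services, and measures actually needed before the reshape; the value labels created by group(, label) help map the numbered wide variables back to their meaning.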
Logit model to estimate demand (Berry 1994)
Hello everybody
I have the following exercise to estimate demand allowing for heterogeneous preference shocks:
Supermarkets market
Time (m) = 5
Area (i) = 18
Branch for each Area (h) = 199. Adding the outside option 200
3 different firms
distance= distance to the city center
employees= number of the employees in the branch
Utility function of the customers:
U(ihm) = β1income(im) + β2Firm1(hm) + β3Firm2(hm) + β4employees(hm) + β5distance(ih) + ε(ihm)
As you can note, the utility function doesn't include price, because one assumption is that every supermarket sells at the same price.
I also have the variable Area Share(ihm), which represents the market share of Branch(h) in Area(i) for time period (m).
If someone has experience estimating this kind of model, it would be really helpful.
Thanks,
Manuel.
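With a homogeneous logit and an outside option, Berry (1994) shows the mean utilities can be inverted as ln(s_h) - ln(s_0) and estimated by linear regression. A minimal sketch under that reading of the exercise, assuming one row per branch-area-period with the branch share (share) and the outside-option share in the same area-period (share0) already computed; all variable names here are placeholders:
Code:
* Berry (1994) inversion: delta_hm = ln(s_hm) - ln(s_0m)
gen delta = ln(share) - ln(share0)
reg delta income firm1 firm2 employees distance, vce(cluster area)
With truly heterogeneous preference shocks (random coefficients), this OLS step is the starting point for BLP-style estimation rather than the final answer.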
Tuesday, September 29, 2020
Deciding equation when analyzing by ppml
Hello
My data is a strongly balanced panel. As I want to know the effect of LPI on exports and imports, I have chosen export and import as dependent variables and lpi, gdp, distance, and a landlocked dummy as independent variables. The summary of my data is as follows:
Code:
    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+-----------------------------------------------------------
      export |      1,328     833.446    3024.465          0   41549.71
      import |      1,328     837.313    3961.817          0   58532.57
    distance |      1,328    8860.618    4304.137    478.553   19228.99
         gdp |      1,328     4523.65    16524.65    1.97454   196236.7
  landlocked |      1,328    .2108434    .4080611          0          1
-------------+-----------------------------------------------------------
        lpi0 |      1,262    2.879012    .5787635   1.598322   4.225967
        lpi1 |      1,262    2.691696    .5987479   1.111111    4.20779
        lpi2 |      1,262    2.754088    .6829058   1.237654   4.439356
        lpi3 |      1,262    2.846396    .5248384   1.362654      4.235
        lpi4 |      1,262    2.828908     .608916   1.394253    4.31065
-------------+-----------------------------------------------------------
        lpi5 |      1,262    2.886015    .6297591   1.513605   4.377678
        lpi6 |      1,262    3.253649    .5854234   1.665079   4.795714
Because the cases where the export or import value equals 0 account for about 18% of total observations, I decided to use PPML. My equation becomes:
Ex = a ln(gdp) + b ln(dis) + c ln(lpi) + e dummy(landlocked)
But there are some missing data on lpi, because in some years, for some specific countries, LPI was not collected:
Code:
gen ll1=ln(lpi1)
(66 missing values generated)
So I wonder whether my equation is suitable or not. If not, what equation should I use?
Please give me advice.
Thanks so much
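PPML is Poisson pseudo-maximum likelihood with robust standard errors, so the zero trade flows stay in the sample; only rows with missing lpi drop out. A minimal sketch with official commands, assuming a country identifier named country1 as in the poster's later output (other names as in the summary above); note the dependent variable enters in levels, not logs:
Code:
gen lgdp = ln(gdp)
gen ldis = ln(distance)
gen llpi = ln(lpi1)
poisson export lgdp ldis llpi i.landlocked, vce(cluster country1)
Whether the LPI gaps bias the estimates depends on why those country-years are missing; that is a separate question from the functional form of the equation.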
Non integer weights - problem
Hi.
I'm using weights from the European Social Survey, which provides three different weights: design, population, and post-stratification weights. I don't know what kind of weights they are in Stata's terms (fw, aw, or pw). I tried to graph (histogram) a variable (worry about climate change) using weights, but Stata said: "may not use noninteger frequency weights". I understand this means my weights are not integers while frequency weights must be, but the question is: can I transform my non-integer weights into integer ones?
I attach a brief document that explains the ESS database's weights. (http://www.europeansocialsurvey.org/...ing_data_1.pdf)
Thank you very much
Gab
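Rounding the weights into fweights would distort them; histogram also accepts aweights, which handle non-integer weights directly. A sketch, assuming the ESS post-stratification weight is named pspwght and using a placeholder name for the attitude variable:
Code:
histogram worry_climate [aweight = pspwght], discrete percent
For estimation commands (means, regressions), declaring the weight as a pweight via svyset is usually the recommended route for survey data such as the ESS.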
Mark Highest Recurring Observation
Dear All,
Hope you are well.
I want to generate a variable that marks the highest recurring violation type for each business.
There are cases when one business has multiple violation types with the same maximum number of repetitions, and I cannot determine how I should deal with these.
For example, business id 27 has 4 observations and 2 violations each repeating 2 times. This means one business has 2 violation types with the same maximum count.
Your help will be appreciated. business_id is the id of the business, violation_type is the type of violation the business made, and violation_rpt is how many times the business repeated that violation.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long business_id byte violation_type float violation_rpt 26 1 2 26 1 2 26 4 1 26 7 3 26 7 3 26 7 3 26 13 5 26 13 5 26 13 5 26 13 5 26 13 5 27 7 2 27 7 2 27 13 2 27 13 2 28 1 1 28 3 1 28 4 13 28 4 13 28 4 13 28 4 13 28 4 13 28 4 13 28 4 13 28 4 13 28 4 13 28 4 13 28 4 13 28 4 13 28 4 13 28 9 1 28 11 1 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 28 13 18 29 4 4 29 4 4 29 4 4 29 4 4 29 7 1 29 9 6 29 9 6 29 9 6 29 9 6 29 9 6 29 9 6 29 11 1 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 29 13 16 30 1 1 30 3 1 30 4 12 30 4 12 30 4 12 30 4 12 30 4 12 30 4 12 30 4 12 30 4 12 30 4 12 30 4 12 30 4 12 30 4 12 30 7 3 30 7 3 30 7 3 30 9 1 30 10 2 30 10 2 30 11 5 30 11 5 end label values violation_type violation_type label def violation_type 1 "Adulteration", modify label def violation_type 3 "Unhygienic Items", modify label def violation_type 4 "Uncleanliness", modify label def violation_type 7 "Overpricing", modify label def violation_type 9 "Incorrect Weights & Measures", modify label def violation_type 10 "Non availability of price list", modify label def violation_type 11 "Violation of regulations", modify label def violation_type 13 "No Violation", modify
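A sketch of one way to handle it: compute each business's maximum repeat count, then flag every violation type that attains it, so ties stay visible as multiple flagged types rather than being forced into one:
Code:
bysort business_id: egen max_rpt = max(violation_rpt)
gen byte top_violation = violation_rpt == max_rpt
* optional: count how many distinct types tie at the maximum
egen tagged = tag(business_id violation_type) if top_violation
bysort business_id: egen n_top_types = total(tagged)
For business 27 this flags both types 7 and 13 and records n_top_types = 2; whether to keep both, pick the more severe type, or mark such businesses separately is a substantive decision rather than a coding one.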
How to make a mother education variable for each observation in a large dataset
I have data in the following form. The columns are: parent key (PARENT_KEY), key (KEY_), individual number in roster (A1_1), completed education (education_completed), and mother's line number in roster (A10_1):
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-1 1 higher .
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-2 2 secondary .
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-3 3 higher 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-4 4 higher 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-5 5 secondary 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-6 6 primary 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-7 7 none/pre-shool 2
uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c uuid:003d89e0-4eb3-402e-ab13-75e0a6224e6c-8 8 none/pre-shool 2
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-1 1 primary .
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-2 2 none/pre-shool .
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-3 3 none/pre-shool 2
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-4 4 primary .
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-5 5 none/pre-shool 4
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-6 6 none/pre-shool 4
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-7 7 none/pre-shool 4
uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b uuid:004b8834-2c03-4a1e-8fc5-6bf2a3379b2b-8 8 primary 2
Here A10_1 gives the roster line number of an individual's real mother, with dot (missing) values for those whose mother is not present in the roster. I want to create a mother's education variable.
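One standard route is a self-merge: build a lookup of each person's education keyed by household (PARENT_KEY) and roster line number, then match it against A10_1. A sketch using the variable names above:
Code:
preserve
keep PARENT_KEY A1_1 education_completed
rename A1_1 A10_1
rename education_completed mother_education
tempfile mothers
save `mothers'
restore
merge m:1 PARENT_KEY A10_1 using `mothers', keep(master match) nogenerate
Individuals with A10_1 missing simply fail to match, so mother_education stays missing for them, as intended.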
How to use a Cox regression model for time-to-event analysis using different years as control
Good day
Background: I am trying to conduct a retrospective epidemiological study looking at the incidence of a certain disease (on admission) in one year (2020) compared to another (2019).
In this case the year would be the exposure variable (2019 vs. 2020), the outcome being the incidence of disease.
Data: Retrospective admissions data for a pre-defined population between the period 23/03/2020 to 01/08/2020 was collected. This totals 182 admissions with 7 incidences of the disease on admission (1 readmission).
Data on the time period of 23/03/2019 to 01/08/2019 was also collected. This amounts to 218 admissions with 17 incidences of the disease on admission (1 readmission)
Analysis plan: Use stset command for dataset and perform Cox regression, taking into account the reoccurrence (readmissions) of the disease and non-reoccurrence to perform a time-to-event comparison between 2020 and 2019 (confounders to be adjusted for).
Problem: These are the same time periods (start 23/03 and end 01/08) but in different years. Can I compare datasets using stset with different start dates (23/03/2020 vs 23/03/2019)?
My (rather crude) solution was to simply use 23/03/2020 as the start date for both years (since the spans 23/03/2020-01/08/2020 and 23/03/2019-01/08/2019 are the same length: 131 days),
and to create a new variable for year as exposure, coded 2020 and 2019 respectively, and compare time-to-event this way.
Thank you kindly for your help.
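Analysis time in a Cox model is time since each subject's own origin, so the two cohorts do not need a shared calendar start: aligning each year to its own 23 March achieves the same thing without recoding dates. A sketch, with all variable names (admit_date, event_date, disease, patient_id) assumed for illustration:
Code:
gen origin_date = cond(year(admit_date) == 2019, td(23mar2019), td(23mar2020))
gen byte year2020 = year(admit_date) == 2020
stset event_date, origin(time origin_date) failure(disease)
stcox i.year2020, vce(cluster patient_id)
With only one readmission in each year, a full recurrent-events setup may be more structure than the data can support; a sensitivity analysis using first events only is worth considering.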
Redirecting All Ado Paths to New Drive & Folder
At some unfortunate time, I named my partitioned drive with the letter "B". I am setting up a new computer, and IT requires me to use C: (without the partition). I will have to move all of the B drive data into a new folder on C. So now all my .do files will point to an obsolete path when trying to call data, ado files, etc. E.g., a file previously stored as "B:\project_a\data\dataset1" might now be "C:\db\project_a\data\dataset1".
Although some of my .do files start with a global directory declaration at the top of the file, which could be changed, many do not. Therefore, many .do files call the files used for appending, merging, and other .do files with the full path written before the command.
E.g.,
Code:
cd "B:\project_a\data"
use dataset1, clear
or
Code:
cd "B:\project_a\data"
merge 1:1 id using dataset2
Respectively these would need to be changed to:
Code:
cd "C:\db\project_a\data"
use dataset1, clear
and
Code:
cd "C:\db\project_a\data"
merge 1:1 id using dataset2
Is there any hope of resolving this problem in some find-and-replace bulk method?
Thanks, in advance,
Ben
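Stata's official filefilter command can do exactly this kind of bulk substitution. A sketch that rewrites every .do file in one folder (paths are illustrative; in filefilter patterns a backslash is written \BS), writing to new files so the originals survive until the results are checked:
Code:
local dofiles : dir "C:\db\project_a" files "*.do"
foreach f of local dofiles {
    filefilter "C:\db\project_a\`f'" "C:\db\project_a\fixed_`f'", ///
        from("B:\BS") to("C:\BSdb\BS")
}
A recursive variant would loop over subfolders as well; alternatively, any text editor with project-wide find-and-replace (e.g. VS Code or Notepad++) handles this outside Stata.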
Time series operators not allowed?
Dear All,
I would like to create a 5 month *follow up period* after a specific variable takes a value of 1. My data looks as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double(id month) float followup 52 1 . 52 2 . 52 3 . 52 4 . 52 5 . 52 6 . 52 7 . 52 8 . 52 9 . 52 10 . 52 11 . 52 12 . 52 13 . 52 14 . 52 15 . 52 16 . 52 17 . 52 18 . 52 19 . 52 20 . 52 21 1 52 22 . 52 23 . 52 24 . 52 25 . 52 26 . 52 27 1 52 28 . 52 29 . 52 30 . 52 31 . 52 32 . 52 33 . 52 34 . 52 35 . 52 36 . 52 37 . 52 38 . 52 39 . 52 40 . 52 41 . 52 42 . 52 43 . 52 44 . 52 45 . 52 46 . 52 47 . 52 48 1 52 49 . 52 50 . 52 51 . 52 52 . 52 53 . 52 54 . 52 55 . 52 56 . 52 57 . 52 58 . 52 59 . 52 60 . 52 61 . 52 62 . 52 63 . 52 64 . 52 65 . 52 66 . 52 67 . 52 68 . 52 69 . 52 70 . 52 71 . 52 72 . 52 73 . 52 74 . 52 75 . 52 76 . 52 77 . 52 78 . 52 79 . 52 80 . 52 81 . 52 82 . 52 83 . 52 84 . 52 85 . 52 86 . 52 87 . 52 88 . 52 89 1 52 90 . 52 91 . 52 92 . 52 93 . 52 94 . 52 95 . 52 96 . 52 97 . 52 98 . 52 99 . 52 100 . end
So each time the variable followup==1, I want to set F1.followup-F5.followup to 1. For instance, after followup==1 in month 21, I would like to replace followup in months 22-26 with 1 as well. But I get an error when I try to do this:
Code:
bysort id: replace F1.followup=1 if followup==1
factor variables and time-series operators not allowed
r(101);
I am not sure why the time-series operator is not working in this case, as it seems to work otherwise. I will be grateful for your help.
Sincerely,
Sumedha.
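The error is expected: time-series operators such as F1. may appear on the right-hand side of expressions but not as the target of replace. A sketch that fills the 5 months after each original event, taking a snapshot of the events first so newly filled months do not chain onward (assumes months are consecutive within id):
Code:
gen byte event = followup == 1
bysort id (month): gen last_event = cond(event, month, .)
bysort id (month): replace last_event = last_event[_n-1] if missing(last_event)
replace followup = 1 if !event & month - last_event <= 5 & !missing(last_event)
drop event last_event
In the example data, the month-21 event fills months 22-26, and the month-27 event then fills months 28-32.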
Dropping all companies with no observation at the start of the time period
Hi Statalist,
I have a database of many companies over 7 years: 01-2013 to 01-2019. However, in order to calculate certain variables, I first need to be sure the data for every company start in 01-2013, and the data for some companies start in 2015, for example. Could you please tell me the code to drop all companies whose data do not start in 01-2013?
Sidenote: I know deleting all companies that don't survive to the end would cause survivorship bias, but deleting all funds with no data at the beginning of the period will not do any harm, right?
Thank you in advance.
Best regards,
Tom Reinders
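A sketch, assuming a company identifier id and a monthly Stata date mdate (adjust the names to the actual dataset):
Code:
bysort id: egen first_month = min(mdate)
drop if first_month > tm(2013m1)
drop first_month
Whether dropping late entrants is harmless depends on why they enter late: funds incepted after 2013 are a different population from funds that exist but are unobserved, so it is worth being explicit about that in the write-up.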
Solve a system of equations
Hello everybody!
I would like to solve this system of equations in Stata (or maybe Mata? I have never used it):
99-x=(4999-y)*0.0198
99-x=(2256.293-z)*0.0438
x+y+z=1491.293
The number of unknowns could also be higher, but the concept is always the same: 99-x equals (a constant minus another unknown) times a coefficient, and all the unknowns summed up give a certain value.
I would really appreciate a help!
Thank you in advance
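The system is linear in (x, y, z), so after moving the unknowns to the left-hand side it can be written as Ax = b and solved in one line of Mata; larger systems of the same pattern just add rows. A sketch:
Code:
mata:
// rows: -x + .0198y = .0198*4999 - 99
//       -x + .0438z = .0438*2256.293 - 99
//        x + y + z  = 1491.293
A = (-1, 0.0198, 0 \ -1, 0, 0.0438 \ 1, 1, 1)
b = (0.0198*4999 - 99 \ 0.0438*2256.293 - 99 \ 1491.293)
lusolve(A, b)
end
lusolve() returns the column vector (x, y, z); for genuinely nonlinear variants, Mata's solvenl*() suite would be the place to look.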
I need help on merging two datasets.
Hello,
I am having trouble merging two datasets for my thesis. To reduce clutter, I have only included 4 variables. 'gvkey' and 'fyear' are the identifiers for these datasets, and 'debt' and 'PrincipalAmtDbtOutstanding' are used to check whether the sets merged correctly (they are roughly the same).
I would like the datasets to merge on gvkey and fyear. If a certain gvkey is missing observations for an fyear, I would like Stata to create a missing value for either 'debt' or 'PrincipalAmtDbtOutstanding'.
As can be seen from the datasets below, there are more gvkey-fyear observations in the first dataset than in the second.
I have tried a 1:1 merge, a 1:m merge, and an m:1 merge, but they all give the same error: "variables gvkey fyear do not uniquely identify observations in the master data" r(459).
Thanks in advance!
Kind regards,
Maks van Noort
gvkey fyear debt
001166 2014 0
001166 2015 0
001166 2016 0
001166 2017 0
001166 2018 0
008546 2014 4104
008546 2015 5760
008546 2016 5606
008546 2017 3697
008546 2018 3927
010846 2014 12372
010846 2015 14519
010846 2016 16410
010846 2017 24009
010846 2018 24483
013145 2014 6617
013145 2015 8630
013145 2016 8515
013145 2017 7331
013145 2018 7509
013556 2014 1576
013556 2015 1536.3
013556 2016 1545.1
013556 2017 1660.8
013556 2018 2729.7
013683 2014 56150
013683 2015 56735
013683 2016 56842
013683 2017 52594
013683 2018 52304
013932 2014 65.149
013932 2015 62.781
013932 2016 1875.368
013932 2017 1896.965
013932 2018 1919.5
and
gvkey fyear PrincipalAmtDbtOutstanding
001166 2016 0
001166 2017 0
008546 2014 4135
008546 2015 5796
008546 2016 5637
008546 2017
008546 2018
010846 2014
010846 2014
010846 2016
010846 2017
010846 2018
013145 2014 6617
013145 2015
013145 2016 8515
013145 2017 7331
013556 2014 1559.200000000000045
013556 2015
013556 2016
013556 2017 1660.799999999999955
013683 2014 56769
013683 2015 56734
013683 2016
013683 2017 52707
013932 2014
013932 2014
013932 2015
013932 2016 1939.70900000000006
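The r(459) message means some gvkey-fyear pair repeats; in the second listing, for instance, 010846 2014 and 013932 2014 each appear twice. A sketch to locate and resolve the repeats before a 1:1 merge (the using filename is a placeholder):
Code:
duplicates report gvkey fyear
duplicates tag gvkey fyear, gen(dup)
list gvkey fyear if dup, sepby(gvkey)
* only if the repeated rows are exact copies or the extras are empty:
duplicates drop gvkey fyear, force
merge 1:1 gvkey fyear using principal_data
Unmatched master observations come out of merge with missing PrincipalAmtDbtOutstanding automatically, which is the behavior asked for.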
Choosing between OLS , RE, FE
Hello
I am analyzing the effect of LPI on export and import trade (using panel data). When I ran OLS, the result was as below:
Code:
. reg lex lgdp dis ll0 landlocked, cluster(country1)
note: landlocked omitted because of collinearity

Linear regression                               Number of obs     =        156
                                                F(3, 19)          =     125.91
                                                Prob > F          =     0.0000
                                                R-squared         =     0.8903
                                                Root MSE          =     .57966

                               (Std. Err. adjusted for 20 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
         lex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   .8434597    .074269    11.36   0.000      .688013   .9989065
         dis |  -.7162724   .0859011    -8.34   0.000    -.8960654  -.5364794
         ll0 |   1.771018    .588909     3.01   0.007     .5384175   3.003619
  landlocked |          0  (omitted)
       _cons |   3.902149   1.193095     3.27   0.004     1.404974   6.399325
------------------------------------------------------------------------------
The result shows that LPI has an effect on exports, statistically significant at the 1% level. But when I ran RE and FE, the results were very different from OLS: in both RE and FE, LPI showed no effect on exports. Moreover, the F-test in FE and the p-value of the Hausman test indicated that FE was the best choice.
Code:
. xtreg lex lgdp dis ll0 landlocked, fe
note: dis omitted because of collinearity
note: landlocked omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        156
Group variable: country1                        Number of groups  =         20

R-sq:                                           Obs per group:
     within  = 0.2828                                         min =          4
     between = 0.7417                                         avg =        7.8
     overall = 0.6698                                         max =          8

                                                F(2,134)          =      26.41
corr(u_i, Xb)  = -0.9032                        Prob > F          =     0.0000

------------------------------------------------------------------------------
         lex |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   2.068337   .2848885     7.26   0.000     1.504877    2.631797
         dis |          0  (omitted)
         ll0 |   -.861858   1.238365    -0.70   0.488    -3.311128    1.587412
  landlocked |          0  (omitted)
       _cons |  -9.631277   2.850162    -3.38   0.001     -15.2684   -3.994153
-------------+----------------------------------------------------------------
     sigma_u |  2.2327644
     sigma_e |   .4243563
         rho |  .96513702   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(19, 134) = 27.17                    Prob > F = 0.0000
Code:
. xtreg lex lgdp dis ll0 landlocked, re
note: landlocked omitted because of collinearity

Random-effects GLS regression                   Number of obs     =        156
Group variable: country1                        Number of groups  =         20

R-sq:                                           Obs per group:
     within  = 0.2616                                         min =          4
     between = 0.9389                                         avg =        7.8
     overall = 0.8841                                         max =          8

                                                Wald chi2(3)      =     278.22
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
         lex |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lgdp |   .9661854   .0729139    13.25   0.000     .8232767   1.109094
         dis |  -.7608128   .1180357    -6.45   0.000    -.9921585   -.529467
         ll0 |   .7772783   .7426183     1.05   0.295    -.6782268   2.232783
  landlocked |          0  (omitted)
       _cons |   4.390683   1.316084     3.34   0.001     1.811206    6.97016
-------------+----------------------------------------------------------------
     sigma_u |  .43224225
     sigma_e |   .4243563
         rho |  .50920534   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Code:
. hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
        lgdp |    2.068337     .9661854        1.102151        .2753998
         ll0 |    -.861858     .7772783       -1.639136        .9909921
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       19.30
                Prob>chi2 =      0.0001
What method should I use? Can you give me some advice?
Thanks so much
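Note that dis and landlocked are time-invariant, so FE cannot estimate them and drops them; FE and OLS/RE answering different questions is part of why the coefficients diverge. One diagnostic sketch is the Mundlak device: add panel means of the time-varying regressors to RE and test them jointly (a significant test points the same way as Hausman while still allowing cluster-robust inference):
Code:
bysort country1: egen mlgdp = mean(lgdp)
bysort country1: egen mll0 = mean(ll0)
xtreg lex lgdp ll0 dis landlocked mlgdp mll0, re vce(cluster country1)
test mlgdp mll0
For a gravity-style model with zeros in the trade flows, PPML on levels is also worth considering, as discussed in the earlier thread.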
summarizing in Stata
Hi, I'm a rookie Stata user and I am stuck at this point. I have an issue using the summarize command. I have a data set of 1,156 observations, and I have encoded my data from string variables to numeric (long) variables. When opening the data editor, I therefore now have the first variables in a column with string values (colored yellow) and another, new column of generated numeric values (colored blue). From earlier experience, the data should be colored white (?).
The numeric variable, now called "nbitprice", was created with the following command, because the values were recognized as strings: -encode bitprice, gen(nbitprice)-
The problem is that when I run -summarize- on nbitprice, I do not get the mean of the values in the observations, which range from 3,000 to 19,000. Instead I get something like the mean of the observation numbers: 577.1427 with 1,156 observations. What I want is the mean of the actual values over time. I hope I am explaining myself well enough.
When I list the observations there are values for each observation.
From reading some earlier posts, I expect you would like some info:
. describe nbitprice
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------
nbitprice       long    %9.0g      nbitprice  Bitprice
. count
1,156
. summarize nbitprice, detail
Bitprice
-------------------------------------------------------------
Percentiles Smallest
1% 12 1
5% 58 2
10% 116 3 Obs 1,156
25% 288.5 4 Sum of Wgt. 1,156
50% 577.5 Mean 577.1427
Largest Std. Dev. 332.9293
75% 865.5 1150
90% 1038 1151 Variance 110841.9
95% 1096 1152 Skewness -.0003597
99% 1142 1153 Kurtosis 1.79956
Can someone explain what I need to do to get the summarized results I need? I would like to get the mean of the actual value of the 1156 different observations, the standard deviation, min and max value.
Thank you for your help in advance.
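What summarize reports here are the encode category codes (1, 2, 3, ...), not prices: encode maps each distinct string to an integer, which is why the mean sits near the middle of 1-1,156. For a string variable that actually holds numbers, destring is the right tool. A sketch (ignore() strips a thousands separator, if one is present):
Code:
drop nbitprice
destring bitprice, gen(nbitprice) ignore(",")
summarize nbitprice, detail
If the prices use a decimal comma rather than a decimal point, convert that first (e.g. with subinstr()) before destring will read the values correctly.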
Comparing categorical variables over time in a randomised cluster trial
Good morning everyone, hope you're all well - and hope that you can help with some confusion.
I have a dataset where I am looking at a categorical variable describing monthly household income, and the dataset has two sources of clustering - over time, and from the randomisation procedure (cluster randomised trial, at health clinic level). The same people have responded at baseline and at follow-up. The dataset is in long format. I want to look at whether the stated household income has changed between baseline and follow-up. I'm working my way through the 'multilevel and longitudinal modeling using stata manual', and am having difficulty finding the relevant section and code that would take into account the clustering over time and account for clustering at the clinic level as well. Can anyone help with suggesting code for examining whether there has been a change in how respondents have answered the household income category between baseline and endline?
My variables are as follows:
hhincomecat:
0 "0-2,000 rand"
1 "2,000-5,000 rand"
2 "5,000-50,000 rand"
time:
0 Baseline
1 Post-lockdown
Health clinic:
categories 1-12 with name of clinic
Happy to give further information. Thanks in advance.
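One route that handles both clustering sources is a mixed-effects ordered logit: a fixed effect of time, a random intercept for clinic (the randomization clusters), and a random intercept for person nested within clinic (the repeated measures). A sketch, with the person identifier name assumed; the trial arm and its interaction with time would normally be added too:
Code:
meologit hhincomecat i.time || clinic: || personid:
The ordinal-response chapter of Rabe-Hesketh and Skrondal's manual covers this model family, and the three-level examples show the nested || syntax; gllamm is the older alternative if meologit is unavailable in your Stata version.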
Drop duplicate quarterly dates
Hi,
I am working with a dataset that consists of monthly observations of two variables, m1 and m3. I have now converted these monthly dates to quarterly dates using
Code:
gen tq=qofd(dofm(mdate))
I then used
Code:
bys tq : egen m_m3 = mean(m3)
and
Code:
bys tq : egen m_m1 = mean(m1)
to generate quarterly means of m1 and m3.
Now I am left with this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(m1 m3) float(mdate tq m_m3 m_m1)
675257 1010626 469 156 1017618.7 675961.3
673473 1013138 470 156 1017618.7 675961.3
671683 1027389 471 157 1052953.6 685409.3
686539 1061767 472 157 1052953.6 685409.3
698006 1069705 473 157 1052953.6 685409.3
689486 1063959 474 158 1081558.4 700371
698217 1088696 475 158 1081558.4 700371
713410 1092020 476 158 1081558.4 700371
733974 1117443 477 159 1113815 741235
737779 1119119 478 159 1113815 741235
751952 1104883 479 159 1113815 741235
end
format %tm mdate
format %tq tq
And the only thing left to do is to drop the duplicate quarterly observations, and of course the corresponding values in those rows. That is, I want to drop observations 2, 3, 5, 6, 8, 9, ..., 305, 306.
I have been experimenting with
Code:
list tq if mod(_n,2)
but I only manage to list every 2nd, 3rd, etc. observation.
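A sketch of two cleaner routes than dropping rows by position: keep one row per quarter, or rebuild the file as a genuinely quarterly dataset with collapse:
Code:
bysort tq (mdate): keep if _n == 1
* -- or, starting again from the monthly data --
collapse (mean) m1 m3, by(tq)
The collapse route produces one observation per quarter directly and avoids carrying the now-ambiguous monthly m1 and m3 alongside the quarterly means.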
putexcel command - error
Hi all 
I have a question about the putexcel command.
When I run the code, sometimes I get the following error message:
file C:\...\reports.xls could not be saved
r(603)
Now, my code is something like the following:
foreach ctr in `countries' {
foreach yrs in `years' {
putexcel set "${path_reports}\report_`ctr'_`yrs'.xls", sheet("reports_`ctr'_`yrs'") replace
putexcel A1=("title A1")
[and so on and so forth]
}
}
The "stage" thing is that sometimes the codes works without problems, and sometimes it stops with the above-mentioned error message.
What am I doing in the wrong way?
Thank you all in advance!!
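Error r(603) on save usually means the target .xls file is locked, typically because it (or a previous run's copy) is open in Excel, or because a sync tool such as OneDrive is holding it. A defensive sketch that fails with a readable message instead of dying mid-loop (the structure around your loop is assumed):
Code:
capture noisily putexcel A1 = ("title A1")
if _rc == 603 {
    display as error "report_`ctr'_`yrs'.xls is locked; close it and rerun"
    exit 603
}
Writing to .xlsx rather than .xls, and issuing putexcel clear after finishing each file, also tends to make repeated runs more reliable.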
Cumulative event duration with repeated events as a function of follow up time
Dear Statalist,
I have a dataset with a starting date (different for each id) and different events which can occur repeatedly. What I am interested in is the cumulative duration of each event as a function of follow-up time. In the end, this would give a graph with follow-up time on the x-axis and the cumulative duration of each type of event (in this case, being hospitalized) across the entire population on the y-axis.
Example based on the code below:
For event1, ID 8 has the first occurrence of the event 16 days after follow-up start, with only one day of duration (start and end on the same day). So up to day 15 the cumulative event duration for the entire population would be zero, and after this, 1. It would remain 1 until, 27 days in, ID 5 experiences an event with a duration of 2 days. So at t = 27 the cumulative event duration would be 2, at t = 28 it would be 3, and so on.
In this case I present 2 types of events (event1_x and event2_x); there are more. I would like to calculate and visualize cumulative event durations at specific time points (365 days, 730 days, etc.) and present this in a graph (a) separately for each event and (b) cumulated over multiple events.
Thank you
Kevin Damman
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double id long(date_fu_start date_start_event1_1 date_end_event1_1 date_start_event1_2 date_end_event1_2 date_start_event2_1 date_end_event2_1 date_start_event2_2 date_end_event2_2)
 1 21024 21722 21728 21954 21957 22057 22064     .     .
 2 19323     .     .     .     .     .     .     .     .
 3 19340     . 19340 20927 20927     .     .     .     .
 4 19558     .     .     .     . 19649 19668 19866 19870
 5 19852 19879 19880 20231 20235     .     .     .     .
 6 19890     .     .     .     .     .     .     .     .
 7 20303 20509 20509     .     . 20328 20359 20425 20425
 8 20493 20509 20509     .     .     .     .     .     .
 9 20521     .     .     .     . 21051 21115     .     .
10 21767     .     .     .     .     .     .     .     .
end
format %tdD_m_Y date_fu_start
format %tdD_m_Y date_start_event1_1
format %tdD_m_Y date_end_event1_1
format %tdD_m_Y date_start_event1_2
format %tdD_m_Y date_end_event1_2
format %tdD_m_Y date_start_event2_1
format %tdD_m_Y date_end_event2_1
format %tdD_m_Y date_start_event2_2
format %tdD_m_Y date_end_event2_2
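A sketch of one route for event1 (event2 works the same way): reshape the episodes long, expand each episode into one row per hospitalized day, then count hospital-days by follow-up day and take a running sum. Durations follow the convention in the post (same-day start and end counts as 1 day); differential follow-up lengths across ids are ignored here:
Code:
preserve
reshape long date_start_event1_ date_end_event1_, i(id) j(episode)
keep if !missing(date_start_event1_, date_end_event1_)
gen dur = date_end_event1_ - date_start_event1_ + 1
expand dur
bysort id episode: gen fu_day = date_start_event1_ + _n - 1 - date_fu_start
contract fu_day, freq(days)
sort fu_day
gen cum_days = sum(days)
line cum_days fu_day, xtitle("Days since follow-up start") ///
    ytitle("Cumulative days hospitalized")
restore
Reading cum_days off at fu_day 365, 730, etc. gives the requested time-point values; stacking the per-event series on one graph (or summing them) gives the combined version.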
Monday, September 28, 2020
Time specification
Dear colleagues,
I am working with 15 years of repeated cross-sectional data. I was wondering whether it is an appropriate strategy to include all interaction terms with higher-order time:
reg Y X1##c.T X1##c.Tsq X1##c.Tcub ... Xk##c.T Xk##c.Tsq Xk##c.Tcub
Some previous studies simply used a linear specification, while others also include cubic terms. Suppose that we have 10 IVs and I am particularly interested in the influence of X1 on Y over time. If we control for time-varying effects of all other covariates, including cubic terms, isn't that over-controlling? What is the recommended strategy for time specification? If you know a good reference, please share it with me. I appreciate it.
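A compact way to write such a specification, letting Stata build the polynomial-by-covariate interactions itself and making joint tests easy (a sketch; Y, X1, and T as in the post, with X1 assumed continuous):
Code:
reg Y c.X1##(c.T c.T#c.T c.T#c.T#c.T)
testparm c.X1#c.T c.X1#c.T#c.T c.X1#c.T#c.T#c.T
A joint test of the higher-order interaction terms for each covariate is one empirical way to choose between the linear and cubic versions rather than imposing one a priori.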
Fillin/expand with panel and different dates
Hi All, please, could someone help me?
I have the data below:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id x1 x2) str10 date
1 1 10 "24/09/2013"
1 1 12 "25/09/2013"
2 1 13 "24/09/2013"
2 1 15 "25/09/2013"
3 2 12 "05/10/2014"
3 2 17 "06/10/2014"
4 3 10 "05/10/2014"
4 3  9 "06/10/2014"
5 3  8 "05/10/2015"
5 3 12 "06/10/2015"
end
I need to create missing values for each id corresponding to 3 days before its dates and 3 days after its dates. How can I do that, please? So each id would have 2 obs with x1 non-missing and 6 obs with x1 missing.
Many thanks. :-)
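A sketch with expand: pad 3 days before each id's first date and 3 days after its last date, so each id ends with its 2 original rows plus 6 padded ones (assumes at least 2 rows per id; x2 is blanked on padded rows too):
Code:
gen ddate = daily(date, "DMY")
format ddate %td
bysort id (ddate): gen byte first = _n == 1
bysort id (ddate): gen byte last  = _n == _N
expand 4 if first | last
bysort id ddate: gen offset = cond(first, -(_n - 1), _n - 1)
replace ddate = ddate + offset
replace x1 = . if offset != 0
replace x2 = . if offset != 0
sort id ddate
tsset with tsfill is the usual alternative when the padding should fill gaps between existing dates rather than extend beyond them.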
Inclusion of both Age and Time Indicators in Panel Data Analysis
Dear Colleagues,
I am analyzing panel data that collects information on children every two years since 2010, so the panel has a total of five waves (2010, 12, 14, 16, 18). I want to know the association of family structure (especially parental absence) with children's mental health (whether a child aged >= 10 is depressed or not). I am using the xtlogit command with either the re or fe option to run random- and fixed-effects models.
Besides the family structure variables, I have included the child's age and its quadratic term as independent variables, and both are significant.
However, when I included the panel indicator dummies (cfps_wave: panel year indicators 2012, 2014, 2016, 2018), the coefficients and significance levels for age and the family structure variables changed considerably.
My question is: should I include the panel year indicators in the model or not? I know that including a time trend is important: as children get older, their mental health will change. But since I have already included age, do I also need to include survey year indicators? The effect of the survey year indicators may be spurious. I would be glad if you could give me some advice or references on this issue.
Code:
. xtlogit depress2cat ib1.race_han_x c.age_self_x##c.age_self_x ib3.tz_4cat i.region3cat, fe nolog or
PHP Code:
-------------------------------------------------------------------------------------------
depress2cat | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
race_han_x |
minority ethnicity | 1 (omitted)
age_self_x | .6243016 .0433709 -6.78 0.000 .5448294 .7153661
|
c.age_self_x#c.age_self_x | 1.011407 .0022313 5.14 0.000 1.007043 1.015789
|
tz_4cat |
no parent at home | 1.187886 .1047134 1.95 0.051 .999403 1.411917
only mama at home | 1.022447 .1331064 0.17 0.865 .7921873 1.319635
only baba at home | .890598 .1651428 -0.62 0.532 .6192189 1.280912
|
region3cat |
Central | .5709849 .2092956 -1.53 0.126 .2783652 1.171208
West Region | 1.141726 .3581595 0.42 0.673 .6173623 2.111465
-------------------------------------------------------------------------------------------
PHP Code:
-------------------------------------------------------------------------------------------
depress2cat | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
race_han_x |
minority ethnicity | 1 (omitted)
age_self_x | .7124481 .0749088 -3.22 0.001 .5797696 .8754895
|
c.age_self_x#c.age_self_x | 1.010447 .0022359 4.70 0.000 1.006074 1.014839
|
tz_4cat |
no parent at home | 1.06177 .142238 0.45 0.655 .816584 1.380575
only mama at home | 1.028602 .1349585 0.21 0.830 .7953619 1.33024
only baba at home | .8869098 .1651149 -0.64 0.519 .6157611 1.277458
|
cfps_wave |
2012 | .552281 .0923899 -3.55 0.000 .3978913 .7665769
2014 | .4985043 .1642456 -2.11 0.035 .2613471 .9508676
2016 | .4897406 .244647 -1.43 0.153 .1839727 1.303704
2018 | .4146053 .2762311 -1.32 0.186 .1123365 1.530202
|
region3cat |
Central | .5780685 .2126524 -1.49 0.136 .2810931 1.188799
West Region | 1.178924 .3724338 0.52 0.602 .6347208 2.189721
------------------------------------------------------------------------------------------
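One empirical check (a sketch, reusing the command shown above): fit the model with the wave dummies and test them jointly. Within each child, age and wave move almost one-for-one, so they are close to collinear once the child fixed effect is in, which is why the age coefficient shifts so much when the wave dummies enter:
Code:
xtlogit depress2cat ib1.race_han_x c.age_self_x##c.age_self_x ib3.tz_4cat i.cfps_wave i.region3cat, fe nolog or
testparm i.cfps_wave
If the waves are jointly significant, keeping them (and interpreting age net of period effects) is the safer default; in fixed-effects panels, age, period, and cohort effects cannot all be separately identified, so some restriction is unavoidable and should be stated explicitly.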