Is there a difference between clogit and xtlogit, fe? It appears to me they both do conditional logistic regression with fixed effects.
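If it helps, a quick check one might run on a bundled panel dataset (a sketch, not from the original post): the two commands should return identical coefficient estimates, since xtlogit, fe calls the conditional (fixed-effects) logit estimator.
Code:
webuse union, clear
xtset idcode year
xtlogit union age grade south, fe
clogit union age grade south, group(idcode)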
Friday, May 31, 2019
Time lags for multi-level fixed-effects panel data
Hi all,
I have some doubts about time-lagging the independent variables in a multi-way fixed-effects panel model (reghdfe). I have a panel model with a health outcome (ho) as my dependent variable and socioeconomic and health-system indicators (hi, hi_1, hi_2 and hi_3) as independent variables. One of my IVs is the unemployment rate (ur), which presumably affects my dependent variable with a lag, but I also suppose that the current unemployment rate affects the health outcome, perhaps in a cumulative or interactive way.
Let's say that my model is:
reghdfe ho ur gdp gini hi hi_1 hi_2 hi_3, absorb(state) vce(cluster state#year)
Suppose that I have ur (the current unemployment rate) and ur_1, ur_2 and ur_3 as unemployment rates lagged 1, 2 and 3 years in my data set, and I want to observe the effect of three consecutive years of unemployment (or occupation rate) on the health outcome (ho, my dependent variable).
Would it be correct to model:
reghdfe ho gdp gini hi hi_1 hi_2 hi_3 c.ur#c.ur_1#c.ur_2#c.ur_3, absorb(state) vce(cluster state#year)
Or, instead of using my own lagged variables, should I use Stata's "L." time-series operator for the lags of the unemployment variable? So:
reghdfe ho gdp gini hi hi_1 hi_2 hi_3 c.ur#L1.c.ur#L2.c.ur#L3.c.ur, absorb(state) vce(cluster state#year)
Any suggestions for this model?
Thanks in advance.
Alexandre Bugelli
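One route, sketched below with the variable names from the post (and not a recommendation on the modelling question itself): declare the panel and let the L. operator build the lags, entering them either separately or, as in the post, multiplied together.
Code:
xtset state year
* lags entered separately (a distributed-lag specification)
reghdfe ho ur L(1/3).ur gdp gini hi hi_1 hi_2 hi_3, absorb(state) vce(cluster state#year)
* or, multiplying the four terms as in the post
reghdfe ho c.ur#c.L1.ur#c.L2.ur#c.L3.ur gdp gini hi hi_1 hi_2 hi_3, absorb(state) vce(cluster state#year)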
-sreshape-
I just installed -sreshape- on Stata MP. Does anyone know if it has a maximum variable limit? I have 7,500 variables in wide format that I wanted to -sreshape- into long, but I keep getting an error message.
Lasso regress questions
Hi, I am a college student from Barcelona. It is hard to learn Stata by myself because the teacher does not explain how to use the commands or what they do, and we are in an introductory course. The homework for this weekend uses a dataset with wage and some covariates, and we should use the lasso and ridge approaches. He encouraged us to create as many variables as we can (I do not know why; dummies, etc.). But he told us that we should run (net install elasticregress, replace) and (ssc install lassopack, replace). I suppose that installs some new commands.
In the second question he says that we should use the commands rlasso and lassoregress. I do not know what the difference between the two commands is, and I could not find it on the Internet. I also saw an extra command called lasso2. What do they do? Thank you.
2) Use the lasso methods (rlasso, lassoregress and ridgeregress) to select the most relevant covariates for the analysis.
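For what it's worth, a sketch on Stata's auto data showing the commands the two packages provide (your wage dataset will have different variable names): lassoregress and ridgeregress come from elasticregress, while rlasso and lasso2 come from lassopack.
Code:
sysuse auto, clear
lassoregress price mpg weight length foreign    // elasticregress: lasso, penalty chosen by cross-validation
ridgeregress price mpg weight length foreign    // elasticregress: ridge regression
rlasso price mpg weight length foreign          // lassopack: lasso with a theory-driven ("rigorous") penalty
lasso2 price mpg weight length foreign          // lassopack: lasso over a whole path of penalty values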
Add Text to Graph Combine
I am combining three graphs using 'graph combine'. By default they appear in a 2 x 2 arrangement with the lower right slot empty. I'd like to add text to this empty area. What's the best way to do this?
I have tried the 'caption' and 'note' options, with and without the 'position' suboption, but that either distorts the shape or puts the text at the very bottom of the combined graph.
I don't know if this would work, but if I could save the text as a standalone .gph file I could add it that way; or perhaps there is an option I'm missing for 'graph combine' that lets you place text anywhere you like (using coordinates, not clock positions).
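One workaround that might do it, sketched below: build a fourth "graph" that contains nothing but text and hand it to graph combine for the empty cell (g1, g2 and g3 stand for whatever names your three graphs were saved under).
Code:
twoway scatteri 1 1, msymbol(none) text(1 1 "Whatever note you want here") ///
    xscale(off) yscale(off) plotregion(style(none)) name(gtext, replace)
graph combine g1 g2 g3 gtext, cols(2)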
Obtaining mean and SD from survey data
Hello,
I am trying to get the difference in length of stay (LOS) in the subpopulation with myocarditis, categorized by whether they have arrhythmia or not (Tarry = 1 or 0). I get the mean and standard error, but I would like the mean and standard deviation. How would I be able to get that? Thanks.
This is what I did and what I got.
. svy linearized, subpop(myocarditis) : mean LOS, over(Tarry)
0: Tarry = 0
1: Tarry = 1
--------------------------------------------------------------
| Linearized
Over | Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
LOS |
0 | 8.035982 .3003205 7.447308 8.624657
1 | 15.02443 1.001127 13.06207 16.98679
--------------------------------------------------------------
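If I remember the svy postestimation tools correctly, estat sd after svy: mean reports subpopulation standard deviations rather than standard errors, so something like the sketch below might be all that is needed.
Code:
svy linearized, subpop(myocarditis) : mean LOS, over(Tarry)
estat sd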
Showing country-specific treatment effects
Hello,
I'm currently working with a dataset with individual respondents. In my analysis, I show the average treatment effect for the treated group. I suspect, however, that treatment effects vary across countries. How can I show this?
I wish to report average treatment effects for several countries as shown in the picture (from another analysis):
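Without knowing which estimator was used, one generic route is sketched below: interact the treatment indicator with the country identifier and let margins compute a country-specific effect (y, treat, country and x1 are hypothetical names standing in for the actual variables).
Code:
regress y i.treat##i.country x1, vce(cluster country)
margins country, dydx(treat)       // treatment effect evaluated separately for each country
marginsplot, horizontal xline(0)   // one point estimate with confidence interval per country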
Standard Error Correction in a two step process
Dear all
I wonder if anyone has any references, and perhaps Stata applications, that can help me solve a problem like the following:
All three models below produce the same point estimates but different standard errors; the benchmark is column 1.
Code:
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
* This is the benchmark
reg lnwage educ exper tenure female single
est sto m1
* now, i can create the following variables:
gen double female2=female*_b[female]
gen double educ2=educ*_b[educ]
gen double femaleeduc=female2+educ2
gen double lnwage2=lnwage-female2-educ2
* And estimate this model
reg lnwage femaleeduc exper tenure single
est sto m2
* or this other
reg lnwage2 exper tenure single
est sto m3
est tab m1 m2 m3, se
-----------------------------------------------------
Variable | m1 m2 m3
-------------+---------------------------------------
educ | .08303358
| .00513458
exper | .00895939 .00895939 .00895939
| .00156103 .00155551 .00153964
tenure | .00613849 .00613849 .00613849
| .00187615 .00186656 .00185977
female | -.09382159
| .02490175
single | -.16080964 -.16080964 -.16080964
| .02702903 .02694462 .0269209
femaleeduc | 1
| .05800787
_cons | 2.3408107 2.3408107 2.3408107
| .07085517 .06106525 .02653445
-----------------------------------------------------
legend: b/se
Column 2 combines the effects of female and educ into a single generated regressor and adds it to the model, and column 3 simply subtracts the effects of female and educ from lnwage before estimating the model.
My question is, does anyone know how to correct the standard errors from model 3 or 2, and obtain the "correct" ones from model 1?
I know this could be done using bootstrap methods, but I'm trying to see if it can be done in a different way.
For more details on why: I'm revisiting Robinson's semiparametric estimator (see the reference below). I'm aware of the user-written command -semipar-; however, the application itself does not provide much detail on how the standard errors of the nonparametric part are estimated.
Looking through the code, it does it in a way similar to the results in column 3, but what we want are the results from column 1.
Thank you in advance.
Robinson, P. M. 1988. Root-n-consistent semiparametric regression. Econometrica 56: 931–954.
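Not the analytic correction being asked for, but since the post mentions the bootstrap route, here is a minimal sketch of it: wrap both steps in a program and bootstrap the whole two-step procedure on the oaxaca data loaded above.
Code:
capture program drop twostep
program define twostep, rclass
    regress lnwage educ exper tenure female single
    tempvar lnwage2
    generate double `lnwage2' = lnwage - female*_b[female] - educ*_b[educ]
    regress `lnwage2' exper tenure single
    return scalar b_exper  = _b[exper]
    return scalar b_tenure = _b[tenure]
    return scalar b_single = _b[single]
end
bootstrap exper=r(b_exper) tenure=r(b_tenure) single=r(b_single), reps(500) seed(123): twostep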
WLS regression using regwls and regress
Dear All,
I have a question regarding WLS regression using Stata commands regwls and regress and would kindly ask for your help.
For your information, the general idea is that I want to run a WLS regression with the monthly number of firms behind each observation as the weight (I created the weight column in Excel and imported it into Stata, since I do not know how to generate it in Stata). Additionally, the "avg" variable is the equal-weighted monthly return on each portfolio; this is my dependent variable.
Firstly, I used the command "regwls avg MktRF SMB HML [aw=1/Weight]" for the WLS regression with analytic weights (however, this command does not work in my Stata 13 version). Later, I tried the command "regress avg MktRF SMB HML [aw=1/Weight]" and it worked.
Could someone please let me know if the two commands are the same and whether my approach is correct?
Many thanks for your help!
Best regards,
Chi
Inverted Normal Graph
Hello all,
Admittedly a mundane question here... I'm simply trying to plot an inverted normal distribution for an upcoming presentation. I've successfully plotted a normal distribution, but now I simply need to flip it upside down.
Any suggestions would be greatly appreciated! I recently converted to Stata from SPSS and apologies for presumably a rather elementary question!
Code:
clear
set obs 100
gen x=rnormal(0,1)
twoway function y=normalden(x), range(-4 4) xtitle("{it: x}") ///
ytitle("Density") title("Standard Normal")
J.
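One simple way that should work: plot the negated density so the curve hangs downwards, and suppress the (now negative) y-axis labels. A sketch:
Code:
twoway function y = -normalden(x), range(-4 4) xtitle("{it:x}") ///
    ytitle("Density") title("Inverted Standard Normal") ylabel(none)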
How to do matrix exponential operation in Stata?
Code:
matrix A = (1,0,0,0,0\0.6,0,.4,0,0\0,.6,0,.4,0\0,0,.6,0,.4\0,0,0,0,1)
matrix list A
matrix B = A*A
Many thanks in advance!
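If (as the A*A line suggests) the goal is the n-th power of the transition matrix rather than the matrix exponential e^A, a simple loop over matrix multiplication is enough. A sketch for A^10, starting from the matrix A defined above:
Code:
matrix B = A
forvalues i = 2/10 {
    matrix B = B * A      // after the loop, B = A^10
}
matrix list B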
gsem covstruct
Hello.
I am running an LPA (latent profile analysis) using the gsem command. I have also run the analysis in R using the mclust package. The problem is that I don't get similar results. For example, my best model in R (based on BIC etc.) is a 3-class, so-called VVI model (varying volume, varying shape, and identity orientation). In Stata I am trying to impose the same constraints (that is, I want all parameters to vary freely across classes) and I am not sure I am getting it right. I have tried lcinvariant(none) and covstruct(e._LEn, diagonal) and I get similar, but not identical, results.
Is anyone familiar with this?
Thank you a lot.
Generate balance table
Hi, I'm trying to replicate this balance table (as in the picture) using one of the example datasets installed with Stata; in particular I was trying to use bplong.dta. However, I haven't been able to do so. I found that the command iebaltab can produce this kind of table, but I'm having problems understanding how it works. Do you have any idea how I can do this?
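A minimal sketch of iebaltab on bplong.dta, assuming the ietoolkit package is installed and taking sex as the grouping variable (substitute whichever group variable your own table compares):
Code:
ssc install ietoolkit, replace     // provides iebaltab
sysuse bplong, clear
iebaltab bp agegrp, grpvar(sex)    // means by group, differences, and tests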
extract variable labels for new variable names
Hi there
I try to automate my programming as much as possible and one challenge I've come up against recently is in trying to name new variables according to the value labels of existing variables.
For example:
My goal is to name the new variables:
type_measured
type_estimated
type_interpolated
Any advice greatly appreciated.
Code:
sysuse sandstone
tab type, gen(type_name)
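A sketch of one way to get there, assuming (as the post implies) that type in sandstone.dta carries the value labels measured, estimated and interpolated: loop over the levels of type, pull each value label with the :label extended macro function, and rename the tab-generated dummies accordingly.
Code:
sysuse sandstone, clear
tab type, gen(type_name)
levelsof type, local(levels)
local i = 1
foreach l of local levels {
    local lbl : label (type) `l'                       // value label for this level
    rename type_name`i' type_`=strtoname("`lbl'")'     // e.g. type_name1 -> type_measured
    local ++i
}
describe type_*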
Count number of cases if dates are within a certain range (a la statsby)
Greetings all,
I have single-line-per-observation survival data (4 million lines). Here is a simplified example:
| default | zip code | date_start | date_end | date_default |
| 1 | 12345 | 2000q2 | 2016q1 | 2005q3 |
| 0 | 54321 | 1993q4 | 2016q1 | |
| 1 | 13467 | 2003q1 | 2016q1 | 2010q1 |
One thing I'd like to do with my data is to understand the default rate per quarter, by department. I'd ultimately like to construct a second panel (or rather a first one, since this isn't per se a panel as it stands) in which the zip codes are the subjects followed through time, ending up with the rate of default per period. I have unemployment data that is already organized in this fashion, and naturally I want to combine it with a default rate (# defaults / # "alive" or at-risk loans) per zip code:
| zip code | date | unemployment | default rate |
| 11111 | 1990q1 | 4.2 | x |
| 11111 | 1990q2 | 4.1 | x |
| 11111 | 1990q3 | 4.6 | x |
One guess was to create a new variable that uniquely identifies zip code/quarter combinations and then run statsby over it. But that would imply roughly 12,000 groups (100 zip codes * 30 years * 4 quarters), and that just doesn't seem right/efficient.
It shouldn't be hard for me to find a way to count the defaults per quarter/department (although I can't do tab default department zipcode, as that is too many variables), but I must confess I have no idea where to start on counting (and organizing in a new panel, without Excel) the at-risk loans per quarter.
Thank you so much for even some rough intuitions about how to go about this in Stata.
Have a great day,
John
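A sketch of one way to build the zip-code-by-quarter panel without statsby, assuming date_start, date_end and date_default are quarterly Stata dates (%tq); loans.dta, loanid and zipcode are hypothetical names. The idea: expand each loan to one row per quarter it is at risk, then collapse.
Code:
use loans, clear
generate long loanid = _n                              // identifier needed for the expand step
generate nq = date_end - date_start + 1                // quarters the loan is observed
expand nq
bysort loanid: generate qdate = date_start + _n - 1    // one row per loan-quarter
format qdate %tq
* keep only quarters up to (and including) the default quarter
drop if !missing(date_default) & qdate > date_default
generate byte defaulted = !missing(date_default) & qdate == date_default
collapse (sum) defaults=defaulted (count) atrisk=loanid, by(zipcode qdate)
generate default_rate = defaults / atrisk
With 4 million loans the expanded file gets large, so it may pay to keep only the needed variables (and perhaps work in chunks) before the expand.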
Multinomial logit with sample selection
Dear everyone,
I am looking for something similar to the Heckman selection model/svysemlog, with a modification.
I have a selection variable with two values (0 and 1) in the first step, and a multinomial non-ordinal categorical variable (with six categories) in the second step.
I am interested only in positive (1) values in the first step (around 30% of the total sample).
What I did in the first place was (a) a logit analysis for the first step and (b) a multinomial logit for the second step. However, I was advised to use the Heckman selection model for multiple reasons.
However, if I am not mistaken, Heckman (and svysemlog) cannot be used if the outcome variable is a non-ordinal variable.
I have two questions:
a. Is there any Stata package that adresses my problem?
b. Do you have any advice how to proceed, in case there is no ready-made solution in Stata?
Thanks in advance!
Manually installing Blindschemes by Daniel Bischof
Dear Statalisters
I admit this is a bit of a non-problem, but I'd like to find a solution nonetheless. Never underappreciate a nice graph.
I'm trying to use Daniel Bischof's schemes for making graphs (found here: https://danbischof.com/2015/02/04/stata-figure-schemes/). My organisation doesn't allow installing via ssc, so I downloaded all the scheme and style files and added them to the folder where all my other ado files are stored. I saved the color files both into a separate folder called "style" (this is what ssc does, I think), as well as in the same folder with the scheme files. Now, when I'm setting the color scheme to plotplainblind, the graphs come out in that scheme, but in black and white. The command doesn't seem to find the colors. So, I think I need to define these colors first in some way, but I don't know how. Any suggestions?
Many thanks
Carolin
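A couple of checks that might narrow things down (a sketch, not a definitive fix): list the directories Stata actually searches and confirm it can see the scheme; the colour .style files generally need to sit in a sub-folder literally named style under one of those directories, not merely in any folder that happens to be called style.
Code:
sysdir                      // the BASE/PLUS/PERSONAL/... directories Stata searches
graph query, schemes        // plotplainblind should be listed if the .scheme file is found
* e.g. put scheme-plotplainblind.scheme in PERSONAL and the color-*.style files in PERSONAL\style\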
How to declare data with tournament structure as panel data?
Dear all,
I recently read some papers using panel data from sports, and I started to wonder how one would actually declare data from, e.g., tennis to be panel data.
Typically, in tennis there is a season which consists of several tournaments. In turn, each of these tournaments consists of several matches. Each match consists of a sequence of sets, and a set in turn consists of a sequence of games.
So, one observation is for player x from game g in set s of match m played in tournament t in season z. If there are separate variables indicating the season (e.g. 2015), the tournament (e.g. 1), the match (e.g. 1), the set (e.g. 1), and the game (e.g. 1), how would one declare the data to be panel while keeping the structure described above? I included the code for a sample data set below.
Obviously, the panelvar in the xtset-command would be player_id. But how would one set the timevar if one's goal was to run a panel data regression (e.g. using xtreg) at the game-level which includes time lags (e.g. matchlevelstat1 from the previous match as well as gamelevelstat1 and gamelevelstat2 from the previous game, which might actually be from the same tournament and same match but from the previous set of that match) as independent variables?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year player_id tournament match) byte(set game) float(gamelevelstat1 gamelvelstat2 setlevelstat1 matchlevelstat1 tournamentlevelstat1 yearlevelstat1)
2018 1 1 1 1 1 19 19 22 26 57 17
2018 1 1 1 1 2 64 19 100 39 3 47
2018 1 1 1 2 1 100 32 79 93 32 92
2018 1 1 1 2 2 67 70 15 63 82 88
2018 1 1 2 1 1 86 12 83 92 55 50
2018 1 1 2 1 2 67 97 95 93 100 48
2018 1 1 2 2 1 14 53 58 28 26 6
2018 1 2 1 1 1 8 78 6 35 22 41
2018 1 2 1 1 2 87 85 68 55 98 17
2018 1 2 1 2 1 32 56 87 69 40 94
2018 1 2 1 2 2 47 24 42 89 32 99
2018 1 2 2 1 1 16 98 38 85 21 11
2018 1 2 2 1 2 88 1 87 60 96 28
2018 1 2 2 2 1 14 72 50 19 55 14
2019 1 1 1 1 1 34 48 16 38 95 44
2019 1 1 1 1 2 73 6 25 26 93 96
2019 1 1 1 2 1 92 27 48 89 68 99
2019 1 1 1 2 2 62 66 66 27 80 22
2019 1 1 2 1 1 69 46 40 2 90 59
2019 1 1 2 1 2 27 74 55 13 14 73
2019 1 1 2 2 1 11 61 75 26 73 26
2019 1 2 1 1 1 12 43 16 28 58 15
2019 1 2 1 1 2 49 49 91 83 61 35
2019 1 2 1 2 1 71 1 62 90 50 54
2019 1 2 1 2 2 88 53 6 58 40 99
2019 1 2 2 1 1 84 13 33 96 3 30
2019 1 2 2 1 2 79 68 80 18 86 19
2019 1 2 2 2 1 52 5 77 17 36 48
2018 2 1 1 1 1 59 67 5 29 96 22
2018 2 1 1 1 2 89 34 22 69 100 40
2018 2 1 1 2 1 5 74 8 49 97 83
2018 2 1 1 2 2 58 91 44 66 58 62
2018 2 1 2 1 1 96 77 73 53 59 62
2018 2 1 2 1 2 90 38 32 80 2 42
2018 2 1 2 2 1 79 43 90 18 6 1
2018 2 2 1 1 1 49 85 38 25 95 33
2018 2 2 1 1 2 23 35 35 51 9 53
2018 2 2 1 2 1 9 92 49 98 91 44
2018 2 2 1 2 2 78 9 26 81 23 39
2018 2 2 2 1 1 85 13 98 55 8 77
2018 2 2 2 1 2 24 38 75 12 1 53
2018 2 2 2 2 1 65 91 31 49 96 70
2019 2 1 1 1 1 100 38 9 86 15 83
2019 2 1 1 1 2 78 3 94 9 32 26
2019 2 1 1 2 1 73 40 41 62 60 59
2019 2 1 1 2 2 2 30 26 62 78 49
2019 2 1 2 1 1 21 83 58 10 25 16
2019 2 1 2 1 2 63 92 78 4 29 23
2019 2 1 2 2 1 98 67 59 61 82 62
2019 2 2 1 1 1 75 48 72 25 14 64
2019 2 2 1 1 2 87 76 87 98 60 7
2019 2 2 1 2 1 42 40 38 12 61 29
2019 2 2 1 2 2 12 82 72 48 61 59
2019 2 2 2 1 1 35 42 50 24 14 17
2019 2 2 2 1 2 84 73 75 25 25 72
2019 2 2 2 2 1 50 85 79 8 56 52
end
label var year "season"
label var player_id "player "
label var tournament "tournament number"
label var match "match number"
label var set "set"
label var game "game"
label var gamelevelstat1 "game-level statistic 1"
label var gamelvelstat2 "game-level statistic 2"
label var setlevelstat1 "set-level statistic 1"
label var matchlevelstat1 "match-level statistic 1"
label var tournamentlevelstat1 "tournament-level statistic 1"
label var yearlevelstat1 "season-level statistic 1"
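One possibility, sketched below with the example variables: number the games consecutively within player and use that running counter as the time variable, so that L. refers to the previous game even when it lies in the previous set or match. Lags defined at other levels (e.g. the previous match's matchlevelstat1) would need to be built separately with by-group logic.
Code:
sort player_id year tournament match set game
by player_id: generate long gameseq = _n        // running game number within player
xtset player_id gameseq
generate lag_gamestat1 = L.gamelevelstat1       // value from the previous game played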
Drop ID if different observations for that same ID do not vary across another variable
Hello,
I am using Stata 14.2 on Windows. This is my first post so I hope I am doing this correctly.
The dataset I am using contains around 100,000 observations with information about buildings.
Each building has an ID number like 344100000000006, followed by an address, (... some more variables that are not important for the question) and the building's function (labelled with values 1-12).
One building can contain multiple living units, a store on the ground floor, etc. These units are all separate observations with the same building ID (so they have the same address and differ, if at all, only in function). Therefore one building ID can occur, for example, 16 times.
I want to know which buildings have more than one function, like building with ID 344100000000042, which is used for both function 3 and 12.
I am not interested in buildings with only one function so I want to drop them from the data set.
I believe I need to combine different observations with the same ID into one, and while this is an issue many forum users seem to struggle with, I am not experienced enough with Stata to apply the suggestions for other problems to my own case. Therefore I sincerely hope someone is willing to help me.
The data looks like this: (I excluded other variables that are not important to the question)
* Example generated by -dataex-. To install: ssc install dataex
clear
input double gebwbagidgetal long gebruiksdoel_n
344100000000006 12
344100000000006 12
344100000000008 12
344100000000008 12
344100000000011 12
344100000000011 12
344100000000011 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000016 12
344100000000016 12
344100000000029 12
344100000000029 12
344100000000029 12
344100000000029 12
344100000000029 12
344100000000039 12
344100000000039 12
344100000000039 12
344100000000039 12
344100000000039 12
344100000000041 12
344100000000041 12
344100000000042 3
344100000000042 12
344100000000053 12
344100000000053 12
344100000000061 3
344100000000061 12
344100000000061 12
344100000000061 12
344100000000061 12
344100000000061 12
344100000000064 12
344100000000064 12
344100000000074 12
344100000000074 12
344100000000074 3
344100000000074 12
344100000000074 12
344100000000074 12
344100000000074 12
344100000000074 12
344100000000074 12
344100000000079 12
344100000000079 12
344100000000079 12
344100000000079 12
344100000000079 12
344100000000082 12
344100000000082 3
344100000000084 12
344100000000084 3
344100000000084 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000090 12
344100000000090 12
344100000000090 12
344100000000091 3
344100000000091 12
344100000000098 3
344100000000098 12
344100000000102 3
344100000000102 12
344100000000106 12
344100000000106 12
344100000000109 3
344100000000109 12
344100000000114 3
344100000000114 3
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
end
label values gebruiksdoel_n gebruiksdoel_n
label def gebruiksdoel_n 3 "gemengd", modify
label def gebruiksdoel_n 12 "woonfunctie", modify
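A sketch using the posted variable names: within each building ID, compare the smallest and largest function codes; if they coincide the building has only one function and can be dropped.
Code:
bysort gebwbagidgetal (gebruiksdoel_n): gen byte multifunction = gebruiksdoel_n[1] != gebruiksdoel_n[_N]
keep if multifunction == 1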
Mixed or reg i.country i.year for repeated cross-section data
Dear all,
I am analysing time-series cross-sectional data from 4 waves and around 25 countries, and I am using Stata 14. The dataset is the International Social Survey Programme, years 1988, 1994, 2002 and 2012. My main variable of interest is female hours worked per week (originally WRKHRS; for the purpose of the analysis I generated work hours for females only, 0 otherwise) and how they are affected by the amount/presence of benefits in the country. First I had these benefits as a percentage of expenditure per GDP, but my supervisor told me to generate dummies (0 for no benefit, 1 for the benefit) for all the different types I had; I have them both ways now. The research has two parts: the first is a regression of female hours worked per week on the different types of benefits; the second part focuses on analysing attitudes (support for traditional gender roles of men), comparing between countries.
I want to do an individual-level analysis (within respondents) of the effect of education##benefit, marital status, attendance of religious services and presence of a child. As country-level variables I have the benefits, unemployment rates and labour-force participation for men and women, the total fertility rate, and types of expenditure (public total, in-kind % of GDP, in-cash % of GDP) as well as a real GDP forecast. I know it's too much; I won't be using all of them, just letting you know what I have.
I was planning to use the mixed command, starting with a basic mixed femworkhours || countryid: and building on that, adding more level-1 predictors and then level-2 predictors. However, I cannot declare the data a panel because of repeated time values within the data set, so I set it with xtset countryid (as I read somewhere on this forum that this is an option for repeated cross-section data). Since this is my thesis, I asked my supervisor whether I should use mixed or a simple reg with i.countryid i.wave, and he suggested reg with i.countryid i.year. Nevertheless, when I run the regression there does not seem to be a significant, if small, country effect, and it comes out that the first part of the analysis ignores country and year effects. Could the problem be that, if I run a basic regression with fixed country and year effects, I should use mean hours worked by country rather than the individual level? I have browsed this forum and the internet and unfortunately could not find the answers I was looking for.
Hence the question: what would you suggest doing with this data? The variable female work hours presented below looks as if many observations are missing, but that is not the case: I ran the mdesc command and, of the total sample, 33% are missing (the values range from 0-80 hours worked per week). I hope this question is clear enough; if not, please let me know where I can elaborate.
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(femworkhours married incgroup fulltime parttime attend1) byte educ float(dbgrant drealfam dincmaint ddaycare dpleave dchildall wave countryid)
0 0 1 0 0 1 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 0 3 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 1 0 1 0 0 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 0 1 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 2 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 1 1 0 0 1 0 1 0 0 0 0 2 1
. 0 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 0 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 1 0 0 1 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
0 0 3 0 0 0 1 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 4 1 0 1 2 0 1 0 0 0 0 2 1
. 1 5 1 0 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 0 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 0 3 0 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 3 1 0 0 1 0 1 0 0 0 0 2 1
0 1 5 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 3 0 1 0 0 0 0 2 1
. 0 3 0 1 0 2 0 1 0 0 0 0 2 1
. 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
0 1 1 0 0 0 1 0 1 0 0 0 0 2 1
. 1 1 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 3 0 1 0 0 0 0 2 1
. 1 5 1 0 1 3 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 1 1 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 0 1 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 3 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 1 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 0 1 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
end
label values incgroup incgroup
label def incgroup 1 "10%", modify
label def incgroup 2 "25%", modify
label def incgroup 3 "50%", modify
label def incgroup 4 "75%", modify
label def incgroup 5 "90%", modify
label values fulltime employed
label def employed 0 "not fulltime", modify
label def employed 1 "fulltime", modify
label values educ educ
label def educ 0 "no education", modify
label def educ 1 "primary/lower secondary", modify
label def educ 2 "upper/post secondary", modify
label def educ 3 "lower/upper tertiary", modify
label values wave wave
label def wave 2 "1994", modify
label values countryid countryid
label def countryid 1 "AU", modify
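Just to make the two candidate specifications concrete, a sketch using variable names from the example above (not a recommendation on which is right): a random-intercept model with mixed versus pooled OLS with country and wave dummies.
Code:
mixed femworkhours i.educ##i.dbgrant married attend1 || countryid:
regress femworkhours i.educ##i.dbgrant married attend1 i.countryid i.wave, vce(cluster countryid)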
destring numbers in scientific notation
Dear community,
I was inattentive when pasting data into a new Stata file, and now the following problem presents itself: I have a unique numeric identifier with very large numbers, such that Stata abbreviated it to scientific notation, e.g. 1.7876423e+11. In the new file this identifier appears as a string and contains commas (",") instead of dots. I have now tried to destring this varlist, unsuccessfully.
Is there someone who has encountered a similar problem before and could help me out?
Kind regards,
Marie
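A sketch of one way to convert it, assuming the identifier sits in a string variable called idstr (hypothetical name): turn the comma back into a decimal point and let real() parse the scientific notation. Note that the pasted value only carries about eight significant digits, so the original identifier may not be fully recoverable from it.
Code:
generate double idnum = real(subinstr(idstr, ",", ".", .))
format idnum %15.0f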
Interpretation of the interaction term when the relevant dummy variable is insignificant
Dear all,
In the model that I have run to analyse the effect of currency swaps on the gross capital flows of the countries signing them, I have included a dummy variable for the signing of such a currency swap agreement (signing = 1) as well as a dummy for whether the country is a developed or a developing economy (developing = 1). To check whether the effect of a currency swap differs between developing and developed countries, I have included an interaction term between these two dummy variables (signing and developing both = 1). However, the results render my interaction term significant at the 1% level with a positive coefficient, while my dummy variable for the signing of the currency swap is negative (as expected) but insignificant. How do I interpret these results? Because the original currency swap dummy is insignificant, it is impossible to conclude how large the positive effect of a currency swap is for a developing country, right? However, does it still allow me to say that a positive relationship exists, but that its size is unclear due to the insignificance of the constituent dummy variable? Many thanks in advance!
Kind regards,
Owen
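One way to look at this in Stata is to test the combined effect of signing for developing countries directly, rather than reading the two coefficients in isolation. A minimal sketch with hypothetical variable names (flows as the outcome; your own controls where x1 x2 stand):
Code:
* full factorial interaction of the two dummies plus controls
regress flows i.signing##i.developing x1 x2
* combined effect of signing for developing countries = main effect + interaction
lincom 1.signing + 1.signing#1.developing
The lincom result is the estimate (and test) of the total signing effect for developing countries, which is usually the quantity of interest when the interaction is significant but the main effect is not.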
Thursday, May 30, 2019
categorizing data
Hi all, I have a dataset that includes up to 30 string variables. Some of them are dummy variables and the others are categorical with a limited number of categories. I'm trying to categorize the data according to their common features. A potential approach is to use the "tabulate" command, but tabulating 30 variables makes little sense and is cumbersome even with a prefix command like "by:".
Dropping observations with x amount of missing values
Dear all,
I'm working with a messy data set with approximately 870 observations at the moment.
After using the command missings table, I realised that 91 observations have 99 missing values out of 108 variables. I used missings list, min(99) to see which observations
account for this.
Now, I want to drop these observations from the data set. I wonder if there is a command that would use the information produced by missings list, min(99) to drop these observations?
Can anyone help? I've been looking for a solution for quite some time, without success.
Thank you.
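A minimal sketch of two ways to do this. The first counts missing values per observation across all variables with egen's rowmiss(); the second relies on the dropobs subcommand of the community-contributed missings package you are already using (check its help file for the exact syntax):
Code:
* count missing values per observation across every variable, then drop
egen nmiss = rowmiss(*)
drop if nmiss >= 99
drop nmiss

* or, with the missings package already installed
* missings dropobs, min(99)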
Meta analysis of hazard ratio
Dear all,
My name is Hatem Ali
I am trying to do meta analysis of hazard ratio to assess effect of rise in IL6 on overall survival
I have the following data
| id | Sample size | p value | CI LOWER | CI HIGHER | notes |
| Pecoits-Filho (HR for overall mortality, IL6 was higher in CVD group) | 99 | 0.01 | | | chi square=11.3 |
| Liu et al | 50 | 0.001 | | | OR=6.9 |
| cho et al (trend over time) | 175 | 0.03 | 1.31 | 87.75 | OR=10.72 |
| Lambie et al | 575 | 0.008 | 1.22 | 3.78 | HR=2.15 |
| Lambie et al 2 | 384 | 0.009 | 1.28 | 5.58 | HR=2.68 |
| Wang et al (mortality and coronary calcification) | 152 | 0.003 | 1.53 | 8.26 | HR=3.56 |
As you see, I have the sample size, p value, and lower and upper confidence limits.
However, only 3 studies report an HR, 2 report an OR, and one reports a chi square.
Is there a way to calculate a hazard ratio from the studies reporting an odds ratio or a chi square?
In other words, can I convert an odds ratio to a hazard ratio? And can I convert a chi square to a hazard ratio?
If that is not possible, can I calculate a relative risk for each study from the data I have?
Can I calculate RR from HR, 95% CI, sample size and p value?
Can I calculate RR from OR, 95% CI, sample size and p value?
Can I calculate RR from chi square, sample size and p value?
Finally, after calculating the HRs, is the syntax to use: metan HR lower higher, counts random ?
How can I add the names of the studies to the forest plot using this syntax?
Looking forward to hearing back from you.
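On the forest-plot questions only (not the HR conversions): a minimal sketch of a call to the community-contributed metan command, assuming you end up with one hazard ratio and its confidence limits per study in variables hr, lo and hi, and the study names in a string variable study (all hypothetical names). Ratio measures are usually pooled on the log scale and displayed back on the ratio scale with eform:
Code:
* pool log hazard ratios with random effects and label studies on the forest plot
generate loghr = ln(hr)
generate loglo = ln(lo)
generate loghi = ln(hi)
metan loghr loglo loghi, random eform label(namevar=study)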
Creating a value that equals the average of other values for the same variable
(Sorry about the poor phrasing of the question)
My dataset contains variables such as country and prevalence of obesity for 34 countries. I want to create a new value of the country variable that holds the average obesity across all the countries, i.e. there would then be 35 categories under the country variable. Is there a command to do that in Stata 14?
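A minimal sketch of one way to do this, assuming country is a string variable and obesity holds one prevalence value per country (hypothetical names); it appends one extra observation holding the cross-country average:
Code:
* compute the mean across the 34 countries
summarize obesity, meanonly
local avg = r(mean)
* add one observation at the end and fill it in
set obs `=_N + 1'
replace country = "Average (all countries)" in l
replace obesity = `avg' in l
If country is an encoded numeric variable with value labels rather than a string, the new category would instead need a new numeric code and an added value label.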
Collinearity error when including continuous variable in dummy regression
I am trying to run the following two regressions to compare the coefficients on the 'iso_str' dummies. The only difference between the two is that the second one includes the variable 'shr'.
1.
Code:
reg lncost ib6.iso_str i.var_str, eform(exp_coeff) baselevels
2.
Code:
reg lncost ib6.iso_str shr i.var_str, eform(exp_coeff) baselevels
When I run regression (2) above, Stata omits the 'shr' variable because of collinearity.
Then, I tried an alternative formulation of the above two regressions to see if this way I could compare their coefficients. Again, the only difference is the inclusion of the variable 'shr' in the second regression.
3.
Code:
reg lncost ib6.iso_str ibn.var_str, noconstant eform(exp_coeff) baselevels
4.
Code:
reg lncost ib6.iso_str shr ibn.var_str, noconstant eform(exp_coeff) baselevels
Notice that the coefficients for the 'iso_str' dummies in (3) are identical to those in (4). However, I still can't compare (3) vs. (4): this time Stata doesn't omit the 'shr' variable in regression (4), but it omits one of the 'var_str' dummies instead (again because of collinearity), even though I used the 'ibn.' operator so that none would be dropped!
How can I compare the 'iso_str' coefficients outputted by these two regressions, with and without the variable 'shr'? Perhaps there is a way around the collinearity issue I am facing, e.g. rearranging my data differently?
Thank you. An excerpt of my data is below.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 iso str5 var double(cost shr) long(iso_str var_str) float lncost
"CIV" "x1105" 11458.3333333333 49.674 1 1 9.346473
"COD" "x1105" 44083.2888217523 56.12 2 1 10.693836
"MRT" "x1105" 540 47.176 3 1 6.291569
"NGA" "x1105" 16842.1052631579 50.481 4 1 9.731637
"TGO" "x1105" 5590.76923076923 58.838 5 1 8.628872
"TZA" "x1105" 48000 66.947 6 1 10.778956
"ZAF" "x1105" 904.655301204819 34.150000000000006 7 1 6.807554
"CIV" "x1106" 10441.1764705882 49.674 1 2 9.253512
"COD" "x1106" 39391.0340285401 56.12 2 2 10.581293
"MRT" "x1106" 520 47.176 3 2 6.253829
"NGA" "x1106" 11834.3195266272 50.481 4 2 9.378759
"TGO" "x1106" 4398.8603988604 58.838 5 2 8.389101
"TZA" "x1106" 45000 66.947 6 2 10.714417
"ZAF" "x1106" 608.84493902439 34.150000000000006 7 2 6.411563
"CIV" "x1107" 12032.0855614973 49.674 1 3 9.395332
"MRT" "x1107" 463.636363636364 47.176 3 3 6.139101
"NGA" "x1107" 17391.3043478261 50.481 4 3 9.763725
"TGO" "x1107" 5015.38461538462 58.838 5 3 8.520266
"TZA" "x1107" 43636.3636363636 66.947 6 3 10.683646
"ZAF" "x1107" 984.375 34.150000000000006 7 3 6.892007
end
label values iso_str iso_str
label def iso_str 1 "CIV", modify
label def iso_str 2 "COD", modify
label def iso_str 3 "MRT", modify
label def iso_str 4 "NGA", modify
label def iso_str 5 "TGO", modify
label def iso_str 6 "TZA", modify
label def iso_str 7 "ZAF", modify
label values var_str var_str
label def var_str 1 "x1105", modify
label def var_str 2 "x1106", modify
label def var_str 3 "x1107", modify
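A small diagnostic that may explain the omission, based on the excerpt above: shr appears to take a single value within each country, so it is a linear combination of the full set of iso_str dummies and one term must be dropped. A minimal check:
Code:
* verify that shr is constant within each country in the estimation sample
bysort iso_str (shr): assert shr[1] == shr[_N]
If that assertion holds, the country-level effect of shr cannot be separated from the country dummies; comparing the iso_str coefficients with and without shr then amounts to comparing two different parameterisations of the same country effects rather than adding independent information.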
getting a sample size
hi,
I'm trying to get the sample size of black women from my data set. I created a black women (bw) variable, counted the bw observations and then collapsed the data. Take a look at my code below. Is this the right approach to get the sample size for bw? Also, should I add more or fewer variables in my by()?
Array
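Since the attached code is not visible here, a minimal sketch of the usual way to get the sample size directly, assuming bw is a 0/1 indicator (hypothetical name); collapsing is not needed just to count observations:
Code:
* number of observations that are black women
count if bw == 1
* or a quick breakdown of the indicator, including missings
tabulate bw, missing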
Cluster Randomized Controlled Trial
I have a question about cluster randomized controlled trials. Is it recommended to run the svyset command when analysing a cluster randomized trial? Another question: what command can we use if we want to adjust for clustering?
Thanks!
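A minimal sketch of two common ways to adjust for clustering in an individually analysed cluster-randomised trial, with hypothetical variable names (y the outcome, treat the arm indicator, clusterid the randomisation cluster); svyset is generally aimed at complex survey designs rather than being required for a CRT:
Code:
* cluster-robust standard errors
regress y i.treat, vce(cluster clusterid)

* or a mixed model with a random intercept for each cluster
mixed y i.treat || clusterid: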
traj command with hierarchical data structure
I'm interested in using the user-written traj command (link below) to identify latent trajectories of change in patients' BMIs. The command has useful features like joint trajectory modeling and accounting for non-random attrition.
However, patients in my dataset are nested within physicians; I have unique physician identifiers for each physician.
Questions:
1. Is there a method or workaround that would allow traj to account for hierarchically nested data?
2. If not, to what degree would traj be robust to violation of the assumption that patients are independent of each other?
https://www.andrew.cmu.edu/user/bjones/
Dependent Double-Sorting 25 Portfolios
Dear all,
I am struggling to replicate the FF-25 portfolios with a variant: I should employ a dependent sort instead of an independent sort.
My dataset is the following one:
| permno | date | primexch | ret | year | month | datem | BM | id | MarketCap |
| 10001 | 30-May-86 | Q | -0.00980 | 1986 | 5 | 316 | . | 2 | . |
| 10001 | 30-Jun-86 | Q | -0.01307 | 1986 | 6 | 317 | . | 2 | 1.797265 |
| 10001 | 31-Jul-86 | Q | -0.01020 | 1986 | 7 | 318 | . | 2 | 1.797265 |
| 10001 | 29-Aug-86 | Q | 0.07216 | 1986 | 8 | 319 | . | 2 | 1.797265 |
| 10001 | 30-Sep-86 | Q | -0.00308 | 1986 | 9 | 320 | . | 2 | 1.797265 |
| 10001 | 31-Oct-86 | Q | 0.03922 | 1986 | 10 | 321 | . | 2 | 1.797265 |
| 10001 | 28-Nov-86 | Q | 0.05660 | 1986 | 11 | 322 | . | 2 | 1.797265 |
| 10001 | 31-Dec-86 | Q | 0.01500 | 1986 | 12 | 323 | . | 2 | 1.797265 |
| 10001 | 30-Jan-87 | Q | -0.03571 | 1987 | 1 | 324 | . | 2 | 1.797265 |
| 10001 | 27-Feb-87 | Q | -0.07407 | 1987 | 2 | 325 | . | 2 | 1.797265 |
| 10001 | 31-Mar-87 | Q | 0.03680 | 1987 | 3 | 326 | . | 2 | 1.797265 |
| 10001 | 30-Apr-87 | Q | -0.03922 | 1987 | 4 | 327 | . | 2 | 1.797265 |
| 10001 | 29-May-87 | Q | -0.07143 | 1987 | 5 | 328 | . | 2 | 1.797265 |
| 10001 | 30-Jun-87 | Q | 0.05143 | 1987 | 6 | 329 | 1.0144155 | 2 | 1.761665 |
| 10001 | 31-Jul-87 | Q | 0.02128 | 1987 | 7 | 330 | 1.0144155 | 2 | 1.761665 |
| 10001 | 31-Aug-87 | Q | 0.08333 | 1987 | 8 | 331 | 1.0144155 | 2 | 1.761665 |
| 10001 | 30-Sep-87 | Q | -0.02231 | 1987 | 9 | 332 | 1.0144155 | 2 | 1.761665 |
| 10001 | 30-Oct-87 | Q | 0.02000 | 1987 | 10 | 333 | 1.0144155 | 2 | 1.761665 |
| 10001 | 30-Nov-87 | Q | -0.02941 | 1987 | 11 | 334 | 1.0144155 | 2 | 1.761665 |
| 10001 | 31-Dec-87 | Q | -0.03354 | 1987 | 12 | 335 | 1.0144155 | 2 | 1.761665 |
| 10001 | 29-Jan-88 | Q | 0.06383 | 1988 | 1 | 336 | 1.0144155 | 2 | 1.761665 |
| 10001 | 29-Feb-88 | Q | 0.08000 | 1988 | 2 | 337 | 1.0144155 | 2 | 1.761665 |
| 10001 | 31-Mar-88 | Q | -0.07630 | 1988 | 3 | 338 | 1.0144155 | 2 | 1.761665 |
| 10001 | 29-Apr-88 | Q | 0.03061 | 1988 | 4 | 339 | 1.0144155 | 2 | 1.761665 |
| 10001 | 31-May-88 | Q | 0.01980 | 1988 | 5 | 340 | 1.0144155 | 2 | 1.761665 |
| 10001 | 30-Jun-88 | Q | -0.01204 | 1988 | 6 | 341 | 1.2076184 | 2 | 1.824549 |
| 10001 | 29-Jul-88 | Q | 0.03000 | 1988 | 7 | 342 | 1.2076184 | 2 | 1.824549 |
| 10001 | 31-Aug-88 | Q | 0.02913 | 1988 | 8 | 343 | 1.2076184 | 2 | 1.824549 |
| 10001 | 30-Sep-88 | Q | -0.021132076 | 1988 | 9 | 344 | 1.2076184 | 2 | 1.824549 |
| 10001 | 31-Oct-88 | Q | 0.039215688 | 1988 | 10 | 345 | 1.2076184 | 2 | 1.824549 |
| 10001 | 30-Nov-88 | Q | 0 | 1988 | 11 | 346 | 1.2076184 | 2 | 1.824549 |
where permno identifies the company, primexch identifies the stock exchange (Q = Nasdaq, N = NYSE, A = Amex), ret indicates the returns, BM (book-to-market value) is calculated at the end of year t and becomes publicly available in June of year t+1 until May of year t+2, and finally MarketCap indicates the size of each company and is calculated in June of year t and remains constant until May of year t+1.
I should sort stocks into 5 quintiles according to their BM, and secondly I should sort within each quintile according to the companies' MarketCap. The quintile breakpoints should be calculated using only NYSE stocks ("N").
Therefore I should obtain 25 portfolios, which are firstly sorted on BM and secondly on MarketCap.
Finally, I should calculate the value-weighted monthly returns on these 25 portfolios from July of year t to June of year t+1.
PS: I tried to write code to calculate the value-weighted monthly returns on 10 deciles sorted on MarketCap for another calculation I had to do. Maybe it could be helpful:
forvalues i = 1(1)10 {
    * numerator: sum of MarketCap * return within each month for decile `i'
    egen num_return_dec_`i' = total(MarketCap * ret * !missing(MarketCap, ret)) if deciles_MarketCap == `i', by(datem)
    * denominator: sum of MarketCap within each month for decile `i'
    egen den_return_dec_`i' = total(MarketCap * !missing(MarketCap, ret)) if deciles_MarketCap == `i', by(datem)
    * value-weighted monthly return of decile `i'
    gen vw_return_dec`i' = num_return_dec_`i' / den_return_dec_`i' if deciles_MarketCap == `i'
}
Any help would be really appreciated as it is one week that I have been trying to solve this problem.
Best regards,
Antonio
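A rough sketch of the dependent sort for a single formation date, assuming the data have been reduced to one observation per stock with the June-aligned sorting variables bm and mcap and the exchange code primexch (hypothetical formation-date dataset; in the full panel this would be wrapped in a loop over formation years). NYSE-only breakpoints are obtained with _pctile and then applied to all stocks:
Code:
* BM quintiles with NYSE breakpoints, applied to every stock
_pctile bm if primexch == "N", percentiles(20 40 60 80)
local b1 = r(r1)
local b2 = r(r2)
local b3 = r(r3)
local b4 = r(r4)
generate bm_q = 1 + (bm > `b1') + (bm > `b2') + (bm > `b3') + (bm > `b4') if !missing(bm)

* within each BM quintile, size quintiles with NYSE breakpoints
generate size_q = .
forvalues q = 1/5 {
    _pctile mcap if primexch == "N" & bm_q == `q', percentiles(20 40 60 80)
    local c1 = r(r1)
    local c2 = r(r2)
    local c3 = r(r3)
    local c4 = r(r4)
    replace size_q = 1 + (mcap > `c1') + (mcap > `c2') + (mcap > `c3') + (mcap > `c4') ///
        if bm_q == `q' & !missing(mcap)
}

* the 25 portfolios are the bm_q x size_q cells; their value-weighted monthly
* returns can then be computed with the same total()-ratio logic as in the decile code above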
Generate variables with forvals
Hello Statalist,
I have a dataset which contains the following variables: firm (every firm has an assigned number from 1-1000), their products (every product has an assigned number), the costs of sales, the revenue of the sales, and the year. Now I am trying to generate two very similar variables and a third one. One displays the average sales for each year and each firm, and one displays the average costs for each year and each firm. The code should be the same for both, simply exchanging sales with costs. I am trying to construct these variables using a loop, but I don't get the right result. My first try was the following:
forval i = 1/1000 {
    forval j = 2001/2012 {
        sum ventas if firma == `i' & year == `j'
        gen ventap_`i'_`j' = `r(mean)'
    }
}
There are 1000 firms. However, the year coverage is not the same for all firms: some firms have data for, say, 2004-2009 and others for different periods (a lot of different periods), but the minimum of the variable year is 2001 and the maximum is 2012.
So when I run this code I encounter two problems. First, it doesn't work because some firms don't have any observations for 2012 or other years (invalid syntax error). Second, it creates a variable for every single firm-year, displaying the average for that year, whereas I want just one variable holding the average for the corresponding firm and year in all cases.
The third variable that I have to create is one that displays the product with the highest sales. The code should be similar to the first one, using two forvals over the year and the firm, but instead of r(mean) it should probably use r(max). However, I run into the same problem that not all firms have data for all the years between 2001 and 2012, and it generates a lot of variables instead of just one showing the product id with the highest sales for the corresponding year.
I hope I explained it understandably and you can help me.
Thanks a lot
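A minimal sketch of how these three variables are usually built without looping over firms and years, assuming the cost variable is called costes and the product identifier producto (hypothetical names; substitute your own). by-group egen handles unbalanced years automatically:
Code:
* firm-year averages of sales and costs
bysort firma year: egen mean_ventas = mean(ventas)
bysort firma year: egen mean_costes = mean(costes)

* product with the highest sales in each firm-year
egen max_ventas = max(ventas), by(firma year)
generate top_producto = producto if ventas == max_ventas
egen best_producto = max(top_producto), by(firma year)
drop top_producto
If two products tie on sales within a firm-year, this keeps the one with the larger product number; a different tie-breaking rule would need an extra sort key.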
Interaction between variables changes the results fundamentally!
Dear All,
I would appreciate your help on the following please:
The correlations between y and x1 and between y and x2 are negative, but the correlation between y and the interaction of x1 and x2 is positive. That seems strange! Could somebody explain this to me, please?
Array
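Without the attached output it is hard to say more, but one common reason for this pattern is that the raw product x1*x2 largely reflects the levels of x1 and x2 rather than their joint behaviour. A minimal sketch that centres the variables before forming the product, which often changes the sign and size of that correlation (hypothetical variable names):
Code:
* centre x1 and x2, then correlate y with the centred product
summarize x1, meanonly
generate c_x1 = x1 - r(mean)
summarize x2, meanonly
generate c_x2 = x2 - r(mean)
generate c_x1x2 = c_x1 * c_x2
pwcorr y x1 x2 c_x1x2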
Weighted least squares (WLS) with wls0 and regwls
Dear Statalist,
I am conducting a long-run event study with the use of the Fama French 3 Factors model.
I am using WLS regression, and I want to use the monthly number of firms in the event portfolio as weights. Moreover, I want to use the equal-weighted monthly returns on each portfolio.
First of all, I imported the Excel file and converted Date2 from a string into a monthly date variable (Date3); I then declared the data set to be time series data.
I intend to use the Stata command wls0 (or regwls) with the option wvar() set to the number of firms in the event portfolio in a month, and type(wlstype) chosen from the available choices, e.g. abse (absolute value of the residual) or e2 (residual squared). The dependent variable is the company name, and the explanatory variables are MktRF, SMB, and HML.
Could anyone please help me with the Stata command for WLS regression? I am very grateful.
Thank you and Kind regards,
Chi
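If wls0/regwls turn out to be awkward, a minimal alternative sketch with built-in commands: weighting each calendar-month portfolio observation by the number of firms in the portfolio that month can be done with analytic weights in regress (hypothetical variable names: pret the monthly portfolio return, nfirms the number of firms, and MktRF SMB HML the factors):
Code:
* WLS via analytic weights proportional to the number of firms per month
regress pret MktRF SMB HML [aweight = nfirms]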
Which Flavor of ADF Estimation Does Stata Use?
Hey Everyone,
with WLS/ADF estimation there are different flavors out there that are used in statistics software. It is rather easy to control the specific estimator in R, but for Stata I have not found any information on the exact source/reference the ADF estimation is built on.
My key interest is whether it is the pure Browne formula or if any of the adjustments to make WLS more robust to small sample size have been implemented.
Thanks and best
Leon
fill in empty adjacent cells within a group
Dear all,
I would like to ask how to fill in empty adjacent rows within a group. Problem here is the confusing data structure.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 subjid str21 personid str23 invtype
"a" "1" "Cranial Ultrasound Scan"
""  "1" "Other Ultrasound Scan"
""  "1" "CT scan"
""  "1" "X-ray"
""  "1" "EEG"
""  "1" "MRI"
""  "1" "ECHO"
""  "1" "ECG"
""  "1" ""
""  "1" ""
""  "1" ""
"b" "2" "Cranial Ultrasound Scan"
""  "2" "Other Ultrasound Scan"
""  "2" "CT scan"
""  "2" "X-ray"
""  "2" "EEG"
""  "2" "MRI"
""  "2" "ECHO"
""  "2" "ECG"
""  "2" ""
""  "2" ""
end
I want to put subjid like this.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 subjid str21 personid str23 invtype
"a" "1" "Cranial Ultrasound Scan"
"a" "1" "Other Ultrasound Scan"
"a" "1" "CT scan"
"a" "1" "X-ray"
"a" "1" "EEG"
"a" "1" "MRI"
"a" "1" "ECHO"
"a" "1" "ECG"
""  "1" ""
""  "1" ""
""  "1" ""
"b" "2" "Cranial Ultrasound Scan"
"b" "2" "Other Ultrasound Scan"
"b" "2" "CT scan"
"b" "2" "X-ray"
"b" "2" "EEG"
"b" "2" "MRI"
"b" "2" "ECHO"
"b" "2" "ECG"
""  "2" ""
""  "2" ""
end
Using personid is not a good idea because personid is repeated again from row 4651 onward; it starts over with 1, 2, ...
If you let me know any solution for this, I would really appreciate it.
Kind regards,
Kim
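A minimal sketch that relies only on the current row order (so it does not matter that personid restarts later in the file): carry the last non-empty subjid forward, then blank it out again on rows that have no invtype, which reproduces the desired layout above:
Code:
* carry subjid forward down the file, then clear it where invtype is empty
replace subjid = subjid[_n-1] if subjid == "" & _n > 1
replace subjid = "" if invtype == ""
This works because replace processes observations in order, so each row sees the already-filled value of the row above it.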
Test for a unit root using panel data
Dear Community,
For my master thesis I want to test whether the error term of my model is stationary or not, in order to verify whether the regression is spurious.
I am using unbalanced panel data, so based on what I have read there are two possible tests: xtunitroot lps and xtunitroot fisher.
However, when I try the lps test, I get an error saying ''insufficient observations'', and when I run the fisher test it takes ages for my computer to compute the test statistic.
I believe that Stata computes the test for every panel (11,000 in my case).
Do you know a way to solve these issues?
Losing groups while running a regression
Hello,
I am trying to run a panel data regression. My group variable (cntry) has 41 countries, but when I run a regression the number of groups is reduced to 18. I cannot find anything about this, and Stata does not say anything about why these groups are left out. What are possible reasons that these groups are not taken into account when running the regression?
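A minimal diagnostic sketch to see which countries drop out (hypothetical model; substitute your own command and regressors). The usual culprits are missing values in one of the variables or, with fixed effects, countries with too few usable observations:
Code:
* refit the model, flag the estimation sample, and list countries that contribute nothing
xtreg y x1 x2, fe
generate byte insample = e(sample)
bysort cntry: egen ever_used = max(insample)
tab cntry if ever_used == 0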
Latent class analysis categorical variables
Dear all,
I have been using latent class analysis in Stata 15 and have been able to get results using the gsem command for binary variables with the following code:
Code:
gsem (isch afib ckd dbt hyp pvd <- _cons), logit lclass(C 3)
However, when I try to analyse categorical variables, I get an error message:
Code:
. gsem (bmigroup hb_class age_gp <- _cons), ologit lclass(C 2)
invalid path specification;
ordinal response bmi_fr1 may not have an intercept
Could anyone help me with this?
Thank you
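A guess based on the error message, not a confirmed fix: with ordinal (ologit) responses the cutpoints play the role of the intercepts, so the _cons term is what the error is complaining about. A minimal sketch that simply drops it, worth checking against the gsem lclass examples in the manual:
Code:
* ordinal indicators in a latent class model: no explicit intercept term
gsem (bmigroup hb_class age_gp <- ), ologit lclass(C 2)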
X11 Forwarding only showing half screen with Stata
I am using PuTTY to do SSH tunneling and X11 Forwarding with an Amazon EC2 Ubuntu instance. Multiple individuals in my system login to their own remote desktops, and then connect to the same EC2 instance.
I am currently trying to fix an issue where half of the screen is cut off for some users after the forwarding. I've tried:
1) Make sure everyone uses the same network, however this doesn't solve the problem, the full screen shows up for some users and only a cut off screen for other users
2) Logging into the remote desktop of the users who are experiencing the cut off screen problem (using my own laptop), however I'm not able to replicate the issue
3) Configure the Xming display settings in XLaunch, and have Xming display in one window instead of multiple windows. The jury is still out on whether or not this works; I haven't had the users try out the new configuration yet. Also, when I save the XLaunch configs, hit finish, and then later open up XLaunch again, the "multiple windows" / default settings are selected rather than "one window". So before I tell users to try out the "one window" setting, how can I make sure that my new configurations are actually saved?
4) Do the PuTTY connection and X11 forwarding set up through XLaunch, rather than through PuTTY. I think this would be rather complicated so haven't tried it yet... though willing to do so if it could solve the issue.
Thoughts? Image of the problem is shown below.
Array
xtivreg2 - identifying singleton observations
Dear All,
I have a panel dataset with 18,071 observations.
I am estimating the following model in stata:
xtivreg2 ret2_w mret2_w s_r10_lmcap s_r10_bm s_r10_mom s_r10_op_prof s_r10_agro s_r10_stdret s_r10_vol_s s_r10_lag_ue_p s_r10_lnumage s_r10_divy yr1-yr13 (tq2_centered_w = wklymret_w wklych_usd_w) if sample_to_use == 3, fe first liml cluster(cnum) endog(tq2_centered_w)
The output begins with the warning:
Warning - singleton groups detected. 194 observation(s) not used.
Partial output below indicates 17,877 observations were used (18071-194 = 17877)
Number of clusters (cnum) = 1132 Number of obs = 17877
F( 25, 1131) = 59.19
Prob > F = 0.0000
Total (centered) SS = 56.71687777 Centered R2 = -0.1134
Total (uncentered) SS = 56.71687777 Uncentered R2 = -0.1134
Residual SS = 63.15058851 Root MSE = .06141
When I generate descriptive statistics, I have complete data for all the variables in the above model for 18,071 observations. No missing values.
Importantly, I have only 7 singletons in my 18071-observation dataset:
. count if number == 1 & sample_to_use == 3
7
I would like to drop the 194 singletons. Could someone let me know how to identify and eliminate the 194 observations?
Best,
Srinivasan Rangan
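A minimal sketch for locating the singletons that xtivreg2 is dropping: what matters is the number of usable observations per panel unit (cnum) within the sample_to_use == 3 subsample, not the count in the full dataset:
Code:
* observations per cnum within the estimation subsample
egen insample_n = total(sample_to_use == 3), by(cnum)
* inspect and, if desired, drop the singleton groups
count if sample_to_use == 3 & insample_n == 1
list cnum if sample_to_use == 3 & insample_n == 1
drop if sample_to_use == 3 & insample_n == 1
If this still returns 7 rather than 194, the discrepancy would have to come from observations excluded for other reasons, and tagging e(sample) after estimation and counting observations per cnum within it is the surer route.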
Saving WTP estimates to conduct Poe Test
Dear all
I have run a clogit model and have estimated WTP via the wtp command with the krinsky option. I now want to save the WTP estimates so I can perform a Poe test to compare them with the WTP estimates from another clogit model. However, I do not see any option for saving these WTP estimates. The saving option is only available with the wtpcikr command, which I do not think can be used after a clogit model.
Can anyone help me with this please
Formatting and managing dates, from String to MMYYYY format
Dear All,
I am a new user in Stata.
I have a basic question and would kindly ask for your help. I want to change the data from string format to a date format (MMYYYY). I have tried the function date(Date2, "MY") and then created a monthly variable (Date3). However, the format is not what I expected.
Many thanks for your support,
Chi
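A minimal sketch: for a month-year target the relevant function is monthly() rather than date(), and the display format is %tm (assuming Date2 holds strings such as "05/2019" or "May 2019"):
Code:
* convert the string to a Stata monthly date and format it as month-year
generate Date3 = monthly(Date2, "MY")
format Date3 %tm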
Bootstrap
Hi everyone,
I need some help understanding why Stata doesn't let me use the command: bootstrap_b.
It writes: unrecognized command: bootstrap_b.
What can I do?
Thanks in advance.
Gal.
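A guess at what is going on: bootstrap is a prefix command and _b is the expression to collect, so they are separated by a space and followed by a colon and the estimation command. A minimal sketch with hypothetical variable names:
Code:
* bootstrap the coefficient vector of a regression (200 replications)
bootstrap _b, reps(200) seed(12345): regress y x1 x2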
Advices on how to learn systematically how to work with panel
Hi everyone,
I have very elementary skills in econometrics and until now I have only worked with cross-sectional data. Now I need to work with panel data, but I feel I lack even the basic competences (even for descriptive statistics). Until now I have tried to fill my gaps "on the road", basically learning only the things that I needed immediately. I resorted to this "easy" strategy only because I am really short of time.
But that's not working. I need a more systematic training on how to explore my data and work with them when they have a panel dimension.
My handbook is not very helpful: the chapter on panel data starts from regressions, whereas I want to be able to know my data in detail and know how to work with them before I run regressions. There is probably a reason for that gap in my book (maybe I should look into time-series methods for descriptive statistics?), but I don't know it.
So my question is: considering that (independently on my will) I'm short of time, what book/video/online resource would you suggest to have a systematic introduction to panel data in Stata, which includes all the "tricks" to describe them and work with them (I learned how to do many things the long way, and then found a much shorter way in Statalist.. isn't there a way to learn these things systematically?)?
Aurora
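While you look for a text, a minimal sketch of the commands usually used to get to know a panel before any regressions (hypothetical id, year and variable names):
Code:
* declare the panel structure, then describe its pattern and variation
xtset id year
xtdescribe
xtsum x1 x2
xttab group
xtline x1 if id <= 20, overlay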
How to interpret the result of the "Total Factor Productivity of Manufacturing Firms" based on Levinsohn and Petrin (2003) approach?
I intend to measure the TFP of 23 manufacturing firms with a Cobb-Douglas production function approach, using the prodest command in Stata, for the period 2015-2017.
I am using the Levinsohn and Petrin (2003) approach with the attached Stata dataset. However, I get negative coefficients on lnL and lnK under the Levinsohn and Petrin (2003) approach; the results are attached as the image below. The individual TFP values are then used as the dependent variable and regressed on infrastructure stocks as the independent variable.
Stata Code:
prodest lnGVA, method (lp) free(lnL) proxy(lnInput) state(lnK) poly(3) valueadded reps(250)
predict TFP
Can anyone help to overcome this issue in the result? Please respond.
| Famid | year | lnGVA | lnK | lnL | lnInput |
| 1 | 2015 | 13.34451139 | 14.43711069 | 13.82499642 | 14.94789177 |
| 2 | 2015 | 10.90103056 | 11.39432509 | 12.00363817 | 12.56028455 |
| 3 | 2015 | 10.52884158 | 10.90823019 | 11.74051512 | 12.56156862 |
| 4 | 2015 | 11.71408167 | 12.96707595 | 11.86919333 | 13.19120632 |
| 5 | 2015 | 10.78025708 | 10.57660072 | 11.29931136 | 12.61370021 |
| 6 | 2015 | 11.30195799 | 10.79404052 | 10.71557266 | 12.07061138 |
| 7 | 2015 | 13.89161883 | 14.69274188 | 13.91372923 | 15.59004602 |
| 8 | 2015 | 12.68505841 | 13.08795162 | 13.27239071 | 14.34357928 |
| 9 | 2015 | 12.17481436 | 12.49800879 | 11.90358822 | 13.0876142 |
| 10 | 2015 | 10.37213186 | 10.36546633 | 10.85971028 | 11.51539809 |
| 11 | 2015 | 11.89178185 | 12.93752458 | 11.87475212 | 13.15976277 |
| 12 | 2015 | 12.88529455 | 13.74124074 | 13.52565546 | 14.57748312 |
| 13 | 2015 | 11.26551282 | 11.99049128 | 12.59243988 | 13.35132999 |
| 14 | 2015 | 11.91772596 | 13.18896836 | 12.4565356 | 13.65617625 |
| 15 | 2015 | 14.08798489 | 14.43171365 | 14.08198176 | 15.39174269 |
| 16 | 2015 | 11.84720763 | 14.04701543 | 12.27763023 | 13.27226267 |
| 17 | 2015 | 11.84987474 | 12.32818954 | 13.05611887 | 13.71366131 |
| 18 | 2015 | 12.2899301 | 12.94906791 | 12.83675914 | 13.8142172 |
| 19 | 2015 | 13.31159488 | 14.0107976 | 14.37021545 | 14.99198913 |
| 20 | 2015 | 7.930242796 | 7.485056583 | 10.17564981 | 8.622648785 |
| 21 | 2015 | 12.58255248 | 13.30226199 | 13.42014082 | 14.52482222 |
| 22 | 2015 | 12.45019737 | 12.54385883 | 12.59546596 | 13.57663852 |
| 23 | 2015 | 11.82869258 | 13.06866314 | 13.13062515 | 14.08698579 |
| 1 | 2016 | 13.48373632 | 14.55212104 | 13.81087483 | 14.87352281 |
| 2 | 2016 | 11.09011635 | 12.09660422 | 12.06294103 | 12.5021024 |
| 3 | 2016 | 10.43491923 | 10.92367085 | 11.54507315 | 12.3586785 |
| 4 | 2016 | 11.26804467 | 13.0838402 | 11.83438558 | 13.05207522 |
| 5 | 2016 | 10.7530443 | 10.4765827 | 11.23227197 | 12.73695465 |
| 6 | 2016 | 11.41296781 | 10.8661762 | 10.80685514 | 12.14263568 |
| 7 | 2016 | 13.9810047 | 14.90279048 | 13.99064468 | 15.47850878 |
| 8 | 2016 | 12.75290698 | 13.25330751 | 13.23466654 | 14.43921191 |
| 9 | 2016 | 12.17262238 | 12.62311498 | 11.81393335 | 12.95478697 |
| 10 | 2016 | 10.49895051 | 10.55531489 | 10.86526777 | 11.58579992 |
| 11 | 2016 | 11.52578885 | 12.93922703 | 11.86191193 | 13.20034564 |
| 12 | 2016 | 12.99481134 | 13.78688988 | 13.55250289 | 14.51278126 |
| 13 | 2016 | 11.53622211 | 12.27852734 | 12.51736626 | 13.27750928 |
| 14 | 2016 | 12.20480924 | 13.55141515 | 12.49875718 | 13.69012924 |
| 15 | 2016 | 14.14469829 | 14.47629798 | 14.13122964 | 15.45322844 |
| 16 | 2016 | 11.80136403 | 14.22621284 | 12.25082132 | 13.35906515 |
| 17 | 2016 | 11.91459215 | 12.41077943 | 13.10543896 | 13.69703396 |
| 18 | 2016 | 12.39233098 | 13.06787942 | 12.88085472 | 13.9162249 |
| 19 | 2016 | 13.50721806 | 14.10146753 | 14.47325385 | 14.97086086 |
| 20 | 2016 | 7.590132471 | 7.815032882 | 10.07225939 | 8.626449627 |
| 21 | 2016 | 12.79879814 | 13.47297179 | 13.50166245 | 14.52501448 |
| 22 | 2016 | 12.73090918 | 12.64631381 | 12.64053977 | 13.87087559 |
| 23 | 2016 | 12.05060545 | 13.1477183 | 13.11664506 | 14.11136877 |
| 1 | 2017 | 13.37820655 | 14.57031481 | 13.87654921 | 14.99791944 |
| 2 | 2017 | 11.31482949 | 11.94872061 | 12.1067936 | 12.48835749 |
| 3 | 2017 | 10.48552974 | 11.50844226 | 11.50258216 | 12.33606399 |
| 4 | 2017 | 11.45326146 | 13.4451234 | 11.89512877 | 13.13305958 |
| 5 | 2017 | 10.53909949 | 10.41244812 | 11.2286642 | 12.71520385 |
| 6 | 2017 | 11.32904661 | 10.91523748 | 10.70495088 | 11.98065828 |
| 7 | 2017 | 13.91086343 | 15.06872287 | 14.03587425 | 15.54622029 |
| 8 | 2017 | 12.98490547 | 13.39216318 | 13.3848061 | 14.65935721 |
| 9 | 2017 | 12.08523011 | 12.39343758 | 11.86197541 | 12.95263912 |
| 10 | 2017 | 10.6732993 | 10.92220152 | 10.98576719 | 11.73012375 |
| 11 | 2017 | 11.90939933 | 13.25324863 | 11.88186489 | 13.18303098 |
| 12 | 2017 | 13.19309476 | 13.82810032 | 13.62542212 | 14.61800501 |
| 13 | 2017 | 11.72306923 | 12.43306157 | 12.42895616 | 13.41005668 |
| 14 | 2017 | 12.21973581 | 13.62180352 | 12.54387614 | 13.72163599 |
| 15 | 2017 | 14.10537468 | 14.43870418 | 14.12643932 | 15.34049703 |
| 16 | 2017 | 12.04438547 | 14.43877548 | 12.31398041 | 13.40533761 |
| 17 | 2017 | 11.98871848 | 12.33912744 | 13.18484604 | 13.69294326 |
| 18 | 2017 | 12.53221462 | 13.23962174 | 12.93065551 | 14.01079714 |
| 19 | 2017 | 13.5782688 | 14.25990109 | 14.51053547 | 15.04941816 |
| 20 | 2017 | 7.673050689 | 7.846913183 | 10.08397409 | 8.60560012 |
| 21 | 2017 | 13.23123952 | 13.50385468 | 13.57157867 | 14.59337051 |
| 22 | 2017 | 12.61675213 | 12.68825504 | 12.74948936 | 13.66224058 |
| 23 | 2017 | 12.22915794 | 13.35048112 | 13.11830917 | 14.14173054 |
PMG insufficient observation
Hi everyone,
I was trying to estimate a panel model with the PMG estimator. N = 36 over the period 1984-2016. When I run the command for the full panel it is fine, but for the subsamples of developed and developing countries I get this message: insufficient observations r(2001). Does anyone have an idea or suggestion, please?
Regards,
Marwan
Finding code snippets from Stata's base commands
Dear Statalist,
Is there a way to see the programme code from Stata's own base commands?
In my case, I am interested in seeing how the firstrow option of Stata's import excel command works exactly because I want to learn from it in order to do something similar (I want to be able to modify the column headers of my Excel file after importing and only then should these headers become variable names, so I can't just use the firstrow option directly). But when I type sysdir and then locate the import_excel.ado file in my BASE folder, it contains only very limited reference code, not the full programme...
Many thanks,
Felix
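A minimal pointer: viewsource displays the source of an ado-file found on the ado-path, which is often more convenient than opening the BASE folder by hand; note that much of import excel is implemented inside the Stata executable itself, so only the ado-level wrapper is viewable:
Code:
* show the ado-file code that Stata actually runs for import excel
viewsource import_excel.ado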