Friday, May 31, 2019

clogit vs. xtlogit, fe

Is there a difference between clogit and xtlogit, fe? It appears to me they both do conditional logistic regression with fixed effects.
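For reference, a minimal sketch comparing the two on a standard panel dataset; as I understand the documentation, xtlogit, fe fits the conditional fixed-effects logit (via clogit internally), so the two should agree:

Code:
webuse union, clear
xtset idcode year
* two routes to the same conditional (fixed-effects) logit
clogit union age grade not_smsa, group(idcode)
xtlogit union age grade not_smsa, fe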

Time lag for multi-level fixed-effects panel data.

Hi all,
I have some doubts about time lagging for my independent variables in multi-way FE panel data (reghdfe). I have a panel model with a health outcome (ho) as my dependent variable and socioeconomic and health-services-system indicators (hi, hi_1, hi_2 and hi_3) as independent variables. As one of my IVs is the unemployment rate (ur), presumably with a lagged effect on my DV, I also suppose that the present unemployment rate affects the health outcome, in a cumulative or interactive way.

Let's say that my model is:

reghdfe ho ur gdp gini hi hi_1 hi_2 hi_3, absorb(state) vce(cluster state#year)

Suppose that I have ur (the current unemployment rate) and ur_1, ur_2 and ur_3 as unemployment rates lagged by 1, 2 and 3 years in my data set, and I want to observe the effect of three consecutive years of unemployment (or occupation rate) on the health outcome (ho, my DV).

Would it be correct to model:

reghdfe ho gdp gini hi hi_1 hi_2 hi_3 c.ur#c.ur_1#c.ur_2#c.ur_3, absorb(state) vce(cluster state#year)

or, instead of using my own lagged data, should I use Stata's "L." time-series operator for the lags of the unemployment variable? So:


reghdfe ho gdp gini hi hi_1 hi_2 hi_3 c.ur#cL1.ur#cL2.ur#cL3.ur, absorb(state) vce(cluster state#year)


Any suggestions for this model?
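For reference, a minimal sketch of a distributed-lag specification (as opposed to the four-way interaction above), assuming the panel has been declared with xtset and that state is numeric; variable names follow the post:

Code:
xtset state year
reghdfe ho gdp gini hi hi_1 hi_2 hi_3 ur L1.ur L2.ur L3.ur, absorb(state) vce(cluster state#year)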

Thanks in advance.

Alexandre Bugelli

-sreshape-

I just installed -sreshape- on Stata MP. Does anyone know if it has a maximum variable limit? I have 7,500 variables in wide format that I want to -sreshape- into long, but I keep getting an error message.

Lasso regress questions

Hi, I am a college student from Barcelona. It is hard to learn Stata by myself because the teacher does not explain how to use the commands or what they do, and we are in an introductory course. The homework for this weekend uses a dataset with wage and some covariates, and we should use the lasso and ridge approaches. He encouraged us to create as many variables as we can (I do not know why; dummies, etc.). But he told us that we should run net install elasticregress, replace and ssc install lassopack, replace. I suppose these install some new commands.

In the second question he says that we should use the commands rlasso and lassoregress. I do not know what the difference between the two commands is; I could not find it on the Internet. I also saw an extra command called lasso2. What do they do? Thank you.
2) Use the lasso methods (rlasso, lassoregress and ridgeregress) to select the most relevant covariates for the analysis.
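For what it's worth, a hedged sketch of where each command comes from and a minimal call for each (the dataset and covariates here are placeholders, not the homework data):

Code:
ssc install lassopack, replace        // provides lasso2, rlasso, cvlasso
net install elasticregress, replace   // provides lassoregress, ridgeregress, elasticregress
sysuse nlsw88, clear
rlasso wage age grade tenure hours        // lassopack: theory-driven ("rigorous") penalty
lasso2 wage age grade tenure hours        // lassopack: lasso over a grid of penalty values
lassoregress wage age grade tenure hours  // elasticregress: penalty chosen by cross-validation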

Add Text to Graph Combine

I am combining three graphs using 'graph combine'. By default they appear in a 2 x 2 arrangement with the lower right slot empty. I'd like to add text to this empty area. What's the best way to do this?

I have tried the 'caption' and 'note' options, with and without the 'position' suboption, but those either distort the shape or put the text at the far bottom of the combined graph.

I don't know if this would work, but perhaps I could save the text as a standalone .gph file and add it that way; or perhaps there is an option I'm missing for 'graph combine' whereby you can place text anywhere you like (using coordinates, not clock positions).
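One possible workaround, as a hedged sketch: build an otherwise empty plot that carries only the text, and pass it to graph combine as the fourth graph (g1, g2 and g3 stand for the three saved graphs):

Code:
twoway scatteri 1 1, msymbol(i) text(1 1 "Notes can go here") ///
    xscale(off) yscale(off) plotregion(style(none)) name(txt, replace)
graph combine g1 g2 g3 txt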

Obtaining mean and SD from survey data

Hello,

I am trying to get the difference in length of stay (LOS) in the myocarditis subpopulation, categorized by whether or not patients have arrhythmia (Tarry). I get the mean and standard error, but I would like the mean and standard deviation. How would I be able to get that? Thanks.

This is what I did and what I got.
. svy linearized, subpop(myocarditis) : mean LOS, over(Tarry)


0: Tarry = 0
1: Tarry = 1

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
LOS          |
           0 |   8.035982   .3003205      7.447308    8.624657
           1 |   15.02443   1.001127      13.06207    16.98679
--------------------------------------------------------------
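If the svyset declaration is already in place, running estat sd right after the mean estimation may give what is wanted; per [SVY] estat, it reports subpopulation standard deviations alongside the means (a minimal sketch):

Code:
svy linearized, subpop(myocarditis) : mean LOS, over(Tarry)
estat sd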

Showing country-specific treatment effects

Hello,

I'm currently working with a dataset with individual respondents. In my analysis, I show the average treatment effect for the treated group. I suspect, however, that treatment effects vary across countries. How can I show this?

I wish to report average treatment effects for several countries as shown in the picture (from another analysis):

[attached image not shown]
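A hedged sketch of one common way to display country-specific effects, via a treatment-by-country interaction and margins (outcome, treated and country are placeholder names):

Code:
regress outcome i.treated##i.country, vce(cluster country)
margins country, dydx(treated)
marginsplot, horizontal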

Standard Error Correction in a two step process

Dear all
I wonder if anyone has any references, and perhaps Stata applications, that can help me solve a problem like the following:
Code:
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
* This is the benchmark
reg lnwage educ exper tenure female single
est sto m1
* now, i can create the following variables:
gen double female2=female*_b[female]
gen double educ2=educ*_b[educ]
gen double femaleeduc=female2+educ2
gen double lnwage2=lnwage-female2-educ2
* And estimate this model
reg lnwage femaleeduc exper tenure single
est sto m2
* or this other
reg lnwage2 exper tenure single
est sto m3

est tab m1 m2 m3, se
-----------------------------------------------------
    Variable |     m1           m2           m3      
-------------+---------------------------------------
        educ |  .08303358                            
             |  .00513458                            
       exper |  .00895939    .00895939    .00895939  
             |  .00156103    .00155551    .00153964  
      tenure |  .00613849    .00613849    .00613849  
             |  .00187615    .00186656    .00185977  
      female | -.09382159                            
             |  .02490175                            
      single | -.16080964   -.16080964   -.16080964  
             |  .02702903    .02694462     .0269209  
  femaleeduc |                       1               
             |               .05800787               
       _cons |  2.3408107    2.3408107    2.3408107  
             |  .07085517    .06106525    .02653445  
-----------------------------------------------------
                                         legend: b/se
All models provide the same point estimates, but different standard errors. The benchmark is column 1.
Column 2 combines the effects of female and educ and adds the combination to the model, and column 3 simply removes the effects of female and education from lnwage before estimating the model.
My question is, does anyone know how to correct the standard errors from model 3 or 2, and obtain the "correct" ones from model 1?
I know this could be done using bootstrap methods, but I'm trying to see if it can be done in a different way.

For more detail on the motivation: I'm revisiting Robinson's semiparametric estimator (see reference below). I'm aware of the user-written command -semipar-. However, the application itself does not provide much detail on how the standard errors of the nonparametric section are estimated.
Looking through the code, it proceeds in a way similar to column 3, but what we want are the results from column 1.
Thank you in advance.

Robinson, P. M. 1988. Root-n-consistent semiparametric regression. Econometrica 56: 931–954.

WLS regression using regwls and regress

Dear All,

I have a question regarding WLS regression using Stata commands regwls and regress and would kindly ask for your help.
For your information, the general idea is that I want to use WLS regression with the monthly number of firms for each observation as weights (I created the weight column in Excel and imported it into Stata, since I do not know how to generate it in Stata). Additionally, the "avg" variable is the equal-weighted monthly return on each portfolio; this is my dependent variable.
Firstly, I used the command "regwls avg MktRF SMB HML [aw=1/Weight]" for the WLS regression with analytic weights (however, this command does not work in my Stata 13 version). Later, I tried the command "regress avg MktRF SMB HML [aw=1/Weight]" and it worked.

Could someone please let me know if the two commands are the same and whether my approach is correct?
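For what it's worth, a hedged sketch of generating the weight inside Stata rather than in Excel (month and firmid are placeholder names). Note that [aw=nfirms] weights each observation by the number of firms, whereas [aw=1/Weight] weights by its inverse; it is worth double-checking which is intended:

Code:
bysort month: egen nfirms = count(firmid)
regress avg MktRF SMB HML [aw = nfirms]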
Many thanks for your help!

Best regards,
Chi

Inverted Normal Graph

Hello all,

Admittedly a mundane question here... I'm simply trying to plot an inverted normal distribution for an upcoming presentation. I've successfully plotted a normal distribution, but now I simply need to flip it upside down.

Code:
clear
set obs 100
gen x=rnormal(0,1)
twoway function y=normalden(x), range(-4 4) xtitle("{it: x}") ///
ytitle("Density") title("Standard Normal")
Any suggestions would be greatly appreciated! I recently converted to Stata from SPSS, so apologies for what is presumably a rather elementary question!
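One simple possibility is to plot the negative of the density, or to keep the density as-is and reverse the y axis (a minimal sketch):

Code:
twoway function y = -normalden(x), range(-4 4) xtitle("{it:x}") ///
    ytitle("Density") title("Inverted Standard Normal")
* alternatively, reverse the axis instead of negating the function:
twoway function y = normalden(x), range(-4 4) yscale(reverse)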

J.

How to do matrix exponential operation in Stata?

Code:
matrix A = (1,0,0,0,0\0.6,0,.4,0,0\0,.6,0,.4,0\0,0,.6,0,.4\0,0,0,0,1)
matrix list A
matrix B = A*A
I know how to do A^2. Now I want to do A^20. How should I do this?
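Repeated multiplication in a loop is one straightforward way to compute the matrix power (a minimal sketch):

Code:
matrix P = A
forvalues i = 2/20 {
    matrix P = P * A
}
matrix list P   // P now holds A^20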

Many thanks in advance!

gsem covstruct

Hello.
I am running an LPA analysis using the gsem command. I have also run the analysis in R using the mclust package. The problem I have is that I don't get similar results. For example, my best model in R (based on BIC etc.) is a 3-class, so-called VVI model (that is, varying volume and shape, and identity for the orientation). In Stata, I am trying to impose the same constraints (that is, I want all parameters to vary freely), and I am not sure I am succeeding. I have tried lcinvariant(none) and covstruct(e._LEn, diagonal) and I get similar but not the same results.
Is anyone familiar with this?
Thank you a lot.

Generate balance table

Hi, I'm trying to replicate this balance table (as in the picture) using some of the example datasets installed with Stata; in particular I was trying to use bplong.dta. However, I haven't been able to do so. I found the command iebaltab to produce this kind of table, but I'm having problems understanding how it works. Do you have any idea how I can do this?


[attached image not shown]
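A hedged minimal sketch with bplong.dta (iebaltab ships with the ietoolkit package; the choice of balance and grouping variables here is just one plausible reading of the exercise):

Code:
ssc install ietoolkit
sysuse bplong, clear
iebaltab bp, grpvar(when)   // balance of blood pressure across the Before/After groups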

extract variable labels for new variable names

Hi there

I try to automate my programming as much as possible, and one challenge I've come up against recently is naming new variables according to the value labels of existing variables.

For example:

Code:
sysuse sandstone
tab type, gen(type_name)
My goal is to name the new variables:
type_measured
type_estimated
type_interpolated
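A hedged sketch of one way to do this, assuming type is numeric with value labels (which tab, gen() suggests): loop over the levels of type and turn each value label into a legal variable name (strtoname() guards against spaces and other awkward characters):

Code:
sysuse sandstone, clear
levelsof type, local(levels)
foreach l of local levels {
    local lbl : label (type) `l'
    local lbl = strlower(strtoname("`lbl'"))
    gen byte type_`lbl' = (type == `l')
}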

Any advice greatly appreciated.

Count number of cases if dates are within a certain range (a la statsby)

Greetings all,

I have single-line-per-observation survival data (4 million lines). Here is a simplified example:
default   zip code   date_start   date_end   date_default
1         12345      2000q2       2016q1     2005q3
0         54321      1993q4       2016q1
1         13467      2003q1       2016q1     2010q1
One thing I'd like to do with my data is to understand the default rate per quarter, by department. I'd ultimately like to construct a second panel (or a first one, since this isn't per se a panel as is) where I have the different zip codes as the subjects followed through time, and in the end the rate of default per period. I have unemployment data that is already organized in this fashion, and naturally I want to combine it with a default rate (# defaults / # "alive" or "at risk" loans) per zip code:
zip code   date     unemployment   default rate
11111      1990q1   4.2            x
11111      1990q2   4.1            x
11111      1990q3   4.6            x

One guess was to create some new variable that uniquely identifies zip code/quarter combinations, and then to run statsby on this. But that would imply ~12,000 groups (100 zip codes * 30 years * 4 quarters), and that just doesn't seem right/efficient.

It shouldn't be hard for me to find a way to count the defaults per quarter/department (although I can't do tab default department zipcode, as this is too many variables), but I must confess I have no idea where to start on counting (and organizing in a new panel, without Excel) the at-risk loans per quarter.
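One rough intuition, as a hedged sketch: convert the quarter strings to Stata quarterly dates, expand each loan to one row per quarter it is alive, and collapse to zip-code/quarter cells (the variable names and string date format are assumptions about the data):

Code:
gen qstart = quarterly(date_start, "YQ")
gen qend   = quarterly(date_end, "YQ")
gen qdef   = quarterly(date_default, "YQ")
gen long obsid = _n
expand qend - qstart + 1
bysort obsid: gen qdate = qstart + _n - 1
format qstart qend qdef qdate %tq
drop if qdef < . & qdate > qdef            // loans exit the risk set after defaulting
gen byte atrisk    = 1
gen byte defaulted = (qdate == qdef)
collapse (sum) atrisk defaulted, by(zipcode qdate)
gen default_rate = defaulted / atrisk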

Thank you so much for even some rough intuitions about how to go about this in Stata.

Have a great day,
John



Multinomial logit with sample selection

Dear everyone,


I am looking for something similar to the Heckman selection model/svysemlog, with a modification.

I have a selection variable with two values (0 and 1) in the first step, and a multinomial non-ordinal categorical variable (with six categories) in the second step.
I am interested only in positive (1) values in the first step (around 30% of the total sample).

What I did in the first place was (a) a logit analysis for the first step and (b) a multinomial logit for the second step. However, I was advised to use the Heckman selection model for multiple reasons.
However, if I am not mistaken, Heckman (and svysemlog) cannot be used if the outcome variable is non-ordinal.


I have two questions:


a. Is there any Stata package that addresses my problem?
b. Do you have any advice how to proceed, in case there is no ready-made solution in Stata?


Thanks in advance!




Manually installing Blindschemes by Daniel Bischof

Dear Statalisters

I admit this is a bit of a non-problem, but I'd like to find a solution nonetheless. Never underestimate a nice graph.

I'm trying to use Daniel Bischof's schemes for making graphs (found here: https://danbischof.com/2015/02/04/stata-figure-schemes/). My organisation doesn't allow installing via ssc, so I downloaded all the scheme and style files and added them to the folder where all my other ado-files are stored. I saved the color files both in a separate folder called "style" (this is what ssc does, I think) and in the same folder as the scheme files. Now, when I set the scheme to plotplainblind, the graphs come out in that scheme, but in black and white. The command doesn't seem to find the colors. So I think I need to define these colors first in some way, but I don't know how. Any suggestions?
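A hedged sketch of a manual install (file names and paths are assumptions; the key point is that both the scheme-*.scheme files and the color-*.style files must sit somewhere on the ado-path, such as the PERSONAL directory):

Code:
display c(sysdir_personal)                              // Stata's PERSONAL ado directory
copy scheme-plotplainblind.scheme "`c(sysdir_personal)'", replace
copy color-sky.style "`c(sysdir_personal)'", replace    // repeat for each color-*.style file
adopath                                                 // verify PERSONAL is on the search path
set scheme plotplainblind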

Many thanks

Carolin


How to declare data with tournament structure as panel data?

Dear all,

I recently read some papers using panel data from sports, and I started to wonder how one would actually declare data, e.g. from tennis, to be panel data.
Typically, in tennis there is a season which consists of several tournaments. In turn, each of these tournaments consists of several matches. Each match consists of a sequence of sets, and a set in turn consists of a sequence of games.

So, one observation is for player x from game g in set s of match m played in tournament t in season z. If there are separate variables indicating the season (e.g. 2015), the tournament (e.g. 1), the match (e.g. 1), the set (e.g. 1), and the game (e.g. 1), how would one declare the data to be panel while keeping the structure described above? I included the code for a sample data set below.

Obviously, the panelvar in the xtset-command would be player_id. But how would one set the timevar if one's goal was to run a panel data regression (e.g. using xtreg) at the game-level which includes time lags (e.g. matchlevelstat1 from the previous match as well as gamelevelstat1 and gamelevelstat2 from the previous game, which might actually be from the same tournament and same match but from the previous set of that match) as independent variables?



Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year player_id tournament match) byte(set game) float(gamelevelstat1 gamelvelstat2 setlevelstat1 matchlevelstat1 tournamentlevelstat1 yearlevelstat1)
2018 1 1 1 1 1  19 19  22 26  57 17
2018 1 1 1 1 2  64 19 100 39   3 47
2018 1 1 1 2 1 100 32  79 93  32 92
2018 1 1 1 2 2  67 70  15 63  82 88
2018 1 1 2 1 1  86 12  83 92  55 50
2018 1 1 2 1 2  67 97  95 93 100 48
2018 1 1 2 2 1  14 53  58 28  26  6
2018 1 2 1 1 1   8 78   6 35  22 41
2018 1 2 1 1 2  87 85  68 55  98 17
2018 1 2 1 2 1  32 56  87 69  40 94
2018 1 2 1 2 2  47 24  42 89  32 99
2018 1 2 2 1 1  16 98  38 85  21 11
2018 1 2 2 1 2  88  1  87 60  96 28
2018 1 2 2 2 1  14 72  50 19  55 14
2019 1 1 1 1 1  34 48  16 38  95 44
2019 1 1 1 1 2  73  6  25 26  93 96
2019 1 1 1 2 1  92 27  48 89  68 99
2019 1 1 1 2 2  62 66  66 27  80 22
2019 1 1 2 1 1  69 46  40  2  90 59
2019 1 1 2 1 2  27 74  55 13  14 73
2019 1 1 2 2 1  11 61  75 26  73 26
2019 1 2 1 1 1  12 43  16 28  58 15
2019 1 2 1 1 2  49 49  91 83  61 35
2019 1 2 1 2 1  71  1  62 90  50 54
2019 1 2 1 2 2  88 53   6 58  40 99
2019 1 2 2 1 1  84 13  33 96   3 30
2019 1 2 2 1 2  79 68  80 18  86 19
2019 1 2 2 2 1  52  5  77 17  36 48
2018 2 1 1 1 1  59 67   5 29  96 22
2018 2 1 1 1 2  89 34  22 69 100 40
2018 2 1 1 2 1   5 74   8 49  97 83
2018 2 1 1 2 2  58 91  44 66  58 62
2018 2 1 2 1 1  96 77  73 53  59 62
2018 2 1 2 1 2  90 38  32 80   2 42
2018 2 1 2 2 1  79 43  90 18   6  1
2018 2 2 1 1 1  49 85  38 25  95 33
2018 2 2 1 1 2  23 35  35 51   9 53
2018 2 2 1 2 1   9 92  49 98  91 44
2018 2 2 1 2 2  78  9  26 81  23 39
2018 2 2 2 1 1  85 13  98 55   8 77
2018 2 2 2 1 2  24 38  75 12   1 53
2018 2 2 2 2 1  65 91  31 49  96 70
2019 2 1 1 1 1 100 38   9 86  15 83
2019 2 1 1 1 2  78  3  94  9  32 26
2019 2 1 1 2 1  73 40  41 62  60 59
2019 2 1 1 2 2   2 30  26 62  78 49
2019 2 1 2 1 1  21 83  58 10  25 16
2019 2 1 2 1 2  63 92  78  4  29 23
2019 2 1 2 2 1  98 67  59 61  82 62
2019 2 2 1 1 1  75 48  72 25  14 64
2019 2 2 1 1 2  87 76  87 98  60  7
2019 2 2 1 2 1  42 40  38 12  61 29
2019 2 2 1 2 2  12 82  72 48  61 59
2019 2 2 2 1 1  35 42  50 24  14 17
2019 2 2 2 1 2  84 73  75 25  25 72
2019 2 2 2 2 1  50 85  79  8  56 52
end
label var year "season"
label var player_id "player "
label var tournament "tournament number"
label var match "match number"
label var set "set"
label var game "game"
label var gamelevelstat1 "game-level statistic 1"
label var gamelvelstat2 "game-level statistic 2"
label var setlevelstat1 "set-level statistic 1"
label var matchlevelstat1 "match-level statistic 1"
label var tournamentlevelstat1 "tournament-level statistic 1"
label var yearlevelstat1 "season-level statistic 1"
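For reference, a hedged sketch using the sample above: since xtset needs a single time index, one option is a sequential within-player game counter, after which L1. refers to the previous game in chronological order (note this treats games as equally spaced and ignores the boundaries between sets, matches and tournaments, so a true previous-match lag would still need its own construction):

Code:
sort player_id year tournament match set game
by player_id: gen long gameseq = _n
xtset player_id gameseq
xtreg gamelevelstat1 L1.gamelevelstat1 L1.setlevelstat1, fe   // illustrative only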

Drop ID if different observations for that same ID do not vary across another variable

Hello,

I am using Stata 14.2 on Windows. This is my first post so I hope I am doing this correctly.

The dataset I am using contains around 100,000 observations with information about buildings.
Each building has an ID number like 344100000000006, followed by an address, (some more variables that are not important for the question,) and the function (labeled with values 1-12).
One building can contain multiple living units, a store on the ground floor, etc. These units are all separate observations with the same building ID (so they have the same address and differ, if at all, only in function). Therefore one building ID can occur, for example, 16 times.

I want to know which buildings have more than one function, like building with ID 344100000000042, which is used for both function 3 and 12.
I am not interested in buildings with only one function so I want to drop them from the data set.

I believe I need to combine different observations with the same ID into one, and while this is an issue many forum users are struggling with, I am not experienced enough with Stata to apply suggestions for other problems to my own case. Therefore I sincerely hope someone is willing to help me.

The data looks like this: (I excluded other variables that are not important to the question)

* Example generated by -dataex-. To install: ssc install dataex
clear
input double gebwbagidgetal long gebruiksdoel_n
344100000000006 12
344100000000006 12
344100000000008 12
344100000000008 12
344100000000011 12
344100000000011 12
344100000000011 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000016 12
344100000000016 12
344100000000029 12
344100000000029 12
344100000000029 12
344100000000029 12
344100000000029 12
344100000000039 12
344100000000039 12
344100000000039 12
344100000000039 12
344100000000039 12
344100000000041 12
344100000000041 12
344100000000042 3
344100000000042 12
344100000000053 12
344100000000053 12
344100000000061 3
344100000000061 12
344100000000061 12
344100000000061 12
344100000000061 12
344100000000061 12
344100000000064 12
344100000000064 12
344100000000074 12
344100000000074 12
344100000000074 3
344100000000074 12
344100000000074 12
344100000000074 12
344100000000074 12
344100000000074 12
344100000000074 12
344100000000079 12
344100000000079 12
344100000000079 12
344100000000079 12
344100000000079 12
344100000000082 12
344100000000082 3
344100000000084 12
344100000000084 3
344100000000084 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000090 12
344100000000090 12
344100000000090 12
344100000000091 3
344100000000091 12
344100000000098 3
344100000000098 12
344100000000102 3
344100000000102 12
344100000000106 12
344100000000106 12
344100000000109 3
344100000000109 12
344100000000114 3
344100000000114 3
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
end
label values gebruiksdoel_n gebruiksdoel_n
label def gebruiksdoel_n 3 "gemengd", modify
label def gebruiksdoel_n 12 "woonfunctie", modify
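A hedged sketch of one standard idiom for the question, using the example data: after sorting functions within building ID, the first and last values differ exactly when a building has more than one distinct function:

Code:
bysort gebwbagidgetal (gebruiksdoel_n): gen byte multifunc = gebruiksdoel_n[1] != gebruiksdoel_n[_N]
keep if multifunc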

Mixed or reg i.country i.year for repeated cross-section data

Dear all,

I am analysing time-series cross-sectional data from 4 waves and around 25 countries, using Stata 14. The dataset is the International Social Survey Programme, years 1988, 1994, 2002 and 2012. My main variable of interest is female hours worked per week (originally WRKHRS; for the analysis I generated work hours for females only, 0 otherwise) and how they are affected by the benefit amount/presence in the country. First I had these benefits as a percentage of expenditure per GDP, but my supervisor told me to generate dummies (0 for no benefit and 1 for the benefit) for all the different types I had; I have them both ways now. The research has two parts: the first is a regression of female hours worked per week on the different types of benefits; the second focuses on analyzing attitudes - support for traditional gender roles of men - comparing between countries.

I want to do an individual-level analysis (within respondents) of the effect based on education##benefit, marital status, attendance of religious services and presence of a child. At the country level I have the benefits plus unemployment rates, labor force participation for men and women, total fertility rate and types of expenditure: public total, in-kind % of GDP, in-cash % of GDP and real GDP forecast. I know it's too much; I won't be using all of them, just letting you know what I have.

I was planning to use the mixed command, starting with a basic mixed femworkhours || countryid: and building on that, adding more level-1 predictors and then level-2. However, I cannot declare this a panel data set because of repeated time values, so I set it as xtset countryid (as I read somewhere in this forum, that is an option for repeated cross-section data). Since this is my thesis, I asked my supervisor whether I should use mixed or a simple reg with i.countryid i.wave, and he suggested reg with i.countryid i.year. Nevertheless, when I regress, there does not seem to be a significant (though small) country effect, and it comes out that the first part of the analysis ignores country and year effects. Could the problem be that, if I run a basic regression with fixed country and year effects, I should use mean hours worked by country rather than the individual level? I was browsing this forum and the internet and unfortunately could not find the answers I was looking for.

Hence the question: what would you suggest doing with this data? The variable femworkhours in the extract below looks mostly missing, but that is not representative of the full data set: I ran the mdesc command and 33% of the total sample is missing (the values range from 0-80 hours worked per week). I hope this question is clear enough to understand; if not, please let me know where I can elaborate.

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(femworkhours married incgroup fulltime parttime attend1) byte educ float(dbgrant drealfam dincmaint ddaycare dpleave dchildall wave countryid)
0 0 1 0 0 1 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 0 3 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 1 0 1 0 0 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 0 1 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 2 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 1 1 0 0 1 0 1 0 0 0 0 2 1
. 0 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 0 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 1 0 0 1 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
0 0 3 0 0 0 1 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 4 1 0 1 2 0 1 0 0 0 0 2 1
. 1 5 1 0 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 0 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 0 3 0 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 3 1 0 0 1 0 1 0 0 0 0 2 1
0 1 5 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 3 0 1 0 0 0 0 2 1
. 0 3 0 1 0 2 0 1 0 0 0 0 2 1
. 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
0 1 1 0 0 0 1 0 1 0 0 0 0 2 1
. 1 1 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 3 0 1 0 0 0 0 2 1
. 1 5 1 0 1 3 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 1 1 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 0 1 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 3 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 1 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 0 1 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
end
label values incgroup incgroup
label def incgroup 1 "10%", modify
label def incgroup 2 "25%", modify
label def incgroup 3 "50%", modify
label def incgroup 4 "75%", modify
label def incgroup 5 "90%", modify
label values fulltime employed
label def employed 0 "not fulltime", modify
label def employed 1 "fulltime", modify
label values educ educ
label def educ 0 "no education", modify
label def educ 1 "primary/lower secondary", modify
label def educ 2 "upper/post secondary", modify
label def educ 3 "lower/upper tertiary", modify
label values wave wave
label def wave 2 "1994", modify
label values countryid countryid
label def countryid 1 "AU", modify
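For reference, a hedged sketch of the two candidate specifications using the variables in the example above (the covariate choice is illustrative, not a recommendation):

Code:
mixed femworkhours i.educ##i.dbgrant married attend1 i.wave || countryid:
regress femworkhours i.educ##i.dbgrant married attend1 i.countryid i.wave, vce(cluster countryid)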




destring numbers in scientific notation

Dear community,

I was inattentive when pasting data into a new Stata file, and now the following problem presents itself: I have a unique numeric identifier with very large numbers, such that Stata abbreviated it to scientific notation, e.g. 1.7876423e+11. In the new file this numeric identifier appears as a string and contains commas (",") instead of dots. I have now tried to destring this varlist, unsuccessfully.

Is there someone who has encountered a similar problem before and could help me out?
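A hedged sketch (idstr is a placeholder name for the string identifier). One caveat: if digits were already rounded away when the value was displayed in scientific notation, no conversion can recover them:

Code:
replace idstr = subinstr(idstr, ",", ".", .)   // decimal commas back to dots
destring idstr, generate(idnum)
format idnum %15.0f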

Kind regards,

Marie

Interpretation of the interaction term when the relevant dummy variable is insignificant

Dear all,

In the model that I have run to analyse the effect of currency swaps on the gross capital flows of the countries signing them, I have included a dummy variable for the signing of such a currency swap agreement (signing=1) as well as a dummy for whether the country is a developed or a developing economy (developing=1). To check whether the effect of a currency swap differs between developing and developed countries, I have included an interaction term between these two dummy variables (signing=1 and developing=1). However, the results render my interaction term significant at the 1% level with a positive coefficient, while my dummy variable for the signing of the currency swap is negative (as expected) but insignificant. How do I interpret these results? Because the original currency swap dummy is insignificant, it is impossible to conclude how large the positive effect of a currency swap is for a developing country, right? However, does it still allow me to say that a positive relationship exists, but that its size is unclear due to the insignificance of the foregoing dummy variable? Many thanks in advance!

Kind regards,

Owen

Thursday, May 30, 2019

categorizing data

Hi all, I have a dataset that includes up to 30 string variables. Some of them are dummy variables and the others are categorical with a limited number of categories. I'm trying to categorize the data according to their common features. A potential approach is to use the "tabulate" command; however, tabulating 30 variables makes no sense and is difficult even with a prefix command like "by".
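One hedged possibility is to collapse the 30 variables into a single grouping variable whose levels are the observed combinations (v1-v30 is a placeholder varlist):

Code:
egen pattern = group(v1-v30), label missing
tab pattern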

Droping observations with x amount of missing values

Dear all,
I'm working with a messy data set of approximately 870 observations at the moment.
After using the command missings table, I realised that 91 observations have 99 missing values out of 108 variables. I used missings list, min(99) to see which observations account for this.
Now I want to drop these observations from the data set. I wonder if there is a command that would use the information produced by missings list, min(99) to drop them?
Can anyone help? I've been looking for a solution for quite some time, without success.
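A hedged sketch that sidesteps missings list entirely, counting missing values per observation directly (v1-v108 is a placeholder for the 108 variables):

Code:
egen nmiss = rowmiss(v1-v108)
drop if nmiss >= 99
drop nmiss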

Thank you.

Meta analysis of hazard ratio

Dear all,
My name is Hatem Ali
I am trying to do a meta-analysis of hazard ratios to assess the effect of a rise in IL-6 on overall survival.
I have the following data:
Study                                                                Sample size   p value   CI lower   CI upper   Notes
Pecoits-Filho (HR for overall mortality; IL-6 higher in CVD group)   99            0.01                             chi square = 11.3
Liu et al                                                            50            0.001                            OR = 6.9
Cho et al (trend over time)                                          175           0.03      1.31       87.75       OR = 10.72
Lambie et al                                                         575           0.008     1.22       3.78        HR = 2.15
Lambie et al 2                                                       384           0.009     1.28       5.58        HR = 2.68
Wang et al (mortality and coronary calcification)                    152           0.003     1.53       8.26        HR = 3.56
As you can see, I have the sample size, the p value, and the lower and upper confidence limits.
However, only 3 studies report an HR; 2 report an OR, and one reports a chi-square.

Is there a way to calculate a hazard ratio from the studies reporting an odds ratio or a chi-square?
In other words, can I convert an odds ratio to a hazard ratio? And can I convert a chi-square to a hazard ratio?

If that is not possible, then can I calculate a relative risk for each study from the data I have?
Can I calculate RR from HR, 95% CI, sample size and p value?
Can I calculate RR from OR, 95% CI, sample size and p value?
Can I calculate RR from chi-square, sample size and p value?

Finally, after calculating the HRs, the syntax to use is: metan HR lower higher, counts random
Is that correct?
How can I add the names of the studies to the forest plot using this syntax?
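On the study names specifically, a hedged sketch: metan has a label() option, and for ratio measures the effect and confidence limits are usually supplied on the log scale with eform for display (study, hr, lower and upper are placeholder variable names):

Code:
gen lnhr = ln(hr)
gen lnlb = ln(lower)
gen lnub = ln(upper)
metan lnhr lnlb lnub, random eform label(namevar=study)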

Looking forward to hearing back from you.

Creating a value that equals the average of other values for the same variable

(Sorry about the poor phrasing of the question.)
My dataset contains variables such as country and prevalence of obesity for 34 countries. I want to create a new value of the country variable that equals the average obesity across all the countries, i.e. there will be 35 categories under the country variable. Is there any command to do that in Stata 14?
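A hedged sketch of one way (obesity and country are the variable names from the post; the new code 35 and the label text are assumptions, and country is assumed to be numeric with a value label of the same name):

Code:
preserve
collapse (mean) obesity
gen country = 35                  // new code for the "average" category
tempfile avgrow
save `avgrow'
restore
append using `avgrow'
label define country 35 "All-country average", modify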

Collinearity error when including continuous variable in dummy regression

I am trying to run the following two regressions to compare the coefficients on the 'iso_str' dummies. The only difference between the two is that the second one includes the variable 'shr'.

1.
Code:
reg lncost ib6.iso_str i.var_str, eform(exp_coeff) baselevels
2.
Code:
reg lncost ib6.iso_str shr i.var_str, eform(exp_coeff) baselevels
When I run regression (2) above, Stata omits the 'shr' variable because of collinearity.

Then, I tried an alternative formulation of the above two regressions to see if this way I could compare their coefficients. Again, the only difference is the inclusion of the variable 'shr' in the second regression.

3.
Code:
reg lncost ib6.iso_str ibn.var_str, noconstant eform(exp_coeff) baselevels
4.
Code:
reg lncost ib6.iso_str shr ibn.var_str, noconstant eform(exp_coeff) baselevels
Notice that the coefficients for the 'iso_str' dummies in (3) are identical to those in (4). However, I still can't compare (3) vs. (4): this time Stata doesn't omit the 'shr' variable in regression (4), but it omits one of the 'var_str' dummies instead (again because of collinearity), even though I used the ibn. prefix so that none would be dropped!

How can I compare the 'iso_str' coefficients outputted by these two regressions, with and without the variable 'shr'? Perhaps there is a way around the collinearity issue I am facing, e.g. rearranging my data differently?

Thank you. An excerpt of my data is below.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 iso str5 var double(cost shr) long(iso_str var_str) float lncost
"CIV" "x1105" 11458.3333333333             49.674 1 1  9.346473
"COD" "x1105" 44083.2888217523              56.12 2 1 10.693836
"MRT" "x1105"              540             47.176 3 1  6.291569
"NGA" "x1105" 16842.1052631579             50.481 4 1  9.731637
"TGO" "x1105" 5590.76923076923             58.838 5 1  8.628872
"TZA" "x1105"            48000             66.947 6 1 10.778956
"ZAF" "x1105" 904.655301204819 34.150000000000006 7 1  6.807554
"CIV" "x1106" 10441.1764705882             49.674 1 2  9.253512
"COD" "x1106" 39391.0340285401              56.12 2 2 10.581293
"MRT" "x1106"              520             47.176 3 2  6.253829
"NGA" "x1106" 11834.3195266272             50.481 4 2  9.378759
"TGO" "x1106"  4398.8603988604             58.838 5 2  8.389101
"TZA" "x1106"            45000             66.947 6 2 10.714417
"ZAF" "x1106"  608.84493902439 34.150000000000006 7 2  6.411563
"CIV" "x1107" 12032.0855614973             49.674 1 3  9.395332
"MRT" "x1107" 463.636363636364             47.176 3 3  6.139101
"NGA" "x1107" 17391.3043478261             50.481 4 3  9.763725
"TGO" "x1107" 5015.38461538462             58.838 5 3  8.520266
"TZA" "x1107" 43636.3636363636             66.947 6 3 10.683646
"ZAF" "x1107"          984.375 34.150000000000006 7 3  6.892007
end
label values iso_str iso_str
label def iso_str 1 "CIV", modify
label def iso_str 2 "COD", modify
label def iso_str 3 "MRT", modify
label def iso_str 4 "NGA", modify
label def iso_str 5 "TGO", modify
label def iso_str 6 "TZA", modify
label def iso_str 7 "ZAF", modify
label values var_str var_str
label def var_str 1 "x1105", modify
label def var_str 2 "x1106", modify
label def var_str 3 "x1107", modify

getting a sample size

Hi,

I'm trying to get the sample size of black women from my data set. I created a black women (bw) variable, counted the bw observations, and then collapsed. Take a look at my code below. Is this the right approach to get the sample size for bw? Also, should I add more or fewer variables in my by()?

[attached image not shown]
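For what it's worth, a minimal sketch of one way to get that count without collapsing (black and female are placeholder indicator names):

Code:
gen byte bw = (black == 1 & female == 1)
count if bw == 1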

Cluster Randomized Controlled Trial

I have a question about cluster randomized controlled trials. Is it recommended to use the svyset command when analyzing a cluster randomized trial? Another question: what command can we use if we want to adjust for clustering?
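A hedged sketch of two common ways to adjust for clustering without svyset (outcome, treat and clusterid are placeholders):

Code:
regress outcome i.treat, vce(cluster clusterid)   // cluster-robust standard errors
mixed outcome i.treat || clusterid:               // random intercept for clusters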

Thanks!

traj command with hierarchical data structure

I'm interested in using the user-written traj command (link below) to identify latent trajectories of change in patients' BMIs. The command has useful features like joint trajectory modeling and accounting for non-random attrition.

However, patients in my dataset are nested within physicians; I have unique physician identifiers for each physician.

Questions:

1. Is there a method or workaround that would allow traj to account for hierarchically nested data?
2. If not, to what degree would traj be robust to violation of the assumption that patients are independent of each other?


https://www.andrew.cmu.edu/user/bjones/

Dependent Double-Sorting 25 Portfolios

Dear all,
I am struggling to replicate the FF-25 portfolios with a variant, I should employ a dependent sort instead of a independent sort.
My dataset is the following one:
permno date primexch ret year month datem BM id MarketCap
10001 30-May-86 Q -0.00980 1986 5 316 . 2 .
10001 30-Jun-86 Q -0.01307 1986 6 317 . 2 1.797265
10001 31-Jul-86 Q -0.01020 1986 7 318 . 2 1.797265
10001 29-Aug-86 Q 0.07216 1986 8 319 . 2 1.797265
10001 30-Sep-86 Q -0.00308 1986 9 320 . 2 1.797265
10001 31-Oct-86 Q 0.03922 1986 10 321 . 2 1.797265
10001 28-Nov-86 Q 0.05660 1986 11 322 . 2 1.797265
10001 31-Dec-86 Q 0.01500 1986 12 323 . 2 1.797265
10001 30-Jan-87 Q -0.03571 1987 1 324 . 2 1.797265
10001 27-Feb-87 Q -0.07407 1987 2 325 . 2 1.797265
10001 31-Mar-87 Q 0.03680 1987 3 326 . 2 1.797265
10001 30-Apr-87 Q -0.03922 1987 4 327 . 2 1.797265
10001 29-May-87 Q -0.07143 1987 5 328 . 2 1.797265
10001 30-Jun-87 Q 0.05143 1987 6 329 1.0144155 2 1.761665
10001 31-Jul-87 Q 0.02128 1987 7 330 1.0144155 2 1.761665
10001 31-Aug-87 Q 0.08333 1987 8 331 1.0144155 2 1.761665
10001 30-Sep-87 Q -0.02231 1987 9 332 1.0144155 2 1.761665
10001 30-Oct-87 Q 0.02000 1987 10 333 1.0144155 2 1.761665
10001 30-Nov-87 Q -0.02941 1987 11 334 1.0144155 2 1.761665
10001 31-Dec-87 Q -0.03354 1987 12 335 1.0144155 2 1.761665
10001 29-Jan-88 Q 0.06383 1988 1 336 1.0144155 2 1.761665
10001 29-Feb-88 Q 0.08000 1988 2 337 1.0144155 2 1.761665
10001 31-Mar-88 Q -0.07630 1988 3 338 1.0144155 2 1.761665
10001 29-Apr-88 Q 0.03061 1988 4 339 1.0144155 2 1.761665
10001 31-May-88 Q 0.01980 1988 5 340 1.0144155 2 1.761665
10001 30-Jun-88 Q -0.01204 1988 6 341 1.2076184 2 1.824549
10001 29-Jul-88 Q 0.03000 1988 7 342 1.2076184 2 1.824549
10001 31-Aug-88 Q 0.02913 1988 8 343 1.2076184 2 1.824549
10001 30-Sep-88 Q -0.021132076 1988 9 344 1.2076184 2 1.824549
10001 31-Oct-88 Q 0.039215688 1988 10 345 1.2076184 2 1.824549
10001 30-Nov-88 Q 0 1988 11 346 1.2076184 2 1.824549
where permno identifies the company, primexch identifies the stock exchange (Q=Nasdaq, N=NYSE, A=Amex), ret is the return, BM (book-to-market value) is calculated at the end of year t and becomes publicly available from June of year t+1 until May of year t+2, and finally MarketCap indicates the size of each company, calculated in June of year t and held constant until May of year t+1.
I should sort stocks into 5 quintiles according to their BM, and then double sort within each quintile according to companies' MarketCap. The quintile breakpoints should be calculated using only NYSE stocks ("N").
Therefore I should obtain 25 portfolios, sorted first on BM and second on MarketCap.
Finally, I should calculate the value-weighted monthly returns on these 25 portfolios from July of year t to June of year t+1.
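A hedged sketch for a single formation date (in practice this would sit inside a loop over formation dates): _pctile computes the NYSE-only breakpoints, which are then applied to all stocks, with the size sort nested within each BM quintile:

Code:
_pctile BM if primexch == "N", percentiles(20 40 60 80)
gen byte bm5 = 1 + (BM > r(r1)) + (BM > r(r2)) + (BM > r(r3)) + (BM > r(r4)) if !missing(BM)
gen byte size5 = .
forvalues q = 1/5 {
    _pctile MarketCap if primexch == "N" & bm5 == `q', percentiles(20 40 60 80)
    replace size5 = 1 + (MarketCap > r(r1)) + (MarketCap > r(r2)) ///
        + (MarketCap > r(r3)) + (MarketCap > r(r4)) if bm5 == `q' & !missing(MarketCap)
}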

PS: I tried to write code to calculate the value-weighted monthly returns on 10 deciles sorted on MarketCap for another calculation I had to do. Maybe it could be helpful:
forvalues i = 1(1)10 {
    egen num_return_dec_`i' = total(MarketCap * ret * !missing(MarketCap, ret)) if deciles_MarketCap==`i', by(datem)
    egen den_return_dec_`i' = total(MarketCap * !missing(MarketCap, ret)) if deciles_MarketCap==`i', by(datem)
    gen vw_return_dec`i' = num_return_dec_`i'/den_return_dec_`i' if deciles_MarketCap==`i'
}
Any help would be really appreciated, as I have been trying to solve this problem for a week.
Best regards,
Antonio

Generate variables with forvals

Hello Statalist,
I have a dataset which contains the following variables: firm (every firm has an assigned number from 1-1000), their products (every product has an assigned number), the costs of sales, the revenue of the sale, and the year. Now I am trying to generate two very similar variables and a third one. One displays the average sales for each year and each firm, and one displays the average costs for each year and each firm. The code should be the same for both, simply exchanging sales for costs. I am trying to construct this variable using a loop, but I don't get the right result. My first try was the following:

forval i = 1/1000 {
    forval j = 2001/2012 {
        sum ventas if firma == `i' & year == `j'
        gen ventap_`i'_`j' = `r(mean)'
    }
}



There are 1000 firms. However, the year coverage is not equal across firms: some firms have data from, for example, 2004-2009, and others have different periods (a lot of different periods). But the minimum of the year variable is 2001 and the maximum is 2012.

So when I run this code I encounter two problems. First, it fails when a firm doesn't have any observations for 2012 or other years (invalid syntax error). Second, it creates a variable for every single year, displaying the average for that year; however, I want just one variable displaying the average for the corresponding year for all cases.

The third variable I have to create is one that displays the product with the highest sales. The code should be similar to the first one, using two forvals containing the year and the firm, but it should probably use r(max) instead of r(mean). However, here I encounter the same problem that not all firms have data for all the years between 2001 and 2012, and it generates a lot of variables instead of just one showing the product ID with the highest sales for the corresponding year. A sketch of an alternative is below.
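For reference, a hedged sketch that avoids the loop entirely: egen with by handles unequal year coverage automatically, and sorting within firm-year gives the top product (costs and product are placeholder names for the cost and product-ID variables):

Code:
bysort firma year: egen ventap = mean(ventas)
bysort firma year: egen costp  = mean(costs)
* product with the highest sales in each firm-year (missing sales sort last, hence the guard)
bysort firma year (ventas): gen top_product = product[_N] if !missing(ventas[_N])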

I hope I explained it understandably and that you can help me.
Thanks a lot

Interaction between variables changes the results fundamentally!

Dear All,
I would appreciate your help on the following please:
The correlation between y and x1, x2 is negative, but the correlation between y and the interaction of x1 and x2 is positive; that's strange! Could somebody explain this to me, please?
[attached image not shown]

Weighted least squares (WLS) with wls0 and regwls

Dear Statalist,
I am conducting a long-run event study using the Fama-French three-factor model.
I am using WLS regression, and I want to use the monthly number of firms in the event portfolio as weights. Moreover, I want to use the equal-weighted monthly returns on each portfolio.
First of all, I imported the Excel file and converted Date2 from string into a monthly date variable (Date3); then I declared the data set to be time-series data.

I intend to use the Stata command wlsreg or wls0 with the options wvar() (the number of firms in the event portfolio in a month) and type(wlstype), where the choices include abse (absolute value of the residual) and e2 (residual squared). The dependent variable is the portfolio return; the explanatory variables are MktRF, SMB, and HML.
Could anyone please help me with the Stata command for WLS regression? I am very grateful.
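If the aim is simply to weight each monthly observation by the number of firms in the portfolio, analytic weights with regress may be enough (a hedged sketch; nfirms is a placeholder for the monthly firm-count variable):

Code:
tsset Date3
regress avg MktRF SMB HML [aw = nfirms]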

Thank you and Kind regards,
Chi

Which Flavor of ADF Estimation Does Stata Use?

Hey Everyone,

With WLS/ADF estimation there are different flavors out there in statistics software. It is rather easy to control the specific estimator in R, but in Stata I have not found any information on the exact source/reference the ADF estimation is built on.

My key interest is whether it is the pure Browne formula, or whether any of the adjustments that make WLS more robust to small sample sizes have been implemented.

Thanks and best
Leon

fill in empty adjacent cells within a group

Dear all,

I would like to ask how to fill in empty adjacent rows within a group. The problem here is the confusing data structure.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 subjid str21 personid str23 invtype
"a" "1" "Cranial Ultrasound Scan"
""  "1" "Other Ultrasound Scan"  
""  "1" "CT scan"                
""  "1" "X-ray"                  
""  "1" "EEG"                    
""  "1" "MRI"                    
""  "1" "ECHO"                   
""  "1" "ECG"                    
""  "1" ""                       
""  "1" ""                       
""  "1" ""                       
"b" "2" "Cranial Ultrasound Scan"
""  "2" "Other Ultrasound Scan"  
""  "2" "CT scan"                
""  "2" "X-ray"                  
""  "2" "EEG"                    
""  "2" "MRI"                    
""  "2" "ECHO"                   
""  "2" "ECG"                    
""  "2" ""                       
""  "2" ""                       
end

I want to put subjid like this.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 subjid str21 personid str23 invtype
"a" "1" "Cranial Ultrasound Scan"
"a" "1" "Other Ultrasound Scan"  
"a" "1" "CT scan"                
"a" "1" "X-ray"                  
"a" "1" "EEG"                    
"a" "1" "MRI"                    
"a" "1" "ECHO"                   
"a" "1" "ECG"                    
""  "1" ""                       
""  "1" ""                       
""  "1" ""                       
"b" "2" "Cranial Ultrasound Scan"
"b" "2" "Other Ultrasound Scan"  
"b" "2" "CT scan"                
"b" "2" "X-ray"                  
"b" "2" "EEG"                    
"b" "2" "MRI"                    
"b" "2" "ECHO"                   
"b" "2" "ECG"                    
""  "2" ""                       
""  "2" ""                       
end
Using personid is not a good idea because personid is repeated from row 4651 onward; it starts again with 1, 2, ...
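A hedged sketch that relies only on the row order within personid blocks (replace works through observations sequentially, so the value cascades down adjacent rows):

Code:
replace subjid = subjid[_n-1] if subjid == "" & invtype != "" & personid == personid[_n-1]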

If you let me know any solution for this, I would really appreciate it.

Kind regards,

Kim

Test for a unit root using panel data

Dear Community,

For my master's thesis I want to test whether the error term of my model is stationary, in order to verify whether the regression is spurious.
I am using unbalanced panel data, so based on what I read there are two possible tests: xtunitroot lps and xtunitroot fisher.

However, when I try the lps test, I get an error saying "insufficient observations", and when I run the fisher test it takes ages for my computer to compute the test statistic.
I believe that Stata computes the test for every panel (11,000 in my case).

Do you know a way to solve these issues?


Losing groups while running a regression

Hello,

I am trying to run a panel data regression. My group variable (cntry) has 41 countries, but when I run the regression the number of groups is reduced to 18. I cannot find anything about this, and Stata does not say anything about why these groups are left out. What are possible reasons that these groups are not taken into account when running the regression?

Latent class analysis categorical variables

Dear all,

I have been using latent class analysis in Stata 15 and have been able to get results using the gsem command for binary variables with the following code:

Code:
gsem (isch afib ckd dbt hyp pvd <- _cons), logit lclass(C 3)
However, when I try to analyse ordinal categorical variables, I get an error message:


Code:
. gsem (bmigroup hb_class age_gp <- _cons), ologit lclass(C 2)

invalid path specification;
ordinal response bmi_fr1 may not have an intercept
Could anyone help me with this?
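One hedged guess at the fix, based on the error text: ordinal responses in gsem carry cutpoints rather than intercepts, so dropping _cons from the path specification may be all that is needed:

Code:
gsem (bmigroup hb_class age_gp <- ), ologit lclass(C 2)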

Thank you

X11 Forwarding only showing half screen with Stata

I am using PuTTY to do SSH tunneling and X11 Forwarding with an Amazon EC2 Ubuntu instance. Multiple individuals in my system login to their own remote desktops, and then connect to the same EC2 instance.

I am currently trying to fix an issue where half of the screen is cut off for some users after the forwarding. I've tried:

1) Make sure everyone uses the same network, however this doesn't solve the problem, the full screen shows up for some users and only a cut off screen for other users

2) Logging into the remote desktop of the users who are experiencing the cut off screen problem (using my own laptop), however I'm not able to replicate the issue

3) Configure the Xming display settings in XLaunch, and have Xming display in one window instead of multiple windows. Jury is still out on whether or not this works, I haven't had the users try out the new configuration yet. Also, when I open save XLaunch configs, hit finish, and then later open up XLaunch again, "multiple windows" / default settings are selected, rather than "one window". So before I tell users to try out the "one window" setting, how can I make sure that my new configurations are actually saved?

4) Do the PuTTY connection and X11 forwarding set up through XLaunch, rather than through PuTTY. I think this would be rather complicated so haven't tried it yet... though willing to do so if it could solve the issue.

Thoughts? An image of the problem is shown below.

[attached image not shown]

xtivreg2 - identifying singleton observations

Dear All,

I have a panel dataset with 18,071 observations.

I am estimating the following model in stata:

xtivreg2 ret2_w mret2_w s_r10_lmcap s_r10_bm s_r10_mom s_r10_op_prof s_r10_agro s_r10_stdret s_r10_vol_s s_r10_lag_ue_p s_r10_lnumage s_r10_divy yr1-yr13 (tq2_centered_w = wklymret_w wklych_usd_w) if sample_to_use == 3, fe first liml cluster(cnum) endog(tq2_centered_w)

The output begins with the warning:

Warning - singleton groups detected. 194 observation(s) not used.

Partial output below indicates 17,877 observations were used (18071-194 = 17877)

Number of clusters (cnum) = 1132 Number of obs = 17877
F( 25, 1131) = 59.19
Prob > F = 0.0000
Total (centered) SS = 56.71687777 Centered R2 = -0.1134
Total (uncentered) SS = 56.71687777 Uncentered R2 = -0.1134
Residual SS = 63.15058851 Root MSE = .06141

When I generate descriptive statistics, I have complete data for all the variables in the above model for all 18,071 observations. No missing values.

Importantly, I count only 7 singletons in my 18,071-observation dataset:

. count if number == 1 & sample_to_use == 3
7

I would like to drop the 194 singletons. Could someone let me know how to identify and eliminate these 194 observations?
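A hedged sketch for locating them: count the observations each panel contributes to the estimation subsample, since a group can be a singleton within the subsample even when it is not one in the full data (panelvar is a placeholder for the xtset panel identifier):

Code:
egen insample = total(sample_to_use == 3), by(panelvar)
list panelvar if insample == 1 & sample_to_use == 3
drop if insample == 1 & sample_to_use == 3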

Best,

Srinivasan Rangan

Saving WTP estimates to conduct Poe Test

Dear all

I have run a clogit model and have estimated WTP via the wtp command with the krinsky option. I now want to save the WTP estimates so I can perform a Poe test, comparing them with WTP estimates from another clogit model. However, I do not see any option for saving these WTP estimates. The saving option is only available with the wtpcikr command, which I do not think can be used with a clogit model.

Can anyone help me with this please

Formatting and managing dates, from String to MMYYYY format

Dear All,
I am a new user in Stata.
I am having a basic question and would kindly ask for your help. I want to change the data from string format to a monthly date format (MMYYYY). I have tried date(Date2, "MY") and then created the monthly variable (Date3). However, the format is not what I expected.
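A hedged guess at the issue: date() builds daily dates, whereas monthly values need the monthly() function; a minimal sketch:

Code:
gen Date3 = monthly(Date2, "MY")
format Date3 %tm            // displays as e.g. 2019m5
* format Date3 %tmNN/CCYY   // would display as e.g. 05/2019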
Many thanks for your support,
Chi

Bootstrap

Hi everyone,

I need some help understanding why Stata doesn't let me use the command: bootstrap_b.
It writes: unrecognized command: bootstrap_b.
What can I do?
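One hedged guess: bootstrap _b is a prefix command followed by a space and a colon-separated estimation command, not a single command name; a minimal sketch:

Code:
sysuse auto, clear
bootstrap _b, reps(100) seed(12345): regress price mpg weight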

Thanks in advance.
Gal.

Advices on how to learn systematically how to work with panel

Hi everyone,

I have very elementary skills in econometrics and until now I've only worked with cross-sectional data. Now I need to work with panel data, but I feel I lack even the basic competences (even for descriptive statistics). Until now I've tried to fill my gaps "on the road", basically learning only the things that I needed immediately. I resorted to this "easy" strategy only because I'm really short of time.
But that's not working. I need more systematic training in how to explore my data and work with it when it has a panel dimension.
My handbook is not very helpful: the chapter on panels starts from regressions, and I want to know my data in detail and know how to work with it before I run regressions. There is probably a reason for that gap in my book (maybe I should look into time-series methods for descriptive statistics?), but I don't know it.

So my question is: considering that (independently of my will) I'm short of time, what book/video/online resource would you suggest for a systematic introduction to panel data in Stata, including all the "tricks" to describe and work with them? (I learned how to do many things the long way, and then found a much shorter way on Statalist... isn't there a way to learn these things systematically?)

Aurora


How to interpret the result of the "Total Factor Productivity of Manufacturing Firms" based on Levinsohn and Petrin (2003) approach?

I intend to measure the TFP of 23 manufacturing firms through a Cobb-Douglas production function approach, using the prodest command in Stata, for the period 2015-2017.
I am using the Levinsohn and Petrin (2003) approach with the attached Stata dataset. However, I got negative coefficients on lnL and lnK with the Levinsohn and Petrin (2003) approach; results have been attached in the form of the image below. These individual TFP values are then used as dependent variables and regressed on infrastructure stocks as an independent variable.
Stata Code:

prodest lnGVA, method (lp) free(lnL) proxy(lnInput) state(lnK) poly(3) valueadded reps(250)

predict TFP


Can anyone help to overcome this issue in the result? Please respond.
Famid year lnGVA lnK lnL lnInput
1 2015 13.34451139 14.43711069 13.82499642 14.94789177
2 2015 10.90103056 11.39432509 12.00363817 12.56028455
3 2015 10.52884158 10.90823019 11.74051512 12.56156862
4 2015 11.71408167 12.96707595 11.86919333 13.19120632
5 2015 10.78025708 10.57660072 11.29931136 12.61370021
6 2015 11.30195799 10.79404052 10.71557266 12.07061138
7 2015 13.89161883 14.69274188 13.91372923 15.59004602
8 2015 12.68505841 13.08795162 13.27239071 14.34357928
9 2015 12.17481436 12.49800879 11.90358822 13.0876142
10 2015 10.37213186 10.36546633 10.85971028 11.51539809
11 2015 11.89178185 12.93752458 11.87475212 13.15976277
12 2015 12.88529455 13.74124074 13.52565546 14.57748312
13 2015 11.26551282 11.99049128 12.59243988 13.35132999
14 2015 11.91772596 13.18896836 12.4565356 13.65617625
15 2015 14.08798489 14.43171365 14.08198176 15.39174269
16 2015 11.84720763 14.04701543 12.27763023 13.27226267
17 2015 11.84987474 12.32818954 13.05611887 13.71366131
18 2015 12.2899301 12.94906791 12.83675914 13.8142172
19 2015 13.31159488 14.0107976 14.37021545 14.99198913
20 2015 7.930242796 7.485056583 10.17564981 8.622648785
21 2015 12.58255248 13.30226199 13.42014082 14.52482222
22 2015 12.45019737 12.54385883 12.59546596 13.57663852
23 2015 11.82869258 13.06866314 13.13062515 14.08698579
1 2016 13.48373632 14.55212104 13.81087483 14.87352281
2 2016 11.09011635 12.09660422 12.06294103 12.5021024
3 2016 10.43491923 10.92367085 11.54507315 12.3586785
4 2016 11.26804467 13.0838402 11.83438558 13.05207522
5 2016 10.7530443 10.4765827 11.23227197 12.73695465
6 2016 11.41296781 10.8661762 10.80685514 12.14263568
7 2016 13.9810047 14.90279048 13.99064468 15.47850878
8 2016 12.75290698 13.25330751 13.23466654 14.43921191
9 2016 12.17262238 12.62311498 11.81393335 12.95478697
10 2016 10.49895051 10.55531489 10.86526777 11.58579992
11 2016 11.52578885 12.93922703 11.86191193 13.20034564
12 2016 12.99481134 13.78688988 13.55250289 14.51278126
13 2016 11.53622211 12.27852734 12.51736626 13.27750928
14 2016 12.20480924 13.55141515 12.49875718 13.69012924
15 2016 14.14469829 14.47629798 14.13122964 15.45322844
16 2016 11.80136403 14.22621284 12.25082132 13.35906515
17 2016 11.91459215 12.41077943 13.10543896 13.69703396
18 2016 12.39233098 13.06787942 12.88085472 13.9162249
19 2016 13.50721806 14.10146753 14.47325385 14.97086086
20 2016 7.590132471 7.815032882 10.07225939 8.626449627
21 2016 12.79879814 13.47297179 13.50166245 14.52501448
22 2016 12.73090918 12.64631381 12.64053977 13.87087559
23 2016 12.05060545 13.1477183 13.11664506 14.11136877
1 2017 13.37820655 14.57031481 13.87654921 14.99791944
2 2017 11.31482949 11.94872061 12.1067936 12.48835749
3 2017 10.48552974 11.50844226 11.50258216 12.33606399
4 2017 11.45326146 13.4451234 11.89512877 13.13305958
5 2017 10.53909949 10.41244812 11.2286642 12.71520385
6 2017 11.32904661 10.91523748 10.70495088 11.98065828
7 2017 13.91086343 15.06872287 14.03587425 15.54622029
8 2017 12.98490547 13.39216318 13.3848061 14.65935721
9 2017 12.08523011 12.39343758 11.86197541 12.95263912
10 2017 10.6732993 10.92220152 10.98576719 11.73012375
11 2017 11.90939933 13.25324863 11.88186489 13.18303098
12 2017 13.19309476 13.82810032 13.62542212 14.61800501
13 2017 11.72306923 12.43306157 12.42895616 13.41005668
14 2017 12.21973581 13.62180352 12.54387614 13.72163599
15 2017 14.10537468 14.43870418 14.12643932 15.34049703
16 2017 12.04438547 14.43877548 12.31398041 13.40533761
17 2017 11.98871848 12.33912744 13.18484604 13.69294326
18 2017 12.53221462 13.23962174 12.93065551 14.01079714
19 2017 13.5782688 14.25990109 14.51053547 15.04941816
20 2017 7.673050689 7.846913183 10.08397409 8.60560012
21 2017 13.23123952 13.50385468 13.57157867 14.59337051
22 2017 12.61675213 12.68825504 12.74948936 13.66224058
23 2017 12.22915794 13.35048112 13.11830917 14.14173054

PMG insufficient observation

Hi everyone,

I was trying to run a panel with the PMG estimator: N=36 over the period 1984-2016. When I run the command for the full panel, it is fine. But for the subsamples of developed and developing countries, I get this message: insufficient observations r(2001). Does anyone have an idea or suggestion? Please.

Regards,
Marwan

Finding code snippets from Stata's base commands

Dear Statalist,

Is there a way to see the programme code from Stata's own base commands?

In my case, I am interested in seeing exactly how the firstrow option of Stata's import excel command works, because I want to learn from it in order to do something similar (I want to be able to modify the column headers of my Excel file after importing, and only then should these headers become variable names, so I can't just use the firstrow option directly). But when I type sysdir and then locate the import_excel.ado file in my BASE folder, it contains only very limited reference code, not the full programme...
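For ado-file commands, viewsource displays the code (a hedged note: parts of import excel are built into the Stata executable and have no viewable ado source, which may be why the ado-file looks so thin):

Code:
viewsource import_excel.ado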

Many thanks,
Felix