Friday, May 31, 2019

clogit vs. xtlogit, fe

Is there a difference between clogit and xtlogit, fe? It appears to me they both do conditional logistic regression with fixed effects.
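For reference, a minimal sketch comparing the two on a standard panel dataset; as I understand the documentation, xtlogit, fe fits the conditional fixed-effects logit (via clogit internally), so the two should agree:

Code:
webuse union, clear
xtset idcode year
* two routes to the same conditional (fixed-effects) logit
clogit union age grade not_smsa, group(idcode)
xtlogit union age grade not_smsa, fe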

Time lag for multi-level fixed-effects panel data.

Hi all,
I have some doubts about time lagging for my independent variables in multi-way FE panel data (reghdfe). I have a panel model with a health outcome (ho) as my dependent variable and socioeconomic and health-services-system indicators (hi, hi_1, hi_2 and hi_3) as independent variables. As one of my IVs is the unemployment rate (ur), presumably with a lagged effect on my DV, I also suppose that the present unemployment rate affects the health outcome, in a cumulative or interactive way.

Let's say that my model is:

reghdfe ho ur gdp gini hi hi_1 hi_2 hi_3, absorb(state) vce(cluster state#year)

Suppose that I have ur (the current unemployment rate) and ur_1, ur_2 and ur_3 as unemployment rates lagged by 1, 2 and 3 years in my data set, and I want to observe the effect of three consecutive years of unemployment (or occupation rate) on the health outcome (ho, my DV).

Would it be correct to model:

reghdfe ho gdp gini hi hi_1 hi_2 hi_3 c.ur#c.ur_1#c.ur_2#c.ur_3, absorb(state) vce(cluster state#year)

or, instead of using my own lagged data, should I use Stata's "L." time-series operator for the lags of the unemployment variable? So:


reghdfe ho gdp gini hi hi_1 hi_2 hi_3 c.ur#cL1.ur#cL2.ur#cL3.ur, absorb(state) vce(cluster state#year)


Any suggestions for this model?
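For reference, a minimal sketch of a distributed-lag specification (as opposed to the four-way interaction above), assuming the panel has been declared with xtset and that state is numeric; variable names follow the post:

Code:
xtset state year
reghdfe ho gdp gini hi hi_1 hi_2 hi_3 ur L1.ur L2.ur L3.ur, absorb(state) vce(cluster state#year)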

Thanks in advance.

Alexandre Bugelli

-sreshape-

I just installed -sreshape- on Stata MP. Does anyone know if it has a maximum variable limit? I have 7,500 variables in wide format that I want to -sreshape- into long, but I keep getting an error message.

Lasso regress questions

Hi, I am a college student from Barcelona. It is hard to learn Stata by myself because the teacher does not explain how to use the commands or what they do, and we are in an introductory course. The homework for this weekend uses a dataset with wage and some covariates, and we should use the lasso and ridge approaches. He encouraged us to create as many variables as we can (I do not know why; dummies, etc.). But he told us that we should run net install elasticregress, replace and ssc install lassopack, replace. I suppose these install some new commands.

In the second question he says that we should use the commands rlasso and lassoregress. I do not know what the difference between the two commands is; I could not find it on the Internet. I also saw an extra command called lasso2. What do they do? Thank you.
2) Use the lasso methods (rlasso, lassoregress and ridgeregress) to select the most relevant covariates for the analysis.
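For what it's worth, a hedged sketch of where each command comes from and a minimal call for each (the dataset and covariates here are placeholders, not the homework data):

Code:
ssc install lassopack, replace        // provides lasso2, rlasso, cvlasso
net install elasticregress, replace   // provides lassoregress, ridgeregress, elasticregress
sysuse nlsw88, clear
rlasso wage age grade tenure hours        // lassopack: theory-driven ("rigorous") penalty
lasso2 wage age grade tenure hours        // lassopack: lasso over a grid of penalty values
lassoregress wage age grade tenure hours  // elasticregress: penalty chosen by cross-validation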

Add Text to Graph Combine

I am combining three graphs using 'graph combine'. By default they appear in a 2 x 2 arrangement with the lower right slot empty. I'd like to add text to this empty area. What's the best way to do this?

I have tried the 'caption' and 'note' options, with and without the 'position' suboption, but those either distort the shape or put the text at the far bottom of the combined graph.

I don't know if this would work, but perhaps I could save the text as a standalone .gph file and add it that way; or perhaps there is an option I'm missing for 'graph combine' whereby you can place text anywhere you like (using coordinates, not clock positions).
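One possible workaround, as a hedged sketch: build an otherwise empty plot that carries only the text, and pass it to graph combine as the fourth graph (g1, g2 and g3 stand for the three saved graphs):

Code:
twoway scatteri 1 1, msymbol(i) text(1 1 "Notes can go here") ///
    xscale(off) yscale(off) plotregion(style(none)) name(txt, replace)
graph combine g1 g2 g3 txt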

Obtaining mean and SD from survey data

Hello,

I am trying to get the difference in length of stay (LOS) in the myocarditis subpopulation, categorized by whether or not patients have arrhythmia (Tarry). I get the mean and standard error, but I would like the mean and standard deviation. How would I be able to get that? Thanks.

This is what I did and what I got.
. svy linearized, subpop(myocarditis) : mean LOS, over(Tarry)


0: Tarry = 0
1: Tarry = 1

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
LOS          |
           0 |   8.035982   .3003205      7.447308    8.624657
           1 |   15.02443   1.001127      13.06207    16.98679
--------------------------------------------------------------
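If the svyset declaration is already in place, running estat sd right after the mean estimation may give what is wanted; per [SVY] estat, it reports subpopulation standard deviations alongside the means (a minimal sketch):

Code:
svy linearized, subpop(myocarditis) : mean LOS, over(Tarry)
estat sd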

Showing country-specific treatment effects

Hello,

I'm currently working with a dataset with individual respondents. In my analysis, I show the average treatment effect for the treated group. I suspect, however, that treatment effects vary across countries. How can I show this?

I wish to report average treatment effects for several countries as shown in the picture (from another analysis):

[attached image not shown]
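A hedged sketch of one common way to display country-specific effects, via a treatment-by-country interaction and margins (outcome, treated and country are placeholder names):

Code:
regress outcome i.treated##i.country, vce(cluster country)
margins country, dydx(treated)
marginsplot, horizontal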

Standard Error Correction in a two step process

Dear all
I wonder if anyone has any references, and perhaps Stata applications, that can help me solve a problem like the following:
Code:
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
* This is the benchmark
reg lnwage educ exper tenure female single
est sto m1
* now, i can create the following variables:
gen double female2=female*_b[female]
gen double educ2=educ*_b[educ]
gen double femaleeduc=female2+educ2
gen double lnwage2=lnwage-female2-educ2
* And estimate this model
reg lnwage femaleeduc exper tenure single
est sto m2
* or this other
reg lnwage2 exper tenure single
est sto m3

est tab m1 m2 m3, se
-----------------------------------------------------
    Variable |     m1           m2           m3      
-------------+---------------------------------------
        educ |  .08303358                            
             |  .00513458                            
       exper |  .00895939    .00895939    .00895939  
             |  .00156103    .00155551    .00153964  
      tenure |  .00613849    .00613849    .00613849  
             |  .00187615    .00186656    .00185977  
      female | -.09382159                            
             |  .02490175                            
      single | -.16080964   -.16080964   -.16080964  
             |  .02702903    .02694462     .0269209  
  femaleeduc |                       1               
             |               .05800787               
       _cons |  2.3408107    2.3408107    2.3408107  
             |  .07085517    .06106525    .02653445  
-----------------------------------------------------
                                         legend: b/se
All models provide the same point estimates, but different standard errors. The benchmark is column 1.
Column 2 combines the effects of female and educ and adds the combination to the model, and column 3 simply removes the effects of female and education from lnwage before estimating the model.
My question is, does anyone know how to correct the standard errors from model 3 or 2, and obtain the "correct" ones from model 1?
I know this could be done using bootstrap methods, but I'm trying to see if it can be done in a different way.

For more detail on the motivation: I'm revisiting Robinson's semiparametric estimator (see reference below). I'm aware of the user-written command -semipar-. However, the application itself does not provide much detail on how the standard errors of the nonparametric section are estimated.
Looking through the code, it proceeds in a way similar to column 3, but what we want are the results from column 1.
Thank you in advance.

Robinson, P. M. 1988. Root-n-consistent semiparametric regression. Econometrica 56: 931–954.

WLS regression using regwls and regress

Dear All,

I have a question regarding WLS regression using Stata commands regwls and regress and would kindly ask for your help.
For your information, the general idea is that I want to use WLS regression with the monthly number of firms for each observation as weights (I created the weight column in Excel and imported it into Stata, since I do not know how to generate it in Stata). Additionally, the "avg" variable is the equal-weighted monthly return on each portfolio; this is my dependent variable.
Firstly, I used the command "regwls avg MktRF SMB HML [aw=1/Weight]" for the WLS regression with analytic weights (however, this command does not work in my Stata 13 version). Later, I tried the command "regress avg MktRF SMB HML [aw=1/Weight]" and it worked.

Could someone please let me know if the two commands are the same and whether my approach is correct?
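For what it's worth, a hedged sketch of generating the weight inside Stata rather than in Excel (month and firmid are placeholder names). Note that [aw=nfirms] weights each observation by the number of firms, whereas [aw=1/Weight] weights by its inverse; it is worth double-checking which is intended:

Code:
bysort month: egen nfirms = count(firmid)
regress avg MktRF SMB HML [aw = nfirms]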
Many thanks for your help!

Best regards,
Chi

Inverted Normal Graph

Hello all,

Admittedly a mundane question here... I'm simply trying to plot an inverted normal distribution for an upcoming presentation. I've successfully plotted a normal distribution, but now I simply need to flip it upside down.

Code:
clear
set obs 100
gen x=rnormal(0,1)
twoway function y=normalden(x), range(-4 4) xtitle("{it: x}") ///
ytitle("Density") title("Standard Normal")
Any suggestions would be greatly appreciated! I recently converted to Stata from SPSS, so apologies for what is presumably a rather elementary question!
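One simple possibility is to plot the negative of the density, or to keep the density as-is and reverse the y axis (a minimal sketch):

Code:
twoway function y = -normalden(x), range(-4 4) xtitle("{it:x}") ///
    ytitle("Density") title("Inverted Standard Normal")
* alternatively, reverse the axis instead of negating the function:
twoway function y = normalden(x), range(-4 4) yscale(reverse)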

J.

How to do matrix exponential operation in Stata?

Code:
matrix A = (1,0,0,0,0\0.6,0,.4,0,0\0,.6,0,.4,0\0,0,.6,0,.4\0,0,0,0,1)
matrix list A
matrix B = A*A
I know how to do A^2. Now I want to do A^20. How should I do this?
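Repeated multiplication in a loop is one straightforward way to compute the matrix power (a minimal sketch):

Code:
matrix P = A
forvalues i = 2/20 {
    matrix P = P * A
}
matrix list P   // P now holds A^20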

Many thanks in advance!

gsem covstruct

Hello.
I am running an LPA analysis using the gsem command. I have also run the analysis in R using the mclust package. The problem I have is that I don't get similar results. For example, my best model in R (based on BIC etc.) is a 3-class, so-called VVI model (that is, varying volume and shape, and identity for the orientation). In Stata, I am trying to impose the same constraints (that is, I want all parameters to vary freely), and I am not sure I am succeeding. I have tried lcinvariant(none) and covstruct(e._LEn, diagonal) and I get similar but not the same results.
Is anyone familiar with this?
Thank you a lot.

Generate balance table

Hi, I'm trying to replicate this balance table (as in the picture) using some of the example datasets installed with Stata; in particular I was trying to use bplong.dta. However, I haven't been able to do so. I found the command iebaltab to produce this kind of table, but I'm having problems understanding how it works. Do you have any idea how I can do this?


[attached image not shown]
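A hedged minimal sketch with bplong.dta (iebaltab ships with the ietoolkit package; the choice of balance and grouping variables here is just one plausible reading of the exercise):

Code:
ssc install ietoolkit
sysuse bplong, clear
iebaltab bp, grpvar(when)   // balance of blood pressure across the Before/After groups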

extract variable labels for new variable names

Hi there

I try to automate my programming as much as possible, and one challenge I've come up against recently is naming new variables according to the value labels of existing variables.

For example:

Code:
sysuse sandstone
tab type, gen(type_name)
My goal is to name the new variables:
type_measured
type_estimated
type_interpolated
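A hedged sketch of one way to do this, assuming type is numeric with value labels (which tab, gen() suggests): loop over the levels of type and turn each value label into a legal variable name (strtoname() guards against spaces and other awkward characters):

Code:
sysuse sandstone, clear
levelsof type, local(levels)
foreach l of local levels {
    local lbl : label (type) `l'
    local lbl = strlower(strtoname("`lbl'"))
    gen byte type_`lbl' = (type == `l')
}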

Any advice greatly appreciated.

Count number of cases if dates are within a certain range (a la statsby)

Greetings all,

I have single-line-per-observation survival data (4 million lines). Here is a simplified example:
default   zip code   date_start   date_end   date_default
1         12345      2000q2       2016q1     2005q3
0         54321      1993q4       2016q1
1         13467      2003q1       2016q1     2010q1
One thing I'd like to do with my data is to understand the default rate per quarter, by department. I'd ultimately like to construct a second panel (or a first one, since this isn't per se a panel as is) where I have the different zip codes as the subjects followed through time, and in the end the rate of default per period. I have unemployment data that is already organized in this fashion, and naturally I want to combine it with a default rate (# defaults / # "alive" or "at risk" loans) per zip code:
zip code   date     unemployment   default rate
11111      1990q1   4.2            x
11111      1990q2   4.1            x
11111      1990q3   4.6            x

One guess was to create some new variable that uniquely identifies zip code/quarter combinations, and then to run statsby on this. But that would imply ~12,000 groups (100 zip codes * 30 years * 4 quarters), and that just doesn't seem right/efficient.

It shouldn't be hard for me to find a way to count the defaults per quarter/department (although I can't do tab default department zipcode, as this is too many variables), but I must confess I have no idea where to start on counting (and organizing in a new panel, without Excel) the at-risk loans per quarter.
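One rough intuition, as a hedged sketch: convert the quarter strings to Stata quarterly dates, expand each loan to one row per quarter it is alive, and collapse to zip-code/quarter cells (the variable names and string date format are assumptions about the data):

Code:
gen qstart = quarterly(date_start, "YQ")
gen qend   = quarterly(date_end, "YQ")
gen qdef   = quarterly(date_default, "YQ")
gen long obsid = _n
expand qend - qstart + 1
bysort obsid: gen qdate = qstart + _n - 1
format qstart qend qdef qdate %tq
drop if qdef < . & qdate > qdef            // loans exit the risk set after defaulting
gen byte atrisk    = 1
gen byte defaulted = (qdate == qdef)
collapse (sum) atrisk defaulted, by(zipcode qdate)
gen default_rate = defaulted / atrisk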

Thank you so much for even some rough intuitions about how to go about this in Stata.

Have a great day,
John



Multinomial logit with sample selection

Dear everyone,


I am looking for something similar to the Heckman selection model/svysemlog, with a modification.

I have a selection variable with two values (0 and 1) in the first step, and a multinomial non-ordinal categorical variable (with six categories) in the second step.
I am interested only in positive (1) values in the first step (around 30% of the total sample).

What I did in the first place was (a) a logit analysis for the first step and (b) a multinomial logit for the second step. However, I was advised to use the Heckman selection model for multiple reasons.
However, if I am not mistaken, Heckman (and svysemlog) cannot be used if the outcome variable is non-ordinal.


I have two questions:


a. Is there any Stata package that addresses my problem?
b. Do you have any advice how to proceed, in case there is no ready-made solution in Stata?


Thanks in advance!




Manually installing Blindschemes by Daniel Bischof

Dear Statalisters

I admit this is a bit of a non-problem, but I'd like to find a solution nonetheless. Never underestimate a nice graph.

I'm trying to use Daniel Bischof's schemes for making graphs (found here: https://danbischof.com/2015/02/04/stata-figure-schemes/). My organisation doesn't allow installing via ssc, so I downloaded all the scheme and style files and added them to the folder where all my other ado-files are stored. I saved the color files both in a separate folder called "style" (this is what ssc does, I think) and in the same folder as the scheme files. Now, when I set the scheme to plotplainblind, the graphs come out in that scheme, but in black and white. The command doesn't seem to find the colors. So I think I need to define these colors first in some way, but I don't know how. Any suggestions?
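A hedged sketch of a manual install (file names and paths are assumptions; the key point is that both the scheme-*.scheme files and the color-*.style files must sit somewhere on the ado-path, such as the PERSONAL directory):

Code:
display c(sysdir_personal)                              // Stata's PERSONAL ado directory
copy scheme-plotplainblind.scheme "`c(sysdir_personal)'", replace
copy color-sky.style "`c(sysdir_personal)'", replace    // repeat for each color-*.style file
adopath                                                 // verify PERSONAL is on the search path
set scheme plotplainblind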

Many thanks

Carolin


How to declare data with tournament structure as panel data?

Dear all,

I recently read some papers using panel data from sports, and I started to wonder how one would actually declare data, e.g. from tennis, to be panel data.
Typically, in tennis there is a season which consists of several tournaments. In turn, each of these tournaments consists of several matches. Each match consists of a sequence of sets, and a set in turn consists of a sequence of games.

So, one observation is for player x from game g in set s of match m played in tournament t in season z. If there are separate variables indicating the season (e.g. 2015), the tournament (e.g. 1), the match (e.g. 1), the set (e.g. 1), and the game (e.g. 1), how would one declare the data to be panel while keeping the structure described above? I included the code for a sample data set below.

Obviously, the panelvar in the xtset-command would be player_id. But how would one set the timevar if one's goal was to run a panel data regression (e.g. using xtreg) at the game-level which includes time lags (e.g. matchlevelstat1 from the previous match as well as gamelevelstat1 and gamelevelstat2 from the previous game, which might actually be from the same tournament and same match but from the previous set of that match) as independent variables?



Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year player_id tournament match) byte(set game) float(gamelevelstat1 gamelvelstat2 setlevelstat1 matchlevelstat1 tournamentlevelstat1 yearlevelstat1)
2018 1 1 1 1 1  19 19  22 26  57 17
2018 1 1 1 1 2  64 19 100 39   3 47
2018 1 1 1 2 1 100 32  79 93  32 92
2018 1 1 1 2 2  67 70  15 63  82 88
2018 1 1 2 1 1  86 12  83 92  55 50
2018 1 1 2 1 2  67 97  95 93 100 48
2018 1 1 2 2 1  14 53  58 28  26  6
2018 1 2 1 1 1   8 78   6 35  22 41
2018 1 2 1 1 2  87 85  68 55  98 17
2018 1 2 1 2 1  32 56  87 69  40 94
2018 1 2 1 2 2  47 24  42 89  32 99
2018 1 2 2 1 1  16 98  38 85  21 11
2018 1 2 2 1 2  88  1  87 60  96 28
2018 1 2 2 2 1  14 72  50 19  55 14
2019 1 1 1 1 1  34 48  16 38  95 44
2019 1 1 1 1 2  73  6  25 26  93 96
2019 1 1 1 2 1  92 27  48 89  68 99
2019 1 1 1 2 2  62 66  66 27  80 22
2019 1 1 2 1 1  69 46  40  2  90 59
2019 1 1 2 1 2  27 74  55 13  14 73
2019 1 1 2 2 1  11 61  75 26  73 26
2019 1 2 1 1 1  12 43  16 28  58 15
2019 1 2 1 1 2  49 49  91 83  61 35
2019 1 2 1 2 1  71  1  62 90  50 54
2019 1 2 1 2 2  88 53   6 58  40 99
2019 1 2 2 1 1  84 13  33 96   3 30
2019 1 2 2 1 2  79 68  80 18  86 19
2019 1 2 2 2 1  52  5  77 17  36 48
2018 2 1 1 1 1  59 67   5 29  96 22
2018 2 1 1 1 2  89 34  22 69 100 40
2018 2 1 1 2 1   5 74   8 49  97 83
2018 2 1 1 2 2  58 91  44 66  58 62
2018 2 1 2 1 1  96 77  73 53  59 62
2018 2 1 2 1 2  90 38  32 80   2 42
2018 2 1 2 2 1  79 43  90 18   6  1
2018 2 2 1 1 1  49 85  38 25  95 33
2018 2 2 1 1 2  23 35  35 51   9 53
2018 2 2 1 2 1   9 92  49 98  91 44
2018 2 2 1 2 2  78  9  26 81  23 39
2018 2 2 2 1 1  85 13  98 55   8 77
2018 2 2 2 1 2  24 38  75 12   1 53
2018 2 2 2 2 1  65 91  31 49  96 70
2019 2 1 1 1 1 100 38   9 86  15 83
2019 2 1 1 1 2  78  3  94  9  32 26
2019 2 1 1 2 1  73 40  41 62  60 59
2019 2 1 1 2 2   2 30  26 62  78 49
2019 2 1 2 1 1  21 83  58 10  25 16
2019 2 1 2 1 2  63 92  78  4  29 23
2019 2 1 2 2 1  98 67  59 61  82 62
2019 2 2 1 1 1  75 48  72 25  14 64
2019 2 2 1 1 2  87 76  87 98  60  7
2019 2 2 1 2 1  42 40  38 12  61 29
2019 2 2 1 2 2  12 82  72 48  61 59
2019 2 2 2 1 1  35 42  50 24  14 17
2019 2 2 2 1 2  84 73  75 25  25 72
2019 2 2 2 2 1  50 85  79  8  56 52
end
label var year "season"
label var player_id "player "
label var tournament "tournament number"
label var match "match number"
label var set "set"
label var game "game"
label var gamelevelstat1 "game-level statistic 1"
label var gamelvelstat2 "game-level statistic 2"
label var setlevelstat1 "set-level statistic 1"
label var matchlevelstat1 "match-level statistic 1"
label var tournamentlevelstat1 "tournament-level statistic 1"
label var yearlevelstat1 "season-level statistic 1"
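For reference, a hedged sketch using the sample above: since xtset needs a single time index, one option is a sequential within-player game counter, after which L1. refers to the previous game in chronological order (note this treats games as equally spaced and ignores the boundaries between sets, matches and tournaments, so a true previous-match lag would still need its own construction):

Code:
sort player_id year tournament match set game
by player_id: gen long gameseq = _n
xtset player_id gameseq
xtreg gamelevelstat1 L1.gamelevelstat1 L1.setlevelstat1, fe   // illustrative only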

Drop ID if different observations for that same ID do not vary across another variable

Hello,

I am using Stata 14.2 on Windows. This is my first post so I hope I am doing this correctly.

The dataset I am using contains around 100,000 observations with information about buildings.
Each building has an ID number like 344100000000006, followed by an address, (some more variables that are not important for the question,) and the function (labeled with values 1-12).
One building can contain multiple living units, a store on the ground floor, etc. These units are all separate observations with the same building ID (so they have the same address and differ, if at all, only in function). Therefore one building ID can occur, for example, 16 times.

I want to know which buildings have more than one function, like building with ID 344100000000042, which is used for both function 3 and 12.
I am not interested in buildings with only one function so I want to drop them from the data set.

I believe I need to combine different observations with the same ID into one, and while this is an issue many forum users are struggling with, I am not experienced enough with Stata to apply suggestions for other problems to my own case. Therefore I sincerely hope someone is willing to help me.

The data looks like this: (I excluded other variables that are not important to the question)

* Example generated by -dataex-. To install: ssc install dataex
clear
input double gebwbagidgetal long gebruiksdoel_n
344100000000006 12
344100000000006 12
344100000000008 12
344100000000008 12
344100000000011 12
344100000000011 12
344100000000011 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000014 12
344100000000016 12
344100000000016 12
344100000000029 12
344100000000029 12
344100000000029 12
344100000000029 12
344100000000029 12
344100000000039 12
344100000000039 12
344100000000039 12
344100000000039 12
344100000000039 12
344100000000041 12
344100000000041 12
344100000000042 3
344100000000042 12
344100000000053 12
344100000000053 12
344100000000061 3
344100000000061 12
344100000000061 12
344100000000061 12
344100000000061 12
344100000000061 12
344100000000064 12
344100000000064 12
344100000000074 12
344100000000074 12
344100000000074 3
344100000000074 12
344100000000074 12
344100000000074 12
344100000000074 12
344100000000074 12
344100000000074 12
344100000000079 12
344100000000079 12
344100000000079 12
344100000000079 12
344100000000079 12
344100000000082 12
344100000000082 3
344100000000084 12
344100000000084 3
344100000000084 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000089 12
344100000000090 12
344100000000090 12
344100000000090 12
344100000000091 3
344100000000091 12
344100000000098 3
344100000000098 12
344100000000102 3
344100000000102 12
344100000000106 12
344100000000106 12
344100000000109 3
344100000000109 12
344100000000114 3
344100000000114 3
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
344100000000116 12
end
label values gebruiksdoel_n gebruiksdoel_n
label def gebruiksdoel_n 3 "gemengd", modify
label def gebruiksdoel_n 12 "woonfunctie", modify
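A hedged sketch of one standard idiom for the question, using the example data: after sorting functions within building ID, the first and last values differ exactly when a building has more than one distinct function:

Code:
bysort gebwbagidgetal (gebruiksdoel_n): gen byte multifunc = gebruiksdoel_n[1] != gebruiksdoel_n[_N]
keep if multifunc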

Mixed or reg i.country i.year for repeated cross-section data

Dear all,

I am analysing time-series cross-sectional data from 4 waves and around 25 countries, using Stata 14. The dataset is the International Social Survey Programme, years 1988, 1994, 2002 and 2012. My main variable of interest is female hours worked per week (originally WRKHRS; for the analysis I generated work hours for females only, 0 otherwise) and how they are affected by the benefit amount/presence in the country. First I had these benefits as a percentage of expenditure per GDP, but my supervisor told me to generate dummies (0 for no benefit and 1 for the benefit) for all the different types I had; I have them both ways now. The research has two parts: the first is a regression of female hours worked per week on the different types of benefits; the second focuses on analyzing attitudes - support for traditional gender roles of men - comparing between countries.

I want to do an individual-level analysis (within respondents) of the effect based on education##benefit, marital status, attendance of religious services and presence of a child. At the country level I have the benefits plus unemployment rates, labor force participation for men and women, total fertility rate and types of expenditure: public total, in-kind % of GDP, in-cash % of GDP and real GDP forecast. I know it's too much; I won't be using all of them, just letting you know what I have.

I was planning to use the mixed command, starting with a basic mixed femworkhours || countryid: and building on that, adding more level-1 predictors and then level-2. However, I cannot declare this a panel data set because of repeated time values, so I set it as xtset countryid (as I read somewhere in this forum, that is an option for repeated cross-section data). Since this is my thesis, I asked my supervisor whether I should use mixed or a simple reg with i.countryid i.wave, and he suggested reg with i.countryid i.year. Nevertheless, when I regress, there does not seem to be a significant (though small) country effect, and it comes out that the first part of the analysis ignores country and year effects. Could the problem be that, if I run a basic regression with fixed country and year effects, I should use mean hours worked by country rather than the individual level? I was browsing this forum and the internet and unfortunately could not find the answers I was looking for.

Hence the question: what would you suggest doing with this data? The variable femworkhours in the extract below looks mostly missing, but that is not representative of the full data set: I ran the mdesc command and 33% of the total sample is missing (the values range from 0-80 hours worked per week). I hope this question is clear enough to understand; if not, please let me know where I can elaborate.

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(femworkhours married incgroup fulltime parttime attend1) byte educ float(dbgrant drealfam dincmaint ddaycare dpleave dchildall wave countryid)
0 0 1 0 0 1 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 0 3 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 1 0 1 0 0 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 0 1 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 2 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 1 1 0 0 1 0 1 0 0 0 0 2 1
. 0 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 0 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 1 0 0 1 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
0 0 3 0 0 0 1 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 4 1 0 1 2 0 1 0 0 0 0 2 1
. 1 5 1 0 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 0 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 0 3 0 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 3 1 0 0 1 0 1 0 0 0 0 2 1
0 1 5 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 1 3 0 1 0 0 0 0 2 1
. 0 3 0 1 0 2 0 1 0 0 0 0 2 1
. 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
0 1 1 0 0 0 1 0 1 0 0 0 0 2 1
. 1 1 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
0 1 4 0 0 0 3 0 1 0 0 0 0 2 1
. 1 5 1 0 1 3 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 1 1 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 2 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 0 1 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 1 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 1 3 0 0 0 1 0 1 0 0 0 0 2 1
. 1 3 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 1 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 0 5 1 0 0 3 0 1 0 0 0 0 2 1
. 0 4 1 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 1 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
. 0 4 1 0 0 3 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 2 0 1 0 0 0 0 2 1
. 0 1 1 0 0 2 0 1 0 0 0 0 2 1
. 1 3 1 0 0 2 0 1 0 0 0 0 2 1
. 1 4 0 1 0 1 0 1 0 0 0 0 2 1
. 1 5 1 0 0 1 0 1 0 0 0 0 2 1
0 0 4 0 0 0 1 0 1 0 0 0 0 2 1
. 1 4 1 0 0 1 0 1 0 0 0 0 2 1
end
label values incgroup incgroup
label def incgroup 1 "10%", modify
label def incgroup 2 "25%", modify
label def incgroup 3 "50%", modify
label def incgroup 4 "75%", modify
label def incgroup 5 "90%", modify
label values fulltime employed
label def employed 0 "not fulltime", modify
label def employed 1 "fulltime", modify
label values educ educ
label def educ 0 "no education", modify
label def educ 1 "primary/lower secondary", modify
label def educ 2 "upper/post secondary", modify
label def educ 3 "lower/upper tertiary", modify
label values wave wave
label def wave 2 "1994", modify
label values countryid countryid
label def countryid 1 "AU", modify
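For reference, a hedged sketch of the two candidate specifications using the variables in the example above (the covariate choice is illustrative, not a recommendation):

Code:
mixed femworkhours i.educ##i.dbgrant married attend1 i.wave || countryid:
regress femworkhours i.educ##i.dbgrant married attend1 i.countryid i.wave, vce(cluster countryid)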




destring numbers in scientific notation

Dear community,

I was inattentive when pasting data into a new Stata file, and now the following problem presents itself: I have a unique numeric identifier with very large numbers, such that Stata abbreviated it to scientific notation, e.g. 1.7876423e+11. In the new file this numeric identifier appears as a string and contains commas (",") instead of dots. I have now tried to destring this varlist, unsuccessfully.

Is there someone who has encountered a similar problem before and could help me out?
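A hedged sketch (idstr is a placeholder name for the string identifier). One caveat: if digits were already rounded away when the value was displayed in scientific notation, no conversion can recover them:

Code:
replace idstr = subinstr(idstr, ",", ".", .)   // decimal commas back to dots
destring idstr, generate(idnum)
format idnum %15.0f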

Kind regards,

Marie

Interpretation of the interaction term when the relevant dummy variable is insignificant

Dear all,

In the model that I have run to analyse the effect of currency swaps on the gross capital flows of the countries signing them, I have included a dummy variable for the signing of such a currency swap agreement (signing=1) as well as a dummy for whether the country is a developed or a developing economy (developing=1). To check whether the effect of a currency swap differs between developing and developed countries, I have included an interaction term between these two dummy variables (signing=1 and developing=1). However, the results render my interaction term significant at the 1% level with a positive coefficient, while my dummy variable for the signing of the currency swap is negative (as expected) but insignificant. How do I interpret these results? Because the original currency swap dummy is insignificant, it is impossible to conclude how large the positive effect of a currency swap is for a developing country, right? However, does it still allow me to say that a positive relationship exists, but that its size is unclear due to the insignificance of the foregoing dummy variable? Many thanks in advance!

Kind regards,

Owen

Thursday, May 30, 2019

categorizing data

Hi all, I have a dataset that includes up to 30 string variables. Some of them are dummy variables and the others are categorical with a limited number of categories. I'm trying to categorize the data according to their common features. A potential approach is to use the "tabulate" command; however, tabulating 30 variables makes no sense and is difficult even with a prefix command like "by".
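One hedged possibility is to collapse the 30 variables into a single grouping variable whose levels are the observed combinations (v1-v30 is a placeholder varlist):

Code:
egen pattern = group(v1-v30), label missing
tab pattern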

Droping observations with x amount of missing values

Dear all,
I'm working with a messy data set of approximately 870 observations at the moment.
After using the command missings table, I realised that 91 observations have 99 missing values out of 108 variables. I used missings list, min(99) to see which observations account for this.
Now I want to drop these observations from the data set. I wonder if there is a command that would use the information produced by missings list, min(99) to drop them?
Can anyone help? I've been looking for a solution for quite some time, without success.
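A hedged sketch that sidesteps missings list entirely, counting missing values per observation directly (v1-v108 is a placeholder for the 108 variables):

Code:
egen nmiss = rowmiss(v1-v108)
drop if nmiss >= 99
drop nmiss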

Thank you.

Meta analysis of hazard ratio

Dear all,
My name is Hatem Ali
I am trying to do a meta-analysis of hazard ratios to assess the effect of a rise in IL-6 on overall survival.
I have the following data:
Study                                                                Sample size   p value   CI lower   CI upper   Notes
Pecoits-Filho (HR for overall mortality; IL-6 higher in CVD group)   99            0.01                             chi square = 11.3
Liu et al                                                            50            0.001                            OR = 6.9
Cho et al (trend over time)                                          175           0.03      1.31       87.75       OR = 10.72
Lambie et al                                                         575           0.008     1.22       3.78        HR = 2.15
Lambie et al 2                                                       384           0.009     1.28       5.58        HR = 2.68
Wang et al (mortality and coronary calcification)                    152           0.003     1.53       8.26        HR = 3.56
As you can see, I have the sample size, the p value, and the lower and upper confidence limits.
However, only 3 studies report an HR; 2 report an OR, and one reports a chi-square.

Is there a way to calculate a hazard ratio from the studies reporting an odds ratio or a chi-square?
In other words, can I convert an odds ratio to a hazard ratio? And can I convert a chi-square to a hazard ratio?

If that is not possible, then can I calculate a relative risk for each study from the data I have?
Can I calculate RR from HR, 95% CI, sample size and p value?
Can I calculate RR from OR, 95% CI, sample size and p value?
Can I calculate RR from chi-square, sample size and p value?

Finally, after calculating the HRs, the syntax to use is: metan HR lower higher, counts random
Is that correct?
How can I add the names of the studies to the forest plot using this syntax?
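On the study names specifically, a hedged sketch: metan has a label() option, and for ratio measures the effect and confidence limits are usually supplied on the log scale with eform for display (study, hr, lower and upper are placeholder variable names):

Code:
gen lnhr = ln(hr)
gen lnlb = ln(lower)
gen lnub = ln(upper)
metan lnhr lnlb lnub, random eform label(namevar=study)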

Looking forward to hearing back from you.

Creating a value that equals the average of other values for the same variable

(Sorry about the poor phrasing of the question.)
My dataset contains variables such as country and prevalence of obesity for 34 countries. I want to create a new value of the country variable that equals the average obesity across all the countries, i.e. there will be 35 categories under the country variable. Is there any command to do that in Stata 14?
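A hedged sketch of one way (obesity and country are the variable names from the post; the new code 35 and the label text are assumptions, and country is assumed to be numeric with a value label of the same name):

Code:
preserve
collapse (mean) obesity
gen country = 35                  // new code for the "average" category
tempfile avgrow
save `avgrow'
restore
append using `avgrow'
label define country 35 "All-country average", modify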

Collinearity error when including continuous variable in dummy regression

I am trying to run the following two regressions to compare the coefficients on the 'iso_str' dummies. The only difference between the two is that the second one includes the variable 'shr'.

1.
Code:
reg lncost ib6.iso_str i.var_str, eform(exp_coeff) baselevels
2.
Code:
reg lncost ib6.iso_str shr i.var_str, eform(exp_coeff) baselevels
When I run regression (2) above, Stata omits the 'shr' variable because of collinearity.

Then, I tried an alternative formulation of the above two regressions to see if this way I could compare their coefficients. Again, the only difference is the inclusion of the variable 'shr' in the second regression.

3.
Code:
reg lncost ib6.iso_str ibn.var_str, noconstant eform(exp_coeff) baselevels
4.
Code:
reg lncost ib6.iso_str shr ibn.var_str, noconstant eform(exp_coeff) baselevels
Notice that the coefficients for the 'iso_str' dummies in (3) are identical to those in (4). However, I still can't compare (3) vs. (4): this time Stata doesn't omit the 'shr' variable in regression (4), but it omits one of the 'var_str' dummies instead (again because of collinearity), even though I used the ibn. prefix so that none would be dropped!

How can I compare the 'iso_str' coefficients outputted by these two regressions, with and without the variable 'shr'? Perhaps there is a way around the collinearity issue I am facing, e.g. rearranging my data differently?

Thank you. An excerpt of my data is below.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 iso str5 var double(cost shr) long(iso_str var_str) float lncost
"CIV" "x1105" 11458.3333333333             49.674 1 1  9.346473
"COD" "x1105" 44083.2888217523              56.12 2 1 10.693836
"MRT" "x1105"              540             47.176 3 1  6.291569
"NGA" "x1105" 16842.1052631579             50.481 4 1  9.731637
"TGO" "x1105" 5590.76923076923             58.838 5 1  8.628872
"TZA" "x1105"            48000             66.947 6 1 10.778956
"ZAF" "x1105" 904.655301204819 34.150000000000006 7 1  6.807554
"CIV" "x1106" 10441.1764705882             49.674 1 2  9.253512
"COD" "x1106" 39391.0340285401              56.12 2 2 10.581293
"MRT" "x1106"              520             47.176 3 2  6.253829
"NGA" "x1106" 11834.3195266272             50.481 4 2  9.378759
"TGO" "x1106"  4398.8603988604             58.838 5 2  8.389101
"TZA" "x1106"            45000             66.947 6 2 10.714417
"ZAF" "x1106"  608.84493902439 34.150000000000006 7 2  6.411563
"CIV" "x1107" 12032.0855614973             49.674 1 3  9.395332
"MRT" "x1107" 463.636363636364             47.176 3 3  6.139101
"NGA" "x1107" 17391.3043478261             50.481 4 3  9.763725
"TGO" "x1107" 5015.38461538462             58.838 5 3  8.520266
"TZA" "x1107" 43636.3636363636             66.947 6 3 10.683646
"ZAF" "x1107"          984.375 34.150000000000006 7 3  6.892007
end
label values iso_str iso_str
label def iso_str 1 "CIV", modify
label def iso_str 2 "COD", modify
label def iso_str 3 "MRT", modify
label def iso_str 4 "NGA", modify
label def iso_str 5 "TGO", modify
label def iso_str 6 "TZA", modify
label def iso_str 7 "ZAF", modify
label values var_str var_str
label def var_str 1 "x1105", modify
label def var_str 2 "x1106", modify
label def var_str 3 "x1107", modify

getting a sample size

Hi,

I'm trying to get the sample size of black women from my data set. I created a black women (bw) variable, counted the bw observations, and then collapsed. Take a look at my code below. Is this the right approach to get the sample size for bw? Also, should I add more or fewer variables in my by()?

[attached image not shown]
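For what it's worth, a minimal sketch of one way to get that count without collapsing (black and female are placeholder indicator names):

Code:
gen byte bw = (black == 1 & female == 1)
count if bw == 1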

Cluster Randomized Controlled Trial

I have a question about cluster randomized controlled trials. Is it recommended to use the svyset command when analyzing a cluster randomized trial? Another question: what command can we use if we want to adjust for clustering?
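A hedged sketch of two common ways to adjust for clustering without svyset (outcome, treat and clusterid are placeholders):

Code:
regress outcome i.treat, vce(cluster clusterid)   // cluster-robust standard errors
mixed outcome i.treat || clusterid:               // random intercept for clusters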

Thanks!

traj command with hierarchical data structure

I'm interested in using the user-written traj command (link below) to identify latent trajectories of change in patients' BMIs. The command has useful features like joint trajectory modeling and accounting for non-random attrition.

However, patients in my dataset are nested within physicians; I have unique physician identifiers for each physician.

Questions:

1. Is there a method or workaround that would allow traj to account for hierarchically nested data?
2. If not, to what degree would traj be robust to violation of the assumption that patients are independent of each other?


https://www.andrew.cmu.edu/user/bjones/

Dependent Double-Sorting 25 Portfolios

Dear all,
I am struggling to replicate the FF-25 portfolios with a variant, I should employ a dependent sort instead of a independent sort.
My dataset is the following one:
permno date primexch ret year month datem BM id MarketCap
10001 30-May-86 Q -0.00980 1986 5 316 . 2 .
10001 30-Jun-86 Q -0.01307 1986 6 317 . 2 1.797265
10001 31-Jul-86 Q -0.01020 1986 7 318 . 2 1.797265
10001 29-Aug-86 Q 0.07216 1986 8 319 . 2 1.797265
10001 30-Sep-86 Q -0.00308 1986 9 320 . 2 1.797265
10001 31-Oct-86 Q 0.03922 1986 10 321 . 2 1.797265
10001 28-Nov-86 Q 0.05660 1986 11 322 . 2 1.797265
10001 31-Dec-86 Q 0.01500 1986 12 323 . 2 1.797265
10001 30-Jan-87 Q -0.03571 1987 1 324 . 2 1.797265
10001 27-Feb-87 Q -0.07407 1987 2 325 . 2 1.797265
10001 31-Mar-87 Q 0.03680 1987 3 326 . 2 1.797265
10001 30-Apr-87 Q -0.03922 1987 4 327 . 2 1.797265
10001 29-May-87 Q -0.07143 1987 5 328 . 2 1.797265
10001 30-Jun-87 Q 0.05143 1987 6 329 1.0144155 2 1.761665
10001 31-Jul-87 Q 0.02128 1987 7 330 1.0144155 2 1.761665
10001 31-Aug-87 Q 0.08333 1987 8 331 1.0144155 2 1.761665
10001 30-Sep-87 Q -0.02231 1987 9 332 1.0144155 2 1.761665
10001 30-Oct-87 Q 0.02000 1987 10 333 1.0144155 2 1.761665
10001 30-Nov-87 Q -0.02941 1987 11 334 1.0144155 2 1.761665
10001 31-Dec-87 Q -0.03354 1987 12 335 1.0144155 2 1.761665
10001 29-Jan-88 Q 0.06383 1988 1 336 1.0144155 2 1.761665
10001 29-Feb-88 Q 0.08000 1988 2 337 1.0144155 2 1.761665
10001 31-Mar-88 Q -0.07630 1988 3 338 1.0144155 2 1.761665
10001 29-Apr-88 Q 0.03061 1988 4 339 1.0144155 2 1.761665
10001 31-May-88 Q 0.01980 1988 5 340 1.0144155 2 1.761665
10001 30-Jun-88 Q -0.01204 1988 6 341 1.2076184 2 1.824549
10001 29-Jul-88 Q 0.03000 1988 7 342 1.2076184 2 1.824549
10001 31-Aug-88 Q 0.02913 1988 8 343 1.2076184 2 1.824549
10001 30-Sep-88 Q -0.021132076 1988 9 344 1.2076184 2 1.824549
10001 31-Oct-88 Q 0.039215688 1988 10 345 1.2076184 2 1.824549
10001 30-Nov-88 Q 0 1988 11 346 1.2076184 2 1.824549
where permno identifies the company, primexch identifies the stock exchange (Q=Nasdaq, N=NYSE, A=Amex), ret is the return, BM (book-to-market value) is calculated at the end of year t and becomes publicly available from June of year t+1 until May of year t+2, and finally MarketCap indicates the size of each company, calculated in June of year t and held constant until May of year t+1.
I should sort stocks into 5 quintiles according to their BM, and then double sort within each quintile according to companies' MarketCap. The quintile breakpoints should be calculated using only NYSE stocks ("N").
Therefore I should obtain 25 portfolios, sorted first on BM and second on MarketCap.
Finally, I should calculate the value-weighted monthly returns on these 25 portfolios from July of year t to June of year t+1.
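A hedged sketch for a single formation date (in practice this would sit inside a loop over formation dates): _pctile computes the NYSE-only breakpoints, which are then applied to all stocks, with the size sort nested within each BM quintile:

Code:
_pctile BM if primexch == "N", percentiles(20 40 60 80)
gen byte bm5 = 1 + (BM > r(r1)) + (BM > r(r2)) + (BM > r(r3)) + (BM > r(r4)) if !missing(BM)
gen byte size5 = .
forvalues q = 1/5 {
    _pctile MarketCap if primexch == "N" & bm5 == `q', percentiles(20 40 60 80)
    replace size5 = 1 + (MarketCap > r(r1)) + (MarketCap > r(r2)) ///
        + (MarketCap > r(r3)) + (MarketCap > r(r4)) if bm5 == `q' & !missing(MarketCap)
}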

PS: I tried to write code to calculate the value-weighted monthly returns on 10 deciles sorted on MarketCap for another calculation I had to do. Maybe it could be helpful:
forvalues i = 1(1)10 {
    egen num_return_dec_`i' = total(MarketCap * ret * !missing(MarketCap, ret)) if deciles_MarketCap==`i', by(datem)
    egen den_return_dec_`i' = total(MarketCap * !missing(MarketCap, ret)) if deciles_MarketCap==`i', by(datem)
    gen vw_return_dec`i' = num_return_dec_`i'/den_return_dec_`i' if deciles_MarketCap==`i'
}
Any help would be really appreciated, as I have been trying to solve this problem for a week.
Best regards,
Antonio

Generate variables with forvals

Hello Statalist,
I have a dataset which contains the following variables: firm (every firm has an assigned number from 1-1000), their products (every product has an assigned number), the costs of sales, the revenue of the sale, and the year. Now I am trying to generate two very similar variables and a third one. One displays the average sales for each year and each firm, and one displays the average costs for each year and each firm. The code should be the same for both, simply exchanging sales for costs. I am trying to construct this variable using a loop, but I don't get the right result. My first try was the following:

forval i = 1/1000 {
    forval j = 2001/2012 {
        sum ventas if firma == `i' & year == `j'
        gen ventap_`i'_`j' = `r(mean)'
    }
}



There are 1000 firms. However, the year coverage is not equal across firms: some firms have data from, for example, 2004-2009, and others have different periods (a lot of different periods). But the minimum of the year variable is 2001 and the maximum is 2012.

So when I run this code I encounter two problems. First, it fails when a firm doesn't have any observations for 2012 or other years (invalid syntax error). Second, it creates a variable for every single year, displaying the average for that year; however, I want just one variable displaying the average for the corresponding year for all cases.

The third variable I have to create is one that displays the product with the highest sales. The code should be similar to the first one, using two forvals containing the year and the firm, but it should probably use r(max) instead of r(mean). However, here I encounter the same problem that not all firms have data for all the years between 2001 and 2012, and it generates a lot of variables instead of just one showing the product ID with the highest sales for the corresponding year. A sketch of an alternative is below.
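For reference, a hedged sketch that avoids the loop entirely: egen with by handles unequal year coverage automatically, and sorting within firm-year gives the top product (costs and product are placeholder names for the cost and product-ID variables):

Code:
bysort firma year: egen ventap = mean(ventas)
bysort firma year: egen costp  = mean(costs)
* product with the highest sales in each firm-year (missing sales sort last, hence the guard)
bysort firma year (ventas): gen top_product = product[_N] if !missing(ventas[_N])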

I hope I explained it understandably and that you can help me.
Thanks a lot

Interaction between variables changes the results fundamentally!

Dear All,
I would appreciate your help on the following please:
The correlation between y and x1, x2 is negative, but the correlation between y and the interaction of x1 and x2 is positive; that's strange! Could somebody explain this to me, please?
[attached image not shown]

Weighted least squares (WLS) with wls0 and regwls

Dear Statalist,
I am conducting a long-run event study using the Fama-French three-factor model.
I am using WLS regression, and I want to use the monthly number of firms in the event portfolio as weights. Moreover, I want to use the equal-weighted monthly returns on each portfolio.
First of all, I imported the Excel file and converted Date2 from string into a monthly date variable (Date3); then I declared the data set to be time-series data.

I intend to use the Stata command wlsreg or wls0 with the options wvar() (the number of firms in the event portfolio in a month) and type(wlstype), where the choices include abse (absolute value of the residual) and e2 (residual squared). The dependent variable is the portfolio return; the explanatory variables are MktRF, SMB, and HML.
Could anyone please help me with the Stata command for WLS regression? I am very grateful.
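If the aim is simply to weight each monthly observation by the number of firms in the portfolio, analytic weights with regress may be enough (a hedged sketch; nfirms is a placeholder for the monthly firm-count variable):

Code:
tsset Date3
regress avg MktRF SMB HML [aw = nfirms]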

Thank you and Kind regards,
Chi

Which Flavor of ADF Estimation Does Stata Use?

Hey Everyone,

With WLS/ADF estimation there are different flavors out there in statistics software. It is rather easy to control the specific estimator in R, but in Stata I have not found any information on the exact source/reference the ADF estimation is built on.

My key interest is whether it is the pure Browne formula, or whether any of the adjustments that make WLS more robust to small sample sizes have been implemented.

Thanks and best
Leon

fill in empty adjacent cells within a group

Dear all,

I would like to ask how to fill in empty adjacent rows within a group. The problem here is the confusing data structure.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 subjid str21 personid str23 invtype
"a" "1" "Cranial Ultrasound Scan"
""  "1" "Other Ultrasound Scan"  
""  "1" "CT scan"                
""  "1" "X-ray"                  
""  "1" "EEG"                    
""  "1" "MRI"                    
""  "1" "ECHO"                   
""  "1" "ECG"                    
""  "1" ""                       
""  "1" ""                       
""  "1" ""                       
"b" "2" "Cranial Ultrasound Scan"
""  "2" "Other Ultrasound Scan"  
""  "2" "CT scan"                
""  "2" "X-ray"                  
""  "2" "EEG"                    
""  "2" "MRI"                    
""  "2" "ECHO"                   
""  "2" "ECG"                    
""  "2" ""                       
""  "2" ""                       
end

I want to put subjid like this.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 subjid str21 personid str23 invtype
"a" "1" "Cranial Ultrasound Scan"
"a" "1" "Other Ultrasound Scan"  
"a" "1" "CT scan"                
"a" "1" "X-ray"                  
"a" "1" "EEG"                    
"a" "1" "MRI"                    
"a" "1" "ECHO"                   
"a" "1" "ECG"                    
""  "1" ""                       
""  "1" ""                       
""  "1" ""                       
"b" "2" "Cranial Ultrasound Scan"
"b" "2" "Other Ultrasound Scan"  
"b" "2" "CT scan"                
"b" "2" "X-ray"                  
"b" "2" "EEG"                    
"b" "2" "MRI"                    
"b" "2" "ECHO"                   
"b" "2" "ECG"                    
""  "2" ""                       
""  "2" ""                       
end
Using personid is not a good idea because personid is repeated from row 4651 onward; it starts again with 1, 2, ...
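A hedged sketch that relies only on the row order within personid blocks (replace works through observations sequentially, so the value cascades down adjacent rows):

Code:
replace subjid = subjid[_n-1] if subjid == "" & invtype != "" & personid == personid[_n-1]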

If you let me know any solution for this, I would really appreciate it.

Kind regards,

Kim

Test for a unit root using panel data

Dear Community,

For my master's thesis I want to test whether the error term of my model is stationary, in order to verify whether the regression is spurious.
I am using unbalanced panel data, so based on what I read there are two possible tests: xtunitroot lps and xtunitroot fisher.

However, when I try the lps test, I get an error saying "insufficient observations", and when I run the fisher test it takes ages for my computer to compute the test statistic.
I believe that Stata computes the test for every panel (11,000 in my case).

Do you know a way to solve these issues?


Losing groups while running a regression

Hello,

I am trying to run a panel data regression. My group variable (cntry) has 41 countries, but when I run the regression the number of groups is reduced to 18. I cannot find anything about this, and Stata does not say anything about why these groups are left out. What are possible reasons that these groups are not taken into account when running the regression?

Latent class analysis categorical variables

Dear all,

I have been using latent class analysis in Stata 15 and have been able to get results using the gsem command for binary variables with the following code:

Code:
gsem (isch afib ckd dbt hyp pvd <- _cons), logit lclass(C 3)
However, when I try to analyse ordinal categorical variables, I get an error message:


Code:
. gsem (bmigroup hb_class age_gp <- _cons), ologit lclass(C 2)

invalid path specification;
ordinal response bmi_fr1 may not have an intercept
Could anyone help me with this?
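One hedged guess at the fix, based on the error text: ordinal responses in gsem carry cutpoints rather than intercepts, so dropping _cons from the path specification may be all that is needed:

Code:
gsem (bmigroup hb_class age_gp <- ), ologit lclass(C 2)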

Thank you

X11 Forwarding only showing half screen with Stata

I am using PuTTY to do SSH tunneling and X11 Forwarding with an Amazon EC2 Ubuntu instance. Multiple individuals in my system login to their own remote desktops, and then connect to the same EC2 instance.

I am currently trying to fix an issue where half of the screen is cut off for some users after the forwarding. I've tried:

1) Make sure everyone uses the same network, however this doesn't solve the problem, the full screen shows up for some users and only a cut off screen for other users

2) Logging into the remote desktop of the users who are experiencing the cut off screen problem (using my own laptop), however I'm not able to replicate the issue

3) Configure the Xming display settings in XLaunch, and have Xming display in one window instead of multiple windows. Jury is still out on whether or not this works, I haven't had the users try out the new configuration yet. Also, when I open save XLaunch configs, hit finish, and then later open up XLaunch again, "multiple windows" / default settings are selected, rather than "one window". So before I tell users to try out the "one window" setting, how can I make sure that my new configurations are actually saved?

4) Do the PuTTY connection and X11 forwarding set up through XLaunch, rather than through PuTTY. I think this would be rather complicated so haven't tried it yet... though willing to do so if it could solve the issue.

Thoughts? An image of the problem is shown below.

[attached image not shown]

xtivreg2 - identifying singleton observations

Dear All,

I have a panel dataset with 18,071 observations.

I am estimating the following model in stata:

xtivreg2 ret2_w mret2_w s_r10_lmcap s_r10_bm s_r10_mom s_r10_op_prof s_r10_agro s_r10_stdret s_r10_vol_s s_r10_lag_ue_p s_r10_lnumage s_r10_divy yr1-yr13 (tq2_centered_w = wklymret_w wklych_usd_w) if sample_to_use == 3, fe first liml cluster(cnum) endog(tq2_centered_w)

The output begins with the warning:

Warning - singleton groups detected. 194 observation(s) not used.

Partial output below indicates 17,877 observations were used (18071-194 = 17877)

Number of clusters (cnum) = 1132 Number of obs = 17877
F( 25, 1131) = 59.19
Prob > F = 0.0000
Total (centered) SS = 56.71687777 Centered R2 = -0.1134
Total (uncentered) SS = 56.71687777 Uncentered R2 = -0.1134
Residual SS = 63.15058851 Root MSE = .06141

When I generate descriptive statistics, I have complete data for all the variables in the above model for all 18,071 observations. No missing values.

Importantly, I count only 7 singletons in my 18,071-observation dataset:

. count if number == 1 & sample_to_use == 3
7

I would like to drop the 194 singletons. Could someone let me know how to identify and eliminate these 194 observations?
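A hedged sketch for locating them: count the observations each panel contributes to the estimation subsample, since a group can be a singleton within the subsample even when it is not one in the full data (panelvar is a placeholder for the xtset panel identifier):

Code:
egen insample = total(sample_to_use == 3), by(panelvar)
list panelvar if insample == 1 & sample_to_use == 3
drop if insample == 1 & sample_to_use == 3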

Best,

Srinivasan Rangan

Saving WTP estimates to conduct Poe Test

Dear all

I have run a clogit model and have estimated WTP via the wtp command with the krinsky option. I now want to save the WTP estimates so I can perform a Poe test, comparing them with WTP estimates from another clogit model. However, I do not see any option for saving these WTP estimates. The saving option is only available with the wtpcikr command, which I do not think can be used with a clogit model.

Can anyone help me with this please

Formatting and managing dates, from String to MMYYYY format

Dear All,
I am a new user in Stata.
I am having a basic question and would kindly ask for your help. I want to change the data from string format to a monthly date format (MMYYYY). I have tried date(Date2, "MY") and then created the monthly variable (Date3). However, the format is not what I expected.
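A hedged guess at the issue: date() builds daily dates, whereas monthly values need the monthly() function; a minimal sketch:

Code:
gen Date3 = monthly(Date2, "MY")
format Date3 %tm            // displays as e.g. 2019m5
* format Date3 %tmNN/CCYY   // would display as e.g. 05/2019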
Many thanks for your support,
Chi

Bootstrap

Hi everyone,

I need some help understanding why Stata doesn't let me use the command: bootstrap_b.
It writes: unrecognized command: bootstrap_b.
What can I do?
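One hedged guess: bootstrap _b is a prefix command followed by a space and a colon-separated estimation command, not a single command name; a minimal sketch:

Code:
sysuse auto, clear
bootstrap _b, reps(100) seed(12345): regress price mpg weight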

Thanks in advance.
Gal.

Advices on how to learn systematically how to work with panel

Hi everyone,

I have very elementary skills in econometrics and until now I've only worked with cross-sectional data. Now I need to work with panel data, but I feel I lack even the basic competences (even for descriptive statistics). Until now I've tried to fill my gaps "on the road", basically learning only the things that I needed immediately. I resorted to this "easy" strategy only because I'm really short of time.
But that's not working. I need more systematic training in how to explore my data and work with it when it has a panel dimension.
My handbook is not very helpful: the chapter on panels starts from regressions, and I want to know my data in detail and know how to work with it before I run regressions. There is probably a reason for that gap in my book (maybe I should look into time-series methods for descriptive statistics?), but I don't know it.

So my question is: considering that (independently of my will) I'm short of time, what book/video/online resource would you suggest for a systematic introduction to panel data in Stata, including all the "tricks" to describe and work with them? (I learned how to do many things the long way, and then found a much shorter way on Statalist... isn't there a way to learn these things systematically?)

Aurora


How to interpret the result of the "Total Factor Productivity of Manufacturing Firms" based on Levinsohn and Petrin (2003) approach?

I intend to measure the TFP of 23 manufacturing firms through a Cobb-Douglas production function approach, using the prodest command in Stata, for the period 2015-2017.
I am using the Levinsohn and Petrin (2003) approach with the attached Stata dataset. However, I got negative coefficients on lnL and lnK with the Levinsohn and Petrin (2003) approach; results have been attached in the form of the image below. These individual TFP values are then used as dependent variables and regressed on infrastructure stocks as an independent variable.
Stata Code:

prodest lnGVA, method (lp) free(lnL) proxy(lnInput) state(lnK) poly(3) valueadded reps(250)

predict TFP


Can anyone help to overcome this issue in the result? Please respond.
Famid year lnGVA lnK lnL lnInput
1 2015 13.34451139 14.43711069 13.82499642 14.94789177
2 2015 10.90103056 11.39432509 12.00363817 12.56028455
3 2015 10.52884158 10.90823019 11.74051512 12.56156862
4 2015 11.71408167 12.96707595 11.86919333 13.19120632
5 2015 10.78025708 10.57660072 11.29931136 12.61370021
6 2015 11.30195799 10.79404052 10.71557266 12.07061138
7 2015 13.89161883 14.69274188 13.91372923 15.59004602
8 2015 12.68505841 13.08795162 13.27239071 14.34357928
9 2015 12.17481436 12.49800879 11.90358822 13.0876142
10 2015 10.37213186 10.36546633 10.85971028 11.51539809
11 2015 11.89178185 12.93752458 11.87475212 13.15976277
12 2015 12.88529455 13.74124074 13.52565546 14.57748312
13 2015 11.26551282 11.99049128 12.59243988 13.35132999
14 2015 11.91772596 13.18896836 12.4565356 13.65617625
15 2015 14.08798489 14.43171365 14.08198176 15.39174269
16 2015 11.84720763 14.04701543 12.27763023 13.27226267
17 2015 11.84987474 12.32818954 13.05611887 13.71366131
18 2015 12.2899301 12.94906791 12.83675914 13.8142172
19 2015 13.31159488 14.0107976 14.37021545 14.99198913
20 2015 7.930242796 7.485056583 10.17564981 8.622648785
21 2015 12.58255248 13.30226199 13.42014082 14.52482222
22 2015 12.45019737 12.54385883 12.59546596 13.57663852
23 2015 11.82869258 13.06866314 13.13062515 14.08698579
1 2016 13.48373632 14.55212104 13.81087483 14.87352281
2 2016 11.09011635 12.09660422 12.06294103 12.5021024
3 2016 10.43491923 10.92367085 11.54507315 12.3586785
4 2016 11.26804467 13.0838402 11.83438558 13.05207522
5 2016 10.7530443 10.4765827 11.23227197 12.73695465
6 2016 11.41296781 10.8661762 10.80685514 12.14263568
7 2016 13.9810047 14.90279048 13.99064468 15.47850878
8 2016 12.75290698 13.25330751 13.23466654 14.43921191
9 2016 12.17262238 12.62311498 11.81393335 12.95478697
10 2016 10.49895051 10.55531489 10.86526777 11.58579992
11 2016 11.52578885 12.93922703 11.86191193 13.20034564
12 2016 12.99481134 13.78688988 13.55250289 14.51278126
13 2016 11.53622211 12.27852734 12.51736626 13.27750928
14 2016 12.20480924 13.55141515 12.49875718 13.69012924
15 2016 14.14469829 14.47629798 14.13122964 15.45322844
16 2016 11.80136403 14.22621284 12.25082132 13.35906515
17 2016 11.91459215 12.41077943 13.10543896 13.69703396
18 2016 12.39233098 13.06787942 12.88085472 13.9162249
19 2016 13.50721806 14.10146753 14.47325385 14.97086086
20 2016 7.590132471 7.815032882 10.07225939 8.626449627
21 2016 12.79879814 13.47297179 13.50166245 14.52501448
22 2016 12.73090918 12.64631381 12.64053977 13.87087559
23 2016 12.05060545 13.1477183 13.11664506 14.11136877
1 2017 13.37820655 14.57031481 13.87654921 14.99791944
2 2017 11.31482949 11.94872061 12.1067936 12.48835749
3 2017 10.48552974 11.50844226 11.50258216 12.33606399
4 2017 11.45326146 13.4451234 11.89512877 13.13305958
5 2017 10.53909949 10.41244812 11.2286642 12.71520385
6 2017 11.32904661 10.91523748 10.70495088 11.98065828
7 2017 13.91086343 15.06872287 14.03587425 15.54622029
8 2017 12.98490547 13.39216318 13.3848061 14.65935721
9 2017 12.08523011 12.39343758 11.86197541 12.95263912
10 2017 10.6732993 10.92220152 10.98576719 11.73012375
11 2017 11.90939933 13.25324863 11.88186489 13.18303098
12 2017 13.19309476 13.82810032 13.62542212 14.61800501
13 2017 11.72306923 12.43306157 12.42895616 13.41005668
14 2017 12.21973581 13.62180352 12.54387614 13.72163599
15 2017 14.10537468 14.43870418 14.12643932 15.34049703
16 2017 12.04438547 14.43877548 12.31398041 13.40533761
17 2017 11.98871848 12.33912744 13.18484604 13.69294326
18 2017 12.53221462 13.23962174 12.93065551 14.01079714
19 2017 13.5782688 14.25990109 14.51053547 15.04941816
20 2017 7.673050689 7.846913183 10.08397409 8.60560012
21 2017 13.23123952 13.50385468 13.57157867 14.59337051
22 2017 12.61675213 12.68825504 12.74948936 13.66224058
23 2017 12.22915794 13.35048112 13.11830917 14.14173054

PMG insufficient observation

Hi everyone,

I was trying to run a panel with the PMG estimator: N=36 over the period 1984-2016. When I run the command for the full panel, it is fine. But for the subsamples of developed and developing countries, I get this message: insufficient observations r(2001). Does anyone have an idea or suggestion? Please.

Regards,
Marwan

Finding code snippets from Stata's base commands

Dear Statalist,

Is there a way to see the programme code from Stata's own base commands?

In my case, I am interested in seeing exactly how the firstrow option of Stata's import excel command works, because I want to learn from it in order to do something similar (I want to be able to modify the column headers of my Excel file after importing, and only then should these headers become variable names, so I can't just use the firstrow option directly). But when I type sysdir and then locate the import_excel.ado file in my BASE folder, it contains only very limited reference code, not the full programme...
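For ado-file commands, viewsource displays the code (a hedged note: parts of import excel are built into the Stata executable and have no viewable ado source, which may be why the ado-file looks so thin):

Code:
viewsource import_excel.ado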

Many thanks,
Felix