Dear All,
I have a confusing doubt in my mind. Is an ROC curve possible for a 3x3 or 2x3 table?
If so,
1. Can anyone please give an example and the hypothesis statements?
2. How would the ROC curve look in that case?
3. What is the Stata code for that ROC curve?
In my case, I compare blood sugar level (the test) with HbA1c (the gold standard) in 3 categories: "normal", "Pre-Diabetic" and "Diabetic".
Please let me know the answers to the above questions.
Thanks a lot in advance.
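For what it's worth, a hedged sketch of one common approach: with a 3-level gold standard there is no single ROC curve, so each clinically relevant dichotomization gets its own ROC analysis. The variable names below (glucose for the test, hba1c_cat coded 0 = normal, 1 = pre-diabetic, 2 = diabetic) are hypothetical:
Code:
* ROC for detecting diabetes (diabetic vs the rest)
gen byte diabetic = (hba1c_cat == 2)
roctab diabetic glucose
* ROC for detecting any abnormality (pre-diabetic or diabetic vs normal)
gen byte abnormal = (hba1c_cat >= 1)
roctab abnormal glucose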
Friday, January 31, 2020
inquire foreach list
I find that, in fact, there is no Gold variable in the dataset; instead, there is a Gold_benefits variable. It seems that in the list in foreach, a variable name can be truncated (abbreviated). Is my understanding correct?
Another problem is that I cannot run the foreach loop in the do file. It always prompts "invalid syntax
r(198);
"
Many thanks in advance!
Code:
foreach x in Gold Platinum_upgrade Platinum_upgrade_merit {
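A hedged note with a sketch: foreach x in takes the list elements as literal text, so Gold only "works" because Stata lets variable names be abbreviated in commands, and Gold happens to be an unambiguous abbreviation of Gold_benefits. foreach x of varlist expands names (and wildcards) against the dataset up front, which avoids the surprise:
Code:
* -of varlist- resolves abbreviations/wildcards before looping
foreach x of varlist Gold* Platinum_upgrade Platinum_upgrade_merit {
    display "`x'"
}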
How to split?
Dear All, I have this data set,
and wish to obtain the following result
Any suggestion is appreciated. Thanks.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str61 y
"4.4%"
"One Year Deposit Rate+3.25%"
"Five Year Deposit Rate-2.25%"
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str22 y1 str5 y2
"" "4.4%"
"One Year Deposit Rate" "3.25%"
"Five Year Deposit Rate" "2.25%"
end
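A hedged sketch of one way to get from the first layout to the second, splitting on the +/- separator and treating rows without a separator as having an empty y1:
Code:
split y, parse("+" "-") generate(part)
gen y1 = cond(part2 == "", "", part1)
gen y2 = cond(part2 == "", part1, part2)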
inquire about capture command
I see a data code that starts:
Can anyone explain the second code for me?
Many thanks in advance!
Code:
clear *
capture cd "~/Dropbox/Projects/The Demand for Status/Final_data_QJE"
set more off
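A hedged sketch of what -capture- is doing there: it runs the command, swallows any error so the do-file keeps going, and leaves the return code in _rc. Here the author's cd only succeeds on a machine with that Dropbox folder; capture lets everyone else run the do-file anyway. The path below is made up:
Code:
capture cd "~/some/other/folder"
if _rc != 0 {
    display "cd failed with return code " _rc
}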
Gravity Model: reverse causality LEAD variable
Hi,
I am trying to test for potential reverse causality between RTAs using a gravity model.
RTA = 1 if exporter and importer have an RTA in year t.
The pairid is the distance between exporter and importer.
I would like to generate a lead variable capturing the future level of RTAs (in the next 4 years):
tsset pairid year
gen RTA_LEAD4 = f4.RTA
replace RTA_LEAD4 = 0 if RTA_LEAD4 == .
However, I received this error:
tsset pairid year
repeated time values within panel
I think this is because in my database trade flows are treated separately each way (exports and imports), so each pairid of countries appears twice in each year.
How could I generate the RTA_LEAD4 without changing my pairid?
Thanks!!
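A hedged sketch of one workaround, assuming the data contain exporter and importer identifiers (hypothetical names): build a directional pair id, which is unique within year, and leave pairid itself untouched:
Code:
egen dirpair = group(exporter importer)
tsset dirpair year
gen RTA_LEAD4 = f4.RTA
replace RTA_LEAD4 = 0 if missing(RTA_LEAD4)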
Interpreting coefficients of a three-level mixed model
Dear Stata-listers,
I have a three-level mixed model and I am struggling on the specific interpretation of the coefficients.
I have observations nested within individuals, further nested within firms. I am trying to understand whether a policy's effectiveness varies with age. The data are in a three-wave panel with two years between each wave. The panel is quite unbalanced with many people appearing only once.
A simplified version of my model and output is below. This is using test data, due to data access rules, so the results may not make a lot of sense. I will also add that I have calculated marginal effects which help me interpret the model practically, but I am unclear precisely how to interpret the regression coefficients with respect to the levels (i.e., within and between person). I have seen other posts on this issue but am still unsure with regard to my own model. I would appreciate any help.
I realize that the coefficients are not statistically significant; however, this is because I can only share results from test data here. I am not sure why I am not getting CIs or SEs for the variance components, but I assume this is also a test-data issue, as the problem is not there with the real data. Assuming the random slopes and intercepts are statistically different from zero, how do I interpret, for example, the effect of age? Does this mean that, taking the average person, when they are 1 year older than their average age in the panel, they have a 0.0000896 higher value of y on average (again ignoring statistical significance)?
Code:
mixed y c.age##i.policy i.year || firm: || person: age
Mixed-effects ML regression Number of obs = 21,398
-------------------------------------------------------------
| No. of Observations per Group
Group Variable | Groups Minimum Average Maximum
----------------+--------------------------------------------
firm | 6,502 1 3.3 19
person | 17,582 1 1.2 3
-------------------------------------------------------------
Wald chi2(5) = 3.19
Log likelihood = -23030.701 Prob > chi2 = 0.6707
---------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
age | .0000896 .0015416 0.06 0.954 -.002932 .0031111
|
policy |
1 Yes | -.1205814 .0992169 -1.22 0.224 -.3150429 .0738802
|
policy#c.age |
1 Yes | .0024477 .0021021 1.16 0.244 -.0016723 .0065676
|
year |
2000 | -.0004633 .0119253 -0.04 0.969 -.0238365 .0229099
2002 | -.002491 .0126007 -0.20 0.843 -.027188 .0222059
|
_cons | 3.002878 .0716159 41.93 0.000 2.862513 3.143243
---------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
firm: Identity |
sd(_cons) | 1.35e-06 . . .
-----------------------------+------------------------------------------------
person: Independent |
sd(age) | 6.83e-06 . . .
sd(_cons) | .000022 . . .
-----------------------------+------------------------------------------------
sd(Residual) | .7098958 . . .
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 0.00 Prob > chi2 = 1.0000
Note: LR test is conservative and provided only for reference.
Text analysis - find frequency of several keywords within several strings + stop words removal + sentiment analysis
Hello! I am a new user of Stata and I have a problem with a task I have to solve.
As a part of my thesis, I have to do a text analysis. Afterwards, I will perform the statistical analysis with Stata and I am not sure whether I can use Stata for the text analysis as well.
I have tried it and researched a lot. But I am not sure whether my approach is correct.
There are three aspects I want to address in this post:
As the most relevant aspect, I would like to generate a new variable that shows the frequency of a key word (substring) in a text (string variable), for instance "machine learning". What is the best way? Would you recommend using Stata, or the integration of WordStat or Python? There are three options, but so far none of them has been successful.
- I have installed WordStat in Stata and used it to import PDF files of annual reports. These reports are stored as text/strings in the variable DOCUMENT in Stata. I would like to use WordStat for the text analysis as well. So, I used User > WordStat > content analysis and generated a dictionary with key words. However, when using frequencies I get the error "No valid cases".
- So far, I have found options in Stata that show whether a particular substring is inside a string or not (strpos, regexm, substr, subinstr), but I need to know the frequency. noccur is a command that offers this, the only one I have found. However, when I use this command, the calculated frequency for some key words in Stata is lower than the actual frequency I find in the PDF or text file of the particular annual report using Cmd+F.
- Python is possible as an integration in Stata, but I have not figured out how the two interact: how to use a variable from the Stata dataset in a Python command, and how to export the results from Python back into the Stata dataset as a new variable. In Python I have found the regex function re.findall(pattern, string, flags=0). Is it recommended to use Python instead of Stata for the text analysis and to do the statistics in Stata afterwards? Should I install Python and load the files there? Then I would need to save the Python table as an Excel file and create a Stata file from it. With one variable that is the same in both Stata tables, merging is possible afterwards. Is that correct?
For the stop words removal I would use the lists from WordStat. WordStat provides stop word lists in the four languages I need: English, French, German, and Spanish. How can I use Stata and/or WordStat for this? Is it recommended to use Stata, WordStat, or Python?
Moreover, I am not sure if a sentiment analysis is possible in Stata itself or with the integration of Python. The sentiment analysis could show which documents use rather positive or negative words in the same sentences that contain a particular key word.
Thank you and best regards
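On the frequency question, a hedged pure-Stata sketch (assuming the text lives in the variable DOCUMENT, as described above): count occurrences by measuring how much shorter the string becomes when the key word is deleted:
Code:
local kw "machine learning"
gen long n_kw = (strlen(lower(DOCUMENT)) - strlen(subinstr(lower(DOCUMENT), "`kw'", "", .))) / strlen("`kw'")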
Splitting a string variable in to two parts
Hello...
I have a string variable that has 3-5 words. I want to split it into two parts.
Here's the example
Code:
clear all
input str50 product
"105 Dairy products - Paneer, curd etc"
"011 Vegetable cultivation"
"563 Tea selling"
"561 Hotel (restaurant)"
"105 Dairy products - Paneer, curd etc"
"321 Jewellery making"
"107 Bakery"
"563 Tea and Snacks"
"011 Vegetable cultivation"
"105 Dairy products - Paneer, curd etc"
"105 Dairy products - Paneer, curd etc"
"011 Vegetable cultivation"
"011 Fruits production"
"000 Others"
"011 Vegetable cultivation"
"013 Plant nursery"
end
Code:
clear all
input str10 product str50 product1
"105" "Dairy products - Paneer, curd etc"
"011" "Vegetable cultivation"
"563" "Tea selling"
"561" "Hotel (restaurant)"
end
I also went through this thread:
https://www.statalist.org/forums/for...at-has-a-space
Any help will be greatly appreciated.
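A hedged sketch of one way to do it, peeling off the leading code with word() and keeping the remainder as the description (order matters, since product is overwritten last):
Code:
gen product1 = substr(product, strpos(product, " ") + 1, .)
replace product = word(product, 1)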
How to reshape
I have a problem with reshaping, I don't know how to do this. Can someone help please?
I have this:
obs_no vname place v_ Var2
250 "term1" "rancho mirage" 1996 . "A199601817" . . . . . .
250 "term2" "rancho mirage" 1997 . "A199601817" . . . . . .
250 "term3" "rancho mirage" 1998 . "A199601817" . . . . . .
250 "term4" "rancho mirage" 1999 . "A199601817" . . . . . .
251 "term1" "rancho mirage" 1994 . "A199601815" . . . . . .
251 "term2" "rancho mirage" 1995 . "A199601815" . . . . . .
251 "term3" "rancho mirage" 1996 . "A199601815" . . . . . .
251 "term4" "rancho mirage" 1997 . "A199601815" . . . . . .
252 "term1" "riverside" 1996 . "A199601819" . . . . . .
252 "term2" "riverside" 1997 . "A199601819" . . . . . .
252 "term3" "riverside" 1998 . "A199601819" . . . . . .
252 "term4" "riverside" 1999 . "A199601819" . . . . . .
253 "term1" "riverside" 1999 . "A199601821" . . . . . .
253 "term2" "riverside" 2000 . "A199601821" . . . . . .
253 "term3" "riverside" 2001 . "A199601821" . . . . . .
253 "term4" "riverside" 2002. "A199601821" . . . . . .
I want this:
| place | A199601817 | A199601815 | A199601819 | A199601821 |
| "rancho mirage" | 1996 | 1994 | ||
| "rancho mirage" | 1997 | 1995 | ||
| "rancho mirage" | 1998 | 1996 | ||
| "rancho mirage" | 1999 | 1997 | ||
| "riverside" | 1996 | 1999 | ||
| "riverside" | 1997 | 2000 | ||
| "riverside" | 1998 | 2001 | ||
| "riverside" | 1999 | 2002 |
predict after xtreg, fe
Hello,
I have a fixed effects panel regression: xtreg exp cost i.year, fe
My regression coefficient is 0.5. A one unit increase in cost is associated with a 0.5 increase in exp.
I would now like to calculate the effect of a 10% increase in cost.
I would prefer not to log these variables.
Does anyone have any suggestions?
Thank you
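A hedged sketch of one route: in a linear model the effect of a 10% increase depends on the level of cost, so a common choice is to evaluate it at the sample mean:
Code:
xtreg exp cost i.year, fe
summarize cost if e(sample), meanonly
display "Effect of a 10% increase in cost at its mean: " _b[cost] * 0.10 * r(mean)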
Exporting a cross table using three string variables to a CSV file
Dear Statalisters,
How do I export a result table using three string variables ("region", "cat", "field") in a CSV format?
by field, sort : tab region cat , miss
I tried several commands but they did not seem to do what I want.
Sincerely,
FUKUGAWA Nobuya
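A hedged sketch of one workaround: rather than trying to capture tab's display, build the counts as a dataset with contract and export that:
Code:
preserve
contract field region cat, zero
export delimited using "crosstab.csv", replace
restore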
Stata MP in Microsoft Terminal Server Environment
Hello, would anyone be able to offer white papers or solution articles on deploying Stata MP in a Microsoft Terminal Server environment? I would be interested in what kind of hardware specs (memory and CPU) would suit between 5 and 10 concurrent users working on data sets of about 1 GB in size.
Coefplot how to change bar colors?
My question is how can I change the bar colors when using
I tried this:
option bar() not allowed r(198);
HTML Code:
coefplot
HTML Code:
bar(1,bcolor(navy)) bar(2, bcolor(maroon))
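A hedged sketch (bar() is not a coefplot option, hence the r(198)): per-series options in coefplot attach to each model's plot specification, so recoloring the bars would look something like:
Code:
coefplot (fcn1, color(navy)) (fcn5, color(maroon)), recast(bar) vertical barwidth(.3)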
HTML Code:
coefplot fcn1 fcn5, format(%9.0f) ///
title("(a) Overall Perception of China", size(medium)) ///
color(navy) lcolor(black) lpattern(solid) byopts(cols(1)) ///
ciopts(recast(rcap)) citop citype(logit) ///
recast(bar) rescale(80) vertical ///
ytitle ("") ///
graphregion(color()) yscale(r(0 50)) ylabel(0(10)50) ///
addplot(scatter @b @at, ms(i) mlabel(@b) mlabpos(2) mlabcolor(black)) ///
xlabel(1 "Favourable" 2 "Neutral" 3 "Unfavourable") ///
ylabel(, labcol(black)) barwidth(.3) legend(label(1 "Remainers") label(3 "Leavers") ring(0) position(2) bmargin(large))
Mediation Analysis in Difference-in-Differences (DiD)
Hello,
I am trying to test mediation using difference-in-differences. My outcome variable is company total sales and my treatment variable is boycott (which equals 1 in every year after a company is first boycotted by consumers, and 0 before that date and for control companies). I know that in the traditional DiD I might write:
Code:
xtreg sales boycott $controls i.year, fe
Now my question is: if I want to test whether this effect of boycotts on sales is channeled (mediated) through media tenor (positive/negative media attention), how would I do that?
Thank you in advance.
RC
Create beeswarm and beanplot
Hi all,
I would like to create a beeswarm plot and a beanplot. My dataset contains a numeric variable - I compare this variable over subgroups and wish to graph this in a beeswarm/beanplot with medians and IQR. Does anyone know the commands in Stata for this? I understand this is possible in R, but I don't have access to R unfortunately. I'm using Stata 16.0.
Many thanks!
Best
Gunnar
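A hedged pointer rather than a definitive answer: Stata 16 has no built-in beeswarm or beanplot command, but stripplot from SSC gets close; something along these lines (variable names hypothetical):
Code:
ssc install stripplot
stripplot measure, over(group) stack center box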
Mata implementation of a fast (k) nearest neighbours lookup algorithm
Hello everyone
I have implemented a kd-tree search algorithm in Mata, that can find the k nearest neighbours of a p-dimensional point among a set of points. For large data sets, this can be much faster than a 'brute force' search, and it could be useful for researchers doing spatial analysis.
The code is available from my Github repository. Simply download and run the file mata_knn.do; this will initialize all the Mata functions. Example usage:
Code:
version 15.1
mata: mata clear
mata: mata set matastrict on
run mata_knn.do
mata:
N = 10000
k = 5
query_coords = runiform(N,2)
data_coords = runiform(N,2)
// search the data points for the k nearest neighbours of each query point
// (argument order assumed from context; the original snippet passed "data" twice)
knn(data_coords, query_coords, k, kni=., knd=.)
end
The matrices kni and knd contain the indices of, and distances to, the k nearest points for each query point. Of course, the query and the data points could be the same, in which case the first nearest neighbour is always 'self'. Duplicate data_coords are not allowed, and will throw an error.
I have only thoroughly tested it with 2-dimensional points so far. If you feel that this is useful, or if you find any bugs, kindly let me know! I am also considering uploading it to the SSC archive, but have not found the time to do so yet.
Best
Robert
Creating a sum variable of the prior 12 months
Dear Users,
For a current project I need to create a new variable (call it X), which consists of the sum over the previous 12 months.
I need to compute the total loan facility amount originated in the prior 12 months per Lender. Lender is, obviously, the lender; Mdate is the monthly date; and Money is the loan facility amount reported at that date. The total dataset is about 200,000 observations from 2000-2018. I've tried a few ways, but none of them produced a logical, or even close to logical, number.
My question: could you help me obtain code to generate this variable in Stata?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str74 Lender float(Mdate Money)
"Bank of America" 480 126420000
"General Electric Capital Corp" 480 1.20e+08
"ABN AMRO Bank NV [RBS]" 480 166666672
"Bank of America" 480 133333336
"Bank of New York" 480 7.50e+07
"Guaranty Federal Bank" 480 100625000
"Chase Manhattan Bank" 480 5.45e+08
"Chase Manhattan Bank" 480 6.00e+07
"Bank of America" 480 7000000
"Heller Financial Inc" 480 6500000
end
format %tm Mdate
The method is based on the paper by Cai et al. (2018)*
*Cai, Jian & Eidam, Frederik & Saunders, Anthony & Steffen, Sascha. (2017). Syndication, Interconnectedness, and Systemic Risk. Journal of Financial Stability. 34. 10.1016/j.jfs.2017.12.005.
(I'm using the most recent version of Stata).
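A hedged sketch using rangestat from SSC: because Mdate is a monthly date, the interval -12 to -1 covers the 12 months before (and excluding) the current month, summed within Lender:
Code:
* ssc install rangestat
rangestat (sum) prior12 = Money, interval(Mdate -12 -1) by(Lender)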
Manipulating forest plot
Hello,
So I am working on my first meta-analysis, in which I am assessing the incidence of traumatic brain injury in LMICs. To pool the rate, I needed the incidence rate reported by each study plus its lower and upper bounds. After using the metan command, this is how the forest plot looks.
How do I zoom the plots so that the confidence interval lines of each study can be seen?
What is the command that I can use to set the weights of the studies? I.e., I would want population-based studies/cohort studies to be given more weight.
Regards,
Gideon.
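A hedged sketch on the weighting question, with hypothetical variable names (ir, lci, uci for the rate and its bounds, wgtvar for the user-supplied weights); metan accepts user-defined weights through wgt(), and forcing the x-axis labels can widen the plotted range so narrow CIs become visible:
Code:
metan ir lci uci, wgt(wgtvar) xlabel(0, 1, 2) force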
SAMPLE SIZE CALCULATION FOR PANEL DATA/ USE OF Stata PSS
Good morning all
I need assistance to calculate the minimum sample size for a piece of hospital-based research work. I have read through the Stata documentation for power and sample size (Release 15), and I still do not have a sense of direction about how to do so.
The aim is to determine the appropriate time for the estimation of haemoglobin concentration after blood transfusion in children aged 1-17 years.
The objectives are to determine the time of equilibration of haemoglobin concentration after a blood transfusion; to determine the relationship between the time of equilibration and recipients' variables such as age, sex, body mass index, and pre-transfusion Hb concentration; and to determine the relationship between the donor's Hb concentration, duration of storage of blood, and equilibration time.
I wish to know what level of change occurs in haemoglobin concentration between 1, 6, 12, and 24 hours after blood transfusion.
I suppose this is time series or panel data, and I need to show how I will derive the minimum sample size.
I couldn't decide which part of the power and sample size section (Stata 16) to use, or how to calculate it manually, and would appreciate help.
Thank you
Ezeanosike Obum
How to filter out string observations with common characters?
Hi Statalisters,
I am new to Stata. When working on my data, I want to filter out string observations with common characters. For example, as the following chart shows, the string variable is InstituteName and it contains many observations. The five observations presented below share the common text "Government", so I want to filter them out using the text "Government". What command should I use to deal with this problem?
| InstituteName |
| US Government |
| US Government |
| U.S Government |
| U.S Government |
| U S Government |
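A hedged sketch: strpos() flags any observation whose InstituteName contains the text, and drop (or keep) does the filtering:
Code:
* inspect them first, then drop
list InstituteName if strpos(InstituteName, "Government") > 0
drop if strpos(InstituteName, "Government") > 0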
Destring with many decimal issue
Hi
I am fairly new to Stata, and have searched a lot on my issue but have not yet found a solution..
My issue is with destring-ing my variables. I have imported an Excel sheet, and some of my observations (due to currency conversion) contain a lot of decimals, e.g.:
FirmID fias
1 1604.26972699244
2 1454.1477388612
I then use the following destring command: (I have missing data as "n.a.")
destring fias, gen(fias_num) ignore("n.a.")
I get the following result:
FirmID fias_num
1 1.60e+14
2 1.45e+13
Hence that is a completely different number!!
Do anyone have a solution for this?
Fingers crossed
Rasmus
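A hedged diagnosis with a sketch: ignore("n.a.") tells destring to strip the characters n, ., and a wherever they occur, so the decimal point in 1604.26972699244 is deleted as well, which is exactly the inflation shown above. Blanking out the "n.a." cells first avoids that:
Code:
replace fias = "" if fias == "n.a."
destring fias, gen(fias_num)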
SEM model group analysis not concave
Hello there !
I am trying to use Stata to check my SEM model for measurement invariance across two different groups. The issue I am facing is that my model doesn't converge when I apply NO constraints to it. However, as soon as I apply constraints to the measurement intercepts, and thereafter to other parameters (e.g. measurement coefficients, structural coefficients), the model converges. Essentially I would like to assess the model without any constraints and then check the goodness of fit when applying the constraints, but I can't get it to run. What is your idea about that? I sense that the variability in the measurement intercepts is not "helping" Stata to run the model!?
Thanks in advance for your response.
George
Thursday, January 30, 2020
Dummy variable for neonatal Mortality, Infant mortality and Child mortality
Hi, I am working on Demographic and Health Survey (DHS) data. I have to compute neonatal mortality (death during the first 28 days of life, i.e. 0-27 days), infant mortality (dies within the first 12 months) and child mortality (dies within the first 60 months).
I have the following variables
storage display value
variable name type format label variable label
--------------------------------------------------------------------------------
b1 byte %8.0g month of birth
b2 int %8.0g year of birth
b3 int %8.0g date of birth (cmc)
b4 byte %8.0g LABL sex of child
b5 byte %8.0g LABN child is alive
b6 int %8.0g b6 age at death
b7 int %8.0g age at death (months-imputed)
b8 byte %8.0g current age of child
b9 byte %8.0g b9 child lives with whom
b10 byte %8.0g LABB completeness of information
b11 int %8.0g preceding birth interval
b12 int %8.0g succeeding birth interval
b13 byte %8.0g b13 flag for age at death
b15 byte %8.0g LABN live birth between births
b16 byte %8.0g b16 child's line number in household
Can anyone please tell me how I can compute neonatal mortality, infant mortality and child mortality as dummy variables taking value 1 if the child dies within the specified period and 0 otherwise?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte b1 int(b2 b3) byte(b4 b5) int(b6 b7) byte(b8 b9 b10) int(b11 b12) byte(b13 b15 b16)
10 1994 1138 1 1 . . 12 0 1 11 . . 0 5
11 1993 1127 1 0 100 0 . . 5 62 11 0 0 .
9 1988 1065 2 1 . . 18 0 1 25 62 . 0 4
8 1986 1040 2 1 . . 20 0 1 16 25 . 0 3
4 1985 1024 2 1 . . 21 4 1 10 16 . 0 0
6 1984 1014 1 0 100 0 . . 5 12 10 0 0 .
6 1983 1002 1 0 100 0 . . 5 13 12 0 0 .
5 1982 989 2 1 . . 24 4 1 . 13 . 0 0
9 2001 1221 2 1 . . 5 0 1 105 . . 0 2
12 1992 1116 1 0 105 0 . . 1 156 105 0 0 .
end
label values b4 LABL
label def LABL 1 "male", modify
label def LABL 2 "female", modify
label values b5 LABN
label values b15 LABN
label def LABN 0 "no", modify
label def LABN 1 "yes", modify
label values b6 b6
label def b6 100 "0 days", modify
label values b9 b9
label def b9 0 "respondent", modify
label def b9 4 "lives elsewhere", modify
label values b10 LABB
label def LABB 1 "month and year", modify
label def LABB 5 "year - a, m imp", modify
label values b13 b13
label def b13 0 "no flag", modify
label values b16 b16
label def b16 0 "not listed in household", modify
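A hedged sketch under standard DHS recode conventions (b5 = child is alive; b6 = age at death, coded 100 + days when reported in days; b7 = age at death in months, imputed). Please check these codings against the recode manual for your survey:
Code:
* neonatal: died at 0-27 days (b6 runs 100-127 for deaths reported in days)
gen byte neonatal = (b5 == 0 & inrange(b6, 100, 127)) if !missing(b5)
* infant: died within the first 12 months
gen byte infant = (b5 == 0 & b7 < 12) if !missing(b5)
* child (under-5): died within the first 60 months
gen byte child = (b5 == 0 & b7 < 60) if !missing(b5)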
-margins- after -xtlogit,fe-
Dear Statalist
This is a question about interpreting the results from a panel data fixed-effects logistic regression. The outcome variable is binary & the main regressor is categorical with 4 levels.
As the estimated odds ratios change depending on which base level is selected, in a cross-sectional setting I prefer to use -margins- and interpret the results in terms of average adjusted predictions (which is unaffected by the base level). However, when using -xtlogit-, the average adjusted predictions appear to change depending on the base level.
Question: is this the expected behaviour for -margins- after -xtlogit-? If so, would it be preferable to interpret the results in terms of odds ratio instead of probabilities in a panel-data setting?
If we treat the data as cross-sectional, the results from -margins- are unchanged by the base level of the regressor.
This is not the case, however, with panel-data -xtlogit-
Thanks,
Junran
Code:
use http://www.stata-press.com/data/r16/union.dta, clear
xtset idcode year, yearly
* Discretize the -grade- variable into 4 levels for illustration purpose
egen grade_category = cut(grade), at(0,7,13,16,19) icodes
label define grade_category 0 "primary" 1 "secondary" 2 "undergraduate" 3 "postgraduate"
label values grade_category grade_category
Code:
quietly logit union i.year ib(0).grade_category
margins grade_category
Predictive margins Number of obs = 26,200
Model VCE : OIM
Expression : Pr(union), predict()
--------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
grade_category |
primary | .2349991 .0276247 8.51 0.000 .1808556 .2891425
secondary | .2073589 .0031732 65.35 0.000 .2011395 .2135782
undergraduate | .1943004 .0058311 33.32 0.000 .1828718 .2057291
postgraduate | .2937748 .0064781 45.35 0.000 .281078 .3064717
--------------------------------------------------------------------------------
quietly logit union i.year ib(1).grade_category
margins grade_category
*(output omitted)
quietly logit union i.year ib(2).grade_category
margins grade_category
*(output omitted)
quietly logit union i.year ib(3).grade_category
margins grade_category
*(output omitted)
Code:
. quietly xtlogit union i.year ib(0).grade_category, fe
. margins grade_category
Predictive margins Number of obs = 12,035
Model VCE : OIM
Expression : Pr(union|fixed effect is 0), predict(pu0)
--------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
grade_category |
primary | .5184114 .0215869 24.02 0.000 .4761018 .5607209
secondary | .5703154 .2774507 2.06 0.040 .026522 1.114109
undergraduate | .5507514 .2823345 1.95 0.051 -.0026142 1.104117
postgraduate | .6687735 .2569906 2.60 0.009 .1650813 1.172466
--------------------------------------------------------------------------------
. quietly xtlogit union i.year ib(1).grade_category, fe
. margins grade_category
Predictive margins Number of obs = 12,035
Model VCE : OIM
Expression : Pr(union|fixed effect is 0), predict(pu0)
--------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
grade_category |
primary | .4661257 .2823028 1.65 0.099 -.0871777 1.019429
secondary | .5184114 .0215869 24.02 0.000 .4761018 .5607209
undergraduate | .4985708 .0396701 12.57 0.000 .4208188 .5763228
postgraduate | .6207837 .0584854 10.61 0.000 .5061544 .735413
--------------------------------------------------------------------------------
*and so on
Junran
Loop over multiple arrays via foreach
Hello together!
I have been trying to fix my problem for a few hours now and cannot come up with an adequate solution. I need your help/advice!
I would like to run the Stata code shown below. My problem is that it will only work for the first observation (example1 @ 1999 and example2 @ 2005).
How can I work with multiple entries? Any ideas?
Thank you for your help!
Konstantin
Code:
tsset ID YEAR
local example1 1999
local example1 2000
local example2 1999
local example2 2005
foreach i in numlist 100005 {
local t `example`i''
replace TEST_variable = 10 if ID == "`i'" & YEAR == `t'
}
Konstantin
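A hedged sketch of one way to handle multiple ID-year pairs (the values below are made up): a local holds only one value per name, so the second -local example1- definition overwrites the first, which is why only one entry ever matches. Looping over explicit pairs avoids that:
Code:
tsset ID YEAR
foreach pair in "100005 1999" "100005 2000" "100006 1999" "100006 2005" {
    local id : word 1 of `pair'
    local yr : word 2 of `pair'
    replace TEST_variable = 10 if ID == "`id'" & YEAR == `yr'
}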
ICD 10 to ICD-9 mapping using GEMs
Hi stata users,
Does anybody have experience using GEMs files to map backwards from ICD-10 to ICD-9 codes in Stata? I have downloaded the following GEM file from the link below but am not sure of the way to do it accurately.
https://data.nber.org/data/icd9-icd-...e-mapping.html
Can you please share any tips?
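A hedged sketch of the usual mechanics, with every file and variable name below hypothetical (the NBER file's actual column names may differ, and GEMs can map one ICD-10 code to several ICD-9 codes, in which case joinby rather than merge m:1 is needed):
Code:
import delimited using "icd10cmtoicd9gem.csv", varnames(1) clear
keep icd10cm icd9cm
save gem10to9, replace
use mydata, clear
merge m:1 icd10cm using gem10to9, keep(master match) nogenerate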
Logit Regression Model / Plotting Interactions
Hi everyone,
I am working at my final paper and need some guidance.
I want to look at the party identification with a specific party_x in the time period 2014 to 2017. My dependent variable is party identification with party_x (binary coding: 0= if identification with another party, 1= if identification with party_x). My independent variables are theoretically based and about right wing extremism.
I would like to answer two questions:
First I want to look at the general trend regarding my indicators and the party_x identification. I am planning to do so with:
– logit party_x independent_variable_1 independent_variable_2 independent_variable_n, or –
My idea is to look at the Odds Ratio for getting information about the general trend over that time period in regard to the independent variables.
Second I want to look if there is an interaction term by looking at just one independent variable and the interaction with a binary coded time-dummy variable (1= 2016/2017 and 0=2014/2015):
– logit party_x independent_variable_1##time-dummy independent_variable_2 independent_variable_n –
Then I would look at the – margins independent_variable_1, at(time-dummy=(0(1)1)) – for the logit coefficients and plot this with – marginsplot –.
The idea behind that is to visualize the different potential interactions with the gradients of the logit coefficients.
Is that a suitable way to achieve my goals?
I am really thankful for any help or advice
Best
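A hedged sketch of the second step with placeholder names (iv1 as the focal regressor, post16 as the 2016/17 dummy; note that Stata variable names cannot contain a hyphen, so time-dummy would need renaming):
Code:
logit party_x c.iv1##i.post16 iv2 iv3
margins post16, at(iv1 = (1(1)5))
marginsplot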
Looping over correlation function
I am trying to produce a dataset that includes the autocorrelations of a variable (ret) across groups (portfolio). I am able to report the first autocorrelation (ret, lag_ret1) for each portfolio, but I cannot figure out how to loop through and save the 2nd (ret, lag_ret2) and 3rd (ret, lag_ret3) autocorrelations for each portfolio.
Ultimately, I want to sum up all three correlation coefficients. So if the matrix methodology is not the most efficient way, I am open to other options.
Code:
matrix corre = J(3,3,0)
matrix list corre
forvalues port=1/3{
forvalues lag=1/3{
correlate ret lag_ret`lag' if portfolio ==`port'
matrix c=r(C)
matrix corre[`lag',`port']= c[`lag'+1,1]
}
}
matrix list corre
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float portfolio double ret float(lag_ret1 lag_ret2 lag_ret3)
1 .0023099415536437717 -.00009128346 -.001888566 -.0013676324
1 -.0005077351815998554 .0023099415 -.00009128346 -.001888566
1 -.0011798939152530073 -.0005077352 .0023099415 -.00009128346
1 .0018680473880964184 -.001179894 -.0005077352 .0023099415
1 .0052912740931999915 .0018680474 -.001179894 -.0005077352
1 -.002362792730488092 .005291274 .0018680474 -.001179894
1 -.004439577525402656 -.002362793 .005291274 .0018680474
1 .011633304806676608 -.0044395775 -.002362793 .005291274
2 .010083204463551768 .009573128 -.00733168 -.00981641
2 .0029652350642756235 .010083204 .009573128 -.00733168
2 .002781096110813135 .002965235 .010083204 .009573128
2 -.0004845770619188746 .002781096 .002965235 .010083204
2 .009512511099169611 -.0004845771 .002781096 .002965235
2 -.0031710873696614394 .00951251 -.0004845771 .002781096
2 .0022258799652889617 -.0031710875 .00951251 -.0004845771
2 .010312552598198174 .00222588 -.0031710875 .00951251
3 .0014073778989679012 .005959528 -.007621187 -.0040159286
3 .014511975238374511 .001407378 .005959528 -.007621187
3 -.014570034809472146 .014511975 .001407378 .005959528
3 .011700462176626132 -.014570035 .014511975 .001407378
3 -.0024378968148746276 .011700463 -.014570035 .014511975
3 -.0025385521429901322 -.002437897 .011700463 -.014570035
3 -.004463553329410612 -.002538552 -.002437897 .011700463
3 .01900731705770385 -.0044635534 -.002538552 -.002437897
end
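A hedged sketch of a fix: each -correlate- call here involves exactly two variables, so r(C) is always 2x2 and the cross-correlation always sits in cell [2,1]; indexing it by `lag'+1 is what breaks the loop for lags 2 and 3:
Code:
matrix corre = J(3,3,0)
forvalues port = 1/3 {
    forvalues lag = 1/3 {
        quietly correlate ret lag_ret`lag' if portfolio == `port'
        matrix c = r(C)
        matrix corre[`lag', `port'] = c[2,1]
    }
}
matrix list corre
* column sums give the sum of the three autocorrelations per portfolio
matrix sums = J(1,3,1) * corre
matrix list sums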
Coefplot: reduce spacing on y-axis
Hello everybody,
I have a question about how to reduce the spacing between labels on the y-axis when using Ben Jann's coefplot command. This is related to a post I found on Stack Overflow, but the discussion there didn't quite solve the problem, I think. Otherwise the plot will take up too much space in my document, and I don't want to scale it down in a minipage or so, because then the labels will also be very small.
Anyway, reading the post there gave me the idea to fumble around with ysize(), xsize() and aspectratio().
Number iii comes closest to what I need; however, I would like to remove the empty white space above and below the plot region. I tried to do that by fooling around with margin() in graphregion() and plotregion(), but that didn't yield the desired result.
What I would like to have is number iii with the grey shaded area removed:
Does anybody know how to achieve that - or where to find the solution?
Thanks in advance and best regards,
Boris
Here is some code for demonstration:
Code:
quietly sysuse auto, clear
quietly regress price mpg trunk length turn
set autotabgraphs on
* i)
coefplot, drop(_cons) xline(0) ///
xlabel(-600(100)300) ///
scheme(s1mono) ///
title(i) ///
name(i, replace)
* ii)
coefplot, drop(_cons) xline(0) ///
xlabel(-600(100)300) ///
xsize(3) ysize(1) ///
scheme(s1mono) ///
title(ii) ///
name(ii, replace)
* iii)
coefplot, drop(_cons) xline(0) ///
xlabel(-600(100)300) ///
aspectratio(.33) ///
scheme(s1mono) ///
title(iii) ///
name(iii, replace)
* iv)
coefplot, drop(_cons) xline(0) ///
xlabel(-600(100)300) ///
aspectratio(.33) ///
xsize(3) ysize(1) ///
scheme(s1mono) ///
title(iv) ///
name(iv, replace)
Getting an error from putexcel set ..., open
I have this simple line of code to setup my excel worksheet but I get the following error when I try to run it:
putexcel_set_new(): 3010 attempt to dereference NULL pointer
<istmt>: - function returned error
It was working at first but then, without changing the line, I started to get this error. I appreciate any insight anyone has on how to resolve this.
Thank you,
-Daniel
Code:
putexcel set "Data collection tracked", replace open
Thank you,
-Daniel
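A hedged workaround sketch: this error can follow an earlier putexcel session left half-open, so clearing the declared file (or restarting Stata) before setting it again may help:
Code:
putexcel clear
putexcel set "Data collection tracked", replace open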
coefplot change bar color?
My question is how can I change the bar colors when I'm using
I added this
but it says:
option bar() not allowed
r(198);
HTML Code:
coefplot
HTML Code:
bar(1,bcolor(navy)) bar(2, bcolor(maroon))
HTML Code:
coefplot fcn1 fcn5, format(%9.0f) ///
title("(a) Overall Perception of China", size(medium)) ///
color(navy) lcolor(black) lpattern(solid) byopts(cols(1)) ///
ciopts(recast(rcap)) citop citype(logit) ///
recast(bar) bar(1,bcolor(navy)) bar(2, bcolor(maroon)) rescale(80) vertical ///
ytitle ("") ///
graphregion(color()) yscale(r(0 50)) ylabel(0(10)50) ///
addplot(scatter @b @at, ms(i) mlabel(@b) mlabpos(2) mlabcolor(black)) ///
xlabel(1 "Favourable" 2 "Neutral" 3 "Unfavourable") ///
ylabel(, labcol(black)) barwidth(.3) legend(label(1 "Remainers") label(3 "Leavers") ring(0) position(2) bmargin(large))
How to generate predicted wage (gap) with margins command and plot a profile with marginsplot
Dear Stata user,
I would like to have predicted mean immigrant-native wage gaps on the vertical axis and years since migration (ysm) on the horizontal axis, and to generate such a profile based on the following simple regression pooling immigrants and natives:
Code:
regress logwage im ysm im#i.ysm
where im is a binary variable equal to one if immigrant and zero if native.
Code:
margins i.ysm
generates the predicted logwage values. I treat years since migration as a set of binary variables, where ysm ranges from 0 through 50 years.
Code:
marginsplot, x(ysm) recast(scatter)
produces the plot with ysm on the horizontal axis.
I have no clue how to get the predicted mean immigrant-native wage gap on the vertical axis!?
I would appreciate some thoughts on this.
Thank you very much in advance.
Nico
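A hedged sketch of one way to put the gap itself on the vertical axis: with im entered as a factor and fully interacted, the immigrant-native gap at each ysm is the marginal effect of im, which marginsplot can draw directly:
Code:
regress logwage i.im##i.ysm
margins ysm, dydx(im)
marginsplot, x(ysm) recast(scatter)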
Interpreting Xtabond2 Coefficients & Coefficient Inflation?
Hi,
This is a follow-up question for this topic, but because the question is different, I thought may be it can be more useful in the future if someone searches this question.
Older topic: https://www.statalist.org/forums/for...are-endogenous
As before, here is the characteristics of my data:
"My data has three waves, but with gaps (unbalanced), and the N is approximately 2500. All the variables below are statistically significant in the FE/RE estimations."
I have estimated the following equation (the original variable names are changed for ease of reading):
Code:
xtabond2 DV L.DV X1 X2 X3 X4 wave2 wave3, gmm(L.DV) gmm(X1, lag(2 .)) gmm(X2, lag(2 .)) gmm(X3, lag(1 .)) gmm(X4, lag(1 .)) iv(wave2 wave3, eq(level)) robust small twostep
All Hansen tests, including the incremental ones, are above the 0.25 threshold (Kiviat 2019, Roodman 2009).
My question is as follows:
- In the pooled OLS regression, B for X1 is 0.66, and SE is 0.05.
- In the FE regression, B for X1 is 0.22, and SE is 0.05.
- In the Lagged FE (biased) regression, B for X1 is 0.19, and SE is 0.07.
BUT, in the dynamic panel specified above:
- B for X1 is 4.87, and SE is 2.26.
Other variables (X2, X3, X4) are not like this, they are within the bounds of pooled OLS and FE regression.
What could be the reason for this?
How to look at regression between score bands
Hi, I have a binary variable and a score variable. I want to look at the binary variable in different bands of the score variable (specifically, 0-10, 10-20, 20-30 >30).
Not sure how to do the coding.
Code:
clogit case b_var if score== ? , group(set) or
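A hedged sketch: build the band variable first, then condition on it (the 1000 is just a cap above any observed score; adjust the boundary convention to taste):
Code:
egen band = cut(score), at(0, 10, 20, 30, 1000) icodes
clogit case b_var if band == 0, group(set) or   // band 0 = scores 0-10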
bayesgraph diagnostics in Item Response Model
Please, how do we use the bayesgraph diagnostics command in an Item Response Model?
Thanks for your anticipated assistance.
olayiwola Adetutu
Kwara state University Malete,
Nigeria.
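A hedged sketch of the general pattern only (model and parameter names made up; the real call depends on how the IRT model was fit): bayesgraph diagnostics takes the parameter names from the fitted Bayesian model:
Code:
bayesmh y x, likelihood(logit) prior({y:}, normal(0, 10)) rseed(17)
bayesgraph diagnostics {y:x}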
coefplot graph, symmetry between dots
Dear Statalisters,
I am using the coefplot command to create one graph with coefficients from 40 different regressions. However, the position of the dots in the chart is not symmetric, as you can see below (for instance, look at the Kenya and Pakistan plots):
Any advice on how to make the graph look better?
Thank you very much,
Estrella
Here is the code I use:
Code:
coefplot /*
*/(female_ono_nr77_reg, aseq("India") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr77_iv, aseq("India") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr55_reg, aseq("United Kingdom") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr55_iv, aseq("United Kingdom") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr135_reg, aseq("Pakistan") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr135_iv, aseq("Pakistan") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr171_reg, aseq("United States") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr171_iv, aseq("United States") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr14_reg, aseq("Bangladesh") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr14_iv, aseq("Bangladesh") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr169_reg, aseq("Ukraine") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr169_iv, aseq("Ukraine") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr2_reg, aseq("United Arab Emirates") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr2_iv, aseq("United Arab Emirates") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr143_reg, aseq("Romania") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr143_iv, aseq("Romania") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr29_reg, aseq("Canada") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr29_iv, aseq("Canada") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr134_reg, aseq("Philippines") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr134_iv, aseq("Philippines") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr50_reg, aseq("Spain") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr50_iv, aseq("Spain") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr144_reg, aseq("Serbia") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr144_iv, aseq("Serbia") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr49_reg, aseq("Egypt") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr49_iv, aseq("Egypt") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr80_reg, aseq("Italy") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr80_iv, aseq("Italy") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr65_reg, aseq("Greece") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr65_iv, aseq("Greece") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr74_reg, aseq("Ireland") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr74_iv, aseq("Ireland") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr85_reg, aseq("Kenya") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr85_iv, aseq("Kenya") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr10_reg, aseq("Australia") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr10_iv, aseq("Australia") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr17_reg, aseq("Bulgaria") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr17_iv, aseq("Bulgaria") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr123_reg, aseq("Nigeria") mcol(maroon) lcol(maroon) mlabcolor(maroon) ciopts(color(maroon))) (female_ono_nr123_iv, aseq("Nigeria") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/, keep(female_ono_nr77 female_ono_nr55 female_ono_nr135 female_ono_nr171 female_ono_nr14 female_ono_nr169 female_ono_nr2 female_ono_nr143 female_ono_nr29 female_ono_nr134 female_ono_nr50 female_ono_nr144 female_ono_nr49 female_ono_nr80 female_ono_nr65 female_ono_nr74 female_ono_nr85 female_ono_nr10 female_ono_nr17 female_ono_nr123) xline(0) mlabel(cond(@pval<.01, "***", cond(@pval<.05, "**", cond(@pval<.1, "*", "")))) note("* p < .1, ** p < .05, *** p < .01") mlabgap(*0) mlabsize(vsmall) mlabposition(1) msym(d) aseq swapnames sort
*Top 20, only IV
coefplot /*
*/(female_ono_nr77_iv, aseq("India") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr55_iv, aseq("United Kingdom") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr135_iv, aseq("Pakistan") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr171_iv, aseq("United States") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr14_iv, aseq("Bangladesh") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr169_iv, aseq("Ukraine") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr2_iv, aseq("United Arab Emirates") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr143_iv, aseq("Romania") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr29_iv, aseq("Canada") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr134_iv, aseq("Philippines") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr50_iv, aseq("Spain") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr144_iv, aseq("Serbia") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr49_iv, aseq("Egypt") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr80_iv, aseq("Italy") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr65_iv, aseq("Greece") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr74_iv, aseq("Ireland") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr85_iv, aseq("Kenya") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr10_iv, aseq("Australia") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr17_iv, aseq("Bulgaria") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/(female_ono_nr123_iv, aseq("Nigeria") mcol(navy) lcol(navy) mlabcolor(navy) ciopts(color(navy)))/*
*/, keep(female_ono_nr77 female_ono_nr55 female_ono_nr135 female_ono_nr171 female_ono_nr14 female_ono_nr169 female_ono_nr2 female_ono_nr143 female_ono_nr29 female_ono_nr134 female_ono_nr50 female_ono_nr144 female_ono_nr49 female_ono_nr80 female_ono_nr65 female_ono_nr74 female_ono_nr85 female_ono_nr10 female_ono_nr17 female_ono_nr123) xline(0) mlabel(cond(@pval<.01, "***", cond(@pval<.05, "**", cond(@pval<.1, "*", "")))) note("* p < .1, ** p < .05, *** p < .01") mlabgap(*0) mlabsize(vsmall) mlabposition(1) msym(d) aseq swapnames sort
Any advice on how to make the graph look better?
Thank you very much,
Estrella
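One way to make such a long call easier to maintain (and to experiment with spacing and sorting options) is to build the plot list in a loop. A sketch, assuming the stored estimates follow the female_ono_nr*_iv naming used above:
Code:
* build the repetitive plot specifications in a local instead of by hand
local nums 77 55 135 171 14 169 2 143 29 134 50 144 49 80 65 74 85 10 17 123
local labs `""India" "United Kingdom" "Pakistan" "United States" "Bangladesh" "Ukraine" "United Arab Emirates" "Romania" "Canada" "Philippines" "Spain" "Serbia" "Egypt" "Italy" "Greece" "Ireland" "Kenya" "Australia" "Bulgaria" "Nigeria""'
local plots
forvalues i = 1/20 {
    local n : word `i' of `nums'
    local l : word `i' of `labs'
    local plots `plots' (female_ono_nr`n'_iv, aseq(`"`l'"') mcol(navy) ciopts(color(navy)))
}
coefplot `plots', xline(0) msym(d) aseq swapnames sort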
Help with counts
Hello,
I am working on a prescribing data set containing details of about 100,000 people. The lay out is similar to the table below:
| ID | Gender (0=male, 1=female) | Paracetamol_2002 (0=no, 1=yes) | Paracetamol_2002_count | Codeine_2002 (0=no, 1=yes) | Codeine_2002_Count | Ibuprofen_2002 (0=no, 1=yes) | Ibuprofen_2002_Count |
| Pat01 | 0 | 1 | 15 | 0 | 0 | 1 | 1 |
| Pat02 | 0 | 1 | 7 | 0 | 0 | 1 | 4 |
| Pat03 | 1 | 1 | 3 | 1 | 4 | 0 | 0 |
| Pat04 | 0 | 0 | 0 | 1 | 6 | 1 | 6 |
| Pat05 | 1 | 0 | 0 | 1 | 12 | 0 | 0 |
| Pat06 | 1 | 1 | 12 | 1 | 12 | 0 | 0 |
| Pat07 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Pat08 | 0 | 0 | 0 | 1 | 9 | 1 | 18 |
| Pat09 | 1 | 0 | 0 | 1 | 1 | 1 | 5 |
There is information on 23 different medicines covering a period of 5 years. I need to find out the following:
1. How many prescriptions were issued for each medicine in each year? For this I used the command 'count if Paracetamol_2002_count > 0'. Is this the best way to do this? I also tried commands such as 'tab Paracetamol_2002_count'.
2. How many people received those prescriptions in a given year? I tried 'count if Paracetamol_2002 > 0', but it didn't quite work. Any suggestions on how I might do this? And how do I work out how many women, for instance, received prescriptions for each drug in a given year?
I appreciate that these might be somewhat elementary questions, but I would really appreciate any help I can get.
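A sketch of one way to get all three numbers, assuming the variables follow the Medicine_year and Medicine_year_count naming in the table (capitalisation may differ in the real data): totals of the _count variables give prescriptions, counts of the 0/1 indicators give patients.
Code:
foreach m in Paracetamol Codeine Ibuprofen {
    forvalues y = 2002/2006 {
        capture confirm variable `m'_`y'_count
        if !_rc {
            quietly summarize `m'_`y'_count
            display "`m' `y': " r(sum) " prescriptions"
            quietly count if `m'_`y' == 1
            display "`m' `y': " r(N) " patients"
            quietly count if `m'_`y' == 1 & Gender == 1
            display "`m' `y': " r(N) " women"
        }
    }
}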
How to drop some elements from a macro list of files
Hi Statalist
I have a question.
I have the following datasets: 2008_a8.txt, 2010_a4.txt, 2012_a7.txt, 2014_a5.txt, 2015_a2.txt and 2016_a3.txt.
A first part of my .do file generates the following versions of each dataset: *_v2.txt, *_v3a.txt, *_v3b.txt and *_v4.txt.
Thus, if I do the following commands:
local files : dir "C:\Users\nf19281\Desktop\example" files "*.txt"
macro list _files
_files: "2008_a8.txt" "2008_a8_v2.txt" "2008_a8_v3a.txt" "2008_a8_v3b.txt" "2008_a8_v4.txt" "2010_a4.txt" "2010_a4_v2.txt" "2010_a4_v3a.txt" "2010_a4_v3b.txt" "2010_a4_v4.txt"
"2012_a7.txt" "2012_a7_v2.txt" "2012_a7_v3a.txt" "2012_a7_v3b.txt" "2012_a7_v4.txt" "2014_a5.txt" "2014_a5_v2.txt" "2014_a5_v3a.txt" "2014_a5_v3b.txt" "2014_a5_v4.txt"
"2015_a2.txt" "2015_a2_v2.txt" "2015_a2_v3a.txt" "2015_a2_v3b.txt" "2015_a2_v4.txt" "2016_a3.txt" "2016_a3_v2.txt" "2016_a3_v3a.txt" "2016_a3_v3b.txt" "2016_a3_v4.txt"
Actually, I need my macro list to contain only the initial datasets.
How can I ask Stata to consider only the original datasets?
I tried: local files : dir "C:\Users\nf19281\Desktop\example" files "*.txt" !="*v*.txt", but it doesn't work.
Thank you!
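A sketch of one fix: keep the full directory listing, then drop every name that matches the versioned pattern with strmatch().
Code:
local files : dir "C:\Users\nf19281\Desktop\example" files "*.txt"
local orig
foreach f of local files {
    if !strmatch(`"`f'"', "*_v*") {
        local orig `orig' `"`f'"'
    }
}
macro list _orig
An alternative is a tighter file pattern in the first place, e.g. files "20??_a?.txt", if the original names always follow that shape.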
Textbox position in bar graphs
I am trying to change the position of subtitles in bar graphs. I've created a MWE from Stata's dataset:
Code:
use http://www.stata-press.com/data/r15/lifeexp, clear
graph bar, over(lexp) by(region)
I wish the subtitle textbox position (of all subgraphs) were on the red rectangle shown in the attached screenshot (not reproduced here; it shows only one subgraph).
Does anyone know how to do it?
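One thing that may work: the panel subtitles of a by() graph respond to subtitle() suboptions, so a position/ring request can move them. Position 11 here is only a guess at where the red rectangle sits:
Code:
use http://www.stata-press.com/data/r15/lifeexp, clear
graph bar, over(lexp) by(region) ///
    subtitle(, position(11) ring(0) nobox size(small))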
How to check if a value in one variable equals any value in other variables
I am trying to generate a variable (or variables) indicating that VAR1 has elements in common with two other variables (VAR2 and VAR3), that VAR2 has elements in common with only one variable (VAR4), that VAR3 has elements in common with only VAR1, and that VAR4 has elements in common with only one variable, VAR2. I have to do this by place.
Is there any way to do this in Stata?
Thanks.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str51 place float(VAR1 VAR2 VAR3 VAR4)
"SAN LEANDRO" 1996 . . .
"SAN LEANDRO" 1997 . . .
"SAN LEANDRO" 1998 . . .
"SAN LEANDRO" 1999 . . .
"SAN LEANDRO" . . . 1994
"SAN LEANDRO" . . . 1995
"SAN LEANDRO" . . . 1996
"SAN LEANDRO" . . . 1997
"SAN LEANDRO" . . . 1998
"SAN LEANDRO" . . . 1999
"SAN LEANDRO" . . . 2000
"SAN LEANDRO" . . . 2001
"SAN LEANDRO" . . . 2000
"SAN LEANDRO" . . . 2001
"SAN LEANDRO" . . . 2002
"SAN LEANDRO" . . . 2003
end
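A sketch of one building block, assuming the goal is a flag for whether a VAR1 value also occurs in VAR2 within the same place; the same pattern would be repeated for the other variable pairs.
Code:
* flag VAR1 values that also appear in VAR2 for the same place
gen byte v1_in_v2 = 0 if !missing(VAR1)
levelsof place, local(places)
foreach p of local places {
    levelsof VAR2 if place == `"`p'"', local(vals2)
    foreach v of local vals2 {
        replace v1_in_v2 = 1 if place == `"`p'"' & VAR1 == `v'
    }
}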
How to rescale a variable by range
Hello Statalist members,
I want to rescale my continuous variables (x, y) by their range (maximum minus minimum), because my other variables are dichotomous; I want the rescaled variables to have a maximum range of 1. Variable x should be scaled by the range of x within code and year, and variable y by the range of y within the focal company.
My dataset is at the individual level, i.e. age, tenure, race.
Could you please guide me on how to do this?
Best regards.
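A sketch of min-max scaling with egen, under the stated grouping; the grouping variable names code, year, and company are assumptions.
Code:
* scale x by its range within code-year cells
bysort code year: egen x_min = min(x)
bysort code year: egen x_max = max(x)
gen x_scaled = (x - x_min) / (x_max - x_min)

* scale y by its range within each company
bysort company: egen y_min = min(y)
bysort company: egen y_max = max(y)
gen y_scaled = (y - y_min) / (y_max - y_min)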
Counting the number of ID_Types for each country in the data
Hi,
My panel data has 260 panel IDs (firms) over a number of time periods, another variable named ID_Type (coded 1 or 2, i.e. IFI or CFI, for each firm), and a third variable giving the 12 countries that the IDs belong to. The data look like this:
| Time | IDs | ID_Type | Country |
| -- | A | IFI | Pak |
| -- | B | CFI | Pak |
| -- | C | IFI | Bahrain |
| -- | D | IFI | India |
| -- | E | CFI | India |
| -- | F | IFI | Pakistan |
I want to produce a table where the number of IDs and ID_Types is reported for each country, such as:
| Country | ID | IFI | CFI |
| Pak | 3 | 2 | 1 |
| Bahrain | 1 | 1 | 0 |
| India | 2 | 1 | 1 |
All the tables that I have made show the number of observations for IDs and ID_Types. But I need to count how many unique ID_Types exist in each country. I was able to make such a table some time ago but am now unable to replicate it. My previous table, which I want to make again, is given below:
| Country | IDs | CFI | IFI |
| Bahrain | 15 | 8 | 7 |
| Bangladesh | 30 | 29 | 1 |
| Egypt | 13 | 11 | 2 |
| Indonesia | 36 | 23 | 13 |
| Jordan | 33 | 28 | 5 |
| Kuwait | 38 | 18 | 20 |
| Malaysia | 71 | 31 | 40 |
| Pakistan | 56 | 42 | 14 |
| Qatar | 15 | 9 | 6 |
| KSA | 18 | 10 | 8 |
| Singapore | 18 | 17 | 1 |
| Total | 376 | 250 | 126 |
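A sketch of one route back to that table, assuming ID_Type is constant within each firm: tag one observation per firm, so the tabulation counts distinct firms rather than observations.
Code:
* one tagged observation per firm per country
egen byte firsttime = tag(Country IDs)
tabulate Country ID_Type if firsttime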
extract year
Dear All, How can I extract year (Stata format) from the following variable?
Thanks in advance.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 date
"28-Jun-95"
"29-Jun-99"
"30-Jun-00"
"1-Jul-05"
"2-Jul-15"
end
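A sketch: daily() with a topyear argument handles the two-digit years, and year() then extracts the year. Using 2020 as topyear is an assumption that fits the sample dates (so "95" reads as 1995 and "15" as 2015):
Code:
gen ddate = daily(date, "DMY", 2020)
format ddate %td
gen year = year(ddate)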
Wednesday, January 29, 2020
clustering and concentration Stata code
Hi everyone,
I am in the process of writing one of my dissertation chapters. Using US Census and ACS, I want to measure Middle Eastern and North African immigrants’ spatial concentration and clustering in metropolitan areas.
I am wondering if there’s anyone who might have clustering and concentration Stata code. I would really appreciate it if anyone --who has worked on something similar--would like to share the code with me. That would be tremendously helpful!
Thank you,
Sevsem
Foreach Loop to Rename Variable
Dear all,
I was trying to use a foreach loop to rename variables; I wanted the names of the variables to be Capacity1, Capacity2, Capacity3, and so on. However, I got the error message shown below:
[error message screenshot omitted]
Will appreciate if I could get any help on this!
Thank you so much,
Craig
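A sketch of one loop that does this kind of renaming; the old cap_* names are hypothetical, since the original names appear only in the screenshot.
Code:
* number the new names sequentially while walking the old varlist
local i = 1
foreach v of varlist cap_* {
    rename `v' Capacity`i'
    local ++i
}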
Local and global macro are not capturing variables' names but the values in the first row
Dear all,
I have a problem when using the local and global macro.
When I set up a local macro based on variable names and then display it, what is stored in the local is the data from the first row instead of the variable names. The same thing happens with a global macro. The data were imported from an Excel file; I tried a .csv file as well, and it didn't work either. Any idea how I can solve this?
code for importing data:
import excel using Indicator.xlsx, sheet("Sheet2") first clear
code and output for setting up the local macro for all variables: [output screenshot omitted]
As shown above, the local macro stored the values of the first row of data instead of the variable names. I tried selecting a list of variables manually to set up the local; the same thing happened.
Will appreciate it if I can get help on this.
Many thanks,
Craig Yang
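A likely culprit, for what it's worth: with an equals sign, local evaluates the right-hand side as an expression and so fetches the first observation's value. A sketch (the variable name myvar is hypothetical):
Code:
local wrong = myvar     // evaluates myvar, storing its value in row 1
local right myvar       // stores the literal text "myvar"
unab allvars : _all     // stores every variable name in `allvars'
display "`allvars'"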
Nested Variables in glm
I am trying to use nested variables in glm because I have a non-normal dependent variable potentially explained by nested categorical and continuous variables. I see how to do this in anova (e.g., anova allwasps PlotCode / TmtCode|PlotCode / YR / Month/YR), but if I try the same with glm (e.g., glm allwasps PlotCode / TmtCode|PlotCode / YR / Month/YR, family(poisson)), I get the error message "'/' not allowed in varlist". Is there a way of nesting variables in glm syntax? Thank you.
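For what it's worth, glm indeed has no "/" nesting syntax; nesting can usually be expressed with factor-variable interactions instead. A sketch following the names in the post (whether this matches the intended error structure is for the analyst to judge):
Code:
* nested terms written as interactions: treatment within plot, month within year
glm allwasps i.PlotCode i.PlotCode#i.TmtCode i.YR i.YR#i.Month, family(poisson)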
How to calculate ten-year moving average of variable x by the dummy variable europe in a dataset as the following? Many thanks!
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id_number int year float europe double x
1 1302 1 1.1393939393939394
2 1302 1 .4
3 1304 1 .
4 1304 1 1.0714285714285714
5 1306 1 .6666666666666666
6 1307 1 .2
7 1311 1 .4166666666666667
8 1313 1 .
9 1314 1 .3865979381443299
10 1315 1 .
11 1315 1 .1875
12 1316 1 1.4285714285714286
13 1316 1 .
14 1316 1 .
15 1318 1 .
16 1318 1 1
17 1319 1 .
18 1322 1 .
19 1322 1 0
20 1322 1 4
21 1322 1 1.2857142857142858
22 1325 1 .21875
23 1325 1 .
24 1328 1 1
25 1329 1 2
26 1329 1 .
27 1330 1 .3333333333333333
28 1330 1 .
29 1332 1 1.78
30 1332 1 .7142857142857143
31 1333 1 .6923076923076923
32 1336 0 2
33 1337 1 .
34 1337 0 .
35 1340 1 .
36 1340 1 .24
37 1340 1 .
38 1341 1 .
39 1342 1 .
40 1344 1 2.066666666666667
41 1345 1 .21428571428571427
42 1345 1 .
43 1345 1 3.7735849056603774
44 1345 1 .
45 1345 1 .
46 1346 1 .
47 1346 1 .18
48 1346 1 8
49 1346 1 1.4285714285714286
50 1346 1 .3
51 1346 1 .5416666666666666
52 1347 1 0
53 1347 1 .
54 1347 1 0
55 1349 1 3
56 1349 1 .
57 1350 1 .
58 1351 1 1
59 1351 1 .
60 1352 1 .4
61 1354 1 .
62 1356 1 .2222222222222222
63 1356 1 .5454545454545454
64 1358 1 .4444444444444444
65 1361 1 .
66 1362 1 .
67 1362 1 .
68 1364 1 .40454545454545454
69 1364 1 3
70 1364 1 .875
71 1365 1 .
72 1367 1 .4666666666666667
73 1369 1 .
74 1370 1 .7
75 1371 1 .
76 1371 1 .55
77 1372 1 .
78 1373 1 .
79 1377 1 .
80 1377 1 .
81 1377 1 .
82 1380 1 4.4
83 1381 1 .
84 1382 1 1.1428571428571428
85 1384 1 .28
86 1385 1 .
87 1385 1 .5
88 1385 1 .20967741935483872
89 1385 0 .15
90 1385 1 .
91 1386 1 .35
92 1386 1 1.6666666666666667
93 1387 1 .6113207547169811
94 1388 1 .90625
95 1388 1 .3888888888888889
96 1389 1 1.5952380952380953
97 1391 1 .
98 1394 1 .
99 1394 1 .
100 1395 1 .
end
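A sketch using rangestat (SSC), assuming a ten-year window means the current and previous nine years, computed separately for each value of europe:
Code:
* ssc install rangestat
rangestat (mean) x, interval(year -9 0) by(europe)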
Studentized / Standardized Residuals
Is it possible to obtain predicted studentized (jackknifed) or standardized residuals with robust/clustered estimation or weighted data? I use weights and cluster the standard errors or use robust standard errors in some specifications. However I get an error message that it is not possible to predict studentized (jackknifed) or standardized residuals with weighted data or robust estimation.
TABSTAT: How to group my summary statistics?
Hello everyone.
I need help and am searching for a simple way to "group" my summary statistics.
My current Stata code is as follows: by year, sort : tabstat variable1 variable2, s(n mean ...)
It gives me the number of observations and mean as output for each year...
... however, I would now like to group the years, say into (2000, 2001, 2002, 2003) and (2004, 2005, 2006, 2007).
Is there an easy way to do this without creating new variables?
Thank you!
Konstantin
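A sketch that needs no new variables: one tabstat call per period, selected with inrange():
Code:
tabstat variable1 variable2 if inrange(year, 2000, 2003), s(n mean)
tabstat variable1 variable2 if inrange(year, 2004, 2007), s(n mean)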
Troubleshooting - Temp file does not find a variable
Good evening everyone. I have the following code.
My problem lies at the end: I think the very last line returns an "invalid file specification" error. I am reviewing the code, and maybe there is a previous error in there, but does anyone see anything obvious that could prompt this error? Is there any way to open `A1' in its state before the last line, for inspection?
Code:
use "Dataset.dta", replace
drop if 5*365>dis_end
tempfile A0
save `A0'
// Panel a
use `A0',clear
keep if 5*365<=dis_end
forvalue dfe = 1/5 { //
g c_`dfe' = ned>`dfe'*365/2 if !mi(ned)
replace c_`dfe' = 1 if mi(ned)
}
keep c_*
collapse (mean) c_*
duplicates drop
g n =1
reshape long c_, i(n) j(week)
drop n
replace week = week/2
g cd_ = int(c_*1000)/1000
twoway ///
(line c_ week, lc(black) lp(dash)) ///
(line c2 week, lc(black) lp(dash) ) ///
(connect c_ week, lcolor(black) mcolor(black) msize(vsmall)) ///
(scatter c_ week if int(week)==week, mcolor(black) msize(vsmall) mlabel(cd_) mlabang(90)) ///
, leg(off) ytitle("Survival rate") xtitle("Years since layoff") ///
xline(30 39, lp(dash) lcolor(gs10)) yline(0, lcolor(gs10)) ///
graphregion(fcolor(white)) xl(1(1)5, grid) ylabel(0(.03).2, grid) ///
subtitle("Survival Rate (Prob. of Remaining Unemployed)", size(m)) ///
subt("Panel a") xtitle("Weeks since layoff")
keep c_ week
rename c_ survlevel
merge 1:1 week using `A1', nogen
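// Note (a guess at the culprit): `A1' is referenced in the merge above but
// is only declared and saved two lines below, so at merge time the local is
// empty and -merge- sees an invalid file specification; saving `A1' before
// the merge would avoid this.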
tempfile A1
save `A1', replace
npregress kernel regression using panel data
Hello, I am exploring the npregress kernel command for panel data (country-year). I was reading a working paper titled "Panel Data Specifications in Nonparametric Kernel Regression: An Application to Production Functions" by Tomasz Czekaj and Arne Henningsen. They mention that we can use the time variable and the individual identifier as additional (categorical) explanatory variables. I have a panel of 90 countries over 24 years. I tried it: I created a variable id by country and n1 by year.
Command looks like: npregress kernel x y z i.id i.n1, reps(10) seed(8)
I get the results for time variable but the country id's are not there in the regression output (Effect). Will this approach work?
Best practice when using 'format' while maintaining precision of double
I added set type double at the beginning of my code to implement the double-precision for all newly created variables. This works fine.
However, I also want to display values of my newly created variables with two decimal digits (e.g. 0.34, 12.01, 9.30). I naively thought that enabling double precision throughout and then running format new_variable %12.2fc would do the trick. Clearly it doesn't, as the format command causes precision issues.
How can I then display newly created variables with two decimal digits while minimizing any loss in precision?
Thank you.
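For what it's worth, format affects only how values are displayed, never what is stored, so no precision is lost. A small sketch using the auto data:
Code:
set type double
sysuse auto, clear
gen ratio = price / weight
format ratio %12.2fc      // two decimals on display only
list ratio in 1           // shows the rounded display
display %20.0g ratio[1]   // the stored double keeps full precision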
Coefs. not Signif. but Improving Moment Conditions | xtdpdgmm
Hi,
I'm currently modeling the public-private wage gap in Brazil. The work revolves around estimating Mincer earning equations for all Brazilian States for a set of years, based on public surveys that we have here, of course controlling (through a dummy) if the person is a public employee or not.
With the average pay gap estimated by the first set of equations, I am looking at how this pay gap behaves along with other macro variables. After literature research and testing some models, I arrived at the following model:
xtdpdgmm l(0/3).gap pibg l(0/2).desemp, collapse model(diff) gmm(gap, l(2 4)) gmm(pibg, l(0 3)) gmm(desemp, l(0 6)) teffects two vce(robust)
Reporting only the coefficients and p-values in []
Fitting full model:
Step 1 f(b) = .00116484
Step 2 f(b) = .13189159
Group variable: uf Number of obs = 297
Time variable: year Number of groups = 27
Moment conditions: linear = 25
Obs per group: min = 11
nonlinear = 0 avg = 11
total = 25 max = 11
(Std. Err. adjusted for 27 clusters in uf)
------------------------------------------------------------------------------
| WC-Robust
gap | Coef. [P>|z|]
-------------+----------------------------------------------------------------
gap |
L1. | .4221597 [0.002]
L2. | .0809683 [0.525]
L3. | .1240309 [0.187]
|
pibg | .227904 [0.018]
|
desemp |
--. | 1.070113 [0.212]
L1. | -.0882452 [0.922]
L2. | 2.114248 [0.071]
|
year |
2006 | .0397795 [0.170]
2007 | .0089088 [0.671]
2008 | .0144458 [0.608]
2009 | .0381506 [0.119]
2010 | .0383212 [0.143]
2011 | .0269762 [0.289]
2012 | .0734528 [0.026]
2013 | .0696775 [0.010]
2014 | .1018182 [0.006]
2015 | .0931151 [0.032]
|
_cons | -.1425435 [0.551]
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
1, model(diff):
L2.gap L3.gap L4.gap
2, model(diff):
pibg L1.pibg L2.pibg L3.pibg
3, model(diff):
desemp L1.desemp L2.desemp L3.desemp L4.desemp L5.desemp L6.desemp
4, model(level):
2006bn.year 2007.year 2008.year 2009.year 2010.year 2011.year 2012.year
2013.year 2014.year 2015.year
5, model(level):
_cons
Sargan-Hansen test of the overidentifying restrictions
H0: overidentifying restrictions are valid
2-step moment functions, 2-step weighting matrix chi2(7) = 3.5611
Prob > chi2 = 0.8287
2-step moment functions, 3-step weighting matrix chi2(7) = 3.8972
Prob > chi2 = 0.7915
Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1: z = -3.9248 Prob > |z| = 0.0001
H0: no autocorrelation of order 2: z = 1.3064 Prob > |z| = 0.1914
H0: no autocorrelation of order 3: z = 0.7208 Prob > |z| = 0.4711
H0: no autocorrelation of order 4: z = 0.6962 Prob > |z| = 0.4863
H0: no autocorrelation of order 5: z = -1.1282 Prob > |z| = 0.2592
Here 'gap' is the pay gap between public and private employees, 'pibg' is the growth rate of real GDP, and 'desemp' is the unemployment level. Other variables and model specifications were tested, and this one seems the most reliable. The coefficients are significant and have the expected sign.
Taking into account the reduced number of instruments, the overidentifying tests, and the Arellano-Bond AR tests, the model seems to be OK.
So here is my concern: the model seems to be well specified only when I include lags 2 and 3 of the dependent variable, and even though they are not significant, if I remove them the fitting tests worsen a lot.
My doubt is: is there any problem with keeping these parameters in the model, even though they are not significant, given that they improve (a lot) the moment conditions?
Thank you!
Working on Nested Loop
My data are stored in panel format. I want to estimate an ARDL model that accommodates one structural break. The break dates are identified using the Bai-Perron test. Suppose I have data for 3 countries (identified by the "ID" variable). The break dates are 2004, 2001, and 2003. So, I wrote the following code:
Code:
forvalues i = 1/3 {
foreach j in 2004 2001 2003 {
use dataex, clear
keep if id==`i'
tab id if id==`i'
*egen trend=_n
gen DU=0
gen DTB=0
di "id =`i'"
di "year =`j'"
tsset year, yearly
// genarating break dummies
replace DU=1 if year>`j'
replace DTB= DU*trend if year>`j'
ardl x y, exo(DU DTB) maxcombs()
ardl x y, exo(DU DTB) ec maxcombs()
estat btest, n(25)
*outreg2 using "ARDL_results1", excel dec(2) br adjr2
}
}
When I ran the above code, it estimated 9 ARDL models. But I want 3 models to be estimated, one per country with its own break. That is to say, when ID==1 it should use the break date 2004; the next ARDL model, for ID==2, should use the break date 2001.
When I run the above command, it is not running successfully. The dataex is the following:
Here is the data
Code:
clear all
input int id year x y
1 1993 1.3 16.32229
1 1994 1.19 16.37688
1 1995 0.87 16.43435
1 1996 0.97 16.49549
1 1997 1.04 16.48168
1 1998 1.04 16.59641
1 1999 1.01 16.64122
1 2000 1.05 16.7197
1 2001 1 16.76099
1 2002 0.98 16.78788
1 2003 0.95 16.87725
1 2004 0.84 16.95561
1 2005 0.83 17.00773
1 2006 0.82 17.11077
1 2007 0.92 17.23401
1 2008 0.96 17.25514
1 2009 0.98 17.32504
1 2010 1.03 17.39092
1 2011 1.11 17.45152
1 2012 1.01 17.45475
1 2013 1.04 17.52202
1 2014 0.7 17.61002
1 2015 0.6 17.71079
1 2016 0.67 17.81573
1 2017 0.58 17.92209
2 1993 1.57 15.89891
2 1994 1.53 16.00251
2 1995 1.08 15.85265
2 1996 0.95 16.06596
2 1997 0.95 16.02667
2 1998 0.95 16.09982
2 1999 1.42 16.13595
2 2000 1.16 16.28473
2 2001 0.85 16.23624
2 2002 0.81 16.34799
2 2003 0.76 16.29516
2 2004 0.64 16.41004
2 2005 0.91 16.39299
2 2006 0.83 16.54298
2 2007 0.86 16.59703
2 2008 0.71 16.73281
2 2009 0.71 16.78492
2 2010 0.61 16.92499
2 2011 0.61 17.0229
2 2012 0.54 17.06142
2 2013 0.55 17.11005
2 2014 0.85 17.14586
2 2015 0.84 17.20201
2 2016 0.98 17.29607
2 2017 1.16 17.4032
3 1993 1.42 13.74485
3 1994 1.26 13.79311
3 1995 1.2 13.86607
3 1996 1.19 14.00822
3 1997 1.09 14.0362
3 1998 1.1 14.23988
3 1999 1.09 14.26074
3 2000 1.08 14.22263
3 2001 1.08 14.26661
3 2002 1.01 14.33512
3 2003 0.99 14.40737
3 2004 0.89 14.50444
3 2005 0.87 14.57711
3 2006 0.82 14.67261
3 2007 0.76 14.72656
3 2008 0.87 14.82205
3 2009 0.94 14.91921
3 2010 0.98 15.07526
3 2011 0.87 15.25929
3 2012 1.06 15.09228
3 2013 1.21 14.96508
3 2014 1 15.20471
3 2015 0.96 15.34354
3 2016 0.95 15.46122
3 2017 1.07 15.58947
end
Any suggestions will be greatly appreciated.
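A sketch of one fix for the nine-models problem: replace the inner loop with a lookup that pairs each id with its own break year. The maxcombs() values are left out because they were not shown in the post.
Code:
local breaks 2004 2001 2003
forvalues i = 1/3 {
    local j : word `i' of `breaks'   // break year matched to this id
    use dataex, clear
    keep if id == `i'
    gen trend = _n
    gen DU  = year > `j'
    gen DTB = DU * trend
    tsset year, yearly
    ardl x y, exo(DU DTB)
    ardl x y, exo(DU DTB) ec
    estat btest, n(25)
}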
Combining observations into one
I'm working on data that has repeated observations, because some individuals belong to more than one value of the variable called P803.
The variable id identifies each individual; for example, lines 11 and 12 of the observations belong to the same individual but are registered as two. This has caused me trouble because I need to merge this database with another one, and that cannot be done while id does not take unique values (individuals are registered as two observations).
What I've tried is to create as many dummies as there are possible options of P803; however, I still need to find a way to combine the observations so each individual appears as one unique observation. I will try to illustrate with a table.
For example, for lines 11 and 12 I have:
| id | P803 |
| 201704007065005004064110101014402 | 7 |
| 201704007065005004064110101014402 | 8 |
And I need:
| id | P803_7 | P803_8 |
| 201704007065005004064110101014402 | 1 | 1 |
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str33 id byte P803
"201712007060005001002110101014401" 18
"201712007060005001023110101014401" 18
"201712007060005001023110101014402" 18
"201712007060005001035110101014401" 18
"201712007060005001046110101014402" 18
"201704007061005002009110101014401" 1
"201704007061005002071110101014403" 1
"201711007063005003001110101014402" 12
"201711007063005003053110101014401" 7
"201711007063005003070110101014401" 18
"201711007063005003140110101014402" 7
"201711007063005003140110101014402" 8
"201704007065005004025110101014401" 18
"201704007065005004064110101014402" 7
"201704007065005004064110101014402" 8
"201704007065005004160110101014401" 8
"201704007065005004160110101014402" 8
"201706007066005005042110101014402" 20
"201706007066005005114110101014401" 16
"201706007066005005114110101014402" 20
"201708007068005006016110101014402" 7
"201708007068005006016110101014402" 18
"201708007068005006046110101014401" 10
"201703007070005007007110101014403" 1
"201703007070005007036110101014401" 1
"201707007073005008004110101014402" 10
"201707007073005008004110101014402" 18
"201707007073005008058110101014401" 1
"201707007073005008058110101014403" 7
"201707007073005008058110101014404" 1
end
label values P803 P803
label def P803 1 "Clubes y asociaciones deportivas", modify
label def P803 7 "Asociación profesional", modify
label def P803 8 "Asociación de trabajadores o sindicato", modify
label def P803 10 "Asociación de padres de familia (APAFA)", modify
label def P803 12 "Comedor popular", modify
label def P803 16 "Comunidad campesina", modify
label def P803 18 "Otro/a", modify
label def P803 20 "Participación en la preparación de desayuno y/o almuerzo escolar", modify
This happens throughout the whole dataset, not only for the observations in the example, so I have to do it for every observation with the same issue.
I really appreciate any response and possible solution in advance.
I am using Stata 13/MP/64.
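A sketch of one way to get from the long layout to one row per id, assuming the dataset holds only id and P803 as in the excerpt; the zero-fill at the end is an assumption about how absent memberships should be coded.
Code:
duplicates drop                      // guard against identical repeated rows
gen byte member = 1
reshape wide member, i(id) j(P803)
rename member* P803_*
foreach v of varlist P803_* {
    replace `v' = 0 if missing(`v')  // absent membership coded as 0
}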
Calculating Standard Deviation with Rangestat
Hey all,
I have the following problem. I would like to calculate the rolling standard deviation of the past 36 months of returns in panel data. I tried the following code:
rangestat (sd) ret, interval(datem -36 -2) by(permno)
However, when looking at the output that Stata generated, I can see that the standard deviation is calculated even when the number of observations is as low as two.
I would like to ask for your recommendation on how to tell Stata to generate the standard deviation only when enough observations are available.
Thank you in advance!
Nick Cox
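A sketch: ask rangestat for the window count as well, then blank out the standard deviation when too few observations contribute. The minimum of 12 months is an arbitrary choice.
Code:
rangestat (sd) ret (count) ret, interval(datem -36 -2) by(permno)
replace ret_sd = . if ret_count < 12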
grc1leg; how to control legend text and marker size
Dear listers
I would like to use grc1leg to produce a figure housing multiple plots but with only one legend; the grc1leg (SSC) package can do that. I use the code:
Code:
grc1leg "g1.gph" "g2.gph" "g3.gph" "g4.gph", row(2)
Now the graphs combine fine, and there is only one set of legends, but how do I control the size of the legend? If I put
Code:
legend(size(small))
after
Code:
row(2)
I get an error saying graph not found.
I can manually edit the graph in the Graph Editor, but I would like to code it into my do file.
Thank you
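One approach that may help: grc1leg takes its single legend from one of the component graphs (the legendfrom() option), so setting the legend size when that component is created should carry over. A sketch with a hypothetical first component:
Code:
sysuse auto, clear
twoway scatter mpg weight, legend(size(small)) saving(g1, replace)
* ... build g2.gph-g4.gph similarly ...
grc1leg "g1.gph" "g2.gph" "g3.gph" "g4.gph", row(2) legendfrom("g1.gph")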
Legend for graphs with marker symbol invisible
Hello,
I am graphing a twoway dot, with two variables.
I can't seem to find a way to add a legend to the graph when the marker symbol is invisible and the marker labels are the values of the variables: since there is no actual marker symbol to display, the legend keys show no color, even though the marker labels have different colors for each variable.
I would appreciate some help with this,
Thank you
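A sketch of one common workaround: overlay extra plots that draw nothing (an always-false if) purely to supply colored legend keys. Variable names y1, y2, and x are hypothetical.
Code:
twoway (scatter y1 x, msymbol(i) mlabel(y1) mlabcolor(navy))   ///
       (scatter y2 x, msymbol(i) mlabel(y2) mlabcolor(maroon)) ///
       (scatter y1 x if 0, mcolor(navy))                       ///
       (scatter y2 x if 0, mcolor(maroon)),                    ///
       legend(order(3 "Series 1" 4 "Series 2"))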
Quadratic assignment procedure: syntax for estimation command
I am running a qap command (from SSC), using one half (upper triangle) of a square matrix with the structure N*N*T. N= 2,832 and T=10, hence I have ~40 million observations. I am using Stata/IC 14.2.
The command is working as long as the syntax looks like this:
Code:
qap _qap rowvar colvar dependent_variable, cmd(reg dependent_variable explanatory_variable) stats(_b[explanatory_variable]) timevar(Year) count
As soon as I add more than one explanatory variable (controls) to the cmd(reg dependent_variable explanatory_variable) portion of the command, I get an "invalid syntax" error.
Any help on how to use qap with a more complex estimation command (cmd) would be greatly appreciated! Thank you
Merge if
Hello,
Hi, I want to use merge, but the problem is that my using data set is an already appended data set, so it has a few duplicates in my identification variable called ident.
Nevertheless, I only need to merge a subgroup of my using data set, for which gen==1, and there are no duplicates for ident in this subgroup.
Is there a way then to only merge to my master data the observations of my using data for which gen==1 ? (other than to first use
Code:
keep if gen==1
on my using data, as I would like to keep it unchanged for the moment; I could use a copy of it, but I was wondering if there is a cleaner way to do this).
Thanks a lot for your help!
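A sketch of one clean route: load only the gen==1 subset of the using file into a tempfile (leaving the file on disk untouched), then merge against that. The file name "usingdata.dta" is a stand-in, and the key follows the post.
Code:
preserve
use if gen == 1 using "usingdata.dta", clear
tempfile sub
save `sub'
restore
merge 1:1 ident using `sub'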
Panel Data, one year dummy variable omitted because of collinearity
Hi everybody,
I'm using panel data to examine the effect of CEO age on the tone and readability of annual reports over 5 years. I have used fixed effects and random effects. I include year dummies; however, Stata omits the 2013 year dummy because of collinearity, which makes the total number of observations low. I am using the command:
local RHS1 "ceoage"
local controls "liquidityratio netincome firmsize PriceToBookValue boardsize firmage"
foreach depvar in tone readability {
    forval i = 1/1 {
        xi: xtreg `depvar' `RHS`i'' i.year, re robust
    }
}
Any help would be hugely appreciated.
Yours Sincerely,
dan billy
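For what it's worth, one year dummy is always dropped as the base category when a constant is included, and that omission does not by itself shrink the estimation sample. A sketch making the base year explicit with factor variables (which also make the xi: prefix unnecessary):
Code:
xtreg tone ceoage ib2013.year, re robust   // 2013 as the omitted base year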
Getting p-values for variables
I am doing a logistic regression using three predictor variables: country, age group, and gender. Is it possible to get the p-value for each of the predictors as a whole? In other words, I want to see if country is a significant predictor rather than just seeing if each country is significant when compared to country 1.
Thanks
Code:
logit stage1 i.country i.ageGroup i.sex
Iteration 0: log likelihood = -526.38644
Iteration 1: log likelihood = -441.62992
Iteration 2: log likelihood = -436.0493
Iteration 3: log likelihood = -435.97828
Iteration 4: log likelihood = -435.97828
Logistic regression Number of obs = 940
LR chi2(7) = 180.82
Prob > chi2 = 0.0000
Log likelihood = -435.97828 Pseudo R2 = 0.1718
------------------------------------------------------------------------------
stage1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
country |
2 | .313061 .2425741 1.29 0.197 -.1623756 .7884975
3 | .2633563 .3269829 0.81 0.421 -.3775184 .9042309
4 | 1.205703 .2562296 4.71 0.000 .7035025 1.707904
5 | .105372 .3575094 0.29 0.768 -.5953335 .8060775
|
ageGroup |
3 | -1.481375 .220824 -6.71 0.000 -1.914182 -1.048568
4 | -2.649173 .2641079 -10.03 0.000 -3.166815 -2.131531
|
1.sex | -.7830366 .1724172 -4.54 0.000 -1.120968 -.4451051
_cons | .5457643 .2142281 2.55 0.011 .1258849 .9656438
------------------------------------------------------------------------------
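A sketch: a joint Wald test of all the country dummies answers whether country as a whole is a significant predictor.
Code:
logit stage1 i.country i.ageGroup i.sex
testparm i.country
An equivalent route is a likelihood-ratio test comparing the model with and without i.country (lrtest).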
Multiple linear regression / Dummy variable, interactions and confounding variables
Hi.
I apologize if this is not a question I should post here. I am trying to analyze interaction and confounding between variables in order to build a multiple linear regression model using Stata 15.
Dependent variable: HDL cholesterol (mg / dL) - quantitative variable (colhdl)
Independent variable: alcohol consumption - qualitative variable / 3 categories (1 “non-drinker”, 2 “moderate drinker” and 3 “risk drinker”).
1) I do not know if the variables should be added to the model as it is analyzed if there are interactions or if they appear to be confounding variables, but what if they are not or if only one category of an analyzed variable is significant?
For example:
- regress colhdl i.drinker -> gender: binary variable (1: male, 2: female)
- regress colhdl i.drinker if gender == 1
- regress colhdl i.drinker if gender == 2
- regress colhdl i.gender ## i.drinker
Stata output (p-values in parentheses):
                         Coef.   Std. Err.     t    P>|t|     [95% Conf. Interval]
female # mod drinker   .7557405   1.212174   0.62  (0.533)   -1.621379   3.13286
female # risk drinker  3.273935   1.678158   1.95  (0.051)   -.0169984   6.564868
_cons                  44.57028    .778798  57.23  (0.000)    43.04303   46.09753
If it is not significant, do I not add this variable to the model? Or should I assess whether it is a confounding variable?
What do I do if only one of the dummies is significant? How do I add it to the model if I have other variables?
Thank you so much.
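A sketch of the usual workflow for judging the interaction as a whole rather than dummy by dummy: a joint test of the interaction terms, followed by adjusted means.
Code:
regress colhdl i.gender##i.drinker
testparm i.gender#i.drinker    // joint test of all interaction terms
margins gender#drinker         // adjusted means per cell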
country/regional fixed effects
Hi, I have a question regarding country fixed effects. I am quite new to this subject, so apologies for the silly question!
I have a model that tries to examine what determines the adoption of a specific technology. My independent variables are FDI and some local variables (patents, exports, DVA, etc.). I have 34 countries over 12 years. I am running OLS and IV with time fixed effects. Is it really necessary to include country fixed effects if I control for GDP or population and regional areas? I divided my countries into economic and geographic regions and run my models with these three dummies. Is that OK, or do I have to include country fixed effects? Thanks
How to export this table to excel from stata
sysuse auto
tab foreign,sum(price)
This code generates a table that I need to export to Excel.
Can anyone help? Please, it's urgent.
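A sketch of one way to export that summary: rebuild it as a dataset with collapse, then write it out. The output file name is arbitrary.
Code:
sysuse auto, clear
collapse (mean) mean_price=price (sd) sd_price=price (count) n_price=price, by(foreign)
export excel using "price_by_foreign.xlsx", firstrow(variables) replace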
Data Categorization txt file
Dear All
I have daily share price data in a txt file.
The issue is to split the variables into different columns. Variables are Date Symbol High Low Close Open TradingVolume.
Importing the txt file into Stata puts all the information in one column.
Please find below the data imported in Stata:
[data screenshot omitted]
Will appreciate if I can get help on splitting the information in different columns.
Best regards
Yahya Ghazali
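A sketch, assuming the whole line landed in a single string variable (called v1 here) and the fields are space-separated:
Code:
split v1, parse(" ") generate(part)
rename (part1 part2 part3 part4 part5 part6 part7) ///
       (Date Symbol High Low Close Open TradingVolume)
destring High Low Close Open TradingVolume, replace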
Odds ratio and Confidence Intervals
Hi all,
I have a three-part question from a logistic regression. In the case of this study, I have three categorical predictors: gender (M/F), age group (2, 3, 4), and country of origin (1,2,3,4,5). Both the logit and odds ratio outputs are below.
Here are my questions:
1.) Which confidence intervals do I report? The ones associated with the coefficient or with the odds ratio?
2.) What happens when CI and p-values don't agree? For example, in the first output, ageGroup 3 has an OR = 0.207-1.2, which crosses the "1" threshold, but has a p=0.005. Crossing the 1 means insignificant, correct? But p<0.05 is significant. Does one trump the other? Or does this mean something is wrong with my data?
3.) This might be answered by #2, but how do I interpret an OR greater than 2? I know an OR of 0.76 can be interpreted as "24% less likely" (correct?), but what about an OR of 2.02?
Thanks
Code:
logit stage4 i.ageGroup i.country i.sex
Iteration 0: log likelihood = -612.81892
Iteration 1: log likelihood = -580.34846
Iteration 2: log likelihood = -579.85354
Iteration 3: log likelihood = -579.85215
Iteration 4: log likelihood = -579.85215
Logistic regression Number of obs = 940
LR chi2(7) = 65.93
Prob > chi2 = 0.0000
Log likelihood = -579.85215 Pseudo R2 = 0.0538
------------------------------------------------------------------------------
stage4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ageGroup |
3 | .7038665 .2534184 2.78 0.005 .2071756 1.200557
4 | 1.47736 .253085 5.84 0.000 .981322 1.973397
|
country |
2 | -.2736965 .208729 -1.31 0.190 -.6827979 .135405
3 | -.2919848 .2617669 -1.12 0.265 -.8050386 .2210689
4 | -.1398201 .2613874 -0.53 0.593 -.6521299 .3724897
5 | -.7572049 .354703 -2.13 0.033 -1.45241 -.0619997
|
1.sex | -.3730072 .1425399 -2.62 0.009 -.6523802 -.0936342
_cons | -1.228277 .24262 -5.06 0.000 -1.703803 -.7527503
------------------------------------------------------------------------------
Code:
logit stage4 i.ageGroup i.country i.sex, or
Iteration 0: log likelihood = -612.81892
Iteration 1: log likelihood = -580.34846
Iteration 2: log likelihood = -579.85354
Iteration 3: log likelihood = -579.85215
Iteration 4: log likelihood = -579.85215
Logistic regression Number of obs = 940
LR chi2(7) = 65.93
Prob > chi2 = 0.0000
Log likelihood = -579.85215 Pseudo R2 = 0.0538
------------------------------------------------------------------------------
stage4 | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ageGroup |
3 | 2.021554 .512299 2.78 0.005 1.230199 3.321968
4 | 4.381361 1.108857 5.84 0.000 2.667981 7.195077
|
country |
2 | .7605629 .1587516 -1.31 0.190 .5052015 1.145
3 | .7467799 .1954823 -1.12 0.265 .4470707 1.247409
4 | .8695146 .2272801 -0.53 0.593 .520935 1.451344
5 | .4689754 .166347 -2.13 0.033 .2340056 .9398832
|
1.sex | .6886603 .0981616 -2.62 0.009 .5208047 .9106158
_cons | .2927967 .0710383 -5.06 0.000 .1819901 .4710692
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
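On question 1, for what it's worth: the two intervals are the same statement on different scales, since the odds-ratio interval is the exponentiated coefficient interval. A quick check against the ageGroup 3 row above:
Code:
display exp(.2071756), exp(1.200557)   // reproduces 1.230199 and 3.321968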