BJ Data Tech Solutions specializes in data processing, data management implementation plans, data collection tools (electronic and paper-based), data cleaning specifications, data extraction, transformation and loading, analytical datasets, and data analysis. We teach the design and development of electronic data collection tools using CSPro and Stata commands for data manipulation, and the setup of data management systems using modern data technologies such as relational databases, C#, PHP, and Android.
Sunday, February 28, 2021
Time to event graph
Hi,
I am really appreciative of all the help that is provided here. Thank you very much. I am trying to build a bar graph with a superimposed line.
The x axis is the time in days from admission to the procedure.
The y axis is the number of procedures.
The dataset is as follows:
patient_id timetoprocedure
1 2
2 5
3 2
4 1
5 6
6 8
7 11
8 3
9 4
10 4
11 3
12 5
13 1
14 2
15 3
I will really appreciate the answer.
Kind regards, WQ
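A minimal sketch of one approach, assuming the posted variables patient_id and timetoprocedure are in memory: collapse the data to daily counts with contract, then overlay a line on the bars.
Code:
* count procedures per day-to-procedure value, then draw bars with a line on top
contract timetoprocedure, freq(n_procedures)
twoway (bar n_procedures timetoprocedure) ///
       (line n_procedures timetoprocedure, sort), ///
       xtitle("Days from admission to procedure") ///
       ytitle("Number of procedures") legend(off)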
append error (r198)
Dear Madam/Sir,
I am a beginner with Stata. I would highly appreciate it if you could advise me how to fix this error, which is potentially related to "outreg2".
regress ln_audit gafscore mascore big4 cspec ln_nonaudit icw restatement gc auchange merger financing yearend abaccrual ln_at mb leverage roa loss fsalepro SQ_SEGS ar_in special_item ln_tenure i.cyear i.sic2, robust cluster(gvkey) outreg2 using "C:\Users\hakjoon\Documents.out", append bdec(3) tstat excel bracket
invalid 'append'
r(198);
Thank you
Joon1
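One thing worth checking, offered as a guess since the original submission is not fully visible here: outreg2 is normally issued as its own command after the estimation command, not on the same line as regress. A minimal sketch (the covariate list is abbreviated in a comment; the output path is the one from the post):
Code:
* run the regression first (full covariate list as in the post)
regress ln_audit gafscore mascore /* ...remaining covariates... */ i.cyear i.sic2, robust cluster(gvkey)
* then call outreg2 as a separate command on the stored results
outreg2 using "C:\Users\hakjoon\Documents.out", append bdec(3) tstat excel bracket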
synth_runner package
Hello,
I am working on a comparative case-study using the synthetic control method (traditionally implemented by synth package).
However, I find the synth_runner package handier and easier to implement than synth.
But, when I use synth_runner, the reference line for the policy/treatment implementation is plotted one period (say year) prior to the actual assigned date.
For instance, see the following codes:
. ssc install synth, all
. cap ado uninstall synth_runner //in-case already installed
. net install synth_runner, from(https://raw.github.com/bquistorff/synth_runner/master/) replace
. sysuse synth_smoking, clear
. tsset state year
1. The following code uses synth (the traditional package) and places the reference line at the actual treatment period. This makes sense to me because the reference period ought to be the actual treatment period.
. synth cigsale beer lnincome(1980&1985) retprice cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) fig
[attached graph]
2. The following code uses synth_runner; the treatment period is 1989, but the reference line (the vertical red solid line in the second figure) is at 1988, not 1989.
. synth_runner cigsale beer(1984(1)1988) lnincome(1972(1)1988) retprice age15to24 cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) gen_vars
. single_treatment_graphs, trlinediff(-1) raw_gname(cigsale1_raw) effects_gname(cigsale1_effects) effects_ylabels(-30(10)30) effects_ymax(35) effects_ymin(-35)
. effect_graphs , trlinediff(-1) effect_gname(cigsale1_effect) tc_gname(cigsale1_tc)
[attached graph]
Now, my question is, after using the synth_runner, is it appropriate to manually change the reference line to reflect the actual treatment date?
Would that alter the treatment effect?
Thank you.
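Not a definitive answer, but since the posted commands already shift the line with trlinediff(-1), one thing to try is trlinediff(0) so that the line is drawn at the assigned treatment period:
Code:
* sketch: redraw the graphs without shifting the treatment line back one period
single_treatment_graphs, trlinediff(0) raw_gname(cigsale1_raw) effects_gname(cigsale1_effects)
effect_graphs, trlinediff(0) effect_gname(cigsale1_effect) tc_gname(cigsale1_tc)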
Convert export values to USD
Hello,
I have a dataset with trade data from 1988-2017 that looks like the first code block below.
The issue is the column export_CAD: I want to convert it to USD. I have a separate Excel file with a column of annual average exchange rates for 1988-2017, similar to the table further below. I need to multiply each year's export value by that year's exchange rate. Is there a way to do this by bringing my Excel exchange-rate data into the trade dataset that already exists in Stata?
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float year str43 hs06_name str28 country double export_CAD 2014 "Coconut, abaca, ramie/veg tex fib" "United States" 7613 2015 "Coconut, abaca, ramie/veg tex fib" "United States" 22107 2016 "Coconut, abaca, ramie/veg tex fib" "United States" 123544 2017 "Coconut, abaca, ramie/veg tex fib" "United States" 8401 2002 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 7500 2003 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 7500 2004 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 23250 2005 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 4000 2006 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 21085 2007 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 27562 2008 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 160485 2009 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 37155 2010 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 60613 2011 "Horses,asses,mules,hinnies,pure-bred" "United Kingdom" 66815 1988 "Horses, pure-bred breeding" "United Kingdom" 478979 end
year | avg. exchange rate |
1988 | 0.81 |
1989 | 0.84 |
1990 | 0.85 |
1991 | 0.87 |
1992 | 0.82 |
1993 | 0.77 |
1994 | 0.73 |
1995 | 0.72 |
Thanks
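A minimal sketch of one way to do the conversion, assuming the Excel file has columns named year and avg on its first sheet; the file and dataset names here are placeholders:
Code:
* import the exchange rates and save them as a temporary Stata file
import excel using "exchange_rates.xlsx", firstrow clear
rename avg exch_rate
tempfile rates
save `rates'

* return to the trade data and attach the matching year's rate
use tradedata, clear
merge m:1 year using `rates', keep(master match) nogenerate
gen export_USD = export_CAD * exch_rate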
description of tsappend1 data set
Hello, does anybody have a quick description of the tsappend1 data set? Even if it is fictitious, I guess it has descriptions of the variables.
thanks
How to analyze Panel Survey Data (Likert scale questions)
Hi all,
I would like to analyze two-period survey data with attrition, where both the dependent and independent variables are Likert-scale questions. However, I have struggled to find answers to all my questions.
- Should I dichotomize the Likert-scale variables, i.e. go from a strongly agree/agree/neither agree nor disagree/disagree/strongly disagree grading to a not agree/agree grading? It seems that keeping the variables as they are would better identify changes in attitude (e.g. from not agree to neither agree nor disagree) that would not be captured by dichotomizing. However, I've talked to a couple of economists who support dichotomizing variables (I forgot why!).
- How should I generate composite indicators? Based on what I read, there seem to be two schools of thought: the "naïve" approach, i.e. just averaging the scores; and using exploratory factor analysis to identify factors and use them instead of the variables. However, I've been exploring confirmatory factor analysis to test whether variables measuring different aspects of a concept share common variance (i.e. to test the validity of the theory).
- Given the above, I was originally looking at a Wald difference-in-differences approach, using indicators for both the dependent and independent variables. However, I'm not sure whether I should instead be using a Wald ordinary logit/probit difference-in-differences method. If so, I'm not sure whether such an approach exists and how it would be implemented.
Thank you for your help
Lexis-diagram
I have wondered why there isn't any good Lexis-plot command in Stata. Most of the existing ones are very old and do not use newer commands such as stsplit. evhistplot, grlexis2, and stlexis are quite old and lack many features. Clayton & Hills's stlexis is interesting, but it is for Stata 5! Does anybody know of a good ado-file for Lexis diagrams? And why isn't there one among the official st commands?
I was just wondering...
lincom calculation for a subset of data
Hello,
I'm trying to calculate the difference between certain Times on the basis of a mixed-effects linear regression model.
The model output looks like the following:
[attached output]
The consecutive margins output is then giving me this:
[attached output]
Well, since we clearly know that there aren't any data for Saentis == 0 and Time == 4, the linear combination (lincom 0#15 - 0#4) should result in an error.
But instead the output is unclear to me:
[attached output]
For your help, I am very grateful.
Best regards, Simon
Function to give year of start
Dear all,
I have a dataset with the US census for the 1970-1999 period combined with a dataset with information about different public policies concerning the labor market. I have a cross-section of several million individuals, with a variable indicating the state of birth. I also have a series of dummy variables indicating if a policy was active at each of the available years. For example, if in Nevada the policy called G started in 1980 and was active for the rest of the period, for individuals born in Nevada, the dummy variables g_1970birth-g_1979birth will be 0, while the variables g_1980birth-g_1999birth will be 1.
I want to create a new variable, call it year_g, which will give me the year policy G started in the state of birth of each individual (year_g=1980 for the previous example). For that purpose, I have written the line shown in the code block below.
The idea is that since it requires the previous year to be 0, this can only match the starting year. However, I'm obtaining year_g=1999 for all the observations, and I can't seem to figure out what is wrong with the code. I'd be very grateful if someone could help me figure it out.
Code:
forval w=1970(1)1999{
    replace year_g=`w' if g_`w'birth==1 & g_`w-1'birth==0
}
Thank you very much for your time.
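One possible culprit, offered as a guess: inside a macro reference, `w-1' is not evaluated as arithmetic, so g_`w-1'birth does not point to the previous year's dummy. A sketch of a workaround that computes the previous year in its own local first (1970 is handled separately because there is no 1969 dummy):
Code:
gen year_g = .
forval w = 1971/1999 {
    local v = `w' - 1
    replace year_g = `w' if g_`w'birth == 1 & g_`v'birth == 0
}
* policies already active in the first observed year
replace year_g = 1970 if g_1970birth == 1 & missing(year_g)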
Percentile based on age and year using loop
Hi all,
I have a dataset that includes total income, an age variable ranging from 25 to 54, year from 1982 to 2018, and an immigrant dummy variable. I want to generate two dummy variables for those who are in the top 1% and top 10% of the income distribution by year and age, and then plot the share of immigrants in the top 1% and top 10% over the years. So, basically, I generated the percentiles for each age group and each year as follows (code block below):
This code covers only two years; doing this for every year requires far too many lines and also creates too many variables. I am just wondering whether there is a neater syntax for creating the top 1% and top 10% dummy variables. Any guidance on this is much appreciated.
Code:
gen ptile_inc=.
forvalues a = 25/54 {
    xtile p`a' = totinc if age==`a' & year==1982 [aw=weight], nq(100)
    replace ptile_inc=p`a' if age==`a' & year==1982
}
gen ptile_inc1=.
forvalues a = 25/54 {
    xtile p1_`a' = totinc if age==`a' & year==1983 [aw=weight], nq(100)
    replace ptile_inc1=p1_`a' if age==`a' & year==1983
}
gen top1_1982=ptile_inc==100
gen top1_1983=ptile_inc1==100
gen top10_1982=ptile_inc>90
gen top10_1983=ptile_inc1>90
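A sketch of one way to cover all years with a single percentile variable, assuming totinc, age, year, and weight are as described; it still loops, but it creates no per-year variables (add -capture- around xtile if some age-year cells are empty):
Code:
gen ptile_inc = .
forvalues y = 1982/2018 {
    forvalues a = 25/54 {
        xtile _p = totinc if age==`a' & year==`y' [aw=weight], nq(100)
        replace ptile_inc = _p if age==`a' & year==`y'
        drop _p
    }
}
gen top1  = ptile_inc == 100 if !missing(ptile_inc)
gen top10 = ptile_inc  >  90 if !missing(ptile_inc)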
Calculating proportional effects with a GLM model
Dear Statalist Members
When comparing the output of margins, eydx() for semielasticities (proportional effects) from a GLM with a log link and from one with an identity link, I found that they are very close. The estimates could of course differ: with the identity link no transformation is done, while with the log link the linear index is exponentiated. The output below shows the estimates, by class origin, of the white group's advantage in children's income for Brazil.
“GLM with a log link models the logarithm of the expected value of Y, conditional on X,” as Partha Deb and Edward C. Norton explain. I make this comparison to assess whether the semielasticity estimates (margins, eydx) made from the results of a GLM with a log link would be distorted. It seems not. If margins with eydx had strictly calculated the logarithm from estimates already on a logarithmic scale, the compression of the differences would be very large (just take the logarithm of a value that is already a logarithm to see the degree of compression of its average). In the estimates presented no such compression occurred. It appears that margins identifies the situation and uses the predicted average income retransformed to the original metric, as shown by the “Expression” line in the output for both models: Predicted mean income, predict().
Note that a similar procedure cannot be done with OLS models with a logged dependent variable. “OLS regression with a log-transformed dependent variable models the expected value of the logarithm of Y conditional on X,” as Partha Deb and Edward C. Norton explain. Since the dependent variable is already in logs, there will be strong compression of the estimated value.
Code:
GLM model with family(gamma) link(log)
Average marginal effects Number of obs = 30414
Model VCE : OIM
Expression : Predicted mean income, predict()
ey/dx w.r.t. : 1.white
------------------------------------------------------------------------------
| Delta-method
| ey/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1. white |
class |
social top | .535371 .0523935 10.22 0.000 .4326816 .6380604
skilled | .5341033 .0568641 9.39 0.000 .4226518 .6455549
small assets| .4788623 .0274684 17.43 0.000 .4250253 .5326994
worker | .3075652 .0313738 9.80 0.000 .2460737 .3690567
destitute | .353975 .025128 14.09 0.000 .3047249 .403225
------------------------------------------------------------------------------
GLM model with family(gamma) link(identity)
Average marginal effects Number of obs = 30414
Model VCE : OIM
Expression : Predicted mean income, predict()
ey/dx w.r.t. : 1. white
------------------------------------------------------------------------------
| Delta-method
| ey/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1. white |
class |
social top | .567763 .0560234 10.13 0.000 .4579592 .6775669
skilled | .5688268 .0608277 9.35 0.000 .4496066 .688047
small assets| .4061208 .024772 16.39 0.000 .3575686 .4546731
worker | .3342984 .0340144 9.83 0.000 .2676314 .4009654
destitute | .3144392 .0246612 12.75 0.000 .2661042 .3627742
A comment is welcome,
José Alcides
Invalid IVs with Fixed Effects XTIVREG2
Hi,
I am currently writing my thesis, using panel data to assess the relationship between financial development and economic growth. I want to control for alleged endogeneity in the financial development indicators, so I have used xtivreg2, but the Hansen J statistic is significant, which suggests that the instruments are invalid. What would you suggest I do now?
This is the code I used:
xtivreg2 loggdppercap GrosscapitalformationofGD Schoolenrollmentsecondary (StockmarketcapitalizationtoG LiquidliabilitiestoGDPG = l.StockmarketcapitalizationtoG l2.StockmarketcapitalizationtoG l.LiquidliabilitiestoGDPG l2.LiquidliabilitiestoGDPG ) , fe cluster(c_id) endog( LiquidliabilitiestoGDPG StockmarketcapitalizationtoG)
Results:
------------------------
Number of groups = 89 Obs per group: min = 3
avg = 18.1
max = 28
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on c_id
Number of clusters (c_id) = 89 Number of obs = 1610
F( 4, 88) = 16.32
Prob > F = 0.0000
Total (centered) SS = 51.85112797 Centered R2 = 0.3885
Total (uncentered) SS = 51.85112797 Uncentered R2 = 0.3885
Residual SS = 31.70731729 Root MSE = .1444
----------------------------------------------------------------------------------------------
| Robust
loggdppercap | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
StockmarketcapitalizationtoG | .000945 .0005407 1.75 0.080 -.0001147 .0020047
LiquidliabilitiestoGDPG | .000902 .0005874 1.54 0.125 -.0002492 .0020533
GrosscapitalformationofGD | .0047421 .0028177 1.68 0.092 -.0007804 .0102646
Schoolenrollmentsecondary | .0092139 .0014798 6.23 0.000 .0063135 .0121143
----------------------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 18.271
Chi-sq(3) P-val = 0.0004
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 1543.425
(Kleibergen-Paap rk Wald F statistic): 2035.606
Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 11.04
10% maximal IV relative bias 7.56
20% maximal IV relative bias 5.57
30% maximal IV relative bias 4.73
10% maximal IV size 16.87
15% maximal IV size 9.93
20% maximal IV size 7.54
25% maximal IV size 6.28
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 9.791
Chi-sq(2) P-val = 0.0075
-endog- option:
Endogeneity test of endogenous regressors: 12.028
Chi-sq(2) P-val = 0.0024
Regressors tested: LiquidliabilitiestoGDPG StockmarketcapitalizationtoG
------------------------------------------------------------------------------
Instrumented: StockmarketcapitalizationtoG LiquidliabilitiestoGDPG
Included instruments: GrosscapitalformationofGD Schoolenrollmentsecondary
Excluded instruments: L.StockmarketcapitalizationtoG
L2.StockmarketcapitalizationtoG L.LiquidliabilitiestoGDPG
L2.LiquidliabilitiestoGDPG
Any help is much appreciated!
Set https proxy
Hi Statalisters,
I have some problems importing data directly from the internet.
I am using Stata on my company laptop, and I need to set up a proxy to get through the company firewall. Therefore, I set the http proxy via netio (first code block below).
The proxy is correctly set. Nonetheless, I have problems importing data from https URLs.
For instance, the import from an http URL in the second code block works fine, whereas the https imports in the third block do not and return the error shown there.
I guess I also need to set a proxy for https websites.
Do you know how to set an https proxy, or a workaround to solve this issue? I looked for a solution in help netio but could not find anything.
My version of Stata is shown in the last code block.
Thanks in advance for your help!
Code:
set httpproxyhost my_proxy_host
set httpproxyport my_port
set httpproxyuser my_username
set httpproxypw my_password
set httpproxy on
set httpproxyauth on
Code:
import delimited http://www.stata.com/examples/auto.csv, clear
Code:
import delimited https://covid.ourworldindata.org/data/owid-covid-data.csv, clear
import delimited https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv, clear
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
could not open url
r(603)
Code:
. about
Stata/MP 16.0 for Windows (64-bit x86-64)
Revision 16 Oct 2019
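Not a netio setting, but one possible workaround while the https proxy issue is unresolved: fetch the file with an external tool that honors the proxy (curl here, assuming it is installed and that my_proxy_host and my_port are the same placeholder values as above), then import the local copy.
Code:
shell curl -x http://my_proxy_host:my_port -o owid-covid-data.csv "https://covid.ourworldindata.org/data/owid-covid-data.csv"
import delimited owid-covid-data.csv, clear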
Sorting the order of bar graph values while using by
Hello!
I want to order the results of a bar graph in descending order. I have two facets/panels as I want to disaggregate results by gender, male and female. But I want to order both panels/bars in descending order based on the order of how women responded. I am using this code:
graph hbar Q101A , over( Q102 ) by( Q1002 )
to get the attached graph, but if I use the descending option the two panels are ordered differently. Any tips? Thanks!
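A sketch of one way to force both panels into the same order, under the assumption that a specific value of Q1002 (2 is used here purely as a placeholder) identifies women: compute the female mean per Q102 category and sort both panels on it.
Code:
* order both panels by the female group's mean response (the value 2 is assumed to mark women)
egen order_f = mean(cond(Q1002 == 2, Q101A, .)), by(Q102)
graph hbar Q101A, over(Q102, sort(order_f) descending) by(Q1002)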
Sequential Treatment Effects
Hi,
I have been struggling with a model related to sequential treatment effects and desperately need help.
I would greatly appreciate it if you could guide me to resources or advise me on this matter.
In some real-world situations, treatment affects the outcome, and the outcome affects treatment in the next period.
For example, consider the effect of constructing post boxes in a municipality on the volume of mail, where the increase in mail then leads to more post box construction.
It could be interpreted as "reverse causality", but, in the sense that the outcome does not affect treatment in the past, it follows a sequential structure, I guess (the arrows below indicate causal relationships).
T0 -> Y0 -> T1 -> Y1 -> T2 -> Y2 -> T3 -> Y3.....
To be specific:
There are several periods t = 1,2,3,...,n; T is a treatment/intervention variable (dummy) and Y is the outcome (continuous; it can be converted into a dummy if needed).
Code:
Y_it = a + B*T_it + u_it
T_it = c + b*T_it-1 + e_it-1
Can this model be appropriately estimated simply by using lag terms?
For causal inference, what would be the best way to estimate this kind of mechanism?
Please kindly give your advice on this issue. Thank you in advance.
Stata - statistical graphing using "coefplot" by presenting odd ratio with 95%CI
Dear colleagues, I wish this message find you well.
I am wondering how to use "coefplot" to present odds ratios with 95% CIs.
If "coefplot" does not work for this, is there any other solution?
I don't have raw data; I only want to plot the odds ratios with 95% CIs from the simple data below.
Looking forward to hearing from you.
Thank you very much.
Best regards, Jiancong
Name | Lower | Upper | Odds
G-LIS | 1.0 | 2.3 | 1.5
Swedish BAM | 2.0 | 7.0 | 3.7
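Since there are no stored estimation results for coefplot to read, one alternative is to enter the summary numbers directly and draw the plot with twoway rcap and scatter; a minimal sketch using the two rows from the table above:
Code:
clear
input str20 name double(lower upper odds)
"G-LIS"       1.0 2.3 1.5
"Swedish BAM" 2.0 7.0 3.7
end
gen row = _n
twoway (rcap lower upper row, horizontal) ///
       (scatter row odds, msymbol(D)), ///
       ylabel(1 "G-LIS" 2 "Swedish BAM", angle(0)) ///
       xline(1, lpattern(dash)) xtitle("Odds ratio (95% CI)") ///
       ytitle("") legend(off)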
Durbin Watson d-statistic
Hi, I am running a regression with multiple lagged values of the dependent variable (first code block below). Can one use the Durbin-Watson d-statistic (second block) to check whether serial correlation has been removed from my initial model (third block)?
Code:
reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort CSAD_L1 CSAD_L2 CSAD_L3 CSAD_L4 CSAD_L5 CSAD_L6 CSAD_L7 CSAD_L8
Code:
estat dwatson
Code:
reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
I am concerned that the Durbin-Watson d-statistic can only be used when there is one lag of the dependent variable, from what I have understood online, but I am not sure; could someone clarify this for me, please?
It is also confusing that when running the Durbin-Watson test I get a value closer to 2 (about 2.005) with only 2 lags of the dependent variable, whereas with 8 lags the statistic is around 1.95.
Is this because the Durbin-Watson d-statistic cannot be used for regressions with more than one lag of the dependent variable on the right-hand side?
Thank you.
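Not a direct answer on the d-statistic itself, but two tests that remain valid when lagged dependent variables sit on the right-hand side are Durbin's alternative test and the Breusch-Godfrey test; a sketch, assuming the data have been tsset on a time variable:
Code:
reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort CSAD_L1 CSAD_L2 CSAD_L3 CSAD_L4 ///
    CSAD_L5 CSAD_L6 CSAD_L7 CSAD_L8
estat durbinalt, lags(1/8)
estat bgodfrey, lags(1/8)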
Non-linear fixed effects regression
Hi all!
I am currently working on my University dissertation and would greatly appreciate any help, as I am completely new to Stata and my University doesn't have any resources on it. I am trying to run a regression with several studies, looking at age of participants (x axis) and the concentration of a specific biomarker (y axis). I think the appropriate analysis to conduct in order to achieve a plot would be a non-linear fixed effects regression.
Would anyone please be able to advise me on whether this would be the correct analysis, and if so, whether this is achievable on Stata 15. I have been scouring the internet for help and have struggled to find anything, so any help (either on this forum or private messaged) would be greatly appreciated!
Many thanks
Predict Residuals in SQREG - DO File
Hi there. I am using the following code to predict the residuals after sqreg and would like some guidance on whether this is the right way to do it; I didn't find any example that uses predict with sqreg, so I need help.
Once I predict the residuals, I need to run sqreg again on the residuals. I used the commands in the second block below and get separate output for each; is there any way to get the output in one table?
Code:
set seed 896321
sqreg y rmrf hml smb mom, quantiles(10 25 50 75 95 99) rep(100)
predict y1, equation(q10) residuals
predict y2, equation(q25) residuals
predict y3, equation(q50) residuals
predict y4, equation(q75) residuals
predict y5, equation(q95) residuals
predict y6, equation(q99) residuals
Code:
sqreg y1 rm rm2, quantiles(10) rep(100)
sqreg y2 rm rm2, quantiles(25) rep(100)
sqreg y3 rm rm2, quantiles(50) rep(100)
sqreg y4 rm rm2, quantiles(75) rep(100)
sqreg y5 rm rm2, quantiles(95) rep(100)
sqreg y6 rm rm2, quantiles(99) rep(100)
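On combining the six outputs into one table, a sketch using eststo/esttab; this assumes the community-contributed estout package is installed (ssc install estout) and the output file name is a placeholder:
Code:
eststo clear
local i = 1
foreach q in 10 25 50 75 95 99 {
    eststo q`q': sqreg y`i' rm rm2, quantiles(`q') rep(100)
    local ++i
}
esttab using "sqreg_residuals.csv", se replace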
Merge 1:1 problem
Hi,
I am trying to merge two Excel datasets that have unemployment and education data at the US county level (by FIPS code). Both datasets have exactly the same number of rows (one per county), and importing them into Stata works fine. However, when I try to merge them with "merge 1:1 FIPS using educationdata" it keeps telling me "variable FIPS does not uniquely identify observations in the master data". Does anyone know why this may be happening? I also tried m:1 and 1:m (although I think 1:1 is correct given I am just adding columns), but I get the same problem.
Thank you very much!
Joan
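The message means that at least one FIPS value appears more than once in the dataset in memory; a sketch of how to locate the offending codes before merging:
Code:
duplicates report FIPS
duplicates tag FIPS, gen(dupflag)
sort FIPS
list FIPS if dupflag > 0, sepby(FIPS)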
Neumark or Oaxaca-Ransom
Quick question regarding decomposition methods and the use of omega with the Oaxaca command.
https://core.ac.uk/download/pdf/6442665.pdf page 16 indicates the use of omega with the Oaxaca command gives you the Oaxaca and Ransom's decomposition.
https://www.stata.com/statalist/arch.../msg00585.html indicates the use of omega with the Oaxaca command gives you the Neumark decomposition.
Which one is correct? Sorry to be a pain, just a tad confused!
Loop and Append .dta files
Hello All,
I am trying to import .dta files and append them; what is wrong with this code? For some reason it only uses the first file, from 1995, and does not continue with the subsequent years. Thanks in advance.
clear all
cd "mydir"
save file_all_green, replace emptyok
forvalues i= 1995(1)2018 {
use country_partner_hsproduct6digit_year_`i',clear
keep year export_value location_code partner_code hs_product_code
append using "file_all_green"
save file_all_green,replace
}
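An alternative pattern that sidesteps the empty starting file, sketched under the assumption that every yearly file contains the listed variables: append each file directly into memory and save once at the end.
Code:
clear all
cd "mydir"
forvalues i = 1995/2018 {
    * append each yearly file, keeping only the needed variables
    append using country_partner_hsproduct6digit_year_`i', ///
        keep(year export_value location_code partner_code hs_product_code)
}
save file_all_green, replace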
graph with many lines
Dear All, suppose that I have the data shown in the first code block below, and that the code in the second block produces the attached graph.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long id float(year y) 1 2006 11.695 1 2007 16.758 1 2008 21.342 1 2009 23.993 1 2010 26.932 1 2011 28.151 1 2012 32.635 1 2013 34.133 1 2014 38.974 1 2015 34.534 1 2016 37.355 2 2006 18.837 2 2007 22.279 2 2008 27.338 2 2009 30.184 2 2010 33.188 2 2011 32.255 2 2012 35.877 2 2013 37.679 2 2014 45.389 2 2015 43.858 2 2016 54.987 3 2006 102.782 3 2007 84.356 3 2008 94.765 3 2009 539.106 3 2010 1228.086 3 2011 1290.933 3 2012 2019.757 3 2013 2263.635 3 2014 3393.346 3 2015 3437.425 3 2016 2845.547 4 2006 4.616 4 2007 4.786 4 2008 5.955 4 2009 7.835 4 2010 7.253 4 2011 10.058 4 2012 11.42 4 2013 13.421 4 2014 16.156 4 2015 17.725 4 2016 20.976 5 2006 3.866 5 2007 4.401 5 2008 4.724 5 2009 6.238 5 2010 6.32 5 2011 6.717 5 2012 8.057 5 2013 8.715 5 2014 10.17 5 2015 10.911 5 2016 12.611 6 2006 9.166 6 2007 11.723 6 2008 18.203 6 2009 19.796 6 2010 18.498 6 2011 14.654 6 2012 15.82 6 2013 15.202 6 2014 20.151 6 2015 19.021 6 2016 21.341 7 2006 8.055 7 2007 10.428 7 2008 12.545 7 2009 13.285 7 2010 19.067 7 2011 31.286 7 2012 34.669 7 2013 35.08 7 2014 36.112 7 2015 32.004 7 2016 32.868 8 2006 1.165 8 2007 1.242 8 2008 1.802 8 2009 2.518 8 2010 1.763 8 2011 1.604 8 2012 1.972 8 2013 2.332 8 2014 2.649 8 2015 2.975 8 2016 3.617 9 2006 721.351 9 2007 737.638 9 2008 674.506 9 2009 662.644 9 2010 701.561 9 2011 902.733 9 2012 1037.067 9 2013 1019.113 9 2014 1195.34 9 2015 1187.552 9 2016 1379.233 10 2006 77.254 10 2007 102.233 10 2008 124.216 10 2009 64.794 10 2010 99.241 10 2011 133.793 10 2012 158.205 10 2013 241.44 10 2014 390.768 10 2015 508.226 10 2016 693.224 11 2006 59.295 11 2007 82.017 11 2008 95.538 11 2009 107.414 11 2010 209.93 11 2011 230.368 11 2012 262.935 11 2013 307.755 11 2014 339.11 11 2015 321.27 11 2016 339.71 12 2006 1.574 12 2007 2.089 12 2008 2.502 12 2009 2.875 12 2010 3.528 12 2011 3.928 12 2012 4.923 12 2013 5.177 12 2014 6.048 12 2015 6.223 12 2016 8.335 13 2006 5.164 13 2007 7.853 13 2008 9.425 13 2009 11.877 13 2010 15.165 13 2011 19.709 13 2012 23.011 13 2013 20.953 13 2014 25.282 13 2015 30.153 13 2016 48.198 14 2006 98.383 14 2007 115.322 14 2008 121.286 14 2009 121.988 14 2010 157.046 14 2011 214.113 14 2012 243.034 14 2013 273.245 14 2014 278.919 14 2015 290.18 14 2016 292.24 15 2006 5.456 15 2007 7.543 15 2008 9.091 15 2009 10.93 15 2010 14.164 15 2011 23.195 15 2012 27.525 15 2013 37.062 15 2014 34.344 15 2015 52.64 15 2016 110.116 16 2006 86.239 16 2007 92.724 16 2008 97.854 16 2009 103.557 16 2010 111.093 16 2011 101.673 16 2012 106.843 16 2013 111.016 16 2014 131.714 16 2015 143.14 16 2016 166.044 17 2006 .614 17 2007 .545 17 2008 .629 17 2009 .512 17 2010 .589 17 2011 .689 17 2012 .67 17 2013 .601 17 2014 .698 17 2015 .726 17 2016 .731 end label values id id label def id 1 "AUS", modify label def id 2 "CAN", modify label def id 3 "CHN", modify label def id 4 "DEU", modify label def id 5 "FRA", modify label def id 6 "GBR", modify label def id 7 "IDN", modify label def id 8 "ITA", modify label def id 9 "JPN", modify label def id 10 "KOR", modify label def id 11 "MYS", modify label def id 12 "NLD", modify label def id 13 "PHL", modify label def id 14 "SGP", modify label def id 15 "THA", modify label def id 16 "USA", modify label def id 17 "ZAF", modify
Code:
xtline y, overlay ytitle("y") ylabel(0(1000)4000) legend(position(11) cols(1) ring(0))
I wonder if anyone can suggest a way to improve the quality of graph as above. Thanks.
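Given that y spans roughly 0.5 to over 3,000, one modest improvement to try is a log scale, so the smaller series are not flattened against the axis; a sketch based on the posted command:
Code:
xtline y, overlay yscale(log) ylabel(1 10 100 1000) ///
    ytitle("y (log scale)") legend(position(11) cols(1) ring(0))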
Saturday, February 27, 2021
How to Drop Duplicate ID Observations if There are Multiple Conditions I Want to Apply
Hello Everyone,
I hope that you could help me with the below.
I have a cross-sectional dataset of 343 observations of students' scores in a test. Students are from different schools and grades. However, some students have solved the test multiple times and thus resulting in duplicates.
I have multiple conditions that I would like to tell Stata in order to drop specific duplicates:
1. I would like to drop the duplicate with a missing "Score".
2. If the duplicate does not have any missing scores, I would like to drop the duplicate with the earliest recorded date "StartDate".
A snippet of my data is shown in the first code block below, followed by the syntax I have tried so far (second block). However, I could not come up with code that achieves the above conditions.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(StartDate id) float(schoolname2 gender2 grade2) byte(Score tag) float dup
1928621099000 505711 0 0 0 . 1 1
1928621827000 505711 0 0 0 9 1 2
1928624421000.0002 505713 0 0 0 19 1 1
1928624452000 505713 0 0 0 15 1 2
1928623906000 505715 0 0 0 20 0 0
1928621142000 505716 0 0 0 14 0 0
1928621051000.0002 505718 0 0 0 18 0 0
1928623971000 505724 0 0 0 13 0 0
1928614160000 505726 0 0 0 15 1 1
1928627513000.0002 505726 0 0 0 16 1 2
end
format %tcnn/dd/ccYY_hh:MM StartDate
Code:
duplicates report id schoolname2
duplicates list id schoolname2, sepby(id)
duplicates tag id schoolname2, gen(tag)
duplicates list id schoolname2 if tag >= 1, sepby(id)
sort schoolname2 id
quietly by schoolname2 id: gen dup = cond(_N==1,0,_n) if schoolname!="" | id!=.
sort schoolname2 id StartDate
Thank you. Looking forward.
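A sketch of one way to apply the two conditions, under two assumptions: at most one record per id has a missing Score, and the latest StartDate is the one to keep when no Score is missing.
Code:
duplicates tag id, gen(dupN)
* 1. among duplicate ids, drop the record with a missing Score
drop if dupN > 0 & missing(Score)
* 2. among whatever duplicates remain, keep the most recent StartDate
bysort id (StartDate): keep if _n == _N
drop dupN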
Synthetic Control - minimum number of obs?
Can I build a synthetic control with 6 pre-treatment observations and 4 post-treatment observations? Is it valid, or does it raise questions about suitability? Do you know of any paper with so few pre-/post-treatment observations?
The intersection of two Chinese variables
Dear All, I found this question here (in Chinese). The data set is shown below.
Given "var1" and "var2", the desired result is "newvar". Basically, "newvar" consists of the characters appearing in both "var1" and "var2". Thanks.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str20(var1 var2 newvar)
"中国人民大学" "中国农业大学" "中国大学"
"北京大学" "清华大学" "大学"
"北京科技大学" "北京大学" "北京大学"
"北京师范大学" "上海师范大学" "师范大学"
end
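One possible reading of the example is the ordered set of characters of var1 that also occur in var2; a sketch using Stata's Unicode string functions under that assumption (the result is put in newvar2 to avoid clashing with the target shown above):
Code:
gen str20 newvar2 = ""
gen len1 = ustrlen(var1)
summarize len1, meanonly
local maxlen = r(max)
forvalues i = 1/`maxlen' {
    * append the i-th character of var1 whenever it also appears in var2
    replace newvar2 = newvar2 + usubstr(var1, `i', 1) ///
        if `i' <= len1 & ustrpos(var2, usubstr(var1, `i', 1)) > 0
}
drop len1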
Building cross-lagged panel models with categorical variables on Stata
Can someone here suggest a suitable way to build a cross-lagged panel model with categorical variables in Stata MP 15.1? I built a cross-lagged panel model with categorical variables in Stata. The Xs are binary variables, while the Ys are continuous variables. They are repeated measurements across 5 waves. Equality constraints (i.e., a and b) were added as they have reciprocal associations. The causal directions of these variables are shown in the attached diagram.
In the first place, I used generalised response variables (i.e., family/link: Bernoulli / logit) to construct the variables of x, but I got errors when Stata displayed the GSEM results. Then, I changed the generalised response variables to observed variables (i.e., squares), and it ran smoothly. However, the effects of y on categorical x can’t be explained by the coefficients. That is, squares are for continuous variables, but Xs are binary variable. Alternatively, I should not use squares to construct Xs. What techniques should I apply to deal with this issue?
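A minimal two-wave sketch of the usual gsem setup for this kind of model, with placeholder variable names (x1 x2 y1 y2); the binary outcome gets a logit family while the continuous one stays Gaussian. It does not diagnose the earlier errors, and equality constraints across waves would still need to be added.
Code:
gsem (x2 <- x1 y1, logit) (y2 <- y1 x1), vce(robust)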
Seasonality in annual data? Problem or not?
Hello, I have panel data at an annual frequency. When I use the xtline command to get the line plots, the data show obvious fluctuations, and when I do further analyses the results do not come out as desired. I wonder whether I should care about seasonality in annual data. Thank you!
Ranking the most important predictor variables from a series of regression models.
Hi,
I came across an interesting study that did the following:
'number of births in the last 5 years and the number of household members were the “most important” features for predicting whether a mother reported the death of a neonate. Out of the 20 models (10 countries and 2 DHS surveys per country) trained to predict neonatal mortality, the number of births in the last 5 years ranked first in all models except Burkina Faso 2003, Tanzania 2015, and Zambia 2007, where the feature ranked second.'
I am wondering whether this is possible to do using Stata.
Please let me know if you are aware of any process similar to this.
Thank you.
reshape or transpose?
Dear All, I was asked this question here (https://bbs.pinggu.org/forum.php?mod...a=#pid74912031). The data set is
The desired output is that, for each "idstd", all the information is restructured in the same row (with elements in "type" as the variable names), i.e.,
Any suggestions are highly appreciated.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long idstd float(lat lon) str13 longname
518601  -1.95972 30.12857 "56"
518601  -1.95972 30.12857 "RN3"
518601  -1.95972 30.12857 "Nyarugunga"
518602 -1.936392 30.09101 "23"
518602 -1.936392 30.09101 "KG 594 Street"
518602 -1.936392 30.09101 "Kacyiru"
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long idstd float(lat lon) str13(street_number route political)
518601  -1.95972 30.12857 "56" "RN3" "Nyarugunga"
518602 -1.936392 30.09101 "23" "KG 594 Street" "Kacyiru"
end
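A sketch of one way to get the desired layout, assuming each idstd has exactly three rows whose original order is street number, route, political area:
Code:
* preserve the original within-idstd row order, then reshape wide
gen long obsno = _n
bysort idstd (obsno): gen seq = _n
drop obsno
reshape wide longname, i(idstd lat lon) j(seq)
rename (longname1 longname2 longname3) (street_number route political)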
using forvalues loop and postfile command
Dear statalist
I have a sample of only 3 observations. I am having problems calculating the outer product of the gradient, the gradient itself, and the updating formula using a forvalues loop and the postfile command; I get the error message "post sim not found". Another question: could I use a variable or macro to hold opg, gr, and theta instead of scalars? Any help would be greatly appreciated. My do-file is as follows:
clear all
set obs 100
input y
3.5
1
1.5
end
sca theta1 =1
gen f = -1/theta1 + y/theta1^2
tempname sim
tempfile result
postfile `sim' opg gr theta using result, replace
forvalues i=1/100 {
qui {
sca opg`i' = 1/3*(-1/theta`i' + y[1]/theta`i'^2 )^2 + 1/3*(-1/theta`i' + y[2]/theta`i'^2)^2+ 1/3*(-1/theta`i' + y[3]/theta`i'^2)^2 /* calculating OPG for every iteration*/
sca gr`i' = 1/3*(-1/theta`i' + y[1]/theta`i'^2 ) + 1/3*(-1/theta`i' + y[2]/theta`i'^2)+ 1/3*(-1/theta`i' + y[3]/theta`i'^2) /* calculating the gradient at every iteration*/
loc j=`i'+1
sca theta`j' = theta`i' + (opg`i')^-1*gr`i' /* updating formula*/
post sim (opg`i') (gr`i') (theta`j')
}
}
postclose `sim'
use result, clear
i get error message!!!
post sim not found
r(111);
end of do-file
r(111);
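The immediate error looks like a macro issue: post needs the handle stored by tempname, i.e. `sim', not the literal word sim (similarly, the tempfile result is declared but the file is then written and read under the plain name result). A one-line sketch of the corrected post call:
Code:
post `sim' (opg`i') (gr`i') (theta`j')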
Differences between dates using panel data
Hi there,
I'm working with some panel data, whereby for each subject I have multiple events (lab tests) recorded in long form. Each event is dated and I need to calculate whether two specific tests (test_id 12 and 147) are conducted within 3 months of each other (to assess whether testing/diagnostic guidelines are being followed). Subjects may have multiple instances of each test.
At this moment in time, I'm considering looping through all instances of these tests to ascertain whether the other test occurs within 3 months. Given some subjects have 100s of test records and the dataset contains 100,000s of subjects, I'm wondering whether there is a more efficient method to derive my requirements.
(I found plenty of existing threads on the topic of dates and panel data, but nothing akin to the above)
Thanks in advance,
Rob.
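One loop-free sketch, assuming the variables are named subject_id, test_id, and test_date (a daily %td date) and that "within 3 months" can be approximated by 90 days: pair every test 12 with every test 147 per subject via joinby and flag close pairs. Note that joinby forms all within-subject pairs, so memory use grows with the number of tests per subject.
Code:
* keep the dates of test 147 in a temporary file
preserve
keep if test_id == 147
keep subject_id test_date
rename test_date date147
tempfile t147
save `t147'
restore

* pair each test 12 with every test 147 of the same subject and flag pairs within 90 days
keep if test_id == 12
joinby subject_id using `t147'
gen byte within3m = abs(test_date - date147) <= 90
bysort subject_id: egen any_within3m = max(within3m)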
Graph with conditioning
Hi there, I would like to create a graph with two lines, one for each condition. This is what I tried:
graph twoway (line mean_BARTAdj weekday_ordered if origin_condition==0, line mean_BARTAdj weekday_ordered if origin_condition==1)
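A sketch of the corrected syntax: each plot goes in its own set of parentheses rather than being separated by a comma inside one (the legend labels are placeholders).
Code:
graph twoway (line mean_BARTAdj weekday_ordered if origin_condition==0) ///
             (line mean_BARTAdj weekday_ordered if origin_condition==1), ///
             legend(order(1 "Condition 0" 2 "Condition 1"))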
Continuous-Continuous Interaction when one of the variables has many zeros
Hello,
I am interested in measuring how previous purchases from a given platform and time spent on the platform affect the probability of purchasing a given item. I am using the following model:
However, given that many customers are first-time customers, my interaction term is often zero. In other words, although a customer might spend a lot of time on the platform, if they are a first-time buyer the interaction term always takes the value zero.
Code:
probit purchase c.lntimespent##c.lnpreviouspurchases i.categoryFE i.timeFe, cluster(customer)
What should I do in this case? Center the variables? Any good references about what to do?
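Independent of centering, one way to see what the interaction implies here is to evaluate the marginal effect of time spent at several values of previous purchases, including zero for first-time buyers; a sketch based on the posted model (the evaluation points are arbitrary):
Code:
probit purchase c.lntimespent##c.lnpreviouspurchases i.categoryFE i.timeFe, cluster(customer)
margins, dydx(lntimespent) at(lnpreviouspurchases = (0 1 2 3))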
Using returned results in r() with -egen- is not working. Is that a bug?
Why is this not working?
Code:
. sysuse auto, clear
(1978 Automobile Data)

. summ price

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906

. egen meanprice = mean(price/r(max)), by(rep)
(74 missing values generated)

. summ meanprice

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   meanprice |          0           .
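A likely explanation, offered with a hedge: egen runs other commands internally, so r(max) is no longer defined by the time the expression is evaluated. Saving the value in a local (or scalar) first avoids the problem; a sketch:
Code:
sysuse auto, clear
summarize price
local pmax = r(max)
egen meanprice = mean(price/`pmax'), by(rep78)
summarize meanprice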
Spearman correlation table export into Excel
Hi everyone,
I have a problem trying to export my Spearman correlation results into Excel.
I use:
matrix A = r(Rho)
esttab matrix(A, fmt(%5.2f)) using "Results.csv", replace
Using this command I get the correlation table, but without the stars (0.01); I also need (0.1) printed with 4 digits after the decimal point, but in Excel everything is exported with 2 digits.
Is it possible to export only the needed results, and with the stars?
I can find a solution only for "correlation" and not for "spearman".
Thank you a lot in advance!
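On the decimals, a minimal sketch: widening the display format keeps four digits after the decimal point in the exported file (adding significance stars to an exported r(Rho) matrix needs extra work and is not shown here). The variable names are placeholders.
Code:
spearman var1 var2 var3
matrix A = r(Rho)
esttab matrix(A, fmt(%9.4f)) using "Results.csv", replace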
Interpretation Interactionterm in fixed effect regression
Dear all
,
I have performed a fixed effect regression with cluster(id)
xtreg y i.FFF i.young FFF#young Controls, fe cluster(id)
I use an interaction term with two dummy variables:
FFF = 1 for a family firm, else FFF = 0
young = 1 if firm age < 20 years, else young = 0
My dependent variable is a measure of innovation.
Var           | Coef.     | Std. Err. | t     | P>|t|
1.FFF         | .0010349  | .0007549  | 1.37  | 0.171
young         | .0000637  | .0007209  | 0.09  | 0.930
1.young       | 0 (omitted)
1.FFF#1.young | -.0001431 | .001038   | -0.14 | 0.890
margins i.iFFF_PU##i.young
Predictive margins
Model VCE : Robust
              |  Margin   | Std. Err.
FFF           |
            0 | .0045633  | .0001112
            1 | .0055404  | .0003584
young         |
            0 | .0047803  | .0002522
            1 | .0048104  | .0003708
iFFF_PU#young |
          0 0 | .0045375  | .0003248
          0 1 | .0046012  | .0004298
          1 0 | .0055725  | .0006025
          1 1 | .005493   | .000499
I am not sure how to interpret the interaction term.
Is my assumption correct that the family firm's influence on the mean of the innovation measure decreases by .0001431 when FFF=1 and young=1?
I also don't understand why the interaction term in the regression is negative while the margins are all positive.
Maybe someone has a hint or advice for me.
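A hedged sketch of how such an interaction is usually read after this kind of model: the cell predictions and the effect of FFF at each value of young (here "Controls" stands in for the control variables, which are not listed in the post).
Code:
xtreg y i.FFF##i.young Controls, fe cluster(id)
margins FFF#young                    // predicted means for the four cells
margins, dydx(FFF) at(young=(0 1))   // effect of being a family firm, by age group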
xtprobit margin problem
Hello to All Stata Users!
is the model I run . However when I run the margin commands:
the stata gives me following output:
. I know the importance of running probit with exp.variables respected according to whether fact/cont. However y is dependent variable here !
Thank you for your advice
HTML Code:
xtoprobit y x z, vce(cluster id)
HTML Code:
margins, predict(pu0) dydx(*)
HTML Code:
"y" not found in list of covariates
Thank you for your advice
merging datasets with different dimensions
Hi all,
I would like to merge together 16 datasets which are defined by country and are made as follows:
AUS.dta is:
var1
Mol1_a a1
Mol2_a a2
Mol3_a a3
Mol4_a a4
Mol5_a a5
...
Molk_a ak
GER.dta is:
var1
Mol1_g g1
Mol2_g g2
Mol3_g g3
Mol4_g g4
Mol5_g g5
...
Molk_g gk
where Moli_g is different from Moli_a. Both are string variables. And so on for the other 14 country .dta files.
Now what I would like to do (possibly in a loop) is to merge the datasets together so that if a value (molecule) is missing in a country, it is kept in the country/ies where var1 is not missing and set to missing in the others.
For instance, say Mol2_a and Mol5_a are present in AUS.dta but not in GER.dta, and at the same time Mol1_g in GER.dta is not present in AUS.dta, while other values are in common (for instance Mol3_a coincides with Mol6_g, Mol4_a with Mol5_g, Mol6_a with g23, and so on). The resulting dataset should look like this:
GER AUS
Moll1_g g1 .
Moll1_a g2 .
Moll2_g g2 .
Moll2_a . a2
Moll3_g g3 a7
Moll3_a g6 a3
Moll4_g g2 .
Moll4_a g5 a4
Moll5_g g5 a8
Moll5_a . g5
Moll6_g g6 a3
Moll6_a a2 g23
Then I'll drop the duplicates...
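A minimal sketch of one way to set this up, under the assumption (hypothetical, since the listings above do not name it) that each country file contains a string molecule identifier called molecule plus the value variable var1; molecules absent from a country end up missing in that country's column.
Code:
* start from one country, rename its value column after the country,
* then merge the remaining files one by one on the molecule identifier
use "AUS.dta", clear
rename var1 AUS
foreach c in GER {                          // add the other 14 country codes here
    merge 1:1 molecule using "`c'.dta", nogenerate
    rename var1 `c'                         // the using file's var1 becomes this country's column
}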
Impact of an average change on dependent variable over time
Hello! For my thesis I wanted to test a hypothesis which requires a model that I wouldn't know how to design myself. For a panel data (12 years) sample of 500 companies, I have a dummy variable that equals 1 if a company discloses a certain information variable, and another dummy variable that equals 1 if a company is loss-making (not profitable). I should test whether companies stop disclosing the information when they become profitable.
example: company A is loss-making from 2007-2013 and discloses the information, and is profitable from 2014 and onwards and stops disclosing the information
--> I want to test if this, on average, holds for every company
My Stata knowledge is kind of limited, I only got a minor course about it last year, and I really have no idea how I would be able to test this
I hope there is someone here who could help me!
Using reghdfe command with if-statements
Hello, bit of a complex one here:
I’m currently working as a research assistant, using my supervisor’s code, which uses employee-level data for a firm which “de-trashes” stock coming into its warehouse i.e., removes transit packaging.
The code is designed to estimate productivity, measured in units [de-trashed] per minute (upm). It uses the reghdfe command, a linear regression that absorbs multiple layers of fixed effects. It also uses an independent variable called PLANNED_UPH which is a target that, if reached, workers get paid a bonus.
The fixed effects used in the regression equation are:
- fe3_j (SKU code i.e., product fixed effects)
- fe3_i (worker fixed effects)
- fe3_t (date fixed effects)
- fe3_dow (day of week fixed effects)
- fe3_shift (shift type fixed effects i.e., day, early or late shift)
- fe3_h (hour of the day fixed effects)
- fe3_handle (handling class fixed effects)
- fe3_station (warehouse workstation fixed effects)
- fe3_group (group of workers fixed effects)
reghdfe uph PLANNED_UPH, ///
absorb(fe3_j=SKU_ID fe3_i=user_code fe3_t=date_code fe3_dow=dow fe3_shift=shift_type fe3_h=HourDay1 ///
fe3_handle=HANDLING_CLASS fe3_station=STATION_ID fe3_group=GROUP_ID)
quietly estadd local controls "Yes"
quietly estadd local FE_t "Yes"
quietly estadd local FE_i "Yes"
quietly estadd local FE_j "Yes"
est store H3
The output (H3) is as follows:
HDFE Linear regression                         Number of obs   =  2,480,900
Absorbing 9 HDFE groups                        F(1, 2454358)   =       1.66
                                               Prob > F        =     0.1971
                                               R-squared       =     0.5447
                                               Adj R-squared   =     0.5398
                                               Within R-sq.    =     0.0000
                                               Root MSE        =     0.2292

         uph |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 PLANNED_UPH |  -2.25e-06   1.75e-06    -1.29   0.197    -5.68e-06    1.17e-06
       _cons |   .4962852    .002311   214.75   0.000     .4917558    .5008146

Absorbed degrees of freedom:

    Absorbed FE |  Categories  Redundant  Num. Coefs
----------------+-----------------------------------
         SKU_ID |       25692          0       25692
      user_code |         567          1         566
      date_code |         232          1         231
            dow |           7          7           0
     shift_type |           3          1           2
       HourDay1 |           9          1           8
 HANDLING_CLASS |           2          2           0
     STATION_ID |          38          1          37
       GROUP_ID |           7          2           5
What I have been asked to do is, first, to split the data in half by date (I did this by creating binary dummies called split1 and split2 for the first and second halves of the year, respectively). I then have to run the same regression again for just the first half, and then copy the values of the fixed-effects coefficients into the data subset from the second half.
To run the regression on the first half of the data, I thought of adding if-qualifiers so that the regression would only use observations with split1==1. Then, for each user ID (worker), I could somehow copy the coefficients from split1 to split2 and run the code only for split2. However, wherever I place the if-qualifier in the code, it returns errors. I'm grateful for any ideas, thanks.
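A minimal sketch of where the if-qualifier goes, assuming split1 and split2 are the 0/1 half-year indicators described above; the absorb(newvar=groupvar) syntax stores the estimated fixed effects, but only for the estimation sample, so one (hypothetical) way to carry a worker's first-half effect into the second half is to copy it within user_code.
Code:
* run the regression on the first half only: the if-qualifier sits between
* the variable list and the comma that opens the options
reghdfe uph PLANNED_UPH if split1==1, ///
    absorb(fe3_j=SKU_ID fe3_i=user_code fe3_t=date_code fe3_dow=dow ///
           fe3_shift=shift_type fe3_h=HourDay1 fe3_handle=HANDLING_CLASS ///
           fe3_station=STATION_ID fe3_group=GROUP_ID)
est store H3_half1

* fe3_i is filled in only for split1 observations; copy each worker's
* first-half effect to their second-half rows (missing values sort last)
bysort user_code (fe3_i): replace fe3_i = fe3_i[1] if split2==1 & missing(fe3_i)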
AR(1) is insignificant in difference GMM
Hello !
I am trying to estimate the following model in both difference and system GMM. However, the results change substantially, both in significance and in the coefficients. Moreover, in difference GMM AR(1)=0.8, whereas in system GMM AR(2)=0.979 but AR(3)=0.031. What does this suggest, and how can I fix this problem?
I have 33 groups and 5 time periods.
Difference GMM
xtabond2 diff_gdp log_initial log_Mcap log_Liab log_trade log_school log_govsize log_infl td*, gmm( log_initi
> al , collapse sp) gmm( L.( log_Mcap log_Liab log_trade log_govsize log_school log_infl ), collapse) iv( td*
> ) robust two small ar(3) nolevel
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: split has no effect in Difference GMM.
td5 dropped due to collinearity
Warning: Two-step estimated covariance matrix of moments is singular.
Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
Difference-in-Sargan/Hansen statistics may be negative.
Dynamic panel-data estimation, two-step difference GMM
------------------------------------------------------------------------------
Group variable: Country_ Number of obs = 126
Time variable : period Number of groups = 33
Number of instruments = 26 Obs per group: min = 2
F(0, 33) = . avg = 3.82
Prob > F = . max = 4
------------------------------------------------------------------------------
| Corrected
diff_gdp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
log_initial | -.0766945 .059806 -1.28 0.209 -.1983708 .0449818
log_Mcap | .0410073 .0179618 2.28 0.029 .0044639 .0775508
log_Liab | .0198893 .0579202 0.34 0.733 -.0979503 .1377289
log_trade | -.1288009 .1143785 -1.13 0.268 -.3615057 .1039039
log_school | -.0846866 .0904975 -0.94 0.356 -.2688052 .099432
log_govsize | .0193566 .0762848 0.25 0.801 -.135846 .1745592
log_infl | .0061522 .0101976 0.60 0.550 -.0145949 .0268993
td1 | -.0457857 .0314332 -1.46 0.155 -.109737 .0181657
td2 | -.0283545 .0247333 -1.15 0.260 -.0786748 .0219658
td3 | -.0247058 .0176698 -1.40 0.171 -.0606554 .0112437
td4 | -.0070187 .0076711 -0.91 0.367 -.0226256 .0085883
------------------------------------------------------------------------------
Instruments for first differences equation
Standard
D.(td1 td2 td3 td4 td5)
GMM-type (missing=0, separate instruments for each period unless collapsed)
L(1/4).(L.log_Mcap L.log_Liab L.log_trade L.log_govsize L.log_school
L.log_infl) collapsed
L(1/4).log_initial collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -0.25 Pr > z = 0.804
Arellano-Bond test for AR(2) in first differences: z = 0.56 Pr > z = 0.576
Arellano-Bond test for AR(3) in first differences: z = -1.07 Pr > z = 0.285
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(15) = 12.90 Prob > chi2 = 0.610
(Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(15) = 14.73 Prob > chi2 = 0.471
(Robust, but weakened by many instruments.)
Difference-in-Hansen tests of exogeneity of instrument subsets:
gmm(log_initial, collapse lag(1 .))
Hansen test excluding group: chi2(11) = 11.10 Prob > chi2 = 0.435
Difference (null H = exogenous): chi2(4) = 3.62 Prob > chi2 = 0.459
iv(td1 td2 td3 td4 td5)
Hansen test excluding group: chi2(11) = 8.51 Prob > chi2 = 0.667
Difference (null H = exogenous): chi2(4) = 6.22 Prob > chi2 = 0.183
System GMM
xtabond2 diff_gdp log_initial log_Mcap log_Liab log_trade log_school log_govsize log_infl td*, gmm( log_initi
> al , collapse sp) gmm( L.( log_Mcap log_Liab log_trade log_govsize log_school log_infl ), collapse) iv( td*
> ) robust two small ar(3)
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
td2 dropped due to collinearity
Warning: Number of instruments may be large relative to number of observations.
Warning: Two-step estimated covariance matrix of moments is singular.
Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
Difference-in-Sargan/Hansen statistics may be negative.
Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: Country_ Number of obs = 160
Time variable : period Number of groups = 33
Number of instruments = 34 Obs per group: min = 3
F(11, 32) = 34.48 avg = 4.85
Prob > F = 0.000 max = 5
------------------------------------------------------------------------------
| Corrected
diff_gdp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
log_initial | -.0108161 .0085561 -1.26 0.215 -.0282444 .0066121
log_Mcap | .0029089 .0072577 0.40 0.691 -.0118746 .0176924
log_Liab | -.0101865 .0105349 -0.97 0.341 -.0316455 .0112725
log_trade | .0107354 .0097941 1.10 0.281 -.0092146 .0306853
log_school | .0400703 .0233032 1.72 0.095 -.0073967 .0875373
log_govsize | .0007092 .0124605 0.06 0.955 -.0246721 .0260904
log_infl | -.0005491 .0050937 -0.11 0.915 -.0109245 .0098264
td1 | -.0057598 .0053446 -1.08 0.289 -.0166465 .0051269
td3 | -.0054331 .0042171 -1.29 0.207 -.014023 .0031569
td4 | -.006852 .0052233 -1.31 0.199 -.0174916 .0037875
td5 | -.0181138 .0077344 -2.34 0.026 -.0338683 -.0023593
_cons | .025506 .1402732 0.18 0.857 -.2602211 .3112332
------------------------------------------------------------------------------
Instruments for first differences equation
Standard
D.(td1 td2 td3 td4 td5)
GMM-type (missing=0, separate instruments for each period unless collapsed)
L(1/4).(L.log_Mcap L.log_Liab L.log_trade L.log_govsize L.log_school
L.log_infl) collapsed
L(1/4).log_initial collapsed
Instruments for levels equation
Standard
td1 td2 td3 td4 td5
_cons
GMM-type (missing=0, separate instruments for each period unless collapsed)
D.(L.log_Mcap L.log_Liab L.log_trade L.log_govsize L.log_school
L.log_infl) collapsed
D.log_initial collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -1.87 Pr > z = 0.062
Arellano-Bond test for AR(2) in first differences: z = 0.03 Pr > z = 0.979
Arellano-Bond test for AR(3) in first differences: z = -2.16 Pr > z = 0.031
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(22) = 43.85 Prob > chi2 = 0.004
(Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(22) = 25.91 Prob > chi2 = 0.255
(Robust, but weakened by many instruments.)
Difference-in-Hansen tests of exogeneity of instrument subsets:
GMM instruments for levels
Hansen test excluding group: chi2(15) = 15.40 Prob > chi2 = 0.423
Difference (null H = exogenous): chi2(7) = 10.52 Prob > chi2 = 0.161
gmm(log_initial, collapse eq(diff) lag(1 4))
Hansen test excluding group: chi2(18) = 23.43 Prob > chi2 = 0.175
Difference (null H = exogenous): chi2(4) = 2.49 Prob > chi2 = 0.647
gmm(log_initial, collapse eq(diff) lag(1 4)) eq(level) lag(0 0))
Hansen test excluding group: chi2(21) = 25.91 Prob > chi2 = 0.210
Difference (null H = exogenous): chi2(1) = -0.00 Prob > chi2 = 1.000
iv(td1 td2 td3 td4 td5)
Hansen test excluding group: chi2(18) = 24.99 Prob > chi2 = 0.125
Difference (null H = exogenous): chi2(4) = 0.92 Prob > chi2 = 0.921
doflist command in reghdfe
Hi there,
I am currently having issues with collinearity of fixed effects in my reghdfe regression. I read online that the doflist (degrees-of-freedom adjustments) could help, yet I do not know how to implement it in my regression.
The do-file for the regression looks like this:
eststo FullSample: reghdfe CashtoAt TXUNCER fiveyearRTC fiveyearCASHETR fcons nol lossfirm NWC lev EBITDA MTBR size divpo capEx aquist atCF resDev, absorb(industry* year*) vce(cluster gvkey fyear)
Where and how could I implement the doflist now to control for collinearity?
I would be very grateful for any reply. Thanks in advance.
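A hedged sketch, assuming the installed reghdfe supports the dof() option described in help reghdfe (the "doflist" is the list of adjustments passed to that option); check the help file for the suboptions your version accepts.
Code:
eststo FullSample: reghdfe CashtoAt TXUNCER fiveyearRTC fiveyearCASHETR fcons ///
    nol lossfirm NWC lev EBITDA MTBR size divpo capEx aquist atCF resDev, ///
    absorb(industry* year*) vce(cluster gvkey fyear) ///
    dof(pairwise clusters continuous)   // degrees-of-freedom adjustments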
Using starting values to overcome convergence problem - always getting r(503)
Dear all,
I am trying to fix a convergence problem in my model using starting values, as suggested many times here on Statalist. However, I always get the following error:
a 0 x 0 is not a vector
an error occurred when mi estimate executed melogit on m=1
r(503)
A simplified example of my models and of how I use the starting values:
Code:
mi estimate, dots cmdok: melogit volba X Y Z [pweight=VAHA] || okres_cd:, covariance(unstructured) noestimate
mat a = e(b)
mi estimate, dots cmdok: melogit volba X Y Z [pweight=VAHA] || okres_cd: Z, covariance(unstructured) from(a)
My version of Stata is 16.1 MP.
Could you tell me where the mistake is and suggest a change to the code? Thanks in advance!
Pooled OLS Firm effect vs Industry Effect
Hello
I have explored this forum for the question of firm vs industry fixed effects, but the answers always concern panel regression with xtreg.
In my case I have an unbalanced panel, and whenever I use panel regression the results are insignificant and cannot be used to support my inferences. So, following Wooldridge (2010) and given the unbalanced nature of my panel, I use pooled OLS (POLS).
Since I am working on firms' share price returns, I want to control for firms so as to avoid any firm effects. I have 18 industries and more than 4,000 firms per year for 13 years.
Code:
regress y x1 x2 x3 x4 x5 x6 i.industry i.year, robust
With this code I am able to control for year and industry, but it does not seem practical to control for firms with i.firms because of the number of observations.
My questions:
1. Is it sufficient to control for industry and exclude the firm controls?
2. My friend suggested using clustering:
Code:
regress y x1 x2 x3 x4 x5 x6 i.industry i.year, robust cluster(firms)
This code only adjusts the standard errors; it does not fix the firm effect (as far as I know).
Can you please give me any suggestion to fix the firm effect in POLS?
Thanks!
Qazi
Here is a data example generated with dataex:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long Firms int Data_Year long Sector1 float(y x1 x2 x3 x4 x5 x6) 2 2006 1 . . . . . . . 2 2007 1 . . . . . . . 2 2008 1 . . . . . . . 2 2009 1 . . . . . . . 2 2010 1 . . . . . . . 2 2011 1 . . . . . . . 2 2012 1 . . . . . . . 2 2013 1 . . . . . . . 2 2014 1 . . . . . . . 2 2015 1 . . . . . . . 2 2016 1 . . . . . . . 2 2017 1 . . . . . . . 2 2018 1 . . . . . . . 3 2006 15 . .0588601 1.94666 .22520296 2.539871 0 1 3 2007 15 . .033332642 2.93485 .192904 2.547168 0 1 3 2008 15 . .01940063 1.84911 .14878628 2.5697694 0 1 3 2009 15 . .05142462 .953542 .246432 2.456603 0 1 3 2010 15 . .20495068 1.22109 .178483 2.408386 0 1 3 2011 15 . .009054668 1.4233 .11290167 2.4134254 0 1 3 2012 15 . .010843783 1.39291 .05148486 2.418654 0 1 3 2013 15 -.45762715 .00931621 2.33558 0 2.398067 0 1 3 2014 15 . .010274763 2.00774 0 2.427436 0 1 3 2015 15 . .013881045 3.23829 .23651055 2.69642 0 0 3 2016 15 -.8986493 .04266848 2.41774 .19479223 2.701517 0 0 3 2017 15 -.8433397 .08782447 2.25026 .18349776 2.7423086 0 0 3 2018 15 . .013235296 2.57593 .16161986 2.756552 0 0 5 2006 8 . . .978182 .00024064996 1.5905855 0 0 5 2007 8 . . .647546 0 1.549604 . 0 5 2008 8 . . . . . . . 5 2009 8 . . . . . . . 5 2010 8 . . . . . . . 5 2011 8 . . . . . . . 5 2012 8 . . . . . . . 5 2013 8 . . . . . . . 5 2014 8 . . . . . . . 5 2015 8 -1.2913605 . . 1.524727 -1.555955 . 0 5 2016 8 . . .235211 .013280518 1.1000602 . 0 5 2017 8 . . . . . . . 5 2018 8 . . . . . . . 13 2006 4 . .56787515 . .54185754 1.618226 0 0 13 2007 4 . . . . . . . 13 2008 4 . . . . . . . 13 2009 4 . . . . . . . 13 2010 4 . . . . . . . 13 2011 4 . . . . . . . 13 2012 4 . . . . . . . 13 2013 4 . . . . . . . 13 2014 4 . . . . . . . 13 2015 4 . . . . . . . 13 2016 4 . . . . . . . 13 2017 4 . . . . . . . 13 2018 4 . . . . . . . 14 2006 16 . .3742465 1.62565 .15619987 3.3326585 .845098 1 14 2007 16 . .04114099 1.02707 .14252478 3.24923 .69897 1 14 2008 16 . .009839674 1.51208 .08373041 3.25896 .60206 1 14 2009 16 . . . . . .60206 . 14 2010 16 . . . . . . . 14 2011 16 . . . . . . . 14 2012 16 . . . . . . . 14 2013 16 . . . . . . . 14 2014 16 . . . . . . . 14 2015 16 . . . . . . . 14 2016 16 . . . . . . . 14 2017 16 . . . . . . . 14 2018 16 . . . . . . . 15 2006 7 . .02684195 4.37205 .14560093 2.2206154 .60206 0 15 2007 7 . .011010598 3.26782 .05175494 2.2237165 .4771213 0 15 2008 7 . .0012241866 1.73697 .05533915 2.1846972 .4771213 0 15 2009 7 . .03988571 2.45002 .05487922 2.1772566 .4771213 0 15 2010 7 . .05301388 5.52119 .03857759 2.3197305 .4771213 0 15 2011 7 -.6537684 .010773852 2.86267 .2996194 2.665557 .69897 0 15 2012 7 -1.2263236 .013061752 6.60859 .12986204 2.830872 .845098 0 15 2013 7 -.9993629 .006437473 10.2847 .017026918 3.040543 1.20412 0 15 2014 7 . .0007668933 2.83528 .005819082 3.1847794 1.3802112 0 15 2015 7 . .006556697 1.48682 .009178674 2.950345 1.30103 0 15 2016 7 . .06107769 2.46043 .008934786 2.928986 1.2552725 0 15 2017 7 . .04057406 1.60284 .007892824 2.952678 1.230449 0 15 2018 7 . .01205367 2.03453 .03801258 2.9168916 1.0791812 0 16 2006 7 . . . . . . . 16 2007 7 . . . . . . . 16 2008 7 . . . . . . . 16 2009 7 . . . . . . . 16 2010 7 . . . . . . . 16 2011 7 . . . . . . . 16 2012 7 . . . . . . . 16 2013 7 . . . . . . . 16 2014 7 . . . . . . . 16 2015 7 . . . . . . . 16 2016 7 . . . . . . . 16 2017 7 . . . . . . . 16 2018 7 . . . . . . . 17 2006 14 . .0036297094 5.74644 .05222128 4.328257 1.1139433 1 17 2007 14 . 
.0083065005 5.09032 .1655463 4.3925915 1.1760913 1 17 2008 14 . .00050353137 4.03952 .20253557 4.411502 1.1760913 1 17 2009 14 . .02329066 4.60241 .19097248 4.4353666 1.146128 1 17 2010 14 . .008952872 3.92285 .14182915 4.479374 1.1760913 1 17 2011 14 . .0042309016 3.68352 .14432566 4.499907 1.2787536 1 17 2012 14 -1.780515 .016203607 3.62995 .14721337 4.529892 1.230449 1 17 2013 14 . .02400288 5.31523 .13067064 4.5256925 1.2552725 1 17 2014 14 -2.93236 .01348641 7.96134 .21756545 4.49428 1.230449 1 end label values Firms Firms label def Firms 2 "1-800-Attorney, Inc.", modify label def Firms 3 "1-800-FLOWERS.COM, Inc. Class A", modify label def Firms 5 "1PM Industries, Inc.", modify label def Firms 13 "360 Global Wine Co Com New", modify label def Firms 14 "3Com Corp", modify label def Firms 15 "3D Systems Corporation", modify label def Firms 16 "3Dfx Interactive", modify label def Firms 17 "3M Company", modify label values Sector1 Sector1 label def Sector1 1 "Commercial Services", modify label def Sector1 4 "Consumer Non-Durables", modify label def Sector1 7 "Electronic Technology", modify label def Sector1 8 "Energy Minerals", modify label def Sector1 14 "Producer Manufacturing", modify label def Sector1 15 "Retail Trade", modify label def Sector1 16 "Technology Services", modify
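A hedged sketch of one common workaround (not necessarily the right model for this application): absorb the firm indicators instead of typing i.Firms, so the thousands of firm dummies are swept out rather than estimated. Note that industry dummies are collinear with firm effects and drop out once firms are absorbed.
Code:
areg y x1 x2 x3 x4 x5 x6 i.year, absorb(Firms) vce(cluster Firms)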
Quadratic term and concave relationship threshold
Hello everyone,
I am exploring the hypothesis that, above a certain threshold, financial depth has a negative effect on economic growth. For this purpose, I include the quadratic term of my variable of interest, bank credit to the private sector. First I estimate a simple OLS with robust standard errors, which shows that there is indeed a quadratic relationship and that beyond a certain threshold financial depth has a negative impact on growth.
reg gr linitial prcreditBI prcreditBI2 log_trade log_govsize log_school log_infl , robust
Linear regression Number of obs = 64
F(7, 56) = 8.36
Prob > F = 0.0000
R-squared = 0.5037
Root MSE = 1.1029
------------------------------------------------------------------------------
| Robust
gr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
linitial | -.9851453 .335963 -2.93 0.005 -1.65816 -.3121306
prcreditBI | 6.026801 2.051998 2.94 0.005 1.916155 10.13745
prcreditBI2 | -3.052359 .9509696 -3.21 0.002 -4.95738 -1.147338
log_trade | -.001041 .2636371 -0.00 0.997 -.5291696 .5270876
log_govsize | -1.297607 .5570733 -2.33 0.023 -2.413559 -.1816553
log_school | 2.171247 .8182169 2.65 0.010 .532161 3.810332
log_infl | .0640021 .1290749 0.50 0.622 -.1945661 .3225703
_cons | 7.712365 1.98893 3.88 0.000 3.72806 11.69667
As you can see, the coefficient of prcreditBI2 is negative. Nevertheless, I do not know how to find the exact threshold at which bank credit to the private sector starts having a negative effect on growth (e.g., credit to the private sector starts having a negative effect when it reaches 80% of GDP). Can somebody help me with this?
Thank you!
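For a quadratic specification like this, the turning point can be computed directly from the two coefficients; a minimal sketch, run right after the regression above (reading the result as a share of GDP is an assumption about how prcreditBI is measured).
Code:
* turning point of b1*x + b2*x^2 is x* = -b1/(2*b2)
nlcom turning_point: -_b[prcreditBI]/(2*_b[prcreditBI2])
* with the estimates shown above this is about 6.0268/(2*3.0524) = 0.99,
* i.e. roughly 99% of GDP if prcreditBI is credit as a share of GDP
display -_b[prcreditBI]/(2*_b[prcreditBI2])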
Missing values in snapspan command @StephenJenkins
Dear Stephen Jenkins,
I'm following your Survival Analysis guideline (https://www.iser.essex.ac.uk/resourc...sis-with-stata) and An Introduction to Survival Analysis Using Stata by Cleves et al. (2008).
I would like to ask for your help in transforming panel data into duration data. I have panel data with monthly observations of individuals' depression status. My research question is how individual characteristics affect the duration of the depression stage. The problem is that when I use the snapspan command I cannot control for interval truncation in the data, and missing observations are counted as if they were observed.
The current format of my data for individual A is:
Code:
     +------------------------+
     | month   event   gender |
     |------------------------|
  A. |     1       0   Female |
  A. |     6       0   Female |
  A. |     8       1   Female |
  A. |     9       1   Female |
  A. |    10       1   Female |
     +------------------------+
For individual A, month shows the month of the interview, event=1 if the individual was depressed in that month, and gender is one of the individual characteristics. As you can see, I only observe individual A in January, June, August, September and October, while she missed the other 7 interviews. When I use the snapspan command
Code:
snapspan individualidsys month event, gen(date0) replace
rename month date1
the transformed data become:
Code:
     +--------------------------------+
     | date0   date1   event   gender |
     |--------------------------------|
  A. |     .       1       0        . |
  A. |     1       6       0   Female |
  A. |     6       8       1   Female |
  A. |     8       9       1   Female |
  A. |     9      10       1   Female |
     +--------------------------------+
In the duration data it looks as if I observed individual A from month 1 (January) to month 6 (June) and from month 6 to month 8, which is not the case. My questions:
1) Would that be a problem, and if so, how should I fix the missing-observation problem when transforming the data?
2) I am planning to use the following stset command:
Code:
stset date1, id(id) time0(date0) failure(event==1)
Do I need to define the enter, exit or origin options in the case of multiple failures?
Best regards,
John
Bar graph with multiple means of variables
Hello!
I want to create a bar graph with the means of several variables on the x axis. Any idea how I would do something like that?
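A minimal sketch using the auto data shipped with Stata: by default, graph bar plots the mean of each listed variable as its own bar.
Code:
sysuse auto, clear
graph bar (mean) mpg trunk turn    // one bar per variable, height = mean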
Friday, February 26, 2021
dtalink vs reclink2 vs matchit
Hello, I've tried to tag this topic on an older thread, but haven't gotten any responses, so was trying to create a new conversation.
I have seen different descriptions comparing -matchit- and -reclink2-. There is also a Stata pdf presentation on -dtalink-. Some of the materials are pretty complex for these packages, but would any interested parties be able to give even a brief overview on when you would use one of these vs the other and which is better for what type of tasks? I will be using large data sets, (~500,000 observations), trying to perform fuzzy/imperfect matches on names, date of birth, event dates, identification numbers. Thanks!
Business calendar generates year 1967 dates
Hi Statalisters,
I have successfully loaded my business calendar using the code below. However, when I generate my "bdate" variable it produces dates from 1967. I have searched through the different threads here but have not found a solution to my problem. My original date variable is named "date". I am new to Stata, so I generated the business calendar based on a post from statalist.org.
// Business calender code
purpose "Converting daily financial data into business calendar dates"
dateformat dmy
range 01jan2010 24feb2022
centerdate 01jan2010
omit dayofweek (Sa Su)
omit date 17feb2020
// Load business calender
bcal load daily
// Generate new date variable based on business calender
gen bdate = bofd("daily",date)
format %td bdate
//Time series
tsset ID bdate
When I run this code it generates the following output
panel variable: ID (strongly balanced)
time variable: bdate, 16mar1967 to 19apr1967
delta: 1 day
Can someone help me out on this issue?
Best regards,
Jeppe S
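A hedged sketch of the likely fix: bofd() returns a business-calendar date, which should be displayed with the calendar's own %tb format; formatting it as %td makes the small business-date integers display as (wrong) 1967 calendar dates.
Code:
gen bdate = bofd("daily", date)
format bdate %tbdaily          // use the calendar's own format, not %td
tsset ID bdate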
Verification of logic behind IV approach
I hope this is not too elementary a post for this forum. However, since I am not sure, I thought I would give it a shot.
I would like to know whether the following reasoning regarding the instrumental variable approach is acceptable. I understand there are case-by-case factors that affect the applicability of instruments that I am discussing below. But I just want to know if the general logic is correct or if I am missing something.
I am studying the effect of a state-level policy on a state-level outcome, y1. The simplest strategy would be to run a regression with the policy variable (p1) as a covariate along with a set of controls (x1-x5):
regress y1 p1 x1 x2 x3 x4 x5
But it is possible that there is reverse causality. Let's say that one of the factors driving the reverse causality is that there are special interests that would benefit from the policy, and states with higher values for the outcome variable have stronger special interests that lobby policymakers to implement it. If campaign contributions (camcon) only have an indirect causal influence on the dependent variable through the policy, then it would satisfy the exclusion restriction for instruments.
ivregress 2sls y1 x1 x2 x3 x4 x5 (p1 = camcon)
However, if there are multiple policies that benefit the special interests, it is likely that the special interests lobby for the other policies as well. If these policies were exogenous, then we would just need to include these other policies in the second-stage regression.
ivregress 2sls y1 p2 x1 x2 x3 x4 x5 (p1 = camcon)
But if the special interests are lobbying for the second policy, the policy would not be exogenous; there is reverse causality, as with the first policy. Thus, we need to treat the second policy as an endogenous variable as well. Therefore, we should try to find all the policies that could influence the outcome variable and treat them as endogenous variables. To do so, we would need as many instruments as there are endogenous variables. Assuming that the policies are the only endogenous variables, the instruments have no direct causal impact on the dependent variable, and the instruments are not correlated with the error in the second-stage regression, we should be able to adequately control for endogeneity in the estimation of the second-stage regression. If there are three endogenous policies and thus three instruments (i.e., camcon z2 z3), we would run the following:
ivregress 2sls y1 x1 x2 x3 x4 x5 (p1 p2 p3 = camcon z2 z3)
Note: Crossposted here:
https://stats.stackexchange.com/ques...iable-approach
https://www.reddit.com/r/econometric...e_iv_approach/
Exporting frequency table using esttab
Hi, I am trying to export a frequency table using the esttab command. My variable takes values with decimal points. When I use the following commands, I get the attached output. Can anybody please help me with the code to get the correct observations in the frequency table? Thanks a lot:
bcuse wage1
estpost tab lwage
esttab using "educ_frequency_esttab.csv", cells ("b(label(freq)fmt(0)) pct (fmt(2))") nomtitle nonumber replace
Encountering error r(198) while running a loop in stata
HI,
I am encountering an error while running a loop in Stata. The following is my code:
HTML Code:
forvalues i = 1/`MaxLPLags' {
    foreach var in state_gdp state_exp TE_shock {
        gen bad`var'`i' = L`i'.`var'*`state'
        gen good`var'`i' = L`i'.`var'*(1-`state')
    }
}
I have declared state and MaxLPLags as locals. When I run this loop, I come across
HTML Code:
invalid syntax
r(198);
I am not sure why I am getting this error. Some help is appreciated.
Thank you.
Regards
Indrani
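A hedged sketch of the usual cause of r(198) here: if `MaxLPLags' and `state' were defined in a different run (for example, by highlighting and running only part of the do-file), they are empty inside the loop, so 1/`MaxLPLags' expands to 1/ and Stata reports invalid syntax. Defining them in the same run as the loop avoids this; the values below are hypothetical.
Code:
local MaxLPLags 4            // hypothetical number of lags
local state recession        // hypothetical 0/1 state variable
forvalues i = 1/`MaxLPLags' {
    foreach var in state_gdp state_exp TE_shock {
        gen bad`var'`i'  = L`i'.`var' * `state'
        gen good`var'`i' = L`i'.`var' * (1 - `state')
    }
}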
Kaplan Meier, Multi-survival analysis on one graph
Hello,
I am able to graph my trade spell duration survival over a 30-year period using
Code:
sts graph if export_CAD <= 100000
to show the survival time of exports worth less than $100,000. If possible, how would I show a second line on the same graph for exports over $100,000? And is it possible to see the values for each year in a table format? Below is an example of what I was able to produce (graph attached).
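A hedged sketch of one way to get both groups on one graph and the estimates in a table, assuming a (hypothetical) indicator for exports above $100,000.
Code:
gen big_export = export_CAD > 100000 if !missing(export_CAD)
sts graph, by(big_export)          // two survivor curves on the same graph
sts list, by(big_export)           // survivor-function estimates as a table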
Which method does xtlogit/clogit use to estimate a fixed effects model (i.e. mean difference/first difference/LSDV)?
In Asymmetric Fixed-effects Models for Panel Data (available here open access: https://journals.sagepub.com/doi/10....78023119826441) Paul D. Allison indicates the following:
For the two-period case, there are several equivalent ways to estimate the fixed-effects model, all producing identical estimates. Here are the three most common methods:
1. Least squares dummy variables (LSDV).
2. Mean deviation.
3. First difference.
I have also seen mean deviation referred to as mean difference. My question is: which method does xtlogit use and which does clogit use, and is it the same method regardless of the number of time periods?
I have read the manuals (xtlogit: https://www.stata.com/manuals/xtxtlogit.pdf and clogit: https://www.stata.com/manuals/rclogit.pdf) but find myself lost in some of the more technical terms...
Any help would be appreciated!
John
Making tables from data
Hi,
I've district-level migration data of India.
Say-
State no | district no | migration type 1 (percentage share) |
1 | 1 | 39.7 |
1 | 2 | 21.5 |
2 | 3 | 15.3 |
2 | 4 | 17.2 |
I want to make tables like-
migration type 1 (%) | districts that fall under this |
1-10 | 0 |
10-20 | 3,4 |
20-30 | 2 |
30-40 | 1 |
40-50 | 0 |
How can I do that?
Also, later I want to see which state is affected most.
In the above tables, you can see that in state 2, nearly 10-20 per cent of type 1 migration is seen.
If I understand this I can also do the same for migration type 2,3,4 & 5.
Please advise,
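A minimal sketch of one way to build such a table, assuming the percentage-share variable is called mig1 and the district identifier is district (both hypothetical names for the columns shown above).
Code:
* bin the shares in 10-point intervals (bin 0 = 0-10, bin 1 = 10-20, ...)
egen mig1_bin = cut(mig1), at(0(10)100) icodes
tabulate mig1_bin
* districts (and their states) falling in the 10-20 per cent band
list state district if mig1_bin == 1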
Missing values in Stata
I am creating my own dataset for my thesis and have run into a problem with missing values. I am building the dataset in Excel and coded missing values as ".", but when I import it into Stata, "." is not recognized as a missing value. The count variables are not read as numeric count variables, and I am not able to run any statistical tests on them. How do I deal with this?
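A hedged sketch of the usual fix, with hypothetical file and variable names: when the "." cells force Excel columns to be read in as strings, destring converts them back to numeric, treating "." as missing.
Code:
import excel "thesis_data.xlsx", firstrow clear   // hypothetical file name
destring countvar, replace                        // hypothetical count variable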
Interpretation of AMEs: CIs overlapping each other but not zero, can I conclude that there is an interaction?
Hi all,
I'm fitting a simple OLS-model including an interaction term. To interpret the interaction I compute AMEs:
Here is the output:
------------------------------------------------------------------------------
My question is a basic and general one - I'm wondering whether I can conclude that "the effect of CV depends on function group"? What makes me doubt such a conclusion is the fact that, while the effect of CV is only significant in two of the function groups, all three CIs overlap each other?
Best,
Caroline
I'm fitting a simple OLS-model including an interaction term. To interpret the interaction I compute AMEs:
Code:
reg ITL OS LFR GD IW CWL LS noempl i.sex i.poor_srh cheftot c.CV##i.function margins, dydx(CV) at (function=(1 2 3)) marginsplot
Code:
Average marginal effects Number of obs = 220 Model VCE : OLS Expression : Linear prediction, predict() dy/dx w.r.t. : CV 1._at : function = 1 2._at : function = 2 3._at : function = 3 ------------------------------------------------------------------------------ | Delta-method | dy/dx Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- CV | _at | 1 | .3672521 .1679337 2.19 0.030 .0361438 .6983604 2 | .0134818 .1664265 0.08 0.936 -.3146549 .3416185 3 | .4481599 .2125448 2.11 0.036 .0290936 .8672261
My question is a basic and general one - I'm wondering whether I can conclude that "the effect of CV depends on function group". What makes me doubt such a conclusion is the fact that, while the effect of CV is significant in only two of the function groups, all three CIs overlap one another.
Best,
Caroline
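Overlapping confidence intervals for the individual AMEs do not by themselves show whether the AMEs differ from one another; the differences can be tested directly. A sketch reusing the names from the post, not a definitive answer:
Code:
* Joint test of the interaction terms from the model above
testparm c.CV#i.function

* Pairwise differences between the AMEs of CV across function groups
margins, dydx(CV) at(function=(1 2 3)) pwcompare(effects)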
Strange macro behaviour
Hi guys,
I am trying, with the following code, to drop some of my observations: specifically, the ones whose quarters lie before the first quarter of the latest-starting molecule or after the last quarter of the earliest-ending molecule (so if mol1 first appears in 2012q3, mol2 ends in 2015q3, and the other molecules range from 2009q3 to 2020q3, I would like to keep only the observations from 2012q3 to 2015q3):
Code:
quietly drop if trimestre < `min' | trimestre > `max'
Now I would like to reproduce it on several datasets, but it turns out that, while the code works as expected on a single dataset, when I use it in the following loop it does not (i.e. it raises no error but makes no change to the data, as if a local is not being read or something like that):
Code:
use "/Users/federiconutarelli/Dropbox/PhD/Elasticities/2008_2020_db/dati_per_paese/2008_2020_prd.dta", clear drop if Country =="ITALY" levelsof Countr, local(levels) foreach l of local levels { use "/Users/federiconutarelli/Dropbox/PhD/Elasticities/2008_2020_db/dati_per_paese/data_ctry/`l'_new.dta", clear quietly levelsof Mole, local(molecules) foreach k of local molecules { local rex = strtoname("`k'") use "/Users/federiconutarelli/Desktop/here/`l'/`rex'.dta", clear gen price = sales/stdunits gen ln_price=ln(price) gen ln_stdunits = ln(stdunits) quietly sum panelsize local min_panelsize = r(min) sum trimestre if panelsize == `min_panelsize' local min = r(min) local max = r(max) quietly drop if trimestre < `min' | trimestre > `max' quietly drop panels* bysort id_prd id_mole: gen panelsize = _N save "/Users/federiconutarelli/Desktop/here/`l'/`rex'.dta",replace } }
Is there something wrong I am doing here?
Thank you,
Federico
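One way to see why the drop appears to do nothing is to echo the locals just before the drop inside the inner loop; a minimal debugging sketch, reusing the names from the code above:
Code:
* Place just before the -quietly drop- line inside the inner loop
display "`l' / `rex':  min_panelsize = `min_panelsize'  min = `min'  max = `max'"
count if trimestre < `min' | trimestre > `max'   // rows the condition would remove
* set trace on   // uncomment to watch each macro expand line by line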
F-keys in profile.do
Dear all,
I am generating my profile.do file (shown below). Apparently, when I open Stata, I cannot use the shortcuts in the do-file. Is there any way I can do that, or does it work only from the Command window?
Code:
/*===========================================================================
project:        profile
Author:         Dario Maimone Ansaldo Patti
---------------------------------------------------------------------------
Creation Date:  February 19, 2021
===========================================================================*/

/*==============================================================================
                                Program set up
==============================================================================*/

set more off, permanently
set r on, permanently
clear
clear matrix
clear mata
set matsize 10000

/*==============================================================================
                            Start push notifications
==============================================================================*/

statapushpref, token(o.NSrMrdnu0LsjkEsr3rhtmGwtCDyJ4ptf) userid(dmaimone@unime.it) provider(pushbullet)

/*==============================================================================
                             Setting graph scheme
==============================================================================*/

set scheme s2mono
grstyle init
grstyle set plain, nogrid

/*==============================================================================
                                   Shortcuts
==============================================================================*/

global F2 "`"
global F3 "'"
Thanks for your kind attention.
Dario
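A quick way to confirm that profile.do actually ran and that the shortcut globals are defined is to list them from the Command window; a minimal check:
Code:
* Lists the contents of the global macros F2 and F3 set in profile.do
macro list F2 F3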