Hi everyone,
I'm wondering if there's a way to create a risk prediction variable from the results of two different Heckman correction models--one for Selection Bias and one for Self-Selection Bias. I think I've seen this done in the academic literature before, but the paper didn't explain its methodology in detail. For example, let's say I have:
// For Self-Selection Bias (Variables for Heckman: lnsales, lnemp, fcf_w, mktshr, mss, hhi, lnbseg, lngseg, unrate):
heckman status c.preld c.durld c.postld i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.bm_w, select(status = i.regul c.lnsales c.lnemp c.fcf_w c.mktshr c.mss c.hhi c.lnbseg c.lngseg c.unrate c.bm_w i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl) nocons twostep first mills(imr) //vce(bootstrap)
/* (After running the first heckman bit of code just above, type "drop imr" in the command line before you run the second heckman bit of code just below here.) */
// For Selection Bias (Variables for Heckman: FPS variable, fcf_w):
heckman status c.preld c.durld c.postld i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.bm_w c.fcf_w, select(status = i.fps c.bm_w i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.fcf_w) nocons twostep first mills(imr) //vce(bootstrap)
(Note that variables preld, durld and postld needed to be removed from the 1st stages of the models because of the economic nature of those variables.)
The top model involves 9 "Heckman variables," while the bottom one only involves 2. One such variable is common to both models--fcf_w.
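One possibility (a hedged sketch only, not necessarily what the paper you saw did): give each inverse Mills ratio its own name in mills() so both survive, then combine them afterwards. The standardize-and-average step at the end is purely illustrative.
Code:
// keep both inverse Mills ratios under different names instead of dropping imr between runs
heckman status c.preld c.durld c.postld i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.bm_w, select(status = i.regul c.lnsales c.lnemp c.fcf_w c.mktshr c.mss c.hhi c.lnbseg c.lngseg c.unrate c.bm_w i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl) nocons twostep first mills(imr_self)
heckman status c.preld c.durld c.postld i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.bm_w c.fcf_w, select(status = i.fps c.bm_w i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.fcf_w) nocons twostep first mills(imr_sel)
egen z_self = std(imr_self)
egen z_sel = std(imr_sel)
gen risk_index = (z_self + z_sel)/2   // one illustrative way to collapse the two correction terms into a single predictor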
Any advice would be greatly appreciated, thank you!!
J.J.
Monday, October 31, 2022
Using mca or pca to generate a single variable
Dear colleagues, I know I can use principal component analysis (PCA) on categorical variables by first coding them as dummies. Multiple correspondence analysis (MCA) can also be used, and there you don't need to convert your categorical variables into dummies. Let me assume I will convert my variables into dummies and use PCA, and ask the question below.
I have a latent variable called Accessibility, inferred from five Likert questions (scale 1-7) and the other questions below:
1. How much do you spend on NHIF?
2. Mode of NHIF payment, coded 1 "Out of pocket" 2 "Salary deduction" 3 "Insurance"
I want to combine all these questions to generate a single variable called Accessibility, and I plan to generate a set of dummies from my categorical variables in order to use pca.
suppose my dummies were a b c d e f g h for categorical variables and I do:
pca a b c d e f g h How much do you spend on NHIF
predict comp
Can I use comp as my Accessibility variable? I just want to rename it to Accessibility and use it. I know the predicted comp will be the first principal component anyway.
The main question here is can I work with the first component as my Accessibility variable?
I plan to use the same idea for affordability and quality variables, which are inferred from several other questions. Please advise.
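A minimal sketch of the workflow described above, with hypothetical variable names (payment_mode for the categorical payment item, nhif_spend for spending, q1-q5 for the Likert items):
Code:
tabulate payment_mode, generate(pay_)   // creates dummy variables pay_1 pay_2 pay_3
pca pay_* nhif_spend q1-q5
predict comp1                            // score on the first principal component
rename comp1 Accessibility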
ipdmetan - estimation_command
I would like to use ipdmetan to do an IPD meta-analysis for studies with treatment response yes/no as the outcome. The sample code provided shows how to set up the estimation_command for hazard ratios, but not for dichotomous or 2x2 format outcomes. Any help would be much appreciated, and apologies if this is a stupid question! I'm a primary care researcher without access to a biostatistician who works with IPD meta-analysis. David Fisher
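For what it's worth, a hedged sketch of one way the estimation_command might look for a dichotomous outcome, with hypothetical variable names (trialid, response, treat); the pooled effect is then on the log odds ratio scale and can be exponentiated for an odds ratio:
Code:
* per-study logistic regressions, pooled by ipdmetan
ipdmetan, study(trialid): logit response treat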
Testing dominance between Lorenz curves
Hello Stata Experts,
I would like to request information on the Stata code to test for dominance between two or more Lorenz curves. I am using Stata 15.1. Earlier I used 'dominance', a user-written command, but now it is nowhere to be found.
Thank you
Deepali
Conducting multinomial regression on multiply imputed variables for subset data
Hi
Is there a way to conduct multinomial regression on a weighted subsample of the data? For example, I used mi estimate: mlogit DV IV1 IV2 IV2 [iweight=ngwt] for the full sample, N = 5,777. However, out of the full sample, I want to conduct multinomial regression only on households that have received at least one form of welfare assistance (e.g. TANF, SNAP, WIC, etc.). So I created a new binary variable called assist, coded 1 for households with any welfare assistance, giving a new sample of N = 676. Then I ran mi estimate: mlogit again. However, the number of observations was the same as in the full sample, 5,777. Why is that?
Secondly, if I want to change my condition to households receiving at least ONE welfare assistance rather than ANY - e.g. so that I can compare households that did NOT receive welfare assistance with households that received at least ONE - how would I code it? Someone suggested creating it by id, sorting, and using egen?
Can I still use mi estimate:mlogit function with weights?
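A hedged sketch of both steps, keeping the variable names from the post (hhid for the household identifier is a placeholder); the likely reason the estimation sample did not shrink is that no if restriction was applied, and mi estimate does accept if and weights:
Code:
* flag households with at least one assistance record (with mi data the new variable may need mi passive: or registering)
egen any_assist = max(assist), by(hhid)
* multinomial model on the assisted subsample only
mi estimate: mlogit DV IV1 IV2 [iweight=ngwt] if any_assist == 1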
Please advise.
Thank you
Lena
Date values are different
Hello, I have values for the dates given in LB21B.
I used the code egen LB21B_12=convertCMC(LB21B)
When I apply a %td format to both variables, neither shows the correct date, and the two also differ from each other. The correct years should be somewhere between 2000 and 2005, but here they sometimes go back to 1900 or 1959, or jump forward even to 3033.
How do I correct this?
I'm using Stata 14 MP.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long(LB21B LB21B_12) 180605 . 180605 . 140103 . 140103 . 140103 . 140103 . 140103 . 140103 . 140103 . 10100 285474 10100 285481 300903 . 300903 . 0 . 0 . 150802 . 150802 . 150802 . 150802 . 150802 . 0 . 0 . 0 . 0 . 280604 . 0 . 0 . 0 . 0 . 0 . 70600 2126902 70600 2126905 260604 . 260604 . 260604 . 260604 . 260604 . 261002 . 120604 . 120604 . 120604 . 131101 . 290203 . 290203 . 270802 . 270802 . 270802 . 0 . 0 . 170604 . 170604 . 10301 291601 10301 291602 10301 291595 90903 2744870 90903 2744865 90903 2744869 121001 . 250505 . 250505 . 250505 . 280103 . 280103 . 220302 . 240604 . 240604 . 240604 . 51102 1533454 51102 1533445 50103 1503054 10702 303815 10702 303803 10702 303806 10702 303806 10702 303810 10702 303805 10702 303805 10702 303806 301103 . 301103 . 50404 1512210 200700 . 51002 1530403 120304 . 120504 . 120504 . 130503 . 130304 . 130304 . 110603 . 110603 . 171203 . 171203 . 180902 . 180902 . 91102 2750919 91102 2750933 250603 . 250603 . 21203 623438 end format %td LB21B format %td LB21B_12
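A hedged reading of the example values: LB21B looks like a day-month-year integer (180605 = 18 June 2005, 10100 = 1 January 2000, 140103 = 14 January 2003) rather than a century-month code, in which case convertCMC() is the wrong tool. A minimal sketch under that assumption:
Code:
* assumes LB21B is coded DDMMYY (day, month, two-digit year); 0 means missing
gen day = floor(LB21B/10000)
gen month = floor(mod(LB21B,10000)/100)
gen yy = mod(LB21B,100)
gen LB21B_td = mdy(month, day, 2000 + yy) if LB21B > 0
format %td LB21B_td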
A conceptual question about when should I add fixed effect and cluster the fixed effect?
Suppose I add a time-fixed effect to a panel data regression that I want to estimate using OLS. My question is conceptually when should I also cluster by time (in addition to adding fixed effects)?
How can I get several variables' mean values at the same time?
I want to create a vector containing several variables' mean values. But the summarize command only returns the last variable's mean value in r(mean).
Although I can manage by repeating the summarize command, I just want to ask whether there is any command that returns a list of variables' mean values?
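Two possibilities, sketched with the auto-dataset variables as placeholders: the mean command puts all the means into a single matrix, and tabstat with the save option stores them in r(StatTotal).
Code:
sysuse auto, clear
mean price mpg weight
matrix M = e(b)          // 1 x k row vector of means
matrix list M
tabstat price mpg weight, statistics(mean) save
matrix list r(StatTotal)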
Interpreting coefficients (percentage points vs percentage)
I was reading a paper that ran a difference-in-differences regression and the coefficient value was -0.036. The dependent variable is vote shares and none of the variables were logged. The author wrote that the impact of X on Y is a decrease of 3.6% of vote shares.
Is this interpretation correct? I thought the coefficient is in percentage points, not actual percentages. This is an article published in a top social science journal, and I feel like I'm the one missing something here.
In addition, for simple OLS regressions, are coefficients usually percentages? Sometimes people say percentages and sometimes percentage points, and this has been giving me a headache. Can anyone help me clarify, please?
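A small arithmetic illustration, assuming vote share is measured on a 0-1 scale (which the size of the coefficient suggests); the percentage-point change is fixed, while the percent change depends on the baseline:
Code:
display 0.036*100        // a coefficient of -0.036 is a drop of 3.6 percentage points
display 0.036/0.50*100   // relative to a 0.50 baseline share, that is a 7.2 percent drop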
Sunday, October 30, 2022
Old bug
Dear All,
I hope someone who has followed Stata for many years can help me remember: there used to be a problem (I hope I recollect correctly) where a method was introduced and then later withdrawn by StataCorp. Here is what I remember; any or all of the below may be wrong:
- it was related to calculation of standard errors or confidence intervals;
- with survey data
- probably introduced in version 8 and recalled (disabled) in version 9, or in the 9 to 10 transition, but unlikely anything newer.
- could be survey proportion or survey ratio commands, or something similar.
- could be https://www.stata.com/statalist/arch.../msg00126.html , but I somehow remember this regarding the SVY subset of commands.
I believe this was mentioned earlier on Statalist, but googling now didn't turn up any mention of the problem.
If anyone can point to more details, please do.
Much appreciated, Sergiy Radyakin
What is a GROUP variable in difference in differences analysis?
Query: What is a GROUP variable in difference in differences analysis?
Study: Newspaper articles from 1985-2017 were categorized as either in favor, against, or neutral toward a specific topic. These are weighted and graphed.
Hypothesis: An event happened around 2007 that triggered a change in attitudes. I'd like to test this with a DID analysis.
Unit of Analysis: Newspapers for each year.
Outcome Variable: Favor -defined as newspaper articles with a favorable stance
Treatment: Treatment -defined as 0/1 "before 2008" and "2008 and on"
Time Variable: Year -defined as years 1985-2017
Group: ???
Syntax: didregress (Favor) (Treatment), group(?????) time(Year)
DID Model: See the image of the model
[attached image of the model not shown]
A Control Variable?: I don't know if this is needed for DID. I created a dummy variable of the mean of Favor and Against.
Example Data:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int Year byte(Favor Against Neutral Treatment Control)
1985  29 71  0 0 50
1986  16 44 40 0 30
1987  43 43 14 0 43
1988  51 27 22 0 39
1995  60 10 30 0 35
1996  33 57 10 0 45
1997  30 69  1 0 50
1998  50  0 50 0 25
1999  44 31 25 0 38
2000  42 40 18 0 41
2001  80 20  0 0 50
2003  80 20  0 0 50
2004   0 75 25 0 38
2005  38  2 60 0 20
2006 100  0  0 0 50
2007  15 55 30 0 35
2008  19 68 13 1 44
2009  44 38 18 1 41
2010  72 14 14 1 43
2016  88 12  0 1 50
2017  82  9  9 1 46
2018  79  7 14 1 43
end
What is a GROUP variable?
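For what it's worth, a hedged illustration of what group() expects: a variable identifying the units at which treatment is assigned, with some units treated and others not (for example a newspaper identifier in a newspaper-by-year panel; newspaper_id below is hypothetical). With a single aggregate series per year there is no untreated comparison group, so this is syntax illustration only, not a fix for the design.
Code:
* hypothetical: one row per newspaper per year; Treatment = 1 for treated newspapers from 2008 on
didregress (Favor) (Treatment), group(newspaper_id) time(Year)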
Testing for model specification error in fractional regression
Hello,
I employed the fractional regression model proposed by Papke and Wooldridge to estimate the dependent variable of my study (a proportion computed as remaining/whole). The dependent variable falls between 0 and 1 (including 1). I ran the regression using the fracreg command in Stata. However, I would like to run the Ramsey RESET test for model specification error, as Wooldridge reports the RESET test value in his paper. When I entered the commands "ovtest" and "linktest", Stata returned the following:
. estat ovtest
estat ovtest not valid
r(321);
. linktest
not possible after fracreg
r(131);
How do I test for model specification after the fracreg command? What other assumptions should be checked after estimating a fractional regression? For example, in OLS regression one would test whether assumptions such as homoscedasticity, absence of autocorrelation, and normality are violated. Similarly, what other tests should one run after fractional regression?
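A hedged sketch of a hand-rolled RESET-type check after fracreg (there is no built-in estat ovtest for it): add powers of the fitted linear index as extra regressors and test them jointly. Variable names are placeholders.
Code:
fracreg probit y x1 x2            // original model (placeholder names)
predict xbhat, xb                  // fitted linear index
gen xbhat2 = xbhat^2
gen xbhat3 = xbhat^3
fracreg probit y x1 x2 xbhat2 xbhat3
test xbhat2 xbhat3                 // joint significance hints at functional-form misspecification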
Appreciate your advice.
VAR and Impulse response function
Dear Statalists,
I am working on a project where I need to
1. estimate the reduced-form VAR to predict the residuals (the shocks/innovations) and the contemporaneous terms (details in the pictures);
2. estimate the long-run (t=15) cumulative return responses to unit shocks of the innovations in the structural VAR (theta_rm, theta_x, theta_r). The short-run restriction is that rm can affect x and r, x can affect r, while r cannot affect rm and x contemporaneously.
[Attached images not shown: (1) the reduced-form VAR, (2) the structural VAR, (3) the relationship between the reduced-form and structural innovations, allowing for the contemporaneous terms.]
After estimating all the parameters of the reduced-form VAR, I need to get the cumulative impulse response function of the structural VAR.
Here is where I have problems:
1. What I am doing is basically to subtract the contemporaneous terms from the variables of the reduced form and estimate the VAR all over again. Is this procedure correct? Or is there another way to obtain the CIRF at step 15 of the structural VAR?
2. In the irf file, with 3 variables, there are in total 3^2 impulse-response relationships. Yet I am asked to calculate the response to a unit shock of each innovation. Does this mean theta is actually a vector? E.g., is the response to a unit shock of vwretd the sum of the responses (vwretd sdvol_1 r_1)?
3. I noticed there is also -svar- in Stata. Is the mechanism of -svar- the same as my procedure of going from the reduced-form VAR to the structural VAR?
4. How can I retrieve the response values at each step stored in the irf file? I have checked the reference: https://ift.tt/LzyrZU3
With the reference, I tried:
But I get the error:
Sorry the questions are a bit messily presented. The main questions I would love to ask are the problem 1-4.
Code:
var vwretd sdvol RET, lags(1/5)
predict resid_rm, residuals equation(vwretd)
predict resid_x, residuals equation(sdvol)
predict resid_r, residuals equation(RET)
reg resid_x resid_rm
rename b_resid_rm b_1_0
reg resid_r resid_rm resid_x
rename b_resid_rm c_1_0
rename b_resid_x c_2_0
Code:
gen sdvol_1 = sdvol - b_1_0*resid_rm
gen r_1 = RET - c_1_0*resid_rm - c_2_0*resid_x
var vwretd sdvol_1 r_1, lags(1/5)
irf create myirf, set(myirfs, replace) step(15)
irf table cirf, impulse(vwretd) response(vwretd sdvol_1 r_1) individual step(15)
irf table cirf, impulse(sdvol_1) response(vwretd sdvol_1 r_1) individual step(15)
irf table cirf, impulse(r_1) response(vwretd sdvol_1 r_1) individual step(15)
Code:
irf describe myirf
mat list _dta(step)
* or
mat list r(step)
Code:
matrix _dta(step) not found
matrix r(step) not found
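On question 4, one hedged possibility: the file created by irf create (here myirfs.irf) is itself a Stata dataset with one row per impulse-response-step combination, so the values can be read directly instead of through matrices:
Code:
use myirfs.irf, clear
keep if irfname == "myirf" & impulse == "vwretd" & response == "r_1"
list step cirf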
Separating a string variable into separate variables
I have a string variable CODEX which has the underlying cause of death coded first, followed by any secondary causes of death. The cause-of-death codes are ICD-10 codes (International Classification of Diseases). All the causes of death are separated by spaces or commas.
I am attempting to separate this string variable CODEX into separate cause of death variables DEATH1, DEATH2 etc. Any suggestions for potential codes for this would be greatly appreciated as I am not too sure how to approach this issue.
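A minimal sketch using split, assuming the codes really are delimited only by spaces and/or commas:
Code:
* break CODEX at spaces and commas into DEATH1, DEATH2, ...
split CODEX, parse(" " ",") generate(DEATH)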
Saturday, October 29, 2022
Generating Variable That Depicts Increase
My dataset comes from a survey with 5 waves that has a general score (variable GScore). I am trying to do the following:
1) I am trying to generate a new variable that shows the General Score increase from each wave to the next (so that a decrease in scores will be a negative value). What code would work best for this? I am currently trying the following, unsuccessfully:
bysort id: gen GScore_increase = GScore - GScore[_n-1]
2) What code would calculate the overall mean increase (or potential decrease) in the GScore across all waves for all participants?
3) Is there an efficient code I could use to determine how many participants decreased or maintained the same GScore across all waves (i.e. how many never increased their Gscore across all the waves)?
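A hedged sketch covering all three points, assuming a wave variable (here called wave, a placeholder name) identifies the survey wave; the key fix to the code above is sorting on wave within id before differencing:
Code:
* 1) wave-to-wave change (negative when the score falls)
bysort id (wave): gen GScore_increase = GScore - GScore[_n-1]
* 2) overall mean change across all waves and participants
summarize GScore_increase
* 3) participants whose score never rose in any wave
bysort id: egen ever_increased = max(GScore_increase > 0 & !missing(GScore_increase))
bysort id (wave): gen byte first_obs = (_n == 1)
count if ever_increased == 0 & first_obs   // counts each participant once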
SEM modeling (path model)
Dear all,
I hope you are doing well.
I'm working on my paper and I would like to study the mediating role of M in the relationship between X and Y, as shown in the following:
[attached path diagram not shown]
Is it correct to study this using the sem command?
Is this called a structural equation model? If yes, is it an SEM path model?
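A minimal sketch of how such a path model is typically fit with sem, with X, M, and Y as placeholder names; estat teffects afterwards reports the direct, indirect (mediated), and total effects:
Code:
sem (M <- X) (Y <- M X)
estat teffects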
kind regards
sdid in case of unbalanced panel
I need to run a synthetic difference-in-differences regression, and for that I need a balanced panel. But, as you can see, my sample period is 2000-2021, so there are 22 years in total, and for every county at least one year of the desired variable, wanted, is missing. When I run the following command it tells me year and county are missing:
When I'm running the following command all the observations are getting dropped out - indicating not even a single county has variable wanted for 22 years.
I'm attaching a part of my data
After using the -tsfill, full- command I managed to make the disappearing counties show up in my data, so the panel became strongly balanced.
Then I replaced my wanted variable with 0 when wanted == . This is actually right since wanted is 0 when it doesn't show up in my data.
It still reports an unbalanced panel when I run the following Stata command for SDID (synthetic difference-in-differences).
Is there anything I can do to detect the error?
Code:
tsset county year
isid county year, sort
variables county and year should never be missing
r(459);
Code:
by county (year): keep if _N == 22
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(wanted county year)
 2 1011 2002
 2 1011 2003
 1 1011 2004
 1 1011 2019
 1 1027 2000
 2 1027 2002
 1 1027 2008
 1 1027 2009
 1 1027 2013
 1 1027 2018
 4 1001 2000
 3 1001 2001
 1 1001 2002
 1 1001 2003
 3 1001 2004
 5 1001 2005
 2 1001 2006
 3 1001 2007
 2 1001 2008
 2 1001 2009
 3 1001 2010
 2 1001 2011
 7 1001 2012
 3 1001 2013
 3 1001 2014
 3 1001 2015
 2 1001 2016
 7 1001 2017
11 1001 2018
 3 1001 2019
 3 1001 2020
end
Code:
tsset county year
tsfill, full
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(wanted county year policy) 4 1001 2000 0 3 1001 2001 0 1 1001 2002 0 1 1001 2003 0 3 1001 2004 0 5 1001 2005 0 2 1001 2006 0 3 1001 2007 0 2 1001 2008 0 2 1001 2009 0 3 1001 2010 0 2 1001 2011 0 7 1001 2012 0 3 1001 2013 0 3 1001 2014 1 3 1001 2015 1 2 1001 2016 1 7 1001 2017 1 11 1001 2018 1 3 1001 2019 1 3 1001 2020 1 0 1001 2021 0 0 1001 2022 0 0 1011 2000 0 0 1011 2001 0 2 1011 2002 0 2 1011 2003 0 1 1011 2004 0 0 1011 2005 0 0 1011 2006 0 0 1011 2007 0 0 1011 2008 0 0 1011 2009 0 0 1011 2010 0 0 1011 2011 0 0 1011 2012 0 0 1011 2013 0 0 1011 2014 0 0 1011 2015 0 0 1011 2016 0 0 1011 2017 0 0 1011 2018 0 1 1011 2019 0 0 1011 2020 0 0 1011 2021 0 0 1011 2022 0 1 1027 2000 0 0 1027 2001 0 2 1027 2002 0 0 1027 2003 0 0 1027 2004 0 0 1027 2005 0 0 1027 2006 0 0 1027 2007 0 1 1027 2008 0 1 1027 2009 0 0 1027 2010 0 0 1027 2011 0 0 1027 2012 0 1 1027 2013 0 3 1027 2014 0 0 1027 2015 0 1 1027 2016 0 0 1027 2017 0 1 1027 2018 1 2 1027 2019 1 0 1027 2020 1 0 1027 2021 1 0 1027 2022 1 end
Code:
sdid wanted county year policy, vce(bootstrap) seed(1213)
Panel is unbalanced.
r(451);
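A hedged set of checks to see where the balance breaks (nothing here is specific to sdid); counting rows and listing cells with missing values per county usually reveals the culprit:
Code:
* how many rows does each county have?
bysort county: gen n_years = _N
tab n_years
* any duplicate county-year rows?
duplicates report county year
* which cells still have a missing outcome or treatment value?
list county year if missing(wanted) | missing(policy)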
Calculate age from other occurrences and attribute it to specific observation
Hi,
I'm working with consecutive censuses. I can follow the same individuals through several decades. However, age is not always reported (missing from the census, unreadable, etc.), and in those cases a zero is shown instead of a missing value (don't be mad, I know... moreover, newborns also show an age of 0... no comment). But age has probably been reported in a previous or a subsequent census. How can I use that information to infer age when it is 0 (when applicable)?
Also, reported ages are not always consistent, so (t-1 + 10) and (t+1 - 10) may yield different results. From my experience, most of the age spread through time ranges between 8 and 12 years, so no matter which census year is used in the calculation the age should be in the ballpark. In the example below, how do I determine which census to use in the calculation?
Finally, individuals are part of dyads (last variable) and may be present in more than one dyad. Not sure it is relevant to the calculation, but agediff is a dyad characteristic that will need to be updated afterwards.
I'm adding a few questions that may help figure out all the possible cases:
- A newborn will be coded as 0. What if in the next census the individual is also of age 0 (instead of 9-10)? Should the calculation start from the last occurrence to the first?
- What if it's the last occurrence that is 0?
Thanks
EDIT: data is coming from the censuses in SQL tables. Since individuals have only one occurrence by census (compared to multiple kin relationships in dyadic format), maybe I should figure how to recode age in SQL so that age remains consistent through all dyads.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ego int census byte(ego_age agediff) long dyad
708884 1881 24 23 11415600
708884 1891  0  0 11415600
708884 1901 40 20 11415600
708884 1911 57 25 11415600
739865 1881  1 23 11415600
739865 1891  0  0 11415600
739865 1901 20 20 11415600
739865 1911 32 25 11415600
end
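A hedged sketch of one simple rule: recover a birth year from every census where the reported age looks usable, take the within-person median, and fill in the zero ages from that; deciding which zeros are true newborns needs extra logic (here, a 0 is only treated as missing when the person also appears in an earlier census):
Code:
* implied birth year from censuses with a usable age
gen birthyear = census - ego_age if ego_age > 0
bysort ego: egen birthyear_med = median(birthyear)
* fill ages recorded as 0, except where 0 plausibly means a newborn
bysort ego (census): gen byte seen_before = (_n > 1)
gen ego_age_fix = ego_age
replace ego_age_fix = census - birthyear_med if ego_age == 0 & seen_before & !missing(birthyear_med)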
Creating Graphs with Regression Output
I am attempting to determine whether hours worked and wages have changed over time differently for different generations. So I am running the following regression:
Notes: (m_hrsov_a = actual hours worked including overtime; m_paygro_cpi_c = gross pay adjusted by CPI and categorised).
I need to graph the regression output (output_age.rtf) over time, but I am not clear how. Any ideas?
Here is an example of the data:
Thanks in advance!
Code:
foreach var of varlist m_hrsov_a m_paygro_cpi_c {
    eststo: areg `var' m_age2 m_sex_bi m_child_c m_mar_c m_employed_bi m_cued_bi m_ethnic_bi m_occ_c m_region_c if m_age_c == 1, absorb(m_qtr) cluster(m_yr)
    esttab using output_age.rtf, replace nogap b(%9.3f) se(%9.3f) parentheses star(* 0.10 ** 0.05 *** 0.01) drop(*cons*)
}
esttab est1 est2 using output_age.rtf, replace nogap b(%9.3f) se(%9.3f) parentheses star(* 0.10 ** 0.05 *** 0.01) drop(*cons*)
eststo clear
Code:
Age (Generations) - |
    Categorical (4) |      Freq.     Percent        Cum.
--------------------+-----------------------------------
             Gen. Z |    696,858       12.29       12.29
        Millennials |  1,740,155       30.70       42.99
             Gen. X |  1,879,408       33.15       76.14
       Baby Boomers |  1,352,639       23.86      100.00
--------------------+-----------------------------------
              Total |  5,669,060      100.00
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(m_yr m_qtr) byte m_age double m_hrsov_a float m_paygro_cpi
2002 1 31  .     .
2002 1 69  .     .
2002 1 26  .   730
2002 1 59  .     .
2002 1 21  .     .
2002 1 39  .     .
2002 1 25  .     .
2002 1 46 61     .
2002 1 52  .     .
2002 1 26 30  1900
2002 1 38 20 25000
end
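One hedged possibility for the graphing step: rather than working from the .rtf file, plot the stored estimates with coefplot (a user-written command from SSC). If the aim is genuinely a time path of coefficients, the models would first need to be estimated separately by year (for example looping over m_yr) and each result stored before plotting.
Code:
* ssc install coefplot
coefplot est1 est2, drop(_cons) xline(0)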
Mediation Analysis - Stata sem and medsem commands with fixed effects models
Hello everyone,
I am trying to perform a mediation analysis, in which the variables of concern are as follows:
Dependent Variable (Y) - Percentage_changes_per_industry
Independent Variable (X) - tenure_median
Mediator Variable (M) - median_tenure_promo_top2sf_1st
Control Variables - median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education norte algarve centro alentejo acores madeira
The issue I am having is that I want to include fixed effects models, but do not know how to do so when considering a mediation analysis. The commands for each of the individual regressions would be as follows:
First, regression with the Mediator as the Dependent Variable:
Code:
xtset caem2 year
xtreg median_tenure_promo_top2sf_1st tenure_median median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education i.year norte algarve centro alentejo acores madeira, fe cluster(caem2)
Second, regression with the Dependent Variable as the Dependent Variable:
Code:
xtset caem2 year
xtreg Percentage_changes_per_industry median_tenure_promo_top2sf_1st tenure_median median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education i.year norte algarve centro alentejo acores madeira, fe cluster(caem2)
The sem/medsem commands I am using, to perform the actual mediation analysis (and obtain values for the Sobel tests as well), are as follows:
Code:
sem (median_tenure_promo_top2sf_1st <- tenure_median median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education norte algarve centro alentejo acores madeira) ///
    (Percentage_changes_per_industry <- median_tenure_promo_top2sf_1st tenure_median median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education norte algarve centro alentejo acores madeira), ///
    nocapslatent vce(bootstrap, reps(100))
medsem, indep(tenure_median) med(median_tenure_promo_top2sf_1st) dep(Percentage_changes_per_industry) mcreps(1000) rit rid
The issue I have is that these sem/medsem commands are performing regular OLS regressions, and not fixed effects (on caem2) regressions, as I would like them to. Is there any way of implementing a fixed effects regressions in the sem models? It would help me solve the issue I am having with implementing these analyses on Stata.
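One hedged workaround (sem has no fixed-effects option as such): absorb the caem2 and year effects with hand-made dummies (LSDV) in both equations, which reproduces what xtreg, fe does for the slope coefficients at the cost of many extra parameters. Sketch only; the control list is abbreviated.
Code:
* create the dummy variables once
tabulate caem2, generate(caem2_)
tabulate year, generate(yr_)
* include them in both sem equations (other controls as in the commands above);
* omit one dummy from each set if collinearity is flagged
sem (median_tenure_promo_top2sf_1st <- tenure_median caem2_* yr_*) ///
    (Percentage_changes_per_industry <- median_tenure_promo_top2sf_1st tenure_median caem2_* yr_*), ///
    nocapslatent vce(cluster caem2)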
Thank you very much in advance!
Rui
Portfolio construction
Hello,
I have data of individual investors trading behavior. Below you see an abstract of my data. My data is sorted by investor and date. I have an investor identifier, I have dates when the investor sold or bought a stock. The variable "stock" indicates the amount of stocks purchased (negative value indicates a sale). Furthermore, I have a variable indicating the price of the stock and one variable indicating the volume. Additionally, I have a stock identifier (cusip) and the indicator variable "sell" which is equal to one if a sale take place.
Now I want to create investor-stock-portfolios. My analysis examines the portfolio of stocks that an investor could sell on each day that they do sell at least one position. Therefore, I already marked all dates where an investor sold a stock using the indicator variable "sell". Now, I want to create an observation for each position held by an investor on these sell dates.
For example, investor 6002: in line 13 he sells one stock holding. This investor currently holds 3 stocks (including the one he will now sell) and now I want to add 2 additional observations for the other two stocks that he holds for this sell date, that means for investor 6002 on this sell date I want three observations.
I would appreciate any help!
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long investor float(dates stocks price volume) str8 cusip float sell 2290 13366 100 54.375 5437.5 "06071610" 0 2290 13382 200 21 4200 "86707110" 0 2290 13408 200 25 5000 "17958410" 0 2290 13437 500 10 5000 "80850988" 0 3206 11501 1000 2.25 2250 "02365010" 0 3206 11967 152.486 32.7899 5000 "92190830" 0 3206 11967 452.489 11.04999 5000 "77957Q10" 0 3206 12404 -1000 1.0312 -1031.2 "02365010" 1 3206 13457 257.732 19.4 5000 "08188080" 0 6002 12058 100 18.625 1862.5 "81137170" 0 6002 12064 200 5.25 1050 "55261810" 0 6002 12075 1000 1.125 1125 "50221010" 0 6002 12088 -100 23.75 -2375 "81137170" 1 6002 12092 100 8.625 862.5 "81090510" 0 6002 12177 100 16.5 1650 "02312710" 0 6002 12221 200 3.25 650 "59373610" 0 6002 12257 -100 23.75 -2375 "02312710" 1 6002 12263 100 11.625 1162.5 "55917720" 0 6002 12435 100 6.875 687.5 "57679810" 0 6002 12498 -100 16.5 -1650 "55917720" 1 6002 12528 200 5.5 1100 "55261810" 0 6002 12667 226.655 11.02998 2500 "77064830" 0 6002 13137 -100 11.25 -1125 "81090510" 1 6002 13156 116.279 10.32001 1200 "77064887" 0 6002 13276 -200 1.625 -325 "59373610" 1 6002 13346 62.814 11.94001 750 "77064887" 0 6002 13356 200 4.875 975 "41025210" 0 6002 13356 53.433 36.70017 1961 "62838030" 0 6002 13402 100 15.75 1575 "64121010" 0 6002 13408 200 4.5625 912.5 "87990540" 0 6002 13426 200 2.75 550 "55403410" 0 6002 13439 50 19.75 987.5 "07251010" 0 6002 13460 100 1.75 175 "87990540" 0 7109 11340 100 27.25 2725 "20810810" 0 7109 11353 -100 27.25 -2725 "20810810" 1 7109 11396 100 22.25 2225 "00190710" 0 7109 11416 100 24 2400 "20810810" 0 7109 11548 100 27.5 2750 "25283630" 0 7109 11553 100 21.75 2175 "01849010" 0 7109 11728 100 19.625 1962.5 "59408710" 0 7109 11926 100 33 3300 "25084710" 0 7109 12074 100 27 2700 "87156510" 0 7109 12087 -100 22.5 -2250 "01849010" 1 7109 12087 -100 23.375 -2337.5 "20810810" 1 7109 12095 50 48.375 2418.75 "38131710" 0 7109 12134 50 65.5 3275 "10904310" 0 7109 12137 -100 12.125 -1212.5 "25283630" 1 7109 12163 -50 44.125 -2206.25 "38131710" 1 7109 12179 100 28.5 2850 "53567810" 0 7109 12228 100 20.875 2087.5 "09367610" 0 7109 12282 100 21.5 2150 "72387710" 0 7109 12290 100 40.875 4087.5 "50025510" 0 7109 12319 50 36.625 1831.25 "59408710" 0 7109 12362 50 30.875 1543.75 "25084710" 0 7109 12362 100 26.125 2612.5 "22237210" 0 7109 12380 100 20.75 2075 "03785710" 0 7109 12395 -150 32.25 -4837.5 "59408710" 1 7109 12403 -100 23.5 -2350 "00190710" 1 7109 12409 -100 46.625 -4662.5 "50025510" 1 7109 12418 -100 18.75 -1875 "72387710" 1 7109 12430 100 28.75 2875 "44984210" 0 7109 12445 100 32 3200 "00157530" 0 7109 12472 100 27.25 2725 "80818810" 0 7109 12472 -100 24.75 -2475 "09367610" 1 7109 12479 100 21 2100 "86693310" 0 7109 12514 -100 23.625 -2362.5 "22237210" 1 7109 12514 -100 34.375 -3437.5 "00157530" 1 7109 12514 -100 27.75 -2775 "80818810" 1 7109 12515 -100 22.375 -2237.5 "86693310" 1 7109 12516 -100 25.75 -2575 "03785710" 1 7109 12562 200 13.875 2775 "86881810" 0 7109 12575 -100 46.5 -4650 "53567810" 1 7109 12589 -50 72.5 -3625 "10904310" 1 7109 12607 -150 25.125 -3768.75 "25084710" 1 7109 12627 -200 14.625 -2925 "86881810" 1 7109 12661 -100 19 -1900 "87156510" 1 7109 12697 150 16.5 2475 "04890310" 0 7109 12697 50 68.375 3418.75 "10904310" 0 7109 12697 100 20.25 2025 "77136710" 0 7109 12738 50 51 2550 "53567810" 0 7109 12817 -50 51.5 -2575 "53567810" 1 7109 12907 200 18.5 3700 "86693310" 0 7109 13031 100 44 4400 "40621610" 0 7109 13234 100 40.375 4037.5 "53567810" 0 7109 13269 100 26.25 
2625 "03785710" 0 7109 13319 251.678 5.96 1500 "48625010" 0 7109 13355 200 13.75 2750 "23281510" 0 7109 13443 50 62.875 3143.75 "12692010" 0 7109 13450 100 37.25 3725 "88355610" 0 7109 13464 -100 13 -1300 "44984210" 1 7109 13467 250 8.625 2156.25 "92551410" 0 7348 12466 100 19.25 1925 "74143W10" 0 7348 12768 100 13.75 1375 "74143W10" 0 8774 11947 100 29.875 2987.5 "00282410" 0 8774 11948 211.685 11.81 2500 "31606120" 0 9003 11372 1000 4.75 4750 "35671310" 0 9003 11430 300 11.75 3525 "09954120" 0 9003 11612 -300 14 -4200 "09954120" 1 9003 12337 1000 3.125 3125 "35671310" 0 9003 12991 -2000 5.125 -10250 "35671310" 1 end format %td dates
Friday, October 28, 2022
Reference period in new DID estimators
Hello all! I am running a few checks using the new heterogeneity-robust diff-in-diff estimators, specifically csdid, did_imputation, and did_multiplegt. From each estimator, I retrieve the estimates for the leads and lags relative to the time of policy introduction and compare them to the lead/lag estimates from a standard event study model (using reghdfe). It seems that these estimators by default choose the reference period as the latest lead, as opposed to the lead at period -1 right before policy introduction. This may be innocuous with simulated data, but it creates problems when one tries to estimate the overall average ATT (all lag/lead estimates will be relative to the wrong period). I've seen this issue with simulated data (e.g. Asjad Naqvi's incredibly helpful code: https://asjadnaqvi.github.io/DiD/docs/code/06_combined/) and with real data across different projects.
I was wondering if anyone has figured out how to somehow force the reference lead to be period -1 or otherwise re-scaled the lead and lag estimates to be relative to this more meaningful reference.
Thanks for chiming in!
P.S. This could be a very simple fix and I am just completely missing it!
Matching data over time?
Hello,
I have a sample from the Current Population Survey (CPS). My data is monthly from 2015-present for the variables empsame, empstat, and labforce. EMPSAME indicates whether or not the respondent was employed by the same employer and the same job he/she reported working as his/her main job in the previous month's survey. What I am trying to do here is to define the "newly employed" as those who responded NO in EMPSAME because this means that they are working for a new employer compared to the previous month.
Once I have the set of "newly employed" in period t, I then want to split it into the ones who were employed in t-1, and those who were not employed in t-1.
I have an identifier variable that gives a unique id for each individual surveyed.
I do not know how I can use the data at hand to achieve the above goal. Any ideas on how to use the identifier to track individuals over each 2 consecutive months to be able to get the 2 sets?
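A hedged sketch using the panel structure, with placeholder names (id for the person identifier, mdate for a monthly date variable) and hypothetical codes for empsame/empstat that must be checked against the codebook:
Code:
xtset id mdate
* "newly employed": answered No to the same-employer question this month
gen byte newly_employed = (empsame == 2)              // 2 assumed to be the "No" code
* split by last month's status using the panel lag
gen byte was_employed_t1 = inlist(L.empstat, 10, 12)  // hypothetical "employed" codes
gen byte new_from_emp    = newly_employed & was_employed_t1 == 1
gen byte new_from_nonemp = newly_employed & was_employed_t1 == 0 & !missing(L.empstat)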
Creating new variable (subtracting) same individual
Hi,
I am using an individual-level dataset across 4 years to try to calculate mobility out of low wage between 2017 and 2020. I created these dummy variables (delta2020 and delta2017), coded 1 if low wage and 0 if not.
However, I need to subtract delta2020 with delta2017 for the same individual (Person_Ref1).
How can I create a variable like diff below?
input Person_Ref1 Person_Ref2 delta2020 delta2017 diff
"1000501" "2017" . 1 .
"1000501" "2018" . . .
"1000501" "2019" . . .
"1000501" "2020" 1 . 0
"1000502" "2017" . 0 .
"1000502" "2018" . . .
"1000502" "2019" . . .
"1000502" "2020" 0 . 0
"1000503" "2017" . 0 .
"1000503" "2018" . . .
"1000503" "2019" . . .
"1000503" "2020" 1 . 1
"1000701" "2017" . 1 .
"1000701" "2018" . . .
"1000701" "2019" . . .
"1000701" "2020" 1 . 0
"1000702" "2017" . 1 .
"1000702" "2018" . . .
"1000702" "2019" . . .
"1000702" "2020" 1 . 0
"1000703" "2017" . 0 .
"1000703" "2018" . . .
"1000703" "2019" . . .
"1000703" "2020" 0 . 0
"1002601" "2017" . 0 .
"1002601" "2018" . . .
"1002601" "2019" . . .
"1002601" "2020" 0 . 0
"1004203" "2017" . 1 .
"1004203" "2018" . . .
"1004203" "2019" . . .
"1004203" "2020" 1 . 1
Appreciate your help and thanks in advance
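A hedged sketch of one way to do it: spread each person's 2017 and 2020 values to all of that person's rows with egen, then difference (how to handle people missing one of the two years is left open):
Code:
egen d2017 = max(delta2017), by(Person_Ref1)
egen d2020 = max(delta2020), by(Person_Ref1)
gen diff = d2020 - d2017   // -1 = moved out of low wage, 1 = moved into low wage, 0 = no change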
vlookup generates same id
I read the thread on Complicated Vlookup-type problem in a large dataset.
I am facing a similar problem with my research.
I was able to generate grandmaid separately from momid; in other words, the grandmaid generated is not a duplicate of momid. However, when I generate grandpaid, it is the same as grandmaid.
Why does vlookup do that?
Thursday, October 27, 2022
Cox model with an unbalanced panel
Hello everyone,
I'm running a Cox model with an unbalanced panel from 2015-2021 to measure the factors that influence the survival of companies. I have my time variable and the failure variable, and my covariates vary over time; they are GDP, an agglomeration index, the growth rate of economic sectors, and other indexes. These variables differ each year and also vary by region. When the covariates are included in the analysis they turn out to be significant, but none of them meet the proportional hazards assumption. My question is: is there an error in my database?
The commands I use to run the model are
-snapspan idn time died, generate(time0) replace-
-stset time, id(idn) failure (died)-
-stcox- (with my covariates)
-estat phtest-
I have reviewed some works where tests of the proportional hazards assumption are not included, so I don't know whether it is necessary.
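If the issue is the proportional hazards violation itself rather than the data, one standard hedged remedy is to let the offending covariates' effects change with (log) time via stcox's tvc() option; the covariate names below are placeholders:
Code:
stcox gdp agglom_index sector_growth, tvc(gdp agglom_index sector_growth) texp(ln(_t))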
thanks in advance
Ana
When including interacted fixed effects, do I also have to include each fixed effects separately in the model?
When including interacted fixed effects, do I also have to include each fixed effects separately in the model?
For example, if I want to include firm x Industry fixed effects, do I also have to include firm fixed effects and industry fixed effects as well?
Thank you!
Difference in Difference analysis - how to create a group variable
I think my question is pretty straightforward. I'd like to know what my "Group" is for a difference in differences analysis.
Study: Newspaper articles from 1985-2017 were categorized as either in favor, against, or neutral toward a specific topic. These are weighted and graphed.
[attached graph not shown]
There is a clear divergence between the Favor and Against variables that happens roughly around 2007 or 2008. I would like to test this using a treatment variable defined as "before 2008" and "2008 and on," Treatment 0/1. The time is Year. I've added "the mean of the Favor and Against variables" as the control variable, Control.
This is a mock-up of my data:
[attached data table not shown]
The unit of analysis is newspapers for each year.
But what's up with the Group? I know it's supposed to be categorical ... but what am I supposed to put?
The syntax I have for the command is didregress (Favor) (Treatment), group(?????) time(Year)
Any thoughts on what my Group should be?
Thank you!
My Lorenz curves are coming out as straight lines.
Hello Stata Experts,
I am trying to create Lorenz curves for a variable with four categories. I have done this before, but this time my curves are coming out as straight lines. I am sharing an example dataset along with my commands.
I also tried using a standardized wealth index but it is not working either.
I will really appreciate any insight into this - where am I going wrong, or what does this mean? I can share my full PCA code and the related data, if needed.
Thanks in advance
Deepali
[attached graphs not shown]
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float raw_rank1 byte v190 float(sei z) double rank float rank2 1519 2 -.57867503 0 .438 .438 1521 2 -.57797277 0 .439 .439 890 1 -.8061847 0 .273 .273 2461 2 .0007149702 0 .655 .655 2061 2 -.28932393 0 .56 .56 2988 3 .4960965 0 .763 .763 1394 2 -.6336712 0 .407 .407 119 1 -1.0780907 0 .045 .045 1511 2 -.5812709 0 .435 .435 158 1 -1.0511917 0 .057 .057 3132 3 .6742755 0 .805 .805 1028 1 -.7624822 0 .311 .311 1493 2 -.5876991 0 .431 .431 1592 2 -.5417551 0 .451 .451 3225 3 .8141427 0 .822 .822 1406 2 -.6303535 0 .409 .409 1921 2 -.3819044 0 .526 .526 266 1 -1.0056158 0 .092 .092 2282 2 -.14398591 0 .611 .611 11 1 -1.223736 0 .001 .001 66 1 -1.1141272 0 .027 .027 3905 3 2.594721 0 .983 .983 1562 2 -.5555258 0 .446 .446 212 1 -1.0261097 0 .075 .075 277 1 -1.0024703 0 .098 .098 3750 3 1.9207685 0 .949 .949 114 1 -1.080746 0 .044 .044 414 1 -.9529806 0 .141 .141 1021 1 -.7646693 0 .308 .308 3956 3 2.976368 0 .992 .992 2139 2 -.2403019 0 .576 .576 872 1 -.8127694 0 .267 .267 1201 1 -.7066278 0 .357 .357 42 1 -1.1533086 0 .019 .019 1211 1 -.7038143 0 .359 .359 673 1 -.8647565 0 .215 .215 61 1 -1.119929 0 .024 .024 2684 3 .1753951 0 .7 .7 2035 2 -.3084227 0 .552 .552 1706 2 -.4945633 0 .477 .477 106 1 -1.0862021 0 .041 .041 445 1 -.9433951 0 .15 .15 3025 3 .53929883 0 .775 .775 3368 3 1.0676816 0 .868 .868 2483 2 .011977144 0 .658 .658 1341 2 -.6609402 0 .392 .392 123 1 -1.07341 0 .047 .047 3627 3 1.5838754 0 .929 .929 942 1 -.7906448 0 .286 .286 2722 3 .20741607 0 .71 .71 3233 3 .8326448 0 .823 .823 1628 2 -.52488875 0 .461 .461 2923 3 .4188752 0 .753 .753 3454 3 1.2099286 0 .89 .89 1757 2 -.4668162 0 .489 .489 1435 2 -.6137782 0 .416 .416 2478 2 .008811969 0 .657 .657 3957 3 2.99273 0 .993 .993 1430 2 -.6182839 0 .414 .414 964 1 -.7837582 0 .292 .292 1427 2 -.6187509 0 .412 .412 927 1 -.795358 0 .282 .282 1634 2 -.5228623 0 .463 .463 645 1 -.8762712 0 .206 .206 3474 3 1.252518 0 .894 .894 2442 2 -.012976546 0 .65 .65 735 1 -.8494613 0 .237 .237 1143 1 -.7286108 0 .345 .345 2302 2 -.1215786 0 .617 .617 1652 2 -.5176178 0 .467 .467 561 1 -.9010396 0 .187 .187 3317 3 .9688857 0 .854 .854 73 1 -1.1084094 0 .03 .03 3866 3 2.3716063 0 .973 .973 319 1 -.9914252 0 .11 .11 2440 2 -.013812775 0 .649 .649 26 1 -1.1747433 0 .012 .012 258 1 -1.0091702 0 .089 .089 1556 2 -.5576935 0 .445 .445 322 1 -.990563 0 .112 .112 2132 2 -.24586444 0 .574 .574 801 1 -.8318206 0 .254 .254 711 1 -.856173 0 .229 .229 2699 3 .19003813 0 .705 .705 3192 3 .7619695 0 .819 .819 1226 1 -.6974397 0 .365 .365 1607 2 -.53420997 0 .454 .454 3172 3 .72391 0 .812 .812 2609 2 .10445177 0 .685 .685 1983 2 -.3496844 0 .54 .54 3392 3 1.1001801 0 .874 .874 3078 3 .6023574 0 .789 .789 734 1 -.8496683 0 .236 .236 1976 2 -.3590761 0 .537 .537 2136 2 -.24302575 0 .575 .575 2604 2 .10296789 0 .682 .682 17 1 -1.1979523 0 .006 .006 1433 2 -.6150438 0 .415 .415 1774 2 -.454978 0 .496 .496 3412 3 1.1337353 0 .879 .879 end label values v190 wlth
Code:
pca x1-x74
predict comp1
rename comp1 v191
regr y v191 i.z
drop if e(sample)!=1
glcurve y, pvar(rank) glvar(Lorenz) sortvar(v191) replace by(z) split saving(v191_graph)
I also tried using a standardized wealth index but it is not working either.
Code:
egen sei = std(v191)
regr y sei i.z
drop if e(sample)!=1
glcurve y, pvar(rank) glvar(Lorenz) sortvar(sei) replace by(z) split saving(sei_graph)
I would really appreciate any insight into where I am going wrong or what this result means. I can share my full PCA code and the related data if needed.
Thanks in advance
Deepali
Adding and simplifying observations; from daily to weekly
Hello to all.
I'm fairly new to using Stata, and I'm having trouble figuring out how to combine observations. I have binary values for several variables, with several observations per day, and I need to present the weekly values (the sum of them all, per week). I understand it's a bit basic, but I would appreciate any help. Here's an example (a sketch of one way to do this follows it):
Date Week CO MU MN
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 1 0
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 1 0
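A minimal sketch of one way to do the weekly addition, assuming CO, MU, and MN are 0/1 indicators and that the existing Week variable is the grouping you want; collapse reduces the data to one row per week, while the egen alternative keeps the daily rows and just attaches weekly totals.
Code:
* one observation per week, summing the indicators
collapse (sum) CO MU MN, by(Week)
list, noobs

* alternatively, keep the daily observations and add the weekly totals
* bysort Week: egen CO_week = total(CO)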
Interpretation of "/logs" in Stata17 output using mestreg
Dear Stata users,
Can someone please explain in simple terms what the "/logs" output refers to in the following code? I've looked in the mestreg help, the streg help, this blog post https://www.stata.com/stata-news/new...-correlations/ and the "An Introduction to Survival Analysis Using Stata" textbook, and I can't find an explanation anywhere.
use https://www.stata-press.com/data/r17/catheter
stset
mestreg age female || patient:, distribution(lognormal) time
Mixed-effects lognormal AFT regression Number of obs = 76
Group variable: patient Number of groups = 38
Obs per group:
min = 2
avg = 2.0
max = 2
Integration method: mvaghermite Integration pts. = 7
Wald chi2(2) = 14.13
Log likelihood = -328.70499 Prob > chi2 = 0.0009
------------------------------------------------------------------------------
_t | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.0060556 .0106775 -0.57 0.571 -.0269832 .014872
female | 1.334213 .3570964 3.74 0.000 .634317 2.034109
_cons | 3.480046 .5453814 6.38 0.000 2.411118 4.548973
-------------+----------------------------------------------------------------
/logs | .0359558 .1297168 -.2182844 .290196
-------------+----------------------------------------------------------------
patient |
var(_cons)| .322698 .2641533 .0648667 1.605354
------------------------------------------------------------------------------
LR test vs. lognormal model: chibar2(01) = 1.62 Prob >= chibar2 = 0.1017
If I calculate the exponent of the /logs output, i.e. exp(0.0359558) ≈ 1.0366, is this the estimated residual variance between catheters?
Kind regards,
Ben
Wednesday, October 26, 2022
Plotting a bar graph with two y-axis
Dear colleagues, I am trying to create a bar graph with two y-axes from the data and code below, but I am getting the error "invalid span". What does this mean? Can I kindly get some assistance?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(Poulation Tube_conc Flow_rate) int(Low_EventRate High_EventRate No_cellsSeen) byte Sort_Duration float seen
1 1  1 1200 1500  5423 11 2
1 1  3 1000 1300  3460  5 2
1 1 10 2000 2300  9163  6 2
1 2  1  400  500  5941 13 2
1 2  3  170  300  3899 12 2
1 2 10  300  600  5394 11 2
1 3  1  250  300  4857 81 2
1 3  3  120  300  4696 67 2
1 3 10  250  400  8529 47 2
1 1  1 1200 1500  5968 11 1
1 1  3 1000 1300  3800  5 1
1 1 10 2000 2300 10478  6 1
1 2  1  400  500  6051 13 1
1 2  3  170  300  4157 12 1
1 2 10  300  600  6051 11 1
1 3  1  250  300  4839 81 1
1 3  3  120  300  4656 67 1
1 3 10  250  400  8591 47 1
end
label values Tube_conc mea
label def mea 1 "20k cells/ul", modify
label def mea 2 "5k cells/ul", modify
label def mea 3 "1k cells/ul", modify
label values seen s
label def s 1 "seen", modify
label def s 2 "sorted", modify
Listed 18 out of 18 observations
twoway bar No_cellsSeen, ///
over(Flow_rate,label(labsize(small)) gap(10)) ///
over(Tube_conc, label(labsize(small)) relabel(`r(relabel)')) ///
ytitle("Number seen/sorted", size(small)) ///
yaxis(2) ///
, span size(medium)) ///
blabel(bar,position(top)) ///
intensity(40) bgcolor(white) ///
asyvars ||
twoway bar
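The "invalid span" error likely comes from the stray ", span size(medium))" line, which is not attached to any option, and twoway bar takes y and x variables and has no over() option in any case. Below is a minimal sketch that runs on the posted example data using graph bar instead; the variable roles and the restriction to one series are assumptions about the intended graph, and graph bar does not support a second y-axis.
Code:
* restrict to one series (seen) just to keep one value per cell
graph bar (mean) No_cellsSeen if seen == 1, ///
    over(Flow_rate, label(labsize(small)) gap(10)) ///
    over(Tube_conc, label(labsize(small))) ///
    asyvars ///
    ytitle("Number seen/sorted", size(small)) ///
    blabel(bar, position(outside)) ///
    intensity(40)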
Marketing Communication Mix
eivreg in rolling window setting (estimates not stored properly)
Dear Statalist,
I might be shooting my shot with this post but I want to implement the eivreg command in a rolling-windows setting using an unbalanced panel (id x tid).
Currently, I am able to run the program, but the estimates are not stored correctly. I would like to save the regression coefficients in _b_* variables and the t-statistics in _t_* variables.
The specifications for the rolling windows are the last 36 months (minimum 12 months of observations) by id. r_rf, mkt, and smb are assumed to be random variables.
The matrix r(table) stores the estimates for each window.
From similar threads (with other regressors), I ended up with the code below.
If the estimates can be stored properly, I want to re-run the program changing the reliability (in this example, the reliability for both variables is 0.6). I plan to increment it by 0.1 in each step from 0 to 1. What other procedure would you recommend to speed up the process? (A looping sketch follows the code below.)
Another problem I am facing is that I cannot interrupt the code partway through. I need to wait until the end to make changes, which can take several minutes. Do you know how to change this behavior?
Code:
g _b_mkt = .
g _b_smb = .
g _t_mkt = .
g _t_smb = .

capture program drop eiv_rol
program define eiv_rol
    qui: eivreg r_rf mkt smb , r(mkt 0.6 smb 0.6)
    matrix mattest = r(table)
    replace _b_mkt = mattest[1, 1] if mi(_b_mkt)
    * note: the posted lines wrote to _b_hml and _t_hml, which are never
    * generated; they presumably should be _b_smb and _t_smb
    replace _b_smb = mattest[1, 2] if mi(_b_smb)
    replace _t_mkt = mattest[3, 1] if mi(_t_mkt)
    replace _t_smb = mattest[3, 2] if mi(_t_smb)
    exit
end

rangerun eiv_rol, by(id) interval(tid -36 -1)
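A sketch of one way to loop over reliability values, reusing the program structure above: the reliability and the variable-name suffix are passed through globals because rangerun does not take extra arguments. The variable names and the 0.1 to 0.9 grid are assumptions.
Code:
capture program drop eiv_rol2
program define eiv_rol2
    qui: eivreg r_rf mkt smb , r(mkt $rel smb $rel)
    matrix mattest = r(table)
    replace _b_mkt_$suf = mattest[1, 1] if mi(_b_mkt_$suf)
    replace _b_smb_$suf = mattest[1, 2] if mi(_b_smb_$suf)
    replace _t_mkt_$suf = mattest[3, 1] if mi(_t_mkt_$suf)
    replace _t_smb_$suf = mattest[3, 2] if mi(_t_smb_$suf)
    exit
end

forvalues i = 1/9 {
    global suf = `i'        // suffix 1..9 maps to reliability 0.1..0.9
    global rel = `i'/10
    generate _b_mkt_`i' = .
    generate _b_smb_`i' = .
    generate _t_mkt_`i' = .
    generate _t_smb_`i' = .
    rangerun eiv_rol2, by(id) interval(tid -36 -1)
}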
Thank you
Ricardo
Combining surveys with distinct analytical weights
Hi.
I have a dataset that combines 14 household surveys from 14 countries. Each survey was conducted in a different year, and each survey has a household weight variable that is specific to that country's context (the data structure is the same across the 14 countries). I merged them and tried to cross-tabulate the country and gender_area variables (four values: male_rural, female_rural, male_urban, female_urban) with weights (tab country gender [aw=hhweight], m). But I found that such a cross-tabulation creates weird values for some of the countries.
For example, if I add an if condition at the end of the tab (tab country gender [aw=hhweight] if abc==1, m), some countries' (KHM, NPL) row totals become greater than their original row totals without the condition, even though the condition gives a smaller subsample. If I don't add the weight (tab country gender, m), there is no such problem. If I just tab one country with the weight, there is no such problem either. So I wonder whether there is any way for me to compare all countries with weights. I am not that familiar with the survey-data commands in Stata (svyset, strata, etc.). Hopefully someone can help. (A small sketch follows.)
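A small sketch of why the totals move: with [aw=], tabulate rescales the weights so they sum to the number of observations in the current sample, so adding an if condition changes the scaling. Using iweights, which tabulate does not renormalize, keeps the weighted counts comparable across conditions; the variable names follow the post.
Code:
* weighted counts that are not renormalized by the sample size
tab country gender [iw=hhweight], m
tab country gender [iw=hhweight] if abc==1, m

* or check the sum of weights per country explicitly
bysort country: egen double wtot = total(hhweight)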
regress with panel data
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str7 ym byte hs_c float imports str3(importer exporter) float GDP_im long GDP_ex double Exrate_im float(Exrate_ex conflict) "2001M01" 1 22907.025 "CHN" "USA" 290989.9 3425559 95.2487673974315 132.03389 -488.7 "2001M02" 1 22907.025 "CHN" "USA" 301626.9 3432654 94.9859811841317 132.53093 -842.2 "2001M03" 1 22907.025 "CHN" "USA" 312263.8 3439749 96.5148945688052 134.90831 -1374.5 "2001M04" 1 22907.025 "CHN" "USA" 322900.7 3446845 97.7540203225569 136.17534 -8309.801 "2001M05" 1 22907.025 "CHN" "USA" 329373.1 3442245 97.5296537738364 135.50266 -3037.7 "2001M06" 1 22907.025 "CHN" "USA" 335845.4 3437645 97.9459442254285 136.04584 -738.7 "2001M07" 1 22907.025 "CHN" "USA" 342317.8 3433045 98.5136769658743 135.71281 -2247.3 "2001M08" 1 22907.025 "CHN" "USA" 355947.1 3436219 96.5277841656989 132.57167 -1014.2 "2001M09" 1 22907.025 "CHN" "USA" 369576.5 3439392 95.4664429813288 131.81607 -2174.2 "2001M10" 1 22907.025 "CHN" "USA" 383205.9 3442566 96.36440014041 132.57242 -1261.1 "2001M11" 1 22907.025 "CHN" "USA" 361366 3452070 96.7013085639837 133.92229 -358.1 "2001M12" 1 22907.025 "CHN" "USA" 339526.1 3461574 97.5790651204532 134.67474 -575 "2002M01" 1 31262.51 "CHN" "USA" 317686.3 3471078 98.3799449054757 137.08035 -952.3 "2002M02" 1 31262.51 "CHN" "USA" 329365.3 3478199 99.2481334791048 138.10823 -1001.9 "2002M03" 1 31262.51 "CHN" "USA" 341044.4 3485321 97.8834719959884 137.17897 -903.7 "2002M04" 1 31262.51 "CHN" "USA" 352723.4 3492442 97.1425181532612 136.76201 -410.4 "2002M05" 1 31262.51 "CHN" "USA" 361031.6 3497140 95.4092946886645 132.7594 -399.8 "2002M06" 1 31262.51 "CHN" "USA" 369339.8 3501838 93.8074447121544 129.10393 -278.5 "2002M07" 1 31262.51 "CHN" "USA" 377648.1 3506535 91.6067234174144 125.30114 -757.6 "2002M08" 1 31262.51 "CHN" "USA" 392592.4 3508063 92.1153966922954 126.6381 -559.8 "2002M09" 1 31262.51 "CHN" "USA" 407536.7 3509590 92.3667537296484 127.08677 -600.7 "2002M10" 1 31262.51 "CHN" "USA" 422481 3511117 93.1840883646182 127.79578 -575.8 "2002M11" 1 31262.51 "CHN" "USA" 401759.3 3517165 91.9806011990866 126.60763 -544.6 "2002M12" 1 31262.51 "CHN" "USA" 381037.5 3523212 91.529878028 126.00581 -692.2 "2003M01" 1 40474.96 "CHN" "USA" 360315.8 3529260 90.1206136188956 123.26498 -836.1 "2003M02" 1 40474.96 "CHN" "USA" 371244.5 3539770 90.127945381461 123.0451 -943.2 "2003M03" 1 40474.96 "CHN" "USA" 382173.1 3550279 89.9944890775613 122.89462 -1089.8 "2003M04" 1 40474.96 "CHN" "USA" 393101.8 3560789 90.1636884906585 123.52025 -984.7 "2003M05" 1 40474.96 "CHN" "USA" 404193.4 3580574 87.9425578376848 118.27224 -636.2 "2003M06" 1 40474.96 "CHN" "USA" 415285 3600359 87.6403887383518 117.3232 -552.5 "2003M07" 1 40474.96 "CHN" "USA" 426376.6 3620144 88.1265059788148 118.31814 -717.4 "2003M08" 1 40474.96 "CHN" "USA" 444405.2 3634070 88.7374547228043 121.06068 -949.9 "2003M09" 1 40474.96 "CHN" "USA" 462433.9 3647996 87.8689113231125 120.48258 -790.7 "2003M10" 1 40474.96 "CHN" "USA" 480462.5 3661922 86.4437653981293 119.68214 -1097 "2003M11" 1 40474.96 "CHN" "USA" 459426.1 3668915 87.1762972397664 118.16027 -862.4 "2003M12" 1 40474.96 "CHN" "USA" 438389.6 3675909 86.1886065218889 113.6926 -1137.1001 "2004M01" 1 34436.934 "CHN" "USA" 417353.2 3682902 85.3595266527873 108.61776 -915.8 "2004M02" 1 34436.934 "CHN" "USA" 434094.1 3692462 84.1682250127312 108.43234 -1021.6 "2004M03" 1 34436.934 "CHN" "USA" 450835.1 3702022 85.8602293325678 110.82971 -1250.6 "2004M04" 1 34436.934 "CHN" "USA" 467576 3711582 86.4399453470538 
111.8528 -1089.2 "2004M05" 1 34436.934 "CHN" "USA" 480282.2 3723227 87.9178017422674 114.70498 -806.5 "2004M06" 1 34436.934 "CHN" "USA" 492988.5 3734872 87.1046063014496 113.60893 -978 "2004M07" 1 34436.934 "CHN" "USA" 505694.7 3746517 86.8262024465545 113.53995 -753.5 "2004M08" 1 34436.934 "CHN" "USA" 525372.8 3759281 87.392378701273 113.966 -608.5 "2004M09" 1 34436.934 "CHN" "USA" 545050.9 3772044 87.4111484209643 112.92165 -624.5 "2004M10" 1 34436.934 "CHN" "USA" 564729.1 3784808 86.0960776241377 110.8545 -294.6 "2004M11" 1 34436.934 "CHN" "USA" 539410.4 3798823 83.5447414579257 105.9082 -292.2 "2004M12" 1 34436.934 "CHN" "USA" 514091.7 3812838 82.0541941942796 103.66364 -572.9 "2005M01" 1 56490.7 "CHN" "USA" 488773 3826852 82.1078962405486 103.37172 -304.8 "2005M02" 1 56490.7 "CHN" "USA" 506251.4 3833060 82.8945232416078 103.84805 -510.8 "2005M03" 1 56490.7 "CHN" "USA" 523729.8 3839267 82.0825859930692 102.95734 -700.6 "2005M04" 1 56490.7 "CHN" "USA" 541208.2 3845474 82.7821510742878 104.66167 -472.7 "2005M05" 1 56490.7 "CHN" "USA" 557561.4 3855544 83.1157669337385 106.34795 -990.3 "2005M06" 1 56490.7 "CHN" "USA" 573914.6 3865615 84.2998988532275 108.73842 -632.6 "2005M07" 1 56490.7 "CHN" "USA" 590267.8 3875685 85.8133001727106 111.08463 -750.3 "2005M08" 1 56490.7 "CHN" "USA" 616303.7 3883022 86.0680785320791 109.0041 -542.6 "2005M09" 1 56490.7 "CHN" "USA" 642339.7 3890359 86.1042618695447 108.34776 -788.6 "2005M10" 1 56490.7 "CHN" "USA" 668375.6 3897696 87.44376973554 109.18375 -434.4 "2005M11" 1 56490.7 "CHN" "USA" 640514.5 3915218 88.326428758111 111.26317 -713.4 "2005M12" 1 56490.7 "CHN" "USA" 612653.4 3932741 87.9763728373693 111.1355 -426.3 "2006M01" 1 81040.34 "CHN" "USA" 584792.3 3950263 86.5881366299368 110.3442 -678.8 "2006M02" 1 81040.34 "CHN" "USA" 609045 3953497 86.9561957937083 111.22 -609.2 "2006M03" 1 81040.34 "CHN" "USA" 633297.8 3956730 86.7243768381664 110.81589 -1333.8 "2006M04" 1 81040.34 "CHN" "USA" 657550.5 3959964 86.292626886158 109.12933 -1344.8 "2006M05" 1 81040.34 "CHN" "USA" 672993.5 3961950 84.4130107730991 104.7291 -1151 "2006M06" 1 81040.34 "CHN" "USA" 688436.4 3963936 85.6043521050252 105.7272 -1499.6 "2006M07" 1 81040.34 "CHN" "USA" 703879.4 3965922 85.5206781503312 105.19274 -807.5 "2006M08" 1 81040.34 "CHN" "USA" 738997.2 3977076 85.4012147233782 104.92684 -1113 "2006M09" 1 81040.34 "CHN" "USA" 774114.9 3988230 86.26758063039 105.98315 -1785.8 "2006M10" 1 81040.34 "CHN" "USA" 809232.6 3999383 87.2939849273982 107.83578 -1019.7 "2006M11" 1 81040.34 "CHN" "USA" 785074 4003297 86.9312059561653 106.8941 -1656.7 "2006M12" 1 81040.34 "CHN" "USA" 760915.3 4007211 87.0858381296629 105.88695 -1223.4 "2007M01" 1 121749.84 "CHN" "USA" 736756.6 4011125 88.1709058655663 108.1406 -2221.5 "2007M02" 1 121749.84 "CHN" "USA" 772398.6 4019661 88.4680309310452 107.47453 -1393.2 "2007M03" 1 121749.84 "CHN" "USA" 808040.6 4028198 88.2165896802754 106.04984 -2480.4 "2007M04" 1 121749.84 "CHN" "USA" 843682.5 4036734 87.5223551122833 103.90254 -3041 "2007M05" 1 121749.84 "CHN" "USA" 868947.3 4044838 88.4249762412445 103.01221 -2810.5 "2007M06" 1 121749.84 "CHN" "USA" 894212 4052943 89.7994176184484 102.72694 -2797.6 "2007M07" 1 121749.84 "CHN" "USA" 919476.8 4061047 90.1439913275446 100.60493 -2321.1 "2007M08" 1 121749.84 "CHN" "USA" 965902.5 4069300 91.1855516013926 100.73556 -2192.1 "2007M09" 1 121749.84 "CHN" "USA" 1012328 4077553 90.5984099873853 99.03867 -2978.4 "2007M10" 1 121749.84 "CHN" "USA" 1058754 4085805 89.5003273545357 96.85046 -4514.5996 "2007M11" 1 121749.84 
"CHN" "USA" 1028848 4080256 89.2296783629896 94.79352 -3270 "2007M12" 1 121749.84 "CHN" "USA" 998941.9 4074707 90.6907358463584 96.37177 -2581.6 "2008M01" 1 151607.58 "CHN" "USA" 969035.9 4069158 91.4561432930558 95.55553 -2046 "2008M02" 1 151607.58 "CHN" "USA" 1023150 4076921 93.5013522500989 95.10735 -5062 "2008M03" 1 151607.58 "CHN" "USA" 1077263 4084683 92.190051215958 92.49255 -4052.7 "2008M04" 1 151607.58 "CHN" "USA" 1131377 4092446 93.1322198043677 92.6712 -5042.4 "2008M05" 1 151607.58 "CHN" "USA" 1156253 4085258 93.9823709559253 93.72097 -2640.7 "2008M06" 1 151607.58 "CHN" "USA" 1181128 4078071 94.9891741527411 94.85728 -2735.2 "2008M07" 1 151607.58 "CHN" "USA" 1206004 4070883 94.8847806429896 94.3504 -2784.9 "2008M08" 1 151607.58 "CHN" "USA" 1236305 4041259 96.8387184905498 98.96593 -3223 "2008M09" 1 151607.58 "CHN" "USA" 1266607 4011636 99.4853398062497 102.48123 -1573.9 "2008M10" 1 151607.58 "CHN" "USA" 1296908 3982012 103.995483591401 110.8625 -3169.7 "2008M11" 1 151607.58 "CHN" "USA" 1225333 3966539 106.160585135563 114.07413 -1732.3 "2008M12" 1 151607.58 "CHN" "USA" 1153759 3951066 103.497649081851 110.66965 -2386.7 "2009M01" 1 130867.41 "CHN" "USA" 1082184 3935593 104.036770558405 110.2786 -4105.6 "2009M02" 1 130867.41 "CHN" "USA" 1130807 3933372 105.915488371398 113.40341 -4055.5 "2009M03" 1 130867.41 "CHN" "USA" 1179430 3931152 107.163796022779 114.57028 -7922.8 "2009M04" 1 130867.41 "CHN" "USA" 1228053 3928932 105.124889734425 112.8565 -5349.3 "2002M09" 3 3570.5576 "CHN" "USA" 407536.7 3509590 92.3667537296484 127.08677 -600.7 "2002M10" 3 3570.5576 "CHN" "USA" 422481 3511117 93.1840883646182 127.79578 -575.8 "2002M11" 3 3570.5576 "CHN" "USA" 401759.3 3517165 91.9806011990866 126.60763 -544.6 "2002M12" 3 3570.5576 "CHN" "USA" 381037.5 3523212 91.529878028 126.00581 -692.2 "2003M01" 3 8855.441 "CHN" "USA" 360315.8 3529260 90.1206136188956 123.26498 -836.1 "2003M02" 3 8855.441 "CHN" "USA" 371244.5 3539770 90.127945381461 123.0451 -943.2 "2003M03" 3 8855.441 "CHN" "USA" 382173.1 3550279 89.9944890775613 122.89462 -1089.8 "2003M04" 3 8855.441 "CHN" "USA" 393101.8 3560789 90.1636884906585 123.52025 -984.7 "2003M05" 3 8855.441 "CHN" "USA" 404193.4 3580574 87.9425578376848 118.27224 -636.2 "2003M06" 3 8855.441 "CHN" "USA" 415285 3600359 87.6403887383518 117.3232 -552.5 "2003M07" 3 8855.441 "CHN" "USA" 426376.6 3620144 88.1265059788148 118.31814 -717.4 "2003M08" 3 8855.441 "CHN" "USA" 444405.2 3634070 88.7374547228043 121.06068 -949.9 "2003M09" 3 8855.441 "CHN" "USA" 462433.9 3647996 87.8689113231125 120.48258 -790.7 "2003M10" 3 8855.441 "CHN" "USA" 480462.5 3661922 86.4437653981293 119.68214 -1097 "2003M11" 3 8855.441 "CHN" "USA" 459426.1 3668915 87.1762972397664 118.16027 -862.4 "2003M12" 3 8855.441 "CHN" "USA" 438389.6 3675909 86.1886065218889 113.6926 -1137.1001 "2004M01" 3 3054.515 "CHN" "USA" 417353.2 3682902 85.3595266527873 108.61776 -915.8 "2004M02" 3 3054.515 "CHN" "USA" 434094.1 3692462 84.1682250127312 108.43234 -1021.6 "2004M03" 3 3054.515 "CHN" "USA" 450835.1 3702022 85.8602293325678 110.82971 -1250.6 "2004M04" 3 3054.515 "CHN" "USA" 467576 3711582 86.4399453470538 111.8528 -1089.2 "2004M05" 3 3054.515 "CHN" "USA" 480282.2 3723227 87.9178017422674 114.70498 -806.5 "2004M06" 3 3054.515 "CHN" "USA" 492988.5 3734872 87.1046063014496 113.60893 -978 "2004M07" 3 3054.515 "CHN" "USA" 505694.7 3746517 86.8262024465545 113.53995 -753.5 "2004M08" 3 3054.515 "CHN" "USA" 525372.8 3759281 87.392378701273 113.966 -608.5 "2004M09" 3 3054.515 "CHN" "USA" 545050.9 3772044 
87.4111484209643 112.92165 -624.5 "2004M10" 3 3054.515 "CHN" "USA" 564729.1 3784808 86.0960776241377 110.8545 -294.6 "2004M11" 3 3054.515 "CHN" "USA" 539410.4 3798823 83.5447414579257 105.9082 -292.2 "2004M12" 3 3054.515 "CHN" "USA" 514091.7 3812838 82.0541941942796 103.66364 -572.9 "2005M01" 3 1885.4142 "CHN" "USA" 488773 3826852 82.1078962405486 103.37172 -304.8 "2005M02" 3 1885.4142 "CHN" "USA" 506251.4 3833060 82.8945232416078 103.84805 -510.8 "2005M03" 3 1885.4142 "CHN" "USA" 523729.8 3839267 82.0825859930692 102.95734 -700.6 "2005M04" 3 1885.4142 "CHN" "USA" 541208.2 3845474 82.7821510742878 104.66167 -472.7 "2005M05" 3 1885.4142 "CHN" "USA" 557561.4 3855544 83.1157669337385 106.34795 -990.3 "2005M06" 3 1885.4142 "CHN" "USA" 573914.6 3865615 84.2998988532275 108.73842 -632.6 "2005M07" 3 1885.4142 "CHN" "USA" 590267.8 3875685 85.8133001727106 111.08463 -750.3 "2005M08" 3 1885.4142 "CHN" "USA" 616303.7 3883022 86.0680785320791 109.0041 -542.6 "2005M09" 3 1885.4142 "CHN" "USA" 642339.7 3890359 86.1042618695447 108.34776 -788.6 "2005M10" 3 1885.4142 "CHN" "USA" 668375.6 3897696 87.44376973554 109.18375 -434.4 "2005M11" 3 1885.4142 "CHN" "USA" 640514.5 3915218 88.326428758111 111.26317 -713.4 "2005M12" 3 1885.4142 "CHN" "USA" 612653.4 3932741 87.9763728373693 111.1355 -426.3 "2006M01" 3 5560.878 "CHN" "USA" 584792.3 3950263 86.5881366299368 110.3442 -678.8 "2006M02" 3 5560.878 "CHN" "USA" 609045 3953497 86.9561957937083 111.22 -609.2 "2006M03" 3 5560.878 "CHN" "USA" 633297.8 3956730 86.7243768381664 110.81589 -1333.8 "2006M04" 3 5560.878 "CHN" "USA" 657550.5 3959964 86.292626886158 109.12933 -1344.8 "2006M05" 3 5560.878 "CHN" "USA" 672993.5 3961950 84.4130107730991 104.7291 -1151 "2006M06" 3 5560.878 "CHN" "USA" 688436.4 3963936 85.6043521050252 105.7272 -1499.6 "2006M07" 3 5560.878 "CHN" "USA" 703879.4 3965922 85.5206781503312 105.19274 -807.5 "2006M08" 3 5560.878 "CHN" "USA" 738997.2 3977076 85.4012147233782 104.92684 -1113 "2006M09" 3 5560.878 "CHN" "USA" 774114.9 3988230 86.26758063039 105.98315 -1785.8 "2006M10" 3 5560.878 "CHN" "USA" 809232.6 3999383 87.2939849273982 107.83578 -1019.7 "2006M11" 3 5560.878 "CHN" "USA" 785074 4003297 86.9312059561653 106.8941 -1656.7 "2006M12" 3 5560.878 "CHN" "USA" 760915.3 4007211 87.0858381296629 105.88695 -1223.4 "2007M01" 3 14253.283 "CHN" "USA" 736756.6 4011125 88.1709058655663 108.1406 -2221.5 "2007M02" 3 14253.283 "CHN" "USA" 772398.6 4019661 88.4680309310452 107.47453 -1393.2 "2007M03" 3 14253.283 "CHN" "USA" 808040.6 4028198 88.2165896802754 106.04984 -2480.4 "2007M04" 3 14253.283 "CHN" "USA" 843682.5 4036734 87.5223551122833 103.90254 -3041 "2007M05" 3 14253.283 "CHN" "USA" 868947.3 4044838 88.4249762412445 103.01221 -2810.5 "2007M06" 3 14253.283 "CHN" "USA" 894212 4052943 89.7994176184484 102.72694 -2797.6 "2007M07" 3 14253.283 "CHN" "USA" 919476.8 4061047 90.1439913275446 100.60493 -2321.1 "2007M08" 3 14253.283 "CHN" "USA" 965902.5 4069300 91.1855516013926 100.73556 -2192.1 "2007M09" 3 14253.283 "CHN" "USA" 1012328 4077553 90.5984099873853 99.03867 -2978.4 "2007M10" 3 14253.283 "CHN" "USA" 1058754 4085805 89.5003273545357 96.85046 -4514.5996 "2007M11" 3 14253.283 "CHN" "USA" 1028848 4080256 89.2296783629896 94.79352 -3270 "2007M12" 3 14253.283 "CHN" "USA" 998941.9 4074707 90.6907358463584 96.37177 -2581.6 "2008M01" 3 15328.8 "CHN" "USA" 969035.9 4069158 91.4561432930558 95.55553 -2046 "2008M02" 3 15328.8 "CHN" "USA" 1023150 4076921 93.5013522500989 95.10735 -5062 "2008M03" 3 15328.8 "CHN" "USA" 1077263 4084683 92.190051215958 92.49255 
-4052.7 "2008M04" 3 15328.8 "CHN" "USA" 1131377 4092446 93.1322198043677 92.6712 -5042.4 "2008M05" 3 15328.8 "CHN" "USA" 1156253 4085258 93.9823709559253 93.72097 -2640.7 "2008M06" 3 15328.8 "CHN" "USA" 1181128 4078071 94.9891741527411 94.85728 -2735.2 "2008M07" 3 15328.8 "CHN" "USA" 1206004 4070883 94.8847806429896 94.3504 -2784.9 "2008M08" 3 15328.8 "CHN" "USA" 1236305 4041259 96.8387184905498 98.96593 -3223 "2008M09" 3 15328.8 "CHN" "USA" 1266607 4011636 99.4853398062497 102.48123 -1573.9 "2008M10" 3 15328.8 "CHN" "USA" 1296908 3982012 103.995483591401 110.8625 -3169.7 "2008M11" 3 15328.8 "CHN" "USA" 1225333 3966539 106.160585135563 114.07413 -1732.3 "2008M12" 3 15328.8 "CHN" "USA" 1153759 3951066 103.497649081851 110.66965 -2386.7 "2009M01" 3 5691.213 "CHN" "USA" 1082184 3935593 104.036770558405 110.2786 -4105.6 "2009M02" 3 5691.213 "CHN" "USA" 1130807 3933372 105.915488371398 113.40341 -4055.5 "2009M03" 3 5691.213 "CHN" "USA" 1179430 3931152 107.163796022779 114.57028 -7922.8 "2009M04" 3 5691.213 "CHN" "USA" 1228053 3928932 105.124889734425 112.8565 -5349.3 "2009M05" 3 5691.213 "CHN" "USA" 1257157 3933672 102.206898494182 107.628 -4022.9 "2009M06" 3 5691.213 "CHN" "USA" 1286261 3938412 100.704582631571 104.74212 -6657.6 "2009M07" 3 5691.213 "CHN" "USA" 1315365 3943152 100.204171397691 102.92252 -5497.3 "2009M08" 3 5691.213 "CHN" "USA" 1369192 3957189 99.4111044645282 101.29693 -5129.1 "2009M09" 3 5691.213 "CHN" "USA" 1423020 3971226 98.2524437102784 99.80137 -6877.1 "2009M10" 3 5691.213 "CHN" "USA" 1476847 3985263 96.7217188088701 98.4444 -6251.1 "2009M11" 3 5691.213 "CHN" "USA" 1411784 3991906 96.2560607379722 97.63283 -7399 "2009M12" 3 5691.213 "CHN" "USA" 1346720 3998549 97.2920921338491 98.14178 -6352.9 "2010M01" 3 37093.566 "CHN" "USA" 1281657 4005192 97.262547019236 98.15366 -18084.2 "2010M02" 3 37093.566 "CHN" "USA" 1339734 4018111 98.9521698050686 100.18134 -17574.3 "2010M03" 3 37093.566 "CHN" "USA" 1397811 4031031 98.4549443924715 99.84082 -4646.8 "2010M04" 3 37093.566 "CHN" "USA" 1455888 4043950 98.7186784280133 100.0141 -3887.3 "2010M05" 3 37093.566 "CHN" "USA" 1492501 4054372 101.432033026936 103.97416 -3251.9 "2010M06" 3 37093.566 "CHN" "USA" 1529114 4064794 102.688692447672 105.27413 -2925.3 "2010M07" 3 37093.566 "CHN" "USA" 1565727 4075216 101.455376872199 102.78316 -4395.4004 "2010M08" 3 37093.566 "CHN" "USA" 1641209 4082274 100.251113649551 101.02351 -8410.199 "2010M09" 3 37093.566 "CHN" "USA" 1716691 4089332 100.241808068078 99.60337 -8672.6 "2010M10" 3 37093.566 "CHN" "USA" 1792174 4096391 98.7447521503819 95.1022 -9068 "2010M11" 3 37093.566 "CHN" "USA" 1724092 4093101 100.476290307228 96.04137 -10536.6 "2010M12" 3 37093.566 "CHN" "USA" 1656011 4089811 101.321593833164 98.00818 -9005.7 end
As I'm new to this kind of analysis, I would very much appreciate your help. Best, Kendal
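Since the post shows the data but not a model, here is only a minimal sketch of setting the excerpt up as a panel and running a fixed-effects regression; the choice of dependent variable, regressors, and panel identifier (importer by exporter by product) are assumptions, not a recommendation.
Code:
* convert the string month "2001M01" to a Stata monthly date
generate mdate = ym(real(substr(ym, 1, 4)), real(substr(ym, 6, 2)))
format mdate %tm

* one panel per importer-exporter-product combination (an assumption)
egen long panelid = group(importer exporter hs_c)
xtset panelid mdate

xtreg imports GDP_im GDP_ex Exrate_im Exrate_ex conflict, fe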
Does asroprobit estimate parameters that are not defined in the code?
Hello,
When I followed the tutorial to learn the asroprobit command in the Stata software, I typed the following code:
use http://www.stata-press.com/data/r15/wlsrank
asroprobit rank high low if noties, case(id) alternatives(jobchar) casevars(female score) reverse
Some of the output is confusing: what are the estimates in the last five rows of the resulting table? I did not define "/lnl2_2, /lnl3_3, /l2_1, /l3_1, /l3_2" in the code.
Besides, when I include case-specific variables in a rank-ordered logistic regression model, the estimation omits the case-specific variables. Can a rank-ordered logistic regression model analyze case-specific variables?
Tuesday, October 25, 2022
Tabulating estimates of endogenous variables in dynamic forecast models
Hi,
I am running some dynamic forecast models, each of which contains 7 endogenous variables. Since there are missing values for some of the endogenous variables, some values are model-run during the forecasting.
The forecasting has been completed, but I would like to take a look at the model-run estimates of 3 endogenous variables - the source of one of them is "estimates", and the sources of the other two are "identity".
Which command(s) should I use?
Thanks in advance!
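One detail that may help, offered as a sketch rather than a definitive answer: forecast solve stores the model-run (simulated) values of every endogenous variable, whether its source is estimates or an identity, in new variables named with the prefix() or suffix() you specify, so those variables can simply be listed or graphed. The names below are hypothetical.
Code:
* if the model is solved with a prefix, e.g.:
forecast solve, prefix(f_)
* ...then the model-run values of every endogenous variable, including those
* defined by identities, sit in the f_* variables and can be inspected directly:
tsline f_y1 f_y2 f_y3        // hypothetical endogenous variable names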
[LASSO] Collinear covariates: Suggested addition to the documentation
I would like to suggest an addition to the documentation for collinear covariates in LASSO models. The Summary section currently reads as follows:
Summary
Consider factor variable group that takes on the values 1, 2, and 3. If you type
. lasso linear y i.group ...
lasso will know that separate covariates for group 1, 2, and 3 are to be included among the variables to be potentially included in the model.
If you create your own indicator variables, you need to create and specify indicators for all the values of the factor variable:
. generate g1 = (group==1)
. generate g2 = (group==2)
. generate g3 = (group==3)
. lasso linear y g1 g2 g3 ...
It is important that you do not omit one of them, say, g1, and instead type
. lasso linear y g2 g3 ...
While tinkering around, I discovered that one must not use ib#.group in place of i.group. Doing so causes the specified base level to be omitted and will therefore give different results. I think a warning about this should be added to the documentation; e.g., something like this could be added to the Summary section:
Note as well that you must not use the ib# prefix, because that will cause the selected base level to be omitted. For example, using ib1.group is equivalent to including g2 and g3 but not g1.
I'm sure the folks who write the documentation can improve on the wording, but I hope this gets the idea across.
For anyone who is interested, the code for my "tinkering" is pasted below.
Cheers,
Bruce
Code:
// File: LASSO_collinear_covariates.do
// Date: 25-Oct-2022
// Name: Bruce Weaver, bweaver@lakeheadu.ca
// Suggestion: Caution users of LASSO that factor variables will not
//   be handled as described in the documentation if one uses ib#.variable.
//   Only the i.variable form of factor variable notation is handled properly.
// The relevant documentation can be seen here:
//   https://www.stata.com/manuals/lassocollinearcovariates.pdf#lassoCollinearcovariates

// Use auto.dta to create an example like the one described.
clear *
sysuse auto
// Create 5 indicator variables for rep78
forvalues i = 1(1)5 {
    generate byte rep`i' = rep78 == `i' if !missing(rep78)
}
summarize rep1-rep5

// NOTE that you must reset the seed before estimating each model.
* [1] Use factor variable notation for rep78
set seed 1234
quietly lasso linear mpg i.rep78 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)

* [2] Use the 5 indicator variables for rep78
set seed 1234
quietly lasso linear mpg rep1 rep2 rep3 rep4 rep5 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)

// Q. What happens if one uses ib#.rep78 rather than i.rep78?
forvalues i = 1(1)5 {
    set seed 1234
    display "Base level for rep78 = " `i'
    quietly lasso linear mpg ib`i'.rep78 ///
        foreign headroom weight turn gear_ratio price trunk length displacement
    * Show which variables have been retained
    lassocoef, display(coef)
}
// A. Stata omits the base level when I do that.
// Let's check a couple of them to verify.

* Factor variable notation with ib3.rep78
set seed 1234
quietly lasso linear mpg ib3.rep78 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)
* Indicator variables with rep3 omitted
set seed 1234
quietly lasso linear mpg rep1 rep2 rep4 rep5 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)

* Factor variable notation with ib5.rep78
set seed 1234
quietly lasso linear mpg ib5.rep78 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)
* Indicator variables with rep5 omitted
set seed 1234
quietly lasso linear mpg rep1 rep2 rep3 rep4 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)
// Confirmed.
How to weight data using CPS weights in STATA?
Hello,
I have CPS monthly data 2015-present. I want to calculate unemployment rate. I am using empstat and labforce variables from the CPS. I am also using wtfinl as the weights for the data.
My question is how to calculate the weighted unemployment rate using this data? I was able to calculate the unweighted rate using
gen unemployed = (empstat==21)+(empstat==22)
gen lf = labforce==2
gen unemp_rate = (unemployed==1)/(lf==1)
When I multiplied unemp_rate by the wtfinl variable, I got a very large unemployment rate. I do not know what I am doing wrong. How do I weight the unemployment rate?
Also, a very basic question: my data are at the individual level. How can I aggregate the weighted unemployment rate into a table of (year - month - weighted unemployment rate)? That is, I do not know how to report the weighted unemployment rate for each month of my period (2015-present). (A sketch of one approach follows.)
Thanks
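A sketch of one way to get a weighted monthly rate using the IPUMS-CPS names mentioned in the post (empstat, labforce, wtfinl), assuming year and month variables exist: the weighted rate is the weighted mean of the unemployment indicator among those in the labor force.
Code:
generate byte unemployed = inlist(empstat, 21, 22)
generate byte inlf       = (labforce == 2)

* weighted unemployment rate by year-month, using wtfinl as the weight
preserve
collapse (mean) unemp_rate = unemployed if inlf [pw=wtfinl], by(year month)
list year month unemp_rate, sepby(year) noobs
restore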
Stata dataset to latex table
Help with reshaping the data
Dear community,
I am trying to bring some data into a different format. The data is the World Uncertainty Index, which is publicly available for research purposes (https://worlduncertaintyindex.com/). The data is provided in a format that doesn't fit my overall dataset with other variables, so I've been trying to reshape it, but with no luck.
The current data format is shown below, where the three-letter column headers are ISO country codes:
Period | AFG | AGO | ALB | ARE |
2009q1 | var1 value | var1 value | var1 value | var1 value |
2009q2 | var1 value | var1 value | var1 value | var1 value |
2009q3 | var1 value | var1 value | var1 value | var1 value |
2009q4 | var1 value | var1 value | var1 value | var1 value |
What I am trying to get is as follows:
Country | Period | Var1 |
AFG | 2009q1 | value |
AFG | 2009q2 | value |
AFG | 2009q3 | value |
AFG | 2009q4 | value |
ARE | 2009q1 | value |
ARE | 2009q2 | value |
ARE | 2009q3 | value |
Thank you in advance for any advice!
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str6 year byte(AFG AGO) double ALB byte ARE double ARG byte ARM double AUS "1952q1" . . . . 1 . 0 "1952q2" . . . . 0 . 2 "1952q3" . . . . 0 . 0 "1952q4" . . . . 0 . 0 "1953q1" . . . . 0 . 1 "1953q2" . . . . 0 . 2 "1953q3" . . . . 0 . 0 "1953q4" . . . . 1 . 0 "1954q1" . . . . 1.5 . 0 "1954q2" . . . . 2 . 2 "1954q3" . . . . 0 . 2 "1954q4" . . . . 0 . 2 "1955q1" . . . . 2 . 1 "1955q2" . . . . 0 . 0 "1955q3" . . . . 1 . 3 "1955q4" . . . . 1 . 5 "1956q1" 0 . 0 . 0 . 0 "1956q2" 0 . 0 . 0 . 2 "1956q3" 0 . 0 . 1 . 2 "1956q4" 0 . 1 . 2 . 2 "1957q1" 0 . .5 . 0 . 0 "1957q2" 0 . 0 . 0 . .66666667 "1957q3" 0 . 0 . 0 . 1.3333333 "1957q4" 0 . 0 . 0 . 2 "1958q1" 1 . 0 . 0 . 1 "1958q2" 0 . 0 . 0 . 1 "1958q3" 0 . 0 . 1 . 3 "1958q4" 0 . 0 . 1 . 1 "1959q1" 0 . 0 . 0 . 0 "1959q2" 0 . 0 . 0 . 0 "1959q3" 0 . 0 . 0 . 0 "1959q4" 0 . 0 . 2 . 0 "1960q1" 0 . 0 . 0 . 1 "1960q2" 0 . 0 . 2 . 3 "1960q3" 0 . 0 . 2 . 0 "1960q4" 0 . 0 . 0 . 0 "1961q1" 0 . 0 . 0 . 0 "1961q2" 0 . 0 . 1 . 1 "1961q3" 0 . 0 . 0 . 2 "1961q4" 0 . 0 . .5 . 1 "1962q1" 0 . 0 . 1 . 5 "1962q2" 0 . 0 . 0 . 4 "1962q3" 0 . 0 . 1 . 1 "1962q4" 0 . 0 . 1 . 0 "1963q1" 0 . 0 . 4 . 3 "1963q2" 0 . 0 . 3 . 0 "1963q3" 0 . 0 . 2 . 0 "1963q4" 1 . 0 . 0 . 0 "1964q1" 0 . 0 . 1 . 0 "1964q2" 0 . 0 . 0 . 0 "1964q3" 0 . 0 . 3 . 0 "1964q4" 1 . 0 . 3 . 2 "1965q1" 1 . 0 . 1 . 0 "1965q2" 0 . 0 . 1 . 2 "1965q3" 0 . 0 . 0 . 0 "1965q4" 0 . 0 . 0 . 0 "1966q1" 0 . 0 . 1 . 0 "1966q2" 2 . 0 . 2 . 0 "1966q3" 0 . 0 . 3 . 0 "1966q4" 0 . 0 . 1 . 0 "1967q1" 0 . 0 . 1 . 0 "1967q2" 0 . 0 . 0 . 1 "1967q3" 0 . 0 . 0 . 1 "1967q4" 0 . 0 . 0 . 0 "1968q1" 0 . 0 . 0 . 1 "1968q2" 0 . 0 . 1 . 1 "1968q3" 0 . 0 . 1 . 1 "1968q4" 0 . 0 . 0 . 1 "1969q1" 0 . 0 . 2 . 1 "1969q2" 0 . 0 . 0 . 3 "1969q3" 0 . 0 . 0 . 3 "1969q4" 0 . 0 . 0 . 0 "1970q1" 0 . 0 . 0 . 0 "1970q2" 1 . 0 . 0 . 1 "1970q3" 0 . 0 . 1 . 0 "1970q4" 0 . 0 . 5 . 2 "1971q1" 0 . 0 0 3 . 4 "1971q2" 0 . 0 0 2 . 0 "1971q3" 1 . 0 0 0 . 3 "1971q4" 0 . 0 0 1 . 3 "1972q1" 0 . 0 1 3 . 3 "1972q2" 0 . 0 0 3 . 0 "1972q3" 0 . 0 0 4 . 3 "1972q4" 0 . 0 0 1 . 0 "1973q1" 0 . 1 0 3 . 2 "1973q2" 0 . 0 0 2 . 8 "1973q3" 0 . 0 1 1 . 2 "1973q4" 0 . 0 0 3 . 1 "1974q1" 3 0 0 1 1 . 0 "1974q2" 0 0 0 0 1 . 0 "1974q3" 0 0 0 0 3 . 2 "1974q4" 0 1 0 0 3 . 3 "1975q1" 0 0 0 0 1 . 0 "1975q2" 0 0 0 0 2 . 2 "1975q3" 0 0 0 0 0 . 2 "1975q4" 0 0 0 0 1 . 1 "1976q1" 0 0 0 0 1 . 0 "1976q2" 0 0 0 0 2 . 2 "1976q3" 0 1 1 0 0 . 0 "1976q4" 0 2 0 0 1 . 2 end
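A sketch based on the posted excerpt (the quarter is a string variable named year and each country is its own column): give the country columns a common stub, reshape long, and convert the quarter string to a Stata quarterly date. In the full dataset the varlist AFG-AUS would need to cover all country columns.
Code:
rename year period

* prefix every country column with a common stub so -reshape long- can use it
foreach v of varlist AFG-AUS {
    rename `v' wui`v'
}

reshape long wui, i(period) j(country) string

generate qdate = quarterly(period, "YQ")
format qdate %tq
order country period qdate wui
sort country qdate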
Show the working directory and save it in a macro
Hi~
I can use the command cd or pwd to show the working directory, but how can I save the returned result in a macro?
If I can do this, it will be much easier when I share my code with others: they will not need to update and reset the working directory manually before running my do-files.
Any advice?
Thanks~
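A minimal sketch: the current working directory is always available in the system value c(pwd), so it can be copied into a local (or global) macro directly.
Code:
local  workdir "`c(pwd)'"
global projdir "`c(pwd)'"

display "`workdir'"
display "$projdir"

* e.g. build paths relative to it
use "`workdir'/mydata.dta", clear   // hypothetical file name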
Adding % to stacked bar chart
Dear statalisters,
Is there a way to add %-signs (by code) to my stacked bar chart?
Here is the code for the data preparation and the graph:
foreach var of varlist (Q603_1_N - Q603_7_N) {
recode `var' 3=1 1=2 6=3 2=4 5=5 4=6
}
mvdecode Q603_1_N - Q603_7_N, mv (6=.a)
foreach var of varlist (Q603_1_N - Q603_7_N) {
label define `var' 1 "gut" 2 "eher gut" 3 "teils, teils" ///
4 "eher schlecht" 5 "schlecht", modify
}
* Vorbereitung für Grafik
clonevar ta1= Q603_1_N
clonevar ta2= Q603_2_N
clonevar ta3= Q603_3_N
clonevar ta4= Q603_4_N
clonevar ta5= Q603_5_N
clonevar ta6= Q603_6_N
clonevar ta7= Q603_7_N
local ta1: var lab ta1
dis "`ta1'"
foreach var of varlist _all {
local `var': var lab `var'
}
gen id = _n
reshape long ta, i(id)
rename ta score
*-------------------------------------------------------------------------------------------------------------------------
set scheme prognos
graph set window fontface "Franklin Gothic Book"
graph hbar (percent) if BundeslandAdressinformation== "Nordrhein-Westfalen" , over (score) over(_j, relabel ///
(1 "Werdende Eltern" 2 "Paarfamilien (Familien mit zwei Elternteilen)" 3 "Alleinerziehende (Familien mit einem Elternteil)" ///
4 "Familien mit Migrationshintergrund" 5 "Personen in Trennung/Scheidung" 6 "Sozial schwache Familien" ///
7 "Familien mit wenig Internetkenntnissen")) asyvars percentage stack ///
blabel(bar, pos(center) size(2.5) color(white) format(%2.0f)) ytitle("") ylabel( 0 "0%" 20 "20%" 40 "40%" 60 "60%" 80 "80%" 100 "100%") ///
title("") legend(pos(bottom) cols(5)) name(graph1, replace) xsize(7)
Thanks a lot! Best regards, Tim
Monday, October 24, 2022
Cleaning string variable
Dear all,
I have a variable called place_birth in my dataset. Some of the locations weren't recorded properly.
place_birth
Feucherolles (Saint-James = Le château royal de Sainte-Gemme)
ST B(?), canton de Chaillot
(?Chanvrand)Canton de La Guiche
Seine-Inférieure (Seine-Maritime)
Épinay-sur-Seine ,
Autine (?) Outines
Darrois ? Darvois
I would like to do two things.
First, I want to separate what is inside parentheses ( ), after a comma, or after an = sign from the rest of the text. With what I separate out, I can create a new variable called place_new.
Second, I want to clean both variables of stray characters like ?, =, trailing periods, slashes, etc.
For example
Épinay-sur-Seine ,
should look like
Épinay-sur-Seine
replace ? and (?) with a comma
Autine (?) Outines
it becomes
Autine , Outines
For this one:
Feucherolles (Saint-James = Le château royal de Sainte-Gemme)
Eliminate "Saint-James =" and just leave:
Feucherolles (Le château royal de Sainte-Gemme)
Then I can separate the strings by comma and parenthesis so that for example:
place_birth
(?Chanvrand)Canton de La Guiche
becomes:
place_new
Chanvrand
Or:
place_birth
Seine-Inférieure (Seine-Maritime)
Becomes in the new var:
place_new
Seine-Maritime
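A sketch of the kind of cleaning described, using Stata's Unicode regular-expression functions (ustrregexm(), ustrregexs(), ustrregexra()) so the accented characters are handled safely. It is not a complete solution for every pattern in the list, just an illustration of the building blocks.
Code:
* pull whatever sits inside parentheses into place_new
generate place_new = ustrregexs(1) if ustrregexm(place_birth, "\(([^)]*)\)")
* inside the captured text, drop a leading "?" and anything like "Saint-James = "
replace place_new = ustrregexra(place_new, "^\?\s*", "")
replace place_new = ustrregexra(place_new, "^[^=]*=\s*", "")

* clean the original: replace "(?)" and stray "?" with a comma,
* then strip trailing commas/periods and squeeze spaces
generate place_clean = ustrregexra(place_birth, "\(\?\)|\?", ",")
replace  place_clean = ustrregexra(place_clean, "[\s,\.]+$", "")
replace  place_clean = strtrim(stritrim(place_clean))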
Error merging master data with using data keeping the merge key as city and statefip
I need to convert my master data to the county level. In my master data I have statefip and the city name; in my using data I have city, statefip, county_fips, and the county name.
The issue arises because plenty of cities in different states map to multiple county codes/county names, as the following excerpt of duplicates from my using data shows (same city and state, but different county_name/county_fips):
Code:
list city state_id county_name, sepby(city state_id) noobs

  +---------------------------------------+
  |       city   state_id   county_name   |
  |---------------------------------------|
  |     Midway         FL       Gadsden   |
  |     Midway         FL    Santa Rosa   |
  |     Midway         FL      Seminole   |
  |---------------------------------------|
  |   Woodbury         NY        Nassau   |
  |   Woodbury         NY        Orange   |
  |---------------------------------------|
  |    Oakwood         OH      Cuyahoga   |
  |    Oakwood         OH    Montgomery   |
  |    Oakwood         OH      Paulding   |
  |---------------------------------------|
  |   Franklin         PA       Cambria   |
  |   Franklin         PA       Venango   |
  |---------------------------------------|
  | Georgetown         PA        Beaver   |
  | Georgetown         PA     Lancaster   |
  | Georgetown         PA       Luzerne   |
  +---------------------------------------+
This is what my master data looks like (dataex id city statefip if city=="Georgetown" & statefip=="PA"):
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str14 id str43 city str3 statefip
"11311793" "Georgetown" "PA"
"12054868" "Georgetown" "PA"
"13761470" "Georgetown" "PA"
"63145796" "Georgetown" "PA"
"63058645" "Georgetown" "PA"
"11415903" "Georgetown" "PA"
"62891646" "Georgetown" "PA"
"10933939" "Georgetown" "PA"
"17385483" "Georgetown" "PA"
"10925795" "Georgetown" "PA"
end
This is what my using data looks like (dataex city statefip county_fips county_name if city=="Georgetown" & statefip=="PA"):
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str35 city str2 statefip long county_fips str21 county_name
"Georgetown" "PA" 42007 "Beaver"
"Georgetown" "PA" 42071 "Lancaster"
"Georgetown" "PA" 42079 "Luzerne"
end
When I use merge m:1, the following error shows up:
Code:
merge m:1 city statefip using "using.dta", keep(master match) nogen
variables city statefip do not uniquely identify observations in the using data
r(459);

end of do-file
r(459);
Whenever I use merge 1:m, the following error message shows up:
Code:
merge 1:m city statefip using "city_countyfipshort.dta", keep(master match) nogen
variables city statefip do not uniquely identify observations in the master data
r(459);

end of do-file
r(459);
What direction should I take, given this particular problem?
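Offered as a sketch of one possible direction rather than the answer: because city and statefip do not uniquely identify a county, either keep every candidate county with joinby, or impose an explicit rule that keeps a single county per city-state in the using data before an m:1 merge. The file names below are placeholders.
Code:
* Option 1: pair every master observation with every candidate county
joinby city statefip using "city_countyfipshort.dta", unmatched(master)

* Option 2: keep one (arbitrary) county per city-state, then merge m:1
use "city_countyfipshort.dta", clear
bysort city statefip (county_fips): keep if _n == 1
tempfile onecounty
save `onecounty'

use "master.dta", clear              // placeholder for the master dataset
merge m:1 city statefip using `onecounty', keep(master match) nogen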