Hi everyone,
I'm wondering if there's a way to create a risk prediction variable from the results of two different Heckman correction models--one for Selection Bias and one for Self-Selection Bias. I think I've seen this done in the academic literature before, but the paper didn't explain its methodology in detail. For example, let's say I have:
// For Self-Selection Bias (Variables for Heckman: lnsales, lnemp, fcf_w, mktshr, mss, hhi, lnbseg, lngseg, unrate):
heckman status c.preld c.durld c.postld i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.bm_w, select(status = i.regul c.lnsales c.lnemp c.fcf_w c.mktshr c.mss c.hhi c.lnbseg c.lngseg c.unrate c.bm_w i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl) nocons twostep first mills(imr) //vce(bootstrap)
/* (After running the first heckman bit of code just above, type "drop imr" in the command line before you run the second heckman bit of code just below here.) */
// For Selection Bias (Variables for Heckman: FPS variable, fcf_w):
heckman status c.preld c.durld c.postld i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.bm_w c.fcf_w, select(status = i.fps c.bm_w i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.fcf_w) nocons twostep first mills(imr) //vce(bootstrap)
(Note that variables preld, durld and postld needed to be removed from the 1st stages of the models because of the economic nature of those variables.)
The top model involves 9 "Heckman variables," while the bottom one only involves 2. One such variable is common to both models--fcf_w.
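One possibility (a hedged sketch only, not necessarily what the paper you saw did): give each inverse Mills ratio its own name in mills() so both survive, then combine them afterwards. The standardize-and-average step at the end is purely illustrative.
Code:
// keep both inverse Mills ratios under different names instead of dropping imr between runs
heckman status c.preld c.durld c.postld i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.bm_w, select(status = i.regul c.lnsales c.lnemp c.fcf_w c.mktshr c.mss c.hhi c.lnbseg c.lngseg c.unrate c.bm_w i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl) nocons twostep first mills(imr_self)
heckman status c.preld c.durld c.postld i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.bm_w c.fcf_w, select(status = i.fps c.bm_w i.cu c.jdg i.presid c.retexl c.retvolatl c.retskewl c.turnovrl c.fcf_w) nocons twostep first mills(imr_sel)
egen z_self = std(imr_self)
egen z_sel = std(imr_sel)
gen risk_index = (z_self + z_sel)/2   // one illustrative way to collapse the two correction terms into a single predictor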
Any advice would be greatly appreciated, thank you!!
J.J.
Monday, October 31, 2022
Using mca or pca to generate a single variable
Dear colleagues, I know I can use principal component analysis (PCA) on categorical variables by first coding them as dummies. Multiple correspondence analysis (MCA) can also be used, and there you don't need to convert your categorical variables into dummies. Let me assume I will convert my variables into dummies and use PCA, and ask the question below.
I have a latent variable called Accessibility, inferred from five Likert questions (scale 1-7) and the other questions below:
1. How much do you spend on NHIF?
2. Mode of NHIF payment, coded 1 "Out of pocket" 2 "Salary deduction" 3 "Insurance"
I want to combine all these questions to generate a single variable called Accessibility, and I plan to generate a set of dummies from my categorical variables in order to use pca.
suppose my dummies were a b c d e f g h for categorical variables and I do:
pca a b c d e f g h How much do you spend on NHIF
predict comp
Can I use comp as my Accessibility variable? I just want to rename it to Accessibility and use it. I know the predicted comp will be the first principal component anyway.
The main question here is can I work with the first component as my Accessibility variable?
I plan to use the same idea for affordability and quality variables, which are inferred from several other questions. Please advise.
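A minimal sketch of the workflow described above, with hypothetical variable names (payment_mode for the categorical payment item, nhif_spend for spending, q1-q5 for the Likert items):
Code:
tabulate payment_mode, generate(pay_)   // creates dummy variables pay_1 pay_2 pay_3
pca pay_* nhif_spend q1-q5
predict comp1                            // score on the first principal component
rename comp1 Accessibility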
ipdmetan - estimation_command
I would like to use ipdmetan to do an IPD meta-analysis for studies with treatment response yes/no as the outcome. The sample code provided shows how to set up the estimation_command for hazard ratios, but not for dichotomous or 2x2 format outcomes. Any help would be much appreciated, and apologies if this is a stupid question! I'm a primary care researcher without access to a biostatistician who works with IPD meta-analysis. David Fisher
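For what it's worth, a hedged sketch of one way the estimation_command might look for a dichotomous outcome, with hypothetical variable names (trialid, response, treat); the pooled effect is then on the log odds ratio scale and can be exponentiated for an odds ratio:
Code:
* per-study logistic regressions, pooled by ipdmetan
ipdmetan, study(trialid): logit response treat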
Testing dominance between Lorenz curves
Hello Stata Experts,
I would like to request information on the Stata code to test for dominance between two or more Lorenz curves. I am using Stata 15.1. Earlier I used 'dominance', a user-written command, but now it is nowhere to be found.
Thank you
Deepali
Conducting multinomial regression on multiply imputed variables for subset data
Hi
Is there a way to conduct multinomial regression on a weighted subsample of the data? For example, I used mi estimate: mlogit DV IV1 IV2 IV2 [iweight=ngwt] for the full sample, N = 5,777. However, out of the full sample, I want to conduct multinomial regression only on households that have received at least one form of welfare assistance (e.g. TANF, SNAP, WIC, etc.). So I created a new binary variable called assist, coded 1 for households with any welfare assistance, giving a new sample of N = 676. Then I ran mi estimate: mlogit again. However, the number of observations was the same as in the full sample, 5,777. Why is that?
Secondly, if I want to change my condition to households receiving at least ONE welfare assistance rather than ANY - e.g. so that I can compare households that did NOT receive welfare assistance with households that received at least ONE - how would I code it? Someone suggested creating it by id, sorting, and using egen?
Can I still use mi estimate:mlogit function with weights?
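A hedged sketch of both steps, keeping the variable names from the post (hhid for the household identifier is a placeholder); the likely reason the estimation sample did not shrink is that no if restriction was applied, and mi estimate does accept if and weights:
Code:
* flag households with at least one assistance record (with mi data the new variable may need mi passive: or registering)
egen any_assist = max(assist), by(hhid)
* multinomial model on the assisted subsample only
mi estimate: mlogit DV IV1 IV2 [iweight=ngwt] if any_assist == 1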
Please advise.
Thank you
Lena
Date values are different
Hello, I have values for the dates given in LB21B.
I used the code egen LB21B_12=convertCMC(LB21B)
When I apply a %td format to both variables, neither shows the correct date, and the two also differ from each other. The correct years should be somewhere between 2000 and 2005, but here they sometimes go back to 1900 or 1959, or jump forward even to 3033.
How do I correct this?
I'm using Stata 14 MP.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long(LB21B LB21B_12) 180605 . 180605 . 140103 . 140103 . 140103 . 140103 . 140103 . 140103 . 140103 . 10100 285474 10100 285481 300903 . 300903 . 0 . 0 . 150802 . 150802 . 150802 . 150802 . 150802 . 0 . 0 . 0 . 0 . 280604 . 0 . 0 . 0 . 0 . 0 . 70600 2126902 70600 2126905 260604 . 260604 . 260604 . 260604 . 260604 . 261002 . 120604 . 120604 . 120604 . 131101 . 290203 . 290203 . 270802 . 270802 . 270802 . 0 . 0 . 170604 . 170604 . 10301 291601 10301 291602 10301 291595 90903 2744870 90903 2744865 90903 2744869 121001 . 250505 . 250505 . 250505 . 280103 . 280103 . 220302 . 240604 . 240604 . 240604 . 51102 1533454 51102 1533445 50103 1503054 10702 303815 10702 303803 10702 303806 10702 303806 10702 303810 10702 303805 10702 303805 10702 303806 301103 . 301103 . 50404 1512210 200700 . 51002 1530403 120304 . 120504 . 120504 . 130503 . 130304 . 130304 . 110603 . 110603 . 171203 . 171203 . 180902 . 180902 . 91102 2750919 91102 2750933 250603 . 250603 . 21203 623438 end format %td LB21B format %td LB21B_12
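A hedged reading of the example values: LB21B looks like a day-month-year integer (180605 = 18 June 2005, 10100 = 1 January 2000, 140103 = 14 January 2003) rather than a century-month code, in which case convertCMC() is the wrong tool. A minimal sketch under that assumption:
Code:
* assumes LB21B is coded DDMMYY (day, month, two-digit year); 0 means missing
gen day = floor(LB21B/10000)
gen month = floor(mod(LB21B,10000)/100)
gen yy = mod(LB21B,100)
gen LB21B_td = mdy(month, day, 2000 + yy) if LB21B > 0
format %td LB21B_td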
A conceptual question about when should I add fixed effect and cluster the fixed effect?
Suppose I add a time-fixed effect to a panel data regression that I want to estimate using OLS. My question is conceptually when should I also cluster by time (in addition to adding fixed effects)?
How can I get several variables' mean values at the same time?
I want to create a vector containing several variables' mean values. But the summarize command only returns the last variable's mean value in r(mean).
Although I can manage by repeating the summarize command, I just want to ask whether there is any command that returns a list of variables' mean values?
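Two possibilities, sketched with the auto-dataset variables as placeholders: the mean command puts all the means into a single matrix, and tabstat with the save option stores them in r(StatTotal).
Code:
sysuse auto, clear
mean price mpg weight
matrix M = e(b)          // 1 x k row vector of means
matrix list M
tabstat price mpg weight, statistics(mean) save
matrix list r(StatTotal)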
Interpreting coefficients (percentage points vs percentage)
I was reading a paper that ran a difference-in-differences regression and the coefficient value was -0.036. The dependent variable is vote shares and none of the variables were logged. The author wrote that the impact of X on Y is a decrease of 3.6% of vote shares.
Is this interpretation correct? I thought the coefficient is in percentage points, not actual percentages. This is an article published in a top social science journal, and I feel like I'm the one missing something here.
In addition, for simple OLS regressions, are coefficients usually percentages? Sometimes people say percentages and sometimes percentage points, and this has been giving me a headache. Can anyone help me clarify, please?
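A small arithmetic illustration, assuming vote share is measured on a 0-1 scale (which the size of the coefficient suggests); the percentage-point change is fixed, while the percent change depends on the baseline:
Code:
display 0.036*100        // a coefficient of -0.036 is a drop of 3.6 percentage points
display 0.036/0.50*100   // relative to a 0.50 baseline share, that is a 7.2 percent drop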
Sunday, October 30, 2022
Old bug
Dear All,
I hope someone who has followed Stata for many years can help me remember: there used to be a problem (I hope I recollect correctly) where a method was introduced and then later withdrawn by StataCorp. Here is what I remember; any or all of the below may be wrong:
- it was related to calculation of standard errors or confidence intervals;
- with survey data
- probably introduced in version 8 and recalled (disabled) in version 9, or in the 9 to 10 transition, but unlikely anything newer.
- could be survey proportion or survey ratio commands, or something similar.
- could be https://www.stata.com/statalist/arch.../msg00126.html , but I somehow remember this regarding the SVY subset of commands.
I believe this was mentioned earlier on Statalist, but googling now didn't turn up any mention of the problem.
If anyone can point to more details, please do.
Much appreciated, Sergiy Radyakin
What is a GROUP variable in difference in differences analysis?
Query: What is a GROUP variable in difference in differences analysis?
Study: Newspaper articles from 1985-2017 were categorized as either in favor, against, or neutral toward a specific topic. These are weighted and graphed.
Hypothesis: An event happened around 2007 that triggered a change in attitudes. I'd like to test this with a DID analysis.
Unit of Analysis: Newspapers for each year.
Outcome Variable: Favor -defined as newspaper articles with a favorable stance
Treatment: Treatment -defined as 0/1 "before 2008" and "2008 and on"
Time Variable: Year -defined as years 1985-2017
Group: ???
Syntax: didregress (Favor) (Treatment), group(?????) time(Year)
DID Model: See the image of the model
[attached image of the model not shown]
A Control Variable?: I don't know if this is needed for DID. I created a dummy variable of the mean of Favor and Against.
Example Data:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int Year byte(Favor Against Neutral Treatment Control)
1985  29 71  0 0 50
1986  16 44 40 0 30
1987  43 43 14 0 43
1988  51 27 22 0 39
1995  60 10 30 0 35
1996  33 57 10 0 45
1997  30 69  1 0 50
1998  50  0 50 0 25
1999  44 31 25 0 38
2000  42 40 18 0 41
2001  80 20  0 0 50
2003  80 20  0 0 50
2004   0 75 25 0 38
2005  38  2 60 0 20
2006 100  0  0 0 50
2007  15 55 30 0 35
2008  19 68 13 1 44
2009  44 38 18 1 41
2010  72 14 14 1 43
2016  88 12  0 1 50
2017  82  9  9 1 46
2018  79  7 14 1 43
end
What is a GROUP variable?
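For what it's worth, a hedged illustration of what group() expects: a variable identifying the units at which treatment is assigned, with some units treated and others not (for example a newspaper identifier in a newspaper-by-year panel; newspaper_id below is hypothetical). With a single aggregate series per year there is no untreated comparison group, so this is syntax illustration only, not a fix for the design.
Code:
* hypothetical: one row per newspaper per year; Treatment = 1 for treated newspapers from 2008 on
didregress (Favor) (Treatment), group(newspaper_id) time(Year)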
Testing for model specification error in fractional regression
Hello,
I employed the fractional regression model proposed by Papke and Wooldridge to estimate the dependent variable of my study (a proportion computed as remaining/whole). The dependent variable falls between 0 and 1 (including 1). I ran the regression using the fracreg command in Stata. However, I would like to run the Ramsey RESET test for model specification error, as Wooldridge reports the RESET test value in his paper. When I entered the commands "ovtest" and "linktest", Stata returned the following:
. estat ovtest
estat ovtest not valid
r(321);
. linktest
not possible after fracreg
r(131);
How do I test for model specification after the fracreg command? What other assumptions should be checked after estimating a fractional regression? For example, in OLS regression one would test whether assumptions such as homoscedasticity, absence of autocorrelation, and normality are violated. Similarly, what other tests should one run after fractional regression?
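A hedged sketch of a hand-rolled RESET-type check after fracreg (there is no built-in estat ovtest for it): add powers of the fitted linear index as extra regressors and test them jointly. Variable names are placeholders.
Code:
fracreg probit y x1 x2            // original model (placeholder names)
predict xbhat, xb                  // fitted linear index
gen xbhat2 = xbhat^2
gen xbhat3 = xbhat^3
fracreg probit y x1 x2 xbhat2 xbhat3
test xbhat2 xbhat3                 // joint significance hints at functional-form misspecification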
Appreciate your advice.
VAR and Impulse response function
Dear Statalists,
I am working on a project where I need to
1. estimate the reduced-form VAR to predict the residuals (the shocks/innovations) and the contemporaneous terms (details in the pictures);
2. estimate the long-run (t=15) cumulative return responses to unit shocks of the innovations in the structural VAR (theta_rm, theta_x, theta_r). The short-run restriction is that rm can affect x and r, x can affect r, while r cannot affect rm and x contemporaneously.
[Attached images not shown: (1) the reduced-form VAR, (2) the structural VAR, (3) the relationship between the reduced-form and structural innovations, allowing for the contemporaneous terms.]
After estimating all the parameters of the reduced-form VAR, I need to get the cumulative impulse response function of the structural VAR.
Here is where I have problems:
1. What I am doing is basically to subtract the contemporaneous terms from the variables of the reduced form and estimate the VAR all over again. Is this procedure correct? Or is there another way to obtain the CIRF at step 15 of the structural VAR?
2. In the irf file, with 3 variables, there are in total 3^2 impulse-response relationships. Yet I am asked to calculate the response to a unit shock of each innovation. Does this mean theta is actually a vector? E.g., is the response to a unit shock of vwretd the sum of the responses (vwretd sdvol_1 r_1)?
3. I noticed there is also -svar- in Stata. Is the mechanism of -svar- the same as my procedure of going from the reduced-form VAR to the structural VAR?
4. How can I retrieve the response values at each step stored in the irf file? I have checked the reference: https://ift.tt/LzyrZU3
With the reference, I tried:
But I get the error:
Sorry the questions are a bit messily presented. The main questions I would love to ask are the problem 1-4.
Code:
var vwretd sdvol RET, lags(1/5)
predict resid_rm, residuals equation(vwretd)
predict resid_x, residuals equation(sdvol)
predict resid_r, residuals equation(RET)
reg resid_x resid_rm
rename b_resid_rm b_1_0
reg resid_r resid_rm resid_x
rename b_resid_rm c_1_0
rename b_resid_x c_2_0
Code:
gen sdvol_1 = sdvol - b_1_0*resid_rm
gen r_1 = RET - c_1_0*resid_rm - c_2_0*resid_x
var vwretd sdvol_1 r_1, lags(1/5)
irf create myirf, set(myirfs, replace) step(15)
irf table cirf, impulse(vwretd) response(vwretd sdvol_1 r_1) individual step(15)
irf table cirf, impulse(sdvol_1) response(vwretd sdvol_1 r_1) individual step(15)
irf table cirf, impulse(r_1) response(vwretd sdvol_1 r_1) individual step(15)
Code:
irf describe myirf
mat list _dta(step)
* or
mat list r(step)
Code:
matrix _dta(step) not found
matrix r(step) not found
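On question 4, one hedged possibility: the file created by irf create (here myirfs.irf) is itself a Stata dataset with one row per impulse-response-step combination, so the values can be read directly instead of through matrices:
Code:
use myirfs.irf, clear
keep if irfname == "myirf" & impulse == "vwretd" & response == "r_1"
list step cirf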
Separating a string variable into separate variables
I have a string variable CODEX which has the underlying cause of death coded first, followed by any secondary causes of death. The cause-of-death codes are ICD-10 codes (International Classification of Diseases). All the causes of death are separated by spaces or commas.
I am attempting to separate this string variable CODEX into separate cause of death variables DEATH1, DEATH2 etc. Any suggestions for potential codes for this would be greatly appreciated as I am not too sure how to approach this issue.
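A minimal sketch using split, assuming the codes really are delimited only by spaces and/or commas:
Code:
* break CODEX at spaces and commas into DEATH1, DEATH2, ...
split CODEX, parse(" " ",") generate(DEATH)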
Saturday, October 29, 2022
Generating Variable That Depicts Increase
My dataset comes from a survey with 5 waves that has a general score (variable GScore). I am trying to do the following:
1) I am trying to generate a new variable that shows the General Score increase from each wave to the next (so that a decrease in scores will be a negative value). What code would work best for this? I am currently trying the following, unsuccessfully:
bysort id: gen GScore_increase = GScore - GScore[_n-1]
2) What code would calculate the overall mean increase (or potential decrease) in the GScore across all waves for all participants?
3) Is there an efficient code I could use to determine how many participants decreased or maintained the same GScore across all waves (i.e. how many never increased their Gscore across all the waves)?
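A hedged sketch covering all three points, assuming a wave variable (here called wave, a placeholder name) identifies the survey wave; the key fix to the code above is sorting on wave within id before differencing:
Code:
* 1) wave-to-wave change (negative when the score falls)
bysort id (wave): gen GScore_increase = GScore - GScore[_n-1]
* 2) overall mean change across all waves and participants
summarize GScore_increase
* 3) participants whose score never rose in any wave
bysort id: egen ever_increased = max(GScore_increase > 0 & !missing(GScore_increase))
bysort id (wave): gen byte first_obs = (_n == 1)
count if ever_increased == 0 & first_obs   // counts each participant once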
SEM modeling (path model)
Dear all,
I hope you are doing well.
I'm working on my paper and I would like to study the mediating role of M in the relationship between X and Y, as shown in the following:
[attached path diagram not shown]
Is it correct to study this using the sem command?
Is this called a structural equation model? If yes, is it an SEM path model?
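A minimal sketch of how such a path model is typically fit with sem, with X, M, and Y as placeholder names; estat teffects afterwards reports the direct, indirect (mediated), and total effects:
Code:
sem (M <- X) (Y <- M X)
estat teffects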
kind regards
sdid in case of unbalanced panel
I need to run a synthetic difference-in-differences regression, and for that I need a balanced panel. But, as you can see, my sample period is 2000-2021, so there are 22 years in total, and for every county at least one year of the desired variable, wanted, is missing. When I run the following command it tells me year and county are missing:
When I'm running the following command all the observations are getting dropped out - indicating not even a single county has variable wanted for 22 years.
I'm attaching a part of my data
After using the -tsfill, full- command I managed to make the disappearing counties show up in my data, so the panel became strongly balanced.
Then I replaced my wanted variable with 0 when wanted == . This is actually right since wanted is 0 when it doesn't show up in my data.
It still reports an unbalanced panel when I run the following Stata command for SDID (synthetic difference-in-differences).
Is there anything I can do to detect the error?
Code:
tsset county year
isid county year, sort
variables county and year should never be missing
r(459);
Code:
by county (year): keep if _N == 22
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(wanted county year)
 2 1011 2002
 2 1011 2003
 1 1011 2004
 1 1011 2019
 1 1027 2000
 2 1027 2002
 1 1027 2008
 1 1027 2009
 1 1027 2013
 1 1027 2018
 4 1001 2000
 3 1001 2001
 1 1001 2002
 1 1001 2003
 3 1001 2004
 5 1001 2005
 2 1001 2006
 3 1001 2007
 2 1001 2008
 2 1001 2009
 3 1001 2010
 2 1001 2011
 7 1001 2012
 3 1001 2013
 3 1001 2014
 3 1001 2015
 2 1001 2016
 7 1001 2017
11 1001 2018
 3 1001 2019
 3 1001 2020
end
Code:
tsset county year
tsfill, full
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(wanted county year policy) 4 1001 2000 0 3 1001 2001 0 1 1001 2002 0 1 1001 2003 0 3 1001 2004 0 5 1001 2005 0 2 1001 2006 0 3 1001 2007 0 2 1001 2008 0 2 1001 2009 0 3 1001 2010 0 2 1001 2011 0 7 1001 2012 0 3 1001 2013 0 3 1001 2014 1 3 1001 2015 1 2 1001 2016 1 7 1001 2017 1 11 1001 2018 1 3 1001 2019 1 3 1001 2020 1 0 1001 2021 0 0 1001 2022 0 0 1011 2000 0 0 1011 2001 0 2 1011 2002 0 2 1011 2003 0 1 1011 2004 0 0 1011 2005 0 0 1011 2006 0 0 1011 2007 0 0 1011 2008 0 0 1011 2009 0 0 1011 2010 0 0 1011 2011 0 0 1011 2012 0 0 1011 2013 0 0 1011 2014 0 0 1011 2015 0 0 1011 2016 0 0 1011 2017 0 0 1011 2018 0 1 1011 2019 0 0 1011 2020 0 0 1011 2021 0 0 1011 2022 0 1 1027 2000 0 0 1027 2001 0 2 1027 2002 0 0 1027 2003 0 0 1027 2004 0 0 1027 2005 0 0 1027 2006 0 0 1027 2007 0 1 1027 2008 0 1 1027 2009 0 0 1027 2010 0 0 1027 2011 0 0 1027 2012 0 1 1027 2013 0 3 1027 2014 0 0 1027 2015 0 1 1027 2016 0 0 1027 2017 0 1 1027 2018 1 2 1027 2019 1 0 1027 2020 1 0 1027 2021 1 0 1027 2022 1 end
Code:
sdid wanted county year policy, vce(bootstrap) seed(1213)
Panel is unbalanced.
r(451);
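A hedged set of checks to see where the balance breaks (nothing here is specific to sdid); counting rows and listing cells with missing values per county usually reveals the culprit:
Code:
* how many rows does each county have?
bysort county: gen n_years = _N
tab n_years
* any duplicate county-year rows?
duplicates report county year
* which cells still have a missing outcome or treatment value?
list county year if missing(wanted) | missing(policy)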
Calculate age from other occurrences and attribute it to specific observation
Hi,
I'm working with consecutive censuses. I can follow the same individuals through several decades. However, age is not always reported (missing from the census, unreadable, etc.), and in those cases a zero is shown instead of a missing value (don't be mad, I know... moreover, newborns also show an age of 0... no comment). But age has probably been reported in a previous or a subsequent census. How can I use that information to infer age when it is 0 (when applicable)?
Also, reported ages are not always consistent, so (t-1 + 10) and (t+1 - 10) may yield different results. From my experience, most of the age spread through time ranges between 8 and 12 years, so no matter which census year is used in the calculation the age should be in the ballpark. In the example below, how do I determine which census to use in the calculation?
Finally, individuals are part of dyads (last variable) and may be present in more than one dyad. Not sure it is relevant to the calculation, but agediff is a dyad characteristic that will need to be updated afterwards.
I'm adding a few questions that may help figure out all the possible cases:
- A newborn will be coded as 0. What if in the next census the individual is also of age 0 (instead of 9-10)? Should the calculation start from the last occurrence to the first?
- What if it's the last occurrence that is 0?
Thanks
EDIT: data is coming from the censuses in SQL tables. Since individuals have only one occurrence by census (compared to multiple kin relationships in dyadic format), maybe I should figure how to recode age in SQL so that age remains consistent through all dyads.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ego int census byte(ego_age agediff) long dyad
708884 1881 24 23 11415600
708884 1891  0  0 11415600
708884 1901 40 20 11415600
708884 1911 57 25 11415600
739865 1881  1 23 11415600
739865 1891  0  0 11415600
739865 1901 20 20 11415600
739865 1911 32 25 11415600
end
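A hedged sketch of one simple rule: recover a birth year from every census where the reported age looks usable, take the within-person median, and fill in the zero ages from that; deciding which zeros are true newborns needs extra logic (here, a 0 is only treated as missing when the person also appears in an earlier census):
Code:
* implied birth year from censuses with a usable age
gen birthyear = census - ego_age if ego_age > 0
bysort ego: egen birthyear_med = median(birthyear)
* fill ages recorded as 0, except where 0 plausibly means a newborn
bysort ego (census): gen byte seen_before = (_n > 1)
gen ego_age_fix = ego_age
replace ego_age_fix = census - birthyear_med if ego_age == 0 & seen_before & !missing(birthyear_med)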
Creating Graphs with Regression Output
I am attempting to determine whether hours worked and wages have changed over time differently for different generations. So I am running the following regression:
Notes: (m_hrsov_a = actual hours worked including overtime; m_paygro_cpi_c = gross pay adjusted by CPI and categorised).
I need to graph the regression output (output_age.rtf) over time, but I am not clear how. Any ideas?
Here is an example of the data:
Thanks in advance!
Code:
foreach var of varlist m_hrsov_a m_paygro_cpi_c {
    eststo: areg `var' m_age2 m_sex_bi m_child_c m_mar_c m_employed_bi m_cued_bi m_ethnic_bi m_occ_c m_region_c if m_age_c == 1, absorb(m_qtr) cluster(m_yr)
    esttab using output_age.rtf, replace nogap b(%9.3f) se(%9.3f) parentheses star(* 0.10 ** 0.05 *** 0.01) drop(*cons*)
}
esttab est1 est2 using output_age.rtf, replace nogap b(%9.3f) se(%9.3f) parentheses star(* 0.10 ** 0.05 *** 0.01) drop(*cons*)
eststo clear
Code:
Age (Generations) - |
    Categorical (4) |      Freq.     Percent        Cum.
--------------------+-----------------------------------
             Gen. Z |    696,858       12.29       12.29
        Millennials |  1,740,155       30.70       42.99
             Gen. X |  1,879,408       33.15       76.14
       Baby Boomers |  1,352,639       23.86      100.00
--------------------+-----------------------------------
              Total |  5,669,060      100.00
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(m_yr m_qtr) byte m_age double m_hrsov_a float m_paygro_cpi
2002 1 31  .     .
2002 1 69  .     .
2002 1 26  .   730
2002 1 59  .     .
2002 1 21  .     .
2002 1 39  .     .
2002 1 25  .     .
2002 1 46 61     .
2002 1 52  .     .
2002 1 26 30  1900
2002 1 38 20 25000
end
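One hedged possibility for the graphing step: rather than working from the .rtf file, plot the stored estimates with coefplot (a user-written command from SSC). If the aim is genuinely a time path of coefficients, the models would first need to be estimated separately by year (for example looping over m_yr) and each result stored before plotting.
Code:
* ssc install coefplot
coefplot est1 est2, drop(_cons) xline(0)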
Mediation Analysis - Stata sem and medsem commands with fixed effects models
Hello everyone,
I am trying to perform a mediation analysis, in which the variables of concern are as follows:
Dependent Variable (Y) - Percentage_changes_per_industry
Independent Variable (X) - tenure_median
Mediator Variable (M) - median_tenure_promo_top2sf_1st
Control Variables - median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education norte algarve centro alentejo acores madeira
The issue I am having is that I want to include fixed effects models, but do not know how to do so when considering a mediation analysis. The commands for each of the individual regressions would be as follows:
First, regression with the Mediator as the Dependent Variable:
Code:
xtset caem2 year
xtreg median_tenure_promo_top2sf_1st tenure_median median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education i.year norte algarve centro alentejo acores madeira, fe cluster(caem2)
Second, regression with the Dependent Variable as the Dependent Variable:
Code:
xtset caem2 year
xtreg Percentage_changes_per_industry median_tenure_promo_top2sf_1st tenure_median median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education i.year norte algarve centro alentejo acores madeira, fe cluster(caem2)
The sem/medsem commands I am using, to perform the actual mediation analysis (and obtain values for the Sobel tests as well), are as follows:
Code:
sem (median_tenure_promo_top2sf_1st <- tenure_median median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education norte algarve centro alentejo acores madeira) ///
    (Percentage_changes_per_industry <- median_tenure_promo_top2sf_1st tenure_median median_age_promo_top2sf_1st gender log_nemp_median numb_firms_div1000 vn_per_employee_median higher_education norte algarve centro alentejo acores madeira), ///
    nocapslatent vce(bootstrap, reps(100))
medsem, indep(tenure_median) med(median_tenure_promo_top2sf_1st) dep(Percentage_changes_per_industry) mcreps(1000) rit rid
The issue I have is that these sem/medsem commands are performing regular OLS regressions, and not fixed effects (on caem2) regressions, as I would like them to. Is there any way of implementing a fixed effects regressions in the sem models? It would help me solve the issue I am having with implementing these analyses on Stata.
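One hedged workaround (sem has no fixed-effects option as such): absorb the caem2 and year effects with hand-made dummies (LSDV) in both equations, which reproduces what xtreg, fe does for the slope coefficients at the cost of many extra parameters. Sketch only; the control list is abbreviated.
Code:
* create the dummy variables once
tabulate caem2, generate(caem2_)
tabulate year, generate(yr_)
* include them in both sem equations (other controls as in the commands above);
* omit one dummy from each set if collinearity is flagged
sem (median_tenure_promo_top2sf_1st <- tenure_median caem2_* yr_*) ///
    (Percentage_changes_per_industry <- median_tenure_promo_top2sf_1st tenure_median caem2_* yr_*), ///
    nocapslatent vce(cluster caem2)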
Thank you very much in advance!
Rui
Portfolio construction
Hello,
I have data of individual investors trading behavior. Below you see an abstract of my data. My data is sorted by investor and date. I have an investor identifier, I have dates when the investor sold or bought a stock. The variable "stock" indicates the amount of stocks purchased (negative value indicates a sale). Furthermore, I have a variable indicating the price of the stock and one variable indicating the volume. Additionally, I have a stock identifier (cusip) and the indicator variable "sell" which is equal to one if a sale take place.
Now I want to create investor-stock-portfolios. My analysis examines the portfolio of stocks that an investor could sell on each day that they do sell at least one position. Therefore, I already marked all dates where an investor sold a stock using the indicator variable "sell". Now, I want to create an observation for each position held by an investor on these sell dates.
For example, investor 6002: in line 13 he sells one stock holding. This investor currently holds 3 stocks (including the one he will now sell) and now I want to add 2 additional observations for the other two stocks that he holds for this sell date, that means for investor 6002 on this sell date I want three observations.
I would appreciate any help!
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long investor float(dates stocks price volume) str8 cusip float sell 2290 13366 100 54.375 5437.5 "06071610" 0 2290 13382 200 21 4200 "86707110" 0 2290 13408 200 25 5000 "17958410" 0 2290 13437 500 10 5000 "80850988" 0 3206 11501 1000 2.25 2250 "02365010" 0 3206 11967 152.486 32.7899 5000 "92190830" 0 3206 11967 452.489 11.04999 5000 "77957Q10" 0 3206 12404 -1000 1.0312 -1031.2 "02365010" 1 3206 13457 257.732 19.4 5000 "08188080" 0 6002 12058 100 18.625 1862.5 "81137170" 0 6002 12064 200 5.25 1050 "55261810" 0 6002 12075 1000 1.125 1125 "50221010" 0 6002 12088 -100 23.75 -2375 "81137170" 1 6002 12092 100 8.625 862.5 "81090510" 0 6002 12177 100 16.5 1650 "02312710" 0 6002 12221 200 3.25 650 "59373610" 0 6002 12257 -100 23.75 -2375 "02312710" 1 6002 12263 100 11.625 1162.5 "55917720" 0 6002 12435 100 6.875 687.5 "57679810" 0 6002 12498 -100 16.5 -1650 "55917720" 1 6002 12528 200 5.5 1100 "55261810" 0 6002 12667 226.655 11.02998 2500 "77064830" 0 6002 13137 -100 11.25 -1125 "81090510" 1 6002 13156 116.279 10.32001 1200 "77064887" 0 6002 13276 -200 1.625 -325 "59373610" 1 6002 13346 62.814 11.94001 750 "77064887" 0 6002 13356 200 4.875 975 "41025210" 0 6002 13356 53.433 36.70017 1961 "62838030" 0 6002 13402 100 15.75 1575 "64121010" 0 6002 13408 200 4.5625 912.5 "87990540" 0 6002 13426 200 2.75 550 "55403410" 0 6002 13439 50 19.75 987.5 "07251010" 0 6002 13460 100 1.75 175 "87990540" 0 7109 11340 100 27.25 2725 "20810810" 0 7109 11353 -100 27.25 -2725 "20810810" 1 7109 11396 100 22.25 2225 "00190710" 0 7109 11416 100 24 2400 "20810810" 0 7109 11548 100 27.5 2750 "25283630" 0 7109 11553 100 21.75 2175 "01849010" 0 7109 11728 100 19.625 1962.5 "59408710" 0 7109 11926 100 33 3300 "25084710" 0 7109 12074 100 27 2700 "87156510" 0 7109 12087 -100 22.5 -2250 "01849010" 1 7109 12087 -100 23.375 -2337.5 "20810810" 1 7109 12095 50 48.375 2418.75 "38131710" 0 7109 12134 50 65.5 3275 "10904310" 0 7109 12137 -100 12.125 -1212.5 "25283630" 1 7109 12163 -50 44.125 -2206.25 "38131710" 1 7109 12179 100 28.5 2850 "53567810" 0 7109 12228 100 20.875 2087.5 "09367610" 0 7109 12282 100 21.5 2150 "72387710" 0 7109 12290 100 40.875 4087.5 "50025510" 0 7109 12319 50 36.625 1831.25 "59408710" 0 7109 12362 50 30.875 1543.75 "25084710" 0 7109 12362 100 26.125 2612.5 "22237210" 0 7109 12380 100 20.75 2075 "03785710" 0 7109 12395 -150 32.25 -4837.5 "59408710" 1 7109 12403 -100 23.5 -2350 "00190710" 1 7109 12409 -100 46.625 -4662.5 "50025510" 1 7109 12418 -100 18.75 -1875 "72387710" 1 7109 12430 100 28.75 2875 "44984210" 0 7109 12445 100 32 3200 "00157530" 0 7109 12472 100 27.25 2725 "80818810" 0 7109 12472 -100 24.75 -2475 "09367610" 1 7109 12479 100 21 2100 "86693310" 0 7109 12514 -100 23.625 -2362.5 "22237210" 1 7109 12514 -100 34.375 -3437.5 "00157530" 1 7109 12514 -100 27.75 -2775 "80818810" 1 7109 12515 -100 22.375 -2237.5 "86693310" 1 7109 12516 -100 25.75 -2575 "03785710" 1 7109 12562 200 13.875 2775 "86881810" 0 7109 12575 -100 46.5 -4650 "53567810" 1 7109 12589 -50 72.5 -3625 "10904310" 1 7109 12607 -150 25.125 -3768.75 "25084710" 1 7109 12627 -200 14.625 -2925 "86881810" 1 7109 12661 -100 19 -1900 "87156510" 1 7109 12697 150 16.5 2475 "04890310" 0 7109 12697 50 68.375 3418.75 "10904310" 0 7109 12697 100 20.25 2025 "77136710" 0 7109 12738 50 51 2550 "53567810" 0 7109 12817 -50 51.5 -2575 "53567810" 1 7109 12907 200 18.5 3700 "86693310" 0 7109 13031 100 44 4400 "40621610" 0 7109 13234 100 40.375 4037.5 "53567810" 0 7109 13269 100 26.25 
2625 "03785710" 0 7109 13319 251.678 5.96 1500 "48625010" 0 7109 13355 200 13.75 2750 "23281510" 0 7109 13443 50 62.875 3143.75 "12692010" 0 7109 13450 100 37.25 3725 "88355610" 0 7109 13464 -100 13 -1300 "44984210" 1 7109 13467 250 8.625 2156.25 "92551410" 0 7348 12466 100 19.25 1925 "74143W10" 0 7348 12768 100 13.75 1375 "74143W10" 0 8774 11947 100 29.875 2987.5 "00282410" 0 8774 11948 211.685 11.81 2500 "31606120" 0 9003 11372 1000 4.75 4750 "35671310" 0 9003 11430 300 11.75 3525 "09954120" 0 9003 11612 -300 14 -4200 "09954120" 1 9003 12337 1000 3.125 3125 "35671310" 0 9003 12991 -2000 5.125 -10250 "35671310" 1 end format %td dates
Friday, October 28, 2022
Reference period in new DID estimators
Hello all! I am running a few checks using the new heterogeneity-robust diff-in-diff estimators, specifically csdid, did_imputation, and did_multiplegt. From each estimator, I retrieve the estimates for the leads and lags relative to the time of policy introduction and compare them to the lead/lag estimates from a standard event study model (using reghdfe). It seems that these estimators by default choose the reference period as the latest lead, as opposed to the lead at period -1 right before policy introduction. This may be innocuous with simulated data, but it creates problems when one tries to estimate the overall average ATT (all lag/lead estimates will be relative to the wrong period). I've seen this issue with simulated data (e.g. Asjad Naqvi's incredibly helpful code: https://asjadnaqvi.github.io/DiD/docs/code/06_combined/) and with real data across different projects.
I was wondering if anyone has figured out how to somehow force the reference lead to be period -1 or otherwise re-scaled the lead and lag estimates to be relative to this more meaningful reference.
Thanks for chiming in!
P.S. This could be a very simple fix and I am just completely missing it!
Matching data over time?
Hello,
I have a sample from the Current Population Survey (CPS). My data is monthly from 2015-present for the variables empsame, empstat, and labforce. EMPSAME indicates whether or not the respondent was employed by the same employer and the same job he/she reported working as his/her main job in the previous month's survey. What I am trying to do here is to define the "newly employed" as those who responded NO in EMPSAME because this means that they are working for a new employer compared to the previous month.
Once I have the set of "newly employed" in period t, I then want to split it into the ones who were employed in t-1, and those who were not employed in t-1.
I have an identifier variable that gives a unique id for each individual surveyed.
I do not know how I can use the data at hand to achieve the above goal. Any ideas on how to use the identifier to track individuals over each 2 consecutive months to be able to get the 2 sets?
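A hedged sketch using the panel structure, with placeholder names (id for the person identifier, mdate for a monthly date variable) and hypothetical codes for empsame/empstat that must be checked against the codebook:
Code:
xtset id mdate
* "newly employed": answered No to the same-employer question this month
gen byte newly_employed = (empsame == 2)              // 2 assumed to be the "No" code
* split by last month's status using the panel lag
gen byte was_employed_t1 = inlist(L.empstat, 10, 12)  // hypothetical "employed" codes
gen byte new_from_emp    = newly_employed & was_employed_t1 == 1
gen byte new_from_nonemp = newly_employed & was_employed_t1 == 0 & !missing(L.empstat)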
Creating new variable (subtracting) same individual
Hi,
I am using an individual-level dataset across 4 years to try to calculate mobility out of low wage between 2017 and 2020. I created these dummy variables (delta2020 and delta2017), coded 1 if low wage and 0 if not.
However, I need to subtract delta2020 with delta2017 for the same individual (Person_Ref1).
How can I create a variable like diff below?
input Person_Ref1 Person_Ref2 delta2020 delta2017 diff
"1000501" "2017" . 1 .
"1000501" "2018" . . .
"1000501" "2019" . . .
"1000501" "2020" 1 . 0
"1000502" "2017" . 0 .
"1000502" "2018" . . .
"1000502" "2019" . . .
"1000502" "2020" 0 . 0
"1000503" "2017" . 0 .
"1000503" "2018" . . .
"1000503" "2019" . . .
"1000503" "2020" 1 . 1
"1000701" "2017" . 1 .
"1000701" "2018" . . .
"1000701" "2019" . . .
"1000701" "2020" 1 . 0
"1000702" "2017" . 1 .
"1000702" "2018" . . .
"1000702" "2019" . . .
"1000702" "2020" 1 . 0
"1000703" "2017" . 0 .
"1000703" "2018" . . .
"1000703" "2019" . . .
"1000703" "2020" 0 . 0
"1002601" "2017" . 0 .
"1002601" "2018" . . .
"1002601" "2019" . . .
"1002601" "2020" 0 . 0
"1004203" "2017" . 1 .
"1004203" "2018" . . .
"1004203" "2019" . . .
"1004203" "2020" 1 . 1
Appreciate your help and thanks in advance
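A hedged sketch of one way to do it: spread each person's 2017 and 2020 values to all of that person's rows with egen, then difference (how to handle people missing one of the two years is left open):
Code:
egen d2017 = max(delta2017), by(Person_Ref1)
egen d2020 = max(delta2020), by(Person_Ref1)
gen diff = d2020 - d2017   // -1 = moved out of low wage, 1 = moved into low wage, 0 = no change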
vlookup generates same id
I read the thread on Complicated Vlookup-type problem in a large dataset.
I am facing a similar problem with my research.
I was able to generate grandmaid separately from momid; in other words, the grandmaid generated is not a duplicate of momid. However, when I generate grandpaid, it is the same as grandmaid.
Why does vlookup do that?
Thursday, October 27, 2022
Cox model with an unbalanced panel
Hello everyone,
I'm running a Cox model with an unbalanced panel from 2015-2021 to measure the factors that influence the survival of companies. I have my time variable and the failure variable, and my covariates vary over time; they are GDP, an agglomeration index, the growth rate of economic sectors, and other indexes. These variables differ each year and also vary by region. When the covariates are included in the analysis they turn out to be significant, but none of them meet the proportional hazards assumption. My question is: is there an error in my database?
The commands I use to run the model are
-snapspan idn time died, generate(time0) replace-
-stset time, id(idn) failure (died)-
-stcox- (with my covariates)
-estat phtest-
I have reviewed some works where tests of the proportional hazards assumption are not included, so I don't know whether it is necessary.
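If the issue is the proportional hazards violation itself rather than the data, one standard hedged remedy is to let the offending covariates' effects change with (log) time via stcox's tvc() option; the covariate names below are placeholders:
Code:
stcox gdp agglom_index sector_growth, tvc(gdp agglom_index sector_growth) texp(ln(_t))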
thanks in advance
Ana
When including interacted fixed effects, do I also have to include each fixed effects separately in the model?
When including interacted fixed effects, do I also have to include each fixed effects separately in the model?
For example, if I want to include firm x Industry fixed effects, do I also have to include firm fixed effects and industry fixed effects as well?
Thank you!
Difference in Difference analysis - how to create a group variable
I think my question is pretty straightforward. I'd like to know what my "Group" is for a difference in differences analysis.
Study: Newspaper articles from 1985-2017 were categorized as either in favor, against, or neutral toward a specific topic. These are weighted and graphed.
[attached graph not shown]
There is a clear divergence between the Favor and Against variables that happens roughly around 2007 or 2008. I would like to test this using a treatment variable defined as "before 2008" and "2008 and on," Treatment 0/1. The time is Year. I've added "the mean of the Favor and Against variables" as the control variable, Control.
This is a mock-up of my data:
[attached data table not shown]
The unit of analysis is newspapers for each year.
But what's up with the Group? I know it's supposed to be categorical ... but what am I supposed to put?
The syntax I have for the command is didregress (Favor) (Treatment), group(?????) time(Year)
Any thoughts on what my Group should be?
Thank you!
My Lorenz curves are coming out as straight lines.
Hello Stata Experts,
I am trying to create Lorenz curves for a variable with four categories. I have done this before, but this time my curves are coming out as straight lines. I am sharing an example dataset along with my commands.
I also tried using a standardized wealth index but it is not working either.
I will really appreciate any insight into this - where am I going wrong, or what does this mean? I can share my full PCA code and the related data, if needed.
Thanks in advance
Deepali
[attached graphs not shown]
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float raw_rank1 byte v190 float(sei z) double rank float rank2 1519 2 -.57867503 0 .438 .438 1521 2 -.57797277 0 .439 .439 890 1 -.8061847 0 .273 .273 2461 2 .0007149702 0 .655 .655 2061 2 -.28932393 0 .56 .56 2988 3 .4960965 0 .763 .763 1394 2 -.6336712 0 .407 .407 119 1 -1.0780907 0 .045 .045 1511 2 -.5812709 0 .435 .435 158 1 -1.0511917 0 .057 .057 3132 3 .6742755 0 .805 .805 1028 1 -.7624822 0 .311 .311 1493 2 -.5876991 0 .431 .431 1592 2 -.5417551 0 .451 .451 3225 3 .8141427 0 .822 .822 1406 2 -.6303535 0 .409 .409 1921 2 -.3819044 0 .526 .526 266 1 -1.0056158 0 .092 .092 2282 2 -.14398591 0 .611 .611 11 1 -1.223736 0 .001 .001 66 1 -1.1141272 0 .027 .027 3905 3 2.594721 0 .983 .983 1562 2 -.5555258 0 .446 .446 212 1 -1.0261097 0 .075 .075 277 1 -1.0024703 0 .098 .098 3750 3 1.9207685 0 .949 .949 114 1 -1.080746 0 .044 .044 414 1 -.9529806 0 .141 .141 1021 1 -.7646693 0 .308 .308 3956 3 2.976368 0 .992 .992 2139 2 -.2403019 0 .576 .576 872 1 -.8127694 0 .267 .267 1201 1 -.7066278 0 .357 .357 42 1 -1.1533086 0 .019 .019 1211 1 -.7038143 0 .359 .359 673 1 -.8647565 0 .215 .215 61 1 -1.119929 0 .024 .024 2684 3 .1753951 0 .7 .7 2035 2 -.3084227 0 .552 .552 1706 2 -.4945633 0 .477 .477 106 1 -1.0862021 0 .041 .041 445 1 -.9433951 0 .15 .15 3025 3 .53929883 0 .775 .775 3368 3 1.0676816 0 .868 .868 2483 2 .011977144 0 .658 .658 1341 2 -.6609402 0 .392 .392 123 1 -1.07341 0 .047 .047 3627 3 1.5838754 0 .929 .929 942 1 -.7906448 0 .286 .286 2722 3 .20741607 0 .71 .71 3233 3 .8326448 0 .823 .823 1628 2 -.52488875 0 .461 .461 2923 3 .4188752 0 .753 .753 3454 3 1.2099286 0 .89 .89 1757 2 -.4668162 0 .489 .489 1435 2 -.6137782 0 .416 .416 2478 2 .008811969 0 .657 .657 3957 3 2.99273 0 .993 .993 1430 2 -.6182839 0 .414 .414 964 1 -.7837582 0 .292 .292 1427 2 -.6187509 0 .412 .412 927 1 -.795358 0 .282 .282 1634 2 -.5228623 0 .463 .463 645 1 -.8762712 0 .206 .206 3474 3 1.252518 0 .894 .894 2442 2 -.012976546 0 .65 .65 735 1 -.8494613 0 .237 .237 1143 1 -.7286108 0 .345 .345 2302 2 -.1215786 0 .617 .617 1652 2 -.5176178 0 .467 .467 561 1 -.9010396 0 .187 .187 3317 3 .9688857 0 .854 .854 73 1 -1.1084094 0 .03 .03 3866 3 2.3716063 0 .973 .973 319 1 -.9914252 0 .11 .11 2440 2 -.013812775 0 .649 .649 26 1 -1.1747433 0 .012 .012 258 1 -1.0091702 0 .089 .089 1556 2 -.5576935 0 .445 .445 322 1 -.990563 0 .112 .112 2132 2 -.24586444 0 .574 .574 801 1 -.8318206 0 .254 .254 711 1 -.856173 0 .229 .229 2699 3 .19003813 0 .705 .705 3192 3 .7619695 0 .819 .819 1226 1 -.6974397 0 .365 .365 1607 2 -.53420997 0 .454 .454 3172 3 .72391 0 .812 .812 2609 2 .10445177 0 .685 .685 1983 2 -.3496844 0 .54 .54 3392 3 1.1001801 0 .874 .874 3078 3 .6023574 0 .789 .789 734 1 -.8496683 0 .236 .236 1976 2 -.3590761 0 .537 .537 2136 2 -.24302575 0 .575 .575 2604 2 .10296789 0 .682 .682 17 1 -1.1979523 0 .006 .006 1433 2 -.6150438 0 .415 .415 1774 2 -.454978 0 .496 .496 3412 3 1.1337353 0 .879 .879 end label values v190 wlth
Code:
pca x1-x74
predict comp1
rename comp1 v191
regr y v191 i.z
drop if e(sample)!=1
glcurve y, pvar(rank) glvar(Lorenz) sortvar(v191) replace by(z) split saving(v191_graph)
I also tried using a standardized wealth index but it is not working either.
Code:
egen sei = std(v191)
regr y sei i.z
drop if e(sample)!=1
glcurve y, pvar(rank) glvar(Lorenz) sortvar(sei) replace by(z) split saving(sei_graph)
I would really appreciate any insight into where I am going wrong or what this result means. I can share my full PCA code and the related data if needed.
Thanks in advance
Deepali
Adding and simplifying observations; from daily to weekly
Hello to all.
I'm fairly new to using Stata, and I'm having trouble figuring out how to combine observations. I have binary values for several variables, with several observations per day, and I need to present the weekly values (the sum of them all, per week). I understand it's a bit basic, but I would appreciate any help. Here's an example (a sketch of one way to do this follows it):
Date Week CO MU MN
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 1 0
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 0 1
2017-03-18 17/11 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 0 1
2017-03-19 17/12 0 1 0
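A minimal sketch of one way to do the weekly addition, assuming CO, MU, and MN are 0/1 indicators and that the existing Week variable is the grouping you want; collapse reduces the data to one row per week, while the egen alternative keeps the daily rows and just attaches weekly totals.
Code:
* one observation per week, summing the indicators
collapse (sum) CO MU MN, by(Week)
list, noobs

* alternatively, keep the daily observations and add the weekly totals
* bysort Week: egen CO_week = total(CO)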
Interpretation of "/logs" in Stata17 output using mestreg
Dear Stata users,
Can someone please explain in simple terms what the "/logs" output refers to in the following code? I've looked in the mestreg help, the streg help, this blog post https://www.stata.com/stata-news/new...-correlations/ and the "An Introduction to Survival Analysis Using Stata" textbook, and I can't find an explanation anywhere.
use https://www.stata-press.com/data/r17/catheter
stset
mestreg age female || patient:, distribution(lognormal) time
Mixed-effects lognormal AFT regression Number of obs = 76
Group variable: patient Number of groups = 38
Obs per group:
min = 2
avg = 2.0
max = 2
Integration method: mvaghermite Integration pts. = 7
Wald chi2(2) = 14.13
Log likelihood = -328.70499 Prob > chi2 = 0.0009
------------------------------------------------------------------------------
_t | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.0060556 .0106775 -0.57 0.571 -.0269832 .014872
female | 1.334213 .3570964 3.74 0.000 .634317 2.034109
_cons | 3.480046 .5453814 6.38 0.000 2.411118 4.548973
-------------+----------------------------------------------------------------
/logs | .0359558 .1297168 -.2182844 .290196
-------------+----------------------------------------------------------------
patient |
var(_cons)| .322698 .2641533 .0648667 1.605354
------------------------------------------------------------------------------
LR test vs. lognormal model: chibar2(01) = 1.62 Prob >= chibar2 = 0.1017
If I calculate the exponent of the /logs output, i.e. exp(0.0359558) ≈ 1.0366, is this the estimated residual variance between catheters?
Kind regards,
Ben
Wednesday, October 26, 2022
Plotting a bar graph with two y-axis
Dear colleagues, I am trying to create a bar graph with two y-axes from the data and code below, but I am getting the error "invalid span". What does this mean? Can I kindly get some assistance?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(Poulation Tube_conc Flow_rate) int(Low_EventRate High_EventRate No_cellsSeen) byte Sort_Duration float seen
1 1  1 1200 1500  5423 11 2
1 1  3 1000 1300  3460  5 2
1 1 10 2000 2300  9163  6 2
1 2  1  400  500  5941 13 2
1 2  3  170  300  3899 12 2
1 2 10  300  600  5394 11 2
1 3  1  250  300  4857 81 2
1 3  3  120  300  4696 67 2
1 3 10  250  400  8529 47 2
1 1  1 1200 1500  5968 11 1
1 1  3 1000 1300  3800  5 1
1 1 10 2000 2300 10478  6 1
1 2  1  400  500  6051 13 1
1 2  3  170  300  4157 12 1
1 2 10  300  600  6051 11 1
1 3  1  250  300  4839 81 1
1 3  3  120  300  4656 67 1
1 3 10  250  400  8591 47 1
end
label values Tube_conc mea
label def mea 1 "20k cells/ul", modify
label def mea 2 "5k cells/ul", modify
label def mea 3 "1k cells/ul", modify
label values seen s
label def s 1 "seen", modify
label def s 2 "sorted", modify
Listed 18 out of 18 observations
twoway bar No_cellsSeen, ///
over(Flow_rate,label(labsize(small)) gap(10)) ///
over(Tube_conc, label(labsize(small)) relabel(`r(relabel)')) ///
ytitle("Number seen/sorted", size(small)) ///
yaxis(2) ///
, span size(medium)) ///
blabel(bar,position(top)) ///
intensity(40) bgcolor(white) ///
asyvars ||
twoway bar
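The "invalid span" error likely comes from the stray ", span size(medium))" line, which is not attached to any option, and twoway bar takes y and x variables and has no over() option in any case. Below is a minimal sketch that runs on the posted example data using graph bar instead; the variable roles and the restriction to one series are assumptions about the intended graph, and graph bar does not support a second y-axis.
Code:
* restrict to one series (seen) just to keep one value per cell
graph bar (mean) No_cellsSeen if seen == 1, ///
    over(Flow_rate, label(labsize(small)) gap(10)) ///
    over(Tube_conc, label(labsize(small))) ///
    asyvars ///
    ytitle("Number seen/sorted", size(small)) ///
    blabel(bar, position(outside)) ///
    intensity(40)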
Marketing Communication Mix
eivreg in rolling window setting (estimates not stored properly)
Dear Statalist,
I might be shooting my shot with this post but I want to implement the eivreg command in a rolling-windows setting using an unbalanced panel (id x tid).
Currently, I am able to run the program, but the estimates are not stored correctly. I would like to save the regression coefficients in _b_* variables and the t-statistics in _t_* variables.
The specifications for the rolling windows are the last 36 months (minimum 12 months of observations) by id. r_rf, mkt, and smb are assumed to be random variables.
The matrix r(table) stores the estimates for each window.
From similar threads (with other regressors), I ended up with the code below.
If the estimates can be stored properly, I want to re-run the program changing the reliability (in this example, the reliability for both variables is 0.6). I plan to increment it by 0.1 in each step from 0 to 1. What other procedure would you recommend to speed up the process? (A looping sketch follows the code below.)
Another problem I am facing is that I cannot interrupt the code partway through. I need to wait until the end to make changes, which can take several minutes. Do you know how to change this behavior?
Code:
g _b_mkt = .
g _b_smb = .
g _t_mkt = .
g _t_smb = .

capture program drop eiv_rol
program define eiv_rol
    qui: eivreg r_rf mkt smb , r(mkt 0.6 smb 0.6)
    matrix mattest = r(table)
    replace _b_mkt = mattest[1, 1] if mi(_b_mkt)
    * note: the posted lines wrote to _b_hml and _t_hml, which are never
    * generated; they presumably should be _b_smb and _t_smb
    replace _b_smb = mattest[1, 2] if mi(_b_smb)
    replace _t_mkt = mattest[3, 1] if mi(_t_mkt)
    replace _t_smb = mattest[3, 2] if mi(_t_smb)
    exit
end

rangerun eiv_rol, by(id) interval(tid -36 -1)
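A sketch of one way to loop over reliability values, reusing the program structure above: the reliability and the variable-name suffix are passed through globals because rangerun does not take extra arguments. The variable names and the 0.1 to 0.9 grid are assumptions.
Code:
capture program drop eiv_rol2
program define eiv_rol2
    qui: eivreg r_rf mkt smb , r(mkt $rel smb $rel)
    matrix mattest = r(table)
    replace _b_mkt_$suf = mattest[1, 1] if mi(_b_mkt_$suf)
    replace _b_smb_$suf = mattest[1, 2] if mi(_b_smb_$suf)
    replace _t_mkt_$suf = mattest[3, 1] if mi(_t_mkt_$suf)
    replace _t_smb_$suf = mattest[3, 2] if mi(_t_smb_$suf)
    exit
end

forvalues i = 1/9 {
    global suf = `i'        // suffix 1..9 maps to reliability 0.1..0.9
    global rel = `i'/10
    generate _b_mkt_`i' = .
    generate _b_smb_`i' = .
    generate _t_mkt_`i' = .
    generate _t_smb_`i' = .
    rangerun eiv_rol2, by(id) interval(tid -36 -1)
}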
Thank you
Ricardo
Combining surveys with distinct analytical weights
Hi.
I have a dataset that combines 14 household surveys from 14 countries. Each survey was conducted in a different year, and each survey has a household weight variable that is specific to that country's context (the data structure is the same across the 14 countries). I merged them and tried to cross-tabulate the country and gender_area variables (four values: male_rural, female_rural, male_urban, female_urban) with weights (tab country gender [aw=hhweight], m). But I found that such a cross-tabulation creates weird values for some of the countries.
For example, if I add an if condition at the end of the tab (tab country gender [aw=hhweight] if abc==1, m), some countries' (KHM, NPL) row totals become greater than their original row totals without the condition, even though the condition gives a smaller subsample. If I don't add the weight (tab country gender, m), there is no such problem. If I just tab one country with the weight, there is no such problem either. So I wonder whether there is any way for me to compare all countries with weights. I am not that familiar with the survey-data commands in Stata (svyset, strata, etc.). Hopefully someone can help. (A small sketch follows.)
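A small sketch of why the totals move: with [aw=], tabulate rescales the weights so they sum to the number of observations in the current sample, so adding an if condition changes the scaling. Using iweights, which tabulate does not renormalize, keeps the weighted counts comparable across conditions; the variable names follow the post.
Code:
* weighted counts that are not renormalized by the sample size
tab country gender [iw=hhweight], m
tab country gender [iw=hhweight] if abc==1, m

* or check the sum of weights per country explicitly
bysort country: egen double wtot = total(hhweight)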
regress with panel data
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str7 ym byte hs_c float imports str3(importer exporter) float GDP_im long GDP_ex double Exrate_im float(Exrate_ex conflict) "2001M01" 1 22907.025 "CHN" "USA" 290989.9 3425559 95.2487673974315 132.03389 -488.7 "2001M02" 1 22907.025 "CHN" "USA" 301626.9 3432654 94.9859811841317 132.53093 -842.2 "2001M03" 1 22907.025 "CHN" "USA" 312263.8 3439749 96.5148945688052 134.90831 -1374.5 "2001M04" 1 22907.025 "CHN" "USA" 322900.7 3446845 97.7540203225569 136.17534 -8309.801 "2001M05" 1 22907.025 "CHN" "USA" 329373.1 3442245 97.5296537738364 135.50266 -3037.7 "2001M06" 1 22907.025 "CHN" "USA" 335845.4 3437645 97.9459442254285 136.04584 -738.7 "2001M07" 1 22907.025 "CHN" "USA" 342317.8 3433045 98.5136769658743 135.71281 -2247.3 "2001M08" 1 22907.025 "CHN" "USA" 355947.1 3436219 96.5277841656989 132.57167 -1014.2 "2001M09" 1 22907.025 "CHN" "USA" 369576.5 3439392 95.4664429813288 131.81607 -2174.2 "2001M10" 1 22907.025 "CHN" "USA" 383205.9 3442566 96.36440014041 132.57242 -1261.1 "2001M11" 1 22907.025 "CHN" "USA" 361366 3452070 96.7013085639837 133.92229 -358.1 "2001M12" 1 22907.025 "CHN" "USA" 339526.1 3461574 97.5790651204532 134.67474 -575 "2002M01" 1 31262.51 "CHN" "USA" 317686.3 3471078 98.3799449054757 137.08035 -952.3 "2002M02" 1 31262.51 "CHN" "USA" 329365.3 3478199 99.2481334791048 138.10823 -1001.9 "2002M03" 1 31262.51 "CHN" "USA" 341044.4 3485321 97.8834719959884 137.17897 -903.7 "2002M04" 1 31262.51 "CHN" "USA" 352723.4 3492442 97.1425181532612 136.76201 -410.4 "2002M05" 1 31262.51 "CHN" "USA" 361031.6 3497140 95.4092946886645 132.7594 -399.8 "2002M06" 1 31262.51 "CHN" "USA" 369339.8 3501838 93.8074447121544 129.10393 -278.5 "2002M07" 1 31262.51 "CHN" "USA" 377648.1 3506535 91.6067234174144 125.30114 -757.6 "2002M08" 1 31262.51 "CHN" "USA" 392592.4 3508063 92.1153966922954 126.6381 -559.8 "2002M09" 1 31262.51 "CHN" "USA" 407536.7 3509590 92.3667537296484 127.08677 -600.7 "2002M10" 1 31262.51 "CHN" "USA" 422481 3511117 93.1840883646182 127.79578 -575.8 "2002M11" 1 31262.51 "CHN" "USA" 401759.3 3517165 91.9806011990866 126.60763 -544.6 "2002M12" 1 31262.51 "CHN" "USA" 381037.5 3523212 91.529878028 126.00581 -692.2 "2003M01" 1 40474.96 "CHN" "USA" 360315.8 3529260 90.1206136188956 123.26498 -836.1 "2003M02" 1 40474.96 "CHN" "USA" 371244.5 3539770 90.127945381461 123.0451 -943.2 "2003M03" 1 40474.96 "CHN" "USA" 382173.1 3550279 89.9944890775613 122.89462 -1089.8 "2003M04" 1 40474.96 "CHN" "USA" 393101.8 3560789 90.1636884906585 123.52025 -984.7 "2003M05" 1 40474.96 "CHN" "USA" 404193.4 3580574 87.9425578376848 118.27224 -636.2 "2003M06" 1 40474.96 "CHN" "USA" 415285 3600359 87.6403887383518 117.3232 -552.5 "2003M07" 1 40474.96 "CHN" "USA" 426376.6 3620144 88.1265059788148 118.31814 -717.4 "2003M08" 1 40474.96 "CHN" "USA" 444405.2 3634070 88.7374547228043 121.06068 -949.9 "2003M09" 1 40474.96 "CHN" "USA" 462433.9 3647996 87.8689113231125 120.48258 -790.7 "2003M10" 1 40474.96 "CHN" "USA" 480462.5 3661922 86.4437653981293 119.68214 -1097 "2003M11" 1 40474.96 "CHN" "USA" 459426.1 3668915 87.1762972397664 118.16027 -862.4 "2003M12" 1 40474.96 "CHN" "USA" 438389.6 3675909 86.1886065218889 113.6926 -1137.1001 "2004M01" 1 34436.934 "CHN" "USA" 417353.2 3682902 85.3595266527873 108.61776 -915.8 "2004M02" 1 34436.934 "CHN" "USA" 434094.1 3692462 84.1682250127312 108.43234 -1021.6 "2004M03" 1 34436.934 "CHN" "USA" 450835.1 3702022 85.8602293325678 110.82971 -1250.6 "2004M04" 1 34436.934 "CHN" "USA" 467576 3711582 86.4399453470538 
111.8528 -1089.2 "2004M05" 1 34436.934 "CHN" "USA" 480282.2 3723227 87.9178017422674 114.70498 -806.5 "2004M06" 1 34436.934 "CHN" "USA" 492988.5 3734872 87.1046063014496 113.60893 -978 "2004M07" 1 34436.934 "CHN" "USA" 505694.7 3746517 86.8262024465545 113.53995 -753.5 "2004M08" 1 34436.934 "CHN" "USA" 525372.8 3759281 87.392378701273 113.966 -608.5 "2004M09" 1 34436.934 "CHN" "USA" 545050.9 3772044 87.4111484209643 112.92165 -624.5 "2004M10" 1 34436.934 "CHN" "USA" 564729.1 3784808 86.0960776241377 110.8545 -294.6 "2004M11" 1 34436.934 "CHN" "USA" 539410.4 3798823 83.5447414579257 105.9082 -292.2 "2004M12" 1 34436.934 "CHN" "USA" 514091.7 3812838 82.0541941942796 103.66364 -572.9 "2005M01" 1 56490.7 "CHN" "USA" 488773 3826852 82.1078962405486 103.37172 -304.8 "2005M02" 1 56490.7 "CHN" "USA" 506251.4 3833060 82.8945232416078 103.84805 -510.8 "2005M03" 1 56490.7 "CHN" "USA" 523729.8 3839267 82.0825859930692 102.95734 -700.6 "2005M04" 1 56490.7 "CHN" "USA" 541208.2 3845474 82.7821510742878 104.66167 -472.7 "2005M05" 1 56490.7 "CHN" "USA" 557561.4 3855544 83.1157669337385 106.34795 -990.3 "2005M06" 1 56490.7 "CHN" "USA" 573914.6 3865615 84.2998988532275 108.73842 -632.6 "2005M07" 1 56490.7 "CHN" "USA" 590267.8 3875685 85.8133001727106 111.08463 -750.3 "2005M08" 1 56490.7 "CHN" "USA" 616303.7 3883022 86.0680785320791 109.0041 -542.6 "2005M09" 1 56490.7 "CHN" "USA" 642339.7 3890359 86.1042618695447 108.34776 -788.6 "2005M10" 1 56490.7 "CHN" "USA" 668375.6 3897696 87.44376973554 109.18375 -434.4 "2005M11" 1 56490.7 "CHN" "USA" 640514.5 3915218 88.326428758111 111.26317 -713.4 "2005M12" 1 56490.7 "CHN" "USA" 612653.4 3932741 87.9763728373693 111.1355 -426.3 "2006M01" 1 81040.34 "CHN" "USA" 584792.3 3950263 86.5881366299368 110.3442 -678.8 "2006M02" 1 81040.34 "CHN" "USA" 609045 3953497 86.9561957937083 111.22 -609.2 "2006M03" 1 81040.34 "CHN" "USA" 633297.8 3956730 86.7243768381664 110.81589 -1333.8 "2006M04" 1 81040.34 "CHN" "USA" 657550.5 3959964 86.292626886158 109.12933 -1344.8 "2006M05" 1 81040.34 "CHN" "USA" 672993.5 3961950 84.4130107730991 104.7291 -1151 "2006M06" 1 81040.34 "CHN" "USA" 688436.4 3963936 85.6043521050252 105.7272 -1499.6 "2006M07" 1 81040.34 "CHN" "USA" 703879.4 3965922 85.5206781503312 105.19274 -807.5 "2006M08" 1 81040.34 "CHN" "USA" 738997.2 3977076 85.4012147233782 104.92684 -1113 "2006M09" 1 81040.34 "CHN" "USA" 774114.9 3988230 86.26758063039 105.98315 -1785.8 "2006M10" 1 81040.34 "CHN" "USA" 809232.6 3999383 87.2939849273982 107.83578 -1019.7 "2006M11" 1 81040.34 "CHN" "USA" 785074 4003297 86.9312059561653 106.8941 -1656.7 "2006M12" 1 81040.34 "CHN" "USA" 760915.3 4007211 87.0858381296629 105.88695 -1223.4 "2007M01" 1 121749.84 "CHN" "USA" 736756.6 4011125 88.1709058655663 108.1406 -2221.5 "2007M02" 1 121749.84 "CHN" "USA" 772398.6 4019661 88.4680309310452 107.47453 -1393.2 "2007M03" 1 121749.84 "CHN" "USA" 808040.6 4028198 88.2165896802754 106.04984 -2480.4 "2007M04" 1 121749.84 "CHN" "USA" 843682.5 4036734 87.5223551122833 103.90254 -3041 "2007M05" 1 121749.84 "CHN" "USA" 868947.3 4044838 88.4249762412445 103.01221 -2810.5 "2007M06" 1 121749.84 "CHN" "USA" 894212 4052943 89.7994176184484 102.72694 -2797.6 "2007M07" 1 121749.84 "CHN" "USA" 919476.8 4061047 90.1439913275446 100.60493 -2321.1 "2007M08" 1 121749.84 "CHN" "USA" 965902.5 4069300 91.1855516013926 100.73556 -2192.1 "2007M09" 1 121749.84 "CHN" "USA" 1012328 4077553 90.5984099873853 99.03867 -2978.4 "2007M10" 1 121749.84 "CHN" "USA" 1058754 4085805 89.5003273545357 96.85046 -4514.5996 "2007M11" 1 121749.84 
"CHN" "USA" 1028848 4080256 89.2296783629896 94.79352 -3270 "2007M12" 1 121749.84 "CHN" "USA" 998941.9 4074707 90.6907358463584 96.37177 -2581.6 "2008M01" 1 151607.58 "CHN" "USA" 969035.9 4069158 91.4561432930558 95.55553 -2046 "2008M02" 1 151607.58 "CHN" "USA" 1023150 4076921 93.5013522500989 95.10735 -5062 "2008M03" 1 151607.58 "CHN" "USA" 1077263 4084683 92.190051215958 92.49255 -4052.7 "2008M04" 1 151607.58 "CHN" "USA" 1131377 4092446 93.1322198043677 92.6712 -5042.4 "2008M05" 1 151607.58 "CHN" "USA" 1156253 4085258 93.9823709559253 93.72097 -2640.7 "2008M06" 1 151607.58 "CHN" "USA" 1181128 4078071 94.9891741527411 94.85728 -2735.2 "2008M07" 1 151607.58 "CHN" "USA" 1206004 4070883 94.8847806429896 94.3504 -2784.9 "2008M08" 1 151607.58 "CHN" "USA" 1236305 4041259 96.8387184905498 98.96593 -3223 "2008M09" 1 151607.58 "CHN" "USA" 1266607 4011636 99.4853398062497 102.48123 -1573.9 "2008M10" 1 151607.58 "CHN" "USA" 1296908 3982012 103.995483591401 110.8625 -3169.7 "2008M11" 1 151607.58 "CHN" "USA" 1225333 3966539 106.160585135563 114.07413 -1732.3 "2008M12" 1 151607.58 "CHN" "USA" 1153759 3951066 103.497649081851 110.66965 -2386.7 "2009M01" 1 130867.41 "CHN" "USA" 1082184 3935593 104.036770558405 110.2786 -4105.6 "2009M02" 1 130867.41 "CHN" "USA" 1130807 3933372 105.915488371398 113.40341 -4055.5 "2009M03" 1 130867.41 "CHN" "USA" 1179430 3931152 107.163796022779 114.57028 -7922.8 "2009M04" 1 130867.41 "CHN" "USA" 1228053 3928932 105.124889734425 112.8565 -5349.3 "2002M09" 3 3570.5576 "CHN" "USA" 407536.7 3509590 92.3667537296484 127.08677 -600.7 "2002M10" 3 3570.5576 "CHN" "USA" 422481 3511117 93.1840883646182 127.79578 -575.8 "2002M11" 3 3570.5576 "CHN" "USA" 401759.3 3517165 91.9806011990866 126.60763 -544.6 "2002M12" 3 3570.5576 "CHN" "USA" 381037.5 3523212 91.529878028 126.00581 -692.2 "2003M01" 3 8855.441 "CHN" "USA" 360315.8 3529260 90.1206136188956 123.26498 -836.1 "2003M02" 3 8855.441 "CHN" "USA" 371244.5 3539770 90.127945381461 123.0451 -943.2 "2003M03" 3 8855.441 "CHN" "USA" 382173.1 3550279 89.9944890775613 122.89462 -1089.8 "2003M04" 3 8855.441 "CHN" "USA" 393101.8 3560789 90.1636884906585 123.52025 -984.7 "2003M05" 3 8855.441 "CHN" "USA" 404193.4 3580574 87.9425578376848 118.27224 -636.2 "2003M06" 3 8855.441 "CHN" "USA" 415285 3600359 87.6403887383518 117.3232 -552.5 "2003M07" 3 8855.441 "CHN" "USA" 426376.6 3620144 88.1265059788148 118.31814 -717.4 "2003M08" 3 8855.441 "CHN" "USA" 444405.2 3634070 88.7374547228043 121.06068 -949.9 "2003M09" 3 8855.441 "CHN" "USA" 462433.9 3647996 87.8689113231125 120.48258 -790.7 "2003M10" 3 8855.441 "CHN" "USA" 480462.5 3661922 86.4437653981293 119.68214 -1097 "2003M11" 3 8855.441 "CHN" "USA" 459426.1 3668915 87.1762972397664 118.16027 -862.4 "2003M12" 3 8855.441 "CHN" "USA" 438389.6 3675909 86.1886065218889 113.6926 -1137.1001 "2004M01" 3 3054.515 "CHN" "USA" 417353.2 3682902 85.3595266527873 108.61776 -915.8 "2004M02" 3 3054.515 "CHN" "USA" 434094.1 3692462 84.1682250127312 108.43234 -1021.6 "2004M03" 3 3054.515 "CHN" "USA" 450835.1 3702022 85.8602293325678 110.82971 -1250.6 "2004M04" 3 3054.515 "CHN" "USA" 467576 3711582 86.4399453470538 111.8528 -1089.2 "2004M05" 3 3054.515 "CHN" "USA" 480282.2 3723227 87.9178017422674 114.70498 -806.5 "2004M06" 3 3054.515 "CHN" "USA" 492988.5 3734872 87.1046063014496 113.60893 -978 "2004M07" 3 3054.515 "CHN" "USA" 505694.7 3746517 86.8262024465545 113.53995 -753.5 "2004M08" 3 3054.515 "CHN" "USA" 525372.8 3759281 87.392378701273 113.966 -608.5 "2004M09" 3 3054.515 "CHN" "USA" 545050.9 3772044 
87.4111484209643 112.92165 -624.5 "2004M10" 3 3054.515 "CHN" "USA" 564729.1 3784808 86.0960776241377 110.8545 -294.6 "2004M11" 3 3054.515 "CHN" "USA" 539410.4 3798823 83.5447414579257 105.9082 -292.2 "2004M12" 3 3054.515 "CHN" "USA" 514091.7 3812838 82.0541941942796 103.66364 -572.9 "2005M01" 3 1885.4142 "CHN" "USA" 488773 3826852 82.1078962405486 103.37172 -304.8 "2005M02" 3 1885.4142 "CHN" "USA" 506251.4 3833060 82.8945232416078 103.84805 -510.8 "2005M03" 3 1885.4142 "CHN" "USA" 523729.8 3839267 82.0825859930692 102.95734 -700.6 "2005M04" 3 1885.4142 "CHN" "USA" 541208.2 3845474 82.7821510742878 104.66167 -472.7 "2005M05" 3 1885.4142 "CHN" "USA" 557561.4 3855544 83.1157669337385 106.34795 -990.3 "2005M06" 3 1885.4142 "CHN" "USA" 573914.6 3865615 84.2998988532275 108.73842 -632.6 "2005M07" 3 1885.4142 "CHN" "USA" 590267.8 3875685 85.8133001727106 111.08463 -750.3 "2005M08" 3 1885.4142 "CHN" "USA" 616303.7 3883022 86.0680785320791 109.0041 -542.6 "2005M09" 3 1885.4142 "CHN" "USA" 642339.7 3890359 86.1042618695447 108.34776 -788.6 "2005M10" 3 1885.4142 "CHN" "USA" 668375.6 3897696 87.44376973554 109.18375 -434.4 "2005M11" 3 1885.4142 "CHN" "USA" 640514.5 3915218 88.326428758111 111.26317 -713.4 "2005M12" 3 1885.4142 "CHN" "USA" 612653.4 3932741 87.9763728373693 111.1355 -426.3 "2006M01" 3 5560.878 "CHN" "USA" 584792.3 3950263 86.5881366299368 110.3442 -678.8 "2006M02" 3 5560.878 "CHN" "USA" 609045 3953497 86.9561957937083 111.22 -609.2 "2006M03" 3 5560.878 "CHN" "USA" 633297.8 3956730 86.7243768381664 110.81589 -1333.8 "2006M04" 3 5560.878 "CHN" "USA" 657550.5 3959964 86.292626886158 109.12933 -1344.8 "2006M05" 3 5560.878 "CHN" "USA" 672993.5 3961950 84.4130107730991 104.7291 -1151 "2006M06" 3 5560.878 "CHN" "USA" 688436.4 3963936 85.6043521050252 105.7272 -1499.6 "2006M07" 3 5560.878 "CHN" "USA" 703879.4 3965922 85.5206781503312 105.19274 -807.5 "2006M08" 3 5560.878 "CHN" "USA" 738997.2 3977076 85.4012147233782 104.92684 -1113 "2006M09" 3 5560.878 "CHN" "USA" 774114.9 3988230 86.26758063039 105.98315 -1785.8 "2006M10" 3 5560.878 "CHN" "USA" 809232.6 3999383 87.2939849273982 107.83578 -1019.7 "2006M11" 3 5560.878 "CHN" "USA" 785074 4003297 86.9312059561653 106.8941 -1656.7 "2006M12" 3 5560.878 "CHN" "USA" 760915.3 4007211 87.0858381296629 105.88695 -1223.4 "2007M01" 3 14253.283 "CHN" "USA" 736756.6 4011125 88.1709058655663 108.1406 -2221.5 "2007M02" 3 14253.283 "CHN" "USA" 772398.6 4019661 88.4680309310452 107.47453 -1393.2 "2007M03" 3 14253.283 "CHN" "USA" 808040.6 4028198 88.2165896802754 106.04984 -2480.4 "2007M04" 3 14253.283 "CHN" "USA" 843682.5 4036734 87.5223551122833 103.90254 -3041 "2007M05" 3 14253.283 "CHN" "USA" 868947.3 4044838 88.4249762412445 103.01221 -2810.5 "2007M06" 3 14253.283 "CHN" "USA" 894212 4052943 89.7994176184484 102.72694 -2797.6 "2007M07" 3 14253.283 "CHN" "USA" 919476.8 4061047 90.1439913275446 100.60493 -2321.1 "2007M08" 3 14253.283 "CHN" "USA" 965902.5 4069300 91.1855516013926 100.73556 -2192.1 "2007M09" 3 14253.283 "CHN" "USA" 1012328 4077553 90.5984099873853 99.03867 -2978.4 "2007M10" 3 14253.283 "CHN" "USA" 1058754 4085805 89.5003273545357 96.85046 -4514.5996 "2007M11" 3 14253.283 "CHN" "USA" 1028848 4080256 89.2296783629896 94.79352 -3270 "2007M12" 3 14253.283 "CHN" "USA" 998941.9 4074707 90.6907358463584 96.37177 -2581.6 "2008M01" 3 15328.8 "CHN" "USA" 969035.9 4069158 91.4561432930558 95.55553 -2046 "2008M02" 3 15328.8 "CHN" "USA" 1023150 4076921 93.5013522500989 95.10735 -5062 "2008M03" 3 15328.8 "CHN" "USA" 1077263 4084683 92.190051215958 92.49255 
-4052.7 "2008M04" 3 15328.8 "CHN" "USA" 1131377 4092446 93.1322198043677 92.6712 -5042.4 "2008M05" 3 15328.8 "CHN" "USA" 1156253 4085258 93.9823709559253 93.72097 -2640.7 "2008M06" 3 15328.8 "CHN" "USA" 1181128 4078071 94.9891741527411 94.85728 -2735.2 "2008M07" 3 15328.8 "CHN" "USA" 1206004 4070883 94.8847806429896 94.3504 -2784.9 "2008M08" 3 15328.8 "CHN" "USA" 1236305 4041259 96.8387184905498 98.96593 -3223 "2008M09" 3 15328.8 "CHN" "USA" 1266607 4011636 99.4853398062497 102.48123 -1573.9 "2008M10" 3 15328.8 "CHN" "USA" 1296908 3982012 103.995483591401 110.8625 -3169.7 "2008M11" 3 15328.8 "CHN" "USA" 1225333 3966539 106.160585135563 114.07413 -1732.3 "2008M12" 3 15328.8 "CHN" "USA" 1153759 3951066 103.497649081851 110.66965 -2386.7 "2009M01" 3 5691.213 "CHN" "USA" 1082184 3935593 104.036770558405 110.2786 -4105.6 "2009M02" 3 5691.213 "CHN" "USA" 1130807 3933372 105.915488371398 113.40341 -4055.5 "2009M03" 3 5691.213 "CHN" "USA" 1179430 3931152 107.163796022779 114.57028 -7922.8 "2009M04" 3 5691.213 "CHN" "USA" 1228053 3928932 105.124889734425 112.8565 -5349.3 "2009M05" 3 5691.213 "CHN" "USA" 1257157 3933672 102.206898494182 107.628 -4022.9 "2009M06" 3 5691.213 "CHN" "USA" 1286261 3938412 100.704582631571 104.74212 -6657.6 "2009M07" 3 5691.213 "CHN" "USA" 1315365 3943152 100.204171397691 102.92252 -5497.3 "2009M08" 3 5691.213 "CHN" "USA" 1369192 3957189 99.4111044645282 101.29693 -5129.1 "2009M09" 3 5691.213 "CHN" "USA" 1423020 3971226 98.2524437102784 99.80137 -6877.1 "2009M10" 3 5691.213 "CHN" "USA" 1476847 3985263 96.7217188088701 98.4444 -6251.1 "2009M11" 3 5691.213 "CHN" "USA" 1411784 3991906 96.2560607379722 97.63283 -7399 "2009M12" 3 5691.213 "CHN" "USA" 1346720 3998549 97.2920921338491 98.14178 -6352.9 "2010M01" 3 37093.566 "CHN" "USA" 1281657 4005192 97.262547019236 98.15366 -18084.2 "2010M02" 3 37093.566 "CHN" "USA" 1339734 4018111 98.9521698050686 100.18134 -17574.3 "2010M03" 3 37093.566 "CHN" "USA" 1397811 4031031 98.4549443924715 99.84082 -4646.8 "2010M04" 3 37093.566 "CHN" "USA" 1455888 4043950 98.7186784280133 100.0141 -3887.3 "2010M05" 3 37093.566 "CHN" "USA" 1492501 4054372 101.432033026936 103.97416 -3251.9 "2010M06" 3 37093.566 "CHN" "USA" 1529114 4064794 102.688692447672 105.27413 -2925.3 "2010M07" 3 37093.566 "CHN" "USA" 1565727 4075216 101.455376872199 102.78316 -4395.4004 "2010M08" 3 37093.566 "CHN" "USA" 1641209 4082274 100.251113649551 101.02351 -8410.199 "2010M09" 3 37093.566 "CHN" "USA" 1716691 4089332 100.241808068078 99.60337 -8672.6 "2010M10" 3 37093.566 "CHN" "USA" 1792174 4096391 98.7447521503819 95.1022 -9068 "2010M11" 3 37093.566 "CHN" "USA" 1724092 4093101 100.476290307228 96.04137 -10536.6 "2010M12" 3 37093.566 "CHN" "USA" 1656011 4089811 101.321593833164 98.00818 -9005.7 end
As I'm new to this kind of analysis, I would very much appreciate your help. Best, Kendal
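Since the post shows the data but not a model, here is only a minimal sketch of setting the excerpt up as a panel and running a fixed-effects regression; the choice of dependent variable, regressors, and panel identifier (importer by exporter by product) are assumptions, not a recommendation.
Code:
* convert the string month "2001M01" to a Stata monthly date
generate mdate = ym(real(substr(ym, 1, 4)), real(substr(ym, 6, 2)))
format mdate %tm

* one panel per importer-exporter-product combination (an assumption)
egen long panelid = group(importer exporter hs_c)
xtset panelid mdate

xtreg imports GDP_im GDP_ex Exrate_im Exrate_ex conflict, fe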
Does asroprobit estimate parameters that are not defined in the code?
Hello,
When I followed the tutorial to learn the asroprobit command in the Stata software, I typed the following code:
use http://www.stata-press.com/data/r15/wlsrank
asroprobit rank high low if noties, case(id) alternatives(jobchar) casevars(female score) reverse
Some of the output is confusing: what are the estimates in the last five rows of the resulting table? I did not define "/lnl2_2, /lnl3_3, /l2_1, /l3_1, /l3_2" in the code.
Besides, when I include case-specific variables in a rank-ordered logistic regression model, the estimation omits the case-specific variables. Can a rank-ordered logistic regression model analyze case-specific variables?
Tuesday, October 25, 2022
Tabulating estimates of endogenous variables in dynamic forecast models
Hi,
I am running some dynamic forecast models, each of which contains 7 endogenous variables. Since there are missing values for some of the endogenous variables, some values are model-run during the forecasting.
The forecasting has been completed, but I would like to take a look at the model-run estimates of 3 endogenous variables - the source of one of them is "estimates", and the sources of the other two are "identity".
Which command(s) should I use?
Thanks in advance!
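One detail that may help, offered as a sketch rather than a definitive answer: forecast solve stores the model-run (simulated) values of every endogenous variable, whether its source is estimates or an identity, in new variables named with the prefix() or suffix() you specify, so those variables can simply be listed or graphed. The names below are hypothetical.
Code:
* if the model is solved with a prefix, e.g.:
forecast solve, prefix(f_)
* ...then the model-run values of every endogenous variable, including those
* defined by identities, sit in the f_* variables and can be inspected directly:
tsline f_y1 f_y2 f_y3        // hypothetical endogenous variable names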
[LASSO] Collinear covariates: Suggested addition to the documentation
I would like to suggest an addition to the documentation for collinear covariates in LASSO models. The Summary section currently reads as follows:
Summary
Consider factor variable group that takes on the values 1, 2, and 3. If you type
. lasso linear y i.group ...
lasso will know that separate covariates for group 1, 2, and 3 are to be included among the variables to be potentially included in the model.
If you create your own indicator variables, you need to create and specify indicators for all the values of the factor variable:
. generate g1 = (group==1)
. generate g2 = (group==2)
. generate g3 = (group==3)
. lasso linear y g1 g2 g3 ...
It is important that you do not omit one of them, say, g1, and instead type
. lasso linear y g2 g3 ...
While tinkering around, I discovered that one must not use ib#.group in place of i.group. Doing so causes the specified base level to be omitted and will therefore give different results. I think a warning about this should be added to the documentation; e.g., something like this could be added to the Summary section:
Note as well that you must not use the ib# prefix, because that will cause the selected base level to be omitted. For example, using ib1.group is equivalent to including g2 and g3 but not g1.
I'm sure the folks who write the documentation can improve on the wording, but I hope this gets the idea across.
For anyone who is interested, the code for my "tinkering" is pasted below.
Cheers,
Bruce
Code:
// File: LASSO_collinear_covariates.do
// Date: 25-Oct-2022
// Name: Bruce Weaver, bweaver@lakeheadu.ca
// Suggestion: Caution users of LASSO that factor variables will not
//   be handled as described in the documentation if one uses ib#.variable.
//   Only the i.variable form of factor variable notation is handled properly.
// The relevant documentation can be seen here:
//   https://www.stata.com/manuals/lassocollinearcovariates.pdf#lassoCollinearcovariates

// Use auto.dta to create an example like the one described.
clear *
sysuse auto
// Create 5 indicator variables for rep78
forvalues i = 1(1)5 {
    generate byte rep`i' = rep78 == `i' if !missing(rep78)
}
summarize rep1-rep5

// NOTE that you must reset the seed before estimating each model.
* [1] Use factor variable notation for rep78
set seed 1234
quietly lasso linear mpg i.rep78 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)

* [2] Use the 5 indicator variables for rep78
set seed 1234
quietly lasso linear mpg rep1 rep2 rep3 rep4 rep5 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)

// Q. What happens if one uses ib#.rep78 rather than i.rep78?
forvalues i = 1(1)5 {
    set seed 1234
    display "Base level for rep78 = " `i'
    quietly lasso linear mpg ib`i'.rep78 ///
        foreign headroom weight turn gear_ratio price trunk length displacement
    * Show which variables have been retained
    lassocoef, display(coef)
}
// A. Stata omits the base level when I do that.
// Let's check a couple of them to verify.

* Factor variable notation with ib3.rep78
set seed 1234
quietly lasso linear mpg ib3.rep78 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)
* Indicator variables with rep3 omitted
set seed 1234
quietly lasso linear mpg rep1 rep2 rep4 rep5 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)

* Factor variable notation with ib5.rep78
set seed 1234
quietly lasso linear mpg ib5.rep78 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)
* Indicator variables with rep5 omitted
set seed 1234
quietly lasso linear mpg rep1 rep2 rep3 rep4 ///
    foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)
// Confirmed.
How to weight data using CPS weights in STATA?
Hello,
I have CPS monthly data 2015-present. I want to calculate unemployment rate. I am using empstat and labforce variables from the CPS. I am also using wtfinl as the weights for the data.
My question is how to calculate the weighted unemployment rate using this data? I was able to calculate the unweighted rate using
gen unemployed = (empstat==21)+(empstat==22)
gen lf = labforce==2
gen unemp_rate = (unemployed==1)/(lf==1)
When I multiplied unemp_rate by the wtfinl variable, I got a very large unemployment rate. I do not know what I am doing wrong. How do I weight the unemployment rate?
Also, a very basic question: my data are at the individual level. How can I aggregate the weighted unemployment rate into a table of (year - month - weighted unemployment rate)? That is, I do not know how to report the weighted unemployment rate for each month of my period (2015-present). (A sketch of one approach follows.)
Thanks
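A sketch of one way to get a weighted monthly rate using the IPUMS-CPS names mentioned in the post (empstat, labforce, wtfinl), assuming year and month variables exist: the weighted rate is the weighted mean of the unemployment indicator among those in the labor force.
Code:
generate byte unemployed = inlist(empstat, 21, 22)
generate byte inlf       = (labforce == 2)

* weighted unemployment rate by year-month, using wtfinl as the weight
preserve
collapse (mean) unemp_rate = unemployed if inlf [pw=wtfinl], by(year month)
list year month unemp_rate, sepby(year) noobs
restore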
Stata dataset to latex table
Help with reshaping the data
Dear community,
I am trying to bring some data into a different format. The data is the World Uncertainty Index, which is publicly available for research purposes (https://worlduncertaintyindex.com/). The data is provided in a format that doesn't fit my overall dataset with other variables, so I've been trying to reshape it, but with no luck.
The current data format is shown below, where the three-letter column headers are ISO country codes:
Period | AFG | AGO | ALB | ARE |
2009q1 | var1 value | var1 value | var1 value | var1 value |
2009q2 | var1 value | var1 value | var1 value | var1 value |
2009q3 | var1 value | var1 value | var1 value | var1 value |
2009q4 | var1 value | var1 value | var1 value | var1 value |
What I am trying to get is as follows:
Country | Period | Var1 |
AFG | 2009q1 | value |
AFG | 2009q2 | value |
AFG | 2009q3 | value |
AFG | 2009q4 | value |
ARE | 2009q1 | value |
ARE | 2009q2 | value |
ARE | 2009q3 | value |
Thank you in advance for any advice!
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str6 year byte(AFG AGO) double ALB byte ARE double ARG byte ARM double AUS "1952q1" . . . . 1 . 0 "1952q2" . . . . 0 . 2 "1952q3" . . . . 0 . 0 "1952q4" . . . . 0 . 0 "1953q1" . . . . 0 . 1 "1953q2" . . . . 0 . 2 "1953q3" . . . . 0 . 0 "1953q4" . . . . 1 . 0 "1954q1" . . . . 1.5 . 0 "1954q2" . . . . 2 . 2 "1954q3" . . . . 0 . 2 "1954q4" . . . . 0 . 2 "1955q1" . . . . 2 . 1 "1955q2" . . . . 0 . 0 "1955q3" . . . . 1 . 3 "1955q4" . . . . 1 . 5 "1956q1" 0 . 0 . 0 . 0 "1956q2" 0 . 0 . 0 . 2 "1956q3" 0 . 0 . 1 . 2 "1956q4" 0 . 1 . 2 . 2 "1957q1" 0 . .5 . 0 . 0 "1957q2" 0 . 0 . 0 . .66666667 "1957q3" 0 . 0 . 0 . 1.3333333 "1957q4" 0 . 0 . 0 . 2 "1958q1" 1 . 0 . 0 . 1 "1958q2" 0 . 0 . 0 . 1 "1958q3" 0 . 0 . 1 . 3 "1958q4" 0 . 0 . 1 . 1 "1959q1" 0 . 0 . 0 . 0 "1959q2" 0 . 0 . 0 . 0 "1959q3" 0 . 0 . 0 . 0 "1959q4" 0 . 0 . 2 . 0 "1960q1" 0 . 0 . 0 . 1 "1960q2" 0 . 0 . 2 . 3 "1960q3" 0 . 0 . 2 . 0 "1960q4" 0 . 0 . 0 . 0 "1961q1" 0 . 0 . 0 . 0 "1961q2" 0 . 0 . 1 . 1 "1961q3" 0 . 0 . 0 . 2 "1961q4" 0 . 0 . .5 . 1 "1962q1" 0 . 0 . 1 . 5 "1962q2" 0 . 0 . 0 . 4 "1962q3" 0 . 0 . 1 . 1 "1962q4" 0 . 0 . 1 . 0 "1963q1" 0 . 0 . 4 . 3 "1963q2" 0 . 0 . 3 . 0 "1963q3" 0 . 0 . 2 . 0 "1963q4" 1 . 0 . 0 . 0 "1964q1" 0 . 0 . 1 . 0 "1964q2" 0 . 0 . 0 . 0 "1964q3" 0 . 0 . 3 . 0 "1964q4" 1 . 0 . 3 . 2 "1965q1" 1 . 0 . 1 . 0 "1965q2" 0 . 0 . 1 . 2 "1965q3" 0 . 0 . 0 . 0 "1965q4" 0 . 0 . 0 . 0 "1966q1" 0 . 0 . 1 . 0 "1966q2" 2 . 0 . 2 . 0 "1966q3" 0 . 0 . 3 . 0 "1966q4" 0 . 0 . 1 . 0 "1967q1" 0 . 0 . 1 . 0 "1967q2" 0 . 0 . 0 . 1 "1967q3" 0 . 0 . 0 . 1 "1967q4" 0 . 0 . 0 . 0 "1968q1" 0 . 0 . 0 . 1 "1968q2" 0 . 0 . 1 . 1 "1968q3" 0 . 0 . 1 . 1 "1968q4" 0 . 0 . 0 . 1 "1969q1" 0 . 0 . 2 . 1 "1969q2" 0 . 0 . 0 . 3 "1969q3" 0 . 0 . 0 . 3 "1969q4" 0 . 0 . 0 . 0 "1970q1" 0 . 0 . 0 . 0 "1970q2" 1 . 0 . 0 . 1 "1970q3" 0 . 0 . 1 . 0 "1970q4" 0 . 0 . 5 . 2 "1971q1" 0 . 0 0 3 . 4 "1971q2" 0 . 0 0 2 . 0 "1971q3" 1 . 0 0 0 . 3 "1971q4" 0 . 0 0 1 . 3 "1972q1" 0 . 0 1 3 . 3 "1972q2" 0 . 0 0 3 . 0 "1972q3" 0 . 0 0 4 . 3 "1972q4" 0 . 0 0 1 . 0 "1973q1" 0 . 1 0 3 . 2 "1973q2" 0 . 0 0 2 . 8 "1973q3" 0 . 0 1 1 . 2 "1973q4" 0 . 0 0 3 . 1 "1974q1" 3 0 0 1 1 . 0 "1974q2" 0 0 0 0 1 . 0 "1974q3" 0 0 0 0 3 . 2 "1974q4" 0 1 0 0 3 . 3 "1975q1" 0 0 0 0 1 . 0 "1975q2" 0 0 0 0 2 . 2 "1975q3" 0 0 0 0 0 . 2 "1975q4" 0 0 0 0 1 . 1 "1976q1" 0 0 0 0 1 . 0 "1976q2" 0 0 0 0 2 . 2 "1976q3" 0 1 1 0 0 . 0 "1976q4" 0 2 0 0 1 . 2 end
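A sketch based on the posted excerpt (the quarter is a string variable named year and each country is its own column): give the country columns a common stub, reshape long, and convert the quarter string to a Stata quarterly date. In the full dataset the varlist AFG-AUS would need to cover all country columns.
Code:
rename year period

* prefix every country column with a common stub so -reshape long- can use it
foreach v of varlist AFG-AUS {
    rename `v' wui`v'
}

reshape long wui, i(period) j(country) string

generate qdate = quarterly(period, "YQ")
format qdate %tq
order country period qdate wui
sort country qdate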
Show the working directory and save it in a macro
Hi~
I can use the command cd or pwd to show the working directory, but how can I save the returned result in a macro?
If I can do this, it will be much easier when I share my code with others: they will not need to update and reset the working directory manually before running my do-files.
Any advice?
Thanks~
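A minimal sketch: the current working directory is always available in the system value c(pwd), so it can be copied into a local (or global) macro directly.
Code:
local  workdir "`c(pwd)'"
global projdir "`c(pwd)'"

display "`workdir'"
display "$projdir"

* e.g. build paths relative to it
use "`workdir'/mydata.dta", clear   // hypothetical file name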
Adding % to stacked bar chart
Dear statalisters,
Is there a way to add %-signs (by code) to my stacked bar chart?
Here is the code for the data preparation and the graph:
foreach var of varlist (Q603_1_N - Q603_7_N) {
recode `var' 3=1 1=2 6=3 2=4 5=5 4=6
}
mvdecode Q603_1_N - Q603_7_N, mv (6=.a)
foreach var of varlist (Q603_1_N - Q603_7_N) {
label define `var' 1 "gut" 2 "eher gut" 3 "teils, teils" ///
4 "eher schlecht" 5 "schlecht", modify
}
* Vorbereitung für Grafik
clonevar ta1= Q603_1_N
clonevar ta2= Q603_2_N
clonevar ta3= Q603_3_N
clonevar ta4= Q603_4_N
clonevar ta5= Q603_5_N
clonevar ta6= Q603_6_N
clonevar ta7= Q603_7_N
local ta1: var lab ta1
dis "`ta1'"
foreach var of varlist _all {
local `var': var lab `var'
}
gen id = _n
reshape long ta, i(id)
rename ta score
*-------------------------------------------------------------------------------------------------------------------------
set scheme prognos
graph set window fontface "Franklin Gothic Book"
graph hbar (percent) if BundeslandAdressinformation== "Nordrhein-Westfalen" , over (score) over(_j, relabel ///
(1 "Werdende Eltern" 2 "Paarfamilien (Familien mit zwei Elternteilen)" 3 "Alleinerziehende (Familien mit einem Elternteil)" ///
4 "Familien mit Migrationshintergrund" 5 "Personen in Trennung/Scheidung" 6 "Sozial schwache Familien" ///
7 "Familien mit wenig Internetkenntnissen")) asyvars percentage stack ///
blabel(bar, pos(center) size(2.5) color(white) format(%2.0f)) ytitle("") ylabel( 0 "0%" 20 "20%" 40 "40%" 60 "60%" 80 "80%" 100 "100%") ///
title("") legend(pos(bottom) cols(5)) name(graph1, replace) xsize(7)
Thanks a lot! Best regards, Tim
Monday, October 24, 2022
Cleaning string variable
Dear all,
I have a variable called place_birth in my dataset. Some of the locations weren't recorded properly.
place_birth
Feucherolles (Saint-James = Le château royal de Sainte-Gemme)
ST B(?), canton de Chaillot
(?Chanvrand)Canton de La Guiche
Seine-Inférieure (Seine-Maritime)
Épinay-sur-Seine ,
Autine (?) Outines
Darrois ? Darvois
I would like to do two things.
First, I want to separate what is inside parentheses ( ), after a comma, or after an = sign from the rest of the text. With what I separate out, I can create a new variable called place_new.
Second, I want to clean both variables of stray characters like ?, =, trailing periods, slashes, etc.
For example
Épinay-sur-Seine ,
should look like
Épinay-sur-Seine
replace ? and (?) with a comma
Autine (?) Outines
it becomes
Autine , Outines
For this one:
Feucherolles (Saint-James = Le château royal de Sainte-Gemme)
Eliminate "Saint-James =" and just leave:
Feucherolles (Le château royal de Sainte-Gemme)
Then I can separate the strings by comma and parenthesis so that for example:
place_birth
(?Chanvrand)Canton de La Guiche
becomes:
place_new
Chanvrand
Or:
place_birth
Seine-Inférieure (Seine-Maritime)
Becomes in the new var:
place_new
Seine-Maritime
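A sketch of the kind of cleaning described, using Stata's Unicode regular-expression functions (ustrregexm(), ustrregexs(), ustrregexra()) so the accented characters are handled safely. It is not a complete solution for every pattern in the list, just an illustration of the building blocks.
Code:
* pull whatever sits inside parentheses into place_new
generate place_new = ustrregexs(1) if ustrregexm(place_birth, "\(([^)]*)\)")
* inside the captured text, drop a leading "?" and anything like "Saint-James = "
replace place_new = ustrregexra(place_new, "^\?\s*", "")
replace place_new = ustrregexra(place_new, "^[^=]*=\s*", "")

* clean the original: replace "(?)" and stray "?" with a comma,
* then strip trailing commas/periods and squeeze spaces
generate place_clean = ustrregexra(place_birth, "\(\?\)|\?", ",")
replace  place_clean = ustrregexra(place_clean, "[\s,\.]+$", "")
replace  place_clean = strtrim(stritrim(place_clean))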
Error merging master data with using data keeping the merge key as city and statefip
I need to convert my master data to the county level. In my master data I have statefip and the city name; in my using data I have city, statefip, county_fips, and the county name.
The issue arises because plenty of cities in different states map to multiple county codes/county names, as the following excerpt of duplicates from my using data shows (same city and state, but different county_name/county_fips):
Code:
list city state_id county_name, sepby(city state_id) noobs

  +---------------------------------------+
  |       city   state_id   county_name   |
  |---------------------------------------|
  |     Midway         FL       Gadsden   |
  |     Midway         FL    Santa Rosa   |
  |     Midway         FL      Seminole   |
  |---------------------------------------|
  |   Woodbury         NY        Nassau   |
  |   Woodbury         NY        Orange   |
  |---------------------------------------|
  |    Oakwood         OH      Cuyahoga   |
  |    Oakwood         OH    Montgomery   |
  |    Oakwood         OH      Paulding   |
  |---------------------------------------|
  |   Franklin         PA       Cambria   |
  |   Franklin         PA       Venango   |
  |---------------------------------------|
  | Georgetown         PA        Beaver   |
  | Georgetown         PA     Lancaster   |
  | Georgetown         PA       Luzerne   |
  +---------------------------------------+
This is what my master data looks like (dataex id city statefip if city=="Georgetown" & statefip=="PA"):
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str14 id str43 city str3 statefip
"11311793" "Georgetown" "PA"
"12054868" "Georgetown" "PA"
"13761470" "Georgetown" "PA"
"63145796" "Georgetown" "PA"
"63058645" "Georgetown" "PA"
"11415903" "Georgetown" "PA"
"62891646" "Georgetown" "PA"
"10933939" "Georgetown" "PA"
"17385483" "Georgetown" "PA"
"10925795" "Georgetown" "PA"
end
This is what my using data looks like (dataex city statefip county_fips county_name if city=="Georgetown" & statefip=="PA"):
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str35 city str2 statefip long county_fips str21 county_name
"Georgetown" "PA" 42007 "Beaver"
"Georgetown" "PA" 42071 "Lancaster"
"Georgetown" "PA" 42079 "Luzerne"
end
When I use merge m:1, the following error shows up:
Code:
merge m:1 city statefip using "using.dta", keep(master match) nogen
variables city statefip do not uniquely identify observations in the using data
r(459);

end of do-file
r(459);
Whenever I use merge 1:m, the following error message shows up:
Code:
merge 1:m city statefip using "city_countyfipshort.dta", keep(master match) nogen
variables city statefip do not uniquely identify observations in the master data
r(459);

end of do-file
r(459);
What direction should I take, given this particular problem?
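Offered as a sketch of one possible direction rather than the answer: because city and statefip do not uniquely identify a county, either keep every candidate county with joinby, or impose an explicit rule that keeps a single county per city-state in the using data before an m:1 merge. The file names below are placeholders.
Code:
* Option 1: pair every master observation with every candidate county
joinby city statefip using "city_countyfipshort.dta", unmatched(master)

* Option 2: keep one (arbitrary) county per city-state, then merge m:1
use "city_countyfipshort.dta", clear
bysort city statefip (county_fips): keep if _n == 1
tempfile onecounty
save `onecounty'

use "master.dta", clear              // placeholder for the master dataset
merge m:1 city statefip using `onecounty', keep(master match) nogen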