Borrower_id Year Loan amount Loan maturity Loan interest
1 2011 101 60 8.5
1 2011 95 55 5.7
2 2011 85 55 8.6
3 2011 90 44 6.5
3 2012 82 46 7.5
4 2011 65 60 6.2
4 2012 67 90 5.4
5 2011 85 45 6.5
5 2011 62 78 4.5
6 2012 55 23 6.5
I would like to calculate the weighted average of the loan amount, loan maturity, and loan interest, where the weights are each loan's amount relative to the borrower's total loan amount within each borrower_id.
For instance, for borrower 1 the weights are (101/196, 95/196), respectively equal to about 0.51 and 0.48, and using these percentages the weighted loan amount average = (101*0.51 + 95*0.48) = 97.11.
Using the same weights created from borrower 1's loans (0.51, 0.48), I would like to apply them to the loan maturity and loan interest as well.
thank you so much in advance for your help.
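A minimal sketch of one way to do this in Stata, assuming the variables are named borrower_id, loan_amount, loan_maturity, and loan_interest (hypothetical names based on the table above):
Code:
* weight = each loan's amount relative to the borrower's total loan amount
bysort borrower_id: egen total_amount = total(loan_amount)
gen w = loan_amount / total_amount

* weighted average of each variable within borrower (weights sum to 1 within borrower)
foreach v in loan_amount loan_maturity loan_interest {
    bysort borrower_id: egen wavg_`v' = total(w * `v')
}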
Tuesday, January 31, 2023
Calculating differences between adjacent elements
Dear all,
I wonder if there is any easy solution available in Mata for calculating the difference between adjacent elements of a (column) vector? For example, the difference of vector Y should return (Y[2]-Y[1], Y[3]-Y[2], ..., Y[n]-Y[n-1]).
Thank you,
Eilya.
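A minimal sketch in Mata; the example vector is made up for illustration:
Code:
mata:
Y = (2 \ 5 \ 9 \ 14 \ 20)          // example column vector
n = rows(Y)
D = Y[2::n] - Y[1::n-1]            // adjacent differences: Y[2]-Y[1], ..., Y[n]-Y[n-1]
D
end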
save regression results for margins plotting
My question is straightforward. Could Stata save regression results so that I do not have to re-run the model when trying to use margins and marginsplot commands?
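Yes: estimates store keeps results in memory for the current session (and estimates save writes them to disk), so you can bring them back with estimates restore before calling margins. A minimal sketch using the auto data; the model is just for illustration:
Code:
sysuse auto, clear
regress price mpg weight
estimates store m1                 // keep the fitted model
* ... run other models or commands ...
estimates restore m1               // make m1 the active results again
margins, at(mpg=(15(5)40))
marginsplot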
Estimate the difference-in-differences estimator
Hi,
I restricted my wave sample from 1-10 to waves 9-10, and I was wondering how I would use the bysort command to create a covid variable that shows the difference in the level of wellbeing before (wave 9) and after (wave 10).
I would appreciate any clarifications as I do not have much knowledge of the bysort command.
Kind regards,
Iani
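A minimal sketch, assuming a person identifier named pid and a wellbeing variable named wellbeing (both hypothetical names):
Code:
keep if inlist(wave, 9, 10)
gen byte covid = (wave == 10)        // 0 = before (wave 9), 1 = after (wave 10)
bysort pid (wave): gen diff_wb = wellbeing - wellbeing[_n-1] if _n == 2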
Combining variables in the variable list by country
I am looking at a dataset where I have three variable lists: life expectancy by country for specific age groups. The age interval for now is four years, and I would like to combine the age groups (e.g., combine the first ones covering ages 0-24) by country, so that the life expectancy is combined as well. Could you help me with what the easiest way would be? Thank you!
Interpretation of Competing Risk with stcrreg
I'm new to using competing-risks analysis and want to make sure I'm interpreting and using it correctly (given that I'm getting good results, I don't want to misrepresent the data).
I am trying to use a competing-risks (stcrreg) analysis for an intervention in a clinical study. I am interested in the first 48 hours after randomization. I’ve coded my desired outcome as event=0 and have two competing-risk events (event=2, precluding any further possibility of the desired outcome; event=1, impeding the desired outcome). So, if one group has the intervention (=1) and the other group does not get the intervention (=0), can I interpret the subhazard ratio as the probability of experiencing the desired outcome during the time period for the events coded?
Code:
stset time, failure(event==0)
stcrreg intervention, compete(event==1 2)
I come up with SHR = 2.997 (95% CI 1.46 – 6.15)
Is it correct to say that the probability of the desired outcome is 3 times more likely during the first 48 hours with the intervention?
Monday, January 30, 2023
Multiplying observations in a variable list
In a dataset with multiple variable lists, I would like to multiply the values of only selected observations in one variable list by 4. Namely, I want to keep the observations unchanged if the observation concerns the age_group of <1 year or 95 plus (the age_group variable list is not numeric). For all other cases, I am trying to multiply the observation values by 4. I initially used this command: gen new_var = old_var * 4 if age_group != "<1 year", but I cannot incorporate the 95 plus age group in this command. Also, in the new variable list, observations for the <1 age group are left missing, while I would just like to keep the previous value. Thanks!
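A minimal sketch, assuming the categories are spelled exactly "<1 year" and "95 plus" in age_group (check with tabulate age_group):
Code:
gen new_var = cond(inlist(age_group, "<1 year", "95 plus"), old_var, old_var*4)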
Zero event meta analysis with ipdmetan
Hey everyone, I'm trying to run a two-stage meta-analysis to generate a forest plot with individual participant data using ipdmetan to plot the odds of an event across several trials. Unfortunately, for the majority of trials, in one of the subgroups of interest, there were no events. Is there any way ipdmetan can adjust for zero-event meta-analysis to include these trials in the analysis? I can only see an option "keepall" to include the excluded studies in the forest plot. Reading the literature for zero event meta-analysis, some propose to put in 0.5 events in the subgroup of interest. Is there any way ipdmetan can handle this? Thank you!
Traj command (dropout model)
Hello everyone,
I am using the traj command to examine gait speed trajectories and I have a question about the dropout model option. When I used this option, some of my models had positive BICs and some had negative BICs. However, when I removed that option, all BICs were positive. What is the benefit of using this option? Can I omit this option?
Here’s my code and BIC values:
2 groups (with dropout model)
traj, var(walkspeed_*) indep(time_*) model(cnorm) order(0 0) min(0.07533158) max(1.9544182) dropout(0 0)
BIC= -873.03 (N=9525) BIC= -869.39 (N=2836) AIC= -851.54 ll= -845.54
2 groups (without dropout model)
traj, var(walkspeed_*) indep(time_*) model(cnorm) order(0 0) min(0.07533158) max(1.9544182)
BIC= 2004.89 (N=9525) BIC= 2007.31 (N=2836) AIC= 2019.21 ll= 2023.21
Thanks in advance.
Making a dummy for threshold
Hi,
I am trying to make a dummy I_i (= 1 if Em(i) <= median Em) that selects firms whose employment (Em(i)) is at or below a certain threshold, the median of employment over the sample.
firms ID: NPC_FIC
Employment: Em
Code:
* Example generated by -dataex-. For more info, type help dataex clear input double NPC_FIC float Em 500008310 1 500027353 3 500049799 2 500050215 2 500135017 7 500139105 2 500154496 28 500206942 3 500251226 2 500265754 1 500321035 3 500369375 2 500392001 1 500392920 1 500422489 1 500458395 1 500979142 145 500979162 16 500979181 8 500979184 12 500979206 34 500979220 32 500979239 7 500979255 32 500979269 68 500979271 10 500979282 4 500979284 7 500979293 63 500979319 3 500979326 8 500979329 27 500979360 4 500979373 33 500979374 10 500979383 78 500979450 14 500979456 51 500979474 89 500979489 14 500979491 22 500979522 13 500979524 5 500979526 16 500979530 22 500979531 5 500979532 59 500979537 31 500979543 18 500979551 11 500979559 2 500979561 6 500979563 18 500979564 27 500979566 34 500979567 26 500979569 30 500979574 10 500979596 53 500979625 8 500979634 17 500979645 2 500979673 3 500979679 10 500979689 2 500979712 3 500979737 12 500979740 46 500979749 1 500979750 66 500979758 21 500979809 22 500979810 5 500979812 1 500979820 55 500979838 22 500979871 1 500979874 8 500979890 9 500979935 6 500979937 1 500979940 6 500979959 100 500979962 6 500979964 310 500979981 227 500979987 565 500979989 2 500980043 17 500980048 204 500980053 76 500980077 4 500980081 39 500980087 1 500980105 3 500980109 5 500980110 15 500980146 4 500980167 9 500980175 23 end
Listed 100 out of 24599 observations
First, I made the median:
Code:
egen median_Em = median(Em)
Any suggestion is appreciated.
Cheers,
Paris
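Building on the median just created, a minimal sketch of the dummy (the variable name I_below is hypothetical):
Code:
gen byte I_below = (Em <= median_Em) if !missing(Em)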
Selecting Date Range
I have data where one of the variables is CloseDate, whose variable type is int and display format is %tdnn/dd/CCYY. I would simply like to select a date range to include when totaling the sum of a variable. In other words, sum the amounts between, for example, 1/1/2022 and 12/31/2022. As a new Stata user, I would appreciate help doing what I think should be an easy task. Thanks for your help.
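A minimal sketch, assuming the variable holding the amounts is named amount (hypothetical):
Code:
total amount if inrange(CloseDate, td(01jan2022), td(31dec2022))

* or with summarize, which leaves the sum in r(sum)
summarize amount if inrange(CloseDate, td(01jan2022), td(31dec2022))
display r(sum)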
How are p-values calculated in an Oaxaca-Blinder Decomposition
Hi all,
I am trying to interpret the results of an Oaxaca-Blinder Decomposition. I am using the popular Oaxaca command. For my project I am just interested in the results of a two-way decomposition (i.e. I don't want the interaction terms), so I am using the pooled option. These results are based off of a logistic regression. Here is my output:
I am seeking help because, within these results, there is an overall significant difference between the two groups (urban and rural in this case), with roughly 24.9% of the difference coming from differences in composition/explained (p-value not statistically significant) and 75.1% coming from differences in coefficients/unexplained (p-value statistically significant). On the surface this makes sense, but when you look at the individual variables within the explained and unexplained portions, these findings don't line up.
Specifically, the only significant variables are found in the explained portion (educ4), and there are no significant variables in the unexplained portion (despite unexplained being significant overall).
Am I misinterpreting the results? Or, on a more technical level, how are standard errors and p-values calculated within an Oaxaca-Blinder Decomposition? Does the calculation differ when trying to estimate the significance of the overall explained/unexplained components than when trying to calculate the effects of individual variables?
Please let me know if I can clarify anything
Code:
. svyset [pweight = weight], str(stratum_var) psu(cluster_var) tab race, gen(race) tab educ, gen(educ) tab hhinc, gen(hhinc) tab age_g5, gen(age_g5 oaxaca cohab_par age_g52 age_g53 age_g54 age_g55 race2 race3 race4 imm /// > educ2 educ3 educ4 hhinc2 hhinc3 hhinc4, /// > by(rural) svy logit pooled Blinder-Oaxaca decomposition Number of strata = 18 Number of obs = 3,268 Number of PSUs = 72 Population size = 38,244,320 Design df = 54 Model = logit Group 1: rural = 0 N of obs 1 = 2116 Group 2: rural = 1 N of obs 2 = 468 Linearized cohab_par Coefficient std. err. t P>t [95% conf. interval] overall group_1 .1405385 .0141326 9.94 0.000 .1122043 .1688727 group_2 .2199063 .0242979 9.05 0.000 .1711919 .2686207 difference -.0793678 .0276199 -2.87 0.006 -.1347424 -.0239933 explained -.0197445 .0107814 -1.83 0.073 -.0413599 .0018708 unexplained -.0596233 .0273862 -2.18 0.034 -.1145294 -.0047172 explained age_g52 -.0005991 .0012 -0.50 0.620 -.0030048 .0018067 age_g53 -.001 .001295 -0.77 0.443 -.0035962 .0015963 age_g54 .0000969 .0009533 0.10 0.919 -.0018143 .0020081 age_g55 -.0004468 .0010476 -0.43 0.671 -.0025472 .0016535 race2 -.001195 .0013982 -0.85 0.396 -.0039982 .0016081 race3 .0009479 .00266 0.36 0.723 -.0043851 .0062809 race4 .0005973 .0013599 0.44 0.662 -.002129 .0033237 imm .0003432 .0019318 0.18 0.860 -.0035298 .0042161 educ2 -.0002301 .0009301 -0.25 0.806 -.0020949 .0016347 educ3 .0008762 .0017989 0.49 0.628 -.0027304 .0044828 educ4 -.012033 .005119 -2.35 0.022 -.022296 -.00177 hhinc2 -.0003015 .0006636 -0.45 0.651 -.0016318 .0010289 hhinc3 -.0001882 .0006291 -0.30 0.766 -.0014493 .001073 hhinc4 -.0066125 .0039993 -1.65 0.104 -.0146306 .0014057 unexplained age_g52 -.028689 .0165345 -1.74 0.088 -.0618386 .0044607 age_g53 -.0297615 .0290941 -1.02 0.311 -.0880916 .0285686 age_g54 -.0556364 .0411148 -1.35 0.182 -.1380666 .0267938 age_g55 -.0026016 .0261534 -0.10 0.921 -.055036 .0498328 race2 .0048568 .0077741 0.62 0.535 -.0107293 .0204428 race3 .0176802 .0161501 1.09 0.278 -.0146989 .0500593 race4 .0041202 .0058635 0.70 0.485 -.0076355 .0158759 imm -.0093887 .012094 -0.78 0.441 -.0336357 .0148583 educ2 .0047812 .0141557 0.34 0.737 -.0235994 .0331617 educ3 -.0291882 .026426 -1.10 0.274 -.0821691 .0237926 educ4 -.0366389 .0235373 -1.56 0.125 -.0838284 .0105505 hhinc2 .0017072 .0118416 0.14 0.886 -.0220339 .0254482 hhinc3 .0011674 .0093851 0.12 0.901 -.0176486 .0199834 hhinc4 -.0020982 .0118124 -0.18 0.860 -.0257806 .0215842 _cons .1000664 .098881 1.01 0.316 -.0981782 .2983109
Solve dynamic non linear deterministic system
Dear forum,
After several hours of work, I am turning to the forum to help me find a solution to my problem.
I am used to working with GAMS to solve nonlinear equation systems, and I would like to know whether it is possible to do this step directly in Stata. Indeed, this resolution is for me only a first step in my project, since the results obtained are then integrated into a microsimulation model based on a household survey, and that work is conducted in Stata. I would like to find a way to do everything in Stata if possible.
What I need to know is whether in Stata I can solve a system of deterministic dynamic nonlinear equations (without random shocks). It is a model calibrated on national accounting data and therefore does not need to be estimated, since we are not using time-series (or panel) data. I tried with the dsgenl command without success.
Thanks in advance for your answers.
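One possibility to explore is Mata's solvenl() suite (Stata 15 or later), which solves systems of nonlinear equations; whether it scales to a full calibrated model is something you would have to test. A minimal sketch for a small zero-finding problem, with equations made up purely for illustration:
Code:
mata:
void myeq(real colvector x, real colvector v)
{
    v = J(2, 1, .)
    v[1] = x[1]^2 + x[2] - 11      // illustrative equation 1
    v[2] = x[1] + x[2]^2 - 7       // illustrative equation 2
}
S = solvenl_init()
solvenl_init_evaluator(S, &myeq())
solvenl_init_type(S, "zero")
solvenl_init_technique(S, "newton")
solvenl_init_numeq(S, 2)
solvenl_init_startingvals(S, (1\1))
x = solvenl_solve(S)
x
end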
Sunday, January 29, 2023
Calculating summary statistics for correlation coefficients
Hello,
I have calculated some pairwise correlation coefficients between observations in a panel data set within a grouping variable, and I have successfully transferred the correlation coefficient matrix into a data set. I am now looking to calculate some summary statistics for these correlation coefficients (mean, percentiles, std), however, I do not want to double count coefficients in these calculations. For example, since the correlation coefficient at column C1 row 2 is the same at column C2 row one, if I collapse and sum by column, I will be double counting these coefficients. Similarly, I do not want to include correlation coefficients between a variable and itself. How can I avoid double counting correlation coefficients?
Thanks for any help!
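A minimal sketch of one approach, assuming the coefficients sit in variables C1-C8 as in the example data shown below: blank out the diagonal and upper triangle, then stack the remaining coefficients and summarize them.
Code:
forvalues j = 1/8 {
    replace C`j' = . if _n <= `j'      // drop the diagonal and the upper triangle of column j
}
gen long row = _n
reshape long C, i(row) j(col)
summarize C, detail                    // mean, sd, percentiles over unique pairs only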
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(C1 C2 C3 C4 C5 C6 C7 C8)
1 .9937328 .9937328 . . . 1 .9032351
.9937328 1 1 . . . 1 .946541
.9937328 1 1 . . . 1 .946541
. . . 1 1 . . -1
. . . 1 1 . . -1
. . . . . . . .
1 1 1 . . . 1 .
.9032351 .946541 .946541 -1 -1 . . 1
end
Country-pair specific id for gravity model
Hello all,
I'm trying to estimate a gravity model with trade data that is disaggregated at the industry level, and I am trying to include country-pair fixed effects in my regression. I'm aware that when you want to generate a country-pair id, you use egen pairid = group(importer exporter). But what I have been wondering is that, when you generate the pair id like that, a pair of USA (importer) and Canada (exporter) is different from a pair of Canada (importer) and USA (exporter).
So my question is: is it right to run the regression with country-pair fixed effects using pair ids generated like that? From my understanding, we include country-pair fixed effects to capture heterogeneity between each country pair. So wouldn't it be more appropriate to have the same pair id for USA and Canada regardless of which of the two countries is the importer or exporter? (USA(imp)-Canada(exp) and Canada(imp)-USA(exp) should have the same pair id, since both pairs are comprised of the same countries.) Secondly, if so, what command can I use to generate the same pair id for USA and Canada regardless of which is the importer or exporter?
Thank you
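For the second question, a minimal sketch that sorts the two country names within each pair before grouping (assuming importer and exporter are string variables):
Code:
generate pair_lo = cond(importer < exporter, importer, exporter)
generate pair_hi = cond(importer < exporter, exporter, importer)
egen pairid_sym = group(pair_lo pair_hi)    // same id for USA-Canada and Canada-USA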
How to merge datasets using joinby?
I have the following data. In Data A, I have rows of children with "h13hhidc" indicating their household ID, including the relevant sub-household ID information. "h13hhidc" is not unique to the children, since multiple children of the same household are found within the data. Data B includes parents (some single households and some married households). These are also uniquely identified by h13hhidc. I would like to merge Data A and Data B so that the final result includes all parents in Data B having a new row for each child present in Data A that has a matching h13hhidc. I want to keep people in Data B that do not have children in Data A (as I later have more child data sets to import), but again, have several lines for parents that did in fact have matching children in Data A.
I have searched the forum and feel like this would be the proper time for the joinby command, but I am not getting the desired result. Any help is greatly appreciated.
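A minimal sketch using joinby, assuming the two listings below are saved as dataA.dta (children) and dataB.dta (parents); the file names are hypothetical:
Code:
use dataB, clear                                 // parent data
joinby h13hhidc using dataA, unmatched(master)   // one row per parent-child match; parents with no match are kept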
Data A
Code:
. list hhid opn h13hhidc psubhh qsubhh education in 1/500 , sepby (hhid)
+------------------------------------------------------+
| hhid opn h13hhidc psubhh qsubhh educat~n |
|------------------------------------------------------|
1. | 010004 101 0100040 0 0 . |
2. | 010004 201 0100040 0 0 . |
3. | 010004 202 0100040 0 0 . |
4. | 010004 604 0100040 0 0 . |
5. | 010004 605 0100040 0 0 . |
     +------------------------------------------------------+
Data B
Code:
. list hhid pn h13hhidc in 6/20, sepby (hhid)
+-------------------------+
| hhid pn h13hhidc |
|-------------------------|
6. | 010003 020 . |
7. | 010003 030 0100030 |
|-------------------------|
8. | 010004 010 . |
9. | 010004 040 0100040 |
|-------------------------|
10. | 010013 010 . |
11. | 010013 040 0100131 |
|-------------------------|
12. | 010038 010 0100380 |
13. | 010038 040 0100380 |
     +-------------------------+
Describe variables
I have a dataset (example below) and I would like to know the frequency of males (sex==1) and females (sex==0) that mentioned (resinsulted3_3 ==1) and not mentioned (resinsulted3_3 ==0) the variable resinsulted3_3.
Many thanks in advance
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long pidp byte(sex resinsulted3_3) 68043527 1 0 68344767 2 0 68395767 1 0 68395771 2 0 68467847 2 0 68990767 2 0 69307651 1 0 69443647 1 1 69593251 1 0 69734695 1 0 69848927 2 1 69848931 1 0 69931887 2 0 70435091 2 0 137668727 2 0 137675535 1 0 138035931 1 1 138292287 2 0 204551495 2 0 204551499 1 1 205473571 1 0 205566727 1 1 205722447 2 1 205857091 1 0 205995807 1 0 206348055 2 0 272116287 2 1 272159131 1 1 272395771 1 0 273058775 1 0 273729247 1 0 273973371 2 0 274289571 1 0 340490971 1 0 341507567 1 1 341636091 1 0 341678935 2 0 341895167 1 1 341901971 1 1 342167847 1 0 346745684 1 0 408627647 2 0 408983291 1 0 409210407 2 0 409330767 2 0 409447727 1 1 409962499 2 0 410297047 1 0 410339207 2 0 410339223 1 0 410355527 1 0 410454807 2 0 416520484 2 1 476163891 2 0 476351567 2 0 477203615 1 0 477457927 1 0 477703411 2 1 477945499 1 1 478246051 2 0 478263727 1 0 478468407 1 0 478513291 1 0 545284535 2 0 545702047 2 1 545714287 2 0 545789087 1 0 545795903 2 0 546102567 2 0 546226327 2 1 546445295 1 1 546720687 2 0 546764207 2 0 612737127 2 0 613139007 2 0 613487171 1 1 613516407 2 1 613518455 1 0 613555167 2 0 613878167 2 0 614307251 2 0 680962211 2 0 681007099 2 0 681717687 1 1 681959767 1 1 687758964 1 0 748383527 2 0 748565087 1 1 749221971 1 0 749427331 1 1 749433451 1 1 749436167 1 1 749776171 1 1 750230407 2 0 816190411 2 0 816391011 2 0 817014571 1 0 817370211 1 0 817714967 2 0 817917607 1 0 end label values sex c_sex label def c_sex 1 "male", modify label def c_sex 2 "female", modify label values resinsulted3_3 c_resinsulted3_3 label def c_resinsulted3_3 0 "not mentioned", modify label def c_resinsulted3_3 1 "mentioned", modify
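A minimal sketch using the example data above:
Code:
tabulate sex resinsulted3_3, row
Note that in these example data the value labels code sex as 1 "male" and 2 "female", so the row percentages give the share mentioning the item within each sex.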
How to perform relative mortality in STATA?
Hi everyone. I have a dataset of cancer patients from 2004-2014 with survival status until 2015. Now I've been asked to do a relative mortality analysis and actuarial survival. Could you please give me the syntax for performing this analysis?
Regards
Scatter plot with different groups while adding label for one particular observation (sepscatter command)
Hi everyone,
I have a cross section data for a group of countries and three variables: government spending, test scores and regions (please find below an example of the data). I would like to create a scatter plot with two variables (gov spending and test scores) while having different colors for each region and at the same time adding one label for one particular observation.
Example of the data
* Example generated by -dataex-. For more info, type help dataex
clear
input int year str3 code float harmonizedtestscores double govexpstudent str26 region
2020 "ABW" . 17.08119 "Latin America & Caribbean"
2020 "AFG" 354.7588 10.25174 "South Asia"
2020 "ALB" 434.1276 34.18148 "Europe & Central Asia"
2020 "AND" . 12.30702 "Europe & Central Asia"
2020 "ARE" 448.0407 6.04318 "Middle East & North Africa"
2020 "ARG" 408.1726 14.97726 "Latin America & Caribbean"
2020 "ARM" 442.96945 10.15365 "Europe & Central Asia"
2020 "ATG" 406.9974 8.05843 "Latin America & Caribbean"
2020 "AUS" 515.68536 19.30184 "East Asia & Pacific"
2020 "AUT" 507.6394 23.48041 "Europe & Central Asia"
2020 "BDI" 422.7478 12.92904 "Sub-Saharan Africa"
2020 "BEL" 516.80786 21.87767 "Europe & Central Asia"
2020 "BEN" 383.92285 10.25321 "Sub-Saharan Africa"
2020 "BFA" 403.6543 16.13029 "Sub-Saharan Africa"
2020 "BGD" 368.3153 7.56853 "South Asia"
2020 "BGR" 441.0933 22.99728 "Europe & Central Asia"
2020 "BHR" 451.706 11.16362 "Middle East & North Africa"
2020 "BLZ" . 16.55987 "Latin America & Caribbean"
2020 "BMU" . 8.05056 "North America"
2020 "BRA" 413.24475 20.05496 "Latin America & Caribbean"
2020 "BRB" . 20.75168 "Latin America & Caribbean"
2020 "BRN" 437.5159 8.87354 "East Asia & Pacific"
2020 "BTN" 386.7394 12.84915 "South Asia"
2020 "BWA" 391.3183 10.0503 "Sub-Saharan Africa"
2020 "CAF" 368.7302 4.10032 "Sub-Saharan Africa"
2020 "CAN" 533.998 17.56737 "North America"
2020 "CHE" 515.13904 24.78876 "Europe & Central Asia"
2020 "CHL" 452.2182 18.38536 "Latin America & Caribbean"
2020 "CHN" 441 5.90241 "East Asia & Pacific"
2020 "CIV" 373.2305 13.27872 "Sub-Saharan Africa"
2020 "CMR" 378.8688 5.44964 "Sub-Saharan Africa"
2020 "COD" 310.1971 7.19307 "Sub-Saharan Africa"
2020 "COG" 370.6141 11.68812 "Sub-Saharan Africa"
2020 "COL" 419.0275 16.98989 "Latin America & Caribbean"
2020 "COM" 392.2255 9.75717 "Sub-Saharan Africa"
2020 "CPV" . 16.65962 "Sub-Saharan Africa"
2020 "CRI" 428.5557 20.82721 "Latin America & Caribbean"
2020 "CUB" . 49.06773 "Latin America & Caribbean"
2020 "CUW" . 10.00974 "Latin America & Caribbean"
2020 "CYM" . 17.06149 "Latin America & Caribbean"
2020 "CYP" 502.1622 31.88086 "Europe & Central Asia"
2020 "CZE" 512.2216 13.8679 "Europe & Central Asia"
2020 "DEU" 517.28143 17.44257 "Europe & Central Asia"
2020 "DJI" . 37.32327 "Middle East & North Africa"
2020 "DMA" 403.7832 14.64306 "Latin America & Caribbean"
2020 "DNK" 517.8778 25.05246 "Europe & Central Asia"
2020 "DOM" 345.2165 16.18636 "Latin America & Caribbean"
2020 "DZA" 374.0891 11.47531 "Middle East & North Africa"
2020 "ECU" 420.1486 9.51098 "Latin America & Caribbean"
2020 "EGY" 355.9865 9.84518 "Middle East & North Africa"
2020 "ERI" . 5.26822 "Sub-Saharan Africa"
2020 "ESP" 506.6212 17.11832 "Europe & Central Asia"
2020 "EST" 543.20605 20.29085 "Europe & Central Asia"
2020 "ETH" 348.199 7.87686 "Sub-Saharan Africa"
2020 "FIN" 533.70764 21.50883 "Europe & Central Asia"
2020 "FJI" 383.336 12.35277 "East Asia & Pacific"
2020 "FRA" 510.2606 17.3647 "Europe & Central Asia"
2020 "GAB" 456.1277 4.71526 "Sub-Saharan Africa"
2020 "GBR" 520.35645 24.17334 "Europe & Central Asia"
I have used the following code and it works perfectly (please find below an image):
separate harmonizedtestscores ,by( region ) veryshortlabel
scatter harmonizedtestscores? govexpstudent
I just would like to add a label for one country "EGY". I would be grateful if you could advise.
Many thanks in advance.
Jala
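One possibility is to overlay a marker-label layer for just that country on top of the posted code; a minimal sketch (the label position is arbitrary):
Code:
separate harmonizedtestscores, by(region) veryshortlabel
scatter harmonizedtestscores? govexpstudent || ///
    scatter harmonizedtestscores govexpstudent if code == "EGY", ///
    msymbol(none) mlabel(code) mlabposition(12)
If the extra layer shows up in the legend, it can be dropped with the legend(order()) option.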
Creating a combined average using a panel dataset
I have a panel dataset in long (vertical) form with 6 countries; each shows values for 14 indicators between 2000 and 2020. There is a column that shows the indicator "value" for each "year" and another column that takes the most recent value ("lastvalue") observed between 2015-2020 (in "lastyear"); this value therefore repeats itself across the 2000-2020 rows.
Here is an example for only one country Antigua and Barbuda (ATG)
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str3 countrycode str20 indicatorcode int year double value int lastyear double lastvalue "ATG" "FX.OWN.TOTL.FE.ZS" 2000 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2001 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2002 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2003 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2004 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2005 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2006 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2007 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2008 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2009 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2010 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2011 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2012 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2013 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2014 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2015 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2016 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2017 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2018 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2019 . . . "ATG" "FX.OWN.TOTL.FE.ZS" 2020 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2000 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2001 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2002 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2003 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2004 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2005 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2006 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2007 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2008 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2009 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2010 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2011 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2012 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2013 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2014 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2015 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2016 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2017 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2018 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2019 . . . "ATG" "FX.OWN.TOTL.MA.ZS" 2020 . . . "ATG" "IC.FRM.FEMO.ZS" 2000 . . . "ATG" "IC.FRM.FEMO.ZS" 2001 . . . "ATG" "IC.FRM.FEMO.ZS" 2002 . . . "ATG" "IC.FRM.FEMO.ZS" 2003 . . . "ATG" "IC.FRM.FEMO.ZS" 2004 . . . "ATG" "IC.FRM.FEMO.ZS" 2005 . . . "ATG" "IC.FRM.FEMO.ZS" 2006 . . . "ATG" "IC.FRM.FEMO.ZS" 2007 . . . "ATG" "IC.FRM.FEMO.ZS" 2008 . . . "ATG" "IC.FRM.FEMO.ZS" 2009 . . . "ATG" "IC.FRM.FEMO.ZS" 2010 21.3 . . "ATG" "IC.FRM.FEMO.ZS" 2011 . . . "ATG" "IC.FRM.FEMO.ZS" 2012 . . . "ATG" "IC.FRM.FEMO.ZS" 2013 . . . "ATG" "IC.FRM.FEMO.ZS" 2014 . . . "ATG" "IC.FRM.FEMO.ZS" 2015 . . . "ATG" "IC.FRM.FEMO.ZS" 2016 . . . "ATG" "IC.FRM.FEMO.ZS" 2017 . . . "ATG" "IC.FRM.FEMO.ZS" 2018 . . . "ATG" "IC.FRM.FEMO.ZS" 2019 . . . "ATG" "IC.FRM.FEMO.ZS" 2020 . . . "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2000 . 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2001 . 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2002 . 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2003 . 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2004 . 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2005 . 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2006 . 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2007 . 
2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2008 96.2264175415039 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2009 78.1376495361328 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2010 107.908851623535 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2011 112.236839294434 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2012 106.744491577148 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2013 95.4545516967773 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2014 91.2582778930664 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2015 99.1847763061523 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2016 115.650970458984 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2017 104.237289428711 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2018 102.011489868164 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2019 . 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.FE.ZS" 2020 . 2018 102.01148986816406 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2000 . 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2001 . 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2002 . 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2003 . 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2004 . 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2005 . 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2006 . 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2007 . 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2008 79.2866897583008 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2009 88.0920181274414 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2010 83.8666687011719 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2011 93.8642272949219 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2012 99.3573303222656 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2013 80.2564086914063 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2014 85.8257522583008 2018 95.35211181640625 "ATG" "SE.SEC.CMPT.LO.MA.ZS" 2015 92.4202117919922 2018 95.35211181640625 end
I would like to average the values for all the 6 countries and create a new combined countrycode. This means that I want a new set of rows for the new "averagedcountries" with
1. average "value" for each indicator (14) for each "year" (between 2000-2020).
2. average "lastvalue" for each indicator for the "recentvalue" (between 2015-2020).
I would appreciate any guidance! Thank you
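A minimal sketch: compute the cross-country means by indicator and year, tag them with a new (hypothetical) country code, and append them to the original data.
Code:
preserve
collapse (mean) value lastvalue, by(indicatorcode year)
gen countrycode = "AVG"            // hypothetical code for the combined average
tempfile avg
save `avg'
restore
append using `avg'
For point 2, collapsing by indicatorcode alone, with (mean) lastvalue, gives one averaged most-recent figure per indicator.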
How to deal with multicollinearity when adding fixed effects dummies in regression with cross sectional data?
Hello,
I have cross-sectional data with 26 groups. I estimated probit and fracreg regressions for my two research questions. Since my key explanatory variable varies at the group level, I added dummies for group and clustered the errors at the group level as well. However, I estimated the VIF by running the reg command with the same variables I used for the probit and fracreg regressions. The VIF is very high for my key explanatory variable (around 28,000) when I add the group dummies. However, when I remove the dummies, it is within the acceptable range (around 4). Why is the multicollinearity high when I add the dummies? How can I fix the issue?
Saturday, January 28, 2023
scatter Y X || lfit Y X ||, by(variable)
Good evening,
I'd like to run scatter Y X || lfit Y X ||, by(variable)
but have everything on one graph instead of multiple graphs. I realize that there's going to be a lot of points and lines superimposed; that's OK. Is there a way to do it?
Thank you,
Stan
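A minimal sketch, assuming the grouping variable has three levels (add or drop lfit layers to match your data):
Code:
scatter Y X, msymbol(oh) || ///
    lfit Y X if variable == 1 || ///
    lfit Y X if variable == 2 || ///
    lfit Y X if variable == 3, legend(off)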
Brant Test Significance Question
Hello everyone. Thank you all for taking the time to answer other questions on this forum; it has been very helpful. This is my first time posting here. I would like to ask a question regarding oparallel and the Brant test in particular. As you can see, the assumption of parallel lines is violated.
Tests of the parallel regression assumption
| Chi2 df P>Chi2
-----------------+----------------------
Wolfe Gould | 63.74 22 0.000
Brant | 57.02 22 0.000
score | 60.3 22 0.000
likelihood ratio | 63.02 22 0.000
Wald | 58.48 22 0.000
The Brant test is also significant. However, I am confused: why don't the chi2 values in the table below add up to 57.02? The sum is much less if I add them up on a calculator. I apologize if it is a really stupid question, but I humbly request your help.
Brant test of parallel regression assumption
| chi2 p>chi2 df
-------------+------------------------------
All | 57.02 0.000 22
-------------+------------------------------
region | 0.02 0.881 1
sex | 0.03 0.870 1
age | 0.76 0.384 1
edu | 3.57 0.059 1
sonh | 1.28 0.258 1
dauh | 3.62 0.057 1
2.mainsrc | 0.22 0.641 1
3.mainsrc | 0.12 0.734 1
4.mainsrc | 0.86 0.355 1
savin | 0.33 0.568 1
gold | 1.02 0.314 1
bond | 2.08 0.150 1
house | 0.00 0.952 1
land | 2.11 0.147 1
car | 0.94 0.333 1
2.satisf | 1.43 0.232 1
3.satisf | 0.09 0.760 1
2.health | 0.78 0.377 1
3.health | 3.89 0.049 1
1.templ | 0.32 0.571 1
2.templ | 0.79 0.373 1
3.templ | 0.02 0.892 1
When I run gologit2 there are also no constraints found. Image attached.
Thank you all. I sincerely hope to get some answer.
Best way to improve processing speed for large data sets (~3gb)
How much will more RAM help me with processing speed?
I am working with a dataset of 87 million records that is about 3 GB in size. It is a 20-year cohort of people, and my datasets contain 5-6 rows per person corresponding to their health events. It would be difficult to collapse any further (I could of course separate them into years, but would rather improve processing power than add these extra steps).
For example, today I tried to run bsample to sample 800,000 records out of my 80,000,000, and after an hour it still had not completed. I have similarly long waits when trying to flag whether a string contains any of a list of 20 ICD codes in this database: waiting times upwards of an hour while the computer sounds like it is about to take off like an airplane.
I have 7 GB of available RAM, and it is always running at max during these churns. I currently have only 50 GB of free C-drive disk space (it is mostly full). Much of this I cannot delete, as it consists of records and datasets from other projects, and since this is a work computer, getting more space will require a series of requests to IT, etc.
What will make the most difference for me? Requesting more hard drive space would be very easy, and I seem to remember that Stata needs plenty of hard drive space to run big datasets.
Or do I need to double my RAM, and is that the only way?
This is an ongoing project and I will be working with these datasets for at least a year, so I need to fix this. I am using Stata 17 SE.
Thanks
Friday, January 27, 2023
Hausman Test Failed
Hi! My name is Karina. I ran a Hausman test in Stata using these commands:
quietly xtreg ecgrowth jubgrowth sukubunga inflation, fe
estimates store fe
quietly xtreg ecgrowth jubgrowth sukubunga inflation, re
estimates store re
hausman fe re
and it came up with this:
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= -2.36 chi2<0 ==> model fitted on these
data fails to meet the asymptotic
assumptions of the Hausman test;
see suest for a generalized test
After searching for a little while, I found another option for doing the Hausman test: either using xtoverid or the sigmaless/sigmamore options. Later, I tried to use this as my command:
xtreg ecgrowth jubgrowth sukubunga inflation if t>=2004 & t<=2021, re
xtoverid
di r(j)
est store re
xtreg ecgrowth jubgrowth sukubunga inflation if t>=2004 & t<=2021, fe
hausman fe re, sigmaless
Is this the right command to run the Hausman test? If not, would anyone help me with the right command? I'm so confused and on the edge of a thesis deadline. Thank you to anyone replying!
Running ANOVA in loops
Hi, I want to run a repeated-measures ANOVA of math scores on semester, but the analysis has to be done for all levels of the stress (high/mid/low) and sex (male/female) variables.
Please let me know if this is possible to do in a loop without having to subset manually.
Thank you!
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input byte student_id str6(income sex semester) float math_score 1 "low" "male" "summer" 75.79707 2 "low" "male" "winter" 78.42905 3 "low" "male" "fall" 77.19025 4 "low" "male" "summer" 86.13298 5 "low" "male" "winter" 84.81785 6 "low" "male" "fall" 72.808044 7 "low" "male" "summer" 60.94741 8 "low" "male" "winter" 93.07262 9 "low" "male" "fall" 96.72986 10 "low" "male" "summer" 95.45274 11 "low" "male" "winter" 73.21725 12 "low" "male" "fall" 68.09888 13 "low" "male" "summer" 54.45422 14 "low" "male" "winter" 70.273636 15 "low" "male" "fall" 63.7756 16 "low" "male" "summer" 73.78484 17 "low" "male" "winter" 74.67471 18 "low" "male" "fall" 74.49361 19 "low" "male" "summer" 65.32313 20 "low" "male" "winter" 71.98983 21 "low" "male" "fall" 70.34425 22 "low" "male" "summer" 63.78717 23 "low" "male" "winter" 79.24786 24 "low" "male" "fall" 82.9422 25 "low" "male" "summer" 66.953354 26 "low" "male" "winter" 83.132 27 "low" "male" "fall" 45.36668 28 "low" "male" "summer" 81.48463 29 "low" "male" "winter" 85.73037 30 "low" "male" "fall" 79.11235 31 "low" "male" "summer" 81.41333 32 "low" "male" "winter" 82.6655 33 "low" "male" "fall" 62.9081 1 "low" "male" "summer" 61.16171 2 "low" "male" "winter" 55.96095 3 "low" "male" "fall" 65.61567 4 "low" "male" "summer" 57.84998 5 "low" "male" "winter" 62.85389 6 "low" "male" "fall" 86.14867 7 "low" "male" "summer" 65.382805 8 "low" "male" "winter" 78.90556 9 "low" "male" "fall" 77.81376 10 "low" "male" "summer" 73.22633 11 "low" "male" "winter" 78.89809 12 "low" "male" "fall" 67.75861 13 "low" "male" "summer" 63.12998 14 "low" "male" "winter" 64.939606 15 "low" "male" "fall" 68.52872 16 "low" "male" "summer" 82.71146 17 "middle" "male" "winter" 89.11002 18 "middle" "male" "fall" 52.11129 19 "middle" "male" "summer" 65.83049 20 "middle" "male" "winter" 68.84013 21 "middle" "male" "fall" 69.065605 22 "middle" "male" "summer" 74.18057 23 "middle" "male" "winter" 82.78862 24 "middle" "male" "fall" 67.25807 25 "middle" "male" "summer" 69.853386 26 "middle" "male" "winter" 91.95356 27 "middle" "male" "fall" 76.73396 28 "middle" "male" "summer" 60.11725 29 "middle" "male" "winter" 76.10474 30 "middle" "female" "fall" 70.36514 31 "middle" "female" "summer" 67.02353 32 "middle" "female" "winter" 59.18249 33 "middle" "female" "fall" 79.9767 1 "middle" "female" "summer" 79.48432 2 "middle" "female" "winter" 64.14967 3 "middle" "female" "fall" 84.50146 4 "middle" "female" "summer" 63.50959 5 "middle" "female" "winter" 73.36041 6 "middle" "female" "fall" 81.75935 7 "middle" "female" "summer" 71.10308 8 "middle" "female" "winter" 56.25582 9 "middle" "female" "fall" 77.83181 10 "middle" "female" "summer" 60.94855 11 "middle" "female" "winter" 78.57565 12 "middle" "female" "fall" 83.1645 13 "middle" "female" "summer" 79.3857 14 "high" "female" "winter" 74.47901 15 "high" "female" "fall" 56.33062 16 "high" "female" "summer" 66.48232 17 "high" "female" "winter" 63.08831 18 "high" "female" "fall" 60.62999 19 "high" "female" "summer" 76.90691 20 "high" "female" "winter" 69.44049 21 "high" "female" "fall" 56.89666 22 "high" "female" "summer" 79.77268 23 "high" "female" "winter" 68.887505 24 "high" "female" "fall" 71.54644 25 "high" "female" "summer" 65.84627 26 "high" "female" "winter" 77.20901 27 "high" "female" "fall" 66.96027 28 "high" "female" "summer" 67.319885 29 "high" "female" "winter" 74.991455 30 "high" "female" "fall" 70.49643 31 "high" "female" "summer" 71.18354 32 "high" "female" "winter" 
56.2911 33 "high" "female" "fall" 69.46329 end
Time invariant dummies.
Hi,
I am performing a fixed-effects (FE) model and I am adding a series of country dummies to control for country fixed effects.
Unfortunately, all the country dummies are omitted due to multicollinearity (they do not change over time). Is there a way to overcome this problem?
Many thanks, Chiara
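A minimal sketch of two common routes, assuming the panel is declared with a unit identifier firm_id and time variable year, and that y, x1, x2 and country are illustrative names: country dummies are absorbed by the unit fixed effects, but country-by-year interactions still vary within units, and a correlated random-effects (Mundlak) specification keeps time-invariant dummies identifiable.
Code:
* Sketch only; firm_id, year, y, x1, x2 and country are illustrative names.
xtset firm_id year

* Route 1: keep the FE model but add country-by-year interactions,
* which do vary over time within units.
xtreg y x1 x2 i.country#i.year, fe vce(cluster firm_id)

* Route 2: correlated random effects (Mundlak): add unit means of the
* time-varying regressors and estimate by RE, which keeps i.country.
bysort firm_id: egen mean_x1 = mean(x1)
bysort firm_id: egen mean_x2 = mean(x2)
xtreg y x1 x2 mean_x1 mean_x2 i.country, re vce(cluster firm_id)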
How can I check if a string has repeated words?
I know the moss package is related to this, but I cannot make it work yet (one similar post can be found here). For example, for the string "John said John" the result should be "John" and positions 1 and 11.
Extra:
A good extra would be to handle multiple words as well: for example, "John Williams thanked John Williams" should return "John Williams". The best would be to be able to give me the longest repeated substring, for example "John loves Joh" would give "Joh" and "John Williams thanked John Will" should give "John Will".
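A minimal sketch of the single-word case, assuming the string lives in a variable called text (an illustrative name); Stata's Unicode regular-expression functions support backreferences, which is what detects the repetition.
Code:
* Sketch only; text, has_repeat, word, pos1 and pos2 are illustrative names.
gen byte has_repeat = ustrregexm(text, "\b(\w+)\b.*\b\1\b")
gen word = ustrregexs(1) if ustrregexm(text, "\b(\w+)\b.*\b\1\b")
gen pos1 = ustrpos(text, word) if has_repeat                        // first occurrence
gen pos2 = ustrpos(text, word, pos1 + ustrlen(word)) if has_repeat  // second occurrence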
forvalue loop for discontinuous variable
How can I use a forvalues loop over a variable whose values are not sequentially continuous? I want to loop over industry codes, which are not continuous and are huge in number.
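A minimal sketch of the usual workaround, assuming the industry codes are stored in a variable called ind_code (an illustrative name): collect the values that actually occur with -levelsof- and loop over them with -foreach- instead of -forvalues-.
Code:
* Sketch only; ind_code is an illustrative name.
levelsof ind_code, local(codes)
foreach c of local codes {
    display "processing industry code `c'"
    * ... commands for industry code `c' go here ...
}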
Panel Data
Hi,
I am using panel data with the wave variable ranging from 1 to 10; however, I would like to focus on waves 1 to 8 when testing for POLS. How would I be able to do this?
I would appreciate any help
Kind regards,
Denis
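A minimal sketch, assuming the wave variable is called wave, the outcome is wellbeing, the regressors are x1 and x2, and id is the panel identifier (all illustrative names): pooled OLS can be restricted with an if qualifier, or the sample can be cut permanently with keep.
Code:
* Sketch only; wave, wellbeing, x1, x2 and id are illustrative names.
regress wellbeing x1 x2 if inrange(wave, 1, 8), vce(cluster id)
* or drop the later waves outright:
* keep if inrange(wave, 1, 8)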
Testing stability of reggression discontinuity model (TED)
Hello Statalisters,
I need your valuable advice.
Having applied a regression discontinuity design (RD) in my analysis, I want to check the validity of its results through the ted command, which tests the stability of the regression discontinuity model.
Can someone guide me how can I install this command in Stata 17?
Any help in this regard would be extremely helpful to me.
Thank you in advance.
Best,
Angeliki
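A minimal sketch of how user-written commands are usually installed; whether -ted- is distributed through SSC is an assumption here, so a -search- fallback is included.
Code:
* Sketch only; if the SSC install fails, search for the package instead.
ssc install ted
* fallback if the module is not found on SSC:
search ted, all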
Thursday, January 26, 2023
matching single county with unique congressional district
According to US House election data, a single county in a specific state can belong to multiple congressional districts. However, based on the total geographical area, I can match each county to a unique district for that state.
In the following data, nhgisnam is the name of the county, district is the congressional district, cnty_area is the total geographical area of that county, and cnty_part_area is the portion of that county's total area belonging to that particular congressional district. For example, in the first line for Autauga county, the cnty_part_area value is 1564828723. That means that, in district 2, Autauga county has 1564828723 of its total area (cnty_area) of 1565529773. Autauga county also belongs to districts 2, 6, and 7 of state 1, but the highest portion of its area belongs to district 2, which I can figure out from the cnty_part_area variable.
Can anyone kindly guide me on how to code the data so that, for each county in a particular state, I keep the observation where that county is assigned to the single district with the highest value of cnty_part_area for that specific county?
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str36 nhgisnam str14 statenam byte(cd_statefip district) float county double(cnty_area cnty_part_area) "Autauga" "Alabama" 1 2 1001 1565529773 1564828723 "Autauga" "Alabama" 1 6 1001 1565529773 272866.6476 "Autauga" "Alabama" 1 7 1001 1565529773 428181.5623 "Baldwin" "Alabama" 1 1 1003 4232265763 4228366412 "Baldwin" "Alabama" 1 7 1003 4232265763 149732.2219 "Barbour" "Alabama" 1 2 1005 2342716428 2341574170 "Barbour" "Alabama" 1 3 1005 2342716428 601774.0707 "Bibb" "Alabama" 1 6 1007 1621774445 1621293818 "Bibb" "Alabama" 1 7 1007 1621774445 480626.7196 "Alameda" "California" 6 9 6001 1928023778 347196605.8 "Alameda" "California" 6 10 6001 1928023778 350508415.5 "Alameda" "California" 6 11 6001 1928023778 654388620.1 "Alameda" "California" 6 13 6001 1928023778 571141686.4 "Alameda" "California" 6 15 6001 1928023778 77415.0506 "Alpine" "California" 6 3 6003 1924880954 1923718768 "Alpine" "California" 6 4 6003 1924880954 194564.3245 "Alpine" "California" 6 19 6003 1924880954 163767.2729 "Alpine" "California" 6 25 6003 1924880954 326582.7162 "Amador" "California" 6 3 6005 1566159667 1565521028 "Amador" "California" 6 4 6005 1566159667 637209.1726 "Amador" "California" 6 11 6005 1566159667 1425.507371 "Glenn" "California" 6 1 6021 3437311730 146543.9293 "Glenn" "California" 6 2 6021 3437311730 3437165187 "Humboldt" "California" 6 1 6023 9286259251 9282075925 "Humboldt" "California" 6 2 6023 9286259251 589746.8231 "Los Angeles" "California" 6 22 6037 10591484333 1402787367 "Los Angeles" "California" 6 24 6037 10591484333 350751.8175 "Los Angeles" "California" 6 25 6037 10591484333 4108619373 "Los Angeles" "California" 6 26 6037 10591484333 1272584266 "Los Angeles" "California" 6 27 6037 10591484333 392578311.2 "Los Angeles" "California" 6 28 6037 10591484333 201732469.2 "Los Angeles" "California" 6 29 6037 10591484333 262922748.4 "Los Angeles" "California" 6 30 6037 10591484333 742182418.1 end
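A minimal sketch using the posted variable names: within each state and county, sort on cnty_part_area and keep the row with the largest value.
Code:
* Keep, for each state-county pair, the observation with the largest cnty_part_area.
bysort cd_statefip county (cnty_part_area): keep if _n == _N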
Jackknife xtqreg
Using the Grunfeld data, this works:
But if I use jackknife instead of bootstrap it does not work. What needs to be different for jackknife? I took out reps and changed i(company) to the jackknife equivalent.
Code:
bootstrap, reps(50) cluster(company) idcluster(comp): xtqreg invest mvalue kstock i.comp#c.time, quantile(.25) i(company)
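A minimal sketch of the analogous jackknife call for the same Grunfeld setup; -jackknife- accepts cluster() and idcluster() like -bootstrap- but has no reps() option, since it deletes one cluster at a time.
Code:
* Sketch only; same model as the bootstrap example above.
jackknife, cluster(company) idcluster(comp): ///
    xtqreg invest mvalue kstock i.comp#c.time, quantile(.25) i(company)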
Combine multiple rows into one when end_date = start_date
Hi there,
I am looking for code that combines rows for a given ID when end_date equals the next row's start_date and all the other variables have the same values. For example, in the table below, ID 1 has 4 rows, and x and y have the same values in each of those rows; ID 1 has end_date equal to the next row's start_date in each of its rows. In that case, ID 1 should be combined into one row showing the very first start_date and the very last end_date.
id start_date end_date x y
1 10 12 10 12
1 12 14 10 12
1 14 16 10 12
1 16 21 10 12
2 11 13 15 17
2 13 15 15 17
3 16 18 20 22
3 18 20 20 22
3 20 25 20 22
Result will look like this:
id start_date end_date x y
1 10 21 10 12
2 11 15 15 17
3 16 25 20 22
I would appreciate help with the code.
Thank you.
Kind regards,
Rakib
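A minimal sketch using the posted variable names, assuming x and y are constant within id as in the example, so only the earliest start_date and latest end_date per id need to be kept.
Code:
* One row per id: earliest start, latest end, and the (constant) x and y.
collapse (min) start_date (max) end_date (first) x y, by(id)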
Wild bootstrap with ML
Dear users, I need to estimate a maximum likelihood model with a wild bootstrap, as I have a few-clusters issue. I wonder if I can get any advice on how to implement the wild bootstrap. To be specific, I did the following to use boottest, and it failed.
ml model lf ML_eut_mu (CRRA: choice p0b p1b pay0b pay1b = post trt postxtrt ///
) (mu: = )
ml max, diff iterate(100)
boottest postxtrt
Thank you!
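A minimal sketch of one thing worth checking, assuming the cluster variable is called clustvar (an illustrative name): -boottest- takes the cluster information from the estimation, so specifying clustered standard errors in -ml model- may be needed; whether -boottest- supports this particular ml-based estimator is itself an assumption here.
Code:
* Sketch only; clustvar is an illustrative name for the cluster variable.
ml model lf ML_eut_mu (CRRA: choice p0b p1b pay0b pay1b = post trt postxtrt) ///
    (mu: = ), vce(cluster clustvar)
ml max, difficult iterate(100)
boottest postxtrt, reps(999)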
New Stata package: ddml for Double/debiased Machine Learning
I am happy to announce a new package that I have written together with Christian Hansen, Mark Schaffer and Thomas Wiemann.
We introduce the package ddml for Double/Debiased Machine Learning (DDML) in Stata. Estimators of causal parameters for five different econometric models are supported, allowing for flexible estimation of causal effects of endogenous variables in settings with unknown functional forms and/or many exogenous variables.
The main features are:
- ddml supports flexible estimators of causal parameters in five econometric models: (1) the Partially Linear Model, (2) the Interactive Model (for binary treatment), (3) the Partially Linear IV Model, (4) the Flexible Partially Linear IV Model, and (5) the Interactive IV Model (for binary treatment and instrument).
- ddml supports data-driven combinations of multiple machine learners via stacking by leveraging pystacked, our complementary Stata frontend relying on the Python library scikit-learn. See our separate Working paper.
- Aside from pystacked, ddml can be used in combination with many other existing supervised machine learning programs available in or via Stata. ddml has been tested with lassopack, rforest, svmachines, and parsnip. Indeed, the requirements for compatibility with ddml are minimal: Any eclass-program with the Stata-typical "reg y x'' syntax, support for if conditions and post-estimation predict is compatible with ddml.
- ddml provides flexible multi-line syntax and short one-line syntax. The multi-line syntax offers a wide range of options, guides the user through the DDML algorithm step-by-step, and includes auxiliary programs for storing, loading and displaying additional information. We also provide a complementary one-line version called qddml (`quick' ddml), which uses a similar syntax as pdslasso and ivlasso.
You can install ddml from our GitHub or from SSC. Many thanks to Kit Baum for uploading the package to SSC.
Code:
ssc install ddml
net install ddml, from(https://raw.githubusercontent.com/aahrens1/ddml/master)
Predictive margins and marginsplot for a continuous variable
Hello everyone,
I am completely new to margins and marginsplot and have the following question:
I have this regression:
Code:
reghdfe c.Change_NII c.GGDEBT##c.Total_domestic_exposure c.GGDEBT_lag1 c.Long_term_rate c.Unemployment c.log_totalassets c.log_ownfunds c.CET_Actual c.NII_Actual, absorb(i.ST_Year i.Country_Code2)
I need the predictive margins and the corresponding margins plot where GGDEBT is held constant at certain values (30, 60, 80, 100, 120),
with predicted Change_NII on the y-axis and Total_domestic_exposure on the x-axis.
Thanks in advance and kind regards,
Natalie
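A minimal sketch of one way to request these margins, assuming the reghdfe estimates above are the active results; the grid of Total_domestic_exposure values (0 to 100 in steps of 10) is an illustrative assumption.
Code:
* Sketch only; the x-axis grid is an assumption.
margins, at(GGDEBT=(30 60 80 100 120) Total_domestic_exposure=(0(10)100))
marginsplot, xdimension(Total_domestic_exposure)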
Wednesday, January 25, 2023
Merging problem of congressional district data with counties
This is one set of my data, where I have statefip, district (which stands for congressional district), and county. I want to merge this data with the following dataset. The problem is that in my master data (posted first here), a congressional district spans multiple counties, and I don't know how to resolve this in order to merge with the using data (posted second below).
Can anyone help ??
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte(cd_statefip district) float county 6 9 6001 6 10 6001 6 11 6001 6 13 6001 6 15 6001 6 3 6003 6 3 6003 6 3 6003 6 4 6003 6 19 6003 6 4 6005 6 4 6005 6 11 6009 6 19 6009 6 19 6009 6 1 6011 6 1 6011 6 2 6011 6 2 6011 48 5 48001 48 6 48001 48 11 48003 48 19 48003 48 1 48005 48 1 48005 48 5 48005 48 27 48007 48 15 48055 48 25 48055 48 25 48055 48 28 48055 end
The data I'm trying to merge with is like the following.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte(state_fips district) float democrat_pres byte incumbent_pres_match float(democrat_voteperc competitive) 6 1 1 0 42.61742 2 6 1 1 0 42.61742 2 6 2 1 1 71.2433 8 6 2 1 1 71.2433 8 6 3 1 1 54.23049 4 6 3 1 1 54.23049 4 6 4 1 0 38.89085 1 6 4 1 0 38.89085 1 6 5 1 1 74.47112 8 6 5 1 1 74.47112 8 6 6 1 1 75.05244 9 6 6 1 1 75.05244 9 6 7 1 1 51.68154 4 6 7 1 1 51.68154 4 6 8 1 0 . . 6 8 1 0 . . 6 9 1 1 55.5541 5 6 9 1 1 55.5541 5 6 10 1 0 47.29181 3 6 10 1 0 47.29181 3 6 11 1 1 69.73173 8 6 11 1 1 69.73173 8 6 12 1 1 85.08385 . 6 12 1 1 85.08385 . 6 13 1 1 86.78157 . 6 13 1 1 86.78157 . 6 14 1 1 78.91653 9 48 1 0 0 56.44611 5 48 1 0 0 56.44611 5 48 2 0 0 60.84797 6 48 2 0 0 60.84797 6 48 2 0 0 60.84797 6 48 3 0 1 24.331583 . 48 3 0 1 24.331583 . 48 3 0 1 24.331583 . 48 4 0 0 57.82096 5 48 4 0 0 57.82096 5 48 4 0 0 57.82096 5 48 5 0 1 40.26217 2 48 5 0 1 40.26217 2 48 5 0 1 40.26217 2 48 5 0 1 40.26217 2 48 17 0 0 51.36383 4 48 17 0 0 51.36383 4 48 17 0 0 51.36383 4 48 18 0 0 76.91312 9 48 18 0 0 76.91312 9 48 18 0 0 76.91312 9 end label values competitive competitivb label def competitivb 0 "30-", modify label def competitivb 1 "35-", modify label def competitivb 2 "40-", modify label def competitivb 3 "45-", modify label def competitivb 4 "50-", modify label def competitivb 5 "55-", modify label def competitivb 6 "60-", modify label def competitivb 7 "65-", modify label def competitivb 8 "70-", modify label def competitivb 9 "75-", modify label def competitivb 10 "80-", modify
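A minimal sketch, assuming the county-level data are saved as counties.dta and the district-level data as districts.dta (both file names are illustrative): each file contains fully duplicated rows, so drop those first and then merge many county rows to one district row.
Code:
* Sketch only; counties.dta and districts.dta are illustrative file names.
use districts.dta, clear
duplicates drop
rename state_fips cd_statefip
tempfile d
save `d'

use counties.dta, clear
duplicates drop
merge m:1 cd_statefip district using `d'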
Is there a way to make Python stop on error when running PyStata ?
Suppose I have the following (toy) code
and I run the whole thing in python. Suppose Stata throws an error that variable y does not exist. If it were the real Stata, this would stop the program, and I would be happy. But in python, it will be executed 1000 times! Is there any way to stop it on its own?
Code:
import stata_setup
stata_setup.config("C:/Program Files/Stata17", "mp")
for i in range(1000):
%stata replace y=2
Create empty graph in a loop
Dear Statlisters,
I have a series of graphs made in a loop which are then amalgamated using grc1leg (a wrapper for graph combine). There are a number of these combined graphs that will need to be compared. When running a loop, if there is a graph without observations, Stata throws an error and stops. This can be circumvented by prefixing capture in front of the catplot command. A further error then manifests when trying to save the graph that has no data. The potential errors are 1) failure to save anything if no graph is in memory, with the code stopping, or 2) saving the last graph in memory, with the code continuing. The first of these can also be circumvented by introducing an if condition, counting the number of non-missing data points, and opting not to save if there is no graph. However, there is then no file to combine with grc1leg.
For ease of coding, reading, and comparability, I can manually remove the missing graph's code and introduce a hole into the grc1leg matrix where the missing graph would be, but this is a suboptimal fix. Instead, it would be easier if the original graph were saved as an empty graph with no data points, but I am unclear how to ask Stata to do this (if indeed it can).
initial code:
Code:
foreach var of local systemic {
colorpalette red yellow, ipolate(4) nograph
capture catplot Overall_`var' , over(Armcode) stack asyvars bar(2, col("`r(p4)'")) bar(3, col("`r(p3)'")) bar(4, col("`r(p2)'")) bar(5, col("`r(p1)'")) percent graphregion(color(white)) bgcolor(white) allcategories legend(order(2 3 4))
graph save "`var'.gph", replace
}
grc1leg "1.gph" "2.gph" "3.gph" "4.gph" "5.gph" "6.gph" "7.gph" "8.gph" graphregion(color(white))
graph save "combine.gph", replaceCode:
foreach var of local systemic {
colorpalette red yellow, ipolate(4) nograph
capture catplot Overall_`var' , over(Armcode) stack asyvars bar(2, col("`r(p4)'")) bar(3, col("`r(p3)'")) bar(4, col("`r(p2)'")) bar(5, col("`r(p1)'")) percent graphregion(color(white)) bgcolor(white) allcategories legend(order(2 3 4))
qui count if !(missing(Overall_`var'))
if(`r(N)'>0) {
graph save "`var'.gph", replace
}
}
manual solution - removing the problematic graph (e.g. 1.gph) and introducing a hole:
Code:
foreach var of local systemic {
colorpalette red yellow, ipolate(4) nograph
capture catplot Overall_`var' , over(Armcode) stack asyvars bar(2, col("`r(p4)'")) bar(3, col("`r(p3)'")) bar(4, col("`r(p2)'")) bar(5, col("`r(p1)'")) percent graphregion(color(white)) bgcolor(white) allcategories legend(order(2 3 4))
graph save "`var'.gph", replace
}
grc1leg "2.gph" "3.gph" "4.gph" "5.gph" "6.gph" "7.gph" "8.gph" graphregion(color(white)) holes(1)
graph save "combine.gph", replaceRob
Changing an estimation stored macro with a previously stored macro
I am trying to change a stored estimation macro using the stored result from a previous estimation. I manage to do so for e(cmd) but not for e(inexog), as shown below.
I only managed to get e(inexog) to output the desired list of variables when setting, within the program, ereturn local inexog = "...." (the list of variables written out). I would like to do it automatically by referring to a local macro, as I tried above.
Code:
. sysuse auto, clear (1978 Automobile Data) . . qui ivreg2 price (weight=length) i.turn headroom, cluster(turn) small . display e(inexog) 31b.turn 32.turn 33.turn 34.turn 35.turn 36.turn 37.turn 38.turn 39.turn 40.turn 41.turn 42.t > urn 43.turn 44.turn 45.turn 46.turn 48.turn 51.turn headroom . local x = "`e(inexog)'" . display "`x'" 31b.turn 32.turn 33.turn 34.turn 35.turn 36.turn 37.turn 38.turn 39.turn 40.turn 41.turn 42.t > urn 43.turn 44.turn 45.turn 46.turn 48.turn 51.turn headroom . . cap prog drop pretend_to_be_ivreg2 . prog pretend_to_be_ivreg2, eclass 1. ereturn local cmd = "ivreg2" 2. ereturn local inexog = "`x'" 3. end . display "`x'" 31b.turn 32.turn 33.turn 34.turn 35.turn 36.turn 37.turn 38.turn 39.turn 40.turn 41.turn 42.t > urn 43.turn 44.turn 45.turn 46.turn 48.turn 51.turn headroom . . qui ivreghdfe price (weight=length) headroom, absorb(turn) cluster(turn) // coeff/SE: 4.509 > 739 .7761883, Fstat: 69.125 . display e(cmd) ivreghdfe . display e(inexog) headroom . pretend_to_be_ivreg2 . display e(cmd) ivreg2 . display e(inexog) // DOES NOT WORK . .
Code:
. cap prog drop pretend_to_be_ivreg2 . prog pretend_to_be_ivreg2, eclass 1. ereturn local cmd = "ivreg2" 2. ereturn local inexog = "31b.turn 32.turn 33.turn 34.turn 35.turn 36.turn 37.turn > 38.turn 39.turn 40.turn 41.turn 42.turn 43.turn 44.turn 45.turn 46.turn 48.turn 51.turn he > adroom" 3. end . qui ivreghdfe price (weight=length) headroom, absorb(turn) cluster(turn) // coeff/SE: 4.509 > 739 .7761883, Fstat: 69.125 . display e(cmd) ivreghdfe . display e(inexog) headroom . pretend_to_be_ivreg2 . display e(cmd) ivreg2 . display e(inexog) // NOW, IT WORKS 31b.turn 32.turn 33.turn 34.turn 35.turn 36.turn 37.turn 38.turn 39.turn 40.turn 41.turn 42.t > urn 43.turn 44.turn 45.turn 46.turn 48.turn 51.turn headroom .
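A minimal sketch of why the first attempt fails and one way around it: a local macro defined interactively is not visible inside the program (locals are scoped), so the list can be passed to the program as an argument instead. The argument-based version below is an illustrative variant, not the original code.
Code:
* Sketch only; the program now receives the variable list as its arguments.
cap prog drop pretend_to_be_ivreg2
prog pretend_to_be_ivreg2, eclass
    ereturn local cmd "ivreg2"
    ereturn local inexog "`0'"    // `0' holds everything typed after the command name
end

sysuse auto, clear
qui ivreg2 price (weight=length) i.turn headroom, cluster(turn) small
local x "`e(inexog)'"
qui ivreghdfe price (weight=length) headroom, absorb(turn) cluster(turn)
pretend_to_be_ivreg2 `x'
display e(cmd)
display e(inexog)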
Discrete Choice Experiment - Fractional factorial design
Dear colleagues,
As a PhD student, I want to set up a Discrete Choice Experiment (DCE) in a consumer survey. The main objective is to measure consumers' willingness to pay for multi-labelled seafood products. Following focus groups and expert opinion, my attributes and their corresponding levels will be the following:
| Attributes | Level |
| Price | Market price; +10%; + 20%; +30% |
| Labels | None; Ecolabel; Health Allegations; Both labels |
| Production Method | Wild cod; Raised salmon; Wild seabass; raised seabass |
My full factorial design is composed of 64 attribute combinations (4*4*4). To generate it, I used the following code:
Code:
matrix attrib = 4,4,4
genfact, levels(attrib)
list, separator(4)
I obtained a table which presents all the combinations.
I therefore have to select a fractional factorial design for my survey. To do so, I used the following code:
Code:
rename x1 price
rename x2 labels
rename x3 species
recode price (1=0) (2=1) (3=2) (4=3)
recode labels (1=0) (2=1) (3=2) (4=3)
recode species (1=0) (2=1) (3=2) (4=3)
******* D-efficiency test *******
matrix b = J(1,9,0)
dcreate i.price i.labels i.species, nalt(2) nset(12) bmat(b)
I aim to propose to each individual in my survey a choice between two options and one opt-out option. Each individual will have to do this three times.
I don't understand how to select the fractional design with the highest D-efficiency. Each time I run the last two lines of my code, the D-efficiency score changes. I am confused about the procedure to follow. And I haven't introduced the opt-out into my design yet...
Sorry for these questions, I am just starting to get interested in these issues of DCE.
If you need more information to clarify my position, I remain available.
Best Regards
Jean-François DEWALS
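A minimal sketch of one explanation and remedy: the design search starts from random candidate designs, so the reported D-efficiency differs across runs; setting the random-number seed makes a run reproducible, and re-running with different seeds lets you keep the design with the highest D-efficiency.
Code:
* Sketch only; the seed value is arbitrary.
set seed 12345
matrix b = J(1,9,0)
dcreate i.price i.labels i.species, nalt(2) nset(12) bmat(b)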
Tuesday, January 24, 2023
10000 posts by William Lisowski
Congratulations William Lisowski on reaching the milestone of 10,000 posts on Statalist! Your contributions have greatly enriched the community by providing valuable insights, helpful advice, and interesting perspectives on various topics. Your dedication and willingness to share your knowledge and experience have made the forum a better place for everyone. Thank you for being an active and engaged member of the community, and we look forward to reading your future contributions. Your hard work is appreciated.
Need some help dealing with duplicates
Hi all, I need some help with dealing with duplicates in my data.
I have something like this:
| var1 | var2 | var3 | var4 |
| a | x | Red | 1 |
| a | x | Green | 2 |
| b | y | Red | 3 |
| b | y | Green | 4 |
This is a simplified version of my data; most observations have only one unique var1-var2 pair, which I intend to use for merging data. But for some, there are different types of var3, which separates them.
What I want to achieve is to add up var4 when var1 and var2 are the same, and drop every row where var3 != Red.
So the effect I want is:
| var1 | var2 | var3 | var4 |
| a | x | Red | 3 |
| b | y | Red | 7 |
I have a feeling that I should use a loop, but I am not sure where to start.
Can anyone help please?
Thank you sooooo much!!!
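A minimal sketch using the posted variable names: total var4 within each var1-var2 pair, then keep only the Red rows and carry the total.
Code:
* Sum var4 within var1-var2, keep the Red row, and store the total in var4.
bysort var1 var2: egen var4_total = total(var4)
keep if var3 == "Red"
replace var4 = var4_total
drop var4_total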
Mediation effects with MLM?
I'm trying to do a somewhat tricky mediation test. Basically, I'm looking at the relationship between parental education (4 levels) and the risk of self-harm (binary) in college students, and I want to know if the "academic rank" of the student's college explains part of this association. For those curious, academic rank is measured by the NCES-Barron's Admissions Competitiveness Index. Here, I'm testing mediation by comparing the bottom category (High school degree) to the top (Advanced degree).
I need to adjust for clustering by school but also account for the fact that the mediator is school-specific, not student-specific. Here's what I originally tried:
Code:
svyset schoolID [pw=nrweight] // students are nested in schools, and nrweight adjusts for nonresponse based on each school's demographics
local cov "i.survey_year i.gender i.race age_10cat"
* in the following, sib_any is the outcome, and it stands for self-injurious behavior
* inst_acarank is a 6-level ordinal measure of academic rank (basically, how hard it is to get into that university)
* maxed4_merged is a 4-level measure of parental education
* below, subpop(deg_bach) restricts my test to bachelor's degree students
gen sib_any2 = sib_any
svy, subpop(deg_bach): gsem (sib_any <- i.maxed4_merged `cov', logit) ///
(sib_any2 <- i.maxed4_merged `cov' i.inst_acarank, logit) ///
if inst_acarank!=.
margins, dydx(4.maxed4_merged) post
mlincom 1-2
But I'm thinking I need to do something that actually addresses the multilevel nature of the data. Something like:
Code:
eststo m2: mixed sib_any i.maxed4_merged i.gender i.race i.survey_year age_10cat [pw=nrweight] ///
|| inst_acarank: || schoolID_new:
eststo m1: mixed sib_any i.maxed4_merged i.gender i.race i.survey_year age_10cat [pw=nrweight] || ///
schoolID_new: if e(sample)
... and then compare the coefficient on 4.maxed4_merged (High school vs. Advanced degree). Btw, I don't know if I need to say
Code:
,vce(cluster schoolID)
for these mixed effects models.
What do you think is the best approach? I'm open to all possibilities.
Very large T- Statistics
Hi Stata Community, I'm running some regressions using a fixed-effects methodology. The issue is that the output of my regression shows very large t-statistics for some of the variables. I would like your thoughts: is it a good sign to have t-statistics above 100, or is there an issue that I need to fix? I also noticed that the F test is not reported and only a dot is shown. Could anyone explain what the issue is and how it can be solved?
Thanks a lot for your help in advance.
Code:
xtreg data38350 w_l_data4024 w_l_asset w_l_net_int_margin w_l_fee_income wholefund w_l_non_int_exp w_l_mktshare RESTINC_rest RESTDEC_rest, fe cluster(Country)
Code:
Fixed-effects (within) regression Number of obs = 52406
Group variable: Bvdidnum Number of groups = 8897
R-sq: Within = 0.0314 Obs per group: min = 1
Between = 0.0540 avg = 5.9
Overall = 0.0462 max = 17
F(8,10) = .
corr(u_i, Xb) = -0.4073 Prob > F = .
(Std. err. adjusted for 11 clusters in Country)
-------------------------------------------------------------------------------
| Robust
data38350 | Coefficient std. err. t P>|t| [95% conf. interval]
--------------+----------------------------------------------------------------
w_l_data4024 | .3722737 .001208 308.18 0.000 .3695822 .3749653
w_l_asset | -4.095435 .0018257 -2243.23 0.000 -4.099503 -4.091367
w_l_net_int~n | -.0930606 .0012431 -74.86 0.000 -.0958303 -.0902909
w_l_fee_inc~e | .2234046 .0134468 16.61 0.000 .1934434 .2533658
wholefund | -1.636517 .8602466 -1.90 0.086 -3.553265 .2802323
w_l_non_int~p | .0662385 .0014957 44.29 0.000 .0629059 .069571
w_l_mktshare | .4508283 .0805129 5.60 0.000 .2714344 .6302223
RESTINC_rest | .0947936 .0000537 1765.90 0.000 .094674 .0949132
RESTDEC_rest | .0178919 .0015936 11.23 0.000 .0143413 .0214426
_cons | 32.68893 .0093108 3510.87 0.000 32.66818 32.70967
--------------+----------------------------------------------------------------
sigma_u | 9.7930902
sigma_e | 4.4650662
rho | .82789564 (fraction of variance due to u_i)
-------------------------------------------------------------------------------
Calculate economic significance for censored (Tobit) regression with multiple imputation
Hi,
I'm trying to calculate economic significance from a censored (Tobit) regression with multiple imputation according to this definition:
"Economic significance is indicated by the predicted change in the dependent variable from a movement from the tenth to the ninetieth percentile in the independent variable as a percentage of the mean of the dependent variable".
So far I have the following code (with y being my dependent variable, and x being my independent variable(s)):
mi register y x
mi impute mvn y x, add(10)
mi estimate, cmdok post: tobit y x, ll(0) ul(1) vce(robust)
I have tried using the following code (which works without mi estimate) immediately after the above:
sum y, meanonly
local y=r(mean)
margins, expression(predict(ystar(0,1))/`y_m') at((p10) x) at((p90) x) contrast(atcontrast(ar._at) marginswithin) atmeans
But I get the error:
e(sample) does not identify the estimation sample
Any help would be very much appreciated! :-)
Best regards,
Valdemar
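Two points that may be worth checking, shown in a minimal sketch below: the local is created as `y' but referenced as `y_m' in the margins expression, and -margins- does not directly support -mi estimate- results, so the user-written -mimrgns- command from SSC is one possible route (assuming here that it accepts the same options as -margins-).
Code:
* Sketch only; mimrgns is assumed to accept the usual margins options.
ssc install mimrgns
sum y, meanonly
local y_m = r(mean)
mi estimate, cmdok: tobit y x, ll(0) ul(1) vce(robust)
mimrgns, expression(predict(ystar(0,1))/`y_m') at((p10) x) at((p90) x) ///
    contrast(atcontrast(ar._at) marginswithin) atmeans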
creating table following multiple imputation with svy suite
Hello.
Is it possible to create tables using "collect" series of commands using mi estimate:svy, subpop (subpopulation): logit x a b c d?
I use the syntax below without the imputation and it works; however, it did not generate the table I wanted when I used it after having done multiple imputation.
Thank you.
"quietly: collect _r_b _r_se, tag(model[(1)]):svy, subpop (lostemp2008_V7Chp3): logit lostemp0810 i.age5cat08 female i.race4 i.educ4 married08 r10iwendm"
Note: I have 5 models, so I do this command 4 more time for the other models and at the end I get a single table with columns for each model.
Monday, January 23, 2023
Computing a cumulative score
Hi,
I have the following sample of data. I have an indicator of 1 and 0 in one column, and I want to compute the sum of consecutive 1s (i.e. the cumulative column) for each ID. How can I do that in Stata?
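A minimal sketch, assuming the data are identified by id, ordered by a time variable t, and the 0/1 indicator is called flag (all names illustrative): -replace- works sequentially, so the previous observation's updated value can be reused.
Code:
* Sketch only; id, t and flag are illustrative names.
bysort id (t): gen cum = flag
bysort id (t): replace cum = cum[_n-1] + 1 if flag == 1 & flag[_n-1] == 1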
Adding Number of Observations Label to Stacked Bar Chart
Hi all,
I'm trying to make a stacked bar chart that shows the number of observations within each group. My current code looks something like this:
graph bar lab risk gov, over(lt) ytitle("Proportions") ylabel(,labsize(small)) stack
But I would like there to be a label above each bar that shows the number of observations for that particular bar. Using the following lines:
sort lt
by lt: count
I see that there are 679, 678, and 680 for each bar. Is there a way to label this in the chart?
Many thanks,
Cora
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(lab gov risk lt)
.1700374 .7782796 . 1
.43530655 .2523498 -.00009400777 1
.2423384 .035211213 -.27288932 2
.14841408 1.0353048 -.003870568 1
.3139711 .01862362 -.3500698 2
.28238416 .07640956 -.2502 2
.4400993 .0670787 -.00056580675 1
.4450542 .23286644 . 1
.12029877 .06753071 -.3343936 1
.5082648 .008052898 -.0031476584 1
.51327187 .04179808 -.001943114 2
.4545794 .1239085 -.24128672 1
.5322632 .013264816 . 1
.4979768 .023063127 -.1731898 1
.3181508 .06416951 -.22944584 3
.13223591 .7514665 -.08747536 2
.243767 0 -.3631753 2
.3984841 .09781377 -.25557485 2
.2594303 .6841453 -.02333964 1
.0002424968 1.3886087 -.00016629793 1
.00020155523 1.4744323 . 1
.0002928973 1.482304 . 1
.25241518 .7487788 . 1
.5009559 .09423032 . 1
.22404113 .17477986 -.3677354 2
.24345666 0 -.3581009 1
.1634283 .10225035 -.3377799 2
.50319344 .06749732 -.0008835515 1
.3006188 .008035306 -.26575184 3
.12657124 .6291607 -.28722692 1
.53362954 0 . 1
.4184705 .01314309 -.2038869 1
.5343853 .015596353 . 1
.3781637 0 -.3169422 1
.50177145 .08583656 . 1
.3970145 .27546874 -.1764103 1
.025122266 .7652866 -.0876677 1
.5389655 0 . 1
.4668964 .10456214 -.17507926 1
.53362954 0 . 1
.30558315 .05589489 -.1812089 2
.04403041 .3396863 -.3529391 2
.3763588 .4286624 . 1
.2673638 .7169658 . 1
.26665908 .4362247 -.22927435 1
.264844 .10131685 -.2486496 2
.3497822 .03381969 -.12766492 2
.4957976 .05695531 . 1
.454442 0 . 1
.4964611 0 -.1172438 2
.14419264 .21328416 -.2944041 3
.017567163 1.1927178 -.12055212 1
.23156355 0 -.3671024 1
.24158622 .26593 -.3394998 2
.4820203 .04930139 -.06837179 1
.54 0 . 1
.2487957 .21417157 -.13990873 1
.5003201 .036252417 -.1072251 1
.04380105 1.280126 . 1
.3979877 .01832301 -.04471144 1
.18135896 .5702492 -.006220731 1
.18662126 0 -.1729077 3
.1840557 .50763726 -.27756977 1
.41636425 0 -.25866598 2
.4459521 0 -.02916088 1
.3315545 0 -.3131432 1
.10622215 .5709492 . 1
.2908345 .031559385 -.15997745 2
.4678009 .04758657 -.15388933 1
.4751051 .10665444 . 1
.09704377 .4514135 -.3623746 2
.27230695 .4651364 -.05074546 1
.5260047 .03887583 . 1
.43815535 .0918106 -.23629928 1
.3128505 .2115928 -.12426014 1
.4359849 .0975452 -.13636968 1
.4457981 .036428157 -.17247716 1
.5343853 .015596353 . 1
.4035726 .1919202 -.19471768 1
.5160003 .01278616 -.005513604 2
.3007862 .07709143 -.27725396 1
.4477225 .25632626 . 1
.2314591 .0040478515 -.29092264 2
.397527 .006521621 -.20152047 1
.4879967 0 . 1
.2080003 .9212269 . 1
.54 0 . 1
.4816735 .13781506 -.02348751 1
.4440349 0 -.08713304 1
.4171881 .16578023 -.14507967 1
.4358414 .2036193 . 1
.42539865 .04657824 -.12620062 2
.3682399 .4724361 . 1
.3041541 .2263215 -.2994768 1
.2044367 .7129219 -.0004181319 1
.44044185 .13565966 -.15355474 1
.3315545 0 -.3131432 1
.2187026 .10570543 -.3050115 1
.47968 .1424138 . 1
.367069 .28755468 -.2607651 1
end
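A minimal sketch of one workaround for the question above, using the posted variables: fold each group's number of observations into the value label of lt so the count appears under each bar (placing free-floating text above stacked bars is harder with graph bar).
Code:
* Sketch only; builds labels like "1 (N=679)" for each level of lt.
capture label drop ltlab
levelsof lt, local(levels)
foreach l of local levels {
    qui count if lt == `l'
    label define ltlab `l' "`l' (N=`r(N)')", modify
}
label values lt ltlab
graph bar lab risk gov, over(lt) stack ytitle("Proportions") ylabel(, labsize(small))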
Dates in Stata
I am trying to convert some dates to quarters. I am doing this by going into the Variables Manager and changing the format to quarters, as seen in the screenshot I attached. When I do that, though, it is still in the Stata format (e.g. 6973-Q1) instead of a format I can work with (e.g. 2021q1). What am I doing wrong?
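A minimal sketch of the likely fix, assuming the variable (here called mydate, an illustrative name) currently holds a daily %td date: changing the display format alone does not convert the underlying value, so create a true quarterly date first.
Code:
* Sketch only; mydate is an illustrative name for the existing daily date.
gen qdate = qofd(mydate)
format qdate %tq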
Dynamic Panel Equation
Hello
I am trying to build a Dynamic Panel Equation with the following information
- Variables: Y, X, and C (controlling variable)
- Time t1, t2, t3, ..., tn
At any time t, I propose the following possible relationships. Let t=t2
X(t2) → Y(t2)
Y(t1) → Y(t2)
Y(t1) → X(t2)
Taking the information above, the model is:
Y(t) = a + b*Y(t-1) + c*X(t) + d*X(t-1) + Σθ*C(t)
Where:
- C(t): are controlling variables at time t
I have two questions:
1. Is the model above correct?
2. If the model above is correct then the Instrument Variables (according to Anderson and Hisao 1981) is:
- IV1: Y(t-1) - Y(t-2)
- IV2: X(t-1) - X(t-2)
Is this correct?
Thank you
Phan
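A minimal sketch of an Anderson-Hsiao style first-difference estimation for a model of this form, assuming the panel is declared with id and t, that C1 and C2 stand in for the controls, and that X and the controls are treated as exogenous for simplicity (all names illustrative): the differenced lagged dependent variable is instrumented with the second lag of the level.
Code:
* Sketch only; id, t, Y, X, C1 and C2 are illustrative names.
xtset id t
ivregress 2sls D.Y (LD.Y = L2.Y) D.X LD.X D.C1 D.C2, vce(cluster id)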
Sorting data
Hello,
I have the following data below:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double(ID start end) 2 18080 18747 2 20702 21335 2 18748 23011 2 21336 23011 end format %td start format %td end
Within ID’s, I want to properly order the data so that the previous end date is one day less than the next start date. The data should look like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double(ID start end) 2 18080 18747 2 18748 23011 2 20702 21335 2 21336 23011 end format %td start format %td end
If this cannot be achieved within ID, then I would like to create separate groups for those cases.
I would appreciate any assistance.
Thanks,
Anoush K.
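A minimal sketch using the posted variable names: sorting on the start date within ID reproduces the desired order for the example shown, and a running group counter splits an ID into separate groups wherever an end date is not one day before the next start date.
Code:
* Sort within ID and flag breaks in the end/start chain.
sort ID start
by ID: gen byte gap = start != end[_n-1] + 1 if _n > 1
by ID: gen group = sum(cond(_n == 1, 1, gap))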
Sunday, January 22, 2023
Graphing Functions
Hello, any idea on how to graph the following function:
y=35+2x^2-ln(x*3)
I tried this:
twoway function y=35+2x^2-ln(x^3), range (0 1)
error in expression: 35+2x^2-ln(x^3)
Thanks in advance
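A minimal sketch of the fix: Stata expressions need explicit multiplication signs, so 2x^2 must be written as 2*x^2; the lower end of the range is nudged above zero here because ln(x^3) is undefined at x = 0.
Code:
* Sketch only; range starts at 0.01 to keep ln(x^3) defined.
twoway function y = 35 + 2*x^2 - ln(x^3), range(0.01 1)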
stpm2cr - Maximum Number of Iterations Reached
Hi everyone,
I'm using -stpm2cr- to model the effect of a treatment on a set of cardiovascular outcomes over a three year period. I can plot the cumulative incidence figures without any trouble. However, when I attempt to use the same predict code to obtain point estimates and confidence intervals of time points along the way (1 month, 1 year, 2 years, and 3 years), I occasionally get an error that says:
-Maximum number of iterations exceeded.-
Once the error hits, the code stops and I can do no more.
I can't find any reference to this in the documentation and Google isn't turning up much. Any advice for how to deal with this? The data must be in there somewhere in order for plots to be generated, I'd welcome suggestions for how to address this.
Thanks!
Line graph of percent of frequencies within categories instead of bar graph
Dear statalist,
I have a dataset of skin cancer cases over ten years. I want to plot a line graph (instead of a bar graph) of the percent of frequencies within categories. Unfortunately, I cannot find this in Stata. Any help would be appreciated.
Thank you,
Abdullah
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int year 2020 2017 2019 2016 2019 2020 2019 2018 2020 2015 end label values year labels5 label def labels5 2015 "2015", modify label def labels5 2016 "2016", modify label def labels5 2017 "2017", modify label def labels5 2018 "2018", modify label def labels5 2019 "2019", modify label def labels5 2020 "2020", modify
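A minimal sketch using the posted year variable: compute the percentage of observations falling in each year and connect the points with a line.
Code:
* Sketch only; contract replaces the data in memory, so use preserve/restore.
preserve
contract year, percent(pct)
twoway connected pct year, ytitle("Percent") xtitle("Year")
restore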
Interpreting Log transformed ITSA model
This question might not be so much as a programming question as an interpretation question.
I have a sample dataset as follows:
My question is how to interpret the coefficients on the log transformed itsa for _t (pre-trend), _b[_t]+_b[_x_t2022m7] (post-trend), and _x_t2022m7 (change in trend).
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte id str8 date byte month_num int year byte seq float(oc_per10000 ec_per10000) byte period 1 "Jan 2022" 1 2022 1 632.7574 4.2058268 0 1 "Feb 2022" 2 2022 2 572.58765 4.0631905 0 1 "Mar 2022" 3 2022 3 633.962 4.245679 0 1 "Apr 2022" 4 2022 4 605.58405 3.6853316 0 1 "May 2022" 5 2022 5 616.6671 4.1147165 0 1 "Jun 2022" . . . . . . 1 "Jul 2022" 7 2022 6 596.7175 5.263053 1 1 "Aug 2022" 8 2022 7 620.8705 5.038163 1 1 "Sep 2022" 9 2022 8 571.1929 4.3138437 1 1 "Oct 2022" 10 2022 9 577.1292 4.168524 1 1 "Nov 2022" 11 2022 10 560.5041 4.1477256 1 2 "Jan 2022" 1 2022 1 632.7574 4.2058268 0 2 "Feb 2022" 2 2022 2 572.58765 4.0631905 0 2 "Mar 2022" 3 2022 3 633.962 4.245679 0 2 "Apr 2022" 4 2022 4 605.58405 3.6853316 0 2 "May 2022" 5 2022 5 616.6671 4.1147165 0 2 "Jun 2022" . . . . . . 2 "Jul 2022" 7 2022 6 596.7175 5.263053 1 2 "Aug 2022" 8 2022 7 620.8705 5.038163 1 2 "Sep 2022" 9 2022 8 571.1929 4.3138437 1 2 "Oct 2022" 10 2022 9 577.1292 4.168524 1 2 "Nov 2022" 11 2022 10 560.5041 4.1477256 1 end
Code:
* Declare data to be time-series data
tsset id monthly
* sort by ID and time
sort id monthly
di monthly("2022m7","YM")
*log for percentage point change;
gen logec_per10000=ln(ec_per10000)
gen logoc_per10000=ln(oc_per10000)
*ITSA initial model;
itsa ec_per10000, single treat(1) trperiod(750) posttrend replace
*test for autocorrelation;
actest ,lags(9)
*ITSA final model;
itsa ec_per10000, single treat(1) trperiod(750) posttrend replace force
*ITSA Log;
itsa logec_per10000, single treat(1) trperiod(750) posttrend fig replace force
- When you log transform the dependent variable, is taking the natural log the appropriate method?
- Do I have to further exponentiate the coefficients to interpret it as a monthly growth rate? (i.e., (exp(coef)-1)*100)?
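A minimal sketch of turning the log-scale ITSA coefficients into monthly growth rates, assuming the final itsa model above is the active estimate: a coefficient b on the log scale corresponds to a (exp(b)-1)*100 percent change per month, and the coefficient names follow those mentioned in the post.
Code:
* Sketch only; coefficient names are those referenced in the question.
lincom _t                                        // pre-period trend, log points per month
lincom _t + _x_t2022m7                           // post-period trend on the log scale
display (exp(_b[_t]) - 1)*100                    // pre-period monthly growth rate, percent
display (exp(_b[_t] + _b[_x_t2022m7]) - 1)*100   // post-period monthly growth rate, percent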
Why line, bar, and pie plot dominate publications
Dear Stata users,
There are so many plot types in statistical world, line, bar, pie, box, histogram, violin, radar, mosaic, and diagnose series. And more and more new types are being created, heat plot, joy plot, alluvial plot, upset plot, just to name a few. However, line, bar, and pie plot still dominate publications, whether academic papers or popular research reports, especially in social science. Why?
Regression output featuring a period for one variable
I'm using logit regression to link state-level policies with an individual's probability of reemployment. As controls, I feature a handful of state-by-month labor market data. As in prior work that's used the same data I'm working with, and that's explored the same outcome, the model also controls for squared and cubed values of these labor market slack measures: so unemployment rate, expressed as a proportion (e.g., 0.065); then unemployment rate^2 (0.004225); and unemployment rate^3 (0.000275). Some of the non-squared or -cubed proportions are rather small; which means the squared and cubed values are even smaller. I assume that's why the regression output associated with those variables shows a dot or period, as below (see INITRATE3). Is there anything to be concerned about here? Should I remove the variable for which a dot is showing up?
Code:
ur_sa | 8.80e-10 1.12e-08 -1.64 0.100 1.38e-20 56.01906
ur2_sa | 2.56e+64 2.57e+66 1.47 0.141 4.65e-22 1.4e+150
ur3_sa | 5.2e-175 1.3e-172 -1.56 0.119 0 1.28e+45
iur | .1303132 .9846071 -0.27 0.787 4.83e-08 351892
iur2 | 1.09e+14 8.32e+15 0.42 0.673 7.56e-52 1.56e+79
iur3 | 3.96e-41 8.42e-39 -0.44 0.661 6.3e-222 2.5e+140
initrate | .0017314 .083949 -0.13 0.896 9.25e-45 3.24e+38
initrate2 | 1.1e-221 3.5e-218 -0.16 0.872 0 .
initrate3 | . . 0.19 0.849 0 .
empgrowth | 1.156185 .0878403 1.91 0.056 .9962262 1.341828
emp2 | .9770312 .0225311 -1.01 0.314 .9338542 1.022205
emp3 | 1.002295 .0017089 1.34 0.179 .9989514 1.00565
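A minimal sketch of one common remedy for the post above, using the posted variable names: with proportions, the cubed term is so close to zero that its coefficient becomes numerically unstable, so rescaling to percentage points before squaring and cubing usually avoids the dots in the output.
Code:
* Sketch only; rescale the proportion to percentage points before powering.
gen initrate_pct  = 100*initrate
gen initrate_pct2 = initrate_pct^2
gen initrate_pct3 = initrate_pct^3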
Filling in missing data from previous values?
Below I have included an example of a wide dataset in which children have ages reported at each wave (e.g. the wave 5 age is k5agebg). In some instances, age is set to .m, but we know that interviews are 2 years apart. How can I make the .m values on my age variables equal to the next value minus 2? In the example, k11agebg should be 35 on the first observation and 26 on the second; k10agebg would be 33 on the first observation and 24 on the second. I've done similar things with panel data, but am a bit stumped on solving this in the wide format.
Code:
. list hhidpn kidid k1age k2agebg k3agebg k4agebg k5agebg k6agebg k7agebg k8agebg k9agebg k10agebg k11agebg k12agebg kabyearbg if hhidpn == 920672010 , sepby(hhidpn)
+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| hhidpn kidid k1age k2agebg k3agebg k4agebg k5agebg k6agebg k7agebg k8agebg k9agebg k10agebg k11agebg k12agebg kabyea~g |
|------------------------------------------------------------------------------------------------------------------------------------------------------------|
128812. | 920672010 9206720101 . . . . . . . . . .m .m 37 1977 |
128813. | 920672010 9206720102 . . . . . . . . . .m .m 28 1986 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long hhidpn str10 kidid byte(k1age k2agebg k3agebg k4agebg k5agebg k6agebg k7agebg k8agebg k9agebg k10agebg k11agebg k12agebg) int kabyearbg 920672010 "9206720101" . . . . . . . . . .m .m 37 1977 920672010 "9206720102" . . . . . . . . . .m .m 28 1986 end
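A minimal sketch using the posted variable names: walk backward from wave 11 to wave 2 and replace an extended missing (.m) age with the following wave's age minus 2, so chains of consecutive .m values are filled from the right.
Code:
* Sketch only; wave 1 uses k1age rather than k1agebg, so the loop stops at 2.
forvalues w = 11(-1)2 {
    local next = `w' + 1
    replace k`w'agebg = k`next'agebg - 2 if k`w'agebg == .m & k`next'agebg < .
}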
Table1 command issue
Hello, when I write this command: table1, by (pes) vars(edad conts \ par_cat cat \ hta cat \ preprev cat \ diabm cat \ diabg cat \ imc conts) \ gadg conts \ edadpar conts ), Stata responds "varlist required" and I don't understand why. All those variables have already been created.
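A minimal sketch of the likely fix: in the command as typed, the first closing parenthesis ends the vars() option right after imc conts, leaving gadg and edadpar outside it, which is what triggers the "varlist required" error.
Code:
* Sketch only; all variables belong inside a single vars() option.
table1, by(pes) vars(edad conts \ par_cat cat \ hta cat \ preprev cat \ ///
    diabm cat \ diabg cat \ imc conts \ gadg conts \ edadpar conts)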
Regression analysis with panel data
Hello Everyone,
I have a panel dataset with approximately 5300 observations and I am analysing if a firm's ESG score has a significant influence on the firm's stock price including several control variables.
Therefore, I use the following regression command:
reghdfe prcc_f ESG_score epspx bkvlps size leverage rd market_to_book invest_a, absorb(Ticker Year) vce(robust)
Does this command make sense or how do I interpret it correctly (With vce(robust) I want to eliminate the problem of heteroskedasticity)?
Thank you very much in advance!
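A minimal sketch of a common variant of the command in the post, assuming Ticker identifies the firm: with repeated observations per firm, clustering the standard errors by firm is often preferred to plain heteroskedasticity-robust errors.
Code:
* Sketch only; same specification, clustered rather than robust SEs.
reghdfe prcc_f ESG_score epspx bkvlps size leverage rd market_to_book invest_a, ///
    absorb(Ticker Year) vce(cluster Ticker)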
cannot run GMM
Dear Statalist,
I am using Stata v.14. I have unbalanced panel data with T = 17 and N = 18. I mostly have reverse causality from 2 of my control variables (z1 and z2) and my main independent variable (M1). When I run two-step system GMM, the results change dramatically, eroding the significance of most, if not all, of the variables. However, the model is valid according to the AR(2), Sargan, and Hansen tests, as follows:
Code:
xtabond2 lny l.lny lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7, /// gmm (l.lny , lag(1 3) collapse) iv ( lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7 ) twostep cluster(country) nodiffsargan Dynamic panel-data estimation, two-step system GMM ------------------------------------------------------------------------------ Group variable: country Number of obs = 177 Time variable : time Number of groups = 17 Number of instruments = 17 Obs per group: min = 1 Wald chi2(13) = 1.40e+10 avg = 10.41 Prob > chi2 = 0.000 max = 16 (Std. Err. adjusted for clustering on country) ------------------------------------------------------------------------------ | Corrected lny | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- lny| L1. | 1.041343 .0277953 37.46 0.000 .9868654 1.095821 | lnz1 | .0005234 .0033601 0.16 0.876 -.0060623 .007109 x1| .0000164 .0000536 0.31 0.760 -.0000887 .0001214 x2| .0000259 .0001718 0.15 0.880 -.0003108 .0003625 x3| .0002729 .0005714 0.48 0.633 -.0008469 .0013927 x4| -.0000124 .0000428 -0.29 0.772 -.0000962 .0000714 x5| -.0001993 .0002734 -0.73 0.466 -.0007351 .0003365 m1| .0052374 .0120197 0.44 0.663 -.0183208 .0287955 m1sq | -.0018499 .0028094 -0.66 0.510 -.0073563 .0036564 m1_z2 | -.0002276 .0034992 -0.07 0.948 -.0070859 .0066308 z2| -.0019458 .0062422 -0.31 0.755 -.0141803 .0102886 x6| -.002183 .0010165 -2.15 0.032 -.0041753 -.0001908 x7| .0000406 .0000986 0.4 0.680 -.0001525 .0002338 _cons | -.1274088 .1413448 -0.90 0.367 -.4044396 .149622 ------------------------------------------------------------------------------ Instruments for first differences equation Standard D.(lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7) GMM-type (missing=0, separate instruments for each period unless collapsed) L(1/3).L.lny collapsed Instruments for levels equation Standard lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7 _cons GMM-type (missing=0, separate instruments for each period unless collapsed) D.L.lngini collapsed ------------------------------------------------------------------------------ Arellano-Bond test for AR(1) in first differences: z = -3.09 Pr > z = 0.002 Arellano-Bond test for AR(2) in first differences: z = -0.89 Pr > z = 0.375 ------------------------------------------------------------------------------ Sargan test of overid. restrictions: chi2(3) = 0.53 Prob > chi2 = 0.913 (Not robust, but not weakened by many instruments.) Hansen test of overid. restrictions: chi2(3) = 0.49 Prob > chi2 = 0.921 (Robust, but weakened by many instruments.)
I tried to put the endogenous variables in the gmm-style part; however, it does not change the significance much. I also tried changing the number of lags, but that did not help either. So what is wrong with my command, please?
What does "star" option in pwcorr mean?
Normally I use pwcorr to obtain a correlation matrix, and for such a command we have a star option.
From the description, they stated:
The star option in the pwcorr command in Stata is used to display a star (*) next to correlation coefficients that are significant at a certain level. The level of significance is determined by the user and is specified as an argument in the star option. For example, pwcorr, star(.05) will star all correlation coefficients that are significant at the 5% level or better. The option is used to quickly identify which correlation coefficients are statistically significant and which are not.
It is confusing to me. What does it mean for a correlation coefficient to be statistically significant?
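A minimal sketch with the built-in auto data: a starred coefficient is one for which the test of the null hypothesis that the true correlation is zero gives a p-value at or below the chosen level; adding the sig option prints those p-values so the stars can be checked directly.
Code:
* Sketch only; uses the shipped auto dataset.
sysuse auto, clear
pwcorr price mpg weight, sig star(.05)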
Saturday, January 21, 2023
Why does the same code run on one machine but fail with r(123) on another?
Hi all,
I suspected this problem previously, but I can confirm today that some of my code works on my school's machine but not on my personal machine. They share almost the same Stata setup and both are fully licensed.
The information for my school machine is: [screenshot attached in the original post]
And for my personal machine: [screenshot attached in the original post]
And when I run the code
Code:
did_imputation lnq_r id2 ndate nbanned_date, horizons(0/10) pretrend(10)
My personal computer returns r(123), while the machine from my school runs it successfully.
Could you please guide me to sort it out?
Thank you
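A minimal sketch of a first diagnostic step to run on both machines: compare the Stata release and the installed copy of did_imputation, since errors that appear on only one machine often come from running an older version of a user-written command or of Stata itself.
Code:
* Sketch only; run on both machines and compare the output.
version
about
which did_imputation
ado update did_imputation, update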