I sent Stata graphs as TIFF files to the editors of a journal, but they insist they would like them in a Microsoft Word-compatible format. Does anyone know another method of exporting a Stata .gph to a Word-compatible format?
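A hedged sketch of common workarounds (the filename myfig is hypothetical, and putdocx requires Stata 15 or newer): either export the graph to a format Word handles natively, or embed it in a .docx directly.
Code:
graph use myfig.gph
graph export myfig.png, width(2000) replace   // bitmap; Word can insert this
graph export myfig.emf, replace               // Windows metafile; scales well in Word (Windows only)
putdocx begin
putdocx image myfig.png
putdocx save myfig.docx, replace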
Saturday, October 31, 2020
How to calculate birth year effects?
Dear friends,
I hope to calculate birth-year effects. My code is similar to the one below,
but most dummies of i.cohort are omitted. Thank you for any advice.
Code:
reg income i.birthyear i.cohort, noconst
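One likely cause (an assumption, since the data are not shown): if cohort is a deterministic function of birthyear (e.g. cohorts are groups of birth years), the two sets of dummies are collinear and Stata drops most of i.cohort. A minimal check, using only base Stata:
Code:
* each birthyear should map to exactly one cohort if the two are nested
bysort birthyear cohort: gen byte tag = _n == 1
bysort birthyear: egen n_cohorts = total(tag)
summarize n_cohorts
* if n_cohorts is always 1, i.birthyear absorbs i.cohort and the omissions are expected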
Convert Data in Matrix Form to Long Format
Hello everyone,
I have data in matrix form, where there are several rows that should ideally be independent columns. For example, I have something like this:
If I had one header row on top, I would reshape it to long using id2, but I need all of the information on top of the matrix. I wonder if somebody could help me figure this out.
Thanks
| id2 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 |
| name2 | AA | BB | CC | DD | EE | FF | GG | HH |
| location2 | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 |
| id1 | name1 | location1 |
| 1 | A | l1 | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 |
| 2 | B | l2 | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 | b9 |
| 3 | C | l3 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 |
| 4 | D | l4 | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 |
| 5 | E | l5 | e1 | e2 | e3 | e4 | e5 | e6 | e7 | e8 | e9 |
| 6 | F | l6 | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 |
| 7 | G | l7 | g1 | g2 | g3 | g4 | g5 | g6 | g7 | g8 | g9 |
| 8 | H | l8 | h1 | h2 | h3 | h4 | h5 | h6 | h7 | h8 | h9 |
| 9 | I | l9 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
| 10 | J | l10 | j1 | j2 | j3 | j4 | j5 | j6 | j7 | j8 | j9 |
| 11 | K | l11 | k1 | k2 | k3 | k4 | k5 | k6 | k7 | k8 | k9 |
| 12 | L | l12 | l1 | l2 | l3 | l4 | l5 | l6 | l7 | l8 | l9 |
| 13 | M | l13 | m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 |
Thanks
Store coefficients from different data sets
Hi Statalist,
Could anyone help me, please? I want to plot coefficients from regressions on different datasets. For example, suppose I use webuse auto and run an estimation, then open another dataset with webuse lifeexp and run another model. How can I store both results and plot the coefficients in a graph with confidence intervals, please?
Many thanks.
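A sketch using estimates store together with the community-contributed coefplot (ssc install coefplot); the two models here are arbitrary examples, not a recommendation:
Code:
webuse auto, clear
regress price mpg weight
estimates store m_auto
webuse lifeexp, clear
regress lexp gnppc safewater
estimates store m_lifeexp
* plot both stored results with 95% confidence intervals
coefplot m_auto m_lifeexp, drop(_cons) xline(0)
Stored estimates survive opening a new dataset, so nothing else is needed between the two models.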
[Panel Data] How can I study the way a variable in the past affected the drop in a different variable in the future?
All the values that I have for the variable bondratio (amount of company debt that is in the form of bonds) are for the years 2017-2019, but I want to study how this variable affected the impact of the COVID pandemic on stock prices in 2020 (i.e. if companies with a higher/lower bondratio were more/less resilient to the crisis).
Here is an example of my data for three different stocks:
PERMNO is the stock id
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double PERMNO float month double stock_price float bondratio 13688 684 61.88999938964844 . 13688 685 66.75 . 13688 686 66.36000061035156 .8182862 13688 687 67.05000305175781 . 13688 688 68.37999725341797 . 13688 689 66.37000274658203 .8085532 13688 690 67.69000244140625 . 13688 691 70.37999725341797 . 13688 692 68.08999633789063 .8222454 13688 693 57.77000045776367 . 13688 694 54.2400016784668 . 13688 695 44.83000183105469 .7425898 13688 696 42.43000030517578 . 13688 697 41.09000015258789 . 13688 698 43.93000030517578 .7249041 13688 699 46.099998474121094 . 13688 700 43.33000183105469 . 13688 701 42.560001373291016 .8167768 13688 702 43.08000183105469 . 13688 703 46.18000030517578 . 13688 704 46.0099983215332 .8127667 13688 705 46.810001373291016 . 13688 706 26.3799991607666 . 13688 707 23.75 .6926678 13688 708 13 . 13688 709 17.030000686645508 . 13688 710 17.799999237060547 4.933464 13688 711 22.520000457763672 . 13688 712 17.100000381469727 . 13688 713 22.920000076293945 3.659187 13688 714 18.1299991607666 . 13688 715 10.449999809265137 . 13688 716 10 . 13688 717 6.170000076293945 . 13688 718 7.460000038146973 . 13688 719 10.869999885559082 . 13688 720 15.21 . 13688 721 15.5 . 13688 722 8.99 . 13688 723 10.64 . 13688 724 11.86 . 13688 725 8.87 . 13688 726 9.35 . 13688 727 9.26 . 13712 684 13.670000076293945 . 13712 685 16.920000076293945 . 13712 686 15.949999809265137 . 13712 687 15.260000228881836 . 13712 688 16.229999542236328 . 13712 689 15.729999542236328 . 13712 690 15.960000038146973 . 13712 691 15.739999771118164 . 13712 692 16.299999237060547 . 13712 693 15.739999771118164 . 13712 694 16.690000534057617 . 13712 695 17.25 .9295753 13712 696 19.31999969482422 . 13712 697 19.549999237060547 . 13712 698 19.139999389648438 .9316249 13712 699 18.860000610351563 . 13712 700 20.690000534057617 . 13712 701 19.579999923706055 .9333118 13712 702 20.760000228881836 . 
13712 703 20.829999923706055 . 13712 704 22.3799991607666 .9333699 13712 705 19.719999313354492 . 13712 706 19.399999618530273 . 13712 707 16.6200008392334 .9368091 13712 708 16.31999969482422 . 13712 709 16.3799991607666 . 13712 710 15.899999618530273 .9358506 13712 711 15.319999694824219 . 13712 712 13.479999542236328 . 13712 713 14.260000228881836 .9378165 13712 714 11.329999923706055 . 13712 715 11.720000267028809 . 13712 716 13.989999771118164 . 13712 717 12.260000228881836 . 13712 718 12.6899995803833 . 13712 719 12.609999656677246 . 13712 720 10.14 . 13712 721 9.4 . 13712 722 9.24 . 13712 723 8.55 . 13712 724 7.84 . 13712 725 7.82 . 13712 726 7.4 . 13712 727 7.57 . 13714 684 12.479999542236328 . 13714 685 12.9399995803833 . 13714 686 13.800000190734863 .171748 13714 687 14.0600004196167 . 13714 688 14.210000038146973 . 13714 689 13.369999885559082 .17461425 13714 690 13.34000015258789 . 13714 691 14.079999923706055 . 13714 692 14.75 .18139043 13714 693 14.319999694824219 . 13714 694 14.09000015258789 . 13714 695 13.420000076293945 .18119723 13714 696 12.989999771118164 . 13714 697 11.010000228881836 . 13714 698 12.420000076293945 .1810034 13714 699 12.890000343322754 . 13714 700 14.319999694824219 . 13714 701 14.510000228881836 .15326667 13714 702 14.539999961853027 . 13714 703 13.859999656677246 . 13714 704 13.899999618530273 .16861856 13714 705 12.4399995803833 . 13714 706 13.149999618530273 . 13714 707 12.720000267028809 . 13714 708 14.100000381469727 . 13714 709 13.5 . 13714 710 14.199999809265137 . 13714 711 14.5600004196167 . 13714 712 14.550000190734863 . 13714 713 13.75 . 13714 714 14.020000457763672 . 13714 715 13.300000190734863 . 13714 716 13.930000305175781 . 13714 717 13.949999809265137 . 13714 718 14.34000015258789 . 13714 719 13.699999809265137 . 13714 720 13.89 . 13714 721 12.57 . 13714 722 7.08 . 13714 723 9.43 . 13714 724 9.76 . 13714 725 10.3 . 13714 726 9.68 . 13714 727 10.71 . end format %tm month
The months 720+ correspond to January 2020 and onwards
Any help will be much appreciated, as I am really scratching my head over this one!
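One possible setup (a sketch, not the only reasonable design): collapse bondratio to a firm-level pre-2020 average, define a post-COVID indicator, and interact the two. Variable names follow the dataex excerpt; the modelling choice itself is an assumption.
Code:
bysort PERMNO: egen pre_bondratio = mean(cond(month < 720, bondratio, .))
gen byte post = month >= 720
* does the pre-crisis bond ratio moderate the 2020 price drop?
regress stock_price c.pre_bondratio##i.post, vce(cluster PERMNO)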
r(608) error in loop
Hi!
I'm trying to create groups in my data: for every year and district, I create a group that includes everyone within 2.5 years of that year in the same district. To do this, I wrote a loop that takes every year-district combination, keeps only the observations I want, and appends them. The code works up to a certain point and then suddenly stops working. Here is my code:
keep person_id year district
duplicates drop
preserve
keep if _n>=1
gen group=.
save "file_1.dta", replace
restore
local i=1
levelsof district, local(lev_1)
levelsof year, local(lev_2)
foreach l_1 of local lev_1 {
    foreach l_2 of local lev_2 {
        preserve
        keep if abs(year-`l_2')<=3 & district==`l_1'
        keep person_id
        gen year=`l_2'
        gen district=`l_1'
        gen group=`i'
        append using "file_1.dta"
        save "file_1.dta", replace
        restore
        local i=`i'+1
    }
}
Thanks for any help!
Miranda
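For what it's worth, r(608) means a file is read-only or locked; this often happens when file_1.dta sits in a synced folder (Dropbox, OneDrive) that grabs the file between iterations. A sketch of the same loop writing to a tempfile instead (same logic, only the destination changes):
Code:
tempfile groupfile
preserve
gen group = .
save `groupfile'
restore
local i = 1
levelsof district, local(lev_1)
levelsof year, local(lev_2)
foreach l_1 of local lev_1 {
    foreach l_2 of local lev_2 {
        preserve
        keep if abs(year - `l_2') <= 3 & district == `l_1'
        keep person_id
        gen year = `l_2'
        gen district = `l_1'
        gen group = `i'
        append using `groupfile'
        save `groupfile', replace
        restore
        local ++i
    }
}
use `groupfile', clear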
Importing/Reformatting Data by ID
Hi Everyone,
I'm trying to merge about a dozen different lists that use the same ID scheme but are missing different subsets of entries. The data are by year, and I want every year in one .dta file. How can I import by ID, or reformat the columns, so that I get a dataset with the same ID list and just missing values where a year has no entry?
Here is a small sample of the data:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long(unitid v2 v3 v4 v5 v6 v7 v8) 100654 6106 100654 6001 100690 597 100654 5628 100663 21923 100663 20902 100724 5318 100663 18333 100706 9736 100690 675 100751 37663 100690 627 100724 4413 100706 9101 100760 1769 100706 7866 100751 38390 100724 4760 100830 4878 100724 5383 100760 1681 100751 38563 100858 28290 100751 37098 100812 3044 100760 1835 101028 1528 100760 1787 100830 5211 100812 3114 101143 1755 100812 3041 100858 30440 100830 4894 101161 4781 100830 4919 100937 1268 100858 29776 101189 3319 100858 27287 end
You can see some of the same ID values in the unitid columns. I want to line up the data by ID and fill in the gaps with the missing value '.'
Thank you
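Assuming each year's list is in its own .dta file keyed on unitid (the filenames here are hypothetical), merge builds exactly this: one row per unitid, with '.' wherever a year has no entry.
Code:
use enroll2004, clear                          // hypothetical filenames
merge 1:1 unitid using enroll2005, nogenerate
merge 1:1 unitid using enroll2006, nogenerate
* unmatched IDs get missing (.) in the columns from the file that lacked them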
Comparing IV Coefficients for Two Groups
Hi,
I am trying to determine whether there is a statistically significant difference between a single IV estimate computed for two separate groups:
My current code is as follows:
I want to test if the OCC_VP coefficient is equal or not between p207==1 and p207==2.
Code:
ivregress 2sls delta_x delta_p208a delta_HS (OCC_VP=OCC_VV) if p207==1 ivregress 2sls delta_x delta_p208a delta_HS (OCC_VP=OCC_VV) if p207==2
I have found that one solution would be suest; however, that command is not compatible with ivregress. Is there an alternative approach?
Thanks!
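One common workaround is a pooled model with interactions (a sketch; note that for a fully interacted model, equivalent to running the two regressions separately, you would also interact the exogenous controls with the group dummy):
Code:
gen byte g2 = (p207 == 2)
gen OCC_VP_g2 = OCC_VP * g2
gen OCC_VV_g2 = OCC_VV * g2
ivregress 2sls delta_x i.g2 delta_p208a delta_HS ///
    (OCC_VP OCC_VP_g2 = OCC_VV OCC_VV_g2), vce(robust)
* H0: the OCC_VP coefficient is the same in both groups
test OCC_VP_g2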
How to customize a plot axis with different scaling?
Is there any option with Stata that can achieve the same goal as this in R: labels = function(y) {paste0(y/1000000,'million')}?
Basically, I want to show big numbers on the y scale but displayed in millions. I cannot set the scale manually because I have to produce multiple plots.
Thanks!
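A sketch of building the labels programmatically so the same code works across plots (y and x are hypothetical variable names):
Code:
summarize y, meanonly
local labels
forvalues m = 0/`=ceil(r(max)/1e6)' {
    local labels `labels' `=`m'*1e6' "`m' million"
}
twoway line y x, ylabel(`labels', angle(0))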
Stata is turning out results very slowly. What can i do?
I have a dataset comprising about 30 million observations. When I try to execute even basic commands like save or use, Stata takes a long time to respond. I did not have this problem until recently: in the past I have comfortably crunched about 300k observations. Without having to change to a different system, what else can I do? Please help!
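With 30 million observations, a usual first step is to shrink the storage types and keep only the variables needed; a minimal sketch (the filename is hypothetical):
Code:
describe, short        // check the dataset's memory footprint
compress               // recast variables to the smallest safe storage types
save mydata, replace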
Using the foreach command to find a specific part among hundreds of observations in a string variable.
Dear members of Statalist,
I am fairly new to Stata and I have been struggling with an issue I believe is related to the "foreach" command. In my dataset, I have a string variable (related_diagnosis) which contains various ICD-10 codes, and I want to create a new indicator variable coded 1 if the value contains a specific letter ("G"). The variable contains thousands of observations, and looking for observations containing "G" by hand would take a very long time. I suppose I should use a loop such as "foreach" to simplify this.
I've tried to search for a similar topic without success. If this is answered in another topic, I would be grateful if someone could provide the link discussing this topic.
Best,
Haakon
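No loop is needed here: Stata's string functions operate on all observations at once. A sketch (strpos() is case-sensitive; the second line is an alternative if codes may be lower-case):
Code:
gen byte has_G = strpos(related_diagnosis, "G") > 0
* case-insensitive alternative:
gen byte has_G2 = strpos(upper(related_diagnosis), "G") > 0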
How to draw a smaller sample from a larger dataset with different mean/ distribution than that of larger/ parent dataset?
Hello,
I want to draw a sample (with a certain mean/distribution) from a larger dataset that has a different mean/distribution. For example, say my larger dataset has a mean age of 40 years and 89% men. I want to draw a smaller sample with a mean age of 53 years and 49% men. How do I do that?
Sharing some background context:
I have data collected under a community-based diabetes screening program. This screening was done using telemedicine equipped mobile medical van. Patients who were diagnosed with diabetes or at risk of diabetes complications were referred to a rural diabetic center for follow-up care.
Apparently, the rural diabetic center caters to many other patients not referred to by the van.
Unfortunately, the patients who were referred to the center were given a new unique ID and there is no way of identifying those screened in the van from the follow-up data recorded in the center.
That said, I want to draw a sample population from the follow-up data in a way that the baseline characteristics of the sample (i.e. health profile of patients who visited the center the first time) match the baseline characteristics of those screened. This way the sample drawn from the follow-up data will be representative of the screened population.
This will allow me to understand the long-term effect of care provided in the diabetes center to those screened in the mobile medical van. I want to measure the added value of running screening drives using mobile medical units as compared to routine care.
I understand that this is not an ideal way; however, due to data paucity on similar delivery care models, I don't have an alternative.
Thanks in advance.
Best,
Preeti
Problems setting initial values in sem
Dear all,
I am running a sem on the following data. My command is
sem trust->std_retweet std_favorite std_quote,latent(trust) nocapslatent.
The model fails to converge.
I learned from the manual that the problem may be due to the starting values for some variables. As all three variables have been standardized, I tried an initial value 20 percent larger than expected:
sem trust->std_retweet std_favorite std_quote,latent(trust) nocapslatent var(e.std_retweet, init(1.2)). This still does not fix the convergence failure. Could anyone offer some tips on troubleshooting?
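Two generic things to try (suggestions, not guaranteed fixes): a one-factor model with only three indicators is just-identified, so non-convergence often points to the data (e.g. near-zero covariances between the indicators) rather than to starting values; and sem accepts maximize options such as difficult and iterate, e.g.:
Code:
sem (trust -> std_retweet std_favorite std_quote), latent(trust) nocapslatent ///
    difficult iterate(200)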
Removing specific observations
Hello everyone,
Based on the title, my question may look like a repeat, but it is different.
My question is:
My dataset contains a variable for the number of independent directors on the committee. I have created a dummy variable coded 1 if there is at least one independent director on the committee.
I need to remove the firms that never have independent directors on the committee. It may seem easy to drop the observations equal to 0 (using either the dummy or the count of independent directors), but I need to remove only those firms with no independent directors over the whole study period (2000-2006). The remaining observations should consist of firms that, for example, had no independent directors in 2000 and then appointed some in 2001 or 2002.
This is because I want to compare firm performance before and after appointing independent directors.
Could anyone kindly help me do this and create a dummy variable coded 1 after independent directors are appointed and 0 before?
Please help.
thank you in advance
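A sketch, assuming a firm identifier firm_id, a year variable, and the dummy has_indep (all names are hypothetical):
Code:
* keep only firms that ever appoint an independent director
bysort firm_id: egen byte ever_indep = max(has_indep)
drop if ever_indep == 0
* first year with an independent director, per firm
bysort firm_id: egen first_appt = min(cond(has_indep == 1, year, .))
* 1 from the appointment year onwards, 0 before
gen byte post_appt = year >= first_appt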
I need to transpose every row of data, not including the first column, into each corresponding value of the first column.
Here is an example of my data:
Variables B, C, D, E, F, G and H are dates. I need to create a single date variable, which has the same 7 dates for each US Equity, and a single column for the stock price, which is what the numeric values in the data example represent. Here is an example of how the first few rows would look:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str15 Equity double(B C D E F G H) "ENSV US Equity" .1394 .1171 .2198 .1531 .1725 .205 .1387 "GNMX US Equity" .1659 .1659 .1659 .1659 .1659 .1659 .1659 "SITO US Equity" .1888 .1912 .16 .1799 .205 .17 .15 "AXAS US Equity" .19 .121 .3147 .1942 .2334 .202 .1919 "ZOM US Equity" .22 .1897 .17 .1577 .2379 .1581 .1087 "MVIS US Equity" .2542 .1725 .3499 .8799 1.3599999999999999 2.02 1.58 "XELA US Equity" .257 .205 .355 .3248 .5542 .5103 .4279 "TTNP US Equity" .2631 .233 .2585 .2833 .3062 .291 .2308 "GNUS US Equity" .2815 .2822 .2991 2.05 2.25 1.46 1.07 "NTEC US Equity" .2988 .197 .25 .287 .2833 .4187 .308 "KMPH US Equity" .303 .225 .3 .1876 .2868 .416 .7299 "TRNX US Equity" .3031 .1601 .1951 .0806 .0848 .0665 .034 "HTBX US Equity" .33 .57 .5698 1.02 .8427 2.17 1.26 "TGB US Equity" .351 .2671 .344 .3971 .495 .645 .96 "WTRH US Equity" .3594 1.23 1.34 2.4699999999999998 2.63 5.34 4.03 "IDEX US Equity" .3605 1.34 .6024 .3938 2.01 1.47 1.28 "NXTD US Equity" .3699 .35 .46 .3891 .5041 .487 .386 "TMDI US Equity" .3702 .27 .297 .2668 .835 .92 .787 "ASM US Equity" .3867 .34 .4048 .69 .81 1.06 1.18 "IFMK US Equity" .389 1.49 1.29 1.06 1.07 .9274 .9384 "UAMY US Equity" .41 .3306 .36 .3339 .49 .4001 .4992 "DFFN US Equity" .4122 .3166 .503 1.3 .975 1.15 .9795 "ACST US Equity" .4281 .3795 .61 .633 .4691 .76 .2447 "ONTX US Equity" .4298 .303 .3149 .4176 .5661 1.15 .2407 "UAVS US Equity" .4458 .4107 1.45 1.19 1.19 2.77 3.1 "GRNQ US Equity" .449 .3763 .37 1.45 1.6600000000000001 .8752 1.11 "OCGN US Equity" .45 .2811 .313 .31 .2204 .5151 .3482 "GLBS US Equity" .4501 .6051 .7186 .6666 .267 .143 .1352 "GPL US Equity" .4581 .3082 .4684 .4334 .5 .8198 .988 "AIHS US Equity" .4601 .4377 .375 .4086 .7358 .6819 .461 "WYY US Equity" .4624 .366 .4395 .57 .6955 .7054 .515 "CHFS US Equity" .47 .44 .4058 .375 .4788 .8493 .3672 "NAKD US Equity" .4722 .54 .7225 .625 .6526 .4529 .26 "LPCN US Equity" .4965 .48 .5537 .9136 1.26 1.54 
1.48 "LODE US Equity" .5 .4051 .5115 .5779 .9499 .805 1.1400000000000001 "PTN US Equity" .5107 .4236 .4845 .5069 .512 .5766 .5527 "AYTU US Equity" .53 1.5 1.6400000000000001 1.49 1.42 1.3900000000000001 1.06 "CBL US Equity" .5308 .2001 .289 .3004 .2726 .1828 .186 "SPHS US Equity" .5354 .27 .2567 .0093 .0092 .0104 .011 "WRN US Equity" .5466 .3943 .657 .7901 .8719 1.15 1.28 "TEUM US Equity" .56 .412 .6338 .4361 .6201 .697 .68 "ADMP US Equity" .5652 .3573 .52 .4931 .537 1.18 .66 "JAGX US Equity" .57 .4801 .456 .4831 .4849 .6538 .4 "HTGM US Equity" .58 .325 .37 .5299 .72 .6639 .37 "ALRN US Equity" .5847 .332 .5458 1.62 1.18 .88 1.28 end
| Equity | Date | Stock_Price |
| ENSV US Equity | B | 0.1394 |
| ENSV US Equity | C | 0.1171 |
| ENSV US Equity | D | 0.2198 |
| ENSV US Equity | E | 0.1531 |
| ENSV US Equity | F | 0.1725 |
| ENSV US Equity | G | 0.205 |
| ENSV US Equity | H | 0.1387 |
| GNMX US Equity | B | 0.1659 |
| GNMX US Equity | C | 0.1659 |
| GNMX US Equity | D | 0.1659 |
| GNMX US Equity | E | 0.1659 |
| GNMX US Equity | F | 0.1659 |
| GNMX US Equity | G | 0.1659 |
| GNMX US Equity | H | 0.1659 |
Any help will be much appreciated!!
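A sketch using reshape long; since B-H are just column positions, they are first renamed to a numbered stub (the actual dates would then be mapped onto date_index afterwards, e.g. with recode or a value label):
Code:
local j = 1
foreach v of varlist B-H {
    rename `v' price`j'
    local ++j
}
reshape long price, i(Equity) j(date_index)
rename price Stock_Price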
I use qreg, then margins, then mplotoffset: how do I change the spacing for my y-axis?
Hi Stata community,
I want to change the spacing for my y-axis. Currently I have 0.5 cm between one outcome value of y and another outcome value of y (let's say 0.5 cm between 5 and 6).
I would like to change the spacing from 0.5 cm to 1.0 cm between all outcome values of y.
How can I do that?
I would greatly appreciate your help.
Thanks,
Nico
Probit Question
Hi all,
If I have a panel of people from ages 25-30, who I may observe more than once in these ages, and I want to look at an independent variable that is "Ever used welfare" can I use a probit? The variable is 1 if you used welfare at that age ever, and 0 otherwise. My worry is that it is double counting. For instance, say that you used welfare at age 23. You will show up as a 1 if I observe you at age 25, and also as a 1 again when I observe you at age 28. Is this an issue?
Thanks!
Stata tables to Word format by writing a matrix command
Hi.
I am using a dataset that has 3 years, and I am using the tabulate command to do the analysis. So I write, say, tabulate work_availability education_category if year==2004, and do the same for the other two years.
I want to export the results in table format to a Word file, with all three years in one table. Is there a way to do this by writing a matrix command? If so, how? Or what other command can I use to get the table?
Please help. Thanks
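One route (Stata 15+; a sketch with a hypothetical output filename): capture each year's frequencies with tabulate's matcell() option and write the matrices to Word with putdocx. Row and column labels would need to be added separately.
Code:
putdocx begin
foreach y of numlist 2004 2005 2006 {
    quietly tabulate work_availability education_category if year == `y', matcell(freq`y')
    putdocx paragraph
    putdocx text ("Year `y'")
    putdocx table t`y' = matrix(freq`y')
}
putdocx save tables.docx, replace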
Custom STS graph axis with spacing
Hi,
This is my first post on STATALIST
I am trying to create a custom y-axis scale using sts graph for a Kaplan-Meier plot.
Essentially, on the y axis I want the labels 0 (min), 0.001, 0.01, 0.1 (max), but with these equally spaced along the axis.
Any help will be gratefully received
Many thanks
RDRobust Question
Hi all,
I am trying to use a fuzzy regression discontinuity strategy in some data work of mine. I have a binary dependent variable that takes the value of 1 if you are eligible for a program, and 0 otherwise. I want to run the estimates with CCT bandwidths, but I keep running into an error about the cutoff value not being in the range of my dependent variable. Is there something I am missing here? Does rdrobust not work with binary dependent variables?
Thanks!
L
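One thing to check (a guess, since the exact command is not shown): in rdrobust the cutoff c() refers to the running variable, not the dependent variable, so this error usually means the variables are in the wrong order. The expected pattern, with hypothetical names:
Code:
* y = outcome, score = running variable, treat = actual treatment receipt
rdrobust y score, c(0) fuzzy(treat)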
Table from string and numeric variables
Hello,
this is a general question about the best approach. I am using Stata 16.1.
I want to make a table from one string variable and several numeric variables for e.g. the first 50 observations (showing single observations not summary statistics). I want to use the string variable as row labels. There should be many customization options.
I have tried several approaches but neither seems optimal.
The list command does in principle what I want to do, but with very limited customization options. I want to use variable labels instead of the variable names, and different formats and column widths for the numeric variables. If possible, customized placement of separating lines would be nice (customization a little bit like in Excel).
I came closest by using matrices but labeling the row names is challenging and the maximum length of the label expression is limited (as I understand it).
Which approach would you suggest?
Thanks!
Is there a command for Black Scholes model for American options?
I realise there is
Code:
bsopm
for a European option, but I wonder, is there a command for an American dividend-paying option?
Cox regression: proportional hazards test by Schoenfeld residuals
Hi!
I have performed Cox regression on a data set with both continuous and categorical variables. In the stcox command I specified categorical variables by adding i. before the variable name. Schoenfeld residuals were saved by the following command: stcox i.spiders age i.sex i.ascites albumin bilirubi i.edema1 choleste i.stage, schoenfeld(sc*) scaledsch(ssc*). When assessing proportionality I used estat phtest, detail. All of this went well. When I attempted to make plots of the Schoenfeld residuals, I used the estat phtest, plot(var) command, which went well for continuous variables such as age, albumin etc., but when assessing categorical variables such as sex with estat phtest, plot(sex), I got an error that the variable is not found in the model. Could you advise what is wrong?
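A sketch of one possible cause (an assumption on my part, not confirmed in the post): with factor-variable notation the model term is 1.sex rather than sex, so the plot request may need the factor level rather than the bare variable name:

Code:
stcox i.spiders age i.sex i.ascites albumin bilirubi i.edema1 choleste i.stage, schoenfeld(sc*) scaledsch(ssc*)
estat phtest, detail
estat phtest, plot(1.sex)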
Create new variable with sums of occurrences
Hi Statalist Community
It is my first post here and I hope it is not a question that is too trivial. Unfortunately I have not found any old posts concerning this issue.
I have panel data with over 160'000 observations and around 1'500 variables (unbalanced).
Now I would like to create the variable "length", which measures the length of unemployment.
This is how it should be constructed: every run of consecutive 1s in "unemployed" needs to be summed up. This sum then needs to be assigned to the new variable "length" for the first observation of the run. "Length" is the variable I need to create.
I have so far tried some commands with: bysort and sum/count. Unfortunately without any success.
Can anyone give me advice on this? How would you solve it?
I hope this table shows even better what I am trying to do...
Thank you!
| id | year | unemployed | length |
| 1 | 2000 | 0 | 0 |
| 1 | 2001 | 1 | 3 |
| 1 | 2002 | 1 | 0 |
| 1 | 2003 | 1 | 0 |
| 2 | 2000 | 1 | 1 |
| 2 | 2001 | 0 | 0 |
| 2 | 2002 | 1 | 2 |
| 2 | 2003 | 1 | 0 |
| 3 | 2000 | 0 | 0 |
| 3 | 2001 | 1 | 2 |
| 3 | 2002 | 1 | 0 |
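One way to sketch this (assuming the panel has no gaps in year within id and that "unemployed" is coded 0/1 as in the table):

Code:
bysort id (year): gen byte start = unemployed == 1 & (_n == 1 | unemployed[_n-1] == 0)
bysort id (year): gen spell = sum(start)
replace spell = . if unemployed == 0
bysort id spell (year): gen length = cond(start, _N, 0) if unemployed == 1
replace length = 0 if unemployed == 0
drop start spell

The idea is to number each unemployment spell, then assign the spell size (_N) to the first observation of the spell and 0 elsewhere, matching the table above.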
Adjustments to Esttab Code to Suppress Superfluous Statistics and Display Labels
Hello --- I'm hoping to get some assistance adjusting my esttab code (pasted at bottom) to make the following adjustments to my regression output table as it currently appears below:
- Suppress the three "var" statistics reported below the coefficients
- Create and display labels for the p-value stats reported at the bottom (e.g., for dnocost I would like "D-No # Private Cost = 1" as the text in the leftmost column)
- (minor thing!) I have a "$" figure in the notes at the bottom, but when exported to Latex the tex file reads it as the opening of math mode --- how can I prevent this?
Code:
* Model 1a: Bid as dependent variable
eststo m1a: metobit bid i.treatment#c.randomcosts invperiod || subjectid: , ll(0) ul(upper) vce(cluster uniquegroupid)
mat list e(b)
test _b[4b.treatment#c.randomcosts] = 1
estadd scalar dnocost=r(p)
test _b[5.treatment#c.randomcosts] = 1
estadd scalar ddisccost=r(p)
test _b[6.treatment#c.randomcosts] = 1
estadd scalar dundcost=r(p)
test _b[4b.treatment#c.randomcosts] = _b[5.treatment#c.randomcosts]
estadd scalar dnoddisc=r(p)
test _b[4b.treatment#c.randomcosts] = _b[6.treatment#c.randomcosts]
estadd scalar dnodund=r(p)
test _b[5.treatment#c.randomcosts] = _b[6.treatment#c.randomcosts]
estadd scalar ddiscdund=r(p)
* Model 1b: Bid Inflation as dependent variable
eststo m2a: metobit bid_above_cost i.treatment#c.randomcosts invperiod || subjectid: , ll(0) ul(upper) vce(cluster uniquegroupid)
test _b[4b.treatment#c.randomcosts] = 1
estadd scalar dnocost=r(p)
test _b[5.treatment#c.randomcosts] = 1
estadd scalar ddisccost=r(p)
test _b[6.treatment#c.randomcosts] = 1
estadd scalar dundcost=r(p)
test _b[4b.treatment#c.randomcosts] = _b[5.treatment#c.randomcosts]
estadd scalar dnoddisc=r(p)
test _b[4b.treatment#c.randomcosts] = _b[6.treatment#c.randomcosts]
estadd scalar dnodund=r(p)
test _b[5.treatment#c.randomcosts] = _b[6.treatment#c.randomcosts]
estadd scalar ddiscdund=r(p)
// export for latex
esttab m1a m2a using "$latex/reg_ind_bidding_discriminatory_only_tobit.tex", replace eqlabels(" " " ") stats(dnocost ddisccost dundcost dnoddisc dnodund ddiscdund) label b(%10.4f) se star(* 0.10 ** 0.05 *** 0.01) title("Tobit Model of Individual Bidding Behaviour") mtitle("Bid" "Bid Inflation" ) addnotes("Random Effect at Individual Bidder Level with clustering at Auction Group level" "Upper Limit of \$7.07 for Bid Cap Treatments" "Excludes 230 observations of those bidders in bid cap treatments with private costs $>$ $7.07$/unit")
Friday, October 30, 2020
Problems converting string variable to date/time format
Hi,
I have a date/time variable that is a string and looks as follows: hh.mm.ss DD.MM.YYYY.
I'd like to convert this variable to date/time format. What I tried was:
Code:
gen start2 = clock(start, "hms DMY")
format start2 %tc
While this code doesn't give me an error, it also doesn't give me the correct times.
Anyone could help?
Thanks a lot!
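One frequent cause of wrong times in this situation (offered as a likely explanation, not confirmed by the post): clock() returns milliseconds since 1960, values far too large for float precision, so the variable must be created as a double:

Code:
gen double start2 = clock(start, "hms DMY")
format start2 %tc

Without "double", gen stores the result as float and the times are silently rounded.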
Convert 1K to 1000 and 1M to 1000000
Hello everyone, I need help with converting a particular variable to numbers. The variable in question is the amount of damage done by different disasters. The values were entered as, for example, 1K (meaning $1,000) and 1M (meaning $1,000,000). I want to convert these values to plain numbers. Is there a way to do this in Stata?
PS: As the variable is continuous there are different observations with the "K" suffix e.g. 9K, 12K etc. Also there are different observations with the "M" Suffix.
I will really appreciate your insight on this, as I have a very large data set
I use STATA16
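A sketch of one approach (the variable name "damage" is an assumption, since the post does not give it):

Code:
gen double damage_num = real(damage)
replace damage_num = 1000 * real(substr(damage, 1, length(damage) - 1)) if substr(damage, -1, .) == "K"
replace damage_num = 1000000 * real(substr(damage, 1, length(damage) - 1)) if substr(damage, -1, .) == "M"

real() returns missing for strings like "9K", so the first line handles plain numbers and the two replace lines handle the K- and M-suffixed values.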
Mixed logit and ordered outcome models (help please)
Good evening. My name is Sean and I am a current graduate student, in an advanced econometrics class. I am having trouble with formatting my data, so that I can create a mixed logit and ordered outcome model. Most recently, I have been receiving the error "Variable has no within-group variance." I have been through the help section in STATA, talked to some classmates, and tried to hire a tutor online, but no one seems to be able to help me, for some reason. I am hoping that someone in these forums can walk me through it. Thank you for reading and responding.
I have my data in a wide format, as a .csv file. These are the models that I am having trouble generating:
- Mixed logit model
- Dependent variable: camera brand (1,2,3)
- Independent variables: age, sex, income, price of camera (modeled as a random variable), choice of other camera brands
- Ordered choice model
- Dependent variable: camera brand (1 = budget camera,2=prosumer (middle of the line) camera,3=professional camera)
- Independent variables: age, sex, income, price of camera
Syntax to know the menu path of the commands that we use in the do file
Good night
I would like to know if there is a way to find the menu path of the commands that we use in a do-file
Thank You
Combining observations with the same year and id in a single dataset
I have some data that looks like this:
I want to combine all observations which share an id and year so that the data looks like this:
Any help is appreciated.
| no. | id | year | var1 | var2 | var3 |
| 1 | 1 | 2000 | 50 | . | . |
| 2 | 1 | 2000 | . | . | 10 |
| 3 | 2 | 2001 | . | . | 500 |
| 4 | 2 | 2001 | 200 | . | . |
| 5 | 3 | 2002 | . | 300 | . |
| 6 | 3 | 2002 | . | . | 100 |
| no. | id | year | var1 | var2 | var3 |
| 1 | 1 | 2000 | 50 | . | 10 |
| 2 | 2 | 2001 | 200 | . | 500 |
| 3 | 3 | 2002 | . | 300 | 100 |
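A sketch of one approach (assuming, as in the example, that at most one observation per id-year has a nonmissing value in each variable):

Code:
collapse (firstnm) var1 var2 var3, by(id year)

The (firstnm) statistic keeps the first nonmissing value within each id-year group, which collapses the rows exactly as shown in the second table.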
generating variables with "or"
Hi, I wish to create a new variable given certain parameters of 3 other variables.
I have tried the basic new-variable commands: "gen highscore = ." and "replace highscore = 0 if (varA < 1)".
I can get that far, but what I need is something essentially like the following (the syntax doesn't work):
replace highscore = 0 if (varA <= 1) or if (varB <= 10) or if (varC >= 10)
Is there a way to fix this syntax?
Thanks for any and all help - I am sure the answer is pretty straightforward.
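Stata's logical OR is the vertical bar (|), not the word "or", so the intended command can be written as:

Code:
gen highscore = .
replace highscore = 0 if varA <= 1 | varB <= 10 | varC >= 10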
Histogram with density of 15??
Hi everyone,
I would like to know if I'm doing something wrong.
I am running almost 200 regressions and collecting the coefficients from each one, so that I have a .dta consisting solely of the b_ values of the variables. I've succeeded in doing this; here is an example of my data for vark:
However, when trying to create a histogram to check the variability of each one, I ran into a problem: the histogram density isn't within one, but goes over 15.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float b_vark 3.396698 3.379129 3.4008584 3.4163265 3.407227 3.3611805 3.410911 3.40948 3.4045825 3.382801 3.405709 3.418393 3.411751 3.360613 3.416084 3.410531 3.406611 3.385547 3.394252 3.389984 3.33743 3.396989 3.38991 3.384635 3.424929 3.4166055 3.357182 3.4244335 3.41926 3.410072 3.429587 3.371039 3.437761 3.4308956 3.4213645 3.359139 3.425608 3.4222975 3.413559 3.372591 3.36899 3.363826 3.431501 3.4220905 3.42228 3.370989 3.391934 3.4071674 3.399939 3.358832 3.4016 3.397689 3.397151 3.376958 3.3882046 3.383614 3.3409314 3.388817 3.3837104 3.380788 3.4201446 3.4101994 3.361287 3.41551 3.4127526 3.4071155 3.426149 3.378766 3.432141 3.427403 3.421279 3.366856 3.42038 3.419267 3.4132764 3.3793986 3.3773935 3.3731976 3.426004 3.4197905 3.421866 3.3825076 3.391777 3.388213 3.340173 3.393364 3.38452 3.383398 3.422355 3.4159694 3.360905 3.421646 3.414381 3.409625 3.429239 3.3763766 3.435365 3.427185 3.422666 3.366118 end
Code:
histogram b_vark
Am I doing something wrong?
Just another question while I'm at it: the coefplot command (SSC, I think), doesn't work in this case right? I thought since these are regression coeficients it would look well presented all together, but it only presents me with the coefficient of the last regression and respective confidence interval it seems.
Sorry if my questions are fairly basic. Please forgive a fellow Stata beginner.
Thank you
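On the density question: by default histogram draws a density scale, and density heights can legitimately exceed 1. The bar areas integrate to 1, so with bin width w a bar can be as tall as 1/w, and these coefficients are tightly clustered around 3.4. To show proportions or counts instead, one option is:

Code:
histogram b_vark, fraction

(frequency and percent are the other common alternatives to the default density scale).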
First Differencing Error
Hi there,
I'm trying to first difference my variable on industrial production, but all I get are missing values. What am I doing wrong?
clear
input int DATE double INDPRO
6940 53.2837
6971 53.5675
6999 53.7364
7030 53.1571
7060 53.5566
7091 53.5534
7121 53.4808
7152 53.1195
7183 53.1786
7213 53.4617
7244 53.409
7274 53.4536
7305 53.7071
7336 53.7262
7365 53.5481
tset DATE, monthly
gen IP = L.INDPRO
Why is this yielding nothing but missing values?
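Two things stand out here (offered as a sketch, not a definitive diagnosis): DATE looks like a daily (%td) date of month starts, so tset DATE, monthly declares the wrong time units and leaves gaps, making time-series operators return missing; and L.INDPRO is a lag, while D.INDPRO is the first difference. One possible fix:

Code:
gen mdate = mofd(DATE)      // convert the daily date to a monthly date
format mdate %tm
tsset mdate
gen D_INDPRO = D.INDPRO     // first difference of industrial production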
Help regarding creating variables for line + bar graph.
Hi,
I know that this has been discussed earlier in numerous fora, but somehow I am unable to figure out a) whether this is right and b) whether there is a simpler, less cluttered and confusing way to do this. The initial dataset includes all the subjects with the concerned event. I want to create a bar graph showing absolute numbers by calendar year, which is not a problem.
| Year | age65 |
| 1995 | 0 |
| 1995 | 0 |
| 1995 | 1 |
| 1997 | 0 |
| 1997 | 1 |
| 1997 | 0 |
| 1997 | 1 |
| 1997 | 0 |
| 1997 | 1 |
age65 is a categorical variable indicating whether age is greater than 65 years. On the same bar graph I need a line showing the percentage of each year's events that occurred in the >65 group. So I need the total events by calendar year and then the events by age group. I wrote:
Code:
bys failyear: gen failure = _n
bys failyear age65: gen fail = _n
This is giving me counts of total events per calendar year and within each age category by calendar year.
Now how do I get the proportion of failures with age65==1 out of total failures per calendar year, and then use it to draw a line showing the percentage of events in that age group on the bar graph of total events?
While I tried:
Code:
replace failure = sum(failure)
replace fail = sum(fail)
I am not totally convinced that my method is right and would appreciate some help on this.
Thanks a lot.
Shalom
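A sketch of one way to get the percentage for the line (assuming failyear is the event year and age65 is 0/1 as in the table):

Code:
bysort failyear: gen total = _N              // total events that year
bysort failyear: egen n65 = total(age65)     // events in the >65 group
gen pct65 = 100 * n65 / total                // percentage for the line graph

This avoids the running sums, which accumulate across years rather than within them.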
Standardization
Hello everyone. I have a dataset that looks like this. I want to create a variable that standardizes the variable "promp08" using the variables "recinto" and "carrera". This is a fictitious database about people from a university, and the goal is to create a variable that standardizes the average grade for every student within every major and every campus (recinto) of the university. I just don't know how to generate a variable that computes the mean and standard deviation of "promp08" for each major. I'm using version 14 of Stata and I'll be thankful for any help that you can give me!
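A sketch of one approach, grouping by recinto and carrera as described in the post:

Code:
bysort recinto carrera: egen mean_p = mean(promp08)
bysort recinto carrera: egen sd_p = sd(promp08)
gen promp08_std = (promp08 - mean_p) / sd_p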
Extracting matrix rownames for svy mean with multiple groups
Hi everyone!
I’m trying to estimate the difference in blood pressure between hypertensive and non-hypertensive individuals using a complex survey, but doing so by age-group and gender.
I’m having trouble with the matrix code for outputting the results of my svy command. Here is what I am writing currently:
svyset psu [pweight=samplewt_bp], strata(stratum)
svy , subpop(HTN1) : mean sbp_final, over(sex age5)
Now, trying to extract these estimates (which I will later merge by country and hypertension status) is proving to be a challenge.
matrix M = e(b)
mat li M
M[1,18]
I’ve tried it many different ways but to no avail. What I would really like is a matrix with the one variable containing the strata specification and one variable containing the means. If anyone could help me figure out how to code this I would greatly appreciate it!
I’m trying to estimate the difference in blood pressure between hypertensive and non-hypertensive individuals using a complex survey, but doing so by age-group and gender.
I’m having trouble with the matrix code for outputting the results of my svy command. Here is what I am writing currently:
svyset psu [pweight=samplewt_bp], strata(stratum)
svy , subpop(HTN1) : mean sbp_final, over(sex age5)
| Survey: Mean estimation | |||
| Number of strata = 2 | Number of obs | = | 5,091 |
| Number of PSUs = 258 | Population size | = | 2,439,648 |
| Subpop. no. obs | = | 1,645 | |
| Subpop. size | = | 703,978.23 | |
| Design df | = | 256 | |
| Linearized | ||||
| Mean | Std. Err. | [95% | Conf. | Interval] |
| c.sbp_final@sex#age5 | ||||
| 1 0 | 136.3088 | 5.042258 | 126.3792 | 146.2384 |
| 1 30 | 139.3294 | 1.698257 | 135.9851 | 142.6738 |
| 1 35 | 147.6446 | 2.197222 | 143.3177 | 151.9716 |
| 1 40 | 144.5797 | 1.80772 | 141.0198 | 148.1395 |
| 1 45 | 154.1876 | 4.811669 | 144.7121 | 163.6631 |
| 1 50 | 157.141 | 4.226026 | 148.8188 | 165.4632 |
| 1 55 | 155.97 | 2.916671 | 150.2263 | 161.7137 |
| 1 60 | 154.9218 | 3.047789 | 148.9199 | 160.9238 |
| 1 65 | 158.1695 | 2.623057 | 153.004 | 163.335 |
| 2 0 | 136.8597 | 2.010842 | 132.8998 | 140.8196 |
| 2 30 | 138.7637 | 2.321715 | 134.1916 | 143.3358 |
| 2 35 | 145.7804 | 2.492297 | 140.8723 | 150.6884 |
| 2 40 | 147.0154 | 2.922702 | 141.2598 | 152.771 |
| 2 45 | 157.0043 | 3.423124 | 150.2632 | 163.7454 |
| 2 50 | 159.3642 | 5.456736 | 148.6184 | 170.11 |
| 2 55 | 154.3976 | 2.330003 | 149.8092 | 158.9861 |
| 2 60 | 158.9386 | 4.417497 | 150.2393 | 167.6378 |
| 2 65 | 158.8922 | 3.301328 | 152.3909 | 165.3934 |
| M[1,18] | |||||||
| c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ |
| 1.sex# | 1.sex# | 1.sex# | 1.sex# | 1.sex# | 1.sex# | 1.sex# | 1.sex# |
| 0.age5 | 30.age5 | 35.age5 | 40.age5 | 45.age5 | 50.age5 | 55.age5 | 60.age5 |
| y1 136.3088 | 139.32943 | 147.64465 | 144.57965 | 154.18764 | 157.14103 | 155.96998 | 154.92182 |
| c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ | c.sbp_final@ |
| 1.sex# | 2.sex# | 2.sex# | 2.sex# | 2.sex# | 2.sex# | 2.sex# | 2.sex# |
| 65.age5 | 0.age5 | 30.age5 | 35.age5 | 40.age5 | 45.age5 | 50.age5 | 55.age5 |
| y1 158.16949 | 136.85968 | 138.76374 | 145.78036 | 147.0154 | 157.00432 | 159.36418 | 154.39765 |
| c.sbp_final@ | c.sbp_final@ | ||||||
| 2.sex# | 2.sex# | ||||||
| 60.age5 | 65.age5 | ||||||
| y1 158.93856 | 158.89217 |
I’ve tried it many different ways but to no avail. What I would really like is a matrix with the one variable containing the strata specification and one variable containing the means. If anyone could help me figure out how to code this I would greatly appreciate it!
Margins after anova
I am requesting marginal means after running anova on StataSE 14, and am wondering why I am getting different values from the seemingly similar commands below.
#1
Code:
anova selfesteem census c.ses
pwcompare census, cimargins
#2
Code:
anova selfesteem census c.ses
margins census
Please let me know if there is anything I can provide for more clarity.
Thank you in advance,
Julia
Import file with .file file extension
Hello,
I am trying to import data with a .file file extension and can't seem to figure out how. If it helps, I am trying to import the file in the ED2017.zip folder found at ftp://ftp.cdc.gov/pub/Health_Statist...atasets/NHAMCS . Any ideas? (Would this be better in SAS . . . ? :x )
Heteroscedasticity test for large panel dataset using XSMLE
Does the XSMLE package have a command to check for heteroskedasticity in a large panel dataset? My matrix is 2952 x 2952, which is too large for the spreg package to handle. Can anyone help me in this regard?
loops
hello,
As part of a Monte Carlo study
I have the following population regression model:
Y = 262 - 0.006*X2 - 2.4*X3 + u
I was told to assume that u is normally distributed, N(0,42).
I have to generate 64 observations for u.
I have to combine them with the 64 observations on X2 and X3 in a table that I already have, estimate the corresponding sample regression coefficients Betahat1, Betahat2 and Betahat3 using OLS, and save the estimated coefficients.
Then they want me to repeat that 20 times. Is there a shortcut to create u1, u2, ..., u20, where each is N(0,42) and gets added to my data, and then a command that regresses Y on X2, X3 and each of the u's, so that I get 20 regressions in one go? Please help.
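A sketch of one way to run the 20 replications in a single loop (assumptions: X2 and X3 already exist with 64 observations, and N(0,42) is read as variance 42; use rnormal(0, 42) instead if 42 is the standard deviation):

Code:
set seed 12345
matrix results = J(20, 3, .)
forvalues r = 1/20 {
    capture drop u y
    generate u = rnormal(0, sqrt(42))
    generate y = 262 - 0.006*X2 - 2.4*X3 + u
    quietly regress y X2 X3
    matrix results[`r', 1] = _b[_cons]
    matrix results[`r', 2] = _b[X2]
    matrix results[`r', 3] = _b[X3]
}
matrix list results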
margins- marginal effect of time-variant variables in mixed effects models
Dear Statalist users,
I am using Stata 14 SE, and have a question about using margins for a time-variant variable in a mixed-effects model.
My dependent variable Y and three independent variables (X) are measured at two time points. While X1 and X3 are measured in each district at time 1 and time 2 (wave), X2 is measured at the province level at times 1 and 2. Districts are nested in provinces, so I basically have multi-level repeated-measures data. My control variables (Z) are time-invariant. A data example is below.
The code I use to model the effect of X1, X2 and X3 on Y is:
Code:
mixed Y i.wave i.X1 c.X3 c.X2 Z1 Z2 ||province: ||district_no:
I look at the effect of the interaction between X1 and X3:
Code:
mixed Y i.wave i.X1##c.X3 c.X2 Z1 Z2 ||province: ||district_no:
What I would like to do is plot this interaction. X3 is measured in waves 1 and 2, and I am just not sure how to show the effect of the change in X3 in a graph. I tried:
Code:
margins X1, at(X3 = (-.15(.03).15))
marginsplot
I am not confident this is capturing what I would like to show. I generated a variable called 'DifferenceinX3' where I subtracted the wave 1 values of X3 from the wave 2 values.
When I use this variable in the model instead of X3, both the estimates and the marginsplot change completely. I am not sure why the coefficient of X3 almost doubles when I use the difference variable instead.
Should their substantive effects not be exactly the same? The direction of the effect is the same, but the coefficient almost doubles.
But more importantly, how does one go about plotting the marginal effects of two time-variant variables in a mixed model?
I appreciate your help.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double Y byte wave float X1 double X3 float(X2 Z1 Z2 province) int district_no float DifferenceinX3 .5008916001877053 1 0 .8836196467457053 0 0 0 1 1 0 .5701738334858188 2 0 .891889725917615 .5954652 0 0 1 1 .00827006 .25843545684528696 1 0 .8517262864682598 0 0 0 1 2 0 .33096601673721704 2 0 .8558748788826122 .5954652 0 0 1 2 .0041485853 .2314806361495061 1 0 .8671302188663923 0 0 0 1 3 0 .28064805600966664 2 0 .8749831214675643 .5954652 0 0 1 3 .007852902 .4701195219123506 1 0 .8039459375485475 0 0 0 1 4 0 .5468765275197967 2 0 .825685903500473 .5954652 0 0 1 4 .021739945 .45914704629798625 1 0 .8386024120303737 0 0 0 1 5 0 .5308816595945309 2 0 .8622581288649511 .5954652 0 0 1 5 .023655705 .5529871977240398 1 0 .9015807301467821 0 0 0 1 6 0 .5972691441441441 2 0 .9105439383410197 .5954652 0 0 1 6 .008963187 .23223635003739715 1 0 .869795996674554 0 0 0 1 7 0 .2716732739920712 2 0 .8649493081680801 .5954652 0 0 1 7 -.00484667 .42910860429108605 1 0 .8222902932232359 0 0 0 1 8 0 .5013144023806029 2 0 .8456993069130976 .5954652 0 0 1 8 .023409005 .4536984981126014 1 0 .8757469606429013 0 0 0 1 9 0 .5270582609388699 2 0 .8802646998000965 .5954652 0 0 1 9 .0045177345 .4836112708453134 1 0 .7833479404031551 0 0 0 1 10 0 .5462431001464458 2 0 .8043728423475259 .5954652 0 0 1 10 .021024877 .36644963615473 1 0 .8615101724805369 0 0 0 1 11 0 .4484892121448794 2 0 .8800895139308706 .5954652 0 0 1 11 .018579356 .2582014753593243 1 0 .8491196205853588 0 0 0 1 12 0 .3228475641790513 2 1 .8528333602230218 .5954652 0 0 1 12 .0037137566 .35363741339491916 1 0 .8177992041628406 0 0 0 1 13 0 .4731265652090156 2 0 .8134437035333794 .5954652 0 0 1 13 -.004355507 .32956786802940646 1 0 .8485232696897375 0 0 0 1 14 0 .37584912406149446 2 0 .8606675244077803 .5954652 0 0 1 14 .012144266 .3377397403399747 1 0 .8314157170778479 0 0 0 1 15 0 .42843334243252795 2 0 .8406463605192804 .5954652 0 0 1 15 .009230647 
.5449415852219232 1 0 .8457498530453939 0 0 0 2 16 0 .641944955764613 2 0 .8683048852266039 .3317993 0 0 2 16 .02255503 .524244480400856 1 0 .7929118002416432 0 0 0 2 17 0 .6770883478172743 2 0 .8163368642780467 .3317993 0 0 2 17 .023425037 .5755950385517935 1 0 .8811443932411674 0 0 0 2 18 0 .6560495938435229 2 1 .9165220744168112 .3317993 0 0 2 18 .03537767 .7091292483254775 1 0 .6773241515002459 0 0 0 2 19 0 .85121412803532 2 0 .7589862514493954 .3317993 0 0 2 19 .08166207 .4849688681767829 1 0 .8204195205479452 0 0 0 2 20 0 .5811804708578187 2 0 .8416793893129771 .3317993 0 0 2 20 .02125984 .685989894350023 1 0 .7928177975148384 0 0 0 2 21 0 .8277890608586036 2 0 .828457731311777 .3317993 0 0 2 21 .03563996 .7121102248005802 1 0 .7838338895068595 0 0 0 2 22 0 .8619329388560157 2 0 .872473077649726 .3317993 0 0 2 22 .08863921 .9071300179748353 1 0 .8856816985436041 0 0 0 2 23 0 .9593705293276109 2 0 .9326021581461171 .3317993 0 0 2 23 .04692047 .5623674911660778 1 1 .8395382395382396 0 0 0 2 24 0 .6620594333102972 2 0 .8542088516054382 .3317993 0 0 2 24 .014670635 .5348067182412929 1 0 .9103541429696387 0 0 0 3 25 0 .6653963139734789 2 0 .915298976671581 .28208148 0 0 3 25 .004944839 .4125722543352601 1 0 .8892910634048926 0 0 0 3 26 0 .4892944388561575 2 0 .900830606594513 .28208148 0 0 3 26 .01153956 .5145569620253164 1 0 .868237347294939 0 0 0 3 27 0 .62026913372582 2 0 .8601643069393463 .28208148 0 0 3 27 -.008073069 .6256125821524903 1 0 .8756493401735875 0 0 0 3 28 0 .7158580413297394 2 0 .8784253184098804 .28208148 0 0 3 28 .0027759855 .4544952285283777 1 0 .8783187717363644 0 0 0 3 29 0 .5412363492612542 2 1 .8783167145512929 .28208148 0 0 3 29 -2.0720697e-06 .5304798962386511 1 0 .9205705009276438 0 0 0 3 30 0 .6460984702403908 2 0 .9102711397058824 .28208148 0 0 3 30 -.010299353 .43370756482224004 1 0 .9102380952380953 0 0 0 3 31 0 .5171763437963087 2 0 .9005827090022595 .28208148 0 0 3 31 -.009655378 .4417435328386157 1 0 .8575885377549252 0 0 0 3 32 
0 .5016402405686168 2 0 .8635209235209235 .28208148 0 0 3 32 .005932394 .4869785664899747 1 0 .778960223307746 0 0 0 3 33 0 .5746084480303749 2 0 .7624526498389209 .28208148 0 0 3 33 -.016507579 .4162415833503367 1 0 .8761123713139068 0 0 0 3 34 0 .47278770253427505 2 1 .8665964542741794 .28208148 0 0 3 34 -.009515888 .559327566508895 1 0 .8404325464855598 0 0 0 3 35 0 .660734327400994 2 0 .8594414893617022 .28208148 0 0 3 35 .019008964 .5650262617035853 1 0 .9233479726279236 0 0 0 3 36 0 .7066111111111111 2 0 .9333808336302102 .28208148 0 0 3 36 .010032884 .6467490520994242 1 0 .9294468787705594 0 0 0 3 37 0 .7727925586485193 2 0 .9328652917946467 .28208148 0 0 3 37 .003418416 .43885714285714283 1 0 .8626991565135895 0 0 0 3 38 0 .4796839729119639 2 0 .8629751290473956 .28208148 0 0 3 38 .000275978 .5349692529496572 1 0 .8505993873465352 0 0 0 3 39 0 .6348095224320963 2 0 .8626212058616248 .28208148 0 0 3 39 .012021798 .569474921630094 1 0 .8865721434528774 0 0 0 3 40 0 .6965041965041965 2 0 .8881137465949106 .28208148 0 0 3 40 .001541624 .5593326906149139 1 0 .8820998278829604 0 0 0 3 41 0 .6688803780964798 2 0 .8722279220266751 .28208148 0 0 3 41 -.009871885 .24974731232197003 1 0 .8973921874433282 0 0 0 3 42 0 .30645011600928074 2 0 .9088443737344518 .28208148 0 0 3 42 .01145216 .3044498656702591 1 0 .8084917045579219 0 0 0 4 43 0 .44136020068534704 2 0 .7534060943451509 0 0 0 4 43 -.05508561 .06365623500559792 1 0 .8332165995447383 0 0 0 4 44 0 .12957372298031639 2 1 .753286551785397 0 0 0 4 44 -.07993005 .06817504100193827 1 0 .8357064790727029 0 0 0 4 45 0 .11110193633623715 2 0 .765450680404053 0 0 0 4 45 -.07025579 .2962797808638043 1 0 .7742774566473989 0 0 0 4 46 0 .5701609574907139 2 0 .7253514252245217 0 0 0 4 46 -.04892602 .15349887133182843 1 0 .8223345320244921 0 0 0 4 47 0 .3618196160925585 2 0 .7642843118005105 0 0 0 4 47 -.05805022 .07620412844036697 1 0 .8578691709844559 0 0 0 4 48 0 .16082134968218906 2 0 .7408955808864356 0 0 0 4 48 -.11697356 
.12797713559860274 1 0 .8392732354996506 0 0 0 4 49 0 .20923184520340365 2 0 .7784266879037254 0 0 0 4 49 -.06084653 .1355679965983984 1 0 .8142646558566731 0 0 0 4 50 0 .26414204902576993 2 0 .7482440990213011 0 0 0 4 50 -.066020556 end
how to calculate marginal effect?
Hi there,
Need some help calculating marginal effect.
My question is 'What is the marginal effect of annual household income (income) on total medical expenditure?' Given that the marginal effect is the slope of the regression, I am a bit confused about how to answer this question. Any help would be greatly appreciated.
Thank you!
| Variable | Description |
| sid | Subject ID |
| age | Age |
| famsze | Size of the family |
| educyr | Years of education |
| totexp | Total medical expenditure |
| retire | =1 if retired |
| female | =1 if female |
| white | =1 if white |
| hisp | =1 if Hispanic |
| marry | =1 if married |
| northe | =1 if North-East area |
| mwest | =1 if Mid-West area |
| south | =1 if South area (West is excluded) |
| phylim | =1 if has functional limitation |
| actlim | =1 if has activity limitation |
| msa | =1 if metropolitan statistical area |
| income | annual household income (in 1000 dollars) |
| injury | =1 if condition is caused by an accident/injury |
| priolist | =1 if has medical conditions that are on the priority |
| totchr | # of chronic problems |
| suppins | =1 if has supplementary private insurance |
| hvgg | =1 if health status is excellent, good or very good |
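For reference, a hedged sketch of the usual workflow (the covariate list below is illustrative, chosen from the variable table above, not from the post itself; in a linear model the coefficient on income is the marginal effect, and margins, dydx() reports it as an average marginal effect):
Code:
regress totexp income age famsze educyr female totchr suppins
margins, dydx(income)
If totexp were instead modeled nonlinearly (say, with poisson), margins, dydx(income) would still report the average marginal effect on the expenditure scale.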
Calculating percentiles in Stata
Hi, I am using the following loop to get the 0.5%, 1%, 2%, 5%, 95%, 98%, 99% and 99.5% percentiles of the variable "Return". However, Stata does not accept the 0.5 and 99.5; I believe this is because they are not integers. Could anyone please help me? I would much appreciate it, since I have been struggling with this problem for a while. I know I could get the 0.5th percentile and then use generate to create my dummy variables, but I am required to use a more efficient way.
_pctile Return, percentiles( 0.5 1 2 5 95 98 99 99.5)
return list
local i = 1
foreach n of numlist 0.5 1 2 5 95 98 99 99.5 {
gen byte above`n' = Return >= `r(r`i')'
local ++i
}
Kind Regards,
Adrian
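One possible workaround (a sketch: _pctile itself accepts non-integer percentiles, so the loop more likely fails because above0.5 is not a legal variable name; here the decimal point is swapped for an underscore before it is used in the name):
Code:
_pctile Return, percentiles(0.5 1 2 5 95 98 99 99.5)
local i = 1
foreach n of numlist 0.5 1 2 5 95 98 99 99.5 {
    local suffix : subinstr local n "." "_", all
    gen byte above`suffix' = Return >= `r(r`i')'
    local ++i
}
This produces variables named above0_5, above1, ..., above99_5.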
Using a DHS data set for cluster fixed effects
Hello everyone,
Please, I am new to Stata. I am trying to construct a cross-sectional data set from a survey Stata file and climate records of the survey clusters in Excel, then run a probit with cluster fixed effects on the sample.
For instance, the Excel file of climate records is:
| cluster | temperature | precipitation |
| 1 | 25.6 | 1.89 |
| 2 | 27 | 2.4 |
| 3 | 24.44 | 1.56 |
| 4 | 24.89 | 2.32 |
| 5 | 30 | 2.9 |
| 6 | 27.6 | 2 |
while the survey file is:
| region | cluster | adoption | age | sex (f=2, male=1) |
| urban | 1 | 1 | 10 | 2 |
| rural | 1 | 0 | 15 | 1 |
| urban | 1 | 1 | 14 | 2 |
| rural | 1 | 0 | 5 | 2 |
| urban | 2 | 1 | 13 | 1 |
| rural | 2 | 1 | 6 | 1 |
| urban | 2 | 0 | 7 | 2 |
| rural | 3 | 1 | 13 | 2 |
| urban | 3 | 1 | 4 | 1 |
| rural | 3 | 0 | 7 | 2 |
| urban | 3 | 1 | 4 | 2 |
I imported the Excel file to Stata and saved it as a Stata file, then I tried merging the two files using the commands:
Code:
use "C:\survey\A.dta", clear
sort cluster
joinby cluster using "C:\temp\B.dta", unmatched(both)
sort
by cluster: probit adoption age i.sex temperature precipitation
I am getting error messages. Kindly guide me please. Thank you.
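For what it's worth, a hedged sketch of one way this is usually set up (the Excel file name is an assumption; merge m:1 attaches each cluster's climate record to every survey observation; note that because temperature and precipitation vary only at the cluster level, cluster dummies would absorb them, so clustered standard errors are used here instead of cluster fixed effects):
Code:
import excel "C:\temp\climate.xlsx", firstrow clear
save "C:\temp\B.dta", replace
use "C:\survey\A.dta", clear
merge m:1 cluster using "C:\temp\B.dta", keep(match) nogenerate
probit adoption age i.sex temperature precipitation, vce(cluster cluster)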
estat vce, corr and vif
I apologise in advance if this has been posted before. Though I couldn't seem to find a discussion when I searched the forum.
I am using estat vce, corr and it appears that my two main independent variables of interest have a correlation of 0.68.
I realise that when using vif a value over 10 is considered problematic, but I was wondering what the cut-off is when using estat vce, corr.
Also, in relation to this, models with year dummies, or with age and age^2 controls, for example, would have high correlations, as one would imagine.
How does one deal with that sort of collinearity?
Thanks
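For the age/age^2 case specifically, one standard remedy is centering before squaring (a sketch; centering leaves the model's fit unchanged and only reduces the correlation between the linear and quadratic terms):
Code:
summarize age, meanonly
gen age_c = age - r(mean)
gen age_c2 = age_c^2
The model is then estimated with age_c and age_c2 in place of age and age^2.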
After splitting a variable with options separated by commas, how do I find the highest and lowest frequency of values?
My data looks like this
fruit_variable has values
1. Mango
2. Mango, pineapple
3. Mango, grapes
4. Banana, grapes, chickoo
5. strawberry , mango, orange
7. orange, banana , mango
I want to know which fruit is produced the most and the least. It is not possible to tell from data separated by commas like this.
So I used the split command and separated every comma-separated option into a different variable, like the following:
| fruit_variable1 | fruit_variable2 | fruit_variable3 | fruit_variable4 |
| mango | | | |
| mango | Pineapple | | |
| mango | Grapes | | |
| Banana | Grapes | Chickoo | |
| Strawberry | Mango | orange | |
| Orange | Banana | Mango | |
Now I created a new variable for every fruit: fruit_mango, fruit_pineapple, and so on.
I did
Code:
gen fruit_mango=1 if fruit_variable1=="Mango"
replace fruit_mango=1 if fruit_variable2=="Mango"
replace fruit_mango=1 if fruit_variable3=="Mango"
and so on.
But I don't think this is smart coding. Can it be done another way?
Is there a command to include all the fruit_variable_n (series) variables together instead of writing each one individually? Doesn't the "*" symbol denote the n (the suffix a variable gets after splitting)?
How do I do this?
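One hedged way to avoid writing each line by hand (a sketch; the fruit list and the lower()/strtrim() normalization are assumptions to handle the inconsistent capitalization and stray spaces visible in the data):
Code:
foreach f in mango pineapple grapes banana chickoo strawberry orange {
    gen byte fruit_`f' = 0
    foreach v of varlist fruit_variable* {
        replace fruit_`f' = 1 if lower(strtrim(`v')) == "`f'"
    }
}
tabstat fruit_*, statistics(sum)
The final tabstat sums each indicator, which ranks the fruits from most to least produced.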
Renaming variables using the 'foreach' command
I am trying to rename multiple variables corresponding to baseline, endline, and endline2 characteristics into wave1, wave2 and wave3 so that I can reshape the data into long format based on these three time periods.
I used the following code; however, it does not seem to work despite multiple attempts at reformatting it. Stata gives me an error message every time:
local vars "asset_tot_value iagri_month ibusiness_month ipaidlabor_month ranimals_month ctotal_pcmonth"
foreach k of local vars {
rename `k'_bsl `k'1
rename `k'_end `k'2
rename `k'_fup `k'3
}
Is there something wrong with how I am coding this?
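The loop itself looks syntactically valid, so for reference, a hedged sketch using rename's group syntax, which renames all three waves in one statement per stub and fails with a clearer message if a stub does not exist (variable names as in the post):
Code:
local vars "asset_tot_value iagri_month ibusiness_month ipaidlabor_month ranimals_month ctotal_pcmonth"
foreach k of local vars {
    rename (`k'_bsl `k'_end `k'_fup) (`k'1 `k'2 `k'3)
}
One common pitfall worth checking: local macros are cleared between interactive executions, so the local definition and the loop must be run together (in a do-file or as a single selection), or the loop sees an empty list.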
Poisson-CRE and overidentification test
Hi all,
I am trying to perform an overidentification test in the following theoretical scenario. Say I have a balanced panel with N = 208 and T = 10. Let z = (z_1, z_2) be the instrument matrix, x (NT x 1) the endogenous variable being instrumented, and w (NT x d_w) the other features. Now, according to the Wooldridge et al. procedure, I would like to compute:
My aim is to test for over identifying restrictions in such scenario. One thing I thought is to manually compute the Hansen test:
1) estimate the entire procedure with all instruments;
2) save the residuals;
3) regress the residuals on the instruments alone (xtreg, fe ???)
4) N*R^2 would then be chi-squared distributed, but how do I save N*R^2 from xtreg, fe? And if we manage to do that, how do we obtain the p-value?
As you can see, the procedure lacks some steps. So, can you please either advise me on steps 3) and 4) or suggest a more elegant way (maybe a command or standard procedure) to perform the overidentification test for the Poisson-CRE procedure?
Code:
xtreg x z w i.Year, fe
predict double residuals, e
xtpoisson y x residuals w i.Year, fe
Thank you,
Federico
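A hedged sketch of steps 3) and 4), forming the statistic by hand (this assumes the instruments are stored as variables z1 and z2; xtreg, fe stores the sample size in e(N) and the within R-squared in e(r2_w), and with 2 instruments for 1 endogenous variable there is 1 overidentifying restriction, hence 1 degree of freedom):
Code:
xtreg residuals z1 z2 w i.Year, fe
scalar NR2 = e(N) * e(r2_w)
scalar pval = chi2tail(1, NR2)
display "N*R^2 = " NR2 "  p-value = " pval
Whether the within R-squared is the right R-squared for this version of the Hansen statistic depends on how the test is derived, so this is a starting point rather than a definitive recipe.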
xtivreg2: interacting two instrumented variables (gmm)
Hello,
my baseline model is the following:
Code:
xtivreg2 y l.y $controls trend (x1 = z1 z2), fe gmm robust bw(1)
where x1 is my variable of interest and z1 and z2 are valid instruments.
I would like to expand my model to include an interaction between x1 and x2. x2 is another endogenous variable for which I have valid instruments (z3 and z4). Does anyone know how I can interact two instrumented variables using xtivreg2? How can I specify the interaction with multiple instruments in xtivreg2?
The specification below is wrong, but perhaps it is a starting point to address the question:
Code:
xtivreg2 y l.y $controls trend (x1 x2 x1#x2 = z1 z2 z3 z4 z1#z2#z3#z4), fe gmm robust bw(1)
Thanks a lot in advance for your help. Best regards
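One common workaround, sketched here with hedging (factor-variable interactions are often not accepted inside an instrument list, so the interaction of the endogenous variables is generated by hand and instrumented with the pairwise products of the instruments):
Code:
gen x1x2 = x1*x2
gen z1z3 = z1*z3
gen z1z4 = z1*z4
gen z2z3 = z2*z3
gen z2z4 = z2*z4
xtivreg2 y l.y $controls trend (x1 x2 x1x2 = z1 z2 z3 z4 z1z3 z1z4 z2z3 z2z4), fe gmm robust bw(1)
Instrumenting the product with products of the instruments is the standard textbook device, but the exact set of products to include is a modeling choice.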
Control function approach, and looking for a method to do a robustness check of fixed effects
I'm applying the control function (CF) approach in a nonlinear model using a panel dataset of five waves (waves 1-5). I ran fixed-effects and random-effects models, and the Hausman test favored fixed effects.
Then I wanted to do a robustness check of my result. I decided to run 2SLS, but my regressor is not statistically significant in the second stage. I attempted the control function approach, which is the alternative to 2SLS. At first it was statistically significant, but when I ran it again it was not.
Can someone help check whether I have missed any step, and suggest what method I can use as a robustness check of the fixed effects? I have tried GMM, but it was not making sense.
Kindly note: both the dependent variable and the regressor are continuous, as is the IV (diffage, the difference between the ages of the male and female in the household).
2SLS
First stage
reg Autonomy4 diffage age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum i.religion dwel african indian white parent grandparent uncle_ant
estat endog
estat firststage
Second stage
**** 2SLS
ivreg2 stunting age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum religion dwel african indian white parent grandparent uncle_ant (Autonomy4 = diffage)
or
ivregress 2SLS stunting age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum religion dwel african indian white parent grandparent uncle_ant (Autonomy4 = diffage), first
Control Function Approach
First stage
reg Autonomy4 diffage age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum i.religion dwel african indian white parent grandparent uncle_ant
test diffage
predict error1hat, residual
Second stage
ivreg2 stunting Autonomy4 age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum i.religion dwel african indian white parent grandparent uncle_ant error1hat
To show the effect, I specified the graph/plot below:
margins, at(Autonomy4=(-1.02(0.01)1)) atmeans
marginsplot
Thanks for your help.
Thursday, October 29, 2020
How can I get a count of students per school when using the collapse command?
This is what my data looks like:
| Student Name | SchoolName | Location | NScore | EScore | MScore |
| A | X | 5 | 1 | 2 | 1 |
| B | X | 5 | 1 | 2 | 2 |
| C | X | 5 | 2 | 2 | 2 |
| D | X | 5 | 2 | 1 | 2 |
| E | X | 5 | 1 | 1 | 1 |
| F | X | 5 | 2 | 2 | 2 |
| G | X | 5 | 2 | 2 | 2 |
| H | Y | 5 | 2 | 2 | 2 |
| I | Y | 5 | 2 | 2 | 1 |
| J | Y | 5 | 2 | 4 | 3 |
| K | Y | 5 | 1 | 3 | 2 |
| L | Z | 5 | 4 | 2 | 1 |
| M | Z | 5 | 5 | 2 | 3 |
| N | Z | 5 | 1 | 1 | 1 |
| O | Z | 5 | 2 | 1 | 2 |
| P | Z | 5 | 3 | 2 | 1 |
| Q | Z | 5 | 4 | 4 | 4 |
| R | Z | 5 | 1 | 2 | 1 |
| S | Z | 5 | 2 | 3 | 1 |
What I would like to do is make a table that displays SchoolName, Location and the mean of NScore, EScore and MScore, for which
Code:
collapse (mean) NScore EScore MScore Location, by(SchoolName)
works fine.
The resulting table looks similar to:
| SchoolName | Location | NScore | EScore | MScore |
| X | 5 | 2 (mean values) | 3 | 1 |
| Y | 5 | 1 | 2 | 3 |
| Z | 5 | 3 | 2 | 2 |
I was wondering if we can also display the number of students in each school, after which the table would look similar to:
| SchoolName | Location | NoofStudents | NScore | EScore | MScore |
| X | 5 | 7 | 2 | 3 | 1 |
| Y | 5 | 4 | 1 | 2 | 3 |
| Z | 5 | 8 | 3 | 2 | 2 |
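A hedged sketch of one way to add the count (generate a constant 1 per student and sum it within school; Location can stay in the mean list since it is constant within each school):
Code:
gen byte NoofStudents = 1
collapse (mean) NScore EScore MScore Location (sum) NoofStudents, by(SchoolName)
Using (count) on any always-nonmissing variable would work just as well.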