Saturday, October 31, 2020

Exporting Stata Graph

I sent Stata graphs as TIFF files to the editors of a journal, but they insist they would like them in a Microsoft Word-formattable version. Does anyone know another method of exporting a Stata .gph to a Word-formattable version?
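A hedged sketch of two common routes, with placeholder file names: exporting to EMF (a vector format Word embeds cleanly, available in Stata for Windows) or writing the .docx directly with putdocx (Stata 15+).

Code:
graph use mygraph.gph
graph export mygraph.emf, replace
* or place the graph in a Word document directly
graph export mygraph.png, replace width(2000)
putdocx begin
putdocx image mygraph.png
putdocx save report.docx, replace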

How to calculate birth year effects?

Dear friends,

I hope to calculate birth-year effects. My code is similar to the one below:

Code:
reg income i.birthyear i.cohort, noconst
But most of the i.cohort dummies are omitted. Thank you for any advice.
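For what it's worth, a minimal sketch of why this happens when cohort is a deterministic function of birth year (the binning below is hypothetical): once every birthyear dummy is in the model, the cohort dummies carry no new information and are omitted as collinear.

Code:
gen cohort = 5 * floor(birthyear/5)          // hypothetical 5-year cohorts
reg income i.birthyear i.cohort, noconstant  // Stata omits the redundant dummies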

Convert Data in Matrix Form to Long Format

Hello everyone,

I have data in matrix form, where there are several rows that should ideally be independent columns. For example, I have something like this:
id2 100 101 102 103 104 105 106 107
name2 AA BB CC DD EE FF GG HH
location2 G1 G2 G3 G4 G5 G6 G7 G8
id1 name1 location1
1 A l1 a1 a2 a3 a4 a5 a6 a7 a8 a9
2 B l2 b1 b2 b3 b4 b5 b6 b7 b8 b9
3 C l3 c1 c2 c3 c4 c5 c6 c7 c8 c9
4 D l4 d1 d2 d3 d4 d5 d6 d7 d8 d9
5 E l5 e1 e2 e3 e4 e5 e6 e7 e8 e9
6 F l6 f1 f2 f3 f4 f5 f6 f7 f8 f9
7 G l7 g1 g2 g3 g4 g5 g6 g7 g8 g9
8 H l8 h1 h2 h3 h4 h5 h6 h7 h8 h9
9 I l9 i1 i2 i3 i4 i5 i6 i7 i8 i9
10 J l10 j1 j2 j3 j4 j5 j6 j7 j8 j9
11 K l11 k1 k2 k3 k4 k5 k6 k7 k8 k9
12 L l12 l1 l2 l3 l4 l5 l6 l7 l8 l9
13 M l13 m1 m2 m3 m4 m5 m6 m7 m8 m9
If I had one row on top, I would ideally reshape it to long using id2, but I need all of the information on top of the matrix. I wonder if somebody could help me figure this out.

Thanks
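A hedged sketch, assuming the block of values was imported as variables c100-c107 (one per id2 value) alongside id1, name1 and location1, and that the header rows (id2/name2/location2) live in a separate crosswalk file, here called id2_attributes.dta:

Code:
reshape long c, i(id1) j(id2)
rename c value
merge m:1 id2 using id2_attributes, nogenerate   // bring name2/location2 back on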

store coefficients from different data sets

Hi Statalist,

Could anyone help me, please? I want to plot coefficients from regressions run on different datasets. For example, suppose I webuse auto and run an estimation, then open another dataset with webuse lifeexp and run another model. How can I store both results and plot the coefficients in a graph with confidence intervals?

Many thanks.
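A hedged sketch using estimates store plus coefplot (from SSC: ssc install coefplot), with arbitrary example models:

Code:
webuse auto, clear
regress price mpg weight
estimates store m1
webuse lifeexp, clear
regress lexp gnppc
estimates store m2
coefplot m1 m2, drop(_cons) xline(0)   // coefficients with confidence intervals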

[Panel Data] How can I study how a variable in the past affected the drop in a different variable in the future?

All the values that I have for the variable bondratio (the amount of company debt that is in the form of bonds) are for the years 2017-2019, but I want to study how this variable affected the impact of the COVID pandemic on stock prices in 2020 (i.e., whether companies with a higher or lower bondratio were more or less resilient to the crisis).

Here is an example of my data for three different stocks:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double PERMNO float month double stock_price float bondratio
13688 684  61.88999938964844        .
13688 685              66.75        .
13688 686  66.36000061035156 .8182862
13688 687  67.05000305175781        .
13688 688  68.37999725341797        .
13688 689  66.37000274658203 .8085532
13688 690  67.69000244140625        .
13688 691  70.37999725341797        .
13688 692  68.08999633789063 .8222454
13688 693  57.77000045776367        .
13688 694   54.2400016784668        .
13688 695  44.83000183105469 .7425898
13688 696  42.43000030517578        .
13688 697  41.09000015258789        .
13688 698  43.93000030517578 .7249041
13688 699 46.099998474121094        .
13688 700  43.33000183105469        .
13688 701 42.560001373291016 .8167768
13688 702  43.08000183105469        .
13688 703  46.18000030517578        .
13688 704   46.0099983215332 .8127667
13688 705 46.810001373291016        .
13688 706   26.3799991607666        .
13688 707              23.75 .6926678
13688 708                 13        .
13688 709 17.030000686645508        .
13688 710 17.799999237060547 4.933464
13688 711 22.520000457763672        .
13688 712 17.100000381469727        .
13688 713 22.920000076293945 3.659187
13688 714   18.1299991607666        .
13688 715 10.449999809265137        .
13688 716                 10        .
13688 717  6.170000076293945        .
13688 718  7.460000038146973        .
13688 719 10.869999885559082        .
13688 720              15.21        .
13688 721               15.5        .
13688 722               8.99        .
13688 723              10.64        .
13688 724              11.86        .
13688 725               8.87        .
13688 726               9.35        .
13688 727               9.26        .
13712 684 13.670000076293945         .
13712 685 16.920000076293945         .
13712 686 15.949999809265137         .
13712 687 15.260000228881836         .
13712 688 16.229999542236328         .
13712 689 15.729999542236328         .
13712 690 15.960000038146973         .
13712 691 15.739999771118164         .
13712 692 16.299999237060547         .
13712 693 15.739999771118164         .
13712 694 16.690000534057617         .
13712 695              17.25  .9295753
13712 696  19.31999969482422         .
13712 697 19.549999237060547         .
13712 698 19.139999389648438  .9316249
13712 699 18.860000610351563         .
13712 700 20.690000534057617         .
13712 701 19.579999923706055  .9333118
13712 702 20.760000228881836         .
13712 703 20.829999923706055         .
13712 704   22.3799991607666  .9333699
13712 705 19.719999313354492         .
13712 706 19.399999618530273         .
13712 707   16.6200008392334  .9368091
13712 708  16.31999969482422         .
13712 709   16.3799991607666         .
13712 710 15.899999618530273  .9358506
13712 711 15.319999694824219         .
13712 712 13.479999542236328         .
13712 713 14.260000228881836  .9378165
13712 714 11.329999923706055         .
13712 715 11.720000267028809         .
13712 716 13.989999771118164         .
13712 717 12.260000228881836         .
13712 718   12.6899995803833         .
13712 719 12.609999656677246         .
13712 720              10.14         .
13712 721                9.4         .
13712 722               9.24         .
13712 723               8.55         .
13712 724               7.84         .
13712 725               7.82         .
13712 726                7.4         .
13712 727               7.57         .
13714 684 12.479999542236328         .
13714 685   12.9399995803833         .
13714 686 13.800000190734863   .171748
13714 687   14.0600004196167         .
13714 688 14.210000038146973         .
13714 689 13.369999885559082 .17461425
13714 690  13.34000015258789         .
13714 691 14.079999923706055         .
13714 692              14.75 .18139043
13714 693 14.319999694824219         .
13714 694  14.09000015258789         .
13714 695 13.420000076293945 .18119723
13714 696 12.989999771118164         .
13714 697 11.010000228881836         .
13714 698 12.420000076293945  .1810034
13714 699 12.890000343322754         .
13714 700 14.319999694824219         .
13714 701 14.510000228881836 .15326667
13714 702 14.539999961853027         .
13714 703 13.859999656677246         .
13714 704 13.899999618530273 .16861856
13714 705   12.4399995803833         .
13714 706 13.149999618530273         .
13714 707 12.720000267028809         .
13714 708 14.100000381469727         .
13714 709               13.5         .
13714 710 14.199999809265137         .
13714 711   14.5600004196167         .
13714 712 14.550000190734863         .
13714 713              13.75         .
13714 714 14.020000457763672         .
13714 715 13.300000190734863         .
13714 716 13.930000305175781         .
13714 717 13.949999809265137         .
13714 718  14.34000015258789         .
13714 719 13.699999809265137         .
13714 720              13.89         .
13714 721              12.57         .
13714 722               7.08         .
13714 723               9.43         .
13714 724               9.76         .
13714 725               10.3         .
13714 726               9.68         .
13714 727              10.71         .
end
format %tm month
PERMNO is the stock ID.

The months 720+ correspond to January 2020 and onwards.

Any help will be much appreciated, as I am really scratching my head over this one!
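One hedged way to operationalize this: collapse each firm's pre-2020 bondratio into a single baseline measure and interact it with a COVID-period indicator (the log-price regression below is only one of several reasonable specifications):

Code:
bysort PERMNO: egen pre_bondratio = mean(cond(month < tm(2020m1), bondratio, .))
gen byte covid = month >= tm(2020m1)
gen ln_price = ln(stock_price)
regress ln_price i.covid##c.pre_bondratio, vce(cluster PERMNO)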

r(608) error in loop

Hi!

I'm trying to create groups in my data: for every year and district, I create a group that includes everyone within 2.5 years of that person's year. To do this, I wrote a loop that takes every year-district group, keeps only the observations I want, and appends them. The code works up to a certain point and then suddenly stops. Here is my code:

Code:
keep person_id year district
duplicates drop

preserve
keep if _n >= 1
gen group = .
save "file_1.dta", replace
restore

local i = 1

levelsof district, local(lev_1)
levelsof year, local(lev_2)

foreach l_1 of local lev_1 {
    foreach l_2 of local lev_2 {
        preserve
        keep if abs(year - `l_2') <= 3 & district == `l_1'

        keep person_id
        gen year = `l_2'
        gen district = `l_1'
        gen group = `i'

        append using "file_1.dta"
        save "file_1.dta", replace
        restore

        local i = `i' + 1
    }
}
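For reference: error r(608) means the target file has become read-only. When a do-file saves and replaces the same .dta hundreds of times, a cloud-sync service (e.g. Dropbox or OneDrive) locking file_1.dta mid-write is a common culprit. A hedged sketch that accumulates the groups in a tempfile and writes to disk only once:

Code:
tempfile acc
preserve
clear
save `acc', emptyok
restore

local i = 1
levelsof district, local(lev_1)
levelsof year, local(lev_2)

foreach l_1 of local lev_1 {
    foreach l_2 of local lev_2 {
        preserve
        keep if abs(year - `l_2') <= 3 & district == `l_1'
        keep person_id
        gen year = `l_2'
        gen district = `l_1'
        gen group = `i'
        append using `acc'
        save `acc', replace
        restore
        local ++i
    }
}

use `acc', clear
save "file_1.dta", replace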

Thanks for any help!

Miranda

Importing/Reformatting Data by ID

Hi Everyone,

I'm trying to merge about a dozen different lists that use the same ID scheme but are missing different subsets of entries. The data are by year, and I want every year in one .dta file. How can I import by ID, or reformat the columns, so that I get a dataset with the same ID list and simply missing values for years without entries?

Here is a small sample of the data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(unitid v2 v3 v4 v5 v6 v7 v8)
100654  6106 100654  6001 100690   597 100654  5628
100663 21923 100663 20902 100724  5318 100663 18333
100706  9736 100690   675 100751 37663 100690   627
100724  4413 100706  9101 100760  1769 100706  7866
100751 38390 100724  4760 100830  4878 100724  5383
100760  1681 100751 38563 100858 28290 100751 37098
100812  3044 100760  1835 101028  1528 100760  1787
100830  5211 100812  3114 101143  1755 100812  3041
100858 30440 100830  4894 101161  4781 100830  4919
100937  1268 100858 29776 101189  3319 100858 27287
end

You can see some of the same ID values across the unitid columns. I want to line the data up by ID and fill in the gaps with the missing value '.'
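A hedged sketch of one way in, assuming each (id, value) column pair is one year's list; the names y1, y2 and list2 are placeholders, and the pattern repeats for each remaining pair:

Code:
preserve
keep v3 v4
rename (v3 v4) (unitid y2)
save list2, replace
restore
keep unitid v2
rename v2 y1
merge 1:1 unitid using list2, nogenerate   // unmatched IDs get missing values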


Thank you


Comparing IV Coefficients for Two Groups

Hi,
I am trying to determine whether there is a statistically significant difference between a single IV estimate computed for two separate groups:
My current code is as follows:
Code:
ivregress 2sls delta_x delta_p208a delta_HS (OCC_VP=OCC_VV) if p207==1
ivregress 2sls delta_x delta_p208a delta_HS (OCC_VP=OCC_VV) if p207==2
I want to test whether the OCC_VP coefficient is equal between p207==1 and p207==2.
I have found that a solution would be to use suest; however, this command is not compatible with ivregress. Is there an alternative approach?
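One workaround, sketched under the assumption that a fully interacted single-equation model is acceptable: estimate both groups jointly, instrumenting the interaction with the interacted instrument, and test the difference directly (the constructed variable names are placeholders).

Code:
gen byte g2 = (p207 == 2)
gen OCC_VP_g2 = OCC_VP * g2
gen OCC_VV_g2 = OCC_VV * g2
ivregress 2sls delta_x i.g2##c.delta_p208a i.g2##c.delta_HS ///
    (OCC_VP OCC_VP_g2 = OCC_VV OCC_VV_g2) if inlist(p207, 1, 2), vce(robust)
test OCC_VP_g2    // H0: the OCC_VP effect is equal in the two groups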

Thanks!

how to custom plot axis with different scaling?

Is there an option in Stata that can achieve the same goal as this in R: labels = function(y) {paste0(y/1000000, ' million')}?

Basically, I want to show big numbers on the y axis displayed as millions. I cannot set the scale manually because I have to produce multiple plots.
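There is no label-function option, but a hedged workaround builds the ylabel() list in a local; the plot command and tick range are placeholders:

Code:
local labs
forvalues m = 0(5)20 {                       // hypothetical tick positions, in millions
    local labs `labs' `=`m'*1000000' "`m' million"
}
twoway line y x, ylabel(`labs', angle(horizontal))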

Thanks!

Stata is turning out results very slowly. What can I do?

I have a dataset comprising about 30 million observations. When I try to execute even basic commands like save or use, Stata takes a long time to respond. I did not have this problem until recently; in the past I have comfortably crunched about 300k observations. Without changing to a different system, what else can I do? Please help!
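Two cheap first steps worth trying, as a hedged suggestion (the filename is a placeholder):

Code:
compress                          // recast variables to the smallest safe storage types
memory                            // report how much RAM the dataset occupies
save mydata_small.dta, replace    // smaller files also save and load faster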

Using the foreach command to find a specific part among hundreds of observations in a string variable

Dear members of Statalist,

I am fairly new to Stata and I have been struggling with an issue I believe is related to the foreach command. In my dataset, I have a string variable (related_diagnosis) containing various ICD-10 codes, and I want to create a new categorical variable coded 1 if the value contains a specific letter ("G"). The variable contains thousands of observations, and looking for observations containing "G" by eye would take a very long time. I suppose I should use a loop such as foreach to simplify this.

I've tried to search for a similar topic without success. If this is answered in another topic, I would be grateful if someone could provide the link discussing this topic.
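For reference, no loop is needed here: generate evaluates across all observations at once, and strpos() finds a substring. A hedged sketch (the second pattern assumes the codes may sit in a comma-separated list):

Code:
gen byte has_G = strpos(related_diagnosis, "G") > 0
* stricter: match only codes that start with G followed by a digit
gen byte has_Gcode = ustrregexm(related_diagnosis, "(^|[ ,])G[0-9]")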

Best,
Haakon

How to draw a smaller sample from a larger dataset with a different mean/distribution than the parent dataset?

Hello,

I want to draw a sample (with a certain mean/distribution) from a larger dataset that has a different mean/distribution. For example, say my larger dataset has a mean age of 40 years and 89% men. I want to draw a smaller sample with a mean age of 53 years and 49% men. How do I do that?

Sharing some background context:

I have data collected under a community-based diabetes screening program. The screening was done using a telemedicine-equipped mobile medical van. Patients who were diagnosed with diabetes, or at risk of diabetes complications, were referred to a rural diabetic center for follow-up care.

Apparently, the rural diabetic center caters to many other patients not referred by the van.

Unfortunately, the patients who were referred to the center were given a new unique ID and there is no way of identifying those screened in the van from the follow-up data recorded in the center.

That said, I want to draw a sample population from the follow-up data in a way that the baseline characteristics of the sample (i.e. health profile of patients who visited the center the first time) match the baseline characteristics of those screened. This way the sample drawn from the follow-up data will be representative of the screened population.

This will allow me to understand the long-term effect of care provided in the diabetes center to those screened in the mobile medical van. I want to measure the added value of running screening drives using mobile medical units as compared to routine care.

I understand that this is not an ideal way; however, due to data paucity on similar delivery care models, I don't have an alternative.
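Purely as a mechanical illustration of drawing fixed counts per stratum (the male indicator and the counts below are hypothetical): hitting a target mean age as well would need finer strata, e.g. age bands, or a weighting/matching method, and whether such a draw answers the substantive question is a separate issue.

Code:
set seed 12345
gen double u = runiform()
sort male u
by male: gen byte insample = _n <= cond(male == 1, 490, 510)  // 49% men in n = 1,000
keep if insample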

Thanks in advance.

Best,
Preeti

Problems setting initial values in sem

Dear all,
I am running an sem on the following data. My command is:

Code:
sem (trust -> std_retweet std_favorite std_quote), latent(trust) nocapslatent

The model fails to converge.

I learned from the manual that the problem is possibly due to the starting values for some variables. As all three variables have been standardized, I tried an initial value 20 percent larger than expected:

Code:
sem (trust -> std_retweet std_favorite std_quote), latent(trust) nocapslatent var(e.std_retweet, init(1.2))

It still fails to converge. Could anyone offer some tips on troubleshooting?
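A hedged troubleshooting step before tuning individual start values: inspect where the iteration log stalls and relax the maximizer (both options below are standard maximize options accepted by sem):

Code:
sem (trust -> std_retweet std_favorite std_quote), latent(trust) nocapslatent ///
    iterate(200) difficult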



Removing specific observations

Hello everyone,
Based on my title this question may look like a repeat, but it is different.
My question is:
I have a dataset with a variable for the number of independent directors, and I have created a dummy variable coded 1 if there is at least one independent director on the committee.
I need to remove the firms that never have an independent director on the committee. It may seem easy to drop the observations equal to 0 (using either the dummy variable or the count of independent directors), but I need to remove only those firms with no independent directors over the whole study period (2000-2006). The remaining observations should consist of firms that (for example) had no independent director in 2000 and then appointed independent directors in 2001 or 2002, because I want to compare firm performance before and after appointing independent directors.
Could someone kindly help me do this and create a dummy variable coded 1 after appointing independent directors and 0 before?
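A hedged sketch of the usual pattern, assuming panel identifiers firm_id and year and a 0/1 indicator has_indep (all three names are placeholders):

Code:
bysort firm_id: egen ever_indep = max(has_indep)
drop if ever_indep == 0                              // firms with no independents ever
bysort firm_id (year): egen appoint_yr = min(cond(has_indep == 1, year, .))
gen byte post = year >= appoint_yr                   // 1 from the first appointment on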

please please help.
thank you in advance

I need to transpose every row of data, not including the first column, into each corresponding value of the first column.

Here is an example of my data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str15 Equity double(B C D E F G H)
"ENSV US Equity" .1394 .1171              .2198              .1531              .1725               .205              .1387
"GNMX US Equity" .1659 .1659              .1659              .1659              .1659              .1659              .1659
"SITO US Equity" .1888 .1912                .16              .1799               .205                .17                .15
"AXAS US Equity"   .19  .121              .3147              .1942              .2334               .202              .1919
"ZOM US Equity"    .22 .1897                .17              .1577              .2379              .1581              .1087
"MVIS US Equity" .2542 .1725              .3499              .8799 1.3599999999999999               2.02               1.58
"XELA US Equity"  .257  .205               .355              .3248              .5542              .5103              .4279
"TTNP US Equity" .2631  .233              .2585              .2833              .3062               .291              .2308
"GNUS US Equity" .2815 .2822              .2991               2.05               2.25               1.46               1.07
"NTEC US Equity" .2988  .197                .25               .287              .2833              .4187               .308
"KMPH US Equity"  .303  .225                 .3              .1876              .2868               .416              .7299
"TRNX US Equity" .3031 .1601              .1951              .0806              .0848              .0665               .034
"HTBX US Equity"   .33   .57              .5698               1.02              .8427               2.17               1.26
"TGB US Equity"   .351 .2671               .344              .3971               .495               .645                .96
"WTRH US Equity" .3594  1.23               1.34 2.4699999999999998               2.63               5.34               4.03
"IDEX US Equity" .3605  1.34              .6024              .3938               2.01               1.47               1.28
"NXTD US Equity" .3699   .35                .46              .3891              .5041               .487               .386
"TMDI US Equity" .3702   .27               .297              .2668               .835                .92               .787
"ASM US Equity"  .3867   .34              .4048                .69                .81               1.06               1.18
"IFMK US Equity"  .389  1.49               1.29               1.06               1.07              .9274              .9384
"UAMY US Equity"   .41 .3306                .36              .3339                .49              .4001              .4992
"DFFN US Equity" .4122 .3166               .503                1.3               .975               1.15              .9795
"ACST US Equity" .4281 .3795                .61               .633              .4691                .76              .2447
"ONTX US Equity" .4298  .303              .3149              .4176              .5661               1.15              .2407
"UAVS US Equity" .4458 .4107               1.45               1.19               1.19               2.77                3.1
"GRNQ US Equity"  .449 .3763                .37               1.45 1.6600000000000001              .8752               1.11
"OCGN US Equity"   .45 .2811               .313                .31              .2204              .5151              .3482
"GLBS US Equity" .4501 .6051              .7186              .6666               .267               .143              .1352
"GPL US Equity"  .4581 .3082              .4684              .4334                 .5              .8198               .988
"AIHS US Equity" .4601 .4377               .375              .4086              .7358              .6819               .461
"WYY US Equity"  .4624  .366              .4395                .57              .6955              .7054               .515
"CHFS US Equity"   .47   .44              .4058               .375              .4788              .8493              .3672
"NAKD US Equity" .4722   .54              .7225               .625              .6526              .4529                .26
"LPCN US Equity" .4965   .48              .5537              .9136               1.26               1.54               1.48
"LODE US Equity"    .5 .4051              .5115              .5779              .9499               .805 1.1400000000000001
"PTN US Equity"  .5107 .4236              .4845              .5069               .512              .5766              .5527
"AYTU US Equity"   .53   1.5 1.6400000000000001               1.49               1.42 1.3900000000000001               1.06
"CBL US Equity"  .5308 .2001               .289              .3004              .2726              .1828               .186
"SPHS US Equity" .5354   .27              .2567              .0093              .0092              .0104               .011
"WRN US Equity"  .5466 .3943               .657              .7901              .8719               1.15               1.28
"TEUM US Equity"   .56  .412              .6338              .4361              .6201               .697                .68
"ADMP US Equity" .5652 .3573                .52              .4931               .537               1.18                .66
"JAGX US Equity"   .57 .4801               .456              .4831              .4849              .6538                 .4
"HTGM US Equity"   .58  .325                .37              .5299                .72              .6639                .37
"ALRN US Equity" .5847  .332              .5458               1.62               1.18                .88               1.28
end
Variables B, C, D, E, F, G and H are dates. I need to create a single date variable, which has the same 7 dates for each US Equity, and a single column for the stock price, which is what the numeric values in the data example represent. Here is how the first few rows would look:
Equity Date Stock_Price
ENSV US Equity B 0.1394
ENSV US Equity C 0.1171
ENSV US Equity D 0.2198
ENSV US Equity E 0.1531
ENSV US Equity F 0.1725
ENSV US Equity G 0.205
ENSV US Equity H 0.1387
GNMX US Equity B 0.1659
GNMX US Equity C 0.1659
GNMX US Equity D 0.1659
GNMX US Equity E 0.1659
GNMX US Equity F 0.1659
GNMX US Equity G 0.1659
GNMX US Equity H 0.1659

Any help will be much appreciated!!
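For reference, a hedged sketch: give the seven variables a common stub, then reshape long with a string suffix, which reproduces the layout shown above.

Code:
rename (B-H) (price_=)
reshape long price_, i(Equity) j(Date) string
rename price_ Stock_Price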

I use qreg, then margins, then mplotoffset: how do I change the spacing for my y-axis?

Hi Stata community,

I want to change the spacing on my y-axis. Currently there is 0.5 cm between one outcome value of y and the next (say, 0.5 cm between 5 and 6).
I would like to change the spacing from 0.5 cm to 1.0 cm between all outcome values of y.

How can I do that?
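For what it's worth, the physical spacing per y unit is the plot height divided by the axis range, so doubling it means either doubling the graph height or halving the labeled range. A hedged sketch, assuming mplotoffset forwards graph options the way marginsplot does (the label range is a placeholder):

Code:
mplotoffset, ysize(8) ylabel(4(1)8)   // ysize() is in inches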

I would greatly appreciate your help.

Thanks,

Nico

Probit Question

Hi all,

If I have a panel of people aged 25-30, whom I may observe more than once at these ages, and I want to look at an independent variable that is "ever used welfare", can I use a probit? The variable is 1 if you had ever used welfare by that age, and 0 otherwise. My worry is that it is double counting. For instance, say that you used welfare at age 23: you will show up as a 1 if I observe you at age 25, and again as a 1 when I observe you at age 28. Is this an issue?
Thanks!

Stata tables to Word format by writing a matrix command

Hi.
I am using a dataset that covers 3 years, and I am using tabulate for the analysis. So I write, say, tabulate work_availability education_category if year==2004, and do the same for the other two years.
I want to export the results, in table format, to a Word file with all three years in one table. Is there a way to do this by writing a matrix command? If so, how? Or what other command can I use to get the table?
Please help. Thanks
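A hedged sketch using putdocx (built in since Stata 15): matcell() stores each cross-tabulation's frequencies as a matrix, which putdocx can write as a Word table (the year values are placeholders):

Code:
putdocx begin
foreach y of numlist 2004 2005 2006 {
    quietly tabulate work_availability education_category if year == `y', matcell(M)
    putdocx paragraph
    putdocx text ("Year `y'")
    putdocx table t`y' = matrix(M)
}
putdocx save tables.docx, replace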

Custom STS graph axis with spacing

Hi,

This is my first post on Statalist.

I am trying to create a custom y-axis scale using sts graph for a Kaplan-Meier plot.

Essentially, on the y axis I want the labels (min) 0, 0.001, 0.01, 0.1 (max), equally spaced along the axis.
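For reference: equal spacing for 0.001, 0.01 and 0.1 implies a log scale, and 0 cannot sit on a log scale, so the closest standard option is the hedged sketch below; placing a true 0 would require a custom transformation of the plotted variable.

Code:
sts graph, yscale(log) ylabel(.001 .01 .1, format(%5.3f))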

Any help will be gratefully received

Many thanks

RDRobust Question

Hi all,

I am trying to use a fuzzy regression discontinuity strategy in some data work of mine. I have a binary dependent variable that takes the value 1 if you are eligible for a program, and 0 otherwise. I want to run the estimates with CCT bandwidths, but I keep running into an error about the cutoff value not being in the range of my dependent variable. Is there something I am missing here? Does rdrobust not work with binary dependent variables?
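One hedged observation: in rdrobust the second variable is the running variable, and c() must lie inside its support; binary outcomes are otherwise fine. A sketch with placeholder names:

Code:
* outcome, running_var and treat_take_up are hypothetical names
rdrobust outcome running_var, c(0) fuzzy(treat_take_up)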
Thanks!
L

Table from string and numeric variables

Hello,
this is a general question about the best approach. I am using Stata 16.1.

I want to make a table from one string variable and several numeric variables for, e.g., the first 50 observations (showing single observations, not summary statistics). I want to use the string variable as row labels, and there should be many customization options.
I have tried several approaches, but none seems optimal.

The list command does in principle what I want, but with very limited customization options. I want to use variable labels instead of variable names, and different formats and column widths for the numeric variables. If possible, customized placement of separating lines would be nice (customization a little like in Excel).
I came closest using matrices, but labeling the row names is challenging and the maximum length of the label expression is limited (as I understand it).

Which approach would you suggest?
Thanks!
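One hedged option: push the rows to Excel and do the cosmetic work there; firstrow(varlabels) writes the variable labels as the header row (the filename and varlist are placeholders).

Code:
export excel namevar num1 num2 num3 using "table50.xlsx" in 1/50, ///
    firstrow(varlabels) replace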

Is there a command for Black Scholes model for American options?

I realise there is
Code:
bsopm
for European options, but I wonder: is there a command for American dividend-paying options?

Cox regression: proportional hazards test by Schoenfeld residuals

Hi!
I have performed a Cox regression on a data set with both continuous and categorical variables. In the stcox command I specified categorical variables by adding i. before the variable name. Schoenfeld residuals were saved with the command stcox i.spiders age i.sex i.ascites albumin bilirubi i.edema1 choleste i.stage, schoenfeld(sc*) scaledsch(ssc*). For assessing proportionality I used estat phtest, detail. All of this went well. But when I attempted to plot the Schoenfeld residuals with estat phtest, plot(var), it worked for continuous variables such as age, albumin etc., while for categorical variables such as sex (estat phtest, plot(sex)) I got an error saying the variable is not found in the model. Could you advise what is wrong?
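For reference, with factor-variable notation the model term is the indicator, not the bare variable name, so the following should work:

Code:
estat phtest, plot(1.sex)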

Create new variable with sums of occurrences

Hi Statalist Community

It is my first post here and I hope it is not a question that is too trivial. Unfortunately I have not found any old posts concerning this issue.

I have panel data with over 160,000 observations and around 1,500 variables (unbalanced).
Now I would like to create the variable "length", which measures the length of unemployment.
This is how it should be constructed: every run of 1s in "unemployed" needs to be summed up, and that value assigned to the new variable "length" on the first observation of the run. "Length" is the variable I need to create.

I have so far tried some commands with bysort and sum/count, unfortunately without any success.

Can anyone give me advice on this? How would you solve it?

I hope this table shows even better what I am trying to do:

Thank you!
id year unemployed length
1 2000 0 0
1 2001 1 3
1 2002 1 0
1 2003 1 0
2 2000 1 1
2 2001 0 0
2 2002 1 2
2 2003 1 0
3 2000 0 0
3 2001 1 2
3 2002 1 0
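A hedged sketch of standard spell logic (see also tsspell from SSC), which reproduces the table above:

Code:
bysort id (year): gen byte spell_start = unemployed == 1 & unemployed[_n-1] != 1
bysort id (year): gen spell_id = sum(spell_start) if unemployed == 1
bysort id spell_id (year): egen spell_len = total(unemployed) if unemployed == 1
gen length = cond(spell_start == 1, spell_len, 0)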

Adjustments to Esttab Code to Suppress Superfluous Statistics and Display Labels

Hello --- I'm hoping to get some assistance adjusting my esttab code (pasted at bottom) to make the following changes to my regression output table as it currently appears below:
  • Suppress the three "var" statistics reported below the coefficients
  • Create and display labels for the p-value stats reported at the bottom (e.g., for dnocost I would like "D-No # Private Cost = 1" as the text in the leftmost column)
  • (minor thing!) I have a "$" figure in the notes at the bottom, but when exported to LaTeX the .tex file reads it as the opening of math mode --- how can I prevent this?
Thanks in advance!
[regression output table shown as an image in the original post]


Code:
* Model 1a: Bid as dependent variable
eststo m1a: metobit bid i.treatment#c.randomcosts invperiod || subjectid: , ll(0) ul(upper) vce(cluster uniquegroupid)

mat list e(b)
    test _b[4b.treatment#c.randomcosts] = 1
        estadd scalar dnocost=r(p)    
    test _b[5.treatment#c.randomcosts] = 1
        estadd scalar ddisccost=r(p)    
    test _b[6.treatment#c.randomcosts] = 1
        estadd scalar dundcost=r(p)    
    test _b[4b.treatment#c.randomcosts] = _b[5.treatment#c.randomcosts]
        estadd scalar dnoddisc=r(p)
    test _b[4b.treatment#c.randomcosts] = _b[6.treatment#c.randomcosts]
        estadd scalar dnodund=r(p)
    test _b[5.treatment#c.randomcosts] = _b[6.treatment#c.randomcosts]
        estadd scalar ddiscdund=r(p)



* Model 1b: Bid Inflation as dependent variable
eststo m2a: metobit bid_above_cost i.treatment#c.randomcosts invperiod || subjectid: , ll(0) ul(upper) vce(cluster uniquegroupid)
    test _b[4b.treatment#c.randomcosts] = 1
        estadd scalar dnocost=r(p)    
    test _b[5.treatment#c.randomcosts] = 1
        estadd scalar ddisccost=r(p)    
    test _b[6.treatment#c.randomcosts] = 1
        estadd scalar dundcost=r(p)    
    test _b[4b.treatment#c.randomcosts] = _b[5.treatment#c.randomcosts]
        estadd scalar dnoddisc=r(p)
    test _b[4b.treatment#c.randomcosts] = _b[6.treatment#c.randomcosts]
        estadd scalar dnodund=r(p)
    test _b[5.treatment#c.randomcosts] = _b[6.treatment#c.randomcosts]
        estadd scalar ddiscdund=r(p)

// export for latex
esttab m1a m2a using "$latex/reg_ind_bidding_discriminatory_only_tobit.tex", replace  eqlabels(" " " ") stats(dnocost ddisccost dundcost dnoddisc dnodund ddiscdund) label b(%10.4f) se star(* 0.10 ** 0.05 *** 0.01) title("Tobit Model of Individual Bidding Behaviour") mtitle("Bid" "Bid Inflation" ) addnotes("Random Effect at Individual Bidder Level with clustering at Auction Group level" "Upper Limit of \$7.07 for Bid Cap Treatments" "Excludes 230 observations of those bidders in bid cap treatments with private costs $>$ $7.07$/unit")
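A hedged sketch of the three fixes: drop() can remove the variance rows (the var(*) pattern is a guess --- check the exact row names with mat list e(b)), stats() takes a labels() suboption (only the dnocost label comes from the post; the rest are hypothetical, following the same naming), and a literal dollar sign in LaTeX must be escaped as \$ (as must #, written \#):

Code:
esttab m1a m2a using "$latex/reg_ind_bidding_discriminatory_only_tobit.tex", replace ///
    drop(var(*)) ///
    stats(dnocost ddisccost dundcost dnoddisc dnodund ddiscdund, ///
        labels("D-No \# Private Cost = 1" "D-Disc \# Private Cost = 1" ///
               "D-Und \# Private Cost = 1" "D-No = D-Disc" ///
               "D-No = D-Und" "D-Disc = D-Und")) ///
    addnotes("Excludes bidders with private costs $>$ \$7.07/unit")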

Friday, October 30, 2020

Problems converting string variable to date/time format

Hi,

I have a date/time variable that is a string and looks as follows: hh.mm.ss DD.MM.YYYY.
I'd like to convert this variable to time/date format. What I tried was:

Code:
gen start2 = clock(start, "hms DMY")
format start2 %tc
While this code doesn't give me an error, it also doesn't give me the correct times (screenshot in the original post).
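A hedged diagnostic: test the mask on one literal value first, and if the periods inside the time are confusing the parser, convert the first two to colons before calling clock():

Code:
display %tc clock("14.30.15 29.10.2020", "hms DMY")    // should show 29oct2020 14:30:15
gen double start3 = clock(subinstr(start, ".", ":", 2), "hms DMY")
format start3 %tc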


Can anyone help?

Thanks a lot!

Convert 1K to 1000 and 1M to 1000000

Hello everyone, I need help converting a particular variable to numbers. The variable in question is the amount of damage done by different disasters. The values were entered as, for example, 1K (meaning $1,000) and 1M (meaning $1,000,000). I want to convert these values to plain numbers. Is there a way to do this in Stata?
PS: As the variable is continuous, there are many observations with the "K" suffix, e.g. 9K, 12K etc., and likewise observations with the "M" suffix.
I would really appreciate your insight on this, as I have a very large dataset.
I use Stata 16.
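A hedged sketch, assuming the variable is a string, here called damage_str, with the multiplier as a trailing letter:

Code:
gen double damage = real(regexr(damage_str, "[KkMm]$", ""))
replace damage = damage * 1000    if regexm(damage_str, "[Kk]$")
replace damage = damage * 1000000 if regexm(damage_str, "[Mm]$")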

Mixed logit and ordered outcome models (help please)

Good evening. My name is Sean and I am a graduate student in an advanced econometrics class. I am having trouble formatting my data so that I can create a mixed logit and an ordered outcome model. Most recently, I have been receiving the error "Variable has no within-group variance." I have been through the help section in Stata, talked to some classmates, and tried to hire a tutor online, but no one seems to be able to help me. I am hoping that someone in these forums can walk me through it. Thank you for reading and responding.

I have my data in a wide format, as a .csv file. These are the models that I am having trouble generating:
  • Mixed logit model
    • Dependent variable: camera brand (1,2,3)
    • Independent variables: age, sex, income, price of camera (modeled as a random variable), choice of other camera brands
  • Ordered choice model
    • Dependent variable: camera brand (1 = budget camera,2=prosumer (middle of the line) camera,3=professional camera)
    • Independent variables: age, sex, income, price of camera
And here is a snapshot of the data that I need to reshape (screenshot in the original post).
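For what it's worth, choice models generally want one row per respondent-alternative, so a wide file with one price column per brand has to be reshaped long; a hedged sketch with placeholder names:

Code:
* respondent_id, price1-price3 and camera_choice are hypothetical names
reshape long price, i(respondent_id) j(brand)
gen byte chosen = (brand == camera_choice)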


Syntax to know the menu path of the commands that we use in the do file

Good night,
I would like to know if there is a way to find the menu path for the commands that we use in a do-file.

Thank You

Combining observations with the same year and id in a single dataset

I have some data that looks like this:
no. id year var1 var2 var3
1 1 2000 50 . .
2 1 2000 . . 10
3 2 2001 . . 500
4 2 2001 200 . .
5 3 2002 . 300 .
6 3 2002 . . 100
I want to combine all observations which share an id and year so that the data looks like this:
no. id year var1 var2 var3
1 1 2000 50 . 10
2 2 2001 200 . 500
3 3 2002 . 300 100
Any help is appreciated.
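For reference, a hedged one-liner: collapse with the firstnm statistic keeps the first nonmissing value within each id-year group, which reproduces the table above.

Code:
collapse (firstnm) var1 var2 var3, by(id year)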

generating variables with "or"

Hi, I wish to create a new variable given certain parameters of 3 other variables.
I have tried the basic new-variable commands: gen highscore = . and replace highscore = 0 if (varA < 1).
I can get that far, but what I need is essentially the following (the syntax doesn't work):

replace highscore = 0 if (varA <=1) or if (varB <=10) or if (varC >= 10)

Is there a way to fix this syntax?
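For reference, Stata writes logical or as | (and the if appears only once), so the working version is:

Code:
replace highscore = 0 if varA <= 1 | varB <= 10 | varC >= 10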

Thanks for any and all help - I am sure the answer is pretty straightforward.

Histogram with density of 15??

Hi everyone,

I would like to know if I'm doing something wrong.

I am running almost 200 regressions and collecting the coefficients from each one, so that I have a .dta which consists solely of the b_ values of the variables. I've succeeded in doing this; here is an example of my data for vark:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float b_vark
 3.396698
 3.379129
3.4008584
3.4163265
 3.407227
3.3611805
 3.410911
  3.40948
3.4045825
 3.382801
 3.405709
 3.418393
 3.411751
 3.360613
 3.416084
 3.410531
 3.406611
 3.385547
 3.394252
 3.389984
  3.33743
 3.396989
  3.38991
 3.384635
 3.424929
3.4166055
 3.357182
3.4244335
  3.41926
 3.410072
 3.429587
 3.371039
 3.437761
3.4308956
3.4213645
 3.359139
 3.425608
3.4222975
 3.413559
 3.372591
  3.36899
 3.363826
 3.431501
3.4220905
  3.42228
 3.370989
 3.391934
3.4071674
 3.399939
 3.358832
   3.4016
 3.397689
 3.397151
 3.376958
3.3882046
 3.383614
3.3409314
 3.388817
3.3837104
 3.380788
3.4201446
3.4101994
 3.361287
  3.41551
3.4127526
3.4071155
 3.426149
 3.378766
 3.432141
 3.427403
 3.421279
 3.366856
  3.42038
 3.419267
3.4132764
3.3793986
3.3773935
3.3731976
 3.426004
3.4197905
 3.421866
3.3825076
 3.391777
 3.388213
 3.340173
 3.393364
  3.38452
 3.383398
 3.422355
3.4159694
 3.360905
 3.421646
 3.414381
 3.409625
 3.429239
3.3763766
 3.435365
 3.427185
 3.422666
 3.366118
end
However, when trying to create a histogram to check the variability of each one, I ran into a problem: my histogram density isn't within one, but goes over 15.

Code:
histogram b_vark
[histogram screenshot in the original post]

Am I doing something wrong?
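For reference: nothing is wrong. histogram's default y axis is density, which makes the bar areas (height times bin width) integrate to 1, so with values spanning only about 0.1 the heights can easily exceed 15. For bar heights that sum to 1, use fraction (or percent):

Code:
histogram b_vark, fraction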

Just another question while I'm at it: the coefplot command (from SSC, I think) doesn't work in this case, right? I thought that, since these are regression coefficients, it would look well presented all together, but it seems to present only the coefficient of the last regression and its confidence interval.

Sorry if my questions are fairly basic. Please forgive a fellow Stata beginner.

Thank you

First Differencing Error

Hi there,

I'm trying to first difference my variable on industrial production, but all I get are missing values. What am I doing wrong?

Code:
clear
input int DATE double INDPRO
6940 53.2837
6971 53.5675
6999 53.7364
7030 53.1571
7060 53.5566
7091 53.5534
7121 53.4808
7152 53.1195
7183 53.1786
7213 53.4617
7244 53.409
7274 53.4536
7305 53.7071
7336 53.7262
7365 53.5481
end

tsset DATE, monthly
gen IP = L.INDPRO

Why is this yielding nothing but missing values?
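For reference, two things appear to be going on: DATE holds daily date values (about 30 apart), so tsset ..., monthly treats them as month numbers with month-sized gaps and every lag is missing; and L. is the lag, while D. is the first difference. A hedged fix:

Code:
gen mdate = mofd(DATE)      // convert daily dates to a monthly index
format mdate %tm
tsset mdate
gen dIP = D.INDPRO          // first difference; L.INDPRO would be the lag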

Help regarding creating variables for line + bar graph.

Hi,

I know that this has been discussed earlier in numerous fora, but somehow I am unable to figure out a) whether this is right, and b) whether there is a simpler, less cluttered and confusing way to do this. The initial dataset includes all the subjects with the event of interest. I want to create a bar graph showing absolute numbers by calendar year, which is not a problem:
Year age65
1995 0
1995 0
1995 1
1997 0
1997 1
1997 0
1997 1
1997 0
1997 1
Age65 is a categorical variable indicating whether age is greater or less than 65 years. To the bar graph mentioned above I now need to add, in the same graph, a line showing the % of that year's events occurring in the >65 group. So I need the total events by calendar year and then the events by age group. I wrote:
Code:
bys failyear: gen failure =_n
bys failyear age65: gen fail=_n
This gives me counts of total events per calendar year and within each age category by calendar year.
Now how do I get the proportion of failures with age65==1 out of total failures per calendar year, and then use it to draw a line showing the percentage of events in that age group on the bar graph of total events?

While I tried:
Code:
replace failure=sum(failure)
replace fail=sum(fail)
I am not totally convinced that my method is right and would appreciate some help on this.
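A hedged alternative that skips the running sums entirely (failyear and age65 as in the post; tag() keeps one row per year for plotting):

Code:
bysort failyear: gen total_events = _N
bysort failyear: egen events65 = total(age65)
gen pct65 = 100 * events65 / total_events
egen tag = tag(failyear)
twoway (bar total_events failyear if tag) ///
       (line pct65 failyear if tag, sort yaxis(2))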
Thanks a lot.

Shalom

Standardization

Hello everyone. I have a dataset that looks like this (screenshot in the original post). I want to create a variable that standardizes the variable "promp08" within the groups defined by "recinto" and "carrera". This is a fictitious database about people from a university, and the goal is to create a variable that standardizes the average grade of every student within every major and every headquarters of the university; I just don't know how to generate a variable that computes the mean and standard deviation of "promp08" for each group. I'm using Stata 14, and I'll be thankful for any help you can give me!
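A hedged sketch using egen by-groups (as far as I know, egen's std() cannot be combined with by, so the mean and SD are built explicitly):

Code:
bysort recinto carrera: egen grp_mean = mean(promp08)
bysort recinto carrera: egen grp_sd = sd(promp08)
gen promp08_std = (promp08 - grp_mean) / grp_sd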

Extracting matrix rownames for svy mean with multiple groups

Hi everyone!

I’m trying to estimate the difference in blood pressure between hypertensive and non-hypertensive individuals using a complex survey, but doing so by age-group and gender.

I’m having trouble with the matrix code for outputting the results of my svy command. Here is what I am writing currently:

Code:
svyset psu [pweight=samplewt_bp], strata(stratum)
svy, subpop(HTN1): mean sbp_final, over(sex age5)
Survey: Mean estimation
Number of strata = 2 Number of obs = 5,091
Number of PSUs = 258 Population size = 2,439,648
Subpop. no. obs = 1,645
Subpop. size = 703,978.23
Design df = 256
Linearized
Mean Std. Err. [95% Conf. Interval]
c.sbp_final@sex#age5
1 0 136.3088 5.042258 126.3792 146.2384
1 30 139.3294 1.698257 135.9851 142.6738
1 35 147.6446 2.197222 143.3177 151.9716
1 40 144.5797 1.80772 141.0198 148.1395
1 45 154.1876 4.811669 144.7121 163.6631
1 50 157.141 4.226026 148.8188 165.4632
1 55 155.97 2.916671 150.2263 161.7137
1 60 154.9218 3.047789 148.9199 160.9238
1 65 158.1695 2.623057 153.004 163.335
2 0 136.8597 2.010842 132.8998 140.8196
2 30 138.7637 2.321715 134.1916 143.3358
2 35 145.7804 2.492297 140.8723 150.6884
2 40 147.0154 2.922702 141.2598 152.771
2 45 157.0043 3.423124 150.2632 163.7454
2 50 159.3642 5.456736 148.6184 170.11
2 55 154.3976 2.330003 149.8092 158.9861
2 60 158.9386 4.417497 150.2393 167.6378
2 65 158.8922 3.301328 152.3909 165.3934
Now, trying to extract these estimates (which I will later merge by country and hypertension status) is proving to be a challenge.
matrix M = e(b)

mat li M
M[1,18]
c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@
1.sex# 1.sex# 1.sex# 1.sex# 1.sex# 1.sex# 1.sex# 1.sex#
0.age5 30.age5 35.age5 40.age5 45.age5 50.age5 55.age5 60.age5
y1 136.3088 139.32943 147.64465 144.57965 154.18764 157.14103 155.96998 154.92182
c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@ c.sbp_final@
1.sex# 2.sex# 2.sex# 2.sex# 2.sex# 2.sex# 2.sex# 2.sex#
65.age5 0.age5 30.age5 35.age5 40.age5 45.age5 50.age5 55.age5
y1 158.16949 136.85968 138.76374 145.78036 147.0154 157.00432 159.36418 154.39765
c.sbp_final@ c.sbp_final@
2.sex# 2.sex#
60.age5 65.age5
y1 158.93856 158.89217

I've tried many different ways but to no avail. What I would really like is a dataset with one variable containing the stratum specification and one variable containing the means. If anyone could help me figure out how to code this, I would greatly appreciate it!
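A hedged sketch: transpose e(b), pull the coefficient names off the matrix, and park both in variables for the later merge:

Code:
matrix M = e(b)'
local rnames : rownames M
svmat double M, names(mean_)          // creates mean_1 holding the means
gen strata_spec = ""
local i = 1
foreach n of local rnames {
    quietly replace strata_spec = "`n'" in `i'
    local ++i
}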





Margins after anova

I am requesting marginal means after running anova on StataSE 14, and am wondering why I am getting different values from the seemingly similar commands below.

#1
Code:
anova selfesteem census c.ses
pwcompare census, cimargins
#2
Code:
anova selfesteem census c.ses
margins census
Please let me know if there is anything I can provide for more clarity.

Thank you in advance,
Julia

Import file with .file file extension

Hello,

I am trying to import data with a .file file extension and can't seem to figure out how. If it helps, I am trying to import the file in the ED2017.zip folder found at ftp://ftp.cdc.gov/pub/Health_Statist...atasets/NHAMCS . Any ideas? (Would this be better in SAS...?)
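A hedged guess: the NHAMCS public-use files are fixed-width ASCII, and infix does not care about the extension, so a column specification written from the record layout in the accompanying documentation should read the file directly; the fields below are purely hypothetical placeholders.

Code:
infix int vmonth 1-2 int vyear 3-6 using "ED2017.file"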

Heteroscedasticity test for large panel dataset using XSMLE

Does the xsmle package have any command to check for heteroscedasticity in a large panel dataset? My matrix is 2952 x 2952, which is too large for the spreg package to handle. Can anyone help me in this regard?

loops

Hello,
As part of a Monte Carlo study, I have the following population regression model:
Y = 262 - 0.006*X2 - 2.4*X3 + u

I was told to assume that u is normally distributed, N(0, 42). I have to generate 64 observations for u, combine them with the 64 observations on X2 and X3 in a table that I already have, estimate the corresponding sample regression coefficients Betahat1, Betahat2 and Betahat3 using OLS, and save the estimated coefficients.

Then they want me to repeat that 20 times. Is there a shortcut to create u, u1, u2, ..., u20, where each is N(0, 42), and then one command that runs the regression of Y on X2, X3 and each of the u's, so that I end up with 20 regressions? I want to do that in one command. Is there a way? Please help.
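A hedged sketch of the whole exercise in one loop, assuming N(0, 42) means variance 42 (so sd = sqrt(42)); the coefficients from each replication land in one row of a matrix:

Code:
set seed 12345
matrix B = J(20, 3, .)
forvalues r = 1/20 {
    capture drop u y
    gen u = rnormal(0, sqrt(42))
    gen y = 262 - 0.006*X2 - 2.4*X3 + u
    quietly regress y X2 X3
    matrix B[`r', 1] = _b[_cons]
    matrix B[`r', 2] = _b[X2]
    matrix B[`r', 3] = _b[X3]
}
matrix list B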

margins- marginal effect of time-variant variables in mixed effects models

Dear Statalist users,
I am using Stata 14 SE, and have a question about using margins for a time-variant variable in a mixed-effects model.
My dependent variable Y and three independent variables (X) are measured at two time points. While X1 and X3 are measured in each district at time 1 and time 2 (wave), X2 is measured at the province level at times 1 and 2. Districts are nested in provinces. Basically, I have multi-level repeated-measures data. My control variables (Z) are time-invariant. A data example is below.

The code I use to model the effect of X1 X2 and X3 on Y is:
Code:
 mixed Y i.wave  i.X1 c.X3  c.X2   Z1 Z2   ||province:  ||district_no:
I look at the effect of the interaction between X1 and X2:
Code:
 mixed Y i.wave  i.X1##c.X3 c.X2   Z1 Z2   ||province:  ||district_no:
And what I would like to do is plot this interaction. X3 is measured in waves 1 and 2, and I am just not sure how to show the effect of the change in X3 in a graph. I tried:
Code:
margins X1, at(X3 = (-.15(.03).15))
marginsplot
I am not confident this is capturing what I would like to show. I generated a variable called 'DifferenceinX3', where I subtracted the wave 1 values of X3 from the wave 2 values.
When I use this variable in the model instead of X3, both the estimates and the marginsplot change completely. I am not sure why the coefficient of X3 almost doubles when I use the difference variable instead.
Should their substantive effect not be exactly the same? The direction of the effect is the same, but the coefficient almost doubles.
But more importantly, how does one go about plotting the marginal effects of two time-variant variables in a mixed model?

I appreciate your help.



Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double Y byte wave float X1 double X3 float(X2 Z1 Z2 province) int district_no float DifferenceinX3
 .5008916001877053 1 0 .8836196467457053         0 0 0 1  1              0
 .5701738334858188 2 0  .891889725917615  .5954652 0 0 1  1      .00827006
.25843545684528696 1 0 .8517262864682598         0 0 0 1  2              0
.33096601673721704 2 0 .8558748788826122  .5954652 0 0 1  2    .0041485853
 .2314806361495061 1 0 .8671302188663923         0 0 0 1  3              0
.28064805600966664 2 0 .8749831214675643  .5954652 0 0 1  3     .007852902
 .4701195219123506 1 0 .8039459375485475         0 0 0 1  4              0
 .5468765275197967 2 0  .825685903500473  .5954652 0 0 1  4     .021739945
.45914704629798625 1 0 .8386024120303737         0 0 0 1  5              0
 .5308816595945309 2 0 .8622581288649511  .5954652 0 0 1  5     .023655705
 .5529871977240398 1 0 .9015807301467821         0 0 0 1  6              0
 .5972691441441441 2 0 .9105439383410197  .5954652 0 0 1  6     .008963187
.23223635003739715 1 0  .869795996674554         0 0 0 1  7              0
 .2716732739920712 2 0 .8649493081680801  .5954652 0 0 1  7     -.00484667
.42910860429108605 1 0 .8222902932232359         0 0 0 1  8              0
 .5013144023806029 2 0 .8456993069130976  .5954652 0 0 1  8     .023409005
 .4536984981126014 1 0 .8757469606429013         0 0 0 1  9              0
 .5270582609388699 2 0 .8802646998000965  .5954652 0 0 1  9    .0045177345
 .4836112708453134 1 0 .7833479404031551         0 0 0 1 10              0
 .5462431001464458 2 0 .8043728423475259  .5954652 0 0 1 10     .021024877
   .36644963615473 1 0 .8615101724805369         0 0 0 1 11              0
 .4484892121448794 2 0 .8800895139308706  .5954652 0 0 1 11     .018579356
 .2582014753593243 1 0 .8491196205853588         0 0 0 1 12              0
 .3228475641790513 2 1 .8528333602230218  .5954652 0 0 1 12    .0037137566
.35363741339491916 1 0 .8177992041628406         0 0 0 1 13              0
 .4731265652090156 2 0 .8134437035333794  .5954652 0 0 1 13    -.004355507
.32956786802940646 1 0 .8485232696897375         0 0 0 1 14              0
.37584912406149446 2 0 .8606675244077803  .5954652 0 0 1 14     .012144266
 .3377397403399747 1 0 .8314157170778479         0 0 0 1 15              0
.42843334243252795 2 0 .8406463605192804  .5954652 0 0 1 15     .009230647
 .5449415852219232 1 0 .8457498530453939         0 0 0 2 16              0
  .641944955764613 2 0 .8683048852266039  .3317993 0 0 2 16      .02255503
  .524244480400856 1 0 .7929118002416432         0 0 0 2 17              0
 .6770883478172743 2 0 .8163368642780467  .3317993 0 0 2 17     .023425037
 .5755950385517935 1 0 .8811443932411674         0 0 0 2 18              0
 .6560495938435229 2 1 .9165220744168112  .3317993 0 0 2 18      .03537767
 .7091292483254775 1 0 .6773241515002459         0 0 0 2 19              0
   .85121412803532 2 0 .7589862514493954  .3317993 0 0 2 19      .08166207
 .4849688681767829 1 0 .8204195205479452         0 0 0 2 20              0
 .5811804708578187 2 0 .8416793893129771  .3317993 0 0 2 20      .02125984
  .685989894350023 1 0 .7928177975148384         0 0 0 2 21              0
 .8277890608586036 2 0  .828457731311777  .3317993 0 0 2 21      .03563996
 .7121102248005802 1 0 .7838338895068595         0 0 0 2 22              0
 .8619329388560157 2 0  .872473077649726  .3317993 0 0 2 22      .08863921
 .9071300179748353 1 0 .8856816985436041         0 0 0 2 23              0
 .9593705293276109 2 0 .9326021581461171  .3317993 0 0 2 23      .04692047
 .5623674911660778 1 1 .8395382395382396         0 0 0 2 24              0
 .6620594333102972 2 0 .8542088516054382  .3317993 0 0 2 24     .014670635
 .5348067182412929 1 0 .9103541429696387         0 0 0 3 25              0
 .6653963139734789 2 0  .915298976671581 .28208148 0 0 3 25     .004944839
 .4125722543352601 1 0 .8892910634048926         0 0 0 3 26              0
 .4892944388561575 2 0  .900830606594513 .28208148 0 0 3 26      .01153956
 .5145569620253164 1 0  .868237347294939         0 0 0 3 27              0
   .62026913372582 2 0 .8601643069393463 .28208148 0 0 3 27    -.008073069
 .6256125821524903 1 0 .8756493401735875         0 0 0 3 28              0
 .7158580413297394 2 0 .8784253184098804 .28208148 0 0 3 28    .0027759855
 .4544952285283777 1 0 .8783187717363644         0 0 0 3 29              0
 .5412363492612542 2 1 .8783167145512929 .28208148 0 0 3 29 -2.0720697e-06
 .5304798962386511 1 0 .9205705009276438         0 0 0 3 30              0
 .6460984702403908 2 0 .9102711397058824 .28208148 0 0 3 30    -.010299353
.43370756482224004 1 0 .9102380952380953         0 0 0 3 31              0
 .5171763437963087 2 0 .9005827090022595 .28208148 0 0 3 31    -.009655378
 .4417435328386157 1 0 .8575885377549252         0 0 0 3 32              0
 .5016402405686168 2 0 .8635209235209235 .28208148 0 0 3 32     .005932394
 .4869785664899747 1 0  .778960223307746         0 0 0 3 33              0
 .5746084480303749 2 0 .7624526498389209 .28208148 0 0 3 33    -.016507579
 .4162415833503367 1 0 .8761123713139068         0 0 0 3 34              0
.47278770253427505 2 1 .8665964542741794 .28208148 0 0 3 34    -.009515888
  .559327566508895 1 0 .8404325464855598         0 0 0 3 35              0
  .660734327400994 2 0 .8594414893617022 .28208148 0 0 3 35     .019008964
 .5650262617035853 1 0 .9233479726279236         0 0 0 3 36              0
 .7066111111111111 2 0 .9333808336302102 .28208148 0 0 3 36     .010032884
 .6467490520994242 1 0 .9294468787705594         0 0 0 3 37              0
 .7727925586485193 2 0 .9328652917946467 .28208148 0 0 3 37     .003418416
.43885714285714283 1 0 .8626991565135895         0 0 0 3 38              0
 .4796839729119639 2 0 .8629751290473956 .28208148 0 0 3 38     .000275978
 .5349692529496572 1 0 .8505993873465352         0 0 0 3 39              0
 .6348095224320963 2 0 .8626212058616248 .28208148 0 0 3 39     .012021798
  .569474921630094 1 0 .8865721434528774         0 0 0 3 40              0
 .6965041965041965 2 0 .8881137465949106 .28208148 0 0 3 40     .001541624
 .5593326906149139 1 0 .8820998278829604         0 0 0 3 41              0
 .6688803780964798 2 0 .8722279220266751 .28208148 0 0 3 41    -.009871885
.24974731232197003 1 0 .8973921874433282         0 0 0 3 42              0
.30645011600928074 2 0 .9088443737344518 .28208148 0 0 3 42      .01145216
 .3044498656702591 1 0 .8084917045579219         0 0 0 4 43              0
.44136020068534704 2 0 .7534060943451509         0 0 0 4 43     -.05508561
.06365623500559792 1 0 .8332165995447383         0 0 0 4 44              0
.12957372298031639 2 1  .753286551785397         0 0 0 4 44     -.07993005
.06817504100193827 1 0 .8357064790727029         0 0 0 4 45              0
.11110193633623715 2 0  .765450680404053         0 0 0 4 45     -.07025579
 .2962797808638043 1 0 .7742774566473989         0 0 0 4 46              0
 .5701609574907139 2 0 .7253514252245217         0 0 0 4 46     -.04892602
.15349887133182843 1 0 .8223345320244921         0 0 0 4 47              0
 .3618196160925585 2 0 .7642843118005105         0 0 0 4 47     -.05805022
.07620412844036697 1 0 .8578691709844559         0 0 0 4 48              0
.16082134968218906 2 0 .7408955808864356         0 0 0 4 48     -.11697356
.12797713559860274 1 0 .8392732354996506         0 0 0 4 49              0
.20923184520340365 2 0 .7784266879037254         0 0 0 4 49     -.06084653
 .1355679965983984 1 0 .8142646558566731         0 0 0 4 50              0
.26414204902576993 2 0 .7482440990213011         0 0 0 4 50    -.066020556
end

how to calculate marginal effect?

Hi there,
Need some help calculating marginal effect.
Variable Description
sid Subject ID
age Age
famsze Size of the family
educyr Years of education
totexp Total medical expenditure
retire =1 if retired
female =1 if female
white =1 if white
hisp =1 if Hispanic
marry =1 if married
northe =1 if North-East area
mwest =1 if Mid-West area
south =1 if South area (West is excluded)
phylim =1 if has functional limitation
actlim =1 if has activity limitation
msa =1 if metropolitan statistical area
income annual household income (in 1000 dollars)
injury =1 if condition is caused by an accident/injury
priolist =1 if has medical conditions that are on the priority
totchr # of chronic problems
suppins =1 if has supplementary private insurance
hvgg =1 if health status is excellent, good or very good
My question is: 'What is the marginal effect of annual household income (income) on total medical expenditure?' Given that the marginal effect is the slope of the regression, I am a bit confused about how to answer this question. Any help would be greatly appreciated.
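For reference, a hedged sketch: after fitting the regression, margins, dydx() reports the average marginal effect; the covariate list below is an arbitrary subset of the variables above.

Code:
regress totexp income age female totchr suppins
margins, dydx(income)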

Thank you!

Calculating percentiles in STATA

Hi, I am using the following loop to get the 0.5%, 1%, 2%, 5%, 95%, 98%, 99% and 99.5% percentiles of the variable "Return". However, Stata does not accept the 0.5 and the 99.5; I believe this is because these are not integers. Could anyone please help me? I would much appreciate it, since I have been struggling with this problem for a while. I know I could get the 0.5 percentile and then use gen to create my dummy variables, but I am required to use a more efficient way.

_pctile Return, percentiles( 0.5 1 2 5 95 98 99 99.5)
return list
local i = 1
foreach n of numlist 0.5 1 2 5 95 98 99 99.5 {
gen byte above`n' = Return >= `r(r`i')'
local ++i
}
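For reference, _pctile itself accepts non-integer percentiles; the loop actually fails at gen byte above0.5 because a period is not allowed in a variable name. Swapping the period for an underscore fixes it:

Code:
_pctile Return, percentiles(0.5 1 2 5 95 98 99 99.5)
local i = 1
foreach n of numlist 0.5 1 2 5 95 98 99 99.5 {
    local sfx : subinstr local n "." "_", all
    gen byte above`sfx' = Return >= r(r`i') if !missing(Return)
    local ++i
}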


Kind Regards,
Adrian

using DHS DATA SET for cluster fixed effect

  • Hello everyone,

    Please, I am new to Stata. I am trying to construct a cross-sectional dataset from a survey Stata file and an Excel file of climate records for each survey cluster, and then run a probit with cluster fixed effects on the sample.
    For instance, the climate Excel file is:
    cluster temperature precipitation
    1 25.6 1.89
    2 27 2.4
    3 24.44 1.56
    4 24.89 2.32
    5 30 2.9
    6 27.6 2
    while the survey file is:
    region cluster adoption age sex (female=2, male=1)
    urban 1 1 10 2
    rural 1 0 15 1
    urban 1 1 14 2
    rural 1 0 5 2
    urban 2 1 13 1
    rural 2 1 6 1
    urban 2 0 7 2
    rural 3 1 13 2
    urban 3 1 4 1
    rural 3 0 7 2
    urban 3 1 4 2
    I imported the Excel file into Stata and saved it as a Stata file, then I tried merging the two files using these commands:

Code:
use "C:\survey\A.dta", clear
sort cluster
joinby cluster using "C:\temp\B.dta", unmatched(both)
bysort cluster: probit adoption age i.sex temperature precipitation

    I am getting error messages. Kindly guide me, please. Thank you.
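A hedged sketch: since each cluster matches many survey rows, merge m:1 is the natural join. Note that true cluster fixed effects (i.cluster) would absorb the cluster-level climate variables, so the sketch clusters the standard errors instead:

Code:
use "C:\survey\A.dta", clear
merge m:1 cluster using "C:\temp\B.dta", keep(match master) nogenerate
probit adoption age i.sex temperature precipitation, vce(cluster cluster)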

estat vce, corr and vif

I apologise in advance if this has been posted before, though I couldn't find a discussion when I searched the forum.
I am using estat vce, corr, and it appears that my two main independent variables of interest have a correlation of 0.68.
I realise that when using vif a value over 10 is problematic, but I was wondering what the cut-off is when using estat vce, corr.
Also, in relation to this, models with year dummies, or age and age^2 controls, for example, would have high correlations, as one would imagine.
How does one deal with that sort of collinearity?
Thanks

After splitting a variable that had options separated by commas, how to find out which is the highest and the lowest frequency of values?

My data looks like this

fruit_variable has values
1. Mango
2. Mango, pineapple
3. Mango, grapes
4. Banana, grapes, chickoo
5. strawberry , mango, orange
7. orange, banana , mango

I want to know which fruit is produced the most and which the least. That is not possible with data separated by commas,
so I used the split command to separate every comma-separated option into a different variable, like the following:

fruit_variable1 fruit variable2 fruit variable3 fruit variable4
mango
mango Pineapple
mango Grapes
Banana Grapes Chickoo
Strawberry Mango orange
Orange Banana Mango

Now I created a new variable for every fruit: fruit_mango, fruit_pineapple, and so on. I did

gen fruit_mango=1 if fruit_variable1=="Mango"
replace fruit_mango=1 if fruit_variable2=="Mango"
replace fruit_mango=1 if fruit_variable3=="Mango"

and so on.

But I don't think this is smart coding. Can it be done another way?

Is there a command to include all the fruit_variable_n (series) variables together instead of writing each one individually? Doesn't the "*" symbol denote the n (the value that a variable takes after splitting)?
How do I do this?
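A hedged sketch: initialize the indicator to 0, set it to 1 on a match, and let a wildcard varlist cover all of the split variables at once (fruit_variable? matches fruit_variable1 through fruit_variable9):

Code:
gen byte fruit_mango = 0
foreach v of varlist fruit_variable? {
    replace fruit_mango = 1 if lower(`v') == "mango"
}
tabulate fruit_mango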

Renaming variables using the 'foreach' command

I am trying to rename multiple variables corresponding to baseline, endline and endline2 characteristics into wave 1, wave 2 and wave 3 so that I can reshape the data into long format based on these three time periods.

I used the following code; however, it does not seem to work despite multiple tries at reformatting it, and Stata gives me an error message every time:

local vars "asset_tot_value iagri_month ibusiness_month ipaidlabor_month ranimals_month ctotal_pcmonth"

foreach k of local vars {

rename `k'_bsl `k'1
rename `k'_end `k'2
rename `k'_fup `k'3

}

Is there something wrong with how I am coding this?
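For what it's worth, the loop itself looks syntactically fine; rename usually fails like this when one of the constructed names (say asset_tot_value_bsl) does not exist exactly as typed. A hedged check plus an alternative that handles all three suffixes in one call via rename rules:

Code:
describe *_bsl *_end *_fup               // check the names actually present
rename (*_bsl *_end *_fup) (*1 *2 *3)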

Poisson-CRE and over identification test

Hi all,

I am trying to perform an overidentification test in the following theoretical scenario. Say that I have a balanced panel where N = 208 and T = 10. Call z = (z_1, z_2) the instrument matrix, x (NT x 1) the endogenous variable being instrumented, and w the other features (NT x d_w). Now, according to the Wooldridge et al. procedure, I would like to compute:

Code:
xtreg x z w i.Year, fe
predict double residuals, e
xtpoisson y x residuals w i.Year, fe
My aim is to test the overidentifying restrictions in this scenario. One thing I thought of is to compute the Hansen test manually:

1) estimate the entire procedure with all instruments;
2) save the residuals;
3) regress the residuals on the instruments alone (xtreg, fe?);
4) N*R^2 would then be chi-squared distributed, but how do I save N*R^2 from xtreg, fe? And, if we manage that, how do we get p-values?

As you can see, the procedure lacks some steps. So, could you please either advise me on steps 3) and 4) or suggest a more elegant way (maybe a command or standard procedure) to perform an overidentification test for the Poisson-CRE procedure?
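A hedged sketch for steps 3) and 4), using xtreg's within R-squared (whether that is the appropriate R^2 for this statistic is a judgment call; resid is a placeholder for the saved residuals, and the degrees of freedom equal the number of overidentifying restrictions, here 2 - 1 = 1):

Code:
xtreg resid z1 z2 i.Year, fe
scalar NR2 = e(N) * e(r2_w)
scalar pval = chi2tail(1, NR2)
display "N*R^2 = " NR2 ", p-value = " pval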

Thank you,

Federico

xtivreg2: interacting two instumented variables (gmm)

Hello,
my baseline model is the following:
Code:
 xtivreg2 y l.y $controls trend (x1= z1 z2),fe gmm robust bw(1) //
Where x1 is my variable of interest and z1 and z2 are valid instruments.

I would like to expand my model to include an interaction between x1 and x2. x2 is another endogenous variable for which I have valid instruments (z3, z4). Does anyone know how I can interact two instrumented variables using xtivreg2?

How can I specify the interaction with multiple instruments into xtivreg2?
The specification below is wrong, but perhaps it is a starting point for addressing the question:

Code:
xtivreg2 y l.y $controls trend (x1 x2 x1#x2= z1 z2 z3 z4 z1#z2#z3#z4),fe gmm robust bw(1) //
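A hedged sketch: xtivreg2 does not accept factor-variable (#) notation, so the products have to be built by hand, with the instrument set expanded by the products of the instruments:

Code:
gen x1x2 = x1 * x2
gen z13 = z1 * z3
gen z14 = z1 * z4
gen z23 = z2 * z3
gen z24 = z2 * z4
xtivreg2 y l.y $controls trend (x1 x2 x1x2 = z1 z2 z3 z4 z13 z14 z23 z24), ///
    fe gmm robust bw(1)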
Thanks a lot in advance for your help. Best regards

Control Function Approach and looking for a method to do robustness check of fixed effect

I'm applying the control function (CF) approach in a nonlinear model using a panel dataset of five waves (waves 1-5). I ran fixed-effects and random-effects models, and the Hausman test favoured the fixed effects.
Then I wanted to check the robustness of my result. I decided to run 2SLS, but my regressor is not statistically significant in the second stage. I attempted the control function approach, the alternative to 2SLS: at first the regressor was statistically significant, but when I ran it again it was not.
Can someone help check whether I have missed any step, and what method can I use as a robustness check of the fixed effects? I tried GMM, but it was not making sense.
Kindly note: both the dependent variable and the regressor are continuous, as is the IV (diffage, the difference between the ages of the male and female in the household).

2SLS
First stage
reg Autonomy4 diffage age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum i.religion dwel african indian white parent grandparent uncle_ant

estat endog

estat firststage

Second stage
**** 2SLS
ivreg2 stunting age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum religion dwel african indian white parent grandparent uncle_ant (Autonomy4 = diffage)
or
ivregress 2sls stunting age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum religion dwel african indian white parent grandparent uncle_ant (Autonomy4 = diffage), first


Control Function Approach
First stage
reg Autonomy4 diffage age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum i.religion dwel african indian white parent grandparent uncle_ant

test diffage

predict error1hat, residual

Second stage
ivreg2 stunting Autonomy4 age sex_c cageN hhsize i.femaleedu hhincome healthdum Qlifedum emplstatusdum i.religion dwel african indian white parent grandparent uncle_ant error1hat

To show the effect: I specified below graph or plot
margins, at(Autonomy4=(-1.02(0.01)1)) atmeans
marginsplot
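One hedged addition: because error1hat is a generated regressor, the second-stage standard errors are not valid as reported, and bootstrapping both stages together is the usual remedy. The sketch below abbreviates the covariate list and may need cluster() if observations are grouped:

Code:
capture program drop cf_boot
program define cf_boot, rclass
    capture drop e1hat
    regress Autonomy4 diffage age sex_c cageN hhsize   // abbreviated covariate list
    predict e1hat, residual
    regress stunting Autonomy4 age sex_c cageN hhsize e1hat
    return scalar b_aut = _b[Autonomy4]
end
bootstrap r(b_aut), reps(500): cf_boot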

Thanks for your help.


Thursday, October 29, 2020

How can I get a count of a string variable when using the collapse command?

This is what my data look like:
Student Name SchoolName Location NScore EScore MScore
A X 5 1 2 1
B X 5 1 2 2
C X 5 2 2 2
D X 5 2 1 2
E X 5 1 1 1
F X 5 2 2 2
G X 5 2 2 2
H Y 5 2 2 2
I Y 5 2 2 1
J Y 5 2 4 3
K Y 5 1 3 2
L Z 5 4 2 1
M Z 5 5 2 3
N Z 5 1 1 1
O Z 5 2 1 2
P Z 5 3 2 1
Q Z 5 4 4 4
R Z 5 1 2 1
S Z 5 2 3 1
What I would like to do is:

make a table that can display School Name, Location and mean score of NScore, EScore and MScore

for which

code
collapse (mean) NScore EScore MScore Location, by(SchoolName)

works fine

And the resulting table looks Similar to
SchoolName Location NScore EScore MScore
X 5 2 (mean values) 3 1
Y 5 1 2 3
Z 5 3 2 2

I was wondering if we can also display the number of students in each school, after which the table would look similar to
SchoolName Location NoofStudents NScore EScore MScore
X 5 7 2 3 1
Y 5 4 1 2 3
Z 5 8 3 2 2
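For reference, a hedged addition to the same collapse call: the (count) statistic needs a numeric variable, so count a constant.

Code:
gen byte one = 1
collapse (mean) NScore EScore MScore Location (count) NoofStudents = one, by(SchoolName)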
Thank you all for the help.