It seems to be well-documented (here: https://www.statalist.org/forums/for...port-fvvarlist or here: https://www.stata.com/statalist/arch.../msg00707.html) that xtoverid does not work when factor variables are included in a regression using xtivreg.
I am using factor variables in an xtivreg regression, and I would like to know the first stage F-stat for my excluded variables. Is there any way to do this w/out using xtoverid?
If there is no post-estimation command that works to do this, I can of course separately run what I think is the 1st stage, and test my excluded variables myself. From page 20 of the manual (https://www.stata.com/manuals/xtxtivreg.pdf) it looks like I would first (a) remove all fixed effects using xtreg, then (b) run a 2SLS regression of my 1st stage using ivreg or ivreg2. Does anyone know if this is indeed the best manual approximation of the first stage of xtivreg?
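For what it's worth, a minimal sketch of the manual first stage described above, with hypothetical variable names (x_endog for the endogenous regressor, z1 and z2 for the excluded instruments, w1 and w2 for exogenous controls, i.year for the factor variables) and the panel already xtset:
Code:
xtreg x_endog z1 z2 w1 w2 i.year, fe
test z1 z2      // joint F test of the excluded instruments in the first stage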
Tuesday, December 31, 2019
Panel data - Creating a date variable from year and weeknumber as string
Stata listers
I am writing with a query relating to panel data for historical prices. I am trying to create a date variable from an Excel file which contains the year and week number as a string. Is there a way to convert the available information - year and week numbers (as string) - into Stata- or Excel-recognisable dates? Thanks very much.
| year | weeknum | Price 1 | Price 2 | date |
| 1890 | 2nd week in Jan | 76 | 90 | |
| 1890 | 3rd week in Jan | 76 | 90 | |
| 1890 | 4th week in Jan | 76 | 90 | |
| 1890 | 2nd week in Feb | 76 | 90 | |
| 1890 | 3rd week in Feb | 76 | 90 | |
| 1890 | 4th week in Feb | 76 | 90 | |
| 1890 | 2nd week in March | 76 | 90 | |
| 1890 | 3rd week in March | 80 | 94 | |
| 1890 | 4th week in March | 80 | 94 | |
| 1890 | 5th week in March | 80 | 94 | |
I am not able to attach this data in .dta format for some reason. I am using Stata MP 16.
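One possible sketch, assuming weeknum always follows the pattern "<n>th week in <month>" and that year is numeric (drop the string() wrapper if it is already a string); each week is approximated as day 1, 8, 15, 22 or 29 of its month:
Code:
gen wk   = real(substr(weeknum, 1, 1))                       // ordinal of the week, 1-5
gen mon  = word(weeknum, -1)                                 // month name, e.g. "Jan"
gen date = daily("1 " + mon + " " + string(year), "DMY") + 7*(wk - 1)
format date %td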
Margins: trouble with continuous interactions under simultaneous fixed effects
Sometimes I wish to control for variable X via fixed effects (say, year fixed effects) but also allow the marginal effect of a second variable to vary continuously with variable X (say, the effect of adopting a new technology might vary linearly or non-linearly with year). In these situations, I am NOT interested in allowing the marginal effect of that second variable to change with *every* value of variable X --- this would waste power, as I believe that the marginal effect of the second variable varies smoothly with variable X.
Stata can run this regression: reg Y X1 i.X1#c.X2 i.X2. However, while a coefficient is calculated for both X1 and i.X1#c.X2, margins is for some reason unable to obtain the marginal effects of X1 over X2.
I have had this problem several times, and right now I'm having this problem in a situation where I have other fixed effect accounted for, and so am using xtreg. However, the problem is generalizable to a situation where one is using reg only. I have replicated the problem in the auto dataset, below, and would be incredibly grateful for thoughts on what's going on.
Code:
sysuse auto, clear
xtset foreign
gen lprice=log(price)
gen HIGHmpg=mpg>25
** Reg 1: This works fine
xtreg lprice i.HIGHmpg i.turn
margins, dydx(i.HIGHmpg)
** Works fine w/ no interaction
** Reg 2: This does not work
xtreg lprice i.HIGHmpg i.HIGHmpg#c.turn i.turn
margins, dydx(i.HIGHmpg) at(c.turn=(32 36 40 44 48 52))
/* Command does not run. Error returned:
c.turn ambiguous abbreviation
r(111); */
** Reg 3: This "trick" also doesn't work
gen test=turn
xtreg lprice i.HIGHmpg i.HIGHmpg#c.test i.turn
margins, dydx(i.HIGHmpg) at(c.test=(32 36 40 44 48 52))
/* Command runs, but interactions deemed "not estimable" */
** Reg 2 w/ reg instead of xtreg
reg lprice i.HIGHmpg i.HIGHmpg#c.turn i.turn
margins, dydx(i.HIGHmpg) at(c.turn=(32 36 40 44 48 52))
** Same error given
** Reg 3 w/ reg instead of xtreg
reg lprice i.HIGHmpg i.HIGHmpg#c.test i.turn
margins, dydx(i.HIGHmpg) at(c.test=(32 36 40 44 48 52))
** Interactions still deemed "not estimable"
** Note: it is possible to allow an interaction between i.HIGHmpg and EVERY
** value of test, as below, but this is not what I want to do, as it wastes power.
** In my own examples, it is helpful to do this because I can see a linear or
** non-linear pattern in the marginal effects, but then I ultimately want to run
** the model allowing only a continuous change in the marginal effects.
xtreg lprice i.HIGHmpg i.HIGHmpg#i.turn i.turn
margins, dydx(i.HIGHmpg) over(i.turn)
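For reference, a minimal sketch of the specification that margins does handle in this auto example, using factor##continuous notation and no separate i.turn fixed effects; whether the mixed i.turn plus c.turn specification can be rescued is exactly the question here:
Code:
sysuse auto, clear
gen lprice  = log(price)
gen HIGHmpg = mpg > 25
regress lprice i.HIGHmpg##c.turn
margins, dydx(HIGHmpg) at(turn = (32(4)52))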
Please help: Importing and merging multiple sheets from an excel file while renaming variables using loops
Hello,
I am quite new using loops and really want to understand it better.
I have an excel file with 11 sheets. I only need a couple of sheets from it and need to rerun it every day with new data but the same variables, so I am trying to write an efficient script to complete what I need to do. For each of the sheets there are patient identifiers, but they come in with different column names, which is making it difficult to merge when importing within one loop.
The column that I want to merge on is column_A, but in each sheet it is called "column_AA", "column_AB", and "columnAC", respectively, for Sheet A, Sheet B, and Sheet C.
What I have so far to import the data is in the first code block below.
How might I be able to add in a command to rename the column names to a similar one so then I can merge them all?
I was thinking of adding a loop inside it, or a second loop after, but then the correct column_AB wouldn't match up with the correct sheet.
This is what I was thinking (second code block below), but it doesn't really work.
Thanks for your help/advice/response!
-Ben
Code:
local sheets "Sheet_A Sheet_B Sheet_C"
foreach y in `sheets'{
import excel using "data_set.xls", sheet(`y') firstrow clear
save "`y'.dta",replace
}
Code:
local variable "column_AA column_AB columnAC"
foreach t in `variable'{
use "`y'.dta", clear
rename `t' column_A
}
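A hedged sketch of the pairing Ben seems to be after, using the sheet and column names from the post: walk the two lists in parallel so that each sheet's own ID column is renamed to the common key before saving:
Code:
local sheets "Sheet_A Sheet_B Sheet_C"
local idvars "column_AA column_AB columnAC"

local i = 1
foreach y of local sheets {
    local id : word `i' of `idvars'
    import excel using "data_set.xls", sheet("`y'") firstrow clear
    rename `id' column_A            // common merge key across sheets
    save "`y'.dta", replace
    local ++i
}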
Panel data regression
Hello everyone,
I'm writing my thesis and I'm struggling with the processing of my data. First of all, my research question is: "What is the effect of environmental controversies on the profitability of Chinese and European firms?" and I want to check for moderation of corporate environmental performance, press freedom of the country of origin of the firm and ownership structure (concentration and state ownership). My dependent variables are ROA, ROE and Tobin's Q. My independent variables are environmental controversies (EC), corporate environmental performance (CEP), press freedom (PF), ownership concentration (Independence), and state ownership (GUO). My control variables are firm size, leverage and industry.
I have collected my data from Eikon and Orbis. I opted for a balanced dataset (so there are no more missing values), and this dataset consists of 314 firms (64 Chinese, 250 European)
My variables are:
- id (1 until 314)
- Year (2013-2018)
- Country (Europe or China)
- Industry (10 categories)
- Independence (A+ until D)
- GUO (e.g. Public authority)
- EC (dummy --> 0: no controversy in that year; 1: controversy in that year)
- CEP (score out of 100)
- PF (score out of 100)
- ROA
- ROE
- Tobin's Q
- Firm size
- Leverage
I made dummy variables for Country (DummyChina and DummyEurope), BvDIndependenceIndicator (DummyLowConcentration, DummyMediumLowConcentration, DummyMediumHighConcentration and DummyHighConcentration), GUO Type (DummyStateOwnership), Industry (DummyIndustry1, DummyIndustry2, DummyIndustry3, DummyIndustry4, DummyIndustry5, DummyIndustry6, DummyIndustry7, DummyIndustry8, DummyIndustry9 and DummyIndustry10).
Also, the variables EC, CEP and PF are lagged, as I want to measure the effect of the occurrence of an environmental controversy on the profitability of the next year.
When I first started my regression, I used SPSS. However, I read that Stata is a much better alternative for panel data. I was able to upload my data in Stata and did some tests to check whether I need a pooled OLS model, a fixed effects model or a random effects model. The results pointed out that I need to use REM. I was able to regress my first model, using only ROA as my dependent variable and EC, firm size, leverage, DummyChina, and DummyIndustry2 through DummyIndustry10 as regressors.
My questions:
- If I want to compare Chinese and European firms, is this the right standard model? Or do I have to start with just ROA, EC and the control variables and then make interaction terms for Country and Industry?
- If I later make interaction terms for Country, Industry, CEP, PF, GUO and Independence, can I add all these in just one regression? Or do I have to add them separately and make multiple regressions?
Quite frankly, I'm a bit lost. I have never used panel data or Stata, and I have no idea what the right order is to answer my research question and check for moderation. My main struggle is the interaction terms.
If anyone has suggestions or could tell me the steps I have to follow, please let me know. Thank you in advance!!
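As a starting point on the interaction question, a hedged sketch with hypothetical lower-case variable names (roa, ec, firmsize, leverage, plus categorical country and industry variables): factor-variable notation builds the interaction, and margins recovers the effect of a controversy by country:
Code:
xtset id year
xtreg roa i.ec##i.country c.firmsize c.leverage i.industry, re vce(cluster id)
margins country, dydx(ec)    // effect of an environmental controversy for each country group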
Time varying covariate in Cox Regression model
Hi all. After a thorough search online I can't seem to find a solution to my problem, which is why I'm now asking the experts.
I'm doing a cox regression in 1175 subjects where I want to assess the effect of the dichotomous baseline variable X on the outcome Z. All subjects have variable X which is present since birth. In addition I have another dichotomous variable Y (which is more like an intervention effect) which is not present at baseline for any of the subjects, however some of the subjects get affected by (Y) event during the follow up at different dates, and this variable is known to be connected with outcome Z. I'm trying to know if variable Y increases the chance of occurrence of outcome Z in which (Z=1) among those who have the effect variable Y during their followup and those who don't.
So the "known" chain of events is X --> Y ---> Z . And I want to test X --> Z. But I still want to include the effect of Y in my model as some of the subjects will follow X-->Y-->Z.
So I thought: how can I include Y as a time-varying covariate, so as not to underestimate the effect of Y, but still assess whether there is a direct correlation between X and Z?
Hope the question isn't too cryptic - I'll be happy to elaborate.
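A minimal sketch of episode splitting, assuming the data are already stset with one record per subject and a variable y_date holding the date Y occurred (missing if it never did); the coding that stsplit assigns should be checked against help stsplit:
Code:
stsplit spl, after(time = y_date) at(0)
gen byte postY = (spl == 0)    // assumption: the post-Y span is coded 0 here; verify with -tab spl-
drop spl
stcox i.X i.postY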
Timevar for survival analysis
Dear All,
This might be a silly question, but it is driving me crazy.
I am managing data which were not recorded for survival analysis and I am trying to put them in a proper format.
For the purpose of my question, here are my data (I have more variables, but they behave like Var1 and Var2, i.e. varying over time):
| ID | Visit | Date | DOsp1 | DOsp2 | Sex | Var1 | Var2 |
| 1 | 0 | 1mar2002 | | | M | 0 | . |
| 1 | 1 | 3jun2005 | | | M | . | . |
| 1 | 2 | 4feb2007 | | | M | . | . |
| 2 | 0 | 9feb2002 | 21dec2000 | 22jun2001 | F | 1 | 18.9 |
| 2 | 1 | 7sep2002 | | | F | 2 | 9999 |
| 3 | 0 | 25mar2003 | | | M | 0 | 20 |
| 3 | 1 | 13oct2004 | | | M | 2 | 9999 |
| 4 | 0 | 4oct2002 | | | F | 1 | 23.5 |
| 4 | 1 | 03may2004 | 4jan2003 | 24jun2003 | F | . | . |
| 4 | 2 | 13jan2006 | | | F | . | . |
| 4 | 3 | 25aug2007 | | | F | 2 | 9999 |
ID is my person identifier; each person can be visited several times (Visit, where 0 is the baseline) on different dates (Date is when the visit took place). During each visit, a person could report up to 9 dates (I do have DOsp1-DOsp9, but for the sake of this question I just show the first two) indicating if and when they were hospitalized between the visits.
I will use snapspan in order to convert my data to time-span data, but before I guess I need to slightly change my time variable (and the dataset overall).
I want to have a timevar like Time (see table below) in order to run snapspan ID Time.
| ID | Visit | Date | DOsp1 | DOsp2 | Sex | Var1 | Var2 | Time |
| 1 | 0 | 1mar2002 | | | M | 0 | . | 1mar2002 |
| 1 | 1 | 3jun2005 | | | M | . | . | 3jun2005 |
| 1 | 2 | 4feb2007 | | | M | . | . | 4feb2007 |
| 2 | . | . | . | . | . | . | . | 21dec2000 |
| 2 | . | . | . | . | . | . | . | 22jun2001 |
| 2 | 0 | 9feb2002 | 21dec2000 | 22jun2001 | F | 1 | 18.9 | 9feb2002 |
| 2 | 1 | 7sep2002 | | | F | 2 | 9999 | 7sep2002 |
| 3 | 0 | 25mar2003 | | | M | 0 | 20 | 25mar2003 |
| 3 | 1 | 13oct2004 | | | M | 2 | 9999 | 13oct2004 |
| 4 | 0 | 4oct2002 | | | F | 1 | 23.5 | 4oct2002 |
| 4 | . | . | . | . | . | . | . | 4jan2003 |
| 4 | . | . | . | . | . | . | . | 24jun2003 |
| 4 | 1 | 03may2004 | 4jan2003 | 24jun2003 | F | . | . | 03may2004 |
| 4 | 2 | 13jan2006 | | | F | . | . | 13jan2006 |
| 4 | 3 | 25aug2007 | | | F | 2 | 9999 | 25aug2007 |
This is the final dataset I want to obtain:
| ID | Datestarts | Dateends | Sex | Var1 | Var2 | Event | Event_recode |
| 1 | . | 1mar2002 | M | 0 | . | Visit 0 | 0 |
| 1 | 1mar2002 | 3jun2005 | M | . | . | Visit 1 | 0 |
| 1 | 3jun2005 | 4feb2007 | M | . | . | Visit 2 | 0 |
| 2 | . | 9feb2002 | F | 1 | 18.9 | Visit 0 | 0 |
| 2 | 9feb2002 | 7sep2002 | F | 2 | 9999 | Visit 1 | 2 |
| 3 | . | 25mar2003 | M | 0 | 20 | Visit 0 | 0 |
| 3 | 25mar2003 | 13oct2004 | M | 2 | 9999 | Visit 1 | 2 |
| 4 | . | 4oct2002 | F | 1 | 23.5 | Visit 0 | 0 |
| 4 | 4oct2002 | 4jan2003 | F | . | . | Osp 1 | 1 |
| 4 | 4jan2003 | 24jun2003 | F | . | . | Osp 2 | 1 |
| 4 | 24jun2003 | 03may2004 | F | . | . | Visit 1 | 0 |
| 4 | 03may2004 | 13jan2006 | F | . | . | Visit 2 | 0 |
| 4 | 13jan2006 | 25aug2007 | F | 2 | 9999 | Visit 3 | 2 |
As you might notice, any date recorded in DOsp1-DOsp9 that happened before Visit 0 is not taken into account. Event_recode will then be built as the failure variable for my stset (0 if the row refers to a visit, 1 if it refers to a hospitalization, 2 if the person dies, i.e. if Var1==2, and 3 if censored).
All of that, in order to run the following code:
stset Dateends, id(ID) time0(Datestarts) origin(time Datestarts) failure(Event_recode==1 2)
Thank you to anyone who can help me, feel free to ask me clarifications.
Best
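A hedged sketch of one way to build the Time rows before snapspan, assuming Date and DOsp1-DOsp9 are already numeric daily dates: stack the hospitalization dates as extra observations, drop those recorded before the baseline visit, and then continue with the planned snapspan ID Time step:
Code:
preserve
keep ID Visit DOsp*
reshape long DOsp, i(ID Visit) j(ospnum)
drop if missing(DOsp)
keep ID DOsp
rename DOsp Time
gen byte is_osp = 1
tempfile osp
save `osp'
restore

gen Time = Date
append using `osp'
bysort ID: egen baseline = min(cond(Visit == 0, Date, .))
drop if is_osp == 1 & Time < baseline    // hospitalizations before Visit 0 are ignored
sort ID Time
* the data should now be ready for the planned -snapspan ID Time ...- step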
how to estimate individual betas of coefficients in a province-sector-year panel data (with 2 sectional identifiers)
Hello everyone:
I'm trying to estimate production functions for a panel of manufacturing data with 2 identifiers (province, sector), so that each sector has observations for the different provinces. The first thing I do is egen a new ID by group(province sector), but this apparently ignores the unobservable common trend within each province or sector.
I was considering a fixed effect (LSDV) or a semi-parametric (e.g. Levinsohn and Petrin). The problem is:
(1) for the former approach, how to correctly set the factor variables;
(2) for the latter, how to correctly get the betas of K and L for every province-sector cell.
The attachment dataex.txt is a part of my data file. The models I thought of were:
(1) reg lnYL_go lnKL i.prov_sec_id i.prov_sec_id#c.lnKL i.actual_year, vce(cluster prov_sec_id) (lnYL and lnKL not included; they are simply ln(Y/L), etc., assuming CRS.)
(2) prodest lnY_va, free(lnL) state(lnK) proxy(lnInt) met(lp) va acf id(prov_sec_id) t(actual_year)
I'm not trying to be a free rider, it's just that related references are rare. Any opinion or suggestion would be appreciated, and happy new year!
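On question (1), a hedged sketch of one way to read the concern, with hypothetical variable names (province, sector, actual_year): keep the combined cell fixed effects and cell-specific lnKL slopes, and add province-by-year and sector-by-year terms so common trends within each province and each sector are not ignored:
Code:
egen prov_sec_id = group(province sector)
reg lnYL_go c.lnKL i.prov_sec_id#c.lnKL i.prov_sec_id ///
    i.province#i.actual_year i.sector#i.actual_year, vce(cluster prov_sec_id)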
Discriminant analysis using Stata
Hello everyone,
I need to run discriminant analysis in Stata. How can I do this and obtain both the standardized and unstandardized discriminant function coefficients together with the structure matrix?
I'm supposed to produce output like the attached picture.
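A hedged sketch using Stata's built-in discriminant-analysis commands, with hypothetical predictor and group variable names; the exact postestimation option names should be checked against help candisc postestimation:
Code:
candisc x1 x2 x3, group(groupvar)
estat loadings, standardized unstandardized   // discriminant function coefficients
estat structure                               // structure matrix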
about synth instruction question
Hello, great master. I have a question.
My Stata version is 14.
When I run the commands below, Stata always displays the error message shown. What can I do?
////stata instruction///
xtset state year
replace age15to24 = 100*age15to24
synth cigsale cigsale(1988) cigsale(1980) cigsale(1975) lnincome retprice ///
age15to24 beer(1984(1)1988), trunit(3) trperiod(1989)
file synthopt.plugin not found <-------error message
(error occurred while loading synth.ado)
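The missing synthopt.plugin is typically an installation issue; a hedged fix is to reinstall synth together with its ancillary files and then re-run the command on the data below:
Code:
ssc install synth, all replace
* then re-run:
xtset state year
synth cigsale cigsale(1988) cigsale(1980) cigsale(1975) lnincome retprice ///
    age15to24 beer(1984(1)1988), trunit(3) trperiod(1989)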
/////data////
clear
input long state float(year cigsale lnincome beer age15to24 retprice cigsale_cal cigsale_rest)
1 1970 89.8 . . 1788.618 39.6 . 120.08421
1 1971 95.4 . . 1799.2784 42.7 . 123.86316
1 1972 101.1 9.498476 . 1809.939 42.3 . 129.17896
1 1973 102.9 9.550107 . 1820.5994 42.1 . 131.53947
1 1974 108.2 9.537163 . 1831.26 43.1 . 134.66843
1 1975 111.7 9.540031 . 1841.9207 46.6 . 136.93158
1 1976 116.2 9.591908 . 1852.581 50.4 . 141.26053
1 1977 117.1 9.617496 . 1863.242 50.1 . 141.08948
1 1978 123 9.654072 . 1873.9023 55.1 . 140.47368
1 1979 121.4 9.64918 . 1884.563 56.8 . 138.08684
1 1980 123.2 9.612194 . 1895.2234 60.6 . 138.08948
1 1981 119.6 9.609594 . 1858.4222 68.8 . 137.98685
1 1982 119.1 9.59758 . 1821.621 73.1 . 136.29474
1 1983 116.3 9.626769 . 1784.8202 84.4 . 131.25
1 1984 113 9.671621 18 1748.019 90.8 . 124.90263
1 1985 114.5 9.703193 18.7 1711.218 99 . 123.1158
1 1986 116.3 9.74595 19.3 1674.4167 103 . 120.59473
1 1987 114 9.762092 19.4 1637.6157 110 . 117.58685
1 1988 112.1 9.78177 19.4 1600.8146 114.4 . 113.82368
1 1989 105.6 9.802527 19.4 1564.0134 122.3 . 109.66315
1 1990 108.6 9.81429 20.1 1527.2124 139.1 . 105.66579
1 1991 107.9 9.81926 20.1 . 144.4 . 104.3421
1 1992 109.1 9.845286 20.4 . 172.2 . 103.39474
1 1993 108.5 9.85216 20.3 . 176.2 . 102.69473
1 1994 107.1 9.879334 21 . 154.6 . 102.11842
1 1995 102.6 9.924404 20.6 . 155.1 . 103.1579
1 1996 101.4 9.940027 21 . 158.3 . 101.18421
1 1997 104.9 9.93727 20.8 . 167.4 . 101.78947
1 1998 106.2 . . . 180.5 . 100.9579
1 1999 100.7 . . . 195.6 . 97.59473
1 2000 96.2 . . . 270.7 . 92.13421
2 1970 100.3 . . 1690.0676 36.7 . 120.08421
2 1971 104.1 . . 1699.5386 38.8 . 123.86316
2 1972 103.9 9.464514 . 1709.0095 44.1 . 129.17896
2 1973 108 9.55683 . 1718.4805 45.1 . 131.53947
2 1974 109.7 9.542286 . 1727.9513 45.5 . 134.66843
2 1975 114.8 9.514094 . 1737.4224 48.6 . 136.93158
2 1976 119.1 9.558153 . 1746.8933 50.9 . 141.26053
2 1977 122.6 9.590923 . 1756.364 52.6 . 141.08948
2 1978 127.3 9.657238 . 1765.835 56.5 . 140.47368
2 1979 126.5 9.633533 . 1775.306 58.4 . 138.08684
2 1980 131.8 9.573803 . 1784.777 61.5 . 138.08948
2 1981 128.7 9.593041 . 1750.1112 64.7 . 137.98685
2 1982 127.4 9.5737 . 1715.4453 72.1 . 136.29474
2 1983 128 9.593053 . 1680.7794 82 . 131.25
2 1984 123.1 9.65044 17.9 1646.1138 93.6 . 124.90263
2 1985 125.8 9.675527 18.1 1611.448 98.5 . 123.1158
2 1986 126 9.705939 18.7 1576.782 103.6 . 120.59473
2 1987 122.3 9.705574 19 1542.1163 113 . 117.58685
2 1988 121.5 9.721532 18.9 1507.4504 119.9 . 113.82368
2 1989 118.3 9.73737 19 1472.7847 127.7 . 109.66315
2 1990 113.1 9.736311 19.9 1438.119 141.2 . 105.66579
2 1991 116.8 9.743068 19.9 . 146.5 . 104.3421
2 1992 126 9.788629 20 . 177.3 . 103.39474
2 1993 113.8 9.785142 19.7 . 179.9 . 102.69473
2 1994 108.8 9.813631 19.7 . 168.1 . 102.11842
2 1995 113 9.86446 19.5 . 167.3 . 103.1579
2 1996 110.7 9.885234 20.1 . 167.1 . 101.18421
2 1997 108.7 9.883107 19.8 . 181.3 . 101.78947
2 1998 109.5 . . . 187.3 . 100.9579
2 1999 104.8 . . . 206.9 . 97.59473
2 2000 99.4 . . . 279.3 . 92.13421
3 1970 123 . . 1781.5833 38.8 123 .
3 1971 121 . . 1792.9636 39.7 121 .
3 1972 123.5 9.930814 . 1804.344 39.9 123.5 .
3 1973 124.4 9.955092 . 1815.724 39.9 124.4 .
3 1974 126.7 9.947999 . 1827.1044 41.9 126.7 .
3 1975 127.1 9.937167 . 1838.4847 45 127.1 .
3 1976 128 9.976858 . 1849.865 48.3 128 .
3 1977 126.4 10.0027 . 1861.2454 49 126.4 .
3 1978 126.1 10.045565 . 1872.6255 58.7 126.1 .
3 1979 121.9 10.054688 . 1884.0057 60.1 121.9 .
3 1980 120.2 10.03784 . 1895.386 62.1 120.2 .
3 1981 118.6 10.028626 . 1855.3705 66.4 118.6 .
3 1982 115.4 10.01253 . 1815.355 72.8 115.4 .
3 1983 110.8 10.031737 . 1775.3394 84.9 110.8 .
3 1984 104.8 10.07536 25 1735.324 94.9 104.8 .
3 1985 102.8 10.099703 24 1695.3083 98 102.8 .
3 1986 99.7 10.127267 24.7 1655.2927 104.4 99.7 .
3 1987 97.5 10.1343 24.1 1615.277 103.9 97.5 .
3 1988 90.1 10.141663 23.6 1575.2615 117.4 90.1 .
3 1989 82.4 10.142313 23.7 1535.246 126.4 82.4 .
3 1990 77.8 10.141623 23.8 1495.2303 163.8 77.8 .
3 1991 68.7 10.110714 22.3 . 186.8 68.7 .
3 1992 67.5 10.11494 21.3 . 201.9 67.5 .
3 1993 63.4 10.098497 20.8 . 205.1 63.4 .
3 1994 58.6 10.099508 20.1 . 190.3 58.6 .
3 1995 56.4 10.155916 19.7 . 195.1 56.4 .
3 1996 54.5 10.178637 19.1 . 197.9 54.5 .
3 1997 53.8 10.17519 19.5 . 200.3 53.8 .
3 1998 52.3 . . . 207.8 52.3 .
3 1999 47.2 . . . 224.9 47.2 .
3 2000 41.6 . . . 351.2 41.6 .
4 1970 124.8 . . 1909.5022 29.4 . 120.08421
4 1971 125.5 . . 1916.476 31.1 . 123.86316
4 1972 134.3 9.805548 . 1923.4497 31.2 . 129.17896
4 1973 137.9 9.848413 . 1930.4232 32.7 . 131.53947
4 1974 132.8 9.840451 . 1937.397 38.1 . 134.66843
4 1975 131 9.828461 . 1944.3706 41.7 . 136.93158
4 1976 134.2 9.858913 . 1951.344 44.8 . 141.26053
end
Split string variable
Dear Experts,
I want to split a string variable. Please advise.
The issue is: I have the responses "a", "adc", "acfj", "cde", "adfghj". I want to split these responses into single letters, e.g. "a" "b" "c" "d" "e". Can it be done? Looking forward to your advice.
Thanking you
Yours faithfully
Cheda Jamtsho
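A minimal sketch, assuming the responses sit in a string variable named resp: create one single-letter variable per character position:
Code:
gen len = strlen(resp)
summarize len
forvalues i = 1/`r(max)' {
    gen str1 resp_`i' = substr(resp, `i', 1)
}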
Removing NA Across var
I have data in string as shown below
data1 data2
NA NA
NA NA
NA NA
NA 8415739
NA 10024002
N 12057882
N 10759322
N 11305650
N 10937087
N 11463371
N 11287917
N 12720750
N 14849447
N 15542380
N 17368642
N 20738561
I want to replace the NA observations with missing (.).
I tried this command:
replace data1=. if data1==NA
and Stata returns the error "NA not found".
Can anybody help me on this please
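Two hedged options: compare against the quoted text "NA" (the variables are strings, which is why the bare NA was not found), or convert everything non-numeric to missing in one step with destring:
Code:
replace data1 = "" if data1 == "NA"
* or convert both variables to numeric, turning "NA", "N", etc. into missing:
destring data1 data2, replace force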
Monday, December 30, 2019
Generate sum of variables with sequential variables names
Hi everyone,
I have a data with these sequential variable names.
| p1_1_1_1 | p1_1_1_2 | p1_1_1_3 | p1_1_1_4 | p1_1_2_1 | p1_1_2_2 | p1_1_2_3 | p1_1_2_4 | p1_1_3_1 | p1_1_3_2 | p1_1_3_3 | p1_1_3_4 |
| 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
| 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
| 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
I would like to sum the sequential variable names as follows:
generate p1_1_1 = p1_1_1_1 + p1_1_1_2 + p1_1_1_3 + p1_1_1_4
generate p1_1_2 = p1_1_2_1 + p1_1_2_2 + p1_1_2_3 + p1_1_2_4
...
Can somebody help me do the same using loops?
Thanks a lot...
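A hedged sketch, assuming the names follow the pattern p1_1_<j>_<k> with j = 1/3 and k = 1/4; note that rowtotal() treats missing values as zero, unlike the + expressions above, which propagate missing:
Code:
forvalues j = 1/3 {
    egen p1_1_`j' = rowtotal(p1_1_`j'_1 p1_1_`j'_2 p1_1_`j'_3 p1_1_`j'_4)
}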
Generate var using sequential variables names
Hi everyone,
p1_1 p1_2 p1_3 p2_1 p2_2 p2_3 p3_1 p3_2 p3_3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
I would like to generate a sum of the values with sequential variable names, using loops.
generate p1= p1_1 + p1_2 + p1_3
generate p2= p2_1 + p2_2 + p2_3
generate p3= p3_1 + p3_2 + p3_3
Thank you.
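The same idea as in the previous question, looping over the stubs; again, rowtotal() treats missing as zero:
Code:
foreach s in p1 p2 p3 {
    egen `s' = rowtotal(`s'_1 `s'_2 `s'_3)
}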
Identification of Treatment and Control Group
Respected members,
I am trying to employ DID as a means of analysis. In my dataset of 287 firms between 2001 and 2016, there was a policy reform in 2010 of including at least 10 percent of female directors. After reading some articles on DID, I have developed the following alternatives to identify the treatment and control groups.
Option 1
Treatment group: Firms that did not have 10% of female directors before 2010.
Control group: Firms that had female directors of 10% or above before 2010.
Option 2
Treatment group: Firms that did not have 10% of female directors before 2010 and had at least 10% of female directors from 2010 onwards.
Control group: Firms that did not have 10% of female directors before 2010 and did not have at least 10% of female directors even after 2010.
The firms that already had at least 10% female directors before 2010 are excluded from the analysis.
Could you please advise me in this regard as to which of the above options (1 or 2) is appropriate?
Thanks in anticipation.
Multilevel Panel Data with CPS Data
Good evening,
Using the below Census Population Survey variables, I need to figure out the change in percent Latino for each metarea from year to year, as well as the actual percent for each given year. I plan to use actual percent for that year and the change in percent as IVs for my model.
-year (2010-2019 in one year increments; sample per year)
-metarea (about 370 metropolitan areas that households are assigned to)
-household
-person in household
-Latino (binary variable at the person level)
I attached a preview of my dataset.
Thank you!
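A hedged sketch, assuming person-level variables year, metarea (numeric) and latino (0/1), and ignoring survey weights: compute the percent Latino per metarea-year, then the year-to-year change on a collapsed copy and merge it back:
Code:
bysort metarea year: egen pct_latino = mean(100 * latino)

preserve
collapse (mean) pct_latino, by(metarea year)
xtset metarea year
gen chg_pct_latino = pct_latino - L.pct_latino
tempfile m
save `m'
restore
merge m:1 metarea year using `m', keepusing(chg_pct_latino) nogenerate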
Problem with merging multiple csv files using merge 1:1
Hi,
I am a beginner in Stata (using Stata 16) and after going through many of the posts regarding merging multiple files from a folder, I tried to write the following code but I received an error. I will describe the data, folder structure, code and error messages below:
Data: I have quarterly bank data from FDIC where each csv file corresponds to one quarter and within each file, different banks are identified using a variable called 'cert'. For every file, there is also a column named 'repdte' which lists the quarter for the particular file (so for eg, I will have a file named All_Reports_20170930_U.S. Government Obligations.csv which will have many columns giving data regarding US Govt Obligations and there will also be two additional columns cert and repdte listing the bank ID and 20170930 respectively for the entire file).
Sample csv files may be downloaded from: https://www7.fdic.gov/sdi/download_l...st_outside.asp For my testing, I am using the 2018, 2017 files for quarters 1231 and 0930 for the files "Unused Commitments Securitization" and "U.S. Government Obligations".
What I want to do: I want to merge all the bank data across banks and quarters (panel data), and to do this I figured I should use the command: merge 1:1 cert repdte using filename
Code:
clear all
pwd
cd "C:\Users\HP\Dropbox\Data\Test2"
tempfile mbuild
clear
save `mbuild', emptyok
foreach year in 2018 2017{
foreach dm in 1231 0930 {
foreach name in "Unused Commitments Securitization" "U.S. Government Obligations"{
import delimited "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`y ear' `dm'_`name'", clear
gen source = "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`y ear' `dm'_`name'"
merge 1:1 cert repdte using `mbuild'
save `mbuild', replace
}
}
}
Error:
.
. foreach year in 2018 2017{
2. foreach dm in 1231 0930 {
3. foreach name in "Unused Commitments Securitization" "U.S. Governmen
> t Obligations"{
4. import delimited "C:\Users\HP\Dropbox\Data\Test2\All_Report
> s_`year'`dm'_`name'", clear
5. gen source = "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`y
> ear'`dm'_`name'"
6. merge 1:1 cert repdte using `mbuild'
7. save `mbuild', replace
8.
. }
9. }
10. }
(52 vars, 5,415 obs)
no variables defined
r(111);
Could someone please help me understand what i am doing wrong and how I can achieve what I am trying to do? Additionally, I also want to be able to retrieve the merged file to do further analysis on Stata and also export it to a folder on my computer - how should I do that?
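A hedged sketch of one way to restructure the loop: within a quarter, merge the different report files on cert and repdte; across quarters, append. The file-name pattern follows the post; the .csv extension and the final name merged_fdic are assumptions:
Code:
clear
tempfile allq
save `allq', emptyok

foreach year in 2018 2017 {
    foreach dm in 1231 0930 {
        local first = 1
        foreach name in "Unused Commitments Securitization" "U.S. Government Obligations" {
            import delimited "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`year'`dm'_`name'.csv", clear
            if `first' {
                tempfile q
                save `q'
                local first = 0
            }
            else {
                merge 1:1 cert repdte using `q', nogenerate
                save `q', replace
            }
        }
        append using `allq'
        save `allq', replace
    }
}
use `allq', clear
save "merged_fdic.dta", replace
export delimited using "merged_fdic.csv", replace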
Assistance on Statistical analysis
Can anyone help me out? I am investigating the coping strategies used among women, using the Brief COPE scale with a 4-point Likert scale. I want to see whether there is any association between the coping strategies and socio-demographic characteristics and medical variables. Which test is appropriate, and what command should I use? I have attached a dummy table for clarity.
Time stamps in forum software
Is there any chance the forum software could be changed to specify time zone in the time stamps? Right now (for me at least) it displays Central Time, which always confuses me a bit, since I'm on the east coast (and I'm assuming is even more confusing for people in more different time zones). So right now, it's 15:34 where I am, but the time stamp says 14:34. Even better would be the option to change what time zone the time stamps are displayed in (if that option doesn't exist already - apologies if it does!).
Using anymatch in a forvalue loop to detect if each value in v1 matches ANY value in v2
I'm struggling to come up with a solution for finding whether each observation in variable 1 matches ANY of the specified observations in v2. I'm trying to narrow the data to focus on passengers that have arrived on time at least once in the data. That way I can look at those passengers' data, even for points when they weren't on time.
I'm trying to pass a numlist to anymatch of the names of the ID's of the passengers that have arrived on time at least one time but I'm getting an error.
"values() invalid -- invalid numlist"
This is my code:
g on_time= passengers if timely==1; // limiting to timely arrivals.
levelsof on_time;
g on_time_levels= r(levels); //unique numlist of passengers with timely arrivals (unsure of this)
g on_time_ever=.;
forvalues i =1/6939 {;
egen tempvariable = anymatch(passengers) if _n==`i',values(on_time_levels);
replace on_time_ever=tempvariable if _n==`i';
drop tempvariable;
};
I am unsure if the levels var I generated is really a numlist. How else can I get a numlist from this variable so I can pass it to anymatch? Or am I just going about this completely wrong?
Thanks!
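A hedged sketch that avoids levelsof and anymatch entirely, assuming variables named passengers and timely: flag every record belonging to a passenger who was on time at least once, then keep those passengers' full histories:
Code:
bysort passengers: egen on_time_ever = max(timely == 1)
keep if on_time_ever == 1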
standard error of 8280 in multinomial logit
Hi,
I am analysing my data using multinomial logit. Firstly, sorry that I cannot post my data and full results here.
Let's call the dependent variable "P3". I have several independent variables: "treatment", "P1", "age", "iq", "female", "mistakes", "major". The one I'm interested in is "treatment", and I think that "P1" has to be included in the regression as a control. "P3" and "P1" measure the same thing before and after the treatment, and they have 7 categories. The sample size is small, 157, with two missing values in female, so N=155.
I am running into a problem of getting very large standard errors for some coefficients, such as 8280 for one category of P1. Almost every such large standard error happens with one of the categories of P1.
I looked at the cross-table of P1 and P3, and found there are some empty cells. The partial table looks like this.
I am wondering if these empty cells cause the enormous standard errors. I know that the sample size is very small and that the number of independent variables is relatively large for the sample size; should I switch to -firthlogit-?
Thanks for any help!!
Code:
P1 | P3
| -2 -1 0 1 2 3 4 | Total
-----------+-----------------------------------------------------------------------------+----------
3 | 0 0 0 0 1 2 2 | 5
4 | 0 0 0 0 0 3 13 | 16
-----------+-----------------------------------------------------------------------------+----------
Estimating adjusted means and 95% CI using regression stata
Dear all,
I am now analyzing repeated-measures longitudinal data,
using a linear regression model with Y, x, covariates (age, sex, education, income), and i.obesity.
I would like to get the adjusted means (with 95% confidence intervals) of Y at the different levels of obesity.
Do I use the margins command?
What is the correct code to get the above results?
I am grateful for your help.
Jianbo
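A minimal sketch, assuming lower-case variable names y, x, age, sex, education, income, obesity and an id variable for the repeated measures (clustered standard errors as a simple allowance for that structure); margins after the regression gives the adjusted means with 95% CIs:
Code:
regress y c.x c.age i.sex i.education c.income i.obesity, vce(cluster id)
margins obesity       // adjusted mean of y, with 95% CI, at each level of obesity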
Creating a local list from a variable
I have the following string values for two variables. I would like to create a local list from Var1 and/or Var2.
clear
input str4 Var1 str3 Var2
"A f" "H O"
"B" "L"
"C" "Z"
"D" "N t"
"E g" "m o"
"F" "a p"
"G" "w"
"" "q"
"" "po"
end
When I use levelsof, this is what I get:
levelsof Var1, local(levels)
`"A f"' `"B"' `"C"' `"D"' `"E g"' `"F"' `"G"'
local List1 I desire is:
`" "A f" "B" "C" "D" "E g" "F" "G" "'
Similarly,
levelsof Var2, local(levels) is:
`"H O"' `"L"' `"N t"' `"Z"' `"a p"' `"m o"' `"po"' `"q"' `"w"'
local List2 I desire is:
`" "H O" "L" "N t" "Z" "a p" "m o" "po" "q" "w" "'
The goal is to eliminate manual entry to create List1 and List2. Instead, just grab them from Var1 or Var2 and create a local list.
Any help would be appreciated.
Thanks
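It may be enough to note that the local left behind by levelsof can usually be used directly as the desired list, for example to loop over the distinct values; a minimal sketch with the example data above:
Code:
levelsof Var1, local(List1)
foreach v of local List1 {
    display `"`v'"'
}
levelsof Var2, local(List2)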
Running sum of observations by group for last 3 years
Dear Statalists,
my dataset includes company IDs and the patents the companies invented per year. Each line is a patent invented in a certain year by a certain company, so there might be several lines per company/year. I am struggling with the running sum of the number of patents (= number of obs.) per company over the last 3 years.
In my example, for 1994 I would like to have a 2, as in that year two patents were invented and there are no previous years for that company. For 1995, I would like to have 8 (6 from 1995 and 2 from 1994). For 1996 it is 11, for 1997 it is 12 (1994 drops out), and so on...
Any ideas? Thanks in advance!
I am using Stata MP 15.0.
Code:
clear
input long permno float grant_year
10016 1994
10016 1994
10016 1995
10016 1995
10016 1995
10016 1995
10016 1995
10016 1995
10016 1996
10016 1996
10016 1996
10016 1997
10016 1997
10016 1997
10016 1998
10016 1998
10016 1998
10016 1998
10016 1998
10016 1998
10016 1998
10016 1998
end
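A hedged sketch: count patents per company-year on a contracted copy, build the 3-year rolling sum there (gaps are handled because missing lags count as zero), and merge the result back onto the patent-level rows:
Code:
preserve
contract permno grant_year, freq(n_pat)
xtset permno grant_year
gen pat_last3 = n_pat + cond(missing(L1.n_pat), 0, L1.n_pat) ///
              + cond(missing(L2.n_pat), 0, L2.n_pat)
tempfile counts
save `counts'
restore
merge m:1 permno grant_year using `counts', keepusing(pat_last3) nogenerate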
IV estimation with ordinal endogenous variable and ordinal instrumental variable
Hello, everyone. I am actually quite new to Statalist, and just beginning to learn Stata beyond what was taught in our syllabi. I need your help with IV estimation. I wish to estimate mortality risk with BMI as the main predictor (with survival analysis). To address the issue of reverse causation involving BMI and comorbid illness, I would like to use BMI at time t-1 as an instrumental variable for BMI, with both BMIs as ordinal variables (underweight, normal [baseline], overweight, obese 1, obese 2, obese 3). What should I use in Stata? Is it ivregress or ivpoisson? Also, can anyone help me with how to code this in Stata? So far, the Stata manual hasn't been really helpful (the instrument is treated as continuous), and I've searched as extensively as I can, but came up with nothing. Do I create a separate dummy variable for each BMI class (e.g., BMI at time t-1 underweight = 0 or 1, etc.)?
For completeness, my other exogenous variables are age, sex, and current smoking status, and my other instruments are diabetes at time t-1 (0 or 1), cardiovascular disease at time t-1 (0 or 1), and smoking status at time t-1 (smoker vs nonsmoker).
Thank you all so much for your time and understanding.
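For what it is worth, factor-variable notation lets ordinal regressors and instruments enter as sets of indicators without creating dummies by hand. A hedged sketch that sets aside the survival-analysis part of the question (all variable names here are hypothetical):
Code:
* bmi_cat and bmi_cat_lag are hypothetical 6-category BMI variables
ivregress 2sls outcome age i.sex i.smoker ///
    (i.bmi_cat = i.bmi_cat_lag i.diabetes_lag i.cvd_lag i.smoker_lag), ///
    first vce(robust)
Whether a linear IV model is appropriate for a survival outcome is a separate modelling question.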
ttest for time series
Hello all,
I have monthly data on the standard deviation of my betas (Sd_beta) and three dummy variables P1-P3, which indicate whether the volatility of Ted (Ted_Vol) is in the first, second, or third tercile.
I've regressed these dummies on the above-mentioned standard deviation and got the coefficients for P1 and P3. In the next step I have to determine whether the difference (P3-P1) between the standard deviations given P1=1 and P3=1 is statistically significant. I'm not quite sure how to approach this task. Is there a way to compute monthly differences and use them for a t-test?
Another thought of mine was to calculate the difference of Ted_Vol if P1=1 and P3=1, but I do not know how to match these numbers on my time variable.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float date int Jahr byte Monat double(Sd_Beta Ted_Vol) float(P1 P3)
469 1999 2 .2162872850894928 .00042292120633646846 1 0
470 1999 3 .21683630347251892 .0005120532005093992 1 0
471 1999 4 .21574001014232636 .0005089303012937307 1 0
472 1999 5 .21684658527374268 .001667482778429985 0 0
473 1999 6 .21839885413646698 .0011676736176013947 0 0
474 1999 7 .21961617469787598 .0005640562158077955 0 0
475 1999 8 .22290168702602386 .0003095806168857962 1 0
476 1999 9 .23819047212600708 .001692846417427063 0 1
477 1999 10 .2405623197555542 .001672371756285429 0 0
478 1999 11 .23849868774414063 .0029762284830212593 0 1
479 1999 12 .24061444401741028 .002350094262510538 0 1
480 2000 1 .23947422206401825 .0033114443067461252 0 1
481 2000 2 .23897820711135864 .0017416990594938397 0 1
482 2000 3 .23724332451820374 .0005756043246947229 0 0
483 2000 4 .23789244890213013 .0004283250018488616 1 0
484 2000 5 .23350174725055695 .001657421002164483 0 1
485 2000 6 .23567992448806763 .002668452449142933 0 1
486 2000 7 .23621943593025208 .0005725464434362948 1 0
487 2000 8 .23868688941001892 .0008709787507541478 0 0
488 2000 9 .24259942770004272 .0012692536693066359 0 0
489 2000 10 .24906545877456665 .00048570294165983796 1 0
end
format %tm date
Thank you in advance.
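One possible approach (a sketch; the lag length is arbitrary) is to regress Sd_Beta on the tercile dummies, with P2 as the omitted base, and test the P3-P1 contrast with lincom, optionally using Newey-West standard errors for the monthly series:
Code:
tsset date
regress Sd_Beta P1 P3              // the middle tercile (P2) is the base category
lincom _b[P3] - _b[P1]             // tests whether the P3-P1 difference is zero
* allowing for serial correlation:
newey Sd_Beta P1 P3, lag(3)
lincom _b[P3] - _b[P1]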
Delete records with missing values
How can I delete all observations that have missing data in any of the variables that I have?
I have more than 61,000 observations with 3,809 variables, and I need to keep only the observations that are complete.
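A minimal sketch that drops any observation with a missing value in any variable; missing() handles both numeric missings and empty strings, though looping over 3,809 variables may take a moment:
Code:
foreach v of varlist _all {
    quietly drop if missing(`v')
}
count   // the remaining observations are complete on every variable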
Firm fixed effects
Hi Statalist,
I have a question about firm-fixed effects.
My regression looks like:
Dependent var = independent var + controls
My dependent var is a continuous variable, and my independent var is a dummy variable. This dummy variable can, of course, be 1 or 0. It can go from 1 to 0 in consecutive years, but NOT from 0 to 1.
I made panel data by xtset CIK fyear, where CIK is the company identifier.
My research supervisor said that when I include firm fixed effects, for the B1 coefficient Stata only looks at those firms that go from 1 to 0 in consecutive years (because all other firm-years are 'constant').
Is this true, and can anyone elaborate on this so that I will be able to defend this story more strongly?
If you need more information please feel free to ask...
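As an illustration (a sketch; depvar, treat and controls are placeholders): with firm fixed effects the coefficient on the dummy is identified from within-firm changes only, so firms whose dummy never changes over the sample period are absorbed by their fixed effect and contribute nothing to that coefficient.
Code:
xtset CIK fyear
xtreg depvar i.treat controls, fe vce(cluster CIK)
* only firms with within-firm variation in treat move the coefficient on treat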
Standardized concentration indices with conindex
Hello everyone,
I am trying to modify the conindex user-written program so that it also calculates indirectly standardized concentration indices. However, I get error 102, "too few variables specified", and I am not sure how to fix it. The option that I have added is [, STvar(varname)]. Below you can find the code; the added code has been highlighted in red. I haven't tried to modify the compare option so that it also compares standardized coefficients, but that would be great too.
Code:
capture program drop conindex2 program define conindex2, rclass sortpreserve byable(recall) version 11.0 syntax varname [if] [in] [fweight aweight pweight] , [RANKvar(varname)] [, robust] [, CLUSter(varname)] [, truezero] [, LIMits(numlist min=1 max=2 missingokay)] [, generalized][, generalised] [, bounded] [, WAGstaff] [, ERReygers] [, v(string)] [,beta(string)] [, graph] [, loud] [, COMPare(varname)] [, KEEPrank(string)] [, ytitle(string)] [, xtitle(string)] [,compkeep(numlist)] [,extended] [,symmetric] [,bygroup(numlist)] [,svy] [, STvar(varname)] marksample touse tempname grouptest counter tempvar wght sumw cumw cumw_1 cumwr cumwr_1 frnk temp sigma2 meanlhs meanlhs_star cumlhs cumlhs1 lhs rhs1 rhs2 xmin xmax varlist_star weight1 meanweight1 tempx temp1x sumlhsx temps tempex lhsex rhs1ex rhs2ex sigma2ex exrank tempgx lhsgex lhsgexstar symrank smrankmean tempsym sigma2sym lhssym lhssymstar rhs1sym rhs2sym lhsgsym tempgxstar raw_rank_c wi_c cusum_c wj_c rank_c var_rank_c mean_c lhs_c split_c ranking extwght temp1 meanweight sumlhs sumwr counts meanoverall tempdis temp0 meanlhs2 rhs temp2 frnktest meanlhsex2 equality group lhscomp rhs1comp rhs2comp rhscomp intercept scale stvar local weighted [`weight'`exp'] if "`weight'" != "" local weighted [`weight'`exp'] if "`weight'" == "" qui gen byte `wght' = 1 else qui gen `wght'`exp' if "`svy'"!=""{ if "`weight'" != "" { di as error "When the svy option is used, weights should only be specified using svyset." exit 498 } if "`cluster'"!="" { di as error "Warning: cluster option is redundant when using the svy option. svyset should be used to identify the survey design characteristics" } if "`robust'"!="" { di as error "Warning: robust option is redundant when using the svy option. svyset should be used to identify the survey design characteristics" } qui svyset if r(settings) == ", clear"{ di as error "svyset must be used to identify the survey design characteristics prior to running conindex2 with the svy option." exit 498 } local wtype = r(wtype) local wvar = r(wvar) if "`wtype'" != "." { local weighted "[`wtype' = `wvar']" qui replace `wght'=`wvar' } else replace `wght'=1 local survey "svy:" } markout `touse' `rankvar' `wght' `clus' `compare' quietly { local xxmin: word 1 of `limits' local xxmax: word 2 of `limits' if _by()==1 { if "`compare'"!="" { di as error "The option compare cannot be used in conjunction with by." exit 498 } } if "`compkeep'"=="" local bygroup = _byindex() if "`generalised'"=="generalised" local generalized="generalized" if "`extended'"!="" | "`symmetric'"!="" { di as error "Please see the help file for the correct syntax for the extended and symmetric indices" exit 498 } if "`xxmin'"=="" { scalar xmin=. } else scalar xmin=`xxmin' if "`xxmax'"=="" { scalar xmax=. 
} else scalar xmax=`xxmax' if "`weight'"!="" { sum `varlist' [aweight`exp'] if `touse' } else sum `varlist' if `touse' return scalar N=r(N) scalar testmean=r(mean) count if `varlist' < 0 & `touse' if r(N) > 0 { noisily disp as txt _n "Note: `varlist' has `r(N)' values less than 0" } if "`rankvar'" == "`varlist'" | "`rankvar'" ==""{ local index = "Gini" } else local index = "CI" gen double `standvar'=`varlist' if "`stvar'" != "" { replace `standvar'=`stvar' local label : variable label `stvar' label variable `standvar' `"`label'"' } gen double `ranking'=`varlist' if "`rankvar'" != "" { replace `ranking'=`rankvar' local label : variable label `rankvar' label variable `ranking' `"`label'"' } gen double `varlist_star'=`varlist' local CompWT_options = " `varlist'" if "`if'"!="" { local compif0="`if' & `compare'==0" local compif1="`if' & `compare'==1" } else { local compif0=" if `compare'==0" local compif1=" if `compare'==1" } forvalues i=0(1)1 { if "`weight'"!=""{ local CompWT_options`i' = "`CompWT_options' [`weight'`exp'] `compif`i'' `in'," } else local CompWT_options`i' = "`CompWT_options' `compif`i'' `in'," } if "`rankvar'"!="" { local Comp_options = "`Comp_options' rankvar(`rankvar')" } if "`cluster'"!="" { local Comp_options = "`Comp_options' cluster(`cluster')" } if xmin!=. { local Comp_options = "`Comp_options' limits(`limits')" } if "`v'"!="" { local Comp_options = "`Comp_options' v(`v')" } if "`beta'"!="" { local Comp_options = "`Comp_options' beta(`beta')" } if "`loud'"!="" { local Comp_options = "`Comp_options' loud" } if "`'"!="" { local Comp_options = "`Comp_options' " } foreach opt in robust truezero generalized bounded wagstaff erreygers svy{ if "``opt''"!="" { local Comp_options = "`Comp_options' `opt'" } } local extended=0 local symmetric=0 local modified=0 local problem=0 if "`truezero'"=="truezero" { if testmean==0 { if `problem'==0 di as err="The mean of the variable (`varlist') is 0 - the standard concentration index is not defined in this case." local problem=1 } if xmin != . { if xmin>0 { if `problem'==0 di as err="The lower bound for a ratio scale variable cannot be greater than 0." local problem=1 } } } if "`generalized'"=="generalized" { local generalized=1 } else local generalized=0 if "`truezero'"!="truezero" { if `generalized'==1 { if `problem'==0 di as err="The option truezero must be used when specifying the generalized option." local problem=1 } else local generalized=0 } if "`bounded'"!="" { if xmax==. { if `problem'==0 di as err="For bounded variables, the limits option must be specified as limits(#1 #2) where #1 is the minimum and #2 is the maximum." local problem=1 } local bounded=1 if xmin > xmax |xmin == xmax | xmin ==.{ if `problem'==0 di as err="For bounded variables, the limits option must be specified as limits(#1 #2) where #1 is the minimum and #2 is the maximum." local problem=1 } sum `varlist' if xmin!=.{ if r(min)<xmin |r(max)>xmax{ if `problem'==0 di as err="The variable (`varlist') takes values outside of the specified limits." local problem=1 } if r(min)>=xmin & r(max)<=xmax{ replace `varlist_star'=(`varlist'-xmin)/(xmax-xmin) } } } else local bounded=0 if "`wagstaff'"=="wagstaff" local wagstaff=1 else local wagstaff=0 if "`erreygers'"=="erreygers" local erreygers=1 else local erreygers=0 if `bounded'==0 & (`erreygers'==1| `wagstaff'==1){ di as err="Wagstaff and Erreygers Normalisations are only for use with bounded variables." 
di as err="Hence the bounded and limits(#1 #2) options must be used to specify the theoretical minimum (#1) and maximum (#2)." local problem=1 } if (`erreygers'==1 & `wagstaff'==1){ di as err="The option wagstaff cannot be used in conjunction with the option erreygers." local problem=1 } if "`v'"!="" { capture confirm number `v' if _rc { di as err="For the option v(#), # must be a number greater than 1." local problem=1 } if `v'<=1 & _rc==0 { di as err="For the option v(#), # must not be less than 1." local problem=1 } local extended=1 } if "`beta'"!="" { capture confirm number `beta' if _rc { di as err="For the option beta(#), # must be a number greater than 1." local problem=1 } if `beta'<=1 & _rc==0 { di as err="For the option beta(#), # must not be less than 1." local problem=1 } local symmetric=1 } if `extended'==1 & `symmetric'==1{ di as err="The option v(#) cannot be used in conjunction with the option beta(#)." local problem=1 } if (`extended'==1 | `symmetric'==1) & (`erreygers'==1| `wagstaff'==1){ di as err="Wagstaff and Erreygers Normalisations are not supported for extended/symmetric indices." local problem=1 } if (`generalized'==1) & (`erreygers'==1| `wagstaff'==1){ di as err="Cannot specify generalized in conjunction with Wagstaff or Erreygers Normalisations." local problem=1 } if xmin != . { sum `varlist' if r(min)<xmin{ if `problem'==0 di as err="The variable (`varlist') takes values outside of the specified limits." exit 498 } if "`truezero'"=="truezero" { di as txt="Note: The option truezero has been specified in conjunction with the limits option." if `extended'==1 | `symmetric'==1{ di as txt=" The index will be calculated using the standardised variable (`varlist' - min)/(max - min)." } else di as txt=" The limits are redundant as the variable is assumed to be ratio scaled (or fixed)." } } if "`truezero'"!="truezero" & `extended'==0 & `symmetric'==0 & `erreygers'==0 & `wagstaff'==0 & `generalized'==0 & `bounded'==0{ local modified=1 if xmin == . | xmax != . { di as err="For the modified concentration index, the limits option must be specified as limits(#1) where #1 is the minimum." di as err="If you require an alternative index, please look at the help file by typing - help conindex2 - to find the correct syntax." local problem=1 } if xmin == . { di as err="For the modified concentration index (the default), a missing value (.) may not be used as the lower limit. " local problem=1 } sum `varlist' if r(min)==r(max){ di as err="The modified concentration index cannot be computed since the variable (`varlist') is always equal to its minimum value." local problem=1 } } if "`truezero'"!="truezero" { if `extended'==1 | `symmetric'==1{ di as err="The extended and symmetric indices should be used for ratio-scale variables and hence truezero must be specified also." local problem=1 } } if "`graph'"=="graph"{ if "`truezero'"!="truezero" & `bounded'!=0{ di as err="Graph option only available for ratio-scale variables - please also specify the truezero option if the variable is ratio-scale or the bounded option if the variable is bounded." local problem=1 } if "`wagstaff'"=="wagstaff" | "`erreygers'"=="erreygers"{ di as err="Graph option not supported for Wagstaff or Erreygers Normalisations." local problem=1 } if `extended'==1 | `symmetric'==1{ di as err="Graph option not supported for Extended or Symmetric Indices." 
local problem=1 } } if "`loud'"=="loud" local noisily="noisily" if `problem'==1 exit 498 if `generalized'==1 & `extended'==1 noisily disp as txt _n "Note: The extended index equals the Erreygers normalised CI when v=2" if `generalized'==1 & `symmetric'==1 noisily disp as txt _n "Note: The symmetric index equals the Erreygers normalised CI when beta=2" if "`robust'"=="robust" | "`cluster'"!=""{ local SEtype="Robust std. error" } else local SEtype="Std. error" if "`svy'"!="" & (`extended'==0 & `symmetric'==0) gen `scale'=1 else gen double `scale'=sqrt(`wght') gsort -`touse' `ranking' egen double `sumw'=sum(`wght') if `touse' gen double `cumw'=sum(`wght') if `touse' gen double `cumw_1'=`cumw'[_n-1] if `touse' replace `cumw_1'=0 if `cumw_1'==. bys `ranking': egen double `cumwr'=max(`cumw') if `touse' bys `ranking': egen double `cumwr_1'=min(`cumw_1') if `touse' gen double `frnk'=(`cumwr_1'+0.5*(`cumwr'-`cumwr_1'))/`sumw' if `touse' gen double `temp'=(`wght'/`sumw')*((`frnk'-0.5)^2) if `touse' egen double `sigma2'=sum(`temp') if `touse' replace `temp'=`wght'*`varlist_star' egen double `meanlhs'=sum(`temp') if `touse' replace `meanlhs'=`meanlhs'/`sumw' if `modified'==1 & `bounded'==0{ replace `meanlhs'=`meanlhs'-xmin } if "`graph'"=="graph" { capture which lorenz if _rc==111 disp "conindex2 requires the lorenz.ado by Ben Jahn to produce graphs. Please install this before using conindex2." if "`ytitle'" ==""{ local ytext : variable label `varlist' if "`ytext'" == "" local ytext "`varlist'" local ytitle = "Cumulative share of `ytext'" if `generalized'==1 { if "`ytext'" == "" local ytext "`varlist'" local ytitle = "Cumulative average of `ytext'" } } if "`xtitle'" ==""{ if "`rankvar'" == "" local xtext : variable label `varlist' if "`rankvar'" != "" local xtext : variable label `ranking' if "`xtext'" == "" local xtext "`rankvar'" if "`xtext'" == "" local xtext "`varlist'" local xtitle = "Rank of `xtext'" } if `generalized'== 0{ lorenz estimate `varlist_star', pvar(`ranking') lorenz graph, ytitle(`ytitle', size(medsmall)) yscale(titlegap(5)) xtitle(`xtitle', size(medsmall)) ytitle(`ytitle', size(medsmall)) graphregion(color(white)) bgcolor(white) } if `generalized'==1 { lorenz estimate `varlist_star', pvar(`ranking') generalized lorenz graph, ytitle(`ytitle', size(medsmall)) yscale(titlegap(5)) xtitle(`xtitle', size(medsmall)) ytitle(`ytitle', size(medsmall)) graphregion(color(white)) bgcolor(white) } } noisily di in smcl /// "{hline 19}{c TT}{hline 13}{c TT}{hline 13}{c TT}{hline 19}" _c noi di in smcl "{c TT}{hline 10}{c TRC}" noisily di in text "Index:" _col(20) "{c |} No. of obs." _col(34) /// "{c |} Index value" _col(48) "{c |} `SEtype'" _col(68) /// "{c |} p-value" _col(79) "{c |}" noisily di in smcl /// "{hline 19}{c +}{hline 13}{c +}{hline 13}{c +}{hline 19}" _c noi di in smcl "{c +}{hline 10}{c RT}" gen double `lhs'=2*`sigma2'*(`varlist_star'/`meanlhs')*`scale' if `touse' gen double `intercept'=`scale' if `touse' gen double `rhs'=`frnk'*`scale' if `touse' local type = "`index'" if `modified'==1 & `bounded'==0{ replace `meanlhs'=`meanlhs'+xmin } if `generalized'==0 & `erreygers'==0 & `wagstaff'==0{ `noisily' disp "`index'" local type = "`index'" } if `modified'==1 { `noisily' disp "Modified `index'" local type = "Modified `index'" replace `lhs'=`lhs'*(`meanlhs')/(`meanlhs'-xmin) if `touse' ==1 } if `wagstaff'==1{ `noisily' disp "Wagstaff Normalisation" local type = "Wagstaff norm. 
`index'" replace `lhs'= `lhs'/(1-`meanlhs') if `touse' } if `erreygers'==1{ `noisily' disp "Errygers Normalisation" local type = "Erreygers norm. `index'" replace `lhs'= `lhs'*(4*`meanlhs') if `touse' } if `generalized'==1 { `noisily' disp "Gen. standard `index'" local type = "Gen. `index'" replace `lhs'=`lhs'*`meanlhs' if `touse' } if `extended'==1 | `symmetric'==1{ gsort -`touse' `frnk' gen double `temp1'=`wght'*`varlist_star' if `touse' egen double `sumlhs'=sum(`temp1') if `touse' bys `ranking': egen double `sumwr'=sum(`wght') if `touse' bys `ranking': egen double `counts'=count(`temp1') if `touse' gen `meanoverall'=`sumlhs'/`sumw' if `touse' bys `ranking': egen double `temp0'=rank(`ranking') if `touse', unique bys `ranking': egen double `meanlhs2'=sum(`temp1') if `touse' replace `meanlhs2'=`meanlhs2'/`sumwr' if `touse' } if `extended'==1{ capture drop `lhs' capture drop `rhs' capture drop `temp2' gen double `rhs'=((`sumwr'/`sumw')+((1-(`cumwr'/`sumw'))^`v')-((1-(`cumwr_1'/`sumw'))^`v')) if `temp0'==1 egen double `temp2'=sum(`rhs'^2) if `temp0'==1 gen double `lhs'=(`meanlhs2'/`meanoverall')*`temp2' if `touse' & `temp0'==1 local type = "Extended `index'" if `generalized'==1{ local type = "Gen. extended `index'" replace `lhs'=(`meanlhs2'*(`v'^(`v'/(`v'-1)))/(`v'-1))*`temp2' if `touse' & `temp0'==1 } } if `symmetric'==1{ capture drop `lhs' capture drop `rhs' capture drop `temp2' gen double `rhs'=(2^(`beta'-2))*(abs((`cumwr'/`sumw'-0.5))^`beta'-(abs(`cumwr_1'/`sumw'-0.5))^`beta') if `temp0'==1 egen double `temp2'=sum(`rhs'^2) if `temp0'==1 gen double `lhs'=(`meanlhs2'/`meanoverall')*`temp2' if `touse' & `temp0'==1 local type = "Symmetric `index'" if `generalized'==1{ local type = "Gen. symmetric `index'" replace `lhs'=`meanlhs2'*4*`temp2' if `touse' & `temp0'==1 } } `noisily' regress `lhs' `rhs' `intercept' `standvar' if `touse'==1, `robust' cluster(`cluster') noconstant if "`survey'"=="" `noisily' regress `lhs' `rhs' `intercept' `standvar' if `touse'==1, `robust' cluster(`cluster') noconstant if "`survey'"=="svy:" `noisily' svy: regress `lhs' `rhs' `intercept' `standvar' if `touse'==1, noconstant return scalar RSS=e(rss) mat b=e(b) mat V=e(V) return scalar CI= b[1,1] return scalar CIse= sqrt(V[1,1]) if `extended'==1 | `symmetric'==1{ `noisily' regress `lhs' `rhs' `standvar' if `temp0'==1, robust return scalar RSS=e(rss) mat b=e(b) mat V=e(V) return scalar CI= b[1,1] return scalar CIse = . } return scalar Nunique= e(N) local nclus= e(N_clust) local t=return(CI)/return(CIse) local p=2*ttail(e(df_r),abs(`t')) noisily di in text "`type'" _col(20) "{c |} " as result return(N) /// _col(34) "{c |} " as result return(CI) _col(48) "{c | }" /// as result return(CIse) _col(68) "{c |} " as result %7.4f /// `p' _col(79)"{c |}" noisily di in smcl /// "{hline 19}{c BT}{hline 13}{c BT}{hline 13}{c BT}{hline 19}" _c noi di in smcl "{c BT}{hline 10}{c BRC}" if `nclus'!=. noisily di in text "(Note: Std. 
error adjusted for `nclus' clusters in `cluster')" if return(Nunique)!=return(N) noisily di in text "(Note: Only " return(Nunique) " unique values for `rankvar')" if `extended'==1 | `symmetric'==1{ noisily di in text "(Note: Standard errors for the extended and symmetric indices are not calculated by the current version of conindex2.)" } if "`keeprank'"!="" { tempname savedrank gen double `savedrank'=`frnk' if _by()==0 { confirm new variable `keeprank'`compkeep' gen double `keeprank'`compkeep'=`savedrank' } if _by()==1 { gen double `keeprank'_`bygroup'=`savedrank' } } if "`compkeep'"!="" { confirm new variable templhs gen double templhs=`lhs' confirm new variable temprhs gen double temprhs=`rhs' } if "`compare'"!=""{ egen `group' = group(`compare') qui sum `group' if `touse' , meanonly scalar gmax=r(max) noisily di in text "" noisily di in text "" noisily di in text "For groups:" noisily di in text "" noisily di in text "" gen double `lhscomp'=. gen double `rhscomp'=. foreach i of num 1/`=scalar(gmax)' { if "`if'"!="" { local compif`i'="`if' & `group'==`i'" } else { local compif`i'=" if `group'==`i'" } if "`weight'"!=""{ local CompWT_options`i' = "`CompWT_options' [`weight'`exp'] `compif`i'' `in'," } else local CompWT_options`i' = "`CompWT_options' `compif`i'' `in'," qui sum `compare' if `touse' & `group'==`i', meanonly noisily di in text "CI for group `i': `compare' = "r(mean) noisily conindex2 `CompWT_options`i'' `Comp_options' keeprank(`keeprank') compkeep(`i') noisily di in text "" replace `lhscomp'=templhs if `touse' & `group'==`i' replace `rhscomp'=temprhs if `touse' & `group'==`i' drop templhs temprhs } `noisily' regress `lhscomp' c.`rhscomp' i.`group' if `touse', `robust' cluster(`cluster') return scalar N_restricted=e(N) return scalar SSE_restricted=e(rss) `noisily' regress `lhscomp' c.`rhscomp'##i.`group' if `touse', `robust' cluster(`cluster') noisily di in text "" return scalar SSE_unrestricted=e(rss) return scalar N_unrestricted=e(N) return scalar F=[(return(SSE_restricted)-return(SSE_unrestricted))/(gmax-1)]/(return(SSE_unrestricted)/(return(N_restricted)-2*gmax)) local p=1 - F(gmax-1,(return(N_restricted)- 2*gmax), return(F)) /* OO'D made two changes to second df 28.5.14 */ noisily di in text "Test for stat. significant differences with Ho: diff=0 (assuming equal variances)" _col(50) " noi di in smcl "{hline 19}{c TT}{hline 19}{c TRC}" noisily di in text "F-stat = " as result return(F) _col(20) "{c |} p-value= " as result %7.4f `p' _col(40) "{c |}" noi di in smcl "{hline 19}{c BT}{hline 19}{c BRC}" if gmax==2{ disp "Group: `compare'=0" conindex2 `CompWT_options1' `Comp_options' return scalar CI0=r(CI) return scalar CIse0=r(CIse) disp "Group: `compare'=1" conindex2 `CompWT_options2' `Comp_options' return scalar CI1=r(CI) return scalar CIse1=r(CIse) return scalar Diff= return(CI1)-return(CI0) return scalar Diffse= sqrt((return(CIse0))^2 + (return(CIse1))^2) return scalar z=return(Diff)/return(Diffse) local p=2*(1-normal(abs(return(z)))) noisily di in text "Test for stat. significant differences with Ho: diff=0 " _col(50) "(large sample assumed)" noi di in smcl /// "{hline 19}{c TT}{hline 23}{c TT}{hline 17}{c TT}{hline 18}{c TRC}" noisily di in text "Diff. = " as result return(Diff) _col(20) /// "{c |} Std. err. = " as result return(Diffse) _col(44) /// "{c |} z-stat = " as result %7.2f return(z) _col(59) "{c |} p-value = " as result %7.4f `p' _col(79)"{c |}" noi di in smcl /// "{hline 19}{c BT}{hline 23}{c BT}{hline 17}{c BT}{hline 18}{c BRC}" } } } end
Any help would be much appreciated.
Thanos
Multi-Logistic Regression on DHS data
Hi All,
I am using DHS India data to undertake analysis on a topic related to child health. I have completed my analysis, but before I publish or present my findings I need to verify that my analysis is robust.
My request: has anyone here conducted a multi-logistic regression using DHS data and is willing to share their do-file? I will be very grateful to you.
Thank You.
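Not a full do-file, but the survey-design setup is the part that is common to most DHS analyses. A hedged sketch using the standard DHS variable names (v005 = sample weight, v021 = PSU, v022 = strata); the outcome and covariates are placeholders:
Code:
gen wt = v005/1000000                        // DHS weights are stored multiplied by 1,000,000
svyset v021 [pweight = wt], strata(v022) singleunit(centered)
svy: logit child_outcome i.mother_edu i.wealth_quintile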
Sunday, December 29, 2019
CMP model for multinomial probit with varying choice set
Hello,
I am looking to build a multinomial probit with a varying choice set for each individual via CMP. The idea is to later add other choice dimensions for building a joint model and hence I am looking for a workaround using CMP.
I understand that the "asmprobit" function does a good job of this by just not adding the rows corresponding to the alternatives that are not present.
Panel Data
Hi
How do I estimate a panel data model with both random individual effects and random time effects in Stata?
(I want the final estimate of the model.)
{xtreg Variables, fe: Prob > F = 0.0000}
{hausman fe re: Prob > chi2 = 0.4301}
{Breusch–Pagan, Honda, King–Wu, SLM, GHM (in EViews): prob cross-section = 0.0000, prob period = 0.0000, both = 0.0000}
How do I do the final model estimation?
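Two common ways to allow for both individual and time effects in Stata (a sketch; y, x1, x2, id and year are placeholders) are random individual effects with time dummies, or crossed random effects via mixed:
Code:
xtset id year
xtreg y x1 x2 i.year, re                // random individual effects plus time dummies
mixed y x1 x2 || _all: R.year || id:    // crossed random effects for year and id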
Resolving "Initial values not feasible" error after using melogit command and the choice between melogit and meqrlogit
Dear Statalists,
I'm working on a multilevel model using cross-country survey data for the year 2016, but I am encountering a problem with the Stata command melogit and I hope you can help me to overcome it. This is the first time I have worked with multilevel models.
You can see an extract of my data structure below:
countryID is the country's identification number,
id is the respondent's ID number (which is very long),
health and pensions are the binary outcomes,
AGE1 (grand-mean-centered) and SEX1 are individual-level predictors,
Primary, Secondary and Tertiary are country-level variables that represent the proportion of immigrants with primary, secondary and tertiary education in each country.
I select only one country here, 56, which is the ISO 3166 country code for Belgium.
clear
input float(countryID id health pensions AGE1) float SEX1 double(Primary Secondary Tertiary)
56 2.016056e+15 1 1 -7.302176 0 43.7 31.4 24.9
56 2.016056e+15 1 0 -7.302176 1 43.7 31.4 24.9
56 2.016056e+15 0 0 -8.3021755 0 43.7 31.4 24.9
56 2.016056e+15 1 1 -9.3021755 0 43.7 31.4 24.9
56 2.016056e+15 1 1 -14.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 0 -8.3021755 1 43.7 31.4 24.9
56 2.016056e+15 1 1 -15.302176 0 43.7 31.4 24.9
56 2.016056e+15 0 0 -1.3021756 1 43.7 31.4 24.9
56 2.016056e+15 0 0 -10.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 1 32.697823 0 43.7 31.4 24.9
56 2.016056e+15 1 0 -11.302176 1 43.7 31.4 24.9
56 2.016056e+15 0 0 -12.302176 1 43.7 31.4 24.9
56 2.016056e+15 0 0 -11.302176 0 43.7 31.4 24.9
56 2.016056e+15 1 0 -12.302176 0 43.7 31.4 24.9
56 2.016056e+15 0 0 26.697824 1 43.7 31.4 24.9
56 2.016056e+15 1 1 -5.302176 0 43.7 31.4 24.9
56 2.016056e+15 0 1 -13.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 0 -14.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 1 -25.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 1 -13.302176 0 43.7 31.4 24.9
When I run the melogit command, I obtain this result:
melogit health SEX1 AGE1 Primary Secondary Tertiary || id:
Fitting fixed-effects model:
Iteration 0: log likelihood = -13691.836
Iteration 1: log likelihood = -13670.184
Iteration 2: log likelihood = -13670.165
Iteration 3: log likelihood = -13670.165
Refining starting values:
Grid node 0: log likelihood = -13173.957
Fitting full model:
initial values not feasible
r(1400);
But with the meqrlogit command, I obtain the following result:
meqrlogit health SEX1 AGE1 Primary Secondary Tertiary || id:
Refining starting values:
Iteration 0: log likelihood = -12500.998 (not concave)
Iteration 1: log likelihood = -12483.544
Iteration 2: log likelihood = -12451.561
Performing gradient-based optimization:
Iteration 0: log likelihood = -12451.561 (not concave)
Iteration 1: log likelihood = -12389.075 (not concave)
Iteration 2: log likelihood = -12364.939
Iteration 3: log likelihood = -12315.838 (not concave)
Iteration 4: log likelihood = -12309.971 (not concave)
Iteration 5: log likelihood = -12304.889 (not concave)
Iteration 6: log likelihood = -12304.14
Iteration 7: log likelihood = -12298.841 (not concave)
Iteration 8: log likelihood = -12298.209
Iteration 9: log likelihood = -12287.509 (not concave)
Iteration 10: log likelihood = -12287.477
Iteration 11: log likelihood = -12286.192 (not concave)
Iteration 12: log likelihood = -12285.969
Iteration 13: log likelihood = -12285.85 (not concave)
Iteration 14: log likelihood = -12285.831
Iteration 15: log likelihood = -12285.769
Iteration 16: log likelihood = -12285.768
Iteration 17: log likelihood = -12285.767
Mixed-effects logistic regression Number of obs = 25769
Group variable: id Number of groups = 13
Obs per group: min = 1002
avg = 1982.2
max = 3995
Integration points = 7 Wald chi2(5) = 567.30
Log likelihood = -12285.767 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
health | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
SEX1 | .103667 .0331526 3.13 0.002 .0386892 .1686449
AGE1 | -.0042459 .0009453 -4.49 0.000 -.0060987 -.0023932
Primary | -7.556424 1.069612 -7.06 0.000 -9.652826 -5.460023
Secondary | -7.528384 1.07176 -7.02 0.000 -9.628994 -5.427774
Tertiary | -7.262108 1.087128 -6.68 0.000 -9.392841 -5.131376
_cons | 748.3098 107.4985 6.96 0.000 537.6166 959.003
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | 4.869241 2.235363 1.980128 11.97373
------------------------------------------------------------------------------
LR test vs. logistic regression: chibar2(01) = 2768.79 Prob>=chibar2 = 0.0000
Questions:
What is the problem with melogit, and with what command can I fix it?
What do you think about the meqrlogit estimation result? Is it better than the melogit one?
If yes, why?
Many thanks
Cisse abs
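While waiting for advice on the model itself, two workarounds that are often suggested for the "initial values not feasible" error (a sketch; whether id or countryID is the appropriate grouping level is a separate question) are to switch to the Laplace approximation, or to feed the Laplace estimates back in as starting values:
Code:
* fit with the Laplace approximation first
melogit health SEX1 AGE1 Primary Secondary Tertiary || id:, intmethod(laplace)
matrix b0 = e(b)
* then refit with adaptive quadrature, starting from those estimates
melogit health SEX1 AGE1 Primary Secondary Tertiary || id:, from(b0) intpoints(7)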
PPML multicollinearity
Hi,
I have estimated a model with OLS and PPML.
With OLS I obtain R2 = 0.78
estat vif = 1.2
With PPML I obtain R2 = 0.95
Does this mean that I have a multicollinearity problem?
Thank you!
Difference of a variable based on corresponding variables in other columns
Hi, Based on a subset of data pasted below:
For every state (by year) I would like to take the difference between the growth of that state and its neighboring states (for which the neighbors data exist) denoted by columns: neigh1 neigh2 neigh3 neigh4. E.g. for year 1989 for state WY, I would like to take the difference between the growth of WY and its neighbors MT, SD, NE, CO.
I would appreciate help in this regard. Thanks.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double year str2 state double growth str2(neigh1 neigh2 neigh3) str3 neigh4 1989 "WY" 9.05 "MT" "SD" "NE" "CO" 1989 "IN" 4.62 "MI" "OH" "KY" "IL" 1989 "CO" 3.01 "WY" "NE" "KS" "OK" 1989 "MN" 2.57 "WI" "IA" "SD" "ND" 1989 "ID" 5.31 "MT" "WY" "UT" "NV" 1989 "NH" 3.41 "ME" "MA" "VT" "" 1989 "DE" 2.87 "PA" "NJ" "MD" "" 1989 "KS" 1.91 "NE" "MO" "OK" "CO" 1989 "MI" 3.81 "OH" "IN" "WI" "" 1989 "NV" 3.55 "ID" "UT" "AZ" "CA" 1989 "RI" 6.21 "MA" "CT" "" "" 1989 "WV" 9.03 "PA" "MD" "VA" "KY" 1989 "ND" -7.22 "MN" "SD" "MT" "" 1989 "HI" 5.65 "" "" "" "" 1989 "NC" 4.54 "VA" "SC" "GA" "TN" 1989 "IL" 5.65 "WI" "IN" "KY" "MO" 1989 "AZ" 1.87 "UT" "CO" "NM" "CA" 1989 "IA" 5.15 "MN" "WI" "IL" "MO" 1989 "OH" 3.45 "PA" "WV" "KY" "IN" 1989 "TX" 5.7 "OK" "AR" "LA" "NM" 1989 "VA" 3.38 "MD" "NC" "TN" "KY" 1989 "UT" 4.6 "ID" "WY" "CO" "NM" 1989 "MD" 4.79 "PA" "DE" "VA" "WV" 1989 "AL" 4.67 "TN" "GA" "FL" "MS" 1989 "NM" 1.17 "CO" "OK" "TX" "AZ" 1989 "AR" 3.82 "MO" "TN" "MS" "LA " 1989 "KY" 6.79 "OH" "WV" "VA" "TN" 1989 "FL" 3.54 "GA" "AL" "" "" 1989 "LA" 5.04 "AR" "MS" "TX" "" 1989 "TN" 3.83 "KY" "VA" "NC" "GA" 1989 "CA" 3.11 "OR" "NV" "AZ" "" 1989 "MO" 3.89 "IA" "IL" "KY" "TN" 1989 "PA" 4.55 "NY" "NJ" "DE" "MD" 1989 "WI" 5.14 "MI" "IL" "IA" "MN" 1989 "VT" 6.79 "NH" "MA" "NY" "" 1989 "ME" 5.82 "NH" "" "" "" 1989 "GA" 2.68 "NC" "SC" "FL" "AL" 1989 "OK" 6.23 "KS" "MO" "AR" "TX" 1989 "OR" 4.72 "WA" "ID" "NV" "CA" 1989 "NE" 5.35 "SD" "IA" "MO" "KS" 1989 "SC" 4.19 "NC" "GA" "" "" 1989 "WA" 3.58 "ID" "OR" "" "" 1989 "MA" 5.14 "NH" "RI" "CT" "NY" 1989 "NJ" 6.79 "NY" "CT" "DE" "PA" 1989 "MS" 2.69 "TN" "AL" "LA" "AR" 1989 "AK" -6.16 "" "" "" "" 1989 "CT" 5.29 "MA" "RI" "NY" "" 1989 "SD" .5 "ND" "MN" "IA" "NE" 1989 "NY" 5.34 "VT" "MA" "CT" "NJ" 1989 "MT" -.27 "ND" "SD" "WY" "ID" 1990 "MS" 1.18 "TN" "AL" "LA" "AR" 1990 "IA" 3.54 "MN" "WI" "IL" "MO" 1990 "AL" -.28 "TN" "GA" "FL" "MS" 1990 "WY" 1.23 "MT" "SD" "NE" "CO" 1990 "IL" 1.76 "WI" "IN" "KY" "MO" 1990 "CT" .96 "MA" "RI" "NY" "" 1990 "NV" 1.74 "ID" "UT" "AZ" "CA" 1990 "FL" .92 "GA" "AL" "" "" 1990 "WA" 2.74 "ID" "OR" "" "" 1990 "ME" .62 "NH" "" "" "" 1990 "MN" 2.01 "WI" "IA" "SD" "ND" 1990 "CO" .77 "WY" "NE" "KS" "OK" 1990 "ID" 5.46 "MT" "WY" "UT" "NV" 1990 "UT" .23 "ID" "WY" "CO" "NM" 1990 "AR" 1.82 "MO" "TN" "MS" "LA " 1990 "IN" 3.02 "MI" "OH" "KY" "IL" 1990 "NJ" 1.23 "NY" "CT" "DE" "PA" 1990 "LA" .82 "AR" "MS" "TX" "" 1990 "WI" 1.31 "MI" "IL" "IA" "MN" 1990 "PA" 1.47 "NY" "NJ" "DE" "MD" 1990 "ND" 7.37 "MN" "SD" "MT" "" 1990 "AZ" -1.54 "UT" "CO" "NM" "CA" 1990 "MO" 1.8 "IA" "IL" "KY" "TN" 1990 "MA" -.08 "NH" "RI" "CT" "NY" 1990 "AK" 3.12 "" "" "" "" 1990 "GA" .52 "NC" "SC" "FL" "AL" 1990 "WV" 1.15 "PA" "MD" "VA" "KY" 1990 "OR" 1.03 "WA" "ID" "NV" "CA" 1990 "NC" 1.96 "VA" "SC" "GA" "TN" 1990 "NM" 1.24 "CO" "OK" "TX" "AZ" 1990 "NE" 2.77 "SD" "IA" "MO" "KS" 1990 "MT" 4.16 "ND" "SD" "WY" "ID" 1990 "NY" -.41 "VT" "MA" "CT" "NJ" 1990 "MD" .65 "PA" "DE" "VA" "WV" 1990 "VT" 3.07 "NH" "MA" "NY" "" 1990 "SC" 1.95 "NC" "GA" "" "" 1990 "OK" .8 "KS" "MO" "AR" "TX" 1990 "DE" 5.52 "PA" "NJ" "MD" "" 1990 "KS" .31 "NE" "MO" "OK" "CO" 1990 "OH" 1.84 "PA" "WV" "KY" "IN" 1990 "MI" 1.2 "OH" "IN" "WI" "" 1990 "RI" 2.25 "MA" "CT" "" "" 1990 "TX" 1.7 "OK" "AR" "LA" "NM" 1990 "NH" -2.13 "ME" "MA" "VT" "" 1990 "KY" 2.28 "OH" "WV" "VA" "TN" 1990 "TN" .33 "KY" "VA" "NC" "GA" 1990 "HI" 4.51 "" "" "" "" 1990 "SD" 1.87 "ND" "MN" "IA" "NE" 1990 "VA" 2.06 "MD" "NC" "TN" "KY" 1990 "CA" 1.1 "OR" "NV" "AZ" "" end
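One possible approach (a sketch; growth_n*, mean_nbr and diff_* are made-up names) is to save a copy of the data keyed on year and state, merge each neighbour's growth back in, and then take differences:
Code:
preserve
keep year state growth
rename (state growth) (nstate ngrowth)
tempfile nbr
save `nbr'
restore

forvalues i = 1/4 {
    gen nstate = strtrim(neigh`i')           // guards against stray blanks such as "LA "
    merge m:1 year nstate using `nbr', keep(master match) nogenerate
    rename ngrowth growth_n`i'
    gen diff_n`i' = growth - growth_n`i'     // difference versus that neighbour
    drop nstate
}
egen mean_nbr = rowmean(growth_n1 growth_n2 growth_n3 growth_n4)
gen diff_mean = growth - mean_nbr            // difference versus the neighbour average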
georoute command and HERE API not working?
hello all,
I'm having some trouble with the HERE API and the georoute package. I registered for a HERE account and generated both a JavaScript and a REST ID and code, and neither will work with georoute! I keep getting the Stata message "There seems to be a problem with your HERE account". Am I missing something obvious? Thanks so much, I'm very confused.
Mediation analysis with STATA and control variables?
I've seen a lot of ways to do a mediation analysis, with Baron and Kenny's (1986) steps being the most popular. However, I see that they run the regressions with just the three variables of interest (reg DV IV; reg Mediator IV; reg DV IV Mediator). My first question is: is it necessary to also include the control variables in the analysis? And, if so, how can it be done in Stata? I've read that SEM is a good way, but it is normally done without control variables.
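For what it is worth, the usual practice is to include the same control variables in every equation of the Baron-Kenny steps; a hedged sketch (c1-c3 stand for hypothetical controls), including an SEM version that reports the indirect effect directly:
Code:
regress Mediator IV c1 c2 c3
regress DV IV c1 c2 c3
regress DV IV Mediator c1 c2 c3
* or as a single system:
sem (Mediator <- IV c1 c2 c3) (DV <- IV Mediator c1 c2 c3)
estat teffects        // direct, indirect and total effects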
'End Duplicates' error in mata programing
Hello, everyone. I'm Zhang.
I would like to ask you a question about an 'end duplicates' error in my Mata programming.
My program computes some matrices. Given Stata's computational limits, I need to use the Mata language. I wrote a do-file for the Mata-Stata interface, and I also want to use an ado-program that makes all kinds of do-files run. The problem is that, in my program, the ado code and the Mata code each have their own 'end', so Stata reports an 'end duplicates' error.
So, I would like to ask you two questions.
First, is it wrong to write the code like this; does Stata not allow it?
Second, if my idea is reasonable, how should I code the interface between the Mata programming and the ado programming?
My code is:
Code:
/*Define program*/
program define MYPROGRAM
version 14.0
/*Define syntax*/
syntax using/, [name(string) *
[ ... ] // some other options
use "`using'", clear // import data
confirm name `name'
/*Create and Compute Matrix use Mata*/
mata:
...
create a matrix named MATRIX
...
/*End Mata*/
end
/*End Program*/
end // you can see that the two 'end's produce the 'end duplicates' and 'end unrecognized' errors
Thank you very much for your answer!
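One structure that avoids the duplicate end is to keep the Mata code outside the ado-program and call a Mata function from inside it, so that each block is closed exactly once. A hedged sketch (the function name build_matrix and its contents are placeholders):
Code:
program define MYPROGRAM
    version 14.0
    syntax using/, [name(string)]
    use "`using'", clear
    mata: build_matrix("`name'")      // a one-line call into Mata
end                                   // closes the ado-program

mata:
void build_matrix(string scalar name)
{
    real matrix M
    M = I(3)                          // ... the real computation goes here ...
    st_matrix(name, M)                // return the result to Stata as a matrix
}
end                                   // closes the Mata block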
Testing of adjusted Kaplan-Meier survival
How do I test for differences in an adjusted Kaplan-Meier survival analysis?
I am able to create an adjusted KM graph, but I don't know how to test the adjusted KM curves.
Thanks
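Two common ways to test group differences after adjustment (a sketch; it assumes the data are already stset, and group, agegrp and sex are placeholders) are a stratified log-rank test or a covariate-adjusted Cox model:
Code:
sts test group, strata(agegrp sex)    // log-rank test stratified on the adjustment variables
stcox i.group agegrp i.sex            // adjusted comparison via Cox regression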
Marginsplots for different values of x
Hello,
first post so apologies if something similar has been asked before.
I'm investigating the relationship between intergroup ethnic contact at the workplace and tolerance with educational years as interaction.
My X (intergroup contact) has, after recoding, 3 levels of contact (no contact (baseline), some contact, and a lot of contact). I'd like to produce two marginsplots displaying the effect of contact with educational years interacting: one displaying the effect of some contact at different levels of education, and another displaying the effect of a lot of contact at different levels of education.
My regression looks like this:
xtreg tolerance i.RCimgclg##c.eduyrs i.gndr agea i.empl i.domicil hincfel lrscale, fe robust
So far my marginsplot looks like the picture attached:
https://imgur.com/a/2nDpJbc
Thank you in advance
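One way to get effect-of-contact curves over education (a sketch; the education grid 8(2)20 is arbitrary) is to ask margins for the dydx of the contact factor at several education values and then plot; each non-base contact level appears as its own contrast, and the levels can also be plotted one at a time:
Code:
xtreg tolerance i.RCimgclg##c.eduyrs i.gndr agea i.empl i.domicil hincfel lrscale, fe robust
margins, dydx(RCimgclg) at(eduyrs = (8(2)20))
marginsplot, recast(line) recastci(rarea)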
Generating data according to a pattern
Hello list,
I am trying to generate data according to the following pattern:
Obs A B C
----------------
1 0 0 0
2 0 0 1
3 0 1 0
4 0 1 1
5 1 0 0
6 1 0 1
7 1 1 0
8 1 1 1
Any help would be appreciated.
Thanks
André
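A minimal sketch that generates the 2^3 = 8 combinations directly from the observation number:
Code:
clear
set obs 8
generate A = floor((_n - 1)/4)
generate B = mod(floor((_n - 1)/2), 2)
generate C = mod(_n - 1, 2)
list, noobs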
Comparing two probit models with clustered standard errors
Hey everyone,
I would like to compare two probit models (that are nested) to see whether the addition of further variables improves the model. As I need clustered standard errors, it is not possible to use a likelihood-ratio test. What other possibilities do I have? Is it appropriate to compare the AIC/BIC, or should I do another test?
Looking forward to some advice. Thanks! :-)
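With clustered standard errors the usual alternative to a likelihood-ratio test is a Wald test of the added variables; a sketch (x3 and x4 stand for the additions, firmid for the cluster variable):
Code:
probit y x1 x2 x3 x4, vce(cluster firmid)
test x3 x4            // joint Wald test that the added variables are zero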
Panel Data Analysis: Growth rates or levels?
Hi Statalist-community!
I am currently writing a seminar paper. I am estimating the effect of the share of left-wing government members in Swiss cantons on the public expenses (and their categories) of these cantons.
The data is balanced panel data with N (cantons) = 26 and T(years) = 28.
The dependent variable is: public expenses in category j per canton in year t
The main independent variable is: share of left-wing government members in government of canton i in year t.
(Category means for example: health, social security, culture, education, etc.)
As control variables I have:
- GDP per canton
- unemployment rate per canton
- ratio of people aged > 64 per canton
- debt per canton
- population per canton
- and a lagged variable of the proportion of left-wing politicians in the parliament
Here is a sample of my main data for one canton:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int jahr long kanton double(links total BIP) long bev double(alt schulden Alquote) 1990 1 .2 2103595.49679 22258.61295548331 498035 .126 4118853.07339 .0021 1991 1 .2 2418372.7013 23247.730242745427 506818 .126 4258530.37578 .0055000000000000005 1992 1 .2 2622013.84704 23731.27671978167 511979 .126 3995904.96175 .01648507796021844 1993 1 .2 2752637.76148 24251.730934888055 518945 .126 4020907.67093 .03353853343994975 1994 1 .2 2866913.15304 24850.55070690186 523114 .127 4312582.93007 .032740127331169364 1995 1 .2 2871594.22682 25154.26712852215 528887 .12800000000000003 4261014.9935 .02906240833847551 1996 1 0 3033372.5041 25331.135728586345 531665 .129 4566428.24205 .03839377973484641 1997 1 0 3065249.40363 25817.39558783384 534028 .13 4734656.4063 .046613346283090946 1998 1 0 3092004.07619 26558.17208651548 536462 .132 4797195.89518 .030217723885365436 1999 1 0 3184180.88853 27039.352615536056 540639 .13195681406631782 4907276.22711 .021088996722396877 2000 1 0 3288728.78046 28525.767132413897 544306 .13364541269065564 4901166.57689 .013503121668950816 2001 1 0 3448772.61443 29194.49554783396 550298 .13510134508938793 4762340.0067 .01214187822228023 2002 1 0 3559261.63535 29167.80471578718 555782 .1365679349097308 4489301.77899 .02123115577889447 2003 1 0 3771771 29508.175608019385 560674 .13796074010922568 4366114.90529 .033257738911005245 2004 1 0 3868402.48256 30431.545233146422 565122 .14008302631998046 5305308.83576 .034339989993256326 2005 1 0 3987495.90309 31596.1406982345 569344 .14222684352517986 4762380.78077 .03251647849637799 2006 1 0 4304953.29254 33544.994154576845 574813 .14474272502535607 4510972.05203 .02857251626095847 2007 1 0 4505096.31065 35767.62753463508 581562 .147655795942651 4886900.96703 .023552013313319846 2008 1 .25 3995789.07397 37774.52858 591632 .15066122184060363 4832603.85848 .022927679523156913 2009 1 .4 4107370.48387 36943.54089 600040 .1535130991267249 4584035.66422 .033850257782418576 2010 1 .4 4146058.66746 37664.96637 608299 .15526129007990633 4696771.97718 .03128852310932995 2011 1 .4 4336101.42091 38505.31274 618298 .15872281650595668 4311897.59296 .025658837672748246 2012 1 .4 4577403.78175 38719.76854 627340 .16137022985940638 3900244.74448 .02685290486325758 2013 1 .4 4703960.67493 39488.54672 636362 .1640371360954928 3744099.61901 .028494090775842893 2014 1 .4 4770144.18455 40139.27705 645277 .1663657623005314 4051689.93526 .027857224266495634 2015 1 .4 4892710.96136 40647.58003 653675 .16873522775079358 4254435.63835 .029879852698914817 2016 1 .2 4973509.49627 40813.49 663462 .1705282291977536 4838286.30867 .031555575596547404 2017 1 .2 4944214.42435 41592.47817 670988 .17351129975498816 4990070.88585 .030368726678589565 end label values kanton kanton1 label def kanton1 1 "AG", modify label var jahr "jahr" label var kanton "kanton" label var links "Anteil links in Regierung" label var total "Total Ausgaben" label var BIP "BIP" label var bev "Total Wohnbevölkerung" label var alt "Anteil Bevölkerung >64" label var schulden "Bruttoschulden" label var Alquote "Arbeitslosenquote in Dezimalzahlen"
My question to you is now: should I use growth rates of the variables (the dependent variables as well as the control variables) or their levels? I decided to estimate an LSDVC model (xtlsdvc in Stata). When I use levels, I see some effects, but when I use growth rates, virtually all the variables become insignificant.
I use the following code in STATA for the analysis in levels:
Code:
xtlsdvc kat03 links lagd_mlp bev alt schulden Alquote BIP, initial(ab) vcov(50) first
And the following code for the analysis in growth rates:
Code:
xtlsdvc gln_kat03 links lagd_mlp gln_alt gln_schulden gln_Alquote gln_BIP, initial(ab) vcov(50) first
The reason why I'm unsure is that almost all the scientific papers examining the same hypothesis use growth rates, but I don't really see why. The data are stationary, and the Hausman test result suggested using fixed effects.
Also: Do you think I am estimating the right model?
Thank you so much, your answer would help me an awful lot!!
Regards,
Lara Knuchel
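For reference, growth rates of the kind used in that literature are usually constructed as within-canton log differences; a sketch using some of the variables from the example data (which variables to transform, and how to treat variables that are already shares, is a judgement call):
Code:
xtset kanton jahr
foreach v of varlist total BIP bev schulden {
    gen gln_`v' = ln(`v') - ln(L.`v')     // approximate annual growth rate
}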