It seems to be well-documented (here: https://www.statalist.org/forums/for...port-fvvarlist or here: https://www.stata.com/statalist/arch.../msg00707.html) that xtoverid does not work when factor variables are included in a regression using xtivreg.
I am using factor variables in an xtivreg regression, and I would like to know the first stage F-stat for my excluded variables. Is there any way to do this w/out using xtoverid?
If there is no post-estimation command that works to do this, I can of course separately run what I think is the 1st stage, and test my excluded variables myself. From page 20 of the manual (https://www.stata.com/manuals/xtxtivreg.pdf) it looks like I would first (a) remove all fixed effects using xtreg, then (b) run a 2SLS regression of my 1st stage using ivreg or ivreg2. Does anyone know if this is indeed the best manual approximation of the first stage of xtivreg?
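For what it's worth, a minimal sketch of the manual first stage described above, with hypothetical variable names (x_endog for the endogenous regressor, z1 and z2 for the excluded instruments, w1 and w2 for exogenous controls, i.year for the factor variables) and the panel already xtset:
Code:
xtreg x_endog z1 z2 w1 w2 i.year, fe
test z1 z2      // joint F test of the excluded instruments in the first stage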
Tuesday, December 31, 2019
Panel data - Creating a date variable from year and weeknumber as string
Stata listers
I am writing with a query relating to panel data for historical prices. I am trying to create a date variable from an Excel file which contains the year and week number as a string. Is there a way to convert the available information - year and week numbers (as string) - into Stata- or Excel-recognisable dates? Thanks very much.
| year | weeknum | Price 1 | Price 2 | date |
| 1890 | 2nd week in Jan | 76 | 90 | |
| 1890 | 3rd week in Jan | 76 | 90 | |
| 1890 | 4th week in Jan | 76 | 90 | |
| 1890 | 2nd week in Feb | 76 | 90 | |
| 1890 | 3rd week in Feb | 76 | 90 | |
| 1890 | 4th week in Feb | 76 | 90 | |
| 1890 | 2nd week in March | 76 | 90 | |
| 1890 | 3rd week in March | 80 | 94 | |
| 1890 | 4th week in March | 80 | 94 | |
| 1890 | 5th week in March | 80 | 94 | |
I am not able to attach this data in .dta format for some reason. I am using Stata MP 16.
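One possible sketch, assuming weeknum always follows the pattern "<n>th week in <month>" and that year is numeric (drop the string() wrapper if it is already a string); each week is approximated as day 1, 8, 15, 22 or 29 of its month:
Code:
gen wk   = real(substr(weeknum, 1, 1))                       // ordinal of the week, 1-5
gen mon  = word(weeknum, -1)                                 // month name, e.g. "Jan"
gen date = daily("1 " + mon + " " + string(year), "DMY") + 7*(wk - 1)
format date %td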
Margins: trouble with continuous interactions under simultaneous fixed effects
Sometimes I wish to control for variable X via fixed effects (say, year fixed effects) but also allow the marginal effect of a second variable to vary continuously with variable X (say, the effect of adopting a new technology might vary linearly or non-linearly with year). In these situations, I am NOT interested in allowing the marginal effect of that second variable to change with *every* value of variable X --- this would waste power, as I believe that the marginal effect of the second variable varies smoothly with variable X.
Stata can run this regression: reg Y X1 i.X1#c.X2 i.X2. However, while a coefficient is calculated for both X1 and i.X1#c.X2, margins is for some reason unable to obtain the marginal effects of X1 over X2.
I have had this problem several times, and right now I'm having this problem in a situation where I have other fixed effect accounted for, and so am using xtreg. However, the problem is generalizable to a situation where one is using reg only. I have replicated the problem in the auto dataset, below, and would be incredibly grateful for thoughts on what's going on.
Code:
sysuse auto, clear
xtset foreign
gen lprice=log(price)
gen HIGHmpg=mpg>25
** Reg 1: This works fine
xtreg lprice i.HIGHmpg i.turn
margins, dydx(i.HIGHmpg)
** Works fine w/ no interaction
** Reg 2: This does not work
xtreg lprice i.HIGHmpg i.HIGHmpg#c.turn i.turn
margins, dydx(i.HIGHmpg) at(c.turn=(32 36 40 44 48 52))
/* Command does not run. Error returned:
c.turn ambiguous abbreviation
r(111); */
** Reg 3: This "trick" also doesn't work
gen test=turn
xtreg lprice i.HIGHmpg i.HIGHmpg#c.test i.turn
margins, dydx(i.HIGHmpg) at(c.test=(32 36 40 44 48 52))
/* Command runs, but interactions deemed "not estimable" */
** Reg 2 w/ reg instead of xtreg
reg lprice i.HIGHmpg i.HIGHmpg#c.turn i.turn
margins, dydx(i.HIGHmpg) at(c.turn=(32 36 40 44 48 52))
** Same error given
** Reg 3 w/ reg instead of xtreg
reg lprice i.HIGHmpg i.HIGHmpg#c.test i.turn
margins, dydx(i.HIGHmpg) at(c.test=(32 36 40 44 48 52))
** Interactions still deemed "not estimable"
** Note: it is possible to allow an interaction between i.HIGHmpg and EVERY
** value of test, as below, but this is not what I want to do, as it wastes power.
** In my own examples, it is helpful to do this because I can see a linear or
** non-linear pattern in the marginal effects, but then I ultimately want to run
** the model allowing only a continuous change in the marginal effects.
xtreg lprice i.HIGHmpg i.HIGHmpg#i.turn i.turn
margins, dydx(i.HIGHmpg) over(i.turn)
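For reference, a minimal sketch of the specification that margins does handle in this auto example, using factor##continuous notation and no separate i.turn fixed effects; whether the mixed i.turn plus c.turn specification can be rescued is exactly the question here:
Code:
sysuse auto, clear
gen lprice  = log(price)
gen HIGHmpg = mpg > 25
regress lprice i.HIGHmpg##c.turn
margins, dydx(HIGHmpg) at(turn = (32(4)52))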
Please help: Importing and merging multiple sheets from an excel file while renaming variables using loops
Hello,
I am quite new using loops and really want to understand it better.
I have an excel file with 11 sheets. I only need a couple of sheets from it and need to rerun it every day with new data but the same variables, so I am trying to write an efficient script to complete what I need to do. For each of the sheets there are patient identifiers, but they come in with different column names, which is making it difficult to merge when importing within one loop.
The column that I want to merge on is column_A, but in each sheet it is called "column_AA", "column_AB", and "columnAC", respectively, for Sheet A, Sheet B, and Sheet C.
What I have so far to import the data is in the first code block below.
How might I be able to add in a command to rename the column names to a similar one so then I can merge them all?
I was thinking of adding a loop inside it, or a second loop after, but then the correct column_AB wouldn't match up with the correct sheet.
This is what I was thinking (second code block below), but it doesn't really work.
Thanks for your help/advice/response!
-Ben
Code:
local sheets "Sheet_A Sheet_B Sheet_C"
foreach y in `sheets'{
import excel using "data_set.xls", sheet(`y') firstrow clear
save "`y'.dta",replace
}
Code:
local variable "column_AA column_AB columnAC"
foreach t in `variable'{
use "`y'.dta", clear
rename `t' column_A
}
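A hedged sketch of the pairing Ben seems to be after, using the sheet and column names from the post: walk the two lists in parallel so that each sheet's own ID column is renamed to the common key before saving:
Code:
local sheets "Sheet_A Sheet_B Sheet_C"
local idvars "column_AA column_AB columnAC"

local i = 1
foreach y of local sheets {
    local id : word `i' of `idvars'
    import excel using "data_set.xls", sheet("`y'") firstrow clear
    rename `id' column_A            // common merge key across sheets
    save "`y'.dta", replace
    local ++i
}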
Panel data regression
Hello everyone,
I'm writing my thesis and I'm struggling with the processing of my data. First of all, my research question is: "What is the effect of environmental controversies on the profitability of Chinese and European firms?" and I want to check for moderation of corporate environmental performance, press freedom of the country of origin of the firm and ownership structure (concentration and state ownership). My dependent variables are ROA, ROE and Tobin's Q. My independent variables are environmental controversies (EC), corporate environmental performance (CEP), press freedom (PF), ownership concentration (Independence), and state ownership (GUO). My control variables are firm size, leverage and industry.
I have collected my data from Eikon and Orbis. I opted for a balanced dataset (so there are no more missing values), and this dataset consists of 314 firms (64 Chinese, 250 European)
My variables are:
- id (1 until 314)
- Year (2013-2018)
- Country (Europe or China)
- Industry (10 categories)
- Independence (A+ until D)
- GUO (e.g. Public authority)
- EC (dummy --> 0: no controversy in that year; 1: controversy in that year)
- CEP (score out of 100)
- PF (score out of 100)
- ROA
- ROE
- Tobin's Q
- Firm size
- Leverage
I made dummy variables for Country (DummyChina and DummyEurope), BvDIndependenceIndicator (DummyLowConcentration, DummyMediumLowConcentration, DummyMediumHighConcentration and DummyHighConcentration), GUO Type (DummyStateOwnership), Industry (DummyIndustry1, DummyIndustry2, DummyIndustry3, DummyIndustry4, DummyIndustry5, DummyIndustry6, DummyIndustry7, DummyIndustry8, DummyIndustry9 and DummyIndustry10).
Also, the variables EC, CEP and PF are lagged, as I want to measure the effect of the occurrence of an environmental controversy on the profitability of the next year.
When I first started my regression, I used SPSS. However, I read that Stata is a much better alternative for panel data. I was able to upload my data in Stata and did some tests to check whether I need a pooled OLS model, a fixed effects model or a random effects model. The results pointed out that I need to use REM. I was able to regress my first model, using only ROA as my dependent variable and EC, firm size, leverage, DummyChina, and DummyIndustry2 through DummyIndustry10 as regressors.
My questions:
- If I want to compare Chinese and European firms, is this the right standard model? Or do I have to start with just ROA, EC and the control variables and then make interaction terms for Country and Industry?
- If I later make interaction terms for Country, Industry, CEP, PF, GUO and Independence, can I add all these in just one regression? Or do I have to add them separately and make multiple regressions?
Quite frankly, I'm a bit lost. I have never used panel data or Stata, and I have no idea what the right order is to answer my research question and check for moderation. My main struggle is the interaction terms.
If anyone has suggestions or could tell me the steps I have to follow, please let me know. Thank you in advance!!
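As a starting point on the interaction question, a hedged sketch with hypothetical lower-case variable names (roa, ec, firmsize, leverage, plus categorical country and industry variables): factor-variable notation builds the interaction, and margins recovers the effect of a controversy by country:
Code:
xtset id year
xtreg roa i.ec##i.country c.firmsize c.leverage i.industry, re vce(cluster id)
margins country, dydx(ec)    // effect of an environmental controversy for each country group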
Time varying covariate in Cox Regression model
Hi all. After a thorough search online I can't seem to find a solution to my problem, which is why I'm now asking the experts.
I'm doing a cox regression in 1175 subjects where I want to assess the effect of the dichotomous baseline variable X on the outcome Z. All subjects have variable X which is present since birth. In addition I have another dichotomous variable Y (which is more like an intervention effect) which is not present at baseline for any of the subjects, however some of the subjects get affected by (Y) event during the follow up at different dates, and this variable is known to be connected with outcome Z. I'm trying to know if variable Y increases the chance of occurrence of outcome Z in which (Z=1) among those who have the effect variable Y during their followup and those who don't.
So the "known" chain of events is X --> Y ---> Z . And I want to test X --> Z. But I still want to include the effect of Y in my model as some of the subjects will follow X-->Y-->Z.
So I thought: how can I include Y as a time-varying covariate, so as not to underestimate the effect of Y, but still assess whether there is a direct correlation between X and Z?
Hope the question isn't too cryptic - I'll be happy to elaborate.
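A minimal sketch of episode splitting, assuming the data are already stset with one record per subject and a variable y_date holding the date Y occurred (missing if it never did); the coding that stsplit assigns should be checked against help stsplit:
Code:
stsplit spl, after(time = y_date) at(0)
gen byte postY = (spl == 0)    // assumption: the post-Y span is coded 0 here; verify with -tab spl-
drop spl
stcox i.X i.postY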
Timevar for survival analysis
Dear All,
This might be a silly question, but it is driving me crazy.
I am managing data which were not recorded for survival analysis and I am trying to put them in a proper format.
For the purpose of my question, here are my data (I have more variables, but they behave like Var1 and Var2, i.e. varying over time):
| ID | Visit | Date | DOsp1 | DOsp2 | Sex | Var1 | Var2 |
| 1 | 0 | 1mar2002 | | | M | 0 | . |
| 1 | 1 | 3jun2005 | | | M | . | . |
| 1 | 2 | 4feb2007 | | | M | . | . |
| 2 | 0 | 9feb2002 | 21dec2000 | 22jun2001 | F | 1 | 18.9 |
| 2 | 1 | 7sep2002 | | | F | 2 | 9999 |
| 3 | 0 | 25mar2003 | | | M | 0 | 20 |
| 3 | 1 | 13oct2004 | | | M | 2 | 9999 |
| 4 | 0 | 4oct2002 | | | F | 1 | 23.5 |
| 4 | 1 | 03may2004 | 4jan2003 | 24jun2003 | F | . | . |
| 4 | 2 | 13jan2006 | | | F | . | . |
| 4 | 3 | 25aug2007 | | | F | 2 | 9999 |
ID is my person identifier; each person can be visited several times (Visit, where 0 is the baseline) on different dates (Date is when the visit took place). During each visit, a person could report up to 9 dates (I do have DOsp1-DOsp9, but for the sake of this question I just show the first two) indicating if and when they were hospitalized between the visits.
I will use snapspan in order to convert my data to time-span data, but before I guess I need to slightly change my time variable (and the dataset overall).
I want to have a timevar like Time (see table below) in order to run snapspan ID Time.
| ID | Visit | Date | DOsp1 | DOsp2 | Sex | Var1 | Var2 | Time |
| 1 | 0 | 1mar2002 | | | M | 0 | . | 1mar2002 |
| 1 | 1 | 3jun2005 | | | M | . | . | 3jun2005 |
| 1 | 2 | 4feb2007 | | | M | . | . | 4feb2007 |
| 2 | . | . | . | . | . | . | . | 21dec2000 |
| 2 | . | . | . | . | . | . | . | 22jun2001 |
| 2 | 0 | 9feb2002 | 21dec2000 | 22jun2001 | F | 1 | 18.9 | 9feb2002 |
| 2 | 1 | 7sep2002 | | | F | 2 | 9999 | 7sep2002 |
| 3 | 0 | 25mar2003 | | | M | 0 | 20 | 25mar2003 |
| 3 | 1 | 13oct2004 | | | M | 2 | 9999 | 13oct2004 |
| 4 | 0 | 4oct2002 | | | F | 1 | 23.5 | 4oct2002 |
| 4 | . | . | . | . | . | . | . | 4jan2003 |
| 4 | . | . | . | . | . | . | . | 24jun2003 |
| 4 | 1 | 03may2004 | 4jan2003 | 24jun2003 | F | . | . | 03may2004 |
| 4 | 2 | 13jan2006 | | | F | . | . | 13jan2006 |
| 4 | 3 | 25aug2007 | | | F | 2 | 9999 | 25aug2007 |
This is the final dataset I want to obtain:
| ID | Datestarts | Dateends | Sex | Var1 | Var2 | Event | Event_recode |
| 1 | . | 1mar2002 | M | 0 | . | Visit 0 | 0 |
| 1 | 1mar2002 | 3jun2005 | M | . | . | Visit 1 | 0 |
| 1 | 3jun2005 | 4feb2007 | M | . | . | Visit 2 | 0 |
| 2 | . | 9feb2002 | F | 1 | 18.9 | Visit 0 | 0 |
| 2 | 9feb2002 | 7sep2002 | F | 2 | 9999 | Visit 1 | 2 |
| 3 | . | 25mar2003 | M | 0 | 20 | Visit 0 | 0 |
| 3 | 25mar2003 | 13oct2004 | M | 2 | 9999 | Visit 1 | 2 |
| 4 | . | 4oct2002 | F | 1 | 23.5 | Visit 0 | 0 |
| 4 | 4oct2002 | 4jan2003 | F | . | . | Osp 1 | 1 |
| 4 | 4jan2003 | 24jun2003 | F | . | . | Osp 2 | 1 |
| 4 | 24jun2003 | 03may2004 | F | . | . | Visit 1 | 0 |
| 4 | 03may2004 | 13jan2006 | F | . | . | Visit 2 | 0 |
| 4 | 13jan2006 | 25aug2007 | F | 2 | 9999 | Visit 3 | 2 |
As you might notice, any date recorded in DOsp1-DOsp9 that happened before Visit 0 is not taken into account. Event_recode will then be built as the failure variable for my stset (0 if the row refers to a visit, 1 if it refers to a hospitalization, 2 if the person dies, i.e. if Var1==2, and 3 if censored).
All of that, in order to run the following code:
stset Dateends, id(ID) time0(Datestarts) origin(time Datestarts) failure(Event_recode==1 2)
Thank you to anyone who can help me, feel free to ask me clarifications.
Best
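A hedged sketch of one way to build the Time rows before snapspan, assuming Date and DOsp1-DOsp9 are already numeric daily dates: stack the hospitalization dates as extra observations, drop those recorded before the baseline visit, and then continue with the planned snapspan ID Time step:
Code:
preserve
keep ID Visit DOsp*
reshape long DOsp, i(ID Visit) j(ospnum)
drop if missing(DOsp)
keep ID DOsp
rename DOsp Time
gen byte is_osp = 1
tempfile osp
save `osp'
restore

gen Time = Date
append using `osp'
bysort ID: egen baseline = min(cond(Visit == 0, Date, .))
drop if is_osp == 1 & Time < baseline    // hospitalizations before Visit 0 are ignored
sort ID Time
* the data should now be ready for the planned -snapspan ID Time ...- step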
how to estimate individual betas of coefficients in a province-sector-year panel data (with 2 sectional identifiers)
Hello everyone:
I'm trying to estimate production functions for a panel of manufacturing data with 2 identifiers (province, sector), so that each sector has observations for the different provinces. The first thing I do is egen a new ID by group(province sector), but this apparently ignores the unobservable common trend within each province or sector.
I was considering a fixed effect (LSDV) or a semi-parametric (e.g. Levinsohn and Petrin). The problem is:
(1) for the former approach, how to correctly set the factor variables;
(2) for the latter, how to correctly get the betas of K and L for every province-sector cell.
The attachment dataex.txt is a part of my data file. The models I thought of were:
(1) reg lnYL_go lnKL i.prov_sec_id i.prov_sec_id#c.lnKL i.actual_year, vce(cluster prov_sec_id) (lnYL and lnKL not included; they are simply ln(Y/L), etc., assuming CRS.)
(2) prodest lnY_va, free(lnL) state(lnK) proxy(lnInt) met(lp) va acf id(prov_sec_id) t(actual_year)
I'm not trying to be a free rider, it's just that related references are rare. Any opinion or suggestion would be appreciated, and happy new year!
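On question (1), a hedged sketch of one way to read the concern, with hypothetical variable names (province, sector, actual_year): keep the combined cell fixed effects and cell-specific lnKL slopes, and add province-by-year and sector-by-year terms so common trends within each province and each sector are not ignored:
Code:
egen prov_sec_id = group(province sector)
reg lnYL_go c.lnKL i.prov_sec_id#c.lnKL i.prov_sec_id ///
    i.province#i.actual_year i.sector#i.actual_year, vce(cluster prov_sec_id)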
Discriminant analysis using Stata
Hello everyone,
I need to run discriminant analysis in Stata. How can I do this and obtain both the standardized and unstandardized discriminant function coefficients together with the structure matrix?
I'm supposed to produce output like the attached picture.
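A hedged sketch using Stata's built-in discriminant-analysis commands, with hypothetical predictor and group variable names; the exact postestimation option names should be checked against help candisc postestimation:
Code:
candisc x1 x2 x3, group(groupvar)
estat loadings, standardized unstandardized   // discriminant function coefficients
estat structure                               // structure matrix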
about synth instruction question
Hello, great master. I have a question.
My Stata version is 14.
When I run the commands below, Stata always displays the error message shown. What can I do?
////stata instruction///
xtset state year
replace age15to24 = 100*age15to24
synth cigsale cigsale(1988) cigsale(1980) cigsale(1975) lnincome retprice ///
age15to24 beer(1984(1)1988), trunit(3) trperiod(1989)
file synthopt.plugin not found <-------error message
(error occurred while loading synth.ado)
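The missing synthopt.plugin is typically an installation issue; a hedged fix is to reinstall synth together with its ancillary files and then re-run the command on the data below:
Code:
ssc install synth, all replace
* then re-run:
xtset state year
synth cigsale cigsale(1988) cigsale(1980) cigsale(1975) lnincome retprice ///
    age15to24 beer(1984(1)1988), trunit(3) trperiod(1989)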
/////data////
clear
input long state float(year cigsale lnincome beer age15to24 retprice cigsale_cal cigsale_rest)
1 1970 89.8 . . 1788.618 39.6 . 120.08421
1 1971 95.4 . . 1799.2784 42.7 . 123.86316
1 1972 101.1 9.498476 . 1809.939 42.3 . 129.17896
1 1973 102.9 9.550107 . 1820.5994 42.1 . 131.53947
1 1974 108.2 9.537163 . 1831.26 43.1 . 134.66843
1 1975 111.7 9.540031 . 1841.9207 46.6 . 136.93158
1 1976 116.2 9.591908 . 1852.581 50.4 . 141.26053
1 1977 117.1 9.617496 . 1863.242 50.1 . 141.08948
1 1978 123 9.654072 . 1873.9023 55.1 . 140.47368
1 1979 121.4 9.64918 . 1884.563 56.8 . 138.08684
1 1980 123.2 9.612194 . 1895.2234 60.6 . 138.08948
1 1981 119.6 9.609594 . 1858.4222 68.8 . 137.98685
1 1982 119.1 9.59758 . 1821.621 73.1 . 136.29474
1 1983 116.3 9.626769 . 1784.8202 84.4 . 131.25
1 1984 113 9.671621 18 1748.019 90.8 . 124.90263
1 1985 114.5 9.703193 18.7 1711.218 99 . 123.1158
1 1986 116.3 9.74595 19.3 1674.4167 103 . 120.59473
1 1987 114 9.762092 19.4 1637.6157 110 . 117.58685
1 1988 112.1 9.78177 19.4 1600.8146 114.4 . 113.82368
1 1989 105.6 9.802527 19.4 1564.0134 122.3 . 109.66315
1 1990 108.6 9.81429 20.1 1527.2124 139.1 . 105.66579
1 1991 107.9 9.81926 20.1 . 144.4 . 104.3421
1 1992 109.1 9.845286 20.4 . 172.2 . 103.39474
1 1993 108.5 9.85216 20.3 . 176.2 . 102.69473
1 1994 107.1 9.879334 21 . 154.6 . 102.11842
1 1995 102.6 9.924404 20.6 . 155.1 . 103.1579
1 1996 101.4 9.940027 21 . 158.3 . 101.18421
1 1997 104.9 9.93727 20.8 . 167.4 . 101.78947
1 1998 106.2 . . . 180.5 . 100.9579
1 1999 100.7 . . . 195.6 . 97.59473
1 2000 96.2 . . . 270.7 . 92.13421
2 1970 100.3 . . 1690.0676 36.7 . 120.08421
2 1971 104.1 . . 1699.5386 38.8 . 123.86316
2 1972 103.9 9.464514 . 1709.0095 44.1 . 129.17896
2 1973 108 9.55683 . 1718.4805 45.1 . 131.53947
2 1974 109.7 9.542286 . 1727.9513 45.5 . 134.66843
2 1975 114.8 9.514094 . 1737.4224 48.6 . 136.93158
2 1976 119.1 9.558153 . 1746.8933 50.9 . 141.26053
2 1977 122.6 9.590923 . 1756.364 52.6 . 141.08948
2 1978 127.3 9.657238 . 1765.835 56.5 . 140.47368
2 1979 126.5 9.633533 . 1775.306 58.4 . 138.08684
2 1980 131.8 9.573803 . 1784.777 61.5 . 138.08948
2 1981 128.7 9.593041 . 1750.1112 64.7 . 137.98685
2 1982 127.4 9.5737 . 1715.4453 72.1 . 136.29474
2 1983 128 9.593053 . 1680.7794 82 . 131.25
2 1984 123.1 9.65044 17.9 1646.1138 93.6 . 124.90263
2 1985 125.8 9.675527 18.1 1611.448 98.5 . 123.1158
2 1986 126 9.705939 18.7 1576.782 103.6 . 120.59473
2 1987 122.3 9.705574 19 1542.1163 113 . 117.58685
2 1988 121.5 9.721532 18.9 1507.4504 119.9 . 113.82368
2 1989 118.3 9.73737 19 1472.7847 127.7 . 109.66315
2 1990 113.1 9.736311 19.9 1438.119 141.2 . 105.66579
2 1991 116.8 9.743068 19.9 . 146.5 . 104.3421
2 1992 126 9.788629 20 . 177.3 . 103.39474
2 1993 113.8 9.785142 19.7 . 179.9 . 102.69473
2 1994 108.8 9.813631 19.7 . 168.1 . 102.11842
2 1995 113 9.86446 19.5 . 167.3 . 103.1579
2 1996 110.7 9.885234 20.1 . 167.1 . 101.18421
2 1997 108.7 9.883107 19.8 . 181.3 . 101.78947
2 1998 109.5 . . . 187.3 . 100.9579
2 1999 104.8 . . . 206.9 . 97.59473
2 2000 99.4 . . . 279.3 . 92.13421
3 1970 123 . . 1781.5833 38.8 123 .
3 1971 121 . . 1792.9636 39.7 121 .
3 1972 123.5 9.930814 . 1804.344 39.9 123.5 .
3 1973 124.4 9.955092 . 1815.724 39.9 124.4 .
3 1974 126.7 9.947999 . 1827.1044 41.9 126.7 .
3 1975 127.1 9.937167 . 1838.4847 45 127.1 .
3 1976 128 9.976858 . 1849.865 48.3 128 .
3 1977 126.4 10.0027 . 1861.2454 49 126.4 .
3 1978 126.1 10.045565 . 1872.6255 58.7 126.1 .
3 1979 121.9 10.054688 . 1884.0057 60.1 121.9 .
3 1980 120.2 10.03784 . 1895.386 62.1 120.2 .
3 1981 118.6 10.028626 . 1855.3705 66.4 118.6 .
3 1982 115.4 10.01253 . 1815.355 72.8 115.4 .
3 1983 110.8 10.031737 . 1775.3394 84.9 110.8 .
3 1984 104.8 10.07536 25 1735.324 94.9 104.8 .
3 1985 102.8 10.099703 24 1695.3083 98 102.8 .
3 1986 99.7 10.127267 24.7 1655.2927 104.4 99.7 .
3 1987 97.5 10.1343 24.1 1615.277 103.9 97.5 .
3 1988 90.1 10.141663 23.6 1575.2615 117.4 90.1 .
3 1989 82.4 10.142313 23.7 1535.246 126.4 82.4 .
3 1990 77.8 10.141623 23.8 1495.2303 163.8 77.8 .
3 1991 68.7 10.110714 22.3 . 186.8 68.7 .
3 1992 67.5 10.11494 21.3 . 201.9 67.5 .
3 1993 63.4 10.098497 20.8 . 205.1 63.4 .
3 1994 58.6 10.099508 20.1 . 190.3 58.6 .
3 1995 56.4 10.155916 19.7 . 195.1 56.4 .
3 1996 54.5 10.178637 19.1 . 197.9 54.5 .
3 1997 53.8 10.17519 19.5 . 200.3 53.8 .
3 1998 52.3 . . . 207.8 52.3 .
3 1999 47.2 . . . 224.9 47.2 .
3 2000 41.6 . . . 351.2 41.6 .
4 1970 124.8 . . 1909.5022 29.4 . 120.08421
4 1971 125.5 . . 1916.476 31.1 . 123.86316
4 1972 134.3 9.805548 . 1923.4497 31.2 . 129.17896
4 1973 137.9 9.848413 . 1930.4232 32.7 . 131.53947
4 1974 132.8 9.840451 . 1937.397 38.1 . 134.66843
4 1975 131 9.828461 . 1944.3706 41.7 . 136.93158
4 1976 134.2 9.858913 . 1951.344 44.8 . 141.26053
end
Split string variable
Dear Experts,
I want to split a string variable. Please advise.
The issue is: I have the responses "a", "adc", "acfj", "cde", "adfghj". I want to split these responses into single letters, e.g. "a" "b" "c" "d" "e". Can it be done? Looking forward to your advice.
Thanking you
Yours faithfully
Cheda Jamtsho
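A minimal sketch, assuming the responses sit in a string variable named resp: create one single-letter variable per character position:
Code:
gen len = strlen(resp)
summarize len
forvalues i = 1/`r(max)' {
    gen str1 resp_`i' = substr(resp, `i', 1)
}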
Removing NA Across var
I have data in string as shown below
data1 data2
NA NA
NA NA
NA NA
NA 8415739
NA 10024002
N 12057882
N 10759322
N 11305650
N 10937087
N 11463371
N 11287917
N 12720750
N 14849447
N 15542380
N 17368642
N 20738561
I want to replace the NA observations with missing (.).
I tried this command:
replace data1=. if data1==NA
and Stata returns the error "NA not found".
Can anybody help me on this please
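Two hedged options: compare against the quoted text "NA" (the variables are strings, which is why the bare NA was not found), or convert everything non-numeric to missing in one step with destring:
Code:
replace data1 = "" if data1 == "NA"
* or convert both variables to numeric, turning "NA", "N", etc. into missing:
destring data1 data2, replace force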
Monday, December 30, 2019
Generate sum of variables with sequential variables names
Hi everyone,
I have a data with these sequential variable names.
| p1_1_1_1 | p1_1_1_2 | p1_1_1_3 | p1_1_1_4 | p1_1_2_1 | p1_1_2_2 | p1_1_2_3 | p1_1_2_4 | p1_1_3_1 | p1_1_3_2 | p1_1_3_3 | p1_1_3_4 |
| 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
| 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
| 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
I would like to sum the sequential variable names as follows:
generate p1_1_1 = p1_1_1_1 + p1_1_1_2 + p1_1_1_3 + p1_1_1_4
generate p1_1_2 = p1_1_2_1 + p1_1_2_2 + p1_1_2_3 + p1_1_2_4
...
Can somebody help me do the same using loops?
Thanks a lot...
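A hedged sketch, assuming the names follow the pattern p1_1_<j>_<k> with j = 1/3 and k = 1/4; note that rowtotal() treats missing values as zero, unlike the + expressions above, which propagate missing:
Code:
forvalues j = 1/3 {
    egen p1_1_`j' = rowtotal(p1_1_`j'_1 p1_1_`j'_2 p1_1_`j'_3 p1_1_`j'_4)
}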
Generate var using sequential variables names
Hi everyone,
p1_1 p1_2 p1_3 p2_1 p2_2 p2_3 p3_1 p3_2 p3_3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
1 1 1 2 2 2 3 3 3
I would like to generate a sum of the values with sequential variable names, using loops.
generate p1= p1_1 + p1_2 + p1_3
generate p2= p2_1 + p2_2 + p2_3
generate p3= p3_1 + p3_2 + p3_3
Thank you.
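The same idea as in the previous question, looping over the stubs; again, rowtotal() treats missing as zero:
Code:
foreach s in p1 p2 p3 {
    egen `s' = rowtotal(`s'_1 `s'_2 `s'_3)
}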
Identification of Treatment and Control Group
Respected members,
I am trying to employ DID as a means of analysis. In my dataset of 287 firms between 2001 and 2016, there was a policy reform in 2010 of including at least 10 percent of female directors. After reading some articles on DID, I have developed the following alternatives to identify the treatment and control groups.
Option 1
Treatment group: Firms that did not have 10% of female directors before 2010.
Control group: Firms that had female directors of 10% or above before 2010.
Option 2
Treatment group: Firms that did not have 10% of female directors before 2010 and had at least 10% of female directors from 2010 onwards.
Control group: Firms that did not have 10% of female directors before 2010 and did not have at least 10% of female directors even after 2010.
The firms that already had at least 10% female directors before 2010 are excluded from the analysis.
Could you please advise me in this regard as to which of the above options (1 or 2) is appropriate?
Thanks in anticipation.
Multilevel Panel Data with CPS Data
Good evening,
Using the below Census Population Survey variables, I need to figure out the change in percent Latino for each metarea from year to year, as well as the actual percent for each given year. I plan to use actual percent for that year and the change in percent as IVs for my model.
-year (2010-2019 in one year increments; sample per year)
-metarea (about 370 metropolitan areas that households are assigned to)
-household
-person in household
-Latino (binary variable at the person level)
I attached a preview of my dataset.
Thank you!
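A hedged sketch, assuming person-level variables year, metarea (numeric) and latino (0/1), and ignoring survey weights: compute the percent Latino per metarea-year, then the year-to-year change on a collapsed copy and merge it back:
Code:
bysort metarea year: egen pct_latino = mean(100 * latino)

preserve
collapse (mean) pct_latino, by(metarea year)
xtset metarea year
gen chg_pct_latino = pct_latino - L.pct_latino
tempfile m
save `m'
restore
merge m:1 metarea year using `m', keepusing(chg_pct_latino) nogenerate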
Problem with merging multiple csv files using merge 1:1
Hi,
I am a beginner in Stata (using Stata 16) and after going through many of the posts regarding merging multiple files from a folder, I tried to write the following code but I received an error. I will describe the data, folder structure, code and error messages below:
Data: I have quarterly bank data from FDIC where each csv file corresponds to one quarter and within each file, different banks are identified using a variable called 'cert'. For every file, there is also a column named 'repdte' which lists the quarter for the particular file (so for eg, I will have a file named All_Reports_20170930_U.S. Government Obligations.csv which will have many columns giving data regarding US Govt Obligations and there will also be two additional columns cert and repdte listing the bank ID and 20170930 respectively for the entire file).
Sample csv files may be downloaded from: https://www7.fdic.gov/sdi/download_l...st_outside.asp For my testing, I am using the 2018, 2017 files for quarters 1231 and 0930 for the files "Unused Commitments Securitization" and "U.S. Government Obligations".
What I want to do: I want to merge all the bank data across banks and quarters (panel data), and to do this I figured I should use the command: merge 1:1 cert repdte using filename
Code:
clear all
pwd
cd "C:\Users\HP\Dropbox\Data\Test2"
tempfile mbuild
clear
save `mbuild', emptyok
foreach year in 2018 2017{
foreach dm in 1231 0930 {
foreach name in "Unused Commitments Securitization" "U.S. Government Obligations"{
import delimited "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`y ear' `dm'_`name'", clear
gen source = "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`y ear' `dm'_`name'"
merge 1:1 cert repdte using `mbuild'
save `mbuild', replace
}
}
}
Error:
.
. foreach year in 2018 2017{
2. foreach dm in 1231 0930 {
3. foreach name in "Unused Commitments Securitization" "U.S. Governmen
> t Obligations"{
4. import delimited "C:\Users\HP\Dropbox\Data\Test2\All_Report
> s_`year'`dm'_`name'", clear
5. gen source = "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`y
> ear'`dm'_`name'"
6. merge 1:1 cert repdte using `mbuild'
7. save `mbuild', replace
8.
. }
9. }
10. }
(52 vars, 5,415 obs)
no variables defined
r(111);
Could someone please help me understand what i am doing wrong and how I can achieve what I am trying to do? Additionally, I also want to be able to retrieve the merged file to do further analysis on Stata and also export it to a folder on my computer - how should I do that?
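A hedged sketch of one way to restructure the loop: within a quarter, merge the different report files on cert and repdte; across quarters, append. The file-name pattern follows the post; the .csv extension and the final name merged_fdic are assumptions:
Code:
clear
tempfile allq
save `allq', emptyok

foreach year in 2018 2017 {
    foreach dm in 1231 0930 {
        local first = 1
        foreach name in "Unused Commitments Securitization" "U.S. Government Obligations" {
            import delimited "C:\Users\HP\Dropbox\Data\Test2\All_Reports_`year'`dm'_`name'.csv", clear
            if `first' {
                tempfile q
                save `q'
                local first = 0
            }
            else {
                merge 1:1 cert repdte using `q', nogenerate
                save `q', replace
            }
        }
        append using `allq'
        save `allq', replace
    }
}
use `allq', clear
save "merged_fdic.dta", replace
export delimited using "merged_fdic.csv", replace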
Assistance on Statistical analysis
Can anyone help me out? I am investigating the coping strategies used among women, using the Brief COPE scale with a 4-point Likert scale. I want to see whether there is any association between the coping strategies and socio-demographic characteristics and medical variables. Which test is appropriate, and what command should I use? I have attached a dummy table for clarity.
Time stamps in forum software
Is there any chance the forum software could be changed to specify time zone in the time stamps? Right now (for me at least) it displays Central Time, which always confuses me a bit, since I'm on the east coast (and I'm assuming is even more confusing for people in more different time zones). So right now, it's 15:34 where I am, but the time stamp says 14:34. Even better would be the option to change what time zone the time stamps are displayed in (if that option doesn't exist already - apologies if it does!).
Using anymatch in a forvalue loop to detect if each value in v1 matches ANY value in v2
I'm struggling to come up with a solution for finding whether each observation in variable 1 matches ANY of the specified observations in v2. I'm trying to narrow the data to focus on passengers that have arrived on time at least once in the data. That way I can look at those passengers' data, even for points when they weren't on time.
I'm trying to pass a numlist to anymatch of the names of the ID's of the passengers that have arrived on time at least one time but I'm getting an error.
"values() invalid -- invalid numlist"
This is my code:
g on_time= passengers if timely==1; // limiting to timely arrivals.
levelsof on_time;
g on_time_levels= r(levels); //unique numlist of passengers with timely arrivals (unsure of this)
g on_time_ever=.;
forvalues i =1/6939 {;
egen tempvariable = anymatch(passengers) if _n==`i',values(on_time_levels);
replace on_time_ever=tempvariable if _n==`i';
drop tempvariable;
};
I am unsure if the levels var I generated is really a numlist. How else can I get a numlist from this variable so I can pass it to anymatch? Or am I just going about this completely wrong?
Thanks!
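A hedged sketch that avoids levelsof and anymatch entirely, assuming variables named passengers and timely: flag every record belonging to a passenger who was on time at least once, then keep those passengers' full histories:
Code:
bysort passengers: egen on_time_ever = max(timely == 1)
keep if on_time_ever == 1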
standard error of 8280 in multinomial logit
Hi,
I am analysing my data using multinomial logit. Firstly, sorry that I cannot post my data and full results here.
Let's call the dependent variable "P3". I have several independent variables: "treatment", "P1", "age", "iq", "female", "mistakes", "major". The one I'm interested in is "treatment", and I think that "P1" has to be included in the regression as a control. "P3" and "P1" measure the same thing before and after the treatment, and they have 7 categories. The sample size is small, 157, with two missing values in female, so N=155.
I am running into a problem of getting very large standard errors for some coefficients, such as 8280 for one category of P1. Almost every such large standard error happens with one of the categories of P1.
I looked at the cross-table of P1 and P3, and found there are some empty cells. The partial table looks like this.
I am wondering if these empty cells cause the enormous standard errors. I know that the sample size is very small and that the number of independent variables is relatively large for the sample size; should I switch to -firthlogit-?
Thanks for any help!!
Code:
P1 | P3
| -2 -1 0 1 2 3 4 | Total
-----------+-----------------------------------------------------------------------------+----------
3 | 0 0 0 0 1 2 2 | 5
4 | 0 0 0 0 0 3 13 | 16
-----------+-----------------------------------------------------------------------------+----------
Estimating adjusted means and 95% CI using regression stata
Dear all,
I am now analyzing repeated-measures longitudinal data,
using a linear regression model with Y, x, covariates (age, sex, education, income), and i.obesity.
I would like to get the adjusted means (with 95% confidence intervals) of Y at the different levels of obesity.
Do I use the margins command?
What is the correct code to get the above results?
I am grateful for your help.
Jianbo
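A minimal sketch, assuming lower-case variable names y, x, age, sex, education, income, obesity and an id variable for the repeated measures (clustered standard errors as a simple allowance for that structure); margins after the regression gives the adjusted means with 95% CIs:
Code:
regress y c.x c.age i.sex i.education c.income i.obesity, vce(cluster id)
margins obesity       // adjusted mean of y, with 95% CI, at each level of obesity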
Creating a local list from a variable
I have the following string values for two variables. I would like to create a local list from Var1 and/or Var2.
clear
input str4 Var1 str3 Var2
"A f" "H O"
"B" "L"
"C" "Z"
"D" "N t"
"E g" "m o"
"F" "a p"
"G" "w"
"" "q"
"" "po"
end
When I use levelsof, this is what I get:
levelsof Var1, local(levels)
`"A f"' `"B"' `"C"' `"D"' `"E g"' `"F"' `"G"'
local List1 I desire is:
`" "A f" "B" "C" "D" "E g" "F" "G" "'
Similarly,
levelsof Var2, local(levels) is:
`"H O"' `"L"' `"N t"' `"Z"' `"a p"' `"m o"' `"po"' `"q"' `"w"'
local List2 I desire is:
`" "H O" "L" "N t" "Z" "a p" "m o" "po" "q" "w" "'
The goal is to eliminate manual entry to create List1 and List2. Instead, just grab them from Var1 or Var2 and create a local list.
Any help would be appreciated.
Thanks
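It may be enough to note that the local left behind by levelsof can usually be used directly as the desired list, for example to loop over the distinct values; a minimal sketch with the example data above:
Code:
levelsof Var1, local(List1)
foreach v of local List1 {
    display `"`v'"'
}
levelsof Var2, local(List2)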
Running sum of observations by group for last 3 years
Dear Statalists,
my dataset includes company IDs and the patents the companies invented per year. Each line is a patent invented in a certain year by a certain company, so there might be several lines per company/year. I am struggling with the running sum of the number of patents (= number of obs.) per company over the last 3 years.
In my example, for 1994 I would like to have a 2, as in that year two patents were invented and there are no previous years for that company. For 1995, I would like to have 8 (6 from 1995 and 2 from 1994). For 1996 it is 11, for 1997 it is 12 (1994 drops out), and so on...
Any ideas? Thanks in advance!
I am using Stata MP 15.0.
Code:
clear
input long permno float grant_year
10016 1994
10016 1994
10016 1995
10016 1995
10016 1995
10016 1995
10016 1995
10016 1995
10016 1996
10016 1996
10016 1996
10016 1997
10016 1997
10016 1997
10016 1998
10016 1998
10016 1998
10016 1998
10016 1998
10016 1998
10016 1998
10016 1998
end
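A hedged sketch: count patents per company-year on a contracted copy, build the 3-year rolling sum there (gaps are handled because missing lags count as zero), and merge the result back onto the patent-level rows:
Code:
preserve
contract permno grant_year, freq(n_pat)
xtset permno grant_year
gen pat_last3 = n_pat + cond(missing(L1.n_pat), 0, L1.n_pat) ///
              + cond(missing(L2.n_pat), 0, L2.n_pat)
tempfile counts
save `counts'
restore
merge m:1 permno grant_year using `counts', keepusing(pat_last3) nogenerate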
IV estimation with ordinal endogenous variable and ordinal instrumental variable
Hello, everyone. I am actually quite new to Statalist, and just beginning to learn Stata beyond what was taught in our syllabi. I need your help with IV estimation. I wish to estimate mortality risk with BMI as the main predictor (with survival analysis). To address the issue of reverse causation involving BMI and comorbid illness, I would like to use BMI at time t-1 as an instrumental variable for BMI, with both BMIs as ordinal variables (underweight, normal [baseline], overweight, obese 1, obese 2, obese 3). What should I use in Stata? Is it ivregress or ivpoisson? Also, can anyone help me with how to code this in Stata? So far, the Stata manual hasn't been really helpful (the instrument is treated as continuous), and I've searched as extensively as I can, but came up with nothing. Do I create a separate dummy variable for each BMI class (e.g., BMI at time t-1 underweight = 0 or 1, etc.)?
For completeness, my other exogenous variables are age, sex, and current smoking status, and my other instruments are diabetes at time t-1 (0 or 1), cardiovascular disease at time t-1 (0 or 1), and smoking status at time t-1 (smoker vs nonsmoker).
Thank you all so much for your time and understanding.
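For what it is worth, factor-variable notation lets ordinal regressors and instruments enter as sets of indicators without creating dummies by hand. A hedged sketch that sets aside the survival-analysis part of the question (all variable names here are hypothetical):
Code:
* bmi_cat and bmi_cat_lag are hypothetical 6-category BMI variables
ivregress 2sls outcome age i.sex i.smoker ///
    (i.bmi_cat = i.bmi_cat_lag i.diabetes_lag i.cvd_lag i.smoker_lag), ///
    first vce(robust)
Whether a linear IV model is appropriate for a survival outcome is a separate modelling question.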
ttest for time series
Hello all,
I have monthly data on the standard deviation of my betas (Sd_beta) and three dummy variables P1-P3, which indicate whether the volatility of Ted (Ted_Vol) is in the first, second, or third tercile.
I've regressed these dummies on the above-mentioned standard deviation and got the coefficients for P1 and P3. In the next step I have to determine whether the difference (P3-P1) between the standard deviations given P1=1 and P3=1 is statistically significant. I'm not quite sure how to approach this task. Is there a way to compute monthly differences and use them for a t-test?
Another thought of mine was to calculate the difference of Ted_Vol if P1=1 and P3=1, but I do not know how to match these numbers on my time variable.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float date int Jahr byte Monat double(Sd_Beta Ted_Vol) float(P1 P3)
469 1999 2 .2162872850894928 .00042292120633646846 1 0
470 1999 3 .21683630347251892 .0005120532005093992 1 0
471 1999 4 .21574001014232636 .0005089303012937307 1 0
472 1999 5 .21684658527374268 .001667482778429985 0 0
473 1999 6 .21839885413646698 .0011676736176013947 0 0
474 1999 7 .21961617469787598 .0005640562158077955 0 0
475 1999 8 .22290168702602386 .0003095806168857962 1 0
476 1999 9 .23819047212600708 .001692846417427063 0 1
477 1999 10 .2405623197555542 .001672371756285429 0 0
478 1999 11 .23849868774414063 .0029762284830212593 0 1
479 1999 12 .24061444401741028 .002350094262510538 0 1
480 2000 1 .23947422206401825 .0033114443067461252 0 1
481 2000 2 .23897820711135864 .0017416990594938397 0 1
482 2000 3 .23724332451820374 .0005756043246947229 0 0
483 2000 4 .23789244890213013 .0004283250018488616 1 0
484 2000 5 .23350174725055695 .001657421002164483 0 1
485 2000 6 .23567992448806763 .002668452449142933 0 1
486 2000 7 .23621943593025208 .0005725464434362948 1 0
487 2000 8 .23868688941001892 .0008709787507541478 0 0
488 2000 9 .24259942770004272 .0012692536693066359 0 0
489 2000 10 .24906545877456665 .00048570294165983796 1 0
end
format %tm date
Thank you in advance.
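One possible approach (a sketch; the lag length is arbitrary) is to regress Sd_Beta on the tercile dummies, with P2 as the omitted base, and test the P3-P1 contrast with lincom, optionally using Newey-West standard errors for the monthly series:
Code:
tsset date
regress Sd_Beta P1 P3              // the middle tercile (P2) is the base category
lincom _b[P3] - _b[P1]             // tests whether the P3-P1 difference is zero
* allowing for serial correlation:
newey Sd_Beta P1 P3, lag(3)
lincom _b[P3] - _b[P1]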
Delete records with missing values
How can I delete all observations that have missing data in any of the variables that I have?
I have more than 61,000 observations with 3,809 variables, and I need to keep only the observations that are complete.
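A minimal sketch that drops any observation with a missing value in any variable; missing() handles both numeric missings and empty strings, though looping over 3,809 variables may take a moment:
Code:
foreach v of varlist _all {
    quietly drop if missing(`v')
}
count   // the remaining observations are complete on every variable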
Firm fixed effects
Hi Statalist,
I have a question about firm-fixed effects.
My regression looks like:
Dependent var = independent var + controls
My dependent var is a continuous variable, and my independent var is a dummy variable. This dummy variable can, of course, be 1 or 0. It can go from 1 to 0 in consecutive years, but NOT from 0 to 1.
I made panel data by xtset CIK fyear, where CIK is the company identifier.
My research supervisor said that when I include firm fixed effects, for the B1 coefficient Stata only looks at those firms that go from 1 to 0 in consecutive years (because all other firm-years are 'constant').
Is this true, and can anyone elaborate on this so that I will be able to defend this story more strongly?
If you need more information please feel free to ask...
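As an illustration (a sketch; depvar, treat and controls are placeholders): with firm fixed effects the coefficient on the dummy is identified from within-firm changes only, so firms whose dummy never changes over the sample period are absorbed by their fixed effect and contribute nothing to that coefficient.
Code:
xtset CIK fyear
xtreg depvar i.treat controls, fe vce(cluster CIK)
* only firms with within-firm variation in treat move the coefficient on treat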
Standardized concentration indices with conindex
Hello everyone,
I am trying to modify the conindex user-written program so that it also calculates indirectly standardized concentration indices. However, I get error 102, "too few variables specified", and I am not sure how to fix it. The option that I have added is [, STvar(varname)]. Below you can find the code; the added code has been highlighted in red. I haven't tried to modify the compare option so that it also compares standardized coefficients, but that would be great too.
Code:
capture program drop conindex2 program define conindex2, rclass sortpreserve byable(recall) version 11.0 syntax varname [if] [in] [fweight aweight pweight] , [RANKvar(varname)] [, robust] [, CLUSter(varname)] [, truezero] [, LIMits(numlist min=1 max=2 missingokay)] [, generalized][, generalised] [, bounded] [, WAGstaff] [, ERReygers] [, v(string)] [,beta(string)] [, graph] [, loud] [, COMPare(varname)] [, KEEPrank(string)] [, ytitle(string)] [, xtitle(string)] [,compkeep(numlist)] [,extended] [,symmetric] [,bygroup(numlist)] [,svy] [, STvar(varname)] marksample touse tempname grouptest counter tempvar wght sumw cumw cumw_1 cumwr cumwr_1 frnk temp sigma2 meanlhs meanlhs_star cumlhs cumlhs1 lhs rhs1 rhs2 xmin xmax varlist_star weight1 meanweight1 tempx temp1x sumlhsx temps tempex lhsex rhs1ex rhs2ex sigma2ex exrank tempgx lhsgex lhsgexstar symrank smrankmean tempsym sigma2sym lhssym lhssymstar rhs1sym rhs2sym lhsgsym tempgxstar raw_rank_c wi_c cusum_c wj_c rank_c var_rank_c mean_c lhs_c split_c ranking extwght temp1 meanweight sumlhs sumwr counts meanoverall tempdis temp0 meanlhs2 rhs temp2 frnktest meanlhsex2 equality group lhscomp rhs1comp rhs2comp rhscomp intercept scale stvar local weighted [`weight'`exp'] if "`weight'" != "" local weighted [`weight'`exp'] if "`weight'" == "" qui gen byte `wght' = 1 else qui gen `wght'`exp' if "`svy'"!=""{ if "`weight'" != "" { di as error "When the svy option is used, weights should only be specified using svyset." exit 498 } if "`cluster'"!="" { di as error "Warning: cluster option is redundant when using the svy option. svyset should be used to identify the survey design characteristics" } if "`robust'"!="" { di as error "Warning: robust option is redundant when using the svy option. svyset should be used to identify the survey design characteristics" } qui svyset if r(settings) == ", clear"{ di as error "svyset must be used to identify the survey design characteristics prior to running conindex2 with the svy option." exit 498 } local wtype = r(wtype) local wvar = r(wvar) if "`wtype'" != "." { local weighted "[`wtype' = `wvar']" qui replace `wght'=`wvar' } else replace `wght'=1 local survey "svy:" } markout `touse' `rankvar' `wght' `clus' `compare' quietly { local xxmin: word 1 of `limits' local xxmax: word 2 of `limits' if _by()==1 { if "`compare'"!="" { di as error "The option compare cannot be used in conjunction with by." exit 498 } } if "`compkeep'"=="" local bygroup = _byindex() if "`generalised'"=="generalised" local generalized="generalized" if "`extended'"!="" | "`symmetric'"!="" { di as error "Please see the help file for the correct syntax for the extended and symmetric indices" exit 498 } if "`xxmin'"=="" { scalar xmin=. } else scalar xmin=`xxmin' if "`xxmax'"=="" { scalar xmax=. 
} else scalar xmax=`xxmax' if "`weight'"!="" { sum `varlist' [aweight`exp'] if `touse' } else sum `varlist' if `touse' return scalar N=r(N) scalar testmean=r(mean) count if `varlist' < 0 & `touse' if r(N) > 0 { noisily disp as txt _n "Note: `varlist' has `r(N)' values less than 0" } if "`rankvar'" == "`varlist'" | "`rankvar'" ==""{ local index = "Gini" } else local index = "CI" gen double `standvar'=`varlist' if "`stvar'" != "" { replace `standvar'=`stvar' local label : variable label `stvar' label variable `standvar' `"`label'"' } gen double `ranking'=`varlist' if "`rankvar'" != "" { replace `ranking'=`rankvar' local label : variable label `rankvar' label variable `ranking' `"`label'"' } gen double `varlist_star'=`varlist' local CompWT_options = " `varlist'" if "`if'"!="" { local compif0="`if' & `compare'==0" local compif1="`if' & `compare'==1" } else { local compif0=" if `compare'==0" local compif1=" if `compare'==1" } forvalues i=0(1)1 { if "`weight'"!=""{ local CompWT_options`i' = "`CompWT_options' [`weight'`exp'] `compif`i'' `in'," } else local CompWT_options`i' = "`CompWT_options' `compif`i'' `in'," } if "`rankvar'"!="" { local Comp_options = "`Comp_options' rankvar(`rankvar')" } if "`cluster'"!="" { local Comp_options = "`Comp_options' cluster(`cluster')" } if xmin!=. { local Comp_options = "`Comp_options' limits(`limits')" } if "`v'"!="" { local Comp_options = "`Comp_options' v(`v')" } if "`beta'"!="" { local Comp_options = "`Comp_options' beta(`beta')" } if "`loud'"!="" { local Comp_options = "`Comp_options' loud" } if "`'"!="" { local Comp_options = "`Comp_options' " } foreach opt in robust truezero generalized bounded wagstaff erreygers svy{ if "``opt''"!="" { local Comp_options = "`Comp_options' `opt'" } } local extended=0 local symmetric=0 local modified=0 local problem=0 if "`truezero'"=="truezero" { if testmean==0 { if `problem'==0 di as err="The mean of the variable (`varlist') is 0 - the standard concentration index is not defined in this case." local problem=1 } if xmin != . { if xmin>0 { if `problem'==0 di as err="The lower bound for a ratio scale variable cannot be greater than 0." local problem=1 } } } if "`generalized'"=="generalized" { local generalized=1 } else local generalized=0 if "`truezero'"!="truezero" { if `generalized'==1 { if `problem'==0 di as err="The option truezero must be used when specifying the generalized option." local problem=1 } else local generalized=0 } if "`bounded'"!="" { if xmax==. { if `problem'==0 di as err="For bounded variables, the limits option must be specified as limits(#1 #2) where #1 is the minimum and #2 is the maximum." local problem=1 } local bounded=1 if xmin > xmax |xmin == xmax | xmin ==.{ if `problem'==0 di as err="For bounded variables, the limits option must be specified as limits(#1 #2) where #1 is the minimum and #2 is the maximum." local problem=1 } sum `varlist' if xmin!=.{ if r(min)<xmin |r(max)>xmax{ if `problem'==0 di as err="The variable (`varlist') takes values outside of the specified limits." local problem=1 } if r(min)>=xmin & r(max)<=xmax{ replace `varlist_star'=(`varlist'-xmin)/(xmax-xmin) } } } else local bounded=0 if "`wagstaff'"=="wagstaff" local wagstaff=1 else local wagstaff=0 if "`erreygers'"=="erreygers" local erreygers=1 else local erreygers=0 if `bounded'==0 & (`erreygers'==1| `wagstaff'==1){ di as err="Wagstaff and Erreygers Normalisations are only for use with bounded variables." 
di as err="Hence the bounded and limits(#1 #2) options must be used to specify the theoretical minimum (#1) and maximum (#2)." local problem=1 } if (`erreygers'==1 & `wagstaff'==1){ di as err="The option wagstaff cannot be used in conjunction with the option erreygers." local problem=1 } if "`v'"!="" { capture confirm number `v' if _rc { di as err="For the option v(#), # must be a number greater than 1." local problem=1 } if `v'<=1 & _rc==0 { di as err="For the option v(#), # must not be less than 1." local problem=1 } local extended=1 } if "`beta'"!="" { capture confirm number `beta' if _rc { di as err="For the option beta(#), # must be a number greater than 1." local problem=1 } if `beta'<=1 & _rc==0 { di as err="For the option beta(#), # must not be less than 1." local problem=1 } local symmetric=1 } if `extended'==1 & `symmetric'==1{ di as err="The option v(#) cannot be used in conjunction with the option beta(#)." local problem=1 } if (`extended'==1 | `symmetric'==1) & (`erreygers'==1| `wagstaff'==1){ di as err="Wagstaff and Erreygers Normalisations are not supported for extended/symmetric indices." local problem=1 } if (`generalized'==1) & (`erreygers'==1| `wagstaff'==1){ di as err="Cannot specify generalized in conjunction with Wagstaff or Erreygers Normalisations." local problem=1 } if xmin != . { sum `varlist' if r(min)<xmin{ if `problem'==0 di as err="The variable (`varlist') takes values outside of the specified limits." exit 498 } if "`truezero'"=="truezero" { di as txt="Note: The option truezero has been specified in conjunction with the limits option." if `extended'==1 | `symmetric'==1{ di as txt=" The index will be calculated using the standardised variable (`varlist' - min)/(max - min)." } else di as txt=" The limits are redundant as the variable is assumed to be ratio scaled (or fixed)." } } if "`truezero'"!="truezero" & `extended'==0 & `symmetric'==0 & `erreygers'==0 & `wagstaff'==0 & `generalized'==0 & `bounded'==0{ local modified=1 if xmin == . | xmax != . { di as err="For the modified concentration index, the limits option must be specified as limits(#1) where #1 is the minimum." di as err="If you require an alternative index, please look at the help file by typing - help conindex2 - to find the correct syntax." local problem=1 } if xmin == . { di as err="For the modified concentration index (the default), a missing value (.) may not be used as the lower limit. " local problem=1 } sum `varlist' if r(min)==r(max){ di as err="The modified concentration index cannot be computed since the variable (`varlist') is always equal to its minimum value." local problem=1 } } if "`truezero'"!="truezero" { if `extended'==1 | `symmetric'==1{ di as err="The extended and symmetric indices should be used for ratio-scale variables and hence truezero must be specified also." local problem=1 } } if "`graph'"=="graph"{ if "`truezero'"!="truezero" & `bounded'!=0{ di as err="Graph option only available for ratio-scale variables - please also specify the truezero option if the variable is ratio-scale or the bounded option if the variable is bounded." local problem=1 } if "`wagstaff'"=="wagstaff" | "`erreygers'"=="erreygers"{ di as err="Graph option not supported for Wagstaff or Erreygers Normalisations." local problem=1 } if `extended'==1 | `symmetric'==1{ di as err="Graph option not supported for Extended or Symmetric Indices." 
local problem=1 } } if "`loud'"=="loud" local noisily="noisily" if `problem'==1 exit 498 if `generalized'==1 & `extended'==1 noisily disp as txt _n "Note: The extended index equals the Erreygers normalised CI when v=2" if `generalized'==1 & `symmetric'==1 noisily disp as txt _n "Note: The symmetric index equals the Erreygers normalised CI when beta=2" if "`robust'"=="robust" | "`cluster'"!=""{ local SEtype="Robust std. error" } else local SEtype="Std. error" if "`svy'"!="" & (`extended'==0 & `symmetric'==0) gen `scale'=1 else gen double `scale'=sqrt(`wght') gsort -`touse' `ranking' egen double `sumw'=sum(`wght') if `touse' gen double `cumw'=sum(`wght') if `touse' gen double `cumw_1'=`cumw'[_n-1] if `touse' replace `cumw_1'=0 if `cumw_1'==. bys `ranking': egen double `cumwr'=max(`cumw') if `touse' bys `ranking': egen double `cumwr_1'=min(`cumw_1') if `touse' gen double `frnk'=(`cumwr_1'+0.5*(`cumwr'-`cumwr_1'))/`sumw' if `touse' gen double `temp'=(`wght'/`sumw')*((`frnk'-0.5)^2) if `touse' egen double `sigma2'=sum(`temp') if `touse' replace `temp'=`wght'*`varlist_star' egen double `meanlhs'=sum(`temp') if `touse' replace `meanlhs'=`meanlhs'/`sumw' if `modified'==1 & `bounded'==0{ replace `meanlhs'=`meanlhs'-xmin } if "`graph'"=="graph" { capture which lorenz if _rc==111 disp "conindex2 requires the lorenz.ado by Ben Jahn to produce graphs. Please install this before using conindex2." if "`ytitle'" ==""{ local ytext : variable label `varlist' if "`ytext'" == "" local ytext "`varlist'" local ytitle = "Cumulative share of `ytext'" if `generalized'==1 { if "`ytext'" == "" local ytext "`varlist'" local ytitle = "Cumulative average of `ytext'" } } if "`xtitle'" ==""{ if "`rankvar'" == "" local xtext : variable label `varlist' if "`rankvar'" != "" local xtext : variable label `ranking' if "`xtext'" == "" local xtext "`rankvar'" if "`xtext'" == "" local xtext "`varlist'" local xtitle = "Rank of `xtext'" } if `generalized'== 0{ lorenz estimate `varlist_star', pvar(`ranking') lorenz graph, ytitle(`ytitle', size(medsmall)) yscale(titlegap(5)) xtitle(`xtitle', size(medsmall)) ytitle(`ytitle', size(medsmall)) graphregion(color(white)) bgcolor(white) } if `generalized'==1 { lorenz estimate `varlist_star', pvar(`ranking') generalized lorenz graph, ytitle(`ytitle', size(medsmall)) yscale(titlegap(5)) xtitle(`xtitle', size(medsmall)) ytitle(`ytitle', size(medsmall)) graphregion(color(white)) bgcolor(white) } } noisily di in smcl /// "{hline 19}{c TT}{hline 13}{c TT}{hline 13}{c TT}{hline 19}" _c noi di in smcl "{c TT}{hline 10}{c TRC}" noisily di in text "Index:" _col(20) "{c |} No. of obs." _col(34) /// "{c |} Index value" _col(48) "{c |} `SEtype'" _col(68) /// "{c |} p-value" _col(79) "{c |}" noisily di in smcl /// "{hline 19}{c +}{hline 13}{c +}{hline 13}{c +}{hline 19}" _c noi di in smcl "{c +}{hline 10}{c RT}" gen double `lhs'=2*`sigma2'*(`varlist_star'/`meanlhs')*`scale' if `touse' gen double `intercept'=`scale' if `touse' gen double `rhs'=`frnk'*`scale' if `touse' local type = "`index'" if `modified'==1 & `bounded'==0{ replace `meanlhs'=`meanlhs'+xmin } if `generalized'==0 & `erreygers'==0 & `wagstaff'==0{ `noisily' disp "`index'" local type = "`index'" } if `modified'==1 { `noisily' disp "Modified `index'" local type = "Modified `index'" replace `lhs'=`lhs'*(`meanlhs')/(`meanlhs'-xmin) if `touse' ==1 } if `wagstaff'==1{ `noisily' disp "Wagstaff Normalisation" local type = "Wagstaff norm. 
`index'" replace `lhs'= `lhs'/(1-`meanlhs') if `touse' } if `erreygers'==1{ `noisily' disp "Errygers Normalisation" local type = "Erreygers norm. `index'" replace `lhs'= `lhs'*(4*`meanlhs') if `touse' } if `generalized'==1 { `noisily' disp "Gen. standard `index'" local type = "Gen. `index'" replace `lhs'=`lhs'*`meanlhs' if `touse' } if `extended'==1 | `symmetric'==1{ gsort -`touse' `frnk' gen double `temp1'=`wght'*`varlist_star' if `touse' egen double `sumlhs'=sum(`temp1') if `touse' bys `ranking': egen double `sumwr'=sum(`wght') if `touse' bys `ranking': egen double `counts'=count(`temp1') if `touse' gen `meanoverall'=`sumlhs'/`sumw' if `touse' bys `ranking': egen double `temp0'=rank(`ranking') if `touse', unique bys `ranking': egen double `meanlhs2'=sum(`temp1') if `touse' replace `meanlhs2'=`meanlhs2'/`sumwr' if `touse' } if `extended'==1{ capture drop `lhs' capture drop `rhs' capture drop `temp2' gen double `rhs'=((`sumwr'/`sumw')+((1-(`cumwr'/`sumw'))^`v')-((1-(`cumwr_1'/`sumw'))^`v')) if `temp0'==1 egen double `temp2'=sum(`rhs'^2) if `temp0'==1 gen double `lhs'=(`meanlhs2'/`meanoverall')*`temp2' if `touse' & `temp0'==1 local type = "Extended `index'" if `generalized'==1{ local type = "Gen. extended `index'" replace `lhs'=(`meanlhs2'*(`v'^(`v'/(`v'-1)))/(`v'-1))*`temp2' if `touse' & `temp0'==1 } } if `symmetric'==1{ capture drop `lhs' capture drop `rhs' capture drop `temp2' gen double `rhs'=(2^(`beta'-2))*(abs((`cumwr'/`sumw'-0.5))^`beta'-(abs(`cumwr_1'/`sumw'-0.5))^`beta') if `temp0'==1 egen double `temp2'=sum(`rhs'^2) if `temp0'==1 gen double `lhs'=(`meanlhs2'/`meanoverall')*`temp2' if `touse' & `temp0'==1 local type = "Symmetric `index'" if `generalized'==1{ local type = "Gen. symmetric `index'" replace `lhs'=`meanlhs2'*4*`temp2' if `touse' & `temp0'==1 } } `noisily' regress `lhs' `rhs' `intercept' `standvar' if `touse'==1, `robust' cluster(`cluster') noconstant if "`survey'"=="" `noisily' regress `lhs' `rhs' `intercept' `standvar' if `touse'==1, `robust' cluster(`cluster') noconstant if "`survey'"=="svy:" `noisily' svy: regress `lhs' `rhs' `intercept' `standvar' if `touse'==1, noconstant return scalar RSS=e(rss) mat b=e(b) mat V=e(V) return scalar CI= b[1,1] return scalar CIse= sqrt(V[1,1]) if `extended'==1 | `symmetric'==1{ `noisily' regress `lhs' `rhs' `standvar' if `temp0'==1, robust return scalar RSS=e(rss) mat b=e(b) mat V=e(V) return scalar CI= b[1,1] return scalar CIse = . } return scalar Nunique= e(N) local nclus= e(N_clust) local t=return(CI)/return(CIse) local p=2*ttail(e(df_r),abs(`t')) noisily di in text "`type'" _col(20) "{c |} " as result return(N) /// _col(34) "{c |} " as result return(CI) _col(48) "{c | }" /// as result return(CIse) _col(68) "{c |} " as result %7.4f /// `p' _col(79)"{c |}" noisily di in smcl /// "{hline 19}{c BT}{hline 13}{c BT}{hline 13}{c BT}{hline 19}" _c noi di in smcl "{c BT}{hline 10}{c BRC}" if `nclus'!=. noisily di in text "(Note: Std. 
error adjusted for `nclus' clusters in `cluster')" if return(Nunique)!=return(N) noisily di in text "(Note: Only " return(Nunique) " unique values for `rankvar')" if `extended'==1 | `symmetric'==1{ noisily di in text "(Note: Standard errors for the extended and symmetric indices are not calculated by the current version of conindex2.)" } if "`keeprank'"!="" { tempname savedrank gen double `savedrank'=`frnk' if _by()==0 { confirm new variable `keeprank'`compkeep' gen double `keeprank'`compkeep'=`savedrank' } if _by()==1 { gen double `keeprank'_`bygroup'=`savedrank' } } if "`compkeep'"!="" { confirm new variable templhs gen double templhs=`lhs' confirm new variable temprhs gen double temprhs=`rhs' } if "`compare'"!=""{ egen `group' = group(`compare') qui sum `group' if `touse' , meanonly scalar gmax=r(max) noisily di in text "" noisily di in text "" noisily di in text "For groups:" noisily di in text "" noisily di in text "" gen double `lhscomp'=. gen double `rhscomp'=. foreach i of num 1/`=scalar(gmax)' { if "`if'"!="" { local compif`i'="`if' & `group'==`i'" } else { local compif`i'=" if `group'==`i'" } if "`weight'"!=""{ local CompWT_options`i' = "`CompWT_options' [`weight'`exp'] `compif`i'' `in'," } else local CompWT_options`i' = "`CompWT_options' `compif`i'' `in'," qui sum `compare' if `touse' & `group'==`i', meanonly noisily di in text "CI for group `i': `compare' = "r(mean) noisily conindex2 `CompWT_options`i'' `Comp_options' keeprank(`keeprank') compkeep(`i') noisily di in text "" replace `lhscomp'=templhs if `touse' & `group'==`i' replace `rhscomp'=temprhs if `touse' & `group'==`i' drop templhs temprhs } `noisily' regress `lhscomp' c.`rhscomp' i.`group' if `touse', `robust' cluster(`cluster') return scalar N_restricted=e(N) return scalar SSE_restricted=e(rss) `noisily' regress `lhscomp' c.`rhscomp'##i.`group' if `touse', `robust' cluster(`cluster') noisily di in text "" return scalar SSE_unrestricted=e(rss) return scalar N_unrestricted=e(N) return scalar F=[(return(SSE_restricted)-return(SSE_unrestricted))/(gmax-1)]/(return(SSE_unrestricted)/(return(N_restricted)-2*gmax)) local p=1 - F(gmax-1,(return(N_restricted)- 2*gmax), return(F)) /* OO'D made two changes to second df 28.5.14 */ noisily di in text "Test for stat. significant differences with Ho: diff=0 (assuming equal variances)" _col(50) " noi di in smcl "{hline 19}{c TT}{hline 19}{c TRC}" noisily di in text "F-stat = " as result return(F) _col(20) "{c |} p-value= " as result %7.4f `p' _col(40) "{c |}" noi di in smcl "{hline 19}{c BT}{hline 19}{c BRC}" if gmax==2{ disp "Group: `compare'=0" conindex2 `CompWT_options1' `Comp_options' return scalar CI0=r(CI) return scalar CIse0=r(CIse) disp "Group: `compare'=1" conindex2 `CompWT_options2' `Comp_options' return scalar CI1=r(CI) return scalar CIse1=r(CIse) return scalar Diff= return(CI1)-return(CI0) return scalar Diffse= sqrt((return(CIse0))^2 + (return(CIse1))^2) return scalar z=return(Diff)/return(Diffse) local p=2*(1-normal(abs(return(z)))) noisily di in text "Test for stat. significant differences with Ho: diff=0 " _col(50) "(large sample assumed)" noi di in smcl /// "{hline 19}{c TT}{hline 23}{c TT}{hline 17}{c TT}{hline 18}{c TRC}" noisily di in text "Diff. = " as result return(Diff) _col(20) /// "{c |} Std. err. = " as result return(Diffse) _col(44) /// "{c |} z-stat = " as result %7.2f return(z) _col(59) "{c |} p-value = " as result %7.4f `p' _col(79)"{c |}" noi di in smcl /// "{hline 19}{c BT}{hline 23}{c BT}{hline 17}{c BT}{hline 18}{c BRC}" } } } end
Any help would be much appreciated.
Thanos
Multi-Logistic Regression on DHS data
Hi All,
I am using DHS India data to undertake analysis on a topic related to child health. I have completed my analysis, but before I publish or present my findings I need to verify that my analysis is robust.
My request: has anyone here conducted a multi-logistic regression using DHS data and is willing to share their do-file? I will be very grateful to you.
Thank You.
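Not a full do-file, but the survey-design setup is the part that is common to most DHS analyses. A hedged sketch using the standard DHS variable names (v005 = sample weight, v021 = PSU, v022 = strata); the outcome and covariates are placeholders:
Code:
gen wt = v005/1000000                        // DHS weights are stored multiplied by 1,000,000
svyset v021 [pweight = wt], strata(v022) singleunit(centered)
svy: logit child_outcome i.mother_edu i.wealth_quintile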
Sunday, December 29, 2019
CMP model for multinomial probit with varying choice set
Hello,
I am looking to build a multinomial probit with a varying choice set for each individual via CMP. The idea is to later add other choice dimensions for building a joint model and hence I am looking for a workaround using CMP.
I understand that the "asmprobit" function does a good job of this by just not adding the rows corresponding to the alternatives that are not present.
Panel Data
Hi
How do I estimate a panel data model with both random individual effects and random time effects in Stata?
(I want the final estimate of the model.)
{xtreg Variables, fe: Prob > F = 0.0000}
{hausman fe re: Prob > chi2 = 0.4301}
{Breusch–Pagan, Honda, King–Wu, SLM, GHM (in EViews): prob cross-section = 0.0000, prob period = 0.0000, both = 0.0000}
How do I do the final model estimation?
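Two common ways to allow for both individual and time effects in Stata (a sketch; y, x1, x2, id and year are placeholders) are random individual effects with time dummies, or crossed random effects via mixed:
Code:
xtset id year
xtreg y x1 x2 i.year, re                // random individual effects plus time dummies
mixed y x1 x2 || _all: R.year || id:    // crossed random effects for year and id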
Resolving "Initial values not feasible" error after using melogit command and the choice between melogit and meqrlogit
Dear Statalists,
I'm working on a multilevel model using cross-country survey data for the year 2016, but I am encountering a problem with the Stata command melogit and I hope you can help me to overcome it. This is the first time I have worked with multilevel models.
You can see an extract of my data structure below:
countryID is the country's identification number,
id is the respondent's ID number (which is very long),
health and pensions are the binary outcomes,
AGE1 (grand-mean-centered) and SEX1 are individual-level predictors,
Primary, Secondary and Tertiary are country-level variables that represent the proportion of immigrants with primary, secondary and tertiary education in each country.
I select only one country here, 56, which is the ISO 3166 country code for Belgium.
clear
input float(countryID id health pensions AGE1) float SEX1 double(Primary Secondary Tertiary)
56 2.016056e+15 1 1 -7.302176 0 43.7 31.4 24.9
56 2.016056e+15 1 0 -7.302176 1 43.7 31.4 24.9
56 2.016056e+15 0 0 -8.3021755 0 43.7 31.4 24.9
56 2.016056e+15 1 1 -9.3021755 0 43.7 31.4 24.9
56 2.016056e+15 1 1 -14.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 0 -8.3021755 1 43.7 31.4 24.9
56 2.016056e+15 1 1 -15.302176 0 43.7 31.4 24.9
56 2.016056e+15 0 0 -1.3021756 1 43.7 31.4 24.9
56 2.016056e+15 0 0 -10.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 1 32.697823 0 43.7 31.4 24.9
56 2.016056e+15 1 0 -11.302176 1 43.7 31.4 24.9
56 2.016056e+15 0 0 -12.302176 1 43.7 31.4 24.9
56 2.016056e+15 0 0 -11.302176 0 43.7 31.4 24.9
56 2.016056e+15 1 0 -12.302176 0 43.7 31.4 24.9
56 2.016056e+15 0 0 26.697824 1 43.7 31.4 24.9
56 2.016056e+15 1 1 -5.302176 0 43.7 31.4 24.9
56 2.016056e+15 0 1 -13.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 0 -14.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 1 -25.302176 1 43.7 31.4 24.9
56 2.016056e+15 1 1 -13.302176 0 43.7 31.4 24.9
When I run the melogit command, I obtain this result:
melogit health SEX1 AGE1 Primary Secondary Tertiary || id:
Fitting fixed-effects model:
Iteration 0: log likelihood = -13691.836
Iteration 1: log likelihood = -13670.184
Iteration 2: log likelihood = -13670.165
Iteration 3: log likelihood = -13670.165
Refining starting values:
Grid node 0: log likelihood = -13173.957
Fitting full model:
initial values not feasible
r(1400);
But with the meqrlogit command, I obtain the following result:
meqrlogit health SEX1 AGE1 Primary Secondary Tertiary || id:
Refining starting values:
Iteration 0: log likelihood = -12500.998 (not concave)
Iteration 1: log likelihood = -12483.544
Iteration 2: log likelihood = -12451.561
Performing gradient-based optimization:
Iteration 0: log likelihood = -12451.561 (not concave)
Iteration 1: log likelihood = -12389.075 (not concave)
Iteration 2: log likelihood = -12364.939
Iteration 3: log likelihood = -12315.838 (not concave)
Iteration 4: log likelihood = -12309.971 (not concave)
Iteration 5: log likelihood = -12304.889 (not concave)
Iteration 6: log likelihood = -12304.14
Iteration 7: log likelihood = -12298.841 (not concave)
Iteration 8: log likelihood = -12298.209
Iteration 9: log likelihood = -12287.509 (not concave)
Iteration 10: log likelihood = -12287.477
Iteration 11: log likelihood = -12286.192 (not concave)
Iteration 12: log likelihood = -12285.969
Iteration 13: log likelihood = -12285.85 (not concave)
Iteration 14: log likelihood = -12285.831
Iteration 15: log likelihood = -12285.769
Iteration 16: log likelihood = -12285.768
Iteration 17: log likelihood = -12285.767
Mixed-effects logistic regression Number of obs = 25769
Group variable: id Number of groups = 13
Obs per group: min = 1002
avg = 1982.2
max = 3995
Integration points = 7 Wald chi2(5) = 567.30
Log likelihood = -12285.767 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
health | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
SEX1 | .103667 .0331526 3.13 0.002 .0386892 .1686449
AGE1 | -.0042459 .0009453 -4.49 0.000 -.0060987 -.0023932
Primary | -7.556424 1.069612 -7.06 0.000 -9.652826 -5.460023
Secondary | -7.528384 1.07176 -7.02 0.000 -9.628994 -5.427774
Tertiary | -7.262108 1.087128 -6.68 0.000 -9.392841 -5.131376
_cons | 748.3098 107.4985 6.96 0.000 537.6166 959.003
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | 4.869241 2.235363 1.980128 11.97373
------------------------------------------------------------------------------
LR test vs. logistic regression: chibar2(01) = 2768.79 Prob>=chibar2 = 0.0000
Questions:
What is the problem with melogit, and with what command can I fix it?
What do you think about the meqrlogit estimation result? Is it better than the melogit one?
If yes, why?
Many thanks
Cisse abs
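While waiting for advice on the model itself, two workarounds that are often suggested for the "initial values not feasible" error (a sketch; whether id or countryID is the appropriate grouping level is a separate question) are to switch to the Laplace approximation, or to feed the Laplace estimates back in as starting values:
Code:
* fit with the Laplace approximation first
melogit health SEX1 AGE1 Primary Secondary Tertiary || id:, intmethod(laplace)
matrix b0 = e(b)
* then refit with adaptive quadrature, starting from those estimates
melogit health SEX1 AGE1 Primary Secondary Tertiary || id:, from(b0) intpoints(7)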
PPML multicollinearity
Hi,
I have estimated a model with OLS and PPML.
With OLS I obtain R2 = 0.78
estat vif = 1.2
With PPML I obtain R2 = 0.95
Does this mean that I have a multicollinearity problem?
Thank you!
Difference of a variable based on corresponding variables in other columns
Hi, Based on a subset of data pasted below:
For every state (by year) I would like to take the difference between the growth of that state and its neighboring states (for which the neighbors data exist) denoted by columns: neigh1 neigh2 neigh3 neigh4. E.g. for year 1989 for state WY, I would like to take the difference between the growth of WY and its neighbors MT, SD, NE, CO.
I would appreciate help in this regard. Thanks.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double year str2 state double growth str2(neigh1 neigh2 neigh3) str3 neigh4 1989 "WY" 9.05 "MT" "SD" "NE" "CO" 1989 "IN" 4.62 "MI" "OH" "KY" "IL" 1989 "CO" 3.01 "WY" "NE" "KS" "OK" 1989 "MN" 2.57 "WI" "IA" "SD" "ND" 1989 "ID" 5.31 "MT" "WY" "UT" "NV" 1989 "NH" 3.41 "ME" "MA" "VT" "" 1989 "DE" 2.87 "PA" "NJ" "MD" "" 1989 "KS" 1.91 "NE" "MO" "OK" "CO" 1989 "MI" 3.81 "OH" "IN" "WI" "" 1989 "NV" 3.55 "ID" "UT" "AZ" "CA" 1989 "RI" 6.21 "MA" "CT" "" "" 1989 "WV" 9.03 "PA" "MD" "VA" "KY" 1989 "ND" -7.22 "MN" "SD" "MT" "" 1989 "HI" 5.65 "" "" "" "" 1989 "NC" 4.54 "VA" "SC" "GA" "TN" 1989 "IL" 5.65 "WI" "IN" "KY" "MO" 1989 "AZ" 1.87 "UT" "CO" "NM" "CA" 1989 "IA" 5.15 "MN" "WI" "IL" "MO" 1989 "OH" 3.45 "PA" "WV" "KY" "IN" 1989 "TX" 5.7 "OK" "AR" "LA" "NM" 1989 "VA" 3.38 "MD" "NC" "TN" "KY" 1989 "UT" 4.6 "ID" "WY" "CO" "NM" 1989 "MD" 4.79 "PA" "DE" "VA" "WV" 1989 "AL" 4.67 "TN" "GA" "FL" "MS" 1989 "NM" 1.17 "CO" "OK" "TX" "AZ" 1989 "AR" 3.82 "MO" "TN" "MS" "LA " 1989 "KY" 6.79 "OH" "WV" "VA" "TN" 1989 "FL" 3.54 "GA" "AL" "" "" 1989 "LA" 5.04 "AR" "MS" "TX" "" 1989 "TN" 3.83 "KY" "VA" "NC" "GA" 1989 "CA" 3.11 "OR" "NV" "AZ" "" 1989 "MO" 3.89 "IA" "IL" "KY" "TN" 1989 "PA" 4.55 "NY" "NJ" "DE" "MD" 1989 "WI" 5.14 "MI" "IL" "IA" "MN" 1989 "VT" 6.79 "NH" "MA" "NY" "" 1989 "ME" 5.82 "NH" "" "" "" 1989 "GA" 2.68 "NC" "SC" "FL" "AL" 1989 "OK" 6.23 "KS" "MO" "AR" "TX" 1989 "OR" 4.72 "WA" "ID" "NV" "CA" 1989 "NE" 5.35 "SD" "IA" "MO" "KS" 1989 "SC" 4.19 "NC" "GA" "" "" 1989 "WA" 3.58 "ID" "OR" "" "" 1989 "MA" 5.14 "NH" "RI" "CT" "NY" 1989 "NJ" 6.79 "NY" "CT" "DE" "PA" 1989 "MS" 2.69 "TN" "AL" "LA" "AR" 1989 "AK" -6.16 "" "" "" "" 1989 "CT" 5.29 "MA" "RI" "NY" "" 1989 "SD" .5 "ND" "MN" "IA" "NE" 1989 "NY" 5.34 "VT" "MA" "CT" "NJ" 1989 "MT" -.27 "ND" "SD" "WY" "ID" 1990 "MS" 1.18 "TN" "AL" "LA" "AR" 1990 "IA" 3.54 "MN" "WI" "IL" "MO" 1990 "AL" -.28 "TN" "GA" "FL" "MS" 1990 "WY" 1.23 "MT" "SD" "NE" "CO" 1990 "IL" 1.76 "WI" "IN" "KY" "MO" 1990 "CT" .96 "MA" "RI" "NY" "" 1990 "NV" 1.74 "ID" "UT" "AZ" "CA" 1990 "FL" .92 "GA" "AL" "" "" 1990 "WA" 2.74 "ID" "OR" "" "" 1990 "ME" .62 "NH" "" "" "" 1990 "MN" 2.01 "WI" "IA" "SD" "ND" 1990 "CO" .77 "WY" "NE" "KS" "OK" 1990 "ID" 5.46 "MT" "WY" "UT" "NV" 1990 "UT" .23 "ID" "WY" "CO" "NM" 1990 "AR" 1.82 "MO" "TN" "MS" "LA " 1990 "IN" 3.02 "MI" "OH" "KY" "IL" 1990 "NJ" 1.23 "NY" "CT" "DE" "PA" 1990 "LA" .82 "AR" "MS" "TX" "" 1990 "WI" 1.31 "MI" "IL" "IA" "MN" 1990 "PA" 1.47 "NY" "NJ" "DE" "MD" 1990 "ND" 7.37 "MN" "SD" "MT" "" 1990 "AZ" -1.54 "UT" "CO" "NM" "CA" 1990 "MO" 1.8 "IA" "IL" "KY" "TN" 1990 "MA" -.08 "NH" "RI" "CT" "NY" 1990 "AK" 3.12 "" "" "" "" 1990 "GA" .52 "NC" "SC" "FL" "AL" 1990 "WV" 1.15 "PA" "MD" "VA" "KY" 1990 "OR" 1.03 "WA" "ID" "NV" "CA" 1990 "NC" 1.96 "VA" "SC" "GA" "TN" 1990 "NM" 1.24 "CO" "OK" "TX" "AZ" 1990 "NE" 2.77 "SD" "IA" "MO" "KS" 1990 "MT" 4.16 "ND" "SD" "WY" "ID" 1990 "NY" -.41 "VT" "MA" "CT" "NJ" 1990 "MD" .65 "PA" "DE" "VA" "WV" 1990 "VT" 3.07 "NH" "MA" "NY" "" 1990 "SC" 1.95 "NC" "GA" "" "" 1990 "OK" .8 "KS" "MO" "AR" "TX" 1990 "DE" 5.52 "PA" "NJ" "MD" "" 1990 "KS" .31 "NE" "MO" "OK" "CO" 1990 "OH" 1.84 "PA" "WV" "KY" "IN" 1990 "MI" 1.2 "OH" "IN" "WI" "" 1990 "RI" 2.25 "MA" "CT" "" "" 1990 "TX" 1.7 "OK" "AR" "LA" "NM" 1990 "NH" -2.13 "ME" "MA" "VT" "" 1990 "KY" 2.28 "OH" "WV" "VA" "TN" 1990 "TN" .33 "KY" "VA" "NC" "GA" 1990 "HI" 4.51 "" "" "" "" 1990 "SD" 1.87 "ND" "MN" "IA" "NE" 1990 "VA" 2.06 "MD" "NC" "TN" "KY" 1990 "CA" 1.1 "OR" "NV" "AZ" "" end
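One possible approach (a sketch; growth_n*, mean_nbr and diff_* are made-up names) is to save a copy of the data keyed on year and state, merge each neighbour's growth back in, and then take differences:
Code:
preserve
keep year state growth
rename (state growth) (nstate ngrowth)
tempfile nbr
save `nbr'
restore

forvalues i = 1/4 {
    gen nstate = strtrim(neigh`i')           // guards against stray blanks such as "LA "
    merge m:1 year nstate using `nbr', keep(master match) nogenerate
    rename ngrowth growth_n`i'
    gen diff_n`i' = growth - growth_n`i'     // difference versus that neighbour
    drop nstate
}
egen mean_nbr = rowmean(growth_n1 growth_n2 growth_n3 growth_n4)
gen diff_mean = growth - mean_nbr            // difference versus the neighbour average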
georoute command and HERE API not working?
hello all,
I'm having some trouble with the HERE API and the georoute package. I registered for a HERE account and generated both a JavaScript and a REST ID and code, and neither will work with georoute! I keep getting the Stata message "There seems to be a problem with your HERE account". Am I missing something obvious? Thanks so much, I'm very confused.
Mediation analysis with STATA and control variables?
I've seen a lot of ways to do a mediation analysis, with Baron and Kenny's (1986) steps being the most popular. However, I see that they run the regressions with just the three variables of interest (reg DV IV; reg Mediator IV; reg DV IV Mediator). My first question is: is it necessary to also include the control variables in the analysis? And, if so, how can it be done in Stata? I've read that SEM is a good way, but it is normally done without control variables.
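For what it is worth, the usual practice is to include the same control variables in every equation of the Baron-Kenny steps; a hedged sketch (c1-c3 stand for hypothetical controls), including an SEM version that reports the indirect effect directly:
Code:
regress Mediator IV c1 c2 c3
regress DV IV c1 c2 c3
regress DV IV Mediator c1 c2 c3
* or as a single system:
sem (Mediator <- IV c1 c2 c3) (DV <- IV Mediator c1 c2 c3)
estat teffects        // direct, indirect and total effects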
'End Duplicates' error in mata programing
Hello, everyone. I'm Zhang.
I would like to ask you a question about an 'end duplicates' error in my Mata programming.
My program computes some matrices. Given Stata's computational limits, I need to use the Mata language. I wrote a do-file for the Mata-Stata interface, and I also want to use an ado-program that makes all kinds of do-files run. The problem is that, in my program, the ado code and the Mata code each have their own 'end', so Stata reports an 'end duplicates' error.
So, I would like to ask you two questions.
First, is it wrong to write the code like this; does Stata not allow it?
Second, if my idea is reasonable, how should I code the interface between the Mata programming and the ado programming?
My code is:
Code:
/*Define program*/
program define MYPROGRAM
version 14.0
/*Define syntax*/
syntax using/, [name(string) *
[ ... ] // some other options
use "`using'", clear // import data
confirm name `name'
/*Create and Compute Matrix use Mata*/
mata:
...
create a matrix named MATRIX
...
/*End Mata*/
end
/*End Program*/
end // you can see that the two 'end's produce the 'end duplicates' and 'end unrecognized' errors
Thank you very much for your answer!
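One structure that avoids the duplicate end is to keep the Mata code outside the ado-program and call a Mata function from inside it, so that each block is closed exactly once. A hedged sketch (the function name build_matrix and its contents are placeholders):
Code:
program define MYPROGRAM
    version 14.0
    syntax using/, [name(string)]
    use "`using'", clear
    mata: build_matrix("`name'")      // a one-line call into Mata
end                                   // closes the ado-program

mata:
void build_matrix(string scalar name)
{
    real matrix M
    M = I(3)                          // ... the real computation goes here ...
    st_matrix(name, M)                // return the result to Stata as a matrix
}
end                                   // closes the Mata block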
Testing of adjusted Kaplan-Meier survival
How do I test for differences in an adjusted Kaplan-Meier survival analysis?
I am able to create an adjusted KM graph, but I don't know how to test the adjusted KM curves.
Thanks
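Two common ways to test group differences after adjustment (a sketch; it assumes the data are already stset, and group, agegrp and sex are placeholders) are a stratified log-rank test or a covariate-adjusted Cox model:
Code:
sts test group, strata(agegrp sex)    // log-rank test stratified on the adjustment variables
stcox i.group agegrp i.sex            // adjusted comparison via Cox regression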
Marginsplots for different values of x
Hello,
first post so apologies if something similar has been asked before.
I'm investigating the relationship between intergroup ethnic contact at the workplace and tolerance with educational years as interaction.
My X (intergroup contact) has, after recoding, 3 levels of contact (no contact (baseline), some contact, and a lot of contact). I'd like to produce two marginsplots displaying the effect of contact with educational years interacting: one displaying the effect of some contact at different levels of education, and another displaying the effect of a lot of contact at different levels of education.
My regression looks like this:
xtreg tolerance i.RCimgclg##c.eduyrs i.gndr agea i.empl i.domicil hincfel lrscale, fe robust
So far my marginsplot looks like the picture attached:
https://imgur.com/a/2nDpJbc
Thank you in advance
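One way to get effect-of-contact curves over education (a sketch; the education grid 8(2)20 is arbitrary) is to ask margins for the dydx of the contact factor at several education values and then plot; each non-base contact level appears as its own contrast, and the levels can also be plotted one at a time:
Code:
xtreg tolerance i.RCimgclg##c.eduyrs i.gndr agea i.empl i.domicil hincfel lrscale, fe robust
margins, dydx(RCimgclg) at(eduyrs = (8(2)20))
marginsplot, recast(line) recastci(rarea)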
Generating data according to a pattern
Hello list,
I am trying to generate data according to the following pattern:
Obs A B C
----------------
1 0 0 0
2 0 0 1
3 0 1 0
4 0 1 1
5 1 0 0
6 1 0 1
7 1 1 0
8 1 1 1
Any help would be appreciated.
Thanks
André
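A minimal sketch that generates the 2^3 = 8 combinations directly from the observation number:
Code:
clear
set obs 8
generate A = floor((_n - 1)/4)
generate B = mod(floor((_n - 1)/2), 2)
generate C = mod(_n - 1, 2)
list, noobs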
Comparing two probit models with clustered standard errors
Hey everyone,
I would like to compare two probit models (that are nested) to see whether the addition of further variables improves the model. As I need clustered standard errors, it is not possible to use a likelihood-ratio test. What other possibilities do I have? Is it appropriate to compare the AIC/BIC, or should I do another test?
Looking forward to some advice. Thanks! :-)
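With clustered standard errors the usual alternative to a likelihood-ratio test is a Wald test of the added variables; a sketch (x3 and x4 stand for the additions, firmid for the cluster variable):
Code:
probit y x1 x2 x3 x4, vce(cluster firmid)
test x3 x4            // joint Wald test that the added variables are zero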
Panel Data Analysis: Growth rates or levels?
Hi Statalist-community!
I am currently writing a seminar paper. I am estimating the effect of the share of left-wing government members in Swiss cantons on the public expenses (and their categories) of these cantons.
The data is balanced panel data with N (cantons) = 26 and T(years) = 28.
The dependent variable is: public expenses in category j per canton in year t
The main independent variable is: share of left-wing government members in government of canton i in year t.
(Category means for example: health, social security, culture, education, etc.)
As control variables I have:
- GDP per canton
- unemployment rate per canton
- ratio of people aged > 64 per canton
- debt per canton
- population per canton
- and a lagged variable of the proportion of left-wing politicians in the parliament
Here is a sample of my main data for one canton:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int jahr long kanton double(links total BIP) long bev double(alt schulden Alquote) 1990 1 .2 2103595.49679 22258.61295548331 498035 .126 4118853.07339 .0021 1991 1 .2 2418372.7013 23247.730242745427 506818 .126 4258530.37578 .0055000000000000005 1992 1 .2 2622013.84704 23731.27671978167 511979 .126 3995904.96175 .01648507796021844 1993 1 .2 2752637.76148 24251.730934888055 518945 .126 4020907.67093 .03353853343994975 1994 1 .2 2866913.15304 24850.55070690186 523114 .127 4312582.93007 .032740127331169364 1995 1 .2 2871594.22682 25154.26712852215 528887 .12800000000000003 4261014.9935 .02906240833847551 1996 1 0 3033372.5041 25331.135728586345 531665 .129 4566428.24205 .03839377973484641 1997 1 0 3065249.40363 25817.39558783384 534028 .13 4734656.4063 .046613346283090946 1998 1 0 3092004.07619 26558.17208651548 536462 .132 4797195.89518 .030217723885365436 1999 1 0 3184180.88853 27039.352615536056 540639 .13195681406631782 4907276.22711 .021088996722396877 2000 1 0 3288728.78046 28525.767132413897 544306 .13364541269065564 4901166.57689 .013503121668950816 2001 1 0 3448772.61443 29194.49554783396 550298 .13510134508938793 4762340.0067 .01214187822228023 2002 1 0 3559261.63535 29167.80471578718 555782 .1365679349097308 4489301.77899 .02123115577889447 2003 1 0 3771771 29508.175608019385 560674 .13796074010922568 4366114.90529 .033257738911005245 2004 1 0 3868402.48256 30431.545233146422 565122 .14008302631998046 5305308.83576 .034339989993256326 2005 1 0 3987495.90309 31596.1406982345 569344 .14222684352517986 4762380.78077 .03251647849637799 2006 1 0 4304953.29254 33544.994154576845 574813 .14474272502535607 4510972.05203 .02857251626095847 2007 1 0 4505096.31065 35767.62753463508 581562 .147655795942651 4886900.96703 .023552013313319846 2008 1 .25 3995789.07397 37774.52858 591632 .15066122184060363 4832603.85848 .022927679523156913 2009 1 .4 4107370.48387 36943.54089 600040 .1535130991267249 4584035.66422 .033850257782418576 2010 1 .4 4146058.66746 37664.96637 608299 .15526129007990633 4696771.97718 .03128852310932995 2011 1 .4 4336101.42091 38505.31274 618298 .15872281650595668 4311897.59296 .025658837672748246 2012 1 .4 4577403.78175 38719.76854 627340 .16137022985940638 3900244.74448 .02685290486325758 2013 1 .4 4703960.67493 39488.54672 636362 .1640371360954928 3744099.61901 .028494090775842893 2014 1 .4 4770144.18455 40139.27705 645277 .1663657623005314 4051689.93526 .027857224266495634 2015 1 .4 4892710.96136 40647.58003 653675 .16873522775079358 4254435.63835 .029879852698914817 2016 1 .2 4973509.49627 40813.49 663462 .1705282291977536 4838286.30867 .031555575596547404 2017 1 .2 4944214.42435 41592.47817 670988 .17351129975498816 4990070.88585 .030368726678589565 end label values kanton kanton1 label def kanton1 1 "AG", modify label var jahr "jahr" label var kanton "kanton" label var links "Anteil links in Regierung" label var total "Total Ausgaben" label var BIP "BIP" label var bev "Total Wohnbevölkerung" label var alt "Anteil Bevölkerung >64" label var schulden "Bruttoschulden" label var Alquote "Arbeitslosenquote in Dezimalzahlen"
My question to you is now: should I use growth rates of the variables (the dependent variables as well as the control variables) or their levels? I decided to estimate an LSDVC model (xtlsdvc in Stata). When I use levels, I see some effects, but when I use growth rates, virtually all the variables become insignificant.
I use the following code in STATA for the analysis in levels:
Code:
xtlsdvc kat03 links lagd_mlp bev alt schulden Alquote BIP, initial(ab) vcov(50) first
And the following code for the analysis in growth rates:
Code:
xtlsdvc gln_kat03 links lagd_mlp gln_alt gln_schulden gln_Alquote gln_BIP, initial(ab) vcov(50) first
The reason why I'm unsure is that almost all the scientific papers examining the same hypothesis use growth rates, but I don't really see why. The data are stationary, and the Hausman test result suggested using fixed effects.
Also: Do you think I am estimating the right model?
Thank you so much, your answer would help me an awful lot!!
Regards,
Lara Knuchel
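For reference, growth rates of the kind used in that literature are usually constructed as within-canton log differences; a sketch using some of the variables from the example data (which variables to transform, and how to treat variables that are already shares, is a judgement call):
Code:
xtset kanton jahr
foreach v of varlist total BIP bev schulden {
    gen gln_`v' = ln(`v') - ln(L.`v')     // approximate annual growth rate
}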