Hi,
I want to randomly assign treatment, run the same regression in a loop 50 times, and then see how many times my variable of interest (POST) is significant. Ideally I would also like to store the estimate together with its p-value (or a star representation). I tried the following. Where am I going wrong? Any help is appreciated.
drop post
drop random
foreach i of numlist 1/50 {
    generate random=uniform()
    sort random
    generate post=0
    replace post=1 in 1/50
    tsset id year
    set matsize 800
    set more off
    reg agriculture post i.country_id i.year i.country#c.line_time_trend [aw=ypop], cluster(country_id)
    est sto b_1
    esttab b_1, keep(post) se star(* 0.10 ** 0.05 *** 0.001)
    est clear //estimates clear
    matrix p = J(`post',1,.)
    matrix rownames p = `y'
    matrix colnames p = "p"
    matlist p
    // for the computation of the p-values
    local i = 1
    foreach var of local y {
        reg `var' post
        matrix p[`i++',1] = 2*ttail(e(df_r), abs(_b[post]/_se[post]))
    }
    //
    matlist p
}
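For reference, a minimal sketch of one way to set this up, assuming the model, weights, and clustering from the regression above and that 50 observations are treated in each replication; the random assignment and the regression both happen inside the loop, and the p-value for post from each replication is collected in a single matrix:
Code:
* A minimal sketch; model and variable names taken from the post.
set seed 12345
matrix P = J(50, 1, .)
forvalues i = 1/50 {
    capture drop random post
    generate double random = runiform()
    sort random
    generate byte post = 0
    replace post = 1 in 1/50
    quietly reg agriculture post i.country_id i.year ///
        i.country#c.line_time_trend [aw=ypop], cluster(country_id)
    matrix P[`i', 1] = 2*ttail(e(df_r), abs(_b[post]/_se[post]))
}
matrix colnames P = "p"
matlist P
* count the replications in which post is significant at the 10% level
local sig = 0
forvalues i = 1/50 {
    if P[`i', 1] < 0.10 local ++sig
}
display "post significant in `sig' of 50 replications"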
Monday, September 30, 2019
Interpreting dummy variable with no constant in the model
Hi Statalist, I have a question about how to interpret a dummy variable when there is no constant in the model. For example, consider this cross-sectional regression:
stock return_i = α1 capital expenditure_i + α2 employee_i + α3 Financial industry_i + α4 Consumer production industry_i + α5 Agro industry_i + ε_i
where capital expenditure and employee are ordinary variables, and Financial industry, Consumer production industry, and Agro industry are dummy variables: FIN = 1 means the stock is in the financial industry and 0 otherwise, Consumer production = 1 means the stock is in consumer production and 0 otherwise, and Agro = 1 means the stock is in the agro industry and 0 otherwise. (Suppose there are only three industries in the market: financial, consumer production, and agricultural.)
In this case, I understand that there is no base dummy variable, so α3 can be interpreted as the coefficient of the financial industry (it does not need to be summed with a coefficient on the constant). But what about capital expenditure: can I interpret α1 as the sensitivity of the "overall" stock return to capital expenditure, rather than the sensitivity of the base-category stock return to capital expenditure?
Best regards,
Siraphop Swingthong
Replacing Contents of All Variables
I have a dataset with students' grades in approximately 30 subjects. These are letter grades such as A*, A, B, ... and I want to convert them to their numerical equivalents, e.g. A* = 95, A = 90, B = 80, and so on. Can someone guide me on how to do that with a single command or loop, instead of writing the following command for every variable and every grade?
Secondly, do I also need to convert the variables from string to numeric?
Code:
replace Accounting ="96" if Accounting == "A*"
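A minimal sketch of a loop that handles every subject at once, assuming the grade variables are all strings and that you can list them in a varlist (the variable names and the grade-to-score mapping below are placeholders to adapt); because the loop builds numeric variables, it also answers the second question: yes, the result should be numeric rather than string.
Code:
* A minimal sketch; adjust the varlist and the grade-to-score mapping.
foreach v of varlist Accounting Biology Chemistry {
    generate `v'_num = .
    replace `v'_num = 95 if `v' == "A*"
    replace `v'_num = 90 if `v' == "A"
    replace `v'_num = 80 if `v' == "B"
    replace `v'_num = 70 if `v' == "C"
    * ...add the remaining grades here...
    drop `v'
    rename `v'_num `v'
}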
ppml: Number of regressors excluded
Hi,
I am using the PPML estimator, but I am not really sure what this message means:
"Number of regressors excluded to ensure that the estimates exist: 59"
I appreciate any suggestion.
Smoothing Kaplan Meier Curves
Hello Stata community,
I am a new Stata user and I am trying to compute smoothed Kaplan-Meier curves. Is this possible?
After creating my survival dataset with all the appropriate variables, I run:
Code:
sts graph, by(kidneycan)
This produces Kaplan-Meier curves, but unfortunately the data center I work out of will not let me release them unless they are smoothed, because every event (i.e. each drop in the graph) needs to be experienced by a certain number of people for confidentiality reasons. Smoothed survival curves still show the pattern by group but remove the specificity of when each drop occurs. Can someone provide syntax or instructions for smoothing these curves?
I know how to compute smooth hazard functions, but I'm really only looking to produce survival curves.
Thank you for your help.
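One possible route, sketched under the assumption that the data are already stset and that the grouping variable is kidneycan as in the command above: store the Kaplan-Meier survivor function with sts generate, smooth it within each group with lowess, and plot the smoothed curves. Whether a lowess smooth satisfies the data center's disclosure rules is a separate question.
Code:
* A minimal sketch, assuming an stset dataset and a 0/1 grouping variable kidneycan.
sts generate km = s, by(kidneycan)
lowess km _t if kidneycan == 0, generate(km_s0) nograph
lowess km _t if kidneycan == 1, generate(km_s1) nograph
twoway (line km_s0 _t if kidneycan == 0, sort) ///
       (line km_s1 _t if kidneycan == 1, sort), ///
       ytitle("Smoothed survival probability") xtitle("Analysis time") ///
       legend(order(1 "kidneycan = 0" 2 "kidneycan = 1"))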
Adding time variables
Hi,
I am currently working with time-use data. Below is the time each person spends alone on each activity (25 activities, only 7 shown); there are missing values where the person is not alone for a given activity. I would like to see how much time each person spends alone in a day, i.e. to add h1 + h2 + h3 + h4 + h5 + h6 + ... for each of the 140 individuals. Can someone please help me with this? Thank you.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(h1 h2 h3 h4 h5 h6 h7)
. . 1800000 . 1800000 . .
. 3600000 3600000 3600000 . 7200000 3600000
. 1.08e+07 1.26e+07 1800000 1.08e+07 3600000 1800000
1.08e+07 1800000 1800000 1800000 1800000 3600000 .
. 1800000 5400000 1800000 1800000 3600000 3600000
1.08e+07 1800000 1800000 1800000 7200000 . 3600000
. . . . . 3600000 7200000
7200000 . . 7200000 . 3600000 .
end
format %tchh:MM:SS h1
format %tchh:MM:SS h2
format %tchh:MM:SS h3
format %tchh:MM:SS h4
format %tchh:MM:SS h5
format %tchh:MM:SS h6
format %tchh:MM:SS h7
Cheers,
Aarushi
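A minimal sketch using egen's row functions, assuming the activity variables are h1-h25 and hold millisecond durations as in the %tc formats above; rowtotal() treats missing as zero, so an extra step keeps the total missing for people with no time alone recorded at all:
Code:
* A minimal sketch, assuming h1-h25 hold millisecond durations.
egen double total_alone = rowtotal(h1-h25)
egen byte n_reported = rownonmiss(h1-h25)
replace total_alone = . if n_reported == 0   // no activity done alone at all
format %tchh:MM:SS total_alone
* or express the daily total in hours
generate total_alone_hours = total_alone/3600000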
Syntax of cross-classified multilevel model
Dear Statalist,
I need to analyze data with complex structures.
My data set consists of individuals who are fully nested within teams, which are nested "crossly" within functions and locations, which are fully nested within firms.
(individuals) ---> (teams) ---> (functions) and (locations) ---> (firms)
I guess I need to do four-level cross-classified multi-level analysis and I have the following hypothetical syntax:
Code:
xtmixed y x || firm: || _all:R.function: || _all:R.location: || team:
I hope someone can confirm whether it is right, or correct it if it is wrong. Thank you for helping me out.
Best,
Eunhee
Out of Sample/Scope Prediction with CI
I need some help getting Examples 2 and 3 to work
(page 7 in regress postestimation — Postestimation tools for regress) https://www.stata.com/manuals/rregre...estimation.pdf.
I need to add a few independent-variable observations after the regression and ANOVA analysis and then obtain the predicted individual value and its prediction interval.
I used the following program to insert an observation into the dataset and prompt the user for an out of scope prediction:
Code:
capture program drop OutOfSamplePredict
program OutOfSamplePredict
*ask user for the out of sample dependent var
display "This will create an out-of-sample prediciton..."
display ""
display "Please enter your Dependant variable, for the Y Axis:" _request(dependant)
display "Thank you, to verify:"
display "The Dependant variable selected, for the Y Axis, is:$dependant"
display "Please enter the INDependant variable used for your regression - the X Axis:" _request(INDependant_var)
display ""
display "Thank you, to verify:"
display "The InDependant variable you entered is:$INDependant_var"
display ""
display "Please enter the value you wish to predict, (Dependent variable out-of-sample value):" _request(INDependant_value)
display ""
display "Thank you, to verify:"
display "The value you want a prediciton for is: $INDependant_value"
*run the regression Quietly
quietly regress $dependant $INDependant_var
* create a new observation
quietly insobs 1
*insert the new Out of Sample observation that the user entered into the INDependant variable
replace $INDependant_var = $INDependant_value in l
*create a local macro name to store the prediciton value to.
local mypred = _b[_cons] + _b[$INDependant_var]*$INDependant_value
display "your linear prediction equation : " _b[_cons] " + " _b[$INDependant_var] " * " $INDependant_value
display ""
display "prediction value is = `mypred'"
display ""
display "writing your values to the dataset..."
predict predict, xb
predict residuals, residuals
predict CooksDistance, cooksd
predict StandardResiduals, rstandard
predict Leverage, leverage
* insert the prediction value based on the regression equation
replace predict = `mypred' in l
* generate leverage for the predicted value
/*
predict temp_leverage in l, leverage /* because predict can only make new variables create a temp variable */
replace Leverage = temp_leverage in l /* replace the only created value into the replacement variable */
drop temp_leverage /* drop the temporary variable */
*/
* generate Cooks Distance for the predicted value
/*
predict temp_cooks in l, cooksd /* because predict can only make new variables create a temp variable */
replace CooksDistance = temp_cooks in l /* replace the only created value into the replacement variable */
drop temp_cooks /* drop the temporary variable */
*/
* generate Standard Error for the predicted value
predict SE_Forecast in l, stdf
predict SE_Predict in l, stdp
predict SE_Residual in l, stdr
* generate the Confidence interval for the out of scope prediction
local 2SD = SE_Forecast[_N] * 2
local UCL = predict[_N] + `2SD'
local LCL = predict[_N] - `2SD'
display "The Upper bound for your Confidence Interval is: `UCL'"
display ""
display "The LOWER bound for your Confidence Interval is: `LCL'"
end
I followed the examples, but I got a different answer than our instructor, who used R. This is their code and answer:
> new.amount <- data.frame(pectin=c(1.5))
> predict(model,newdata=new.amount,interval="predict ion")
fit lwr upr
1 62.5725 53.19595 71.94905
Dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float firmness byte pectin
46.9  0
50.2  0
51.3  0
56.48 1
59.34 1
62.97 1
67.91 2
70.78 2
73.67 2
68.13 3
70.85 3
72.34 3
end
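For reference, a minimal sketch of getting the prediction interval directly from predict with the stdf option, using the firmness/pectin data above; R's predict(..., interval="prediction") uses the t critical value, whereas multiplying the forecast standard error by 2 (as in the program above) gives slightly different bounds:
Code:
* A minimal sketch using the firmness/pectin data above.
regress firmness pectin
insobs 1                          // add an empty observation for the new value
replace pectin = 1.5 in l
predict fit in l, xb              // point prediction
predict sef in l, stdf            // forecast (prediction) standard error
display "fit = " fit[_N]
display "95% prediction interval: " ///
    fit[_N] - invttail(e(df_r), 0.025)*sef[_N] " to " ///
    fit[_N] + invttail(e(df_r), 0.025)*sef[_N]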
IV
Dear all,
I have a question concerning a comment from a reviewer, and I am not quite sure how to go about it. To keep it simple, I have a paper on an RCT (an intervention facilitating saving among farmers), and the reviewer says that I should do an IV version of it, using the (binary) treatment assignment as an instrument. This would mean taking the basic treatment-effect estimate and dividing it by 0.2 (i.e. multiplying it by 5), which would give the treatment effect among "compliers".
I am not even sure how to think about this or how to code it! I would be happy to have some thoughts on the issue.
Thanks!
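A minimal sketch of the usual way this is coded, with hypothetical variable names: y is the outcome, takeup is actual participation in the savings intervention, and assigned is the randomized treatment assignment. Instrumenting take-up with assignment gives the Wald/LATE estimator, which is the intention-to-treat estimate scaled up by the compliance rate (the division by 0.2 the reviewer describes):
Code:
* A minimal sketch; y, takeup, and assigned are placeholder variable names.
ivregress 2sls y (takeup = assigned), vce(robust) first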
Data management problem
Hi
I am trying to analyze some data and have entered the commands reproduced below. However, when I get to the final loop I get the error "==1 invalid name". I have run down quite a number of answers to similar problems, but I am not managing to find the cause. Help would be much appreciated; I have also attached the Excel file so that the commands can be reproduced.
// Importing data
import excel "Version3_20_06.xlsx", sheet("Form responses 1") firstrow
// Cleaning data
// Store-level variables
// Filling in store level variables
replace Retail_size="Small" if Retail_size=="small"
replace Retail_size=Retail_size[_n-1] if Retail_size==""
egen store_number=group(StoresTimestamp)
replace store_number=store_number[_n-1] if store_number==.
replace Shop_location=Shop_location[_n-1] if Shop_location==""
// Assigning numeric values to categorical variables
encode Retail_size, gen(shop_size)
encode Shop_location, gen(shop_location)
// Reordering variables
order store_number shop_size shop_location
// Dropping un-needed variables
drop StoresTimestamp Retail_size Shop_location
// Brand-level variables
drop Category_beverage
rename Cocacola_available cocacola_available
rename Cocacola_price cocacola_price
rename Coke_light_available coke_light_available
rename Coke_light_price coke_light_price
rename sprite_zeroavailable sprite_zero_available
//rename sprite_Size sprite_size
//rename BeverageSize stoney_size
rename cocacola_Size cocacola_size
rename Coo_ee_Price Coo_ee_price
rename CloverTropikajuicePrice clover_tropika_juice_price
rename cococola_promotion cocacola_promotion
//rename Stoney_zero_available stoney_zero_available
rename Stoney_zero_price stoney_zero_price
rename Stoney_zero_promotion stoney_zero_promotion
// Encode availability
foreach var in cocacola coke_light coke_zero sprite_zero sprite sparletta stoney stoney_zero tab_caffeine_free Coo_ee appletiser Oros halls energade red_bull_caffeine red_bull_sugarfree vitaminwater still_water clover60_fruit_juice liqui_fruitJ_juice clover_tropika_juice pure_milk flavoured_water {
encode `var'_available, gen(`var'_available_)
drop `var'_available
rename `var'_available_ `var'_available
}
// Generating size measure in litres
foreach var in cocacola coke_light coke_zero sprite_zero sprite sparletta stoney stoney_zero tab_caffeine_free Coo_ee appletiser Oros halls energade red_bull_caffeine red_bull_sugarfree vitaminwater still_water clover60_fruit_juice liqui_fruitJ_juice clover_tropika_juice pure_milk flavoured_water {
gen `var'_size_litres=.
replace `var'_size_litres=0.250 if `var'_size=="> 250ml"
replace `var'_size_litres=0.300 if `var'_size=="> 300ml"
replace `var'_size_litres=0.330 if `var'_size=="> 330ml"
replace `var'_size_litres=0.355 if `var'_size=="> 355ml"
replace `var'_size_litres=0.440 if `var'_size=="> 440ml"
replace `var'_size_litres=0.473 if `var'_size=="> 473ml"
replace `var'_size_litres=0.500 if `var'_size=="> 500ml"
replace `var'_size_litres=0.750 if `var'_size=="> 750ml"
replace `var'_size_litres=1 if `var'_size=="1L"
replace `var'_size_litres=1.25 if `var'_size=="1> .25L"
replace `var'_size_litres=1.5 if `var'_size=="1.> 5L"
replace `var'_size_litres=2 if `var'_size=="2L"
replace `var'_size_litres=2.25 if `var'_size=="2 > .25L"
drop `var'_size
}
// Generating price per litre
destring coke_zero_price, replace force float
destring sprite_zero_price, replace force float
destring red_bull_sugarfree_price, replace force float
foreach var in cocacola coke_light coke_zero sprite_zero sprite sparletta stoney stoney_zero tab_caffeine_free Coo_ee appletiser Oros halls energade red_bull_caffeine red_bull_sugarfree vitaminwater still_water clover60_fruit_juice liqui_fruitJ_juice clover_tropika_juice pure_milk flavoured_water {
gen `var'_price_litre=`var'_price/`var'_size_litres
}
// Re-shaping data for analysis
order store_number-flavoured_water_price_litre, alpha
order store_number shop_size shop_location
local i=1
foreach var in cocacola coke_light coke_zero sprite_zero sprite sparletta stoney stoney_zero tab_caffeine_free Coo_ee appletiser Oros halls energade red_bull_caffeine red_bull_sugarfree vitaminwater still_water clover60_fruit_juice liqui_fruit_juice clover_tropika_juice pure_milk flavoured_water {
preserve
if `i'==1 {
keep store_number shop_size shop_location `var'*
drop if `var'_size_litres==.
rename `var'_available available
rename `var'_price price
rename `var'_price_litre price_litre
rename `var'_size_litres size_litres
rename `var'_promotion promotion
gen brand="`var'"
keep store_number shop_size shop_location brand size_litres available price price_litre promotion
order store_number shop_size shop_location brand size_litres available price price_litre promotion
save "Data.dta", replace
}
if `i'>1 {
keep store_number shop_size shop_location `var'*
drop if `var'_size_litres==.
rename `var'_available available
rename `var'_price price
rename `var'_price_litre price_litre
rename `var'_size_litres size_litres
rename `var'_promotion promotion
gen brand="`var'"
keep store_number shop_size shop_location brand size_litres available price price_litre promotion
order store_number shop_size shop_location brand size_litres available price price_litre promotion
save "Temp/`var'.dta", replace
use "Data.dta", clear
append using "Temp/`var'.dta", force
save "Data.dta", replace
}
local i=`i'+1
restore
}
I can't seem to find a solution to the "==1 invalid name" error; I will greatly appreciate any possible help.
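One possible cause, judging from the code: local macros are cleared between executions, so if the line local i=1 and the foreach loop are not run in the same execution (for example, only the loop is selected and run in the Do-file Editor), `i' expands to nothing, the line if `i'==1 { becomes if ==1 {, and Stata reports exactly "==1 invalid name". A minimal sketch of the pattern that avoids this:
Code:
* A minimal sketch: initialize the counter and run the loop in one execution.
local i = 1
foreach var in cocacola coke_light {    // shortened list, for illustration only
    if `i' == 1 {
        display "first brand: `var'"
    }
    else {
        display "later brand: `var'"
    }
    local ++i
}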
ci for categorical variables - tabci?
I'm just wondering why Stata gives CIs for binary variables (ci proportions varname) but, as far as I can tell, we cannot get results for categorical variables.
Am I missing something, is it technically impossible or we just don't have a user-written program to estimate these proportions with CIs in Stata?
If it can be done, and considering that (maybe I'm wrong) we don't have it so far, maybe this thread could encourage somebody to create, say, an SSC (or SJ) -tabci- ado-file.
Code:
. sysuse auto
(1978 Automobile Data)
. ci proportions foreign
-- Binomial Exact --
Variable | Obs Proportion Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
foreign | 74 .2972973 .0531331 .196584 .4148353
. ci proportions rep78
no binary (0/1) variables found; nothing to compute
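One built-in route worth sketching: the proportion command estimates the share of each level of a categorical variable with confidence intervals, which may already cover much of what a tabci would do.
Code:
* A minimal sketch: -proportion- gives each category's share with a CI.
sysuse auto, clear
proportion rep78
* and over() splits the estimates by another categorical variable
proportion rep78, over(foreign)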
Command stir does not accept iweights
Hello
I've been using the stir command (incidence-rate comparison). The manual says that stir allows iweights and fweights (more specifically, it says you can specify stset with iweights and fweights), but when I use iweights I get an error message saying that iweights are not allowed. Is this a known issue? Is there a particular reason for the command to prevent weighting the data with iweights, or is it a bug? I know that some commands do not allow all types of weights, but stip, a similar command that makes similar calculations but does not report confidence intervals, does not forbid the use of iweights.
Thanks for your help
Christophe
Replace if not found within a loop
Dear all,
I am trying to bottom- and top-code household income for each country-wave.
I am using the following loop, which I already used successfully with another data source.
I want to replace incomes below 1% of the equivalised household income with that 1% value, and incomes greater than 10 times the median of the non-equivalised income with the value corresponding to 10 times the median.
levelsof country, local(c)
levelsof wave, local(w)
foreach country in `c' {
foreach wave in `w' {
sum hi100_equ [aw=new_weight] if country==`country' & wave==`wave', det
local bottom_hi=r(mean)/100
sum hi100_eur [aw=new_weight] if country==`country' & wave==`wave', det
local top_dhi=10*r(p50)
disp `bottom_hi'
disp `top_hi'
replace adjusted_hi=`top_hi' if hi100_eur>`$top_hi' & country==`country' & wave==`wave'
replace adjusted_hi=`bottom_hi' if hi100_eur<`$bottom_hi' & country==`country' & wave==`wave'
}
}
However, Stata returns the following error: "if not found".
I cannot understand what the error is; can anyone help me?
Thank you
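One possible cause, judging from the code: the loop defines the local top_dhi but later refers to top_hi (and also mixes in a stray $), so `top_hi' expands to nothing, the line becomes "replace adjusted_hi= if ...", and Stata reads "if" as a variable name, hence "if not found". A minimal sketch with consistent macro names, keeping the variable names and logic of the post:
Code:
* A minimal sketch; variable names and logic as in the post, macro names made consistent.
levelsof country, local(c)
levelsof wave, local(w)
foreach country of local c {
    foreach wave of local w {
        summarize hi100_equ [aw=new_weight] if country==`country' & wave==`wave', detail
        local bottom_hi = r(mean)/100
        summarize hi100_eur [aw=new_weight] if country==`country' & wave==`wave', detail
        local top_hi = 10*r(p50)
        replace adjusted_hi = `top_hi'    if hi100_eur > `top_hi'    & country==`country' & wave==`wave'
        replace adjusted_hi = `bottom_hi' if hi100_eur < `bottom_hi' & country==`country' & wave==`wave'
    }
}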
endogenous switching regression model
Hello,
Please, I am new to Stata, so it's quite difficult for me to solve some regression problems.
I am trying to estimate an endogenous switching regression model with the movestay command, where my selection dependent variable is ADOPT (a dummy) and my outcome dependent variable is HDDS (continuous).
Unfortunately I keep getting the errors below. I tried adding the variables progressively, but I get one or the other of the two messages. The strange part is that the same command was run on a different computer, using the same version of Stata, and it produced results without any errors.
I am getting desperate. Can someone help me solve this problem? Please.
global str GENDER AGE YEARSOFEDUC INCOMEO DISNEARESTMARKETKM TENURE CREDITACCESS
. movestay(HDDS=$str),select(ADOPT=$str MEDIA)
Fitting initial values .....
initial: log likelihood = -<inf> (could not be evaluated)
could not find feasible values
r(491);
or
movestay HDDS INCOMEO AGE AGRICEXPERIENCE ,select(ADOPT=MEDIA)
Fitting initial values .....initial vector: copy option requires either a matrix or a list of numbers
r(198);
-dstdize- to calculate average age-adjusted death rate by race
Good Morning,
I would like to calculate the average age-adjusted death rate by race (separately for whites and blacks) and test whether the rates for countries are statistically different. As an example, I have modified Stata's mortality data file as follows:
Code:
clear
input str6 nation byte age_category race_category float(population deaths)
"Sweden" 1 1 3100000  3000
"Sweden" 1 2   45000   523
"Sweden" 2 1 3000000 10000
"Sweden" 2 2   57000   928
"Panama" 1 1  700000   300
"Panama" 1 2   41000   904
"Panama" 2 1  270000  1000
"Panama" 2 2   75000   421
end
label values age_category age_lbl
label def age_lbl 1 "0 - 29", modify
label def age_lbl 2 "30 - 59", modify
label values race_category race_lbl
label def race_lbl 1 "White", modify
label def race_lbl 2 "Black", modify
Based on my understanding of the -dstdize- command, the following code will give me the average death rate by race for each country:
Code:
dstdize deaths population age_category, by(nation race_category)
I have the following two questions:
1. In my study, I have about 300 countries. Hence, I would like to present an age-adjusted rate for blacks and whites (averaged across all countries). I just want to make sure that the following code will help me achieve this?
Code:
dstdize deaths population age_category, by(race_category)
2. Is there a way for me to test whether the death rate for whites in Sweden is statistically different from that for whites in Panama?
I am using Stata MP 14.2 for Windows.
Many thanks in advance,
Caroline
Can I test country level data and firm level data in one regression
Hello everyone,
I'm trying to find the effects of culture on the existence of teamwork. My dataset is cross-sectional and consists of 29 countries and around 30,000 companies.
Dependent variable
Team = a dummy variable that equals 1 when there is teamwork present in the company. (Firm level)
Independent variables
Individualism = A national cultural score which is defined on country-level (range 1-100).
Masculinity = A national cultural score which is defined on country-level (range 1-100).
Power = A national cultural score which is defined on country-level (range 1-100).
Female = The percentage of females in the company used as a proxy for altruism (Firm level)
Control variables
Size = This variable equals the amount of employees in the company (Firm level)
Sector = 0-5 depending on type of sector (dummy). (Firm level).
What is the best way to analyze this? Using a regression at the country level with the averages of the firm-level variables comes at the cost of the number of observations (29 instead of 30,000). To analyze this at the firm level, I have to assume that the cultural scores are exactly the same for every company in the same country, which feels like a strong assumption; however, I would then have far more observations (30,000 instead of 29). What is the best way to analyze this?
Thank you in advance and best regards,
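One commonly suggested middle ground, sketched below under the post's variable names rather than as a recommendation: keep the firm-level observations but model the country-level clustering explicitly with a multilevel (random-intercept) logit, so the country-level cultural scores enter at their own level and the standard errors reflect that firms share a country.
Code:
* A minimal sketch, assuming the variable names from the post and a numeric
* country identifier; firms are level 1, countries are level 2.
melogit Team Individualism Masculinity Power Female Size i.Sector || country:, or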
Generating a new variable that equals the Nth non-missing value in a row
Hi All,
I am essentially trying to compress a row of variables into just the non-missing entries. My dataset looks like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(q5__1 q5__2 q5__3 q5__4 q5__5 q5__6 q5__7 q5__8 q5__9 q5__10)
. 2 . . . . . . 0 0
1 . . . . . . . 0 0
. . . . . . . . 0 0
. . . . . . . . 0 0
. . . . . . . . 0 0
. . . . . . . . 0 0
. 2 . . . . . . 0 0
1 . . . . . . . 0 0
. . 3 . . . . . 0 0
. . . 4 . . . . 0 0
. . . . . . 7 . 0 0
end
So I want to generate var_a = the first non-missing value in each row, which here would basically be all the observations: 2, 1, ., ., ., ., 2, 1, 3, 4, 7. If a row has more non-missing values, the next variable var_b would hold them, and would be missing otherwise.
Any ideas?
Many thanks!
Chris
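A minimal sketch of one way to pack the non-missing entries leftwards, assuming the variables of interest are q5__1-q5__8 (adjust the range if the zero-valued q5__9 and q5__10 should count): reshape long, drop the missings, number what remains within each observation, and reshape back.
Code:
* A minimal sketch, assuming the relevant variables are q5__1-q5__8.
generate long obs_id = _n
preserve
keep obs_id q5__1-q5__8
reshape long q5__, i(obs_id) j(item)
drop if missing(q5__)
bysort obs_id (item): generate seq = _n     // 1st, 2nd, ... non-missing value
drop item
reshape wide q5__, i(obs_id) j(seq)
rename q5__* var_*                          // var_1 = first non-missing, etc.
tempfile packed
save `packed'
restore
merge 1:1 obs_id using `packed', nogenerate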
Working out expressions (sum and multiply) in Stata with some missing observations
Hi Nick Cox
I want to add and multiply certain variables in stata in a new variable VaR. My expression is:
gen VaR=constant_term+ (coeff_L_Tbills* L_Tbills)+ (coeff_L_Inf* L_Inf)
Now the issue is that, for some rows where L_Inf is missing or 0, the whole VaR outcome of the equation becomes zero, which it shouldn't be if calculated manually.
How can I amend the above code to get the correct result?
Best Regards
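A minimal sketch of one way to keep a row's result when L_Inf is missing, treating the missing L_Inf term as contributing nothing; whether that is the right substantive choice depends on why the value is missing.
Code:
* A minimal sketch: cond() replaces the L_Inf term with 0 when L_Inf is missing.
generate VaR = constant_term + (coeff_L_Tbills*L_Tbills) + ///
    cond(missing(L_Inf), 0, coeff_L_Inf*L_Inf)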
two stage IV approach
bootstrap "repeated time values within panel" 2 SRI
I'm trying to run the 2SRI approach (control function) in Stata 15. My first stage is a probit regression and the second stage is a Tobit (using the xtdpf code from the website).
My first stage code is as:
global Z "a b c d "
probit X IV $Z i.year i.industry, r
predict `Xhat'
matrix `a1'=e(b)
gen Xuhat=X-`Xhat'
My second stage code is :
global Z "a b c d"
xtdpf Y X $Z Xuhat, timedummies
The problem is that in the probit model I control for year and industry fixed effects but cannot control for firm fixed effects, while in the second stage I control for both year and firm fixed effects. My supervisor said that, done this way, the estimates are inconsistent. However, my instrumental variable is fairly constant (it does not change every year), and probit cannot include firm fixed effects. Does anyone know how to solve this problem? Please help me; I will appreciate any help!
Looping Over the numeric value of a variable and use it for the subsequent loop.
Hi, thanks for your help in advance.
I have the following dataset, and I came up with the code below:
Code:
egen t=max(dateday_count)
forvalues i=1/762{
gen k_`i'=.
gen ref_`i'=.
}
levelsof dateday_count, local(levels)
foreach i of local levels {
di(`i')
quietly{
forvalue x=1/`i'{
bys permno (dateday): replace k_`x'=adjvol[_n-`x']*exp(total_log_rev_turnover-total_log_rev_turnover[_n-`x'+1]) if dateday_count==`i'
by permno: replace ref_`x' = adjprc[_n-`x']*k_`x' if dateday_count==`i'
}
}
}
In words, what I am trying to do is loop over the value of the variable dateday_count and, for each row, do the following.
Code:
forvalue x=1/"dateday_count"{
bys permno (dateday): replace k_`x'=adjvol[_n-`x']*exp(total_log_rev_turnover-total_log_rev_turnover[_n-`x'+1])
by permno: replace ref_`x' = adjprc[_n-`x']*k_`x'
}
More specifically, for row 12 of permno 10001, I want to do:
Code:
forvalue x=1/12{
bys permno (dateday): replace k_`x'=adjvol[_n-`x']*exp(total_log_rev_turnover-total_log_rev_turnover[_n-`x'+1])
by permno: replace ref_`x' = adjprc[_n-`x']*k_`x'
}
I am moderately confident that my code is doing the job. However, I am wondering whether there is a faster way to loop over the numeric value of dateday_count directly and use it as the loop bound, so that I do not need the "if dateday_count==`i'" qualifier.
In the dataex below I only attached a sample from a single permno, so you need to remove the "bys permno" part of my code.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double(permno dateday) float(adjvol adjprc total_log_rev_turnover) double dateday_count 10001 10595 0 2.1041667 0 1 10001 10596 7200 2.0833333 0 2 10001 10597 6150 2.1666667 0 3 10001 10598 2100 2.0416667 -1.2086598 4 10001 10601 600 2.1666667 -1.4323044 5 10001 10602 17250 2.1666667 -1.4323044 6 10001 10603 1500 2.0833333 -2.1274576 7 10001 10604 330 2.0833333 -2.244239 8 10001 10605 1500 2.1666667 -2.939392 9 10001 10608 300 2.1666667 -3.0449755 10 10001 10609 3000 2.0833333 -3.0449755 11 10001 10610 1500 2.1666667 -3.740129 12 10001 10611 0 2.125 -3.740129 13 10001 10612 1500 2.0833333 -4.435282 14 10001 10615 600 2.1666667 -4.658927 15 10001 10616 2982 2.0833333 -10.17839 16 10001 10617 390 2.0833333 -10.317952 17 10001 10618 3300 2.1666667 -10.317952 18 10001 10619 0 2.125 -10.317952 19 10001 10622 17700 2.125 -10.317952 20 10001 10623 7575 2.1666667 -10.317952 21 10001 10624 600 2.1666667 -.2236447 22 10001 10625 6900 2.1666667 -.2236447 23 10001 10626 13500 2.1666667 -.2236447 24 10001 10629 17100 2.1666667 -.2236447 25 10001 10630 5400 2.0833333 -.2236447 26 10001 10631 16200 2.125 -.2236447 27 10001 10632 6000 2.125 -.2236447 28 10001 10633 0 2.125 -.2236447 29 10001 10636 3600 2.0833333 -.2236447 30 10001 10637 3210 2.1666667 -.2236447 31 10001 10638 2100 2.1666667 -1.4323044 32 10001 10639 10080 2.1666667 -1.4323044 33 10001 10640 12816 2.125 -1.4323044 34 10001 10644 8400 2.1666667 -1.4323044 35 10001 10645 12000 2.1666667 -1.4323044 36 10001 10646 0 2.2083333 -1.4323044 37 10001 10647 3000 2.1666667 -1.4323044 38 10001 10650 900 2.25 -1.7898387 39 10001 10651 3300 2.25 -1.7898387 40 10001 10652 3900 2.25 0 41 10001 10653 1200 2.125 -.5121625 42 10001 10654 1350 2.25 -1.1116405 43 10001 10657 3000 2.25 -1.1116405 44 10001 10658 11400 2.25 -1.1116405 45 10001 10659 900 2.25 -1.4691747 46 10001 10660 7800 2.25 -1.4691747 47 10001 10661 450 2.2916667 -1.6320474 48 10001 10664 3600 2.1666667 -1.6320474 49 10001 10665 2550 2.1666667 -3.5405884 50 10001 10666 300 2.2916667 -3.6461716 51 10001 10667 4800 2.2916667 -3.6461716 52 10001 10668 1200 2.2083333 -4.1583343 53 10001 10671 4800 2.2083333 -4.1583343 54 10001 10672 13500 2.2083333 -4.1583343 55 10001 10673 5160 2.3333333 -4.1583343 56 10001 10674 2700 2.25 -6.47912 57 10001 10678 1500 2.3333333 -7.174273 58 10001 10679 600 2.3333333 -7.397918 59 10001 10680 0 2.2916667 -7.397918 60 10001 10681 0 2.2916667 -7.397918 61 10001 10682 1500 2.25 -8.093071 62 10001 10685 0 2.2916667 0 63 10001 10686 4950 2.3333333 0 64 10001 10687 900 2.3333333 -.3575342 65 10001 10688 5400 2.3333333 -.3575342 66 10001 10689 1050 2.25 -.7893968 67 10001 10692 4650 2.3333333 -.7893968 68 10001 10693 5400 2.25 -.7893968 69 10001 10694 1500 2.25 -1.48455 70 10001 10695 3900 2.25 -1.48455 71 10001 10696 10800 2.416667 -1.48455 72 10001 10699 9000 2.416667 -1.48455 73 10001 10700 5940 2.2916667 -1.48455 74 10001 10701 1980 2.2916667 -2.5672574 75 10001 10702 1500 2.416667 -3.2624106 76 10001 10703 1500 2.416667 -3.957564 77 10001 10706 2100 2.416667 -5.166224 78 10001 10707 5400 2.5 -5.166224 79 10001 10708 2400 2.5 -6.78371 80 10001 10709 1500 2.5 -7.478863 81 10001 10710 23250 2.416667 -7.478863 82 10001 10713 14910 2.375 0 83 10001 10714 7800 2.416667 0 84 10001 10715 84111 2.5 0 85 10001 10716 19350 2.416667 0 86 10001 10717 0 2.458333 0 87 10001 10720 8550 2.5 0 88 10001 10721 3900 2.416667 0 89 10001 10722 17499 2.375 0 90 10001 10723 900 2.375 -.3575342 91 10001 10724 3000 
2.416667 -.3575342 92 10001 10727 900 2.5 -.7150685 93 10001 10728 2700 2.458333 -3.035854 94 10001 10729 0 2.395833 -3.035854 95 10001 10730 3900 2.3333333 -3.035854 96 10001 10731 600 2.416667 -3.259499 97 10001 10734 0 2.375 -3.259499 98 10001 10735 9600 2.3333333 -3.259499 99 10001 10736 600 2.416667 -3.483144 100 end format %d dateday
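A sketch of one possible speed-up, based on the observation that k_x is needed for every row whose dateday_count is at least x, so a single pass over x (rather than a pass per row) reproduces the same replacements; the variables and the formula are copied from the code above.
Code:
* A minimal sketch; same variables and formula as above, one pass over x.
summarize dateday_count, meanonly
local T = r(max)
forvalues x = 1/`T' {
    bysort permno (dateday): replace k_`x' = adjvol[_n-`x'] * ///
        exp(total_log_rev_turnover - total_log_rev_turnover[_n-`x'+1]) ///
        if dateday_count >= `x'
    by permno: replace ref_`x' = adjprc[_n-`x'] * k_`x' if dateday_count >= `x'
}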
Marginal effect after heckpoisson
Dear All,
I am new to Stata and I am using heckpoisson, a Poisson regression model with endogenous sample selection, to estimate the factors influencing access to extension services and the intensity of access. I have the model results already, but I have no clue how to estimate the marginal effects (dydx) after heckpoisson. I need your suggestions or help.
Below is the model output:
Poisson regression with endogenous selection Number of obs = 293
(25 quadrature point s) Selected = 141
Nonselected = 152
Wald chi2(9) = 39.34
Log pseudolikelihood = -397.3723 Prob > chi2 = 0.0000
------------------------------------------------------------------------------------------
| Robust
ext_freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
ext_freq |
age_hhh | -.0040843 .0034685 -1.18 0.239 -.0108824 .0027139
gender_hhh | -.2542102 .0785204 -2.82 0.005 -.3754374 -.067643
hh_size | -.0046676 .0156446 -0.30 0.765 -.0353305 .0259953
dist_mkt | .0075753 .013998 0.54 0.590 -.0198998 .0349712
own_cash_crop | .1196532 .0692318 1.73 0.084 -.0160386 .2553451
input_acces0 | .2543907 .0800315 3.05 0.002 .0870918 .4008095
aware_extension_services | -.2671813 .0863115 -3.15 0.002 -.4409838 -.1026488
access_2crdt | .07612 .0672013 1.14 0.253 -.0549509 .2084733
farm_size1 | .0021675 .0178802 0.15 0.881 -.032373 .0377159
_cons | 1.263667 .2010278 6.15 0.000 .8426601 1.630675
-------------------------+----------------------------------------------------------------
access_to_extension |
hh_size | -.051465 .031816 -1.62 0.106 -.1138233 .0108933
dist_mkt | -.0452002 .0317816 -1.42 0.155 -.107491 .0170905
own_cash_crop | -.0665395 .161095 -0.41 0.680 -.3822798 .2492008
input_acces0 | -.0579019 .1757906 -0.33 0.742 -.4024451 .2866414
aware_extension_services | .6452373 .1814888 3.56 0.000 .2895259 1.000949
access_2crdt | .3881123 .1606434 2.42 0.016 .0732571 .7029675
sell_rice | .3996281 .2191917 1.82 0.068 -.0299798 .8292361
farm_incom_hh | .0000456 .0000128 3.56 0.000 .0000205 .0000707
crop_dive | -.3390377 .3519390 -1.06 0.288 -.9650316 .2869562
main_inc_source | -.0995744 .1876322 -0.53 0.596 -.4673268 .268178
_cons | -.1595383 .4452855 -0.35 0.730 -1.026583 .7189037
-------------------------+----------------------------------------------------------------
/athrho | 1.515521 .248586 6.54 0.000 1.137901 2.11234
/lnsigma | -3.783098 .9520586 -4.03 0.000 -5.703099 -1.971098
-------------------------+----------------------------------------------------------------
rho | .9636253 .0322574 .8137061 .9711619
sigma | .0261155 .0222056 .0033356 .1393039
------------------------------------------------------------------------------------------
Wald test of indep. eqns. (rho = 0): chi2(1) = 33.01 Prob > chi2 = 0.0053
My regards,
T. Sumo
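For what it's worth, margins is supported after heckpoisson, so a minimal sketch for average marginal effects on the default prediction (the expected count) is below; see help heckpoisson postestimation for the other predict() statistics if effects on a different quantity are needed.
Code:
* A minimal sketch: average marginal effects of every covariate after heckpoisson.
margins, dydx(*)
* or the marginal effect of a single covariate, e.g. distance to market
margins, dydx(dist_mkt)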
Numbering occurrences of a flag within patients.
In the example below I have created OVERLAP_FLAG, which is equal to 1 where two consecutive regimens overlap and their start dates are within 28 days of one another.
My plan is to number each group of flags with a view to merging each overlapping group of regimens together.
As you can see from the example below, this is complicated by the fact that some patients have multiple groups of overlapping regimens.
Is there a straightforward way of numbering each group distinctly within each patient? For example, so that the first three flags form a group coded 1 and the second, solitary flag becomes 2, etc. This would be easy if there were a variable on which to sort the flags.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long id double regimen float(start_date end_date regimen_overlap overlap_flag)
9942  222346 20577 20587   . .
9942  222347 20577 20605   0 1
9942 1194964 20591 20643   0 1
9942 1194965 20591 20628  14 1
9942  222349 20741 20752 150 .
9942  222348 20763 20842  22 .
9942  222350 20766 20850   3 1
end
format %td start_date_of_regimen
format %td end_date_of_regimen
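A minimal sketch of one way to number the groups, assuming that sorting by id and start_date reproduces the order shown above: start a new group whenever the flag switches on after a non-flagged row, take the running sum within each patient, and blank out the unflagged rows.
Code:
* A minimal sketch; assumes id/start_date ordering matches the listing above.
sort id start_date
by id: generate byte new_group = overlap_flag == 1 & overlap_flag[_n-1] != 1
by id: generate group_num = sum(new_group)
replace group_num = . if overlap_flag != 1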
Sunday, September 29, 2019
Generate a new variable in a loop in different files and calculating the mean
Hi,
I want to write a loop that generates a new variable called savings (income - expenditure) in two separate files, uses the sample weights wht to calculate its mean in each file, and then computes a combined mean of the means. How can I do this with a loop?
Any help is appreciated.
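A minimal sketch, assuming the two files are called file1.dta and file2.dta (placeholder names) and that each contains income, expenditure, and the weight wht; the combined figure is shown both as a simple average of the two means and as a weight-pooled mean.
Code:
* A minimal sketch; file1/file2 are placeholder file names.
local sum_means = 0
local sum_wmean = 0
local sum_w     = 0
foreach f in file1 file2 {
    use "`f'.dta", clear
    generate savings = income - expenditure
    summarize savings [aw=wht]
    display "Mean savings in `f'.dta: " r(mean)
    local sum_means = `sum_means' + r(mean)
    local sum_wmean = `sum_wmean' + r(mean)*r(sum_w)
    local sum_w     = `sum_w' + r(sum_w)
    save "`f'.dta", replace
}
display "Simple average of the two means: " `sum_means'/2
display "Weight-pooled combined mean:     " `sum_wmean'/`sum_w'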
How to export summary statistics to Word
Dear Sir/Madam
I am working on my thesis on the impact of informal credit on household welfare. I am trying to generate summary statistics with the command:
"sum heads_child girl_cnt age_cnt if f07_observed==1 & treatment==1"
I then used the command "outreg2 using myresul.doc, replace keep(heads_child girl_cnt age_cnt) eqkeep(Mean) dec(3)", but the Stata result is "==1 invalid name". Please, can you help me?
Thank you in Advance
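If the goal is simply a table of means in Word, one alternative route is sketched below using estpost from the estout package (ssc install estout); the output file name is a placeholder.
Code:
* A minimal sketch using estpost/esttab from the estout package.
estpost summarize heads_child girl_cnt age_cnt if f07_observed==1 & treatment==1
esttab using myresult.rtf, cells("mean(fmt(3))") replace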
Adding partner data to responding person data in a panel data file
I wish to extend my analysis of panel data to include information about the responding person's partner, so I began merging this extra data into my original (responding person) data file. The variables of interest are the same (e.g. age, sex, educ, inc, emp, ...). I collect the data from two separate files of the HILDA survey: the responding-person data from the "Combined" file (I think this is correct) and the partner data from the "Rperson" file (where I run a loop to append selected variables for all waves). I believe the unique identifier is xwaveid (destringed), which I obtain for the responding-person data. Based on HILDA's suggested code (https://melbourneinstitute.unimelb.e...rogram-library) I use the identifier hhpxid for the partner data and then rename it to xwaveid. There are about 300,000 observations in the responding-person data and about 150,000 in the partner data. When I merged the two files I ended up with only about 150,000 observations, which suggests I have unintentionally dropped all non-partnered responding persons. I want information on all responding persons, plus the partner information for those who have partners. I would appreciate assistance with the code to ensure I am not unintentionally deleting observations from the new data file. My code follows.
I appreciate any help with the process and data.
Code:
clear all
set memory 1g
set maxvar 32000
set more off
// Merge 'rperson' file with 'partner' file
use "C:\data\rpersondata.dta", clear
merge 1:m id wave using "C:\data\partnerdata.dta"
order id wave
sort id wave
tsset id wave
save "C:\data\rperson-partner", replace
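For what it's worth, a minimal diagnostic sketch using the file and key names from the code above: merge keeps unmatched master observations by default (flagged _merge==1), so counting before the merge and tabulating _merge afterwards shows whether the non-partnered responding persons survive the merge or were already absent.
Code:
* A minimal diagnostic sketch, using the file names from the code above.
use "C:\data\rpersondata.dta", clear
count                                   // should be about 300,000
merge 1:m id wave using "C:\data\partnerdata.dta"
tabulate _merge                         // 1 = responding person with no partner record
count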
Matching observations across boundaries
I am looking at houses on each side of a designated geographical boundary.
Does anyone know how to match house characteristics on either side of the boundary?
I have the following variables:
boundary - identifies which boundary the house belongs to.
treatment - identifies which houses are on the treated/control side of the boundary
bedrooms - number of bedrooms
baths - number of bathrooms
parking - number of car spaces
I have only been able to identify whether duplicates exist at the boundary level - see following command. I am unsure how to break this down and compare each side of the boundary.
Essentially, I aim to ensure that if a house with x bedrooms, x baths and x parking spaces exists in the control group for boundary x, then a house with x bedrooms, x baths and x parking spaces exists in the treatment group for boundary x.
Code:
sort boundary bedrooms baths parking
quietly by boundary bedrooms baths parking: gen dup = cond(_N==1, 0, _n)
drop if dup==0
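One sketch for the comparison itself: within each boundary, flag the bedroom/bath/parking combinations that appear on both the treated and the control side (treatment is assumed to be coded 0/1):
Code:
bysort boundary bedrooms baths parking: egen has_treated = max(treatment == 1)
bysort boundary bedrooms baths parking: egen has_control = max(treatment == 0)
generate byte matched = has_treated & has_control
tab boundary matched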
F test for coefficients
Hello everyone,
My model is: Y = b0 + b1.x1 + b2.x2 + b3.x3 + e
and I use GMM for this model.
I want to test b1 + b2 < 0 with an F-test.
What is the command for this, and how do I read the result?
Thank you so much.
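A sketch of the usual post-estimation route, assuming the coefficients appear in the output under the names x1 and x2 (use the names shown in your GMM output otherwise); -test- gives the two-sided Wald test of b1 + b2 = 0, and -lincom- shows the sign needed for the one-sided version:
Code:
test x1 + x2 = 0
lincom x1 + x2
* if the lincom estimate is negative, the one-sided p-value for H1: b1 + b2 < 0
* is (two-sided p)/2; otherwise it is 1 - (two-sided p)/2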
Panel data with three identifiers
Dear All,
I am currently working with a country-year panel dataset which has data for value added for multiple sectors. It looks as follows:
As you can see, each country has data for value added for several sectors, but this is depicted columnwise. Value added for each sector are variables together with "country" and "year" . Currently my panel dataset has two identifiers: country and year. What I would like to do is modify my data in a way that this panel dataset consists of three identifiers:
1) country
2) sector
3) year
I assume this implies also that the value added data for the different sectors would not be depicted in columns anymore, but would be depicted row-wise, in the sense that each country has e.g. 5 datapoints for each year, namely for agriculture, mining, manufacturing, utilities, construction.
If anybody has an idea if and how this can be achieved, I would be very grateful.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long country int year double(agriculture manufacturing mining utilities construction) 5 1970 981.375378691819 6083.8882699607275 3314.377986731164 506.70685027966016 2006.7123877789525 5 1971 1047.3785337633847 6916.5025883226235 3380.606676519309 574.0725008948751 2235.0557453082833 5 1972 1004.0187968550568 7110.704855608108 3309.9627407452876 656.0828581655715 2027.9986329723647 5 1973 857.5592410758162 6647.521287082387 3364.4174412377624 675.1209768176975 1788.0445962466267 5 1974 1087.0186460911143 6477.841145292657 4110.573046865174 710.5728047619141 2257.390075704935 5 1975 1134.5517052620596 4828.1632395765 3648.174427027457 683.7475882841235 1669.5575805624069 5 1976 1116.2193959399322 5118.6251140941695 4092.781502010798 723.6899811012743 1393.6386137599113 5 1977 1234.8119779118867 5553.11987047624 4204.795616042943 765.4049653156358 1381.483073306251 5 1978 1189.0248529620965 6067.751221019465 4271.604682434886 816.9282885945638 1492.8891916388243 5 1979 1262.397738606136 6547.328358169937 4500.897755405062 872.4694857071697 1849.1763433825088 5 1980 1310.8037648876589 6951.561122736998 4734.184848648669 916.3115795981843 2291.850442816282 5 1981 1361.21761552351 7129.405790177698 5098.548425411251 935.1010484086189 2775.4757319337446 5 1982 1351.3967355295133 5635.297596050137 5431.141386362448 935.6919122076893 2122.026921332625 5 1983 1323.0689527912727 5809.858704261644 5317.674901321783 974.4525774266994 1817.1942903442387 5 1984 1441.57423805217 6326.53118621761 5608.693742154081 1043.938160197364 1857.7914351603458 5 1985 1543.100313012334 6496.477562539438 5791.14785009947 1083.7623802547005 2184.456832982976 5 1986 1660.7047304139123 6991.434529896969 5844.194450763241 1150.0663227014993 2397.3264521622946 5 1987 1818.8784210569606 7360.349185204471 5825.051991719084 1212.8607937144413 2616.440559550564 5 1988 2028.3106346387979 8008.985307139801 6281.010706743252 1281.282952848104 2841.1835013807054 5 1989 2155.034353611419 8886.108881625602 6770.549692753161 1180.786731574298 3157.5709039675676 5 1990 2351.34021390641 8973.802928950769 6833.363549901618 1077.8464141233414 3474.1928413228406 5 1991 2408.4364500945303 9452.749982148323 7679.688653679325 1368.4682503263523 3421.012082585575 5 1992 2697.5166095917593 10531.674004308923 7569.027171831896 1745.8993692111267 3887.8242222622434 5 1993 2781.7553785478876 11295.766898195656 7556.805316858913 1830.5277660732788 4800.750808303282 5 1994 2991.578662240602 11756.838693410447 8228.939157081648 1944.1343699178508 4749.886080404079 5 1995 3198.4717539431185 12642.568025900755 8996.785292525166 2092.459673304508 5221.02245487333 5 1996 3288.185504451532 13048.903995935294 10414.844340025093 2013.910182318858 5670.742873302024 5 1997 3400.8444205103046 13666.359953230405 11589.309516818577 2180.6136392506082 6026.200400763627 5 1998 3480.0746719699564 13352.125643752064 12549.560396035524 2276.8215444742495 6139.222709566625 5 1999 3507.691666495815 13283.739759050579 13878.610178440074 2168.7028528576793 5533.019757328133 5 2000 3739.033132780806 13936.441006213818 14323.552454900651 2374.8681490421977 5493.928485426047 5 2001 4020.677532508271 14022.11307099687 15135.987692954339 2410.9514914702045 5719.199924726687 5 2002 4295.359875214949 14287.256919868087 14496.674452839698 2490.106139597811 5862.724799319365 5 2003 4407.591009192732 14756.880720587662 15295.681421215628 2597.848939476505 6116.296122638 5 2004 4893.195674290561 
15787.57945427821 16229.215748930521 2669.7927909571977 6314.7171296878905 5 2005 5235.52366773901 16727.216355640525 15597.436348861484 2750.7696314974487 6953.399167622369 5 2006 5445.435190788036 17387.421512458182 15702.659249244005 2958.520403792658 7228.817939932632 5 2007 5496.648019910568 17914.92334467639 16222.68096332891 2105.2513241878996 7561.054759264848 5 2008 5718.895309529531 18164.10214994937 15307.885730809367 2067.1852711011134 8328.307975440213 5 2009 6168.858138719481 19665.531230700435 14887.4184368782 2185.9714380755613 9188.567039247773 5 2010 6643.054616967439 20749.751555815412 20625.515807991615 1717.334059385173 9910.352150850182 5 2011 6372.830277868871 21521.695290676136 22306.019697880554 1652.4533003991382 10074.898295916159 5 2012 . . . . . 6 1970 68821.93786184487 17369.873736774083 3526.515925579863 2041.1034930415956 6478.822823366367 6 1971 70154.88059311402 19806.013382866688 4021.112799977039 2327.370003468182 7370.674429313272 6 1972 69529.11852637665 21228.31016384425 4309.874383683857 2494.5016114342784 7219.073877877839 6 1973 76405.62475426005 23149.738455664396 4699.9720650859945 2720.285290549922 7473.1613642627735 6 1974 79672.18431101153 23478.436567610952 4766.705948363077 2758.910030983694 7967.122989619162 6 1975 81298.14725613421 27884.128940155526 5661.17096005116 3276.615238697974 9242.602076124318 6 1976 79860.65545789216 27204.02823429808 5523.093619562108 3196.697793851682 9657.891406608484 6 1977 78141.54154392579 31377.195195268836 6370.350195573367 3687.079347003094 9824.914960944541 6 1978 81482.31652129904 36914.34728855157 7494.529641851021 4337.740408238934 9766.317058592982 6 1979 86480.53106427223 40110.23922306626 8143.375106952908 4713.2840817148735 9959.39087194243 6 1980 85198.03664727035 45186.85969769561 9174.05519266953 5309.8288776436475 12622.200547720255 6 1981 91147.25782712108 45972.85443251355 9333.631651989963 5402.189744686262 13022.49539885 6 1982 101655.69523825926 48626.426400488635 9872.372689866636 5714.006347095089 13467.98482994601 6 1983 110119.28092506662 53353.591315590245 10832.104614725418 6269.48723121748 15766.45203965875 6 1984 124303.5092001832 61278.47839969475 12441.053587759634 7200.726856470508 17477.38970981009 6 1985 126595.4729153697 72436.9164726229 14706.49374956509 8511.935404827627 21357.667218341994 6 1986 130796.37675991641 79421.29703643244 16124.496531959472 9332.657753277526 24747.26071581163 6 1987 136949.46533222307 89939.2949550645 18259.91143061495 10568.609298859183 29169.873183938675 6 1988 140433.74440328806 102753.77047817009 20240.274289695364 12444.517907813512 31500.62223648442 6 1989 144752.02644349143 102726.90669706423 18585.498872444587 12045.92373388096 28840.598387041602 6 1990 155359.30865097183 103857.54312662853 16009.664171667959 10333.248612250309 29182.785921071838 6 1991 159087.684325992 117380.464430292 17545.75458490957 11342.875186746853 31974.064723389914 6 1992 166566.1122787945 145099.27530431503 18292.937257386733 12552.35200495757 38697.583006345696 6 1993 174397.55919085976 179038.58372590304 21755.108379221816 12998.873510122829 45662.19714692269 6 1994 181374.32862260463 229234.3566415063 25124.183578975397 13543.017164438932 51906.919430940674 6 1995 190444.4385496261 293532.53271976346 34191.0874256793 18450.14721820064 58341.801828012125 6 1996 200158.65324442572 333720.0491866298 36820.182114367555 18924.544391895302 63306.33906522496 6 1997 207163.2925940071 368590.35520958534 41277.9203685271 23685.839451496056 64963.44862183223 6 1998 
214412.5678890425 400340.8864532194 42821.50801665309 28977.71925394508 70821.05924713468 6 1999 220416.11978993574 430023.4113546913 47481.79707818456 34844.40065579689 73853.43390707385 6 2000 225706.0779279122 466665.6071827977 58842.837490595186 36979.48239400304 78040.35211055088 6 2001 232026.00857688286 509624.91748463066 58860.59277227855 42771.547508042335 83334.05398435422 6 2002 238754.8214577816 560306.9317601209 61623.92582143337 50271.42957991981 90657.22914538029 6 2003 244723.57627283974 631506.6268613825 70621.67376257626 55785.21351099297 101611.77245709402 6 2004 260141.22718433995 670682.3291969394 98865.51429976773 75601.4894113703 109872.8095578558 6 2005 273748.487774046 748324.4711984446 110310.7693275924 84353.56371073547 126585.04526160768 6 2006 287435.9121627484 870334.4883952552 121780.59467706362 72293.79046379843 147239.290461974 6 2007 298071.04091277 1004849.0825141772 133722.59182882542 84434.0867865947 170104.38001878318 6 2008 311708.75538201455 1110095.2136394589 147728.46164443574 93277.56500040894 186523.36037079184 6 2009 324749.2434879906 1207014.3575204427 160626.19857142656 101421.35449888217 221133.34201765328 6 2010 338620.23225162644 1352625.1903939417 180003.69346994383 113656.5427613304 250916.70350204606 6 2011 . . . . . 6 2012 . . . . . 7 1970 4166.08956278226 6096.705315576166 1867.8060020608802 1195.187578990279 2933.284089633269 7 1971 4267.720071074892 6612.174997849354 1781.8200480018113 1306.7384196960384 3144.597708073326 7 1972 4515.515804777153 7215.752117827384 1659.21044684351 1471.4087083569211 3201.087487260272 7 1973 4622.287777814628 7831.29011507595 1734.8287385859464 1699.4152342569348 3832.5777400196935 7 1974 4878.711177596934 8483.707446580545 1275.7722402303477 1760.7445384035761 4122.937013126807 7 1975 5160.787583888245 8588.685749537071 1324.656792467882 1882.9085555343866 3714.567827509241 7 1976 5318.332243405078 8965.147072487252 1297.3501871164472 1904.1759755207215 4048.837765736522 7 1977 5492.143362350697 9094.316027864194 1135.4201078296132 1969.4620089671464 4357.310188205705 7 1978 5937.844350699436 10001.145994446908 1061.520413626778 2170.26602093115 4245.886875463278 7 1979 6225.5206856465575 10613.215143423866 1073.9325069683393 2396.294182181272 4221.736009647186 7 1980 6363.332263233388 10741.745100434986 1271.9531345867902 2576.8199564838856 4838.955296242551 7 1981 6567.116333651066 10457.29954216322 1340.5060808886453 2661.3950452667546 5182.556250807869 7 1982 6442.744584838183 10306.678498790812 1363.9935805965235 2746.959316374569 5392.229676756674 7 1983 6623.702279401696 10424.528054544617 1557.4312814427053 2789.4941563472394 6090.6836938243505 end label values country country label def country 5 "chile", modify label def country 6 "china", modify label def country 7 "colombia", modify
Best,
Satya
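A sketch using -reshape long-, assuming the five sector variables are named exactly as in the dataex excerpt:
Code:
rename (agriculture manufacturing mining utilities construction) ///
       (va_agriculture va_manufacturing va_mining va_utilities va_construction)
reshape long va_, i(country year) j(sector) string
rename va_ value_added
egen panelid = group(country sector)     // country-sector panel identifier
xtset panelid year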
Split to multiple files, merge and append
I have two files, each with 3 variables.
The first one has date, city, and score. The second one has date, city and ret. Each has 49 cities.
I want to split each files to 49 stata files, by city.
Then merge the 49 score files with the 49 ret files by city and date.
Then merge each of the 49 files with a fulldate file (only one variable date with all the 365/366 dates per year).
merge 1:1 date using fulldate
Then for the 49 files, each
replace city="cityname" if missing(city)
Then append all the 49 files to one file.
Anyone can help?
Great thanks!
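A sketch of such a loop, assuming the three files are named score.dta, ret.dta and fulldate.dta, that they sit in the working directory, and that city is a string variable in the first two (all names are placeholders):
Code:
use score, clear
levelsof city, local(cities)

tempfile combined onecity
local first = 1
foreach c of local cities {
    use if city == "`c'" using score, clear
    save `onecity', replace
    use if city == "`c'" using ret, clear
    merge 1:1 date using `onecity', nogen    // score + ret for this city
    merge 1:1 date using fulldate, nogen     // expand to the full calendar
    replace city = "`c'" if missing(city)
    if `first' {
        save `combined'
        local first = 0
    }
    else {
        append using `combined'
        save `combined', replace
    }
}
use `combined', clear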
Loop for running individual regression for sectors and then taking average of all coefficients
Hi,
I have to run three regressions from three separate files and then take a mean across the three regressions for all explanatory variables (for a national estimate), but I cannot append the data as per the instructions. What would the loop look like? I will also need the RMSE, R-squared, t-values, etc. for the national figure. Please help.
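A sketch that collects the per-file results with -postfile- and then averages them; the file names sector1.dta to sector3.dta and the variable names y, x1 and x2 are placeholders:
Code:
tempname h
tempfile results
postfile `h' str20 sector b_x1 b_x2 rmse r2 using `results'
foreach f in sector1 sector2 sector3 {
    use `f', clear
    quietly regress y x1 x2
    post `h' ("`f'") (_b[x1]) (_b[x2]) (e(rmse)) (e(r2))
}
postclose `h'
use `results', clear
collapse (mean) b_x1 b_x2 rmse r2        // simple average across the three sectors
list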
Generate an initial value for each five year interval in unbalanced panel data
Hi dear Statalister,
I have an unbalanced panel dataset and I want to generate a new variable. This variable should take the first value of variable X within each five-year interval (that is, the value refreshes every five years), in the context of an unbalanced panel. Could someone help me please? Thanks a lot in advance.
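A minimal sketch, assuming a panel identifier id, a year variable year, and that the five-year windows can be anchored at a fixed year (here 1960, an arbitrary placeholder):
Code:
generate int period5 = floor((year - 1960)/5)            // 5-year bins
bysort id period5 (year): generate X_first = X[1]        // first observed value of X in each bin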
Predictions using gologit2 - issues applying coefficients to validation dataset
I'm new to Stata, so apologies in advance if this is a basic question. I am having some issues applying coefficients from my gologit2 model from my data, to a test dataset. I'm wondering whether I'm approaching this wrong!?
- Dataset split into two - development and test datasets
- Outcome variable is severity of a disease of interest (mild, moderate, severe)
- GOLOGIT2 used due to issues with the proportional odds assumption being violated in OLOGIT
- Stepwise ran using GOLOGIT2 (without autofit as read somewhere that SW and autofit should not be run together)
- Final model run but with autofit.
- When I try to apply coefficients from this regression to my test dataset it does not work. I have tried the following options:
- I applied estimation results to test data and tried to use predict.
- Code:
- use develop.dta
- gologit2 y i.x1 x2 x3, autofit
- drop _all
- use test.dta
- predict p1 p2 p3
- Issue: dummy variables created in the development data are not present in the test data
- Code:
- After fitting the model on the development data, merge the test data in and run the predict command. Issue: it won't predict for the merged data, probably due to a similar issue to the one above.
- I pulled the development and test data into one big dataset and tried to use if with gologit2 (i.e. to run the model on half of my data if var==1), then apply the coefficients to all. A message came up saying that I could not use if with gologit2.
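One sketch that sidesteps the missing-dummy problem, assuming test.dta contains the same variable names and factor levels as the development data: fit the model on the development file, append the test observations while the estimation results are still in memory, and then predict for everyone:
Code:
use develop, clear
gologit2 y i.x1 x2 x3, autofit
append using test, generate(testobs)     // testobs==1 marks the validation rows
predict p1 p2 p3                         // predicted probabilities for the three outcome levels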
Forcing my Event Study through R2000 in loops
Hey Everyone,
I'm currently trying to run a regression/prediction of some abnormal returns. All good so far, but during my iteration, the command constantly breaks if my data set (announcedates) do not match the respective trading dates. My dataset contains daily stock returns in a time span from 2009 until 2019. Furthermore I matched respective announce dates into the data set. Due to size, I cannot upload it here. Hope the code is enough, otherwise I will supply more information if required.
The dataset also groups bond announcements into groups which are sorted by G_ID. So for every G_ID I want to estimate the predicted return based on a given event window around the announce date.
The code I am running currently is this one:
sort G_ID date
by G_ID: gen datenum=_n
by G_ID: gen target=datenum if date==announcedate
egen td=min(target), by(G_ID)
drop target
gen dif=datenum-td
//create Event Window
by G_ID: gen event_window=1 if dif>=-1 & dif<=0
egen count_event_obs=count(event_window), by(G_ID)
by G_ID: gen estimation_window=1 if dif<-2 & dif>=-202
egen count_est_obs=count(estimation_window), by(G_ID)
replace event_window=0 if event_window==.
replace estimation_window=0 if estimation_window==.
//predicted returns
gen predicted_return=.
egen id=group(G_ID)
sum id
forvalues i=1(1)`r(max)' {
l id G_ID if id==`i' & dif==0
capture quietly reg return market_return if id==`i' & estimation_window==1
capture noisily predict p if id==`i'
capture noisily replace predicted_return = p if id==`i' & event_window==1
capture noisily drop p
}
// generate abnormal returns based on predictions for the estimated return
sort id date
gen abnormal_return=return-predicted_return if event_window==1
by id: egen cumulative_abnormal_return = sum(abnormal_return)
/* Test for statistical significance
// 70 is the number of days in the event window; if prolonged, change the digits */
sort id date
by id: egen ar_sd = sd(abnormal_return)
gen test =(1/sqrt(count_event_obs)) * (cumulative_abnormal_return/ar_sd)
quietly list G_ID cumulative_abnormal_return test if dif==0
/*export data*/
export excel G_ID announcedate cumulative_abnormal_return test using "stats_group_announcedate01_large.xls" if dif==0, firstrow(variables) replace
//calculate the cumulative abnormal for all companies treated as a group
reg cumulative_abnormal_return if dif==0, robust
The capture block in the loop above is where my code typically breaks.
Before reading about the capture command, I used this code:
forvalues i=1(1)`r(max)' {
l id G_ID if id==`i' & dif==0
quietly reg return market_return if id==`i' & estimation_window==1
predict p if id==`i'
replace predicted_return = p if id==`i' & event_window==1
drop p
}
which led to constant breaks with the error r(2000). Thus I tried to use the capture command. What I want is for the code to record a zero as the predicted return instead of an error code that breaks the loop. Then I want to analyse what exactly leads to the break (e.g. an announce date that falls on a non-trading day instead of a trading day). It seems pointless to re-enter the code over and over again after it breaks, so I just want it to make "zero changes" instead of stopping.
I would be really happy if you could provide some help here! I hope I am also complying with all the rules that apply here, as this is my very first post!
Best
Kai
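A sketch of a loop that tests the return code instead of capturing everything, so that failing groups (e.g. an announce date outside the trading calendar) get the requested zero and a message rather than breaking the loop; the variable names follow the code above:
Code:
sum id
forvalues i = 1(1)`r(max)' {
    capture regress return market_return if id==`i' & estimation_window==1
    local rc = _rc                                  // remember whether the regression ran
    if `rc' == 0 {
        predict double p if id==`i'
        replace predicted_return = p if id==`i' & event_window==1
        drop p
    }
    else {
        replace predicted_return = 0 if id==`i' & event_window==1   // the "zero change" fallback
        display as error "estimation failed for id `i' (rc = `rc')"
    }
}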
Merging multiple Spss data sets into STATA
Hi, I'm trying to merge a few UNICEF MICS survey data files, which are in SPSS format, together in Stata, for example file ch.sav with hh.sav. I have them all saved as individual Stata files. Can somebody help with the correct command please, as nothing I'm trying is working!
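A sketch of the typical MICS merge, assuming the converted files are ch.dta (child level) and hh.dta (household level) and that the cluster and household identifiers are named HH1 and HH2 (these names are assumptions; check the actual identifiers in your files with -describe-):
Code:
use ch, clear
merge m:1 HH1 HH2 using hh, keep(master match)   // attach household-level variables to each child
tab _merge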
Specification of funnel plot with -funnelcompar-
Hi Statalisters
I want to assess county differences in medication rates in Norway to detect outliers. For this purpose I want to use funnel plots as recommended by Spiegelhalter (see e.g. The Art of Statistics). I am trying this on publically available Norwegian data. However, I have worked on this for some time now and I am not sure whether I am specifying the funnel plot correctly, and I wonder if any of you have some input on this.
Specifically, I am having some trouble with (i) specifying the correct weights (the plot command only works with the -nowei- option), and (ii) the confidence intervals seem to narrow, although I suspect this may be due to sample size, not my specification of the funnel plot.
It's county data for a year so it's a small sample with 18 observations. The medication rate is by 1000 individuals per county. I've gathered population data per county, and made a population rate in 1000's so it's scaled to the medication rate.
I obtain the standard deviation of the medication rate by
Code:
egen sd_medrate=sd(medrate1000)
The funnel plot command is then:
Code:
funnelcompar medrate1000 poprate1000 county sd_medrate, cont nowei
Which gives the funnel plot shown in the attached graph.
Without -nowei- I get the following error message:
Code:
may not use noninteger frequency weights
As it's such a small sample, I'll include the total dataset in dataex:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long county float(county_pop medrate1000 poprate1000 sd_medrate) 4 72856 6.37 72.856 1.6318192 12 107080 4.68 107.08 1.6318192 2 108499 9.68 108.499 1.6318192 14 156494 4.76 156.494 1.6318192 13 168231 7.46 168.231 1.6318192 16 170377 6.4 170.377 1.6318192 9 185216 5.56 185.216 1.6318192 5 190709 8.24 190.709 1.6318192 17 231286 7.83 231.286 1.6318192 8 236271 9.19 236.271 1.6318192 7 251262 5.87 251.262 1.6318192 3 257673 6.28 257.673 1.6318192 18 271662 7.59 271.662 1.6318192 15 422102 5.61 422.102 1.6318192 11 427947 6.51 427.947 1.6318192 6 477175 4.79 477.175 1.6318192 1 536499 5.32 536.499 1.6318192 10 586860 3.62 586.86 1.6318192 end label values county fylke label def fylke 1 "Akershus", modify label def fylke 2 "Aust-Agder", modify label def fylke 3 "Buskerud", modify label def fylke 4 "Finnmark", modify label def fylke 5 "Hedmark", modify label def fylke 6 "Hordaland", modify label def fylke 7 "Møre og Romsdal", modify label def fylke 8 "Nordland", modify label def fylke 9 "Oppland", modify label def fylke 10 "Oslo", modify label def fylke 11 "Rogaland", modify label def fylke 12 "Sogn og Fjordane", modify label def fylke 13 "Telemark", modify label def fylke 14 "Troms", modify label def fylke 15 "Trøndelag", modify label def fylke 16 "Vest-Agder", modify label def fylke 17 "Vestfold", modify label def fylke 18 "Østfold", modify
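One guess about the weighting error, given the message about noninteger frequency weights: the denominator passed to -funnelcompar- may need to be an integer count rather than a rate in thousands, so using the raw county population instead of poprate1000 might let the command run without -nowei-:
Code:
funnelcompar medrate1000 county_pop county sd_medrate, cont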
Panel data with fixed effects model having dummy/time-invariant variables in data set
Greetings,
I'm trying to conduct an FE-model test on my panel data set. The panel consists of 116 companies, observed over 5 years. Within these companies I try to measure different characteristics of different managers (e.g. the CEO) as independent variables, plus a set of control variables, to see the effect on leverage (the dependent variable). The 116 companies have different numbers of managers present, which is why I want to run separate regressions for each manager.
Furthermore, the variables are coded as following:
Dependent:
Leverage: as per average leverage for each year
Independent:
Gender: 0 Male, 1 Female (dummy)
Age: as per age by measuring point
Education: 1-4 depending on educational level
Experience: 0 if no experience, 1 if experience (dummy)
Tenure: as per tenure by measuring point
Misdem: 0 if none, 1 if (dummy)
Control variables:
Industry: 0-9 depending on type of industry (dummy) - unfortunately presented below are my old tests, showing industry as 1-10.
ROA: measured as %
Firm size: Ln(sales)
As noticed, across my 5 year time-period, there are many time-invariant variables (at least in regards of my panel), for example the Education variable does not vary (if the manager has educational level "2", this will not vary in my panel). All the dummy variables are also time invariant for each manager.
Investigating how panel data regressions should be run, they can (as far as I understand is most common) be run using a pooled OLS technique or a panel data regression using either a fixed effects (fe) or random effects (re) model. As I also understood, panel data regressions are superior to pooled OLS regressions, which leaves me with the choice between fe and re panel regression. From what I understood further, whether fe or re should be used is determined by a number of factors but mainly by a Hausman test. To conduct the Hausman test, both regressions, fe and re, are run and then compared to determine which fits one's data best - significance at the 5% level means that fe should be preferred over re(?).
Anyway, to get to the problem..
I set up my panel using:
Code:
egen companynum = group(Company)
xtset companynum
xtset companynum Years, yearly
As I then try to run the fe regression using:
Code:
xtreg dep indep1 indep2 indep3 indep4 indep5 indep6 cont1 cont2 cont3 cont4 cont5, fe vce(robust)
or, as in my case (using manager CEO):
Code:
xtreg Leverage i.CEOGen CEOAge i.CEOEdu i.CEOExp CEOTen CEOMis ROA i.Industry FirmSize, fe vce(robust)
I chose vce(robust) because of potential autocorrelation and heteroskedasticity, tested with the Wooldridge and White tests respectively.
I get the following result:
[xtreg, fe output in which the time-invariant variables are omitted]
From what I understood, this problem (omitted variables) arises because fe already accounts for time-invariant effects in the regression, which is why time-invariant variables don't work and get knocked out (omitted) from regressions using fe.
Which presents my questions:
1. Is it possible to fix this so I can run a fe regression followed by a Hausman test? Or should I choose the re model / pooled OLS anyway, because of the impossibility of running this data set as a fe regression, even though the Hausman test would probably say that the fe regression is preferred?
2. Is my regression correct in terms of using "i." on Education (CEOEdu variable) and having it coded as 1-4?
3. Is my regression and reasoning correct in general, have I missed any steps in regards of the preparation of conducting this type of regression?
4. Not related to the regression or problem - is there a simple way to get the regression from STATA into some form of presentation type or other document type?
Important to add is that my knowledge about statistics (and STATA) is highly limited and time is limited to acquire knowledge, hence I have turned to this forum of great expertise for help.
I would very much appreciate fast help with how I should command my regression to get the correct output for my reporting of the results and to analyze the results. An explanation to why one regression is used instead of another with regards to my data set, if this is the case, would also be highly appreciated.
Thank you in advance and best regards,
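A sketch of the usual fe-versus-re comparison for question 1, and one export route for question 4; -hausman- expects the conventional (non-robust) VCE, so vce(robust) is dropped here, and the esttab line assumes the estout package (ssc install estout):
Code:
xtreg Leverage i.CEOGen CEOAge i.CEOEdu i.CEOExp CEOTen CEOMis ROA i.Industry FirmSize, fe
estimates store fe
xtreg Leverage i.CEOGen CEOAge i.CEOEdu i.CEOExp CEOTen CEOMis ROA i.Industry FirmSize, re
estimates store re
hausman fe re
* export the stored results to a Word-readable file
esttab fe re using results.rtf, se star(* 0.10 ** 0.05 *** 0.01) replace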
Generating a new variable based on several specified conditions
Dear Stata-Community,
I am currently working on a paper in which I want to create a variable that indicates the number of respondents who mentioned unemployment as the most important problem and indicated the same value, or plus/minus one, on a left-right scale as the party under observation. So in the end, I want a variable for each party in each country that indicates how many respondents that are close (+/-1) on the left-right scale said unemployment is the most important problem. Unfortunately, my dataset only contains a country variable (country), a most-important-problem variable with the unemployment code (qpp1==70101), a self-placement variable (qpp13) and eight different party-placement variables for all countries (qpp14_1 to qpp14_8). So my problem is that the Swedish Green Party might be the variable qpp14_3 while this variable in Belgium means a conservative party. The dataset contains 25,000 individual respondents, and for each of the parties mentioned in qpp14_1 to qpp14_8 I want to know how many potentially close respondents (+/-1) indicated unemployment as the most important problem. My question now is: Is there any possibility to automate this process?
I am using stata version 15.1 and would be very grateful for any ideas.
Thanks in advance!
Robin
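A sketch of one way to automate this, assuming qpp1, qpp13, qpp14_1-qpp14_8 and country are all numeric; it flags, for each party-placement variable, respondents who name unemployment and sit within one scale point of that party, and then counts them per country:
Code:
forvalues p = 1/8 {
    generate byte close_unemp_`p' = (qpp1 == 70101) & ///
        abs(qpp13 - qpp14_`p') <= 1 & !missing(qpp13, qpp14_`p')
    bysort country: egen n_close_unemp_`p' = total(close_unemp_`p')
}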
Test for differences in Medians for Multiple Variables Using qreg
Dear Stata Users:
I would like to Test for differences in Medians for Multiple Variables Using qreg following the Stata Journal
I would like to export the t-statistics and p-values for quantile regressions for multiple variables. How is it possible to do that? It is very tedious to do it one by one. Is it possible to do that with eststo and then export with esttab? What would be a proper code?
HTML Code:
https://journals.sagepub.com/doi/pdf/10.1177/1536867X1201200202
Code:
qreg var1 class
qreg var2 class
qreg var3 class
qreg var4 class
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double(var1 var2 var3 var4 class) 2.1709347229330476 .02042863528691251 .03228491681068315 6.344270516206654 1 5.487542064826069 .12843769095325996 .04757251167276653 6.154184831332533 1 2.403442868218309 .14004722153361354 .056244318978395644 6.982910060182892 1 1.1505027133742167 .11100197329982031 .004023767790014441 6.811365516638235 1 1.8804746970472777 .5373991437510292 .007095957516877986 9.874810261649259 1 1.9808765550239236 .3227327984439582 .021833211767566252 9.931394505789136 1 2.676107983414312 .2086538813493268 .07895762414439786 6.142981046239942 1 5.0250382942155865 .5440525877207101 .025750244090855447 5.636962241280868 0 4.258091546463471 .012181876701455542 .04470620519153116 6.660379227959387 1 2.197711032881432 0 .03437963510075126 4.4299473733834605 0 2.2245278845356267 .4394672805861094 .009428517082833212 5.723343242641416 1 1.1452441380506346 .005852152919964551 .0037893836139718242 11.088995933771097 1 .9484033697297157 .3734964247700097 .002009329861113992 7.095376932085215 1 1.1523751483206472 .2830885583049549 .014212063384959249 5.654406639014709 1 1.7533311612460127 .19266922499285918 .06038927853563682 7.63872652475961 1 .8706634694165418 .27225441581137516 .013483301895410065 7.3137409182571105 1 1.079994695584802 .24342634444177425 .00556080453072516 8.201428740938756 1 .9901501157807883 .200563603986151 .008008362894517725 6.203767551444203 1 1.3157547495682214 .013003901170351105 .11967674566421503 8.84246002419529 1 1.6125047223917022 .27092330708216894 .00022581646766590453 10.475257746929474 1 1.6695885509839 .7825286415711947 -.0016366612111292963 8.494743062578646 0 .8480628518530803 .3766737329713641 -.0002713747199372005 7.579527636994385 1 1.060821685855263 .2603442745643109 -.002459104030792259 9.143559338285831 1 1.6878696906442758 .11737663668433701 .0026116483332574844 7.910486424877584 1 3.968033586295519 .09784727102453503 .04043281225865062 7.8916183004611 1 .899243711630933 .12953298510293923 .002730703847420142 8.836453142906317 1 1.876879945068383 .2996552112194602 .006820006046659643 8.30061876606964 0 .9541869853059908 .3654661990765307 .017404474266007573 7.69546963611513 1 1.9644481201108408 .4032161241201265 .005893921591571983 11.974259250751283 0 2.5286603845167903 .3319836814821631 .012760344774700163 10.392128915166236 1 2.5399946760599135 .28070659154350697 .014606202798280617 6.371492181952594 0 1.9045555011585498 .2964226277064348 .03337520020575096 8.209402284791736 1 5.716134885512472 .5292907790381521 .008094833449250688 8.67395024313201 0 1.3615053404450799 .14013686402856293 .0557818383280838 11.487299917331256 1 2.4918920664014497 0 .022622035593839667 5.8137662863912745 1 4.115151363312894 .41287287656029065 .0038089010156910734 7.650706871093464 0 2.355544245111873 .18922108333108298 .0066966541804420205 7.86064405705401 0 1.1459177647714638 .3806578806414909 .011546484840845521 7.993001640282866 1 1.4456867284194304 .2235793293343836 .0307410848329395 6.388823556055131 1 2.433206439905795 .05182226940660904 .005467731741770416 8.736676452032238 1 .6718319519007932 .24684993964632423 -.0034781785701610136 8.62690498651182 0 1.610265383568393 .003758584550491892 .0015988813762135522 8.218744836646748 0 1.5904666612063105 .08420437504195837 .004750224395124635 8.339667396550764 1 .4844194771165337 .19147015156146804 .008096030121903125 8.126672457793893 1 1.1255137911908935 .013977875086425444 -.022882576630560036 5.852547247921313 0 
1.0980541357873876 .23833167825223436 .0021414722790804503 9.602645938552133 1 1.459342862276669 .2960699771294345 .026362620968550576 8.654420620275989 1 3.7872993641364756 .16344200892632452 .03434962682508268 6.424757186209322 1 1.3330632839020906 .14922273950851017 .028375058135984177 5.3653356230988205 1 1.7050250470489474 .09482475476419729 .044931304649582844 8.317028590089064 1 1.4095675648788928 .45225421663566984 .012633874174308985 9.654641503869545 1 .9817325091888597 .19951911856495427 .01903986539661863 6.826789228492126 0 4.749956883277682 .004365141127600385 .03350505532830546 8.57455905040462 1 2.6315519937451133 .24769642069198225 .04563033716614323 8.256789004740584 1 1.2951633934280147 .40049463644969263 .008107996872041815 6.28326437993576 1 1.6885666182075358 .12606567517413872 .007133636339187263 6.390119854398666 1 2.0852919852933036 .22075020161884867 .03615227896612162 8.04582799565937 1 .6189007671401604 .25361495023248226 .010725521502892112 7.872453625672574 0 1.5426325594770125 .2503213115847204 .038414029355626496 6.398403568472147 1 5.431799918831169 .48254364089775564 .03834164588528678 6.465522446681758 1 .6169221507832088 .09853527536057495 -.0068559833290489135 5.904451351934881 1 1.3892754575052342 .054018197650424475 .02138990983282448 7.747180085196543 1 3.208024624600127 .2938049183043822 -.018247490676978 7.8478625324739415 1 2.0495010093948625 .07491891129600903 .028160696657735153 5.4289595895385325 1 1.914610189433304 .12734084304731733 .03212300106711626 7.359841852124484 0 .4680346209512774 .07488053707720517 .0033831925393684814 8.738226678109067 1 1.5331495506412196 .2399151993404393 .02296684529768565 9.739909238559308 1 2.4155976127480647 .1974669905582754 .04859411600264234 8.57590766064048 1 1.3535190448409071 .5582551101016601 -.005989655296997654 9.011132073356997 1 1.8093710795578795 .3263061983574228 .02702264160363464 6.177338121345474 1 2.413543812058275 .13404023535930262 .007393056878498134 9.294058900151573 1 1.7598221676427495 .35767105564133356 .011749799851318124 6.3289790974743 1 1.9380582068072627 .24758035817844862 .03609937184292087 7.932295030766125 0 2.0306460746550394 .24806585552336896 .021115980979310157 6.4785541880673865 1 1.8385283854866579 .19986105409416743 .020652414185113842 10.363061986291324 1 1.9736462512097805 .06855061601161756 .024047875579706748 6.750961108209461 0 1.9769853417901335 .33724378397180416 .017118646198720125 7.777863842906347 1 1.8545938219946896 .30055710669517777 .018918646200314113 7.636383979077606 1 2.886666087988723 .15277429315835184 .03533725755090225 10.855454260862754 1 1.3807206988594998 .616143277023171 .003723889966898756 5.639606554437637 1 2.2361776526107477 .26794909161566116 .012921402895171477 7.630267068171986 1 9.28029906542056 .27524229074889867 .04070484581497797 8.644002038279933 1 .8217743575613246 .30930619871638637 .007741141789399162 5.845432488574447 1 1.3653890875928012 .22438623490829931 .01337300005543486 7.03656089770649 0 2.800319755600815 .3067567966466924 -.00008855565723056941 10.430521115146327 1 3.927241192562631 0 .05982923895641445 8.717433576126178 1 .5205014326016568 .17731842326756325 .0007743689109328184 8.6229545667191 0 .7988620988725066 .18910471323051906 .002006304018415759 5.249142878535657 0 1.1528772598589017 .27211150188875866 .013282133925851286 8.443672585190909 1 1.0824540711537936 .31280606062952193 .01893637035804013 7.857182870229327 1 1.1188457318289626 .2452208335452753 .010463449736842892 6.5064146534896405 1 4.414392777315159 
.30744329614941296 .009213276808564675 6.2400224017506565 1 1.863419219993175 .2169520322068658 .018969571690629208 8.835908851864653 1 1.2937478285552086 .05945033965279936 .0030191359943169205 11.814059157345177 1 1.2836064650560342 .19992189011901249 .003265378320655072 7.041395040734514 0 .9734652518048188 .41056808441462483 .011505039458516832 4.915093087375768 1 2.3414603025370666 .04547157992309307 .007938720680141068 8.953379403377928 0 5.007207397093378 . .04567133374239973 9.570989408736887 1 1.2348910387473195 .6933934494285644 .015182351673371145 6.540061276160971 1 13.08921134631084 .3304803372220662 .03931358971740739 7.134521062086644 1 end
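A sketch using eststo/esttab (estout package from SSC); the stored-estimate names and the output file median_tests.rtf are placeholders:
Code:
eststo clear
foreach v of varlist var1 var2 var3 var4 {
    eststo md_`v': quietly qreg `v' class
}
esttab using median_tests.rtf, keep(class) b(3) p(3) ///
    star(* 0.10 ** 0.05 *** 0.01) replace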
Generate mean and median test results in Stata
Dear Stata Users:
I am trying to generate mean and median test results in Stata. Thanks to Kain Chen
I have sample code; however, it is not working in Stata 14.
HTML Code:
http://kaichen.work/?p=1218
Could you please help.
Code:
local vars var1 var2 var3 var4
local group class
// loop over variables: t-test of means by group; store the group means, difference and p-value
foreach v in `vars' {
di "`v'"
ttest `v', by(`group')
local mean_`v'_mean_0=round(r(mu_1),.001)
local mean_`v'_mean_1=round(r(mu_2),.001)
local mean_`v'_diff=`mean_`v'_mean_1'-`mean_`v'_mean_0'
local mean_`v'_p=r(p)
}
// loop over variables: medians by group and Wilcoxon rank-sum p-value
foreach v in `vars' {
sum `v' if `group'==0, detail
local p50_`v'_p50_0=round(r(p50),.001)
sum `v' if `group'==1, detail
local p50_`v'_p50_1=round(r(p50),.001)
ranksum `v', by(`group')
local p50_`v'_n_0=r(N_1)
local p50_`v'_n_1=r(N_2)
local p50_`v'_diff=`p50_`v'_p50_1'-`p50_`v'_p50_0'
local p50_`v'_p=2*normprob(-abs(r(z)))
}
// print a combined table of means, medians, Ns and p-values
qui {
noi di _newline
noi di "{hline 115}"
noi di _col(15) "{c |} `group' = 1" ///
_col(45) "{c |} `group' = 0" ///
_col(75) "{c |} Diff"
noi di _col(16) "{hline 100}"
noi di _col(15) "{c |} Mean" ///
_col(25) "{c |} Median" ///
_col(35) "{c |} N" ///
_col(45) "{c |} Mean" ///
_col(55) "{c |} Median" ///
_col(65) "{c |} N" ///
_col(75) "{c |} Mean" ///
_col(85) "{c |} P" ///
_col(95) "{c |} Median" ///
_col(105) "{c |} P"
noi di "{hline 115}"
foreach v in `vars' {
noi di %12s abbrev("`v'",12) ///
_col(15) "{c |}" %8.3f `mean_`v'_mean_1' ///
_col(25) "{c |}" %8.3f `p50_`v'_p50_1' ///
_col(35) "{c |}" %8.0f `p50_`v'_n_1' ///
_col(45) "{c |}" %8.3f `mean_`v'_mean_0' ///
_col(55) "{c |}" %8.3f `p50_`v'_p50_0' ///
_col(65) "{c |}" %8.0f `p50_`v'_n_0' ///
_col(75) "{c |}" %8.3f `mean_`v'_diff' ///
_col(85) "{c |}" %8.3f `mean_`v'_p' ///
_col(95) "{c |}" %8.3f `p50_`v'_diff' ///
_col(105) "{c |}" %8.3f `p50_`v'_p'
}
noi di "{hline 115}"
}
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double(var1 var2 var3 var4 class) 2.1709347229330476 .02042863528691251 .03228491681068315 6.344270516206654 1 5.487542064826069 .12843769095325996 .04757251167276653 6.154184831332533 1 2.403442868218309 .14004722153361354 .056244318978395644 6.982910060182892 1 1.1505027133742167 .11100197329982031 .004023767790014441 6.811365516638235 1 1.8804746970472777 .5373991437510292 .007095957516877986 9.874810261649259 1 1.9808765550239236 .3227327984439582 .021833211767566252 9.931394505789136 1 2.676107983414312 .2086538813493268 .07895762414439786 6.142981046239942 1 5.0250382942155865 .5440525877207101 .025750244090855447 5.636962241280868 0 4.258091546463471 .012181876701455542 .04470620519153116 6.660379227959387 1 2.197711032881432 0 .03437963510075126 4.4299473733834605 0 2.2245278845356267 .4394672805861094 .009428517082833212 5.723343242641416 1 1.1452441380506346 .005852152919964551 .0037893836139718242 11.088995933771097 1 .9484033697297157 .3734964247700097 .002009329861113992 7.095376932085215 1 1.1523751483206472 .2830885583049549 .014212063384959249 5.654406639014709 1 1.7533311612460127 .19266922499285918 .06038927853563682 7.63872652475961 1 .8706634694165418 .27225441581137516 .013483301895410065 7.3137409182571105 1 1.079994695584802 .24342634444177425 .00556080453072516 8.201428740938756 1 .9901501157807883 .200563603986151 .008008362894517725 6.203767551444203 1 1.3157547495682214 .013003901170351105 .11967674566421503 8.84246002419529 1 1.6125047223917022 .27092330708216894 .00022581646766590453 10.475257746929474 1 1.6695885509839 .7825286415711947 -.0016366612111292963 8.494743062578646 0 .8480628518530803 .3766737329713641 -.0002713747199372005 7.579527636994385 1 1.060821685855263 .2603442745643109 -.002459104030792259 9.143559338285831 1 1.6878696906442758 .11737663668433701 .0026116483332574844 7.910486424877584 1 3.968033586295519 .09784727102453503 .04043281225865062 7.8916183004611 1 .899243711630933 .12953298510293923 .002730703847420142 8.836453142906317 1 1.876879945068383 .2996552112194602 .006820006046659643 8.30061876606964 0 .9541869853059908 .3654661990765307 .017404474266007573 7.69546963611513 1 1.9644481201108408 .4032161241201265 .005893921591571983 11.974259250751283 0 2.5286603845167903 .3319836814821631 .012760344774700163 10.392128915166236 1 2.5399946760599135 .28070659154350697 .014606202798280617 6.371492181952594 0 1.9045555011585498 .2964226277064348 .03337520020575096 8.209402284791736 1 5.716134885512472 .5292907790381521 .008094833449250688 8.67395024313201 0 1.3615053404450799 .14013686402856293 .0557818383280838 11.487299917331256 1 2.4918920664014497 0 .022622035593839667 5.8137662863912745 1 4.115151363312894 .41287287656029065 .0038089010156910734 7.650706871093464 0 2.355544245111873 .18922108333108298 .0066966541804420205 7.86064405705401 0 1.1459177647714638 .3806578806414909 .011546484840845521 7.993001640282866 1 1.4456867284194304 .2235793293343836 .0307410848329395 6.388823556055131 1 2.433206439905795 .05182226940660904 .005467731741770416 8.736676452032238 1 .6718319519007932 .24684993964632423 -.0034781785701610136 8.62690498651182 0 1.610265383568393 .003758584550491892 .0015988813762135522 8.218744836646748 0 1.5904666612063105 .08420437504195837 .004750224395124635 8.339667396550764 1 .4844194771165337 .19147015156146804 .008096030121903125 8.126672457793893 1 1.1255137911908935 .013977875086425444 -.022882576630560036 5.852547247921313 0 
1.0980541357873876 .23833167825223436 .0021414722790804503 9.602645938552133 1 1.459342862276669 .2960699771294345 .026362620968550576 8.654420620275989 1 3.7872993641364756 .16344200892632452 .03434962682508268 6.424757186209322 1 1.3330632839020906 .14922273950851017 .028375058135984177 5.3653356230988205 1 1.7050250470489474 .09482475476419729 .044931304649582844 8.317028590089064 1 1.4095675648788928 .45225421663566984 .012633874174308985 9.654641503869545 1 .9817325091888597 .19951911856495427 .01903986539661863 6.826789228492126 0 4.749956883277682 .004365141127600385 .03350505532830546 8.57455905040462 1 2.6315519937451133 .24769642069198225 .04563033716614323 8.256789004740584 1 1.2951633934280147 .40049463644969263 .008107996872041815 6.28326437993576 1 1.6885666182075358 .12606567517413872 .007133636339187263 6.390119854398666 1 2.0852919852933036 .22075020161884867 .03615227896612162 8.04582799565937 1 .6189007671401604 .25361495023248226 .010725521502892112 7.872453625672574 0 1.5426325594770125 .2503213115847204 .038414029355626496 6.398403568472147 1 5.431799918831169 .48254364089775564 .03834164588528678 6.465522446681758 1 .6169221507832088 .09853527536057495 -.0068559833290489135 5.904451351934881 1 1.3892754575052342 .054018197650424475 .02138990983282448 7.747180085196543 1 3.208024624600127 .2938049183043822 -.018247490676978 7.8478625324739415 1 2.0495010093948625 .07491891129600903 .028160696657735153 5.4289595895385325 1 1.914610189433304 .12734084304731733 .03212300106711626 7.359841852124484 0 .4680346209512774 .07488053707720517 .0033831925393684814 8.738226678109067 1 1.5331495506412196 .2399151993404393 .02296684529768565 9.739909238559308 1 2.4155976127480647 .1974669905582754 .04859411600264234 8.57590766064048 1 1.3535190448409071 .5582551101016601 -.005989655296997654 9.011132073356997 1 1.8093710795578795 .3263061983574228 .02702264160363464 6.177338121345474 1 2.413543812058275 .13404023535930262 .007393056878498134 9.294058900151573 1 1.7598221676427495 .35767105564133356 .011749799851318124 6.3289790974743 1 1.9380582068072627 .24758035817844862 .03609937184292087 7.932295030766125 0 2.0306460746550394 .24806585552336896 .021115980979310157 6.4785541880673865 1 1.8385283854866579 .19986105409416743 .020652414185113842 10.363061986291324 1 1.9736462512097805 .06855061601161756 .024047875579706748 6.750961108209461 0 1.9769853417901335 .33724378397180416 .017118646198720125 7.777863842906347 1 1.8545938219946896 .30055710669517777 .018918646200314113 7.636383979077606 1 2.886666087988723 .15277429315835184 .03533725755090225 10.855454260862754 1 1.3807206988594998 .616143277023171 .003723889966898756 5.639606554437637 1 2.2361776526107477 .26794909161566116 .012921402895171477 7.630267068171986 1 9.28029906542056 .27524229074889867 .04070484581497797 8.644002038279933 1 .8217743575613246 .30930619871638637 .007741141789399162 5.845432488574447 1 1.3653890875928012 .22438623490829931 .01337300005543486 7.03656089770649 0 2.800319755600815 .3067567966466924 -.00008855565723056941 10.430521115146327 1 3.927241192562631 0 .05982923895641445 8.717433576126178 1 .5205014326016568 .17731842326756325 .0007743689109328184 8.6229545667191 0 .7988620988725066 .18910471323051906 .002006304018415759 5.249142878535657 0 1.1528772598589017 .27211150188875866 .013282133925851286 8.443672585190909 1 1.0824540711537936 .31280606062952193 .01893637035804013 7.857182870229327 1 1.1188457318289626 .2452208335452753 .010463449736842892 6.5064146534896405 1 4.414392777315159 
.30744329614941296 .009213276808564675 6.2400224017506565 1 1.863419219993175 .2169520322068658 .018969571690629208 8.835908851864653 1 1.2937478285552086 .05945033965279936 .0030191359943169205 11.814059157345177 1 1.2836064650560342 .19992189011901249 .003265378320655072 7.041395040734514 0 .9734652518048188 .41056808441462483 .011505039458516832 4.915093087375768 1 2.3414603025370666 .04547157992309307 .007938720680141068 8.953379403377928 0 5.007207397093378 . .04567133374239973 9.570989408736887 1 1.2348910387473195 .6933934494285644 .015182351673371145 6.540061276160971 1 13.08921134631084 .3304803372220662 .03931358971740739 7.134521062086644 1 end
Saturday, September 28, 2019
Type mismatch issues with generating / replacing
Hi all,
I'm currently trying to create a dummy variable, inlaborforce, using the below code:
gen inlaborforce = 0
replace inlaborforce = 1 if labforce == "Yes, in the labor force"
replace inlaborforce = . if labforce == "NIU"
I'm seeing similar, if not identical, syntax in guides and forums, but I get a type mismatch error. I tried to destring, but both variables are numeric, so that didn't work.
Sorry if this is too basic a question. Thanks,
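A type mismatch here usually means labforce is a numeric variable that only displays text through value labels, so it cannot be compared to a string literal. A minimal sketch of two hedged fixes follows; the numeric codes 2 and 0 are placeholders, not a statement of how this particular data set is coded, so check the tabulations first.
Code:
* Check which numeric codes carry which labels before recoding.
tab labforce
tab labforce, nolabel
* Option 1: compare against the numeric codes. The 2 and 0 below are
* placeholders; substitute whatever the tabulations above show.
gen inlaborforce = 0
replace inlaborforce = 1 if labforce == 2
replace inlaborforce = . if labforce == 0
* Option 2: decode the value labels into a real string variable and keep
* the original string comparisons.
decode labforce, gen(labforce_s)
replace inlaborforce = 1 if labforce_s == "Yes, in the labor force"
replace inlaborforce = . if labforce_s == "NIU"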
Create several nested regression tables in the same file with asdoc
Dear all,
I want to create several nested regression tables in the same file, using the reset option in asdoc. However, the created file won't open in MS Word. My code is as follows. If anybody has an idea of how I can do this, I would be very grateful.
Code:
* Table 1
asdoc reg y1 x1 x2, nest save(myfile)
asdoc reg y2 x1 x2, nest save(myfile)
* Table 2
asdoc reg y1 x3 x4, nest reset save(myfile)
asdoc reg y2 x3 x4, nest save(myfile)
Best,
Jishuang
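This is not a diagnosis of why the reset file ends up corrupted, but one hedged workaround is to skip reset altogether and send each nested table to its own file; table1 and table2 below are placeholder file names, and replace is used only to overwrite a file from an earlier run.
Code:
* Hedged workaround sketch: one nested table per output file, so the
* reset option is not needed at all.
* Table 1
asdoc reg y1 x1 x2, nest save(table1) replace
asdoc reg y2 x1 x2, nest save(table1)
* Table 2
asdoc reg y1 x3 x4, nest save(table2) replace
asdoc reg y2 x3 x4, nest save(table2)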
Model selection to study individual fund attributes' impact
Hi,
I have daily data for a large number of exchange-traded funds over 10 years. I am trying to examine the impact of fund attributes, such as expense ratio and size, on fund performance. Among the attributes, I have one independent variable that is time-invariant. Can you please advise on the following?
1. Which model is appropriate in this situation: Fama-MacBeth, fixed effects, or something else? (Fixed effects throws a collinearity error for the time-invariant variable.)
2. I have a lagged dependent variable on the right-hand side. Can I use system GMM?
3. Is it possible to use Fama-French factor models in this situation to get the risk-adjusted return?
Any thoughts are greatly appreciated.
Thank you!
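On the collinearity point in question 1, here is a minimal sketch, with made-up variable names (fund_id, date, perf, expense, size, style), of why the within estimator drops a time-invariant attribute and of one alternative that keeps it. It illustrates the mechanics only and is not a model recommendation for these data.
Code:
* Illustration with hypothetical variable names; not a model recommendation.
xtset fund_id date
* The fixed-effects (within) transformation demeans each fund's series, so a
* regressor that never varies within a fund is collinear with the fund
* effects and is reported as omitted.
xtreg perf expense size i.style, fe vce(cluster fund_id)
* Random effects keeps the time-invariant attribute, at the price of assuming
* it is uncorrelated with the unobserved fund effect.
xtreg perf expense size i.style, re vce(cluster fund_id)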
-xtlogit- Four-Way Dummy Variable Interaction Term Interpretation
I have a two-year panel data set of patients who all received a treatment at the end of year 1. The panel is split into eight 3-month periods, four before and four after. Patients are coded as having poor kidney function, diabetes, and cardiovascular disease, all as dummy variables. The dependent binary variable is 0 if they were not admitted to hospital in the quarter and 1 if they were. I am not interested in controlling for the diseases listed; I am interested in understanding how these different patient groups reacted to the treatment. If I code my treatment as a dummy variable, my input is:
Admitted to Hospital = age + sex + i.kidney#i.diabetes#i.cardiovascular#i.treatment.
I have run a random-effects xtlogit (with the or option), setting patient as the panel id and quarter as the time variable. My output is quite long, but I understand that all interpretations are relative to the base category. My output says that, compared to the base category of "no disease in the pre-treatment year", the odds ratio for my category of "no disease in the post-treatment year" is 1.33 (significant at the 5% level). Can I say that these patients were 1.33 times as likely to be admitted to hospital in each quarter in the year after treatment? I would deeply appreciate any help anyone has to offer.
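Without ruling on the "times as likely" wording, one hedged way to make a four-way interaction easier to read is to look at predicted probabilities with margins after the random-effects xtlogit. The variable names below (admitted, kidney, diabetes, cardio, treat, patient_id, quarter) are guesses based on the description above.
Code:
* Hedged sketch; variable names are assumptions based on the post.
xtset patient_id quarter
xtlogit admitted age i.sex i.kidney#i.diabetes#i.cardio#i.treat, re or
* Predicted probability of admission for every disease-by-treatment cell,
* with the random effect set to zero; often easier to compare than a long
* list of odds ratios against a single base category.
margins kidney#diabetes#cardio#treat, predict(pu0)
marginsplot, xdimension(treat)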
Having variables on the x-axis side by side, rather than in different groups
Hello all, this is probably a very basic question, but I am brand new to Stata and trying to teach it to myself (Stata 16).
I am trying to create a bar graph of frequency of BMI groups by sex.
I run the code:
graph bar (count), over(bmigrp) over(sex)
which gets the counts I want, but puts all the male results in one "window" of the graph and all the female results in the other.
Is there a way to get the results to be side by side for each category in different colours?
Thank you!
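One standard way to get each BMI group as its own coloured bar, sitting side by side within each sex rather than in separate panels, is the asyvars option, which treats the first over() group as separate y-variables; bmigrp and sex are the variable names from the post.
Code:
* asyvars turns the categories of the first over() group into separate bar
* series, so each BMI group gets its own colour and the bars sit side by
* side within each sex category.
graph bar (count), over(bmigrp) over(sex) asyvars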