Monday, September 30, 2019

Random treatment: loop 50 times, store the estimate and p-value for the variable of interest (POST), and display

Hi,

I want to randomly assign treatment and run the same regression in a loop 50 times, and then I want to see how many times my variable of interest (POST) is significant. It would be even better if I could store the estimate along with the p-value (or in star representation). I tried the following. Where am I going wrong? Any help is appreciated.


drop post
drop random
foreach i of numlist 1/50 {

generate random=uniform()
sort random
generate post=0
replace post=1 in 1/50

tsset id year

set matsize 800
set more off

reg agriculture post i.country_id i.year i.country#c.line_time_trend [aw=ypop], cluster( country_id )
est sto b_1

esttab b_1,keep( post ) se star(* 0.10 ** 0.05 *** 0.001)
est clear //estimates clear


matrix p = J(`post',1,.)
matrix rownames p = `y'
matrix colnames p = "p"

matlist p

// for the computation of the p-values
local i = 1
foreach var of local y {
reg `var' post
matrix p[`i++',1] = 2*ttail(e(df_r), abs(_b[post]/_se[post]))
}

//
matlist p
}
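A possible corrected version (an untested sketch, assuming the variables used in the regression above exist): the key changes are to drop and regenerate random and post inside the loop, to use a single loop counter, and to store the POST coefficient and its p-value in one matrix row per replication.

set matsize 800
set more off
tsset id year

matrix P = J(50, 2, .)
matrix colnames P = b_post p_post

forvalues i = 1/50 {
    capture drop random post
    generate double random = runiform()
    sort random
    generate byte post = 0
    replace post = 1 in 1/50
    quietly reg agriculture post i.country_id i.year i.country#c.line_time_trend [aw=ypop], cluster(country_id)
    matrix P[`i', 1] = _b[post]
    matrix P[`i', 2] = 2*ttail(e(df_r), abs(_b[post]/_se[post]))
}
matlist P
* the number of significant draws can then be read off the p_post column (e.g. entries below 0.05)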




Interpreting dummy variable with no constant in the model

Hi Statalist, I have a question about how to interpret dummy variables when there is no constant in the model. For example, in this cross-sectional regression:
stock return_i = α1·capital expenditure_i + α2·employee_i + α3·FIN_i + α4·CONS_i + α5·AGRO_i + ε_i
where capital expenditure and employee are ordinary continuous variables, and FIN, CONS and AGRO are industry dummies: FIN = 1 if the stock is in the financial industry and 0 otherwise, CONS = 1 if it is in the consumer production industry and 0 otherwise, and AGRO = 1 if it is in the agro industry and 0 otherwise. (Suppose there are only three industries in the market: financial, consumer production and agricultural.)
In this case I understand that there is no base (omitted) dummy category, so α3 can be interpreted directly as the coefficient for the financial industry (it does not need to be added to a constant). But what about capital expenditure: can I interpret α1 as the sensitivity of the "overall" stock return to capital expenditure, rather than the sensitivity for some base-category stocks?

Best regards,
Siraphop Swingthong

Replacing Contents of All Variables

I have a data set with students' grades in approximately 30 subjects. These are letter grades such as A*, A, B, ... I want to convert them into their equivalent numerical values, e.g. A* = 95, A = 90, B = 80, and so on. Can someone guide me on how to do that with a single command (or a loop) instead of using the following command for every variable and grade?

Code:
replace Accounting ="96" if Accounting == "A*"
Secondly, do I also need to convert the type of the variable from string to integer?
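One possible approach (untested sketch): loop over the grade variables and build a numeric copy rather than storing numbers in the string variables; the varlist Accounting-History below is hypothetical, so substitute the actual list or range of subject variables.
Code:
foreach v of varlist Accounting-History {
    generate byte `v'_num = .
    replace `v'_num = 95 if `v' == "A*"
    replace `v'_num = 90 if `v' == "A"
    replace `v'_num = 80 if `v' == "B"
    * ... and so on for the remaining grades
    drop `v'
    rename `v'_num `v'
}
This also answers the second question: rather than storing "96" as a string and converting with destring afterwards, it is usually cleaner to create numeric variables directly.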

ppml: Number of regressors excluded

Hi,

I am using the PPML estimator, but I am not sure what the following message means:

"Number of regressors excluded to ensure that the estimates exist: 59"

I appreciate any suggestion.



Smoothing Kaplan Meier Curves

Hello Stata Community,

I am a new Stata user and I am trying to compute smoothed Kaplan-Meier curves. Is this possible?

After creating my survival dataset with all appropriate variables, I run:
Code:
sts graph, by(kidneycan)
This produces Kaplan-Meier curves, but unfortunately the data center I work with will not let me release them unless they are smoothed, because every event (i.e., every drop in the graph) needs to be experienced by a certain number of people for confidentiality reasons. Smoothed survival curves still show the pattern by group but remove the specificity of when each drop occurs. Can someone provide syntax or instructions on how to smooth these curves?

I know how to compute smooth hazard functions but I'm really only looking to produce survival curves.

Thank you for your help.
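A possible sketch (untested and hedged): save the Kaplan-Meier estimates as a variable with -sts generate- and smooth them, for example with -lowess-. The new variable names and the bandwidth are arbitrary, and whether a lowess-smoothed curve satisfies the data center's confidentiality rules is something to confirm with them.
Code:
sts generate km = s, by(kidneycan)
lowess km _t if kidneycan == 0, nograph generate(sm0) bwidth(0.4)
lowess km _t if kidneycan == 1, nograph generate(sm1) bwidth(0.4)
line sm0 sm1 _t, sort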

Adding time variables

Hi,

I am currently working with time-use data. The data below show the time each person spends alone on each activity (25 activities; only 7 shown). There are missing values where the person is not alone for the given activity. I would like to see how much time each person spends alone in a day; essentially I am trying to add h1 + h2 + h3 + h4 + h5 + h6 + ... for each individual (140 individuals). Can someone please help me with this? Thank you.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(h1 h2 h3 h4 h5 h6 h7)
. . 1800000 . 1800000 . .
. 3600000 3600000 3600000 . 7200000 3600000
. 1.08e+07 1.26e+07 1800000 1.08e+07 3600000 1800000
1.08e+07 1800000 1800000 1800000 1800000 3600000 .
. 1800000 5400000 1800000 1800000 3600000 3600000
1.08e+07 1800000 1800000 1800000 7200000 . 3600000
. . . . . 3600000 7200000
7200000 . . 7200000 . 3600000 .

end
format %tchh:MM:SS h1
format %tchh:MM:SS h2
format %tchh:MM:SS h3
format %tchh:MM:SS h4
format %tchh:MM:SS h5
format %tchh:MM:SS h6
format %tchh:MM:SS h7

Cheers,
Aarushi
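A possible one-line approach (hedged): egen's rowtotal() sums across variables while ignoring missing values, and its missing option returns missing when every activity is missing. The range h1-h25 is assumed from the description (only h1-h7 appear in the excerpt), and the division converts Stata's millisecond clock units into hours.
Code:
egen double total_alone_ms = rowtotal(h1-h25), missing
generate double hours_alone = total_alone_ms/(60*60*1000)   // h* values are in milliseconds (%tc units)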

Syntax of cross-classified multilevel model

Dear Statalist,
I need to analyze data with complex structures.

My data set consists of individuals who are fully nested within teams; teams are cross-classified by functions and locations; and functions and locations are fully nested within firms.
(individuals) ---> (teams) ---> (functions) and (locations) ---> (firms)

I think I need a four-level cross-classified multilevel analysis, and I have the following tentative syntax:

Code:
xtmixed y x || firm: || _all:R.function: || _all:R.location: || team:

I hope someone can confirm whether it is right, or correct it if it is wrong. Thank you for helping me out.

Best,
Eunhee

Out of Sample/Scope Prediction with CI

I need some help to get more details for Examples 2&3 to work
(page 7 in regress postestimation — Postestimation tools for regress ) https://www.stata.com/manuals/rregre...estimation.pdf.

I need to add a few out-of-sample values of the independent variable after the regression and ANOVA analysis and then compute the predicted individual value and its prediction interval.

I used the following program to insert an observation into the dataset and prompt the user for an out of scope prediction:
Code:
capture program drop OutOfSamplePredict


program OutOfSamplePredict

*ask user for the out of sample dependent var
    display "This will create an out-of-sample prediction..."
        display ""
    display "Please enter your Dependent variable, for the Y Axis:" _request(dependant)
        display "Thank you, to verify:"
        display "The Dependent variable selected, for the Y Axis, is: $dependant"
    display "Please enter the INDependent variable used for your regression - the X Axis:" _request(INDependant_var)
        display ""
        display "Thank you, to verify:"
        display "The INDependent variable you entered is: $INDependant_var"
        display ""
    display "Please enter the value of the independent variable at which you wish to predict (out-of-sample value):" _request(INDependant_value)
        display ""
    display "Thank you, to verify:"
    display "The value you want a prediction for is: $INDependant_value"
    
*run the regression Quietly
    quietly regress $dependant $INDependant_var
    
* create a new observation
    quietly insobs 1
    
*insert the new Out of Sample observation that the user entered  into the INDependant variable 
    replace $INDependant_var = $INDependant_value in l

*create a local macro to store the prediction value

    local mypred = _b[_cons] + _b[$INDependant_var]*$INDependant_value

    
    display "your linear prediction equation : " _b[_cons] " + " _b[$INDependant_var] " * " $INDependant_value 
        display ""
    display "prediction value is = `mypred'"
        display ""
    display "writing your values to the dataset..."
        
        
    predict predict, xb
    predict residuals, residuals
    predict CooksDistance, cooksd
    predict StandardResiduals, rstandard
    predict Leverage, leverage
    
* insert the prediction value based on the regression equation
    replace predict = `mypred' in l

    
    
* generate leverage for the predicted value
/*
    predict temp_leverage in l, leverage      /* because predict can only make new variables create a temp variable  */
    replace Leverage = temp_leverage in l     /* replace the only created value into the replacement variable */
    drop temp_leverage                        /* drop the temporary variable */
*/

* generate Cooks Distance for the predicted value
/*
    predict temp_cooks in l, cooksd      /* because predict can only make new variables create a temp variable  */
    replace CooksDistance = temp_cooks in l     /* replace the only created value into the replacement variable */
    drop temp_cooks                        /* drop the temporary variable */
*/    
* generate Standard Error for the predicted value

    predict SE_Forecast in l, stdf      
    predict SE_Predict in l, stdp
    predict SE_Residual in l, stdr
    
* generate the Confidence interval for the out of scope prediction
    local 2SD = SE_Forecast[_N] * 2
    local UCL = predict[_N] + `2SD'
    local LCL = predict[_N] - `2SD'
    
    display "The Upper bound for your Confidence Interval is: `UCL'" 
        display ""
    display "The LOWER bound for your Confidence Interval is: `LCL'" 


    



end


I followed the examples but I got a different answer than our instructor who used "R". This is their code and answer:

> new.amount <- data.frame(pectin=c(1.5))
> predict(model,newdata=new.amount,interval="prediction")
      fit      lwr      upr
1 62.5725 53.19595 71.94905


Dataset

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float firmness byte pectin
 46.9 0
 50.2 0
 51.3 0
56.48 1
59.34 1
62.97 1
67.91 2
70.78 2
73.67 2
68.13 3
70.85 3
72.34 3
end
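For comparison, a minimal sketch (hedged; not the program above) that reproduces R's interval="prediction" output with the posted data. The difference from the program is that the prediction interval uses the t quantile times the standard error of the forecast (stdf) rather than 2 times it.
Code:
regress firmness pectin
set obs `=_N + 1'
replace pectin = 1.5 in l
predict double fit, xb
predict double se_f, stdf
generate double lwr = fit - invttail(e(df_r), 0.025)*se_f
generate double upr = fit + invttail(e(df_r), 0.025)*se_f
list pectin fit lwr upr in l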

IV

Dear all,

I have a question concerning a comment from a reviewer and I am not quite sure how to go about it. To keep it simple, I have a paper on an RCT (an intervention facilitating saving among farmers), and the reviewer says that I should do an IV version of it, using the (binary) treatment assignment as an instrument. This would mean taking the basic treatment-effect estimate and dividing it by 0.2 (i.e., multiplying it by 5), which would give the treatment effect among "compliers".

I am not sure even how to think and how to code this! I would be happy to have some thoughts about this issue.
Thanks!
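One possible way to think about and code this (a hedged sketch): if a variable records actual take-up of the saving intervention, the reviewer's suggestion is the standard local-average-treatment-effect (Wald) estimator, with random assignment instrumenting take-up. The variable names below (outcome, takeup, treat) are hypothetical placeholders.
Code:
* takeup = 1 if the farmer actually used the intervention (hypothetical name)
* treat  = 1 if randomly assigned to treatment (hypothetical name)
ivregress 2sls outcome (takeup = treat), first vce(robust)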

Data management problem

Hi

I am trying to analyze some data and have entered the commands reproduced below. However, when I get to the final loop, I get the error "==1 invalid name". I have run down quite a number of answers to similar problems, but I am not managing to find the cause. Help would be much appreciated; I have also attached the Excel file so that the commands can be reproduced.

// Importing data
import excel "Version3_20_06.xlsx", sheet("Form responses 1") firstrow

// Cleaning data

// Store-level variables
// Filling in store level variables
replace Retail_size="Small" if Retail_size=="small"
replace Retail_size=Retail_size[_n-1] if Retail_size==""
egen store_number=group(StoresTimestamp)
replace store_number=store_number[_n-1] if store_number==.
replace Shop_location=Shop_location[_n-1] if Shop_location==""

// Assigning numeric values to categorical variables
encode Retail_size, gen(shop_size)
encode Shop_location, gen(shop_location)
// Reordering variables
order store_number shop_size shop_location

// Dropping un-needed variables
drop StoresTimestamp Retail_size Shop_location


// Brand-level variables
drop Category_beverage
rename Cocacola_available cocacola_available
rename Cocacola_price cocacola_price
rename Coke_light_available coke_light_available
rename Coke_light_price coke_light_price
rename sprite_zeroavailable sprite_zero_available

//rename sprite_Size sprite_size
//rename BeverageSize stoney_size
rename cocacola_Size cocacola_size
rename Coo_ee_Price Coo_ee_price
rename CloverTropikajuicePrice clover_tropika_juice_price
rename cococola_promotion cocacola_promotion

//rename Stoney_zero_available stoney_zero_available
rename Stoney_zero_price stoney_zero_price
rename Stoney_zero_promotion stoney_zero_promotion

// Encode availability

foreach var in cocacola coke_light coke_zero sprite_zero sprite sparletta stoney stoney_zero tab_caffeine_free Coo_ee appletiser Oros halls energade red_bull_caffeine red_bull_sugarfree vitaminwater still_water clover60_fruit_juice liqui_fruitJ_juice clover_tropika_juice pure_milk flavoured_water {
encode `var'_available, gen(`var'_available_)
drop `var'_available
rename `var'_available_ `var'_available
}
// Generating size measure in litres
foreach var in cocacola coke_light coke_zero sprite_zero sprite sparletta stoney stoney_zero tab_caffeine_free Coo_ee appletiser Oros halls energade red_bull_caffeine red_bull_sugarfree vitaminwater still_water clover60_fruit_juice liqui_fruitJ_juice clover_tropika_juice pure_milk flavoured_water {
gen `var'_size_litres=.
replace `var'_size_litres=0.250 if `var'_size=="> 250ml"
replace `var'_size_litres=0.300 if `var'_size=="> 300ml"
replace `var'_size_litres=0.330 if `var'_size=="> 330ml"
replace `var'_size_litres=0.355 if `var'_size=="> 355ml"
replace `var'_size_litres=0.440 if `var'_size=="> 440ml"
replace `var'_size_litres=0.473 if `var'_size=="> 473ml"
replace `var'_size_litres=0.500 if `var'_size=="> 500ml"
replace `var'_size_litres=0.750 if `var'_size=="> 750ml"
replace `var'_size_litres=1 if `var'_size=="1L"
replace `var'_size_litres=1.25 if `var'_size=="1> .25L"
replace `var'_size_litres=1.5 if `var'_size=="1.> 5L"
replace `var'_size_litres=2 if `var'_size=="2L"
replace `var'_size_litres=2.25 if `var'_size=="2 > .25L"
drop `var'_size
}
// Generating price per litre
destring coke_zero_price, replace force float
destring sprite_zero_price, replace force float
destring red_bull_sugarfree_price, replace force float

foreach var in cocacola coke_light coke_zero sprite_zero sprite sparletta stoney stoney_zero tab_caffeine_free Coo_ee appletiser Oros halls energade red_bull_caffeine red_bull_sugarfree vitaminwater still_water clover60_fruit_juice liqui_fruitJ_juice clover_tropika_juice pure_milk flavoured_water {
gen `var'_price_litre=`var'_price/`var'_size_litres
}
// Re-shaping data for analysis
order store_number-flavoured_water_price_litre, alpha
order store_number shop_size shop_location
local i=1

foreach var in cocacola coke_light coke_zero sprite_zero sprite sparletta stoney stoney_zero tab_caffeine_free Coo_ee appletiser Oros halls energade red_bull_caffeine red_bull_sugarfree vitaminwater still_water clover60_fruit_juice liqui_fruit_juice clover_tropika_juice pure_milk flavoured_water {
preserve
if `i'==1 {
keep store_number shop_size shop_location `var'*
drop if `var'_size_litres==.
rename `var'_available available
rename `var'_price price
rename `var'_price_litre price_litre
rename `var'_size_litres size_litres
rename `var'_promotion promotion
gen brand="`var'"
keep store_number shop_size shop_location brand size_litres available price price_litre promotion
order store_number shop_size shop_location brand size_litres available price price_litre promotion
save "Data.dta", replace
}

if `i'>1 {
keep store_number shop_size shop_location `var'*
drop if `var'_size_litres==.
rename `var'_available available
rename `var'_price price
rename `var'_price_litre price_litre
rename `var'_size_litres size_litres
rename `var'_promotion promotion
gen brand="`var'"
keep store_number shop_size shop_location brand size_litres available price price_litre promotion
order store_number shop_size shop_location brand size_litres available price price_litre promotion
save "Temp/`var'.dta", replace
use "Data.dta", clear
append using "Temp/`var'.dta", force
save "Data.dta", replace
}
local i=`i'+1
restore
}


I can't seem to find a solution to the "==1 invalid name" error. I will greatly appreciate any possible help.

ci for categorical variables - tabci?

I'm just wondering why Stata gives CIs for binary variables (ci proportions varname) but, as far as I can tell, we cannot get corresponding results for categorical variables.

Code:
. sysuse auto
(1978 Automobile Data)

. ci proportions foreign

                                                         -- Binomial Exact --
    Variable |        Obs  Proportion    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
     foreign |         74    .2972973    .0531331         .196584    .4148353

. ci proportions rep78
no binary (0/1) variables found; nothing to compute
Am I missing something? Is it technically impossible, or do we just not have a user-written program to estimate these proportions with CIs in Stata?

If it can be done, and considering (maybe I'm wrong) that we don't have it so far, maybe this thread could encourage somebody to create, say, an SSC (or SJ) -tabci- ado-file.
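One possibly relevant built-in alternative (not -ci-): the -proportion- estimation command reports the proportion of each category of a categorical variable together with confidence intervals, although its intervals are not the exact binomial intervals that -ci proportions- reports.
Code:
sysuse auto, clear
proportion rep78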

Command stir does not accept iweights

Hello
I've been using the command stir (incidence-rate comparison). The manual says that stir allows iweights and fweights (more specifically, it says you can specify stset with iweights and fweights), but when I use iweights I get an error message saying that iweights are not allowed. Is this a known issue? Is there a particular reason for the command to prevent weighting the data with iweights, or is it a bug? I know that some commands do not allow all types of weights, but stip, a similar command that makes similar calculations but does not report confidence intervals, does not forbid the use of iweights.
Thanks for your help
Christophe



Replace if not found within a loop

Dear all,
I am trying to bottom- and top-code household income for each country-wave.
I am using the following loop, which I already used successfully with another data source:
levelsof country, local(c)
levelsof wave, local(w)
foreach country in `c' {
foreach wave in `w' {
sum hi100_equ [aw=new_weight] if country==`country' & wave==`wave', det
local bottom_hi=r(mean)/100
sum hi100_eur [aw=new_weight] if country==`country' & wave==`wave', det
local top_dhi=10*r(p50)
disp `bottom_hi'
disp `top_hi'
replace adjusted_hi=`top_hi' if hi100_eur>`$top_hi' & country==`country' & wave==`wave'
replace adjusted_hi=`bottom_hi' if hi100_eur<`$bottom_hi' & country==`country' & wave==`wave'
}
}
I want to replace incomes below 1% of the equivalised household income with that 1% value, and incomes greater than 10 times the median of the non-equivalised income with 10 times the median.
However, Stata returns the following error: "if not found".
I cannot understand what the error is; can anyone help me?
Thank you

endogenous switching regression model

Hello,
Please bear with me: I am new to Stata, so it is quite difficult for me to solve some regression problems.

Actually, I am trying to estimate an endogenous switching regression model using the movestay command, where my selection dependent variable is ADOPT (a dummy) and my outcome dependent variable is HDDS (continuous).
Unfortunately I keep getting the following errors. I tried adding the variables progressively, but I get one or the other of the messages below. The strange part is that the same command was run on a different computer using the same version of Stata, and it produced results without any errors.
I am getting desperate. Can someone help me solve this problem? Please.


global str GENDER AGE YEARSOFEDUC INCOMEO DISNEARESTMARKETKM TENURE CREDITACCESS

. movestay(HDDS=$str),select(ADOPT=$str MEDIA)

Fitting initial values .....
initial: log likelihood = -<inf> (could not be evaluated)

could not find feasible values
r(491);


or


movestay HDDS INCOMEO AGE AGRICEXPERIENCE ,select(ADOPT=MEDIA)

Fitting initial values .....initial vector: copy option requires either a matrix or a list of numbers
r(198);

-dstdize- to calculate average age-adjusted death rate by race

Good Morning,

I would like to calculate the average age-adjusted death rate by race (separately for whites and blacks), and test if the rates for countries are statistically different. As an example, I have modified Stata's mortality data file as follows:
Code:
clear
input str6 nation byte age_category race_category float(population deaths)
"Sweden" 1 1 3100000  3000
"Sweden" 1 2 45000  523
"Sweden" 2 1 3000000  10000
"Sweden" 2 2 57000  928
"Panama" 1 1 700000  300
"Panama" 1 2 41000  904
"Panama" 2 1 270000  1000
"Panama" 2 2 75000  421
end
label values age_category age_lbl
label def age_lbl 1 "0 - 29", modify
label def age_lbl 2 "30 - 59", modify
label values race_category race_lbl
label def race_lbl 1 "White", modify
label def race_lbl 2 "Black", modify
Based on my understanding of the -dstdize- command, the following code will give me the average death rate by race for each country.
Code:
dstdize deaths population age_category, by(nation race_category)
I have the following two questions:
1. In my study, I have about 300 countries. Hence, I would like to present an age-adjusted rate for blacks and whites (averaged across all countries). I just want to make sure that the following code will help me achieve this?
Code:
dstdize deaths population age_category, by(race_category)
2. Is there a way for me to test if the death rate for whites in Sweden is statistically different from that for whites in Panama?

I am using Stata MP 14.2 for Windows.

Many thanks in advance,
Caroline

Can I test country level data and firm level data in one regression

Hello everyone,

I'm trying to find the effects of culture on the existence of teamwork. My dataset is cross-sectional and consists of 29 countries and around 30,000 companies.

Dependent variable
Team = a dummy variable that equals 1 when there is teamwork present in the company. (Firm level)

Independent variables
Individualism = A national cultural score which is defined on country-level (range 1-100).
Masculinity = A national cultural score which is defined on country-level (range 1-100).
Power = A national cultural score which is defined on country-level (range 1-100).
Female = The percentage of females in the company used as a proxy for altruism (Firm level)

Control variables
Size = This variable equals the amount of employees in the company (Firm level)
Sector = 0-5 depending on type of sector (dummy). (Firm level).

What is the best way to analyze this? Using a regression at the country level with the averages of the firm-level variables comes at the cost of the number of observations (29 instead of 30,000).
To be able to analyze this at the firm level, I have to make the assumption that all cultural scores are exactly the same for each company in the same country, which feels like a strong assumption. However, I would then have many more observations (30,000 instead of 29).

What is the best way to analyze this matter?

Thank you in advance and best regards,
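One possible way to keep the 30,000 firm-level observations while acknowledging that the cultural scores vary only across the 29 countries (a hedged sketch, using the variable names as described above; adjust to the actual names): a mixed-effects logit with a country random intercept. Clustering standard errors by country in an ordinary logit is another common option, although with only 29 clusters the cluster-robust approximation can be questionable.
Code:
melogit team individualism masculinity power female size i.sector || country: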

Generating a new variable that equals the Nth non-missing value in a row

Hi All,

I am essentially trying to compress a row of variables into just the non-missing entries. My dataset looks like this

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(q5__1 q5__2 q5__3 q5__4 q5__5 q5__6 q5__7 q5__8 q5__9 q5__10)
. 2 . . . . . . 0 0
1 . . . . . . . 0 0
. . . . . . . . 0 0
. . . . . . . . 0 0
. . . . . . . . 0 0
. . . . . . . . 0 0
. 2 . . . . . . 0 0
1 . . . . . . . 0 0
. . 3 . . . . . 0 0
. . . 4 . . . . 0 0
. . . . . . 7 . 0 0
end
So I want to generate var_a = the first non-missing value from each row; that would be, basically, 2, 1, ., ., ., ., 2, 1, 3, 4, 7 for the observations above. If there are more non-missing values in a row, then the next variable var_b would include them, and would be missing otherwise.

Any ideas?

Many thanks!

Chris
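A possible sketch (hedged): loop over the q5__* variables from the last to the first, pushing values along so that var_a ends up holding the first non-missing value in each row and var_b the second. As written it treats any non-missing value, including the 0s in q5__9 and q5__10, as a value of interest; the comment shows where to exclude zeros if that matches the intended output above.
Code:
generate byte var_a = .
generate byte var_b = .
forvalues j = 10(-1)1 {
    * to ignore zeros, add   & q5__`j' != 0   to both conditions below
    replace var_b = var_a   if !missing(q5__`j') & !missing(var_a)
    replace var_a = q5__`j' if !missing(q5__`j')
}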


working out expressions(sum and multiply) in stata with some missing observations

Hi Nick Cox


I want to add and multiply certain variables in Stata into a new variable VaR. My expression is:

gen VaR=constant_term+ (coeff_L_Tbills* L_Tbills)+ (coeff_L_Inf* L_Inf)

Now the issue is that, for rows where L_Inf is missing or 0, the whole outcome VaR of the equation becomes zero, which it should not be when calculated by hand.

How can I amend the above code to get the correct result?

Best Regards
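A possible amendment (hedged, since it depends on what the intended result is when L_Inf is missing): in Stata any arithmetic involving a missing value yields missing, so if a missing L_Inf should simply contribute nothing to the sum, cond() can substitute a zero for it:

gen VaR=constant_term+ (coeff_L_Tbills* L_Tbills)+ (coeff_L_Inf* cond(missing(L_Inf), 0, L_Inf))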


Two-stage IV approach

bootstrap "repeated time values within panel" 2 SRI

I'm trying to run a 2SRI approach (control function) in Stata 15. My first stage is a probit regression and the second stage is a Tobit (using the xtdpf code from the website).

My first stage code is as:
global Z "a b c d "
probit X IV $Z i.year i.industry, r
predict `Xhat'
matrix `a1'=e(b)
gen Xuhat=X-`Xhat'

My second stage code is :
global Z "a b c d"
xtdpf Y X $Z Xuhat, timedummies

The problem now is that in the probit model I control for year and industry fixed effects but I cannot control for firm fixed effects, while in the second stage I control for both year and firm fixed effects. My supervisor said that in this way the estimates are inconsistent. However, my instrumental variable is essentially time-invariant (it does not change every year), and probit cannot include firm fixed effects. Does anyone know how to solve this problem? Please help me; I will appreciate any help!

Looping over the numeric value of a variable and using it in a subsequent loop

Hi, thanks for your help in advance.
I have the following dataset. And I come up with the following code:

Code:
egen t=max(dateday_count)
forvalues i=1/762{
gen k_`i'=.
gen ref_`i'=.
}

levelsof dateday_count, local(levels) 
foreach i of local levels {
        di(`i')
        quietly{
        forvalue x=1/`i'{
        bys permno (dateday): replace k_`x'=adjvol[_n-`x']*exp(total_log_rev_turnover-total_log_rev_turnover[_n-`x'+1]) if dateday_count==`i'
        by permno: replace ref_`x' = adjprc[_n-`x']*k_`x' if dateday_count==`i'
        }
}
}

In words, what I am trying to do is loop over the value of the variable dateday_count, and for each row do
Code:
forvalue x=1/"dateday_count"{
bys permno (dateday): replace k_`x'=adjvol[_n-`x']*exp(total_log_rev_turnover-total_log_rev_turnover[_n-`x'+1]) 
by permno: replace ref_`x' = adjprc[_n-`x']*k_`x'
}
More specifically, for row 12 for permno 10001, I want to do
Code:
forvalue x=1/12{
bys permno (dateday): replace k_`x'=adjvol[_n-`x']*exp(total_log_rev_turnover-total_log_rev_turnover[_n-`x'+1]) 
by permno: replace ref_`x' = adjprc[_n-`x']*k_`x'
}
I am moderately confident that my code does the job. However, I am wondering whether there is a faster way to loop over the numeric value of dateday_count directly and use it as the looping bound, so that I do not need the "if dateday_count==`i'" qualifier.



In the dataex below I attached a sample from a single permno only, so you need to remove the "bys permno" part of my code to run it on this excerpt.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(permno dateday) float(adjvol adjprc total_log_rev_turnover) double dateday_count
10001 10595     0 2.1041667          0   1
10001 10596  7200 2.0833333          0   2
10001 10597  6150 2.1666667          0   3
10001 10598  2100 2.0416667 -1.2086598   4
10001 10601   600 2.1666667 -1.4323044   5
10001 10602 17250 2.1666667 -1.4323044   6
10001 10603  1500 2.0833333 -2.1274576   7
10001 10604   330 2.0833333  -2.244239   8
10001 10605  1500 2.1666667  -2.939392   9
10001 10608   300 2.1666667 -3.0449755  10
10001 10609  3000 2.0833333 -3.0449755  11
10001 10610  1500 2.1666667  -3.740129  12
10001 10611     0     2.125  -3.740129  13
10001 10612  1500 2.0833333  -4.435282  14
10001 10615   600 2.1666667  -4.658927  15
10001 10616  2982 2.0833333  -10.17839  16
10001 10617   390 2.0833333 -10.317952  17
10001 10618  3300 2.1666667 -10.317952  18
10001 10619     0     2.125 -10.317952  19
10001 10622 17700     2.125 -10.317952  20
10001 10623  7575 2.1666667 -10.317952  21
10001 10624   600 2.1666667  -.2236447  22
10001 10625  6900 2.1666667  -.2236447  23
10001 10626 13500 2.1666667  -.2236447  24
10001 10629 17100 2.1666667  -.2236447  25
10001 10630  5400 2.0833333  -.2236447  26
10001 10631 16200     2.125  -.2236447  27
10001 10632  6000     2.125  -.2236447  28
10001 10633     0     2.125  -.2236447  29
10001 10636  3600 2.0833333  -.2236447  30
10001 10637  3210 2.1666667  -.2236447  31
10001 10638  2100 2.1666667 -1.4323044  32
10001 10639 10080 2.1666667 -1.4323044  33
10001 10640 12816     2.125 -1.4323044  34
10001 10644  8400 2.1666667 -1.4323044  35
10001 10645 12000 2.1666667 -1.4323044  36
10001 10646     0 2.2083333 -1.4323044  37
10001 10647  3000 2.1666667 -1.4323044  38
10001 10650   900      2.25 -1.7898387  39
10001 10651  3300      2.25 -1.7898387  40
10001 10652  3900      2.25          0  41
10001 10653  1200     2.125  -.5121625  42
10001 10654  1350      2.25 -1.1116405  43
10001 10657  3000      2.25 -1.1116405  44
10001 10658 11400      2.25 -1.1116405  45
10001 10659   900      2.25 -1.4691747  46
10001 10660  7800      2.25 -1.4691747  47
10001 10661   450 2.2916667 -1.6320474  48
10001 10664  3600 2.1666667 -1.6320474  49
10001 10665  2550 2.1666667 -3.5405884  50
10001 10666   300 2.2916667 -3.6461716  51
10001 10667  4800 2.2916667 -3.6461716  52
10001 10668  1200 2.2083333 -4.1583343  53
10001 10671  4800 2.2083333 -4.1583343  54
10001 10672 13500 2.2083333 -4.1583343  55
10001 10673  5160 2.3333333 -4.1583343  56
10001 10674  2700      2.25   -6.47912  57
10001 10678  1500 2.3333333  -7.174273  58
10001 10679   600 2.3333333  -7.397918  59
10001 10680     0 2.2916667  -7.397918  60
10001 10681     0 2.2916667  -7.397918  61
10001 10682  1500      2.25  -8.093071  62
10001 10685     0 2.2916667          0  63
10001 10686  4950 2.3333333          0  64
10001 10687   900 2.3333333  -.3575342  65
10001 10688  5400 2.3333333  -.3575342  66
10001 10689  1050      2.25  -.7893968  67
10001 10692  4650 2.3333333  -.7893968  68
10001 10693  5400      2.25  -.7893968  69
10001 10694  1500      2.25   -1.48455  70
10001 10695  3900      2.25   -1.48455  71
10001 10696 10800  2.416667   -1.48455  72
10001 10699  9000  2.416667   -1.48455  73
10001 10700  5940 2.2916667   -1.48455  74
10001 10701  1980 2.2916667 -2.5672574  75
10001 10702  1500  2.416667 -3.2624106  76
10001 10703  1500  2.416667  -3.957564  77
10001 10706  2100  2.416667  -5.166224  78
10001 10707  5400       2.5  -5.166224  79
10001 10708  2400       2.5   -6.78371  80
10001 10709  1500       2.5  -7.478863  81
10001 10710 23250  2.416667  -7.478863  82
10001 10713 14910     2.375          0  83
10001 10714  7800  2.416667          0  84
10001 10715 84111       2.5          0  85
10001 10716 19350  2.416667          0  86
10001 10717     0  2.458333          0  87
10001 10720  8550       2.5          0  88
10001 10721  3900  2.416667          0  89
10001 10722 17499     2.375          0  90
10001 10723   900     2.375  -.3575342  91
10001 10724  3000  2.416667  -.3575342  92
10001 10727   900       2.5  -.7150685  93
10001 10728  2700  2.458333  -3.035854  94
10001 10729     0  2.395833  -3.035854  95
10001 10730  3900 2.3333333  -3.035854  96
10001 10731   600  2.416667  -3.259499  97
10001 10734     0     2.375  -3.259499  98
10001 10735  9600 2.3333333  -3.259499  99
10001 10736   600  2.416667  -3.483144 100
end
format %d dateday

Marginal effect after heckpoisson

Dear All,

I am new to Stata and I am using heckpoisson, a Poisson regression model with endogenous sample selection, to estimate the factors influencing access to extension services and the intensity of access. I already have the model results, but I have no clue how to estimate the marginal effects (dydx) after heckpoisson. I need your suggestions or help.

Below is the model output:

Poisson regression with endogenous selection Number of obs = 293
(25 quadrature point s) Selected = 141
Nonselected = 152
Wald chi2(9) = 39.34
Log pseudolikelihood = -397.3723 Prob > chi2 = 0.0000
------------------------------------------------------------------------------------------
| Robust
ext_freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
ext_freq |
age_hhh | -.0040843 .0034685 -1.18 0.239 -.0108824 .0027139
gender_hhh | -.2542102 .0785204 -2.82 0.005 -.3754374 -.067643
hh_size | -.0046676 .0156446 -0.30 0.765 -.0353305 .0259953
dist_mkt | .0075753 .013998 0.54 0.590 -.0198998 .0349712
own_cash_crop | .1196532 .0692318 1.73 0.084 -.0160386 .2553451
input_acces0 | .2543907 .0800315 3.05 0.002 .0870918 .4008095
aware_extension_services | -.2671813 .0863115 -3.15 0.002 -.4409838 -.1026488
access_2crdt | .07612 .0672013 1.14 0.253 -.0549509 .2084733
farm_size1 | .0021675 .0178802 0.15 0.881 -.032373 .0377159
_cons | 1.263667 .2010278 6.15 0.000 .8426601 1.630675
-------------------------+----------------------------------------------------------------
access_to_extension |
hh_size | -.051465 .031816 -1.62 0.106 -.1138233 .0108933
dist_mkt | -.0452002 .0317816 -1.42 0.155 -.107491 .0170905
own_cash_crop | -.0665395 .161095 -0.41 0.680 -.3822798 .2492008
input_acces0 | -.0579019 .1757906 -0.33 0.742 -.4024451 .2866414
aware_extension_services | .6452373 .1814888 3.56 0.000 .2895259 1.000949
access_2crdt | .3881123 .1606434 2.42 0.016 .0732571 .7029675
sell_rice | .3996281 .2191917 1.82 0.068 -.0299798 .8292361
farm_incom_hh | .0000456 .0000128 3.56 0.000 .0000205 .0000707
crop_dive | -.3390377 .3519390 -1.06 0.288 -.9650316 .2869562
main_inc_source | -.0995744 .1876322 -0.53 0.596 -.4673268 .268178
_cons | -.1595383 .4452855 -0.35 0.730 -1.026583 .7189037
-------------------------+----------------------------------------------------------------
/athrho | 1.515521 .248586 6.54 0.000 1.137901 2.11234
/lnsigma | -3.783098 .9520586 -4.03 0.000 -5.703099 -1.971098
-------------------------+----------------------------------------------------------------
rho | .9636253 .0322574 .8137061 .9711619
sigma | .0261155 .0222056 .0033356 .1393039
------------------------------------------------------------------------------------------
Wald test of indep. eqns. (rho = 0): chi2(1) = 33.01 Prob > chi2 = 0.0053

My regards,

T. Sumo
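A possible starting point (hedged): heckpoisson is an official estimation command, so -margins- should be available after it; check help heckpoisson postestimation for which prediction (the predict() option of margins) you want, e.g. the expected count.
Code:
* after refitting the heckpoisson model shown above
margins, dydx(*)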

Numbering occurrences of a flag within patients.

In the example below I have created OVERLAP_FLAG, which is equal to 1 where two consecutive regimens overlap and their start dates are within 28 days of one another.

My plan is to number each group of flags with a view to merging each overlapping group of regimens together.

As you can see from the example below, this is complicated by the fact that some patients have multiple groups of overlapping regimens.

Is there a straightforward way of numbering each group distinctly within each patient? For example, so that the first three flags form a group coded 1 and the later solitary flag becomes 2, etc. This would be easy if there were a variable on which to sort the flags.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long id double regimen float(start_date end_date regimen_overlap overlap_flag)
9942  222346 20577 20587   . .
9942  222347 20577 20605   0 1
9942 1194964 20591 20643   0 1
9942 1194965 20591 20628  14 1
9942  222349 20741 20752 150 .
9942  222348 20763 20842  22 .
9942  222350 20766 20850   3 1
end
format %td start_date
format %td end_date
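A possible sketch (hedged, and it assumes that sorting within patient by start_date is acceptable): start a new group whenever overlap_flag switches on after a gap, then take a running sum within patient.
Code:
sort id start_date
by id: generate byte newgroup = overlap_flag == 1 & overlap_flag[_n-1] != 1
by id: generate group_num = sum(newgroup) if overlap_flag == 1
drop newgroup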

Sunday, September 29, 2019

Generate a new variable in a loop in different files and calculating the mean

Hi,

I want to write a loop that generates a new variable called savings (income minus expenditure) in two separate files, uses the sample weights (variable wht) to calculate its mean in each file, and then computes a combined mean of the two means. How can I do this with a loop?

Any help is appreciated.
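A possible sketch (hedged; file1.dta and file2.dta are hypothetical file names, and the combined mean below pools the two file means weighting each by its sum of weights, which may or may not be the combination wanted):
Code:
local num 0
local den 0
foreach f in file1 file2 {
    use "`f'.dta", clear
    generate double savings = income - expenditure
    summarize savings [aw=wht]
    display "`f': weighted mean savings = " r(mean)
    local m = r(mean)
    quietly summarize wht if !missing(savings)
    local num = `num' + `m'*r(sum)
    local den = `den' + r(sum)
}
display "combined mean = " `num'/`den'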

How to export summary statistics to Word

Dear Sir/Madam,
I am working on my thesis on the impact of informal credit on household welfare. I am trying to generate summary statistics with the command
"sum heads_child girl_cnt age_cnt if f07_observed==1 & treatment==1"
and I used the command "outreg2 using myresul.doc, replace keep( heads_child girl_cnt age_cnt ) eqkeep(Mean) dec(3)" to export them. But Stata returns "==1 invalid name". Could you please help me?
Thank you in advance.

Adding partner data to responding person data in a panel data file

I wish to extend my analysis of panel data to include information about the responding person's partner, so I began the process of merging this extra data to my original (Responding Person) data file. The variables of interest are the same (e.g. age, sex, educ, inc, emp, ...).

I collect the data from two separate data files of the HILDA survey: the 'Responding person' data from the "Combined" data file (I think this is correct) and the 'Partner' data from the 'Rperson' data file (where I run a loop to append selected vars for all waves). I believe the 'unique identifier' is xwaveid (destringed), which I obtain for the responding person data. Based on HILDA suggested code (https://melbourneinstitute.unimelb.e...rogram-library) I use the 'unique identifier' hhpxid for the partner data and then rename it to xwaveid.

There are about 300,000 obs in the responding person data and about 150,000 obs for the partner data. When I merged these two data files it showed I had only about 150,000 obs, which appears as though I have unintentionally deleted all non-partnered 'responding person' obs. I want information on all responding persons and about those with partners. I would appreciate assistance with the code to ensure I am not unintentionally deleting obs from the new data file. My code follows:

Code:
clear all
set memory 1g
set maxvar 32000 
set more off

// Merge 'rperson' file with 'partner' file

use "C:\data\rpersondata.dta", clear         
merge 1:m id wave using "C:\data\partnerdata.dta"                 

order id wave
sort  id wave
tsset id wave

save "C:\data\rperson-partner", replace
.

I appreciate any help with the process and data.

Matching observations across boundaries

I am looking at houses on each side of a designated geographical boundary.

Does anyone know how to match house characteristics on either side of the boundary?

I have the following variables:
boundary - identifies which boundary the house belongs to.
treatment - identifies which houses are on the treated/control side of the boundary
bedrooms - number of bedrooms
baths - number of bathrooms
parking - number of car spaces

I have only been able to identify whether duplicates exist at the boundary level (see the following command). I am unsure how to break this down and compare each side of the boundary.

Code:
 sort boundary bedrooms baths parking
quietly by boundary bedrooms baths parking: gen dup = cond(_N==1, 0,_n)
drop if dup==0

Essentially, I aim to ensure that if a house with x bedrooms, x baths and x parking spaces exists in the control group for boundary x, then a house with x bedrooms, x baths and x parking spaces exists in the treatment group for boundary x.
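A possible sketch (hedged): rather than counting duplicates, flag each combination of boundary and house characteristics that appears on both sides, which identifies the houses that have an exact counterpart across their boundary; keep if matched then retains only those houses.
Code:
egen byte has_treated = max(treatment == 1), by(boundary bedrooms baths parking)
egen byte has_control = max(treatment == 0), by(boundary bedrooms baths parking)
generate byte matched = has_treated & has_control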

F-test for coefficients

Hello everyone,

My model is: Y = b0 + b1*x1 + b2*x2 + b3*x3 + e,
and I use GMM to estimate this model.

I want to test b1 + b2 < 0 with an F-test.
What is the command for this, and how do I read the result?

Thank you so much.
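A possible sketch (hedged: the coefficient names depend on the estimator, and after -gmm- they may need an equation prefix such as _b[eqname:x1]; also, -test- and -lincom- give two-sided Wald tests, so for the one-sided hypothesis b1 + b2 < 0 a common practice is to check that the estimated sum is negative and halve the reported p-value):
Code:
lincom x1 + x2
test x1 + x2 = 0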

Panel data with three identifiers

Dear All,

I am currently working with a country-year panel dataset which has data for value added for multiple sectors. It looks as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long country int year double(agriculture manufacturing mining utilities construction)
5 1970   981.375378691819 6083.8882699607275  3314.377986731164 506.70685027966016 2006.7123877789525
5 1971 1047.3785337633847 6916.5025883226235  3380.606676519309  574.0725008948751 2235.0557453082833
5 1972 1004.0187968550568  7110.704855608108 3309.9627407452876  656.0828581655715 2027.9986329723647
5 1973  857.5592410758162  6647.521287082387 3364.4174412377624  675.1209768176975 1788.0445962466267
5 1974 1087.0186460911143  6477.841145292657  4110.573046865174  710.5728047619141  2257.390075704935
5 1975 1134.5517052620596    4828.1632395765  3648.174427027457  683.7475882841235 1669.5575805624069
5 1976 1116.2193959399322 5118.6251140941695  4092.781502010798  723.6899811012743 1393.6386137599113
5 1977 1234.8119779118867   5553.11987047624  4204.795616042943  765.4049653156358  1381.483073306251
5 1978 1189.0248529620965  6067.751221019465  4271.604682434886  816.9282885945638 1492.8891916388243
5 1979  1262.397738606136  6547.328358169937  4500.897755405062  872.4694857071697 1849.1763433825088
5 1980 1310.8037648876589  6951.561122736998  4734.184848648669  916.3115795981843  2291.850442816282
5 1981   1361.21761552351  7129.405790177698  5098.548425411251  935.1010484086189 2775.4757319337446
5 1982 1351.3967355295133  5635.297596050137  5431.141386362448  935.6919122076893  2122.026921332625
5 1983 1323.0689527912727  5809.858704261644  5317.674901321783  974.4525774266994 1817.1942903442387
5 1984   1441.57423805217   6326.53118621761  5608.693742154081  1043.938160197364 1857.7914351603458
5 1985  1543.100313012334  6496.477562539438   5791.14785009947 1083.7623802547005  2184.456832982976
5 1986 1660.7047304139123  6991.434529896969  5844.194450763241 1150.0663227014993 2397.3264521622946
5 1987 1818.8784210569606  7360.349185204471  5825.051991719084 1212.8607937144413  2616.440559550564
5 1988 2028.3106346387979  8008.985307139801  6281.010706743252  1281.282952848104 2841.1835013807054
5 1989  2155.034353611419  8886.108881625602  6770.549692753161  1180.786731574298 3157.5709039675676
5 1990   2351.34021390641  8973.802928950769  6833.363549901618 1077.8464141233414 3474.1928413228406
5 1991 2408.4364500945303  9452.749982148323  7679.688653679325 1368.4682503263523  3421.012082585575
5 1992 2697.5166095917593 10531.674004308923  7569.027171831896 1745.8993692111267 3887.8242222622434
5 1993 2781.7553785478876 11295.766898195656  7556.805316858913 1830.5277660732788  4800.750808303282
5 1994  2991.578662240602 11756.838693410447  8228.939157081648 1944.1343699178508  4749.886080404079
5 1995 3198.4717539431185 12642.568025900755  8996.785292525166  2092.459673304508   5221.02245487333
5 1996  3288.185504451532 13048.903995935294 10414.844340025093  2013.910182318858  5670.742873302024
5 1997 3400.8444205103046 13666.359953230405 11589.309516818577 2180.6136392506082  6026.200400763627
5 1998 3480.0746719699564 13352.125643752064 12549.560396035524 2276.8215444742495  6139.222709566625
5 1999  3507.691666495815 13283.739759050579 13878.610178440074 2168.7028528576793  5533.019757328133
5 2000  3739.033132780806 13936.441006213818 14323.552454900651 2374.8681490421977  5493.928485426047
5 2001  4020.677532508271  14022.11307099687 15135.987692954339 2410.9514914702045  5719.199924726687
5 2002  4295.359875214949 14287.256919868087 14496.674452839698  2490.106139597811  5862.724799319365
5 2003  4407.591009192732 14756.880720587662 15295.681421215628  2597.848939476505     6116.296122638
5 2004  4893.195674290561  15787.57945427821 16229.215748930521 2669.7927909571977 6314.7171296878905
5 2005   5235.52366773901 16727.216355640525 15597.436348861484 2750.7696314974487  6953.399167622369
5 2006  5445.435190788036 17387.421512458182 15702.659249244005  2958.520403792658  7228.817939932632
5 2007  5496.648019910568  17914.92334467639  16222.68096332891 2105.2513241878996  7561.054759264848
5 2008  5718.895309529531  18164.10214994937 15307.885730809367 2067.1852711011134  8328.307975440213
5 2009  6168.858138719481 19665.531230700435   14887.4184368782 2185.9714380755613  9188.567039247773
5 2010  6643.054616967439 20749.751555815412 20625.515807991615  1717.334059385173  9910.352150850182
5 2011  6372.830277868871 21521.695290676136 22306.019697880554 1652.4533003991382 10074.898295916159
5 2012                  .                  .                  .                  .                  .
6 1970  68821.93786184487 17369.873736774083  3526.515925579863 2041.1034930415956  6478.822823366367
6 1971  70154.88059311402 19806.013382866688  4021.112799977039  2327.370003468182  7370.674429313272
6 1972  69529.11852637665  21228.31016384425  4309.874383683857 2494.5016114342784  7219.073877877839
6 1973  76405.62475426005 23149.738455664396 4699.9720650859945  2720.285290549922 7473.1613642627735
6 1974  79672.18431101153 23478.436567610952  4766.705948363077  2758.910030983694  7967.122989619162
6 1975  81298.14725613421 27884.128940155526   5661.17096005116  3276.615238697974  9242.602076124318
6 1976  79860.65545789216  27204.02823429808  5523.093619562108  3196.697793851682  9657.891406608484
6 1977  78141.54154392579 31377.195195268836  6370.350195573367  3687.079347003094  9824.914960944541
6 1978  81482.31652129904  36914.34728855157  7494.529641851021  4337.740408238934  9766.317058592982
6 1979  86480.53106427223  40110.23922306626  8143.375106952908 4713.2840817148735   9959.39087194243
6 1980  85198.03664727035  45186.85969769561   9174.05519266953 5309.8288776436475 12622.200547720255
6 1981  91147.25782712108  45972.85443251355  9333.631651989963  5402.189744686262     13022.49539885
6 1982 101655.69523825926 48626.426400488635  9872.372689866636  5714.006347095089  13467.98482994601
6 1983 110119.28092506662 53353.591315590245 10832.104614725418   6269.48723121748  15766.45203965875
6 1984  124303.5092001832  61278.47839969475 12441.053587759634  7200.726856470508  17477.38970981009
6 1985  126595.4729153697   72436.9164726229  14706.49374956509  8511.935404827627 21357.667218341994
6 1986 130796.37675991641  79421.29703643244 16124.496531959472  9332.657753277526  24747.26071581163
6 1987 136949.46533222307   89939.2949550645  18259.91143061495 10568.609298859183 29169.873183938675
6 1988 140433.74440328806 102753.77047817009 20240.274289695364 12444.517907813512  31500.62223648442
6 1989 144752.02644349143 102726.90669706423 18585.498872444587  12045.92373388096 28840.598387041602
6 1990 155359.30865097183 103857.54312662853 16009.664171667959 10333.248612250309 29182.785921071838
6 1991   159087.684325992   117380.464430292  17545.75458490957 11342.875186746853 31974.064723389914
6 1992  166566.1122787945 145099.27530431503 18292.937257386733  12552.35200495757 38697.583006345696
6 1993 174397.55919085976 179038.58372590304 21755.108379221816 12998.873510122829  45662.19714692269
6 1994 181374.32862260463  229234.3566415063 25124.183578975397 13543.017164438932 51906.919430940674
6 1995  190444.4385496261 293532.53271976346   34191.0874256793  18450.14721820064 58341.801828012125
6 1996 200158.65324442572  333720.0491866298 36820.182114367555 18924.544391895302  63306.33906522496
6 1997  207163.2925940071 368590.35520958534   41277.9203685271 23685.839451496056  64963.44862183223
6 1998  214412.5678890425  400340.8864532194  42821.50801665309  28977.71925394508  70821.05924713468
6 1999 220416.11978993574  430023.4113546913  47481.79707818456  34844.40065579689  73853.43390707385
6 2000  225706.0779279122  466665.6071827977 58842.837490595186  36979.48239400304  78040.35211055088
6 2001 232026.00857688286 509624.91748463066  58860.59277227855 42771.547508042335  83334.05398435422
6 2002  238754.8214577816  560306.9317601209  61623.92582143337  50271.42957991981  90657.22914538029
6 2003 244723.57627283974  631506.6268613825  70621.67376257626  55785.21351099297 101611.77245709402
6 2004 260141.22718433995  670682.3291969394  98865.51429976773   75601.4894113703  109872.8095578558
6 2005   273748.487774046  748324.4711984446  110310.7693275924  84353.56371073547 126585.04526160768
6 2006  287435.9121627484  870334.4883952552 121780.59467706362  72293.79046379843   147239.290461974
6 2007    298071.04091277 1004849.0825141772 133722.59182882542   84434.0867865947 170104.38001878318
6 2008 311708.75538201455 1110095.2136394589 147728.46164443574  93277.56500040894 186523.36037079184
6 2009  324749.2434879906 1207014.3575204427 160626.19857142656 101421.35449888217 221133.34201765328
6 2010 338620.23225162644 1352625.1903939417 180003.69346994383  113656.5427613304 250916.70350204606
6 2011                  .                  .                  .                  .                  .
6 2012                  .                  .                  .                  .                  .
7 1970   4166.08956278226  6096.705315576166 1867.8060020608802  1195.187578990279  2933.284089633269
7 1971  4267.720071074892  6612.174997849354 1781.8200480018113 1306.7384196960384  3144.597708073326
7 1972  4515.515804777153  7215.752117827384   1659.21044684351 1471.4087083569211  3201.087487260272
7 1973  4622.287777814628   7831.29011507595 1734.8287385859464 1699.4152342569348 3832.5777400196935
7 1974  4878.711177596934  8483.707446580545 1275.7722402303477 1760.7445384035761  4122.937013126807
7 1975  5160.787583888245  8588.685749537071  1324.656792467882 1882.9085555343866  3714.567827509241
7 1976  5318.332243405078  8965.147072487252 1297.3501871164472 1904.1759755207215  4048.837765736522
7 1977  5492.143362350697  9094.316027864194 1135.4201078296132 1969.4620089671464  4357.310188205705
7 1978  5937.844350699436 10001.145994446908  1061.520413626778   2170.26602093115  4245.886875463278
7 1979 6225.5206856465575 10613.215143423866 1073.9325069683393  2396.294182181272  4221.736009647186
7 1980  6363.332263233388 10741.745100434986 1271.9531345867902 2576.8199564838856  4838.955296242551
7 1981  6567.116333651066  10457.29954216322 1340.5060808886453 2661.3950452667546  5182.556250807869
7 1982  6442.744584838183 10306.678498790812 1363.9935805965235  2746.959316374569  5392.229676756674
7 1983  6623.702279401696 10424.528054544617 1557.4312814427053 2789.4941563472394 6090.6836938243505
end
label values country country
label def country 5 "chile", modify
label def country 6 "china", modify
label def country 7 "colombia", modify
As you can see, each country has data for value added for several sectors, but this is depicted column-wise: value added for each sector is a separate variable alongside "country" and "year". Currently my panel dataset has two identifiers: country and year. What I would like to do is modify my data so that the panel dataset has three identifiers:
1) country
2) sector
3) year

I assume this also implies that the value-added data for the different sectors would no longer be depicted in columns but row-wise, in the sense that each country has e.g. 5 data points for each year, namely for agriculture, mining, manufacturing, utilities and construction.

If anybody has an idea if and how this can be achieved, I would be very grateful.

Best,

Satya
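A possible approach (a hedged sketch using the posted variable names): -reshape long- turns the sector columns into rows, after which a combined country-sector identifier can be used to -xtset- the data.
Code:
rename (agriculture manufacturing mining utilities construction) va_=
reshape long va_, i(country year) j(sector) string
rename va_ value_added
egen long panel_id = group(country sector)
xtset panel_id year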

Split to multiple files, merge and append

I have two files, each with 3 variables.

The first one has date, city, and score. The second one has date, city and ret. Each has 49 cities.

I want to split each file into 49 Stata files, by city.

Then merge the 49 score files with the 49 ret files by city and date.

Then merge each of the 49 files with a fulldate file (only one variable date with all the 365/366 dates per year).

merge 1:1 date using fulldate



Then, for each of the 49 files:

replace city="cityname" if missing(city)


Then append all the 49 files to one file.

Can anyone help?

Great thanks!
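A possible alternative that avoids creating 49 files per dataset (a hedged sketch; scores.dta, rets.dta, fulldate.dta and combined.dta are hypothetical file names): build a complete city-by-date skeleton with -cross-, then merge both data files onto it.
Code:
use scores, clear            // date city score
keep city
duplicates drop
cross using fulldate         // every combination of the 49 cities and the full set of dates
merge 1:1 city date using scores, nogenerate
merge 1:1 city date using rets, nogenerate
save combined, replace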

Loop for running individual regression for sectors and then taking average of all coefficients

Hi,
I have to run three regressions from three separate files and then take a mean of the coefficients on all explanatory variables (for a national estimate), but I cannot append the data as per the instructions. What could the loop look like? I will also need the RMSE, R-squared, t-values, etc. for the national figure. Please help.

Generate an initial value for each five-year interval in unbalanced panel data

Hi dear Statalisters,
I have an unbalanced panel dataset and I want to generate a new variable. This variable should take the first value of variable X within each five-year interval (i.e., the value should change every five years, taking the first value of X in each interval), in the context of an unbalanced panel. Could someone help me, please? Thanks a lot in advance.
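A possible sketch (hedged; it assumes a panel identifier called id, a year variable, and that "five-year interval" means fixed calendar blocks such as 1990-1994, 1995-1999, and so on):
Code:
generate int block = 5*floor(year/5)
bysort id block (year): generate X_init = X[1]
* X[1] is the first observation in the block; if it can be missing and the first
* non-missing value is wanted instead, further handling is needed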




Predictions using gologit2 - issues applying coefficients to validation dataset

I'm new to Stata, so apologies in advance if this is a basic question. I am having some issues applying coefficients from my gologit2 model to a test dataset, and I'm wondering whether I'm approaching this wrong.
  • Dataset split into two - development and test datasets
  • Outcome variable is severity of a disease of interest (mild, moderate, severe)
  • GOLOGIT2 used due to issues with the proportional odds assumption being violated in OLOGIT
  • Stepwise selection run using gologit2 (without autofit, as I read somewhere that stepwise and autofit should not be run together)
  • Final model run but with autofit.
  • When I try to apply coefficients from this regression to my test dataset it does not work. I have tried the following options:
  1. I applied estimation results to test data and tried to use predict.
    • Code:
      • use develop.dta
      • gologit2 y i.x1 x2 x3, autofit
      • drop _all
      • use test.dta
      • predict p1 p2 p3
    • The issue is that dummy variables created in the development data are not present in the test data
  2. After fitting the model on the development data, merge in the test data and run the predict command. The issue is that it won't predict for the merged data, probably for a reason similar to the one above
  3. I pulled the development and test data into one big dataset and tried to use if with gologit2 (i.e. to run the model on half of my data if var == 1), then apply the coefficients to all. A message came up saying that I could not use if with gologit2.
Many thanks for your help!

Forcing my event study through r(2000) errors in loops

Hey Everyone,

I'm currently trying to run a regression/prediction of some abnormal returns. All good so far, but during my iteration the command constantly breaks if an announce date does not match a trading date. My dataset contains daily stock returns from 2009 until 2019, into which I have matched the respective announce dates. Due to its size, I cannot upload it here; I hope the code is enough, otherwise I will supply more information if required.
The dataset also groups bond announcements into groups identified by G_ID. So for every G_ID I want to estimate the predicted return based on a given event window around the announce date.
The code I am running currently is this one:


sort G_ID date
by G_ID: gen datenum=_n
by G_ID: gen target=datenum if date==announcedate
egen td=min(target), by(G_ID)
drop target
gen dif=datenum-td

//create Event Window

by G_ID: gen event_window=1 if dif>=-1 & dif<=0
egen count_event_obs=count(event_window), by(G_ID)
by G_ID: gen estimation_window=1 if dif<-2 & dif>=-202
egen count_est_obs=count(estimation_window), by(G_ID)
replace event_window=0 if event_window==.
replace estimation_window=0 if estimation_window==.


//predicted returns

gen predicted_return=.
egen id=group(G_ID)
sum id
forvalues i=1(1)`r(max)' {
l id G_ID if id==`i' & dif==0
capture quietly reg return market_return if id==`i' & estimation_window==1
capture noisily predict p if id==`i'
capture noisily replace predicted_return = p if id==`i' & event_window==1
capture noisily drop p
}


// generate abnormal returns based on predictions for estimated return.*/

sort id date
gen abnormal_return=return-predicted_return if event_window==1
by id: egen cumulative_abnormal_return = sum(abnormal_return)

/*Test for statistical Significance
//70 is number of days in event window. If prolonged, change digits */

sort id date
by id: egen ar_sd = sd(abnormal_return)
gen test =(1/sqrt(count_event_obs)) * (cumulative_abnormal_return/ar_sd)
quietly list G_ID cumulative_abnormal_return test if dif==0

/*export data*/
export excel G_ID announcedate cumulative_abnormal_return test using "stats_group_announcedate01_large.xls" if dif==0, firstrow(variables) replace

//calculate the cumulative abnormal for all companies treated as a group
reg cumulative_abnormal_return if dif==0, robust

The loop above (the capture block) is where my code typically breaks.

Before reading about the capture command, I used this code:

forvalues i=1(1)`r(max)' {
l id G_ID if id==`i' & dif==0
quietly reg return market_return if id==`i' & estimation_window==1
predict p if id==`i'
replace predicted_return = p if id==`i' & event_window==1
drop p
}

which led to constant breaks with the error r(2000). I therefore tried to use capture. What I want is for the code to record a zero as the predicted return instead of stopping with an error, and then to analyse what exactly leads to the break (i.e. an announce date that is not a trading day). It seems pointless to re-enter the code over and over again after it breaks, so I just want it to make "zero changes" instead of stopping.
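A possible variant of the loop (a hedged sketch): check the return code of the captured regression, predict only when estimation succeeded, and otherwise record a zero and print which id failed, so that the loop never stops.

sum id
forvalues i=1(1)`r(max)' {
capture quietly reg return market_return if id==`i' & estimation_window==1
if _rc == 0 {
quietly predict double p if id==`i'
quietly replace predicted_return = p if id==`i' & event_window==1
drop p
}
else {
quietly replace predicted_return = 0 if id==`i' & event_window==1
display as error "id `i': estimation failed (rc = " _rc "); predicted return set to 0"
}
}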

I would be really happy if you could provide some help here! I hope I'm also complying with all the rules that apply here, as this is my very first post!

Best

Kai

Merging multiple Spss data sets into STATA

Hi, I'm trying to merge a few UNICEF MICS survey data files (originally SPSS .sav files, for example ch.sav with hh.sav) together in Stata. I have them all saved as individual Stata files. Can somebody help with the correct command, please? Nothing I'm trying is working!
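A minimal sketch of one way this is often done, assuming the .sav files have already been saved as ch.dta and hh.dta and that, as in MICS, households are identified by the cluster number HH1 and the household number HH2 (adjust the key variables to your own files):

Code:
* child file: many children per household, so merge m:1 on the household key
use ch.dta, clear
merge m:1 HH1 HH2 using hh.dta
tab _merge      // check how many observations matched before dropping _merge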

Specification of funnel plot with -funnelcompar-

Hi Statalisters

I want to assess county differences in medication rates in Norway to detect outliers. For this purpose I want to use funnel plots as recommended by Spiegelhalter (see e.g. The Art of Statistics). I am trying this on publicly available Norwegian data. However, I have worked on this for some time now and I am not sure whether I am specifying the funnel plot correctly, and I wonder if any of you have some input on this.

Specifically, I am having some trouble with (i) specifying the correct weights (the plot command only works with the -nowei- option), and (ii) the confidence intervals seem too narrow, although I suspect this may be due to the sample size rather than my specification of the funnel plot.

It's county data for a single year, so it's a small sample with 18 observations. The medication rate is per 1,000 individuals per county. I've gathered population data per county and made a population count in thousands so that it's scaled to the medication rate.

I obtain the standard deviation of the medication rate by
Code:
 egen sd_medrate=sd(medrate1000)
The funnel plot command is then:
Code:
funnelcompar medrate1000 poprate1000 county sd_medrate, cont nowei
Which gives:
[funnel plot attachment omitted]

Without -nowei- I get the following error message:
Code:
may not use noninteger frequency weights
As it's such a small sample, I'll include the total dataset in dataex:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long county float(county_pop medrate1000 poprate1000 sd_medrate)
 4  72856 6.37  72.856 1.6318192
12 107080 4.68  107.08 1.6318192
 2 108499 9.68 108.499 1.6318192
14 156494 4.76 156.494 1.6318192
13 168231 7.46 168.231 1.6318192
16 170377  6.4 170.377 1.6318192
 9 185216 5.56 185.216 1.6318192
 5 190709 8.24 190.709 1.6318192
17 231286 7.83 231.286 1.6318192
 8 236271 9.19 236.271 1.6318192
 7 251262 5.87 251.262 1.6318192
 3 257673 6.28 257.673 1.6318192
18 271662 7.59 271.662 1.6318192
15 422102 5.61 422.102 1.6318192
11 427947 6.51 427.947 1.6318192
 6 477175 4.79 477.175 1.6318192
 1 536499 5.32 536.499 1.6318192
10 586860 3.62  586.86 1.6318192
end
label values county fylke
label def fylke 1 "Akershus", modify
label def fylke 2 "Aust-Agder", modify
label def fylke 3 "Buskerud", modify
label def fylke 4 "Finnmark", modify
label def fylke 5 "Hedmark", modify
label def fylke 6 "Hordaland", modify
label def fylke 7 "Møre og Romsdal", modify
label def fylke 8 "Nordland", modify
label def fylke 9 "Oppland", modify
label def fylke 10 "Oslo", modify
label def fylke 11 "Rogaland", modify
label def fylke 12 "Sogn og Fjordane", modify
label def fylke 13 "Telemark", modify
label def fylke 14 "Troms", modify
label def fylke 15 "Trøndelag", modify
label def fylke 16 "Vest-Agder", modify
label def fylke 17 "Vestfold", modify
label def fylke 18 "Østfold", modify
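Not a confirmed fix, but since the error complains about noninteger frequency weights, a minimal untested sketch worth trying is to pass the raw (integer) county population county_pop as the population variable instead of the scaled rate, keeping the rest of the command as above:

Code:
egen sd_medrate = sd(medrate1000)
funnelcompar medrate1000 county_pop county sd_medrate, cont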

Panel data with fixed effects model having dummy/time-invariant variables in data set

Greetings,

I'm trying to conduct an FE-model test on my panel data set. The panel consists of 116 companies observed over 5 years. Within these companies I try to measure different characteristics of different managers (e.g. the CEO) as independent variables, together with a set of control variables, to see the effect on leverage (the dependent variable). The 116 companies have different numbers of managers present, which is why I want to run separate regressions for each manager.

Furthermore, the variables are coded as following:
Dependent:
Leverage: as per average leverage for each year
Independent:
Gender: 0 Male, 1 Female (dummy)
Age: as per age by measuring point
Education: 1-4 depending on educational level
Experience: 0 if no experience, 1 if experience (dummy)
Tenure: as per tenure by measuring point
Misdem: 0 if none, 1 if any (dummy)
Control variables:
Industry: 0-9 depending on type of industry (dummy) - unfortunately, the old tests presented below show industry coded as 1-10.
ROA: measured as %
Firm size: Ln(sales)

As noted, across my 5-year time period there are many time-invariant variables (at least within my panel); for example, the Education variable does not vary (if a manager has educational level "2", this will not change within my panel). All the dummy variables are also time-invariant for each manager.

Looking into how panel data regressions should be run, they can (as far as I understand is most common) be run using pooled OLS or a panel regression with either a fixed effects (fe) or random effects (re) model. As I also understood it, panel regressions are generally preferred to pooled OLS, which leaves me with the choice between fe and re. Whether fe or re should be used depends on a number of factors, but is mainly decided with a Hausman test: both regressions, fe and re, are run and then compared to determine which fits the data best; a test significant at the 5% level means that fe should be preferred over re(?).
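A minimal sketch of that fe-vs-re comparison, using the variables from the regression shown further below (note that hausman relies on the conventional, non-robust standard errors):

Code:
xtreg Leverage i.CEOGen CEOAge i.CEOEdu i.CEOExp CEOTen CEOMis ROA i.Industry FirmSize, fe
estimates store fe
xtreg Leverage i.CEOGen CEOAge i.CEOEdu i.CEOExp CEOTen CEOMis ROA i.Industry FirmSize, re
estimates store re
hausman fe re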

Anyway, to get to the problem..
I set up my panel using:
Code:
egen companynum = group(Company)
xtset companynum
xtset companynum Years, yearly
As I then try to run the fe regression using:
Code:
xtreg dep indep1 indep2 indep3 indep4 indep5 indep6 cont1 cont2 cont3 cont4 cont5, fe vce(robust)
or as in my case (using manager CEO):
Code:
xtreg Leverage i.CEOGen CEOAge i.CEOEdu i.CEOExp CEOTen CEOMis ROA i.Industry FirmSize, fe vce(robust)
I chose vce(robust) because of potential autocorrelation and heteroskedasticity, tested with the Wooldridge and White tests, respectively.

I get the following result:
[xtreg output attachment omitted; several variables are reported as omitted]

From what I understand, this problem (omitted variables) arises because fe already accounts for time-invariant effects in the regression, which is why time-invariant variables don't work and get knocked out/omitted in fe regressions.
Which presents my questions:
1. Is it possible to fix this so I can run an fe regression followed by a Hausman test? Or should I choose the re model / pooled OLS anyway, even if the Hausman test would probably say that the fe regression is preferred, because of the impossibility of running this data set as an fe regression?
2. Is my regression correct in terms of using "i." on Education (the CEOEdu variable) and having it coded as 1-4?
3. Are my regression and reasoning correct in general, and have I missed any steps in preparing to conduct this type of regression?
4. Not related to the regression or the problem: is there a simple way to get the regression output from Stata into some form of presentation or other document type? (A sketch of one option follows below.)
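A minimal sketch of one common option for question 4, assuming the user-written estout package (installed with ssc install estout): export the fitted model to an RTF file that Word can open.

Code:
ssc install estout, replace
xtreg Leverage i.CEOGen CEOAge i.CEOEdu i.CEOExp CEOTen CEOMis ROA i.Industry FirmSize, fe vce(robust)
esttab using results.rtf, se star(* 0.10 ** 0.05 *** 0.01) label replace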

It is important to add that my knowledge of statistics (and Stata) is very limited, and my time to acquire it is limited too, hence I have turned to this forum of great expertise for help.

I would very much appreciate quick help with how I should specify my regression to get the correct output for reporting and analysing the results. An explanation of why one regression should be used instead of another, given my data set, would also be highly appreciated.

Thank you in advance and best regards,

Generating a new variable based on several specified conditions

Dear Stata-Community,
I am currently working on a paper in which I want to create a variable that indicates the number of respondents who mentioned unemployment as the most important problem and who placed themselves at the same value, or plus/minus one, on a left-right scale as the party under observation. So in the end, I want a variable for each party in each country that indicates how many respondents that are close (+/-1) on the left-right scale said unemployment is the most important problem. Unfortunately, my dataset only contains a country variable (country), a most-important-problem variable with the unemployment specification (qpp1==70101), a self-placement variable (qpp13) and eight different party-placement variables for all countries (qpp14_1 to qpp14_8). So my problem is that the Swedish Green Party might be the variable qpp14_3 while this variable in Belgium means a conservative party. The dataset contains 25,000 individual respondents, and I want, for each of the parties in qpp14_1 to qpp14_8, how many potentially close respondents (+/-1) indicated unemployment as the most important problem. My question is: is there any possibility of automating this process?
I am using stata version 15.1 and would be very grateful for any ideas.
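A minimal sketch of one way this might be automated, using the variable names from the post (the generated variable names are placeholders): loop over the eight party-placement variables, flag respondents who name unemployment and sit within one point of the party's placement, and count them by country.

Code:
forvalues p = 1/8 {
    * respondent names unemployment and is within +/-1 of party p's placement
    gen byte close_unemp`p' = qpp1 == 70101 & ///
        abs(qpp13 - qpp14_`p') <= 1 & !missing(qpp13, qpp14_`p')
    * count such respondents per country for party p
    egen n_close_unemp`p' = total(close_unemp`p'), by(country)
}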

Thanks in advance!
Robin

Test for differences in Medians for Multiple Variables Using qreg

Dear Stata Users:
I would like to test for differences in medians for multiple variables using qreg, following this Stata Journal article:
HTML Code:
https://journals.sagepub.com/doi/pdf/10.1177/1536867X1201200202
I would like to export the t statistics and p-values from the quantile regressions for multiple variables. How is it possible to do that? It is very tedious to do it one by one. Is it possible to do it with eststo and then export with esttab? What would be a proper code?

Code:
qreg var1 class
qreg var2 class
qreg var3 class
qreg var4 class
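A minimal sketch of that eststo/esttab workflow, assuming the user-written estout package (ssc install estout); the output file name is a placeholder:

Code:
eststo clear
foreach v of varlist var1 var2 var3 var4 {
    eststo: qreg `v' class
}
* export the coefficient on class with its t statistic and p-value per model
esttab using qreg_medians.csv, keep(class) b(3) t(3) p(3) replace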
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(var1 var2 var3 var4 class)
2.1709347229330476  .02042863528691251     .03228491681068315  6.344270516206654 1
 5.487542064826069  .12843769095325996     .04757251167276653  6.154184831332533 1
 2.403442868218309  .14004722153361354    .056244318978395644  6.982910060182892 1
1.1505027133742167  .11100197329982031    .004023767790014441  6.811365516638235 1
1.8804746970472777   .5373991437510292    .007095957516877986  9.874810261649259 1
1.9808765550239236   .3227327984439582    .021833211767566252  9.931394505789136 1
 2.676107983414312   .2086538813493268     .07895762414439786  6.142981046239942 1
5.0250382942155865   .5440525877207101    .025750244090855447  5.636962241280868 0
 4.258091546463471 .012181876701455542     .04470620519153116  6.660379227959387 1
 2.197711032881432                   0     .03437963510075126 4.4299473733834605 0
2.2245278845356267   .4394672805861094    .009428517082833212  5.723343242641416 1
1.1452441380506346 .005852152919964551   .0037893836139718242 11.088995933771097 1
 .9484033697297157   .3734964247700097    .002009329861113992  7.095376932085215 1
1.1523751483206472   .2830885583049549    .014212063384959249  5.654406639014709 1
1.7533311612460127  .19266922499285918     .06038927853563682   7.63872652475961 1
 .8706634694165418  .27225441581137516    .013483301895410065 7.3137409182571105 1
 1.079994695584802  .24342634444177425     .00556080453072516  8.201428740938756 1
 .9901501157807883    .200563603986151    .008008362894517725  6.203767551444203 1
1.3157547495682214 .013003901170351105     .11967674566421503   8.84246002419529 1
1.6125047223917022  .27092330708216894  .00022581646766590453 10.475257746929474 1
   1.6695885509839   .7825286415711947  -.0016366612111292963  8.494743062578646 0
 .8480628518530803   .3766737329713641  -.0002713747199372005  7.579527636994385 1
 1.060821685855263   .2603442745643109   -.002459104030792259  9.143559338285831 1
1.6878696906442758  .11737663668433701   .0026116483332574844  7.910486424877584 1
 3.968033586295519  .09784727102453503     .04043281225865062    7.8916183004611 1
  .899243711630933  .12953298510293923    .002730703847420142  8.836453142906317 1
 1.876879945068383   .2996552112194602    .006820006046659643   8.30061876606964 0
 .9541869853059908   .3654661990765307    .017404474266007573   7.69546963611513 1
1.9644481201108408   .4032161241201265    .005893921591571983 11.974259250751283 0
2.5286603845167903   .3319836814821631    .012760344774700163 10.392128915166236 1
2.5399946760599135  .28070659154350697    .014606202798280617  6.371492181952594 0
1.9045555011585498   .2964226277064348     .03337520020575096  8.209402284791736 1
 5.716134885512472   .5292907790381521    .008094833449250688   8.67395024313201 0
1.3615053404450799  .14013686402856293      .0557818383280838 11.487299917331256 1
2.4918920664014497                   0    .022622035593839667 5.8137662863912745 1
 4.115151363312894  .41287287656029065   .0038089010156910734  7.650706871093464 0
 2.355544245111873  .18922108333108298   .0066966541804420205   7.86064405705401 0
1.1459177647714638   .3806578806414909    .011546484840845521  7.993001640282866 1
1.4456867284194304   .2235793293343836      .0307410848329395  6.388823556055131 1
 2.433206439905795  .05182226940660904    .005467731741770416  8.736676452032238 1
 .6718319519007932  .24684993964632423  -.0034781785701610136   8.62690498651182 0
 1.610265383568393 .003758584550491892   .0015988813762135522  8.218744836646748 0
1.5904666612063105  .08420437504195837    .004750224395124635  8.339667396550764 1
 .4844194771165337  .19147015156146804    .008096030121903125  8.126672457793893 1
1.1255137911908935 .013977875086425444   -.022882576630560036  5.852547247921313 0
1.0980541357873876  .23833167825223436   .0021414722790804503  9.602645938552133 1
 1.459342862276669   .2960699771294345    .026362620968550576  8.654420620275989 1
3.7872993641364756  .16344200892632452     .03434962682508268  6.424757186209322 1
1.3330632839020906  .14922273950851017    .028375058135984177 5.3653356230988205 1
1.7050250470489474  .09482475476419729    .044931304649582844  8.317028590089064 1
1.4095675648788928  .45225421663566984    .012633874174308985  9.654641503869545 1
 .9817325091888597  .19951911856495427     .01903986539661863  6.826789228492126 0
 4.749956883277682 .004365141127600385     .03350505532830546   8.57455905040462 1
2.6315519937451133  .24769642069198225     .04563033716614323  8.256789004740584 1
1.2951633934280147  .40049463644969263    .008107996872041815   6.28326437993576 1
1.6885666182075358  .12606567517413872    .007133636339187263  6.390119854398666 1
2.0852919852933036  .22075020161884867     .03615227896612162   8.04582799565937 1
 .6189007671401604  .25361495023248226    .010725521502892112  7.872453625672574 0
1.5426325594770125   .2503213115847204    .038414029355626496  6.398403568472147 1
 5.431799918831169  .48254364089775564     .03834164588528678  6.465522446681758 1
 .6169221507832088  .09853527536057495  -.0068559833290489135  5.904451351934881 1
1.3892754575052342 .054018197650424475     .02138990983282448  7.747180085196543 1
 3.208024624600127   .2938049183043822      -.018247490676978 7.8478625324739415 1
2.0495010093948625  .07491891129600903    .028160696657735153 5.4289595895385325 1
 1.914610189433304  .12734084304731733     .03212300106711626  7.359841852124484 0
 .4680346209512774  .07488053707720517   .0033831925393684814  8.738226678109067 1
1.5331495506412196   .2399151993404393     .02296684529768565  9.739909238559308 1
2.4155976127480647   .1974669905582754     .04859411600264234   8.57590766064048 1
1.3535190448409071   .5582551101016601   -.005989655296997654  9.011132073356997 1
1.8093710795578795   .3263061983574228     .02702264160363464  6.177338121345474 1
 2.413543812058275  .13404023535930262    .007393056878498134  9.294058900151573 1
1.7598221676427495  .35767105564133356    .011749799851318124    6.3289790974743 1
1.9380582068072627  .24758035817844862     .03609937184292087  7.932295030766125 0
2.0306460746550394  .24806585552336896    .021115980979310157 6.4785541880673865 1
1.8385283854866579  .19986105409416743    .020652414185113842 10.363061986291324 1
1.9736462512097805  .06855061601161756    .024047875579706748  6.750961108209461 0
1.9769853417901335  .33724378397180416    .017118646198720125  7.777863842906347 1
1.8545938219946896  .30055710669517777    .018918646200314113  7.636383979077606 1
 2.886666087988723  .15277429315835184     .03533725755090225 10.855454260862754 1
1.3807206988594998    .616143277023171    .003723889966898756  5.639606554437637 1
2.2361776526107477  .26794909161566116    .012921402895171477  7.630267068171986 1
  9.28029906542056  .27524229074889867     .04070484581497797  8.644002038279933 1
 .8217743575613246  .30930619871638637    .007741141789399162  5.845432488574447 1
1.3653890875928012  .22438623490829931     .01337300005543486   7.03656089770649 0
 2.800319755600815   .3067567966466924 -.00008855565723056941 10.430521115146327 1
 3.927241192562631                   0     .05982923895641445  8.717433576126178 1
 .5205014326016568  .17731842326756325   .0007743689109328184    8.6229545667191 0
 .7988620988725066  .18910471323051906    .002006304018415759  5.249142878535657 0
1.1528772598589017  .27211150188875866    .013282133925851286  8.443672585190909 1
1.0824540711537936  .31280606062952193     .01893637035804013  7.857182870229327 1
1.1188457318289626   .2452208335452753    .010463449736842892 6.5064146534896405 1
 4.414392777315159  .30744329614941296    .009213276808564675 6.2400224017506565 1
 1.863419219993175   .2169520322068658    .018969571690629208  8.835908851864653 1
1.2937478285552086  .05945033965279936   .0030191359943169205 11.814059157345177 1
1.2836064650560342  .19992189011901249    .003265378320655072  7.041395040734514 0
 .9734652518048188  .41056808441462483    .011505039458516832  4.915093087375768 1
2.3414603025370666  .04547157992309307    .007938720680141068  8.953379403377928 0
 5.007207397093378                   .     .04567133374239973  9.570989408736887 1
1.2348910387473195   .6933934494285644    .015182351673371145  6.540061276160971 1
 13.08921134631084   .3304803372220662     .03931358971740739  7.134521062086644 1
end

Generate mean and median test results in Stata

Dear Stata Users:
I am trying to generate mean and median test results in Stata, using sample code thanks to Kai Chen:
HTML Code:
 http://kaichen.work/?p=1218
I have the sample code below. However, it is not working in Stata 14.
Could you please help?

Code:
local vars  var1 var2 var3 var4
local group class

foreach v in `vars' {
  di "`v'"
  ttest `v', by(`group')
  local mean_`v'_mean_0=round(r(mu_1),.001)
  local mean_`v'_mean_1=round(r(mu_2),.001)
  local mean_`v'_diff=`mean_`v'_mean_1'-`mean_`v'_mean_0'
  local mean_`v'_p=r(p)
}
 
foreach v in `vars' {
  sum `v' if `group'==0, detail
  local p50_`v'_p50_0=round(r(p50),.001)
  sum `v' if `group'==1, detail
  local p50_`v'_p50_1=round(r(p50),.001)
  ranksum `v', by(`group')
  local p50_`v'_n_0=r(N_1)
  local p50_`v'_n_1=r(N_2)
  local p50_`v'_diff=`p50_`v'_p50_1'-`p50_`v'_p50_0'
  local p50_`v'_p=2*normprob(-abs(r(z)))
}
 
qui {
  noi di _newline
  noi di "{hline 115}"
  noi di _col(15) "{c |} `group' = 1" ///
         _col(45) "{c |} `group' = 0" ///
         _col(75) "{c |} Diff"
  noi di _col(16) "{hline 100}"
  noi di _col(15) "{c |} Mean" ///
         _col(25) "{c |} Median" ///
         _col(35) "{c |} N" ///
         _col(45) "{c |} Mean" ///
         _col(55) "{c |} Median" ///
         _col(65) "{c |} N" ///
         _col(75) "{c |} Mean" ///
         _col(85) "{c |} P" ///
         _col(95) "{c |} Median" ///
         _col(105) "{c |} P"
  noi di "{hline 115}"
  foreach v in `vars' {
    noi di %12s abbrev("`v'",12) ///
           _col(15) "{c |}" %8.3f `mean_`v'_mean_1' ///
           _col(25) "{c |}" %8.3f `p50_`v'_p50_1' ///
           _col(35) "{c |}" %8.0f `p50_`v'_n_1' ///
           _col(45) "{c |}" %8.3f `mean_`v'_mean_0' ///
           _col(55) "{c |}" %8.3f `p50_`v'_p50_0' ///
           _col(65) "{c |}" %8.0f `p50_`v'_n_0' ///
           _col(75) "{c |}" %8.3f `mean_`v'_diff' ///
           _col(85) "{c |}" %8.3f `mean_`v'_p' ///
           _col(95) "{c |}" %8.3f `p50_`v'_diff' ///
           _col(105) "{c |}" %8.3f `p50_`v'_p' 
  }
  noi di "{hline 115}"
}
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(var1 var2 var3 var4 class)
2.1709347229330476  .02042863528691251     .03228491681068315  6.344270516206654 1
 5.487542064826069  .12843769095325996     .04757251167276653  6.154184831332533 1
 2.403442868218309  .14004722153361354    .056244318978395644  6.982910060182892 1
1.1505027133742167  .11100197329982031    .004023767790014441  6.811365516638235 1
1.8804746970472777   .5373991437510292    .007095957516877986  9.874810261649259 1
1.9808765550239236   .3227327984439582    .021833211767566252  9.931394505789136 1
 2.676107983414312   .2086538813493268     .07895762414439786  6.142981046239942 1
5.0250382942155865   .5440525877207101    .025750244090855447  5.636962241280868 0
 4.258091546463471 .012181876701455542     .04470620519153116  6.660379227959387 1
 2.197711032881432                   0     .03437963510075126 4.4299473733834605 0
2.2245278845356267   .4394672805861094    .009428517082833212  5.723343242641416 1
1.1452441380506346 .005852152919964551   .0037893836139718242 11.088995933771097 1
 .9484033697297157   .3734964247700097    .002009329861113992  7.095376932085215 1
1.1523751483206472   .2830885583049549    .014212063384959249  5.654406639014709 1
1.7533311612460127  .19266922499285918     .06038927853563682   7.63872652475961 1
 .8706634694165418  .27225441581137516    .013483301895410065 7.3137409182571105 1
 1.079994695584802  .24342634444177425     .00556080453072516  8.201428740938756 1
 .9901501157807883    .200563603986151    .008008362894517725  6.203767551444203 1
1.3157547495682214 .013003901170351105     .11967674566421503   8.84246002419529 1
1.6125047223917022  .27092330708216894  .00022581646766590453 10.475257746929474 1
   1.6695885509839   .7825286415711947  -.0016366612111292963  8.494743062578646 0
 .8480628518530803   .3766737329713641  -.0002713747199372005  7.579527636994385 1
 1.060821685855263   .2603442745643109   -.002459104030792259  9.143559338285831 1
1.6878696906442758  .11737663668433701   .0026116483332574844  7.910486424877584 1
 3.968033586295519  .09784727102453503     .04043281225865062    7.8916183004611 1
  .899243711630933  .12953298510293923    .002730703847420142  8.836453142906317 1
 1.876879945068383   .2996552112194602    .006820006046659643   8.30061876606964 0
 .9541869853059908   .3654661990765307    .017404474266007573   7.69546963611513 1
1.9644481201108408   .4032161241201265    .005893921591571983 11.974259250751283 0
2.5286603845167903   .3319836814821631    .012760344774700163 10.392128915166236 1
2.5399946760599135  .28070659154350697    .014606202798280617  6.371492181952594 0
1.9045555011585498   .2964226277064348     .03337520020575096  8.209402284791736 1
 5.716134885512472   .5292907790381521    .008094833449250688   8.67395024313201 0
1.3615053404450799  .14013686402856293      .0557818383280838 11.487299917331256 1
2.4918920664014497                   0    .022622035593839667 5.8137662863912745 1
 4.115151363312894  .41287287656029065   .0038089010156910734  7.650706871093464 0
 2.355544245111873  .18922108333108298   .0066966541804420205   7.86064405705401 0
1.1459177647714638   .3806578806414909    .011546484840845521  7.993001640282866 1
1.4456867284194304   .2235793293343836      .0307410848329395  6.388823556055131 1
 2.433206439905795  .05182226940660904    .005467731741770416  8.736676452032238 1
 .6718319519007932  .24684993964632423  -.0034781785701610136   8.62690498651182 0
 1.610265383568393 .003758584550491892   .0015988813762135522  8.218744836646748 0
1.5904666612063105  .08420437504195837    .004750224395124635  8.339667396550764 1
 .4844194771165337  .19147015156146804    .008096030121903125  8.126672457793893 1
1.1255137911908935 .013977875086425444   -.022882576630560036  5.852547247921313 0
1.0980541357873876  .23833167825223436   .0021414722790804503  9.602645938552133 1
 1.459342862276669   .2960699771294345    .026362620968550576  8.654420620275989 1
3.7872993641364756  .16344200892632452     .03434962682508268  6.424757186209322 1
1.3330632839020906  .14922273950851017    .028375058135984177 5.3653356230988205 1
1.7050250470489474  .09482475476419729    .044931304649582844  8.317028590089064 1
1.4095675648788928  .45225421663566984    .012633874174308985  9.654641503869545 1
 .9817325091888597  .19951911856495427     .01903986539661863  6.826789228492126 0
 4.749956883277682 .004365141127600385     .03350505532830546   8.57455905040462 1
2.6315519937451133  .24769642069198225     .04563033716614323  8.256789004740584 1
1.2951633934280147  .40049463644969263    .008107996872041815   6.28326437993576 1
1.6885666182075358  .12606567517413872    .007133636339187263  6.390119854398666 1
2.0852919852933036  .22075020161884867     .03615227896612162   8.04582799565937 1
 .6189007671401604  .25361495023248226    .010725521502892112  7.872453625672574 0
1.5426325594770125   .2503213115847204    .038414029355626496  6.398403568472147 1
 5.431799918831169  .48254364089775564     .03834164588528678  6.465522446681758 1
 .6169221507832088  .09853527536057495  -.0068559833290489135  5.904451351934881 1
1.3892754575052342 .054018197650424475     .02138990983282448  7.747180085196543 1
 3.208024624600127   .2938049183043822      -.018247490676978 7.8478625324739415 1
2.0495010093948625  .07491891129600903    .028160696657735153 5.4289595895385325 1
 1.914610189433304  .12734084304731733     .03212300106711626  7.359841852124484 0
 .4680346209512774  .07488053707720517   .0033831925393684814  8.738226678109067 1
1.5331495506412196   .2399151993404393     .02296684529768565  9.739909238559308 1
2.4155976127480647   .1974669905582754     .04859411600264234   8.57590766064048 1
1.3535190448409071   .5582551101016601   -.005989655296997654  9.011132073356997 1
1.8093710795578795   .3263061983574228     .02702264160363464  6.177338121345474 1
 2.413543812058275  .13404023535930262    .007393056878498134  9.294058900151573 1
1.7598221676427495  .35767105564133356    .011749799851318124    6.3289790974743 1
1.9380582068072627  .24758035817844862     .03609937184292087  7.932295030766125 0
2.0306460746550394  .24806585552336896    .021115980979310157 6.4785541880673865 1
1.8385283854866579  .19986105409416743    .020652414185113842 10.363061986291324 1
1.9736462512097805  .06855061601161756    .024047875579706748  6.750961108209461 0
1.9769853417901335  .33724378397180416    .017118646198720125  7.777863842906347 1
1.8545938219946896  .30055710669517777    .018918646200314113  7.636383979077606 1
 2.886666087988723  .15277429315835184     .03533725755090225 10.855454260862754 1
1.3807206988594998    .616143277023171    .003723889966898756  5.639606554437637 1
2.2361776526107477  .26794909161566116    .012921402895171477  7.630267068171986 1
  9.28029906542056  .27524229074889867     .04070484581497797  8.644002038279933 1
 .8217743575613246  .30930619871638637    .007741141789399162  5.845432488574447 1
1.3653890875928012  .22438623490829931     .01337300005543486   7.03656089770649 0
 2.800319755600815   .3067567966466924 -.00008855565723056941 10.430521115146327 1
 3.927241192562631                   0     .05982923895641445  8.717433576126178 1
 .5205014326016568  .17731842326756325   .0007743689109328184    8.6229545667191 0
 .7988620988725066  .18910471323051906    .002006304018415759  5.249142878535657 0
1.1528772598589017  .27211150188875866    .013282133925851286  8.443672585190909 1
1.0824540711537936  .31280606062952193     .01893637035804013  7.857182870229327 1
1.1188457318289626   .2452208335452753    .010463449736842892 6.5064146534896405 1
 4.414392777315159  .30744329614941296    .009213276808564675 6.2400224017506565 1
 1.863419219993175   .2169520322068658    .018969571690629208  8.835908851864653 1
1.2937478285552086  .05945033965279936   .0030191359943169205 11.814059157345177 1
1.2836064650560342  .19992189011901249    .003265378320655072  7.041395040734514 0
 .9734652518048188  .41056808441462483    .011505039458516832  4.915093087375768 1
2.3414603025370666  .04547157992309307    .007938720680141068  8.953379403377928 0
 5.007207397093378                   .     .04567133374239973  9.570989408736887 1
1.2348910387473195   .6933934494285644    .015182351673371145  6.540061276160971 1
 13.08921134631084   .3304803372220662     .03931358971740739  7.134521062086644 1
end

Saturday, September 28, 2019

Type mismatch issues with generating / replacing

Hi all,

I'm currently trying to create a dummy variable, inlaborforce, using the below code:

gen inlaborforce = 0
replace inlaborforce = 1 if labforce == "Yes, in the labor force"
replace inlaborforce = . if labforce == "NIU"

I'm seeing similar if not identical syntax in guides/forums, but I'm getting a type mismatch error. I tried to destring, but both variables are already numeric, so that didn't work.
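A minimal sketch of what may be going on, not a confirmed diagnosis: if labforce is numeric with value labels (which the type mismatch and the failed destring suggest), the comparison has to be against the underlying numeric codes rather than the label text. The codes 2 and 0 below are assumptions; check the actual ones first.

Code:
describe labforce              // shows the value-label name attached to labforce
label list                     // shows which numeric code carries which label
gen inlaborforce = 0
replace inlaborforce = 1 if labforce == 2    // assumed code for "Yes, in the labor force"
replace inlaborforce = . if labforce == 0    // assumed code for "NIU"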

Sorry if this is too basic a question. Thanks,

Create several nested regression tables in the same file with asdoc

Dear all,

I want to create several nested regression tables in the same file, using the reset option in asdoc. However, the created file can't be opened in MS Word. My code is as follows. If anybody has an idea as to how I can do this, I would be very grateful.

Code:
* Table1
asdoc reg y1 x1 x2, nest save(myfile)
asdoc reg y2 x1 x2, nest save(myfile)

* Table2
asdoc reg y1 x3 x4, nest reset save(myfile)
asdoc reg y2 x3 x4, nest save(myfile)
Best,

Jishuang

Model selection to study individual fund attribute's impact

hi,

I have daily data on a large number of exchange-traded funds over 10 years. I am trying to examine the impact of fund attributes, such as expense ratio and size, on fund performance. Among the attributes, I have one independent variable that is time-invariant. Can you please advise on the below?
1. Which model is appropriate in this situation: Fama-MacBeth, fixed effects, or something else? (Fixed effects throws a collinearity error.)
2. I have a lagged dependent variable on the right-hand side; can I use system GMM?
3. Is it possible to use Fama-French factor models in this situation to get the risk-adjusted return?

Any thoughts - greatly appreciated.

Thank you !!

-xtlogit- Four-Way Dummy Variable Interaction Term Interpretation

I have a two-year panel data set of patients who all received a treatment at the end of year 1. The panel is split into eight 3-month periods: four before and four after. Patients are coded as having poor kidney function, diabetes and cardiovascular disease, all as dummy variables. The dependent binary variable is 0 if they were not admitted to hospital in the quarter and 1 if they were. I am not interested in controlling for the diseases listed; I am interested in understanding how these different patient groups reacted to the treatment. If I code my treatment as a dummy variable, my input is:

Admitted to Hospital = age + sex + i.kidney#i.diabetes#i.cardiovascular#i.treatment.

I have run an RE xtlogit (with the or option), setting patient as the panel id and quarter as the time variable. My output is quite long, but I understand that all interpretations are relative to the base category. My output says that, compared with the base category of "no disease in the pre-treatment year", my category of "no disease in the post-treatment year" is 1.33 (significant at the 5% level). Can I say that these patients were 1.33 times as likely to be admitted to hospital in each quarter in the year after treatment? I would deeply appreciate any help anyone has to offer.
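A minimal sketch of the described specification, with assumed variable names (patient, quarter, admitted, female); margins after the fit can make the group-by-treatment comparison more direct than reading the raw interaction terms.

Code:
xtset patient quarter
xtlogit admitted age i.female i.kidney#i.diabetes#i.cardiovascular#i.treatment, re or
* predicted admission probabilities for each disease-by-treatment cell,
* assuming the random effect equals zero
margins kidney#diabetes#cardiovascular#treatment, predict(pu0)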

Having Variables in X-Axis side by side, rather than in different groups..

Hello all, probably a very basic question, but I am brand new to Stata and trying to teach it to myself (Stata 16).

I am trying to create a bar graph of frequency of BMI groups by sex.

I run the code:

graph bar (count), over(bmigrp) over(sex)

which gets the results I want, but puts all the male results on one "window" of the graph, and the female results on the other side.

Is there a way to get the results to be side by side for each category in different colours?
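Not a definitive answer, but a minimal sketch worth trying: the asyvars option treats the categories of the first over() group as separate (coloured) bar series, so the BMI groups should appear side by side within each sex rather than in separate panels.

Code:
graph bar (count), over(bmigrp) over(sex) asyvars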

Thank you!