Hi all,

I'm replicating the study by Barua et al., "Earnings Management Using Discontinued Operations". I'm new to Stata, but I have managed to get all the formulas working. However, my last regression gives very high coefficients, low t-statistics, and a low R2. Could someone look at my code and suggest corrections? I'm trying to calculate core earnings using the model in McVay's paper. After creating all the variables, I run a regression by SIC_2 (the 4-digit SIC code reduced to 2 digits) and fiscal year. Using the coefficients and residual from those regressions, I then calculate expected_CoreEarnings. After that, I calculate Unexpected_CoreEarnings and run a final regression to see the results by fiscal year.
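
For reference, the two regressions in my do-file correspond to the models below. The notation is my own reading of the regressor lists in the code and may differ from the symbols used in McVay's paper; each equation is estimated separately for every SIC_2 and fiscal-year cell.

```latex
\begin{align}
CE_t &= \beta_0 + \beta_1 CE_{t-1} + \beta_2 ATO_t + \beta_3 ACCRUALS_{t-1} + \beta_4 ACCRUALS_t
        + \beta_5 \Delta SALES_t + \beta_6 NEG\_\Delta SALES_t + \varepsilon_t \\
\Delta CE_t &= \phi_0 + \phi_1 CE_{t-1} + \phi_2 \Delta CE_{t-1} + \phi_3 \Delta ATO_t + \phi_4 ACCRUALS_{t-1}
        + \phi_5 ACCRUALS_t + \phi_6 \Delta SALES_t + \phi_7 NEG\_\Delta SALES_t + \nu_t
\end{align}
```

Unexpected core earnings is then the actual value minus the predicted value from the first equation, and likewise for the change in core earnings from the second.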

The paper mentions:
"We measure the expected value of core earnings and the change in core earnings for firm i using the predicted values from Equations (1) and (2), respectively. We estimate each equation by industry-year, excluding firm i from the estimation." I did not know how to exclude "firm i" from the estimation, so I simply ran the regressions by SIC_2 and fiscal year. Is this a problem?
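
To make the question concrete, here is a minimal sketch of what excluding firm i could look like for Equation (1). It is not taken from the paper and rests on two assumptions: each firm contributes at most one observation to a SIC_2 and fiscal-year cell, so leaving out firm i means leaving out one observation, and the variables from the do-file below already exist. It uses the standard OLS result that the prediction for observation i from a fit without i equals y_i - e_i/(1 - h_i), where e_i is the full-sample residual and h_i the leverage. The names `cell` and `exp_CE_loo` are hypothetical.

```stata
* Leave-one-out expected CE within each industry-year cell (sketch only)
egen cell = group(sic_2 fyear)
gen double exp_CE_loo = .
quietly levelsof cell, local(cells)
foreach c of local cells {
    * Skip cells where the regression cannot be estimated
    capture quietly regress CE lag_CE ATO lag_ACCRUALS ACCRUALS change_SALES NEG_SALES if cell == `c'
    if _rc {
        continue
    }
    tempvar e h
    quietly predict double `e' if e(sample), residuals
    quietly predict double `h' if e(sample), hat
    * Prediction for i from a fit that excludes i: y_i - e_i/(1 - h_i)
    quietly replace exp_CE_loo = CE - `e'/(1 - `h') if e(sample) & `h' < 1
    drop `e' `h'
}
```

The leverage shortcut avoids re-running one regression per firm; observations with leverage equal to 1 are left missing because the cell has no information left once they are removed.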

All help is valued! Here is a link to my data and do-file: https://drive.google.com/drive/folde...SU?usp=sharing

This is my Do file:
Code:
cd "xxxxxx"
use Data, clear
ssc install winsor
merge m:1 gvkey fyear using Restated_Data , nogenerate
//4-digit sic to 2-digit
gen sic_2 = real(substr(sic,1,2))
format sic_2 %02.0f
//Replace data with restated data(if available) for firms reporting discontinued operations
//replace at=restated_at if do!=0 & do!=. & restated_at!=.
//replace ch=restated_ch if do!=0 & do!=. & restated_ch!=.
//replace ivst=restated_ivst if do!=0 & do!=. & restated_ivst!=.
//replace dltt=restated_dltt if do!=0 & do!=. & restated_dltt!=.
//replace dlc=restated_dlc if do!=0 & do!=. & restated_dlc!=.
//replace ceq=restated_ceq if do!=0 & do!=. & restated_ceq!=.
//replace pstk=restated_pstk if do!=0 & do!=. & restated_pstk!=.
//replace mib=restated_mib if do!=0 & do!=. & restated_mib!=.
//replace ibc=restated_ibc if do!=0 & do!=. & restated_ibc!=.
//replace oancf=restated_oancf if do!=0 & do!=. & restated_oancf!=.
//replace xidoc=restated_xidoc if do!=0 & do!=. & restated_xidoc!=.
replace sale=restated_sale if do!=0 & do!=. & restated_sale!=.
replace xsga=restated_xsga if do!=0 & do!=. & restated_xsga!=.
replace cogs=restated_cogs if do!=0 & do!=. & restated_cogs!=.
//Firms not reporting discontinued operations
gen CE=oibdp/sale if do == 0
//Firms reporting discontinued operations
replace CE=(sale-cogs-xsga)/sale if do!=0 & do!=.
//Variables for McVay(2006)
gen NOA=(at-(ch+ivst))-(at-(dltt+dlc)-(ceq+pstk)-mib)
gen ACCRUALS = (ibc-(oancf-xidoc))/sale
sort gvkey fyear
by gvkey: gen lag_CE=CE[_n-1]
by gvkey: gen lag_CE_2=CE[_n-2]
by gvkey: gen lag_NOA=NOA[_n-1]
by gvkey: gen lag_SALES=sale[_n-1]
by gvkey: gen lag_ACCRUALS=ACCRUALS[_n-1]
by gvkey: gen lag_at=at[_n-1]
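// Sketch, not from the paper: [_n-1] is the firm's previous row, which is the
// previous fiscal year only if the firm has no gaps in fyear.
// The hypothetical flag year_gap marks rows whose previous row is not fyear-1.
by gvkey: gen byte year_gap = (fyear - fyear[_n-1] != 1) if _n > 1
count if year_gap == 1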
gen change_CE = CE-lag_CE
gen change_CE_2 = lag_CE-lag_CE_2
gen change_SALES = (sale-lag_SALES)/lag_SALES
gen ATO = sale/((NOA+lag_NOA)/2)
sort gvkey fyear
by gvkey: gen lag_ATO=ATO[_n-1]
gen change_ATO = ATO-lag_ATO
gen NEG_SALES=0
replace NEG_SALES=change_SALES if change_SALES <0

//drop empty & fyear 2011
drop if CE == . | ATO == . | ACCRUALS == . | change_SALES == . | change_ATO == . | NOA == . | change_CE == . | fyear == 2011

//Generate beta coefficients and residual
statsby ssr=e(rss) _b, by(sic_2 fyear) saving(Equation1, replace): regress CE lag_CE ATO lag_ACCRUALS ACCRUALS change_SALES NEG_SALES
statsby ssr=e(rss) _b, by(sic_2 fyear) saving(Equation2, replace): regress change_CE lag_CE change_CE_2 change_ATO lag_ACCRUALS ACCRUALS change_SALES NEG_SALES
//Merge generated betas and residuals from equation 1
merge m:1 sic_2 fyear using Equation1 , nogenerate
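// Expected CE is the fitted value only; e(rss) is the cell's residual sum of squares, not a firm-level residual, so it is not added
gen exp_CE= _b_cons + _b_lag_CE * lag_CE + _b_ATO * ATO + _b_lag_ACCRUALS * lag_ACCRUALS + _b_ACCRUALS * ACCRUALS + _b_change_SALES * change_SALES + _b_NEG_SALES * NEG_SALES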
drop _b_cons _b_ATO _b_lag_CE _b_lag_ACCRUALS _b_ACCRUALS _b_change_SALES _b_NEG_SALES _eq2_ssr
//Merge generated betas and residuals from equation 2
merge m:1 sic_2 fyear using Equation2 , nogenerate
// Expected change in CE is the fitted value only; the residual sum of squares is not part of the prediction
gen exp_change_CE= _b_cons + _b_lag_CE * lag_CE + _b_change_CE_2 * change_CE_2 + _b_change_ATO * change_ATO + _b_lag_ACCRUALS * lag_ACCRUALS + _b_ACCRUALS * ACCRUALS + _b_change_SALES * change_SALES + _b_NEG_SALES * NEG_SALES
drop _b_cons _b_change_ATO _b_lag_CE _b_change_CE_2 _b_lag_ACCRUALS _b_ACCRUALS _b_change_SALES _b_NEG_SALES _eq2_ssr
//Drop empty
drop if exp_CE == .
drop if exp_change_CE == .
//Generate variables for equation 3 & 4
gen UE_CE = CE - exp_CE
gen UE_change_CE = change_CE - exp_change_CE
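// Sanity check (sketch, not from the paper): an OLS regression with a constant has
// residuals that average zero within its estimation sample, so if exp_CE is the plain
// in-sample fitted value, UE_CE should average roughly zero in every sic_2-fyear cell
// (up to rounding). A large nonzero mean points to an error in how exp_CE was built.
// mean_UE_CE is a hypothetical helper variable.
bysort sic_2 fyear: egen mean_UE_CE = mean(UE_CE)
summarize mean_UE_CE, detail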
gen DO= (do * (-1))/sale
gen SIZE= ln(at)
gen BM= ceq/mkvalt
gen OCF= oancf/lag_at
gen ROA= ib/((at+lag_at)/2)
//regress by year
eststo clear
bysort fyear : eststo: quietly reg UE_CE DO SIZE BM ACCRUALS OCF ROA
esttab using test.html, replace mtitles r2 ar2 label