Dear all,
I am trying to conduct a restricted cubic spline to assess the potential non-linear association between a continuous exposure variable and an incident event.
The point is that I am using the cumulative average method suggested by Walter Willett et al., in " Dietary Fat and coronary heart disease: a comparison of approaches for adjusting for total energy intake and modelling repeated dietary measurements".
For that purpose, after calculate the cumulative average consumption of my exposure variable, I have to reshape the dataset from wide to long format. After that, I run the mkspline stata code as well as the xblc command.
when I plot the results, the graph does not look as I thought. Actually the sape is a little bit wird.
However, if I do it only with the baseline exposure (without the cumulative average method) and I do not reshape the databse (because I do not need it), the graph that I obtain is pretty good.

Could some one tell me if we cannot run the restricted cubic spline with a database in long format (which is the format that the cumulative method needs)?.

Following you can see the the stata do-file that I am using for that purpose. You will see how the graph looks like.

Code:
clear
input float(id evento naranja0 naranja1 naranja2 naranja3 tiempo)
 1 0 200 225 200 150 1.5
 2 0 120 200 150 120 7.5
 3 1 150 125 200   . 1.6
 4 0 120 200 150 120 6.9
 5 1 120   . 150 120   3
 6 0 200 250 150 120 7.3
 7 1 200 125 300 120   4
 8 1 250 300 300 120   5
 9 0 120 200 150 120 6.7
10 0 120   . 150 120 6.9
11 0 120 225 200 150 1.5
12 0 120   . 150 120   7
13 1 150 125 200   . 1.6
14 0 120 200 150 120 6.9
15 1 120   . 150 120   3
16 0 200 250 150 120 7.3
17 1 200 125 300 120   4
18 1 250   . 300 120   5
end

Code:
*generate variable fup*
gen fup0=tiempo
replace fup0=1 if tiempo>=1

foreach x of numlist 1/3 {
    gen fup`x'=tiempo if tiempo>`x' & tiempo<`x'+1
    replace fup`x'=`x'+1 if tiempo>=`x'+1
    replace fup`x'=. if tiempo<=`x'
        }
replace fup3=tiempo if tiempo>3

sum fup0-fup3

* Event in each visit.

gen d0=0
replace d0=1 if evento==1 & tiempo<=1


foreach x of numlist 1/3 {
    gen d`x'=0
    replace d`x'=1 if evento==1 & tiempo>`x' & tiempo<=`x'+1
    replace d`x'=. if tiempo<=`x'
        }
        
        
             
*cumulative*
gen pru_naranja0= naranja0
egen pru_naranja1=rowmean(naranja0 naranja1)
egen pru_naranja2=rowmean(naranja0 naranja1 naranja2)
egen pru_naranja3=rowmean(naranja0 naranja1 naranja2 naranja3)


foreach x of numlist 0/3 {
    xtile t_prunaranja`x'=pru_naranja`x', nq(3)
    tab t_prunaranja`x'
        }        


*******************************************************************************
Restricted cubic splines
*******************************************************************************
foreach n of numlist 0/3 {
mkspline pru_naranja_sp`n' = pru_naranja`n', cubic nknots(3) displayknots
mat knots = r(knots)
}
 

rename pru_naranja_sp01 pru_naranja_1sp0
rename pru_naranja_sp02 pru_naranja_2sp0
rename pru_naranja_sp11 pru_naranja_1sp1
rename pru_naranja_sp12 pru_naranja_2sp1
rename pru_naranja_sp21 pru_naranja_1sp2
rename pru_naranja_sp22 pru_naranja_2sp2
rename pru_naranja_sp31 pru_naranja_1sp3
rename pru_naranja_sp32 pru_naranja_2sp3


*********************************************************************************************************
reshaping the database
*********************************************************************************************************
reshape long d fup  pru_naranja pru_naranja_1sp pru_naranja_2sp, i(id) j(year)
   
drop if fup ==.

rename pru_naranja_1sp  pru_naranja_sp1
rename pru_naranja_2sp  pru_naranja_sp2

**************************************
stset dataset
*************************************
stset fup, id(id) failure(d==1)
codebook  pru_naranja_sp*, c
stcox pru_naranja_sp1 pru_naranja_sp2
testparm pru_naranja_sp1 pru_naranja_sp2 // Is orange predicting the hazard ratios of diabetes?
testparm pru_naranja_sp2 // Is a spline model for orange predicting hazard ratios of diabetes better compared to a simpler linear-responsemodel? The p-value is small, so the answer is yes.There is evidence of strong departure from linearity (P-value < 0.001).


levelsof pru_naranja, local(levels)
xblc pru_naranja_sp1 pru_naranja_sp2, cov(pru_naranja) at(`r(levels)')  ref(150)  eform gen(ptorange hr Lb Ub)


* graph the result
twoway (line hr ptorange, sort) (line Lb Ub ptorange, sort lc(black black) lp(- -)), ///
legend(off) yscale(log) ylab(0.25 .5 1 2 4 8 16 32) ///
xtitle(Oranges) ytitle(Hazard ratio) yline(1) ///
title(Effect of Oranges) subtitle(Adjusted for treatment)


Dr. Nerea Becerra