Hello,

I am trying to find the best n for the following simulated data:


clear
set seed 12345   // arbitrary seed, added only so the draws are reproducible
set obs 200
gen x = runiform(0,5)
gen U = rnormal(0,100)
gen m = exp(x) - 4*(x^2)
gen Y = m + U

and the equation in the attached image:

// build the basis terms for i = 1, ..., 20
// (forvalues takes "=" rather than "in", and sin() needs its argument in parentheses)

forvalues i = 1/20 {
    gen cosx`i' = cos(x*`i')
}

forvalues i = 1/20 {
    gen sinx`i' = sin(x*`i')
}

forvalues i = 1/20 {
    gen csx`i' = cosx`i' + sinx`i'
}

// Y(n=1)
regress Y csx1
predict Y1

//Y(n=5)
regress Y csx1-csx5
predict Y2

//Y(n=20)
regress Y csx1-csx20
predict Y3

scatter Y1 Y2 Y3 m x, legend(order(1 "Y1" 2 "Y2" 3 "Y3" 4 "m"))

//with a scatter plot looking like the attached scatter.png:

-------------------------------------------------------------------------------------------------------------------------

How should I perform leave-one-out cross-validation (LOOCV) in Stata to find the best n? I searched Stata's help but found nothing on it.

(Choosing n is similar to choosing the bandwidth in kernel regression, but I'm not sure how to write the code for it.)
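
To make the question concrete, here is a minimal sketch of one way the search over n might be coded. It assumes Y and csx1 through csx20 already exist as created above, and it relies on the standard OLS identity that the leave-one-out residual equals e_i/(1 - h_i), where h_i is the leverage, so no model has to be refitted 200 times. The names rhs, cv, best_cv, best_n, _e, _h and _cv are placeholders of my own, not anything built into Stata.

```stata
* LOOCV over n = 1..20 using the leverage shortcut (valid for OLS only)
local rhs
local best_cv = .
local best_n  = .

forvalues n = 1/20 {
    * add one more basis term to the regressor list
    local rhs `rhs' csx`n'
    quietly regress Y `rhs'

    * ordinary residuals and leverage (diagonal of the hat matrix)
    quietly predict double _e, residuals
    quietly predict double _h, hat

    * squared leave-one-out prediction error for each observation
    quietly generate double _cv = (_e/(1 - _h))^2
    quietly summarize _cv, meanonly
    local cv = r(mean)
    display "n = `n'   LOOCV MSE = " %12.4f `cv'

    * missing (.) sorts above every number, so the first n always wins
    if `cv' < `best_cv' {
        local best_cv = `cv'
        local best_n  = `n'
    }
    drop _e _h _cv
}

display "LOOCV picks n = `best_n'"
```

The n with the smallest mean squared leave-one-out error would be the choice, much as one would pick a kernel bandwidth by minimizing the same criterion. If the shortcut is not wanted, the same loop could instead refit the regression with "if _n != `j'" for each observation j, at the cost of 200 regressions per value of n.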

Thanks,
Rayne