After reading the excellent "Stata tip 110: How to get the optimal k-means cluster solution" by Anna Makles (Stata Journal (2012) 12, Number 2, pp. 347-351), I copied and pasted the code from the paper. The Stata do-file is:
use physed, clear
* Standardize the clustering variables
local list1 "flexibility speed strength"
foreach v of varlist `list1' {
    egen z_`v' = std(`v')
}
local list2 "z_flexibility z_speed z_strength"
* Run k-means for k = 1, ..., 20
forvalues k = 1(1)20 {
    cluster kmeans `list2', k(`k') start(random(123)) name(cs`k')
}
* WSS matrix
matrix WSS = J(20,5,.)
matrix colnames WSS = k WSS log(WSS) eta-squared PRE
* WSS for each clustering
forvalues k = 1(1)20 {
    scalar ws`k' = 0
    foreach v of varlist `list2' {
        quietly anova `v' cs`k'
        scalar ws`k' = ws`k' + e(rss)
    }
    matrix WSS[`k', 1] = `k'
    matrix WSS[`k', 2] = ws`k'
    matrix WSS[`k', 3] = log(ws`k')
    matrix WSS[`k', 4] = 1 - ws`k'/WSS[1,2]
    matrix WSS[`k', 5] = (WSS[`k'-1,2] - ws`k')/WSS[`k'-1,2]
}
matrix list WSS
* Plot each criterion against k and combine the four panels
local squared = char(178)
_matplot WSS, columns(2 1) connect(l) xlabel(#10) name(plot1, replace) nodraw noname
_matplot WSS, columns(3 1) connect(l) xlabel(#10) name(plot2, replace) nodraw noname
_matplot WSS, columns(4 1) connect(l) xlabel(#10) name(plot3, replace) nodraw noname ytitle({&eta}`squared')
_matplot WSS, columns(5 1) connect(l) xlabel(#10) name(plot4, replace) nodraw noname
graph combine plot1 plot2 plot3 plot4, name(plot1to4, replace)
Any idea about what is happening?
Thank you very much.
Jorge