I read the Stata manual entry on cluster analysis and followed its suggestion to specify -start(kr(12345))- in my code to make the results reproducible. However, when I reran the code on the same dataset I got slightly different cluster assignments, and the Calinski-Harabasz pseudo-F index (stopping rule) differed as well. I did not sort the dataset in any way, so the order of observations was the same each time.
The code I drafted:
* Seed the random-number generator
set seed 12345
* First run: k-medians, 3 clusters, Euclidean (L2) distance, k random observations as starting centers
capture drop mgroup3
cluster kmedians meanscore_sds meanscore_gst meanscore_gsr if complete == 1, ///
k(3) name(mgroup3) mea(L2) s(kr(12345))
cluster stop mgroup3, rule(calinski)
table mgroup3 if complete == 1, mis
* Second run: identical command
capture drop mgroup3
cluster kmedians meanscore_sds meanscore_gst meanscore_gsr if complete == 1, ///
k(3) name(mgroup3) mea(L2) s(kr(12345))
cluster stop mgroup3, rule(calinski)
table mgroup3 if complete == 1, mis
+---------------------------+
| | Calinski/ |
| Number of | Harabasz |
| clusters | pseudo-F |
|-------------+-------------|
| 3 | 1206.61 |
+---------------------------+
----------------------
mgroup3 | Freq.
----------+-----------
1 | 1,135
2 | 546
3 | 1,090
----------------------
+---------------------------+
| | Calinski/ |
| Number of | Harabasz |
| clusters | pseudo-F |
|-------------+-------------|
| 3 | 1206.61 |
+---------------------------+
----------------------
mgroup3 | Freq.
----------+-----------
1 | 1,135
2 | 546
3 | 1,090
----------------------
* Same command, run again
capture drop mgroup3
cluster kmedians meanscore_sds meanscore_gst meanscore_gsr if complete == 1, ///
k(3) name(mgroup3) mea(L2) s(kr(12345))
cluster stop mgroup3, rule(calinski)
table mgroup3 if complete == 1, mis
* Same command, run again
cluster kmedians meanscore_sds meanscore_gst meanscore_gsr if complete == 1, ///
k(3) name(mgroup3) mea(L2) s(kr(12345))
cluster stop mgroup3, rule(calinski)
table mgroup3 if complete == 1, mis
+---------------------------+
| | Calinski/ |
| Number of | Harabasz |
| clusters | pseudo-F |
|-------------+-------------|
| 3 | 1174.77 |
+---------------------------+
----------------------
mgroup3 | Freq.
----------+-----------
1 | 999
2 | 683
3 | 1,089
----------------------
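To compare two runs directly rather than by eye, I think something like the sketch below could be used. It keeps each solution under its own name and cross-tabulates them. The names run1 and run2 are placeholders I made up for illustration; everything else is taken from my code above, with the option names written out in full.

```stata
* Sketch only: run1 and run2 are hypothetical names, not part of my original code
* Remove any earlier cluster analyses stored under these names
capture cluster drop run1
capture cluster drop run2

* Two runs of the identical command, each stored under its own name
cluster kmedians meanscore_sds meanscore_gst meanscore_gsr if complete == 1, ///
    k(3) name(run1) measure(L2) start(krandom(12345))
cluster kmedians meanscore_sds meanscore_gst meanscore_gsr if complete == 1, ///
    k(3) name(run2) measure(L2) start(krandom(12345))

* Cross-tabulate the two grouping variables
tabulate run1 run2 if complete == 1, missing
```

If every row of the cross-tabulation has only one nonzero cell, the two runs produced the same partition (possibly with different group labels). Counts spread over several cells in a row would show which observations moved between groups.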
Thank you.
Mengmeng