I have a dataset of workers that I want to divide into clusters based on several observables, one of which is categorical (industry). I'm trying to do k-means clustering. Instead of using many dummy variables for the different categories, I built a similarity measure between each pair of industries and want to use it in the k-means algorithm. My question is: how can I use this pre-existing similarity matrix in the k-means computation, together with other continuous variables (e.g., education)? What I'm doing right now is first collapsing the similarity matrix into 2 or 3 dimensions using multidimensional scaling and then using the resulting coordinates in k-means. Is there a way to use the similarity matrix directly?
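To make the current workaround concrete, here is a minimal sketch in Python with NumPy of the two steps described: multidimensional scaling of the industry similarity matrix, then k-means on the resulting coordinates plus education. Everything in it is an assumption made for illustration: the 4x4 similarity matrix, the eight workers, the conversion `dissimilarity = 1 - similarity`, the use of classical (Torgerson) MDS, and the hand-written `classical_mds` and `kmeans` helpers. It is not the actual data or the exact procedure I ran.

```python
import numpy as np

def classical_mds(similarity, n_dims=2):
    """Classical (Torgerson) MDS: coordinates for each item from a similarity matrix."""
    s = np.asarray(similarity, dtype=float)
    d2 = (1.0 - s) ** 2                      # assumption: dissimilarity = 1 - similarity
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ d2 @ j                    # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(b)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against small negative values
    return eigvecs[:, order] * np.sqrt(top_vals)

def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on the rows of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical similarity matrix for 4 industries (symmetric, 1 on the diagonal).
industry_similarity = np.array([
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
])
industry_coords = classical_mds(industry_similarity, n_dims=2)

# Hypothetical workers: an industry index and years of education for each.
worker_industry = np.array([0, 0, 1, 1, 2, 2, 3, 3])
worker_education = np.array([12.0, 16.0, 12.0, 14.0, 10.0, 18.0, 11.0, 16.0])

# Standardize education so it is on a scale comparable to the MDS coordinates.
education_z = (worker_education - worker_education.mean()) / worker_education.std()

# Each worker gets the coordinates of their industry plus standardized education.
features = np.column_stack([industry_coords[worker_industry], education_z])
labels, centers = kmeans(features, k=2)
```

In this setup the relative scale of the MDS columns and the standardized education column determines how much weight industry gets in the Euclidean distance, so any rescaling of either block changes the clusters.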