I have a dataset of workers that I want to divide into clusters based on several observables, one of which is categorical (industry). I am trying to use k-means clustering. Instead of using many dummy variables for the different industries, I built a similarity measure between each pair of industries, and I would like to use it in the k-means algorithm.

My question is how to use this pre-existing similarity matrix in the k-means computation together with the other, continuous variables (e.g., education). What I am doing right now is first collapsing the similarity matrix into 2 or 3 dimensions using multidimensional scaling, and then using the resulting coordinates as variables in k-means. Is there a way to use the similarity matrix directly?
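The two-step workaround described above (multidimensional scaling first, then k-means on the resulting coordinates plus the continuous variables) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the Stata code in question. Everything specific in it is an assumption made for the example: the similarity values, the worker data, the conversion of similarity to dissimilarity as `1 - similarity`, the use of classical (Torgerson) MDS, the standardization of all columns, and the deterministic farthest-point initialization of the centers.

```python
import numpy as np

def classical_mds(similarity, n_dims=2):
    """Classical (Torgerson) MDS from a similarity matrix with values in [0, 1]."""
    # Assumption: similarity 1 means identical, so dissimilarity = 1 - similarity.
    dissim = 1.0 - np.asarray(similarity, dtype=float)
    n = dissim.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dissim ** 2) @ centering  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_dims]             # largest eigenvalues first
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers

# Hypothetical similarity between four industries (made-up numbers).
industry_similarity = np.array([
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
])
# Hypothetical workers: industry index and years of education.
worker_industry = np.array([0, 1, 0, 1, 2, 3, 2, 3])
worker_education = np.array([12.0, 13.0, 12.0, 14.0, 16.0, 17.0, 18.0, 16.0])

# Step 1: collapse the similarity matrix into 2 coordinates per industry.
industry_coords = classical_mds(industry_similarity, n_dims=2)

# Step 2: attach each worker's industry coordinates to the continuous
# variable, standardize every column, and run k-means on the result.
features = np.column_stack([industry_coords[worker_industry], worker_education])
features = (features - features.mean(axis=0)) / features.std(axis=0)
labels, centers = kmeans(features, k=2)
```

Standardizing every column gives each MDS dimension the same weight as education, which is itself a modelling choice; leaving the MDS coordinates on their original scale would weight industry by how dissimilar the industries actually are.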