Hi. I want to create 15 groups of firms based on their wage distribution. I have a 10-year monthly panel in which the unit of observation is the worker. Some workers appear more than once, some change jobs, etc. I am thinking of using the `cluster kmeans` command in Stata on an alternative dataset in which the unit of observation is the firm, and `x1 x2 x3`,... are the mean wages of each of its workers over the whole panel. Some firms have many workers, so there will be as many variables as there are workers in the biggest firm. The other firms would have missing values from `x(i+1)` to `xj`, where `i` is the number of workers in that firm and `j` is the number of workers in the biggest firm.
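To make the proposed layout concrete, here is a minimal sketch (plain Python rather than Stata) that builds the firm-level "wide" table from worker-month records. The record layout `(firm_id, worker_id, month, wage)`, the toy data and the helper name `firm_wide_table` are hypothetical. It also assumes that a worker who changes jobs gets a separate mean for each firm, computed over the months spent there.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical worker-month records: (firm_id, worker_id, month, wage).
records = [
    ("A", 1, "2010-01", 1000.0),
    ("A", 1, "2010-02", 1100.0),
    ("A", 2, "2010-01", 2000.0),
    ("A", 4, "2010-01", 900.0),
    ("B", 3, "2010-01", 1500.0),
    ("B", 1, "2010-03", 1200.0),  # worker 1 has moved from firm A to firm B
]

def firm_wide_table(records):
    """Return {firm_id: [mean wage of each worker, padded with None]}."""
    # Collect all monthly wages of each firm-worker pair.
    wages = defaultdict(list)
    for firm, worker, _month, wage in records:
        wages[(firm, worker)].append(wage)
    # One mean wage per firm-worker pair, ordered by worker id within firm.
    by_firm = defaultdict(list)
    for (firm, _worker), ws in sorted(wages.items()):
        by_firm[firm].append(mean(ws))
    # Width = number of workers in the biggest firm; smaller firms are
    # padded with None (missing), as in the proposed x1 ... xj layout.
    width = max(len(v) for v in by_firm.values())
    return {firm: v + [None] * (width - len(v)) for firm, v in by_firm.items()}

table = firm_wide_table(records)
```

Here `table["A"]` is `[1050.0, 2000.0, 900.0]` and `table["B"]` is `[1200.0, 1500.0, None]`. Note that column `x1` holds worker 1 in both firms only by coincidence of the sort order; in general the k-th column of one firm and the k-th column of another refer to unrelated workers.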
This option is not ideal, because it keeps only each worker's mean wage and throws away the rest of the panel.
Do you have a better idea for exploiting the richness of the panel?
Is there a better way to cluster firms using panel data?
Thanks!