Hello everybody,

this is not strictly a technical question, but rather one about how to find an appropriate visualization for multidimensional data.

One way I found to approach this in Stata is to use weights in scatterplots to adjust marker size.
However, the result looked somewhat odd, and the actual marker sizes did not seem to be proportional to the underlying weights.
Apparently the underlying algorithm applies some kind of smoothing so that marker sizes do not get out of control in the presence of outliers.

That is what the manual suggests; in some cases, however, this may be misleading. Nick Cox raised the same point in this older post: https://www.stata.com/statalist/arch.../msg01143.html

He also mentioned that there are better ways to display trivariate data, but I couldn't come up with a better idea myself.
So I thought the Statalisters might have suggestions on how to approach such a graphics problem.


Maybe it's easier to reason about this using an example, so here is the one from the manual:

Code:
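sysuse census, clear

* divorce rate relative to the population aged 18+
generate drate = divorce / pop18p
label var drate "Divorce rate"

* marker size weighted by pop18p; Nevada excluded
* (/// continues the command over several lines in a do-file)
scatter drate medage [w=pop18p] if state!="Nevada", msymbol(Oh) ///
        note("Stata data excluding Nevada" ///
        "Area of symbol proportional to state's population aged 18+")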




Best
Boris