https://www.stata.com/statalist/arch.../msg00193.html
and not much else. This is a surprise to me.
In what follows, my focus is entirely on what you can do on or with scatter plots -- or points in two dimensions -- and not in one dimension, or in three dimensions or more.
For whatever reasons, convex hulls no longer seem popular or even known about in statistical graphics.
I should back up, as some people will already be lost if they do not know about convex hulls, or at least do not know the term. The idea is likely to be familiar or at least immediate once exemplified and it may summon distant memories of childhood pastimes in which you connected the dots and Cinderella, or a horse, or something equally interesting emerged from a puzzle book.
Here is a convex hull as produced by
Code:
ssc install cvxhull
sysuse auto, clear
set scheme s1color
cvxhull mpg weight, hull(1) noreport
[Graph: convex hull of mpg versus weight, auto data]
So, a convex hull is the smallest convex polygon including all the points in a set. Some points are on the hull and the others are inside.
A standard thought experiment is to imagine the points on the scatter plot as pins on a board. Summon up a giant rubber band (https://en.wikipedia.org/wiki/Rubber_band), stretch it to include all the points, and then let it go. The hull is now marked by the band.
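The rubber band can be mimicked directly by an algorithm known as gift wrapping (the Jarvis march). Start from a point that must be on the hull, the leftmost one, then swing a line around the current point until every other point lies on one side of it. The point where the line stops is the next corner, and you repeat until you are back where you started. Here is a minimal sketch in Python, chosen only because it is easy to run anywhere. No claim is made that cvxhull works this way, and the names `gift_wrap`, `cross` and `dist2` are hypothetical.

```python
# Illustrative sketch only: hypothetical names, not cvxhull's code.
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Positive if o -> a -> b turns left (counterclockwise), negative if right."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between a and b."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def gift_wrap(points: List[Point]) -> List[Point]:
    """Hull corners, counterclockwise from the leftmost point.

    Points lying on an edge between two corners are not reported.
    """
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    start = pts[0]  # leftmost point (lowest y on ties) is surely on the hull
    hull = [start]
    current = start
    while True:
        # provisional next corner: any point other than the current one
        candidate = pts[1] if pts[0] == current else pts[0]
        for p in pts:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            # p lies to the right of current -> candidate, so the band must
            # swing further; on a tie (collinear) keep the more distant point
            if turn < 0 or (turn == 0 and dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == start:  # the band has wrapped all the way round
            break
        hull.append(current)
    return hull


if __name__ == "__main__":
    square_with_extras = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
    print(gift_wrap(square_with_extras))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The interior point (1, 1) and the point (1, 0), which sits on an edge, are both left out, just as the band would run past them.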
OK, but why should you find this interesting or useful? It is when there are two or more groups that hulls really come into their own. I will show some more results before giving the small sales pitch, although if you need the pitch after the pictures, then I have probably failed.
cvxhull does not (does not promise to) give you all you may find helpful, but it does leave behind variables that are essential for further processing. Each hull is represented by two variables defining different sides of the hull.
Thus we can do things like this:
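[Graphs: first convex hulls, then second convex hulls after peeling, of mpg versus weight for foreign and domestic cars]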
To spell it out:
0. This is easy to explain. In my experience, the story of pins on a board nails it for people new to the idea. Thanks to cvxhull, it is also easy to implement.
1. Convex hulls look good shown as filled areas. This enhances perception of each point pattern as a whole.
2. Transparency, as introduced in Stata 15, is invaluable whenever hulls overlap, as will be common in interesting cases.
3. If the reaction is that the hull is unduly influenced by outliers -- indeed, being on the hull is one way to identify outliers -- then we can carry out peeling. As with the layers of an onion, inside each convex hull lies another: the convex hull of the points that remain once those on the outer hull are set aside, and so on until we run out of data points. The second graph shows the second hulls.
4. In the code below, getting the sort order right is a crucial detail.
5. Old news to some, but orange and blue work well together.
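Peeling itself is just a short loop: find the hull, set its points aside, and repeat on whatever remains. Below is a minimal Python sketch, again for illustration only. The hulls come from Andrew's monotone chain algorithm, which is my choice and not necessarily what cvxhull uses. By the convention here, a point lying on a hull edge but not at a corner is left for the next layer, and cvxhull may well treat such points differently. `convex_hull` and `peel` are hypothetical names.

```python
# Illustrative sketch only: hypothetical names, not cvxhull's code.
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Positive if o -> a -> b turns left (counterclockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: corners counterclockwise from the leftmost point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:  # left to right along the bottom
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):  # right to left along the top
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop the endpoints the two chains share


def peel(points: List[Point]) -> List[List[Point]]:
    """Onion peeling: successive convex hulls, outermost first."""
    remaining = set(points)
    layers: List[List[Point]] = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)  # set this layer's corners aside, then peel again
    return layers


if __name__ == "__main__":
    pts = [(0, 0), (4, 0), (4, 4), (0, 4),
           (1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
    for depth, layer in enumerate(peel(pts), start=1):
        print(depth, layer)
```

With this nested-squares example, the first layer is the outer square, the second is the inner square, and the last is the single centre point. Each pass removes at least one point, so the loop always terminates.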
Here is the complete code for the last two graphs:
Code:
sysuse auto, clear
ssc install cvxhull
set scheme s1color
* first and second hulls of mpg versus weight, separately for each value of foreign
cvxhull mpg weight , group(foreign) noreport hull(2)
* getting the sort order right is crucial for the rarea calls below
sort weight mpg
* options common to both graphs
local opts legend(off) aspect(1) yla(, ang(h)) ytitle("`: var label mpg'")
* G1: first (outer) hulls shaded, with the data points on top
twoway rarea _cvxh1l _cvxh1r weight if foreign, color(orange%20) sort ///
    || rarea _cvxh1l _cvxh1r weight if !foreign, color(blue%20) sort ///
    || scatter mpg weight if foreign, ms(Oh) mc(orange) ///
    || scatter mpg weight if !foreign, ms(+) mc(blue) `opts' name(G1, replace)
* G2: second hulls, what is left after peeling off the first
twoway rarea _cvxh2l _cvxh2r weight if foreign, color(orange%20) sort ///
    || rarea _cvxh2l _cvxh2r weight if !foreign, color(blue%20) sort ///
    || scatter mpg weight if foreign, ms(Oh) mc(orange) ///
    || scatter mpg weight if !foreign, ms(+) mc(blue) `opts' name(G2, replace)
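The rarea calls above shade the region between two variables plotted against weight. Exactly how cvxhull stores its results is not assumed here, but the geometry behind drawing a hull this way is general. Cut a convex polygon at its leftmost and rightmost points and you get a lower chain and an upper chain. Each chain is a function of x, so at any x within the hull's range the hull is the vertical interval between them. Each chain also traces out correctly only when its points are taken in order of x. Here is a minimal Python sketch for illustration. The chains come from Andrew's monotone chain algorithm, not from anything in cvxhull, and `hull_chains` and `chain_y` are hypothetical names.

```python
# Illustrative sketch only: hypothetical names, not cvxhull's code.
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Positive if o -> a -> b turns left (counterclockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_chains(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Lower and upper chains of the convex hull, both listed left to right.

    Both chains start at the leftmost point and end at the rightmost point
    (ordering by x, then y), so together they trace the whole hull.
    """
    pts = sorted(set(points))
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    upper.reverse()  # built right to left; flip it to run left to right
    return lower, upper


def chain_y(chain: List[Point], x: float) -> float:
    """Height of a chain at x, by linear interpolation along its edges.

    Vertical edges (possible only at the far left or right of the hull) are
    skipped, so the lower chain gives the bottom and the upper chain the top.
    """
    for (x0, y0), (x1, y1) in zip(chain, chain[1:]):
        if x1 > x0 and x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the hull, or the hull has zero width")


if __name__ == "__main__":
    lower, upper = hull_chains([(0, 0), (4, 0), (2, 4), (2, 1)])
    # the vertical slice of this triangle at x = 1 runs from 0.0 up to 2.0
    print(chain_y(lower, 1), chain_y(upper, 1))
```

The interior point (2, 1) appears in neither chain. Shading between `chain_y(lower, x)` and `chain_y(upper, x)` over the range of x fills exactly the hull.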
Detail: The contact address in the help file for cvxhull is out of date. Allan has moved twice since the help file was written.