A search for discussions of convex hulls in Stata forums or outlets reveals various programs from 1995, 1997 and 1998 in the Stata Technical Bulletin (all using Stata's old graphics), an ado file cvxhull posted by Allan Reese in 2004

https://www.stata.com/statalist/arch.../msg00193.html

and not much else. This is a surprise to me.

In what follows, my focus is entirely on what you can do on or with scatter plots -- or points in two dimensions -- and not with one dimension or with three dimensions or more.

For whatever reasons, convex hulls no longer seem popular or even known about in statistical graphics.

I should back up, as some people will already be lost if they do not know about convex hulls, or at least do not know the term. The idea is likely to be familiar or at least immediate once exemplified and it may summon distant memories of childhood pastimes in which you connected the dots and Cinderella, or a horse, or something equally interesting emerged from a puzzle book.

Here is a convex hull as produced by

Code:
ssc install cvxhull
sysuse auto, clear 
set scheme s1color 
cvxhull mpg weight, hull(1) noreport

[Graph: scatter plot of mpg against weight with its convex hull]

So, a convex hull is the smallest convex polygon including all the points in a set. Some points are on the hull and the others are inside.

A standard thought experiment is to imagine the points on the scatter plot as pins on a board. Summon up a giant rubber band (https://en.wikipedia.org/wiki/Rubber_band), stretch it to include all the points, and then let it go. The hull is now marked by the band.

OK, but why should you find this interesting or useful? Convex hulls come into their own when there are two or more groups to compare. I will show some more results before giving the small sales pitch, although if you need the pitch after the pictures, then I have probably failed.

cvxhull does not (and does not promise to) give you all you may find helpful, but it does leave behind variables that are essential for further processing. Each hull is represented by two variables defining different sides of the hull: for the first hull these are _cvxh1l and _cvxh1r.
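To see what is left behind, a minimal sketch follows. It uses only the cvxhull call and variable names that appear in the code later in this post; the assumption that the hull variables are missing for points not on a hull is mine and is worth checking against your own results.

```stata
* Sketch only: assumes cvxhull is already installed (ssc install cvxhull)
sysuse auto, clear
cvxhull mpg weight, group(foreign) noreport hull(2)

* which variables did cvxhull leave behind?
describe _cvxh*

* first-hull variables alongside the data
* (assumption: they are missing for points not on the first hull)
sort foreign weight mpg
list foreign weight mpg _cvxh1l _cvxh1r if !missing(_cvxh1l, _cvxh1r), noobs sepby(foreign)
```

Whatever the listing shows, the point is that the hull is now ordinary data and can be passed to twoway like any other variables.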

Thus we can do things like this:



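[Graph G1: mpg against weight, with the first convex hull of each group shown as a shaded area (foreign in orange, domestic in blue)]

[Graph G2: the same, with the second convex hull of each group]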


To spell it out:

0. This is pretty easy to explain. In my experience, the story of pins on a board nails it easily for people new to the idea. Thanks to cvxhull it is easy to implement.

1. Convex hulls look good shown as areas contained. This enhances perception of point patterns as wholes.

2. Transparency as introduced in Stata 15 is invaluable whenever, as will be common in interesting cases, hulls overlap.

3. If the reaction is that the hull is unduly influenced by outliers -- indeed being on the hull is one way to identify outliers -- then we can carry out peeling. Onion-like, inside each convex hull lies another that is the convex hull of the remaining points (until we run out of data points). The second graph shows the second hulls.

4. In the code below, getting the sort order right is a crucial detail.

5. Old news to some, but orange and blue work well together.

Here is the complete code for the last two graphs:


Code:
sysuse auto, clear
ssc install cvxhull
set scheme s1color 
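* two hulls (the outermost, then the next one in) for each category of foreign
cvxhull mpg weight , group(foreign) noreport hull(2)
* the sort order matters for the rarea calls below
sort weight mpg 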
local opts legend(off) aspect(1) yla(, ang(h)) ytitle("`: var label mpg'")

twoway rarea _cvxh1l _cvxh1r weight if foreign, color(orange%20) sort /// 
|| rarea _cvxh1l _cvxh1r weight if !foreign, color(blue%20) sort      ///
|| scatter mpg weight if foreign, ms(Oh) mc(orange)                   ///
|| scatter mpg weight if !foreign, ms(+) mc(blue) `opts' name(G1, replace)

twoway rarea _cvxh2l _cvxh2r weight if foreign, color(orange%20) sort ///
|| rarea _cvxh2l _cvxh2r weight if !foreign, color(blue%20) sort      ///
|| scatter mpg weight if foreign, ms(Oh) mc(orange)                   ///
|| scatter mpg weight if !foreign, ms(+) mc(blue) `opts' name(G2, replace)
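As a variation on peeling, here is a hedged sketch that nests the first and second hulls for one group in a single graph and then puts G1 and G2 side by side. It assumes the code above has just been run, so that the _cvxh* variables and the graphs G1 and G2 exist; the graph names G3 and G12 are arbitrary choices of mine.

```stata
* Sketch only: run after the code above, in the same session

* first hull lightly shaded, second hull more heavily shaded, foreign cars only
twoway rarea _cvxh1l _cvxh1r weight if foreign, color(orange%20) sort ///
|| rarea _cvxh2l _cvxh2r weight if foreign, color(orange%40) sort    ///
|| scatter mpg weight if foreign, ms(Oh) mc(orange) `opts' name(G3, replace)

* the two earlier graphs side by side on common axes
graph combine G1 G2, ycommon xcommon name(G12, replace)
```

Note that the local macro opts is only visible if these lines are run together with the earlier code, say from the same do-file; otherwise it is empty and the graph is drawn with default options.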

Detail: The contact address on the help file for cvxhull is out-of-date. Allan has moved twice since then.