Fighting spaghetti: some small devices using linkplot

A common kind of question on Statalist concerns plotting multiple time series. Spaghetti -- tangled lines that can hardly be distinguished -- is an ever-present graphical danger.

At worst one can have

1. one or more responses

2. one or more panels or groups

3. several times of observation.

Even reduced versions of this problem (one response only OR one group only) can be frustrating. Here I focus mainly on one response and several groups. The ambiguity of the term panel is a little disconcerting: is a group of observations or a part of a graph implied? I will use panel in the graphical sense and group otherwise.

Most of the solutions hinge, directly or indirectly, on the command twoway line (which can be just line), but even the special extra commands tsline and xtline can be disappointing, at least in my experience.

I doubt that there is a single solution. At least, I have tried several, including

sparkline (SSC, 2013) https://www.stata.com/statalist/arch.../msg00922.html
Examples at https://www.statalist.org/forums/for...le-time-series and https://www.statalist.org/forums/for...hart-correctly

multiline (SSC, 2017) https://www.statalist.org/forums/for...ailable-on-ssc

fabplot (SSC, 2018) https://www.statalist.org/forums/for...ailable-on-ssc

Recently I was playing with some examples for which none of these was quite right and was pondering writing a different command. Then I realised that linkplot (SSC) was sufficiently general to help. linkplot isn't specifically geared to time series data at all, but that doesn't bite.

linkplot was posted on SSC in 2003 and announced at https://www.stata.com/statalist/arch.../msg00194.html but the email-based server then didn't allow graphical illustrations and that post just hints at time series applications.

Yesterday I realised that although I had updated the command in 2007 I hadn't updated the version on SSC, but Kit Baum kindly and promptly updated the files. Thus even if you previously installed a copy of linkplot an update is still in order for the syntax below to work.
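A minimal sketch of getting the updated files, using only official commands; the replace option overwrites any copy already installed.

Code:

* install linkplot from SSC, or replace an older installed copy
ssc install linkplot, replace

* check which copy Stata will now find
which linkplot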

The Grunfeld data are a sandbox for problems of this kind. Here's the good news: if a graphical method won't work well with the Grunfeld data, with just 10 groups, then you're probably doomed if your data are more complicated.

Let's look at the investment variable in the Grunfeld data. We'll flag key points, some of which apply much more generally.

#1: Always consider logarithmic scale for a response. That's really old news for some, but goodness knows how many people don't seem to realise how helpful that can be.

#1': If zeros are present too, or even negative values, and some kind of transformation is called for, just possibly you could use square roots, cube roots, sign(y) * log(1 + abs(y)), asinh(y), etc.
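As a rough sketch of the transformations just listed, here they are applied to toy data. The values and the variable names y_cuberoot, y_neglog and y_asinh are made up purely for illustration; the Grunfeld investment variable is always positive and so does not need any of this.

Code:

* toy data (hypothetical values) with negative, zero and positive responses
clear
input y
-1000
-10
0
10
1000
end

* each transformation is defined for all real values and preserves sign
gen y_cuberoot = sign(y) * abs(y)^(1/3)
gen y_neglog = sign(y) * log(1 + abs(y))
gen y_asinh = asinh(y)

list

Square roots are left out here because they are undefined for negative values; they remain an option when zeros are the only difficulty.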

Code:

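* Grunfeld data: 10 companies, yearly 1935-1954
webuse grunfeld, clear
set scheme s1color

label var invest "investment"

* one panel for each company, logarithmic scale
xtline invest, ysc(log)

* all companies superimposed in one panel
xtline invest, overlay ysc(log)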

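[Graph: investment against year on a logarithmic scale, one panel for each company (xtline invest, ysc(log))]
[Graph: investment against year on a logarithmic scale, all companies superimposed with a legend (xtline invest, overlay ysc(log))]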

#2: Stata default axis labels for logarithmic scales aren't terribly smart. We could just reach in and tell Stata what we want. For other techniques see https://www.stata-journal.com/articl...article=gr0072 and/or niceloglabels (Stata Journal). The graphs above show the problem.

#3: Even with about 10 groups, graphs with a group in each panel may not work well. In principle, the data are shown clearly, but effective comparison is difficult.

#4: Even with about 10 groups, a superimposed graph may not work well either. A large fraction of the total graph area becomes legend and the mental "back and forth" required to relate graph to legend and legend to graph is often too much like hard work.

How to do better? The default linkplot doesn't at first look especially promising, even with a logarithmic scale. Note that linkplot knows nothing about any tsset or xtset specification, but here the group identifier is fed to the link() option, which tells the program what should be connected. Minor trickery within the code ensures that incomplete groups won't be connected spuriously.

Code:

linkplot invest year, link(company)  ysc(log)

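[Graph: default linkplot of investment against year on a logarithmic scale, connected markers for each company]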

The default of twoway connect could be a good idea for short series (not least groups of length 2, say before and after, start and end, and so forth), but it just contributes noise here.

There are several small and large tweaks that can be made to improve the plot. Let me give the remaining code all at once, show the resulting graphs and then draw the morals.

Code:

* end labels: marker labels with no marker symbol at the last year, 1954
local endlabels addplot(scatter invest year if year == 1954, ms(none) mla(company) mlabc(blue))

* indicator that is 1 for even-numbered companies and 0 for odd-numbered companies
gen even = mod(company, 2) == 0

* lines only, odd and even identifiers in separate panels, end labels, no legend
linkplot invest year, recast(line) link(company) `endlabels' ysc(log) by(even, legend(off) note("") compact) xla(, labsize(small)) subtitle("", fcolor(none) nobox nobexpand) yla(1000 100 10 1, ang(h)) xsc(r(1935 1956)) xtitle("")

* the same, but asyvars gives each company its own colour
linkplot invest year, recast(line) link(company) `endlabels' ysc(log) by(even, legend(off) note("") compact) xla(, labsize(small)) subtitle("", fcolor(none) nobox nobexpand) yla(1000 100 10 1, ang(h)) xsc(r(1935 1956)) asyvars ytitle(investment) xtitle("")

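[Graph: linkplot of investment against year, logarithmic scale, two panels (odd-numbered and even-numbered companies), blue end labels, no legend]
[Graph: the same with asyvars, so that each company's line has its own colour]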

#5: Trailing end labels. If you care which group is which, you need to show identifiers, but a legend may be dispensable. One of my slogans is: Lose the legend! Kill the key! (if you can). The Grunfeld data are especially easy (identifiers 1 to 10), but don't sniff at easy answers when available. To show those end labels, you just need an additional scatter plot fed to addplot() with no marker symbol, but a marker label. The default marker label position of 3 o'clock is exactly right. You might need to stretch the x axis a little.
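The end-label device does not depend on linkplot. Here is a minimal sketch with official twoway commands only, intended for a do-file. It assumes that flagging the last observation of each group is wanted rather than hard-coding 1954; the variable name last is made up for illustration. connect(L) joins points only while the x variable is increasing, so lines break between companies once the data are sorted.

Code:

webuse grunfeld, clear

* flag the last observation of each company, so that unbalanced data also work
bysort company (year): gen byte last = _n == _N

* lines for all companies, plus invisible markers carrying the end labels
twoway line invest year, connect(L) yscale(log) ///
    || scatter invest year if last, msymbol(none) mlabel(company) ///
    legend(off) xscale(range(1935 1956)) ylabel(1000 100 10 1, angle(horizontal)) xtitle("")

The range() suboption stretches the x axis beyond 1954 to leave room for the labels.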

#5': A related technique is to show start labels, as well or alternatively. Usually the most recent value seems to offer the best place for the identifier.

#6: A compromise between one graph showing all and one panel for each group is to bundle groups together. Sometimes there is a natural or convenient way to do that (e.g. US states might be grouped by region). Here, as the identifiers run from 1 (large company) to 10 (small company), dividing identifiers into odd and even reduces the overlap between series. 3 panels side by side could just about work if the series aren't long. 2 x 2 = 4 panels loses some comparability as some panels are on different rows.
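As a sketch of the three-panel variant, here is one arbitrary bundling that deals companies into three bundles in turn (1, 4, 7, 10; 2, 5, 8; 3, 6, 9). The variable name bundle and this particular split are assumptions for illustration only; whether it reads better than the odd and even split is a matter for experiment.

Code:

* requires linkplot from SSC: ssc install linkplot, replace
webuse grunfeld, clear

* hypothetical bundling: deal companies into three bundles in turn
gen byte bundle = mod(company - 1, 3) + 1

* three panels in one row, lines only, end labels, no legend
linkplot invest year, recast(line) link(company) addplot(scatter invest year if year == 1954, ms(none) mla(company)) ysc(log) by(bundle, rows(1) legend(off) note("") compact) yla(1000 100 10 1, ang(h)) xsc(r(1935 1956)) xtitle("")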

#7: Sometimes the grouping doesn't need to be explained. Note that while we are using by() we can suppress the note() that appears by default and even the subtitles.

#8: Suppress x axis titles like "year". Really, who needs them?

#9: We just reach in and ask for y axis labels 1 10 100 1000 ourselves. 1 3 10 30 100 300 1000 is a good alternative if you want more labels. Much more discussion in the paper cited in #2.
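A minimal sketch of the denser alternative, using the superimposed xtline graph only because it needs nothing beyond official Stata; the same yla() call can be given to linkplot.

Code:

webuse grunfeld, clear

* explicit labels at 1 3 10 30 100 300 1000 on a logarithmic scale
xtline invest, overlay ysc(log) yla(1 3 10 30 100 300 1000, ang(h))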

#10: linkplot has an asyvars option that automatically colours companies separately. Here that might seem a complication too far, but liking the idea doesn't imply that we need a legend too. That's what the end labels do. Conversely, some might want to colour the end labels to match. (No; there isn't an automated way to do that in linkplot, at least not yet.)
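For anyone who does want end labels coloured to match, one manual route is to build the twoway call in a loop. This is only a sketch with official commands, not anything linkplot does; the list of ten named colours and the local macro names colors, c and call are arbitrary choices for illustration.

Code:

webuse grunfeld, clear
sort company year

* ten named colours, an arbitrary choice
local colors navy maroon forest_green dkorange teal cranberry lavender khaki sienna emidblue

* one line plot and one end-label plot for each company, sharing a colour
local call
forval i = 1/10 {
    local c : word `i' of `colors'
    local call `call' (line invest year if company == `i', lcolor(`c'))
    local call `call' (scatter invest year if company == `i' & year == 1954, msymbol(none) mlabel(company) mlabcolor(`c'))
}

twoway `call', yscale(log) ylabel(1000 100 10 1, angle(horizontal)) xscale(range(1935 1956)) legend(off) xtitle("") ytitle(investment)

The cost is a longer call and colours chosen by hand, which is tolerable for 10 groups but would not scale to many more.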
