At worst, one can have
1. one or more responses
2. one or more panels or groups
3. several times of observation.
Even reduced versions of this problem (one response only OR one group only) can be frustrating. Here I focus mainly on one response and several groups. The ambiguity of the term panel is a little disconcerting: does it mean a group of observations or part of a graph? I will use panel in the graphical sense and group otherwise.
Most solutions hinge, directly or indirectly, on the command twoway line (which can be typed as just line), but even the dedicated commands tsline and xtline can be disappointing, at least in my experience.
I doubt that there is a single solution. At any rate, I have tried several, including
1. sparkline (SSC, 2013): https://www.stata.com/statalist/arch.../msg00922.html; examples at https://www.statalist.org/forums/for...le-time-series and https://www.statalist.org/forums/for...hart-correctly
2. multiline (SSC, 2017): https://www.statalist.org/forums/for...ailable-on-ssc
3. fabplot (SSC, 2018): https://www.statalist.org/forums/for...ailable-on-ssc
Recently I was playing with some examples for which none of these was quite right and was pondering writing a different command. Then I realised that linkplot (SSC) was sufficiently general to help. linkplot isn't specifically geared to time series data at all, but that doesn't bite.
linkplot was posted on SSC in 2003 and announced at https://www.stata.com/statalist/arch.../msg00194.html, but the email-based server did not then allow graphical illustrations, and that post only hints at time series applications.
Yesterday I realised that although I had updated the command in 2007, I hadn't updated the version on SSC, but Kit Baum kindly and promptly updated the files. Thus, even if you previously installed a copy of linkplot, an update is still needed for the syntax below to work.
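Refreshing an older copy from SSC takes one command. A minimal sketch, assuming an internet connection and write access to your personal ado directory:

```stata
* Install linkplot from SSC, overwriting any older copy already installed
ssc install linkplot, replace

* Show which file Stata will now use (errors if linkplot is not found)
which linkplot
```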
The Grunfeld data are a sandbox for problems of this kind. Here's the good news: if a graphical method won't work well with the Grunfeld data, with just 10 groups (companies), then you're probably doomed if your data are more complicated.
Let's look at the investment variable in the Grunfeld data. We'll flag key points, some of which apply much more generally.
#1: Always consider logarithmic scale for a response. That's really old news for some, but goodness knows how many people don't seem to realise how helpful that can be.
#1': If zeros are present too, or even negative values, and some kind of transformation is called for, just possibly you could use square roots, cube roots, sign(y) * log(1 + abs(y)), asinh(y), etc.
Code:
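```stata
webuse grunfeld, clear
set scheme s1color
label var invest "investment"
* one panel per company, logarithmic y axis
xtline invest, ysc(log)
* all companies superimposed, logarithmic y axis
xtline invest, overlay ysc(log)
```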
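For the transformations mentioned in #1', here is a minimal sketch on a small hypothetical variable y containing negative values and a zero. One detail worth knowing: Stata returns missing for a negative number raised to a non-integer power, so a cube root that works for negative values must be computed as sign(y) * abs(y)^(1/3) rather than y^(1/3).

```stata
* Hypothetical toy data: a response y with negative values and a zero
clear
input y
-100
-1
0
1
100
end

* Naive cube root: missing wherever y < 0
gen naive = y^(1/3)

* Cube root that keeps the sign: root of abs(y), sign restored
gen cuberoot = sign(y) * abs(y)^(1/3)

* Signed square root, by the same device
gen sqroot = sign(y) * sqrt(abs(y))

* Signed log: 0 at 0, close to sign(y) * log(abs(y)) for large abs(y)
gen slog = sign(y) * log(1 + abs(y))

* Inverse hyperbolic sine, defined for all real y
gen ihs = asinh(y)

list
```

All four transformations are odd functions (f(-y) = -f(y)), so positive and negative values are treated symmetrically.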
#2: Stata's default axis labels for logarithmic scales aren't terribly smart. We could just reach in and tell Stata what we want. For other techniques, see https://www.stata-journal.com/articl...article=gr0072 and/or niceloglabels (Stata Journal). The graphs above show the problem.
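For the xtline graphs, reaching in just means naming the labels. A minimal sketch, assuming powers of 10 suit the range of investment; a logarithmic scale also requires every plotted value to be positive, which is worth checking first.

```stata
webuse grunfeld, clear
set scheme s1color
label var invest "investment"

* A log scale only makes sense if all values are positive
summarize invest, meanonly
display "min = " r(min) "   max = " r(max)

* Name the labels ourselves; ang(h) keeps them horizontal and readable
xtline invest, ysc(log) yla(1 10 100 1000, ang(h))
xtline invest, overlay ysc(log) yla(1 10 100 1000, ang(h))
```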
#3: Even with about 10 groups, graphs with a group in each panel may not work well. In principle, the data are shown clearly, but effective comparison is difficult.
#4: Even with about 10 groups, a superimposed graph may not work well either. A large fraction of the total graph area becomes legend, and the mental "back and forth" required to relate graph to legend and legend to graph is often too much like hard work.
How to do better? The default linkplot doesn't at first look especially promising, even with a logarithmic scale. Note that linkplot knows nothing about any tsset or xtset specification; instead, the group identifier is fed to the link() option, which tells the program which observations should be connected. Minor trickery within the code ensures that incomplete panels won't be connected spuriously.
Code:
```stata
linkplot invest year, link(company) ysc(log)
```
Note the default of twoway connect here, which shows markers as well as lines.
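If linkplot is not to hand, a rough base-Stata analogue for complete panels is to sort by group and time and use connect(L), which joins successive points only while the x variable increases. This is a different technique, not a description of how linkplot works internally; unlike linkplot, it would still bridge any years absent from the middle of a company's series.

```stata
webuse grunfeld, clear

* Each company's observations contiguous and in time order
sort company year

* connect(L): join points only while year increases, so the last point
* of one company (1954) is not joined to the first of the next (1935)
twoway connected invest year, connect(L) ysc(log)
```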
There are several small and large tweaks that can be made to improve the plot. Let me give the remaining code all at once, show the resulting graphs and then draw the morals.
Code:
```stata
* end labels: extra scatter at 1954 with no marker symbol, labelled by company
local endlabels addplot(scatter invest year if year == 1954, ms(none) mla(company) mlabc(blue))
* indicator used only to split companies between two panels (1 = even, 0 = odd)
gen odd = mod(company, 2) == 0
* run from a do-file: /// continues a command onto the next line
linkplot invest year, recast(line) link(company) `endlabels' ysc(log) ///
    by(odd, legend(off) note("") compact) xla(, labsize(small))       ///
    subtitle("", fcolor(none) nobox nobexpand) yla(1000 100 10 1, ang(h)) ///
    xsc(r(1935 1956)) xtitle("")
linkplot invest year, recast(line) link(company) `endlabels' ysc(log) ///
    by(odd, legend(off) note("") compact) xla(, labsize(small))       ///
    subtitle("", fcolor(none) nobox nobexpand) yla(1000 100 10 1, ang(h)) ///
    xsc(r(1935 1956)) asyvars ytitle(investment) xtitle("")
```
#5: Trailing end labels. If you care which series is which, you need to show identifiers, but a legend may be dispensable. One of my slogans is: Lose the legend! Kill the key! (if you can). The Grunfeld data are especially easy (identifiers 1 to 10), but don't sniff at easy answers when available. To show those end labels, you just need an additional scatter plot fed to addplot() with no marker symbol but with a marker label. The default marker label position of 3 o'clock is exactly right. You might need to stretch the x axis a little.
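The same end-label device works in plain twoway. A minimal sketch, with connect(L) standing in for linkplot and with each company's last observation flagged rather than hard-coding year == 1954. That generalisation matters only if some series stop early (an assumption here; the Grunfeld panels are complete). The variable name last is arbitrary.

```stata
webuse grunfeld, clear
sort company year

* 1 for each company's last observation, 0 otherwise
by company: gen byte last = _n == _N

* Lines for all companies, plus labels without markers at the last point;
* mlabpos(3) is the default position, spelled out here for clarity
twoway line invest year, connect(L)                                   ///
    || scatter invest year if last, ms(none) mlabel(company) mlabpos(3) ///
    ysc(log) yla(1 10 100 1000, ang(h)) xsc(r(1935 1956))            ///
    xtitle("") ytitle(investment) legend(off)
```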
#5': A related technique is to show start labels, as well as or instead of end labels. Usually, though, the most recent value seems to offer the best place for the identifier.
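Start labels use the same trick at the other end, with the label placed to the left of the point. A minimal sketch under the same assumptions as for end labels (base twoway with connect(L) standing in for linkplot; variable names first and last are arbitrary):

```stata
webuse grunfeld, clear
sort company year

* Flag the first and last observations of each company
by company: gen byte first = _n == 1
by company: gen byte last = _n == _N

* Start labels at 9 o'clock, end labels at 3 o'clock;
* the x axis is stretched at both ends to make room for them
twoway line invest year, connect(L)                                    ///
    || scatter invest year if first, ms(none) mlabel(company) mlabpos(9) ///
    || scatter invest year if last, ms(none) mlabel(company) mlabpos(3)  ///
    ysc(log) xsc(r(1933 1956)) xtitle("") ytitle(investment) legend(off)
```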
#6: A compromise between one graph showing all groups and one panel for each group is to bundle groups together. Sometimes there is a natural or convenient way to do that (e.g. US states might be grouped by region). Here, as the identifiers run from 1 (large company) to 10 (small company), dividing identifiers into odd and even reduces the overlap between series. Three panels side by side could just about work if the series aren't long. A 2 x 2 layout of 4 panels loses some comparability, as some panels are on different rows.
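For three panels side by side, a sketch assuming a hypothetical split into bundles by mod(company - 1, 3), that is, companies 1 4 7 10, then 2 5 8, then 3 6 9. As with odd and even, each bundle mixes larger and smaller companies. Base twoway with connect(L) again stands in for linkplot, and rows(1) inside by() keeps all panels in one row.

```stata
webuse grunfeld, clear
sort company year

* Hypothetical bundles: 0 = companies 1 4 7 10, 1 = 2 5 8, 2 = 3 6 9
gen bundle = mod(company - 1, 3)
by company: gen byte last = _n == _N

twoway line invest year, connect(L)                          ///
    || scatter invest year if last, ms(none) mlabel(company) ///
    by(bundle, rows(1) legend(off) note("") compact)         ///
    subtitle("", fcolor(none) nobox nobexpand)               ///
    ysc(log) yla(1000 100 10 1, ang(h))                      ///
    xla(, labsize(small)) xsc(r(1935 1957)) xtitle("")
```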
#7: Sometimes the grouping doesn't need to be explained. Note that when using by() we can suppress the note that appears by default, using note(""), and even the panel subtitles.
#8: Suppress x axis titles like "year". Really, who needs them?
#9: We just reach in and ask for y axis labels 1 10 100 1000 ourselves. 1 3 10 30 100 300 1000 is a good alternative if you want more labels. Much more discussion in the paper cited in #2.
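The labels can also be generated from the data rather than typed. A minimal sketch, assuming the 1, 3, 10, 30, ... pattern and keeping only candidates inside the observed range of invest (base twoway with connect(L) stands in for linkplot). For the Grunfeld data this should reproduce 1 3 10 30 100 300 1000.

```stata
webuse grunfeld, clear
sort company year

* Range of the response, kept in locals
summarize invest, meanonly
local min = r(min)
local max = r(max)

* Candidates 1 and 3 times each power of 10 spanning the data;
* keep only those inside the observed range
local yla
forvalues p = `=floor(log10(`min'))'/`=ceil(log10(`max'))' {
    foreach m in 1 3 {
        local v = `m' * 10^(`p')
        if `v' >= `min' & `v' <= `max' local yla `yla' `v'
    }
}
display "`yla'"

twoway line invest year, connect(L) ysc(log) yla(`yla', ang(h)) xtitle("")
```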
#10: linkplot has an asyvars option that automatically colours companies differently. Here that might seem a complication too far, but liking the idea doesn't mean we need a legend too: the end labels do that job. Conversely, some might want to colour the end labels to match. (No, there isn't an automated way to do that in linkplot, at least not yet.)
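Outside linkplot, matching colours can be arranged by hand. A sketch, not a linkplot feature: each company gets its own line and its own end-label scatter in base twoway, both given the same colour. The palette, a list of named Stata colours, is an arbitrary choice and must have at least as many entries as there are companies.

```stata
webuse grunfeld, clear
sort company year
by company: gen byte last = _n == _N

* Arbitrary palette: one named Stata colour per company (10 companies)
local colors navy maroon forest_green dkorange teal cranberry lavender khaki sienna emidblue

* One line and one end-label scatter per company, sharing a colour
local plots
forvalues i = 1/10 {
    local c : word `i' of `colors'
    local plots `plots' (line invest year if company == `i', lcolor(`c'))
    local plots `plots' (scatter invest year if company == `i' & last, ms(none) mlabel(company) mlabcolor(`c'))
}

twoway `plots', ysc(log) yla(1 10 100 1000, ang(h)) xsc(r(1935 1956)) ///
    xtitle("") ytitle(investment) legend(off)
```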