Thanks, as always, to Kit Baum: a small utility, listfirst, has been posted on SSC.
listfirst lists the first # observations, either generally or (more usefully) those satisfying an if
condition.
Optionally it will also list the last # observations, either generally or (more usefully) those satisfying
an if condition.
# defaults to 10. Optionally it may be changed, and need not be equal for first and last subsets.
If a variable list is not specified, it defaults to all variables in the dataset. Otherwise one or more variables
may be specified.
Output may be limited by what exists in the dataset. In particular, no observations will be listed if none
exist that satisfy a specified if condition. That is not considered an error.
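For anyone wanting to try it, here is a minimal sketch of installing and calling the command. It assumes internet access and uses only the syntax described above: an optional variable list, an optional if condition, and the first() and last() options. The choice of variables and numbers is purely illustrative.

```stata
* install from SSC (needs internet access)
ssc install listfirst

* load the auto dataset bundled with Stata
sysuse auto, clear

* first 10 observations, all variables
listfirst

* first 5 and last 3 observations of two variables
listfirst mpg weight, first(5) last(3)
```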
Many readers will be familiar with utilities in Unix or other operating systems that let you see the head (top or first lines) or tail (bottom or end lines) of text files. Similar features have been folded into various statistical software. In Stata, when the concern is with a dataset, most but not quite all of the possibilities yield easily to list or edit.
You may have special interest in what is at either end of a dataset. Perhaps more commonly the point is just to see a small sample of the dataset, especially in a large dataset. Perhaps a full list or opening edit or browse seems over the top.
Examples use the auto dataset bundled with Stata, which has 74 observations.
The simplest applications of listfirst are trivial.
listfirst by itself is equivalent to list in 1/10.
listfirst mpg by itself is equivalent to list mpg in 1/10.
listfirst mpg, first(5) is equivalent to list mpg in 1/5.
Such examples don't take you beyond what is already easy with list. However,
listfirst mpg, last
is equivalent to
list mpg if inrange(_n, 1, 10) | inrange(_n, 65, 74)
or more generally to
list mpg if inrange(_n, 1, 10) | inrange(_n, _N - 9, _N).
Either is harder to work out or to type.
listfirst mpg, first(5) last(5)
is similarly more challenging.
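For comparison, one way to get the same listing with list alone is sketched here. It is just one possible spelling, not necessarily what listfirst does internally.

```stata
sysuse auto, clear

* first 5 and last 5 observations, whatever the size of the dataset
list mpg if _n <= 5 | _n > _N - 5
```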
The use of an if condition is where listfirst scores.
listfirst mpg if foreign lists the first 10 observations satisfying the condition, which is otherwise more
difficult unless you work out where they are in the dataset, or know that for
another reason. However, a useful trick is
list mpg if foreign & sum(foreign) <= 10
given that foreign is a (0, 1) indicator variable. That generalises to any true-or-false expression. See
(e.g.)
Cox, N.J. 2007. How can I identify first and last occurrences systematically in panel data?
http://www.stata.com/support/faqs/da...t-occurrences/
for more on such ideas.
listfirst mpg if foreign, last
shows the last 10 observations too.
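Without listfirst, the last 10 observations satisfying a condition can be picked out by applying the same running-sum trick in reverse order. What follows is a sketch only, and not necessarily how listfirst works internally; the variable names obsno and islast are made up for illustration.

```stata
sysuse auto, clear

* remember the current order, then reverse it
gen long obsno = _n
gsort -obsno

* in reversed order the running sum counts occurrences from the end
gen byte islast = foreign & sum(foreign) <= 10

* restore the original order and list
sort obsno
list mpg if islast
```

This adds two variables to the dataset, which is part of why a dedicated command is more convenient.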
The history here deserves a little note.
A command listsome was posted on Statalist on 10 April 2008 in
https://www.stata.com/statalist/arch.../msg00448.html
in response to a question from Malcolm Wardlaw earlier that day.
But that command was never documented or made public beyond Statalist.
Independently Robert Picard posted a listsome command on SSC that was first announced on 18 August 2014 in
https://www.statalist.org/forums/for...f-observations
Robert's command has a strong feature of offering random samples, which is not attempted here.
This listfirst command has two small virtues: being limited, and therefore simple; and showing "last" values
too if that is also wanted. I happily yield the command name to Robert.
More or less the same question arises from time to time, most recently in https://www.statalist.org/forums/for...et-a-condition
The version now on SSC differs slightly from the one posted in the thread just mentioned.