Hello Statalisters,
I've been trying to calculate the h-index for a large dataset of scientists and their publications. The h-index is defined as the maximum value of h such that the given author/journal has published h papers that have each been cited at least h times. The dataset looks somewhat like this:
authorid | year | articleid | citation | hindex | c_hindex | t_hindex |
A | 1990 | 1 | 7 | 5 | 5 | 15 |
A | 1990 | 2 | 5 | 5 | 5 | 15 |
A | 1990 | 3 | 13 | 5 | 5 | 15 |
A | 1990 | 4 | 12 | 5 | 5 | 15 |
A | 1990 | 5 | 17 | 5 | 5 | 15 |
A | 1991 | 6 | 11 | 4 | 7 | 15 |
A | 1991 | 7 | 9 | 4 | 7 | 15 |
A | 1991 | 8 | 19 | 4 | 7 | 15 |
A | 1991 | 9 | 15 | 4 | 7 | 15 |
A | 1992 | 10 | 14 | 3 | 9 | 15 |
A | 1992 | 11 | 4 | 3 | 9 | 15 |
A | 1992 | 12 | 3 | 3 | 9 | 15 |
A | 1992 | 13 | 7 | 3 | 9 | 15 |
A | 1992 | 14 | 5 | 3 | 9 | 15 |
A | 1992 | 15 | 4 | 3 | 9 | 15 |
A | 1992 | 16 | 11 | 3 | 9 | 15 |
A | 1992 | 17 | 17 | 3 | 9 | 15 |
A | 1993 | 18 | 15 | 4 | 15 | |
A | 1993 | 19 | 17 | 4 | 15 | |
A | 1993 | 20 | 18 | 4 | 15 | |
A | 1993 | 21 | 11 | 4 | 15 | |
A | 1994 | 22 | 3 | 15 | ||
A | 1994 | 23 | 15 | 15 | ||
A | 1994 | 24 | 14 | 15 | ||
A | 1994 | 25 | 17 | 15 | ||
A | 1994 | 26 | 13 | 15 | ||
A | 1994 | 27 | 12 | 15 | ||
A | 1994 | 28 | 6 | 15 | ||
A | 1994 | 29 | 15 | 15 | ||
A | 1994 | 30 | 5 | 15 | ||
B | 1990 | 31 | 11 | |||
B | 1991 | 32 | 11 | |||
B | 1991 | 33 | 4 | |||
B | 1991 | 34 | 4 | |||
B | 1991 | 35 | 3 | |||
B | 1992 | 36 | 9 | |||
B | 1992 | 37 | 22 | |||
B | 1992 | 38 | 2 | |||
B | 1992 | 39 | 9 | |||
B | 1992 | 40 | 4 | |||
B | 1992 | 41 | 37 | |||
B | 1992 | 42 | 9 | |||
B | 1992 | 43 | 8 | |||
B | 1992 | 44 | 3 | |||
B | 1993 | 45 | 13 | |||
B | 1993 | 46 | 9 | |||
B | 1993 | 47 | 7 | |||
B | 1993 | 48 | 3 | |||
B | 1993 | 49 | 10 | |||
B | 1993 | 50 | 9 | |||
B | 1994 | 51 | 1 | |||
B | 1994 | 52 | 2 | |||
B | 1994 | 53 | 6 | |||
B | 1994 | 54 | 6 | |||
B | 1994 | 55 | 7 | |||
* Generate the h-index of each author-year (flow: only papers published in that year)
* Rank papers within author-year from most to least cited (missing citations sort last)
gen double negcit = -citation
bysort authorid year (negcit) : gen rank = _n
* h = number of papers cited at least as often as their rank; ties need no special handling
by authorid year : egen hindex = total(citation >= rank & !missing(citation))
drop negcit rank
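For reference, here is the definition from the top of the post written out in standard notation (the symbols n and c_(i) are introduced here for illustration). Sort one author's papers from a given year by citation count, c_(1) ≥ c_(2) ≥ ... ≥ c_(n). Because c_(i) - i strictly decreases as the rank i grows, the ranks that satisfy c_(i) ≥ i form an unbroken run starting at 1. Taking the largest qualifying rank and counting the qualifying ranks therefore give the same h:

$$
h \;=\; \max\{\, i \in \{1,\dots,n\} : c_{(i)} \ge i \,\} \;=\; \bigl|\{\, i \in \{1,\dots,n\} : c_{(i)} \ge i \,\}\bigr|, \qquad h = 0 \text{ if no rank qualifies.}
$$

For author A in 1990 the sorted citations are 17, 13, 12, 7, 5. Every rank qualifies (the last check is 5 ≥ 5 at i = 5), so h = 5, as in the hindex column. The cumulative h-index applies the same formula to all of an author's papers published in or before the given year.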
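What I'm having a hard time with is calculating the cumulative h-index of each author-year (c_hindex, column 6), that is, the h-index of all papers the author has published up to and including that year. For instance, by the end of 1991 author A has published 9 articles; at least 7 of them have each been cited at least 7 times, but fewer than 8 have been cited at least 8 times, so the cumulative h-index for A in 1991 is 7.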
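One possible approach, offered as a sketch under assumptions rather than a tested answer: loop over the distinct years. For each year y, compute every author's h-index over the papers published in or before y with the same rank-and-count idea, and store the result on that author's year-y rows. The variable names authorid, year, articleid, citation and c_hindex come from the post. The loop, the temporary helper variables and the toy data are illustrative; the toy data are copied from the table for author A (1990-1992) and author B (1990-1991). Missing citation counts are simply left out of the ranking. Each pass re-sorts the whole dataset, so run time grows with the number of distinct years.

```stata
* Sketch (assumption-based): cumulative h-index per author-year
* Toy data copied from the table above (author A 1990-1992, author B 1990-1991)
clear
input str1 authorid year articleid citation
A 1990  1  7
A 1990  2  5
A 1990  3 13
A 1990  4 12
A 1990  5 17
A 1991  6 11
A 1991  7  9
A 1991  8 19
A 1991  9 15
A 1992 10 14
A 1992 11  4
A 1992 12  3
A 1992 13  7
A 1992 14  5
A 1992 15  4
A 1992 16 11
A 1992 17 17
B 1990 31 11
B 1991 32 11
B 1991 33  4
B 1991 34  4
B 1991 35  3
end

* c_hindex for year y = h-index of all of the author's papers published in or before y
generate c_hindex = .
levelsof year, local(years)
tempvar negcit r h
foreach y of local years {
    quietly {
        * Negated citations of papers up to year y (missing for later papers),
        * so an ascending sort puts the most-cited papers first and missings last
        generate double `negcit' = -citation if year <= `y'
        bysort authorid (`negcit'): generate long `r' = _n if !missing(`negcit')
        * h = number of included papers whose citation count is at least their rank
        by authorid: egen long `h' = total(!missing(`negcit') & -`negcit' >= `r')
        replace c_hindex = `h' if year == `y'
        drop `negcit' `r' `h'
    }
}

* Show one row per author-year
sort authorid year articleid
list authorid year c_hindex if authorid != authorid[_n-1] | year != year[_n-1], sepby(authorid)
```

On this toy data the loop gives 5, 7 and 9 for author A in 1990, 1991 and 1992, matching the c_hindex column above, and 1 and 4 for author B in 1990 and 1991. To run it on the full dataset, drop the clear/input/end lines and apply the rest to the loaded data.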
Could anybody help me with the command for the cumulative h-index?
Thank you very much in advance!
Hyeonjin