What’s the point of the h-index? UPDATED

UPDATE: I’ve increased the sample size of EEB scientists I used in the analysis.

——————————————————-

Over at the Dynamic Ecology blog yesterday, Jeremy Fox posted an interesting analysis of which metrics correlate with the chances of early career researchers in ecology and evolutionary biology (EEB) gaining an interview for an academic post in North America. Spoiler alert: none of them correlate, except the number of job applications you submit.

These metrics include number of papers published, number of first author papers, number of large (>$100,000) grants held, number of years post-doc, and h-index. Nada, zilch, nothing, nowt is significantly correlated. Which is good: as Jeremy (and the stream of commenters) discuss, it means that interview panels are looking roundly at individuals and what they can offer a university department, and not relying on (sometimes dubious) metrics.

Which brings us to the h-index…. Jeremy linked to an old post of mine called “How does a scientist’s h-index change over time?”, a piece that was far and away my most viewed post last year (and second-most viewed post in 2015). This suggests that there’s still a huge “appetite” for the h-index, in terms of understanding what it is and how it can/should (or cannot/should not) be used. Even before the Dynamic Ecology post came out I was planning to update it and give examples where I think it might be useful, so this seems like a good time to do that.

Opinions on the h-index vary hugely. Some of the links in my original post were to writings by scientists who really like the idea of being able to use it to track the academic impact of an individual (or at least some measure of it). Others despise it, and indeed all academic metrics, as pernicious and potentially dangerous to science – see David Colquhoun’s video on this topic, for instance.

I’m somewhere in the middle – I recognise the weaknesses of the h-index, but I also think that it’s measuring something, even if the something that it’s measuring may not be directly translatable into a measure of “quality” or “impact”, and especially not “employability” or “worthy of promotion” (and I would certainly never countenance using the h-index as the sole measure of the latter two).

So when is the h-index useful? Well, one use is as a personal tracker of one’s own standing or contribution within a field, assessing the trajectory of a career, and perhaps gauging when it’s time to apply for promotion (at least in the UK system, which is a less transparent process than in North America, or at least that’s my impression). To illustrate this I’ve collated the h-indexes and years since first publication for 72 EEB scientists using Google Scholar (GS). I used GS rather than Web of Science (WoS) as, although GS is less conservative, WoS seems to be becoming noticeably less accurate; for example it’s recently assigned to me chapters on which I was not an author but which are included in a book that I co-edited. Another advantage of GS, of course, is that it’s publicly available and not paywalled.
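
For anyone who has not met the metric before: the h-index is the largest number h such that h of a researcher’s papers have each been cited at least h times. Here is a minimal Python sketch of that definition. The function name and the citation counts are made up for illustration and are not drawn from the data set used in this post.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one researcher's papers (illustrative only)
example = [48, 33, 30, 12, 7, 6, 5, 2, 1, 0]
print(h_index(example))
```

One consequence of the definition is that a single very highly cited paper can add at most one to the h-index, however many citations it attracts.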

It’s long been known that a scientist’s h-index should increase over their professional lives, and indeed that’s what we find if we plot number of years since first publication against an individual’s h-index:

[Figure: h-index plotted against number of years since first publication]

It’s a fairly strong correlation, though with a lot of scatter (something Jeremy noted in his blog) and it suggests that EEB scholars accrue their h-index at a rate of about 1.6 papers per year, on average, though with a big range (0.3 to 4.2 papers per year). One (albeit fanciful*) way to think about this graph is that it’s analogous to a Hertzsprung–Russell (HR) diagram in astronomy, where, as they age, stars shift position predictably on a plot of colour versus magnitude. In a similar way, as EEB scientists age professionally, their position on this plot moves in ways that may be predictable from their scientific output.
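
A rough sketch of how numbers like these can be derived, assuming the average rate is the slope of an ordinary least-squares line of h-index on years since first publication, and the range comes from dividing each person’s h-index by their own career length. Both are assumptions about the calculation rather than a description of what was actually done, and the records below are invented rather than taken from the spreadsheet.

```python
def ols_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented (years since first publication, h-index) pairs, illustrative only
records = [(5, 6), (10, 18), (15, 20), (20, 41), (30, 45), (40, 70)]
years = [yrs for yrs, _ in records]
h_values = [h for _, h in records]

slope, intercept = ols_fit(years, h_values)
rates = [h / yrs for yrs, h in records]  # each person's own h-index points per year

print(f"fitted slope: {slope:.2f} h-index points per year")
print(f"individual rates: {min(rates):.2f} to {max(rates):.2f} per year")
```

The fitted slope describes the sample as a whole, while the individual rates show how far single careers sit from that average.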

There’s a lot of structure in HR diagrams, including the famous Main Sequence, where most stars lie, as well as stellar evolutionary tracks for Giants, Super Giants, White Dwarfs, etc. In this modest sample I think we’re starting to see similar structure, with individuals lying far above or below the “h-index Main Sequence”, indicating that they are accruing greater or fewer citations than might be expected. UPDATE: In particular, there are three individuals who are “Super Giants” (to use the astronomical terminology) and lie far above the Main Sequence. Carlos Herrera makes an interesting point in the comments (below) about self-selection in GS which could mean that there are far fewer people with low h-indexes represented than we might expect.

One of the things that could be explored using this type of data is exactly why it is that this is happening: is it a question of where they are based, or their nationality, or where they publish, their sub-field, or what? One easy analysis to do is to assess whether there is a difference between female and male scientists, as follows:

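[Figure: h-index plotted against number of years since first publication, with separate fitted lines for female and male scientists]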

Previous research has suggested that women on average receive fewer citations for their papers than men (see this 2013 study in Nature for instance) and this graph gives some support to that idea, though I’ve not formally tested the difference between the two lines. What is also interesting is that the R-squared values are identical, indicating as much variation in female as male career trajectories, at least as measured in this way.
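
As a sketch of the kind of comparison shown in the graph, the snippet below fits a separate straight line for each group and reports its R-squared. The rows are invented and are not the real data set, and comparing two R-squared values by eye is not a formal test of whether the lines differ.

```python
def fit_with_r2(xs, ys):
    """Least-squares slope, intercept and R-squared for y against x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented (years since first publication, h-index, group) rows
rows = [
    (5, 7, "F"), (12, 18, "F"), (20, 27, "F"), (30, 40, "F"),
    (6, 9, "M"), (14, 24, "M"), (22, 40, "M"), (33, 58, "M"),
]

results = {}
for group in ("F", "M"):
    xs = [yrs for yrs, _, g in rows if g == group]
    ys = [h for _, h, g in rows if g == group]
    results[group] = fit_with_r2(xs, ys)
    slope, intercept, r2 = results[group]
    print(f"{group}: slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.3f}")
```

Fitting the groups separately keeps the sketch short; a single model with a years-by-group interaction term would be the more usual way to test whether the slopes differ.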

UPDATE: These additional data suggest that the h-indexes of male and female researchers diverge over time, and that most of the difference is for mid to late career scientists. It’s unclear to me why this might be the case, but we could speculate about factors such as career breaks to have children. Note that I struggled to find female EEB scientists with an h-index larger than about 80 – if I’ve missed any please let me know.

The data set I used for this analysis is certainly not random and contains a lot of people I know personally or by reputation, so a larger, more systematic analysis could come to some rather different conclusions. However I thought this was an interesting starting point and if anyone else wants to play with the data, you can download the anonymised spreadsheet here.

*I’m not at all convinced about this analogy myself and am happy for anyone to explain to me why it’s a very poor one 🙂 UPDATE: Though Stephen Heard seems to like it.

25 thoughts on “What’s the point of the h-index? UPDATED”

Carlos Herrera January 5, 2017 at 3:55 pm

Just coming to my mind. In contrast to WoS, where data for _all_ publishing scientists are shown without any prior screening, in Google Scholar it is the scientists themselves who decide whether their profiles are public or not (i.e., available for searches and analyses like yours). I have reasons to suspect that public GS profiles are a strongly biased sample, with scientists under the average tending to be underrepresented. I’m not sure whether, or how much, such a bias could influence your analysis, but I speculate that were all publishing scientists included in GS we would perhaps get a triangular scatterplot, with the range of h-index broadening with increasing career length.

1. jeffollerton (Post author) January 5, 2017 at 4:07 pm
  
  Great point, Carlos, and you may well be right. It would be interesting to repeat this with WoS if there was an easy way to get clean data.
  
Amanda January 5, 2017 at 5:15 pm

You could use the INSPIRE-HEP metrics – there the h-index is pretty accurately represented, the data is easy to draw directly from the page and there is no self selection. You are limited to physics though. http://inspirehep.net

1. jeffollerton (Post author) January 5, 2017 at 5:28 pm
  
  Thanks, I’ll take a look, would be an interesting comparison.
  
drmmu January 5, 2017 at 10:24 pm

Besides providing personal h-index, GS uses h5-index as a metric to rank conferences and journals [1]. It’s a very handy tool to check 1) the top outlets in a subject area (or any sub-category of it), and 2) whether an unknown outlet is reputable. For instance, I’d be looking at this list [2] when I think of publishing some research on computer networks. Apparently, this only works well for established venues and there is probably no significant correlation between a personal h-index and the h-index of the conferences/journals he/she publishes in regularly…
[1] https://scholar.google.co.uk/citations?view_op=top_venues&hl=en
[2] https://scholar.google.co.uk/citations?view_op=top_venues&hl=en&vq=eng_computernetworkswirelesscommunication

1. jeffollerton (Post author) January 6, 2017 at 7:56 am
  
  Thanks Mu, agreed, that can be useful especially for early career researchers who may not know the journals. However it’s worth looking at David Colquhoun’s video for a different perspective – he’s very sceptical (statistically) of all of these metrics.
  
CuriousGeorge January 6, 2017 at 12:20 pm

You need to log-transform your response variable or do a weighted regression. The variance of h-index is clearly mean dependent.

1. jeffollerton (Post author) January 6, 2017 at 12:53 pm
  
  Yes, you’re right, if I was going to do a formal analysis I would go into a lot more depth. This was really for illustrative purposes. But I also think that it’s interesting that the variance increases over time, it may say something about how people’s careers diverge.
  
  1. CuriousGeorge January 6, 2017 at 1:03 pm
    
    Right, but people are going to bandy about your graph and analysis, when you admittedly haven’t gone through the effort of doing it correctly.
    
    Stuff that has societal consequences (e.g., effect of sex on career status) shouldn’t just be analysed on a lark without rigour… In other words, the graph is misinformative, so why produce it at all.
    
    I don’t understand why blogs (which are more public than the formal literature) receive less rigour than the literature itself.
  2. jeffollerton (Post author) January 6, 2017 at 1:24 pm
    
    Hmmm, that’s an interesting perspective, and you have a point. But to me, blogs are (in part) about throwing ideas around, ideas that may never appear in the formal literature because they are too fragmentary or provisional to get through peer review. What I was trying to do with this post was to suggest ways in which the h-index might be a useful metric for understanding some aspects of scientific career development. People can take from that what they wish. I merely pointed out that the raw data, when plotted, seem to support published studies on the topic. I deliberately made the data available so that if others had the time and inclination they could take this further. One final point: whether the h-index (or indeed citations generally) correlates with career status, and is therefore affected by sex, is moot.
  3. Mike Fowler January 6, 2017 at 3:34 pm
    
    Interesting stuff, Jeff!
    
    Running a poisson glm on your linked dataset (which is arguably more appropriate than the linear model, certainly related to CuriousGeorge’s suggested approach) shows a difference in intercept by gender (M’s start 0.219 log-transformed h-index units higher than F’s at 0 years since publishing), but no ‘significant’ difference in slopes (interaction term 0.0009 ± 0.0033 SE; p = 0.79).
    
    Adding in year of first publication would also be interesting to extend the analysis, but I don’t particularly enjoy interpreting 3-way interactions…
    
    The poisson is not a particularly well-fitting model (quasipoisson is no better, Neg bin might solve that, but you get what you pay for round here), but I’m more in tune with the “quick and dirty” joy of blogs – while we have a responsibility as scientists to communicate beyond our immediate sphere, it’s still up to journalists to fact check before they publish. Blogs are not always reliable sources of scientific output.
    
    [Apologies in advance for any crappy formatting, summary output from R: glm(hindex ~ years*gender, family = poisson)]
    
    Coefficients:
                   Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)    2.609715    0.085506   30.521    <2e-16 ***
    years          0.036112    0.002797   12.912    <2e-16 ***
    genderM        0.218630    0.106381    2.055    0.0399 *
    years:genderM  0.000868    0.003262    0.266    0.7902
    
    Null deviance: 1338.13 on 71 degrees of freedom
    Residual deviance: 528.11 on 68 degrees of freedom
    AIC: 930.53
  4. jeffollerton (Post author) January 6, 2017 at 3:54 pm
    
    Thanks Mike, you’ve beaten us to it! I’m sitting with a colleague and we are trying a couple of different approaches. I was running an ANCOVA in SPSS and Robin is playing with poisson glm in R, and we are debating whether he should be forcing the intercept to be 0…. What I ought to be doing is preparing for a seminar at 0900 on Monday!
  5. Mike Fowler January 6, 2017 at 4:02 pm
    
    Yeah – good point. I think forcing through 0 is definitely maybe a sound idea. Doesn’t seem to change my above interpretation though.
    
    (I’m supposed to be marking a Lit Review, therefore playing with data is a welcome distraction…)
  6. jeffollerton (Post author) January 6, 2017 at 4:11 pm
    
    Yes, playing with data is always more fun than marking!
  7. kwekings January 7, 2017 at 1:17 am
    
    Logging in a linear model, or using Poisson (as Mike Fowler did) or negative binomial error structures (see code below), not only improves the heteroscedasticity, but also changes the interpretation of an “expected” rate of change of h-index with years. All three methods suggest that h-index should accelerate with time: a researcher 10 years after the first publication is expected to increase at 1.5 points a year, while a researcher 30 years after first publication is expected to increase at 3.5 points a year. Which is expected because an established professor/academic silverback has a deep network of collaborators and former students (and “grand-students”) and should be more productive in terms of paper counts. Because of an improved fit, the variance is much narrower: 1.3-1.5 for 10 years, 2.3-5.3 for 30 years.
    
    Because negative binomial accommodates increasing variance with time, the gender effect appears to be no longer statistically significant for this dataset.
    
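    library(MASS)  # provides glm.nb() for negative binomial GLMs
    
    # I just copied the table to the clipboard; reading "clipboard" is
    # platform-dependent (on macOS, pipe("pbpaste") can be used instead)
    hdat <- read.csv("clipboard", header = TRUE, sep = "\t")
    names(hdat) <- c("years", "h", "gender")
    
    # Negative binomial GLM with a years-by-gender interaction
    h.nbmod <- glm.nb(h ~ years * gender, data = hdat)
    summary(h.nbmod)
    
    # Prediction grids spanning the observed career lengths of each gender
    years.M.new <- with(hdat, seq(min(years[gender == "M"]), max(years[gender == "M"]), length.out = 1000))
    years.F.new <- with(hdat, seq(min(years[gender == "F"]), max(years[gender == "F"]), length.out = 1000))
    
    # Predicted h-index on the response (not log) scale
    h.M.nbpred <- predict(h.nbmod, newdata = data.frame(years = years.M.new, gender = "M"), type = "response")
    h.F.nbpred <- predict(h.nbmod, newdata = data.frame(years = years.F.new, gender = "F"), type = "response")
    
    # Filled circles and thick line: men; open circles and thin line: women
    plot(h ~ years, data = hdat, pch = ifelse(gender == "M", 16, 1), cex = 1.5, xlim = c(0, max(years)))
    lines(h.M.nbpred ~ years.M.new, lwd = 2)
    lines(h.F.nbpred ~ years.F.new, lwd = 1)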
  8. jeffollerton (Post author) January 7, 2017 at 11:19 am
    
    Thanks for this; will be interesting to hear what others say about whether there is or is not a gender effect. It seems that different statistical approaches give a different outcome; on Twitter Chris Harrod has produced an ANCOVA suggesting that there is a difference.
Picabu January 6, 2017 at 4:23 pm

For non-PIs, h-index seems to measure in part the ability to insert oneself onto a project. Be it through collaboration, currying favor with the PI, sharing reagents, etc. There’s no difference between a paper where you’re 1st author and one where you’re 6th out of 12 authors. This tends to reward people who jump around from project to project, help out with a bit here and there on somebody else’s main project, etc. It punishes those who dedicate their full energy and attention to one major project at a time.

1. jeffollerton (Post author) January 6, 2017 at 4:44 pm
  
  Thanks for the comment. Yes, I wouldn’t disagree with the first part of what you’re saying, but then there are different ways of “being a scientist” and some people are naturally better collaborators or have particular skill sets that are in demand. That can certainly lead to more papers, you’re right, though they won’t necessarily be more cited.
  
  The other observation I’d make is that “punishes” implies that there is some kind of penalty for having a lower h-index than “expected” for a given professional age, and I don’t think that there’s any evidence to support that (or is there?) In fact Jeremy’s analysis on Dynamic Ecology suggests the opposite, that the h-index (by itself) is not looked upon as a serious measure of an individual’s “quality”.
  
Meghan Duffy January 8, 2017 at 12:46 am

The highest GS h-index I found for a woman ecologist in my quick search just now was 73. There’s no GS page for Margaret Davis, though. I wonder how high hers is?

Only the most senior women would stand a reasonable chance of having an h-index that high; the very serious, blatant sexism women faced 40 years ago not only influenced the number of women who’ve been publishing for that long, but also how much they published (and, therefore, their h-index). This newspaper article describes Margaret Davis’s battles against sexism here at Michigan: http://oldnews.aadl.org/taxonomy/term/2371

(Just some musings I had on the post!)

1. jeffollerton (Post author) January 8, 2017 at 8:47 am
  
  Thanks for that Meghan. You’ve nailed a couple of issues there, including the fact that if someone doesn’t decide to “own” their GS profile, then the data are not available. They can be got from WoS, of course but that takes a lot of work to clean up the citations. And GS citations are always higher than WoS, so not directly comparable.
  
  The highest h-index for a woman ecologist that I found was Georgina Mace at 84.
  
Carlo Mealli December 30, 2017 at 8:26 am

In my country (Italy) too, the h-index is increasingly referred to in academic competitions. I feel particularly uneasy about it as a possible source of injustice if it is used alone. Let’s take two candidates with the same h-index of 30. They should be at least closely comparable, but in reality their scientific output can be hugely different. Let’s assume that one reaches h = 30 with the minimum possible number of citations, 900 (30 papers, each cited 30 times), while the other has a total of 9000 citations across his/her top 30 papers (with only 30 for the least cited of them). In this case the second candidate would have an average of 300 citations per paper, against 30 for the first. The difference plainly reflects value and productivity, despite the identical h-index. In conclusion, the h-index should never be used alone but should be accompanied by the average number of citations of the papers contributing to the h-index itself. The latter measure discriminates better between candidates, even where their h-indexes differ significantly. Nobody can deny the outstanding relevance of a scientist with an h-index of 5 but a total of 100000 citations!

1. jeffollerton (Post author) December 30, 2017 at 12:38 pm
  
  Thanks for your comment Carlo. Yes, I agree, the h-index used on its own does not reflect the full story. This is true of all measures of academic impact, including number of papers published: writing 50 papers as first author is not the same as being co-author on 50 papers. Best wishes for the New Year!
  
Pingback: Which h index should I use? | Jeff Ollerton's Biodiversity Blog
Susan Letcher December 2, 2021 at 3:27 pm

Realizing I’m a bit late to the party… but Robin Chazdon has an h-index of 88.

1. jeffollerton (Post author) December 3, 2021 at 1:59 pm
  
  Thanks Susan – though the post is 5 years old, so I wonder if Robin’s h-index was >80 then? Maybe I should update this post at some point 🙂
  