
What’s the point of the h-index? UPDATED

UPDATE: I’ve increased the sample size of EEB scientists I used in the analysis.

——————————————————-

Over at the Dynamic Ecology blog yesterday, Jeremy Fox posted an interesting analysis of which metrics correlate with the chances of early career researchers in ecology and evolutionary biology (EEB) gaining an interview for an academic post in North America.   Spoiler alert: none of them correlate, except the number of job applications you submit.

These metrics include number of papers published, number of first author papers, number of large (>$100,000) grants held, number of years post-doc, and h-index.  Nada, zilch, nothing, nowt is significantly correlated.  Which is good: as Jeremy (and the stream of commenters) discuss, it means that interview panels are looking roundly at individuals and what they can offer a university department, and not relying on (sometimes dubious) metrics.

Which brings us to the h-index….  Jeremy linked to an old post of mine called “How does a scientist’s h-index change over time?”, a piece that was far and away my most viewed post last year (and second-most viewed post in 2015).  This suggests that there’s still a huge “appetite” for the h-index, in terms of understanding what it is and how it can/should (or cannot/should not) be used.  Even before the Dynamic Ecology post came out I was planning to update that old post and give examples of where I think the h-index might be useful, so this seems like a good time to do that.

Opinions on the h-index vary hugely.  Some of the links in my original post were to writings by scientists who really like the idea of being able to use it to track the academic impact of an individual (or at least some measure of it).  Others despise it, and indeed all academic metrics, as pernicious and potentially dangerous to science – see David Colquhoun’s video on this topic, for instance.

I’m somewhere in the middle – I recognise the weaknesses of the h-index, but I also think that it’s measuring something, even if the something that it’s measuring may not be directly translatable into a measure of “quality” or “impact”, and especially not “employability” or “worthy of promotion” (and I would certainly never countenance using the h-index as the sole measure of the latter two).

So when is the h-index useful?  Well, one use is as a personal tracker of one’s own standing or contribution within a field: assessing the trajectory of a career, and perhaps gauging when it’s time to apply for promotion (at least in the UK system, which is a less transparent process than in North America, or at least that’s my impression).  To illustrate this I’ve collated the h-indexes and years since first publication for 72 EEB scientists using Google Scholar (GS).  I used GS rather than Web of Science (WoS) because, although GS is less conservative, WoS seems to be becoming noticeably less accurate; for example, it has recently assigned to me chapters on which I was not an author but which are included in a book that I co-edited.  Another advantage of GS, of course, is that it’s publicly available and not paywalled.
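For anyone unsure how the number itself is arrived at: a researcher has an h-index of h if h of their papers have each been cited at least h times. The short Python sketch below shows that calculation. The citation counts are invented for illustration and are not taken from Google Scholar or from the data set described here.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Hypothetical citation counts for one researcher's papers (not real data).
example = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example))  # prints 3: three papers have at least 3 citations each
```

Note that a single very highly cited paper (the 25 above) barely moves the result, which is one reason the h-index is a blunt measure of "impact".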

It’s long been known that a scientist’s h-index should increase over their professional life, and indeed that’s what we find if we plot number of years since first publication against an individual’s h-index:

[Figure: h-index plotted against number of years since first publication]

It’s a fairly strong correlation, though with a lot of scatter (something Jeremy noted in his post), and it suggests that EEB scholars increase their h-index at a rate of about 1.6 points per year, on average, though with a big range (0.3 to 4.2 points per year).  One (albeit fanciful*) way to think about this graph is that it’s analogous to a Hertzsprung–Russell (HR) diagram in astronomy, where, as they age, stars shift position predictably on a plot of colour versus magnitude.  In a similar way, as EEB scientists age professionally, their position on this plot moves in ways that may be predictable from their scientific output.
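A rough sketch of how an average rate and a range like these could be obtained is given below. It assumes the average is the slope of an ordinary least squares line of h-index on years since first publication, and that the range comes from dividing each individual's h-index by their years; the post does not say exactly how either figure was computed. The numbers in the example are invented and are not the 72 scientists in the sample.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical (years since first publication, h-index) values, not real data.
years = [5, 10, 15, 20, 30, 40]
h_values = [6, 18, 22, 35, 44, 70]

slope, intercept, r2 = fit_line(years, h_values)
print(f"slope = {slope:.2f} h-index points per year, R^2 = {r2:.2f}")

# Per-person rate: h-index divided by years since first publication.
rates = [h / y for y, h in zip(years, h_values)]
print(f"individual rates range from {min(rates):.2f} to {max(rates):.2f} per year")
```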

There’s a lot of structure in HR diagrams, including the famous Main Sequence, where most stars lie, as well as stellar evolutionary tracks for Giants, Super Giants, White Dwarfs, etc.  In this modest sample I think we’re starting to see similar structure, with individuals lying far above or below the “h-index Main Sequence”, indicating that they are accruing more or fewer citations than might be expected.  UPDATE:  In particular, three individuals are “Super Giants” (to use the astronomical terminology) and lie far above the Main Sequence.  Carlos Herrera makes an interesting point in the comments (below) about self-selection in GS, which could mean that there are far fewer people with low h-indexes represented than we might expect.
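One simple way to make "far above or below the Main Sequence" concrete is to look at each person's residual, that is, their observed h-index minus the value the fitted line predicts for their career length. The sketch below assumes a line with a slope of 1.6 per year (the figure quoted above); the intercept of 0, the cut-off of 20 h-index points and the four data points are all arbitrary choices made for illustration.

```python
def residuals(points, slope, intercept):
    """Return observed minus expected h-index for each (years, h) pair."""
    return [h - (intercept + slope * years) for years, h in points]


def flag_outliers(points, slope, intercept, threshold):
    """Label each point by how its residual compares with +/- threshold."""
    labels = []
    for res in residuals(points, slope, intercept):
        if res > threshold:
            labels.append("above")
        elif res < -threshold:
            labels.append("below")
        else:
            labels.append("main sequence")
    return labels


# Hypothetical (years, h-index) pairs; intercept and threshold are assumptions.
points = [(10, 15), (20, 75), (30, 50), (40, 30)]
print(flag_outliers(points, slope=1.6, intercept=0.0, threshold=20))
# prints ['main sequence', 'above', 'main sequence', 'below']
```

A fixed cut-off is crude, because the scatter grows with career length; scaling the threshold with years since first publication would be one refinement.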

One of the things that could be explored using this type of data is exactly why this is happening: is it a question of where they are based, or their nationality, or where they publish, their sub-field, or what?  One easy analysis to do is to assess whether there is a difference between female and male scientists, as follows:

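[Figure: h-index plotted against number of years since first publication, with separate fitted lines for female and male scientists]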

Previous research has suggested that women on average receive fewer citations for their papers than men (see this 2013 study in Nature for instance) and this graph gives some support to that idea, though I’ve not formally tested the difference between the two lines. What is also interesting is that the R-squared values are identical, indicating as much variation in female as male career trajectories, at least as measured in this way.
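For anyone who wants to test the difference between the two lines formally, one option is a permutation test on the difference in slopes: shuffle the group labels many times and see how often a difference at least as large as the observed one turns up by chance. The sketch below is only one possible approach and is not an analysis that has been run on the data here. The two groups are invented, and the number of permutations and the random seed are arbitrary. Bear in mind that shuffling labels tests whether the groups are interchangeable overall, so a difference in career-length distribution could also influence the result.

```python
import random


def slope(points):
    """Least squares slope of h-index on years for a list of (years, h) pairs.

    Assumes the list contains at least two distinct values of years.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Return (observed slope difference, two-sided permutation p-value)."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = slope(group_a) - slope(group_b)
    pooled = group_a + group_b
    extreme = 0
    for _ in range(n_permutations):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # reassign people to groups at random
        diff = slope(shuffled[:len(group_a)]) - slope(shuffled[len(group_a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical (years, h-index) pairs for two groups, not real data.
group_a = [(5, 8), (10, 17), (20, 33), (30, 50), (40, 66)]
group_b = [(6, 8), (12, 15), (18, 22), (28, 33), (38, 45)]

observed, p_value = permutation_test(group_a, group_b)
print(f"slope difference = {observed:.2f} per year, p = {p_value:.3f}")
```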

UPDATE:  These additional data suggest that the h-indexes of male and female researchers diverge over time, and that most of the difference is for mid to late career scientists.  It’s unclear to me why this might be the case, but we could speculate about factors such as career breaks to have children.  Note that I struggled to find female EEB scientists with an h-index larger than about 80 – if I’ve missed any please let me know.

The data set I used for this analysis is certainly not random and contains a lot of people I know personally or by reputation, so a larger, more systematic analysis could come to some rather different conclusions.  However, I thought this was an interesting starting point, and if anyone else wants to play with the data, you can download the anonymised spreadsheet here.

 

*I’m not at all convinced about this analogy myself and am happy for anyone to explain to me why it’s a very poor one 🙂  UPDATE:  Though Stephen Heard seems to like it.

 

 

 

 

