Tag Archives: Academic publishing

Could LLMs like ChatGPT ever replace part of the academic peer-review process?

Recently, I made a comment over on Steve Heard’s Scientist Sees Squirrel blog:

I have never published a paper that’s not been improved, to some degree, by peer review, and broadly the system works. But I do wonder if it’s sustainable in the long-term and whether in the future LLMs might actually be a more effective way of assessing manuscripts. I recognise that’s (currently) a controversial statement to make – but having recently run a few of my own manuscripts through ChatGPT and asked for its “opinion”, I can honestly say that the feedback has improved not just the writing but also the framing and focus of the work. It’s also picked up weaknesses and errors that I had otherwise missed.

That initiated an email conversation with Steve which resulted in me running a short experiment with ChatGPT model 5.5. I first loaded up the original manuscript of this paper on pollinator effectiveness, exactly as I had submitted it to the journal. I then asked ChatGPT to write a review of the manuscript as though it were a peer reviewer for the journal. Which it did – in some detail – in 28 seconds! If anyone is interested I can send them that review, but it's the next bit that I think is especially interesting.

After ChatGPT had completed the review, I then uploaded the actual peer reviews I’d received from the journal, plus the editor’s comments, and asked it to summarise the degree to which its review agreed with those I had received.
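
For anyone who'd like to repeat the experiment programmatically rather than via the web interface (which is what I actually used), here's a minimal sketch of the two steps using the OpenAI Python SDK. The model name, file names, and prompt wording are all illustrative placeholders of mine, not a record of the real session:

    # A minimal sketch, assuming plain-text copies of the manuscript and of the
    # journal's reviews are saved locally; model and prompts are placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    manuscript = open("manuscript.txt").read()          # hypothetical file
    real_reviews = open("journal_reviews.txt").read()   # hypothetical file

    # Step 1: ask for a peer review of the manuscript.
    review = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You are a peer reviewer for an ecology journal."},
            {"role": "user",
             "content": "Write a detailed peer review of this manuscript:\n\n"
                        + manuscript},
        ],
    ).choices[0].message.content

    # Step 2: ask it to compare its own review with the real ones.
    comparison = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "user",
             "content": "Here is your review:\n\n" + review
                        + "\n\nHere are the journal's actual reviews and the "
                          "editor's comments:\n\n" + real_reviews
                        + "\n\nSummarise the degree to which your review "
                          "agrees with each reviewer and with the editor."},
        ],
    ).choices[0].message.content

    print(comparison)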

Below I have copied the response, but it’s the take-home message, summarised in the graphic above, which I think is the most intriguing aspect: there was a 55% agreement with Reviewer 1, a 70% agreement with Reviewer 2, and a 65% overall agreement between ChatGPT’s assessment of the manuscript and that of the Editor.

That may not sound like a large overlap, but note that the level of agreement between the two human reviewers themselves was only 65%! In my not inconsiderable experience (>130 peer-reviewed papers and chapters, plus hundreds of peer reviews of my own) this is fairly typical – reviewers ALWAYS pick up on different strengths and weaknesses in a manuscript!

What particularly impressed me about ChatGPT's response below is that it went far beyond simply looking at the language used: it provided a critique of how the study was conducted, how the data were analysed, and the conclusions drawn. Which is just what we expect a human reviewer to do! Not only that, but it was more critical of the work than the reviewers and the editor were.

So where does this leave us? Could Large Language Models (LLMs) like ChatGPT replace part of the academic peer-review process? Clearly there need to be guide rails and human agency involved in final decisions, but I think that, yes, they may well have a role to play in the not-too-distant future. As always, I’d be interested in your comments.

If you want to dive into this more deeply, here’s the verbatim account of what ChatGPT wrote (including its emboldened parts) when I asked how much agreement there was between its assessment and those of the other two reviewers and the editor:

There is a fair amount of overlap, but not a perfect match.

I would say my review was moderately similar in substance, but noticeably more critical in tone and in editorial recommendation. The strongest areas of convergence were on the paper’s core strengths and on one key statistical concern. The biggest mismatch is that the actual editor and reviewers were much more positive overall, and most of their requested changes are closer to clarification, presentation, and framing than to a deep challenge to the paper’s publishability.

The clearest points of agreement are these. First, I highlighted the manuscript’s strengths as the multi-year design, the focus on different functional groups, and the interesting role of Lepidoptera; that aligns very closely with the guest editor, who explicitly says that “the roles of lepidopterans, the inclusion of flight distance measures, and the multi-year study are particularly notable,” and with Reviewer 2, who lists the multi-year field study, multiple pollinator groups, and the unexpected importance of Lepidoptera as strengths. Second, I flagged the Poisson treatment of an ordinal flight-distance variable as statistically weak; Reviewer 1 independently makes essentially the same point and explicitly recommends ordinal regression instead. Third, I thought the discussion was somewhat over-extended and not always tightly aligned with the evidence; Reviewer 2 likewise suggests embedding the work better in the broader literature and moving some material out of the conclusion into the discussion.

There is also a partial overlap around the manuscript’s central interpretation. I argued that the paper sometimes pushes too hard from pollen deposition and coarse movement classes to larger claims about true generalisation, outcrossing value, and broader evolutionary or environmental buffering. The editor does not object to those broad claims in the same way; instead, the editor actually asks for the discussion to connect the results more explicitly to the theoretical statement that plants should generalise on equally effective pollinators under dynamic environments, and even invites discussion of anthropogenic disturbance and future interactions. In other words, where I wanted that interpretive arc toned down, the editor mostly wanted it better developed and better supported.

The main differences are quite important.

First, I was more sceptical about inference than the real reviewers were. I was worried that the manuscript moved too quickly from non-significant or coarse flight-distance patterns to statements about butterfly superiority, reduced geitonogamy, and broader adaptive significance. Reviewer 1 only objected narrowly to the modelling choice; neither reviewer makes my broader inferential critique, and the editor seems broadly comfortable with the interpretive direction so long as it is supported with more descriptive data and clearer exposition.

Second, I was more concerned about the coarse pooling of visitor taxa. I thought the manuscript should be more cautious because “butterflies,” “hoverflies,” “bumblebees,” and especially “Others” are broad bins. The actual comments do touch this, but in a softer, more taxonomic way: the editor asks whether there is a species or morphotype list, and Reviewer 2 asks for more detail on the butterfly taxa and justification for putting solitary bees into “Others.” That is related to my concern, but it is narrower and more presentational than the conceptual objection I raised.

Third, I did not anticipate how much of the real feedback would focus on descriptive content and figures. The editor repeatedly asks for more floral biology data, clearer results presentation, more absolute numbers, more behavioural quantification, clearer site description, and better figure design and consistency. Reviewer 2 similarly asks for effect sizes, better figures, units, and clearer display of the underlying data distribution. I mentioned presentational issues only in a general way; the actual reports are much more detailed and concrete here.

Fourth, I did not raise the absence of random effects, whereas Reviewer 2 does. That is probably the single most important substantive point that I missed. Reviewer 2 specifically asks why insect individual and/or observation day were not modelled as random effects.

Fifth, my editorial recommendation was harsher. I recommended major revision. The actual reports read much more like minor-to-moderate revision. The editor calls it “an excellent piece of work” and says it is well executed with robust methodology; Reviewer 2 says they have “little to criticise,” mostly editorial suggestions plus a few statistical and figure issues; Reviewer 1 has only a short list of points. That collective tone is clearly more encouraging than mine.

So, in condensed form:

  • Where I matched well: multi-year dataset as a major strength; Lepidoptera result as a notable contribution; the Poisson-for-ordinal issue; some concern that discussion/framing needed tightening.
  • Where I was too severe: I treated the interpretive overreach as a major obstacle, whereas the real reviewers mostly saw a strong paper needing clarification and enrichment rather than retrenchment.
  • What I underplayed or missed: requests for more descriptive natural history and floral biology data, much more detailed figure/presentation criticism, and the random-effects issue.

My bottom-line judgement is that my review was directionally right on some key scientific points, especially the statistical issue and the general strengths, but too pessimistic about the paper’s editorial position. If I had seen these reports first, I would have shifted from “major revision because the evidence-conclusion fit is shaky” to something more like “moderate revision: good paper, but it needs clearer descriptive support, better figure presentation, and tighter statistical justification. If revised along those lines, I think the manuscript could become a credible and interesting contribution”.

Here is a concise version of the editorial decision in journal-review language:

This manuscript addresses an interesting and worthwhile question using a useful multi-year dataset and a stronger-than-usual measure of pollinator performance. However, the current version over-interprets some of the results, particularly those concerning movement behaviour, outcrossing implications, and the broader adaptive significance of generalised pollination. The statistical treatment of flight distance is also not fully convincing. I therefore recommend major revision. The manuscript has clear potential, but its conclusions need to be more tightly aligned with what the data actually demonstrate.

The global relationship between flowering plant and pollinator diversity…and what they don’t tell you about posting preprints!

Last week I posted a preprint on the platform Research Square of a new manuscript entitled “The global relationship between flowering plant and pollinator diversity holds true across scales, latitude, and human influence” – follow that link to access a copy. The study is a collaboration with more than thirty colleagues and it develops some ideas that have been chugging around in my head for a number of years. It’s been reviewed and we’re at the stage of undertaking the revisions. I’m very excited to see it out in one form or another!

As far as I can recall, this is the first time that I've been the lead author on a study posted as a preprint, and I was not prepared for what happened after it went live on 2nd March!

Since then I’ve received over 30 invitations from journals to submit the paper for publication. Obviously, most (all?) of these are automated, because the majority are for journals that are in no way suitable, e.g., Insights of Herbal Medicine, Biomedical Science and Clinical Research, and my particular favourite, the Journal of Surgery Care!

I expected one or two spammy invitations like this, but not so quickly: the preprint went live at about 07:00 and the first request was received less than two hours later. Even now they are coming in at a rate of two a day.

It’s fairly clear that preprint servers are now being automatically mined by journal marketing algorithms. Within hours of a manuscript appearing online, the title, keywords, and author details are harvested and fed into bulk invitation systems. Can legitimate preprint publishers like Research Square not do anything about it?

Each email consumes energy getting from a server to my inbox, so as well as being irritating it's a waste of resources. Presumably this strategy occasionally works for these predatory publishers with naive authors, otherwise they wouldn't bother doing it. I'm almost (almost!) tempted to respond to one of these invitations and see what happens. But life's too short.

Preprints are meant to accelerate open science and transparent peer review. Ironically, the same openness also makes it trivial for automated systems to harvest new manuscripts and generate waves of journal solicitations. None of this detracts from the value of preprints—they are a powerful way to share research quickly and openly—but it’s a reminder that openness in science inevitably attracts a few opportunists as well.

Anyway, if you’re planning to submit a preprint, don’t say that you weren’t warned – you may discover that a remarkable number of journals are suddenly desperate to publish your “valuable manuscript”.

Do reference management systems encourage sloppy referencing practices?

Over at the Dynamic Ecology blog there’s an interesting discussion going on about “how to keep up with the literature” that’s relevant to all fields, not just ecology.  Spoiler alert: it’s impossible to “keep up” if “keep up” means “read everything”.  But do check it out as there’s lots of good advice in that post.

One of the topics that's arisen in the comments is the use of reference management systems such as Endnote, Refworks, Zotero, Mendeley, etc. Everyone has their own preferences as to which to use, and there seem to be advantages and disadvantages to all of them.  However, a minority of us (so it seems) don't use any kind of reference management system, which strikes those who do as very odd.  Personally, I tried Endnote a long time ago; it was OK, but then I lost the database when an old computer bit the dust.

I’m not sure how much more efficient/effective I would be as a publishing academic if I was to get back into using a reference management system. One of the supposed advantages of these systems, that they will format references to the specific requirement of a particular journal, seems to me to be a double-edged sword.  I actually find re-formatting references quite relaxing and I think (though I may be wrong) that it develops attention-to-detail and accuracy skills that are useful in other contexts.

Also I suspect, but have no proof, that reference management software is responsible for perpetuating errors in the reference lists of papers that then result in mis-citations on Web of Knowledge, etc.  My suspicion is that this has got worse over time as people rely more and more on reference management software rather than their brains.  These citation errors can have an impact on an individual’s h-index, as I mentioned in a post last year.

By coincidence, yesterday I spotted a hilarious example of just this kind of mis-citation, one that I think can be blamed on a reference management system. This paper of mine:

Ollerton, J., Cranmer, L. (2002) xxxxxxx Oikos xxxxxx

was rendered in the reference list of another paper as:

Ollerton, J., Cranmer, L., Northampton, U.C., Campus, P. (2002) xxxxxxx Oikos xxxxxx

The last two “authors” are actually from the institutional address – University College Northampton, Park Campus! [UCN is the old name for University of Northampton].
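
It's easy to see how this could happen mechanically. Here's a toy reconstruction (my guess at the failure mode, not any real package's code) of how a parser that expects comma-separated names could turn an appended affiliation into extra "authors":

    # A toy sketch of the suspected failure mode: the author field has the
    # institutional address fused onto it, and each comma-separated chunk is
    # then forced into "Surname, Initials" form. The input string is
    # illustrative, not the actual record.
    def to_surname_initials(name: str) -> str:
        """Treat the last word as the surname and abbreviate the rest."""
        *given, surname = name.split()
        initials = "".join(word[0].upper() + "." for word in given)
        return f"{surname}, {initials}" if given else surname

    raw = "J. Ollerton, L. Cranmer, University College Northampton, Park Campus"
    print([to_surname_initials(chunk.strip()) for chunk in raw.split(",")])
    # ['Ollerton, J.', 'Cranmer, L.', 'Northampton, U.C.', 'Campus, P.']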

Now in theory that shouldn't happen if an author's reference management software is doing its job properly and the information has been correctly input, but it does happen: errors are not uncommon.  In addition, it seems to me that authors often don't check their reference lists after the software has produced them. That's sloppy scholarship, but I can understand why it happens: people are busy, and why bother if the software is (in theory) getting it right every time?  It also shouldn't happen at the editorial production end of things, because references are usually cross-checked for accuracy, but again it does, even at top-end journals (in this case one from the Royal Society's stable!)

Again it’s anecdotal but I’m also noticing that reference lists in PhD theses that I examine are getting sloppier, with species names not in italics, various combinations of Capitalised Names of Articles, unabbreviated and abbrev. journal names, etc. etc.

Does any of this really matter?  Isn’t it just pedantry on my part?  Whilst the last statement is undoubtedly true, I think it does matter, because attention to detail at this very basic level gives the reader more confidence that attention has been paid at higher levels, such as citing accurate statistics from primary sources to back up statements, rather than relying on secondary sources, as Andrew Gelman discussed in an old blog post on referencing errors.

But maybe I’m a lone voice here, I’d be interested in your thoughts.

Elsevier successfully patents a common peer review process

As reported yesterday on Mike Taylor's Sauropod Vertebra Picture of the Week blog, which in turn picked up the story from the sec.uno site, at the end of August the giant publisher Elsevier successfully patented what they see as a unique form of peer review: waterfall peer review (or cascading peer review, as it's long been known). This is described as "the transfer of submitted articles from one journal to another journal" owned by the same publisher.  And there's nothing new about it; it's been accepted practice at a number of publishers for years now.

If you want to look at the original U.S. patent, here’s a link to it.

I don’t often re-work the content of others’ blogs, but his is exceptional: the motivation for Elsevier’s actions seem dubious at best and it’s worth clicking through and reading those pieces in detail.  What is Elsevier thinking?

The timing of this one story is also interesting.  It’s as if the Gods of Publishing had actually read my last post about peer-reviewed versus non-peer-reviewed publishing, and decided to have some fun with us mere mortals…..

How many non-peer-reviewed publications should a scientist produce?

Peer-reviewed writing moves science forwards; non-peer-reviewed writing moves science sideways.  

That’s my publication philosophy in one sentence.  In other words, when scientists write research papers and book chapters that are peer-reviewed, the underlying rationale is that we are adding to the sum total of human knowledge, providing insights into a topic, and moving a field forwards. When we write non-peer-reviewed articles we are generally writing about science for a broader audience, with little original content (though perhaps with some original ideas).  This moves concepts out of a narrow subject area and into the purview of wider society, which can be other scientists in different fields, or government agencies or policy makers, or the general public.

There can be exceptions to the rule, such as the IPBES pollinators and pollination report that I’ve been discussing this year. The report was widely peer-reviewed but is intended for a much broader audience than just scientists.  Conversely, non-peer-reviewed critiques and responses to published papers can clarify specific issues or challenge findings, which will certainly move science forward (or backwards into muddier waters, depending on how you view it).  However, in general, the principle stated above holds true.

This raises the (admittedly clunky) question I’ve posed in the title of this post: just how much non-peer-reviewed publication should a scientist who is an active researcher actually do?  How much time should they spend writing for that wider audience?

It’s a question that I’ve given some thought to over the 30 years1 that I’ve been writing and publishing articles and papers.  But a couple of posts on other blogs during the past week have crystalised these thoughts and inspired this post.  The first was Meghan Duffy’s piece on Formatting a CV for a faculty job application over at the Dynamic Ecology blog. There was some discussion about how to present different types of publications in the publication list, and notions of “sorting the wheat from the chaff” in that list, which seemed to refer to peer-reviewed versus non-peer-reviewed publications.

One of the problems that I and others see is that the distinction is not so clear cut, and it's possible to publish non-peer-reviewed articles in peer-reviewed journals.  For example, the "commentary" and "news and views" type pieces in Nature, Science, Current Biology, and other journals are generally not peer reviewed.  But I'd certainly not consider these to be "chaff".  To reiterate my comment on Meghan's post, all scientific communication is important.  As I've discussed in a few places on my blog (see here for example), and as plenty of others have also noted, scientists must write across a range of published formats if they are going to communicate their ideas effectively to a wider audience than just the scientists who are specifically interested in their topic.

Peer-reviewed publication is seen as the gold standard of science communication and it is clearly important (though historically it's a relatively recent invention: scientific publications were not peer reviewed for most of the history of science).  So why, you may be asking, would scientists want to write for that wider audience?  One reason is the "Impact Agenda" on which, in Britain at least, there's been a huge focus from the Research Excellence Framework (REF) and the Research Councils. Grant-awarding bodies and university recruitment panels want to see that scientists are actively promoting their work beyond academia. That can be done in different ways (including blogging!) but articles in "popular" magazines certainly count.  I should stress, though, that this wider, societal impact (as opposed to academic impact, e.g. measures such as the h-index) is not achieved simply by publishing popular articles, or blogging, or tweeting. Those activities can be part of the strategy towards impact but are not in themselves impactful – the REF would describe them as "Reach" [2].

The second recent blog post that relates to the question of peer-reviewed versus non-peer-reviewed publications is Steve Heard's piece at Scientist Sees Squirrel on why he thinks it's still important to consider journal titles when deciding what to read.  He makes some important points about how the place of publication says a lot about the type of paper one can expect, based just on the journal's title.  But the focus of Steve's post is purely on peer-reviewed journals, and (as I said above) it's possible to publish non-peer-reviewed articles in those.  It's also worth noting that there are many opportunities for scientists to publish articles of real value in non-peer-reviewed journals.  Deciding whether or not to do so, however, is a very personal decision.

Of the 96 publications on my publication list, 65 are peer-reviewed and 31 are not, which is a 68% rate of publishing peer-reviewed papers and book chapters.  Some of the peer-reviewed papers are fairly lightweight and made no real (academic) impact following publication, while (conversely) some of the non-peer-reviewed articles have had much more influence. The non-peer-reviewed element includes those commentary-type pieces for Nature and Science that I mentioned, as well as book reviews, articles in specialist popular magazines such as New Scientist, Asklepios and The Plantsman, pieces for local and industry newsletters, and a couple of contributions to the literary journal Dark Mountain that combine essay with poetry.  This is probably a more diverse mix than most scientists produce, but I'm proud of all of them and stand by them.

So back to my original question: is 68% a low rate of peer-reviewed publication?  Or reasonable?  I’m sure there are scientists out there with a 100% rate, who only ever publish peer-reviewed outputs.  Why is that?  Do they really attach no importance to non-peer-reviewed publications? I have no specific answer to the question in the title, but I’d be really interested in the comments of other scientists (and non-scientists) on this question.


[1] I had to double check that, because it seems inconceivable, but yes, it's 30 years this year. Gulp.

[2] Impact is how society changes as a result of the research undertaken.  So, for ecologists, it could be how their research has been translated into active, on-the-ground changes (e.g. to the management of nature reserves, or of rare or exploited species), or how it's been picked up by national and international policy documents and then influenced policies on specific issues (invasive species, pollinator conservation, etc.).

Journal of Pollination Ecology – new volume announced


The latest volume of the international, peer-reviewed  Journal of Pollination Ecology, of which I’m an editor, has just been published.  All papers are free to download – here’s a link.

Unlike most open access journals, it charges authors no page fees, so if you are a researcher working in pollination ecology, please consider submitting a manuscript.