How reliable is ChatGPT? A weird encounter with Erasmus Darwin

There’s currently a lot of interest in ChatGPT and over the past week, like many others, I have been playing around with this artificial intelligence (AI) system and testing its knowledge and creative powers. The latter are really very impressive: I asked it to write a couple of limericks about Karin’s and my professions, and about our move to Denmark, and it produced a decent stab at those.

I also probed its knowledge about topics with which I’m familiar, mainly pollinators and pollination of course, and it produced some good answers. When I highlighted some errors it had made, ChatGPT went back and revised its answers, and politely thanked me in the process. If I were a student I would not rely on it to write accurate essays, but so far so good(ish).

This morning, however, I had a very odd experience. I was trying to track down something attributed to Erasmus Darwin (Charles’s grandfather) and I asked ChatGPT: “What did Erasmus Darwin say about birds visiting flowers?”

This is the answer that came back:

Erasmus Darwin was a naturalist, physician, and poet who wrote about a wide range of topics, including birds and flowers. In his 1794 poem “The Loves of the Plants,” he describes how different species of birds visit flowers in search of nectar:

“Birds of each hue, from the gay parrot’s beak
To the goldfinch, painted in tiny streak,
Sip nectar from the cups that bloom and bow
With every breeze, and nod and dance below.”

Now, the first part of this, about Erasmus’s life, is true. It’s also true that he wrote a poem with that title, though it was published in 1791, not 1794.

But, when I double-checked those attributed lines against the published version of the poem that’s on Project Gutenberg, I discovered that they did not exist. ChatGPT had made them up!

As I said, this AI is pretty creative and will certainly write a poem for you on any topic you care to request. But I didn’t ask it for that; I made a factual query. And ChatGPT had simply invented some poetry and presented it as fact, unless of course those really are Erasmus’s lines and I’ve missed their source (please correct me if I have).

Now as a scientist and writer I try to be rigorous when quoting and citing sources, and if I hadn’t double-checked this I might have taken it at face value. And one could argue that my initial question – “What did Erasmus Darwin say…?” – was a little ambiguous. But nonetheless, I was not expecting an AI to try to pass off something it had created as fact.

Has anyone else experienced anything like this when using ChatGPT? It’s a fun, creative tool. But as far as I can tell, it also has a very human ability to lie.

22 thoughts on “How reliable is ChatGPT? A weird encounter with Erasmus Darwin”

  1. Robin Heinen

    I’ve seen a couple of these pretty scarily good AI creations pass by on Twitter. Honestly, it is better writing than that of most people… If it is this good now, in five years we don’t need to write papers anymore, although the fact vs. fiction will be an important problem to solve. The future will be interesting in this regard.

  2. Duncan Waldron

    Someone on Twitter (I forget who) found that it invented some citations to non-existent papers, in an otherwise plausible response to a prompt. There are many things to be concerned about, with this technology, but who’d have expected that?

  3. Randy Mitchell

    I asked it a question about plant reproductive ecology and got a very well written summary that was correct (if a little superficial). However it included a reference to a paper (with title and journal) by two very prominent authors that doesn’t exist. And those authors have never published together. The AI hallucinations and inaccuracies are amusing and scary.

      1. lembrechtsjonas

        It’s not necessarily lying as we understand it; it’s simply an algorithm that puts together words that are commonly used together, and it should only be used as that. It’s not designed to look up facts like Wikipedia, nor is it supposed to do correct mathematics. All it does is act like one million monkeys using typewriters and then pick the version that looks nicest.
        If I understand it correctly 🙂

      2. lembrechtsjonas

        What I’m using it for, mostly, is to improve texts that I wrote myself. As in, I type out a blog post as fast as I can, then ask ChatGPT to improve it for me. It gets the content from me, so it doesn’t have to make anything up; it just creates a better flow of words. It can change the tone or style of a text, make it more or less scientific, etc., but all of that works best when it knows the content you want.
        But then again, I’m also only just a fresh user of the tool, so don’t take my word for it 🙂

  4. Duncan Waldron

    That’s a good question, Jeff. Are they working towards a specific goal/product, or is it better described as pure research, to see where it leads?

    Have there been other examples of AI where the public has had the chance to play like this with still-developing systems, other than the recent spate of image-generating packages? I’d say there’s so much AI around us now, without our recognising it—like camera autofocus/exposure—but this is far more palpable, with predictive text being the nearest practical example.

  5. SoundEagle 🦅ೋღஜஇ

    Dear Jeff,

    As academic researchers and writers, we may indeed have great concerns about the ramifications and oversights of AI such as ChatGPT (Generative Pre-trained Transformer) becoming mainstream. In particular, Wikipedia has listed the following:

    Negative reactions

    In a December 2022 opinion piece, economist Paul Krugman wrote that ChatGPT would affect the demand for knowledge workers.[40] The Verge’s James Vincent saw the viral success of ChatGPT as evidence that artificial intelligence had gone mainstream.[5] Journalists have commented on ChatGPT’s tendency to “hallucinate”.[41] Mike Pearl of Mashable tested ChatGPT with multiple questions. In one example, he asked ChatGPT for “the largest country in Central America that isn’t Mexico”. ChatGPT responded with Guatemala, when the answer is instead Nicaragua.[42] When CNBC asked ChatGPT for the lyrics to “The Ballad of Dwight Fry”, ChatGPT supplied invented lyrics rather than the actual lyrics.[24] Researchers cited by The Verge compared ChatGPT to a “stochastic parrot”,[43] as did Professor Anton Van Den Hengel of the Australian Institute for Machine Learning.[44]

    In December 2022, the question and answer website Stack Overflow banned the use of ChatGPT for generating answers to questions, citing the factually ambiguous nature of ChatGPT’s responses.[2] In January 2023, the International Conference on Machine Learning banned any undocumented use of ChatGPT or other large language models to generate any text in submitted papers.[45]

    Economist Tyler Cowen expressed concerns regarding its effects on democracy, citing the ability of one to write automated comments to affect the decision process of new regulations.[46] The Guardian questioned whether any content found on the Internet after ChatGPT’s release “can be truly trusted” and called for government regulation.[47]

    Implications for cybersecurity

    Check Point Research and others noted that ChatGPT was capable of writing phishing emails and malware, especially when combined with OpenAI Codex.[48] The CEO of ChatGPT creator OpenAI, Sam Altman, wrote that advancing software could pose “(for example) a huge cybersecurity risk” and also continued to predict “we could get to real AGI (artificial general intelligence) in the next decade, so we have to take the risk of that extremely seriously”. Altman argued that, while ChatGPT is “obviously not close to AGI”, one should “trust the exponential. Flat looking backwards, vertical looking forwards.”[8]

    Implications for education

    In The Atlantic magazine, Stephen Marche noted that its effect on academia and especially application essays is yet to be understood.[49] California high school teacher and author Daniel Herman wrote that ChatGPT would usher in “The End of High School English”.[50]

    In the Nature journal, Chris Stokel-Walker pointed out that teachers should be concerned about students using ChatGPT to outsource their writing but that education providers will adapt to enhance critical thinking or reasoning.[51]

    Emma Bowman with NPR wrote of the danger of students plagiarizing through an AI tool that may output biased or nonsensical text with an authoritative tone: “There are still many cases where you ask it a question and it’ll give you a very impressive-sounding answer that’s just dead wrong.”[52]

    Joanna Stern with The Wall Street Journal described cheating in American high school English with the tool by submitting a generated essay.[53] Professor Darren Hick of Furman University described noticing ChatGPT’s “style” in a paper submitted by a student. An online GPT detector claimed the paper was 99.9% likely to be computer-generated, but Hick had no hard proof. However, the student in question confessed to using GPT when confronted, and as a consequence failed the course.[54] Hick suggested a policy of giving an ad-hoc individual oral exam on the paper topic if a student is strongly suspected of submitting an AI-generated paper.[55] Edward Tian, a senior undergraduate student at Princeton University, claimed that he has created a program, named “GPTZero,” that detects whether an essay is human written or not to combat academic plagiarism.[56][57]

    As of January 4, 2023, the New York City Department of Education has restricted access to ChatGPT from its public school internet and devices.[58][59]

    As far as I am concerned, blurring of authorship aside, the validity and reliability of ChatGPT (and, for that matter, other AI variants such as intelligent virtual assistants (IVAs) and intelligent personal assistants (IPAs)), as well as the extent to which AI can or should be used or co-opted as a research tool, educational tool, authoring tool and/or expert system, remain areas requiring urgent attention and holistic solutions.

    Yours sincerely,
    SoundEagle

      1. SoundEagle 🦅ೋღஜஇ

        Dear Jeff,

        You are very welcome. Moreover, you will be very amused to learn that I have been mistaken for an AI by a few bloggers over many years. One of them is Emilio (disperser). The following is an example.

        https://dispersertracks.com/2022/12/20/a-reminder-to-vote-a-bit-about-writing-and-more-chatgpt-stuff/

        Here is an excerpt from the blog post:

        As you read those exchanges . . . keep in mind this is the early stages of a conversational AI interacting with a human.

        Even so, it’s easy to forget one is conversing with a machine. There are, however, tell-tale signs . . . for one, it’s much too polite and even-tempered. For another, in conversations and exchanges I’ve previously had but not saved, I can tell when it’s regurgitating what I told it, but in a different way. (Note: they just implemented the ability to ‘remember and recall’ previous conversations. The ones transcribed here are the ones I saved by copying them to a word file, but many exchanges were lost.)

        . . . which brings me to the following . . .

        There’s a blog account floating around that goes by the name SoundEagle. Its blog is HERE. It occasionally will like some of my posts and even comments I leave here or on other blogs.

        I engaged with it in connection with one of its posts (LINK — be aware that it is a very, very long post).

        After a few exchanges, I began to suspect I wasn’t communicating with a human. I believe it to be some sort of AI experiment (still ongoing). Either that, or it’s a team of many people producing a lot of content on many different platforms (HERE) and purposefully being evasive, but I lean toward AI specifically because of the style of interaction I’ve had with it (read the comments in the post I linked).

        I now ignore it because I don’t like being played or fooled.

        One of the ethics about using AI tools is to divulge that it is, in fact, an AI. If not official, it should be made so. By the way, I also don’t like people who represent themselves as someone they are not. So, you know, most politicians and nearly all political and social pundits.

        Even more farcical is how he responded to my comment in the comment section of the said post.

        Yours sincerely,
        SoundEagle

  6. Pingback: ChatGPT did not write this post | Scientist Sees Squirrel

  7. spamletblog

    I get the impression ‘SoundEagle’ is an AI too. One would expect it to prepare rebuttals to accusations as well as any human, and without ever getting tired or fed up with such trolling exchanges, in the way most humans would.

    This is more concerning than the original post. We can no longer trust anything we read electronically.
    Glad I still have books from before the Internet!

      1. spamletblog

        It was the further comments that made me more suspicious. Sadly, if it is AI, it will have now learned to modify the responses to make them appear even more human.

        This is really disturbing. 😦

  8. Ben Courtice

    Apparently my mother-in-law has been using one of these Large Language Models (possibly ChatGPT) to see what advice it can give on health problems. We are trying to make sure she doesn’t act on that advice, given all the kinds of problems documented in the other comments here.

  9. Pingback: AI at the crossroads: can ChatGPT turn you into a statistical Robert Johnson? | Prof. Jeff Ollerton – ecological scientist and author
