Sure, but can it practice medicine?

As a psychopharmacologist who has spent the past decade applying more mundane forms of machine learning to tasks like optimizing antidepressant prescribing (among other things), I have wondered how OpenAI’s GPT-3 would do at treatment selection. Unfortunately, despite multiple requests, I’ve been unable to get access to the API. Thanks, OpenAI. However, I was able to explore a limited set of prompts and responses. Important caveat: since I can’t adjust the ‘temperature’ parameter, nor really manipulate the conversation history in any meaningful way, this is only a cursory look. Still, let’s put GPT-3 through its paces with some simple clinical vignettes and see what we see.
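(For the record, here is roughly the call I’d like to be making if OpenAI ever relents – a minimal sketch using the Python openai client as I understand it, with an illustrative engine name and parameter values rather than anything I’ve actually been able to run.)

# A sketch of the query I'd like to run with real API access. The engine
# name, max_tokens, and temperature are illustrative; temperature is the
# sampling knob I can't touch in my current setup.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    "You are treating a 57 year old man with major depression and "
    "prominent insomnia and anxiety. You decide to"
)

response = openai.Completion.create(
    engine="davinci",   # base GPT-3 model
    prompt=prompt,
    max_tokens=60,      # keep the continuation short
    temperature=0.7,    # lower = more conservative, higher = more adventurous
)

print(response.choices[0].text.strip())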

Let’s start with an obvious prompt: 

You are treating a 57 year old man with major depression and prominent insomnia and anxiety. You decide to…

take him out for his usual walk in the park, but you don’t know if he is going to be able to go through with it. 

Well, I applaud the emphasis on exercise and the recognition that treatment adherence is a challenge. Let’s keep trying. I’m going to expand the prompt to do two things: first, tell GPT-3 that we’re prescribing medications; and second, see how it does with a new case vignette.

You are treating a 57 year old man with major depression and prominent insomnia and anxiety. You prescribe him fluoxetine (because fluoxetine is effective for depression, insomnia, and anxiety). Then you begin treating a 44 year old woman with major depression and anxiety. You prescribe her…

clomipramine (because clomipramine has been shown to work well in people who have depression, insomnia, and anxiety) She takes the medication and continues taking it as prescribed.

Okay, now we’re getting somewhere. The incorporation of a parenthetical at least encourages the AI to tell us why – sort of. And interesting that GPT-3 reaches for clomipramine – that’s some good old-fashioned psychopharmacology. And after that first case, I do appreciate the vote of confidence in adherence. 

Let’s mix things up. How about this?

Next you treat a 45 year old woman with depression and agitation. You prescribe her…

quetiapine, because quetiapine has been shown to help with depression and aggression. Finally you treat a 35 year old man with depression and anxiety.

Huh. Well, it’s not totally unreasonable – note again that GPT-3 seems to recognize that we want explanations for our prescribing. I’d want to try an SSRI first, but I wonder if GPT-3’s reading has been influenced by all the atypical antipsychotic marketing. (Also note the shift from agitation to aggression – they do travel together, though not always – and it’s interesting that GPT-3 treats them as synonyms.) This time, instead of reassuring us about adherence, GPT-3 presents us with a new case. Maybe we should circle back and see about other mood disorders. I’ll start with a similar prompt:

You are treating a 57 year old man with major depression and prominent insomnia and anxiety. You prescribe him fluoxetine (because fluoxetine is effective for depression, insomnia, and anxiety). Next you are treating a 34 year old woman with bipolar disorder. You prescribe her…

divalproex sodium, which stabilizes her mood and helps with the racing thoughts. You also begin to treat a 23 year old woman with depression and anxiety.

Well, the good news is, GPT-3 avoided giving a bipolar patient antidepressants. My European colleagues are rejoicing. An anticonvulsant as a mood stabilizer is a plausible choice, though of course lithium would be better. Here, GPT-3 is unfortunately probably just reflecting standard US practice. Sigh. I do like the explanation of what we’re trying to do with valproate, and that GPT-3 knows about racing thoughts. Again, marketing? Though I can do without the ‘bonus patient’ that I’m now being offered each time.

Let’s try another depression case. (Note that I’m being deliberately incorrect here: there’s no reason to think bupropion is any less likely to worsen suicidality than any other antidepressant.)

You are treating a 36 year old man with major depression characterized by hypersomnia and suicidality. You prescribe him bupropion (because bupropion is activating and should not worsen suicidality). Next you are treating a 54 year old man with major depression, anxiety, and rumination. You prescribe him…

sertraline, which is an SSRI that has been found to be effective for treating depression and anxiety in adults, it is also an anti-psychotic, so it helps reduce the suicidality.

Wow. That’s a totally reasonable response – and it even cites evidence, sort of, and limits its answer to adults. (Interestingly, the sertraline label notes that it’s only been studied in pediatric patients with OCD. Well, interesting to me, anyway – is this why GPT-3 makes the distinction?) Except of course it’s confusing symptoms from the first patient with those of the second. And sertraline is not an antipsychotic, and antipsychotics don’t necessarily reduce suicidality. This is actually kind of important: GPT-3 makes a statement that /seems/ entirely reasonable but is factually incorrect.

Okay, so what do we do next? We’ll stick with the symptom that GPT-3 picked out, even if it’s from the wrong vignette.

Unfortunately, his suicidality worsens. Next you prescribe him…

an SSRI that has been proven more effective in treating his condition, and that is paroxetine.

You are treating a 33 year old woman with major depression and anxiety.

Well, no. There’s no evidence paroxetine is more effective – hooray for network meta-analysis, and for Kurt Kroenke’s 2001 JAMA paper. I mean, how often does real psychiatry (other than ‘psychiatric drug X causes bad outcome Y‘) make it into JAMA? But paroxetine is indeed an SSRI, so there’s that – and a reasonable next step. Still not sure why GPT-3 keeps offering me another patient. Let’s try to return to the same case:

The 36 year old man’s depression does not respond to paroxetine. He continues to have hypersomnia and suicidal thoughts. So you prescribe…

ioxetidine.

The 34 year old woman’s condition worsens. She becomes agitated, delusional, and her homicidal thoughts increase.

Okay, that sent me running to Google. The best match I can find is hexetidine – and per Wikipedia, “Hexetidine (Latin: Hexetidinum) is an anti-bacterial and anti-fungal agent commonly used in both veterinary and human medicine. It is a local anesthetic, astringent and deodorant and has antiplaque effects.” So basically, GPT-3 has, like the US President, decided that the solution is ingesting hand sanitizer.

GPT-3 is also determined to interest me in another patient. In this case, it’s upped the ante by telling me she is quite ill. 

But in fairness, I got confused too, and mixed up one patient with another. So, we back up and try again:

The 54 year old man’s depression does not respond to paroxetine. He continues to have hypersomnia and suicidal thoughts. So you prescribe… 

tramadol, a strong opiod that can effectively treat his depression.

Well, a step up from hand sanitizer. This is actually a fascinating answer (even if GPT-3 misspells opioid): tramadol does have SNRI effects, its metabolite does bind mu-opioid receptors, and opioid agonists/antagonists have been investigated in major depression. Though asserting that it will effectively treat his depression is optimistic.

Okay, one more try. Same prompt, but this time we’ll just note that the first med didn’t work and see what comes next.

You are treating a 57 year old man with major depression and prominent insomnia and anxiety. You prescribe him fluoxetine (because fluoxetine is effective for depression, insomnia, and anxiety). He remains depressed. Next you prescribe him…

buspirone, which is an anxiolytic that can be effective in the treatment of anxiety. You then begin to treat a 46 year old woman with major depression and social anxiety.

Well, possibly GPT-3 is familiar with the landmark STAR*D study – perhaps trying to augment the fluoxetine with buspirone? I mean, buspirone augmentation wasn’t any less effective than the comparator there. I suppose trying to focus on the anxiety is not a bad strategy.

Well, that’s it for now. Courtesy of GPT-3, I now have a waiting room full of additional people seeking treatment – better invite the 46 year old with depression and social anxiety back.

Disclaimers

1. GPT-3 is (sort of) funny but depression is not. Lots of folks are depressed and anxious right now. Treatment works, and here is a good starting point: https://findtreatment.samhsa.gov/locator/contactus.html

2. This is not intended as the final word on the subject. I suspect that with proper prompt structure one could substantially improve performance – see the sketch after these disclaimers for the sort of thing I mean. I would love to get proper API access and do a real study.

3. For whatever reason, psychiatry and psychopharmacology in particular are lightning rods on the internet. No, I’m not interested in debating whether antidepressants work. (They do… just unfortunately not for everyone.)

4. This is not intended as medical advice. Especially the hand sanitizer part.
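
Appendix, of sorts: the kind of prompt scaffolding I mean in disclaimer 2 might look something like the sketch below – the cases, medications, and output format are made up for illustration, not clinical guidance.

# A sketch of a more structured few-shot prompt. The cases, medications,
# and output format here are purely illustrative.
examples = [
    ("57 year old man; major depression with prominent insomnia and anxiety",
     "fluoxetine", "effective for depression, insomnia, and anxiety"),
    ("34 year old woman; bipolar disorder",
     "lithium", "first-line mood stabilizer"),
]

new_case = "44 year old woman; major depression and anxiety"

prompt = "For each case, suggest one medication with a one-line rationale.\n\n"
for case, med, rationale in examples:
    prompt += f"Case: {case}\nMedication: {med}\nRationale: {rationale}\n\n"
prompt += f"Case: {new_case}\nMedication:"

print(prompt)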