Clinical Summary | Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions
 
Published on MedED:  9 April January 2024
Source: JAMA Network Open 

Original Published: 23 August 2023
Type of article: Clinical Research Summary
MedED Catalogue Reference: MOCS003

Category: Ophthalmology
Cross-reference: Artificial Intelligence
Keywords: AI, ChatGpt, LLM, Ophthalmology, Digital technology



 

Key Take Aways

1. Despite concerns about accuracy and differentiation, a study comparing ophthalmologists' responses with those of a ChatBot revealed strikingly similar advice, indicating the potential of AI in delivering complex eye care information.
2. The ChatBot's ability to provide nuanced and detailed answers suggests a promising future in supplementing or even replacing human consultations in ophthalmology
3. Challenges arise in distinguishing between responses from humans and AI, raising the need for clear indicators to maintain trust and credibility in patient interactions
4. While the study highlights ChatBot's proficiency, caution is warranted regarding the potential dissemination of incorrect information without proper oversight and validation.



 

 

Top


Overview | Objectives | Study Design | Findings | Discussion |  Conclusion | Original Research | Limitations | Declaration of Interests|

This is a summary of the original research distributed under the terms of the CC-BY License. © 2023 Bernstein IA et al. JAMA Network OpenIt does not replace the original work, which has been linked below.


Originally published in JAMA Open Network, 22 August 2023


Overview


Large language models (LLMs) such as ChatGPT have the potential to undertake diverse tasks, including responding to patients' inquiries about eye care. Prior to this study by Bernstein, Zang et al., there had not been a direct assessment that compared the performance of an LLMS with that of an ophthalmologist. Consequently, the accuracy, suitability, and safety of advice generated by LLMs for patients in this target audience (eye patients) were unknown.
 

Back to top
 
Objectives

The authors of this study set out to determine how the advice generated by an LLM chatbot compared with that of an ophthalmologist.

Back to top

 
Study Design & Methods
 
This cross-sectional study utilized the online Eye Care Forum as its data source. The forum permits users to pose comprehensive questions and obtain responses from physicians associated with the American Academy of Ophthalmology (AAO).

Unlike other platforms, EyeCare does not restrict patient questions to short sentences. Users have the flexibility to present their inquiries in longer formats, facilitating the inclusion of context and detailed information, both valuable for the responding ophthalmologists. The researchers accessed the question-and-answer data from this forum, and saved the first ophthalmologists' responses to each post.

For this study, researchers selected ChatGPT, an LLM capable of generating responses based on given prompts or contexts. They employed prompt engineering to train the LLM using real-life questions from the EyeCare forum and providing explicit instructions to adapt its behaviour accordingly. The AI was guided to recognize key elements of an ideal response, such as partnership, empathy, apology, respect, and support, as outlined in the linked document below
 


The final dataset constituted a randomized subset of 200 question-answer pairs from the EyeCare forum.
These questions were then inputted into ChatGPT, which was instructed to respond as a human without revealing its identity. The generated answers were then added to the dataset, providing both AI and human responses for each question.

A panel of 8 board-certified ophthalmologists was tasked with distinguishing between answers from the ChatGPT chatbot and those from humans.

 


 

Findings 
 
The final data sample comprised 200 questions with a median (IQR) length of 101.5 words. 

The Chatbot responses to the questions were slightly longer than the human responses.
  • The AI responses averaged 129 words vs human responses, which averaged 77.5 words. 
 
The expert panel was able to distinguish between the Chatbot and human answers.
  • The panel's mean accuracy for distinguishing correctly was 61.3%; the individual rater accuracies ranged from 45% to 74%.
 
The expert panel assessed both the Chatbot and human responses similarly in terms of accuracy, alignment with medical consensus, potential harm, and the severity of harm. 
  • Chatbot answers were as likely as human answers to contain incorrect or inappropriate material.
  • Chatbot answers did not have a different likelihood of harm when compared to human answers. 
  • Consequently, there were no statistically significant disparities between the quality and potential harm of Chatbot and human answers on these criteria.
 
Examples of both data sets can be found in the linked document below.

Back to top  



Discussion

This study is the first to evaluate the quality of advice generated by an LLM chatbot model compared to that of an actual ophthalmologist.  While the panel of experts were able to discern Chatbot vs human responses with 61% accuracy, the nature of the responses given by both did not differ significantly, indicating that the risk of incorrect or harmful information in this particular use case was low.  

What was surprising to the researchers with the level of complexity the Chatbot was able to introduce into its responses to common questions.

The study has implications not only for ophthalmology but for the medical field in general. As more digital health tools come online and more people turn to online resources to find answers, opportunities and threats need to be identified and managed. 

The researchers indicate that using an LLMs ChatBot could free up the ophthalmologist by addressing early concerns and pre-appointment questions, allowing them to focus on the more complex aspects of the patient's care. Similarly, when well crafted, LLMs could provide patients in underserved areas with access to quality information, which would be a significant benefit of the technology.

Nonetheless, it is important to be aware of the dangers of unmoderated use of these technologies, not least of which is that the answers are so remarkably human-like that it may be impossible for the layperson to differentiate between quality, correct information or information that is neither.

Thus, the researchers conclude that the ideal approach for clinical applications of LLMs may be in aiding ophthalmologists, rather than serving as a patient-facing AI that substitutes their judgment. Finally, the researchers discussed the implication of the privacy of information, highlighting the fact that strategies need to be incorporated into any of the technologies to ensure patient privacy is protected. 

Back to top
 


Conclusion

The researchers hope their study stimulates dialogue among stakeholders to guide responsible integration into ophthalmology practice.
 
LLMs show great promise in enhancing patient care, particularly in generating responses to complex medical queries, as demonstrated in this study on ophthalmology inquiries.
 Given the rapid pace of development, this type of research is urgently needed to understand patient attitudes toward LLM-augmented ophthalmology, assess the clarity and acceptability of LLM-generated answers, test LLM performance in diverse clinical settings, and establish ethical guidelines.
 
Domain-specialized LLMs, which are trained using disease-specific data, could be the solution to meeting specific need cases, such as addressing the more common questions patients may have
.

 

Back to top


Limitations

The following limitations were noted:

  • Given the small study size and the fact that the questions were sampled from a single forum, it is unclear how representative they are of the greater population. 
  • The human ophthalmologists were volunteering their time to answer these anonymous questions on the medical forum, and likely, these answers would not be representative of the typical doctor-patient interaction that occurs when there is an established relationship.
  • The medical context was missing for both the responding physician and the AI, which may have impacted the answers given.
  • It is unknown whether chatbots are currently capable of ingesting large amounts of medical contextual information related to a single patient, which could significantly impact the answers provided
  • The study parameters focused only on the accuracy and safety of chatbot-generated advice but did not evaluate patient satisfaction or other factors. The panel of advisors consisted of only nine ophthalmologists. A broader sample of clinicians could yield different perspectives, potentially influencing the study's outcomes.
  • Furthermore, while GPT-3.5—the LLM powering the Chatbot in this study—was trained on a vast corpus of publicly accessible text from books, websites, and articles, the precise details regarding the specific datasets used for training GPT-3.5 are not publicly disclosed. 
  • Finally, ophthalmology is a field heavily driven by eye examination and imaging; future studies may evaluate the quality of answers generated when inputs include images of the eye submitted by the patient, which may significantly change the quality of the answers.
 
 


Back to top
 



 Disclosures

Conflict of Interest Disclosures:
None

Funding/Support: 

National Eye Institute (grant No. K23EY03263501); Career Development Award from Research to Prevent Blindness; an unrestricted departmental grant from Stanford University for Research to Prevent Blindness; departmental grant National Eye Institute (grant No. P30-EY026877).
 

Role of the Funder/Sponsor: 
The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
 
Back to top

Disclaimer
This article is in no way presented as an original work.  Every effort has been made to attribute quotes and content correctly. Where possible, all information has been independently verified. The Medical Education Network bears no responsibility for any inaccuracies which may occur from the use of third-party sources. If you have any queries regarding this article contact us 


Fact-checking Policy
The Medical Education Network makes every effort to review and fact-check the articles used as source material in our summaries and original material. We have strict guidelines in relation to the publications we use as our source data, favouring peer-reviewed research wherever possible. Every effort is made to ensure that the information contained here accurately reflects the original material. Should you find inaccuracies or out-of-date content or have any additional issues with our articles, please make use of the Contact Us form to notify us.

 

 

 

 

 

 

 

Rapid SSL

The Medical Education Network
Powered by eLecture, a VisualLive Solution