Evaluating the Precision and Dependability of Medical Answers Generated by ChatGPT

Zain Abidin*
Cooper Medical School, Rowan University, USA

*Corresponding address: Cooper Medical School, Rowan University, USA
Email: zainwildelake@gmail.com

doi: https://doi.org/10.63137/jsteam.744858

Keywords: Artificial Intelligence; Assessment; Decision Making; Healthcare

ABSTRACT

Objective

This study assesses the precision and depth of ChatGPT's answers to medical questions posed by physicians, offering preliminary evidence of its reliability in providing accurate and comprehensive information.

Methods

Of 35 invited physicians, 10 (approximately 29%) participated, each formulating eight questions for ChatGPT without patient-specific data. Questions spanned easy, medium, and hard difficulty levels and required either yes/no or descriptive answers. Physicians rated ChatGPT's responses for accuracy and completeness using established Likert scales. For internal validation, questions that received low accuracy scores were re-submitted, and statistical analysis of the outcomes provided insight into response consistency and variation over time.

Results

The analysis of 80 ChatGPT-generated answers revealed a median accuracy score of 4 (mean 4.7, SD 2.6) and a median completeness score of 2 (mean 1.8, SD 1.5). Notably, 30% of responses achieved the highest accuracy score (6) and 38.7% were rated nearly all correct (5), while 8% were deemed completely incorrect (1). Inaccurate answers were more common for questions physicians rated as hard. Completeness also varied: 45% of responses were considered comprehensive, 37.5% adequate, and 17.5% incomplete. A modest correlation (Spearman's ρ = 0.3) was observed between accuracy and completeness across all questions.
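To make the reported statistics concrete, the sketch below computes the same kinds of summary measures (median, mean, standard deviation, and a tie-aware Spearman rank correlation between accuracy and completeness) on a small set of hypothetical physician ratings. The scores shown are illustrative only, not the study's actual data.

```python
import statistics

# Hypothetical ratings: accuracy on a 6-point Likert scale,
# completeness on a 3-point scale, one pair per evaluated answer.
accuracy = [6, 5, 4, 6, 2, 5, 1, 6, 5, 4]
completeness = [3, 3, 2, 3, 1, 2, 1, 3, 2, 2]

def avg_ranks(values):
    # Assign ranks; tied values share the mean of their rank positions.
    sorted_vals = sorted(values)
    rank_of = {}
    i = 0
    while i < len(sorted_vals):
        j = i
        while j < len(sorted_vals) and sorted_vals[j] == sorted_vals[i]:
            j += 1
        rank_of[sorted_vals[i]] = (i + 1 + j) / 2  # mean of ranks i+1..j
        i = j
    return [rank_of[v] for v in values]

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the rank-transformed data.
    rx, ry = avg_ranks(x), avg_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

print("median accuracy:", statistics.median(accuracy))
print("mean accuracy:", round(statistics.mean(accuracy), 2))
print("SD accuracy:", round(statistics.stdev(accuracy), 2))
print("Spearman rho:", round(spearman(accuracy, completeness), 2))
```

Because both scales are ordinal Likert ratings, a rank-based measure such as Spearman's ρ is a natural choice over Pearson's r, which assumes interval-scaled data.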

Conclusion

Integrating language models like ChatGPT in medical practice shows promise, but cautious considerations are crucial for safe use. While AI-generated responses display commendable accuracy and completeness, ongoing refinement is needed for reliability.

How to Cite this: Abidin Z. Evaluating the Precision and Dependability of Medical Answers Generated by ChatGPT. J Sci Technol Educ Art Med. 2024;1(1):16-22

This work is licensed under a Creative Commons Attribution 4.0 International License.