ChatGPT-4 performs well on ARRT exam practice questions

ChatGPT-4 performed well on practice questions for the American Registry of Radiologic Technologists (ARRT) Radiography Certification Exam in a study published August 16 in Academic Radiology.

Researchers led by Yousif Al-Naser from McMaster University in Hamilton, Ontario, Canada, found that while the chatbot achieved an overall score of about 80%, driven by strong performance on text-based questions, its accuracy on image-based questions was low.

“By identifying where ChatGPT excels and where it falls short, educators and technologists can better strategize how to integrate AI into educational frameworks effectively,” Al-Naser and colleagues wrote.

Radiologists and technologists continue to test generative AI’s merit when it comes to medical board exams, and ChatGPT has shown mixed results on exams from various radiologic societies. A 2023 study, for example, found that ChatGPT-4 performed well on image-independent practice questions for the American College of Radiology (ACR) Diagnostic In-Training Exam (DXIT). However, a 2024 study found that the chatbot performed poorly on the actual ACR DXIT exam.

The ARRT Radiography Certification Exam, which certifies the competence of radiologic technologists, consists of 200 multiple-choice questions. Al-Naser and co-authors assessed ChatGPT-4’s performance on practice questions like those found in the ARRT board examination.

The researchers used a dataset of 200 practice questions for the exam from BoardVitals. They fed each question to ChatGPT-4 15 times, resulting in 3,000 observations to account for response variability.

ChatGPT-4 achieved an overall accuracy of 80.56%, performing better on text-based questions (86.3%) than on image-based questions (45.6%). The chatbot also had a longer average response time for image-based questions, at 18.01 seconds, compared with 13.27 seconds for text-based questions.

Additionally, ChatGPT-4’s performance varied by domain: it achieved accuracies of 72.6% for safety and 70.6% for image production, but scored 67.3% for patient care and 53.4% for procedures. Finally, the large language model performed best on questions rated as easy (78.5%).

The study authors called for such AI models to be further developed, particularly in image processing and interpretation, to increase their usefulness in educational settings. They added that analyzing ChatGPT’s strengths and weaknesses could refine the model’s use in education and help improve outcomes for students in radiologic technology.

“These tools can provide students with interactive, AI-driven quizzes that offer immediate feedback and explanations, improving their understanding of radiographic imaging principles,” the authors wrote. “ChatGPT has the potential to enhance accessibility by providing on-demand content, which is beneficial for supplementary learning outside of classroom environments.”

The full results can be found here.