Artificial intelligence driven dental trauma assessment: Comparing the performance of chatbot models

Özden, İdil; Kaplanoğlu, Melike Beyza; Gökyar, Merve; Özden, Mustafa Enes; Sazak Öveçoğlu, Hesna

Volume: 10, Issue: 2

Turk Endod J. 2025; 10(2): 109-115 | DOI: 10.14744/TEJ.2025.29200

İdil Özden¹, Melike Beyza Kaplanoğlu¹, Merve Gökyar¹, Mustafa Enes Özden², Hesna Sazak Öveçoğlu¹
¹Department of Endodontics, Marmara University Faculty of Dentistry, Istanbul, Türkiye
²Republic of Türkiye Ministry of Health Kahramankazan District Health Administration, Ankara, Türkiye

Purpose: This study aimed to compare the accuracy and reliability of four chatbot applications (ChatGPT o1, Google Gemini Advanced, DeepSeek R1, and Perplexity AI) in answering questions on dental traumatology.
Methods: Twenty-five dichotomous questions derived from the 2020 guidelines of the International Association of Dental Traumatology (IADT) were administered to each chatbot by three independent researchers over a 10-day period. Each question was asked three times per day, yielding 90 responses per question for each chatbot. Responses were categorised as “correct,” “incorrect,” or “refer to a practitioner.” Accuracy rates and Fleiss’ Kappa values were calculated to assess performance and inter-response reliability.
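The abstract does not show how the accuracy rates and Fleiss’ Kappa values were computed. The Python sketch below is one standard way to do it, assuming each question is a "subject", each of its 90 responses is one "rating", and the three response categories are the columns of a count matrix. The names `accuracy_rate`, `fleiss_kappa`, `CATEGORIES`, and the design constants are hypothetical; splitting 90 as 3 researchers × 3 daily repeats × 10 days, and counting referrals as not correct, are assumptions rather than details confirmed by the source.

```python
from typing import Sequence

# Hypothetical design constants read from the Methods. Splitting the 90
# responses per question as 3 researchers x 3 daily repeats x 10 days is an
# assumption about how the reported counts combine.
RESEARCHERS, DAILY_REPEATS, DAYS, QUESTIONS = 3, 3, 10, 25
RESPONSES_PER_QUESTION = RESEARCHERS * DAILY_REPEATS * DAYS  # 90 per chatbot

# Column order for the rating matrix (hypothetical layout).
CATEGORIES = ("correct", "incorrect", "refer to a practitioner")


def accuracy_rate(counts: Sequence[Sequence[int]], correct_col: int = 0) -> float:
    """Share of all responses that fall in the 'correct' column.

    Assumes referrals count as not correct; the abstract does not say how
    the authors scored them.
    """
    total = sum(sum(row) for row in counts)
    return sum(row[correct_col] for row in counts) / total


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa where counts[i][j] = ratings of subject i in category j.

    Every row must contain the same number n >= 2 of ratings.
    """
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one subject")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    k = len(counts[0])
    # p_j: proportion of all ratings that fall into category j
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # P_i: observed agreement among the n ratings of subject i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N           # mean observed agreement
    P_e = sum(x * x for x in p)  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (P_bar - P_e) / (1 - P_e)


# Usage with made-up counts for three questions (not data from the study).
toy = [
    [90, 0, 0],
    [70, 15, 5],
    [40, 30, 20],
]
print(f"accuracy={accuracy_rate(toy):.3f}, kappa={fleiss_kappa(toy):.3f}")
```

Treating repeated responses as interchangeable "raters" is what makes Fleiss’ Kappa (rather than Cohen’s) the natural choice here, since no fixed pair of raters scores every item.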
Results: All chatbot models demonstrated high accuracy. ChatGPT o1 achieved the highest accuracy rate (86.4%), followed by DeepSeek R1 (84.0%), Perplexity AI (80.5%), and Google Gemini Advanced (80.2%). DeepSeek R1 showed the highest Fleiss’ Kappa value (0.709), indicating the greatest internal consistency, whereas Google Gemini Advanced showed the lowest (0.185). Although DeepSeek R1 and Perplexity AI exhibited relatively stronger reliability metrics, none of the models was fully consistent, and answers to the same question occasionally varied within a single platform.
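One common, though not universal, way to read Kappa values is the Landis and Koch (1977) benchmark scale. The minimal Python sketch below applies it to the two Kappa values reported in the Results; the abstract does not state that the authors used this scale, and `landis_koch` is a hypothetical helper.

```python
# Landis & Koch (1977) benchmark bands for kappa. Applying them here is an
# illustrative assumption, not necessarily the authors' interpretation.
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def landis_koch(kappa: float) -> str:
    """Map a kappa value in [-1, 1] to its Landis & Koch label."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    return "almost perfect"  # unreachable after the range check


# Only the two Kappa values stated in the abstract are used here.
reported = {"DeepSeek R1": 0.709, "Google Gemini Advanced": 0.185}
for model, k in reported.items():
    print(f"{model}: kappa={k:.3f} -> {landis_koch(k)}")
# prints:
# DeepSeek R1: kappa=0.709 -> substantial
# Google Gemini Advanced: kappa=0.185 -> slight
```

On this scale the gap between the two models spans three bands, which is why the abstract singles out Gemini Advanced’s consistency despite its accuracy being close to Perplexity AI’s.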
Conclusion: Contemporary chatbot models show substantial accuracy and improving reliability in responding to dental traumatology queries, suggesting their potential as clinical support tools. Nonetheless, further refinement and domain-specific optimisation remain necessary.

Keywords: Accuracy, artificial intelligence, chatbot, dental traumatology, reliability

Corresponding Author: İdil Özden, Türkiye
Manuscript Language: English
