Academy of Diagnostics & Laboratory Medicine - Scientific Short

Are Large Language Models ready for laboratory medicine?

Min Yu

Large Language Models (LLMs) like ChatGPT, Claude, and Med-PaLM have drawn widespread attention for their ability to pass medical board exams, generate empathetic responses, and assist with clinical documentation. GPT-4, a general-purpose model, has scored over 90% on USMLE-style questions, rivaling medicine-specialized models like Med-Gemini. In a study published in JAMA Internal Medicine, nearly 80% of physician evaluators preferred ChatGPT’s replies to real patient questions for their clarity and empathy (JAMA Intern Med. 2023). These models aren’t theoretical; they’re already helping clinicians manage workloads and improve patient communication.

Healthcare is already seeing this shift. Epic’s Augmented Response Tool (ART), which drafts patient messages using OpenAI’s LLMs, is now deployed in over 180 health systems, generating more than a million messages per month. Similarly, AI scribes like Nuance DAX Copilot produce over 2 million physician notes monthly. These tools reduce administrative overhead, ease documentation burdens, and free clinicians to focus more on patient care.

Laboratory medicine is also beginning to explore LLMs, though the pathway is more nuanced. A 2024 study in the American Journal of Clinical Pathology evaluated ChatGPT on 258 board-style pathology questions. The model matched or exceeded peer performance in areas like chemistry, hematology, and coagulation, but struggled with more interpretive or niche content. Another study, in Clinical Chemistry, found that ChatGPT’s responses to real patient questions about laboratory results were preferred nearly 76% of the time; evaluators described them as better structured, clearer, and more patient-centered than medical professionals’ responses, though they also flagged issues such as over-explanation.

So, where’s the opportunity in lab medicine? It spans a wide range: drafting SOPs, preparing regulatory documentation, supporting staff training, troubleshooting QC issues, interpreting test results, and improving test utilization. But realizing these benefits comes with technical, regulatory, and security challenges. On the technical side, some tasks can be addressed with well-structured prompts; others benefit from retrieval-augmented generation (RAG), which anchors LLM output in internal institutional knowledge like LIS updates or CAP checklists, as sketched below. More advanced implementations use APIs to integrate LLMs directly into existing platforms, streamlining documentation and review workflows. Further possibilities, like function calling or fine-tuning, require stronger infrastructure and IT support.
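
To make the RAG idea concrete, here is a minimal sketch in Python. It assumes the OpenAI Python SDK; the document snippets, model names, and example question are illustrative placeholders, not institutional content or a recommended configuration. The sketch embeds a handful of internal "documents," retrieves the one most similar to a user's question, and grounds the model's answer in that retrieved text.

    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Toy "knowledge base": in practice, chunks of SOPs, LIS release
    # notes, or CAP checklist items drawn from institutional sources.
    docs = [
        "SOP (example): repeat potassium results above 6.0 mmol/L and "
        "check the specimen for hemolysis before reporting.",
        "LIS update (example): critical-value call lists moved to the "
        "new notification module.",
        "CAP requirement (example): competency assessment must be "
        "documented for all testing personnel.",
    ]

    def embed(texts):
        """Return one embedding vector per input string."""
        resp = client.embeddings.create(
            model="text-embedding-3-small", input=texts
        )
        return np.array([d.embedding for d in resp.data])

    doc_vecs = embed(docs)

    # Retrieve the snippet most similar to the question (cosine similarity).
    question = "What should we do before reporting a potassium of 6.4 mmol/L?"
    q_vec = embed([question])[0]
    scores = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    context = docs[int(np.argmax(scores))]

    # Ground the answer in the retrieved text rather than the model's
    # general knowledge.
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided laboratory documentation."},
            {"role": "user",
             "content": f"Documentation:\n{context}\n\nQuestion: {question}"},
        ],
    )
    print(answer.choices[0].message.content)

In a production setting, the snippets would come from a maintained index (typically a vector database), and the retrieved sources would be surfaced alongside the answer so reviewers can trace every statement back to a document.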

Security and regulatory concerns add further complexity. LLM-generated content must be auditable, traceable, and privacy-compliant, especially in high-stakes environments, and human review is essential. For this reason, the best approach is to start small: pilot low-risk use cases, evaluate outcomes, and scale up with confidence (a lightweight audit-logging pattern is sketched below). With proper oversight and governance, LLMs can transition from experimental tools to reliable assets in laboratory operations.
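
To make the auditability point concrete, here is a minimal sketch of an append-only audit log for LLM-assisted drafts. The JSONL format and field names are illustrative assumptions, not a standard; the key idea is that every draft is traceable to its prompt, model version, and a named human reviewer before release.

    import hashlib
    import json
    from datetime import datetime, timezone

    def log_llm_draft(path, prompt, model, output, reviewer, approved):
        """Append one audit record per LLM-generated draft (illustrative schema)."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": model,  # exact model and version used
            "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
            "output": output,  # the draft awaiting review
            "reviewer": reviewer,  # named human reviewer
            "approved": approved,  # stays False until sign-off
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")

    # Example use with placeholder values.
    log_llm_draft(
        "llm_audit.jsonl",
        prompt="Draft a summary of the SOP revision for staff training.",
        model="gpt-4o-2024-08-06",
        output="(draft text)",
        reviewer="M. Yu",
        approved=False,
    )

Hashing the prompt keeps the record traceable without storing potentially identifying text in the log itself, and the approved flag remains false until a qualified reviewer signs off.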

The future of LLMs in lab medicine depends on broad education and engagement across the laboratory community. Everyone—from bench technologists to lab directors—needs to understand what these tools can and cannot do, how to mitigate their risks, and how to use them responsibly. It’s also our role to shape emerging regulatory frameworks and define standards for trustworthy AI in lab workflows. The bottom line: LLMs are no longer optional. They’re already transforming healthcare—and the lab is no exception. The opportunity is here. The question is: how will your lab respond?

References

  1. Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Amin, M., Hou, L., Clark, K., Pfohl, S. R., Cole-Lewis, H., Neal, D., Rashid, Q. M., Schaekermann, M., Wang, A., Dash, D., Chen, J. H., Shah, N. H., Lachgar, S., Mansfield, P. A., Prakash, S., … Natarajan, V. (2025). Toward expert-level medical question answering with large language models. Nature Medicine, 31(3), 943–950. https://doi.org/10.1038/s41591-024-03423-7
  2. Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine, 183(6), 589–596. https://doi.org/10.1001/jamainternmed.2023.1838
  3. Geetha, S. D., Khan, A., Khan, A., Kannadath, B. S., & Vitkovski, T. (2024). Evaluation of ChatGPT pathology knowledge using board-style questions. American Journal of Clinical Pathology, 161(4), 393–398. https://doi.org/10.1093/ajcp/aqad158
  4. Girton, M. R., Greene, D. N., Messerlian, G., Keren, D. F., & Yu, M. (2024). ChatGPT vs medical professional: Analyzing responses to laboratory medicine questions on social media. Clinical Chemistry, 70(9), 1122–1139. https://doi.org/10.1093/clinchem/hvae093

Scientific Shorts are brought to you by the Academy of Diagnostics & Laboratory Medicine.
