Microsoft has introduced a powerful new tool, the AI Diagnostic Orchestrator, which outperforms experienced physicians in diagnosing complex medical cases. Tested on 304 challenging real-world scenarios from the New England Journal of Medicine (NEJM), the AI system reached correct diagnoses in 85% of cases, about four times the accuracy of the physicians it was compared against.
The system mimics how real doctors work by analyzing symptoms step by step, asking diagnostic questions, and ordering virtual tests as needed. For instance, if a patient presents with a cough and fever, the AI may request a blood test and X-ray before diagnosing pneumonia.
Each step incurs a virtual cost, allowing researchers to measure accuracy and resource efficiency. Microsoft developed this new approach to move beyond multiple-choice tests like the US Medical Licensing Exam.
The company argues that such tests reward memorization over critical thinking and exaggerate AI’s diagnostic abilities. Instead, Microsoft’s AI focuses on sequential diagnosis, reflecting real clinical reasoning and decision-making.
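The sequential setup described above, in which each question or test adds to a running virtual cost, can be illustrated with a small sketch. This is not Microsoft's code: the case data, the action names, the prices, and the rule-based `toy_policy` standing in for the AI model are all hypothetical, chosen only to show how accuracy and cost might be measured together.

```python
from dataclasses import dataclass, field

# Hypothetical per-action prices; the article gives no real figures.
ACTION_COSTS = {"question": 0.0, "blood_test": 50.0, "chest_xray": 120.0}


@dataclass
class SimulatedCase:
    """A toy case: findings stay hidden until the matching action is taken."""
    presenting: str
    hidden_findings: dict
    true_diagnosis: str


@dataclass
class Encounter:
    """Tracks what has been revealed and how much has been spent so far."""
    case: SimulatedCase
    total_cost: float = 0.0
    revealed: dict = field(default_factory=dict)

    def take(self, action: str) -> str:
        """Perform one step, charge its virtual cost, and reveal the finding."""
        self.total_cost += ACTION_COSTS[action]
        finding = self.case.hidden_findings.get(action, "nothing notable")
        self.revealed[action] = finding
        return finding


def toy_policy(encounter: Encounter) -> str:
    """Placeholder decision rule standing in for the AI model (an assumption)."""
    encounter.take("question")
    blood = encounter.take("blood_test")
    if "elevated" in blood:
        # Only order the more expensive test when the cheaper one is suggestive.
        xray = encounter.take("chest_xray")
        if "consolidation" in xray:
            return "pneumonia"
    return "viral infection"


def evaluate(cases, policy):
    """Return (fraction of correct diagnoses, mean virtual cost per case)."""
    correct = 0
    spent = 0.0
    for case in cases:
        encounter = Encounter(case)
        if policy(encounter) == case.true_diagnosis:
            correct += 1
        spent += encounter.total_cost
    return correct / len(cases), spent / len(cases)


# Two invented cases, both presenting with cough and fever.
CASES = [
    SimulatedCase(
        presenting="cough and fever",
        hidden_findings={
            "question": "symptoms for five days",
            "blood_test": "elevated white cell count",
            "chest_xray": "right lower lobe consolidation",
        },
        true_diagnosis="pneumonia",
    ),
    SimulatedCase(
        presenting="cough and fever",
        hidden_findings={
            "question": "symptoms for two days",
            "blood_test": "normal white cell count",
        },
        true_diagnosis="viral infection",
    ),
]

accuracy, mean_cost = evaluate(CASES, toy_policy)
print(f"accuracy={accuracy:.0%}, mean virtual cost={mean_cost:.2f}")
```

Because findings are revealed only when an action is taken, a policy that orders every test on every patient would score the same accuracy here but at a higher mean cost, which is the kind of trade-off a per-step cost makes visible.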
The research team transformed the 304 NEJM case records into interactive simulations, allowing the AI to tackle each problem in stages. They tested the system with top models from OpenAI, Meta, Anthropic, and Google.
When paired with OpenAI’s o3 model, Microsoft’s system outperformed physicians who attempted the same cases without tools or support. Microsoft AI, the division led by Mustafa Suleyman, says the orchestrator acts like a panel of specialists, offering insights that surpass those of any single doctor.
The team sees this technology as a step toward medical superintelligence, though it’s not ready for clinical use yet. The system must still be tested on more routine symptoms and conditions. While Microsoft acknowledges potential cost savings, it insists the goal is not to replace doctors but to support them.