Medical LLM Evaluation
Comprehensive evaluation platform for medical AI systems and algorithms

About LLM Evaluation
Our project, Medical LLM Evaluation, addresses a critical need in healthcare: ensuring AI-powered language models provide accurate, safe, and culturally relevant information. We've developed a comprehensive framework to evaluate the performance of Large Language Models (LLMs) in answering questions related to maternal health in multiple languages, including English, Hindi, and Marathi. Our work is essential for validating AI tools before they are used by patients and healthcare providers.
Our Solution
- Comprehensive testing and validation protocols for AI models.
- A gold-standard dataset of questions and answers, created and validated by local medical experts.
- A professionally weighted scoring system that rates the clinical accuracy, completeness, and contextual safety of each response.
- Evaluation of responses along multiple dimensions: medical quality, semantic similarity, and language quality.
- A focus on low-resource languages such as Hindi and Marathi to ensure linguistic accessibility and cultural relevance.
- Use of state-of-the-art NLP models such as Cohere's Command-A and Aya Expanse for multilingual QA.
- A holistic final score that aggregates medical, semantic, and linguistic quality into a single performance metric.
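As a rough illustration of how a holistic final score might combine the three dimensions listed above, here is a minimal Python sketch. The weights, the function names `cosine_similarity` and `final_score`, and the assumption that every score is normalized to the range 0 to 1 are all hypothetical; the platform's actual weighting and similarity method are not published here.

```python
import math

# Hypothetical weights, chosen only for illustration.
# The platform's real weighting is not given in the source.
WEIGHTS = {"medical": 0.5, "semantic": 0.3, "language": 0.2}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as having no similarity.
    return dot / (norm_a * norm_b)


def final_score(medical, semantic, language, weights=WEIGHTS):
    """Weighted average of three scores, each expected in [0, 1]."""
    scores = {"medical": medical, "semantic": semantic, "language": language}
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    total_weight = sum(weights.values())
    return sum(weights[k] * scores[k] for k in scores) / total_weight


if __name__ == "__main__":
    # Toy embeddings standing in for a model answer and a gold-standard answer.
    semantic = cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0])
    print(final_score(medical=0.9, semantic=semantic, language=0.8))
```

Cosine similarity can be negative for real embeddings, so a production pipeline would likely clip or rescale it before aggregation. The sketch simply rejects out-of-range inputs.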

Why LLM Evaluation Stands Out
Unique Value
- Comprehensive multi-dimensional evaluation framework
- Real-time bias detection and mitigation tools
- Regulatory compliance and safety assessment
- Clinical validation with healthcare partners
Evidence of Success
Models evaluated include Cohere's Command-A and Aya.
Languages covered: English, Hindi, Marathi.
Evaluation methods: Semantic Similarity, Linguistic Analysis, and LLM-based Evaluation.
Medical LLM Evaluation Platform Showcase
These slides demonstrate our comprehensive approach to assessing medical AI systems, featuring our dashboard interface, scoring mechanisms, and validation methodologies.

Meet the Team

Varun Nair
Machine Learning Engineer
D.J. Sanghvi College of Engineering
B.Tech, Computer Engineering '25

Himanshu Beniwal
Mentor
PhD Student
Indian Institute of Technology Gandhinagar

Dhara Mungra
CTO
Data Scientist
MS, New York University

Swapneel Mehta
Chief Scientist
Postdoc, MIT & Boston University
Ph.D., New York University