PubMedQA Homepage

About

The PubMedQA task is to answer research questions with yes, no, or maybe, using the corresponding abstracts (e.g., "Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?").
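
As a rough illustration of the task format, a single instance can be pictured as a question, the abstract used as context, and one of the three labels. The class name `QAInstance` and its field names are hypothetical, chosen only for this sketch; they are not the dataset's actual schema, and the label in the example is a placeholder rather than a value taken from the dataset.

```python
from dataclasses import dataclass

# The three answer labels named in the task description.
LABELS = ("yes", "no", "maybe")


@dataclass(frozen=True)
class QAInstance:
    """Hypothetical container for one PubMedQA-style instance (illustrative only)."""

    question: str  # the research question
    abstract: str  # the corresponding abstract, used as context
    label: str     # one of LABELS

    def __post_init__(self):
        # Reject anything outside the three-way label set.
        if self.label not in LABELS:
            raise ValueError(f"label must be one of {LABELS}, got {self.label!r}")


example = QAInstance(
    question=(
        "Do preoperative statins reduce atrial fibrillation "
        "after coronary artery bypass grafting?"
    ),
    abstract="(abstract text goes here)",
    label="yes",  # placeholder for illustration, not taken from the dataset
)
```

For the real file layout and field names, see the GitHub repository.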

For more details about PubMedQA, please refer to our paper (see Citation below).

Dataset

PubMedQA has 1k expert-labeled, 61.2k unlabeled, and 211.3k artificially generated QA instances.

Please visit our GitHub repository to download the dataset.

Submission

To submit your model, please follow the instructions in the GitHub repository.

Citation

If you use PubMedQA in your research, please cite our paper:

@inproceedings{jin2019pubmedqa,
  title={PubMedQA: A Dataset for Biomedical Research Question Answering},
  author={Jin, Qiao and Dhingra, Bhuwan and Liu, Zhengping and Cohen, William and Lu, Xinghua},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={2567--2577},
  year={2019}
}

Leaderboard (reasoning-required setting)

Rank / Date	Model / Affiliation (Reference)	Size	Accuracy (%)	Macro-F₁ (%)
1 Nov 28, 2023	GPT-4 (Medprompt) Microsoft (Nori et al. 2023)	NA	82.0	NA
2 May 16, 2023	Med-PaLM 2 Google Research & DeepMind (Singhal et al. 2023)	NA	81.8	NA
3 Nov 27, 2023	MEDITRON EPFL (Chen et al. 2023)	70B	81.6	NA
4 Jul 6, 2023	Palmyra-Med Writer Inc. (Kamble et al. 2023)	40B	81.1	NA
5 Dec 1, 2023	AntGLM-Med Ant Group (Li et al. 2023)	10B	80.6	NA
6 Apr 12, 2023	GPT-4-base Microsoft & OpenAI (Nori et al. 2023)	NA	80.4	NA
7 Mar 4, 2024	Claude 3 Anthropic (Anthropic 2024)	NA	79.7	NA
8 Jan 11, 2023	GPT-3.5 + Z-Code++ Microsoft Azure AI (He et al. 2022)	175B	79.6	55.8
9 Dec 26, 2022	Flan-PaLM (3-shot) Google Research & DeepMind (Singhal et al. 2022)	540B	79.0	NA
10 Mar 14, 2024	HEAL DeepScribe Inc (Yuan et al. 2024)	13B	78.4	NA
11 Dec 20, 2022	Codex (5-shot) Technical University of Denmark & Copenhagen University Hospital (Liévin et al. 2022)	175B	78.2	NA
12 Sep 13, 2019	Human Performance University of Pittsburgh & Carnegie Mellon University (Jin et al. 2019)	NA	78.0	72.2
13 Nov 16, 2022	Galactica Meta AI (Taylor et al. 2022)	120B	77.6	NA
14 May 22, 2023	GatorTronGPT University of Florida & NVIDIA (Peng et al. 2023)	20B	77.6	NA
15 Mar 4, 2024	MediSwift-XL Cerebras Systems (Thangarasa et al. 2024)	1.2B	76.8	NA
16 Mar 20, 2023	GPT-4 Microsoft & OpenAI (Nori et al. 2023)	NA	75.2	NA
17 Apr 18, 2024	Reka Core Reka (Ormazabal et al. 2024)	NA	74.6	NA
18 Dec 15, 2022	PubMedGPT Stanford University (Bolton et al. 2022)	2.7B	74.4	NA
19 Oct 17, 2022	DRAGON Stanford University & EPFL (Yasunaga et al. 2022)	360M	73.4	NA
20 Apr 27, 2023	PMC-LLaMA Shanghai Jiao Tong University (Wu et al. 2023)	7B	73.4	NA
21 Mar 29, 2022	BioLinkBERT (large) Stanford University (Yasunaga et al. 2022)	340M	72.2	NA
22 Mar 29, 2022	BioLinkBERT (base) Stanford University (Yasunaga et al. 2022)	110M	70.2	NA
23 Sep 13, 2019	BioBERT (multi-phase tuning) University of Pittsburgh & Carnegie Mellon University (Jin et al. 2019)	110M	68.1	52.7
24 Jun 2021	BioELECTRA SAAMA AI Research Lab (Kanakarajan et al. 2021)	110M	64.0	NA
25 Jul 31, 2020	PubMedBERT Microsoft Research (Gu et al. 2020)	110M	55.8	NA
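
The two metrics reported in the leaderboard can be sketched as follows. This is a minimal illustration using the standard definitions of accuracy and macro-averaged F₁ over the three labels; it is an assumption that this matches the official evaluation, which is defined by the script in the GitHub repository. The function names and the toy labels are hypothetical.

```python
LABELS = ("yes", "no", "maybe")


def _check(gold, pred):
    # Both lists must be non-empty and aligned one-to-one.
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 over the given labels."""
    _check(gold, pred)
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted instances scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (not real PubMedQA data).
gold = ["yes", "yes", "no", "maybe"]
pred = ["yes", "no", "no", "yes"]

acc_pct = 100 * accuracy(gold, pred)
f1_pct = 100 * macro_f1(gold, pred)
print(f"Accuracy: {acc_pct:.1f}%  Macro-F1: {f1_pct:.1f}%")
```

Because macro-F₁ weights the three classes equally, a model that rarely predicts "maybe" correctly can have high accuracy but a much lower macro-F₁, which is consistent with the gap visible in the rows that report both numbers.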