About
The task of PubMedQA is to answer research questions with yes/no/maybe (e.g., "Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?") using the corresponding abstracts.
For more details about PubMedQA, please refer to the paper (Jin et al. 2019) listed in the Citation section.
Dataset
PubMedQA has 1k expert-labeled, 61.2k unlabeled, and 211.3k artificially generated QA instances.
Please visit our GitHub repository to download the dataset.
Submission
To submit your model, please follow the instructions in the GitHub repository.
Citation
If you use PubMedQA in your research, please cite our paper:
@inproceedings{jin2019pubmedqa,
  title={PubMedQA: A Dataset for Biomedical Research Question Answering},
  author={Jin, Qiao and Dhingra, Bhuwan and Liu, Zhengping and Cohen, William and Lu, Xinghua},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={2567--2577},
  year={2019}
}
Leaderboard (reasoning-required setting)
Rank | Date | Model | Affiliation (Reference) | Size | Accuracy (%) | Macro-F1 (%)
---|---|---|---|---|---|---
1 | Nov 28, 2023 | GPT-4 (Medprompt) | Microsoft (Nori et al. 2023) | NA | 82.0 | NA
2 | May 16, 2023 | Med-PaLM 2 | Google Research & DeepMind (Singhal et al. 2023) | NA | 81.8 | NA
3 | Nov 27, 2023 | MEDITRON | EPFL (Chen et al. 2023) | 70B | 81.6 | NA
4 | Jul 6, 2023 | Palmyra-Med | Writer Inc. (Kamble et al. 2023) | 40B | 81.1 | NA
5 | Dec 1, 2023 | AntGLM-Med | Ant Group (Li et al. 2023) | 10B | 80.6 | NA
6 | Apr 12, 2023 | GPT-4-base | Microsoft & OpenAI (Nori et al. 2023) | NA | 80.4 | NA
7 | Mar 4, 2024 | Claude 3 | Anthropic (Anthropic, 2024) | NA | 79.7 | NA
8 | Jan 11, 2023 | GPT-3.5 + Z-Code++ | Microsoft Azure AI (He et al. 2022) | 175B | 79.6 | 55.8
9 | Dec 26, 2022 | Flan-PaLM (3-shot) | Google Research & DeepMind (Singhal et al. 2022) | 540B | 79.0 | NA
10 | Mar 14, 2024 | HEAL | DeepScribe Inc (Yuan et al. 2024) | 13B | 78.4 | NA
11 | Dec 20, 2022 | Codex (5-shot) | Technical University of Denmark & Copenhagen University Hospital (Liévin et al. 2022) | 175B | 78.2 | NA
12 | Sep 13, 2019 | Human Performance | University of Pittsburgh & Carnegie Mellon University (Jin et al. 2019) | NA | 78.0 | 72.2
13 | Nov 16, 2022 | Galactica | Meta AI (Taylor et al. 2022) | 120B | 77.6 | NA
14 | May 22, 2023 | GatorTronGPT | University of Florida & NVIDIA (Peng et al. 2023) | 20B | 77.6 | NA
15 | Mar 4, 2024 | MediSwift-XL | Cerebras Systems (Thangarasa et al. 2024) | 1.2B | 76.8 | NA
16 | Mar 20, 2023 | GPT-4 | Microsoft & OpenAI (Nori et al. 2023) | NA | 75.2 | NA
17 | Apr 18, 2024 | Reka Core | Reka (Ormazabal et al. 2024) | NA | 74.6 | NA
18 | Dec 15, 2022 | PubMedGPT | Stanford University (Bolton et al. 2022) | 2.7B | 74.4 | NA
19 | Oct 17, 2022 | DRAGON | Stanford University & EPFL (Yasunaga et al. 2022) | 360M | 73.4 | NA
20 | Apr 27, 2023 | PMC-LLaMA | Shanghai Jiao Tong University (Wu et al. 2023) | 7B | 73.4 | NA
21 | Mar 29, 2022 | BioLinkBERT (large) | Stanford University (Yasunaga et al. 2022) | 340M | 72.2 | NA
22 | Mar 29, 2022 | BioLinkBERT (base) | Stanford University (Yasunaga et al. 2022) | 110M | 70.2 | NA
23 | Sep 13, 2019 | BioBERT (multi-phase tuning) | University of Pittsburgh & Carnegie Mellon University (Jin et al. 2019) | 110M | 68.1 | 52.7
24 | Jun 2021 | BioELECTRA | SAAMA AI Research Lab (Kanakarajan et al. 2019) | 110M | 64.0 | NA
25 | Jul 31, 2020 | PubMedBERT | Microsoft Research (Gu et al. 2020) | 110M | 55.8 | NA
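
The leaderboard reports accuracy and macro-F1 over the three answer classes (yes/no/maybe). As a rough illustration of how these two metrics relate, here is a minimal Python sketch. It is not the official evaluation script: the function names, the toy labels, and the choice to average F1 over all three classes (scoring an absent class as 0) are assumptions made for this example. Follow the instructions in the GitHub repository for actual submissions.

```python
# Illustrative sketch only; not the official PubMedQA evaluation code.
# Function names and the toy data below are hypothetical.

LABELS = ("yes", "no", "maybe")


def accuracy(gold, pred):
    """Fraction of instances whose predicted label matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores.

    Assumption: a class with no gold and no predicted instances
    contributes an F1 of 0.0 rather than being skipped.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: the model never predicts "maybe".
gold = ["yes", "no", "maybe", "yes"]
pred = ["yes", "no", "yes", "yes"]

print(f"Accuracy: {100 * accuracy(gold, pred):.1f}%")   # Accuracy: 75.0%
print(f"Macro-F1: {100 * macro_f1(gold, pred):.1f}%")   # Macro-F1: 60.0%
```

In the toy example the model gets three of four answers right but never predicts "maybe", so macro-F1 falls well below accuracy. The same pattern can arise whenever a model neglects the rarer classes, which is one reason to report both metrics.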