About
The task of PubMedQA is to answer biomedical research questions with yes, no, or maybe (e.g., "Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?") using the corresponding abstracts.
For more details about PubMedQA, please refer to this paper:
Dataset
PubMedQA has 1k expert-labeled, 61.2k unlabeled, and 211.3k artificially generated QA instances.
Please visit our GitHub repository to download the dataset:
Submission
To submit your model, please follow the instructions in the GitHub repository.
Citation
If you use PubMedQA in your research, please cite our paper:
@inproceedings{jin2019pubmedqa,
  title={PubMedQA: A Dataset for Biomedical Research Question Answering},
  author={Jin, Qiao and Dhingra, Bhuwan and Liu, Zhengping and Cohen, William and Lu, Xinghua},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={2567--2577},
  year={2019}
}
    
Leaderboard (reasoning-required setting)
    
| Rank, date | Model, institution (reference) | Size | Accuracy (%) | Macro-F1 (%) | |
|---|---|---|---|---|---|
| 1 Nov 28, 2023 | GPT-4 (Medprompt) Microsoft (Nori et al. 2023) | NA | 82.0 | NA | |
| 2 May 16, 2023 | Med-PaLM 2 Google Research & DeepMind (Singhal et al. 2023) | NA | 81.8 | NA | |
| 3 Nov 27, 2023 | MEDITRON EPFL (Chen et al. 2023) | 70B | 81.6 | NA | |
| 4 Jul 6, 2023 | Palmyra-Med Writer Inc. (Kamble et al. 2023) | 40B | 81.1 | NA | |
| 5 Dec 1, 2023 | AntGLM-Med Ant Group (Li et al. 2023) | 10B | 80.6 | NA | |
| 6 Apr 12, 2023 | GPT-4-base Microsoft & OpenAI (Nori et al. 2023) | NA | 80.4 | NA | |
| 7 Mar 4, 2024 | Claude 3 Anthropic (Anthropic, 2024) | NA | 79.7 | NA | |
| 8 Jan 11, 2023 | GPT-3.5 + Z-Code++ Microsoft Azure AI (He et al. 2022) | 175B | 79.6 | 55.8 | |
| 9 Dec 26, 2022 | Flan-PaLM (3-shot) Google Research & DeepMind (Singhal et al. 2022) | 540B | 79.0 | NA | |
| 10 Mar 14, 2024 | HEAL DeepScribe Inc (Yuan et al. 2024) | 13B | 78.4 | NA | |
| 11 Dec 20, 2022 | Codex (5-shot) Technical University of Denmark & Copenhagen University Hospital (Liévin et al. 2022) | 175B | 78.2 | NA | |
| 12 Sep 13, 2019 | Human Performance University of Pittsburgh & Carnegie Mellon University (Jin et al. 2019) | NA | 78.0 | 72.2 | |
| 13 Nov 16, 2022 | Galactica Meta AI (Taylor et al. 2022) | 120B | 77.6 | NA | |
| 14 May 22, 2023 | GatorTronGPT University of Florida & NVIDIA (Peng et al. 2023) | 20B | 77.6 | NA | |
| 15 Mar 4, 2024 | MediSwift-XL Cerebras Systems (Thangarasa et al. 2024) | 1.2B | 76.8 | NA | |
| 16 Mar 20, 2023 | GPT-4 Microsoft & OpenAI (Nori et al. 2023) | NA | 75.2 | NA | |
| 17 Apr 18, 2024 | Reka Core Reka (Ormazabal et al. 2024) | NA | 74.6 | NA | |
| 18 Dec 15, 2022 | PubMedGPT Stanford University (Bolton et al. 2022) | 2.7B | 74.4 | NA | |
| 19 Oct 17, 2022 | DRAGON Stanford University & EPFL (Yasunaga et al. 2022) | 360M | 73.4 | NA | |
| 20 Apr 27, 2023 | PMC-LLaMA Shanghai Jiao Tong University (Wu et al. 2023) | 7B | 73.4 | NA | |
| 21 Mar 29, 2022 | BioLinkBERT (large) Stanford University (Yasunaga et al. 2022) | 340M | 72.2 | NA | |
| 22 Mar 29, 2022 | BioLinkBERT (base) Stanford University (Yasunaga et al. 2022) | 110M | 70.2 | NA | |
| 23 Sep 13, 2019 | BioBERT (multi-phase tuning) University of Pittsburgh & Carnegie Mellon University (Jin et al. 2019) | 110M | 68.1 | 52.7 | |
| 24 June, 2021 | BioELECTRA SAAMA AI Research Lab (Kanakarajan et al. 2021) | 110M | 64.0 | NA | |
| 25 July 31, 2020 | PubMedBERT Microsoft Research (Gu et al. 2020) | 110M | 55.8 | NA |
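
The leaderboard reports accuracy and macro-F1 over the three answer classes (yes, no, maybe). As a rough illustration of how these two metrics relate, the following minimal sketch computes both from lists of gold and predicted labels. The function names and the convention of scoring a class with no gold or predicted instances as 0 are assumptions made for this example; the evaluation script in the GitHub repository may differ and should be treated as the reference.

```python
# Illustrative sketch only: names and edge-case conventions are assumptions,
# not the official PubMedQA evaluation code.
LABELS = ("yes", "no", "maybe")


def accuracy(gold, pred):
    """Fraction of instances whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # Assumption: a class with no gold and no predicted instances scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["yes", "yes", "no", "maybe"]
    pred = ["yes", "no", "no", "yes"]
    print(f"accuracy = {accuracy(gold, pred):.3f}")
    print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
```

Because macro-F1 weights the three classes equally, a model that rarely predicts "maybe" can score well on accuracy while scoring much lower on macro-F1, which is consistent with the gap between the two columns in the rows that report both.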