Dataset Card for PathVQA

Dataset Description

PathVQA is a dataset of question-answer pairs on pathology images, intended for training and testing
Medical Visual Question Answering (VQA) systems. It includes both open-ended questions and binary "yes/no" questions.
The dataset is built from two publicly available pathology textbooks, "Textbook of Pathology" and "Basic Pathology", and a
publicly available digital library, the "Pathology Education Informational Resource" (PEIR). The copyrights of the images and
captions belong to the publishers and authors of these two books and to the owners of the PEIR digital library.

Repository: PathVQA Official GitHub Repository

Paper: PathVQA: 30000+ Questions for Medical Visual Question Answering

Leaderboard: Papers with Code Leaderboard

Dataset Summary

The dataset was obtained from the updated Google Drive link shared by the authors on Feb 15, 2023
(see the commit in the GitHub repository). This version of the dataset contains a total of 5,004 images and 32,795 question-answer pairs.
Out of the 5,004 images, 4,289 images are referenced by a question-answer pair, while 715 images are not used.
A few image-question-answer triplets occur more than once within the same split (training, validation or test).
After dropping the duplicate image-question-answer triplets, the dataset contains 32,632 question-answer pairs on 4,289 images.
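
The deduplication step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the dataset: the `image_id` field is a hypothetical stand-in for whatever identifies an image (for example a file name or a hash of the image bytes), and the toy rows are invented for the example. The function is meant to be applied to one split at a time, since the duplicates are described as occurring within the same split.

```python
from typing import Dict, List


def drop_duplicate_triplets(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each (image, question, answer) triplet."""
    seen = set()
    unique = []
    for row in rows:
        # 'image_id' is a hypothetical identifier for the image (assumption).
        key = (row["image_id"], row["question"], row["answer"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Toy split (invented data): the second row repeats the first one exactly.
train = [
    {"image_id": "img_001", "question": "is this liver?", "answer": "yes"},
    {"image_id": "img_001", "question": "is this liver?", "answer": "yes"},
    {"image_id": "img_001", "question": "what is shown?", "answer": "liver"},
    {"image_id": "img_002", "question": "is this liver?", "answer": "no"},
]

deduped_train = drop_duplicate_triplets(train)
print(len(train), "->", len(deduped_train))  # 4 -> 3
```

Rows that share an image and a question but differ in the answer, or that share a question across different images, are kept, because only exact triplet repeats count as duplicates here.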

Supported Tasks and Leaderboards

The PathVQA dataset has an active leaderboard on Papers with Code
where models are ranked based on three metrics: "Yes/No Accuracy", "Free-form accuracy" and "Overall accuracy". "Yes/No Accuracy" is
the accuracy of a model's generated answers for the subset of binary "yes/no" questions. "Free-form accuracy" is the accuracy
of a model's generated answers for the subset of open-ended questions. "Overall accuracy" is the accuracy of a model's generated
answers across all questions.
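
As an illustration of how the three metrics relate, the sketch below computes them from a list of predictions and reference answers. It rests on two assumptions that the leaderboard description does not confirm: an answer counts as correct when it matches the reference exactly after lowercasing and whitespace normalization, and a question is treated as binary when its reference answer is "yes" or "no". Individual leaderboard entries may use different matching rules. The function name `pathvqa_accuracies` is hypothetical.

```python
from typing import Dict, List, Optional


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed matching rule)."""
    return " ".join(text.lower().split())


def _mean(flags: List[bool]) -> Optional[float]:
    """Fraction of True values, or None when the subset is empty."""
    return sum(flags) / len(flags) if flags else None


def pathvqa_accuracies(
    predictions: List[str], references: List[str]
) -> Dict[str, Optional[float]]:
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    yes_no: List[bool] = []
    free_form: List[bool] = []
    for pred, ref in zip(predictions, references):
        correct = _normalize(pred) == _normalize(ref)
        # Assumption: binary questions are those whose reference is yes/no.
        if _normalize(ref) in {"yes", "no"}:
            yes_no.append(correct)
        else:
            free_form.append(correct)

    return {
        "yes_no": _mean(yes_no),
        "free_form": _mean(free_form),
        "overall": _mean(yes_no + free_form),
    }


# Invented predictions for four questions: two binary, two open-ended.
references = ["yes", "no", "in the canals of hering", "liver"]
predictions = ["yes", "no", "In the canals of Hering", "kidney"]

scores = pathvqa_accuracies(predictions, references)
print(scores)  # {'yes_no': 1.0, 'free_form': 0.5, 'overall': 0.75}
```

Overall accuracy is computed over all questions pooled together, so it is weighted by the size of each subset and is not the plain average of the two subset accuracies unless the subsets are equally large.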

Languages

The question-answer pairs are in English.

Dataset Structure

Data Instances

Each instance consists of an image-question-answer triplet.

{
  'image': <PIL.JpegImagePlugin.JpegImageFile image mode=CMYK size=309x272>,
  'question': 'where are liver stem cells (oval cells) located?',
  'answer': 'in the canals of hering'
}

Data Fields

  • 'image': the image referenced by the question-answer pair.
  • 'question': the question about the image.
  • 'answer': the expected answer.

Data Splits

The dataset is split into training, validation and test sets. The split is provided directly by the authors.

         Training Set   Validation Set   Test Set
QAs      19,654         6,259            6,719
Images   2,599          832              858
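
As a quick consistency check, the split counts above can be summed and compared with the totals given in the Dataset Summary. The snippet only re-uses numbers stated in this card. Summing the image counts assumes that no image is shared between splits, which the matching total of 4,289 is consistent with but does not prove.

```python
# Per-split counts as reported in the Data Splits section.
splits = {
    "train": {"qas": 19_654, "images": 2_599},
    "validation": {"qas": 6_259, "images": 832},
    "test": {"qas": 6_719, "images": 858},
}

# Totals before deduplication, as reported in the Dataset Summary.
raw_qas = 32_795
raw_images = 5_004

total_qas = sum(split["qas"] for split in splits.values())
total_images = sum(split["images"] for split in splits.values())

print(total_qas)                  # 32632 question-answer pairs after deduplication
print(total_images)               # 4289 images referenced by at least one pair
print(raw_qas - total_qas)        # 163 duplicate triplets dropped
print(raw_images - total_images)  # 715 images not used by any pair
```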

Additional Information

Licensing Information

The authors have released the dataset under the MIT License.

Citation Information

@article{he2020pathvqa,
    title={PathVQA: 30000+ Questions for Medical Visual Question Answering},
    author={He, Xuehai and Zhang, Yichen and Mou, Luntian and Xing, Eric and Xie, Pengtao},
    journal={arXiv preprint arXiv:2003.10286},
    year={2020}
}