Is there a large medical database with symptom frequencies for diseases? I.e. the percentage of patients with a specific disease that have specific symptoms?

For example (numbers and symptoms made up by me):

Barrett's esophagus:

  • longstanding heartburn 81.3%
  • dysphagia 72.5%
  • hematemesis 38.1%
  • unintentional weight loss 20.9%


  • fever 51.2%
  • fatigue 17.1%
  • dry cough 66.0%
  • sneezing 26.2%
  • malaise 8.5%

The answer should not be a handful of research papers that each report symptom frequencies for two or three diseases, but a proper database covering hundreds, if not thousands, of diseases with symptom frequencies.

  • Big Data Medicine is a promising field with much profit to be made. To my knowledge there are no publicly available data sets, but IIRC both Google and MIT have their own non-public databases.
    – Narusan
    Commented Jun 8, 2020 at 8:45
  • Maybe an analysis of electronic health records (not scientific papers) would be interesting, but I couldn't find one.
    – lordy
    Commented Jun 8, 2020 at 8:48

2 Answers


A paper in Nature Communications provides such a database:

Human symptoms–disease network

XueZhong Zhou, Jörg Menche, Albert-László Barabási & Amitabh Sharma

Published: 26 June 2014

They count the number of PubMed articles in which a disease keyword and a symptom keyword both appear, and assign a TF-IDF value to each pair.

In their words:

Acquisition of symptom and disease relationships

The association between symptoms and diseases were then quantified using term co-occurrence (number of PubMed identifiers in which two terms appear together;
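To make the weighting idea concrete, here is a minimal sketch of turning co-occurrence counts into TF-IDF-style weights. The exact formula used in the paper is not quoted in this answer, so the common variant `count * log(N / n_symptom)` is an assumption, and the disease names and counts below are invented for illustration only.

```python
import math
from collections import defaultdict

def tfidf_weights(cooccurrence):
    """Weight (symptom, disease) pairs by co-occurrence count times IDF.

    cooccurrence maps (symptom, disease) -> number of articles in which
    both terms appear. The formula is an assumed, common TF-IDF variant,
    not necessarily the one used in the paper.
    """
    # Collect, for each symptom, the set of diseases it co-occurs with.
    diseases_per_symptom = defaultdict(set)
    all_diseases = set()
    for (symptom, disease), count in cooccurrence.items():
        if count > 0:
            diseases_per_symptom[symptom].add(disease)
            all_diseases.add(disease)

    n_diseases = len(all_diseases)
    weights = {}
    for (symptom, disease), count in cooccurrence.items():
        if count > 0:
            # Symptoms linked to many diseases get a low IDF (less specific).
            idf = math.log(n_diseases / len(diseases_per_symptom[symptom]))
            weights[(symptom, disease)] = count * idf
    return weights

# Invented counts, for illustration only.
counts = {
    ("fever", "disease_A"): 10,
    ("fever", "disease_B"): 40,
    ("dysphagia", "disease_A"): 25,
}
weights = tfidf_weights(counts)
```

In this toy input, "fever" co-occurs with every disease, so its weight drops to zero, while "dysphagia" keeps a positive weight because it is specific to one disease. Note that these weights measure how often terms are mentioned together in the literature, not the share of patients who show a symptom.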

Link to the database:


(Supplementary Data 3 from the article)

  • This is good (I up-voted the answer) but not exactly what I am looking for. I am looking for data counted directly by MDs, as PubMed articles might be quite biased ...
    – lordy
    Commented Jun 8, 2020 at 8:31

What you're looking for is the kind of data used by AI systems or Bayesian networks for online diagnosis. These databases are commercially sensitive and are likely accessible only through an API rather than as raw data. For example, ISABEL has been around for 20 years.
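As a rough illustration of why symptom frequencies matter for Bayesian diagnosis, here is a minimal naive Bayes sketch. It is not a description of how ISABEL or any commercial product works. The disease names, priors, frequencies, and the fallback value for unlisted symptoms are all hypothetical, and the independence of symptoms given the disease is a simplifying assumption.

```python
def posterior(priors, symptom_freq, observed, default_freq=0.01):
    """Naive Bayes posterior over diseases given observed symptoms.

    priors:       disease -> P(disease)
    symptom_freq: disease -> {symptom: P(symptom | disease)}
    observed:     symptom -> True (present) or False (absent)
    default_freq: assumed frequency for symptoms not listed for a disease
    """
    scores = {}
    for disease, prior in priors.items():
        p = prior
        for symptom, present in observed.items():
            freq = symptom_freq[disease].get(symptom, default_freq)
            # Multiply in P(symptom | disease) or its complement.
            p *= freq if present else (1.0 - freq)
        scores[disease] = p
    # Normalize so the posteriors sum to 1.
    total = sum(scores.values())
    return {disease: score / total for disease, score in scores.items()}

# Hypothetical numbers, for illustration only.
priors = {"disease_A": 0.5, "disease_B": 0.5}
symptom_freq = {
    "disease_A": {"fever": 0.8},
    "disease_B": {"fever": 0.2},
}
result = posterior(priors, symptom_freq, {"fever": True})
```

With equal priors, observing fever shifts the posterior toward the disease in which fever is more frequent, which is why per-disease symptom percentages are the key input for this kind of system.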

