As I live in Germany, I am trying to follow the local Covid-19 situation with attention.

The other day I saw a source I was not following (Financial Times) being shared on Twitter. I noticed that it was reporting significantly different numbers from the source I am following (Die Zeit), hence this question.

In my search for an answer I came across the plots from Ourworldindata that I am adding to this question for comparison.

Source 1: Financial Times

Germany normalized new cases according to FT

One can extract the following data points from this plot:

  • 190.9 new cases per 1,000,000 people on average for the week ending on November 2nd
  • 165.2 on November 3rd (a possibly incomplete number)

The Financial Times indicates the following as their source of the data:

Financial Times analysis of data from the European Centre for Disease Prevention and Control, the Covid Tracking Project, the UK Government coronavirus dashboard and the Spanish Ministry of Health. Data updated November 4 2020 11.58am GMT. Interactive version: ft.com/covid19

Source 2: Ourworldindata

Germany normalized new cases according to Ourworldindata

One can extract the following data points from this plot:

  • 198.76 new cases per 1,000,000 people on average for the week ending on November 5th
  • 189.44 on November 3rd
  • 182.72 on November 2nd

The source is cited as:

Data published by European Centre for Disease Prevention and Control (ECDC) at https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide

The ECDC in turn gives the following source for Germany's data: https://experience.arcgis.com/experience/478220a4c454480e823b17327b2bf1d4 This page is titled:

Robert Koch-Institut: COVID-19-Dashboard
Auswertungen basierend auf den aus den Gesundheitsämtern gemäß IfSG übermittelten Meldedaten


Evaluations based on the reporting data transmitted from the health authorities in accordance with IfSG

Source 3: Die Zeit

Germany normalized new cases according to Die Zeit

One can extract the following data points from the website:

  • 140.8 new cases per 100,000 people on average for the week ending on November 5th
  • 136.4 on November 3rd
  • 132.7 on November 2nd

The source is cited as (translation in parentheses):

Quelle: Kreis- und Landesbehörden, RKI, eigene Berechnungen. (Source: District and state authorities, RKI, own calculations.)

And additional notes are provided for "bestätigte Neuinfektionen in den letzten 7 Tagen" (confirmed new infections in the last 7 days) and "bestätigte Fälle seit Beginn der Pandemie" (confirmed cases since the pandemic began):

Die Anzahl der positiven Tests, die in den vergangenen sieben Tagen gemeldet wurden. Weitere Informationen unter der Infoschaltfläche »Bestätigte Fälle«.

Bei so vielen Menschen fiel ein Test auf das Virus Sars-CoV-2 positiv aus. Wer sich angesteckt hat, aber nicht getestet wurde, wird nicht gezählt. Da nicht jeder überhaupt Symptome verspürt und nicht jeder Verdachtsfall getestet wird, liegt die Dunkelziffer wahrscheinlich höher.


The number of positive tests reported in the past seven days. Further information under the information button »Confirmed cases«.

The number of people tested positive for the Sars-CoV-2 virus. Those who are infected but not tested are not counted. Since not everyone feels symptoms at all and not every suspected case is tested, the number of unreported cases is probably higher.

Now my question:

How is it possible that the same data (from RKI) leads to wildly different normalized numbers? Why do the FT and Ourworldindata report ~190 cases per million per week, while Die Zeit reports ~140 per 100,000 per week? Isn't that almost a factor of 10?

Even accounting for delays in updating the charts and/or interpolations/extrapolations, I don't see how that is a reasonable difference.

    You have to be careful, there are different kinds of normalized numbers used for this. The one used in German media and politics is generally the total number of cases in a 7-day timeframe per 100,000 people. But the number of cases per day, averaged over 7 days, per 1,000,000 people is also sometimes used.
– Mad Scientist
  I see you're right. I did not notice the "daily" in the ourworldindata chart. would you mind answering so I can accept?
– Federico
As mentioned in a comment by Mad Scientist:

Ourworldindata reports around 200 cases per 1 million inhabitants per day, where this number is the average over the last 7 days. Die Zeit reports around 140 cases per 100,000 inhabitants in total over the last 7 days. After rounding, these two incidence rates match.

    cases per day:
    200 cases per 1 million inhabitants = 200/1000000 = 0.0002
    140 cases per 100000 inhabitants, divided by 7 days = (140/100000)/7 = 0.0002
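The same unit conversion can be sketched in a few lines of Python, as a rough check rather than anything the sources themselves publish. The function names are made up for illustration; the input figures are the ones quoted in the question for Die Zeit (140.8) and Ourworldindata (198.76).

```python
def weekly_per_100k_to_daily_per_million(weekly_per_100k: float) -> float:
    """Convert a 7-day total per 100,000 people into a daily average per 1,000,000."""
    # x10 rescales the population base from 100,000 to 1,000,000;
    # /7 turns a seven-day total into a per-day average.
    return weekly_per_100k * 10 / 7


def daily_per_million_to_weekly_per_100k(daily_per_million: float) -> float:
    """Inverse conversion: daily average per 1,000,000 into a 7-day total per 100,000."""
    return daily_per_million * 7 / 10


# Figures quoted in the question (hypothetical variable names).
zeit_as_daily = weekly_per_100k_to_daily_per_million(140.8)
owid_as_weekly = daily_per_million_to_weekly_per_100k(198.76)

print(round(zeit_as_daily, 1))   # 201.1
print(round(owid_as_weekly, 1))  # 139.1
```

Expressed in the same unit, the two sources differ by about 1%, not by a factor of 10.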

Question: What is the real amount of new Covid-19 cases per week per 100,000 people in Germany?

Sad but true: nobody knows.

The best numbers available only tell you 'number of registered positive tests'.
These are not identical to 'number of cases', and especially not to 'real number of cases' or to 'real number of active cases'.

Nobody in the world can give you an even ballpark-correct figure for 'currently active real cases in Germany'. Even the best numbers from the official source which are used to inform actual policy in the country are unreliable, and unreliable to an unknown extent.

The official source with the best numbers — under the above caveat — is the Robert-Koch-Institute, which is tasked with monitoring the situation, counting the numbers, and publishing them in a timely manner.

Those numbers are of a bad quality in general, but all other sources depend on those. No newspaper can magically produce better stats.

The RKI's official German-language site is here: Aktueller Lage-/Situationsbericht des RKI zu COVID-19. The English version usually lags by a few days, is now published only weekly, and omits the footnotes that state the necessary restrictions on how far the low-quality numbers can be trusted.

Given that all data originates at the RKI (which pushes their data up the chain to the ECDC tracking project), and recognizing the date for this question (November 5, 2020), the RKI published at the time the following numbers:

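7-Tage-Inzidenz 126,8 Fälle/ 100.000 EW (7-day incidence: 126.8 cases per 100,000 inhabitants) (src, Nov 5, 2020)

7-Tage-Inzidenz 128,7 Fälle/ 100.000 EW (7-day incidence: 128.7 cases per 100,000 inhabitants) (src, Nov 6, 2020)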

Which in both cases was a good bit lower than either second-hand news outlet reported.

A non-daily but retrospective analysis of the data for the 45th calendar week then shows a very different picture:

[Figure from the CODAG report cited below: cases over 7 days per 100,000 inhabitants]

— Marc Schneble, Arne Bathke, Göran Kauermann: "Deutschland im Vergleich zu Österreich — zwei vergleichbare Länder und doch ein ganz unterschiedliches Infektionsgeschehen in Bezug auf COVID-19", CODAG Bericht Nr 7, LMU München, 21.01.2021.

Notice the black line for total cases over 7 days per 100,000 inhabitants hitting exactly 150 if 'sorted by day of reporting' for the week in question.

That's more than the news outlets reported. Cases are counted by 'day of reporting' or 'day of symptom onset', and cases are added and subtracted from the data set, a practice called 'corrections', weeks after the daily charts go out. The daily situation reports as displayed in any charts in newspapers are even more unreliable.

Using the data as published at the time from the situation reports for new infections per 1 Million people:

day     daily changes over 7 days   daily changes absolute over 7 days
1.11.   +3062                       14,177
2.11.   +4561                       12,097
3.11.   +3400                       15,352
4.11.   +1333                       17,214
5.11.   +836                        19,990
6.11.   +1588                       21,506
7.11.   +3844                       23,399

-> 1487.37 per million over 7 days -> 212.5 per day on average
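The arithmetic behind that last line can be reproduced with a short sketch. Two assumptions are mine and not stated in the answer: the last column holds the absolute number of new cases per day, and Germany's population is taken as roughly 83.19 million, a value chosen because it reproduces the 1487.37 figure.

```python
# Absolute daily new cases for 1.11. to 7.11.2020, taken from the table above.
daily_new_cases = [14177, 12097, 15352, 17214, 19990, 21506, 23399]

# Assumption: approximate population of Germany; not given in the answer.
POPULATION = 83_190_000

weekly_total = sum(daily_new_cases)
weekly_per_million = weekly_total / POPULATION * 1_000_000
daily_avg_per_million = weekly_per_million / 7
weekly_per_100k = weekly_per_million / 10

print(weekly_total)                     # 123735
print(round(weekly_per_million, 1))     # 1487.4
print(round(daily_avg_per_million, 1))  # 212.5
print(round(weekly_per_100k, 1))        # 148.7
```

Under these assumptions the week works out to about 149 cases per 100,000 over 7 days, which is the unit used by Die Zeit and the RKI.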

Is that an accurate calculation?

In another accumulated version of the RKI data, the RKI itself publishes the following:

Total number of cases over all age-groups, November: 6434 December: 22446
7-day incidence per 100000, all age-groups, November: 7,74 December: 26,99

COVID-19-Fälle nach Altersgruppe und Meldewoche (COVID-19 cases by age group and reporting week; table updated every Tuesday) (as of April 13, 2021)

This calculation is indeed wildly different.

As of November 2020, which data set is correct? Which displays the dynamism of infections more adequately? Or which one has the better impact (for whatever noble or sinister motives in communication goals)?

The undetected and unreported cases cloud any of those numbers into an opaque mush of illusory precision anyway. The more testing is performed, the more cases are generated; and as any 'undetected case' of an asymptomatic and non-infectious carrier of coronavirus nucleic acid becomes more likely to end up in those datasets as a 'case', any incidence number changes its meaning. A '7-day incidence' of '50' from spring 2020 is in very few ways comparable to the same number '50' in spring 2021 in terms of real-world meaning.

Thus drawing a line into a chart that treats all those generated numbers as absolutely comparable, on the same scale, with having the same meaning implied, is bordering on scientific fraud and misrepresentation.

Because of this big 'but', none of the sources presented so far really knows that number, and none of them can know it.

That's because the central agency to collect and publish these numbers counts in a peculiar way.

The Robert-Koch-Institute (RKI) says:

wertet das RKI alle labordiagnostischen Nachweise von SARS-CoV-2 unabhängig vom Vorhandensein oder der Ausprägung der klinischen Symptomatik als COVID-19-Fälle.

the RKI evaluates all laboratory diagnostic evidence of SARS-CoV-2 as COVID-19 cases, regardless of the presence or severity of clinical symptoms.
RKI Corona Dashboard

This is of course in stark contrast to any medical definition of "a case". A case is usually one person experiencing symptoms of a disease. (Compare WHO definitions for Influenza-Like-Illnesses).

While the RKI asserts that it behaves in a manner consistent with WHO guidelines, this appears not to be true. But then the WHO issues conflicting definitions as well.

On the one hand, the WHO currently says it would count 'a confirmed case' as

Definition WHO-1 "confirmed case":

  • A) A person with a positive Nucleic Acid Amplification Test (NAAT)
  • B) A person with a positive SARS-CoV-2 Antigen-RDT AND meeting either the probable case definition or suspect criteria A OR B
  • C) An asymptomatic person with a positive SARS-CoV-2 Antigen-RDT who is a contact of a probable or confirmed case

But the above is of course inherently nonsensical in part, as it ignores all other clinical guidelines for diagnosing infectious respiratory illnesses. A diagnostic test cannot be in itself 'a diagnosis' and mere contact of an asymptomatic person is a terrible way to confirm anything. The problem of inherent false positives and operational false positives is completely ignored with this counting method.

One lab covering most of its region was recently analysed for its data quality, and for how well its 'cases' corresponded to people 'currently infected' and 'currently infectious' over weeks 10–49 of 2020:

We analysed real-world data from a large laboratory in the city of Münster (population 313,000), Germany, derived from a single fully automated high throughput RT-PCR platform (cobas SARS-CoV-2 RT-PCR system, Roche Diagnostics) utilizing the same two gene targets for the entire study period (weeks 10-49, 2020). This laboratory performed about 80% of all SARS-CoV-2 RT-PCR tests in the Münster region during this time. We explored changes in the percentage of positive RT-PCR tests (positive rate) over time. In addition, we assessed the influence of covariates such as age, sex, calendar time, and symptoms at the time of first RT-PCR test on the distribution of cycle threshold (Ct) values. […]

RT-PCR tests that had not crossed the positivity threshold after the 40th cycle were reported as “negative”.

The Ct value is inversely proportional to the initial amount of target nucleic acid and is thus a relative indicator of the concentration of viral particles in the clinical specimen. An increase in Ct value of three points indicates that the initial amount of viral particles was smaller by a factor of about ten. […]

We categorized our population-based Ct values according to the recommendations of the UK Office for National Statistics (ONS) COVID-19 household survey as < 25 and ≥ 25. Since there has been some discussion regarding this Ct-threshold, we performed a second categorization using a cutoff of < 30 versus ≥ 30. For a small subset of 58 people, sufficient clinical information was available to allow classification as symptomatic or asymptomatic. […]

Only 40.6% of positive tests showed Ct values below the threshold of 25, indicating a likelihood of the person being infectious. […]

In light of our findings that more than half of individuals with positive PCR test results are unlikely to have been infectious, RT-PCR test positivity should not be taken as an accurate measure of infectious SARS-CoV-2 incidence.

— Andreas Stang et al.: "The performance of the SARS-CoV-2 RT-PCR test as a tool for detecting SARS-CoV-2 infection in the population", Journal of Infection, May 31, 2021. doi:https://doi.org/10.1016/j.jinf.2021.05.022
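To make the relationship between Ct values and viral load in the quoted passage concrete, here is a minimal sketch. It assumes an idealised PCR in which the target doubles every cycle (an amplification factor of 2.0, which is my assumption and not a figure from the paper); real assays are somewhat less efficient. The function names are hypothetical.

```python
import math


def fold_difference(delta_ct: float, amplification_per_cycle: float = 2.0) -> float:
    """How many times less starting material a sample had, given a Ct that is delta_ct higher."""
    return amplification_per_cycle ** delta_ct


def cycles_for_fold(fold: float, amplification_per_cycle: float = 2.0) -> float:
    """How many extra cycles correspond to a given fold difference in starting material."""
    return math.log(fold) / math.log(amplification_per_cycle)


print(fold_difference(3))              # 8.0
print(round(cycles_for_fold(10), 2))   # 3.32
print(round(cycles_for_fold(100), 2))  # 6.64
```

With ideal doubling, three extra cycles mean an 8-fold difference, which is consistent with the paper's "factor of about ten". A 100-fold difference corresponds to roughly six to seven cycles.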

As the question here asks explicitly for "real numbers" and "the real amount", it is looking for "real cases"; a false positive is by definition not a real case.

The WHO itself has members working for them who know that perfectly well.

The false positive rate increases

  • when the inherent quality of the test employed goes down (all tests are treated as the same despite obvious differences, and which tests are used remains undisclosed in aggregated data),
  • when mistakes, cross-contamination, mishandling etc. occur,
  • when old fragments are amplified, signaling a 'long cold case', or when true nucleic acid fragments are present from a virion that was inhaled 'already dead' (cf this article & its sources),
  • when some tests are simply run long enough (cycle threshold too high),
  • when the prevalence of the disease is low (positive predictive value in relation to specificity and sensitivity (src1, src2); we also have Skeptics answers addressing the extremely high risk of false positives).

As such, a competing information schema was released in January:

WHO guidance Diagnostic testing for SARS-CoV-2 states that careful interpretation of weak positive results is needed (1). The cycle threshold (Ct) needed to detect virus is inversely proportional to the patient’s viral load. Where test results do not correspond with the clinical presentation, a new specimen should be taken and retested using the same or different NAT technology.

WHO reminds IVD users that disease prevalence alters the predictive value of test results; as disease prevalence decreases, the risk of false positive increases (2). This means that the probability that a person who has a positive result (SARS-CoV-2 detected) is truly infected with SARS-CoV-2 decreases as prevalence decreases, irrespective of the claimed specificity.

Most PCR assays are indicated as an aid for diagnosis, therefore, health care providers must consider any result in combination with timing of sampling, specimen type, assay specifics, clinical observations, patient history, confirmed status of any contacts, and epidemiological information.

WHO Information Notice for IVD Users 2020/05

This is the saner part of the WHO hammering home that a test is "a diagnostic aid", and must be interpreted in context to arrive at a clinical diagnosis, that is 'a case'.
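The WHO's point about prevalence quoted above follows from Bayes' rule and can be illustrated with a short sketch. The sensitivity (0.95) and specificity (0.99) used here are purely illustrative assumptions, not measured properties of any SARS-CoV-2 test, and the function name is hypothetical.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability that a person with a positive test result is truly infected."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Illustrative values only: the same hypothetical test at falling prevalence.
for prevalence in (0.10, 0.01, 0.001):
    ppv = positive_predictive_value(prevalence, sensitivity=0.95, specificity=0.99)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

With the test characteristics held fixed, the share of positive results that are true positives falls as prevalence falls, which is the effect the WHO notice describes.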

This practice is definitely not followed in Germany. The false positive rate is completely ignored at all levels; the prevalence of the disease is unknown, as basically no representative samples are being collected; the methods of testing have changed a lot; and finally, the methods of counting are just terrible.

As the RKI explains in their official publications (Situation report April 13, 2021, PDF):

Active cases are the number of cases transmitted minus deaths and the estimated number of recoveries.

This status report presents data on laboratory-confirmed COVID-19 cases submitted to the RKI.

And as late as March 2021, it contained the following disclaimer:

Bis einschließlich KW 10/2021 haben sich 259 Labore für die RKI-Testlaborabfrage oder in einem der anderen oben aufgeführten Netzwerke registriert und berichten nach Aufruf überwiegend wöchentlich. Da Labore in der RKI-Testzahlerfassung die Tests der vergangenen Kalenderwochen nachmelden bzw. korrigieren können, ist es möglich, dass sich die ermittelten Zahlen nachträglich ändern. Es ist zu beachten, dass die Zahl der Tests nicht mit der Zahl der getesteten Personen gleichzusetzen ist, da z. B. in den Angaben Mehrfachtestungen von Patienten enthalten sein können (Tabelle 5).

Up to and including week 10/2021, 259 laboratories have registered for the RKI test laboratory query or in one of the other networks listed above and report predominantly on a weekly basis following the call. Since laboratories can subsequently report or correct tests from previous calendar weeks in the RKI test count survey, it is possible that the numbers determined may change retrospectively. It should be noted that the number of tests is not to be equated with the number of persons tested, since, for example, multiple testing of patients may be included in the data (Table 5).

Of course contrary to many so-called 'fact-checkers' on the net, the RKI does include multiple positive tests on one person as multiple cases. How many? They don't say.

So we have to assume that there are real cases in Germany that go undetected and uncounted: people who have the coronavirus and get sick from it, and are thus 'a case', but who never show up at a doctor's office or a test center because the illness is usually mild. Add to those the number of tested people who are false negatives.

On the other hand the data we are presented with as 'situational' or 'actual' are outdated and premature, full of false positives unaccounted for, from unsupervised labs, employing all sorts of unstandardized tests of varying quality, for the test themselves, in reagents, cycle-numbers etc, with shifting testing strategies, unclear prevalence, a lot of guesswork and wobbly 'case definitions' and just a big mess.

A test alone cannot determine either acute infection or infectiousness. A test alone cannot determine 'a case'. That's what is stated in plain language on all leaflets accompanying such tests.

A few of these problems are even repeatedly emphasized by the two men who developed the first PCR test recognized and endorsed by the WHO, one of them even being bold enough to estimate that around 50% of 'counted positive' cases have been irrelevant:

First Christian Drosten:

Yes, but the method is so sensitive that it can detect a single hereditary molecule of this virus. If, for example, such a pathogen flits across a nurse's nasal mucosa for a day without her falling ill or noticing anything else, then she is suddenly a […] case. Where previously very ill people were reported, now mild cases and people who are actually perfectly healthy are suddenly included in the reporting statistics. This could also explain the explosion in the number of cases […]

It would be very helpful if the authorities […] would go back to following the previous definitions of the disease. Because what is of interest first are the real cases. Whether symptomless or mildly infected […] are really virus carriers is, in my opinion, questionable. Even more questionable is whether they can pass the virus on to others. [They] should make a stronger distinction between medically necessary diagnostics and scientific interest.

Our body is constantly attacked by viruses and bacteria. However, they often fail at barriers such as the skin or the mucous membranes in the nose and throat. There, they are successfully warded off before they can do any harm.

And the producer of that test, Olfert Landt:

Olfert Landt is the managing director of the Berlin-based company TIB Molbiol, which produces the tests. Currently, up to two million units per week, as the company boss confirmed to the Nordkurier.

In short, it is undisputed that a positive PCR test does not initially provide any information about whether someone will contract Covid-19 and experience the corresponding symptoms. For weeks, however, critics of the measures in the Corona debate have been urgently pointing out that the PCR method — to put it simply — is so sensitive that it detects even the smallest traces of the SARS-CoV-2 pathogen and can therefore say nothing about the real risk of infection posed by those tested without symptoms.

Half of those testing positive [are] "not infectious".

Horror numbers, case explosions and not least on this data basis decided measures such as the current Lockdown are thus based on a gigantic mass of positive test results, which represents a completely unrealistic danger scenario — so the conclusion, which critical scientists and transverse thinkers already draw and spread for months from it. Naturally under harsh criticism of "serious" scientists.

The fact that the manufacturer of the PCR tests and scientific companion of chief virologist Christian Drosten, of all people, now supports this thesis, which was previously regarded as illegitimate trivialization, is a surprise. In an interview with the Fuldaer Zeitung, Olfert Landt first emphasizes that he still considers PCR tests to be absolutely suitable for monitoring the pandemic situation and case numbers. However, he also says that in his estimation, half of all people who test positive are not infectious. To be dangerous to third parties, he says, one would have to carry "100 times more viral load than the detection limit of the tests."

And of course, both men think very highly of the PCR-test method itself, even as it is used and misused and abused in the current situation, praising it where they can, with an obvious conflict of interest present.

How much guidance can the so-called incidence value provide in the Corona crisis? This is the value that indicates the number of new infections within seven days per 100,000 inhabitants. For the federal government, it is one of the most important indicators when it comes to tightening or loosening Corona measures. Criticism of this strategy comes, among others, from the medical statistician and former director of the Cochrane Center at Freiburg University Hospital, Gerd Antes. The incidence value has never been an appropriate indicator, Antes said. […] "The tests do not provide robust figures for general statements. Therefore, all figures derived from these tests should be taken with caution as a matter of principle and, at worst, are grossly incorrect."

And, for example, the association of Berlin public health officers (Amtsärzte) concurs:

"These incidences do not represent the real incidence of infection." This is because the incidence values are related to both the testing capacities and the willingness of individual citizens to test: "This results in fluctuations that do not reflect the infectious situation."

To summarize the misinformation and the poor data quality throughout the entire year 2020:

He [Kauermann, head of statistics department Munich] also has nothing good to say about the RKI, which has become one of the most important German Corona authorities:

"For us it is frightening to see that the data quality in Germany is still a singular disaster," […]

Göran Kauermann: Covid Data analysis Group CODAG LMU Munich, with this above quote sourced from 30.01.2021 Artikel bei Focus Online: "Statistiker holt zur RKI-Schelte aus: Corona-Daten 'eine einzige Katastrophe'", likewise "No Excess Mortality in Germany"

    While this is a potentially corrective comment on what test results mean (which itself may or may not be relevant to what they are actually used for, which ought to be tracking the spread of the virus, not the number of infectious people), it is irrelevant to the question, which was about why the numbers of reported results appeared to differ between sources.
– matt_black

