Every part of our response to this virus has been based on the belief that it is unusually dangerous and would kill very many people if we did nothing. Is this proving to be the case?
To start with, we had to rely on “the experts” and their predictions for this analysis, because there was insufficient real-world data to turn to. But now that there have been almost 2 million confirmed cases and over 100,000 confirmed deaths, can we put an actual number on how deadly the virus is?
This seems like an important question to ask and answer, and a fairly easy one, too. After all, on a personal level, the information needed to calculate it seems very simple and straightforward.
- Did you have the coronavirus : Yes or No?
- Did you die, or did you survive : Yes or No?
But even on an individual level, this is not easy. Not everyone knows whether or not they have had the coronavirus – the answer to the first question isn’t just yes or no. There’s also a “maybe” and a “don’t know” answer.
This is partially because some people never even realize they were sick with the virus at all; they just think they had a touch of a cold/cough/sore throat for a short while, or the “regular” ‘flu.
It is also because some people have never been tested, so they don’t know for sure what they had.
Moving to the second question, the never-tested problem remains, too. Many people, especially in the early stages of this outbreak, died of what was described as ‘flu-like symptoms or pneumonia, but which, back then, no-one thought might be Covid-19. Others have been tested, but the test results have been wrong. Others again have been tested, but by the time the results came back weeks later, they had long since either died or been discharged, and those results may have been discarded. And some people died of Covid-19, but because they didn’t die in an official hospital, their death isn’t counted (crazy but true).
We’ll come back to the accuracy of these numbers later in this piece, but let’s start off with looking at how fatality rates should be and are calculated.
1. The Perfect Calculation Method
The perfect calculation method is to wait until the outbreak is completely over and finished. At that point, count all the survivors and all the people lost to the disease, and use those numbers to get your mortality rate (ie the people who died divided by everyone who had the disease, whether they lived or died).
Unfortunately, we’re far from the point where we can say the outbreak is completely over and finished. And we don’t want to wait until whenever that will be. So let’s see what we can do, now, to get as close to an exact number as possible.
2. The Almost Perfect Calculation Method
The best calculation while an outbreak is underway, especially if it is rapidly growing (or shrinking), is to identify everyone who became infected during a specific time period (for example, one day, or longer if needed to get a larger sample size). Then, once enough time has passed for everyone infected during that period to either die or be cured (probably 21 – 28 days), sort those people into two groups, those who died and those who survived, while making very sure not to miss anyone or to accidentally include people from earlier or later time periods.
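As a rough illustration of this cohort approach, here is a minimal Python sketch. The `Case` record, its fields and the `cohort_fatality_rate` function are hypothetical, invented for this example rather than taken from any real dataset, and the 28-day default is an assumption taken from the upper end of the 21 – 28 day estimate above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Case:
    """One hypothetical infection record."""
    infected_on: date
    died: Optional[bool]  # True = died, False = survived, None = outcome unknown


def cohort_fatality_rate(cases: List[Case], start: date, end: date,
                         today: date, resolution_days: int = 28) -> float:
    """Share of people infected between start and end (inclusive) who died.

    Only answers once resolution_days have passed since the end of the
    period, and only if every member of the cohort has a recorded outcome.
    """
    if today < end + timedelta(days=resolution_days):
        raise ValueError("cohort has not had time to resolve yet")
    # Keep only people infected inside the period; earlier or later cases are excluded.
    cohort = [c for c in cases if start <= c.infected_on <= end]
    if not cohort:
        raise ValueError("no cases in this period")
    if any(c.died is None for c in cohort):
        raise ValueError("some cohort members still have no recorded outcome")
    deaths = sum(1 for c in cohort if c.died)
    return deaths / len(cohort)


# Made-up example: four people infected on 1 March, one of whom died.
# The 2 March case falls outside a one-day cohort, so its unknown outcome is ignored.
cases = [
    Case(date(2020, 3, 1), True),
    Case(date(2020, 3, 1), False),
    Case(date(2020, 3, 1), False),
    Case(date(2020, 3, 1), False),
    Case(date(2020, 3, 2), None),
]
rate = cohort_fatality_rate(cases, date(2020, 3, 1), date(2020, 3, 1),
                            today=date(2020, 4, 1))
print(f"{rate:.0%}")  # 25%
```

The function refuses to answer until the resolution window has passed, or while anyone in the cohort still lacks an outcome, which is exactly the discipline this method depends on.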
Unfortunately, as far as I can tell, this has not yet been done with the Covid-19 infection. That is very regrettable, because there are major losses in accuracy when we move from this to other methods of calculating.
All the other calculations out there are compromises with varying degrees of error introduced.
3. The Most Common – But Worst – Calculation
The most common calculation, because it is the easiest, is to simply look at the total cases and the total deaths. But this gives a very wrong number, particularly in the early stages of a rapidly developing infection. For example, in the first week of an outbreak, no-one has yet died, so the disease could be thought harmless.
Less obvious is what happens when an outbreak is skyrocketing: a huge number of the cases being counted have not yet had time to resolve as either “the patient lived” or “the patient died”.
Most people take 14 – 28 days to die from the virus. So when you look at a count of deaths, you should really compare it with the number of cases 14 – 28 days before that date, to better match the deaths to when those people were counted as cases.
For example, on Saturday 11 April, there were a total of 108,800 deaths. There were also a total of 1.78 million cases. If you match those two numbers, the calculation suggests a 6.1% death rate.
But almost all the people who died have been cases for at least a week – almost none of the people added as new cases in the last seven days yet have a known outcome. They haven’t had a chance to be registered as either a cleared case/survivor or a fatality, yet all these unknown-outcome cases are being treated as survivors. That is an incorrect assumption, and so this method of calculating a death rate understates the true number.
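The Method 3 arithmetic is a single division. A minimal sketch (the function name is just for illustration) using the Saturday 11 April totals quoted above:

```python
def naive_fatality_rate(total_deaths: int, total_cases: int) -> float:
    """Method 3: cumulative deaths divided by cumulative cases on the same day.

    Every case whose outcome is still unknown sits in the denominator as if
    it had survived, so in a fast-growing outbreak this understates the rate.
    """
    return total_deaths / total_cases


rate = naive_fatality_rate(108_800, 1_780_000)  # Saturday 11 April totals
print(f"{rate:.1%}")  # 6.1%
```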
4. Time Shifting the Case Count and the Death Count
To solve the problem with the previous way of calculating the fatality rate, it is more accurate to compare the total deaths on a specific day with the total case count on an earlier date, far enough back that all the cases from then have now been resolved as either survivors or deaths.
The problem with this is knowing how far back to go for the case count. This is especially acute when an outbreak is rapidly growing. For example, with this last Saturday having 1.78 million cases, if we go back a mere one week (to Saturday 4 April), the total case count was much lower – 1.20 million cases. Using that as the denominator would suggest a 9.1% death rate – almost a 50% increase in apparent death rate.
However, even going back one week is probably too short. Maybe we go back two weeks (March 28), when the official case count was 662,800. Using that as the denominator makes the death rate 16.4%. And if you wanted to go back three weeks (March 21), the case count then was 304,900, which would make the death rate 35.7%. Don’t even ask what would happen if you went back further!
It is astonishing how drastically the death rate changes, depending on how far back we go. But we don’t want to go too far back – the further back we go, the more we have people who died very quickly “leaking” into the total count of deaths, bringing an opposite distortion to the final calculated death rate.
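The effect of the lag can be tabulated with a short Python sketch. The case counts and the 108,800 deaths are the figures quoted above; `time_shifted_rate` is a hypothetical helper written for this example.

```python
from datetime import date, timedelta

DEATHS_11_APRIL = 108_800

# Cumulative confirmed cases on each date, as quoted in the text
CASES_BY_DATE = {
    date(2020, 4, 11): 1_780_000,
    date(2020, 4, 4): 1_200_000,
    date(2020, 3, 28): 662_800,
    date(2020, 3, 21): 304_900,
}


def time_shifted_rate(deaths: int, cases_by_date: dict,
                      reference: date, lag_days: int) -> float:
    """Method 4: deaths on the reference date divided by the case count lag_days earlier."""
    lagged = reference - timedelta(days=lag_days)
    return deaths / cases_by_date[lagged]


reference = date(2020, 4, 11)
for lag in (0, 7, 14, 21):
    rate = time_shifted_rate(DEATHS_11_APRIL, CASES_BY_DATE, reference, lag)
    print(f"lag {lag:2d} days: {rate:.1%}")
```

This prints 6.1%, 9.1%, 16.4% and 35.7% for lags of 0, 7, 14 and 21 days, matching the figures in the text. Only the lag changes; the numerator stays fixed.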
So what can we conclude from these numbers? We know that using Saturday 11 April for both total cases and total deaths is wrong, so that number is too low. Going back one week is also too short, and while we’d prefer to see the three week count, we also have to consider that some people do die quickly, so maybe we could compromise on the two week count, and its suggested death rate of 16.4%, while keeping in mind it could be a little less (the one week 9.1% number is probably too low), but could also be much more (the three week 35.7% number might only be a little high).
5. Comparing Survivor and Death Counts
Another method of calculation, after the first few weeks, is to count up all the survivors and all the fatalities, and then calculate the death rate from those two numbers. Never mind when they were counted as a case, just focus on when their outcome was determined.
Even this has some problems – what happens if it takes different amounts of time to be adjudged either dead or cured? That would mean that one of the two numbers has more “pending” cases still to count than the other. This “work in progress” proportion should start to decline in significance as time passes, although it remains substantial if the case numbers are rapidly increasing. Our understanding is that most people take less time to be cured than they do to die, so using this method might, if anything, slightly under-state the fatality rate.
The Worldometers site does this calculation every day, and for the last several weeks the death rate number has been around the 20% point compared to all closed cases.
While there may be a problem with undercounting deaths that way, there seems to be a much larger problem with the Worldometers statistics. We do not believe the number of survivors is being accurately counted. For example, for a long time Britain reported, every day, a mere 135 people who had finally emerged from their illness and been pronounced cured. We were hearing daily in the news about high-profile people who had tested positive for the disease and were now announcing themselves as cured, but the number stayed at exactly 135, day after day after day.
Even today, the numbers are almost certainly badly wrong – Britain is reporting 79,000 cases, almost 10,000 deaths, and a mere 344 people who are now officially recovered.
So, while this would be a good number to use if the data was reliable, the data doesn’t pass a “reasonableness test” and so we feel the 20% number is much higher than it should be.
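Method 5 is deaths divided by all closed cases. A minimal sketch, with a hypothetical function name, applied to the UK figures quoted above (rounding “almost 10,000” deaths up to 10,000), shows how badly those figures fail the reasonableness test:

```python
def closed_case_rate(deaths: int, recovered: int) -> float:
    """Method 5: deaths as a share of all resolved (closed) cases."""
    closed = deaths + recovered
    if closed == 0:
        raise ValueError("no closed cases yet")
    return deaths / closed


# UK figures from the text; "almost 10,000" deaths rounded to 10,000 (assumption)
uk_rate = closed_case_rate(deaths=10_000, recovered=344)
print(f"UK closed-case fatality rate: {uk_rate:.1%}")  # 96.7%
```

Taken at face value, this would mean about 97% of resolved UK cases ended in death, which is not credible and is consistent with recoveries going unrecorded.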
Trying to Reconcile the Different Answers
Maybe we can look at the numbers from these various approaches and use them to create a reasonable range, running from “not likely to be higher than …..” to “unlikely to be less than …..”.
For example, we know that the Method 5 calculation (20% death rate) is too high. We also know that the Method 3 6.1% death rate (from comparing today’s total cases and total deaths) is too low. So we can start to zero in on a death rate somewhere between 6.1% and 20%.
If we take the half-way point as a compromise, we get 13%, which is also midway between the numbers for comparing deaths to cases as of one and two weeks ago. That seems likely to be on the low side, but it might be a useful number to work forward from. Let’s make it 15%, because the Method 4 calculation seems to be suggesting numbers closer to the 20% of Method 5 than to the 6% of Method 3.
Before we adjust further, we should point out one other thing. The death rate is not consistent between one country and the next. Some countries have death rates that are ten or more times higher/lower than other countries. For simplicity, if we just look at current cases and current deaths, we see countries like Italy, Belgium and the UK with their ratio being 10% or higher, but we also see countries like Germany, South Korea, and Turkey with rates of 2% or lower, and outliers like Iceland, Singapore and New Zealand with rates of 0.5% or lower.
Some of this could perhaps be explained by differing standards of healthcare, but we see a huge range of ratios within the EU, which we’ll assume has generally similar healthcare from country to country.
We also see a wide range of death ratios within the US, where healthcare is presumably somewhat similar from state to state. New York is at about 4.8%, but immediately adjacent New Jersey is 3.8%, and Texas is 2.0%.
This is sending us a strong signal. Perhaps something is interfering with the validity of this data; perhaps there is another factor that we’ve not considered. Indeed, as any good detective would tell us, there might be more than one factor not yet considered. As it turns out, there definitely are.
The Small Problem – Counting Deaths
Why is it difficult to count deaths? It isn’t difficult to count “one, two, three”, but it is difficult to classify deaths and count the deaths from a particular cause.
Some death certificates have multiple lines on them for the immediate cause of death (his head was cut off, for example) and then the contributing factors (he crashed his car, he was driving too fast, he was drunk). Or cause of death – lack of oxygen and organ failure. Contributing factors – acute respiratory distress syndrome (his lungs stopped working), Covid-19 (virus side-effects harmed the lungs). Or maybe other contributing factors – lung cancer.
What happens when there are multiple contributing factors? Do they all get counted as “the” cause of death? (No, only one does.) Are percentages shared between the different factors? (Nope, it is an all or nothing count.) We know that people who are already unwell are more likely to have more severe Covid-19 infections; so how much of the death of a patient with other significant comorbidities should be blamed on Covid-19?
It seems that both nationally and at some state levels, there is pressure on doctors to certify the primary cause of death as Covid-19 – even in cases where the Covid-19 disease hasn’t been confirmed as being present (eg Minnesota). So the eagerness to blame Covid-19 (a bit like the eagerness to blame road deaths on excessive speed) might be inflating the death numbers.
This situation has been aptly described as the difference between dying of Covid-19 and dying with Covid-19.
On the other hand, other states won’t certify a death as Covid-19 related if Covid-19 hasn’t been tested to be present (even if it is very likely to have been the cause), and there are times when states don’t bother testing dead patients. So in those cases, Covid-19 deaths are being under-reported.
Some states are only submitting death data to the CDC for patients who die in hospitals. If you die in an old folks home (as many do) or at home or somewhere else, your case “doesn’t exist” for these purposes, meaning again there is under-reporting of Covid-19 deaths.
So the death count clearly has some imprecision and uncertainty, although some of the “plus” factors may be balanced by some of the “negative” factors. However, even in the most extreme case, it is unlikely that death numbers are being under- or over-reported by a factor of two or more.
But this plus or minus a factor of two, while dismaying, still makes the death counting “very exact”, compared to the other number in any calculation – the total Covid-19 cases.
The Huge Problem – Counting Cases
This is the huge unknown. How many people have been infected by the coronavirus and have experienced an attack of Covid-19? Sure, we know of some – the people who chose to get tested, and who tested positive for the disease. But what about people who have not been tested – what is the proportion between people we know about (ie tested positive for the disease) and people we don’t know about – people who may have had a very mild case of the disease and never bothered to see a doctor or go to a hospital, or people who did go to a doctor/hospital, but were misdiagnosed?
We should point out that in the earlier stages of the disease spreading, doctors and hospitals outright refused to even test people with Covid-19 symptoms, unless they had been to China or were in close contact with someone from China. Yes, that stunning refusal to consider the disease was spreading “in the community” rather than only from people who had been to China allowed the disease precious extra weeks to silently infiltrate much more of our community. (Keep in mind that many of the same people who said “don’t test, there’s no danger” back then are now the people saying “trust us with our future projections, we’re the experts”.)
More recently, with testing now open to more people, we still hear of cases, as recently as last week, of people who could not get a test because their “risk profile” didn’t fit that of someone likely to have the virus and they weren’t showing obvious symptoms.
However, the number of people in that category (the “refused to test them” category) is probably small compared to the number of people who have never been suspected of having the disease, and have never asked to be tested.
There’s another problem with the testing, too. It only tests whether you currently have the disease. It does not report whether you had the disease before and have now been cured.
By definition, we have no way of exactly knowing how many people have silently had the disease. With each passing day, it is likely that more “silent sufferers” are being cured of the disease just through their body successfully fighting the virus, and those people disappear, never to register in any statistics, until such time as we get a reliable test for people who have formerly had the disease and now have developed antibodies for the virus.
There are some empirical ways to guess at the number of people who have had the disease but who have never appeared in official statistics. Some of these methods unfortunately base their logic on the Chinese data. This is ill-advised – most “experts” believe the Chinese data is totally unreliable and so should never be used at all. Other methods are not quite so dependent on the Chinese data, and some are quite inventive and imaginative (like the viral “load” in city sewer systems).
The guesses for how many people have already had the disease range from as low as “for every person we know about, there is one additional person we don’t know about”, up to as high as “for every person we know about, there are 20 additional people we don’t know about”, and some studies go even higher than that.
Most studies currently seem to be suggesting 5 – 15 times more people have had the disease silently (and survived). But there’s no real confidence or certainty in any of those numbers. It might be as low as 2, it might be as high as 50.
And now you can see the enormity of the problem. We sort of know, within a factor of about two times up or down, how many people are dying of the disease. But we don’t know how many people have had the disease and lived. There is an official number of known cases, but that number could be 5 times too low; it might even be 50 times too low.
So, looking back to our section above where we tried to find a compromise death rate number in the middle between too high and too low, we decided on 15%. Now we need to consider the inaccuracies in the total cases and total death numbers.
All we can say from the possible plus and minus factors for how deaths are counted is that whatever number we come to, it is possibly up to twice as high, or only half as high, as it should be.
But for the case count, the error primarily works in one direction. Conceivably, some people might be misdiagnosed with Covid-19 by mistake, but that number is likely very small. On the other hand, various empirical guesses suggest the true number of cases might be somewhere in a range of 5 – 15 times higher than the count of officially diagnosed cases, with upper and lower limits possibly as much as 50 times higher, or only 2 times higher.
The thought of ten times higher seems like an appealing compromise number to use – but we select this number with absolutely no exact scientific backing whatsoever. Choose your own number, and it is as likely to be correct as ours.
Which means we take the 15% semi-scientific number from above, and divide it by ten, to get a net fatality rate of 1.5%.
The lowest this could be? Take the 6.1% number from the series of calculations above, then halve it due to over-counting deaths as Covid-19 when they were really something else, then divide by 50. That gives you 0.06%.
The highest this could be? Take a 25% death rate, which sits between the two-week and three-week Method 4 figures above, then double it due to undercounting Covid-19 deaths, then divide by two (the lowest case multiplier). That gives you 25%.
So we “know” that somewhere between perhaps one person in four (25%) and perhaps one person in 1650 (0.06%) who gets the virus is likely to die.
In other words, we don’t really know a single certain thing at all.
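The adjustments in the last few paragraphs amount to multiplying an apparent death rate by a death-count correction and dividing by a case multiplier. A minimal sketch, where `net_fatality_rate` and its parameter names are made up for illustration and the inputs are the article’s own choices:

```python
def net_fatality_rate(apparent_rate: float, death_count_factor: float,
                      case_multiplier: float) -> float:
    """Adjust an apparent fatality rate for miscounted deaths and unseen cases.

    death_count_factor: above 1 if deaths are under-counted, below 1 if over-counted.
    case_multiplier: true infections per officially confirmed case.
    """
    return apparent_rate * death_count_factor / case_multiplier


scenarios = {
    "central": net_fatality_rate(0.15, 1.0, 10),   # 15%, deaths as counted, 10x cases
    "lowest": net_fatality_rate(0.061, 0.5, 50),   # 6.1%, deaths halved, 50x cases
    "highest": net_fatality_rate(0.25, 2.0, 2),    # 25%, deaths doubled, 2x cases
}
for name, rate in scenarios.items():
    print(f"{name:>8}: {rate:.2%}  (about 1 in {round(1 / rate):,})")
```

Before rounding, the lowest scenario is 0.061%, or about 1 in 1,640, roughly matching the “one person in 1650” above; the highest is 1 in 4, and the central estimate is 1.5%.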
Still More Considerations
There’s a further consideration as well. Actually, there are many further considerations.
When we talk about an average fatality rate of 1.5% from this virus, that average hides a lot of variation between different demographic groups. To put it in simple terms, the older you are, the more likely you are to die. There are some great tables halfway down this page that show relative death rates by age. Based on their numbers, if you’re in your 70s, you’re 100 times more at risk than if you’re in your 20s or 30s.
The sicker you already are, the more likely you are to die. Overweight? High blood pressure? Diabetes? Lung problems? All these increase your risk greatly.
This is further complicated by the fact that the more negative factors you have, the more likely you are to become seriously unwell, and therefore the more likely you are to become a counted case.
There are other considerations too, but we hope we’ve made our point thoroughly already.
Summary
So, if you’re an ordinary guy (or gal), with ordinary health, what is your chance of dying, if you get the disease? You’ve now had the long answer (re-read everything above). If you want the short answer, it is “No-one has a clue”.
But the good news is that your chances of dying are probably less than 1%, and possibly less than 0.1%.
Publication History
12 April 2020 : First published
David,
Not covered in your discussion was the reliability of the tests themselves. One of the tenets of measurement science is that all measures contain error.
Knowing how large the error component is helps in assessing the validity of the measure. There was some early discussion of the error rate for the current coronavirus test, but that discussion has dropped off to almost nothing. Early false positive rates as high as 80% were thrown about with little corroborating data, but I have yet to see a good scientific discussion of the “real” number. Even if the 80% number were reduced by an order of magnitude, it would still impugn any data on the number of cases.
You are certainly following the discussion and data on these subjects much more closely than I am, so I ask: what have you seen relating to the testing and the reliability of the tests?
Hi, Doug
The early testing was indeed a terrible mess. It was beyond slow, and it was inaccurate.
I’ve never seen a convincing and complete statement of the testing inaccuracies. If a tenet (sic) of measurement science is that all measures contain errors, so too do all measurements of measurement error. 🙂
I decided not to discuss this in detail because of the lack of solid data, and also because what happened in the early days has been vastly superseded by more recent numbers. The “early days” covers the first few thousand cases; now that we’re up to 575,000, some percentage of a few thousand is no longer as material as it was then.
Truly, when it comes to calculating death rates, there are so many different errors in every imaginable part of the process that it gets to the point where one just throws up one’s hands in horror and says “enough, already”; and that’s essentially what I did. When faced with an uncertainty of an order of magnitude or more in the actual case count, most other errors become less critical.