“There are three kinds of lies: lies, damned lies, and statistics.”
-Mark Twain
I've had a few questions regarding the Public Health graph released this week. What do I mean when I say it's misleading and how did I calculate the more accurate and transparent total cases line? Let's look.
Once again, here's the graph Public Health provided to the media. It shows case counts broken down by age cohort.
Now let's do a thought experiment.
Let's say our province was experiencing 1,000 cases every day. A graph of cases over time would look like this:
1,000 is a big number. It's a scary number. We all remember the first time we had over 100 cases in one day and the feeling we had in our guts.
Big numbers also contradict the soothing narrative and the attempts to justify the lifting of all PH protections.
What if we divided those 1,000 cases up by gender? Let's say 450 of them are women and 550 of them are men.
On the same scale we get two lines, one for women and one for men, both below 600. 600 is not as big as 1,000. Less big is less scary and less contradictory to the narrative.
All the data in the above graph is accurate. However, we must remember that the height of a line is not what represents the quantity of cases; the area under the line is. When the data is divided, those areas overlap and obscure each other.
Let's go a little further and split each gender into groups over 50 years old and under 50 years old.
Let's say that each day 200 women aged 50 and over, 250 women under 50, 260 men aged 50 and over, and 290 men under 50 are reported as COVID-positive. That graph looks like this:
Now all lines are way down near the bottom of the graph. All are near or below 350. This is a much smaller number. This is much less contradictory to the narrative.
What the majority of people see on any graph is the direction the line is going (up or down) and the numbers on the left-hand axis.
GNB can't control the slope of the line but they can graph in such a way that people walk away with "350" in their heads instead of "1,000."
The truth of our hypothetical numbers looks like this. The red line is the actual full-scale representation of daily case volume. The clustered lines below it are the overlapping volumes of cases in each group. At a quick glance, which would you prefer the case volume in your community to be?
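The arithmetic behind the thought experiment can be sketched in a few lines of Python. This is only an illustration using the hypothetical numbers above; the variable names and the `tallest_line` and `summarize` helpers are made up for this example. It shows that every way of splitting the data adds up to the same 1,000 cases, while the tallest line a reader sees keeps shrinking.

```python
# Hypothetical daily counts from the thought experiment (not real data).
total = 1000
by_gender = {"women": 450, "men": 550}
by_gender_and_age = {
    "women 50+": 200,
    "women under 50": 250,
    "men 50+": 260,
    "men under 50": 290,
}


def tallest_line(groups):
    """Return the largest single value: the highest line on the graph."""
    return max(groups.values())


def summarize(label, groups):
    """Describe one way of drawing the same data."""
    return (
        f"{label}: {len(groups)} line(s), "
        f"tallest at {tallest_line(groups)}, "
        f"sum of all lines {sum(groups.values())}"
    )


views = {
    "undivided": {"all cases": total},
    "by gender": by_gender,
    "by gender and age": by_gender_and_age,
}

for label, groups in views.items():
    print(summarize(label, groups))
```

The sum never changes, but the number a casual reader walks away with drops from 1,000 to 550 to 290.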
So let's go back to the PH graph. You can now see that the total case load is split into five age cohorts, and the volumes in those cohorts overlap and obscure one another. As of March 10, all lines are below 300. This can lead to a misinterpretation of the situation.
The more accurate representation of the community case volume is below, where I've plotted the total cases on the same scale. Daily cases are well over 800. This doesn't look as good as the split graph, and therein lies a valid question: why publish something so misleading?
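A total line of this kind comes from adding the cohort counts together for each day. Here is a minimal sketch of that calculation. The cohort labels and daily counts are made-up placeholders, not the Public Health figures, and `total_cases` is a hypothetical helper written for this example.

```python
def total_cases(cohort_series):
    """Sum per-cohort daily counts into one total per day.

    cohort_series maps a cohort label to a list of daily counts.
    Every cohort must cover the same days, in the same order.
    """
    lengths = {len(counts) for counts in cohort_series.values()}
    if len(lengths) > 1:
        raise ValueError("all cohorts must cover the same number of days")
    # zip(*...) groups the counts day by day across cohorts.
    return [sum(day) for day in zip(*cohort_series.values())]


# Placeholder numbers for three consecutive days (NOT real data).
# The age boundaries are also assumptions made for this example.
cohorts = {
    "0-19": [150, 160, 170],
    "20-39": [200, 210, 220],
    "40-59": [180, 190, 200],
    "60-79": [120, 130, 140],
    "80+": [50, 60, 70],
}

totals = total_cases(cohorts)
highest_cohort_line = max(max(counts) for counts in cohorts.values())

print("daily totals:", totals)
print("highest single cohort value:", highest_cohort_line)
```

With these placeholder numbers, no single cohort line rises above 220, yet the daily total reaches 800 on the last day. Plotting the totals on the same axes as the cohort lines makes that gap visible.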
When we are left to rely on community indicators, we must have data literacy and nuanced context for the information presented.
People are better served by transparent, quantifiable, and verifiable data, presented clearly.
Teaching people to interpret that data and calculate relative risks is something we all must focus on in order to successfully navigate this pandemic.