COVID-19 Map

Welcome to the BigMap COVID-19 Tracking Project!

This is a constantly evolving product and is deliberately "unfinished": we will keep making improvements as we think of them and have time to implement them. Given the time-sensitive nature of the COVID-19 pandemic, we wanted to get our analysis "out there" as quickly as possible while maintaining the quality we would expect from other high-quality data science projects.

Functionality

The basic functionality of the map includes:

- Change the metric used to color the map with the radio buttons in the top right
- Select a date using the slider in the bottom right
- "Play" the historical data by clicking the button to the left of the time slider, showing how the map changes over time
- Hover over a county to get its name and the current value of the selected metric
- Click on a county to show a timeseries plot of the selected metric in that county

Metrics

Case Count - The number of reported COVID-19 cases
% Increase - Percent increase in COVID-19 cases from the previous day
Deaths - The number of reported COVID-19 deaths
Doubling - The number of days since there were half as many COVID-19 cases as on the current day.
Note: some other projects calculate this value by projecting the current trajectory of the disease in a location, but we have opted for a simpler metric that is more interpretable. Our approach is a retrospective look at how a county is doing now relative to the recent past.
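As an illustration only, here is a minimal Python sketch of the % Increase and Doubling definitions, assuming both are computed from a county's cumulative case counts ordered oldest to newest. The function names are hypothetical and are not taken from the project's R script.

```python
from typing import Optional, Sequence


def percent_increase(cumulative: Sequence[float]) -> Optional[float]:
    """Percent change in cumulative cases from the previous day to today."""
    if len(cumulative) < 2 or cumulative[-2] == 0:
        return None  # undefined without a nonzero previous day
    yesterday, today = cumulative[-2], cumulative[-1]
    return 100 * (today - yesterday) / yesterday


def doubling_days(cumulative: Sequence[float]) -> Optional[int]:
    """Days since the cumulative count was at most half of today's count.

    Returns None when today's count is zero or the available history
    never drops to half of today's count.
    """
    today = cumulative[-1] if cumulative else 0
    if today <= 0:
        return None
    half = today / 2
    # Walk backwards one day at a time until the count is at most half.
    for days_back in range(1, len(cumulative)):
        if cumulative[-1 - days_back] <= half:
            return days_back
    return None
```

For example, `doubling_days([10, 12, 15, 20])` returns 3, because the count of 10 three days ago is the most recent value at or below half of today's 20. This looks backward at observed counts rather than projecting a growth rate forward, which matches the retrospective framing above.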
R(t) - The effective reproduction number
As per Thompson et al. (2019): "R(t) represents the expected number of secondary cases arising from a primary case infected at time t. This value changes throughout an outbreak. If the value of R(t) is and remains below one, the outbreak will die out. However, while R(t) is larger than one, a sustained outbreak is likely. The aim of control interventions is typically to reduce the reproduction number below one."

Because R(t) is sensitive to reporting abnormalities that may occur day-to-day, we provide smoothed metrics for 3- and 7-day time scales.
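A common way to smooth a noisy daily series is a trailing moving average. The sketch below is an assumption for illustration: the smoothing actually used for the 3- and 7-day R(t) metrics is documented in Nick Clark's repository and may differ (for example, by estimating R(t) over a window rather than averaging daily estimates). The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def trailing_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Mean of each day and the (window - 1) days before it.

    Days without a full window of history get None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            smoothed.append(sum(values[i + 1 - window : i + 1]) / window)
    return smoothed
```

Calling it with `window=3` and `window=7` on the same daily series gives the two time scales; a single anomalous reporting day then shifts each smoothed value by only a third or a seventh of its size.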

For more details on the R(t) metric used in this project, see Nick Clark's Git repository.
Infectious Probability - The probability that someone on the street is infectious
This probability is calculated by dividing the number of people who have COVID-19 (but are not hospitalized) by a county's total population. The number of non-hospitalized people with COVID-19 is based on the following assumptions:
- Number of days between symptoms and getting a positive test result = 2 days (source: https://www.labcorp.com/tests/139900/2019-novel-coronavirus-covid-19-naa)
- Number of days infectious after showing symptoms = 10 days (source: https://www.medrxiv.org/content/10.1101/2020.03.05.20030502v1)
- Number of days between symptoms and hospitalization (if hospitalized) = 7 (source: https://jamanetwork.com/journals/jama/fullarticle/2761044)
- Undetected cases (i.e., the number of people who likely have the virus but are not captured in the positive testing numbers) = 10x (source: https://science.sciencemag.org/content/368/6490/489)
- Hospitalization rates of infected people by age (source: https://www.thelancet.com/action/showPdf?pii=S1473-3099%2820%2930243-7)

Using these assumptions, we calculate the number of infectious people in the population by finding the number of people who tested positive within the infectious period, multiplying by the undetected case figure, and subtracting the number of people who are likely to have been hospitalized (based on the age demographics of a county).
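The steps above can be sketched in Python as follows. This is a hypothetical illustration, not the project's R code. It assumes `new_cases` holds daily new positive tests ordered oldest to newest, and it derives each day's timing from the listed assumptions: a case reported k days ago had symptom onset about k + 2 days ago, stays infectious until 10 days after onset, and is removed at the county's age-weighted hospitalization rate once it is 7 or more days past onset. The age bands and rates passed in are placeholders for the Lancet figures and the county's census age shares.

```python
from typing import Mapping, Sequence


def weighted_hospitalization_rate(age_shares: Mapping[str, float],
                                  rate_by_age: Mapping[str, float]) -> float:
    """Average age-specific hospitalization rates, weighted by each age
    band's share of the county population."""
    total = sum(age_shares.values())
    return sum(share * rate_by_age[band] for band, share in age_shares.items()) / total


def infectious_probability(new_cases: Sequence[float], population: int,
                           hosp_rate: float, test_lag: int = 2,
                           infectious_days: int = 10, hosp_lag: int = 7,
                           undetected_multiplier: float = 10.0) -> float:
    """Estimated share of the population that is infectious and not hospitalized."""
    infectious = 0.0
    # k = days since the positive test (k = 0 is today).
    for k, cases in enumerate(reversed(new_cases)):
        days_since_onset = k + test_lag
        if days_since_onset >= infectious_days:
            break  # this and all earlier cohorts are past the infectious period
        estimated = cases * undetected_multiplier  # scale up for undetected cases
        if days_since_onset >= hosp_lag:
            # Remove the share of this cohort expected to be hospitalized by now.
            estimated -= estimated * hosp_rate
        infectious += estimated
    return infectious / population
```

With these defaults, only the last 8 days of positive tests count as infectious, and the 3 oldest of those days are reduced by the hospitalization rate. Because every county goes through the same arithmetic, the outputs are more useful for comparing counties than as absolute probabilities.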

Note 1: Because so many assumptions are baked into this metric, it is certainly not the exact probability that a person on the street is infectious. If, however, these assumptions are fairly stable across counties, we can compare the numbers to get an idea of relative risk between locations.

Note 2: This probability should be treated as a geographically informed prior. Many other factors need to be accounted for when assigning a probability that an individual is infectious. For example, if you go to a bar, the probability that a given person there is infectious is probably much higher than the number we report, because bars are filled with people who are (likely regularly) engaging in behavior that may expose them to the virus. Conversely, a person in the waiting room at a doctor's office is probably much less likely to be infectious than our metric suggests, because doctors' offices require temperature checks and screening surveys that lower the probability of an infectious person sitting in the waiting room.

Methodology and Technology

The map was created to visualize the spatial and temporal distributions of various COVID-19 statistics. While there are many similar projects, this one focuses on easy traversal of time and space. To that end, a pseudo-animation feature lets the user "play out" the pandemic across the country with any metric they choose, and at the county level, users can click to see a timeseries visualization of the selected metric. Together, these features give users multiple ways to visually interrogate the current and past states of the COVID-19 pandemic across a number of metrics. The full code base is in the map's repository, where more documentation is forthcoming.

Data

The data behind the map is produced by an R script that runs daily. We are constantly updating this script, and it is not yet stable enough to release publicly. Eventually, we will add this munging script to the repository.

The case counts and deaths are sourced from USA Facts. This is the best county-level data set we have found, and it is consistently updated daily. The remaining fields are calculated using the team's internally developed methodologies.

Mapping

This map is a fairly vanilla Leaflet implementation, with JavaScript (and jQuery) used to implement additional functionality. I stayed away from Leaflet plugins for the most part because they are generally rigid in their implementation and I am particular. The map's layers are sourced from a GeoJSON file created from US Census county boundary shapefiles.
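One way to connect the daily data to the county boundaries is to copy each county's metrics into the matching GeoJSON feature's properties. The Python sketch below is a hypothetical illustration. It assumes each feature carries the 5-digit county FIPS code in a `GEOID` property, as Census cartographic boundary county files typically do, and that the daily data is keyed by the same code. The project itself may join the data differently, for example in the browser.

```python
from typing import Any, Dict, Optional


def attach_metrics(geojson: Dict[str, Any],
                   metrics_by_fips: Dict[str, Dict[str, Any]],
                   key: str = "GEOID") -> Dict[str, Any]:
    """Copy each county's metrics into its feature's properties (in place).

    Counties with no matching row get None, which the map can show as NA.
    """
    for feature in geojson["features"]:
        props = feature.setdefault("properties", {})
        fips: Optional[str] = props.get(key)
        props["metrics"] = metrics_by_fips.get(fips)
    return geojson
```

Keeping FIPS codes as zero-padded strings (for example `"01001"`) matters here: if they are read as integers, leading zeros are lost and those counties silently fail to match.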

Time Slider

The timeline controls were created manually and added as Leaflet controls. Custom controls allow arbitrary HTML elements to be placed "on top" of the map. We used jQuery to listen for changes to the time slider and update the map with the data for the selected date. The "play" button advances the time slider by one tick at a set interval, starting either from the slider's current position or, if the slider is in the last position, from the beginning.
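The playback rule can be modeled independently of the UI. The real control uses JavaScript, jQuery, and a timer. This hypothetical Python sketch covers only the index logic, with a generator standing in for the interval timer: each yielded value is the slider position to render on the next tick. Whether the starting position is rendered again on the first tick is an assumption.

```python
from typing import Iterator


def playback_positions(current: int, last: int) -> Iterator[int]:
    """Slider positions visited after pressing Play, one per timer tick.

    Playback resumes from the current position, or restarts at 0 if the
    slider is already at the last position.
    """
    position = 0 if current >= last else current
    while position <= last:
        yield position
        position += 1
```

For example, with the slider at position 3 of 5, playback visits 3, 4, 5. With the slider already at 5, it restarts and visits 0 through 5.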

Timeseries Visualization

Clicking on a county produces a timeseries plot using Apache ECharts, a JavaScript charting library. While a user could find the same information by adjusting the time slider, seeing the data in a single timeseries visualization answers questions like, "how bad was the COVID-19 pandemic in county X?"
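ECharts draws a chart from a plain "option" object. This minimal Python sketch builds such an object for one county's line chart as a dictionary. The function name, title format, and choice of a category x-axis are assumptions, and the project builds its charts in JavaScript. Serialized with `json.dumps`, the result has the shape that ECharts' `setOption` accepts.

```python
from typing import Any, Dict, List, Optional


def timeseries_option(county: str, metric: str, dates: List[str],
                      values: List[Optional[float]]) -> Dict[str, Any]:
    """Build a minimal ECharts line-chart option for one county and metric."""
    return {
        "title": {"text": f"{county}: {metric}"},
        "tooltip": {"trigger": "axis"},  # show the value for the hovered date
        "xAxis": {"type": "category", "data": dates},
        "yAxis": {"type": "value"},
        # None serializes to null, which ECharts treats as a missing point.
        "series": [{"type": "line", "name": metric, "data": values}],
    }
```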

Colors

Colors are one of the most controversial topics when it comes to any visualization. We chose to go with a yellow-to-red scale, using grey for NA values. The intent is to get close to an intensity scale while maintaining readability for people with and without color blindness.

Color thresholds are tricky and are constantly being reevaluated. Where possible, we try to make the thresholds meaningful. For example, an R(t) below 1 represents a "good" situation in that case counts should decrease, so we place the first threshold at 1 on the R(t) scales.
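The yellow-to-red scale with grey for NA can be expressed as a threshold lookup. In this Python sketch, only the first R(t) break at 1 comes from the text. The breaks above 1 and the hex colors (ColorBrewer's 4-class YlOrRd) are hypothetical stand-ins, not the project's actual values.

```python
import bisect
import math
from typing import Optional, Sequence

NA_COLOR = "#808080"  # grey for missing values

# Hypothetical yellow-to-red palette (ColorBrewer YlOrRd, 4 classes).
PALETTE = ["#ffffb2", "#fecc5c", "#fd8d3c", "#e31a1c"]

# First break at 1, where R(t) moves from shrinking to growing outbreaks;
# the breaks above 1 are illustrative only.
RT_THRESHOLDS = [1.0, 1.5, 2.0]


def color_for(value: Optional[float],
              thresholds: Sequence[float] = RT_THRESHOLDS,
              palette: Sequence[str] = PALETTE) -> str:
    """Map a value to a color class; a value equal to a break goes to the higher class."""
    if len(palette) != len(thresholds) + 1:
        raise ValueError("palette needs one more color than there are thresholds")
    if value is None or math.isnan(value):
        return NA_COLOR
    # bisect_right counts how many breaks are <= value, giving the class index.
    return palette[bisect.bisect_right(thresholds, value)]
```

Because the lookup uses `bisect_right`, an R(t) of exactly 1 falls in the second class rather than the "good" first class, which matches the description of the first class as R(t) below 1.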

About the Authors

Ian Kloo

Ian is a data scientist and is the lead developer on the BigMap project.

Nick Clark

Nick is a statistician and creator of the R(t) statistic used on the BigMap project.

Problems, Concerns, or Feedback?

Please submit an issue on the project repository and we'll do our best to address it/get back to you.