When I joined Premise in 2017, I was excited at the prospect of using Google Cloud. Over my career I had worked with Azure and AWS, but Google Cloud Platform (GCP) had not yet taken off. Still, the thought of using the same tools that Google uses was very enticing, and more than two years later I am still excited about all of the tools we have access to.
For COVID-19 specifically, we are using our mobile application’s Task Marketplace to place tasks across our networks in more than 90 countries, where our Contributors (that’s what we call the mobile app users who perform our tasks) complete them for varying monetary rewards. Every submission carries a timestamp and a geolocation, along with every data point we collect, from single-select inputs to photos. It all flows into BigQuery, where our data science team works with the data and our data analysts build insights and dashboards, mainly in Google Data Studio.
We were also interested in cross-comparing actual COVID-19 case data with the data we were collecting, looking for patterns, and identifying possible areas of focus for Premise. One of our team members told us that Johns Hopkins was publishing its data on GitHub, the same data ESRI and countless others are using for their visualizations. In the repo we found two formats: a time-series model with dates as columns, where each location’s full history was rolled up into a single row, and a daily file listing each county, state or country where available. We quickly decided the daily format was for us.
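To make the difference between the two layouts concrete, here is a minimal Python sketch that unpivots the dates-as-columns layout into one record per location and date, which is the shape the daily files already have. The column names (`Province/State`, `Country/Region`, `Lat`, `Long`, and `M/D/YY` date headers) are assumptions modeled on the Johns Hopkins time-series CSVs, and the locations and counts are made-up placeholders, not real case data.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the "dates as columns" layout.
# Locations and counts are placeholders, not real data.
WIDE_CSV = """Province/State,Country/Region,Lat,Long,3/25/20,3/26/20,3/27/20
,Country A,0.0,0.0,100,150,200
Region B,Country B,1.0,1.0,10,12,12
"""

# Columns that identify a location rather than a date (assumed names).
ID_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def wide_to_long(text):
    """Unpivot a dates-as-columns CSV into one dict per (location, date)."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            # Date headers are assumed to look like M/D/YY, e.g. "3/27/20".
            day = datetime.strptime(column, "%m/%d/%y").date()
            records.append({
                "province_state": row["Province/State"] or None,
                "country_region": row["Country/Region"],
                "date": day.isoformat(),
                "confirmed": int(value),
            })
    return records


for record in wide_to_long(WIDE_CSV)[:3]:
    print(record)
```

Each input row yields one record per date column, so the two sample rows produce six records; the long shape is what makes filtering by date and joining on location straightforward.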
Once we knew which data we wanted, we created a Google Apps Script, run every day by an Apps Script trigger, that fed the data into a Google Sheet (after first building a backload version that handled the column differences across the historical files). That sheet then became the backing source for a BigQuery table, and within an hour all of the data was available to us, ready to analyze and visualize. We soon found that some cleaning was needed: the data used five different date formats, and some countries appeared under multiple names, such as “UK” vs. “United Kingdom” and “Mainland China” vs. “China.” This was easy to fix with some simple SQL, and we were off to the races.
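The cleanup described above can be sketched in Python as follows. This is an illustration, not the SQL we ran: the header aliases (`Country/Region` vs. `Country_Region` and so on), the list of timestamp formats and the sample rows are all assumptions, and the real files may use other variants. Only the two country aliases come from this post.

```python
from datetime import datetime

# Assumed mapping from older header names to newer ones.
COLUMN_ALIASES = {
    "Province/State": "Province_State",
    "Country/Region": "Country_Region",
    "Last Update": "Last_Update",
}

# Country spellings mentioned in the post, mapped to one canonical name.
COUNTRY_ALIASES = {
    "UK": "United Kingdom",
    "Mainland China": "China",
}

# Hypothetical timestamp formats; order matters, first match wins.
DATE_FORMATS = [
    "%m/%d/%Y %H:%M",     # e.g. 1/22/2020 17:00
    "%m/%d/%y %H:%M",     # e.g. 3/22/20 23:45
    "%Y-%m-%dT%H:%M:%S",  # e.g. 2020-02-01T19:43:03
    "%Y-%m-%d %H:%M:%S",  # e.g. 2020-03-22 23:45:00
    "%m/%d/%Y",           # e.g. 3/22/2020
]


def parse_timestamp(value):
    """Try each known format in turn and return an ISO-8601 string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp: {value!r}")


def normalize_row(row):
    """Rename legacy headers, canonicalize the country and parse the date."""
    clean = {COLUMN_ALIASES.get(key, key): value for key, value in row.items()}
    country = clean["Country_Region"].strip()
    clean["Country_Region"] = COUNTRY_ALIASES.get(country, country)
    clean["Last_Update"] = parse_timestamp(clean["Last_Update"])
    return clean


# Made-up rows in the old and new header styles.
print(normalize_row({"Country/Region": "Mainland China", "Last Update": "1/22/2020 17:00"}))
print(normalize_row({"Country_Region": "UK", "Last_Update": "3/22/20 23:45"}))
```

Raising on an unrecognized timestamp, rather than silently dropping the row, surfaces any new format the moment it appears in a daily file.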
With the data cleaned, we could build some powerful Data Studio visualizations, for example tracking change over time for exactly the countries we wanted to compare. Note: all data below is current through March 27, 2020 for case data and through March 28, 2020 for Premise data collection.
We could also bring in other datasets, such as 2020 population estimates, to cross-compare case, death and recovery rates.
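Normalizing by population is what makes countries of different sizes comparable. A minimal sketch, assuming rates expressed per 100,000 residents (a common convention, not necessarily the one in our dashboards) and a made-up country with made-up totals:

```python
from dataclasses import dataclass


@dataclass
class CountryTotals:
    country: str
    population: int  # e.g. a 2020 population estimate
    confirmed: int
    deaths: int
    recovered: int


def per_100k(count, population):
    """Scale a raw count to a rate per 100,000 residents."""
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    return count * 100_000 / population


def rates(totals):
    """Case, death and recovery rates per 100,000 people, rounded to 2 places."""
    return {
        "country": totals.country,
        "cases_per_100k": round(per_100k(totals.confirmed, totals.population), 2),
        "deaths_per_100k": round(per_100k(totals.deaths, totals.population), 2),
        "recovered_per_100k": round(per_100k(totals.recovered, totals.population), 2),
    }


sample = CountryTotals("Country A", population=2_000_000, confirmed=500, deaths=10, recovered=40)
print(rates(sample))
# {'country': 'Country A', 'cases_per_100k': 25.0, 'deaths_per_100k': 0.5, 'recovered_per_100k': 2.0}
```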
Finally, we can look at change over time, quickly calculating each day’s percent change from the previous day. This change points us to interesting patterns in our own data, which we discuss later in this post. Although Johns Hopkins and Google have since released the data as an official dataset, we still find the format we are using more powerful and useful.
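The day-over-day figure is (today − yesterday) / yesterday × 100. Here is a minimal Python sketch of that calculation over a cumulative series, using made-up counts; the series shape (a date-sorted list of `(date, count)` pairs) is an assumption for illustration.

```python
def day_over_day_pct(series):
    """Percent change of each day versus the previous day.

    `series` is a list of (date, cumulative_count) pairs sorted by date.
    The first day has no previous day, and a previous count of zero makes
    the change undefined, so both yield None.
    """
    changes = []
    previous = None
    for day, count in series:
        if previous is None or previous == 0:
            pct = None
        else:
            pct = round((count - previous) / previous * 100, 1)
        changes.append((day, pct))
        previous = count
    return changes


series = [("2020-03-25", 100), ("2020-03-26", 150), ("2020-03-27", 200)]
print(day_over_day_pct(series))
# [('2020-03-25', None), ('2020-03-26', 50.0), ('2020-03-27', 33.3)]
```

In BigQuery the same calculation could be written in SQL with the `LAG` window function, partitioned by location and ordered by date.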
This analysis lets us focus new data collection on places of need, or pull data from existing tasks to find common themes. As shown above, the last country on the list, Turkey, saw a massive 57.0% increase. Focusing on Turkey in the last few weeks’ data collection, we found that concern ramped up considerably: the gap between ‘Concerned’/‘Very Concerned’ and other answers took off around March 16 and widened even more around March 23.
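One way to quantify that gap is, for each day, the share of ‘Concerned’/‘Very Concerned’ answers minus the share of all other answers, in percentage points. A minimal sketch, assuming responses arrive as `(date, answer)` pairs; the “Slightly Concerned” and “Not Concerned” options and all sample responses are hypothetical.

```python
from collections import defaultdict

# Answers counted as "concerned"; everything else counts as "other".
CONCERNED = {"Concerned", "Very Concerned"}


def concern_gap(responses):
    """Per day, percentage-point gap between concerned and other answers."""
    counts = defaultdict(lambda: [0, 0])  # day -> [concerned, other]
    for day, answer in responses:
        counts[day][0 if answer in CONCERNED else 1] += 1
    gaps = {}
    for day in sorted(counts):
        concerned, other = counts[day]
        total = concerned + other
        gaps[day] = round((concerned - other) / total * 100, 1)
    return gaps


responses = [
    ("2020-03-15", "Concerned"), ("2020-03-15", "Not Concerned"),
    ("2020-03-16", "Very Concerned"), ("2020-03-16", "Concerned"),
    ("2020-03-16", "Slightly Concerned"),
]
print(concern_gap(responses))
# {'2020-03-15': 0.0, '2020-03-16': 33.3}
```

A rising value over successive days is the widening gap described above.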
We can also drill down to the county level in the U.S. For example, we noticed that case counts in Cook County, Illinois leveled off from March 23 to March 26, followed by a large increase on March 27.
We can then compare that with a new “Daily Diary” task we launched in the U.S. to track various sentiments alongside the case data above.
We have also collected data on social distancing around the world. Here is Cook County’s current snapshot.
Finally, we capture photos of a range of subjects, from parks and ATM locations to store shelves and the stock on them.
All of this data can be used to help fight COVID-19 and support communities around the world. Our hope is to help keep people informed, safe and healthy. We are happy to collaborate and share the data we have; please reach out (firstname.lastname@example.org) so we can work together.