[API] coronavirus


Access a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic through the coronavirus API.

Thierry Warin (https://warin.ca/aboutme.html), HEC Montréal and CIRANO (Canada) (https://www.hec.ca/en/profs/thierry.warin.html)
04-02-2020

Database description

The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data are pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.

A CSV version of the package dataset is available here.

A summary dashboard is available here.

Functions

This package gives access to a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The function below loads the dataset bundled with the package into your R session.

The function is detailed in this course and some examples are provided.

data("coronavirus")

This is a basic example which shows you how to get the data:

library(coronavirus)

data("coronavirus")

This coronavirus dataset has the following fields:

head(coronavirus) 
        date province     country      lat     long      type cases
1 2020-01-22          Afghanistan 33.93911 67.70995 confirmed     0
2 2020-01-23          Afghanistan 33.93911 67.70995 confirmed     0
3 2020-01-24          Afghanistan 33.93911 67.70995 confirmed     0
4 2020-01-25          Afghanistan 33.93911 67.70995 confirmed     0
5 2020-01-26          Afghanistan 33.93911 67.70995 confirmed     0
6 2020-01-27          Afghanistan 33.93911 67.70995 confirmed     0
tail(coronavirus)
             date province country     lat     long      type cases
150715 2020-07-26 Zhejiang   China 29.1832 120.0934 recovered     0
150716 2020-07-27 Zhejiang   China 29.1832 120.0934 recovered     0
150717 2020-07-28 Zhejiang   China 29.1832 120.0934 recovered     0
150718 2020-07-29 Zhejiang   China 29.1832 120.0934 recovered     0
150719 2020-07-30 Zhejiang   China 29.1832 120.0934 recovered     0
150720 2020-07-31 Zhejiang   China 29.1832 120.0934 recovered     0
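
Beyond head() and tail(), a quick way to get oriented is to inspect the loaded data frame directly. The sketch below uses only base R functions and assumes the coronavirus package is already installed (it is distributed on CRAN). The column names are taken from the output above and may differ in other versions of the package; the object names case_types and date_range are illustrative.

```r
# install.packages("coronavirus")  # uncomment if the package is not installed yet
library(coronavirus)

data("coronavirus")

# Column names and types of the dataset
str(coronavirus)

# Which kinds of cases are recorded in the `type` column
case_types <- unique(coronavirus$type)
print(case_types)

# First and last date covered by the installed version of the dataset
date_range <- range(coronavirus$date)
print(date_range)
```

The date range is worth checking first, since the data reflect the package version you have installed rather than the current day.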

Here is an example that summarizes total cases by country and type (top 20):

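library(dplyr)

# Sum the daily counts for each country and case type,
# then sort from largest to smallest total
summary_df <- coronavirus %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(desc(total_cases))

# Show the 20 largest totals
summary_df %>% head(20)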
# A tibble: 20 x 3
# Groups:   country [12]
   country        type      total_cases
   <chr>          <chr>           <int>
 1 US             confirmed     4562038
 2 Brazil         confirmed     2662485
 3 Brazil         recovered     2008854
 4 India          confirmed     1695988
 5 US             recovered     1438160
 6 India          recovered     1094374
 7 Russia         confirmed      838461
 8 Russia         recovered      637217
 9 South Africa   confirmed      493183
10 Mexico         confirmed      424637
11 Peru           confirmed      407492
12 Chile          confirmed      355667
13 Chile          recovered      328327
14 Mexico         recovered      327115
15 South Africa   recovered      326171
16 United Kingdom confirmed      304793
17 Iran           confirmed      304204
18 Colombia       confirmed      295508
19 Spain          confirmed      288522
20 Peru           recovered      283915
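
Summing the cases column yields totals because each row holds the count for a single date rather than a running total. A minimal sketch of the same grouping logic on a toy data frame may make this easier to follow. The data frame toy, its countries "A" and "B", and all of its numbers are made up for illustration; only the column layout mirrors the coronavirus dataset.

```r
library(dplyr)

# Toy data with made-up numbers, in the same long layout as the dataset
toy <- data.frame(
  date    = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                      "2020-01-01", "2020-01-02", "2020-01-03")),
  country = c("A", "A", "A", "B", "B", "B"),
  type    = "confirmed",
  cases   = c(1L, 2L, 3L, 0L, 5L, 5L),
  stringsAsFactors = FALSE
)

# Total per country and type, largest first
toy_totals <- toy %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  ungroup() %>%
  arrange(desc(total_cases))

# Running total per country, built from the daily counts
toy_running <- toy %>%
  arrange(country, date) %>%
  group_by(country) %>%
  mutate(cumulative_cases = cumsum(cases)) %>%
  ungroup()

print(toy_totals)
print(toy_running)
```

In this toy case the last cumulative value for each country equals its total, which is the relationship the summary relies on.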

Summary of new cases during the past 24 hours by country and type (as of 2020-07-31, the latest date in the dataset shown above):

library(tidyr)

# Keep the most recent date, total the cases per country and type,
# then spread the types into one column each
coronavirus %>%
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(desc(confirmed))
# A tibble: 188 x 4
# Groups:   country [188]
   country      confirmed death recovered
   <chr>            <int> <int>     <int>
 1 US               67023  1259     24005
 2 India            61242   793     39026
 3 Brazil           52383  1212     52047
 4 South Africa     11014   193     16570
 5 Colombia          9488   295      5692
 6 Mexico            8458   688      7015
 7 Peru              6809   205         0
 8 Argentina         5929   102      3184
 9 Russia            5468   161      8735
10 Iraq              3346    70      1888
# … with 178 more rows
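
The reshaping step is what turns one row per country and type into one row per country. A small sketch on a toy table shows the effect of pivot_wider() in isolation. The object names toy_long and toy_wide and all values are hypothetical; only the column names follow the pipeline above.

```r
library(dplyr)
library(tidyr)

# Toy long table with made-up numbers: one row per country and case type
toy_long <- data.frame(
  country     = c("A", "A", "A", "B", "B", "B"),
  type        = c("confirmed", "death", "recovered",
                  "confirmed", "death", "recovered"),
  total_cases = c(10L, 1L, 4L, 25L, 2L, 9L),
  stringsAsFactors = FALSE
)

# One column per case type, one row per country, sorted by confirmed cases
toy_wide <- toy_long %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(desc(confirmed))

print(toy_wide)
```

The new column names come from the values of type, so a dataset version that labels its case types differently would produce different columns.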

tl;dr

library(coronavirus)

data("coronavirus")

head(coronavirus) 
tail(coronavirus)

library(dplyr)

summary_df <- coronavirus %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(desc(total_cases))

summary_df %>% head(20)

library(tidyr)

coronavirus %>%
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(desc(confirmed))

Code learned this week

Command               Detail
data("coronavirus")   Get data for all coronavirus cases

References

This tutorial uses the coronavirus package, created by Rami Krispin.


Citation

For attribution, please cite this work as

Warin (2020, April 2). Thierry Warin, PhD: [API] coronavirus. Retrieved from https://warin.ca/posts/api-coronavirus/

BibTeX citation

@misc{warin2020[api],
  author = {Warin, Thierry},
  title = {Thierry Warin, PhD: [API] coronavirus},
  url = {https://warin.ca/posts/api-coronavirus/},
  year = {2020}
}