COVID-19 Analysis

Studying and Predicting the Progress of COVID-19 using Pandas and ARIMA

COVID-19 has been around for nearly 4 months since the outbreak. In this notebook, we will study some useful statistics on the number of confirmed, death, and recovered cases over time for each country/region. We will use the dataset that has been made publicly available on www.kaggle.com.

What Will You Learn?

– How to use Pandas to load .csv files
– How to check for attributes with missing values and, if necessary, drop those attributes
– How to generate useful statistics from the dataset
– How to use visualisation to better understand the patterns in the data
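As a rough preview of the first two items, the sketch below loads a CSV with Pandas and measures how much of each attribute is missing. The rows are made up for illustration and only mimic the layout of covid_19_data.csv; the helper names `missing_fraction` and `drop_sparse_columns` and the 0.4 cutoff are assumptions, not part of the dataset or of Pandas.

```python
import io

import pandas as pd

# Made-up rows that only mimic the column layout of covid_19_data.csv.
# The numbers are illustrative, not real figures.
SAMPLE_CSV = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Hubei,Mainland China,1/22/2020 17:00,10,1,0
2,01/22/2020,,Japan,1/22/2020 17:00,2,0,0
3,01/23/2020,Hubei,Mainland China,1/23/20 17:00,15,1,2
4,01/23/2020,,Japan,1/23/20 17:00,3,0,1
"""


def missing_fraction(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def drop_sparse_columns(df, threshold=0.4):
    """Keep only columns whose missing fraction is at most `threshold`.

    The 0.4 default is an arbitrary cutoff chosen for this sketch.
    """
    frac = df.isna().mean()
    return df.loc[:, frac <= threshold]


# With the real file this would be pd.read_csv("covid_19_data.csv").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(missing_fraction(df))
cleaned = drop_sparse_columns(df)
print(cleaned.columns.tolist())
```

In this toy sample only Province/State has gaps, so it is the only attribute removed. On the real file, whether to drop or keep a sparse attribute is a judgment call that depends on the analysis.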

What is inside the COVID-19 Dataset?

The main file in this dataset is covid_19_data.csv. Below is a summary of the attributes in the .csv file:

Sno – Serial number
ObservationDate – Date of the observation in MM/DD/YYYY
Province/State – Province or state of the observation (may be empty when missing)
Country/Region – Country of observation
Last Update – Time in UTC at which the row was updated for the given province or country (not standardised, so please clean it before use)
Confirmed – Cumulative number of confirmed cases till that date
Deaths – Cumulative number of deaths till that date
Recovered – Cumulative number of recovered cases till that date
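Because the counts are cumulative and reported per province, a common first step is to parse ObservationDate and sum the provinces within each country and date. The sketch below shows one way this might look, again on made-up rows that only follow the attribute layout above; the `NewConfirmed` column is a hypothetical name introduced here for the day-over-day change.

```python
import io

import pandas as pd

# Made-up rows following the attribute layout described above.
# The numbers are illustrative, not real figures.
SAMPLE_CSV = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Hubei,Mainland China,1/22/2020 17:00,10,1,0
2,01/22/2020,Guangdong,Mainland China,1/22/2020 17:00,4,0,0
3,01/22/2020,,Japan,1/22/2020 17:00,2,0,0
4,01/23/2020,Hubei,Mainland China,1/23/20 17:00,15,1,2
5,01/23/2020,Guangdong,Mainland China,1/23/20 17:00,6,0,1
6,01/23/2020,,Japan,1/23/20 17:00,3,0,1
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# ObservationDate is documented as MM/DD/YYYY, so parse it explicitly.
df["ObservationDate"] = pd.to_datetime(df["ObservationDate"], format="%m/%d/%Y")

# Sum the provinces within each country and date to get country-level totals.
count_columns = ["Confirmed", "Deaths", "Recovered"]
by_country = (
    df.groupby(["Country/Region", "ObservationDate"])[count_columns]
    .sum()
    .sort_index()
)

# The counts are cumulative, so the day-over-day difference within each
# country gives the new cases (NaN on each country's first date).
by_country["NewConfirmed"] = (
    by_country.groupby(level="Country/Region")["Confirmed"].diff()
)

print(by_country)
```

The Last Update attribute is left untouched here, since its format is not standardised and would need separate cleaning.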

Are Country Level Datasets Available?

Country-level datasets are also available. If you are interested in country-level data, please refer to the following Kaggle datasets:
– India
– South Korea
– Italy
– Brazil
– USA
– Switzerland
– Indonesia