Studying and Predicting the Progress of COVID-19 using Pandas and ARIMA
Nearly four months have passed since the COVID-19 outbreak began. In this notebook, we will study some useful statistics on the number of confirmed, death, and recovered cases over time for each country/region. We will use the publicly available dataset hosted on www.kaggle.com here.
What Will You Learn?
– How to use Pandas to load .csv files
– How to check for attributes with missing values and, if necessary, remove those attributes
– How to generate useful statistics from the dataset
– How to use visualisation to better understand the patterns in the data
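The first two steps (loading the .csv file and handling attributes with missing values) can be sketched with plain Pandas. This is a minimal illustration, not the notebook's own code: the helper names `load_covid_data`, `missing_value_report` and `drop_sparse_columns`, and the default 50% threshold, are hypothetical choices made for this example.

```python
import pandas as pd


def load_covid_data(path):
    """Load covid_19_data.csv (a path or any file-like object) into a DataFrame."""
    # Empty fields, such as a missing Province/State, are read as NaN by default.
    return pd.read_csv(path)


def missing_value_report(df):
    """Return the count and percentage of missing values for every column."""
    counts = df.isnull().sum()
    percent = 100 * counts / len(df)
    return pd.DataFrame({"missing": counts, "percent": percent})


def drop_sparse_columns(df, threshold=0.5):
    """Drop columns whose fraction of missing values exceeds `threshold`.

    The 0.5 default is an arbitrary, hypothetical cut-off; tune it to your needs.
    """
    keep = df.columns[df.isnull().mean() <= threshold]
    return df[keep]
```

A typical session would be `df = load_covid_data("covid_19_data.csv")`, then `print(missing_value_report(df))` to see which attributes (Province/State is the obvious candidate) are sparse, and finally `drop_sparse_columns(df)` only if the report shows that an attribute is too incomplete to be useful.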
What Is Inside the COVID-19 Dataset?
The main file in this dataset is covid_19_data.csv. Below is a summary of the attributes in this .csv file:
- Sno – Serial number
- ObservationDate – Date of the observation in MM/DD/YYYY
- Province/State – Province or state of the observation (Could be empty when missing)
- Country/Region – Country of observation
- Last Update – Time in UTC at which the row was updated for the given province or country (not standardised, so clean it before use)
- Confirmed – Cumulative number of confirmed cases till that date
- Deaths – Cumulative number of deaths till that date
- Recovered – Cumulative number of recovered cases till that date
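Because Confirmed, Deaths and Recovered are cumulative and may be split across several provinces, a per-country time series needs two steps: parse ObservationDate with its documented MM/DD/YYYY format, then sum over Province/State for each (country, date) pair. The sketch below shows one way to do this; the function names are hypothetical, and `daily_new_cases` assumes each country has one observation per consecutive day (a gap would make a difference span several days).

```python
import pandas as pd


def country_time_series(df):
    """Aggregate cumulative Confirmed/Deaths/Recovered per country and date."""
    df = df.copy()
    # ObservationDate is documented as MM/DD/YYYY.
    df["ObservationDate"] = pd.to_datetime(df["ObservationDate"], format="%m/%d/%Y")
    # Sum over Province/State so that each (country, date) pair has one row.
    totals = (
        df.groupby(["Country/Region", "ObservationDate"])[["Confirmed", "Deaths", "Recovered"]]
        .sum()
        .sort_index()
    )
    return totals


def daily_new_cases(totals):
    """Convert cumulative counts into day-over-day increments per country."""
    new = totals.groupby(level="Country/Region").diff()
    # The first day of each country has no previous value, so keep its cumulative count.
    return new.fillna(totals)
```

For example, `totals = country_time_series(df)` followed by `totals.loc["Italy"]` would give Italy's cumulative curve indexed by date, and `daily_new_cases(totals)` the corresponding daily increments, which are often easier to compare across countries.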
Are Country-Level Datasets Available?
Yes. If you are interested in country-level data, please refer to the following Kaggle datasets:
– India
– South Korea
– Italy
– Brazil
– USA
– Switzerland
– Indonesia
You can download the Jupyter Notebook for COVID-19 Analysis here: