911 Calls Capstone Project

The 911 Calls Capstone Project is my own exercise project from the Udemy course Python for Data Science and Machine Learning Bootcamp by Jose Portilla. In this project, I learned how to analyze and visualize a dataset to better understand 911 calls. I worked in Python, using the Anaconda distribution and Jupyter Notebook. The project analyzes 911 call data from Kaggle. The data contains the following fields:

lat : String variable, Latitude
lng : String variable, Longitude
desc : String variable, Description of the Emergency Call
zip : String variable, Zipcode
title : String variable, Title
timeStamp : String variable, YYYY-MM-DD HH:MM:SS
twp : String variable, Township
addr : String variable, Address
e : String variable, Dummy variable (always 1)

Project Intro/Objective

The purpose of this project is to show, step by step, how to analyze and visualize the dataset to better understand 911 calls and what prompts them.

Project Library

Numpy
Pandas
Matplotlib
Seaborn

Data and Setup

In this section, I show some basic information about the data, including the data frame info and the data frame head.

Data Visualization

In this section, I visualize the data with seaborn and look for anything unusual in it.

Reason Countplot

In this section, I show a countplot of the reasons for all 911 calls. Before doing this, I create a new feature from the title column by splitting off the reason (department) at the start of each title, which gives three values: EMS, Traffic, and Fire.
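
A minimal sketch of this feature-engineering step. The sample titles are made up for illustration and assume the "Reason: detail" shape described above; the column name Reason follows the name used later in this README.

```python
import pandas as pd

# Made-up sample rows, assuming titles look like "<reason>: <detail>".
df = pd.DataFrame({
    "title": [
        "EMS: BACK PAINS/INJURY",
        "Fire: GAS-ODOR/LEAK",
        "Traffic: VEHICLE ACCIDENT -",
        "EMS: CARDIAC EMERGENCY",
    ]
})

# Keep the text before the first colon as the reason (department).
df["Reason"] = df["title"].str.split(":").str[0].str.strip()

# Number of calls per reason, which is what the countplot displays.
reason_counts = df["Reason"].value_counts()
print(reason_counts)
```

With the Reason column in place, a call such as sns.countplot(x="Reason", data=df) would draw the countplot.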

Day of Week Countplot

In this section, I create a countplot of the Day of Week column with hue based on the Reason column. Because the timeStamp column is stored as a string, I first convert it to datetime objects and then derive the Hour, Month, and Day of Week columns from it. After that, I can plot the Day of Week countplot.
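
The conversion described above can be sketched as follows. The timestamps are made-up sample values in the YYYY-MM-DD HH:MM:SS format listed in the field description, and the day_names mapping is an illustrative helper, not necessarily the one used in the notebook.

```python
import pandas as pd

# Made-up sample timestamps stored as strings, as in the raw data.
df = pd.DataFrame({
    "timeStamp": [
        "2015-12-10 17:40:00",
        "2016-01-02 08:15:30",
        "2016-07-04 23:05:10",
    ]
})

# Convert the string column to datetime objects.
df["timeStamp"] = pd.to_datetime(df["timeStamp"])

# Derive the three time features from the datetime column.
df["Hour"] = df["timeStamp"].dt.hour
df["Month"] = df["timeStamp"].dt.month

# dt.dayofweek returns 0 for Monday through 6 for Sunday; map to short names.
day_names = {0: "Mon", 1: "Tue", 2: "Wed", 3: "Thu", 4: "Fri", 5: "Sat", 6: "Sun"}
df["Day of Week"] = df["timeStamp"].dt.dayofweek.map(day_names)

print(df)
```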

Month Countplot

In this section, I create a countplot of the Month column with hue based off of the Reason Column.
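
A small sketch of a countplot with hue, using made-up rows rather than the real data. The Agg backend is selected only so the example runs without a display; in a Jupyter notebook that line is not needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is required

import pandas as pd
import seaborn as sns

# Made-up sample rows with the two columns the plot needs.
df = pd.DataFrame({
    "Month": [1, 1, 2, 2, 2, 3],
    "Reason": ["EMS", "Fire", "EMS", "Traffic", "EMS", "Fire"],
})

# One group of bars per month, split and colored by reason.
ax = sns.countplot(x="Month", hue="Reason", data=df)
ax.set_title("911 calls per month by reason")
```

The Day of Week countplot follows the same pattern with x="Day of Week".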

Exploratory Data Analysis

From the month countplot, I can see that our data has missing values: some months do not appear at all. To examine the missing data, I can aggregate with pandas and then visualize the result using a simple plot and an lmplot (a 2D scatterplot with an optional overlaid regression line).

Create groupBy Data Using Month Column

In this section, I group the data by the Month column and use count as the aggregation. The grouped data frame can then be inspected with head().
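
A minimal sketch of the grouping step on made-up rows. The name byMonth and the check for absent months are illustrative assumptions, not taken from the notebook.

```python
import pandas as pd

# Made-up sample rows; only a few months are present on purpose.
df = pd.DataFrame({
    "Month": [1, 1, 2, 8, 8, 8, 12],
    "twp": ["A", "B", "A", "C", "A", "B", "C"],
    "Reason": ["EMS", "Fire", "EMS", "Traffic", "EMS", "EMS", "Fire"],
})

# Group by month and count the non-null entries in every other column.
byMonth = df.groupby("Month").count()
print(byMonth.head())

# Months with no rows at all do not appear in the grouped index.
missing_months = sorted(set(range(1, 13)) - set(byMonth.index))
print(missing_months)
```

Because count() skips null values, any single column (for example twp) can serve as the call count as long as it has no gaps of its own.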

groupBy Data Visualization

Furthermore, I can visualize the grouped data using a simple plot and an lmplot.

Both plots draw a line across the gap, which gives an estimate of the trend through months 9 to 11, the months missing from the data.

Visualize Available Data

For further understanding, we can visualize the available data by date and by department. First, I create a new column called date from the timestamp. Then I can plot the call counts by date, both overall and for each department.
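
The date-based aggregation can be sketched like this, again on made-up rows. The column name Date and the variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up sample rows with parsed timestamps and a reason per call.
df = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2015-12-10 17:40:00",
        "2015-12-10 18:05:00",
        "2015-12-11 09:00:00",
        "2015-12-11 10:30:00",
    ]),
    "Reason": ["EMS", "Traffic", "EMS", "EMS"],
})

# Drop the time of day so that calls on the same day share one key.
df["Date"] = df["timeStamp"].dt.date

# Total calls per date.
calls_per_date = df.groupby("Date").size()

# Calls per date within each reason (department).
calls_by_reason = df.groupby(["Reason", "Date"]).size()

print(calls_per_date)
print(calls_by_reason.loc["EMS"])
```

Calling .plot() on calls_per_date, or on the slice for a single reason, draws the corresponding line chart.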

HeatMap and ClusterMap Data

To help me read the data, I can use a HeatMap and a ClusterMap. Both make it easy to see where the calls concentrate, because related values are gathered into a single grid.

Using groupby, I create a new table from the day of week and hour columns and then count all the data.

Then I create HeatMap and ClusterMap using dayHour data frame.

Then using groupBy I create a new table using day of week and month column and then count all the data.
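
A sketch of how such a table can be built, assuming made-up rows. It uses size() followed by unstack(), which is one of several ways to count rows per cell and may differ from the notebook's exact call.

```python
import pandas as pd

# Made-up sample rows with the derived time columns.
df = pd.DataFrame({
    "Day of Week": ["Mon", "Mon", "Mon", "Sat", "Sat"],
    "Hour": [8, 8, 17, 8, 23],
    "Reason": ["EMS", "Fire", "EMS", "Traffic", "EMS"],
})

# Count calls per (day, hour) pair, then pivot the hours into columns.
# fill_value=0 marks day/hour combinations that have no calls.
dayHour = df.groupby(["Day of Week", "Hour"]).size().unstack(fill_value=0)
print(dayHour)
```

The resulting grid can be passed to sns.heatmap(dayHour) or sns.clustermap(dayHour). Replacing Hour with Month in the groupby gives the day-by-month table.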

From all of the HeatMaps and ClusterMaps, we can conclude that:

The number of 911 calls is high between hours 8 and 20.
The number of 911 calls is low between hours 22 and 6.
Months 1, 4, and 7 have the most 911 calls.
Months 8 and 12 have few 911 calls.
The highest number of 911 calls occurs on Saturdays in month 1.

Additional Resources

Header Backgrounds by Wallpaper Flare at wallpaperflare.com
Kaggle Original Challenge Source
For further explanation of the Python code, please check this link.