911 Calls Capstone Project is an exercise project I completed for the Udemy course Python for Data Science and Machine Learning Bootcamp by Jose Portilla. In it, I learned how to analyze and visualize a dataset to better understand 911 calls. I worked in Python (Anaconda distribution) using Jupyter Notebook. The project analyzes 911 call data from Kaggle, which contains the following fields:
- lat : String variable, Latitude
- lng : String variable, Longitude
- desc : String variable, Description of the Emergency Call
- zip : String variable, Zipcode
- title : String variable, Title
- timeStamp : String variable, YYYY-MM-DD HH:MM:SS
- twp : String variable, Township
- addr : String variable, Address
- e : String variable, Dummy variable (always 1)
Project Intro/Objective
The purpose of this project is to show, step by step, how to analyze and visualize the dataset to better understand 911 calls and what originates them.
Project Library
- Numpy
- Pandas
- Matplotlib
- Seaborn
Data and Setup
This section shows basic information about the data: the DataFrame info and the DataFrame head.
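As a rough sketch of this step, the snippet below loads the data with pandas and prints the info and head. The three rows are made-up placeholders that only mirror the field list above, and the file name `911.csv` mentioned in the comment is an assumption, not something taken from the project.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the Kaggle file.
# With the real data you would call pd.read_csv("911.csv") instead
# (the file name is an assumption).
sample_csv = io.StringIO(
    "lat,lng,desc,zip,title,timeStamp,twp,addr,e\n"
    "40.0,-75.0,Hypothetical call one,19000,EMS: SAMPLE DETAIL,2015-12-10 17:40:00,TOWNSHIP A,1 SAMPLE ST,1\n"
    "40.1,-75.1,Hypothetical call two,19001,Fire: SAMPLE DETAIL,2015-12-10 17:45:00,TOWNSHIP B,2 SAMPLE ST,1\n"
    "40.2,-75.2,Hypothetical call three,19002,Traffic: SAMPLE DETAIL,2015-12-11 08:05:00,TOWNSHIP A,3 SAMPLE ST,1\n"
)
df = pd.read_csv(sample_csv)

df.info()          # column names, non-null counts, and dtypes
print(df.head())   # first rows of the data frame
```

Note that `read_csv` leaves `timeStamp` as text unless told otherwise, which is why a later step converts it.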
Data Visualization
In this section, I visualize the data with seaborn and look for anything unusual in it.
Reason Countplot
This section shows a countplot of the reasons for all 911 calls. Before plotting, I create a new feature from the title column by splitting off the reason (department) part of each title, which yields three values: EMS, Traffic, and Fire.
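One possible way to derive the Reason feature is sketched below. It assumes each title has the form "Reason: detail", and the four titles are hypothetical examples rather than rows from the dataset.

```python
import pandas as pd

# Hypothetical titles, assuming the "Reason: detail" format.
df = pd.DataFrame({
    "title": [
        "EMS: SAMPLE DETAIL A",
        "Fire: SAMPLE DETAIL B",
        "Traffic: SAMPLE DETAIL C",
        "EMS: SAMPLE DETAIL D",
    ]
})

# Keep the text before the first colon as the reason (department).
df["Reason"] = df["title"].str.split(":").str[0]

# The numbers a countplot would draw as bars.
print(df["Reason"].value_counts())
```

Passing the frame to seaborn, for example `sns.countplot(x="Reason", data=df)`, draws these counts as one bar per reason.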
Day of Week Countplot
In this section, I create a countplot of the Day of Week column with hue based on the Reason column. Because the timeStamp column is stored as strings, I first convert it to datetime objects and then derive Hour, Month, and Day of Week columns from it. After that, I can plot the Day of Week countplot.
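The conversion and the plot could look roughly like the following. The three rows are hypothetical, the short day names in `dmap` are my own choice of labels, and the `Agg` backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical rows; timeStamp starts out as plain strings.
df = pd.DataFrame({
    "timeStamp": ["2015-12-10 17:40:00", "2015-12-11 08:05:00", "2016-01-02 23:15:00"],
    "Reason": ["EMS", "Traffic", "Fire"],
})

# Convert the strings to datetime so the .dt accessor becomes available.
df["timeStamp"] = pd.to_datetime(df["timeStamp"])

df["Hour"] = df["timeStamp"].dt.hour
df["Month"] = df["timeStamp"].dt.month

# dayofweek is an integer with Monday=0; map it to readable labels.
dmap = {0: "Mon", 1: "Tue", 2: "Wed", 3: "Thu", 4: "Fri", 5: "Sat", 6: "Sun"}
df["Day of Week"] = df["timeStamp"].dt.dayofweek.map(dmap)

# One group of bars per day, split by reason.
ax = sns.countplot(x="Day of Week", hue="Reason", data=df)
ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left")  # keep the legend off the bars
```

Swapping `x="Day of Week"` for `x="Month"` gives the Month countplot with the same hue.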
Month Countplot
In this section, I create a countplot of the Month column with hue based on the Reason column.
Exploratory Data Analysis
From the Month countplot, I can see that the data has gaps: some months are missing. To examine the missing data, I can use pandas and then visualize it with a simple line plot and an lmplot (a 2D scatterplot with an optional overlaid regression line).
Create groupBy Data Using Month Column
In this section, I group the data by the Month column and aggregate with a count, then inspect the resulting grouped data frame with head().
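A minimal sketch of the grouping step, using a made-up six-row frame (the values are hypothetical, and one township is deliberately left empty to show how the count behaves):

```python
import pandas as pd

# Hypothetical rows; one twp value is missing on purpose.
df = pd.DataFrame({
    "Month": [1, 1, 2, 12, 12, 12],
    "twp": ["A", None, "B", "A", "C", "C"],
    "Reason": ["EMS", "Fire", "EMS", "Traffic", "EMS", "Fire"],
})

# count() tallies the non-null values of every other column per month.
byMonth = df.groupby("Month").count()

print(byMonth.head())
```

Because `count()` skips missing values, the columns of `byMonth` can differ slightly from each other; here month 1 has two rows but only one non-null `twp`.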
groupBy Data Visualization
Furthermore, I can visualize the grouped data using a simple plot and an lmplot.
With the simple plot and the lmplot, we can see how the call counts trend across months 9 to 11, which are missing from the countplot.
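The two plots might be produced along these lines. The monthly counts are invented numbers, and the frame assumes months 9 to 11 are the ones absent; neither comes from the real dataset. The `Agg` backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical monthly counts, assuming months 9-11 are absent.
byMonth = pd.DataFrame(
    {"twp": [130, 114, 110, 113, 114, 117, 121, 90, 79]},
    index=pd.Index([1, 2, 3, 4, 5, 6, 7, 8, 12], name="Month"),
)

# Simple line plot: the line runs straight from month 8 to month 12,
# bridging the missing months.
ax = byMonth["twp"].plot()

# lmplot expects Month as a regular column, so reset the index first.
# It draws a scatterplot with a fitted regression line over the same range.
grid = sns.lmplot(x="Month", y="twp", data=byMonth.reset_index())
```

Both plots only interpolate over the gap; they suggest a trend for the missing months rather than recover the actual counts.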
Visualize Available Data
For further understanding, we can visualize the available data by date and by department. First, I create a new column called Date from the timeStamp column; then I plot the call counts by date and by department.
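As an illustration, the Date column and the per-department plots could be built as follows. The seven rows are hypothetical, and the sketch uses `size()` to count rows per group, which is a substitution of mine rather than necessarily what the notebook does.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical calls spread over three days.
df = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2015-12-10 17:40:00", "2015-12-10 18:00:00",
        "2015-12-11 08:05:00", "2015-12-11 09:00:00",
        "2015-12-12 10:00:00", "2015-12-12 11:00:00", "2015-12-12 12:00:00",
    ]),
    "Reason": ["EMS", "Traffic", "EMS", "Fire", "Traffic", "Fire", "EMS"],
})

# Drop the time of day so all calls on one day share a key.
df["Date"] = df["timeStamp"].dt.date

# Calls per day overall, then calls per day for each department.
daily_total = df.groupby("Date").size()
daily_by_reason = {
    reason: group.groupby("Date").size() for reason, group in df.groupby("Reason")
}

# One panel for the total and one per department, sharing the date axis.
fig, axes = plt.subplots(len(daily_by_reason) + 1, 1, sharex=True, figsize=(8, 8))
daily_total.plot(ax=axes[0], title="All calls")
for ax, (reason, counts) in zip(axes[1:], daily_by_reason.items()):
    counts.plot(ax=ax, title=reason)
fig.tight_layout()
```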
HeatMap and ClusterMap Data
To help me read the data, I can use a HeatMap and a ClusterMap. Both make related data easier to read, because related values are gathered in the same regions of the plot.
Using groupBy, I create a new table from the Day of Week and Hour columns and count the calls in each combination.
Then I create a HeatMap and a ClusterMap from the dayHour data frame.
Then, using groupBy again, I create a second table from the Day of Week and Month columns and count the calls in each combination.
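The reshaping behind these plots can be sketched as below. The rows are hypothetical, and `size()` with `unstack(fill_value=0)` is one way to get the counts matrix, not necessarily the exact calls used in the notebook. The `Agg` backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical calls, already reduced to the two grouping columns.
df = pd.DataFrame({
    "Day of Week": ["Sat", "Sat", "Sat", "Mon", "Mon", "Tue"],
    "Hour": [8, 8, 20, 8, 20, 20],
})

# Count calls per (day, hour) pair, then pivot hours into columns.
# fill_value=0 marks combinations with no calls as 0 instead of NaN.
dayHour = df.groupby(["Day of Week", "Hour"]).size().unstack(fill_value=0)

print(dayHour)

# Rows are days, columns are hours, colour encodes the call count.
ax = sns.heatmap(dayHour, cmap="viridis")
```

`sns.clustermap(dayHour)` accepts the same table and reorders rows and columns so that similar ones sit together; it requires SciPy to be installed. The Day of Week by Month table follows the same pattern with "Month" in place of "Hour".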
From the HeatMaps and ClusterMaps, we can conclude that:
- Call volume is high between hours 8 and 20.
- Call volume is low between hours 22 and 6.
- Months 1, 4, and 7 have the most 911 calls.
- Months 8 and 12 have few 911 calls.
- Saturdays in month 1 have the highest number of 911 calls.
Additional Resources
- Header Backgrounds by Wallpaper Flare at wallpaperflare.com
- Kaggle Original Challenge Source
- For further explanation of the Python code, please check this link.