What are the five steps of the analytic process?

STEP 1: Asking the right question(s)

Data wrangling has three sub-steps:

Gathering of data

  • API or Web Scraping — If the data needed is available on particular website(s), we can use the website's API (if available) or web scraping techniques to collect the data and store it in our local storage or databases. Data collected from the Internet is often stored in JSON format, so further processing is needed to convert it to the commonly used “.csv” format.
  • Databases — If the data required is available in our company's databases, we can use SQL queries to extract the data needed from them.
  • Sites like kaggle.com store data sets in appropriate formats for members to download for practice or competitions.
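
The two gathering routes above (JSON from the web, SQL from a database) can be sketched with Python's standard library alone. This is a minimal illustration, not a prescribed workflow: the JSON payload, the `passengers` table, and the helper names `json_records_to_csv` and `fetch_rows` are all made up for the example, and an in-memory SQLite database stands in for a company database.

```python
import csv
import io
import json
import sqlite3


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    # Collect every key that appears, preserving first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()


def fetch_rows(connection, query, params=()):
    """Run a SQL query and return each row as a dict keyed by column name."""
    cursor = connection.execute(query, params)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical JSON payload, as an API or scraper might return it.
sample_json = '[{"id": 1, "name": "Ann", "age": 29}, {"id": 2, "name": "Bob"}]'
csv_text = json_records_to_csv(sample_json)
print(csv_text)

# Hypothetical table in an in-memory database (assumption for the sketch).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE passengers (id INTEGER, name TEXT, age INTEGER)")
connection.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [(1, "Ann", 29), (2, "Bob", None)],
)
adults = fetch_rows(
    connection, "SELECT id, name FROM passengers WHERE age >= ?", (18,)
)
print(adults)
```

Nested JSON would need flattening before this conversion works; the sketch assumes each record is already a flat object.
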
Assessing of data

  • The number of rows and columns present in the data set
  • The columns present in the data set, along with the data type and number of non-null values of each
  • PassengerId — Unique ID of each passenger.
  • Survived — Binary feature (consisting of only 0 or 1 values) that indicates whether the passenger survived (0 = No, 1 = Yes).
  • Pclass — Indicates the socioeconomic status of the passenger (1 = Upper Class, 2 = Middle Class, 3 = Lower Class).
  • Name — Passenger name.
  • Sex — Male or Female.
  • Age — Age of the passenger.
  • SibSp — Number of siblings/spouses of the passenger aboard the Titanic.
  • Parch — Number of parents/children of the passenger aboard the Titanic.
  • Ticket — Ticket number of each passenger.
  • Fare — Fare paid by the passenger.
  • Cabin — Shows the cabin number allotted to each passenger.
  • Embarked — Shows the port of embarkation of the passenger (C = Cherbourg, Q = Queenstown, S = Southampton).
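
The assessment described above (row and column counts, then data type and non-null count per column) can be sketched as follows. The rows are invented and only mimic the column layout listed above; they are not taken from the real Titanic data set. The type inference is a deliberately crude assumption for illustration. With pandas, `DataFrame.shape` and `DataFrame.info()` would typically report the same overview.

```python
import csv
import io

# Made-up rows that only mimic the column layout described above;
# they are NOT real Titanic records.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Cabin
1,0,3,male,30,
2,1,1,female,40,X1
3,1,3,female,,
4,0,2,male,50,
"""


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    if not values:
        return "empty"
    for caster, label in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue
    return "str"


def assess(csv_text):
    """Return (row count, column count, per-column summary)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    summary = {}
    for column in columns:
        # Treat empty cells as missing (null) values.
        present = [row[column] for row in rows if row[column] != ""]
        summary[column] = {
            "non_null": len(present),
            "dtype": infer_type(present),
        }
    return len(rows), len(columns), summary


n_rows, n_cols, summary = assess(SAMPLE_CSV)
print(f"{n_rows} rows x {n_cols} columns")
for column, info in summary.items():
    print(f"{column:<12} {info['non_null']} non-null  {info['dtype']}")
```

Columns whose non-null count falls below the row count (here `Age` and `Cabin`) are the ones to look at first during cleaning.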

Data Cleaning

  1. Analyze, using visualization techniques, which gender was given more priority during the rescue operation.
  • Did the analysis answer my original question?
  • Was there any limitation in my analysis which would affect my conclusions?
  • Was the analysis sufficient to support decision-making?
  • Males had a higher chance of survival if they belonged to the upper class, were aged 0 to 4 or 18 to 50, and had 1 to 3 relatives traveling on board the Titanic.
  • Females had a higher chance of survival irrespective of their class, especially if they were aged 0 to 4 or 15 to 50 and had 0 to 4 relatives traveling on board the Titanic.
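
Findings like the ones above rest on comparing survival rates between groups, which is also what a bar chart of the result would plot. A minimal sketch of that grouping follows. The records are invented toy values, so the rates they produce say nothing about the real Titanic outcomes, and the helper `survival_rate_by` is a hypothetical name introduced for the example.

```python
from collections import defaultdict

# Made-up toy records for illustration only; NOT real Titanic data.
records = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]


def survival_rate_by(rows, *keys):
    """Return the share of survivors for each combination of the given keys."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group] += 1
        survivors[group] += row["Survived"]  # Survived is 0 or 1
    return {group: survivors[group] / totals[group] for group in totals}


by_sex = survival_rate_by(records, "Sex")
by_sex_class = survival_rate_by(records, "Sex", "Pclass")

for group, rate in by_sex.items():
    print(group, f"{rate:.2f}")
for group, rate in by_sex_class.items():
    print(group, f"{rate:.2f}")
```

Grouping by more than one key, as in `by_sex_class`, is how a statement such as "males in the upper class" can be checked against the data rather than against a single overall rate.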


A Fun Fact

