This page showcases three data-centric projects built with Python and Pandas. Each project explores a different data workflow and skill set. Click a project to visit it.
This project walks through the complete data cleaning lifecycle using the Pandas library. It covers inspecting the data, cleaning names, fixing formats, and preparing structured, analysis-ready data from a messy Excel file.
import pandas as pd
df = pd.read_excel('Customer Call List.xlsx')
df
We began by importing the Pandas library and loading a raw customer data Excel file stored locally. The initial data needed inspection to find missing values, inconsistencies, and noise.
Initial structure of the raw dataset (only the first six rows are displayed).
df.info()
df.describe()
df.isnull().sum()
Methods such as info(), describe(), and isnull() allowed us to identify problems such as null values, incorrect data types, and inconsistencies.
Output of the last statement, df.isnull().sum().
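The inspection step can be tried on a small made-up table. This is a minimal sketch: the `sample` DataFrame and its values are hypothetical stand-ins, not the real customer file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the real customer file
sample = pd.DataFrame({
    "First_Name": ["Ann", "Bob", None],
    "Phone_Number": ["123-456-7890", np.nan, "555/123/4567"],
    "Age": [34, 41, 29],
})

# Count the missing values in each column
missing_counts = sample.isnull().sum()
print(missing_counts)

# List the data type pandas inferred for each column
print(sample.dtypes)
```

Columns with a non-zero count in `missing_counts` are the ones that need attention in the later cleaning steps.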
df = df.drop_duplicates()
df = df.drop(columns = "Not_Useful_Column")
df
We removed duplicate entries and dropped unnecessary columns that did not add any value to our analysis.
Duplicate rows and unneeded columns removed.
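To check how many rows a deduplication removes, the same two calls can be run on a small example. The sketch below uses made-up data, and the `CustomerID` column is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data with one exact duplicate row
sample = pd.DataFrame({
    "CustomerID": [1001, 1002, 1002, 1003],
    "First_Name": ["Ann", "Bob", "Bob", "Cara"],
    "Not_Useful_Column": [True, False, False, True],
})

# Remove rows that are exact copies of an earlier row
deduped = sample.drop_duplicates()

# Drop the unneeded column; errors="ignore" keeps the call safe to re-run
trimmed = deduped.drop(columns="Not_Useful_Column", errors="ignore")

rows_removed = len(sample) - len(deduped)
print(f"Removed {rows_removed} duplicate row(s)")
```

By default drop_duplicates() compares all columns, so rows that differ in any field are kept.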
# Strip stray digits, dots, underscores, and slashes from both ends of each last name
df["Last_Name"] = df["Last_Name"].str.strip("123._/")
df
Last names were cleaned by stripping the stray digits, dots, underscores, and slashes attached to them.
Name column cleaned and standardized.
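The choice between str.lstrip() and str.strip() matters when stray characters can sit at either end of a name. A small sketch with made-up names shows the difference:

```python
import pandas as pd

# Hypothetical last names with stray characters at either end
names = pd.Series(["/White", "...Potter", "Schrute_", "123Flenderson", "Halpert"])

# lstrip removes the listed characters from the left side only
left_only = names.str.lstrip("123._/")

# strip removes them from both sides
both_sides = names.str.strip("123._/")

print(left_only.tolist())
print(both_sides.tolist())
```

The argument is a set of characters rather than a literal prefix, so any run of those characters at the ends of a value is removed.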
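# Remove every non-alphanumeric character from the phone numbers
df["Phone_Number"] = df["Phone_Number"].str.replace('[^a-zA-Z0-9]', '', regex = True)
# Convert every value to a string so that slicing works (missing values become 'nan')
df["Phone_Number"] = df["Phone_Number"].apply(lambda x: str(x))
# Reformat as 123-456-7890
df["Phone_Number"] = df["Phone_Number"].apply(lambda x: x[0:3] + '-' + x[3:6] + '-' + x[6:10])
# Blank out the leftovers of missing values ('nan--') and 'N/a'-style entries ('Na--' or 'na--')
df["Phone_Number"] = df["Phone_Number"].str.replace(r'^(?:nan|na)--$', '', regex = True, case = False)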
df
We formatted phone numbers into a standardized pattern for better readability.
Phone numbers standardized as 123-456-7890.
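The same formatting can also be wrapped in a single helper. The sketch below is an alternative under stated assumptions, not the approach used in this project: `format_phone` is a hypothetical function, the sample values are made up, and anything that does not contain exactly ten digits is blanked.

```python
import re
import pandas as pd

def format_phone(value):
    """Return a 10-digit number as 123-456-7890, or '' if it cannot be formatted."""
    if pd.isna(value):
        return ""
    # Keep digits only, whatever separators were used
    digits = re.sub(r"\D", "", str(value))
    if len(digits) != 10:
        return ""
    return f"{digits[0:3]}-{digits[3:6]}-{digits[6:10]}"

# Hypothetical phone values in mixed formats
phones = pd.Series(["123-545-5421", "123/643/9775", 7066950392, None, "N/a"])
formatted = phones.apply(format_phone)
print(formatted.tolist())
```

Validating the digit count up front avoids having to clean up placeholder strings after formatting.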
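# Split the address on commas into three separate columns
df[["Street_address", "State", "Zip_code"]] = df["Address"].str.split(',', expand = True)
# Replacing Y with Yes using str.replace (anchored so that an existing 'Yes' is left unchanged):
df["Paying Customer"] = df["Paying Customer"].str.replace(r'^Y$', 'Yes', regex = True)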
df
Addresses were split into separate fields (Street_address, State, and Zip_code) to allow more granular analysis. Inconsistent values such as 'Y' in the Paying Customer column were replaced with a single standard value, 'Yes'. The original 'Address' column was removed afterward.
Address column split into multiple structured fields, with inconsistent string values standardized.
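The split and the value replacement can be sketched end to end on made-up rows. The sample addresses are hypothetical, and the `n=2` argument and whole-value `replace` are illustrative choices rather than part of the original code.

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({
    "Address": ["123 Shire Lane, Shire", "93 West Main Street", "298 Drugs Driveway, Texas, 73301"],
    "Paying Customer": ["Y", "Yes", "N"],
})

# n=2 caps the split at three parts; shorter addresses are padded with missing values
parts = sample["Address"].str.split(",", n=2, expand=True)
sample[["Street_address", "State", "Zip_code"]] = parts

# Remove the spaces left behind after each comma
for col in ["State", "Zip_code"]:
    sample[col] = sample[col].str.strip()

# A plain substring replace also rewrites the 'Y' inside an existing 'Yes'
naive = sample["Paying Customer"].str.replace("Y", "Yes", regex=False)

# Series.replace with a dict matches whole values only
sample["Paying Customer"] = sample["Paying Customer"].replace({"Y": "Yes"})

# Drop the original column once the parts are extracted
sample = sample.drop(columns="Address")
print(sample)
```

Comparing `naive` with the final column shows why matching whole values is safer when the short and long forms share characters.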
This data cleaning process transformed raw, messy spreadsheet data into a well-structured format ready for further analysis. We used several data wrangling techniques, including deduplication, string cleaning, formatting, and standardizing, all with Pandas.
Project deployed on GitHub but not published as a live HTML page. (GitHub link below.)