🐍 Python & Pandas Projects

This page showcases three data-centric projects built using Python and Pandas. Each project explores a different data workflow and skill set.

🧼 Project 1: Data Cleaning with Pandas

This project walks through the complete data cleaning lifecycle with the Pandas library: inspecting the data, cleaning names, fixing formats, and turning a messy Excel file into structured, analysis-ready data.


Step 1: Importing Libraries & Loading the Local Excel File

import pandas as pd

df = pd.read_excel('Customer Call List.xlsx')

df

We began by importing the Pandas library and loading the raw customer data from an Excel file stored on our computer. The initial data then needed inspection to understand missing values, inconsistencies, and noise.

Initial structure of the raw dataset (only the first six rows are displayed).
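A six-row preview like the one in the screenshot can be produced with `head(6)`. The sketch below is a minimal illustration that assumes a small hypothetical DataFrame in place of 'Customer Call List.xlsx' (reading the real .xlsx file with `pd.read_excel` also needs the openpyxl package installed). The `preview` helper and the `CustomerID` column are hypothetical.

```python
import pandas as pd


def preview(frame: pd.DataFrame, n: int = 6) -> pd.DataFrame:
    """Print the frame's dimensions and return its first n rows."""
    rows, cols = frame.shape
    print(f"{rows} rows x {cols} columns")
    return frame.head(n)


# Hypothetical stand-in for the spreadsheet; the project itself uses
# df = pd.read_excel('Customer Call List.xlsx')
raw = pd.DataFrame({
    "CustomerID": range(1001, 1009),
    "Last_Name": ["Potter", "...Swanson", "/White", "Snow",
                  "Potter", "Holmes_", "Wayne", "Kent"],
})

top = preview(raw)  # prints "8 rows x 2 columns"
print(top)
```

Checking the dimensions alongside the preview makes it obvious how much of the dataset the six displayed rows represent.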


Step 2: Data Understanding - Strategies & Problem-Solving

df.info()

df.describe()

df.isnull().sum()

Calling info(), describe(), and isnull().sum() revealed problems such as null values, incorrect data types, and inconsistent formatting.

Output of the last statement, df.isnull().sum(): the number of missing values in each column.
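One way to combine the Step 2 checks into a single per-column report is sketched below on a small hypothetical sample. It also highlights a limitation: `isnull()` only counts real missing values (NaN/None), so placeholder strings such as "N/a" need a separate check. The placeholder spellings listed here are an assumption, not taken from the project's data.

```python
import numpy as np
import pandas as pd

# Hypothetical messy sample using column names from this project
raw = pd.DataFrame({
    "Last_Name": ["Potter", None, "/White", "Snow"],
    "Phone_Number": ["123-545-5421", np.nan, "N/a", "876|678|3469"],
    "Paying Customer": ["Yes", "Y", None, "No"],
})

# Assumed placeholder spellings that isnull() does not treat as missing
placeholders = ["N/a", "N/A", "na"]

report = pd.DataFrame({
    "dtype": raw.dtypes.astype(str),
    "missing": raw.isnull().sum(),                        # NaN/None per column
    "missing_pct": (raw.isnull().mean() * 100).round(1),  # share of rows missing
    "placeholders": raw.isin(placeholders).sum(),         # text stand-ins for missing
})
print(report)
```

In this sample, Phone_Number shows one real missing value and one placeholder, which a plain isnull().sum() would report as a single gap.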


Step 3: Removing Duplicates & Unnecessary Columns

df = df.drop_duplicates()

df = df.drop(columns = "Not_Useful_Column")

df

We removed duplicate entries and dropped unnecessary columns that did not add any value to our analysis.

Duplicate rows and the unnecessary column removed.
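The sketch below, on a hypothetical four-row sample, illustrates two details of Step 3. First, `drop_duplicates()` removes a row only when every column matches an earlier row (pass `subset=` to compare selected columns instead). Second, `errors="ignore"` lets the column drop run again without raising once the column is gone. The `reset_index` call is an optional extra, not part of the original steps.

```python
import pandas as pd

# Hypothetical sample with one fully duplicated row
raw = pd.DataFrame({
    "CustomerID": [1001, 1002, 1002, 1003],
    "Last_Name": ["Potter", "Swanson", "Swanson", "White"],
    "Not_Useful_Column": [True, False, False, True],
})

# Default behaviour: a row is dropped only if all columns match an earlier row
deduped = raw.drop_duplicates()

# errors="ignore" makes this line safe to re-run after the column is removed
cleaned = deduped.drop(columns="Not_Useful_Column", errors="ignore")

# Renumber rows 0..n-1 so the gap left by the dropped duplicate disappears
cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```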


Step 4: Cleaning & Standardizing Names

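# Preview what stripping each stray leading character would do (results are not saved)
df["Last_Name"].str.lstrip("...")

df["Last_Name"].str.lstrip("/")

df["Last_Name"].str.lstrip("_")

# Apply the fix: remove any leading '1', '2', '3', '.', '_' or '/' characters
df["Last_Name"] = df["Last_Name"].str.lstrip("123._/")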

df

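Stray leading characters (the digits 1, 2, and 3, periods, underscores, and forward slashes) were stripped from the Last_Name column. The first three calls only preview each fix; the final assignment applies all of them at once.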

Last_Name column after stray leading characters were stripped.
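A point that is easy to miss in Step 4: `str.lstrip` treats its argument as a set of characters rather than a literal prefix, so `lstrip("...")` behaves exactly like `lstrip(".")`, and `lstrip("123._/")` removes any run of those characters from the left. The comparison below uses hypothetical names; `str.strip` is shown as an alternative for stray characters at the end of a name, which `lstrip` leaves in place.

```python
import pandas as pd

# Hypothetical last names with stray characters at either end
names = pd.Series(["...Potter", "/White", "123Snow", "Swanson_", "Holmes"])

chars = "123._/"  # treated as a set of characters, not as a prefix
left_only = names.str.lstrip(chars)  # strips from the left end only
both_ends = names.str.strip(chars)   # strips from both ends

print(pd.DataFrame({"raw": names, "lstrip": left_only, "strip": both_ends}))
```

Here "Swanson_" keeps its trailing underscore under `lstrip` but not under `strip`.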


Step 5: Formatting Phone Numbers

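# Convert every entry to a string first: .str methods return NaN for non-string cells such as numbers stored as integers
df["Phone_Number"] = df["Phone_Number"].astype(str)

# Remove every character that is not a letter or digit (dashes, slashes, pipes, spaces)
df["Phone_Number"] = df["Phone_Number"].str.replace('[^a-zA-Z0-9]', '', regex=True)

# Re-insert dashes in the 123-456-7890 pattern
df["Phone_Number"] = df["Phone_Number"].apply(lambda x: x[0:3] + '-' + x[3:6] + '-' + x[6:10])

# Blank out leftovers from missing values ('nan--') and 'N/a'-style entries ('Na--'), ignoring case
df["Phone_Number"] = df["Phone_Number"].str.replace('nan--', '')

df["Phone_Number"] = df["Phone_Number"].str.replace('na--', '', case=False)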

df

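We removed separators from the phone numbers, reformatted them into a consistent 123-456-7890 pattern for readability, and blanked out entries that were missing or marked as not available.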

Phone numbers standardized as 123-456-7890.
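As an alternative to slicing first and deleting the 'nan--' leftovers afterward, the hypothetical `format_phone` helper below formats a value only when it contains exactly ten digits and returns an empty string otherwise. This is a swapped-in technique rather than the project's code, and the sample values are made up. It also assumes that a float such as 7066950392.0 (which appears when Pandas reads a numeric column that contains missing values) should be treated as the integer it represents.

```python
import re

import numpy as np
import pandas as pd


def format_phone(value) -> str:
    """Return 'XXX-XXX-XXXX' when value holds exactly ten digits, else ''."""
    if pd.isna(value):
        return ""
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # 7066950392.0 -> 7066950392, avoiding a stray '0'
    digits = re.sub(r"\D", "", str(value))  # keep digits only
    if len(digits) != 10:
        return ""  # missing, 'N/a', or malformed entries
    return f"{digits[0:3]}-{digits[3:6]}-{digits[6:10]}"


# Hypothetical raw values in mixed formats
phones = pd.Series(["123-545-5421", "123/643/9775", 7066950392,
                    np.nan, "N/a", "876|678|3469"])
formatted = phones.map(format_phone)
print(formatted.tolist())
```

Rejecting anything that is not exactly ten digits means malformed numbers come out blank instead of partially formatted; whether that is preferable depends on how the call list will be used.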


Step 6: Splitting Addresses & Standardizing Values

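# Split "street, state, zip" into three new columns; n=2 caps the split at three parts
df[["Street_address", "State", "Zip_code"]] = df["Address"].str.split(',', n=2, expand=True)

# Drop the original combined column now that its parts live in separate columns
df = df.drop(columns="Address")

# Replace whole "Y" values with "Yes"; Series.replace matches entire cell values, so existing "Yes" entries are untouched
df["Paying Customer"] = df["Paying Customer"].replace('Y', 'Yes')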

df

Addresses were split into separate Street_address, State, and Zip_code fields to allow more granular analysis, and the Paying Customer values were standardized so that every "Y" reads "Yes". The original Address column was removed afterward.

Address column split into structured fields and Paying Customer values standardized.
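The sketch below, on hypothetical addresses, shows a slightly more defensive version of Step 6. It strips the spaces left after each comma, pads the result to exactly three columns with `reindex` so the assignment still works when no address has all three parts, and uses `Series.replace`, which matches whole values. The last lines show why substring replacement is risky: `.str.replace("Y", "Yes")` turns an existing "Yes" into "Yeses". The "N" to "No" mapping is an assumption added for illustration.

```python
import pandas as pd

# Hypothetical sample with the project's column names
df = pd.DataFrame({
    "Address": ["12 Elm St, Ohio, 43001", "7 Oak Ave, Texas", "99 Pine Rd"],
    "Paying Customer": ["Yes", "Y", "No"],
})

parts = (
    df["Address"]
    .str.split(",", n=2, expand=True)    # at most three parts per address
    .apply(lambda col: col.str.strip())  # remove spaces left after commas
    .reindex(columns=range(3))           # guarantee exactly three columns
)
parts.columns = ["Street_address", "State", "Zip_code"]
df = pd.concat([df.drop(columns="Address"), parts], axis=1)

# Whole-value mapping: "Yes" and "No" are left untouched
df["Paying Customer"] = df["Paying Customer"].replace({"Y": "Yes", "N": "No"})

# Substring replacement rewrites every "Y", including the one inside "Yes"
pitfall = pd.Series(["Yes", "Y"]).str.replace("Y", "Yes", regex=False)
print(df)
print(pitfall.tolist())  # ['Yeses', 'Yes']
```

Addresses with fewer parts simply get missing values in the trailing columns instead of causing a column-count mismatch.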


✅ Conclusion

This data cleaning process transformed a messy spreadsheet of raw customer data into a well-structured format ready for further analysis. We used several data wrangling techniques, including deduplication, column removal, string stripping, phone number formatting, address splitting, and value standardization, all with built-in Pandas methods.

📊 Project 2: Data Visualization with Python

This project is hosted on GitHub but is not published as a live HTML page. The repository is linked below.

🔗 View Project

GitHub Repository - Visualization Project

🌐 Project 3: Web Scraping using Python

This project is hosted on GitHub but is not published as a live HTML page. The repository is linked below.

🔗 View Project

GitHub Repository - WebScraping Project