
Used Cars in Saudi Arabia (EDA)
Comets Members:
Turki Bintaleb
Saud Almuhaysh
Lujain Alomari
Wejdan Alhashim
Abdullah Ashmawi
TABLE OF CONTENTS

01 Market Opportunities
02 Collecting Data
03 Data Translation
04 Cleaning Dataset
05 Exploring Dataset
01
Market Opportunities
Market Opportunities for Used Cars in Saudi Arabia

● Most Saudi families own at least two cars
● Saudis use their cars for a range of activities
● No good transportation system
● Buyers are looking for better bargains
● The market is estimated to grow at a CAGR of 6.3% through 2025

[Chart: The Revenue of Used Cars Market in KSA. Revenue in $ billion by year: 16.8 (2019), 24.3 (2025)]

[Reference] http://glasgowconsultinggroup.com/market-opportunities-for-used-cars-in-saudi-arabia-2020/
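As a quick sanity check, the growth rate can be recomputed from the two revenue figures in the chart. This is only a sketch: it assumes 16.8 is the 2019 value and 24.3 is the 2025 value, and the function name `cagr` is ours, not something from the referenced report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Revenue figures read from the chart, in billions of dollars (assumed mapping).
rate = cagr(16.8, 24.3, 2025 - 2019)
print(f"Implied CAGR: {rate:.1%}")
```

Under these assumptions the implied rate comes out at roughly 6.3% per year, which agrees with the figure quoted on the slide.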
02
Collecting Data

● Use web scraping with Selenium to extract the data
● Source: Syarah (https://syarah.com/home)
● Inspect the page and locate the data
● Store the data in the required format
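The collection steps above might look roughly like the sketch below. It is not the team's scraper: the CSS selector argument, the `FIELDS` list, and both function names are hypothetical, and the real Syarah page structure has to be found by inspecting the page. Selenium is imported inside the function so the storage part can be tried without a browser.

```python
import csv
import io

FIELDS = ["Make", "Type", "Price"]  # hypothetical subset of the columns


def scrape_listing_texts(url, card_selector):
    """Return the visible text of every element matching card_selector."""
    # Imported lazily: requires the selenium package and a Chrome driver.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        cards = driver.find_elements(By.CSS_SELECTOR, card_selector)
        return [card.text for card in cards]
    finally:
        driver.quit()  # always release the browser


def write_rows(rows, handle):
    """Store parsed rows (dicts keyed by FIELDS) as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Storage step on one made-up row; no browser is needed for this part.
buffer = io.StringIO()
write_rows([{"Make": "Toyota", "Type": "Camry", "Price": "50000"}], buffer)
print(buffer.getvalue())
```

Turning each card's text into a row depends entirely on the page layout, so that parsing step is left out here.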
03
Data Translation
Our Translation Process

Step 01: Find Unique Values
Step 02: Create a New Dictionary
Step 03: Translate The Dictionary
Step 04: Translate The Dataset
Find all unique values from specific columns

COLUMN   Make   Type   Origin   Color   Options   Fuel Type   Gear Type   Condition   Region
Length   65     435    4        15      4         3           3           1           27

9 Columns
Add All Unique Values From Specific Columns To The Dictionary

[Screenshots: the code; the Arabic dictionary]
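One way to collect the unique values into a dictionary is sketched below. It uses plain Python on a list of row dicts; the helper names and the two sample rows are illustrative assumptions, not the code shown on the slide.

```python
def unique_counts(rows, columns):
    """Number of distinct non-missing values in each column."""
    return {
        column: len({row[column] for row in rows if row.get(column) is not None})
        for column in columns
    }


def build_dictionary(rows, columns):
    """Map every distinct value in `columns` to an empty translation slot."""
    dictionary = {}
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value is not None:
                dictionary.setdefault(value, "")  # filled in manually later
    return dictionary


# Two made-up rows: "Toyota" in white and in black.
rows = [
    {"Make": "تويوتا", "Color": "أبيض"},
    {"Make": "تويوتا", "Color": "أسود"},
]
counts = unique_counts(rows, ["Make", "Color"])
arabic_dictionary = build_dictionary(rows, ["Make", "Color"])
print(counts)
print(arabic_dictionary)
```

The empty strings are the slots that get filled by hand in the manual translation step.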


Translate The Dictionary From Arabic To English Manually

Why Didn't We Use An External Library For Translation?

- The Number Of Unique Values Is Small, Except In The Type Column
- For The Type Column, A Translation Library Can't Translate The Values Correctly
- Performance
English Dictionary
Using The Dictionary To Translate The Dataset

Old Value New Value


Before Translation

After Translation
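Applying the dictionary amounts to replacing each old value with its new value in the chosen columns. The sketch below assumes rows stored as dicts and a hand-filled `english_dictionary`; both the function name and the sample data are illustrative.

```python
def translate_rows(rows, columns, dictionary):
    """Return copies of `rows` with values in `columns` mapped via `dictionary`."""
    translated = []
    for row in rows:
        new_row = dict(row)  # leave the original row untouched
        for column in columns:
            if column in new_row:
                old_value = new_row[column]
                # Values missing from the dictionary are kept as they are.
                new_row[column] = dictionary.get(old_value, old_value)
        translated.append(new_row)
    return translated


# Hand-filled dictionary: old (Arabic) value -> new (English) value.
english_dictionary = {"تويوتا": "Toyota", "أبيض": "White"}
before = [{"Make": "تويوتا", "Color": "أبيض", "Price": 50000}]
after = translate_rows(before, ["Make", "Color"], english_dictionary)
print(after)
```

Keeping unknown values unchanged makes it easy to spot entries that still need a translation.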
04
Cleaning Dataset

Purpose
The purpose of data cleaning is to clean up the ‘Used Cars (Syarah)’ dataset by
handling missing values and removing out-of-place characters.

Dataset
Number Of Columns = 14
Number Of Rows = 8248
Process
Step 01: Remove rows that contain "على السوم" (negotiable) in the Price column
Step 02: Change the data type of the Price column
Step 03: Remove duplicate rows based on repeated values in specific columns
Step 04: Identify and handle the missing values
Step 05: Fill NA in the Origin column with "Unknown"
Step 06: Define the Clean_NA function, which cleans NaN by returning the mode of target_column for rows with the same Make and Type (used in the next steps)
Step 07: Fill NA in the Options column with the Clean_NA function
Step 08: Fill NA in the Engine_Size column with the Clean_NA function
Step 09: Fill NA in the Gear_Type column with the Clean_NA function
Step 10: Result after cleaning NaN values
Step 11: Drop the remaining rows with NaN
Step 12: Drop the Link and Condition columns (final shape of the dataset)
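The cleaning steps can be sketched end to end as below. This is a plain-Python stand-in, not the team's code: rows are dicts, missing values are `None`, prices are assumed to be digit strings, the lowercase `clean_na` mirrors the Clean_NA idea (mode within the same Make and Type), and the sample rows are made up.

```python
from collections import Counter

NEGOTIABLE = "على السوم"  # marker that appears instead of a numeric price


def drop_negotiable_and_cast(rows):
    """Steps 01-02: drop rows whose Price holds the marker, then cast to int."""
    kept = [r for r in rows if NEGOTIABLE not in str(r["Price"])]
    return [{**r, "Price": int(r["Price"])} for r in kept]


def drop_duplicates(rows, key_columns):
    """Step 03: keep the first row for each combination of key_columns."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(r[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def fill_constant(rows, column, value):
    """Step 05: replace missing entries in `column` with a fixed value."""
    return [{**r, column: value} if r[column] is None else r for r in rows]


def clean_na(rows, target_column, group_columns=("Make", "Type")):
    """Steps 06-09: fill missing target_column with the mode of its group."""
    modes = {}
    for r in rows:
        if r[target_column] is not None:
            key = tuple(r[c] for c in group_columns)
            modes.setdefault(key, Counter())[r[target_column]] += 1
    filled = []
    for r in rows:
        if r[target_column] is None:
            counts = modes.get(tuple(r[c] for c in group_columns))
            if counts:  # stays None when the group has no known value
                r = {**r, target_column: counts.most_common(1)[0][0]}
        filled.append(r)
    return filled


def drop_na_rows(rows):
    """Step 11: drop rows that still contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_columns(rows, columns):
    """Step 12: remove columns that are no longer needed."""
    return [{k: v for k, v in r.items() if k not in columns} for r in rows]


# Illustrative rows only (not taken from the Syarah dataset).
raw = [
    {"Make": "Toyota", "Type": "Camry", "Origin": None, "Gear_Type": "Automatic", "Price": "50000", "Link": "a"},
    {"Make": "Toyota", "Type": "Camry", "Origin": "Saudi", "Gear_Type": None, "Price": "48000", "Link": "b"},
    {"Make": "Toyota", "Type": "Camry", "Origin": "Saudi", "Gear_Type": None, "Price": "48000", "Link": "c"},
    {"Make": "Kia", "Type": "Rio", "Origin": "Saudi", "Gear_Type": None, "Price": "20000", "Link": "d"},
    {"Make": "Ford", "Type": "Focus", "Origin": "Saudi", "Gear_Type": "Manual", "Price": NEGOTIABLE, "Link": "e"},
]
cleaned = drop_negotiable_and_cast(raw)
cleaned = drop_duplicates(cleaned, ["Make", "Type", "Origin", "Gear_Type", "Price"])
cleaned = fill_constant(cleaned, "Origin", "Unknown")
cleaned = clean_na(cleaned, "Gear_Type")
cleaned = drop_na_rows(cleaned)
cleaned = drop_columns(cleaned, ["Link", "Condition"])
print(cleaned)
```

In this sample the Kia row is dropped in Step 11 because no other Kia Rio row supplies a Gear_Type to take the mode from.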
The Dataset on
05
Exploring Dataset
Top cars

Brand       Count
Toyota      1270
Hyundai     719
Ford        512
Chevrolet   424
Nissan      362
Kia         268
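A brand count like the one above can be produced with a frequency counter over the Make column. The sketch below uses `collections.Counter` on a handful of made-up rows; the function name `top_brands` is an assumption, and the sample is not the real dataset.

```python
from collections import Counter


def top_brands(rows, n):
    """Return the n most frequent Make values with their counts."""
    return Counter(row["Make"] for row in rows).most_common(n)


# Tiny illustrative sample, not the real dataset.
makes = ["Toyota", "Hyundai", "Toyota", "Ford", "Toyota", "Hyundai"]
rows = [{"Make": make} for make in makes]
top = top_brands(rows, 2)
print(top)
```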
Most Popular Cars By Brand
Price Comparison for Top Brands
Count of Cars for Each Origin, Fuel Type, Color, and Region
Thank You ☺
Any Questions?
Comets Team
