Team AN


I. INTRODUCTION
1. Reasons for choosing this topic
2. Why do we need to predict car prices
3. Attributes description
II. ANALYZE DATASET
1. Reading and Understanding the Data
2. Data Cleaning and Preparation
3. Visualizing the data
● Visualizing Categorical Data
● Visualizing Numerical Data
4. Deriving new features
5. Bivariate Analysis
6. Model Building
7. Checking the model with a library
8. Prediction and Evaluation
III. SUMMARY
1. Conclusion
2. Contribution comments

Title

Acknowledgement

Table of contents
Acknowledgement
Table of contents
List of figures and tables
Participants
Chapter 1. INTRODUCTION
1.1. Reasons for choosing this topic
1.2. Predict car prices
1.3. Attributes description
1.4. Conclusion
Chapter 2. Analyze Dataset
2.1. Reading and Understanding Data
2.1.1. Data set
2.1.2. Data cleaning and preparation
2.1.3. Fixing invalid values
2.2. Visualize data
2.3. Deriving new features
2.4. Bivariate analysis
2.5. Building model

List of figures and tables
Figure 1.1. Car
Table 1.1. Attributes
Figure 1.2. Old car
Figure 2.1. Read CSV
Figure 2.1
Figure 2.2
Figure 2.?. data

Participants

Chapter 1. INTRODUCTION
1.1. Reasons for choosing this topic
We chose the topic of car price prediction because of the rapid growth of the
automotive industry, which plays a pivotal role in the global economy. Understanding
car prices gives insight into a significant segment of the market.
Predicting car prices is essential for both buyers and sellers, because the automotive
market changes continually. Future fluctuations in car pricing are of keen interest to
our group and many others. In this report, we aim to construct a predictive model for
car prices using simple linear regression.

Figure 1.1. Car

1.2. Predict car prices


Car prices are constantly changing. Today's market offers a wide variety of
vehicle types, each with its own features and characteristics. Companies seeking to
enter the market must understand the factors that influence vehicle prices to gain a
competitive edge.
Likewise, individuals considering buying a vehicle should understand why a
particular model is priced as it is, in order to assess its value. Therefore,
comprehensive price surveys are needed, and based on this data we aim to construct a
predictive model for car prices to understand the pricing dynamics of a new market.

1.3. Attributes description
Table 1.1. Attributes

No  Attribute         Description                                                 Type
1   Car_ID            Unique ID of each observation                               Integer
2   Symboling         Assigned insurance risk rating: +3 indicates that the car   Categorical
                      is risky, -3 that it is probably pretty safe
3   carCompany        Name of the car company                                     Categorical
4   fueltype          Fuel type of the car, i.e. gas or diesel                    Categorical
5   aspiration        Aspiration used in the car                                  Categorical
6   doornumber        Number of doors                                             Categorical
7   carbody           Body type of the car                                        Categorical
8   drivewheel        Type of drive wheel                                         Categorical
9   enginelocation    Location of the engine                                      Categorical
10  wheelbase         Wheelbase of the car                                        Numeric
11  carlength         Length of the car                                           Numeric
12  carwidth          Width of the car                                            Numeric
13  carheight         Height of the car                                           Numeric
14  curbweight        Weight of the car without occupants or baggage              Numeric
15  enginetype        Type of engine                                              Categorical
16  cylindernumber    Number of cylinders in the engine                           Categorical
17  enginesize        Size of the engine                                          Numeric
18  fuelsystem        Fuel system of the car                                      Categorical
19  boreratio         Bore ratio of the engine                                    Numeric
20  stroke            Stroke of the engine (distance the piston travels)          Numeric
21  compressionratio  Compression ratio of the engine                             Numeric
22  horsepower        Horsepower                                                  Numeric
23  peakrpm           Peak engine speed (rpm)                                     Numeric
24  citympg           Mileage in the city (miles per gallon)                      Numeric
25  highwaympg        Mileage on the highway (miles per gallon)                   Numeric
26  price             Price of the car (dependent variable)                       Numeric

Figure 1.2. Old car

1.4. Conclusion

Chapter 2. Analyze Dataset
2.1. Reading and Understanding Data
2.1.1. Data set

Figure 2.1. Read CSV


2.1.2. Data cleaning and preparation

2.1.3. Fixing invalid values

2.2. Visualize data

Figure 2.1

Figure 2.2.

We can see that:

- Toyota is the most common car company in the dataset, likely because Toyota cars are cheaper than those of other companies.

- Most cars use gas as their fuel type, as gasoline is more common and cheaper than diesel.

- Sedan is the most common body type because it is affordable and maneuverable.
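The observations above come from counting how often each category appears. A minimal sketch of that count in plain Python, assuming each car is a dict keyed by the attribute names of Table 1.1; the function name most_common_category and the sample rows are hypothetical and illustrative, not records from the dataset:

```python
from collections import Counter


def most_common_category(rows, column):
    """Return (category, count) for the most frequent value of `column`.

    `rows` is assumed to be a list of dicts, one per car, keyed by the
    attribute names of Table 1.1 (e.g. carCompany, fueltype, carbody).
    """
    counts = Counter(row[column] for row in rows)
    return counts.most_common(1)[0]


# Illustrative rows only, not real records from the dataset.
sample = [
    {"carCompany": "toyota", "fueltype": "gas", "carbody": "sedan"},
    {"carCompany": "toyota", "fueltype": "gas", "carbody": "hatchback"},
    {"carCompany": "honda", "fueltype": "diesel", "carbody": "sedan"},
]
print(most_common_category(sample, "carCompany"))  # ('toyota', 2)
```

With pandas, `df[column].value_counts()` gives the full frequency table that a count plot visualizes.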

2.3. Deriving new features
a. citympg is the mileage of a car in miles per gallon (MPG) in urban driving, and
highwaympg is its mileage (MPG) on the highway.
Because the two variables are closely related, we combine them into a single equation
that represents the fuel economy of a car:
fueleconomy = 0.5*citympg + 0.5*highwaympg

The weights can take different values, for example because the size of a gallon differs
between countries. In this report, we choose 0.5 for city mpg and 0.5 for highway
mpg.
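A minimal sketch of this fuel-economy feature as a weighted average of citympg and highwaympg. The weights are exposed as parameters because other weightings are possible; the defaults are the 0.5 and 0.5 chosen in this report. The function name fuel_economy and the mileage values are hypothetical:

```python
def fuel_economy(citympg, highwaympg, city_weight=0.5, highway_weight=0.5):
    """Weighted average of city and highway mileage (miles per gallon).

    The default weights (0.5 and 0.5) are the ones chosen in this report.
    """
    return city_weight * citympg + highway_weight * highwaympg


print(fuel_economy(24, 30))  # 27.0
```

Applied to every row, this produces the fueleconomy column used later as the independent variable of the regression.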

Besides fuel economy, customers also consider a car's speed when buying it.
Therefore, horsepower and peak rpm are very important. We use these two
independent variables to create an optimal performance equation.

This gives us two new columns in the dataset.

b. After deriving the new features, we show the car range by dividing the car
price into three groups: Budget, Medium and High End.
1. Budget: car prices from 0 to 10000
2. Medium: car prices from 10000 to 20000
3. High End: car prices from 20000 to 40000
Therefore, we have a new label that shows the car range.
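A minimal sketch of this grouping as a plain Python function; the name car_range is hypothetical. The report does not say which group a boundary price such as 10000 belongs to, so this sketch assumes right-closed ranges (0, 10000], (10000, 20000] and (20000, 40000], and returns None for prices outside them:

```python
def car_range(price):
    """Map a car price to the report's car-range label.

    Assumption: ranges are right-closed, so a price of exactly 10000 is
    "Budget" and exactly 20000 is "Medium". Prices outside (0, 40000]
    fall in none of the stated groups and return None.
    """
    if 0 < price <= 10000:
        return "Budget"
    if 10000 < price <= 20000:
        return "Medium"
    if 20000 < price <= 40000:
        return "High End"
    return None


print(car_range(15000))  # Medium
```

In pandas, the same right-closed bins correspond to `pd.cut(df["price"], bins=[0, 10000, 20000, 40000], labels=["Budget", "Medium", "High End"])`, assuming `df` holds the dataset.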

2.4. Bivariate analysis


In this part, we show the relationship among three variables in each of two scatter plots:
a. fuel economy, price and drive wheel
b. optimal performance, price and engine type
The results are as follows.

In the first scatter plot, most points (fuel economy by drive wheel) lie between
20 and 35.
More specifically:
1. When fuel economy is between 28 and 35, most cars are fwd, and their prices
range from 7000 to 14000.
2. When fuel economy is between 20 and 25, most cars are rwd, and their prices are
higher than those of the first group (from 15000 to 25000).
3. There is also a small group of high-priced cars with the lowest fuel economy
(only 15 to 20), so these cars consume the most fuel.
In conclusion, the higher the price, the lower the fuel economy.
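The ranges above were read off the scatter plot. A minimal sketch of how the same ranges could be summarised numerically for each drive-wheel group, assuming each car is a dict with drivewheel, fueleconomy and price keys; the function name ranges_by_group and the sample values are hypothetical and illustrative only:

```python
from collections import defaultdict


def ranges_by_group(rows, group_col, value_cols):
    """For each group in `group_col`, return the (min, max) of each value column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    summary = {}
    for name, members in groups.items():
        summary[name] = {
            col: (min(r[col] for r in members), max(r[col] for r in members))
            for col in value_cols
        }
    return summary


# Illustrative rows only, not real records from the dataset.
sample = [
    {"drivewheel": "fwd", "fueleconomy": 30.0, "price": 8000},
    {"drivewheel": "fwd", "fueleconomy": 33.5, "price": 12000},
    {"drivewheel": "rwd", "fueleconomy": 22.0, "price": 18000},
]
print(ranges_by_group(sample, "drivewheel", ["fueleconomy", "price"]))
```

With pandas, `df.groupby("drivewheel")[["fueleconomy", "price"]].agg(["min", "max"])` gives an equivalent table. Note that min and max are sensitive to outliers, so a scatter plot remains useful for seeing where most points lie.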
In the second scatter plot, most points lie between 0.013 and 0.022 in optimal
performance.

More specifically:
1. Cars with the ohc engine type are concentrated between 0.013 and 0.018, and
their prices are quite low, only from 5000 to 12000.
2. Most points in this scatter plot belong to the ohc engine type, so it is the most
common engine type.
3. The highest price and the highest optimal performance occur with the ohcv
engine type.
Prediction: cars with engine types priced from 5000 to 15000 may be Toyota cars,
because according to the visualization above, Toyota prices also range from 5000 to
15000.
2.5. Building model
Now, we build a model to predict the car price. Here, we forecast the price from
fuel economy by constructing a simple linear regression.
A straight line has the form y = ax + b, which is the same form as simple linear
regression.
We define:
+ the independent variable: X (fueleconomy)
+ the dependent variable: y (price)
The equation uses theta_0 and theta_1 to represent the intercept (free parameter)
and the slope of the line.
=> linear regression: y = h(x) = theta_0 + theta_1*X
Our aim is to choose theta_0 and theta_1 so that the distance from the points to the
regression line, measured by the loss function, is as small as possible.
Next, we show the formulas used to calculate the loss function, theta_0 and
theta_1:
1. Loss function:

where m is the number of observations (values of X) in the dataset.


We then differentiate the loss function with respect to theta_0 and theta_1 to find its minimum value.
2. Theta_0 and theta_1:

After calculating the slope and the free parameter, we obtain the fitted values of theta_0 and theta_1.
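The loss and parameter formulas were shown as images that are not reproduced in this text. For reference, a common form of the squared-error loss for h(x) = theta_0 + theta_1*x, and the closed-form solution obtained by setting both partial derivatives to zero, is sketched below. The 1/(2m) scaling is an assumption; any positive constant gives the same minimiser, so theta_0 and theta_1 do not depend on it.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right)^2

\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right) = 0,
\qquad
\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right) x_i = 0

\theta_1 = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{m} (x_i - \bar{x})^2},
\qquad
\theta_0 = \bar{y} - \theta_1 \bar{x}
```

Here \bar{x} and \bar{y} are the means of X (fueleconomy) and y (price) over the m observations.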

We now have enough information to draw the regression line. However, we first
show a scatter plot to better understand the distribution of the data:

Figure 2.?. data


Finally, we fit the linear regression line to the data.
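A minimal sketch of this fitting step in plain Python, implementing the standard closed-form ordinary least squares formulas for theta_0 and theta_1 (we assume these match the formulas used in this report). The function names are hypothetical, the loss uses an assumed 1/(2m) scaling, and the four points are toy values, not the car dataset:

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares fit of y = theta_0 + theta_1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    theta_1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    theta_0 = y_mean - theta_1 * x_mean
    return theta_0, theta_1


def predict(theta_0, theta_1, x):
    """h(x) = theta_0 + theta_1 * x"""
    return theta_0 + theta_1 * x


def loss(theta_0, theta_1, xs, ys):
    """Squared-error loss with an assumed 1/(2m) scaling."""
    m = len(xs)
    return sum((predict(theta_0, theta_1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Toy points lying exactly on y = 30000 - 500x (illustrative, not the dataset).
xs = [20.0, 25.0, 30.0, 35.0]
ys = [20000.0, 17500.0, 15000.0, 12500.0]
theta_0, theta_1 = fit_simple_linear_regression(xs, ys)
print(theta_0, theta_1)  # 30000.0 -500.0
```

On the real data, xs would be the fueleconomy column and ys the price column. The closed form avoids choosing a learning rate, which gradient descent would need; for a single feature, scikit-learn's `LinearRegression` (with its default intercept) returns the same coefficients, which is one way to check the model with a library.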
