Team AN
I. INTRODUCTION
1. Reasons for choosing this topic
2. Why do we need to predict car prices
3. Attributes description
II. ANALYZE DATASET
1. Reading and Understanding the Data
2. Data Cleaning and Preparation
3. Visualizing the data
● Visualizing Categorical Data
● Visualizing Numerical Data
4. Deriving new features
5. Bivariate Analysis
6. Model Building
7. Checking the model with a library
8. Prediction and Evaluation
III. SUMMARY
1. Conclusion
2. Contribution comments
Title
Acknowledgement
Table of contents
Acknowledgement
Table of contents
List of figures and tables
Participants
Chapter 1. INTRODUCTION
1.1. Reasons for choosing this topic
1.2. Predict car prices
1.3. Attributes description
1.4. Conclusion
Chapter 2. Analyze Dataset
2.1. Reading and Understanding Data
2.1.1. Data set
2.1.2. Data cleaning and preparation
2.1.3. Fixing invalid values
2.2. Visualize data
2.3. Deriving new features
2.4. Bivariate analysis
2.5. Building model
List of figures and tables
Figure 1.1. Car
Table 1.1. Attributes
Figure 1.2. Old car
Figure 2.1. Read CSV
Figure 2.1
Figure 2.2
Figure 2.?. data
Participants
Chapter 1. INTRODUCTION
1.1. Reasons for choosing this topic
We chose the topic of car price prediction because of the rapid growth of the
automotive industry, which plays a pivotal role in the global economy. Understanding
car prices gives deep insight into a significant segment of the market.
Predicting car prices is essential for both buyers and sellers, because the automotive
market changes continually. Future fluctuations and changes in car pricing are of keen
interest to our group and to many others. In this report, we aim to build a predictive
model for car prices using simple linear regression.
1.3. Attributes description
Table 1.1. Attributes
16 cylindernumber Number of cylinders in the car (Categorical)
1.4. Conclusion
Chapter 2. Analyze Dataset
2.1. Reading and Understanding Data
2.1.1. Data set
2.1.3. Fixing invalid values
Figure 2.1
Figure 2.2.
We can see that:
- Toyota is the most common car company in the dataset, likely because Toyota cars are cheaper than those of other companies.
- Most cars use gas as their fuel type, as gasoline is more common and cheaper than diesel.
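The two observations above come down to counting how often each value appears. This is a minimal sketch under stated assumptions. It assumes each car's model is stored as a single name string such as "toyota corolla", with the company as the first word. It also lowercases names so that spellings like "Toyota" and "toyota" count together. The helper names and the sample rows are hypothetical and do not come from the report's dataset.

```python
from collections import Counter


def company_from_car_name(car_name: str) -> str:
    """Assumption: the company is the first word of a name like 'toyota corolla'."""
    return car_name.split()[0].lower()


def most_common(values):
    """Return the most frequent value and its share of all rows."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)


# Hypothetical rows for illustration only (not the report's data).
car_names = ["toyota corolla", "toyota corona", "honda civic", "bmw 320i", "Toyota celica"]
fuel_types = ["gas", "gas", "gas", "diesel", "gas"]

companies = [company_from_car_name(name) for name in car_names]
print(most_common(companies))   # ('toyota', 0.6)
print(most_common(fuel_types))  # ('gas', 0.8)
```

With pandas, `Series.value_counts(normalize=True)` gives the same shares for every value at once.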
2.3. Deriving new features
a. As we can see, city mpg represents fuel economy (miles per gallon, MPG) in
urban driving, and highway mpg represents fuel economy (MPG) on the highway.
Because the two variables are strongly related, we combine them into a single
equation that represents the fuel economy of a car.
Because the size of a gallon differs among countries, the weights in this equation
can differ as well. In this report, we choose a weight of 0.5 for city mpg and 0.5 for
highway mpg, so fuel economy = 0.5 × city mpg + 0.5 × highway mpg.
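This is a minimal sketch of the fuel economy feature, using the 0.5/0.5 weights chosen above. The function name `fuel_economy` is hypothetical. The weights stay as parameters because, as noted above, other values may suit other definitions of the gallon. The inputs are assumed to be the city mpg and highway mpg of a single car.

```python
def fuel_economy(city_mpg: float, highway_mpg: float,
                 city_weight: float = 0.5, highway_weight: float = 0.5) -> float:
    """Weighted combination of city and highway MPG (0.5/0.5 by default, as in the report)."""
    return city_weight * city_mpg + highway_weight * highway_mpg


# Hypothetical values for illustration.
print(fuel_economy(24, 30))  # 27.0
```

Applying the function to every row, for example with a list comprehension over the two columns, produces the new feature column.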
Besides fuel economy, customers also consider speed when buying cars.
Therefore, horsepower and peak rpm are very important. Using these two
independent variables, we create an optimal performance equation:
We now have two new features in the dataset:
b. After deriving the new features, we show the car range by dividing car prices
into three groups: Budget, Medium and High End.
1. Budget: price from 0 to 10000
2. Medium: price from 10000 to 20000
3. High End: price from 20000 to 40000
As a result, we have a new label that shows the car range:
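The ranges above share their boundary values, so a rule is needed for prices of exactly 10000 or 20000. This sketch assumes that each boundary belongs to the lower group, giving the intervals (0, 10000], (10000, 20000] and (20000, 40000]. That is also the default of `pandas.cut(prices, bins=[0, 10000, 20000, 40000], labels=["Budget", "Medium", "High End"])`. The function name `price_range` is hypothetical. Unlike `pandas.cut`, which returns NaN outside the bins, this version raises an error for prices outside the ranges the report defines.

```python
def price_range(price: float) -> str:
    """Map a car price to Budget / Medium / High End.

    Assumption: each boundary belongs to the lower group, i.e. the bins are
    (0, 10000], (10000, 20000] and (20000, 40000].
    """
    if price <= 0 or price > 40000:
        raise ValueError(f"price {price} is outside the ranges defined in the report")
    if price <= 10000:
        return "Budget"
    if price <= 20000:
        return "Medium"
    return "High End"


# Hypothetical prices for illustration.
for p in (6500, 10000, 16500, 35000):
    print(p, price_range(p))
```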
b. Optimal performance, price and engine type (scatter plot)
We obtain the following result:
In this scatter plot of fuel economy by drive wheel type, most points have a fuel
economy between 20 and 35.
In more detail:
1. When fuel economy is between 28 and 35, front-wheel-drive (fwd) cars account for
most of the points, and prices range from about 7000 to 14000.
2. When fuel economy is between 20 and 25, rear-wheel-drive (rwd) cars account for
most of the points, and prices in this group are higher than in the first group (about
15000 to 25000).
3. There is also a small group of high-priced cars. These cars have the lowest fuel
economy (only 15 to 20), which means they consume the most fuel.
In conclusion, the higher the price, the lower the fuel economy.
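One way to put a number on "the higher the price, the lower the fuel economy" is the Pearson correlation coefficient between the two variables. A negative value supports the conclusion. The report does not say it computed this, so the following is only a sketch with hypothetical data. Python 3.10 and later also provide `statistics.correlation` for the same calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (price, fuel economy) pairs, for illustration only.
prices = [7000, 9000, 13000, 18000, 24000, 36000]
economy = [34, 31, 29, 24, 21, 17]
r = pearson_r(prices, economy)
print(round(r, 2))  # negative: higher price goes with lower fuel economy
```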
The second scatter plot:
After calculating the slope and the intercept (free parameter), we obtain the following result:
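The report does not show how the slope and the intercept were calculated. For simple linear regression they are usually found by ordinary least squares:

slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
intercept = ȳ − slope · x̄

The sketch below assumes that method. The function names and the sample data are hypothetical, and which feature serves as x in the report is not stated here. Python 3.10 and later also offer `statistics.linear_regression(x, y)`, which returns the slope and the intercept.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx  # the line passes through the point of means
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted y for a new x on the fitted line."""
    return intercept + slope * x


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # 2.0 1.0
```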
Now we have enough information to draw the regression line. However, we first
show a scatter plot to better understand the distribution of the data: