
Used Car Price Prediction


Nikhil N, 01FB15ECS188
J Shiv Santosh, 01FB15ECS131
Mohammad Fahad, 01FB15ECS173

Goal
Prediction
Anyone selling a used car wants to know the best price at which they can sell it, and
anyone looking to buy a secondhand car wants to get the best deal available.
We answer these questions based on 7 aspects of a car. The model can be used to
predict the approximate price of a car with the corresponding features.

Analysis
We show how each specification of a car affects its secondhand price in the market.
With this model, a buyer can avoid overpaying for a car and a seller can avoid being
underpaid for theirs.

Data
Our data is gathered from a German website and reflects 10 years of data for over
40 companies.

● Brand

○ Since our data has 40 companies, we would require either 39 dummy variables or
40 separate models, one for each company. We chose to build a separate model.

○ We have chosen BMW as our brand of interest.

○ The data we used contains the details of 24568 BMW cars.

● Vehicle Type

○ One of cabrio, coupe, bus, hatchback, SUV or sedan.

○ We have used 5 dummy variables for the six types. A variable has the value 1 if
the car is of that type, and all the others are 0. A bus is represented by all five
variables being 0.

● Year Of Registration

○ The year in which the car was bought and registered.

○ Years range from 1997 to 2016.

■ We have limited the oldest year to 1997 because including cars older than
that would make the model imprecise: the importance of features changes
over time. We would also have to consider antiquity as a key feature,
which is not possible with a database that mostly contains cars from the
21st century.

● Gearbox

○ Either automatic or manual.

○ We have used 1 dummy variable, “Automatic”: a value of 1 indicates that the car
has an automatic transmission, and 0 indicates a manual gearbox.

● Power

○ Power (in PS) of the car, up to 1000 PS.

○ We have limited the data to cars with a maximum power of 1000 PS because the
most powerful engine ever produced by BMW is in the M7, which itself has a power
of around 600 PS. Hence, anything above 1000 PS is either incorrect data or a
heavily modified car.

● Distance

○ Number of kilometers a car has run.

○ It is not a continuous variable; each range of distance is represented by a single number.



■ The platform from which we gathered the data gives the user a choice between
ranges of distance travelled (0-5000 km, 5000-10000 km, 10000-20000 km, etc.).

■ The data has a count of cars for each range. For example, there are 164 cars
whose distance travelled is between 0 and 5000 kilometers, so in the data
the value ‘5000’ has a count of 164.

● Fuel Type

○ We have used one dummy variable, “Petrol”.

○ If “Petrol” is 0, the car runs on diesel. If “Petrol” is 1, the car runs on petrol.

● Not Repaired/Damaged

○ We have used one dummy variable “Damaged”.

○ If “Damaged” is 1, the car is damaged or has some problem that has not been
repaired.

Process
● Filter data to match requirements.
○ Brand - BMW
○ Maximum Power - 1000 PS
○ Year Of Registration - 1997 onwards
● Create dummy variables
○ 5 variables (cabrio, coupe, suv, hatchback, sedan) for vehicle type.
○ Damaged
○ Petrol
○ Automatic
● Apply Multiple Linear Regression
○ Our dependent variable is “price”.
○ There are 11 explanatory variables.
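
The filtering and dummy-variable steps above can be sketched roughly as follows. This is only an illustration using the Python standard library, not the code used for the report: the raw field names (`brand`, `vehicleType`, `gearbox`, `fuelType`, `notRepairedDamage`) and the three sample records are assumptions made up for the example.

```python
# Hypothetical raw records; field names and values are invented for illustration.
RAW = [
    {"brand": "bmw", "vehicleType": "sedan", "yearOfRegistration": 2008,
     "gearbox": "automatic", "powerPS": 177, "kilometer": 150000,
     "fuelType": "diesel", "notRepairedDamage": "no"},
    {"brand": "bmw", "vehicleType": "bus", "yearOfRegistration": 1995,
     "gearbox": "manual", "powerPS": 115, "kilometer": 150000,
     "fuelType": "petrol", "notRepairedDamage": "yes"},
    {"brand": "audi", "vehicleType": "coupe", "yearOfRegistration": 2012,
     "gearbox": "manual", "powerPS": 211, "kilometer": 60000,
     "fuelType": "petrol", "notRepairedDamage": "no"},
]

# Five dummies for six vehicle types; "bus" is the baseline (all five are 0).
TYPE_DUMMIES = ("cabrio", "coupe", "suv", "hatchback", "sedan")


def keep(rec):
    """Filter step: BMW only, at most 1000 PS, registered 1997 onwards."""
    return (rec["brand"] == "bmw"
            and rec["powerPS"] <= 1000
            and rec["yearOfRegistration"] >= 1997)


def encode(rec):
    """Turn one raw record into the 11 explanatory variables."""
    row = {
        "yearOfRegistration": rec["yearOfRegistration"],
        "powerPS": rec["powerPS"],
        "kilometer": rec["kilometer"],
    }
    for vehicle_type in TYPE_DUMMIES:
        row[vehicle_type] = 1 if rec["vehicleType"] == vehicle_type else 0
    row["Damaged"] = 1 if rec["notRepairedDamage"] == "yes" else 0
    row["Petrol"] = 1 if rec["fuelType"] == "petrol" else 0
    row["Automatic"] = 1 if rec["gearbox"] == "automatic" else 0
    return row


rows = [encode(rec) for rec in RAW if keep(rec)]
for row in rows:
    print(row)
```

Only the first sample record survives the filter here: the second is registered before 1997 and the third is not a BMW.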

Model
Price = (Intercept) + ( yearOfRegistration * β₁ ) + ( powerPS * β₂ ) + ( kilometer * β₃ ) +
( Damaged * β₄ ) + ( Petrol * β₅ ) + ( coupe * β₆ ) + ( hatchback * β₇ ) + ( sedan * β₈ ) +
( cabrio * β₉ ) + ( suv * β₁₀ ) + ( Automatic * β₁₁ )

Coefficients          Estimate     Std. Error   t value    Pr(>|t|)
(Intercept)           -1757000     23620        -74.363    < 2e-16
yearOfRegistration    883.0        11.71        75.375     < 2e-16
powerPS               61.78        0.7355       83.994     < 2e-16
kilometer             -0.08124     0.001304     -62.293    < 2e-16
Damaged               -1760        143.9        -12.233    < 2e-16
Petrol                -662.9       92.04        -7.202     6.10e-13
coupe                 -2625        828.4        -3.169     0.00153
hatchback             -4359        824.0        -5.289     1.24e-07
sedan                 -3942        820.8        -4.803     1.57e-06
cabrio                -963.0       826.3        -1.165     0.24385
suv                   -2151        843.5        -2.550     0.01078
Automatic             477.2        86.61        5.510      3.63e-08

Residual standard error    5622 on 24555 degrees of freedom
Multiple R-squared         0.6778
Adjusted R-squared         0.6777
F-statistic                4696 on 11 and 24555 DF
p-value                    < 2.2e-16
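
As a rough illustration of how the fitted model is used, the sketch below plugs the estimates from the table into the model equation. The function name `predict_price` and the example car are assumptions for illustration. Because the table reports rounded estimates (the intercept in particular is rounded to four significant figures), the result can be off by a few hundred euros from what the unrounded model would give.

```python
# Estimates copied from the coefficient table (rounded as reported).
COEF = {
    "(Intercept)": -1757000,
    "yearOfRegistration": 883.0,
    "powerPS": 61.78,
    "kilometer": -0.08124,
    "Damaged": -1760,
    "Petrol": -662.9,
    "coupe": -2625,
    "hatchback": -4359,
    "sedan": -3942,
    "cabrio": -963.0,
    "suv": -2151,
    "Automatic": 477.2,
}


def predict_price(car):
    """Intercept plus the sum of coefficient * value; missing variables count as 0."""
    price = COEF["(Intercept)"]
    for name, beta in COEF.items():
        if name != "(Intercept)":
            price += beta * car.get(name, 0)
    return price


# Hypothetical car: 2010 diesel sedan, automatic, 150 PS, 100000 km, not damaged.
example = {
    "yearOfRegistration": 2010,
    "powerPS": 150,
    "kilometer": 100000,
    "sedan": 1,
    "Automatic": 1,
}
print(round(predict_price(example), 1))  # about 15508 euros with these rounded estimates
```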

Interpretation
From the table we can see that the p-values of all the variables except “cabrio” are
less than 0.05.

The p-value of the variable “cabrio” is 0.2438, which is greater than the standard
threshold of 0.05 and indicates that there is very weak evidence against the null
hypothesis that its coefficient is zero.
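
The t values and p-values in the table can be checked approximately from the estimates and standard errors. The sketch below uses a normal approximation to the t distribution, which is an assumption that should be reasonable here because there are 24555 degrees of freedom. The helper name `two_sided_p` is made up for this example.

```python
import math


def two_sided_p(t):
    """Two-sided p-value under a standard normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))


# t value = estimate / standard error, using the rounded values from the table.
t_cabrio = -963.0 / 826.3
t_suv = -2151 / 843.5

print(round(t_cabrio, 3), round(two_sided_p(t_cabrio), 4))
print(round(t_suv, 3), round(two_sided_p(t_suv), 4))
```

With these inputs, “cabrio” comes out well above the 0.05 threshold and “suv” below it, in line with the table.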

From the graph, we can see that quite a few cars of type cabrio are very expensive,
which may be the reason for the large p-value for “cabrio”.

Data Visualization & Model Comparison


Vehicle Type

Coefficients    Estimate
(Intercept)     -1757000
coupe           -2625
hatchback       -4359
sedan           -3942
cabrio          -963.0
suv             -2151

The graph above is a histogram of the number of cars of each type. The green point
indicates the average price of that type.
The numbers on the Y-axis apply to both count and price: the measure is a count for the
histogram and Euros for the points. The histogram does not show anything for the bus
count because the number of buses in our data is very low compared to the other types.

From the graph we can see that buses have the highest average price. Since bus is the
baseline type and the coefficients of all the other vehicle types are negative, a bus is
likely to have a higher price than a vehicle of any other type with otherwise identical features.

Inference Since hatchback has the lowest coefficient, we can say that on average
hatchbacks are the least expensive type of car available. Hence, a person looking to buy a
car cheaply is more likely to find a suitable one among hatchbacks.
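
The ordering behind this inference can be reproduced by sorting the vehicle-type coefficients. The sketch below is illustrative only; it assigns the baseline type "bus" a coefficient of 0, which follows from bus having no dummy variable of its own.

```python
# Vehicle-type estimates from the table; "bus" is the baseline, so its offset is 0.
TYPE_COEF = {
    "coupe": -2625,
    "hatchback": -4359,
    "sedan": -3942,
    "cabrio": -963.0,
    "suv": -2151,
    "bus": 0,
}

# Sort from the lowest predicted price offset to the highest.
ranking = sorted(TYPE_COEF, key=TYPE_COEF.get)
print(ranking)  # hatchback first, bus last
```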

Damage

Coefficient    Estimate
Damaged        -1760

The above histogram shows the counts of damaged and repaired/not damaged cars.

It is clearly seen that the average price of a damaged car is much lower than that of a
car that is not damaged. The negative coefficient indicates that the predicted price is
1760€ lower if the vehicle is damaged than if it is not.

The histogram on the left shows the proportion of vehicles of each type that are damaged.
The proportion of sedans and hatchbacks that are damaged is comparatively higher than
that of the other types.

Inference A buyer may have to be more cautious when checking for damage if they choose
to buy a sedan or a hatchback. From a seller's perspective, it may be worth repairing any
damage to the car if the repair costs less than 1000-1500€, since a damaged car sells for
around 1700-1800€ less than a repaired car of the same specs.

Fuel & Gearbox

Coefficient    Estimate
Petrol         -662.9

The above graph is a histogram of the number of cars of each fuel type. The point
indicates the average price of each fuel type.

It is seen that the average price of a diesel-powered vehicle is greater. The negative
coefficient of -662.9 indicates that diesel-powered vehicles have more value than
petrol-powered vehicles.

Coefficient    Estimate
Automatic      477.2

The graph above is a histogram of the number of cars of each transmission type. The
point indicates the average price of each.

It is seen that, on average, a vehicle with automatic transmission is likely to be more
expensive than one with manual transmission. The coefficient of the “Automatic” variable,
477.2, indicates that for two vehicles with identical specifications but different types
of gearbox, the model predicts a price 477.2€ higher for the one with automatic transmission.

Inference A seller may charge more if the vehicle is a diesel variant and/or has an
automatic transmission.
From a buyer's perspective, since both features add to the cost, the buyer can use this
model to prioritize between them. For example, a person who prefers a diesel vehicle can
save some money by settling for a manual transmission.

The table below shows how the predicted price changes, on average, for each combination
of the two features, relative to a manual diesel car.

Manual & Petrol    Automatic & Petrol    Manual & Diesel    Automatic & Diesel
-662.9             -185.7                0                  477.2
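
The four values in this table follow directly from the “Petrol” and “Automatic” coefficients. A minimal sketch, with variable names chosen only for this example:

```python
# Estimates from the coefficient table.
PETROL = -662.9
AUTOMATIC = 477.2

# Offset for each combination, relative to the baseline (manual gearbox, diesel).
offsets = {}
for gearbox, is_automatic in (("Manual", 0), ("Automatic", 1)):
    for fuel, is_petrol in (("Petrol", 1), ("Diesel", 0)):
        label = f"{gearbox} & {fuel}"
        offsets[label] = round(is_automatic * AUTOMATIC + is_petrol * PETROL, 1)

for label, value in offsets.items():
    print(label, value)
```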



Year Of Registration

The bar graph above shows the number of cars sold each year. It can be seen that, on
average, a person is more likely to sell their car after 9-12 years.

Coefficient           Estimate
yearOfRegistration    883.0

The above graph indicates the average price of vehicles for each year.
It is clearly seen that the newer the car, the more likely it is to cost more.

A positive coefficient (883) indicates that newer cars are given more value than older
ones.

Inference On average, a car's value decreases by around 800-900€ every year.

Distance

Coefficient    Estimate
kilometer      -0.08124

The graph below contains the average prices of cars for each distance travelled.

As the distance travelled by a car increases, its value decreases. The graph below
clearly agrees, since the points go lower as we move along the X-axis.

The negative coefficient of the variable “kilometer” indicates that as the variable's
value increases, the price predicted by the model decreases.

The reason we included the histogram with the count of cars in the same graph is
to explain the abnormally small value for 5000 km.

Looking at the histogram, we can see that the number of cars which have travelled
5000 kilometers or less is extremely low. Because there are so few data points,
the graph shows an unexpected value there.

Inference For every 10000 kilometers a car travels, its value decreases by 800-900€ on
average.
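
The 800-900€ figure comes from scaling the per-kilometer coefficient, and the per-year figure in the Year Of Registration section arises the same way. The sketch below works through the arithmetic. The helper `price_change` is a made-up name, and the calculation assumes all other variables are held fixed.

```python
# Estimates from the coefficient table.
COEF = {"kilometer": -0.08124, "yearOfRegistration": 883.0}


def price_change(variable, delta):
    """Change in predicted price when one variable changes by delta, others held fixed."""
    return COEF[variable] * delta


per_10000_km = price_change("kilometer", 10000)          # roughly -812 euros
per_year_older = price_change("yearOfRegistration", -1)  # -883 euros

print(round(per_10000_km, 1), round(per_year_older, 1))
```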

Power
Coefficient    Estimate
powerPS        61.78

The graph below contains the price of a car corresponding to its power (in PS).

The graph below shows that as the power of a car increases, its price also
increases. The positive coefficient of the “powerPS” variable agrees with this.
