Analysis On Car Resale Price
─
Nikhil N, 01FB15ECS188
J Shiv Santosh, 01FB15ECS131
Mohammad Fahad, 01FB15ECS173
Goal
Prediction
A person selling a used car commonly wants to know the best price at which it can be
sold, while a person looking to buy a secondhand car wants the best deal available.
We answer these questions based on 7 aspects of a car. The model can be used to
predict the approximate price from the corresponding features.
Analysis
We show how each specification of a car affects its secondhand price in the market.
With this model, a buyer can avoid overpaying for a car and a seller can avoid being
underpaid for theirs.
Data
Our data was gathered from a website in Germany and reflects 10 years of data for
over 40 companies.
● Brand
● Vehicle Type
○ We have used 5 dummy variables for the six types. A variable's value is 1 if the
vehicle is of that type, and all the others are 0. A vehicle of the sixth type has
all five variables set to 0.
● Year Of Registration
● Gearbox
● Power
○ We limited the data to cars with a maximum power of 1000 PS because the most
powerful engine ever produced by BMW is in the M7, which itself has a power of
around 600 PS. Hence, anything above 1000 PS is either incorrect data or a
heavily modified car.
● Distance
■ The data has a count of cars for each range. For example, there are 164 cars
whose distance travelled is between 0 and 5000 kilometers. The data is
built in such a way that ‘5000’ is given the value of 164.
● Fuel Type
● Not Repaired/Damaged
Process
● Filter data to match requirements.
○ Brand - BMW
○ Maximum Power - 1000 PS
○ Year Of Registration - 1997 onwards
● Create dummy variables
○ 5 variables (cabrio, coupe, suv, hatchback, sedan) for vehicle type.
○ Damaged
○ Petrol
○ Automatic
● Apply Multiple Linear Regression
○ Our dependent variable is “price”.
○ There are 11 explanatory variables.
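The filtering and dummy-variable steps above can be sketched as follows. This is only an illustration under stated assumptions: the field names (`brand`, `power_ps`, `vehicle_type`, and so on), the category labels, and the sample records are hypothetical, since the report does not list the dataset's actual column names.

```python
# Field names and category labels below are assumptions for illustration;
# the report does not list the dataset's actual column names.
TYPE_DUMMIES = ["cabrio", "coupe", "suv", "hatchback", "sedan"]  # sixth type: all five are 0


def keep_row(row):
    """Apply the three filters from the Process list."""
    return (
        row["brand"] == "bmw"
        and row["power_ps"] <= 1000
        and row["year_of_registration"] >= 1997
    )


def encode_row(row):
    """Replace the categorical fields with 0/1 dummy variables."""
    encoded = {name: int(row["vehicle_type"] == name) for name in TYPE_DUMMIES}
    encoded["damaged"] = int(row["damaged"])
    encoded["petrol"] = int(row["fuel_type"] == "petrol")
    encoded["automatic"] = int(row["gearbox"] == "automatic")
    return encoded


def make_row(**overrides):
    """Build a sample record; the default values are made up for the demo."""
    row = {
        "brand": "bmw",
        "power_ps": 170,
        "year_of_registration": 2005,
        "vehicle_type": "sedan",
        "damaged": False,
        "fuel_type": "diesel",
        "gearbox": "manual",
    }
    row.update(overrides)
    return row


rows = [
    make_row(),                                  # kept
    make_row(power_ps=1500),                     # dropped: above 1000 PS
    make_row(brand="audi"),                      # dropped: not BMW
    make_row(year_of_registration=1995),         # dropped: before 1997
    make_row(vehicle_type="bus", gearbox="automatic"),  # kept, baseline type
]
encoded_rows = [encode_row(r) for r in rows if keep_row(r)]
```

Together with year of registration, power and distance, the eight dummy variables produced here give the 11 explanatory variables of the regression.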
Model
Price = (Intercept) + ( yearOfRegistration * β₁ ) + ( power * β₂ ) + ( Distance * β₃ ) +
( Damaged * β₄ ) + ( Petrol * β₅ ) + ( coupe * β₆ ) + ( hatchback * β₇ ) + ( sedan * β₈ ) +
( cabrio * β₉ ) + ( suv * β₁₀ ) + ( Automatic * β₁₁ )
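As a rough illustration of how this equation is evaluated, the sketch below sums the intercept and the coefficient-times-feature terms using the estimates reported in the later sections of this report. The power and Automatic terms are left out because their estimates are not listed here, so the result is a partial sum rather than a full price prediction. The reported intercept is also rounded, which makes the absolute value coarse. The sample car is hypothetical.

```python
def linear_terms(intercept, coefficients, features):
    """Return intercept + sum(beta_k * x_k) over the supplied coefficients."""
    return intercept + sum(beta * features[name] for name, beta in coefficients.items())


# Estimates as reported in this document (power and Automatic are not listed).
INTERCEPT = -1757000.0
REPORTED = {
    "yearOfRegistration": 883.0,
    "kilometer": -0.08124,
    "Damaged": -1760.0,
    "Petrol": -662.9,
    "coupe": -2625.0,
    "hatchback": -4359.0,
    "sedan": -3942.0,
    "cabrio": -963.0,
    "suv": -2151.0,
}

# Hypothetical car: undamaged petrol sedan, registered 2010, 100000 km.
car = {
    "yearOfRegistration": 2010,
    "kilometer": 100000,
    "Damaged": 0,
    "Petrol": 1,
    "coupe": 0,
    "hatchback": 0,
    "sedan": 1,
    "cabrio": 0,
    "suv": 0,
}

# Partial sum in Euros, before adding the power and Automatic terms.
partial_sum = linear_terms(INTERCEPT, REPORTED, car)
```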
Interpretation
From the table we can see that the p-values for all the variables except “cabrio” are
less than 0.05.
The p-value of 0.2438 for “cabrio” is greater than the standard threshold of 0.05,
which indicates very weak evidence against the null hypothesis.
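The decision rule used here can be written out as a small check. This is a minimal sketch: the function name is hypothetical, and only the “cabrio” p-value is given in the text, so it is the only value tested.

```python
ALPHA = 0.05  # standard significance threshold used in the report


def rejects_null(p_value, alpha=ALPHA):
    """True when the p-value is small enough to reject a zero coefficient."""
    return p_value < alpha


# The reported p-value for "cabrio" does not pass the threshold.
cabrio_rejected = rejects_null(0.2438)
```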
From the graph, we can see that there are quite a few cars of type cabrio that are
very expensive, which may be the reason for the large p-value for “cabrio”.
Coefficients Estimate
(Intercept) -1757000
coupe -2625
hatchback -4359
sedan -3942
cabrio -963.0
suv -2151
The graph above is a histogram of the number of cars of each type. The green point
indicates the average price of that type.
The numbers on the Y-axis apply to both count and price: the unit is a count when
reading the histogram and Euros when reading the points. The histogram does not show
anything for the bus count because the number of buses in our data is very low
compared to the other types.
From the graph we can see that buses have the highest average price. Since bus is the
type without a dummy variable and the coefficients of all the other types are negative,
a bus is likely to be priced higher than an otherwise identical vehicle of any other type.
Inference: Since hatchback’s coefficient is the lowest, we can say that on average
hatchbacks are the least expensive type of car available. Hence, a person looking to buy
a car cheaply is more likely to find a suitable one among hatchbacks.
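The comparison between vehicle types can be reproduced directly from the reported coefficients. The sketch below assumes that bus is the baseline type with an implicit coefficient of 0, as the five dummy variables suggest. The helper name `type_gap` is hypothetical.

```python
# Reported estimates; "bus" is assumed to be the baseline with coefficient 0.
TYPE_COEFFICIENTS = {
    "bus": 0.0,
    "coupe": -2625.0,
    "hatchback": -4359.0,
    "sedan": -3942.0,
    "cabrio": -963.0,
    "suv": -2151.0,
}

# Vehicle types ordered from lowest to highest expected price, all else equal.
ranked = sorted(TYPE_COEFFICIENTS, key=TYPE_COEFFICIENTS.get)


def type_gap(type_a, type_b):
    """Expected price of type_a minus type_b in Euros, all else equal."""
    return TYPE_COEFFICIENTS[type_a] - TYPE_COEFFICIENTS[type_b]
```

For example, `type_gap("sedan", "hatchback")` gives the expected premium of a sedan over an otherwise identical hatchback.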
Damage
Coefficient Estimate
Damaged -1760
The above histogram shows the count of damaged and repaired/not damaged cars.
It is clearly seen that the average price of a damaged car is much lower than that of a
car that is not damaged. The negative coefficient indicates that the value of our
dependent variable is 1760€ lower if the vehicle under consideration is damaged than if
it is not.
The histogram on the left shows the proportion of vehicles of each type that are
damaged. It is seen that, comparatively, the proportion of sedans and hatchbacks that
are damaged is higher than for the other types.
Inference: A buyer may have to be more cautious when checking for damage if they choose
to buy a sedan or a hatchback. From a seller's perspective, it may be worth repairing any
damage to the car if the repair costs less than 1000-1500€, since a damaged car can sell
for around 1700-1800€ less than a repaired car of the same specs.
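The seller's repair decision above amounts to comparing the repair cost with the price penalty for damage. A minimal sketch, assuming the reported Damaged coefficient is the whole penalty and using a hypothetical helper name:

```python
DAMAGE_COEFFICIENT = -1760.0  # reported estimate for the Damaged dummy, in Euros


def repair_gain(repair_cost):
    """Expected net gain in Euros from repairing the car before selling it."""
    return -DAMAGE_COEFFICIENT - repair_cost


# A positive result means the repair is expected to pay for itself.
gain_cheap_repair = repair_gain(1200.0)
gain_costly_repair = repair_gain(2000.0)
```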
Coefficient Estimate
Petrol -662.9
The above graph is a histogram of the number of cars of each fuel type. The point
indicates the average price of each fuel type.
It is seen that the average price of a diesel powered vehicle is greater. The negative
coefficient of -662.9 indicates that diesel powered vehicles have more value than petrol
powered vehicles.
Inference: A seller may charge more if his/her vehicle is a diesel powered variant and/or
has automatic transmission.
From a buyer’s perspective, since both features add to the cost, the buyer can use this
model to prioritise between them. For example, a person who prefers a diesel vehicle
can save some money by settling for manual transmission.
The table below shows how the cost changes on average for each combination of the two
features.
Manual & Petrol Automatic & Petrol Manual & Diesel Automatic & Diesel
Year Of Registration
The bar graph above shows the number of cars sold each year. It can be seen that,
on average, a person is more likely to sell his/her car after 9-12 years.
Coefficient Estimate
yearOfRegistration 883.0
The above graph indicates the average price of vehicles for each year.
It is clearly seen that the newer the car, the more likely it is to cost more.
A positive coefficient (883) indicates that newer cars are valued more than older ones,
by about 883€ per year of registration with everything else held equal.
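To make the size of this effect concrete, the sketch below multiplies the reported coefficient by the gap in registration years between two otherwise identical cars. The function name and the sample years are hypothetical.

```python
YEAR_COEFFICIENT = 883.0  # reported estimate, Euros per year of registration


def year_effect(newer_year, older_year):
    """Expected price gap in Euros between two otherwise identical cars."""
    return YEAR_COEFFICIENT * (newer_year - older_year)


# Example: a car registered in 2012 versus one registered in 2009.
gap_three_years = year_effect(2012, 2009)
```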
Distance
Coefficient Estimate
kilometer -0.08124
The graph below contains average prices of cars for distance travelled.
As the distance travelled by a car increases, its value decreases. The graph below
clearly agrees, since the points go lower as we move along the X-axis.
We included the histogram with the count of cars in the same graph to explain the
abnormally small value for 5000 km.
Looking at the histogram, we can see that the number of cars which have travelled
5000 kilometers or less is extremely low. Because of this very small number of data
points, the graph shows an unexpected value there.
Inference: For every 10000 kilometers a car travels, its value decreases by 800-900€ on
average.
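The 800-900€ figure follows from the reported kilometer coefficient, as the short calculation below shows. The function name is hypothetical.

```python
KILOMETER_COEFFICIENT = -0.08124  # reported estimate, Euros per kilometer


def distance_effect(kilometers):
    """Expected change in price in Euros for the given distance travelled."""
    return KILOMETER_COEFFICIENT * kilometers


# 10000 km corresponds to a drop of roughly 812 Euros.
drop_per_10000_km = distance_effect(10000)
```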
Power
Coefficient Estimate The graph below contains price of car corresponding to its
The graph below shows that as the power of the car increases, the price of the car
also increases. The positive coefficient of the “powerPS” variable agrees with this.