Da Project Report

TOPIC :
The project aims to derive a relationship between the price of second-hand cars and
parameters such as mileage (kilometres driven), age and power. The motivation is to find a
measure of price standardisation in the second-hand car market, which is gaining momentum
as big players like Maruti, Mahindra, Tata and Hyundai grow their foothold in the industry.
With independent individual agents losing market share and established players starting to
dominate, standardisation of quality, warranties, services and price will come into play.
These market changes justify the project, which aims to derive a standardised price across
various makes and models.
DATA
The data used is secondary data on second-hand car sales in the United States of America
(cars sold in 2016). It contains the following fields:
- Date of sale
- Name of the car (contains brand name, model name and variant, which is usually
specified by engine capacity and/or fuel injection system initials)
- Nature of ownership (private/public)
- Price
- Vehicle type (sedan, coupe, SUV etc.)
- Year of registration
- Type of gear box (manual or automatic)
- Power in PS (metric horsepower)
- Model name
- Mileage / Kilometer used
- Fuel type
- Brand name
This data set was cleaned and modified for ease of use. The modifications made to the data
are as follows:
- Date of sale is converted to year of sale
- Variants are not separately analyzed
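A minimal sketch of the clean-up step above, assuming each record carries an ISO-formatted sale date and a registration year. The field names (`date_of_sale`, `year_of_registration`) and the definition of age as sale year minus registration year are assumptions made for illustration; the report does not state them.

```python
from datetime import date

def add_sale_year_and_age(record):
    """Return a copy of the record with year_of_sale and age added."""
    cleaned = dict(record)
    # Keep only the year part of the sale date.
    sale_year = date.fromisoformat(record["date_of_sale"]).year
    cleaned["year_of_sale"] = sale_year
    # Assumed definition: age in whole years at the time of sale.
    cleaned["age"] = sale_year - int(record["year_of_registration"])
    return cleaned

# Hypothetical record, for illustration only (not from the data set).
example = {"date_of_sale": "2016-03-14", "year_of_registration": 2011}
cleaned_example = add_sale_year_and_age(example)
```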
DATA USED
The data used include the following car makers and their chosen models:
AUDI   MERCEDES   VOLVO      HONDA      VOLKSWAGEN   CHEVROLET
A1                850        accord                  Captiva
A3                Andree     Andree                  aveo
A4                c_reihe    civic                   matiz
A5                s60        Cr_reihe                spark
A6                v40        jazz                    Andree
                  v50
                  v60
                  v70
                  xc_reihe
METHODOLOGY
The project is divided into two sections.

- First, ANOVA is used to find how the mean price of different models varies from the
average selling price of second-hand cars. This helps us identify the makers and their
respective models that form the premium segment and the economy segment, based on how
far their mean price deviates from the mean selling price.
- Second, we build a regression model to predict the sale price of the concerned
brands and models from the chosen parameters. How closely the model fits depends on the
parameters chosen and their influence on price. The gap between the predicted and
real price arises from relevant parameters that the model lacks.
RESTRICTIONS
The price of a second-hand car is influenced by various qualitative factors, such as the
number of times the car claimed insurance coverage above a certain limit, the number of
previous owners etc. Quantifying such factors is extremely complicated and uncertain. They
influence the price strongly, and their absence has kept our model from achieving a very
high degree of fit.
ANALYSIS OF VARIANCE
Assumptions
- The data in each group are normally distributed. Because each sample consists of a large
number of entries, the central limit theorem implies that the sample means are
approximately normal even if individual prices are not; a normality test can be run to
check the data themselves.
- Population variances are assumed to be equal
- Samples are independent
Test
ANOVA is carried out on the data for the various models of a car maker to ascertain, from
historical data, which models belong to the upper spending bracket and which belong to the
lower spending bracket.
For instance, Audi was analysed using historical sales data for its 5 models, namely A1,
A3, A4, A5 and A6.
Hypothesis Testing
- Defining null hypothesis and alternative hypothesis
H0 : Mean sale prices of all models are equal
HA : At least one model's mean sale price differs from the others
Significance level set at 5%
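The F-statistic behind this test can be sketched as follows. This is a minimal illustration of one-way ANOVA on toy numbers, not the software output used in the report; the function name `one_way_anova` and the toy prices are hypothetical.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Between-group sum of squares: group means against the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations against their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Toy prices for three hypothetical models (not from the data set).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova(toy_groups)
```

With five models, as in the Audi example, `df_between` would be 4. A large F relative to the F distribution with these degrees of freedom leads to rejecting H0.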
Analysis of Variance
Source   DF      Adj SS        Adj MS        F-Value   P-Value
Factor   4       4.91387E+12   1.22847E+12   1.51      0.195
Error    24577   1.99460E+16   8.11571E+11
Total    24581   1.99509E+16
The p-value (0.195) is greater than α (0.05), so we do not reject the null hypothesis. At a
significance level of 5%, the mean prices of all models could be equal.
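The entries of the Analysis of Variance table above can be cross-checked by hand. The sketch below recomputes the mean squares and the F-value from the reported sums of squares. The p-value step uses a large-sample chi-square approximation, which is an assumption of this sketch and not necessarily how the original software computed it.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_factor, df_factor = 4.91387e12, 4
ss_error, df_error = 1.99460e16, 24577

# Mean square = sum of squares / degrees of freedom.
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_value = ms_factor / ms_error

# Approximation (assumed here): with so many error degrees of freedom,
# df_factor * F is close to chi-square with df_factor degrees of freedom.
# For 4 degrees of freedom the chi-square upper tail is exp(-x/2) * (1 + x/2).
x = df_factor * f_value
p_approx = math.exp(-x / 2) * (1 + x / 2)

alpha = 0.05
reject_null = p_approx < alpha
```

Under these assumptions the recomputed F-value and approximate p-value agree with the reported 1.51 and 0.195 to the precision shown, and `reject_null` is false.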
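Plotting the confidence intervals for the differences of means among the various models,
we get:
[Figure: confidence intervals for the differences of mean price among Audi models]
From the graph, we can infer that the Audi A6 tends to trade at a higher price than the
Audi A1, A3 and A4, which trade at almost the same level. The Audi A5 trades at a slightly
higher price than those three. Given the ANOVA result, these differences are not
statistically significant at the 5% level.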
REGRESSION TEST
In the second phase of the project we develop a model that uses chosen characteristics of
a second-hand car to predict its value.
- Defining response variable and explanatory variables
- Response variable : Price
- Explanatory variables : Mileage / Kilometer, Power, Age of the car
AUDI A1
Forming the regression equation for Audi A1,
Regression Equation
price = 12898 - 632.9 AGE - 0.03812 kilometer + 54.58 powerPS
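As a small illustration, the fitted equation can be wrapped in a function. The coefficients are those reported above. The function name, the assumption that age is measured in years, and the example inputs are hypothetical.

```python
def predict_price_audi_a1(age_years, kilometer, power_ps):
    """Evaluate the fitted regression equation for the Audi A1."""
    return 12898 - 632.9 * age_years - 0.03812 * kilometer + 54.58 * power_ps

# Hypothetical car, for illustration only (not a record from the data set).
example_price = predict_price_audi_a1(age_years=2, kilometer=50000, power_ps=86)
```

The signs read as expected: each extra unit of age and each extra kilometer lowers the predicted price, while each extra PS of power raises it.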
Here we define the null and alternative hypotheses used to test each explanatory variable:
- H0i : The coefficient of explanatory variable 'i' is zero (it has no linear relationship with price)
- HAi : The coefficient of explanatory variable 'i' is not zero (it is linearly related to price)
Coefficients
Term        Coef       SE Coef   T-Value   P-Value   VIF
Constant    12898      384       33.59     0.000
AGE         -632.9     63.8      -9.91     0.000     1.42
Kilometer   -0.03812   0.00333   -11.45    0.000     1.42
powerPS     54.58      3.01      18.12     0.000     1.00
All p-values are below the 5% significance level, so the null hypothesis is rejected for
every explanatory variable. We therefore accept the alternative hypothesis that each
variable is related to price.
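Each T-Value in the Coefficients table is the coefficient divided by its standard error. The short check below recomputes them from the reported values. Small gaps are expected because the table entries are rounded.

```python
# (coefficient, standard error, reported T-Value) copied from the table above.
coefficients = {
    "Constant": (12898, 384, 33.59),
    "AGE": (-632.9, 63.8, -9.91),
    "Kilometer": (-0.03812, 0.00333, -11.45),
    "powerPS": (54.58, 3.01, 18.12),
}

# T-Value = Coef / SE Coef.
t_values = {term: coef / se for term, (coef, se, _) in coefficients.items()}

# Largest absolute gap between recomputed and reported T-Values.
max_gap = max(
    abs(t_values[term] - reported)
    for term, (_, _, reported) in coefficients.items()
)
```

Every recomputed value is far from zero in absolute terms, which is why all reported p-values round to 0.000.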
Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
2267.36   56.61%   56.40%      55.82%
The adjusted coefficient of determination, together with the standard error of the
estimate (S), points to a reasonably well-fitting regression model. For real-life data,
achieving an ideal R-sq value is not easy.
Here, our model explains 56.4% of the variation in price and the rest is unexplained. The
unexplained part points to missing explanatory variables, which could improve the
existing model.
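For reference, the standard definitions behind R-sq and R-sq(adj) are given below. Here SS_res is the residual sum of squares, SS_tot the total sum of squares, p the number of explanatory variables (3 in this model) and n the number of cars in the sample, which the report does not state.

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \qquad
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Because (n - 1)/(n - p - 1) is only slightly above 1 for a large sample, R-sq(adj) sits just below R-sq, as it does in the Model Summary.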
PREDICTION & CONCLUSION
From the fitted equation, we predict the sale price of a second-hand car.
To compare our model with the real price, we take a record from the data set, predict its
selling price and then compare it with the original sale price.
For example:
Inserting the data highlighted in red as our explanatory variables, we get
Prediction
Fit       SE Fit    95% CI               95% PI
10065.3   217.789   (9637.56, 10493.0)   (5591.93, 14538.6)

ORIGINAL PRICE   PREDICTED PRICE
$10800           $10065.3
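The reported intervals can be related to each other with the standard prediction-interval form. Since the sample size is not stated, the sketch below backs out the t multiplier from the reported confidence interval, which is an assumption of this check and not a step from the report.

```python
import math

# Values copied from the Model Summary and Prediction tables above.
s = 2267.36                      # standard error of the estimate (S)
fit, se_fit = 10065.3, 217.789
ci_low, ci_high = 9637.56, 10493.0
original_price = 10800

# Back out the t multiplier from the reported confidence interval.
t_mult = (ci_high - ci_low) / 2 / se_fit

# Standard prediction-interval form: fit +/- t * sqrt(S^2 + SE_fit^2).
pi_half_width = t_mult * math.sqrt(s ** 2 + se_fit ** 2)
pi_low, pi_high = fit - pi_half_width, fit + pi_half_width

# Relative gap between the original and predicted price.
relative_error = (original_price - fit) / original_price
```

The recomputed bounds match the reported 95% PI to within rounding. The original price lies inside that interval, and the prediction is about 6.8% below it.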
POSSIBILITIES
More models can be created to suit buyers' requirements, for instance by grouping cars
by fuel type, transmission type, vehicle type etc.
