Da Project Report
The project aims to derive a correlation between the price of second-hand cars and various parameters such as mileage (kilometers driven), age and power. The motivation for this project is to find a measure of price standardization in the second-hand car market, which is gaining impetus as big players like Maruti, Mahindra, Tata and Hyundai grow their foothold in the industry. With independent individual agents losing market share and established players starting to dominate, standardization in quality, warranties, services and price will come into play. These market changes justify the importance of this project, which aims to derive a standardized price across various makes and models.
DATA
The data used are secondary data on second-hand car sales in the United States of America (cars sold in 2016). The data set contains the following fields:
Date of sale
Name of the car (contains the brand name, model name and variant; variants are usually specified by engine capacity and/or fuel injection system initials)
Nature of ownership (private/public)
Price
Vehicle type (sedan, coupe, SUV, etc.)
Year of registration
Gearbox type (manual or automatic)
Power in PS
Model name
Mileage (kilometers driven)
Fuel type
Brand name
This data set was cleaned and modified for ease of use. The modifications are as follows:
Date of sale is converted to year of sale
Variants are not separately analyzed
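The first modification, and the AGE variable used later in the regression, can be sketched in Python. This is a minimal illustration, not the authors' code: the record keys `date_of_sale` and `year_of_registration` are hypothetical, and computing age as year of sale minus year of registration is an assumption, since the report uses AGE without defining it.

```python
from datetime import date

def prepare_record(record):
    """Return a copy of a sale record with year of sale and age added.

    Assumes hypothetical keys 'date_of_sale' (ISO 'YYYY-MM-DD' string)
    and 'year_of_registration' (int).
    """
    out = dict(record)
    # Modification: keep only the year of the sale date
    out["year_of_sale"] = date.fromisoformat(record["date_of_sale"]).year
    # Assumed definition of AGE: years between registration and sale
    out["age"] = out["year_of_sale"] - record["year_of_registration"]
    return out

# Hypothetical record, not taken from the data set
sale = {"date_of_sale": "2016-03-24", "year_of_registration": 2011, "price": 9500}
print(prepare_record(sale)["age"])  # 5
```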
DATA USED
The data used include the following car makers and their chosen models:
Audi: A1, A3, A4, A5, A6
Mercedes: (models not listed)
Volvo: 850, Andree, c_reihe, s60, v40, v50, v60, v70, xc_reihe
Honda: accord, Andree, civic, Cr_reihe, jazz
Volkswagen: (models not listed)
Chevrolet: Captiva, aveo, matiz, spark, Andree
METHODOLOGY
RESTRICTIONS
The price of a second-hand car is also influenced by qualitative factors, such as the number of insurance claims above a certain limit and the number of previous owners. Quantifying such factors is extremely complicated and uncertain. Because these parameters strongly influence price but are not included in the analysis, our model cannot achieve a very high goodness of fit.
ANALYSIS OF VARIANCE
Assumptions
The samples come from normally distributed populations. Because each sample contains a large number of entries, the central limit theorem makes the sample means approximately normal even if the prices themselves are not; normality of the data can still be checked with a normality test.
Population variances are assumed to be equal
Samples are independent
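The equal-variance assumption can be screened informally by comparing sample standard deviations across models. The sketch below uses a common rule of thumb (the largest standard deviation should be less than twice the smallest); it is not a formal test, and the threshold is an assumption. Formal alternatives include SciPy's `scipy.stats.levene` for equal variances and `scipy.stats.shapiro` for normality.

```python
from statistics import stdev

def variance_ratio_ok(groups, max_ratio=2.0):
    """Rule-of-thumb check of the equal-variance assumption.

    groups maps a model name to a list of sale prices. Returns the ratio of
    the largest to the smallest sample standard deviation and whether it is
    below max_ratio (an informal threshold, not a hypothesis test).
    """
    sds = [stdev(prices) for prices in groups.values()]
    ratio = max(sds) / min(sds)
    return ratio, ratio < max_ratio
```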
Test
ANOVA is carried out across the models of each car maker, using historical sales data, to determine which models belong to the upper price bracket and which belong to the lower price bracket.
For instance, Audi was analysed with the help of historical sales data of its 5 models, namely A1,
A3, A4, A5 and A6.
Hypothesis Testing
- Defining the null and alternative hypotheses
H0: The mean sale prices of all models are equal
HA: The mean sale price of at least one model differs from the others
Analysis of Variance
The p-value is greater than α = 0.05, so we fail to reject the null hypothesis: at the 5% significance level, the data do not provide evidence that the mean prices of the models differ.
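The F statistic behind this decision compares the variation between model means with the variation within models. A minimal pure-Python sketch, assuming prices are grouped in a dict keyed by model name (a hypothetical layout); the prices in the example are toy values:

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA F statistic and its degrees of freedom.

    groups maps each model name to a list of sale prices.
    """
    samples = [[float(x) for x in g] for g in groups.values()]
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-group sum of squares: spread of model means around the grand mean
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-group sum of squares: spread of prices around their own model mean
    ssw = sum(sum((x - m) ** 2 for x in s) for s, m in zip(samples, means))
    df_between, df_within = k - 1, n - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return f_stat, df_between, df_within

print(one_way_anova({"A1": [1, 2, 3], "A3": [4, 5, 6]}))  # (13.5, 1, 4)
```

The p-value is the upper-tail area of the F distribution with (df_between, df_within) degrees of freedom, for example `scipy.stats.f.sf(f_stat, df_between, df_within)`; `scipy.stats.f_oneway` runs the whole test in one call. H0 is rejected only when that p-value is below α = 0.05.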
Further, we plot the confidence intervals for the differences of means between the various models.
From the above graph, we can infer that the Audi A6 tends to trade at a higher price than the Audi A1, A3 and A4, which trade at almost the same level. The Audi A5 trades at a slightly higher price than these three.
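The report does not state which interval method produced the graph. As a simplified sketch, a confidence interval for the difference between two model means can be built from the pooled within-group mean square (MSE) of the ANOVA. This version uses a large-sample normal critical value and applies no multiple-comparison correction, unlike Tukey intervals; all inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_means_ci(mean_a, mean_b, n_a, n_b, mse, level=0.95):
    """Confidence interval for mean_a - mean_b using the pooled ANOVA MSE.

    Uses a normal critical value (large-sample approximation to t) and
    no correction for comparing many pairs of models.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(mse * (1 / n_a + 1 / n_b))
    diff = mean_a - mean_b
    return diff - half_width, diff + half_width

# Hypothetical means and sample sizes for two models
low, high = diff_of_means_ci(20000, 15000, 100, 100, 4_000_000)
```

An interval that excludes zero indicates that the two models' mean prices differ at the chosen confidence level.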
REGRESSION ANALYSIS
In the second phase of the project, we develop a model that uses selected characteristics of second-hand cars to predict a car's value.
- Defining the response variable and explanatory variables
- Response variable: Price
- Explanatory variables: Mileage (kilometer), Power (powerPS) and Age of the car
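Assuming the coefficients are estimated by ordinary least squares (the report does not name its method or software), they solve the normal equations (X^T X) b = X^T y, where X holds a column of ones followed by the explanatory variables. A dependency-free sketch with a hypothetical function name `fit_ols`; in practice `numpy.linalg.lstsq` or statsmodels' `OLS` would do the same job:

```python
def fit_ols(rows, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bk*xk.

    rows: list of explanatory-variable tuples, e.g. (age, kilometer, power_ps)
    y:    list of observed prices
    Returns [b0, b1, ..., bk] by solving (X^T X) b = X^T y.
    """
    # Design matrix: a leading 1 for the intercept, then the explanatory variables
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    a = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b
```

Each row would hold (age, kilometer, powerPS) for one sale and y the corresponding prices; the returned list is [intercept, age coefficient, kilometer coefficient, powerPS coefficient].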
AUDI A1
The regression equation for the Audi A1 is as follows:
Regression Equation
price = 12898 - 632.9 AGE - 0.03812 kilometer + 54.58 powerPS
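The equation can be applied directly. The function name below is hypothetical, and the example car (5 years old, 100,000 km, 86 PS) is illustrative, not taken from the data set:

```python
def predict_audi_a1_price(age, kilometer, power_ps):
    """Apply the fitted Audi A1 regression equation from this report."""
    return 12898 - 632.9 * age - 0.03812 * kilometer + 54.58 * power_ps

print(round(predict_audi_a1_price(5, 100_000, 86), 2))  # 10615.38
```

Read term by term, each additional year of age lowers the predicted price by 632.9, every 10,000 km lowers it by 381.2, and each additional PS raises it by 54.58, holding the other variables fixed.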
To test each explanatory variable, we define the following null and alternative hypotheses:
- H0i: Explanatory variable i is not linearly related to price (its coefficient is zero)
- HAi: Explanatory variable i is linearly related to price (its coefficient is non-zero)
Coefficients
The p-values of all three coefficients are below the significance level, so we reject each null hypothesis. This supports the alternative hypothesis that age, kilometer and power are each significantly related to price.
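Each of these tests uses the t statistic coefficient / standard error. The report's coefficient table is not reproduced here, so the sketch below takes hypothetical inputs and, as a simplifying assumption, approximates the t distribution with the standard normal, which is reasonable only for large residual degrees of freedom:

```python
from statistics import NormalDist

def coefficient_test(coef, std_error, alpha=0.05):
    """Two-sided test of H0: coefficient = 0 for one explanatory variable.

    The t distribution is approximated by the standard normal, which is
    only reasonable when the residual degrees of freedom are large.
    """
    t_stat = coef / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value, p_value < alpha

# Hypothetical standard error of 100 for the AGE coefficient of -632.9
print(coefficient_test(-632.9, 100.0)[2])  # True: reject H0 for AGE
```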
Model Summary
The adjusted coefficient of determination (R2 adj) and the standard error of the estimate (S) indicate a reasonably good fit. For real-world data, a very high R2 value is difficult to achieve.
Here, the model explains 56.4% of the variation in price; the rest is unexplained. The unexplained part points to missing explanatory variables whose inclusion could improve the existing model.
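The quantities in the model summary follow standard formulas: R2 = 1 - SSE/SST, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1) for p explanatory variables, and S = sqrt(SSE/(n - p - 1)). A small sketch with a hypothetical function name `model_summary`:

```python
from math import sqrt

def model_summary(y, y_hat, n_predictors):
    """Return (S, R^2, adjusted R^2) for a fitted linear regression."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # unexplained variation
    sst = sum((obs - mean_y) ** 2 for obs in y)                # total variation
    r2 = 1 - sse / sst
    dof = n - n_predictors - 1  # residual degrees of freedom
    r2_adj = 1 - (1 - r2) * (n - 1) / dof
    s = sqrt(sse / dof)  # standard error of the estimate, in price units
    return s, r2, r2_adj
```

With p = 3 (age, kilometer, power) and a large data set, the adjusted R2 is only slightly below R2.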
PREDICTION & CONCLUSION
Using the fitted equation, we predict the sale price of a second-hand car.
To compare the model with real prices, we take a record from the data set, predict its selling price and then compare the prediction with the original sale price.
For example:
Prediction
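The comparison can be extended from a single record to several. The sketch below is an illustration, not part of the original analysis: it reports each prediction error and summarizes them with the mean absolute percentage error (MAPE), a metric chosen here as an assumption; the prices in the example are hypothetical.

```python
def compare_predictions(actual_prices, predicted_prices):
    """Return per-record errors (predicted - actual) and the MAPE in percent."""
    errors = [p - a for a, p in zip(actual_prices, predicted_prices)]
    pct_errors = [100 * e / a for e, a in zip(errors, actual_prices)]
    mape = sum(abs(x) for x in pct_errors) / len(pct_errors)
    return errors, mape

# Hypothetical recorded and predicted prices for two cars
errors, mape = compare_predictions([10000, 20000], [11000, 19000])
print(errors, mape)  # [1000, -1000] 7.5
```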
POSSIBILITIES
More models can be created to suit buyers' requirements, for instance by segmenting cars by fuel preference, transmission type, vehicle type, etc.
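Segmenting the data before fitting separate models can be sketched as a simple group-by. The field names `fuel_type` and `gearbox` are hypothetical; each resulting group could then be fitted with the same regression procedure.

```python
from collections import defaultdict

def split_by(records, field):
    """Group sale records by the value of one categorical field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)

# Hypothetical records; in practice these would come from the cleaned data set
records = [
    {"fuel_type": "petrol", "gearbox": "manual", "price": 8500},
    {"fuel_type": "diesel", "gearbox": "automatic", "price": 14200},
    {"fuel_type": "petrol", "gearbox": "automatic", "price": 11000},
]
by_fuel = split_by(records, "fuel_type")
print({fuel: len(rows) for fuel, rows in by_fuel.items()})  # {'petrol': 2, 'diesel': 1}
```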