Summer Internship Outlook
Group
Summer Internship
By Abhishek Yadav
Contents
Task 1: Create five problem statements and model the relationships
Task 4: Forecasting
Introduction
House price prediction helps buyers decide whether the house they want to buy is worth its price.
A house price prediction system also helps the vendor decide which features to add to the house so that it can be sold for a higher price.
OBJECTIVE:
The objective of this task is to predict house prices based on various
parameters.
Regression Analysis:
Dependent variable: Price
Independent variables: Age, lintst, larea, lland, rooms, baths
Joint Plot
Regression Analysis: Results
$$\text{Price} = 6.46584626601122 - 0.00357375\,\text{Age} - 0.03122348\,\text{lintst} + 0.50968087\,\text{larea} + 0.0787914\,\text{lland} + 0.05111123\,\text{rooms} + 0.10779092\,\text{baths}$$
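The fitted equation can be applied directly to score a house. The sketch below simply hard-codes the coefficients reported above; the helper name `predict_price` is hypothetical, and whatever scaling or transformation the original fit used for `lintst`, `larea` and `lland` must also be applied to the inputs.

```python
# Coefficients copied from the regression results above.
COEFFICIENTS = {
    "age": -0.00357375,
    "lintst": -0.03122348,
    "larea": 0.50968087,
    "lland": 0.0787914,
    "rooms": 0.05111123,
    "baths": 0.10779092,
}
INTERCEPT = 6.46584626601122


def predict_price(features: dict[str, float]) -> float:
    """Evaluate the fitted linear equation for one house (hypothetical helper).

    `features` must supply every predictor, on the same scale as in the fit.
    """
    missing = COEFFICIENTS.keys() - features.keys()
    if missing:
        raise ValueError(f"missing predictors: {sorted(missing)}")
    # Intercept plus the sum of coefficient * predictor value.
    return INTERCEPT + sum(coef * features[name] for name, coef in COEFFICIENTS.items())
```

Each coefficient is the change in the predicted value for a one-unit increase in that predictor with the others held fixed; for example, one extra bath adds about 0.108 to the prediction.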
PS 2: Relationship between house price and distance from the incinerator and the interstate
Introduction
Customers buying a used car want assurance that the money they invest is worthwhile.
A used car price prediction system is therefore needed to estimate a car's worth from a variety of features.
OBJECTIVE:
The objective of this project is to analyze the factors affecting the price of a used car using multiple regression analysis.
Regression Analysis:
Variable       Description
Model          Model description
Age_08_04      Age in months as of August 2004
KM             Accumulated kilometers on odometer
HP             Horsepower
CC             Cylinder volume in cubic centimetres
Doors          Number of doors
Quarterly_Tax  Quarterly road tax in euros
Weight         Weight in kilograms
Descriptive Analysis
       Price     Age    KM         HP      cc       Doors  Quarterly_Tax  Weight
count  1436      1436   1436       1436    1436     1436   1436           1436
mean   10730.82  55.94  68533.25   101.50  1576.85  4.03   87.12          1072.4596
std    3626.96   18.59  37506.45   14.98   424.38   0.95   41.13          52.64112
min    4350      1      1          69      1300     2      19             1000
25%    8450      44     43000      90      1400     3      69             1040
50%    9900      61     63389.5    110     1600     4      85             1070
75%    11950     70     87020.75   110     1600     5      85             1085
max    32500     80     243000     192     16000    5      283            1615
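The count/mean/std/min/25%/50%/75%/max layout matches what pandas `DataFrame.describe()` produces, though the slides do not say which tool was used. A dependency-free sketch of the same summary for one column, assuming the column is already available as a list of numbers (loading the car dataset is not shown):

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Summary statistics in the same layout as the table above.

    Quartiles use linear interpolation between order statistics
    (statistics.quantiles with method="inclusive"), and std is the
    sample standard deviation (n - 1 denominator).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }
```

Linear interpolation and the n - 1 denominator are also the pandas defaults, so the two should agree on the same column.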
Scatter Plot
Joint Plot:
Price decreases as the age of the car and the distance travelled (KM) increase.
Price increases as horsepower (HP) and quarterly tax increase.
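These direction statements correspond to the signs of the fitted multiple-regression coefficients. A minimal sketch of the fit using NumPy's least-squares solver, assuming the predictors are already in a numeric array with one row per car; `fit_ols` is a hypothetical name and the slides do not show the original code.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept.

    X has one row per car and one column per predictor (e.g. Age, KM, HP);
    returns [intercept, coef_1, ..., coef_k].
    """
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef
```

A negative coefficient on Age or KM and a positive one on HP or Quarterly_Tax would match the pattern described above.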
Task 4:
Forecasting
Forecasting is a planning tool that helps management cope with the uncertainty of the future, relying mainly on past and present data and on the analysis of trends.
Problem formulation and data collection:
Quarterly sales of goods are given for five years (1991 to 1995). Using forecasting, fill in the tables of trend, index values, seasonal sales, etc.

Sales per quarter (x $10,000)
Year  I   II  III  IV
1991  16  21  9    18
1992  15  20  10   18
1993  17  24  13   22
1994  17  25  11   21
1995  18  26  14   25
Actual to moving average evaluation:
Year  Quarter  Actual  4-quarter     4-quarter       4-quarter centred  Percentage of actual
               sales   moving total  moving average  moving average     to moving average
1991  I        16                                    NA                 NA
      II       21                                    NA                 NA
                       64            16
      III      9                                     15.875             56.7
                       63            15.75
      IV       18                                    15.625             115.2
                       62            15.5
1992  I        15                                    15.625             96.0
                       63            15.75
      II       20                                    15.75              127.0
                       63            15.75
      III      10                                    16                 62.5
                       65            16.25
      IV       18                                    16.75              107.5
                       69            17.25
We use Holt's linear exponential smoothing.
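The columns of the table follow the ratio-to-moving-average calculation: average each run of four quarters, average adjacent pairs of those to centre them on a quarter, then express actual sales as a percentage of the centred value. A sketch using the quarterly sales table above (the function name is hypothetical):

```python
# Quarterly sales (x $10,000), 1991 Q1 to 1995 Q4, from the sales table.
SALES = [16, 21, 9, 18, 15, 20, 10, 18, 17, 24, 13, 22,
         17, 25, 11, 21, 18, 26, 14, 25]


def ratio_to_moving_average(sales: list[float]) -> tuple[list, list]:
    """Return (centred moving averages, % of actual to centred MA).

    Both lists align with `sales`; entries are None where the centred
    average is undefined (the first two and last two quarters).
    """
    n = len(sales)
    # 4-quarter moving averages; ma[k] covers quarters k..k+3 (0-based).
    ma = [sum(sales[k:k + 4]) / 4 for k in range(n - 3)]
    centred: list = [None] * n
    pct: list = [None] * n
    for k in range(n - 4):
        # Two adjacent 4-quarter averages centre on quarter k + 2.
        centred[k + 2] = (ma[k] + ma[k + 1]) / 2
        pct[k + 2] = 100 * sales[k + 2] / centred[k + 2]
    return centred, pct
```

For 1991 III this gives a centred average of (16 + 15.75) / 2 = 15.875 and 9 / 15.875 = 56.7%, matching the first complete row of the table.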
Deseasonalization:
t   Year  Quarter  Sales  Ratio        Deseasonalized sales (Y = a + bt)
1   1991  I        16     0.946745562  15.1
2         II       21     1.286335404  15.3
3         III      9      0.657807309  15.5
4         IV       18     1.094520548  15.7
5   1992  I        15     0.934306569  15.9
6         II       20     1.274131274  16.0
7         III      10     0.673640167  16.2
8         IV       18     1.07860262   16.4
9   1993  I        15     0.928909953  16.6
10        II       20     1.272108844  16.8
11        III      10     0.681818182  17.0
12        IV       18     1.048192771  17.2
13  1994  I        17     0.918918919  17.4
14        II       24     1.282442748  17.6
15        III      13     0.728971963  17.8
16        IV       18     1.063829787  17.9
17  1995  I        17     0.894736842  18.1
18        II       24     1.220338983  18.3
19        III      13     0.742857143  18.5
20        IV       22     1            18.7
Intercept (a) = 14.91052632
Slope (b) = 0.189473684
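The intercept and slope above are an ordinary least-squares line through the Sales column against the period index t. A sketch that reproduces them, assuming the Sales values exactly as listed in this table; note that these values, not the quarterly sales table at the start of the task (which differs for 1993 to 1995), are what give a = 14.9105 and b = 0.1895. The function name is hypothetical.

```python
def fit_trend(y: list[float]) -> tuple[float, float]:
    """Least-squares line y = a + b*t with t = 1, 2, ..., len(y)."""
    n = len(y)
    t = range(1, n + 1)
    t_bar = (n + 1) / 2
    y_bar = sum(y) / n
    # Slope = S_ty / S_tt; intercept passes the line through the means.
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    b = s_ty / s_tt
    a = y_bar - b * t_bar
    return a, b


# Sales column exactly as listed in the deseasonalization table above.
TABLE_SALES = [16, 21, 9, 18, 15, 20, 10, 18, 15, 20, 10, 18,
               17, 24, 13, 18, 17, 24, 13, 22]
```

Evaluating `a + b * t` for t = 1 to 20 gives the last column of the table (15.1, 15.3, ..., 18.7).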
Cyclical Variation:
Year  Quarter  Deseasonalized  x   Seasonal     Seasonalized  Percent
               sales Y=a+bx        index/100    sales         of trend
1991  I        15.1            1   0.922222222  13.92555556   92.2
      II       15.3            2   1.288888889  19.70643275   128.9
      III      15.5            3   0.633333333  9.803333333   63.3
      IV       15.7            4   1.155555556  18.10573099   115.6
1992  I        15.9            5   0.922222222  14.62450292   92.2
      II       16.0            6   1.288888889  20.68327485   128.9
      III      16.2            7   0.633333333  10.28333333   63.3
      IV       16.4            8   1.155555556  18.98152047   115.6
1993  I        16.6            9   0.922222222  15.32345029   92.2
      II       16.8            10  1.288888889  21.66011696   128.9
      III      17.0            11  0.633333333  10.76333333   63.3
      IV       17.2            12  1.155555556  19.85730994   115.6
1994  I        17.4            13  0.922222222  16.02239766   92.2
      II       17.6            14  1.288888889  22.63695906   128.9
      III      17.8            15  0.633333333  11.24333333   63.3
      IV       17.9            16  1.155555556  20.73309942   115.6
1995  I        18.1            17  0.922222222  16.72134503   92.2
      II       18.3            18  1.288888889  23.61380117   128.9
      III      18.5            19  0.633333333  11.72333333   63.3
      IV       18.7            20  1.155555556  21.60888889   115.6
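Each seasonalized value is the trend value a + bx multiplied by the seasonal index of that quarter, which is also how a forecast for a future quarter is produced. A sketch using the intercept, slope and seasonal indices reported above (the function name is hypothetical):

```python
INTERCEPT = 14.91052632
SLOPE = 0.189473684
# Seasonal index / 100 for quarters I..IV, from the table above.
SEASONAL_INDEX = [0.922222222, 1.288888889, 0.633333333, 1.155555556]


def seasonalized_sales(x: int) -> float:
    """Trend a + b*x scaled by the seasonal index of period x (x = 1 is 1991 Q1)."""
    trend = INTERCEPT + SLOPE * x
    quarter = (x - 1) % 4  # 0 = Q1, 1 = Q2, 2 = Q3, 3 = Q4
    return trend * SEASONAL_INDEX[quarter]
```

Calling `seasonalized_sales(21)` through `seasonalized_sales(24)` extends the same calculation to the four quarters of 1996.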
Task 2: Tree Making
Objective: Analyse the dataset, examine the nature of the variables, and determine the course of action using a suitable decision tree.
There are 14 attributes in each case of the dataset. They are:
CRIM     per capita crime rate by town
ZN       proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS    proportion of non-retail business acres per town
CHAS     Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX      nitric oxides concentration (parts per 10 million)
RM       average number of rooms per dwelling
AGE      proportion of owner-occupied units built prior to 1940
DIS      weighted distances to five Boston employment centres
RAD      index of accessibility to radial highways
TAX      full-value property-tax rate per $10,000
PTRATIO  pupil-teacher ratio by town
B        1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town
LSTAT    % lower status of the population
MEDV     median value of owner-occupied homes in $1000s
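The slides do not state the split criterion used by the tree; Gini impurity is one common choice (it is the default in scikit-learn's `DecisionTreeClassifier`). A minimal sketch of how a tree would choose a threshold on a single attribute such as RM or LSTAT, with hypothetical function names and toy labels:

```python
from collections import Counter
from typing import Optional


def gini(labels: list) -> float:
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values: list[float], labels: list) -> tuple[Optional[float], float]:
    """Best threshold for a `value <= threshold` split on one attribute.

    Returns (threshold, weighted Gini impurity of the two children);
    threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_impurity = impurity
    return best_threshold, best_impurity
```

A full tree applies this search to every attribute, splits on the best one, and recurses into each child until the nodes are pure or a depth limit is reached.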
Pairwise Scatter Plot for all variables:
Correlation Matrix:
Decision Tree:
Accuracy: 0.9210526315789473
Precision: 0.68
Recall: 0.8095
Confusion matrix: [[123 8] [4 17]]
Mean squared error: 0.07894736842105263
Mean absolute error: 0.07894736842105263
Root mean squared error: 0.28097574347450816
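All of the reported figures follow from the confusion matrix: 152 test cases, 140 of them correct. With 0/1 class labels each misclassification contributes 1 to both the squared and the absolute error, which is why MSE and MAE are equal (12/152) and RMSE is their square root. A sketch that recomputes them, assuming the usual layout of rows = actual class and columns = predicted class (the function name is hypothetical):

```python
import math


def binary_metrics(cm: list[list[int]]) -> dict[str, float]:
    """Metrics from a 2x2 confusion matrix [[TN, FP], [FN, TP]].

    Rows are actual classes and columns predicted classes, as in
    scikit-learn's confusion_matrix with labels ordered (0, 1).
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    errors = fp + fn
    mse = errors / total  # for 0/1 labels each error contributes (+-1)^2 = 1
    return {
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "mse": mse,
        "mae": errors / total,  # |+-1| = 1, so MAE equals MSE here
        "rmse": math.sqrt(mse),
    }
```

For example, precision is 17 / (17 + 8) = 0.68 and recall is 17 / (17 + 4), about 0.8095.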