
Statistics for Business Decisions

Project on
Correlation and Regression Analysis
Number of Vehicles and Pollution Levels
New Delhi
Certificate of Originality
This is to certify that this project has been made by Mr. Alekh
Kushwaha, Mr. Ashish Sharma, Mr. Harsh Lal and Mr. Kartik Jain,
first-year students of Bachelor of Management Studies,
Ramanujan College, University of Delhi. This project has been
made under the guidance of our honorable statistics professor,
Dr. K. Latha. The project submitted is our original work and we
are responsible for the content, except as specified in the
acknowledgements or references.
We would like to express our special thanks and gratitude to
our teacher Dr. K. Latha, as well as our principal Dr. S.P.
Aggarwal, who gave us the golden opportunity to do this
wonderful project on correlation and regression analysis of
the number of vehicles and pollution levels in New Delhi. The
project also led us to do a lot of research, through which we
came to know about many new things. We are really thankful
to them.

Secondly, we would also like to thank our parents and friends,
who helped us a lot in finalizing this project.
• The air quality in Delhi, the capital of India,
according to a WHO survey of 1600 world cities,
is the worst of any major city in the world.
• Air pollution in India is estimated to kill 1.5
million people every year; it is the fifth largest
killer in India. India has the world's highest death
rate from chronic respiratory diseases and asthma,
according to the WHO.
• In Delhi, poor air quality irreversibly damages the
lungs of 2.2 million children, or 50 percent of all children.
Air quality, or ambient (outdoor) air pollution, is represented
by the annual mean concentration of particulate matter PM10,
that is, particles smaller than 10 microns.
The safe level for PM10 according to the WHO's air quality
guidelines is 20 μg/m³ (annual mean).
2.2 million children in Delhi have irreversible lung damage
due to the poor quality of the air. In addition, research
shows that pollution can lower children’s intelligence
quotient and increase the risk of autism, diabetes and even
adult-onset diseases like multiple sclerosis.
Poor air quality is also a cause of reduced lung capacity,
headaches, sore throats, coughs, fatigue, and early death.
Causes of Air Pollution in New Delhi
Motor vehicle emissions are one of the causes of
poor air quality. According to some reports, 80 per
cent of PM10 air pollution is caused by vehicular
Other causes include wood-burning fires, fires on
agricultural land, exhaust from diesel generators, dust
from construction sites, and burning garbage.
The vehicular population in the national capital
registered a 135.59 per cent jump between 1999-2000
The rising number of motor vehicles has been a
primary cause of Delhi’s air pollution.
Objective of the project
The objective of this project is to analyze the relationship
between the number of motor vehicles and air pollution in
New Delhi through statistical measures like correlation and
regression.
The goal of the project is to analyze whether the rising vehicular
population has been a cause of Delhi’s air pollution.
We have taken annual data for the eleven years from 2001 to 2011 to
study the correlation between the number of vehicles and the
worsening air quality of New Delhi.
Correlation is a statistical technique that can show whether and
how strongly pairs of variables are related. The main result of a
correlation is called the correlation coefficient (or "r"). It ranges
from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the
two variables are related.
If r is close to 0, it means there is little or no linear relationship
between the variables. If r is positive, it means that as one variable
gets larger the other gets larger. If r is negative, it means that as one
gets larger, the other gets smaller (often called an "inverse" correlation).
Correlations are useful because they can indicate a
predictive relationship that can be exploited in practice.
The most familiar measure of dependence between two
quantities is "Pearson's correlation coefficient". It is
obtained by dividing the covariance of the two variables by
the product of their standard deviations.
Karl Pearson developed the coefficient from a similar but
slightly different idea by Francis Galton.
Pearson's correlation coefficient when applied to
a sample is commonly represented by the letter r and may
be referred to as the sample correlation coefficient or
the sample Pearson correlation coefficient. We can obtain a
formula for r by substituting estimates of the covariances
and variances based on a sample into the definition above.
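As a rough illustration of this definition, the Python sketch below divides the sample covariance by the product of the sample standard deviations. The function name pearson_r and the toy numbers are illustrative assumptions, not part of the project data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation: covariance / (sd_x * sd_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and variances; the (n - 1) divisors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    return cov / sqrt(var_x * var_y)

# Made-up toy data: y moves exactly in step with x, then exactly against it.
rising = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])
falling = pearson_r([1, 2, 3, 4], [40, 30, 20, 10])
print(rising, falling)
```

The two toy series sit at the extremes of the range described earlier: a perfectly rising line gives r of about +1 and a perfectly falling line gives r of about -1.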
No. of Registered Motor Vehicles in Delhi
YEAR Number of motor vehicles
2001 3635000
2002 3699000
2003 3971000
2004 4236000
2005 4186000
2006 4487000
2007 5492000
2008 5899000
2009 6302000
2010 6746000
2011 7228000
Annual Average Ambient PM10 concentration in Delhi
YEAR Annual Average of PM10 concentration (μg/m³)
2001 120
2002 140
2003 130
2004 135
2005 120
2006 135
2007 160
2008 220
2009 250
2010 260
2011 270

source: and CPCB report

Correlation Coefficient Formula
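In the summation notation this project uses for the regression formulas, the sample correlation coefficient is commonly written as follows, where N is the number of paired observations, X the number of vehicles and Y the annual average PM10 level:

r = (NΣXY - (ΣX)(ΣY)) / √[(NΣX² - (ΣX)²)(NΣY² - (ΣY)²)]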
Correlation Analysis
The coefficient of correlation between the two
variables, number of vehicles (X) and average annual
PM10 levels (Y), comes out to be 0.959077.
The coefficient of correlation is very high, which
signifies there is a strong relationship between the two
variables.
This signifies that as the number of vehicles on the
roads of Delhi has increased, the average
concentration of pollution-causing PM10 has also
increased.
In this case the coefficient of correlation is very close
to perfect correlation.
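The reported coefficient can be checked directly from the two data tables. The sketch below is one possible way to do it in Python, using the summation form of the correlation formula; the variable names are our own, and the vehicle counts are entered in thousands, as on the chart axis.

```python
from math import sqrt

# Registered motor vehicles in Delhi, in thousands (X), 2001 to 2011.
vehicles = [3635, 3699, 3971, 4236, 4186, 4487, 5492, 5899, 6302, 6746, 7228]
# Annual average PM10 concentration (Y) for the same years.
pm10 = [120, 140, 130, 135, 120, 135, 160, 220, 250, 260, 270]

n = len(vehicles)
sum_x = sum(vehicles)
sum_y = sum(pm10)
sum_xy = sum(x * y for x, y in zip(vehicles, pm10))
sum_x2 = sum(x * x for x in vehicles)
sum_y2 = sum(y * y for y in pm10)

# r = (N*SXY - SX*SY) / sqrt((N*SX2 - SX^2) * (N*SY2 - SY^2))
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 6))
```

The printed value should agree closely with the 0.959077 quoted above. Rescaling the vehicle counts (thousands versus units) does not change r.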
Graph showing correlation between number of vehicles and annual average PM10 levels in New Delhi
[Figure: Correlation scatter plot. Annual average PM10 levels (y-axis) against number of vehicles in thousands (x-axis).]

Regression is a statistical measure used in finance, investing and
other disciplines that attempts to determine the strength of the
relationship between one dependent variable (usually denoted by
Y) and a series of other changing variables (known as independent
variables).
Regression analysis is widely used for prediction and forecasting,
where its use has substantial overlap with the field of machine
learning. Regression analysis is also used to understand which
among the independent variables are related to the dependent
variable, and to explore the forms of these relationships.
Regression Formula:
Regression Equation
y = a + bx
Slope (b) = (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²)
Intercept (a) = (ΣY - b(ΣX)) / N
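The slope and intercept formulas can be applied to the project data as a quick check. The following Python sketch is only an illustration under the assumption that x is the number of vehicles in thousands (as on the chart axis) and y is the annual average PM10 level; the variable names are ours.

```python
# Registered motor vehicles in Delhi, in thousands (x), 2001 to 2011.
vehicles = [3635, 3699, 3971, 4236, 4186, 4487, 5492, 5899, 6302, 6746, 7228]
# Annual average PM10 concentration (y) for the same years.
pm10 = [120, 140, 130, 135, 120, 135, 160, 220, 250, 260, 270]

n = len(vehicles)
sum_x = sum(vehicles)
sum_y = sum(pm10)
sum_xy = sum(x * y for x, y in zip(vehicles, pm10))
sum_x2 = sum(x * x for x in vehicles)

# Slope (b) = (N*SXY - SX*SY) / (N*SX2 - SX^2)
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept (a) = (SY - b*SX) / N
intercept = (sum_y - slope * sum_x) / n

print(f"y = {intercept:.2f} + {slope:.4f}x")
```

With x in thousands, the slope is the estimated change in the annual PM10 average for every additional thousand registered vehicles.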
The next slide shows that in this case the equation of
the regression line thus formed is y = 0.04x − 50.98, with x
measured in thousands of vehicles.
Here 0.04 is the slope of the regression line. R square
is 0.92, a large value, which shows a high degree of
correlation between the two variables.
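R square can be read as the share of the variation in PM10 that the fitted line accounts for. The sketch below shows one common way to compute it, as 1 minus the ratio of residual to total sum of squares; the helper name fit_line is hypothetical and x is again assumed to be in thousands of vehicles.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Registered motor vehicles in Delhi, in thousands, 2001 to 2011.
vehicles = [3635, 3699, 3971, 4236, 4186, 4487, 5492, 5899, 6302, 6746, 7228]
# Annual average PM10 concentration for the same years.
pm10 = [120, 140, 130, 135, 120, 135, 160, 220, 250, 260, 270]

a, b = fit_line(vehicles, pm10)
mean_y = sum(pm10) / len(pm10)

# Residual sum of squares: what the line leaves unexplained.
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(vehicles, pm10))
# Total sum of squares: variation of PM10 around its own mean.
ss_tot = sum((y - mean_y) ** 2 for y in pm10)

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 2))
```

For a straight-line fit with one independent variable, this value equals the square of the correlation coefficient r.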
[Figure: Regression analysis. Annual average PM10 level (y-axis) against number of vehicles in thousands (x-axis), with fitted line f(x) = 0.04x − 50.98 and R² = 0.92.]

From the above analysis, we obtain a correlation of 0.95907747, which
indicates a high degree of relation between the increasing number of
vehicles on the roads of Delhi and the pollution that persists in the
city.
High correlation shows that the level of pollution rises as the
number of registered motor vehicles increases. Since the city's
pollution is rising as a whole, the pollution from vehicles accounts for
a major part of it. Fewer cars would contribute to less
pollution and hence a safer environment. Carpooling, eco-friendly
gases and less use of vehicles can also contribute towards the
betterment of the people of Delhi.
“simple and multiple regression analysis” - ND Vohra
“simple, multiple and partial correlation analysis” - ND Vohra
