Unit 3 Big Data


Regression Modelling

• Regression analysis is a predictive modelling technique that
investigates the relationship between a dependent (target) variable
and one or more independent (predictor) variables.
• This technique is used for forecasting, time series modelling, and
estimating cause-and-effect relationships between variables.
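
As a minimal sketch of how such a relationship can be estimated, the snippet below fits a straight line by ordinary least squares to one independent and one dependent variable, then uses the line to forecast a new value. The function names (`fit_simple_linear_regression`, `predict`) and the hours/score figures are assumptions made up for illustration; they do not come from the slides.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Forecast the dependent variable for a new value of the independent variable."""
    return slope * x_new + intercept


# Hypothetical data: hours studied (independent) and exam score (dependent).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

slope, intercept = fit_simple_linear_regression(hours, scores)
forecast = predict(6.0, slope, intercept)
print(f"score = {slope:.1f} * hours + {intercept:.1f}; forecast for 6 hours: {forecast:.1f}")
```

The slope describes how much the dependent variable changes per unit change in the independent variable, which is the relationship regression analysis sets out to quantify.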
Dependent and Independent variables

• In data science, variables are the properties or characteristics of
events or objects.
• Regression analysis works with two main types of variables:
independent and dependent.

• Independent variables – These variables are manipulated or altered
by researchers, and their effects are then measured and compared.
• They are also referred to as predictor variables because they are
used to predict or forecast the values of the dependent variable in a
regression model.

• Dependent variables – These variables measure the effect of the
independent variables on the test units. Their values depend on the
independent variables.
• They are also referred to as predicted variables because their
values are predicted from the independent (predictor) variables.
• In data models, independent variables go by different names, such as:
• “regressor”,
• “explanatory variable”,
• “input variable”,
• “controlled variable”, etc.
• Dependent variables, on the other hand, are also called:
• “regressand”,
• “response variable”,
• “measured variable”,
• “observed variable”,
• “responding variable”,
• “explained variable”,
• “outcome variable”,
• “experimental variable”,
• “output variable”.
• The examples below illustrate how dependent and independent
variables are used in practice:
• Suppose you want to estimate a person's cost of living using a
regression model. Factors such as salary, age, and marital status
serve as the independent variables. Because the cost of living
depends on these factors, it is the dependent variable.
• Another scenario is a student's poor performance in an examination.
The independent variables could be factors such as poor memory,
inattentiveness in class, and irregular attendance. Since these
factors affect the student's score, the score is the dependent
variable.
• Suppose you want to measure the effect of different quantities of
nutrient intake on the growth of a newborn child. The amount of
nutrient intake is the independent variable, and the growth of the
child, measured by indicators such as height and weight, is the
dependent variable.
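
The cost-of-living example above can be sketched in code as a split of each record into independent variables (inputs) and a dependent variable (output), which is the usual first step before fitting a regression model. The field names, the helper `split_variables`, and all figures below are hypothetical and chosen only for illustration.

```python
# Hypothetical records: three people with made-up values.
records = [
    {"salary": 30000, "age": 25, "married": False, "cost_of_living": 18000},
    {"salary": 45000, "age": 32, "married": True, "cost_of_living": 27000},
    {"salary": 60000, "age": 41, "married": True, "cost_of_living": 35000},
]

FEATURES = ["salary", "age", "married"]  # independent (predictor) variables
TARGET = "cost_of_living"                # dependent (response) variable


def split_variables(rows, feature_names, target_name):
    """Separate each record into a row of inputs (X) and a single output value (y)."""
    # float() turns the boolean marital status into 0.0 / 1.0.
    X = [[float(row[name]) for name in feature_names] for row in rows]
    y = [float(row[target_name]) for row in rows]
    return X, y


X, y = split_variables(records, FEATURES, TARGET)
print(X[0], "->", y[0])
```

Each row of `X` holds the predictor values for one person, and the matching entry of `y` holds the value a regression model would be trained to predict.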
What is the difference between Regression and Classification?
• Regression and classification are both supervised learning methods,
which means that they use labelled training datasets to train their
models and make predictions on new data.
• For this reason, the two methods are often grouped under the same
category in machine learning.
• Supervised learning, also known as supervised machine learning, is a
subcategory of machine learning and artificial intelligence. It is
defined by its use of labelled datasets to train algorithms to
classify data or predict outcomes accurately.
• However, the key difference between them is the output variable. In
regression, the output is numerical (continuous), whereas in
classification, the output is categorical (discrete).
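
To make the difference in output type concrete, here is a small sketch that applies the same idea, k-nearest neighbours, to both tasks: the regression version averages the numeric targets of the closest training points, while the classification version takes a majority vote over their labels. k-nearest neighbours is chosen here only for illustration and is not prescribed by the slides; the function names and the attendance data are hypothetical.

```python
from collections import Counter


def nearest_neighbours(train_x, query, k):
    """Return the indices of the k training points closest to the query (1-D inputs)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return order[:k]


def knn_regress(train_x, train_y, query, k=3):
    """Regression: the output is a number (mean of the neighbours' targets)."""
    idx = nearest_neighbours(train_x, query, k)
    return sum(train_y[i] for i in idx) / len(idx)


def knn_classify(train_x, train_labels, query, k=3):
    """Classification: the output is a category (most common neighbour label)."""
    idx = nearest_neighbours(train_x, query, k)
    return Counter(train_labels[i] for i in idx).most_common(1)[0][0]


# Hypothetical labelled training data: attendance (%) with two kinds of labels.
attendance = [40.0, 50.0, 60.0, 70.0, 80.0, 90.0]
scores = [35.0, 42.0, 55.0, 63.0, 74.0, 85.0]               # continuous target
results = ["fail", "fail", "pass", "pass", "pass", "pass"]  # categorical target

predicted_score = knn_regress(attendance, scores, 72.0)
predicted_result = knn_classify(attendance, results, 72.0)
print(predicted_score, predicted_result)
```

Both functions learn from labelled examples, which is what makes them supervised; they differ only in whether the prediction is a continuous value or one of a fixed set of categories.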
