
Coefficient of Correlation

A coefficient of correlation is generally applied in statistics to measure the relationship between two variables. The correlation gives a specific value for the degree of linear relationship between two variables, say X and Y. There are various types of correlation coefficients; however, Pearson's correlation (also known as Pearson's R) is the coefficient most frequently used in linear regression.

Pearson’s Coefficient of Correlation


Karl Pearson’s coefficient of correlation is an extensively used mathematical method
in which the numerical representation is applied to measure the level of relation
between linearly related variables. The coefficient of correlation is expressed by “r”.

Karl Pearson Correlation Coefficient Formula

r = Σ(x − x̄)(y − ȳ) / √[ Σ(x − x̄)² · Σ(y − ȳ)² ]

Alternative Formula (covariance formula)

r = cov(X, Y) / (σX σY)

Pearson correlation example


1. When the correlation coefficient is 1, every increase in one variable is accompanied by an increase in the other in a fixed proportion. For example, shoe size increases with foot length, an (almost) perfect positive correlation.

2. When the correlation coefficient is -1, every increase in one variable is accompanied by a decrease in the other in a fixed proportion. For example, the quantity of gas remaining in a tank decreases in (almost) perfect inverse correlation with the distance driven.

3. When the correlation coefficient is 0, an increase in one variable is associated with neither an increase nor a decrease in the other; the two variables are not linearly related.
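The coefficient can be computed directly from the definition. The sketch below (plain Python, no external libraries; the function name and sample data are illustrative) implements Pearson's r as the covariance term divided by the product of the variability terms:

```python
def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y over the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Perfectly proportional data (like foot length vs. shoe size) gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
```

Real measurements rarely hit exactly ±1; values near those extremes indicate a nearly perfect linear relationship.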

3.3.3 - Probabilities for Normal Random Variables (Z-scores)


The standard normal is important because we can use it to find probabilities for a
normal random variable with any mean and any standard deviation.

But first, we need to explain Z-scores.


Z-value, Z-score, or Z
We can convert any normal distribution into the standard normal distribution in order
to find probability and apply the properties of the standard normal. In order to do this,
we use the z-value.


The Z-value (or sometimes referred to as Z-score or simply Z) represents the


number of standard deviations an observation is from the mean for a set of
data. To find the z-score for a particular observation we apply the following
formula:

z = (observed value − mean) / SD


Let's take a look at the idea of a z-score within context.

For a recent final exam in STAT 500, the mean was 68.55 with a standard deviation
of 15.45.

 If you scored an 80%: z = (80 − 68.55) / 15.45 = 0.74, which means your score of 80 was 0.74 SD above the mean.
 If you scored a 60%: z = (60 − 68.55) / 15.45 = −0.55, which means your score of 60 was 0.55 SD below the mean.
Is it always good to have a positive Z score? It depends on the question. For exams,
you would want a positive Z-score (indicates you scored higher than the mean).
However, if one was analyzing days of missed work then a negative Z-score would
be more appealing as it would indicate the person missed less than the mean
number of days.
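The two exam calculations above can be reproduced with a small helper; the function name and the rounding to two decimals are illustrative choices:

```python
def z_score(observed, mean, sd):
    """Number of standard deviations an observation lies from the mean."""
    return (observed - mean) / sd

# STAT 500 final exam: mean 68.55, standard deviation 15.45
print(round(z_score(80, 68.55, 15.45), 2))  # 0.74 (above the mean)
print(round(z_score(60, 68.55, 15.45), 2))  # -0.55 (below the mean)
```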

Characteristics of Z-scores

 The scores can be positive or negative.


 For data that is symmetric (i.e. bell-shaped) or nearly symmetric, a common application of Z-scores is to flag as potential outliers any observations whose Z-scores fall beyond ±3.
 The maximum possible Z-score for a set of data is (n − 1)/√n.
From Z-score to Probability
For any normal random variable, if you find the Z-score for a value (i.e., standardize the value), the random variable is transformed into a standard normal and you can find probabilities using the standard normal table.

For instance, assume U.S. adult heights and weights are both normally distributed.
Clearly, they would have different means and standard deviations. However, if you
knew these means and standard deviations, you could find your z-score for your
weight and height.

You can now use the Standard Normal Table to find the probability, say, that a randomly selected U.S. adult weighs less than you or is taller than you.
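In place of a printed table, the standard normal CDF can be evaluated with Python's built-in error function; the function name is an illustrative choice:

```python
import math

def standard_normal_cdf(z):
    """P(Z < z) for a standard normal, via the error function (replaces a table lookup)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Probability of scoring below 80 on the STAT 500 exam (z = 0.74):
print(round(standard_normal_cdf(0.74), 2))  # 0.77
```

So roughly 77% of scores fall below an 80, matching what a standard normal table gives for z = 0.74.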

Understanding Bayesian networks in AI


A Bayesian network is a type of graphical model that uses probability to determine the occurrence of an event. It is also known as a belief network or a causal network. It consists of a directed acyclic graph (DAG) and tables of conditional probabilities used to find out the probability of an event happening. It contains nodes and edges, where edges connect the nodes. The graph is acyclic, meaning there is no directed path that leads from a node back to itself. The tables of probabilities, on the other hand, show the likelihood that a random variable will take on certain values.
As an illustration, consider a directed acyclic graph with five nodes, namely a, b, c, d, and e. From this graph, we can get the following information:

1. Node a is the parent of nodes b, c, and e, and nodes b, c, and e are the child nodes of node a.
2. Nodes b and c are the parent nodes of node d.
3. Node e is the child node of nodes d, c, and a.

It is important to note the relationships between the nodes. Bayesian networks fall under
probabilistic graphical techniques; hence, probability plays a crucial role in defining the
relationship among these nodes.

There are two types of probabilities that you need to be fully aware of in Bayesian networks:

1. Joint probability

Joint probability is the probability of two or more events happening together. For example, the joint probability of two events A and B is the probability that both events occur, written P(A∩B).

2. Conditional probability

Conditional probability defines the probability that event B will occur given that event A has already occurred, written P(B | A). The two notions are linked: the joint probability can be written as P(A∩B) = P(A) P(B | A).

The conditional probability distribution of each node is represented by a table, often called its conditional probability table (CPT). It has one row for each possible combination of states of the node's parents and one column for each possible state of the node itself.

To find out how likely a certain event is, we sum the probabilities over all combinations of parent states that lead to that event.
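As a quick numeric sketch of the two probability types, with made-up values (rain and traffic are hypothetical events, not from the network above):

```python
# Assumed illustrative numbers for two events on a given day.
p_rain = 0.3                  # P(A): it rains
p_traffic_given_rain = 0.8    # P(B | A): heavy traffic, given rain

# Joint probability of both events: P(A ∩ B) = P(A) * P(B | A)
p_rain_and_traffic = p_rain * p_traffic_given_rain
print(round(p_rain_and_traffic, 2))            # 0.24

# Recovering the conditional from the joint: P(B | A) = P(A ∩ B) / P(A)
print(round(p_rain_and_traffic / p_rain, 2))   # 0.8
```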

Bayesian network in artificial intelligence examples


Here’s an example to better understand the concept.

You have installed a burglar alarm at home. The alarm not only detects burglary but also
responds to minor earthquakes. You have two neighbors, Chris and Martin, who have agreed
to get in touch with you when the alarm rings. Chris calls you when he hears the alarm but
sometimes confuses it with the telephone ringing and calls. On the other hand, Martin is a
music lover who sometimes misses the alarm due to the loud music he plays.

Problem:
Based on the evidence on who will or will not call, find the probability of a burglary
occurring in the house.

In a Bayesian network, we can see nodes as random variables.

There are five nodes:

1. Burglary (B)
2. Earthquake (E)
3. Alarm (A)
4. Chris calls (C)
5. Martin calls (M)

Links act as causal dependencies that define the relationship between the nodes. Both Chris
and Martin call when there is an alarm.

Let’s write the probability distribution function formula for the above five nodes.
Now, let's look at the observed values for each of the nodes with the table of probabilities:

Node B:

Node E:

Node A:
Node C:

Node M:

Based on the above observed values, the conditional values can be derived and, therefore, the
probability distribution can be calculated.

Conditional values for the above nodes are:

Node A:

Node C:
Node M:

To calculate the joint distribution, we use the formula:

P[B, E, A, C, M] = P[C | A] P[M | A] P[A | B, E] P[B] P[E]

(Burglary and Earthquake are independent root nodes in this network, so the factor for B is simply P[B].)
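With concrete numbers for the tables, the factorization can be evaluated directly. The CPT values below are illustrative placeholders in the spirit of the classic alarm example (the document's own tables are not reproduced here):

```python
# Assumed, illustrative CPT values -- not the document's actual tables.
P_B = 0.001                               # P(Burglary)
P_E = 0.002                               # P(Earthquake)
P_A = {(True, True): 0.95,                # P(Alarm | B, E)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_C_given_A = {True: 0.90, False: 0.05}   # P(Chris calls | Alarm)
P_M_given_A = {True: 0.70, False: 0.01}   # P(Martin calls | Alarm)

# P[B, E, A, C, M] = P[C|A] P[M|A] P[A|B,E] P[B] P[E]
# Probability of: no burglary, no earthquake, alarm rings, both neighbors call.
p = (P_C_given_A[True] * P_M_given_A[True]
     * P_A[(False, False)] * (1 - P_B) * (1 - P_E))
print(p)   # ≈ 0.00063
```

Each full assignment of the five variables gets its probability as a product of one entry per node, exactly as the formula prescribes.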


Applications of Bayesian networks in AI
Bayesian networks find applications in a variety of tasks such as:

1. Spam filtering: A spam filter is a program that helps detect unsolicited spam emails. Bayesian spam filters check whether an email is spam or not, learning from examples of spam and ham (non-spam) messages.
2. Biomonitoring: This involves the use of indicators to quantify the concentration of
chemicals in the human body. Blood or urine is used to measure the same.
3. Information retrieval: Bayesian networks assist in information retrieval for research, which
is a constant process of extracting information from databases. It works in a loop. Hence, we
have to continuously reconsider and redefine our research problem to avoid data overload.
4. Image processing: A form of signal processing, image processing uses mathematical operations to convert images into digital format. Once images are converted, their quality can be enhanced with further operations. The input may be a photograph or a frame of video.
5. Gene regulatory network: A Bayesian network is an algorithm that can be applied to gene
regulatory networks in order to make predictions about the effects of genetic variations on
cellular phenotypes. Gene regulatory networks are a set of mathematical equations that
describe the interactions between genes, proteins, and metabolites. They are used to study
how genetic variations affect the development of a cell or organism.
6. Turbo code: Turbo codes are a type of error correction code capable of achieving very high
data rates and long distances between error correcting nodes in a communications system.
They have been used in satellites, space probes, deep-space missions, military
communications systems, and civilian wireless communication systems, including WiFi and
4G LTE cellular telephone systems.
7. Document classification: This is a problem often encountered in computer science and information science, where the main task is to assign a document to one or more classes. The task can be performed manually or algorithmically. Since manual effort takes too much time, algorithmic classification is used to complete it quickly and effectively.
We have seen what Bayesian networks in machine learning are and how they work. To recap, they are a type of probabilistic graphical model. The first stage of building a belief network is to represent the relevant states of the world as random variables (in the simplest case, beliefs that are either true or false). In the second stage, the dependencies between states are encoded as conditional probabilities. The final stage is to encode the possible observations as likelihoods for each state.

A belief network can be seen as an inference procedure for a set of random variables,
conditioned on some other random variables. The conditional independence assumptions
define the joint probability distribution from which the conditional probabilities are
computed.
What is regression analysis and what
does it mean to perform a regression?
Regression analysis is a reliable method of identifying which variables have an impact on a topic of interest. The process of performing a regression allows you to confidently determine which factors matter most, which factors can be ignored, and how these factors influence each other.

In order to understand regression analysis fully, it’s essential to


comprehend the following terms:

 Dependent Variable: This is the main factor that you’re trying to


understand or predict.
 Independent Variables: These are the factors that you hypothesize have
an impact on your dependent variable.
Suppose, for example, we are analyzing attendee satisfaction with an application training event. Attendees’ satisfaction with the event is our dependent variable. The topics covered, length of sessions, food provided, and the cost of a ticket are our independent variables.

How does regression analysis work?


In order to conduct a regression analysis, you’ll need to define a dependent
variable that you hypothesize is being influenced by one or several
independent variables.

You’ll then need to establish a comprehensive dataset to work with.


Administering surveys to your audiences of interest is a terrific way to
establish this dataset. Your survey should include questions addressing all
of the independent variables that you are interested in.

Let’s continue using our application training example. In this case, we’d want to measure the historical levels of satisfaction with the events from the past three years or so (or however long you deem statistically meaningful), as well as any information possible in regards to the independent variables.

Perhaps we’re particularly curious about how the price of a ticket to the
event has impacted levels of satisfaction.

To begin investigating whether or not there is a relationship between these


two variables, we would begin by plotting these data points on a chart,
which would look like the following theoretical example.
(Plotting your data is the first step in figuring out if there is a relationship
between your independent and dependent variables)

Our dependent variable (in this case, the level of event satisfaction) should
be plotted on the y-axis, while our independent variable (the price of the
event ticket) should be plotted on the x-axis.

Once your data is plotted, you may begin to see correlations. If the
theoretical chart above did indeed represent the impact of ticket prices on
event satisfaction, then we’d be able to confidently say that the higher the
ticket price, the higher the levels of event satisfaction.

But how can we tell the degree to which ticket price affects event satisfaction?
To begin answering this question, draw a line through the middle of all of
the data points on the chart. This line is referred to as your regression line,
and it can be precisely calculated using a standard statistics program like
Excel.

We’ll use a theoretical chart once more to depict what a regression line
should look like.

The regression line represents the relationship between your independent


variable and your dependent variable.

Excel will even provide a formula for the slope of the line, which adds
further context to the relationship between your independent and
dependent variables.

The formula for a regression line might look something like Y = 100 + 7X + error term.

This tells you that if there is no “X”, then Y = 100. If X is our increase in ticket price, this informs us that if there is no increase in ticket price, event satisfaction will still be at a baseline of 100 points, and each additional unit of ticket-price increase adds 7 points.
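Dropping the error term, the fitted line can be used for point predictions; the function name is an illustrative choice:

```python
# The example line: Y = 100 + 7X (error term omitted when predicting)
def predicted_satisfaction(x):
    """Predicted event satisfaction for a ticket-price increase of x units."""
    return 100 + 7 * x

print(predicted_satisfaction(0))   # 100 -- baseline with no price increase
print(predicted_satisfaction(10))  # 170 -- each unit of increase adds 7 points
```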

You’ll notice that the slope formula calculated by Excel includes an error
term. Regression lines always consider an error term because in reality,
independent variables are never precisely perfect predictors of dependent
variables. This makes sense while looking at the impact of ticket prices on
event satisfaction — there are clearly other variables that are contributing
to event satisfaction outside of price.

Your regression line is simply an estimate based on the data available to


you. So, the larger your error term, the less definitively certain your
regression line is.

Why should your organization use


regression analysis?
Regression analysis is a helpful statistical method that can be leveraged across an organization to determine the degree to which particular independent variables are influencing dependent variables.

The possible scenarios for conducting regression analysis to yield valuable,


actionable business insights are endless.

The next time someone in your business is proposing a hypothesis that


states that one factor, whether you can control that factor or not, is
impacting a portion of the business, suggest performing a regression
analysis to determine just how confident you should be in that hypothesis!
This will allow you to make more informed business decisions, allocate
resources more efficiently, and ultimately boost your bottom line.

Least Squares Method: What It Means, How to Use It, With Examples
By

WILL KENTON
Updated September 24, 2023

Reviewed by
MICHAEL J BOYLE

Fact checked by

YARILET PEREZ
What Is the Least Squares Method?
The least squares method is a form of mathematical regression analysis
used to determine the line of best fit for a set of data, providing a visual
demonstration of the relationship between the data points. Each point of
data represents the relationship between a known independent variable
and an unknown dependent variable. This method is commonly used by
statisticians and traders who want to identify trading opportunities and
trends.

KEY TAKEAWAYS

 The least squares method is a statistical procedure to find the best fit
for a set of data points.
 The method works by minimizing the sum of the offsets or residuals
of points from the plotted curve.
 Least squares regression is used to predict the behavior of
dependent variables.
 The least squares method provides the overall rationale for the
placement of the line of best fit among the data points being studied.
 Traders and analysts can use the least squares method to identify
trading opportunities and economic or financial trends.

Understanding the Least Squares Method


The least squares method is a form of regression analysis that provides
the overall rationale for the placement of the line of best fit among the data
points being studied. It begins with a set of data points using two variables,
which are plotted on a graph along the x- and y-axis. Traders and analysts
can use this as a tool to pinpoint bullish and bearish trends in the market
along with potential trading opportunities.

The most common application of this method, sometimes referred to as “linear” or “ordinary” least squares, aims to create a straight line that minimizes the sum of the squares of the errors generated by the results of the associated equations, such as the squared residuals resulting from differences between the observed values and the values anticipated based on the model.

For instance, an analyst may use the least squares method to generate a
line of best fit that explains the potential relationship between independent
and dependent variables. The line of best fit determined from the least
squares method has an equation that highlights the relationship between
the data points.
If the data shows a linear relationship between two variables, it results in a least-squares regression line. This minimizes the vertical distance from the data points to the regression line. The term least squares is used because it is the smallest sum of squares of errors, which is also called the variance. A non-linear least-squares problem, on the other hand, has no closed-form solution and is generally solved by iteration.
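For the linear case, the closed-form solution can be sketched in a few lines of plain Python; the slope is the covariance-over-variance ratio that underlies the regression line, and the function name and sample data are illustrative:

```python
def least_squares_fit(xs, ys):
    """Slope and intercept of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1.
slope, intercept = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)   # 2.0 1.0
```

With noisy data the same formulas return the line of best fit rather than an exact recovery, which is the usual situation in practice.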

Dependent variables are illustrated on the vertical y-axis, while


independent variables are illustrated on the horizontal x-axis in regression
analysis. These designations form the equation for the line of best fit,
which is determined from the least squares method.

Advantages and Disadvantages of the Least


Squares Method
The best way to find the line of best fit is by using the least squares
method. But traders and analysts may come across some issues, as this
isn't always a fool-proof way to do so. Some of the pros and cons of using
this method are listed below.

Advantages
One of the main benefits of using this method is that it is easy to apply and
understand. That's because it only uses two variables (one that is shown
along the x-axis and the other on the y-axis) while highlighting the best
relationship between them.

Investors and analysts can use the least squares method by analyzing past performance and making predictions about future trends in the economy and stock markets. As such, it can be used as a decision-making tool.

Disadvantages
The primary disadvantage of the least squares method lies in the data used. It can only highlight the relationship between two variables. As such, it doesn't take any other variables into account. And if there are any outliers, the results become skewed.

Another problem with this method is that the data must be evenly
distributed. If this isn't the case, the results may not be reliable.
