
Business Analytics

Chapter – 3
Predictive Analytics
Predictive Analytics:
Predictive analytics is the branch of advanced analytics used to
make predictions about unknown future events. It uses techniques
from data mining, statistics, modelling, machine learning, and artificial
intelligence to analyse current data and make predictions about the future.
Predictive analytics is used to determine customer responses or purchases,
as well as to promote cross-selling opportunities. Predictive models help
businesses attract, retain and grow their most profitable customers. They
also improve operations: many companies use predictive models to forecast
inventory and manage resources.
Predictive analytics predicts what is most likely to happen in the future;
prescriptive analytics recommends actions you can take to affect those outcomes.
Predictive analytics uses historical data to predict future events. Typically,
historical data is used to build a mathematical model that captures important
trends. That predictive model is then applied to current data to predict what
will happen next, or to suggest actions to take for optimal outcomes.
Industries that use predictive analytics include:
• Retail
• Healthcare
• Entertainment
• Manufacturing
• Cybersecurity
• Human resources
• Sports
• Weather forecasting
Forecasting Techniques:
Trend Lines:
A trend line is a line drawn over pivot highs or under pivot lows to show the
prevailing direction of price. Trend lines are a visual representation of support and
resistance in any time frame. They show the direction and speed of price, and also
describe patterns during periods of price contraction.
In finance, a trend line is a bounding line for the price movement of a security. It
is formed when a diagonal line can be drawn through at least three
price pivot points. A line can be drawn between any two points, but it does
not qualify as a trend line until it has been tested.
A trend line connects a swing low to a swing high, from the lowest point of the
downward movement to the highest point in the upward movement. When the
price rises, the trend line rises accordingly. Connecting these lows with
a line results in an ascending trend line, showing you that the prices are trending
upwards.
The general rule in technical analysis is that it takes two points to draw a trend
line, and a third point confirms its validity. When you add a trend line to a chart
in Microsoft Excel, you can choose any of six trend/regression types:
1. Linear
2. Logarithmic
3. Polynomial
4. Power
5. Exponential
6. Moving average
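As a quick illustration of the last type, a moving-average trend line smooths a price series by averaging consecutive windows. The sketch below uses invented prices:

```python
# A minimal sketch of a moving-average trend line, one of the six
# trend/regression types listed above. The price series is invented
# for illustration.

def moving_average(prices, window):
    """Average each run of `window` consecutive prices."""
    if window < 1 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [sum(prices[i:i + window]) / window
            for i in range(len(prices) - window + 1)]

prices = [10, 12, 11, 13, 15, 14, 16]
print(moving_average(prices, 3))  # → [11.0, 12.0, 13.0, 14.0, 15.0]
```

The smoothed series rises steadily even though the raw prices zig-zag, which is exactly what a trend line is meant to reveal.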
Trends
• A trend is a pattern in a set of results displayed in a graph.
• Even when a graph does not show a straight-line increase in the figures, the
overall trend can still be that sales are increasing.
Regression Analysis:
The term "regression" was coined by Francis Galton in the nineteenth century to
describe a biological phenomenon. The phenomenon was that the heights of
descendants of tall ancestors tend to regress down towards a normal average (a
phenomenon also known as regression toward the mean). Regression analysis is a
set of statistical methods used to estimate relationships between a
dependent variable and one or more independent variables. An independent
variable is an input, assumption, or driver that is changed in order to assess its
impact on a dependent variable. Regression analysis is all about determining how
changes in the independent variables are associated with changes in the dependent
variable. Coefficients tell you about these changes, and p-values tell you whether
these coefficients are significantly different from zero. Typically, a regression
analysis is done for one of two purposes: to predict the value of the
dependent variable for individuals for whom some information concerning the
explanatory variables is available, or to estimate the effect of some
explanatory variable on the dependent variable.
The two primary uses for regression in business are forecasting and optimization.
In addition to helping managers predict such things as future demand for their
products, regression analysis helps fine-tune manufacturing and delivery
processes.
Linear regression works by using an independent variable to predict the values
of the dependent variable. The equation takes the form y = mx + b, where y is
the predicted value, m is the gradient (slope) of the line and b is the point at
which the line crosses the y-axis.
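The slope m and intercept b can be computed directly with the standard least-squares formulas. A minimal sketch, using invented data points that lie exactly on y = 2x + 1:

```python
# A minimal sketch of fitting y = mx + b by ordinary least squares,
# using the closed-form formulas; the data points are invented.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope m = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x   # the fitted line passes through the means
    return m, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]      # exactly y = 2x + 1
m, b = fit_line(xs, ys)
print(m, b)  # → 2.0 1.0
```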

Statistical Methods for Finding the Best Regression Model


• Adjusted R-squared and predicted R-squared: generally, you choose the models
that have higher adjusted and predicted R-squared values.
• P-values for the predictors: in regression, low p-values (p < 0.05) indicate
terms that are statistically significant.
The best-fit line is the one that minimises the sum of squared differences between
actual and estimated results. The average of the squared differences is known as
the Mean Squared Error (MSE); the smaller the value, the better the regression
model.
Correlation is a single statistic, whereas regression is an entire
equation fitted through all of the data points and represented with
a line. Correlation shows the relationship between the two variables,
while regression allows us to see how one affects the other.
R-squared is a statistical measure of how close the data are to the fitted regression
line. It is also known as the coefficient of determination, or the coefficient of
multiple determination for multiple regression. An R-squared of 100% indicates
that the model explains all the variability of the response data around its mean.
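Both MSE and R-squared can be computed in a few lines. A minimal sketch; the actual and predicted values are invented for illustration:

```python
# A minimal sketch of the two model-quality measures described above:
# Mean Squared Error and R-squared. The values are invented.

def mse(actual, predicted):
    """Average of the squared differences between actual and estimated results."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares around the mean)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual    = [3, 5, 7, 9]
predicted = [2.8, 5.2, 7.1, 8.9]
print(round(mse(actual, predicted), 3))        # small MSE: good fit
print(round(r_squared(actual, predicted), 3))  # close to 1: good fit
```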
Types of Regression Analysis:
1. Linear model
2. Multiple regression model
In statistics, linear regression is a linear approach to modelling the relationship
between a scalar response (Dependent) and one or more explanatory variables
(Independent). The case of one explanatory variable is called simple linear
regression; for more than one, the process is called multiple linear regression.
The Linear Regression Equation:
The equation has the form Y= a + bX, where Y is the dependent variable (that's
the variable that goes on the Y axis), X is the independent variable (i.e. it is
plotted on the X axis), b is the slope of the line and a is the y-intercept.
In linear regression, the relationships are modelled using linear predictor
functions whose unknown model parameters are estimated from the data. Such
models are called linear models. Linear regression was the first type of regression
analysis to be studied rigorously, and to be used extensively in practical
applications. This is because models which depend linearly on their unknown
parameters are easier to fit than models which are non-linearly related to their
parameters and because the statistical properties of the resulting estimators are
easier to determine.
Multiple Regression:
Multiple regression is an extension of linear regression models that allows
predictions for systems with multiple independent variables. It does this by simply
adding more terms to the linear regression equation, with each term representing
the impact of a different predictor. For example, if you're doing
a multiple regression to try to predict blood pressure (the dependent variable)
from independent variables such as height, weight, age, and hours of exercise per
week, you'd also want to include sex as one of your independent variables. It is an
extension of simple linear regression. It is used when we want to predict the value
of a variable based on the value of two or more other variables. The variable we
want to predict is called the dependent variable (or sometimes, the outcome, target
or criterion variable).
Multiple regression generally explains the relationship
between multiple independent or predictor variables and one dependent or
criterion variable. A dependent variable is modelled as a function of several
independent variables with corresponding coefficients, along with the
constant term. These models are used to study the correlations between two
or more independent variables and one dependent variable. These would be
useful when conducting research where two possible independent variables
could affect one dependent variable.
Multiple Regression in Research Methodology: Multiple regression is a
general and flexible statistical method for analysing associations between
two or more independent variables and a single dependent
variable. Multiple regression is most commonly used to predict values of
a criterion variable based on linear associations with predictor variables.
A linear regression model extended to include more than one independent variable
is called a multiple regression model. It is generally more accurate than simple
regression. The purposes of multiple regression are:
i) planning and control;
ii) prediction or forecasting.
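A multiple regression with two independent variables can be sketched with NumPy's least-squares solver. The data below are invented and constructed so that y = 1 + 2*x1 + 3*x2 exactly, so the fitted coefficients recover those values:

```python
# A minimal sketch of multiple regression with two independent
# variables, solved by least squares with NumPy; the data are invented.
import numpy as np

# y depends on x1 and x2: here y = 1 + 2*x1 + 3*x2 exactly
x1 = np.array([1, 2, 3, 4, 5])
x2 = np.array([2, 1, 4, 3, 5])
y  = 1 + 2 * x1 + 3 * x2

# Design matrix with a column of ones for the constant term
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 6))  # → [1. 2. 3.]
```

Each coefficient is the estimated impact of its predictor, holding the other predictor fixed, which is exactly the "each term representing the impact of a different predictor" idea above.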
Data Mining:
Data mining is a process used by companies to turn raw data into useful
information. By using software to look for patterns in large batches of data,
businesses can learn more about their customers to develop more effective
marketing strategies, increase sales and decrease costs
In simple words, data mining is defined as a process used to extract
usable data from a larger set of raw data. It implies
analysing data patterns in large batches of data using one or more software tools.

Data mining is also known as Knowledge Discovery in Data (KDD).


Data mining is the process of finding anomalies, patterns and correlations
within large data sets to predict outcomes. Using a broad range of
techniques, you can use this information to increase revenues, cut costs,
improve customer relationships, reduce risks and more.
Data mining has several types, including pictorial data mining,
text mining, social media mining, web mining, and audio and
video mining, among others.
Different Data Mining Methods
• Association.
• Classification.
• Clustering Analysis.
• Prediction.
• Sequential Patterns or Pattern Tracking.
• Decision Trees.
• Outlier Analysis or Anomaly Analysis.
• Neural Network
Data Mining: Potential Applications
◦ Market analysis and management: target marketing, customer relationship
management, market basket analysis, cross-selling, market segmentation
◦ Risk analysis and management: forecasting, customer retention, improved
underwriting, quality control, competitive analysis
◦ Fraud detection and management
◦ Database analysis and decision support
◦ Other applications: text mining (news groups, email, documents), Web
analysis, intelligent query answering
Why we need Data Mining
The volume of information we must handle from business transactions,
scientific data, sensor data, pictures, videos, etc. is increasing every day. So
we need a system capable of extracting the essence of the information
available, one that can automatically generate reports,
views or summaries of the data for better decision-making.
Why Data Mining is required in Business:
Data mining is used in business to make better managerial decisions by:
• Automatic summarization of data
• Extracting essence of information stored.
• Discovering patterns in raw data.
Knowledge Discovery in Databases (KDD): an iterative process in which
evaluation measures can be enhanced, mining can be refined, and new
data can be integrated and transformed in order to get different and
more appropriate results.
• Preprocessing of databases consists of data cleaning and data
integration.
Data Mining Techniques:
Classification Analysis: a data analysis task within data mining that
identifies and assigns categories to a collection of data to allow for more
accurate analysis. The classification method makes use of mathematical
techniques such as decision trees, linear programming, neural networks and
statistics. Classification means arranging the mass of data into different classes or
groups on the basis of their similarities and resemblances. For example, if we
have collected data on the number of students admitted to a university in a
year, the students can be classified on the basis of sex.
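One of the simplest classifiers is a nearest-centroid classifier: assign each new record to the class whose average training point it is closest to. A minimal sketch; the credit-risk labels and numbers are invented:

```python
# A minimal sketch of classification: a nearest-centroid classifier
# assigns a new record to the class whose centroid (average of the
# training points) it is closest to. Data and classes are invented.

def centroid(points):
    """Component-wise average of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def classify(record, classes):
    """classes: dict mapping label -> list of training points."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    centroids = {label: centroid(pts) for label, pts in classes.items()}
    return min(centroids, key=lambda label: dist2(record, centroids[label]))

training = {
    "low risk":  [[700, 1], [720, 0], [680, 2]],   # [credit score, defaults]
    "high risk": [[550, 5], [500, 6], [580, 4]],
}
print(classify([690, 1], training))   # → low risk
```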
Cluster Analysis: cluster analysis in data mining means finding groups of objects
that are similar to each other within the group but different from the objects in
other groups. It is a data mining method used to place data elements in
similar groups. Clustering is the procedure of dividing data objects into
subclasses; clustering quality depends on the method used. Clustering is also
called data segmentation, because large data groups are divided by their similarity.
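The idea can be sketched with a tiny one-dimensional k-means, the classic clustering procedure that alternates between assigning values to the nearest centre and moving each centre to the mean of its group. The values and starting centres below are invented:

```python
# A minimal sketch of clustering: one-dimensional k-means with k = 2.
# The values are invented; real cluster analysis handles many dimensions.

def kmeans_1d(values, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: each centre moves to the mean of its group
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

values = [1, 2, 3, 20, 21, 22]
centers, groups = kmeans_1d(values, centers=[0, 10])
print(centers)  # → [2.0, 21.0]
print(groups)   # → [[1, 2, 3], [20, 21, 22]]
```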
Association Learning: in association rule mining, as the name suggests,
association rules are simple if/then statements that help discover
relationships between seemingly independent relational databases or
other data repositories. Most machine learning algorithms work with
numeric datasets and hence tend to be mathematical. However,
association rule mining is suitable for non-numeric, categorical data
and requires just a little bit more than simple counting.
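The "simple counting" involved can be sketched by computing the two standard rule measures, support and confidence, over a handful of invented transactions:

```python
# A minimal sketch of association rule mining by counting: support and
# confidence for an if/then rule. The transactions are invented.

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """How often the then-part holds, given the if-part holds."""
    return support(antecedent | consequent) / support(antecedent)

# Rule: if a basket contains bread, then it also contains milk
print(support({"bread", "milk"}))          # → 0.5
print(confidence({"bread"}, {"milk"}))     # 2 of the 3 bread baskets have milk
```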
Anomaly Detection: anomaly detection is the identification of rare
events, items, or observations which are suspicious because they differ
significantly from standard behaviours or patterns. Anomalies in data are
also called outliers, noise, novelties, and
exceptions. Anomaly detection (also known as outlier analysis) is a step in data
mining that identifies data points, events, and/or observations that
deviate from a dataset's normal behaviour.
Anomalous data can indicate critical incidents, such as a technical glitch, or
potential opportunities, for instance a change in consumer behavior.
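A common minimal approach is to flag values that lie far from the mean in standard-deviation terms (a z-score test). The sensor-style readings and the 2.5-standard-deviation threshold below are invented for illustration:

```python
# A minimal sketch of anomaly detection: flag values more than
# `threshold` standard deviations from the mean. Readings are invented.
import statistics

def anomalies(values, threshold=2.5):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 95]
print(anomalies(readings))  # the 95 is flagged as an outlier
```

Note that a single extreme value inflates the standard deviation itself, which is why a fairly loose threshold is used here; robust variants use the median and MAD instead.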

Choice Modelling: most traditional mode-choice models are based on the principle
of random utility maximization derived from econometric theory. The capability
and performance of two emerging pattern-recognition data mining methods,
decision trees (DT) and neural networks (NN), have been investigated for
work-travel mode choice modelling.

Rule Induction: rule induction is an area of machine learning in which
formal rules are extracted from a set of observations. Data mining in general,
and rule induction in particular, tries to create algorithms without human
programming by analysing existing data structures. Rule induction is a data
mining process of deducing if-then rules from a data set. These symbolic decision
rules explain an inherent relationship between the attributes and class labels in
the data set. Many real-life experiences are based on intuitive rule induction.
Neural Networks: neural networks are used for effective data mining in order to
turn raw data into useful information. Neural networks look for patterns in large
batches of data, allowing businesses to learn more about their customers, which
informs their marketing strategies, increases sales and lowers costs. Neural
networks are designed to work much like the human brain does. In the case of
recognizing handwriting or facial recognition, the brain very quickly makes some
decisions. For example, in the case of facial recognition, the brain might start with
"Is it female or male?"
Steps of a KDD Process
1. Learning the application domain:
◦ relevant prior knowledge and goals of the application
2. Creating a target data set: data selection
3. Data cleaning and preprocessing (may take 60% of the effort!)
4. Data reduction and transformation:
◦ find useful features, dimensionality/variable reduction, invariant
representation
5. Choosing functions of data mining:
◦ summarization, classification, regression, association, clustering
6. Choosing the mining algorithm(s)
7. Data mining: search for patterns of interest
8. Pattern evaluation and knowledge presentation:
◦ visualization, transformation, removing redundant patterns, etc.
9. Use of discovered knowledge
Data Mining and Business Intelligence: increasing potential to support
business decisions, from the bottom to the top of the stack:
◦ Making decisions (end user)
◦ Data presentation, visualization techniques (business analyst)
◦ Data mining, information discovery (data analyst)
◦ Data exploration: statistical analysis, querying and reporting
Data Mining: Classification Schemes
• General functionality:
◦ Descriptive data mining
◦ Predictive data mining
• Different views lead to different classifications:
◦ Kinds of databases to be mined
◦ Kinds of knowledge to be discovered
◦ Kinds of techniques utilized
◦ Kinds of applications
A Multi-Dimensional View of Data Mining Classification
• Databases to be mined: relational, transactional, object-oriented,
object-relational, active, spatial, time-series, text, multimedia,
heterogeneous, legacy, WWW, etc.
• Knowledge to be mined: characterization, discrimination,
association, classification, clustering, trend, deviation and outlier
analysis, etc.; multiple/integrated functions and mining at multiple levels
• Techniques utilized: database-oriented, data warehouse (OLAP),
machine learning, statistics, visualization, neural network, etc.
• Applications adapted: retail, telecommunication, banking, fraud
analysis, DNA mining, stock
Data characterization is a summarization of the general
characteristics or features of a target class of data. The data
corresponding to the user-specified class are typically collected by a
query. For example, to study the characteristics of software products
with sales that increased by 10% in the previous year, the data related
to such products can be collected by executing an SQL query on the
sales database.
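The query step can be sketched with an in-memory SQLite database; the table name, columns, rows, and the 10% threshold below are invented stand-ins for the software-products example:

```python
# A minimal sketch of collecting a target class by SQL query, as in the
# example above. Table, columns, and rows are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, growth_pct REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("editor", 12.0), ("compiler", 4.0), ("game", 15.0)])

# Target class: software products whose sales grew by at least 10%
rows = conn.execute(
    "SELECT product FROM sales WHERE growth_pct >= 10").fetchall()
print([r[0] for r in rows])  # → ['editor', 'game']
conn.close()
```

The characterization step would then summarize the general features of just these retrieved rows.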
Data discrimination is a comparison of the general features of the
target class data objects against the general features of objects from one
or multiple contrasting classes. The target and contrasting classes can
be specified by a user, and the corresponding data objects can be
retrieved through database queries. For example, a user may want to
compare the general features of software products with sales that
increased by 10% last year against those with sales that decreased by at
least 30% during the same period. The methods used for data
discrimination are similar to those used for data characterization.
Data entries can be associated with classes or concepts. For example, in
the AllElectronics store, classes of items for sale
include computers and printers, and concepts of customers
include bigSpenders and budgetSpenders. It can be useful to describe
individual classes and concepts in summarized, concise, and yet precise
terms. Such descriptions of a class or a concept are
called class/concept descriptions. These descriptions can be derived
using (1) data characterization, by summarizing the data of the class
under study (often called the target class) in general terms, or (2) data
discrimination, by comparison of the target class with one or a set of
comparative classes (often called the contrasting classes), or (3) both
data characterization and discrimination.
Data Association & Classification: Association rule mining finds
interesting associations and relationships among large sets
of data items. This rule shows how frequently an itemset occurs in a
transaction. A typical example is market basket analysis.
Classification is a data mining function that assigns items in a
collection to target categories or classes. The goal of classification is to
accurately predict the target class for each case in the data. For
example, a classification model could be used to identify loan
applicants as low, medium, or high credit risks.
Classification and association rule discovery are similar except
that classification involves prediction of one attribute, i.e., the class,
while association rule discovery can predict any attribute in
the data set. In addition, the classification and association tasks can also
be distinguished according to the (a)symmetry of the attributes
Data exploration is the initial step in data analysis, where users
explore a large data set in an unstructured way to uncover initial
patterns, characteristics, and points of interest. Data exploration can use
a combination of manual methods and automated tools such
as data visualizations, charts, and initial reports.
Using interactive dashboards and point-and-click data exploration,
users can better understand the bigger picture and get insights faster.
Begin by examining each variable by itself. Then move on to study
relationships among the variables. Begin with a graph or graphs. Then
add numerical summaries of specific aspects of the data.
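That order of exploration can be sketched in a few lines: summarize each variable by itself, then add a numerical summary of the relationship between them (here a hand-computed Pearson correlation). Both series are invented:

```python
# A minimal sketch of the exploration order described above: first each
# variable on its own, then a numerical summary of their relationship.
import statistics

ad_spend = [10, 20, 30, 40, 50]
sales    = [12, 24, 33, 46, 55]

# Step 1: examine each variable by itself
for name, series in (("ad_spend", ad_spend), ("sales", sales)):
    print(name, statistics.mean(series), statistics.stdev(series))

# Step 2: study the relationship between the variables
def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

print(round(pearson(ad_spend, sales), 3))  # strong positive association
```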
Data reduction is a process that reduces the volume of the
original data and represents it in a much smaller volume. Data
reduction techniques ensure the integrity of the data while reducing it. The
time required for data reduction should not overshadow the
time saved by data mining on the reduced data set.
[Figure: factors contributing to employee satisfaction: salary, job profile,
working conditions, quality resources, climate and environment, relationship
with employees, rewards and recognition, responsibilities.]
