
Kunal M. Guhagarkar
TYIT Roll No. 521

Business Intelligence Assignment 2

1. What are the different types of mathematical models?

Models can be divided into the following:


Iconic
• An iconic model is a material representation of a real system, whose behaviour
is imitated for the analysis. A miniaturized model of a new city neighbourhood is
an example of an iconic model.

Analogical
• An analogical model is also a material representation, although it imitates the
real behaviour by analogy rather than by replication. A wind tunnel built to
investigate the aerodynamic properties of a motor vehicle is an example of an
analogical model intended to represent the actual progression of a vehicle on the
road.

Symbolic
• A symbolic model, such as a mathematical model, is an abstract representation
of a real system. It is intended to describe the behaviour of the system through a
series of symbolic variables, numerical parameters, and mathematical
relationships.

Stochastic
• In a stochastic model some input information represents random events and is
therefore characterized by a probability distribution, which in turn can be
assigned or unknown.

Deterministic
• A model is called deterministic when all input data are supposed to be known a
priori and with certainty.
• Since this assumption is rarely fulfilled in real systems, one resorts to
deterministic models when the problem at hand is sufficiently complex and any
stochastic elements are of limited relevance.
• Notice, however, that even for deterministic models the hypothesis of knowing
the data with certainty may be relaxed.
• Sensitivity and scenario analyses, as well as what-if analysis, allow one to assess
the robustness of optimal decisions to variations in the input parameters.
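The idea of what-if analysis can be sketched with a toy deterministic model; the profit function and all numbers below are invented for illustration only:

```python
# What-if analysis on a simple deterministic model: profit depends on
# parameters assumed to be known with certainty.
def profit(volume, price, unit_cost):
    return volume * (price - unit_cost)

# Baseline scenario with assumed parameter values.
base = profit(volume=1000, price=12.0, unit_cost=8.0)

# What-if: relax the certainty assumption and stress one input parameter
# (a 10% rise in unit cost) to assess the robustness of the decision.
stressed = profit(volume=1000, price=12.0, unit_cost=8.8)
change = (stressed - base) / base   # relative change in the objective
```

A 10% change in one input here produces a 20% drop in the objective, which is exactly the kind of sensitivity such an analysis is meant to expose.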

Static
• Static models consider a given system and the related decision-making process
within one single temporal stage.

2. Explain in brief different phases of mathematical models

Problem identification
• First, the problem at hand must be correctly identified. The observed critical
symptoms must be analysed and interpreted to formulate hypotheses for
investigation.
• For example, too high a stock level, corresponding to an excessively low stock
turnover rate, may represent a symptom for a company manufacturing
consumable goods.
• It is, therefore, necessary to understand what caused the problem, based on the
opinion of the production managers.
• In this case, an ineffective production plan may be the cause of the stock
accumulation.

Model formulation
• Once the problem to be analysed has been properly identified, effort should be
directed toward defining an appropriate mathematical model to represent the
system.
• Several factors affect and influence the choice of model, such as the time
horizon, the decision variables, the evaluation criteria, the numerical parameters,
and the mathematical relationships.

Time Horizon.
Usually, a model includes a temporal dimension. For example, to formulate a
tactical production plan over the medium term it is necessary to specify the
production rate for each week in a year.
A hypothesis, in this context, is a supposition or proposed explanation made on
the basis of limited evidence, serving as a starting point for further investigation.

Evaluation criteria.
Appropriate measurable performance indicators should be defined to establish a
criterion for evaluating and comparing the alternative decisions.
These indicators may assume various forms in each different application, and may
include the following factors:
• monetary costs and payoffs;
• effectiveness and level of service;
• quality of products and services;
• flexibility of the operating conditions;

Decision variables.
Symbolic variables representing alternative decisions should then be defined. For
example, if a problem consists of the formulation of a tactical production plan
over the medium term, decision variables should express production volumes for
each product, for each process, and for each period of the planning horizon.

Numerical parameters.
It is also necessary to accurately identify and estimate all numerical parameters
required by the model. In the production planning example, the available capacity
should be known in advance for each process, as well as the capacity absorption
coefficients for each combination of products and processes.

Mathematical relationships.
The final step in the formulation of a model is the identification of mathematical
relationships among the decision variables, the numerical parameters, and the
performance indicators defined during the previous phases.

Development of Algorithms
• Once a mathematical model has been defined, one will naturally wish to proceed
with its solution to assess decisions and select the best alternative.

• In other words, a solution algorithm should be identified and a software tool that
incorporates the solution method should be developed or acquired.
• An analyst in charge of the model formulation should possess a thorough
knowledge of current solution methods and their characteristics.

Implementation and Test


• When a model is fully developed, then it is finally implemented, tested, and
utilized in the application domain.
• It is also necessary that the correctness of the data and the numerical parameters
entered in the model be assessed.
• These data usually come from a data warehouse or a data mart previously set
up.
• Once the first numerical results have been obtained using the solution procedure
devised, the model must be validated by submitting its conclusions to the opinion
of decision-makers and other experts in the application domain. Several factors
should be considered at this stage:
• the plausibility and likelihood of the conclusions achieved;
• the consistency of the results at extreme values of the numerical parameters;
• the stability of the results when minor changes in the input parameters are
introduced.

3. Write a short note on Predictive models

Predictive models play a primary role in business intelligence systems since they
are logically placed upstream of other mathematical models and, more
generally, of the whole decision-making process.

Predictions allow input information to be fed into different decision-making
processes arising in strategy, research and development, administration and
control, marketing, production, and logistics.

Basically, all departmental functions of an enterprise make some use of predictive
information to support decision-making.

4. Write a short note on Project management models

A project is a complex set of interrelated activities carried out with a specific goal,
which may represent an industrial plant, a building, an information system, a new
product, or a new organizational structure, depending on the different application
domains.

The execution of the project requires a planning and control process for the
interdependent activities as well as the human, technical, and financial resources
necessary to achieve the final goal.

Project management methods are based on the contributions of various
disciplines, such as business organization, behavioural psychology, and
operations research.

5. What are the different factors used for model formulation?

Once the problem to be analysed has been properly identified, effort should be
directed toward defining an appropriate mathematical model to represent the
system.
Several factors affect and influence the choice of model, such as the time horizon,
the decision variables, the evaluation criteria, the numerical parameters, and the
mathematical relationships.

Time Horizon.
Usually, a model includes a temporal dimension. For example, to formulate a
tactical production plan over the medium term it is necessary to specify the
production rate for each week in a year.
A hypothesis, in this context, is a supposition or proposed explanation made on
the basis of limited evidence, serving as a starting point for further investigation.

Evaluation criteria.
Appropriate measurable performance indicators should be defined to establish a
criterion for evaluating and comparing the alternative decisions.
These indicators may assume various forms in each different application, and may
include the following factors:
• monetary costs and payoffs;
• effectiveness and level of service;
• quality of products and services;
• flexibility of the operating conditions;

Decision variables.
Symbolic variables representing alternative decisions should then be defined. For
example, if a problem consists of the formulation of a tactical production plan
over the medium term, decision variables should express production volumes for
each product, for each process, and for each period of the planning horizon.

Numerical parameters.
It is also necessary to accurately identify and estimate all numerical parameters
required by the model. In the production planning example, the available capacity
should be known in advance for each process, as well as the capacity absorption
coefficients for each combination of products and processes.

Mathematical relationships.
The final step in the formulation of a model is the identification of mathematical
relationships among the decision variables, the numerical parameters, and the
performance indicators defined during the previous phases.

6. Explain in detail the purpose of Pattern & Machine Learning model

The purpose of pattern recognition and learning theory is to understand the
mechanisms that regulate the development of intelligence, understood as the
ability to extract knowledge from experience in order to apply it in the future.

• Mathematical learning models can be used to develop efficient algorithms that
perform such tasks.

• This has led to intelligent machines capable of learning from past observations
and deriving new rules for the future, just like the human mind can do with great
effectiveness due to the sophisticated mechanisms developed and fine-tuned
during evolution.

• Mathematical learning models have two primary objectives.

• The purpose of interpretation models is to identify regular patterns in the data
and to express them through easily understandable rules and criteria.

• Prediction models help to forecast the value that a given random variable will
assume in the future, based on the values of some variables associated with the
entities of a database.

7. What are the different data streams in Data Mining?

1. Classification
Classification is a supervised learning technique. In classification, the classifier
model is built based on the training data (or past data with output labels). This
classifier model is then used to predict the label for unlabelled instances or items
continuously arriving through the data stream. Prediction is made for the
unknown/new items that the model never saw, and already known instances are
used to train the model.
Generally speaking, a stream mining classifier must be ready to do either of these
tasks at any moment:
 Receive an unlabelled item and predict its label using the current model.
 Receive the true labels of past items and use them to update the model.

Best Known Classification Algorithms


Let’s discuss the best-known classification algorithms for predicting the labels
for data streams.

1. Lazy Classifier or k-Nearest Neighbour
The k-Nearest Neighbour or k-NN classifier predicts a new item’s class
label based on the class labels of the closest instances. In particular, the lazy
classifier outputs the majority class label of the k instances closest to the one
to predict.
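The majority-vote idea can be sketched in a few lines; the one-dimensional feature, toy data, and function name below are illustrative assumptions, not part of the assignment text:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among the k nearest
    training items (1-D Euclidean distance here for simplicity)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Labelled items seen so far on the stream: (feature value, class label).
train = [(1.0, "a"), (1.2, "a"), (3.5, "b"), (3.9, "b"), (4.1, "b")]
print(knn_predict(train, 1.1))   # "a": its 3 nearest neighbours are a, a, b
print(knn_predict(train, 3.8))   # "b": its 3 nearest neighbours are all b
```

The classifier is "lazy" because no model is built in advance: all the work happens at prediction time, directly against the stored instances.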

2. Naive Bayes
Naive Bayes is a classifier based on Bayes’ theorem. It is a probabilistic model
called ‘naive’ because it assumes conditional independence between input
features. The basic idea is to compute a probability for each one of the class
labels based on the attribute values and select the class with the highest
probability as the label for the new item.
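A minimal sketch of that computation, using frequency counts on a toy categorical table (the weather-style data and function are invented for illustration; real implementations add smoothing for unseen values):

```python
from collections import Counter

def naive_bayes_predict(rows, labels, query):
    """Pick the class maximizing P(class) * prod P(attribute value | class),
    with probabilities estimated by simple frequency counts (no smoothing)."""
    n = len(labels)
    best, best_p = None, -1.0
    for cls, cls_count in Counter(labels).items():
        p = cls_count / n                        # prior P(class)
        for j, value in enumerate(query):        # conditional independence
            match = sum(1 for row, lab in zip(rows, labels)
                        if lab == cls and row[j] == value)
            p *= match / cls_count               # P(value | class)
        if p > best_p:
            best, best_p = cls, p
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
print(naive_bayes_predict(rows, labels, ("rain", "mild")))  # "yes"
```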

3. Decision Trees
As the name signifies, a decision tree builds a tree structure from training
data, and the decision tree classifier is then used to predict the class labels of
unseen data items. Their predictions are easy to interpret. For data streams,
the Hoeffding tree is the state-of-the-art decision tree classifier, and the
Hoeffding adaptive tree extends it to adapt to changes in the data distribution
over time.

4. Logistic Regression
Despite its name, Logistic Regression is a classification algorithm rather than
a regression algorithm, used to estimate discrete/binary values such as 0/1,
yes/no, or true/false. It predicts the probability of occurrence of an event by
fitting the data to a logit function based on known instances of the data stream.
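The prediction step reduces to passing a weighted sum through the logistic (sigmoid) function; the coefficient values below are assumed for illustration, whereas in practice they would be fitted from labelled stream instances:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Probability that the binary label is 1, plus the 0/1 decision."""
    p = sigmoid(w * x + b)
    return p, int(p >= 0.5)

# Assumed (not fitted) coefficients, purely to show the mechanics.
p, label = predict(x=2.0, w=1.5, b=-2.0)   # sigmoid(1.0) ~ 0.73, so label 1
```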

5. Ensembles
Ensembles combine different classifiers, which together can predict better than
the individual classifiers. The data is divided into distinct subsets, and these
subsets are fed to the different classifiers of the ensemble. Bagging and
boosting are two types of ensemble models. The ADWIN bagging method is
widely used for data streams in data mining.

2. Regression
Regression is also a supervised learning technique, used to predict real values
of the label attribute for stream instances rather than the discrete values of
classification. The idea is otherwise similar to classification: either predict the
real-valued label for unknown items using the regressor model, or train and
adjust the model using known labelled data.
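As a concrete contrast with classification, a regressor outputs a real number rather than a class. A minimal batch sketch using ordinary least squares on assumed toy data (stream regressors update such a model incrementally instead of refitting):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y ~ a*x + b on known (x, y) pairs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

# Known labelled instances (invented numbers for illustration).
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.0, 8.0]
a, b = fit_line(xs, ys)
y_hat = a * 5.0 + b   # real-valued prediction for a new, unlabelled item
```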

Best Known Regression Algorithms

The stream regression algorithms largely mirror the classification algorithms
above. Below are the best-known regression algorithms for predicting labels
for data streams.
 Lazy Classifier or k-Nearest Neighbour
 Naive Bayes

 Decision Trees
 Linear Regression
 Ensembles

3. Clustering
Clustering is an unsupervised learning technique. It is useful when we have
unlabelled instances and want to find homogeneous clusters among them
based on the similarities of the data items. Before the clustering process, the
groups are not known. Clusters are formed from the continuous data stream,
and items keep being assigned to the different groups as they arrive.

Best Known Clustering Algorithms

Let’s discuss the best-known clustering algorithms for group segmentation of
data streams.

1. K-means Clustering
The k-means clustering method is the most used and straightforward method
for clustering. It starts by randomly selecting k centroids. After that, repeat
two steps until the stopping criteria are met: first, assign each instance to the
nearest centroid, and second, recompute the cluster centroids by taking the
mean of all the items in that cluster.
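The two repeated steps can be sketched directly; one-dimensional points and fixed starting centroids are assumed here to keep the example short (real k-means picks the k centroids at random and works in many dimensions):

```python
def kmeans(points, centroids, iterations=10):
    """One-dimensional k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster; repeat."""
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for p in points:                              # step 1: assignment
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centroids = [sum(members) / len(members) if members else c
                     for c, members in clusters.items()]  # step 2: update
    return sorted(centroids)

points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
print(kmeans(points, centroids=[1.0, 10.0]))  # [1.5, 11.0]
```

Here a fixed iteration count stands in for the stopping criterion; implementations usually stop once the centroids no longer move.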

2. Hierarchical Clustering
In hierarchical clustering, the hierarchy of clusters is created as dendrograms.
For example, PERCH is a hierarchical algorithm used for clustering online
data streams.

3. Density-based Clustering
DBSCAN is the best-known density-based clustering algorithm. It groups
points that lie in densely packed regions and marks isolated points as noise,
which resembles the way humans naturally perceive clusters.

4. Frequent Pattern Mining

Frequent pattern mining is an essential task in unsupervised learning. It is used to
describe the data and find association rules or discriminative features in data
that further help classification and clustering tasks. It is based on two core
concepts:
 Frequent Item Set - a collection of items that frequently occur together.
 Association Rules - indicators of a strong relationship between two items.
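Counting frequent item sets can be sketched by brute force over a toy transaction list (invented data; Apriori additionally prunes candidates whose subsets are already infrequent, which this sketch omits):

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support, size=2):
    """Count every candidate itemset of the given size and keep those
    appearing in at least `min_support` transactions."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

transactions = [{"bread", "milk"}, {"bread", "milk", "eggs"},
                {"bread", "eggs"}, {"milk", "eggs"}]
print(frequent_itemsets(transactions, min_support=2))
# every pair of items appears together in 2 of the 4 transactions
```

A frequent pair such as ('bread', 'milk') is then the raw material for an association rule like "bread → milk", whose strength is measured by its confidence.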

Best Known Frequent Pattern Mining Algorithms

Below are the best-known frequent pattern mining algorithms for finding frequent
item sets in data.
 Apriori
 Eclat
 FP-growth

8. Explain the differences between OLAP, statistics and data mining

OLAP
Online analytical processing (OLAP) is a database analysis technology that
organizes data into multidimensional structures. OLAP allows users to analyse
data from multiple viewpoints. OLAP enables users to slice and dice data, drill
down or up, and pivot data along different dimensions.
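The slice and roll-up operations can be illustrated on a toy fact table (the dimensions, measures, and numbers below are invented; real OLAP engines do this over multidimensional cubes, not Python lists):

```python
from collections import defaultdict

# Toy fact table with three dimensions and one measure:
# (region, quarter, product, sales amount).
facts = [
    ("EU", "Q1", "laptop", 100), ("EU", "Q2", "laptop", 120),
    ("US", "Q1", "laptop", 200), ("US", "Q1", "phone", 150),
]

# Slice: fix one dimension (quarter = "Q1").
q1 = [f for f in facts if f[1] == "Q1"]

# Roll-up: aggregate the slice along the product dimension, by region.
by_region = defaultdict(int)
for region, quarter, product, amount in q1:
    by_region[region] += amount
print(dict(by_region))   # {'EU': 100, 'US': 350}
```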

Statistics
Statistics is a mathematical science that involves collecting, analysing,
interpreting, and presenting data. Statistics is also a technique for collecting,
reviewing, analysing, and drawing conclusions from quantified data.

Data mining
Data mining is the process of using computers and automation to search large
sets of data for patterns and trends. Data mining involves cleaning raw data,
finding patterns, creating models, and testing those models. Data mining
includes statistics, machine learning, and database systems.

9. What are the different types of attributes contained in a dataset?

Here are some types of attributes in a dataset:

Binary attributes
A type of nominal attribute that has two possible values, often true and false. For
example, the attribute "Survived" in the Titanic Dataset is binary.

Nominal attributes
Attributes whose values are names or labels that distinguish between objects,
without implying any order. Examples include name, roll number, and address.

Ordinal attributes
Qualitative attributes whose values have a meaningful order, even though the
magnitude of the difference between values is not defined. For example, the
attribute "Grades" is ordinal because its values A+, A, B, and C have an order.

Continuous attributes
Attributes with real numbers as values. Examples include height, weight, and
temperature.

Numeric attributes
Attributes that represent quantitative data, such as numbers. There are two main
types of numeric attributes: continuous and discrete.

Data mining is a technique that uses patterns and data visualization to find out
how different attributes of a data set are related to each other. The goal of data
mining is to find relationships between two or more attributes of a dataset and use
this to predict outcomes or actions.

10. Write a short note on exploratory analysis in data mining process

Exploratory Data Analysis (EDA) is an approach to analysing data sets in order
to summarize their main characteristics, often using statistical graphics and other
data visualization methods. It is the critical process of performing initial
investigations on data so as to discover patterns, spot anomalies, test
hypotheses, and check assumptions with the help of summary statistics and
graphical representations.

The four steps of exploratory data analysis (EDA) typically involve:


1. Data Cleaning: Handling missing values, removing outliers, and ensuring
data quality.
2. Data Exploration: Examining summary statistics, visualizing data
distributions, and identifying patterns or relationships.
3. Feature Engineering: Transforming variables, creating new features, or
selecting relevant variables for analysis.
4. Data Visualization: Presenting insights through plots, charts, and graphs to
communicate findings effectively.
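The first two steps can be sketched on a single toy column (invented values; a real EDA would do this per attribute and add the visualizations of step 4):

```python
import statistics

# Step 1 (data cleaning): drop missing values from a toy numeric column.
raw = [23, 25, None, 24, 95, 26, 24]
values = [v for v in raw if v is not None]

# Step 2 (data exploration): summary statistics for the cleaned column.
mean = statistics.mean(values)      # ~36.17, pulled up by the value 95
median = statistics.median(values)  # 24.5, robust to that outlier
stdev = statistics.stdev(values)
print(mean, median, stdev)
```

The large gap between mean and median is itself an EDA finding: it flags the anomalous value 95 for closer inspection before any model is built.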

EDA is used in data mining to understand the data and to identify patterns and
relationships that may not be immediately apparent. It can also be used to test
hypotheses and to develop models. EDA is an essential step in the data mining
process, as it helps to ensure that the data is properly understood and that the
results of the analysis are reliable.
Here are some of the benefits of using EDA in data mining:
 It helps to identify patterns and relationships in the data that may not be
immediately apparent.
 It can be used to test hypotheses and to develop models.
 It helps to ensure that the data is properly understood and that the results
of the analysis are reliable.
 It can help to improve the accuracy of data mining models.
 It can help to reduce the time and cost of data mining projects.

EDA is a powerful tool that can be used to improve the results of data mining
projects. By using EDA, data miners can better understand the data, identify
patterns and relationships, and develop more accurate models.
