Tech Max
Systems
- Knowledge management is an activity practised by enterprises all over the world. In the
- Then, gathered information is organized, stored, shared, and analysed using defined
Business Intelligence (MU-B.Sc.-IT-Sem-VI) Knowledge Mgmt. & AI & Expert Systems
- The process of knowledge management is universal for any enterprise. Sometimes, the
resources used, such as tools and techniques, can be unique to the organizational
environment.
- The Knowledge Management process has six basic steps assisted by different tools and
techniques. When these steps are followed sequentially, the data transforms into
knowledge.
[Figure: the six steps of the knowledge management process, listed from top to bottom as Decision Making, Synthesizing, Analyzing, Summarizing, Organizing, Collecting (Data)]
Fig. 5.1.1
Step 1 : Collecting
- This is the most important step of the knowledge management process. If you collect
incorrect or irrelevant data, the resulting knowledge may not be accurate.
Therefore, the decisions made based on such knowledge could be inaccurate as well.
- There are many methods and tools used for data collection. First of all, data collection
The data collection procedure defines certain data collection points. Some points may be
the summary of certain routine reports. As an example, monthly sales report and daily
With data collection points, the data extraction techniques and tools are also defined. As
an example, the sales report may be a paper-based report where a data entry operator
needs to feed the data manually to a database, whereas the daily attendance report may be
- In addition to data collecting points and extraction mechanism, data storage is also
defined in this step. Most organizations now use a software database application for
this purpose.
Step 2 : Organizing
The data collected need to be organized. This organization usually happens based on
As an example, all sales-related data can be filed together and all staff-related data could
be stored in the same database table. This type of organization helps to maintain data
- If there is much data in the database, techniques such as ‘normalization’ can be used for
Step 3 : Summarizing
- In this step, the information is summarized in order to take the essence of it. The lengthy
Step 4 : Analyzing
- At this stage, the information is analyzed in order to find the relationships, redundancies
and patterns.
- An expert or an expert team should be assigned for this purpose as the experience of the
person/team plays a vital role. Usually, there are reports created after analysis of
information.
Step 5 : Synthesizing
- At this point, information becomes knowledge. The results of analysis (usually the
- A pattern or behavior of one entity can be applied to explain another, and collectively, the
organization will have a set of knowledge elements that can be used across the
organization.
- This knowledge is then stored in the organizational knowledge base for further use.
- Usually, the knowledge base is a software implementation that can be accessed from
- You can also buy such knowledge base software or download an open-source
Step 6 : Decision Making
- At this stage, the knowledge is used for decision making. As an example, when estimating
a specific type of a project or a task, the knowledge related to previous estimates can be
used.
- This accelerates the estimation process and adds high accuracy. This is how the
organizational knowledge management adds value and saves money in the long run.
only search for it and improve it in order to improve internal processes, but to
make them see the benefits of sharing it with the organization. In this context it is
important:
1. To give people autonomy in their jobs and find new ways to fulfill them.
- The manager should always be aware of the fact that decisions made by people can affect
- That’s why their motivation is crucial: it is what will make employees share and
replicate the knowledge they accumulate in their activities in the company with
colleagues.
- The worst that can happen is to lose that talent to the competition, along with everything
Q. 5.3.1 Write short note on learning organization. (Ref. Sec. 5.3) (5 Marks)
This concept reviews several theories relating to the learning organisation, including some
criticism.
This permeates all organisational activities, structures, processes, climate and values,
leading to an enhanced ability to react quickly to opportunities and threats.
Organizational transformation takes place when there is a change in the way the business
Along with the structural changes, the attitude of the employees, their perspectives as well
There are three key stages for managing organisational transformation along with the
Break with your administrative heritage. Important mechanisms here can be the removal
wn away.
This will vary from company to company. Some may be able to leverage a traditional
change, however, in environments where a more democratic leadership style is the norm,
it may be more appropriate to leverage other factors, for example, customer relationships,
participating in new initiatives. Crisis is also an important lever for organisational change.
Vary your leadership style as appropriate. The top-down approach of Stage 1 may be still
required to break with the past in some parts of the organisation, while other parts may by
this stage already have the ability to learn and therefore may be given authority and
empowerment to act.
Exploit best practice from your own or other organisations. This will require knowledge
Reconfigure, divest and integrate resources. This involves everything from streamlining
Empower the organisation. The top management team should delegate to employees as
can achieve this by encouraging innovation, trial and experimentation and by developing
a culture which encourages informed risk-taking and facilitates learning from mistakes.
Exploration enables the organisation to develop new capabilities fitted to its specific
context, rather than just importing systems and routines from other contexts.
Create new paths. This means creating a deliberate change in direction using new
capabilities, whether that be in terms of new products, services, processes or business models.
The combination of exploration and path creation will lead you to “disruptive innovation”.
By going through these stages, organizations can establish new developmental pathways,
enhance their strategic flexibility, and react successfully to changes in the environment.
5.5
knowledge-based assets.
purpose or objectives they wish to fulfill or how the organization will adopt and follow
A successful knowledge management program will consider more than just technology.
5.5.1
People : They represent how you increase the ability of individuals within the
driven culture.
€d approach that
- Organizations that have made this kind of investment in knowledge management realize
- They add to their top and bottom lines through faster cycle times, enhanced efficiency,
better decision making and greater use of tested solutions across the enterprise.
that new technology, the Internet, to work and see what it was capable of.
That first stage has been described using a horse breeding metaphor as “by the internet out
The concept of intellectual capital, the notion that not just physical resources, capital, and
manpower, but also intellectual capital (knowledge) fueled growth and development,
provided the justification, the framework, and the seed. The availability of the internet
capabilities provided by the Internet, using it first for themselves, realizing that if they
shared knowledge across their organization more effectively they could avoid reinventing
The central point is that the first stage of KM was about how to deploy that new
The first stage might be described as the “If only Texas Instruments knew what Texas
Instruments knew” stage, to revisit a much quoted KM mantra. The hallmark phrase of Stage 1
was first “best practices,” later replaced by the more politic “lessons learned.”
* Steps to Implementation
knowledge.
The following eight-step approach will enable you to identify these challenges so you can
plan for them, thus minimizing the risks and maximizing the rewards. This approach was
developed based on logical, tried-and-true activities for implementing any new organizational
program. The early steps involve strategy, planning, and requirements gathering while the later
Before selecting a tool, defining a process, and developing workflows, you should
In order to establish the appropriate program objectives, identify and document the
business problems that need resolution and the business drivers that will provide
Provide both short-term and long-term objectives that address the business problems and
support the business drivers. Short-term objectives should seek to provide validation that
the program is on the right path while long-term objectives will help to create and
cultural changes in the way employees perceive and share knowledge they develop or
possess.
- One common cultural hurdle to increasing the sharing of knowledge is that companies
- This practice promotes a "knowledge is power" behavior that contradicts the desired
within the organization's norms and shared values; changes that some people might resist
- To minimize the negative impact of such changes, it's wise to follow an established
- The process can be progressively developed with detailed procedures and work
instructions throughout steps four, five, and six. However, it should be finalized and
Organizations that overlook or loosely define the knowledge management process will not
realize the full potential of their knowledge management objectives.
best. There are a number of knowledge management best practices, all of which comprise
similar activities.
reporting.
- Depending on the program objectives established in step one and the process controls and
criteria defined in step three, you can begin to determine and prioritize your knowledge
the cost and benefit of each type of technology and the primary technology providers in
the marketplace.
- Don't be too quick to purchase a new technology without first determining if your existing
You can also wait to make costly technology decisions after the knowledge management
program is well underway if there is broad support and a need for enhanced computing
and automation.
- Now that you've established your program objectives to solve your business problem,
prepared for change to address cultural issues, defined a high-level process to enable the
effective management of your knowledge assets, and determined and prioritized your
technology needs that will enhance and automate knowledge management related
activities, you are in a position to assess the current state of knowledge management
- The knowledge management assessment should cover all five core knowledge
- A typical assessment should provide an overview of the assessment, the gaps between
current and desired states, and the recommendations for attenuating identified gaps. The
recommendations will become the foundation for the roadmap in step six.
- With the current-state assessment in hand, it is time to build the implementation roadmap
- But before going too far, you should re-confirm senior leadership's support and
commitment, as well as the funding to implement and maintain the knowledge
management program.
- Without these prerequisites, your efforts will be futile. Having solid evidence of your
organization’s shortcomings, via the assessment, should drive the urgency rate up.
- This strategy can be presented as a roadmap of related projects, each addressing specific
- The roadmap can span months and years and illustrate key milestones and dependencies.
A good roadmap will yield some short-term wins in the first step of projects, which will
- As time progresses, continue to review and evolve the roadmap based upon the changing
Step 7: Implementation
- As long as there are recognized value and benefits, especially in light of ongoing
investments.
With that said, it's time for the rubber to meet the road. You know what the objectives are.
- You've got the processes and technologies that will enable and launch your knowledge
management program. You know what the gaps are and have a roadmap to tell you how
to address them.
- As you advance through each step of the roadmap, make sure you are realizing your
short-term wins. Without them, your program may lose momentum and the support of key
stakeholders.
How will you know your knowledge management investments are working? You will
need a way of measuring your actual effectiveness and comparing that to anticipated
results.
- If possible, establish some baseline measurements in order to capture the “before shot” of
program.
- Then, after implementation, trend and compare the new results to the old results to see
Don’t be disillusioned if the delta is not as large as you would have anticipated. It will
take time for the organization to become proficient with the new processes and
When deciding upon the appropriate metrics to measure your organization’s progress,
establish a balanced scorecard that provides metrics in the areas of performance, quality,
- You can then take the necessary actions to mitigate compliance, performance, quality, and
value gaps, thus improving overall efficacy of the knowledge management program.
- Since the invention of computers or machines, their capability to perform various tasks
- Humans have developed the power of computer systems in terms of their diverse working
domains, their increasing speed, and reducing size with respect to time.
- According to the father of Artificial Intelligence, John McCarthy, it is “The science and
software think intelligently, in the similar manner the intelligent humans think.
- AI is accomplished by studying how the human brain thinks, and how humans learn, decide,
and work while trying to solve a problem, and then using the outcomes of this study as a
Intelligence can be defined as a general mental ability for reasoning, problem-solving, and
learning. Because of its general nature, intelligence integrates cognitive functions such as
On the basis of this definition, intelligence can be reliably measured by standardized tests
with obtained scores predicting several broad social outcomes such as educational
achievement, job performance, health, and longevity. So let’s study the differences
Artificial Intelligence :
Artificial Intelligence is the study and design of intelligent agents. These intelligent agents
have the ability to analyze the environments and produce actions which maximize
success.
AI research uses tools and insights from many fields, including computer science,
AI research also overlaps with tasks such as robotics, control systems, scheduling, data
# Human Intelligence :
- Human Intelligence is defined as the quality of the mind that is made up of capabilities to
learn from past experience, adaptation to new situations, handling of abstract ideas and
the ability to change his/her own environment using the gained knowledge.
Human Intelligence can provide several kinds of information. It can provide observations
during travel or other events from travellers, refugees, escaped friendly POWs, etc.
It can provide data on things about which the subject has specific knowledge, which can
be another human subject, or, in the case of defectors and spies, sensitive information to
which they had access. Finally, it can provide information on interpersonal relationships
and networks of interest.
Below is the list of points that describe the key differences between Artificial Intelligence
1. Nature of Existence
2. Memory usage
3. Mode of creation
4. Learning process
5. Dominance
Fig. 5.10.1 : Key Differences between Artificial Intelligence and Human Intelligence
Humans use content memory and thinking whereas, robots are using the built-in
Human intelligence is considered greater because it is a creation of God, whereas artificial
intelligence, as the name suggests, is artificial, limited and temporary, and created by humans.
Also, human intelligence is the real creator of artificial intelligence, but humans cannot create a
- Human intelligence is based on the variants they encounter in life and responses they get
- However, artificial intelligence is defined or developed for specific tasks only and its
Artificial intelligence can beat human intelligence in some specific areas, such as in Chess:
a supercomputer has beaten the human player due to being able to store all the moves played
by all humans so far and being able to think ahead 10 moves, as compared to human players
who can think 10 steps ahead but cannot store and retrieve that number of moves in Chess.
learning machine.
considerably high.
No. | Factor
4. | Decision Making | Humans have the ability to | Even the most advanced robots
Q.5.11.2 What are expert systems? (Ref. Sec. 5.11) (5 Marks)
Expert Systems (ES) are one of the prominent research domains of AI. It is introduced by
The expert systems are the computer applications developed to solve complex problems
- Understandable.
- Reliable.
- Highly responsive.
- Advising.
- Demonstrating.
- Deriving a solution.
- Diagnosing.
- Explaining.
- Interpreting input.
- Predicting results.

- Knowledge Base.
- Inference Engine.
- User Interface.
[Figure: components of an expert system; recoverable labels: Human Expert, Knowledge Engineer]
Fig. 5.12.1
Data is a collection of facts. The information is organized as data and facts about the
task domain. Data, information, and past experience combined together are termed as
knowledge.
5.12.1.1 Components of Knowledge Base
* Knowledge representation
It is the method used to organize and formalize the knowledge in the knowledge base. It is
* Knowledge acquisition
- The success of any expert system largely depends on the quality, completeness, and
accuracy of the information stored in the knowledge base.
- The knowledge base is formed by readings from various experts, scholars, and the
him at work, etc. He then categorizes and organizes the information in a meaningful way,
Use of efficient procedures and rules by the Inference Engine is essential in deducing a
In case of knowledge-based ES, the Inference Engine acquires and manipulates the
o Applies rules repeatedly to the facts, which are obtained from earlier rule application.
o Resolves rules conflict when multiple rules are applicable to a particular case.
To recommend a solution, the Inference Engine uses the following strategies :
1. Forward Chaining
2. Backward Chaining
1. Forward Chaining
It is a strategy of an expert system to answer the question, “What can happen next?”
to a solution.
[Figure: forward chaining, in which facts combined by AND/OR conditions lead to decisions; recoverable labels: Fact 2, Fact 4, Decision 2, Decision 4]
Fig. 5.12.2
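The forward chaining strategy can be sketched in a few lines of Python. This is only an illustrative sketch: the rule format (a tuple of premises plus one conclusion), the function name `forward_chain` and the fact and decision names are assumptions made for the example and do not come from the text.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known (AND).
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rule base: two rules with the same conclusion act as an OR.
FORWARD_RULES = [
    (("fact1", "fact2"), "decision1"),
    (("fact3",), "decision2"),
    (("fact4",), "decision2"),
    (("decision1", "decision2"), "decision3"),
]

derived = forward_chain({"fact1", "fact2", "fact3"}, FORWARD_RULES)
print(sorted(derived))
```

Starting from the known facts, the loop keeps firing rules whose premises are satisfied, so conclusions obtained from earlier rule applications become facts for later rules.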
2. Backward Chaining
With this strategy, an expert system finds out the answer to the question, “Why did this
happen?”
On the basis of what has already happened, the Inference Engine tries to find out which
conditions could have happened in the past for this result. This strategy is followed for
finding out the cause or reason. For example, diagnosis of blood cancer in humans.
[Figure: backward chaining; recoverable labels: Fact 1, Fact 2, Fact 3, Fact 4]
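Backward chaining can be sketched in the same style. Again this is a minimal sketch under assumptions: the rule format, the function name `backward_chain` and the fact names are hypothetical and chosen only for illustration.

```python
def backward_chain(goal, facts, rules, seen=frozenset()):
    """Return True if `goal` can be explained from the known facts."""
    if goal in facts:
        return True
    if goal in seen:
        # Goal is already being explored on this path: avoid endless loops.
        return False
    seen = seen | {goal}
    for premises, conclusion in rules:
        # Work backwards: find rules that conclude the goal, then try to
        # establish every premise of such a rule in turn.
        if conclusion == goal and all(
            backward_chain(p, facts, rules, seen) for p in premises
        ):
            return True
    return False


# Hypothetical rule base used only for this example.
BACKWARD_RULES = [
    (("fact1", "fact2"), "cause_a"),
    (("cause_a", "fact3"), "result"),
]

print(backward_chain("result", {"fact1", "fact2", "fact3"}, BACKWARD_RULES))
```

Unlike forward chaining, the search starts from the observed result and only examines the rules that could have produced it.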
User interface provides interaction between user of the ES and the ES itself. It is generally
Natural Language Processing so as to be used by the user who is well-versed in the task
domain.
It explains how the ES has arrived at a particular recommendation. The explanation may
Its technology should be adaptable to user’s requirements; not the other way round.
No technology can offer an easy and complete solution. Large systems are costly, require
significant development time, and computer resources. ESs have their limitations which
include :
Q.5.13.1 Explain applications of expert system in detail. (Ref. Sec. 5.13) (5 Marks)
petroleum pipeline.
There are several levels of ES technologies available. Expert systems technologies include :
Levels of ES Technologies
3. Shells
They are :
o Large databases.
2. Tools
- They reduce the effort and cost involved in developing an expert system to a large extent.
o Java Expert System Shell (JESS) that provides fully developed Java API for creating
an expert system.
o Vidwan, a shell developed at
Know and establish the degree of integration with the other systems and databases
Realize how the concepts can represent the domain knowledge best.
The knowledge engineer uses sample cases to test the prototype for any deficiencies in
performance.
Test and ensure the interaction of the ES with all elements of its environment, including
Cater for new interfaces with other information systems, as those systems evolve.
Less Production Cost : Production cost is reasonable. This makes them affordable.
Speed : They offer great speed. They reduce the amount of work an individual puts in.
Steady response : They work steadily without getting emotional, tensed or fatigued.
Q.17 Explain forward chaining and backward chaining. (Refer Section 5.12.1.1) (5 Marks)
Q.18 Explain applications of expert system in detail. (Refer Section 5.13) (5 Marks)
Chapter Ends...
Unit IV
Business Intelligence
Applications
Syllabus Topic : M
Relational Marketing
Q.4.1.1 Explain Relational marketing and various factors associated with it.
Let’s understand relational marketing with example. Most of us have noticed that
whenever a mobile company is about to launch a new device into the market a survey is
done by the company so that they get different opinions from their customers, which
And it is not only about a mobile phone, when you visit a restaurant waiters get the
feedback forms along with the bills wherein the customers have to rate the restaurant in
Almost all the companies study the behaviour and the feedback given by the customers
and try to include the features that are required by the customers in their device
at a reasonable and effective cost price so that the customers are attracted towards the
Most e-commerce companies store huge databases which hold collective information
about their customers and the data regarding their previous purchases, which helps the
company to provide options to its customers which are more likely to be liked by them.
The strategy that is followed in relational marketing is to start, strengthen, intensify
and maintain the relationship between the customers, stakeholders and the company.
which is presented by the customers, analysis is done, planning is done accordingly,
executed and evaluated to achieve the objectives.
Relational Marketing evolved and became popular in late 1990s to increase customer’s
wherein they are more concerned about what the customer actually needs and accordingly
implement the same into their respective products so as to sustain the competitive market.
Reasons to spread relational marketing are complex but interconnected which are listed
below :
With evolution of companies in the respective fields, the number of customers has also
increased comparatively.
Increased transparency and flow of data, and also the addition of e-commerce sites, lead to global
comparisons between different features, prices and also reviews from the customers who
renew the existing service or opt a new one because the facilities to change the services
Most of the companies maintain different levels/versions of the products and
services provided by them so that the customer has the flexibility of choosing the
services according to their requirement and also switching between the services as and when
required.
Data is gathered on the transactions, products and services that are used by the
customers so that the company has a huge range of data to analyze what is expected next by
the customers. Advanced automation techniques are used to analyze this data so that
Strategies of relational marketing
Above mentioned are the choices through which the strategies for relational marketing
Product services are the services that can be provided by the company for the
Various distribution channels can be constructed to make the product available for the
customers. Nowadays companies are not solely dependent on the traditional approach
where the product is distributed to various shops from where the customers would
purchase it; instead the products are distributed to e-commerce sites and sales
with attractive offers, due to which customers get a wide range of options to purchase the
product.
and prices of the product are also maintained to compete in the market. Different
creative promotions are done to attract the customers and make them aware about the
Above mentioned are the different components that are used in relational marketing
strategy, wherein the organization, its technologies, business strategies and its data
mining processes implemented to construct and promote the product together help in
achieving an efficient and strong relationship between the customers and the company.
Fig. 4.1.1(c) represent the different people involved in relational marketing strategy where
[Figure: recoverable labels: Operational, External data]
Fig. 4.1.2(a) shows the main elements that are used to create an environment for
achieved by collecting data from different internal and external data sources, and also
from marketing data mart which gives business intelligence and data mining analyses for
understanding the potential of the company and identifying the actual customers that the
company has.
- With different machine learning and pattern recognition models it is easy to achieve
various sections of customer base which can be later on used to define and design policies
Classification model can also be generated to classify different objectives of the company.
For example, the classification model can be made to check what the customer is
frequently buying from the offers provided by the company and project a similar
kind of offer to only those customers whose likelihood of accepting it
is higher.
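The idea of sending an offer only to customers who are likely to accept it can be illustrated with a very small sketch. It is not a trained classification model: as a stand-in, it estimates each customer's acceptance probability as the share of past offers that the customer accepted. The function names, the customer identifiers and the 0.5 threshold are assumptions made for the example.

```python
def acceptance_rate(history):
    """history: list of booleans, True when a past offer was accepted."""
    if not history:
        return 0.0  # no evidence yet, so the customer is not targeted
    return sum(history) / len(history)


def select_targets(histories, threshold=0.5):
    """Return the customers whose estimated acceptance rate meets the threshold."""
    return sorted(
        customer
        for customer, history in histories.items()
        if acceptance_rate(history) >= threshold
    )


# Hypothetical campaign history per customer.
past_offers = {
    "cust_a": [True, True, False],
    "cust_b": [False, False, False, True],
    "cust_c": [],
}

print(select_targets(past_offers))
```

In practice the frequency estimate would be replaced by the score of a classification model built on the marketing data mart; the selection step stays the same.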
Managing marketing campaign is a difficult task which needs strong planning for every
type of customer, what would be the actions taken and communication channels through
which the customer can communicate with the company and how can the available
This decision making process can be managed and formally expressed with the help of
optimization models. The end phase of marketing activity cycle is execution of the
The data collected through these results is then put into the marketing data mart for
- Whenever a campaign is executed it is important to set procedures which will help to
control the campaign and also analyze the data which is obtained in the form of
results.
- To test how effective the campaign has been it is important to restrict the campaign to a
selected set of people who have the same features as the people who would be using
[Figure: data warehouse containing customers, products, services and payments]
- Following are the main stages of customer lifetime which show cumulative value of
- It also shows the different actions that can be taken for a customer by any company. In the
starting phase any individual is a prospect, also known as a potential customer, who has
not yet started purchasing the product or using the services provided by the company.
For these customers acquisition actions are carried out both directly and indirectly.
In direct acquisition the customer is given information about the product or service
via calls, emails, oral talks with the agents of the company and so on.
In indirect acquisition advertising and information about the product is displayed on the
These actions incur a cost which is assigned to the customers, and a loss has to be
accounted for, as not all the customers that are approached will agree to buy the product or
service.
This event can have different forms in different situations like the service may require
subscription of the service, or the customer will only be able to buy the product when
Before the prospect becomes a customer for the company he/she will be getting constant
reminders from the company in the form of messages, calls, and emails in order to win their
custom.
This leads to a progressively growing cost, and if the prospect is not
convinced to buy the product this ultimately puts the company at a loss, which is stated
to be a negative outcome.
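The cumulative value of a customer over the lifetime can be worked through with a small numeric sketch. The cash-flow figures below are invented for illustration only: negative values stand for acquisition costs incurred while the individual is still a prospect, positive values for revenue once the individual has become a customer. The function names are hypothetical.

```python
from itertools import accumulate


def cumulative_value(period_cash_flows):
    """Running total of the value generated by one customer, period by period."""
    return list(accumulate(period_cash_flows))


def break_even_period(period_cash_flows):
    """Index of the first period in which the cumulative value is no longer negative."""
    for index, total in enumerate(cumulative_value(period_cash_flows)):
        if total >= 0:
            return index
    return None  # acquisition cost never recovered: a net loss for the company


# Hypothetical cash flows: two periods of acquisition cost, then revenue.
flows = [-30, -20, 40, 50, 60]
print(cumulative_value(flows))   # [-30, -50, -10, 40, 100]
print(break_even_period(flows))  # 3
```

A prospect who never converts only contributes the negative entries, which is the negative outcome described above.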
[Figure: customer lifetime, cumulative value plotted against time; recoverable labels: Acquisition, Cross/up-selling, Retention, Churner, Lost proposal]
This phase which is considered to make the relationship between the customer and
ee Pic and also known as maturity phase may also lead to retention, cross
The . . * . e
last phase is interruption of relationship where the customer calls off the service of
the ‘
of onm and moves on to the competitor company due to the inconvenience in terms
- Fig. 4.1.4 illustrates the logic for developing a classification model for relational
marketing analysis, taking the temporal dimension into consideration. Assume t is the
current time period, for which an inductive learning model of the classification problem
needs to be derived.
- For example, say that at the beginning of January a mobile provider wants to develop a
classification model to find the probability of its customer. The data mart will contain
data from past periods, updated up to period t - 1. In our case it will have data up to
December.
- Imagine the provider wants to get the probability h months in advance, say for the
next 2 months, that is February and March. In that case the probability will be
- Here you have to note that data for period t will not be used to predict because the data for
- To develop classification model the values of target variables are used for last known
period as t — 1, which are the customers that were seethed in December month.
- For testing the model the data from t — 2 should not be used because that is the training
[Figure 4.1.4 label: past data from the marketing data mart up to period t - 1]
Fig. 4.1.4 : Development and application flow chart for a predictive model
4.1.5 Acquisition
Even if retention is the important aspect of relational marketing strategies, acquisition is
potential customers who may be partially or completely unaware of the products or
services provided by the company, or who did not require these products or services in
the past and now need them. They might also be customers of competitors who are
hunting for better services, or the other case would be that the
Once the company has identified the prospects, it is important to assign acquisition
campaigns with high profitability for both the prospects and the company, using
marketing strategies at various levels along with the marketing resources available to
the company.
- Traditional marketing strategies are those where advertising and campaigns are based on
earlier polls taken from the public in order to enhance the quality of the products and
services provided, which is fed into the data mart to derive classification
to the reach of the maturity stage by most of the products and services and its saturation
Due to this, the negative side effect is that the expansion of a company's customer base
works more like a switch mechanism, that is, acquisition of customers at the cost of those taken from other
Due to this, many companies invest more resources to analyze and characterize
the attributes that cause customers to switch from their company to another.
Another reason could be the attractive offers made by a competitor to grab
the attention of the prospects and thus bring down the market strategies of the company.
There can also be various reasons why a customer would not find the charge worth
paying for the services provided by the company and would thus hunt for an alternative one and
There are various other aspects that would lead to retention of the products and services
provided by the company, and thus the company has to be keen about the same.
Data mining models can also contribute to relational marketing analysis, which aims to
identify the different market segments through which most of the possibility for purchasing
For example, assume a mobile shop with an offer where a customer who buys a
smartphone can pay an extra Rs. 100 to get an annual subscription to Netflix along
with the phone. Not every customer purchasing a smartphone will be interested in the
subscription, and so the mobile provider obtains a classification of customers who are
interested in the offer and those who are not.
If the number of interested customers is higher, the shop owner will have to get more
services from Netflix. The demographic information about the customers can be fed into
the data mart and used as explanatory attributes to develop a classification model,
which will help to design various offers in forthcoming periods and to predict how
customers would react to them.
Cross selling means trying to sell a product or service to the customer who is already
Through a classification model the company can understand which customers
For example, we often get calls from our banks asking us to upgrade our debit card to a
credit one. These calls are made only to customers holding a debit card and not
to those holding a credit card. So this defines a margin for acquisition to call only
This can also be described as up-selling, where the customer is informed about and asked to own
products or services which are one level higher than the existing one and will have
The main objective of market basket analysis is to get an exact view of what products the
customers are purchasing, so that the company gets the required knowledge to organize
industries.
It can also be applied to check the purchases made with the help of credit cards or landline
services or complementary ones, to check whether the policies taken are taken by the
same households.
The data used here can also be referred to as purchase transactions, which can be associated with
It is a well-known fact that the web is the most common and easiest way of communication
Most companies use social media platforms to promote their products to
the people. E-commerce sites are considered to be important sales channels.
Web mining is used to analyze data from the activities carried out on
those sites by visitors. Web mining methods are mostly used for three purposes
[Figure: web mining and its branches, with the labels text mining, HTML mining, image mining, user profile, and usage mining]
It involves analysis of the content of web pages to extract the required
information. Search engines like Google also perform content mining to provide links to
It can also be traced back to data mining problems for the analysis of the texts present on web
This type of mining is used to understand the structure of the web using the links between
different pages. Graphs can be created where nodes correspond to web pages and arcs
connect a node to the nodes of the pages it links to.
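The graph view of web structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page names and the helper `in_degrees` are hypothetical, and counting incoming links is just one simple graph measure, not a method prescribed by the text.

```python
# Hypothetical site: each page (node) maps to the pages it links to (arcs).
links = {
    "home": ["products", "about"],
    "products": ["home", "cart"],
    "about": ["home"],
    "cart": ["home", "products"],
}

def in_degrees(graph):
    """Count, for every page, how many arcs point to it."""
    counts = {page: 0 for page in graph}
    for source, targets in graph.items():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

degrees = in_degrees(links)
for page, count in sorted(degrees.items(), key=lambda item: -item[1]):
    print(page, count)
```

Pages with many incoming arcs are the ones most other pages point to, which is one way a site structure can be characterized.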
Results and algorithms from graph theory are used to characterize the web structure, which
Usage mining is the most relevant type from the standpoint of relational marketing: it explores the paths
followed by navigators and their behaviour during a visit to the company's website.
Methods used for the extraction of association rules are used to obtain
Syllabus Topic : M
Q. 4.2.1 Explain sales force management and various factors associated with it. (5 Marks)
These days almost all companies have a sales department in their organization and
rely on the employees of that department for the sales of the products or services that are
Every employee is given a target, and depending on whether the targets are
achieved, these employees play an important role in the profit gained by the
company.
- There are various marketing strategies that are implemented by the sales department,
- The sales force is a term coined for all the people and roles along with different tasks and
- The basic terms associated with sales forces based on the activities that are carried
o Residential : These sales activities take place at one or more places
managed by the company supplying the products and services, from where the customers can
o Mobile : In this type of sales, the agents of the company go to the customer's house or
office to give information about their product or service and also collect the orders.
In this category the sale occurs within a B2B (business-to-business) relationship it can
agents call up the customers and promote the product and also collect the orders.
o customer management.
o activity management,
o order management.
When a sales network is designed and the agents' activities are planned,
decision-making tasks are required that can take advantage of optimization
models.
The rest can be managed with the help of automation tools, also known as Sales Force
When it comes to designing and managing a sales force, various decision-making
problems arise, as shown in Fig. 4.2.1. If these problems are successfully overcome,
they yield maximum profit, increase the efficiency of the sales action, and ensure
efficient use of resources along with professional rewards for the sales agents.
The decision process is shown in Fig. 4.2.1. It shows how the strategic
objectives of the company should be taken into consideration along with the other
components of marketing, and ensures that the role assigned to the sales force has broader
- The two-way arrows mean that all the components interact with each other
4.2.1.1 Design
- It deals with the start phase of any commercial activity or with a subsequent restructuring
phase.
- For example, during the planning of acquisition plans for the prospects or group of
companies.
1. Organizational structure
2. Sizing
3. Sales territories
- This structure can take different forms, which correspond to a hierarchical clustering of agents
by group of products, geographical areas or brands; in some cases markets are
customers, products or activity, to decide how agents can be specialized and to what
extent.
~> 2. Sizing
Sizing determines the number of agents that should work within the selected
sales structure. It relies on factors such as the number of customers and prospects, the
sales area coverage required, the time allotted to every call, and the travelling time of
every agent.
~» 3. Sales territories
en it comes to designj :
agen S
ales Potential of every area, time required to travel from one area to
| Segmentation [Products-services]
_ Sales activity a
- Decision making tasks that are associated with planning are assignment of sales resources,
- Resources can be calculated as work time of the agent and the budget whereas market
- Allocation can be calculated as the time spent on every customer to promote the product
or service, the time and cost required to travel, and how effective the action was in convincing the
- Further possibilities can also be considered, like explaining the technical and functional
features of the product or service and suggestions coming from the customers.
4.2.1.3 Assessment
- Assessment is important to control the activities to check the effectiveness and efficiency
of the agents in sales network so that proper remuneration and incentives can be designed
so that the agents give their full contribution towards the sales of the products and services,
thus increasing the profit of the company as well as their individual profit and
Following are some classes of optimization models for designing and planning a sales force.
Before starting, here are some of the notions that will be used in the following sections.
Let's assume that a particular region is divided into M geographical areas of sales
divided into disjoint clusters known as territories such that each area belongs to only one
The connection property means that from each area it is possible to reach any other area of
the same territory.
The time span can be divided into T intervals of the same length, which are usually
Each territory has a sales agent associated with it, who belongs to one area of the territory
The time and cost of travelling from one area to another depend on the area of residence of
In the territories there are customers and prospects, H in number, who will be visited by the agent to
promote the product. In some models they are considered to be grouped into
segments and are counted in the same way. So h = {1, 2, ..., H}.
Finally, assume every agent sells K products and services during a call, so let
k = {1, 2, ..., K}.
This plays an important role in formulating the models to design and plan a sales network.
In general it defines the flexibility of sales with respect to sales action and a formal way
The sales to which response functions refer are expressed in product units or monetary
They are formally presented as sales revenues. The intensity of a sales action can be related
to different variables: the number of calls made to the customer in a given period of time, how
od of time.
sum of two terms, which represent the total distance between the areas of the same territory and
- Every region is divided into J areas, which are then combined into I territories whose
number is decided in advance. Every territory has an agent, who is associated
- It is assumed that travel times within each area are standard, keeping in mind travel time
- Every area is identified by the coordinates $(e_j, f_j)$ of one of its points. Choose the point
whose coordinates are obtained as the average of the coordinates of all points belonging
to that area. For every territory, let $(e_i, f_i)$ denote the coordinates of the area where the
This area will be called the centroid of territory i. The parameters in the model are as follows:
$a_j$ is the opportunity for sales in area j, and $\beta$ is a relative weight factor between total
distance and sales imbalance. Consider a set of binary decision variables $Y_{ij}$ defined as
Define I additional continuous variables that express the deviations from the average sales,
$$\min \sum_{i \in I} \sum_{j \in J} d_{ij} Y_{ij} + \beta \sum_{i \in I} S_i$$
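The two-term objective described above (total distance plus a weighted sales imbalance) can be evaluated for a given assignment with a short Python sketch. Several things here are assumptions made for illustration and are not stated in the text: distances are taken as Euclidean distances between an area and the centroid area of its territory, the imbalance is taken as the sum of absolute deviations of territory sales from the average, and the function name and sample data are hypothetical.

```python
import math

def territory_objective(areas, centroids, assignment, sales, beta):
    """Total distance to the centroid plus beta times the sales imbalance.

    areas      : {area: (e, f)} coordinates of each area
    centroids  : {territory: area} centroid area of each territory
    assignment : {area: territory}, the pairs for which Y_ij = 1
    sales      : {area: a_j} sales opportunity of each area
    """
    distance = 0.0
    totals = {territory: 0.0 for territory in centroids}
    for area, territory in assignment.items():
        # Assumed distance: Euclidean, from the area to its territory centroid.
        distance += math.dist(areas[area], areas[centroids[territory]])
        totals[territory] += sales[area]
    average = sum(totals.values()) / len(totals)
    # Assumed imbalance: absolute deviation from the average territory sales.
    imbalance = sum(abs(total - average) for total in totals.values())
    return distance + beta * imbalance

areas = {"A": (0, 0), "B": (3, 4), "C": (10, 0), "D": (10, 2)}
centroids = {"T1": "A", "T2": "C"}
assignment = {"A": "T1", "B": "T1", "C": "T2", "D": "T2"}
sales = {"A": 10, "B": 20, "C": 15, "D": 5}
value = territory_objective(areas, centroids, assignment, sales, beta=0.5)
print(value)
```

An optimization model would search over assignments for the smallest such value; this sketch only scores one candidate assignment.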
Q. 4.3.4 Describe supply chain optimization. (Ref. Sec. ASS) (5 Marks)
ween the supply chain j es an integrated planning and operations been carried out
take a om me ingly to maintain the standard of sub programs which would be related
Most companies involved in manufacturing are implementing this kind of logistic
supply chain approach so that the upstream and downstream of the supply chain, as well as
the problems in the co-operation between the subprograms, can be tracked.
Another advantage of having an integrated logistic supply chain is that it reduces
expenditure, which includes the cost of processing, transportation and distribution.
Inventory and equipment costs are also included and reduced in an integrated supply
chain.
It is equally important to upgrade the logistic supply chain by adding models and automated
tools that help in planning and analyzing capacity in critical situations where
the complexity of the logistic supply chain is high.
This holds especially in dynamic situations where the competition is high, as competitor
companies also put all their efforts into making their supply chains more
effective.
Competitor companies can be companies that produce a wide range of products,
and so these companies require a multi-centric logistic supply chain that
effectively looks into the distribution of the products according to the demands of the
customers.
automation which makes the work simpler and also these chains
financial investment done so as to automate and make the chains more effective.
The effectiveness and features associated with a logistic supply chain are directly
proportional to the profile that the company maintains to communicate with the
customers.
[Figure labels: offshore suppliers, kit suppliers]
Following are some of the optimization models which are associated with the features of
While learning about these models, one should understand that real-world logistic
production systems involve more than one element, so they are
more complex and combine different features of different elements.
Before starting the detailed study of the models, some notations that are usually used by
In logistic systems there are I products, denoted by the index $i \in I = \{1, 2, \ldots, I\}$. Also the planning
The manufacturing company has a set of critical resources that are shared
among the companies during the manufacturing process and are available only in limited
quantity.
These resources may include manpower, tools, assembly lines, specific fixtures and so on.
When even a single critical resource is applicable to the manufacturing process the index
It is the first form, where the main objective of planning is to regulate the amount of
production of every product over the T time periods of a medium-term planning
horizon, so as to satisfy the given demand and the capacity limits of
every resource used in the manufacturing process while keeping the cost
i€T iel
4.4.2 Extra Capacity
The first model deals with resorting to extra capacity with respect to overtime, part time
The decision variables of the first model are also considered here, with the addition of a few more
i€T ie!
ie!
iel, te T.
Ps T,O,20,
If critical resources are to be included in the manufacturing process, the formula will
have a few more parameters, and the required decision variables have already been
included.
i€T ie!
$$\sum_{i \in I} e_{ir} P_{it} \le b_{rt}, \quad r \in R,\ t \in T,$$
$$P_{it}, I_{it} \ge 0, \quad i \in I,\ t \in T,$$
4.4.4 Backlogging
refers to the possibility that a portion of demand is to be given in a certain period of time and it
was left after the penalty cost that is involved and the work that
as lost sales, which cannot be fulfilled, and so there is a subsequent loss.
It is important to add new decision variables, such as $B_{it}$, the units of demand for
is the unit cost of delaying the demand for product i over a period of time.
presented in minimum conditions, which would be for technical and economic reasons.
only; sometimes the conditions are that the production values should either be equal to 0 for one
or more products or not be less than the threshold value given by the minimum lot.
- To include these conditions in the model, the binary decision variables listed below need to be
included.
$$Y_{it} = \begin{cases} 1 & \text{if } P_{it} > 0, \\ 0 & \text{otherwise,} \end{cases}$$
x x _P., +h, I,
2 e; Py Sb, , te T,
iel
$$P_{it} \ge l_i Y_{it}, \quad i \in I,\ t \in T,$$
$$P_{it} \le \gamma Y_{it}, \quad i \in I,\ t \in T,$$
- One more feature that can be added to the planning model is the bill of materials, which is
- Here the end product being made has various components that are used
- $A_{ij}$ is the units of product i directly required by one unit of product j, where the term
product refers to both the end product and the associated components required, which define different
zz
Pee, | ie tet. .
- It is the responsibility of the logistic system to supply N peripheral depots to every
network,
The main aim of the company is to have an optimal logistic plan which satis
on plants.
transported from m to n. :
Xan = 8
s.to neN mn m? me M,
Xma = 0, me M, ne N.
Syllabus Topic : Logistic and Production Models : Revenue Management Systems
Revenue management is a managerial policy whose main objective is to maximize the
profits of the company by maintaining the balance between demand and supply.
It is usually created for marketing and logistic criteria and has also gained interest in
expected to grow as the basic idea was related to the revenue and every company thinks
But revenue management needs to be planned according to the strategies, decision-making
patterns and models of the company, and so it becomes complex when data is
fed to it.
Revenue management involves mathematical
models that are used to determine the actions of the customers at every level, so that the
availability of the product and its price can be optimized to obtain maximum profit
- The aim of revenue management is not only maximizing profit but also managing various
offers on products and services to increase demand, which will have different ideas of
- It focuses on fulfilling the requirements with minimum expenditure on the cost of
transport.
Since it is a managerial policy, most companies have taken it up and are
working on it. It has been noticed that this policy has become a favourite and is growing
successfully, and the fields that are actively implementing this policy are automotive rental
The common features among these fields are that they have low marginal sales costs and the
possibility of imposing dynamic policies for the public and also of using various sales
channels.
known as decision making units, or DMUs, as their decisions are self
governed.
To calculate the efficiency of n units, let N = {1, 2, ..., n} be the set of units being compared.
If these units produce one single output from one single input only, the effect of
Here $y_j$ is the output value generated by $DMU_j$ and $x_j$ is the input used.
If the output is generated using different input factors, the efficiency of $DMU_j$ will be
Let H = {1, 2, ..., s} be the set of production factors and K = {1, 2, ..., m} the set of
outputs. Let $x_{ij}$, $i \in H$, be the quantity of input i used by $DMU_j$, and
$y_{rj}$, $r \in K$, the quantity of output r obtained. The efficiency of $DMU_j$
is given as:
Weights $u_1, u_2, \ldots, u_m$ are associated with the outputs and weights $v_1, v_2, \ldots, v_s$ are
assigned to the inputs.
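The efficiency described above, a weighted sum of outputs divided by a weighted sum of inputs, can be sketched in Python as follows. The function name `efficiency` and all the figures are hypothetical and chosen only for illustration; with a single input, a single output and unit weights the ratio reduces to output divided by input.

```python
def efficiency(outputs, inputs, u, v):
    """Ratio of weighted outputs to weighted inputs for one unit (DMU)."""
    weighted_output = sum(u_r * y_r for u_r, y_r in zip(u, outputs))
    weighted_input = sum(v_i * x_i for v_i, x_i in zip(v, inputs))
    return weighted_output / weighted_input

# Single input and single output: plain output / input.
single = efficiency(outputs=[50.0], inputs=[100.0], u=[1.0], v=[1.0])

# Two outputs and two inputs with illustrative weights.
multi = efficiency(outputs=[40.0, 10.0], inputs=[20.0, 5.0],
                   u=[1.0, 2.0], v=[2.0, 4.0])
print(single, multi)
```

The score depends on the weights chosen, which is why data envelopment analysis lets each unit pick the weights most favourable to it.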
- Whereas when it comes to the second case, the efficiency value may vary across
different units.
- So to avoid the different problems that can be raised by units to represent a unit of weights
Data envelopment analysis calculates the efficiency of every unit on the basis of this weight
mechanism, which is favourable to the DMU in that its efficiency is maximized.
- Additional analysis then determines whether the units are really efficient or
not.
Syllabus Topic : Data Envelopment Analysis : Efficient Frontier
It is also known as the production function, which shows the relation between the inputs
used and the outputs produced using those inputs. It also shows the
It also shows the minimum quantity of inputs that would be required to obtain the
methods. The efficient frontier can easily be obtained from a set of observations that
show the output level for a given combination of input levels of the production factors.
In data envelopment analysis, the observations
correspond to the units being evaluated. Statistical methods that use instances to
calculate a regression curve impose predefined hypotheses on the shape of the production function.
The only condition is that the units which are compared
When a data envelopment analysis model is used, the optimal
decision variables are given by the weights $u_r$, $r \in K$, and $v_i$, $i \in H$, that are associated with
There are various formulations for obtaining the efficiency score; the best known is
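$$\max \ \vartheta = \frac{\sum_{r \in K} u_r y_{rj}}{\sum_{i \in H} v_i x_{ij}}$$
$$\text{s.to} \quad \frac{\sum_{r \in K} u_r y_{rl}}{\sum_{i \in H} v_i x_{il}} \le 1, \quad l \in N,$$
$$u_r, v_i \ge 0, \quad r \in K,\ i \in H.$$

$$\max \ \vartheta = \sum_{r \in K} u_r y_{rj}$$
$$\text{s.to} \quad \sum_{i \in H} v_i x_{ij} = 1,$$
$$\sum_{r \in K} u_r y_{rl} - \sum_{i \in H} v_i x_{il} \le 0, \quad l \in N,$$
$$u_r, v_i \ge 0, \quad r \in K,\ i \in H.$$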
Let $\vartheta^*$ be the optimum value of the objective function corresponding to the optimal
solution $(v^*, u^*)$ of . $DMU_j$ is said to be efficient if $\vartheta^* = 1$ and if there exists at least one
By solving a similar optimization model for each of the n units being compared, one
- The flexibility enjoyed by the units in choosing the weights represents an undisputed
advantage, in that if a unit turns out to be inefficient based on the most favourable system
- However, given a unit that scores $\vartheta^* = 1$, it is important to determine whether its efficiency
- The CCR model is associated with an input-oriented dual problem, which has the
following interpretation:
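$$\min \ \vartheta$$
$$\text{s.to} \quad \sum_{l \in N} \lambda_l x_{il} \le \vartheta x_{ij}, \quad i \in H,$$
$$\sum_{l \in N} \lambda_l y_{rl} \ge y_{rj}, \quad r \in K,$$
$$\lambda_l \ge 0, \quad l \in N.$$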
- In real-world applications it is always desirable to set improvement aims
for inefficient units, for both the inputs utilized and the outputs generated.
- Data envelopment analysis gives important suggestions in this case, as it can identify
the levels of input and output at which the inefficient units would become efficient.
- The efficiency score of a unit shows the highest proportion of the inputs utilized and
- The inverse of the efficiency score shows the factor by which the current level of output
must be multiplied to make the unit efficient while holding the level of utilized
inputs constant.
Based on the efficiency values, data envelopment analysis gives an account for every unit that
can be compared with the savings achieved in inputs or the increase in output to
To analyse target values, an input-output strategy can be followed, where the first case is the
improvement aims that are to be considered for the resources to be used and target values of
$$x_{ij}^{\text{target}} = \vartheta^* x_{ij} - s_i^{*}, \quad i \in H,$$
$$y_{rj}^{\text{target}} = y_{rj} + s_r^{*}, \quad r \in K,$$
Whereas in second case, target values for inputs and outputs are given by,
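$$x_{ij}^{\text{target}} = x_{ij} - \frac{s_i^{*}}{\vartheta^*}, \quad i \in H,$$
$$y_{rj}^{\text{target}} = \frac{y_{rj} + s_r^{*}}{\vartheta^*}, \quad r \in K.$$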
- For every unit that is not efficient, data envelopment analysis identifies a set of best
units, called the peer group, made up of the efficient units that contribute to
- This group is made up of multiple DMUs characterized by operating
methods similar to those of the inefficient unit being checked. It is the real environment
from which the unit should draw in order to improve its operating practices and
its performance.
- The units that are present in the peer group of the given unit $DMU_j$ can be identified as
follows for the first and second conditions:
_)., > * _ I .
Q.4.9.1 Explain basic factors associated with identification of good operating practices.
- Having good operating practices is important, as it helps to improve the performance
The units that are efficient in terms of data envelopment analysis serve as terms of
comparison and as examples for the other units.
- Among the most efficient units there may also be some that can help to improve
the existing efficiency. It is important to search for the most efficient units so that the efficiency of
existing operating practices is improved.
So to identify good operating practices, the units that are actually efficient need to be
- To distinguish between these units we can use different methods, such as cross-efficiency
analysis and evaluation
- Cross-efficiency analysis is done with the help of an efficiency matrix, which gives information
about the nature of the weight systems adopted by the units for their efficiency
calculation.
- The square efficiency matrix has as many rows and columns as there are units being
compared. The element $\vartheta_{ij}$ of the matrix denotes the efficiency of $DMU_j$ calculated with the optimal
weights structure for $DMU_i$, and $\vartheta_{jj}$ is the efficiency of $DMU_j$ evaluated using its own optimal
weights.
dimension along with units, the efficiency values in the column related to $DMU_j$ should be
less than 1.
- Two quantities of interest can be derived from the efficiency matrix. The first is the
average efficiency obtained from the j-th column, whereas the second is the average efficiency
The latter is obtained by averaging the values in the row associated with the unit
being examined.
The difference between the efficiency $\vartheta_{jj}$ of $DMU_j$ and the average value of the j-th column
shows how much the unit relies on the system of weights used by units
If the difference between the two terms is significant, $DMU_j$ has chosen a weights structure
that is not shared by the other DMUs in order to privilege the dimensions of analysis in which it
functions efficiently.
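The comparison between a unit's own efficiency and the average of its column can be sketched in Python. The matrix values below are invented for illustration, the helper name `column_average` is hypothetical, and averaging over all rows (including the unit's own) is an assumption, since the text does not say whether the diagonal entry is excluded.

```python
def column_average(matrix, j):
    """Average of column j: efficiency of unit j under every unit's weights."""
    return sum(row[j] for row in matrix) / len(matrix)

# theta[i][j]: efficiency of unit j evaluated with the optimal weights of unit i.
theta = [
    [1.0, 0.8, 0.6],
    [0.9, 1.0, 0.5],
    [0.5, 0.6, 1.0],
]

# Gap between each unit's own score (diagonal) and its column average.
gaps = [theta[j][j] - column_average(theta, j) for j in range(len(theta))]
print(gaps)
```

A large gap suggests that the unit scores well mainly under its own weights; here the third unit has the largest gap.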
- Virtual inputs and virtual outputs give information about the importance that every unit
attributes to each input and output in order to maximize its efficiency score.
- This allows the specific strengths of every unit to be identified, and
its weaknesses are presented at the same time. The virtual inputs of a DMU are
the products of the inputs used by the unit and their associated weights.
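As a small illustration of the definition above, virtual inputs can be computed by multiplying each input quantity by its weight. The quantities, weights and the name `virtual_values` are hypothetical; the weights are simply chosen so that the virtual inputs sum to 1, which makes each value readable as a share.

```python
def virtual_values(quantities, weights):
    """Multiply each input (or output) quantity by its associated weight."""
    return [q * w for q, w in zip(quantities, weights)]

inputs = [20.0, 5.0]     # quantities of two inputs used by one unit
weights = [0.03, 0.08]   # illustrative weights for those inputs
virtual_inputs = virtual_values(inputs, weights)
total = sum(virtual_inputs)
shares = [value / total for value in virtual_inputs]
print(virtual_inputs, shares)
```

The larger share shows which input the unit leans on most when maximizing its score. The same helper applies to virtual outputs.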
of input-output pairs for which the unit shows the highest score
different operating
combinations of inputs and
practices. So here each unit has two different ways in which it can function to gain
maximum output.
When the units that are really efficient are to be separated from efficiency score majorly
Conditions are imposed on the values of the weights related to the inputs and
outputs. These conditions are then converted into the definition of a maximum threshold for
the weight of a specific output or a minimum threshold for the weight of a specific
input.
Even when different conditions are imposed on the weights, they still retain some flexibility in
For this reason it is helpful to also evaluate the virtual inputs and outputs in order to
identify the units with more efficient operating practices related to the usage of a specific input
Q.1 Explain relational marketing and various factors associated with it.
Q.3 Explain sales force management and various factors associated with it.
(5 Marks)
Q.7 List and explain efficiency measures associated with Data Envelopment analysis.
Practices :
Q.10 Explain basic factors associated with Identification of good operating practices.
Chapter Ends...
Unit II
CHAPTER
- Classification is a supervised learning method. It is used to predict the target
attribute.
- Classification applications include image and pattern recognition, medical diagnosis, loan
approval, fault detection and industry applications. Estimation and prediction are viewed
as types of classification.
- The explanatory attributes are termed predictive variables. The target attribute is named
Predicted or explanatory variables. It describes the examples belonging to the same class.
These relationships are interpreted into classification rules. They are used to predict the class
Components of a classification problem
1. Generator
2. Supervisor
3. Algorithm
+> 1. Generator
The role of the generator is to extract random vectors m of examples according to an
=> 2. Supervisor
The supervisor returns, for each vector m of examples, the value of the target class
+> 3. Algorithm
Q. 3.1.2 What are the three phases of classification model ? (Ref. Sec. 3.1.1) (5 Marks)
1. Training phase
2. Test phase
3. Prediction phase
The classi i . . i
allow the c i .
In the test phase, the rules generated during the training phase are used to classify
observations that are not included in the training set, for which the target class value is already known. The
=> 3. Prediction phase
A prediction is achieved by applying the rules generated during the training phase to the
Components of classification model
2. Separation models
The classification models which belong to the separation model category differ from each
other with respect to the type of separation regions, loss function etc.
methods, neural networks and support vector machines. Some variants of classification
→ 3. Regression models
supervisor.
→ 4. Probabilistic models
supervisor.
Evaluation of classification model
1. Accuracy
2. Speed
3. Scalability
4. Interpretability
→ 1. Accuracy
The accuracy of a model is its ability to forecast the target class for future observations. Based on
accuracy values, it is possible to compare different models in order to select the best classifier.
→ 2. Speed
samplings.
accuracy, do not vary significantly with the choice of the training set. It is expected to
→ 3. Scalability
→ 4. Interpretability
generated should be simple knowledge workers and experts in the application domain
training. Usually one third is used for testing and the rest for training.
The holdout method offers an evaluation of the true error rate (accuracy) of a classifier.
We have a (small) data sample of the whole data (population). Sampling is used to divide
The holdout estimate can be made more reliable by repeating the process with different
subsamples. In each iteration, a certain proportion is randomly selected for training (possibly
with stratification).
The error rates (or some other performance measure) on the different iterations are
The disadvantage of repeated holdout method is that it is still not optimum. The different
The m observations are divided into two disjoint sets T and V. T is used for training and V for
testing. Repeated random sampling replicates the holdout method r times.
For each repetition k, a sample $T_k$ is extracted and the corresponding accuracy is calculated on
$V_k$. The overall accuracy is the mean over the r repetitions:
$$\mathrm{acc}_A = \frac{1}{r} \sum_{k=1}^{r} \mathrm{acc}_{A_{T_k}}(V_k)$$
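Repeated random sampling can be sketched in a few lines of Python. This is only an illustration: the classifier used here (`majority_classifier`, which always predicts the most frequent training label) and the function names are hypothetical stand-ins, and the one-third test fraction follows the rule of thumb given for the holdout method.

```python
import random

def majority_classifier(train_labels):
    """Hypothetical stand-in model: always predicts the most frequent training label."""
    most_common = max(sorted(set(train_labels)), key=train_labels.count)
    return lambda _x: most_common

def repeated_holdout(xs, ys, r=10, test_fraction=1 / 3, seed=0):
    """Replicate the holdout method r times and average the accuracies."""
    rng = random.Random(seed)
    m = len(xs)
    n_test = max(1, round(m * test_fraction))
    accuracies = []
    for _ in range(r):
        idx = list(range(m))
        rng.shuffle(idx)                      # random split into V_k and T_k
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = majority_classifier([ys[i] for i in train_idx])
        correct = sum(model(xs[i]) == ys[i] for i in test_idx)
        accuracies.append(correct / n_test)   # accuracy on V_k
    return sum(accuracies) / r, accuracies

xs = list(range(30))
ys = [1] * 20 + [0] * 10
mean_acc, per_run = repeated_holdout(xs, ys, r=5)
```

The returned mean corresponds to the averaged accuracy over the r repetitions; any real classifier could be substituted for the stand-in model.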
3.2.3. Cross-Validation
Cross-validation partitions the dataset D into r disjoint subsets $L_1, L_2, \ldots, L_r$ and
requires r iterations. At the k-th iteration, $L_k$ is selected as the test set and the union of all other
subsets as the training set:
$$V_k = L_k, \qquad T_k = \bigcup_{l \ne k} L_l$$
— Standard method for evaluation is ten fold cross validation. Extensive experiments have
- Repeated stratified cross-validation is even better. Ten-fold cross-validation is repeated 10
times and the results are averaged (this reduces the variance). Leave-one-out is a particular form
of cross-validation. In this case each of the m test sets includes only one observation, and each
example is used in turn to measure accuracy.
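The way cross-validation divides the data can be sketched as follows. The function name `k_fold_splits` is hypothetical, and the folds are formed by simple striding rather than by stratified sampling, which is an assumption made to keep the example short.

```python
def k_fold_splits(m, r):
    """Yield (train_indices, test_indices) for r disjoint folds of range(m)."""
    folds = [list(range(k, m, r)) for k in range(r)]
    for k in range(r):
        test = folds[k]                                   # fold k is the test set
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# Five-fold split of 10 observations
splits = list(k_fold_splits(10, 5))

# Leave-one-out is the special case r = m: every test set holds one observation
loo = list(k_fold_splits(7, 7))
```

Each observation appears in exactly one test set, so every example contributes once to the accuracy estimate.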
A binary classifier produces output with two class values or labels, such as Yes/No and
1/0, for given input data. The class of interest is usually denoted as “positive” and the
other as “negative”.
- A test dataset is used for performance evaluation. It should hold the correct labels
(observed labels) for all data instances. These labels are used to compare with the
- The predicted labels will be exactly the same if the performance of a binary classifier is
- A binary classifier predicts all data instances of a test dataset as either positive or
negative. This classification (or prediction) produces four outcomes - true positive, true
negative, false positive and false negative.
- Error rate (ERR) and accuracy (ACC) are the most common and intuitive measures
@ Error rate
Error rate is calculated as the total number of two incorrect predictions (FAN + FAP)
@ Accuracy
Accuracy is calculated as the number of all correct predictions divided by the total
number of instances in the dataset. The best accuracy is 1.0 whereas the worst is 0.0. It can also be calculated as
1 - ERR.
True positive rate or sensitivity is calculated as the number of correct positive predictions
It is the number of correct negative predictions divided by the total number of negatives.
$$SP = \frac{TAN}{TAN + FAP}$$
@ Precision
It is calculated as the total number of correct positive predictions divided by the total
number of positive predictions. The best precision is 1.0 whereas the worst is 0.0.
$$PREC = \frac{TAP}{TAP + FAP}$$
It is calculated as the number of incorrect positive predictions divided by the total number
of negatives.
$$FPR = \frac{FAP}{TAN + FAP} = 1 - \text{Specificity}$$
@ F-score
β is commonly 0.5, 1 or 2.
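The measures above can be computed together from the four outcome counts. The sketch below uses this chapter's abbreviations (TAP, TAN, FAP, FAN); the counts in the example are hypothetical, and the F-score is written in its usual form $F_\beta = (1 + \beta^2) \cdot PREC \cdot TPR / (\beta^2 \cdot PREC + TPR)$, which is assumed here because the formula is not legible in the text.

```python
def binary_metrics(tap, tan, fap, fan, beta=1.0):
    """Evaluation measures of a binary classifier from its four outcome counts."""
    total = tap + tan + fap + fan
    tpr = tap / (tap + fan)            # sensitivity (true positive rate)
    prec = tap / (tap + fap)           # precision
    return {
        "ERR": (fap + fan) / total,    # error rate
        "ACC": (tap + tan) / total,    # accuracy = 1 - ERR
        "TPR": tpr,
        "SP": tan / (tan + fap),       # specificity
        "PREC": prec,
        "FPR": fap / (tan + fap),      # false positive rate = 1 - SP
        "F": (1 + beta**2) * prec * tpr / (beta**2 * prec + tpr),
    }

# Hypothetical test set: 50 positives and 50 negatives
m = binary_metrics(tap=40, tan=45, fap=5, fan=10)
```

With these counts the accuracy is 0.85 and the error rate 0.15, and the false positive rate equals 1 minus the specificity, as stated above.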
Q. 3.2.6 Explain the ROC curve chart. (Ref. Sec. 3.2.5) (5 Marks)
Receiver Operating Characteristic (ROC) curve charts allow the user to visually evaluate the accuracy of a
classifier.
They are used to compare different classification models. They visually express the
They allow the ideal trade-off between the number of correctly classified positive
costs.
prediction
se dataset into four outcomes - TAP, TAN, FAP, FAN. The ROC plot
Fig. 3.2.2
ROC curves close to the top-left corner (0.0, 1.0) show good performance levels. ROC
curves close to the bottom-right corner (1.0, 0.0) indicate poor performance levels.
Fig. 3.2.3 : ROC curve (y axis: Sensitivity; x axis: 1 - Specificity; regions marked Good, Random and Poor)
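The points of an ROC curve can be obtained by sorting the test instances by classifier score and lowering the decision threshold step by step. The sketch below is illustrative only: the function names and the scores are hypothetical, and the area under the curve is added as a common one-number summary that this section does not itself define.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) points for decreasing thresholds."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # emit a point only once all instances sharing this score are counted
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

good = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])   # passes through (0, 1)
poor = roc_points([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1])   # passes through (1, 0)
```

The first curve hugs the top-left corner and the second the bottom-right corner, matching the good and poor regions described above.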
It is a visual aid for measuring the performance of a classification model. Both charts consist of
For example, an educational institute wants to run a mail marketing drive for a new course. It
costs the institute Rs. 1 for each item mailed. They have information on 1,00,000 students. Out
- The y axis shows the percentage of positive responses and the x axis shows the percentage of
students contacted.
- Baseline : overall response rate. It means that if the institute contacts n students then n
response for the percentage of the students contacted. e.g. [6000/20000]* 100 = 30 %.
Fig. 3.2.4 : Lift curve and baseline (x axis: % Customers Contacted)
@ Lift chart
For contacting 10% of students using no model we should get 10% of the responders, and
using the model 30% of the responders, so the y value of the lift curve is 30/10 = 3. Similarly for
- The cumulative and lift charts give an idea of which customers to contact.
Fig. 3.2.5 : Lift chart (lift curve and baseline; x axis: % Customers Contacted)
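The cumulative and lift values can be reproduced with a short calculation. In the sketch below only the first decile (6000 responders out of 20000 in total) comes from the example above; the remaining decile counts and the function name are hypothetical, chosen so that the totals agree.

```python
def gains_and_lift(responders_per_decile):
    """Return (% contacted, cumulative % of responders, lift) for each decile."""
    total = sum(responders_per_decile)
    n = len(responders_per_decile)
    cumulative = 0
    rows = []
    for k, count in enumerate(responders_per_decile, start=1):
        cumulative += count
        pct_contacted = 100 * k / n                 # baseline captures this share
        pct_responders = 100 * cumulative / total   # model captures this share
        rows.append((pct_contacted, pct_responders, pct_responders / pct_contacted))
    return rows

# Students sorted by model score and split into ten equal groups (hypothetical counts)
deciles = [6000, 4000, 3000, 2000, 1500, 1200, 1000, 600, 400, 300]
rows = gains_and_lift(deciles)
```

The first row gives 10% contacted, 30% of responders reached and a lift of 3; the lift falls towards 1 as more students are contacted, which is why the top-scoring groups are contacted first.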
- It assumes that there is independence among predictors. In simple terms, a Naive Bayes’
- For example, a fruit may be considered to be an apple if it is red, round, and about
3 inches in diameter. Even if these features depend on each other or upon the existence of
the other features, all of these properties independently contribute to the probability that
Let us implement the Bayes’ Theorem using a simple example. Suppose we want to find
the odds of an individual having high blood pressure, given that he or she was tested for it
In the medical field, such probabilities play a very important role as it usually deals with
- P(Pos|Bp) is the probability of getting a positive result on a test done for detecting high blood
pressure, given that you have high blood pressure. This has a value of 0.9. In other words the test
is correct 90% of the time. This is also called the Sensitivity or True Positive Rate.
- P(Neg|~Bp) is the probability of getting a negative result on a test done for detecting
high blood pressure, given that you do not have high blood pressure. This also has a value of 0.9 and is therefore
correct 90% of the time. This is also called the Specificity or True Negative Rate.
- The Bayes formula is as follows :
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
- P(A) is the prior probability of A occurring independently. In our example this is P(Bp).
- P(B) is the prior probability of B occurring independently. In our example this is P(Pos).
- P(A|B) is the posterior probability that A occurs given B. In our example this is
P(Bp|Pos).
- That is, the probability of an individual having high blood pressure, given that the individual
got a positive test result. This is the value that we are looking to calculate.
- Putting our values into the formula for Bayes theorem we get:
$$P(Bp \mid Pos) = \frac{P(Pos \mid Bp)\, P(Bp)}{P(Pos)}$$
- The probability of getting a positive test result P(Pos) can be calculated using the
Q. Explain naive Bayes classifier with example.
— The naive Bayes algorithm reduces the complexity of Bayes’ theorem by assuming
- Given n different attribute values, the likelihood can now be written as
$$P(X_1, \ldots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$$
- The naive Bayes algorithm assumes that the presence of a particular feature in a class is
- For example, a fruit may be considered to be an apple if it is red, round, and about 3
inches in diameter. In this case all properties or features independently contribute to
the probability that this fruit is an apple and that is why it is known as ‘Naive’.
- So in the above example, we are considering only one feature, that is the test result. If we
- Let’s say this feature has a binary value of 0 and 1, where the former signifies that the
individual exercises less than or equal to 2 days a week and the latter signifies that the
- If we had to use both of these features, namely the test result and the value of the
‘exercise’ feature, to compute our final probabilities, Bayes’ theorem would fail. Naive
Bayes’ is an extension of Bayes’ theorem that assumes that all the features are independent
of each other.
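The blood pressure example can be worked through numerically, first with the test result alone and then with the 'exercise' feature added under the naive independence assumption. In this sketch the sensitivity and specificity of 0.9 come from the example above; the prior P(Bp) = 0.1 and the exercise likelihoods are hypothetical values chosen only for illustration, as is the function name.

```python
def posterior(prior, likelihoods_given_class, likelihoods_given_not_class):
    """Posterior of the class, multiplying per-feature likelihoods (naive assumption)."""
    numerator = prior
    for p in likelihoods_given_class:
        numerator *= p
    alternative = 1 - prior
    for p in likelihoods_given_not_class:
        alternative *= p
    return numerator / (numerator + alternative)   # denominator is P(evidence)

prior_bp = 0.1              # hypothetical prior P(Bp)
p_pos_given_bp = 0.9        # sensitivity, from the example
p_pos_given_not_bp = 0.1    # 1 - specificity, from the example

# One feature: positive test result
one_feature = posterior(prior_bp, [p_pos_given_bp], [p_pos_given_not_bp])

# Two features: positive test and exercise = 0 (at most 2 days a week).
# P(exercise = 0 | Bp) = 0.8 and P(exercise = 0 | not Bp) = 0.4 are hypothetical.
two_features = posterior(prior_bp, [p_pos_given_bp, 0.8], [p_pos_given_not_bp, 0.4])
```

With these numbers the posterior is 0.5 from the test alone and rises to about 0.67 once the exercise feature is included, because each feature contributes its own factor to the product.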
@ Advantages
It is easy and fast to predict class of test data set. It performs well in multi class
prediction.
compare to other models like logistic regression and you need less training data.
For numerical variable, normal distribution is assumed (bell curve, which is a strong
assumption). ,
@ Disadvantages
- If a categorical variable in the test data set has a category which was not observed in the training
data set, then the model will assign it a 0 (zero) probability and will be unable to make a
prediction. This is often known as “Zero Frequency”. To solve this, one of the simplest
— The limitation of Naive Bayes is the assumption of independent predictors. In real life
situation, it is not possible to get a set of predictors which are completely independent.
- Naive Bayes is used for making predictions in real time. It is very fast.
- It is used for multi-class prediction. It predicts the probability of multiple classes of the
target variable.
- Naive Bayes classifiers are mostly used in text classification (due to better results in
multi-class problems and the independence rule) and have a higher success rate compared to other
algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and
sentiment analysis (in social media analysis, to identify positive and negative customer
sentiments).
System. It uses machine learning and data mining techniques to predict whether a user
= 0.222*0.444*0.667*0.667 = 0.044
It can be used for a wide range of tasks including time series prediction, decision under
A Bayesian network consists of two main components. The first is an acyclic oriented
graph where the nodes correspond to the predictive variables and the arcs indicate
The variable $X_j$ associated with node $a_j$ in the network is dependent on the predecessor
nodes of $a_j$.
The second component consists of a table associated with each variable $X_j$ that indicates the
conditional distribution $P(X_j \mid C_j)$, where $C_j$ represents the set of explanatory variables
associated with the predecessor nodes of node $a_j$ in the network and is estimated based on
3.4 Logistic Regression
Q. 3.4.1 Write short note on logistic regression. (Ref. Sec. 3.4)
category.
Logistic regression is generally used where the dependent variable is Binary. That means
the dependent variable can take only two possible values such as “Yes or No”, “Default or
Example 1
- If a credit card company is going to build a model to decide whether to issue a credit card
to a customer or not, it will model whether the customer is going to “Default” or “Not
- The probability of any event lies between 0 and 1 (or 0% to 100%). When we plot the
curve.
Example 2
college of his or her choice by the score the candidate receives in the admission test. The
Since the relationship between the score and the probability of selection is not linear and
shows an ‘S’ shape, we can’t use a linear model to predict the probability of selection by a
correlation between the predictor and dependent variable linear. Use a logistic regression
model to predict the probability of getting the “Admission”.
[Figure: probability of selection (0% to 100%) plotted against admission test score]
The above graph is called the sigmoid function and it gives an S-shaped curve. It gives a value
p with 0 < p < 1.
Transformed = 1 / (1 + e^(-x))
where e is the numerical constant Euler’s number and x is an input we plug into the
where the left-hand side is called the logit or log odds function. The odds signifies the
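The sigmoid transformation and the logit (log odds) can be checked with a small sketch. The function names are illustrative, and the logit is written in its standard form log(p / (1 - p)), which is assumed here because the equation is not legible in the text.

```python
import math

def sigmoid(x):
    """Map any real input x to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log odds of a probability p; the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

p_mid = sigmoid(0.0)             # 0.5, the midpoint of the S-shaped curve
round_trip = logit(sigmoid(2.0)) # recovers the original input, 2.0
```

Large negative inputs give probabilities near 0 and large positive inputs give probabilities near 1, which produces the S shape.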
3.5 Neural Networks
Each unit takes an input, it applies a nonlinear function to it and then passes the output on
to the next layer. Generally the networks are defined to be feed-forward: a unit feeds its
output to all the units on the next layer, but there is no feedback to the previous layer.
Weightings are applied to the signals which pass from one unit to another, and it is these
weightings which are tuned in the training phase to adapt a neural network to the
Perceptrons were popularised by Frank Rosenblatt in the 1960s. They appeared to have very
- A perceptron is a neural network unit (an artificial neuron) which does certain
It consists of a single neuron with adjustable synaptic weights and bias. It can be used to
a linearly separated pattern. A simple perceptron can be used to classify into two
classes.
The perceptron is a supervised learning algorithm for binary classifiers. This algorithm
enables neurons to learn and processes elements in the training set one at a time.
Fig. 3.5.1 : Perceptron (inputs, weights, summation and activation functions)
- Multilayer Perceptrons or feed forward neural networks with two or more layers have the
@ Perceptron Function
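- Perceptron is a function that maps its input “x”, which is multiplied by the weights “w”, to an output value:
$$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}$$
Where,
$$w \cdot x = \sum_{i=1}^{m} w_i x_i$$
and m is the number of inputs.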
- The output can be represented as “1” or “0.” It can also be represented as “1” or “—1”
*. Inputs of a Perceptron
- A Perceptron accepts inputs, moderates them with certain weight values, then applies the
A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc.
It has only two values: Yes and No or True and False. The summation function “Σ”
multiplies all inputs “x” by weights “w” and then adds them up as follows :
For example: If Σ w_i x_i > 0, then the final output “o” = 1 (issue bank loan).
In the Perceptron Learning Rule, the predicted output is compared with the known output.
If it does not match, the error is propagated backward to allow weight adjustment to
happen.
classifier.
Weights are multiplied with the input features and decision is made if the neuron is
fired or not.
o Activation function applies a step rule to check if the output of the weighting
— If the sum of the input signals exceeds a certain threshold, it outputs a signal; otherwise,
there is no output.
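The perceptron function and its learning rule can be sketched as below. This is a minimal illustration, not a definitive implementation: the update `weight += learning_rate * error * input` is the usual form of the perceptron learning rule, and the function names, the learning rate of 1.0 and the logical AND training data are assumptions chosen for the example.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum plus bias exceeds 0, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    """Process training elements one at a time, adjusting weights on each error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)   # 0 when the output matches
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Logical AND is linearly separable, so a single perceptron can learn it
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, targets)
outputs = [predict(weights, bias, x) for x in samples]
```

After training, the outputs match the targets for all four inputs; a pattern that is not linearly separable could not be learned by a single perceptron.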
Multilayer Perceptron (MLP) includes at least one hidden layer (except for one input layer
A multi-level feed-forward neural network is a more complex structure than the perceptron,
since it includes input nodes, hidden nodes and output nodes. Consider a neural network with
two input nodes i_1 and i_2, two hidden neurons h_1 and h_2, and two output neurons o_1 and o_2.
Fig. 3.5.2
The goal of back propagation is to optimize the weights so that the neural network can
Input nodes : Input nodes receive as input the values of the explanatory attributes for each
observation. Usually, the number of input nodes equals the number of explanatory
variables.
Hidden nodes : Hidden nodes receive the information from input nodes and transform
the input values inside the network. Each node is connected with outgoing arcs to output
Output nodes : Output nodes receive connections from hidden nodes or from input nodes
and return an output value that corresponds to the prediction of the response variable.
Each node of the network has given weights which are associated with the input
arcs. Each node is associated with a distortion or bias coefficient and an activation
function.
3.6
Q. 3.6.1 Write short note on support vector machine. (Ref. Sec. 3.6)
The simplest way to describe SVM is as a binary classifier. It attempts to find a hyperplane
that can separate two classes of data by the largest margin. Quazi Marufur Rahman gives a
trick is the most important part of SVM; it distinguishes SVM from other classifiers.
principle for model selection used for learning from finite training data sets.
Fig. 3.6.1
- The above data can be divided into two classes, class 1 and class 2. The above data is
linearly separable.
Fig. 3.6.2 : Linearly separable classes (Class +1 and the region f(x) < 0)
- A straight line will classify the data into two classes. The equation is f(x) = f(x_1, x_2) = 0.
Fig. 3.6.3 : Unseen pattern (test data)
- Suppose we have an unseen data set. If the value f(x) of an unseen pattern is less than 0, then it is classified
as class -1.
- Now we are in a position to define two quantities: training error and test error.
- Since the classifier learns the distribution of the data during the training phase, a low value of
training error is required. Test error should also be low, because it reflects the unseen
data pattern.
We always have to look at test error along with training error.
Fig. 3.6.4 : Training error versus complexity (VC dimension)
with VC dimension.
In n-dimensional feature space a set of m points (m > n) is in general position iff no subset
n=2 Where
Fig. 3.6.5
Fig. 3.6.6
@ Shattering
So VC dimension is cardinality of the largest set of points that the hypothesis can shatter.
— The following is an example of hyper plane that separates training instances with no
errors.
Fig. 3.6.8
- There are multiple hyperplanes which can be chosen for separating the two
data points.
- For the maximum margin hyperplane only examples on the margin matter (only these
® Definition
$w \cdot x_i + b \le -1$, when $y_i = -1$,
$H_1 : w \cdot x_i + b = +1$
$H_2 : w \cdot x_i + b = -1$
The points on the planes $H_1$ and $H_2$ are the tips of the support vectors.
Fig. 3.6.10
- Kernels: make linear models work in nonlinear settings by mapping data to higher
- The simplest way to separate two groups of data is with a straight line, a flat plane or an
N-dimensional hyperplane. However there are situations where a nonlinear region can
SVM handles this by using a (nonlinear) kernel function to map the data into a different
It means a nonlinear function is learned by a linear learning machine in a high-dimensional
feature space, while the capacity of the system is controlled by a parameter that does not
A kernel function maps the data into a new space. It takes the inner product of the new vectors. The
image of the inner product of the data is the inner product of the images of the data. Two
@ Polynomial kernel
$$k(x_i, x_k) = (x_i \cdot x_k + 1)^d$$
@ Gaussian kernels
$$k(x_i, x_k) = \exp\left(-\frac{\lVert x_i - x_k \rVert^2}{2\sigma^2}\right)$$
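The two kernels can be sketched in Python, together with a check that a kernel value equals the inner product of the mapped vectors. The standard forms are assumed: a polynomial kernel of degree d, (x·z + 1)^d, and a Gaussian kernel with width sigma, exp(-||x - z||² / (2σ²)). The explicit feature map `phi` shown for d = 2 in two dimensions, and all function names, are illustrative assumptions.

```python
import math

def polynomial_kernel(x, z, d=2):
    """(x . z + 1)^d for vectors x and z."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """exp(-||x - z||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2 * sigma ** 2))

def phi(x):
    """Explicit feature map whose inner product equals the degree-2 polynomial kernel."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)
k_value = polynomial_kernel(x, z)                       # computed in the input space
mapped = sum(a * b for a, b in zip(phi(x), phi(z)))     # computed in the new space
```

Both routes give the same value, which is why the mapping to the higher-dimensional space never has to be carried out explicitly.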
3.7 Clustering
Cluster analysis or clustering is the task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are more similar (in some sense) to each other than
Q. 3.7.1 What are the characteristics of clustering method? (Ref. Sec. 3.7.1) (4 Marks)
Clustering methods must satisfy a few general necessities, as indicated below.
Clustering Methods Necessities
1. Flexibility
2. Robustness
3. Efficiency
→ 1. Flexibility
There are clustering methods which can be used on numerical characteristics only. In such
cases most of the time Euclidean metrics is used to determine the distances between
observations.
attributes.
→ 2. Robustness
The robustness of an algorithm is the stability of the clusters generated with respect to
small changes in the values of the attributes of each observation.
→ 3. Efficiency
The different types of Clustering based on the logic are partition methods, hierarchical
Types of Clustering
1. Partition methods
2. Hierarchical methods
3. Density-based methods
4. Grid methods
Fig. 3.7.2 : Types of Clustering
→ 1. Partition methods
non-empty subsets. They generate a spherical or at most convex shape after grouping.
→ 2. Hierarchical methods
Hierarchical and partition methods are founded on the distance between observations.
Density-based methods determine clusters from the number of observations locally falling
in a neighbourhood of each observation.
For each member which belongs to a specific cluster, a neighbourhood with a specified
diameter should contain a number of observations which should not be less than a
Density-based methods identify clusters of non-convex shape which helps them to isolate
In hierarchical clustering, pairs of clusters are repeatedly linked so that every
data object is included in the hierarchy. To determine the similarity between the clusters,
distance functions such as the Manhattan and Euclidean distance functions are used.
- Given two p-dimensional data objects $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$, the
- Distances are always positive numbers. In the Euclidean distance function, attributes with
prevent this problem, the attribute values are often normalized to lie between 0 and 1.
- A third option generalizes both the Euclidean and Manhattan metrics. The
$$\mathrm{dist}(i, j) = \left( \sum_{k=1}^{p} |x_{ik} - x_{jk}|^{q} \right)^{1/q}$$
- Example:
To calculate the distance between two points p(x1, y1) and q(x2, y2) in the xy-plane.
Fig. 3.7.3
The distance between two points is the sum of the (absolute) differences of their
coordinates. E.g. it counts 1 unit for a straight move, and 2 units for a
diagonal move.
Fig. 3.7.4 : Manhattan Distance
In chess, the distance between squares on the chessboard for rooks is measured in
Manhattan distance
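The distance functions discussed in this section can be written as one general function with two special cases. The sketch assumes the usual generalised (Minkowski) form with exponent q, where q = 1 gives the Manhattan distance and q = 2 the Euclidean distance; the function names are illustrative.

```python
def minkowski(i, j, q):
    """Generalised distance between two p-dimensional objects i and j."""
    return sum(abs(a - b) ** q for a, b in zip(i, j)) ** (1 / q)

def manhattan(i, j):
    """Sum of absolute coordinate differences (q = 1)."""
    return minkowski(i, j, 1)

def euclidean(i, j):
    """Straight-line distance (q = 2)."""
    return minkowski(i, j, 2)

straight = manhattan((0, 0), (0, 1))   # one straight move on the board
diagonal = manhattan((0, 0), (1, 1))   # a diagonal step costs two units
d_city = manhattan((0, 0), (3, 4))
d_line = euclidean((0, 0), (3, 4))
```

For the points (0, 0) and (3, 4) the Manhattan distance is 7 while the Euclidean distance is 5, which shows how the choice of metric changes the notion of closeness.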
3.7.4 Attribute
The nouns attribute, dimension, feature, and variable are often used interchangeably in the
literature.
Data mining and database professionals commonly use the term attribute. Attributes
describing a customer object can include, for example, customer ID, name, and address.
Univariate distribution involves only one attribute. The distribution of data having two
The type of an attribute is determined by the set of possible values the attribute can have.
Types of Attribute
1. Binary
2. Nominal
3. Ordinal
Fig. 3.7.5
Q. 3.7.3 Write short note on Binary attribute. (Ref. Sec, 3.7.5(1)) (5 Marks)
- 0 means the attribute is absent and 1 means it is present. Binary attributes are referred to as
Boolean if the two states correspond to true and false.
- E.g. for the attribute Smoker describing a patient object, 1 indicates that the patient smokes, while 0
- A similarity measure for two objects, i and j, will typically return the value 0 if the objects
are unalike. The higher the similarity value, the greater the similarity between objects.
(Typically, a value of 1 indicates complete similarity, that is, that the objects are
identical.)
- A dissimilarity measure works the opposite way. It returns a value of 0 if the objects are
the same (and therefore, far from being dissimilar). The higher the dissimilarity value, the
- A nominal attribute can take on two or more states. For example, flower color is a
nominal attribute that may have, say, five states: red, yellow, green, pink, and blue.
- Let the number of states of a nominal attribute be M. The states can be denoted by letters,
symbols, or a set of integers, such as 1, 2, ..., M. The dissimilarity between two objects i
$$d(i, j) = \frac{p - m}{p}$$
- Where m is the number of matches (i.e., the number of attributes for which i and j are in
the same state), and p is the total number of attributes describing the objects. Weights can
- There is another approach which involves computing a dissimilarity matrix from the
            Object j
            1      0      sum
Object i 1  q      r      q+r
         0  s      t      s+t
       sum  q+s    r+t    p
$$d(i, j) = \frac{r + s}{q + r + s + t}$$
The above equation states a degree of similarity between pairs(i,j) of observations through
the coefficient of similarity.
Assume that all n attributes are binary and asymmetric. In such case, for a pair of
$$d(i, j) = \frac{r + s}{q + r + s}$$
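The counts q, r, s and t and the resulting dissimilarities can be computed directly from two binary objects. In this sketch the two objects are hypothetical, the function names are illustrative, and the symmetric form (r + s) / (q + r + s + t) is the usual textbook expression, assumed here alongside the asymmetric form (r + s) / (q + r + s) in which joint absences t are ignored.

```python
def contingency(i, j):
    """Counts for two binary objects: q = both 1, r = (1, 0), s = (0, 1), t = both 0."""
    q = sum(a == 1 and b == 1 for a, b in zip(i, j))
    r = sum(a == 1 and b == 0 for a, b in zip(i, j))
    s = sum(a == 0 and b == 1 for a, b in zip(i, j))
    t = sum(a == 0 and b == 0 for a, b in zip(i, j))
    return q, r, s, t

def symmetric_dissimilarity(i, j):
    q, r, s, t = contingency(i, j)
    return (r + s) / (q + r + s + t)

def asymmetric_dissimilarity(i, j):
    q, r, s, _t = contingency(i, j)    # joint absences carry no information
    return (r + s) / (q + r + s)

# Two hypothetical objects described by six binary attributes
obj_i = [1, 0, 1, 0, 0, 0]
obj_j = [1, 0, 1, 0, 1, 0]
counts = contingency(obj_i, obj_j)
```

Here q = 2, r = 0, s = 1 and t = 3, so the symmetric dissimilarity is 1/6 while the asymmetric dissimilarity is 1/3; ignoring the shared absences makes the two objects look less alike.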
- (4 Marks)
Nominal means “relating to names.” The values of a nominal attribute are symbols or names of
things. Each value denotes some kind of category, code, or state. Nominal attributes are
also referred to as categorical. In computer science, the values are also known as
enumerations.
Example of nominal attributes: suppose that Hair color and Marital status are two attributes
describing person objects. In our application, possible values for Hair color are black,
It is a symmetric attribute where the number of values is greater than 2. We use the similarity coefficient in
Where, f is the number of attributes in which observations i and j take the same value.
Q.3.7.5 Write short note on Ordinal attribute. (Ref. Sec. 3.7.5(3)) (4 Marks)
An ordinal attribute has possible values that have a meaningful order or ranking
Suppose that Drink size corresponds to the size of drinks available at a restaurant. This
ordinal attribute has three possible values — small, medium, and large. However, we
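cannot tell from the values how much bigger, say, a large is than a medium.
$$z_{if} = \frac{r_{if} - 1}{M_f - 1}$$
where $r_{if}$ is the rank of object i on attribute f and $M_f$ is the number of ordered states of f.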
A dataset contain all attribute types nominal, ordinal, symmetric binary, asymmetric
binary etc. To define an overall affinity measure which defines similarity between
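$$d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$$
If f is numeric it uses the normalized distance.
$$d_{ij}^{(f)} = \frac{|x_{if} - x_{jf}|}{\max_h x_{hf} - \min_h x_{hf}}$$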
3.8 Partition Methods
Partition methods are heuristic in nature. They are based on greedy methods where at each
step they make the choice that locally appears the most advantageous.
There is no guarantee of an optimal result, but a good subdivision is obtained for the majority of
datasets. The K-means method and the K-medoids method are two of the best-known
partition algorithms.
K is positive integer. The grouping is done by minimizing the sum of squares of distances
The algorithm assumes two clusters, and each individual's scores include two variables (as
Distance Functions:
Given two p-dimensional data objects $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$, the
loop
a) Partition by assigning or reassigning all data objects to their closest cluster center.
b) Compute new cluster centers as the mean value of the objects in each cluster.
c) Until no change in cluster center calculation.
Individual | Variable 1 | Variable 2
1 | 1.0 | 1.0
2 | 1.5 | 2.0
3 | 3.0 | 4.0
4 | 5.0 | 7.0
5 | 3.5 | 5.0
6 | 4.5 | 5.0
7 | 3.5 | 4.5
Step 1:
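In this case the two centroids are c_1 and c_2, where c_1 = (1.0, 1.0) and c_2 = (5.0, 7.0).
Group | Individual | Centroid
Group 1 | 1 | (1.0, 1.0)
Group 2 | 4 | (5.0, 7.0)
Individual | Distance to centroid 1 | Distance to centroid 2
1 | 0 | 7.21
2 | 1.12 | 6.10
3 | 3.61 | 3.61
4 | 7.21 | 0
5 | 4.72 | 2.50
6 | 5.31 | 2.06
7 | 4.30 | 2.92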
We are still not sure that each individual has been assigned to the right cluster. So, we
compare each individual’s distance to its own cluster mean and to that of the opposite cluster.
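And we find :
Individual | Distance to mean (centroid) of Cluster 1 | Distance to mean (centroid) of Cluster 2
1 | 1.5 | 5.4
2 | 0.4 | 4.3
3 | 2.1 | 1.8
4 | 5.7 | 1.8
5 | 3.2 | 0.7
6 | 3.8 | 0.6
7 | 2.8 | 1.1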
Only individual 3 is closer to the mean of the opposite cluster (Cluster 2) than to its own
(Cluster 1). In other words, each individual's distance to its own cluster mean should be
smaller than the distance to the other cluster's mean (which is not the case with individual 3).
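The worked example can be reproduced with a short sketch of the assign-and-recompute loop. It assumes the seven individuals have the scores (1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0) and (3.5, 4.5), and it uses the standard batch form of K-means, in which all objects are reassigned before the means are recomputed; the function name and the tie-breaking rule are assumptions.

```python
import math

def k_means(points, initial_centroids, max_iter=100):
    """Assign points to the closest centroid and recompute means until stable."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # a) assign every point to its closest centroid (ties go to the lower index)
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break                      # c) no change in the assignment: stop
        labels = new_labels
        # b) recompute each centroid as the mean of its members
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
labels, centroids = k_means(points, [(1.0, 1.0), (5.0, 7.0)])
```

Individual 3 starts in the first cluster and is relocated to the second on the next pass, leaving individuals 1 and 2 in one cluster with mean (1.25, 1.5) and individuals 3 to 7 in the other with mean (3.9, 5.1).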
- K-means tries to minimize the total squared error. While k-medoids minimizes the sum of
center of that cluster. In contrast to the k-means algorithm, k-medoids chooses datapoints
as centers
- Instead of taking the mean value of the objects in a cluster as the reference point, medoids can be
- All the items from the input data set are examined one by one to see whether they are medoids or
not.
1. Initialize : arbitrarily select k out of the n data points as the medoids.
For each medoid m and each data point h associated to m, swap m and h and compute
the total cost (that is, the average dissimilarity of h to all the data points associated to
m). Select the medoid h with the lowest cost of the configuration.
- In simpler terms, for each pair of a medoid m and a non-medoid object h, measure
$$E = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)$$
Compute $E_h - E_m$.
medoid objects j.
Case 1: j is closer to some k than to h; after swapping m and h, j relocates to the cluster
represented by k.
represented by k.
Q.3.9.1 Explain single linkage, complete linkage, average linkage and ward distance | |
deterministic.
dendrogram. .
- In order to calculate the distance between two clusters, the hierarchical algorithms resort
1. Single Linkage
2. Complete Linkage
3. Average Linkage
In single linkage hierarchical clustering, the distance between two clusters is defined as the
shortest distance between two points, one in each cluster.
$$L(r, s) = \min\big(D(x_{ri}, x_{sj})\big)$$
Fig. 3.9.2
→ 2. Complete linkage
In complete linkage hierarchical clustering, the distance between two clusters is defined as the
longest distance between two points, one in each cluster.
For example, the distance between clusters “r” and “s” is equal to the length of the
$$L(r, s) = \max\big(D(x_{ri}, x_{sj})\big)$$
Fig. 3.9.3
→ 3. Average Linkage
In average linkage hierarchical clustering, the average distance between each point in one
For example, the distance between clusters “r” and “s” to the left is equal to the average
length of the arrows connecting the points of one cluster to the other.
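The three linkage rules differ only in how the point-to-point distances between two clusters are combined. A minimal sketch, assuming Euclidean distance between points and using hypothetical clusters r and s with illustrative function names:

```python
import math

def pairwise(r, s):
    """All distances between a point of cluster r and a point of cluster s."""
    return [math.dist(a, b) for a in r for b in s]

def single_linkage(r, s):
    return min(pairwise(r, s))          # shortest distance between the clusters

def complete_linkage(r, s):
    return max(pairwise(r, s))          # longest distance between the clusters

def average_linkage(r, s):
    d = pairwise(r, s)
    return sum(d) / len(d)              # mean of all connecting distances

r = [(0, 0), (0, 1)]
s = [(3, 0), (4, 0)]
single, complete, average = (single_linkage(r, s),
                             complete_linkage(r, s),
                             average_linkage(r, s))
```

For these clusters the single linkage distance is 3, the complete linkage distance is the square root of 17, and the average linkage distance lies between the two.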
Ward distance
The Ward distance is based on the analysis of the variance of the Euclidean distances
between the observations.
Methods based on the Ward distance tend to generate a large number of clusters, each
Centroid Method
In the centroid method, the distance between the two mean vectors of the clusters is considered as
the distance between two clusters. At each stage of the process we combine the two
- Hierarchical methods can be subdivided into two main groups: agglomerative and divisive
methods.
— Calculate the distances (similarities) between the clusters equal the distances (similarities)
between the items they contain. Join the two most similar clusters.
cluster. Then,
Step 1: Calculate the similarity (e.g., distance) between each of the clusters and join the
Step 2: Find the nearest (most similar) pair of clusters and merge them into a single
Step 3: Compute distances (similarities) between the new cluster and each of the old
clusters.
Step 4: Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
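The agglomerative steps above can be sketched in a few lines. This is only an illustration under stated assumptions: Euclidean distance, single linkage as the cluster distance, and a made-up set of points. The function name `agglomerate` is hypothetical.

```python
from math import dist


def agglomerate(points):
    """Merge clusters bottom-up and return the merges as (cluster_a, cluster_b, distance)."""
    clusters = [[p] for p in points]  # start: every item is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the nearest pair of clusters (single linkage is assumed here).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Merge the pair; distances to the new cluster are recomputed on the next pass.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical data set of N = 4 items.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
merges = agglomerate(points)
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

The sequence of merge distances is what a dendrogram draws as the heights of its joins.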
- Finally, we proceed repetitively on each cluster until there is one cluster for each
observation. There is evidence that divisive algorithms produce more accurate hierarchies
In divisive hierarchical clustering, a top-down approach is used. It starts with all objects in
one cluster. Clusters are subdivided into smaller and smaller clusters until each object
A cluster is split according to some principle, e.g., the maximum Euclidean distance
between the closest neighbouring objects in the cluster. Start with a single cluster at the top
Clusters till the bottom is reached where there are n clusters with one member each.
- Each level shows clusters for that level. Leaf- individual cluster, Root- one cluster. A
other clustering algorithms and to compare the results obtained by different methods.
- In this way it is also possible to evaluate if the number of identified clusters is robust with
cluster, .
C,e€ X,
Sep (X,,X,) = Ge X,
C.e X,
data. The silhouette value is a measure of how similar an object is to its own cluster.
- The coefficient value ranges from -1 to +1. A high value indicates that the object is
well matched with its own cluster and poorly matched with neighbouring clusters.
distance,
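A minimal sketch of the silhouette value for a single object follows. The text does not spell out the formula, so the usual definition s = (b - a) / max(a, b) is assumed, where a is the mean distance to the other members of the object's own cluster and b is the smallest mean distance to the members of any other cluster. Euclidean distance and the sample points are also assumptions.

```python
from math import dist


def silhouette(point, own_cluster, other_clusters):
    """Silhouette value of one point, assuming s = (b - a) / max(a, b)."""
    rest = [p for p in own_cluster if p != point]
    if not rest:
        return 0.0  # common convention for a cluster with a single member
    a = sum(dist(point, p) for p in rest) / len(rest)
    b = min(sum(dist(point, p) for p in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)


# Hypothetical clusters on a line.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(10.0, 0.0), (11.0, 0.0)]

well_placed = silhouette((0.0, 0.0), left, [right])
# The same kind of check for a point put into the wrong cluster.
badly_placed = silhouette((10.0, 0.0), left + [(10.0, 0.0)], [[(11.0, 0.0)]])
print(well_placed, badly_placed)
```

A value close to +1 corresponds to a well matched object, while a negative value suggests the object would fit a neighbouring cluster better.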
Q.2 What are the three phases of classification model ? (Refer Section 3.1.1) (5 Marks)
Q.4 How do you evaluate a classification method? (Refer Section 3.2) (5 Marks)
Q.6 Explain the Repeated random sampling. (Refer Section 3.2.2) (4 Marks)
Q.9 Explain the ROC curve chart. (Refer Section 3.2.5) (5 Marks)
Q.10 Explain the Cumulative gain and lift chart. (Refer Section 3.2.6) (5 Marks)
Q.11 Write short note on Bayesian methods. (Refer Section 3.3) (4 Marks)
Q.12 Explain naive Bayes classifier with example. (Refer Section 3.3.2) (5 Marks)
Q.14 Write short note on logistic regression. (Refer Section 3.4) (5 Marks)
Q.16 Write short note on support vector machine. (Refer Section 3.6) (5 Marks)
Q.17 What are the characteristics of clustering method? (Refer Section 3.7.1) (4 Marks)
Q.19 Write short note on Binary attribute. (Refer Section 3.7.5(1.)) (5 Marks)
Q.20 Write short note on Nominal attribute. (Refer Section 3.7.5(2.)) (4 Marks)
Q.21 Write short note on Ordinal attribute. (Refer Section 3.7.5(3.)) (4 Marks)
Q.24 Explain single linkage, complete linkage, average linkage and ward distance.
Q.25 How does one evaluate a clustering model? (Refer Section 3.10) (5 Marks)
Chapter Ends....
Uniti 4)
Data Preparation
2.1 Modeling
SO Called as the
entities of a System.
A Model is a simplified representation of the essential entities of some specific reality and
their characteristics,
— Exploration
— Explanation
- Extrapolation
Q. 2.2.1 What are the different types of model? (Ref. Sec. 2.2.1)
Types of Mathematical
Models
3. Symbolic Model
An iconic model is a physical copy of a system, usually based on a different scale than the
original. These may appear in three dimensions, like an airplane, car or bridge model built to scale.
Photographs are another type of iconic model, but in only two dimensions. An iconic
model is a look-alike representation of some specific entity, for example a house.
An analog model does not look like the real system but behaves like it. These are usually
two dimensional charts or diagrams, e.g., organization charts showing structure,
through diagrams).
(e.g. the colour coding of a geographical chart for representing different altitudes)
Analogue Devices
(e.g. the flow of water in pipes to represent the flow of electricity in wires or the flow of
physical representation may be cumbersome and take time to construct. Therefore a more
Symbols can be :
o Mathematical.
o Logical.
o Ad hoc.
- the factors of the system (variables) can be represented by symbols that can be
Q. 2.3.1 Write short note on structure of mathematical model. (Ref. Sec. 2.3) (5 Marks)
statements.
For example, the relationship between cost, revenue and profit can be expressed as:
$P = R - C$
Where, P is profit, R is revenue and C is cost.
Mathematical Models
Mathematical models are usually composed of variables, which are abstractions of
quantities of interest in the described systems, and operators that act on these variables,
If all the operators in a mathematical model exhibit linearity, the resulting mathematical
nonlinearity is dependent on context, and linear models may have nonlinear expressions
in them.
For example, in a statistical linear model, it is assumed that a relationship is linear in the
represented entirely by linear equations, then the model is regarded as a linear model.
If one or more of the objective functions or constraints are represented with a nonlinear
chaos and irreversibility. Although there are exceptions, nonlinear systems and models
one is trying to study aspects such as irreversibility, which are strongly tied to
nonlinearity.
2. Deterministic vs. probabilistic (stochastic)
A deterministic model is one in which every set of variable states is uniquely determined
Therefore, deterministic models perform the same way for a given set of initial
conditions.
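The contrast between the two kinds of model can be illustrated with a small sketch. The savings-balance scenario, the function names and the normally distributed yearly rate are all assumptions made for the example; they are not part of the text.

```python
import random


def deterministic_balance(principal, rate, years):
    """Same inputs always give the same output."""
    return principal * (1 + rate) ** years


def stochastic_balance(principal, mean_rate, spread, years, rng):
    """The yearly rate is drawn at random, so repeated runs can differ."""
    balance = principal
    for _ in range(years):
        balance *= 1 + rng.gauss(mean_rate, spread)
    return balance


# Deterministic: two runs with the same initial conditions agree exactly.
first = deterministic_balance(100.0, 0.05, 2)
second = deterministic_balance(100.0, 0.05, 2)
print(first, second)

# Stochastic: each run is one possible outcome; a fixed seed only makes
# a particular run reproducible.
run_a = stochastic_balance(100.0, 0.05, 0.02, 2, random.Random(1))
run_b = stochastic_balance(100.0, 0.05, 0.02, 2, random.Random(1))
run_c = stochastic_balance(100.0, 0.05, 0.02, 2, random.Random(2))
print(run_a, run_b, run_c)
```

In practice a stochastic model is usually run many times and the spread of the outcomes is studied, rather than a single value.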
equations.
A discrete model does not take into account the function of time and usually uses time-
Continuous models typically are represented with f (t) and the changes are reflected over
from empirical findings and generalization from them. The floating model rests on neither
Application of mathematics in social sciences outside of economics has been criticized for
a floating model.
(iv) The model should be adaptive. The parameters and structure of the model should be easy
(v) The model should be complete on important issues, i.e., all important variables and
l.
Use of models avoids constructing costly plants and warehouses in locations that do not
time.
Because of the constant squeeze on profits, the cost and time saving that MS models
the manager.
ithin a relatively-short
Disadvantages of mathematical models
A model that oversimplifies may inaccurately reflect the real world situation.
If the person who builds a model does not know what he is doing, output from the model
will be incorrect.
Models can sometimes prove too expensive to originate when their cost is compared to
2.4
Classes of Models
(6 Marks)
There are various models which are used for making decisions. The various mathematical
Classes of Models
Predictive model
Optimisation model
Risk analysis is the process of assessing the likelihood of an adverse event occurring
Risk analysis is the study of the underlying uncertainty of a given course of action and
returns, the probability of a project's success or failure, and possible future economic
states. .
Risk analysts often work in tandem with forecasting professionals to minimize future
Every project is extremely unique which means we cannot have a standard structure to
However, to have a good plan we need some kind of framework or structure to follow
A framework is something that tells you how often you will meet and discuss the
progress, how you will document results, how you will communicate and so on.
3. Predictive model
Predictive modeling is a process that uses data mining and probability to forecast
outcomes. Each model is made up of a number of predictors, which are variables that are
Once data has been collected for relevant predictors, a statistical model is formulated. The
model may employ a simple linear equation, or it may be a complex neural network,
As additional data becomes available, the statistical analysis model is validated or revised.
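As an illustration of the "simple linear equation" case, the sketch below fits a straight line to one predictor by ordinary least squares and uses it to forecast. The data values and the function names are hypothetical; a real predictive model would normally involve several predictors and a validation step.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: predictor value and observed outcome.
predictor = [1.0, 2.0, 3.0, 4.0]
outcome = [3.0, 5.0, 7.0, 9.0]

model = fit_line(predictor, outcome)
forecast = predict(model, 5.0)
print(model, forecast)
```

When new observations arrive, refitting with the extended lists corresponds to the revision step described above.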
4. Optimisation model
The Optimization Model class provides a common API for defining and accessing
Optimization problems can be classified in terms of the nature of the objective function
and the nature of the constraints. Special forms of the objective function and the
From this point of view, there are four types of optimization problems, of increasing
complexity.
function can be of any kind (linear or nonlinear) and there are no constraints. These types
the variables, and all constraints are also linear. Linear programs are implemented by the
quadratic in the variables (i.e. it may contain squares and cross products of the decision
variables), and all constraints are linear. A quadratic program with no squares or cross
arbitrary nonlinear function of the decision variables, and the constraints can be linear or
service and the cost of waiting. Note that I am not considering another possible cost
lines (meaning you should never wait in the doctor's office - yeah, right!) and is not
Scheduling systems are useful when the customer is known to the system and the short
and long run costs of waiting are relatively high. We will study scheduling system
3. The average number of customers in the system (customers in line plus those being
served).
5. The average time a customer spends in the system (waiting time plus time in the
service facility).
Pattern recognition deals with identifying a pattern and confirming it again. In general, a
pattern can be a fingerprint image, a handwritten cursive word, a human face, a speech
- The individual patterns are often grouped into various categories based on their
properties. When the patterns of same properties are grouped together, the resultant group
- Pattern recognition is the science of observing and distinguishing the patterns of interest, and
making correct decisions about the patterns or pattern classes. Thus, a biometric system
applies pattern recognition to identify and classify individuals, by comparing them with
(2 Marks)
— Data mining is a process used by companies to turn raw data into useful information. By
using software to look for patterns in large batches of data, businesses can learn more
about their customers to develop more effective marketing strategies, increase sales and
decrease costs.
- Data mining depends on effective data collection, warehousing and computer processing.
Q.2.6.1 Write short note on Data Mining parameters. (Ref. Sec. 2.6) (5 Marks)
In data mining, association rules are created by analysing data for frequent if/then
patterns, then using the support and confidence criteria to locate the most important
Support is how frequently the items appear in the database, while confidence is the
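Support and confidence can be computed directly from a list of transactions. The sketch below uses the standard definitions: support is the fraction of transactions that contain an item set, and confidence of a rule "if A then B" is the fraction of transactions containing A that also contain B. The shopping-basket data and function names are made up for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Rules whose support and confidence both exceed user-chosen thresholds are the ones usually reported as interesting.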
Clustering and Forecasting. Sequence or Path Analysis parameters look for patterns where
A Sequence is an ordered list of sets of items, and it is a common type of data structure
found in many databases. A Classification parameter looks for new patterns, and might
result in a change in the way the data is organized. Classification algorithms predict
Clustering parameters find and visually document groups of facts that were previously
unknown. Clustering groups a set of plots and aggregates them based on how similar
There are different ways a user can implement the cluster, which differentiate between
each clustering model. Forecasting parameters within data mining can discover patterns in
data that can lead to reasonable predictions about the future, also known as predictive
analysis.
Data mining techniques are used in many research areas, including mathematics,
cybernetics, genetics and marketing. While data mining techniques are a means to drive
efficiencies and predict customer behavior, if used correctly, a business can set itself apart
Web mining, a type of data mining used in customer relationship management, integrates
information gathered by traditional data mining methods and techniques over the web.
Other data mining techniques include network approaches based on multitask learning for
classifying patterns, ensuring parallel and scalable execution of data mining algorithms,
the mining of large databases, the handling of relational and complex data types, and
machine learning. Machine learning is a type of data mining tool that designs specific
The major components of any data mining system are data source, data warehouse server,
data mining engine, pattern evaluation module, graphical user interface and knowledge
base.
[Figure labels: Graphical User Interface, Pattern Evaluation]
- Database, data warehouse, World Wide Web (WWW), text files and other documents are
the actual sources of data. You need large volumes of historical data for data mining to be
successful.
- Organizations usually store data in databases or data warehouses. Data warehouses may
contain one or more databases, text files, spreadsheets or other kinds of information
repositories. Sometimes, data may reside even in plain text files or spreadsheets. World
@ Different processes
- The data needs to be cleaned, integrated and selected before passing it to the database or
data warehouse server. As the data is from different sources and in different formats, it
cannot be used directly for the data mining process because the data might not be
These processes are not as simple as we think. A number of techniques may be performed
The database or data warehouse server contains the actual data that is ready to be
processed. Hence, the server is responsible for retrieving the relevant data based on the data
The data mining engine is the core component of any data mining system. It consists of a
number of modules for performing data mining tasks including association, classification,
The pattern evaluation module is mainly responsible for the measure of interestingness of
the pattern by using a threshold value. It interacts with the data mining engine to focus the
- The graphical user interface module communicates between the user and the data mining
system. This module helps the user use the system easily and efficiently without knowing
- When the user specifies a query or a task, this module interacts with the data mining
- The knowledge base is helpful in the whole data mining process. It might be useful for
The knowledge base might even contain user beliefs and data from user experiences that
can be useful in the process of data mining. The data mining engine might get inputs from
the knowledge base to make the result more accurate and reliable.
The pattern evaluation module interacts with the knowledge base on a regular basis to get
Architecture
- In this architecture, the data mining system does not use any functionality of a database. A
no-coupling data mining system retrieves data from a particular data source.
- The no-coupling data mining architecture does not take advantage of a database, which
is already very efficient in organizing, storing, accessing and retrieving data.
- The no-coupling architecture is considered a poor architecture for a data mining system.
- In this architecture, the data mining system uses a database for data retrieval. In the loose
coupling data mining architecture, the data mining system retrieves data from a database.
- This architecture is for memory-based data mining systems that do not require
systems. That is to perform some data mining tasks, which include sorting, indexing and
aggregation.
In this, some intermediate result can be stored in a database for better performance,
the features of database or data warehouse are used to perform data mining tasks.
information.
i. Data Layer
We can define the data layer as a database or data warehouse system. This layer is an
Data mining results are stored in the data layer. Thus, we can present them to the end-user in the form
It provides an intuitive and friendly user interface for the end-user to interact with
The data mining result is presented in visualization form to the user in the front-end layer.
Different data mining processes can be classified into two types: data preparation or data
preprocessing and data mining. In fact, the first four processes, that are data cleaning, data
integration, data selection and data transformation, are considered as data preparation
processes.
The last three processes including data mining, pattern evaluation and knowledge
[Figure: data mining process stages, including Cleaning, Integration, Data Mining, Evaluation and Knowledge]
Fig. 2.7.4
Data cleaning is the process where the data gets cleaned. Data in the real world is
The data available in data sources might be lacking attribute values, data of interest etc.
For example, you want the demographic data of customers and what if the available data
does not include attributes for the gender or age of the customers? Then the data is of
An example is an age attribute with value 200. It is obvious that the age value is wrong in
this case. The data could also be inconsistent.
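A first pass over the data can flag both kinds of problem mentioned above: missing attribute values and obviously wrong values such as an age of 200. The sketch below is only an illustration; the record layout, the required attributes and the upper age limit of 120 are assumptions.

```python
def find_problems(records, required=("age", "gender"), max_age=120):
    """Return (record index, attribute, problem) for every issue found."""
    problems = []
    for index, record in enumerate(records):
        for attribute in required:
            if record.get(attribute) is None:
                problems.append((index, attribute, "missing"))
        age = record.get("age")
        if age is not None and not 0 <= age <= max_age:
            problems.append((index, "age", "out of range"))
    return problems


# Hypothetical customer records.
customers = [
    {"age": 34, "gender": "F"},
    {"age": 200, "gender": "M"},   # noisy value
    {"age": None, "gender": "F"},  # missing value
    {"age": 41},                   # missing attribute
]

issues = find_problems(customers)
for issue in issues:
    print(issue)
```

What to do with a flagged record (correct it, fill it in, or drop it) is a separate decision that depends on the analysis.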
(b) Data integration
les
results would
Data cleaning involves a number of
and human inspection,
Suppose a table A
The same data might be available in different tables in the same database or even in
different data sources. Data integration tries to reduce redundancy to the maximum
data repository with integrated data contains much more data than actually required.
From the available data, data of interest needs to be selected and stored. Data selection is
the process where the data relevant to the analysis is retrieved from the database.
Data transformation is the process of transforming and consolidating the data into
For example, a data set available as "5, 37, 100, 89, 78" can be transformed as "0.05,
0.37, 1.00, 0.89, 0.78".
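One transformation that maps 5 to 0.05 is scaling by a power of ten. The sketch below assumes that this is the intended transformation and divides every value by the smallest power of ten that brings the largest absolute value down to at most 1; the function name is illustrative.

```python
def decimal_scale(values):
    """Divide all values by the smallest power of ten that makes max |v| <= 1."""
    largest = max(abs(v) for v in values)
    power = 0
    while largest / 10 ** power > 1:
        power += 1
    return [v / 10 ** power for v in values]


data = [5, 37, 100, 89, 78]
scaled = decimal_scale(data)
print(scaled)
```

Some textbooks require the scaled maximum to be strictly below 1, in which case the divisor here would be 1000 instead of 100; the looser condition is used so that 5 becomes 0.05 as in the example.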
Data mining is the core process where a number of complex and intelligent methods are
The pattern evaluation identifies the truly interesting patterns representing knowledge
The information mined from the data needs to be presented to the user in an appealing
way.
Data mining helps organizations to make the profitable adjustments in operation and
production.
The data mining is a cost-effective and efficient solution compared to other statistical data
applications.
of hidden patterns.
It is the speedy process which makes it easy for the users to analyze huge amount of data
in less time.
Disadvantages of data mining
There is a chance that companies may sell useful information about their customers to other
companies for money. For example, American Express has sold credit card purchases of
to work on.
3. Different dat
difficult task.
4. The data mining techniques are not always accurate, and so they can cause serious consequences in
certain conditions.
(5 Marks)
3. Fraud Detection
Apart from these, data mining can also be used in the areas of production control,
customer retention, science exploration, sports, astrology, and Internet Web Surf-Aid
Listed below are the various fields of market where data mining is used :
— Customer Profiling : Data mining helps determine what kind of people buy what kind of
products.
products for different customers. It uses prediction to find the factors that may attract new
customers.
product sales,
Target Marketing : Data mining helps to find clusters of model customers who share the
summary reports.
Finance Planning and Asset Evaluation : It involves cash flow analysis and prediction,
contingent claim analysis to evaluate assets.
Resource Planning : It involves summarizing and comparing the resources and spending.
Q. 2.8.3 Write short note on fraud detection. (Ref. Sec. 2.8.3) (5 Marks)
[Figure labels: Business understanding, Data understanding]
1. Business understanding
First, it is required to understand business objectives clearly and find out what are the
business’s needs,
Next, we have to assess the current situation by finding the resources, assumptions,
Then, from the business objectives and current situations, we need to create data mining
Finally, a good data mining plan has to be established to achieve both business and data
2. Data understanding
First, the data understanding phase starts with initial data collection, which we collect
from available data sources, to help us get familiar with the data.
Some important activities must be performed including data load and data integration in
Next, the “gross” or “surface” properties of acquired data need to be examined carefully
and reported.
Then, the data needs to be explored by tackling the data mining questions, which can be
Finally, the data quality must be examined by answering some important questions such
as “Is the acquired data complete?”, “Are there any missing values in the acquired data?”
3. Data preparation
The data preparation typically consumes about 90% of the time of the project. The
Once available data sources are identified, they need to be selected, cleaned, constructed
and formatted into the desired form. The data exploration task at a greater depth may be
carried out during this phase to notice the patterns based on business understanding.
4. Modeling
First, modeling techniques have to be selected to be used for the prepared dataset.
Next, the test scenario must be generated to validate the quality and validity of the model.
Q. 2.9.1 Draw diagram and explain data preparation. (Ref. Sec. 2.9)
Then, one or more models are created by running the modeling tool on the prepared
dataset.
Finally, models need to be assessed carefully involving stakeholders to make sure that
5. Evaluation
In the evaluation phase, the model results must be evaluated in the context of business
objectives in the first phase. In this phase, new business requirements may be raised due
to the new patterns that have been discovered in the model results or from other factors.
6. Deployment
The knowledge or information, which we gain through data mining process, needs to be
presented in such a way that stakeholders can use it when they want it.
Based on the business requirements, the deployment phase could be as simple as creating
In the deployment phase, the plans for deployment, maintenance, and monitoring have to
From the project point of view, the final report of the project needs to summarize the
project experiences and review the project to see what needs to be improved and to record
the lessons learned.
guidelines. In addition, the CRISP-DM can apply in various industries with different
types of data.
(5 Marks)
n of data into
a form suitable for further analysis and processing. It is a process that involves many
aration accoun
mining project. |
- Data preparation is essential for successful data mining. Poor quality data typically result
in incorrect results.
Data Preparation
Fig. 2.9.1
- Data validation is about checking the information to ensure that it complements the
data needs of the system. This reduces the chances of errors. One of the many examples is
checking the input data to ensure it conforms to the data requirements of the system.
- An example of this is a range check to avoid an input number that is greater or smaller
than the specified range.
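A range check of the kind described can be written as a small guard function. The name `check_range` and the example limits are hypothetical; the point is only that an out-of-range input is rejected before it reaches the rest of the system.

```python
def check_range(value, low, high):
    """Return the value if low <= value <= high, otherwise raise ValueError."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the allowed range [{low}, {high}]")
    return value


# Hypothetical use: a month number must lie between 1 and 12.
month = check_range(7, 1, 12)
print(month)

try:
    check_range(13, 1, 12)
except ValueError as error:
    print("rejected:", error)
```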
Q. 2.11.1 Explain data transformation with suitable diagram. (Ref. Sec. 2.11) (5 Marks)
In data transformation process data are transformed from one format to another format,
Strategies
1. Smoothing
2. Aggregation
3. Generalization
4. Normalization
5. Attribute Construction
1. Smoothing
2. Aggregation
3. Generalization
hierarchies climbing.
4. Normalization
Normalization scales attribute data so as to fall within a small specified range, such as 0.0
to 1.0.
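A common way to scale an attribute into the range 0.0 to 1.0 is min-max normalization, sketched below. The text does not name a specific method, so the choice of min-max is an assumption, as are the function name and the sample values.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    low, high = min(values), max(values)
    if high == low:
        return [new_min for _ in values]  # all values equal: nothing to spread out
    scale = (new_max - new_min) / (high - low)
    return [new_min + (v - low) * scale for v in values]


ages = [5, 37, 100, 89, 78]
normalized = min_max(ages)
print(normalized)
```

The smallest value maps to 0.0 and the largest to 1.0; all other values keep their relative spacing.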
In Attribute construction, new attributes are constructed from the given set of attributes
database or data warehouse may store terabytes of data. So it may take very long to
Q. 2.12.1 Write short note on data Reduction. (Ref. Sec. 2.12) (5 Marks)
Data reduction techniques can be applied to obtain a reduced representation of the data set
Types of Data
Reduction Strategies
1. Data Cube Aggregation
2. Dimensionality Reduction
3. Data Compression
4. Numerosity Reductions
Aggregation operations are applied to the data in the construction of a data cube.
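As a small illustration of aggregation, the sketch below rolls quarterly sales up to yearly totals, which is one cell-level operation used when building a data cube. The sales figures and the function name are made up for the example.

```python
from collections import defaultdict


def roll_up_to_year(rows):
    """Sum (year, quarter, amount) rows into one total per year."""
    totals = defaultdict(float)
    for year, _quarter, amount in rows:
        totals[year] += amount
    return dict(totals)


# Hypothetical quarterly sales.
quarterly_sales = [
    (2019, "Q1", 100.0), (2019, "Q2", 150.0), (2019, "Q3", 120.0), (2019, "Q4", 130.0),
    (2020, "Q1", 110.0), (2020, "Q2", 160.0),
]

yearly_sales = roll_up_to_year(quarterly_sales)
print(yearly_sales)
```

Six rows shrink to two, yet they still answer any question asked at the yearly level, which is the sense in which aggregation reduces the data.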
In dimensionality reduction redundant attributes are detected and removed which reduce
4. Numerosity reductions
Where raw data values for attributes are replaced by ranges or higher conceptual levels.
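Replacing raw values by ranges can be sketched as simple equal-width binning. The bin width of 10 and the function name are assumptions chosen for illustration.

```python
def to_range(value, width=10):
    """Replace a non-negative integer by the label of the range that contains it."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


ages = [5, 37, 41, 38, 62]
age_ranges = [to_range(age) for age in ages]
print(age_ranges)
```

Many distinct raw values collapse into a few range labels, so the attribute takes far fewer distinct values after the replacement.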
Q.1 What are the different types of model? (Refer Section 2.2.1) (5 Marks)
Q.2 Write short note on structure of mathematical model. (Refer Section 2.3) (5 Marks)
Q.5 Write short note on Data Mining parameters. (Refer Section 2.6) (5 Marks)
Syllabus Topic : Data Mining Process
Q.6 Draw and explain architecture of data mining. (Refer Section 2.7) (5 Marks)
Q.7 Write various application of data mining. (Refer Section 2.8) (5 Marks)
Q.9 Write short note on fraud detection. (Refer Section 2.8.3) (5 Marks)
Q.10 Draw and explain data preparation. (Refer Section 2.9) (5 Marks)
Q.12 Explain data transformation with suitable diagram. (Refer Section 2.11) (5 Marks)
Q.13 Write short note on data Reduction. (Refer Section 2.12) (5 Marks)
Chapter Ends...
ea CHAPTER
(5 Marks)
The term Business Intelligence (BI) refers to technologies, applications and practices for
the collection, integration, analysis, and presentation of business information. The main
reason behind Business Intelligence is to provide better business decision making.
These systems are data-driven Decision Support Systems (DSS). Business Intelligence is
sometimes used interchangeably with briefing books, report and query tools and executive
methodology which is very useful for decision making process which are complex.
Large amount of data can be easily accessed by individuals and organizations because of
Transactions are commercial, financial and administrative, making the data heterogeneous
in origin, content and representation. Emails, texts and hypertexts, and the results of
Their accessibility opens various scenarios and opportunities, and raises a rather
important question: is it possible to convert such data into information and knowledge that
can then be used by decision makers to assist and improve the operation of enterprises and
of public administration?
Q. 1.2.1. Write short note on Effective and Timely decisions. (Ref. Sec. 1.2) (5 Marks)
In public or private organizations, decisions are made continuously. Such decisions can
prove to be critical, have long-term or short-term effects and involve people and roles at
various rankings.
— Most people reach their decisions mainly using simple and easy approaches, which use
specific elements such as experience, knowledge of the application domain and the
available information.
- Decision-making processes within today’s organizations are often too complex and
dynamic to be effectively dealt with through an intuitive approach, and instead require a
much stricter attitude based on analytical tactics and mathematical models.
- The marketing person of a cellular company realizes that most of the customers are
diverting towards other service provider due to better option and low cost. It is critical for
the company as it will reduce the number of customer which affects business.
- So the company manager can decide to conduct a customer retention campaign. With the help of
this campaign they can select the best target group which will maximize customer
— The main purpose of business intelligence systems is to provide skilled workers with tools
and methodologies that allow them to make effective and timely decisions.
* Effective decisions
- As a result, they are able to make better decisions and formulate action plans that a
— Turning to formal analytical methods forces decision makers to describe both the c
for assessing alternative choices and the mechanisms that regulate the problem under
investigation.
Furthermore, the ensuing observation and thought lead to a better awareness and
@ Timely decisions
competition and high dynamism. As a consequence, the ability to rapidly react to the
actions of competitors and to new market conditions is a critical factor in the success or
Fig. 1.2.1 shows the benefits provided to organization, which can draw from the adoption
of a business intelligence system. When they face problem decision makers can ask
themselves a group of questions on the basis of that they can make analysis based on it.
If decision makers follow business intelligence system then the overall quality of the
[Fig. 1.2.1 labels: Alternative actions, Business intelligence]
Therefore we can say that it is effective and advantageous to use a business intelligence
As we saw, a large amount of data can be stored in the systems of public and private
organizations.
This data can be from internal transactions of an administrative, logistical and commercial
But even if we collect and store it systematically, we cannot use it directly for decision-making
purposes. For that we need extraction tools and methods which will convert
Q. 1.3.1 What do you mean by data, knowledge and information? (Ref. Sec. 1.3) (5 Marks)
The difference between data, information and knowledge can be better understood
@ Knowledge
Knowledge means what we know. We build world map in our brain as we know.
It’s like a physical map which helps us to know where things are but it contains more than
together into a giant network of ideas, memories, predictions, beliefs, etc. It is from this
“map” that we base our decisions, not the real world itself.
Our brains constantly update this map from the signals coming through our eyes, ears,
nose, mouth and skin. We can’t currently store knowledge in anything other than a brain,
Everything is inter-connected in our brain. Computers are not artificial brains. Computers
don’t understand what they are processing, and can’t make decisions by themselves and it
Knowledge is built from two sources: information and data.
Data is a set of representations of plain facts. Data are the facts of the world.
For example, take yourself. You may be 6 ft tall, have black hair and brown eyes. All of
this is “data”.
The confusion between data and information often arises because information is made out
We can perceive this data with our senses, and then the brain can process this. Human
beings have used data as long as we’ve existed to form knowledge of the world.
Information
Information is used to expand our knowledge beyond the range of our senses. We can
capture data in information, and then move it about so that other people can access it through
different mediums.
For example, if we take a picture, the photo is information, while how we look is the data.
We can send the picture around through various media without moving the person who
is in the picture. If we lose that photo, it won’t change how we look. In this case we lose
Mathematical models and algorithms help decision makers to extract information and
knowledge from the data through the means of a business intelligence system.
approach of organization.
Example: a spreadsheet is used to estimate the effects of fluctuations in interest rates;
with its help, decision makers can form a mental representation of the financial
flows process.
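The spreadsheet example can be sketched in a few lines of Python. This is only a minimal what-if sketch, not a method taken from the text: the function names, the simple-interest formula and the sample figures are all assumptions chosen for illustration.

```python
def annual_interest_cost(principal, annual_rate):
    """Simple-interest cost of a loan for one year (assumed formula)."""
    return principal * annual_rate


def what_if_table(principal, rates):
    """Return (rate, cost) pairs, one per interest-rate scenario."""
    return [(rate, annual_interest_cost(principal, rate)) for rate in rates]


if __name__ == "__main__":
    # Hypothetical loan of 1,000,000 under three rate scenarios.
    for rate, cost in what_if_table(1_000_000, [0.04, 0.05, 0.06]):
        print(f"rate={rate:.2%}  annual interest={cost:,.2f}")
```

Each row plays the role of one spreadsheet line: the decision maker changes the rate and reads off the effect on the financial flow.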
Classical scientific fields, such as physics, have always resorted to mathematical models
Other areas, such as operations research, have instead made full use of the application of
scientific methods and mathematical models to the study of artificial systems, for example
o They identify the objectives of the analysis and the performance indicators which
o Finally on the basis of variation in the control variable and changes in the parameters
Q. 1.5.1 Draw and explain architecture of Business Intelligence. (Ref. Sec. 1.5) (5 Marks)
Fig. 1.5.1 shows the architecture of a business intelligence system, which consists of three
[Fig. 1.5.1: architecture of a business intelligence system; labels recovered are operational systems, multidimensional cubes, data mining and optimization]
primary and secondary sources, they are heterogeneous in origin and type.
- Most of the data comes from operational systems, but the sources also include
unstructured documents such as emails and data received from various external sources.
- ETL stands for Extract Transform Load. In an ETL process data is extracted from the
- The data from the various sources is stored in databases built to support
business intelligence analysis. These databases are called data warehouses and data marts.
- This data is extracted to provide input to mathematical models and to support decision
makers.
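The extract, transform and load steps described above can be illustrated with a small Python sketch. It is a toy example under stated assumptions: the source is an in-memory list standing in for an operational system, the "warehouse" is an in-memory SQLite database, and the table name `sales`, the field names and the cleaning rules are all hypothetical.

```python
import sqlite3


def extract(source_rows):
    """Extract: copy raw records from a source (here, a plain list)."""
    return list(source_rows)


def transform(rows):
    """Transform: clean customer names and convert amounts to numbers."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().title()
        amount = float(row["amount"])
        cleaned.append((customer, amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl(source_rows, conn):
    """Run the three steps in order: extract, transform, load."""
    load(conn, transform(extract(source_rows)))
```

Once loaded, the table can be queried with ordinary SQL to feed the analysis layers of the architecture.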
3. Optimization models,
- Fig. 1.5.2 shows the pyramid of a business intelligence system. We have
3. Data exploration
The third level is data exploration. Data exploration is an informative search
used by data consumers to form a true analysis from the information
collected. It is about describing the data by means of statistical and
visualization techniques.
We explore data in order to bring important aspects of that data into focus for further
For true analysis, this unorganized bulk of data needs to be narrowed down. This is where
data exploration is used to analyze the data, and the information drawn from it, to support further
analysis.
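Describing data by statistical means can be as simple as a handful of summary figures. The sketch below uses Python's standard `statistics` module; the function name `describe` and the choice of statistics are assumptions made for illustration, not something prescribed by the text.

```python
import statistics


def describe(values):
    """Summarize a non-empty list of numbers with basic statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    monthly_sales = [120, 135, 128, 160, 142]  # hypothetical figures
    for name, value in describe(monthly_sales).items():
        print(f"{name}: {value}")
```

A summary like this narrows a bulk of raw values down to a few numbers that can then be compared or plotted.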
Data often converges in a central warehouse called a data warehouse. This data can come
Relevant data is needed for tasks such as statistical reporting, trend spotting and pattern
4. Data mining
The fourth level is data mining. The data mining technique has to be chosen based on the type
5. Optimization
One level up we find optimization models, which allow us to select the best
6. Decisions
The topmost level of the pyramid is decisions, where we need to select the best
alternative.
[Figure: labels recovered are customers, suppliers, business intelligence, decision and evaluation]
1. Analysis
In this phase we find out the problem and understand which path is critical for making
- For example, if the first phase shows that many customers want to
discontinue their insurance policy after its validity expires, and the second phase gives information
- In this phase information is carried out through the analysis phase. Insight Assessment
We provide world class test instruments supported by high quality customer service to
At each phase of the assessment process, we offer the instrumentation, data gathering
capacity and report options to guide you to your goal of demonstrating institutional
effectiveness.
3. Decision
This is the third phase, where decision makers take the decision. The availability of BI helps
- This is an important phase, which determines the overall time for execution.
4. Evaluation
This is the final phase of the cycle, which covers performance measurement and evaluation.
Q. 1.5.3 Draw and explain phases of Business Intelligence. (Ref. Sec. 1.5.2) (5 Marks)
1. Analysis
This step is about analyzing the performance of the software at various stages and making
notes on additional requirements. Analysis is very important to proceed further to the next
This phase consists of interviews with workers who perform various
roles in the organization and draws on their knowledge. We also need to decide the costs and benefits of developing
2. Design
which is basically
3. Planning
The main purpose of the planning phase is to know the requirements and understand the
opportunities. In this phase we need to find out the cost, time, and benefits of the system.
What is the scope of the system? What will be the problem and solution for it?
Without the perfect plan, calculating the strengths and weaknesses of the project,
The actual task of developing the software starts here with data recording going on in the
background. Once the software is developed, the stage of implementation comes in where
the product goes through a pilot study to see if it’s functioning properly.
- A metadata archive should be created. For this, ETL procedures are used. And finally the
[Figure: labels recovered are multidimensional cubes, relational marketing, click stream analysis, campaigns optimization, sales force planning, risk analysis, optimization and scorecard]
- The type of ethics in Business Intelligence (BI) is the ethical principles of conduct that
professional ethics and not to be confused with other forms of philosophical ethics
- Professional ethics according to Griffin (1986) is that profit is not the only important
- Companies must acknowledge that they serve a common good: protecting their local
community, improving employee relations and promoting informational press to the public.
Back in 1986, Griffin was directing his argument towards ethics in accounting, but it
- Government regulations are not changing fast enough to cover all the changes in
code of ethics, and to persistently be receptive to the needs of the public being served.
their decisions that regard the consumer, business and/or other employees’ data. Ethics is
sometimes it may involve illegal practices, other times it is just a decision that needs to be
cheaper data in his/her data mining activities to save money. The data he/she chooses to
— The cheaper data sets have a 20% possibility of being incorrect. The manager did not see
it as being an unethical decision when it was made, just a way to continue to generate
- The decision, which affects 20% of the company’s customers, may produce different results, as
more people are turned down for credit because of inaccurate reports. It is not a crime to
have used the inaccurate data sets, but it may seem an unethical practice to
others.
- While it is important for managers to be able to make their own decisions, this example
decision being made should have involved more managers since it affected the whole
business.
- The manager’s choice could bankrupt the company as users start to take their business to
more accurate competitors. As the example points out, sometimes there is no
really clear answer to whether an issue involves an ethical or legal choice and each situation
can be different.
- Trying to make decisions based on individuals’ beliefs when dealing with a company can
amount to intellectual stalls and trying to come to a decision can be expensive and time
consuming.
- Today’s society has come to the point where there are more solutions to problems than
ever before. What once was impossible can now be accomplished through the use of BI
It is not going to stop; technology is going to keep advancing. What seems improbable
and customers, companies and competitors than there was when everything was done
- Larger separation between companies and the consumer has resulted in unethical and
- Because of all the technology used in big businesses, and resulting exposure to unethical
practices by some of the larger corporations like Enron, there is growing anxiety of large
- Additionally, the general trust level of users has eroded to the point where trust really has to
be earned. Users are very aware of cases of identity information being lost to theft as well
business data and presents it so that users can make business decisions more easily.
A DSS allows users to compile information which can be used to solve problems and
The advantages of a decision support system include more informed decision-
making, timely problem-solving and improved efficiency in dealing with problems with
1.8 Definition of System
Q. 1.8.1 Explain system with neat diagram. (Ref. Sec. 1.8) (5 Marks)
The term system is widely used in everyday language: for example, we refer to the solar
All these systems share a common characteristic, which can be used as an abstract
definition of the notion of system: each of them is made up of a collection of components
which are in some way connected to each other to produce a single collective result and
serve a common purpose.
Every system is characterized by boundaries that separate its internal components from
the external environment. A system is called open if its boundaries can be crossed
When such crossing is not possible, the system is known as closed. In other words,
any system receives specific input flows, and gives an internal transformation process
This definition of the system can be used to describe a broad class of real-world
phenomena.
Fig. 1.8.1 shows a structure for describing the concept of a system.
The system receives a group of input flows and then returns a group of output flows from
Measurable performance indicators are used to assess effectiveness and efficiency of the
Fig. 1.8.1 also shows the main types of metrics which are used to evaluate systems
embedded within the enterprises and the public administration.
generates an output flow i.e. fed back into the system itself as an input flow, possibly
[Fig. 1.8.1: Abstract representation of a system. Labels recovered: external conditions; input (materials, services, information); system (transformation process, internal conditions); output flows (products, services, information); system performances (profitability, dependability)]
- A system which modifies its output flows depending upon feedback is known as a closed
cycle system. For example, the closed cycle system shown in Fig. 1.8.2 describes the
The sales results of each campaign are collected and used as feedback input to design
subsequent marketing promotions, so that decision makers can improve the system.
It is very important for decision-making process. For this purpose we can use two main
Effectiveness
Effectiveness means whether we are achieving the desired outcome or not. In other words
Efficiency
- Effectiveness metrics show whether the right action is being taken, whereas
efficiency metrics check whether the action is being carried out in the best possible way.
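The distinction between the two kinds of metric can be made concrete with a small Python sketch. The ratios used here (achieved result over target for effectiveness, output over resources used for efficiency) are common conventions assumed for illustration; the text itself does not give formulas, and the function names and figures are hypothetical.

```python
def effectiveness(achieved, target):
    """Share of the desired outcome that was actually achieved."""
    return achieved / target


def efficiency(output_value, resources_used):
    """Output obtained per unit of resource consumed."""
    return output_value / resources_used


if __name__ == "__main__":
    # Hypothetical campaign: 80 sales against a target of 100,
    # producing 200 units of revenue from 50 units of budget.
    print("effectiveness:", effectiveness(80, 100))
    print("efficiency:", efficiency(200, 50))
```

A system can score well on one metric and badly on the other, which is why both are tracked.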
- To build effective DSSs, we first need to describe in general terms how a decision-making
process is structured.
- We wish to understand the steps that lead individuals to make decisions and the extent of
the influence applied on them by the subjective attitudes of the decision makers and the
professional life.
- It plays a vital role in achieving the desired goal. We are focusing on decisions made by
- These decisions are used to develop a strategic plan. The decision-making process is used
for problem solving: individuals fill the gap between the current system’s operating
- In other words, the transition of a system towards the desired state implies overcoming
certain obstacles and is not easy to attain. This forces decision makers to devise a set of
alternative options for reaching the required goal, and then to make a decision based on
- Therefore, the selected decision should first be put into practice and then checked to see whether it has
enabled the planned objectives to be achieved. When this fails, the problem is
Fig. 1.9.1 shows the problem-solving process. The alternatives represent the
possible actions targeted for solving the given problem and helping to achieve the planned
objective.
Sometimes the number of available alternatives is small. When deciding whether to grant a
loan to an applicant, there are only two alternatives: approve or
reject.
But in other cases there can be many alternatives where we need to select best alternative
[Fig. 1.9.1: the problem-solving process (label recovered: Environment)]
Criteria are used to measure the effectiveness of the various options and correspond to the
different kinds of system performance. Fig. 1.9.1 shows the rational approach to
decision making, where the best alternative is selected from among all the alternatives.
Apart from economic criteria, which tend to prevail in the decision-making process within
companies, it is however possible to identify other factors influencing a rational choice.
Factors influencing a rational choice
1. Economic
2. Technical
3. Legal
4. Ethical
5. Procedural
6. Political
1. Economic
Economic is the most important and influential factors for making decisions. It is also
For example, an annual logistic plan can be used rather than other alternative plans to
2. Technical
For instance, a production plan which exceeds the maximum capacity of a plant cannot be
3. Legal
This means the decision maker should verify whether the decision is compatible with the legislation in
4. Ethical
The decision maker should follow certain principles and social rules related to the
system.
5. Procedural
A decision can be considered ideal from an economic, legal and social standpoint, but it
6. Political
The decision maker can assess the political consequences of a specific decision from
The process of evaluating the alternatives can be divided into two main phases as shown
In the first phase, exclusion, the alternatives are checked against rules and restrictions. In this
process, some alternatives are rejected from consideration; the others are feasible
options, which go on to evaluation. In the second phase, the feasible alternatives are compared based
on their performance.
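The two phases can be sketched in Python as a filter followed by a comparison. This is an illustrative sketch only: the plan data, the capacity constraint and the use of profit as the single performance criterion are assumptions, and the function names `exclude` and `evaluate` are hypothetical.

```python
def exclude(alternatives, constraints):
    """Phase 1 (exclusion): keep only alternatives that pass every constraint."""
    return [a for a in alternatives if all(check(a) for check in constraints)]


def evaluate(feasible, score):
    """Phase 2 (evaluation): return the feasible option with the best score."""
    return max(feasible, key=score) if feasible else None


# Hypothetical production plans and a technical constraint on plant capacity.
plans = [
    {"name": "A", "units": 900, "profit": 50},
    {"name": "B", "units": 1200, "profit": 80},
    {"name": "C", "units": 1000, "profit": 65},
]
MAX_CAPACITY = 1000
constraints = [lambda plan: plan["units"] <= MAX_CAPACITY]

feasible = exclude(plans, constraints)
best = evaluate(feasible, score=lambda plan: plan["profit"])
```

Plan B has the highest profit but exceeds the assumed capacity, so it is dropped in the exclusion phase and never reaches evaluation.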
[Figure: phases of evaluating alternatives. Labels recovered: alternative options; constraints (operational, technical, procedural, legal, social, political); exclusion; feasible options; profitability, overall cost; evaluation]
1960s and remains today a major methodological reference. The model consists of three
- Fig. 1.9.4 shows an enhanced version of the original scheme. It has two additional stages
The first phase of the decision-making process is the Intelligence Phase. In this phase, decision
This phase is very important in the decision-making process, as we are trying to identify
problems.
For example, we like to practice the Lean Startup methodology, which emphasizes the importance
of the right problem definition before building anything, be it a product or a business.
Additionally, one of the Digital Transformation pillars is that organizations should
become data-driven.
That means proper usage and implementation of Business Intelligence (BI) systems.
Business Intelligence implementations are considered successful only if you have clear
Business Intelligence is not just about data. It should be connected with organizational
The intelligence phase can last a long time. But, since the decision-making
process starts with this phase, it should take as long as it needs to be done properly.
2. Design Phase
The main aim of this phase is to define and construct a model which represents a system. It
Once we validate the model, we define the criteria of choice and search for several
possible solutions for defined problem (opportunity). In this phase we need to predict the
3. Choice Phase
In this phase we are actually making decisions by selecting the best alternative. The end
we are sure that the decision we made can actually be achieved, and then we can move
4. Implementation Phase
All the previous steps we’ve made (intelligence, design and choice) are now
implemented.
earlier phase.
We described Simon’s model which, even today, serves as the basis of most models of
the decision-making process. The process describes a series of events that precede final
decisions.
It is important to say that, at any point, the decision maker may choose to return to the
previous step for additional validation. This model is a concept, a framework of how
5. Control Phase
Once we are done with all the phases it is very important to check whether everything is
This is the final stage of rational decision-making process, wherein, the outcomes of the
decision are measured and compared with the predetermined, desired goals.
If there is a discrepancy between the two, the decision-maker may restart the process of
(6 Marks)
Decision support systems are a group of manual or computer-based tools
Good decision support systems will help us perform a wide variety of functions, including
Previously regarded as primarily a tool for big companies, DSS has in recent years come
Strategic
Tactical
Operational
Many analysts categorize decisions according to the degree of structure involved in the
all three components of a decision (the data, process, and evaluation) are determined.
Since structured decisions are made regularly in business environments, it makes sense to
place a comparatively rigid framework around the decision and the people making it.
Structured decision support systems may simply use a checklist or form to
ensure that all necessary data are collected and that no data is missing from the
decision-making process.
If the choice is also to support the procedural or process component of the decision, then
it is quite possible to develop a program either as part of the checklist or form. It is also
important to develop computer programs which will collect and combine all data.
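A checklist-style support system for a structured decision can be sketched as follows in Python. The sketch assumes a loan decision as the example; the required fields, the income-ratio rule and its threshold are hypothetical and only show how a program can first verify that no data is missing and then apply a fixed procedure.

```python
REQUIRED_FIELDS = ("applicant_name", "annual_income", "requested_amount")


def missing_fields(form, required=REQUIRED_FIELDS):
    """Checklist step: list required fields that are absent or empty."""
    return [name for name in required if form.get(name) in (None, "")]


def decide_loan(form, max_ratio=0.5):
    """Apply a fixed (hypothetical) rule once the checklist is complete."""
    missing = missing_fields(form)
    if missing:
        raise ValueError("missing data: " + ", ".join(missing))
    if form["annual_income"] <= 0:
        return "reject"
    ratio = form["requested_amount"] / form["annual_income"]
    return "approve" if ratio <= max_ratio else "reject"
```

Because the data, the process and the evaluation rule are all fixed in advance, two people using the form reach the same decision for the same applicant.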
When there is a need to make a decision more structured, the support system for that
Many firms that hire individuals without a great deal of experience provide them with
detailed guidelines on their decision making activities and support them by giving them
little flexibility.
One interesting consequence of making a decision more structured is that the liability for
inappropriate decisions is shifted from individual decision makers to the larger company
or organization.
2. Unstructured Decisions
Unstructured decisions have the same components as structured decisions: data, process, and evaluation.
Unstructured decisions are made when all elements of the business environment
i.e. customer expectations, competitor response, cost of securing raw materials, etc. are
Unstructured decision systems typically focus on the individual or the team that will
make the decision. These decision makers are usually entrusted with decisions that are
value.
One approach to support systems in this area is to construct a program that simulates the
understand the role that an individual’s experience or expertise plays in the decision and to
3. Semi-Structured Decisions
Decisions of this type are characterized as having some agreement on the data, process,
which often have limited technological or work force resources. The unstructured or
semi-structured nature of these decision situations can create the problem of limited
resources and staff expertise available to a small business executive to analyze important
decisions appropriately.
4. Strategic decisions
Strategic decisions are choices of action that affect the whole or a major part of the business enterprise. They
help to achieve common goals of the enterprise. They have long-term implications on the
business enterprise.
They may involve major departures from practices and procedures being followed earlier.
Usually, strategic decisions are unstructured; therefore a manager has to apply his or her business
These decisions are based on partial knowledge of the environmental factors which can be
uncertain or dynamic. These types of decisions are taken at the higher level of
management.
5. Tactical decisions
- These decisions relate to day-to-day operations of the enterprise. They have a short-term
horizon as they are taken repetitively. They do not require business judgement and it is
needed for helping the manager to take rational, well informed decisions, information
[Fig. 1.9.6: labels recovered are source and nature of information]
- The characteristics of the information useful in a decision-making process
change depending upon the scope of the decisions to be supported, and consequently also
- Fig. 1.9.6 shows variations in the characteristics of the information as the scope of the
decisions changes. The scheme may be used as an assessment tool while designing a DSS.
Q. 1.9.4 What are the approaches of decision making process? (Ref. Sec. 1.9.4) (5 Marks)
. Approaches to
This approach assumes that decision-makers operate with bounded rationality instead of
Bounded rationality is the idea that decision makers cannot deal with information about
all the aspects and alternatives pertaining to a problem and therefore choose to tackle
Thus, this process is not exhaustive and completely rational solutions are not entirely
ideal.
Decision-makers operating with bounded rationality restrict the inputs to the decision-
making process, focus their attention on two or three most favorable alternatives, process
these in great detail and base their decisions on judgment and personal biases as well as
logic.
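The effect of restricting attention to two or three alternatives can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: a cheap `quick_score` stands in for the first impression used to shortlist options, a costly `detailed_score` stands in for the in-depth analysis, and both names, like the function `bounded_choice`, are hypothetical.

```python
def bounded_choice(alternatives, quick_score, detailed_score, shortlist_size=3):
    """Shortlist the few most promising options, then study only those in detail."""
    shortlist = sorted(alternatives, key=quick_score, reverse=True)[:shortlist_size]
    if not shortlist:
        return None
    return max(shortlist, key=detailed_score)


if __name__ == "__main__":
    # Hypothetical scores: option "a" is best on detailed analysis
    # but looks unpromising at first glance.
    first_impression = {"a": 1, "b": 5, "c": 4, "d": 3}
    detailed = {"a": 100, "b": 2, "c": 9, "d": 7}
    choice = bounded_choice(
        ["a", "b", "c", "d"], first_impression.get, detailed.get
    )
    print(choice)
```

In this made-up data the option that would win a fully exhaustive comparison is never examined, which is exactly why the approach is described as not exhaustive.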
- This approach combines the steps of the rational approach with the worthwhile features
and conditions in the behavioural approach to make more realistic provisions for making
decisions in institutions.
- This approach states that the decision-maker should try to go beyond rules of thumb and
satisficing limitations and generate as many alternatives as possible within the given time,
- Here, the rational approach provides an analytical framework for making decisions while
the behavioural approach provides a moderating influence.
- The preceding three approaches explicitly explain the processes involved in decision-
making.
— However, they do not throw light on how people take decisions when they are nervous,
Q. 1.10.1 Write short note on evolution of Information Systems. (Ref. Sec. 1.10) (5 Marks)
software, infrastructure and standards that are designed to create, modify, store, manage
and distribute information to suggest new business strategies and new products.
It leads to efficient work practices and effective communication to make better decisions
[Figure: labels recovered are Transaction Processing System, organization, managers and organization wide]
Q. 1.11.1 Draw structure of DSS and explain. (Ref. Sec. 1.11) (5 Marks)
organizes and analyzes business data to facilitate quality business decision-making for
- A well-designed DSS aids decision makers in compiling a variety of data from many
executives and business models. DSS analysis helps companies to identify and solve
3. The hardware.
Components of Decision Support System
1. Dialogue management
2. Model management
3. Database management
It consists of three subsystems, known as the user interface, the dialogue control and the
request translator.
- The user interface sub system controls the physical user interface.
— It also manages the appearance of the screen and also accepts the input from the user and
- The user interface sub system is also responsible for checking the user commands for the
correct syntax.
— The dialogue control sub system is responsible for the maintenance of the processing
- The request translator helps in the translation of the user command into the actions for the
model management or the data management components into such a pattern that can be
- The command processor delivers those commands from the dialogue management
components to either the model base management system or the model execution system
- Works under the guidance of either the model management component or the dialogue
management component.
- Helps in the maintenance of the interface with the data sources that are generally external
Development of a Model
Q. 1.12.1 What are the phases of DSS ? (Ref. Sec. 1.12) (5 Marks)
— DSSs are usually not available as standard programs like software applications, such as
— Fig. 1.12.1 shows the major steps involved in the development of a DSS.
Requirements
1. Requirement
In this phase we gather information and make a report of all the requirements.
2. Planning
- The main purpose of the planning phase is to know the requirements and understand the
opportunities. In this phase we need to find out the cost, time, and benefits of the system. What is
- What will be the problem and solution for it? Without the perfect plan, calculating the
Planning kicks off a project flawlessly and affects its progress positively.
3. Analysis
4. Design
- Once the analysis is complete, the step of designing takes over, which is basically
- This step helps remove possible flaws by setting a standard and attempting to stick to it.
5. Implementation
- The actual task of developing the software starts here with data recording going on in the
background.
- Once the software is developed, the stage of implementation comes in where the product
6. Testing
The testing stage assesses the software for errors and documents bugs if there are any.
7. Maintenance
Once the software passes through all the stages without any issues, it is to undergo a
maintenance process wherein it will be maintained and upgraded from time to time to
adapt to changes.
8. Delivery
— Successful project delivery requires the implementation of management systems that will
control changes in the key factors of scope, schedule, budget, resources, and risk to
- This section offers guidance for the entire team to successfully and effectively optimize
Q.2 Write short note on Effective and Timely decisions. (Refer Section 1.2) (5 Marks)
Q.3 What do you mean by data, knowledge and information ? (Refer Section 1.3) (5 Marks)
Q.4 Write short note on the role of mathematical models. (Refer Section 1.4) (5 Marks)
Q.5 Draw and explain architecture of Business Intelligence. (Refer Section 1.5) (5 Marks)
Q.7 Draw and explain phases of Business Intelligence. (Refer Section 1.5.2) (5 Marks)
Q.8 What are the ethics of Business Intelligence ? (Refer Section 1.6) (5 Marks)
Q.9 Write short note on Decision Support System. (Refer Section 1.7) (5 Marks)
Q.15 Write short note on evolution of Information Systems. (Refer Section 1.10) (5 Marks)
Chapter Ends...