
Agenda

• Understanding Business

• Understanding the problem

• Frame the problem / Frame the Decision

• Understanding CRISP-DM

• Decisions First software


• Analytics and Decision Making (Exercise)
Understanding the Business
Data Scientist Day to Day Activities (CRISP-DM)

[Figure: activity map with one column of tasks per phase: Business Understanding, Data Understanding, Data Preparation, Modeling, Optimization, Deployment]
Analytical Team

Research Lead (Domain Expert)
• Find Assumptions
• Drive Questions
• Know the business

Research Lead (Data Expert)
• Explain Data
• Build Data Dictionary

Project Manager
• Democratize Data
• Distribute Results
• Enforce Learning

Data Scientist
• Prepare Data
• Select tools
• Present Result
The Three-Legged Stool

One way to understand the collaborations that lead to predictive modeling success is to think of a three-legged stool. Each leg is critical
to the stool remaining stable and fulfilling its intended purpose. In predictive modeling, the three legs of the stool are:

Domain experts: Needed to frame a problem properly in a way that will provide value to the organization

Data or database experts: Needed to identify what data is available for predictive modeling and how that data can
be accessed and normalized

Predictive modeling experts: Needed to build the models that achieve the business objectives
Specialist Traditional Data Science Team

Data Scientist (DS)


– Prepares data, engineers features, most valuable skill: training models.

Data Engineer (DE)


– Data acquisition focus. Build data pipelines. Not uncommon to have 5:1 ratio DE:DS

Data Analyst (DA)


– Assist DS with data prep

Application architect (AA)


– Design complete solution; deploy and maintain models in production
Typical Data Science Project

[Figure: six project stages, each marked with the roles involved (DS, DE, DA, AA)]

Understand business objectives → ID and procure training data → Prepare data and build new features → Train model → Deploy models → Update models
Key Roles for a successful Analytics Project
Framing a Decision
Outline what decision is being considered, why it is important, what data is needed, who will provide input, etc.

• Business Understanding
• Data Understanding
• Data Preparation
Decision Making & Analytics Combined

1- Framing a decision (outline what decision is being considered, why it is important, what data is needed, who will provide input,
etc.)

2- Analyzing a decision (what kind of analytical approach is needed, what does it show, what does it mean, etc.)

3- Implementing a decision (how do I make use of the decision, what can I expect, what else should be considered, how do I "sell"
the result, etc.)

Framing a Decision Analyzing a Decision Implementing a Decision


Setting up the Problem
Predictive Analytics Processing Steps: CRISP-DM

• The Cross-Industry Standard Process for Data Mining (CRISP-DM) describes the data-mining process in six steps. It
has been cited as the most often used process model since its inception in the 1990s.

• One advantage of using CRISP-DM is that it describes the most commonly applied steps in the process and is documented in an
80-page PDF file.

• The CRISP-DM name itself calls out data mining as the technology, but the same process model applies to predictive analytics and other
related analytics approaches, including business analytics, statistics, and text mining.
CRISP-DM (Feedback Loops)
▪ Note the feedback loops in the figure. These indicate the most common ways the typical process is modified based on
findings during the project.

▪ For example, if business objectives have been defined during Business Understanding, and then data is examined during
Data Understanding, you may find that there is insufficient data quantity or data quality to build predictive models.

▪ In this case, Business Objectives must be re-defined with the available data in mind before proceeding to Data
Preparation and Modeling.

▪ Or consider a model that has been built but has poor accuracy. Revisiting data preparation to create new derived
variables is a common step to improve the models.
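
The two feedback loops described above can be sketched as a small lookup. This is only an illustration: the function name `next_phase` and the `problem_found` flag are assumptions made for the example, and the phase names are the six CRISP-DM phases.

```python
# The six CRISP-DM phases in their nominal order.
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

# Only the two feedback loops mentioned in the text are encoded here.
FEEDBACK = {
    "Data Understanding": "Business Understanding",  # insufficient data quantity or quality
    "Modeling": "Data Preparation",                  # poor accuracy, so derive new variables
}

def next_phase(current, problem_found=False):
    """Return the phase to work on next, stepping back if a problem was found."""
    if problem_found and current in FEEDBACK:
        return FEEDBACK[current]
    i = PHASES.index(current)
    return PHASES[min(i + 1, len(PHASES) - 1)]

print(next_phase("Data Understanding"))                  # moves forward
print(next_phase("Data Understanding", problem_found=True))  # steps back
print(next_phase("Modeling", problem_found=True))            # steps back
```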
Data Modeling

Block #1: Data → Block #2: Build Model → Block #3: Infer hidden variables → Block #4: Predict & Explore

➢ Supervised vs. unsupervised: Blocks #1 and #4

➢ Probabilistic vs. non-probabilistic: Primarily Block #2 (some Block #3)

➢ Model development (Block #2) vs. optimization techniques (Block #3)


Framework to Manage ML Projects
CRISP-DM
A Standard Methodology to Ensure a Good Outcome

• It is often said that 80% of the work done on predictive data analytics projects is done in the Business
Understanding, Data Understanding, and Data Preparation phases of CRISP-DM, and just 20% is spent on the
Modeling, Evaluation, and Deployment phases. Why do you think this would be the case?
Case Studies
Business Problem

1. An insurance company that sells life, health, car, property and casualty insurance reports: “In spite of
having a fraud investigation team that investigates up to 30% of all claims made, we are still
losing too much money due to fraudulent claims”. The company wishes to drive
improvements in customer service, fraud detection, and operational efficiency.

2. Corporation Bank provides loans to businesses across industry verticals like education, real
estate, BFSI, retail, logistics, mining etc. The bank wishes to reduce the percentage of
non-performing assets in its portfolio.

3. An online charity (DonorsChoose.org) allows teachers to propose projects requesting
funding for their students. Information is available about project essays from teachers,
resources requested for the projects and information about the projects. The online charity
wishes to know which projects will receive funding.
Case Studies

Auto Insurance Case Study
“In spite of having a fraud investigation team that investigates up to 30% of all claims made, we are still losing too much money
due to fraudulent claims”

Banking Case Study
Corporation Bank provides loans to businesses across industry verticals like education, real estate, BFSI, retail, logistics,
mining etc. The bank wishes to reduce the percentage of non-performing assets in its portfolio

Online Charity Seeking Donations
An online charity (DonorsChoose.org) allows teachers to propose projects requesting funding for their students. Information is
available about project essays from teachers, resources requested for the projects and information about the projects. The
online charity wishes to know which projects will receive funding
Framework For Formulation
Setting up the Problem

▪ The most important part of any Data Science project is the very beginning when the Data Science project is defined.

▪ Setting up a Data Science /predictive modeling project is a very difficult task because the skills needed to do it well are very
broad, requiring knowledge of the business domain, databases, or data infrastructure, and predictive modeling algorithms
and techniques.

▪ Very few individuals have all of these skill sets, and therefore setting up a predictive modeling project is inevitably a team
effort.
Business Problem Framing

• Business Problem Framing is the most important step you will take in your analytics project.

• If you mess this up, no matter how good your number crunching is, you risk a Type III error: providing the right answer to the
wrong question. Since it is not your problem, but someone else's problem that you must frame, it can get awkward. You need to
talk with the right people, without knowing who the right people are, ask the right questions, without knowing what questions to
ask, and even ask your questions in the right way!

• You must consider carefully the impacts of framing when choosing your questions, selecting measures and presenting results.
Business Problem Framing
Business problem framing can be more challenging than you think. This is because there are four factors that are competing for your
attention:

• Stakeholders - "Where you stand depends on where you sit!" Stakeholders will usually define the problem from their own
interests and perspective. You can plan to make stakeholders part of the solution by doing good stakeholder analysis and
interviews.

• Problem Complexity - Business problems usually have many causes and symptoms.

• Analysts' Expertise - "If all you have is a hammer, every problem is a nail." Guilty as charged.

• Solutions in Search of a Problem - It is very tempting to simply try to fit an available solution to the problem. This reuse
of solutions can be efficient, but it can also ignore important aspects of the new problem. Don't "pound a square peg into a
round hole." Instead, continuously expand your set of good solutions whenever you have time.

You won't know all the answers when you have to make the call on "what is the business problem", so give yourself a few degrees of
flexibility and an escape door, just in case!
Business Problem Framing
Business Understanding (Problem Framing)

Understand a business problem and determine whether the problem is amenable to an analytics solution

• Obtain or receive problem statement and usability requirements
• Identify stakeholders
• Reformulate problem statement as an analytics problem
• Refine the problem statement and delineate constraints
• Define key metrics of success and initial set of business benefits
• Obtain stakeholder agreement on the problem statement
• State the set of assumptions related to the problem

Business Understanding activities:
• Determine Business Objectives
• Frame the Problem
• Assess Feasibility
• Define Success Measurements
• Identify Target Variables (Y)
• Identify Analytical Approach
• Identify Deployment Plan
• Produce Project Plan
• Identify the team & Stakeholders
• Analytics Base Table (ABT)
Typical Data Science Process

• Ask: What is the problem(s) we need to solve?
• Research: What data do we need and how do we get it? This usually takes 60–70% of any analysis
• Model: Which method(s) to use?
• Validate: Do the model and assumptions work as expected?
• Test: How does the model generalize to real world data?
• Interpret: How can we use the conclusions in the real world?

Sanity check yourself before you…
Framework for Formulation

As-Is State
• One sentence to summarize the current state
• Describe the “focus metric”

To-Be State
• One sentence to summarize the future state
• What will the “focus metric” be like (wishlist)?

Success?
• What do we seek to do to move from the current state to the future state?

Entity?
• What is the lowest granularity of the object that we seek to improve?

Target?
• What do we want to be better at?

Evaluation Metric?
• Classification
  • Imbalanced: AUC, F-score, …
  • Balanced: Accuracy, Log Loss
• Regression: MSE, RMSE, …
• Unsupervised
  • Anomaly Detection: AUC, F-score
  • Segmentation: Accuracy


Convert a Business Problem into a ML Problem
An insurance company that sells life, health, car, property and casualty insurance reports: “In spite of having a fraud investigation
team that investigates up to 30% of all claims made, we are still losing too much money due to fraudulent claims”. The company
wishes to drive improvements in customer service, fraud detection, and operational efficiency.

• As-Is State: The business has defined the root of the money loss (investigating 30% of all claims, but still losing money)

• To-Be State: ML-based model that determines whether a claim is fraudulent based on the claim information provided

• Success: Reducing the amount of lost money by 50% in the first 6 months and 100% in the second 6 months

• Entity: State of a claim

• Target: Whether the claim is ‘Fraud’, requiring an alert to be set

• Evaluation Metric: Accuracy (Confusion Matrix)
Convert a Business Problem into a ML Problem
An online charity (DonorsChoose.org) allows teachers to propose projects requesting funding for their students. Information is
available about project essays from teachers, resources requested for the projects and information about the projects. The online
charity wishes to know which projects will receive funding

• As-Is State: Business-rules-based display of projects seeking donations on the website

• To-Be State: Prominent display of the projects most likely to get donations, with the projects identified using Machine Learning

• Success: Increased percentage of projects getting funding

• Entity: Project

• Target: Whether the project receives funding

• Evaluation Metric: Area Under Curve
Convert a Business Problem into a ML Problem
Corporation Bank provides loans to businesses across industry verticals like education, real estate, BFSI, retail, logistics, mining etc.
The bank wishes to reduce the percentage of non-performing assets in its portfolio

• As-Is State: Propensity to Default Model that ranks applications by their likely default amounts

• To-Be State: Model with an increased level of performance that reduces the Non-Performing Assets in the future

• Success: Lower Non-Performing Assets

• Entity: Loan Application

• Target: The default amount in INR

• Evaluation Metric: Mean Squared Error
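
The three formulations above all fill in the same six slots, so they can be kept as structured records. The sketch below is one possible way to do that; the class name `ProblemFrame` and the `problem_type` rule (treating squared-error metrics as a sign of regression) are assumptions for illustration, not part of the framework itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProblemFrame:
    as_is: str
    to_be: str
    success: str
    entity: str
    target: str
    metric: str

    def problem_type(self):
        """Illustrative rule: squared-error metrics imply regression, else classification."""
        regression_metrics = {"Mean Squared Error", "MSE", "RMSE"}
        return "regression" if self.metric in regression_metrics else "classification"

frames = {
    "insurance": ProblemFrame(
        as_is="Investigating 30% of all claims, but still losing money",
        to_be="ML-based model that determines fraud from claim information",
        success="Reduce lost money by 50% in the first 6 months",
        entity="Claim",
        target="Whether the claim is 'Fraud'",
        metric="Accuracy (Confusion Matrix)",
    ),
    "charity": ProblemFrame(
        as_is="Business-rules-based display of projects",
        to_be="Prominent display of projects most likely to get donations",
        success="Increased percentage of projects getting funding",
        entity="Project",
        target="Whether the project receives funding",
        metric="Area Under Curve",
    ),
    "bank": ProblemFrame(
        as_is="Propensity to Default model that ranks applications",
        to_be="Better-performing model that reduces Non-Performing Assets",
        success="Lower Non-Performing Assets",
        entity="Loan Application",
        target="The default amount in INR",
        metric="Mean Squared Error",
    ),
}

for name, frame in frames.items():
    print(name, "->", frame.entity, "/", frame.problem_type())
```

A numeric target (the default amount) pairs with a regression metric, while the two yes/no targets pair with classification metrics.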
Objective & Unit of Analysis
Converting Business Problems into Analytics Solutions

▪ Organizations don’t exist to do predictive data analytics. Organizations exist to do things like make more money, gain new
customers, sell more products, or reduce losses from fraud.

▪ Unfortunately, the predictive analytics models that we can build do not do any of these things.

▪ The models that analytics practitioners build simply make predictions based on patterns extracted from historical datasets.

▪ These predictions do not solve business problems; rather, they provide insights that help the organization make better decisions to
solve their business problems.
Business Objectives
• A key step in any data analytics project is to understand the business problem that the organization wants to solve and based on
this, to determine the kind of insight that a predictive analytics model can provide to help the organization address this problem.
This defines the analytics solution that the analytics practitioner will set out to build using machine learning. Defining the analytics
solution is the most important task in the Business Understanding phase of the CRISP-DM process.

• In general, converting a business problem into an analytics solution involves answering the following key questions:

1. What is the business problem? What are the goals that the business wants to achieve? : Unless a project is focused on
clearly stated goals, it is unlikely to be successful.

2. How does the business currently work? : Analytics practitioners must possess what is referred to as situational
fluency, that is, enough understanding of the business to converse with business partners in terms they understand.

3. In what ways could a predictive analytics model help to address the business problem? : (1) the predictive model that
will be built; (2) how the predictive model will be used by the business; and (3) how using the predictive model will help
address the original business problem
Business Objectives
Six key issues that should be resolved during the Business Understanding stage include definitions of the following:

■ Core business objectives to be addressed by the predictive models

■ How the business objectives can be quantified

■ What data is available to quantify the business objectives

■ What modeling methods can be invoked to describe or predict the business objectives

■ How the goodness of model fit of the business objectives is quantified so that the model scores make business sense

■ How the predictive models can be deployed operationally


Business Understanding

Every Data Science project needs objectives.

▪ Domain experts who understand decisions, alarms, estimates, or reports that provide value to an organization must define these objectives.

▪ Analysts themselves sometimes have this expertise, although most often, managers and directors have a far better
perspective on how models affect the organization.

▪ Without domain expertise, the definitions of what models should be built and how they should be assessed can lead to failed
projects that don’t address the key business concerns.
Business Understanding
Frequently, the compromises reached during discussions are the result of the imperfect environment that is typical in most
organizations.

• Data that you would want to use in the predictive models may not be available in a timely
manner or at all.

• Target variables that address the business objectives more directly may not exist or be able to be quantified.

• Computing resources may not exist to build predictive models in the way the analysts
would prefer.

• Or there may not be available staff to apply to the project in the timeframe needed.

Project managers need to be realistic about which business objectives can be achieved in the timeframe and within the budget
available.
Defining the Unit of Analysis

Data for predictive modeling must be two-dimensional, comprised of rows and columns

Each row represents what can be called a unit of analysis.


▪ For customer analytics, this is typically a customer.
▪ For fraud detection, this may be a transaction.
▪ For call center analytics, this may refer to an individual call.
▪ For survey analysis, this may be a single survey.

The unit of analysis is problem specific and therefore is defined as part of the Business Understanding stage of predictive modeling
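
As a rough illustration of why the unit of analysis matters, the same raw data can be laid out with one row per transaction or rolled up to one row per customer. The data and the field names (`customer_id`, `amount`) below are made up for the example.

```python
from collections import defaultdict

# Hypothetical raw data: the unit of analysis here is a transaction.
transactions = [
    {"customer_id": "C1", "amount": 40.0},
    {"customer_id": "C2", "amount": 15.0},
    {"customer_id": "C1", "amount": 60.0},
    {"customer_id": "C3", "amount": 5.0},
    {"customer_id": "C2", "amount": 25.0},
]

def to_customer_rows(rows):
    """Aggregate transaction rows so that each output row is one customer."""
    totals = defaultdict(lambda: {"n_transactions": 0, "total_amount": 0.0})
    for row in rows:
        agg = totals[row["customer_id"]]
        agg["n_transactions"] += 1
        agg["total_amount"] += row["amount"]
    return [{"customer_id": cid, **agg} for cid, agg in sorted(totals.items())]

# After aggregation the unit of analysis is a customer.
customer_rows = to_customer_rows(transactions)
for row in customer_rows:
    print(row)
```

A fraud model would probably keep the transaction-level layout, while a customer analytics model would use the aggregated one.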
Problem Statement Framing
Business Problem (Motor Insurance Company)

[Claim prediction] A model could be built to predict the likelihood that an insurance claim is fraudulent. This model could be used to
assign every newly arising claim a fraud likelihood, and those that are most likely to be fraudulent could be flagged for investigation
by the insurance company’s claims investigators. In this way the limited claims investigation time could be targeted at the claims that
are most likely to be fraudulent, thereby increasing the number of fraudulent claims detected and reducing the amount of money lost to
fraud.
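
A minimal sketch of how such scores might be used, assuming a model has already assigned each new claim a fraud likelihood. The claim IDs, the scores, and the `capacity` parameter (how many claims the investigators can handle) are hypothetical.

```python
def flag_for_investigation(scored_claims, capacity):
    """Return the IDs of the `capacity` claims with the highest fraud likelihood."""
    ranked = sorted(scored_claims, key=lambda c: c["fraud_likelihood"], reverse=True)
    return [c["claim_id"] for c in ranked[:capacity]]

# Hypothetical model output for four newly arising claims.
scored_claims = [
    {"claim_id": "A17", "fraud_likelihood": 0.08},
    {"claim_id": "B02", "fraud_likelihood": 0.91},
    {"claim_id": "C44", "fraud_likelihood": 0.35},
    {"claim_id": "D09", "fraud_likelihood": 0.67},
]

# Investigators only have time for two claims, so the two riskiest are flagged.
flagged = flag_for_investigation(scored_claims, capacity=2)
print(flagged)
```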
Business Problem (Motor Insurance Company)

[Member prediction] A model could be built to predict the propensity of a member to commit fraud in the near future. This model
could be run every quarter to identify those members most likely to commit fraud, and the insurance company could take a risk
mitigation action ranging from contacting the member with some kind of warning to canceling the member’s policies. By
identifying members likely to make fraudulent claims before they make them, the company could save significant amounts of
money.
Business Problem (Motor Insurance Company)

[Application prediction] A model could be built to predict, at the point of application, the likelihood that a policy someone has
applied for will ultimately result in a fraudulent claim. The company could run this model every time a new application is made
and reject those applications that are predicted likely to result in a fraudulent claim. The company would therefore reduce the
number of fraudulent claims and reduce the amount of money they would lose to these claims.
Business Problem (Motor Insurance Company)

[Payment prediction] Many fraudulent insurance claims simply exaggerate the amount that should actually be paid out. In
these cases, the insurance company goes through an expensive investigation process but still must make a reduced payment in
relation to a claim. A model could be built to predict the amount most likely to be paid out by an insurance company after having
investigated a claim. This model could be run whenever new claims arise, and the policy holder could be offered the amount
predicted by the model as settlement as an alternative to going through a claims investigation process. Using this model, the company
could save on claims investigations and reduce the amount of money paid out on fraudulent claims.
Assessing Feasibility
Assessing the Feasibility
Once a set of candidate analytics solutions that address a business problem have been
defined, the next task is to evaluate the feasibility of each solution. This involves
considering the following questions:

▪ Is the data required by the solution available, or could it be made available?

▪ What is the capacity of the business to utilize the insights that the analytics solution will provide?

The first question addresses data availability. Every analytics solution will have its own set of data requirements, and it is useful, as early
as possible, to determine if the business has sufficient data available to meet these requirements. In some cases, a lack of appropriate data
will simply rule out proposed analytics solutions to a business problem.
More likely, the easy availability of data for some solutions might favor them over others.

The second issue affecting the feasibility of an analytics solution is the ability of the business to utilize the insight that the solution
provides.
Assessing the Feasibility
In general, evaluating the feasibility of an analytics solution in terms of its data
requirements involves aligning the following issues with the requirements of the analytics
solution:

▪ The key objects in the company’s data model and the data available regarding them (policy holders, policies, claims,
policy applications, investigations, brokers, members, investigators, and payments).

▪ The connections that exist between key objects in the data model (is it possible to connect the information from a policy
application with the details (e.g., claims, payments, etc.) of the resulting policy itself?).

▪ The granularity of the data that the business has available.

▪ The volume of data involved.

▪ The time horizon for which data is available: (It is important that the data available covers the period required for the
analytics solution)
Assessing the Feasibility

[Claim prediction]

Data Requirements: This solution would require that a large collection of historical claims marked as fraudulent and non-fraudulent
exist. Similarly, the details of each claim, the related policy, and the related claimant would need to be available.

Capacity Requirements: Given that the insurance company already has a claims investigation team, the main requirements would be
that a mechanism could be put in place to inform claims investigators that some claims were prioritized above others. This would also
require that information about claims become available in a suitably timely manner so that the claims investigation process would not be
delayed by the model.
Assessing the Feasibility

[Member prediction]

Data Requirements: This solution would not only require that a large collection of claims labeled as either fraudulent or non-
fraudulent exist with all relevant details, but also that all claims and policies can be connected to an identifiable member. It would
also require that any changes to a policy are recorded and available historically.

Capacity Requirements: This solution first assumes that it is possible to run a process every quarter that performs an analysis of the
behavior of each customer. More challenging, there is the assumption that the company has the capacity to contact members based
on this analysis and can design a way to discuss this issue with customers highlighted as likely to commit fraud without damaging the
customer relationship so badly as to lose the customer.

Finally, there are possibly legal restrictions associated with making this kind of contact.
Assessing the Feasibility

[Application prediction]

Data Requirements: Again, a historical collection of claims marked as fraudulent or non-fraudulent along with all relevant details
would be required. It would also be necessary to be able to connect these claims back to the policies to which they belong and to the
application details provided when the member first applied. It is likely that the data required for this solution would stretch back over
many years as the time between making a policy application and making a claim could cover decades.

Capacity Requirements: The challenge in this case would be to integrate the automated application assessment process into whatever
application approval process currently exists within the company.
Assessing the Feasibility
[Payment prediction]

Data Requirements: This solution would require the full details of policies and claims as well as data on the original amount
specified in a claim and the amount ultimately paid out.

Capacity Requirements: Again, this solution assumes that the company has the potential to run this model in a timely fashion
whenever new claims arise and also has the capacity to make offers to claimants. This assumes the existence of a customer contact
center or something similar.
Analytical Approaches
Analytical Approach
While many models are built to predict the behavior of people or things, not all are. Some models are built expressly for the
purpose of understanding the behavior of people, things, or processes better.
Target Variable
Defining the Target Variable
• For models that estimate or predict a specific value, a necessary step in the Business Understanding stage is to identify one
or more target variables to predict.

• A target variable is a column in the modeling data that contains the values to be estimated or predicted as defined in the business
objectives. The target variable can be numeric or categorical depending on the type of model that will be built

PROJECT                      TARGET VARIABLE TYPE   TARGET VARIABLE VALUES
Customer Acquisition         Binary                 1 for acquired, 0 for non-acquired
Customer Value               Continuous             Dollar value of customer
Invoice Fraud Detection      Categorical            Type of fraud (5 levels)
Days to Next Purchase        Continuous             Number of days
Days to Next Purchase <= 7   Binary                 1 for purchased within 7 days, 0 for did not purchase within 7 days
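
The last two rows of the table describe the same underlying quantity framed two ways. As a small sketch (the function name and sample values are made up), the binary target can be derived from the continuous one:

```python
def binary_target(days_to_next_purchase, threshold=7):
    """1 if the customer purchased within `threshold` days, otherwise 0."""
    return 1 if days_to_next_purchase <= threshold else 0

# Hypothetical values of the continuous target "Days to Next Purchase".
days = [2, 7, 8, 30]

# The same records expressed as the binary target "Days to Next Purchase <= 7".
targets = [binary_target(d) for d in days]
print(targets)
```

Which framing to use depends on the business objective: the continuous target leads to an estimation model, the binary one to a classification model.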
Measures of Success
Defining Measures of Success
The determination of what is considered a good model depends on the particular interests of the organization and is specified as the
business success criterion. The business success criterion needs to be converted to a predictive modeling criterion so the modeler
can use it for selecting models.

• If the purpose of the model is to provide highly accurate predictions or decisions to be used by the business, measures of
accuracy will be used.

• If interpretation of the business is what is of most interest, accuracy measures will not be used; instead, subjective measures of what
provides maximum insight may be most desirable.
Success Criteria
• Some projects may use a combination of both so that the most accurate model is not selected if a less accurate but more transparent
model with nearly the same accuracy is available
• For classification problems, the most frequently used metric for assessing model accuracy is Percent Correct Classification (PCC). PCC
measures overall accuracy without regard to what kind of errors are made; every error has the same weight.

• Another measure of classification accuracy is the confusion matrix, which enumerates different ways errors are made, like false
alarms and false dismissals. PCC and the confusion matrix metrics are good when an entire population must be scored and acted on.
• For example, if customers who visit a website are to be served customized content on the site based on their browsing behavior, every
visitor will need a model score and a treatment based on that score.
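
To make the difference between PCC and the confusion matrix concrete, here is a small sketch with made-up labels. Two hypothetical models reach the same PCC while making opposite kinds of errors.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes for binary labels (1 = positive, 0 = negative)."""
    counts = {"true_pos": 0, "false_alarm": 0, "false_dismissal": 0, "true_neg": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["true_pos"] += 1
        elif a == 0 and p == 1:
            counts["false_alarm"] += 1      # predicted positive, actually negative
        elif a == 1 and p == 0:
            counts["false_dismissal"] += 1  # predicted negative, actually positive
        else:
            counts["true_neg"] += 1
    return counts

def pcc(counts):
    """Percent Correct Classification: every error carries the same weight."""
    correct = counts["true_pos"] + counts["true_neg"]
    return 100.0 * correct / sum(counts.values())

actual  = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
model_a = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]  # errs only with false alarms
model_b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]  # errs only with false dismissals

counts_a = confusion_counts(actual, model_a)
counts_b = confusion_counts(actual, model_b)
print(pcc(counts_a), counts_a)
print(pcc(counts_b), counts_b)
```

Both models score the same PCC, so only the confusion matrix shows which one would suit a business that cares more about one kind of error than the other.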

• For continuous-valued estimation problems, metrics often used for assessing models are R², average error, Mean Squared
Error (MSE), median error, average absolute error, and median absolute error.

• In each of these metrics, you first compute the error of an estimate—the actual value minus the predicted estimate—and then
compute the appropriate statistic based on those errors. The values are then summed over all the records in the data.
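
The listed metrics can be sketched directly from that definition of error (actual minus predicted). The sample values below are made up, and R² is computed in its usual form as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
from statistics import mean, median

def regression_metrics(actual, predicted):
    """Compute common error statistics from per-record errors (actual - predicted)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    squared_errors = [e ** 2 for e in errors]
    mean_actual = mean(actual)
    ss_res = sum(squared_errors)                          # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "average_error": mean(errors),
        "median_error": median(errors),
        "mse": mean(squared_errors),
        "average_absolute_error": mean(abs_errors),
        "median_absolute_error": median(abs_errors),
        "r2": 1 - ss_res / ss_tot,
    }

actual = [10.0, 12.0, 8.0, 14.0]
predicted = [9.0, 13.0, 8.0, 12.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the signed average error can be close to zero even when individual errors are large, which is why the absolute and squared variants are usually reported alongside it.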
Analytical Base Table
Data to Insights to Decisions
Predictive data analytics projects are not handed to data analytics practitioners fully
formed. Rather, analytics projects are initiated in response to a business problem, and it is
our job—as analytics practitioners—to decide how to address this business problem using
analytics techniques.

• Present an approach to developing analytics solutions that address specific business problems.

• Involves an analysis of the needs of the business, the data we have available for use, and the capacity of the business to use
analytics.

• Designing Analytics Base Tables (ABTs) that properly represent the characteristics of a prediction subject is a key skill for
analytics practitioners.

• Develop a set of Domain Concepts that describe the prediction subject, and then expand these into concrete Descriptive Features.
Designing the Analytical Base Table
In designing an ABT, the first decision an analytics practitioner needs to make is on the prediction subject for the
model they are trying to build. The prediction subject defines the basic level at which predictions are made, and each
row in the ABT will represent one instance of the prediction subject—the phrase one-row-per-subject is often used
to describe this structure.

The hierarchical relationship between an analytics solution, domain concepts, and descriptive features.
Designing the Analytical Base Table
The set of domain concepts that are important change from one analytics solution to another. However, there are a number of general
domain concepts that are often useful:
• Prediction Subject Details: Descriptive details of any aspect of the prediction subject.

• Demographics: Demographic features of users or customers such as age, gender, occupation, and address.

• Usage: The frequency and recency with which customers or users have interacted with an organization. The monetary value of a
customer’s interactions with a service. The mix of products or services offered by the organization that a customer or user has used.

• Changes in Usage: Any changes in the frequency, recency, or monetary value of a customer’s or user’s interactions with an
organization (for example, has a cable TV subscriber changed packages in recent months?).

• Special Usage: How often a user or customer used services that an organization considers special in some way in the recent past
(for example, has a customer called a customer complaints department in the last month?).

• Lifecycle Phase: The position of a customer or user in their lifecycle (for example, is a customer a new customer, a loyal customer
or a lapsing customer?).

• Network Links: Links between an item and other related items (for example, links between different customers or different
products, or social network links between customers).
Designing the Analytical Base Table
• Once we have decided which analytics solution we are going to develop in response to a business problem, we need to begin to
design the data structures that will be used to build, evaluate, and ultimately deploy the model.

• This work sits primarily in the Data Understanding phase of the CRISP-DM process but also overlaps with the Business
Understanding and Data Preparation phases (remember that the CRISP-DM process is not strictly linear).

• An analytics base table is a simple, flat, tabular data structure made up of rows and columns. The columns are divided into a set of
descriptive features and a single target feature. Each row contains a value for each descriptive feature and the target feature and
represents an instance about which a prediction can be made.
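
A minimal sketch of that structure for the claim prediction solution, assuming two hypothetical source tables (claims and policies) with made-up columns. Each claim is joined to its policy to give one flat row per claim, and the columns are then split into descriptive features and the single target feature.

```python
# Hypothetical source data; table and column names are illustrative only.
claims = [
    {"claim_id": 1, "policy_id": "P1", "claim_amount": 1200.0, "fraud": 0},
    {"claim_id": 2, "policy_id": "P2", "claim_amount": 9800.0, "fraud": 1},
    {"claim_id": 3, "policy_id": "P1", "claim_amount": 450.0, "fraud": 0},
]
policies = {
    "P1": {"policy_type": "comprehensive", "years_held": 6},
    "P2": {"policy_type": "third_party", "years_held": 1},
}

TARGET = "fraud"  # the single target feature

def build_abt(claims, policies):
    """Join each claim to its policy: one flat row per claim (the prediction subject)."""
    abt = []
    for claim in claims:
        row = {k: v for k, v in claim.items() if k != "policy_id"}
        row.update(policies[claim["policy_id"]])
        abt.append(row)
    return abt

abt = build_abt(claims, policies)

# Everything except the row identifier and the target is a descriptive feature.
feature_names = [c for c in abt[0] if c not in ("claim_id", TARGET)]
X = [[row[c] for c in feature_names] for row in abt]
y = [row[TARGET] for row in abt]

print(feature_names)
print(X)
print(y)
```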
Know your Stakeholders
Analytical Team
Stakeholder Analysis

Develop Stakeholder Analysis Worksheet


Motor Insurance Fraud
• At this point in the motor insurance fraud detection project, we have decided to proceed with the proposed claim prediction
solution, in which a model will be built that can predict the likelihood that an insurance claim is fraudulent.

Example domain concepts for a motor insurance fraud prediction analytics solution.
Decisions First
Decisions First Software
Decision Requirements Modeling has five steps:

1. Identify Decisions: Identify the decisions that will be impacted by the analytic project.

2. Describe Decisions: Describe the decisions and document how improving these decisions will impact the business objectives and
metrics of the business.

3. Specify Decision Requirements: Move beyond simple descriptions of decisions to specify detailed decision requirements.
Specify the information and knowledge required to make the decisions and combine them into a Decision Requirements Diagram.

4. Complete the Model: Refine the requirements for these decisions using the precise yet easy to understand graphical notation of
Decision Requirements Diagrams until the decisions are completely specified and everyone has a clear sense of how the decisions
will be made and how the analytics involved will be used.

5. Generate an Analytic Requirements Document: Package this information as an Analytic Requirements Document.
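
The idea of recording the information and knowledge each decision requires can be sketched as a small data structure. This is not the software's own format: the `Decision` class, its field names, and the example decisions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    name: str
    information: list = field(default_factory=list)    # data the decision needs
    knowledge: list = field(default_factory=list)      # analytics, policies, expertise
    sub_decisions: list = field(default_factory=list)  # decisions this one depends on

def all_information(decision):
    """Collect the information needed by a decision and by everything it depends on."""
    needed = set(decision.information)
    for sub in decision.sub_decisions:
        needed |= all_information(sub)
    return needed

# Hypothetical decomposition of a claims decision into a supporting decision.
fraud_risk = Decision(
    name="Assess fraud risk",
    information=["claim details", "claim history"],
    knowledge=["fraud likelihood model"],
)
investigate = Decision(
    name="Investigate claim?",
    information=["investigator capacity"],
    knowledge=["investigation policy"],
    sub_decisions=[fraud_risk],
)

required = sorted(all_information(investigate))
print(required)
```

Walking the structure from the top-level decision lists every piece of information the analytic project would need to supply.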
Decisions First Software

1. Start with the decision to be made and “decompose it”
2. Start filling in the necessary “pieces”
3. Add more “details”
4. Complete the “model framing”
5. Then link “action to analytics”

End Result = a complete “framing and documenting” of the decision to be analyzed.