Lecture 1.2 & 1.3 - Working With Modeler

The Life Cycle of a Data-Mining Project
• The stages influence each other in a non-linear way
• Data mining is an ongoing endeavor
You will rarely, if ever, simply plan a data-mining project, execute it and then pack up
the data and go home. Using data mining to address customers' demands is an ongoing
iterative endeavor. The knowledge gained from one cycle of data mining will almost
invariably lead to new questions, new issues, and new opportunities to identify and
meet customers' needs. Those new questions, issues, and opportunities can usually be
addressed by mining data once again. This process of mining and identifying new
opportunities should become part of the way that you think of the business and a
cornerstone of the overall business strategy.
Data Mining Success
1. Measures of success:
• The initial assessment will be directly tied to the predictive accuracy
• In the long run the success of a data-mining effort is measured by concrete factors
The CRISP-DM model tells us, in the Evaluation stage, to assess the results with
respect to business success, not statistical criteria. And indeed, from the moment you
begin to develop a research question, the eventual evaluation of the results should be
foremost in mind. The initial assessment will be directly tied to the modeling effort; that
is to say that you will be concerned with predictive accuracy (for example, the ability to
predict if a customer churns). But in the long run the success of a data-mining effort
will be measured by concrete factors such as cost savings, ROI, profitability, and so
forth.
2. Monitoring:
• After deployment, collect data to assess the model's success
To determine success, you must monitor the model after it is deployed. Once a model
has been deployed, plans must be put in place to record the data and information that
make it possible to assess the model's success. Thus, if a real-time model is being used
to supply sales representatives with offers for customers, both the suggested offer and
the customer's decision, among other factors, must be retained in a database for future
analysis.
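The record-keeping described above can be sketched as a small logging table. This is a minimal illustration only, using Python's built-in sqlite3 module; the table name offer_log, its columns, and the helper functions are hypothetical and not part of any particular tool.

```python
import sqlite3


def create_log(conn):
    # Hypothetical table: one row per offer suggested by the deployed model.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS offer_log (
               customer_id TEXT NOT NULL,
               offer       TEXT NOT NULL,
               accepted    INTEGER NOT NULL,  -- 1 = accepted, 0 = declined
               logged_at   TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )


def log_offer(conn, customer_id, offer, accepted):
    # Retain both the suggested offer and the customer's decision.
    conn.execute(
        "INSERT INTO offer_log (customer_id, offer, accepted) VALUES (?, ?, ?)",
        (customer_id, offer, int(accepted)),
    )


def acceptance_rates(conn):
    # Share of accepted offers per offer type, for later assessment.
    rows = conn.execute(
        "SELECT offer, AVG(accepted) FROM offer_log GROUP BY offer ORDER BY offer"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")  # in-memory database for the example
create_log(conn)
log_offer(conn, "C001", "premium_upgrade", True)
log_offer(conn, "C002", "premium_upgrade", False)
log_offer(conn, "C003", "loyalty_discount", True)
rates = acceptance_rates(conn)
print(rates)
```

In practice the log would also hold the "other factors" mentioned above (for example the model version or the sales channel), so that later analysis can separate the model's effect from everything else.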
3. Cost of errors:
• There will always be errors, sometimes with high cost
• If no cost estimates are possible beforehand, then try to gather this information afterwards, for
future use
Do not forget to consider the cost of errors as another measure of success. It is natural
to focus on success, but there will always be errors, and sometimes the cost of making
errors can be high. For example, mispredicting which insurance claims are fraudulent
may be expensive because of the effort involved to investigate the claim further. Some
data-mining tools allow you to take cost into account when estimating the model. Use
this feature if it is possible to make even a rough cost estimate. When you cannot use
cost in the modeling stage, be sure to think carefully about the costs of errors before
deployment. And if no reliable cost estimates are possible beforehand, then try to
gather this information after the fact for use in future data-mining projects and as ad
hoc evaluation criteria.
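As a rough illustration of weighing errors by cost rather than counting them, the sketch below tallies the two kinds of error for a fraud model and multiplies each by an assumed unit cost. The cost figures (200 per needless investigation, 5000 per missed fraudulent claim) and the sample data are invented for the example.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels (1 = fraud)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1
        elif p:
            counts["fp"] += 1  # flagged, but the claim was legitimate
        elif a:
            counts["fn"] += 1  # fraud that the model missed
        else:
            counts["tn"] += 1
    return counts


def total_error_cost(counts, cost_fp, cost_fn):
    """Total cost of errors given a unit cost for each error type."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
# Assumed unit costs, for illustration only.
cost = total_error_cost(counts, cost_fp=200, cost_fn=5000)
print(counts, accuracy, cost)
```

With these assumed costs, the single missed fraud accounts for most of the total, which accuracy alone would not reveal.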
4. Other measures of project success:
• Seek other measures to determine success from a business perspective
• Bring successes to the attention of colleagues and management early on in the project, so that
tracking systems or reports can be developed
As you develop a model and think about its deployment, consider what other measures
can be used to determine how successful and useful it is, from a business or
organization perspective. Do not wait to mention these factors until after deployment,
but bring them to the attention of colleagues and management early on, so that tracking
systems or reports can be developed. In the case of a financial institution using data
mining to predict customer retention, there are many other factors to investigate
beyond simple retention. Changes in average account balance, account activity, account
profitability, the opening of other accounts, and use of other services (for example, an
ATM card) can be investigated after the model is deployed to see if they are also changing.
Data Mining Failure
Not every data-mining project is successful, or at least not as successful as you might have
anticipated. As with any research, lots of things can go wrong.
1. Bad data:
• No data-mining algorithm can compensate for large amounts of error in the data
• Never scrimp on the time spent on data preparation and cleaning
2. Organizational resistance:
• Difficulties implementing a solution are still part of the whole data-mining effort
• To address resistance, educate and convince others about the potential benefits of the solution
• Consider implementation in only a portion of the organization
Difficulties implementing a solution are still part of the whole data-mining effort. A
Health Maintenance Organization (HMO) investigated ways to reduce costs by looking
at patterns of treatment and care, and found that there was an optimal length of stay in
the hospital for several types of major surgeries. While not requiring doctors to rigidly
follow the statistical results (which would be inappropriate for any specific patient), the
HMO encouraged doctors to take this information into account. But after a few
months, it was clear that length of stay decisions were not changing, that is, that the
physicians were sticking to their current practices. When resistance occurs, the best
strategy is usually further education on the potential benefits of the solution, or
perhaps, implementation in only a portion of the organization. For the HMO, this
could mean convincing a few doctors initially to change their release decisions, hoping
that eventually more will follow this lead.

3. Results that cannot be deployed:
• Important factors can be out of the organization's control, or
• Cannot legally be used in marketing or in making decisions
Sometimes a model cannot be deployed for reasons other than organizational
opposition. The most common reason is that factors found to be important are out
of the control of the organization, or cannot legally be used in marketing or in making
decisions. A consumer products company discovered that certain types of promotions
were successful and led to repeat business, but could only offer these promotions to
customers it could readily identify, which in practice were those who returned a
registration card or bought a service contract. Some obstacles can be anticipated, and
the data-mining process adjusted accordingly. If a model can be only partially
implemented, as with the consumer products firm, it may still be worthwhile to do the
analysis when sufficiently good results would justify the effort (this is always a judgment
call).
4. Cause and effect:
• You must be certain that inputs/predictors in a model occur before the output
Research methodology is important for the data-mining effort. One reason is that a
carefully formulated study will consider whether there is a cause-and-effect relationship
between the predictors and outcome variable. For example, customer satisfaction
research often uses attitudes about product/service fields to predict overall satisfaction,
willingness to buy again or to remain a customer, or willingness to recommend a
product/service. In terms of cause and effect, all these attitudes about fields and future
actions or satisfaction occur at one point in time, that is, when the survey is conducted.
Because the predictors and the outcome are measured simultaneously, such a survey
cannot establish that the attitudes came before, let alone caused, the outcome.
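One simple safeguard for the rule that predictors must occur before the output is to compare the date on which each field was measured with the date of the outcome. The sketch below assumes such dates are available; the field names and dates are hypothetical.

```python
from datetime import date


def fields_not_before_outcome(measured_on, outcome_on):
    """Return the names of fields measured on or after the outcome date."""
    return sorted(name for name, d in measured_on.items() if d >= outcome_on)


# Hypothetical measurement dates for three predictor fields.
measured_on = {
    "satisfaction_score": date(2024, 3, 1),  # same survey as the outcome
    "account_balance": date(2024, 1, 15),
    "complaint_count": date(2024, 2, 10),
}
outcome_on = date(2024, 3, 1)  # date the outcome was recorded

suspect = fields_not_before_outcome(measured_on, outcome_on)
print(suspect)
```

Fields flagged this way are not necessarily useless, but they cannot be treated as predictors that precede the outcome.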

Skills Needed for Data Mining
1. Understanding the business:
• Asking the right data-mining question requires knowledge of the specific business area and
organization
• Evaluating a data-mining solution needs a business perspective
2. Database knowledge:
• The database administrator plays an important role:
o Which data tables or files are available?
o How are they linked?
o How are the fields coded?
o What are reasonable data values?
3. Knowledge of data-mining techniques:
• Best tools for situation
• Fine-tuning techniques
• Assess effects of data on outcome
• Identify anomalies
4. Teamwork combining multiple competencies, such as:
• Business domain knowledge
• Database knowledge
• Data-mining algorithms
• Project management
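The database questions listed above ("How are the fields coded?", "What are reasonable data values?") can be turned into simple automated checks once the answers are known. The following is a minimal sketch; the field names, the valid ranges and codes, and the sample records are all assumptions made for illustration.

```python
def find_invalid(records, rules):
    """Return (row_index, field, value) for every value that fails its rule."""
    problems = []
    for i, rec in enumerate(records):
        for field, is_valid in rules.items():
            value = rec.get(field)
            # A missing value is reported without calling the rule.
            if value is None or not is_valid(value):
                problems.append((i, field, value))
    return problems


# Assumed rules, as they might be agreed with the database administrator.
rules = {
    "age": lambda v: 18 <= v <= 110,
    "gender": lambda v: v in {"F", "M", "U"},
}

records = [
    {"age": 34, "gender": "F"},
    {"age": 999, "gender": "M"},  # out-of-range value, perhaps a missing-value code
    {"age": 51, "gender": "X"},   # code not in the agreed list
    {"age": None, "gender": "U"},
]

problems = find_invalid(records, rules)
print(problems)
```

Checks like these do not replace the database administrator's knowledge; they only make it repeatable during data preparation.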
