Lecture 1.2 & 1.3 - Working With Modeler
You will rarely, if ever, simply plan a data-mining project, execute it and then pack up
the data and go home. Using data mining to address customers' demands is an ongoing
iterative endeavor. The knowledge gained from one cycle of data mining will almost
invariably lead to new questions, new issues, and new opportunities to identify and
meet customers' needs. Those new questions, issues, and opportunities can usually be
addressed by mining data once again. This process of mining and identifying new
opportunities should become part of the way that you think of the business and a
1. Measures of success:
• In the long run the success of a data-mining effort is measured by concrete factors
The CRISP-DM model tells us, in the Evaluation stage, to assess the results with
respect to business success, not statistical criteria. And indeed, from the moment you
begin to develop a research question, the eventual evaluation of the results should be
foremost in mind. The initial assessment will be directly tied to the modeling effort; that
is to say that you will be concerned with predictive accuracy (for example, the ability to
predict if a customer churns). But in the long run the success of a data-mining effort
will be measured by concrete factors such as cost savings, ROI, profitability, and so
forth.
2. Monitoring:
• After deployment, collect data to assess the model's success
To determine success, you must monitor the model after it is deployed. Once a model
has been deployed, plans must be put in place to record the data and information that
make it possible to assess the model's success. Thus, if a real-time model is being used
to supply sales representatives with offers for customers, both the suggested offer and
the customer's decision, among other factors, must be retained in a database for future
analysis.
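
As a minimal sketch of the record-keeping described above, the following Python example stores each suggested offer together with the customer's decision so that acceptance can be analyzed later. The table name `offer_log`, its columns, and the helper functions are hypothetical and chosen only for illustration; a real deployment would use whatever database and schema the organization already has.

```python
import sqlite3


def create_log(conn):
    """Create a (hypothetical) table holding one row per offer shown to a customer."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS offer_log (
               customer_id TEXT NOT NULL,
               offer       TEXT NOT NULL,
               accepted    INTEGER NOT NULL,
               logged_at   TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )


def log_offer(conn, customer_id, offer, accepted):
    """Record the offer suggested by the model and the customer's decision."""
    conn.execute(
        "INSERT INTO offer_log (customer_id, offer, accepted) VALUES (?, ?, ?)",
        (customer_id, offer, int(accepted)),
    )
    conn.commit()


def acceptance_rate(conn, offer):
    """Return the share of logged customers who accepted the offer (None if never logged)."""
    row = conn.execute(
        "SELECT AVG(accepted) FROM offer_log WHERE offer = ?", (offer,)
    ).fetchone()
    return row[0]


# Illustrative data only: an in-memory database and made-up customers.
conn = sqlite3.connect(":memory:")
create_log(conn)
log_offer(conn, "C001", "upgrade", True)
log_offer(conn, "C002", "upgrade", False)
log_offer(conn, "C003", "discount", True)
rate = acceptance_rate(conn, "upgrade")
```

Storing the decision next to the offer, rather than in a separate system, keeps the later comparison of suggested and accepted offers to a single query.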
3. Cost of errors:
• If no cost estimates are possible beforehand, then try to gather this information afterwards, for
future use
Do not forget to consider the cost of errors as another measure of success. You tend to
focus on success but there will always be errors, and sometimes the cost of making
errors can be high. For example, mispredicting which insurance claims are fraudulent
may be expensive because of the effort involved to investigate the claim further. Some
data-mining tools allow you to take cost into account when estimating the model. Use
this feature if it is possible to make even a rough cost estimate. When you cannot use
cost in the modeling stage, be sure to think carefully about the costs of errors before
deployment. And if no reliable cost estimates are possible beforehand, then try to
gather this information after the fact for use in future data-mining projects and as ad
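
To illustrate how even a rough cost estimate can be applied, the sketch below weighs the two kinds of misprediction in the fraud example differently. The function name and the cost figures are assumptions made up for this example; they are not taken from any particular data-mining tool.

```python
def total_error_cost(actual, predicted, cost_false_positive, cost_false_negative):
    """Sum the cost of all mispredictions, given a rough cost for each error type."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # False positive: a legitimate claim flagged as fraudulent (needless investigation).
    false_positives = sum(1 for a, p in zip(actual, predicted) if p and not a)
    # False negative: a fraudulent claim that was not flagged (fraud is paid out).
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return (false_positives * cost_false_positive
            + false_negatives * cost_false_negative)


# Hypothetical claims: True means fraudulent.
actual = [True, False, False, True, False]
predicted = [True, True, False, False, False]

# Hypothetical costs: 200 per needless investigation, 5000 per missed fraud.
cost = total_error_cost(actual, predicted,
                        cost_false_positive=200, cost_false_negative=5000)
```

Two models with the same accuracy can differ widely on this measure, which is why a rough cost estimate is more useful than none.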
• Bring successes to the attention of colleagues and management early on in the project, so that
tracking systems or reports can be developed
As you develop a model and think about its deployment, consider what other measures
can be used to determine how successful and useful it is, from a business or
organization perspective. Do not wait to mention these factors until after deployment,
but bring them to the attention of colleagues and management early on, so that tracking
systems or reports can be developed. In the case of a financial institution using data
mining to predict customer retention, there are many other factors to investigate
beyond simple retention. Changes in average account balance, account activity, account
profitability, the opening of other accounts, and use of other services (such as an ATM card) can
be investigated after the model is deployed to see if they are also changing.
1. Bad data:
• No data mining algorithm will be able to compensate for large amounts of error in the data
Not every data-mining project is successful, or, at the least, not as successful as you might have
anticipated. As with any research, many things can go wrong.
2. Organizational resistance:
• Difficulties implementing a solution are still part of the whole data-mining effort
• To address resistance, educate and convince others about the potential benefits of the solution
Difficulties implementing a solution are still part of the whole data-mining effort. A
at patterns of treatment and care, and found that there was an optimal length of stay in
the hospital for several types of major surgeries. While not requiring doctors to rigidly
follow the statistical results (which would be inappropriate for any specific patient), the
HMO encouraged doctors to take this information into account. But after a few
months, it was clear that length-of-stay decisions were not changing; that is, the
physicians were sticking to their current practices. When resistance occurs, the best
perhaps, implementation in only a portion of the organization. For the HMO, this
could mean convincing a few doctors initially to change their release decisions, hoping
or
opposition. The most common reason is that factors found to be important are out
were successful and led to repeat business, but could only offer these promotions to
customers it could readily identify, which in practice were those who returned a
registration card or bought a service contract. Some obstacles can be anticipated, and
implemented, as with the consumer products firm, it may still be worthwhile to do the
analysis when sufficiently good results would justify the effort (this is always a judgment
call).
• You must be certain that inputs/predictors in a model occur before the output
Research methodology is important for the data-mining effort. One reason is that a
between the predictors and outcome variable. For example, customer satisfaction
research often uses attitudes about product/service fields to predict overall satisfaction,
product/service. In terms of cause and effect, all these attitudes about fields and future
actions or satisfaction occur at one point in time, that is, when the survey is conducted.
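
One simple way to check the time ordering discussed above is to compare the date on which each predictor was measured with the date of the outcome. The sketch below assumes such dates are available; the field names and dates are hypothetical.

```python
from datetime import date


def split_by_time_order(outcome_date, predictor_dates):
    """Separate predictors measured strictly before the outcome from the rest."""
    valid = [name for name, measured in predictor_dates.items()
             if measured < outcome_date]
    suspect = [name for name, measured in predictor_dates.items()
               if measured >= outcome_date]
    return valid, suspect


# Hypothetical example: the satisfaction score comes from the same survey
# as the outcome, so it cannot be treated as occurring before it.
outcome_date = date(2024, 6, 1)
predictor_dates = {
    "account_balance": date(2024, 3, 1),
    "satisfaction_score": date(2024, 6, 1),
}
valid, suspect = split_by_time_order(outcome_date, predictor_dates)
```

Predictors that land in the `suspect` list are not necessarily useless, but any cause-and-effect interpretation of them needs extra care.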
• Asking the right data-mining question requires knowledge of the specific business area and
organization
2. Database knowledge:
• Fine-tuning techniques
• Identify anomalies
such as:
• Database knowledge
• Data-mining algorithms
• Project management