
 The Life Cycle of a Data-Mining Project

 The stages influence each other in a non-linear way

 Data mining is an ongoing endeavor

You will rarely, if ever, simply plan a data-mining project, execute it and then pack up

the data and go home. Using data mining to address customers' demands is an ongoing

iterative endeavor. The knowledge gained from one cycle of data mining will almost

invariably lead to new questions, new issues, and new opportunities to identify and

meet customers' needs. Those new questions, issues, and opportunities can usually be

addressed by mining data once again. This process of mining and identifying new

opportunities should become part of the way that you think of the business and a

cornerstone of the overall business strategy.

Data Mining Success

1. Measures of success:

•      The initial assessment will be directly tied to the predictive accuracy

•      In the long run the success of a data-mining effort is measured by concrete factors

The CRISP-DM model tells us, in the Evaluation stage, to assess the results with

respect to business success, not statistical criteria. And indeed, from the moment you

begin to develop a research question, the eventual evaluation of the results should be

foremost in mind. The initial assessment will be directly tied to the modeling effort; that

is to say that you will be concerned with predictive accuracy (for example, the ability to

predict if a customer churns). But in the long run the success of a data-mining effort

will be measured by concrete factors such as increased savings, ROI, profitability, and so

forth.

2. Monitoring:
•      After deployment, collect data to assess the model's success

To determine success, you must monitor the model after it is deployed. Once a model
has been deployed, plans must be put in place to record the data and information that

make it possible to assess the model's success. Thus, if a real-time model is being used

to supply sales representatives with offers for customers, both the suggested offer and

the customer's decision, among other factors, must be retained in a database for future

analysis.
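
A minimal sketch of this kind of record keeping, assuming a SQLite store. The table name `offer_log`, its columns, and the helper functions are hypothetical, chosen only for illustration; a deployed system would use whatever database it already has.

```python
import sqlite3

def create_log(conn):
    # One row per offer shown to a customer (hypothetical schema).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS offer_log (
               customer_id     TEXT NOT NULL,
               suggested_offer TEXT NOT NULL,
               accepted        INTEGER NOT NULL,  -- 1 = accepted, 0 = declined
               logged_at       TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )

def log_offer(conn, customer_id, suggested_offer, accepted):
    # Retain both the model's suggestion and the customer's decision.
    conn.execute(
        "INSERT INTO offer_log (customer_id, suggested_offer, accepted) "
        "VALUES (?, ?, ?)",
        (customer_id, suggested_offer, int(accepted)),
    )

def acceptance_rate(conn, suggested_offer):
    # Share of logged suggestions that were accepted; None if never logged.
    row = conn.execute(
        "SELECT AVG(accepted) FROM offer_log WHERE suggested_offer = ?",
        (suggested_offer,),
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
create_log(conn)
log_offer(conn, "C001", "premium_upgrade", True)
log_offer(conn, "C002", "premium_upgrade", False)
print(acceptance_rate(conn, "premium_upgrade"))
```

Storing the decision next to the suggestion is what makes later analysis possible: the acceptance rate per offer can be tracked over time without reconstructing what the model recommended.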

3. Cost of errors:

•      There will always be errors, sometimes with high cost

•      If no cost estimates are possible beforehand, then try to gather this information afterwards, for
future use

Do not forget to consider the cost of errors as another measure of success. You tend to

focus on success but there will always be errors, and sometimes the cost of making

errors can be high. For example, mispredicting which insurance claims are fraudulent

may be expensive because of the effort involved to investigate the claim further. Some

data-mining tools allow you to take cost into account when estimating the model. Use

this feature if it is possible to make even a rough cost estimate. When you cannot use

cost in the modeling stage, be sure to think carefully about the costs of errors before

deployment. And if no reliable cost estimates are possible beforehand, then try to

gather this information after the fact for use in future data-mining projects and as ad

hoc evaluation criteria.
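
As a rough sketch of how error costs can change the assessment of a model, the example below totals the cost of misclassified fraud predictions. The function name, the sample predictions, and the cost figures (200 per needless investigation, 5000 per missed fraud) are assumptions made up for illustration, not values from any real project.

```python
def total_error_cost(actual, predicted, cost_false_positive, cost_false_negative):
    """Sum the cost of every misclassified case (1 = fraud, 0 = legitimate)."""
    cost = 0.0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 0:
            cost += cost_false_positive   # legitimate claim investigated needlessly
        elif p == 0 and a == 1:
            cost += cost_false_negative   # fraudulent claim paid out
    return cost

def accuracy(actual, predicted):
    """Fraction of cases predicted correctly."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

actual  = [1, 0, 0, 1, 0, 0]
model_a = [1, 1, 1, 1, 0, 0]  # two false alarms, no missed fraud
model_b = [1, 0, 0, 0, 0, 0]  # no false alarms, one missed fraud

for name, predicted in (("A", model_a), ("B", model_b)):
    print(name,
          round(accuracy(actual, predicted), 2),
          total_error_cost(actual, predicted, 200.0, 5000.0))
```

With these illustrative numbers, model B is more accurate but far more expensive, because its single error is the costly kind. Even a rough cost estimate can therefore reverse a ranking based on accuracy alone.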

4. Other measures of project success:

•      Seek other measures to determine success from a business perspective

•      Bring these measures to the attention of colleagues and management early in the project, so that
tracking systems or reports can be developed

As you develop a model and think about its deployment, consider what other measures

can be used to determine how successful and useful it is, from a business or

organization perspective. Do not wait to mention these factors until after deployment,

but bring them to the attention of colleagues and management early on, so that tracking
systems or reports can be developed. In the case of a financial institution using data

mining to predict customer retention, there are many other factors to investigate

beyond simple retention. Changes in average account balance, account activity, account

profitability, the opening of other accounts, and use of other services (such as an ATM card) can

be investigated after the model is deployed to see if they are also changing.

Data Mining Failure

Not every data-mining project is successful, or at least not as successful as you might have
anticipated. As with any research, many things can go wrong.

1. Bad data:

•      No data-mining algorithm will be able to compensate for large amounts of error in the data

•      Never scrimp on the time spent on data preparation and cleaning

2. Organizational resistance:

•      Difficulties implementing a solution are still part of the whole data-mining effort

•      To address resistance, educate and convince others about the potential benefits of the solution

•      Consider implementation in only a portion of the organization

Difficulties implementing a solution are still part of the whole data-mining effort. A

Health Maintenance Organization (HMO) investigated ways to reduce costs by looking

at patterns of treatment and care, and found that there was an optimal length of stay in

the hospital for several types of major surgeries. While not requiring doctors to rigidly

follow the statistical results (which would be inappropriate for any specific patient), the

HMO encouraged doctors to take this information into account. But after a few

months, it was clear that length-of-stay decisions were not changing; that is, the

physicians were sticking to their current practices. When resistance occurs, the best

strategy is usually further education on the potential benefits of the solution, or

perhaps, implementation in only a portion of the organization. For the HMO, this

could mean convincing a few doctors initially to change their release decisions, hoping

that eventually more will follow this lead.


 

3. Results that cannot be deployed:

•      Factors found to be important can be out of the organization's control, or cannot legally be used in marketing or in making decisions

Sometimes a model cannot be deployed for reasons other than organizational

opposition. The most common reason is that factors found to be important are out

of the control of the organization, or cannot legally be used in marketing or in making

decisions. A consumer products company discovered that certain types of promotions

were successful and led to repeat business, but could only offer these promotions to

customers it could readily identify, which in practice were those who returned a

registration card or bought a service contract. Some obstacles can be anticipated, and

the data-mining process adjusted accordingly. If a model can be only partially

implemented, as with the consumer products firm, it may still be worthwhile to do the

analysis when sufficiently good results would justify the effort (this is always a judgment

call).

4. Cause and effect:

•      You must be certain that inputs/predictors in a model occur before the output

Research methodology is important for the data-mining effort. One reason is that a

carefully formulated study will consider whether there is a cause-and-effect relationship

between the predictors and outcome variable. For example, customer satisfaction

research often uses attitudes about product/service fields to predict overall satisfaction,

willingness to buy again/ to remain a customer, or willingness to recommend a

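product/service. In terms of cause and effect, all these attitudes about fields and future

actions or satisfaction are measured at one point in time, that is, when the survey is conducted.

Because the predictors are not observed before the outcome, such a survey cannot by itself establish that they cause it.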
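
The timing requirement can be checked mechanically when each measurement carries a date. The sketch below is one possible check, assuming records stored as dictionaries; the function name, field names, and dates are hypothetical.

```python
from datetime import date

def predictors_precede_outcome(record, predictor_date_fields, outcome_date_field):
    """Return True only if every predictor was measured strictly before the outcome."""
    outcome_date = record[outcome_date_field]
    return all(record[field] < outcome_date for field in predictor_date_fields)

# Survey-only design: attitudes and stated intent are collected on the same day.
survey_only = {
    "attitude_measured_on": date(2024, 3, 1),
    "outcome_measured_on": date(2024, 3, 1),
}

# Follow-up design: attitudes are surveyed first, actual behavior is observed later.
with_follow_up = {
    "attitude_measured_on": date(2024, 3, 1),
    "outcome_measured_on": date(2024, 9, 1),
}

for record in (survey_only, with_follow_up):
    print(predictors_precede_outcome(
        record, ["attitude_measured_on"], "outcome_measured_on"))
```

Passing this check does not prove causation. It only rules out one obvious problem, namely predictors that were recorded at the same time as, or after, the outcome they are meant to predict.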

 
 

Skills Needed for Data Mining

1. Understanding the business:

•      Asking the right data-mining question requires knowledge of the specific business area and
organization

•      Evaluating a data-mining solution needs a business perspective

2. Database knowledge:

•      The database administrator plays an important role:

o Which data tables or files are available?

o How are they linked?

o How are the fields coded?

o What are reasonable data values?

3. Knowledge of data-mining techniques:

•      Best tools for situation

•      Fine-tuning techniques

•      Assess effects of data on outcome

•      Identify anomalies

4. Teamwork combining multiple competencies, such as:

•      Business domain knowledge

•      Database knowledge

•      Data-mining algorithms

•      Project management
