ABSTRACT:

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining is a process by which organizations uncover patterns hidden in their data that can be used to predict the behavior of customers, products, and processes. It can reveal recurring results, such as combinations or specific characteristics of customers, products, and processes, which are useful for further work. In short, data mining is a tool that can give your data the intelligence needed for a particular model or task.

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

Data, Information, and Knowledge

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:
1. Operational or transactional data, such as sales, cost, inventory, payroll, and accounting.
2. Nonoperational data, such as industry sales, forecast data, and macroeconomic data.
3. Metadata, i.e. data about the data itself, such as logical database design or data dictionary definitions.

The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point-of-sale transaction data can yield information on which products are selling and when.

Knowledge

Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

Data Warehouses:

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies, and equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

What can data mining do?

Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics.
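The relationships described above can be quantified in many ways; one of the simplest is a correlation coefficient between a factor and an outcome. The Python sketch below computes Pearson's r between two hypothetical weekly fields (an internal factor, shelf price, and an external one, a competitor's price) and units sold. The field names and numbers are invented for illustration only, and a correlation by itself does not establish cause and effect; real data mining tools use far richer models.

```python
from math import sqrt


def pearson(xs, ys):
    """Return Pearson's correlation coefficient r for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two spread terms; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical weekly records for one product (illustrative numbers only).
units_sold = [120, 135, 160, 118, 175, 150]
factors = {
    "price (internal)": [2.99, 2.79, 2.49, 2.99, 2.29, 2.59],
    "competitor price (external)": [3.09, 2.99, 2.89, 2.79, 3.19, 2.99],
}

for name, values in factors.items():
    print(f"{name:>28}: r = {pearson(values, units_sold):+.2f}")
```

A value of r near -1 for price would suggest that, in this made-up dataset, sales fall as the price rises; a value near 0 would suggest no linear relationship.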

It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.

With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

How does data mining work?

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

Classes:
Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

Clusters:
Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

Associations:
Data can be mined to identify associations. The often-cited beer-diaper example, in which shoppers who buy diapers also tend to buy beer, is an example of associative mining.

Sequential patterns:
Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
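The associations and sequential patterns described above can be made concrete with two standard association-rule measures: support (the share of transactions that contain a given set of items) and confidence (how often the right-hand items appear when the left-hand items do). The Python sketch below computes both for the rule {sleeping bag, hiking shoes} implies {backpack} over a handful of hypothetical baskets, then lists item pairs that meet a minimum support. The baskets, the function names, and the 0.3 threshold are all assumptions made for illustration. Note that the sketch treats each basket as an unordered set, so it illustrates associative mining; true sequential-pattern mining would also need the order or timestamps of purchases.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


# Hypothetical market baskets; each basket is the set of items in one purchase.
baskets = [frozenset(b) for b in (
    {"sleeping bag", "hiking shoes", "backpack"},
    {"sleeping bag", "hiking shoes"},
    {"sleeping bag", "hiking shoes", "backpack", "stove"},
    {"hiking shoes", "socks"},
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
)]

lhs, rhs = {"sleeping bag", "hiking shoes"}, {"backpack"}
print(f"support    = {support(lhs | rhs, baskets):.2f}")    # support    = 0.33
print(f"confidence = {confidence(lhs, rhs, baskets):.2f}")  # confidence = 0.67

# Brute-force search for item pairs bought together in at least 30% of baskets.
MIN_SUPPORT = 0.3
all_items = sorted(set().union(*baskets))
frequent_pairs = [
    (a, b)
    for a, b in combinations(all_items, 2)
    if support({a, b}, baskets) >= MIN_SUPPORT
]
print(frequent_pairs)
```

Checking every combination like this is only practical for tiny examples; production tools typically rely on algorithms such as Apriori or FP-growth, which prune item sets that cannot reach the minimum support.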
Data mining consists of five major elements:

1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data by application software.
5. Present the data in a useful format, such as a graph or table.

Different levels of analysis are available:

Artificial neural networks:
Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Genetic algorithms:
Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

Decision trees:
Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome.

Nearest neighbor method:
A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset. It is sometimes called the k-nearest neighbor technique.

Rule induction:
The extraction of useful if-then rules from data based on statistical significance.

Data visualization:
The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.

Technological infrastructure
Today, data mining applications are available on systems of all sizes, including mainframes. Prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes.

There are two critical technological drivers:

Size of the database:
The more data being processed and maintained, the more powerful the system required.

Query complexity:
The more complex the queries and the greater the number of queries being processed, the more powerful the system required.

Relational database storage and management technology is adequate for many data mining applications smaller than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time.

ADVANTAGES OF DATA MINING

Marketing/Retailing
Data mining can aid direct marketers by providing them with useful and accurate trends about their customers' purchasing behavior. Based on these trends, marketers can direct their marketing attention to their customers with more precision. Retail stores can also benefit from data mining in similar ways. For example, through the trends provided by data mining, store managers can arrange shelves, stock certain items, or offer discounts that will attract their customers.

Banking/Crediting
Data mining can assist financial institutions in areas such as credit reporting and loan information. For example, by examining previous customers with similar attributes, a bank can estimate the level of risk associated with each given loan. In addition, data mining can assist credit card issuers in detecting potentially fraudulent credit card transactions.

Law enforcement
Data mining can aid law enforcers in identifying criminal suspects as well as apprehending these criminals by examining trends in location, crime type, habit, and other patterns of behavior.

Researchers
Data mining can assist researchers by speeding up their data analysis process, thus allowing them more time to work on other projects.

DISADVANTAGES OF DATA MINING

Privacy Issues
Personal privacy has always been a major concern in this country. In recent years, with the widespread use of the Internet, concerns about privacy have increased tremendously. Because of privacy issues, some people do not shop on the Internet. They are afraid that somebody may gain access to their personal information and then use that information in an unethical way, thus causing them harm. Although it is against the law to sell or trade personal information between different organizations, sales of personal information have occurred.

Security issues
Although companies have a lot of personal information about us available online, they do not have sufficient security systems in place to protect that information. For example, the Ford Motor Credit Company recently had to inform 13,000 of its consumers that their personal information, including Social Security number, address, account number, and payment history, had been accessed by hackers who broke into a database belonging to the Experian credit reporting agency. This incident illustrated that companies are willing to disclose and share your personal information, but they are not taking care of that information properly. With so much personal information available, identity theft could become a real problem.

Profitable Applications

A wide range of companies have deployed successful applications of data mining. Two critical factors for success with data mining are a large, well-integrated data warehouse and a well-defined understanding of the business process within which data mining is to be applied. Some successful application areas include:
1. A pharmaceutical company can analyze its recent sales force activity and its results to improve targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months.
2. A credit card company can leverage its vast warehouse of customer transaction data to identify customers most likely to be interested in a new credit product. Using a small test mailing, the attributes of customers with an affinity for the product can be identified.
3. A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services.
4. A large consumer packaged goods company can apply data mining to improve its sales process to retailers.

Each of these examples has a clear common ground. They leverage the knowledge about customers implicit in a data warehouse to reduce costs and improve the value of customer relationships. These organizations can now focus their efforts on the most important (profitable) customers and prospects, and design targeted marketing strategies to best reach them.

Conclusion:
Data mining can be beneficial for businesses, governments, and society, as well as for the individual person. However, the major flaw of data mining is that it increases the risk of privacy invasion. Currently, business organizations do not have sufficient security systems to protect the information that they obtain through data mining from unauthorized access, so the use of data mining should be restricted. In the future, when companies are willing to spend money to develop sufficient security systems to protect consumer data, the use of data mining may be supported.

