


Financial Data Analysis

Retail Industry
Telecommunication Industry
Biological Data Analysis
Other Scientific Applications
Intrusion Detection
Financial Data Analysis:
Financial Data
Collected from Banks and Financial Institutions
Usually complete and reliable
Design and Construction of Data Warehouses for multi-dimensional data analysis and mining

Analyze changes by month, by region, and by sector, along with max, min, total, average, trend, etc.
Characteristic and Comparative analysis, Outlier analysis
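The roll-up by month, region, or sector described above can be sketched in a few lines. This is only an illustration: the record fields (`month`, `region`, `amount`) and the `summarize` helper are hypothetical, and a real warehouse would do this with OLAP operations rather than in-memory Python.

```python
from collections import defaultdict

# Hypothetical transaction records; field names and values are illustrative only.
transactions = [
    {"month": "2024-01", "region": "North", "amount": 120.0},
    {"month": "2024-01", "region": "North", "amount": 80.0},
    {"month": "2024-01", "region": "South", "amount": 200.0},
    {"month": "2024-02", "region": "North", "amount": 50.0},
]

def summarize(records, dimensions):
    """Group records by the given dimensions and compute max, min, total, average."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        groups[key].append(rec["amount"])
    return {
        key: {
            "max": max(vals),
            "min": min(vals),
            "total": sum(vals),
            "average": sum(vals) / len(vals),
        }
        for key, vals in groups.items()
    }

by_month_region = summarize(transactions, ["month", "region"])
by_region = summarize(transactions, ["region"])  # roll up over month
```

Dropping a dimension from the list (as in `by_region`) corresponds to rolling up to a coarser level of the cube.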
Loan payment and customer credit policy analysis

Feature Selection and attribute relevance ranking (debt ratio, credit history, income, education level)
Loan granting policy can be adjusted
Low risk Customers are granted loans
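One common way to rank attribute relevance is information gain. The sketch below assumes that measure (the slides do not name one) and uses made-up, already discretized loan records; the attribute names and the `repaid` label are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical loan records; values are invented for illustration.
loans = [
    {"debt_ratio": "high", "credit_history": "bad",  "education": "college", "repaid": False},
    {"debt_ratio": "high", "credit_history": "bad",  "education": "school",  "repaid": False},
    {"debt_ratio": "low",  "credit_history": "good", "education": "college", "repaid": True},
    {"debt_ratio": "low",  "credit_history": "good", "education": "school",  "repaid": True},
    {"debt_ratio": "high", "credit_history": "good", "education": "school",  "repaid": True},
    {"debt_ratio": "low",  "credit_history": "bad",  "education": "school",  "repaid": False},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(records, attribute, target="repaid"):
    """Reduction in label entropy obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in records])
    partitions = defaultdict(list)
    for r in records:
        partitions[r[attribute]].append(r[target])
    remainder = sum(len(p) / len(records) * entropy(p) for p in partitions.values())
    return base - remainder

attributes = ["debt_ratio", "credit_history", "education"]
ranking = sorted(attributes, key=lambda a: information_gain(loans, a), reverse=True)
```

On this toy data `credit_history` separates repaid from unrepaid loans perfectly, so it ranks first; `education` carries no information and ranks last.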

Classification and Clustering of customers for targeted marketing
Customer group identification
Multidimensional clustering techniques
Can associate new customer with existing groups
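Associating a new customer with an existing group can be as simple as picking the nearest cluster centre. The sketch below assumes clusters have already been found and are summarized by centroids; the group names, the two features, and their values are hypothetical.

```python
import math

# Hypothetical centroids over (annual income in thousands, monthly spend in thousands).
centroids = {
    "budget": (30.0, 0.5),
    "mainstream": (60.0, 1.5),
    "premium": (120.0, 4.0),
}

def assign_group(customer, centroids):
    """Return the name of the centroid closest to the customer (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(customer, centroids[name]))

new_customer = (70.0, 1.8)
group = assign_group(new_customer, centroids)  # "mainstream"
```

In practice the features would be scaled first so that no single dimension dominates the distance.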
Detection of money laundering and financial crimes

Data from several sources integrated

Data Analysis tools can be used to detect unusual patterns
Data Visualization tools, Linkage Analysis tools
Classification tools, Clustering tools
Outlier Analysis tools
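As a rough illustration of outlier analysis on transaction amounts, the sketch below uses the modified z-score based on the median absolute deviation (MAD). The choice of this statistic, the 3.5 cutoff, and the sample amounts are assumptions for the example, not something the slides prescribe.

```python
from statistics import median

def mad_outliers(amounts, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds the threshold."""
    med = median(amounts)
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # no spread to measure against
    return [x for x in amounts if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical transfer amounts for one account.
amounts = [120, 95, 130, 110, 105, 9800, 115]
suspicious = mad_outliers(amounts)  # [9800]
```

A flagged amount is only a candidate for review; linkage and visualization tools would then be used to examine the accounts involved.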

Retail Industry:
Sales Data, Customer Shopping history, Goods Transportation, E-Commerce
Mining can help to
Identify buying behaviour, discover shopping patterns and trends
Improve the quality of customer service, retain customers
Design and Construction of data warehouses
Several ways to design a warehouse

Entities involved: Sales, Customers, Employees, Goods transportation
Preliminary data mining exercises can help to
guide the design process
Dimensions and levels to involve and preprocessing to be done

Multi-dimensional analysis of sales, customers, products, time and region

Multi-feature data cube
Visualization tools
Analysis of effectiveness of sales campaigns

Compare sales and transaction volume

Multidimensional analysis

Compare sales amount, number of transactions containing same items before and after the campaign
Association Analysis

Identify items likely to be purchased together
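Finding items that tend to be purchased together comes down to counting co-occurrences and computing support and confidence. The following is a minimal sketch restricted to item pairs, with invented baskets; `frequent_pairs` and `confidence` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def frequent_pairs(baskets, min_support=0.4):
    """Return item pairs whose support (fraction of baskets) meets min_support."""
    n = len(baskets)
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_a = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_a) / len(with_a)

pairs = frequent_pairs(baskets)
bread_to_milk = confidence(baskets, "bread", "milk")  # 0.75
```

Here bread and milk appear together in 3 of 5 baskets (support 0.6), and 3 of the 4 bread baskets also contain milk (confidence 0.75).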

Customer Retention

Customer loyalty and trends

Sequential pattern mining

Adjust pricing strategy and goods range

Purchase recommendation and cross-reference of items

Recommender Systems
Sales promotion by displaying deal information in
association with items of interest

Telecommunication Industry:
Computer and Web data transmission, fax, Mobile phone, Telephone services

Multidimensional analysis of telecommunication data

Helps to identify and compare the data traffic, system workload, resource usage, user group behavior, profit, etc.
Time-of-day usage patterns
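A time-of-day usage pattern is a simple aggregation over call records. The sketch below assumes hypothetical records of the form (ISO timestamp, duration in seconds) and totals the usage per hour of day.

```python
from collections import Counter
from datetime import datetime

# Hypothetical call records: (ISO timestamp, duration in seconds).
calls = [
    ("2024-03-01T08:15:00", 120),
    ("2024-03-01T08:45:00", 300),
    ("2024-03-01T13:05:00", 60),
    ("2024-03-01T20:30:00", 600),
    ("2024-03-02T20:10:00", 240),
]

def usage_by_hour(calls):
    """Total call duration per hour of day, summed across all dates."""
    totals = Counter()
    for timestamp, duration in calls:
        hour = datetime.fromisoformat(timestamp).hour
        totals[hour] += duration
    return totals

hourly = usage_by_hour(calls)
peak_hour, peak_seconds = hourly.most_common(1)[0]  # (20, 840)
```

The same grouping could be extended with a user-group or service dimension to compare traffic across segments.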

Fraudulent pattern analysis

Identify fraudulent users and atypical usage


Illegal Customer account access

Automatic Dial-out equipment

Switch and route congestion patterns

Multidimensional association and sequential pattern analysis

Usage patterns for a set of communication
services by customer group, time of day
Sales Promotion
Mobile Telecommunication Services

Spatio-temporal data mining

Use of visualization tools

Biomedical and DNA Data Analysis:

Research in DNA Analysis has led to

Development of new drugs

Cancer therapies
Human genome study
Discovery of genetic causes for many diseases

Genome Research

Study of DNA Sequences

Adenine, Cytosine, Guanine, Thymine
100,000 genes, each with hundreds of nucleotides that can be combined in a number of ways
Identifying Gene Sequence patterns is challenging

Semantic Integration of heterogeneous, distributed genome databases
Highly distributed generation and use of DNA data
Integrated data warehouses and distributed
federated databases
Efficient Data Cleaning and Integration methods
Similarity Search and Comparison among DNA sequences

Gene sequences isolated from healthy and diseased tissues
Compare frequently occurring patterns in each class
Help to identify the genetic factors of the disease and immune factors
Non-numeric nature of data poses difficulties
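Because sequence data is non-numeric, a common workaround is to count short subsequences (k-mers) and compare their frequencies between the two classes. The sketch below assumes that approach; the sequences are invented, and the rule "at least twice in diseased samples, never in healthy ones" is an arbitrary illustrative threshold.

```python
from collections import Counter

def kmer_counts(sequences, k):
    """Count every length-k substring across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

# Hypothetical sequences; real data would come from sequenced tissue samples.
healthy = ["ACGTACGT", "ACGTTTGA"]
diseased = ["ACGGGGTA", "TTGGGGAC"]

k = 4
h = kmer_counts(healthy, k)
d = kmer_counts(diseased, k)

# Patterns seen at least twice in diseased samples and never in healthy ones.
enriched = sorted(kmer for kmer in d if d[kmer] >= 2 and h[kmer] == 0)  # ["GGGG"]
```

Patterns that are frequent only in the healthy class could be examined the same way by swapping the roles of `h` and `d`.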
Association Analysis: Identification of co-occurring gene sequences
Diseases triggered by a combination of genes acting together
Association analysis helps to detect the kinds of genes that may co-occur
Study interactions and relationships between them

Path Analysis: Linking genes to different stages of disease development
Different genes become active at different stages of the disease
Develop drug interventions that target specific stages
Visualization tools and genetic data analysis

Complex gene structures presented as graphs, trees, and cuboids by visualization tools
Better understanding and support for interactive data exploration

Intrusion Detection:

Any set of actions that threaten the integrity, availability, or confidentiality of a network
Misuse detection: use patterns of well-known attacks to identify intrusions
Signatures must be updated

Classification based on known intrusions

E.g., three consecutive login failures: password guessing
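The three-consecutive-failures signature above can be written as a small rule over a login event stream. This is a sketch under assumed inputs: events are hypothetical `(user, success)` pairs in time order, and `flag_repeated_failures` is an illustrative name.

```python
def flag_repeated_failures(events, limit=3):
    """Return users with `limit` or more consecutive failed logins."""
    streak = {}      # current run of failures per user
    flagged = set()
    for user, success in events:
        if success:
            streak[user] = 0  # a successful login resets the run
        else:
            streak[user] = streak.get(user, 0) + 1
            if streak[user] >= limit:
                flagged.add(user)
    return flagged

# Hypothetical login events in time order.
events = [
    ("alice", False), ("bob", False), ("alice", False),
    ("bob", True), ("alice", False), ("bob", False),
]
suspects = flag_repeated_failures(events)  # {"alice"}
```

A rule like this only catches the attack it encodes, which is why signature sets must be kept up to date.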


Anomaly detection: use deviation from normal usage patterns to identify intrusions
Any significant deviations from the expected behavior are reported as possible attacks
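Reporting deviations from expected behavior can be illustrated with a very simple profile: the mean and standard deviation of a metric observed during normal operation. The metric (logins per hour), the sample values, and the three-standard-deviation cutoff are all assumptions for this sketch.

```python
from statistics import mean, stdev

def build_profile(normal_values):
    """Summarize normal behavior by its mean and sample standard deviation."""
    return mean(normal_values), stdev(normal_values)

def is_anomalous(value, profile, k=3.0):
    """Report a value that lies more than k standard deviations from the mean."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

# Hypothetical logins per hour recorded during normal operation.
normal_logins_per_hour = [4, 5, 6, 5, 4, 6, 5, 5]
profile = build_profile(normal_logins_per_hour)

alerts = [v for v in [5, 6, 40] if is_anomalous(v, profile)]  # [40]
```

Unlike a signature, this can flag attacks never seen before, at the cost of false alarms whenever legitimate usage shifts.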

Data Mining Algorithms

Misuse detection

Training data labeled normal / intrusion
Classifier can be used to detect known intrusions
Classification algorithms, Association rule mining

Anomaly detection

Builds models of normal behavior and detects significant deviations
Supervised: normal training data
Unsupervised: no information about training data


Classification, clustering

Association and Correlation Analysis

Finds relationships between system attributes
describing the network data
Helps in selection of useful attributes
Analysis of Stream data
Transient and dynamic nature of intrusions
An event may be normal on its own but malicious when viewed as part of a sequence
Distributed Data Mining
Analysis of data from several locations
Visualization and Querying tools

Data Mining in other Scientific Applications:

Old Scenario: Small, homogeneous data sets

Formulate hypothesis, build model, evaluate

Current Scenario: High-dimensional data, stream data, heterogeneous data (spatial, temporal)
Collect and store data, mine for new hypotheses, confirm with data or experimentation

Vast amounts of data have been collected from scientific domains
Climate and ecosystem modeling, chemical engineering, fluid dynamics, structural mechanics

Data Warehouses and data preprocessing

In scientific applications, methods are needed for integrating data from heterogeneous sources (geospatial data warehouse) and identifying events (climate and ecosystem data)

Mining complex data types

Scientific data: semi-structured and unstructured

Multimedia and Spatial data

Graph-based mining

Labeled graphs capture spatial, topological, geometric and other relational characteristics present in scientific data
Nodes: objects to be mined; edges: relationships between objects
Scalable and efficient mining methods are needed
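To make the labeled-graph idea concrete, the sketch below represents each graph as labeled nodes plus labeled edges and counts in how many graphs each single-edge pattern occurs. The graphs are hypothetical molecule-like toys, and restricting patterns to one edge is a deliberate simplification; real frequent-subgraph mining handles larger substructures and is far more involved.

```python
from collections import Counter

# Hypothetical labeled graphs: node id -> label, edges as (u, v, edge label).
graphs = [
    {"nodes": {1: "C", 2: "C", 3: "O"},
     "edges": [(1, 2, "single"), (2, 3, "double")]},
    {"nodes": {1: "C", 2: "O", 3: "H"},
     "edges": [(1, 2, "double"), (1, 3, "single")]},
]

def edge_patterns(graph):
    """Set of (node label, edge label, node label) patterns, ignoring direction."""
    patterns = set()
    for u, v, edge_label in graph["edges"]:
        a, b = sorted((graph["nodes"][u], graph["nodes"][v]))
        patterns.add((a, edge_label, b))
    return patterns

def pattern_support(graphs):
    """Number of graphs in which each single-edge pattern occurs."""
    support = Counter()
    for g in graphs:
        support.update(edge_patterns(g))
    return support

support = pattern_support(graphs)
common = [p for p, c in support.items() if c == len(graphs)]  # [("C", "double", "O")]
```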

Visualization tools and domain specific knowledge

High-level GUIs and visualization tools are needed
Integrated with existing domain-specific systems and database systems

Issues in Data Mining:

Mining methodology and user interaction

Mining different kinds of knowledge in databases

Interactive mining of knowledge at multiple levels of abstraction
Incorporation of background knowledge
Data mining query languages and ad-hoc data mining
Expression and visualization of data mining results
Handling noise and incomplete data
Pattern evaluation

Issues relating to the diversity of data types

Handling relational and complex types of data
Mining information from heterogeneous databases and global information systems

Performance and scalability

Efficiency and scalability of data mining algorithms
Parallel, distributed and incremental mining algorithms
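The point of incremental mining is to fold new data into existing results without rescanning everything. A minimal sketch, assuming the mined "knowledge" is just the set of frequent single items; the class name `IncrementalItemCounter` and the batches are hypothetical.

```python
from collections import Counter

class IncrementalItemCounter:
    """Keeps item counts up to date as new transaction batches arrive."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, batch):
        """Fold a new batch into the counts without rescanning old data."""
        for transaction in batch:
            self.counts.update(set(transaction))  # count each item once per transaction
            self.n_transactions += 1

    def frequent(self, min_support):
        """Items whose support over all transactions seen so far meets min_support."""
        return {item for item, c in self.counts.items()
                if c / self.n_transactions >= min_support}

counter = IncrementalItemCounter()
counter.update([["a", "b"], ["a"]])
first = counter.frequent(0.6)    # {"a"}
counter.update([["b"], ["b", "c"]])
second = counter.frequent(0.6)   # {"b"}
```

The same per-batch counts could be computed on separate machines and summed, which is the basic idea behind the parallel and distributed variants.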

