
WEB MINING

by NINI P SURESH

PROJECT CO-ORDINATOR: Kavitha Murugeshan

OUTLINE
• Introduction
• Data mining vs. Web mining
• Web mining subtasks
• Challenges
• Taxonomy
• Web content mining
• Web structure mining
• Web usage mining
• Applications

INTRODUCTION
Users increasingly need automated tools to find, extract, filter and evaluate the information and resources they want. Search engines aim only to discover resources on the Web.

INTRODUCTION

Needs for Web Mining
• Narrow search scope
• Low precision

INTRODUCTION

Other Approaches
• Database approach (DB)
• Information retrieval
• Natural language processing (NLP)
• Web document community

WEB MINING

DEFINITION

Web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from the Web data.


DATA MINING
• Extraction of useful patterns from data sources such as databases, texts, the Web, images, etc.

WEB MINING
• Extraction of relevant information hidden in Web-related data, such as hypertext documents on the Web

WEB MINING SUBTASKS


• Resource finding
• Information selection & preprocessing
• Generalization
• Analysis

CHALLENGES

• Finding relevant information on the Web
• Creating knowledge
• Personalization of information
• Learning patterns
• Uniformity & standardisation

CHALLENGES

• Redundant information
• Noisy Web
• Monitoring changes
• Sites providing services
• Privacy

TAXONOMY
Web Mining
  Web Content Mining
    Web Text Mining
    Web Multimedia Mining
  Web Structure Mining
    Link Mining
    Internal Structure Mining
    URL Mining
  Web Usage Mining
    General Access Pattern Tracking
    Personalized Usage Tracking

WEB CONTENT MINING


• Discovers useful information by analysing Web content
• Automatic process that goes beyond keyword extraction
• Includes approaches to restructure document content
• Two groups of mining strategies: agent-based and database approaches
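A first step in content mining can be sketched as pulling the visible text out of a page and counting its terms. This is only a minimal illustration using Python's standard library; the names TextExtractor, term_frequencies and sample_page are hypothetical and do not come from the slides.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def term_frequencies(html):
    """Return a Counter of lowercase word tokens found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Hypothetical sample page, used only for illustration.
sample_page = (
    "<html><head><title>Web mining</title><style>p {color: red}</style></head>"
    "<body><p>Web mining extends data mining to Web data.</p></body></html>"
)
freqs = term_frequencies(sample_page)
```

Term counts like these would normally feed a later step such as categorization or clustering, which is where content mining goes beyond plain keyword extraction.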


WEB CONTENT MINING


Agent-based approach
• Intelligent search agents
• Information filtering/categorization
• Personalized Web agents

WEB CONTENT MINING


Database approach
• Multilevel databases
• Web query systems

WEB STRUCTURE MINING

• Discovers structure information from the Web
• Web graph: Web pages as nodes & hyperlinks as edges
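The Web graph idea can be sketched as a dictionary that maps each page URL to the pages it links to. The sketch below assumes the pages are already downloaded; LinkExtractor, build_web_graph and the example.org URLs are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def build_web_graph(pages):
    """pages: dict of URL -> HTML. Returns dict of URL -> sorted out-links.

    Only links to pages inside the collection are kept as edges,
    and self-links are dropped.
    """
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        graph[url] = sorted({t for t in parser.links if t in pages and t != url})
    return graph


# Hypothetical three-page site.
pages = {
    "http://example.org/a": '<a href="/b">B</a> <a href="c">C</a>',
    "http://example.org/b": '<a href="http://example.org/c">C</a>',
    "http://example.org/c": '<a href="/a">A</a> <a href="http://other.example/x">X</a>',
}
web_graph = build_web_graph(pages)
```

An adjacency dictionary of this kind is a convenient input for link-analysis algorithms, since each key is a node and each list entry is a directed edge.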


WEB STRUCTURE MINING

Two algorithms for handling links
• PageRank
• HITS

WEB STRUCTURE MINING

PageRank
• Metric for ranking hypertext documents
• Depends on the rank of the pages pointing to it
• Iterative process
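The iterative process can be sketched as repeated rank updates until the values stop changing. This is a minimal illustration, not the exact variant from the slides: it assumes a damping factor d of 0.85 applied to the link term, and it spreads the rank of pages without out-links evenly, which is one common convention. The function name pagerank and the toy graph are hypothetical.

```python
def pagerank(graph, d=0.85, tol=1e-10, max_iter=200):
    """graph: dict of page -> list of pages it links to.

    Every link target must also appear as a key of the dict.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread over all pages.
        dangling = sum(rank[q] for q in nodes if not graph[q])
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in nodes}
        # Each page q passes an equal share of its rank to every page it links to.
        for q in nodes:
            for p in graph[q]:
                new_rank[p] += d * rank[q] / len(graph[q])
        change = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy graph: A links to B and C, B links to C, C links back to A.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_graph)
```

In this toy graph C ends up with the highest rank because both other pages point to it, and A comes second because it receives all of C's rank.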


WEB STRUCTURE MINING

n: number of nodes in the graph
Outdegree(q): number of hyperlinks on page q
d: damping factor
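One commonly cited form of the PageRank formula that uses these three symbols is sketched below. Which term carries d is an assumption here, since some sources exchange the roles of d and 1 - d. PR(p) for the rank of page p and E for the set of hyperlinks are notation introduced for this sketch.

```latex
PR(p) = \frac{1 - d}{n} + d \sum_{(q,\,p) \in E} \frac{PR(q)}{\mathrm{Outdegree}(q)}
```

The sum runs over every page q that links to p, so each page passes an equal share of its own rank to the pages it points to.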


WEB STRUCTURE MINING

HITS
• Iterative algorithm
• Identifies topic hubs & authorities
• Input: search results returned by a traditional text indexing technique

WEB STRUCTURE MINING

• Assigns a weight to each hub based on the authoritativeness of the pages it points to
• Outputs the pages with the largest hub & authority weights
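The hub and authority updates can be sketched as two mutually reinforcing sums that are normalised after every round. This is a minimal illustration under simple assumptions (a fixed number of iterations, Euclidean normalisation); the function name hits and the toy graph are hypothetical.

```python
import math


def hits(graph, iterations=50):
    """graph: dict of page -> list of pages it links to.

    Every link target must also appear as a key of the dict.
    Returns two dicts: hub scores and authority scores.
    """
    nodes = list(graph)
    hub = {p: 1.0 for p in nodes}
    auth = {p: 1.0 for p in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        auth = {p: 0.0 for p in nodes}
        for q in nodes:
            for p in graph[q]:
                auth[p] += hub[q]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {q: sum(auth[p] for p in graph[q]) for q in nodes}
        # Normalise both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Toy graph: three candidate hubs pointing at two candidate authorities.
toy = {"H1": ["A1", "A2"], "H2": ["A1", "A2"], "H3": ["A1"], "A1": [], "A2": []}
hub, auth = hits(toy)
```

Here A1 gets the largest authority weight because all three hubs point to it, while H1 and H2 outrank H3 as hubs because they point to both authorities.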


WEB USAGE MINING


• Extracts information from server logs
• Discovers user access patterns of Web pages
• Decomposed into 3 subtasks: preprocessing, pattern discovery, pattern analysis

[Figure: Raw logs and site files → Preprocessing → User session file → Mining algorithms → Rules, patterns & statistics → Pattern analysis → Interesting rules, patterns & statistics]

WEB USAGE MINING

Preprocessing
• Data cleaning
• User identification
• User session identification
• Access path supplement
• Transaction identification
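The first three preprocessing steps can be sketched on a few raw log lines. This is a minimal illustration that assumes Common Log Format entries, treats each IP address as one user, and starts a new session after 30 minutes of inactivity. Those choices, the names clean and sessionize, and the sample log are assumptions made for the sketch. Access path supplement and transaction identification are not covered.

```python
import re
from datetime import datetime, timedelta

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def clean(lines):
    """Data cleaning: keep successful GET requests for pages only."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue
        if m["url"].lower().endswith(IGNORED_SUFFIXES):
            continue
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        yield m["ip"], when, m["url"]


def sessionize(records):
    """User and session identification: group by IP, split on long gaps."""
    sessions = []
    open_session = {}  # ip -> (time of last request, list of URLs)
    for ip, when, url in sorted(records, key=lambda r: r[1]):
        last = open_session.get(ip)
        if last is None or when - last[0] > SESSION_TIMEOUT:
            pages = []
            sessions.append((ip, pages))
        else:
            pages = last[1]
        pages.append(url)
        open_session[ip] = (when, pages)
    return sessions


# Hypothetical raw log entries.
raw_log = [
    '10.0.0.1 - - [10/Oct/2023:13:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [10/Oct/2023:13:00:01 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2023:13:01:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [10/Oct/2023:13:05:00 +0000] "GET /products.html HTTP/1.1" 200 2210',
    '10.0.0.2 - - [10/Oct/2023:13:02:00 +0000] "GET /missing.html HTTP/1.1" 404 0',
    '10.0.0.1 - - [10/Oct/2023:14:30:00 +0000] "GET /contact.html HTTP/1.1" 200 800',
]
sessions = sessionize(clean(raw_log))
```

The image request and the failed request are dropped during cleaning, and the visit at 14:30 opens a second session for the first user because it comes more than 30 minutes after that user's previous request.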


WEB USAGE MINING

Pattern discovery
• Statistical analysis
• Association rules
• Clustering analysis
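Association rule discovery can be sketched on a handful of user sessions by computing support and confidence for page pairs. This is a minimal illustration restricted to rules with one page on each side; the thresholds, the function name association_rules and the sample sessions are assumptions made for the sketch.

```python
from itertools import combinations


def association_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) between single pages.

    support    = share of sessions containing both pages
    confidence = support of the pair / support of the left-hand page
    """
    n = len(sessions)
    page_sets = [set(s) for s in sessions]

    def support(items):
        return sum(1 for s in page_sets if items <= s) / n

    pages = sorted(set().union(*page_sets))
    rules = []
    for a, b in combinations(pages, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Test the rule in both directions.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical sessions, each a list of visited pages.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = association_rules(sessions)
```

With these sessions the only rule that passes both thresholds is /cart → /products: every session that contains the cart page also contains the products page.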


WEB USAGE MINING

• Classification analysis
• Sequential patterns
• Dependency modeling
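Sequential pattern discovery can be sketched by counting ordered runs of pages that recur across sessions. This is a deliberately simplified illustration: it only counts contiguous page sequences of a fixed length, whereas full sequential pattern mining also allows gaps. The name frequent_sequences and the sample sessions are hypothetical.

```python
from collections import Counter


def frequent_sequences(sessions, length=2, min_support=2):
    """Return contiguous page sequences seen in at least min_support sessions."""
    counts = Counter()
    for pages in sessions:
        seen = set()
        for i in range(len(pages) - length + 1):
            seen.add(tuple(pages[i:i + length]))
        # Count each sequence at most once per session.
        counts.update(seen)
    return {seq: c for seq, c in counts.items() if c >= min_support}


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/home", "/products"],
    ["/products", "/cart"],
    ["/about", "/home"],
]
patterns = frequent_sequences(sessions)
```

Unlike association rules, the order of the pages matters here: /home followed by /products is a different pattern from /products followed by /home.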


WEB USAGE MINING

Pattern Analysis
• Eliminates irrelevant rules or patterns
• Extracts interesting patterns
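One way to eliminate irrelevant rules can be sketched with a lift filter, which drops rules whose confidence is no better than the overall popularity of the target page. Lift is an interestingness measure chosen for this sketch, not one named in the slides; the threshold, the function name interesting_rules and the sample numbers are likewise assumptions.

```python
def interesting_rules(rules, page_support, min_lift=1.1):
    """Keep rules whose lift (confidence / support of rhs) reaches min_lift.

    rules:        list of (lhs, rhs, confidence)
    page_support: dict of page -> share of sessions containing that page
    """
    kept = []
    for lhs, rhs, confidence in rules:
        lift = confidence / page_support[rhs]
        if lift >= min_lift:
            kept.append((lhs, rhs, round(lift, 3)))
    return kept


# Hypothetical discovered rules and page supports.
page_support = {"/home": 0.9, "/cart": 0.2, "/products": 0.5}
candidate_rules = [
    ("/products", "/home", 0.9),   # lift 1.0: the home page is popular anyway
    ("/products", "/cart", 0.36),  # lift 1.8: products visitors reach the cart more often
]
kept = interesting_rules(candidate_rules, page_support)
```

The first rule has high confidence but says nothing new, since almost every session visits the home page, so only the second rule survives the filter.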


APPLICATIONS

• Personalized services
• Improve website design
• System improvement
• Predicting trends
• Carry out intelligent business

PROS

• High trade volumes
• Classify threats & fight against terrorism
• Establish better customer relationships
• Increase profitability

CONS

• Invasion of privacy
• Discrimination based on controversial attributes

CONCLUSION

• Rapidly growing area
• Promising area of future research


WEB MINING

Thank You
