Web Mining

WEB MINING
PRESENTERS:
• Eshwari
• Kunal
• Parth
• Pranita
Introduction
◦ Web mining is the use of data mining techniques to discover and extract information from web documents and services.
◦ It aims to find and extract relevant information that is hidden in web-related data.
◦ Web mining is a subset of data mining.
◦ Data is collected from servers, clients, and databases.
◦ Web mining helps discover patterns and insights from the World Wide Web, including useful information carried by hyperlinks.
WWW – Global Information Service Center
The WWW is huge and widely distributed, spanning:
News, Advertisements, E-Commerce, Government, Social Media, Consumer Information, Access & Usage Information
Difference: Data Mining vs. Web Mining
1. Data mining: the process of discovering patterns and relationships in large datasets.
   Web mining: the process of extracting information and knowledge from web data.
2. Data mining: involves using statistical and computational techniques to identify meaningful insights from the data.
   Web mining: involves techniques used to analyze user behavior and trends in online content and to identify patterns in web-based transactions.
3. Data mining typically uses structured data.
   Web mining uses both structured and unstructured data.
WEB MINING TECHNIQUES
Types of Web Mining
Web mining can generally be divided into three categories, based on the data to be mined, as seen in the figure: web content mining, web structure mining, and web usage mining.
WEB CONTENT MINING
What is web content mining?
◦ A web page can contain text, graphics, audio, video, forms, applications, and other kinds of content, including user-generated content.
◦ Web content mining is the extraction of relevant and useful information from this content on a website or any other online platform.
Goal of web content mining
◦ Aims to discover patterns, insights, and trends in the large volume of data obtained from web content.
◦ The results are used to inform and improve business decisions, enhance search results, personalize content, and improve the overall user experience.
◦ Helps identify upcoming trends on social media and use them to advantage.
Tools and Technologies used
◦ The technologies used in content mining vary with the specific task and the data being analysed.
◦ Commonly used tools and technologies include web crawlers, natural language processing, machine learning, text mining, data visualization tools, and cloud computing platforms.
◦ Tools such as Scrapy, Selenium, ProWebScraper, RStudio, Tableau, Oracle Data Mining (ODM), and Octoparse, and link-analysis algorithms such as HITS and PageRank, are also used for web content mining.
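To make the text-mining step concrete, here is a minimal Python sketch that pulls the visible text out of an HTML page and counts its most frequent terms. It uses only the standard library (html.parser and collections.Counter) rather than any of the tools listed above; the helper names, the tiny stopword list, and the sample page are illustrative assumptions, not part of any particular tool's workflow.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on"}


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def top_terms(html, n=3):
    """Return the n most frequent non-stopword terms in a page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)


page = """<html><head><style>body { color: red; }</style></head>
<body><h1>Web mining</h1><p>Web mining applies data mining to web data.</p>
<script>var mining = 1;</script></body></html>"""

print(top_terms(page))
```

The word inside the script tag is ignored, so only the visible text contributes to the counts.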
HITS algorithm
◦ The Hyperlink-Induced Topic Search (HITS) algorithm is a link analysis algorithm that rates web pages as hubs or authorities: a good hub links to many good authorities, and a good authority is linked to by many good hubs.
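The hub/authority idea can be sketched as the usual iterative update, assuming the link graph is given as a Python dict from each page to the pages it links to. The function name hits, the fixed iteration count, and the toy graph are assumptions for illustration; a production implementation would typically test for convergence instead of running a fixed number of rounds.

```python
import math


def hits(graph, iterations=50):
    """Iterative HITS on a dict mapping each page to the pages it links to.

    Returns (hub, authority) score dicts, each normalised to unit L2 length.
    """
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of pages linking to p.
        auth = {p: 0.0 for p in pages}
        for p, targets in graph.items():
            for q in targets:
                auth[q] += hub[p]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: sum of the authority scores of pages p links to.
        hub = {p: sum(auth[q] for q in graph.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


links = {"A": ["C", "D"], "B": ["C", "D"], "C": ["D"], "D": []}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # D
```

D is linked to by every other page, so it becomes the strongest authority, while A and B, which point at both authorities, end up as the strongest hubs.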
[Figure: Web Content Mining Flow Chart]
WEB STRUCTURE MINING
What is Web Structure Mining?
◦ The process of analysing and extracting information from the link structure of the
World Wide Web.
◦ The link structure of the Web consists of the set of hyperlinks between web pages, which can be represented as a directed graph, with web pages as nodes and hyperlinks as edges.
◦ It involves several techniques for analysing and extracting information from this graph structure.
◦ One common technique is link analysis, which examines the relationships between
pages and their links, and can be used to identify important pages, such as
authoritative sources or hubs.
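PageRank, mentioned earlier alongside HITS, is one widely used link-analysis technique for identifying important pages. Below is a minimal power-iteration sketch, assuming the same dict-of-out-links graph representation, a damping factor of 0.85 (the commonly cited default), and a hypothetical four-page graph. Pages with no out-links spread their rank evenly over all pages, which is one of several common conventions.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping each page to its out-links."""
    pages = sorted(set(graph) | {q for targets in graph.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is shared by every page.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank equally along its out-links.
        for p, targets in graph.items():
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C
```

Page C receives links from three of the four pages, so it ends up with the highest score; D, which nothing links to, keeps only the baseline share.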
What is Web Structure Mining? - continued
o A typical web graph consists of web pages as nodes and hyperlinks as edges connecting related pages.
o Web structure mining techniques include link analysis, graph analysis, clustering, and classification.
o These techniques can be used to identify important websites or web pages, discover hidden communities or clusters of web pages, and analyse the evolution of the web over time.
TYPES OF LINKS
Link-based Classification: The task is to predict the category of a web page based on the words that occur on the page, links between pages, anchor text, HTML tags, and other attributes found on the page.
Link-based Cluster Analysis: The data is segmented into groups, where similar objects are grouped together and dissimilar objects are placed in different groups.
Link Type: There is a wide range of tasks concerning the prediction of links, such as predicting the type of link between two entities or the purpose of a link.
Link Strength: Links can be associated with weights that reflect their strength.
Link Cardinality: The main task here is to predict the number of links between objects.
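Link-based classification can draw on page words, anchor text, and tags as well as links. The sketch below keeps only the links and uses a simple neighbour-vote scheme: a page gets the most common category among the pages it links to or is linked from. The scheme, the function name classify_by_neighbors, and the page names are illustrative assumptions, not a method prescribed by this presentation.

```python
from collections import Counter


def classify_by_neighbors(links, labels, page):
    """Guess a page's category from the known categories of its linked pages.

    links: dict page -> pages it links to (direction is ignored here).
    labels: dict page -> known category for already-labelled pages.
    Returns the most common neighbour category, or None if none is labelled.
    """
    neighbours = set(links.get(page, []))
    neighbours |= {p for p, targets in links.items() if page in targets}
    votes = Counter(labels[n] for n in neighbours if n in labels)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


links = {
    "new_page": ["cnn", "bbc"],
    "blog": ["new_page"],
    "cnn": [],
    "bbc": [],
}
labels = {"cnn": "news", "bbc": "news", "blog": "personal"}
print(classify_by_neighbors(links, labels, "new_page"))  # news
```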
Applications
◦ Marketing
◦ E-commerce
◦ Information Retrieval
WEB USAGE MINING
[Figure: Web Mining divided into Web Structure Mining, Web Content Mining, and Web Usage Mining]
What is Web Usage Mining?
◦ Extracting useful information and discovering user ‘navigation patterns’ from the data generated by users' interactions with the web.
◦ Predicting user behavior while the user interacts with the web.
Usage Mining Process
◦ Data collection: at the server level and the client level
◦ Data analysis: identify users, clicks, location, and duration
◦ Data mining: navigation patterns and sequential patterns
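The data-analysis step usually begins by grouping raw log records into user sessions. Below is a minimal sketch under three assumptions: each server-level record has already been parsed into a (user, timestamp, page) tuple, users are identified by IP address, and a gap of more than 30 minutes starts a new session (a common heuristic, not a fixed rule). The sketch counts clicks and measures duration per session. Location is left out for brevity, and the function names and log entries are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def _summarize(user, clicks):
    """Summarise one session's (timestamp, page) clicks."""
    return {
        "user": user,
        "pages": [page for _, page in clicks],
        "clicks": len(clicks),
        "duration": clicks[-1][0] - clicks[0][0],
    }


def sessionize(records):
    """Group (user, timestamp, page) records into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))

    sessions = []
    for user, clicks in by_user.items():
        clicks.sort()  # order each user's clicks by time
        current = [clicks[0]]
        for ts, page in clicks[1:]:
            # Too long since the previous click: close the session.
            if ts - current[-1][0] > SESSION_TIMEOUT:
                sessions.append(_summarize(user, current))
                current = []
            current.append((ts, page))
        sessions.append(_summarize(user, current))
    return sessions


t = datetime(2005, 6, 20, 9, 0)
log = [
    ("10.0.0.1", t, "/home"),
    ("10.0.0.1", t + timedelta(minutes=2), "/products"),
    ("10.0.0.2", t + timedelta(minutes=1), "/home"),
    ("10.0.0.1", t + timedelta(hours=2), "/home"),
]
sessions = sessionize(log)
for s in sessions:
    print(s["user"], s["pages"], s["clicks"], s["duration"])
```

The first user returns after two hours, so that visit is counted as a separate session.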
Data Mining Techniques – Navigation Patterns
[Figure: Web page hierarchy of a web site, with pages including B, C, D, and E]
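Once sessions are available, a simple starting point for navigation patterns is to count direct page-to-page transitions. The sessions below are made up for illustration. They use the page labels B, C, D, and E from the hierarchy figure, plus a hypothetical home page A. The function name is also an assumption.

```python
from collections import Counter


def transition_counts(sessions):
    """Count how often users moved directly from one page to the next."""
    counts = Counter()
    for pages in sessions:
        # Pair each page with the page visited right after it.
        counts.update(zip(pages, pages[1:]))
    return counts


sessions = [
    ["A", "B", "E"],
    ["A", "B", "E"],
    ["A", "C", "D"],
    ["A", "B"],
]
print(transition_counts(sessions).most_common(2))  # [(('A', 'B'), 3), (('B', 'E'), 2)]
```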

Data Mining Techniques – Sequential Patterns Example
Customer   Transaction Time    Purchased Items
John       6/21/05 5:30 pm     Beer
John       6/22/05 10:20 pm    Brandy
Frank      6/20/05 10:15 am    Juice, Coke
Frank      6/20/05 11:50 am    Beer
Frank      6/20/05 12:50 pm    Wine, Cider
Mary       6/20/05 2:30 pm     Beer
Mary       6/21/05 6:17 pm     Wine, Cider
Mary       6/22/05 5:05 pm     Brandy
Data Mining Techniques – Sequential Patterns Example - continued
Customer Sequences
Customer   Customer Sequence
John       (Beer) (Brandy)
Frank      (Juice, Coke) (Beer) (Wine, Cider)
Mary       (Beer) (Wine, Cider) (Brandy)

Mining Result
Sequential Pattern      Supporting Customers
(Beer) (Brandy)         John, Mary
(Beer) (Wine, Cider)    Frank, Mary
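The mining result above can be checked programmatically. This sketch builds each customer's time-ordered sequence from the transaction table and reports which customers support a given pattern. It only verifies the two listed patterns and does not enumerate all candidates the way a full sequential-pattern miner (for example GSP) would. It reads Frank's third purchase as 12:50 pm, the only reading consistent with his customer sequence. The function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime

TRANSACTIONS = [
    ("John", "6/21/05 5:30 pm", {"Beer"}),
    ("John", "6/22/05 10:20 pm", {"Brandy"}),
    ("Frank", "6/20/05 10:15 am", {"Juice", "Coke"}),
    ("Frank", "6/20/05 11:50 am", {"Beer"}),
    ("Frank", "6/20/05 12:50 pm", {"Wine", "Cider"}),
    ("Mary", "6/20/05 2:30 pm", {"Beer"}),
    ("Mary", "6/21/05 6:17 pm", {"Wine", "Cider"}),
    ("Mary", "6/22/05 5:05 pm", {"Brandy"}),
]


def customer_sequences(transactions):
    """Order each customer's purchases by time to form an itemset sequence."""
    by_customer = defaultdict(list)
    for customer, when, items in transactions:
        ts = datetime.strptime(when, "%m/%d/%y %I:%M %p")
        by_customer[customer].append((ts, frozenset(items)))
    return {
        c: [items for _, items in sorted(rows, key=lambda r: r[0])]
        for c, rows in by_customer.items()
    }


def contains(sequence, pattern):
    """True if each itemset of pattern is a subset of a later itemset, in order."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def supporting_customers(sequences, pattern):
    pattern = [frozenset(p) for p in pattern]
    return sorted(c for c, seq in sequences.items() if contains(seq, pattern))


seqs = customer_sequences(TRANSACTIONS)
print(supporting_customers(seqs, [{"Beer"}, {"Brandy"}]))         # ['John', 'Mary']
print(supporting_customers(seqs, [{"Beer"}, {"Wine", "Cider"}]))  # ['Frank', 'Mary']
```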
Association
◦ Example: Supermarket
Transaction ID   Items Purchased
1                butter, milk
2                bread, milk, beer, egg
3                diaper
…                …
An association rule can be:
“If a customer buys milk, then in 50% of cases he/she also buys beer.” Milk and beer are bought together in 33% of all transactions.
50%: confidence
33%: support
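The 50% and 33% figures can be reproduced from the three transactions shown in the table. The rows elided with "…" are left out. This is a minimal sketch with hypothetical helper names support and confidence, using the standard definitions. Support is the fraction of all transactions that contain both items. Confidence is that fraction divided by the fraction of transactions that contain milk.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction also containing consequent."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


transactions = [
    {"butter", "milk"},
    {"bread", "milk", "beer", "egg"},
    {"diaper"},
]
rule_support = support(transactions, {"milk", "beer"})
rule_confidence = confidence(transactions, {"milk"}, {"beer"})
print(f"support={rule_support:.0%}, confidence={rule_confidence:.0%}")  # support=33%, confidence=50%
```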
◦ Discovery of meaningful patterns from data generated by client-server transactions
◦ Restructure a website
◦ Extract user access patterns to target specific ads
◦ Predict user behavior based on previously learned rules and users’ profiles
◦ Present dynamic information to users based on their interests and profiles
Conclusion
◦ As web usage and the information available on the World Wide Web continue to grow, web mining offers a good opportunity to extract hidden knowledge from the web.
◦ As a weakness, some researchers (though not all) have replaced web mining with text mining.
◦ This is limiting, since web mining deals with a great deal of multimedia information, whereas text mining handles only textual data.
