Web Mining
PRESENTERS:
• Eshwari
• Kunal
• Parth
• Pranita
Introduction
◦ Web mining is the use of data mining techniques to discover and extract information from web documents and services.
◦ It aims to find and extract relevant information that is hidden in web-related data.
◦ Web mining is a subset of data mining.
◦ Data is collected from servers, clients, and databases.
◦ Web mining helps discover patterns and insights from the World Wide Web, including useful information carried by hyperlinks.
WWW: Global Information Service Center
◦ Data mining uses structured data, whereas web mining uses both structured and unstructured data.
WEB MINING TECHNIQUES
Types of Web Mining: web content mining, web structure mining, and web usage mining
Web Content Mining
◦ Every webpage can have text, graphics, audio, video, forms, applications,
and more kinds of content
◦ Web content mining is the extraction of relevant and useful information from all of this content, including user-generated content, on a website or any other online platform.
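As a rough illustration of extracting the text and hyperlinks of a single page, the sketch below uses Python's standard-library `html.parser`. The class `ContentExtractor`, the helper `extract_content`, and the sample page are hypothetical; a real content-mining pipeline would add crawling, encoding detection, and much richer filtering.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects visible text and hyperlink targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text fragments
        self.links = []        # href values of <a> tags
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not script or stylesheet source.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_content(html):
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


# Hypothetical sample page, for illustration only.
page = """
<html><head><title>Store</title><style>p {color: red}</style></head>
<body><h1>Offers</h1><p>Milk is on sale.</p>
<a href="/dairy">Dairy</a> <script>var x = 1;</script></body></html>
"""
text, links = extract_content(page)
print(text)   # → Store Offers Milk is on sale. Dairy
print(links)  # → ['/dairy']
```

The extracted text can then feed text-mining or NLP steps, while the collected links feed the crawler or link analysis.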
Goal of web content mining
◦ Aims to discover patterns, insights, and trends in the large volume of data obtained through web content mining.
◦ The results are used to inform and improve business decisions, enhance search results, personalize content, and improve the overall user experience.
◦ Helps identify upcoming trends on social media and use them to advantage.
Tools and Technologies used
◦ The technologies used in content mining vary with the specific task and the data being analysed.
◦ Commonly used tools and technologies include web crawlers, natural language processing, machine learning, text mining, data visualization tools, and cloud computing platforms.
◦ Tools such as Scrapy, Selenium, ProWebScraper, RStudio, Tableau, Oracle Data Mining (ODM), and Octoparse, and link-analysis algorithms such as HITS and PageRank, are also used for web content mining.
HITS Algorithm
◦ The HITS (Hyperlink-Induced Topic Search) algorithm is a link analysis algorithm that rates web pages as hubs or authorities: a good hub links to many good authorities, and a good authority is linked to by many good hubs.
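The hub/authority idea can be sketched as a power iteration: a page's authority score is the sum of the hub scores of the pages linking to it, its hub score is the sum of the authority scores of the pages it links to, and both are normalised each round. This minimal Python version assumes the link graph is given as a dict of outgoing links and skips the query-specific root/base-set construction that HITS performs before iterating. The function name, sample graph, and iteration count are illustrative assumptions.

```python
import math


def hits(graph, iterations=50):
    """Return (hub, authority) scores; graph maps each page to its out-links."""
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of pages linking to q.
        auth = {p: 0.0 for p in pages}
        for p, targets in graph.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum of the authority scores of the pages p links to.
        hub = {p: sum(auth[q] for q in graph.get(p, [])) for p in pages}
        # Normalise both vectors to unit length so the values stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: A and B both link to C and D, and C links to D.
graph = {"A": ["C", "D"], "B": ["C", "D"], "C": ["D"], "D": []}
hub, auth = hits(graph)
print(max(auth, key=auth.get))  # → D
```

D, which every other page links to, comes out as the strongest authority, while A and B, which link to both C and D, come out as the strongest hubs.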
[Figure: Web content mining flow chart]
◦ Web structure mining is the process of analysing and extracting information from the link structure of the World Wide Web.
◦ The link structure of the Web consists of the set of hyperlinks between web pages,
which can be represented as a directed graph, with web pages as nodes and
hyperlinks as edges
◦ Involves several techniques for analysing and extracting information from this graph
structure.
◦ One common technique is link analysis, which examines the relationships between
pages and their links, and can be used to identify important pages, such as
authoritative sources or hubs.
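A minimal sketch of the directed-graph representation described above, paired with PageRank (one of the link-analysis algorithms listed among the tools) for ranking important pages. It assumes a dict that maps each page to its out-links, a damping factor of 0.85, and a fixed iteration count instead of a convergence test; the function name and page names are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """graph maps each page (node) to the pages it links to (directed edges)."""
    pages = sorted(set(graph) | {q for targets in graph.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = graph.get(p, [])
            if targets:
                # Split p's rank evenly across its out-links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical site: pages are nodes, hyperlinks are directed edges.
web = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "item1", "item2"],
    "item1": ["products"],
    "item2": ["products"],
}
# In-degree: how many hyperlinks point at each page.
in_degree = {p: 0 for p in web}
for targets in web.values():
    for q in targets:
        in_degree[q] += 1

scores = pagerank(web)
print(in_degree)                   # → {'home': 2, 'about': 1, 'products': 3, 'item1': 1, 'item2': 1}
print(max(scores, key=scores.get)) # → products
```

Counting in-links is the simplest importance signal; PageRank refines it by weighting each link by the rank of the page it comes from.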
What is Web Structure Mining? (continued)
Link Type: There is a wide range of link prediction tasks, such as predicting whether a link exists, the type of link between two entities, or the purpose of a link.
Link Cardinality: The main task here is to predict the number of links
between objects.
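Both tasks can be illustrated on a small set of links. The sketch below treats links as undirected and uses a common-neighbours count, one simple heuristic for predicting link existence (not the only one), plus a plain count of links per pair of objects for link cardinality. The function names and sample links are hypothetical.

```python
from collections import Counter
from itertools import combinations


def common_neighbour_scores(edges):
    """Score every unlinked pair of nodes by how many neighbours they share."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v not in neighbours[u]:  # only pairs that are not linked yet
            scores[(u, v)] = len(neighbours[u] & neighbours[v])
    return scores


def link_cardinality(edges):
    """Count how many links connect each unordered pair of objects."""
    return Counter(tuple(sorted(edge)) for edge in edges)


# Hypothetical links between pages; the pair (A, B) is linked twice.
links = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"),
         ("C", "D"), ("B", "D"), ("D", "E")]
scores = common_neighbour_scores(links)
counts = link_cardinality(links)
print(max(scores, key=scores.get))  # → ('A', 'D')
print(counts[("A", "B")])           # → 2
```

A and D are not linked but share two neighbours (B and C), so they are the most likely candidates for a new link.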
Applications
◦ Marketing
◦ E-commerce
◦ Information Retrieval
[Figure: sequential pattern example; customers Frank and Mary both purchase (Beer) followed by (Wine, Cider)]
Association
Transaction ID | Items Purchased
1 | butter, milk
2 | bread, milk, beer, egg
3 | diaper
… | …
◦ Example: Supermarket
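An association rule mined from such data might have:
50% confidence (the fraction of transactions containing the rule's left-hand side that also contain its right-hand side)
33% support (the fraction of all transactions that contain every item in the rule)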
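The two percentages can be checked with a short calculation. The rule itself is not recoverable from the slide text, so this sketch assumes the hypothetical rule {milk} ⇒ {bread} and uses only the three transactions listed in the table: milk and bread appear together in 1 of 3 transactions (33% support), and bread appears in 1 of the 2 transactions that contain milk (50% confidence).

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Only the three transactions shown in the table; the elided rows are ignored.
transactions = [
    {"butter", "milk"},
    {"bread", "milk", "beer", "egg"},
    {"diaper"},
]
# Hypothetical rule {milk} => {bread}.
s = support(transactions, {"milk", "bread"})       # 1 of 3 transactions
c = confidence(transactions, {"milk"}, {"bread"})  # 1 of the 2 milk transactions
print(f"support = {s:.0%}, confidence = {c:.0%}")  # → support = 33%, confidence = 50%
```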
Web usage mining is the discovery of meaningful patterns from data generated by client-server transactions.
◦ Restructure a website
◦ Extract user access patterns to target specific ads
◦ Predict user behavior based on previously learned rules and users' profiles
◦ Present dynamic information to users based on their interests and profiles
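To make "extract user access patterns" concrete, here is a minimal sketch that parses web server log lines, groups requests by client host, and counts page-to-page transitions. It assumes logs in the Common Log Format, matches only GET requests, and treats each host as a single session; real usage mining usually splits sessions by an inactivity timeout and filters out robots. The log entries and the function name are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Start of a Common Log Format line: host, ident, user, [timestamp], "GET path ...".
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+) [^"]*"')


def page_transitions(log_lines):
    """Group requested pages by client host and count page-to-page transitions."""
    paths = defaultdict(list)
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match:
            host, _timestamp, page = match.groups()
            paths[host].append(page)
    transitions = Counter()
    for pages in paths.values():
        # Consecutive pairs, e.g. /home -> /products.
        transitions.update(zip(pages, pages[1:]))
    return paths, transitions


# Hypothetical log entries, for illustration only.
log = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:56:01 +0000] "GET /products HTTP/1.1" 200 1024',
    '10.0.0.2 - - [10/Oct/2023:13:56:12 +0000] "GET /products HTTP/1.1" 200 1024',
    '10.0.0.2 - - [10/Oct/2023:13:57:30 +0000] "GET /cart HTTP/1.1" 200 256',
]
paths, transitions = page_transitions(log)
print(transitions.most_common(1))  # → [(('/home', '/products'), 2)]
```

Frequent transitions like /home to /products are the kind of pattern that could guide restructuring a site or placing targeted ads.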
Conclusion
◦ As web usage and the information available on the World Wide Web continue to grow, web mining offers a good opportunity to extract hidden knowledge from the web.
◦ As a weakness, some researchers (though not all) have replaced web mining with text mining.
◦ This is because web mining deals with a great deal of multimedia information, whereas text mining handles only textual data.