Navi-2. Literature Survey
2. LITERATURE SURVEY
The unabated growth and increasing significance of the World Wide Web has
resulted in a flurry of research activity to improve its capacity for serving information
more effectively. But at the heart of these efforts lie implicit assumptions about
“quality” and “usefulness” of Web resources and services. This observation points
towards measurements and models that quantify various attributes of web sites. The
science of measuring all aspects of information, especially its storage and retrieval, known as
informetrics, interested information scientists for decades before the existence of
the Web. Is Web informetrics any different, or is it just an application of classical
informetrics to a new medium? In this paper, we examine this issue by classifying and
discussing a wide ranging set of Web metrics. We present the origins, measurement
functions, formulations and comparisons of well known Web metrics for quantifying
Web graph properties, web page significance, web page similarity, search and
retrieval, usage characterization and information theoretic properties. We also discuss
how these metrics can be applied for improving Web information access and use.
In this paper we have reviewed and classified some well known Web metrics.
Our approach has been to consider these metrics in the context of improving Web
content while intuitively explaining their origins and formulations. This analysis is
fundamental to modeling the phenomena that give rise to the measurements. To our
knowledge, this is the first survey that incorporates an extensive treatment of a wide
range of metrics and measurement functions. Nevertheless, we do not claim that this
survey is complete, and we acknowledge any omissions. We hope that this initiative
will serve as a reference point for further evolution of new metrics for
characterizing and quantifying information on the Web and developing the
explanatory models associated with them.
Page | 3
AVANTHI Institute Of Engineering & Technology,
Gunthapally, RR Dist.
Department of Computer Science Engineering
Facilitating Effective User Navigation 2. LITERATURE SURVEY
To address the questions above, we describe the design space of adaptive web sites
and consider a case study: the problem of synthesizing new index pages that facilitate
navigation of a web site. We present the PageGather algorithm, which automatically
identifies candidate link sets to include in index pages based on user access logs. We
demonstrate experimentally that PageGather outperforms the Apriori data mining
algorithm on this task. In addition, we compare PageGather's link sets to pre-existing,
human-authored index pages. The work reported in this paper is part of our ongoing
research effort to develop adaptive web sites and increase their degree of automation.
We list our main contributions below:
1. We motivated the notion of adaptive web sites and analyzed the design space for
such sites, locating previous work in that space.
long-term goal of change in view: adaptive sites that automatically suggest alternative
organizations of their contents based on visitor access patterns.
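As a rough illustration of how candidate link sets might be derived from user access logs, the sketch below counts how often pairs of pages are visited in the same session and groups pages whose co-occurrence count passes a threshold. It is a simplified co-occurrence clustering under assumed inputs, not the PageGather algorithm itself; the sample sessions, the `min_count` threshold and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_link_sets(sessions, min_count=2):
    """Group pages that are often visited together in the same session.

    sessions: list of lists of page paths (one inner list per visit).
    min_count: hypothetical threshold on pair co-occurrence counts.
    """
    # Count how many sessions contain each unordered pair of pages.
    pair_counts = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            pair_counts[(a, b)] += 1

    # Build an undirected graph that keeps only the frequent pairs.
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            graph[a].add(b)
            graph[b].add(a)

    # Each connected component of the graph is one candidate link set.
    seen, link_sets = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            page = stack.pop()
            if page in component:
                continue
            component.add(page)
            stack.extend(graph[page] - component)
        seen |= component
        link_sets.append(sorted(component))
    return link_sets


# Hypothetical sessions, invented for illustration only.
sessions = [
    ["/courses", "/courses/ai", "/courses/db"],
    ["/courses/ai", "/courses/db"],
    ["/staff", "/staff/contact"],
    ["/staff", "/staff/contact", "/courses"],
]
print(candidate_link_sets(sessions))
```

Each returned list could be offered to a webmaster as the contents of a new index page. Raising `min_count` yields fewer, tighter link sets.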
The two most important tasks in information extraction from the Web are
webpage structure understanding and natural language sentence processing.
However, little work has been done towards an integrated statistical model for
understanding webpage structures and processing natural language sentences within
the HTML elements. Our recent work on webpage understanding introduces a joint
model of Hierarchical Conditional Random Fields (HCRF) and extended Semi-Markov
Conditional Random Fields (Semi-CRF) to leverage the page structure
understanding results in free text segmentation and labeling. In this top-down
integration model, the decision of the HCRF model can guide the decision-making
of the Semi-CRF model. However, the drawback of the top-down integration strategy
is also apparent: the decision of the Semi-CRF model cannot be used by the
HCRF model to guide its decision-making. This paper proposes a novel framework
called WebNLP, which enables bidirectional integration of page structure
understanding and text understanding in an iterative manner. We have applied the
proposed framework to local business entity extraction and Chinese person and
organization name extraction. Experiments show that the WebNLP framework
achieved significantly better performance than existing methods.
procedure. The auxiliary corpus is introduced to train the statistical language features
in the extended Semi-CRF model for text understanding, and the multiple occurrence
features are also used in the extended Semi-CRF model by adding the decision of the
model in the last iteration. Therefore, the extended Semi-CRF model is improved by
using both the labels of the vision nodes assigned by the HCRF model and the text
segmentation and labeling results given by the extended Semi-CRF model itself in
the last iteration as additional input parameters in some feature functions; the extended
HCRF model benefits from the extended Semi-CRF model by using the
segmentation and labeling results of the text strings explicitly in the feature functions.
The WebNLP framework closes the loop in webpage understanding for the first time.
The experimental results show that the WebNLP framework performs significantly
better than the state-of-the-art algorithms on English local entity extraction and
Chinese named entity extraction on Web pages.
index, extract, and navigate significant information from a Web site. Experiments on
several real news Web sites show that the precision and the recall of our approaches
are much superior to those obtained by conventional methods in mining the
informative structures of news Web sites. On average, the augmented LAMIS
leads to a prominent performance improvement and increases the precision by a factor
ranging from 122 to 257 percent when the desired recall falls between 0.5 and 1. In
comparison with manual heuristics, the precision and the recall of InfoDiscoverer are
greater than 0.956.
In this paper, we propose a system, composed of LAMIS and InfoDiscoverer,
to mine informative structures and contents from Web sites. Given an entrance URL
of a Web site, our system is able to crawl the site, parse its pages, analyze entropies of
features, links and contents, and mine the informative structures and contents of the
site. With a fully automatic flow, the system can serve as a preprocessor for
search engines and Web miners (information extraction systems). The system can also
be applied to various Web applications, given its ability to reduce a complex Web
site structure to a concise one with informative contents. While performing
experiments with LAMIS, we found that the HITS-related algorithms are not good
enough to be applied in mining the informative structures, even when the link entropy
is considered. Therefore, we developed and investigated several techniques to
enhance the that LAMIS-LN-CB-HR-TW was able to achieve the optimal solution in
most cases. The R-Precision of 0.82 indicates that the enhanced LAMIS performs very
well in mining the informative structure of a Web site. The result of InfoDiscoverer
also shows that both recall and precision rates are larger than 0.956, which is very
close to the hand-coding result. In the future, we are interested in the further
enhancement of our system. For example, the concept of generalization/specialization
can be applied to find the optimal granularity of blocks to be utilized in LAMIS and
InfoDiscoverer. Also, our proposed mechanisms are significant for, and worthy of
further deployment in, several domain-specific Web studies, including those for Web
miners and intelligent agents. These are matters of future research.
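To make the idea of analyzing the entropy of features more concrete, the sketch below computes a normalized Shannon entropy for a term across a set of pages. It is a generic entropy calculation under assumed inputs, not the LAMIS or InfoDiscoverer formulation; the sample pages and the function name are hypothetical. The intuition is that a term spread evenly over every page (such as navigation text) has entropy near 1 and carries little page-specific information, while a term concentrated in one page has entropy near 0.

```python
import math
from collections import Counter


def term_entropy(term, pages):
    """Normalized Shannon entropy of a term's frequency across pages.

    Returns a value in [0, 1]: close to 1 when the term is spread evenly
    over all pages, close to 0 when it is concentrated in few pages.
    """
    counts = [Counter(page.split())[term] for page in pages]
    total = sum(counts)
    if total == 0 or len(pages) < 2:
        return 0.0
    # Probability that an occurrence of the term falls in each page.
    probs = [c / total for c in counts if c > 0]
    entropy = sum(p * math.log2(1 / p) for p in probs)
    # Divide by the maximum possible entropy so results are comparable.
    return entropy / math.log2(len(pages))


# Hypothetical page texts, invented for illustration only.
pages = [
    "home news sports",
    "home news election",
    "home sports final",
]
for term in ("home", "news", "election"):
    print(term, round(term_entropy(term, pages), 3))
```

A site miner could treat blocks dominated by high-entropy terms as candidates for redundant boilerplate, although the cut-off value would have to be chosen empirically.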
2.6. Toward an Adaptive Web: The State of the Art and Science
As the World Wide Web matures, it makes leaps forward in both size and
complexity. In this expanding environment, the needs and interests of individual users
become buried under the sheer weight of possible viewing choices. To counter this,
there has been a rise in research on adaptive websites, a combination of data mining,
machine learning, user modeling, Human-Computer Interaction (HCI), optimization
theory and graph theory which seeks to sift through the tides of possible pages to
provide users with a high-quality stream of information. This paper provides a
description of adaptive website research, including the goals aimed at, the challenges
discovered and the approaches to solutions.
This paper has presented an overview of the goals, challenges, approaches and
implementations that surround adaptive website research. This work is meant to
provide an introduction to many of the most important difficulties, characteristics and
solutions that have occurred to date, but is not intended to be an exhaustive overview.
Readers are directed to for additional good overviews of the topic. The title of this
paper suggests the nature of the problem as both an art and a science. While
considerable research has been performed into studying how users behave in a web
environment, the relationships between pages, how to rate and rank suggestions to
users, etc., there is still considerable art involved in producing effective adaptive web
systems, from the choice of particular parameters of clustering algorithms to the
measuring of the effectiveness of a particular adaptation, for example. Nonetheless,
substantial work has been done to explore the problem from three basic directions:
understanding users, understanding websites, and understanding information. Most
approaches seem to examine the problem from one of these directions; some examine
it from two of these directions; but few (if any) consider all three directions. We are in
the midst of the early stages of the problem, where there is primarily analysis being
performed, with the problem not yet sufficiently explored to allow broader synthesis
to occur. It is expected, however, that for a substantial portion of time there will
remain a large proportion of the problem which can only be solved via the sound and
steady application of considerable art, backed by the driving solidity of science.
survey of the use of Web mining for Web personalization. More specifically, we
introduce the modules that comprise a Web personalization system, emphasizing
the Web usage mining module. A review of the most common methods that are used
as well as technical issues that occur is given, along with a brief overview of the most
popular tools and applications available from software vendors. Moreover, the most
important research initiatives in the Web usage mining and personalization area are
presented.
The main component of a Web personalization system is the usage miner. Log
analysis and Web usage mining is the procedure where the information stored in the
Web server logs is processed by applying statistical and data mining techniques, such
as clustering, association rules discovery, classification and sequential pattern
discovery, in order to reveal useful patterns that can be further analyzed. Such
patterns differ according to the method and the input data used, and can be user and
page clusters, usage patterns and correlations between user groups and Web pages.
Those patterns can then be stored in a database or a data cube, and query mechanisms
or OLAP operations can be performed in combination with visualization techniques.
The most important phase of Web usage mining is data filtering and pre-processing.
In that phase, Web log data should be cleaned or enhanced, and user, session and page
view identification should be performed. Web personalization is a domain that has
recently been gaining great momentum not only in the research area, where many
research teams have addressed this problem from different perspectives, but also in
the industrial area, where there exists a variety of tools and applications addressing
one or more modules of the personalization process. Enterprises expect that by
exploiting the information hidden in their Web server logs they could discover the
interactions between their Web site visitors and the products offered through their
Web site. Using such information, they can optimize their site in order to increase
sales and ensure customer retention. Apart from Web usage mining, user profiling
techniques are also employed in order to form a complete customer profile. Lately,
there is an effort to incorporate Web content in the recommendation process, in order
to enhance the effectiveness of personalization. However, a solution that efficiently
combines the techniques used in user profiling, Web usage mining, content acquisition
and management, and Web publishing has not yet been proposed.
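The pre-processing phase described above (log cleaning followed by user and session identification) can be sketched as follows. This is a minimal illustration under stated assumptions, not a method prescribed by the surveyed work: each IP address is treated as one user, a 30-minute inactivity gap is assumed to end a session, and requests for images, stylesheets and scripts are assumed to be noise. The record layout and all names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed noise: requests for embedded resources rather than page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
# Assumed heuristic: 30 minutes of inactivity starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records):
    """Split (ip, timestamp, url) records into per-user sessions."""
    # Cleaning: drop requests for embedded resources.
    cleaned = [r for r in records if not r[2].lower().endswith(IGNORED_SUFFIXES)]
    # User identification: treat each IP address as one user (a simplification).
    cleaned.sort(key=lambda r: (r[0], r[1]))

    sessions = []
    last_ip, last_time = None, None
    for ip, ts, url in cleaned:
        # Session identification: new user, or too long since the last request.
        if ip != last_ip or ts - last_time > SESSION_TIMEOUT:
            sessions.append((ip, []))
        sessions[-1][1].append(url)
        last_ip, last_time = ip, ts
    return sessions


# Hypothetical log records, invented for illustration only.
t0 = datetime(2024, 1, 1, 10, 0)
records = [
    ("10.0.0.1", t0, "/index.html"),
    ("10.0.0.1", t0 + timedelta(seconds=1), "/logo.gif"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/products.html"),
    ("10.0.0.1", t0 + timedelta(hours=2), "/index.html"),
    ("10.0.0.2", t0 + timedelta(minutes=1), "/index.html"),
]
for ip, urls in sessionize(records):
    print(ip, urls)
```

The resulting sessions are the usual input to the clustering, association rule and sequential pattern techniques mentioned above. In practice, proxies and shared addresses make IP-based user identification unreliable, so cookies or logins are often preferred when available.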
dynamically suggest links for the user to navigate. In this paper, we describe the overall
design of a system that implements these ideas, and elaborate on the preprocessing,
clustering, and dynamic link suggestion tasks. We present some experimental results
generated by analyzing the access log of a web site.
We have presented a system design that facilitates the analysis of past user
access patterns to discover common user access behavior. This information can then
be used to improve the static hypertext structure, or to dynamically insert links to web
pages. We have implemented the offline module and the session-logging web server,
and started work on the online module. We are distributing the offline module as
public domain software:
ftp://www-db.stanford.edu/pub/analog/analog.0.1.tar.Z
Web administrators may find the tool useful for analyzing user access logs
generated by an NCSA HTTP server. Our experimental results, obtained by analyzing real
user access logs, show that clusters of user access patterns do exist. Further, some
of these clusters are not apparent from the physical linkage of the pages, and thus
would not be identified without looking at the logs. For future work, we will look
into how to capture the order of accesses to better represent user interests, the use of
semantic information to model user interests, the impact of different clustering
algorithms on the quality of the cluster information, and the effectiveness of the
suggestions given to the users (i.e., we need to evaluate whether the users find the
suggestions useful).
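The dynamic link suggestion step can be illustrated with a small sketch. It assumes that an offline module has already produced clusters of pages, matches the pages visited in the current session against each cluster, and suggests the unvisited pages of the best match. The Jaccard overlap score, the `max_links` limit and all names are assumptions made for illustration, not the system described above.

```python
def suggest_links(current_session, clusters, max_links=3):
    """Suggest unvisited pages from the cluster that best matches the session.

    current_session: pages visited so far in the active session.
    clusters: lists of pages, assumed to come from offline log clustering.
    """
    visited = set(current_session)
    best, best_score = None, 0.0
    for cluster in clusters:
        pages = set(cluster)
        union = visited | pages
        if not union:
            continue
        # Jaccard overlap between the session and the cluster (an assumption).
        score = len(visited & pages) / len(union)
        if score > best_score:
            best, best_score = cluster, score
    if best is None:
        return []
    return [page for page in best if page not in visited][:max_links]


# Hypothetical clusters, invented for illustration only.
clusters = [
    ["/research", "/research/papers", "/research/people"],
    ["/admissions", "/admissions/fees"],
]
print(suggest_links(["/research"], clusters))
```

A session that overlaps with no cluster receives no suggestions, which avoids recommending unrelated pages.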
Web usage mining has been used effectively as an underlying mechanism for
Web personalization and recommender systems. A variety of recommendation
frameworks have been proposed, including some based on non-sequential models,
such as association rules and clusters, and some based on sequential models, such as
sequential or navigational patterns. Our recent studies have suggested that the
structural characteristics of Web sites, such as the site topology and the degree of
A key part of the personalization process is the generation of user models. The
most commonly used user models are still rather simplistic, representing the user as a
vector of ratings or using a set of keywords. Even where more multi-dimensional or
ontological information has been available, the data is generally mapped onto a single
user-item table which is more amenable to most data mining and machine learning
techniques. To provide the most useful and effective recommendations,
personalization systems need to incorporate more expressive models. Some of the
discussion on the integration of semantic knowledge and technologies in the mining
process suggests that some strides have been made in this direction. However, most of
this work has not, as of yet, resulted in tried and tested approaches that can become the
basis of the next generation of personalization systems. Another important and difficult
challenge is the modeling of user context. In particular, the profiles commonly used
today lack the ability to model user context and dynamics. Users access different
items for different reasons and under different contexts. The modeling of context and
its use within recommendation generation needs to be explored further. Also, user
interests and needs change with time. Identifying these changes and adapting to them
is a key goal of personalization. However, very little research effort has been
expended on the evolution of user patterns over time and their impact on
recommendations. This is in part due to the trade-offs between expressiveness of the
profiles and scalability with respect to the number of active users. Solutions to these
important challenges are likely to lead to the creation of the next generation of more
effective and useful Web personalization and recommender systems that can be
deployed in increasingly complex Web-based environments.
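The simple user model mentioned above, in which a user is represented as a vector of ratings in a single user-item table, can be sketched as follows. This is a minimal nearest-neighbor illustration under assumed data, not a method taken from the surveyed work; the ratings, user names and function names are hypothetical.

```python
import math

# Hypothetical user-item table: each user is a sparse vector of ratings.
ratings = {
    "alice": {"page_a": 5, "page_b": 3, "page_c": 4},
    "bob": {"page_a": 4, "page_b": 3, "page_d": 5},
    "carol": {"page_c": 1, "page_d": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user, table):
    """Recommend items rated by the most similar user but unseen by `user`."""
    target = table[user]
    others = [(cosine(target, vec), name) for name, vec in table.items() if name != user]
    if not others:
        return []
    _, nearest = max(others)
    return sorted(item for item in table[nearest] if item not in target)


print(recommend("alice", ratings))
```

The sketch also shows the limitation discussed above: the flat table records what a user rated, but nothing about the context of each access or how interests change over time.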