
National Conference on Advances in Computer Applications

2015

Next Generation Search Engine


Dr. B R Prakash, Assistant Professor, Dept. of MCA, Sri Siddhartha Institute of Technology, Tumkur.
Vasantha Kavitha, Assistant Professor, Maharani Lakshmi Ammanni College for Women, Bangalore.
Dr. M. Hanumanthappa, Professor, Department of Computer Science & Applications, Bangalore University, Bangalore.

Abstract

In recent years there have been significant advances in scientific data management and retrieval techniques, particularly in standards and protocols for archiving data and metadata. Scientific data is rich and spread across many locations. To integrate these pieces, a data archive and its associated metadata should be generated. Data should be stored in a format that is retrievable and, more importantly, that will remain accessible as technology changes, such as XML. While general-purpose search engines are useful for finding many things on the Internet, they are often of limited use for locating Earth Science data relevant to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches have been used, but they can be slow, and their comprehensiveness can be limited by downtime at any search partner.

Key words: search engines, Information retrieval, Mobile Environment

1. INTRODUCTION

Conventional web search engines present results as a list of titles and summaries, and users find it difficult to locate the pages they need because there are too many results to read. CMU Lab is trying to develop a next generation search engine, named ‘search@once’, to help users find the web pages they need. The core theory of Search@once is formal concept analysis, and it provides well-structured output as a line diagram based on the semantics of each page.

The 1st generation of search engines is known as indexing search engines or directories. People can use keywords, but this is purely ‘searching’: there is no further process to determine the relevance of the data. The directory was a really good idea, popularized by the portal giant Yahoo. In this generation, people can at least find something in a huge set of data.

ISBN:978-81-927765-0-5

The 2nd generation, which is the current one, is defined by its core ranking technology, and a great example is Google’s powerful PageRank. PageRank is a set of formulas and equations that estimates the importance and reliability of every webpage from the links pointing to it. Combined with keyword matching, it lets people find the most relevant and reliable data almost instantly. Google is also famous for its indexing engine, because we can search over 1 billion webpages in less than 0.1s. This is indeed amazing! In this generation we can all get the most relevant webpages, but of course they are fetched and ranked by machines.

So what will happen with 3rd generation search engines? In my own opinion, the importance and relevance of data will instead be determined by humans. At the moment, when we search, we tell the search engine what we want to find, and the machines in the background return the results to us. This is one-way communication, just like Web 1.0 or the Read-Only Web. Why not let the machines understand the information in the searches we send? Our searches may be very important for other users, because at present we find information according to the computer's view of relevance, and the result may not be the best.

In the 3rd generation of search engines, machines will collect our searches and analyze them. For example, we will tell the search engine what we click, how long we stay, how many pages we view, etc. These actions will be understood by 3rd generation search engines, which will use them to improve the results for the next user who wants to find answers to a similar term.

2. Information retrieval

Information retrieval (IR) deals with the representation, storage, and access to information according to the user’s information need. The main goal of an information retrieval system (IRS) is to bring relevant documents to users in response to their queries. However, the explosion of information available on the Internet and its heterogeneity have made traditional IRS less effective. Traditional IRS do not take the user context into account in the retrieval process. Indeed, traditional retrieval models and system designs are based solely on the query and the document collection, which leads to the same set of results for different users who submit the same query. To tackle this problem, a key challenge in IR is how to capture contextual information and how to integrate it into the retrieval process in order to increase search performance. Contextual retrieval is defined as “combine search technologies and knowledge about query and user context into a single framework in order to provide the most appropriate answer for users information needs”. Thus, contextual IR aims at optimizing retrieval accuracy through two related steps: appropriately defining the context of user information needs, commonly called “search
context”, and then adapting the search by taking it into account in the information selection process.

One of the fundamental questions in contextual IR is: which context dimensions should be considered in the retrieval process? We consider the five context-specific dimensions listed below, which have been explored in the contextual IR literature.

1. Device: the device is the physical tool that gives the user direct access to the information, such as a computer, mobile phone or PDA. For this dimension, adapting retrieval consists mainly in considering the device characteristics.

2. Spatio-temporal context: this dimension contains two sub-dimensions, related respectively to geographical location and time. According to this dimension, contextual retrieval aims at delivering the information that best addresses the user’s situation in spatio-temporal applications where the data and/or query objects change location and are not valid over time, as in tourist guide and network routing applications.

3. User context: user context is the central dimension in contextual IR and the most widely addressed in the research area. This dimension contains two sub-dimensions, related respectively to the personal context of the user and his social environment.

a) Personal context: deals with the following sub-dimensions:
   a. Demographic context: personal preference attributes such as language.
   b. Psychological context: anxiety and frustration are examples of the user’s affective characteristics that influence information-seeking behavior and the user’s relevance assessments.
   c. Cognitive context: this sub-dimension is the most addressed in the area. It refers to the user’s level of expertise and to user interests, either short-term or long-term.

b) Social context: refers to the user’s community, such as friends, neighbours and colleagues. According to the social dimension, adapting retrieval aims at leveraging the search according to the implied preferences of the user’s community rather than just the individual.

4. Task/problem: this dimension refers to the basic goal or intention behind the search
activity, such as fact-finding vs. exploration tasks, or transactional, informational or navigational tasks in web search.

5. Document context: two main sub-dimensions could characterize the document context. The first concerns the document surrogates (relevant text fragments), such as form, colors, structural elements, citations and metadata. The second concerns the data source characteristics and their perception by users.
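
As an illustration only, the context dimensions listed above could be carried alongside a query in a small data structure. The sketch below is not taken from any system described in this paper: the class names, field names, the `contextual_score` function and its weights (0.5 and 0.25) are all assumptions chosen for the example. The document context is represented simply by a set of topics attached to each document.

```python
from dataclasses import dataclass, field


@dataclass
class UserContext:
    # Hypothetical fields standing in for the personal and social sub-dimensions.
    language: str = "en"                                         # demographic
    interests: set[str] = field(default_factory=set)             # cognitive
    community_interests: set[str] = field(default_factory=set)   # social


@dataclass
class SearchContext:
    device: str          # 1. device, e.g. "mobile phone"
    location: str        # 2. spatio-temporal: place
    time_of_day: str     # 2. spatio-temporal: time
    user: UserContext    # 3. user context
    task: str            # 4. task/problem, e.g. "informational"


def contextual_score(base_score: float, doc_topics: set[str], ctx: SearchContext) -> float:
    """Boost a query-only score by topic overlap with personal and community interests.

    The weights 0.5 and 0.25 are arbitrary illustration values.
    """
    personal = len(doc_topics & ctx.user.interests)
    social = len(doc_topics & ctx.user.community_interests)
    return base_score * (1.0 + 0.5 * personal + 0.25 * social)


ctx = SearchContext(
    device="mobile phone",
    location="museum",
    time_of_day="afternoon",
    user=UserContext(interests={"art"}, community_interests={"art", "history"}),
    task="informational",
)
art_score = contextual_score(1.0, {"art"}, ctx)
sports_score = contextual_score(1.0, {"sports"}, ctx)
```

With this toy scoring, two users submitting the same query receive different scores for the same document whenever their interests differ, which is the behavior that a query-only system cannot provide.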

3. Contextual IR in Mobile Environment

Most traditional search engines do not consider the search context in the retrieval process and are not tuned to mobile environments. Recent work in the IR community attempts to improve search accuracy in this environment. This research can be grouped under the field of “Contextual Mobile Information Retrieval” (CMIR). CMIR aims to tackle the problem of information overload by providing appropriate results according to resource constraints on the one hand and users’ location, time and interests on the other. Contextual retrieval is achieved by exploiting the mobile context during the query reformulation and document re-ranking steps. Below we give an overview of some significant approaches in this domain, which we group into three main categories: device-based adaptation approaches, location-based adaptation approaches and user-based adaptation approaches.

• Device-based adaptation:
• Location-based adaptation:
• User-based adaptation:
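
The two steps named in this section, query reformulation and document re-ranking, can be pictured with a minimal sketch. It assumes a location term is simply appended to the query and that results are (title, score) pairs. The function names and the bonus of 0.2 per matching interest are hypothetical and do not correspond to any specific CMIR system.

```python
def reformulate(query: str, location: str) -> str:
    """Query reformulation: append the location term if the query lacks it."""
    if location.lower() in query.lower():
        return query
    return f"{query} {location}"


def rerank_by_interests(
    results: list[tuple[str, float]], interests: set[str]
) -> list[tuple[str, float]]:
    """Document re-ranking: sort by score plus a bonus per interest word in the title."""
    wanted = {term.lower() for term in interests}

    def adjusted(item: tuple[str, float]) -> float:
        title, score = item
        words = set(title.lower().split())
        return score + 0.2 * len(words & wanted)  # 0.2 is an arbitrary bonus

    return sorted(results, key=adjusted, reverse=True)


new_query = reformulate("pizza", "Bangalore")
ranked = rerank_by_interests(
    [("Garden tips", 0.6), ("Art museum guide", 0.5)], {"art"}
)
```

Here the location-based step changes what is sent to the engine, while the user-based step only reorders what comes back. A device-based step, not shown, would typically limit or reshape the results to fit the device.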

Teduu is Search Engine For Next Generation

When you search on www.teduu.com, search beyond limit, you get

• All searches of google, yahoo, bing, youtube, meta cafe etc., in the most sophisticated manner you ever cherished. The searches from all of them are shown, properly differentiated, on a single screen.
• You can listen to music while you search for any information.
• Your searches are more result oriented, as you have choice and ease of information.
• You earn points when you search images, videos, news etc. (coming very soon).
• Your points get you rewards, prizes and eligibility to take part in contests (coming very soon).

4. MOBILE SEARCH USING A SPATIOTEMPORAL

Motivation and General Approach

In mobile IR, the computing environment changes continuously owing to the inherent mobility framework. More specifically, users’ interests may change at any time with changes in their environment (location, time, nearby persons, etc.). For example, assume that a person at a “museum” submits the query “Water lilies”. Knowing that he is interested in both “art” and “gardens”, we can improve the search results by taking into account his interest in “art” rather than “gardens”, given that he is at a “museum” and not in a “garden”. Static approaches for building the user profile are therefore of little use, so we focus on more dynamic techniques that can adjust the user interests to the current search situation at any time.

Our general approach to search personalization relies on building and selecting the most appropriate user profile for a particular search situation. While a user can have many profiles, one of them corresponds most closely to the user’s current query and situation. To select the most adequate user profile for personalization, we compare the similarity between a new search situation and the past ones. Comparing past user experiences is referred to in the literature as case-based reasoning (CBR). In CBR, a problem is solved based on the solutions of similar past problems. A case is described by a pair <premise, value>. The premise is the description of the case, which contains its characteristics, while the value is the result of the reasoning based on the premise.

In our situation-similarity computing setting, the premise part of a case is a specific search situation S of a mobile user, while the value part is the user profile G to be used for personalizing the search results. Each case in our case base thus represents a specific element from U, denoted Case = (S, G). For each newly submitted query, we build a new semantic situation by modeling its associated time and location contexts. A situation-based similarity measure is set up and allows selecting the most similar situation among the past ones in the case base. When the computed similarity is above a threshold value, we re-rank the search results of the query using the user profile associated with the most similar situation. After the user clicks or views interesting documents, the user feedback is used to maintain the case base.
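
The selection and re-ranking steps of this section can be sketched as follows, using the “museum” and “Water lilies” example. This is a minimal sketch under stated assumptions, not the similarity measure or profile model of the approach itself: a situation S is reduced to a (location, time) pair of labels, a profile G to a term-to-weight dictionary, similarity to the fraction of matching labels, and the threshold of 0.4 is arbitrary. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    situation: tuple[str, str]   # S: (semantic location, semantic time)
    profile: dict[str, float]    # G: interest term -> weight


def situation_similarity(a: tuple[str, str], b: tuple[str, str]) -> float:
    """Toy similarity: fraction of situation attributes that match exactly."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_profile(situation, case_base, threshold=0.4):
    """Return the profile of the most similar past case, or None if not above threshold."""
    if not case_base:
        return None
    best = max(case_base, key=lambda c: situation_similarity(situation, c.situation))
    if situation_similarity(situation, best.situation) > threshold:
        return best.profile
    return None


def rerank(results, profile):
    """Re-rank (title, score) results, boosting titles that contain profile terms."""
    def personalized(item):
        title, score = item
        words = set(title.lower().split())
        return score + sum(w for term, w in profile.items() if term in words)
    return sorted(results, key=personalized, reverse=True)


case_base = [
    Case(("museum", "afternoon"), {"art": 1.0}),
    Case(("garden", "morning"), {"gardens": 1.0}),
]
results = [("Growing water lilies in gardens", 0.6), ("Monet water lilies art", 0.5)]

# New situation: same place as the first case, different time.
profile = select_profile(("museum", "evening"), case_base)
ranked = rerank(results, profile) if profile else results
```

Because the new situation shares the “museum” location with the first case, the “art” profile is selected and the art-related result moves to the top. Maintaining the case base from click feedback is not shown here.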

5. CONCLUSION AND FUTURE WORK

This chapter gives an overview of a number of representative state-of-the-art contextual IR techniques for the mobile environment and describes our spatio-temporal based personalization approach for mobile search. Our approach to personalizing mobile search consists of three basic steps: (1) inferring semantic situations from low-level location and time data, (2) learning and maintaining user interests based on the search history related to the identified situations, and (3) selecting a profile to use for personalization given a new situation by exploiting a CBR technique. We have presented a novel evaluation framework, based on a diary study approach, devoted to a context-aware personalization approach for mobile search. We evaluated our approach according to the proposed evaluation framework and showed that it is effective. In future work, we plan to extend this protocol by using real user data from a search engine log file. Extending the protocol aims at testing the effectiveness of the personalized search on real mobile search contexts and the click-through data available in the log file.
