Title: Perform Evaluation of Any Popular Search Engine Based on Relevancy (e.g., Google)

Assignment 4


Theory: Introduction

The Web can be used as a quick and direct reference for any type of information from all over the world. However, information found on the Web needs to be filtered, because it may include a large volume of misinformation or irrelevant information. Internet users may not be aware of the many search engines that can return information on a topic quickly, and they may use different search strategies. Finding useful information quickly on the Internet therefore poses a challenge to both ordinary users and information professionals. The performance of currently available search engines has been improving continuously, and they offer powerful search capabilities of various types. Even so, their lack of comprehensive coverage, the inability to predict the quality of retrieved results, and the absence of controlled vocabularies make it difficult for users to search effectively. The Internet as an information resource needs to be evaluated carefully, because no traditional quality standards or controls have been applied to the Web. Librarians need to be able to give their clientele informed recommendations on which search engines to use and how to search them effectively.

To evaluate an information retrieval (IR) system is to measure how well the system meets the information needs of its users. This is difficult, because the same result set might be interpreted differently by different users. To deal with this problem, metrics have been defined that, on average, correlate with the preferences of a group of users. Without proper retrieval evaluation, one cannot determine how well an IR system is performing, nor objectively compare its performance with that of other systems. Retrieval evaluation is therefore a critical and integral component of any modern IR system.

Searching the web and how it is used

The world is all over the web: the volume of transactions on the World Wide Web (WWW) seems to justify this statement. This century has moved the shopping experience from physical stores online, and during the last decade the online shopping paradigm has undergone an extreme shift. The number of companies offering online shopping has grown tremendously. Users' patience threshold is falling day by day. They want their needs met in a single click, while sellers want to push their best products. Today, one of the most challenging tasks is to provide the customer with the most relevant and meaningful search results. When users search for a product, most of them look only at the top 10 to 15 results. If the right product does not show up within this range, there is a high probability that the vendor will lose out to a competitor.

What is relevancy, and why is it important?

With the tremendous increase in the amount of data on the web, it has become very difficult to manage data so that users are presented with the most accurate search results. The right information means a better business return on investment. In simple terms, relevancy can be defined as simplicity and usefulness: if a piece of information is useful to the user and they can reach it without much effort, it is relevant. A web designer aims to build a site that captures users' attention within seconds. In the same way, the search results on the site decide whether a user will stay interested in it. Users' engagement with the search experience shapes their behavior on the site, including how likely they are to purchase or complete certain transactions. This calls for studying and analyzing the behavior of site users: their action paths, decision points, areas of interest, and so on. More generally, relevancy means the relationship between things or events.

Measurements for evaluation

Precision of Search Engines

After a search, the user may retrieve relevant information, irrelevant information, or both. How accurately a search engine retrieves the right information is measured by its precision. Precision is the fraction of the retrieved documents (the set A) that is relevant:

$$\text{Precision} = \frac{|R_a|}{|A|}$$

where:
I is an information request (query),
R is the set of relevant documents for I,
A is the answer set for I generated by an IR system, and
$R_a = R \cap A$ is the intersection of the sets R and A.
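As a minimal sketch, the precision formula can be written in Python. This assumes that documents are identified by string IDs and that R and A are held as Python sets. The function name `precision` and the document IDs are hypothetical.

```python
def precision(relevant: set[str], answer: set[str]) -> float:
    """Precision = |Ra| / |A|, where Ra is the intersection of R and A."""
    if not answer:
        # Assumption: an empty answer set is reported as precision 0.0.
        return 0.0
    ra = relevant & answer  # Ra: relevant documents that were retrieved
    return len(ra) / len(answer)


# Hypothetical data: 4 documents retrieved, 3 of which are relevant.
R = {"d1", "d2", "d3", "d9"}
A = {"d1", "d2", "d3", "d4"}
print(precision(R, A))  # 0.75
```

Using sets reduces the intersection $R_a$ to a single `&` operation. Ranking order is ignored here, because precision as defined above is a property of the whole answer set.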


Precision of Google
Google, one of the most popular search engines on the Internet, was selected as one of the search engines for comparison. Google focuses on the link structure of the Web to determine relevant results and is representative of the variety of easy-to-use search engines. This study measured the relevance of the web sites retrieved for each search query. Advanced search options were used for retrieving sites. Only English pages were searched for each query, since web pages in other languages would be difficult to assess for relevancy, and the search query was required to appear in the title of the web page. Because the number of search results retrieved was large, only the first 100 sites were selected for analysis.

Relative Recall of Google

Recall is the ability of a retrieval system to obtain all or most of the relevant documents in the collection (Shafi & Rather, 2005). In other words, recall is the fraction of the relevant documents that has been retrieved. The relative recall can be calculated using the following formula:

$$\text{Recall} = \frac{|R_a|}{|R|}$$

where:
R is the set of relevant documents for query q, and |R| is the number of documents in R,
A is the answer set, and
$|R_a|$ is the number of documents in the intersection of the sets R and A.

E measure

The E measure combines recall and precision into a single value. The idea is to let users specify whether they care more about recall or about precision. The E measure is defined as follows:

$$E(j) = 1 - \frac{1 + b^2}{\dfrac{b^2}{r(j)} + \dfrac{1}{P(j)}}$$

where:
r(j) is the recall at the j-th position in the ranking,
P(j) is the precision at the j-th position in the ranking,
b ≥ 0 is a user-specified parameter, and
E(j) is the E measure at the j-th position in the ranking.

The parameter b is specified by the user and reflects the relative importance of recall and precision:
If b = 0, then $E(j) = 1 - P(j)$. Low values of b make E(j) a function of precision.
If b → ∞, then $\lim_{b \to \infty} E(j) = 1 - r(j)$. High values of b make E(j) a function of recall.
For b = 1, the E measure becomes the complement of the F-measure, $E(j) = 1 - F(j)$, where $F(j) = \frac{2}{1/r(j) + 1/P(j)}$.

Because E(j) is one minus a weighted harmonic mean of precision and recall, lower values of E indicate better retrieval.
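A minimal Python sketch of recall and the E measure follows, again assuming string document IDs held in sets. The function names are hypothetical. The formula divides by both values, so this sketch treats a zero precision or recall as the worst score (E = 1), which is an assumption rather than part of the definition.

```python
def recall(relevant: set[str], answer: set[str]) -> float:
    """Recall = |Ra| / |R|: fraction of the relevant documents that were retrieved."""
    if not relevant:
        # Assumption: with no relevant documents, recall is reported as 0.0.
        return 0.0
    return len(relevant & answer) / len(relevant)


def e_measure(p: float, r: float, b: float = 1.0) -> float:
    """E(j) = 1 - (1 + b^2) / (b^2 / r(j) + 1 / P(j)), with b >= 0."""
    if b < 0:
        raise ValueError("b must be >= 0")
    if r <= 0 or p <= 0:
        # Assumption: the formula divides by r and p, so a zero is the worst score.
        return 1.0
    return 1 - (1 + b * b) / (b * b / r + 1 / p)


# Hypothetical data: 2 of the 5 relevant documents were retrieved.
R = {"d1", "d2", "d3", "d4", "d5"}
A = {"d1", "d2", "d7"}
print(recall(R, A))  # 0.4

# Limiting cases for P(j) = 0.5 and r(j) = 0.2:
print(e_measure(0.5, 0.2, b=0))               # 1 - P(j) = 0.5
print(round(e_measure(0.5, 0.2, b=1000), 4))  # close to 1 - r(j) = 0.8
print(round(e_measure(0.5, 0.2, b=1), 4))     # 1 - F(j) = 0.7143
```

The three calls reproduce the cases listed above. With b = 0 only precision matters, a very large b leaves essentially only recall, and b = 1 gives the complement of the F-measure.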

Search queries and Web pages retrieved

Queries in the study were designed to test various search features, including single-word search, phrase search, and a combination of the two using a Boolean operator. Four types of search query were used:
Type 1: Single-word search query.
Type 2: Phrase search query.
Type 3: Two word searches connected by a Boolean AND.
Type 4: A phrase search and a word search connected by a Boolean AND.

In each ranking below, documents marked with * are those counted as relevant hits in the calculated measures.

Type 1: Search query: NASA
Type of query: Single-word search query
Rq = {d2, d5, d13, d19, d48, d55, d68, d121, d140, d151}, where Rq is the set of relevant documents for the query.

Ranking for the query:
1. d68 *
2. d48
3. d140
4. d19 *
5. d13
6. d151
7. d2
8. d5
9. d55 *
10. d121

Calculated measures for the query: document d68 is the first relevant document retrieved and corresponds to 10% of all relevant documents in Rq. At rank 1 it therefore has a precision of 1/1 = 100% and a recall of 1/10 = 10%. The E measure is calculated from these values using the formula above.

Document   Rank   Precision (%)   Recall (%)   E measure
d68        1      100             10           0.82 (b = 1)
d19        4      50              20           0.73 (b = 1.1)
d55        9      33              30           0.68 (b = 1)
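The Type 1 measures can be recomputed with a short Python sketch. Following the table, it counts only the documents marked * as relevant hits and uses |Rq| = 10 as the recall denominator. The helper name `measures_at_hits` is hypothetical.

```python
def e_measure(p: float, r: float, b: float = 1.0) -> float:
    """E = 1 - (1 + b^2) / (b^2 / r + 1 / p); assumes p > 0 and r > 0."""
    return 1 - (1 + b * b) / (b * b / r + 1 / p)


def measures_at_hits(ranking, hits, total_relevant):
    """Return (doc, rank, precision, recall) at each position holding a hit."""
    rows, found = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in hits:
            found += 1
            rows.append((doc, rank, found / rank, found / total_relevant))
    return rows


# Type 1 data ("NASA") taken from the ranking above.
ranking = ["d68", "d48", "d140", "d19", "d13", "d151", "d2", "d5", "d55", "d121"]
hits = {"d68", "d19", "d55"}  # assumption: documents marked * are the relevant hits
rows = measures_at_hits(ranking, hits, total_relevant=10)

for doc, rank, p, r in rows:
    print(f"{doc} (rank {rank}): P={p:.0%}, R={r:.0%}, "
          f"E(b=1)={e_measure(p, r):.2f}, E(b=1.1)={e_measure(p, r, 1.1):.2f}")
```

Printing E at both b = 1 and b = 1.1 shows how the b = 1.1 value reported for d19 (0.73) compares with its b = 1 value (0.71).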

Type 2: Search query: University of Pune
Type of query: Phrase search query
Rq = {d5, d7, d19, d23, d58, d70, d99, d190}

Ranking for the query:
1. d58
2. d5
3. d190 *
4. d7
5. d99
6. d70 *
7. d19
8. d23 *

Calculated measures for the query:

Document   Rank   Precision (%)   Recall (%)   E measure
d190       3      33              13           0.82 (b = 1)
d70        6      33              25           0.71 (b = 1)
d23        8      38              38           0.62 (b = 1)
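For Type 2, a Python sketch can list precision and recall at every rank, not only at the relevant positions. This shows how precision falls between hits. As before, it assumes that the documents marked * are the relevant hits and that |Rq| = 8. The function name `pr_at_each_rank` is hypothetical.

```python
def e_measure(p: float, r: float, b: float = 1.0) -> float:
    """E = 1 - (1 + b^2) / (b^2 / r + 1 / p); assumes p > 0 and r > 0."""
    return 1 - (1 + b * b) / (b * b / r + 1 / p)


def pr_at_each_rank(ranking, hits, total_relevant):
    """Return (rank, doc, is_hit, precision, recall) for every ranked position."""
    rows, found = [], 0
    for rank, doc in enumerate(ranking, start=1):
        is_hit = doc in hits
        found += int(is_hit)
        rows.append((rank, doc, is_hit, found / rank, found / total_relevant))
    return rows


# Type 2 data ("University of Pune"): |Rq| = 8.
ranking = ["d58", "d5", "d190", "d7", "d99", "d70", "d19", "d23"]
hits = {"d190", "d70", "d23"}  # assumption: documents marked * are the relevant hits
rows = pr_at_each_rank(ranking, hits, total_relevant=8)

for rank, doc, is_hit, p, r in rows:
    # E is reported only at hits; before the first hit both p and r are 0.
    e = f"{e_measure(p, r):.2f}" if is_hit else "-"
    mark = "*" if is_hit else " "
    print(f"{rank:>2}. {doc:<5}{mark} P={p:6.1%}  R={r:6.1%}  E={e}")
```

The rows for d190, d70 and d23 correspond to the table. The 12.5% recall at d190 appears there rounded to 13%.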

Type 3: Search query: Passport AND office
Type of query: Two word searches connected by a Boolean AND
Rq = {d2, d5, d10, d55, d80, d90, d125, d150, d200, d250}

Ranking for the query:
1. d55 *
2. d5
3. d80 *
4. d125
5. d200 *
6. d250
7. d2 *
8. d90
9. d100
10. d150 *

Calculated measures for the query: document d55 is the first relevant document retrieved and corresponds to 10% of all relevant documents in Rq. At rank 1 it therefore has a precision of 1/1 = 100% and a recall of 1/10 = 10%.

Document   Rank   Precision (%)   Recall (%)   E measure
d55        1      100             10           0.82 (b = 1)
d80        3      67              20           0.69 (b = 1)
d200       5      60              30           0.60 (b = 1)
d2         7      57              40           0.53 (b = 1)
d150       10     50              50           0.50 (b = 1)
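The Type 3 results also offer a concrete way to see the role of b. The Python sketch below recomputes the hit rows and then evaluates E for d2 (rank 7, P = 4/7, R = 4/10) across several values of b. It assumes that the documents marked * are the relevant hits and that |Rq| = 10. The b values in the sweep are arbitrary illustrations, not part of the study.

```python
def e_measure(p: float, r: float, b: float = 1.0) -> float:
    """E = 1 - (1 + b^2) / (b^2 / r + 1 / p); assumes p > 0 and r > 0."""
    return 1 - (1 + b * b) / (b * b / r + 1 / p)


# Type 3 data ("Passport AND office"): |Rq| = 10.
ranking = ["d55", "d5", "d80", "d125", "d200", "d250", "d2", "d90", "d100", "d150"]
hits = {"d55", "d80", "d200", "d2", "d150"}  # assumption: documents marked *
total_relevant = 10

found, table = 0, {}
for rank, doc in enumerate(ranking, start=1):
    if doc in hits:
        found += 1
        table[doc] = (found / rank, found / total_relevant)  # (precision, recall)

for doc, (p, r) in table.items():
    print(f"{doc}: P={p:.0%} R={r:.0%} E(b=1)={e_measure(p, r):.2f}")

# Illustrative sweep for d2: E moves from 1 - P (b = 0) toward 1 - R (large b).
p, r = table["d2"]
for b in (0, 0.5, 1, 2, 10):
    print(f"b={b}: E={e_measure(p, r, b):.3f}")
```

For d2, precision (about 0.57) is higher than recall (0.40). E therefore grows as b shifts the weight toward recall, moving from about 0.43 toward 0.60.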

Type 4: Search query: admission requirements AND ME
Type of query: A phrase search and a word search connected by a Boolean AND
Rq = {d10, d25, d5, d1, d70, d65, d100, d15, d150, d80}

Ranking for the query:
1. d25
2. d5
3. d1
4. d70 *
5. d65
6. d100 *
7. d10
8. d80
9. d15
10. d150 *

Calculated measures for the query: document d70, at rank 4, is the first relevant document retrieved, so its precision is 1/4 = 25% and its recall is 1/10 = 10%.

Document   Rank   Precision (%)   Recall (%)   E measure
d70        4      25              10           0.86 (b = 1)
d100       6      33              20           0.75 (b = 1)
d150       10     30              30           0.70 (b = 1)
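For b = 1, the E measure is the complement of the F-measure, and the Type 4 data can be used to check this numerically. The Python sketch below assumes, as with the other queries, that the documents marked * are the relevant hits and that |Rq| = 10. The function names are hypothetical.

```python
def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: F = 2 / (1/r + 1/p)."""
    return 2 / (1 / r + 1 / p)


def e_measure(p: float, r: float, b: float = 1.0) -> float:
    """E = 1 - (1 + b^2) / (b^2 / r + 1 / p); assumes p > 0 and r > 0."""
    return 1 - (1 + b * b) / (b * b / r + 1 / p)


# Type 4 data ("admission requirements AND ME"): |Rq| = 10.
ranking = ["d25", "d5", "d1", "d70", "d65", "d100", "d10", "d80", "d15", "d150"]
hits = {"d70", "d100", "d150"}  # assumption: documents marked * are the relevant hits
total_relevant = 10

found, results = 0, {}
for rank, doc in enumerate(ranking, start=1):
    if doc in hits:
        found += 1
        p, r = found / rank, found / total_relevant
        results[doc] = (rank, p, r, f_measure(p, r), e_measure(p, r))

for doc, (rank, p, r, f, e) in results.items():
    # With b = 1 the two measures are complements: E = 1 - F.
    assert abs(e - (1 - f)) < 1e-12
    print(f"{doc} (rank {rank}): P={p:.0%} R={r:.0%} F={f:.2f} E={e:.2f}")
```

Because d150 is the third hit and sits at rank 10, its precision works out to 3/10 = 30%.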

Although relevancy is widely considered important, search engine relevancy is a key feature that often gets ignored. This neglect leads to losses in the time, money and effort spent fixing and tuning the effectiveness of the engine. By giving adequate thought to the various parameters that can boost relevancy, organizations can achieve a self-sustaining, intelligent and useful search system. Of the measurements discussed here, precision, recall and the E measure are the best suited to evaluating a search engine.
