S. G. Shaila · A. Vadivel
Textual and Visual

Information Retrieval
using Query Refinement
and Pattern Analysis
S. G. Shaila
Department of Computer Science and Engineering
Dayananda Sagar University
Bangalore, India

A. Vadivel
Department of Computer Science and Engineering
SRM University AP
Amaravati, Andhra Pradesh, India
ISBN 978-981-13-2558-8 ISBN 978-981-13-2559-5 (eBook)

https://doi.org/10.1007/978-981-13-2559-5
Library of Congress Control Number: 2018955166
© Springer Nature Singapore Pte Ltd. 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
This book is dedicated to my guide, who
is the co-author of the book, my parents
and my family. Their encouragement and
understanding helped me to complete this
work. I thank my husband and children,
Vinu and Shamitha, for their love and
support to bring this book as a reality.
Foreword
I am extremely delighted to write the foreword for Textual and Visual Information
Retrieval using Query Refinement and Pattern Analysis. The authors, Dr. S. G. Shaila
and Dr. A. Vadivel, have shared their knowledge of information retrieval
through this book. Its content is a valuable educational resource for
researchers in the domain of information retrieval.
The book includes chapters on the deep Web crawler, event pattern retrieval,
Thesaurus generation and query expansion, CBIR applications, and indexing and
encoding, which together cover the whole concept of information retrieval. Each
chapter combines theoretical material with experimental results and is intended
for information retrieval researchers. The layout of each chapter includes a table
of contents, an introduction, the main material, and experimental results with
analysis and interpretation.
This edition of the book reflects new guidelines that have evolved in information
retrieval in terms of text- and content-based information retrieval schemes. The
indexing and encoding mechanism of the low-level feature vector is also presented
with results and analysis.
It is my hope and expectation that this book will provide an effective learning
experience and a reference resource for all information retrieval researchers,
leading to advanced research.
Bangalore, Karnataka, India Dr. M. K. Banga

Chairman
Department of Computer
Science Engineering
Dean (Research)
School of Engineering
Dayananda Sagar University
Preface
Multimedia information retrieval from a distributed environment is an important
research problem. It requires an architecture specification that handles issues
such as techniques for crawling information from the WWW, user query prediction
and refinement mechanisms, text and image feature extraction, indexing and
encoding, and similarity measures. This book discusses the research issues related
to each of these problems and presents suitable techniques in the chapters that
follow.
Web documents contain both text and images. In a comprehensive retrieval
mechanism, text-based information retrieval (TBIR) plays an important role. In
Chap. 1, text-based retrieval is used to retrieve relevant documents from the
Internet with a crawler that can crawl both the deep and the surface web. The
functional dependency between core and allied fields in an HTML FORM is
identified and used to generate rules with an SVM classifier. The presented
crawler fetches a large number of documents when it uses the values in the most
preferable class. This architecture achieves a higher coverage rate and reduces
fetching time.
Information classification has become very important for text-based information
retrieval. In Chap. 2, event-based classification is presented and the event
Corpus, which is important for many real-time applications, is discussed. Event
patterns are identified and extracted at the sentence level using term features.
The terms that trigger events are extracted from web documents together with the
sentences that contain them. The sentence structures are analysed using POS tags.
A hierarchical sentence classification model is presented that considers specific
term features of the sentence, and rules are derived, along with fuzzy rules, to
capture the importance of all term features of the sentences. The method is
evaluated for ‘Crime’ and ‘Die’, and its performance is found to be encouraging.
In general, a retrieval system depends on the user query to retrieve web
documents. User-defined queries should contain enough relevant terms, since the
retrieval set depends on them, so query refinement through a query expansion
mechanism plays an important role. In Chap. 3, an N-gram Thesaurus construction
mechanism for query expansion is presented. The HTML TAGs in web documents
are considered and their syntactical context is analysed. Based on the significance
of the TAGs in designing web pages, a suitable weight is assigned to each TAG. The
term weight is calculated from the corresponding TAG weight and the frequency of
the term. The terms, along with the TAG information, are added to an inverted
index. N-grams are generated using the terms and term weights in the document and
added to the Thesaurus. During the query session, the query term is expanded
based on the content of the Thesaurus and the expansions are suggested to the
user. When the selected query is submitted to the retrieval system, the retrieval
set is found to contain a large number of relevant documents.
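
The TAG-based weighting and Thesaurus lookup summarised above can be illustrated with a minimal sketch. It is not the scheme derived in Chap. 3: the TAG weights in `TAG_WEIGHTS`, the rule that a term's weight is the sum of TAG weight times frequency, and the function names `tag_term_weights`, `build_inverted_index` and `suggest` are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical TAG weights; the book derives its own values in Chap. 3.
TAG_WEIGHTS = {"title": 1.0, "h1": 0.8, "p": 0.3}
DEFAULT_TAG_WEIGHT = 0.1  # assumed weight for TAGs not listed above


def tag_term_weights(doc):
    """doc is a list of (tag, text) pairs; returns {term: weight}.

    Assumption: weight = sum over TAGs of (TAG weight * term frequency in that TAG).
    """
    weights = defaultdict(float)
    for tag, text in doc:
        counts = Counter(text.lower().split())
        for term, freq in counts.items():
            weights[term] += TAG_WEIGHTS.get(tag, DEFAULT_TAG_WEIGHT) * freq
    return dict(weights)


def build_inverted_index(docs):
    """docs is {doc_id: [(tag, text), ...]}; returns {term: {doc_id: weight}}."""
    index = defaultdict(dict)
    for doc_id, doc in docs.items():
        for term, weight in tag_term_weights(doc).items():
            index[term][doc_id] = weight
    return index


def suggest(term, docs, top_k=3):
    """Rank bigrams that start with `term` by the summed weight of both terms."""
    scores = defaultdict(float)
    for doc in docs.values():
        weights = tag_term_weights(doc)
        for _, text in doc:
            terms = text.lower().split()
            for first, second in zip(terms, terms[1:]):
                if first == term:
                    scores[(first, second)] += weights[first] + weights[second]
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [" ".join(pair) for pair, _ in ranked[:top_k]]


docs = {"d1": [("title", "image retrieval"),
               ("p", "image retrieval using colour image")]}
index = build_inverted_index(docs)
print(index["image"], suggest("image", docs))
```

In this toy example a term that appears in the title contributes more to the index weight than the same term in body text, which is the intuition behind weighting by TAG significance.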
In Chap. 4, the issues related to content-based image retrieval (CBIR) are
presented. The chapter presents a histogram based on human colour visual
perception, built by extracting low-level features. For each pixel, the true colour
and grey colour proportions are calculated using a suitable weight function. During
histogram construction, the hue and intensity values are iteratively distributed to
the neighbouring bins. The NBS distance is calculated between the reference bin
and its adjacent bins. It gives the proportion of colour that overlaps from the
reference bin into its adjacent bins, and the weight is updated accordingly. This
distribution makes it possible to extract the background colour information
effectively along with the foreground information. The low-level features of all the
images in the database are extracted and stored in a feature database, and the
relevant images are retrieved based on rank. The Manhattan distance is used as the
similarity measure, and the performance of the histogram is evaluated on the Coral
benchmark dataset.
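
As a small illustration of ranking by Manhattan distance, the sketch below compares a query histogram with stored histograms and returns the closest images first. The feature database layout (a dictionary from image name to a list of bin values) and the function names are assumptions for illustration; the histogram construction of Chap. 4 itself is not reproduced here.

```python
def manhattan(h1, h2):
    """Manhattan (L1) distance between two histograms of equal length."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(h1, h2))


def rank_images(query_hist, feature_db, top_k=5):
    """Return image names ordered by increasing distance to the query."""
    ranked = sorted(feature_db.items(),
                    key=lambda item: manhattan(query_hist, item[1]))
    return [name for name, _ in ranked[:top_k]]


# Hypothetical three-bin histograms standing in for a feature database.
feature_db = {
    "a": [0.5, 0.5, 0.0],
    "b": [0.0, 0.5, 0.5],
    "c": [0.4, 0.6, 0.0],
}
print(rank_images([0.5, 0.5, 0.0], feature_db))
```

A smaller distance means a more similar colour distribution, so the first entries of the returned list form the retrieval set.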
In Chap. 5, the issues of indexing and encoding low-level features, together with
a similarity measure, are presented. In a CBIR system, the low-level features are
stored along with the images; they require a large amount of storage space and
increase search and retrieval time. The search time increases linearly with the
database size, which reduces the retrieval performance. The colour histograms of
images are considered as the low-level features. The bin values are analysed to
understand their contribution to representing image colour. The trivial bins are
truncated and removed, and only the important bins are kept, giving histograms
with fewer bins. The coding scheme uses the Golomb–Rice (GR) coding algorithm, and
the quotient and remainder code parts are evaluated. Since the number of bins
varies between the query and database histograms, the bin overlapped similarity
measure (BOSM) is used as the similarity measure. The performance of all the
schemes is evaluated in an image retrieval system. The retrieval time, the number
of bits needed for histogram construction and the precision of retrieval are
evaluated using benchmark datasets, and the performance of the presented approach
is encouraging.
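
The split of a Golomb–Rice code into quotient and remainder parts can be sketched as follows. This is the generic Rice code with divisor M = 2^k, not necessarily the exact variant used in Chap. 5: the parameter name `k`, the unary convention (q ones followed by a zero) and the assumption that bin values are non-negative integers are all illustrative choices.

```python
def rice_encode(n, k):
    """Encode a non-negative integer with divisor M = 2**k.

    Returns (quotient_code, remainder_code) as bit strings:
    the quotient in unary (q ones, then a zero), the remainder in k bits.
    """
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q = n >> k                 # quotient n // M
    r = n & ((1 << k) - 1)     # remainder n % M
    quotient_code = "1" * q + "0"
    remainder_code = format(r, f"0{k}b") if k else ""
    return quotient_code, remainder_code


def rice_decode(quotient_code, remainder_code, k):
    """Invert rice_encode for the same k."""
    q = quotient_code.index("0")            # number of leading ones
    r = int(remainder_code, 2) if k else 0
    return (q << k) | r


# Hypothetical quantised bin values of a truncated histogram.
bins = [9, 0, 3, 17]
codes = [rice_encode(value, 2) for value in bins]
print(codes)
print([rice_decode(q, r, 2) for q, r in codes])
```

Small bin values produce short codes, which is why such a scheme suits histograms whose trivial bins have already been removed; the choice of k trades quotient length against remainder length.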
Finally, taken as a whole, the book presents various important issues in the
information retrieval research field and will be useful for postgraduates and
researchers working on information retrieval problems.
Bangalore, India S. G. Shaila

Amaravati, India A. Vadivel
Acknowledgements
First and foremost, we thank the Almighty for giving us the wisdom, health,
environment and people to complete this book.
We express our sincere gratitude to Dr. Hemachandra Sagar and Premachandra
Sagar, Chancellor and Pro-Chancellor, Dayananda Sagar University, Bangalore;
Dr. A. N. N. Murthy, Vice Chancellor, Dayananda Sagar University, Bangalore;
Prof. Janardhan, Pro-Vice Chancellor, Dayananda Sagar University, Bangalore;
Dr. Puttamadappa C., Registrar, Dayananda Sagar University, Bangalore;
Dr. Srinivas A., Dean, School of Engineering, Dayananda Sagar University,
Bangalore; Dr. M. K. Banga, Chairman, Department of CSE, and Dean Research,
Dayananda Sagar University, Bangalore, for providing an opportunity and
motivation to write this book.
We express our sincere gratitude to Dr. P. Sathyanarayanan, President,
SRM University, Amaravati, AP; Prof. Jamshed Bharucha, Vice Chancellor, SRM
University, Amaravati, AP; Prof. D. Narayana Rao, Pro-Vice Chancellor, SRM
University, Amaravati, AP; Dr. D. Gunasekaran, Registrar, SRM University,
Amaravati, AP, for providing an opportunity and motivation to write this book.
We would like to express our sincere thanks to our parents, spouse, children and
faculty colleagues for their support, love and affection. Their inspiration gave us the
strength and support to finish the book.
Dr. S. G. Shaila
Dr. A. Vadivel
Contents
1 Intelligent Rule-Based Deep Web Crawler
1.1 Introduction to Crawler
1.2 Reviews on Web Crawlers
1.3 Deep and Surface Web Crawler
1.4 Estimating the Core and Allied Fields
1.5 Classification of Most and Least Preferred Classes
1.6 Algorithm
1.7 Functional Block Diagram of Distributed Web Crawlers
1.8 Experimental Results
1.8.1 Rules for Real Estate Domain in http://www.99acres.com
1.8.2 Performance of Deep Web Crawler
1.9 Conclusion
References
2 Information Classification and Organization Using Neuro-Fuzzy Model for Event Pattern Retrieval
2.1 Introduction
2.2 Reviews on Event Detection Techniques at Sentence Level
2.3 Schematic View of Presented Event Detection Through Pattern Analysis
2.4 Building Event Mention Sentence Repository from Inverted Index
2.5 Event Mention Sentence Classification
2.6 Refining and Extending the Rules Using Fuzzy Approach
2.6.1 Membership Function for Fuzzy Rules
2.6.2 Verification of Presented Fuzzy Rules Using Fuzzy Petri Nets (FPN)
2.7 Weights for Patterns Using Membership Function
2.8 Experimental Results
2.8.1 Performance Evaluation Using Controlled Dataset
2.8.2 Performance Evaluation for Uncontrolled Dataset Generated from WWW
2.8.3 Web Corpus Versus IBC Corpus
2.9 Conclusion and Future Works
References
3 Constructing Thesaurus Using TAG Term Weight for Query Expansion in Information Retrieval Application
3.1 Introduction
3.2 Reviews on Query Expansion and Refinement Techniques
3.3 Architecture View of Query Expansion
3.4 TAG Term Weight (TTW)
3.5 N-Gram Thesaurus for Query Expansion
3.6 Algorithm
3.7.1 Performance Evaluation of Query Reformulation and Expansion
3.7.2 Performance of TTW Approach
3.8 Conclusion
References
4 Smooth Weighted Colour Histogram Using Human Visual Perception for Content-Based Image Retrieval Applications
4.1 Introduction
4.2 Reviews on Colour-Based Image Retrieval
4.3 Human Visual Perception Relation with HSV Colour Space
4.4 Distribution of Colour Information
4.5 Weight Distribution Based on NBS Distance
4.6 Algorithm
4.8 Conclusion
References
5 Cluster Indexing and GR Encoding with Similarity Measure for CBIR Applications
5.1 Introduction
5.2 Literature Review
5.2.1 Review on Indexing Schemes
5.2.2 Literature Review on Encoding Approaches
5.2.3 Literature Review on Similarity Metrics
5.3 Architectural View of Indexing and Encoding with Similarity Measure
5.4 Histogram Dimension-Based Indexing Scheme
5.5 Coding Using Golomb–Rice Scheme
5.6 Similarity Measure
5.6.1 Algorithm—BOSM (Hq, Hk)
5.7.1 Experimental Results on Coding
5.7.2 Retrieval Performance of BOSM on Various Datasets
5.7.3 Evaluation of Retrieval Time and Bit Rate
5.7.4 Comparative Performance Evaluation
5.8 Conclusion
References
Appendix
About the Authors
Dr. S. G. Shaila is an Associate Professor in the Department of Computer Science
and Engineering at Dayananda Sagar University, Bangalore, Karnataka. She
earned her Ph.D. in computer science from the National Institute of Technology,
Tiruchirappalli, Tamil Nadu, for her thesis on multimedia information retrieval in
distributed systems. She brings with her years of teaching and research
experience. She has worked as a Research Fellow on the DST project, Govt of India,
and on Indo-US projects. Her main areas of interest are information retrieval,
image processing, cognitive science and pattern recognition. She has published 20
papers in international journals and conference proceedings.
Dr. A. Vadivel received his master’s in science from the National Institute of
Technology, Tiruchirappalli (NITT), Tamil Nadu, before completing a master’s in
Technology (M.Tech.) and Ph.D. at the Indian Institute of Technology (IIT),
Kharagpur, India. He has 12 years of technical experience as a network and
instrumentation engineer at the IIT Kharagpur and 12 years of teaching experience
at Bharathidasan University and NITT. Currently, he is working as Associate
Professor at SRM University, Amaravati, AP. He has published papers in more than
135 international journals and conference proceedings. His research areas are
content-based image and video retrieval, multimedia information retrieval from
distributed environments, medical image analysis, object tracking in motion video
and cognitive science. He received the Young Scientist Award from the Department
of Science and Technology, Government of India, in 2007; the Indo-US Research
Fellow Award from the Indo-US Science and Technology Forum in 2008; and the
Obama-Singh Knowledge Initiative Award in 2013.
Abbreviations
ACE Automatic Content Extraction

ALNES Active Long Negative Emotional Sentence
ALNIES Active Long Negative Intensified Emotional Sentence
ALPES Active Long Positive Emotional Sentence
ALPIES Active Long Positive Intensified Emotional Sentence
ANFIS Artificial neuro-fuzzy inference system
ANN Artificial neural network
ASNES Active Short Negative Emotional Sentence
ASNIES Active Short Negative Intensified Emotional Sentence
ASPES Active Short Positive Emotional Sentence
ASPIES Active Short Positive Intensified Emotional Sentence
Bo1 Bose–Einstein statistics model
BoCo Bose–Einstein statistics co-occurrence model
BOSM Bin overlapped similarity measure
CART Classification and regression tool
CBIR Content-based image retrieval
CF Core field
Co Co-occurrence
CRF Conditional random fields
DCT Discrete cosine transform
DOM Document object model
EMD Earth mover’s distance
ET Emotional triggered
FGC Form Graph Clustering
FPN Fuzzy Petri-Nets
GR Golomb–Rice
HCPH Human colour perception histogram
HiWE Hidden Web Exposer
HQAOS High-Qualified Active Objective Sentence
HQASS High-Qualified Active Subjective Sentence
HQPOS High-Qualified Passive Objective Sentence

HQPSS High-Qualified Passive Subjective Sentence
HSV Hue Saturation Value
HTML Hyper-Text Markup Language
IBC Iraq Body Count
InPS Internal Property Set
IPS Input Property Set
IR Information retrieval
IRM Integrated Region Matching
IWED Integrated Web Event Detector
KDB K-Dimensional B-tree
KLD Kullback–Leibler divergence
KLDCo Kullback–Leibler divergence co-occurrence model
LDA Latent Dirichlet allocation
LP Least Preferable
LVS Label value set
MAP Mean average precision
MCCM Colour-based co-occurrence matrix scheme
ME Mutually Exclusive
MEP Minimum Executable Pattern
MHCPH Modified human colour perception histogram
MP Most Preferable
MRR Mean reciprocal recall
MUC Message Understanding Conference
NBS National Bureau of Standards
NIST National Institute of Standards and Technology
NLP Natural language processing
NQAOS Non-Qualified Active Objective Sentence
NQASS Non-Qualified Active Subjective Sentence
NQPOS Non-Qualified Passive Objective Sentence
NQPSS Non-Qualified Passive Subjective Sentence
OPS Output Property Set
PHOTO Pyramid histogram of topics
PIW Publicly Indexable Web
PLNES Passive Long Negative Emotional Sentence
PLNIES Passive Long Negative Intensified Emotional Sentence
PLPES Passive Long Positive Emotional Sentence
PLPIES Passive Long Positive Intensified Emotional Sentence
POS Part of speech
PQ Product quantization
PSNES Passive Short Negative Emotional Sentence
PSNIES Passive Short Negative Intensified Emotional Sentence
PSPES Passive Short Positive Emotional Sentence
PSPIES Passive Short Positive Intensified Emotional Sentence
QA Question Answering
QAOS Qualified Active Objective Sentence

QASS Qualified Active Subjective Sentence
QPOS Qualified Passive Objective Sentence
QPSS Qualified Passive Subjective Sentence
RED Retrospective new Event Detection
RS Rule set
SCM Sentence classification model
SOAP Simple object access protocol
SQAOS Semi-Qualified Active Objective Sentence
SQASS Semi-Qualified Active Subjective Sentence
SQPOS Semi-Qualified Passive Objective Sentence
SQPSS Semi-Qualified Passive Subjective Sentence
SVM Support vector machine
SWC Surface web crawlers
SWR Semantic Web Retrieval
TBIR Text-based information retrieval
TDT Topic Detection and Tracking
TS Text Summarization
TSN Term semantic network
TTW TAG term weight
UMLS Unified Medical Language System
URL Uniform Resource Locator
ViDE Vision-based data extraction
WaC Web as Corpus
WSDL Web Services Description Language
WWW World Wide Web
XML Extensible Markup Language
List of Figures
Fig. 1.1 Block diagram of deep web crawler
Fig. 1.2 Co-relation between core and allied fields
Fig. 1.3 Classification of allied and core fields
Fig. 1.4 Functional view of distributed web crawler
Fig. 1.5 Sample core and allied fields
Fig. 1.6 a Rule for real estate web application. b Classification result
Fig. 1.7 Average precision
Fig. 1.8 Coverage rate of the crawler. a Coverage rate for single fetch. b Coverage rate for periodic fetch
Fig. 1.9 Retrieval rate by crawlers
Fig. 2.1 Event features. a Relationship of Event features, b Sample Event features for crime-related document
Fig. 2.2 Presented approach framework
Fig. 2.3 Event Mention sentence repository
Fig. 2.4 Hierarchical Event Mention sentence classification
Fig. 2.5 Triangular membership function for Subjective Active class patterns
Fig. 2.6 Triangular membership function for complete Subjective class patterns
Fig. 2.7 Fuzzy Petri Net representation of the sentence classification model
Fig. 2.8 Reachability graph
Fig. 2.9 Significant levels for patterns
Fig. 2.10 Weights derived for Subjective class patterns
Fig. 2.11 Event Corpus built from Event Instance patterns
Fig. 2.12 F1 score (%). a Various training data and b various Event Types
Fig. 3.1 Functional units of N-gram Thesaurus construction
Fig. 3.2 DOM tree for HTML TAGs
Fig. 3.3 Significant scale a <head> field and b <section>
Fig. 3.4 Weight for TAGs under <head> and <section>
Fig. 3.5 Weight distribution of HTML TAGs
Fig. 3.6 Unigram Thesaurus
Fig. 3.7 Bigram Thesaurus
Fig. 3.8 Procedure to generate N-gram Thesaurus
Fig. 4.1 HSV colour model
Fig. 4.2 Distribution of true colour and grey colour components
Fig. 4.3 Construction of smooth weight distribution tree
Fig. 4.4 Smooth distribution of hue and intensity
Fig. 4.5 Average precision
Fig. 4.6 Average recall
Fig. 4.7 Average precision versus recall
Fig. 4.8 Average F measure
Fig. 4.9 Sample retrieval set using MHCPH
Fig. 4.10 Sample retrieval set using HCPH
Fig. 5.1 Schematic diagram of the presented approach of indexing, encoding and distance similarity measure
Fig. 5.2 Sample histogram with empty cells (bins)
Fig. 5.3 Sample indexing structure
Fig. 5.4 Values of M for various indexing level
Fig. 5.5 View of database cluster
Fig. 5.6 Working principle of BOSM
Fig. 5.7 Retrieval performance of encoded and flat histogram a precision, b precision versus recall, c F1 score
Fig. 5.8 Performance of BOSM on MIT dataset_212 a precision, b precision versus recall
Fig. 5.9 Retrieval set MIT dataset_212 a clustered quotient code, b clustered combined, c flat qcode, d flat ccode
Fig. 5.10 Retrieval set from Caltech dataset_101
Fig. 5.11 Retrieval set from Caltech dataset_256
Fig. 5.12 Performance of similarity measure
Fig. 5.13 Performances of distance measures for Caltech dataset_101
Fig. 5.14 Performance on Caltech dataset_256
List of Tables
Table 1.1 Estimated relationship between label and value for real estate web application
Table 1.2 Combination of AF2j and AF1 for real estate web applications
Table 1.3 Coverage Ratio
Table 2.1 POS-tagged Event Mention sentences
Table 2.2 POS tags used for sentence classification
Table 2.3 Event Mention sentence classification in three levels using CART
Table 2.4 Event Mention patterns
Table 2.5 Sentential term features and POS tags
Table 2.6 Fuzzy rules of patterns for (a) Subjective class, (b) Objective class
Table 2.7 Sample Event Types and their Trigger terms
Table 2.8 IBC Corpus Statistics for ‘Die’ Event Type
Table 2.9 Classification accuracy of human annotation and ANFIS for Event Type ‘Die’
Table 2.10 Performance of ANFIS using tenfold cross-validation for Event Type ‘Die’
Table 2.11 Performance evaluations (%) for IBC dataset using k-fold cross-validation
Table 2.12 Performance measure (%) of presented approach with other approaches for Event Type ‘Die’
Table 2.13 Web Corpus Statistics from www.trutv.com/library/crime
Table 2.14 F1 measure of the presented approach with other approaches for various Event Types in Web Corpus
Table 2.15 F1 measure for various combinations of training/test data for the ‘Die’ Event
Table 2.16 Accuracy for the Instances for various Event Types
Table 3.1 Information on benchmark dataset
Table 3.2 Clueweb09B: expanded queries
Table 3.3 Query expansion and user analysis
Table 3.4 Baseline performance on Clueweb09B, WT10g and GOV2 datasets
Table 3.5 Comparative performance of TTW
Table 3.6 Performance comparison of various approaches for various datasets against baselines
Table 3.7 Gain improvement
Table 3.8 Difference in the improvement gain of comparative approaches with TTW approach
Table 4.1 NBS distance table
Table 5.1 Sample value of multiplication factor
Table 5.2 Encoded feature (histogram)
Table 5.3 Sample distance value
Table 5.4 BOSM between histograms
Table 5.5 Details of benchmark datasets
Table 5.6 Retrieval time
Table 5.7 Bit ratio for MIT dataset
Table 5.8 Performance comparison on Caltech dataset_101
Table 5.9 Performance comparison on Caltech dataset_256
Table 5.10 Precision on Caltech dataset_101
Table 5.11 Precision on Caltech dataset_256
Chapter 1
Intelligent Rule-Based Deep Web Crawler
1.1 Introduction to Crawler
In the early days, the number of documents on the WWW was relatively small, and thus
managing and fetching them for processing was easy. Most search engine systems use a
crawler as a tool for fetching these static documents. However, the WWW has grown
rapidly, from thousands to millions of web pages, and the HTML content is frequently
altered and uploaded by authors. This makes the retrieval task complex, and good
retrieval precision is difficult to achieve. The retrieval result is also influenced
by the user’s query, and search engine systems process the user query suitably for
retrieving relevant results. The crawler fetches web pages periodically and stores
them in repositories so that the content is continuously updated. More recently, much
of the content of the WWW is hidden in backend databases and is referred to as the
deep web. Web applications use this hidden information to create web pages
dynamically, and it is the information most frequently retrieved by dynamic web-based
applications. Because it resides in databases, this information is invariably not
available to the well-known surface web crawlers. Most web-based applications use
searchable data sources, whose content is retrieved dynamically by the users. All
these database systems provide tools for performing database-related analysis and
search.
In general, the crawler fetches documents from the Publicly Indexable Web (PIW)
(Alvarez et al. 2007). The PIW consists of a set of web pages interconnected by
hypertext links. Pages with FORMs in which the user provides a username and password
for authorization are not visible to this kind of crawler. Advanced web-based
applications demand that more web databases be created and used. The data stored in
such a database can be accessed only through search interfaces. In fact, there are
a huge number of deep web databases, and the number of hidden query interfaces is
increasing at a very high rate (Ajoudanian and Jazi 2009). The number of hidden pages
is very high compared to the PIW, which shows that a large amount of information
© Springer Nature Singapore Pte Ltd. 2018

S. G. Shaila and A. Vadivel, Textual and Visual Information Retrieval using Query
Refinement and Pattern Analysis, https://doi.org/10.1007/978-981-13-2559-5_1
is hidden and is not directly accessible by the surface web crawlers. The difference
between surface and deep web crawlers lies in the logical challenge of fetching
information from the deep web as opposed to the surface web. Googlebot (Brooks 2004)
and Yahoobot (Gianvecchio et al. 2008) are well-known crawlers and indexers. These
applications are not capable of accessing web databases, because the information in
a database is retrieved only after performing a computation. As a result, it is
important to understand the query interface and to handle technical challenges such
as locating, accessing, interacting with and indexing it. Also, the web crawler
has to traverse the web automatically, fill the FORMs intelligently, store the fetched
data effectively in local repositories and manage all these tasks.
It is understood from the above discussion that an intelligent deep web crawler is
the need of the hour. The crawler should interact with the FORMs in HTML pages
automatically and fill the fields effectively without human intervention. While
filling the FORMs, the combinations of FORM element and FORM value have to
be predicted. This is made possible by understanding the FORM with prior
knowledge. In this chapter, the architecture of a web crawler is presented, in which
the FORMs in HTML pages are filled by following rules. The rules are derived for the
Mutually Exclusive, Most Preferable and Least Preferable classes. In addition, all
possible combinations of values are tried, and the crawler efficiency is improved. The
rest of the chapter is organized as follows. Similar work is reviewed in Sect. 1.2,
and the intelligent crawler is presented in Sect. 1.3. In Sect. 1.4, the experimental
results are presented, and the chapter is concluded in the last section.
1.2 Reviews on Web Crawlers
The design and development of crawlers has grown steadily, with a large number of
designs and specifications from academia and industry. The basic crawler has been
extended with heuristics for increasing efficiency. Chakrabarti et al. (1999) have
proposed a general approach that uses the link structure of the web for analysis.
Rennie and McCallum (1999) have proposed a machine learning approach for fetching the
documents. In recent times, the deep web has attracted considerable research interest
(Chang et al. 2004). Raghavan and Arasu et al. (2001) have developed a web crawler
for interacting with hidden web data through web search interfaces. Hidden Web
Exposer (HiWE) is a deep web crawler that interacts with FORMs, and the FORMs are
filled using a customized database, which uses a layout-based information extraction
approach. However, this approach fails due to its FORM-filling strategy, which is
based on simple dependency. Source-biased probing techniques are used to facilitate
interaction with the target database to find its relevancy (Caverlee et al. 2006). The
relevancy is calculated using relevance metrics for evaluating interesting content in
the hidden web data source. The limitations of this approach are that it relies on
several key junctures, is prone to errors in relevance evaluation and makes it
difficult to identify relationships.
While processing a large number of web documents, it is difficult to identify the
relevance of deep web content to the query. Zhao et al. (2008) have crawled a
substantial number of web pages from the WWW to structure the sources of the web pages
and provide an effective query interface for suitable results. Organizing
structured deep web sources for various domains of web applications is a challenging
task. A Form Graph Clustering (FGC) approach has been proposed to classify
the content of deep web resources with the help of a FORM link graph by suitably
applying the fuzzy partition technique on web FORMs. This method is suitable for
single-domain deep web resources. The schema matching is continuously found
using the probability of the schema. However, there is a high chance that valuable data
are ignored while calculating the probability of the schema. Ntoulas et al. (2008) have
proposed a crawler to discover and download deep web content autonomously. This method
generates queries automatically for handling the query interface. FORM filling is
treated as a set covering problem. For each successful query, the number of matching
pages is calculated. The list of keywords is chosen randomly as the query list using a
query generation policy. The policy uses the frequency of occurrence of the keyword in
a generic text collection and in the content of the web pages fetched from the hidden
website. As a result, every page requires more download time and storage space. Liu et
al. (2011) have proposed the Minimum Executable Pattern (MEP), where the query FORM uses
a minimal combination of elements for performing a query. It works
based on MEP generation and an MEP-based adaptive deep web crawling approach.
The optimal queries are generated by parsing the FORMs and partitioning the ME
patterns. One of the drawbacks of this approach is that the number of documents fetched
is very low.
Wei et al. (2010) have developed vision-based data extraction (ViDE), which uses
the visual content of web documents. The similarity in visual content among
the web documents is analysed for extracting information. However, ViDE performs
well only when there is visual similarity between the web contents and fails
otherwise. Kayed and Chang (2010) have proposed FiVaTech, in which web data is
extracted at page level. The web pages are matched with
each other based on a template. A tree alignment method along with a mining technique
is applied for extracting the data. The Document Object Model (DOM) tree is constructed
to understand the pattern of HTML documents. It is found that the tree construction time
is large and that, if a web page differs from the previous one, the matching may
not be accurate. For every FORM-based query, its distribution in the corpus is used
to predict the property of the retrieved document (Ntoulas et al. 2005). One of the major
drawbacks of this method is that the attribute value set and the distributions of queries
are not found to be adequate. Wu et al. (2006) have modelled the structured web
information as a distinct attribute-value graph to crawl the hidden web information.
However, the queries for all the nodes need to be inserted in the graph,
and the cost of the process is high. Madhavan et al. (2008) of Google
have developed a crawler to surf the deep web, where the search space is navigated
with possible input combinations for the query to identify only the suitable URLs to
include in the search index. However, the efficiency of the model is not
taken into consideration. Queries with a high frequency of occurrence are selected
as seeds for crawling the web documents (Barbosa and Freire 2004). However, it is
not guaranteed that more new pages are fetched with high-frequency terms.
The domain specification has been used as a parameter for accessing the hidden
web pages (Alvarez et al. 2007). Several heuristics based on visual distance and
text similarity measures are used. The bounded fields may not have any globally
associated text in the web FORMs. This approach assumes that every FORM field
is associated with a text that shows the function of the field. It is observed that
most FORMs may not have such an association or explanatory text
label. Yongquan and Qingzhong (2012) have sampled web documents and constructed a
training set for selecting multiple features. Suitable query strings are selected
in each round of the crawling phase until the termination condition is reached. It is
noticed that a good coverage rate is achieved, but a large number of duplicate documents
are fetched, with higher processing time. In addition, Barbosa and Freire (2007) have
proposed a learning approach, in which the learning iterations and the sample paths
are found to be high.
It is evident from the above discussion that fetching deep web data is an important
task. Developing a deep web crawler for web applications of all domains is a
difficult task. The processing time is a bottleneck for most web crawlers. The
web FORMs are filled with all the possible values, and the number of value combinations
is huge. As a result, the crawler is ineffective at fetching web documents from the deep
web. In addition, none of the approaches includes an indexer in the architecture for
effectively utilizing the idle time of the crawler. Thus, a crawler is required
that handles both the surface web and the deep web and includes an indexer. Suitable
rule sets are required to understand the FORM values for most domain-specific
applications. In this chapter, an architecture specification of a crawler is proposed
for retrieving surface and deep web documents with prior knowledge of the domain using
the rules. The relationships between FORM values are found by the rules, and the FORMs
are filled with values effectively to achieve a higher coverage rate.
1.3 Deep and Surface Web Crawler
In this section, the details of the crawler are presented. Each
web application consists of FORMs with input elements along with values. Each
FORM element has a label and name relationship. As a result, the FORM is written
as F = ({e1, e2, e3, ..., ei}, S, M), where e1, ..., ei are the FORM elements, i is
their number and S is the information to be submitted. The URL of the FORM, the web
address and the number of pages connected to the FORM are denoted by M. L is a label,
and V = {v1, v2, v3, ..., vi} is the set of values. The labels are assigned suitable
values. Thus, each vi is a value that is assigned to the appropriate element ei if the
label of ei is equal to L. For a given domain, suitable and probable values are
available in the FORMs. The text boxes in the FORMs are free-form inputs, as these
kinds of input elements are entered by the user without any constraint. Similarly,
input in the form of descriptive text is also considered free form, provided that the
users understand the meaning of the element. From the FORMs of a domain-specific web
page, a Label-Value-Set (LVS) table is extracted for further processing.
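The FORM notation above can be made concrete with a small data structure. The following is only a minimal sketch in Python: the class names, the field names, the contents assumed for S and M, and the real-estate sample values are hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass, field


@dataclass
class FormElement:
    """One input element e_i of a FORM (field names are illustrative)."""
    name: str    # value of the HTML name attribute
    label: str   # visible label L associated with the element
    values: list = field(default_factory=list)  # candidate values; empty for free-form text


@dataclass
class Form:
    """F = ({e1, ..., ei}, S, M) in the chapter's notation."""
    elements: list     # the FORM elements e1 ... ei
    submission: dict   # S: assumed here to hold the action URL and method
    meta: dict         # M: URL of the FORM, web address, number of connected pages


def build_lvs(form):
    """Return a label -> value-set mapping, a simple stand-in for the LVS table."""
    lvs = {}
    for element in form.elements:
        # Elements that share a label contribute to the same value set.
        lvs.setdefault(element.label, set()).update(element.values)
    return lvs


# Hypothetical real-estate search FORM, used only for illustration.
form = Form(
    elements=[
        FormElement("city", "City", ["Chennai", "Mumbai"]),
        FormElement("beds", "Bedrooms", ["1", "2", "3"]),
        FormElement("keywords", "Keywords"),  # free-form text box
    ],
    submission={"action": "/search", "method": "GET"},
    meta={"url": "http://example.com/search", "connected_pages": 0},
)
lvs_table = build_lvs(form)
```

In this sketch a free-form text box simply ends up with an empty value set, which mirrors the statement that such inputs are entered without any constraint.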
Fig. 1.1 Block diagram of deep web crawler
The crawler has various blocks with rule specifications for fetching deep web
content, as shown in Fig. 1.1. The figure contains three functional units: the
surface web crawler, the deep web crawler and the indexer. The conventional HTML
pages are fetched by the surface web crawler. The web page is located and fetched from
the WWW by the URL fetcher, and the URL links are verified by the URL parser to avoid
dead links. The FORMs in an HTML document are located by inspecting the
TAG values. When the HTML page contains a <FORM> HTML TAG, the deep web
crawler fetches the document, as shown in Fig. 1.1. The FORM elements are filled,
and the FORM is submitted as a query to the web application. The response to the query
is created dynamically based on the values of the FORM elements and constitutes the
content of the deep web. The FORM processor handles form filling, submission and
retrieval of the dynamic web pages.
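The routing decision described above, in which a page containing a <FORM> TAG is handed to the deep web crawler, might be sketched as follows. This is an illustrative Python fragment built on the standard library `html.parser` module; the names `FormFinder` and `route_page` and the sample page are assumptions, not part of the presented architecture.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collects each <form> tag and the named input fields inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per FORM found in the page
        self._current = None   # FORM currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action", ""), "fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            if "name" in attrs:
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def route_page(html):
    """Return 'deep' if the page contains a FORM, otherwise 'surface'."""
    finder = FormFinder()
    finder.feed(html)
    return ("deep" if finder.forms else "surface"), finder.forms


page = """
<html><body>
  <FORM action="/search" method="get">
    <input type="text" name="city">
    <select name="beds"><option>1</option><option>2</option></select>
    <input type="submit" value="Go">
  </FORM>
</body></html>
"""
kind, forms = route_page(page)
```

Only fields that carry a name attribute are recorded, since unnamed controls such as a bare submit button contribute no value to the submitted query.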
Each FORM contains a large number of input fields. All or some of the fields
are filled by the users, and the LVS table is constructed. However, the combination
of allied and core fields estimated by the rules may vary drastically across web
applications.
1.4 Estimating the Core and Allied Fields
Consider two index sets, {i = 1, 2, ..., n} and {j = 1, 2, ..., m}. A functional
dependency between the given sets is a constraint on the attributes belonging to the
sets. A set of attributes AF1i ∈ R1 in relation R1 functionally determines another
attribute AF2j ∈ R2 if and only if AF1i → AF2j. Thus, AF1i is the determinant attribute
and AF2j is the dependent attribute. As a result, if the value of AF1i is found, the
value of AF2j can be estimated approximately. A functional dependency between two
attributes means that one attribute depends on the other. Given that AF1i, AF2j and CFl,
where {l = 1, 2, ..., k}, are sets of attributes in a relation R, the axiom of
transitivity is used to determine the properties of functional dependencies as given
below.
Axiom of transitivity: If AF1i → AF2j and AF2j → CFl, then AF1i → CFl.
The above axiom is applied for deriving the functional dependency between the
core and allied fields. AF2j is functionally dependent on AF1i, i.e., AF1i → AF2j,
and CFl is functionally dependent on AF2j, i.e., AF2j → CFl, as
shown in Fig. 1.2. In this work, the core field (CFl) is held constant, and AF1i is
functionally related to the CFl attributes. The rules are constructed as if-then-else
constructs for the core and allied field combinations. Various combinations of AF1i,
AF2j and CFl are found, together with classes such as Most Preferable (MP), Least
Preferable (LP) and Mutually Exclusive (ME). The attribute values of the MP class
retrieve a large number of documents from the hidden web. In contrast, the attribute
values belonging to the LP class retrieve fewer documents. The attribute combinations
belonging to the ME class are logically invalid and will not retrieve any documents
from the web.
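The axiom of transitivity used above can be illustrated by computing which attributes are determined by a given attribute. The following Python sketch is only an illustration: the function name `attribute_closure` and the plain strings standing for AF1, AF2 and CF are assumptions made for the example.

```python
def attribute_closure(start, dependencies):
    """Return every attribute determined by `start` under the given dependencies.

    `dependencies` is a list of (determinant, dependent) pairs, e.g.
    ("AF1", "AF2") for AF1 -> AF2. Applying the axiom of transitivity
    repeatedly, until nothing new is added, yields the closure.
    """
    closure = {start}
    changed = True
    while changed:
        changed = False
        for determinant, dependent in dependencies:
            if determinant in closure and dependent not in closure:
                closure.add(dependent)
                changed = True
    return closure


# The two dependencies stated in the text: AF1 -> AF2 and AF2 -> CF.
deps = [("AF1", "AF2"), ("AF2", "CF")]
derived = attribute_closure("AF1", deps)
```

Here `derived` contains CF even though no direct dependency AF1 → CF was listed, which is the conclusion the axiom provides.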
For a given application domain, CFl is fixed, and AF1i as well as AF2j are suitably
chosen. Based on experiments, it is noticed that for a combination of CFl, AF1i and
AF2j, the rules belonging to the Most Preferable class retrieve a larger number of
documents than the rules belonging to the Least Preferable class, as shown in Fig. 1.3.
Here, the values of CFl and AF2j are fixed, and the value of AF1i is substituted. This is
represented in Eqs. (1.2–1.4). These equations are derived by analysing the results
manually for a domain to suitably select the combination of allied and core fields.

$$
\left.
\begin{aligned}
AF1_i &\to AF2_j \\
AF2_j &\to CF_l
\end{aligned}
\right\} \Rightarrow AF1_i \to CF_l \tag{1.1}
$$
Equations (1.2–1.4) show the relationship between allied and core fields for MP
and LP classes.
$$AF1_{i(y)}^{MP} = (AF2_{j(x)} - 1) + CF_l \tag{1.2}$$
$$AF1_{i(y)}^{LPT} = (AF2_{j(x)} + 1) - 1 + CF_l \tag{1.3}$$
$$AF1_{i(y)}^{LPD} = (AF2_{j(x)} - 1) - 1 + CF_l \tag{1.4}$$
Fig. 1.2 Co-relation between core and allied fields
Fig. 1.3 Classification of allied and core fields

An SVM classifier is used for classifying the Most Preferable and Least Preferable
classes by correlating the allied and core fields. The output space of the classifier is
binary, denoting the MP and LP classes. The classification scheme for a real estate
website is presented in Sect. 1.8.1.
1.5 Classification of Most and Least Preferred Classes
The rules are interpreted as follows. For any CFl, the first rule states that the
combination of AF21 and AF11 is Most Preferable, and the combination of AF21 and AF12
is Least Preferable. In general, for a combination of AF2m and AF1n, there is an MP
class, and the next units of AF1n form the LP class. When there is an ME class, the MPs
and LPs do not exist; the remaining rules are interpreted in a similar way. Using these
rules, the FORM is filled with values of the MP class, and the documents retrieved from
the deep web are stored in the repositories for processing.
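One possible reading of this rule interpretation is sketched below in Python. It is an assumption-laden illustration, not the authors' rule set: it assumes that, for a fixed core field, the pair with equal AF2 and AF1 indices is MP, the pair with the next AF1 index is LP, and ME pairs are supplied explicitly. The function names are hypothetical.

```python
MP, LP, ME = "Most Preferable", "Least Preferable", "Mutually Exclusive"


def classify(af2_index, af1_index, mutually_exclusive=frozenset()):
    """Classify an (AF2, AF1) index pair for a fixed core field.

    Assumed reading of the rule: the pair with equal indices is MP, the pair
    whose AF1 index is one step later is LP, and pairs listed in
    `mutually_exclusive` are ME. Any other pair is left unclassified (None).
    """
    if (af2_index, af1_index) in mutually_exclusive:
        return ME
    if af1_index == af2_index:
        return MP
    if af1_index == af2_index + 1:
        return LP
    return None


def mp_combinations(num_af2, num_af1, mutually_exclusive=frozenset()):
    """List the index pairs the crawler would try first (MP class only)."""
    return [
        (m, n)
        for m in range(1, num_af2 + 1)
        for n in range(1, num_af1 + 1)
        if classify(m, n, mutually_exclusive) == MP
    ]


# Three values per allied field, with the pair (3, 3) declared invalid.
first_try = mp_combinations(3, 3, mutually_exclusive={(3, 3)})
```

Checking the ME set before the other conditions reflects the statement that MP and LP do not exist where an ME class applies.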
1.6 Algorithm
The main objective of combining the indexer with the proposed crawler is to use
the idle time of the crawler. The HTML parser removes the various TAGs present in
the fetched HTML documents. The web page is then segmented into a set of
words, the stop words are removed and the keywords are stemmed
to construct the inverted index. The inverted index stores each term along with its
frequency of occurrence. The crawler components communicate with each other through
web services (Chakrabarti et al. 1999).
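The indexing steps described above (TAG removal, segmentation into words, stop-word removal, stemming and frequency counting) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the stop-word list is a tiny sample, `naive_stem` is a crude stand-in for a real stemmer, and all names are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # tiny illustrative list


class TextExtractor(HTMLParser):
    """Drops the TAGs and keeps only the text content of a page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_page(doc_id, html, index):
    """Add one page to the inverted index: term -> {doc_id: frequency}."""
    extractor = TextExtractor()
    extractor.feed(html)
    words = re.findall(r"[a-z]+", " ".join(extractor.chunks).lower())
    for word in words:
        if word in STOP_WORDS:
            continue
        index[naive_stem(word)][doc_id] += 1


index = defaultdict(lambda: defaultdict(int))
index_page("d1", "<p>Houses in the city</p>", index)
index_page("d2", "<p>A house is crawling with crawlers</p>", index)
```

After these two calls, the term `house` maps to both documents with a frequency of one each, while stop words such as `the` never enter the index.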
"Yes, soon there could be no doubt whatever about it: the trail led
straight toward those rocks. What would we find there?
"So engrossed were we that we did not see it coming. There was a
sudden exclamation, we halted, and there was the fog—the dreaded
fog that we had forgotten—drifting about us. The next moment it was
gone, but more was drifting after. We resumed our advance. It was
not far now. Why couldn't the fog have waited a little longer? But
what did it matter? It could affect but little our immediate purpose;
and, though I knew that it would be difficult, surely we could find our
way back to the camp.
"The fog thinned, and the rocks loomed up before us, dim and
ghostly but close at hand. Then the vapor thickened about us again,
and they were gone. We were in the midst of crevasses now and
had to proceed with great caution. How it happened none of us
knew; but of a sudden we saw that we had lost the trail. But we did
not turn back to find it. It didn't matter, really. The demon and the
angel had gone to those rocks. Of that we were certain. And there
the rocks were, right there before us. 'Tis true we couldn't see them
now, but they were there.
"We went on. Minutes passed. And still there were no rocks. At
length we had to acknowledge it: in the twistings and turnings we
had been compelled to make among those cursed crevasses, we
had missed our objective, and now we knew not where we were.
"But we knew that we were not far. White and Long cursed and
wanted to know how we were ever going to find our way back
through this fog, since we had failed to find the rocks when they had
been right there in front of us. But it was nothing really serious; we
would find that rock-mass. We started. Of a sudden Long gave a
sharp but low exclamation, and his hand clutched at my arm.
"'What is it?' I asked in a voice low and guarded.
"'Voices!' he whispered."
Chapter 5
"DROME!"
"We listened. Not a sound. Suddenly the glacier cracked and
boomed, then silence again. We waited, listening. Still not the
faintest sound. Long, so White and I decided, must have been
deceived. But Long declared that he had not.
"'I heard voices, I tell you! I know that I was not mistaken at all. I
heard voices.'
"Again we listened.
"'There!' Long said suddenly. 'Hear them?'
"Yes, there, coming to us from out of the fog, were voices, plain,
unmistakable, and yet at the same time—how shall I say it?—
strangely muffled. Yes, that is the word, muffled. I wondered if the
fog did that; but it couldn't, I decided, be the fog. One voice was
silvery and strong, that of Sklokoyum's angel doubtless; the other
deep and rough, the voice of a man. The woman (or girl) seemed to
be urging something, pleading with him. Once we thought that there
came a third voice, but we could not be sure of that. But of one thing
we were sure: they were not speaking in English, in Spanish,
French, Siwash or Chinook. And we felt certain, too, that it was not
Scandinavian, German or Italian.
"'They are over there,' said Long, pointing. 'I am sure of it.'
"'No, there!' whispered White.
"For my part, I was convinced that these mysterious beings were in
still a different direction!
"'Well,' I suggested, 'let's be moving. We won't get the solution of this
queer business by standing here and wondering.'
"We got in motion, uncertain, though, whether we were really
advancing in the right direction; but we could not, I thought, be
greatly in error. Soon we came to a great crevasse. White leaped
across it, and on that instant the voices ceased.
"Had they heard?
"We waited, White crouching there on the other side. Soon the
sounds came again, whereupon White, in spite of my whispered
remonstrance, began stealing forward. Long and I being less active,
did not care to risk that jump, and so we made our way along the
edge of the fissure, seeking a place to cross. This we were not long
in finding, but by this time, to my profound uneasiness, White had
disappeared in the fog.
"We advanced cautiously, and as swiftly as possible. This, however,
was not very swiftly. See! There it was, the ghostly loom of the rocks
through the vapor. At that instant the voices ceased. Came a
scream, a short, sharp scream from the woman. A cry from White,
the crack of his revolver, and then that scream he gave—oh, the
horror of that I can never forget. Long and I could not see him, or the
others—only the ghostly rocks; and soon, too, they were
disappearing, for the fog was growing denser.
"We heard the sound of a body striking the ice and knew that White
had fallen. He was still screaming that piercing, blood-curdling
scream. We struggled to reach him, but the crevasses, those
damnable crevasses, held us up.
"The sound sank. Of a sudden it ceased.
"But there was no silence. The voice of the woman rang out sharp
and clear. And I thought that I understood it: she was calling to it, to
that thing we had seen, down at the camp, squatting beside her, its
eyes burning with that demoniacal fire—calling it off!
"Came a short silence, broken by a cry of horror from the angel. The
man's voice was heard, then her own in sudden, fierce, angry
pleading; at any rate, so it seemed to me—that she was pleading
with him again.
"But what had happened to White?
"All this time—which, indeed, was very brief—Long and I were
struggling forward. When we got out of that fissured ice and reached
the place of the tragedy, the surroundings were as still as death.
There lay our companion stretched out on the blood-soaked ice, a
gurgle and wheezing coming from his torn throat with his every gasp
for breath.
"I knelt down beside him, while Long, poor fellow, stood staring
about into the fog, his revolver in his hand. A single glance showed
that there was no hope, that it was only a matter of moments.
"'Go!' gasped the dying man. 'It was Satan, the Fiend himself. And
an angel. And the angel, she said:
"'"Drome!"
"'Yes, I heard her say it. She said:
"'"Drome!"'
"There was a shudder, and White was dead. And the fog drifted
down denser than ever, and the stillness there was as the stillness of
the grave."
Chapter 6
AGAIN!
"What was that? The angel's voice again, seeming to issue from the
very heart of that mass of rocks. A loud cry and a succession of
sharp cries—cries that, I thought, ended in a sobbing sound. Then
silence. But no. What was that, that rustling, that flapping in the air?
"Long and I looked wildly—overhead, and then I knew a fear that
sent an icy shudder into my heart.
"I cried out—probably it was a scream that I gave—and sprang
backward. My soles were well calked, but this could not save me,
and down I went flat on my back. The revolver was knocked from my
hand and went sliding along the ice for many feet. I sprang up. At
this instant the thing came driving down at Long.
"He fired, but he must have missed. The thing struck him in the
throat and chest and drove him to the ice. I sprang for my weapon.
Long screamed, screamed as White had done, and fought with the
fury of a fiend. I got the revolver and started back. The thing had its
teeth buried in Long's throat. So fierce was the struggle that I could
not fire for fear lest I should hit my companion. As I came up, the
monster loosened its hold and sprang high into the air, flapping its
bat wings, then it came diving straight at me.
"I fired, but the bullet must have gone wild. Again, and it screamed
and went struggling upward. I emptied my revolver, but I fear that I
missed with every shot, except that second one. A few seconds, and
that winged monster had disappeared.
"I turned to Long. I have seen some horrible sights in my time but
never anything so horrible as what I saw now. For there was Long,
my companion, my friend—there he was raised up on his hands, his
arms rigid as steel, and the blood pouring from his throat. And I—I
could only weep and watch him as he bled to death. But it did not
last long. In Heaven's mercy, the horror was ended soon.
"And then—well, what followed is not very clear in my mind. I know
that a madness seemed to come over me. But I did not flee from that
place of mystery and death; the madness, if madness it was, was not
like that. It was not of myself that I was thinking; it was not of escape.
It was as though a bloody mist had fallen upon the place. Vengeance
was what I wanted—vengeance and blood, vengeance and
slaughter. I reloaded my revolver, picked up Long's and thrust it into
my pocket, then caught up White's weapon with my left hand and
started for the rocks, shouting defiance and curses as I went.
"I reached that pile of stone, found the tracks of the angel and the
man and of that winged beast; but, at the edge of the rocks, the
tracks vanished, and I could not follow farther. But I did not stop
there. I went on, clear around that pile, and again and yet again. I
climbed it, clear to the summit, searched everywhere; but I could not
find a single trace of them I sought. Once, indeed, I thought that I
heard a voice, the voice of the angel—thought that I heard that
cursed word Drome.
"But I can not write any more now. Why, oh, why didn't we listen to
Sklokoyum and keep away from this hellish mountain? That, of
course, would have been foolish; but it would not have been this
thing which will haunt me to my dying hour."
Chapter 7
"AND NOW TELL ME!"
Scranton closed the journal, leaned back in his chair and looked
questioningly at Milton Rhodes.
"There you are!" he said. "I told you that I was bringing you a
mystery, and I trust that I have, at least in a great measure, met your
expectations."
There was silence for a moment.
"Hellish mountain!" said Rhodes. "Hellish mountain! Noble old
Rainier a hellish mountain!
"Pardon my soliloquy," he added suddenly, "And I want to thank you,
Mr. Scranton, for bringing me a problem that, unless I am greatly in
error, promises to be one of extraordinary scientific interest."
Extraordinary scientific interest! What on earth did he mean by that?
"Still," he subjoined, "I must confess that there are some things
about it that are very perplexing, and more than perplexing."
"I think I know what you mean. And that explains why the story has
been kept a secret all these years."
"Your grandfather, Mr. Scranton, seems to have been a well-
educated man."
"Yes; he was."
Milton Rhodes' pause was a significant one, but Scranton did not
enlighten him further.
"On his return from Old He, did he tell just what had happened up
there?"
"He did not, of course, care to tell everything, Mr. Rhodes, for fear he
would not be believed. And little wonder. He was cautious, very
guarded in his story; but, at that, not a single soul believed him.
Perhaps, indeed, his very fear of distrust and suspicion and his
consequent caution and vagueness, hastened and enhanced those
dark and sinister thoughts and suspicions of his neighbors, and,
indeed, of every one else who heard the story. There was talk of
insanity, of murder even. This was the cruelest wound of all, and my
grandfather carried the scar of it to his grave."
"Probably it would have been better," said Rhodes, "had he given
them the whole of the story, down to the minutest detail."
"I do not see how. When they did not believe the little that he did tell,
how on earth could they have believed the wild, the fantastic, the
horrible thing itself?"
"Well, you may be right, Mr. Scranton. And here is a strange thing,
too. It is inexplicable, a mystery indeed. For many years now,
thousands of sightseers have every summer visited the mountain—
this mountain that your grandfather found so mysterious, so hellish—
and yet nothing has ever happened."
"That is true, Mr. Rhodes."
"They have found Rainier," said Milton Rhodes, "beautiful, majestic,
a sight to delight the hearts of the gods; but no man has ever found
anything having even the remotest resemblance to what your
grandfather saw—has ever even found strange footprints in the
snow. I ask you: where has the mystery been hiding all these
years?"
"That is a question I shall not try to answer, Mr. Rhodes. It is my
belief, however, that the mystery has never been hiding—using the
word, that is, in its literal signification."
"Of course," Milton said. "But you know what I mean."
The other nodded.
"And now, Mr. Rhodes, I am going to tell you why I this day so
suddenly found myself so anxious to come to you and give you this
story."
Milton Rhodes leaned forward, and the look which he fixed on the
face of Scranton was eager and keen.
"I believe, Mr. Rhodes, I at one point said enough to give you an idea
of what—"
"Yes, yes!" Milton interrupted. "And now tell me!"
"The angel," said Scranton, "has come again!"
"Alone?"
"No; the demon is with her."
Chapter 8
"DROME" AGAIN
Scranton produced a clipping from a newspaper.
"This," he told us, "is from today's noon edition of The Herald. The
account, you observe, is a short one; but it is my belief that it will
prove to have been (at any rate, the precursor of) the most
extraordinary piece of news that this paper has ever printed."
He looked from one to the other of us as if challenging us to doubt it.
"What," asked Rhodes, "is it about?"
"The mysterious death (which the writer would have us believe was
not mysterious at all) of Miss Rhoda Dillingham, daughter of the well-
known landscape painter, on the Cowlitz Glacier, at the Tamahnowis
Rocks, on the afternoon of Wednesday last."
"Mysterious?" queried Milton Rhodes. "I remember reading a short
account of the girl's death. There was, however, nothing to indicate
that there had been anything at all mysterious about the tragedy. Nor
was there any mention of the Tamahnowis Rocks even. It said only
that she had been killed, by a fall, on the Cowlitz Glacier."
"But there was something mysterious, Mr. Rhodes, how mysterious
no one seems to even dream. For again we have it, that word which
White heard the angel speak—that awful word Drome."
"Drome!" Milton Rhodes exclaimed. "That word again—after all these
years?"
"Yes," said Scranton. "And you will understand the full and fearful
meaning of what has just happened there on Mount Rainier when I
tell you that knowledge of that mysterious word has always been
held an utter secret by the Scrantons. No living man but myself knew
it, and yet there it is again."
"This is becoming interesting indeed!" exclaimed Milton Rhodes.
"I was sure that you would find it so. And now permit me to read to
you what the newspaper has to say about this poor girl's death."
He held the clipping up to get a better light upon it and read the
following:
"The death of Miss Rhoda Dillingham, daughter of Francis
Dillingham, the well-known painter of mountain scenery, on the
Cowlitz Glacier on the afternoon of last Wednesday, was, it has now
been definitely ascertained, a purely accidental one. Victor Boileau,
the veteran Swiss guide, has shown that there is not the slightest
foundation for the wild, fantastic rumors that began to be heard just
after the girl's death. Boileau's visit to the Tamahnowis Rocks, the
scene of the tragedy, and his careful examination of the place, have
proved that the victim came to her death by a fall from the rocks.
"There was no witness to the tragedy itself. Francis Dillingham, the
father of the unfortunate girl, was on another part of the rocks at the
time, sketching. On hearing the screams, he rushed to his daughter.
He found her lying on the ice at the foot of the rock, and on the point
of expiring. She spoke but once, and this was to utter these
enigmatic words:
"'Drome!' she said, 'Drome!'
"This is one of those features which gave rise to the stories that
something uncanny and mysterious had occurred at the Tamahnowis
Rocks, as if the spot, indeed, was justifying its eerie name.
"Another is that Dillingham declared that he himself, as he made his
way over the rocks in answer to his daughter's screams, heard
another voice, an unknown voice, and that he is sure that he
distinctly heard that voice pronounce that strange word Drome.
"Victor Boileau, however, has shown that there had been no third
person there at the occurrence of the tragedy, that Rhoda
Dillingham's death was wholly accidental, that it was caused by a
fall, from a height of about thirty feet, down the broken and
precipitous face of the rocky mass.
"Another feature much stressed by those who see a mystery in
everything connected with this tragic accident was the cruel wound
in the throat of the victim. The throat, it is said, had every
appearance of having been torn by teeth; but it is now known that
the wound was made by some sharp, jagged point of rock, struck by
the girl during her fall."
Chapter 9
"TO MY DYING HOUR"
Scranton folded the clipping and placed it between leaves of the
journal.
"There!" he said. "My story is ended. You have all the principal facts
now. Additional details may be found in this old record—if you are
interested in the case and care to peruse it."
Milton Rhodes reached forth a hand for the battered old journal.
"I am indeed interested," he said. "And I wish to thank you again, Mr.
Scranton, for bringing to me a problem that promises to be one of
extraordinary scientific interest."
"I suppose that you will visit the mountain, the Tamahnowis Rocks,
as soon as possible."
Milton Rhodes nodded.
"It will take some time, some hours, that is, to make the necessary
preparations; for this journey, I fancy, is going to prove a very strange
one and perhaps a very terrible one, too. But tomorrow evening, I
trust, will find us at Paradise. If so, on the following morning, we will
be at the Tamahnowis Rocks."
"We?" queried Scranton.
"Yes; my friend Carter here is going along. Indeed, without Bill at my
side, I don't know that I would care to face this thing."
"Me?" I exclaimed. "Where did you get that? I didn't say that I was
going."
"That is true, Bill," Milton laughed; "you didn't say that you were
going."
A silence ensued, during which Scranton sat in deep thought, as,
indeed, did Milton Rhodes and myself. What did it all mean? Oh,
what was I to make of this wild, this fantastic, this fearful thing?
"There is no necessity," Scranton said suddenly, "for the warning, I
know; and yet I can't help pointing out that this adventure that you
are about to enter upon may prove a very dangerous, even a very
horrible one."
"Yes," Rhodes nodded; "it may prove a very dangerous, a very
horrible adventure indeed."
"Why," I exclaimed, "all this cabalistic lingo and all this mystery? Why
not be explicit? There is only one place that the angel could possibly
have come from, this wonderful and terrible creature that says
Drome and has a demon for her companion."
"Yes, Bill," Milton nodded; "there is only one place. And it was from
that very place that she and her demon came."
"Good Heaven! Why, that supposition is absurd. The thing's
preposterous."
"Do you think so, Bill? The submarine, the airplane, the radio—all
were absurd, all were preposterous, Bill, until men got them. And
many other things, too. Why, it was only yesterday that the sphericity
of this old world that we inhabit ceased to be absurd, ceased to be
preposterous. Don't be too sure, old tillicum. Remember the oft-
repeated observation of Hamlet:
"'There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.'"
"That is true enough. But this is different. This isn't philosophy or
something in philosophy. This—"
"Awaits us!" said Milton Rhodes. "The question of prime importance
to us now is if we can find the way to that place whence the angel
and the demon came; for, so it seems to me, there can be little doubt
that it is only on rare occasions, on very rare occasions, that these
strange beings appear on the mountain."
"It is," Scranton remarked, "as, of course, you know, against the
rules to take any firearm into the Park; but, if I were you, I should
never start upon this enterprise without weapons."
"You may rest assured on that point," Milton told him: "we will be
armed. The hazardous possibilities of this very strange problem that
we are going to endeavor to solve justify this infraction of the rule."
"Well," said Scranton, suddenly rising from his chair, "you are
doubtless anxious to start your preparations at once, and I am
keeping you from them. There is one thing, though, Mr. Rhodes, that
I, that—"
He paused, and a look of trouble, of distress settled upon his pale,
pinched features.
"What is it?" Milton Rhodes queried.
"I am glad that you are going, and yet—and yet I may regret this day,
this visit, to my dying hour. For the thing that I have brought you is
dangerous. It is more than that; it is awful."
"And probably," said Milton, "it is very wonderful indeed."
"But," Scranton added, "one should not blink the possibility that—"
"Tut, tut, man!" Milton Rhodes exclaimed, laughing. "We mustn't find
you a bird of ill-omen now. You mustn't think things like that."
"Yet I can't help thinking about them, Mr. Rhodes. I wish that I could
accompany you, at least as far as the scene of the tragedies; but I
am far from strong. Even to drive a car sometimes taxes my
strength. I doubt if I could now make the climb even from the Inn as
far as Sluiskin Falls."
A silence fell, to be suddenly broken by Milton.
"Let us regard that as a happy augury," said he, pointing towards the
southern windows, through which the sunlight, bright and sparkling,
came streaming in: "the gloom and the storm have passed away,
and all is bright once more."
"I pray Heaven that it prove so!" the other exclaimed.
"For my part, I shall always be glad that you came to me, Mr.
Scranton; glad always, even—even," said Milton Rhodes, "if I never
come back."
Chapter 10
ON THE MOUNTAIN
It was a few minutes past three on the afternoon of the day following
when Milton Rhodes and I got into his automobile and started for
Mount Rainier. When we arrived at the Park entrance, which we did
about half-past six, the speedometer showed a run of one hundred
and two miles.
"Any firearms, a cat or a dog in that car?" was the question when
Milton went over to register.
"Nope," said Milton.
There was a revolver in one of his pockets, however, and another in
one of mine. But there was no weapon in the car: hadn't I got out of
the car so that there wouldn't be?
A few moments, and we were under way again, the road, which ran
through primeval forest, a narrow one now, sinuous and, it must be
confessed, hardly as smooth as glass.
Soon we crossed Tahoma Creek, where we had a glimpse of the
mountain, its snowy, rocky heights aglow with a wonderful golden tint
in the rays of the setting sun. Strange, wild, fantastic thoughts and
fears came to me again, and upon my mind settled gloomy
foreboding—sinister, nameless, foreboding terrible as a pall. We
were drawing near the great mountain now, with its unutterable
cosmic grandeur and loneliness, near to its unknown, which Milton
Rhodes and I were perhaps fated to know soon and perhaps to know
to our sorrow.
From these gloomy, disturbing thoughts, which yet had a strange
fascination too, I was at length aroused by the voice of Rhodes.
"Kautz Creek," said he.
And the next moment we shot across the stream, which went racing
and growling over its boulders, the pale chocolate hue of its water
advertising its glacial origin.
"Up about two thousand four hundred feet now," Milton added.
"Longmire Springs next. I say, Bill, I wonder where we shall be this
time tomorrow, eh?"
"Goodness knows. Sometimes I find myself wondering if the whole
thing isn't pure moonshine, a dream. An angel and a demon on the
slopes of Mount Rainier! And they say that this is the Twentieth
Century!"
Rhodes smiled wanly.
"I think that you will find the thing real enough, Billy, me lad," said he.
"Too real, maybe. The fact is that I don't know what on earth to
think."
"The only thing to do is to wait, Bill. And we won't have to wait long,
either."
When we swung to the grade out of Longmire, I thought that we
were at last beginning the real climb to the mountain. But Milton said
no.
"When we reach the Van Trump auto park, then we'll start up," said
he.
And we did—the road turning and twisting up a forest-clad steep.
Then, its sinuosities behind us, it ran along in a comparatively
straight line, ascending all the time, to Christine Falls and to the
crossing of the Nisqually, the latter just below the end of the glacier
—snout, as they call it. Yes, there it was, the great wall of ice, four or
five hundred feet in height, looking, however, what with the earth and
boulders ground into it, more like a mass of rock than like ice. There
it was, the first glacier I ever had seen, the first living glacier, indeed,
ever discovered in all these United States—at any rate, the first one
ever reported. Elevation four thousand feet.
The bridge behind us, we swung sharp to the right and went slanting
up a steep rampart of rock, moving now away from the glacier, away
from the mountain; in other words, we were heading straight for
Longmire but climbing, climbing. At length the road, cut in the
precipitous rock, narrowed to the width of but a single auto; and at
this point we halted, for descending cars had the way.
The view here was a striking one indeed, down the Nisqually Valley
and over its flanking, tumbled mountains, and the scene would
probably have been even more striking than I found it had the spot
not been one to make the head swim. I had the outside of the auto,
and I could look right over the edge, over the edge and down the
precipitous wall of rock to the bed of the Nisqually, half a thousand
feet below.
The last car rolled by, and we got the signal to come on. This narrow
part of the road passed, we swung in from the edge of the rampart,
and I confess that I was not at all sorry that we did so.
Silver Forest, Frog Heaven, Narada Falls, Inspiration Point, then
Paradise Valley, with its strange tree-forms, its beautiful flower-
meadows, and, in the distance, the Inn on its commanding height,
five thousand five hundred feet above the level of the sea; and, filling
all the background, the great mountain itself, towering fourteen
thousand four hundred feet aloft; the end of our journey in sight at
last!
The end? Yes—until tomorrow. And then what? The beginning then
—the beginning of what would, in all likelihood, prove an adventure
as hazardous as it was strange, a most fearful quest.
Had I been a believer in the oneirocritical science, the things that I
dreamed that night would have ended the enterprise (as far as I was
concerned) then and there: in the morning I would have started for
Seattle instanter. But I was not, and I am not now; and yet often I
wonder why I dreamed some of those terrible things—those things
which came true.
And, through all the horror, a cowled thing, a figure with bat wings,
hovered or glided in the shadows of the background and at intervals,
in tones cavernous and sepulchral, gave utterance to that dreaded
name:
"Drome!"

ISBN 978-981-13-2558-8 ISBN 978-981-13-2559-5 (eBook)

Library of Congress Control Number: 2018955166

© Springer Nature Singapore Pte Ltd. 2018

Bangalore, Karnataka, India Dr. M. K. Banga

Multimedia information retrieval from the distributed environment is an important

Bangalore, India S. G. Shaila

1 Intelligent Rule-Based Deep Web Crawler

2.8 Experimental Results

5.4 Histogram Dimension-Based Indexing Scheme

Dr. S. G. Shaila is an Associate Professor in the Department of Computer Science

ACE Automatic Content Extraction

HQPOS High-Qualified Passive Objective Sentence

QAOS Qualified Active Objective Sentence

Fig. 1.1 Block diagram of deep web crawler

Fig. 3.4 Weight for TAGs under <head> and <section>

Table 3.1 Information on benchmark dataset

1.1 Introduction to Crawler

1.2 Reviews on Web Crawlers

1.3 Deep and Surface Web Crawler

1.4 Estimating the Core and Allied Fields

Given two sets, $\{i = 1, 2, \ldots, n\}$ and $\{j = 1, 2, \ldots, m\}$. The functional dependency

AF1i(y)MP (AF2 j(x) − 1) + CFl (1.2)

Fig. 1.2 Co-relation between core and allied fields

Fig. 1.3 Classification of allied and core fields

1.5 Classification of Most and Least Preferred Classes
