
IRS IMP Q&A MID-1

Q) Explain hypertext and XML data structures. (unit-2)


A) ### Hypertext and XML Data Structures:
1. **Introduction:**
- The Internet's growth has led to the adoption of new mechanisms for
representing information.
- Hypertext and XML are two key structures that differ from traditional data
structures in format and usage.
2. **Hypertext:**
- Hypertext is stored in HTML and XML formats.
- HTML and XML offer detailed descriptions for text subsets, akin to zoning in
traditional structures.
- Hypertext enables one item to reference another through an embedded
pointer.
3. **HTML (Hypertext Markup Language):**
- HTML defines the internal structure for information exchange over the World
Wide Web.
- It is a standard markup language for creating web pages.
4. **XML (eXtensible Markup Language):**
- XML is accompanied by related specifications such as DTD (Document Type
Definition), DOM (Document Object Model), and XSL (eXtensible Stylesheet Language).
- It provides a flexible and extensible way to represent structured information.
5. **Key Characteristics:**
- Both HTML and XML are used for structuring and organizing content on the
web.
- They support the concept of zoning, providing detailed descriptions for text
subsets.
6. **Hypertext Functionality:**
- Hypertext allows seamless linking between different items through
embedded pointers, enhancing navigation and interactivity.
7. **HTML Usage:**
- HTML is widely used for creating static web pages and defining the structure
and layout of content.
8. **XML Features:**
- XML is more versatile, allowing users to define their own tags and structures
to represent data in a machine-readable format.
9. **XML Components:**
- XML components include DTD for defining document structure, DOM for
manipulating documents, and XSL for styling and transforming XML documents.
10. **Internet Integration:**
- Both structures play a crucial role in organizing and presenting information
on the internet, contributing to the development of the World Wide Web.
In summary, Hypertext and XML are integral to the internet's information
representation, with HTML serving as a standard for web page structure, and
XML providing a flexible and extensible means to represent structured data.
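
The two ideas above, user-defined XML tags acting as zones and hypertext links acting as embedded pointers, can be illustrated with a minimal sketch. It uses only Python's standard library. The sample item, its tag names (`item`, `title`, `body`, `link`), and the helper names `xml_zones` and `html_links` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical XML item: the tag names are user-defined, not part of any standard.
XML_ITEM = """<item id="doc-1">
  <title>Hypertext basics</title>
  <body>Items can point to other items.</body>
  <link target="doc-2"/>
</item>"""

# Hypothetical HTML fragment containing two embedded pointers (anchors).
HTML_ITEM = (
    '<p>See <a href="doc-2.html">item 2</a> '
    'and <a href="doc-3.html">item 3</a>.</p>'
)


def xml_zones(xml_text):
    """Return {tag: text} for each child element, i.e. the item's zones."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags (the embedded pointers)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def html_links(html_text):
    """Return every link target found in an HTML fragment, in order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


zones = xml_zones(XML_ITEM)
links = html_links(HTML_ITEM)
print(zones["title"])  # Hypertext basics
print(links)           # ['doc-2.html', 'doc-3.html']
```

Each XML child element plays the role of a zone that can be searched separately, while each `href` is a pointer from one item to another.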
Q) Describe in detail the functional overview of an information retrieval
system. (unit-1)
A) ### Functional Overview of Information Storage and Retrieval System:
1. **Item Normalization:**
- Process of converting incoming items to a standard format that the system
can process.
2. **Selective Dissemination of Information (Mail):**
- Dynamically compares newly received items against user profiles and
delivers items to users with matching interests.
- Composed of the search process, user profiles, and user mail files.
3. **Document Database Search:**
- Allows queries to search against all items received by the system.
- Composed of the search process, user-entered ad hoc queries, and the
document database storing all received items.
4. **Index Database Search:**
- Enables users to file and logically store items in indexes for future reference.
- Provides the capability to create and search public and private index files.
5. **Public and Private Index Files:**
- Public Index files are maintained by professionals, indexing every item in the
Document Database.
- Private Index files, associated with individual users, reference a subset of
items and have limited access.
6. **Automatic File Build (Information Extraction):**
- Process that supports index file generation, primarily for professional indexers.
7. **Multimedia Database Search:**
- Multimedia data is not a separate logical structure but an augmentation of
the existing structures in the Information Retrieval System.
8. **Integration with Database Management Systems (DBMS):**
- Integrating a DBMS with an Information Retrieval System combines structured
data handling with information retrieval functions.
- Examples include INQUIRE DBMS, ORACLE DBMS with CONVECTIS, and
INFORMIX DBMS linking to RetrievalWare.
9. **Digital Libraries and Data Warehouses (DataMarts):**
- Digital Libraries address issues related to the migration of library products to
digital formats.
- Data warehouses focus on structured data and decision support
technologies, including data mining for automatic analysis.
10. **Interconnection of Systems:**
- Both Digital Libraries and Data Warehouses share a need for search and
retrieval but have distinct focuses and functionalities.
In summary, an Information Storage and Retrieval System encompasses
processes for normalization, selective dissemination, document and index
database searches, and multimedia retrieval, often integrated with Database
Management Systems. Digital Libraries and Data Warehouses offer specialized
functionalities within this context.
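
Item normalization and Selective Dissemination of Information (Mail) can be sketched as follows. This is a minimal illustration, not how any particular system implements them: the profiles, the term-overlap matching rule, and the names `normalize` and `disseminate` are all assumptions made for the example.

```python
import re


def normalize(text):
    """Toy item normalization: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical user profiles: each user is described by a set of terms of interest.
PROFILES = {
    "alice": {"xml", "hypertext"},
    "bob": {"indexing", "bayesian"},
}


def disseminate(item_text, profiles, mail_files):
    """Compare one newly received item against every profile and append it
    to the mail file of each user whose profile shares at least one term."""
    terms = set(normalize(item_text))
    for user, interests in profiles.items():
        if terms & interests:
            mail_files.setdefault(user, []).append(item_text)
    return mail_files


mail = {}
disseminate("XML and Hypertext structures", PROFILES, mail)
disseminate("Weighted indexing with Bayesian models", PROFILES, mail)
disseminate("Weather report", PROFILES, mail)  # matches no profile
print(mail)
```

The point of the sketch is the direction of the comparison: in the Mail process the profiles are the standing "queries" and each incoming item is checked against them once, whereas a document database search runs a new query against all stored items.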
Q) List the data flow in an IRS.
A) In an Information Retrieval System (IRS), data flow involves the movement of information
between different components and processes. Here is a simplified list of data flow in an IRS:
1. **Item Submission:**
- Users submit items (documents, queries, etc.) to the system.
2. **Item Normalization:**
- Incoming items are normalized to a standard format that the system can process.
3. **Document Processing:**
- Items undergo processing for indexing, analysis, and storage in the document database.
4. **Selective Dissemination of Information (Mail):**
- Dynamic comparison of newly received items against user profiles.
- Delivery of matched items to users through the mail process.
5. **User Queries:**
- Users submit queries (ad hoc or predefined) to search for specific information.
6. **Document Database Search:**
- Search process compares user queries against the document database to retrieve
relevant items.
7. **Indexing:**
- Users create and manage indexes for specific items, associating additional metadata.
8. **Public and Private Index Files:**
- Public Index files maintained by professionals index every item in the document
database.
- Private Index files associated with individual users reference a subset of items.
9. **Automatic File Build (Information Extraction):**
- Generation of indexes, particularly for professional indexers.
10. **Multimedia Retrieval:**
- Retrieval of multimedia data is integrated with existing structures.
11. **Integration with Database Management Systems (DBMS):**
- Integration of structured data and information retrieval functions.
12. **Digital Libraries and Data Warehouses:**
- Specialized functionalities for handling digital content and structured data.
13. **Data Mining:**
- Analytical tools for extracting relationships and dependencies from structured data.
14. **Information Dissemination:**
- Relevant information is disseminated to users based on queries, profiles, and system
capabilities.
15. **Feedback Loop:**
- Users may provide feedback on the relevance of retrieved items, influencing future
searches and recommendations.
Remember that the actual data flow in an IRS can be more complex and may involve
additional steps and interactions based on the specific design and features of the system.
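
The core of this flow (normalize an item, store it in the document database, update the searchable structure, then answer an ad hoc query) can be sketched in a few lines. The class `TinyIRS`, its inverted-index design, and the AND-only query semantics are assumptions chosen for illustration; the steps above do not prescribe a specific data structure.

```python
import re
from collections import defaultdict


def normalize(text):
    """Toy normalization: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIRS:
    """Hypothetical miniature IRS: a document database plus an inverted index."""

    def __init__(self):
        self.documents = {}            # document database: id -> original text
        self.index = defaultdict(set)  # searchable structure: term -> item ids

    def ingest(self, doc_id, text):
        """Item submission, normalization, and document processing."""
        self.documents[doc_id] = text
        for term in normalize(text):
            self.index[term].add(doc_id)

    def search(self, query):
        """Ad hoc query: return ids of items containing every query term."""
        terms = normalize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


irs = TinyIRS()
irs.ingest(1, "Hypertext links items together")
irs.ingest(2, "XML items use user-defined tags")
irs.ingest(3, "Automatic indexing of items")

print(irs.search("items"))      # [1, 2, 3]
print(irs.search("XML items"))  # [2]
print(irs.search("video"))      # []
```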

Q) Explain automatic indexing. (unit-3)


A) Automatic indexing is a process in Information Retrieval Systems (IRS) where the system
generates indexes for documents without direct human intervention. This process is
designed to efficiently organize and categorize information for effective retrieval. Here's an
explanation of automatic indexing in an IRS:

### Automatic Indexing Overview:


1. **Total Document Indexing:**
- Case where the entire document is indexed automatically.
2. **Processing Time:**
- Automatic indexing typically takes a few seconds or less per item, depending
on processor speed and algorithm complexity.
3. **Advantages:**
- Consistency in the index term selection process.
4. **Types of Indexing:**
- Automated indexing results in two classes: weighted and unweighted.
5. **Unweighted Indexing:**
- Includes the existence of an index term in a document and sometimes its
word location in the searchable data structure.
6. **Weighted Indexing:**
- Attempts to place a value on the index term based on its frequency of
occurrence in the document.
- Values are normalized between 0 and 1.
- Results are presented to the user in rank order.
7. **Indexing by Term:**
- Uses the vocabulary of the original item as the basis for the indexing
process.
- Two major techniques: statistical and natural language.
8. **Statistical Indexing:**
- Based on vector models and probabilistic models, including Bayesian
models.
- Calculation of weights uses information like word frequency.
9. **Natural Language Indexing:**
- Uses statistical information but performs more complex parsing to define
the final set of index concepts.
10. **Weighted Systems (Vectorized Information System):**
- Emphasizes weights as a foundation for information detection.
- Each vector represents a document, and each position in a vector
represents a unique word with a weight between 0 and 1.
11. **Bayesian Approach:**
- Based on evidential reasoning (drawing conclusions from evidence), applied
in index term weighting or retrieval processes.
12. **Natural Language Processing:**
- Exemplified by DR-LINK (Document Retrieval through Linguistic Knowledge),
which performs complex linguistic analysis.
13. **Indexing by Concept:**
- Determines a canonical set of concepts based on a test set of terms.
- Example: the MatchPlus system by HNC Inc. uses neural networks to build
context vectors.
14. **Multimedia Indexing:**
- Accomplished at the raw data level for video or images.
- Allows positional and temporal (time) search.
Automatic indexing enhances the efficiency and consistency of the indexing
process, utilizing various techniques such as statistical modeling, natural
language processing, and neural networks for effective information retrieval.
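
The difference between unweighted and weighted indexing can be shown with a small sketch. The notes only say that weights are based on term frequency and normalized between 0 and 1; dividing each count by the highest count in the item is one simple way to do that and is an assumption here, as are the function names and sample items.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unweighted_index(text):
    """Unweighted: record that a term exists, plus its word positions."""
    positions = {}
    for pos, term in enumerate(tokenize(text)):
        positions.setdefault(term, []).append(pos)
    return positions


def weighted_index(text):
    """Weighted: term frequency divided by the highest frequency in the item,
    so every weight falls in the range (0, 1]. This scaling is an assumption."""
    counts = Counter(tokenize(text))
    top = max(counts.values(), default=1)
    return {term: n / top for term, n in counts.items()}


def rank(query_term, items):
    """Return ids of items containing the term, highest weight first."""
    scored = [
        (weighted_index(text).get(query_term, 0.0), item_id)
        for item_id, text in items.items()
    ]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [item_id for weight, item_id in ordered if weight > 0]


# Hypothetical items used only to exercise the functions.
ITEMS = {
    "a": "index index index term",
    "b": "index term term term term",
    "c": "retrieval only",
}

print(unweighted_index("index term index"))  # {'index': [0, 2], 'term': [1]}
print(weighted_index("index index term"))    # {'index': 1.0, 'term': 0.5}
print(rank("index", ITEMS))                  # ['a', 'b']
```

An unweighted index can only say whether an item matches, while the weighted index gives each match a value, which is what makes rank-ordered output possible.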
