Information Storage and Retrieval - Professional Practice

Information Storage and Retrieval

Submitted to: Prof. Madhvi Sharma

Submitted by: Abhinav Sharma, Akshay Jain, Aman Saini, Anuj Somani, Gitika Gupta, Gunjan Jain

Introduction:

Man is social by nature, and society is a web of social relationships. As a social animal, man
wants to communicate, and among all animals only man is endowed with the gift of speech. If speech
was the first step forward in the development of human communication, the second great milestone
was the invention of writing.

In the beginning, man developed methods of recording his experiences on clay tablets, wax tablets,
papyrus sheets, parchment rolls, codices, etc. The third great leap forward came with the invention
of printing, by means of which what was written could be reproduced and distributed in quantity,
thus disseminating information and learning among ever-widening circles of the community. A
bewildering amount of progress has accompanied the development of electronic systems of
communication: the telegraph, the telephone and especially radio, television and satellites.
More recently, the electronic computer and telecommunication technologies have opened up many other
revolutionary possibilities.

We may understand an information retrieval system as a system that helps us to recover
information. To understand the nature of an information retrieval system, we must try to
understand the meaning of its component words. Firstly, it is a system: it is composed of a set of
several interacting parts, each of which is designed to serve a specific function for a specific
purpose.

All these components are interrelated to achieve a goal. Here the goal is to recover or retrieve
information in a narrower sense, and by doing so to increase the level of knowledge of the users
in a broader sense. The concept of information retrieval thus presupposes that there are some
items of information which have been organized and stored in a suitable order for easy recovery
whenever needed.
Importance:

An information retrieval system is developed to help users discover relevant information in a
storehouse containing a collection of documents.

The idea of information retrieval assumes that there exist several documents or records comprising
data that have been arranged in a suitable order for easy retrieval. The storehouse contains a great
deal of bibliographic information, which is quite different from other kinds of information or data.
In such cases the retrieval system is designed to search for and retrieve specific facts or data.

The main objective of a database is to enable the user to search for specific records that match
one or more search criteria, for example, details of a recipe containing a particular ingredient,
or details of a product within a specific price range. The main purpose of designing an
information retrieval system is to meet the users' requirements: it retrieves documents in order
to answer the users' queries.
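
As a minimal sketch of matching records against a search criterion, the recipe example can be written as a filter over a small in-memory collection. The records, field names and the `find_recipes` function are hypothetical and only illustrate the idea; a real system would query a database.

```python
# Hypothetical recipe records; names and ingredients are illustrative only.
recipes = [
    {"name": "Tomato soup", "ingredients": {"tomato", "onion", "salt"}},
    {"name": "Pancakes", "ingredients": {"flour", "egg", "milk"}},
    {"name": "Omelette", "ingredients": {"egg", "onion", "salt"}},
]


def find_recipes(records, ingredient):
    """Return the names of all records whose ingredient set contains `ingredient`."""
    return [r["name"] for r in records if ingredient in r["ingredients"]]


print(find_recipes(recipes, "egg"))
```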
Uses:
-Regulatory Compliance-

A well-organized information storage and retrieval system that follows compliance regulations and tax
record-keeping guidelines significantly increases a business owner's confidence that the business is
fully compliant.

-Efficiency & Productivity-

A good information storage and retrieval system, including an effective indexing system, not only
reduces the chance that information will be misfiled but also speeds up the storing and retrieval of
information. The resulting time savings increase office efficiency and productivity.

-Environment-

A well-organized information storage and retrieval system improves the working environment. It helps
an office maintain a healthy working environment and avoid stressful situations.
Electronic vs. Manual Systems-

Although a very small business may choose to institute a manual system, the importance of electronic
information storage and retrieval systems lies in the fact that they reduce storage space
requirements and decrease equipment and labor costs. In contrast, a manual system requires budgetary
allotments for storage space, filing equipment and the administrative expense of maintaining an
organized filing system. Additionally, with an electronic system it can be significantly easier to
provide and monitor internal controls designed to deter fraud, waste and abuse, and to ensure that
the business is complying with information privacy requirements.
Information Storage -
Organizations process data to derive the information required for their day-to-day operations. Storage is
a repository that enables users to persistently store and retrieve this digital data.

Data - A collection of raw facts from which conclusions might be drawn. Handwritten letters, a
printed book, a family photograph, printed and duly signed copies of mortgage papers, a bank's
ledgers, and an airline ticket are all examples that contain data.
Earlier, the methods adopted for data creation and sharing were limited to a few forms, such as paper
and film, but now the same data can be converted into more convenient forms by using a computer.
Factors that have contributed to the growth of digital data:
1. Increase in data-processing capabilities
2. Lower cost of digital storage
3. Affordable and faster communication technology
4. Proliferation of applications and smart devices
The importance and value of data vary with time. Most data is significant for a short time after it
is created and becomes less valuable as it ages. Recent data is used more often, so it is stored on
faster, more expensive storage. As it ages, it may be moved to slower, less expensive but still
reliable storage.
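
The idea of moving data to cheaper storage as it ages can be sketched as a simple rule. This is only an illustration: the 30-day threshold, the tier names and the `choose_tier` function are assumptions, not part of any particular storage product.

```python
from datetime import date


def choose_tier(last_used, today, hot_days=30):
    """Pick a storage tier from the age of the data.

    The 30-day threshold and the tier names are assumptions for illustration.
    """
    age_days = (today - last_used).days
    return "fast" if age_days <= hot_days else "archive"


today = date(2024, 6, 30)
print(choose_tier(date(2024, 6, 20), today))  # recently used data
print(choose_tier(date(2023, 1, 5), today))   # old data
```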

Types of Data

Structured Data - Organized in rows and columns in a rigidly defined format so that applications
can retrieve and process it efficiently. It is stored using a database management system (DBMS).

Unstructured Data - Elements cannot be stored in rows and columns, which makes the data difficult
for applications to query and retrieve. A vast majority of new data being created today is
unstructured.

NOTE - Data, whether structured or unstructured, does not fulfill any purpose for individuals or
businesses unless it is presented in a meaningful form. Information is the intelligence and knowledge
derived from data.
Storage-
• Data created by individuals or businesses must be stored so that it is easily accessible for further
processing.
• In a computing environment, devices designed for storing data are termed storage devices or
simply storage. Examples:
i. Individuals: Digital camera, Cell phone, DVDs, Hard disks
ii. Businesses: Hard Disks, External Disk Arrays, Tape Library
iii. Centralized: Mainframe Computers
iv. Decentralized: Client-Server Model (Data spread across many servers)
v. Centralized: Storage Networking
Architecture-
Historically, organizations had centralized computers (mainframes) and information storage devices
(tape reels and disk packs) in their data centers. Two storage architectures are distinguished:

1. Server-centric storage architecture - The storage was typically internal to the server and could not be
shared with any other servers.

2. Information-centric architecture - Storage devices are managed centrally and independently of
servers and are shared with multiple servers. The capacity of shared storage can be increased
dynamically by adding more storage devices without impacting information availability. In this
architecture, information management is easier and more cost-effective.
Infrastructure
Organizations maintain data centers to provide centralized data-processing capabilities across
the enterprise. Data centers house and manage large amounts of data.

• Core Elements of a Data Center


i. Application: A computer program that provides the logic for computing operations.
ii. Database management system (DBMS): Provides a structured way to store data in logically
organized tables that are interrelated.
iii. Host or compute: A computing platform (hardware, firmware, and software) that runs
applications and databases.
iv. Network: A data path that facilitates communication among various networked devices.
v. Storage: A device that stores data persistently for subsequent use.
Managing a Data Center

Managing a data center involves many tasks. The key management activities include the following:

● Monitoring: It is a continuous process of gathering information on various elements and services
running in a data center. The aspects of a data center that are monitored include security,
performance, availability, and capacity.
● Reporting: It is done periodically on resource performance, capacity, and utilization. Reporting
tasks help to establish business justifications and chargeback of costs associated with data center
operations.
● Provisioning: It is a process of providing the hardware, software, and other resources required to
run a data center. Provisioning activities primarily include resources management to meet capacity,
availability, performance, and security requirements.

Virtualization and cloud computing have dramatically changed the way data center infrastructure
resources are provisioned and managed. Continuous cost pressure on IT and on-demand data processing
requirements have resulted in the adoption of cloud computing.
Information Retrieval

An information retrieval system is developed to help users discover relevant information in a
storehouse containing a collection of documents. Information retrieval is the activity of
obtaining information resources relevant to an information need from a collection of information
resources.
An information retrieval process begins when a user enters a query into the system. Queries are formal
statements of information needs. User queries are matched against the database information. Most IR
systems compute a numeric score on how well each object in the database matches the query, and rank
the objects according to this value.
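
The score-and-rank step described above can be sketched as follows. This is a deliberately simple illustration: the score is just the number of distinct query terms found in a document, which is an assumption made here, whereas real IR systems use more elaborate weighting schemes. The function names and sample documents are hypothetical.

```python
def score(query, document):
    """Count how many distinct query terms appear in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms)


def rank(query, documents):
    """Return (score, document) pairs with a non-zero score, best match first."""
    scored = [(score(query, d), d) for d in documents]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)


documents = [
    "information retrieval systems",
    "storage devices and data centers",
    "retrieval of stored information from a database",
]
for points, doc in rank("information retrieval database", documents):
    print(points, doc)
```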

Major Components of IR
• Information retrieval can be divided into several major constituents, which include:

i. Database
ii. Search mechanism
iii. Language
iv. Interface


i. Database
A database is a system built around a particular way of handling data; its objective is to record
and maintain information. The idea of information retrieval assumes that there exist several
documents or records comprising data that have been arranged in a suitable order for easy retrieval.
The storehouse contains a great deal of bibliographic information, which is quite different from
other kinds of information or data.

For example, if we maintain a database of information about an institution, all we have are the
different types of records and related facts, such as the names of students, faculty and staff,
their positions, qualifications and so on.

ii. Search Mechanism


Information that is organized systematically can be searched and retrieved when a corresponding
search mechanism is provided.
Search procedures can be categorized as basic or advanced. The capacity of the search mechanism
determines what retrieval techniques are available to users and how information stored in
databases can be retrieved.
iii. Language
Information relies on language when it is processed, transferred or communicated. The language
used can be either natural language or a controlled vocabulary.

iv. Interface
The interface is what users usually consider when judging whether or not an information retrieval
system is user friendly.

• The quality of the interface is judged by its interaction mode

• It determines the ultimate success of a system for information retrieval


Basic Retrieval Techniques

Boolean Searching - Logical operations are also known as Boolean logic.

When Boolean logic is applied to information retrieval, three operators, called Boolean operators,
are used:

The AND operator narrows down a search.
The OR operator broadens a search.
The NOT operator excludes unwanted data.
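
The three Boolean operators map naturally onto set operations. The sketch below assumes a tiny hypothetical collection and a simple word-to-documents index (`build_index` is an illustrative helper, not a library function):

```python
def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


# Hypothetical three-document collection.
docs = {
    1: "digital storage devices",
    2: "information retrieval systems",
    3: "information storage and retrieval",
}
index = build_index(docs)

both = index["information"] & index["storage"]     # AND narrows the result
either = index["information"] | index["storage"]   # OR broadens the result
without = index["information"] - index["storage"]  # NOT excludes documents

print(both, either, without)
```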

Case Sensitivity Searching - Text sometimes exhibits case sensitivity; that is, words can differ in
meaning depending on the use of uppercase and lowercase letters. Words with capital letters do not
always have the same meaning when written with lowercase letters.

For example, "Bill" is the familiar first name of former U.S. president William Clinton, who could
sign a "bill" (a proposed law).
The opposite of "case-sensitive" is "case-insensitive".
For example, Google searches are generally case-insensitive, whereas passwords (for example, a Gmail
password) are case-sensitive.
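
The difference between the two modes can be sketched with a small whole-word matcher. The `contains` function and its `case_sensitive` flag are hypothetical names used only for this illustration.

```python
def contains(text, term, case_sensitive=False):
    """Report whether `term` occurs in `text` as a whole word."""
    words = text.split()
    if not case_sensitive:
        # Fold both sides to a common case so "Bill" and "bill" compare equal.
        words = [w.casefold() for w in words]
        term = term.casefold()
    return term in words


text = "the bill was passed"
print(contains(text, "Bill", case_sensitive=True))   # exact case required
print(contains(text, "Bill", case_sensitive=False))  # case ignored
```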
Truncation - Truncation allows a search to be conducted for all the different forms of a word that
share a common root.
• Symbols such as the question mark (?), asterisk (*) and pound sign (#) are used for truncation.
• A number of different options are available, such as left truncation, right truncation and middle
truncation.
• Left truncation retrieves all the words that end with the same characters; for example, *hyl
will retrieve words such as "methyl" and "ethyl".
• Right truncation retrieves all the words that begin with the same characters; for example, the
query Network* retrieves documents on networks and networking.
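
Left and right truncation can be sketched with Python's standard `fnmatch` module, where `*` stands for any run of characters. The word list and the `truncation_search` helper are hypothetical; note that `fnmatch` also treats `?` and `[...]` as wildcards, which may differ from the symbols a given retrieval system uses.

```python
from fnmatch import fnmatchcase


def truncation_search(pattern, vocabulary):
    """Return the words that match a pattern where `*` stands for any characters."""
    pattern = pattern.lower()
    return [w for w in vocabulary if fnmatchcase(w.lower(), pattern)]


words = ["methyl", "ethyl", "network", "networks", "networking", "ethanol"]
print(truncation_search("*hyl", words))      # left truncation
print(truncation_search("Network*", words))  # right truncation
```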

Proximity Searching - A proximity search allows you to specify how close two (or more) words must
be to each other in order to register a match.
There are three types of proximity searches:
• Word proximity
• Sentence proximity
• Paragraph proximity
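
Word proximity, the first of these types, can be sketched as a check on word positions. The `within_words` function is a hypothetical helper; it splits on whitespace only and ignores punctuation, which a real system would handle.

```python
def within_words(text, first, second, max_distance):
    """Return True if `first` and `second` occur within `max_distance` words of each other."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


text = "information storage and retrieval systems"
print(within_words(text, "information", "retrieval", 3))
print(within_words(text, "information", "retrieval", 2))
```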

Range Searching - It is most useful with numerical information. The following options are
usually available for range searching:
• greater than (>)
• less than (<)
• equal to (=)
• not equal to (/= or <>)
• greater than or equal to (>=)
• less than or equal to (<=)

Example of Range Searching


To search for documents or items that contain numbers within a range, type your search term and the
range of numbers separated by two periods (".."). For example, to search for pencils that cost
between $1.50 and $2.50, type the following: pencils $1.50..$2.50
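
The pencil example corresponds to a filter with both a lower and an upper bound. A minimal sketch, assuming a hypothetical list of (name, price) pairs and treating both bounds as inclusive:

```python
def in_price_range(items, low, high):
    """Return names of items priced between `low` and `high`, inclusive."""
    return [name for name, price in items if low <= price <= high]


# Hypothetical catalogue entries: (name, price in dollars).
pencils = [
    ("HB pencil", 1.25),
    ("2B pencil", 1.50),
    ("Mechanical pencil", 2.40),
    ("Art pencil", 3.10),
]
print(in_price_range(pencils, 1.50, 2.50))
```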
Advanced Retrieval Techniques

Fuzzy Searching - It is designed to find terms that were spelled incorrectly at data entry or in
the query. For example, the term computer could be misspelled as compter, compiter, or comuter.
Text produced by optical character recognition (OCR) or by text compression can also contain
errors. Fuzzy searching is designed to detect and correct spelling errors, including those that
result from OCR and text compression.
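
One common way to implement fuzzy matching is edit distance (Levenshtein distance), the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. The source does not say which method a given system uses, so this is only a sketch under that assumption; `fuzzy_lookup` and the vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings `a` and `b`."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from `a`
                current[j - 1] + 1,      # insert a character into `a`
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(term, vocabulary, max_distance=1):
    """Return vocabulary words within `max_distance` edits of `term`."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_distance]


vocabulary = ["computer", "commuter", "compiler"]
for misspelling in ["compter", "compiter", "comuter"]:
    print(misspelling, fuzzy_lookup(misspelling, vocabulary))
```

Note that "comuter" is one edit away from both "computer" and "commuter", so a fuzzy search may return several candidates and leave the choice to ranking or to the user.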

Query Expansion - Query expansion is a retrieval technique that allows the end user to improve
retrieval performance by revising search queries based on results already retrieved.
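
A minimal sketch of query expansion, assuming the simplest possible strategy: add to the query the most frequent new term found in the documents already retrieved. The `expand_query` function and the sample results are hypothetical, and no stop-word filtering is done here, which a practical system would need.

```python
from collections import Counter


def expand_query(query, retrieved_docs, extra_terms=1):
    """Append the most frequent new terms found in already-retrieved documents."""
    query_terms = query.lower().split()
    counts = Counter(
        word
        for doc in retrieved_docs
        for word in doc.lower().split()
        if word not in query_terms
    )
    new_terms = [word for word, _ in counts.most_common(extra_terms)]
    return " ".join(query_terms + new_terms)


retrieved = ["car engine repair", "car engine parts", "used car sales"]
print(expand_query("car", retrieved))
```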
Information Retrieval Systems

1. Online Systems - Online information retrieval systems allow the user to search databases located
remotely with the help of the computer and telecommunication technology.

• Basic searching techniques

• Advanced retrieval techniques

• Examples: Library of Congress, University of Punjab Library

2. CD-ROM Systems - CD-ROM systems are usually searched locally, so they work even if the systems
are not networked. Basic retrieval techniques are supported in CD-ROM systems, while advanced search
facilities are available only to a limited extent. The data stored on a compact disc (CD) can be
read by any computer operating system and any CD-ROM drive. Example: LISA

3. OPAC - Online public access catalogs (OPACs) are traditional catalogs implemented in a different
medium. The main features of OPACs are:
First, OPACs contain bibliographic information about library resources.
Second, OPACs can be considered an extension of MARC records.
Third, OPACs support at least field searching, keyword searching and Boolean searching.

4. Web Information Retrieval Systems - These deal with text as well as multimedia information
resources that are linked with other documents, and there is no specific target user community.
The web is a platform where anyone, from anywhere, can publish virtually any information in any
language and in any format. Examples: Google, AltaVista
Thank You
