
IST 511 Information Management: Information and

Introduction to IST 511
Dr. C. Lee Giles
David Reese Professor, College of Information Sciences
and Technology
Professor of Computer Science and Engineering
Professor of Supply Chain and Information Systems
The Pennsylvania State University, University Park, PA,
What is IST 511?
• Introduction to algorithmic/computational parts of IST
– There will be some maths

• Guide to research
– In information and related sciences
– In IST
– Illustrate the intellectual diversity of IST

• Methodology
– Read, view, discuss and write about ideas and papers in the field
• When possible, use examples of IST 511 research from IST grad students
– Write a research proposal paper and give a professional presentation
• Focus on methodologies discussed here
IST 511
• Nearly all course material is at:

If you lose this address, put IST511 into Google or Bing

Read this page and links very carefully at least once a week

• Angel is used so far only for student submissions.

• Important notices will be sent by email with the subject:

• What is information
– Things - artifacts
– Use
• Personal, social, etc.
– Foundations and representation
– Information vs knowledge
• Information science vs informatics vs
information theory
Topics considered and used in IST (will consider some,
not all)
• Complexity
• Representation
• AI
• Machine learning
• Information retrieval and search
• Text
• Encryption
• Social networks
• Probabilistic reasoning
• Digital libraries
• Others?
Theories in Information Sciences
• Enumerate some of these theories in this course.
• Issues:
– Unified theory?
– Domain of applicability
– Conflicts
• Theories here are mostly algorithmic
– Automated vs manual
– Scalable features
• Google vs iPhone
• Quality of theories
– Occam’s razor
– Subsumption of other theories
Past & Recent Headlines

• A Minnesota hacker was sentenced to 18 years in prison on Tuesday
for using his neighbors’ wireless network without permission and then
framing them for child pornography distribution and email threats
against Vice President Joe Biden and other officials.
• “Latest Genealogy Tools Create a Need to Know”
• “Bots Hammer Estonia In Cyber Vendetta”
• “UPS slashed the time it takes to determine the least-expensive route
from months and wants to make that information available in real time”
• “Sophisticated internet users continue to fall for spam”
• “Google makes us stupid”
• “Google makes us smarter”
• “IT doesn’t matter”
• “Microsoft and Yahoo unite against Google Book Search”
What is Information?
• There are several ways to define it
– Subjective: People develop models of their
environment. Information created by people
makes those models more accurate.
– Thing/artifact: Information is what’s
captured in a book, web page, or other artifact
• More information is digital
Information - wikipedia
• Information as a concept has a diversity of meanings, from everyday
usage to technical settings. Generally speaking, the concept of
information is closely related to notions of constraint, communication,
control, data, form, instruction, knowledge, meaning, mental stimulus,
pattern, perception, and representation.

• Many people speak about the Information Age as the advent of the
Knowledge Age or knowledge society, the information society, the
Information revolution, and information technologies, and even
though informatics, information science and computer science are
often in the spotlight, the word "information" is often used without
careful consideration of the various meanings it has acquired.
How much information is there in the world?
Informetrics - the measurement of information
• Stored
– What can we store
– What do we intend to store.
– What is stored.
• How do we use it
– Decision making
Information Age
• We have entered the information age
– What is the information age?

• When do we leave it and where do we go next?
– David Weinberger’s Too Big to Know
– What information was
Digitization of Everything: the Zettabytes are coming

• Soon most everything will be recorded and indexed
• Much will remain local
• Most bytes will never be seen by humans.
• Search, data summarization, trend detection, information and
knowledge extraction and discovery are key
• So will be infrastructure to manage this.
Digital Information
Created, Captured, Replicated Worldwide


[Figure: chart of digital information created, captured, and replicated worldwide per year, 2006-2011; vertical axis ticks at 200, 400, 800, and 1,200. Annotation: "Growth in 5 Years!"]
Sources of growth labeled on the chart: RFID, digital TV, MP3 players, digital cameras, camera phones, VoIP, medical imaging, laptops, data center applications, games, satellite images, GPS, ATMs, scanners, sensors, digital radio, DLP theaters, telematics, peer-to-peer, email, instant messaging, videoconferencing, CAD/CAM, toys, industrial machines, security systems, appliances.

Source: IDC, 2008

Scale of things to come
• Information growth:
– In 2002, recorded media and electronic information flows generated
about 22 exabytes (EB; 1 EB = 10^18 bytes) of information
– In 2006, the amount of digital information created, captured, and
replicated was 161 EB
– In 2010, the amount of information added annually to the digital
universe was about 988 EB (almost 1 ZB)

• How much of this is information, data or knowledge?
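The 2006 and 2010 figures above imply a compound annual growth rate, which can be checked with a few lines of Python. This is only a sketch: it assumes the two figures measure the same quantity, and the function name `annual_growth_rate` is made up for this illustration. The 2002 figure is left out because it appears to come from a different study with a different definition.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate implied by two data points."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures from the slide, in exabytes (EB).
eb_2006 = 161.0
eb_2010 = 988.0

rate = annual_growth_rate(eb_2006, eb_2010, years=4)
print(f"Implied growth 2006-2010: {rate:.1%} per year")

# The same 2010 quantity expressed in zettabytes (1 ZB = 1000 EB).
print(f"2010 total: {eb_2010 / 1000:.3f} ZB")
```

Under these assumptions the implied rate is a little under 60% per year, in line with the 50%/y to 60%/y growth rates quoted elsewhere in these slides.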

Digital Universe Environmental Footprint
• In our physical universe, 98.5% of the
known mass is invisible, composed of
interstellar dust or what scientists call “dark
matter.” In the digital universe, we have
our own form of dark matter — the tiny
signals from sensors and RFID tags and the
voice packets that make up less than 6% of
the digital universe by gigabyte, but
account for more than 99% of the “units,”
information “containers,” or “files” in it.
• Tenfold growth of the digital universe in
five years will have a measurable impact
on the environment, in terms of both power
consumed and electronic waste.
How much information is there?
• Soon most everything will be recorded and indexed
• Most bytes will never be seen by humans.
• Data summarization, trend detection, anomaly detection
are key technologies
See Mike Lesk:
How much information is there: .
See Lyman & Varian:
How much information
[Figure: scale of byte prefixes with example items placed along it. Prefix labels surviving in the extraction: Zetta, Exa, Peta, Giga. Items: everything recorded, all books (multimedia), all books (words), a movie, a photo, a book.]
Small prefixes (negative powers of ten): 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
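As a quick reference for the scale on this slide, the sketch below maps the standard decimal (SI) prefixes to powers of ten and renders a byte count with the largest fitting prefix. The helper name `describe_bytes` is hypothetical, and decimal rather than binary units are an assumption.

```python
# Decimal (SI) prefixes and their powers of ten, largest first.
PREFIXES = [
    ("yotta", 24), ("zetta", 21), ("exa", 18), ("peta", 15),
    ("tera", 12), ("giga", 9), ("mega", 6), ("kilo", 3),
]

def describe_bytes(n_bytes):
    """Express a byte count using the largest prefix that keeps the value >= 1."""
    for name, power in PREFIXES:
        if n_bytes >= 10 ** power:
            return f"{n_bytes / 10 ** power:g} {name}bytes"
    return f"{n_bytes:g} bytes"

print(describe_bytes(5 * 10**18))    # 5 exabytes
print(describe_bytes(136 * 10**12))  # 136 terabytes
```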
Information Facts
Print, film, magnetic, and optical storage media produced about 5 exabytes of new
information in 2002. Ninety-two percent of the new information was stored on
magnetic media, mostly in hard disks.

• How big is five exabytes? If digitized with full formatting, the seventeen million
books in the Library of Congress contain about 136 terabytes of information; five
exabytes of information is equivalent in size to the information contained in
37,000 new libraries the size of the Library of Congress book collections.
• Hard disks store most new information. Ninety-two percent of new information is
stored on magnetic media, primarily hard disks. Film represents 7% of the total,
paper 0.01%, and optical media 0.002%.
• The United States produces about 40% of the world's new stored information,
including 33% of the world's new printed information, 30% of the world's new
film titles, 40% of the world's information stored on optical media, and about
50% of the information stored on magnetic media.
• How much new information per person? According to the Population Reference
Bureau, the world population is 6.3 billion, thus almost 800 MB of recorded
information is produced per person each year. It would take about 30 feet of
books to store the equivalent of 800 MB of information on paper.
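The two headline comparisons on this slide are simple divisions and can be reproduced directly. The sketch below assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the variable names are illustrative.

```python
EXABYTE = 10**18
TERABYTE = 10**12
MEGABYTE = 10**6

new_info_2002 = 5 * EXABYTE              # new stored information in 2002
library_of_congress = 136 * TERABYTE     # digitized book collection
world_population = 6.3e9                 # Population Reference Bureau figure

libraries = new_info_2002 / library_of_congress
per_person_mb = new_info_2002 / world_population / MEGABYTE

print(f"Library of Congress equivalents: {libraries:,.0f}")
print(f"New information per person: {per_person_mb:.0f} MB/year")
```

Both results round to the slide's figures: about 37,000 libraries and almost 800 MB per person per year.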
Information Census
Varian & Lyman

• ~10 Exabytes
• ~90% digital
• > 55% personal
• Print: .003% of bytes
  5TB/y, but text has lowest entropy
• Email is 4PB/y and is 20% text
  (10 Bmpd) (estimate by Gray)
• WWW is ~50TB
  deep web ~50 PB
• Growth: 50%/y

Media      TB/y        Growth Rate, %
optical    50          70
paper      100         2
film       100,000     4
magnetic   1,000,000   55
total      1,100,150   50
First Disk 1956

• 4 MB

• 50x24” disks

• 1200 rpm

• 100 ms access

• 35k$/y rent
• Included computer &
accounting software
(tubes not transistors)
1.6 meters
10 years later

30 MB
Now - Terabytes on your desk

Terabyte external drive for $200 - 20 cents a gigabyte.

In 5 years, 1 cent/gigabyte, $10 for a terabyte?
Now - Terabytes on your desk

Terabyte external drive for $200 - 6 cents a gigabyte.

In 5 years, 1 cent/gigabyte, $10 for a terabyte?
Moore's Law
• Defined by Dr. Gordon Moore during the 1960s
• Predicts an exponential increase in component
density over time, with a doubling time of 18 months
• Applicable to microprocessors, DRAMs,
DSPs and other microelectronics.
• Monotonic increase in density observed since
the 1960s.
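A doubling time and an annual growth rate are two ways of stating the same exponential. The sketch below converts between them. The function names are made up for this illustration, and the 18-month doubling time is the commonly quoted form of Moore's law.

```python
import math

def annual_rate_from_doubling(doubling_months):
    """Annual growth rate implied by a given doubling time in months."""
    return 2.0 ** (12.0 / doubling_months) - 1.0

def doubling_months_from_rate(annual_rate):
    """Doubling time in months implied by an annual growth rate."""
    return 12.0 * math.log(2.0) / math.log(1.0 + annual_rate)

print(f"18-month doubling -> {annual_rate_from_doubling(18):.1%} per year")
print(f"112.3%/year -> doubling every {doubling_months_from_rate(1.123):.1f} months")
```

An 18-month doubling works out to about 58.7% per year, which is the Moore's law rate quoted on the disk-trend slide. The 112.3% per year quoted there for disk TB growth corresponds to doubling in a little under a year.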
Moore’s Law - Density
Disk TB Shipped per Year
1998 Disk Trend (Jim Porter)

Storage capacity beating Moore’s law

• Improvements:
  Capacity 60%/y
  Bandwidth 40%/y
  Access time 16%/y
• 1000 $/TB today
• 100 $/TB in 2007

[Figure: log-scale plot of disk TB shipped per year (1E+3 to 1E+6), 1988-2000, with a Moore's Law reference line at 58.7%/y.]

Moore's law    58.70% /year
TB growth      112.30% /year since 1993
Price decline  50.70% /year since 1993
Most (80%) data is personal (not enterprise)
This will likely remain true.
Digital Immortality Bell, Gray, CACM, ‘01
Requirements for storing various media for a single
person’s lifetime at modest fidelity
What is Digital Immortality?
• Preservation and interaction of digitized
experiences for individuals and/or groups
– Preservation and access
– Active interaction with archives through
queries and/or an avatar (agents)
– Avatar interactions for group experiences
• Issues:
– Archiving
– Indexing
– Veracity
– Access
New Information Flows
• Telephone increase is significant
All the world’s libraries on
your iPod! iPhone
NY Times Magazine
And you thought finding that
song was hard.

• Storage is practically free
• Much is mobile
• Access is crucial
• Moore’s law keeps on trucking
Why Put Everything in Cyberspace?

• Low rent: min $/byte
• Shrinks time: now or later
• Shrinks space: here or there
• Automate processing

[Diagram labels surviving in the extraction: Point-to-Point OR (alternative not recovered); Immediate OR Time Delayed; Process; Analyze]
As We May Think, Vannevar Bush, 1945

“A memex is a device in which an individual
stores all his books, records, and
communications, and which is mechanized so
that it may be consulted with exceeding speed
and flexibility”
“yet if the user inserted 5000 pages of material a
day it would take him hundreds of years to fill
the repository, so that he can be profligate and
enter material freely”
Trying to fill a terabyte in a year

Item                          Items/TB   Items/day
300 KB JPEG                   3M         9,800
1 MB Doc                      1M         2,900
1 hour 256 kb/s MP3 audio     9K         26
1 hour 1.5 Mb/s MPEG video    290        0.8
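The table's entries come from dividing a terabyte by the item size and then by the days in a year. The sketch below reproduces the first three rows, assuming binary units (1 TB = 2^40 bytes, 1 KB = 2^10 bytes), which is the assumption under which the results round to the table's values. The video row is left out because it depends on encoding assumptions the slide does not state. The name `fill_rates` is hypothetical.

```python
TERABYTE = 2**40      # binary terabyte (assumption; matches the table's rounding)
DAYS_PER_YEAR = 365

# Item sizes in bytes. The audio size is bitrate (bits/s) * seconds / 8.
items = {
    "300 KB JPEG": 300 * 2**10,
    "1 MB document": 2**20,
    "1 hour of 256 kb/s MP3 audio": 256_000 * 3600 // 8,
}

def fill_rates(item_bytes):
    """Return (items per terabyte, items per day needed to fill a terabyte in a year)."""
    per_tb = TERABYTE / item_bytes
    return per_tb, per_tb / DAYS_PER_YEAR

for name, size in items.items():
    per_tb, per_day = fill_rates(size)
    print(f"{name:30s} {per_tb:12,.0f} per TB {per_day:10,.1f} per day")
```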
Progress of Science Paradigms
• Thousand years ago:
science was empirical
describing natural phenomena
• Last few hundred years:
theoretical branch
using models, generalizations
$$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$$
• Last few decades:
a computational branch
simulating complex phenomena

• Today:
data and information exploration (eScience)
unify theory, experiment, and simulation - information driven
– Data captured by sensors, instruments
or generated by simulator
– Processed by software
– Information/Knowledge stored in computer
– Scientist analyzes database / files
using data management and statistics
– Network Science
– Cyberinfrastructure
Information Systems
• An Information System is the system of persons, data records and
activities that process the data and information in a given
organization, including manual processes or automated processes.
– The term is often used erroneously as a synonym for computer-
based information systems, which are only the information
technologies component of an Information System.
– Computer-based information systems are the field of study for
Information technologies (IT); however, they should hardly be
treated apart from the bigger Information System in which they
are always involved.
• The actual system such as a search engine, etc.
The Information Funnel
Information is nearly always developed to facilitate human needs!

• Complexity of the World



Representation as Information:
What Makes a Good Representation?

• A straight line can be a good representation for describing some data.
• For other data, a curved (quadratic) line is a better representation.
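One way to make "good representation" concrete is to fit both a line and a quadratic to the same data and compare the leftover error. The sketch below is a minimal pure-Python least-squares fit. The data set is hypothetical (generated from y = x^2 + 1), and the helper names `solve`, `polyfit`, and `squared_error` are introduced only for this illustration.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - tail) / aug[r][r]
    return coeffs

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns [c0, c1, ...]."""
    size = degree + 1
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(matrix, rhs)

def squared_error(xs, ys, coeffs):
    """Sum of squared residuals of the fitted polynomial."""
    return sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

# Hypothetical data that follows a curve: y = x^2 + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x * x + 1.0 for x in xs]

line = polyfit(xs, ys, degree=1)
curve = polyfit(xs, ys, degree=2)
print("line error: ", squared_error(xs, ys, line))
print("curve error:", squared_error(xs, ys, curve))
```

For this data the line leaves a substantial error while the quadratic fits essentially exactly. With noisy real data the comparison is less clear-cut, and a more flexible curve is not automatically the better representation.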
Types of Representations
• Categories
• Equations
• Language
• Logic statements
• Images
• Mental models
Models (information) of Processes


Modeled by
sine wave
Information Processing
• There are many ways to apply the information stored in
• Retrieval
– Finding useful information
• Recognition
– Identifying an instance
• Inference
– Extend stored information to a new situation
• One of the hardest problems for
information processing is determining the
context in which the information is
• This may lead to incorrect inferences.
• Some say information is data in context.
People and Information
• People process information based on their
experience and context.
• Human information processing is affected
by emotions and needs.
• Your data may be my information
What is an information system?
• Processes information
• Requires knowledge of what information is
• How much information is available
– Static vs dynamic
– Explicit vs implicit
• How it is used and structured
– information management
• How it’s managed
• Incorporated into personal or social use.
Information Characteristics

• Structural / Ontological / context

– State based
• Representations / rules
• Functional / active
• Language / communication
• Personal
• Social
What is knowledge?
• Data - Facts, observations, or perceptions.
• Information - Subset of data, only including those data
that possess context, relevance, and purpose.
• Knowledge - A more simplistic view considers
knowledge as being at the highest level in a hierarchy with
data (at the lowest level) and information (at the middle level).

• Data refers to bare facts void of context.
– A telephone number.
• Information is data in context.
– A phone book.
• Knowledge is information that facilitates action.
– Recognizing that a phone number belongs to a good client,
who needs to be called once per week to get his orders.
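The telephone example can be sketched as three small pieces of Python: a bare value, the same value placed in a lookup structure, and a rule that turns the lookup into an action. Everything here (the number, the client name, the `next_action` function) is hypothetical and only illustrates the distinction.

```python
# Data: a bare fact with no context.
phone_number = "555-0100"

# Information: the same data placed in context (a phone book entry).
phone_book = {"555-0100": {"name": "A. Client", "status": "good client"}}

# Knowledge: information that facilitates action.
def next_action(number, book):
    """Decide what to do with a number, given what the phone book says about it."""
    entry = book.get(number)
    if entry is None:
        return "no action: number has no known context"
    if entry["status"] == "good client":
        return f"call {entry['name']} once per week to get orders"
    return f"no scheduled call for {entry['name']}"

print(next_action(phone_number, phone_book))
```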
From Facts to Wisdom
(Haeckel & Nolan, 1993)
one example of the hierarchy

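[Figure: hierarchy diagram with Wisdom at the top. Labels on one side: Volume, Completeness, Objectivity. Labels on the other side: Value, Structure, Subjectivity. Caption fragment: "Less is More".]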



What is knowledge?
• Knowledge - A more complex view considers
knowledge as intrinsically different from
information. Instead of considering knowledge as a
richer or more detailed set of facts, we define
knowledge in an area as justified beliefs about
relationships among concepts relevant to that
particular area.
Is Information
• An aspect of intelligence?
– Derivative to its use
• An aspect of life?
• Innate to physical reality?
– Innate code, e.g., DNA
Characteristics of Information
– Invariant
– Dynamic
– Personal
– Situational
– Cultural
– An act versus a fact
– Additive
– Symbolic
– Others?
Information Theory
• Information theory is a discipline in applied
mathematics involving the quantification of data
with the goal of enabling as much data as possible
to be reliably stored on a medium or
communicated over a channel.
• The measure of information, known as
information entropy, is usually expressed by the
average number of bits needed for storage or
communication.
– The less probable the event, the more information its occurrence conveys
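A small sketch of the standard definitions may help here. The information content (surprisal) of an event with probability p is -log2(p) bits, so rarer events carry more bits, and entropy is the average surprisal over all outcomes of a source. The function names are chosen for this illustration.

```python
import math

def surprisal(p):
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

def entropy(probabilities):
    """Shannon entropy in bits: the average surprisal over all outcomes."""
    return sum(p * surprisal(p) for p in probabilities if p > 0)

print(surprisal(0.5))          # 1.0 bit: one outcome of a fair coin flip
print(surprisal(1 / 1024))     # 10.0 bits: a rare event carries more information
print(entropy([0.5, 0.5]))     # 1.0 bit per flip for a fair coin
print(entropy([0.9, 0.1]))     # lower: a biased coin is more predictable
```

Entropy is highest when all outcomes are equally likely and drops toward zero as one outcome becomes nearly certain, which is why predictable sources compress well.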
Claude Shannon
• Claude Shannon is the creator
of “information theory”
• His definition was not a broad
definition of “information”,
nor was it what others meant
by information at that
time, or even now.
• However, the definition can
be quite useful
Models of Information
• Common model: a representation of data
– When possible formalize the information process
– Interoperability
– Standards
• What is formalization?
– Logical or mathematical representation
• Natural language definitions are becoming formal
– Why formal definitions of information?
– Examples?
of Information
• Costs
• Reproducibility
• Scalability
• Automation
• Interpretation
• Others?
Consequences of Information
• Information can lead to
– Decisions
– Actions
– Contemplation
– Laws
– More information
Models of Information Use
• Personal models
– Cognitive
• Social models
– Institutions
– Groups
– Nations
– Commerce
– Etc.
What is Information?
• There is no standard definition
• Context is important; maybe vital
– "Information is produced when data are processed so
that they are placed within some context in order to
convey meaning to a recipient."
• Information causes things to happen
– Permits decisions, actions, predictions, etc.
• An innate aspect of intelligence/universe?
The Philosophy of Information: A Definition

What is the Philosophy of Information?

a new philosophical discipline, concerned with
a) the critical investigation of the conceptual nature and basic
principles of information, including its dynamics (especially
computation and flow), utilisation and sciences; and
b) the elaboration and application of information-theoretic and
computational methodologies to philosophical problems.

L. Floridi
What is the Philosophy of Information? (2002)

Open Problems in the Philosophy of Information © L. Floridi

P.3 The GUTI Challenge
Is a grand unified theory of information possible?
The word “information” has been given different
meanings by various writers in the general field of
information theory. It is likely that at least a number of
these will prove sufficiently useful in certain applications
to deserve further study and permanent recognition. It is
hardly to be expected that a single concept of
information would satisfactorily account for the
numerous possible applications of this general field.
(Shannon 1993, 180)

Reductionism: we can extract what is essential to understanding
the concept of information and its dynamics from the wide
variety of models, theories and explanations proposed.
Non-Reductionism: we are dealing with a network of logically
interdependent but mutually irreducible concepts.

Open Problems in the Philosophy of Information © L. Floridi

What is information science?
Not to be confused with informatics or information theory

• Information science is an interdisciplinary science primarily
concerned with the collection, classification, manipulation, storage,
retrieval and dissemination of information. Practitioners within the
field study the application and usage of knowledge in organizations,
along with the interaction between people, organizations and any
existing information systems, with the aim of creating, replacing or
improving information systems. Information science is often
(mistakenly) considered a branch of computer science. However, it
is actually a broad, interdisciplinary field, incorporating not only
aspects of computer science, but often diverse fields such as
mathematics, business, library science, cognitive science, and the
social sciences.
information science vs informatics
• Informatics is the science of information, the practice of
information processing, and the engineering of
information systems. Informatics studies the structure,
algorithms, behavior, and interactions of natural and
artificial systems that store, process, access and
communicate information.
• It also develops its own conceptual and theoretical
foundations and utilizes foundations developed in other
fields. Since the advent of computers, individuals and
organizations increasingly process information digitally.
• This has led to the study of informatics that has
computational, cognitive and social aspects, including
study of the social impact of information technologies.
• Many subfields: X-informatics
Great Predictions
• "Computers in the future may weigh no more than 1.5 tons.” Popular
Mechanics, forecasting the relentless march of science, 1949
• "I think there is a world market for maybe five computers.” Thomas Watson,
chairman of IBM, 1943
• "Heavier-than-air flying machines are impossible.” Lord Kelvin, president,
Royal Society, 1895.
• "Man will never reach the moon regardless of all future scientific
advances." Dr. Lee De Forest, inventor of the vacuum tube and father of
• "Everything that can be invented has been invented.” Charles H. Duell,
Commissioner, U.S. Office of Patents, 1899.
• “Nobody would ever need more than 640 kilobytes of memory on their
personal computer,” 1981, Bill Gates.
– Other predictions of Bill Gates?
Great Predictions

• Artificial Intelligence:
– speech recognition
– Some reasoning; computer beats man in
– Privacy and security problems
– Computers can be a pain in the butt


• Missed Moore’s law and ubiquity of

Predicting the future
– “The future ain’t what it used to be” Yogi Berra
• Can we really predict the future?
• Who predicted the implications of the web and
search engines?
• Social networking?
• Can we understand power laws and their
– We have no examples of exponential growth in our
evolution except plagues.
• Can we understand the pervasiveness of
Everything Gets Bigger
“Screens” are larger
• Flat screen television
• Wall televisions
“Screens” are everywhere
• Every room of the house
• Waiting rooms
• Stores
• Cars
• Phones
“The return of large data centers”
Everything Gets Smaller
• Phones
• Watches / instruments
• Computers
– embedded
• Glasses
• Projectors
Everything Gets Cheaper
• Worldwide cell phone penetration
– 5 Billion
• Some places 100% penetration
– 1 Billion smart phones
Everything gets smarter
• Mobile phones - the new computer
• The PDA that is really an assistant
• Digital immortality
Discussion Questions
• Is more and more information being digitized?
• Which definition of information do you prefer? Can
information be inaccurate? Can you measure it?
• Information is the message
• How is information accessed?
• Is entertainment information? Are music and games
information resources?
• What is a “fact”? Can it exist without a context? What is
• Can information be both explicit and implicit?
• What does the growth of information mean?
• What about Moore’s law?
Thanks to:
• Jim Gray, Microsoft

• L. Floridi, Hertfordshire

• Robert Allen, Drexel

• Wikipedia
