The Future of Information Filtering


12/18/21, 11:45 AM The Future of Information Filtering


Paul Canavese
Library and Information Studies 296A
Howard Besser

April 29, 1994

This is a high-input society. It seems that not a minute may be wasted in
consuming commodities and communicating with as many people as possible. But
in a Babel of signals, we must listen to a great deal of chatter to hear one
bit of information we really want.

Without significance, variety isn't the spice of life. It can be as dull as
monotony when it has nothing to say... This is because to recognize a thing as
new, one must be able to distinguish it from what is old... we can be
surprised repeatedly only by contrast with that which is familiar... not by
chaos.

- Orrin Klapp, Overload and Boredom: Essays on the Quality of Life in the Information Society

Never before has so much information been available to the general public. And
never before has the flow of newly-created information been greater. There has
also never before been a time when so much information has been available in
an electronic form. We live in an "Information Age," the extent of which is
growing by the minute. "We have for the first time an economy based on a key
resource that is not only renewable, but self-generating," notes John Naisbitt
in Megatrends. "Running out is not a problem, but drowning in it is."

So, as information production continues to increase, the areas of study geared
toward extracting the information we want become increasingly important. As
the stream of data has become wider, more and more people are realizing that
they simply cannot process it all in time. These people need a plan for
processing this information, and many will look to computers to implement
these plans through an information filtering system.

Automated information filtering systems are in their infancy, but current
trends suggest that demand for them will be high in the near future. For now,
experimental work continues, and we can only speculate on how these systems
will ultimately change the way people get their information, and how that
information itself will be affected.

Information Anxiety

In 1989, Richard Saul Wurman wrote a book titled Information Anxiety. He
writes that this ailment "is produced by the ever-widening gap between what we
understand and what we think we should understand. It is the black hole
between data and knowledge, and it happens when information doesn't tell us
what we want or need to know."

The number of conventional information sources is at an all-time high. The
number of mainstream periodicals continues to increase, while specialized
magazines and newsletters also proliferate. The advent of desktop publishing
has made it possible for almost anyone to publish, and the result is a torrent
of new information produced every day.

But even greater is the amount of information becoming available in an
electronic form. This information comes in an even richer variety of formats.
First, there are the electronic equivalents of all the conventional forms:
more and more periodicals and journals are "going online" and making
themselves available in an electronic form. Second are the "raw" sources, such
as newswire feeds, which give us access to information in an even
less-digested form. The third, and most distinctive, kind of information is
the "grass roots" data. This information, such as that in Usenet newsgroups,
can be created by anyone online (and this makes it a very prolific source).
The problem only gets worse if we consider the databases of "reference"
information, which are growing at an even greater rate.

besser.tsoa.nyu.edu/impact/s94/students/paul/paul_final.html 1/4

Information Filtering Today

"Information Filtering" is a field of study concerned with creating a
systematic approach to extracting the information that a particular person
finds important from a larger stream of information. It shares many
similarities with "Information Retrieval," which actively searches out
information from an existing database.

The main goal of Wurman's Information Anxiety is to help readers filter their
conventional information. He prescribes a "Low-Fat Information Diet," which
limits the sources a person looks through. He suggests limiting the input to
one of each kind of information source (daily newspaper, news magazine,
culture magazine...). Unless a person has a personal reader who will perform
filtering for him or her (as some wealthy executives do), this kind of
approach is necessary.

When a significant amount of information is available in electronic form, it
becomes possible to use a computer program to do some of the filtering tasks.
The ultimate computer filterer would read all incoming information and set
aside all articles that a particular human reader would want to read. The
problem comes in both defining that complex interest and determining what
matches that interest.

Unfortunately, no significant automatic information filters are in mainstream
use. However, computer users can use some very simple ("manual") methods of
filtering information. The organization of Usenet newsgroups allows a user to
choose particular discussion groups that focus on topics that interest the
user. News reader programs also allow a user to mark certain subject titles or
article authors so they are "filtered out." The subject title must be an exact
match, however, so this is only useful for filtering out "follow-up" articles
to an original posting that is uninteresting.
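
The exact-match limitation described above can be made concrete with a small
sketch. This is only an illustration, not the behavior of any particular news
reader: the function names, the dictionary layout of an article, and the
assumption that follow-ups repeat the original subject behind a "Re: " prefix
are all hypothetical.

```python
def strip_reply_prefix(subject):
    """Remove leading "Re: " markers so follow-ups share the original subject."""
    while subject.lower().startswith("re: "):
        subject = subject[4:]
    return subject


def apply_kill_list(articles, killed_subjects, killed_authors):
    """Keep only articles whose subject and author are not on the kill list."""
    kept = []
    for article in articles:
        subject = strip_reply_prefix(article["subject"])
        if subject in killed_subjects or article["author"] in killed_authors:
            continue  # exact match: filter this article out
        kept.append(article)
    return kept


articles = [
    {"subject": "Best text editor?", "author": "alice@example.org"},
    {"subject": "Re: Best text editor?", "author": "bob@example.org"},
    {"subject": "Best text editors?", "author": "carol@example.org"},
]
kept = apply_kill_list(articles, {"Best text editor?"}, set())
print([a["author"] for a in kept])  # ['carol@example.org']
```

The third article survives even though it is plainly on the same topic,
because its subject line differs by one letter from the marked title.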

Simple information retrieval systems exist, as well. Databases can be searched
for matching keywords or combinations of keywords. Some systems even allow
slightly more complicated searches ("Search for articles containing the words
'Star Wars' or 'Lucasfilm,' but not the word 'S.D.I.'"). Neither the
information filtering systems nor the information retrieval methods in wide
use today utilize "intelligent" look-up or extraction.

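A keyword search of the kind quoted above might look roughly like the
following sketch. It assumes simple case-insensitive substring matching; the
function name and the sample documents are invented for illustration and do
not describe any specific database system.

```python
def matches_query(text, any_of, none_of):
    """True if the text contains at least one wanted term and no unwanted term."""
    lowered = text.lower()
    has_wanted = any(term.lower() in lowered for term in any_of)
    has_unwanted = any(term.lower() in lowered for term in none_of)
    return has_wanted and not has_unwanted


docs = [
    "Lucasfilm announces a restored print of Star Wars.",
    "Congress debates funding for Star Wars, the S.D.I. missile shield.",
    "A review of this year's science fiction novels.",
]
hits = [d for d in docs if matches_query(d, ["Star Wars", "Lucasfilm"], ["S.D.I."])]
print(hits)  # only the first document
```

The search has no notion of what an article is about: it accepts or rejects
text purely on the presence of the listed strings.
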
Advanced Filtration Concepts

Computer scientists and information technologists have experimented with a
number of different methods to best determine a reader's interest in a given
piece of information. Most work has been done using a newsgroup-type model,
which consists of a stream of separate articles. Experimental filtration
programs would select articles that would interest a particular reader.

The first step in creating a filtration system is determining and representing
a reader's interest. What topics or other elements of an article determine a
reader's interest? One straightforward approach asks the target reader for a
list of keywords that he or she finds interesting. Some systems may also ask
for a "rating" which determines the level of interest associated with each
word. These words are then compiled into a "user profile," which will later be
compared with articles to determine matches.
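
One possible representation of such a user profile is a mapping from keyword
to rating. The sketch below is an assumption made for illustration: the
keywords, the rating scale, and the helper names are hypothetical, and real
experimental systems may store profiles quite differently.

```python
import re

# Hypothetical user profile: keyword -> interest rating supplied by the reader.
profile = {"filtering": 3, "usenet": 2, "retrieval": 1}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def matched_keywords(text, profile):
    """Return the profile keywords found in the text, with their ratings."""
    words = set(tokenize(text))
    return {word: rating for word, rating in profile.items() if word in words}


article = "A survey of filtering tools for Usenet readers"
matches = matched_keywords(article, profile)
print(matches)                # {'filtering': 3, 'usenet': 2}
print(sum(matches.values()))  # 5
```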

Another method would observe the articles that a user decides to read, analyze
their content, and add that information to a cumulative user profile. For more
accurate results, a program could ask a reader to indicate how interesting
each article is after reading it. A system could construct a user profile from
this information by simply adding every word found in interesting articles,
and increasing the ranking of words that appear multiple times.
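
A minimal sketch of this cumulative approach, assuming the reader's feedback
is a simple yes/no flag and that a word's "ranking" is just the number of
times it has been seen in interesting articles. The function names and sample
texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def update_profile(profile, article_text, interesting):
    """Add every word of an interesting article to the cumulative profile."""
    if interesting:
        # Counter.update adds counts, so repeated words gain a higher ranking.
        profile.update(tokenize(article_text))
    return profile


profile = Counter()
update_profile(profile, "Filtering news with filtering agents", interesting=True)
update_profile(profile, "Stock prices fell today", interesting=False)
update_profile(profile, "Agents that learn news preferences", interesting=True)
print(profile["filtering"], profile["news"], profile["stock"])  # 2 2 0
```

Note that words such as "with" and "that" also end up in this profile, since
nothing here distinguishes them from meaningful terms.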

Commonly-used words can create problems when this kind of approach is taken.
We wouldn't want occurrences of conjunctions to determine if an article is
filtered in or out. The simple solution is to construct a list of
commonly-used words, and make sure they are never put into a user profile. A
more advanced method would analyze a word within the context of an article, by
making a ratio out of the number of times the word occurs in the article and
the total words in the article. This is then compared to the average
occurrence of that word in any text. If the occurrence of the word is higher
in our article, the program weights it more significantly.
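
Both ideas can be sketched together. In the code below, the stop-word list,
the background frequencies, and the default frequency for unseen words are
made-up illustrative values, and expressing the comparison as a ratio of
article frequency to background frequency is one assumption about how the
weighting could be done.

```python
import re
from collections import Counter

# Hypothetical list of commonly-used words that never enter a profile.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "of", "to", "in", "is"}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def word_weights(article, background_freq, default_freq=1e-4):
    """Weight each word by its frequency in the article relative to average text.

    A weight above 1.0 means the word is more common here than in general.
    default_freq is an assumed frequency for words missing from the table.
    """
    words = tokenize(article)
    total = len(words)  # total words in the article, stop words included
    counts = Counter(w for w in words if w not in STOP_WORDS)
    weights = {}
    for word, count in counts.items():
        article_freq = count / total
        weights[word] = article_freq / background_freq.get(word, default_freq)
    return weights


# Invented average frequencies, standing in for statistics from a large corpus.
background = {"filter": 0.001, "news": 0.01, "reads": 0.002, "ranks": 0.002}
weights = word_weights(
    "The filter reads the news and the filter ranks the news", background
)
```

Here "filter" and "news" each occur twice, but "filter" receives the larger
weight because it is assumed to be rarer in ordinary text.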

Once the program has created a reader profile, it can use this to screen
articles in or out. The program can, again, look directly at the number of
keyword matches, calculate ranked matchings, and compare the matchings with
average occurrences of the word in a random text. Programs could also try to
recognize synonymous or related words and calculate that into the ranking.
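
The screening step might be sketched as follows. The related-word table, the
profile ratings, the threshold, and the choice to normalize by article length
are all assumptions made for this example rather than features of a specific
system.

```python
import re

# Hypothetical table mapping related words onto profile keywords.
RELATED = {"newsgroup": "usenet", "newsgroups": "usenet", "filter": "filtering"}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def score_article(text, profile):
    """Average profile rating per word, counting related words as matches."""
    words = tokenize(text)
    if not words:
        return 0.0
    total = sum(profile.get(RELATED.get(w, w), 0.0) for w in words)
    return total / len(words)


def screen(articles, profile, threshold):
    """Keep the articles whose score reaches the threshold."""
    return [a for a in articles if score_article(a, profile) >= threshold]


profile = {"usenet": 2.0, "filtering": 3.0}
articles = ["New filter for newsgroups", "Local weather report for Tuesday"]
selected = screen(articles, profile, threshold=0.5)
print(selected)  # ['New filter for newsgroups']
```

Neither profile keyword appears literally in the first article; it is kept
only because "filter" and "newsgroups" are treated as related words.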

Some studies have found that the best results come from performing a number of
different methods and combining the results. Particular articles tend to be
matched well by particular methods, and using one matching model usually
filters out too many close matches. The ultimate goal in the field of
artificial intelligence is to emulate the understanding of ideas with a
computer, and current research is aspiring to results much greater than those
mentioned here.
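
As a rough illustration of combining methods, the sketch below takes a
weighted average of two simple scoring functions. The two methods, the
keyword set, and the equal weights are hypothetical choices; the studies
referred to above may combine results in other ways.

```python
import re

PROFILE = {"filtering", "usenet", "newsgroups"}  # hypothetical keywords


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_hit(text):
    """Method 1: 1.0 if any profile keyword appears, else 0.0."""
    return 1.0 if PROFILE & set(tokenize(text)) else 0.0


def keyword_density(text):
    """Method 2: fraction of the article's words that are profile keywords."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(word in PROFILE for word in words) / len(words)


def combined_score(text, methods, weights):
    """Weighted average of the scores produced by several matching methods."""
    total = sum(weights)
    return sum(w * m(text) for m, w in zip(methods, weights)) / total


score = combined_score(
    "Information filtering systems",
    [keyword_hit, keyword_density],
    [0.5, 0.5],
)
```

An article that one method scores weakly can still be kept if another method
scores it strongly, which is the appeal of combining them.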

Future Impact of Filtering


With information filtration systems in the early stages of development, it is
difficult to get a clear picture of how the information we will get from the
media of the future will differ from the information we get today. There are
certainly a number of issues that should be of strong concern for those
developing these systems, however.

It is very easy to imagine problems that are a direct extension of those
explained by current media critics. Noam Chomsky and others describe a
mainstream media with increasing outlets but decreasing sources of
information. Large media companies are buying up smaller media companies and
continuing to operate them with the same editorial viewpoint. Chomsky points
out that no matter where you go in the mainstream media, there are particular
kinds of news that you will never find. The advances in information technology
could help or hurt this situation, depending on who decides which news sources
will become part of the "information stream." Hopefully, alternative sources
of information will be able to find their way into the electronic forms of
media.

A more realistic problem with alternative voices, however, may be more subtle.
If a complete information filtration system is built to include daily news, it
will have to not only choose stories of interest, but also select the "best"
account of each story. However that is determined (and whether or not it is
determined on a story-by-story basis), the choice would drastically affect a
person's (or a public's) perception of the news.

A more philosophical objection to filtering technology asks if a system of
automated selection of information will stifle new thought. After all, if
readers are sheltered from all topics except those that they know about and
want to hear more about, they may get a skewed perception of the world, or at
least one that is much less rich than they might have gotten if they were
forced to plow through articles themselves.

Some form of information filtering is already a necessity. While current
filtering systems are very simplistic, more complex and workable systems will
be developed soon, and probably enter the mainstream. How much automated forms
of filtering will take over this role will depend on the strengths of
developed systems and how much people will trust computers to tell them what
they want to know.

Bibliography

Belkin, Nicholas J., et al., "Information Filtering and Information Retrieval:
Two Sides of the Same Coin?", Communications of the ACM (Volume 35, No. 12,
December 1992).

Bowen, T.F., et al., "The Datacycle Architecture", Communications of the ACM
(Volume 35, No. 12, December 1992).

Foltz, Peter W., et al., "Personalized Information Delivery: An Analysis of
Information Filtering Methods", Communications of the ACM (Volume 35, No. 12,
December 1992).

Klapp, Orrin, Overload and Boredom: Essays on the Quality of Life in the
Information Society (New York: Greenwood Press, 1986).

Naisbitt, John, Megatrends (New York: Warner Books, 1985).

Sheth, Beerud Dilip, A Learning Approach to Personalized Information Filtering
(Master's Thesis, Department of Electrical Engineering and Computer Science,
University of California at Berkeley, 1994).

Wurman, Richard Saul, Information Anxiety (New York: Doubleday, 1989).

