
INTREX

Report of a Planning Conference on
Information Transfer Experiments

Edited by Carl F. J. Overhage and R. Joyce Harman


The MIT Press

This file has been authorized and provided by the publisher, The MIT Press, as part of its ongoing efforts to make available in digital form older titles that are no longer readily available.

This file is provided through the Internet Archive for library lending, and no further reproduction or distribution is allowed except as permitted by applicable law or in writing by the MIT Press.

The MIT Press One Rogers Street Cambridge, Massachusetts 02142


INTREX
Report
of a Planning Conference on
Information Transfer Experiments

September 3, 1965

Edited by Carl F. J. Overhage and R. Joyce Harman

Sponsored by
The Independence Foundation
of Philadelphia, Pennsylvania

The M.I.T. Press

Massachusetts Institute of Technology


Cambridge, Massachusetts, and London, England
Copyright © 1965
by The Massachusetts Institute of Technology

All Rights Reserved. This book may not be reproduced, in whole or in part, in any form (except by reviewers for the public press), without written permission from the publishers.

0262150042

Library of Congress Catalog Number: 65-28409


Printed in the United States of America
TABLE OF CONTENTS

Participants vii

Summary xv

Chapter I Project Intrex Planning Conference 1

II The University Library and Its Continuing Evolution 7
The Changing Intellectual Environment
The Impact upon the University Library and Its Response
Needs for the Future

III The National Network of Information Centers 15
National Library-Information Systems
Operating Systems
Cooperation and Networks
Networks
A Look at the Future
International Networks

IV The On-Line Intellectual Community 25
Introduction
The Project MAC Experience
The Concept
The Influence of the Concept

V The Information Transfer System at MIT in 1975 43
A Central Computer
The Information Store
Access Techniques
Selective Dissemination
New Publishing Patterns
Integration with Other Information Sources
The Role of the Librarian
The Information Transfer Budget of 1975

VI Project Intrex 53
Library Modernization
Extension of Time-Shared Computing
The Course of Project Intrex
VII The Experimental Program 61
1. The Model System 61
The Augmented Catalog
Text Access
2. Integration with National Resources 91
Objectives
Assumptions
Typical Experimental Use
General Comment
Selection of Initial Information Centers
Other Users
Teletype Links
Observational Objectives
3. Fact Retrieval 96
The Automated Index
The Automated Handbook
The Automated Notebook
Future Research
4. Initial Facilities 102
The Computational Facility
Software
Storage
Transfer of Library into the Store
Transmission, Display and Consoles
Permanent Copy
Summary
5. Related Studies: Extensions and Elaborations 111
Educational Functions of Intrex
Selective Dissemination
Browsing (Accidental Discovery)
Publishing
Selective Retention
6. R&D to Support the Experimental Program 131
Consoles
Interaction Language
Analysis of Content
Analysis of the Needs of Users
Theory of Information Transfer
7. Data Gathering for Evaluation 139
Data on Use
Economic Controls
Data on Learning
An Annotated User's Card
Appendix A Remarks of Vannevar Bush to Project Intrex Planning Conference 144

B An On-Line Information Network - J. C. R. Licklider 147

C Measuring User Needs and Preferences - George A. Miller 156

D Interaction Languages - D. G. Bobrow 159

E Economics, Libraries and Project Intrex - H. H. Fussler 163

F The Role of Graphics in Information Transfer - J. L. Simonds 165

G Data Archives and Libraries - Ithiel de Sola Pool 175

H Guidelines for Intrex Content-Analysis Experiments - Ascher Opler 183

I Proposed Experiments in Browsing - J. C. R. Licklider 187

J The Motivations of Authors - Intellectual Property and the Computer - E. S. Proskauer 199

K Project Intrex and Microphotography - Peter Scott 203

L The Nature of the Experiments to be Carried Out by Project Intrex - J. C. R. Licklider 215

M How Humanists Use a Library - John E. Burchard 219

N On the Prediction of Library Use - Philip M. Morse 225

O A Technique of Measurement That May Be Useful in Project Intrex Experiments - J. C. R. Licklider 235

P Experiments on Indexing - Search and Dissemination - Merrill M. Flood 237

Q Educational Functions of Intrex - Stanley Backer 243

R Theory, Modeling and Simulation - R. C. Raymond 255

S Graphic Communications for the Library - Paul L. Brobst 269

T More on the Expert Filter - Stanley Backer 275
PROJECT INTREX PLANNING CONFERENCE
Carl F. J. Overhage, Chairman
PARTICIPANTS

Stanley Backer
Professor of Mechanical Engineering
Massachusetts Institute of Technology
Cambridge, Massachusetts

Gary L. Benton
Assistant Director, Project Intrex Planning Conference
Massachusetts Institute of Technology
Cambridge, Massachusetts

Daniel G. Bobrow
Head, Artificial Intelligence Group
Bolt Beranek and Newman
Cambridge, Massachusetts

John E. Burchard
Dean-Emeritus, School of Humanities
Massachusetts Institute of Technology
Cambridge, Massachusetts

Harold E. Clark
Chief Physicist and Scientific Director
Xerox Corporation
Rochester, New York

Melvin S. Day
Director, Office of Scientific and Technical Information
National Aeronautics & Space Administration
Washington, D.C.

James O. Dyal
Senior Engineering Specialist
Xerox Corporation
Rochester, New York

Miss Frances Flynn
Project Trident and Operations Research Librarian
Arthur D. Little, Inc.
Cambridge, Massachusetts

Herman H. Fussler
Director, The University Library
Professor, The Graduate Library School
University of Chicago
Chicago, Illinois
Richard L. Garwin
Director of Applied Research
International Business Machines Corporation
Yorktown Heights, New York

Miss R. Joyce Harman
Assistant to the Director
Lincoln Laboratory, MIT
Lexington, Massachusetts

Frank E. Heart
Associate Group Leader
Lincoln Laboratory, MIT
Lexington, Massachusetts

Herman H. Henkle
Executive Director and Librarian
The John Crerar Library
Chicago, Illinois

Mrs. Irma Johnson
Reference Librarian
Massachusetts Institute of Technology
Cambridge, Massachusetts

Myer M. Kessler
Associate Director of Libraries
Massachusetts Institute of Technology
Cambridge, Massachusetts

J.C.R. Licklider
Consultant to the Director of Research
International Business Machines Corporation
Yorktown Heights, New York

William N. Locke
Director of Libraries
Massachusetts Institute of Technology
Cambridge, Massachusetts

Stephen A. McCarthy
Director of Libraries
Cornell University
Ithaca, New York

Max V. Mathews
Director, Behavioral Research Laboratory
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

George A. Miller
Professor of Psychology
Harvard University
Cambridge, Massachusetts

Foster E. Mohrhardt
Director, National Agricultural Library
Washington, D.C.

Philip M. Morse
Professor of Physics
Director, Operations Research Center; Director,
Computation Center
Massachusetts Institute of Technology
Cambridge, Massachusetts

Miss Natalie N. Nicholson
Associate Director of Libraries
Massachusetts Institute of Technology
Cambridge, Massachusetts

Carl F. J. Overhage
Professor of Engineering; Director, Project Intrex
Massachusetts Institute of Technology
Cambridge, Massachusetts

Eric S. Proskauer
Vice-President and Manager
Interscience Publishers, a Division of John Wiley &
Sons, Inc.
New York, New York

Richard C. Raymond
Consultant-Information, Advanced Technology Services
General Electric Company
New York, New York

Arthur L. Samuel
Consultant to the Director of Research
International Business Machines Corporation
Yorktown Heights, New York

Peter R. Scott
Head, Microreproduction Laboratory
Massachusetts Institute of Technology
Cambridge, Massachusetts

John L. Simonds
Head, Information Technology Laboratory
Eastman Kodak Company
Rochester, New York

Charles H. Stevens
Staff Member, Project Intrex
Massachusetts Institute of Technology
Cambridge, Massachusetts

Don R. Swanson
Dean, The Graduate Library School
University of Chicago
Chicago, Illinois

Miss Rebecca L. Taggart
Head, Engineering Libraries
Massachusetts Institute of Technology
Cambridge, Massachusetts

Joseph Weizenbaum
Associate Professor of Electrical Engineering
Massachusetts Institute of Technology
Cambridge, Massachusetts

Gordon Williams
Director, The Center for Research Libraries
Chicago, Illinois

Victor H. Yngve
Professor of Linguistics and Library Science
University of Chicago
Chicago, Illinois

VISITORS

Scott Adams
Deputy Director, National Library of Medicine
Bethesda, Maryland

Burton W. Adkinson
Head, Scientific Information Service
National Science Foundation
Washington, D.C.

Henry Aiken
Professor of Philosophy
Brandeis University
Waltham, Massachusetts

Samuel Alexander
Chief, Data Processing Systems Division
National Bureau of Standards
Washington, D.C.

W.O. Baker
Vice-President, Research
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

Curtis G. Benjamin
Chairman, Management Board
McGraw-Hill Book Company
New York, New York

Joseph L. Boon
Technical Assistant to the General Manager
Apparatus and Optical Division
Eastman Kodak Company
Rochester, New York

Paul Brobst
Member of the Technical Staff
Xerox Corporation
Rochester, New York

W. Stanley Brown
Member of the Technical Staff
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

Douglas W. Bryant
University Librarian
Harvard University
Cambridge, Massachusetts

Vannevar Bush
Honorary Chairman and Life Member of the Corporation
Massachusetts Institute of Technology
Cambridge, Massachusetts

Verner W. Clapp
President
Council on Library Resources, Inc.
Washington, D.C.

Norman Cottrell
Director of Documentation Services
American Society of Metals
Cleveland, Ohio

John W. Emling
Executive Director, Transmission Engineering
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

Robert Fano
Professor of Electrical Engineering
Massachusetts Institute of Technology
Cambridge, Massachusetts

Merrill M. Flood
Professor and Senior Research Mathematician
University of Michigan Medical School
Ann Arbor, Michigan

Steven Furth
Industry Development Manager in Information Retrieval
International Business Machines Corporation
White Plains, New York

Miss Marjorie Griffin
Manager, Advanced Systems Development Division Library
International Business Machines Corporation
Los Gatos, California

Leon D. Harmon
Member of the Technical Staff
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

Karl F. Heumann
Director, Office of Documentation
National Academy of Sciences
Washington, D.C.

Eugene Jackson
Corporate Director of Libraries
International Business Machines Corporation
Armonk, New York

Mark Kac
Professor of Mathematics
The Rockefeller Institute
New York, New York

Robert A. Kennedy
Head, Library Systems Department
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

Gilbert King
Member, Board of Directors; and Research Consultant
Itek Corporation
Lexington, Massachusetts

Russell A. Kirsch
Electronic Scientist
National Bureau of Standards
Washington, D.C.

William T. Knox
Technical Assistant to the Director
Executive Office of the President, Office of Science
and Technology
Washington, D.C.

W. Kenneth Lowry
Manager, Technical Information Libraries
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

Lee E. McMahon
Member of the Technical Staff
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

Marvin L. Minsky
Professor of Electrical Engineering
Massachusetts Institute of Technology
Cambridge, Massachusetts

Richard Oldham
Assistant Chief Engineer
Station WGBH
Boston, Massachusetts

Ascher Opler
Vice-President
Computer Usage Company
New York, New York

Harald Ostvold
Director of Libraries
California Institute of Technology
Pasadena, California

Miss Dorothy Parker
Associate Director
The Rockefeller Foundation
New York, New York

John R. Pierce
Executive Director, Research-Communications Principles
and Systems Research Divisions
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

Ithiel de Sola Pool
Professor of Political Science
Massachusetts Institute of Technology
Cambridge, Massachusetts

Nathaniel Rochester
Manager, Boston Programming Center
International Business Machines Corporation
Boston, Massachusetts

Jesse H. Shera
Dean, School of Library Science
Western Reserve University
Cleveland, Ohio

Samuel S. Snyder
Information Systems Specialist
Library of Congress
Washington, D.C.

Fred A. Tate
Associate Director, Chemical Abstracts
Columbus, Ohio

John W. Tukey
Professor of Mathematics
Princeton University
Princeton, New Jersey

Claude Walston
Member of the Technical Staff, Federal Systems Division
International Business Machines Corporation
Bethesda, Maryland

I. A. Warheit
Manager, Projects in Information Retrieval
and Very Large Storage Systems
International Business Machines Corporation
San Jose, California

F. Karl Willenbrock
Associate Dean of Engineering and Applied Physics
Harvard University
Cambridge, Massachusetts

SUMMARY

1. The Project Intrex Objective

The task of the Planning Conference has been to formulate a coordinated program of information transfer experiments to be performed as Project Intrex by the Massachusetts Institute of Technology. The objective of these experiments is to provide a design for evolution of a large university library into a new information transfer system that could become operational in the decade beginning in 1970. In its five-week study, the Conference has concentrated its attention on technical and operational matters; it has not attempted to resolve the social, economic and legal problems implicit in such a transformation.

2. Information Transfer in the University of the Future

Three main streams of progress in the information transfer field were intensively discussed at the Conference:

(a) The modernization of current library procedures through the application of technical advances in data processing, textual storage, and reproduction;

(b) The growth, largely under Federal sponsorship, of a national network of libraries and other information centers;

(c) The extension of the rapidly developing technology of on-line, interactive computer communities into the domains of the library and other information centers.

We believe that the university information transfer system of the next decade will result from a confluence of these three streams. Rapid advances in information transfer by on-line computer systems will greatly extend the scope of information services in the academic community, but only if they are supported by the resources of a modernized university library and by integration with coordinated networks of local and national resources.

The experimental program we recommend combines the exploitation of on-line computer technology with the modernization of some current library procedures, with emphasis on the former.

3. Selectivity in the Intrex Program

While our discussions extended over a very large range of possible experiments, the recommended program is addressed mainly to the broad problem of access - in particular, access to bibliographic material, documents, and data banks. A core program dealing with this information transfer function has been formulated, together with supporting activities and recommended extensions.

4. The "Model Library"

To provide an environment for the performance of the Intrex experiments, we recommend the establishment of a facility called a "model library". Only by coming to grips with the real, everyday problems of setting up and running a pilot system can the project assemble the experience required to evaluate its experiments, just as it is only by serving the real needs of real users in the university community that the experiments can be meaningful.

5. Mechanization of Current Procedures

In its early stages, the model library will display, in readily attackable form, most of the basic problems of university libraries. It will therefore afford an excellent opportunity to combine the procedural background developed in the MIT libraries with an on-line computer system's capabilities for solving such problems as the selection, acquisition and weeding of materials and the control of serials. The model library will also be useful in formulating theory, and in acquiring data for analysis of system performance.

6. The Augmented Catalog Experiment

The principal finding element in an information transfer system involving any form of storage will be a catalog. Augmentation of the catalog in content, depth and connectivity is facilitated by the computer that is used to control the flow of information in the system. We recommend experiments with an augmented catalog established as a data base in digital form in the on-line computer system. Such a catalog would cover books, journal articles, reviews, technical reports, theses, pamphlets, conference proceedings, and other forms of recorded information. The catalog should encompass an interdisciplinary field, and should contain references to enough material to interest a serious worker and to present significant bibliographic problems. Operational experiments will deal with bibliographic search for both specified and unknown documents. The catalog will also provide data for experiments in selection, acquisition, circulation, and other library operations; in selective dissemination; in some limited forms of browsing; and in recording user interaction with the system.

7. The Text Access Experiment

When the retrieval specification has been narrowed to the set of identifiers of the documents the user wants, the problem is to deliver or display those documents to him, or to have them ready when he calls for them. Project Intrex must determine the merits of the various approaches to this problem in terms of effectiveness and cost. The proposed experiments will involve the following technologies.

For storage: print on paper; analog microimages on photographic materials; analog signals on magnetic or possibly thermoplastic materials; digitally encoded characters and graphic elements on photographic or magnetic materials;

For delivery: transportation for some of the foregoing, electrical transmission for others;

For display: direct inspection, xerography, optical projection, oscilloscopic display, and the like.

We recommend implementation by Project Intrex of several of the most promising systems, and operational evaluation through actual use.

8. The Network Integration Experiment

We suggest that Project Intrex explore a range of ideas designed to promote the integration of university libraries into the national (and, ultimately, international) network of information centers. A major experiment is recommended on the interaction of a computer-based university information transfer system with the informational resources of such organizations as the National Library of Medicine and the National Aeronautics and Space Administration. In addition, we recommend that Project Intrex explore, with other research libraries, documentation centers, and information exchanges, the various ways of interchanging bibliographic, indexing and abstracting information, and of overcoming divergences of format and convention that might impede cooperation.

9. The Fact Retrieval Experiment

The existing bibliographic organization is largely document-centered. During the expected life of Project Intrex, continued progress will be made on the rapid processing of data retrieved from very large files; some capability will be developed for the retrieval and assembly of facts; and many advanced systems for the automatic answering of questions will appear. A recommended major Intrex experiment will involve development of a computerized "handbook" and data banks, and of techniques for querying them. These techniques will be compared and evaluated in relation to book-based techniques.

10. Experiments with Other Library Functions

The facilities required for the major Intrex experiments will support others. Insofar as personnel, space and funding permit - and particularly as the topics themselves attract interest and resources - experiments are recommended in such areas as:

(a) Teaching and learning in the on-line network.

(b) Browsing; planned facilities to foster unplanned discovery.

(c) Selective dissemination of information.

(d) Use of the on-line network to expedite the preparing, reviewing, printing, indexing and abstracting of manuscripts.

(e) "Publishing" through the system to the on-line community.

11. Component Technology

Perhaps the most significant factor in the situation that Project Intrex is entering is the availability of a powerful new computer technology. With respect to this technology, we are distressed by the primitive state of two critical items: consoles and interaction languages. We recommend that Project Intrex give attention to these areas.

12. A Unifying Theory

A major intellectual challenge for Project Intrex is the development of a unifying theory that will lead to coherent design and interpretation of experiments in information transfer systems.

CHAPTER I

PROJECT INTREX AND ITS PLANNING CONFERENCE

Among the many difficulties caused by the growing complexity of our civilization, the crisis faced by our great libraries is one of the most distressing, for these libraries have long been regarded as outstanding manifestations of our culture. But they will become increasingly ineffectual agencies for the transfer of information, and they could become lifeless monuments, unless we can find new methods of managing the enormous mass of books, periodicals, reports, and other records produced by our expanding intellectual activities.

The spectacular advances of the last decade in data processing and in document copying have given us good reason to hope that a way can be found out of the library crisis by the imaginative use of new technology.

Faced with this challenge, the Massachusetts Institute of Technology has established a program of information transfer experiments directed toward the functional design of new library services that might become operational at MIT and elsewhere in the decade beginning in 1970. Project Intrex has the twofold objective of finding long-term solutions for the operational problems of large libraries and of developing competence in the emerging field of information transfer engineering in close concert with the MIT libraries.

In the university of the future, as it is visualized at MIT, the library will be the central resource of an information transfer network that will extend throughout the academic community. Students and scholars will use this system not only to locate books and documents in the library, but also to gain access to the university's total information resources, through Touch-Tone telephones, teletypewriter keyboards, television-like displays, and quickly made copies. The users of the network will communicate with each other as well as with the library; data just obtained in the laboratory and comments made by observers will be as easily available as the text of books in the library or documents in the departmental files. The information traffic will be controlled by means of the university's time-shared computer utility in much the same way in which today's verbal communications are handled by the campus telephone exchange. Long-distance service will connect the university's information transfer network with sources and users elsewhere.

Today we do not know how to specify the exact nature and scope of future information transfer services. We believe that their design must be derived from experimentation in a working environment of students, faculty, and research staff. A uniquely favorable situation will exist at MIT for fruitful experimentation in this field. There are library users, in all academic categories, who are accustomed to the experimental approach and who will cooperate in meaningful tests of new services. In Project MAC, MIT is already carrying forward a broad study of machine-aided cognition which will greatly stimulate the rise of new concepts in information transfer.

To formulate the experimental plan for Project Intrex, a Planning Conference was convened from 2 August to 3 September 1965 at the Summer Studies Center of the National Academy of Sciences at Woods Hole, Massachusetts. The Planning Conference was funded by a grant from the Independence Foundation of Philadelphia, Pennsylvania. The aim was to call in a group of experts from outside as well as inside MIT, and to establish a broad consensus on the direction and range of the experimental program. The membership of the Conference was divided among librarians and documentalists, scientists and engineers, with some representation of architecture, linguistics, mathematics, philosophy, psychology and publishing.

The present book is the report of the Project Intrex Planning Conference. It represents the general consensus of a group of some 30 long-term participants, and takes into account many of the views, though certainly not all, of a comparable number of visitors who came for shorter periods.

In view of the diversity of training and experience among the participants, it is hardly surprising that the Conference was characterized by lively debates in the small and large working sessions in which it carried on its work. The mutual exposure of the participants to unaccustomed modes of thought, over a protracted period, in an environment free of urban distractions, may have been, in itself, a significant contribution to progress in the information field.

More important, in terms of the Conference objective, is the success of the group in reaching agreement on two broad questions that had to be resolved before a coherent set of recommendations became possible. The first of these questions relates to two major avenues along which progress in information systems will be made in the next decade: (a) the improvement of conventional library procedures through technological developments; (b) the organization of information transfer communities based on time-shared computer systems. Should Project Intrex concentrate its efforts entirely on the more radical advances possible under (b) and exploit to the fullest the experience available at MIT in establishing an on-line computer community? The Planning Conference decided to counsel against such exclusive concentration on a single approach. A combined program was recommended in which the pursuit of the more advanced concept is to be combined with experimental modifications of conventional library operations. The rationale of this view is set forth in Chapters V and VI.

The other question relates to the shaping of a program from the vast and diffuse array of possible experiments that were reviewed by the Conference. At the outset, no plan was contemplated beyond the selection of those experiments that offered the greatest promise for successful implementation by 1975. As this selection was discussed, a strong view developed toward an early model of an information transfer system as an interim goal, and as a device to assemble a significant set of experiments into a coherent group. This model system, which occupies a central place in the recommendations of Chapter VII, is to be strictly experimental in design and operation. It should be regarded as the matrix within which a selected group of experiments can be meaningfully performed to serve the real needs of real users. If, upon termination of Project Intrex, the users of this experimental system demand that its operation be continued, it might become the first installment of the information transfer system of 1975. Throughout the report, the short designation "model library" has been used for this experimental system.

At many points in its deliberations, the Planning Conference was aware of important economic and legal problems that will be raised by the technological innovations under discussion. It is clear that substantially larger budgets will be required for the information transfer systems of the future than are provided for the libraries at present. It is equally clear that the economic patterns in which authors, publishers and libraries are operating today will be disturbed by the extensive utilization of modern duplicating techniques in the information transfer systems of the future. The reader will find, throughout this report, implicit or explicit recognition of these problems, but he will look in vain for a plan of action, either toward larger budgets or toward revised pricing structures. Nor will he find discussion of the copyright problem other than a reflection of the confident view held by most participants that an acceptable modus vivendi would be found as more experience with the new operations becomes available.* The objectives of the Conference were in the technical realm; its principal aim was to formulate a plan for experiments.

*Readers with specific interest in the copyright implications of new information transfer methods will wish to review the work of the Committee to Investigate Copyright Problems Affecting Communication of Educational and Scientific Information (CICP), and to examine such reports as The 1965 Congressional Hearings on the Copyright Law Revision; Interim Report on the Study of the Feasibility of a Copyright Clearing House, Jan. 22, 1965; Freehafer, E. G., Summary Statement of Policy of the Joint Libraries Committee on Fair Use in Photocopying, Special Libraries, Vol. 55, p. 104 (February 1964); and Hattery, L. H. and Bush, G. P., Reprography and Copyright Law, Washington, D.C., American Institute of Biological Sciences, 1964.

Repeated reference to information transfer experiments at MIT may have created the impression that Project Intrex is to be devoted to the improvement of information transfer exclusively on the MIT campus. This is not the intention. A project of the magnitude contemplated here can be justified only if it serves national needs.

But it seems essential at the beginning to provide a clear focus for new efforts in an environment in which new ideas can be subjected to experimental and operational tests. The MIT community is such an environment; we hope that others may be identified as the work proceeds and that cooperative experiments will come into existence. Project Intrex is expected to yield significant contributions toward the modernization of all large libraries and, indeed, toward the general improvement of information transfer. The proposition that work is most effective at the local level is tenable only if such work is carried on with constant awareness of the need for future integration into regional and national systems.

Acknowledgments

One of the most encouraging aspects of this Conference was the widespread interest and generous support which it enjoyed throughout the preparation and the execution of the meeting. It is a pleasure to report at this time that the financial arrangements with the Independence Foundation were characterized by a simplicity and a friendliness unique in the experience of the chairman. The participants, the sponsor, and the Massachusetts Institute of Technology are pleased to acknowledge the assistance of many organizations that contributed to the success of the Conference by sending members of their staffs to take part in the work of the Conference. The action of the National Academy of Sciences in authorizing the use of its facilities at Woods Hole has provided not only a perfect environment for the Conference, but also a deeply appreciated signal of the importance of better information transfer in universities.

The participants in the Planning Conference wish to record their indebtedness to Mr. Gary L. Benton and Miss Arlene Kirshen for administrative arrangements; to Mrs. Helen Barnum and her staff for countless courtesies in extending the National Academy's hospitality to the Conference; to the Misses R. Joyce Harman and Frances Flynn and their staff for their excellent performance in dealing with the intellectual and logistical problems of preparing this report for the press; and to the MIT Press for forgoing the customary editorial reviews, to permit publication of the report within six weeks of the close of the Planning Conference.

CHAPTER II

THE UNIVERSITY LIBRARY
AND ITS CONTINUING EVOLUTION

The university library in the 1960's selects, acquires, organizes, and makes available a massive quantity and variety of materials in support of research in every aspect of the humanities, the social sciences, the sciences and technology. In the totality of its responsibilities and efforts, the typical university library is encyclopedic in character. Since it is the hope of Project Intrex to reach beyond the Massachusetts Institute of Technology and to benefit the operations of university libraries generally, this chapter will consider these matters in a broad context. The Institute has been evolving in the direction of a university, but it does not claim to offer the breadth of subjects for study and research typical of great universities here or abroad. Science, engineering, and the related social sciences are preponderant in the Institute, and this is reflected in the development and the present state of its libraries. The libraries of most universities, on the other hand, reflect the whole range of intellectual concern, of which science and technology are but a part. Inevitably, information problems must be studied within this broad environment. Their solutions, to be adequate, must apply to every field of human endeavor, though different kinds of solutions will be required for different fields.

THE CHANGING INTELLECTUAL ENVIRONMENT

The university library is critically affected by changes in the intellectual world in which it exists and which it is designed to serve. The 20th century has seen quite profound changes in this world, and one must anticipate that the rate of change will increase during the rest of the century.

Among the more conspicuous of these changes has been the increase in the extent to which recorded information has become a critical need of almost all aspects of modern life -- education, government, business, industry, general cultural development, the professions, and research of all kinds. One can no longer effectively run a business or an industrial enterprise or engage in even a modest research endeavor without an adequate body of pertinent and current information.

Perhaps the second most conspicuous change has been the huge increase in research work of very wide subject scope in an ever-increasing number of universities. Such research requires exceedingly extensive and highly accessible book, journal and report collections for its support, for the serious investigator cannot be satisfied until he has reasonably good access to the past and present records of his subject field. Indeed, unless he does have such access, the cumulative, progressive character of virtually all serious research could not exist. Programs of graduate instruction also create very extensive demands, if good students are to be well trained.

With these changes has come a recognition that the concerns of Western scholarship and contemporary society cannot be parochial, but must be based upon access to, and an understanding of, almost all the other societies and cultures of the world — regardless of geographic location, language, or form of material. Furthermore, there are now very few places in the world that are not producing more and more literature. Central Africa, for example, produced very little recorded information 50 years ago; it produces a substantial amount now; and it will produce very much more in the future. It is important that this new information be collected, organized, and made available to many people in our society.

It is evident that all societies and cultures, past and present, generate much information of a popular nature or simply as a means of day-to-day business of all sorts. Even some research literature is of marginal quality. Such ephemera, however, may have very real cultural and historical values; and to assume, initially, that no access need ever be provided to much of the total expanding volume of print, and other forms of recorded information, suggests a prescience that, in times past, would have obliterated critical portions of the record of human achievement. There is no doubt that this risk still exists. It is therefore asserted that society needs to have, and can afford, access to a comprehensive record, even though each library can collect and retain only a small percentage of the total. The individual investigator is then free to make his own selections from the corpus of collected and organized material.

The literature base for research or teaching in many disciplines is subject to more and more rapid change. Thus, while there are many highly specialized disciplines, the men working in these fields may need to draw upon the literature produced in many other disciplines. For example, contemporary research in medicine requires extensive access to the literature of physics, chemistry, radiology, etc. It is impossible to predict either the rate of such shifts or their precise directions.

Moreover, the rapid pace of scientific and technological research and development introduces a factor of substantive obsolescence in many kinds of literature. This reduces the frequency with which such literature must be consulted. However, society is unwilling to erase the record of the past, for historians and other investigators find the serious study of the past illuminating and valuable in the better understanding of many aspects of contemporary civilization. Because so much of the record of contemporary civilization has been published on wood pulp paper that is subject to rapid deterioration, research libraries have the added obligation of finding means of preserving these records. New methods of preserving old paper and filming or otherwise preserving the content of crumbling documents are being put into effect while concurrently specifications for improved papermaking are being developed.

Fortunately, the modern evolution of science and technology, with very heavy Federal financial support, has produced not only new demands for scientific and technological information, but also a technology that can be applied in the solution of information-handling problems facing almost all parts of society.

THE IMPACT UPON THE UNIVERSITY LIBRARY AND ITS RESPONSE

Increased emphasis on research and the steadily growing volume of publication have combined to accelerate the growth rate of university library collections. Some of the larger university libraries are growing at the rate of 3-4% per year, many of the smaller ones at even higher rates. Present policies and prospects for research support and for the extension of educational opportunities suggest that this trend will continue upwards. The number of books published in this country alone is increasing about 10% per year, and the same is true of other major publishing nations. In addition, many new and smaller nations are developing substantial publishing programs, under both private and governmental auspices. The number of current scholarly journals, now estimated at 70,000 to 75,000 titles, is also growing steadily. The resultant output of published items is perhaps not properly characterized as an explosion, but it is very large and it becomes larger each year.
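
The growth rates cited above imply sharply different doubling times. A minimal sketch of the compound-growth arithmetic (the 3-4% and 10% figures come from the text; the code itself is a modern illustration added by the editor, not part of the original report):

```python
import math

def doubling_time(annual_rate):
    """Years for a quantity growing at a fixed annual rate to double."""
    return math.log(2) / math.log(1 + annual_rate)

# Collections growing 3-4% per year double in roughly 18-23 years;
# a 10% annual increase in output doubles in a little over 7 years.
print(round(doubling_time(0.03), 1))  # → 23.4
print(round(doubling_time(0.04), 1))  # → 17.7
print(round(doubling_time(0.10), 1))  # → 7.3
```

The arithmetic explains why even modest percentage growth forced the space and weeding problems discussed later in this chapter.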

The university library not only acquires a substantial portion of this published material but organizes it for use, primarily through subject classification and cataloging. The experience of scholars and students who have used foreign as well as American libraries is that the systems of subject classification and cataloging currently provided by American university libraries make them far easier to use than comparable collections in other countries. Stack access to a large book collection classified by subject is both a sobering and an enlightening experience for a reader who encounters it for the first time, as many foreign visitors testify. Similarly, the dictionary catalog, with author, title and subject approaches, is a versatile tool for bibliographic access to the collection.

There is a dramatic extension of the diversity of materials to be acquired in language, country of origin, and form. As research in all disciplines is ever more concerned with enquiries pertaining to every corner of the globe, so there is no quarter in which libraries must not seek out publications and reports, and there is no language in which raw material for study is not issued. Serious research is being conducted and its results are being published in countries and languages in which only a few years ago no significant scholarly investigation took place. The difficult undertakings of acquisition and bibliographic organization of such materials are being handled by university libraries, either singly or cooperatively, with growing effectiveness.

As regards the diversity in the forms in which research materials are being produced -- and accordingly collected and used -- it is sufficient in the context of this report to mention only a few to indicate the current involvement of libraries with physical forms of information that have necessitated marked departures from traditional practices. The broad array of audio-visual materials is symptomatic. Libraries now collect sound recordings on discs, tapes, and sound tracks. Microreproductions and publications issued initially in microimages appear in more and more different forms. Motion pictures and still photographs have become essential elements in research collections. And, finally, there are video tapes, which potentially constitute the most numerous, bulky, and potent arm in the whole of the audio-visual arsenal.

Technical report literature has become a major element in university libraries only since World War II. No matter how this material is handled, whether in central collections or in scattered special-project files, ultimate disposition for historical purposes must be a serious concern of librarians. In addition to bibliographic complexities inherent in this type of literature, security classification adds a further complication.

The newest forms of research material are products of electronic data processing, which require treatment radically different from any previous library practice. Data banks and computer programs are sure to proliferate at a very rapid rate, and they will have to be maintained and controlled in a manner that will make them readily available for current use and manipulation, as well as for historical purposes. Fortunately, such materials are singularly adaptable to mechanized handling.

The size of most university campuses and the dispersal of academic facilities raise questions regarding the appropriate disposition of library collections and services. Although the university exemplifies the unity of knowledge, it accomplishes its purposes by segmenting that unity into related working parts (schools, divisions, departments and centers) in order to reduce the totality to manageable proportions. The university interacts with the body of knowledge through the combined efforts of its various parts.

In a similar way, the university library may be considered as representing the unity of knowledge. If this concept is dominant, the university library is a single physical entity. If, however, the pattern of university organization is followed, the library in turn may be divided into parts to serve the several units of the university. The degree to which this process of division is carried is frequently governed by local conditions. The union card catalog in the central library provides a unified bibliographical record despite decentralization of the collections. This tool, however, valuable as it is, does not fully meet the needs of interdisciplinary students and scholars. Thus, in a decentralized university library system, a relatively high degree of duplication may become essential. One means of serving cross-disciplinary interests and lessening the need for duplication is the divisional library which serves groups of related departments, such as the biological sciences.

The principal university and research libraries of the country have taken a major step toward full bibliographical control by cooperating with the Library of Congress in developing and maintaining a national union list of serials and the National Union Catalog. This catalog was initially an author card file of some 14,000,000 entries at the Library of Congress. Additions of current imprints are now published by the Library of Congress in book form under the title National Union Catalog. Both the National Union Catalog and the Union List of Serials give conventional bibliographical information and locations, thus serving as finding tools for scarce and elusive publications.

A similar cooperative enterprise under the leadership of the Library of Congress has resulted in the publication of two volumes of the National Union Catalog of Manuscript Collections, a continuing publication that serves as a guide to manuscript collections in university and historical libraries.

To accommodate the steadily increasing university population (students, faculty, and research staff), universities generally have found it necessary to make major additions to their library facilities. Additional library space for readers is needed not only because of increased numbers but also because of changes in teaching methods which require more independent work by undergraduates. Similarly, the heavy emphasis on, and the expansion of, research have resulted in far more intensive use of libraries by graduate students and faculty. Project research programs have also brought to the university library a relatively new type of user, the full-time research worker who is neither a student nor a faculty member.

Increased numbers of users, independent work, and emphasis on research create a new intensity in the use of library materials. The resulting competition for access to specified library materials becomes at times severe. When outside pressures for use of the same materials — through direct or interlibrary loan requests from other universities and from industrial and government research agencies — are added to local demands, acute problems may develop. This situation has been alleviated by the wide use of full-size and microform copies. Some institutions, however, have adopted policies that seriously restrict access to the library collections.

As libraries get larger and more complex, it gets harder and harder for students, faculty, and research workers to find the precise item needed. The readers' services, which comprise a major part of every library staff, explain the resources of the library to individuals, prepare bibliographies, find desired information locally or elsewhere, and in general facilitate the interaction between the user and the collections. There has been much expansion of activity here recently; even so, at present levels of financial support, most of our university libraries, with their large and diffuse population of users, cannot provide the specialized information services of industrial libraries. However, abstracting, translating, and state-of-the-art surveys are evolving, as larger staffs having more subject knowledge become available.

Major improvements at the user-collections interface have already been brought about by technology; the fact that more are soon to come is amply shown by the later sections of this report. Microfilm and other microforms bring the resources of the world's libraries to any individual's desk — provided he has a reader and can afford the copying. In the last few years, there has come into use new copying equipment that produces a full-size paper copy of a microimage or of any page of any document in the library. The student or scholar can now build a small private library quickly though not inexpensively. He can annotate, clip and edit. The result of this is not only a transformation of the user's work habits; it means we are on the threshold of a new integrity for the collections. They can be kept intact so that a volume in use by any one person does not deprive everyone else of it. In more and more institutions, journals no longer are allowed to circulate. Quick copies are better, and in the long run, cheaper.

As fast as new types of information storage are developed, libraries have to introduce the terminal equipment to handle them. This applies to finder-reader-printers for film slides and other graphic-storage media, either full- or reduced-size. It is beginning to apply to data files on tape and to other magnetic storage. Computers and computer consoles have appeared in libraries experimentally, and their operational use for circulation control and bibliographic search will be a reality in a few places by the time this book is published. Teaching machines, simple or sophisticated, language laboratories, music-listening apparatus -- all these are part of many university libraries.

The user-collections interface is the principal problem to which Intrex is addressing itself. This is, indeed, the only part of the complex that is easily amenable to change, since the populations of users and of collections are beyond the library's control.

NEEDS FOR THE FUTURE

It is evident that efficient access to information requires the development of a more flexible and sophisticated finding apparatus than the existing card catalog. This apparatus, ideally, should lead one rather quickly through the universe of recorded knowledge to exactly those books, documents or bodies of information that are pertinent to a problem, and at the desired level. Such an instrument must be capable of readjusting to changing concepts of a subject, to changing terminologies, and to the handling of new forms of publications of many different types. The difficult work of such intellectual organization must, in the simple interests of efficiency, be done once, without duplication of effort. The resulting bibliographical record can then be made available to all interested research libraries. There are likely to be many important advantages if access to this "catalog index" can be gained through stations that are remote from the library, as well as in it.

An interested student or teacher must have the assurance that, with a reasonable expenditure of effort, he can get rapid access to any piece of recorded information — first to that held locally, then to pertinent information held elsewhere. The latter may not be quite so quickly available.

It would be extremely desirable if the library of the future could call to the attention of interested persons those documents, books, or other information sources that might be helpful to them, as these documents are published or as they are received by the library.

Libraries need much better information than they now have in order to determine how well they are meeting readers' needs, and how their resources, procedures, or other arrangements might be modified to improve the effective flow of information. Because of the changing nature of research and of recorded information, library organization of materials and services must become more adaptable to change.

Despite more extensive use of microforms, most libraries will require increased space for collections. To control storage requirements, libraries will need to devise better and more economical systems for weeding their collections, while assuring the continued availability of all materials of research value. Close integration with national information networks which are now coming into existence will further alleviate the acquisition and storage problems of future libraries.

As these and other changes are introduced in their services and collections, university libraries will require larger and better-trained staffs with a higher degree of subject competence. The information transfer problem is of sufficient complexity and importance to warrant the best talent that can be brought to bear on it.

Research and development in new services and equipment, as well as the conversion of existing records and documents into new forms, will continue to require substantial new funding.

CHAPTER III

THE NATIONAL NETWORK OF INFORMATION CENTERS

Libraries have traditionally shared their resources, services, professional tools, and administrative development. This continuing cooperation has given flexibility and viability to libraries, enabling them to survive inadequate support, increasing work loads, and an apathetic public.

The Library of Congress report 1 on automation points out: "Cooperation among libraries exists in acquisition, cataloging, particular bibliographic projects, library lending, and in many other areas. This cooperation is an attempt to make maximum use of limited resources." For most of the 20th century, the high points of cooperation were in cataloging (furnished by the Library of Congress) and in widespread interlibrary loan of publications. These activities have been carried out on a nationwide basis and have brought economy and efficiency to library programs. They have enabled specialized and small libraries to bring to their users the widest range of the world's printed information.

During the past 10 years, however, a serious dilemma arose from the inability of the Library of Congress to keep up with the cataloging needs of American libraries, and also from the inability of libraries generally both to serve their own users and to share their resources with others. Restrictions on interlibrary loan have developed in research libraries in all parts of the country, as a result of increasing demands for service from outside users as well as from users in the immediate university locale.

Research libraries now face an aggregate of responsibilities and problems including: a doubling in the output of scientific information every 10 years; the impact of technical-report publications; a variety of new demands for service; increases in the number of languages of publication, as well as the quantity and quality of foreign-language materials; the proliferation of new and changing scientific disciplines; a complex range of forms of publication; and the need to coordinate their work with an unstructured system of information sources and information and library centers.

When it became apparent that these problems were not limited to a small group of libraries but affected all research and university libraries, attention was directed toward expanded cooperation as a possible alleviation. Administrators, librarians, Government officials and information specialists proposed a variety of patterns for cooperative systems.

NATIONAL LIBRARY-INFORMATION SYSTEMS

Julius Cahn analyzed these proposals and the system structure in a paper presented at American University 2 in which he states that,

"A system implies effective organization for optimal input and output, a logical flow of work, step by step; feedback from part to part; fulfilment of separate purposes by each part and of common purposes by all parts.

"Some have used the concept of an 'Information Network'. It would consist of a series of 'stations', each of which would serve a particular audience, but which would transmit over the equivalent of 'coaxial lines' or 'microwave', the best of centrally prepared materials, as well.

"Still other observers have used the concept of a 'central information switchboard', such as serves independent telephone companies, or of a 'grid' such as electric utilities provide, 'feeding' power from one location to another at times of peak loads."

The American Library Association suggested 3 a national network with three levels,

". . . local centers will store and handle information and services which use-patterns indicate are most frequently desired. . . Regional centers will not duplicate these resources or services; they will store information and provide services which use-patterns indicate are only infrequently desired within their larger areas of responsibility. National centers will be designed as single, centralized compendiums of information and service resources capable of satisfying every known requirement of their users."
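
The three-level structure in the ALA proposal behaves like a tiered lookup: a request is satisfied at the lowest level whose holdings cover it and escalates otherwise. A minimal sketch under that reading (the level holdings are hypothetical examples; the quoted proposal specifies no implementation, and this modern code is an editorial illustration only):

```python
# Each level stores what use-patterns justify; the national center,
# per the proposal, can satisfy every known requirement.
levels = [
    ("local",    {"current journals", "course texts"}),
    ("regional", {"back files", "foreign serials"}),
    ("national", {"current journals", "course texts", "back files",
                  "foreign serials", "rare manuscripts"}),
]

def locate(item):
    """Return the first (lowest) level able to satisfy the request."""
    for name, holdings in levels:
        if item in holdings:
            return name
    return None

print(locate("current journals"))  # → local
print(locate("rare manuscripts"))  # → national
```

The escalation rule is what keeps regional centers from duplicating local resources, exactly as the quotation requires.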

A single-level system was suggested in the Library of Congress study 4 which pointed out that,

". . . other libraries have come to depend upon the Library of Congress for assistance in carrying out their functions. The Library of Congress provides catalogs, lends books, distributes cards, and offers varied reference, bibliographic, and consultant services. . . Thus it is desirable to conceive of a library network, a national research library system, incorporating the telecommunications necessary to accommodate the flow of information to all its branches."

More than 20 major proposals have been made for national
information systems. Most significant is the plan now being
developed by the Committee on Scientific and Technical In¬
formation (COSATI) in the Office of Science and Technology,
Executive Office of the President. William T. Knox, chair¬
man of COSATI, has given a general outline of the plan: 5

"The body of the national network of information systems as now visualized, would be made up of two parts, i.e., a complex of library systems (document-oriented, such as traditional libraries), and a complex of information evaluation and retrieval systems (information-oriented).

"The heart of the system would be a number of National Libraries, handling the documents in such fields as medicine, agriculture, engineering, earth sciences, physical sciences, behavioral sciences, etc. Each of these National Libraries would be concerned with such matters as acquisition, exchange, bibliographic control such as cataloging, indexing, and the like. As tentatively visualized, these would be structured, operated, and administered in somewhat similar manners, each responsible to a specific Federal agency with primary mission responsibilities in the field of interest of the assigned Library. Coordination and compatibility among the Libraries would be a primary goal from the beginning. A question for early resolution would be the mechanism for bringing about this coordination.

"The complex of library systems would be the existing library systems (Federal; college and university; public; specialized [industry, institutes, etc.]; schools). These libraries would, as now, look to the National Libraries for loan of documents, catalogs, etc., as may be appropriate, and would, in turn, provide some of the input to those Libraries."

OPERATING SYSTEMS

Current proposals for national systems base their structure upon the science-information organizations now in existence. All are Government-operated or supported in part by Government funds. Most effective are those in new Federal agencies where access to information is recognized as essential to research and development.

The Atomic Energy Commission and the National Aeronautics and Space Administration are examples of highly effective information systems where the entire range of information and library work is coordinated in one central unit. Manuscript preparation, editing, publishing, abstracting, indexing, library services and similar functions are all planned and carried out as a coordinated function.

Other major information-library systems are those of the National Institutes of Health, the Defense Documentation Center, and the Clearinghouse for Federal Scientific and Technical Information.

The National Library of Medicine and the National Agricultural Library represent national systems oriented toward special subject groups. Network plans of the National Library of Medicine are well advanced, and the first regional centers are now functioning. The NLM system includes deposits at the centers of magnetic tapes listing and indexing books and articles in the library.

More specialized national centers are those of the Science Information Exchange (a clearing house of research project information) and the National Referral Center for Science and Technology at the Library of Congress. The Referral Center acts as a switching agency, sending inquiries to the proper information sources.

Outside the Government, science-information systems are provided by: Biological Abstracts; Chemical Abstracts; American Institute of Physics; Engineers Joint Council; and other professional societies. A specialized library net is operated by Columbia-Harvard-Yale for the cataloging of medical publications.

COOPERATION AND NETWORKS

Without providing a more detailed analysis, it is evident that these national services entail a variety of cooperative functions that are highly important for network activities. A review of their similarities and their individual characteristics will aid in discerning a network pattern.

Although some appear to have a subject or discipline orientation (NLM, CA), most are mission-oriented (AEC, NASA).

A central source is basic to an effective national network.

Coordination of a variety of related responsibilities (such as indexing, abstracting, and library services) enhances the usefulness of the system.

Standardization is imperative.

A categorization of network systems would include the following
types: Monolithic; Discipline; Mission; and Special. Each will
have a place in the future of library development.

NETWORKS

Network success rests upon elements beyond traditional library cooperation, which has been informal and variable in its scope and services. Moving from cooperative to network activities will necessitate: a strong, well-supported center; establishment of firm participant responsibilities; agreements among libraries; and standardization in cataloging, subject headings, and other library procedures.

Libraries will be strengthened by network support to furnish their clientele with services beyond the competence of a single library. Information will include not only traditional books and journals, but also current indexing, abstracting, state-of-the-art studies, unpublished papers and symposia reports, standard data, and translations. In addition, the resources of the numerous NASA, AEC and Defense information centers will be available. Special files such as project information in the Science Information Exchange will also be part of the system.

The Library of Congress automation report pointed out that the extension of library cooperation would require major changes in library organization and attitudes. These are two important elements that thus far have not been treated in discussions of library networks. In order to avoid disruptive changes in library organization and unanticipated personnel problems, it is imperative that all network experiments include special studies to determine new library organization patterns and needs for retraining of library staff and users.

Although much stress is given in discussions of network systems to the probable elimination of duplication, it should not be assumed that all duplication is undesirable. It is only unnecessary and wasteful duplication that should be identified and eliminated.

Consideration in these experiments must be given to relative costs. In general, one assumes that costs are related to speed and to the urgency of having information. For economy, libraries can use off-the-shelf equipment instead of having it custom-designed, and they can use delayed procedures instead of real-time access.

In network planning, the library clientele must be treated as an essential element. For a university system, factors of significance are:

Types of users (professors, students, engineers, scientists, research staff, industry users).
Location of users.
General and special needs.
Services required.
Subject fields.

A communication method was suggested by three experts in the field [6], who stated that,

"With the library system, a number of remote libraries could have low-speed communication channels feeding into a regional center. At the regional center there could be high-speed lines feeding directly into the central library computer, and speed-change devices which buffer the low- and high-speed channels."
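
The speed-change buffering just quoted can be sketched in modern terms. The following Python fragment is an illustration only; the class name, the character rates, and the messages are invented here, not drawn from the Bell Laboratories design:

```python
from collections import deque

class RegionalBuffer:
    """Sketch of the speed-change device described above: several
    low-speed library channels feed a regional center, which buffers
    their traffic and forwards it over one high-speed line.
    (All names and rates here are illustrative assumptions.)"""

    def __init__(self, high_speed_chars_per_tick=100):
        self.queue = deque()
        self.rate = high_speed_chars_per_tick

    def receive_low_speed(self, library, chars):
        # A low-speed channel delivers a short message to the buffer.
        self.queue.append((library, chars))

    def forward_high_speed(self):
        # Drain whole messages, up to `rate` characters per tick,
        # toward the central library computer.
        sent, budget = [], self.rate
        while self.queue and len(self.queue[0][1]) <= budget:
            library, chars = self.queue.popleft()
            budget -= len(chars)
            sent.append((library, chars))
        return sent

buf = RegionalBuffer(high_speed_chars_per_tick=20)
buf.receive_low_speed("Library A", "REQ 001")
buf.receive_low_speed("Library B", "REQ 002")
print(buf.forward_high_speed())  # → [('Library A', 'REQ 001'), ('Library B', 'REQ 002')]
```

The point of the sketch is only that the buffer decouples the two speeds: slow channels fill the queue at their own pace, and the high-speed line empties it in bursts.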

A LOOK AT THE FUTURE

Assuming the traditionalists have accepted a new tradition of togetherness, how do we visualize the library network of 1975? There will be a large number of specialized information centers in different fields. Each center will have the responsibility for monitoring the literature in its particular field or subfield, for collecting the necessary experts to do that monitoring, and for providing service to users of the network in that specialized field. Each center will have a storage and computation facility, and will provide bibliographic and reference services for all users.

There will also exist a time-shared network with various nodes distributed around the country. People with access to this network will interrogate any of the information centers attached to it. Each center will control the bibliographic information for all classes of literature in its field: monographs, journal articles, reports.

One can now imagine a really interesting result: the possible disappearance of the local catalog in fields where such bibliographical control exists.

Imagine a typical user at a local console attempting to find a relevant document (not an idea, at this point). He would interrogate the net and be referred to one or more suitable information centers. He would engage in discourse with a center until his request had a sufficient degree of specification. He might then have to wait for a while, or he might get immediate service. In either case, the result of this process would be a specific citation or citations.

Now imagine that there is also a national facility that contains a catalog of the holdings of books, reports, and serial titles in all major libraries. Within the network, one can then call on this national facility with a citation and find out where some copy of the document might exist. Thus, the local user would press a "find" key on his console with reference to a particular citation, and would receive an answer that (happily) the document was indeed available locally. Or a slightly less happy answer might be that it was available nearby. Then, as another possibility, the user might press a "local" button and be given some indication by the local system as to how and when he could get a copy of the document. Thus, the local system might say that the document was available in both microfilm and hard copy; did he wish hard copy or the microfilm for personal retention? Alternatively, he might get an immediate picture of some portion of the text on the cathode-ray tube on his console.

Note that this process does not involve at any time the use of
the local catalog. All bibliographical information is stored
in decentralized centers having to do with each specialty; the
document location may be in a separate local or national
store. It is evident that, given a specific citation, one may
then need to proceed to ascertain:

Where is it?
What is the local address or call number?
What is the local availability status or capability?
And, finally, request it.
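
The sequence of steps above can be sketched as a small lookup procedure, in modern terms. The Python below is purely illustrative; the catalogs, citations, call number, and function names are all invented for the sketch:

```python
# A minimal sketch of the console dialogue described above: a citation
# is resolved first against a national union catalog, then against the
# local system's own holdings record.

NATIONAL_CATALOG = {          # citation -> libraries holding a copy
    "Smith 1962": ["MIT", "Harvard"],
    "Jones 1964": ["Stanford"],
}
LOCAL_HOLDINGS = {            # local call numbers and formats (here, at MIT)
    "Smith 1962": {"call_number": "QA76.S5", "formats": ["microfilm", "hard copy"]},
}

def press_find(citation, local_library="MIT"):
    # The "find" key: where does some copy of the document exist?
    holders = NATIONAL_CATALOG.get(citation, [])
    if local_library in holders:
        return "available locally"
    return f"available at {holders[0]}" if holders else "not located"

def press_local(citation):
    # The "local" button: how and in what form can I get it here?
    record = LOCAL_HOLDINGS.get(citation)
    if record is None:
        return "no local copy"
    return f"{record['call_number']}: " + " and ".join(record["formats"])

print(press_find("Smith 1962"))   # → available locally
print(press_local("Smith 1962"))  # → QA76.S5: microfilm and hard copy
```

Note that, as the text observes, no local catalog is consulted at any point: the citation comes from a specialized center, and only location and availability are resolved locally.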

An immediate objection today to the idea of decentralized centers on this scale is the cost of communications. However, if we imagine that there are fewer than 500 such centers, and that they are connected to time-shared networks, the number of actual connections that would have to be made to tie all such centers to the net might not be so very large. They would no doubt group themselves in a number of geographical regions, and one might then connect them together via coaxial cable or some other reasonably broad-bandwidth connection.

INTERNATIONAL NETWORKS

Summarizing extensive hearings on scientific information, a Government committee recently reported [7],

"United States scientists and technicians, to be abreast of developments in their respective fields, must be aware of what their colleagues in other countries are thinking and doing. They must know in as much detail as possible the exact nature of investigations being conducted in foreign institutions, and they must know it as promptly as possible."

A practical means for meeting the need for speed in international information interchange is network development.

J. W. Emling of Bell Telephone Laboratories, in a practical discussion of library systems, predicted [8],

"If we look some years ahead to the time when the Bibliotheque Nationale might be connected into this system with communication by way of a satellite system..."

direct international network exchanges would be possible.

The language of science is universal, and the interchange of information is the life blood of scientific progress. It is realistic, therefore, to begin planning for international networks. An early step is now being taken by the National Library of Medicine in its arrangements for tape deposits and their use by European medical and scientific libraries. We should not be surprised if international networks develop simultaneously with those at the national level. Their influence might have broad political and social implications. We know now that networks are needed at every level.

[1] "Automation and the Library of Congress." Library of Congress, Washington, D. C., 1963, p. 21.

[2] Julius N. Cahn, "A System of Information Systems." Paper presented at the Fourth American University Institute for Information Retrieval, Washington, D. C., February 12, 1962.

[3] American Library Association, "The Library and Information Networks of the Future." Prepared for Rome Air Development Center, RADC-TDR-62-614, 8 April 1963.

[4] "Automation and the Library of Congress," op. cit.

[5] William T. Knox, "The Changing Role of Libraries." Speech presented at the Association of Colleges and Research Libraries meeting, Detroit, Michigan, July 8, 1965.

[6] J. W. Emling, J. R. Harris, and H. J. McMains, "Library Communications Networks." In Libraries and Automation, Library of Congress, Washington, D. C., 1964, p. 213.

[7] U. S. House of Representatives, Select Committee on Government Research, Study No. IV, "Documentation and Dissemination of Research and Development Results." 88th Congress, 2nd Session, Washington, D. C., 1964, p. 54.

[8] J. W. Emling, "Communication Systems for Libraries." In Libraries and Automation, Library of Congress, Washington, D. C., 1964, p. 222.

CHAPTER IV

THE ON-LINE INTELLECTUAL COMMUNITY

INTRODUCTION

Five years ago, "The On-Line Intellectual Community" was a fragmentary dream in a very few minds. Today, it borders on actuality, though only in a primitive way and in a very few places. Five years from now, we think, it will be a significant force in, and some years after that the very basis of, scholarship, science, and technology. In any event, the concept of the on-line intellectual community has been one of the three main influences in the planning of Project Intrex.

The essential bases of the on-line intellectual community are man-computer interaction and computer-facilitated cooperation among men. To state it just that way, and not to say that the essential basis is the digital computer itself, implies two things that we consider significant in the conceptual foundations for Project Intrex that we have developed during the Planning Conference.

The first is that the digital computer, together with computer programs, is now recognized not only as an essential tool but as a facility of absolutely first-rank importance for almost any individual or organization whose product depends heavily upon the execution of complex but definite procedures.

The second is that the capabilities of computers themselves (as now designed and programmed) are great only within the domain of the execution of pre-defined algorithms; that new plans and new formulations must stem from men (or from the domain of heuristics, in which men are still unchallenged masters); and that the solving of difficult problems can be greatly facilitated by coupling closely together the heuristic contributions of intelligent men and the algorithmic contributions of well-programmed computers.

Even in the mid-1950's there was, in the minds of most of those who had worked closely with computers, no longer any question about the first of the foregoing propositions: about the great value of the then-still-new devices. By the end of the decade, many people had seen what computers could do, and the problem was not to convince skeptics but to calm enthusiasts. Now, five years later, enough responsible people have had the opportunity to enter into, and to sense the power of, (limited) intellectual partnerships with computing machines that the validity of the second proposition (the three-part proposition of which the last part asserted the importance of man-computer interaction) is firm enough to support that proposition as a basis for significant decisions. In short, it is now evident that much of the creative intellectual process involves moment-by-moment interplay between heuristic guidance and execution of procedures, between what men do best and what computers do best. On the basis of that realization, it seems reasonable to project to a time when men who work mainly with their brains, and whose products are mainly of information, will think and study and investigate in direct and intimate interaction with extensively programmed computers and voluminous information bases.

With computer facilitation of cooperation among men, the second of the two concepts underlying the on-line intellectual community, we have had less experience than we have had with partnerships between an individual man and a computer. Nevertheless, some essential features of community-computer interaction have emerged, and they are seen clearly enough to convince some of us that this second part of the conceptual foundation will be even stronger than the first.

Evidently many major achievements, although they may be crystallized by the work of one or two great men, result from the accumulation and the melding of the contributions of many. Because communication among men is fallible, and because heretofore men have not had effective ways of expressing complex ideas unambiguously (and recalling them, testing them, transferring them, and converting them from a static record into observable, dynamic behavior), the accumulation of correlatable contributions was opposed by continual erosion, and the melding of contributions was hampered by divergences of convention and format that kept one man's ideas from meshing with another's. The prospect is that, when several or many people work together within the context of an on-line, interactive, community computer network, the superior facilities of that network for expressing ideas, preserving facts, modeling processes, and bringing two or more people together in close interaction with the same information and the same behavior will so foster the growth and integration of knowledge that the incidence of major achievements will be markedly increased. Perhaps, as stated (or overstated), that is still a dream. But we have been seeing the beginnings of its realization in the cooperative communities that have been developing around pioneering time-sharing computer systems; and we think we see in those communities (most clearly, of course, in the one at MIT, for it is the oldest and we have had the closest view of it) an important part of the solution to the ever-more-pressing problem of timely and effective transfer of information.

THE PROJECT MAC EXPERIENCE

The concept of the on-line intellectual community is in large part a projection of recent experience with time-sharing computer systems, mainly at MIT, but also at the Carnegie Institute of Technology, the System Development Corporation, the RAND Corporation, and (especially during the last year) at several other institutions. Work in time sharing began in the Computation Center at MIT in 1960 and has continued there and in an Institute-wide program known as Project MAC. At the present time there are two large computer systems, each comprising a conventional digital computer with several modifications and additions, disc- and drum-storage units, and a communication subsystem, together with a considerable number of consoles, most of them located remotely from the computers themselves, in or near the offices of users. There are now more than 150 typewriter and teletypewriter consoles which may be connected to either of the large computers through the Institute telephone system, the external telephone system, or (in the case of the teletypewriters) through the Western Union Telex system. Indeed, if it has authorization, any teletypewriter in the international Telex system can communicate with the computers at MIT, and, after slight modification of its modulator, any teletypewriter in the country-wide TWX network can do likewise. There are several smaller computers and one quite elaborate console featuring advanced graphic display devices that constitute, with one large computer, a fairly complex computer network. The foregoing is the hardware base of the program, now five years old, from which have arisen some of the ideas that we now see as fundamental for information transfer networks of the future.

The experience gained by many members of the MIT community through interaction with the time-sharing computers has been so convincing that both of the new, advanced computers (one to be delivered late this year and the other early in 1967) will be machines designed especially for time-sharing and (as turns out to be even more fundamental) memory-sharing applications. The new machines, a General Electric Model 645 and an IBM System/360 Model 67, will increase the capacity for providing simultaneous on-line service from the present total of 60 consoles (30 for each system) to a total of several hundred.

Although the hardware basis is easier to describe and tends to be described first, even in this late stage in the ascendancy of software, the software basis (the collection of computer programs, documentation, and doctrine) is equally important and more directly responsible for the shape and structure of the services rendered by the systems to the users. Because the basic software is the same for both systems, we need describe only one. We shall describe it, not from the structural point of view of a "system programmer", but from the behavioral point of view of a substantively oriented user of the system.

The user sits at his typewriter or teletypewriter and types messages to the system, which sends messages back to him: sometimes cryptic, sometimes full; long or short; in natural language or in mathematical notation; all depending upon the nature of the program that is running in the computer for the user at the time. While the user upon whom our attention is here focused is interacting with the computer, other users are interacting with it too: up to 29 others, and usually precisely 29 others, under the rules presently in force. Actually, of course, since there is only a single processor in each of the present main computers, the programs of the 30 users do not run precisely simultaneously. The computer operates one, and then another, and then another, each for a brief interval, in a sequence determined by a set of rules incorporated in a program called the "scheduling algorithm". However, the intervals are short enough that no user need ever wait very long for his program to have a turn; moreover, each user can type to the system, and the system can type back to each user, even when the processor is working on another user's work. This is possible because the system's communications with users are handled by a separate communication subsystem, and that subsystem can time-share its facilities among the users.
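
The turn-taking behavior of such a scheduling algorithm can be sketched as follows. This Python fragment is a modern illustration of the round-robin idea only, not the actual MAC scheduler; the user names and work units are invented:

```python
# Round-robin sketch: each logged-in user's program runs for one brief
# quantum in turn, so no user waits long before his program has a turn.
from collections import deque

def round_robin(jobs, quantum=1):
    """jobs: dict of user -> units of work remaining.
    Returns the order in which user programs received the processor."""
    ready = deque(jobs.items())
    schedule = []
    while ready:
        user, remaining = ready.popleft()
        schedule.append(user)                 # this user gets one quantum
        remaining -= quantum
        if remaining > 0:
            ready.append((user, remaining))   # back to the end of the queue
    return schedule

print(round_robin({"A": 2, "B": 1, "C": 3}))  # → ['A', 'B', 'C', 'A', 'C', 'C']
```

The sketch shows why the intervals can be kept short: the cost of fairness is only that a long program is revisited many times, not that any program waits for another to finish.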

The computer program with which the user communicates when he first "logs in" to work on-line, and whenever he wishes to switch from one course of action to another, is the supervisory program. That program is quickly available to him at all times, because it always resides in the primary, directly processible memory of the computer, and because all communications between the user and other programs (and other people) are monitored by the supervisory program. The supervisory program not only does what its name suggests it should do, but also does what one would expect an arrangements secretary and a recording secretary to do. It responds to a "command language" that includes about a hundred basic commands. Examples are:

List the names of the files in my private sector of the store, and show me the records of their recent use.
Accept the following, under the name "_", as input to my file.
Fetch me the editing program and tell it that I wish to edit "_".
Tell me how many consoles are currently active.
Save this information that I am going to use, now, so I may recover it if my test goes awry and destroys the test copy.
Start up program "_" from just the state in which I left it when last I used it.
Explain command "_" to me in greater detail.
Compile (i.e., prepare for execution) the program called "_" with the "_" compiler.
Operate the program called "_", using the data called "_".

The user does not have to type such long statements, of course. To have a castle-planning program named "Ludwig" compiled by the Michigan Algorithmic Decoder, for example, the user need type only "MAD LUDWIG".
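
The abbreviation works because the supervisor treats the first word of the line as the command and the remainder as its arguments. A toy sketch of that dispatch, in Python; the command table and reply strings are invented here, and do not reproduce the actual command repertoire:

```python
# Invented sketch of supervisor command dispatch: first word names the
# compiler or action, remaining words name the user's files.

COMPILERS = {"MAD": "Michigan Algorithmic Decoder",
             "FAP": "FORTRAN Assembly Program"}   # illustrative entries

def interpret(command_line):
    verb, *args = command_line.split()
    if verb in COMPILERS:
        return f"compile {args[0]} with the {COMPILERS[verb]}"
    if verb == "LISTF":
        return "list the files in the private sector"
    return f"unknown command: {verb}"

print(interpret("MAD LUDWIG"))  # → compile LUDWIG with the Michigan Algorithmic Decoder
```

The economy of the scheme is the point: a hundred-command language remains usable at a slow teletypewriter because each command collapses to a word or two.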

The services arranged for by the supervisor are actually fulfilled, of course, by other computer programs. The set of these other programs, and hence the set of services, is open-ended and continually increasing. Most of the programs are devised by users of the system to meet needs encountered in their work. Such programs begin their existences in the private files of their originators and then are copied into the private files of other users who discover their convenience and modify them to meet broader ranges of requirements. Then, when the programs have proved themselves in use and have been "documented" well enough to meet the standards set for "public programs", they are recognized by the Editorial Committee and admitted into the public files. The same course is being followed, now, with data that are likely to be useful to many people, including, many users are pleased to note, instructions on how to use the time-sharing system. Thus the community of users is creating something that no single individual (indeed, no affordable centralized group of system specialists) could possibly create: a broad, comprehensive, and continually expanding system of information and information-processing services that is at the beck and call of any member of the community of users, either for employment "as is" or for modification or specialization to his own particular requirements. Note that the services include facilities for retrieval of information, for retrieval of processors, and for applying the processors to the information (which is the immediate source of the strength of the system), and for improvement and augmentation of the services themselves (which is the generic source of the strength).
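
The path from private file to public file can be pictured as a sequence of states. The following Python sketch is an invented illustration of that editorial path, with hypothetical field names and thresholds; it is not a mechanism of the actual system:

```python
# Invented sketch of the promotion path described above: a program
# moves from its author's private file toward the public files only
# after it has proved itself in use and been adequately documented.

def promote(program):
    """Return the next status on the private -> proven -> public path,
    or None if the program must remain where it is for now."""
    if program["status"] == "private" and program["users"] > 1:
        return "proven"        # copied and used by others
    if program["status"] == "proven" and program["documented"]:
        return "public"        # admitted by the Editorial Committee
    return None

prog = {"name": "EDIT", "status": "private", "users": 5, "documented": False}
print(promote(prog))  # → proven
```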

The "services" of which we have been speaking are the clerical


and quasi-clerical functions and operations involved in creative
intellectual work. (Some of them are too sophisticated to be

29
called merely "clerical”. The ’’on-line mathematical
assistant” can carry out symbolic integrations about as well
as most graduating seniors can. ) Examples of areas in which
services are available to aid the on-line user are:
Preparing and editing text and running off clean copy.
Graphic design of structures and devices.
Modeling or simulation of dynamic and stochastic
processes.
Planning of highways, buildings, water-resource
systems, and the like.
Understanding of systems of mathematical and
logical interrelations and constraints.
Examining and tracing bibliographic information
in the field of physics.
Preparing, testing, revising, and documenting
computer programs.

Each of these services performs for the user, quickly and responsively, functions that are essential to the development of his line of thought or investigation, but that would be prohibitively time-consuming if he had to carry out all the operations himself, or if he had to wait for a conventional computer center to return the result pertinent to each elementary step before he could move on to the next.

The time-sharing systems of Project MAC and the Computation Center are computing systems primarily, of course, and not library systems. The experience with them has therefore not been an experience with precisely the spectrum of classes of information and information processes with which Project Intrex will be concerned. Nevertheless, the experience has been much more directly relevant to library and information-network topics than most foresaw at the outset. Indeed, the time-sharing systems include small "libraries" (millions of "computer words", not millions of documents) and information-retrieval and information-dissemination subsystems. The main features that are seen, as a result of the time-sharing experience, to be essential for future libraries and information networks are the on-line mode of operation, the open-endedness, the capability of examining and acting upon information through programs, and the facilitation of coherent interaction and cooperation among the members of the community of users.

THE CONCEPT

We try to distinguish clearly in our thinking between the actual experience thus far gained in on-line computing and the concept of the full-fledged on-line intellectual community of unspecified future date. The experience has provided a concrete foundation for the concept and attested to the feasibilities and values of some of its parts; but the over-all concept is as much an idealization, based on perception of our wants and needs, as it is a projection of experience in community-computer interaction.

In the concept, all the members of the university community (undergraduate students, graduate students, faculty members, full-time researchers, scientists and engineers in associated industry, and, indeed, librarians and administrators) work in close and frequent interaction with the information system. The system is at once a store, a processor, and a transmitter of information: the central nervous system of the community. Unlike the central nervous systems known in biology, however, the information system of the local university community is a node in a larger network that serves the complex of communities and organizations that support the intellectual life of the society. Each member of the local university community therefore works, in a real and effective sense, "on-line to the public informational resources of the world".

As members of intellectual communities have done for many years, the members of the on-line intellectual community think, study, teach, learn, experiment, publish, develop, invent, organize and analyze, solve problems, make decisions, and carry out all the other ill-defined processes of cognition, overt and covert. They carry out those processes mainly at their desks (which is to say, their consoles), in close interaction with the information processors and the information bases of the network, and sometimes through the network with other members of the community or the community of communities.

The contributions made by the network are largely clerical and quasi-clerical, in the sense already mentioned; but one of the main features of the concept is that the system delivers much of its help "inside the thought cycle", ready for integration within the structure of the user's thinking. That is the essential advantage of being on-line. That is what so greatly facilitates the melding of the heuristic guidance and evaluation provided by the man with the precise memory and rapid processing provided by the computer.

The list of services that the system (network) affords begins with access to stored information. Whenever a user needs to employ a fact or refer to a document that is "in the network", he has only to specify the fact or the document uniquely; and, if it is not buried down in a little-used sector of the memory or separated from the request by imposing obstacles, it is delivered. If the cost of delivery would be great, the cost is presented, rather than the fact or document itself, and a negotiation between the user and the system is thus opened. If the user does not specify the fact or document uniquely, then a negotiation is opened to refine the retrieval prescription or to give the user a notion of how many facts or documents he may receive if the system follows through to meet his request. The fact-retrieval capability postulated here is based upon the assumption that the system contains a store of information organized in a more readily processible form than that of natural-language text or, alternatively, that great strides have been made beyond the present level of understanding of the syntactic organization (and, more especially, the semantics) of natural language. The document-retrieval capability, on the other hand, is based approximately upon the present state of the art. Whether one contents himself with document retrieval, or assumes retrieval of facts from fact stores organized by men, or goes on to postulate inference from natural-language text, the conclusion must remain the same: that retrieval of stored information is the basic service upon which all the facilities of the on-line system must depend.
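
The negotiation just described can be pictured, in modern terms, as a simple decision procedure. The Python below is an invented illustration; the store contents, cost units, and threshold are assumptions made for the sketch:

```python
# Sketch of the retrieval negotiation: an ambiguous request triggers a
# count, an expensive delivery triggers a cost quote, and only a
# precise, inexpensive request is filled outright.

STORE = {
    "intrex report 1965": {"cost": 1},
    "intrex memo 1966":   {"cost": 1},
    "lunar atlas":        {"cost": 50},
}
COST_LIMIT = 10   # illustrative threshold above which cost is quoted first

def request(query):
    matches = [name for name in STORE if query in name]
    if len(matches) == 0:
        return "not in the network"
    if len(matches) > 1:
        return f"please refine: {len(matches)} documents match"
    doc = matches[0]
    if STORE[doc]["cost"] > COST_LIMIT:
        return f"delivery of '{doc}' would cost {STORE[doc]['cost']} units; proceed?"
    return f"delivering '{doc}'"

print(request("intrex"))       # → please refine: 2 documents match
print(request("lunar atlas"))  # → delivery of 'lunar atlas' would cost 50 units; proceed?
```

Both branches return control to the user rather than a document, which is the essence of the negotiation: the system answers with a question whenever the request is either under-specified or costly.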

The second fundamental service is processing of retrieved information with the aid of computer programs. It is essential to the concept of the on-line intellectual community that there exists, within the on-line system, an extensive library of computer programs that can be called upon by any user, specialized (through commands given in a convenient language) to meet his immediate purposes, and directed upon any body or bodies of information he cares to name. If he knows how to write computer programs, the user can add to the publicly available armamentarium and process information in ways of his own particular choosing. If he commands the services of a programmer, the user can have programs custom-made. By and large, however, users of the system here envisioned are able to retrieve, from the library of programs, all the processes and procedures that they need, and, from the library of documents and data, all the information that they need to process. For example, in filling out an automobile registration form, one might ask the system to divide the nominal curb weight of his 1959 Mercedes 190 Sedan by 100, round to the nearest integer, and then multiply the result by $1.00. The system would "know" the weight of the automobile and therefore be able quickly to give him the registration fee. However, the system would also know the formula for computing the fee, and, indeed, it might very well know everything, both fact and procedure, required to fill out the form. But then, of course, there should be no form, for all the information would be "in the system", and there would be no need to disturb the automobile owner at all. This line of discussion can go on to embrace bank accounts, credit, and other non-intellectual aspects of life. For the present purpose, however, it suffices to illustrate the projected capability of the system to apply pre-defined procedures to information contained within its stores.
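
The registration-fee example can be worked through directly. In the Python sketch below, the formula is the one given in the text; the curb weight is an assumed figure supplied for illustration:

```python
# The fee example as a stored fact plus a stored procedure:
# divide curb weight by 100, round to the nearest integer, times $1.00.

CURB_WEIGHTS = {"1959 Mercedes 190 Sedan": 2640}  # pounds; illustrative value

def registration_fee(car):
    weight = CURB_WEIGHTS[car]            # fact retrieved from the store
    return round(weight / 100) * 1.00     # procedure applied to the fact

print(registration_fee("1959 Mercedes 190 Sedan"))  # → 26.0
```

The point of the paragraph survives the arithmetic: once both the fact (the weight) and the procedure (the formula) live in the system, the form itself becomes unnecessary.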

A third fundamental service deals with the display of information to the user. This service, also, is controlled by a language, designed to be natural enough for the user and formal enough for the computer, that deals with the entities to be displayed, with the selection of display devices, and with the specification of formats. Through this language (or sub-language, as it should be called, since it can be embedded in other languages) he can have alpha-numeric text presented to him as soft (ephemeral) copy on a special alpha-numeric display or typed out for him on a printer, and he can call for graphs, diagrams, sketches, and the like, mixed with alpha-numeric text, on various display screens, and have the information captured photographically or xerographically for later reference.

The fourth and final basic service is control. The user controls the system, or addresses requests to it, through a few familiar devices: a keyboard, a pen-like stylus, a microphone, and a small assortment of buttons and switches. Through those devices, he can communicate in strings of alpha-numeric characters, by pointing, by writing clearly, by sketching or drawing, and by speaking distinctly in a limited vocabulary. His communication with the system is carried out, as suggested earlier, in languages that are somewhat more constrained and formal than the open, natural language of everyday speech among men.

Built upon the basis of the four services just described are many derivative services. It will have to suffice to mention only a few of them, and briefly. There are arrangements, patterned after the Sketchpad programs developed at the MIT Lincoln Laboratory, that facilitate the design of structures and devices. There are programs to facilitate the preparation and editing of text. There are programs to facilitate the preparation, editing, testing, modification, and documentation of computer programs. There are special-purpose languages, together with facilities for carrying out instructions given in the languages, for modeling or simulating complex processes. (The modeled processes may then be set into action and viewed in operation on the display screens.) There are arrangements to facilitate communication among members of the on-line community: arrangements for viewing the same dynamic model on screens at different consoles, for authorizing access to otherwise private files, for merging texts and pooling data, and the like. There are many courses and many techniques of computer-assisted instruction, including some that instruct the user in the operation of the on-line system and in the preparation of computer programs in various procedure-oriented and problem-oriented languages. Indeed, there are even programs that will play chess, checkers, and Kalah, for example, with anyone who challenges them; and some of the game-playing programs will play at any specified level of mastery and with any specified style.

The foregoing description of the system's possibilities is not complete. There are many other services, such as programs for statistical analysis of data, programs to facilitate conducting experiments through the system, and programs that conduct tests of all kinds. The description is based upon something more than free invention of desiderata, for each of the services mentioned (except possibly the game-playing service, as elaborated) is an achievable extension of programs that already exist in demonstrable form in one computer system or another. It would be much better, if it were possible, to substitute demonstration (even of primitive precursors that are in operation now) for this description, because many of the demonstrations are quite striking and quite clearly extrapolable to services of the kind described here, whereas the description in mere words may sound as much like science fiction as like a reasonable projection.

Be that as it may, the members of the on-line intellectual community work in close partnership with the system — with the computer(s) and the information base(s) — in almost all their work, whether it be formulative thinking, or experimentation involving the control of apparatus, or teaching, or learning, or any of the other things in the list of their activities. Many of the members of the community are skilled in the art of computer programming and fluent in a number of programming languages. These people contribute in an important way to the improvement or extension of the system: whenever, in the course of their work, they come to points at which the existing facilities are less than satisfactory, they prepare new procedures to fulfill the required functions or to meet the new circumstances. In that way, they add to the processing capabilities of the system. Other members of the community, not given to programming, may nevertheless add materially to the capability of the system; they do so by introducing new facts, new data, and new documents into the store.

The system is augmented not only through the contributions of its users, of course, but also through the contributions of full-time organizers, programmers, and maintainers of the system. The contributions of the system professionals were greatest during the early years of the development of the system. During the later years, the fact that the substantively oriented users predominate so greatly in sheer number offsets the greater concentration and, on the whole, greater skill of the professionals. In many instances, however, it is difficult to distinguish clearly between the contributions of the substantively oriented users and the contributions of the system professionals, for the professionals monitor the contributions of the users and often modify substantially, and usually polish, the techniques and programs and the sets of data that are offered to the public files.

The functions of the network that are of greatest relevance to Project Intrex are, of course, those that involve contribution to and consultation of the stored record. Let us examine briefly a few such functions.

Phillips, an experimenter in the psychology laboratory, is conducting an experiment on manual control. A subject, seated before a display screen, is trying to keep a stylus in contact with a point of light that moves about the screen. The trajectory of the point of light is calculated and controlled by a computer. The subject's response is recorded and analyzed by the computer, and the results are displayed graphically to the experimenter. The experimenter guides the execution of the experiment and controls the analysis of the data.

In the memory of the computer are three dynamic models of the manual-control process, three competing theories of what goes on when a man tries to keep a stylus on a moving target. Two of the models were retrieved through the network; the other was formulated by the experimenter at his console. The same target motion that is being displayed to the human subject is being fed to the three models. The responses of the models are recorded and analyzed in parallel with the response of the human subject. The computer adjusts the parameters of each model to maximize the correspondence between the model's behavior and the subject's. The computer keeps records of the degrees of correspondence achieved and displays the situation to the experimenter. Seeing the shortcomings of the models under test, the experimenter devises another model and tests it with the same target courses.
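The parameter-adjustment step just described can be sketched in modern terms as a simple fitting loop. The first-order lag model, the gain grid, and the squared-error measure below are illustrative assumptions, not details given in the report.

```python
# Sketch of the fitting procedure described above: the computer adjusts a
# model parameter to maximize the correspondence (here, minimize squared
# error) between the model's output and the subject's recorded response.
# The first-order lag model and the gain grid are illustrative assumptions.

def lag_model(target, gain):
    """A toy manual-control model: the tracked position moves a fixed
    fraction of the remaining distance toward the target each step."""
    position, track = 0.0, []
    for t in target:
        position += gain * (t - position)
        track.append(position)
    return track

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_gain(target, subject_response, gains):
    """Grid search: return the gain whose model output best matches
    the subject's response on the same target course."""
    return min(gains, key=lambda g: squared_error(lag_model(target, g),
                                                  subject_response))

# A step-input target course and a "subject" who behaves like a lag
# with gain 0.3; the search recovers that gain from the response alone.
target = [1.0] * 20
subject = lag_model(target, 0.3)
best = fit_gain(target, subject, [g / 10 for g in range(1, 10)])
```

A real system would fit several parameters per model and report the residual correspondence to the experimenter, but the structure — simulate, compare, adjust — is the same.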

At the end of the session, the experimenter remains at his console to examine, organize, document, and file the records. After several such sessions, he assembles the entire accumulation of records, devoting particular attention to ensuring their retrievability and their convertibility from static to dynamic form. As a precaution, he exercises each of the models with test inputs and enters the parameters of the models into the parameter tables. Then he calls a type-and-edit program and prepares a first draft of a report.

The report is a document, of course, complete with references and figures. The writer does not have to type out the references, for they are "in the system" — he merely points to the ones he wants and to the places in the text to which they are relevant, and the computer captures them and numbers (and, when a new reference is added, re-numbers) them. Some of the figures are dynamic. When the models, for example, are displayed on the screen, their input signals flow and the models "behave". Most figures have several forms. The reader can select detailed, real-time behavior or summary statistics. In some instances, the information contained in the figure is displayed from store. In other instances, a generating function is stored, and the figure is displayed from concurrent calculation.

The report is typed just once — when the author writes the initial draft. Indeed, in writing the initial draft, he employs some material that he prepared earlier for use in the experiment itself. He does not re-type it; he has the computer copy it, and perhaps he edits it a bit. In any event, a thing is typed just once and thereafter only modified. Because editing with the aid of the editing program is quick and easy, and because the current approach to computer "understanding" of natural language demands greater excellence of style and more rigorous adherence to stated conventions than human readers do, important articles are revised and re-revised.

In the on-line community, publication is a multi-stage process. Even while a manuscript is in preparation, it can be as accessible to on-line colleagues as the author cares to make it.

As soon as the manuscript is completed, it is available for retrieval with the aid of tags the author has tentatively attached to it. (Usually the author consults through the system with a documentation expert during the final phase of preparing the manuscript.) If the manuscript is read (and "operated") by several members of the community, it is likely to pick up attachments: comments affixed to it by readers. The author may take them into account when he revises his manuscript, inserting "credit pointers" at the appropriate places in the text. (Credit pointers, attachments, and various other interactive paraphernalia become visible only in certain modes of display. The reader can call for them or inhibit them, whichever he likes.)

When it is ready for more formal publication, the author may submit his manuscript to any journal that operates within the network. (The author could submit it simultaneously to several journals, but, even if he were to try to hide his skullduggery by putting a separate copy into the store for each submission, a monitor program would probably find him out and attach all the submission records to each copy.) Editors use the network in their communications with reviewers, and that speeds up the review process.

In addition to its human reviewers, each journal uses a reviewing program that checks that all matters of format are in order. The program is most insistent that the author include suggested retrieval tags. Usually, the author interacts with the review program on-line. He may use it in a programmed-instruction mode to find out why it complains about a particular "period-quotes-closed" punctuation. He may find that the review program has looked up the quotation and discovered that the original version does not end in a period. Most journals that operate within the network have long since adopted stylistic conventions designed to preserve information — rather than, for example, to facilitate typesetting — and, in the case of the period and the quotation mark, that meant adopting the British practice.
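A reviewing program of the kind described might be sketched as follows; the checks and the manuscript record are illustrative assumptions, not a description of any program that existed.

```python
# Sketch of a journal's automatic reviewing program as described above: it
# checks matters of format, insisting on suggested retrieval tags and on
# the "period outside closing quotes" (British) convention. The checks and
# the manuscript record are illustrative assumptions.
import re

def review(manuscript):
    """Return a list of complaints; an empty list means format is in order."""
    complaints = []
    if not manuscript.get("retrieval_tags"):
        complaints.append("Include suggested retrieval tags.")
    # Flag '."' sequences: under the British practice the period belongs
    # outside the closing quotation mark unless it ends the quotation.
    for m in re.finditer(r'\."', manuscript["text"]):
        complaints.append("Period-quotes-closed punctuation at offset %d."
                          % m.start())
    return complaints

draft = {"text": 'He called it "the store." Later he revised it.',
         "retrieval_tags": []}
issues = review(draft)
```

The author would see both complaints at his console: one for the missing tags and one pointing at the offending punctuation.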

Publication in a good journal within the network carries some guarantees of accessibility. Recent issues of Physical Review, for example, are never more remote than the third echelon of the local store. Publication in a good journal also ensures that the information contained in the document will be processed by one of the groups engaged in reorganizing the body of knowledge — that the information will be introduced into one of the new computer-processible information structures that promise in a few years to displace natural-language text as the main extra-neural carrier. Finally, publication in a good on-line journal guarantees publication and distribution in print. That is still important because the on-line network is not yet worldwide and because print-and-paper books still have advantages over consoles for a few non-negligible purposes.

The report is not stored all in one place. The title, abstract,
references, etc., are held in a more readily accessible file
than the body. Keyed to the body (and to some of the figures)
are sets of data. The sets of data are stored in a data bank.
Whereas data used to be relegated to vaults — preserved on the
off-chance that some scholarly skeptic might re-examine the
experiment in detail — now that there are means for working
with data effectively, for analyzing or summarizing them in
new ways without going through hours of drudgery, stored
data are as often retrieved and examined as are stored
documents.
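The split storage just described can be sketched as a small layered store: catalog-level fields in a readily accessible file, the body in a slower store, and data sets keyed to the body in a data bank. The record layout and identifiers below are illustrative assumptions.

```python
# Sketch of the split storage described above: the readily accessible file
# holds title, abstract, and references; the body and the keyed data sets
# live in slower stores. The record layout is an illustrative assumption.

fast_file = {
    "phillips-1975": {"title": "Manual Control of a Moving Target",
                      "abstract": "Three dynamic models compared...",
                      "references": ["bush-1945"]},
}
body_store = {"phillips-1975": "Full text of the report..."}
data_bank = {"phillips-1975/run-1": [0.12, 0.09, 0.11]}  # keyed to the body

def retrieve(article_id, level):
    """Fetch an article at increasing depth: catalog entry, body, or the
    data sets keyed to it."""
    if level == "catalog":
        return fast_file[article_id]
    if level == "body":
        return body_store[article_id]
    return {k: v for k, v in data_bank.items()
            if k.startswith(article_id + "/")}

entry = retrieve("phillips-1975", "catalog")
```

Most consultations stop at the catalog level; only when the reader wants the body or the underlying data does the system reach into the slower stores.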

The information contributed by the author to whose manuscript we have been directing our attention can be retrieved in several forms: document, set of data, dynamic model, answers to questions. Associated with each article is a description telling what things are available and in what forms. Part of the description is for potential users; it gives them the over-all picture. Part is for computers; it provides detailed instructions for operation of models, retrieval of associated data, and the like.

Phillips's article on manual control has now been published in (on-line) "Human Factors". In Los Angeles, Dennis and Fry are writing a general on-line review of manual control and, in St. Louis, Rodehafer is conducting an experiment on manual control. Dennis and Fry have long since found and examined Phillips's article, of course; it was retrieved by their very first prescription. The way they incorporate Phillips's material into their review is worth examining briefly. They do not "rehash" it. They do not copy figures. Their introduction to Phillips's work sets it into perspective, defining its relation to other work in the field. Almost every sentence of the introduction contains pointers which, when activated, lead through the retrieval system to the relevant work. Their discussion of the work itself is organized around several basic theoretical questions. The discussion is full of pointers, too. Some of them lead to theoretical articles. Others lead to the figures and tables of Phillips's article. There is no need to copy those items and store them as part of the review article, for they are already in the memory of the network and readily recoverable from it. The review article is therefore in large part an associative structure of the kind Vannevar Bush envisaged in his 1945 description of "Memex".
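The pointer mechanism just described can be sketched as a tiny data structure: review sentences carry identifiers of stored items rather than copies of them, and activating a pointer resolves it through the network's shared store. All names and contents below are illustrative assumptions.

```python
# Sketch of the associative structure described above: the review stores
# pointers (identifiers) into the network's memory rather than copies of
# Phillips's figures. The store contents and identifiers are illustrative.

network_store = {
    "phillips-1975-fig3": "Tracking error vs. target speed (figure)",
    "phillips-1975-table1": "Fitted model parameters (table)",
    "theory-pursuit-model": "Pursuit-tracking model (article)",
}

review_sentence = {
    "text": "Phillips's data favor the pursuit model at low target speeds.",
    "pointers": ["phillips-1975-fig3", "theory-pursuit-model"],
}

def activate(pointer_id, store):
    """Follow a pointer through the retrieval system to the stored item."""
    return store[pointer_id]

items = [activate(p, network_store) for p in review_sentence["pointers"]]
```

Because the review holds only identifiers, the figures it "contains" are always the network's single authoritative copies, never stale duplicates.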

In their review, Dennis and Fry mention Rodehafer's on-going experiment. They found out about it by following retrieval trails recorded within the system. (Each user has the option of leaving his trails open to inspection or covering them up. A tradition of "open trails" is developing within the academic community, whereas many business and industrial firms follow a policy intended to protect proprietary interests.) The review mentions the unfinished experiment but does not, of course, draw any conclusions from it. However, Dennis and Fry file a "suspense" message that will call to their attention the completion of Rodehafer's report. They do that because an on-line review article is not a one-shot effort. The reviewers have accepted a commitment to keep their review up-to-date for a period of five years.

Rodehafer, meanwhile, is using parts of the in-process review. He has become acquainted with Fry through the system. He disagrees with Fry's method of analyzing a basic theoretical problem. He is expanding his experiment to include a comparison of Fry's version of a pursuit model and his own.

The discussion between Fry and Rodehafer leads Fry to re-learn some Laplace transform theory. He studies it partly with the aid of programmed instruction, partly by reading retrieved texts, and partly by asking questions. At one point, for example, Fry asks:

What is the rule or law: stability, servomechanism, locus, origin, Nyquist diagram or plot?

The question-answering system has a bit of trouble with that. It has to ask a question in reply:

Does the rule or law deal with: complex plane, Nyquist criterion?

Fry says that it does, that "Nyquist criterion" is the phrase he was trying to think of. The question-answering system then tells him that the Nyquist criterion is 10 years out of date, that the modern criteria are given by the root-locus method. The system then presents instructional material on the root-locus method. Fry is brought up-to-date and, incidentally, into better agreement with Rodehafer's analysis, which had been based on current concepts. Fry learns to use the system's on-line root-locus programs. He introduces Dennis to them. The review is saved from prenatal obsolescence.
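What an on-line root-locus program computes can be sketched briefly: the closed-loop poles of a feedback loop as the gain varies. The plant below, G(s) = 1/(s(s+2)) in a unity-feedback loop, is an illustrative assumption; its closed-loop characteristic polynomial is s² + 2s + K.

```python
# A minimal sketch of what an on-line root-locus program computes: the
# closed-loop poles of a unity-feedback loop as the gain K varies. The
# plant G(s) = 1/(s(s+2)) is an illustrative assumption; its closed-loop
# characteristic polynomial is s^2 + 2s + K.
import cmath

def closed_loop_poles(K):
    """Roots of s^2 + 2s + K = 0, i.e. the root-locus points at gain K."""
    disc = cmath.sqrt(4 - 4 * K)
    return [(-2 + disc) / 2, (-2 - disc) / 2]

# Tracing the locus: the poles start at the open-loop poles (0 and -2)
# as K -> 0, meet at -1 for K = 1, then leave the real axis vertically.
locus = {K: closed_loop_poles(K) for K in (0.0, 1.0, 2.0)}
```

For this plant every pole has a negative real part for all K > 0, which is the kind of stability conclusion the method delivers at a glance, where the Nyquist criterion requires a separate frequency-response plot.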

The foregoing example mixes content appropriate to 1965 with question-answering capability not available in 1965. However, the postulated question-answering capability is far from sophisticated. It assumes only a very simple syntax and nothing more advanced than lexicographic semantics. The services available to the on-line intellectual community of the 1980's will be considerably more complex and sophisticated than the services we have suggested here.
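The "lexicographic semantics" the example assumes can be sketched as simple word overlap between a query and the index terms of stored entries; the tiny store, the tie rule, and the thresholds below are illustrative assumptions.

```python
# Sketch of the question-answering behavior postulated above: no real
# syntax analysis, just lexicographic matching of query words against the
# index terms of stored entries. The store is an illustrative assumption;
# the tie rule reproduces the system's clarifying counter-question.

store = {
    "Nyquist criterion": {"stability", "servomechanism", "nyquist",
                          "diagram", "plot", "complex", "plane"},
    "root-locus method": {"stability", "servomechanism", "locus",
                          "origin", "gain", "poles"},
}

def answer(query):
    """Rank entries by shared words; if the two best entries are nearly
    tied, reply with a clarifying question instead of an answer."""
    cleaned = query.lower().replace(",", " ").replace(":", " ").replace("?", " ")
    words = set(cleaned.split())
    ranked = sorted(store, key=lambda k: len(words & store[k]), reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if len(words & store[best]) - len(words & store[runner_up]) < 2:
        return "Does the rule or law deal with: " + best + "?"
    return best

reply = answer("What is the rule or law: stability, servomechanism, "
               "locus, origin, Nyquist diagram or plot?")
```

On Fry's question the two entries score 5 and 4 shared words, so the sketch, like the system in the example, answers with a question of its own.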

Complexity and sophistication, however, are not crucial to the main theme of the concept. The essentials are:

The "on-lineness" — the quick responsiveness that permits integration into the on-going line of thought.
The breadth of the spectrum of the services.
The fact that the services supplement the capabilities of men.
The fact that the network promotes coherent interaction among the members of the intellectual community.

THE INFLUENCE OF THE CONCEPT

The idealized concept of the on-line intellectual community is by no means a goal of Project Intrex. If it were a goal, it would certainly not be achievable on the time scale that has been laid down for the Project. The idealized concept is, like the Project MAC experience, an influence that has played a role in shaping the program of research that is recommended for Project Intrex to carry out. Whereas the MAC experience served as a base from which to project, the idealized concept of the on-line intellectual community serves as a check-list of attractive features, many of them not achievable on the Intrex time scale, but all of them worth examining for relevance and feasibility.

The single consideration that most severely inhibited the tendency to reach as far as possible toward the idealized concept in the planning for Project Intrex was the difficulty of converting a truly large body of information from the presently modal form of print-on-paper into the digital code required for computer processing. There was some spread of opinion within the Planning Conference as to the extent to which that difficulty will be overcome during the next few years, but there was general agreement that it would not be feasible to convert all the contents of all the documents in the MIT libraries, or even all the contents of all the documents that would be required for a model library capable of supporting significant experiments. The conclusion was that every effort should be made to capture newly generated information directly in computer-processible form. Much of it exists in that form at one point or another during the process of preparation, editing, typesetting, and printing, and almost all of it — all that is typewritten — could be gotten into computer-processible form with only a little additional expenditure of time and facilities. Therefore, new information should go directly into the digital store. All the information in the apparatus of bibliographic control must be converted into computer-processible form, even though the conversion may require considerable labor. As for the contents of the documents that are already in the library, however, large-scale conversion at the present time does not seem practicable. It seems better to wait for the development of effective and economical character-reading machines — and perhaps even for devices and programs capable of encoding line drawings, if not pictures — before pressing forward to complete the task of conversion. That set of conclusions severely limits the distance Project Intrex can go toward realizing a computer system of the kind required to support a full-fledged, on-line intellectual community.

The other major restraining factor is the primitive state of understanding of natural language. Men appear to communicate with one another in natural language rather well, but even the most sophisticated linguists do not fully understand the syntax; and there is almost no semantic theory at all that is capable of supporting engineering applications. It is evident that, although linguistics is recognized as an important and challenging field, and although it is the focus of much activity, the basic problems will not be solved to the point of engineering applications on the time scale that has been set for Project Intrex and the operational information network to which it is to lead. This consideration adds its weight to that of the text-conversion difficulty in limiting the use of the on-line intellectual community as a model.

Despite these constraining effects, the concept of the on-line intellectual community remains a strong influence on Project Intrex. The concept has registered itself upon many creative minds in the Cambridge area, and too forcefully upon those minds not to shape the Intrex program in many ways. Enough members of the Cambridge intellectual community want to be members of an on-line intellectual community to keep the idea from lying dormant. Therefore, even though the on-line intellectual community will not come into existence as a result of Project Intrex, it will become a nearer prospect, and advances made during the 1970's might well remove the constraints and permit, during the 1980's, an essential realization of the concept.

CHAPTER V

THE INFORMATION TRANSFER SYSTEM AT MIT IN 1975

Three main streams of progress in the information transfer field were intensively discussed at the Project Intrex Planning Conference and have been described in Chapters II, III, and IV:

The modernization of current library procedures through the application of technical advances in data processing, textual storage, and reproduction;
The growth, largely under Federal sponsorship, of a national network of libraries and other information centers;
The extension of the rapidly developing technology of on-line, interactive computer communities into the domains of the library and other information centers.

The information transfer system that we visualize at MIT in 1975 is a forward projection of these developments, based on the assumption that Project Intrex will aim to merge them into a balanced combination.

In attempting a description of the principal features of the system of 1975, we have been mindful of the universal tendency to overestimate what can be done in a single year, and to underestimate what can be done in ten. Thus our ten-year leap into the future of information transfer may have the appearance of an overly bold flight of fancy. We present it with diffidence and we caution the reader against confusing it with MIT policy. But we also point out to the reader that a factual description of today's computer operations at MIT would have seemed fantastic to him in 1955.

A CENTRAL COMPUTER

The flow of information in the system of 1975 will be controlled by means of a time-shared digital computer. Sharp distinctions must be drawn among at least four different ways in which computers are now being used and certainly will be used by the university of 1975. This is necessary for two reasons. In the first place, we wish to set limits to the fields of interest to which Project Intrex will address itself. Secondly, an early discussion is required to avoid confusion on the extent to which these different applications can or should share common computational facilities, either during the development phase or operationally.

Four applications which are already clearly distinguishable are related to (1) the needs of the universities for computational and data-handling facilities in the conduct of their organizational business, such as payrolls, scheduling classes, and so forth; (2) the computational needs of the members of the university community in the pursuit of their intellectual endeavors; (3) the use of computers for what has now come to be called computer-aided instruction (CAI); and, finally, (4) the use of the computer for information retrieval. It is, of course, this last category that is the proper concern of Project Intrex, although the storage and retrieval of the text used for CAI can probably also be considered a part of the Intrex responsibility. We further note that all four of these applications can and will make use of computers operating in a time-sharing mode and that, in principle, they all could share a single common facility. While this might seem to be a desirable state of affairs in the interest of economy, it seems highly unlikely that all four different groups of users could work together satisfactorily, particularly in the early, developmental stages.

THE INFORMATION STORE


A university has access to a great many sources of information. Much of the information is recorded in books, pamphlets, and so forth, and resides in the present-day library. Some of the information is retained in the minds of the community itself, and is communicated to the student body in the normal teaching processes, and to colleagues via seminars and informal discussion. The information transfer system of 1975 might have provisions for tapping this latter source of knowledge, just as it obviously must have facilities for tapping the information stored in the library.

In predicting the form in which the "recorded" portion of the information will be stored, we have assumed that by 1975 we shall be roughly half-way in the transition from the library of the present time to a completely on-line intellectual community as described in an earlier chapter, at least so far as the science and engineering holdings are concerned; that is, approximately half the scientific information actually transmitted to the user would be stored in books or on microfilm, and half in some computer-accessible form. A word of caution: We are talking here not about the total information available but about that portion of the information that the user actually makes use of. Obviously, the heritage of the existing library will not be easily duplicated or replaced, and most of the archival type of information will still exist in printed form.

A large portion of library information will be available in image microform. This main image store will be accessed by address only. Microform copies of equal size or of larger format (for inexpensive viewers) and hard copy will be available locally within seconds on demand. Although local viewing of the microform may possibly be done by mechanical transport of the microform, remote viewing will be possible by high-resolution facsimile or CRT display.

Certainly, we are not going to take the present library holdings and transfer them en masse into a machine-readable form by 1975, if, in fact, this is ever done. Much of the material that is created between now and 1975 will be produced in hard-copy form. However, it is not at all unreasonable to assume that the most often-used portion of this information will either be initially produced in machine-readable form or be converted into machine-readable form, and will be available to an on-line intellectual community through terminals of some sort. And since a large part of the scientific community is concerned with recent information rather than archival information, we will assume that up to 50% of the active information will be stored, transmitted, or reproduced in coded form.

This coded material will consist of source documents in the present-day sense and of derived material in the form of card-catalog information, abstracts, indices, bibliographies, concordances, critical reviews, summary information, and condensations. There is, of course, a present-day tendency to consider such information as being distinct from the documents themselves, but we will assume that by 1975 the amount of such material will have grown disproportionately and that it will no longer be considered separately. We can anticipate that it will be necessary to index, abstract, and condense the normal indexing and abstracting documents themselves.

ACCESS TECHNIQUES

Physical Aspects

In the information transfer system of 1975, the user will have a choice of means by which he can obtain access to the information stored in the system. It is highly unlikely that he will borrow books from the library as he does at present. If he needs the actual document itself, he will obtain a copy of it. This copy may be prepared by the publisher in the usual fashion, and we can think of the bookstore and the library as having coalesced, at least in the university community. If the desired document is out of print at the time of the request, the library will duplicate it by one of the many duplicating facilities that it will then possess. We are, of course, assuming that progress will have been made toward resolving the legal aspects of the situation, and toward developing methods of fairly reimbursing the authors and publishers.

Because of the difficulties of handling books of different sizes, one might think that the leading libraries will have standardized on a fixed book size, both for storage and for distribution. But this could have been done with profit at any time within the last 300 years, and it seems that a general law may be at work, the consequence of which is that it is easier to introduce a distinctly new system than to modify slightly an old one. We have little confidence that the actual format of ordinary books will be very much modified by 1975. Books for storage purposes could be produced with high-quality materials which would no longer be subject to the high deterioration rates which affect our present materials.

As a second type of service, we will assume that many users will want to read specific pages of a given document but will not have any need for possession of the actual document. These users will be served through a variety of consoles with cathode-ray-tube or other methods of presentation of page material. It seems reasonable to assume that the costs of such consoles will still be quite high and that they will not have wide distribution. However, in a community the size of MIT, it would not be unreasonable to assume that there may be as many as 50 to 200 such terminals scattered around the campus at strategically located positions, in branch libraries or student reading rooms and places of that sort.

A third class of service to be provided will be that of producing hard copy by typewriters or printers (mechanical and non-mechanical) at remote locations, quite analogous to the typewriter output now obtainable through the MAC computer, but at a higher output speed. Again, making an estimate as to the magnitude of this service, it would seem not unreasonable to assume that substantially every faculty member would have such a terminal available to him and that there might be between 100 and 1000 available to the student body.

A fourth form of service will be through the medium of terminals designed primarily for CAI use. This service will resemble that provided by the third class of terminals, with, however, serious restrictions on the speed of the output printer, dictated by cost considerations as noted earlier. There may be 1000 of these terminals in use at MIT by 1975.

And as a final form of output, we will certainly have available a Touch-Tone push-button input and voice-answer-back system. By 1975 we can expect that each student will have a telephone available either on his own study desk or, at worst, shared with another student in his dormitory room.

In effect, access by telephone and by any of the other terminals would be independent of geographical location, and there would be as many terminals on the information transfer system as there are telephones.

Intellectual Aspects
We have talked about access in terms of physical devices.
Now let us turn our attention to the intellectual aspects of
providing access to stored information.

In the information transfer system of 1975, an augmented catalog (author, subject, title, table of contents, abstract, citations, etc.) will be available via the computer in machine-readable form. (See Chap. VII.)

In addition, the Touch-Tone telephone with voice-answer-back could, of course, be used for obtaining certain types of catalog information. It is not unreasonable to assume that by 1975 a fairly elaborate voice-answer-back technique will have been developed, based on the "twenty-questions" idea, so that the input required of the user will be minimized and limited to occasional pushing of one of the ten buttons that will then exist on many telephones.
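A "twenty-questions" voice-answer-back dialogue of this kind amounts to walking a decision tree with button presses: the system reads out numbered choices, and the user narrows the search by pressing a digit at each step. The menu tree and its labels below are illustrative assumptions.

```python
# Sketch of a "twenty-questions" voice-answer-back dialogue: the system
# reads out numbered choices and the user answers by pressing one of the
# telephone's ten buttons, narrowing the catalog request at each step.
# The menu tree below is an illustrative assumption.

menu_tree = {
    "prompt": "Press 1 for author search, 2 for subject search.",
    "choices": {
        1: {"prompt": "Press 1 for science, 2 for engineering.",
            "choices": {1: "science author file",
                        2: "engineering author file"}},
        2: "subject catalog",
    },
}

def dialogue(node, presses):
    """Walk the menu tree with a sequence of button presses; return the
    catalog file finally reached (or the next spoken prompt)."""
    for p in presses:
        node = node["choices"][p]
        if not isinstance(node, dict):
            return node
    return node["prompt"]

result = dialogue(menu_tree, [1, 2])
```

Ten choices per level narrow a large catalog quickly: even a modest tree of a few levels distinguishes thousands of destinations with a handful of presses.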

This same system would, of course, be employed extensively in instructional use. We can envision a situation by 1975 in which all lectures are stored, at least temporarily, so that the student who has missed a lecture in person can have it repeated to him over the telephone. Implied in all this, although not stated so far, is the assumption that this telephone answer-back service — and, indeed, all the mechanized aspects of the library — will be available on a 24-hour basis, so that the student at any time during the preparation of his next assignment could refresh his memory as to the exact happenings in the class the day before.

The typewriter output service which now bulks so large in on-line systems such as Project MAC will probably, by 1975, have ceased to have so large a significance, although here again it may well be that very rapid printers, facsimile printers, and the like will have been developed by that time, in which case this service will still be important. This service will be used to obtain extracts of documents, abstracts, condensations, and material which requires detailed study on the part of the student. The absence of effective graphics will, however, be a serious drawback as of 1975 unless major developments are made which cannot at present be fully anticipated.

Turning now to the most elaborate form of terminal which, as
indicated earlier, will probably contain both typewriter
input and rapid output, and a cathode-ray tube display with a
light pen or some other graphic input device, we note there
are essentially no limits to the intellectual scope of the
activity that can be carried on by means of these terminals.
The user will be able to have displayed, on his scope,
catalog information, extract information, or portions of
complete documents, depending on his needs at the moment.

We might pause here to note that the same terminal facilities
that provide access to these various classes of services
should also provide access to computational facilities as now
provided by Project MAC or by the Computation Center at MIT,
and that this dual function should continue and undoubtedly
be much enlarged by 1975. We might well require the user to
dial a separate number for extensive computational services,
since it may be unreasonable to require the
information-retrieval computer to handle the transcription of
the input and output between the user and the computational
facilities.

From what has been said, it is evident that we anticipate a
situation in 1975 in which the on-line intellectual community
will, in effect, have come into existence, though with
terminals still inadequate for economic reasons. It is also
highly unlikely that the system will be able to serve all the
potential users or that it will contain a completely adequate
corpus of information in machine-readable form, but it will
be a start. This would provide the evolutionary aspect of
the system, enabling it to expand to provide all the desired
services at some future time.

The conventional library as a storehouse for books would, of
course, continue to exist and to be extremely useful to the
user of 1975, although it seems possible that, by this time,
character-reading equipment may have been developed such that
most archival information could be translated on demand into
machine-readable form. However, we will assume that this
would be done not on a routine basis but only on request.

The routine operational procedure in the book or
document-storage aspect of the complete information transfer
system would consist largely of making and distributing
reproductions, either page reproductions or entire-volume
reproductions, to requesting users. These reproductions
would, however, not necessarily be full-size copies. It is
uncertain at the present time what these forms will
ultimately be and what the mechanism of their enlargement for
ultimate use will be but, again, it seems reasonable to
assume that we will be in a transition stage, with most forms
of service presently available, or conceptually available
today, in rather widespread use. Many users will desire that
documents be delivered to them in their full size. For
storage reasons, the library will, however, probably not save
all source documents in their full size but will be well
along in an orderly process of converting a large share of
source material into microforms. The user who then requests
full-size copies will have the enlargements made for him at
the library, and they will be delivered to him in this form.

Other users, of course, will want to retain a larger volume
of material. These users will ask for the material to be
delivered to them in a microform package and will then have
locally available enlarging equipment for viewing.

The viewing equipment of 1975 will undoubtedly provide for
rapid page-turning of microdocuments in such a fashion that
the user will be able to thumb through a book, in much the
fashion that he now thumbs through a physical book, and to
make use of all the quick-scanning procedures that users
employ today.

SELECTIVE DISSEMINATION
Many of the difficulties which currently plague selective
dissemination systems will have been obviated by 1975. We
can therefore expect with confidence that there will be at
least two forms of selective dissemination for which the
system must make provision. The simplest form of such a
system will consist of notifications sent to the user
whenever there are new acquisitions that fit his profile.
However, we can predict that, for a limited class of users,
the actual documents will be sent on arrival as soon as the
profile match has been observed. The systems, at least the
more advanced systems, will undoubtedly provide for machine
determination of the reader's profile based on his initial
assessment of his interests, as modified by his actual use of
the library system. From one really extreme point of view, a
request by a user to the library for a specific document
indicates either that the user's profile has changed with
time or that the system is not functioning properly.
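The two forms of dissemination, and the machine adjustment of a reader's profile from his actual use, might be sketched as follows. The term sets, thresholds, and matching rule are illustrative assumptions, not anything specified by the Conference.

```python
NOTIFY_THRESHOLD = 1    # at least one matching term: send a notification
DELIVER_THRESHOLD = 3   # strong match: send the document itself on arrival

def match_score(profile, document_terms):
    """Number of index terms the document shares with the user's profile."""
    return len(profile & set(document_terms))

def disseminate(profile, document_terms):
    """Decide what, if anything, a new acquisition triggers for this user."""
    score = match_score(profile, document_terms)
    if score >= DELIVER_THRESHOLD:
        return "deliver document"
    if score >= NOTIFY_THRESHOLD:
        return "send notification"
    return "no action"

def update_profile(profile, requested_terms):
    """A specific request outside the profile signals that the user's
    interests have shifted, so fold its terms into the profile."""
    return profile | set(requested_terms)

# Usage: an initial self-assessment, refined by an out-of-profile request.
profile = {"microform", "cataloging", "retrieval"}
print(disseminate(profile, ["retrieval", "indexing"]))    # send notification
profile = update_profile(profile, ["time-sharing"])
print(disseminate(profile,
                  ["time-sharing", "retrieval",
                   "cataloging", "display"]))             # deliver document
```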

NEW PUBLISHING PATTERNS

The editing programs and typesetting programs that have been
implemented at various places on a time-sharing computer will
have very great impact on the entire publishing and library
field. Using the report-editing facilities which will be
available, many of the MIT staff will undoubtedly use the
on-line facilities as a means for first recording their
potentially publishable information; and, in fact, potential
books may go through several editions before ever appearing
between hard covers, being available only to the users of the
system via the terminals.

The development of automatic library systems will undoubtedly
lead to rather profound changes in the methods by which
documents are prepared for library use. In all probability,
by 1975 the major publishers will be able to supply copies of
most of their publications in a microform, and it is highly
likely that this form will replace book acquisition, as such,
to a very large measure. Many publications, handbooks and
the like, might even be distributed in magnetic form for use
in data-retrieval systems rather than in document-retrieval
systems, and be, therefore, available to the library of the
future in this form.

Changes in periodicals may also be rather profound. It is
possible that periodicals will publish reprints rather than
complete bound journals, and libraries may prefer to
subscribe to a reprint service of this sort rather than to
have the bound volumes of the journal in their possession.
On the other hand, the incremental cost of an additional copy
in microform will be so much less than that of actual volumes
that every library in the field may well acquire a microform
version for reference and copying. Depending upon the
publishing economics of that time, the library may acquire a
large enough supply of printed reprints to supply them to
potential users, or it may acquire only one or two such
documents (preferably in microform) and produce the desired
copies on demand in microform or hard copy.

INTEGRATION WITH OTHER INFORMATION SOURCES

The information transfer system of 1975 will easily extend
the reach of its users beyond the resources of the local
system. There will be communications networks tying
university systems together and providing connections with
national resources. Facsimile transmission schemes will
provide for the rapid interchange of document images in
situations where coded information is not available.

THE ROLE OF THE LIBRARIAN


Naturally, collections of art books, posters, original
manuscripts and letters, phonograph records, etc. will have
to be maintained, acquired and loaned — much as they are at
present. It would be wrong to imagine, however, that the
librarian's only function in the 1975 environment will be to
care for the more recalcitrant members of the collection.
Certainly, the librarian of 1975 will be less involved than
now with the individual transaction between user and book,
but our whole purpose is to increase very greatly the utility
of the information transfer system. The librarian will be of
primary importance in the acquisition of new material, in
cooperative cataloging, in organizing the collection, in
instructing users of the library, and in modifying the rules
and programs to maximize the services provided to the user
over the long run. The librarian will be able to operate
with greater freedom by having control over advanced
machinery. The librarian will be much involved with the
arrangement of channels with other libraries and facilities,
and with the provision to users of proper and economical
terminals or other means of access to the system. Vastly
more material than now will be available to the user of the
library, and there will be a need for professional librarians
at all levels in the system. It seems likely that to be a
librarian in 1975 will be very fruitful and exciting.

THE INFORMATION TRANSFER BUDGET OF 1975

The information transfer system that has been described as a
conceptual pattern for 1975 is vastly different from the
library of today in the scope of the services it seeks to
provide. While no cost estimates for such a system can be
made until there has been extensive experimental
investigation, it is clear that substantially larger budgets
will be required for such an information transfer complex
than would be available under a normal extrapolation of
today's library budgets. The Planning Conference has
indulged in some speculation on the proper magnitude of an
MIT information transfer budget for 1975, and has arrived at
a figure of $15 million by two different routes.

It is a well-known phenomenon that the amount of published
information doubles approximately every 13 years. Since
scientific information increases at a somewhat higher rate,
and since libraries should, if anything, increase their
relative holdings, it seems not unreasonable to assume that
the amount of material available to the typical library,
assuming no changes were made in its function, would in 1975
be at least double the amount now contained.

Project Intrex will, of course, result in an increased
emphasis on libraries at MIT; and we can reasonably expect to
see an increase in the relative size of the MIT library
compared to other universities (it has been decreasing
steadily since 1930). With due allowance for all these
factors, we might expect the total amount of stored
information in the MIT information transfer system of 1975 to
be more than ten times as large as it is in the library of
today. We conclude that the magnitude of the information
available in 1975 could demand a 15-fold increase in the
budget of the library. While the advances sparked by Project
Intrex could result in a decrease in the actual budget, a
more probable outcome, if past experience is any guide, will
be a very great increase in the services rendered. Thus we
might expect that the total budget for information transfer
services at MIT will be of the order of $15 M in 1975.

Alternatively, we can base an estimated budget on an assumed
size of the MIT community in 1975, and on the needs of the
individual in the community. For this purpose, we shall take
the community to contain 15,000 people: students, faculty,
research staff, and some of the surrounding community. With
increased mechanization that makes other services available,
and with the increase in total available information, it is
not unreasonable to expect that the total expenditure for
information services per user will increase. There is, of
course, a very wide difference in annual expenditure per user
at different universities. An estimate of $1000 per user of
the future MIT information transfer system would bring us to
the same $15 M figure previously cited for 1975.
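The two routes to the $15 million figure reduce to simple arithmetic. The report supplies the 15-fold growth factor, the 15,000-person community, and the $1000-per-user estimate; the assumption that the present budget is on the order of $1 million is ours, implied by the 15-fold route.

```python
# Route 1: scale the present budget by the expected growth in services.
budget_today = 1.0                   # assumed present budget, $ millions
growth_factor = 15                   # the 15-fold increase argued above
route_1 = budget_today * growth_factor

# Route 2: per-user expenditure times the size of the community.
users = 15_000                       # students, faculty, staff, neighbors
dollars_per_user = 1_000             # annual information-service spending
route_2 = users * dollars_per_user / 1_000_000   # $ millions

print(route_1, route_2)              # -> 15.0 15.0
```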

CHAPTER VI

PROJECT INTREX

With the picture of the information transfer system goal for
1975 just presented, and with our knowledge of the
present-day situation, we must now arrive at some sensible
decisions as to what part of the necessary work will be done
by Project Intrex, which part Project Intrex will try to
stimulate but will not actually do, and what part can be
safely left undone on the assumption that it will be done by
others.

One obvious approach to the problem would be to consider the
present library as a starting point and to plan an
evolutionary program. Following this approach, Project
Intrex would concentrate its talents on solving some of the
more pressing problems of the day, for which there are
recognizable tools available — tools which in themselves may
not be very revolutionary but which, nevertheless, are
clearly usable.

Another approach would be to identify a specific, available
tool (or one to be available shortly) that seems to hold
promise for solution of a major part of the information
transfer problem, and to work toward the development and
adaptation of this specific tool. At present, there are at
least two quite promising tools — microforms, and the
on-line, time-shared digital computer. Other promising tools
are: automatic copying, which, of course, is already being
used but certainly not to anything like the extent that can
be envisioned; computer-controlled, material-handling
equipment, which could be used to mechanize the entire
book-handling problem of the library; and facsimile
transmission, which has hardly been used at all. There may
be still other tools which deserve mention.

Finally, we can envision a program that combines both the
approaches just enumerated. Objections to this procedure
arise from the fear that any reasonable project which could
be financed, staffed and maintained at MIT would be too small
and diffuse to justify the adoption of such a broad plan.
Such objections can be partially met by pointing out that the
all-inclusive nature of the approach would interest many
diverse segments of the MIT community and that the project
would profit from the advice and help of the entire
community. By including work toward the evolutionary
development of the existing library, the project will keep
the problems of the real world firmly in mind; by exploiting
some of the newer tools, the project will be able to guide
the development of these tools along lines that will make
them maximally useful for the information transfer problem.
The Planning Conference was overwhelmingly in favor of this
broad approach.

There was a consensus at the Planning Conference that the
proper evolution of libraries is so deeply involved with
machine-processing methods and with machine-searching
methods that the over-all system would be very much
concerned with time-shared, on-line computers. It was thus
decided that Project Intrex should apply the same technology
to the solution of the library problem and to the development
of a real-time, on-line community within the universities and
similar establishments, so that a single set of terminals
would presumably serve the user both as access to the modern
library and as his input to the time-shared system for
computing, editing, and other functions.

LIBRARY MODERNIZATION

The problems of libraries begin with the high cost of
producing, storing, handling, and revising books; the
problems of loss and misfiling; and, in general, the divided
responsibility which often leads the library to minimize its
costs by minimizing its services to the users. A
prerequisite to a substantial improvement in the over-all
operation of libraries would be a systems analysis and a
systems design that optimizes the benefits to be obtained
while minimizing the over-all costs, including the costs to
the researcher, the cost of not obtaining the material, and
the direct, out-of-pocket cost of the library. It seems
probable that much of the very high development cost
associated with the transformation of the present library
system to a modern technology will be borne by the Federal
Government in connection with its massive information
services which accompany the space, atomic energy, and
defense programs.

It is useful and profitable to apply computers to the
problems of billing, cataloging, updating, search, etc., in a
conventional library that houses books; but it is much more
difficult and, we believe, vastly more beneficial to modify
the entire system so as to attack the problems of storage
costs, lost books, etc., at the same time that one improves
access to the contents of libraries by computer manipulation
of the bibliographic information. Thus we consider a world
in which balanced progress has been made toward an over-all
system: in terminals; in the storage of the printed page in
image and in coded form; in machine search and in cataloging;
in reasonable delivery time for books; and, last but not
least, in charging at an appropriate rate for the services
rendered. We expect that the lower cost and more ready
access afforded by microform will cause a change from the
storage of actual bound volumes to the preservation of their
contents, initially as microimages and eventually, for the
same existing books, as microcode, when character-recognition
machines are well enough developed to do an adequate job of
effectively encoding the microimages. Toward the end of the
period under consideration, there should be a substantial
fraction of manuscripts generated directly in coded form, so
that they may be stored much more compactly than if they were
set into type and photographed. For that part of the
information that is stored in microform, the user will have
the choice of obtaining hard-copy output after a few seconds'
delay and at a cost of a few cents per page, or of receiving
a microimage suitable for reading at his home or in his
office on a microfilm reader.

One important ingredient of modern technical libraries is a
highly trained and valuable human staff. In many cases, the
user has circumvented the deficiencies of current information
systems by reliance on the reference librarian. In moving
toward the use of automatic data processing, we must be sure
to retain the crucial services provided by professional
documentalists and librarians. These jobs must grow to
become allied with the new tools. It seems likely that the
librarian will gain immense new powers to shape and organize
the corpus of his library once the huge burden of repetitive
manual techniques has been removed. This subject will not
often be mentioned in the detailed discussion of the
experimental program, but it will be kept in mind throughout
the effort.

EXTENSION OF TIME-SHARED COMPUTING

In considering Project Intrex as an extension of Project MAC,


we find immediately that it is no easy matter to give access
via MAC to the contents of. present libraries or even to the
information or documents created outside the MAC system.
On the other hand, the facilities and methods developed in
MAC and elsewhere during the last few years, in symbol
manipulation and text processing, will be very useful in the
library context. In addition, editing programs and "typesetting
programs" which have been implemented at various places on
a time-sharing computer will have very great impact on the
entire publishing and library field. The preparation of a
manuscript by these modern means obviously offers a way of
compact storage of the manuscript thus produced, together
with the possibility of ready transmission of "conventional"
manuscripts (as opposed to compilations of data). A manu¬
script produced in this way can be stored in about 1% of the
photographic or magnetic material that is required for image
storage of the same manuscript. Similarly, the transmission
of such coded manuscripts can be done with 1% of the band¬
width, and at thus substantially smaller cost than can the
transmission of manuscripts that have to be scanned, although
the terminal equipment must be more complex in order to
reconstruct the manuscript in human-readable form after
transmission. At a time when the cost of both memory and

55
of logic is decreasing very rapidly, and when new techniques
in digital transmission as well as physical mechanisms of
transmission (communication satellites) are becoming common,
it is hazardous to predict the exact nature of a system for use
10-15 years hence. On the other hand, neither the video
transmission of scanned images nor the reconstruction at the
terminal of coded images is obviously uneconomical; and,
evidently, in an experimental program both of them will have
to be considered. This is fortunate, because we will have to
start with library materials stored as images, and yet we
want to work toward a system in which the text itself is ma-
chine-processible and thus susceptible of economical stor¬
age and transmission.
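The 1% figure is easy to make plausible with back-of-envelope numbers. The page size, scan resolution, and character count below are illustrative assumptions; only the conclusion, that coded text needs on the order of 1% of the bits of a scanned image, comes from the text above.

```python
# Image storage: one bit per dot for a scanned page.
dots_per_inch = 100
page_area_sq_in = 8.5 * 11
image_bits = page_area_sq_in * dots_per_inch ** 2   # 935,000 bits

# Coded storage: the same page as character codes.
chars_per_page = 3_000        # a densely typed page
bits_per_char = 6             # six-bit character codes were common in 1965
coded_bits = chars_per_page * bits_per_char         # 18,000 bits

ratio = coded_bits / image_bits
print(f"coded/image = {ratio:.1%}")   # on the order of 1 to 2 percent
```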

We fully expect that this vast improvement in the information
transfer system for the leading universities and
organizations will be accompanied by an improvement of access
to books and images for the rest of the general public as
well. And, although the experiments of Project Intrex will
include some services that may be too inefficient or
expensive to be maintained indefinitely, certainly what goes
by the name of browsing is not one of them. The dedicated
browser will be able to browse as never before.

THE COURSE OF PROJECT INTREX

The goal of Project Intrex is an information transfer system
to be installed by 1975 not only at MIT but at a number of
comparable institutions throughout the nation and perhaps the
world. To achieve this goal, experimental results and
experience with a model system must be in hand by 1970 in
order to afford a basis for commitment of funds for the
purchase and development of economically viable systems of
storage, transmission, display and interaction.

In order to provide this basis by 1970 at a reasonable
program level of some 50 equivalent full-time people, Intrex
should concentrate on a main stream of experimentation, while
remaining alert to developments outside this main stream
which might make some alteration in the program desirable.
It is, of course, not reasonable to determine in fine detail
the nature of the 1975 system on the basis of a 5-week
Planning Conference, but the experimental program proposed in
the next section should allow performance to be predicted,
costs to be estimated, and the worth of the entire system to
be evaluated. It will provide an opportunity for the
creative interaction of the librarian and the systems
engineer to enhance the intellectual capability of the
individual in the world of 1975.

Comment on the word "experimental" is necessary. At one
extreme, there might be carefully controlled experiments in
psychology; at the other, one might find the design and
construction of a piece of equipment to find out if it
worked. In between are most of the experiments typically
conducted in large research and development efforts. Intrex
ought really to work near both ends of this spectrum. On the
one hand, specific new technologies ought to be applied to
the problems of libraries and information transfer. On the
other, competing techniques and ideas for solving these
problems should be examined and compared. The suggested
experimental program includes both types of effort.

Whatever experiments Intrex sets up, they should be conducted
in a real environment. This is not to exclude simulation
studies, which might serve to guide experiments. However,
such studies will presumably be ancillary to the main tasks.
The word "experiment" has the connotation of an activity
which is terminated after having been in an active state for
a period of time. There is even the suggestion that the data
which it yields will be used as one basis for the design of
the "next" experiment. There may be difficulties in
terminating a successful experiment which has been conducted
in a real environment: to be real, it must serve real users
on a realistic scale and over a substantial time period.
This means that users will become so intimately accustomed to
the system that they will demand that it remain available
after the experimental project has accomplished its purpose.
At this point, the Intrex experiment may have to become part
of an operating information transfer system and undergo an
administrative transfer.

The employment of real users in a real environment suggests
two other useful conditions on the proposed program: First,
truly careful records should be made of user action with, and
reaction to, the experimental information transfer systems.
Second, some attempt must be made to provide comparisons
between different solutions of the same problem. For
example, the "costs" of various services might be manipulated
to investigate the value of such services. Other variants
may be compared.

It is obvious that a multitude of library problems cry for
attention. Literally hundreds of independent efforts in the
United States alone are directed toward one or another of
these problems in the information transfer field. Intrex
must be selective in allocating its resources to major
experimental efforts. The selection of the areas for
experimentation which appear most urgent and best matched to
the time scale of the project and the resources of MIT has
been the major task confronting this Planning Conference.
The recommended program is addressed mainly to the broad
problem of access — in particular, access to bibliographic
material, documents, and data banks. A core program dealing
with this information transfer function has been formulated,
together with supporting activities and recommended
extensions.

Core Program

The Model Library. To provide an environment for the
performance of the Intrex experiments, we recommend the
establishment of a facility, called a "model library". Only
by coming to grips with the real, every-day problems of
setting up and running a pilot system can the project
assemble the experience required to evaluate its experiments,
just as it is only by serving the real needs of real users in
the university community that the experiments can be
meaningful.

Mechanization of Current Procedures. In its early stages,
the model library will display, in readily attackable form,
most of the basic problems of university libraries. It will
therefore afford an excellent opportunity to combine the
procedural background developed in the MIT libraries with an
on-line computer system's capabilities for solving such
problems as the selection, acquisition and weeding of
materials and the control of serials. The model library will
also be useful in formulating theory, and in acquiring data
for analysis of system performance.

Augmented Catalog Experiment. The principal finding element
in an information transfer system involving any form of
storage will be a catalog. Augmentation of the catalog in
content, depth and connectivity is facilitated by the
computer that is used to control the flow of information in
the system. We recommend experiments with an augmented
catalog established as a data base in digital form in the
on-line computer system. Such a catalog would cover books,
journal articles, reviews, technical reports, theses,
pamphlets, conference proceedings, and other forms of
recorded information. The catalog should encompass an
interdisciplinary field, and should contain references to
enough material to interest a serious worker and to present
significant bibliographic problems. Operational experiments
will deal with bibliographic search for both specified and
unknown documents. The catalog will also provide data for
experiments in selection, acquisition, circulation, and other
library operations; in selective dissemination; in some
limited forms of browsing; and in recording user interaction
with the system.
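One record in such an augmented catalog, and a bibliographic search over its augmented fields, might look like the sketch below. The field names follow the catalog contents listed in this report (author, title, subject, contents, abstract, citations); the record layout, the sample entries, and the search logic are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogRecord:
    author: str
    title: str
    form: str                    # book, journal article, thesis, report, ...
    subjects: list = field(default_factory=list)
    contents: list = field(default_factory=list)   # table-of-contents entries
    abstract: str = ""
    citations: list = field(default_factory=list)  # identifiers of cited works

def search(catalog, term):
    """Return records whose augmented fields mention the term."""
    term = term.lower()
    return [r for r in catalog
            if term in r.title.lower()
            or term in r.abstract.lower()
            or any(term in s.lower() for s in r.subjects)]

catalog = [
    CatalogRecord("Bush, V.", "As We May Think", "journal article",
                  subjects=["information retrieval"],
                  abstract="On the memex."),
    CatalogRecord("Doe, J.", "Serials Control", "report",
                  subjects=["library operations"]),
]
print([r.title for r in search(catalog, "retrieval")])
```

The augmentation is what makes the search useful: the first record matches on a subject term that never appears in its title, which a conventional title catalog would miss.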

Text Access Experiment. When the retrieval specification has
been narrowed to the set of identifiers of the documents the
user wants, the problem is to deliver or display those
documents to him, or to have them ready when he calls for
them. Project Intrex must determine the merits of the
various approaches to this problem in terms of effectiveness
and cost. The proposed experiments will involve the
following technologies:

For storage: — print on paper, analog microimages
on photographic materials, analog signals on magnetic
or possibly thermoplastic materials, digitally encoded
characters and graphic elements on photographic or
magnetic materials;
For delivery: — transportation for some of the
foregoing, electrical transmission for others;
For display: — direct inspection, xerography, optical
projection, oscilloscopic display, and the like.

We recommend implementation by Project Intrex of several of
the most promising systems, and operational evaluation
through actual use.

Network Integration Experiment. We suggest that Project
Intrex explore a range of ideas designed to promote the
integration of university libraries into the national (and,
ultimately, international) network of information centers. A
major experiment is recommended on the interaction of a
computer-based university information transfer system with
the informational resources of such organizations as the
National Library of Medicine and the National Aeronautics and
Space Administration. In addition, we recommend that Project
Intrex explore, with other research libraries, documentation
centers, and information exchanges, the various ways of
interchanging bibliographic, indexing and abstracting
information, and of overcoming divergences of format and
convention that might impede cooperation.

The Fact Retrieval Experiment. The existing bibliographic
organization is largely document-centered. During the
expected life of Project Intrex, continued progress will be
made on the rapid processing of data retrieved from very
large files; some capability will be developed for the
retrieval and assembly of facts; and many advanced systems
for the automatic answering of questions will appear. A
recommended major Intrex experiment will involve development
of a computerized "handbook" and data banks, and of
techniques for querying them. These techniques will be
compared and evaluated in relation to book-based techniques.
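The distinction between document retrieval and fact retrieval can be made concrete with a toy data bank: rather than returning a document that contains the answer, the system answers the question directly from a machine-stored "handbook". The entries and the lookup interface below are illustrative assumptions; the physical constants themselves are standard values.

```python
# A toy computerized "handbook": facts keyed by (substance, property).
handbook = {
    ("copper", "melting point"): "1085 C",
    ("copper", "density"): "8.96 g/cm3",
    ("aluminum", "melting point"): "660 C",
}

def retrieve_fact(substance, property_name):
    """Answer directly from the data bank, or report that the fact is absent."""
    return handbook.get((substance, property_name), "not in data bank")

print(retrieve_fact("copper", "density"))      # the fact itself, not a document
print(retrieve_fact("aluminum", "hardness"))   # a graceful miss
```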

Supporting Activities

Experiments with Other Library Functions. The facilities
required for the major Intrex experiments will support
others. Insofar as personnel, space and funding permit — and
particularly as the topics themselves attract interest and
resources — experiments are recommended in such areas as:

Teaching and learning in the on-line network.

Browsing; planned facilities to foster unplanned
discovery.
Selective dissemination of information.
Use of the on-line network to expedite preparing,
reviewing, printing, indexing and abstracting of
manuscripts.
"Publishing" through the system to the on-line
community.

Component Technology. Perhaps the most significant factor in
the situation that Project Intrex is entering is the
availability of a powerful new computer technology. With
respect to this technology, we are distressed by the
primitive state of two critical items: consoles and
interaction languages. We recommend that Project Intrex give
attention to these areas.

A Unifying Theory. A major intellectual challenge for
Project Intrex is the development of a unifying theory that
will lead to coherent design and interpretation of
experiments in information transfer systems.

CHAPTER VII

THE EXPERIMENTAL PROGRAM

1. THE MODEL SYSTEM

The design of a breadboard or model is a time-honored
engineering technique. Recent years have seen breadboards
that have varied all the way from a single transistor to the
Experimental Subsector of the Semiautomatic Ground
Environment for air defense. As we said in Chap. VI, we
believe a model system will best serve Intrex as a place to
try system experiments, to investigate competing
technologies, and to evaluate new ideas experimentally.

The choice of the field or fields for this model will be crucial.
It is clear that the whole MIT library is simply too large as a
first step. If the funding and scope of the project permit, an
upper bound on the sensible first step might be a direct attack
on the MIT Engineering Library. Barring that, Intrex should
pick one or a small number of subject fields for individual,
experimental systems treatment.

We have suggested that Intrex pick one or a small number of
subject-oriented fields for its major experiments in new
information-handling techniques. Within those fields and
within those experiments, Intrex should seek to provide
information transfer services that will answer the real needs
of some real users in a university community.

If a single technical field is chosen, several factors should
influence the choice. First, it should be one where useful
and important materials come from a number of different
kinds of sources. Second, the field should imply a reasonably
wide class of users in the experimental MIT environment.
Third, the field should be somewhat interdisciplinary in
nature, rather than a very clean and pure area of research.
Finally, the field should be one where the problems are not
so severe as to impede early results. In particular, one
should probably not pick a field like political science where
materials in newspapers and other ephemera represent key
sources of relevant data. The field should at least be
structured enough that one can identify the relevant sources,
and be one that is not now covered by a specialized index such
as Chemical Abstracts or Biological Abstracts.

Consider, then, a subject field around which a machine-
readable catalog, a collection of documents, a collection
of microfilm, a collection of digital data and so forth are
arranged. Built around that data base, and assuming the
availability of a time-shared computer facility and many
users, a large number of experiments could be attempted.
Powerful machine bibliographic-search techniques could be
implemented. Machine fact retrieval could be tried. Many
different forms of access to actual documents could be
investigated. "Telebrowsing" could be investigated in the
context of the machine-readable catalog. Experiments in
selective dissemination of materials could be based upon the
machine-readable catalog. The possible uses of this
machine-available body of bibliographic material and data
for educational purposes and class use could be seriously
investigated. Experiments in different techniques for
publication would be appropriate. The more mechanistic
functions of everyday, conventional library life could be
approached from a new point of view; here we would include
circulation, acquisitions, serials control, etc. Thus, the
model library system would be at once centralized and
distributed. There would be a central store of documents,
microfilm, and digital data, and there would be a distributed
time-shared computer net and distributed users with access to
that body of data. Many different aspects of information
transfer could be investigated and evaluated in the context
of this model facility.

The following sections describe a proposed model experimental
library system to be constructed as an operating environment
for the evolution of new information transfer systems at MIT.
The discussion is divided into two main parts, "The Augmented
Catalog" and "Text Access".

THE AUGMENTED CATALOG

The catalog in the American academic library has had a long
and respectable tradition as a guide to library resources.
Conceptually, it was designed to encompass a bibliographical
record of all documents in the library — books, periodical
articles, manuscripts, broadsides, maps, music scores, or
whatever form the graphic records of our culture might take.
In isolated instances, catalogs have been structured to cover
this broad spectrum; more typically, they have been limited
to a record of books and journal titles, leaving the role of
cataloging journal articles and other types of documents to
indexing and abstracting journals and other special indexes.
Principally for economic reasons, the catalog has tended to
give only the barest minimum of information to serve as
access channels to the holdings of libraries. As the growth
of the literature mounts, it becomes clearer that the catalog
must be improved.

Hopes that an improvement can be made have been nourished
by the successful trials of computers as versatile working
tools in this field. A notable example is the Technical
Information Project (TIP) catalog of articles appearing in
selected physics journals. This work at MIT, under Dr. Myer
M. Kessler, has demonstrated the practicability of taking some
further steps toward the eventual goal of a comprehensive
catalog with more, and more useful, paths of access.

The term, "catalog", is normally used to indicate the
bibliographic record of a discrete collection of documents in
a given library, or a record of several discrete collections
in the same library or in more than one library. In the
context of this discussion, the term is used to indicate a
file that is an inventory record of particular collections,
and of sources of information about items in these
collections, linkages to other files, and even a file that
contains records of its own use and use of the documents
recorded in it. It is a functionally augmented catalog which
permits easy annotation and expansion, designed to be a record
of the total library holdings in the selected subject area.
The catalog of these holdings is to go well beyond published
books, journals, abstracts, maps, charts, technical reports,
standards, catalogs, etc., into the area of unpublished works
such as conference papers, class notes, galley proofs, and
correspondence.

Time boundaries of the collection and its catalog should be
determined by the need to cover as much retrospective
literature as is required to meet the demands of the serious
scholar.

The augmented catalog would form the heart of all inventory
control of the library. It would provide the store of data in
which: sophisticated bibliographic searches would be possible;
some forms of browsing experiments could be performed;
the location and an indication as to how a particular item
might be physically obtained would appear; selective
dissemination experiments could be arranged; and it would
furnish a key tool in recording user interaction with the
system. These new jobs give rise to the term, "functionally
augmented".

The Environment

It is necessary to state several assumptions which are, in a
sense, the environment within which the catalog is
constructed.

First, the total universe of recorded information on any
worthwhile subject is large, is scattered in many media,
appears in a wide variety of formats, and is usually
interdisciplinary with respect to any conventional
disciplinary division of knowledge. Means should be available
to the specialist working on a subject to get at any part or
all of this universe of information when he needs it and from
the point of view of the discipline within which he is
working.

Second, it is beyond the capability of any one library —
except possibly advanced information centers specializing in
very narrow fields — to acquire and control bibliographically
and to provide in-house access to a relatively comprehensive
collection of everything recorded on any one subject. An
underlying reason is that much information being disseminated
is in forms that are very difficult to collect generally — for
example, a high proportion of conference papers are never
published and are available only from the author or from the
conference secretariat; contract reports get limited
distribution and are frequently restricted or otherwise
classified. Personal correspondence is normally limited to
the author and one recipient. Yet all these may get
bibliographical publicity, as in announcements of meeting
programs, citations in bibliographies accompanying journal
articles, or by word of mouth. Systematic acquisition of
these documents, or knowledge of their existence and
accessible locations, is extremely difficult and complex,
beyond the capabilities of general libraries in more than
limited fields, and comes within the province of specialist
information centers.

Third, of the recorded information available on any one
subject, only a percentage ever gets under standard
bibliographical control in catalogs, abstracting services,
indexing services, lists of citations, etc. All these devices
must be used in a comprehensive search, even though the
results will be incomplete; and a comprehensive search must go
considerably beyond them. And yet the editorial and
intellectual efforts of assembling even these systematic
devices are extremely complex, costly, and time-consuming,
requiring specialist staffs. They all use different
techniques and different access terminologies, and are
structured differently with respect to subject arrangements.

From this set of circumstances, it is possible to draw the
following conclusions:

No library, including MIT, can hope to collect
comprehensively, in more than a relatively narrow band
or series of bands, the total spectrum of knowledge.
Collecting must be a shared responsibility among
specializing research institutions. Therefore,
while MIT might provide broad-spectrum adequacy
for the student and learners, it will never be able
to satisfy fully the needs of all specialists in all its
fields, and must supplement its own resources by
drawing on those of other institutions in one way or
another. Even more difficult would be the raw
processing of hundreds of thousands of items into a
general bibliographical system each year, so as to
provide multi-disciplinary access.
The best the present library can do is to systematize
and organize access to existing bibliographical
keys such as abstracting and indexing services,
catalogs of other libraries, bibliographies of
special subjects or other user interests, and citation
lists. Each field has its own terminology and its
own structuring of its information; and each is
constantly evolving new terminology and ways of
looking at information. It is difficult for someone
in one field to work on the information generated
in another field. No one library staff could
intellectually, financially, or physically hope to
merge these into a unified system.
Any one library, therefore, can only hope to be a
switching center — a telephone exchange — to plug
the user's line into any one of many lines in a
communication network, one or more terminals of
which may be the local library but some (and often
many) of which will be in other libraries or
information centers. The excellence of any one
library is based, today, as much on its ability to
provide good switching facilities — to tap other
libraries — as on its ability to provide a
substantial part of the corpus of knowledge of any
one subject.

General Requirements

In the general discussion of the model system, we mentioned
the criteria for the selection of an appropriate discipline
within which to perform experiments. Beyond this, a number
of other requirements may be specified to assure a catalog
data base adequate for a varied experimental program. The
more important of these are described below.

When the criteria for search are complex, the system must
be open-ended yet controlled, one that efficiently brings to
light the relevant (while suppressing what is not germane)
from a heterogeneous store. This is important for any use
of the catalog, whether a search is for a specific document
or for unspecified documents relevant to a specific search
interest. In this connection, decisions must be made on the
degree to which input will be limited to the author’s own words
in titles or abstracts, and the degree to which the author’s
terminology will be translated into or supplemented by a con¬
trolled vocabulary.

The program should provide for input of selected reader
annotations on items in the catalog, and for recording by
users of items found to be useful with reference to particular
subject interests. Through the latter provision, personal
bibliographies could become part of the total bibliographical
apparatus represented in the catalog.

The catalog program should also permit input of inventory
information on documents in the collection, including records
of items out on loan, at the bindery, or otherwise unavailable
for immediate use; and the program should provide for
automatic recording of each use made of the catalog as well as
use activity of documents in the collection.

It is assumed that computer-related information systems will
be used to a very significant degree for searches to determine
whether or not the library has a specific document and, if so,
its call number, address, or other locational information.
When request data match those in the computer catalog,
responses should be forthcoming, without complications. On
the other hand, request data which are incomplete or
incorrect may be expected to occur frequently. Examples are:
1) a reference in which data are correct but incomplete;
2) a reference that is partially incorrect (wrong date, wrong
author, wrong title, etc., with the portion of correct vs.
incorrect information being highly variable); and 3) a
reference that may be correct, but that does not match the
data used by the library to describe the document.

A fairly high level of reader input errors must be assumed,
whatever their cause — misinterpreted or incorrectly copied
data from other bibliographic sources, or the incompleteness
or incorrectness of data in the original source. Some
computer capabilities suggested are:

A shorthand or abbreviated input-query capability
that can be easily constructed by a few simple rules
from a reference or book in hand;
A compressed store, to offer efficiency in storage
and very-high-speed, low-cost searching; and
A reference matching and approximate matching
technique.
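The approximate-matching capability can be illustrated with a small sketch in modern Python (far beyond anything contemplated for 1965 hardware; the catalog records, the scoring weights, and all function names are illustrative assumptions, not part of the Intrex design):

```python
from difflib import SequenceMatcher

# A toy catalog: each record is (author, title, year).
CATALOG = [
    ("Kessler", "Bibliographic Coupling Between Scientific Papers", 1963),
    ("Bush", "As We May Think", 1945),
    ("Licklider", "Libraries of the Future", 1965),
]

def similarity(a, b):
    """Crude string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_reference(author="", title="", year=None):
    """Rank catalog records against a possibly incomplete or
    partially incorrect reference; return candidates, best first."""
    scored = []
    for rec_author, rec_title, rec_year in CATALOG:
        score = 0.0
        if author:
            score += similarity(author, rec_author)
        if title:
            score += 2.0 * similarity(title, rec_title)  # weight title most
        if year is not None:
            # A slightly wrong year should penalize, not exclude.
            score += max(0.0, 1.0 - abs(year - rec_year) / 10.0)
        scored.append((score, rec_author, rec_title, rec_year))
    scored.sort(reverse=True)
    return scored

# A reference with a wrong date (1956 for 1965) still matches.
best = match_reference(author="Licklider", title="Libraries of the Future",
                       year=1956)[0]
print(best[1], best[3])  # Licklider 1965
```

Note that the wrong date merely lowers the score, so the record is still retrieved; an exact match on the date would have rejected it.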

The utmost care will need to go into the machine record in
order to have high-quality output later on. Detection and
correction of internally recorded errors is a non-systematic
operation unless a great deal of redundancy is allowed. For
example: if the copyright date for a 1965 book is erroneously
recorded as 1956, there will be no simple way to detect and
correct this after input. If a user includes as one criterion
of a search "only books published after 1960", the book would
not be retrieved by the search program. This emphasizes
the importance of automated input from carefully edited copy
to eliminate unnecessary input errors.

The catalog program must provide for linkage through cross
references supplied on the console to the reader or, hopefully,
through automatic switching to other automated files. The
very high cost of intellectual input makes this provision
imperative.

The catalog should be remotely accessible and capable of
being searched by highly sophisticated, user-developed search
strategies. Further, it must be available to a number of
users simultaneously at any hour that access is needed.

Finally, the catalog must have continuity and integrity. Daily
tapes of the working file to prevent accidental loss, and
programs to preclude unauthorized changes, are a rigorous
requirement.

Input

The catalog that will accomplish the functions just enumerated
must contain a large amount of information about each item in
the collection — far more than has been put into the
traditional card catalog. The inclusion of any new or old
element in the catalog entry will depend, first, on the
thought that it should be there and, second, on its continued
usefulness in service. This open structure provides for the
addition of new information and the excision of what is found
useless. To begin, we think the catalog entry should include:

The usual bibliographic data given on Library of
Congress cards for books.
Additional subject indexing, with language taken from
the item itself (paragraph headings, for example)
and from a structured vocabulary.
An abstract, review or extract, and the sources.
Citations to the item and given in the item.
The type of material, level of approach, and aim of
the author.
Informative comment by or about the author and
relating to the item.
Languages of text (if not in English) and availability
of translation.
Bibliographic analysis of non-monographic items
such as symposia, conferences, series (i.e.,
advances in, progress in), and compiled texts.
Tables of contents, index, text samples.
Cost and source information.
Location and circulation information, constantly
updated, plus records of use for the entry.
Forms and price, if for sale.
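The open, evolving entry described in this list amounts to a record with a small required core and optional augmented fields. A minimal sketch in modern Python (the field names are our own, chosen to mirror the list; they are not part of any Intrex specification):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CatalogEntry:
    # Basic data, entered early in the life of the entry.
    author: str
    title: str
    place: str
    date: int
    # Augmented data, added (and sometimes deleted) later.
    subject_terms: list = field(default_factory=list)  # item's words + controlled vocabulary
    abstract: Optional[str] = None
    citations_to_item: list = field(default_factory=list)
    citations_in_item: list = field(default_factory=list)
    user_comments: list = field(default_factory=list)
    location: Optional[str] = None                     # constantly updated
    use_count: int = 0                                 # record of use for the entry

entry = CatalogEntry("Bush", "As We May Think", "Boston", 1945)
entry.subject_terms.append("information retrieval")
entry.use_count += 1
print(entry.use_count)  # 1
```

The optional fields with defaults are what make the entry "evolving": a record is usable as soon as author, title, place and date are keyed, and the rest accretes over the life of the catalog.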

Perhaps it should be immediately noted that a user of this
system will not receive the full flood of material about all
articles in which he is interested. Searches would,
undoubtedly, be made in stages; and the fact that material is
in the catalog does not mean that all users need suffer
through looking at all of it.

It should be evident that not all the items that will be found
in the completed catalog will be incorporated from the first.
Indeed some, such as users' comments and sources of reviews
of current items, could not be included until they become
available. Therefore, an evolving entry is expected, with
some items — such as the basic author, title, place and date —
being entered early in the process and other data being both
added and deleted during the life of the catalog.

In the initial development of the catalog, current materials
will need to be supplemented by conversion of records of
material already in the MIT libraries or in other files, in
order to assure, as soon as possible, a file of information
adequate for effective reader use and project experimentation.

For abstracts added to the file, a decision should be made on
whether abstracts used will be limited to those prepared by
authors, to abstracts prepared by documentalists, or to a
combination of these.

Establishing the file for the augmented catalog could be the
largest total problem in the experiment. It will certainly be
costly. As a first and most obvious answer, one can keypunch
the material. Since this experiment is in a restricted field,
the cost of keypunching a body of material may not be
prohibitive. It is an easy, obvious way and we should not be
afraid of it.

However, if we choose wisely, and if we have a little luck,
there are at least two other possible ways to get such
information into machine form. The first way is to cooperate
with the producers and sources of the document or material.
One or more journal publishers might be convinced to provide
machine-readable copy of certain portions of the material;
so might the card division of the Library of Congress. This
might even be possible if machine copy were not normally a
characteristic of the source operation. As a second route,
if it were not possible to obtain machine-readable copy, it
might be possible to machine-read the material. In
particular, if one wished to enter previously typed or printed
cards into such a system, it might be entirely feasible to
machine-read some fraction of that material.

The augmented catalog experiment must be applied to a corpus
of material above critical size. Critical size cannot be
described directly in numbers, but it must satisfy the
functional criterion of being useful in a real sense to a
serious worker in a broad discipline. The critical point can
be anticipated when a major proportion (say, above 50%) of the
literature of the experimental field is in the system.

In order to provide for interaction between users and stored
information, qualified users will be invited to evaluate and
enter comments on the effectiveness of the indexing, as well
as on the quality of the documents. As a result of user
suggestions, the system may be modified and document
categories may be shifted, making the catalog a constantly
improving tool. Because of interaction with the user, the
catalog will play a role in book selection, weeding, and
general evaluation of the system.

File Organization and Procedures

The machine files of the proposed augmented catalog, and the
machine programs used in the input, output, structuring and
management of these files, will provide for very flexible
experimental arrangements and for the collection and analysis
of experimental data. The amount of information to be filed
and retrieved suggests that it will be necessary to organize
the files into several hierarchical levels, and to provide
programs for the computer that permit either continual or
periodic updating and rearrangement of the file structure so
as to allow it to adapt to the uses that develop during the
experiments. Programming techniques to accomplish these
objectives are beginning to be used in a variety of other
computer applications.

In particular, it is probable that the smallest and fastest
part of the file, namely, the working memory of the computer,
will contain at any time only those parts of the filed
information with which the machine is working in response to
the requests of the users who are working with the system.
These files in the computer will be copied as needed from
larger and slower filing machines. When no longer needed for
problems at hand, they will be erased and the memory space
will be made available to other users.

In many current computer systems of large size, a
magnetic-drum file is used to hold a somewhat larger volume of
information which can be put quickly into the computer's main
memory as required. In the Intrex catalog, it might be
convenient to keep in such a drum the higher levels of
directory information which establish access paths to the
larger files, so that they may be called quickly into the
computer memory for direct use. This directory information
will be established in part by direct input from the personnel
responsible for maintaining the catalog, and in part it may be
derived through the application of computer programs to the
catalog-use data. For example, it will be possible to tag any
item in the whole file structure with a marker to indicate the
time at which it was last used. One technique that seems
feasible will be to leave materials called up from slower
storage files in the faster files until these files contain so
much material that their operations become awkward. At this
point, the supervisory program of the machine could destroy
all materials in the faster files which have not been used
since some cut-off time. In this more-or-less automatic way,
it would be possible to keep more frequently used items more
quickly available in a systematic manner.
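The cut-off technique described here is, in later terminology, a form of least-recently-used eviction. A sketch in modern Python, with a tick counter standing in for the time-of-last-use marker (all class and variable names are illustrative):

```python
from itertools import count

_ticks = count()  # stands in for the time-of-last-use marker

class FastFile:
    """A fast store that copies items up from a slower file and,
    when it grows awkward, destroys everything not used recently."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}  # key -> (last_used_tick, value)

    def get(self, key, slow_store):
        if key in self.items:
            _, value = self.items[key]
        else:
            value = slow_store[key]           # copy up from the slower file
            if len(self.items) >= self.capacity:
                self._purge()
        self.items[key] = (next(_ticks), value)  # tag with time of use
        return value

    def _purge(self):
        # Destroy all materials not used since a cut-off (median) tick.
        cutoff = sorted(t for t, _ in self.items.values())[len(self.items) // 2]
        self.items = {k: v for k, v in self.items.items() if v[0] >= cutoff}

slow = {f"doc{i}": f"record {i}" for i in range(10)}
cache = FastFile(capacity=4)
for k in ["doc1", "doc2", "doc3", "doc4", "doc1", "doc5"]:
    cache.get(k, slow)
print(sorted(cache.items))  # ['doc1', 'doc4', 'doc5']
```

Because doc1 was re-used, it survives the purge while the older doc2 and doc3 are destroyed, which is exactly the "more-or-less automatic" behavior the report describes.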

Beyond the magnetic-drum file in a modern computer system,
there is often a magnetic-disc machine, or a much larger and
slower drum machine. It appears that this level of the file
structure is the one in which it will be found possible to
establish the majority of the simpler bibliographic control
information and to provide the high degree of connectivity, or
cross-reference linkage, which the machine system makes
feasible. In the language of the programmer, the file at this
level may be a list-structured or perhaps a ring-structured
file. This file can have a number of interconnected and
overlapping classification schemes expressed in its structure.
At each intersection or terminal point in the file structure
will be a list of addresses in the larger file at which
complete data concerning any cataloged document may be found.
There will be sufficient amounts of the normal kinds of
bibliographic data in the disc file to permit the
implementation of the simpler search strategies and the
collection of experimental data on the more normal uses of
the system.

One of the important features of the entire file structure
is that it will be necessary to keep the really complete
record pertaining to each document in only one place in
the file structure, and it will be possible to refer to this
record from many other places in the structure. In a
properly programmed system, it will also be possible to
make the processes of addition to and deletion from the
structure entirely automatic, in that the machine can have
a set of file-addition and file-deletion programs which
will automatically enter selected parts of the main record
in all the directory and auxiliary files when a new record
is added, and will automatically delete all references to
a record that is removed.
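The single master record, with directory files holding only addresses and with automatic file-addition and file-deletion programs, can be sketched directly (a modern Python sketch; all names are hypothetical):

```python
class CatalogStore:
    """Master records kept once; directory files hold only addresses."""
    def __init__(self):
        self.master = {}         # address -> full record
        self.subject_index = {}  # subject term -> set of addresses
        self.author_index = {}   # author -> set of addresses
        self.next_addr = 0

    def add(self, record):
        """File-addition program: store the master record and enter
        selected parts of it in all the directory files."""
        addr = self.next_addr
        self.next_addr += 1
        self.master[addr] = record
        for term in record["subjects"]:
            self.subject_index.setdefault(term, set()).add(addr)
        self.author_index.setdefault(record["author"], set()).add(addr)
        return addr

    def delete(self, addr):
        """File-deletion program: remove the master record and,
        automatically, every reference to it."""
        record = self.master.pop(addr)
        for term in record["subjects"]:
            self.subject_index[term].discard(addr)
        self.author_index[record["author"]].discard(addr)

store = CatalogStore()
a = store.add({"author": "Kessler", "subjects": ["physics", "citations"]})
b = store.add({"author": "Bush", "subjects": ["physics"]})
store.delete(a)
print(store.subject_index["physics"])  # only b's address remains
```

The point of the design survives the translation: the full record lives in one place, so directory entries can never disagree with it, and deletion cannot leave dangling references.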

In a file somewhat larger and slower than the disc file, then,
will be a numerically arranged file of the master records
pertaining to all the cataloged documents in the collection.
These records may be formatted to contain places for all the
kinds of information that may be desired in the experiment,
including places for user annotations and comments.

Some space will be needed in all parts of the file structure
for the supervisory and executive programs which permit the
system to interact effectively with the user and to do its own
internal housekeeping. It will also be necessary to provide
space for monitor programs and experimental data-collection
and data-analysis programs for the conduct of the experiments.
As a part of the experimental program, both with the catalog
and with other phases of user operations, it will be desirable
to provide enough space — at least in the largest and slowest
memory — that each user may have a file of his own for his
personal needs in keeping his own index, writing and keeping
his own search programs, etc.

A word is in order on the subject of file protection. The fact
that the system normally loads the computer memory from
data stored in larger and slower files means that a failure
here is not disastrous. Similarly, the drum memory is
normally loaded from other system files, so that a failure in
the drum system will seldom do much permanent damage to
the file structure. Perhaps the most crucial memory of the
structure suggested here is the disc storage. Procedures by
which the disc storage is periodically copied into the larger
and slower data-cell or tape storage, so that the catalog can
never be destroyed completely by a minor accident, are
probably to be desired. At the level of the largest and
slowest file machine, it may again be desirable to copy its
contents from time to time on to magnetic tape, just to avoid
the necessity of creating new, machinable input data.
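The periodic-copy procedure might look like the following sketch (modern Python standing in for a disc-to-tape copy; the function and path names are illustrative):

```python
import pathlib
import shutil
import time

def checkpoint(disc_file, backup_dir):
    """Copy the disc store to slower, tape-like storage, keeping
    dated generations so that no single accident can destroy the
    catalog completely."""
    dest_dir = pathlib.Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"catalog-{stamp}.bak"
    shutil.copy2(disc_file, dest)  # copy contents and timestamps
    return dest
```

Run on a schedule, this is the "daily tapes of the working file" asked for under General Requirements: the working copy can always be reloaded from the most recent generation.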

Search Considerations

In the discussion of what information is stored, the
implication is given that the list is open-ended and that any
important information can be added. This open-ended quality
is to be seen and used in other ways as well. The functionally
augmented catalog embraces entirely new kinds of library
services and new ways of performing old services. Through the
catalog, we can get new information about the user's needs and
his search trails which could produce an improved catalog and,
by extension, an improved literature.

This process of search through the catalog for items relevant
to the need is not simple. It will differ from user to user,
from time to time, and from discipline to discipline. The
best one can hope is that there are enough searches of all
types so that analysis will yield a set of generalizations
that will help later searches to get the most relevant items
more quickly. This process is conceived as one of successive
approximation, by means of successive cuts or additions
through man-machine dialogue, until the number of catalog
entries retrieved equals the number of documents the user is
able or willing to examine in toto for their relevance to his
purpose. The implicit assumption is that, in most cases, no
single identifying element, or simple combination of only two
or three, is adequate for this purpose. The augmented
catalog, with its record of use, will show the sequence in
which the successive cuts are made and what subset of
documents finally satisfied the user or perhaps turned him
away unsatisfied.
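The successive-approximation dialogue can be sketched as a loop that applies cuts until the retrieved set is small enough to examine in toto, logging each step as the record of use (a modern Python sketch; all names are illustrative):

```python
def successive_search(catalog, willing_to_examine, ask_for_cut):
    """Narrow a result set by successive cuts (man-machine dialogue)
    until it is small enough to examine in toto.  Each cut is a
    predicate on a catalog entry; the sequence of cuts is logged,
    as the augmented catalog's record of use would require."""
    results = list(catalog)
    trail = []                      # record of the search for later analysis
    while len(results) > willing_to_examine:
        cut = ask_for_cut(results)  # e.g. "published after 1960"
        if cut is None:             # user gives up unsatisfied
            break
        results = [e for e in results if cut(e)]
        trail.append((cut.__name__, len(results)))
    return results, trail

# Toy use: entries are (title, year, subject).
catalog = [("A", 1950, "x"), ("B", 1962, "x"), ("C", 1964, "y"), ("D", 1966, "x")]

def after_1960(e): return e[1] > 1960
def subject_x(e):  return e[2] == "x"

cuts = iter([after_1960, subject_x])
results, trail = successive_search(catalog, 2, lambda r: next(cuts, None))
print(len(results), trail)  # 2 [('after_1960', 3), ('subject_x', 2)]
```

The trail is the raw material for the analysis the report envisions: across many searches, the logged sequences of cuts could be generalized to help later searchers converge faster.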

The subject-matter terms will clearly have an internal
organization, partly hierarchical, but with cross-links as
rich as can be provided. The other categories should also
have internal structure whenever possible (e.g., under author
there should be subdivisions by country, institutional
affiliation, etc.).

In any of the hierarchically organized index classes, a
provision would be made for the user to indicate the level of
abstraction (depth in the hierarchy) of the desired search,
and for the machine to inform the user of the hierarchical
structure that he is entering.

From the basic thought that the recommendations of a specific
person are valued, another possibility for this augmented
catalog becomes evident. Assume that the library regularly
obtains the part-time services of an expert in each field or
subfield. Such an expert might be asked to help for a period
of a year; during that time he would produce and/or maintain
a graded bibliography in his specialty. This graded
bibliography would be stored as part of the augmented catalog
and would be available for perusal by anybody having access to
the system. The term "graded" has several implications.
First, the bibliography might be graded by level (in the sense
that students might wish to obtain material reporting new
results). Second, it might mean grading by quality (in the
sense that the reviewer would include critical comments on
the material and would also order the material in such a way
that one might first obtain articles which were considered
best in some sense). Third, the material might be graded by
subcategory, using the sensible subject divisions of the field
that year. Finally, the material might be graded by type in
other ways (an example might be grading as to theory or
experiment). The expert would, in other words, filter the
material and attempt to put into the system a bibliographic
list or a set of lists which would be most helpful to all the
various classes of users of the system in that particular
branch of science.

In attempting to use experts in this fashion, the library
takes on an active role in providing filtering of the
material. The full range of material (good and bad) would, of
course, be in the system and accessible to the users of the
system. If the users did not wish to search the full range of
material and wished, instead, to be led to some particular
subset, graded and filtered bibliography lists might be a very
useful tool.

Equipment

Attainment of the large memory is possible in a number of
ways. The best way during the experiment time (1965-1970)
may be in a data-cell memory such as the IBM Cypress
electron-optical memory with 3 x 10^11 bits of storage. Other
data cells or even disc-like memories can be selected, with
space, cost, access times, and capacity being weighed on a
brand-by-brand basis. The model system, based on an estimate
of 100,000 items, 1000 words per item, and 50 bits per word,
will require between 10^9 and 10^10 bits. Remote access
can be achieved through a MAC-like system, and this is what
is proposed for the model library. It may also be desirable
to integrate local computational capacity with the catalog
memory; such a step implies use of the time-shared system
primarily for the connection of users to the catalog.
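The storage estimate is simple arithmetic and can be checked directly:

```python
items = 100_000         # catalog entries in the model system
words_per_item = 1_000  # augmented-catalog record length
bits_per_word = 50

total_bits = items * words_per_item * bits_per_word
print(f"{total_bits:.1e} bits")  # 5.0e+09 bits, between 10^9 and 10^10
```

At 5 x 10^9 bits, the estimate sits comfortably within the capacity of the data-cell memory mentioned above.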

The needs for access to the computer system, in whatever
form, will require consideration of different types of
processing for different users. Except for circulation
information, which must be momentarily current, the
librarian's input devices can, in all likelihood, take
advantage of batch processing and slack-time entry to the
computer. This may mean that a keypunch or punching
typewriter will meet the cataloger's need. At the other
pole, the user will require
instant access, easy viewing, and probably hard copy in some
form. The present use of teletype and typewriter-like devices
can serve as a starting point for user consoles, but the need
for progress is evident. Some users in the experiment might
be provided with Touch-Tone telephone and voice-answer-back
to coded query. This does not solve the hard-copy need but
may satisfy undergraduate students whose needs may have to
be met at low cost. For others, a cathode-ray tube, plus a
line printer of compact design, may have to be engineered
and built. Final acceptance of the augmented catalog will be
closely tied to the means of access given to the consumer.

Another question to be answered involves the need for a
number of consoles in the experiment. Must each user have his
own terminal on the system? It is easy to take the strong
position that each user should, indeed, have his own terminal.
On the other hand, one can worry about the obvious economic
implications of that view. The users must be a real population
and a sizable one. Certainly 50 faculty or staff members
seems a minimal number. In addition, it would be desirable
if a much larger body of students could have access to the
system but perhaps, as mentioned above, in a less person¬
alized fashion. It is probably in the best interests of the
experiment to give a terminal on the model system to each
potential staff or faculty user in the field of interest. Whether
the terminal has to be general-purpose in the sense that it
can also use general computational facilities is a second-level
question related to loading of the computational facility. One
of the important questions that such a catalog experiment
might answer is whether or not there is a really significant
increase in the amount of use that the individual researcher
makes of the library. It seems clear that an individual console
is the best way really to find out whether such a system has
produced major changes in the habits of people. There is no
attempt here to be pure; if there are two people in an office,
they obviously can share a terminal, or perhaps one might
go even a little farther. But, in principle, access must be
very, very easy.

Questions for Investigation

When the operational stage is reached, what should be
measured? At present it is not possible to suggest a quantitative
scale for measuring all the attributes one might expect in the
catalog envisioned, but the questions to be put at the right
time can be posed. One set of questions relates to the use of
the augmented catalog in the search for a specified item; what
characteristics of documents do people remember, and which
do they remember most reliably? Can a man-machine dialogue
be programmed that will force the user to specify enough of
the characteristics he remembers to permit a successful
search?
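One way such a dialogue might work is sketched below; the fields, threshold, and catalog entries are illustrative assumptions, not a design from the report:

```python
# Sketch of a man-machine search dialogue: keep applying remembered
# characteristics until the candidate set is small enough to display.
catalog = {
    "doc-1": {"author": "Smith", "date": "1962", "language": "en"},
    "doc-2": {"author": "Smith", "date": "1963", "language": "en"},
    "doc-3": {"author": "Jones", "date": "1962", "language": "fr"},
}

def dialogue(remembered, catalog, show_at_most=1):
    """Apply the user's remembered characteristics one at a time."""
    candidates = set(catalog)
    for field, value in remembered:
        candidates = {d for d in candidates if catalog[d][field] == value}
        if len(candidates) <= show_at_most:
            break  # specific enough: stop asking and display the result
    return sorted(candidates)

print(dialogue([("author", "Smith"), ("date", "1963")], catalog))  # ['doc-2']
```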

Another question, related to search for a specific document,
is: What is the simplest code, based on conventionally given
bibliographic descriptions (e.g. author, title, date, etc.)
that will reliably specify any document? What is the most
efficient order of coded elements? For example, will search
time be minimized if the order is date-author-title, rather
than the conventional author-title-date? Or if it were language-
date-author-title? In large catalogs, might it be field-date-
author-title... (where "field" is defined simply as Physics,
History, English Literature, etc.)?
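The ordering question can be made concrete with a toy experiment. The records below are invented, and "search time" is approximated by how many candidates survive after each successive coded element is applied:

```python
# Toy comparison of coded-element orders: count how many records
# still match after each successive element of the query is applied.
FIELDS = {"date": 0, "author": 1, "title": 2}
records = [
    ("1960", "Smith", "A"), ("1961", "Smith", "B"),
    ("1962", "Smith", "C"), ("1963", "Smith", "D"),
    ("1963", "Jones", "E"),
]

def survivors(order, query, records):
    """Candidates remaining after matching the first 1, 2, ... elements."""
    counts = []
    for k in range(1, len(order) + 1):
        idx = [FIELDS[f] for f in order[:k]]
        counts.append(sum(all(r[i] == query[i] for i in idx) for r in records))
    return counts

query = ("1963", "Smith", "D")
print(survivors(("date", "author"), query, records))   # [2, 1]
print(survivors(("author", "title"), query, records))  # [4, 1]
```

In this contrived file the date narrows the field faster than the author does, which is the kind of effect the proposed measurements would look for.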

A somewhat more complicated group of questions arises in
connection with those searches in which the searcher is
looking for any or for all documents that meet criteria he
specifies, but: a) he does not know how many documents
satisfy his criteria, or even whether any does; b) his criteria
generally include as essential some specification about the
content of the document; and c) his criteria are usually more
specific, or more detailed, than any single description, or
simple combination of descriptions, that the catalog has in
the past provided about documents, or has provided in
readily searchable positions.

It is assumed that, at least for the period of the plannable
future, any search for unknown documents that results in the
specification of all documents that meet the desired criteria
(as distinct from the stated and searchable criteria) will also
result in specifying documents that do not meet the desired
criteria. The problem is to provide bibliographic search
capabilities and patterns that minimize the number of these
latter types specified as the result of the search.

The questions that experiments should be designed and
performed to answer are the following:
What is the relative contribution of each practicable
document description or tag to the bibliographic
determination of the document's relevancy, when these
descriptions are considered singly and in various
combinations?

What is the most efficient search strategy?
What is the most efficient file order and
organization?

It is emphasized that each of the above questions is to be
answered experimentally, that is, by a comparison, in a
controlled environment, of the alternatives suggested, and
by the use of standardized, objective, and quantifiable mea¬
surements. One possible approach follows:

Problem: To determine the relative value of various elements
and combinations of elements in the file for predicting the
relevance of documents retrieved.

Procedures:

1) Search the file on each element of description in
sequence, using the user's terminology against title,
abstract, table of contents, formal descriptions, etc.,
and record the descriptive element searched for each
document retrieved by a given search.

2) Repeat the search in a revolving sequence of terms.

3) Repeat (1) and (2) using formal descriptors, if these
terms differ from the language of the user's request.

4) Repeat (1) through (3) for additional problems to the
level of an adequate statistical sample and correlate
results.
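Procedure (1) might look like the following sketch. The catalog entries and the matching rule (simple substring match) are illustrative assumptions:

```python
# Search each descriptive element in turn with the user's term and
# record, per retrieved document, which elements produced the hit.
catalog = {
    "doc-1": {"title": "laser amplifiers", "abstract": "gain in ruby lasers"},
    "doc-2": {"title": "microwave masers", "abstract": "precursors of the laser"},
}

def search_by_element(term, catalog):
    hits = {}  # document -> list of elements that matched
    for doc, elements in catalog.items():
        for element, text in elements.items():
            if term in text:
                hits.setdefault(doc, []).append(element)
    return hits

print(search_by_element("laser", catalog))
# {'doc-1': ['title', 'abstract'], 'doc-2': ['abstract']}
```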

Possible values of the experiment:

1) Assistance in design of search strategies,

2) Decision on descriptive items which might be more
economically stored in microfile without materially
weakening the research effectiveness of the catalog, and

3) Decision on sequence of descriptive items in a
microform catalog.

Information produced by the interaction of the user with the
augmented catalog, concerning the effectiveness of inputs,
files and user habits, is only one aspect of the yield to be
gained from the proposed system. Implications about the
collection as a whole can be drawn from the user’s queries.

Computer programs capable of assembling data on all user
actions at the consoles will be incorporated so that more
elaborate questions can be asked and answered. Samples of
such questions are: What will happen to the habits of the
users of the system? Will they, in fact, be able (and want)
to make quite different uses of the system than is typical of
their current use of libraries? Beyond this there are whole
classes of small sub-experiments which are of interest. Does
the annotation furnish an important portion of the catalog?
Do the users like to see the reviews? The abstracts? The
tables of contents? Or do they simply not use these facilities?
Is the citation index the most often-used portion of the system,
or do people prefer to employ subject headings? Would
expert-generated critical bibliographies furnish an important and
much-used tool? If the chosen field were interdisciplinary in
nature, would this new catalog really make a major change
in the access of one portion of the field to the other?

What classes of materials are most frequently called for?
What classes of material (e. g., university-generated research
reports) are at present not adequately handled and disseminated?
Do user difficulties in tracing a class of literature such as the
above-mentioned reports suggest new action on a national
level — a central store like DDC?

Does the record of use of materials indicate the need for
rearrangement or relocation of the collection? For example,
an active current collection and an inactive stored collection?
What are publication dates of the most-used materials with
respect to the circulation dates? What are the characteristics
of little-used materials?

In searching the catalog, how usable are the corporate entries
established by the Library of Congress? Does the user
consistently approach a corporate author in a manner different
from the established entry? Is it desirable to consider a
revision of Library of Congress entries? Are AEC or NASA
entries more adaptable to the user’s main line of approach?

Without the restrictions imposed by filing in card catalogs,
does the user adjust more easily to standard library entries?

Can the user wander or browse knowingly in the catalog? Are
directional indicators adequate to lead into disciplines that are
foreign to him?

Should there be different search modes? One for established
scholars, one for graduate students, another for
undergraduates? Should there be different levels of input?

TEXT ACCESS

Experiments in methods of text presentation require the
availability of a suitable corpus of information, preferably
stored in more than one form, and a variety of different
access mechanisms so that the user may be provided with
several quite different kinds of services. We envision a
situation in which the user will have a choice among:

Access to the original document on a loan basis.

Possession of a full-sized hard copy at a realistic cost.

Access to a microform copy on a loan basis, with
high-quality readers available in central locations.

Possession of microform, with the ownership or local
availability of a satisfactory reader.

Possession of a typewritten, printed or facsimile copy
prepared remotely at the user's location.

A soft (non-permanent) copy presented on a
cathode-ray tube.

It may be impractical to offer all these services for the
entire corpus of the model system and to all users, but the
experimental facility should be as all-encompassing as
possible.

To provide these services, we must have a corpus of
information that is large enough and varied enough to attract
serious users. We can consider two possibilities: 1) that the
model library have its own corpus, completely independent of
the holdings of other libraries; or 2) that the corpus be made
available as a special service within an existing library.
For the moment, we shall assume that the first situation
will obtain, although cost considerations may later modify
this decision. A collection equivalent to 10^3 or 10^4 books,
or 10^6 or 10^7 pages, seems quite reasonable. The corpus
might well be divided as follows:

Books: (Monographs, texts, handbooks,
compendia, symposia, etc.)                 4000-5000 volumes

Journals: (Serials, continuations,
data services, etc.)                       100 titles

Abstract and indexing services             50 titles

Others: (Pamphlets, maps, charts,
clippings, reprints, preprints, photo-
graphs, data from computer or other
sources, galley proofs, catalogs,
standards, technical reports, etc.)        2500-3500 pieces
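The volume and page counts above are mutually consistent only if a "book-equivalent" is taken as roughly a thousand pages of material; a quick check, where the pages-per-volume figure is our assumption rather than the report's:

```python
# 10^3-10^4 book-equivalents at ~1000 pages each gives the
# 10^6-10^7 page range quoted for the corpus.
PAGES_PER_VOLUME = 1_000  # assumed average, counting bound journals etc.

for volumes in (10**3, 10**4):
    pages = volumes * PAGES_PER_VOLUME
    print(volumes, "volumes ->", pages, "pages")
```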

A corpus of this size should preferably be limited to a single
field although there are valid arguments favoring simultaneous
work in more than one field. The corpus will, of course, be
available in the original document form as procured, and it
should also be reproduced in microform to permit fast,
automatic, local copying and to provide uniform-format,
machine-handleable source material for transmission to
remote locations.

The original documents will require roughly 1000 square feet
of stack space, 2000 square feet of user space (for open
shelves) and a staff of three to five professional librarians,
five clerks, and a number of student employees at a yearly
cost of $100,000 to $200,000.

The microform equipment will require 1000 square feet in
the central location (plus, of course, the space used by
remote terminals) and a staff of not more than two or three
people.

The microform machines will perform the functions of:

Enlarging images on a screen for reading.
Making a microform copy.
Making a full-sized hard copy.

There will also be machines for transmitting to remote
locations, which will be discussed below.

For remote-access facilities for documents, we envision
several grades of terminals, ranging from a Touch-Tone
telephone with voice-answer-back to quite elaborate
equipment for textual display or a facsimile receiving terminal
to produce hard copy. The more complex stations would
be connected to the library by coaxial cables. The library
would have machines for transmitting images from micro¬
forms to these terminals over the cables.

Something in the order of 50 remote stations will be necessary
to get a reasonable experimental evaluation of remote access.
If these are close together, in, say, a radius of one mile, then
a simple, coaxial network can be provided to connect them. *
More specific cost estimates would require a system design
and are beyond the scope of the Planning Conference.
*Harvard has just planned a similar network, costing about
$100,000 without any terminal equipment.

In addition to transmission facilities, satellite microform
files are possible and attractive document or catalog sources
in many cases. Also, a fast messenger service from the
library is attractive, particularly as an experiment.

Experimental studies of document access will, in general,
start by creating a facility to provide access in one of the
ways described above. The following series of questions
then can be asked:

1) Does it work at all? For example, can a variety of
documents be read on a cathode-ray tube? What
resolution is required?
2) Will the normal library user use the facility?
3) Is it popular with library users?
4) What are its strengths and weaknesses?
5) What are the costs?

Answers to questions such as these may turn out to be more
valuable and more appropriate to Intrex than the rigorous pair
comparisons and carefully balanced experimental designs of
formal scientific research.

Many possible combinations of facilities for presenting
documents are implied in the overview. (Indeed, the permutations
of possibilities are enormous.) All possibilities should be
considered and the promising ones actually tried. Prelimi¬
nary considerations make four variables seem of primary
importance in describing document access. These four may
in themselves be a sufficient description of any situation.
They are:

The mode of presentation to the user as a) full-sized
hard copy or original; b) microform plus a reader;
c) soft copy on a cathode-ray tube.
The time between the user's request and the
presentation of the document to him.
The cost to the user to obtain access in terms of
money, his time, and his physical movement.
The actual cost of providing the service.

An interesting choice exists between two possible methods
for handling the original documents:

The documents can be arranged on open
stacks, according to some existing classi¬
fication method.
The documents can be arranged in closed
stacks, according to a classification method,
size, or in acquisition-number order.

The first scheme would permit easy browsing and would in
general display the material in a manner most similar to
existing libraries. However, it will be difficult to measure
accurately the in-person use of documents. The second
scheme would allow a record to be made each time a book
is removed from the closed stacks. However, it would
mean that no comparisons could be made between experi¬
mental systems and traditional open-stack libraries, and
it might modify or inhibit use quite severely. We propose
that the documents be arranged according to a classification
scheme in a book stack which can be open or closed, depend¬
ing on the particular experiment. Furthermore, stacks
might be open to some users and closed to others.

A technical innovation that would resolve this dilemma is a
device to record the usage of a book on an open stack.
of such a mechanism seems possible, but the best approach
is not altogether clear. Some experimentation would be
necessary before this possibility could be accepted or re¬
jected. Hence, the Intrex Planning Conference cannot make
a recommendation.

Microform System
Microform systems have traditionally served two major pur¬
poses: space compaction (with associated ease of access),
and dissemination of document copies. The future uses of
microforms may encompass a more active role in the informa¬
tion transfer process. These uses include files of catalog
data, informative abstracts or extracts, current-awareness
services, microforms as a primary source for transmission
to soft displays and remote hard-copy production, dissemina¬
tion of document images in microform to personal user files,
and a publication medium. Detailed discussions of possible
microform systems and components are given in the
appendices. Specific choices of system components and con¬
figurations await an initial study of the total systems require¬
ments, taking into consideration the diverse nature of the
input material, user requirements and preferences, forms
of microstorage and access, economic constraints, inter¬
facing with adjunct facilities, and needs for dissemination
among the population of users.

The major system components and component functions are
as follows:

Microphotography. A variety of equipment is
commercially available for photography of primary documents.
Specific choice of format and reduction ratios requires both
total systems planning and experimentation. For creation
of microfilm catalog files, photography of computer output
is possible. Computer-compiled data are displayed on the
face of a cathode-ray tube and photographed on either 16mm
or 35mm microfilm. The rolls of film are then processed
and issued in roll form or cut into strips of appropriate
length for the catalog files. An alternative is to microfilm
the fan-fold print-out from a computer output impact printer.
Commercial equipment is currently available for this pur¬
pose and can be used to prepare film records of computer
holdings and compilations.

Film Processors. The central photographic
facility will provide prompt and careful processing of the
microfilm generated either by direct photography or by
photography of computer output.

File Considerations. For small files of images
with limited numbers of indexing terms, it may be possible
and advisable to store the microimages with integral
indexes for search. Either magnetic striping or optical code
patterns can be used for chosen descriptors which are placed
contiguous to the images and are searched by appropriate
logic.

For larger files which may involve large numbers and
varieties of indexing terms, the microimages should be
filed by numerical addresses only, with no attempt to file
by a subject- or author-oriented classification structure.
New acquisitions do not require interspersing within the
existing file, but can simply be added to the end of the file.
If an image is to be voided, note is made of this within the
computer catalog, and no further references are made to
this address in future computer displays.
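The addressing discipline just described, in which new images are appended at the end, nothing is re-filed, and voided frames are suppressed through the catalog, can be sketched as:

```python
# Append-only microimage store with catalog-side voiding, as a sketch.
class MicroimageFile:
    def __init__(self):
        self.frames = []     # position in the list is the numerical address
        self.voided = set()  # addresses the catalog will no longer display

    def add(self, image_id):
        self.frames.append(image_id)   # new acquisitions go at the end
        return len(self.frames) - 1

    def void(self, address):
        self.voided.add(address)       # frame stays on film, unreferenced

    def lookup(self, address):
        return None if address in self.voided else self.frames[address]

store = MicroimageFile()
a = store.add("report-17")
b = store.add("report-18")
store.void(a)
print(store.lookup(a), store.lookup(b))  # None report-18
```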

For access to a document, the computer would specify the
storage address and the automatic selection equipment would
display, copy, print or transmit the addressed image. Either
complete automation of selection (computer-controlled) is
possible, or a semi-automatic selection could be used. Fully
automated systems have the advantage of eliminating manual
handling (with associated time delays), but are at present
expensive, mechanically complex, and generally not
commercially available. Semi-automatic systems would
store the film images in a small number of containers. The

document address would specify the appropriate container to
be manually selected. Within the container of film, automatic
search to the specified frame is possible with presently avail¬
able equipment. The semi-automatic system offers flexibility
in that the selected image can be manually placed in any one of
a number of display, transmission, or copying facilities.

Specific choices among these alternatives await complete
systems designs at a later stage of Intrex.

Microfilm Readers. A great variety of microfilm
reader-printers is commercially available for the various
film formats. Very few, if any, of these were designed with
library applications in mind. A significant contribution which
might be made by Intrex would be the specification, design,
and evaluation of improved reader-printers.

Copying Devices. A variety of copying devices is
required, both centrally and at the remote user locations.
Included are replicating equipment for microfilms, hard-copy
printers from microfilms, and document-to-document copies
at the library and at user locations.

Soft Display. The microform system for Intrex
may utilize video display of document images at the user's
console. (This should be distinguished from digital display
at the console, as discussed elsewhere.) The video presenta¬
tion must be of high resolution, with from 1000 to 2000 scan
lines per image. Coaxial cable or microwave is required
for transmission. Slower-scan systems requiring less band¬
width and some form of image buffering may some day become
feasible, but, as yet, no clearly satisfactory technology has
emerged.

Facsimile. The soft display console discussed
above permits, at a minimum, photography of the display
to obtain microform or hard copy at a remote location.
Beyond that, it may be feasible within the Intrex time scale
to obtain experimental facsimile-like attachments for direct
production of hard copy.

Facilities for Remote Access to Documents


The use of a remotely accessible catalog makes practical
many possibilities for remote access to documents. In
general, the catalog will provide an accession number identi¬
fying a desired document and perhaps also give its location.
Armed with this information, the user can proceed in many
ways. He can call the library either by phone or by computer
and ask to have material delivered to him by mail or by special
messenger. The material can be either a microform or a
full-sized document. If equipment is available, he can have
document images transmitted to him electrically; these can

either be displayed on a cathode-ray tube or reproduced by
facsimile techniques. Reproduction could be either as micro¬
images or as full-sized text, though the former makes little
sense as it costs almost as much in this case as a full-sized
image. Alternatively, using the accession number, he may
consult a remote microimage store.

To test these various possibilities, we envision a series of
remote-access stations, going from a very simple, cheap
facility to a complicated and costly facility. Naturally there
will be more of the former than of the latter. The simplest
facility might be a telephone which could interact with a com¬
puter via a Touch-Tone telephone and voice-answer-back.
The next might be a telephone with perhaps an associated
microimage file. The next facility would add a teletypewriter
connected to the computer. None of these facilities really
transmits document images. To achieve this, possibly a
cathode-ray tube is necessary. The most expensive facility
would include a high-quality facsimile receiving terminal
(which is discussed in an appendix).

The cost of various existing equipment ranges (in rental terms)
from $3/month for a telephone, $50/month for a teletypewriter,
perhaps $100/month for a cathode-ray station, to about $1000/
month for a facsimile machine. Thus, in an experimental
network, almost unlimited telephones are possible, 100 tele¬
typewriters and cathode-ray stations seem reasonable, but
five to ten facsimile machines would seem expensive at pres¬
ent costs.
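At the rental figures quoted, a network of the suggested size stays in a modest monthly range. The station counts below are one illustrative mix (the report leaves telephones "almost unlimited" and facsimile at five to ten):

```python
# Monthly rental for one illustrative station mix, at the quoted rates.
rate = {"telephone": 3, "teletypewriter": 50, "crt": 100, "facsimile": 1000}
count = {"telephone": 200, "teletypewriter": 100, "crt": 100, "facsimile": 5}

monthly = sum(rate[k] * count[k] for k in rate)
print(f"${monthly:,}/month")  # $20,600/month
```

Note that the five facsimile machines alone cost as much as the hundred teletypewriters, which is why the report treats facsimile as the scarce resource.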

It is proposed to transmit images for the cathode-ray tube
and the facsimile machines by coaxial cable. Many
bandwidth-saving schemes involving either storage devices or digital
coding of text and images are conceivable, but none of these
is now available commercially. Also, most of them require
complicated terminal equipment. A coaxial network over a
small area, such as is occupied by MIT, can be installed for
a modest price. A coaxial system could also be used for
other communications purposes in MIT (such as closed cir¬
cuit TV and communication between computers).

Ordinary telephone lines with added regenerative repeaters
can effectively transmit digital information at rates of about
500,000 bits/second. Such a scheme is economical for areas
the size of MIT and could provide a flexible, digital network.
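For scale: at 500,000 bits/second, an uncompressed binary page image of about a million picture points (our assumption for resolution, not the report's) would take roughly two seconds to send:

```python
# Transmission time for one page image over the 500,000 bit/s line.
LINE_RATE = 500_000           # bits per second (report's figure)
BITS_PER_PAGE = 1000 * 1000   # assumed 1000 x 1000 points, 1 bit each

seconds = BITS_PER_PAGE / LINE_RATE
print(seconds, "seconds per page")  # 2.0 seconds per page
```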

One other communication scheme that may deserve attention
is a campus-wide pneumatic-tube system. Such a system is
not expensive to install or to maintain, and would have a high
capacity for the fast distribution of microphotographs and
limited quantities of hard copies.

Some Experimental Studies Inherent
in Document Access

The programs that follow are designed to explore the costs,
readers' requirements, and values attached by readers to
such improvements. General objectives of the programs are:

To provide guaranteed access within a reasonably
short period of time to the text of documents held
by a model library.

To provide access to the text of documents in a
variety of different forms, with the reader making
the choice of forms, under suitable restraints or
controls.

To provide access to the text of documents at
different rates of speed, with the reader determining
the rate of speed to be used.

To provide a closely observable environment in
which the reader's choices, his use of resources,
and other pertinent reactions can be observed and
recorded.

To provide remote access to documents at one of a
number of library terminals.

We believe experiments in a real-life environment are
required, preferably with at least three or four sharply
dissimilar disciplines. There should also be a reasonably
wide range in the levels of readers to be served in the
sample group and it is recommended that these range from
undergraduate to graduate students and from junior to senior
members of the faculty, as well as research staff.

In general, the experiment is designed to offer the reader
a wide range of choices in the speed of document access, the
physical form of document access, the ability to retain, to
borrow, to purchase or rent (?) a document, to use remote
transmission or transport and delivery facilities, or, if he
prefers, to use the library in person. Each of these choices
will have a set of related costs. The experiment will be de¬
signed to determine: 1) the true costs of different levels of
service as outlined above; 2) the amount of use by different
readers of different modes and speeds of access in relation
to different kinds of material and for different kinds of reader
objectives, insofar as these can be ascertained. Finally, the
experiment will be designed to collect the expressed subjec¬
tive reactions of readers to different modes and levels of access.

It should be noted that the experiment offers a complex array
of patterns from which a reader will be free to choose, rather
than simply a dichotomous situation in technology or access
time. Within the range of possibilities provided by the experi¬
mental library, the following confrontations of reader and
document seem of interest.

The reader may come in person to the library to
consult a document.

The reader may come to the library to borrow a
document for a fixed period of time.

The reader may purchase a document from the
library or from a bookstore associated with the
library.

The reader may request that the library copy of an
original document be delivered to him.

The reader may come to the library and, if the
original is not available, may consult the master
microform copy in the library.

The reader may have made and borrow a copy of
the microform.

The reader may have made a copy of the microform
to take and keep.

The reader may request that a copy of the microform
be sent to him by messenger, either for permanent
retention or on a loan basis.

The reader may request a full-size photocopy of the
document (or of a portion of the document) to take
with him from the library.

The reader may request that a full-size photocopy
be sent to him by messenger (immediately or
delayed).

The reader may exercise the option of making
full-size photocopies on some of the enlargement or
photocopy equipment located elsewhere in the
institution.

The reader may request the display of microform
images on a remote CRT console.

The reader may request the printing of a facsimile
copy from the master microform on a remote
apparatus.

The reader may request the display of the text of
the document from the digital store at a remote
station.

The reader may request print-out of the text from
the digital store at a remote station.

In the options outlined above, the experiment might provide
certain services (for example, the consultation of a book in
the library, or the borrowing of a book from the library)
"free". Services beyond this point would be charged for,
with the rates being structured to reflect approximately the
eventual incremental costs of such special services to the
institution. Whether the participant uses real money or a
budgeted balance of "free" money assigned to him is
immaterial. The important restriction is that there should
be a limit to the amount of "money" available to any
individual in any finite period of time.

The charge on services is designed to prevent unrealistic
personal exploitation of the institution's resources
and to relate reader's choices, at least approximately, to
the costs of the services and the judgment by the reader of
the value of the improved service he is requesting, in terms
of speed or form of access.

As examples in the application of these techniques, we can
envisage a situation in which the reader has consulted the
resources of the library by means of a computerized catalog,
has found the document that he wants and then presses a
button, "Advise on Availability". The computer would
respond with the price of various options, including a
special messenger to bring a book, a regular messenger
trip 24 hours later, a copy of the microform by special
or by regular messenger, a remote display, a hard-copy
print-out, etc. The reader would elect the option he pre¬
ferred and so indicate on his computer console, from which
the library would be advised of the necessary responsive
action required.
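The exchange could be sketched as follows. The option prices and the reader's allowance are invented for illustration; the report requires only that charges approximate incremental cost and that the experimental "money" be limited:

```python
# "Advise on Availability" sketch: quote option prices and debit a
# limited budget of experimental "money". Prices are invented.
PRICES = {
    "special messenger": 2.00,
    "regular messenger (24 h)": 0.25,
    "microform by messenger": 0.50,
    "remote display": 0.10,
    "hard-copy print-out": 0.75,
}

def choose(option, balance):
    """Debit the quoted price, refusing when the allowance runs out."""
    price = PRICES[option]
    if price > balance:
        raise ValueError("allowance for this period exhausted")
    return round(balance - price, 2)

print(choose("remote display", 5.00))  # 4.9
```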

It will be noted that the above outline presupposes that all
the relevant documents are held in either the model or

the MIT library. This can be a condition of the experi¬
ment, but one could also provide a special staff for the
aggressive pursuit of materials not locally held. The same
kinds of charges could be used for the support of such services.

The selection of the experimental participants should not be
too difficult, once the fields are chosen. We would recommend
selection of some percentage of the student, faculty and re¬
search population in the field of each model library. These
should not initially be volunteers, but rather be invited on a
random or some other independent basis. We assume an
experimental group of from 100 to 500 may be both desirable
and feasible. With this experiment, a carefully chosen, non-
participating control group is probably not necessary, though
it would certainly be valuable to have a precise analysis of
library use by such a non-participating group as a basis for
comparison with the use patterns of different segments of
the experimental group.

These experiments must be accompanied by a carefully designed system of record keeping. Insofar as the computer catalog is concerned, requests for documents in the library are made through it, hence record keeping can easily be provided. Furthermore, since the user will be asked to state in what form he wishes the document, that record also will be kept by the computer.

The monitoring of the TV display should also be considered if a user can call up documents without using the computer for ordering and if the TV display can automatically retrieve at his command. If this degree of automation is provided, it should also be possible to add record keeping. On the other hand, if the TV display is served only by manual loading of documents by a librarian, the record keeping could be accomplished by somewhat less automatic means.

The above discussion includes a great number of potential experiments. To make their nature more concrete we will present more details for one particular experiment — the Assured Access test. Its objectives are:

To guarantee to users access to any document held by the library until it is officially withdrawn.

To determine user preference, as between use of the original document in the library, and purchase of a copy to take away.

To observe and evaluate reader reactions to this type of service and to accumulate cost data.

The "joys" of this library are the complete absence of losses, of circulation records, of the pressure upon readers to return books, plus the assurance to the reader that the book will be in the library.

In this experiment, the model library will discontinue all lending for use outside the library. It will inform the clientele that all original documents must be used in the library, but that prompt copying service will be available in place of loans. Rates charged for copies would be one of the prime experimental variables. They would be varied from nothing to the full cost of copying. The users' behavior and satisfaction with the service would be observed as a function of this cost.

Facilities would be available to provide either microform or full-sized copies of documents. Different charges for these would be used to measure their value to the user.
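The record keeping for this pricing experiment can be sketched simply. The field names and the idea of logging each offer with its price are assumptions made for this illustration; the report specifies only that behavior be observed as a function of cost.

```python
# Log of copy offers made during the Assured Access test (hypothetical form).
records = []

def log_offer(user, document, form, price_cents, accepted):
    """Record one offer: the form offered (hard copy or microform), the
    experimental price charged, and whether the reader accepted it."""
    records.append({"user": user, "document": document, "form": form,
                    "price": price_cents, "accepted": accepted})

def acceptance_rate(form, price_cents):
    """Fraction of offers at a given form and price that were accepted —
    the behavior to be observed as a function of cost."""
    offers = [r for r in records
              if r["form"] == form and r["price"] == price_cents]
    if not offers:
        return None
    return sum(r["accepted"] for r in offers) / len(offers)
```

Comparing `acceptance_rate("hard", p)` with `acceptance_rate("micro", p)` across the experimental price range would yield the value measurements the test seeks.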

The details of the service could be arranged in a number of ways. As an example, one procedure could be as follows:

Journal articles: Copies would be made on request at X cents per page or X cents per microform containing multiple pages.

Research reports: A small stock of duplicates of popular reports is kept on hand for immediate distribution. Copies are made on an established pricing scale, or more reports are purchased to maintain stock as necessary.

Books: The library would maintain a stock of originals of books in print for sale at the regular retail price, less the cost of lending a book. It is recognized that estimating the stock of books to be available for sale would be difficult at the outset, but it should be similar to that of a small bookstore. After some experience, this should not be a serious problem. A more serious objection may be mixing a business with a library, which is ordinarily a non-profit and tax-exempt institution. This difficulty can probably be overcome by having an external bookstore do business in the library. In any case, the result must be that the customer can walk out of the library with a purchased book under his arm.

In this test the copies might be full-size hard copies or
microforms; or, for a given period, all copies might be
hard copies; for another period, all copies might be in
microform. Reading devices would be loaned by the
library.

This test should help to decide whether readers are sufficiently concerned about having assured access to a wanted document to:

Accept the possible inconvenience of using it in the library, or

Pay a (modest?) price for a take-away copy.

2. INTEGRATION WITH NATIONAL RESOURCES

It is a basic assumption of this investigation that a student or scholar needs access to recorded information which only he can select, regardless of its form or geographic location. This access is dependent upon:

1) Availability of the desired records, stored in libraries and information centers as books, journals, reports, tapes and other media;

2) The existence of bibliographical and intellectual records, describing the desired information and stating its locations;

3) Procedures and facilities for making the material available to the user.

These elements are currently divided between the local library and a large voluntary national and international network. This informal network has certain characteristics which make the development of better network channels essential if information is to be accessible efficiently and on a broad base. It is the general objective of this investigation to experiment with possible features of a system for the better utilization of both local and global resources.

Among the most conspicuous deficiencies of the present network is the lack of adequate local subject analyses of recorded literature. The local libraries simply cannot undertake duplicative, detailed, intellectual analyses of all the materials their constituencies must use. On the other hand, it is evident that these analyses can be undertaken by national libraries, by special information centers, by professional groups, and by certain kinds of special libraries.

Any local library can expect to meet only a portion of the total informational requirements of a constituency with broad needs. The remainder must come through access by borrowing, photocopying, or other means to the holdings of national libraries, regional or national "backstopping" libraries, or other cooperating libraries. An alternate mode of textual access may develop in the form of high- or moderate-reduction microforms of large blocks of material widely distributed. Much developmental work is necessary before this can become economically feasible and attractive from the point of view of the institutional library.

The bibliographical analyses of many information centers are widely distributed through printed indexes, abstracts and bibliographies, but it is evident that these bibliographic records often do not provide the level of analysis that is needed and that may eventually be provided by modern computers in combination with new techniques of analysis.

One way to employ these powerful search techniques would be for each information center to supply copies of its tapes to every interested local library. This would be a costly form of distribution, difficult to keep fully current. Many local institutions might not have search requests in sufficient volume to justify efficient local computer searches.

Substantial, though incomplete, locational data for books and journal titles already exist in the printed National Union Catalog, the retrospective National Union Catalog, and the Union List of Serials. None of these sources is presently based upon machine-readable data, but such conversions are being considered and should not be difficult.

OBJECTIVES

There is probably an optimum division between the services and resources to be generated and maintained locally, and the services and resources that can be provided by efficient access to national or regional networks. This experiment is designed to explore some of the important variables associated with a network-access scheme that might find general use. Experiments will be carried out with high-speed access to one or more national information centers: to provide quick bibliographic searches in depth that would not be possible locally at any reasonable level of expenditure; to provide a broader base of full-text access than could reasonably be developed locally; to analyze the detailed costs of such access; and to test a new pattern of remote computer literature searches.

ASSUMPTIONS

We assume there will exist at MIT an experimental on-line automated catalog with dialogue capability; an effective liaison with one or two specialized information centers, which would provide for the prompt exchange of information; a communication facility from MIT to the selected information centers; a full, up-to-date set of descriptor terms in the local computer memory for the analysis of each body of literature; locally available instructions for the specification of the various types of searches; and printed bibliographies from each of the selected information centers, either in original, or in a (computer-controlled?) microform display device.

TYPICAL EXPERIMENTAL USE

The user enters into a dialogue with the Intrex computer, through which he defines or specifies his field of interest; depending upon the field and the nature of his inquiry, the computer may provide a reference from the locally stored, general bibliographical data, or refer him to the printed index-catalog of an information center. Alternatively, the system may assist the user in determining a set of the descriptors, suitable for his needs, for a custom search of the bibliographical data stored at an appropriate information center. This search specification is first checked by the computer to make sure that it is a feasible search and rechecked to make sure that it does not duplicate an analysis already available in a printed index or some other form. The search specifications are transmitted in a form suitable for direct computer input to the appropriate cooperating center. The reader is then informed when the results of his search will be available. The results are transmitted back as soon as available and go directly into the user's file without intermediate print-out. The user may inspect the results, and if the national locational information has been automated, may request a locational search of the bibliography, or a portion of it, to indicate availability. With this information at hand, the reader may request the text of those items he desires from the most suitable source.
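The checking sequence described above can be sketched as a small validation step run before transmission. The descriptor vocabulary, the set of already-printed analyses, and the result-delivery convention below are all stand-ins invented for this illustration.

```python
# Hypothetical local data: valid descriptors and searches already covered
# by a printed index (so a custom remote search would be duplicative).
KNOWN_DESCRIPTORS = {"lasers", "masers", "semiconductors"}
PRINTED_INDEXES = {frozenset({"lasers"})}

def specify_search(descriptors):
    """Check a search specification for feasibility, then for duplication,
    before it is transmitted to the cooperating center."""
    terms = set(descriptors)
    unknown = terms - KNOWN_DESCRIPTORS
    if unknown:
        return {"status": "infeasible", "unknown": sorted(unknown)}
    if frozenset(terms) in PRINTED_INDEXES:
        return {"status": "duplicate", "advice": "consult printed index"}
    return {"status": "transmitted", "terms": sorted(terms)}

def deliver_results(user_file, results):
    """Results go directly into the user's file, without intermediate
    print-out, as the text specifies."""
    user_file.extend(results)
    return user_file
```

A search that fails either check never burdens the information center; only validated, non-duplicative specifications are transmitted.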

It is also desirable to incorporate and test a selective dissemination capability in connection with the network concept. Such searches would be run at the information center computer and data would be placed in the user's personal file in the Intrex system. These searches would be based upon standing requests by the user, or upon information developed by the system concerning each user.
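The standing-request matching at the heart of selective dissemination reduces to a simple profile intersection. The user names and interest profiles below are hypothetical examples, not part of the report.

```python
# Hypothetical standing requests: each user's set of interest descriptors.
profiles = {"jones": {"masers", "lasers"}, "smith": {"census"}}

def disseminate(new_item_descriptors):
    """Return the users whose standing requests match a new accession,
    i.e. whose personal Intrex files should receive the citation."""
    terms = set(new_item_descriptors)
    return sorted(user for user, wanted in profiles.items() if wanted & terms)
```

Run at the information center against each new item, this yields the list of personal files to be updated.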

GENERAL COMMENT

This system should be able to test in a real environment the user's ability to design searches for various systems, without unduly burdening the information center. It must allow each center to run such searches on-line, or in batches as may be convenient.

The experimental procedure suggested is not limited to any one information source, but would be a direct, inexpensive interface with any citation retrieval service. The local users would converse only with the Intrex service, which would be linked eventually with a number of centers.

SELECTION OF INITIAL INFORMATION CENTERS

It is suggested that this investigation first utilize the services of two major specialized information centers. To generate enough use, it will be highly desirable for the liaison centers to have a reasonable subject match with those individuals who have Intrex consoles — though some supplemental consoles will also be desirable. We propose that the Medlars bibliographical capability at the National Library of Medicine, and the NASA information system be used. Medical literature is of increasing interest to MIT and Medlars covers this vast field in great depth with a very fast service. The NASA system also covers a broad field of mission- and subject-related literature of exceptional interest to MIT. Both have batch-search capability and both issue extensive bibliographies in both tape and printed form. These two offer interesting contrasts in types of users, types of literature covered, and types of searches that are possible. Furthermore the experiments should help both centers to modify and improve their services.

OTHER USERS

It is desirable to tie a variety of users into a network test. A capability for Medlars searches would be extremely desirable in the Harvard-Boston medical community, and this access path could also tie directly into the Harvard-Yale-Columbia computerized combined medical library catalog project for full-text availability. Therefore, one or more additional consoles should be placed in this medical community that could, through local or Intrex computers, have the same dialogue-search capability. An identical arrangement is proposed for linking a test console facility for access from (and to) the NASA Cambridge Electronics Center. Additional consoles should be installed at both the NLM and NASA information centers.

TELETYPE LINKS

In addition to use of computer-based national bibliographic services, an effort should be made to tap other national information resources. This might be accomplished by installing teletypes in these facilities. Then a user could utilize the Intrex system to switch him to a line which will allow him to discuss his problem, via teletype, with a librarian at an appropriate remote national resource. Such a rudimentary cooperative program might be phased smoothly into later, full computer-based networks.

OBSERVATIONAL OBJECTIVES

The data to be expected from these experiments require very detailed observation of the uses made of the facilities, the kinds of searches made, the reasons for search failures, and very detailed analysis of costs.

The proposed system is clearly a conservative approach to long-range network possibilities. The proposed investigation permits significant alterations in both local and information center operations without impairing operations or engendering costly revisions of software or hardware. It is clear that netting of libraries and information centers on a much wider and deeper basis than at present will be essential. It is also clear that many experiments of the type suggested here must be performed before an adequate engineering approach to network design will be possible.

3. FACT RETRIEVAL

In Chapter IV, a new form of man-machine partnership has been anticipated. In this future system, documents might completely disappear, and the interaction between man and computer might be on the level of facts and ideas. It is not inconceivable that a user ultimately (say in 1999) might be able to ask the system, "What important new developments have there been in the theory of social interactions?" and receive a clear, carefully constructed, 2000-word essay discussing the new ideas that have been developed and entered into the system. In a continuing dialogue, he could pursue in depth those that caught his interest, developing them still further in partnership with the active information store.

It is impossible to put a precise time scale on the development of this symbiotic system, but it is clear that research on computer manipulation of facts and ideas is an important avenue to future information transfer systems. The Planning Conference considers it important that Intrex invest a substantial fraction of its resources in pursuit of this longer-term objective. This area of more distant research must balance the significant, but more conservative, goals represented by the construction of the model library and the participation in the national citation and document retrieval network.

To enter this vital area of research, a starting point and set of sub-goals must be chosen which will act as a seed for the growth of more advanced systems. Within the purview of fact and idea retrieval, it is clear that we can address problems with a wide variance in difficulty. The input might vary from structured indexed facts, through structured text, to complete documents in free format. Similarly, demands for output might vary from well-specified names of facts, through structured approximate descriptions of the facts desired, to questions entered in natural language. Finally, the output might range from labeled data to coherent essays constructed upon demand. Thus this broad area includes problems of input content analysis, fact and idea organization and storage, deductive systems, problem solving, language analysis and synthesis, text manipulation, specification of computational algorithms, man-computer interaction languages, and many others. Ultimately, it involves the construction of an artificial intelligence system of the highest quality.

We have proposed, as a rudimentary entry into this complex area, several steps toward the implementation of an automated handbook. These will include: an automated index to some currently published handbooks; storage of the facts contained in sections of selected handbook(s); access to these facts, including sophisticated searches for information; and maintenance of an automated notebook or data bank, continually updated with current experimental information, which can be used for a multiplicity of purposes or users. We assume as a basic facility for these projects a large, general-purpose, time-shared computer system such as that in use at Project MAC; such a system would undoubtedly be at the heart of the model system. Attached to the system will be a variety of consoles to facilitate delivery and display of requested information.

THE AUTOMATED INDEX

A first step toward automatic fact retrieval is a program to assist a user to locate in currently published handbooks the subsection containing facts he desires. A novice in a field (and sometimes even an expert) may feel sure that published information exists which will be pertinent to a task he is engaged in, and yet not know what sources to try. Currently, he either asks the reference librarian, regenerates the needed data himself, or gives up. An automated, merged index of all the available handbooks relevant to a subfield (perhaps even the subfield of the model system) would provide a user an avenue into the bewildering forest of available reference material. A user who asked "What is the melting point of Dragon's Blood?" might be answered "Tables are available of melting points of metals, alloys, resins, organic compounds, ... Which category best fits this substance?" Answering "Resin", the user would be told that "Melting points of resins appears on p. 1508 of the 44th edition of the Handbook of Chemistry and Physics and on p. 737 of ... " Turning to the copy of this handbook next to the console (perhaps on microfilm), the user retrieves the desired fact, T = 120°C.
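The two-step dialogue just described can be sketched as a merged-index lookup. The category-to-reference table below echoes the Dragon's Blood example but is otherwise invented; only the resins page reference (p. 1508, 44th edition) comes from the text.

```python
# Hypothetical merged index: category -> list of (handbook, page) references.
# Only the "resins" entry is taken from the text; the rest is illustrative.
INDEX = {
    "resins": [("Handbook of Chemistry and Physics, 44th ed.", 1508)],
    "metals": [("Handbook of Chemistry and Physics, 44th ed.", 1522)],
}

def melting_point_categories():
    """First system response: the categories for which tables exist,
    from which the user picks the one fitting his substance."""
    return sorted(INDEX)

def locate(category):
    """Second response: where the table for the chosen category appears."""
    return INDEX.get(category, [])
```

The user then turns to the cited page of the handbook at the console (perhaps on microfilm) to retrieve the fact itself.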

Continuing the development of the index, one could create more detailed index supplements for each handbook. Such supplements would make it much easier for a user to determine if the section referenced would really be of use to him. With extensive hierarchical cross-indexing, and with the books immediately at hand, a user can engage in a dialogue with the system and locate his desired fact, perhaps from references he didn't even know existed.

In order sensibly to employ such a facility, it would be important to design and implement a sophisticated interaction language. Many users come to the reference librarian without knowing exactly what facts they are really looking for; a cleverly designed dialogue should be able to extract this information from the user. Such "programmed interrogations" will be useful not only in the context of fact retrieval as we have presented it here, but will probably be a necessary adjunct to all the man-computer interactions in the model system. It is imperative therefore that the design of interaction languages be an early goal of the Intrex project. In the interim, before the ultimate interactive system exists, one command, "HELP", should connect the user to a human reference librarian.

The initial cost of this first fact-retrieval experiment is difficult to estimate. However, the incremental cost of processing requests should be competitive with utilization of published handbooks (if we count the users' time as part of the cost), except for the simplest look-ups in a section already known to the user. It would be a continuing project; the first useful programs would probably be available within six months to a year of initiation of work.

Access to this "automated index" program would be provided through consoles placed in the present MIT libraries, close to the reference librarian. Doing so would provide a direct comparison between the old and new services, yielding information concerning the entire library community. In addition, consoles can be placed in strategic locations throughout the MIT physical plant, conjoined with microfilm readers and microfilmed collections of the appropriate handbooks themselves; this arrangement would provide some information about the desirability of such remote service.

THE AUTOMATED HANDBOOK

As the next step toward fact retrieval systems, the contents of a particular handbook or sections of selected handbooks will be put into digital form for direct access and manipulation by computer. The choice of material will probably be influenced by the choice of the field of specialization of the model system. However, a general handbook such as the Handbook of Chemistry and Physics would provide more experience with a variety of forms of "facts". Some of the advantages of this completely computerized data store over a conventional printed handbook would be: currency, versatile organization, computation with data, output format control, and depth of detail in data. Each is discussed below.

(1) Currency. Handbooks and data stores should be continually updated and edited. This could be done at low cost for an on-line data store. New or more precise values for physical constants could be entered as soon as they are approved by an appropriate regulatory body; before that, they could be entered labeled with source and date. In rapidly changing fields, such as materials science or nuclear physics, this feature would be of great value. For social science, current data could be utilized as soon as available and compared with earlier studies.

(2) Versatile Organization. At present, one can conveniently answer questions only about combinations of variables chosen for printing by the original publisher. In an automated handbook, it would be quite reasonable to ask, say, for a list of melting points of metals with shear strength above a certain level and density between two limits. This kind of question can be answered from current published handbooks, but only after arduous labor by the user.
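The example question above maps directly onto a filtering query over a property table. The metal names, units, and figures below are fabricated sample data chosen only to make the query concrete.

```python
# Fabricated property table standing in for an automated handbook's store.
METALS = [
    {"name": "A", "melting_c": 660, "shear": 30, "density": 2.7},
    {"name": "B", "melting_c": 1085, "shear": 48, "density": 8.9},
    {"name": "C", "melting_c": 232, "shear": 12, "density": 7.3},
]

def melting_points(min_shear, density_lo, density_hi):
    """Melting points of metals with shear strength above a certain level
    and density between two limits — the question posed in the text."""
    return {m["name"]: m["melting_c"] for m in METALS
            if m["shear"] > min_shear
            and density_lo <= m["density"] <= density_hi}
```

The published handbook fixes which such combinations are printed; the automated store answers any of them on demand.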

(3) Computation. At present, published data are in a passive store where they can be retrieved, but in which no computation is possible. Often the facts required by a user are computationally related to those in the published compendium. Frequently formulas are given, but it is left as an exercise for the reader to evaluate the formula with appropriate substitutions. This practice may require a user to descend through several levels of auxiliary formulas before the basic data available can be used. This type of process could easily be automated in an on-line, computer-based model system.

(4) Output Format Control. Often data are published in a format that makes them difficult or impossible to use. For example, with a computer-based system, one could request plots of information usually stored in tabular form. The particular parameters of the graph might be subject to wide variation according to the purposes of the experimenter. With further programs, one might even present three-dimensional information in stereographic form, achieving effects unavailable in present handbooks. Of course, this type of output has strong implications for console construction and program design.

(5) Depth of Detail. Current publications usually contain only summaries of data, because it is too expensive to print many copies of the raw data. Moreover, it is difficult to extract useful information from large quantities of printed text. This is discussed in more detail in the section on the automated notebook.

Implicit in the discussion above on the advantages of a computerized handbook are design criteria for input-output characteristics of such a system. An integral part of this work will be the design of the language of interaction between the user and the automated handbook. It must be flexible enough to encompass requests for storage of information, retrieval specifications, algorithms for computation, descriptions of desired output formats, and references to the data store. (The possible structure of interaction languages is described in more detail in an appendix.)

Another serious research problem embedded in this project is the determination of efficient structures for storing information in computers. A straight encoding of all the text of the Handbook of Chemistry and Physics would occupy approximately 10^ bits of storage. This does not include the necessary deep indexing discussed earlier. However, by matching the storage structure to the information to be encoded, radical reduction in storage requirements can be achieved, concurrently with reduction in retrieval time. For example, the 20 pages of tables of square roots, cube roots, etc., can be replaced by a very short (and fast) computer program to compute requested values. Similarly, many pages of tables of experimental data can often be compressed to a single (perhaps too complex for human use) empirical equation.
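The square-root and cube-root example can be made concrete: what a printed table stores as pages of figures, a short program computes on request. The function and its field names are an illustrative sketch, not a proposal from the report.

```python
def root_table_entry(n):
    """Return what one line of a printed root table would contain,
    computed on demand instead of stored as ~20 pages of figures."""
    return {
        "n": n,
        "sqrt": round(n ** 0.5, 5),       # square root to 5 places
        "cbrt": round(n ** (1 / 3), 5),   # cube root to 5 places
    }
```

Storage falls from pages of tabulated values to a few lines of program, and retrieval becomes a single evaluation rather than a table lookup.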

A third problem impinging on the effectiveness of a fact retrieval system is the design of consoles. Although this is discussed in more detail later in this chapter, it would perhaps be appropriate to indicate the range of the problem. On the simplest end, it would be quite feasible technically for every student to have available in his room a 12-button Touch-Tone telephone which can communicate with the computer. He could then use the phone as an input for a remote desk-calculator, an admittedly very simple (but important) case of fact "retrieval". Toward the other end of the spectrum, three-dimensional displays may be available. Similarly, we may have equipment for creating hard copy from video display of information resulting from a complex retrieval and data analysis process. Between these we have teletypewriters and simple CRT displays in many combinations. Careful study and human engineering will help match the input-output characteristics with the fact retrieval capabilities of the system.

THE AUTOMATED NOTEBOOK

If we now augment the information found in the automatic handbook with informal experimental data, a new range of services can be considered. Perhaps this might be referred to as an automated notebook, a sort of personal or group data handbook. With this facility, for example, it would be possible to decompose data from different sources into comparable small units. For example, votes are reported by election district; census information by town and county; economic information by industry; weather by weather station, etc. Only if we have sufficient detail can we correlate agricultural production with rainfall and population.

Efforts in this area should be coordinated with the data-bank project currently under way in the Social Sciences at MIT (and described in an appendix). This experimental system will include public-opinion poll data, census data, voting data, and life-history data. The term "bank" is used to indicate that information may be deposited in and withdrawn from the store at times determined by users of the bank, i.e., irregularly. It also implies a multiplicity of depositors and users.

To be effective, a data bank must make it possible for users to interact with it without having precise knowledge of the organizational principles that determined the form of storage of any particular subject of shared data. Interaction here means both depositing information and its retrieval and amendment. In particular, a user must be able to retrieve data organized in one way and have it presented to him in quite another way. Thus the data bank will contain not only data in the ordinary sense, but programs to manipulate and reshape the data. It is obvious that development of this data bank will necessitate an attack on many problems faced throughout the model system, and that cooperation would profit everyone concerned.
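The reshaping idea can be sketched with the voting example from the text: data deposited by election district is retrieved as county totals, without the user knowing the storage form. The district names, the district-to-county mapping, and the vote counts are all invented for this illustration.

```python
# Hypothetical mapping and store; deposits are keyed by election district.
DISTRICT_TO_COUNTY = {"d1": "Essex", "d2": "Essex", "d3": "Suffolk"}
bank = {}

def deposit(district, votes):
    """Deposit data in the form the collector used (by election district)."""
    bank[district] = votes

def by_county():
    """Retrieve the same data reshaped into county totals — a program
    stored in the bank alongside the data itself."""
    totals = {}
    for district, votes in bank.items():
        county = DISTRICT_TO_COUNTY[district]
        totals[county] = totals.get(county, 0) + votes
    return totals
```

The reshaping program, not the depositor's storage layout, determines the form in which the user sees the data.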

FUTURE RESEARCH

Fact retrieval is an open-ended problem which encompasses the organization, storage, digestion and regurgitation of all of human knowledge. The experiments proposed will suggest their own extensions. For example, retrieval of formulas, and subsequent evaluation of sets of formulas for appropriate boundary conditions, will lead quickly into development of a complex computer-aided design system. The necessity for human intervention in organizing the store of facts suggests development of content analysis programs. (See appendix H on this topic.) This implies large-scale text-handling facilities; a by-product might be library services including in-depth indexing, automatic extracting and abstracting, or concordance generation. The existence of large quantities of digital text will encourage research in content analysis both at MIT and elsewhere. Research in fact retrieval by Project Intrex is clearly a necessary step toward the on-line intellectual community of the future.

4. INITIAL FACILITIES

Initial facilities are needed to support the core experiments previously described. The unknowns are not really the characteristics and costs of equipment but, rather, the requirements for equipment. In addition, an operating system is needed in order to develop a system plan and, by an iterative process, software for the efficient use of the planned facilities. Since the hardware of 1975 will not be available in 1965-1970, it will be necessary to simulate much of the 1975 facility, and, in this simulation, flexibility will be very important.

THE COMPUTATIONAL FACILITY

A computer complex is one of the main system components of the library as an on-line system. No set of humans could maintain the required supervision. The computer complex must, furthermore, be a time-shared one in the MAC sense, because many users must be permitted to interact with the library simultaneously and with a minimum of mutual interference. A system of the general power and functional capability of MAC (1965) or its equivalent at the MIT Computation Center is a minimum initial requirement, for no realistic experiment can be carried out with a smaller population than that now being served by the time-sharing systems (approximately 30 simultaneous users), and no such population could be served without the computer storage, processing, and language base that such a system makes possible. Even if the present time-shared system were to be devoted entirely to Intrex service, the range of library services that could be delivered would soon be inadequately matched to stated Intrex goals. Lest this sound too pessimistic, it must be remembered that the IBM 7094 machine which is the heart of the current system was not designed to be a time-shared computer but was made into one by means which should never again prove necessary. The next-generation time-shared systems — one based on a General Electric 645 computer and the other on an IBM 360 machine — are designed as time-sharable systems from the ground up, and will prove considerably more capable than their predecessors (e.g., approximately 150 simultaneous users).

If it is agreed that only a large-scale, time-sharing facility is sufficient to support an Intrex effort, the question arises as to how Intrex can obtain the services of such a system. Intrex can either acquire and maintain its own time-shared computer system, or become a user of one of the time-shared systems currently existing or emerging at MIT. That Intrex should own an independent, large-scale computer system is inadvisable on the grounds that its acquisition and maintenance would prove too costly, and Intrex management ought not to be burdened with the heavy task of operating a computer center. Intrex should therefore look either to MAC or to the time-sharing facilities provided by the MIT Computation Center for its computational support. However, MAC may prove an intractable host, for MAC is itself an experiment and, as such, must maintain its own freedom to an extent which might make serious, long-range Intrex planning impossible. Intrex will require a stable computer service. The MIT Computation Center might provide this more stable base. On the other hand, many projects supported at MAC, such as console hardware and software development, design of interactive languages, and computer-aided design systems, may be usefully coordinated with Intrex efforts. A final decision on the choice of a shared facility depends on administrative and technical questions beyond the scope of the Planning Conference.

The Computation Center presently possesses an IBM 7094
machine which, together with its peripheral equipment, is
operable as a time-shared system exactly as is the MAC
system. It is planned that the 7094 complex of MAC will be
phased out early in 1966 and that the machine complex thus
made available may become coupled to the existing Computation
Center facility. Such coupling would about double the service
capacity of the Computation Center. However, the equipment of
the Computation Center will itself be operationally replaced
by an IBM 360/67 system in about two years, i.e., by about
September 1967. Intrex planning must therefore reckon with a
transient in this computational flow. In any event, it would
appear rational to divide the discussion of the computational
facility into two parts. These two phases must, of course,
blend smoothly, one into the other. They cannot be considered
independently even if they must sometimes be discussed
separately.

The computer-based work during the first phase of Intrex is
bound to be influenced to a significant extent by the knowledge
that the main computational instrument will vanish at the end
of that period. Work carried out on that facility should
therefore be of a kind that will either not have to be
re-programmed or that will benefit from re-programming.
Perhaps a third category, that of machine-independent programs,
ought also to be mentioned, but it does not promise to
constitute an important fraction of the work; this judgment
rests on the fact that currently there are no
machine-independent computer languages in wide use and none
that promise to survive over a long term.

There may be programs that are written with the sole objective
of learning something from their composition and their
exercise, and with no implied issue of survival. One might,
for example, construct an experimental catalog, its associated
search and retrieval mechanisms, and certain data-logging and
tracing machinery (instrumentation) which would be
uneconomically slow in any real operational sense. But the
sure knowledge that this catalog would never be required to
evolve into an operational tool would allow significant
flexibility, contributing to the range of experiments that
might be carried out with it. Simulators and computer models
could also be abandoned once they had yielded their results.

Project Intrex should begin to develop special equipment which
could be attached to the existing facilities during the first
two years. But an independent and free-standing computer
system (e.g., one of the GE, PDP, SDS, CCC, or IBM families of
small to medium systems), coupled to a very large disc file or
data-cell storage, might also be a reasonable tool to
investigate. The main consideration in any such venture should
be that such equipments be acceptable to the future
shared-system facility.

An overriding consideration in the project's use of the MIT
computer facility is that Intrex must have access to that
facility as a matter of right, not as a favor or as a
consequence of some informal gentlemen's agreement. Intrex
should purchase this right by means of outright financial
support, or by contributing equipment, or by a combination of
these two strategies. A piece of equipment that Intrex might
usefully consider contributing might be an additional disc
file to be installed quite soon.

The first two years will yield considerable insight into the
ultimate configuration of the model library. It is certainly
to be hoped that the adequacy of typewriters, Touch-Tone
telephones, and more sophisticated consoles will have been
tested during this initial period, and that over-all system
design criteria related to these instruments (their placement,
numbers, etc.) will have emerged. A central question will be
how to couple these and other equipments (e.g., the
free-standing computer system mentioned above) to the
operational system which will then exist. Granting that Intrex
must have access to the computer facility's machinery as a
right and that such right must be purchased, the idea emerges
that Intrex contribute an additional processor and disc
storage unit (with required input-output controls) to the
computer system. This would enhance the computational power of
the facility in an over-all sense, thus benefiting not only
Intrex but the entire MIT computing community. (It might even
prove possible occasionally to decouple the Intrex-contributed
subsystem from the over-all system and to run it
independently; whether this proves feasible or desirable
cannot be determined at this stage of the planning.)

Intrex should look to MAC or the MIT Computation Center as the
main source of its computing power and rapid-access digital
storage supply. The over-all Intrex facility will probably
have to be augmented by a free-standing computer with its own
storage, as well as by sets of specially designed consoles.
However, such instruments ought, so far as possible, to be
capable of being coupled to then-existing or contemplated MAC
or Computation Center facilities. Participation on the part of
Intrex in a shared facility ought to be purchased as a
contractual right, preferably through Intrex's contribution of
equipment and also by means of direct financial support. An
advantage of such an arrangement is that Intrex gains
immediate access to the whole time-sharing apparatus which now
exists, including the current network of typewriter consoles.
A long-term benefit to Intrex is that Intrex management is
freed from the burden of operating a computer center. The
community of users of this shared facility will benefit by the
availability for certain purposes of very large digital and
image stores and by the console development efforts of Intrex.

SOFTWARE

An important consideration that prompts the above
recommendations is the desire to avoid the necessity for
Intrex to develop, operate and maintain still another
time-shared computer system on the MIT campus. The plan to
couple to MAC or the Computation Center's system does remove
the need to do this. Such coupling, furthermore, makes
available to Intrex the vast store of programs that have
already been shown to be an inseparable part of any powerful,
time-shared computer system; among such programs are the
system monitors, language compilers and assemblers, editing
routines, filing and retrieving programs, and so on. In
addition, as the number of other subsystems making use of this
time-shared facility grows, many opportunities for ever
more-global systems integration will present themselves. It is
certainly foreseeable, for example, that an Institute-wide,
computer-based student, faculty and staff registry will some
day become operational. Such a (perhaps dynamically
maintained) store of information will prove one of several
important bases for the future library's task of selectively
disseminating information. The provision of these operational
and maintained software facilities should permit the Intrex
team to concentrate on the development of programs
specifically directed to Intrex goals while, in fact, already
being a member of a significant on-line intellectual
community. The many programs available directly from the
time-shared system cause it to be a very important part of the
facility available to Intrex.

The development of programs directed specifically to Intrex is
analogous to the development of specifically Intrex-oriented
hardware such as microform readers. The analogy breaks down,
however, in that certain functionally correct elements of
operational hardware can be purchased as off-the-shelf items
immediately. Instruments so acquired often come with
manufacturer's warranties, operating instructions, etc.
However, Intrex software must be designed, compared,
manufactured, debugged, documented, and ultimately maintained
by Intrex itself. Intrex management is consequently faced with
the task of "bootstrapping" from the existing software
facility to a more appropriate software facility. Fortunately,
this sort of problem has been faced many times within the
computer community. The problem will be eased for the period
of Intrex by the availability of high-level languages in which
much of the Intrex programming can be written as
macro-instructions.

STORAGE

Adequate storage must be provided to enable significant
experiments to be carried out on both the augmented catalog
and full text. The first requires a large digital store, the
organization of which is a matter for experiment, while the
full-text file can be done with an analog store (photographic
images or magnetically recorded video scans). Factors to be
considered are the continually decreasing cost of storage with
time and a sharply increasing size of storage available, and,
typically, a 12- to 24-month delay between placement of an
order and installation of the equipment.

Naturally, the Intrex experiments need not deal with the whole
of recorded knowledge, and it seems reasonable to use as a
data base in a restricted field about 5000 books and 100
journals, the entire contents of each of which may since its
inception approximate 2500 pages of text. Thus, at 300 pages
per book we will have about 1.5 x 10^6 pages of books and
3 x 10^5 pages of journals to put into the experimental store.
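The corpus arithmetic above is easily checked; a sketch (the figures are the report's own, and the journal total is rounded up to the 3 x 10^5 quoted in the text):

```python
# Back-of-the-envelope check of the model-library corpus figures.
BOOKS = 5_000
PAGES_PER_BOOK = 300
JOURNALS = 100
PAGES_PER_JOURNAL_RUN = 2_500   # entire run since inception

book_pages = BOOKS * PAGES_PER_BOOK                # 1.5 x 10^6
journal_pages = JOURNALS * PAGES_PER_JOURNAL_RUN   # 2.5 x 10^5, quoted as ~3 x 10^5
total_pages = book_pages + journal_pages           # ~2 x 10^6 pages for the store
print(book_pages, journal_pages, total_pages)
```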

The "Digital" Store

The augmented catalog initially might contain about 2 percent
of the information represented by the books and about 4
percent of that represented by the journal articles. Thus the
equivalent of 30,000 plus 12,000 or about 40,000 pages at some
1000 words per page should be available in the augmented
catalog storage. At some 50 bits per word, this is about
2 x 10^9 bits of storage, not all of which requires equally
rapid access time. Magnetic-core storage is available now for
2 x 10^7 bits and microsecond access time ($200,000/year).
Magnetic-disc stores currently give perhaps 2 x 10^8 bit
storage capacity, 75-millisecond access time and 2.5 x 10^6
bits per second data rate ($60,000/year), and magnetic
data-cell memories can be ordered for 3 x 10^9 bits,
175-millisecond access time and 10^6 bits per second data rate
($36,000/year). It is thus clearly possible to store the
information for the augmented catalog in digital-coded form.
The access time for each of the units seems tolerable as well,
for as many as 10 to 30 users conducting active searches. It
appears desirable to put the journal-related augmented catalog
data largely into the disc store and the data dealing with
books largely into the data-cell system. At 2.5 x 10^6 bits
per second, it will take, for example, about 20 seconds to
search all titles of articles or books in the store.
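These sizing figures follow directly from the stated parameters; a sketch (the size of the title subfile is not given in the text and is inferred here from the quoted 20-second scan):

```python
# Sizing of the augmented-catalog digital store, per the figures above.
WORDS_PER_PAGE = 1_000
BITS_PER_WORD = 50

catalog_pages = 30_000 + 12_000      # book portion plus journal portion
catalog_bits = catalog_pages * WORDS_PER_PAGE * BITS_PER_WORD  # ~2 x 10^9 bits

DISC_RATE = 2.5e6                    # disc data rate, bits per second
title_file_bits = 5e7                # assumed: implied by the 20-second figure
scan_seconds = title_file_bits / DISC_RATE
print(catalog_bits, scan_seconds)
```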

The "Image" Store

Some 2 x 10^6 pages must be recorded in this store, with an
access or reproduction time suitable for on-line interaction
and browsing simultaneously by a set of 10 to 30 users. For
Intrex work before 1970, there is no hope of storing this
material in character-coded form, and the two possible types
of storage are as photographic images or as a record of a
video scan. Character-coded storage, which is more efficient,
can only come about through the production of character-coded
manuscripts in the process of publication and typesetting, or
by the use of optical character readers. Neither of these two
sources will be available in large amounts before 1970.

The video scan of black-and-white material requires about
2 x 10^6 bits per page, and our 2 x 10^6 page library would
thus take up 4 x 10^12 bits of storage. The advantage of video
scan would be the flexible use of the store by the computer
and transmission via coaxial cable without need for further
scanning or conversion. For a file of this magnitude, storage
of the video scan will not be feasible during the Intrex time
period.

Photographic-image storage on film chips or microfiche is
already available with a few-second access time, but not in
general with the desired combination of delay time (a few
seconds for the first page plus fractional seconds for
following pages) and store size (about 2 x 10^6 pages). Large
systems exist which are capable of handling photographic
images of documents at 60:1 linear reduction and thus are
capable of storing, for instance, about 150 pages per chip or
the entire 2 x 10^6 page model library in 15,000 chips.
Certainly for the model library, reductions beyond 20:1 or
60:1 are unnecessary. In addition, the flexible on-demand
copying of microimages onto silver halide or diazo with a
delay time of seconds has also been demonstrated, and it would
seem likely that there will soon be available a store of
adequate size and with about one-second delay to retrieve the
image of the first page, with flexible microform copying and
certainly eventually with the provision of hard copy on
demand. What may not be available commercially is the coupling
of the image store to a video scanner which will allow the
flexible electrical transmission of the image to consoles
nearby or across the campus. It may be necessary to use
developmental hardware for this application, but it may be
desirable to do the video transmission via a temporary storage
device, which may be a magnetic disc, etc., in order to allow
the serving of many customers with a single scanning station,
without holding a page in the scanner beyond the time required
for a single scan.
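The chip count quoted above can be reproduced as follows (the report's 15,000 figure evidently allows some margin over the bare quotient):

```python
import math

# Film-chip count for the model library at 60:1 linear reduction.
PAGES = 2_000_000        # page images in the model library
PAGES_PER_CHIP = 150     # per the figure quoted in the text

chips = math.ceil(PAGES / PAGES_PER_CHIP)  # ~13,000; the report rounds to 15,000
print(chips)
```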

TRANSFER OF LIBRARY INTO THE STORE

Some 4 x 10^4 page equivalents of augmented catalog must be
incorporated into our digital store. Typing into magnetic tape
or into a time-shared computer may be the cheapest way to
convert this information to coded form at costs of perhaps $3
per page equivalent (1000 words). Thus, of the order of
$120,000 and 40,000 man-hours will be needed for this job. To
do it in a year will require 20 full-time or 40-100 part-time
typists. A few full-time typists will thereafter be able to
keep up with the accession rate of this model library.
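The cost and staffing estimate can be sketched in a few lines (the hour-per-page and working-year figures are assumptions implied by, not stated in, the text):

```python
# Keyboarding cost and staffing for the augmented catalog.
PAGE_EQUIVALENTS = 40_000   # 1,000-word page equivalents to be typed
COST_PER_PAGE = 3           # dollars per page equivalent
HOURS_PER_PAGE = 1          # assumed: implied by the 40,000 man-hour figure
WORK_YEAR_HOURS = 2_000     # assumed full-time working year

total_cost = PAGE_EQUIVALENTS * COST_PER_PAGE       # $120,000
total_hours = PAGE_EQUIVALENTS * HOURS_PER_PAGE     # 40,000 man-hours
full_time_typists = total_hours / WORK_YEAR_HOURS   # 20, to finish in a year
print(total_cost, full_time_typists)
```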

The conversion of the document file itself to storage should
probably be done in a two-step process — the first of which
involves manually fed microphotography to machine-manipulable
form. In this way, a master file of some 2 x 10^6 pages can be
built up at a cost comparable with the cost of purchasing the
book or journal (say, one cent per page). Conversion of the
entire contents of the model library would thus cost $20,000
and might take a few months. The resulting aperture cards or
other microform could be used to provide masters for on-demand
full-size or low-reduction copy and with minutes (rather than
seconds) delay, if the rapid reproducing equipment to work
with the main high-reduction store is not available until 1968
or beyond. Microform duplicates of the entire collection may
cost around $1000 to $2000. The conversion of the catalog and
graphic material for this model library thus does not seem
inordinately expensive or time-consuming.

TRANSMISSION, DISPLAY, AND CONSOLES

The on-line interaction with the system will be via a
transmission system, linked to display devices. The user will
have a console equipped at least with keyboard and buttons,
but perhaps also with a light-pen or other flexible tool.
(Naturally, buttons can be simulated, too.) In the interests
of over-all system economy, the Intrex experimental system
should use video links (coaxial cable) to its users, to allow
all displays to be driven by information-generating and
regenerating equipment centrally located at the computer. A
few-tenths-second transmission time for a full page of text is
required over a 5-Mcps line, and this bandwidth is necessary
to gain experience with the information transfer rates that
will be obtained with character-coded material transmitted
over narrower-band lines in 1975+. Naturally, signals from the
light-pen or the buttons can be sent back to the computer in a
narrow-band form on top of the video transmission, if desired
(or during the retrace time, for instance).
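The few-tenths-second figure can be checked under the simple assumption, not stated in the text, of one bit per cycle on the 5-Mcps channel:

```python
# Time to push one video-scanned page down the coaxial link.
PAGE_BITS = 2e6      # video scan of one black-and-white page
LINE_RATE = 5e6      # assumed: one bit per cycle of the 5-Mcps line

transfer_seconds = PAGE_BITS / LINE_RATE  # 0.4 s, i.e., a few tenths of a second
print(transfer_seconds)
```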

The whole idea of the experimental system consoles is to have
them capable, simple and flexible, which means that one wants
the displays to be generated by the central computer so that
system changes can be implemented in software. Data-rate and
resolution limitations can probably be imposed on these
capable consoles more cheaply than one can develop
inexpensive, less capable consoles. (Of course, this only
holds for the experimental program and not for the operational
system of 1975+.)

PERMANENT COPY

One of the most important questions for the information
transfer system of 1975 is whether microform will be
acceptable for distribution of hard copy to the user. This is
certainly part of the core experiment on access to full text
and must be approached initially by the provision to the user
of an attractive microform viewer, with a guarantee of the
availability of microform from the model library, either as
output from the augmented catalog or as copies of library
contents or as current issues of journals, etc. The physical
compactness of files, the ability to carry the equivalent of
hundreds or thousands of selected pages of documents while
traveling, should not be compromised by a cumbersome or
inadequate viewer. The goal could be a book-size, book-weight
device with push-button page-stepping.

In addition to the portable personal viewer, one needs
microform viewers to supplement the on-line console and to
serve as office equipment for routine reference. This will
serve to compare a wholly on-line with a more heavily
microform-oriented system, as well as to relieve the central
computer of some load.

Of course, both microform and page copy will be obtainable
from the central store, with a few-second delay if one is on
the premises, and with delay compatible with messenger service
if one is across campus. The cost of delay is an experimental
variable and even those physically close will sometimes be
forced to wait for this reason.

It is technically feasible to provide each console with the
additional equipment required to produce facsimile hard copy
from the drive to the CRT display. This competes, of course,
with physical delivery of similar copies from the central
computer store, but may be a very desirable component of the
system.

In addition to the above consoles, it may be desirable to use
some standard remote-computing type terminals, even though
their typewritten output is often disturbingly slow. This
difficulty is likely to be attacked successfully over the next
two years.

SUMMARY

The hardware involved initially is perhaps:

A time-shared computer system.

A magnetic-disc store of some 2 x 10^8 to 4 x 10^8 bits
and 75-msec access time.

A magnetic-chip store of some 5 x 10^9 bits and 200-msec
access.

An image-microform store capable of holding 2 x 10^6 page
images with an access time of about one second to a video
scanner and page or microform copier.

Coaxial cables to the 10 to 30 subscribers.

Flexible, high-resolution CRT displays and interaction
consoles equipped for occasional page-facsimile output.

Portable, personal, book-size microform viewers.

Remote-inquiry type terminals (typewriters and advanced
typewriters).

5. RELATED STUDIES: EXTENSIONS AND ELABORATIONS

Many facets of the intellectual life at a university depend on
the university's libraries; changes in the library system are
certain to have indirect effects on other activities, some of
them quite remote from the libraries themselves. Insofar as
possible, we have attempted to foresee the indirect
implications of moving toward an on-line intellectual
community, and to recommend studies that would provide
information relevant to them.

For example, we discussed with considerable enthusiasm the
proposition that a large body of factual information, so
stored as to be open to interrogation from a variety of users
with different interests and levels of sophistication, might
provide the most powerful teaching instrument ever conceived.
It seemed important for Intrex to consider as explicitly as
possible what opportunities this new instrument might offer
for the educational process at MIT.

On a less optimistic note, the possibility was considered that
the proposed innovations might impede as much scholarship as
they facilitated. The leisurely perusal of a library's
collection is considered by many scholars, particularly those
in the humanities, to be an excellent way for students to get
a feeling for the structure of a field of knowledge, and even
to be an important source of serendipitous discoveries by the
more advanced scholar. Some might predict that improvements in
library operations would necessarily preclude browsing, thus
frustrating many of the library's best customers. So it seemed
important for Project Intrex to examine this issue on its
merits and, where appropriate, to conduct experimental studies
of the browsing habits of various classes of library users.

Moreover, much interest has been expressed in how a library
might be made to play a more active role in the intellectual
life of a university, how the needs of its customers might be
known and served automatically, without waiting for the user
to initiate a request. To state this idea in its most extreme
form, any request that a user has to initiate might be
regarded as a failure of the information system or, if this is
too extreme, at least as a feedback signal for some correction
in the library's conception of that user's needs. The Intrex
experiments could be seen as a first step toward this ideal,
and some explicit consideration should be given to possible
further steps in the same direction.

In the experimental program on text access, the advantages and
disadvantages of producing on-demand copies of items in the
library's collection will be studied. A system that operates
in this manner is, in some sense that needs clarification, in
the publishing business. It seems important for Intrex to look
this possibility directly in the eye and to ask what
experimental results might contribute to the wise planning of
the future library's opportunities and responsibilities in
this field.

Finally, the Conference gave some attention to the possibility
of relieving the library's problems without diminishing its
serviceability by well-conceived rules for acquiring and
weeding documents. It is a familiar conceit that the amount of
good material in any scientific field at any particular time
is really rather small; and one of the problems is that this
high-quality signal is being drowned in a vast flood of
low-quality noise. If there is any truth to this notion,
Project Intrex would seem to be in an excellent position to
discover it.

Other implications of the Intrex experiment were discussed,
but these five — education, browsing, selective dissemination,
publishing, selective retention — received the bulk of the
Planning Conference's attention. Although none is essential
for getting the Intrex program under way, all of them are
important. Therefore, the following studies and experiments
relating to these five topics are presented as appropriate
foci for the further development of Intrex research once an
automated library-laboratory is available. We consider them in
some detail in the remainder of this section.

EDUCATIONAL FUNCTIONS OF INTREX

A university library plays an important role in the
educational programs of the university. In the past, however,
its effectiveness in this role has often been limited by
inadequacies in library operations, but quite as often by a
tendency of faculty and students to emphasize particular
textbooks while relegating to a secondary position the immense
stores of information in the library.

Improvements and revisions of traditional library functions
should be accompanied by a reassertion of the library's role
in the educational network. Viewed broadly, the entire
university can be thought of as an information transfer
system; its library provides only part of the transfer
services and channels. It is clearly a requirement for Project
Intrex to develop experiments demonstrating the power of the
new library to supplement the other information transfer
channels of the university.

The educational problems of the library can be conveniently
grouped into three parts. There is, first, the problem of
teaching students how to use the library efficiently, a task
that may grow in magnitude as the information retrieval system
grows in complexity. Second, the library must provide the
traditional back-up for the conventional lecture courses,
providing the texts, the reserve books, the special readings,
etc. Third, and in many respects most significant of all, the
library should foster in its users the desire to continue
their education beyond the classroom, perhaps even beyond
graduation, by self-directed explorations of the library's
collections.

Instruction in Information Retrieval

Intrex must consider how best to train students in the use of
the new facilities, and should give some attention to keeping
records on how rapidly the learning process proceeds, what
aspects cause trouble for novitiates, etc. The success of this
instruction can perhaps be evaluated through an experiment
incorporating a series of subject "treasure hunts", with
alternate access points available to different students and to
different research staff.

Not only should Intrex try to reduce the difficulties of the
new users; some attention should also be given to educating
them in the way the system works and the resources of
documentation that it draws upon. Instruction in information
retrieval at MIT is presently limited to publication of an
occasional pamphlet and to direct efforts of the front-counter
librarian and the reference staff. In an Intrex-designed
library, however, the search and retrieval functions will be
automated and quite rapid. A danger exists that students will
emerge knowing how to use the system, but not understanding
how it works — much as the driver of a Ford today probably
knows far less about his machine than did the driver of a
Model T. This situation would be acceptable if the student
could be assured of access to an Intrex-type library after
graduation, but there is a real likelihood, for some years, at
least, that he will go out into a world of Model T libraries
that he may be poorly prepared to use.

Classroom Back-Up

The important function of providing materials for students in
their class work will obviously be affected by the
availability of more efficient information systems in the
university. The possible improvements suggested a variety of
experiments that might be conducted.

In order to make the discussion definite, assume that we are
going to investigate the usefulness of an idea-retrieval
system as compared with a document-retrieval system, and of
both of these compared to the usual library reference room and
reserve-book section. Let the corpus be limited to the
materials required for a single course in some subject where
considerable use must be made of reference material. The
corpus should not be so large that it cannot be conveniently
coded in machine-readable form. It should not require
graphics, in order to avoid expensive complications in the
display equipment. A laboratory course might conform to these
specifications, since the laboratory apparatus itself provides
the graphic information. The information would then be made
available to the students, who could elect to study it in any
of three alternative forms. A student could, if he wished, ask
questions and get detailed answers, or he could use the system
for document retrieval, or he could ignore the system
completely and use the more conventional option. One advantage
of this mode of experimentation, of course, is that the system
itself can keep records of the student's actual behavior —
records which probably should not be made available to the
instructor — and these could be related to performance in the
course.
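The record-keeping this experiment requires might be sketched as follows; a minimal, hypothetical illustration (all names are inventions of this sketch, not part of any system described by the Conference), in which each interaction is logged under a pseudonymous student identifier together with the access mode chosen:

```python
# Hypothetical sketch of a usage log for the three-way classroom experiment.
from collections import Counter
from dataclasses import dataclass

@dataclass
class LogEntry:
    student_id: str   # pseudonymous, so records need not reach the instructor
    mode: str         # "idea", "document", or "conventional"
    query: str

class UsageLog:
    def __init__(self) -> None:
        self._entries: list[LogEntry] = []

    def record(self, entry: LogEntry) -> None:
        self._entries.append(entry)

    def mode_counts(self) -> Counter:
        """Aggregate use of each access mode across all students."""
        return Counter(e.mode for e in self._entries)

log = UsageLog()
log.record(LogEntry("s01", "idea", "thermocouple calibration"))
log.record(LogEntry("s01", "document", "laboratory instrument handbook"))
log.record(LogEntry("s02", "idea", "error analysis"))
```

Aggregates of this kind, rather than the raw entries, are what would later be related to course performance.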

Obviously, a number of variations on this theme can be
imagined, all of them exploiting the fact that the restriction
to a single course limits the size of the corpus that must be
stored to economically feasible dimensions. No "programmed
instruction" need be involved, but information in this form
could be incorporated if it was thought desirable to do so.
For example, special teaching programs in the course
prerequisites might be made available so that students could
review those in which they were deficient. Or test programs
might be written and made available through the computer so
that students could evaluate their own progress in mastering
the material. The imaginative use of the new library
facilities in the classroom should be encouraged, and, no
doubt, teachers will emerge who want to use the facilities in
this manner, whether or not Intrex mounts a formal experiment
on the question.

Still another experiment might be included in Intrex to
demonstrate the potential of the library as a distributor of
formal educational subject matter. The student at MIT must
absorb his educational nutrient through standard-sized courses
of, say, 9 or 12 units. But his key interest is often
restricted to a small portion of the subject, and a library of
condensed teaching programs would provide a welcome solution
to his problem. For example, a store of teaching programs on
the principles and operation of instruments, designed to save
instructor time in a given laboratory course, would provide
concomitant benefits to innumerable students engaged in
experimental research. And the student experimentalist would
not have to invest in a full course to acquire this specific
knowledge.

There is also a real possibility of putting educational
television into the library system, but little attention was
devoted to this matter by the Planning Conference.

Self-Education via Intrex

The concept on which English universities are built — that of
"reading for a degree" — is alien to most campuses in the
United States, and it is not obvious that MIT should be the
place where one tries to experiment with the education of
young men by turning them loose in an information transfer
system to follow their own interests and desires.
Nevertheless, the ease with which information will be
obtainable in the future system should encourage expeditions
into the unknown that would be quite impractical under the
present system. Only a little effort would be required to
implement experimental studies on how best to stimulate such
unassigned adventures by the students.

For example, the fourth-year undergraduate might be required
to take one of his courses as an Intrex reading exercise, with
a qualified tutor assigned to assist and guide him. Guidance
would emphasize acquisition of knowledge from the information
store. And evaluation of the student’s success would be based
on what he selected and read, not on the structure of the cor¬
responding course. An important feedback would be provided
by the tutor who could identify the weak spots of the Intrex
system in satisfying the needs of the student reader.

An experiment of this nature may be extended to cover the full
program of the third and fourth undergraduate years. Or it
may be developed for the older engineer returning from in¬
dustry for a year’s reading in the Intrex system. Here, too,
the feedback would be of considerable value in reinforcing the
power of the Intrex system vis-a-vis continuing education.

SELECTIVE DISSEMINATION

A library of the type envisioned here can take a much more active
role in providing information to its clientele than most libraries
have in the past. The availability of a computer system makes
it possible to keep profiles of each user’s interests and to furn¬
ish documents to him even before he requests them. This kind
of active library service, aimed at supporting the user's current
awareness of developments in his field, is usually called Se¬
lective Dissemination of Information (SDI).

Programs for SDI are presently operating in many libraries,
mostly in industrial or Government organizations; they appear
to have received little attention from academic institutions. It
seems particularly appropriate, therefore, that Project Intrex
should undertake an experiment to determine the cost and
feasibility of SDI services in a university context, and to ex¬
plore the extent to which such service might be undertaken as
a normal part of the library program.

Two approaches to this objective have been suggested. The
first is an approach widely used in industrial libraries, namely,
the personal review of all acquisitions against known profiles
of user interest, for the purpose of calling the user's attention
to documents of potential interest to him. In an industrial
library, this task will normally be performed by the librarian
or other information-service personnel. The information is
disseminated either by routing marked copies of serials or
other documents to appropriate users, or by sending users
selected distribution lists (with or without abstracts), or by
distributing photocopies of individual articles. In one form or
another, these procedures could be easily adapted to one of
the MIT libraries, or to some selected group of users.

A second approach, pioneered by the late Peter Luhn of IBM,
is to perform a computerized analysis of the document's content
and to match this with user profiles stored in the machine.
This approach has had limited use in industry, but might be
developed into a useful tool in the Intrex environment if and
when facilities are available for handling large quantities
of digital text.
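In present-day terms, the matching operation Luhn pioneered can be sketched in a few lines of code. The sketch below is purely illustrative: the stopword list, profile terms, and match threshold are our own assumptions, not features of any system discussed at the Conference.

```python
# Illustrative sketch of Luhn-style Selective Dissemination of
# Information: reduce each incoming document to its significant
# words and match them against stored user-interest profiles.
# Stopwords, profiles, and threshold are assumptions chosen for
# illustration only.
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on"}

def significant_words(text):
    """Count the significant (non-stopword) words in a document."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def match_score(doc_words, profile):
    """Total occurrences of a user's profile terms in the document."""
    return sum(doc_words[term] for term in profile)

def disseminate(document, profiles, threshold=2):
    """Return the users to whom the document should be routed."""
    doc_words = significant_words(document)
    return [user for user, profile in profiles.items()
            if match_score(doc_words, profile) >= threshold]

profiles = {
    "user_a": {"microfilm", "storage", "retrieval"},
    "user_b": {"plasma", "magnetohydrodynamics"},
}
doc = "A study of microfilm storage and retrieval of microfilm records."
print(disseminate(doc, profiles))  # → ['user_a']
```

A production system would, of course, weight terms and maintain profiles adaptively; the sketch shows only the bare matching step.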

Given the boundary conditions provided by the four experimental
programs to be undertaken initially, it seems unlikely that
Intrex could take the second approach to SDI within the first
five years of its existence. In any case, an intensive study
of existing SDI practices should be made before any attempt
is made to provide such services at MIT. It should be remarked,
however, that the procedures followed in developing the collec¬
tion used in the catalog and text-access experiments might
contribute to an automated SDI program at some later time.

A dissemination system can be implemented only if the
appropriate retrieval tools exist or are developed in connection with
the other information transfer experiments. Since many tools
exist at present, however, and since dissemination experience
itself should influence the design of subsequent retrieval tools,
Intrex might well begin selective dissemination experiments
with conventional tools (published indexing journals, for ex¬
ample), using, at least initially, human screening (by individuals
or by panels) to review current literature for dissemination.

The variables in the study are: (a) criteria for dissemination
(subject profiles, citations, recommendation of colleagues,
authors); (b) the number and form of dissemination stages
(titles, titles plus one-sentence subtitles, titles plus abstracts,
titles plus index tracings); (c) the response time of the system
(one minute to several days); (d) the subsequent retrievability
of anything declined; (e) the effectiveness of personal files for
accepted material; and (f) the degree of selectivity (varying
from a long, diffuse initial bibliography from which recipient
makes a selection, to a very short and specific list — possibly
full articles themselves — with no further choice necessary).

An especially difficult problem in assessing user reaction to
disseminated material can be foreseen at the outset. The
recipient, when shown a title and perhaps an abstract, may
decide that the article is of probable future interest, and
should therefore be readily accessible to him, but he may not
choose to read it immediately. Thus there are several cate¬
gories of response that must be anticipated. A possible
questionnaire that the recipient would fill out for each
disseminated article might be something like the following:

Is this material of immediate interest and do you
wish a copy as soon as possible?
Is this material of future interest and therefore do
you wish it kept readily accessible?
Check here if material is not of interest in the
sense of either question above.
Name any other people you think may be parti¬
cularly interested in seeing this article.
Check whichever of the following items you think
pertinent:
This article is very important and should
be widely read.
This article reports new results and therefore
is of importance to specialists in the field.
This article is a review and has considerable
value in its brevity of presentation for the
field it covers.
This article is a relatively unimportant
rehash.

This article has (high, low, moderate)
tutorial value to (scientists in same
specialty, scientists in general, laymen).
Had you become familiar with the contents of this
article by some other means prior to receiving it
in this dissemination experiment? If so, where?
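The several categories of response above can be made concrete as a simple record, one per recipient per disseminated article. The field names below are our own illustrative choices, not a prescribed format:

```python
# Illustrative record of one recipient's response to one
# disseminated article. Field names are assumptions chosen to
# mirror the questionnaire in the text.
from dataclasses import dataclass, field

@dataclass
class DisseminationResponse:
    article_id: str
    immediate_interest: bool = False   # wants a copy as soon as possible
    future_interest: bool = False      # keep readily accessible
    not_of_interest: bool = False
    refer_to: list = field(default_factory=list)  # others who may be interested
    importance: str = ""     # e.g. "widely read", "specialists", "review", "rehash"
    tutorial_value: str = "" # "high", "moderate", or "low"
    seen_elsewhere: str = "" # where the contents were seen before, if at all

r = DisseminationResponse("MIT-TR-1023", future_interest=True,
                          refer_to=["colleague_x"], importance="review")
print(r.future_interest, r.not_of_interest)  # → True False
```

Tallying such records over many recipients would supply the statistical material the evaluation discussed below requires.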

The foregoing might have to be divided into several separate
questionnaires, since some of the questions can be answered
only after the full document has been received.

Material should be disseminated in the experiment in several
stages. It is not yet known what the best form of initial
notification should be, but several could be tested; the title plus
a one-sentence subtitle is thought to be about right, but
this conjecture is based on slight evidence that titles alone
are not enough and that full abstracts probably take too long
to read.

Following the initial notification, a recipient should be able to
select articles that he wishes to receive in their entirety, and
the system should be capable of providing a very rapid response
should the recipient demand it. A particularly important vari¬
able in these studies is the length of the initial notification list;
it must represent a compromise between economy of effort on
the part of the receiver and the advantage of giving him a wide
range of choice.

It is not intended here to identify all the variables or to propose
a single design for the SDI system. Let us suppose, however,
that some kind of experimental dissemination system has been
implemented. We now reach the crucial point of trying to assess
its value to the recipient. The problem is exceedingly difficult,
and the following suggestion is necessarily tentative. Let us
base our evaluation first on the fact that each scientist has,
either implicitly or explicitly, a budgeted amount of time for
reading the literature of his field; he must allocate that budget
in some way. His normal reading habits result in some kind
of allocation, and at least one question of interest is whether
the selective dissemination process causes him to reallocate
his time. It is a reasonable conjecture that it will, if the ser¬
vice is free and if it is worth anything at all. Exactly how much
the service is worth might be inferable if the experiment in¬
corporated some mechanism whereby the recipient had to pay
for the service — perhaps by giving up some other kind of
information-using privilege. A possible technique might be to
allow him a total budget of some number of minutes each day
for information retrieval, some number of minutes for brow¬
sing, and some remaining minutes to make use of disseminated
journal articles. A weighting factor attached to each of these
three budgets might then be of use in coming to an estimate of
the relative value to the user of these three kinds of services.
One could also offer recipients free subscriptions to journals
in lieu of the dissemination service, again to permit measure¬
ments of relative values.
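The arithmetic of the suggested time-budget evaluation can be made explicit in a brief sketch. The minute allocations and weighting factors below are invented solely to illustrate the computation:

```python
# Sketch of the proposed valuation: each user divides a daily
# time budget among three services, and weights express the
# relative value he places on a minute of each. All numbers
# here are invented for illustration.
def relative_values(minutes, weights):
    """Weighted share of total value attributed to each service."""
    value = {s: minutes[s] * weights[s] for s in minutes}
    total = sum(value.values())
    return {s: round(v / total, 2) for s, v in value.items()}

minutes = {"retrieval": 30, "browsing": 20, "dissemination": 10}
weights = {"retrieval": 1.0, "browsing": 0.5, "dissemination": 2.0}
print(relative_values(minutes, weights))
# → {'retrieval': 0.5, 'browsing': 0.17, 'dissemination': 0.33}
```

In the experiment itself, the weights would be the unknowns, inferred from how users actually reallocate their minutes when the dissemination service is introduced or withdrawn.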

SDI, like the educational applications, gives the system an
active role to play. No longer must the librarian remain
politely silent until his constituents discover they need him.
In these applications of the Intrex idea, the librarian knows
his users and seeks them out, in the classrooms or the labora¬
tories, to make his services known and effective. This posi¬
tive approach is the dream of every good librarian, and every
effort should be made to ensure that Intrex does all it can to
make the dream come true.

BROWSING (OR ACCIDENTAL DISCOVERY)

Not everyone who walks into a library knows precisely why he
is there. In Sir Ernest Gowers' preface to the recent revised
edition of Fowler's Modern English Usage, he speaks of Fowler:
"He knew what he wanted from life; what he wanted was within
his reach; he took it and was content." This is almost the
antithesis of a modern browser: he does not know what he wants;
what he might want has a good chance of not being within his
reach; and he is likely to be at least vaguely discontented
with what he is able to take.

Accidental discovery is typified by the man turning the leaves
of an encyclopedia or dictionary for something he knows he is
looking for (and there is page turning even after the index has
been consulted); on the way to what he thinks he wants, he
permits himself to be diverted to some other, at least mo¬
mentarily enticing article; he reads it and discovers something —
hopefully, something useful or at least pleasurable. Dis¬
coveries of this sort are not really threatened by any innova¬
tions in information transfer that we anticipate over many years.

"Browsing" may be less structured than accidental discovery,
in the sense that a browser may not be looking for anything in
particular, but it can easily be construed to include all those
aspects of accidental discovery that would ordinarily occur in
a library. Early in the Planning Conference, the participants
were asked to write their own definitions and comments on
browsing. The range was wide. At one end were those who
thought of browsing as systematic with the purpose of dis¬
covering pertinent information not known (by the browser)
previously to exist. In the middle were those who thought the
purpose to be a little less concrete — that it was undertaken in
the hope that one's professional view might be shaken up or
stimulated to an unforeseen insight — and who at the same time
browsed in a literature not obviously relevant and perhaps not
systematically selected. At the other extreme were those who
thought that it was an unprogrammed, recreational sampling
of reading matter, done for pleasure and wonder, quite aimless
in purpose, very personal and subjective in its satisfactions.

The places for browsing also varied: book stores (these were
favorites); university common rooms; periodical tables in
coffee rooms, in the professor's outer office, dentists' offices,
barber shops, and in more focused arrays such as scanning
lists, indexes, catalogs, tapes; tapping computer output at
random; and finally, of course, the library stacks themselves.
One characteristic of browsing as classically conceived is that
it is the examination of a spatially ordered set of documents
with depth of penetration into the items fully and easily con¬
trolled by the browser. The active control of the depth of
penetration is of central importance in the process.

We found no proof that browsing was in fact of major "usefulness"
in scholarly pursuit, but most conferees seemed to believe that
it was and to accept this belief without proof. Thus there was
general concern that more sophisticated methods of search
should not destroy the browsing opportunity for those who
desired to browse, whatever their motives. Of course, a
complete destruction of the privilege of browsing by wandering
physically among the open shelves, wherever they may be,
cannot occur so long as machines coexist with books and
periodicals — which we project to be for a long time to come.
How then could browsing be diminished?

It could be, and indeed is being, diminished by the increasing volume
of browseworthy material in every field. A mile of shelves of
history books is too long to scan. Large university libraries
have generally abandoned the hope of providing useful general
browsing rooms. Moreover, such tightly selected physical
collections impose upon the browser the previous narrow
choice of someone else as to what it would be worthwhile
for him to browse.

Accidental discovery may be threatened by the very skill of
computer dialogue in returning a sharp reply to the question
asked. There may be a natural tendency to polarize in the
interest of efficiency. The fringe material — and therefore
the accidental discovery — may be inhibited by the very effici¬
ency with which the computer prunes the "irrelevant". Of
course, the user can call for the fringe pages of a stored
encyclopedia, but we think he will not let his eyes fall on the
rhinoceros or on romance on the road to Rome as he can
hardly fail to do in a book encyclopedia. And there is always
the lurking fear that the human may have secured the right
answers to the wrong question!

A negative view would be to try deliberately to build, into
information systems, inefficiencies or at least some degree of
randomness, but this seems to us doomed to failure. The posi¬
tive view is to try to exploit the new technologies to overcome
some of the disadvantages of present physical browsing, to
permit browsing with more not less power, and to forfeit none
of the advantages of present browsing methods save perhaps
those involved in the olfactory and tactile sensations related
to the physical contact with books.

Such an effort would accept the condition that browsing, being
a form of exploration, demands that initiative and control of
depth of penetration be in the hands of the browser — and not
highly constrained by the nature of the collection. It assumes
browsing to be possible in any information system in which the
user has active and relatively rapid control of the documents
he wishes to retrieve, and in which the depth of detail in which
he examines them is also a variable under his control.

It is in this framework that the Planning Conference proposed
a program of experiments in browsing. When one tries to
formulate specific experiments on browsing, however, the
absence of any normative data on browsing habits in existing
libraries becomes painfully obvious. This lack is serious,
not merely because we do not know what facilities and oppor¬
tunities different groups of users presently expect, but in
particular because we do not have examples of how such
undirected activity should be described, measured or evaluated.
A whole experimental methodology must be developed; it might
involve questionnaires, automatic record keeping, personal
diaries, user choices between alternatives, subjective estimates
of probabilities or costs involved in browsing, or other types
of data. Appropriate and informative measurement techniques
could no doubt be developed, but they do not exist at the
present time. Developing them should be the first stage in
planning a series of experiments on browsing.

It is perhaps unfortunate that browsing in the stacks of a
contemporary library is not an activity that is easily observed
and recorded. There is a quality of privacy and isolation that
some inveterate browsers regard as an essential attribute of
a pleasant and rewarding prowl through the stacks. In a
completely automated system, it would not be difficult to
obtain detailed records of what the browser had examined and
how long he had spent at it. But, in the conventional library,
the matter of behavioral record keeping is more difficult.

Given that some record of a browser's behavior might be
obtained by suitably ingenious techniques, what aspects of
browsing should the system be designed to optimize? One feels
intuitively that "browser satisfaction" is the most important
criterion to be considered. Browsers might be asked to give
subjective ratings to different situations, but such judgments
should, whenever possible, be backed up by statistical data
describing the user's behavior when he is given a free choice
between alternative opportunities to browse.

Since an automated library will be expensive, it is impossible
to resist the temptation to introduce notions of efficiency in
attempting to evaluate its use for browsing. Perhaps the
concept of efficiency would not completely subvert the essential
meaning of browsing if we are careful to qualify it as follows:
of two systems that permit browsing, the one that leads to the
greater number of fruitful, unplanned discoveries per unit
time will be the more satisfying to its users and, in that sense,
more efficient. If users prove to be able to identify and announce
such profitable (or, at least, interesting) discoveries, the cost
of serendipity might be rather accurately assessed for any
given collection of documents and method of access. The
fallacy of misplaced concreteness should not be forgotten in
making such estimates, but, even when all their limitations
are recognized, they may prove to be instructive for some
purposes.
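The qualified efficiency measure proposed above, fruitful unplanned discoveries per unit of browsing time, is straightforward to compute once session records exist. In the sketch below the session data are invented for illustration; only the form of the measure is of interest:

```python
# Sketch of the suggested efficiency measure for browsing systems:
# fruitful, unplanned discoveries per unit of browsing time.
# Session records below are invented for illustration.
def discovery_rate(sessions):
    """Discoveries per minute, pooled over all recorded sessions."""
    total_discoveries = sum(d for d, _ in sessions)
    total_minutes = sum(m for _, m in sessions)
    return total_discoveries / total_minutes

# (discoveries reported, minutes spent) per session, per system
browsery = [(2, 45), (1, 30), (0, 15)]
telebrowsing = [(3, 20), (1, 25)]
print(round(discovery_rate(browsery), 3))      # → 0.033
print(round(discovery_rate(telebrowsing), 3))  # → 0.089
```

Pooling across sessions is itself a design choice; per-user rates, broken down by field of specialization, might prove more informative for the comparisons the text envisions.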

If evaluative measures can be devised, several studies might
be worth conducting, e.g., browsing facilities of different types
could be made available and user preferences, costs of discov¬
ery, and virtues or defects of specific systems could be determined
for different areas of specialization and classes of users. The
model collections envisioned for the access experiments could also
be used for browsing experiments. Some librarians have from time
to time assembled books in special "Browsing Rooms" that would
provide a person with an introduction to the pleasures of reading. We
do not believe such recreational reading, separated from regular
curricular and other materials, should form the collection to be
used in these studies of browsing behavior.

Some suggested arrangements whose relative merits might
be tested are the following, arranged in a rough order to
explore the spectrum of possible systems:

Browsery. In order to collect data relevant to
present libraries — data against which newer
systems can be compared — an effort should be
made to characterize the range and variety of
browsing behaviors observed in collections of
printed documents arranged in some orderly
way on library shelves. Such data could be
collected in existing facilities at MIT; or, if
better experimental observation and control
is required, special rooms — which we have
called "browseries" in order to distinguish them
from the "browsing rooms" that already exist —
could be created in favorable locations on the campus.
Microbrowsing. Another possible arrangement
is to browse through microform, rather than
through the original documents. Stored images
of the documents in the collection would be re¬
trieved mechanically under the control of a
cataloging, indexing, and annotating system
operated from consoles at which the images
are displayed.
Catalog Browsing. As the augmented catalog
becomes a practical reality, a user should be
given the opportunity to browse through it from
a console. He would in this way be able to dis¬
cover the existence of unsuspected items in the
collection, and would have access to considerable
information about it. In order to get an actual
document, however, he would have to make a
specific request to the library in the usual way.
Telebrowsing. Finally, as more and more digital
text becomes available, a completely electronic
form of browsing could be tested — a form in which
the user had complete and immediate control over
his depth of penetration into any item. In a tele-
browsing system, moreover, various actions could
be taken to rearrange the user's personal files —
to make some items more immediately accessible —
as a result of discoveries made during the unplanned
exploration of the public collection.

The characteristics of these systems that would probably emerge
as experience accumulated with each of them are summarized
in the table on p. 130. If the comparison is formulated in com¬
petitive terms, no single system would be likely to emerge as
the optimal solution for all scholars in all fields under all con¬
ditions. Two, three, or perhaps all four kinds of browsing
may have a role to play in the automated library of the future.
The experimental program, therefore, should be conceived as
an effort to discover and remedy the relative weaknesses of
each, and to characterize the conditions under which each is
preferable. It should be remembered that people differ widely
in their work habits, so that no single system will prove optimal
for all customers. All types of browsing facilities will have
their devotees, and experimental development should be directed
toward making the available systems maximally useful.

PUBLISHING

In the kinds of information exchange discussed in the preceding
portions of this report, the emphasis has been upon meeting
the informational needs of individuals or at the most of small
groups whose members have identical interests resulting from
work on a common task, educational or other. By contrast,
when the arrangements for information exchange provide for
the deliberate bringing together of a body of information and
making it available in identical format to a number of potential
recipients representing a public, "publication" has been achieved.

Publishing normally includes not only the packaging and
distribution of information, but also a number of preliminary operations,
such as commissioning the work which is to be published, and
readying it for publication. It may be expected that the facilities
that will be assembled by Project Intrex will provide several
configurations capable of achieving publication; several experi¬
ments and/or demonstrations in publishing may be contemplated.

Demonstration of Publishing through the On-Line System

The on-line intellectual community discussed in Chap. IV suggests
an interesting set of experiments to explore ways in which a
university library of the 1970's could provide more effective
incentives for authorship within the community it serves. Present-
day publishers expend a large portion of their resources finding
and motivating scholars to contribute the basic raw material of
the publishing industry, namely, manuscripts. A variety of
incentives for authorship are recognized and invoked by the
publishing industry. The desire to document and broadly
communicate research results to the professional community
is probably most basic. Closely related, but present in
differing degrees, are the opportunities for professional
recognition and monetary reward gained through publication.
Viewed in this context, copyright protection represents one
particular class of author incentive, based on monetary re¬
ward for creativity. (See, also, the appendix on motivations
of authors.)

We do not propose that the university library of the next decade
shall go into competition with the existing publishing industry
for the sake of competing. Rather, it may be possible to
provide new tools and services to the academic community
which will expand and strengthen the existing incentives for
authorship and, through collaboration with the industry, achieve
a more effective "coupling" between author and publisher than
now prevails.

As an initial corpus of information to be handled by an on-line
publishing system, it is suggested that the documentation of
selected segments of current research conducted in the academic
departments and interdepartmental laboratories of MIT would
be an experimentally interesting and economically viable subset
of the total holdings of the present MIT libraries. This body
of documentation has several desirable characteristics. It
cuts across the entire spectrum of subject interest within the
MIT community, providing the opportunity to compare the kinds
of use made of the system by different subject-interest groups.
It is a collection of substantial interest to the entire on-campus
community as well as to other universities and research organ¬
izations (public and private) in the regional and national com¬
munity. Several industrial concerns in the New England area
might, for example, be given access to the system through
remote terminals, free of charge at first, and later on a pay-
as-you-go basis. Using this body of material in this manner
avoids, in the beginning, problems related to copyright pro¬
tection, since most of the copyrights would be held by MIT.
Much of the current documentation (progress and technical
reports) is, in fact, not copyrighted. Journal articles written
by MIT faculty and staff would represent a special case. How¬
ever, the system could be useful in the pre-publication prepar¬
ation of such articles and, once published, their bibliographic
references could be maintained in the system as a part of
its bibliographic data base.

Moreover, this corpus would make it possible to avoid the
awesome task of converting all existing documents to machine
form. As experience accumulated with newly created
materials, experiments could be initiated in the automatic conversion
of existing documents into the bibliographic and textual data
bases of the system, using optical character-recognition
techniques.

If the system were successful in the short run, the results
would be of immediate interest to other large universities
that face similar problems in the effective collection,
organization and dissemination of their own research documentation.
Similarly, some publications of university presses might be
suitable material for such publication techniques.

As a part of the total "user group", selected members of the
publishing industry could be invited to participate in the system
to explore the possibilities for more effective coupling of
authors to publishers. If it were deemed advisable, on-line
access to the bibliographic and textual data bases of the system
would enable publishers more easily and systematically to
determine who was writing what; they could discuss with
selected authors the possible publication of existing manu¬
script or of commissioned manuscripts on related subjects,
etc. Members of a publisher’s staff could also provide on¬
line editorial assistance, and could call upon certain members
of the on-line community to provide critical reviews.

Within the system itself, various kinds of limited "publication"
would be possible under the control of the authors. For example,
a newly created manuscript could be made accessible to other
system users either on a selected basis or on a broadcast basis
by entry in the "public file". As the system evolved in scope
and size, it might prove desirable to establish internal system
"editors" and "reviewers" to assist authors, on the one hand,
in conforming to editorial policies and standards of the system
(e.g., assignment of descriptors, preparation of an abstract
or extract, etc.), and to provide substantive review and
criticism of manuscripts, on the other hand, to the extent deemed
desirable.

Another interesting prospect for such a system is the
opportunity for on-line interaction among several users who wish
to collaborate in the authorship of a manuscript. More in¬
formally, physically remote scholars could converse about
their mutual interests in ways that might converge more often
and more quickly to joint authorship of formal publications.

The system proposed above would eventually evolve into a very
sizeable on-line library, with all the characteristics (and
problems) confronting present-day libraries. It is not intended to
discuss the total on-line library concept here, but rather to
focus on the author-editor-publisher segment, with emphasis
on experiments leading to more effective incentives for author¬
ship and more efficient author-publisher communication.

Publication Through a Microform System

As an alternative to the "on-line" computer system of assisting
the author to generate and publish the results of his experiments
or deliberations, a trial of a system that relies on microform is
in order. Surveys have clearly indicated that many scholars
get their most valuable and most current information through
verbal exchange. But this system has required and will continue
to require the written supplement. If the system is easy to use and
inexpensive, it may serve for all the preliminary drafts of research
reporting and — in some cases — for the final publication.

The advantages usually offered in support of a microform system
for graphic applications are: (a) volumetric reduction of the
space required to file any given set of materials; (b) fidelity of
the copy to the original; (c) low cost; and (d) facility for repro¬
duction in a variety of enlargements on either hard or soft copy.

These advantages might be tested by a group of faculty members
at one or more academic institutions. As often as work is
recorded in the form of rough notes, sketches, or instrument
readings, the author films the pages with a portable microfilmer.
There is no reason why such early work could not be filmed
from hand-written copy, since immediacy will overcome ob¬
stacles of legibility for the community of peers to whom the
substance is important. The recipient can examine the received
film and discard it, file it, or initiate a request for full-sized
copy through any channel open to him. He may then send his
comment to the original author via letter or telephone or he
may, by use of the same microfilm system, send comments,
addenda, or corrections to all participants in the system.

Authors will try such a system if it is simple and does not intrude
itself more than the accustomed ways of doing business. Its
competitors will be the duplicating machine and the office photo¬
copier. In an experimental operation, it would be important to
examine the author’s reaction, the user’s reaction, the obvious
costs and the hidden costs. One publication, Wildlife Disease,
has been issued only in microform for a number of years. Its
authors have become accustomed to preparing copy in the square
format that uses the film to best advantage. The experiment
proposed here carries this type of publication to the individual
article, report, or book chapter, and seeks to determine accept¬
ability in a community of scholars.

The experiment might be undertaken at MIT by authors whose
work appears in laboratory reports, journal articles, or theses.
The reasons for this selection have been enumerated above.

On-Demand Publication

The possession of a corpus of informational text in digital form
gives the publisher of scholarly publications considerable
freedom in selecting the format of his publications. Specifically,
the range of typography available to him extends from the primi¬
tive capital-letter alphabet of the line printer to the sophisticated
fonts of the photocomposing machines. In addition, because
for such a publisher composition is automatic, there is not the
wide spread between the cost of a single copy and of one of
an edition.

Similarly, if the publisher possesses a copy of the text in a graphic
form of sufficient acceptability, he can effect publication by
photographic methods which again avoid the costs of composition, once
more reducing the cost spread between the single copy and one of an
edition.

This situation suggests the feasibility of a program of "on-demand
publication", i.e., one in which publications are offered for sale or
otherwise with no manufacture or warehousing of copies in advance of
sale; rather, each copy is manufactured to meet a specific request.

There have been precedents for such a program (in the "auxiliary
publication" program of the American Documentation Institute for making
available ancillary material omitted from published journal articles,
also in the on-demand supply of Ph.D. theses and by certain reprint
publishers of photographic copies of out-of-print books); but there has
been (so far as is known) no major instance of on-demand publication as
an original method of publishing. Project Intrex should be able to meet
this condition. For its list, one suggestion is that it might take the
(presently unpublished) progress reports of the numerous MIT contract
research projects of general interest. Another suggestion is that, by
arrangement with the MIT Press, it convert to machine-readable (digital)
form or to machineable graphic form (e.g., microfilm) a number of MIT
publications of known saleability (leaving a number of similar
publications in conventional form to serve as controls), in order to test
the feasibility and effects of on-demand publication.

Just as it has been demonstrated that on-demand publication can be
substituted for a new edition of an out-of-print work, so it appears that
it can be substituted for conventional, original-edition publication.
Viewed in this light, the practice of providing photocopies of pages or
articles from books or journals, even though they may still be in print,
may be regarded as a form of on-demand publication.

SELECTIVE RETENTION

One often thinks that most of the library's problems could be solved, or
at least moderated, if writers would stop writing and publishers would
stop publishing redundant, incompetent, unnecessary articles and books.
Since no practical method is available for turning off the flow at its
source, however, a second line of defense might be in the library itself.
Is there any way that librarians could limit acquisition to worthwhile
items, or could discover and weed items thought to be of little interest
or value to the users of its collections? In the humanities, or anywhere
that the historical record is valued in and for itself, such extreme
measures would probably be unacceptable. In a scientific working library,
however, dedicated as it is to keeping its users on the leading edge of
their profession, the archival function is less important, and a little
heavy-handed pruning might prove to have far more advantages than
disadvantages, provided an archive is maintained elsewhere.

With these thoughts in mind, some members of the Planning Conference
devoted time to considering better methods for selecting new acquisitions
and for weeding outdated or inappropriate items from the library's files.
It seems that better selection methods might be developed, based on
detailed usage data and other information. Moreover, weeding is an
operation rarely done in traditional libraries. We felt that the data
used for selection could also be used for weeding.

An experiment along these lines was designed that had as its main purpose
to select these data and to develop selection/weeding rules based on
them.

It was assumed that the experiment would be carried out in a departmental
library that has controlled access to all documents; the model system
contains such a library and thus would be suitable for experimentation.
In a departmental library, removal of a document will, at most, mean
referring a user to an MIT central library or to some other library. No
concern need be felt about destroying the last existing copy of the
document. Moreover, the departmental library will have a limited size and
limited funds, so the total number of documents will be clearly limited.
And, finally, it was assumed that the very straightforward objective of
the selection/weeding policy would be simply to maintain a collection
that will maximize the number of requests from library users satisfied by
the documents held in the library.

The first objective of the experiment should be to isolate a number of
measurable quantities on which a decision could be based — a decision as
to whether a document should be acquired or discarded. The selection of
these measures would be a major outcome of a successful experiment.
However, it is possible to list some of the more promising possibilities.

Requests. In a large library, the average document is
requested only rarely — once or twice in many years. Hence,
request data are statistically unreliable over any reasonable
time span. In some subject fields, documents are used much
more frequently, and statistics may be correspondingly better.
Even so, they will probably leave much to be desired. Certainly,
if the Markovian model suggested by Morse is accepted, the
requests recorded most recently would be most pertinent.

Use. If document use differs significantly from document requests, it
might be considered as a possible, separate variable. Again, recent use
would be more pertinent than earlier data.

Date. Scientific information becomes obsolete very quickly. Requests for
different kinds of materials (books, technical reports, papers) may
decline rapidly with their age, hence discard dates might be based on
such experience. Discard of specific documents, however, would have to be
strongly determined by specific requests.

Citations. If a document is frequently cited by other documents that are
frequently used, this fact might be used as a rationalization for
acquiring or retaining it. Citations by a number of popular documents may
provide an excellent predictor of requests for a new document.

Interlibrary requests. Data concerning the requests for documents in
other libraries by the particular class of users in the departmental
library might or might not be possible to get but, if obtainable, should
be of value.

Requester. A particular request might be weighted by the status of the
person who made it.

Recommendations. It would at least be interesting to correlate the
requests for acquisition of specific documents with subsequent demand for
the documents.

Solicited opinions. Various quantitative methods exist for collection of
opinions, and the opinions of experts in a field should be particularly
valuable.
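To make the discussion concrete, the measures listed above might be
combined into a single numerical score for each document. The following
sketch is illustrative only; every weight, field name, and threshold in
it is a hypothetical assumption, not a value proposed in this report.

```python
# Illustrative only: a linear score over the measures listed above.
# The weights, field names, and threshold are hypothetical assumptions.

def retention_score(doc):
    """Combine usage measures for one document into a single score."""
    return (3.0 * doc["recent_requests"]          # requests, recency-weighted
            + 1.0 * doc["recent_uses"]            # in-library use
            + 0.5 * doc["citations_by_popular"]   # citations from well-used items
            + 1.0 * doc["interlibrary_requests"]  # demand seen at other libraries
            + 2.0 * doc["faculty_requests"]       # requester-status weighting
            - 0.2 * doc["age_years"])             # obsolescence with age

def keep(doc, threshold=1.0):
    """Acquire or retain the document when its score clears the threshold."""
    return retention_score(doc) >= threshold
```

The experiment's correlation studies would, in effect, determine which of
these terms deserve non-zero weights.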

One might conduct the experiment along the following lines: During the
first two years, say, data would be collected relative to all these
variables and perhaps others. With a properly designed recording system,
little or no manpower would be required, although about six months of
programmer time might be needed to prepare the data-gathering programs.
Once the data were in hand, correlation studies would estimate which of
the variables had the closest correlation with subsequent user demand.
Then, based on these variables, a number of acceptable weeding/acquisition
rules would be developed. By using great statistical care, the body of
data already collected might be used to evaluate each of the rules and to
see approximately what percentage of user requests would have been
satisfied had the rules been in effect. Appropriate theoretical models
might simplify these calculations materially. In this way, many possible
rules could be examined before any of them were applied, and without
additional data collection. Eventually, however, the rules would have to
be implemented (and further data collected) to see if they live up to
expectations. The trial might last about two years, during which time a
man would be employed to apply the rules. The normal library selection
procedures also would be carried out in order to obtain a comparison
between them and the experimentally derived rules.
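The retrospective step just described, in which candidate rules are
scored against requests already recorded, might be sketched as follows;
the record layouts are assumptions made for illustration.

```python
# A minimal sketch of the retrospective evaluation: replay recorded
# requests against the collection a candidate weeding rule would have
# retained, and report the fraction satisfied. Data layouts are assumed.

def evaluate_rule(rule, documents, request_log):
    """Fraction of logged requests a rule-pruned collection satisfies.

    rule        -- function(doc_record) -> True if the document is kept
    documents   -- dict mapping doc_id -> record of usage measures
    request_log -- list of doc_ids requested after the pruning date
    """
    kept = {doc_id for doc_id, record in documents.items() if rule(record)}
    if not request_log:
        return 1.0
    satisfied = sum(1 for doc_id in request_log if doc_id in kept)
    return satisfied / len(request_log)
```

Many candidate rules could be compared this way from the same body of
data, before any rule is put into actual practice.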

This experiment has been described in the context of the particular
collection in the Intrex model library. Clearly, however, the results
would also be pertinent to personal collections, and to computer files of
digital text. It is doubtful, however, that the results could be
generalized to really large libraries.

The proposal illustrates one possible approach to the problems posed by
the rapid increase in the number of documents with which libraries must
cope. Instead of developing technology to bring more material within a
second or two of a man's hand, a selective filter is applied to increase
the pertinence of the smaller number of documents that are near him.

Space and funding permitting, these are experiments relative to important
problems, experiments that could be pursued in the model
library-laboratory Intrex proposes to create. It is hoped that they will
not be long delayed or grudgingly implemented once the facilities are
available to handle them.

* * *

Predicted Characteristics of Alternative Browsing Systems

Variable                         Browsery            Microbrowsing      Catalog Browsing   Telebrowsing

Immediacy of sensory contact     good                some               lacking            lacking
Time delay                       none                some               long               little
Special training                 none                some               some               indeterminate
Depth of penetration             under user's        under user's       sharply limited    available, but
                                 immediate control   control                               expensive
Flexibility of arrangement       limited to shelves  versatile          versatile          versatile
User location                    must go to room     console nearby     console nearby     console nearby
Memory for access readjustments  could be developed  easy               easy               programmable
Relative expense of development  1                   5                  4                  10
When available (?)               now                 1987               1967               1970

6. R & D TO SUPPORT THE EXPERIMENTAL PROGRAM

The proposed program of experiments, presented in the foregoing pages,
focuses on a few main problems that are, in our assessment, both crucial
and solvable. We think it is essential for Project Intrex to concentrate
its effort on those problems. When we turn, as we do now, to the question
of research and development that should be done in support of the main
experimental program, we face the temptation of recommending that Project
Intrex devote its energies also to many other problems of information
transfer and library operation, for there are many problems that are
interesting and important — many problems to which MIT could doubtless
provide solutions. We believe, however, that we should resist that
temptation.

The basic policy that should govern supportive research and development
is to undertake only those tasks that have to be accomplished to ensure
successful completion of the main experiments and successful planning of
the operational system for information transfer in the 1970's. That is to
say, Project Intrex should see to it that an adequate base of science and
technology exists for the accomplishment of its main purpose, but Project
Intrex should not try to work on every square foot of that base.

We cannot tell in advance precisely where the gaps and flaws in the base
will turn out to be. That will depend in large part on what other
organizations attempt and how successful they are. We therefore cannot
list just the problems toward which Project Intrex should direct its
supportive effort. However, we can now see some deficiencies that must be
corrected if Project Intrex is to serve its purpose. The following
discussion deals with them. In order to select wisely among the
alternatives, Project Intrex should develop and maintain a full awareness
of plans and progress throughout the nation and the world in library and
information transfer technology, and should regulate its supportive R & D
responsively.

The four main areas that seem to us to demand the attention of Project
Intrex are: consoles, interaction language, automatic and semi-automatic
analysis of content (e.g., automatic indexing), and analysis of the needs
of users and responsive adjustment to meet those needs. There is a fifth
area that perhaps does not precisely demand the attention of Project
Intrex but should nevertheless surely have it: basic theory of
information transfer.

CONSOLES

Advances in the field we have come to refer to as "consolation" are
absolutely crucial for Project Intrex. They are crucial also, however,
for other applications of interactive computing, and the whole task of
developing consoles should not rest upon the project's shoulders. What is
essential is that Intrex formulate the needs for consoles of the
information transfer experiments of 1965-1970 and of the information
transfer networks of the 1970's — and see to it that those needs are met.
At the very least, that will require a small group of people who are
knowledgeable in man-computer interaction and particularly in information
display. At the most, it will require a group that can adapt hardware,
design and construct interfaces, and perhaps develop one or two critical
devices. Hopefully, the supportive R & D in consoles will not be a large
effort; hopefully, it will be mainly a matter of formulation, of liaison
with other organizations, and of testing. But it will be critical to the
success of Project Intrex.

The advances required in the console technology are numerous. Probably
the main requirement is for an inexpensive device that will display
alpha-numeric text, approximately a page at a time, and at a pace
consistent with rapid reading, at remote stations. Beyond that, there is
need for display of graphics, such as line drawings, graphs, diagrams,
and sketches. It would be good to have pictures, too, but they seem less
necessary.

The foregoing requirement has two branches. It is desirable, on the one
hand, to have means for displaying information that exists in the
computer store in the form of digital code corresponding to characters of
text, "recipes" for the generation of diagrams, and the like. On the
other hand, it is desirable to be able to display, at remote locations,
information from repositories of microimages or other analog forms. In
the case of the digitally encoded information, means for display now
exist, but they are expensive; and the main problem is either to devise
new, less-expensive means, or to figure out how to decrease the cost of
the existing means. In the case of the analog information, the problem
divides again into two parts. It is necessary either to figure out how to
transport microimages rapidly from the repository to the remote station
or to devise ways of scanning the microimages and transmitting the
scanned signals economically to the remote station. (Alternatively, one
can store the information in the form of signals derived from the
scanning of images. The problem is in part to develop and compare these
alternative approaches.)

Interaction involves, of course, more than display by the computer to the
user. At present, the most convenient and flexible device for
communication from the user to the computer is the light pen that
operates in conjunction with displays on the screen of an oscilloscope,
or the stylus that operates in conjunction with the RAND Tablet, which is
a flat sheet on which the user can write or draw and through which the
computer can determine the position, at every moment, of the user's
stylus. Both the light pen/oscilloscope combination and the stylus/RAND
Tablet combination are at present quite expensive; the costs are of the
order of $10,000. In the opinion of many people who have had experience
using them, the arrangements are so convenient and, to use a word that is
expressive even though not literally accurate, powerful that all the
consoles of an information-transfer network should have them. The problem
there, again, is either to devise new and inexpensive arrangements to
serve the same function or to figure out how to decrease the cost of
arrangements of the kind that now exist.

The list of display and control problems is long. It is important to have
a larger set of characters than typewriters (and, particularly,
teletypewriters) afford. Typewriters and teletypewriters print much too
slowly for presentation of text. Moreover, far too few users can type
well enough. Computer programs can recognize cursive script, but
pattern-recognition programs use up a considerable amount of computer
time, and it may take quite a bit of study to make it practicable for the
computer to read the user's writing. It takes either a large amount of
processing by the computer or an expensive arrangement of auxiliary
equipment at the console to maintain cathode-ray oscilloscope displays of
text or of graphics, and something must be done about that if the
cathode-ray technology is to serve the users of information networks.

As we have suggested, Project Intrex may not have to solve any of these
problems, and it certainly should not have to solve all of them. What
Intrex has to do is to see to it that solutions to enough of these
problems are found to make it possible to carry out the experiments that
involve consoles, and then to see to it that there are adequate and
affordable consoles for operational applications in the 1970's.

INTERACTION LANGUAGE

"Interaction language" is the "software" aspect — functional, procedural,
behavioral, communicative — of the problem of which the console is the
hardware aspect. The users of an information network will communicate
with the system through an interaction language. The language will be
just as important as the hardware through which it is mediated.
Interaction language is now in approximately as primitive a state as
interaction hardware.

Project Intrex should have a small, core group on interaction language.
The primary tasks of the group should be to develop the interaction
language required for the main experiments and to work toward a coherent
family of interaction languages for operational networks of the 1970's.
In addition, the group should serve as a condensation point for research
in man-computer communication that will be conducted throughout the
Cambridge area.

The problems of interaction languages are analyzed earlier in this
chapter and in the appendix on interaction languages. The fact that the
languages are to be used by people who typically are not technically
knowledgeable about computers distinguishes interaction languages of the
kind with which Intrex will primarily be concerned from on-line
programming languages. However, "programmability" is one of the most
powerful features that can be offered to the on-line community. The
interaction language designed for the information network should
therefore border on, or should include, programming languages. The
language must permit users to specialize existing programs to serve their
special purposes and to direct the specialized programs upon named data.

The general requirements that must be met by an interaction language for
an information transfer network are fairly straightforward. The
interaction language must be easy for non-technical users to learn and to
employ. It must let them specify what services they want. It must then
let them make use of the services. Since an information transfer network
will include many services — those of an augmented catalog, an automated
handbook, a document delivery and display service, selective
dissemination of information, and an instructional system — the language
must make it easy for the user to identify particular documents in which
he is interested, to negotiate retrieval prescriptions, to request forms
of delivery (microimages, full-sized hard copy, facsimile, etc.) and
formats of display, to formulate questions, to describe profiles of
interest, and to interact with instructional programs. The language
should be free and natural enough to suit the user, yet formal enough to
be interpretable by the computer.

The foregoing characteristics deal mainly with communication from the
user to the computer. The interaction language must also facilitate
communication from the computer to the user. That is a trivial matter as
long as it suffices for the computer to select from a set of stored
messages that were prepared in advance; but it ceases to be trivial as
soon as the ensemble of things the computer might have to say becomes too
large for "canned answers".

The spectrum of possible interaction languages extends all the way from a
language consisting of single-word commands to, and perhaps even beyond,
natural English. At the present time, most communication from men to
computers takes place in languages only slightly more complex than the
simplest — languages in which most of the statements are imperative and
consist of two terms, the first being a verb-like operator and the second
naming the data upon which the operation should be performed. The
statements that are not imperative are simple declarative statements such
as, "Alpha is an integer", and "Beta is a system variable".
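A toy interpreter for a command language of this simplest kind, with
two-term imperatives and simple declarations, might look like the
following sketch. The particular verbs ("print", "delete") and the
dictionary-based store are invented for illustration.

```python
# A toy interpreter for the simplest level of interaction language
# described above: two-term imperatives plus simple declarations.
# The verbs and the dict-based store are illustrative inventions.

def interpret(statement, store):
    """Execute one statement of the toy command language."""
    words = statement.rstrip(".").split()
    if len(words) == 4 and words[1:3] == ["is", "an"]:
        store[words[0]] = words[3]              # e.g. "Alpha is an integer"
        return f"{words[0]} declared {words[3]}"
    verb, operand = words                       # e.g. "print Alpha"
    if verb == "print":
        return f"{operand} = {store.get(operand, 'undefined')}"
    if verb == "delete":
        store.pop(operand, None)
        return f"{operand} deleted"
    return "statement not understood"
```

Even at this primitive level, the operator-plus-operand pattern is enough
to carry most of the traffic between user and machine.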

Going up the scale just a bit, one finds languages in which the operators
may operate upon more than one argument. In the field of computer
programming, such languages are called "macro" languages. The so-called
higher-order programming languages permit the user to make complex
statements in mathematical notation, usually algebraic notation; and some
of them, particularly those designed for business applications, let the
programmer write sentences that are actually sentences of ordinary
English, such as, "Multiply gamma by 23.6". The trouble with languages of
the latter type is that, whereas the acceptable sentences are English
sentences, not all English sentences are acceptable, and the programmer
has to remember which sentence formats are understood by the linguistic
mechanism (the compiler) of the system.

During the last few years, there has been some progress in getting
computers and programs to perform syntactic analyses — to parse sentences
of natural language — and to respond discriminatingly to substantive
terms by taking into account the kinds of data structure that are
associated with them. Unconstrained natural language, however, is still
far beyond the ken of computers and computer programs, and most workers
in the field of computational linguistics believe that many years of
research will be required to make machines "understand" natural language
in a way that, according to behavioral criteria, approaches human
understanding of natural language.

It has been found convenient, in several applications, to have at least
two versions of the language that the user "speaks" to the computer. When
the user is a neophyte, he finds it helpful to employ familiar,
meaningful words in his statements to the computer, and he does not
greatly object to having to type them out in full. By the time he has
developed skill and facility in man-computer communication, however, the
user prefers a terser mode. Indeed, he likes to have single-character
abbreviations for the most frequently employed commands, and he likes to
identify the substantives (e.g., sets of data) with names of only two or
three letters each. However, if a few weeks elapse between his
interaction with the computer and his review of the record,
communications made in the terse mode are likely to be too cryptic. It is
desirable, therefore, for the computer to translate from the terse mode
to the full mode when it presents the record for review.
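The terse-to-full translation just recommended is simple to sketch; the
abbreviation tables below are hypothetical, standing in for the actual
command vocabulary of the interaction language.

```python
# A sketch of terse-to-full translation for record review. The
# abbreviation tables are hypothetical assumptions; a real system would
# derive them from the command vocabulary of the interaction language.

COMMANDS = {"r": "retrieve", "d": "display", "p": "print", "q": "query"}
NAMES = {"cat": "augmented catalog", "hbk": "automated handbook"}

def expand(terse_line):
    """Rewrite one terse command line in the full, readable mode."""
    words = terse_line.split()
    expanded = [COMMANDS.get(words[0], words[0])]
    expanded += [NAMES.get(word, word) for word in words[1:]]
    return " ".join(expanded)

def expand_record(record):
    """Translate a whole session record for later review."""
    return [expand(line) for line in record]
```

The same tables serve both modes: the skilled user types the short forms,
and the computer presents the long forms when the record is reviewed.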

The foregoing discussion sets part of the stage for a recommendation
concerning the work of Project Intrex in the field of interaction
languages. We believe that the interaction-language group should design
and test languages for information-network applications at levels ranging
from the "macro" level up as far into the spectrum as it is possible to
go without getting deeply involved in fundamental research into problems
of syntax and semantics of natural language. We think that it would be
worthwhile to design, program and test as many as five different levels
of interaction language. As indicated, each language should be capable of
covering the range of declarations, commands and queries that are likely
to arise in the use of the library and the information-transfer network.

ANALYSIS OF CONTENT

At the present time, there are computer programs that examine the titles,
abstracts, and/or bodies of documents and come forth with terms
descriptive of the subject matter of the documents — terms that are
useful in indexing and cataloging. Automatic indexing is not wholly
proven, but it is promising. There are also some programs that examine
the contents of documents and produce abstracts. These are less
thoroughly proven and, on the whole, less promising. It is evident,
however, that the task of indexing and abstracting into which Project
Intrex is heading — and with which many operating organizations are
already involved — is a very large and time-consuming one. It is a task
that is difficult to handle effectively by human agency because, to
handle it well, it is necessary to bring two different kinds of
competence to bear upon it simultaneously: a competence in the field of
documentation and a competence in the subject-matter field of the
documents being indexed or abstracted.
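As a concrete illustration of the fully automatic approach, the
extraction of descriptive terms by simple frequency counting might be
sketched as follows; the stop list and the scoring are deliberately
simplistic assumptions.

```python
# A minimal sketch of automatic indexing of the kind described above:
# select descriptive terms by frequency after dropping common words.
# The stop list and the frequency criterion are simplistic assumptions.

from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "that", "for", "it", "on", "with", "as", "by", "be"}

def index_terms(text, how_many=5):
    """Return the most frequent substantive words as index terms."""
    words = [w.strip(".,;:()\"").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(how_many)]
```

Programs of roughly this character supply candidate terms; the open
question, as noted above, is how far they can substitute for the two
kinds of human competence.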

Thus far, what might be called semi-automatic techniques of content
analysis, computer-assisted indexing and abstracting, have been less well
explored than fully automatic techniques. We think that semi-automatic
techniques are promising enough to warrant intensive study.

Project Intrex should consider establishing a small research and
development group in the field of automatic and semi-automatic analysis
of content. As a minimum assignment, such a group should develop and
maintain an awareness of the forefront of the art and adapt the best
techniques for use in preparing material for the main experiments. If
available techniques do not prove effective enough, and if it turns out
not to be practicable to achieve the required breadth and depth of
indexing and abstracting through human effort alone, then Project Intrex
should undertake to develop the techniques and the programs required to
support the experiments. It may be appropriate to go farther than that in
connection with the theoretical work to be discussed later in this
section; but we think the supportive effort in automatic and
semi-automatic analysis of content should be focused sharply on the
requirements for indexes, abstracts, and the like that are posed by the
main experiments.

ANALYSIS OF THE NEEDS OF USERS

It seems to us possible, on the one hand, that the needs of users of the
projected information network can be ascertained, and ways of making the
network responsive to those needs can be figured out, without engaging in
formal research or development preliminary to the main experiments. It
seems to us to be equally possible, on the other hand, that the needs of
users and the problem of adjusting services to meet those needs will
prove to be a fairly deep and difficult topic, possibly even critical to
the outcome of the main experiments. In this area, therefore, we
recommend that Project Intrex set up, at an early date, a special
investigative group. This group should try to formulate, not research
problems, but solutions. In the process, it should engage in informal
experimentation, interview potential users, and so on, but it should
operate with the bias that problems in this area can be handled with
common sense or, at any rate, with expert sense, and it should move into
a significant R & D effort only if it is forced to do so. We make this
contingent recommendation because we think the energies of Project Intrex
should be focused on the main experiments, yet because we suspect that
the matter of user needs may turn out to be both deep and crucial. One
reason for suspecting so is that most of the opinion concerning what will
be required of advanced information technology stems from experts in
advanced informational technology, whereas the needs with which Project
Intrex should be most concerned are those of undergraduate students,
graduate students, faculty members engaged in teaching and research in
non-informational fields, and other substantively oriented scientists and
engineers affiliated with MIT or with industrial organizations in New
England.

THEORY OF INFORMATION TRANSFER

The broad area in which Project Intrex will operate includes the computer
sciences, the information sciences, the library sciences, and parts of
other disciplines ranging from psychology to electrical and mechanical
engineering. There is no dearth of theory in those fields. There is not
at the present time, however, we think it is fair to say, a comprehensive
and basic theory of information transfer. Perhaps such a theory is now in
the process of forming, but it does not yet have definite structure, and
it is not yet ready to support engineering applications.

Although the main tasks of Project Intrex are concerned with
experimentation and the development of technology, and although it seems
unlikely that it will be possible to develop a sound theoretical basis in
time for it to support the execution of those main tasks, we think it is
appropriate, and indeed highly desirable, for Project Intrex to engage in
basic studies designed to lead to a coherent theory of information
transfer.

The reason for our conviction that basic theoretical studies are
appropriate and desirable is that, without such studies, experimentation
in the field is likely to be, in the derogatory sense of the word,
empirical. For immediate purposes, it may be sufficient to find out that
one approach works while another does not; but we look upon the
information transfer networks of the coming decade as but the first of
many generations of an important new species of intellectual technology.
We think it is important for this first generation to be conceived in a
union of theory and experiment.

We doubt that it is either appropriate or possible to prescribe a good
program of basic theoretical research for Project Intrex. We prefer to
fall back upon the trite but true observation that the main thing is to
attract capable research people to the field and to the project. We think
that it will be possible for Project Intrex to provide the nucleus of a
fairly large and very active research program without having to operate
the entire program itself. Indeed, we recommend that Project Intrex adopt
the policy of maintaining a small core program of basic research in
information transfer and, at the same time, open its facilities to use
by, and encourage the use by, research people throughout New England and,
indeed, throughout the wider area that is within teleprocessing distance
of MIT.

7. DATA GATHERING FOR EVALUATION

Comments on the topic of data gathering, in this section, will
necessarily be quite general. Few of the "experiments" described in the
foregoing program can be controlled in the detailed manner that allows
for rigorous statistical analysis. They are, rather, trials of new
procedures and equipment, which will have to be modified as the trial
proceeds and from which general observation will suggest directions for
further progress. Data to be gathered will be in the nature of
observations of what is going on, to be used as indicators of possible
improvements, rather than comparative measurements in a controlled
experiment.

Data will, of course, be easier to record and analyze than is
usually the case in libraries, because of the presence of the
computer. Nevertheless, it will be necessary to include the
appropriate data-gathering procedures in the software devised
for each part of the program. Some of the kinds of data that
should be so included are discussed in this section.

DATA ON USE

Presumably each user of each experimental system will be
registered and will have a code number, which he will use
whenever he uses the system. Stored in the machine with the
code number will be a description of each user, to indicate
his status (student, faculty, etc.), his specialty (chemist,
psychologist, etc.), and additional professional profile
information. Then, if the computer records the user's number
whenever he uses any part of the system, it will be possible later
to correlate a detailed tracing of the nature of the use with the
user's status and specialty. Thus a set of use-patterns, of
the sort discussed by Morse in his appendix, can be computed
to see what parts of the system and which paths through the
system the chemist uses, for example, in distinction from
those used by the psychologist or historian.
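In modern terms, the correlation described here is a join of the use log against the user register. The following is a minimal sketch in present-day Python; the user codes, service names, and profile fields are illustrative, not taken from the report:

```python
from collections import Counter, defaultdict

# Hypothetical user register: code number -> (status, specialty).
users = {
    101: ("faculty", "chemist"),
    102: ("student", "psychologist"),
    103: ("faculty", "historian"),
}

# Hypothetical use log: each record is (user code, service used).
use_log = [
    (101, "augmented-catalog"),
    (101, "full-text-search"),
    (102, "augmented-catalog"),
    (102, "browsing"),
    (103, "browsing"),
    (101, "full-text-search"),
]

def use_patterns(users, use_log):
    """Tabulate which services each specialty uses, and how often."""
    patterns = defaultdict(Counter)
    for code, service in use_log:
        status, specialty = users[code]
        patterns[specialty][service] += 1
    return patterns

patterns = use_patterns(users, use_log)
# e.g., patterns["chemist"] counts the chemist's two full-text searches
```

From such a table, the paths taken by the chemist can be compared directly with those taken by the psychologist or historian.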

Furthermore, if monetary or, perhaps, pseudo-monetary charges
for various services are introduced, to bring in economic
balances of desires with capabilities, a certain amount of
discriminatory control can be exercised by changing the relative
magnitudes of the charges from time to time. In this case, it will
be of considerable interest to see which kinds of users react
to these changes and what form the reaction takes.

ECONOMIC CONTROLS

The initial data on what services are most popular will probably
need to be combined with data on cost of service, to determine
the charges to be made for various services, if economic
controls are instituted. Thus the use-patterns, obtained by
correlating user records and records of use, will be needed both
before and during any experiment involving economic controls.

It is suggested, in this connection, that the economic controls
could work both ways — that a user could be rewarded when he
contributed to an improvement of the system, as well as charged
for its use. For example, there could be a standard payment
to each user who adds a significant comment to one of the
catalog entries in the augmented catalog. Since each user identifies
himself to the machine, it should not be difficult for the machine
to keep books on each user's debits and credits, so that "bills"
could be sent out periodically.
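The bookkeeping suggested here amounts to a simple running ledger per user. A sketch, with wholly illustrative charge and credit values:

```python
from collections import defaultdict

class Ledger:
    """Toy debit/credit bookkeeping of the kind suggested: users are
    charged for services and credited for contributions such as
    annotations to the augmented catalog. Charge values are invented
    for illustration only."""

    def __init__(self, charges, annotation_credit):
        self.charges = charges              # service name -> charge per use
        self.annotation_credit = annotation_credit
        self.balances = defaultdict(int)    # user code -> net amount owed

    def charge(self, user, service):
        self.balances[user] += self.charges[service]

    def credit_annotation(self, user):
        self.balances[user] -= self.annotation_credit

    def bill(self, user):
        """Amount to appear on the periodic 'bill' (may be negative)."""
        return self.balances[user]

ledger = Ledger(charges={"full-text-search": 5, "browsing": 1},
                annotation_credit=2)
ledger.charge(101, "full-text-search")   # user 101 owes 5
ledger.credit_annotation(101)            # minus 2 for a catalog comment
```

Because every transaction is already machine-recorded, changing the relative magnitudes of the charges, as proposed above, requires only editing the charge table.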

DATA ON LEARNING

Since many innovations risk early rejection if their use is not
easily learned, it behooves the designers of each Intrex
experiment to make it as easy as possible for the early users to
learn their way around. The software for each aspect of each
experiment should include the recording of the learning curves
of new users; and these records should be reviewed frequently
at the start, to see whether there are any indications of
learning difficulties, so they can be reduced.
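A learning curve of the kind proposed can be computed from timed task records. The sketch below averages task-completion time by session number across users; the timings and user names are invented:

```python
from collections import defaultdict

def learning_curves(log):
    """Given (user, session_number, seconds_to_complete_task) records,
    return mean task time per session number, averaged over users.
    A falling curve suggests the system is being learned; a flat or
    rising one flags a learning difficulty worth investigating."""
    by_session = defaultdict(list)
    for user, session, seconds in log:
        by_session[session].append(seconds)
    return {s: sum(v) / len(v) for s, v in sorted(by_session.items())}

# Illustrative timings for two new users on the same type of task.
log = [
    ("u1", 1, 120), ("u1", 2, 80), ("u1", 3, 60),
    ("u2", 1, 100), ("u2", 2, 90), ("u2", 3, 70),
]
curve = learning_curves(log)
# here the mean falls from session 1 to session 3, as one would hope
```

Reviewing such curves frequently at the start, as the text recommends, is a matter of rerunning this tabulation on the accumulating log.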

In fact, the learning process is so integral to the development
of the Intrex program that it probably would be wise to assign
a man or a group the responsibility for developing learning
procedures and for checking on actual learning processes among
the using population. Such a person or group could be
responsible for the final forms of the instructional aids for newcomers
and for the continual improvement of these aids in the light of
user reaction. As indicated elsewhere in this Chapter, these
learning aids will take many forms. If different procedures
or pieces of equipment are to be compared, the learning curves
for each competing element, to be gathered by the suggested
person or group, will be important factors in the comparison.

AN ANNOTATED USER'S CARD

Thus we suggest that there should be a "user's card" with
data on the particular user's status and profession, on when
he started to use the system, and other pertinent data, supplied
by the user himself or, when appropriate, by the machine.
Records on use of various services provided by the system would
then include the user's number, together with when and how he
used that service. With this amount of data present,
correlations could be made between the user's profession or status and
what he used, or on how soon, after registering with the system,
he started to use this or that service, and so on.

To carry this idea further, the user's card could be the basis
for making the library's interaction with the user active rather
than passive. Excerpts of the user's use-pattern could be
recorded on his "card", and the system could be programmed
to volunteer further data of the sort he has already shown an
interest in. In addition, the user could have the right to
"annotate" his own "card", to indicate his possible future
interests and disinterests. For example, a computer program
to correlate the new acquisitions, as they are entered in the
augmented catalog, with the "user's card" declaration of
interests could serve as the beginning for an automated selective
dissemination of information system that did not rely on long
initial interviews to open the activity.
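The proposed correlation of new acquisitions with declared interests can be sketched as simple keyword matching; the titles, keywords, and interest terms below are illustrative, and far more sophisticated matching is of course possible:

```python
def sdi_matches(acquisitions, interests, disinterests):
    """Match new catalog entries against a user's declared interests,
    suppressing declared disinterests — the beginning of the automated
    selective dissemination of information described in the text."""
    alerts = []
    for title, keywords in acquisitions:
        kw = set(keywords)
        if kw & interests and not kw & disinterests:
            alerts.append(title)
    return alerts

acquisitions = [
    ("Paper A", ["information-retrieval", "indexing"]),
    ("Paper B", ["thermodynamics"]),
    ("Paper C", ["information-retrieval", "hardware"]),
]
alerts = sdi_matches(acquisitions,
                     interests={"information-retrieval"},
                     disinterests={"hardware"})
# Paper A matches an interest; Paper C is suppressed by a disinterest
```

Because the user maintains his own "card", no long initial interview is needed: the interest and disinterest sets are simply the card's annotations.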

APPENDICES

In the course of the discussions that preceded
the writing of the present report, the
participants of the Project Intrex Planning Conference
produced more than 140 working papers. These
papers were thought of as informal memoranda,
addressed to the Conference to provide a
transient record of discussions, or to clarify and
extend remarks made in the working groups
of the Conference.

Twenty-three of these papers have been
selected for inclusion in this report, in an
attempt to give the reader access to
supplementary material that may have shaped the
thinking of the Conference. The authors were
given no opportunity to revise these papers;
thus the informal language of the Conference
is preserved. The papers are presented here
in a sequence corresponding to their
appearance at the Conference, so that the reader
may gain some insight into the order in which
certain thoughts had their impact on the
planning of the Intrex experiments.

[Cartoon omitted; its caption did not survive scanning.
Copyright 1965, Hall Syndicate, Inc.]

APPENDIX A
REMARKS OF VANNEVAR BUSH
TO
PROJECT INTREX PLANNING CONFERENCE
2 August 1965

I will have very little to say this morning. There is good
reason for this. I learned a long time ago that, if one wishes
to live a reasonably comfortable life, he had better avoid
getting into competition with a lively group of younger men in a
scientific or technical field. Accordingly, after the war years
rather completely knocked me out of all the things I had been
trying to do, I never did go back into this one. On the other
hand, I have watched it from the sidelines with keen
enthusiasm for the resourcefulness, ingenuity, sound engineering,
and imagination which have been exemplified by the men who
are in the forefront of the advance.

I do have two points that may be worth noting.

The advent of powerful analytical machinery has notably
advanced the economic prosperity of this country, as it has
facilitated business operations and speeded research programs.
This certainly does not need exposition. It is fully evident
merely from a glance at the financial aspects, for one thing.
Now I feel sure that this group assembled here is convinced
that a similar benefit to our prosperity would follow a
successful mechanization of our libraries. But this is a subtle
matter. It will not show up promptly on operating statements.
There will not be the push of the profit motive behind it. And
it will be an expensive program.

This being the case, it will move forward well only if
intelligent citizens, in positions of influence, come to realize the
potential benefits that exist. And this will occur only if those
in a position to understand the whole affair speak out,
effectively and often. I am not advocating some sort of
promotional campaign. I am merely trying to point out that there
are two parts of this job: first, the tough technical and
professional task, which is indeed tough; and second, the burden
of carrying the intelligent section of the public along toward
understanding and, if possible, enthusiasm.

On this latter undertaking there is a point of which I feel full
advantage should be taken. The benefits of the great advance
in analytical machinery thus far have flowed largely, but not
quite completely, to men of business, science and
engineering. The program in the minds of you gentlemen reaches far
beyond this. It will influence, perhaps revolutionize, the
methods of every professional group — in law, medicine, the
humanities. It will support every phase of our general
culture. I believe very few scholars today realize what this
could mean. I am sure the general public does not realize, for
example, that success in this program could mean as much to
their well-being, their health, as has been produced by the
power of antibiotics. If and when professional men and
scholars generally grasp what is here involved, there will be no
lack of support if this country maintains its present prosperity.

One last point, and this is merely an idea which I present for
your consideration. I believe that an essential aspect of a
fully successful system is that it should automatically improve
by reason of its use. It is easiest to consider this in the case
of a professional library, perhaps one on the law. If the
attorney, turning to this, and finding the exact reference for his
purposes, found also the comments and criticisms of his
colleagues who had passed that way before, and if he added his
own thought, the value of the library would be continuously
enhanced. I realize that this would have to occur, at first, in
situations where conflict of interest is absent. But I would
hope that it would extend in an atmosphere of professional
interchange and frankness which is characteristically American.

As you labor on this problem you will work, not with the tools
of today, but with those tools as they will be rendered more
powerful by the technical advance which is present and active.
With these refined tools, and adequate support, I am sure
much will be accomplished. I merely wish I were young enough
to participate with you in the fascinating intricacies you will
encounter and bring under your control.

APPENDIX B

AN ON-LINE INFORMATION NETWORK

This is a proposal that Project Intrex set up an on-line
information network as an experimental "vehicle", and conduct
a variety of experiments with the network or within the
context of the network. The information network would be a
subsystem, at first, of a time-sharing computing system at
MIT, later of a time-sharing network involving also other
computer systems in other institutions. The experiments
would be in such areas as editing of manuscripts, interaction
among colleagues with reference to manuscripts and other
documents, consoles, interaction languages, apparatus of
bibliographic control, retrieval, dissemination,
representation of content, and system adaptation to user needs.

INTRODUCTION

This appendix examines the possibility of making a
significant approach, in the near term, to what is often thought of
as an advanced or futuristic information network. It strikes
me that it is technically possible, even now, to set up a
system within which a limited community of scientists and
engineers could experiment with preparation, storage, and
retrieval or dissemination of information within the context
of a teleprocessing network. It is doubtless not economically
feasible, at the present time, to develop an on-line
information network of broad scope for a large community, but it
might not be too costly, and it might provide valuable
experience, to set up a limited network for one field or a few
fields of research and application. The network need not
be limited to a particular geographical area, but it should
probably have a center or focus, and that might well be at
MIT.

The experimental information network would be set up,
according to the plan that I am proposing, within existing
and forthcoming time-shared computer systems. It seems
likely that, during the next few years, most of the university
time-sharing systems, and perhaps some others as well,
will be connected together to form a teleprocessing net. What
I am thinking of is an information system that could be
developed within that net — hopefully, without using up more
than a reasonable fraction of the over-all computing and
communication resources of the net. It seems desirable that
the information network have a center or focus, even though
the network of time-shared computer facilities, itself, might
exist as a loose confederation of coordinate subsystems. The
plan calls, in accordance with that idea, for setting up the
proposed information system at first within the time-shared
computer facility at MIT, then extending service over
telecommunication lines to individual people or consoles at other
locations, and finally bringing other time-shared computer
facilities into the system, thus creating a true network.

FUNCTIONS OF THE NETWORK

Among the functions that would be subserved by the network
are: collection, processing, and organization of data;
preparation of manuscripts; review, editing, and "acceptance"
of systems of data and of manuscripts; storage of data, of
documents, and of information derived from data and
documents; indexing, classifying, and organization of the store;
selective dissemination of information; retrieval of data,
documents, and derived information; facilitation of
publication through normal journal, monograph, and book
channels; preparation and storage of the apparatus of
bibliographic control for the conventional literature; and the
bibliographic part of retrieval from the conventional literature.

The foregoing list of functions may not be entirely
self-explanatory. I am not thinking of a system that would, at least
in its early years, supplant normal publication. That
consideration explains the fact that some basic functions appear
twice in the list, once with reference to the on-line
information network itself and once with reference to the role played
by the on-line network in normal publication.

Let us think of the functions as seen first by the experimental
scientist, then by the network editor, then by the network
librarians, and finally by "destination" (as distinguished
from, but not necessarily divorced from, "source") users of
the system. The "destination" users may be experimental
scientists, theoretical scientists, engineers, reviewers,
science writers, and the like. In the interest of brevity, I
shall not go into all the ramifications.

Some experimenters may conduct their experiments — at
least in part — through computer systems such as that of
Project MAC. Since the information network here being
discussed will be using a facility such as that of Project MAC,
the computer system will be helpful in collecting, assembling
and processing the experimental data. Hence the first of the
listed functions. In some instances in which organized sets
of data are of general interest, the data would be organized
and formalized and stored within the system in such a way as
to be available independently of reports based on them.

Suppose, now, that the experimenter continues to use the
system as he prepares his manuscript for "publication".
For example, he uses an "editor" program to help him get
his manuscript into good form and style. He uses graph-
handling programs to prepare graphs that have the proper
visual appearance and, at the same time, the correct
alpha-numeric representation within the computer system. (The
computer can reconstruct the graphs from the alpha-numeric
tables.) In short, the experimenter prepares his manuscript
in computer-processible form.

Consider, next, that the experimenter submits his manuscript
to an editor who enforces the standards of the experimental
network. This editor is distinct from the editors of
conventional journals, but, as will be seen later, he works
closely with the editors of conventional journals, monographs
and books. The network editor, upon receiving the
manuscript, assigns responsibility for it to an associate editor,
who arranges for reviews by appropriate people; the
reviewers are "members" of the network. The associate editor
may go through more than the normal amount of
communication in order to find reviewers who are willing to do the
work and to do it very soon. Hopefully, his task of
communicating with potential reviewers is facilitated by the fact that
he can communicate with them, and give them parts (or the
whole) of the manuscript through the system. It seems
likely that, in the reviewing process, the procedure of maintaining
anonymity of reviewers may, in many instances, be waived,
but, of course, it need not be waived. In any event, through
communications and conferences within the network, necessary
emendations can be made, and the decision can be reached
as to whether or not to accept the manuscript for "publication"
within the network. This is totally distinct, during the era
of the experimental network, from acceptance by a conventional
journal. (For simplicity, I shall here speak about journals
and not refer continually to "journals, monographs or books.")

Let us assume that the manuscript is accepted. It is then
stored, as a document, in the secondary or tertiary storage
of the network. (I say "storage of the network" in order to
avoid discussing the problem of deciding at which points within
the network to store various items of information.) As soon
as a manuscript is added to the store, it becomes available
for retrieval by members of the network who have substantive
interest in its content. It becomes available, also, for
processing by the librarians, documentalists, and other professional
organizers of information who work within the network. In
part, the work of organizing is to construct and improve the
apparatus of bibliographic control. In part, it is to devise
and test improved ways of organizing the content of the
network's store and of representing the content for machine
processing, for retrieval, and for study by people. The two
parts of this organization are, needless to say, closely
interrelated.

At this point, I think I should say that the usefulness of the
network to substantively oriented scientists and engineers
would not have to wait upon the development of advanced
apparatus of bibliographic control, or upon the development
of advanced memory organizations. The network would be
set into operation with one or another of the present-day
programmable techniques of retrieval and dissemination,
and improvements would be introduced usually, first, on an
experimental basis as advances are achieved through thinking
and programming.

It would be expected that, through the work of substantive
users and of the librarians, documentalists and organizers
(I group those three terms because no one of them seems to
have the full connotation I wish to convey), kernels of basic
information would be abstracted from the documents (through
which they were introduced into the network) and organized
into new kinds of information structures, different from
documents, and more appropriate for machine processing.
Let me not try to describe such information structures here.
Suffice it to say that they are a topic that needs to be
developed. Over the years, I think, they would come to be the main
representations with which substantive users would interact;
but, during the early years, the main interaction would be
with documents and with organized sets of data.

We come now to retrieval and use of information from the
network store by substantive users. They would have access
to the store, and to programs for processing it, from their
remote stations. That channel of access would be backed up
by hard-copy service: pneumatic tube, messenger, or mail.
The processing and retrieval programs would be controlled,
of course, through a user-oriented language. This language
would be part of a broader language that would control, in
addition, the other functions (e. g., editing) provided by the
network. I hope there will be discussion of, and writing
about, such languages.

In addition to the facilities for user-initiated retrieval, there
would be arrangements for selective dissemination of
information. These would include, in addition to the now-conventional
techniques, arrangements based on the initiative
of individual authors or contributors, who would alert their
colleagues, through the system, whenever they introduced
manuscripts or data into it.

The final function of the system is to keep records of its use
and to facilitate experimentation. The proposed system is
an experimental system. It is, as the jargon puts it, an
"experimental vehicle". It should, therefore, be more
heavily adorned with facilities for observation and control
of its operation, and arrangements to promote flexibility of
operation, than would be fitting in an "operational" network.
But distinguishing "experimental" from "operational" is not
to suggest that the experimental network should not operate.

FEASIBILITY OF THE PROPOSAL

An information network — even an experimental one of the
kind proposed — will represent a large investment by the
time it is in full-fledged operation. In my opinion, however,
that fact does not vitiate the feasibility of this proposal.
Several considerations contribute to my estimate that the
proposal is feasible. I shall describe some of them briefly.

First, although the network would not be very useful to
potential users of the store until the store contained a
significant body of information, some of the services — and
these services should be among the first to be provided —
would be useful to the authors of manuscripts. In particular,
a good system of editing programs is a valuable, even
indispensable, asset to a time-shared computer system. The
proposed network could develop in a progressive way from
such a small beginning.

Second, quite early in its life the system would serve as a
device to facilitate communication and experimentation within
an "invisible college". It would be possible for an author to
discuss parts of his manuscript with colleagues, including
geographically remote colleagues, even during the
preparation of the first draft. It would be easy for co-authors to
meld their contributions into a coherent paper. In any event,
there are plenty of interesting schemes to explore to make
the system useful, even when it contains only a few
manuscripts or documents. Perhaps I should not suppress the
notion that the prestige value of engaging in "on-line
scholarship" might carry the proposed system through the early
months, even despite a paucity of services and a dearth of
information in the information base.

Third, the field of information sciences will almost surely
suggest itself as the main field upon which the proposed
system should focus. Indeed, the most appropriate subfield
would seem to be the field of on-line information processing.
That subfield is, happily, still a small one. It would be
possible to get most of the significant documents into the
store with only a modest investment in conversion from
printed to computer-processible form. In any event, by
selecting a small, new field, or a small cluster of such fields,
it would be possible to ensure that the system be practically
useful at an early date, and that the cost of carrying it to the
stage of practical usefulness would be small.

Fourth, there is the very important consideration that most
of the parts of the proposed system are already in existence,
in one form or another, and it is therefore necessary only
to appropriate them and to modify them to work together
coherently in a system. It is not necessary to do all the work
from the beginning. This is not to say that planning and design
of the system can be dispensed with; they are necessary to
ensure that the system brought into being will support the
extensions, the improvements, the advances that were
envisioned in the discussion of functions.

Fifth, there seems to me to be little basis for what may be
the most prevalent criticism of this proposal, namely, that
the art of processing of text (content) and the facilities for
storing content are not sufficiently well developed to support
several of the functions proposed. At present, there are
hundreds of operable computer programs for processing text —
as many as 200 available for use within a single organization.
The largest primary (directly processible) computer memories
now hold more than a million characters. The largest disc
files are approaching a billion characters. The largest stores
that afford access to any element of data within a few seconds
are approaching one hundred billion characters. I think it is
the case here, as it is in so many fields, that the technology,
itself, is outrunning our understanding of how to apply it.
The proposed experimental network would make the challenge
offered by the technology clear and tangible, and would
stimulate the exploration, experimentation and testing that
are required to develop the understanding.

THE PROPOSED EXPERIMENTAL NETWORK, CONVENTIONAL
PUBLICATION, AND CONVENTIONAL LIBRARIES

As suggested earlier, the experimental network would function
more in the domain of pre-publication communication and
informal interaction among workers in a small field than it
would in the domain upon which conventional publication
channels and conventional libraries focus. At one point in
my thinking about the proposal, I thought it might be
interesting to try to operate an on-line journal. However,
there is little prospect of providing enough consoles at an
early date to support a wide readership; and, in any event,
there are existing journals that already serve the field of
information sciences and the subfield of on-line computing.
It seems better, therefore, to aim the experimental network —
at least at first — toward the domain of invisible colleges
rather than toward the domain of existing journals.

With competition thus eliminated, we may turn to possible
cooperation between the experimental network and existing
journals. Editors of existing journals might be invited into
the system — might be provided with consoles — and
encouraged to use the facilities of the network in soliciting,
reviewing, editing, and publishing articles — publishing the
very manuscripts that enter the experimental network. The
network could, in that way, serve to acquaint the editors of
existing journals with new techniques and procedures, and
it could profit from the reactions of those editors.

There may be possibilities for exploring applications of the
network in other than editorial parts of the publication
process. However, I am not prepared to develop them now.

EXPERIMENTS

The proposed network has been called an experimental
network. What experiments should be conducted?

Let us interpret "experiments" broadly: controlled
experiments, investigations, explorations, tests, and so forth.
The network would serve as a "vehicle" for experiments of
those various kinds. I shall not try to develop an exhaustive
list of experiments here, or to give detailed descriptions of
them. However, I shall set down a few paragraphs on the
subject of what experiments could be conducted within the
context of the proposed network.

Text-editing programs. Several text-editing programs are
available now. It would be worthwhile, I think, to convert
them into a form compatible with the time-sharing system
and network, and to evaluate them through competition,
formal or informal. It would be still better, I think, to
dissect the various editing programs and to isolate their
individual service functions or features. Then one could
re-assemble the components in various ways and find out,
more systematically, just what features are most valuable
to authors of manuscripts, to reviewers and editors, to
researchers working with data, etc.

Cooperation among co-authors. Programs to facilitate
cooperation among members of a team may not yet have
been written — I think things are moving rather slowly in
this particular area of on-line computing — but there is
certainly a place for such programs. I think it would be
worthwhile to develop a battery of programs to facilitate
communication and cooperation among the co-authors of a
paper, to make those programs available through the network,
and to experiment with them in approximately the same way
as was proposed in the case of text-editing programs.

Consoles. The topic of experiments on consoles, displays,
and controls is a very large one. I shall not try to develop it
here, but shall try to contribute something on this topic during
the Conference.

Information-retrieval techniques. As suggested earlier,
the network would be set into operation with the aid of
existing information-retrieval techniques. During the early
months, experiments could be conducted to compare various
alternative techniques. For example, several different
systems based on descriptors, key words, and the like could
be tested competitively. Descriptive terms selected by people
could be compared with descriptive terms selected by
computer program. Systems using thesauri could be compared
with systems not using thesauri. Systems restricted to
descriptors, indexes, catalogs, and the like could be
compared with systems based on "full-text search". (At least
three or four systems of the latter kind are presently in
operation; it would be necessary only to adapt them for the
tests.) Looking a little farther ahead, one can envision
experiments with various kinds of query language — simple
languages at first and more-complex ones later. Within the
five years, it should be possible, certainly, to conduct
experiments to assess the value of surface-structure syntactic
analysis in querying. Perhaps, within that time, some
programs for deep-structure analysis will be available. The
foregoing just scratches the surface in information retrieval,
of course, for it is a large and much-developed field. One
of the good developments is the TIP Project, and it goes
without saying that the TIP techniques and their extensions would
be made available and tested in the proposed network.
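The contrast between descriptor-based systems and full-text search can be made concrete in a few lines. The sketch below is a deliberately minimal modern illustration, not any system named in the text; the document collection and its descriptors are invented:

```python
def descriptor_search(collection, query_terms):
    """Retrieve documents whose assigned descriptors contain a query term."""
    return [d["id"] for d in collection
            if set(query_terms) & set(d["descriptors"])]

def full_text_search(collection, query_terms):
    """Retrieve documents whose full text contains every query term."""
    return [d["id"] for d in collection
            if all(t in d["text"].lower().split() for t in query_terms)]

collection = [
    {"id": 1, "descriptors": ["retrieval"],
     "text": "On the retrieval of documents by machine"},
    {"id": 2, "descriptors": ["indexing"],
     "text": "Machine methods for document retrieval and indexing"},
]

# The two systems can disagree: document 2's text mentions retrieval,
# but its assigned descriptors do not — exactly the sort of difference
# a competitive test would measure.
hits_by_descriptor = descriptor_search(collection, ["retrieval"])
hits_by_full_text = full_text_search(collection, ["retrieval"])
```

A competitive test of the kind proposed would run the same queries through both functions over the same collection and compare what each retrieves and misses.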

Dissemination of information. There are fewer
already-developed techniques for dissemination than for retrieval,
I think, but there are quite enough to constitute an experiment.
One could get started right away, therefore, adapting the
better of the existing techniques for use within the network
and testing them competitively. There are many directions
for improvement and advancement in dissemination of
information, and there are quite a few people, around the
country, who are interested in making them. I think the
network would provide to them the vehicle that they have
been needing; and I suspect that a lively program of
experimentation in dissemination would develop, almost of
its own accord.

Representation of text within computer memories. The prob¬
lem of representing natural-language text within computer
memories, representing it in such a way as to be economical
in the use of memory space and also to facilitate processing,
is an important, and I think a deep, technical problem. There
are many ideas — coding for compactness, "hashing", list
structures, "trie" structures, etc. — and, I think, still more
to be discovered or invented. The processes of discovery
and invention involve a kind of software experimentation. I
would expect to see much of that kind of experimentation
carried out within the network.
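The "trie" structures mentioned above lend themselves to a brief sketch. The following is a minimal illustration in a modern programming language, not a period implementation; the class and method names are invented for the example. Words that share a prefix share storage for it, which is exactly the economy of memory space, combined with ease of processing, that the text describes.

```python
# Minimal trie sketch: common prefixes of stored words share nodes,
# compacting the store and making prefix-directed lookup cheap.
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Store a word, creating only the nodes not already shared."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        """Follow the word character by character; True only at a word-end node."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_word
```

Here "retrieval" and "retrieve" would occupy one shared chain of nodes for "retriev", diverging only at the last characters.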

Representation of information abstracted from text. The
area of study that deals with data structures for the representation
of ideas within computer memories is also both important
and deep. I think research in that area will go on
over a much longer period than is projected for Intrex, but
I think that some of the pioneering work should be done within
Project Intrex. When new representation techniques are
developed, they should be coupled with information-retrieval
techniques and tested within the network.

COMPUTER-PROCESSIBLE VS COMPUTER-RETRIEVABLE TEXT

The foregoing discussion of the proposed network has made
the tacit assumption that all the information stored within
the system would be processible by the computer or com¬
puters. It seems to me important to bring about, within
Project Intrex, some kind of a direct confrontation between
systems based upon that assumption and systems in which
the documents to be studied by users are held on film, or
in some other medium, in such a way that they can be re¬
trieved by computer and transmitted or transported to the
user — but in which only the apparatus of bibliographic con¬
trol is actually processed by the computer. It may be,
therefore, that this proposal is only part of a larger proposal.
The other parts would deal with the development of other
subsystems, some of which would store documents as images.

J.C.R. Licklider

APPENDIX C
MEASURING USER NEEDS AND PREFERENCES

An important aspect of many experimental installations will
be the user's satisfaction, and some consideration of how
acceptability can be estimated will be required for almost
any experimental program that is undertaken. It is probably
possible to get valid and reliable measures of user satisfac¬
tion, but it is not easy. A few cautionary remarks would
seem to be in order.

First, a clear distinction should be drawn between measurements
of user satisfaction and evaluations of the system as a
whole. It is certainly desirable that users should be satisfied
with the services they receive, and that these services be
tailored, so far as possible, to match the user's needs and
preferences. But the orderly growth of accumulated knowl¬
edge available should not be endangered in order to provide
special services to special users. User satisfaction is only
a part of the larger problem of system evaluation. Presum¬
ably this is a problem every library director must face when
he decides how to allocate the funds available in his budget,
and we merely remark that measures of user satisfaction
cannot solve it for him. Measures of user satisfaction can,
however, help to decide between alternative services — other
factors being equal — hence it can be argued that an experi¬
mental program should have a capacity for making such measurements.

Second, there is a strong temptation, which must be sternly
suppressed, to attempt to measure preferences in advance.
The economies are obvious. The mistake goes like this:
A new service, or a variety of new services that might be
provided (if anyone wanted them), is described in clear and
careful prose. Potential customers are informed in this way
of the possibilities ahead and then, on the basis of that de¬
scription, are asked to rank-order the services in desirabil¬
ity, or to assign ratings on some subjective scale (e.g., how
"hot" is this item on your "psychological thermometer", etc.),
or even to say how much money they would pay for the service.
Bitter experience indicates that such evaluations of imaginary
services are worse than useless. In order to insure any de¬
gree of validity at all, it is essential to provide actual experi¬
ence with the services to be evaluated.

Third, another pitfall that must be avoided is the confusion
of frequency of use and value to the user. The point is fairly
obvious, but often overlooked when statistics of actual use
are compiled. It should not be concluded that a service is
unimportant, either to the individual user or to the system as
a whole, simply because it is used infrequently. Records of
use-frequencies provide valuable information about the way
the system is working, but they must be interpreted with great
caution when planning innovations in the system.

Fourth, if questionnaires or interviews are used, these must
be prepared and administered by professionals. How a question
is worded can strongly affect the way it is answered; how
an interviewer responds can strongly influence the person he
is interviewing. Most of the obvious mistakes that can be made
have been made, are well known to those who work in the field,
and ways to avoid them have been devised. It is good to employ
a professional for such work. If the professional is brought in
from outside, he can often do the job much more objectively
than can someone who is caught up already in defending the
services whose satisfactoriness he is trying to measure.

Fifth, it is very easy to over-estimate the value of accurate
measurements of user satisfaction in a system developmental
program. This is not to say, "The customer be damned!"
On the contrary, the customer should be listened to most at¬
tentively. He will give some rough (often, literally rough)
indications of his feelings, and this is usually enough. It may
be unnecessary to go to great trouble and expense for a more
refined indication. When a system is rapidly evolving, im¬
provements in it can usually be evaluated informally; to in¬
terrupt development long enough to measure user satisfaction
with some intermediate state of the system can cost more
than it is worth. For internal guidance of day-to-day work,
therefore, accurate estimates of user satisfaction can be
sacrificed in favor of the intelligent judgment of a technical
administrator or system engineer. When the system has
reached a relatively stable configuration and evaluation is
considered worthwhile (perhaps in response to outside pres¬
sures), then a full-scale user inquisition may be appropriate.
By this time, however, the value of the system will generally
be obvious to everyone already.

In conclusion, informal feedback from users can probably
provide sufficient guidance for most decisions during the
developmental stage, and the elaborate efforts that are re¬
quired in order to obtain better estimates are not worth their
cost. The recommendation, therefore, is to keep open the
informal channels of communication with the users — but how
this can best be accomplished is impossible to specify in
detail at this stage in the evolution of the Intrex experiments.

The only general recommendations would be:


That some of the system developers should also
be system users, sharing as far as possible the
same research interests as other users. Not
only are such "planted" users good feedback
channels about user satisfaction; they can also
do much to make the system more accessible
and attractive to other users.
That a "user's remarks file", of the kind currently
provided in the MAC system, can give the user of
a computerized system an easy and convenient chan¬
nel, especially for small details that might other¬
wise not be reported; it probably is well worth
the cost.

George A. Miller

APPENDIX D

INTERACTION LANGUAGES

An essential ingredient for the success of a computer-based
information transfer system is a good language for communication
between man and machine. At present, we interact
with the library through a librarian who understands at least
the basic English that we speak and, depending on his subject
specialties, may understand more or less of our technical
jargon. If the specification of our desires is vague, or if he
does not understand what we’ve said, he asks questions until
the nature of our needs becomes clear. Then he helps guide
us to a proper source of information. We would like a similar
flexibility in our computer-based system. Simple tasks should
be quickly and easily specified. More-complex demands might
be specified through a dialogue with the system. During a
search, we would like continued interaction with the computer
in as natural a language as possible. This is not to imply
that we wish to dispense with reference librarians completely,
of course, but they should be used only for the very complex
tasks for which they have been trained, and which are beyond
the scope of the system programs.

One mode of interaction with the computer-based library
should be a form of what Don Swanson has called "programmed
interrogation". In this mode of interaction, the user can ask
a question which not only requires an answer from the com¬
puter but forces later questions to be taken in the context of
this previous interaction. In fact, the answer to a question
from a user may be a question (or series of questions) from
the machine, attempting to get more information to specify
more fully the user’s request. Alternatively, after a certain
question from the user, only certain other questions may be
reasonable as follow-ups from the response of the computer.
This kind of tree of actions may be shown explicitly to the
user on, say, a scope display; in fact, this kind of display
may be used to allow the user to specify (using a light pen)
the next alternative to be taken. An information retrieval
scheme, with this kind of control, has been designed and
implemented at MITRE by E. Bennett and others. A context-
dependent dialogue system, called MENTOR, has been
constructed by Feurzeig and Bobrow at Bolt Beranek and
Newman.
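The "tree of actions" described above can be sketched in a few lines. The following is an invented illustration in a modern programming language, not a description of the MITRE or MENTOR systems: each node holds a question from the machine, and the user's reply selects the next node, so that later questions are taken in the context of the earlier interaction.

```python
# Hypothetical dialogue tree for programmed interrogation. Each entry
# maps a node name to (question shown to the user, reply -> next node).
# The node names and questions are invented for illustration.
TREE = {
    "start":   ("Search by author or subject?",
                {"author": "author", "subject": "subject"}),
    "author":  ("Which author?", {}),
    "subject": ("Which subject heading?", {}),
}

def question_at(node):
    """The question the machine asks at this point in the dialogue."""
    return TREE[node][0]

def next_node(current, reply):
    """Return the node implied by the user's reply, or None at a leaf."""
    _question, branches = TREE[current]
    return branches.get(reply)
```

A scope display would present the branches of the current node, and a light-pen selection would simply name the reply passed to next_node.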

An important function of the computer in this computer-based
information transfer system will be to provide aid to the
user in discovering how he can formulate requests to and
commands for the system. One way of doing this is by pro¬
viding a simple teaching-machine program to use with a
programmed instruction text, with very flexible paths through
this text. One can compare experimentally this type of
instruction with ordinary user manuals that are provided
externally to the system. In this programmed-instruction
situation, the help from the system should gradually diminish
as the user learns more and more, and eventually he should
be able to talk to the computer in a very concise fashion
indeed. Of course, he should always have the option of going
back into the longer mode and requesting help when he for¬
gets the conventions of abbreviation of the short commands.
Similarly, the output from the computer should become more
abbreviated as the user becomes more skilled; but he should
always have the option of saying, "What?" and getting the
unabbreviated form of that message.

One might think that, to make communication with the computer
easy, it would be best to be able to talk to it in English.
This possibly could be done, provided we restricted the kind
of English input we allowed; but I feel that the user will
probably feel more at home with a well-designed, problem-
oriented language designed to express the kind of procedures
he is likely to want to perform in utilizing the library system.
Again, this is subject to experimentation and analysis of
effectiveness and cost.

As mentioned previously, it will be imperative for the user
who learns the system to be able to talk to it in a more-concise
fashion than the novice user. For this reason, it is
probably a good idea to have the command language embedded
in what is called, in computer jargon, a "macrolanguage".
By this we mean that the user will be able to specify a single
string — perhaps with some arguments — which will stand
for a whole sequence of operations in the original command
language. In this way, he will be able to abbreviate the
designation of sequences of operations that he performs
quite often. These macros should then become part of the
language for this user, and he ought to be able to create new
macros (abbreviations) which stand for sequences of opera¬
tions and which include macros previously defined as well as
basic operations in the system. There is no need for these
strings, which are defining new operations, to be shorter
than the original command string. If the user wishes to
create a more mnemonic (to him) command, or to talk in a
sort of pseudo-English, it ought to be possible for him to
create any string to be used to give a command (or sequence
of commands) to the machine.
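The macro scheme just described reduces to a simple recursive expansion. The following sketch, in a modern programming language with invented names and commands, shows the essential behavior: a user-defined string stands for a sequence of operations, and those sequences may themselves contain previously defined macros as well as basic system operations.

```python
# Sketch of macro expansion for a command language. MACROS maps a
# user-chosen name to the sequence of commands it abbreviates; the
# names and commands here are invented for illustration.
MACROS = {}

def define(name, commands):
    """Let `name` stand for the given sequence of commands."""
    MACROS[name] = list(commands)

def expand(command):
    """Recursively expand a command into the basic operations it stands for."""
    if command not in MACROS:
        return [command]          # a basic operation of the system
    result = []
    for sub in MACROS[command]:
        result.extend(expand(sub))
    return result
```

Nothing requires the defined name to be shorter than what it stands for: a user may define a mnemonic or pseudo-English name for a single basic operation just as easily as a terse abbreviation for a long sequence.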

We have discussed here, briefly, several aspects of interac¬
tion languages. First, as implied by the name, the language
should not be something that is accepted passively, but some¬
thing with which the user can interact strongly with the
machine. The machine should be very forgiving of mistakes,
and should request additional information it needs if the user
forgets to supply it. Secondly, there should be an ability to
abbreviate and to form a higher-level language built on the
command language originally given to the user. The way
suggested to do this was to embed the command language in
a macrolanguage for the system. Finally, programmed
instruction and feedback should be an integral part of the
system.
D.G. Bobrow

APPENDIX E

ECONOMICS, LIBRARIES AND PROJECT INTREX

Mildly stimulated by discussions at the Conference, I should
like to advance a slightly different view of economics in relation
to libraries and information handling.

There is a critical need for a sophisticated analysis of the
economic values of information — broadly defined — to society.
The methodologies for such an economic analysis are clearly
difficult, and the problem may be beyond the capability of
contemporary economic theories and techniques. But some
very able economists should consider the problem. I would
call your attention to the study by Theodore Schultz for the
Ford Foundation, The Economic Values of Education. This
is not the conventional level-of-education vs lifetime-earnings
study but, rather, the economic impact upon society of dif¬
ferent levels of educational input. A study related to informa¬
tion would probably be more difficult, but at least two questions
of a fairly fundamental nature now lack answers.

What is likely to be the over-all effect upon society,
measured by GNP, or whatever other measures are
suitable, of increasing or diminishing the support
of "informational activities"? (We hasten to recog¬
nize that there are cultural and other values that
are critically associated with "informational and
educational activities" that are not susceptible to
quantitative or economic measurements, but this
ought not to be used as an argument against a so¬
phisticated analysis of what might be measurable.)
A second dimension of such an economic analysis
would be directed to areas where input is currently
deficient, and to areas where further input would
have maximum output benefits.

For example, no one now knows how much a university library
is "worth" to its institution or to society; no one knows whether
some of the input in a university library, if shifted from the
purchase of materials to the better analysis of the acquired
materials, would produce a greater level of "benefits" or not;
and no one knows what the impact on a university would be of
doubling or tripling the library budget. While it can be argued
that no one knows what the university is "worth" to society
either, I still submit that at least the feasibility of studying
the economic values of information needs some critical exami¬
nation. The study would pick up, in a sense, where Machlup's
study, The Production and Distribution of Knowledge in the
United States, stops.

At quite a different level, I would like to argue that Project
Intrex should enlist the services of an imaginative economist
at a very early stage. There are at least two major reasons
for such action. First, the adoption of various technical paths
and systems designs will be critically dependent upon sound,
economic analysis. Secondly, the application of sound tech¬
niques of economic measurement in the kinds of changes that
are ahead in the library information field need to be better
understood than is currently the case.

If the librarians need to know more than they do about computer
technology (this seems moderately reasonable), and computer
engineers need to know more about libraries (this is obvious!),
both need to know a lot more than they do about economic mea¬
surement in the making of long-range choices in the informa¬
tion field.

H.H. Fussler

APPENDIX F
THE ROLE OF GRAPHICS IN INFORMATION TRANSFER

BASIS FOR EXPERIMENTATION

The information center, an interface between the user and
the library collection, must perform a great variety of functions,
ranging from acquisition, through cataloging, to presentation
of hopefully relevant information. In this capacity,
it is evident that effective interfacing requires the storage,
manipulation, display and dissemination of graphic material.

By "graphics" is meant either the original text or reproduction
images of text, including alpha-numerics, pictorials,
charts, formulas and drawings. (Coded representations of
alpha-numerics are not termed graphics because spatial for¬
mat has been discarded or modified, and the reconstituted
characters are generally not accurate images of the origi¬
nal text.)

Bibliographic catalogs have largely comprised alpha-numeric
summaries of content. In some cases (Chemical Abstracts,
for instance), a limited use of graphics has been made, but
the difficulty of machine storage and manipulation of graphic
information has precluded its widespread inclusion in cata¬
loging operations.

The primary use of graphic reproduction techniques has been
in dissemination of copies of full texts and in microform storage
of source material. The storage function is a passive use
of imaging techniques. Dissemination of hard copies (paper
or film) represents an active use of graphic information trans¬
fer. Between these two extremes lie many possibilities for
active use of graphics in the operations of information centers.
This new area is the principal subject of this appendix.

The use of graphic images in information transfer systems is
based on broader premises than space compaction or archival
retention of valuable texts. Graphic images, often in micro¬
form, offer novel approaches to selective dissemination, cata¬
logs of informative extracts, establishment of personal infor¬
mation files, more rapid access techniques, etc. These
concepts are discussed in greater detail in later sections.

It is an understatement to say that microform technology has
not been entirely satisfactory in past library operations. Just
mention microfilm to a librarian and you’re in for a colorful
discussion. Why is this so? Has the manufacturer failed to
do his job properly? Or has the librarian failed to state what
is needed? Or has he misused existing equipment? There may
be basis for each of these points of view, but it would serve
little purpose to fix the blame here. Clearly, it can be stated
that technology exists to solve in great part the problems of
the information center. But it is a mistake to leave the job
entirely to the manufacturer. Faced with a great variety of
manufacturing alternatives, it is not likely that he will opti¬
mize equipment for purely library applications without guid¬
ance and standardization from the library community.

Project Intrex offers a unique environment for experimental
determination of optimum information-handling techniques.
The experiments outlined in the last section of this appendix
are intended as a guide to both new procedures and develop¬
ment of optimum technologies.

GRAPHIC STORAGE AND TRANSMISSION

Some discussion of storage and transmission techniques is required
in order to evaluate the extent of graphic information
transfer that may be feasible in library operations over the
next 10 years. Recognizing the computer's capabilities to
store, manipulate and route information, it is tempting to con¬
sider future storage and transfer of all graphic information
via computer systems. And this may be possible. But, for
now, let's consider the alternatives.

There are three basic types of image storage. First, there
is conventional photography in which a reduced-size image is
stored on a two-dimensional recording medium. Here the
technology is well established and developed. With present
equipment, linear reductions of up to several hundred times
are possible, although not without problems from complex
optics, dust, dirt, etc. A more normal reduction ratio is
from 20 to 60 times (linear). To provide a meaningful evalu¬
ation of these reductions, it is comfortably possible to provide
microfilm images of 3" x 5" LC cards at a storage density
of 200 cards per linear foot of 16-mm film. Hence, a very
extensive library catalog could be stored in the space of a desk,
providing access to the catalog cards of several million docu¬
ments. Existing file equipment and associated reader-printers
are available for such applications.
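The arithmetic behind the claim above is easily verified. The sketch below assumes a 100-foot reel, a common length for 16-mm film (an assumption for illustration; the text gives only the per-foot density): at 200 card images per linear foot, one reel holds 20,000 cards, and a desk-sized file of 150 reels covers three million documents.

```python
# Storage-density arithmetic for the 16-mm microfilm catalog.
CARDS_PER_FOOT = 200            # density stated in the text
REEL_FEET = 100                 # assumed reel length for illustration

cards_per_reel = CARDS_PER_FOOT * REEL_FEET
reels_for_3_million = 3_000_000 // cards_per_reel
print(cards_per_reel, reels_for_3_million)   # 20000 cards per reel, 150 reels
```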

The second class of graphic image storage involves linear scanning
of the document page and storage (usually magnetic) of the
analog signal representing luminance variations across the page.
This is the basis of the video-file system. In this technique,
the image of a document page is stored on magnetic tape, usu¬
ally in a space of about l/3n along 2n wide tape. Disc-file
storage may also be employed. The images can be stored, re¬
arranged if desired, modified, displayed on cathode ray tubes,
or printed on to paper copy or film. The capacity of the sys¬
tem is limited by the tape supply system or the disc files
available. If high-resolution images are required, consider¬
able storage space is used and special display tubes employed.
Although in its infancy and presently expensive, video storage
offers a highly flexible medium for rapid access to graphic
images. Its primary applications in the future may be in situ¬
ations where the ability to erase and add images is of prime
importance.

The third class of image storage involves a point-by-point scan
of the document and subsequent digital encoding of the luminance
information at each point in the matrix. If a 100 x 100 micron
scanning cell is used to record the document, an 8-1/2" x 11"
page provides about 6 x 10^6 picture elements. If we require
at least eight levels of luminance information to be discrimi¬
nated, a total of about 18 x 10^6 bits of information is required
to store an 8-1/2" x 11" page. Assuming optimum codes, sig¬
nal compaction by elimination of redundancy, etc., we might
envision a reduction in storage requirement to about one mil¬
lion bits per page. The 100-micron scanning cell which was
assumed should provide adequate resolution for most of the
library collection. Looking forward to computer memories
of 10^12 bits, about a million document pages could be accommodated
in digital form as outlined above.
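The figures in this paragraph can be recomputed directly. The sketch below takes the stated 100-micron cell, three bits per cell (eight luminance levels), and the nominal compaction to one million bits per page; all three inputs come from the text.

```python
# Digital page-storage arithmetic: 8-1/2" x 11" page, 100-micron cells.
CELL_MM = 0.1                                   # 100-micron scanning cell
page_cells = round(8.5 * 25.4 / CELL_MM) * round(11 * 25.4 / CELL_MM)
raw_bits = page_cells * 3                       # 8 luminance levels -> 3 bits
compacted_bits = 1_000_000                      # after redundancy removal (text)
pages_in_1e12_memory = 10**12 // compacted_bits
print(page_cells, raw_bits, pages_in_1e12_memory)
```

The cell count comes out near 6 x 10^6, the raw bit count near 18 x 10^6, and a 10^12-bit memory holds about a million compacted pages, matching the text.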

Even greater storage capability is possible if we permit a
combination of storage forms in which alpha-numerics are
stored as recognized 6-bit characters and pictorial material
is stored by the point-by-point encoding described above.
There is a strong possibility that optimum future transmis¬
sion of graphic information will involve digital encoding (pulse-
code modulation). Storage in a binary mode will facilitate the
interface between the image store and the transmission channel.

And so we are provided with a variety of storage possibilities
for the future — some available immediately and some as strong
possibilities for future information transfer. What does all this
mean to Intrex? First of all, there should be an awareness
that mass storage and transmission of graphics is feasible for
the future. Although digital storage is not now widely available,
it should be recognized as a strong contender for future systems.

A primary objective of the Intrex experiments should be to define
a complete system of information transfer; the optimum
form of graphic information handling depends on the compo¬
nent system requirements. The selection of a particular
graphic storage technique for the Intrex experiments should
be viewed only as an experimental expedient to evaluate alter¬
native uses of graphics in the information transfer complex,
keeping in mind the storage and transmission possibilities that
may be offered by future technologies.

In contemplating various experiments, a detailed analysis of
the economic factors associated with various systems of stor¬
age and transmission is needed. There is little doubt that
microfilm offers the most economic means of microstorage
with currently available equipment for input, access, presen¬
tation, and generation of full-size paper prints. For the Intrex
experiments, evaluation of the possible uses of graphics in
information transfer probably will be made via microfilm sys¬
tems as adjuncts to computer consoles, transmission devices,
and paper-copy machines. A major contribution that can be
made by Intrex would be the specification of improved viewers
and printers for library uses of microforms.

If a large number of remote users is envisioned for the future
MIT library system, a careful study should be made of the
economics of transmitting graphic images over phone, micro-
wave and video channels. The cost factors will largely influ¬
ence possible decisions to create decentralized files of micro¬
images, both as stores of document images and as new cata¬
log forms.

Finally, strong consideration should be given to storage of at
least a small subset of high-activity documents in coded computer
form. A scanning and encoding apparatus would seem
to be a reasonable investment for the Intrex experiments. Its
use could provide computer storage of an important portion of
the library acquisitions and would permit evaluation of the
ability to display graphic images at remote consoles in very
short access times.

NEW USES OF GRAPHICS IN INFORMATION TRANSFER

Microform Card Catalogs

Assuming that alpha-numeric catalog data are available within
the augmented catalog experiment, it would be possible to generate
microfilm files of catalog data for distribution to a large
number of remote users. Microfilm records of catalog entries
can be produced by photography of computer output. The re¬
mote files of film (in rolls, strips, or fiche) would be stored
in a suitable reader-printer for consultation by users at the
remote site. As mentioned earlier, many millions of entries
could be stored in a small space.

The microform catalog could serve a number of users who
either do not have access to the computer console or who wish
to browse through the catalog. It may also be an augmentation
of the computer-held catalog, storing information on a large
portion of associated library collections which cannot be accom¬
modated by the computer system.

Since the film strips are generated by filming of computer
display, the catalog is updated by computer merging of new
data within the old collection and periodic generation of new
microfilm catalogs. In the interim, a separate file of new
acquisition images would be maintained. (It may even be of
some value to have a separate file of new items prior to the
periodic merging.)

Informative Graphic Extracts

Computer-aided searching of alpha-numeric catalog data is
certain to be an important facility of future library operations.
The user, after consultation with the computer system, will
be provided with accession numbers to documents of possible
relevance. Clearly, we must aim toward maximizing rele¬
vancy of recall and minimizing traffic in full documents; this
optimization procedure may benefit from the inclusion of
graphic extracts of documents as a step in deciding to recall
any or all of the documents suggested by the computer system.

Assume the augmented catalog search has furnished the user
with several dozen suggestions. Consider the possible advantage
in having at hand a file of microimages of graphic extracts
for each document. These extracts might comprise images of
the first pages of journal articles, tables of contents of books,
or even book jackets, pictures of the book itself, sample pages
from the book, important graphs, pictures, formulas, or crit¬
ical reviews. The user could consult these images on a viewer-printer
adjunct of the remote console before he decides to recall
any of the full documents from the central file. It is quite likely
that these informative extracts themselves might satisfy many
of the user's needs.

It is suggested that these graphic extracts be prepared by
microphotography of selected portions of the text. In future systems,
in which digital encoding of full text is possible, the graphic ex¬
tracts might be computer-generated and photographed from CRT
display of the collage of excerpts. To keep intellectual work to
a minimum, the author (or an early editor) should be required
to outline the important parts of his text that should be included
in the graphic extract file.

Microform Dissemination to Personal Files

There is real need for simple, inexpensive copy devices to provide
the student or other library users with facsimiles of library
materials. Full-size Xerox copies have greatly aided this im¬
portant library function.

An alternative copy form, which has been tried without much
success, is microfilm. Cameras have been expensive, and
viewers out of economic reach of the student and most other
library users. Even so, we still recognize the attractiveness
of the concept of personal files of compact, inexpensive micro¬
film copies.

How do we reduce the camera costs and complexities and provide
images that can be comfortably viewed in inexpensive and
small devices? I suggest the following approach. Cameras
and viewers become complex and expensive when we insist on
high reduction ratios. But, for personal files, why not a new
look at microimages of very modest (but significant) reduction
ratios? If we think of linear reductions of only six or seven
times, we still provide area compaction of from 36 to 49 times
and compact storage of a large number of document images is
possible. The equipment simplicities that result are impressive.
At low reduction ratios, we can view aerial images rather than
screen images. Viewers much like the current hand viewers
for 2" x 2" slides would be used. Binocular viewers providing
five or six times magnification are currently available at very
low cost — and they even include battery-powered, self-contained
light sources. An 8-1/2" by 11" page can be imaged at 6.3
times reduction on a 35-mm frame of film. The cost per frame
might be a cent or less.
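The reduction-ratio figures above check out directly, as the short computation below shows: a 6.3-times linear reduction leaves an 8-1/2" x 11" page as an image about 34.3 mm by 44.3 mm, which fits across 35-mm-wide film, and the area compaction is the square of the linear reduction, about 40 times.

```python
# Reduction-ratio arithmetic for the low-reduction personal-file scheme.
reduction = 6.3
width_mm = 8.5 * 25.4 / reduction    # page width after reduction
length_mm = 11.0 * 25.4 / reduction  # page length after reduction
area_compaction = reduction ** 2     # linear reduction squared
print(round(width_mm, 1), round(length_mm, 1), round(area_compaction))
# prints: 34.3 44.3 40
```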

Technology exists to produce both the simple cameras (with
cartridge loading, in-camera processing, etc.) and inexpensive
viewers. If Intrex experiments indicate acceptability and de¬
sirability of this concept, industry might find this a worth-while
venture for library and other applications.

Selective Dissemination of Microform Information

The availability of computer-stored catalog information permits
selective dissemination of current acquisition data to both
the specialized branch library and the individual. Again, com¬
puter output could be microfilmed and distributed to the remote
microform catalogs discussed earlier.

Another interesting possibility is the selective dissemination
of informative graphic extracts; prepared by microphotography
of sections of new books and journals, they could be dissemi¬
nated according to user-interest profiles. The number of jour¬
nals crossing our desks is becoming excessive; it seems quite
desirable to receive only the graphic digests of interest to us.
Profiles are determined from a statement of interests by each
user, by analysis of his past library use, or by study of his
own writings (both published and unpublished).
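
The profile idea might be sketched as follows (an illustrative modern sketch, not a design from the report; the tokenization rule and the matching threshold are arbitrary assumptions):

```python
from collections import Counter

def build_profile(texts, min_count=2):
    """Derive a crude interest profile from a user's own writings:
    the set of words that recur across his texts (hypothetical rule)."""
    counts = Counter(w.lower() for t in texts for w in t.split())
    return {w for w, c in counts.items() if c >= min_count}

def should_disseminate(profile, extract, threshold=2):
    """Send a graphic extract when enough profile terms appear in it."""
    return len(profile & {w.lower() for w in extract.split()}) >= threshold

# Profile derived from two (invented) writings by one user.
profile = build_profile([
    "microfilm storage of microfilm images",
    "storage cost of storage media",
])
```

A real profile would also weight terms and discard common words; the sketch shows only the selection step.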

EXPERIMENTS AND STUDIES SUGGESTED FOR INTREX

The following suggestions are in a random order. They
represent a broad evaluation of the preceding thoughts on graphics in
information transfer; no priority of adoption is suggested.

Study of User Network Requirements

Early in the Intrex program, a detailed study of the user
network should be made. The numbers of users, remote-console
locations, switching requirements, transmission-load estimates,
response times, and many other design criteria will largely
determine image storage and transmission forms. The econom¬
ics of information transfer to the remote users may strongly
argue for remote files of microform material.

Microfilm Catalogs of Bibliographic Data

For the body of information selected for the augmented catalog
experiment, microfilm files of catalog data could be prepared
as adjuncts to the computer consoles, as well as for consulta¬
tion in areas where there is no access to computer catalogs.
The microfilm catalog (stored in microstrip, microfiche, or
roll form in available readers) might also contain catalog in¬
formation beyond that stored in the computer system. The
microfilm catalog could even be a catalog of authors’ abstracts
which are too extensive for computer storage.

The microform catalog at the user console is not to be
considered an alternative for the computer catalog, but should be
viewed as an adjunct facility. The experiments should deter¬
mine optimum content of the microform catalogs.

Remote Files of Informative Graphic Extracts

The microform catalogs of the experiment above would
comprise alpha-numeric information which is arranged, displayed
and filmed by computer equipment. The aim of this third ex¬
periment is to evaluate the utility of microfilm dissemination
of graphic extracts both as a "finding key" in determining rel¬
evancy and as a form of selective dissemination.

The possibilities for material to be included in graphic extracts
are many; a major contribution of Intrex could be the
determination of the optimum content for classes of material and usage.
Should we film book jackets, the chapter headings, the most
important graphs and pictures, or simply random swatches
through the work? Could these extract files satisfy much of
the browsing requirements? Could selective dissemination
of graphic extracts remove the need for circulation of journals?

Answers to these questions require preparation of graphic
extracts of a reasonable variety for both journals and books. The
initial generation of these extracts will be a sizable job; still,
it appears justified in view of the potential advantages, and
especially in view of the future possibility of computer genera¬
tion of graphic extracts from the central digital image store.

Evaluation of Low-Reduction Microimages

A possible experimental facility within the Intrex center might
be a copying service using 6X or 7X reductions on 35-mm film.
The film images, after processing, could be mailed to the re¬
quester for his personal file. With only slight modification,
viewers now available in the $10 price range could be sold or
loaned to a number of graduate students or faculty members
to evaluate the concept of personal microfilm files. It may
be possible to provide SDI in this form to a selected user group
for evaluation. (Considering market potentials for low-reduction
copying devices, industrial cooperation in preparing equipment
for filming and processing devices might be obtained.)

Experiments on Console Designs

For the display and printing of graphic information at the
remote site, it is not sufficient to suggest the development of
special and improved reader-printers for microforms. Rather,
the entire user console should be carefully engineered for a
much wider variety of uses which may include in an integral unit:

Typewriter input-output for remote computer.
Video display of computer output and/or document
images.
Filing and search facilities for microfilms.
Hard-copy printing from video transmission and
from microfilm file.
Audio input-output facilities to the central computer.
Microfilm camera for recording user-generated
hard-copy.
Optical viewer for microfilms.
Transmission and display of handwriting on special
tablets (e.g., RAND tablet), connecting all remote
consoles.
Microfilm generation from video display.
Files of audio recordings stored in easy-load
cartridges.
A re-usable recording plate which could record the
selected video or microfilm image at full size, be
taken to a soft chair in the reading room, and later
replaced in the console for re-recording. This would
provide more comfortable viewing of graphic images
than when one is restricted to a stationary reader
screen; it would be essentially free of cost because
of re-usability; and it would allow others to use the
console during the off-site reading period.
Xerographic images with unfixed (not heated) toner
particles may provide this facility.

Computer Storage of Digital Images

To evaluate the ability to provide computer storage,
manipulation, and rapid access to images of documents, an
experiment is suggested in which a high-activity portion of the
library is stored in binary form. Perhaps we could conceive
of a computer-stored "three-day shelf", regularly updated
with new acquisitions and stored in the computer by digitally
encoding signals furnished by optical-scanning equipment.

If not the "three-day shelf" collection, then certainly some
often-consulted portion of the holdings should be computer-
stored for evaluation of this promising technique.
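
The memo does not say how the optically scanned pages would be digitally encoded; as one hedged illustration, a scan line of black-and-white pixels can be stored compactly by run-length encoding, since a text page is mostly white:

```python
def rle_encode(pixels):
    """Encode a scan line of 0/1 pixels as (value, run_length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Recover the original scan line from its run-length pairs."""
    return [v for v, n in runs for _ in range(n)]

line = [0, 0, 0, 1, 1, 0, 0, 0, 0]   # mostly white, as on a text page
encoded = rle_encode(line)            # [(0, 3), (1, 2), (0, 4)]
```

The encoding is lossless: decoding the pairs reproduces the scan line exactly.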

The above-mentioned possible experiments are not a complete
list of what Intrex might undertake in this area. From these
suggestions, undoubtedly others will occur. Hopefully, Intrex
will find this an important area for investigation.

J.L. Simonds

APPENDIX G
DATA ARCHIVES AND LIBRARIES

THE BASIC PROBLEM

Libraries today contain vast amounts of tabulated data of which
the census volumes may be thought of as an example. The
present mode of data storage — large printed tables — has
drawbacks that can be overcome in a computerized library.

The present technology of data storage is largely based on
three past inventions: The Questionnaire, Unit Record
Tabulating Equipment, and The Book. All modern societies have
found it essential to accumulate information on forms with
fixed formats on which people enter information about them¬
selves, their business transactions, their life histories, ca¬
reers, illnesses, skills, attitudes, etc. The resulting hor¬
rendous accumulation of paper became manageable with the
invention of the Hollerith Machine, which permitted forms to
be transcribed, record by record, into a positional notation
on a card; counts then could be made of the frequency distri¬
bution of those positions. However, IBM cards can also ac¬
cumulate mountainously. The most convenient way of sum¬
marizing them proved to be the publication of two- and three-
way cross-tabulation tables in large volumes. These turned
out to answer many of the most frequently asked questions.
One can look up the population of each city by age or by sex
or by race, or, indeed, by some small number of combinations
of these. One can look up value added by manufacture, by in¬
dustry, by state. One can look up tides by location on the
coast, by season. Libraries are full of such data.

This mode of data preservation, however, suffers from a
number of defects. In the first place, one can only answer
questions about combinations of variables that happen to be
chosen for printing by the data-book publisher. For example,
suppose you wish to ascertain whether educated Negroes have
a hard time obtaining their first jobs. You could presumably
get a fair indication if you could find the statistics for the
number of educated, employed, and unemployed persons by race and
age. In fact, you probably wouldn't find such a table. You
would probably find tables on employment by race, employment
by age, and race by age, but not the combination. Furthermore,
if you did by some chance find the combination, you would prob¬
ably then wonder whether it was accounted for by regional dis¬
tribution in the population of Negroes and whites; you would,
therefore, want the same table by locality, which you would
almost certainly not find. All you could then do, if you wished
to pursue the inquiry, would be to launch a new research project,
seeking the precise data you wanted by some sort of small-
scale sample survey, or else you could ask the Census Bureau
to run you a special set of tables. Since either of these proce¬
dures is costly, data producers are under constant pressure to
report more detailed and elaborate cross-tabulations in the
standard tables. The libraries get fuller and fuller of thick,
densely printed volumes, which, however, can never report
more than a small percentage of the possible cross-tabulations.
(Note that the present Indian census is being reported in 1500
volumes. Given the excellence of this census, its uniqueness
in developing countries, and the widespread interest at MIT in
India, the MIT library certainly ought to acquire this collec¬
tion. But if India is publishing 1500 volumes now, what does
the future hold for 115 countries in 20 years?)
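
The point that published margins cannot answer a joint question can be made concrete with two hypothetical tables (invented numbers; rows might be employed/unemployed, columns two population groups):

```python
# Two hypothetical 2x2 cross-tabulations with different joint cells.
table_a = [[30, 20],
           [10, 40]]
table_b = [[20, 30],
           [20, 30]]

def margins(table):
    """The row and column totals — all that separate one-way
    published tables would let a reader see."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals
```

The two tables have identical margins but quite different joint cells, which is why a reader who finds only the separate one-way tables cannot settle the combined question.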

A second defect of the present system of data storage is that
the data from different sources are not decomposed into
comparable, small units. Votes are reported by election district,
census information by census tract or town or county; economic
information by industry; market research by metropolitan dis¬
trict or market area; weather by weather station, etc. All
sorts of approximations have to be used to link data by space
unit or by time unit. If one wishes to correlate agricultural
production by rainfall, one may find it difficult to link the re¬
porting units and thus be forced to undertake an expensive,
fresh, sample study.

A third defect of the present system of data storage is that,
even when data are collected from the same reporting units,
it is not possible to link them between subsequent observations
over time. Most of the people who replied to the 1950 census
also replied to the 1960 census. By looking at the income-
distribution figures in the two censuses, we can measure how
much the median income has grown. We cannot, however,
say how much the average person's income grew because
nowhere is the same person compared from 1950 to 1960. No
tag links them, individual by individual. It could conceivably
be (though it is not the case) that most people's incomes fell.
Thus, one of the weaknesses of published, aggregated statis¬
tics is the impossibility of updating the underlying data in a
longitudinal fashion.

AVAILABLE SOLUTIONS

The technology available to us now permits the solution of
some, but not all, of these problems. Fast computers with
bulk memories can store the raw data (the equivalent of the
original IBM cards) and permit the user to create at will what¬
ever tabulation he wishes. This may, under some circum¬
stances, be an economy, and is certainly an economy if it
saves the user from undertaking a fresh research project. It
will probably never be an economy, however, to abolish
census volumes or other volumes of basic statistics. It is
probably quicker to look up the population of the 50 states in the
World Almanac than to ask a computer to recompute them. I
do not intend in this brief memo to attempt to explore the al¬
gorithms for determining when it is more economical to print
out frequently used results and store them on sheets of paper
or books; when it is cheaper to compute frequently used re¬
sults and store them in ready-to-print output files of a com¬
puter; and when it is cheaper to keep the raw data untabulated
and permit the rapid calculation capability of the computer to
be used to produce the desired results. It is sufficient to note
that the direction of technological development is probably such
as to make it increasingly economical to leave raw data untabu¬
lated until the consumer wants it and then to permit the compu¬
tation of results, rather than to store the results once computed.
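
The "leave the raw data untabulated" alternative amounts to computing any requested table on demand from the unit records; a minimal modern sketch (the field names and records are hypothetical):

```python
from collections import Counter

def tabulate(records, *fields):
    """Compute any cross-tabulation on demand from raw unit records
    (the equivalent of the original IBM cards)."""
    return Counter(tuple(rec[f] for f in fields) for rec in records)

# Hypothetical unit records.
records = [
    {"state": "MA", "age": "20-29", "employed": "yes"},
    {"state": "MA", "age": "30-39", "employed": "no"},
    {"state": "NY", "age": "20-29", "employed": "yes"},
]

by_state = tabulate(records, "state")
by_age_employment = tabulate(records, "age", "employed")
```

Any combination of variables — not just those the data-book publisher chose — can be requested at will.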

If raw data are stored, it will often facilitate the comparison
of results drawn from different studies, since basic units are
more likely to be translatable into one another than are ag¬
gregated units. For example, if voting districts and local
government units do not coincide, their overlaps can be better
estimated if the voting results are reported by precinct rather
than by ward.

Furthermore, longitudinal data records can be maintained if
it is possible to link the identity of the original respondent or
data supplier as he responds at different points in time. An
example of why this is important can be drawn from recent
studies by the Department of Labor. Using Social Security
records (which are longitudinal by individual), the Department
ascertained for a sample of workers in the construction indus¬
try what percentage had been unemployed at some time in a
particular season. Needless to say, this was a very much
larger number than the previous recorded unemployment
statistics in the construction industry which were based upon
static surveys of the number of unemployed at any given time.
Week by week, the figure might remain fairly stable at (let us
say) 5%; but, over a season, the number of people who had been
unemployed might be larger by a factor. The logic of this ob¬
servation is obvious. Its significance is that, until now, such
an estimate of the number of people affected by the experience
of unemployment (as distinct from the number of unemployed
at a moment) had not existed.
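
The construction-industry observation can be restated numerically (a sketch with invented figures; the logic, not the numbers, is from the text):

```python
# Hypothetical longitudinal records: which workers (by an arbitrary
# Social Security-style identifier) were unemployed in each week.
weekly_unemployed = [{"a"}, {"b"}, {"c"}, {"a"}, {"d"}]
workforce_size = 20

# Static view: the rate measured in any single week.
weekly_rates = [len(week) / workforce_size for week in weekly_unemployed]

# Longitudinal view: everyone touched by unemployment over the season.
ever_unemployed = set().union(*weekly_unemployed)
season_share = len(ever_unemployed) / workforce_size
```

Here every weekly survey reports 5 per cent unemployed, yet a fifth of the workforce experienced unemployment at some time in the season; only records linked by individual reveal the larger figure.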

It will presumably never be possible for any data archive (other
than that of the original collecting agency) to have certain
kinds of data records by individual or by individual firm. The
restriction is created by the requirement for privacy. The
U.S. Census Bureau, for example, will not release to libraries
the responses of individuals with an identifying code on them
that would permit the addition of those individuals' new
responses 10 years later — or, at least, the Census won't do that
for named individuals. Given this limitation there are, however,
two partial solutions, namely, the recording of
unidentifiable samples of individuals and the recording of aggregate
data in units sufficiently small to be useful but sufficiently
large to protect the individuals. The one-in-the-thousand
census tape is an illustration of the former solution. It con¬
tains no exact local information. It is, therefore, substan¬
tially impossible to identify any individual in the United States
from his replies. There is only one chance in a thousand that
any given person has data about himself on the tape in the first
place, and one could never guess who that individual is if one
doesn't know where he lives — no matter how much information
the tape gives about his family size, occupation, etc. The one-
in-a-thousand census tape does contain an arbitrary serial num¬
ber for each respondent. If this were preserved by the Census
Bureau, a very valuable longitudinal analysis could be done at
the next census. However, often one is particularly interested
in localizing data. One wants to study Boston, not people in
the Northeast. The best solution under these circumstances
is to report the smallest unit that will disguise individuals,
such as, for example, the block or the census tract.
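
The two safeguards — sampling and the stripping of locating detail — can be sketched together (a toy illustration; a real release would sample randomly rather than take every thousandth record, and the field names are invented):

```python
def anonymized_sample(records, rate=1000, drop=("name", "address", "town")):
    """Keep one record in `rate`, strip locating fields, and attach an
    arbitrary serial number, in the spirit of the one-in-a-thousand tape."""
    sample = []
    for i, record in enumerate(records):
        if i % rate == 0:
            clean = {k: v for k, v in record.items() if k not in drop}
            clean["serial"] = len(sample)
            sample.append(clean)
    return sample

# 10,000 hypothetical respondent records.
respondents = [{"name": "p%d" % i, "town": "T", "occupation": "x"}
               for i in range(10000)]
released = anonymized_sample(respondents)
```

The preserved serial number carries no identifying information but would allow the collecting agency to link the same respondent at the next census.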

The value of such small-unit reporting may be illustrated by
reference to voting statistics in Italy. One of the commonest
observations about Italian politics since the war has been that
of the extreme stability of the Communist vote, which has
varied by only a few percentage points. The conclusion of
stability is quite wrong, however, except as applied to the
total system. It turns out that the Communist vote oscillates
wildly when you look at voting statistics, commune by com¬
mune; it is not rare for the Communist party to get 10% of
the vote in a town in one election and 60% in the next. It
follows that, instead of the Communist voters' being a stable
hard core as the national figure would suggest, they are in
fact a rapidly fluctuating group. Ideally, we would like to be
able to examine voting statistics by individuals, but, because
of considerations of privacy, that is obviously impossible in
any democratic society. But, by getting down to a small
enough unit, we have made a much better estimate of indivi¬
dual behavior than we would have if we only looked at the
larger aggregations.
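
The Italian voting point — a stable aggregate concealing volatile subunits — in miniature (invented commune figures; equal-sized communes are assumed for simplicity):

```python
# Hypothetical Communist vote shares (percent) by commune.
election_1 = {"A": 10, "B": 60, "C": 30, "D": 20}
election_2 = {"A": 60, "B": 10, "C": 20, "D": 30}

def national_share(by_commune):
    """Aggregate share, assuming equal-sized communes."""
    return sum(by_commune.values()) / len(by_commune)

# Commune-by-commune movement between the two elections.
swings = {c: abs(election_1[c] - election_2[c]) for c in election_1}
```

The national figure is identical in both elections, yet individual communes swing by as much as fifty points — exactly the pattern that only small-unit reporting can expose.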

Since data will continue to be collected on different bases,
and since some data will continue to be reported in
aggregated form, even if in small aggregations, the problem of
matching data from different sources remains. It requires
complicated calculations which are much more feasible on a
computer than they were formerly. Nonetheless, much raw
data exist which are relatively easily comparable. Often,
also, we are interested in working in only a single file, and
so the problem of matching does not arise. All in all, it is
clear that, in the technology of the future, the user of data
will not be satisfied with printed tables but will want access,
so far as possible, to the basic data file.

KINDS OF DATA AVAILABLE

While much of what I have said has been in a social science
context (since that is what is most familiar to me), it should
not be assumed that this is just a social science problem. The
functioning of a modern, technical civilization requires all
kinds of data, some of which are mixed and hard to identify.
What, for example, are traffic statistics? Are they
engineering or social science? Vital statistics, weather statistics —
all present many of the same problems. Let me, however,
list some of the kinds of social science data, with which I am
more familiar, that any library of the future will need to have.

There are increasing sets of data comparing the 115 or so
countries in the world. UNESCO and the United Nations
publish large amounts of such material and, as each country
increases its data-collecting capacity, series of that kind will
grow. The Yale Data Archives has specialized in such col¬
lections. The Banks-Textor Cross Polity Survey, published
by the MIT Press, is an example of a study based on such data
files.

The census provides the largest and probably the most impor¬
tant archive of all. The census publishes not only population
statistics, but also censuses of agriculture, industry, mu¬
nicipal government, etc.

Public-opinion polls and social surveys are increasingly being
stored in data archives. The most important such archive is
the Roper Public Opinion Research Center at Williams College,
which now has a collection of the original IBM cards from ap¬
proximately 4000 surveys conducted in all parts of the world
over the past 30 years. These provide a unique historical
record of the changing attitudes and mores in our time.

Voting statistics, corporate statistics, government fiscal
statistics are other obvious examples of the kinds of data
that library users will want.

Finally, we should note that a university itself generates a
continuing accretion of data such as the students' records and
its library's usage.

IS THIS A LIBRARY FUNCTION?

The storing of basic data in retrievable and manipulable form
is, indeed, a library function. The library is an archive of
that type of information that is of interest to many members
of the university community and that is too bulky or expensive
for each to retain or own. Each member of the faculty owns
some books, but no member of the faculty can afford all the
books he needs. The library provides the economy of shared-
book usage.

If this is a function of the library in the university, then
clearly data archives also belong in the library. Records of
the sort listed in the previous section are of great interest to
large numbers of members of the academic community, yet
are too mammoth and too expensive for any member of the
academic community to obtain them for himself. They there¬
fore belong in the library where all can share them.

Obviously, many data collections are so bulky or so expensive
or so private that not even a university library can hope to
own them. That, however, only suggests that specialization,
division of labor, and linkage among libraries in a total li¬
brary system are necessary in this field, as in other fields.
For example (to get away for the moment from statistical
data), comic books and local daily newspapers are data for
some historians and social scientists; political pamphlets and
posters are data for others; literature in African languages is
of interest to others. Clearly, not every library can acquire
all these things. The Hoover Library at Stanford University
collects radical pamphlets. Some African center may collect
materials in the newly written African languages. Libraries
elsewhere need ways of access to that material. The same
principle applies to statistical collections. There is no point
in trying to duplicate the Roper Center's activities in the
public-opinion poll field; we want to tie in to them, to be able
to obtain that part of their collection that we need at any given time.

IMPLICATIONS FOR MIT

There exists at MIT, in the social sciences in particular, a
considerable interest in this problem of data archives. We
have received a grant from the National Science Foundation
to explore the problem of computer systems for large-scale,
social science data archives, and are developing a small
prototype system to experiment on some of the problems. The
experimental system that we are now developing will include
public-opinion poll data; one-in-a-thousand and city and county
census data; voting data; and life-history data. This effort is
being undertaken with the help and cooperation of Project MAC,
the directors of which have recognized the importance of
moving from a purely computation-oriented system to a data¬
base-oriented system. Significant progress in this direction
may become possible in a year or so.

It is clear that many of the problems to which our social
science project is addressing itself are logically identical to
library problems to which Project Intrex is addressing itself,
and that many of the solutions (by way of computer hardware
and software, and human procedures) are identical. I would
therefore like to urge Project Intrex to consider seriously
the role of the data archive as an integral part of the library
system where, I am convinced, it belongs, and hope that we
can find ways to integrate and coordinate our intersecting
interests.

POSTSCRIPT

As I review what I have written, one significant point seems
to me to be missing. The memorandum is archaic in that it
proceeds on the assumption that a data collection — as distinct
from storage — is unaffected by new developments in data
storage and analysis. That is, of course, not true. Increas-
ingly, the computer will move into earlier phases of the data-
generation activity. For example, there is no reason to
assume that employment statistics will, in the future, be
generated by questionnaires. It will be more efficient and
more accurate to have the computer store in the data file the
information generated by processing the Social Security form
that comes in when a man is newly employed or leaves the
payroll. This point applies to university research too. Vari¬
ous activities in the laboratories and the classrooms of the
university can be linked to the computer system to generate
data automatically. The example that comes closest to home
is that the usage of a computerized library system generates
statistics about it in the system monitor.

Ithiel de Sola Pool

APPENDIX H
GUIDELINES FOR INTREX
CONTENT-ANALYSIS EXPERIMENTS

Intrex must explore the use of machine-readable full text.
One of the methods that has been widely used is automatic
content analysis of text. This field has been the subject of many
investigations, working systems, etc. M. E. Stevens (NBS
Monograph 91, March 1965) cites 662 references related to
automatic indexing and similar areas of content analysis.

What techniques or combinations should Intrex apply? How can
successful content analysis be measured? What degree of
effectiveness might be predicted? How can the extremely heavy
computational requirements be reduced? Clearly, some of
these questions must be answered in the next two to three years.
Intrex experimenters must have available to them the best
analytical tools and must adapt, improve, and extend these.

Before suggesting experimental guidelines, it might be well
to examine some shortcomings of the reported work. Some
analysis schemes that have been proposed have not been im¬
plemented; others, while implemented, have been applied to
samples of text that are now considered too small, too biased
or too specialized.

Since content analysis is likely to be a significant element of
Intrex, it is important to establish worth-while experiments
in this area. Instead of suggesting one experiment, I prefer
to offer some guidelines in the establishment of a computer-
based system and an environment. In this, an "experiment"
might consist of adding a new subroutine or of exercising the
system with an interesting corpus. The following are the
guidelines:

Overall purpose. A selected text is analyzed according to a
specified procedure. As output, some assertion about the
contents of the text is made, and optionally, the information
derived about the text is added to the system. The assertion
is subjected to a relevance evaluation, to give some validation
to the selected analysis process.

Corpora of text. A truly large collection of highly varied texts
must be available to the user. Some characteristics of its
breadth are:

Subject matter — technical and non-technical.
Type — varying from free-running narrative
to succinct, formula-laden material.
Language — English and others that are heavily
represented in recorded knowledge (Russian,
French, German, at least).
Size — varying from titles through paragraphs
to multi-volume treatises.

Some of the text should be that which formed the subject matter
of other well-performed experiments. Use of texts prepared
elsewhere should help to reduce the input costs.

Text storage. All text should be standardized for Intrex use.
If suitable, the current RAND standards could be used.
Questions of character set and physical representation must also
be settled in advance. Transliteration and conversion to
standard should be fully automatic, whenever possible.

Vocabulary files. As the experiments proceed, single and
combined files of words, terms, etc., found by statistical
and/or syntactic analysis, could be stored to minimize re¬
dundant computer effort. Derived information (frequency
counts, syntactic roles, etc.) could also be stored.
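
Maintaining such derived files could be as simple as accumulating word frequencies as texts are analyzed (a modern illustrative sketch; the tokenization rule is an assumption):

```python
from collections import Counter

def update_vocabulary(vocab, text):
    """Fold one text's word frequencies into a cumulative vocabulary
    file so later experiments can reuse rather than recompute them."""
    vocab.update(word.strip(".,;:!?").lower() for word in text.split())
    return vocab

# Frequencies accumulate across successive texts.
vocab = Counter()
update_vocabulary(vocab, "The cat sat.")
update_vocabulary(vocab, "The cat ran.")
```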

Lexicons, microlexicons. The system would have provision
to file and to maintain useful lexicons including language-
language, language alone, technical microlexicons, etc.

Analysis programs. The system initially would have available
a basic set of analysis programs: sentence parsers, state-
of-the-art translation programs, permuted index programs,
statistical "automatic extracting" and "automatic indexing"
programs, classification programs, routines to derive new
classifications by clumping, factor analysis, etc. As Intrex
proceeds, such programs would be modified and new ones added.
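
Of the programs listed, the permuted (KWIC) index is simple enough to sketch directly (an illustrative modern sketch; the stop-word list is an arbitrary assumption):

```python
def kwic_entries(title, stopwords=("a", "an", "the", "of", "in", "and")):
    """One rotated index entry per significant word of a title,
    in the manner of a permuted (KWIC) index."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() not in stopwords:
            # Rotate the title so the keyword leads the entry.
            entries.append(" ".join(words[i:] + words[:i]))
    return sorted(entries)
```

Sorting the rotations yields the familiar alphabetical keyword listing, with each title appearing once per significant word.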

Evaluation system. A collection of well-conceived evaluation
schemes must be available with the system. It should include
both plans to compare the results of competent human docu¬
ment analysis against automatic computer analysis and plans
to compare the results of different machine-analysis schemes
against each other. Wherever possible, such evaluations
should result in quantifiable output (e. g., correlation coefficients).

User interaction. The entire system should be exercisable
by a user in a MAC-like environment. He will select his text
from a "catalog" displayed to him or perhaps he will enter a
new text. The catalog will indicate what (if any) derived files
are available and will give other useful information about the
document (e. g., length, average paragraph length, sentence
length, type/token ratio, references to previous work using
this text, etc.). The user will request certain manipulations,
observe results, feed back additional manipulations, call for
evaluation programs, etc. All this will be recorded in a
master file which collects the use pattern and observed
results of the content-analysis system.
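
The derived quantities mentioned for the catalog entry — length, average sentence length, type/token ratio — are straightforward to compute (a sketch with deliberately naive sentence and word splitting):

```python
def text_stats(text):
    """Catalog-style derived statistics for a stored text."""
    words = text.lower().replace(".", " ").split()
    sentences = max(text.count("."), 1)   # naive sentence detection
    return {
        "tokens": len(words),
        "avg_sentence_length": len(words) / sentences,
        "type_token_ratio": len(set(words)) / len(words),
    }

stats = text_stats("The cat sat. The cat ran.")
```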

Ascher Opler

APPENDIX I

PROPOSED EXPERIMENTS IN BROWSING

The purpose of this note is not to try to add ideas to the im¬
pressive pool formed by the collection of contributions al¬
ready made during the Planning Conference, but, rather, to
propose some experiments. Perhaps one or two of the things
proposed will not qualify even under a broad and liberal in¬
terpretation of "experiments", but I shall try to press, in
formulating the following, in the direction of analyzable
experiments.

THE FOUR-FOLD BROWSERY

In simplest schema, the idea of this experiment is to create
four browseries, to optimize each insofar as possible within
its type, and to compare the four by operating them in parallel,
and judging among them mainly on the basis of their popular¬
ity and browsers’ reports of their attractiveness and effective¬
ness. There is a little more to the experimental plan than
that, as I shall try to develop, but not very much more. The
main addition is that at least some of the instances of brows¬
ing in each system would be turned into case histories, and
an analysis would be made of the patterns of browsing that
are followed within the four contexts.

The first browsery is a browsing room with conventional
shelves and conventional books.

The second browsery contains exactly the same information,
but in different form and arrangement. All the pages of the
books of the first browsery are available as microimages,
retrievable through a mechanized handling system, and the
mechanized handling system is under the control of an elabo¬
rate cataloging, indexing and annotating system that may be
operated from the consoles — the consoles at which the micro¬
images are magnified and displayed — through a digital com¬
puter. The catalog or index and the annotation apparatus are,
of course, coded and computer-processible.

The third browsery is essentially the catalog-index retrieval
part of the second browsery, except that it is set up for
"manual manipulation" — perhaps that phrase is not so re¬
dundant now as it once was, what with all the manipulation
by machine that I have been hearing about lately — of cata¬
log cards, turning of pages in index volumes, and the like.
The idea, here, is to optimize browsing under two heavy
constraints: first, that the browser go to the library in order
to explore, second, that his exploration go no deeper than the
apparatus of bibliographic control. If he wants to get an actual
book, he has to use the regular library services to get it.

The fourth browsery is the same as the third, insofar as
available information is concerned, except that it might include,
also, the annotations mentioned in connection with browsery
Number 2. The main thing about the fourth browsery is that
it is available wherever there are suitable consoles. The
information base (the apparatus of bibliographic* control and
the annotations) is encoded for computer processing. The
single heavy constraint for this fourth browsery is the same
as the second of the two heavy constraints for browsery Num¬
ber 3, that the browser's depth of penetration is limited, that
he cannot break through into the contents of documents but
has to detour around, has to Mwalk over to the library" for
access to the contents of a book.

The four browseries just described do not fall precisely into the four categories of a two-by-two table, but I think they permit one to make some interesting comparisons. Comparing the first two, one pits all the conveniences of handling actual books, together with the disadvantage of working within a fixed, linear array and a single hierarchical classification scheme, against the bother of working with microimages, together with the advantage of working through a highly and flexibly organized catalog-index system, which may be explored with the aid of strategies programmed by the user or put together by the user with the aid of a special "browsing language" that brings pre-programmed components of strategy to his aid.

Comparing the third and fourth browseries, one pits the new look against the old look in an arena in which the new look can actually handle more of a total operation, rather than — as is usually the case — less. In particular, it will be possible to implement a variety of dynamic annotation schemes when the entire activity is carried out through consoles and a digital computer, whereas card-catalog and index-book browsing would become chaotic in a hurry if everyone wrote in the margins. Perhaps I am making a mistaken assumption here, but I think it is safe to say, at the very least, that it is not necessary, in this comparison, to handicap the computer-and-console approach.

Comparing the first two browseries with the second pair, one
sets into clear relief the factor of depth of penetration. In the
first two, the browser can delve as deeply as he likes and drift
out of browsing into consecutive reading or deliberate search.
In the second pair of browseries, the browser has to leave the
premises in order to move down into the realm of content.

Comparing the first and third browseries against the second and fourth, one sets the conventional manual-visual approach against one based on newer technology. In the case of full and complete browsing, the conventional approach is based on collections of shelved books. In the case of browsing constrained to the level of the apparatus of bibliographic control, the browser has to take a rack of catalog cards over to the table, thumb through them, go over to an index volume, take it off the shelf, and so on. A picture of the full, unconstrained browsery based on microimage representation of full content and computer-processible bibliographic information is given in the paper on "The Microbrowsery", which I wrote after completing the first few pages of this memorandum. The new-look browsery constrained to the level of the apparatus of bibliographic control represents a very conservative approach to the "telebrowsery", about which George Miller has done some thinking. Note that, in the telebrowsery part of this proposed experiment, the bodies of documents are not assumed to be available in digitally coded, computer-processible form.

One of the most important considerations in setting up an experiment of this kind, of course, is the selection of the field or area or scope. I think it might be best to focus upon new accessions in one field or in a few fields in which there is great activity at MIT. It might be a good idea to associate standard reference works with the new accessions. It might be a good idea to introduce, also, documents that are significantly related to the new accessions, regardless of their age and current level of activity. I shall not go farther, here, in trying to specify the nature of the collection because the selection of fields should depend mainly upon the interests of the various departments, laboratories, and faculty members concerned.

(John Burchard suggested that the experimental browseries would have to contain a wider scope of documents than proposed in the foregoing — that few people would go to browse in such restricted browseries. I agree with that criticism, and hereby modify the proposal to include more material. Let there be a working group, within Project Intrex, on the selection of scope and documents for the browseries.)

In formulating the foregoing — if that is not too formal a word to use for dictation out of the top of one's head — I tried to limit my demands upon technology to what technology could, indeed, provide if it were pressed forcefully and with considerable funding. The particular point in which I think I pressed close to the bounds of tolerance is the preparation of browsery Number 2. For that browsery, I assumed, first, a conversion of all the documents — and that amounts to something if the documents are new accessions and therefore always changing — to microimage, and a cataloging and/or indexing that is both rapid and deep. Both those operations pose a considerable requirement for personnel, equipment and money. I should argue, however, that the conversion to microimage and the cataloging/indexing ought to go on, anyway, and browsery Number 2 might give them the right kind of pressure and the right kind of point of application. I may have pressed fairly hard, in connection with browsery Number 4, when I assumed that users could browse through the apparatus of bibliographic control and the annotations while sitting at consoles remote from the laboratory. There, however, the pressure is wholly in the realm of economics and not in other aspects of console technology. The trouble is that browsery Number 4 would not be much good if the browsers had to walk some distance to a community console, only to find it already occupied.

In the foregoing paragraphs, I have tried to suggest four things that might be compared. But what comparisons, actually, could be made? Probably the simplest measures are those that deal with the amount of browsing done and that do not enquire more deeply into its satisfyingness or its effectiveness. Records should be kept of the number of browsers, the number of browsings per browser, the amounts of time spent in browsing, the number of items (bibliographic items, documents, etc., the records being separated into appropriate classes) dealt with by the browsers, and the like.
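
The gross measures listed above amount to a small record schema, one record per browsing session. A minimal sketch in modern terms (the field names and class name are my own invention, not part of the proposal):

```python
from dataclasses import dataclass, field

# Hypothetical record of one browsing session, holding the gross
# measures proposed in the text: who browsed, where, for how long,
# and how many items of each class were dealt with.
@dataclass
class BrowsingSession:
    browser_id: str
    browsery: str
    minutes: float
    items_by_class: dict = field(default_factory=dict)

    def items_total(self) -> int:
        """Total number of items dealt with, across all classes."""
        return sum(self.items_by_class.values())

s = BrowsingSession("b01", "conventional", 45.0,
                    {"bibliographic": 4, "document": 2})
print(s.items_total())  # 6
```

Aggregating such records per browsery would yield directly the counts the text asks for: browsers, browsings per browser, and time spent.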

If reliance is to be placed upon truly gross measures, then it is essential to keep a sharp eye out for "unfair practices", such as the serving of tea in one browsery and not another, or the provision of soft chairs or good facilities for doing homework. However, I think that the more-detailed records would keep anyone from drawing a false conclusion. Perhaps it should suffice to say that a careful monitoring of the four browseries from an umpire's point of view would be essential.

In addition to the simple keeping of records, there should be some time-motion analysis and some follow-up interviewing. I think it would be desirable to construct flow charts of typical browsings in each of the four browseries. It would be useful to have reports from users, indicating what they found, if anything, and where the browsing led. It would be interesting, for that matter, to know why the browsers came to browse and whether their experiences were soul-satisfying and, if so, in what ways. I suppose that the problem, in all this sort of thing, is not to pester the subjects and not to bias them. Perhaps I should say that I rather share George Miller's suspicion (in another appendix) of user reactions and user preferences, but I think that they are particularly important in this area, and I think that they will be the more helpful as "I liked it" and "I give it a grade of 3 on your 7-point scale" are shunned in favor of data for the preparation of flow charts and case histories.

An interval of five days elapsed between completion of the foregoing and the writing of the following. During that interval, one of the function-oriented groups discussed this general problem area and arrived at a working plan based on three browseries: a conventional browsery, a microbrowsery, and a telebrowsery. In a way, those three correspond to the first, second, and fourth browseries proposed in this appendix. I think I should say why I thought it a good idea to include a fourth browsery, browsery Number 3 of this proposal — the conventional browsery limited to the level of the apparatus of bibliographic control. The reason for that was that it did not seem likely to me that browsery Number 4, the telebrowsery, could be implemented in a sufficiently full and unconstrained way — that is, with access through an on-line console to a computer-processible store of digitally encoded text of sufficient scope and volume — in time to permit Project Intrex to carry out experiments before 1970. As I indicated earlier, I thought that I was pressing the tolerance of the technology fairly hard in assuming that it would be possible to set up a microbrowsery on that time scale. Now, I wanted to work a telebrowsery into the comparison, and I did not want to make a non-feasible assumption, so I made a more conservative assumption (browsery Number 4) and gave it a "control group" (browsery Number 3).

To recapitulate the previous paragraph, the reason for browsery Number 3, the constrained conventional one, is that I think there should be a control browsery against which to compare browsery Number 4 (the telebrowsery), and I do not think that it will be possible, on the Project Intrex time scale, to make browsery Number 4 coordinate with (in the sense of being of full scope and content) browseries Number 1 and Number 2.

Let me introduce, here, an afterthought: it involves an escalation of the contents of browseries three and four. I think it will be possible to get a considerable amount of text, and also some graphics, into computer-processible form by, say, 1968 — which would leave two years for the experiments. I propose, therefore, that the catalog-index-retrieval-relevance information that I have been calling (after Verner Clapp) the "apparatus of bibliographic control" be supplemented with as much of the richest browsing material, full text and graphs, but not pictures, as possible. I think that this would leave the telebrowsery, and therefore also its control, the constrained manual browsery, far short of browseries one and two, but I think it would make the experiment more meaningful. The fact is, I think it would be frustrating and unproductive for most people — for most purposes — to browse through catalogs and indexes, and I am afraid the only value to be gained from browseries three and four, as originally described, would be proof that that expectation is correct. Therefore, make browsery four as full and complete a browsery as possible under the constraints set by the difficulty of converting the contents of documents into digitally encoded, computer-processible form. Then match browsery three to it to serve as a control. Finally, if it should prove possible to go quite a long way toward making the telebrowsery a real browsery, then it would be worthwhile raising the question whether or not to expand the experiment to include six categories: the four described initially plus an as-full-as-possible telebrowsery and its manual-visual control.

A TEST OF THREE BROWSING STRATEGIES

This proposal concerns a minor experiment that I think may have some theoretical interest. It concerns browsing with the aim of finding something of value in connection with one's work. The "expected value" of a "browse" depends both upon the values of the things that one can find and upon the probabilities of finding them. The values of the things that one can find depend upon their novelty to the finder and, in addition, upon their novelty in the field in which he is working. Different assessments of this situation apparently lead to different strategies for browsing. There is probably a continuum of these strategies, but let us consider only three.
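
The expected-value notion above can be put in elementary notation (a modern gloss, not part of the original memorandum): if the i-th findable item has value v_i to the browser and probability p_i of being found in a given browse, then

```latex
E[\text{browse}] \;=\; \sum_{i} p_i \, v_i ,
```

with each v_i discounted for lack of novelty to the finder and to his field. The three strategies that follow differ precisely in their assumptions about how p_i and v_i trade off with distance from the browser's home territory.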

The first strategy is based on the general supposition that the browser knows far less than he does not know and that the other workers in his field are in approximately the same boat. The thing to do, therefore, is to browse in areas that are close to home, areas that are likely to be rich in relevance. That maximizes the probability of finding things, and the assumption is that the value of the things found will not be greatly attenuated by the browser's (or his co-workers' or competitors') already knowing them, because they probably would not already know them.

The second strategy is based on the assumption that anything truly close to home is likely to be known already and that anything truly far from home is likely to be irrelevant. That leads to a strategy of browsing in intermediate areas. One avoids the most popular books, the journals all his colleagues read, the bookstores closest to the campus. He avoids, also, fields that do not have any evident connection with his own and cultures that are so different from his as to have little likelihood of contributing thoughts that are capable of resonating.

The third strategy is based on the assumption that there are
so many scientists and scholars in the world that almost
every thought worth thinking, every device worth inventing,
every theory worth testing, has been thought or invented or
tested. That leads to a kind of desperation and to a search
of the most distant, most unlikely areas in the hope that
whatever is found there that has any relevance at all will at
least have some probability of novelty and some possibility
of revolutionizing the field to which the browser aspires to
contribute.

The three assessments just described — perhaps described a bit too schematically — are appropriate, I think, to three different sets of people. It is being too pat, I realize, to say that the three sets are undergraduate students, graduate students, and faculty members — but perhaps a considerable amount of schematization is forgivable in this kind of enterprise. In any event, I envisage an experiment in which three kinds of browseries are set up and browsed in by three groups of browsers. The browseries are all conventional books-and-shelves browseries. The three groups of people are all in the same field — physics or electrical engineering or mechanical engineering (I do not have any particular preference just what). The first browsery is a collection of very standard reference works on the subject. The second one is a collection of peripheral but related documents. The third is a collection of documents of distant relevance — perhaps best, analogical relevance.

It would be an interesting intellectual exercise, I think, to select the documents for the browseries. I do not have in mind very large collections. I think that a working group might be able to select the documents in a few weeks. What I do not have in mind is any way that the selection process could be prevented from holding the key to the outcome of the experiment.

The experiment would be based on a simple, neat design in which each group of browsers spends n browsing sessions in each of the three browseries. A browsing session might be one hour in length. The browsers would be browsing, in compliance with the requirements of the experiment, to find things of value to them and their work. They would not be merely following their own inclinations.

Records would be kept of the courses or trajectories of the browsers. Perhaps the browsers would be required to take notes on their "finds". At the end of each session, each browser would be "debriefed" and would make an evaluation of his various discoveries. The whole thing would be carried out in a counterbalanced order, and the browsers' evaluations, perhaps supplemented by the evaluations of a jury (which would be given descriptions of the finds and explanations of why the finds seemed important to the browsers, but no information about which browsery was being used), would be taken quite literally in the scoring of the experiment. The two "results" of the experiment would be (1) a set of tables showing the values of the three browseries to each of the three groups of browsers, and (2) a set of descriptions or graphs or flow diagrams of the courses followed by the browsers in the three browseries.
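
The counterbalanced order mentioned above can be illustrated with a small sketch (a modern rendering for illustration, not part of the 1965 proposal): a 3 × 3 Latin square gives each group a different order of visiting the three browseries, so that each browsery occupies each session position exactly once across the groups.

```python
# Illustrative counterbalancing: a 3x3 Latin square built by cyclic
# rotation. The browsery and group names are hypothetical.
BROWSERIES = ["standard", "peripheral", "distant"]

def latin_square_orders(items):
    """Each row is one group's session order; each item appears in
    each session position exactly once across the rows."""
    n = len(items)
    return [[items[(row + pos) % n] for pos in range(n)] for row in range(n)]

orders = latin_square_orders(BROWSERIES)
for group, order in zip(["undergraduates", "graduates", "faculty"], orders):
    print(group, order)
```

With such an assignment, order effects (fatigue, practice) are balanced across browseries rather than confounded with them, which is what "counterbalanced" buys the experimenter.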

I shall not try to say why I think the foregoing is an interesting experiment, however odd-ball. I shall simply see whether it seems interesting to anyone else. In Washington, the expression is, "Let's run it up the flagpole and see whether anyone salutes."

THE TAILOR-MADE BROWSERY

This proposal is for another minor experiment. Perhaps it is more an exploration than an experiment. The idea is to select a number of subjects and to prepare, for each subject, several special browseries. Of necessity, these would be very small browseries. I think that even a very small browsery can be interesting: I find that small personal libraries of some of my friends, particularly friends whose fields of interest overlap mine but are not coextensive with mine, make very good browsing, indeed.

Limiting the size of the browseries makes it not too much trouble to prepare special browseries for individual subjects. Perhaps the best way to proceed would be to select, for each subject, two or three "browsery designers" who know the subject well. (By "subject", here, I mean a person who serves as a subject in an experiment, not a subject-matter field.) The browsery designers would design, for their particular acquaintances, several experimental browseries. Each browsery would correspond to a describable or definable hypothesis about what makes for good browsing. Then would come the collection of the documents and the arrangement of the documents in a set of shelves. (Alternatively, of course, all this could be done with the aid of avant-garde technology, but I view this as a preliminary experiment and think it might just as well be done with ordinary books and reports.)

The experiments would be run off by having each subject browse some small number of times, perhaps an hour each time, in each of the browseries designed for him. As in the previous experiment, records would be kept of what he did, and he would keep records of what he found that was of interest to him, and, at the end of the session, he would describe his experiences to the experimenters.

The outcome of the experiment would be, in part, an evaluation of the various hypotheses on the basis of which the special browseries were designed. In part, the outcome would be a collection of descriptions of behavior and reaction in browsing. I suspect that the latter kind of result would be of considerable interest, even though it might not lend itself to neat analysis or interpretation. In a field like browsing, in which everyone has some opinions but no one has any data, it is sometimes a good idea just to do some careful observing and see what transpires. Incidentally, I do not know that no one has any data about browsing.*

The question remains, what about the hypotheses about browseries? It seems to me that that is something for a brainstorming session, but perhaps I can suggest a few.

Let the first hypothesis be that, if the browser will name m people he respects or admires in his field, and each of those people will list p books that he considers indispensable, then the set of books so named, which may of course be less than m times p in number, would constitute a good browsing collection.
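
The point that the collection may hold fewer than m times p books can be made concrete with a trivial sketch (the titles are hypothetical, a modern illustration only): overlapping nominations collapse when the lists are merged.

```python
# Hypothetical illustration: m = 3 nominators, p = 3 books each.
# Because the lists overlap, the union holds fewer than m * p = 9 titles.
lists = [
    {"Book A", "Book B", "Book C"},
    {"Book B", "Book C", "Book D"},
    {"Book C", "Book D", "Book E"},
]
collection = set().union(*lists)
print(len(collection))  # 5 distinct titles, not 9
```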

Let hypothesis two be a variant of hypothesis one. Instead of having the book-namers list "indispensable" books, let them list books that they consider interesting and valuable in presenting new or unconventional approaches, new ways of handling problems in the field in question. Perhaps several different hypotheses could be generated by following the foregoing line of thought.

To take another line, let hypothesis three be generated by starting with the citations contained in the publications of the subject — or, if the person in question has not published, then the reference book lists for his courses in his field of specialization. The references cited in those references (or, if there are too many, the ones most frequently cited or the major ones cited) would then constitute the set of books to be collected (or sampled) in setting up the browsery. Perhaps it would be a good idea to exclude from that set the original list of references. This third browsery might be called a "citation-index browsery".
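
The selection rule in hypothesis three amounts to one step of citation expansion: gather the references of the subject's references, rank them by how often they are cited, and drop the starting list. A minimal sketch (the citation graph here is invented for illustration; it is not a real index):

```python
from collections import Counter

# Hypothetical citation graph: work -> list of works it cites.
cites = {
    "subject-paper": ["R1", "R2"],
    "R1": ["X", "Y"],
    "R2": ["Y", "Z"],
}

def citation_browsery(start, cites):
    """References-of-references, most frequently cited first,
    excluding the subject's original reference list."""
    first_level = cites.get(start, [])
    counts = Counter(r for ref in first_level for r in cites.get(ref, []))
    return [w for w, _ in counts.most_common() if w not in first_level]

print(citation_browsery("subject-paper", cites))
```

Here "Y", cited by both first-level references, comes first; this frequency ordering implements the memorandum's "ones most frequently cited" refinement.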

Let hypothesis four be based upon a list of books prepared by the subject, himself. Suppose that the subject makes a list of the q books that are most familiar to him. He need not do this from memory; he can use his own library or his institution's library. Then someone else, a person selected for sensitivity and appreciation of browsing as an art, plays a "controlled association" game with the subject's list, pairing with each book on the list another book that seems to him to be similar or related. The set of books thus selected makes up an experimental browsery. If there are not enough books, then several people can contribute paired associates, and the browsery collection can be assembled from the union of the suggestions.

* It turned out that Bill Locke had already initiated some library research, and that there are, indeed, some data on browsing — but not many.

That is probably enough hypotheses to convey the idea. I do not want to do much more than that, for I do not have any great conviction that this kind of experiment will prove viable in the forthcoming competition.

THE BROWSER'S DIARY

This proposal will probably seem fairly far-fetched. From one point of view, what it deals with may be seen as an experiment. From another point of view, it is seen as just a way of collecting information about what people do when they browse. Either way, there is the flaw — which may or may not be a severe flaw — that the browsers would be browsing under orders to browse, and not browsing just for pleasure or out of interest.

The proposal is that a number of undergraduates, say 30 or 40, be employed to browse one hour a day and keep good records of what they do. The group of subjects would be carefully selected on the basis of grades, test scores, areas of concentration, and other relevant factors — selected in such a way that they could be divided into several equated groups. Each experimental group would be assigned to browse for an entire school year in one or another of the available libraries. These might or might not include experimental browseries or other libraries developed by Project Intrex.

Perhaps I should be explicit on the point of controls. One or more of the groups would be paid to do something else quite orthogonal to browsing, something having nothing to do with libraries or studying in any way.

At the beginning of the year and again at the end of the year,
the subjects would be interviewed and tested to determine
profiles of their interests, aspirations and personalities.

In searching for an "experimental result", one would look for ways in which the interest structures of the students were altered by their browsing experiences. One might hope that an hour a day in a library for an entire school year might have a fairly profound and measurable effect. If not, not too much would have been lost, and almost surely some good would have been done, however covertly.

The main product of the enterprise, however, would be the collection of diaries. I think it might be interesting for a researcher with a particular kind of bent to try to make sense out of the diary data. He would approach the task, I think, by formulating and collecting hypotheses about browsing and trying to structure the field. Then, with his a priori formulation of browsing to go on, he would read the diaries. In the process, he would modify his formulation, altering some of his hypotheses and adding new ones suggested by the diaries. Perhaps he might stop this process after having read a random half of the diaries.

At that point, he might be able to define conditions or courses of action or positions within the domain of browsing that he or a jury of readers could recognize in the remaining half of the diaries, and he might be able to make predictions, on the basis of various hypotheses, about what should ensue under those conditions, or in the completion of those courses of action, or from those positions. Perhaps different hypotheses would predict different things. He might then use the second half of the diaries as a test base for comparing the various hypotheses or for checking out his formulation. All this is very unstructured, I know, and I certainly do not volunteer to do the analysis. Nor, for that matter, do I give this proposal a very high evaluation. But, as you will have gathered if you have read this far, the process of proposing these experiments is a kind of browsing in itself. I have been letting one thing suggest another, and the fairly miserable outcome of the enterprise is leading me to do what most browsers do when their meanderings through the bookshelves prove unrewarding.

J.C.R. Licklider

APPENDIX J
THE MOTIVATIONS OF AUTHORS -
INTELLECTUAL PROPERTY AND THE COMPUTER

The following is not written from a publisher's point of view. Rather, it tries to summarize and rationalize the observations and experiences I collected in dealing with hundreds of scientific authors and editors in the United States and abroad.

These remarks seem to be significant because the evolving computer technology is apt to upset the "balance of nature" in the subtle area of the relationship between the author and his brainchild. Proper precautions have to be taken so that the author of the future will not lose the incentives and urges which motivated him in the past, and so that the "memory" will not die of starvation or undernourishment.

There are many, often conflicting, motives which lead a scientist to publish.

The author of a scientific paper of a research nature is mainly driven by the desire to communicate his findings to his colleagues. His wish is to spread information, to identify the findings with his name and/or with that of his institution. He expects credit (by proper quotation) in his own field, and credit from his institution (academic or industrial). He wants as wide a distribution as possible (publication in a journal with many readers, a maximum number of reprints). Financial expectations are nil. He will even pay a page charge for having his work published.

The author of a book or of a review article, a contributor to a book, etc., is driven by a more complex set of urges which in turn lead to much closer identification with the end-product. He wants to extend his teaching realm to an audience larger than his classroom (e.g., a textbook). He wants to educate his peers or bridge borderline fields by identification and synthesis of the work of others (e.g., monographs). In any case, he wants by his writing to establish his authoritative knowledge of the field to the outside world. He wants to receive for himself (and sometimes for his estate) an income from the fruits of his labor. He wants distribution for all these reasons.

The author of a book is concerned not only about the content but (to a high degree) about the format of his presentation: he worries about the elegance of his style, about type faces, color of paper, design of jacket, etc. He wants the publisher to advertise the book (knowing that book advertising is of minor importance for sales). He wants as many review copies sent out as possible. He wants to watch the acceptance of his writing, and wants carefully to nurse a second edition; he resents help if he is unable to perform this task himself.

Last, not least, he wants to be an author and see his name on the title page or in a contributor list.

In these circumstances, the dangers of shared use of stored information are obvious. Attempts have to be made to safeguard:

Proper identification of the output with the author of the input.

Proper remuneration of the author of the input for shared information of creative and time-consuming ad hoc and extracurricular texts.

Prevention of manipulation of an author's input without his permission, and of its incorporation into another author's output without at least citation.

But the greatest difficulty seems to me to lie in the fact that memory storage is only potential publishing or distribution. Without hard copy and its distribution through marketing devices, the stored information will only be read on demand, and only by the users of or subscribers to the network.

Methods of making users aware of the input of the storage will have to be developed (we do not scan the telephone book to find out whether we have a friend in the city). Nonsubscribers and people abroad — 50% of our market for advanced scientific books — have to be put in a position to receive hard copies. (The author would want this — not only the publisher.) Last, not least, print-outs have to be improved in typography and aesthetic appeal to fulfill the dreams of authors, even the scientific authors.

Having seen, over the last 40 years, the untiring resistance of scientific authors to even the slightest deviation from standard style (e.g., cold-type composition, loose-leaf books, coded publications), and the slow inroads publishers could make into changing deep-seated habits, prejudices and idiosyncrasies related to extremely minor style questions, I am extremely doubtful that the scientific author will respond to a deep-going modification of his present privileges, prerogatives and "profits" as fast as computer technology will develop methods for taking care of his product.

I am afraid that the incentives to write, to be published, or to be distributed will be substantially diminished or stifled if the creator of the "new system" does not use a psychologist who offers means for building author satisfaction into the system.

I did not deal with the function of the editor, who has to act
as a filter for the input.

E. S. Proskauer

APPENDIX K

PROJECT INTREX AND MICROPHOTOGRAPHY

INTRODUCTION

This memorandum proceeds on the assumption that the improvement of information transfer will require attention to the document store and not just to the keys that lead to it. The practical design of a desirable information transfer complex would, I think, incorporate a more formalized network of information activity, including authorship at one extreme and the individual user at the other, with vastly improved coordination. In this network, practical considerations suggest information handling in forms other than paper. These forms will be all manner of things: photographic images in analog and digital form, magnetic storage in various forms, hybrid systems incorporating both of these forms, possibly crystal storage, and all kinds of storage materials based on electrostatic and thermoplastic principles.

The advantages of microphotography include the attribute of combining, in a simple manner, alpha-numeric as well as pictorial and other graphic information. The full document in microform can be handled mechanically, which is the single most important property of this technique. Automatic, rapid access to any given document or section of such a document is technically a simple matter. Microphotography takes many different forms, ranging from the punched-card-mounted film (the so-called aperture card), and film chips and film sheets, to the continuous roll. Each basic form can be readily modified to meet the demands of particular applications. Microphotography lends itself particularly well to the concept of the non-circulating (even completely non-lending) library, since output from a microstore of information can take many different forms: screen image, microimage duplicate, temporary intermediate plates, and full-size paper copy. Properly designed, the collection can remain complete at all times.

Used as a means of publication, microphotography can achieve
remarkable economy, regardless of the size of the edition.
Indeed, the edition size need not be predetermined. Storage
space is reduced, which means lower building and building-maintenance
costs every year. Storage-space reduction and
inherent publication and reproduction economy reduce the
need for transmitting information from one locale to another;
this is most valuable, considering the cost and technical
limitations of transmission. On the other hand, when transmission
is necessary, the machine-handled film image is easier to
control than the book. Technically, the generation of full-size
paper copies is easier and subsequently more economical if
the input is microfilm than if the copy were made from a paper
original.

Before dealing with specific problems concerning the use of
microphotography in the library, I would like to refer to the
matter of the continuous evolution of the information network.
A constant state of change of the network is desirable. It is
not, after all, as if we had reached a certain point in time
when traditional methods are to be discarded, to give way in
10 years to a new technique which will then continue indefinitely
to another point in time. But it is possibly a bad byproduct
of the current methods that they are rigid, that they would
tend to retard evolution. Flexibility, in the evolutionary sense,
is an important attribute of any new information system.

MICROPHOTOGRAPHY SYSTEMS

The question I would like to examine at this point is this:
Why has the potential of microphotography not been realized
in the library, and why have some applications of microfilm
to library functions resulted in so much adverse criticism?

In some instances, microfilm has been a very successful
tool in libraries, and its use as an alternative for loan has
gradually increased. However, compared to its potential,
microfilm has not been successful in the library. It is vital
to recognize the reasons for this in order to determine whether
experimentation by Intrex is justified and, if so, what the
nature of the experiments should be and, finally, whether such
experiments represent a last-ditch stand for microfilm or
the beginning of true experimentation. Let me attempt to
answer these questions.

The primary reason for the lack of proper utilization of
microphotography in libraries is that the basic technique is
absolutely dependent on systems design, and no proper systems
were designed for it.

It has been pointed out that libraries were established on the
basis of an economy of need, so that those who could not
afford to buy a book could at least share it. The book was
the beginning, and from it grew personal, academic, private,
municipal and Government libraries. As information increased,
the borrowing of books had to become more systematic,
and methods for acquisition, circulation, cataloging
and purging were developed, first on a trial-and-error basis,
later in a more scientific manner.

If library science has not succeeded very well in developing
good information transfer systems, this is due partly to poor
economic support for libraries, to a lack of planning of
information networks, and to other reasons, but a primary factor
is the inadequacy of the book itself. (This kind of statement
often requires the proponent of new techniques to enlist the
services of a public relations expert or to be able to duck
quickly, because questioning the adequacy of the book is
only one step below an attack on motherhood.)

Leaving aside their use for pleasure, books as information
devices are really good for nothing but reading. If this
sounds facetious, let us look at some other attributes which
we want an information carrier to have. We want economic
acquisition and storage, convenient cataloging, convenient
and fast access, and availability of each item at all times;
we want to abstract, make extracts, duplicate the information;
we may want to have access to the information from different
locations without loss of time; and sometimes we want to
rearrange the information. While convenience of consultation
is undoubtedly the single most important component of an
information storage and retrieval system, we pay a considerable
price for allowing a book to play the multiple roles of
publication device, storage medium, duplication master, and
output form. The book is bulky and expensive and inconvenient.

One might conclude that the book simply isn't worth it. But a
more proper conclusion is that we should strive for systems
with the advantages of the book at the output end without its
disadvantages in the preceding phases of information transfer.
Microphotography systems, sometimes incorporating computers
and sometimes used in conjunction with computers,
come closer than any other concept to being able to do this
during the next decade.

As a result of further experimentation (improvement in
reading devices, etc.), consultation of microfilm in forms
other than hard copy may become acceptable; but I would
not suggest, at this time, a microphotographic system in
the library that did not offer economic hard-copy output — at
least as an alternative to screen viewing. At best, the process
of consulting information stores on the microfilm-reader
screen (and cathode-ray tubes are much worse) requires
further education of the user. Experience with so-called
"captive" users of screen output has shown that such an
educational period need not be too long (at least for certain
types of information). An example of such use is the entirely
mechanized information store of the Social Security Administration,
where computer and microfilm work together in an
economical, humanly acceptable system.

So far as library applications of microphotography are
concerned, microfilm service has normally meant the acquisition
of roll microfilm, unindexed, and stored in a primitive
manner, without any provision for location of individual pages
or chapters of the document, let alone automatic page selection.
Such microfilm was used with reading devices that
suffered not only from a conspicuous lack of good mechanical
and human engineering, but that were frequently designed
for film in a different form. For example, a 30X reduction
microfilm is offered to the user with a 15X magnification
reader, resulting in a very small, inadequate screen image.
There has been no adequate standardization of the manner
in which images are arranged on film to facilitate retrieval
and bibliographic control. One of the worst aspects of the
library microfilm situation has been that it has not been
subject to quality standards, so that the user often has had
to tolerate severely substandard graphic images.
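The reduction/magnification mismatch described above is simple arithmetic. As a sketch (the 280 mm page width is an illustrative figure, not from the report), the net on-screen size is the original size times magnification divided by reduction:

```python
def displayed_size(original_mm, reduction, magnification):
    """Net on-screen size of a document dimension after filming at
    `reduction` (e.g. 30 for 30X) and viewing at `magnification`."""
    return original_mm * magnification / reduction

# A 280 mm page filmed at 30X but viewed on a 15X reader appears
# at only half its original size; a matched 30X reader restores it.
half = displayed_size(280, 30, 15)   # 140.0 mm
full = displayed_size(280, 30, 30)   # 280.0 mm
```

A matched reader (magnification equal to the filming reduction) is what restores the page to its original size.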

At best, the library user may have been given a reasonably
good roll microfilm or microfiche with more-or-less compatible
viewing equipment in a more-or-less adequate state
of maintenance. Even then, the system (?) had no attributes
that gave the user some obvious advantage, e.g., a complete
collection at hand, quicker access, or an automatic subject
search. He probably did not even have an explanation to the
effect that the alternative was to have nothing at all because
the only paper copy was in the British Museum, or that his
library purchased the film as an economy measure. It is
not surprising that these library ventures met with user
opposition — not to say, resentment. There are explanations
for these short-sighted methods. The library took as a basis
the early uses of microphotography in industry, which were
geared almost entirely to the filming of practically obsolete
records that took up too much space in paper form and that
were infrequently consulted. These systems were complete
with a low-grade film and a low-grade reader. It should
have been obvious that the procedure intended for essentially
dead-record storage could not be directly applied to a system
for consulting complex scientific and technological information.

It is reasonable to ask at this point whether microphotography
in application to any system, for any user, has been adequate.
And the answer would appear to be an emphatic yes, considering
the many commercial areas where microphotography,
designed as the result of a systems study, has satisfied the
needs of a user in free competition with other techniques.
The microfilm industry is said (thoroughly reliable figures
are not available) to be approaching the $500 million mark,
and this is a substantial increase in the last few years.
Microphotographic systems have been successfully used for
records of all types, obsolete and live, because machinery
has been designed which facilitates quick and reliable access
to information of this type — including fully automatic subject
search, address-type retrieval, and simple indexing methods
which assist in locating a given page.

The fastest-growing area during the last few years has been
the storage and retrieval of engineering drawings in microform,
in aperture-card systems. Mail-order houses are
using microfilm extensively and are achieving considerable
publication economy as well as improved access speed to the
information, and catalogs listing parts for general engineering
use are published in microfiche form. The advantages of
publishing historical manuscripts in microform are obvious.
Access to this country's historical materials has been quite
inadequate, and a major program to disseminate such materials
in the form of microfilm is now under way.

Insurance companies, which for many years have converted
their low-use-reference records to microfilm, are now
beginning to convert their current files.

Public libraries have advantageously used microfilm for
book-charging methods, and telephone services have employed
numerous microforms to place a great deal of information
within arm's length of the information operator. Title companies
have accumulated microfilm, normally in fiche form,
for storing their documents. The Bureau of the Census
found a different advantage in microfilm: questionnaires,
after completion by the census takers, were microfilmed
because a convenient method was found to transfer the data
from the film to a computer, speedily and economically.
Microfilm has also served as a convenient computer-output
method through machines which interpret digital information
into alpha-numeric symbols. Sometimes these devices have
combined the computer output data with simultaneously
photographed graphic information for a montage-type microimage.

Many out-of-print books are kept effectively in print by means
of an inexpensive microfilm master from which electrostatic
re-enlargements are made on demand. This would also be an
excellent form of publication for scholarly material which
commands an important but small market.

There is a growing list of journals and other publications
(Chemical Abstracts, in the near future) available in microform
as well as in the paper form, directly from the publisher.
But this list, while growing, is still quite small, particularly
in science and technology.

Current plans provide that technical reports disseminated by
NASA and AEC and other Government agencies will be available
in microfiche form only. I have mixed feelings about
this project. On the one hand, it is an example of a good
balance between "type" of information (namely, reports) and
the microform chosen (namely, the microfiche), and a standard
has been written to assure uniform placement of the
images, uniform size of the entire fiche, uniform reduction
ratio, etc. On the debit side, little attention has been paid
to the storage and retrieval problems of the fiche, and neither
the readers nor the reader-printers for either occasional or
heavy use can be considered quite adequate. It is of interest,
however, to note that, even though a complete system (including
all possible user requirements) was not designed by NASA,
the size of the project has caused manufacturers to introduce
additional equipment quickly.

This lends further support to the theory that standardization
can change the library from a bad market to a respectable one.

In industrial applications, microphotography has grown
substantially in scope as well as in size. In the library, it is
but a slight exaggeration to say there has been no systems
approach to microphotography whatsoever, and that the loose
application of systems components has frequently prejudiced
people against the entire concept. It is not within the scope
of this memorandum to outline all components of a library
microform system, but some basic considerations in such
systems design are the following:

An analysis of the character of the information
to be put into microform, that is, the size of
the unit (catalog card or 400-page book); variety
of different sizes; graphic arts aspects of the
information (color photography, mathematical
equations, etc.); requirements of storage arrangement
(can it be random, or does it have to be
sequential or ordered in some way?); what is the
expected rate of growth of the body of information?
what is the initial volume?

What are the user requirements? Does the material
have to be consulted simultaneously by 10 people,
100 people, 1000? Is remote consultation of the
information a useful, or necessary, part of the
system? What type of output forms are required?

What form of microphotography is best suited to
the particular body of information and the user
requirements (aperture card, roll film, fiche, etc.)?

What standards are needed for quality, permanence,
image size, and orientation?

What photographic materials are most suited to
the particular application? Should the system use
silver film, diazo film, Kalvar film, or all
three? Should the system experiment with
completely new forms of image storage — e.g.,
microxerography or thermoplastics?

What are the economic constraints of the system?

What is the compatibility of this information
store with the rest of the library? With other
mechanized systems in the library?

What type and amount of information should be
in machine-readable form on the microfilm?
Should it be a simple address in digital form,
or a series of machine-readable descriptors so
that the information can be found by description
rather than address?

What reduction ratio should be used?
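The distinction between address-only and descriptor retrieval in the checklist above can be illustrated with a small sketch (all names and data are hypothetical): an address-only store needs an external index to supply a frame address, while machine-readable descriptors allow the store itself to be searched by description.

```python
# Hypothetical microstore: each entry is a film frame carrying a
# document plus (optionally) machine-readable descriptor terms.
store = {
    "reel3/frame117": {"descriptors": {"microfilm", "standards"}, "doc": "Report A"},
    "reel3/frame244": {"descriptors": {"lasers", "microfilm"}, "doc": "Report B"},
}

def by_address(address):
    # Address-only retrieval: some index outside the film must
    # already have translated the query into a frame address.
    return store[address]["doc"]

def by_description(wanted):
    # Descriptor retrieval: the film itself carries searchable terms,
    # so a scan of the store finds every document matching them all.
    return [e["doc"] for e in store.values() if wanted <= e["descriptors"]]
```

In the address-only case the intelligence lives entirely in the external catalog; in the descriptor case the store can answer content queries on its own, at the cost of scanning.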

Apart from systems engineering, to link the character of the
information to the proper form, to the proper material, to
the proper indexing technique, etc., there is, as I have
previously noted, a need for standardization and inter-system
compatibility, so that manufacturers can be encouraged to
see at least a reasonable market for their equipment. (Obviously,
standardization should not become a straitjacket;
but the arbitrary employment of diverse forms, reduction
ratios, retrieval logic, etc., must be stopped.)

As an adjunct of this discussion of systems, it might be useful
to reflect for a moment on the state of the components.
How good are commercially offered components for these
systems? How good are the films, readers, reader-printers,
microfilm cameras, etc.? The answers to these questions
are not simple; and, unfortunately, there is quite a lot of
misleading (and actually incorrect) information in print.
There is a great deal of equipment — cameras, processors,
readers, reader-printers, and retrieval consoles — and
there are mechanically adequate devices as well as poor ones.
Many problems result from incompatibility between these
pieces. Instead of accepting common basic standards and
competing on the basis of good engineering, manufacturers
have considered it good practice to design their equipment
independently of others, except in areas where the Government
took an effective part in standardization (engineering drawings).
As the writer knows from involvement in many microfilm
standardization activities, it is not easy at this point to
write standards.

In general, technology is considerably ahead of marketed
items, and marketed items are even further ahead of the
equipment available in most libraries.

Readers and reader-printers could certainly be improved
substantially. A host of interesting new types of film is
awaiting exploitation.

Project Intrex comes at a time when manufacturers of
information-retrieval equipment are beginning to have the
uneasy feeling that the library may become a market, and
they are tentatively interested and willing to engage in some
additional research and development. They are not likely
to do very much, however, in the continued absence of guidance
from the library about user requirements.

It is probable that many good microsystems will require
materials and equipment produced by several independent
manufacturers, and this is another reason why such systems
should be designed on the basis of objective, unbiased
experiments.

In asserting that there has been no systems approach to fit
microphotography into the library, I am doing an injustice
to one or two projects by industry and by the Council on
Library Resources, but these are exceptions. Some of
these projects might well provide a basis for further
experimentation (for example, Avco's Mechanized Library System).

The preceding discussion was intended to qualify my answers
to questions raised earlier in this paper, and the answers
are as follows:

Microphotography is making considerable headway
in industrial applications.

Microphotography can be of great value in the
library of the future provided systems are
furnished — not just system components.

Most applications of microphotography to the
library in the past failed to grow because of
a striking lack of consideration for total
systems design and human engineering and
therefore provide a poor basis for evaluation
of the technique. With somewhat better components
than were available twenty years ago
and with the precedent of more versatile
industrial application, we are now in a position
to begin genuine experiments to utilize
microphotography properly.

This leads us to the question of meaningful experiments
by Project Intrex.

EXPERIMENTAL OBJECTIVES AND EXPERIMENTS

Throughout the Conference, experiments involving microfilm
have been suggested verbally or in memoranda. Almost
invariably these references have been to "microfilm", without
recognition of its many forms and, I believe, with the
assumption that an experiment in this area begins with systems
readily at hand and requires only a quick evaluation of shelf
items. I have tried to show in the previous section that this
assumption is false. Some suggested experiments, based on
a comparison of different technologies, involve pitfalls, which
are discussed subsequently. Another type of experiment is
concerned with optimizing specific microfilm applications.

In my opinion, the main objective of Intrex experiments in
the area of microphotography should be the design and field
trial of total microsystems which will improve the storage
and retrieval of information, taking due cognizance of the
character of different bodies of information and of all user
requirements, and considering also the economics of such
systems. Every effort should be made to standardize components
for the greatest possible number of different applications,
but the goal — user satisfaction — should not be
compromised. The design of a system would be followed by
the conversion of the chosen documents to a microform, by
field trials, and by a careful analysis of user reaction. The
field trials may incorporate alternative reading devices, or
reader-printers, and the equipment or procedures should be
improved periodically when analysis of user reaction suggests
modification of the system. The need for additional or
improved items can be translated into subprojects for Intrex,
into topics for graduate theses, or into published reports
for the benefit of interested manufacturers.

In the choice of a body of information to be converted to
microform, I tend to favor MIT publications as a good subject.
A group of journals in a specific field would make a
suitable experiment, as would complete collections in highly
specialized areas. Slides and pictures, which are not
normally well organized, would be useful, also. It will be
desirable in some instances to convert the original material
twice — into two alternative forms, each with its set of
equipment — in order to obtain a user expression of preference.
(A departmental reading room would be a good location for
such an experiment.) The system should then be improved
over a period of time on the basis of user suggestions.

It cannot be too strongly emphasized that the aim of the
experiment is to perfect the system through feedback. In
the experiment, the user must enjoy a number of advantages
relative to traditional book use. He should have ready access
to the complete, up-to-date collection. The system should
allow browsing. Output should normally include several
alternatives, including a print output. Subject search might
be part of the system.

It is not at all necessary that an experiment of this type include
many alternative pieces of equipment for comparison.

Apart from the over-all objective of systems design, some
specific questions should be answered by these experiments.

What is the best manner in which computer
interaction with a microfilm store can be
achieved? (I would like to keep in mind
Vannevar Bush's Memex concept of associative
trails.*)

Which types of information benefit from a
film system that physically combines the
documents with descriptors in machine-readable
form? (The Rapid Selector principle.)
Should one combine microimages with
magnetic striping for these descriptors, or
should they be in photodigital form? Or
should automatically handled microphotographic
stores have a machine-readable address only?

Does the current state of technology and economics
recommend very-high-reduction systems?
Experiments might include publication in microform, made
possible by direct microphotocomposition.

It has also been suggested that it might be useful to store
information in a certain microform, at a specific reduction,
but to print out items at a different reduction ratio and in a
different microform. It may be useful, for instance, to
store the material in roll form or in the form of scrolls
(wide ribbon roll film with a number of images across the
width of the roll) at a linear reduction of 200 times, and to
give copies to the user in the form of an aperture card or
a microfiche at a net reduction of 30 times. This would be
done by projection printing from the basic store, and microfilm
would have to borrow equipment from the motion picture
industry (again!).
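The arithmetic behind such a two-reduction scheme can be sketched as follows (the 200X and 30X figures are from the text; the 210 mm page width is an illustrative assumption):

```python
def image_width_mm(original_mm, reduction):
    """Linear size of the microimage of an `original_mm`-wide page
    at the given linear reduction (e.g. 200 for 200X)."""
    return original_mm / reduction

def projection_enlargement(store_reduction, output_reduction):
    """Enlargement factor a projection printer must supply to go from
    a high-reduction store to a lower net output reduction."""
    return store_reduction / output_reduction

# A 210 mm page at 200X occupies about 1.05 mm on the scroll;
# printing out at a net 30X requires roughly a 6.7X enlargement.
scroll_image = image_width_mm(210, 200)
enlarge = projection_enlargement(200, 30)
```

The very small image at 200X is what makes the scroll store compact, and the modest enlargement factor is what keeps projection printing to 30X output practical.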

Documents stored in microform can be enhanced by an index
or by brief descriptive information stored in a computer.
To fit into the system, the computer output also can be
printed out in microform.
* The Atlantic Monthly, v. 176, pp. 101-108 (1945).

The experiments should study the attributes of microphotographic
images to be transmitted, possibly utilizing a buffer
principle based on xerographic Proxi plates.

In considering output forms from a microstore, an effort
should be made to perfect an intermediate step between the
hard-copy output and the screen image. This "intermediate"
type of output envisions the use of a re-usable plate (it could
be a xerographic plate). The plate, on a base of metal or
plastic and bearing a temporary, full-sized image, could
possibly be read more comfortably than a screen image, and
would free the reader for another user. Subsequently, the
plate can be returned to the machine for erasure and re-use.
This concept has interesting possibilities for the future
design of microfilm systems.

An additional area of experimentation concerns recent
developments in photographic materials. Certain bodies of
information require erasure; others require frequent addition
of single pages, to be interfiled with the basic information
store. Recently introduced storage materials such as thermoplastic
and photoplastic recording, Frost images, Kalvar,
and others need field testing within the framework of practical
systems. The traditional silver film still has great advantages
over these new materials in film speed, but the required
wet-processing procedure and the resultant time delays often
constitute an inefficiency in the system. The advantage in
light sensitivity which the silver film enjoys over its newer
relatives becomes a disadvantage in film handling, in that the
film has to be protected against tungsten light as well as
daylight. Slower films such as Kalvar and diazo, which are
now used for microfilm duplication, can be handled relatively
freely in normal room light; and, since film speed is only
relative to available optics and light sources, every effort
should be made to find a light-source, optics, and film combination
that would yield immediately available microimages of good
quality. It has been suggested that lasers might be useful
in this connection, and their application to microphotography
should be the subject of special experiments.

Experiments should include all manner of exploitation of the
microstore so that it yields not only the total document but
also excerpts (including diagrams, pictures, and short
paragraphs). These may be useful as part of an SDI (selective
dissemination of information) system.

Experiments should also be concerned with machinery that
"marries" individual pieces of equipment or partial systems
produced by different manufacturers for maximum utilization
of such equipment.

Experiments in microphotography share the hazards of all
experimentation — namely, the difficulty of determining
meaningful methods for evaluation and the difficulty of
evaluating user satisfaction. (Psychologists have warned that
experiments are frequently prejudged by the manner in which
they are introduced or described.) There is another hazard
in experimenting with image retrieval, and this concerns
comparison between different techniques. While it seems at
first perfectly feasible to compare a magnetic information
store with a store of hard-copy documents and with a microfilm
store, further thought about the details of such an
experiment shows that, with the present immature state of the
systems, an elementary mistake can easily be made. It is
somewhat like an experiment which compares a DC-3 aircraft,
a Rolls Royce automobile, and a fishing vessel as
representative of aviation, ground transportation, and shipping,
respectively. Such interdisciplinary comparisons may be
useful as follow-up experiments, but basic system experiments
should come first.

It is to be hoped also that the economic basis for Intrex
experiments will be such as to preclude the possibility of
biased experimentation simply as a consequence of the
availability of certain machines at MIT.

Quality in graphic arts is a vital part of microphotography
systems, and there is a genuine need for better training of
technicians in producing laboratories — commercial and
academic. A course in the theory and practice of microphotography,
taught at MIT, would help this situation, but I
do not know whether this fits into Project Intrex.

Peter Scott

APPENDIX L

THE NATURE OF THE "EXPERIMENTS"
TO BE CARRIED OUT BY PROJECT INTREX

In his "Briefing for Visitors" of 17 August 1965, Joe Weizenbaum
said that "The word 'experiment' has the connotation of an
activity . . . terminated after . . . a period of time. . . . But . . .
an Intrex experiment operates in a real environment . . . will
not prove terminable. To be real it must serve real users
on a realistic scale and over a long period of time." Weizenbaum
then went on to describe two activities that are not experiments
in the narrow sense of that word — two activities in which he
thought Project Intrex should engage, and which he thought
would ultimately converge and interact strongly with each
other. One of these activities had to do with automating and
"rationalizing" the functions of libraries. The other had to
do with the development of a stored-program-based and
programmable information-transfer network. Thus Weizenbaum
dealt with a subject that I think is basic to this Planning
Conference: the form (as distinguished from the content) of the
"experiments" that give rise to the "ex" in "Intrex".

It occurs to me that many members of the Planning Conference
have never had the experience — perhaps, I should say, have
never had the pleasure — of knowing experiments of the kind
that have dominated the field of experimental psychology for
many years. These experiments have had the characteristic,
mentioned by Weizenbaum, of limited duration. They have
been carefully designed, set up with one or more "experimental"
groups and one or more "control" groups, arranged in such
a way as to permit rigorous statistical analysis and interpretation
with the aid of "confidence limits", rejection of the
"null hypothesis" at a pre-specified level of confidence, and
so forth. A good experiment in experimental psychology has
been a polished gem, a small thing that shines brightest under
the light of careful scrutiny and that is valuable for its beauty,
or its elegance, or its rarity, rather than for its practical
utility. I characterize the typical psychological experiment
thus, not by any means in disapproval, but rather with
continuing admiration and — indeed — pleasure in appreciation of
balanced design, carefully enumerated components of variance,
and high coefficients of concordance. It is striking to me how
different such experiments are from the planned experiences
in system design and synthesis with which the world of
engineering has been so much more boldly and so much more
expensively concerned.

A remark Fred Mosteller made to me last year — "I suppose
these experiments will be so important that experimental
control will be unnecessary" — introduces the focal problem;
and perhaps it says enough about the focal problem. However,
I shall discuss the matter further, trying to come to grips
with a question about the nature of "experiments" that still
bothers me.

It seems to me that most of the big engineering "experiments"
have not been true experiments and that they have contributed
much less to human understanding than they would have
contributed if they had been conducted more nearly within the
tradition of the kind of experiment I grew up with in behavioristic
experimental psychology. On the other hand, the big
engineering experiments have truly changed our world. Sometimes
we have not been quite sure whether they improved it,
but there is no doubt that they changed it; and in that respect
they are quite different from the psychological experiments
that, however much or little they added to understanding,
changed the world of the pigeon and the rat considerably more
than the world of man.

Carefully trained experimental psychologists have come into
association with, and participated in, large-scale engineering
"experiments". When they have done so, they have tended to
follow one or the other of two courses. Those who followed
the first course abstracted small, manageable problems from
the larger context within which the engineering "experiments"
were being carried out, schematized those small problems
and made them amenable to designed, controlled experimentation,
carried out those idealizations, analyzed the data, and
reported the results at too late a date to have any effect upon
the engineering, which meanwhile had reached the point of
being frozen for production. Those who followed the other
course adjusted to the fast time scale of engineering development
and conducted "quick and dirty" tests to determine which
of two or three different ways of handling some process or
shaping some component would be "best". This let them do
experiments that influenced engineering, but it did not contribute
much of long-term interest and value, did not add much
to knowledge.

In the foregoing, I said that psychologists tended, and some
erred, in one or the other of the two directions - not that all
actually went to those extremes. Actually, there has been
much good work that has been timely enough to contribute to
engineering and rigorous enough to contribute to academic
knowledge. By and large, however, it has come from
long-continued programs of psychological research within contexts
in which the larger aims and commitments had to do with
engineering. Good examples have been provided by the Bell
Telephone Laboratories over a long period of time and by the
Lincoln Laboratory after the initial pressure of making the
"quick fix" and building the SAGE System subsided. Nevertheless,
I think it is worthwhile noting that rigorous experimentation
has had a hard time adjusting to the pressures and
realities of large-scale engineering works, and that large-scale
engineering projects have often been successful — perhaps
equally often, unsuccessful — quite independently of the
experimental work that was conducted at the same time on the same
premises under the same general management.

Now, it is easy to make the argument that science has been
progressing rapidly these last several decades and that it has
so grown in stature and in favor with God and man that there
is no longer any need to observe the pretty niceties of stratified
random sampling and the like. It is easy to argue that what
the world of libraries and information systems needs is
"rationalization" or insightful application of automation or simply an
infusion of expertise and funding. Certainly, any one of those
three would be better than a myriad of picayune experiments
on such matters as whether, on library cards, the titles should
be underlined or in italics. But I think that there is a middle
ground, and that is why I am writing this note instead of, as I
should be, proposing some definite experiments.

Actually, as I see it, there are two general areas of "middle
ground". One of these has to do with confrontation among major
themes in the field of system design. The other has to do with
optimization within a general system design or family of designs.

By "themes", in the foregoing, I mean such broad, general
approaches as those based on computer processing of encoded
information (i.e., the concept often referred to as MAC) or
the copying and transportation of microimages, plus enlargement
at the user's location (the concept referred to as MIC);
it may be possible to cast comparisons of such themes into a
genuinely experimental mode. That will be very difficult to do,
and it cannot be done without a considerable amount of
preoptimization within each theme. But I think that it is called
for if Intrex is to do more than design something, build it, try
it out, and see whether it works.

It seems to me much easier, and just as necessary, to carry
out experiments within the framework of a theme, within the
design of a system that has parameters that can be varied
experimentally to generate a family of systems. One way to
proceed, of course, is to generate the family of systems and
to compare, in a balanced and orderly test, various members
of the family. However, systems of the type with which Intrex
will be concerned have very complex structures and very many
parameters. It is necessary to use heuristics to avoid the vast
numbers of alternatives to which combinatorics leads. The
main heuristic is to isolate critical subsystems, simpler and
therefore combinatorically less explosive. Sometimes one can
test the various possible forms of a subsystem in parallel.
Alternatively, one may employ hill-climbing or adaptive
techniques. On this level, neat experiments can be designed,
carried out, analyzed, and reported within the five-year period
of Project Intrex. Indeed, they can be carried out, in some
instances, in less than one year. I think that a considerable
number of such experiments should be performed by Intrex.
Some of them should be scheduled to influence the system
projects with which Intrex will be concerned. Others should
be protected from the pressures of the system enterprises and
should be scheduled to feed into the design, not of the
"experimental" systems of the first five years, but of the
"operational" systems of the second five years.
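The hill-climbing approach mentioned above can be sketched, in modern
terms, as a small program. The sketch below is illustrative only: the
score function stands in for whatever measured performance criterion an
experiment would use, and the parameter names and step size are
hypothetical, not anything specified in this report.

```python
import random

def hill_climb(score, params, step=0.1, iters=200, seed=0):
    """Minimal hill-climbing sketch: perturb one parameter at a time
    and keep the change whenever the measured score improves."""
    rng = random.Random(seed)
    best = dict(params)
    best_score = score(best)
    for _ in range(iters):
        trial = dict(best)
        key = rng.choice(sorted(trial))       # pick a parameter to vary
        trial[key] += rng.uniform(-step, step)
        trial_score = score(trial)
        if trial_score > best_score:          # accept only improvements
            best, best_score = trial, trial_score
    return best, best_score
```

Such a loop climbs toward a local optimum of the score; it is one of the
simplest members of the family of adaptive techniques, and its chief
limitation (stopping at local peaks) is exactly why one would still want
the balanced, orderly comparisons discussed above.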

Besides the two kinds of experiments, there are activities that
have to do with creation and invention, activities that are clearly
"research" but not clearly "experimentation". Such activities
lead to new languages for interaction between men and computers,
new ways of classifying and organizing knowledge, new
ways of displaying information to users. Whatever activities
of these kinds are called — often the verb used in conjunction
with them is "to develop", even though the activities are much
more akin to research than to development — I think that there
should be much of them in Project Intrex. In fact, I think that
it will be from them, more than from formalized experimentation
or from invention on the system level, that the truly novel
and the truly valuable ideas will come.

J. C. R. Licklider

APPENDIX M

HOW HUMANISTS USE A LIBRARY

It is doubtful that any one person can provide an authentic
view of how humanists use a library. There are too many
different ways. But perhaps a personal impression can be
risked.

By humanists in this note I mean

Philosophers

Historians (this includes historians of all kinds, including
the arts, music, literature, science, as
well as the more conventional fields of cultural or
political history; it also includes archeologists).

Critics (I mean here people who analyze works of
art, literature, etc., of any period whatever).
There is obviously some overlap with history.

Linguists, both modern and classic, if there is
anything left for them outside the above classifications;
this would include linguists of the Roman
Jakobson type, but I assume that most contemporary
students of technical linguistics actually use
library resources more as scientists do and fall
outside this classification.

I do not include

Artists, composers, architects, writers (I shy
from the word "creative" as I do from the word
"genius" and might prefer makers of art, literature,
music, architecture who are really workers
in imagination.) There are some people
sitting uncomfortably in our intellectual universities
now, and there may be more in the future and
they may be more comfortable. I do not exclude
them for any reason other than that I suspect
that few of them use libraries at all in anything
other than a browsing sense, and many not even
that.

I am talking about adults. Of these there may be two groups,
each important:

Teachers who try to grow in what they know about
what they teach without trying to contribute seriously
to the store of knowledge. They need, I
suggest, a fine opportunity for browsing and for
leisurely and thoughtful reading in depth, and a
selected exposure to the important current literature
of their field — but very selective, since
humanists perforce will not find any five years of
time of maximum importance, and particularly
not the present time. It is only rarely that the
current literature or the current writer has the
importance, ipso facto, that it may properly have
in rapidly changing science.

Scholars, without regard to whether they teach or
not, or whether they are part of the university
establishment or merely passersby. These we
can divide again into two groups.

Generalists or synthesizers, from those
who write about all history (Wells, Durant,
Toynbee, Spengler, Pareto, Mumford) to
those who write about periods or topics,
say, The Age of the Enlightenment or The
Anatomy of Revolution. If one writes about
Painting in Siena after the Black Death, he
is probably moving into the second category.
What seems to me to characterize
the methods of these general people is
that the scope of their inquiry forces them
to the use of secondary sources. If I read
somewhere that one of the sumptuary laws
of the Commonwealth of Virginia said something
interesting, I can rarely afford to find
out at first hand whether there is such a
law or what it says in detail, if this requires
a substantial effort. It is essential
that I be able to trust my secondary sources,
to establish a sort of roster of the reliable;
and more powerful tools for rapid checking
would be of great value.

Such workers need the resources of a conventional,
well-stocked library, a good many books and full texts,
and access to the best journals but especially the
monographic literature. The great libraries do not
seem to offer any difficulties that seem to matter much
to people like us. We need time to ponder the information
we find; we need more time to ponder proposed
interpretations. We can turn from topic to
topic in our net; we can wait for a piece if we know
we will get it. And our principal frustration might
well be an inadequate stock of things, most of which
we know we want from bibliographic and catalog
searches. We may use rare books or manuscripts
now and then for purposes hard to explain, but
facsimile copies at original size would ordinarily
be quite enough. An exception in my own field is
that an art critic or historian has no business
talking at all about paintings, sculpture or buildings
he has not seen in the original. No skill in
photoreproduction — even in living color — is a
substitute, regardless of scale and quality. But
these experiences are by definition not in the library.
After they have been had, sometimes
more than once, quite-crude, much-reduced
graphics may do. For example, the size, color
and texture of the basilisk on the west portal of
Amiens are important. After I have absorbed
them on the site, if I need any reminder back in
the study, a reasonable line drawing, an inch
high without color, texture or scale, will do; it
may even be better than a photograph. The
presentations computers can make will do for
much of this; the rest has got to be in a form that
can be considered at leisure. The most important
part has to be either in books that can be kept and
frequently, even capriciously, returned to, or in
photocopies, black on white at a size large enough
to read with no apparatus more complicated than
"natural" spectacles and with enough margins to
permit heavy personal annotation on paper which
permits annotations. Sometimes even these
annotations need to be cut and pasted into an early
rough draft. Most of this could possibly be
computerized in a fancy enough set-up.

Our editing process is not simple. We need to
retain what we rejected lest we want to restore
it tomorrow. And actually we can edit very
quickly, barring the thinking time, if the interlines
are wide enough and anybody can read our
calligraphy. To be impressive to us, the present
editing arrangements of MAC would have to be
better in all ways by an order of magnitude.

Specialists. These are generally regarded
as the true scholars of the humanities. They
work heavily in original sources. An example:
When Sir Thomas More was incarcerated in
the Tower of London, he is supposed to have
written a prayer on an illustrated Book of
Hours. Certainly, somebody wrote such a
prayer and it is a beautiful one. It is in the
English of the early 16th century. Now a
student who needs to relate this directly to
More needs first to establish, if he can,
that More wrote it; this could require the
use of the original (now in the Yale library)
and nothing else. At the next level down, a
good facsimile will do, purveying the environment
as well as the text of the prayer. Further
down, a printing of the text only (without
illustration) will do. Still further down, a
translation of the English into modern English
in typewriter face may be enough. It depends.

These fundamental scholars have to do their work,
or the synthesizers would be guessing more than
they do. Without the synthesizers, the detective
work might not be very useful.

And it is detective work: they get clues, they search
police records, birth and death and marriage records,
scraps of letters located heaven knows where. Some
of this is in libraries, some not; but even when there
are self-conscious files, say, of the papers of
Hemingway or F. D. Roosevelt, the lacunae will be
enormous. I don't do this kind of work and don't
know how the people who do it use libraries, but I
am confident they are technically skilled in utilizing
library aids and probably know the relevant resources
of their own university library very well — better,
perhaps, than the librarian. Indeed, they may be
responsible for much of these resources being
there; and they seem to know even by grapevine what
other great libraries have which theirs does not,
since the accession rate for new discoveries (as opposed
to commentaries) is slow.

Let me try to summarize this in another and more
general way.

1. a humanist spends a good deal of time, pleasurably
and productively, in studying a work
which may contain something nobody has discovered
before or suggest to him a new insight. The search
for the information itself is a major part of his
task. It is in this sense that the library for the
humanist is truly his laboratory.
2. he rarely, if ever, works under the time pressure
of the contemporary scientist or engineer, real or
fancied.

3. he usually does not possess a battery of research
assistants or even secretaries to do the work,
and he would not trust them to do it if he had them
(there have been notable exceptions).

4. the current document, apparently unlike the
documents of science and engineering, is not
prima facie of more importance than an earlier
one; indeed it will often be less important.

5. except perhaps in the technical criticism of
literature, the humanist has to examine products
of many disciplines — perhaps, for this reason
only, the literature of literature is better organized
than that of most other disciplines.

6. usually, though not always, the pressure for
use of any given document is felt from a small
constituency.

7. frequency of use of many of the most important
works is low. To remove them on the basis of
such frequency would destroy humanistic libraries.
This includes historians of science.

8. the humanist is strongly dependent upon browsing
as a scholarly, serendipitous device. This might
be less so if his library resources were more
systematically organized, but we can only guess.

9. even if he could tap every existing monograph,
the monographic literature available to him is full
of holes.

10. present subject analysis is completely inadequate
for him as a specialist.

John E. Burchard

APPENDIX N

ON THE PREDICTION OF LIBRARY USE

MOTIVATION

Whether or not it ever were so run, the modern research or
public library certainly cannot be operated as though it were
a passive repository for printed material. The opposed
requirements of storing an increasing collection and of
maintaining easy access to the most-used part of it can only
be balanced by active and discriminatory planning. Whether
the material be stored on shelves, in microfiles, or
magnetically, the exponential increase in publication makes
it uneconomical, and even undesirable, to have all items
equally accessible. In spite of this, however, the library
must be operated so that most of its users can find their way
to the items of information they need, with a minimum of delay
and frustration. To achieve a balance between these opposing
requirements, the manager of an existing library, or the
planner of a new one, must know in some detail what the
user of the library does: how often he will use a catalog or
other reference material, for example, what books or
periodicals he will refer to, and how long he will need to use
each item. As with any other organization in these rapidly
changing times, the librarian should know, as accurately as
possible, what is going on, and should be able to predict
what probably will be going on in the future.

This is particularly true of science libraries. As Price
(Science, Vol. 149, 30 July 1965, p. 510) has indicated, most
reference material in the sciences has a very short useful
life: about one-third of the citations in the scientific research
literature are to material published in the previous 10 years;
and the references to earlier material are mostly to a relatively
small group of "classic papers", with more than half
of the earlier papers ignored completely. In physics, for
instance, most books published 20 years ago are out of date,
and a not-inconsiderable fraction contains erroneous or
misleading material, in the light of later findings. Surely
such volumes do not deserve accessibility equal to the most
recent publications. Since the first year of the average physics
book's life constitutes roughly 20% of its total utility, it
certainly is worth staff effort to get these books bought,
cataloged and shelved before half their first year has elapsed,
and to spot quickly the more-popular of them so that
additional copies or reserve arrangements can ensure multiple
access during their most useful period.

Policy Decisions

Administrative decisions, both major and minor, regarding
all aspects of library planning and operation, can be wisely
reached only in the light of knowledge of present library use
and by the help of careful estimates of future use. Here are
some typical administrative questions which must be answered,
either actively or by default, often or occasionally, by
every librarian or library board:

What fraction of the yearly budget should be allocated
to the purchase of books; to the purchase of periodicals?
How should this be allocated among the various fields
covered by the collection?

How does one decide which books (or periodicals) within
a covered field should be purchased? How and when
does one decide to buy a duplicate? How can one
evaluate alternative decision procedures?

How should books (periodicals, etc.) be placed in regard
to accessibility? Which items should be put on
open shelves, which in stacks, which on reserve
shelves, to be used only in the library, and so on? Can
the amount of use of a book be predicted, so that one
can estimate the fraction of users who will be frustrated
or delayed by reducing the book's accessibility? With
a popular book, what fraction of prospective users will
find the book has been borrowed by another user? How
high must this fraction be before a duplicate should be
bought or the book be put on the reserve shelves?

How much is the usefulness of a book (or periodical)
reduced if its use is restricted to the library reading
room? Can the reduction be expressed in dollars and
cents or in any other measure that allows comparison
with other methods of ensuring multiple access (such
as purchase of duplicate books)?

What is the value of "browsing"? How much easier is
it to browse in an open-shelf collection than in a "stack"
or than from a card catalog? Will any of the proposed
"automated systems" of information retrieval permit
browsing? Can they be modified to do so?

Can a measure be devised for the loss of utility that
results when a book (or periodical) is missing from
the collection for a period, either because "it is
being rebound" or because it has been mislaid or stolen?
How can this measure be compared with the cost of a
guard or of more-speedy rebinding or of occasional
inventories to discover which books are lost, so as to
forestall the frustration of the next potential user?

In the case of a university library, which has priority (in
service, in book choices, etc.): the student or the faculty?
Does library use by the faculty differ enough from
that by the student that there should be separate
libraries (or collections or rooms) for each?

Should there be one big library for the whole university,
or should there be many branch libraries? What would
the difference in cost be? Would this difference be
"worth it" in some measurable sense? And how much
duplication of books would be required to stock adequately
a set of departmental (or school) libraries (in
other words, how many physics books does a geologist
or a psychologist frequently use, and vice versa)?

Questions of this sort are being answered all the time, either
consciously or by implication, by librarians or by their governing
boards. Most of the time, the operating decisions,
which should be based on an explicit analysis of such questions
and their answers, are based on a reluctance to change past
practices or a desire to emulate some other library, though
it should be apparent that the answers may differ appreciably
from library to library and even from time to time in the same
library. Occasionally, attempts are made to arrive at answers
by "market surveys" of a sample of users. Experience is
showing the dangers of such opinion surveys, unless they are
very carefully worded and unless they are quantitatively checked
against the actual behavior of the same users. Too often
has the questionee persuaded both himself and the questioner
that he would use some proposed new service, only to find
that he seldom gets around to using it, once it is installed.

Pertinent Data

For some time to come, many of the questions listed above will
have to be answered on the basis of the librarian's experience
and intuition. Some of them may always have to be so answered.
But, surely, a greater quantitative knowledge about
library use can assist in getting answers, will make it easier
to determine when conditions have changed enough to warrant
changes in operation and wherein procedures in one library
should differ from those in another. Data on some or all of
the following questions would be of value in this respect:

What services (chance to sit down, chance to look at a
book, chance to take a book home, chance to look at
the catalog or to talk to the reference librarian, etc.)
does the library attendee use and how often does he use
them each visit? How do different attendees differ in
their use patterns?

What is the pattern of visits by various attendees (users)
of the library? Is there an hourly, weekly or seasonal
periodicity in attendance? How long do they stay, and
what is the distribution of lengths of stay? Is there a
correlation between length of stay and the attendee's
use pattern?

What is the pattern of book (periodical, etc.) use? With
a freely circulating book, what is the ratio between use
in the reading room and borrowing to take home? What
fraction of the collection is not used at all during a year?
How do these use factors change with the age of the book?
Is there a correlation between the use factors for successive
years? How do the use factors differ for different
classes of books (field of specialty, foreign language, text,
periodical, report, and so on)? Is there any correlation
between the use factor of a book and the way the library
was persuaded to buy the book (request from faculty,
decision of librarian, decision based on list of new publications
or on book advertisements, etc.)?

Increasing Need for Data

Data on all these items can be obtained. Most of them are not
gathered by most libraries. Expense and lack of librarian's
time are the usual excuses given for the neglect. Certainly,
gathering any of the data mentioned costs time and therefore
money; to answer all the listed questions in detail each year
would overburden any library's budget. It is the thesis of
this appendix that librarians should realize, as managers of
industrial, mercantile and military operations are learning,
that, as use patterns change and publication increases, lack
of such data may lead to wastage and loss of utility, and that
expenditure of time and money in gathering some of the data
mentioned could save more than this in improved utility. In the
near future, the introduction of data-processing equipment in
library operations will make it easier to amass the data; librarians
should experiment with such data gathering before mechanizing,
comparing the various methods of data gathering and
the value of the various kinds of data in assisting policy decisions,
so that the data-processing equipment can be designed to
produce the most effective data most efficiently.

It is also the thesis of this appendix that the application of
modern techniques in the theory of probability makes it possible
to reduce greatly the cost and time involved in keeping a
running record of much of the data listed above. By developing
and testing out (at 10- or 20-year intervals) a number of
probabilistic models of the operation, the models can be kept
current in the intervening years by the gathering of a relatively
small amount of data. From the models, the other details,
which would be more expensive to gather regularly, can be
reconstituted with a good degree of expected accuracy.

Prediction of next year's operation thus consists
of extrapolation of the few items of gathered data; the models
then provide the details of the prediction. (The use of experimentally
checked theories to predict the behavior of some
system is, of course, the usual method of physical science
and of engineering.)

Finally, it is suggested that similar data should be relevant
in the experiments to be performed by Project Intrex. There
will be patterns of use by the various users of the "model
library", for example — patterns that will display certain
statistical regularities. These, when examined, may indicate
general habits of users which will be, or at least will
be related to, the "user parameters" discussed by Raymond
in another appendix. Likewise, the various elements, data
units or catalog cards will be used by different users differently,
and will change in usefulness with time. It is suggested
that this behavior will be analogous to the use history of
elements in current libraries, and can be analyzed in similar
ways. The results of the analysis then can be developed into
the set of "message parameters" discussed by Raymond. In
other words, detailed recording and analysis of user and
element behavior in the Intrex experiments, of the sort
described in this appendix, will be needed to provide the
quantitative details from which one hopefully can build the
general theory discussed by Raymond.

TYPICAL PROBABILISTIC MODELS

A few examples, drawn from data gathered in the Science
Library at MIT during the past 10 years, will illustrate
the points made in the previous section. The use a library
attendee makes of the library during one visit can be classified
according to "tasks", such as:

Borrow a book (or periodical, or report) from the library.
Consult a book (etc.) in the library.
Consult the card catalog,
and so on.

The attendee may perform any one of these tasks zero or one
or more times during his stay; the average number of times
each different task is performed per visit provides a use
pattern which can assist the librarian in determining the relative
importance of the services provided; it can help, for
example, determine whether different kinds of libraries are
needed for different users, or whether one general library
could satisfy nearly everyone.
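The tallying described here is straightforward; as an illustration in
modern terms, the following sketch computes average use patterns per
visit, by class of user, from survey records. The record format and
task names are hypothetical, chosen only for the example.

```python
from collections import Counter, defaultdict

def use_patterns(survey):
    """Average number of times each task is performed per visit,
    by class of user. Each record is (user_class, tasks), where
    tasks lists the task names performed during that one visit."""
    task_totals = defaultdict(Counter)
    visit_counts = Counter()
    for user_class, tasks in survey:
        visit_counts[user_class] += 1      # one record = one visit
        task_totals[user_class].update(tasks)
    return {cls: {task: count / visit_counts[cls]
                  for task, count in task_totals[cls].items()}
            for cls in visit_counts}
```

For instance, two physicist visits of which both borrowed a book and one
consulted the catalog would yield averages of 1.0 borrowings and 0.5
catalog consultations per physicist visit.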

Probability Distribution of Use

The average use patterns of the various classes of users of
a given library can be fairly easily determined by a simple
questionnaire-survey, distributed to a random sample of a
few hundred of each class, at the time of attendance, asking
what the attendee did during that specific visit (not what he
usually does or what he would like to do!). These average
use patterns are quite illuminating in themselves; a more detailed
description of the statistics of use is, of course, still
more valuable, but requires a considerably larger and more
detailed survey to achieve. Such a larger survey was carried
out at MIT; the results displayed significant internal regularities
which, if verified elsewhere, would enable one
to construct the detailed statistics from data on average use
patterns alone. Thus, there are indications that a simple,
probabilistic theory of library use can be constructed.

What was found, from the MIT data, was that the number of
tasks per visit, performed by members of certain obvious
classes of users (such as chemists or physicists, or freshman-sophomore
undergraduates, etc.) was geometrically distributed.
In other words, out of N0 visits of a given class,
the number of visits during which k or more tasks were performed
is fairly accurately given by the simple formula
N0 a^k, where a is a constant factor, less than unity, characteristic
of the given class; a is, in fact, equal to the ratio K/(K + 1),
where K is the mean number of tasks performed per
visit by members of the class. Not only is the totality of
tasks so distributed; the individual tasks, of the sort instanced
above, turned out to be independently distributed and subject
to the same geometric formula. Thus, the average K's for
individual tasks are additive, to obtain various factors a for
various groups of tasks of interest in policy determination.
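The geometric model is compact enough to state as a few lines of code.
The sketch below (an illustration, not part of the original study)
computes the survival fraction a^k from a class's mean K, and checks the
formula against a simple simulation of visits.

```python
import random

def geometric_survival(K, k):
    """Fraction of visits in which k or more tasks are performed,
    for a class whose mean number of tasks per visit is K.
    Implements the text's formula a**k, with a = K / (K + 1)."""
    a = K / (K + 1.0)
    return a ** k

def simulate_visits(K, n_visits, seed=1):
    """Draw per-visit task counts whose survival function is a**k
    (a sanity check on the model, not real library data)."""
    a = K / (K + 1.0)
    rng = random.Random(seed)
    counts = []
    for _ in range(n_visits):
        k = 0
        while rng.random() < a:   # each "success" adds one more task
            k += 1
        counts.append(k)
    return counts
```

With K = 0.5, for example, a = 1/3 and the fraction of visits involving
two or more tasks is (1/3)^2, about 11 per cent; the simulated mean task
count converges to K, as the a = K/(K + 1) relation requires.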

If these findings are verified in other libraries, the need for
large use surveys in every library, or each year in the same
library, is obviated. The librarian has first to divide his
library's users into a few broad homogeneous classes, each
of which satisfies such a geometrical task-number distribution.
Henceforth, he needs only to measure the average number
K of tasks per visit for each of his classes (which requires
a much smaller survey than is required to establish the distribution
itself); the probability distribution can then be reconstituted
by using the geometric formula. There are general
arguments which indicate that library tasks should be thus
distributed. For example, the geometrical distribution is
typical of a purely random sequence, such as the frequency of
occurrence of k or more successive heads in a sequence of
coin tosses; the random nature of the time limitations and the
interests of the attendee, plus the random availability of the
various books in the library, makes it plausible that such a
random distribution should apply.

Once it is decided that a certain class of attendees of a


library is a homogeneous class in regard to use (i. e., that
the use pattern of this group has a geometric distribution),
then a number of conclusions of policy interest can be obtained
by measuring the average number K of tasks per visit for the
class and from the K deriving the factor a = K/(K + 1) which
specifies the distribution. For example, if biologists (a
homogeneous class in the MIT Science Library) borrow,
on the average, 0. 5 book per visit, when no limitation is
placed on the number of books borrowed per visit, then about
11% of the biologist attendees withdraw more than one book per
visit (a = 0. 5/1. 5 = 0. 33; probability that K is greater than
1 = af = 0.11). Thus a limitation to one withdrawal per visit
would affect one biologist in nine. Further use of the formulas
shows that such a limitation would reduce the number of books
withdrawn from the library by biologists to two-thirds of its
previous value. If the afs are similarly known for the other
classes of users, the results can be combined into a general
statement regarding the fraction of all the people affected by
the limitation and the total reduction of books out on loan.
Many other deductions of operational interest can be similarly
calculated.
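
The arithmetic of this example can be sketched compactly in
present-day programming notation. The function names below are
illustrative only; the formulas are those of the geometric model
described above.

```python
# Geometric use-pattern model: P(k tasks = n) = (1 - a) * a**n,
# with a = K/(K + 1), so that P(k >= n) = a**n.

def geometric_parameter(K):
    """The factor a = K/(K + 1) that specifies the distribution."""
    return K / (K + 1.0)

def fraction_exceeding(K, limit):
    """Fraction of visits in which more than `limit` tasks occur."""
    return geometric_parameter(K) ** (limit + 1)

def mean_tasks_with_limit(K, limit):
    """Mean tasks per visit when at most `limit` are permitted:
    E[min(k, limit)] = a + a**2 + ... + a**limit."""
    a = geometric_parameter(K)
    return sum(a ** n for n in range(1, limit + 1))

# Biologists in the Science Library: K = 0.5 books per visit.
a = geometric_parameter(0.5)                      # 1/3
affected = fraction_exceeding(0.5, 1)             # about 11%, one in nine
reduction = mean_tasks_with_limit(0.5, 1) / 0.5   # two-thirds
```

The same three functions reproduce the figures quoted in the text:
one biologist in nine affected, and total withdrawals reduced to
two-thirds of their previous value.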

Duration of Visit

The Science Library survey indicated that the lengths of stay
in the library of attendees of a homogeneous class also are
geometrically distributed. In other words, if T is the mean
length of stay of the users of a given class, then the fraction
of the visits of attendees of this class that stays longer than
time t is e^(-t/T). As with the use-pattern data, once the
distribution is verified (or is taken on faith), a relatively simple
measurement of the mean duration T of stay per visit for
each class is sufficient to determine the full distribution of
stay-times.

A general consequence of the geometrical distribution of
lengths-of-visit is that the rate of departure of attendees
from the library is proportional to the number of persons,
of the class in question, who are present at a given time,
independent of the time at which they arrived. Thus, an
equation between rate of arrival and rate of departure can
be set up and solved to predict the number of attendees still
present in the library at time t_ after its opening in the morn¬
ing, given the curve of average arrivals as a function of
time of day.
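
The arrival-departure balance just described can be integrated
numerically. The sketch below is my own construction on the
stated assumption dN/dt = A(t) - N(t)/T, where A(t) is the
arrival-rate curve and T the mean stay of the class.

```python
def attendance_curve(arrival_rate, T, hours_open, dt=0.01):
    """Euler integration of dN/dt = A(t) - N(t)/T with N(0) = 0.
    arrival_rate: arrivals per hour as a function of time of day;
    T: mean length of stay (hours) for the class in question."""
    n, t, history = 0.0, 0.0, []
    for _ in range(int(hours_open / dt)):
        n += (arrival_rate(t) - n / T) * dt
        t += dt
        history.append(n)
    return history

# With a steady 10 arrivals per hour and a half-hour mean stay,
# the population present settles near 10 * 0.5 = 5 attendees.
curve = attendance_curve(lambda t: 10.0, T=0.5, hours_open=12)
```

With any measured arrival curve in place of the constant rate, the
same loop predicts occupancy throughout the day.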

Circulation Interference

Other probabilistic theories can be utilized to study the
circulation properties of various books and periodicals in the
collection. The theory of queues can be utilized, for instance,
to analyze the relationship between the rate of circulation of
a given book (the number of times per year it is borrowed)
and the frequency with which a would-be borrower finds that
someone else has the book out. If, in addition, a count is
kept of the number of times per year a reserve card is left
for the book, it is possible to estimate the number of persons
per year who found the book on the shelf and borrowed it
without waiting, the number who had to leave reserve cards
to get the book (and how long they had to wait, on the average),
and the number of persons who, on finding the book was out on
loan, gave up and did not leave a reserve card.

Thus a continuous check on the circulation rate of the more-
popular books would enable the librarian to estimate the degree
of delay and frustration caused by having only one copy of a
popular book, and thus to judge, in a quantitative manner, when
it is necessary to buy another copy or copies.
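
A first-order version of that estimate can be stated directly.
The sketch below is simpler than a full queueing treatment and is
my illustration, not a formula from the survey.

```python
def fraction_of_year_on_loan(circulations_per_year, mean_loan_days):
    """Utilization of a single copy. Since random arrivals sample the
    shelf state in proportion to time, this is also roughly the chance
    that a would-be borrower finds the book out."""
    return min(circulations_per_year * mean_loan_days / 365.0, 1.0)

def copies_needed(circulations_per_year, mean_loan_days, max_miss_rate):
    """Crude duplication rule: add copies until the expected per-copy
    utilization falls below an acceptable miss rate."""
    copies = 1
    while fraction_of_year_on_loan(circulations_per_year / copies,
                                   mean_loan_days) > max_miss_rate:
        copies += 1
    return copies

# A book borrowed 12 times a year on two-week loans is out
# about 46% of the time; a second copy roughly halves that.
p_out = fraction_of_year_on_loan(12, 14)
```

A proper queueing analysis would also account for reserve cards
and balking, as the text notes, but the utilization figure already
flags candidates for duplication.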

Prediction of Circulation

In order to study the dependence of circulation rate on a book's
age, class and previous circulation, the circulation records of
several thousand books in the MIT Science Library were
tabulated. Superficial examination indicated several general
characteristics of these use histories. As would be expected,
the circulation rate usually decreases with book age. A popular
book may be borrowed more than 10 times a year during the
first year or so but, after four or five years, this rate has
dropped to one or two a year; after 20 years, it is borrowed
only once in two or three years. But there are occasional ex¬
ceptions to this general rule. A book may have a low cir¬
culation for a few years, then suddenly become popular. The
data showed that the subsequent circulation history reflects
this increased popularity, rather than the earlier neglect.
Circulation again decreases with age, but more in accord
with books originally popular, rather than with the unpopular
ones. Once in a while, a book will have a second, sudden
access of popularity; again, the subsequent circulation
history reflects the latest access, rather than the earlier
history.

This general behavior suggests that the Markov process would


be a likely probabilistic model for the history of book use.
If circulation were a simple Markov process, the circulation in
a given year would depend only on the previous year's circula¬
tion. This dependence would be probabilistic; not all books

232
having circulation m during the previous year would have
the same circulation, but the distribution-in-circulation would
be determined; a certain fraction would not circulate, another
fraction would circulate once, and so on, the fractions being
determined by the previous year's circulation. Thus, there
would be a chance for a book to have a sudden increase in
popularity and, for such a book, its subsequent circulation
would be typical of the increased popularity; it would have
forgotten its earlier neglect, so to speak. In addition, of
course, a few books will suddenly lose popularity; their sub¬
sequent history will, with a few exceptions, reflect this loss
of popularity. Thus the Markov-process model accounts for
the exceptional, sudden changes in circulation, as well as
the usual, gradual decrease in use.

To test this suggestion quantitatively, the records of all
books of a class (physics, biology, foreign language, etc.)
that had a given circulation m in some year were collected
together and the distribution of circulation for the subse¬
quent year was examined. This distribution was rather wide¬
spread, with an average usually somewhat smaller than m,
but with a small fraction of the books having circulation
greater than m. Detailed examination indicated that the dis¬
tribution of circulations about the mean value was in accord
with the Poisson distribution, characteristic of random
occurrences.
The model which thus emerges is as follows:

For all books having a circulation m in a given year, the
mean circulation n for the subsequent year is linearly dependent
on m, n = a + bm, where b is a fraction less than one,
corresponding to a gradual diminution of circulation with time, and
a is the "residual circulation" after time has removed the
initial popularity. The actual circulations for the subsequent
year are clustered about this expected or mean value in
accord with the Poisson distribution. The formula for the
probability that a book of a given class circulates n times in
a year, if it circulated m times the previous year is

    P_n = ((a + bm)^n / n!) e^(-a-bm)
This formula fits the data, particularly for the more-popular
books, remarkably well. Its use enables one to predict the
future circulation rates of various classes of books. For
example, if the mean circulation of a given group of one or
more books during a given year is M per year, then the mean
circulation rate of the same group t years later would be

    N = a (1 - b^t)/(1 - b) + M b^t

In addition, the model displays the limitations on prediction,
caused by "quantum jumps" in popularity which occasionally
occur. It indicates that, even though a book did not circulate
at all last year, it may circulate next year, and it provides
a value of this probability. It thus can predict how many books,
which have been "purged", will have later to be "reinstated".
Thus the costs of a proposed purging program can be
evaluated for comparison with its advantages.
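
The model's two formulas can be checked against each other in a
few lines; the parameter values here are arbitrary illustrations,
not fitted constants from the Science Library data.

```python
from math import exp, factorial

def p_next_year(n, m, a, b):
    """Poisson probability that a book circulating m times this year
    circulates n times next year: ((a + bm)^n / n!) e^-(a+bm)."""
    lam = a + b * m
    return lam ** n / factorial(n) * exp(-lam)

def mean_after_t_years(M, a, b, t):
    """Closed form N = a(1 - b^t)/(1 - b) + M b^t."""
    return a * (1 - b ** t) / (1 - b) + M * b ** t

# The closed form agrees with iterating the one-year mean n = a + b*m.
a, b, M = 0.5, 0.8, 10.0
n = M
for _ in range(5):
    n = a + b * n
```

As t grows, the mean settles toward the residual circulation
a/(1 - b), the "purging" threshold discussed above.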

Of course, the simple Markov model is a first approximation
to a detailed theory of circulation. There are indications
that the constants a and b change slowly with the age of the
book, but this complication, when verified, can be included
in the calculations. What is needed, at present, is an
evaluation of the values of a and b that describe the circula¬
tion of various classes of books in different libraries, to see
what regularities emerge, with the hope of eventually being
able to "predict" the circulation of a book before it is
placed on the shelves (the prediction being subject to the
variability inherent in any probabilistic model).

Further Developments

A number of other aspects of library use should be investigated
in a similar manner. For example, the circulation of
a book is not a complete measure of its usefulness; it also
is used in the library. At present, this use is more difficult
to measure in detail. The user questionnaire mentioned
earlier does indicate the relative number of times a user
of a given class consults a book in the library, compared
to the number of books he borrows. But these data do not
indicate the correlation between the use of a given book
in the library and its circulation rate. Such correlations
will require a different sort of questionnaire or other means
of observation. Some data have been gathered on the books
left on the library tables, after they were consulted by some
user, but too little has been done to be able to report any
results. Data on lost (or stolen) books will depend on re¬
gularly established inventories of the library, at least on a
sampling basis.

Surveys of library use patterns should be carried out in
other libraries. The MIT survey at least indicates the
patterns of various classes of scientists. How much do the
use patterns of historians, for example, differ from these?
And how do the circulation histories of books on philology,
or on anthropology, differ from those for books on chemistry?
When these sorts of questions can be answered, we will be
in a much better position to plan future libraries and to
operate present ones.
Philip M. Morse

APPENDIX O

A TECHNIQUE OF MEASUREMENT
THAT MAY BE USEFUL IN PROJECT INTREX EXPERIMENTS

The technique of measurement to be proposed may be useful
in comparative evaluations of information transfer systems
or services. Suppose, for example, that a user — a subject
in an experiment — wishes to look at the six or eight best
journal articles, published during the last three or four years,
on the topic, "the role of the superior olivary complex in the
mediation of directional binaural hearing". Suppose that five
different facilities or channels are available to him, each
capable of giving him access to more or less approximately
the required documents, but with varying degrees of effective¬
ness in selecting truly the most relevant ones, with varying
demands upon the user’s operating knowledge and skill, and
with varying time delays. Assume that the user is familiar
with all the facilities. The object of the experiment is to
derive from the user's actions, and of course from the actions
of other users like him, measures of the relative values of
the competing facilities.

Probably the most straightforward way to set up the experiment
is to assign a cost to use of each of the facilities, to give
the user free rein in choosing the facility to use in trying to
gain access to the documents he needs, and to record his se¬
lection as an expression of preference under the specified con¬
ditions of cost. The trouble with that procedure is that it
provides only a single datum each time a user makes a selec¬
tion, and the single datum is an all-or-none selection. That
is not very much information to obtain from an observation
that is bound to be expensive.

There are, of course, many ways to milk more information
out of the subject's behavior in the experiment. Most of them
have in common the extraction, from the subject, of informa¬
tion about the several subjective values that, presumably, he
pits against the stated costs in arriving at his selection. The
particular technique (or class of techniques) that looks best
to me is based upon a game, embedded within the experiment,
in which the subject bids for the various facilities or services
without having been told the cost figures set by the experimenter.
The subject might say, for example, that he will offer $6.00 for
hard-copy xerographic reprints delivered through the express
messenger service, which he knows to have a record of making
deliveries in between 45 minutes and an hour and a half, that
he will pay $10.00 to see enlargements of microimages of the
same documents through the "blue" service, which almost
always produces its information within 15 minutes and which
lets the user retain the microfiche, that he will pay . . . and
so on. The experimenter, or the computer system through
which the experiment is being conducted, might then say that
no one of the bids is high enough to obtain the desired service,
but that the bid for the microfiche would be successful if it
were doubled. The subject might agree to that price and that
service, or he might raise one of his other bids. I think it
would be possible to work out rules of the game that would
lead the subject to reveal his pattern of evaluations during
the course of his making each selection. Moreover, I think
it would be possible to achieve that result without having to
reveal, in advance, the experimenter's schedule of prices
for the alternative facilities or services.

It may be worthwhile to dwell a moment on the last half of
the last-mentioned point. Presumably, the experimenter
will estimate costs on the basis of expectations for the several
technologies during the period 1970-1975. If the experiment
were set up in the straightforward way, the experimenter’s
estimates of the cost would exert a direct influence upon the
subject’s selections. The data obtained in the experiment
would be no more valid than the estimates of the costs, and
the estimates would, of course, be quite fallible, for it is
difficult or impossible to predict the economics of the tech¬
nologies very far in advance. It seems to me to be quite im¬
portant, therefore, to devise techniques of measurement that
do not contaminate the experimental results greatly with the
fallibilities of cost assessment. I realize that, even with the
proposed scheme, subjects would tend to develop impressions
concerning the cost structure during a series of trials. How¬
ever, the impressions could be kept tenuous by varying the
estimated but unstated costs of the several facilities or ser¬
vices from trial to trial and, in addition, explaining to the
subjects that the cost assessments were not fixed, and that
the subject’s bids should be based on willingness to pay, not
upon expectation of any particular pricing pattern.
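
One round of such a game might run as in the sketch below. The
rules, the service names, and the "multiplier" feedback are my
illustration of the idea, not a protocol fixed by the Conference.

```python
def bidding_round(hidden_costs, bids):
    """hidden_costs: the experimenter's unstated price per service.
    bids: the subject's offers. Returns (awarded_service, feedback);
    feedback tells the subject by what factor each losing bid
    would have to be raised to succeed."""
    sufficient = [s for s in bids if bids[s] >= hidden_costs[s]]
    if sufficient:
        # award the service whose bid exceeds its cost by the most
        return max(sufficient, key=lambda s: bids[s] - hidden_costs[s]), {}
    return None, {s: hidden_costs[s] / bids[s] for s in bids if bids[s] > 0}

costs = {"express reprints": 8.0, "microfiche": 20.0}
winner, advice = bidding_round(costs,
                               {"express reprints": 6.0, "microfiche": 10.0})
# advice now reports that the microfiche bid would succeed if doubled
winner2, _ = bidding_round(costs,
                           {"express reprints": 6.0, "microfiche": 20.0})
```

Each round yields the subject's full bid vector rather than a
single all-or-none selection, which is the point of the scheme.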

In the foregoing, I have tried to give only a general picture
of the technique, and only a brief statement of why I think it
is important. This seems to me not to be the time to settle
down to a detailed or refined development or exposition. I
am interested to see, nevertheless, whether this suggestion
sets up a resonance in any other members of the Planning
Conference.

J.C.R. Licklider

APPENDIX P

EXPERIMENTS ON INDEXING-SEARCH AND DISSEMINATION

INDEXING

A library item can be described by many methods. The totality
of such methodologies I shall here call "indexing".

A set of key words appearing in the item, the usual combination
of author and title, the set of names of authors cited, a
class or classes from some classification schema, and so
forth, are examples of indexes. The usual index, listing
occurrences of important words in the item, is itself an ex¬
ample of an index.

The literature includes reports upon experiments that were
intended to measure the relative effectiveness of various in-
dexing methods. These experiments sometimes utilized
human participants, and sometimes were simulations run
according to indexing and searching schemes contrived by the
experimenter. In each case, the experimental objective was
to measure the extent to which some mixture of indexing and
searching methods can bring to a user an appropriate part of
some collection of items.

There are also several reports upon evaluation procedures
that might be used for analysis of data obtained from indexing-
search experiments. For example, Gerald Salton has a computer
code that permits the analyst to measure indexing-searching
effectiveness for a body of items stored digitally, where index¬
ing strategies and search strategies and criteria of effective¬
ness are all described by parameters under input control by
the analyst. Of course, the analyst is limited to strategies
and criteria included within the broad class parameterized by
Salton, or else he must extend the computer code. As another
example, an Arthur Andersen & Company computer code has
been written to enable an analyst to measure effectiveness of
retrieval schemes that utilize key terms, when certain para¬
meters are given that describe the distribution of useless and
useful key terms within the items of the collection.
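
Most such effectiveness criteria reduce to some form of the
precision-recall pair. A minimal sketch of that pair (not Salton's
actual code, whose parameterization is far richer) is:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# A search returning four items, two of which are among the
# three truly relevant documents:
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d7"})
```

Comparative experiments would compute this pair for each competing
indexing-searching system over the same shared collection of items.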

Very briefly, the greatest need in indexing-searching
experimentation is for a methodology and a collection of items that
can be used by many research groups interested in evaluative
comparisons of alternative indexing-searching systems. Intrex
could develop a group of users and a collection of items to be
employed in comparative evaluation experiments. It is
assumed that such experiments would, of course, be conducted
within Intrex but also by groups outside Intrex.

The experimental design problem is the usual one: a control
group would be helpful. For example, the users in an exper¬
iment might be students in two comparable sections of the
same course. The attempt would be to compare relative task
success, using two alternative index-search systems. I hasten
to note that simultaneous experimentation would be conducted
with the subjects, with other factors varied, to avoid the im¬
possible task of evaluating all alternatives factorially.

None of these remarks is intended to imply that experimental
evaluation of indexing-searching systems is more important
than the invention of improved systems (now more desirable
than theoretical evaluations when these are possible).

It is believed that the most critically important un-met need
at present is for improved indexing-searching systems. Or,
equivalently, hardware progress currently outpaces software
progress because of the unsatisfactory character of available
indexing-searching systems.

DISSEMINATION

Information is prepared and distributed in many different
ways. The recipients utilize many means of selecting from
among all possible information sources; and most of these
methods of selection are highly dependent upon existing tech¬
nology. The selection of periodicals for personal subscrip¬
tion, the quick turning to favored sections with recurrent
serials, the choice of radio station or church or newspaper,
the contingent reinforcement of associates and employees
and friends for information transmitted, and many other
selective acts characterize the attempt by every individual
to get information that is valuable — but to avoid information
that is valueless. All these considerations are central to the
design of any information transfer system, and all together
constitute the "dissemination" problem.

One class of dissemination systems attempts this selection
by developing a characterization for potential recipients. In
the scheme developed by Luhn, as with many others that
depend upon a set of key terms to characterize each potential
recipient, the quality of the selection is dependent upon both
the ability of the system designer to create a good set of key
terms and the ability of the user to select a subset that prop¬
erly describes him for this purpose. The principle is closely
akin to that of subject indexing in the library field, and suffers
from the same difficulties. Paramount among these is the
difficulty of creating a set of key terms that is at all adequate
among a wide variety of users over a period of changing times.
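
The key-term principle itself takes only a few lines to state; the
overlap score and threshold below are arbitrary choices of mine,
not Luhn's.

```python
def overlap_score(item_terms, profile_terms):
    """Fraction of a user's profile terms that appear in the item."""
    profile = set(profile_terms)
    return len(set(item_terms) & profile) / len(profile) if profile else 0.0

def disseminate(items, profiles, threshold=0.5):
    """Route each item to every user whose profile overlap meets
    the threshold."""
    return {user: [name for name, terms in items.items()
                   if overlap_score(terms, prof) >= threshold]
            for user, prof in profiles.items()}

items = {"paper1": {"hearing", "binaural", "olivary"},
         "paper2": {"queueing", "circulation"}}
profiles = {"physiologist": {"binaural", "hearing"},
            "librarian": {"circulation", "queueing", "indexing"}}
routed = disseminate(items, profiles)
```

The difficulties cited in the text enter exactly here: everything
depends on the quality of the term vocabulary and of each user's
chosen subset.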

A really good dissemination system would make it unimportant
for a user to browse, or to search, or even to store much
information. The system would bring the information to the
user when he should have it, and would do this by maintain¬
ing a currently reliable characterization of the user in its
own processor. Most of our information dissemination sys¬
tem is constructed in this spirit, and it is for this reason
that indexing-searching systems are relatively unimportant.

Our school system is a prime example. The students are
practically forced to receive certain information at appro-
priate stages of their development because the system has
learned that this is desirable. After the student specializes
a bit, as in a choice of medicine over law, this fact is used
by the system to further narrow the range of information
supplied to him, and so it goes.

Our libraries have failed woefully in meeting this selective
principle, which requires very frequent feedback from the
user to the library in a form that systematically alters the
image of the user stored currently in the library's descrip¬
tion of its users. Consequently, users have been forced to
learn to search for the items — a formidable task in the
Library of Congress with its 50 - 100 million items.

Intrex should especially encourage experimentation with
automatic dissemination of information to its users, with
absolutely minimal effort required of the user to advise the
system regarding his satisfaction or dissatisfaction with
each item supplied. For example, items cited by an author
are presumably useful to him — even when he merely criticizes
them — and the system could use such data in building its
image of each user. Looking to the future, after the system
is in an advanced state of development, the portions of an item
quoted by another author could be analyzed to determine which
of his publications should perhaps be called to the attention
of the author he quotes — usually including the item with the
quotation. These trite examples are intended only to indicate
that the system can attempt to develop an image of each of
its users, using only data obtainable without special or addi¬
tional effort by the users.
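
Such an "image" of a user could be kept as a set of term weights
nudged by each implicit feedback signal. The exponential-smoothing
rule below is a deliberately simple sketch of my own, not a system
described in this report.

```python
def update_user_image(image, item_terms, useful, rate=0.2):
    """Move each term's weight toward 1 if the item proved useful
    (e.g., the user cited it), toward 0 if not.
    image: dict mapping term -> weight in [0, 1]."""
    target = 1.0 if useful else 0.0
    for term in item_terms:
        w = image.get(term, 0.0)
        image[term] = w + rate * (target - w)
    return image

image = {}
update_user_image(image, {"olivary", "binaural"}, useful=True)
update_user_image(image, {"binaural"}, useful=True)
# "binaural" now outweighs "olivary": 0.36 against 0.20
```

Because the signals (citations, quotations, borrowings) arise from
the user's ordinary work, the image sharpens without any special
effort on his part, as the text requires.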

A good librarian often brings items to the attention of his
clientele in just the way proposed for the automatic dissemination
system. Friends and colleagues often do the same
for each other. It is proposed simply that Intrex should ex¬
periment with many variations on this theme of automatic
dissemination, and with evaluative experiments that compare
alternative dissemination systems.

Feldman experimented with Luhn's type of key-word selective
dissemination system with a University of California group at
Berkeley. Kochen and Wong conducted some evaluative
experiments at the IBM Research Center with a dissemination
scheme (DICO) based upon similarity of interests among users,
rather than upon profiles expressed by use of key words (where
the initial feedback by the users was a relevancy index value
for each of a sample of items). Flood extended the DICO idea
to use feedback to modify the distributions made to users in
a stochastic, adaptive, sequential information dissemination
system (SASIDS) that is now under test, and for which some
comparisons have been made with the Luhn and Kochen-Wong
systems.

Evaluative comparative experiments with alternative automatic
dissemination systems are difficult or impossible to
conduct successfully unless and until some experimental
users make themselves truly dependent upon them, and accept
them to some proper degree as substitutes for their prior
habits of browsing and scanning. Only a large, continuing
project like Intrex, with computerized controls over informa¬
tion accessibility to the users, provides a setting for experi¬
ments of this type.

To put the difficulty differently, reader habits have developed
over a long period of time in adjustment to existing technology.
There has been a great premium placed in the past on knowing
a great deal — an individual with an encyclopedic knowledge
was invaluable — because facts and opinions were not readily
available. As this ready availability increases, it will be
less important to store such large quantities of information
against the contingency of need in some unpredictable situa¬
tion; and so we all will perhaps store less miscellaneous in¬
formation, and instead develop greatly increased skills and
better tools for using information. Much of the time now
spent in systematically keeping abreast of various special
areas of knowledge can eventually be devoted, instead, to
other purposes, as automatic dissemination improves — very
much as a key executive or commander depends to a consid¬
erable degree upon his staff to provide him with the inform¬
ation he is most apt to need. Experimental users cannot be
expected to depend upon a primitive automatic-dissemination
system to the exclusion of their usual habits, if they cannot
be assured that they will have the advantage of the new system
permanently. Project MAC surely could not have been aided
so greatly in its time-sharing software development by its
users if they had not been confident that the program ideas
would be usable in future, improved time-sharing systems.

The difference between an information system that develops
an understanding of each user so as to better serve his needs,
and one that does not, is the difference between success and
failure. It is the Legislative Reference Service of the Library
of Congress that tailors its service to the needs of individual
members of Congress and their staffs and thus serves that
group, while the library as a whole serves the national library
community primarily and develops procedures and services
pertinent to this need. Intrex should certainly include many
features that relate to the characterization of individual users,
and subsets of users, to be used in producing automatic dis¬
semination of quality comparable to that given a special
librarian who is intimately familiar with his small group of
users. Experimentation by Intrex on dissemination systems
should go very far beyond devices like the use of key-term
profiles, as in the Luhn system.

Merrill M. Flood

APPENDIX Q

EDUCATIONAL FUNCTIONS OF INTREX

INTRODUCTION

The library of a university has an important role in the
educational programs of its various schools and departments.
The effectiveness of this role has often been limited by the
numerous inefficiencies in present library operations. And
it has, as well, been limited by the frequent tendency of faculty
and students, not engaged in research activities, to empha¬
size specific course texts while relegating to a secondary
position the immense stores of information possessed bythe
library.

As Project Intrex moves towards the design of a new library
and information system, we anticipate marked improvement
in the MIT Libraries' ability to cope with problems of storage
and retrieval of an explosively expanding population of docu¬
ments (in several different forms), and with an immense col¬
lection of information. We expect the university library of
1975 to move from today's passive activity to an active role —
even to an on-line information transfer status.

Such improvements and revisions of traditional library
functions must be accompanied by a re-assertion of the library's
role in the educational network. For, viewed in a broad
sense, the entire university is an information transfer system,
with its library providing only selected transfer channels and
services. The faculty, the lecture, the laboratory, and the
seminar represent other channels. And these other channels
are being constantly revised in regard to technique, emphasis,
balance and interaction with each other, sometimes in pro¬
grams as large as or larger than that envisioned for Project
Intrex.

It is clearly a requirement of Intrex to develop experiments
with a view towards demonstrating the power of the new
library to supplement the other (traditional) information
transfer channels of the university. Of equal importance is
the need for Project Intrex to establish new boundaries with
future developments in the other educational channels at MIT
and, further, to explore techniques designed to de-emphasize
the traditional divisions among classroom, laboratory, library,
dormitory and lounge.

Aside from the research-oriented information retrieval
function, the educational role of the university library has
several facets:

The traditional back-up role for the con¬
ventional lecture course, i. e., making
available extra copies of texts and supple¬
mentary reading material;
The provision of instruction in the art of
information retrieval from the university
library store or from external sources;
The fostering of self-search and acquisition
of new knowledge by the individual student
through the library medium.

The Classroom Back-up Role

The back-up role is already being modified as teaching
techniques are altered, and as texts are supplemented by visual
aids, film strips, and audio records. Students are presently
assigned "extra viewing" material, as well as extra reading
material. It is reasonable to expect the university library
to provide greatly expanded back-up functions for future
educational activities on-campus. An Intrex-designed library
system can, for example, be expected to provide back-up
storage and retrieval responsibility of Project MAC-developed
computer programs, as well as those developed independently
by faculty and/or students.

Instruction in Information Retrieval

Instruction in the art of information retrieval at MIT is
presently limited to publication of an occasional library
handbook or guide pamphlet, and to the direct efforts of the
"front counter" librarian and of the reference staff. In the
Intrex-designed library, it is expected that the searching
and retrieval function will, to a major degree, be automated,
computerized, and extremely rapid. And there follows, from
this, a distinct danger that the student will emerge from MIT
after 1975 with less knowledge of the working of the univer¬
sity information storage and retrieval system than does a
member of the class of 1965. (The driver of an automatic-
shift 1965 Ford understands less about basic automobile
mechanics than the owner of a Model T was required to know.)

This is an acceptable state of affairs if the student leaves
MIT for employment in NASA or General Electric. But if
he joins the staff of the Pincher Shoe Company or Nohole
Bakery Inc., he must quickly come to grips with a Model T
informational operation — even in 1975. The dichotomy of
the precision-hardware/defense-space oriented industry and
the civilian-oriented software or low-precision hardware
industry will undoubtedly still be with us in 1975. And the
present gap between the two will have widened even more in
ten years.

Hence, it is imperative that Project Intrex should experi¬
ment with built-in instruction aids designed to teach the
effective utilization of the nation’s information store as
viewed from the entire spectrum of users — in industry, in
Government, and in the university.

Education, Self-Acquired

The role of fostering the independent acquisition of new
knowledge by the individual, through the library medium, is
hardly fulfilled in the university of 1965. The English uni¬
versity concept of "reading for a degree” does not exist on
the typical U.S. campus. At MIT, the aggressive, stimu¬
lating, intellectual input from the faculty, and the pressure
it engenders, tends to suppress the opportunity, if not the
desire, for independent learning by the student in new and
"unassigned” areas — as differentiated from directed informa¬
tion retrieval via the library medium.

One result of this natural response to today's academic
pressures is that the graduate (particularly at the B.S. level)
becomes accustomed to spoon-fed education (sometimes in
teaspoons, sometimes ladles). And he has not experienced
the pleasurable apprenticeship in self-education, under sym¬
pathetic tutelage of faculty and library staff.

Recognition of the fault of the present system is evident in
our national concern about the limited half-life of the
engineer, in the discussion and implementation of programs of
continuing education, and the appearance of numerous papers
with such titles as "Retooling the Mind". While it is expected
that many scientists and engineers will return periodically
to the formal classroom to upgrade their technical knowledge,
it is likely that most will have to "go it alone", using means
available at work and at home. Hopefully, local library
facilities and access to information systems will be at hand.
The rest will be up to the individual's effectiveness in "reading"
the information store. The Intrex-designed library can con¬
tribute in a worth-while manner to this problem if it can sig¬
nificantly increase the student’s aptitude and desire for self-
activated learning — and, just as important, if it can develop
a unique service for the departed graduate to facilitate his
continuing education.

Consideration of the varying degree of fulfillment by the present library system of its educational roles is but a start in determining the optimum educational activity of a Project Intrex-designed system. The development and evaluation of new components, services and programs that emphasize the educational role must be included in the Intrex program. Here are some possible experiments which have been considered during the Intrex Planning Conference.

A Classroom-Supplement Experiment

This experiment is directed towards the design of a better system for information transfer, to serve as a supplement to the lecture in a formal course of instruction. It has a secondary purpose: to provide an improved method for evaluating and correcting differences in subject prerequisites between students in attendance in a given course. An auxiliary benefit of such an experiment will be its usefulness as a tool for measuring the relative effectiveness of different modes of information transfer.

The first experiment suggested is designed to determine the usefulness of an idea-retrieval system, as compared with a document-retrieval system, and as compared with the usual library reference and reserve-book rooms. It is extracted for the most part from Samuel’s PIT-55. The information contained in each of these subsystems should be limited to material required for a single course subject in which considerable use must be made of reference material, but in which the material is not so extensive that it could not be put in machine-readable form.

It is further desirable to pick a course which has multiple-choice prerequisites and which has students with widely varying backgrounds in attendance. (One such course which might fit these specifications is Mechanical Behavior of Materials (2.30), given by Professor Argon with varying prerequisites (2.01 or 6.00J; 3.14 or 3.141 or 3.31).)

Note that this first part of the experiment does not deal with
Computer-Aided Instruction, as usually conceived, but is an
experiment on an information retrieval system which may be
used by the student or not, as he may elect, in preparing for
a course conducted along conventional lines. The student
would be presented with an assignment requiring him to obtain
information from a variety of sources; and he would be provided with access to a console on a system containing all the
needed information.

He could, if he wished, ask questions and get detailed answers; or he could use the system for document retrieval by having pages from documents displayed on the scope. The system would, however, keep records of the actual usage, and these results could be correlated with the student’s general standing and his final grades in the course. These statistics, together with data on students not having access to consoles, would enable one to evaluate the system.

One could also structure the course material so that some supplementary subjects were available on both the idea-retrieval system and on the document-retrieval system, while other supplementary subjects were available only in the usual library reference room. Then one could undertake paired comparisons of the same student’s performance on different subjects available in different forms. And variations on this theme could be developed with continuing experience in conduct of the course.
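The paired comparisons described above lend themselves to a simple nonparametric analysis. The sketch below, in modern notation the report could not have anticipated, uses invented grades and a plain sign test; neither the data nor the choice of test is drawn from the report.

```python
from math import comb

def sign_test(pairs):
    """Two-sided sign test on (with_system, without_system) score pairs.

    Returns (n_effective, n_positive, p_value). Ties are dropped,
    as is conventional for the sign test.
    """
    diffs = [a - b for a, b in pairs if a != b]
    n = len(diffs)
    pos = sum(1 for d in diffs if d > 0)
    # Two-sided binomial tail probability under H0: P(improvement) = 1/2.
    k = min(pos, n - pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n, pos, min(1.0, 2 * tail)

# One student's grades on matched supplementary subjects: first value
# with console access, second with the reference room only (invented).
pairs = [(85, 78), (72, 70), (90, 84), (66, 69), (88, 80), (75, 75)]
n, pos, p = sign_test(pairs)
print(n, pos, round(p, 3))  # → 5 4 0.375
```

With so few matched subjects per student the test has little power, which is why the text suggests accumulating variations on the theme over repeated offerings of the course.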

On the assumption that the system would use a general-purpose computer, it might also provide a variety of text-editing facilities so that the student could use it in the preparation of his homework assignments (which would, however, be handed in in the usual way). The extent to which such services were used would be a measure of their effectiveness. The student could also write simple programs to make any required computations, or he could fall back on his slide rule; but the system would know. One might even go so far as to allow him to use the terminal for the final exam.

Such a system should be made open-ended so that documents requested but not available could be added, and changes could be made in the fact-retrieval and idea-retrieval routines whenever experience showed that they did not answer the student’s questions — at least, not to the student’s satisfaction. One could make the polling of the student’s opinion automatic by requiring an approval or disapproval, or perhaps a graded response, from the student after each reply before the system would respond to the next question or request.

The second purpose of the experiment is related to the problem of varying prerequisites. The teacher of an ongoing subject (i.e., one that uses directly related prerequisite subject material from one or more other courses) is frequently forced to spend the early portion of the term in a lecture-recitation-quiz exchange that is designed to fix the state of mind of his class. This fix is often based on the least common denominator of student background and, to a great extent, it determines the pace and level of the subsequent course structure. Assigned extra reading on an individual basis can expand the common denominator, or the sink-or-swim attitude can be adopted with the assumption of a high, arbitrarily selected common denominator. But even the toughest instructor has second thoughts in viewing consistent negative skewness in quiz results.

The second part of the proposed experiment is, therefore, intended to include a section of programmed evaluation and reinforcement of student prerequisite backgrounds. It relies entirely on Computer-Aided Instruction. The desired essence of the course will be conveyed through the classical lecture route, reinforced by supplemental study via various storage and access media. But the system for evaluation and correction of prerequisites will be library-stored and -serviced. It may be operative only in the early part of the term, or it may be resorted to as new phases of classroom discussion arise, to identify for the individual student the nature and scope of his deficiencies.

In the final analysis, the faculty member cooperating with the library in the classroom-supplement experiment must provide the intellectual content for the programmed instruction on student deficiencies. The library does the rest in servicing and monitoring the “prerequisite equalizer”. And evaluation of the effectiveness of the system will be based on a correlation between the use of the “equalizer” and the performance of the prerequisite-deficient student. The ultimate success of the process will, of course, be related to the expansion in course coverage — achieved with minimum student failures or drop-outs.

A Laboratory-Supplement Experiment

This experiment is intended to demonstrate the potential of the library as an organized retail distributor of intermediate (or, more important, varying) sized packages of educational subject matter. The large-size subject package on campus corresponds to the conventional course, the small package to the single seminar.

The undergraduate at MIT can exercise a varying number of subject electives, depending on the policies of his department, as administered by his faculty advisor. The graduate student has a high order of flexibility in choice of electives. In fact, he faces an entire catalog of useful and desirable courses. But, conventionally, his choices are restricted to specific educational packages, whose sizes reflect the faculty’s concept of a suitable subject entity.

A student cannot take five one-hour lectures on any subject in the MIT catalog. Yet, in certain subjects, this is all he may want — either to sample the subject prior to signing up for the formal course or to add an important note to his general educational background. In short, the student is compelled to cast his lot at the beginning of each term and make do for four months with four or five subject packages selected on the basis of a short abstract included in the general catalog. (We must of course recognize the presence of course listening as a medium for promoting flexible choice, and of pre-term faculty advice on course selection — but basically the situation is as described above.)

An experiment could well be developed to formulate “subject capsules” of varying size and depth for all subjects offered in the MIT catalog (of subjects, as distinguished from the library catalog). And these capsules could be stored and serviced in the Intrex-designed library. Such an experiment would require active administration by the Registrar’s office, and extensive cooperation from the entire MIT faculty. But a more appealing experiment can be directed toward one particular subject which is offered in every department of the schools of science and engineering at MIT. It is the subject of instrumentation and measurement.

The student taking a course in instrumentation is either interested in designing new instruments or in using existing instruments for observation, measurement, and/or control of physical (or chemical) phenomena. (We shall not consider the designer.)

The use-oriented student may be interested in the full package of subject matter offered, say, in 2.651 (Physical Measurement and Analysis) or 2.652 (Experimental Stress Analysis). Or he may be interested simply in learning how to use an electrical resistance-bonded strain gage to build a load cell for measuring the uncrimping force of a 3-mil diameter wool fiber.

In MIT Course 2.30 (Mechanical Behavior of Materials), simple film strips and audio records have been prepared to service the student entering the Mechanics Laboratory for the first time. And without instructor participation, the student can teach himself how to use bonded strain gages, etc. Clearly, this procedure can be expanded to incorporate Computer-Aided Instruction. But, even in its present form, this laboratory instructional aid could achieve a higher order of utility if its content were better advertised and distributed within the academic community. The library is the most appropriate agency to undertake this advertising and distribution function.

The specific experiment we propose is directed towards the development of a library store of instructional media relating to both classical and modern instrumentation in all fields of science and engineering. Books or papers on instrumentation do not (and usually cannot) fulfill this role. What is needed is a suitable collection of conventional film strips, audio records, digitally stored information, and microimage storage to let the student teach himself the essentials of intelligent use of instruments. If the instrument in question is relatively inexpensive, or unusually indestructible, the student may be directed to an MIT location where he may sign up to practice what he has learned. Or he may be directed to an instructor who can transport him rapidly through the critical period of real experience on the actual instrument.

The role of Intrex will include the solicitation and assembly of existing material, and even the sponsorship of instrument-teaching programs to be included in this experimental collection. The experimental library will advertise the presence and content of the programmed instructional material, and store it in readily accessible form. Records of its usage will be maintained and correlated in terms of gross average student-user and instructor reaction, or in terms of paired comparisons on different students with and without access to the programs for particular instruments, and on the same students with and without access to programs for different instruments.

EXPERIMENTAL INSTRUCTION
ON THE ART OF INFORMATION RETRIEVAL

If an Intrex system is to be phased in as an adjunct and extension of the regular MIT library services, then its services should be described as an organic part of the orientation and teaching material furnished by the libraries to students and faculty. A recommendation directed towards this end is to prepare a series of brochures, each with an MIT academic field in mind, starting with the Intrex-selected field. Each brochure would describe the effective bibliographical controls available currently and conventionally — their strengths and weaknesses, their coverages, suggestions as to sequence of use — in the subject field covered.

As a supplement to this, for use of advanced undergraduates and graduates, an experiment is suggested with programmed teaching and teaching machines. These would be set up so as to require of the user a dialogue type of instruction (to simulate man-computer interaction): statements, questions, directions leading to answers, re-instructing back to correct answers where the user has gone in the wrong direction. Such a programmed set of instructions would include visual guides through the bibliographical maze (flow charts showing a search progressing step by step in a recommended sequence), charts, film strips, animated films. But all would be directed toward showing the interdependence of a network of information sources, including the conventional library and its keys and the expanding network of extra-library lines to other sources. As Intrex progresses, communication lines to other information centers would begin to appear on the charts and in the instructions (the brochures would need continuous revision), and the machine elements — the interface equipment to computers and their use — would be introduced.

At all times, the instructional material should be tailored to the qualifications and needs of the user. For many purposes, where questions are uncomplicated, uncomplicated sources of information should be recommended: dictionaries, encyclopedias, textbooks, handbooks, the library’s card catalog. Where deeper penetration of a field is needed, the indexing and abstracting services of the particular discipline and its related fields should be progressively brought into play. For full, specialist needs, the total information-containing network should be recommended, including communication with sources making limited distribution of their information (such as personal communication with authors). An advanced student should know, even with an advanced system, how to get information from “in” groups. A great deal of information today is available only by tapping into a circular type of communication channel among members of a particular disciplinary “in” group.

The needs of specialists — people who either know almost everything about a subject or who need to know everything — are stressed. Ideally, of course, it should be possible for learners at any stage of their work to enter meaningfully into the system at their level and to extract information useful to them. If a textbook is needed, and not a highly specialized monograph, this should be referenced on proper inquiry.

Evaluation of this experimental component of the Intrex system can be conducted in terms of the automatically collected statistics of its use and in the record of user satisfaction. A more direct experiment could incorporate a series of subject “treasure hunts”, with alternate access points available to different students and to different research staff. One hunter would be required to use the conventional library system without retrieval instruction; another, the conventional library system with programmed retrieval instruction; one, the Intrex system without retrieval instruction; and the last hunter would have access to Intrex plus specially programmed retrieval instruction. Evaluation of the results of the searches should consider speed and effectiveness of the different routes. It should also take into account the incidence of useful informational fall-out not coincident with the treasured subject.

Experiments Relating to Self-Search and Acquisition of Knowledge

The first experiment in this series represents a cautious approach to the concept of individual reading for a degree. It involves reading for a subject, and it requires that the undergraduate read any one of his curriculum-prescribed fourth-year subjects in lieu of taking the corresponding formal course. The student would have the advice of a qualified tutor during the reading period, and he would be examined at the end of the year on what he had actually read, rather than on what was covered in the formal course. He would have available the full services of Intrex in searching and reproducing hard copy as he moved into the subject. The system would record his search patterns and keep the tutor informed of his movements. The tutor would attempt to advise more on search strategy and study patterns than on actual subject questions. Whenever possible, student questions relating to subject content would be handled by having tutor and student move to the Intrex system and search together for the answers.

Clearly, subjects that require problem drill could not be effectively introduced into this experiment until some formal, programmed instruction were developed, or suitable incrementally structured workbooks could be prepared. For, throughout the experiment, the emphasis should remain on self-education of the student through the information store.

The evaluation of the experiment would be based on suitably paired comparisons of different students who studied the same subject in the formal course and in the reading exercise. And careful record would be made of the subject content for which the tutor found it necessary to provide interpretation, even after search of the information store was conducted.

A second experiment, representing more of a gamble, would allow a student to read in the Intrex library system across a complete departmental area for his third and fourth undergraduate years. He would have available all Intrex services, would have access to tutors in any of his chosen fields, and would operate under the same rules as apply to a single-subject reader. The one difference would reside in the form of examination to be given. In this second experiment, the evaluation of the student and his reading effectiveness would be through the medium of a general examination, oral and written, as is now given to the Sc.D. candidate. And careful record would be made of cumulative tutor-student contact time to permit an analysis of the economics of the process.

The student would be allowed to attend lectures if he so desired, or even to take one (or at the most two) formal courses per year. As an inducement to enter such an experimental reading program, his tuition fees would be waived and a stipend substituted, as his financial needs dictated.

The third experiment in this area is directed towards the scientist or engineer in industry. It would be based on a trial fellowship program where scientists or engineers, whose last degree goes back five to ten years, return to the campus to read in a given subject or subject area. They would be given the same privileges, access and tutelage accorded the undergraduates in the above two experiments. They would not be permitted to read for a degree; hence tuition levels should be reduced.

Coming from industry (or Government), their motivation and subject specificity would be greater than that of most undergraduates. It would therefore be highly desirable that their responses to the library’s output and their dependence on the tutor be recorded and analyzed in considerable detail. Their feedback into the system should highlight the failure of our present information stores to encourage and facilitate continued education off the campus. Particular attention should be given to their reactions to and requests for programmed instruction and/or instructional aids that project beyond the conventional printed page.

Evaluation of this third experimental program can be undertaken through general examination, with results not released to the employer. The proof of the value of such a system will be in the number of repeats from the same company or industry.

Stanley Backer

APPENDIX R

THEORY, MODELING AND SIMULATION

There were many wealthy manufacturers of steam engines before Helmholtz, Gibbs, Clausius and others developed the firm theoretical base of thermodynamics on which the modern engineering of all heat engines is based. It would be interesting to know whether an engine — even one as advanced as the Corliss, let alone a modern, high-performance turbine — could have been designed without a basis in theory. It seems possible that we might have achieved similar designs, but the amount of experimental effort required would have been enormous. The elapsed time might well have extended beyond the present.

In the engineering of a new system, or even in the design of a useful experiment, there is no substitute for theoretical understanding of the process. Without a mental image of the subject of the experiment and definitions of its parameters, we have no basis for selecting variables to be measured or for determining the features of the needed instruments. Theory will exist, at least implicitly, in the minds of all those who do experiments. If we can formulate explicit theory so that it can be discussed and mutually understood, we may be able to predict the results of a new experiment, to relate differing experiments, and to unify our knowledge of the information transfer system. Modeling of the system is an essential aspect of theory. It may turn out that some of the models will be easier to deal with by simulation than by simpler, deductive logic.

Theories can deal in gross generalities (thermodynamics), or in very detailed ways with small parts of a problem (electron physics). In the most desirable state, but one seldom achieved, the detailed theories concerning the individual parts of a system fit neatly in relation with each other and with a general theory of the whole system. In this happy situation, a theory appropriate to the level of aggregation or generality of an experiment or a problem can be selected, and results from the application of this theory can be used as input to theoretical statements at both higher and lower levels of aggregation.

Theoretical models can often be transferred from one field of investigation to another with only minor changes. We should seek analogies. For example, a part of the theory of a circulating collection has much in common with the theory of reliability and maintainability. What little theory there is in the field of machine translation can probably be transferred to problems in the areas of man-machine conversation and content analysis. Perhaps the broadest theory to apply to the over-all project is the theory of information and communication.

Theories vary widely in the extent to which the theoretical model represents the real subject of effort. Differences between experimental results and theoretical predictions can generally be attributed to a failure on the part of the theorist to model the real system with sufficient care, or to a failure in the experiment to measure real variables most closely corresponding to the theoretical parameters of the model.

To illustrate a way in which theoretical studies may get started in Intrex, let us lay out a theoretical model of the entire process envisioned in the 1975 goal, “an information transfer system including library functions”. We must first decide how much of the actual, real-world process we wish to represent in the model. It seems clear that it will be difficult to describe and deal with telephone calls among friends, corridor conversations, invisible colleges, newspaper reading, and similar informal aspects of the total communication problem. It will probably be important to reflect the influences of informal communication later in some way; but, for a start, let us try to represent the formal system.

Our system will include persons who submit written or machine-recorded messages. We shall call these persons authors or, more generally, sources. It will also include persons who receive the messages; we shall call these persons users. Identified individuals may be both sources and users during the operation of the system.

Linking the sources with the users is a man-machine complex which interacts with both sources and users. Its interaction with sources may be direct, or it may be through one or more intermediaries. Some of the intermediaries may be similar man-machine complexes in other locations.

Recorded messages (in some cases, documents or parts of documents) that are processed through the system will range widely in format and content. Generally we shall identify as a message any record that is normally handled as a unit during processing, and is distributed or disseminated as a unit to a user.

We shall assume that the foreseeable technologies of the
complex will preclude repeated processing of the contents
of most of the messages in depth. A control function based
on less than complete text will be necessary to make the
complex operate effectively. We shall assume that the
sources are unaware of the identities of many of the users
to whom their messages should be sent and that the users
are often unaware of the submission by sources of messages
that are relevant to their work. These features, and the
fact that some of the messages will have enduring value to
future generations of users, suggest that we model the complex as a store-and-forward, broadcast communication
system. Each input message may eventually be desired by
an unknown number of unspecified users.

To specify the model more completely and to make it look more like the real-world system, let us put some structure into the complex (see Fig. 1). First we split the complex into two interrelated parts. There is a message-handling part, which carries out the physical processes of receiving, storing, retrieving, copying for dissemination, and delivering copies of messages to users. There is a control part, which selects messages and sources of messages, orders acquisition of messages, selects and derives the data needed for control of the messages, makes files of control data available to users, and controls all the functions of the message-handling part. The control part of the complex also selects messages for removal from storage and removes the associated control data from the control files. It collects control data that describes the interests of users and is thus able to order selective dissemination of messages by the message-handling part. The control part may also provide users with computational facilities so that users may program searches of the control-data files and may carry out logical analyses and computations on the basis of data from these files and data in some of the messages.
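The two-part division just described can be made concrete in a toy sketch: a message store (the message-handling part) beside a control-data file and user-interest profiles (the control part). Every name below, and the crude keywording rule, is an invention for illustration; the report specifies only the functional division, not any implementation.

```python
from dataclasses import dataclass

@dataclass
class Message:
    msg_id: int
    source: str
    text: str

@dataclass
class ControlRecord:
    """Control data: a compact surrogate for the full message text."""
    msg_id: int
    source: str
    keywords: frozenset

class Complex:
    """Message-handling part (store) plus control part (records, profiles)."""
    def __init__(self):
        self.store = {}       # message-handling part: full texts by id
        self.control = []     # control part: the control-data file
        self.profiles = {}    # control part: user interest profiles

    def submit(self, msg):
        self.store[msg.msg_id] = msg
        # Derive control data from the message itself (here, crude keywording).
        kw = frozenset(w.lower() for w in msg.text.split() if len(w) > 4)
        self.control.append(ControlRecord(msg.msg_id, msg.source, kw))
        # Selective dissemination: forward to users whose profiles match.
        return sorted(u for u, wants in self.profiles.items() if kw & wants)

    def search(self, terms):
        # User-initiated search runs against control data, not full text.
        return [r.msg_id for r in self.control if terms & r.keywords]

cx = Complex()
cx.profiles["user_a"] = {"thermodynamics"}
cx.profiles["user_b"] = {"biology"}
print(cx.submit(Message(1, "gibbs", "Notes toward a thermodynamics of heat engines")))  # → ['user_a']
print(cx.search({"engines"}))  # → [1]
```

The point of the sketch is the one the text makes: both dissemination and search operate on the small control records, never on the stored full text.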

Control data are useful in the system for two reasons. The
total quantity of control data should be considerably smaller
than the total quantity of message data so that processing
can be faster. The control data, if properly chosen, are
more susceptible to logical arrangement and processing than
message text for a variety of purposes. Control data relating
to the messages in the system may be based on any attribute
of the messages to which values may be assigned in such a
way as to permit orderly filing and searching. Some of the
attributes of each message may be evaluated from information associated with or contained in the message. Some may
be obtained from auxiliary sources, such as abstracting
journals or other messages which make reference to the
message.

The purpose of the theoretical model we are building is to
permit a prediction of a measure or measures of success of
the system in providing communication from sources to
users, and a measure or measures of the effort required to
build, maintain and operate the complex. We should like to
be able to compute the chosen measures of effort and performance as functions of parameters that describe the system.
We should prefer parameters that correspond as closely as
possible to variables that might be measured or controlled in
an experimental system.

Measures of effort can usually be reduced to capital-investment and operating-cost figures, and to the elapsed time required to get the complex into operation. Predictions from theoretical models of large systems have not been uniformly successful in the past, but we are on far firmer ground here than we are in attempting to formulate measures of success.

It is easy to say that a measure of value of the complex to the source-user community is developed eventually through evidence of the willingness of this community to support and make use of the complex; but it is a sad, social fact that the development of a consensus of this type generally requires a period of five to twenty years from the date of first introduction of the capability. If it is sufficiently novel, the facility may become obsolete or fail to function from lack of support long before it can be properly appreciated.

TIME DELAY

Assume that a source has prepared and submitted a message that has high potential value to a fraction of the community of users. The time period starting with the submission of the message to the complex and ending with the delivery of a copy of the message to the first user to whom it is relevant is a measure of the speed of the internal processes of the complex. There will naturally be a distribution of time delays for delivery to the entire body of users to whom this message is relevant. The breadth of this distribution in time will be a measure of the usefulness of the control system in selective dissemination and in allowing users to search for relevant messages.

In a properly arranged system model, it should be possible to compute time delays at least probabilistically. There will be a distribution of system processing time delays which will depend on the system work load and on the capacities of the data-processing, man-machine facilities of the complex. As we see in Fig. 2, we can expect that the system delay without selective dissemination might be shorter than with it for a given capacity, because of the extra processing required in selective dissemination. On the other hand, the use of selective dissemination may allow a much larger fraction of the users to whom the message is relevant to receive it at a slightly greater time delay than would the normal user-initiated search process.
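The trade-off that Fig. 2 describes can be felt in a few lines of simulation. Every numerical choice below — the processing times, the 30-day mean search delay, the seed — is invented to make the shape of the argument concrete; none of it comes from the report.

```python
import random

def delivery_delays(n_users, selective, rng):
    """Sample one source-to-user delay (in days) per relevant user."""
    delays = []
    for _ in range(n_users):
        processing = rng.uniform(1, 3)  # internal handling of the message
        if selective:
            # Extra control-part work, after which the copy is pushed out.
            delays.append(processing + rng.uniform(0.5, 1.5))
        else:
            # The user must find the message through his own searches:
            # an exponential waiting time with a 30-day mean.
            delays.append(processing + rng.expovariate(1 / 30))
    return delays

rng = random.Random(1965)
with_sdi = delivery_delays(1000, True, rng)
without = delivery_delays(1000, False, rng)

def served_within(delays, days):
    return sum(d <= days for d in delays) / len(delays)

# Dissemination costs every user a little extra processing, yet serves
# essentially all of them within a week; unaided search serves only the
# lucky few early and strings the rest out over months.
print(served_within(with_sdi, 7), served_within(without, 7))
```

This is exactly the Fig. 2 comparison: a slightly greater minimum delay with selective dissemination, against a far narrower spread of the delivery-time distribution.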

Associated with the time-delay measure of performance is, of course, a real human question. What is the value to the human user of decreasing the time delay? A little thought on this problem shows that there is a wide range of potential relationships between the value of a message to a human user and its delay in delivery. There also seems to be some fundamental limit on the rate at which a human user can accept messages. Striving for a system with time delays in, say, the microsecond or nanosecond range seems unrewarding. Some of the experiments previously suggested deal with the value to the user of time delays in access to a specified message and time delays in access to the system. Experiments in the over-all source-to-user time delay-value relationship might be more significant from a standpoint of general design of the complex.

RELEVANCE

The system under discussion is entirely capable of passing
more messages to each user than he can possibly read, let
alone understand and use effectively. An extremely important
measure of system performance will eventually be found in
the degree to which the complex, in interaction with the user,
can assist him to select those messages that he wishes to
see and to avoid waste of time on all of the rest.

At the present state of the language arts, relevance is a
highly subjective matter. The user may judge a message
relevant or irrelevant to his work in ways that no complex
can now predict or even describe adequately. Furthermore,
the relevance decision is temporary. A message that was
relevant yesterday is no longer of interest. One that was
received three weeks or three years ago and judged irrelevant
suddenly assumes crucial importance, especially in the
highly creative climate of the university. It seems unlikely
that we shall be able to devise a truly satisfactory measure
of performance for the system on the basis of relevance;
but, on the other hand, the problem of the user in dealing
with all the messages in the system by browsing or other
means is so great that some attempt to develop a useful concept
of relevance and to measure the system performance in
its terms certainly seems warranted. Much attention has
been paid to the issue of relevance in the extensive literature
on library operations and on information storage and retrieval.
The general conclusion from reading it is that a
useful concept of relevance will be much more complicated
than most of the workers in the field believe it to be. Additional
theoretical work, tied to well-planned experiments, is
certainly justified by the importance of the problem, though
perhaps not by the probability of success.

EASE OF OPERATION

The system will have a variety of operational services with
different degrees of service to the user (see Fig. 3). It will
have a large number of users of diverse educational backgrounds
and diverse interests. A measure of the usefulness
of the system to users can be found in determining what
fraction of the user population is able and willing to use each
type of service from the system. If, for example, some
features of the system require an extensive knowledge of
computer programming, it will surely be found that these
features are of use to a very small fraction of the user population.
At the other extreme, a telephone reference service,
which demands only that the user know it exists and how to
dial its number, would probably be used by almost the entire
user group, if it gave a useful service under one of the other
performance criteria suggested above. With data on the
education of the user population available, it should be possible
to make theoretical estimates of the fraction of users who
could avail themselves of any well-described system service.
Here, again, close cooperation is clearly called for between
theoretical workers and the Intrex experimenters who deal
with the values of the system to users.
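
Such an estimate might be sketched as follows. The population categories, counts, and per-service prerequisites below are wholly hypothetical; real figures would come from the data on the user population mentioned above.

```python
# Fraction of the user population able to use each service, from
# assumed population data. The categories, counts, and per-service
# prerequisites below are hypothetical, not survey results.

population = {"undergraduate": 3000, "graduate": 2000,
              "faculty": 800, "staff": 200}

# Groups assumed able and willing to use each kind of service.
can_use = {
    "telephone reference": {"undergraduate", "graduate",
                            "faculty", "staff"},
    "console catalog search": {"undergraduate", "graduate", "faculty"},
    "user programming": {"graduate", "faculty"},
}

total = sum(population.values())
fraction = {service: sum(population[g] for g in groups) / total
            for service, groups in can_use.items()}
# fraction["telephone reference"] is 1.0, while the service
# requiring programming knowledge reaches well under half.
```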

Fig. 3 [figure not reproduced; caption fragment: "... OF THE SYSTEM SERVICE"]

USER SERVICE DELAYS

The amounts of time spent by users in securing preliminary
information from the system and then in awaiting the completion
of some system action are measurable experimental
quantities which are probably in some way related to the
eventual value of the system to the community. It is possible
to construct models in which parameters corresponding
closely to these observable quantities may be computed
probabilistically. In the experimental phase of any investigation
of the value of the experimental system to the individual
user, as it relates to these time delays, it will be essential
to plan clearly and keep good records in detail. For example,
the value of a system that makes the user wait an unspecified
and unannounced length of time may well be quite different
from the value of a system giving the same real performance
in time but with provisions for informing the user as to the
delay he may expect. In the case of a conversational-mode
augmented catalog, it will probably be found very important
to know in detail the extent to which the system informs the
user of the protocol and procedures he must use and the
extent to which it can tolerate user errors in spelling and
procedure. It will be difficult to construct useful theory
here without a good deal of additional study and experimentation
on actual man-machine interactions. We should expect
a good deal of information in this area from all the developers
of direct-access information systems and from all the experimental
systems, such as MAC. But unless we are thinking
about such information in terms of theoretical concepts
useful in Intrex, it may not be very useful or applicable.

MODEL PARAMETERS

We should like to identify and describe in the theoretical work:

The sources or authors.
The users.
The messages.
The control data.
The message-handling processes.
The control processes.

Let us look in a little more detail at some of the parameters
that might be used in a system model, bearing in mind that
we should like these parameters to correspond as closely as
possible with observable and controllable real variables,
at the same time being subject to precise definition and
theoretical calculations within the logical relations that comprise
the system model.

SOURCE PARAMETERS

Sources or authors are people who prepare messages which
may be selected for inclusion in the operations of the system.
For modeling purposes, we can compute the load on the
selection and acquisition processes of the system from a
knowledge of the number of sources to be considered, and
from a statistical description of the number of messages we
may expect per source in some time period. For practical
purposes, the expected number of messages per source-year
is probably adequate. We anticipate that the selection and
acquisition process will work to some extent on the basis of
sources and to some extent on the basis of messages, so that
both parameters must be known. If this is not the case, and
selection is made only on the basis of messages, then a
single parameter (which is the product of the two stated
above) is adequate, i. e., the expected number of input
messages in unit time.
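
The computation behind this single parameter is elementary; the figures below are assumed for illustration only.

```python
# Expected input load from the two source parameters:
# (number of sources) x (expected messages per source-year).
# The figures are assumed for illustration.

n_sources = 50_000          # number of sources under consideration
msgs_per_source_year = 2.5  # expected messages per source per year

expected_msgs_per_year = n_sources * msgs_per_source_year  # 125000.0
expected_msgs_per_day = expected_msgs_per_year / 365       # about 342
```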

USER PARAMETERS

At our present state of knowledge, it will be extremely
difficult to forecast user behavior adequately. We may expect
such behavior to change in unforeseeable ways in interaction
with the system. We shall therefore start by describing
the user community in terms of the number of users, the
number of accesses we may expect from each user to the
control part of the complex in unit time, and the number of
full messages desired by each user in unit time. Additionally,
we may need a parameter to describe the expected on-line
time per user access to the control part.

Beyond these parameters, if we wish to approach the thorny
problems of the value of the system to the source-user
community, we shall need some relationships such as the
expected loss in value of a message as the total time-delay
between source and user increases, the expected loss in
value of an interaction with the control part of the complex
as the elapsed time between user initiation and completion
increases, and the expected loss in value of a specific
message as the delay time between user request for the
message and delivery to him increases.
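
One plausible form for such a value-delay relationship is exponential decay. Both the functional form and the half-life below are assumptions to be fitted from experiment, not established properties of users.

```python
# An assumed form for the value-delay relationships: exponential
# decay of message value with total source-to-user delay. The
# functional form and the half-life are assumptions to be fitted
# from experiment.

def message_value(initial_value, delay_days, half_life_days=30.0):
    """Value remaining after a given delivery delay."""
    return initial_value * 0.5 ** (delay_days / half_life_days)

# message_value(1.0, 0) is 1.0; at one half-life (30 days) the
# value falls to 0.5, and at 90 days to 0.125.
```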

Next, since we may wish to make some comparisons among
differing kinds of user consoles and differing formats for
data and message delivery to users, we shall need some
parameters such as the relationship between value to the
user and scale factor of reduction for a message delivered
to specified color, contrast, and resolution standards but at
reduced size. Complementing this parametric relationship,
we shall need a relationship between user value and quality
of presentation at a given scale factor -- say, full size.

Finally, since we may wish to relate the value of the system
to users to the complexity of operation, we shall need a
parameter for each of the kinds of user operation included
in the system that relates the operation to the expected
number of hours of user instruction and training needed to
permit the user full use of the operation.

MESSAGE PARAMETERS

To describe and relate the processes of the message-handling
part of the system complex, we shall need to describe the
messages in terms of parameters that include the class of
message formats at the input to the complex, the statistical
distribution of lengths of the messages in each class of format,
the number of messages of each class in storage at some
time, the number (derived through application of operating
policy to the source parameters listed above) of messages
input to the complex per unit time, and the amounts and
formats of control data received with each message.
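
A sketch of how these parameters yield a storage estimate follows. The format classes, counts, and mean lengths are invented for illustration; in a real model, mean length would be replaced by the full length distribution for each class.

```python
# Storage estimate from the message parameters. The format
# classes, counts, and mean lengths are invented for illustration.

classes = {
    # format class: (messages in storage, mean length in characters)
    "journal article": (200_000, 40_000),
    "technical report": (50_000, 120_000),
    "catalog record": (1_000_000, 500),
}

total_characters = sum(count * mean_length
                       for count, mean_length in classes.values())
# total_characters is 14_500_000_000 for these assumed figures.
```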

CONTROL-DATA PARAMETERS

Here we shall need to know the amounts and formats of control
data required to operate the complex in accord with
selected policy and procedure; and we shall need to flow-chart
and describe quantitatively the control processing needed to
derive such data and put them in the formats required by the
design of the control part of the complex for user access,
selective dissemination, etc.

MESSAGE-PROCESS PARAMETERS

Messages selected for inclusion by the system can be
described by the message parameters cited above. The
processing loads, both input and output, can be derived
from the source and user parameters described above. The
message-process parameters include the choice of formats
for storage in the complex, and the choices of format conversions
to get the input messages into proper form for
filing and to get the output messages into the desired form
for use by the users. When these parameters have been
chosen, it will be possible to compute the required capacities
of the format conversion equipments, the required file sizes,
and the message time delays to be expected under differing
system loads. It is not expected here that there will be any
single class of formats in any part of the process, but rather
that the message-handling system may deal with a range of
formats from digitally stored and electronically processed
computer data, at one extreme, to rare manuscripts kept
under glass, at the other. A theory that attempts to describe
the real situation, then, will describe the message handling
in terms of all the input, filing, retrieval, conversion and
output processes selected for the real system, with the additional
complication that conversions among several of the
file formats may be required to serve differing users adequately.

CONTROL-PROCESS PARAMETERS

The control process is the operational expression of the
policy and procedure decisions of the system management.
As such, it must contain sufficient muscle to express and
administer the policies and procedures with respect to both
sources and users. It must also contain sufficient sensitivity
to sources and users to allow management to get information
on the success or failure of particular items of policy or
procedure. For more detailed consideration, we shall assume
that the over-all control process is described in such a way
that it may conveniently be separated into a selection and
acquisition process, an internal process, and a user service
process. The internal process is largely aimed at joining
the other two processes effectively and providing the interface
for interaction with the system managers.

SELECTION AND ACQUISITION

Selection and acquisition requires that the system contain
knowledge of the sources and the intermediaries that are
most likely to provide messages desired by the system users,
now or in the future. The intermediaries include publishers,
other libraries, and eventually national and international
networks. To model this aspect of the control process, we
need parameters that include the numbers and classes of
sources and intermediaries, together with some way of
describing the classes of messages that are likely to be
available from each. We need parameters that describe
the management policy as to expenditures for acquisition of
messages in relation to some message-classification scheme.
We may wish to make use of parameters available from the
description of the user-service process, as indications of
the current status and trends in the desires of users for
messages in various classes. When these parameters have
been selected and estimated in accord with a given policy,
we should be able to compute the file sizes and amounts of
information processing needed for selection and acquisition
of the messages for the system. We should, of course, in¬
clude an allowance for selection of messages that are put
directly into the system by local users, and that may have
enduring value to other users, locally and in related systems.
This load may be small relative to the load from external
sources, but it may be large enough that it cannot be ignored.
We must also remember to provide capacity for the normal
business processes related to acquisition.

INTERNAL PROCESS

Internal process includes selecting, editing, deriving the
needed control data for each selected message, putting these
data into proper system formats for the system files,
selecting and ordering any conversion of format for the
message, decision on the removal of any message from file
to file or from the system altogether, together with suitable
alteration of the control data files, decision on the time-
availability and format of delivery of messages to users,
decision on the amount of machine file space and the amount
and kind of computational capacity to be made available to
each user, and collection of data and summaries required
for system management. When the volume of control data
needed for the implementation of a particular system
design can be estimated, and when the availability of the
selected data from external sources is better known, it
should be possible to describe these internal processes by
flow-charting and to estimate the sizes of files and processing
capacities needed to maintain specified over-all delay
times of this part of the process under specified loads. Here,
again, the process can be multiple and diverse, ranging at
one extreme from filing a set of catalog cards received with
a new book, to machine analysis of the complete text for the
derivation of key words and syntactically analyzed subject
entries at the other. Highly recommended is a continuing
effort to describe the process in terms of a model that will
permit extrapolation to other processes and other sizes and
loads.

USER SERVICE

This process can be further detailed into: selective dissemination,
control-file search, delivery of specified messages,
and user computation.

Selective dissemination demands that the control part of the
system gather and retain descriptive files of control data to
describe classes of messages that may be of interest to
users. Most efforts in this direction, to date, have dealt
primarily with subject-classification data, but there is no
reason other control data should not be used. Other kinds of
control data that show potential include author names,
citations of the work of the user, citations of selected
messages, project names, etc. The control part of the
system can gather the desired data by asking users to fill in
a questionnaire, by interviews, or perhaps more effectively
by monitoring the user interactions with the system in
terms of one or more of the types of control data suggested
above. If over-all system value to the source-user community
depends on total delay times, as we have suggested above,
selective dissemination can prove to be an extremely important
part of user service. Appropriate parameters describing the
file sizes needed and the processing capacities required to
handle specified kinds of users' interactions and system
processes can be selected, once the general policy on
selective dissemination is expressed. With these parameters
and with given system loads and capacities, it should be
possible to compute the system time delays for selective
dissemination. The effectiveness in terms of relevance of
the messages to users is also extremely important. It
should be possible to find parameters to describe the performance
of the system in this regard in relation to the types
and volumes of control data processed in selecting messages
for dissemination, provided there is an adequate mechanism
for collecting information from users on their appreciation
of the relevance of the materials sent out. In the proposed
model system, it may also be possible to infer user opinions
by collecting data on their uses of other features of the
system (e.g., their searches through the catalog).
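
The matching step of selective dissemination can be sketched minimally as follows. The profiles, subject terms, and one-term threshold are invented for illustration; a real system would weigh the several types of control data far more carefully.

```python
# A minimal sketch of the matching step in selective dissemination:
# incoming messages are compared against stored user-interest
# profiles built from control data. Profiles, terms, and the
# threshold rule are invented for illustration.

profiles = {
    "user_a": {"plasma physics", "spectroscopy"},
    "user_b": {"information retrieval", "cataloging"},
}

def disseminate(message_terms, profiles, threshold=1):
    """Users whose profile shares at least `threshold` terms
    with the message's control data."""
    return sorted(user for user, interests in profiles.items()
                  if len(interests & message_terms) >= threshold)

# disseminate({"spectroscopy", "instrumentation"}, profiles)
# returns ["user_a"].
```

Monitoring which disseminated messages the user actually requests would supply the relevance feedback discussed above.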

When the experimental system provides users with means
for interrogating and working with the control files on a
flexible basis, it should be possible to relate some theory
of this process to the experiment. As we have seen above,
the operation of the system is apt to be highly sensitive to
small details of the man-machine interaction; and it is
difficult to find parameters that will permit us an easy
description of this process without detailed statements of the
exact hardware, software, and environmental configuration.
We have available, from other parts of the theoretical model,
parameters such as the total sizes of the files of control
data, and we know that users cannot spend more than their
full working time in the process of searching these files.
But, before any reasonable predictions of machine capacities
and search times can be made, we need additional effort to
attempt better descriptions of the actual file-search processes
in terms of the sizes and structures of the files and the search
strategies that may be developed by ingenious users. Perhaps,
at the theoretical level, additional general theory on
the structure of machine files — as it relates to foreseeable
or desirable memory technologies and as it relates to syntax
and semantics — is the indicated direction.

Times for delivery of specified messages to users may be
computed quite easily when the other features of the
system (mentioned under control and message handling) have
been selected, and when the geography and delivery means
have been determined. These times include the times for
receiving a clear request from the user, queueing times in
various parts of the process, which may depend on capacity
and the statistical fluctuations in the system loads, times for
format conversion if needed, and times for physical delivery
to the place specified by the user. Any real system should
be describable in terms of these times; then it should be
fairly easy to build a theoretical model to represent the real
system and to permit computation of expected performance
at differing work loads and with differing design choices.
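
The accounting is a simple sum over the component times named above; the values below (in minutes) are assumed for illustration.

```python
# Delivery time for a specified message as a sum of component
# times. The component values (in minutes) are assumed.

components = {
    "receive clear request": 1.0,
    "queueing": 4.0,           # varies with load and capacity
    "format conversion": 2.0,  # zero when no conversion is needed
    "physical delivery": 10.0,
}

total_delivery_minutes = sum(components.values())  # 17.0
```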

Theoretical modeling of the part of the system that provides
for user computation will be difficult for several reasons.
Basically, there is no good theory of the operation of any of
the foreseeably marketed computer systems. In addition, it
will not be possible to describe the user interaction with the
machine in the absence of a great amount of detail, as we
have seen above. The theory of this part of user service is,
at present, much more appropriately the field of projects
such as MAC. It appears that, in the Intrex model, we
should provide some hardware and software to permit this
type of interaction, and then watch to see how important it
becomes. It is entirely possible that, in the development of
this art, the user will be better served by providing for
high-speed transfers of data from Intrex system files to the
temporary files of a large system whose main function is
computation, rather than through an effort to include the
computation function in Intrex.

R. C. Raymond

APPENDIX S
GRAPHIC COMMUNICATIONS FOR THE LIBRARY

On 24 August, I had the opportunity to describe the Xerox
LDX (Long Distance Xerography) equipment at an Intrex Project
meeting (specifications for some existing equipment are
given in Table 1). Discussion during and after the meeting
indicated strong interest in future input media (microform
and bound documents) and in library graphic communication
networks. These areas are therefore discussed below.

TABLE I

Specifications for Transmission Rate, Resolution, and
Transmission Bandwidth for some typical Xerox facsimile
equipment.

Transmission     Transmission Rate (pages per minute)
Bandwidth        100 l/in*    135 l/in    200 l/in
48 kcps             3.2          1.6         0.8
240 kcps           16            8           4

*Resolution in TV lines per inch
MEDIA

The present LDX equipment was designed to scan and print
full-sized images. The scanner input documents cannot be
bound, but may be any width between 5 and 9.5 inches. The
actual scan is, however, always 8.5 inches. The printer
output is always 8.5-inch roll paper which is cut after the
printing process to correspond with the input document
length. Printing is normally accomplished on ordinary paper
stock; however, special stocks, such as offset master
stock, can be used.

MICROFORM

Although transparencies and negative or positive copy can be
handled, the LDX scanning definition (TV-type resolution) of
190 lines per inch does not allow successful transmission
directly from a microform. If microforms are transmitted
by LDX, they must first be enlarged to full size (or at least
one-third to one-half size, depending on resolution requirements).
The enlarged copy is then processed by the LDX
scanner.

Microform scanning appears to be technically feasible and
can probably be provided by manufacturers in the future if
it can be justified from a business point of view. The future
development of acceptable microform graphic communication
equipment for libraries will therefore depend on manufacturers'
understanding of the needs of libraries. The number
of types of microforms complicates the problem of establishing
specifications for these devices and makes difficult the
economic justification of development programs. Each
type of microform requires different mechanical mechanisms
for transporting the microimage. Also, optical units of different
resolving capabilities may be required. Each mechanism
may require a large engineering program and a large
tooling or manufacturing investment if it is to be obtained
economically. The selection of a standard medium would
make the business opportunity much more attractive. Also,
the library and the user would benefit from the availability
of lower-cost equipment and service.

A statement by libraries on the need for microform transmission
including the following would be very helpful to the
equipment manufacturer:
Type of microform — scale, resolution.
The need for microform printing with microform
scanning or with full-scale scanning input.
Printed document resolution required.
Cost targets.
Type of automatic handling of the microform
that is desired.
BOOK SCANNING

A great amount of present library information is found in
bound documents. Graphic transmission of this information
today requires a two-step process. First, the pages must
be copied on a flat-bed copier such as the Xerox 914. This
is accomplished without tying up the communications facilities
and the facsimile equipment. Second, the copy is scanned
with present facsimile equipment. Direct book scanning
would eliminate one step of this operation at the expense of
lower transmission efficiency. If we assume that book scanning
were available at LDX equipment speeds, the manual
turning of pages would reduce the document throughput rate
by 10% to 25% compared to straight-through feeding (i.e., the
facsimile equipment and communication lines are tied up
during the page-turning operation). Equipment for future
systems could be developed with book scanning for, like
microform scanning, book scanning is within the present
state-of-the-art. However, if automatic page turning is required,
book scanning would probably be considered more
difficult and expensive. Also, unattended operation with high
reliability appears less probable than with microform input.
The need for book scanning in the 1970's should be clarified
by the libraries when they are considering the possible increased
use of microform.
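
The stated 10% to 25% penalty translates into effective throughput as follows; the straight-through rate of 3.2 pages per minute is taken from Table I and the arithmetic is ours.

```python
# Effective throughput under the stated 10% to 25% page-turning
# penalty, at an assumed straight-through rate of 3.2 pages/minute
# (the 48-kcps, 100 l/in figure of Table I).

ldx_rate = 3.2  # pages per minute, straight-through feeding
effective_rates = [round(ldx_rate * (1 - penalty), 2)
                   for penalty in (0.10, 0.25)]
# effective_rates is [2.88, 2.4] pages per minute.
```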

NETWORK CONSIDERATIONS

A number of libraries and many locations convenient to library
users can be interconnected by a network of communication
channels. If LDX equipments are located at each library
and user site, there are several alternate methods for
providing the required communications network. These
alternative methods can be classified by the type of communication
channel used and the type of switching and control.
Some of the approaches for providing communication channels
for library use are:
For long haul (off-campus)
Telephone company facilities such as Telpak
wideband;
Private microwave installation which can be
time-shared and/or frequency-shared with
other applications such as closed-circuit TV.
For short haul (on-campus)
Telephone wire pairs (special amplifiers and
equalizers are needed if 48kcps or 240kcps
are required);
Multiple-use coaxial cable. Time-sharing
or frequency-sharing of this type of communication
channel can probably provide
very economical campus communication.
A study of the specification of communication
equipment indicates that "off-the-shelf"
equipment is available that can insert several
LDX channels in one TV channel for
time-sharing with closed-circuit television.
Also, frequency-sharing is available with
equipment that can insert LDX channels in
the spectrum below the frequency normally
used for coaxial-cable television. To date,
LDX transmission has not, however, been
attempted over coaxial cable when it is time-
or frequency-shared with television.
Switching also can be accomplished in several ways:
Patch panel. This has the lowest equipment cost.
The efficiency (network and equipment utilization)
of this approach is low for large networks that
require frequent switching.
Dial-up (telephone-type) switch. This is similar
to the switching in Xerox's own LDX network.
Dial-up could be subscriber- or operator-controlled.
Frequency Switching. Switching is accomplished by
tuning receivers (printers) to a given frequency.
The transmitter (scanner) then selects the frequency
spectrum to be transmitted, based upon the desired
destination. (Our home TV receiver, in effect,
performs frequency-switching by selecting the
desired transmitted frequency.) For facsimile,
however, switching control (selection of the
frequency) is more desirable at the transmitting
station. A very-wide-band communication channel,
such as microwave or coaxial cable, is
required.
Computer-controlled switching. The switching
task for facsimile could be controlled by a time-
shared computing system such as used in MIT's
Project MAC. Because of the high data rates of
facsimile, it appears desirable today to accomplish
the actual switching with a mechanical
cross-bar as is used in many telephone dial-up
switches, but to control the switch by the computer.
The computer then "dials up" the switch
via an AT&T 801 DATA PHONE which converts
computer words to dial pulses.
Both switching methods one and two above are being used
today in LDX systems.

The selection of the best and most economic communication
networks depends on many factors. Some of these factors
are:
Facsimile communication needs — traffic, volume,
urgency, speed.
Other communication facilities and future needs —
TV, computer-to-computer I/O, computer-to-
computer, telephone, etc.
Geography or configuration of potential users
and libraries.
Physical plant — tunnels for stringing cable or
tall buildings for installing microwave stations.
Costs.
The combination of a multiple-use coaxial cable and computer-controlled
switching appears to be an economical approach
for future library graphic communication systems. The
economics will be especially attractive when parts of the
system, such as modulation equipment and computer programs,
are provided as research projects by graduate
students. The best system for a given university can be determined,
however, only after analysis of the university's
needs and situation, and after formulation of a communications
system plan for the university.

Paul L. Brobst

APPENDIX T

MORE ON THE EXPERT FILTER

Our emphasis on "access" in Intrex experiments skirts the
important function of selecting prime library content. And
our concern with the implications of computer techniques,
in handling the information store, tends to neglect the continuing
requirement for the intellectual effort necessary for
optimizing library input.

Heart's paper on the "expert filter" considers one phase of
this general problem. This memo adds a few suggestions for
overcoming the difficulties of attracting the subject-matter
expert to library-oriented activities.

We assume the subject expert is a necessary member of any
successful library organization. But he cannot be a full-time
member, except for short periods, else he will downgrade
his expertise in his field of specialization. He must continue
to contribute directly to that field. A second assumption is
that casual assignment of faculty members to library advisory
committees cannot provide the intensity and the continuity
of intellectual effort needed to assist the librarian in exercising
increasing selectivity in future acquisitions.

Between these two extremes, one can formulate a number of
schemes that can be applied experimentally in Intrex. They
all assume adequate payment for library-oriented services
of the subject expert. But payment need not be made in cash.

A first-line expert in a given field is rarely available at honorarium
rates. His need is for time, not money. And so we
suggest extension of special library services to the "expert
library contributor" — services that will more than make up
for the time he devotes to the expert filtering process.
A highly attractive area of service relates to assistance in
the organization of personal collections.

We suggest that spill-over from the technological effort directed
towards large information stores can be put to a very
useful application in organizing smaller collections. As
J. Dyal has remarked, every professor's office becomes a
library, but secretaries are neither librarians nor information
specialists. In short, we recommend exploitation of
the bartering power of time and units of expertise as a means
of attracting the "expert filter". And, in the future, the
"small collection" service may so gain in attractiveness as
to become a highly negotiable item for developing general
faculty responsibility towards the library.

An alternate route for developing faculty recommendations
on library acquisitions would be the provision of a fringe
benefit: a book-purchasing credit of $150 to $250 to each
professor. And, as he ordered books for his personal technical
library, copies of the titles would be automatically
funneled to the acquisitions librarian. The system would
also have to record titles purchased from personal funds
(beyond the basic credit) and titles received for review, etc.,
on a complimentary basis. The latter category would, of
course, have to be double-checked by the librarian.

The final idea for developing expert library input is to establish
a requirement, through appropriate faculty committees,
that graded bibliographies be developed yearly for every
advanced subject offered at MIT. This intellectual effort is
common enough on the academic scene. It simply remains
to formalize it and to channel its results into the information
store. Few professors would care to admit the absence of
such bibliographies in the presentation of their courses.

Stanley Backer
