SELECTIVE DISSEMINATION OF INFORMATION (SDI):

STATE OF THE ART IN MAY, 1963


C. B. Hensley
International Business Machines Corporation
Advanced Systems Development Division
Yorktown Heights, N. Y.

INTRODUCTION

Selective Dissemination of Information (SDI) is a new and rapidly developing field. The concept was originally set forth by Hans Peter Luhn in 1958.1-2 As described by Luhn, one part of a larger idea, the business intelligence system, was Selective Dissemination of Information. SDI involves the use of the computer to select from a flow of new documents, those of interest to each of a number of users. This process may be thought of as the inverse of information retrieval. In information retrieval, a user precipitates a search of a file of documents. In SDI a document precipitates the search of a standing file of user interests. SDI has been called "current awareness" since the attempt is to keep the user aware of current developments. This function has been traditional with those few really excellent librarians and executive staff assistants. SDI is a mechanization of this function.

The concept described by Mr. Luhn was first implemented in 1959. At that time in Yorktown Heights, New York, an IBM 650 Data Processing System, together with other card machines, reproduction equipment and human operators, processed a small flow of documents against the interest profiles of some 30-odd users. This system has subsequently been called "SDI 1."3,4,15 In 1960 a second system, SDI 2, evolved from the original one. SDI 2 is the first system designed and documented so that it may be installed remotely.5-6 In 1961, documentation for the SDI 2 System was completed and the first public announcement of a documented SDI System was made on July 11. Implementation started on a third system, SDI 3, early in the year. Although SDI 3 was in partial operation during the last part of 1961, complete debug and documentation was not completed until 1962. During 1961, several other systems, all of which will be designated by their location names, became operational. These included SDI Owego,7 at the IBM Federal Systems Laboratory in Owego, New York; Current Awareness Service, a System at the Technical Information Center at General Electric's Evandale installation,8 and a manual current awareness service available at cost to all United States Citizens at the Office of Technical Services of the United States Department of Commerce in Washington, D. C.

More systems became operational in 1962. The second system tested and documented for remote installation, SDI 3, became available through the IBM Data Processing Library for the IBM 1401 Data Processing System.9 The Poughkeepsie System went into operation near mid-year at the Technical Information Center of the IBM Data Systems Division in Poughkeepsie, New York.10 A fourth system from the Mohansic Group,* SDI 4, has been in partial operation for nine months and documentation

* Mohansic is the IBM ASDD laboratory in Yorktown Heights, N. Y., where SDI 1-4 have been developed and tested.

258 PROCEEDINGS—SPRING JOINT COMPUTER CONFERENCE, 1963

is well under way.11 The Douglas Aircraft Corporation in Santa Monica, California, has a system in an advanced state of debug.12-13 Around the end of 1962, a second 1401 program became operational at IBM, Data Processing Division, Midwest Region, in Chicago, Illinois. Over the past four years, SDI has moved from a concept into a rapidly increasing number of system installations.

Implementation

Implementation difficulties for a system are often underestimated. This is in contrast to reduced operational cost and increased quality of service which are often overestimated. With SDI, the choice is whether (1) to use an available system, (2) modify an available system, or (3) to write your own new one. Because of the uncertainty, implementation cost is hard to estimate. Quality is even more difficult to estimate. Everyone seems to feel he is an expert on quality. There is disagreement in many cases. Present SDI systems involve computer programs, manual procedures and sometimes special equipment. In order of increasing difficulty, implementation may involve the installation of a well-documented, tried and true system which is in operation somewhere else; modification of manual procedures; obtaining special equipment; reprogramming or redesign.

Human skills available; experience of the personnel with SDI, Information Retrieval or related areas; and the number of other systems, procedures or constraints interacting with the new SDI installation all affect the effort required for implementation. Not only are a wide systems background, computer knowledge, and documentation experience valuable, but specialized knowledge with office machinery, industrial engineering, typography as well as psychology, sociology and organization theory often help. Programmers seem to be necessary for any type of installation. The more experience with data processing as contrasted with scientific programming the better, but any programming experience is better than none. Systems and procedures personnel are well-known in most organizations and are certainly advantageous for modifications or rewriting.

Experience with installations of documented SDI systems is limited. It was estimated that three calendar months and a total of three man months effort would be necessary to install an early SDI system.3 Programmers have been used in all cases. The time to get a SDI program through a monitor system or to fit in with other existing operating procedures has been quite variable. In one case the program assumed a particular load routine long in general use, but not in use in this installation.

Experience with combining and modifying existing systems is exemplified by Poughkeepsie. There, despite the fact that the programs had little, if any, documentation, one or two programmers fought through SDI, KWIC* and an IR program in a few months. The manual procedures were in flux for a longer period. The total system is still being modified and only parts are in operation. Owego was a rewrite from SDI 2 which took over a calendar year to get into operation. The program rewrite itself, from start to run, took about three months. Prestart systems work extended longer and, to my knowledge, the system is still rather weakly documented for remote installation18,19 and is being integrated with KWIC.7 What might seem to be a relatively simple rewrite of SDI 2, SDI 3, required one person for a calendar year in a building being noisily rebuilt, although the programming and documentation was done by an experienced programmer who knew SDI and the machine.

The classic problem seems to be an underestimation of the amount of the programming required to rewrite and document. For experienced personnel, e.g. SDI 3, estimates seem to be low by a factor of four. For less experienced (with SDI) personnel perhaps six would be better, e.g., Poughkeepsie. It should be pointed out that certain phases can sometimes be estimated accurately, e.g., programming at Owego.

User Interests

Most user profiles (interests) have been obtained without any problem by blindly mailing a short form to the potential user.† In three tests‡ some 65% of those contacted became users. Mass meetings of potential users have been used as well as blindly mailing longer

* Key Words in Context, a machine prepared printed index.17
† 5, Pages 94-5.
‡ *, Page 41.

forms with either term dictionaries attached, e.g., Owego (modified ASTIA), or enclosing examples*** of indexed document items. Indirect methods have also been used to derive profiles from personnel or project information.*** Only with SDI 1 was a comparative study made and it had too small a sample to be conclusive.*** Each of these methods has been proven feasible. Further research is needed to define situations where one is preferable to another.

Adjustment of user profiles has been done largely at the user's instigation. At Mohansic, blanket mailings of current user profiles with change forms have been made to encourage users to make changes. Users have also been notified that they can make changes. The effectiveness of these measures is subject to doubt. The only known attempt to automatically update or adjust profiles based on user's responses was tried at Mohansic on SDI 1. The results were inconclusive. Manual attempts to suggest or arbitrarily make changes in user profiles based on various hypotheses have been made from time to time, usually without controls.

Although how to get new users to join and give the "best" possible profile seems to be a difficult theoretical problem, in practice there seems to be no difficulty. Experiments with automatic updating are in order but adequate user response histories seem to be necessary.

The number of users serviced by SDI systems now in operation has ranged from tens, to one to two thousands. Experience with larger groups is lacking although no new problems are anticipated. One problem, not initially anticipated, which the increasing number of users has proven to be important, is that of address changes. These occur so frequently that not only must they be considered part of every normal run, but provision is necessary to change addresses [...] notification and hard copy order. As we shall see below (Abstracts and Notifications) this affects the notification itself.

*** Ibid., Page 6.

Documents

Document sources for SDI are usually defined by the application. The range of subject matter on which there is experience is quite wide, including science, engineering and management. There are no known cases of letters, memorandums, or picture annotations being processed although this has been proposed and no problems are anticipated. Document source has been shown to be a significant factor in response.14 Owego uses ASTIA documents predominantly. Poughkeepsie uses internal IBM reports. Surveys of what users read15 or library usage could be used to determine what document sources to use for an SDI system. Most such data indicates a skew distribution of usage with a few highly used journals. It is assumed but not demonstrated that different types of users need different document sources. Experimentation in this area might influence the selection by professional journals of items to abstract. SDI provides a tool in this area through its response. SDI 4 and a revised Chicago system will allow exclusion of documents by source, e.g. need-to-know or excluding journals the user subscribes to. Volumes of document items being processed in SDI systems run from tens to hundreds per day with experience upward lacking. Subscribing to a journal is not much of a problem, but getting on internal distribution lists is more difficult than one might expect. It cost the Mohansic group several man months of effort to locate internal sources of information and arrange to be added to these distribution lists.

Documents normally come to one location, are handled and numbered. Some SDI's integrate with library operations to various degrees. Owego uses the same numbering and hard copy reproduction procedures. Mohansic provides abstract sets and utilizes journals from the library. Douglas is partially integrated. Some work with IR systems, e.g., Evandale, Owego, Washington. Document numbering may be sequential as at Mohansic or by an internal code as at Owego and Poughkeepsie. Checking for duplication and series completeness is a normal library problem.

There seem to be few serious operational problems in this area. Studies are needed to test automatic procedures to analyze user responses and to vary the document source mix to maximize value functions. Little has been done to study the effect of frequency of mailings to the user.
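
The exclusion of documents by source, mentioned in the Documents section for SDI 4 and the revised Chicago system, can be illustrated with a short modern sketch. It is not taken from any of the systems described. The names UserProfile, DocumentItem, eligible and candidates_for are hypothetical, and representing need-to-know as a set of restricted sources a user is cleared for is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical per-user record; the field names are illustrative assumptions."""
    user_id: str
    # Journals the user already subscribes to (no notification needed).
    subscribed_sources: set = field(default_factory=set)
    # Restricted sources for which the user has a need-to-know.
    cleared_sources: set = field(default_factory=set)


@dataclass
class DocumentItem:
    """Hypothetical document item carrying its source."""
    doc_number: str
    source: str
    restricted: bool = False  # True when need-to-know applies to this item


def eligible(user: UserProfile, doc: DocumentItem) -> bool:
    """Return True if the document may be considered for this user at all."""
    # Need-to-know: a restricted item is excluded unless the user is cleared.
    if doc.restricted and doc.source not in user.cleared_sources:
        return False
    # A journal the user already receives is excluded.
    if doc.source in user.subscribed_sources:
        return False
    return True


def candidates_for(user: UserProfile, docs: list) -> list:
    """Keep only the items that survive source exclusion."""
    return [d for d in docs if eligible(user, d)]


user = UserProfile("u1",
                   subscribed_sources={"Journal A"},
                   cleared_sources={"Internal Reports"})
docs = [
    DocumentItem("D1", "Journal A"),
    DocumentItem("D2", "Journal B"),
    DocumentItem("D3", "Internal Reports", restricted=True),
    DocumentItem("D4", "Project X", restricted=True),
]
print([d.doc_number for d in candidates_for(user, docs)])  # ['D2', 'D3']
```

In this sketch the filter runs before any interest matching, so an excluded item never reaches the decision step; whether the actual systems apply exclusion before or after matching is not stated in the text.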

Indexing and Decision

The primary decision method used in SDI has been a probability of interest estimate, i/d, where i is the number of words identical in the two lists and d is the number in the document list.* The words are normally chosen by humans** from text and truncated to adjust for endings. This particular technique came from a programmer's misunderstanding in 1958 of H. P. Luhn's instructions. It proved to work and was therefore kept. The proposition is that probability of interest increases with i/d. The i/d criterion may vary by document (Mohansic) or by user (Owego, Poughkeepsie and Chicago).20 A no-truncation fixed-dictionary system with a thesaurus (Owego, ASTIA thesaurus) has been used with i/d. Experiments have been run at Mohansic, but not as yet reported, on truncation, 4-9 characters;9 and depth of indexing, 1-26 keywords; as well as machine indexing from partial text by several methods. Conventional "Boolean" methods with keywords are used at Evandale. Chicago indexes by machine from the abstract using KWIC methods, i.e. dropping common words. A combination i/d and "Boolean" method with variable truncation is used.

A variety of indexing and decision procedures have been used. There needs to be work to compare the results under varying conditions. Relationships between SDI and IR need to be explored empirically in the indexing and decision area. Are they the same or different; if different, in what ways? More work needs to be done on the desirable amount of direct user control over the decision.

Abstracts and Notifications

The decision is made to notify the user of one or more items. What should the notification consist of? In one of the SDI tests, hard copies were sent directly to the user.† Users preferred a two-stage over a one-stage procedure: receive abstract notifications and be able to order hard copy instead of receiving the hard copy directly without intermediate control.‡ However, their [...] effected by direct documents.*** With this exception, SDI systems have used two-stage procedures sending abstracts to users and allowing them to obtain hard copies, in some cases providing order forms. Abstracts have been compared to titles and the titles seem adequate for deciding which document to order, but abstracts are necessary partially to substitute for the document if the document is not available.14 Quality of abstracts has been discussed in theory. There is no known experimentation. SDI and IR, with appropriate response evaluation, would appear to be excellent vehicles for this exploration.

A number of forms for notifications have been discussed and tried; IBM cards have been the most frequent vehicles. The abstract is normally typed on a reproduction master and reproduced onto cards. Normal card stock works well with offset or stencil but spirit master runs are too short. Chicago and Owego use or will use the IBM 1403 Printer to print on continuous-form abstract cards directly. A machine record of the form number and the user and document would have to be kept so that when the card was returned the response could be mark sensed and the number read. Mark sense requires special pencils which deposit an electrically conductive mark, but new optical machines (e.g., the IBM 1418 Optical Character Reader and IBM 1428 Alphameric Optical Reader) allow standard No. 2 and No. 3 pencil marks. Machine (1403) printing of the form number in place of prepunches is possible with an odd font. These have yet to be tried for SDI but appear cheaper for high volumes.

Single (SDI 1) vs. multiple (all other) card systems have been under debate since 1959. This debate no doubt will continue. The notification should combine (1) the document abstract (preferably both 3 x 5" and IBM card size), (2) the user's address (3 or 4 lines for complete postal address which constantly changes), (3) the system return address, (4) questions regarding the document, (5) provision for the user's remotely made response to the questions, (6) the document number and (7) the user identification. The notifications

* See 6, Page 30; 3, Page 9, 10; *, Page 8, 10.
** Education level doesn't seem to make any difference. See 18.
† See 3; *, Page 6.
‡ See 3; *, Page 10; 15, Pages 8-9.
*** See [...], Table IV.

should be in appropriate sequential order for mailing. If 5, 6 and 7 are not machinable on return, response handling for document hard copy orders and operating statistics must be manual, as in SDI 1. The abstract, 1, should be retainable by the user. A study15 in one organization shows 3 x 5 and IBM card sizes were the most frequently used media for this purpose even prior to SDI. The response, 5, is made at many remote uncontrollable locations. The PORT-A-PUNCH® card has proven to provide a machine readable response. PORT-A-PUNCH is only now (February 15, 1963) becoming available in continuous forms, thus making machine (1403) printing of the abstracts on the PORT-A-PUNCH card or an attached form possible. Previously a bill feed attachment was necessary which slows the printer. Systems remain to be developed and tested based on bill feeds, optical reading and many other devices. When several notices go to each user at once, placing several cards together (or using a sheet of paper as at Evandale) might save handling expense and user exasperation. No existing system meets all of these requirements; each compromises to some extent. Considerable research is necessary before sufficient basic knowledge is obtained as to the relative worth of these various features.

Response, Reports and Hard Copy

SDI 2-4 require the user to respond on every notice. Other systems require responses under certain conditions (SDI 1, no response if negative) or never, i.e., just a notification. It is not known exactly what effects this has.

Responses and other records allow reports to the user, operators, management and research personnel. This is a largely undeveloped area even though some rudimentary reports are included in the SDI 2 and 3 systems.* Feedback reports could be used to assist in updating user profiles, changing the document sources mix, adjusting the system sizes, changing indexing methods, and adjusting the cost vs. value balance. Randomly selected notices (SDI 1-4) allow the system selection performance to be compared to random selection as a base. This also allows miss items (which could have been selected by the system but were not) to be estimated statistically.

There have been various hard copy procedures. (1) Ignore the problem (SDI 2-4). (2) Refer the user to a library (SDI 2-4). (3) Shelve and pull (SDI 1-4). (4) Keep vellum and reproduce (SDI 2-4). (5) Use aperture cards and reproduce (initially at Owego). (6) Use reel microfilm at multiple locations (Poughkeepsie). Adequate analysis of cost and value are yet to be made. Most systems agree with the Mohansic survey,15 users want to be able to obtain hard copy.

Value-Cost

This is, in my opinion, the area with the largest potential for development. Available cost data3 is very limited and hard to interpret. Available value information15 is largely subjective. Dichotomy scales have been used in SDI, i.e., "of interest vs. not of interest." It is my opinion that ordinal, and cardinal scales are needed if we hope to move SDI design from an art towards a science.

REFERENCES

1. "A Business Intelligence System," H. P. LUHN, IBM Journal of R&D, 2, 4, 314-319, October 1958.
2. "Selective Dissemination of New Scientific Information with the Aid of Electronic Processing Equipment," H. P. LUHN, American Documentation, 12, 2, 131-138, April 1961.
3. "Selective Dissemination—Report on a Pilot Study—SDI 1 System," C. B. HENSLEY, T. R. SAVAGE, A. J. SOWARBY, and A. RESNICK, IBM, ASDD, Yorktown Heights, N. Y. Report 17-039, January 1961. (Presented at the 18th meeting of the Operations Research Society of America—1960), 45 pp.
4. "Selective Dissemination of Information—A New Approach to Effective Communication," C. B. HENSLEY, T. R. SAVAGE, A. J. SOWARBY, and A. RESNICK, IRE Transactions on Engineering Management, EM-9, 2, 55-65, June 1962, 11 pp.
5. "Selective Dissemination of Information—SDI 2 System," W. BRANDENBERG, H. C. FALLON, C. B. HENSLEY, T. R. SAVAGE, and A. J. SOWARBY, IBM, ASDD, Yorktown Heights, N. Y. Report 17-031, April 1961, 102 pp.

6. "The Selective Dissemination of Information System—Present Operations and Future Application," A. J. SOWARBY, pp. 14-40, in "Library Seminar—October 19-20, 1960" C. F. Balz, Editor, IBM, FSD, Space Guidance Center, Owego, N. Y., February 28, 1961.
7. "The Merge System of Information Dissemination, Retrieval and Indexing Using the IBM 7090 DPS," R. H. STANWOOD, IBM, FSD, Owego, N. Y., Report 62-825-441, April 1962 (presented at the September 1962 ACM meeting), 10 pp.
8. "New Concepts in Technical Information Services," B. K. DENNIS in Proceedings of The Engineering Information Symposium, 19-23, NYC, January 17, 1962, available at $2.00 from the Engineers Joint Council, 345 East 47th Street, N. Y. 17, New York.
9. "SDI 3 for the IBM 1401 Data Processing System 10.3.004; Selective Dissemination of Information (SDI) for the 1401 Tape System, the 620 Tape System and FORTRAN II," A. J. SOWARBY, W. BRANDENBERG, H. C. FALLON, C. B. HENSLEY, and T. R. SAVAGE (IBM, ASDD, Yorktown Heights, N. Y.), 1401 General Program Library released June 25, 1962, 157 pp.
10. "A Computer Integrated System for Centralized Information Dissemination Storage and Retrieval," R. J. TRITSHLER, IBM, DSD, TIC, Poughkeepsie, N. Y. (Presented at the ASLIB Conference, Blackpool, Lancashire, England, October 4, 1962).
11. A. J. SOWARBY, IBM, ASDD, Yorktown Heights, N. Y., unpublished material.
12. "Mechanized Information Retrieval System for Douglas Aircraft Company, Inc., Status Report," SM-39167, Missile & Space Systems Division, Douglas Aircraft Company, Inc., Santa Monica, California, January 1962.
13. "Library Information Retrieval Program," G. W. KORIAGIN, Missiles and Space Systems Engineering, Douglas Aircraft Company, Inc., Santa Monica, California, Engineering Paper No. 1269, March 1962 (Presented to the American Chemical Society, Washington, D. C.) February 1962, 23 pp.
14. "Relative Effectiveness of Document Titles and Abstracts for Determining Relevance of Documents," A. RESNICK, Science, 134, 3484, 1004-1006, October 6, 1961.
15. "The Use of Diary and Interview Techniques in Evaluating a System for Disseminating Technical Information," A. RESNICK and C. B. HENSLEY, IBM, ASDD, Yorktown Heights, N. Y., Report 17-055, December 1961, 86 pp., scheduled to appear in the April 1963 issue of American Documentation.
16. "Comparative Effect of Different Education Levels on Indexing in a Selective Dissemination System," A. RESNICK, IBM, ASDD, Yorktown Heights, N. Y., Report 17-092, August 1, 1962, 16 pp.
17. "Keyword-in-Context Index for Technical Literature (KWIC Index)," H. P. LUHN, IBM, ASDD, Yorktown Heights, N. Y., Report RC-127, 1959.
18. "Selective Dissemination of Information," an IBM 7090 Program, R. BENJAMIN, S. D. MILLER and E. S. ROWLAND, IBM, FSD, Owego, N. Y. December 20, 1961 (in SHARE Library), 12 pp.
19. R. BENJAMIN, S. D. MILLER and E. SCOFIELD, IBM, FSD, Owego, N. Y., unpublished material.
20. "On Relevance, Probabilistic Indexing and Information Retrieval," M. E. MARON and J. L. KUHNS, Journal of the Association for Computing Machinery 7, 3, 216-44, 1960.
