p16 Anthes

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 2

news

Technology | doi:10.1145/1735223.1735231 Gary Anthes

Happy Birthday,
RDBMS!
The relational model of data management, which dates to 1970, still
dominates today and influences new paradigms as the field evolves.

F
orty years ago this June, an ar- agement System (IDMS) from the com-
ticle appeared in these pages pany that would become Cullinet.
that would shape the long- Contentious debates raged over the
term direction of information models in the CS community through
technology like few other ideas much of the 1970s, with relational en-
in computer science. The opening sen- thusiasts arrayed against CODASYL ad-
tence of the article, “A Relational Model vocates while IMS users coasted along
of Data for Large Shared Data Banks,” on waves of legacy software.
summed it up in a way as simple and el- As brilliant and elegant as the re-
egant as the model itself: “Future users lational model was, it might have re-
of large data banks must be protected mained confined to computer science
from having to know how the data is or- curricula if it wasn’t for three projects
ganized in the machine,” wrote Edgar F. aimed at real-world implementation of
Codd, a researcher at IBM. the relational database management
And protect them it did. Program- system (RDBMS). In the mid-1970s,
mers and users at the time dealt IBM’s System R project and the Univer-
mostly with crude homegrown data- sity of California at Berkeley’s Ingres
base systems or commercial products project set out to translate the rela-
like IBM’s Information Management tional concepts into workable, main-
System (IMS), which was based on a tainable, and efficient computer code.
low-level, hierarchical data model. “People were stunned to learn that Support for multiple users, locking,
“These databases were very rigid, and complex, page-long [IMS] queries could logging, error-recovery, and more were
they were hard to understand,” recalls be done in a few lines of a relational developed.
Ronald Fagin, a Codd protégé and now language,” says Raghu Ramakrishnan, System R went after the lucrative
a computer scientist at IBM Almaden chief scientist for audience and cloud mainframe market with what would be-
Research Center. The hierarchical computing at Yahoo! come DB2. In particular, System R pro-
“trees” in IMS were brittle. Adding a Codd’s model came to dominate a duced the Structured Query Language
single data element, a common occur- multibillion-dollar database market, (SQL), which became the de facto stan-
rence, or even tuning changes, could but it was hardly an overnight suc- dard language for relational databases.
involve major reprogramming. In ad- cess. The model was just too simple Meanwhile Ingres was aimed at UNIX
dition, the programming language to work, some said. And even if it did machines and Digital Equipment Corp.
used with IMS was a low-level language work, it would never run as efficiently (DEC) minicomputers.
akin to an assembler. as a finely tuned IMS program, others Then, in 1979, another watershed pa-
But Codd’s relational model stored said. And although Codd’s relational per appeared. “Access Path Selection in
data by rows and columns in simple concepts were simple and elegant, his a Relational Database Management Sys-
tables, which were accessed via a high- mathematically rigorous languages, tem,” by IBM System R researcher Patri-
level data manipulation language relational calculus and relational alge- cia Selinger and coauthors, described
(DML). The model raised the level of bra, could be intimidating. an algorithm by which a relational
abstraction so that users specified what In 1969, an ad hoc consortium called system, presented with a user query,
they wanted, but not how to get it. And CODASYL proposed a hierarchical da- could pick the best path to a solution
when their needs changed, reprogram- tabase model built on the concepts be- from multiple alternatives. It did that
ming was usually unnecessary. It was hind IMS. CODASYL claimed that its ap- by considering the total cost of the vari-
Photogra ph by c ha d nack ers

similar to the transition 10 years earlier proach was more flexible than IMS, but ous approaches in terms of CPU time,
from assembler languages to Fortran it still required programmers to keep required disk space, and more.
and COBOL, which also raised the level track of far more details than the rela- “Selinger’s paper was really the piece
of abstraction so that programmers no tional model did. It became the basis of work that made relational database
longer had to know and keep track of for a number of commercial products, systems possible,” says David DeWitt,
details like memory addresses. including the Integrated Database Man- director of Microsoft’s Jim Gray Systems

16 communications of th e ac m | M ay 2 0 1 0 | vo l . 5 3 | n o. 5
news

Laboratory at the University of Wiscon- 1980s in the form of object-oriented da- cloud, where service vendors could
sin-Madison. “It was a complete home tabases (OODBs), but they never caught host the databases of many clients.
run.” The paper led to her election to on. There weren’t that many applica- “So ‘scale’ now becomes not just data
the National Academy of Engineering in tions for which an OODB was the best volume, it’s the number of clients, the
1999, won her a slew of awards (includ- choice, and it turned out to be easier variety of applications, the number of
ing the SIGMOD Edgar F. Codd Innova- to add the key features of OODBs to locations and so on,” he says. “Man-
tions Award in 2002), and remains the the relational model than to start from ageability is one of the key challenges
seminal work on query optimization in scratch with a new paradigm. in this space.”
relational systems. More recently, some have suggested
Propelled by Selinger’s new ideas, that the MapReduce software frame-
Further Reading
System R, Ingres, and their commercial work, patented by Google this year, will
progeny proved that relational systems supplant the relational model for very Codd, E.F.
A relational model of data for large shared
could provide excellent performance. large distributed data sets. [See “More
data banks. Comm. of the ACM 13, 6, June 1970.
IBM’s DB2 edged out IMS and IDMS on Debate, Please!” by Moshe Y. Vardi on p.
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D.,
mainframes, while Ingres and its deriv- 5 of the January 2010 issue of Communi-
Lorie, R.A., Price, T.G.
atives had the rapidly growing DEC Vax cations.] Clearly, each approach has its Access path selection in a relational
market to themselves. Soon, the data- advantages, and the jury is still out. database management system. Proceedings
base wars were largely over. As RDBMSs continues to evolve, of the 1979 ACM SIGMOD International
scientists are exploring new roads of Conference on Management of Data, 1979.
Faster Queries inquiry. Fagin’s key research right now Ullman, J.D.
During the 1980s, DeWitt found anoth- is the integration of heterogeneous Principles of Database and Knowledge-Base
Systems: Volume II: The New Technologies,
er way to speed up queries against rela- data. “A special case that is still really
W.H. Freeman & Co., New York, NY, 1990.
tional databases. His Gamma Database hard is schema mapping—converting
Hellerstein, J. M. and Stonebraker, M.
Machine Project showed it was possible data from one format to another,” he
Readings in Database Systems (4th ed.), MIT
to achieve nearly linear speed-ups by says. “It sounds straightforward, but Press, Cambridge, MA, 2005.
using the multiple CPUs and disks in a it’s very subtle.” DeWitt is interested
Agrawal, R., et al
cluster of commodity minicomputers. in how researchers will approach the The Claremont Report on Database
His pioneering ideas about data par- “unsolved problem” of querying geo- Research, University of California at
titioning and parallel query execution graphically distributed databases, Berkeley, Sept. 2008. http://db.cs.berkeley.
found their way into nearly all commer- especially when the databases are cre- edu/claremont/claremontreport08.pdf
cial parallel database systems. ated by different organizations and
“If the database community had not are almost but not quite alike. And Gary Anthes is a technology writer and editor based in
Arlington, VA.
switched from CODASYL to relational, Ramakrishnan of Yahoo! is investigat-
the whole parallel database industry ing how to maintain databases in the © 2010 ACM 0001-0782/10/0500 $10.00
would not have been possible,” DeWitt
says. The declarative, not imperative,
programming model of SQL greatly fa-
cilitated his work, he says.
The simplicity of the relational
Looking Ahead With
model held obvious advantages for us-
ers, but it had a more subtle benefit as
Michael Stonebraker
well, IBM’s Fagin says. “For theorists An adjunct professor at Massachusetts Institute of Technology, Michael Stonebraker
is renowned as a database architect and a pioneer in several database technologies,
like me, it was much easier to develop such as Ingres, PostgreSQL, and Mariposa (which he has commercial interests in). As
theory for it. And we could find ways to for the database industry’s future direction, Stonebraker says one-third of the market
make the model perform better, and will consist of relational database management systems used in large data warehouses,
ways to build functions into the model. such as corporate repositories of sales information. But the mainstream products in
use today, which store table data row by row, will face competition from new, better-
The relational model made collabora- performing software that stores it column by column. “You can go wildly faster by
tion between theorists and practitio- rotating your thinking 90 degrees,” he says.
ners much easier.” Another third of the market he believes will be in online transaction processing,
Indeed, theorists and practitioners where databases tend to be smaller and transactions simpler. That means databases
can be kept in memory and locking can be avoided by processing transactions one at a
worked together to a remarkable de- time. These “lightweight, main memory” systems, Stonebraker says, can run 50 times
gree, and operational techniques and faster than most online transaction processing systems in use today.
applications flowed from their work. In the final third of the market, there are “a bunch of ideas,” depending on the type
of application, he says. One is streaming, where large streams of data flow through
Their collaboration resulted in, for ex- queries without going to disk. Another type of nonrelational technology will store
ample, the concept of locking, by which semistructured data, such as XML and RDF. And a third approach, based on arrays
simultaneous users were prevented rather than tables, will be best for doing data clustering and complex analyses of very
from updating a record simultaneously. large data sets.
Finally, “if you don’t care about performance,” says Stonebraker, “there are a bunch
The hegemony of the relational of mature, open-source, one-size-fits-all DBMSs.”
model has not gone without challenge.
For example, a rival appeared in the late

m ay 2 0 1 0 | vo l . 5 3 | n o. 5 | c o m m u n i c at i o n s o f t he acm 17

You might also like