XMLBible 1 10

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

3236-7 FM.F.

qc 6/30/99 2:59 PM Page iii

XML Bible ™

Elliotte Rusty Harold

IDG Books Worldwide, Inc.


An International Data Group Company
Foster City, CA ✦ Chicago, IL ✦ Indianapolis, IN ✦ New York, NY
3236-7 FM.F.qc 6/30/99 2:59 PM Page iv

XML™ Bible For general information on IDG Books Worldwide’s


Published by books in the U.S., please call our Consumer Customer
IDG Books Worldwide, Inc. Service department at 800-762-2974. For reseller
An International Data Group Company information, including discounts and premium sales,
919 E. Hillsdale Blvd., Suite 400 please call our Reseller Customer Service department
Foster City, CA 94404 at 800-434-3422.
www.idgbooks.com (IDG Books Worldwide Web site) For information on where to purchase IDG Books
Copyright © 1999 IDG Books Worldwide, Inc. All rights Worldwide’s books outside the U.S., please contact our
reserved. No part of this book, including interior International Sales department at 317-596-5530 or fax
design, cover design, and icons, may be reproduced or 317-596-5692.
transmitted in any form, by any means (electronic, For consumer information on foreign language
photocopying, recording, or otherwise) without the translations, please contact our Customer Service
prior written permission of the publisher. department at 800-434-3422, fax 317-596-5692, or e-mail
ISBN: 0-7645-3236-7 rights@idgbooks.com.
Printed in the United States of America For information on licensing foreign or domestic rights,
please phone +1-650-655-3109.
10 9 8 7 6 5 4 3 2 1
For sales inquiries and special prices for bulk
1O/QV/QY/ZZ/FC
quantities, please contact our Sales department at
Distributed in the United States by IDG Books 650-655-3200 or write to the address above.
Worldwide, Inc.
For information on using IDG Books Worldwide’s books
Distributed by CDG Books Canada Inc. for Canada; by in the classroom or for ordering examination copies,
Transworld Publishers Limited in the United Kingdom; please contact our Educational Sales department at
by IDG Norge Books for Norway; by IDG Sweden Books 800-434-2086 or fax 317-596-5499.
for Sweden; by IDG Books Australia Publishing
For press review copies, author interviews, or other
Corporation Pty. Ltd. for Australia and New Zealand; by
publicity information, please contact our Public
TransQuest Publishers Pte Ltd. for Singapore,
Relations department at 650-655-3000 or fax
Malaysia, Thailand, Indonesia, and Hong Kong; by
650-655-3299.
Gotop Information Inc. for Taiwan; by ICG Muse, Inc.
for Japan; by Norma Comunicaciones S.A. for For authorization to photocopy items for corporate,
Colombia; by Intersoft for South Africa; by Eyrolles for personal, or educational use, please contact Copyright
France; by International Thomson Publishing for Clearance Center, 222 Rosewood Drive, Danvers, MA
Germany, Austria and Switzerland; by Distribuidora 01923, or fax 978-750-4470.
Cuspide for Argentina; by Livraria Cultura for Brazil; by
Ediciones ZETA S.C.R. Ltda. for Peru; by WS Computer Library of Congress Cataloging-in-Publication Data
Publishing Corporation, Inc., for the Philippines; by
Harold, Elliote Rusty.
Contemporanea de Ediciones for Venezuela; by
Express Computer Distributors for the Caribbean and XML bible / Elliote Rusty Harold.
West Indies; by Micronesia Media Distributor, Inc. for p. cm.
Micronesia; by Grupo Editorial Norma S.A. for ISBN 0-7645-3236-7 (alk. paper)
Guatemala; by Chips Computadoras S.A. de C.V. for 1. XML (Document markup language) I. Title.
Mexico; by Editorial Norma de Panama S.A. for
Panama; by American Bookshops for Finland. QA76.76.H94H34 1999 99-31021
Authorized Sales Agent: Anthony Rudkin Associates for 005.7’2--dc21 CIP
the Middle East and North Africa.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND AUTHOR HAVE USED THEIR BEST
EFFORTS IN PREPARING THIS BOOK. THE PUBLISHER AND AUTHOR MAKE NO REPRESENTATIONS OR
WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS BOOK
AND SPECIFICALLY DISCLAIM ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. THERE ARE NO WARRANTIES WHICH EXTEND BEYOND THE DESCRIPTIONS
CONTAINED IN THIS PARAGRAPH. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES
REPRESENTATIVES OR WRITTEN SALES MATERIALS. THE ACCURACY AND COMPLETENESS OF THE
INFORMATION PROVIDED HEREIN AND THE OPINIONS STATED HEREIN ARE NOT GUARANTEED OR
WARRANTED TO PRODUCE ANY PARTICULAR RESULTS, AND THE ADVICE AND STRATEGIES CONTAINED
HEREIN MAY NOT BE SUITABLE FOR EVERY INDIVIDUAL. NEITHER THE PUBLISHER NOR AUTHOR SHALL
BE LIABLE FOR ANY LOSS OF PROFIT OR ANY OTHER COMMERCIAL DAMAGES, INCLUDING BUT NOT
LIMITED TO SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR OTHER DAMAGES.
Trademarks: All brand names and product names used in this book are trade names, service marks, trademarks,
or registered trademarks of their respective owners. IDG Books Worldwide is not associated with any product or
vendor mentioned in this book.

is a registered trademark or trademark under exclusive license


to IDG Books Worldwide, Inc. from International Data Group, Inc.
in the United States and/or other countries.
3236-7 FM.F.qc 6/30/99 2:59 PM Page v

Welcome to the world of IDG Books Worldwide.


IDG Books Worldwide, Inc., is a subsidiary of International Data Group, the world’s largest publisher of
computer-related information and the leading global provider of information services on information technology.
IDG was founded more than 30 years ago by Patrick J. McGovern and now employs more than 9,000 people
worldwide. IDG publishes more than 290 computer publications in over 75 countries. More than 90 million
people read one or more IDG publications each month.
Launched in 1990, IDG Books Worldwide is today the #1 publisher of best-selling computer books in the
United States. We are proud to have received eight awards from the Computer Press Association in recognition
of editorial excellence and three from Computer Currents’ First Annual Readers’ Choice Awards. Our best-
selling ...For Dummies® series has more than 50 million copies in print with translations in 31 languages. IDG
Books Worldwide, through a joint venture with IDG’s Hi-Tech Beijing, became the first U.S. publisher to
publish a computer book in the People’s Republic of China. In record time, IDG Books Worldwide has become
the first choice for millions of readers around the world who want to learn how to better manage their
businesses.
Our mission is simple: Every one of our books is designed to bring extra value and skill-building instructions
to the reader. Our books are written by experts who understand and care about our readers. The knowledge
base of our editorial staff comes from years of experience in publishing, education, and journalism —
experience we use to produce books to carry us into the new millennium. In short, we care about books, so
we attract the best people. We devote special attention to details such as audience, interior design, use of
icons, and illustrations. And because we use an efficient process of authoring, editing, and desktop publishing
our books electronically, we can spend more time ensuring superior content and less time on the technicalities
of making books.
You can count on our commitment to deliver high-quality books at competitive prices on topics you want
to read about. At IDG Books Worldwide, we continue in the IDG tradition of delivering quality for more than
30 years. You’ll find no better book on a subject than one from IDG Books Worldwide.

John Kilcullen Steven Berkowitz


Chairman and CEO President and Publisher
IDG Books Worldwide, Inc. IDG Books Worldwide, Inc.

Eighth Annual Eleventh Annual


Computer Press Computer Press
Awards 1992 Ninth Annual Tenth Annual Awards 1995
Computer Press Computer Press
Awards 1993 Awards 1994

IDG is the world’s leading IT media, research and exposition company. Founded in 1964, IDG had 1997 revenues of $2.05
billion and has more than 9,000 employees worldwide. IDG offers the widest range of media options that reach IT buyers
in 75 countries representing 95% of worldwide IT spending. IDG’s diverse product and services portfolio spans six key areas
including print publishing, online publishing, expositions and conferences, market research, education and training, and
global marketing services. More than 90 million people read one or more of IDG’s 290 magazines and newspapers, including
IDG’s leading global brands — Computerworld, PC World, Network World, Macworld and the Channel World family of
publications. IDG Books Worldwide is one of the fastest-growing computer book publishers in the world, with more than
700 titles in 36 languages. The “...For Dummies®” series alone has more than 50 million copies in print. IDG offers online
users the largest network of technology-specific Web sites around the world through IDG.net (http://www.idg.net), which
comprises more than 225 targeted Web sites in 55 countries worldwide. International Data Corporation (IDC) is the world’s
largest provider of information technology data, analysis and consulting, with research centers in over 41 countries and more
than 400 research analysts worldwide. IDG World Expo is a leading producer of more than 168 globally branded conferences
and expositions in 35 countries including E3 (Electronic Entertainment Expo), Macworld Expo, ComNet, Windows World
Expo, ICE (Internet Commerce Expo), Agenda, DEMO, and Spotlight. IDG’s training subsidiary, ExecuTrain, is the world’s
largest computer training company, with more than 230 locations worldwide and 785 training courses. IDG Marketing
Services helps industry-leading IT companies build international brand recognition by developing global integrated marketing
programs via IDG’s print, online and exposition products worldwide. Further information about the company can be found
at www.idg.com. 1/24/99
3236-7 FM.F.qc 6/30/99 2:59 PM Page vi

Credits
Acquisitions Editor Copy Editors
John Osborn Amy Eoff
Amanda Kaufman
Development Editor Nicole LeClerc
Terri Varveris Victoria Lee

Contributing Writer Production


Heather Williamson IDG Books Worldwide Production

Technical Editor Proofreading and Indexing


Greg Guntle York Production Services

About the Author


Elliotte Rusty Harold is an internationally respected writer, programmer, and
educator both on the Internet and off. He got his start by writing FAQ lists for the
Macintosh newsgroups on Usenet, and has since branched out into books, Web
sites, and newsletters. He lectures about Java and object-oriented programming
at Polytechnic University in Brooklyn. His Cafe con Leche Web site at http://
metalab.unc.edu/xml/ has become one of the most popular independent XML
sites on the Internet.

Elliotte is originally from New Orleans where he returns periodically in search of


a decent bowl of gumbo. However, he currently resides in the Prospect Heights
neighborhood of Brooklyn with his wife Beth and cats Charm (named after the
quark) and Marjorie (named after his mother-in-law). When not writing books, he
enjoys working on genealogy, mathematics, and quantum mechanics. His previous
books include The Java Developer’s Resource, Java Network Programming, Java
Secrets, JavaBeans, XML: Extensible Markup Language, and Java I/O.
3236-7 FM.F.qc 6/30/99 2:59 PM Page vii

For Ma, a great grandmother


3236-7 FM.F.qc 6/30/99 2:59 PM Page viii
3236-7 FM.F.qc 6/30/99 2:59 PM Page ix

Preface
Welcome to the XML Bible. After reading this book I hope you’ll agree with me that
XML is the most exciting development on the Internet since Java, and that it makes
Web site development easier, more productive, and more fun.

This book is your introduction to the exciting and fast growing world of XML. In this
book, you’ll learn how to write documents in XML and how to use style sheets to
convert those documents into HTML so legacy browsers can read them. You’ll
also learn how to use document type definitions (DTDs) to describe and validate
documents. This will become increasingly important as more and more browsers like
Mozilla and Internet Explorer 5.0 provide native support for XML.

About You the Reader


Unlike most other XML books on the market, the XML Bible covers XML not from
the perspective of a software developer, but rather that of a Web-page author. I
don’t spend a lot of time discussing BNF grammars or parsing element trees.
Instead, I show you how you can use XML and existing tools today to more
efficiently produce attractive, exciting, easy-to-use, easy-to-maintain Web sites
that keep your readers coming back for more.

This book is aimed directly at Web-site developers. I assume you want to use XML
to produce Web sites that are difficult to impossible to create with raw HTML. You’ll
be amazed to discover that in conjunction with style sheets and a few free tools,
XML enables you to do things that previously required either custom software
costing hundreds to thousands of dollars per developer, or extensive knowledge
of programming languages like Perl. None of the software in this book will cost
you more than a few minutes of download time. None of the tricks require any
programming.

What You Need to Know


XML does build on HTML and the underlying infrastructure of the Internet. To that
end, I will assume you know how to use ftp files, send email, and load URLs in your
Web browser of choice. I will also assume you have a reasonable knowledge of
HTML at about the level supported by Netscape 1.1. On the other hand, when I
discuss newer aspects of HTML that are not yet in widespread use like cascading
style sheets, I will cover them in depth.
3236-7 FM.F.qc 6/30/99 2:59 PM Page x

x Preface

To be more specific, in this book I assume that you can:

✦ Write a basic HTML page including links, images, and text using a text editor.
✦ Place that page on a Web server.

On the other hand, I do not assume that you:

✦ Know SGML. In fact, this preface is almost the only place in the entire book
you’ll see the word SGML used. XML is supposed to be simpler and more
widespread than SGML. It can’t be that if you have to learn SGML first.
✦ Are a programmer, whether of Java, Perl, C, or some other language, XML is
a markup language, not a programming language. You don’t need to be a
programmer to write XML documents.

What You’ll Learn


This book has one primary goal; to teach you to write XML documents for the Web.
Fortunately, XML has a decidedly flat learning curve, much like HTML (and unlike
SGML). As you learn a little you can do a little. As you learn a little more, you can do
a little more. Thus the chapters in this book build steadily on each other. They are
meant to be read in sequence. Along the way you’ll learn:

✦ How an XML document is created and delivered to readers.


✦ How semantic tagging makes XML documents easier to maintain and develop
than their HTML equivalents.
✦ How to post XML documents on Web servers in a form everyone can read.
✦ How to make sure your XML is well-formed.
✦ How to use international characters like _ and _ in your documents.
✦ How to validate documents with DTDs.
✦ How to use entities to build large documents from smaller parts.
✦ How attributes describe data.
✦ How to work with non-XML data.
✦ How to format your documents with CSS and XSL style sheets.
✦ How to connect documents with XLinks and Xpointers.
✦ How to merge different XML vocabularies with namespaces.
✦ How to write metadata for Web pages using RDF.
3236-7 FM.F.qc 6/30/99 2:59 PM Page xi

Preface xi

In the final section of this book, you’ll see several practical examples of XML being
used for real-world applications including:

✦ Web Site Design


✦ Push
✦ Vector Graphics
✦ Genealogy

How the Book Is Organized


This book is divided into five parts and includes three appendixes:

I. Introducing XML
II. Document Type Definitions
III. Style Languages
IV. Supplemental Technologies
V. XML Applications

By the time you’re finished reading this book, you’ll be ready to use XML to create
compelling Web pages. The five parts and the appendixes are described below.

Part I: Introducing XML


Part I consists of Chapters 1 through 7. It begins with the history and theory behind
XML, the goals XML is trying to achieve, and shows you how the different pieces of
the XML equation fit together to create and deliver documents to readers. You’ll see
several compelling examples of XML applications to give you some idea of the wide
applicability of XML, including the Vector Markup Language (VML), the Resource
Description Framework (RDF), the Mathematical Markup Language (MathML), the
Extensible Forms Description Language (XFDL), and many others. Then you’ll learn
by example how to write XML documents with tags you define that make sense for
your document. You’ll see how to edit them in a text editor, attach style sheets to
them, and load them into a Web browser like Internet Explorer 5.0 or Mozilla. You’ll
even learn how you can write XML documents in languages other than English,
even languages that aren’t written remotely like English, such as Chinese, Hebrew,
and Russian.
3236-7 FM.F.qc 6/30/99 2:59 PM Page xii

xii Preface

Part II: Document Type Definitions


Part II consists of Chapters 8 through 11, all of which focus on document type
definitions (DTDs). An XML document may optionally contain a DTD that specifies
which elements are and are not allowed in an XML document. The DTD specifies
the exact context and structure of those elements. A validating parser can read a
document and compare it to its DTD, and report any mistakes it finds. This enables
document authors to make sure that their work meets any necessary criteria.

In Part II, you’ll learn how to attach a DTD to a document, how to validate your
documents against their DTDs, and how to write your own DTDs that solve your
own problems. You’l learn the syntax for declaring elements, attributes, entities,
and notations. You’ll see how you can use entity declarations and entity references
to build both a document and its DTD from multiple, independent pieces. This
allows you to make long, hard-to-follow documents much simpler by separating
them into related modules and components. And you’ll learn how to integrate other
forms of data like raw text and GIF image files in your XML document.

Part III: Style Languages


Part III consists of Chapters 12 through 15. XML markup only specifies what’s in a
document. Unlike HTML, it does not say anything about what that content should
look like. Information about an XML document’s appearance when printed, viewed
in a Web browser, or otherwise displayed is stored in a style sheet. Different style
sheets can be used for the same document. You might, for instance, want to use a
style sheet that specifies small fonts for printing, another one that uses larger fonts
for on-screen use, and a third with absolutely humongous fonts to project the
document on a wall at a seminar. You can change the appearance of an XML docu-
ment by choosing a different style sheet without touching the document itself.

Part III describes in detail the two style sheet languanges in broadest use on the
Web, Cascading Style Sheets (CSS) and the Extensible Style Language (XSL).

CSS is a simple style-sheet language originally designed for use with HTML. CSS
exists in two versions: CSS Level 1 and CSS Level 2. CSS Level 1 provides basic
information about fonts, color, positioning, and text properties, and is reasonably
well supported by current Web browsers for HTML and XML. CSS Level 2 is a more
recent standard that adds support for aural style sheets, user interface styles,
international and bi-directional text, and more. CSS is a relatively simple standard
that spplies fixed style rules to the contents of particular elements.

XSL, by contrast, is a more complicated and more powerful style language that cannot
only apply styles to the contents of elements but can also rearrange elements, add
boilerplate text, and transform documents in almost arbitrary ways. XSL is divided
into two parts: a transformation language for converting XML trees to alternative
trees, and a formatting language for specifying the appearance of the elements of an
XML tree. Currently, the transformation language is better supported by most tools

You might also like