It should come as no surprise that XML is yet another invention spearheaded
by Berners-Lee and the organization he leads, the World Wide
Web Consortium. With offices in Valbonne, France; Cambridge,
Mass.; and Tokyo, and with a full-time staff of more than
60, the W3C, as it's called, brings together about 500 member
organizations. While the IEEE Computer Society is one, most
of the rest are large or mid-sized corporations like DaimlerChrysler
[ranked No. 4 among the "Top 100 R&D
Spenders"], Hewlett-Packard (30), and Autodesk. W3C also
coordinates the work of additional researchers as well as
volunteers from member and nonmember companies and academia.
The Semantic Web is just one item on the W3C's diverse agenda. Others are
interoperability (in file formats, for example) and technologies
for trust, like digital signatures. But the Semantic Web is
increasingly important—four interest groups are working
on its technologies.
Similar goal, simpler strategies
While the W3C works hard to coordinate the work of multifarious organizations,
some other companies are overcoming the semantic shortcomings
of a human-oriented Web without either restructuring it or
waiting for smarter agents. Google Inc. (Mountain View, Calif.)
has, to date, not only kept up with the Web's phenomenal growth but
also added new categories of documents to its search results: PDFs,
Usenet newsgroups, and image files. Autonomy Corp. (Cambridge,
UK) and the Palo Alto (Calif.) Research Center [recently spun
off from Xerox (73)] both use mathematical models of how long-term
memory works in the brain, each in its own way, to create
concept maps out of the words on Web pages. At Verity Inc.
(Sunnyvale, Calif.), researchers add things like organization
charts and address books to infuse amorphous corporate documents
with additional structure.
What companies like Google, Autonomy, and Verity are doing, in
other words, is figuring out better ways of doing what search
engines have always tried to do: deliver the best documents
the existing Web has on a given topic. The advocates of the
Semantic Web, on the other hand, are looking beyond the current
Web to one in which agent-like search engines will be able
not just to deliver documents but to get at the facts inside
them. One thing everyone can agree on: even with its
billions of pages and countless links, the Web, only a dozen
years old, is still in its infancy. As Berners-Lee puts it,
the next generation of the Web will be as revolutionary as
the original Web itself was.
From words to concepts
The innovations behind the Semantic Web simply extend
current Web techniques in ways that make documents more datalike,
so that agents can interact with them in sophisticated ways.
For instance, URIs (uniform resource identifiers) are like URLs (uniform
resource locators), but more general: a URL (such as http://www.spectrum.ieee.org/index.html)
is a link to an entity on the Web, while a URI identifies
any resource at all. (All URLs are URIs, but not all URIs
are URLs.) For Berners-Lee, items like human beings,
corporations, and bound books in a library are resources,
just not "network retrievable" ones.
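The distinction can be illustrated with Python's standard urllib.parse module. This is a minimal sketch under stated assumptions: the rule that only http, https, and ftp identifiers with a host name count as "network retrievable" is a simplification chosen for illustration, and the ISBN-based URN is just an example standing in for a bound book in a library.

```python
from urllib.parse import urlparse

# Assumption for illustration only: treat these schemes, when a host is
# present, as pointing at something that can be fetched over the network.
RETRIEVABLE_SCHEMES = {"http", "https", "ftp"}

def describe(uri: str) -> str:
    """Return a rough label for a URI string."""
    parts = urlparse(uri)
    if parts.scheme in RETRIEVABLE_SCHEMES and parts.netloc:
        # A locator: it says where the resource lives on the Web.
        return "URL (network retrievable)"
    if parts.scheme:
        # Still a valid identifier, but it only names the resource.
        return "URI (names a resource, not necessarily retrievable)"
    # No scheme at all, so this is at most a relative reference.
    return "not an absolute URI"

examples = [
    "http://www.spectrum.ieee.org/index.html",
    "urn:isbn:0451450523",  # example URN naming a printed book
]
for uri in examples:
    print(uri, "->", describe(uri))
```

Both strings parse as URIs, but only the first carries a scheme and host that tell software where to fetch the resource; the URN names a book without saying where to get it.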
XML builds on a second fundamental Web technique: coding elements in
a document. In the current scheme, HTML, codes such as <title>
for an article's title, <b> for boldface type, and
<table> to begin a table identify document elements
only stylistically. XML, however, singles things out as data
elements: as dates, prices, invoice numbers, and so on.
In fact, XML allows users to mark up any data elements whatsoever.
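The difference can be seen by parsing the same made-up invoice marked up both ways with Python's standard xml.etree.ElementTree module. This is only a sketch: the element names <invoice>, <number>, <date>, and <price> and the currency attribute are hypothetical, invented for this example rather than taken from any standard vocabulary, and the values are sample data.

```python
import xml.etree.ElementTree as ET

# HTML-style markup: the <b> tags say how text should look, not what it is.
html_fragment = "<p>Invoice <b>10452</b> dated <b>2002-03-15</b>: <b>$1,200.00</b></p>"

# XML-style markup: hypothetical element names describe what each value is.
xml_fragment = """
<invoice>
  <number>10452</number>
  <date>2002-03-15</date>
  <price currency="USD">1200.00</price>
</invoice>
"""

def extract_invoice(xml_text: str) -> dict:
    """Pull typed fields out of the hypothetical <invoice> document."""
    root = ET.fromstring(xml_text.strip())
    price = root.find("price")
    return {
        "number": int(root.findtext("number")),
        "date": root.findtext("date"),
        "price": float(price.text),
        "currency": price.get("currency"),
    }

# In the HTML version all three values share the same <b> tag,
# so a program cannot tell which one is the price.
bold_values = [b.text for b in ET.fromstring(html_fragment).iter("b")]
print(bold_values)
print(extract_invoice(xml_fragment))
```

From the HTML fragment a program recovers only three bold strings; from the XML fragment it can pick out the invoice number, date, price, and currency by name.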