
Jay Ven Eman, Ph.D.
Interest in Extensible Markup Language (XML) rivals the press coverage the World Wide Web received at the turn of the millennium. Is it justified? Rather than answer directly, let us take a brief survey of what XML is and why it was developed, and highlight some current XML initiatives. Whether or not the hype is justified, ASIST members will inevitably become involved in their own XML initiatives.
An alternative question to the title is, “What can’t you do with XML?” I use it to brew my coffee in the morning. Doesn’t everyone? To prove my point, the following is a “well-formed” XML document. (“Well-formed” will be defined later in the article.)
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<StartDay Attitude="Iffy">
  <Sunrise Time="6:22" AM="Yes"/>
  <Coffee Prepare="Or die!" Brand="Starbuck's" Type="Colombian" Roast="Dark"/>
  <Water Volume="24" UnitMeasure="Ounces">Add cold water.</Water>
  <Beans Grind="perc" Type="Java">Grind coffee beans. Faster, please!!</Beans>
  <Grounds>Dump grounds into coffee machine.</Grounds>
  <Heat Temperature="152 F">Turn on burner</Heat>
  <Brew>Wait, impatiently!!</Brew>
  <Dispense MugSize="24" UnitMeasure="Ounces">Pour, finally.</Dispense>
</StartDay>
This XML document instance contains sufficient information to drive the coffee-making process. Given the intrinsic nature of XML, our coffee-making document instance could be used by the butler (should we be so lucky) or by a Java applet or Perl script to send processing instructions to a wired or wireless coffeepot. If XML can brew coffee, it can drive commerce; it can drive R & D; it can drive the information industry; it can drive information research; it can drive the uninitiated out of business.
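To make the idea of a script reading this document concrete, here is a minimal sketch in Python rather than Perl or Java. It assumes only the standard-library parser; the function name brewing_steps and the constant COFFEE_XML are hypothetical, and the XML declaration is written in the order XML 1.0 requires (version, encoding, standalone). The sketch lists the steps and does not talk to any real coffeepot.

```python
import xml.etree.ElementTree as ET

# The coffee document from the article, embedded as bytes so the parser
# honors the declared ISO-8859-1 encoding.
COFFEE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<StartDay Attitude="Iffy">
  <Sunrise Time="6:22" AM="Yes"/>
  <Coffee Prepare="Or die!" Brand="Starbuck's" Type="Colombian" Roast="Dark"/>
  <Water Volume="24" UnitMeasure="Ounces">Add cold water.</Water>
  <Beans Grind="perc" Type="Java">Grind coffee beans. Faster, please!!</Beans>
  <Grounds>Dump grounds into coffee machine.</Grounds>
  <Heat Temperature="152 F">Turn on burner</Heat>
  <Brew>Wait, impatiently!!</Brew>
  <Dispense MugSize="24" UnitMeasure="Ounces">Pour, finally.</Dispense>
</StartDay>"""


def brewing_steps(xml_bytes):
    """Return (element name, attributes, instruction) for each step with text."""
    root = ET.fromstring(xml_bytes)
    steps = []
    for child in root:  # children are visited in document order
        text = (child.text or "").strip()
        if text:  # skip empty elements such as <Sunrise/> and <Coffee/>
            steps.append((child.tag, dict(child.attrib), text))
    return steps


if __name__ == "__main__":
    for name, attrs, instruction in brewing_steps(COFFEE_XML):
        print(f"{name}: {instruction} {attrs}")
```

Because the element names and attributes label each value, the script needs no knowledge of coffee; it simply walks the labeled steps in order, which is what a butler reading the same document would do.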
What is XML? To understand XML, you must understand meta data. Meta data is “data about data.” It is an abstraction, layered above the abstraction that is language. Meta data can be characterized as natural or added. To illustrate, consider the following text string, “Is MLB a sport, entertainment, or business?” You, the reader, can guess with some degree of accuracy that this is an article title about Major League Baseball (MLB). Presented out of context, even people are only guessing. Computers have no clue, in or out of context. There are no software systems that can reliably recognize it in a meaningful way.
For this example, it is a newspaper article title. To it we will add subject terms from a controlled vocabulary, identify the author, the date, and add an abstract. As a “well-formed” XML document instance, it is rendered:
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<DOC Date="5/21/02" Doctype="Newspaper">
  <TI>"Is MLB a sport, entertainment, or business?"</TI>
  <Byline>Smith</Byline>
  <ST>Sports</ST>
  <ST>Entertainment</ST>
  <ST>Business</ST>
  <AB>Text of abstract...</AB>
  <Text>Start of article ...</Text>
</DOC>
In this example, what are the meta data? What is natural and what is added? Natural meta data is information that enhances our understanding of the document and parts thereof, and can be found in the source information. The date, the author’s name, and the title are “natural” meta data. They are an abstraction layer apart from the “data” and add significantly to our understanding of the “data.”
The subject terms, document type, and abstract are “added” meta data. This information also adds to our understanding, but it had to be added by an editor, by software, or by both. The tags themselves are also “added” meta data. Meta data can be the element tag, the attribute name, or the values labeled by element tags and attribute names. It is the collection of meta data that allows computer software to reliably deal with the data. Meta data facilitates networking across computers, platforms, software, companies, cities, and countries.
What is the “data” in this example? It is the text, tables, charts, figures, and graphs contained between the opening <Text> tag and the closing </Text> tag.
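The distinction between natural meta data, added meta data, and data can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a standard tool: the function name classify and the dictionary keys are hypothetical, the mapping of elements to categories simply follows the discussion above, and the embedded document quotes the Date attribute value, as a well-formed document must.

```python
import xml.etree.ElementTree as ET

# The newspaper example from the article, embedded as bytes so the parser
# honors the declared ISO-8859-1 encoding.
ARTICLE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<DOC Date="5/21/02" Doctype="Newspaper">
  <TI>"Is MLB a sport, entertainment, or business?"</TI>
  <Byline>Smith</Byline>
  <ST>Sports</ST>
  <ST>Entertainment</ST>
  <ST>Business</ST>
  <AB>Text of abstract...</AB>
  <Text>Start of article ...</Text>
</DOC>"""


def classify(xml_bytes):
    """Split the document into natural meta data, added meta data, and data."""
    doc = ET.fromstring(xml_bytes)
    # Natural meta data: already present in the source information.
    natural = {
        "date": doc.get("Date"),
        "author": doc.findtext("Byline").strip(),
        "title": doc.findtext("TI").strip(),
    }
    # Added meta data: supplied by an editor or by software.
    added = {
        "doctype": doc.get("Doctype"),
        "subjects": [st.text.strip() for st in doc.findall("ST")],
        "abstract": doc.findtext("AB").strip(),
    }
    # The data itself: whatever sits between <Text> and </Text>.
    data = doc.findtext("Text").strip()
    return natural, added, data


if __name__ == "__main__":
    natural, added, data = classify(ARTICLE_XML)
    print("Natural:", natural)
    print("Added:", added)
    print("Data:", data)
```

The software never has to guess what "Smith" or "Sports" means. The tags tell it, which is the sense in which meta data lets software deal with the data reliably.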