An XML-based Workflow System for ClerkBase

Unlimited Priorities is investigating the introduction of an XML-based workflow management system for ClerkBase, Inc., a publishing company focused on the needs of municipal governments. Their core product gives the public easy search and retrieval tools to access public documents such as meeting agendas, minutes, city codes, and manager reports. ClerkBase wants to establish a common format for these documents, convert the legacy ones, and put in place a workflow management system so that all new documents can be rapidly converted to the same format. Because ClerkBase is growing rapidly, they need to ensure that they are using the most efficient and cost-effective methods to publish their content.

Unlimited Priorities is reviewing their current processes and helping them develop a strategy for vendor selection to ensure that all documents can be converted to a modern XML format by introducing a state-of-the-art workflow management system.

The value’s in the content

Data Conversion Laboratory has released their April, 2012 industry survey focused on the role content plays in corporations. Almost half of the respondents said that content makes up half of their companies’ value. Nearly one-third estimated that corporate content makes up 75% of company value! As DCL says, “That’s quite an incredible acknowledgement of the shift in corporate value.”

The results are not specific to large corporations. The respondents came from a cross section of companies, both in size and in business area – manufacturers, high technology companies, publishers and pharmaceutical companies.

The survey also shows that XML acceptance is here – 25% report that all their content is in XML and another 50% report some of their content is in XML.

Another result, maybe not so surprising, is that the main reason driving conversion to XML is presentation to the customer – 83% want their information converted for their customers.

The biggest concerns? Shortage of expertise and cost were both reported by 96% of the respondents.

The results are summarized in more detail on the DCL website: Content is Where the Real Value Resides In Corporations

Helping organizations develop a content strategy is one of our specialties at Unlimited Priorities. You can see more on this website at: Content Licensing, Outsourcing and Production.

DCL Learning Series Webinar: Crossing the Chasm with DITA

Data Conversion Laboratory and Dr. JoAnn Hackos, president of Comtech Services, Inc., are producing a three-part webinar on what DITA is and what it can do for your organization. The first session of this three-part event will be Thursday, January 19, 2012, from 1:00 PM – 2:00 PM EST, so sign up now.

In this three-part webinar series, Dr. JoAnn Hackos, president of Comtech Services, Inc., will trace the progress of many organizations from the early phases of Exploration, Preparation, and Education through genuine progress through Pilot projects, purchasing of a Component Content Management System to keep everything in line, through the Conversion of legacy content to a new way of structuring and managing information.

via DCLnews Blog.

Sign-up links for all three sessions are on the DCLnews Blog.

The session titles are:

  • Session 1: “Get Ready… Get Set”
  • Session 2: “Now Go”
  • Session 3: “Next, Grow”

About DCL

Since its founding in 1981, Data Conversion Laboratory (DCL) has remained faithful to its guarantee to construct unparalleled electronic document conversion services based on the rich legacy of superior customization and exceptional quality.

About Dr. JoAnn Hackos

Dr. JoAnn Hackos is President of Comtech Services, a content-management and information-design firm based in Denver, Colorado, which she founded in 1978. She is Director of the Center for Information-Development Management (CIDM), a membership organization focused on content-management and information-development best practices.

How Intelligent is Your Content?

An interview with Ann Rockley

Written by Richard Oppenheim for
Unlimited Priorities and DCLnews Blog

Intelligence has an increasing list of definitions. There is natural, artificial, computer, along with many variations of intelligent as a descriptor – an intelligent question, comment, reply, etc. With the silos of data overflowing and new silo construction happening every day, the evolution of your data into functional content is a key component of intelligent analysis and results.

People and animals have collected, stored, and preserved items throughout history. On the people side, scrolls, books, artwork and all things collectible were brought to a central location for protection or hoarding or just to allow others to view the items. Storage facilities were constructed way before the invention of electricity and the advent of digital data. Today, content is flowing through the conversion of many things to many things digital. There is no indication a pizza will evolve to something digital. You can order, pay for and request delivery of the pizza. Eating it is a different experience. Digitized content can be made accessible for anyone to view whether it is a book, movie, museum masterpiece or do-it-yourself images and journals.

Transforming content into intelligent content takes more than a magic wand and a few wishes. The content needs to be accessible. Once accessed, the enterprise must construct a capability to assemble various forms of content into usable information.

To shed a bright light on how content can be stored intelligently, I interviewed Ann Rockley, Founder and President of The Rockley Group. For more than 20 years, Rockley has been helping organizations and publishers of all sizes with a well-planned move to useful and usable content publishing strategies through the use of tagging schemes, such as XML. The flood of content is exploding from every direction. The volume of content is advancing at a steady and forever increasing speed. This growth places strenuous demands on every enterprise, whether for-profit, not-for-profit or government agency, to create, manage and distribute all forms of content. Ann Rockley states, “We can do so much more than just full-text searching. We’ve gone from documents which are ‘black boxes’ to content which is structurally rich and semantically aware, and is therefore automatically discoverable, reusable, reconfigurable and adaptable.”

Intelligent content is not just about bigger, faster computer processing. In the last century, we worried about how to store all the paper that was being created. Large companies would buy or build large warehouses with cabinets and shelves to hold the documents that were being created non-stop. Cries of “paperless office” echoed from Wall Street to Main Street.

The computer did help, by creating even more paper to be stored. Accessibility to content is continuously expanding whether through public search engines or private company search applications. Today, companies of every size need to determine how they will store and access content. The right strategy is not about technology alone; it is about “…defining a content experience for your customer that enables them to achieve their goals anywhere, anytime and on any device,” says Ann.

The first step in this process is to understand overall company requirements. The practical issues include figuring out how to integrate the significant volume of already stored data with new data flowing through the input pipeline every second. Digital data needs to be stored with appropriate identification so that it can be accessed. With estimates of data creation being measured in zettabytes, each organization will contribute its share to this volume. The good news about evolving technologies is that huge storage facilities are being strategically located around the world with sufficient power, cooling and security. One content area is linked with one or multiple content areas so that overflow, malfunction and other operating requirements can be shifted among the silos as needed.

The business demand for loads of storage is not just a volume measured in gigabytes or terabytes or some other huge number. The key with today’s digital data is that volume requirements fluctuate between peaks and valleys so that if a flood of new data knocks on the warehouse door, more storage space can be provided. This is called scalability. Retail stores experience this flood during the end-of-year holidays. Accountants have this experience during tax season. Ski resorts, Sunbelt states, summer vacations all have these variable data flows.

Ann Rockley advises everyone to recognize just how important it is to have each company build a detailed content strategy. Whether the company is growing or holding steady, tagging, storage, security and retrieval of content is crucial. She states that, “With today’s web based access technologies, computer use is becoming easy and in many cases, even easy to use. With more people gaining access to content, there are many more opportunities for collaboration throughout the personal or business communities.” As long as the computer platforms are constructed correctly, content can expand to whatever level of intelligence that is needed at the moment.

Having a structure for the content does not imply that every bit of data has the same format or application process. There are accounting data, reports, correspondence, manufacturing process control, inventory management and on and on. Data arrives and can be reshaped, recolored and tagged with appropriate XML style coding to create intelligent content. Developing a content strategy starts with knowing and/or learning a few things:

  • What data is currently being collected and where it is being used
  • What people are accessing the content – customers, employees, researchers, etc
  • How can existing content be merged with new structures being created
  • What is needed to enable scalability of content storage areas
  • As data is collected, does the process know the frequency and purpose of individual use
  • Does the content flow through the company work processes in a logical series of steps
  • How will the company establish and maintain appropriate taxonomy definitions
  • How will the company manage the stored content and its accessibility

Development strategies do not begin with a single ‘Aha’ moment. Strategy takes resources, review, input from multiple sources, and creating a structured blueprint for the years ahead. The strategy must have flexibility so that it can be adapted to the potential changes of business operations going forward. Redoing the strategy every year is not just expensive; it is likely to be confusing, the change is extremely difficult to complete in 12 months, and the errors that result from constant change can slow, stop or even undo any gains in intelligence.

The intelligent content structure has to support the capability for individual data components to be tagged so that data can be transformed to content then transformed to information. In addition, systems and procedures have to be implemented that prevent damage from such events as simultaneous updates to individual records.

There is so much more that intelligent content will provide to the organization. There will be faster response time to content questions, improved use of resources, and an increased satisfaction for all current users and the expanding base of future users. In early search days, we used the phrase data mining to locate and retrieve nuggets of data. Mining has matured and companies can now do content mining that provides a lot more nuggets along with the information that can be determined by viewing all of the collected nuggets as a whole.

As Ann Rockley says:

“If we have a structure in our content we can manipulate it. … if it is structurally rich we can perform searches or narrow our search to the particular type of information we are interested in.” The focus of intelligent content is to help us improve decision making, perform better and work with more intelligence.

Implementing DITA at Micron Technology, Inc. — Interview with Craig Henley, Manager, Marketing Publications Group

Written for Unlimited Priorities and DCLnews Blog.

Charlotte Spinner

Micron Technology, Inc., is one of the world’s leading semiconductor companies. Their DRAM, NAND, and NOR Flash memory products are used in everything from computing, networking, and server applications, to mobile, embedded, consumer, automotive, and industrial designs. Craig Henley is manager of the Marketing Publications Group at Micron. His team leads the DITA effort at Micron and oversees all conversion and implementation projects.

DITA (Darwin Information Typing Architecture) is an XML-based international standard that was initially developed by IBM for technical documentation and is now publicly available and is being applied to technical documentation in many industries. Craig shares with Charlotte Spinner of Unlimited Priorities his thoughts about a recent DITA conversion he worked on with Data Conversion Laboratory.

Charlotte Spinner: Craig, what were the business circumstances that led Micron to DITA? Describe the problem you were trying to solve.

Craig Henley: Micron has one of the most diverse product portfolios in the industry. Our complete spectrum of memory products (DRAM, Flash memory and more) requires data sheets that contain technical specifications describing the product. These data-intensive documents typically exceed 100 pages and sometimes reach 200-300 pages, are heavy on graphics and tabular data, and are very complex. For each product family (e.g., SDRAM or DDR2) every item is available in multiple densities (256Mb, 512Mb, 1Gb, 2Gb, etc.), and each permutation requires its own large data sheet document. This provides the descriptive information a design engineer needs in order to incorporate the parts into the product, so data sheets are a key component for sales.

The data sheets were maintained using unstructured Adobe® FrameMaker® and stored, along with the many graphics, in large zip files which were then stuffed into a large enterprise CMS. We always knew that 80-90% of the content was reusable and could form a core batch of content, with the rest of the specifications varying a minute amount. But with the old system, if anything changed we had to update each document individually. This was a very unwieldy, very manual process—the “brute force method.” Even if we had to change boilerplate content—something in legal or copyright, logos, colors—it had to be changed in the template, and also in every old document whenever it was next modified. This was a maintenance nightmare, and the challenges compounded exponentially because of the sheer number of items involved.

The bottom line is that all of our products have to be supported at the data sheet level, so documentation is mission-critical for us. And we pride ourselves on the quality of our documentation. So we asked ourselves, “Do we keep trying to work harder and increasing resources and staff? Do we scale the brute force method, or do we work smarter?” Brute force is not cost-effective, so we decided to use DITA to help us work more efficiently.

CS: Beyond the obvious advantages of an XML-based solution, what made DITA an especially good fit for this project?

CH: We’re part of the digital era, but in fact we felt we were behind because our already large product portfolio was growing due to acquisitions. We adopted DITA in the nick of time. It verified our expectations with regard to eliminating redundancy and increasing efficiency, and also opened the door to new functionalities and capabilities, such as allowing us to publish documents that we couldn’t before.

For example, we had heard from field engineers that customers wanted a full complement of documentation for our Multichip Packages (MCPs), which may contain DRAM, Flash, Mobile DRAM, e-MMC, or more, so we have to assemble data sheets for all of the discrete components, not just a general one for the MCP overall. There are many different types of MCPs—any package can leverage other discrete components. This was a nightmare in the old paradigm. If anyone updated a DRAM or other specification that was being leveraged in 6-7 MCPs, how could we keep up with that?

DITA allows reuse in a way that lets us remove redundant information: we use existing DITA maps, pull out topics that don’t need to be there, nest the maps inside the MCP datasheet, and voila!—it’s created in minutes. All the information is still connected to the topics that are being leveraged in their original discrete datasheets, so if they are updated by engineers, the changes are inherited. This makes it easy to regenerate content, and it’s seamless to the customer.
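
The nesting described here can be sketched in a few lines of Python. This is only an illustration, not Micron's setup or a DITA toolchain: the file names and map contents are hypothetical, the maps are held in memory rather than in a CMS, and the `exclude` parameter is a stand-in for "pulling out topics that don't need to be there". The `map`, `topicref` and `mapref` element names are the standard DITA ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical maps, keyed by a made-up file name and kept in memory for the sketch.
MAPS = {
    "dram.ditamap": """<map>
  <title>DRAM data sheet</title>
  <topicref href="dram_features.dita"/>
  <topicref href="dram_electrical.dita"/>
  <topicref href="dram_timing.dita"/>
</map>""",
    "nand.ditamap": """<map>
  <title>NAND Flash data sheet</title>
  <topicref href="nand_features.dita"/>
  <topicref href="nand_electrical.dita"/>
</map>""",
    # The MCP map nests the discrete maps instead of copying their topics.
    "mcp.ditamap": """<map>
  <title>MCP data sheet</title>
  <topicref href="mcp_overview.dita"/>
  <mapref href="dram.ditamap"/>
  <mapref href="nand.ditamap"/>
</map>""",
}


def resolve_topics(map_name, maps, exclude=()):
    """Flatten a map, following nested maps, into an ordered list of topic hrefs.

    No cycle detection: the sketch assumes maps never reference each other in a loop.
    """
    topics = []
    for ref in ET.fromstring(maps[map_name]):
        href = ref.get("href")
        if ref.tag == "mapref":
            # A nested map contributes whatever topics it lists right now.
            topics.extend(resolve_topics(href, maps, exclude))
        elif ref.tag == "topicref" and href not in exclude:
            topics.append(href)
    return topics


mcp_topics = resolve_topics(
    "mcp.ditamap", MAPS, exclude={"dram_features.dita", "nand_features.dita"}
)
for href in mcp_topics:
    print(href)
```

Because the MCP map only points at the discrete maps, a topic added to or changed in `dram.ditamap` shows up the next time the MCP map is resolved, which is the inheritance behavior described above. A production pipeline would leave this resolution to a DITA processor such as the DITA Open Toolkit.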

Another factor that made DITA a great solution is that it’s an XML schema that’s very basic in design, so reuse is there and it’s easy, but it imposes enough structure on the user-base that everyone is operating under the same model. We call it “guided authoring”—it keeps people from veering off with their own methods for creating the documentation, which wouldn’t promote clean handoff. Authoring under DITA is clean—it guarantees that elements are used in a consistent way and leaves less room for errors. Initially, industries moving into the XML paradigm developed their own in-house DTDs, and I think that made it slower to adopt. But with DITA, the standardization makes it easy to have interoperability between departments, and even companies, which seems to be supporting its wider-scale adoption.

CS: Did you know much about DITA before embarking on this project?

CH: We had read about DITA in books and independent research articles, and learned more while attending an STC conference, so we had an idea that it could work well for us.

CS: Did you have any trouble selling the idea of a DITA conversion internally?

CH: At the core, we had the support we needed. Our management believes in innovation; they trusted us to go out and do things differently. So we brought in some reps from our key customer base for a pilot study. Once we proved the success of DITA in the pilot mode, and how it could scale, it gained traction and sold itself. We started at the grassroots level and it went from there, one successful demo at a time.

CS: Did you think you could do this alone at any point, or did you always know that using an outside expert was the best approach? What led you to Data Conversion Laboratory?

CH: We like the “learn to fish” approach, but when it came to full-blown conversion of legacy documents, we knew we’d need to go outside.

We had heard of DCL in STC publications, and we regularly read the DCL newsletter. We knew in the back of our minds that if we went full-on to DITA we would need to build a key XML foundation of content, and we didn’t want to do that manually. Tables in XML are complex, and ours are really complex. Our initial attempts at that XML conversion were too time- and labor-intensive, so we were concerned.

We brought in DCL and they talked us through their process. They explained some filters they use, and why the tables don’t have to be such a challenge. A test of some complex tabular data came back in pristine XML, so they were something of a lifesaver for us. DCL’s proprietary conversion technology—that “secret sauce” they have—is pretty magical.

CS: What did you have to do to prepare? Was there a need to restructure any of your data before converting to DITA?

CH: We did have to do some preparation in learning how to do some of our own information architecture work, and we discovered some best practices in prepping our unstructured content. Mostly it involved cleaning up existing source files, making sure we were consistent in our tagging for things like paragraphs, to ensure clean conversion. It was a fair amount of work—a front-loaded process—but well worth the investment.

CS: How did you get started? What was the initial volume?

CH: We had about a 2,000 page-count initially, much of which was foundational content we could leverage for lots of other documents. Starting with this set helped us scale almost overnight, and also helped accelerate our timeline, ramping quickly from proof-of-concept to pilot to full-blown production. Without that key conversion our timeline would have been drastically pushed out.

CS: Did the process run smoothly?

CH: Yes, I would say it ran very smoothly. There were a few unexpected results, but we communicated those back, adjustments were made, and that was it. The timeframe was on the order of a few months, and that was more dependent on us. We couldn’t keep up with DCL. In fact, we referred to their process as “the goat” because of the way it consumed and converted the data. They’re fast. They were always ahead of us.

CS: How did you make use of DITA maps?

CH: We adopted a basic shell for our documents, but then started nesting DITA maps within other DITA maps. This was another efficiency gain, giving us the capability to assemble large documents literally in minutes as opposed to hours and days. We specialized at the map level for elements and attributes to make them match our typical data sheet output, so we found maps to be quite flexible.

CS: What, if any, were the happy surprises or unexpected pitfalls of the project, and how did you deal with them?

CH: There were no pitfalls with DITA itself. It’s a general, vanilla specification. It might not semantically match every piece of content, but you can use it out of the box and adjust the tags and attributes to match your environment as needed with a little bit of customization. That’s actually a benefit of DITA—you can use it initially as is, and then modify it further along in your implementation.

CS: What have been the greatest benefits of the conversion? Were you able to reduce redundancy? By how much?

CH: The greatest benefit is that it helped us lay that foundation of good XML content in our CMS so we now can scale our XML deployment exponentially faster. With regard to redundancy, this is a guess-timate, but I’d say we had about a 75% reduction of manual effort, so a 75% gain in efficiency. For heavier reuse documents such as MCPs, the benefit scales even more.

CS: How do you anticipate using DITA down the road?

CH: We are publishing different types of documents in our CMS now, going beyond data sheets. Within technical notes, for example, there’s not much reuse, but DITA is still good because that data can be leveraged in other documents. We’d like to start seeing even more document types and use cases to leverage the reuse and interoperability. You don’t hear about DITA for pure marketing content, or things that are more standalone, but given what we’ve seen, we don’t see why you couldn’t use DITA for that. We’d like to branch out into other types of content. The pinnacle would be to have all of our organization’s data, internal and external, leveraged, authored and used within the XML paradigm. That might sound crazy and aggressive, but there would be benefits, such as dynamically assembling content to the Web.

CS: Can you expand a little more on dynamically-generated content?

CH: We haven’t fielded many requests for that yet, but we see the potential. Someone could call up and request a particular configuration of data sheets, and we could throw together that shell very fast because the topic-based architecture promotes that. The next level would be to take modular information and make it available on-demand to customers in a forward-looking capacity.

For example, if you build a widget on your website for a customer to request only certain parameters, such as electrical specs, across, say, four densities of a given product family, the content could be assembled as needed. Our CMS does facilitate that internally, assembling content on demand, but that could be a major differentiator for us if available via the intranet and Web. As XML comes of age, it’s not impossible, and it’s probably where we’re headed. We could package in mobile capacity, or whatever’s needed.
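
As a rough sketch of what such a widget might do behind the scenes, the Python below filters a small topic catalog by product family, section and density, then writes the matching topics into a new DITA map. Everything specific is an assumption made for illustration: the `Topic` record, the catalog entries, the file names and the function names are hypothetical and do not describe Micron's CMS.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    href: str     # hypothetical topic file name
    family: str   # product family, e.g. "DDR2"
    density: str  # e.g. "512Mb"
    section: str  # kind of content, e.g. "electrical" or "timing"


# Hypothetical catalog; a real system would query the CMS instead.
CATALOG = [
    Topic("ddr2_256mb_electrical.dita", "DDR2", "256Mb", "electrical"),
    Topic("ddr2_256mb_timing.dita", "DDR2", "256Mb", "timing"),
    Topic("ddr2_512mb_electrical.dita", "DDR2", "512Mb", "electrical"),
    Topic("ddr2_1gb_electrical.dita", "DDR2", "1Gb", "electrical"),
    Topic("ddr2_2gb_electrical.dita", "DDR2", "2Gb", "electrical"),
    Topic("sdram_256mb_electrical.dita", "SDRAM", "256Mb", "electrical"),
]


def select_topics(catalog, family, section, densities):
    """Return hrefs of topics matching the requested family, section and densities."""
    wanted = set(densities)
    return [
        t.href
        for t in catalog
        if t.family == family and t.section == section and t.density in wanted
    ]


def build_map(title, hrefs):
    """Serialize the selection as a minimal DITA map (a title plus one topicref each)."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for href in hrefs:
        ET.SubElement(root, "topicref", href=href)
    return ET.tostring(root, encoding="unicode")


hrefs = select_topics(CATALOG, "DDR2", "electrical", ["256Mb", "512Mb", "1Gb", "2Gb"])
request_map = build_map("DDR2 electrical specifications", hrefs)
print(request_map)
```

The point of the sketch is that the request never copies content: it only produces a new map that references existing topics, so the assembled document stays in step with its sources.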

By making information available at that level, you speed up responsiveness and get feedback on a specification quickly. Fielding requests to the right people quickly can ramp up the communication process for development.

CS: Looking back, how would you advise an organization to prepare if they’re about to embark on a DITA conversion?

CH: Advise folks to understand and justify the reasoning. Spell out clearly—and potentially quantify—the types of efficiency gains they’ll get by going to this model. Understand that use case and justification, and be prepared to articulate that to the right people. Remember that saving time results in cost savings. Be able to articulate that part of the story.

In terms of planning and implementation, start at that proof-of-concept level, move to the pilot level, and identify ways you can scale the pilot to the full blown organizational level. Have a plan for scaling, because you’re going to hear that question.

Part of our success was adopting DITA and a CMS that was built to work with DITA, marrying the two. A good CMS that works well with an open source such as DITA lets you scale with your resources, because it gives you methods to accomplish your key tasks and provides avenues for bridging the gap—the leap—from the desktop publishing paradigm to the XML publishing paradigm. DITA with a good CMS implementation helps you bridge that gap and helps your users—writers, application engineers, etc.—take that step.

CS: Are there any other comments you’d like to make about your experiences with DITA?

CH: In a nutshell, we’re believers.

About the Author

Charlotte Spinner is a technical specialist for Unlimited Priorities. Further information about Micron Technology, Inc. may be found at www.micron.com. Further information about Data Conversion Laboratory, Inc. may be found at www.dclab.com.

What Can You Do With XML Today?

Jay Ven Eman, Ph.D.

Interest in Extensible Markup Language (XML) rivals the press coverage the World Wide Web received at the turn of the Millennium. Is it justified? Rather than answer directly, let us take a brief survey of what XML is, why it was developed, and highlight some current XML initiatives. Whether or not the hype is justified, ASIST members will inevitably become involved in their own XML initiatives.

An alternative question to the title is, “What can’t you do with XML?” I use it to brew my coffee in the morning. Doesn’t everyone? To prove my point, the following is a “well-formed” XML document. (“Well-formed” will be defined later in the article.)

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<StartDay Attitude="Iffy">
<Sunrise Time="6:22" AM="Yes"/>
<Coffee Prepare="Or die!" Brand="Starbuck's" Type="Colombian">
<Water Volume="24" UnitMeasure="Ounces">Add cold water.</Water>
<Beans Grind="perc" Type="Java">Grind coffee beans.
Faster, please!!</Beans>
<Grounds>Dump grounds into coffee machine.</Grounds>
<Heat Temperature="152 F">Turn on burner</Heat>
<Brew>Wait, impatiently!!</Brew>
<Dispense MugSize="24" UnitMeasure="Ounces">Pour, finally.</Dispense>
</Coffee>
</StartDay>

This XML document instance contains sufficient information to drive the coffee making process. Given the intrinsic nature of XML, our coffee-making document instance could be used by the butler (should we be so lucky) or by a Java applet or perl script to send processing instructions to a wired or wireless coffeepot. If XML can brew coffee, it can drive commerce; it can drive R & D; it can drive the information industry; it can drive information research; it can drive the uninitiated out of business.
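
As a minimal sketch of the "script" case, the Python below parses a well-formed copy of the coffee document with the standard library and turns each child of `<Coffee>` into one instruction. The `send_to_coffeepot` function is a hypothetical stand-in for a real device interface; it only formats strings.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the coffee document, embedded so the sketch is self-contained.
COFFEE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<StartDay Attitude="Iffy">
<Sunrise Time="6:22" AM="Yes"/>
<Coffee Prepare="Or die!" Brand="Starbuck's" Type="Colombian">
<Water Volume="24" UnitMeasure="Ounces">Add cold water.</Water>
<Beans Grind="perc" Type="Java">Grind coffee beans.
Faster, please!!</Beans>
<Grounds>Dump grounds into coffee machine.</Grounds>
<Heat Temperature="152 F">Turn on burner</Heat>
<Brew>Wait, impatiently!!</Brew>
<Dispense MugSize="24" UnitMeasure="Ounces">Pour, finally.</Dispense>
</Coffee>
</StartDay>"""


def coffee_steps(xml_bytes):
    """Return (element name, attributes, instruction text) for each brewing step."""
    root = ET.fromstring(xml_bytes)  # raises ParseError if the input is not well-formed
    steps = []
    for step in root.find("Coffee"):
        # Collapse internal whitespace, such as the line break inside <Beans>.
        text = " ".join((step.text or "").split())
        steps.append((step.tag, dict(step.attrib), text))
    return steps


def send_to_coffeepot(steps):
    """Hypothetical device interface: here it just formats one command per step."""
    return [f"{name}: {text}" for name, _attrs, text in steps]


commands = send_to_coffeepot(coffee_steps(COFFEE_XML))
for line in commands:
    print(line)
```

The parser rejects input that is not well-formed, for example a start tag missing its closing `>` or an element that is never closed, so a consuming program can trust the structure before it acts on it.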

What is XML? To understand XML, you must understand meta data. Meta data is “data about data.” It is an abstraction, layered above the abstraction that is language. Meta data can be characterized as natural or added. To illustrate, consider the following text string, “Is MLB a sport, entertainment, or business?” You, the reader, can guess with some degree of accuracy that this is an article title about Major League Baseball (MLB). Presented out of context, even people are only guessing. Computers have no clue, in or out of context. There are no software systems that can reliably recognize it in a meaningful way.

For this example, it is a newspaper article title. To it we will add subject terms from a controlled vocabulary, identify the author, the date, and add an abstract. As a “well-formed” XML document instance, it is rendered:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<DOC Date="5/21/02" Doctype="Newspaper">
<TI> "Is MLB a sport, entertainment, or business?" </TI>
<Byline> Smith </Byline>
<ST> Sports </ST>
<ST> Entertainment </ST>
<ST> Business </ST>
<AB> Text of abstract...</AB>
<Text> Start of article ...</Text>
</DOC>

In this example, what are the meta data? What is natural and what is added? Natural meta data is information that enhances our understanding of the document and parts thereof, and can be found in the source information. The date, the author’s name, and the title are “natural” meta data. They are an abstraction layer apart from the “data” and add significantly to our understanding of the “data.”

The subject terms, document type, and abstract are “added” meta data. This information also adds to our understanding, but it had to be added by an editor and/or software. The tags are “added” and are meta data. Meta data can be the element tag, the attribute tag, or the values labeled by element and attribute tags. It is the collection of meta data that allows computer software to reliably deal with the data. Meta data facilitates networking across computers, platforms, software, companies, cities, countries.

What is the “data” in this example? The text, tables, charts, figures, and graphs that are contained within the open <Text> and close </Text> tags.
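
The natural/added distinction can be made concrete with a short Python sketch that reads a well-formed copy of the newspaper example and sorts its parts into the three groups discussed above. The grouping follows the article's own classification; the function name and the dictionary keys are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the newspaper example, embedded so the sketch is self-contained.
DOC_XML = b"""<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<DOC Date="5/21/02" Doctype="Newspaper">
<TI> "Is MLB a sport, entertainment, or business?" </TI>
<Byline> Smith </Byline>
<ST> Sports </ST>
<ST> Entertainment </ST>
<ST> Business </ST>
<AB> Text of abstract...</AB>
<Text> Start of article ...</Text>
</DOC>"""


def split_meta_data(xml_bytes):
    """Return (natural meta data, added meta data, data) for the sample document."""
    doc = ET.fromstring(xml_bytes)

    def text_of(tag):
        return (doc.findtext(tag) or "").strip()

    # Natural: already present in the source article.
    natural = {
        "date": doc.get("Date"),
        "author": text_of("Byline"),
        "title": text_of("TI"),
    }
    # Added: supplied afterwards by an editor and/or software.
    added = {
        "doctype": doc.get("Doctype"),
        "subjects": [(st.text or "").strip() for st in doc.findall("ST")],
        "abstract": text_of("AB"),
    }
    # Data: the content between the <Text> tags.
    data = text_of("Text")
    return natural, added, data


natural, added, data = split_meta_data(DOC_XML)
print(natural)
print(added)
print(data)
```

Once the tags are in place, a program no longer has to guess that "Smith" is an author or that "Sports" is a subject term; it simply asks for the element by name.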
