Improving Enterprise Search Using Auto-Categorization: Making the Business Case to Senior Executives

By Marjorie M.K. Hlava and Jay Ven Eman
of Access Innovations, Inc.

This white paper explains why a business case approach matters when proposing auto-categorization and taxonomy to improve corporate search. Corporate librarians and knowledge management leaders understand these solutions, but the executives responsible for budgets and approvals often do not grasp their value.

This paper contrasts presenting a technical resource on its own with presenting a well-thought-out business case when seeking enterprise or department funding. Search is on senior management's radar because of Google and other search systems. The number of knowledge workers has grown enormously, and efficient information throughput is in strong demand. Workers spend more than 25% of their time searching for information (IDC Research, 2008). The average corporation has four search systems, none of which delivers the productivity its workforce needs. Search has therefore become a significant concern for organizations seeking higher business productivity and profits.

This paper outlines how a cohesive taxonomy strategy, well aligned with corporate business needs, becomes a strategic investment that supports staff productivity and the quality of knowledge workers' output. It is also a tactical purchase that strengthens the company’s competitive edge.

There is now a 92% accuracy rating on accounting and regulatory document search, based on hit, miss, and noise (or relevance, precision, and recall) statistics, [using] Access Innovations. – USGAO

Obstacles in optimizing search

The problem with search is that it usually depends on statistics and on immense data processing and storage to produce answers, without paying attention to the language of the user. Corporate intranets, pharmaceutical firms, large database publishers, and magazine and content publishers suffer without well-formed information that clearly indicates conceptual links, provides replicable results, and supports intuitive semantic search. This directly affects knowledge workers’ patience and productivity, with many spending one fourth of their time looking for information rather than using it in creative and strategic ways. Multiplied across tens to hundreds of workers in a large corporation, this lost time significantly undermines the bottom line. When a system does not let users search in their own terminology, it creates small hurdles that, multiplied over many failed searches, become large barriers. The result is a loss of efficiency and flexibility across the entire enterprise.

Agile enterprises must provide a mechanism that automatically translates users’ terms, dialect, or language into well-formed, standard terms. This enables consistent, deep searching, the most effective means of obtaining information with comprehensive recall and accuracy. It also prevents trial-and-error searching that wastes workers’ time. Once the direct and burden costs of each knowledge worker are factored in, the cost savings rapidly become significant.
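To make the cost argument concrete, here is a minimal Python sketch of the arithmetic. The 25% default is the IDC figure cited earlier; the head count, loaded cost per worker, and the fraction of search time recovered (`time_reduction`) are hypothetical inputs chosen for illustration, not data from this paper.

```python
def annual_search_cost(workers: int, loaded_cost: float,
                       search_share: float = 0.25) -> float:
    """Yearly cost of knowledge-worker time spent searching.

    loaded_cost:  salary plus burden (benefits, overhead) per worker per year
    search_share: fraction of working time spent searching
    """
    return workers * loaded_cost * search_share


def annual_savings(workers: int, loaded_cost: float,
                   search_share: float = 0.25,
                   time_reduction: float = 0.5) -> float:
    """Savings if better search cuts search time by `time_reduction` (hypothetical)."""
    return annual_search_cost(workers, loaded_cost, search_share) * time_reduction


# Hypothetical organization: 500 knowledge workers at $100,000 loaded cost each
print(f"{annual_search_cost(500, 100_000):,.0f}")  # 12,500,000
print(f"{annual_savings(500, 100_000):,.0f}")      # 6,250,000
```

Substituting your own head count, loaded cost, and a conservative `time_reduction` gives a first-cut savings figure for the business case.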

Research has shown that most classification systems touted as automatic actually require rules to reach productive levels for production or search. The rules differentiate among the meanings of words so that a document is interpreted correctly. To create and maintain these rules, one needs to build a rich semantic layer and then place a rule-based application over the classification function. Traditional search does not provide this functionality. To support information capture and retrieval at 6, 8, or even 10 times greater productivity, a good taxonomy must provide the search backbone.
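The idea of rules that "differentiate among meanings of words" can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical rule format (a trigger word, context words that must or must not co-occur, and the taxonomy term to assign); it is not the Data Harmony rule syntax, and the sample terms are invented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: assign `term` when `trigger` appears in the right context."""
    trigger: str                          # word that suggests the term
    term: str                             # taxonomy term to assign
    require_any: frozenset = frozenset()  # at least one must co-occur
    exclude_any: frozenset = frozenset()  # any of these vetoes the rule


def tokenize(text: str) -> set:
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text: str, rules: list) -> list:
    """Return the sorted taxonomy terms whose rules fire on `text`."""
    words = tokenize(text)
    terms = set()
    for rule in rules:
        if rule.trigger not in words:
            continue
        if rule.require_any and not (words & rule.require_any):
            continue
        if words & rule.exclude_any:
            continue
        terms.add(rule.term)
    return sorted(terms)


# Two senses of the same word, separated only by context
RULES = [
    Rule("mercury", "Planets",
         require_any=frozenset({"orbit", "planet", "solar"})),
    Rule("mercury", "Heavy metals",
         require_any=frozenset({"toxic", "contamination", "exposure"})),
]

print(categorize("Mercury completes an orbit around the Sun in 88 days.", RULES))  # ['Planets']
print(categorize("Mercury exposure from contaminated fish is toxic.", RULES))      # ['Heavy metals']
```

A keyword match alone would tag both sentences with both terms; the context conditions are what let the same word land in the right branch of the taxonomy.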

IT departments, charged with safeguarding valuable corporate information, need a simple and safe way for users to manage the categorization tools so that IT costs and workload do not keep rising. The current move to Web 2.0 empowers users and lessens the load on IT departments, and collaborative taxonomy management supports Web 2.0 initiatives.

We have moved from fielded Boolean search to faceted search interfaces, but the fundamentals of search still hold. The 1960s gave us the Arpanet and ReCon systems, which gave rise to the Internet and present-day search technologies. Metadata elements grew out of fielded data. The missing piece in today’s search is the taxonomy application. The market challenge is to produce solutions that enhance search through taxonomy and automatic categorization.

IEEE had its system up and running in three days and in full production in less than two weeks. – Institute of Electrical and Electronics Engineers

The American Economic Association said its editors think using it is fun and makes time fly! –American Economic Association (AEA)

The business of auto-categorization and taxonomies

Well-formed data, with clear indication of conceptual semantic links, provides replicable results and intuitive, semantic search. Users search with their own words, removing obstacles to search success and increasing productivity. The system translates non-standard word choices to consistent taxonomy terms, resulting in consistent, deep searching and, ultimately, greater knowledge access and use.
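Translating a user's own words into consistent taxonomy terms is, at its simplest, a lookup from non-preferred entry terms to preferred terms, the USE / USED FOR relationship described in ANSI/NISO Z39.19. The Python sketch below assumes a small invented vocabulary and a greedy longest-phrase match; production systems add stemming, rules, and far larger vocabularies.

```python
# Hypothetical vocabulary: non-preferred entry terms mapped to preferred terms
USE_FOR = {
    "car": "Automobiles",
    "cars": "Automobiles",
    "auto": "Automobiles",
    "automobile": "Automobiles",
    "heart attack": "Myocardial infarction",
    "mi": "Myocardial infarction",
}


def to_preferred_terms(query: str, use_for: dict, max_words: int = 3) -> list:
    """Greedy longest-match translation of query words into preferred terms."""
    words = query.lower().split()
    result = []
    i = 0
    while i < len(words):
        # Try the longest phrase first so "heart attack" beats "heart"
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in use_for:
                result.append(use_for[phrase])
                i += n
                break
        else:
            result.append(words[i])  # no match: keep the user's own word
            i += 1
    return result


print(to_preferred_terms("heart attack risk", USE_FOR))     # ['Myocardial infarction', 'risk']
print(to_preferred_terms("cheap auto insurance", USE_FOR))  # ['cheap', 'Automobiles', 'insurance']
```

Because "car", "auto", and "automobile" all resolve to one preferred term, users who phrase the same need differently reach the same documents.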

To deliver the highest productivity at the lowest total cost of ownership (TCO), a system must provide both semantic interpretation and governing rules linked to a taxonomy. This ensures fast, accurate search regardless of the skill or number of users.

Good corporate compliance systems need to ensure conformity with accepted taxonomy standards. These include ANSI/NISO Z39.19 and standards from the ISO, W3C, British Standards Institution, and other standards-setting organizations.

To minimize costs and provide seamless performance, the categorization system should work both at the content creation, content management, and digital repository end of the information management process and at the search end.

One danger that inhibits seamless performance is out-of-date data schemas, in which critical data is stored in obsolete formats and media. Strategic planning for search must account for migrating this data as technical platforms evolve. Most enterprises handle terabytes of data with an average lifespan of three years. Contingency plans are often inadequate or over capacity, which further exacerbates search inefficiencies, so these huge information stores must be configured to keep the data platform-independent and able to accommodate new technologies.

Value drivers for your project

Business issues and value drivers supporting projected returns are shown here.

The need for a supportive business case

A business case is vital in helping executives justify decisions, especially technical ones. It helps them weigh the technology’s impact against other corporate opportunities, particularly when budgets are limited.

Pairing financial metrics with technical recommendations makes it easier to communicate the expected upstream value. Several industry-leading vendors go further, drawing up contracts in which payment is conditioned on proving delivered value. Accenture, Trilogy, and IBM have established value-based selling as a best practice; soon, it will be an industry standard.

Research shows that, of over 400 software vendors, close to 75% fail to prove their solution’s tangible value. These vendors sell solutions but leave it to the client to build the business value, and that business value must be clearly described in the business case.

A supportive business case must also address technical issues such as enabling semantic search, interlinking data, and using rules.

Many firms use a “discovery” process, where technical and business parties join forces in discovering value in a proposed solution. This collaborative process demonstrates how departmental needs are aligned with business value and IT impact and strengthens your business case.

The following elements are key in assembling a software or services business case:

  1. Value proposition – summarizes the position
  2. Executive summary – brief and bottom line
  3. Risk, impact, and strategic benefit
  4. ROI validation – clear and concise is best
  5. Competitive TCO – for competing vendors
  6. IT impact and support – to build bridges
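The ROI validation and competitive TCO items in this list come down to simple arithmetic that executives can check. The Python sketch below assumes a hypothetical cost breakdown (one-time fees plus yearly support and staff costs) and invented vendor figures; it uses undiscounted totals for clarity.

```python
from dataclasses import dataclass


@dataclass
class VendorCost:
    """Hypothetical cost profile for one competing solution."""
    name: str
    license_fee: float     # one-time purchase or license fee
    implementation: float  # one-time setup, integration, and training
    annual_support: float  # vendor maintenance and support per year
    annual_staff: float    # internal staff time to run the system per year

    def tco(self, years: int) -> float:
        """Total cost of ownership over `years`, undiscounted."""
        return (self.license_fee + self.implementation
                + years * (self.annual_support + self.annual_staff))


def roi(total_benefit: float, total_cost: float) -> float:
    """Simple ROI as a fraction: (benefit - cost) / cost."""
    return (total_benefit - total_cost) / total_cost


vendors = [
    VendorCost("Vendor A", 100_000, 50_000, 20_000, 30_000),
    VendorCost("Vendor B", 60_000, 120_000, 25_000, 40_000),
]
years = 3
benefit = 900_000  # hypothetical 3-year productivity savings

# Rank competing vendors by TCO and show the ROI each would yield
for v in sorted(vendors, key=lambda v: v.tco(years)):
    print(f"{v.name}: TCO {v.tco(years):,.0f}, ROI {roi(benefit, v.tco(years)):.0%}")
```

The vendor with the lower license fee is not necessarily the cheaper one over the life of the system, which is why the TCO comparison belongs in the case. A fuller model would discount future costs and benefits to present value.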

ProQuest CSA has achieved a 7-fold increase in productivity. –ProQuest CSA

Weather Channel finds things 50% faster using Data Harmony. A significant saving in time. –The Weather Channel

Supporting the Metrics

Before automated or assisted metatagging is integrated into your workflow, it should reach a baseline of 85% accuracy, or 15-20% irrelevant returns (noise). When this level is reached, you can potentially see seven-fold increases in productivity and cut search time in half. Reaching these levels gave CSA’s implementation notable credibility.
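Checking the 85% baseline requires counting hits, misses, and noise on a sample of documents indexed both by the system and by a human indexer. This minimal Python sketch assumes those counts are available and treats "accuracy" as precision; that mapping, and the sample counts, are assumptions for illustration.

```python
def tagging_metrics(hits: int, misses: int, noise: int) -> dict:
    """Precision, recall, and noise rate from indexing counts.

    hits:   terms the system assigned that the indexer agrees with
    misses: terms the indexer assigned that the system did not
    noise:  terms the system assigned that the indexer rejects
    """
    assigned = hits + noise   # everything the system proposed
    expected = hits + misses  # everything the human indexer assigned
    precision = hits / assigned if assigned else 0.0
    recall = hits / expected if expected else 0.0
    noise_rate = noise / assigned if assigned else 0.0
    return {"precision": precision, "recall": recall, "noise_rate": noise_rate}


def meets_baseline(metrics: dict, min_precision: float = 0.85) -> bool:
    """Assumes 'accuracy' means precision and tests it against the baseline."""
    return metrics["precision"] >= min_precision


m = tagging_metrics(hits=850, misses=100, noise=150)
print({k: round(v, 3) for k, v in m.items()}, meets_baseline(m))
```

Under these definitions the noise rate is simply 1 minus precision, which is why an 85% accuracy target and a 15% noise ceiling describe the same point.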

Although ROI depends on audience size, audience level, content complexity, and search complexity, there are reliable data points that can be used. This table serves as a guideline when building cost-justification efforts to buy auto-classification and taxonomy solutions.

A guideline when building cost-justification efforts.

The Value Produced

A well-built case will be invaluable when presenting to management or a budgeting committee. It helps your department be viewed as in step with management and supportive of corporate strategic goals. To the owner of the case, the benefits are clear:

  • Projects are better received.
  • Projects are well justified.
  • Projects are viewed as more than “tools.”
  • Projects receive better funding.


This paper has sought to show the importance of a well-thought-out business case. Whether you work with outside vendors or an internal committee, building each aspect of a persuasive business case for a solution’s implementation is ultimately the most successful way to identify your needs and promote your project.

About Access Innovations

Access Innovations, Inc. is a software and services company founded in 1978. It operates under the stewardship of the firm’s principals, Marjorie M.K. Hlava, President and Jay Ven Eman, CEO.

Closely held and financed by organic growth and retained earnings, the company has three main components: a robust services division, the Data Harmony software line, and the National Information Center for Educational Media (NICEM).
