Federated Searching: Good Ideas Never Die, They Just Change Their Names

Written by Barbara Quint for Unlimited Priorities and DCLnews Blog.

“I don’t want to search! I want to find!!” “Just give me the answer, but make sure it’s right and that I’m not missing anything.” In a world of end-user searchers, that’s what everyone wants, a goal that can explain baldness among information industry professionals and search software engineers. Tearing your hair out isn’t good for the scalp.

And, for once, Google can’t solve the problem. Well, at least, not all the problems. The Invisible or Dark or Deep Web, whatever you call the areas of the Web where legacy databases reside behind interfaces that were old when the Internet was young, where paywalls and firewalls block the paths to high-quality content, where user authentication precedes any form of access — here lie the sources that end-users may need desperately and that information professionals, whether librarians or IT department staff, work to provide for their clients.

The challenge of enabling an end-user searcher community to extract good, complete results from numerous, disparate sources with varying data content, designs, and protocols is nothing new. Even back in the days when only professional searchers accessed online databases, searchers wanted some way to find answers in multiple files without having to slog through each database one at a time. In those days, the solution was called multi-file or cross-file searching, e.g., Dialog OneSearch or files linked via Z39.50, the ANSI/NISO information retrieval standard. As the Internet and its Web took over the online terrain, different names emerged, such as portal searching and — the winner in recent years — federated searching.

So what does federated searching offer? It takes a single, simple (these days, usually Google-like) search query and transforms it into whatever format is needed to tap into each file in a grouping of databases. It then extracts the records, manipulates them to improve the user experience (removing duplicates, merging by date or relevance, clustering by topic, etc.), and returns the results to the user for further action. The databases tapped may include both external databases, e.g. bibliographic/abstract databases, full-text collections, web search engines, etc., and internal or institutional databases, e.g. library catalogs or corporate digital records.
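
For the programmatically inclined, here is a minimal sketch of that pipeline. The connector class, field names, and merge rules are purely illustrative assumptions, not any vendor’s actual design; they simply trace the transform, extract, de-duplicate, and merge steps just described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    title: str
    source: str
    date: str  # ISO date string, e.g. "2008-07-01"

class Connector:
    """One hypothetical connector per database; subclasses fill in the details."""
    name = "generic"

    def translate(self, query: str) -> str:
        # Rewrite the user's simple query into this source's native syntax.
        return query

    def search(self, native_query: str) -> list[Record]:
        raise NotImplementedError

def federated_search(query: str, connectors: list[Connector]) -> list[Record]:
    """Send one query everywhere, then clean up the combined result set."""
    results: list[Record] = []
    for connector in connectors:
        native_query = connector.translate(query)       # transform the query
        results.extend(connector.search(native_query))  # extract the records
    # De-duplicate on title; a real system would match much more fuzzily.
    seen, unique = set(), []
    for rec in results:
        key = rec.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    # Merge by date, newest first; relevance ranking or clustering could go here.
    return sorted(unique, key=lambda rec: rec.date, reverse=True)
```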

In a sense, all databases that merge multiple sources offer a uniform experience for searching across them, whether it is Google tracking the open Web, ProQuest or Gale/Cengage aggregating digital collections of journals and newspapers, or Factiva or LexisNexis building search services from aggregator and publisher content. Even accessing legacy systems with rigid interfaces is no longer unique to federated services, as Google, Microsoft, and other services have begun to apply the open Sitemap Protocol to pry open the treasures in government and other institutional databases. The key difference in federated searching is that it usually involves separate journeys by systems to access collections located at different sites. That can mean problems with scalability and turnaround speed if the system gets bogged down by a slow data source.
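
One common way to keep a single slow collection from dragging down the whole response is to fan the query out to all sources in parallel and simply drop any source that misses a deadline. The sketch below assumes hypothetical connector objects with a search() method, like the ones above, and uses a plain thread pool; it illustrates the trade-off, not how any particular federated product works.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def fan_out(query, connectors, timeout=5.0):
    """Query every remote collection in parallel; skip any source that is too slow."""
    pool = ThreadPoolExecutor(max_workers=max(len(connectors), 1))
    futures = [pool.submit(connector.search, query) for connector in connectors]
    done, not_done = wait(futures, timeout=timeout)

    results = []
    for future in done:
        try:
            results.extend(future.result())
        except Exception:
            pass  # one broken source should not sink the whole result set
    for future in not_done:
        future.cancel()  # too slow this round; its records are simply left out

    pool.shutdown(wait=False, cancel_futures=True)  # cancel_futures needs Python 3.9+
    return results
```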

More important, however, are the problems of truly making these systems perform effectively for end-users. Basically, a lot of human intelligence and expertise, not to mention sweat and persistent effort, has to go into these systems to make them “simple” and effective for users. For example, most of the databases have field structures where key metadata resides. A good federated system has to know just how each field in each database is structured and how to transform a search query to extract the needed data. Author or name searching alone involves layers of questions. Do the names appear firstname-lastname or lastname-comma-firstname? Are there middle names or middle initials? What separates the components of the names — periods, periods and spaces, just spaces? The list goes on and on — and that’s just for one component.
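
To make that one component concrete, here is a hedged sketch of the sort of normalization a connector might apply to author names before matching. The heuristics are illustrative guesses, not any database’s real rules, and a production system would need many more of them.

```python
import re

def normalize_author(raw: str) -> tuple[str, str]:
    """Reduce one author string to a (lastname, initials) key for matching.

    Copes with "Quint, Barbara", "Barbara Quint", and "B. M. Quint"; real
    databases need far more rules than these illustrative ones.
    """
    raw = raw.strip()
    if "," in raw:                          # lastname-comma-firstname
        last, _, first = raw.partition(",")
    else:                                   # firstname [middle names] lastname
        parts = raw.split()
        last, first = parts[-1], " ".join(parts[:-1])
    # Collapse periods, spaces, and middle names down to bare initials.
    initials = "".join(w[0].upper() for w in re.split(r"[.\s]+", first.strip()) if w)
    return last.strip().lower(), initials

# Different source formats collapse to the same matching key:
assert normalize_author("Quint, Barbara") == ("quint", "B")
assert normalize_author("Barbara Quint") == ("quint", "B")
```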

So how do federated search services handle these problems? In “Federated Search: One Simple Query or Simply Wishful Thinking,” an article by Miriam Drake in the July-August 2008 issue of Searcher, a leading executive of a federated service selling to library vendors was quoted as saying, “We simply search for a text string in the metadata that is provided by the content providers – if the patron’s entry doesn’t match that of the content provider, they may not find that result.” Ah, the tough-luck approach! In contrast, Abe Lederman, founder and president of Deep Web Technologies (www.deepwebtech.com), a leading supplier of federated search technology, described his company’s work with Scitopia, a federated service for scientific scholarly society publishers: “We spend a significant amount of effort to get it as close to being right as possible for Scitopia where we had much better access to the scientific societies that are content providers. It is not perfect and is still a challenge. The best we can do is transformation.”

Bottom line: technology is great, but really good federated services depend on human character as much as, or more than, on technological brilliance. The people behind a federated service have to be willing and able to track the user experience, analyze user needs, find and connect the right sources, build multiple layers of interfaces to satisfy user preferences and abilities, and then tweak, tweak, tweak until it works right for the user and keeps on working right despite changes in database policies and procedures. A good federated system imposes a tremendous burden on its builders so that the search process feels effortless to its users.

By the way, the name changes are apparently not over. A new phrase has emerged for something that looks a lot like same old/same old: discovery services. EBSCO Discovery Service, ProQuest’s Serials Solutions’ Summon, ExLibris’ Primo, etc. These products focus on the library market and all build on a federated search approach. The main difference that I can distinguish – beyond different content types and sources – lies in the customization features they offer. Librarians licensing the services can do a lot of tweaking on their own. Some of the services even support a social networking function. That could help a lot, since, in this observer’s humble opinion, the most critical element in success for these services, no matter what you call them, lies in the application of human intelligence and a commitment to quality.

About the Author

Barbara Quint of Unlimited Priorities is editor-in-chief of Searcher: The Magazine for Database Professionals. She also writes the “Up Front with bq” column in Information Today, as well as frequent NewsBreaks on Infotoday.com.
