Tag Archives | search

What you should know about Google’s New Privacy Policies

Google is updating its Privacy Policy and Terms of Service on March 1, 2012.  If you use any Google services this update applies to you.  This change will allow Google to support a single set of terms and policies across the entire Googleverse and make things simpler for both users and the search giant.

With a single privacy policy, Google can use information from one service and deliver it to users of another service. This will help users get more out of Google+ by helping them connect with the people they correspond with via Gmail.

“In short, we’ll treat you as a single user across all our products, which will mean a simpler, more intuitive Google experience,” Google notes.



Google and the Evolution of Search

In August, Google posted a video, Another look under the hood of search, that shows some of the things they do to make changes and improvements to their search algorithm.  They followed up last week with a new six-minute video on the evolution of search, summarizing key milestones from the past 10 years and ending with a brief taste of what’s coming next:

The highlights include:

  • Universal Results: finding images, videos, and news, in addition to webpages.
  • Quick Answers: including flight times, sports scores, weather, and others.
  • The Future of Search: their goal is to make searching as easy as thinking.

The full post is here:  The evolution of search in six minutes


Google’s new “Freshness Algorithm”

Last Thursday, Google announced major changes to the way they present search results. The changes are expected to affect up to 35% of all searches. While relevance and currency have always influenced how high websites appear in search results, these changes make them even more important.

As explained by Amit Singhal in a post on the Official Google Blog, Google is making these changes because:

Even if you don’t specify it in your search, you probably want search results that are relevant and recent. If I search for olympics, I probably want information about next summer’s upcoming Olympics, not the 1900 Summer Olympics.

Google is basing the new search on their Caffeine web indexing system, introduced last year, which allows them to crawl and index the web in near real time.

These changes are clearly good for users. The implications for website owners will become clearer with more usage, and Google will no doubt continue to tune the algorithm, but several things are already apparent. Fresh content, including frequent updates, will be even more important, and RSS feeds of your content and date-modified tags will help Google find the updates.
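
One common way to expose date-modified information to search engines is the <lastmod> element of an XML sitemap. The sketch below is only an illustration of that idea, written in Python with placeholder URLs and dates; it is not something prescribed by Google’s announcement.

```python
# Minimal sketch: write a sitemap.xml whose <lastmod> entries tell crawlers
# when each page last changed. URLs and dates are placeholders.
from datetime import date
from xml.sax.saxutils import escape

pages = [
    ("https://www.example.com/", date(2011, 11, 7)),
    ("https://www.example.com/blog/freshness-update", date(2011, 11, 4)),
]

entries = "\n".join(
    f"  <url>\n    <loc>{escape(url)}</loc>\n"
    f"    <lastmod>{modified.isoformat()}</lastmod>\n  </url>"
    for url, modified in pages
)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>"
)

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)
```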

If you’ve been putting it off, now would be a good time to get a comprehensive analysis of your website.


Conference Buzz: Re-inventing Content, Discovery, and Delivery for Today’s Academic Environment

Written for Unlimited Priorities and DCLnews Blog.

NFAIS 2011

Expectations of today’s academic information users have changed as technology has advanced and new technologies have appeared, so many information providers have re-invented their content accordingly. The processes of accessing and delivering information are considerably different than they were even a few years ago. This NFAIS symposium on May 25, 2011 in Philadelphia, PA examined some of the trends and issues that content providers have faced and the changes they have made to their products to accommodate today’s digital and multimedia technologies. The symposium had sessions on re-inventing content from traditional sources, effects of eBooks and eTextbooks on the learning process, and discovery and delivery platforms. It closed with a fascinating systems analysis look at book publishing.

Integration of Video

One of today’s major trends is the integration of video into all types of content. With the appearance of video hosting sites like YouTube, students have come to expect video content to play a prominent part in their education. In response to this demand, Alexander Street Press (ASP) modified its business strategy in order to concentrate on video-enhanced products. Stephen Rhind-Tutt, president of ASP, reported that the company has translated over 20,000 CD-ROMs into streaming media and has also developed a system to transcribe video into text and synchronize the text with the video images, thus allowing users to quickly and easily scan through the text and view only the portions of the video of interest to them. Other examples of video initiatives by publishers include the American Chemical Society, which developed a very successful video course, “Publishing Your Research 101,” that was viewed over 24,000 times in one week, and Pearson, a leading educational publisher, which is adding video and podcasts to its eBook products.

Re-Invention of Content From Traditional Sources

The Retrospective Index to Music Periodicals (RIPM) is one of the few content providers dealing with very old content—in this case, music periodicals from the 1800s up to about 1950. Because of the age of the source material, RIPM has several unique problems not generally faced by today’s information companies, such as the poor condition of the pages, handwritten notes on them, etc. RIPM has overcome these problems, producing a database of over 1.2 million pages that has become a major tool for teaching music. The user interface offers several advanced features, such as spelling suggestions, and even the ability to reconfigure one’s keyboard to accommodate non-Roman character sets.

Search vs. Discovery

Search, long a feature of information systems, has several well-known problems, as Bruce Kiesel, Director of Knowledge Base Management at Thomson Reuters, pointed out. It works best when you know what you are looking for, but it only retrieves documents. It cannot find answers to questions, knowledge, new information, or information spread across multiple documents. Discovery systems are making content increasingly intelligent, and they allow users to find unknown information by serendipity, create document maps, and find entities, concepts, relationships, or opinions. Semantic content enrichment can annotate knowledge, link to similar documents, and use metadata as a springboard to other documents, thus enabling information visualization and more proactive delivery. Thomson has greatly enhanced some of its databases using these techniques.

Re-inventing the Learning Experience

A new generation of electronic book products is changing the learning experience. It is no longer sufficient to simply repurpose printed books into a series of PDF documents. Pearson is using Flash technology in its eBooks, and Wiley has redesigned its WileyPlus product, organizing it by time instead of subject so that students can easily determine where they are in a course and can budget their time effectively. It also includes an “early warning system” that uses time and learning objectives to help students find their weak areas and study more effectively. M&C Life Sciences has overcome some of the well-known problems of publication delays by selling its content as 50- to 100-page eBooks that include animations and video. Because of their small size and rapid publication schedules, these eBooks can be updated quickly and easily as necessary.

What is a Book?

Eric Hellman, founder of Gluejar, closed the day with a fascinating look at the future of book publishing from a systems analysis viewpoint, examining questions such as:

  • Is the future of publishing related to paper and ink, or bits?
  • Will we be working with documents or objects (like software)?
  • What are the objects in our environment and what are the relationships between them?
  • What will users do with the objects?

Systems analysis involves objects and the actions taken on them. In the publishing world, objects are textual data, articles, or photos, and the actions are navigation, sharing, and searching. Hellman compared a newspaper website such as the New York Times with a general news website such as CNN. The analysis shows that both sites have similar objects and actions (with the exception that CNN emphasizes videos), so they are very much alike. In contrast, single articles and videos are not as similar. An article has text, metadata, photos, and some context and can stand on its own; actions on it include searching and scanning through it. A video is usually focused on a single object with only some context; actions on it include playing, pausing, changing the volume, etc. Applying this analysis to eBooks, Hellman suggested that an eBook is more like a video than an article, although some of them work well as websites. He went on to assert that selling objects has many advantages; the best model is to aggregate them and sell subscriptions because it is a good fit with existing book businesses.

More details on this useful and interesting symposium are available on The Conference Circuit blog, and presenters’ slides have been posted on the NFAIS website.


Peeling Back the Semantic Web Onion

Written for Unlimited Priorities and DCLnews Blog.

An Interview with Intellidimension’s Chris Pooley & Geoff Chappell

Chris Pooley is the CEO and co-founder of Intellidimension. His role is to lead its corporate development and business development efforts. Previously, Chris was Vice-President of Business Development at Thomson Scientific and Healthcare where he was responsible for acquisitions and strategic partnerships.

Richard Oppenheim

As stated in the first article, “the Semantic Web is growing and coming to a neighborhood near you.” (Read Richard Oppenheim’s first article here) Since that article, I had a conversation with Chris Pooley, CEO and co-founder of Intellidimension. Chris understands how the web and the Semantic Web work today. So let’s peel back some of the layers surrounding the semantic web onion and bring the hype down to earth.

Chris has spent years working with and developing applications specifically for the semantic web. Along with Geoff Chappell, Intellidimension president, our conversation ranged around the semantics of the Semantic Web and, more importantly, the impact it will have for access to information resources.

The vision of the founding fathers of the World Wide Web Consortium was for information to be accessible easily and in large volume with a process enabling the same information to be used for infinite purposes. For example, a weather forecast may determine whether your family picnic will be in sunshine or needs to be rescheduled. For the farmer the weather forecast is a key to what needs to be done for the planting and harvesting of crops. The retail store owner decides whether to have a special promotion for umbrellas or sunscreen lotion. The same information is used for different questions and actions.

Data publishers of all sizes and categories have information available. These publishers range from newspapers to retail stores to photo albums to travel sites, and a lot more; get the breaking news story, buy a book, connect with family albums, or book a flight. The web provides access to these benefits in endless combinations. The sites are holders of large volumes of data waiting for you to ask a question, or search. The applications are designed for human consumption so that people can find things when they choose to look.

The Semantic Web acts as a kind of information agent: one or more underlying software applications aggregate information and create a unique pipeline of data for each specific user.

The foundation of the Semantic Web is all about relationships between any two items…

One of the key attributes of the web is that we can link any number of individual pages together. You can be on one page and click to go to another page of your choice. You can send an email that has an embedded link to allow the reader one click access to a specific page.

Chris emphasizes that the Semantic Web is not about links between web pages. “The foundation of the Semantic Web is all about relationships between any two items,” says Chris. Tuesday’s weather has a relationship to a 2pm Frontier flight leaving from Denver. Mary’s booking on that flight means that her ticket and seat assignment have relationships to it as well. In the Semantic Web sense, there is a relationship between Tuesday’s weather and Mary.

The growth of the Semantic Web will expand the properties of things to include many elements, such as price, age, meals, destination, and so on. The language for describing this information and associated resources on the web is the Resource Description Framework (RDF). Putting information into RDF files makes it possible for computer programs (“web spiders”) to search, discover, pick up, collect, analyze, and process information from the web. The Semantic Web uses RDF to describe web resources.
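
To make the idea of machine-readable relationships concrete, here is a minimal sketch using the open-source rdflib library for Python. The example.org namespace and the flight, weather, and passenger resources are invented for illustration; they are not part of any real vocabulary.

```python
# A minimal sketch of RDF triples using rdflib. The example.org namespace
# and the resources below are hypothetical.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# Each statement is a (subject, predicate, object) triple: a relationship
# between two items, which is the core idea of the Semantic Web.
g.add((EX.Flight123, EX.departsFrom, EX.Denver))
g.add((EX.Flight123, EX.departureTime, Literal("Tuesday 2:00 PM")))
g.add((EX.TuesdayForecast, EX.forecastFor, EX.Denver))
g.add((EX.Mary, EX.holdsTicketFor, EX.Flight123))
g.add((EX.Mary, EX.seatAssignment, Literal("14C")))

# Serialize as Turtle so both people and "web spiders" can read it.
print(g.serialize(format="turtle"))
```

From triples like these, a program can follow the chain from Tuesday’s forecast to Denver to Flight123 to Mary, which is exactly the kind of relationship described above.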

For end users, the continued adoption of the Semantic Web technologies will mean that when they search for product comparisons they will find more features in the comparisons which should make the process easier, faster, and provide better results.

Whether you seek guidance from the Guru on the mountain top or the Oracle at Delphi, information will range from numbers to statistical charts, from words to books, from images to photo albums, from medication risks to medical procedure analysis to doctor ratings.

Chris Pooley states, “For end users, the continued adoption of the Semantic Web technologies will mean that when they search for product comparisons they will find more features in the comparisons which should make the process easier, faster, and provide better results. For a business user or enterprise the benefits will be huge. By building Semantic Web enabled content, businesses will be able to leverage their former content silos; and the cost of making changes or adding new data elements (maintaining their content) will be reduced while flexibility will be improved, by using the rules-based approach for Semantic Web projects.”

With this vast increase in data volume, users need to be certain they can trust the data being retrieved. As part of the guidelines for proper use of the Semantic Web, we need to establish base levels of reliability for the sources being accessed. This requires some learning and practice to determine what maps appropriately to the level of accuracy needed. A weather forecast can be off by a few degrees; sending a space vehicle to Mars requires far greater accuracy, since being off even one degree will cause the vehicle to miss its intended target.

Both end users and enterprise users will learn new ways to pay attention to the data validity. Trusting the source may require a series of steps that includes tracking the information over an extended time period. This learning process will also include a clear explanation of why that information is out there. For example, a company’s historical financial information is not the same as the company’s two year marketing forecast.

There is a chicken-and-egg aspect to the growing accessibility of data. More data means more opportunity to collect valuable information. It also means that more care needs to be exercised to identify and separate meaningful, relevant data from data noise. For example, the retailer Best Buy has started down this path by collecting 60% more bits of information from user clicks on its website. This enriched data delivers added value to the retailer, supporting more accurate and timely business decisions about products and selling techniques.

One of the intoxicating things about the web is that the vast majority of data, entertainment and resources are all free to anyone with an internet connection. While Chris acknowledges the current state of free resources, he also anticipates that in the future, there will likely be a need for some fee structure for the aggregator of content. With data demand growing exponentially, there will be a corresponding demand for huge increases in both storage capacity and internet bandwidth. The Semantic Web will require more big data mines and faster communications.

There is a significant difference between infrastructure and the applications that ride on it. A bridge is constructed so that cars can use the span to get from one side to the other; its structure must carry the entire weight of the bridge, while the weight of all the cars crossing at any one moment is insignificant by comparison.

Chris Pooley’s company, Intellidimension, builds infrastructure products delivering a useful and usable bridge for enterprise users. These users then create aggregating and solution-oriented applications that travel along the appropriately named information superhighway. Chris says, “The evolving Semantic Web technologies will offer benefits for the information producer and the information user that will enrich and enlarge what we see and how we see it.”

About the Author

Richard Oppenheim, CPA, blends business, technology, and writing competence with a passion for helping individuals and businesses get unstuck from the obstacles preventing them from moving ahead. He is a member of the Unlimited Priorities team. Contact him by e-mail or follow him on Twitter at twitter.com/richinsight.


Conference Buzz: Discovery Systems at Internet Librarian

Written for Unlimited Priorities and DCLnews Blog.

A major topic at current conferences is “discovery,” and this was certainly true at the recent Internet Librarian (IL) conference in Monterey, CA on October 25-27. So what is discovery and how does it differ from other forms of search?

Internet Librarian 2010

Information users are trying to access more and more types of information, both in locally stored databases like library catalogs and in external commercial systems, and increasingly they want to do it with a single Google-like search box. The problem is that content is often siloed in many different databases, and it comes in a wide variety of formats. Most users therefore become frustrated in finding the information they need because they do not know where the information resides, what the database is called, or how to access it.

The first attempts to help users took the form of “federated search,” in which the user’s search query was presented to several databases in turn, and then the results were aggregated and presented in a single set. The problem with federated search was that the user had to select the databases to be searched, and then had to wait for the sequential searches to be performed and processed, which could result in long response times.
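
As a rough illustration of that sequential approach (not any vendor’s implementation; the connectors and records below are invented), a federated search queries each selected source in turn and then merges and de-duplicates whatever comes back:

```python
# Minimal sketch of sequential federated search: query each source in turn,
# then merge and de-duplicate the results. Sources and records are invented.
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    source: str

def search_catalog(query):      # stand-in for a library catalog connector
    return [Record("Introduction to Information Retrieval", "Catalog")]

def search_vendor_db(query):    # stand-in for a commercial database connector
    return [Record("Introduction to Information Retrieval", "VendorDB"),
            Record("Modern Search Engines", "VendorDB")]

def federated_search(query, connectors):
    merged, seen = [], set()
    for connector in connectors:         # sequential: total time is the sum
        for record in connector(query):  # of every source's response time
            key = record.title.lower()
            if key not in seen:          # naive de-duplication by title
                seen.add(key)
                merged.append(record)
    return merged

print(federated_search("information retrieval", [search_catalog, search_vendor_db]))
```

The inner loop makes the weakness visible: each connector has to finish before the next one starts, so one slow database slows the whole search.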

In current discovery systems, a unified index of terms from a wide variety of databases is constructed by the system, and the queries are processed against it. Several discovery systems can aggregate the results and remove duplicate items. This approach has significant advantages over federated search systems:

  • The user does not need to know anything about the databases being searched,
  • Because only one search is performed, searches are faster, and
  • The same interface can be used to search for information from all databases.
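
To make the contrast concrete, here is a minimal sketch of the unified-index idea described above (invented records, a crude AND match, and no relevance ranking; real discovery systems use full inverted indexes and far richer record handling):

```python
# Minimal sketch of the discovery approach: build one unified index up front,
# then answer each query with a single search. Records are invented.
def build_unified_index(sources):
    index = {}                                   # term -> list of records
    for source_name, records in sources.items():
        for record in records:
            for term in record["title"].lower().split():
                index.setdefault(term, []).append({**record, "source": source_name})
    return index

def discovery_search(index, query):
    hits = [index.get(term, []) for term in query.lower().split()]
    if not hits:
        return []
    # keep titles matching every query term (a crude AND search)
    titles = set.intersection(*[{h["title"] for h in hit_list} for hit_list in hits])
    return sorted(titles)

sources = {
    "Catalog":  [{"title": "Introduction to Information Retrieval"}],
    "VendorDB": [{"title": "Modern Search Engines"},
                 {"title": "Information Retrieval Evaluation"}],
}
index = build_unified_index(sources)
print(discovery_search(index, "information retrieval"))
```

Because the expensive work happens when the index is built, each user query becomes a single fast lookup, which is where the speed advantage over federated search comes from.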

A major issue with discovery systems is that considerable effort (and time!) must be expended to install a system in an organization and customize it so that it covers only those resources to which the organization actually has access.

Activity in discovery systems is currently intense, and several competitors are vying for market share. Examples are Summon from ProQuest, EBSCO Discovery Service, WorldCat Local from OCLC, and Primo from Ex Libris. One might ask if discovery systems are better than Google Scholar. One panel at IL looked at this question, and the answer is that they appear to be, but further investigation is needed. And because these systems are newly developed, problems with installation and the relevance of results have not all been solved yet. But it is clear that discovery is an exciting advance in searching, and we can expect further advances to come rapidly in the near future.


Interview with Deep Web Technologies’ Abe Lederman

Written for Unlimited Priorities and DCLnews Blog by Barbara Quint.

Abe Lederman

Abe Lederman is President and CEO of Deep Web Technologies, a software company that specializes in mining the deep web.

Barbara Quint: So let me ask a basic question. What is your background with federated search and Deep Web Technologies?

Abe Lederman: I started in information retrieval way back in 1987. I’d been working at Verity for 6 years or so, through the end of 1993. Then I moved to Los Alamos National Laboratory, one of Verity’s largest customers. For them, I built a Web-based application on top of the Verity search engine that powered a dozen applications. Then, in 1997, I started consulting to the Department of Energy’s Office of Science and Technology Information. The DOE’s Office of Environmental Management wanted to build something to search multiple databases. Then, we called it distributed search, not federated search.

The first application I built is now called the Environmental Science Network. It’s still in operation almost 12 years later. The first version I built with my own fingers on top of a technology devoted to searching collections of Verity documents. I expanded it to search on the Web. We used that for 5 to 6 years. I started Deep Web Technologies in 2002 and around 2004 or 2005, we launched a new version of federated search technology written in Java. I’m not involved in writing any of that any more. The technology in operation now has had several iterations and enhancements and now we’re working on yet another generation.

BQ: How do you make sure that you retain all the human intelligence that has gone into building the original data source when you design your federated searching?

AL: One of the things we do that some other federated search services are not quite as good at is to try to take advantage of all the abilities of our sources. We don’t ignore metadata on document type, author, date ranges, etc. In many cases, a lot of the databases we search — like PubMed, Agricola, etc. — are very structured.

BQ: How important is it for the content to be well structured? To have more tags and more handles?

AL: The more metadata that exists, the better results you’re going to get. In the library world, a lot of data being federated does have all of that metadata. We spend a lot of effort to do normalization and mapping. So if the user wants to search a keyword field labeled differently in different databases, we do all that mapping. We also do normalization of author names in different databases — and that takes work! Probably the best example of our author normalization is in Scitopia.
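
As a generic illustration of the kind of mapping being described (not Deep Web Technologies’ actual code; the field labels are simplified stand-ins), a connector might translate a unified search field into each source’s own label before sending the query:

```python
# Generic sketch of field-name mapping across sources. The labels are
# simplified stand-ins, not any vendor's real schema.
FIELD_MAP = {
    "pubmed":   {"author": "AU", "title": "TI", "date": "DP"},
    "agricola": {"author": "author", "title": "title", "date": "pub_date"},
}

def build_native_query(source, fields):
    """Rewrite {'author': 'smith j'} into the labels a given source expects."""
    mapping = FIELD_MAP[source]
    return {mapping[name]: value for name, value in fields.items()}

print(build_native_query("pubmed",   {"author": "smith j", "date": "2009"}))
# -> {'AU': 'smith j', 'DP': '2009'}
print(build_native_query("agricola", {"author": "smith j", "date": "2009"}))
# -> {'author': 'smith j', 'pub_date': '2009'}
```

Author-name normalization is the harder half of the job, since the same person can appear in different forms in different databases.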

BQ: How do you work with clients? Describe the perfect client or partner.

AL: I’m very excited about a new partnership with Swets, a large global company. We have already started reselling our federated search solutions through them. Places we’re working with include the European Space Agency and soon the European Union Parliament, as well as some universities.

We pride ourselves on supplying very good customer support. A lot of our customers talk to me directly. We belong to a small minority of federated search providers that can both sell a product to a customer for internal deployment and still work with them to monitor or fix any issues with connectors to the sources we’re federating, even though we get no direct access. A growing part of our business uses the SaaS model. We’re seeing a lot more of that. There’s also the hybrid approach, such as that used by DOE’s OSTI. At OSTI our software runs on servers in Oak Ridge, Tennessee, but we maintain all their federated search applications. Stanford University is another example. In September we launched a new app that federates 28 different sources for their schools of science and engineering.

BQ: How are you handling new types of data, like multimedia or video?

AL: We haven’t done that so far. We did make one attempt to build a federated search for art image databases, but, unfortunately for the pilot project, the databases had poor metadata and search interfaces. So that particular pilot was not terribly successful. We want to go back to reach richer databases, including video.

BQ: How do you gauge user expectations and build that into your work to keep it user-friendly?

AL: We do track queries submitted to whatever federated search applications we are running. We could do more. We do provide Help pages, but probably nobody looks at them. Again, we could do more to educate customers. We do tend to be one level removed from end-users. For example, Stanford’s people have probably done a better job than most customers in creating some quick guides and other material to help students and faculty make better use of the service.

BQ: How do you warn (or educate) users that they need to do something better than they have, that they may have made a mistake? Or that you don’t have all the needed coverage in your databases?

AL: At the level of feedback we are providing today, we’re not there yet. It’s a good idea, but it would require pretty sophisticated feedback mechanisms. One of the things we have to deal with is that when you’re searching lots of databases, they behave differently from each other. Just look at dates (and it’s not just dates): some may not let you search on a date range. A user may want to search 2000-2010, and some databases may display the date but not let you search on it; some won’t do either. Where a database doesn’t let you search on a date range but displays it, you may get results outside that range, and we display them with the unranked results. How to make it clear to the user what is going on is a big thing for the future.

BQ: What about new techniques for reaching “legacy” databases, like the Sitemap Protocol used by Google and other search engines?

AL: That’s used for harvesting information the way that Google indexes web sites. The Sitemap Protocol is used to index information and doesn’t apply to us. Search engines like Google are not going into the databases, not like real-time federated search. Some content owners want to expose all or some of the content sitting behind their search forms to search engines like Google. That could include DOE OSTI’s Information Bridge and PubMed for some content. They do expose that content to Google through sitemaps. A couple of years ago, there was lots of talk about Google statistically filling out forms for content behind databases. In my opinion, they’re doing this in a very haphazard manner. That approach won’t really work.

BQ: Throughout the history of federated search — with all its different names, there have been some questions and complaints about the speed of retrieving results and the completeness of those results from lack of rationalizing or normalizing alternative data sources. Comments?

AL: We’re hearing a fair amount of negative comments on federated search these days, and there have been a lot of poor implementations. For example, federated search gets blamed for being really slow, but that is probably because most federated search systems wait until each search is complete before displaying any results to the user. We’ve pioneered incremental search results. In our version, results appear within 3-4 seconds. We display whatever results have returned, while, in the background, our server is still processing and ranking results. At any time, the user can ask for a merging of the results they haven’t yet seen. So the user gets a good experience.
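
As a generic sketch of the incremental pattern Lederman describes (not Deep Web Technologies’ implementation; the sources, delays, and results are invented), hits can be shown as each source responds instead of waiting for the slowest one:

```python
# Generic sketch of incremental federated results: display each source's hits
# as soon as they arrive, while slower sources keep working in the background.
# Sources, delays, and results are invented for illustration.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_source(name, delay, hits):
    def search(query):
        time.sleep(delay)               # simulate a slow remote database
        return name, [f"{query}: {h}" for h in hits]
    return search

sources = [
    make_source("FastDB", 0.2, ["result A"]),
    make_source("MediumDB", 1.0, ["result B", "result C"]),
    make_source("SlowDB", 3.0, ["result D"]),
]

def incremental_search(query):
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(search, query) for search in sources]
        for future in as_completed(futures):     # yields in completion order
            name, hits = future.result()
            print(f"[{name}] {hits}")            # show results as they arrive

incremental_search("solar energy")
```

The user sees the fast sources’ results within a fraction of a second, while merging and ranking of the stragglers can continue behind the scenes.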

BQ: If the quality of the search experience differs so much among different federated search systems, when should a client change systems?

AL: We’ve had a few successes with customers moving from one federated search service to ours. The challenge is getting customers to switch. We realize there’s a fairly significant cost in switching, but, of course, we love to see new customers. For customers getting federated search as a service, it costs less than if the product were installed on site. So that makes it more feasible to change.

BQ: In my last article about federated searching, I mentioned the new discovery services in passing. I got objections from some people about my descriptions or, indeed, equating them with federated search at all. People from ProQuest’s Serials Solutions told me that their Summon was different because they build a single giant index. Comments?

AL: There has certainly been a lot of talk about Summon. If someone starts off with a superficial look at Summon, it has a lot of positive things. It’s fast and maybe does (or has the potential to do) a better job at relevance ranking. It bothers me that it is non-transparent on a lot of things. Maybe customers can learn more about what’s in it. Dartmouth did a fairly extensive report on Summon after over a year of working with it. The review was fairly mixed: lots of positives and comments that it looks really nice, lots of bells and whistles in terms of limiting searches to peer-reviewed, full-text-available, or library-owned and licensed content. But beneath the surface, a lot is missing. It’s lagging behind on indexing. We can do things quicker than Summon. I’ve heard about long implementation times for libraries trying to get their own content into Summon. In federated searching, it only takes us a day or two to add someone’s catalog into the mix. If they have other internal databases, we can add them much quicker.

BQ: Thanks, Abe. And lots of luck in the future.

Related Links

Deep Web Technologies – www.deepwebtech.com
Scitopia – www.scitopia.org
Environmental Science Network (ESNetwork) – www.osti.gov/esn

About the Author

Barbara Quint of Unlimited Priorities is editor-in-chief of Searcher: The Magazine for Database Professionals. She also writes the “Up Front with bq” column in Information Today, as well as frequent NewsBreaks on Infotoday.com.


Federated Searching: Good Ideas Never Die, They Just Change Their Names

Written by Barbara Quint for Unlimited Priorities and DCLnews Blog.

“I don’t want to search! I want to find!!” “Just give me the answer, but make sure it’s right and that I’m not missing anything.” In a world of end-user searchers, that’s what everyone wants, a goal that can explain baldness among information industry professionals and search software engineers. Tearing your hair out isn’t good for the scalp.

And, for once, Google can’t solve the problem. Well, at least, not all the problems. The Invisible or Dark or Deep Web, whatever you call the areas of the Web where legacy databases reside with interfaces old when the Internet was young, where paywalls and firewalls block the paths to high-quality content, where user authentication precedes any form of access — here lie the sources that end-users may need desperately and that information professionals, whether librarians or IT department staff, work to provide their clients.

As the Internet and its Web took over the online terrain, different names emerged, such as portal searching and — the winner in recent years — federated searching.

The challenge of enabling an end-user searcher community to extract good, complete results from numerous, disparate sources with varying data content, designs, and protocols is nothing new. Even back in the days when only professional searchers accessed online databases, searchers wanted some way to find answers in multiple files without having to slog through each database one at a time. In those days, the solution was called multi-file or cross-file searching, e.g. Dialog OneSearch or files linked via Z39.50 (the ANSI/NISO standard for data exchange). As the Internet and its Web took over the online terrain, different names emerged, such as portal searching and — the winner in recent years — federated searching.

So what does federated searching offer? It takes a single, simple (these days, usually Google-like) search query and transforms it into whatever format is needed to tap into each file in a grouping of databases. It then extracts the records, manipulates them to improve the user experience (removing duplicates, merging by date or relevance, clustering by topic, etc.), and returns the results to the user for further action. The databases tapped may include both external databases, e.g. bibliographic/abstract databases, full-text collections, web search engines, etc., and internal or institutional databases, e.g. library catalogs or corporate digital records.

The key difference in federated searching is that it usually involves separate journeys by systems to access collections located in different sites.

In a sense, all databases that merge multiple sources, whether Google tracking the Open Web or ProQuest or Gale/Cengage aggregating digital collections of journals and newspapers or Factiva or LexisNexis building search services collections from aggregators and publishers, offer a uniform search experience for searching multiple sources. Even accessing legacy systems that use rigid interfaces is no longer unique to federated services as Google, Microsoft, and other services have begun to apply the open source Sitemap Protocol to pry open the treasures in government and other institutional databases. The key difference in federated searching is that it usually involves separate journeys by systems to access collections located in different sites. This can mean problems in scalability and turnaround speed, if a system gets bogged down by a slow data source.

A good federated system has to know just how each field in each database is structured and how to transform a search query to extract the needed data.

More important, however, are the problems of truly making the systems perform effectively for end-users. Basically, a lot of human intelligence and expertise, not to mention sweat and persistent effort, has to go into these systems to make them “simple” and effective for users. For example, most of the databases have field structures where key metadata resides. A good federated system has to know just how each field in each database is structured and how to transform a search query to extract the needed data. Author or name searching alone involves layers of questions. Do the names appear firstname-lastname or lastname-comma-firstname? Are there middle names or middle initials? What separates the components of the names — periods, periods and spaces, just spaces? The list goes on and on — and that’s just for one component.
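
As a toy illustration of just the author-name problem (invented, and far simpler than what a production system needs), a normalizer might reduce several common forms to one canonical key before matching:

```python
# Toy sketch of author-name normalization: reduce common variants to one
# canonical "lastname, first-initial" key. Real systems handle far more cases.
import re

def normalize_author(name):
    name = name.strip()
    if "," in name:                      # "Lederman, Abe" or "LEDERMAN, A."
        last, first = [p.strip() for p in name.split(",", 1)]
    else:                                # "Abe Lederman" or "A. Lederman"
        parts = name.split()
        last, first = parts[-1], " ".join(parts[:-1])
    initial = re.sub(r"[^a-z]", "", first.lower())[:1]  # keep first initial only
    return f"{last.lower()}, {initial}"

variants = ["Lederman, Abe", "Abe Lederman", "A. Lederman", "LEDERMAN, A."]
print({normalize_author(v) for v in variants})   # -> {'lederman, a'}
```

Even this toy version shows why the work never really ends: every new database brings its own conventions, and the mappings have to be maintained as sources change.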

So how do federated search services handle these problems? In an article written by Miriam Drake that appeared in the July-August 2008 issue of Searcher entitled “Federated Search: One Simple Query or Simply Wishful Thinking,” a leading executive of a federated service selling to library vendors was quoted as saying, “We simply search for a text string in the metadata that is provided by the content providers – if the patron’s entry doesn’t match that of the content provider, they may not find that result.” Ah, the tough luck approach! In contrast, Abe Lederman, founder and president of Deep Web Technologies (www.deepwebtech.com), a leading supplier of federated search technology, responded about his company’s work with Scitopia, a federated service for scientific scholarly society publishers, “We spend a significant amount of effort to get it as close to being right as possible for Scitopia where we had much better access to the scientific societies that are content providers. It is not perfect and is still a challenge. The best we can do is transformation.”

A good federated system imposes a tremendous burden on its builders so that the search process feels effortless to users.

Bottom line, technology is great, but really good federated services depend on human character as much as, or more than, technological brilliance. The people behind the federated service have to be willing and able to track user experience, analyze user needs, find and connect the right sources, build multiple layers of interfaces to satisfy user preferences and abilities, and then tweak, tweak, tweak until it works right for the user and keeps on working right despite changes in database policies and procedures. A good federated system imposes a tremendous burden on its builders so that the search process feels effortless to users.

By the way, the name changes are apparently not over. A new phrase has emerged for something that looks a lot like same old/same old: discovery services. EBSCO Discovery Service, ProQuest’s Serials Solutions’ Summon, Ex Libris’ Primo, etc. These products focus on the library market and all build on a federated search approach. The main difference that I can distinguish – beyond different content types and sources – lies in the customization features they offer. Librarians licensing the services can do a lot of tweaking on their own. Some of the services even support a social networking function. That could help a lot, since, in this observer’s humble opinion, the most critical element in success for these services, no matter what you call them, lies in the application of human intelligence and a commitment to quality.

About the Author

Barbara Quint of Unlimited Priorities is editor-in-chief of Searcher: The Magazine for Database Professionals. She also writes the “Up Front with bq” column in Information Today, as well as frequent NewsBreaks on Infotoday.com.
