
No. 8, Issue 1, JSTORNEWS, February 2004

Metasearching JSTOR

JSTOR has a dual mission: to be a trusted electronic archive of important scholarly literature, and to extend access to that archive as broadly as possible. In the last few years, we have implemented initiatives guided by both parts of our mission. In terms of access, we have initiated accessibility improvements to our website and search engine, we have provided more seamless access to citation management software, and we have facilitated article-level linking with publishers, libraries, scholars, and commercial resource providers. We realize that students and scholars want to get to information as easily as possible, and so we are committed to making access to the JSTOR archive as convenient as possible.

As part of this effort, we have been following recent developments in metasearching (also called broadcast searching or federated searching). As you may know, metasearch tools provide a single point of access to the electronic content available at a particular library, allow a user or library to search disparate electronic resources at the same time, and then aggregate the result sets. In early 2003, the National Information Standards Organization (NISO) announced an initiative to develop guidelines and standards for the metasearching environment. In October 2003, JSTOR participated in a NISO-sponsored workshop on metasearching, and has since been asked to join the committee drafting a set of standards for metasearch protocols of service and support.

While JSTOR does not offer a "broadcast search" function itself, the archive is the target of many metasearch engine requests. Presently, JSTOR has no formal agreements in place with any of the metasearch engine providers or implementers. This is not an optimal situation, for many business and user support reasons, and we are working hard to rectify it. We believe that metasearching can be beneficial, especially for the undergraduate community, and we fully support our participants' efforts to implement metasearch services. However, we (as a community) should not be comfortable with the idea of sacrificing quality for the sake of expedience. Our work with the NISO Metasearch Initiative committee is focused on improving the quality of the metasearching efforts to date, offering an example of effective partnership agreements in these endeavors, and helping to create reliable service guidelines among all parties so that end-user support is as seamless as possible.

At the recent JSTOR Participants Meeting at the ALA Midwinter Meeting in San Diego, Bruce Heterick, JSTOR's Director of Library Relations, outlined several of the technology, user support, and policy considerations that are important in supporting metasearching.

Technology

The current technique employed by metasearch tools - issuing HTTP requests to JSTOR, retrieving an HTML page with the number of "hits", then "screen-scraping" that number from the HTML page to include in a larger, aggregated result set - is neither sufficient nor desirable as a long-term strategy. HTML was designed as a display format and was not really built for the robotic data parsing that metasearching requires. As a result, quality suffers. In addition, keeping current is very difficult because each time JSTOR changes its search interface, the metasearch engine must make corresponding changes in order to continue to parse the HTML properly. The current authentication methodologies (userid/password, IP) are also problematic because resource providers cannot currently differentiate individual users from the metasearch engines. The metasearching requests originate from the same IP range as the normal searching activity from an institution, and because the metasearch engines do not identify themselves, it is very difficult to distinguish the two kinds of use, especially when evaluating usage statistics.
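The fragility of screen-scraping can be illustrated with a minimal sketch. The HTML fragments and the pattern below are hypothetical (the actual JSTOR result-page markup is not described in this article); the point is only that a scraper tied to one page layout silently stops working when the layout changes.

```python
import re
from typing import Optional

# Hypothetical result-page fragments, invented for illustration only.
PAGE_BEFORE_REDESIGN = "<html><body><p>Your search returned <b>1,284</b> hits.</p></body></html>"
PAGE_AFTER_REDESIGN = "<html><body><div class='count'>Results: 1,284</div></body></html>"

# The scraper depends on the exact wording and tags of the display page.
HIT_PATTERN = re.compile(r"returned <b>([\d,]+)</b> hits")


def scrape_hit_count(html: str) -> Optional[int]:
    """Pull the hit count out of display HTML, or return None if the layout is not recognized."""
    match = HIT_PATTERN.search(html)
    if match is None:
        return None
    # Remove thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(scrape_hit_count(PAGE_BEFORE_REDESIGN))
    print(scrape_hit_count(PAGE_AFTER_REDESIGN))
```

In this sketch the same hit count is present in both pages, but only the first layout is understood; after the hypothetical redesign the scraper reports nothing until its pattern is rewritten.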

From a searching perspective, one of the unique features of JSTOR is lost in the current implementation of metasearch engines. When users search the archive in the native JSTOR search interface, they are required to identify the discipline(s) in which the search should be executed. This feature is lost in a broadcast search to JSTOR, as the metasearch engine has no way to determine which discipline(s) to search. Instead, the metasearch engine simply searches all disciplines in JSTOR. This is inefficient for many reasons, and has at least two unintended consequences: skewing user statistics, and adversely affecting system performance.
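The effect on usage statistics can be sketched roughly as follows. The discipline names and the counting rule (one logged search per discipline searched) are assumptions made for illustration, not a description of how JSTOR actually records use.

```python
from collections import Counter
from typing import Iterable, List, Optional

# Hypothetical discipline list, for illustration only.
DISCIPLINES = ["Economics", "History", "Mathematics", "Sociology"]


def disciplines_searched(selected: Optional[List[str]]) -> List[str]:
    """Native searches name their disciplines; a broadcast search (None) falls back to all of them."""
    return list(selected) if selected else list(DISCIPLINES)


def tally_searches(searches: Iterable[Optional[List[str]]]) -> Counter:
    """Count one search per discipline touched (an assumed counting rule)."""
    counts: Counter = Counter()
    for selected in searches:
        counts.update(disciplines_searched(selected))
    return counts


if __name__ == "__main__":
    native = tally_searches([["History"], ["History"], ["Economics"]])
    broadcast = tally_searches([None, None, None])
    print("native:", dict(native))
    print("broadcast:", dict(broadcast))
```

Under these assumptions, three native searches register against only the disciplines the users chose, while three broadcast searches register against every discipline, which both inflates the counts and spreads load across the whole archive.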

Support

Providing quality user support is a critical concern at this juncture. At the moment, if JSTOR receives a support question, for instance, stating that "response time is slow", we may not be able to effectively diagnose the problem because we are not aware exactly "how" JSTOR is being searched at the institution. Is it in the native JSTOR interface? Is it via a stand-alone metasearch engine, and is that engine hosted locally or remotely? Or is the search originating through an integrated library system that has licensed a metasearch engine and incorporated it into its own metasearching services? Understanding the interplay of these services at any institution is imperative in providing support. It is important that participating institutions notify JSTOR when these types of services are implemented. It is equally important that JSTOR have support arrangements in place with the appropriate metasearch engine providers and integrated library system providers so that support can be as seamless as possible for the library and its constituents.

In addition to understanding the impact of metasearching on system performance and user statistics, we also need to understand the impact of metasearching on other types of use of the archive. For instance, how will this type of federated searching affect browsing and printing in the archive? If metasearching begins to significantly alter user behavior in relation to the other functions in JSTOR, we will need to track those changes. Changes in how users search will affect how JSTOR approaches future research and development on the user interface.

Policy

At the NISO Workshop in October, the question arose as to whether the libraries that had implemented metasearching tools had reviewed their license agreements with the resource providers, prior to implementation, to confirm that metasearching was permitted under the terms and conditions of those agreements. Only a handful indicated that this step had been taken. JSTOR's Archive License Agreement, for example, does not specifically address metasearching at present. Given these new circumstances, it seems appropriate to add language to the License to clearly state what type of metasearching will be supported going forward. In addition, as JSTOR has no current agreements in place with any of the metasearch engine providers, or with the integrated library system providers that offer metasearching in their products, it will be important to build the necessary partnerships to allow each party to codify the appropriate business relationships and service level agreements necessary to provide quality end-user support.

There are also questions with regard to the costs that may be incurred in support of metasearching. We need to be able to determine the real system costs (e.g. servers, increased network traffic) as well as the associated support costs. If those costs are significant, as a not-for-profit organization, we will need to outline how we plan to recover those costs.

Going Forward

Throughout the first months of 2004, JSTOR is working to define an organizational strategy for metasearching - one that addresses technology issues, interface issues, authentication issues, and support issues. In addition, we will work closely with NISO, and the Metasearch Initiative, to help define standards in this area which will influence this emerging community in a positive, high-impact way. JSTOR will begin to build business relationships with the companies and organizations providing metasearching tools to our participants, and we will start to look closely at how metasearching is affecting user behavior in JSTOR. Finally, we will work over the next few months to build cost models that we can share with the community regarding the attendant costs of supporting metasearching.

The input of our library and publisher participants will be critical as we proceed with this work. Please contact us at support@jstor.org should you have any comments.

Last updated on September 8, 2006



©2000-2007 JSTOR