
No. 9, Issue 1, JSTORNEWS, March 2005

A New Search Engine for JSTOR

Background

From its inception in 1995, JSTOR used a search engine called "Full Text Lexicographer," or FTL. This proprietary software, developed at the University of Michigan, was specifically designed to search large text files like those underlying the JSTOR archive. FTL served JSTOR and its users for nearly a decade. However, as JSTOR continued to grow, inherent limitations in the software made it difficult to enhance and to scale. In the spring of 2004, JSTOR launched a project to update the search engine to better meet the needs and demands of our users.

This effort began with a survey of JSTOR users' search preferences. Over 1,100 faculty, staff, and students in 50 countries responded. Of the respondents, 41% were graduate students, 23% were faculty members, and the rest were a mix of undergraduates, librarians, independent researchers, and secondary school students. About 60% were from the United States, and 71% identified English as their native language. Not surprisingly, the respondents were some of the more devoted JSTOR users: 36% said they used JSTOR several times each week, and another 29% said they used the resource at least once weekly.

The survey results and accompanying comments suggested that while many users appreciated the quality and comprehensiveness of the content in JSTOR, they found the existing search interface cumbersome to use. In particular, the need to select from a long list of disciplines and journals before starting a search frustrated users who wanted to conveniently search across the entire archive. Users emphasized the importance of interdisciplinary research and asked for a "search all" feature. Whether or not they recognized that this type of search would need to run across millions of pages, "a very fast search" also ranked high on the list of requests!

Together with years' worth of user messages, these survey results helped to direct our development priorities. Improving the speed of a search and introducing the ability to search all disciplines at once were identified as our top goals. Numerous user requests to "simplify, simplify, simplify" also did not go unheard. Our undergraduate and international users said the existing search interface was difficult to understand and use effectively. Introducing a simple search screen became another major objective.

After six months of intensive development work, FTL was retired and replaced by an open source product known as Lucene. Customized and "tuned" to work with the JSTOR archive, the new search engine and interface were previewed to librarians and publishers in early December 2004. A month later, Lucene went "live" for JSTOR users on January 19, 2005.

Lucene's Features & Benefits

Lucene is an open source search engine tool, written entirely in Java. Because Lucene is open source, JSTOR can modify the underlying source code easily if needed for future development. Lucene includes many of the options available in FTL, such as Boolean operators, proximity searching options, and phrase searching. It also offers the possibility of adding a number of new capabilities in the future, such as fuzzy searching, wildcard options, and more flexible nesting of search terms. Stop words (commonly used words like "and," "for," and "not"), which were not indexed in FTL, are now searchable (although, due to their superabundance, they are only included when users search for a phrase).
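The stop-word behavior described above can be illustrated with a small sketch. This is a toy positional index written for this illustration, not Lucene's API and not JSTOR's configuration; the class `ToyIndex`, its methods, the stop-word set, and the sample documents are all hypothetical. It assumes stop words are indexed but ignored in ordinary term searches and honored only inside phrase searches.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the article names "and", "for", and "not".
STOP_WORDS = {"and", "for", "not"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class ToyIndex:
    """A toy positional inverted index: term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        # Every token is indexed with its position, stop words included.
        for position, term in enumerate(tokenize(text)):
            self.postings[term][doc_id].append(position)

    def search_terms(self, query):
        """AND search over the query terms, with stop words dropped."""
        terms = [t for t in tokenize(query) if t not in STOP_WORDS]
        if not terms:
            return set()
        doc_sets = [set(self.postings.get(t, {})) for t in terms]
        return set.intersection(*doc_sets)

    def search_phrase(self, phrase):
        """Exact phrase search; stop words take part in the match."""
        terms = tokenize(phrase)
        if not terms:
            return set()
        hits = set()
        for doc_id, starts in self.postings.get(terms[0], {}).items():
            for start in starts:
                # Each term must appear at the expected offset from `start`.
                if all(
                    start + offset in self.postings.get(term, {}).get(doc_id, [])
                    for offset, term in enumerate(terms)
                ):
                    hits.add(doc_id)
                    break
        return hits


index = ToyIndex()
index.add("a", "Crime and Punishment in the modern novel")
index.add("b", "Punishment for crime: a survey")

term_hits = index.search_terms("crime and punishment")
phrase_hits = index.search_phrase("crime and punishment")
```

In this sketch the term search ignores "and" and matches both sample documents, while the phrase search requires "and" in the middle position and matches only the first.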

Lucene also greatly enhances system performance. In many instances, FTL required 15 seconds or more of processing time for a single search. Much of this was due to the organization of the indices that underlay each JSTOR search. In FTL, each journal was given its own index; these indices were then searched consecutively to produce results. Lucene uses a more modern, more scalable architecture in which all content is indexed in one large file. While some types of complex searches still require several seconds of processing time, most searches now complete in under a second. That translates into faster response times for everyone using JSTOR, and ultimately lightens the load on our servers significantly and saves costs for JSTOR.
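The difference between the two index layouts can be sketched in a few lines. This is a simplified illustration only and does not reflect the internals of FTL or Lucene; the sample journals, the helper `build_index`, and the lookup counter are hypothetical. The point is that the per-journal layout does one lookup per journal, so the work grows with the number of journals, while the single index answers with one lookup.

```python
from collections import defaultdict


def build_index(articles):
    """Map each term to the set of article ids that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for term in text.lower().split():
            index[term].add(article_id)
    return index


# Hypothetical sample data: journal name -> {article id: text}.
JOURNALS = {
    "journal_a": {"a1": "labor markets in europe", "a2": "trade policy"},
    "journal_b": {"b1": "labor history", "b2": "urban markets"},
}

# Per-journal layout: one index per journal, consulted one after another.
per_journal = {name: build_index(arts) for name, arts in JOURNALS.items()}


def search_per_journal(term):
    """Return (matching article ids, number of index lookups performed)."""
    hits, lookups = set(), 0
    for index in per_journal.values():
        lookups += 1
        hits |= index.get(term, set())
    return hits, lookups


# Single-index layout: every article from every journal in one index.
all_articles = {
    article_id: text
    for articles in JOURNALS.values()
    for article_id, text in articles.items()
}
single = build_index(all_articles)


def search_single(term):
    """Return (matching article ids, number of index lookups performed)."""
    return set(single.get(term, set())), 1


per_journal_result = search_per_journal("labor")
single_result = search_single("labor")
```

Both functions return the same articles for "labor"; only the number of lookups differs (two for the two sample journals, one for the combined index).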

For more information about the new features and to see sample searches, please visit the JSTOR Search Help page.

Future Improvements

The initial release of Lucene on January 19th introduced a much faster and more feature-rich set of search interfaces. This is only the first in a series of search engine improvements. The search development team is currently planning future releases that will include often-requested features such as better error handling, a citation search form, and more flexible sorting of search results. JSTOR is also committed to using internal search statistics to improve the relevancy of the results it returns, and to continuing to test and improve the usability of the search interface. It is very important to us that we incorporate the feedback we receive from our users, participating libraries, and publishers. If you have comments about the search interface, we encourage you to write to us by selecting the "Contact JSTOR" link on any JSTOR page or by sending an email to support@jstor.org.

Last updated on September 8, 2006



©2000-2007 JSTOR