
Library Technology Reports
Expert Guides to Library Systems and Services
October 2011, vol. 47, no. 7
ISSN 0024-2586
alatechsource.org, a publishing unit of the American Library Association

Analyzing the Next-Generation Catalog
Andrew Nagy


Volume 47, Number 7
Analyzing the Next-Generation Catalog
ISBN: 978-0-8389-5835-3

American Library Association
50 East Huron St.
Chicago, IL 60611-2795 USA
alatechsource.org
800-545-2433, ext. 4299
312-944-6780
312-280-5275 (fax)

Editor: Dan Freeman, [email protected], 312-280-5413
Copy Editor: Judith Lauber
Editorial Assistant: Megan O'Neill, [email protected], 800-545-2433, ext. 3244; 312-280-5275 (fax)
Production and Design: Tim Clifford, Production Editor; Karen Sheets de Gracia, Manager of Design and Composition
Advertising Representative: Brian Searles, Ad Sales Manager, ALA Publishing Dept., [email protected], 312-280-5282; 1-800-545-2433, ext. 5282

ALA TechSource purchases fund advocacy, awareness, and accreditation programs for library professionals worldwide.

About the Author

Andrew Nagy, an open source evangelist and library technologist, is currently market manager, Discovery Services with Serials Solutions, where he serves as a product manager for the Summon service, a web-scale discovery solution, and the 360 Access Control service, a single-sign-on authentication solution, as well as market liaison. Previously, he was a technology specialist with Villanova University’s Falvey Memorial Library, where he implemented and established the foundation for a technology infrastructure designed to meet the needs of the library’s patrons. There he founded VuFind, a globally adopted open source next-generation catalog. Andrew holds a master’s degree in technology management from the Villanova School of Business at Villanova University, a master of science in computer science from the College of Arts and Sciences at Villanova University, and a bachelor’s degree in information management and technology from the iSchool at Syracuse University.

Abstract

Libraries have entered a highly competitive marketplace for providing content to their constituents. Researchers are finding it more convenient to pay for material from highly accessible websites than to access materials for free from a library. Web search engines and crowdsourced content portals have shifted the value of a library dramatically. However, libraries have begun a transformation from the physical space and collections to the electronic medium. This issue of Library Technology Reports analyzes five different academic libraries to better understand why they have made an investment in a next-generation catalog and what the outcome of this investment has been.

Library Technology Reports (ISSN 0024-2586) is published eight times a year (January, March, April, June, July, September, October, and December) by American Library Association, 50 E. Huron St., Chicago, IL 60611. It is managed by ALA TechSource, a unit of the publishing department of ALA. Periodical postage paid at Chicago, Illinois, and at additional mailing offices. POSTMASTER: Send address changes to Library Technology Reports, 50 E. Huron St., Chicago, IL 60611. Trademarked names appear in the text of this journal. Rather than identify or insert a trademark symbol at the appearance of each name, the authors and the American Library Association state that the names are used for editorial purposes exclusively, to the ultimate benefit of the owners of the trademarks. There is absolutely no intention of infringement on the rights of the trademark owners.

alatechsource.org Copyright © 2011 American Library Association All Rights Reserved.

Subscriptions

alatechsource.org/subscribe

Contents

Chapter 1—Preface
    Along Came Solr
    Changes in the Product Landscape
Chapter 2—Next-Generation Service in the Library
    Notes
Chapter 3—Defining the Next-Generation Catalog
    Products
    Open Source versus Commercial Solutions
    Notes
Chapter 4—Deploying the Next-Generation Service
    Overcoming Librarian Anxieties
    Library Website Redesign
    The Deployment Model
    Search Engine Optimization
Chapter 5—The Impact of the Next-Generation Catalog
    Analyzing Circulation
    Analyzing the Website
    Analyzing the Impact
    Note
Chapter 6—Case Studies
    Wake Forest University
    Oklahoma State University
    North Carolina State University
    University of Tennessee at Knoxville
    Villanova University
    Note
Chapter 7—Conclusion
    Next Step: Web-Scale Discovery
    The Future of Discovery
    Notes

Chapter 1

Preface

Abstract

The changing landscape of library collections has led to a new take on services that libraries offer. While competing with on-demand access of content found on the open web, the library needs to position itself in a new way to capture the interest of its users by providing new services that are compelling, exciting, and self-service oriented. This chapter will identify the impetus for these changes and what technologies have been made available to better support library patrons.

Definitions

Federated search: a software solution designed to solve the problem of searching multiple content databases. By taking the user's search query and broadcasting it to tens or hundreds of databases at the same time, the application can compile a sampling of real-time results into a single relevancy-ranked list.

Native XML database: a platform for storing and querying XML-based files. These solutions generally support querying the content via the XQuery standard and indexing the content based on element-level rules; they can easily fetch documents based on a unique ID.
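To make the broadcast-and-blend pattern behind federated search concrete, here is a minimal sketch in Python. The endpoint URLs, response fields, and scoring are hypothetical placeholders rather than the API of any real federated search product; a production connector would translate each database's native response format and error behavior.

```python
import concurrent.futures
import requests

# Hypothetical search endpoints; real federated search products use
# per-database "connectors" that translate each native response format.
DATABASES = {
    "db_one": "https://example.org/db-one/search",
    "db_two": "https://example.org/db-two/search",
}

def search_one(name, url, query, limit=30):
    """Query a single database and normalize its records to a common shape."""
    try:
        resp = requests.get(url, params={"q": query, "rows": limit}, timeout=5)
        resp.raise_for_status()
        records = resp.json().get("records", [])
    except requests.RequestException:
        return []  # a failed connector should not break the blended list
    return [{"source": name,
             "title": rec.get("title", ""),
             "score": float(rec.get("score", 0.0))} for rec in records]

def federated_search(query):
    """Broadcast the query to every database in parallel and blend the results."""
    merged = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(DATABASES)) as pool:
        futures = [pool.submit(search_one, name, url, query)
                   for name, url in DATABASES.items()]
        for future in concurrent.futures.as_completed(futures):
            merged.extend(future.result())
    # Crude relevance blending: each source scores differently, which is one
    # reason federated relevance ranking is hard in practice.
    return sorted(merged, key=lambda rec: rec["score"], reverse=True)

if __name__ == "__main__":
    for rec in federated_search("open source catalogs")[:10]:
        print(rec["source"], rec["title"])
```

Each source returns only a small sample of its results, so the blended list is only as good as those samples; the chapters that follow return to this limitation.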

Ever since I first began working for a library in 2004, I have been thinking about how libraries can make their collections and resources more easily accessible in this highly accessible digital world. In 2004 and 2005, I developed an application that leveraged federated search technology to allow researchers to search across the barriers of the individual databases that our library provided access to. However, federated search quickly proved to be inadequate for the needs of a library that subscribed to around 300 databases. Trying to query hundreds of databases and process the results was like trying to squeeze a square peg into a round hole. The problem was not that one product would be better than another, but simply that certain difficulties were inherent in federated search. This technology quickly proved to be one step on the path of the library's technology-based services. This project led us to another, which I led in 2006, to provide a modern search experience in accessing the library's collections. This project, now known as VuFind, was initially conceptualized to be what I called an XML framework. Because all the library's collections, including some of the vendor content, were stored in XML, I quickly realized the possibilities. We could easily translate MARC to MARCXML; our digital library stored metadata in METS XML; our institutional bibliography could be exported into XML; data from external repositories could be accessed in the OAI-PMH XML format. The grand vision was to develop a unified database via a native XML database product. By unifying all of the collections into a single platform, the library could host a single-search-box approach to discovering the library's collections. While the idea was grand, at the time the complexities of the technology, as well as the performance and scalability, simply did not meet our expectations—it was clear that this was not going to be the killer solution.

Figure 1. Print versus digital acquisitions spending at Villanova University Falvey Memorial Library, February 2003 to August 2009. Source: Joseph Lucia.

Figure 2. Wesleyan University Library acquisitions spending, 1998–2009. Source: Pat Tully, "1998–2009: How Libraries Have Changed," From the University Librarian (blog), Wesleyan University website, Nov. 19, 2009, http://ptully.blogs.wesleyan.edu/2009/11/13/over-the-past-10-years-how-libraries-have-changed.

Along Came Solr

Luckily, at this time—the winter of 2006—a very talented graduate student in the computer science department, Rushikesh Katikar, was enrolled in a work-study program and placed in the library as a part-time software developer. Rushikesh worked side-by-side with me as we evaluated various native XML database products and analyzed their scalability and performance for querying these disparate collections. We evaluated both open source solutions and off-the-shelf commercial products. We first evaluated eXist, an open source database developed in Java. At the time, eXist was very early in its life cycle, and the current release was at a very early version number. We then evaluated a commercial solution, MarkLogic. This product is well regarded in the industry as a workhorse; however, its learning curve and the need for training forced us to push the product to the bottom of the evaluation list. The next product that was evaluated was Berkeley DB XML. Berkeley DB is also a highly regarded and heavily used database product, and its native XML counterpart felt like it would be a contender. However, after many rounds of performance testing and index tweaking, we were unsuccessful in gaining performance and scalability to meet our needs—searching a set of over one million records in under one second. After quite a bit of frustration in tweaking configurations and the lack of scalability in the products evaluated against our collection size, we sought a different technology to support our needs. The first stop was native indexing products, whose core feature provided strengths that the native XML databases lacked. We began to play around with Apache Lucene, the industry standard for indexing. Lucene, a highly regarded and heavily used product for search, can be found in both commercial applications large and small and open source applications. When looking at how Lucene could be woven into our solution, we found Apache Solr—a Java application built on top of the Lucene product. Solr provided everything that we needed: a web-based API to develop front-end applications in any language or platform, the ability to index and store XML structured data, the ability to scale to large collection sizes, the ability to perform

fast queries and facet on the search results, and a fairly low learning curve. This was the solution.
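For illustration, here is a minimal sketch of the kind of request the Solr HTTP API accepts, including facet counts on the results. It assumes a locally running Solr instance, and the core name and field names are invented for the example; nothing here is taken from the VuFind source.

```python
import requests

SOLR_URL = "http://localhost:8983/solr/biblio/select"  # assumed core name

def search(query, rows=20):
    """Run a keyword search and ask Solr for facet counts on two assumed fields."""
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",                               # request a JSON response
        "facet": "true",
        "facet.field": ["format", "publishDate"],   # assumed field names
        "facet.mincount": 1,
    }
    resp = requests.get(SOLR_URL, params=params, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    docs = data["response"]["docs"]
    facets = data["facet_counts"]["facet_fields"]
    return docs, facets

if __name__ == "__main__":
    docs, facets = search("cathedral bazaar")
    print(f"{len(docs)} records returned")
    # facet_fields come back as flat [value, count, value, count, ...] lists
    for field, pairs in facets.items():
        counts = dict(zip(pairs[0::2], pairs[1::2]))
        print(field, counts)
```

The same request could be issued from any language or platform, which is the point of the web-based API mentioned above.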

Changes in the Product Landscape

This was a very active time in the library community. Many new products were coming onto the market, and libraries were witnessing a fundamental shift from purchasing physical resources to subscribing to collections of electronic resources. We saw a growth in options around federated search products, and the next-generation-catalog product category had just been created. During my time of employment at Villanova University, the library witnessed a shift in the bulk of acquisitions spending from print materials to electronic between 2003 and 2009—a trend not uncommon at other universities in North America (see figure 1). Wesleyan University had a similar experience between 1998 and 2009 (see figure 2). Similar changes have happened in the publishing industry, as seen from information on the sales of O'Reilly print books versus e-books (see figure 3).

Around this time, libraries began to see the value in a modern search experience that would meet user expectations, provide a highly customizable user interface, and broaden the scope of what can be discovered in the library. This is when the product line for next-generation catalogs was first established, and the race was on to implement. Within a few years, thousands of libraries around the globe had adopted next-generation catalogs. This fairly rapid adoption piqued my interest, and I asked some questions:

• How will adoption of NGCs impact the services offered by libraries?
• Will adopting NGCs in an effort to provide a better solution for our patrons really make an impact in the research process and use of print materials in the long run?
• Will this highly customized and modern search tool increase the library's value to its audience?

Figure 3 E-book versus print orders on the O’Reilly website, January 2008 through June 2009. Source: Andrew Savikas, “Does Digital Cannibalize Print? Not Yet,” Radar (blog), O’Reilly website, Aug. 7, 2009, http://radar.oreilly.com/2009/08/doesdigital-cannibalize-print-not-likely.html.

For seven years, I have worked on technology solutions to problems faced by libraries—a custom interface to a federated search product, a digital library platform, a subject guide platform, a question-and-answer system for online reference transactions, a next-generation catalog, and now a web-scale discovery solution. Throughout this period, these questions stayed with me. After the next-generation catalog product line had matured a bit in both age and number of installations, it was time to set out to find some answers.


Chapter 2

Next-Generation Service in the Library


Abstract


The past decade has been witness to dramatic evolutions in library service as patron expectations and the research process have changed. Libraries have found themselves competing with everyday content services like Google and Wikipedia, cafés such as Starbucks and Barnes and Noble, and home delivery and online streaming services such as Netflix. To stay in this information access market, libraries are shifting their services to better meet the demands of the new generation of user. Libraries are changing their look, changing their level of service, and partnering with the competition—all to continue to be a part of the research process and provide content to patrons when and where they expect it. This evolution is necessary to keep the library relevant to the researcher. This chapter will illustrate the new services that libraries are offering and explore how those services meet the demands of library patrons.

With the explosion of powerful open-web search engines such as Google and Yahoo! and the introduction in every town of grocery store-style bookstores such as Barnes and Noble, libraries have entered a highly competitive marketplace for providing information access—so competitive that libraries are quickly losing their market share. The web, now so pervasive in industrialized nations, has become a starting place for research. It has become so convenient that the value of the library has diminished in the researcher's eye. A report by ITHAKA, Ithaka's 2006 Studies of Key Stakeholders in the Digital Transformation in Higher Education, published in August 2008, showed a decline in the perceived value of the library since the early part of the 2000s.1 Conversely, highly convenient and accessible resources are seeing tremendous usage—the English version of Wikipedia serviced an average of 7.139 billion page views per month in 2010.2 With resources such as these that are becoming more accurate and relevant than their print-based alternatives, not to mention freely available 24/7 from the comfort of one's home, the library has been losing its relevance in the research process. On top of this decline, there has been a global recession that started in 2007 and is still continuing to impact the economy today, causing many libraries to face dramatic budget cuts and staff layoffs. In my hometown of Philadelphia, the city was almost forced to close all fifty-three branch libraries in 2009 due to a large budget deficit.3 This closure would have made an incredible impact in the various neighborhoods that relied on their local library for after-school activities and a safe and comfortable place to work and socialize. Budget declines, declining usage, and declining value have caused libraries to rethink their goals and the services provided to their communities.

In an academic library, the majority of patrons are of the Millennial Generation—those born between 1982 and 1995. These Millennials have a few attributes that directly correlate to many changes that libraries have been going through. Over the past decade, academic libraries have introduced self-service kiosks and checkout counters, single-point-of-access information stations, cafés, and gaming facilities (see figure 4), to name just a few of the physical transformations—all to improve the library's relevance to this generation. Millennials are strongly inclined to be completely self-sufficient and highly confident in themselves, and therefore they tend to not want to ask for assistance. They are accustomed to using technology in everyday life, and research is commonly attacked from the same angle as finding out what movie to go to on a Friday night. With a quick flick of a finger on the iPhone, they can find out what movies are playing at the closest theater, learn which ones are good, and even watch the trailers. This instant gratification and extreme simplicity are what this generation has grown up with and is accustomed to. This is what they expect.

Figure 4. Self-checkout service point in a library.

In the earlier half of the 2000s, federated search became a viable solution in the library industry. As online databases became more pervasive and as focus shifted from print to electronic resources, more and more content was found in electronic silos, and the need to make these collections highly visible and easily discoverable was clear. Federated search was intended to provide a convenient interface to the confusing environment that libraries had been constructing with the evolution of content silos. A stepping-stone technology in libraries, federated search was initially seen as the answer to the discoverability of the growing electronic collections. A typical federated search product uses connectors, small pieces of software that translate the search results from the database it is defined to communicate with into a common format, enabling the federated search tool to compile the search results and blend them into a single results set. However, this process has many limitations that may have been overlooked due to the "wow" factor of searching multiple databases at once. A typical database will return only a small number of search results for each search request performed by a federated search tool. Generally this number is around thirty records or so. This is done to control the impact on the database itself, preventing an overload of searches from these broadcasts. Due to the lack of complete data sets returned from each database, federated search tools can present the user with only a limited results set, meaning that the more interesting documents may be hidden from the user. Additionally, this approach results in poor relevancy of the search results and inability to provide accurate faceted browsing capabilities. Relevancy algorithms take many factors into consideration when calculating the order of the results, and this simply cannot be done to the quality that researchers expect when the tool is analyzing only a small fraction of the entire results set. Faceted navigation, where the search tool pulls out metadata from the records in the results set to allow the user to narrow searches, is also greatly hindered by low visibility into the complete results set. Moreover, the connectors can fail at any time—a database changes its response syntax, the system is down for maintenance, the interface responsible for answering federated search requests moves to a new location—any of these events and many others can simply render the results useless or prevent the results from getting to the user. Federated search also suffers from a lack of scalability in many cases. Federated search was initially designed and developed to combine search results from disparate collections—a common practice in the corporate sector when searching a company's collection of multiple databases. Generally this required a handful of connections—between ten and fifty. In an academic library, the numbers go beyond the original scope of federated search—such libraries may see many hundreds of databases that are capable of responding to a federated search request. This lack of scalability, poor user experience, and lack of accuracy led this technology to be a stopgap rather than a long-term solution.

Through years of using federated search and begging it to do more, libraries have become more prepared for a future that allows the researcher to discover content from different databases in a single session. Libraries began thinking about how to better organize the library's website—creating an inviting environment that places the emphasis on discovering resources. In conjunction with the demand to make the library more relevant to its audience, library administrators are also finding themselves working to prove the library's value to the budget makers and administrators of its governing body—a university administration office or local or state government. To do so, libraries have been finding new and innovative ways to show their value: increasing resource usage is one way, but so is providing tools and services that have a larger impact and can help show that the library is being innovative. We have seen libraries launching digital collections that provide access to rare and historic collections that few may even know exist, sending books to Google and the Internet Archive for mass digitization, and providing completely new services that cater to the new generation of users. However, many libraries have shown that by being able to compete with highly accessible resources like Wikipedia and Google, they can make a significant impact in battling the decline discussed earlier.


We have seen libraries expanding their roles in their communities for quite some time now. Many libraries have been hosting cultural library events to position themselves as a hub in the community. Many academic libraries have been buying into the "learning café" model by installing coffee bars and restaurants in the library, such as the Starbucks at Wake Forest University's Z. Smith Reynolds Library, which opened on September 30, 2008, providing loud study and work space, nonstructured work spaces with cushy seating, group study areas with technology support, and much more.4 Another popular initiative over the past few years has been providing video game competitions and dedicated gaming areas within the library (see figure 5). Even more customized libraries in Rhode Island, such as the Coventry Public Library, are lending fishing equipment as part of their efforts to better meet the needs of the local population.5 While these initiatives all help attract patrons to the building, similar efforts are needed to drive patrons to the library website. More compelling and supportive functionality for research will help to increase the visibility and usage of the library's website, which is the front door for the library's electronic resources—an essential means to subsidize this growing budget line.

Figure 5. Students gaming at NCSU Libraries Learning Commons.

Notes


1. Ross Housewright and Roger Schonfeld, Ithaka's 2006 Studies of Key Stakeholders in the Digital Transformation in Higher Education (New York: ITHAKA, 2008).
2. Erik Zachte, "Page Views for Wikipedia, Non-mobile, Normalized," Wikistats: Wikimedia Statistics website, http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm (accessed May 19, 2011).
3. Lisa Chinn, "Sign of the Times: Philadelphia's Public Library to Close," World Newser (blog), ABC News website, Sept. 14, 2009, http://blogs.abcnews.com/theworldnewser/2009/09/sign-of-the-times-philadelphias-public-library-to-close.html.
4. Cheryl Walker, "Starbucks opens Sept. 22, ribbon-cutting ceremony Sept. 30," Wake Forest University website, September 19, 2008, http://wfu.edu/news/release/2008.09.19.s.php.
5. Lisa Vernon-Sparks, "Check It Out: Libraries That Will Lend You Fishing Gear," Providence Journal website, May 13, 2008, http://www.projo.com/news/content/WB_fish_borrowing_05-13-08_JL9TOG6_v39.38e8d8c.html.

Chapter 3

Defining the Next-Generation Catalog

Abstract

This chapter will define the next-generation catalog (NGC) and briefly look at some of the products in the marketplace.

The term next-generation catalog (NGC) first became omnipresent throughout the library industry with the founding of the NGC4Lib mailing list. Eric Lease Morgan of the University of Notre Dame founded the mailing list in order to create a channel for discussion on the topic of the next generation of library OPACs (online public access catalogs). Morgan noted four principles that define the NGC in a posting entitled "Next Generation Library Catalog."1 These four principles are the following:

• It is not a catalog.
• It avoids multiple databases.
• It is bent on providing services against search results.
• It is built using things open.

Library industry vendors and open source communities have provided solutions that appear to meet these needs—but as we further analyze the solutions, it is clear that they touch only the surface of these needs. It is clear that the NGC solutions that have been used in libraries fail these four principles. Let's take a closer look.

• Principle 1: It is not a catalog. A typical NGC solution is more than just a catalog—many of these products provide the ability to search more than just the bibliographic records from the ILS, such as digital collections produced by the institution or open-access data culled from open repositories. Some of these solutions have included the ability to harvest OAI-based repositories to include additional content in the index. However, this is still very siloed and narrowly focused. The NGC just blurs those boundaries, making the distinction even more difficult for the general user. Because the solution expands the data set to include content from a few additional external sources, understanding the boundaries of this system is even more confusing for the user.
• Principle 2: It avoids multiple databases. While the NGC by and large has avoided multiple databases, many have incorporated federated search to provide a greater level of access. The NGC was thought of as the single-search-box paradigm that libraries have been dreaming of; however, federated search just exacerbated the problem by creating a less convenient and less simple interface—which was one of the key driving factors for the invention of the NGC. A single database is key to providing a simple interface, which brings us back to the failures in Principle 1. Many NGC solutions have attempted to be more than just a catalog by incorporating additional content, but in doing so have integrated federated search, thereby failing to meet Principle 2.
• Principle 3: It is bent on providing services against search results. Many NGC solutions have done very well with this principle. The interface and functionality have all been designed around working with the results set and providing services around it. For example, the incorporation of faceted navigation allows the user to modify results through the use of filters. Many NGC solutions provide recommendation functionality as well as the ability to share results in a more social environment


and to expand the research to external entities such as Google Books or Wikipedia. Due to failures with Principles 1 and 2, these services are still fairly myopic—focused on the smaller collections represented within the NGC.
• Principle 4: It is built using things open. Here is another area where the NGC solutions have shone. Many have been built from open source technology and have incorporated functionality to include open-access content. Two solutions, VuFind and Blacklight, are available under an open source license, allowing them to be downloaded and installed at no cost. Of course, I am referring to direct financial cost and not staffing and resource cost—"free as in kitten, not beer." Utilizing open source technology is a great way for the vendor of the product to reduce cost and build on a platform that other like organizations are also building on. For example, consider the widely popular Apache Solr and Apache Lucene, a search engine platform and an indexing engine respectively. These two open source products have become extremely popular in the library market and can be found in almost every product in the NGC market. As these technologies continue to evolve and get better, so will the solutions that are built around them. There has been one failure around this principle, however; the NGC has not facilitated the open sharing of content in a convenient manner. No NGC on the market today provides an open sharing process of MARC records. One open source solution, SOPAC, developed by John Blyberg of the Darien Public Library, has taken on the role of being a collaborative engine of social tags. One library with SOPAC can pool and share tags on records in its collections with other libraries that are using SOPAC. This is a great model that seems to have seen little adoption; however, a newer commercial product, BiblioCommons, seems to be trying to push this approach further. This concept of libraries sharing resources and services seems like a highly valuable proposition that deserves further research and investment. Lastly, while a typical NGC uses open content and open source software, it is not able to provide access to all of the vast collections of open-access content.

Figure 6. Eric Lease Morgan's diagram of the architecture of a next-generation catalog.

Figure 6, a diagram drawn by Morgan in 2006, depicts the architecture of a next-generation catalog. This diagram proves to still be very relevant today. However, there are three services that are missing from what is listed on the right-hand side to allow the NGC to better meet user expectations. These are recommend, browse, and relate.

Recommendations are becoming part and parcel of discovery systems. Amazon.com has been known for using this approach to help increase the visibility of its products and sales; similarly, libraries have been adopting this model to broaden the exposure of their collections. VuFind provides recommendations based on common elements.

In figure 7, we can see a view of the record for The Cathedral and the Bazaar—a popular book about open source software. On the right-hand side, we see similar items that are recommended to the user. Below that, the Other Editions box provides a link to the first edition of the book.

Browsing is also a highly valuable approach to website navigation, and the faceted navigation model makes that highly intuitive and greatly increases precision of the search results. Many sites that adopt the faceted navigation model, such as e-commerce sites like Bestbuy.com or Shopper.com, allow the user to start not by searching, but by browsing the collection starting from a list of facet values. If I am searching for a new television on the Best Buy website, for example, I start with TV & Video, then TVs, then LCD TVs (see figure 8). This path allows me to browse through the product line and get directly to what I am looking for. I don't have to think of search terms up front but am able to browse the taxonomy of terms in a hierarchical manner to find exactly what I want in a very intuitive way.
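To illustrate the "recommendations based on common elements" idea described above, here is a minimal, self-contained sketch that scores similarity by shared subject headings and authors. The records and the weighting are invented for the example and are not VuFind's actual algorithm.

```python
# Hypothetical bibliographic records; a real system would pull these from its index.
RECORDS = [
    {"id": 1, "title": "The Cathedral and the Bazaar",
     "author": "Raymond, Eric S.",
     "subjects": {"Open source software", "Computer programming"}},
    {"id": 2, "title": "Producing Open Source Software",
     "author": "Fogel, Karl",
     "subjects": {"Open source software", "Software engineering"}},
    {"id": 3, "title": "Gardening Basics",
     "author": "Smith, Jane",
     "subjects": {"Gardening"}},
]

def similar_items(record, candidates, limit=5):
    """Rank other records by how many elements they share with this one."""
    scored = []
    for other in candidates:
        if other["id"] == record["id"]:
            continue
        score = len(record["subjects"] & other["subjects"])  # shared subject headings
        if record["author"] == other["author"]:              # same main author
            score += 1
        if score > 0:
            scored.append((score, other))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:limit]]

if __name__ == "__main__":
    seed = RECORDS[0]
    for rec in similar_items(seed, RECORDS):
        print(rec["title"])
```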

There is a growing need in the information industry to provide the ability to relate. With the advent of the Semantic Web, building relationships between entities will allow the researcher to understand more about the content that is being studied. Libraries have the ability to help the Semantic Web take shape. By participating in the Semantic Web and evolving cataloging practices, libraries can foster and define these relationships. A next-generation solution can be the tool that allows libraries to do this. The library catalog is an authoritative source on materials held by the library, and other sources are authoritative on subject terms, authors, and call numbers. When these connections are made, the researcher can be better equipped to browse at a more macroscopic level through this notion of the Semantic Web.

Morgan's assumptions from 2006 are quite visionary and depict a future that goes beyond the NGC. What Morgan has described is what is being adopted today by libraries as the next step in discovery and access, the web-scale discovery solution.

Figure 7. VuFind page on The Cathedral and the Bazaar.

Figure 8. Best Buy website page showing flat screen LCD TVs.

Products

The NGC market has grown over the past five years with a multitude of options, including both commercial and open source options, full turn-key solutions and those that require local development efforts. Here is a sampling of some of the products in the marketplace.

AquaBrowser

Medialab Solutions BV, founded in 2000 in Amsterdam, the Netherlands—a small company at the time—set out to create a search engine solution that could be customized to the collections of commercial companies, nonprofits, and governments. It quickly found a successful channel working with public, academic, corporate, and government libraries with its AquaBrowser library solution (see figure 9). By 2010, over 800 libraries around the world used AquaBrowser as the search solution.

Encore

Encore was first announced in the summer of 2006 and released in the summer of 2007. The announcement by Innovative Interfaces (see figure 10) said, "patrons will be able to see everything the library has to offer, in terms of services and content, with minimal effort."2

Endeca

While Endeca is not precisely an NGC, this company and product are worth mentioning. Endeca is a solutions company that provides search engine technology. This widely adopted technology has found a home in the library world. Its first use was by North Carolina State University (see figure 11), and it has expanded from there to libraries that are seeking a highly tailored search solution. This solution requires the library to build its own front-end interface, but its back end is very rich with features and highly scalable.

Primo

Primo (see figure 12) was first announced in the summer of 2006 and released in summer of 2007. Ex Libris announced Primo as "a single unified solution for the


discovery and delivery of all local and remote scholarly information resources, including books, journals, articles, images, and other digital content.”3


VuFind


VuFind, an open source solution first released in the summer of 2007 by Villanova University, was intended to provide a leading-edge interface allowing library patrons to discover the library’s collection in the same manner that they are used to when using the open web every day. A product that was developed by libraries for libraries, it made a big splash when the first production installation of the software was deployed by the National Library of Australia in May 2008. VuFind is not the only open source NGC solution available. The number is growing; some of the others are Blacklight, SOPAC, Scriblio, and Summa. Today, many libraries around the world have adopted VuFind and have deployed it as the central point for research on the library website. As you can see from this sampling of products, there is a common thread—they all employ faceted navigation. The idea behind this style of navigation fits the search-and-refine user behavior model, a search behavior that is popular with the Google approach to searching. Users search on a term or set of terms that are relevant to their topic. They then analyze the results and refine the search terms based on the results presented. Facet browsing is an approach that makes this model more effective by presenting users with faceted values of the search results that can then be applied as filters. A user can start with a broad topic, for example “green energy,” and then narrow the results down to something more specific. Faceted navigation has been researched heavily by professor Marti Hearst at the UC Berkeley iSchool, who notes, “Faceted navigation is a proven technique for supporting exploration and discovery and has become enormously popular for integrating navigation and search on vertical websites.”4
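The search-and-refine behavior described above is easy to picture in code. Here is a minimal, self-contained sketch of faceted narrowing over a handful of invented records; a real NGC computes these counts inside the search engine rather than in application code.

```python
from collections import Counter

# Invented sample records; a real catalog would hold millions.
RESULTS = [
    {"title": "Green Energy Basics", "format": "Book", "year": 2009},
    {"title": "Wind Power Engineering", "format": "Book", "year": 2011},
    {"title": "Solar Futures", "format": "eBook", "year": 2010},
    {"title": "Green Energy Policy", "format": "Journal", "year": 2011},
]

def facet_counts(records, field):
    """Count how many records fall under each value of a facet field."""
    return Counter(rec[field] for rec in records)

def apply_filter(records, field, value):
    """Narrow the result set to records matching the chosen facet value."""
    return [rec for rec in records if rec[field] == value]

if __name__ == "__main__":
    print("Format facet:", dict(facet_counts(RESULTS, "format")))
    narrowed = apply_filter(RESULTS, "format", "Book")   # the user clicks "Book"
    print("Year facet after narrowing:", dict(facet_counts(narrowed, "year")))
```

The user never has to restate the query; each click simply adds a filter and the facet counts are recomputed over the narrowed set.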

Figure 9 AquaBrowser on the Queens Library website.

Figure 10 Encore on the Grand Valley State University website.

Open Source versus Commercial Solutions

A library that is looking to implement a commercial solution has a different set of needs from one that is looking to implement an open source solution. While open source may be attractive due to a perceived low cost when compared to commercial solutions, one must remember that open source is "free as in kittens, not beer."

While a free kitten is cute and cuddly, it needs lots of love and attention in order to keep it healthy. It also needs care over the years to retain its health, an indirect cost that is associated with its adoption. A free beer is delicious and free—it needs no love and care, just quick consumption. Open source must be viewed as a free kitten: it needs direct involvement to get the solution installed, set up, configured, customized, and launched. Paying for support and maintenance is also an ongoing indirect cost. However, this indirect cost can vary from organization to organization. If you have a software developer on your team, your cost might be lower than the cost to an organization that needs to hire a developer to do the initial installation and maintenance over time. Organizations that have the resources in place and that are already familiar with open source solutions will find that an open source NGC can be a great fit. They can download and install various available solutions in a relatively short time, then test and evaluate each solution for little or no cost. For example, VuFind and Blacklight share many common technologies. Both use the open source Apache Solr for their underlying search engine, and both use the open source SolrMarc tool for loading MARC records into the index. A library can download and install both and try them out at the same time without having to create two different environments in which to install the applications. The open source Evergreen ILS has even created a snapshot of an operating system with the product already set up and loaded with sample data for immediate deployment into a virtualization application.5 These organizations can communicate with existing users of the software in open collaborative communities to get more insight into the strengths and weaknesses of the product. Evaluating and talking with existing clients of commercial software is not as easy. Of course, commercial software has its strengths—support from product experts, a company that needs to keep the product active and in development, a financial investment in the future of the product. And of course, there is someone to sue when something goes wrong—an actual statement that I heard from a librarian. In every marketplace, there is a fit for open source software and there is a fit for commercial software. Both have their strengths and weaknesses. There is no one right or wrong answer to choose one over the other.

Figure 11. Endeca on the NCSU Libraries website.

Figure 12. Primo on the University of Tennessee website.

Figure 13. VuFind.

Notes

1. Eric Lease Morgan, "Next Generation Library Catalog," Infomotions website (originally published on the LITA blog [www.litablog.org]), June 2, 2006, updated Dec. 27, 2007, http://infomotions.com/musings/ngc/index.shtml.
2. Innovative Interfaces, "Innovative Announces Encore" (press release), May 26, 2006, Library Technology Guides website, http://www.librarytechnology.org/ltg-displaytext.pl?RC=12014.
3. Ex Libris, "Vanderbilt University and University of Minnesota Partner with Ex Libris to Deliver Primo—The Next-generation, User-centric Discovery and Delivery" (press release), June 19, 2006, http://www.exlibrisgroup.com/default.asp?catid={EEF2DEB0-987D-45F4-9069-7D1B4178196F}&details_type=1&itemid={8FDC2D12-51A0-4447-B34E-B3FD63614ACF}.
4. Marti A. Hearst, "UIs for Faceted Navigation: Recent Advances and Remaining Open Problems" (paper presented at the Workshop on Human-Computer Interaction and Information Retrieval, HCIR 2008, Redmond, WA, Oct. 23, 2008), 1, http://people.ischool.berkeley.edu/~hearst/papers/hcir08.pdf.
5. Evergreen, "Evergreen Downloads" (see paragraph under "Evergreen Virtual Images"), http://open-ils.org/downloads.php#evergreen_vm.


Chapter 4

Deploying the Next-Generation Service

Abstract

Implementing a new service in a library requires a clear plan and strong communication inside and outside of the library. From developing a committee of staff members responsible for the implementation to creating a marketing plan for informing future users, there are many areas of the process that need to be well planned for success. This chapter will review this process and illustrate some examples.


Overcoming Librarian Anxieties


Deploying a new component in the library is like Tuckman's stages of group development, where the group goes through four stages: forming, storming, norming, and performing. The first stage, forming, is where the library forms its new solution by rolling the solution out to the library. In many cases, this rollout process is a slower, planned-out process—not just ripping the Band-Aid off. Libraries tend to first deploy the solution to the staff of the library for evaluation, followed by a beta deployment, where the traditional system is the main search solution, but users have access to the new solution to allow them to beta test the product and provide their feedback. Finally, the solution is put in place, and the old solution is either retired or reduced to a link for access. Generally, once the solution is out of beta testing and in full production, the library enters the storming phase. This phase is met with anxiety. During this time, bugs or missing elements may be found that were not uncovered during the beta testing phase. Additionally, changing the core search tool can be very unsettling to a library where instruction and research have been done for

many years around a system that looks and acts differently. The NGC introduces a shift in the research process, which can cause a bit of anxiety. However, through adoption and use, the storming process calms and the library moves to the norming stage. This is when bugs and enhancements are worked out, bibliographic instruction has been modified, and users and staff have become comfortable with the end result. The final stage, performing, has been happening over a number of years. The product has been evolving, and the users have become experts. Many libraries with NGCs could not envision going back to the traditional OPAC after a year of steady use of the NGC solution.

Library Website Redesign

A fundamental shift in focus and priority that the NGC helped bring to the library was in the use of the library website—the online branch. The model for discovery and access suddenly became a core focus once the NGC was introduced to the library. Many libraries clearly understood the value of a welcoming, relevant homepage with highly functional information architecture that was the virtual front door to the library and its collections. Because the Millennial Generation views the library in much different ways from past generations, the need for a new homepage with a different approach to design and layout became very clear. Libraries began to look at the competition—not the institution next door, but commercial entities. Learning from the research and development done in the commercial market has taught these libraries how to better meet users' expectations. Models developed by popular websites have become the goal for the redesign process. A highly common practice is a tabbed approach to the central search box, as seen on many commercial search engines such as Yahoo! (figure 14). By placing this single search box with different "scopes," the library can utilize a simple yet powerful component to put discovery and access front and center. By comparing the design of Yahoo! with Google, one sees two different approaches. Google's homepage makes it very clear what to do when a user lands (or starts, in many cases) there—the user is going to search. When a user lands at Yahoo!, the options are bountiful, and it is not clear that the first action is searching. A library is much more like Yahoo!: it offers more than just searching of the resources; it is a portal into the world of research.

Figure 14. Yahoo! search box.

The Deployment Model

The shift in focus for librarians toward better supporting the online user experience has put the deployment of the NGC under high scrutiny. First impressions are everything with the Millennial Generation—if you lose them, regaining their attention can be an incredible feat. Librarians saw that the NGC would be critical to the library's future and therefore scrutinized the deployment process to leverage its impact. Of the libraries that were analyzed for this report, most followed a common rollout model. The first step, after acquiring the product and working through implementation, was to provide a beta test period. This period had many goals—the libraries wanted user feedback and also wanted to build the library staff's comfort level with the product before the general release. Libraries seemed hesitant to deploy an NGC without this beta test period. Librarians wanted to gain familiarity with the new solution and have time to use the system themselves to find library materials. This step allowed librarians to build the comfort needed to work with patrons on the new solution. The group of libraries that were evaluated seemed to take this step during a few months prior to the start of a semester. The next step in the process was the launch. Taking a more forceful approach by redesigning the library's homepage to include the new search box front and center, and minimizing the homepage to add more focus to the new search box, proved to be essential. Many of the libraries put the new solution front and center on the library's homepage and hid or removed the "classic" OPAC. By forcing the use of the new solution, they helped the adoption rate and raised the comfort level of the users much more quickly.

Search Engine Optimization

A highly popular concept called SEO, or search engine optimization, has been the focus of content providers and e-commerce retailers online for years. By improving the structure and metadata of the content on their webpages, webmasters can increase the visibility of the website in the search results of open web search engines. With search engines such as Google being the starting point for most Internet users, this is the key to visibility on the Internet. This practice has been growing among libraries that have adopted an NGC solution. By improving the quality of the records in the ILS and other collections, these records can be made more visible. Additionally, all NGC solutions utilize a faceted navigation model. This navigation system pulls out elements, or facets, of the metadata in the search results, providing the user with the ability to easily navigate through the results of the search. This faceting exposes the metadata in a whole new way. For example, providing a facet on the subject terms in MARC records has exposed errors in a controlled taxonomy of terms. The term for the United States of America, according to the Library of Congress controlled subject term vocabulary, is United States. However, over the years many catalogs have been plagued with cataloging errors that previously had gone undetected. Via faceting of the metadata elements, many NGCs will expose these errors. For example, a user might find terms such as U.S., US, America, and so on. A user searching on American history may be provided with the ability to narrow the search results down to these options, negatively impacting the user experience and the success of the NGC. These errors and inconsistencies were nearly impossible to detect prior to the NGC. Cleanup of metadata has grown into a common practice after the deployment of the NGC and continues to be an ongoing activity to improve the quality of the user experience.
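Part of that cleanup work can be automated by examining the subject facet itself. Here is a minimal sketch that flags punctuation and capitalization variants of the same heading so a cataloger can review them; the heading counts are invented, a real workflow would read values from the catalog's facet index, and true synonyms such as "U.S." versus "United States" would still require an authority file or manual review.

```python
import re
from collections import defaultdict

# Invented facet values with record counts, as an NGC's subject facet might report them.
SUBJECT_FACET = {
    "United States": 52431,
    "United States.": 312,
    "united states": 58,
    "U.S.": 187,
    "America": 95,
}

def normalize(heading):
    """Reduce a heading to a crude comparison key (lowercase, punctuation stripped)."""
    return re.sub(r"[^a-z0-9 ]", "", heading.lower()).strip()

def find_variants(facet):
    """Group facet values whose normalized forms collide; these need review."""
    groups = defaultdict(list)
    for heading, count in facet.items():
        groups[normalize(heading)].append((heading, count))
    return {key: vals for key, vals in groups.items() if len(vals) > 1}

if __name__ == "__main__":
    for key, variants in find_variants(SUBJECT_FACET).items():
        print(f"Possible variants of '{key}':")
        for heading, count in sorted(variants, key=lambda v: -v[1]):
            print(f"  {heading!r} ({count} records)")
```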


Chapter 5


The Impact of the Next-Generation Catalog


Abstract

In the past decade, libraries have shown some major evolutions in the types of services they provide and the platforms in which they deliver these services. From the learning café, to patron-driven acquisitions, to the NGC discovery solution, libraries are evolving to match the demands of changing user behaviors. But a key part of the process that is commonly overlooked is the use of data analysis to drive decisions. By utilizing usage analysis tools, we can build a picture of how certain changes and evolutions affect patrons and how services are used. By constantly analyzing reports on usage, librarians can better judge how to improve services to meet the demands of their patrons.

To have a clear understanding of the impact of the NGC on the library, we need to evaluate usage of many different aspects of the library's services. A library website that is designed around search and discovery may very well increase the visibility of the library to the community that it serves. With this in mind, we need to analyze the usage of the library website before and after the implementation of the NGC. Because the visibility of the library is affected, the physical building usage may change as well, and an analysis of the use of the physical building would also be valuable. Further, because the NGC is heightening the awareness of the library's physical collections, making them more easily discoverable, the use of these collections needs to be analyzed as well to prove the effectiveness of the solution. Finally, a comparison of the usage of the original OPAC versus the usage of the new NGC solution will help show the success of the solution in place.

Analyzing Circulation

The NGC solution is designed to make the library's physical collections more easily discoverable. A good metric to show the NGC's impact is to analyze the usage of the library's physical collections. We can do this by analyzing the circulation statistics (see figures 15 and 16). By analyzing the sample set of libraries in this report who have implemented an NGC solution, we can conclude that while the NGC solution has not increased the circulation of physical materials in the library, it has contributed to the prevention of a plummeting circulation. With the focus of library spending changing from physical resources to electronic resources and with many libraries shrinking their physical collections, we would expect to see a decrease in circulation. However, this sample shows that the NGC has had a positive impact on the usage of physical resources.

Figure 15. Circulation statistics for two years before and two years after implementation of the NGC solution at Oklahoma State University.

Figure 16. Circulation statistics for one year before and two years after implementation of the NGC solution at Villanova University.

Analyzing the Website

By analyzing its website, a library gains valuable knowledge about the activity happening on the site. The website's log files contain a tremendous amount of detail that can produce reports such as path analysis, search engine optimization, and referring website reports. In addition to the website's log files, tools can be added to the website to analyze user activities in other ways. Heat map technology can show what areas on a webpage are used more than others, allowing the website owner to learn what links and features are used heavily and what goes unused.
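As an illustration of pulling search activity out of those log files, here is a minimal sketch that reads a web server access log in the common combined format and tallies the search terms found in request URLs. The log path and the name of the search-box parameter ("q" here) are assumptions for the example; each catalog or website names its query parameter differently.

```python
import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

LOG_PATH = "access.log"   # assumed location of the web server log
QUERY_PARAM = "q"         # assumed name of the search-box parameter

# Matches the request portion of a combined-format log line: "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

def top_searches(path, limit=20):
    """Count the most frequent search terms submitted through the site's URLs."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = REQUEST_RE.search(line)
            if not match:
                continue
            query = parse_qs(urlparse(match.group(1)).query)
            for term in query.get(QUERY_PARAM, []):
                counts[term.strip().lower()] += 1
    return counts.most_common(limit)

if __name__ == "__main__":
    for term, count in top_searches(LOG_PATH):
        print(f"{count:6d}  {term}")
```

Hosted tools such as Google Analytics can produce similar reports without touching the raw logs, which is the trade-off discussed below.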

Analyzing the Impact Because so many factors play a role in the impact of next-generation services, it can be difficult to use analytics to isolate the effect of any one specific service. For example, around the same time that Wake Forest University deployed its NGC solution, it also established a Starbucks coffee lounge in the library. This new service will most likely impact many other services, such as the gate counts, library website usage, and circulation of materials— all the statistics that need to be reviewed to

Figure 17 A heat map image created by CrazyEgg that overlays the web page to illustrate user click locations.

Analyzing the Next-Generation Catalog  Andrew Nagy

Library Technology Reports  alatechsource.org  October 2011

Figure 17 is an example of a heat map created with Crazy Egg, a product that records the location of each user’s cursor on the screen when the user clicks. This information is used to produce a heat map image that can be overlaid on the webpage to illustrate what parts of the website are the most frequently used. Red color shows spots that are most frequently used. Cooler colors, like greens and blues, show spots that are less frequently used. Crazy Egg and other similar tools use a bit of JavaScript added to a webpage that allows the software to work. This approach is similar to that of other usage analysis tools, such as the popular Google Analytics (figure 18). Google Analytics is a free tool provided by Google. Google keeps the data about your users collected by the software. If Google decided to terminate or drastically change the product, you could potentially lose your usage data. However, other similar tools are available that do not require Google to store your usage data; these tools include the commercial version of Google Analytics, known as Urchin, or the open source alternative Piwik. By storing the data in your own IT environment, you control the data retention policies. However, the down side is the cost in resources to store and report on the data. These tools can also be configured to track the usage of the library OPAC and NGC. Many website usage tools provide the ability to parse specific elements out of the URL that is recorded in the website’s log files. This allows the analyzer to pull out search queries or other specific elements that are specified in the site’s URLs. By combining the library’s website, OPAC, NGC, and any other online resources the library manages into a single reporting system, this can be a very powerful solution for analyzing how patrons use the library’s online environment.


Figure 18 A screenshot of Google Analytics.





Another institution in North America that deployed an NGC solution, York University in Toronto, has reported that usage of its VuFind solution is more than five times that of the original OPAC. The OPAC peaked at just under 10,000 page views per day, while the NGC peaks at around 50,000. William Denton, web librarian at York University, reports that "according to Google Analytics, people average about 60 seconds looking at the search results pages in VuFind, and 90 seconds looking at item records. In the classic catalogue, they spend about 45 seconds per page average."1 We don't know what other factors were in play at York during the rollout of its NGC solution—but this evidence shows that the NGC solution has made a tremendous impact.

Note

1. William Denton, "VuFind Usage Five Times That of 'Classic Catalogue,'" Miskatonic University Press (website), Dec. 17, 2010, www.miskatonic.org/2010/12/17/vufind-usage-fives-times-classic-catalogue.

Chapter 6

Case Studies

Abstract

The case studies in this chapter illustrate what pushed each library to implement an NGC, the steps involved, and the outcomes of the solution. Interviews are with Lynn Sutton, Library Dean, and Erik Mitchell, Assistant Director for Technology Services, from Wake Forest University; Anne Prestamo, Associate Dean for Collection and Technology Services, from Oklahoma State University; Greg Raschke, Associate Director for Collections and Scholarly Communication, from North Carolina State University; Allison Sharp Bolorizadeh, Assistant Professor and Instructional Services Librarian, from the University of Tennessee, Knoxville; and Joseph Lucia, University Librarian and Library Director, from Villanova University.

Wake Forest University

• Interviewees: Library Dean Lynn Sutton and Assistant Director for Technology Services Erik Mitchell
• Product: VuFind
• Launch Date: July 2009

During the summer of 2009, Wake Forest University (WFU), a university of over 7,000 enrolled students, launched an NGC at the Z. Smith Reynolds Library. I spoke with Library Dean Lynn Sutton and Assistant Director for Technology Services Erik Mitchell about their implementation of VuFind to learn more about why they chose the product and what impact it has had on the library. Prior to the implementation, the staff at WFU were generally dissatisfied with the OPAC in place. They wanted to provide an experience their users would find familiar, as well as to free themselves from "vendor lock-in." In addition, they lacked confidence in the ability of the vendor that they had been working with to provide a solution that would meet both their needs and their patrons' needs.




A few institutions were beginning to make some noise in the library industry around the discoverability of collections. North Carolina State University (NCSU) launched a homegrown catalog built on top of a commercial search engine, Endeca. This platform allowed the developers at the NCSU Libraries to build a web application that brought together the intuitiveness of a Google search, the power of faceted searching, and the highly customized look and feel that libraries had been wanting, all in one tool. Shortly thereafter, Villanova University launched VuFind as an open source solution. Sutton knew Joseph Lucia, University Librarian at Villanova University, and began to have conversations with him. After briefly evaluating the few commercial products that had recently come to market, they mutually agreed that the open source products available offered more attractive costs and features than the commercial products. This, combined with a general lack of trust in the vendors, made the path forward clear. It was then that Mitchell and Sutton decided to move forward with an implementation of VuFind as a new platform for their patrons.

As with all product evaluations, the open source solutions were not all peaches and cream. Sutton did have some hesitations. "I was worried about the days of NOTIS and VTLS, when ILS systems were developed at a library and then became commercial products," said Sutton, speaking of her concern about relying on an open source solution developed by a library. She went on to say that Lucia reassured her in a later conversation that this was a whole new ballgame.

VuFind at WFU quickly became a success for the technology team at the library. They were able to prove that an open source solution could perform at the levels that their librarians and users demanded. VuFind became a "quick win" and opened the door to more open source solutions.



One of the most beneficial traits of open source software is the ability of the team to quickly deploy new copies of the software, both for testing newer versions and for upgrading infrastructure. WFU has since migrated all of the library's locally hosted applications to a cloud-based platform, and its open source products made that migration very easy.

The implementation of VuFind was spread out over a period of six months that they called "in beta." The team consisted of one developer—Mitchell—and one user interface specialist. Between the two of them, they installed and set up VuFind with the library's catalog records very quickly. They then customized the tool in ways that open source software allows. By making customizations to the user interface, they were able to control the usability and performance of the application to meet the needs of their institution. They also altered the underlying code, tweaking the index schema to better match their cataloging models and analyzing the tool for performance enhancements. During the six-month beta period, they provided their staff with a bug-tracking tool to report issues and request feature enhancements. They relied heavily on the VuFind community for support during this period, which worked out very well for them: the community was quite active at the time, and they were able to get the support they needed in a very timely manner.

Early adoption by the public services staff was slow to start. They received an overwhelming amount of feedback, much of it negative. The staff had a hard time understanding the new approaches to the subject term facets as well as the lack of name authority control. But it was this early period of discomfort that led to the success of the project. Without this approach of forcing use of the product, Sutton and Mitchell felt, the project would have remained stuck in perpetual beta and eventually failed.

Oklahoma State University

• Interviewee: Associate Dean for Collection and Technology Services Anne Prestamo
• Product: AquaBrowser from Serials Solutions
• Launch Date: August 2007

The impetus for the NGC at Oklahoma State University (OSU) came in 2006, when Roy Tennant began speaking in public forums about how libraries were stuck in a position of putting "lipstick on a pig." This notion made Anne Prestamo, Associate Dean for Collection and Technology Services at OSU, begin thinking about the need for a solution that is not built out of the ILS. When North Carolina State University launched its Endeca-based catalog in 2006, the staff at the OSU libraries were ready for a change. When they began to canvass the market for options, they found a few products in their infancy, but they also found a solution from the Netherlands that was beginning to make some headway in the industry—AquaBrowser.

Figure 19 Design of library homepage after NGC deployment

Its innovative user interface and price point made this solution highly attractive. Additionally, with its adoption, OSU won the prestige of being one of the first academic libraries to adopt the solution and the first library running Ex Libris's Voyager ILS to do so. The goals for deploying AquaBrowser to the academic community were simple—to "get users better results," said Prestamo. OSU wanted a solution that would not require the end user to learn how to search the "librarian way." This goal was very important to the administrators at the library; they wanted their staff to be able to focus on performing librarian duties rather than on helping frustrated students navigate a cryptic search tool. Because they bought into the notion that "if they have to be taught how to use it, they won't use it," a self-guided solution was key.

The implementation of the product was a bit more hands-on than simply purchasing a vended product, setting it up, and deploying it. Since AquaBrowser was new to the academic market and to the Voyager ILS, some joint development was necessary between the librarians and the development team at what was then MediaLab Solutions BV. While bringing up the product took a short amount of time, development continued throughout the year to optimize the interaction with the Voyager ILS server. The second year brought new developments such as integration with the federated search platform, Serials Solutions' 360 Search, and expansion throughout the five campuses of the OSU system.

When BOSS (Big Orange Search System) went live to the campus, reactions were very positive.

Students loved it and began using it right away. Reactions to the word-cloud feature that sets AquaBrowser apart from the rest of the products available, however, were mixed. The word cloud was a very innovative technology at the time; it allowed users to easily disambiguate their searches by navigating a cloud of related terms. A user could click on a term to narrow the results. The cloud contains associated terms, translated terms, spelling variations, and terms from a thesaurus, as well as the original search term. This level of visualization helps educate the user about other search options and strategies. This model aligns very well with the research of Professor Marti Hearst at UC Berkeley (mentioned in chapter 3), who studied the impact of visual tools in relation to faceted navigation.

One nice story from Prestamo was about a soon-to-retire faculty member who was a big user of the library. This faculty member was not known for giving praise but was not shy about letting the library administration know if he was unhappy about something. One day, shortly after the launch, the professor rang the library. This was a surprising occasion, as he was calling to sing the new system's praises. He shared his excitement about the new discovery system and wondered why the library had waited until his impending retirement to roll out such a wonderful tool.
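AquaBrowser's own algorithm is not described here, but the general idea of assembling such a cloud of related terms can be sketched in a few lines; the thesaurus, translations, and indexed vocabulary below are invented solely for illustration.

import difflib

# All of the data below is made up; a real system would pull these relationships
# from a thesaurus, translation tables, and the search index itself.
THESAURUS = {"car": ["automobile", "vehicle"], "film": ["movie", "cinema"]}
TRANSLATIONS = {"car": ["auto (German)", "voiture (French)"]}
INDEXED_TERMS = ["cars", "carp", "care", "cart", "film", "films"]

def word_cloud(term):
    cloud = {term}
    cloud.update(THESAURUS.get(term, []))      # associated thesaurus terms
    cloud.update(TRANSLATIONS.get(term, []))   # translated terms
    # spelling variations: terms from the index that look similar to the query
    cloud.update(difflib.get_close_matches(term, INDEXED_TERMS, n=5, cutoff=0.75))
    return sorted(cloud)

print(word_cloud("car"))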

North Carolina State University

• Interviewee: Associate Director for Collections and Scholarly Communication Greg Raschke
• Product: ProFind from Endeca
• Launch Date: November 2006



North Carolina State University (NCSU), a science- and engineering-focused institution whose library system is seen by many as an innovator, has had the leadership required to keep the services its libraries offer in line with the expectations of their patrons. In 2005, the main library at NCSU began a project to redesign its website. The redesign was led by a team of developers and librarians focused on the user experience of their patrons in the online environment. During this project, they studied their users through usability studies, learning how their patrons use resources and perform research activities. These studies made it very clear that the interface for searching the library's collections simply did not meet users' expectations. For example, the OPAC returned search results in LIFO (last in, first out) order—meaning the most recently cataloged items showed up at the top of the results list—rather than presenting items in order of relevance to the search input. This left users with little understanding of the search process and little desire to use the library's current platform, leaving them with an easy alternative—Google!
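The difference is easy to see in a toy example. The records, field names, and scoring below are hypothetical (a real NGC leans on a search engine such as Endeca or Solr for relevance ranking), but they show how the same result set reads when ordered by cataloging date versus a simple term-frequency score.

# Hypothetical records; "cataloged" stands in for the date a record was added.
records = [
    {"title": "Climate change and coastal cities", "cataloged": "2006-08-01",
     "text": "climate change climate policy coastal cities"},
    {"title": "Annual weather report", "cataloged": "2006-09-15",
     "text": "weather observations"},
    {"title": "Introduction to climate science", "cataloged": "2004-02-10",
     "text": "climate climate climate science textbook"},
]

def relevance(record, query):
    # naive term-frequency score: how often each query term appears in the record
    terms = query.lower().split()
    words = record["text"].lower().split()
    return sum(words.count(term) for term in terms)

query = "climate change"
lifo = sorted(records, key=lambda r: r["cataloged"], reverse=True)
ranked = sorted(records, key=lambda r: relevance(r, query), reverse=True)
print("Last in, first out:", [r["title"] for r in lifo])
print("By relevance:", [r["title"] for r in ranked])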

The focus of the website redesign team was to provide an experience that met users' expectations and would increase usage of the library's resources. The team began to evaluate solutions external to the library marketplace, looking to commercial websites that strongly emphasized search and user experience—Epicurious.com, Walmart.com, and Target.com. These sites inspired the team at NCSU to choose a platform that could provide high-quality search results and could be tailored to their collections. A meeting with a company called Endeca, known for its search engine software (which powers sites such as Walmart.com, Homedepot.com, and Target.com), led to NCSU's decision to develop its own catalog solution with the Endeca search engine. The team felt very confident that they could get an easy win by integrating the Endeca software into their new website platform to create an intuitive and seamless website experience for their users. This confidence won over the library dean and led to a revolution in the OPAC, sparking a new way of thinking in the library world about user experience and introducing the NGC as a way to optimize that experience.

The project started in the fall semester of 2005, and a solution was ready for user testing by November 2005. This testing went on until the end of that semester in December. By the start of the following semester, library patrons found themselves with a completely new experience on the NCSU libraries' website. Users started their research from a single search box, which the undergraduates instantly loved. During that initial semester, the "classic OPAC" was still available, allowing staff and diehards to perform the complex searches that they had done for years. But just a few months later, by August 2006, the classic OPAC was turned off and no longer used. Library staff members were not as enthusiastic as the undergrads, but by the time the Endeca solution was the only available solution, staff members were singing its praises.

Now, more than five years later, Greg Raschke, Associate Director for Collections and Scholarly Communication, says that the project was a big success. Not only did it put the NCSU libraries on the map and win the prestigious Endeca Navigator Award for building a highly successful solution, but it also provided a boost for the library. Circulation statistics show an increase after the launch, and the eBook filter, a feature that was not available in the classic OPAC, is one of the most popular filter options in the search results. Additionally, TRLN (the Triangle Research Libraries Network), a consortium of four academic libraries in the Research Triangle area—Duke University, North Carolina Central University, North Carolina State University, and the University of North Carolina at Chapel Hill—adopted Endeca as the source for its union catalog. This was another nice win for the team at NCSU, as they were now able to share some of the burden of software development and maintenance with a larger team, increasing their ability to further evolve the front-end application.


Raschke is now thinking with his digital library team about how to further improve the user experience. They are thinking about the best ways to integrate and interweave web-scale solutions, such as their subscription to Serials Solutions' Summon service, into their highly tuned web environment. They are interested in how to combine such diverse yet essential resources into a single experience in a way that is coherent to the end user. For example, they have integrated into their website their local collections, their regional network collections, their statewide collections, and the more global academic collections available to them from Summon. The team plans to further explore how these collections fit together as they move to a web-scale environment and how to make that a part of their "big success."

University of Tennessee at Knoxville


• Interviewee: Assistant Professor and Instructional Services Librarian Allison Sharp Bolorizadeh
• Product: Primo from Ex Libris
• Launch Date: August 2009


The University of Tennessee, an institution of over twenty-seven thousand students nestled in eastern Tennessee near the Smoky Mountains, began an evaluation of its library's online environment in early 2009. The library leadership realized that the library could provide a better user experience, and they saw patrons going to Google as a starting point for research. As a result, the University of Tennessee, Knoxville began seeking a solution that provided an experience matching that of Google—a tool that did not require an expert to teach users how to navigate the vast collections in the library. Seeing students fumble with the library website and turn to easy-to-use resources like Google had led to increased frustration. The library had put a great deal of effort into making its physical space inviting and easily accessible; now the goal was to develop the online environment in a similar fashion. In 2009, the committee responsible for the library's website led a redesign project. Choosing a new platform to deliver a high-quality search experience was core to this charge. At the time, the NGC market included options from vendors in the library industry, from the technology sector, and from the open source community. When evaluating the options, the team looked for something that integrated well with the existing ILS, Ex Libris's Aleph, and the federated search platform, MetaLib. While the library staff appreciated open source tools, that option was off the table because the team felt they needed vendor support and a full turnkey solution. Ex Libris was able to provide such a platform with Primo.

During the summer of 2009, the new library website and catalog were prepared for launch. By August 2009, they were up and running for the staff and the smaller summer population on campus to test and provide feedback on. By the start of the fall semester of 2009, the Primo catalog was deployed as the main access point to the library's collections.

Adoption by undergraduates that semester was quick: they began using the new tool immediately and without concern. Allison Sharp Bolorizadeh, Assistant Professor and Instructional Services Librarian, noted that it was as if the students didn't even notice the change, since the interface was familiar to them. Faculty members, on the other hand, were a bit skeptical at the outset. Library staff members were also a bit skeptical; however, because they needed to learn to use the new tool so that they could support instruction, staff adoption was not a problem for long. As a result of negotiations between the Primo implementation team and the rest of the library staff, access to the "classic OPAC" was retained so that some specific advanced types of searching would remain available.

The deployment of Primo happened in conjunction with the website redesign during the summer of 2009. Two groups were involved with that summer's work—the Primo implementation team, which consisted mainly of staff from the systems department and members of the administration team, and the virtual library steering committee, a much larger group made up of staff from within and outside of the library. These teams were focused on more than just the deployment of the Primo product; they produced materials for marketing the new website and the new search solution, along with video tutorials on how to use them.

Looking back, the new library environment has resulted in many successes. The most important was a paradigm shift in the thinking of the library staff, bringing the discovery of library resources to the forefront and making it a high priority. Moreover, other resources saw increased use as a result of the redesign work; for example, the library witnessed an increase in usage of its federated search platform due to its integration with Primo, which is now front and center on the library homepage.

Villanova University

• Interviewee: University Librarian Joseph Lucia

In late 2006, the library at Villanova University set out to achieve what the North Carolina State University libraries had achieved: complete control over the primary discovery interface in order to offer a fully customized environment for their patrons. By the summer of 2007, the library had pioneered one of the early products on the market, VuFind, an open source NGC. According to University Librarian Joseph Lucia, at the time that they began development on VuFind, traditional OPACs were inadequate to meet users' expectations. This disconnect was making a significant impact on the usage of the library's resources as more and more users gravitated to tools such as Google and Amazon for finding relevant materials. Because another institution had made a big splash with its homegrown catalog, Villanova had confidence that it could achieve the same. NCSU's Endeca-based homegrown catalog proved to many libraries that a better solution was possible and that it didn't have to come from an ILS vendor. Having a custom-made solution allowed Villanova to embed such a search tool in the library's web environment. This allowed complete control over the look and feel, as well as the wording and functionality, to meet the demands of the new generation of users. Moreover, Lucia saw building a solution jointly with other libraries as core to the mission and goals of libraries; he sees this type of cooperation as fundamental.

The NGC Value to Libraries

Lucia argued that "libraries traditionally viewed content myopically." He pointed out that the typical college student grew up with the Internet and views collections very broadly. The rise of instant communication technologies made possible by the Internet and mobile networks, such as e-mail, texting, and IM, along with new media used through websites like YouTube and social networking sites like Facebook and Twitter, may explain the Millennials' reputation for being somewhat peer-oriented.1 These users are highly independent and don't commonly ask for help. More and more libraries are pushing alternatives to the traditional reference desk: rather than supporting one-on-one sessions, librarians are focusing more on group sessions, and many libraries offer chat and texting to communicate with librarians. Supporting an NGC allowed the library to present its collections in a different way to meet the needs of this new generation. Resources and collections could be "recommended, sliced, and presented differently," explained Lucia. He said that "the NGC was the first generation answer to this need and we are now facing enhancements and revisions to that model."

Open Source Solutions

When Villanova University was looking to provide these new models of access for its patrons, it became apparent that the open source model melded well with the library's mission and was a clear path forward. Of course, these solutions have their challenges, such as how to support them and how to solve complex problems in a timely manner. However, Lucia continues to argue that open source is a perfect fit in libraries, saying that "libraries exist to remove the barriers around intellectual property for our communities so that creativity and inquiry are facilitated." He said further, "Open source as a model practice for the sharing of knowledge and creative enterprise in a collective community environment is really emblematic of the entire mission of libraries."

Discussing how VuFind has made an impact on the library community, Lucia continued, "You can make a really compelling and effective open source project with a relatively constrained cohort of developers. The active developers on VuFind are probably between ten or fifteen, but the level of continuous enhancement in the community is really substantial. The broader engagement has made it possible for libraries to really shape the tools that communicate who and what we are to our constituents. This gives libraries a much deeper stake in what we build and how we build it and opens the door to new kinds of collaborations across libraries." He concluded, "We are at the very, very front end of a process that is going to take place in the next decade that is going to be very transformational."

The Outcome at Villanova

Students' adoption of the NGC solution at Villanova has been very similar to the experience at other libraries with a solution in place: adoption has been almost seamless. Lucia commented that it has been a "friendly, familiar, comfortable interface that students like and use with a great deal of facility. It is much less intimidating and confusing than traditional library catalogs." Lucia also suggested that the hypothesis set forth in chapter 5 is correct—that seeing no significant change in circulation statistics is itself a sign of the NGC's success, because we would otherwise expect circulation to decrease as libraries devote less focus and spending to their physical materials. However, Villanova had not performed a thorough evaluation of the usage of its ILS OPAC, so it does not have a proper basis for comparison.

Note

1. Sandra Davie, "Gen Y @ Work," The Straits Times, May 12, 2008, www.asiaone.com/Business/Office/Learn/Story/A1Story20080511-64480.html (accessed May 23, 2010).






Chapter 7

Conclusion

Abstract

The next-generation catalog has clearly made an impact, and that impact can be measured; however, it is clear that these solutions have not fully met the demands of their users and have not solved the problem of creating a compelling starting point for the research process. This chapter identifies how the NGC has impacted libraries and what lies in the future of discovery for libraries.




It is clear today that the NGC has had a significant impact on the library by providing a search experience that is more familiar to library patrons and easier to use. Libraries that have deployed such solutions have not seen a dramatic increase—or any increase at all—in circulation of the physical collection, but they have also not seen a decrease. This is important. When the direction of spending at academic libraries is taken into account, this indirectly shows the success of the NGC, as book collections continue to shrink in favor of electronic collections.

The NGC is a stepping-stone technology, preparing libraries for the migration to web-scale services. The NGC is, after all, a catalog of content that is held by the library, and according to Eric Lease Morgan's principles discussed in chapter 3, that is a failure in the model. The NGC does not meet users' expectations: it provides a convenient and compelling single search box, but it does not search everything. This failure has opened a position for web-scale discovery to make an entrance into the library marketplace. Acting as a stepping stone, the NGC has helped libraries prepare their local collections for migration to the web-scale discovery model, making the adoption of this new technology easier. The web-scale discovery products on the market today put a big emphasis on the power of the facets—which was the impetus for the cataloging staff at many libraries to take on a new workflow.

By exposing various elements of the metadata in a whole new way, the NGC has led many libraries to invest effort in cleaning up the metadata found in their records to ensure that everything follows their local cataloging standards. Since this work has been done in connection with the adoption of the NGC, moving to web-scale solutions has been made easier. Additionally, systems librarians have become familiar with workflow elements sparked by the NGC, such as the routines of exporting bibliographic records and dealing with deleted and suppressed records. With this learning curve lessened, libraries will have a smoother transition to the end goal, which is web-scale discovery.
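As a simple illustration of that export routine, the sketch below reads a file of bibliographic records exported from an ILS and separates records flagged as deleted (MARC leader position 5 set to "d") from those to be sent on to the discovery index. It assumes the pymarc library and a hypothetical file name, and it ignores the fact that many ILSs signal suppression somewhere other than the leader.

from pymarc import MARCReader

to_index = []
to_delete = []
# export.mrc is a placeholder for a batch of records exported from the ILS.
with open("export.mrc", "rb") as data:
    for record in MARCReader(data):
        if record is None:
            continue  # pymarc may yield None for records it cannot parse
        if record.leader[5] == "d":  # MARC leader/05 record status: "d" = deleted
            control_numbers = record.get_fields("001")
            to_delete.append(control_numbers[0].value() if control_numbers else None)
        else:
            to_index.append(record)

print(len(to_index), "records to index,", len(to_delete), "deletions to process")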

Next Step: Web-Scale Discovery

Web-scale discovery has taken over the library market with a fury. This product line is just over two years old and has already captured the minds of librarians around the world. The idea of going beyond the NGC to provide a single search box across the full breadth of the library's collections is extremely compelling to both the library and the researcher. Because of this interest, the market has become highly competitive. We learned more about this market in the January 2011 issue of Library Technology Reports, in which Jason Vaughan of UNLV reviewed four products available at the time.1 In the past two years, we have seen hundreds of academic libraries jump on the opportunity to go beyond the NGC to provide a compelling starting place and bring the researcher back into the library. With this product line capturing the interest of library administrators, is it safe to say that the NGC is dead?

The Future of Discovery

In the first half of the 2000s, federated search was the next big thing; in 2006, the NGC came to the library market with a slow but steady adoption rate; in 2009, web-scale discovery exploded as the solution to discovery in the library. It is safe to say that we will see something bigger and better in the coming years. Moore's Law, named after Gordon Moore, the cofounder of Intel Corporation, states that the number of transistors (or, commonly, the processing power) in a CPU doubles approximately every two years.2 While this law is specifically about integrated circuit design, it can be applied to other technologies. It helps with assumptions about the evolution of technology and the rate at which this evolution occurs. While the web-scale discovery product line is still young in the marketplace, we need to keep our ears to the ground for what might be new.

The Google Books project has been buzzing in the library community since its inception. It has a feature that is very compelling to librarians—making the full text of print materials in the stacks discoverable in a very convenient and familiar interface. Google Scholar has also been able to catch the eye of the young researcher by making it possible to find academic research very easily from the same environment that researcher uses for everyday activities. Many libraries have been thinking about how to better leverage these sources. We see the Google Books logo appearing as a link within many of the NGC products mentioned in earlier chapters. However, the answer is not forcing the user off to Google—libraries need to keep their stake in the research process.

With the web-scale discovery solutions on the market adding the publicly available metadata from the HathiTrust Digital Library, there are signs of a focus on making content from beyond the confines of the library's collection discoverable within the NGC. The focus of the NGC had always been the library's defined collections; however, it is beneficial to move beyond that and expand the resources that the researcher can discover. Some of the more recent enhancements to the VuFind code base support harvesting disparate collections, and some VuFind libraries have begun loading collections such as the freely available HathiTrust metadata to provide a more encompassing NGC solution. This is a step in the right direction toward allowing academic books outside the library's controlled collection to be more easily discoverable. In March 2011, Serials Solutions formed an agreement with the HathiTrust Digital Library to allow the full text of the digitized print material collection to be discoverable within its Summon product. This further exemplifies the power of the web-scale discovery solution and the shortcomings of the NGC solutions.
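Harvesting such an external collection is typically done over OAI-PMH, and VuFind ships with its own harvester for the purpose; the following stripped-down sketch only illustrates the mechanics of the protocol (a ListRecords request followed by resumption tokens), and the endpoint URL is a placeholder.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://example.org/oai"  # placeholder endpoint

def harvest(base_url, metadata_prefix="oai_dc"):
    # Issue ListRecords requests, following resumption tokens until none remain.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.fromstring(response.read())
        for record in tree.iter(OAI + "record"):
            yield record
        token = tree.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# for record in harvest(BASE_URL):
#     ...transform the metadata and load it into the local index...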

Notes

1. Jason Vaughan, "Web Scale Discovery Services," Library Technology Reports 47, no. 1 (Jan. 2011).
2. "Moore's Law," Wikipedia, http://en.wikipedia.org/wiki/Moore's_law.




Library Technology Reports

Respond to Your Library's Digital Dilemmas

Eight times per year, Library Technology Reports (LTR) provides library professionals with insightful elucidation, covering the technology and technological issues the library world grapples with on a daily basis in the information age.

Library Technology Reports 2011, Vol. 47

January (47:1): "Web Scale Discovery Services" by Jason Vaughan
February/March (47:2): "Libraries and Mobile Services" by Cody W. Hanson
April (47:3): "Using WordPress as a Library Content Management System" by Kyle M. L. Jones and Polly Alida-Farrington
May/June (47:4): "Librarians' Assessments of Automation Systems: Survey Results, 2007–2010" by Marshall Breeding and Andromeda Yelton
July (47:5): "Using Web Analytics in the Library" by Kate Marek
August/September (47:6): "The Transforming Public Library Technology Infrastructure" by the ALA Office for Research and Statistics
October (47:7): "Analyzing the Next-Generation Catalog" by Andrew Nagy
November/December (47:8): "The No Shelf Required Guide to E-book Purchasing" by Sue Polanka

alatechsource.org
ALA TechSource, a unit of the publishing department of the American Library Association