
Library Hi Tech
Volume 22 Number 2 2004
ISBN 0-86176-984-8
ISSN 0737-8831

MARC and metadata: METS, MODS and MARCXML: current and future implications. Part 2
Theme Editor: Bradford Lee Eden


Contents

115 Access this journal online
116 Abstracts & keywords
119 Editorial: Selection for digital preservation, Michael Seadle

Theme articles
122 Using XSLT to manipulate MARC metadata, Corey Keith
131 Meta-information about MARC: an XML framework for validation, explanation and help systems, Joaquim Ramos de Carvalho, Maria Inês Cordeiro, António Lopes and Miguel Vieira
138 Creating metadata practices for MIT's OpenCourseWare Project, Rebecca L. Lubas, Robert H.W. Wolfe and Maximilian Fleischman
144 Medium or message? A new look at standards, structures, and schemata for managing electronic resources, Sharon E. Farb and Angela Riggio
153 Repurposing MARC metadata: using digital project experience to develop a metadata management design, Martin Kurth, David Ruddy and Nathan Rupp
166 Future considerations: the functional library systems record, Karen Coyle
175 A bibliographic metadata infrastructure for the twenty-first century, Roy Tennant

Access this journal electronically The current and past volumes of this journal are available at:

www.emeraldinsight.com/0737-8831.htm You can also search over 100 additional Emerald journals in Emerald Fulltext:

www.emeraldinsight.com/ft See page following contents for full details of what your access includes.

Contents (continued)

Other articles
182 A comparative review of common user interface products, Daniel G. Dorner and AnneMarie Curtis
198 Visual image repositories at the Washington State University Libraries, Trevor J. Bond
209 GIS in the management of library pick-up books, Jingfeng Xia
217 PSU Gateway Library: electronic library in transition, Lesley M. Moyo
227 ProPrint world-wide print-on-demand services for study and research, Elmar Mittler and Matthias Schulz

Columns
231 About XML: Patently ridiculous, Judith Wusteman
238 Book review


Abstracts & keywords

Selection for digital preservation
Michael Seadle
Keywords Archives management, Digital storage, Copyright law
This editorial discusses long-term archiving and long-term access to digital documents, with an emphasis on criteria for selection. Selecting materials for digital preservation depends on whether the materials are both valuable and endangered, whether appropriate digitization procedures and standards for these materials exist, and whether copyright allows reasonable access for educational and research purposes.

Using XSLT to manipulate MARC metadata
Corey Keith
Keywords Online cataloguing, Extensible Markup Language, Services, Internet
This paper describes the MARCXML architecture implemented at the Library of Congress. It gives an overview of the component pieces of the architecture, including the MARCXML schema and the MARCXML toolkit, while giving a brief tutorial on their use. Several different applications of the architecture and tools are discussed to illustrate the features of the toolkit developed thus far. Nearly any metadata format can take advantage of the features of the toolkit, and the process of enabling a new format in the toolkit is discussed. Finally, this paper intends to foster new ideas with regard to the transformation of descriptive metadata, especially using XML tools. In this paper the following conventions will be used: MARC21 will refer to MARC 21 records in the ISO 2709 record structure used today; MARCXML will refer to MARC 21 records in an XML structure.

Meta-information about MARC: an XML framework for validation, explanation and help systems
Joaquim Ramos de Carvalho, Maria Inês Cordeiro, António Lopes and Miguel Vieira
Keywords Online cataloguing, Extensible Markup Language
This article proposes a schema for meta-information about MARC that can express at a fairly comprehensive level the syntactic and semantic aspects of MARC formats in XML, including not only rules but also all texts and examples that are conveyed by MARC documentation. It can be thought of as an XML version of the MARC or UNIMARC manuals, for both machine and human usage. The article explains how such a schema can be the central piece of a more complete framework, to be used in conjunction with "slim" record formats, providing a rich environment for the automated processing of bibliographic data.


Creating metadata practices for MIT's OpenCourseWare Project
Rebecca L. Lubas, Robert H.W. Wolfe and Maximilian Fleischman
Keywords Online cataloguing, Libraries, United States of America
The MIT libraries were called upon to recommend a metadata scheme for the resources contained in MIT's OpenCourseWare (OCW) project. The resources in OCW needed descriptive, structural, and technical metadata. The SCORM standard, which uses IEEE Learning Object Metadata for its descriptive standard, was selected for its focus on educational objects. However, it was clear that the libraries would need to recommend how the standard would be applied and adapted to accommodate needs that were not addressed in the standard's specifications. The newly formed MIT Libraries Metadata Unit adapted established practices from AACR2 and MARC traditions when facing situations in which there were no precedents to follow.

Medium or message? A new look at standards, structures, and schemata for managing electronic resources
Sharon E. Farb and Angela Riggio
Keywords Resources, Internet, Online cataloguing, Licensing, Information management, Libraries
This article examines several library metadata standards, structures and schema relevant to the challenge of managing electronic resources. Among the standards, structures and schema to be discussed are MARC, METS, Dublin Core, EAD, XrML, and


ODRL. The authors’ analysis reveals that there is currently no one standard, structure or schema that adequately addresses the complexity of e-resource management. The article concludes with an outline and proposal for a new metadata schema designed to manage electronic resources.

Repurposing MARC metadata: using digital project experience to develop a metadata management design
Martin Kurth, David Ruddy and Nathan Rupp
Keywords Online cataloguing, Libraries, Design, United States of America
Metadata and information technology staff in libraries that are building digital collections typically extract and manipulate MARC metadata sets to provide access to digital content via non-MARC schemes. Metadata processing in these libraries involves defining the relationships between metadata schemes, moving metadata between schemes, and coordinating the intellectual activity and physical resources required to create and manipulate metadata. Actively managing the non-MARC metadata resources used to build digital collections is something most of these libraries have only begun to do. This article proposes strategies for managing MARC metadata repurposing efforts as the first step in a coordinated approach to library metadata management. Guided by lessons learned from Cornell University library mapping and transformation activities, the authors apply the literature of data resource management to library metadata management and propose a model for managing MARC metadata repurposing processes through the implementation of a metadata management design.

A bibliographic metadata infrastructure for the twenty-first century
Roy Tennant
Keywords Online cataloguing, Archives, Bibliographic systems
The current library bibliographic infrastructure was constructed in the early days of computers – before the Web, XML, and a variety of other technological advances that now offer new opportunities. General requirements of a modern metadata infrastructure for libraries are identified, including such qualities as versatility, extensibility, granularity, and openness. A new kind of metadata infrastructure is then proposed that exhibits at least some of those qualities. Some key challenges that must be overcome to implement a change of this magnitude are identified.

A comparative review of common user interface products
Daniel G. Dorner and AnneMarie Curtis
Keywords Common user interface, Library portals, Software evaluation
A common user interface replaces the multiple interfaces found among individual electronic library resources, reducing the time and effort spent by the user in both searching and learning to use a range of databases. Although the primary function of a common user interface is to simplify the search process, such products can be holistic solutions designed to address requirements other than searching, such as user authentication and site branding. This review provides a detailed summary of software currently on the market. The products reviewed were EnCompass, MetaLib, Find-It-All OneSearch, ZPORTAL, CPORTAL, InfoTrac Total Access, MetaFind, MuseSearch, SiteSearch, Single Search, Chameleon Gateway, and WebFeat.

Future considerations: the functional library systems record
Karen Coyle
Keywords Online cataloguing, Libraries, Object-oriented databases, Design
The paper performs a thought experiment on the concept of a record based on the Functional Requirements for Bibliographic Records and library system functions, and concludes that if we want to develop a functional bibliographic record we need to do it within the context of a flexible, functional library systems record structure. The article suggests a new way to look at the library systems record that would allow libraries to move forward in terms of technology but also in terms of serving library users.

Visual image repositories at the Washington State University Libraries
Trevor J. Bond
Keywords Visual databases, Copyright law, Partnership
The World Civilizations Image Repository (WCIR) and Photos Online are two collaborative image database projects under way at the Washington State University (WSU) Libraries. These projects demonstrate how the WSU Libraries have employed OCLC/DiMeMa's (Digital Media Management) CONTENTdm in partnership with other University departments to develop visual collections free from copyright restrictions, as well as to manage "born digital" images on a collaborative basis.


GIS in the management of library pick-up books
Jingfeng Xia
Keywords Geographic information systems, Libraries, Shelf space, Collections management, Books
The management of library "pick-up books" – a phrase that refers to books pulled off the shelves by readers, discarded in the library after use, and picked up by library assistants for reshelving – is an issue for many collection managers. This research attempts to use geographic information system (GIS) software as a tool to monitor the use of such books so that their distributions by book shelf-ranges can be displayed visually. With GIS, library floor layouts are drawn as maps. This research produces some explanations of the habits of library patrons browsing shelved materials, and makes suggestions to librarians on the expansion of library collections and the rearrangement potential for library space.

PSU Gateway Library: electronic library in transition
Lesley M. Moyo
Keywords Digital libraries, Library systems, Academic libraries, Communication technologies, Information services
Developments in information technology have led to changes in the mode of delivery of library services, and in the perceptions of the role of librarians in the information-seeking context. In particular, the proliferation of electronic resources has led to the emergence of new service paradigms and new roles for librarians. The Gateway Library at Penn State University (PSU) is an electronic library in transition, with new technology-based services evolving to address the ever growing and changing needs of the academic community. It facilitates access to and navigation of electronic resources in an integrated technology environment.

ProPrint world-wide print-on-demand services for study and research
Elmar Mittler and Matthias Schulz
Keywords Libraries, Digital libraries, Print media, Demand, Germany
The libraries of more and more universities and research institutions have local digital repositories, and the amount of material is increasing every day. Users need an integrated retrieval interface that allows aggregated searching across multiple document servers without having to resort to manual processes. ProPrint offers an on-demand print service within Germany for over 2,000 monographs and 1,000 journals. Partners worldwide are now invited to join.

Patently ridiculous
Judith Wusteman
Keywords Computer software, Public domain software, Extensible Markup Language, Libraries
The Open Source Software movement has much to offer the library community. But can it survive the onslaught of patent applications?


Editorial

Selection for digital preservation
Michael Seadle

The author
Michael Seadle is Editor of Library Hi Tech.

Keywords Archives management, Digital storage, Copyright law

Abstract
This editorial discusses long-term archiving and long-term access to digital documents, with an emphasis on criteria for selection. Selecting materials for digital preservation depends on whether the materials are both valuable and endangered, whether appropriate digitization procedures and standards for these materials exist, and whether copyright allows reasonable access for educational and research purposes.

Electronic access
The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister
The current issue and full text archive of this journal is available at www.emeraldinsight.com/0737-8831.htm

Library Hi Tech, Volume 22, Number 2, 2004, pp. 119-121. © Emerald Group Publishing Limited. ISSN 0737-8831. DOI 10.1108/07378830410543494
Received 7 March 2004; revised 14 March 2004; accepted 19 March 2004

Recently I was asked to speak on a panel at the German Library Congress in Leipzig about "long term archiving and long-term access to digital documents", with an emphasis on criteria for selection. This issue is less controversial than it was in 1992, when Anne Kenney and Lynne Personius wrote their landmark "Study in digital preservation". Three of their principal conclusions were:
(1) Digital image technology provides an alternative – of comparable quality and lower cost – to photocopying for preserving deteriorating library materials ...
(2) Subject to the resolution of certain problems, digital scanning technology offers a cost effective adjunct or alternative to microfilm preservation ...
(3) Digital technology has the potential to enhance access to library materials (Kenney and Personius, 1992).

Kenney and Personius were also clear that a number of problems remained, including the need to institutionalize technology refreshing. Not all of these problems have been solved, and a few new ones have been added to the list, such as establishing "authenticity" in a digital environment (Smith, 2000). Nonetheless, digital preservation has grown increasingly acceptable as librarians begin to realize that digital preservation has nothing to do with transient media like tape, CDs or hard drives, but rather with the ability of systems to make perfect copies and with the foresight to have enough copies that no statistically plausible set of failures can eliminate them all. Projects like LOCKSS (Lots of Copies Keeps Stuff Safe) from Stanford University have gone a long way toward making digital preservation a reality (see Reich and Rosenthal, 2001). We have reached a point where the discussion has shifted from whether digital preservation is feasible to how to select appropriate materials.

Selecting materials for digital preservation depends on three criteria:
(1) whether the materials are both valuable and endangered;
(2) whether appropriate digitization procedures and standards for these materials exist; and
(3) whether copyright allows reasonable access for educational and research purposes.

Valuable and endangered

The value of a work represents both an economic and intellectual calculation, and together these


have a direct relationship to the work's survival chances: high overall value means a high probability of copies surviving. For example, each genuine Gutenberg Bible has a value in the millions of dollars. Few copies of the original artifact exist, but its intellectual contents are as safe as contemporary society can make them. Countless versions of the original text, revised texts, translated texts, even facsimile versions are available in paper, and now a digital version of the Göttingen copy is available (Lossau, 2000). A library fortunate enough to have an original can largely ignore it for long-term digital archiving, unless that copy has some previously undiscovered unique aspect, such as handwritten notes by Martin Luther, that no one else has captured. Digital archiving could arguably reduce the economic value of an expensive original by increasing copies and making its contents much more widely available, though no evidence exists to suggest that this in fact is happening. If anything, access seems to expose artifactual originals to wider markets, increasing their economic value. Unless human cupidity changes, the economic value of digitized originals seems safe.

Low-value materials are, however, often genuinely endangered. The classic example is gray literature: those materials never published through commercial presses, not offered in bookstores, never registered with ISBN or ISSN or other schemes, and never inventoried except occasionally through the good fortune of archival organization. Gray literature composes large portions of the Web, and only Brewster Kahle's Internet Archive is making an earnest attempt to preserve it all[1]. Some efforts exist to reformat older paper-based gray literature. Michigan State University, for example, digitized its American Radicalism Collection in the late 1990s[2]. The collection includes fliers, pamphlets and drawings, often reproduced crudely on mimeograph machines or early photocopiers. Often the ink was already fading. They would not have survived another decade with the relatively heavy use they received from students and researchers who discovered an interest in the topic.

Between these extremes lies the bulk of printed materials. On average their economic value, as measured by used bookstores, hovers around $5. Books with pictures may be worth more. Many works are sufficiently unsaleable not to be worth the cost of warehousing them, and end up in the dumpster. Their intellectual value may well exceed their economic value for a few scholars at some indefinite future time. These kinds of works represent the core of a good research library collection, which tries almost by definition to have broad or nearly comprehensive collections in at least a few chosen subjects. Neglect protects them to some degree from the ravages of human oils, excess light, and exposure to the elements. Neglect also often condemns them to remote storage areas whose climate controls may lack sophistication. Works from the acid paper period will burn slowly away. They fall in the broad middle range for their survival chances as well as their economic value. A few libraries, notably the University of Michigan (University of Michigan Digital Library, n.d.), have started the systematic digitization of these materials using low-cost techniques that require the bindings to be chopped off and the pages sheet-fed in bulk through scanners. Long-term, these works are much more likely to survive in digital form than in paper, and their use will grow because Web-based discovery and access offers a vastly greater market for their intellectual value.

Standards and procedures

Many libraries think of digital archiving mainly in terms of reformatting existing paper materials. This is partly because early projects like Cornell's concentrated on printed materials, and thus developed standards for the digital versions using TIFF, SGML and now XML. Good training programs exist to ensure reasonably efficient procedures. Although some issues remain, long-term digital archiving of text-based materials seems reasonable.

The standards and methods for multimedia preservation remain far more volatile. Older multimedia that is currently stored in analog form on magnetic tape or on color film is more seriously endangered than all but the most acidic books. As with books, those items with particularly high economic value tend to have many more copies, but the inevitable loss in each generation of analog copies sets a threshold of quality, and even modern color film has some tendency to fade. The National Gallery of the Spoken Word project has helped to establish some standards (Seadle, 2004), including the use of WAV files, but issues like the preferred sampling rate remain in dispute, with some arguing for 96 kHz for all materials and others defaulting to the more widespread 44.1 kHz "CD audio" standard, which already captures more than the human ear can hear and has long been used for music.

Standards can hardly be said to exist for digital video. Most digital video is too highly compressed and too prone to loss when expanded to be considered reliable preservation, but the file sizes for storing uncompressed video make any large-scale effort economically infeasible. Falling disk storage prices may change that, however. An


additional complication is the fact that contemporary video often starts digital and incorporates software dependencies from the editing tools. A larger version of the same problem exists for software preservation. Software is, of course, born digital, but it is also born with operating system or device dependencies that make the equivalent of reformatting just as necessary. Emulating old platforms and migrating software to run on new platforms are both alternatives, and at this point neither seems obviously preferable (Hedstrom and Lampe, 2001). Long-term archiving for software is very much an open issue.

Access

Digital preservation is no guarantee of access. US law allows libraries to create up to three digital copies of endangered works, but it is important to note that the law explicitly limits access to the premises:

The right of reproduction under this section applies to three copies or phonorecords of a published work duplicated solely for the purpose of replacement of a copy or phonorecord that is damaged, deteriorating, lost, or stolen, or if the existing format in which the work is stored has become obsolete, if: (1) the library or archives has, after a reasonable effort, determined that an unused replacement cannot be obtained at a fair price; and (2) any such copy or phonorecord that is reproduced in digital format is not made available to the public in that format outside the premises of the library or archives in lawful possession of such copy[3].

While some wishful arguments suggest that "premises" for a university library could include the whole campus, many who deal regularly with the law would not want to risk such a broad interpretation of a word that seems plainly to mean a single physical building. Copyright restrictions are the chief reason why most digital archiving has concentrated on pre-20th century materials. Some exceptions to copyright protection exist, particularly within US law, and some institutions are willing to make a risk assessment that allows access to works whose rights owners seem unlikely to protest. Others, especially state-supported libraries, are more risk-averse, and spend significant resources trying to get permission to give access to materials they want to digitize.

Conclusion

For libraries that deal primarily with paper documents, selection for long-term digital archiving is limited mainly by choices about the value of the originals and the degree to which copyright law allows access. Standards and procedures are reasonably well established to give the works a good chance of long-term survival. Since most libraries fit this category, digitization projects can be expected to flourish. For libraries with significant audio, multimedia, or software collections, such as Michigan State University with its Vincent Voice Library, long-term digital archiving continues to have a significant research aspect. The copyright issues for multimedia especially can be particularly complex because of the potential for multiple ownership. Libraries selecting these kinds of materials for digitization need to be prepared for ongoing change.

Notes
1 See www.archive.org/
2 See http://digital.lib.msu.edu/onlinecolls/collection.cfm?CID=1
3 17 USC 108, United States Code, Title 17, Chapter 1, section 108, available at: www.copyright.gov/title17/92chap1.html#108

References

Hedstrom, M. and Lampe, C. (2001), "Emulation vs migration: do users care?", RLG DigiNews, Vol. 5 No. 6, December, available at: www.rlg.org/preserv/diginews/diginews56.html#feature1
Kenney, A. and Personius, L. (1992), "The Cornell/Xerox/Commission on Preservation and Access Joint Study in digital preservation", available at: http://palimpsest.stanford.edu/byauth/kenney/joint/
Lossau, N. (2000), "Göttingen Gutenberg Bible goes digital", D-LIB Magazine, Vol. 6 No. 6, June, available at: www.dlib.org/dlib/june00/06contents.html (accessed March 2004).
Reich, V. and Rosenthal, D. (2001), "LOCKSS: a permanent Web publishing and access system", D-LIB Magazine, Vol. 7 No. 6, June, available at: www.dlib.org/dlib/june01/reich/06reich.html (accessed March 2004).
Seadle, M. (2004), "Sound preservation: from analog to digital", in Lynden, F.C. (Ed.), Advances in Librarianship, Vol. 27, April, Academic Press, New York, NY.
Smith, A. (Ed.) (2000), Authenticity in a Digital Environment, Council on Library and Information Resources, Washington, DC, available at: www.clir.org/pubs/reports/pub92/contents.html (accessed March 2004).
University of Michigan Digital Library (n.d.), "Making of America IV: the American voice, 1850-1877", available at: www.umdl.umich.edu/moa4/overview.html (accessed March 2004).


Theme articles

Using XSLT to manipulate MARC metadata
Corey Keith

The author
Corey Keith is Digital Project Coordinator, Network Development and MARC Standards Office, Library of Congress, Washington, DC.

Keywords Online cataloguing, Extensible markup language, Services, Internet

Abstract
This paper describes the MARCXML architecture implemented at the Library of Congress. It gives an overview of the component pieces of the architecture, including the MARCXML schema and the MARCXML toolkit, while giving a brief tutorial on their use. Several different applications of the architecture and tools are discussed to illustrate the features of the toolkit developed thus far. Nearly any metadata format can take advantage of the features of the toolkit, and the process of enabling a new format in the toolkit is discussed. Finally, this paper intends to foster new ideas with regard to the transformation of descriptive metadata, especially using XML tools. In this paper the following conventions will be used: MARC21 will refer to MARC 21 records in the ISO 2709 record structure used today; MARCXML will refer to MARC 21 records in an XML structure.

Electronic access
The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister
The current issue and full text archive of this journal is available at www.emeraldinsight.com/0737-8831.htm

Library Hi Tech, Volume 22, Number 2, 2004, pp. 122-130. Emerald Group Publishing Limited. ISSN 0737-8831
Received September 2003; revised September 2003; accepted November 2003. © Corey Keith

The Library of Congress has developed the MARCXML schema and the MARCXML toolkit in order to standardize the exchange of MARC structured data in XML. Leveraging the MARCXML schema as an exchange standard enables the library community to create a broad base of reusable software tools and facilitates the flow of information regardless of format.

History

The Library of Congress has had a DTD for MARC since the mid 1990s. The first instance of the DTD allowed for the conversion of MARC data to SGML. As technology evolved, this SGML DTD was converted to an XML DTD. The difference between the SGML and XML DTDs was a syntactical variation on the same document structure. Today the library has created an XML schema (www.w3.org/XML/Schema) that incorporates lessons learned throughout the development of the DTDs, and takes advantage of current technology standards to provide an improved approach to MARC21 data in XML.

MARC XML DTD

An important feature of this DTD was that it could be used for MARC 21 validation. As a consequence of encoding all the validation rules in the schema, it became very large, so large that there are two DTDs: one roughly 500k in size, which validates bibliographic, holdings, and community information records, and the other approximately 240k in size, which validates authority and classification records. In order to validate the MARC21 structure, for every tag the valid subfield codes and indicator values were enumerated in the schemas, thus resulting in their large size. These large DTDs are a bit awkward to work with, especially for users of desktop-based XML applications. A sample record may look like the following:










[Sample record in the MARC XML DTD markup. The record describes: control number 92005291, cataloging agency DLC, latest transaction 19930521155141.9; Sandburg, Carl, 1878-1967. Arithmetic / Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand. San Diego : Harcourt Brace Jovanovich, c1993. 1 v. (unpaged) : ill. (some col.) ; 26 cm. Note: One Mylar sheet included in pocket. Subjects: Arithmetic -- Juvenile poetry; Children's poetry, American; Arithmetic -- Poetry; American poetry; Visual perception. Added entry: Rand, Ted, ill.]



MARC XML schema

The current MARC XML schema was developed as a lightweight alternative to the MARCXML DTD. This schema is composed of just six elements:
(1) collection;
(2) record;
(3) leader;
(4) controlfield;
(5) datafield; and
(6) subfield.

The schema provides for whitespace preservation and basic sanity validation of appropriate ISO2709 record elements. Due to the character-based nature of the MARC21 standard, especially in the controlfields and leader, whitespace needs to be preserved in order to maintain character offsets, thus ensuring proper interpretation of the data.

While the leader, controlfield, datafield, and subfield elements should be familiar to those in the library community, the use of the collection and record elements should be explained. The base case for use of the MARCXML schema is a single MARC21 record in a single XML document. In this case, the record element from the www.loc.gov/MARC21/slim namespace is the root element. A facility is needed for cases when multiple MARC21 records exist in the same file. This is especially helpful for the exchange of multiple records in one file, result sets that can return zero or many records, or transformations that operate on multiple records, such as a functional requirements for bibliographic records (FRBR) transformation. In these cases, the collection element from the same namespace should be the root element of the XML document, with multiple child record elements containing each record. As an aside, the next version of the MARCXML schema will define the leader, controlfield, datafield, and subfield elements as global elements so that other XML schemas can reference them. A sample record may look like the following:

<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>01142cam  2200301 a 4500</leader>
  <controlfield tag="001">92005291</controlfield>
  <controlfield tag="003">DLC</controlfield>
  <controlfield tag="005">19930521155141.9</controlfield>
  <controlfield tag="008">920219s1993    caua   j      000 0 eng  </controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Sandburg, Carl,</subfield>
    <subfield code="d">1878-1967.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Arithmetic /</subfield>
    <subfield code="c">Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">San Diego :</subfield>
    <subfield code="b">Harcourt Brace Jovanovich,</subfield>
    <subfield code="c">c1993.</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">1 v. (unpaged) :</subfield>
    <subfield code="b">ill. (some col.) ;</subfield>
    <subfield code="c">26 cm.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Arithmetic</subfield>
    <subfield code="x">Juvenile poetry.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Children's poetry, American.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Arithmetic</subfield>
    <subfield code="x">Poetry.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">American poetry.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Visual perception.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Rand, Ted,</subfield>
    <subfield code="e">ill.</subfield>
  </datafield>
</record>
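When several records travel in one file, the same markup is simply wrapped in a collection element; a minimal sketch (record contents abbreviated) looks like this:

<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>01142cam  2200301 a 4500</leader>
    <!-- controlfields and datafields as in the single-record example above -->
  </record>
  <record>
    <!-- a second record -->
  </record>
</collection>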

A motivating goal in developing the schema was to allow easy access to discrete pieces of data. With this schema it is very easy to access information at the subfield level with simple XPATH (www.w3.org/TR/xpath) expressions. This ease of access enables the creation of XML stylesheets (www.w3.org/TR/xslt) to manipulate and transform the data. Possible types of transformations range from a stylesheet used to create an HTML page out of MARC records for display to complex stylesheets for transformation to other metadata formats such as MODS. Although XSLT does not provide the full features and control of a programming language such as


Java, it is quite surprising what can be accomplished with XSLT. Maintaining transformations of XML data in stylesheets has many positive consequences. XSLT stylesheets are easily modifiable: there is no need to recompile or to have special software tools to change stylesheets. Library professionals who are not software developers can make changes to transformations with little assistance. XSLT documents are simple text files, just like XML documents, which are editable with word processors at the most basic level. That said, many software tools exist to make it easier to develop and maintain stylesheets.

The MARC21 file format is not an easy format for application developers to write software for. Deciphering the directory structure of ISO2709 and the issues with character conversion are software tasks with high learning curves that only need to be written once and reused thereafter. Converting the data into XML drastically eases access for developers today. A software developer who speaks MARC is relatively rare, as opposed to one who speaks XML. With a well-written specification, an XML-skilled developer can write tools to manipulate MARC metadata. Due to the exponential growth of XML as a standard for the exchange of data, the availability of XML tools both for developers and end users is quite high. XML tools exist for reading, writing, searching, indexing, transforming, and so on. Many tools are open source and free, which allows for easy experimentation at a low cost, and many excellent commercial products are available as well.

Specifications or crosswalks to other metadata standards are much more easily implemented when the data is in XML. This is especially true when those other metadata schemes have an XML representation, such as MODS and Dublin Core, because the initial step of marshalling the information to XML is not needed. When implementing the MARC to MODS mapping (www.loc.gov/standards/mods/modsmapping.html) in an XSLT, each of the mapping rules was translated into a remarkably similar XSLT expression. The MARC to MODS XSLT is available at www.loc.gov/standards/marcxml/xslt/MARC21slim2MODS.xsl
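To make the subfield-level access mentioned above concrete, a sketch of the kind of XPath expression involved (assuming the prefix marc is bound to the www.loc.gov/MARC21/slim namespace) is:

marc:record/marc:datafield[@tag='245']/marc:subfield[@code='a']

which selects the title proper of a record; an xsl:value-of or xsl:for-each built around such an expression reaches the data directly, with no knowledge of the ISO 2709 directory required.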

MARCXML architecture

The MARCXML schema is the core piece of the broader MARCXML architecture. Due to the variety of tags in the MARC21 format, it serves as an ideal format which other metadata formats can map to and map from, e.g. we can create MODS

records from MARC21 metadata and we can also create MARC21 records from MODS metadata. The disparate nature of metadata element sets usually means the mapping to MARC21 and the mapping from MARC21 may not be exact mirrors of each other, thus requiring a specification and transformation for each direction.

The MARCXML representation of MARC21 records serves as a bus upon which information is carried along. Once a specific metadata format is mapped to MARCXML, ideally via XSLT, information is free to flow onto the bus, thus exposing the metadata to all of the features attached to the bus. By also mapping the opposite direction, from MARCXML back to a specific metadata format, information can flow off the bus, creating records in this specific format from other formats that are enabled on the MARCXML bus; e.g. Dublin Core records can be created from MODS records because mappings exist for MODS to MARCXML and MARCXML to Dublin Core.

For example, the Library of Congress has developed mappings to and from MARCXML for MODS (www.loc.gov/standards/mods/). The first benefit is that MARCXML records can be created from MODS records. Only one step away on the bus is the conversion to MARC21 records. As a result, MODS and MODS-aware tools could be used to create basic bibliographic records that could then be converted into MARC21 records and put into MARC-based library systems. But the bus is much wider and will keep growing with community involvement. The Library of Congress has already developed mappings to and from unqualified Dublin Core, so the records that were first created in MODS could be converted to Dublin Core via two XSLT transformations on and off the MARCXML bus. The inputs and outputs to the bus are not restricted to just other metadata formats, but the bus also connects metadata to tools for transformation and analysis (see Figure 1).

Validation

As a consequence of making the MARCXML schema a "slimmed down" schema from the DTD, the ability to validate the MARC 21 structure of the record has been lost. Although the schema does not validate MARC21 structure, tools external to the schema are being developed to accomplish validation. The library has started to develop stylesheets that can be used to validate MARC records. By writing a stylesheet to perform validation, implementers are able to modify the validation rules for their own local practices.
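The sketch below shows the general shape such a validation stylesheet could take; the single rule it enforces (every record must carry a 245 $a) is a hypothetical local practice used for illustration, not one of the Library of Congress rules:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <xsl:output method="xml" indent="yes"/>
  <!-- emit one error element per rule violation -->
  <xsl:template match="/">
    <errors>
      <xsl:apply-templates select="//marc:record"/>
    </errors>
  </xsl:template>
  <xsl:template match="marc:record">
    <!-- hypothetical local rule: a record without a 245 $a is reported -->
    <xsl:if test="not(marc:datafield[@tag='245']/marc:subfield[@code='a'])">
      <error controlNumber="{marc:controlfield[@tag='001']}">missing 245 $a</error>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>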

125

Using XSLT to manipulate MARC metadata

Library Hi Tech Volume 22 . Number 2 . 2004 . 122-130

Corey Keith

Figure 1 MARCXML bus

The Library of Congress is taking a multilevel approach to validation. At the first level is a simple check if the XML is well-formed. The second level of validation is validating against the MARCXML schema. As stated earlier, this is basic ISO2709 validation of record structure. The third level of validation examines the structure of the MARC21 record itself, checking for valid tags, indicator values, and subfield codes against information extracted from the MARC documentation. A yet-to-be-developed fourth level of validation would test the actual content of the MARC records against cataloging rules such as AACR2.
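The first two levels require no library-specific software at all; for example, any schema-aware XML validator can perform the second level, as in this command using the open-source xmllint tool (the file names are illustrative and assume a local copy of the MARCXML schema):

xmllint --noout --schema MARC21slim.xsd records.xml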

Transformation

The library has created some utility XML stylesheet templates that can be reused in other stylesheets. The most heavily-used utility template is subfieldSelect:

<xsl:template name="subfieldSelect">
  <xsl:param name="codes"/>
  <xsl:param name="delimiter"><xsl:text> </xsl:text></xsl:param>
  <xsl:variable name="str">
    <xsl:for-each select="marc:subfield">
      <xsl:if test="contains($codes, @code)">
        <xsl:value-of select="text()"/>
        <xsl:value-of select="$delimiter"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>
  <xsl:value-of select="substring($str, 1, string-length($str) - string-length($delimiter))"/>
</xsl:template>
The template subfieldSelect takes a parameter named "codes", which is a string containing the subfield codes that the user would like to extract. subfieldSelect also takes an optional parameter, the delimiter to insert between the subfields; the default is a space. subfieldSelect should be called when the context node is at a datafield. An important feature of subfieldSelect is that it does not alter the order of the subfields when selecting. subfieldSelect iterates through each subfield and checks the code against the list of desired codes. If the code is in the list, the content of the subfield is outputted and appended with a delimiter. In the example below, for each 650 datafield in the document, we will select the context node using the for-each construct. Once the context is set we will create a topicalSubject element. The content of the topicalSubject element will be created with the call to subfieldSelect, which will select all of the a, b, c, and d subfields in the order they appear in the record:

<xsl:for-each select="marc:datafield[@tag='650']">
  <topicalSubject>
    <xsl:call-template name="subfieldSelect">
      <xsl:with-param name="codes">abcd</xsl:with-param>
    </xsl:call-template>
  </topicalSubject>
</xsl:for-each>
Transformation for display

By using the subfieldSelect template or some variation thereof, it is quite simple to take MARCXML-encoded records and produce an HTML page for display. Information can be combined from a variety of sources to produce views of MARC21 metadata customized for each user experience. For example, at the Library of Congress we have extracted English descriptions of each tag and subfield from the MARC


documentation to create a stylesheet to display the content of a MARC21 record to someone who has little familiarity with the MARC21 format. This stylesheet is available at www.loc.gov/standards/marcxml/xslt/MARC21slim2English.xsl

As integrated library system (ILS) software becomes more extensible, it could be possible that XSLT would serve as the mechanism for customization of Web-based displays of metadata from online public access catalogs (OPAC). Imagine if an XSLT was the basis for the display of search results from a catalog. Changing the look or feel would be easily accomplished and would not require custom software development by the ILS vendor, and thus Web pages for searching and retrieving records from the catalog could be tailored to different libraries and different user requirements.
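A fragment in the spirit of that stylesheet, shown here only as a sketch rather than the Library of Congress code, might render the title statement of a record as an HTML heading by reusing the subfieldSelect template:

<xsl:template match="marc:datafield[@tag='245']">
  <h2>
    <!-- "Title Statement" is the English label for tag 245 in the MARC documentation -->
    <xsl:text>Title Statement: </xsl:text>
    <xsl:call-template name="subfieldSelect">
      <xsl:with-param name="codes">abc</xsl:with-param>
    </xsl:call-template>
  </h2>
</xsl:template>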

Transformation to other XML formats (MODS)

Again using the subfieldSelect template and a specification for the conversion of metadata from one format to another, the creation of a stylesheet to convert MARCXML into another metadata format is straightforward. The difficulty is in the intellectual task of creating the conversion specification.
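For instance, the mapping rule that sends the MARC 245 field to a MODS titleInfo element translates into a template along these lines; this is a simplified sketch (the actual MARC21slim2MODS.xsl handles many more subfields and conditions, and the MODS version 3 namespace is assumed here):

<xsl:template match="marc:datafield[@tag='245']">
  <titleInfo xmlns="http://www.loc.gov/mods/v3">
    <title>
      <!-- gather the title proper and remainder of title -->
      <xsl:call-template name="subfieldSelect">
        <xsl:with-param name="codes">ab</xsl:with-param>
      </xsl:call-template>
    </title>
  </titleInfo>
</xsl:template>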

Transformation as a filter for mass edits

By using what is commonly known as the identity transformation in XSLT, record processing tasks can be written using a stylesheet:

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>
The identity transformation copies each element from the source document to the output document. To perform any processing, this identity rule needs to be overridden for the specific element that is to be handled. For example, assuming all 9XX tags were reserved for local information and the user wanted to remove them from records which are distributed, the following template would be used with the identity template from above:

<xsl:template match="marc:datafield[@tag >= '900']"/>
When any XML element from the source record is encountered which is not a datafield with a tag attribute greater than or equal to 900, it is copied to the output document by the identity template. When local datafields by this definition are encountered, the specific template is invoked. Since the template does nothing, the datafield element is not copied to the output and thus is "deleted". Another example would be the addition of a note field (500) to a set of records. This can be accomplished by overriding the default identity rule for the record element:



<xsl:template match="marc:record">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
    <marc:datafield tag="500" ind1=" " ind2=" ">
      <marc:subfield code="a">Note to be added</marc:subfield>
    </marc:datafield>
  </xsl:copy>
</xsl:template>

In overriding the default identity rule, the record element must first be copied to the output document. In order to handle any of the attributes of the record element and to process all of the other elements in the source document, the apply-templates element is used. Once all of the elements from the source document are processed, and before the record element is closed, the 500 tag is inserted into the output document. A final example of a record-processing type of template would be the modification of the content in a specific subfield, such as when excess whitespace is removed from a numeric subfield:

<xsl:template match="marc:datafield[@tag='020']/marc:subfield[@code='a']/text()">
  <xsl:value-of select="normalize-space(.)"/>
</xsl:template>
By using an XPATH expression which selects the text of 020 $a, the context node of the XSLT processing is the actual text. Removing the whitespace is as simple as applying the normalize-space function to the current node. These are just a few examples of the use of XSLT for record processing. The identity transformation in combination with the overriding templates is a powerful concept in XSLT. Even more complex record processing can be accomplished by chaining multiple transformations together.
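Chaining can be as simple as feeding the output of one transformation into the next with a generic command-line processor; for example, with the open-source xsltproc tool (the stylesheet names here are hypothetical local files, one removing 9XX fields and one adding a 500 note):

xsltproc remove-local-fields.xsl records.xml | xsltproc add-note.xsl -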


Analysis transformation (FRBR)

Aside from transforming metadata from one element set to another, MARCXML data on the bus can be transformed in higher level ways using XSLT or more sophisticated tools. Any tool that understands MARCXML can be supplied data from the MARCXML bus. Taking advantage of features of the evolving XSLT 2.0 standard (www.w3.org/TR/xslt20/), the library has begun work on a stylesheet to transform a set of MARC21 records into a FRBR-like structure. XSLT 2.0 supports advanced grouping functionality that, while accomplishable in the current version of XSLT, is much more straightforward in the new version. The process of creating a FRBR structure involves grouping and sorting MARCXML records at each of the four levels of the FRBR model. The FRBRization stylesheet is available at www.loc.gov/standards/marcxml/frbr/FRBRize.xsl
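The heart of such a stylesheet is the XSLT 2.0 grouping construct. The fragment below is a much-simplified sketch, not the FRBRize.xsl stylesheet itself: it gathers records into work-level groups keyed only on main entry and title proper, ignoring the normalization and the lower FRBR levels that a real FRBRization must handle:

<xsl:for-each-group select="marc:collection/marc:record"
    group-by="concat(marc:datafield[@tag='100']/marc:subfield[@code='a'], '|',
                     marc:datafield[@tag='245']/marc:subfield[@code='a'])">
  <work key="{current-grouping-key()}">
    <xsl:for-each select="current-group()">
      <!-- each record in the group is listed under the work -->
      <manifestation controlNumber="{marc:controlfield[@tag='001']}"/>
    </xsl:for-each>
  </work>
</xsl:for-each-group>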

To text (SQL scripts)

Although the examples thus far have transformed MARCXML into other mark-up languages, XSLT also allows us to transform XML into a variety of formats, including normal text. Using this concept, we can create scripts to load MARC data into a database using SQL. In the example below:



<xsl:template match="marc:record">
  <xsl:text>INSERT INTO TitleStatement VALUES (</xsl:text>
  <xsl:value-of select="marc:controlfield[@tag='001']"/>
  <xsl:text>,'</xsl:text>
  <xsl:value-of select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
  <xsl:text>');</xsl:text>
</xsl:template>

would result in:

INSERT INTO TitleStatement VALUES (92005291,'Arithmetic /');

Using the MARCXML toolkit

The MARCXML Toolkit is a set of utilities written in Java to facilitate the conversion of MARC21 records to and from XML. The toolkit also allows the user to apply their own XSLT to transform MARC21 records. The MARCXML toolkit uses the MARC4J (http://marc4j.tigris.org/) open-source library for manipulating MARC records. MARC4J is a feature-filled Application Programming Interface (API) which takes an XML-like approach to dealing with MARC-encoded information; it is an excellent tool to marshal data between MARC21 and XML. The Library of Congress-developed MARCXML toolkit provides a simplified command-line based interface to MARC4J, appropriate for use by technology-savvy library professionals. Much effort has been placed in the conversion of characters from the MARC8 repertoire to Unicode. When converting MARC21 records to XML, the MARC8 characters are converted to non-precomposed Unicode characters.

In order to run the MARCXML Toolkit, a Java Runtime Environment (JRE) needs to be installed on the user's system and the Java command needs to be in their path. It is preferred to have Java of at least version 1.4. The toolkit will run with earlier versions of Java, but modifications to the batch file are needed to include an XML parser in the classpath; Java versions 1.4 and greater have an XML parser included. A JRE can be downloaded from www.java.com if the user does not already have one. To see if Java is in the path and to test the version, go to the command prompt and type java -version; the following should be output:

C:>java -version
java version "1.4.1_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_02-b06)
Java HotSpot(TM) Client VM (build 1.4.1_02-b06, mixed mode)

Once Java is verified to be installed, the MARCXML Toolkit needs to be downloaded. The toolkit is packaged as a zip file and is available at www.loc.gov/marcxml. Download the file and extract it to a folder on the system. Go to the command prompt and change to the directory to which the user extracted the toolkit. Four files compose the toolkit: (1) marc4j.jar is the MARC4J library; (2) marc4j-b7.zip is the source code for the MARC4J library; (3) marcxml.jar is the actual MARCXML Toolkit code; and (4) marcxml.bat is a driver batch file for the toolkit.


If the user is running on a platform other than Microsoft Windows, careful examination of the batch file will illustrate how to run the toolkit directly. The batch file provides more documentation than functionality. In order to experiment with the toolkit, a user needs a file of MARC21 records. When marcxml.bat is run, the user should get an overview of what the toolkit can do:

C:>marcxml
Usage marcxml
usage: marcxml classname [parameters]

Where classname can be one of the following:
To convert MARC to MARC21slim xml: gov.loc.marcxml.MARC2MARC21slim
To convert MARC21slim xml to MARC: gov.loc.marcxml.MARC21slim2MARC
To convert MARC to MARC21BIG xml: gov.loc.marcxml.MARC2MARCBIG
To convert MARC to MODS xml: gov.loc.marcxml.MARC2MODS
To use any stylesheet when converting from MARC records: gov.loc.marcxml.GenericMarcXmlWriter
To convert MARC records to UNICODE: gov.loc.marc.UnicodeConverter
To validate MARC records: gov.loc.marc.Validator

In order to convert the MARC file to MARCXML, run:

C:>marcxml gov.loc.marcxml.MARC2MARC21slim

When this command is run without any parameters, basic instructions on usage come up:

C:>marcxml gov.loc.marcxml.MARC2MARC21slim
MARC2MARC21slim Usage: gov.loc.marcxml.MARC2MARC21slim

See www.loc.gov/standards/marcxml for more information. To actually convert the records to MARCXML, run:

C:>marcxml gov.loc.marcxml.MARC2MARC21slim file.mrc file.xml

Similarly, to convert MARC data into MODS, run:

C:>marcxml gov.loc.marcxml.MARC2MODS file.mrc mods.xml

To test the current validation application, run the following command. It will produce an XML document containing the errors in the MARC file:

C:>marcxml gov.loc.marc.Validator file.mrc report.xml

To experiment with applying local stylesheets to MARC data, use the following command. The stylesheet can be a path to a file on the user's system or a URL pointing to a stylesheet located elsewhere. The MARCXML Web site provides numerous examples of stylesheets to start from:

C:>marcxml gov.loc.marcxml.GenericMarcXmlWriter file.mrc file.xml stylesheet.xsl

Finally, to convert MARCXML back into MARC21 records, run:

C:>marcxml gov.loc.marcxml.MARC21slim2MARC file.xml file.mrc

The future of MARCXML

The evolution of the MARC DTD to an XML Schema is just one example of adapting to new technology in the history of MARC. The challenge for the metadata community is to take the semantic value of existing standards such as MARC and implement them with new technologies and tools to provide this continual evolution.

Both the MARCXML toolkit and the MARC4J library are considered beta-quality software. Before the MARCXML toolkit is used in a production environment, some outstanding issues need to be resolved. The MARCXML toolkit expects cleanly-encoded MARC records, especially related to character sets. When encountering an error in a record, the error output is not very user friendly. MARC4J is an open source project, so feel free to help by downloading the source code and tackling some of these open issues.

To enable greater experimentation with lower barriers to entry, the Library of Congress would like to expose the functionality of the MARCXML toolkit as a Web application. Users could visit a Web page to convert MARC records to and from a variety of XML formats including MARCXML, MODS, and Dublin Core. The Web application would also allow users to experiment with their own XSLT conversions. Being a Web-based application reduces the burden of downloading the software and having the requisite Java runtime environment installed. The Library of Congress will also provide the source code and libraries of these Web-based tools for those who would like to install them in their local systems.

Software is in general moving toward a distributed service-oriented architecture. For a particular need, a developer can stitch

129

Using XSLT to manipulate MARC metadata

Library Hi Tech Volume 22 . Number 2 . 2004 . 122-130

Corey Keith

together Web services with business logic to produce an application without writing new code each time. Some of the prominent Web services currently available include mapping and routing, and image manipulation. Web services for the transformation of MARC21 metadata are in the near future. By exposing the MARCXML toolkit functionality as a set of Web services, the services can be consumed easily by a variety of applications. Centralizing the service offering also provides a standardization of the toolkit functionality. All users who invoke the Web service to convert a MARC21 file to MARCXML will get the same result, whereas someone who uses a Java library as opposed to a Perl library may not have the same result. Web services also allow for platform independence. A PHP application may use a Z39.50 library such as PHP/YAZ to retrieve MARC records from a catalog, which could then be submitted to the MARCXML Web service for conversion to XML.
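To make the stylesheet-based conversion concrete, the following is a minimal sketch of the kind of local XSLT 1.0 stylesheet that could be applied to MARCXML data, for example via the GenericMarcXmlWriter class mentioned above. The MARC21/slim namespace is the published one; the output elements (titles, title) and the exact way the toolkit invokes such a stylesheet are illustrative assumptions rather than toolkit documentation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <xsl:output method="xml" indent="yes"/>
  <!-- Reduce a file of MARCXML records to a simple list of main titles -->
  <xsl:template match="/">
    <titles>
      <xsl:for-each select="//marc:record">
        <title>
          <xsl:value-of select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
        </title>
      </xsl:for-each>
    </titles>
  </xsl:template>
</xsl:stylesheet>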

Conclusion

Placing MARC metadata in XML exposes MARC21 information to a whole new world of resources, from software to people. The MARCXML architecture and its component pieces, including the MARCXML schema and toolkit, lay the groundwork and standardize an approach to MARC metadata in XML. The applications presented in this article are intended to demonstrate a few of the many possible ways to manipulate metadata in XML form and to provoke thought on additional uses of the architecture. All of the resources mentioned in this paper, including the MARCXML Toolkit, are available at the Library of Congress’s Web site at www.loc.gov/marcxml.


Meta-information about MARC: an XML framework for validation, explanation and help systems

Joaquim Ramos de Carvalho, Maria Inês Cordeiro, António Lopes and Miguel Vieira

The authors
Joaquim Ramos de Carvalho is Professor, University of Coimbra, Quinta da Nara, Portugal, and a founder of BookMARC. Maria Inês Cordeiro is Library Information Systems Manager, Art Library, Calouste Gulbenkian Foundation, Lisbon, Portugal. António Lopes is Lead Developer and Miguel Vieira is Developer, both of BookMARC, Instituto Pedro Nunes, Quinta da Nora, Portugal.

Keywords Online cataloguing, Extensible markup language

Abstract This article proposes a schema for meta-information about MARC that can express at a fairly comprehensive level the syntactic and semantic aspects of MARC formats in XML, including not only rules but also all texts and examples that are conveyed by MARC documentation. It can be thought of as an XML version of the MARC or UNIMARC manuals, for both machine and human usage. The article explains how such a schema can be the central piece of a more complete framework, to be used in conjunction with ‘‘slim’’ record formats, providing a rich environment for the automated processing of bibliographic data.

Electronic access The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister The current issue and full text archive of this journal is available at www.emeraldinsight.com/0737-8831.htm

Library Hi Tech Volume 22 . Number 2 . 2004 . pp. 131-137 # Emerald Group Publishing Limited . ISSN 0737-8831 DOI 10.1108/07378830410524558

Received 1 September 2003; revised 2 October 2003; accepted 7 November 2003

Since 1998 there has been widespread interest in the potential of XML on the part of the library community, raising awareness, expectations and experiences for library services in various respects (Hjørgensen, 2001a, b; Tennant, 2002; Cordeiro and Carvalho, 2002; Carvalho and Cordeiro, 2003a, b) more than clear directions regarding the implications for bibliographic metadata standards (Gardner, n.d.; Exner and Turner, 1998; McCallum, 2000; Kim and Choi, 2000; Miller, 2000, 2002; Qin, 2000; Van Herwijnen, 2000). The relationship between MARC and XML has already been addressed in the literature (Lam, 1998; Medeiros, 1999; Granata, 2000; Johnson, 2001; Miller, 2000, 2002; Carvalho and Cordeiro, 2002). (Although MARC generally refers to library data formats, which can include different subsets for bibliographic, authority, holdings, classification and community information, in this paper we refer to MARC meaning mainly bibliographic data formats.)

The interest in XML has been fueled by a variety of needs that exist in the bibliographic data processing community: wider circulation and usage of bibliographic records requiring a mainstream data format, the demand for flexible representations of bibliographic data in a continuously growing range of display formats (html, pdf, text), and awareness about a certain archaism of ISO 2709 as a transport format, Information and Documentation – Format for Information Exchange (ISO, 1996). ISO 2709 is the common exchange format underlying all MARC formats; it consists of a record label, a directory and data fields, with standard characters for separators. The corresponding ANSI standard is the ANSI/NISO (1994) Information Interchange Format. The hope that XML would help to remove, or overcome, unnecessary complexity in the creation and processing of bibliographic records is cited in the literature and in the messages exchanged on the XML4Lib mailing list[1].

In this article, the XML representation of MARC meta-information is the focus, instead of the more common topic of the representation of individual records. By meta-information is meant the information about the standard itself, such as the actual content of MARC documentation, most of it included in MARC21 or UNIMARC manuals (available at: http://lcweb.loc.gov/marc/ and www.ifla.org/VI/3/p1996-1/sec-uni.htm). Beyond the record structure and morphological level of content designation, an important part of such information describes semantic characteristics of valid MARC records: which
fields are mandatory, which are repeatable, what are the legal values for indicators or certain subfields, etc. These aspects are essential for the validation of bibliographic records.

The issue of validation is central in discussing the relationship between XML and MARC. This is due to the fact that validation is an aspect that was built into the design of XML. XML documents can be validated against a special document, called a schema[2], containing a description of the structures and data that are ‘‘legal’’ for a certain type of information. Widely available tools can use schema documents to check the validity of XML documents[3]. Since validation is a very common data processing activity, one would expect that bibliographic records expressed in XML could be validated using the built-in mechanisms derived from schemata. It turns out that this goal cannot be achieved in a simple way, because the range of rules that may define what a valid MARC record is cannot easily be mapped to a schema. This is especially true if one wishes to maintain a simple XML representation of MARC, in order to allow for easy transportation of records between systems. As a matter of fact, in a first stage of exploring the use of XML to express MARC records, most of the approaches assumed self-validation or self-explanation of MARC records as an essential feature, resulting in very long and complex DTDs (see for example Hough et al., 2000) and confounding different goals and functions for how and why to use XML versions of MARC records. Examples of this approach are the LoC MARC DTDs, BiblioML and the Medlane XMLMARC experiment[4]. This situation soon revealed a tension between the simplicity required for the transportation and dissemination of records, and the complexity that is implied in expressing the full set of rules that make a record valid. The solution is to design an XML transport format that is simple and efficient for that purpose (hence ‘‘slim’’), and to produce an alternative mechanism for validation. This was the approach followed by the Library of Congress (LoC) with MARCXML (available at: www.loc.gov/standards/marcxml/), and it was also proposed by the authors in Carvalho and Cordeiro (2002). Three areas were identified for the use of XML with bibliographic records: (1) record exchange (data transportation level); (2) record validation (data conformity level); and (3) sharing of services (application services level). Separating record exchange from validation was the first step emphasized (Carvalho and Cordeiro, 2002). The potential of XML applied to sharing services between systems, notably using Web services, was later explored in Cordeiro and Carvalho (2003b) and some practical exemplifications were provided in Carvalho and Cordeiro (2003a). Altogether these considerations form what the authors called the TVS – Transport, Validation and Services – Model[5].

In this article, this route is taken a step further by proposing an XML representation of the information needed for the validation of MARC records. The novelty of this approach is that such representation assumes the form of an XML Schema targeted not at bibliographic records as such, but rather at the set of information that describes what a valid MARC record is. This proposal can be thought of as a sort of XML transcription of the MARC manual, aimed at both machine and human usages. Before introducing the schema itself, it is worth elaborating on what is meant by ‘‘validation’’ of records. Validation is not a straightforward and uniquely-defined concept. It can be considered at different levels, expressed by different concepts, each having different implications from the point of view of the type of information needed to determine if a MARC record is valid or not.

‘‘Readable,’’ ‘‘correct’’ and ‘‘adequate’’ records MARC records are formalized representations of bibliographic information. ‘‘Formal’’ means that they conform to a set of conventions, principles and rules about how a bibliographic item should be described and how this description should be coded into machine-readable form. Like all formalisms, MARC represents reality by reducing (i.e. simplifying) the complexity and variations of real bibliographic entities in a way that is considered adequate for a set of purposes. MARC records abstract and simplify bibliographic reality by means of a set of conventions where some are generally present in all MARC formulations while others can be particular to specific materials, or derive from a given policy of application. That is to say, the level of rules and criteria that can be considered for MARC ‘‘validation’’ can vary according to the requirements of a given case and of a given purpose (i.e. like the map of a city that, beyond general conventions of a city map, conveys special features suited to a given purpose, e.g. that of providing guidance about public transport mobility). There are several levels of rules and conventions in MARC that converge in the formal representation of bibliographic information. Some are very ‘‘low level’’ and have to do with the structure of the record, its atomic parts and how they are presented in an electronic file. Others are of a ‘‘higher level’’ and define what type of information should be present, if and when certain


data elements should or should not occur, or what elements of bibliographic information should be transcribed using pre-defined codes or vocabularies. Not all of the information included in MARC documentation is absolutely prescriptive and exclusive of alternatives; in the same way, not all the prescriptions relating to the production and management of MARC records are included in official MARC documentation, because some derive from policies adopted by institutions or communities of systems. In this context, the concept of a ‘‘valid’’ record can have different meanings and scope. It is therefore useful to categorize different levels of understanding ‘‘validity’’ before we proceed with the discussion of how different aspects of MARC validation can be handled. We suggest decomposing the concept of ‘‘a valid record’’ into three different levels: ‘‘readable’’, ‘‘correct’’ and ‘‘adequate’’. Like all terminologies, these levels are somewhat arbitrary and other terms could be used. The important thing is to distinguish clearly the meaning of each level: (1) A ‘‘readable’’ record is composed of a leader, a set of control fields and a set of data fields with the additional characteristics defined by the ISO 2709 standard. From a functional point of view, this is a record that can be read by a machine and further processed. Alternative designations of a ‘‘readable’’ record could be ‘‘structurally valid’’ or ‘‘syntactically valid’’. Limitations of this level of analysis include the fact that a record may be ‘‘readable’’ but it may consist of a set of fields and content which make little sense as a MARC record. (2) A ‘‘correct’’ record is a ‘‘readable record’’ that contains the required set of fields prescribed by the MARC standard to model a given type of bibliographic item and the content of which follows the relevant coding rules and vocabulary types, wherever applicable. Functionally, this is a record that a MARC-aware automated system can extract information from, producing, for instance, indexes or ISBD displays of the contained information. An alternative designation of a ‘‘correct’’ record could be ‘‘semantically valid’’. Limitations of this level of analysis include the possibility that a record can be ‘‘correct’’ in this sense but provide inappropriate description of the bibliographic item, either because it contains semantic errors (misreadings, incorrect recording of dates or names, etc.) or because it lacks information that is not mandatory by the standard but is relevant and needed for the case in question. (3) An ‘‘adequate’’ record is a ‘‘readable’’ and ‘‘correct’’ record that is fit for the purpose defined for that record in its context.

Functionally, this is a record which applies MARC following the relevant guidelines, good practice, consistency rules and policies of a given librarianship community. It is a record that correctly surrogates the original item and fulfills the role and features of the information system(s) where it was generated or in which it will ‘‘live’’. An alternative designation for an ‘‘adequate’’ record could be ‘‘fit for purpose’’. Limitations of this level of analysis derive from the issue that the ‘‘purpose’’ of records, in the sense explained above, is seldom clearly defined and even when such definitions exist they can change over time and circumstances. From this categorization it is clear that when most people talk about ‘‘valid’’ records they usually mean ‘‘readable and correct’’ records. It is fairly obvious that labeling a record as ‘‘adequate’’ is not a task that can be fully automated, mainly because it relies on ‘‘real world’’ information that is not available as such to machines. It is also clear that the main source of information for labeling records according to this typology is MARC documentation, such as the MARC21 and UNIMARC manuals, just to mention the two major MARC formats. MARC manuals define the rules for ‘‘readability’’, ‘‘correctness’’ and, although not covering all possible aspects of ‘‘adequacy’’, they provide examples and directions related to generally accepted good practices and desirable outcomes in terms of modeling bibliographic surrogates.

The role of XML: from ‘‘readability’’ to ‘‘correctness’’

Where does XML fit into this ‘‘validation’’ landscape? Current MARC XML mappings are aimed at ‘‘readable’’ records. The LoC MARCXML Schema, having data transport as the main goal, defines exactly what a ‘‘readable’’ record is, not whether it is correct. That task is currently relegated to specialized software that in turn relies on stylesheets. Extensible Stylesheet Language Transformation (XSLT) is an XML specification, with the status of a W3C Recommendation, for transforming XML documents into other XML documents using stylesheets[6].

At this point, it is worth pausing a moment to reflect on how the limited role of XML has been over-hyped. The introduction of an XML ‘‘slim’’ format is an example of how MARC records are ‘‘read’’ into machines. It provides no conceptual revolution and no change in the intellectual aspects of producing bibliographic records according to the standard. It looks like a simplification because XML records are more easily read by humans than the original ISO 2709 records, but in fact nothing of what makes a MARC record what it is (a formalized mapping of a real world entity) is changed. What XML brings is a significantly greater simplification in the machine processing of records. This simplification can be further expanded if we extend XML usage from the ‘‘readability level’’ to the ‘‘correctness level’’, achieving a fully XML-based validation framework. To achieve such a framework, there is a need to express in a machine-readable format the knowledge behind the definition of a ‘‘correct record’’ and, beyond that, to provide as much as we can of ‘‘adequacy’’ information, mainly in the form of conditional rules, descriptive commentaries and examples. The functional horizon is to provide information that an ‘‘adequately informed’’ system can use in order to validate records and to provide enhanced assistance to human operators in producing ‘‘adequate’’ records. As already mentioned, this apparently elaborate goal comes down to a very pragmatic task: to express the full (explicit and implicit) content of MARC manuals (e.g. of MARC21 or UNIMARC) in XML.
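For reference, a ‘‘readable’’ record in the LoC MARCXML ‘‘slim’’ transport format discussed above has roughly the following shape. The element structure is taken from the published schema; the abbreviated field values are illustrative only:

<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>01142cam  2200301 a 4500</leader>
  <controlfield tag="001">92005291</controlfield>
  <controlfield tag="005">19940223151047.0</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Arithmetic /</subfield>
    <subfield code="c">Carl Sandburg.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">San Diego :</subfield>
    <subfield code="b">Harcourt Brace Jovanovich,</subfield>
    <subfield code="c">c1993.</subfield>
  </datafield>
</record>

The schema captures structure (leader, control fields, data fields, indicators, subfields) but says nothing about which fields are mandatory or what their content should be; that is precisely the ‘‘correctness’’ information the proposal below sets out to encode.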

Getting the MARC21 manual into XML: general concepts

Each field is described through a set of identifier attributes – a field has a tag and a name – and a set of occurrence attributes – a field may or may not be mandatory and repeatable. Description and examples elements hold information in (at least almost) human-readable form that can be used in help or documentation systems:

Sixteen characters that indicate the date and time of the latest record transaction and serve as a version identifier for the record. They are recorded according to Representation of Dates and Times (ISO 8601). The date requires 8 numeric characters in the pattern yyyymmdd. The time requires 8 numeric characters in the pattern hhmmss.f, expressed in terms of the 24-hour (00-23) clock.

005 19940223151047.0 [February 23, 1994, 3:10:47 P.M.]
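The XML markup of this example (field 005, Date and Time of Latest Transaction) appears to have been lost in the text extraction; only the description and example content survives above. A hedged reconstruction of what such an entry might look like in the proposed schema follows. The element and attribute names (field, tag, name, mandatory, repeatable, description, example) are assumptions based on the prose, not a verbatim quotation of the authors' schema:

<field tag="005" name="Date and Time of Latest Transaction" mandatory="false" repeatable="false">
  <description>Sixteen characters that indicate the date and time of the latest record transaction and serve as a version identifier for the record. They are recorded according to Representation of Dates and Times (ISO 8601).</description>
  <example>005 19940223151047.0 [February 23, 1994, 3:10:47 P.M.]</example>
</field>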

Data fields also contain indicators and subfields elements:

General information for which a specialized 5XX note field has not been defined.





The legal values of indicators can also be described as a set of options. Each member of this universe of values can be expressed as a singular value or as a range and can have an associated descriptive text:



A value that specifies the number of character positions associated with a definite or indefinite article (e.g., Le, An) at the beginning of a title that are disregarded in sorting and filing processes.

Subfields are described by the same set of identifier and occurrence attributes used in field descriptions:
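As with the example above, the markup of the data field, indicator and subfield examples was stripped in extraction, leaving only their descriptive text (the 500 General Note description and the nonfiling-characters indicator text above). A hedged reconstruction, with the same caveat that element and attribute names are inferred from the prose rather than quoted from the authors' schema, might look like this:

<field tag="500" name="General Note" mandatory="false" repeatable="true">
  <description>General information for which a specialized 5XX note field has not been defined.</description>
  <indicators>
    <indicator position="1" name="Undefined">
      <value code=" ">Undefined</value>
    </indicator>
    <indicator position="2" name="Undefined">
      <value code=" ">Undefined</value>
    </indicator>
  </indicators>
  <subfields>
    <subfield code="a" name="General note" mandatory="true" repeatable="false"/>
  </subfields>
</field>

An indicator whose legal values form a range, such as the nonfiling characters indicator of field 245, could be expressed with a range rather than an enumeration:

<indicator position="2" name="Nonfiling characters">
  <values>
    <range start="0" end="9">A value that specifies the number of character positions associated with a definite or indefinite article (e.g. Le, An) at the beginning of a title that are disregarded in sorting and filing processes.</range>
  </values>
</indicator>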



Fixed-length data elements contained in subfields (or fields) are represented through an appropriate element type, psubfield, that adds information about the start and end of the element. When these elements store only a discrete and finite set of values, an appropriate vocabulary of possible items is composed (provision is made to allow for external vocabularies located by means of a URL):


A one-character code that indicates the technique used in creating motion in the motion picture or videorecording.







Item is not a motion picture or a videorecording.
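Again, only the descriptive text of this example survives above (a positional element of field 008 for visual materials, ‘‘Technique’’, and one of its vocabulary items). A hedged reconstruction of a psubfield entry with an embedded vocabulary, using inferred element names, might be:

<psubfield start="34" end="34" name="Technique" mandatory="false" repeatable="false">
  <description>A one-character code that indicates the technique used in creating motion in the motion picture or videorecording.</description>
  <vocabulary>
    <item code="a">Animation</item>
    <item code="l">Live action</item>
    <item code="n">Not applicable (item is not a motion picture or a videorecording)</item>
    <item code="u">Unknown</item>
  </vocabulary>
</psubfield>

An external vocabulary could instead be referenced by URL, for example <vocabulary href="http://www.example.org/vocabularies/technique.xml"/> (the attribute name is, again, illustrative).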







The structure of some data elements can be affected by the content of others. For instance, the type of material as set by the value of the first position of field 006 affects the composition of the rest of the field. This kind of adjustable structure is formalized by the applyif element, which describes how the contained subfields and positional fixed-length data elements are interpreted in view of a particular condition. The condition expression must form a syntactically correct XSLT Boolean expression:

...
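The original applyif example is reduced to an ellipsis above. A hedged sketch of the idea, with the condition written as an XPath/XSLT Boolean expression over the field content and all other names inferred from the prose, might be:

<field tag="006" name="Fixed-Length Data Elements - Additional Material Characteristics">
  <applyif condition="substring(., 1, 1) = 'a'">
    <!-- positions defined for books apply when the form of material is 'a' -->
    <psubfield start="1" end="4" name="Illustrations">...</psubfield>
  </applyif>
  <applyif condition="substring(., 1, 1) = 'm'">
    <!-- positions defined for computer files apply when the form of material is 'm' -->
    <psubfield start="5" end="5" name="Target audience">...</psubfield>
  </applyif>
</field>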

The framework and its deliverables

MARC and UNIMARC rules transcribed in this way can be processed in a number of useful ways.

The authors have produced a set of derived products from the XML version of a subset of the MARC21 manual, and have a more developed and extensive version of the UNIMARC manual. Here we present how the schema exemplified above can foster usage in the areas of validation, explanation of records and documentation of the standards. While developing examples of the usefulness of the schema, the authors relied intensively on XML transformation stylesheets (XSLT). XSLT is a standard for transforming an XML document into another type of document, including another XML document or another stylesheet. The authors used this technology to show how an XML version of the MARC manual can be used to validate records, to ‘‘translate’’ MARC records into readable English descriptions, and to produce a Web-ready version of the manual. Since XSLT technology is freely available, the cost of implementation of the described usages was very low. Figure 1 shows how the XML MARC manual (MARC21DOC.xml) can produce a set of related products through XSLT transformations. The authors have produced the following stylesheets:
. MARC21ValidationGenerator.xslt is a stylesheet that produces another stylesheet (Validator.xsl) that, when applied to a MARCXML bibliographic record, validates the record against the rules specified in MARC21DOC.xml. Note that the current LoC MARCXML toolkit includes a stylesheet similar to Validator.xsl. The new approach here consists in the automatic production of the validating stylesheet from the MARC manual. This ensures that changes in the XML representation of the MARC manual can be immediately applied to validation of records.
. EnglishFormatGenerator.xslt is a stylesheet that also produces another stylesheet (EnglishFormater.xsl) that, when applied to a MARCXML bibliographic record, displays the record in an easy-to-read format, with tags decoded, indicator semantics explained, and coded values translated into plain English. This mimics the behavior of a similar stylesheet included in the LoC kit, but it is generated from the MARC manual.
. MARC21DOCtoHTML.xsl is a more conventional stylesheet that produces an HTML version of the MARC manual.
Examples of the usage of these stylesheets, along with a more detailed description of the XML schema, a sample of the subset of the MARC21 manual, and a toolkit for testing purposes are available at: www.bookmarc.pt/tvs/marcdoc.html
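To illustrate the generator-of-a-generator idea behind MARC21ValidationGenerator.xslt, the following is a minimal sketch of an XSLT 1.0 stylesheet that reads a manual file and writes out a validating stylesheet. The structure of the manual document (field elements with tag and mandatory attributes) is an assumption carried over from the reconstructions above, and the generated rules are limited to presence checks; the authors' actual stylesheet is not reproduced here:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <!-- Elements in the axsl namespace are written out as xsl elements of the generated stylesheet -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Emit the skeleton of the validating stylesheet -->
  <xsl:template match="/">
    <axsl:stylesheet version="1.0">
      <axsl:template match="/">
        <errors>
          <xsl:apply-templates select="//field[@mandatory='true']"/>
        </errors>
      </axsl:template>
    </axsl:stylesheet>
  </xsl:template>

  <!-- For every field the manual marks as mandatory, generate a presence test
       against the MARCXML record being validated -->
  <xsl:template match="field">
    <axsl:if test="not(//marc:controlfield[@tag='{@tag}'] | //marc:datafield[@tag='{@tag}'])">
      <error>Mandatory field <xsl:value-of select="@tag"/> is missing</error>
    </axsl:if>
  </xsl:template>
</xsl:stylesheet>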


Figure 1 XMLMARC through XSLT transformations

Conclusions

The evolution of approaches to XML for MARC record representation has recently emphasized the separation of concerns between the transport and validation functions. In turn, relieving the XML transport format of the validation rules raises the need to reappreciate the validation function itself. General-purpose MARC validation programs have been around for many years, and most bibliographic applications include validation facilities at the level of ‘‘correctness’’ of records, e.g. controlling cardinality and repeatability of fields, offering drop-down menus of codes for editing coded data fields, etc. Many systems also include some level of help systems for human operators, in the form of texts transcribing explanation phrases for the content designators shown on MARC editor screens. In both aspects, these are constructs that usually make use of reduced information and limited functionality, very far from conveying all the knowledge that is included in MARC analogue documentation. Moreover, these validation and help functionalities are built from scratch for each system or application, with no automated relationship with the authoritative source of the underlying MARC standards. This paper has considered an XML framework for the validation of MARC records, having in mind a full-featured knowledge base to support both systems' and users' activities. The flexibility and neutrality of XML constructs allow for the combination of these traditionally separated, or not completely integrated, purposes. The authors have proposed a MetaMARC Schema that, together with standard stylesheet techniques, can produce data-driven validation, documentation, help systems, and human-friendly transformation of records. The authors believe that this approach will lower the cost of implementation and usage of MARC-based software; at the same time, it can facilitate the standard use of XML regarding bibliographic records. XML has introduced yet another level of crucially needed agreement in the library community, if it is to make the most of the standardization efforts previously achieved at all levels up to MARC. Finally, the proposed MetaMARC Schema can facilitate and extend the authoritative function and the reach of the documentation produced by the MARC standard agencies. These are goals that can contribute decisively to the technological realignment of bibliographic systems, bringing together existing systems, with their legacy features, and the newest technologies by taking full advantage of the major standards of both sides.

Notes


1 Available at: http://xmlmarc.stanford.edu/webliography.html and http://sunsite.berkeley.edu/XML4Lib/
2 Available at: www.w3.org/XML/Schema


3 Available at: www.w3.org/Style/XSL and http://dmoz.org/Computers/Data_Formats/Markup_Languages/XML/Style_Sheets/XSL/Implementations/
4 Available at: http://lcweb.loc.gov/marc/marcdtd/marcdtdback.html; www.culture.fr/BiblioML; and http://xmlmarc.stanford.edu
5 Available at: www.bookmarc.pt/tvs
6 Available at: www.w3.org/TR/xslt

References
ANSI/NISO (1994), Information Interchange Format, ANSI/NISO Z39.2-1994, Washington, DC.
Carvalho, J. and Cordeiro, M. (2002), ‘‘XML and bibliographic data: the TVS (transport, validation and services) model’’, paper presented to the 68th IFLA Council and General Conference, available at: www.ifla.org/IV/ifla68/papers/075-095e.pdf (accessed 31 August 2003).
Carvalho, J. and Cordeiro, M. (2003), ‘‘Web services: next generation interoperability for libraries’’, in Nixon, C. (Comp.), Internet Librarian International Conference, 25-25 March 2003, Birmingham, UK, Collected Presentations, Information Today, Medford, NJ, pp. 40-9.
Cordeiro, M. and Carvalho, J. (2003), ‘‘Web services: what they are and its importance for libraries’’, VINE, Vol. 32 No. 4 Issue 129, pp. 46-62.
Exner, N. and Turner, L. (1998), ‘‘Examining XML: new concepts and possibilities in Web authoring’’, Computers in Libraries, Vol. 18 No. 10, November/December, available at: www.infotoday.com/cilmag/nov98/story2.htm (accessed 31 August 2003).
Gardner, J. (n.d.), ‘‘Exploring what’s neXt: XML, information sciences and markup technology’’, available at: http://vedavid.org/xml/docs/eXploring_xmlandlibraries.html (accessed 31 August 2003).
Granata, G. (2000), ‘‘XML e formati bibliografici’’, Bollettino AIB, No. 2, pp. 181-91.
Hjørgensen, P.H. (2001a), ‘‘XML standards and library applications’’, paper presented at ELAG 2001, available at: www.stk.cz/elag2001/Papers/Poul_HenrikJoergensen/Show.html (accessed 31 August 2003).
Hjørgensen, P.H. (2001b), ‘‘VisualCat: cataloging with XML, RDF, FRBR and Z39.50’’, available at: www.bokis.is/iod2001/slides/Jorgensen_slides.ppt (accessed 31 August 2003).
Hough, J., Bull, R. and Young, B. (2000), ‘‘Using XSLT for XML MARC record conversion’’, discussion paper, v. 0.2, 16 June, available at: www.crxnet.com/one2/xslt_marc_report.pdf (accessed 31 August 2003).
International Standards Organization (ISO) (1996), Information and Documentation – Format for Information Exchange, ISO 2709, ISO, Geneva.

Johnson, B.C. (2001), ‘‘XML and MARC: which is right?’’, Cataloging and Classification Quarterly, Vol. 32 No. 1, pp. 81-90, available at: http://elane.stanford.edu/docs/johnson.pdf (accessed 31 August 2003).
Kim, H. and Choi, C. (2000), ‘‘XML: how it will be applied to digital library systems’’, The Electronic Library, Vol. 18 No. 3, pp. 183, 189.
Lam, K.T. (1998), ‘‘Moving from MARC to XML’’, available at: http://ihome.ust.hk/~lblkt/xml/marc2xml.html (accessed 31 August 2003).
McCallum, S. (2000), ‘‘Extending MARC for bibliographic control in the Web environment: challenges and alternatives’’, paper presented at the Library of Congress Bicentennial Conference on Bibliographic Control for the New Millennium, available at: http://lcweb.loc.gov/catdir/bibcontrol/mccallum_paper.html (accessed 31 August 2003).
Medeiros, N. (1999), ‘‘Making room for MARC in a Dublin Core World’’, Online, November, available at: www.onlinemag.net/OL1999/medeiros11.html (accessed 31 August 2003).
Miller, D. (2000), ‘‘XML and MARC: a choice or a replacement?’’, paper presented at ALA Annual Conference 2000, available at: http://elane.stanford.edu/laneauth/ALAChicago2000.html (accessed 31 August 2003).
Miller, D. (2002), ‘‘Adding luster to librarianship: XML as an enabling technology’’, available at: http://elane.stanford.edu/laneauth/Luster.html (accessed 31 August 2003).
Qin, J. (2000), ‘‘Representation and organization of information in the Web space: from MARC to XML’’, Informing Science, Vol. 3 No. 2, available at: http://inform.nu/Articles/Vol3/v3n2p83-88.pdf (accessed 31 August 2003).
Tennant, R. (Ed.) (2002), XML in Libraries, Neal Schuman Publishers, New York, NY, and London.
Van Herwijnen, E. (2000), ‘‘The impact of XML on library procedures and services’’, available at: http://lhcb.web.cern.ch/lhcb/~evh/xmlandlibrary.htm (accessed 31 August 2003).

Further reading
Gatenby, J. (2000), ‘‘Internet, interoperability and standards: filling the gaps’’, available at: www.niso.org/press/whitepapers/Gatenby.html (accessed 31 August 2003).
Hopkinson, A. (1998), ‘‘Traditional communication formats: MARC is far from dead’’, paper presented at the International Seminar ‘‘The Function of Bibliographic Control in the Global Information Infrastructure’’, available at: www.lnb.lt/events/ifla/hopkinson.html (accessed 31 August 2003).
Motta, S. and Ursino, G. (2000), ‘‘XML su tecnologia MOM: un nuovo approccio per i software delle biblioteche’’, Bollettino AIB, No. 2, pp. 195-203.


Creating metadata practices for MIT’s OpenCourseWare Project Rebecca L. Lubas Robert H.W. Wolfe and Maximilian Fleischman

The authors Rebecca L. Lubas is the Special Formats Cataloging Librarian, Robert H.W. Wolfe is Metadata Specialist and Head, Metadata Unit and Maximilian Fleischman is Metadata Production Assistant, all at Massachusetts Institute of Technology Libraries, Cambridge, Massachusetts, USA.

Keywords Online cataloguing, Libraries, USA

Abstract The MIT libraries were called upon to recommend a metadata scheme for the resources contained in MIT’s OpenCourseWare (OCW) project. The resources in OCW needed descriptive, structural, and technical metadata. The SCORM standard, which uses IEEE Learning Object Metadata for its descriptive standard, was selected for its focus on educational objects. However, it was clear that the Libraries would need to recommend how the standard would be applied and adapted to accommodate needs that were not addressed in the standard’s specifications. The newly formed MIT Libraries Metadata Unit adapted established practices from AACR2 and MARC traditions when facing situations in which there were no precedents to follow. Electronic access The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister The current issue and full text archive of this journal is available at www.emeraldinsight.com/0737-8831.htm

Library Hi Tech Volume 22 . Number 2 . 2004 . pp. 138-143 # Emerald Group Publishing Limited . ISSN 0737-8831 DOI 10.1108/07378830410524567

Received August 2003; revised September 2003; accepted November 2003

Until 2002, MIT Libraries’ Bibliographic Access Services only dabbled in non-MARC metadata. The reliable MARC format met most cataloging needs for decades. In the last few years, encounters with other metadata schemes began to occur with increasing regularity. Frequently, digital objects carried metadata that could be used in the online catalog if harvested and converted to MARC. The libraries converted a sampling of Dublin Core (DC) and Federal Geographic Data Committee (FGDC) metadata records into MARC to make them compatible with MIT’s local integrated library system. The focus of these experiments was to make the data as MARC-like as possible rather than to exploit the features of the alternative standards.

With the advent of DSpace, MIT’s digital repository, the libraries’ staff gradually realized that not all metadata would ultimately be converted to MARC and that in some cases MARC would not be the most desirable standard. DSpace contains items in a variety of formats. Some of the items are born digital and others are converted from print resources. Dublin Core was chosen as the DSpace metadata standard because it is well-developed and flexible enough to address the needs of a wide array of formats. DC’s similarity to MARC also made it a logical choice for MIT’s first major venture into non-MARC metadata. Its advantage over MARC is in being tailored specifically for describing and providing access to electronic resources. The DC records in DSpace function in the same way that the MARC records do in an ILS. They are primarily surrogates to aid discovery. The libraries’ approach to creating metadata practices for DSpace was heavily influenced by MARC traditions. These early metadata experiences demonstrated that much needed to be learned to appreciate and utilize the capabilities of other metadata standards. The libraries formed a Metadata Advisory Group in 2001 for the express purpose of creating in-house expertise about metadata beyond MARC.

In the spring of 2002, representatives from MIT’s OpenCourseWare Project (OCW)[1] approached the libraries. OCW plans to make much of the course materials from 2,000 of MIT’s course offerings available on the Web, free of charge, to any user anywhere in the world. They plan to do this by 2007, with 25 per cent of the courses available in Fall 2003. OCW’s planners recognized the need for metadata to make the courses with their associated learning objects


searchable, retrievable, and readily preserved. They also recognized that the MIT Libraries would be the place to begin looking for metadata experts.

The OCW metadata proposal The MIT Libraries were engaged to propose a metadata scheme in the summer of 2002 for OCW. OCW needed a metadata scheme to begin using immediately for courses going into production the following fall. The libraries agreed to recommend an accepted metadata standard and to suggest best practices for the standard’s application in OCW. The proposal also included staffing suggestions for the creation of a metadata unit to provide metadata application services to OCW. The unit would be organized under Bibliographic Access Services, the libraries’ monograph cataloging department. The OCW project presented another opportunity to take a step away from MARC in the application of metadata to electronic resources. The libraries recommended the SCORM/IEEE Learning Object Metadata (LOM) standard for the foundation of OCW metadata. Sharable Content Object Reference Model (SCORM) treats educational digital objects as active, dynamic items – an aspect which traditional library organization usually neglects. SCORM includes a more robust architecture for describing the digital objects’ structural and operative relationships. IEEE LOM has nine basic element areas: (1) General; (2) Lifecycle; (3) Meta-Metadata (information about who created the metadata and how it was created); (4) Technical; (5) Educational; (6) Rights; (7) Relationship; (8) Annotation; and (9) Classification. The General, Lifecycle, Meta-Metadata, Technical, Annotation and Classification areas provide information much like a familiar MARC record with local information fields added. The Rights area addresses the complex aspect of rights management in the digital world. OCW chose to handle rights information in a separately-staffed Intellectual Property division with its own metadata record. The educational area, which describes the use of the object in a learning context, is an aspect of SCORM/IEEE LOM that needed the most work in establishing best practices for the libraries staff to fully exploit the

standard’s capabilities. The Relationship area provides a way to describe how resources interact with one another. After examining the 20 pilot courses, libraries staff identified a need for three types of metadata records for each OCW course. The initial proposal labeled these record types: . Course Level; . Course Item; and . Content Item. The Course Level record was an overall record, describing the entire course site and its contents. The Course Item record would describe items created for the specific course, such as syllabi, lecture notes, and exams. The Content Item record would describe items that could stand by themselves out of the context of the course, such as journal articles. The Course Item records and the Content Item records would use the IEEE LOM Relationship area to link the object back to the course of origin. This aspect of the initial proposal came out of the common cataloging and library collection management practice of treating published items differently from more ephemeral material.

Project planning and implementation After the initial proposal was accepted, the libraries then participated in OCW’s workflow and production planning. During this phase, the original proposal was adjusted as more decisions were finalized about the actual course content, production timelines, and available labor. It became increasingly clear that the distinction between Content Item and Course Item was not as important as it had been with more traditional formats in a library collection. As planning for the metadata service developed, the practices for Course Items and Content Items appeared nearly identical. By the time full production began, the only discernable difference between a Course Item and a Content Item was that a Content Item might have additional contributors listed. OCW project personnel use the term ‘‘resources’’ to refer to all items, and the libraries staff ultimately adopted this term for simplicity. In practice, a fairly rigid content hierarchy of the metadata records developed. Each course was organized on three levels: (1) Course Level; (2) Section Level;and (3) Resource Level. The course in general is represented by a single HTML document, and the Course Level record became the most robust of the metadata records in


OCW’s content management system. It is only at this level that name authority work and subject analysis are provided. In the suggested practice of the original proposal, every resource within a course received detailed subject analysis and name authority work. This requirement was dropped in the libraries’ final agreement with OCW because of the labor-intensive nature of such work. With an average of 35 resources per course, there would have been 70,000 resources to analyze. OCW’s tight deadlines did not grant enough time to do subject analysis of every resource with the amount of staff hours that were funded. The next level in the hierarchy, the Section Level, developed into aggregations of resources around functional activities such as exercises, exams, lecture notes, syllabi, etc. These section pages receive the least amount of metadata. Principally they record the organization and operation of the complex digital objects. The Section Level records inherit metadata from the Course Level, and a Relationship field is generated that refers every section back to the course in which it resides. Resources, the last level of the hierarchy, are non-HTML single bitstreams. Adding to the complexity of the course is the fact that resources refer to, employ and relate to each other. Digital objects, and educational digital objects especially, present a challenge for MARC in their recombinable nature. LOM addresses this aspect in a more logical way than MARC does. Unlike a catalog record, the metadata records created for the OCW project are more than search and recovery surrogates. The content management system relies upon the metadata records in constructing the navigation environment of the course. Course and Section HTML pages are not the source of global and local navigation. Rather, the CMS includes a ‘‘frame’’ or ‘‘skin’’ which contains navigation menus for each course. These menus are built from the structural metadata in the records the Libraries help create.

The workflow for the OCW metadata is based on traditional library technical services models, with the added feature of working with author- and auto-generated metadata. The process requires the OCW faculty liaison, working with the course’s creator, to submit a preliminary set of metadata following guidelines developed by the libraries and OCW staff. Some of the metadata is supplied by the system, drawing on pieces of information from the course framework, such as the course’s number, title, and the semester in which the course was originally taught. Other metadata elements are filled in by the faculty liaison. The records then come to the libraries’

Metadata Unit[2], where a metadata production assistant and a metadata specialist revise them. The division of labor in the libraries’ unit mirrors cataloging departments. The metadata production assistant, the equivalent of a library technical cataloging assistant, revises the existing descriptive metadata much like a copy cataloger would revise a MARC record provided by another library. The metadata specialist, acting in the manner of a traditional professional cataloger, performs the specialty duties of classification, subject work, and name authority work. In practice, during the first few months of production, the metadata specialist did much resource metadata revision and creation in order to see a representative number of resources as well as meet OCW’s course publication deadlines. This activity guided the establishment of best practices. Using SCORM’s implementation of IEEE’s LOM allows the cataloging entity to employ the major concepts of AACR2 and MARC in its areas, including description, subject access, and classification. An attractive feature of LOM is that it offers considerable flexibility by allowing multiple thesauri to be plugged in the Classification area as long as the thesaurus in question is identified. The proposed OCW implementation of IEEE LOM employed methods from traditional cataloging such as standardization of contributors’ names via authority work in the Contribute area and the use of the Library of Congress Subject Headings in the Classification area. Additionally, OCW requested that the Libraries include the National Center for Education Statistics’ Classification of Instruction Programs. The Libraries proposed that during the course creation process, the faculty be allowed to use any other classification scheme they might be familiar with, and that such suggestions be included in the metadata. In practice, this capability has not yet been used to its full potential. For the first 500 courses, the OCW faculty liaisons provided keyword lists in a free form manner. The liaisons create a list of keywords derived from the course descriptions and resource titles. When the metadata are enhanced in the libraries, the keywords are examined for redundancy. Some keywords may be added. For example, if a Library of Congress Subject Heading is added to the Classification area in the Course Level metadata, and that heading has cross-references, the cross references may be used as keywords. Both extremes of electronic search systems, uncontrolled keywords and controlled vocabularies, are available in OCW. OCW metadata are developing into a hybrid of time-tested cataloging practices and Web searching methods.


Name authority work is performed for the course authors and contributors that appear in the Contribute area. First, the metadata creator searches the OCLC authority file. If the name is not found there, the authority file in Barton, MIT’s catalog, is searched. If a match is still not found, the MIT roles database of current students and faculty members is examined. When an authoritative form of the name is found, the metadata creator opens a record in a FileMaker Pro database, and records the name, any cross-references, and the course number in which the name appears. In the event of conflicts, AACR2 methods for distinguishing names are employed. For example, if there are two James Smiths, distinctions such as a middle initial or birth date are sought. The authoritative form is then entered in the Course Level metadata, and this information is replicated in all levels of metadata for the course. At this time, the database of OCW author names stands alone, but the Metadata Unit is planning ways in which the information may be utilized more interactively. Much labor is saved via inheritance. All of the resources within a course inherit the Course Level metadata, which includes contributors, basic technical information, and classification terms. The name of the course populates the Relationship area in each resource. If a resource is used in more than one course, it has multiple relationship entries. It was originally anticipated that OCW learning objects would take many forms such as video, audio, text, and program code. IEEE LOM includes a technical requirements element set, and the libraries’ suggested implementation is an expansion of the System Requirements note in MARC. The courses encountered in the first months of the Metadata Unit’s work tended to be heavily reliant on text documents in Portable Document Files (PDFs). The focus on text resources allowed the metadata creators to borrow heavily from the text-centric MARC standard for best practices. For example, the best practices for the Unit use spacing and punctuation rules from AACR2/MARC to guide the metadata in the General Description area of IEEE LOM. The original proposal was made and much of the workflow planning happened before a content management system with a metadata interface was chosen. The first 50 pilot courses were published in a temporary system. The permanent content management system became live shortly before the production process for the 500 Fall 2003 courses began, so much of the development of best practices needed to happen on the job.

Creating resource item record best practices

The arrival of the Content Management System (CMS), an out-of-the-box Microsoft solution, brought the real challenge to the preservation of rich MARC/AACR2 practices. Where the SCORM/IEEE LOM standard is extensible and allows for that most sacred of AACR2 traditions, cataloger’s best judgment, the CMS is a rigid tool that forces the metadata creator to be dogmatic about metadata application, especially in applying metadata to resource level objects. The need to balance the impulse toward traditional practices against the flexibility required by the OCW resources and the limitations of some of the available tools is best explored through examination of the Metadata Unit’s efforts towards best practices in the application of two of the SCORM element sets to content item or resource level objects.

Technical requirements After choosing a standard that allowed the most flexibility in describing technical requirements, the difficulty in applying these metadata elements focused upon interpretation of the SCORM instruction for this element to provide ‘‘technology required to use this learning object’’. The Metadata Unit identified two ways in which one could understand the intended use of the objects, the original educational use versus the OCW use. To provide technology required to satisfy the original educational use, one would follow the point of view of the disseminator of the course objects, providing all technology that a student would require to employ the objects in an actual learning situation. A second definition for OCW would require one to follow the point of view of the end-users of the OCW Web site, providing just that technology that would be required to access the material. Note that this expectation of use is closer to that which MARC records hold for library catalog users. While the libraries’ staff recognized and planned the metadata application for the intrinsic, complex operational nature of the OCW objects, in implementation that operation is restricted to something closer to the surrogate nature of MARC records. The Metadata Unit chose to follow the second interpretation and not provide all the information a student would require. This decision was based upon OCW’s statement that it does not intend these materials to constitute credit for a course or in any way substitute for MIT courses of instruction. Both the object and the


metadata are a snapshot in time. The material intended for public consumption, while in format instructive, in function is illustrative. This is not to say that the Metadata Unit does not take advantage of the richness of the SCORM Standard[3]. The unit’s rules of application are that no technical requirement is needed unless:
. A Web browser will not display the file and a text editor displays the file as gibberish.
. For resources that require software of which there are multiple vendors, add a general technical requirement. Do not recommend one vendor over another.
. Analyze the contents of Zip files for other possible technical requirements.
. Many programs use generic file extensions. Examine the context in which the file is included in the section materials to determine the software that might be needed.

Best practices for learning resource types

A second element that highlights the Metadata Unit’s attempts at good cataloging practice for electronic objects in a non-MARC standard is the ‘‘Learning Resource Type’’ (LRT). IEEE LOM defines the value space for this element as the following list:
. exercise;
. simulation;
. questionnaire;
. diagram;
. figure;
. graph;
. index;
. slide;
. table;
. narrative text;
. exam;
. experiment;
. problem statement;
. self assessment; and
. lecture.
Not having the resources or time to provide the level of effort required to properly create a custom taxonomy for all of MIT’s learning objects, the Metadata Unit elected to adopt this list. The justification for adoption focused on the benefits of having an immediately employable controlled vocabulary, focusing the unit’s effort at identification and improving the claim to interoperability. Interoperability is a prime concern at the end of the lifecycle of these objects, when they will have to be crosswalked to other metadata schemes in order to be permanently archived in a repository such as DSpace.

SCORM’s implementation of IEEE LOM describes the Learning Resource Type element as specifying, ‘‘the kind of learning object’’. For those used to AACR2 and MARC, the LRT functions much as General Material Designations and Specific Material Designations. Unlike AACR2, the LRT terms of SCORM do not have glossary definitions. IEEE LOM refers implementers to the OED and ‘‘communities of practice’’ for definitions of its terms. The Metadata Unit found that the OED provided an adequate means of interpreting these terms as kinds of learning objects. Kind is understood to define a ‘‘class of objects distinguished by attributes possessed in common’’. The unit discovered that the three attributes these terms describe are format, function, and association. Lacking an equivalent of the Library of Congress Rule Interpretations for IEEE LOM, the Metadata Unit created rules of application after seeing a critical mass of roughly 300 courses with associated resources. Here is the list of the Best Practice guidelines that the unit devised for applying LRTs: (1) When the attributes suggest more than one value for the learning resource type, choose the dominant kind in this fashion: format dominates function dominates association. (2) Per SCORM specification that, ‘‘the most dominant kind shall be first’’, enter the LRT that is most dominant. As the content management system does not allow for multiple instances of this element, place any other applicable LRTs in the description field. (3) Take a narrow interpretation of narrative text. If the dominant format is determined to be textual but not narrative, then apply the rule: Function dominates association dominates format. If neither a function nor an association is identifiable for the object, only then resort to narrative text. (4) For images that are not readily identifiable by their format as a diagram, figure or graph, look to Function and then Association. If neither of these is identifiable, then best practices recommend the use of ‘‘Figure’’ with an appropriate description. (5) When applying an LRT for binary code use Simulation as format and LRT. Code that displays text in a browser or text editor should be treated as text, apply rule 2. The way in which images were captured in the LRT best practice vocabulary was not entirely satisfactory. The generators of the list did not accurately consider the instructive power of images that were not communicating some quantifiable data. ‘‘Figure’’ became a catch-all for images of this sort, while the metadata creators relied upon the educational description element to provide better information.
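To give a sense of how these decisions come together in a record, the following is a rough, abbreviated sketch of a resource-level description using the IEEE LOM categories discussed above (general, technical, educational, relation). The element names follow LOM’s category and element vocabulary in simplified form; the exact SCORM XML binding, OCW’s local practices, and the sample course and resource names are illustrative assumptions, not taken from the project:

<lom>
  <general>
    <title><string language="en">Lecture 3 notes</string></title>
    <keyword><string language="en">Fourier series</string></keyword>
  </general>
  <technical>
    <format>application/pdf</format>
    <!-- general requirement, no single vendor recommended -->
    <requirement>
      <type><value>software</value></type>
      <name><value>PDF reader (any vendor)</value></name>
    </requirement>
  </technical>
  <educational>
    <learningResourceType><value>lecture</value></learningResourceType>
  </educational>
  <relation>
    <kind><value>ispartof</value></kind>
    <resource>
      <description><string language="en">18.03 Differential Equations, Spring 2003 (Course Level record)</string></description>
    </resource>
  </relation>
</lom>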


Conclusion In selecting metadata formats and creating best practices, MIT Libraries’ Metadata Unit attempted to preserve AACR2 and MARC cataloging traditions developed from generations of library experience. The adoption of other standards is an attempt to accommodate the diffuse natures of the libraries’ growing collection of electronic resources within the standards and practices of traditional cataloging. Sometimes it is the tools that make things difficult, when dealing with a CMS that is not as extensible as the standard. Sometimes it is the youth of the area of endeavor, when controlled vocabularies still have gaps. The Metadata Unit has found that even with all the forward thinking and cutting edge technologies used in the OCW metadata effort, it

is the traditional cataloger’s sensibilities regarding good description and access – as derived from the AACR2/MARC heritage – that are most valuable in discovering access to the library’s new class of electronic objects.

Notes


1 OpenCourseWare, available at: http://ocw.mit.edu/OcwWeb
2 MIT Libraries Metadata Unit, available at: http://libraries.mit.edu/guides/subjects/metadata/index.html
3 SCORM Standard, available at: www.adlnet.org/index.cfm?fuseaction=scormabt

Medium or message? A new look at standards, structures, and schemata for managing electronic resources Sharon E. Farb and Angela Riggio

The authors Sharon E. Farb is Coordinator for Digital Acquisitions and Angela Riggio is Serials and Digital Resources Cataloger, both at the University of California, Los Angeles, California, USA.

Keywords
Resources, Internet, Online cataloguing, Licensing, Information management, Libraries

Abstract
This article examines several library metadata standards, structures and schema relevant to the challenge of managing electronic resources. Among the standards, structures and schema to be discussed are MARC, METS, Dublin Core, EAD, XrML, and ODRL. The authors’ analysis reveals that there is currently no one standard, structure or schema that adequately addresses the complexity of e-resource management. The article concludes with an outline and proposal for a new metadata schema designed to manage electronic resources.
Electronic access
The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister
The current issue and full text archive of this journal is available at www.emeraldinsight.com/0737-8831.htm

Library Hi Tech Volume 22 . Number 2 . 2004 . pp. 144-152 © Emerald Group Publishing Limited . ISSN 0737-8831 DOI 10.1108/07378830410524576

I do not accept the argument that a word is a word is a word is a word no matter where it appears. There is no pure ‘‘word’’ that does not inhabit context inextricably. I don’t think the medium is absolutely the message, but I do think that the medium conditions the message considerably (Birkerts, 1995).

Increasingly, we are learning about new metadata schemata, structures, and standards designed to address various communities and constituencies. To date, however, none exist that address the dynamic, multidimensional, and legal aspects of acquiring and managing licensed electronic resources (e-resources) over time. The first section of this paper provides an overview of some of the unique characteristics, functionality, and challenges of managing e-resources. The next section describes the type of functionality and metadata required to support a comprehensive e-resource management system. The pros and cons of several metadata schemas are then highlighted to emphasize the functional requirements to support a comprehensive e-resource management system. The penultimate section examines some of the policy issues related to managing e-resources persistently over time; for example, the impact of proprietary software, restrictive licensing agreements, and digital rights management on the requirements for metadata to support long-term access and use of e-resources. The paper concludes with an outline of a proposed new schema to support the design and implementation of systems and tools to manage e-resources effectively over the long term.

The long-term challenges of e-resource management

Why are e-resources so difficult to manage? What functionality and metadata are required to support e-resource management over time? This section outlines ten key challenges facing effective persistent e-resource management. The list of challenges is not meant to be exhaustive, but representative of the ever-changing, multidimensional nature of the electronic medium. The first group of challenges comprises some of the unique characteristics of e-resources and digital collections:
. the instability of content;
. license-legal issues;
. multiple business models present in current e-publishing markets;
. product performance troubleshooting; and
. technological controls.

The remaining five are common to any library, museum, or archive collection, but in the context of electronic or digital resources management, exist in an exceedingly complex legal and technological environment:
(1) description and identification;
(2) access and discovery;
(3) archiving of licensed materials and collections;
(4) digital preservation; and
(5) perpetual access, and persistence of licensed digital collections.
Much has been written about the nature of digital content. With few exceptions, researchers, publishers, consumers and creators agree that digital content is dynamic. The Oxford English Dictionary defines dynamic as:
. . . adj. 1. Of or pertaining to force producing motion: often opposed to static (OED Online, 2003).

Libraries have developed policies and procedures to manage analog content, which tends to be static and fixed. For example, paper books and journals have a definite beginning, middle, and end. The tangible format requires the designation of a different edition when content is altered and republished. These analog materials are purchased and are owned in perpetuity. Library practices include the selection, acquisition, description, loan, cataloging, maintenance, preservation, and archiving of analog products. These processes and services are well established and rely on standards including MARC to formalize, automate, and share information. Few, if any analog acquisitions, regardless of size, require a legal opinion or formal legal contract signed by both parties. All are governed by a set of agreed-upon practices and rely on national and international standards to exchange static information. In the USA, federal copyright law provides for fair use, interlibrary loan, and preservation and archiving of lawfully purchased analog materials and collections. The same cannot be said of licensed digital collections where everything is negotiable. Most commercially-produced electronic products require a license agreement or contract which regulates their use. Negotiated license agreements are binding contracts governed by state contract law that when breached include liability and remedies that are enforceable under state law. Prior to the recent growth of electronic resources, libraries did not generally risk incurring state contract law liability for the use or alleged misuse of acquired content or collections. The stakes, risks and potential liability for libraries in a

licensing environment are new and require new tools to effectively manage. This obligation is unique to the acquisition and maintenance of digital and electronic products. Negotiated license agreements are negotiated individually and control all manner of scope, use, rights, and restrictions. Licenses typically require prolonged and time-consuming negotiations to insure that basic library principles and services are met, which include ‘‘walk-in’’ users, remote access for authorized users, interlibrary loan, scholarly sharing rights, linking, archiving or perpetual access rights, the confidentiality of user information, continuous use downtime, and notice of click-through license. In addition, a number of very important legal and liability issues are included in license agreements for digital content, such as indemnification, termination, breach and cure period, governing law, and warranties (Yale University, 2003). These functions, as well as other acquisition and payment processes, including automatic renewals, price, price linked to print, and price linked to consortial agreements must be recorded in a meaningful way to be systematically retrieved upon demand. Moreover, most license agreements that apply to digital content require notification to end users of relevant terms or restrictions on use. Finally, it must be noted that none of the rights guaranteed under federal copyright law are guaranteed in any license agreement. To complicate matters, libraries, archives and museums increasingly find themselves on both sides of the issue, as the consumer and/or the creator of digital content. To date, there is no one schema or metadata structure that is designed solely to produce reports and to display and distribute relevant licensed library collection information. A plethora of business models currently plagues the commercial publishing environment. There is a clear need to standardize these practices, but content providers, producers, and publishers cannot make up their minds as to what business models are preferred. In the past five years, numerous models have been proposed with varying degrees of popularity, adherence, or success for all involved. Some examples of existing business models are based on full time equivalence (FTE of students, faculty, staff), on specific populations (i.e. how many PhDs in specific disciplines), on site-specific models, individual licenses and subscriptions, or license at domain level (i.e. .edu, consortial licensing, models based on use, models based on maintaining print, etc.). While there are a number of metadata initiatives aimed at encouraging and streamlining e-commerce (a few to be addressed in this paper),


there is no one schema that provides support for the multidimensional functionality required in a comprehensive e-resource management system. The troubleshooting of licensed, networked, Web-based products is complex and time-consuming at best. A comprehensive e-resource system should have the ability to record and maintain a thorough history of problem reports that can be used to evaluate future selection decisions and license negotiations. Reports of online access problems involving licensed digital resources require a thorough comprehension of the entire lifecycle of a resource, including the parties involved in the legal agreement, license restrictions, payment details, as well as end-user concerns, such as computer technology (i.e. electronic configuration and interface), product downtime, and possible restrictions on use and access. ‘‘Technological controls’’ consist of security features that are designed to automatically restrict the access and use of an online product. They are routinely used in the electronic delivery of digital motion pictures, music, and e-books. In stark contrast to copyright law, which is limited in both scope and duration, technological controls can last forever, effectively locking down, in perpetuity, digital or electronic content that would otherwise end up in the public domain. Libraries of all types face significant challenges regarding the new digital publishing environment. The exponential growth in production of electronic books, journals, and databases has complicated and transformed the processes and workflows associated with traditional library acquisitions functions. Most integrated library systems are designed to accommodate the control, maintenance, access, and discovery of analog resources via the representation of physical entities and their relationships (e.g. books, paper serials, etc.). These systems, however, cannot adequately cope with the growing complexities and challenges involved in the evaluation, selection, acquisition, management, access, and troubleshooting of licensed digital products from a variety of third parties, as well as the ongoing access and upkeep of electronic products.

Metadata required to support e-resource management

Description and identification
Descriptive metadata has no doubt received the lion’s share of attention as well as the intellectual and financial resources for development throughout the history of libraries, and that trend has continued during the rise of the Internet. In

order to identify something, it must be described. In the context of modeling an e-resources management system, descriptive needs fall somewhere between the complexity of the traditional role of a library catalog and a simple Dublin Core record. Such a system requires the identification of a resource, its location, and its relationships to other resources and to its supplier package. Depending on the interoperability of such a system with a traditional library catalog, much of the descriptive metadata for an e-resource record could be derived from a MARC bibliographic record, such as title information, author information, and identifiers, such as an ISSN, ISBN, or a utility-generated record number (e.g. OCLC, RLIN). The descriptive data most unique to e-resource management revolves around the relationship between an individual resource and its parent supplier package. MARC can accommodate the basic relationship between a title and a larger group of items, but does not express the detailed nature of the package itself. Specifically, an e-resource system should be able to indicate if a particular library’s licensed package of titles is a complete set of holdings, or a selected subset of available titles. This adds a new dimension to the traditional concept of resource description. Additional descriptive data can also be supplied by the vendor of the resource, which can be packaged in any one of a variety of metadata flavors, including MARC, Dublin Core, and ONIX. The extensibility and flexibility of any new metadata schema must also be considered. The overall significance of the institutional requirements for e-resource management, public discovery, and other automated systems must drive the decision about how much descriptive metadata is enough: as noted by Gilliland-Swetland (2002):
. . . [c]arefully designed metadata results in the best information management in the short and long-term.

Licensing
Licensing is about the control over use of electronic resources. The Association of Research Libraries (ARL) points out that a significant distinction exists between acquiring resources or collections that do not require a license and those that do. Specifically, ARL (2003) notes that:
. . . [l]icenses may define the rights and privileges of the contracting parties differently than those defined by the Copyright Act of 1976. But licenses and contracts should not negate fair use and the public right to utilize copyrighted works.

In an analog environment, libraries select, acquire, catalog, and make available millions of items. The


information and metadata required to manage analog library processes, practices, and services is standardized and well established. In contrast, the acquisition of commercial electronic or digital resources requires license agreements and adherence to a variety of business models that are being used for the first time. These models govern the who, when, why, and where of the content and conditions related to the use of the electronic product. License agreements are governed by state contract law, and unlike analog purchase agreements, restrict access and use on a product-by-product basis, and on a customer-by-customer basis. In other words, unlike the acquisition of analog materials, which once lawfully purchased were owned in perpetuity and were governed by national standards such as MARC, Z39.50, EAD, and federal copyright law, the acquisition of a licensed digital resource provides no guarantees. Most license agreements provide no warranty for the completeness of content of a resource, for the accuracy or integrity of a product, or for reliable access to that product over time. Licensing is the area that is both unique and new to libraries, archives, and other non-commercial information service providers, and is the most rapidly developing area in the commercial information resources market. To effectively manage, describe, and provide access to large digital collections supplied through a layering of third parties requires information derived from the legal agreement or contract as well as technical, administrative, and business terms. Among the essential licensing data elements are those that describe:
. key licensing terms or clauses and their status;
. the scope of the license;
. its duration;
. the parties involved;
. renewal notices and indemnification; and
. warranties.
Some of these key terms are:
. confidentiality of user information;
. interlibrary loan;
. scholarly sharing;
. linking;
. archiving;
. completeness of content;
. notice of click-through license agreements;
. ADA or disability compliance;
. usage statistics; and
. perpetual access and termination (including breach and cure period).
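A simple record structure suggests how these elements might be grouped for systematic retrieval. The Python sketch below is illustrative only; the field names and types are invented for this example and do not reproduce any published element set.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LicenseTerms:
    # Parties, scope, and duration of the agreement
    licensor: str
    licensee: str
    licensed_titles: List[str] = field(default_factory=list)
    start_date: Optional[str] = None
    end_date: Optional[str] = None
    renewal_notice_days: Optional[int] = None
    # Key negotiated terms and their status
    interlibrary_loan_permitted: Optional[bool] = None
    scholarly_sharing_permitted: Optional[bool] = None
    archiving_or_perpetual_access: Optional[bool] = None
    confidentiality_of_user_information: Optional[bool] = None
    completeness_of_content_warranted: Optional[bool] = None
    ada_compliant: Optional[bool] = None
    usage_statistics_provided: Optional[bool] = None
    # Liability-related clauses recorded as free text
    indemnification: Optional[str] = None
    termination_and_cure_period: Optional[str] = None

# Example: record that a hypothetical agreement permits interlibrary loan but
# is silent on perpetual access, so that the gap can be reported on later.
terms = LicenseTerms(licensor="Example Publisher",
                     licensee="Example University Library",
                     interlibrary_loan_permitted=True)
print(terms.archiving_or_perpetual_access)   # -> None (not yet negotiated)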

While there are a few metadata schemata designed to encourage the exchange of rights information,

they are designed to support commercial, but not library and library user needs. Similarly, there exist several rights management schemata coupled with various payment schemata, but there is not one comprehensive, integrated, standard structure or schema that provides the essential metadata useful to libraries to manage licensed electronic resources over time. The Digital Library Federation (2002) currently is sponsoring an effort by librarians in consultation with the vendor community to address these gaps in existing metadata schemata, which proposes an open source, interoperable, flexible, extendable, and workable solution to this problem.
Access and administration
The access and administration data element set is necessary to support a wide range of activities and processes related to current and long-term access to electronic products. This group of elements contains the business and acquisitions terms that are specific to a product as well as its technological requirements, restrictions, or responsibilities. The unique activities in the electronic environment revolve around technical requirements for access, such as the registration of URLs with campus proxy servers (which guarantees authorized user remote access to resources), and the recording of technical requirements and hardware/software compatibility. Other data elements express user restrictions and terms of access, including the number of concurrent users allowed to view a resource, the user groups that are ‘‘authorized’’ (those who are allowed access to a resource), time limitations on resource availability, etc. Data elements in this group that control the management of electronic resources over time record information regarding permanent URIs, IP addresses, outside linking service updating, consortial partners, business models, price discounts, contact information, technical requirements, and proprietary software. Clearly the needs of the electronic environment exceed the capabilities of the MARC-based ILS. The information that is required for online materials is complex and requires a high level of maintenance. Many of these items require that an action occur based on a data element value, but these particular bits of information have not even been defined within a typical library acquisitions system, nor can current ILS technology handle these complex interactions between elements and resources.
Digital troubleshooting
Digital troubleshooting is another new phenomenon that requires an extensive knowledge of the nature of electronic resources, from how they are acquired and maintained within a


technological environment to the license agreement, local system details, and user needs. Digital troubleshooting is essential as it insures access, and can be used for the evaluation of licensed vendor products and services. In an analog environment, library troubleshooting consists of reports of missing or damaged library materials that could be tracked by following the trail of the physical item. In the electronic environment, the challenge is far more complex and often takes the troubleshooter on several disparate missions to track down and solve a single problem. The lack of access to an electronic resource could be traced to a litany of errors: a missed payment, the failure to register with a service provider, a faulty network, a bad computer configuration, non-compliant hardware, a third-party site that is temporarily down, or a re-directed URL. The needs of electronic maintenance exceed the limits of MARC records and the traditional interaction of ILS modules. An electronic resource management system could effectively record hardware and software requirements, contact information, and registration details, virtually offering an electronic checklist of activities that need to be taken care of (such as registration, payments, etc.) before access is turned on. Notes about downtime and server problems can be recorded in detail, and made available to both library staff and users in a timely fashion. .

Pros and cons of existing metadata schemata, standards, and structures

Dublin Core
With the ultimate goal of improving document retrieval from the Web, the Dublin Core Metadata Initiative (DCMI) developed a basic set of data elements for resource description. The Dublin Core element set is concerned mostly with content metadata, which can consist of the physical description of a resource, or a description that describes the intellectual aspects of a resource. The basic set of 15 elements is conceptually divided into three groupings: content, intellectual property, and instantiation. For example, the element ‘‘creator’’ establishes the person or body responsible for a work, and is placed in the area of ‘‘intellectual property’’, while the element ‘‘description’’ resides in the ‘‘content’’ designation, as it reflects information intrinsic to the resource, a description of the resource itself. DC elements can be utilized as is, or can be further broken down to fit individual needs. The Dublin Core ‘‘qualifiers’’ serve two purposes. Some qualifiers are used to refine an element, thus narrowing the scope of the

element, but allowing more specificity. Other qualifiers are used to designate an encoding scheme for a controlled vocabulary, such as LCSH, or MIME types (Hillman, 2001). The beauty of the Dublin Core lies in its simplicity; practically anyone can use the schema, with little more than a list of elements and definitions. Because it enjoys vast support from many communities throughout the world, Dublin Core has become the de facto standard for electronic resource description. The use of Dublin Core-compatible elements in an electronic resources management context, simply put, makes good sense. For the most part, basic elements of description that would be required in such a system can be expressed using DC. There is no dispute about the meaning or usefulness of ‘‘creator’’, ‘‘language’’, ‘‘publisher’’, or ‘‘title’’ for the description of a digital product. The complexities of the relationships that arise, however, in the acquisition and maintenance of digital products from the library perspective may require the expression of equally complex relationships that cannot be presented in such a clear-cut manner. For example, the qualified DC element ‘‘RelationisPartOf’’ can adequately communicate a parent-child relationship between aggregator/supplier packages and their individual titles in a basic way, but the particulars of a publisher’s package cannot always be expressed by a DC element. Often, libraries acquire selected titles from a vendor’s bundle of resources. It is important for an electronic resources management system to be able to note if the library’s holdings of that package are complete or not. This information is useful when considering new purchases or withdrawal of material. It also plays an important role in access and troubleshooting. For example, if a patron who is familiar with a library’s electronic collection expects a certain title from ScienceDirect to be available, a reference librarian could easily be able to determine if a library’s holdings of that package are intentionally incomplete, or if the denial of access is due to a completely different problem. The Dublin Core element set was never meant to address the specific needs of electronic resource management. Detailed descriptions of access restrictions, acquisition information, and licensing cannot be properly expressed through the ‘‘rights’’ element alone. Since the Dublin Core is a descriptive metadata standard, it would benefit any system to define descriptive elements that are compatible with DC. Such a strategy offers systems the ability to share and transfer information across metadata schemata, and is precisely the strategy that the DCMI encourages. From a library perspective, should public


discovery also be incorporated into an e-resource management system, adherence to the DC standard would ensure better retrieval of resources by all library users.
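For illustration, the short Python fragment below (using the standard xml.etree.ElementTree module) sketches a qualified DC description for one licensed title, expressing the package relationship with the dcterms:isPartOf refinement. The ‘‘holdingsComplete’’ element is a hypothetical local extension, not a DCMI term, and the title, identifier, and package name are invented.

import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
LOCAL = "http://example.edu/erm/"            # hypothetical local namespace

for prefix, uri in (("dc", DC), ("dcterms", DCTERMS), ("local", LOCAL)):
    ET.register_namespace(prefix, uri)

record = ET.Element("record")
ET.SubElement(record, f"{{{DC}}}title").text = "Journal of Example Studies"
ET.SubElement(record, f"{{{DC}}}publisher").text = "Example Publisher"
ET.SubElement(record, f"{{{DC}}}identifier").text = "ISSN 1234-5678"
# Parent-child relationship between the title and its supplier package
ET.SubElement(record, f"{{{DCTERMS}}}isPartOf").text = "Example Publisher e-journal package"
# Local extension recording that only a subset of the package is licensed
ET.SubElement(record, f"{{{LOCAL}}}holdingsComplete").text = "false"

print(ET.tostring(record, encoding="unicode"))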

ONIX The ONIX (Online Information eXchange) for books standard has emanated from a collaboration of members of the book industry trade, in order to share commercial book trade data in electronic form among all sectors of the industry. In essence, the ONIX standard was developed to promote the online book-selling industry. Better and more detailed book information can be transferred via the ONIX XML DTD, and its use also promotes a standardization of metadata across the book publishing trade. EDItEUR oversees its development, along with the Book Industry Communication, based in London, and the Book Industry Study Group in New York (EDItEUR, 2002). The ONIX element set is highly complex, and is structured around three record types: the serial item record, the serial title record, and the subscription package record. The ONIX data element set is focused on the description of a work as a commercial product, with the goal of transmitting a rich description of an item to vendors and wholesalers. In addition, the ONIX for serials record structure is able to transmit pertinent serials information, such as check-in, title change information, and package information (Jones, 2002). Much of this information would be useful for the acquisition and access of electronic resources in the library setting, which has recently been addressed (Dawson, 2003). A logical mapping to components of this element set could be used for the transmission and sharing of library information with the vendor community, and vice versa. In The Exchange of Serials Subscription Information, Jones (2002) makes the point that information kept by the vendor community is: . . . often unsatisfactory for accomplishing the tasks that libraries and others would like to achieve.

Accordingly, the commercially designed ONIX DTD does not currently address the licensing detail critical to the library acquisition model, such as archiving rights, confidentiality of user information, use restrictions, etc. In order for the library to effectively participate in the development and implementation of the ONIX standard, it is important that library concerns, especially those regarding the access and licensing of materials, be addressed by the vendor community. The inclusion of data elements not

present in the ONIX schema, such as the details of a license signed by both parties, or the terms of a purchase agreement, could save the library a great deal of staff time and effort. This information, packaged along with item, subscription and holdings information into the XML-based ONIX record and transmitted to a library management system, would automate what can often be a painstakingly laborious effort by library staff.
Metadata Encoding and Transmission Standard (METS)
METS is an XML-based structural framework which stores the metadata for a digital object. METS conforms to the Open Archival Information System (OAIS) model for the long-term preservation and sharing of information. The structural framework which characterizes METS was conceptualized during the Library of Congress’ Making of America II project. Today, the development and growth of METS is accomplished with the additional support and sponsorship of the Digital Library Federation. A METS document allows for the effective storage, packaging, and transmission of digitized objects. METS provides the structural containers for administrative and descriptive metadata, and can include or reference any metadata served up in XML. Metadata can also be ‘‘wrapped’’ around the object itself. The METS data registry is responsible for this compatibility, and currently endorses Dublin Core, MARC, MODS, and MIX (Metadata for Still Images) schemata. At this stage, the use of METS alone would not adequately address the dynamic nature of licensed electronic documents acquired for use by a library, although its use is appropriate when a library creates and stores its own digital content. In a library setting, the metadata associated with a purchased or acquired resource usually resides elsewhere, in a separate database or within an ILS. Because of its ability to accommodate various XML metadata records, an e-resources management system could conceivably accept certain bundles of METS metadata and in turn share metadata encoded in an XML format with a METS-based repository. Of course, any sort of exchange of this nature would encourage the use of a standardized or METS-endorsed XML schema.
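The skeleton below, again built with Python’s xml.etree.ElementTree, suggests how such a METS package holds these containers: a dmdSec wrapping a Dublin Core record, a fileSec pointing at a content file, and a structMap tying the two together. The identifiers, title, and file location are invented for illustration.

import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
DC = "http://purl.org/dc/elements/1.1/"
XLINK = "http://www.w3.org/1999/xlink"

for prefix, uri in (("mets", METS), ("dc", DC), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)

mets = ET.Element(f"{{{METS}}}mets")

# Descriptive metadata section: any XML metadata scheme can be wrapped here.
dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="DMD1")
wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="DC")
xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
ET.SubElement(xml_data, f"{{{DC}}}title").text = "Example digitized monograph"

# File section: the content files that make up the digital object.
file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="master")
f1 = ET.SubElement(grp, f"{{{METS}}}file", ID="FILE1", MIMETYPE="image/tiff")
ET.SubElement(f1, f"{{{METS}}}FLocat",
              {"LOCTYPE": "URL", f"{{{XLINK}}}href": "http://example.edu/images/p001.tif"})

# Structural map: relates the descriptive metadata and files to a logical structure.
smap = ET.SubElement(mets, f"{{{METS}}}structMap")
page = ET.SubElement(smap, f"{{{METS}}}div", TYPE="page", DMDID="DMD1")
ET.SubElement(page, f"{{{METS}}}fptr", FILEID="FILE1")

print(ET.tostring(mets, encoding="unicode"))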

Metadata schemata and policy

This section will highlight three key policy issues related to the development and implementation of an e-resource metadata schema, and the management, control, and use of digital information: the development and deployment of


digital rights management software and systems, restrictive licensing arrangements, and proprietary standards. All three illustrate the inherent tension between standardization and customization, between access and security, and between ownership and control. Margaret Jane Radin, Professor of Law at Stanford University, describes this tension as follows:
. . . [r]oughly speaking, customization involves individualization, production of a unique item, or attention to a particularized person or application, whereas standardization involves non-individualization or mass production of a class of identical items, without attention to a particular person or application (Radin, 2001, p. 101).

Radin calls for more study and analysis of the policy implications of the merging and interaction of technical and legal standards. Radin (2001, p. 130) cites digital rights management, contracts of adhesion, and click-through contracts as examples of where technological and legal standards have blended to become a powerful factor affecting ‘‘user autonomy and choice’’ as well as: . . . push the have and have nots further apart, because it seems predictable that the haves will check the boxes to pay more money, and the have nots will not.

Digital rights management (DRM) is a critical information policy issue, because it can limit access and use by technological and legal means well beyond that allowed under existing federal copyright law. In addition, digital rights management systems, software and tools create serious barriers to long-term preservation of digital content. Radin (2001, p. 116) defines digital rights management as a technological solution: . . . that limits distribution and use of some piece of digitized content.

In response to the concern over additional technological and legal means to control use, access, and description of digital content through digital rights management, members of the library community are taking a look at DRM from a different perspective. For example, efforts such as Federated Digital Rights Management are taking another look at ways to express and protect user’s rights, fair use rights, and creator’s rights involved with the open publishing model. Much like the Digital Library Federation’s Electronic Resource Management Initiative (DLF ERMI), this group is reevaluating metadata sets and considering the fashioning of a schema that best reflects the interests of the library and its users as lender and the educational institution as potential content creator. The federated group is integrating this model with an authentication system, known as Shibboleth. In both cases, the emphasis in rights

expression no longer favors the traditional business models, but semantically conveys these rights from the specialized view of the library.
Rights languages
eXtensible Rights Markup Language (XrML), currently in its version 2.0 release, is an example of a proprietary standard. XrML is an XML-based, commercially-designed schema whose purpose is to standardize the expression and implementation of all aspects of digital content rights. Its parent corporation, ContentGuard, is currently working alongside MPEG on the development of the MPEG-21 Rights Expression Language to develop standards for the moving pictures and audio industries, promoting the integration of content management with rights management. XrML can work across systems, and is not format-specific. It can be manipulated to handle all levels of complexity required for a particular institution, and can be used alongside other metadata schemata, such as ONIX and RDF (ContentGuard, 2003). XrML enjoys the support of major commercial enterprises, but has been criticized for its favoritism toward the commercial industry in regard to the expression of rights. The XrML data element set contains many of the basic licensing terms and descriptive element identifiers that would be used in a library management system. ContentGuard has created an extensive set of case scenarios that demonstrate various licenses that could be issued in different business models (ContentGuard, 2001). These examples are limited to the use of a resource, which illustrates the product’s bias toward ensuring the rights of the content provider. While the library is concerned with content creator and provider rights, the rights of the library as lending institution, as well as the details of the business license, the purchase, the implementation and assurance of access to digital content are also key to the functioning of an efficient library system. Until there is an indication that large, commercial rights management companies will accommodate the needs of a not-for-profit content lender, the library community will continue to lobby companies like ContentGuard, but will also investigate alternative, local solutions to the problems involved with managing digital rights from a dedicated, not-for-profit perspective. Developed as alternatives to proprietary solutions and schema for digital rights expression, the Open Digital Rights Language (ODRL) Initiative, along with the support of the World Wide Web Consortium (W3C), is striving to standardize the semantics for the expression of rights over digital assets. ODRL is XML-based, and is freely downloadable, in keeping with the


spirit of open source software. The ODRL Initiative promotes its tool as flexible and interoperable, able to work alongside any DRM system. The core data elements provide a very basic terminology for usage, transfer, asset management, and reuse permissions, as well as defining all forms of usage constraints, whether they result from the physical limitations of the object, or are negotiated terms in the license. ODRL also includes terms for the financial obligations associated with a digital asset, and basic descriptive terms used to locate and identify the resource. These elements are extensible, and can be added to using any of six ‘‘substitution groups’’ (Iannella, 2002). The specifications of ODRL are broad enough to apply to all digital content, from mobile phone use, to videos, to e-books. A library that purchases digital content could use many of the ODRL data elements, and refine them to accommodate more specific needs regarding licensing details. The usefulness of ODRL might break down when trying to address the rights the library has negotiated on behalf of its users. In a large library system, there is often cause to identify and record various user groups within that library community, each group possessing different access rights to a specific electronic resource. Moreover, in a system specifically designed to deal with library requirements, some data elements contain trigger dates for other system functions, such as renewal of licenses, payments, or report generation. It seems that the ODRL element set would need to be extended in some creative ways to take care of a library’s licensing tracking needs. When developing metadata sets, such as one for digital rights management, emerging standards such as ODRL must be considered. It is relatively easy to develop metadata element sets which are compatible, and can be mapped to such a schema. In the case of a large research library, complex DRM elements could be reverted back to a basic ODRL element, if such information needed to be transmitted from one institution to another that uses the ODRL schema. <indecs> is another example of a project primarily funded by the European Commission and by an international representation of rights owners to develop a metadata framework to support e-commerce (<indecs>, 2000). <indecs> focuses on ‘‘practical interoperability of digital content identification systems and related rights within the multimedia e-commerce’’ (<indecs>, 2000). Developed by rights owners and mega-industry creators such as the Recording Industry Association of America (RIAA), the Federation of European Publishers (FEP), and the British Broadcasting Corporation (BBC), the <indecs> schema

emphasizes the ‘‘foundation for online commercial transactions’’ rather than the non-commercial information access and use issues that are the concern and mission of most US libraries and archives. While some of the data elements and definitions from <indecs> could be used by libraries for components of an effective e-resource management system, it was not designed to fully represent or reflect non-commercial use issues or concerns. The data elements and definitions were designed to focus on commercial products and producers, and therefore definitions of what is a ‘‘publisher’’, ‘‘creator’’ and ‘‘user’’ may differ greatly, particularly from a commercial context to a non-commercial one, thus making automated use and transfer of data and metadata challenging and subject to extensive human intervention and data cleanup.

A proposed e-resource metadata schema

As of this writing, several projects are underway to address the challenges involved with the access, control, and maintenance of information required for the evaluation, selection, acquisition, licensing, and cataloging of digital products. Their solutions have required the design and development of stand-alone databases, or the alteration of existing systems and software to meet the new business models, relationships, and workflows required to support digital acquisitions. Librarians at several institutions are participating in and promoting the development of such tools by developing a set of best practices for the design and implementation of such a system. In addition to system design, the DLF ERMI (2003) is developing functional specifications, an extensive set of data elements, and an XML schema to allow the transfer of information between systems. The DLF ERMI Steering Committee, along with a group of librarian and vendor reactor panels, has developed a draft set of data elements and definitions, as well as a functional specification, an entity-relationship diagram, and a draft XML schema related to the design of an effective e-resource management system. Extensive details of the project and design documents are available at both the DLF Web site and the e-resource management Web hub. DLF ERMI has developed a set of draft functional requirements for an effective electronic resources management system, and a corresponding metadata set designed to support an integrated environment in which management and access are both supported, without maintaining duplicate systems. The DLF ERMI defines an electronic


resource management system as one that is designed to support the management and workflows necessary to efficiently select, evaluate, acquire, maintain, catalog, and provide informed access to electronic resources in accordance with license, business terms, and user needs. Seamless interaction and efficient sharing of data among a diverse set of tools and functions, including traditional MARC-based online catalogs, web portals, federated searching tools, local resolution services, local authentication and access management systems, traditional library management functions, and the unique management and service requirements of electronic resources should be supported by such a system. It should provide for global updating, flexible addition of new fields, the ability to suppress fields from public view, and a single point-of-maintenance for each data element. In addition, the system should support the ability to store, access, search, and obtain reports of the information contained in the system over time (DLF ERMI, 2003).

Conclusion

In light of the absence of existing metadata schema to effectively manage e-resources persistently over time, there is a growing, almost desperate need for libraries to track the persistence and accessibility of their electronic resource assets. Several tools and metadata schemas have been developed to transfer bits of information between vendors and libraries, between library catalogs and users, and from vendor to vendor. Although many of these metadata schemas and standards overlap, none have been constructed specifically for the purpose of managing electronic resources long term, persistently over the continuum of time in a library collection. Collaborative efforts, such as the DLF electronic resource management initiative and the NISO/EDItEUR joint working party for the exchange of serials subscription information (Florida Centre for Library Automation, n.d.), are pooling the knowledge gained by librarians and vendors in the field to craft new ways to store, exchange, and update metadata for electronic resources from a library service point-of-view.

References

Association of Research Libraries (ARL) (2003), Intellectual Property: An Association of Research Libraries Statement of Principles, ARL, Washington, DC, available at: www.arl.org/scomm/copyright/principles.html
Birkerts, S. (1995), ‘‘Page versus pixel’’, FEED, June.
ContentGuard (2001), eXtensible Rights Markup Language (XrML). Example Use Cases, 20 November, ContentGuard, Bethesda, MD, available at: www.xrml.org/spec/2001/11/ExampleUseCases.htm
ContentGuard (2003), XrML: eXtensible Rights Markup Language, home page, ContentGuard, Bethesda, MD, available at: www.xrml.org/
Dawson, L. (2003), ‘‘How will you shape the future?’’, PowerPoint presentation at the American Library Association Midwinter Conference, Philadelphia, PA, January.
Digital Library Federation (2002), ‘‘Digital Library Federation electronic resource management initiative’’, available at: www.diglib.org/standards/dlf-erm02.htm
Digital Library Federation Electronic Resource Management Initiative (DLF ERMI) (2003a), ‘‘A Web hub for developing administrative metadata for electronic resource management’’, available at: www.library.cornell.edu/cts/elicensestudy/home.html
Digital Library Federation Electronic Resource Management Initiative (DLF ERMI) (2003b), ‘‘Functional requirements for an electronic resource management system’’, unpublished.
EDItEUR (2002), ‘‘EDItEUR home page’’, available at: www.editeur.org/
Florida Centre for Library Automation (n.d.), ‘‘NISO/EDItEUR joint working party for the exchange of serials subscription information’’, available at: www.fcla.edu/~pcaplan/jwp/
Gilliland-Swetland, A.J. (2000), ‘‘Setting the stage’’, in Introduction to Metadata: Pathways to Digital Information, Getty Information Institute, Los Angeles, CA, available at: www.getty.edu/research/institute/standards/intrometadata/
Hillman, D. (2001), Using Dublin Core, DCMI, available at: http://dublincore.org/documents/usageguide/
Iannella, R. (2002), Open Digital Rights Language (ODRL) Version 1.1, WWW Consortium, Cambridge, MA, available at: www.w3.org/TR/odrl/
<indecs> (2000), ‘‘Interoperability of data in e-commerce systems’’, in Putting Metadata to Rights: Summary Final Report, European Commission INFO 2000 Programme, June, available at: www.indecs.org/pdf/SummaryReport.pdf
Jones, E. (2002), The Exchange of Serials Subscription Information, National Information Standards Organization, Bethesda, MD.
OED Online (2003), Oxford University Press, Oxford, available at: http://dictionary.oed.com
Radin, M.J. (2001), ‘‘Online standardization and the integration of text and machine’’, Fordham Law Review, Vol. 70, available at: www.law.stanford.edu/faculty/radin/RadinFordham.pdf
Yale University (2003), ‘‘LibLicense: licensing digital information: a tool for librarians’’, available at: www.library.yale.edu/~llicense/index.shtml


Repurposing MARC metadata: using digital project experience to develop a metadata management design

Martin Kurth, David Ruddy and Nathan Rupp

The authors
Martin Kurth is Head of Metadata Services, David Ruddy is Head of Systems Development and Production, Electronic Publishing, and Nathan Rupp is Metadata Librarian, all at Cornell University Library, Cornell University, Ithaca, New York, USA.

Keywords
Online cataloguing, Libraries, Design, United States of America

Abstract
Metadata and information technology staff in libraries that are building digital collections typically extract and manipulate MARC metadata sets to provide access to digital content via non-MARC schemes. Metadata processing in these libraries involves defining the relationships between metadata schemes, moving metadata between schemes, and coordinating the intellectual activity and physical resources required to create and manipulate metadata. Actively managing the non-MARC metadata resources used to build digital collections is something most of these libraries have only begun to do. This article proposes strategies for managing MARC metadata repurposing efforts as the first step in a coordinated approach to library metadata management. Guided by lessons learned from Cornell University library mapping and transformation activities, the authors apply the literature of data resource management to library metadata management and propose a model for managing MARC metadata repurposing processes through the implementation of a metadata management design.

Electronic access
The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister
The current issue and full text archive of this journal is available at www.emeraldinsight.com/0737-8831.htm

Library Hi Tech Volume 22 . Number 2 . 2004 . pp. 153-165 © Emerald Group Publishing Limited . ISSN 0737-8831 DOI 10.1108/07378830410524585

Technical services staff in libraries have a long history of optimizing and documenting the processes they use to create and manage the metadata contained in MARC-based library management systems. With the expansion of library metadata processing into non-MARC schemes, metadata managers and practitioners are faced with the need to extend the tradition of careful MARC metadata management to all library metadata processes in an environment that is complicated by decentralization. The decentralized situation in which libraries find themselves parallels that in all automated work environments following the emergence of relational databases, desktop workstations, and client-server architectures. These tools made it possible for internal data users to extract data from centralized organizational databases and create new, unique, standalone database applications on their desktops (Tannenbaum, 2002). In libraries that are building digital collections, information technology and metadata staff access the relational database files underlying library management systems to extract and manipulate MARC metadata sets to provide access to digital content via non-MARC schemes. The metadata processing environment in libraries that use MARC and non-MARC metadata schemes magnifies the decentralization and complexity common to automated workplaces. In such libraries, metadata processing involves defining the relationships between metadata schemes (mapping), moving metadata between schemes (transformation), and coordinating the intellectual activity and physical resources required to create and manipulate metadata (metadata management). Mapping and transformation work is divided among metadata staff and information technology staff. The tools and electronic files used in metadata processing are often scattered throughout the library, on servers and on the workstations of individual staff members. Actively managing the non-MARC metadata resources used to build digital collections is something most of these libraries have only begun to do. Although the history of using non-MARC metadata for digital collections is a relatively short one, library metadata staff and information technology staff have already begun to notice the problems that result from the lack of metadata management. Access to digital collections inevitably suffers when staff try to recall metadata content decisions, recreate metadata repurposing workflows, or troubleshoot connection failures by locating metadata resource


files among the innumerable folders of staff members who have handled them at various junctures in the metadata processing pipeline. For libraries seeking to manage their metadata processing operations as a whole, an excellent starting point is the management of MARC repurposing efforts. We see four reasons for attending to the use of MARC metadata in nonMARC applications as a first step in the larger endeavor of library-wide metadata management. First, the MARC records in a library management system are typically the largest store of metadata in the library. The Cornell University Library (CUL) library management system, for example, contains 4.5 million bibliographic records and is arguably irreplaceable as a tool for retrieving and managing Cornell’s unique array of physical and electronic resources. Second, repurposing MARC metadata necessarily involves mapping and transformation. Because the emerging library metadata environment requires mapping and transformation activities, MARC metadata repurposing is a representative subset of the library metadata environment as a whole. Third, the metadata mapping schematics and transformation processes used in MARC repurposing are sufficiently costly and complex to warrant optimization and documentation. Finally, MARC metadata repurposing inevitably results in data redundancy, with duplicative occurrences of source and derived metadata residing in various library systems. Carefully managing redundant metadata can avoid unnecessarily multiplying maintenance efforts and creating access points that should be identical but are instead divergent. For the reasons just identified, this article will propose strategies for managing MARC metadata repurposing operations as the first step in a coordinated approach to library metadata management. In considering MARC metadata repurposing and metadata management, we will draw on our experiences as CUL staff members from three different library units who have participated in MARC metadata repurposing activities to support CUL digital collection development projects. Although CUL is a large library with decentralized metadata operations, we believe that our MARC metadata repurposing experiences are relevant to metadata and information technology staff in libraries of all sizes and configurations who are engaged in similar activities. As we have already observed, the complexities of MARC repurposing derive as much from data redundancy and automated work environments as they do from specific library configurations. Thus we believe we can prudently generalize from our digital project experience to recommend approaches to MARC metadata

repurposing and thereby library metadata management. Our approach to MARC metadata repurposing will concentrate on three areas: mapping activities, transformation processes, and metadata management design. We will begin by relating significant CUL experiences in mapping MARC metadata to other metadata schemes and will use these experiences to recommend a model for managing mapping activities. Using a similar approach, we will then describe CUL experiences with metadata transformation processes and draw on those experiences to make general observations about metadata transformations. Next, guided by lessons learned from CUL mapping and transformation activities, we will apply the literature of data resource management to library metadata management and propose a model for managing MARC metadata repurposing processes through the implementation of a metadata management design. Having explored mapping, transformation, and metadata design, we will conclude with two short sections that look to the future. We will offer practical next steps for practitioners who wish to apply metadata management design to their operations and we will identify areas for further research into issues related to library metadata operations.
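To make these distinctions concrete before turning to specific projects, the toy Python fragment below records a few illustrative MARC-to-Dublin Core correspondences as a crosswalk table and applies them as a transformation. The field choices are simplified examples and are not the Cornell mappings discussed in the sections that follow.

# Crosswalk: MARC field/subfield combinations mapped to Dublin Core elements.
CROSSWALK = {
    "245ab": "title",
    "100a": "creator",
    "260b": "publisher",
    "260c": "date",
    "020a": "identifier",
}

def transform(marc_values: dict) -> dict:
    """Transformation: derive a flat Dublin Core record from already-parsed MARC data."""
    dc_record = {}
    for marc_field, dc_element in CROSSWALK.items():
        value = marc_values.get(marc_field)
        if value:
            dc_record.setdefault(dc_element, []).append(value)
    return dc_record

# Example input, keyed by the field/subfield labels used in the crosswalk above.
print(transform({"245ab": "The Making of America", "260c": "1869"}))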

Mapping activities

Although authors and practitioners often use ‘‘mapping’’ and ‘‘crosswalking’’ interchangeably, in this article we use ‘‘mapping’’ to refer to the process of establishing relationships between semantically equivalent elements in different metadata schemes and use ‘‘crosswalk’’ or ‘‘map’’ to refer to a visual representation of mapping relationships (St Pierre and LaPlant, 1998; Woodley, 2000). In this section we describe significant CUL digital project experiences in mapping MARC metadata to the TEI Lite and Dublin Core metadata schemes. We give particular attention to CUL’s MARC-to-Dublin Core (DC) mapping effort because it represents the library’s first coordinated approach to metadata mapping. We conclude the section by drawing on CUL mapping experiences to recommend a model for managing mapping activities.
MARC-to-TEI mapping experiences
CUL has derived metadata from MARC records for many of its digital library projects. In 1995-1996, for one of Cornell’s earliest and largest digital conversion projects, the Making of America (MOA) (Cornell University Library,


1999), CUL supplied its scanning vendor with MARC records, which the vendor used to generate volume-level descriptive metadata records to accompany digitized page images. MOA project staff relied on the MARC-derived metadata primarily for file management, but they also applied it in an early MOA online delivery system. In 1998, the library implemented a digital collection delivery system developed at the University of Michigan to provide enhanced online access to MOA. This system required, as an ingest format, SGML-encoded TEI Lite documents containing bibliographic metadata, structural metadata, and full-text content generated from optical character recognition (OCR) output. To populate the TEI Header fields with bibliographic metadata, staff used the MOA vendor’s records, those originally derived from CUL MARC records. A number of CUL projects completed since MOA have required similar TEI Lite encoding of metadata and OCR. These projects include the Core Historical Literature of Agriculture (Cornell University Library, 2004a), the Samuel J. May Anti-Slavery Collection (Cornell University Library, n.d. a), Historical Math Monographs (Cornell University Library, 2004b), and the Home Economics Archive (Cornell University Library, 2004c). All of these projects have used metadata derived from MARC records and mapped into TEI Lite documents. Though the mapping schemes used in these projects were similar, they varied from project to project. Metadata staff developing the Home Economics Archive (HEARTH), a ‘‘core electronic collection of books and journals in Home Economics and related disciplines . . . [p]ublished between 1850 and 1950’’ (Albert R. Mann Library, 2003), based the HEARTH MARC-to-TEI mapping scheme on the scheme used in the MOA project, though they changed some element mappings. For example, the HEARTH mapping differed from the MOA mapping in its handling of edition statements. Because many of the print versions of the books and journals included in HEARTH had been published in the late nineteenth and early twentieth centuries, the MARC records for them presented a mixture of pre-AACR and AACR cataloging rules. When HEARTH staff adapted these catalog records to describe the digital versions of those publications, they did not update the cataloging to reflect current rules; instead, they retained the original description and added fields representing the electronic aspects of the objects cataloged. In order to map edition statements from MARC records that followed different cataloging rules, HEARTH metadata staff wrote mapping protocols to recognize that in some cases

the edition statement would come from the 250 field, while in other cases it would come from the 500 field. In the latter cases, the mapping rule identified (via such abbreviations as ‘‘ed.'') which 500 field represented the edition statement. A similar example of how collection-specific mapping protocols differ involves the Samuel J. May Anti-Slavery Collection and the Historical Math Monographs collection. For the May Anti-Slavery Collection, project staff mapped the date of publication for the pamphlets in the collection from the 07 through 10 character positions of the MARC 008 fixed field, because the MARC records used for the May Collection cataloged the original source documents, in this case the pamphlets. For Historical Math Monographs, however, CUL catalogers had created MARC records for the digitized versions of the original print books. The 008/07-10 character positions for these records contained the dates of the digital versions, whereas the publication dates of the original books were in the 008/11-14 character positions, so Historical Math Monographs staff mapped the 008/11-14 to TEI for that project. The mapping schemes for MOA, HEARTH, May, and Historical Math Monographs reflect collection-specific variations that are common in MARC repurposing. CUL’s experiences with these collections suggest that, given changes in cataloging practice over time and the requirements of individual digital collections, a single MARC mapping for all collections sharing a common metadata scheme is not possible.

Beginning coordinated mapping at CUL

As the number of digital collections at CUL continued to grow, and with them the number of MARC mapping schemes, CUL metadata librarians felt the need to develop a generalized, but CUL-specific, mapping scheme that they could use as the basis for collection-specific mappings in a number of different projects. Project staff would be able to consult the generalized mapping scheme in light of their project needs and revise it as they saw fit. To that end, CUL metadata librarians in early 2002 convened a group of staff who had worked on various digital library projects (including MOA, HEARTH, and May), along with other technical and public service stakeholders, to develop a CUL scheme for mapping MARC to DC. The group’s organizers felt that bringing a representative group to consensus regarding MARC-to-DC mapping would generate a mapping scheme acceptable throughout the library system. CUL metadata librarians chose DC for this effort because it was well on its way to becoming a metadata lingua franca (it had been approved by

NISO the previous fall and would be approved by ISO within a year), and because CUL had begun a project to implement Endeavor Information Systems’ ENCompass product (Cornell University Library, 2003). ENCompass enables end users to search different digital library collections simultaneously by mapping their metadata schemes to a common element set. For that common element set, CUL ENCompass developers had selected DC. CUL’s DC Mapping Group began by consulting the MARC-to-DC Crosswalk created by the Library of Congress (n.d.). The DC Mapping Group wanted the CUL MARC-to-DC mapping to follow the LC Crosswalk whenever possible because the LC mapping had become a de facto standard for mapping MARC to DC. Although the CUL group based its work on the LC Crosswalk, it wanted to go beyond the LC map in three general ways. First, the LC Crosswalk did not address all MARC fields and the CUL group wanted to record its decisions regarding a greater number of MARC fields and subfields. Second, the DC Mapping Group wanted to apply recommendations found in the DC-Library Application Profile (Dublin Core Metadata Initiative, 2002) to the CUL MARC-toDC map. Finally, the group wanted to expand the ‘‘Notes’’ section of the LC Crosswalk to reflect CUL practices with regard to such decisions as the treatment of initial articles in titles. After the DC Mapping Group completed its work, members of the group presented their recommendations to CUL staff at a forum of the CUL Metadata Working Group. The group made some revisions to its MARC-to-DC map based on this additional input, and made its final recommendations available on the Web (Cornell University Library, n.d. b). CUL’s ENCompass Development Project quickly put the work of the DC Mapping Group to use. ENCompass Project staff extended the CUL MARC-to-DC Crosswalk to an even greater degree of granularity with regard to MARC subfields to support access to MARC-derived DC records in the ENCompass-based ‘‘Find Articles/ Find Databases/Find e-Journals’’ system (Cornell University Library, n.d. c). Metadata librarians working on the ENCompass Project consulted the CUL MARC-to-DC Crosswalk regularly throughout the project, and then documented the revised MARC-to-DC mapping scheme they implemented in ENCompass. Documenting the ENCompass MARC-to-DC map enabled CUL ENCompass Project staff to offer their MARC mapping scheme to staff at other libraries who were implementing ENCompass. The efforts by the DC Mapping Group and the ENCompass

Project team were CUL’s first attempts to record a generalized MARC mapping scheme and the application-specific mapping derived from it.

A MARC mapping model

Our experiences with MARC-to-TEI and MARC-to-DC mapping at CUL have led us to view metadata mapping in general, and MARC mapping in particular, as a series of refinements from an international or national standard to a library standard and ultimately to an application-specific map. Starting from a de facto international standard such as the Library of Congress MARC-to-DC Crosswalk has given CUL metadata librarians some confidence that local CUL mappings from MARC to DC will be compatible with other libraries’ mappings based on the LC map. Similarly, developing local agreement around a more detailed metadata map should ensure an even greater consistency among the digital collections offered by a single library. And, as our experience with CUL MARC repurposing projects has shown, each digital project has its own peculiarities that call for collection-specific decisions regarding metadata mapping. Throughout our mapping experiences, we have observed the clear need to document mapping decisions. We have found it worthwhile to share with our colleagues not only the library’s standard mapping scheme, but also the specific mapping schemes developed for particular projects. Making all schemes available gives metadata librarians throughout the organization an opportunity to examine the collection-specific maps available and determine whether they can adapt any of them to their projects; if none of the maps is useful, metadata staff can choose the generalized library scheme and adapt it for their purposes. The process of coordinating metadata mapping across a decentralized organization like CUL has proven to be no different than coordinating other decisions and processes that affect the organization as a whole. CUL’s MARC-to-DC mapping experiences have confirmed the benefits of building a library-wide consensus and then disseminating results so staff who work with metadata can benefit from them. CUL metadata librarians’ success with approaching metadata mapping from a library-wide perspective rather than an isolated project perspective led them to apply similar strategies to the metadata transformation processes that enact mapping rules.
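To make the preceding point concrete, the following fragment sketches how a collection-specific mapping decision of the kind described above (taking the date of the original publication from 008/11-14 rather than the date of the digital reproduction from 008/07-10, as in the Historical Math Monographs case) might eventually be enacted as a transformation rule, the subject of the next section. It is an illustrative XSLT 1.0 sketch that reads MARCXML and emits a Dublin Core date element; it is not one of the stylesheets or scripts CUL actually used, and the wrapper element in the output is invented for the example.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: selects the original publication date (008/11-14)
     rather than the date of the digital reproduction (008/07-10).
     XPath substring() is 1-based, while MARC 008 positions are 0-based,
     so position 11 begins at offset 12. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <dates>
      <xsl:for-each select="//marc:record">
        <dc:date>
          <xsl:value-of select="substring(marc:controlfield[@tag='008'], 12, 4)"/>
        </dc:date>
      </xsl:for-each>
    </dates>
  </xsl:template>
</xsl:stylesheet>

For the May Anti-Slavery Collection, the same template would read substring(marc:controlfield[@tag='008'], 8, 4) instead; the difference between the two collections reduces to a single argument change in an otherwise identical rule.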

Transformation processes

There is little agreement in the metadata literature regarding where metadata mapping ends and

metadata transformation, also called metadata conversion, begins. We view metadata mapping as the process of establishing semantic relationships between equivalent elements in different schemes and metadata transformation as the design and implementation of scripts and other tools that move mapped metadata between schemes. Representing a somewhat different view, St Pierre and LaPlant (1998) argue that a complete metadata map should define both semantic equivalents and transformation specifications. The boundary between detailed mapping specifications on one hand and transformation rules on the other is admittedly fuzzy. We include detailed mapping decisions, written as natural language instructions in lists or tables, within the boundaries of mapping; conversely, we include transformation rules written in languages such as Perl or XSLT within the boundaries of transformation. In this section we turn our attention to metadata transformation processes. We describe CUL experiences with scripted transformation processes and we propose an XML/XSLT approach to transformations. We then compare the scripted and XML/XSLT approaches. We conclude the section by recommending strategies for MARC metadata transformations. MARC metadata transformation experiences As CUL’s experiences with mapping MARC metadata have evolved, so have the processes the library has used to extract metadata from MARC records and transform it to another format – the transformation processes that implement the intellectual mapping work described in the previous section. The evolution of CUL transformation processes has followed a path similar to that of CUL mapping practices, from project-specific conversion processes to an increasing emphasis on reusable tools, processes, and workflows. For early CUL digital collection projects such as MOA, scanning vendors derived descriptive metadata from MARC records before online delivery systems were in place to deliver the digital content. When CUL staff implemented the current delivery system for MOA, they used the vendor-supplied metadata records to populate TEI Header fields, rather than returning to the original, or ‘‘live’’, MARC records in the library management system. The problems with this approach became more apparent with time. For one, the vendor had derived the MOA descriptive metadata from MARC without a clear sense of the delivery system’s requirements; as a result, neither the vendor nor CUL staff had optimized the metadata records for any particular system. But more importantly, the MOA processing stream

had divorced the derived metadata from its source and there were no processes in place for regenerating it. The library had thereby created multiple versions of its metadata, with the resulting requirement that it be maintained in two places – the library management system (LMS) and the MOA delivery system. This experience of metadata replication, and its attendant problems of version control and duplicated maintenance effort, led CUL to work out a different approach to MARC metadata transformation for the May Anti-Slavery Collection. As mentioned earlier, the metadata requirements for the May project were similar to those of MOA. The delivery system required TEI Lite encoded files, which included descriptive metadata in the TEI Header. This requirement called for a conversion process that would extract portions of the MARC record to populate a TEI Lite file according to the MARC-to-TEI mapping that project staff had devised. Rather than follow the MOA process, May Collection developers now wanted to avoid divorcing the derived bibliographic metadata from its source in the MARC record. What May project staff wanted to achieve instead was a way of regenerating the derived metadata easily and at will. This would allow CUL to maintain the bibliographic metadata for May Collection pamphlets in one place, the LMS. If CUL technical services staff corrected or altered May Collection records in the LMS, May project staff could rerun the transformation script, thereby updating the metadata used by the delivery system in a reasonably automatic way. This would obviate the need for redundant data entry and maintenance. Implementing this transformation process involved CUL information technology staff. A programmer analyst wrote software to identify the relevant records in the LMS and extract fields from these records as specified in the MARC-toTEI map. The extracted data was then encoded in XML and stored so that it could be picked up and combined with administrative metadata, structural metadata, and OCR output to produce the TEI files required by the delivery system. Once the programmer analyst began the transformation process, its operation was automatic. The transformation process used in the May project had some clear advantages. Because the files required by the delivery system could easily be regenerated, it was relatively simple to rerun the entire transformation process if the bibliographic metadata was updated in the LMS. This approach allowed the LMS to remain a central site for maintaining bibliographic data, thus relying on well-established library procedures and workflows. The transformation process was

also quite flexible in terms of mapping and output. Working with a programmer analyst, librarians could request that any MARC field be extracted and mapped to any metadata output format. The flexibility of this ‘‘scripted’’ transformation process made it possible to reuse the same tools in subsequent projects that required similar conversions. As noted in the discussion of MARC mapping, while metadata librarians can fruitfully pass MARC mappings from project to project, collection-specific modifications are inevitable. The flexibility of the scripted approach to metadata transformations allows digital project staff to accommodate these modifications and create collection-specific conversion processes. Experience has shown, however, that this manner of achieving needed flexibility is expensive. Because information technology staff must build the intellectual mapping work done by librarians into the transformation process, librarians must work closely with programmer analysts to install and test each new mapping. Every modification requires an analyst’s time, which is expensive in direct staff time requirements and can cause time delays in implementing project-specific MARC conversions. Further, scripted transformations are designed to run as batch processes initiated and directed by programming staff. In practice, however, single record transformations are often preferable and more efficient than regenerating thousands of records merely to accommodate a few changes. Though the scripted approach to MARC metadata conversion offers flexibility with regard to mapping and output, it is technologically inflexible with regard to scale. In other words, digital projects requiring metadata transformations also need flexibility that considers staffing and workflow efficiencies. An XML/XSLT transformation process The need for processes that accommodate different staffing requirements and workflow scenarios has prompted further evolution in CUL’s MARC metadata extraction and transformation strategies. For example, a CUL programmer has created a Web-based tool that automatically selects and extracts MARC records from the LMS based on criteria collected via an online form. Librarians can enter criteria, indicate various output parameters, and save and edit jobs. The extraction jobs run overnight and generate email notifications when they finish. Similar improvements in MARC metadata transformations are possible, perhaps by building on the features of the existing extraction tool. We envision an XML/XSLT transformation process that requires information technology

programming support to create the transformation tool but then allows staff in other units to carry out required work with few if any ongoing programming needs. One strength of this approach to MARC metadata transformation is that it can use widely available XML and XSLT tools, taking advantage of the flexible mapping technologies they offer. XSLT processing essentially converts one XML document into another XML document by using instructions contained in an XSLT stylesheet. XSLT processors are widely available, and we have found that basic XSLT stylesheet creation and modification are skills that technical services staff can learn quickly. A second strength of an XML/XSLT process for transforming MARC metadata is that its developers can build on existing work done with the MARCXML standard for XML encoding of MARC metadata. The Library of Congress (2003) has made a tool freely available precisely for this MARC-to-MARCXML conversion process. CUL has already implemented a Web-based interface to this tool for single record conversions of MARC records to MARCXML or DC. A fully functional implementation of this tool would require securing programming staff time to add the capability of batch processing MARC records. Once built, the modified tool could serve any project, as the conversion from MARC to MARCXML is uniform across MARC records. The XML/XSLT approach to transformation accommodates collection-specific variations by converting MARCXML to other metadata schemes. It is XSLT’s straightforward handling of these variations that makes it so attractive and convenient. The subtle variations of MARC metadata mapping required by different digital collections can be scripted into an XSLT stylesheet by the librarians who are closest to a given collection and who have done the intellectual work of map preparation. Using the single record transformation tool just described, metadata librarians working on the CUL ENCompass Project have done just this, using XSLT to implement a MARC-to-DC map that met the project’s specific needs. An XML/XSLT approach to MARC metadata transformations offers staffing, scheduling, and workflow advantages. Once library programmers build XML/XSLT transformation tools and put them in place, metadata staff can perform ongoing conversion work without information technology support. Metadata staff can also modify existing XSLT stylesheets or insert new ones without the need for programming skills. Using metadata staff rather than information technology staff for collection-specific transformation operations will

lower project staffing costs. With regard to scheduling, an XML/XSLT approach lets metadata practitioners decide when they will implement transformation modifications. XML/XSLT enables them to translate their own mapping decisions into transformation processes immediately by reusing or refashioning XSLT stylesheets as needed. Finally, using XML/XSLT offers exceptional flexibility with regard to transformation workflows. The same approach can be used for batch or one-off, record-by-record, processing. It might also be used for on-the-fly conversions, where library-developed tools import external MARC records (e.g. via Z39.50) and then convert them for user viewing or additional processing.

Considering MARC metadata transformations generally

Reflecting on our experience with scripted and XML/XSLT metadata transformations has prompted us to make these general observations regarding MARC metadata transformation processes. MARC metadata transformation work will inevitably encounter variation in digital project conversion needs. Transformation processes must accommodate this variation as a functional requirement in their design. To transform MARC metadata in a technically and economically efficient manner, transformation processes should be broadly available to library staff rather than centralized in information technology units. Libraries engaged in MARC repurposing need decentralized transformation processes that digital project staff can easily modify through routine and standardized methods such as the alteration or addition of XSLT stylesheets. Decentralization of transformation processes, however, points to a critical need for a richer and more complete inventory and documentation of transformation components. The library staff who build and maintain transformation tools, including software and XSLT stylesheets, need to document them in a way that promotes their reuse. Without this intellectual control over the mechanisms by which librarians transform MARC metadata, it will be difficult to create the standardization in procedures necessary to facilitate the sharing, reuse, and adaptation of transformation tools.
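As a concrete illustration of the XML/XSLT approach discussed above, the stylesheet below converts MARCXML records to a handful of simple Dublin Core elements. It is a minimal sketch rather than CUL's or the Library of Congress's actual stylesheet: the four mappings shown (245 to Title, 100 to Creator, 650 to Subject, 260 subfield c to Date) stand in for a full crosswalk, and a collection-specific variant would be produced by copying the file and editing or adding templates of the same shape, precisely the kind of routine modification that, as argued above, metadata staff can make without programmer support.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative MARCXML-to-Dublin Core stylesheet (XSLT 1.0).
     A production stylesheet would implement the library's full agreed
     mapping using the same patterns. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <records>
      <xsl:apply-templates select="//marc:record"/>
    </records>
  </xsl:template>

  <xsl:template match="marc:record">
    <record>
      <!-- Title: 245 subfields a and b, separated by a space -->
      <dc:title>
        <xsl:for-each select="marc:datafield[@tag='245']/marc:subfield[@code='a' or @code='b']">
          <xsl:value-of select="."/>
          <xsl:if test="position() != last()">
            <xsl:text> </xsl:text>
          </xsl:if>
        </xsl:for-each>
      </dc:title>
      <!-- Creator: main entry personal name -->
      <xsl:for-each select="marc:datafield[@tag='100']/marc:subfield[@code='a']">
        <dc:creator><xsl:value-of select="."/></dc:creator>
      </xsl:for-each>
      <!-- Subject: topical heading, subfield a only -->
      <xsl:for-each select="marc:datafield[@tag='650']/marc:subfield[@code='a']">
        <dc:subject><xsl:value-of select="."/></dc:subject>
      </xsl:for-each>
      <!-- Date: publication date from the 260 -->
      <xsl:for-each select="marc:datafield[@tag='260']/marc:subfield[@code='c']">
        <dc:date><xsl:value-of select="."/></dc:date>
      </xsl:for-each>
    </record>
  </xsl:template>
</xsl:stylesheet>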

Metadata management design

CUL experiences with MARC repurposing have highlighted the need to build library-wide consensus regarding mapping decisions between specific schemes; to document collection-specific

maps in order to share and reuse them; and to build, maintain, and document transformation tools in order to facilitate their sharing and reuse. In considering such topics as consensus, sharing, and reuse, we have moved from the specifics of mapping and transformation to metadata management. Metadata management comprises two interrelated aspects. First, it involves coordinating the intellectual work that drives metadata creation and manipulation activities and, second, it involves managing the tools and electronic files that result from metadata creation and manipulation processes. In this section we advocate meeting the needs of internal library users insofar as they are necessary to meet the needs of library end users. We explore the ways in which studying data resource management literature can inform our experiences as metadata practitioners to enable us to recommend potentially fruitful approaches to the ways in which libraries manage MARC metadata repurposing. And, last, we propose creating a metadata resource inventory as an initial step toward establishing a management design for MARC metadata repurposing. Identifying the needs of internal users In discussing metadata mapping activities and transformation processes in this article, we have identified the benefits to library operations that accrue when digital project staff share and reuse metadata resources. As members of library digital project teams, however, we have observed that sharing and reuse of metadata resources does not always occur. Instead, digital project staff often devise metadata maps, write transformation scripts, and create intermediate metadata files to meet digital collection project goals and deadlines without capitalizing on the latent value of those maps, scripts, and files to subsequent digital projects or to the ongoing maintenance of the collections for which they were created. Metadata operations would grow more integrated and effective if library metadata managers were to develop metadata processes that promoted sharing and reuse in order to meet the operational needs of the library staff who create and manage digital collections. Because the sustained vitality and integration of digital collections depends on digital project staff sharing metadata resources with each other over time, treating digital project staff as internal users with specific information needs is necessary to serving the long-term information needs of end users. The value of developing metadata processes that meet the needs of both end and internal users is evident in some metadata management guidelines currently in effect. The UK Intra-Governmental

Group on Geographic Information (IGGI) has created two documents – ‘‘Principles of good data management’’ and ‘‘Principles of good metadata management’’ – that recommend best practices for information managers who deal with geographic data files and the metadata files that describe them (IGGI, 2000, 2002). The ‘‘Principles of good data management’’ contend that actively managing data by making it easier to access and thereby reuse improves operational activities. A significant aspect of the ‘‘Principles of good metadata management’’ is the principles’ emphasis on the benefit of metadata management for information managers as well as end users. The IGGI metadata principles also hold that actively managing metadata processes reduces the risk that metadata generation and transformation processes will be lost as an organization’s structure, staff, and activities change over time. Consulting documents such as the IGGI data and metadata management principles prompted us to consult the literature of the data resource management discipline. The publications of the Data Management Association in particular document principles dedicated to managing the internal data needs of organizations in order to meet the data needs of the clienteles that those organizations serve. According to the Data Management Association, data resource management ‘‘facilitates the stewardship of data’’ as a key organizational asset, making it more valuable to an organization through ‘‘planning, communication, control, coordination, and management’’ (DAMA Chicago, 2002). The discipline of data resource management seeks to manage the entirety of an organization’s data (including metadata) as an organic whole, regardless of the physical formats in which the data lie. Although the data resource management literature is largely targeted to private enterprise organizations, its principles are relevant to the metadata management needs of libraries building digital collections because they attend to internal data needs as intrinsic to external data needs. Bringing data resource management to library metadata management The benefits of applying data resource management principles to library metadata operations fall into two categories, cultural benefits and operational benefits. Cultural benefits relate to the adoption of values within the library that are useful to furthering its service mission, while operational benefits relate to the integration of useful practices into the library’s operations. As we have discussed in this article, a library’s MARC metadata is connected to the metadata derived from it by mapping schematics and

transformation processes, all of which are represented by electronic files and likely paper documents stored throughout the library, often by many staff in several library units. Because such scattering of data and the staff members responsible for it is often the norm in libraries managing digital collections, it is important to note that applying data resource management to library metadata would not depend on centrally directing the creation and manipulation of all of the library’s metadata. Rather, it would involve, first, documenting existing metadata relationships and processes and, second, using that documentation as a primary resource when seeking to coordinate those relationships and processes. The two-part effort to describe and shape an organization’s metadata is what Tannenbaum (2002), who writes about data and metadata architectures in private enterprise settings, calls creating a meta-meta design for an organization. Although we may be reluctant to use Tannenbaum’s terminology and introduce yet another ‘‘meta’’ into the library literature, we are nevertheless willing to endorse her advocacy for creating what we choose to call a library’s metadata management design in order to document and manage its metadata content relationships, processing applications, maintenance protocols, storage instances, and display occurrences. As a library begins to enjoy the operational benefits that creating a metadata design would lend to metadata activities, it would also begin to realize the cultural benefit of formally acknowledging the value of managing its metadata. Creating a metadata design would enable metadata and information technology staff to represent visually the location of an applicationspecific metadata scheme within the array of schemes used in the library. By calling on project staff to locate application-specific schemes within the library’s metadata design, the presence of a design would also require staff to establish the relationships between the project scheme and related schemes used elsewhere in the library. Treating project needs in a broader context in this way would maximize the chances for integration among projects and minimize the chances of duplicated effort and irrecoverable departures from existing access methods. Underlying the effort to create a metadata management design for a library is the acknowledgement that a library’s metadata mapping schematics and transformation processes are valuable resources that serve the information needs of end users and as such warrant careful management. The principles of resource management call for an organization to optimize an

existing resource because the organization cannot afford an unlimited supply of it; to share and leverage a resource in as many ways as possible to maximize value and minimize cost; to anticipate a resource’s requirements and fulfill them proactively; and to manage a resource carefully to make sure the organization uses it prudently, efficiently, effectively, and securely (DAMA Chicago, 2002). These values in managing resources may seem intuitively obvious, but we feel it is important to articulate them explicitly in the context of metadata design because the first two principles provide a justification for the cost of metadata design and the second two provide a rationale for undertaking it. Related to the introduction of a metadata management design to a library’s metadata operations is the application of the data resource management notion of ‘‘enterprise’’ to a library’s metadata processes. Treating heretofore projectbased metadata operations as an integrated whole through a metadata design is brought into clearer focus by seeing that whole as an enterprise. Here we draw on Webster’s Third International Dictionary to define an enterprise as a planned, systematic, often complex venture undertaken to achieve a specific purpose. Enterprises frequently have the economic features of risk and cost. We have already noted the tendency in libraries to ‘‘look past’’ the needs of internal users in order to meet the needs of end users. When we look at a library’s metadata generation activities in light of our definition of an enterprise, we can begin to see them not simply as by-products of an effort to serve end users, but as parts of a coordinated venture with a singleness of purpose whose careful management is essential to meeting the needs of end users. Insofar as data resource management principles focus on the enterprise, they seek to find integrated solutions that benefit an organization as a whole. Because of their collection-oriented history, digital library metadata creation processes can benefit from the holistic approach that an enterprise perspective brings to metadata repurposing. Another potential benefit for libraries in drawing on the data resource management literature lies in incorporating the principle of data stewardship to manage the components of metadata mapping and transformation. The cultural value of stewardship is one of accountability. It holds that if information producers create or update data that other people in the organization can use, they will generate it, store it, and make it accessible to meet their colleagues’ needs as well as their own (DAMA Chicago, 2002). At an organizational level, data stewardship requires buy-in from information managers and producers throughout

the organization. Applying data stewardship in library metadata management can achieve such operational benefits as enhancing communication and productivity among metadata and information technology practitioners and making it more likely that they will find and use tools generated in other parts of the library. Data stewardship also facilitates such metadata management outcomes as treating metadata as a shared resource, reducing metadata development and maintenance costs, minimizing the creation of redundant components, making it possible to develop new metadata applications faster by sharing and adapting existing maps and transformation tools, and making it more likely that new applicationspecific mappings and transformation tools will integrate with the existing metadata development environment. Toward a design for MARC metadata repurposing As we have already observed, creating a metadata management design for a library is a two-phase effort. First, library staff document existing metadata relationships and processes and, second, they use that documentation as a primary resource for coordinating those relationships and processes. Because this article focuses on MARC metadata repurposing as a representative subset of a library’s metadata processes, our discussion of the documentation phase will focus on MARC repurposing. To begin documenting MARC repurposing processes, we propose creating an inventory of the data files, mapping schematics, transformation processes, and systems that comprise the components of the library’s current metadata repurposing efforts. The components of the metadata processing workflow for the May Collection project listed below are generally representative of those used in repurposing MARC metadata for digital projects at CUL. We have followed each entry in the inventory with observations about significant features of the component: (1) The MARC bibliographic metadata, both content and content designations, as stored in the library management system. For the May Collection, the MARC bibliographic metadata was used as a source for the description of the digitized pamphlets in the May Collection. The use of MARC bibliographic records for authority-controlled element values is complicated by the fact that the authoritative source for the value is not the bibliographic record, but rather its associated authority record. Some digital repository projects use MARC holdings metadata in combination with MARC bibliographic metadata as source metadata.

(2) The extract script or tool that selects and extracts the MARC bibliographic metadata for the project. This script was written and is maintained by CUL information technology staff. (3) The file that is the product of the extract in (2). The extract file for the May project resides temporarily on a server in the library’s server farm, under the control of the programmer who ran the extract process. For more recent digital projects, similar extract files reside on the desktop workstations of library staff members. (4) The collection-specific MARC mapping used for the project. The mapping used for the May project indicates which MARC fields from the records contained in the file (3) are to be captured and encoded in the descriptive metadata section of the XML metadata collection and storage scheme (5). As mentioned, we recommend deriving the project-specific mapping from a generalized mapping agreed on by stakeholders throughout the library. (5) The XML metadata collection and storage scheme. This XML scheme was designed by library staff to collect and store together in a single file several types of metadata about a digital library object in a transitional stage prior to the creation of TEI Lite files. The scheme includes descriptive, structural, and administrative metadata. Bibliographic metadata from MARC records are mapped into the descriptive section of the scheme. XML elements within this section are based on TEI Header elements. (6) The transformation script that creates an XML file – meeting the specifications of the metadata storage scheme (5) – and populates the descriptive metadata of this file with MARC metadata elements from the extract (3), following the MARC mapping specified for this project (4). Cornell information technology staff created and maintained the transformation script used for the May project. (7) The XML file that is the product of the transformation in (6). For the May project, these files reside on a library server. It should be noted that additional processes (with their own transformation scripts and data sources) also place metadata into these files. Examples of such metadata include the structural information contained in an image-to-page correspondence table as well as information about page image file names. All metadata about a particular document is eventually collected in this XML file. (8) The transformation script that generates a TEI Lite file by taking metadata from the XML metadata storage files in (7) and integrating it with page-level optical character recognition data. This transformation script was written by information technology staff, but not the

same staff who maintain the transformation script in (6). (9) The TEI Lite file that is the product of the script in (8). This file is stored on a library server. (10) The DTD used to validate the files in (9). The DTD used is an abridged subset of TEI Lite. A validation check is done for quality control by the script in (8). The DTD is also used by the digital collection delivery system to validate files during ingest and to aid internal data management processes. This DTD resides on a library server. (11) The project metadata as stored in the digital collection delivery system after the TEI Lite XML file is ingested. A digital collection delivery system may or may not store metadata as XML. Regardless of whether the system stores the metadata as XML, it may not be able to output the metadata, including any changes made subsequent to ingest, as an XML file that validates against the DTD or schema used for the project. Moving from an inventory like the one above to comprehensive documentation and coordination of MARC metadata repurposing will require further work. Establishing a metadata management design for MARC repurposing processes calls for a library to manage those processes as resources and to apply principles of stewardship to them. CUL, for example, has not yet brought library stakeholders together for a formal discussion of the costs and benefits of such efforts. Stakeholder discussions would provide an opportunity to raise the issue of identifying stewards for key metadata repurposing components. To bring stewardship to a library’s metadata processes, we anticipate establishing these values for each component identified as critical to the process: its authoritative version; its location (server, filename, identifier); the staff or unit responsible for it; its supporting documentation (may be internal or external to the component); its backups; and the standards and policies it follows.
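By way of illustration, the entry below sketches what such a stewardship record might look like for one component, the transformation script in (6), expressed as a small XML record. The element names, identifier, locations, and values are invented for the example; a real implementation would require local agreement on the element set.

<!-- Hypothetical stewardship/inventory entry for one transformation component.
     All names and values are placeholders. -->
<component id="may-marc-to-storage-transform">
  <description>Transformation script populating the XML metadata collection and
    storage scheme from extracted May Collection MARC records</description>
  <authoritativeVersion>1.2</authoritativeVersion>
  <location server="lib-metadata.example.edu" path="/tools/may/marc-to-storage-transform"/>
  <responsibleUnit>Library Information Technology</responsibleUnit>
  <documentation href="/tools/may/marc-to-storage-readme.txt"/>
  <backup schedule="nightly" location="library backup service"/>
  <standards>MARC 21; CUL MARC-to-TEI map (May Collection variant); TEI Lite (CUL subset)</standards>
</component>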

Proposals for library investigation

Our inventory of the metadata components involved in building the May Anti-Slavery Collection points to the work that still needs to be done to coordinate CUL’s MARC metadata repurposing processes. As the authors of this article we have begun the documentation phase of creating a MARC metadata repurposing design, but as metadata practitioners at CUL we have yet to extend this work into the coordination phase or to introduce these ideas to our colleagues to generate the buy-in necessary for their

implementation. We expect that metadata practitioners at other libraries who are interested in improving their management of MARC repurposing operations may find themselves in similar positions. As our recommendations for immediate action, we offer the following list of practical next steps for practitioners wishing to apply the metadata management approaches we have discussed to their MARC repurposing operations:
(1) Build library-wide consensus regarding metadata element decisions and generalized mappings. For example, CUL has recently drawn on its experience in reaching consensus about MARC-to-DC mapping to develop local consensus around preservation metadata elements (Cornell University Library, 2002).
(2) Develop reusable transformation tools. CUL’s experiences with its MARC-to-XML conversion tool and its MARC extraction tool are promising. CUL has many more transformation scripts currently in use that are not as well documented or as easily accessible by staff library-wide.
(3) Organize meetings with stakeholders to discuss the costs and benefits of creating a MARC metadata repurposing design for the library. Metadata practitioners will want to use these discussions to demonstrate to their colleagues the value of stewardship in making tools and resource files more broadly accessible. Stakeholders’ meetings will present opportunities to begin articulating the roles and responsibilities of metadata staff and information technology staff in metadata management.
(4) Extend discussions of a MARC repurposing design to discussions of creating a library-wide metadata management design. Stakeholders in library-wide metadata management design will likely be more numerous than those interested in MARC repurposing design because library metadata activities typically extend beyond MARC repurposing.
(5) Investigate the costs and benefits of taking the creation of a library-wide metadata management design yet further by investigating the creation of a metadata management repository of mapping schematics, transformation tools, data files, and other metadata resources. Creating a metadata repository would involve treating metadata components as persistent digital objects with persistent identifiers and descriptive metadata in order to facilitate their discovery and retrieval through a digital content delivery system. Building searchable

metadata repositories would make it easier for libraries to share their metadata mapping and transformation resources with each other. The BellSouth Metadata Services Group has created a metadata repository similar to the one described here. The BellSouth repository includes information about the ‘‘databases, data transformations, interfaces, systems, metrics, components, Web content, XML artifacts, messaging structures, Web services and documents, all of which are accessible from a Web portal and cross-referenced by the use of the Dublin Core (DC) standard’’ (Stephens et al., 2003). The repository contains 150,000 objects and provides such services as ‘‘data mapping, documentation, metrics, reusable components, naming standards, etc.’’ (Stephens et al., 2003).
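To suggest what ‘‘treating metadata components as persistent digital objects'' might look like in practice, the following is a hypothetical Dublin Core description of a transformation stylesheet as it might appear in such a repository; the identifier scheme, creator, and wording are invented for the example.

<!-- Hypothetical repository description of a transformation stylesheet;
     identifier and values are placeholders. -->
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>local:metadata-tools/marc2dc-encompass-v2</dc:identifier>
  <dc:title>MARC-to-Dublin Core transformation stylesheet (ENCompass variant)</dc:title>
  <dc:type>XSLT stylesheet</dc:type>
  <dc:creator>Library Metadata Services</dc:creator>
  <dc:relation>CUL MARC to Dublin Core Crosswalk</dc:relation>
  <dc:description>Implements the library's generalized MARC-to-DC map as extended
    for the Find Articles/Find Databases/Find e-Journals system.</dc:description>
</record>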

Recommendations for further study

To provide a conceptual framework for the continuing work in metadata mapping, transformation, and management design we expect in the years ahead, we have identified three potentially fruitful areas for further research. These are:
(1) Investigate the use of such architectures as the Dublin Core Abstract Model (Powell, 2003) and the METS external descriptive metadata (mdRef) element for linking MARC and MARC-derived metadata records, thus providing an infrastructure for refreshing MARC-derived metadata (a brief sketch of such a pointer follows this list). Creating a library-wide MARC metadata repurposing design would identify the relationships among MARC and MARC-derived metadata occurrences. Diagramming these relationships would provide a foundation for establishing automated processes to refresh element occurrences with updated values. Linking among metadata value occurrences is an emerging activity that bears watching from a metadata management perspective.
(2) Investigate the programmatic use of models such as the Simple Bucket Digital Object Model (Chandler and Westbrooks, 2002) and the Master Metadata File model (Davis, 1998; Mandel, 1998) for collocating metadata elements and values intended to be shared among distributed metadata records. Some models for managing related metadata occurrences involve collocating metadata elements from related records in a single digital object. For example, metadata in a Master Metadata File object would comprise elements and values from multiple schemes

including MARC. Proponents of collocative models argue that they facilitate management over time, assimilation of metadata content from multiple schemes, and delivery of metadata content into multiple schemes. Building a single file of metadata objects containing metadata from multiple applications would promote consistent data definitions, maintain consistent content for key elements, and formalize relationships among related digital objects (Davis, 1998).
(3) Investigate the implications that building a MARC metadata repurposing design holds for creating Archival Information Packages (AIPs) in accord with the Open Archival Information System (OAIS) Reference Model. As we have observed, managing digital collections over time depends heavily on metadata processes and the files they produce. For AIPs that contain MARC-derived descriptive metadata, it may serve the ends of long-term preservation to establish links between the MARC-derived metadata contained in AIPs and the inventory of maps, tools, and files that generated that metadata. Establishing those links may also make it possible to refresh MARC-derived metadata in AIPs dynamically.
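As a brief sketch for item (1), the fragment below shows how a METS document might reference its source MARC record by pointer rather than by embedded copy, so that the descriptive metadata can be refreshed from the catalog. The identifier and record URL are placeholders, not actual CUL addresses.

<!-- Sketch only: a METS descriptive metadata section that points to the
     source MARC record instead of embedding a derived copy. -->
<mets:dmdSec ID="dmd001"
    xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:mdRef LOCTYPE="URL" MDTYPE="MARC"
      xlink:href="http://catalog.example.edu/records/1234567.marcxml"/>
</mets:dmdSec>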

Conclusion

In light of the experience of metadata and information technology staff at Cornell University Library in repurposing MARC metadata for digital collection projects, we see the need to manage internal metadata processing more efficiently in order to serve end users reliably over time. Recent metadata mapping and transformation efforts indicate a trend in library metadata operations: Libraries will receive, create, and transform metadata in multiple schemes amid an increasingly complex and decentralized technological environment. The literature of data resource management indicates that libraries are not alone in facing this phenomenon; indeed, data resource management can provide useful strategies to library staff as they explore more systematic and comprehensive approaches to managing metadata operations. This article advocates that libraries develop metadata management designs. Such designs will recognize the importance of mapping schematics and transformation processes to library operations and will propose management practices that optimize their use. In particular, we have argued for the need to build library-wide consensus on metadata mapping decisions and to document

mappings and transformation processes systematically in order to promote sharing and reuse. We see the value of approaching library metadata work as a coordinated enterprise (especially in a decentralized environment) and of reinforcing metadata coordination by establishing data stewardship responsibilities. Although library metadata and information technology staff have yet to discuss the many specific components of potential metadata management designs, the general features of a library metadata management design as outlined here offer metadata practitioners a useful framework within which to position their operational needs and potential solutions. Metadata management designs promise effective approaches for libraries seeking to optimize their significant investments in metadata work.

References

Albert R. Mann Library (2003), ‘‘Home economics archive: research, tradition and history (HEARTH)'', available at: hearth.library.cornell.edu
Chandler, A. and Westbrooks, E.L. (2002), ‘‘Distributing non-MARC metadata: the CUGIR metadata sharing project'', Library Collections, Acquisitions, & Technical Services, Vol. 26 No. 3, pp. 207-17.
Cornell University Library (1999), ‘‘Making of America'', available at: http://cdl.library.cornell.edu/moa/
Cornell University Library (2002), ‘‘CUL working meeting on preservation metadata'', available at: www.library.cornell.edu/iris/dpo/metadata.html
Cornell University Library (2003), ‘‘CUL ENCompass Development Project'', available at: http://encompass.library.cornell.edu/
Cornell University Library (2004a), ‘‘Core historical literature of agriculture'', available at: http://chla.library.cornell.edu/
Cornell University Library (2004b), ‘‘Historical math monographs collection'', available at: http://historical.library.cornell.edu/math/
Cornell University Library (2004c), ‘‘Home economics archive'', available at: http://hearth.library.cornell.edu
Cornell University Library (n.d. a), ‘‘Samuel J. May Anti-Slavery Collection'', available at: www.library.cornell.edu/mayantislavery/
Cornell University Library (n.d. b), ‘‘CUL MARC to Dublin Core Crosswalk'', available at: http://metadata-wg.mannlib.cornell.edu/programs/docs/CUL_MARC_to_DC_Crosswalk.htm
Cornell University Library (n.d. c), ‘‘Find articles/find databases/find e-Journals'', available at: http://find.library.cornell.edu
DAMA Chicago (2002), Guidelines to Implementing Data Resource Management, 4th ed., DAMA International, Bellevue, WA.
Davis, S.P. (1998), ‘‘Managing and accessing the digital library'', available at: www.columbia.edu/cu/libraries/inside/projects/metadata/presentation/nypl/nypl.ppt
Dublin Core Metadata Initiative (2002), ‘‘DC-library application profile'', available at: http://dublincore.org/documents/2002/09/24/library-application-profile/

Intra-Governmental Group on Geographic Information (IGGI) (2000), ‘‘The principles of good data management'', available at: www.iggi.gov.uk/achievements_deliverables/manage.pdf
Intra-Governmental Group on Geographic Information (IGGI) (2002), ‘‘The principles of good metadata management'', available at: www.iggi.gov.uk/achievements_deliverables/pdf/Guide.pdf
Library of Congress (n.d.), ‘‘MARC to Dublin Core Crosswalk'', available at: www.loc.gov/marc/marc2dc.html
Library of Congress (2003), ‘‘MARCXML Site'', available at: www.loc.gov/standards/marcxml
Mandel, C. (1998), ‘‘Manifestations of cataloging in the era of metadata'', available at: www.columbia.edu/cu/libraries/inside/projects/metadata/presentation/alctslita/alctslita.ppt
Powell, A. (2003), ‘‘Dublin Core abstract model'', available at: dublincore.org/documents/2003/08/10/abstract-model

St Pierre, M. and LaPlant, W.P. Jr (1998), ‘‘Issues in crosswalking content metadata standards'', available at: www.niso.org/press/whitepapers/crsswalk.html
Stephens, R.T., Wilson, A. and Jenkins, B. (2003), ‘‘Best practices in metadata management'', available at: www.wilshireconferences.com/award/Submissions/Bellsouth.pdf
Tannenbaum, A. (2002), Metadata Solutions: Using Metamodels, Repositories, XML, and Enterprise Portals to Generate Information on Demand, Addison-Wesley, Boston, MA.
Woodley, M. (2000), ‘‘Crosswalks: the path to universal access?'', in Baca, M. (Ed.), Introduction to Metadata: Pathways to Digital Information, Getty Research Institute, Los Angeles, CA, available at: www.getty.edu/research/conducting_research/standards/intrometadata/2_articles/woodley/index.html

Future considerations: the functional library systems record

Karen Coyle

The author: Karen Coyle is Digital Libraries Consultant, Berkeley, California, USA.

Keywords: Online cataloguing, Libraries, Object-oriented databases, Design

Abstract: The paper performs a thought experiment on the concept of a record based on the Functional Requirements for Bibliographic Records and library system functions, and concludes that if we want to develop a functional bibliographic record we need to do it within the context of a flexible, functional library systems record structure. The article suggests a new way to look at the library systems record that would allow libraries to move forward in terms of technology but also in terms of serving library users.

The library card catalog performed a suite of functions with a single technology: the card. Today’s library automation systems have integrated a much larger number of functions into a single system. These include the functions of discovery and location that were performed by the card catalog, but expanded to other library management functions like acquisitions, serials control, and circulation. The library system is also being asked to expand beyond these functions. It needs to provide interaction with outside user services such as full text, to enhance catalog entries with images and sound, and to allow users to search a variety of local and remote databases with a single search. When we contemplate how our bibliographic record should be structured in the future, and what data elements it should contain, we need to look at more than just the MARC record but also the context in which it is used, which is the library system. Changing the MARC record without taking this holistic system view would be a grave mistake. It would also be a mistake to assume that the library system of today is a finite and fixed context; instead, our systems are in a constant state of evolution, as are all computer systems, and they are part of a larger context of networked information resources. At the same time that those of us in the library systems area are contemplating our next record structure, catalogers in our profession are looking at the bibliographic record from a conceptual and functional point of view. The bibliographic view of what is functional and the systems view of functional are not currently being discussed in concert. Bringing these two reform movements together would be a better formula for success than either of them would have on its own. This article proposes one way to think about these two changes and how they might work together.

AACR: MARC . . .

The MARC record was created as a digital mirror image of the cataloging rules of its time, which were not so different from the cataloging rules of our time. Those cataloging rules were originally designed to produce cards for library catalogs, and they still reflect that heritage with their main entry headings, inverted forms of names, and the grouping of data elements into paragraph-like segments.

A library catalog’s cards served a variety of functions. They carried the descriptive catalog for works owned by the library; they were the discovery mechanism for users of the library; they provided users with the shelf location of the items; and for the library administration the card catalog was an inventory of the library’s holdings. The data elements for this library card were the original focus of the MARC record, and the first use of the MARC record was to print traditional catalog cards in an era of computer-driven typographic machinery.

The creation of the first online catalogs began a transformation of library catalogs that was not anticipated by either the cataloging rules or the machine-readable record that served them. Most notably, the online catalog made a radical change in the discovery function of the library catalog. Discovery in the card catalog had been an entirely linear affair. Each designated access heading in the catalog record was an entry point in an alphabetical list of headings. Users searched for their desired author, title, subject or series in this alphabetical list. In the online catalog, discovery could be linear, but it could also take place as a keyword or string search within the access headings. Not only could the records be retrieved by words in the headings rather than the entire heading, but the boundaries between headings could also be broken down. A single search could be performed against more than one heading; for example, a search could include words from all subject headings in the record, or could combine keywords from both author and title fields. It could also go beyond the designated access headings and allow searching in fields that were previously unavailable for discovery, such as notes, identifying numbers, and tables of contents.

As new forms of discovery were presented in online catalogs, the MARC record began to respond to this environment. Fields were added to the MARC record that did not arise from the cataloging rules. Fixed field coding for various item formats became increasingly detailed so that searchers could limit their retrievals to specific physical formats such as videotapes in VHS format, or music on cassette tape. A field was added for a coded form of the mathematical data carried in map records that was normalized for machine manipulation. Other coded fields served the retrieval of music records by composition and number of instruments. None of this was conceivable in the era of the card catalog.

Discovery was not the only library catalog function that has changed in this era of automated library systems; the concept of location has also changed significantly. Networking, and in particular the Internet, means that the library catalog is no

The location function of the catalog has changed from that of identifying a shelf location in a library to pointing to a networked location anywhere in the world. Location is increasingly becoming a dynamic concept that refers less to a fixed position in space and more to networked functions like the OpenURL and the digital object identifier (DOI), which resolve to a means to obtain the item or a service that can be offered related to the item.

The inventory function has changed as well. The descriptive record is no longer the primary record of the library's inventory. Inventory, acquisitions, and licensing each have their own functional segments of the integrated library system. Although coupled with the descriptive record, these modules are themselves sophisticated accounting and control systems. Part of a library's inventory is now virtual; licensed resources that are neither owned by the library nor possessed by it must nevertheless be accounted for among the resources that the library is making available to its users.

Since the automation of the library catalog, the most radical change to the MARC record was the creation of a separate record for the very complex functions relating to holdings and locations. The MARC Format for Holdings was the first – and so far only – time that a new MARC record format was developed to fulfill the requirements of library systems. Other formats, like Authorities or Classification, automated existing records, not systems functions. The Holdings record was needed in particular to express complex serials holdings patterns for the system functions that support check-in and receipt prediction. The creation of a holdings-level record that is linked to the MARC bibliographic record gives us a direction for further developments toward a multi-level, multi-functional library systems record. The data structure of the Holdings record, however, is the same as that of the bibliographic record, and that is based on a standard developed in the mid-1960s; it therefore shares the same structural limitations.

There are many reasons to contemplate a more modern replacement for the MARC record. Already there is a movement to transform the 1960s record structure of MARC (Format for Information Exchange, ISO 2709) to a variety of XML formats. These changes aim at allowing greater extensibility in the record structure and better integration with mainstream computing. But they do not necessarily encourage any modification of the fundamental content of the MARC record. Challenging the content is more difficult than challenging the structure – after all, the content of the record has a legacy of over a century of library cataloging rules.


Also, the challenges to the MARC record are coming from the technologists in the library field, and following the division of labor in the library profession, technologists work with the structure while the content is entirely the responsibility of catalogers. A new development in the cataloging community, however, may give us the opportunity to work on both sides of the library record format. It is an opportune time to move on from the AACR/MARC model that is based on the card catalog to a cataloging philosophy and a machine-readable record that are grounded in database management capabilities and networked information resources. And this brings us to one of our profession's more recent acronyms: FRBR.

. . . as FRBR: ?
There is a great deal of buzz in the library world today over the Functional Requirements for Bibliographic Records (FRBR) (IFLA Study Group on the Functional Requirements for Bibliographic Records, 1998). FRBR is indeed a new way of thinking about bibliographic description because it places emphasis on the context of textual works and their relationship to each other rather than merely on the description of individual publications. It moves us toward a view of a universe of inter-linked publications where users eventually will not need to be concerned with differences in formats or the vagaries of nearly identical printings of the same works.

Although this article uses the FRBR structure to illustrate the possibility of a multi-level, multi-functional record, it should not be construed as an endorsement of the FRBR four-tiered model. Although an interesting theory of the levels of works, FRBR is untested in practice and may never be implemented as an actual cataloging record. That said, it is reasonable to assume that a future cataloging structure will embody some degree of hierarchy, especially in the need to express the relationships between multiple versions of the same work.

FRBR is not itself a record structure, but it speaks conceptually of four bibliographic levels. The most general level is that of the work, the fundamental intellectual product. That is followed by an individual expression of that work, which is generally considered to be the specific content or edition. The next level is the manifestation. Manifestation is the level where the work is ‘‘productized,’’ that is, a particular publication or production. The final level is the item itself, the copy or physical package that is handled by the library.

These four levels are the aspect of FRBR that most librarians are aware of, to the extent that they are even aware of the FRBR movement. But the FRBR document also describes a number of entity-relationship elements that are pertinent to the intellectual work, such as authorship and topic assignment. These entity-relationships essentially define the relationship of key data elements, like creator and topic, to the four bibliographic levels. Although some of the concepts of FRBR have caught on in the library profession like a mild fever, how these concepts might affect both cataloging and library data development is still fairly unclear. One of the dilemmas we face when thinking about FRBR is that it is so clearly incompatible with our current data structure, the MARC record. Attempts are being made to ‘‘FRBR-ize’’ collections of MARC21 records, but in these cases we are trying to imitate some FRBR concepts with records created using pre-FRBR cataloging rules and pre-FRBR record structure. Because MARC records are what we have today, any early experimentation with the FRBR concepts in library systems will have to use the MARC data, but if we want to move forward to a library systems record that is based on the FRBR concepts we need to invent a record structure that supports that experimentation. As a preliminary step to creating that record structure, we can sketch out the basic functions and relationships that such a record will need to have. Whether or not this turns out to be the library world’s next record structure, having a logical map should help us perform the necessary gedanken experiments to determine if this multilevel, multi-functional record structure fulfills the needs of the library catalog of the new millennium.

A record structure
Today we have a two-level record using MARC Bibliographic and MARC Holdings. If we take FRBR at its face value we can assume that we will have a four-level record – work, expression, manifestation, item – and that any individual document has aspects at each of those levels. In fact, we may have more or fewer levels, but the exact number of levels is not important (see Figure 1).

Figure 1 FRBR tree structure

The creators of the FRBR concept state that they were inspired by relational database design (IFLA Study Group on the Functional Requirements for Bibliographic Records, 1998, p. 9). Viewing the FRBR design as object-oriented, however, allows us to make use of the concept of ‘‘inheritance’’ in which qualities and data elements of higher levels are inherited by the linked levels below them.



In this way, every expression inherits the elements associated with the work, and manifestations inherit those of both the expression and the work (see Figure 2).

Figure 2 Metadata inheritance

The object-oriented design also helps solve one of the conceptual difficulties of FRBR: that the upper levels may be insufficient or incomplete on their own. With object-orientation we can consider upper levels such as work and expression to be abstract in nature, and therefore not standing alone without at least a manifestation entry. At the same time, the abstract levels can carry data elements absolutely essential to the meaning of the manifestation or item levels, and can sometimes fulfill functions on their own, such as some user-oriented displays.
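As a rough illustration of this inheritance idea, the following sketch models the four levels as classes whose element lookups fall back to the level above. It is a minimal thought experiment, not a proposed format: the class and element names (title, creator, carrier, barcode) and the sample values are invented for this example.

class Level:
    def __init__(self, parent=None, **elements):
        self.parent = parent          # the next level up (item -> manifestation -> ...)
        self.elements = elements      # data elements carried at this level

    def element(self, name):
        """Look up an element locally, then up the chain of higher levels."""
        if name in self.elements:
            return self.elements[name]        # a local value may also override an inherited one
        if self.parent is not None:
            return self.parent.element(name)  # inherit from the expression/work above
        return None

class Work(Level):
    pass

class Expression(Level):
    pass

class Manifestation(Level):
    pass

class Item(Level):
    pass

work = Work(title="Faster", creator="Gleick, James")
expression = Expression(parent=work, language="eng", form="spoken word")
manifestation = Manifestation(parent=expression, carrier="audio cassette")
item = Item(parent=manifestation, barcode="31822001234567")

print(item.element("title"))    # "Faster" -- inherited from the work level
print(item.element("carrier"))  # "audio cassette" -- carried at the manifestation level

A lower level that supplies its own value for an inherited element simply shadows it, which is the kind of redefinition discussed further below.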

The importance of identifiers
Identifiers are important for linking records or parts of records that actually relate to the same item, or for identifying those parts from either inside or outside of the basic bibliographic record.

Although information for a single item can be transmitted in a single record, the flexibility of a multi-level record means that, within systems, records may not be unitary, as the MARC21 bibliographic record is, but could consist of parts that may be accessed separately during different system operations. The exact nature of these operations is not predictable, and we should not attempt to constrain them through a record structure. Instead, any part of the record with its own coherent structure needs to identify itself in a way that maintains its relationship to the whole set of data elements related to a bibliographic item. This means that the record structure will rely heavily on identifiers and that standard identifier schemes for the levels will need to be defined. Assuming a record that uses the FRBR levels as its primary structure, we have a similar structure for identifiers of the various levels (see Figure 3). The creation of identifier schemes is one of the more difficult aspects of metadata development. For the sake of the readability of the examples below I have used some identifiers that we are already familiar with, but this is not meant to suggest that those would necessarily be the appropriate identifiers for our future record.
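A multi-level identifier might be sketched as follows. The ‘‘wkid’’ namespace is made up, and the ASIN and local numbers simply echo the familiar identifiers used in the examples later in this article; none of this is a proposed scheme.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RecordKey:
    work_id: str                      # e.g. "wkid:123"
    exp_id: Optional[str] = None      # e.g. "2"
    manif_id: Optional[str] = None    # e.g. "asin:037540886X"
    item_id: Optional[str] = None     # e.g. "2"

# A functional record attaches itself at whatever level is appropriate:
preservation_key = RecordKey("wkid:123", "2", "asin:037540886X", "2")   # item level
promotion_key = RecordKey("wkid:123")                                   # work level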

Figure 3 Importance of identifiers

Systems and functionality
Once we have a bibliographic record that is flexible and extensible, we can begin to look at the functions that we want the record to perform. The ‘‘F’’ in FRBR is ‘‘functional’’, and the FRBR movement, if it can be called such, can open up a discussion not only of the functional roles of traditional cataloging elements but also of overall library systems functionality and how that functionality can integrate with the core record for bibliographic description.


In the FRBR analysis, the bibliographic record contains data elements for the complete range of functions that the catalog record performs today, including subject analysis, inventory, circulation and preservation. I will suggest that we can take ‘‘functional’’ even further and can break out some, if not all, of the uses of the library system record that are anchored by a core bibliographic description.

Library systems functions
Below are some examples of key library systems functions. This is not a complete list; in fact, one of the points I wish to make is that the design of library system records needs to be so flexible that no definitive list of functions is appropriate. Instead, we need to be open to the possibility that our record will be of a plastic nature, and yet we can still expect to have standards that we can count on to make our systems accurate and strong. These library systems functions are not the same as the functions that are defined in the FRBR document, although there is some overlap. The FRBR analysis limits itself to the traditional functions of a library catalog (not a library system): find, identify and select records based on bibliographic characteristics. In our re-thinking of the bibliographic record, we should think broadly about a wide range of functions that have been supported by the bibliographic record and those functions that we support outside of that record in our current library systems, and see if we cannot find a more holistic approach to the system functions and the record structures that support them.

Description
Let us begin our list of functions with the one with which we are the most familiar: description. Description will be the central function of any library systems record because it is the focus of the system itself; it defines the information item. Description is the purview of the cataloging community, and the future of bibliographic description may be informed by the principles expressed in FRBR. Description records the title of the item, the author or authors, the publisher or producer, the date of issue or publication, place of publication, and the extent of the item. It may also place the item in relation to other bibliographic entities such as a series. And it may list parts, such as chapters, in some publications.

Description, however, has never existed as a fully separate function from discovery, and in the traditional catalog record the same fields perform both of these functions, as well as other functions like display. Although it may not be possible to draw a firm line between description and discovery, a functional approach to library systems records gives us an opportunity to rethink some discovery-related fields such as subject headings and variant titles that are intended primarily for discovery purposes.

Discovery
Library catalogs exist not simply as inventories of library holdings but as a means to discover what the library has. Although discovery is the key function served by library catalogs today, many data elements used for discovery also serve the description function. The data elements serving this dual function are based on the card catalog, where the primary means of discovery was based on the ordering of cards, not database searching. We still create a record replete with inverted entries (Doe, Jane; Mary, Blessed Virgin) intended to be filed sequentially. Very little of the record is designed to facilitate the keyword indexing that is the mainstay of discovery in the twenty-first century. The ability to carry additional keywords, based perhaps on indices or even the full text of the item, is essential to modern discovery methods. None of this means that catalogers have to create separate data elements to describe and discover the author of a work. Making multiple uses of a single incoming field is something that computer systems do quite well. Using the appropriate data structures, programs can derive a variety of displays and discovery elements from a single field.

Location
One of the primary functions of the library catalog entry has been to give users the shelf location of an item. Location is still an expected function of the library record, but location now has an expanded function because many materials today are not sitting on a library shelf. The locate function is increasingly dynamic, linking to an OpenURL service or a handle system that will resolve to the location of the item. In all likelihood, the location function will provide more than one route that a user or a system can take to arrive at the actual item. In particular, there is the need to express alternate locations and to link location with user authentication and access permissions. This also implies that the location function is aware of the availability of different physical formats, different terms of use, and other aspects that might be part of the end user's preferences for item selection.


Purchase
The acquisition function of libraries is one that is not well served by the records that we create today. This is partly because acquisition has generally taken place before the record is created. However, today's purchase function can include transmittal of the descriptive record, as publishers and library book-jobbers include machine-readable records as part of the purchase. The publishing industry has developed a machine-readable record, Online Information Exchange (ONIX)[1], that carries descriptive, purchase, and promotional information about print and electronic books. Today's library record cannot accommodate many of the fields available through these sources. This becomes ever more important as we develop new ways to disseminate electronic materials through libraries. No longer a simple purchase-to-own, electronic materials may have machine-readable contracts that govern the number of simultaneous copies, terms for per-copy payment, special offers for review and browsing, etc. These terms must be able to interact with the library system's acquisitions function, the circulation function, and even with display to users.

Preservation
Not all libraries have a need for detailed preservation information, but those that do perform an archival function know that there is no place in our current library systems record for the depth of information that is needed to perform this function well. Preservation information in the past was applied solely to older or archival materials of a certain age, often those that had already undergone some deterioration. With digital materials, preservation information must be applied at the time of acquisition, and all storage of digital materials must be seen as having a preservation function[2]. Preservation is an important piece of information for electronic materials that are shared widely over networks. This information is key to understanding the reliability and durability of an electronic information resource. It allows for the cooperative sharing of the burden of digitizing in the case of materials that were not born digital, and for the sharing of the cost of diligent preservation for materials in digital form.

Promotion
A patron entering a library may be greeted with a colorful display promoting reading materials from the library's collection. Our ‘‘virtual’’ patrons, those who enter the library remotely through the online catalog, are often greeted with nothing more than plain text representing the items we hold. The plain text library catalog with its colorless entries has been shown up by the highly promotional online bookstore sites like Amazon and Barnes and Noble. Since part of the library mission is to encourage enthusiasm for reading and learning, promotion needs to be a function of our virtual public entryway, the library catalog. Some library systems have included the ability to display cover art with the descriptive record, but there are other ways that we can promote reading materials if we have a record structure that allows it. We could provide not only cover art but jacket blurbs and sample readings from the work. Other materials such as reviews or synopses can inspire readers. We need to have the option to store some of these promotional items locally and access others remotely, depending on their interaction with the library catalog.

Structuring the record
Although there are many ways that one could create a machine-readable record that marries FRBR levels with library systems functions, one conceptual approach is to take each of the functional areas above (and any others we design in the future) and map these to the FRBR levels (see Figure 4).

Figure 4 Functional structure of FRBR

The simple diagram shown in Figure 4 does not represent an actual data design. In particular, it does not determine whether the record will be organized primarily along the bibliographic levels:




Work
  Work: Description
  Work: Discovery
  Work: Promotion
Expression
  Expression: Description
Manifestation
  Manifestation: Description
  Manifestation: Discovery
Item
  Item: Location

or the functional levels:

Description
  Description: Work
  Description: Expression
  Description: Manifestation
Discovery
  Discovery: Expression
  Discovery: Manifestation
Promotion
  Promotion: Expression
  Promotion: Manifestation
Location
  Location: Manifestation
  Location: Item

Although each function has four theoretical levels, clearly few functions will exist at all four of them. The location function will be fulfilled at the manifestation level (primarily for Web services for electronic materials) and the item level (for hard copy materials). The promotion function may have some expression and some manifestation entries. Description will be at the work, expression, and manifestation levels. This design allows information to be stored at the appropriate level (see Figure 5).

Figure 5 Functions and levels

Because many functions will not have data at all levels, the identifiers allow the creation of functional records at any appropriate level as long as the rules of inheritance are obeyed, such that any lower level always inherits data elements from the levels above it within its functional group. Note that between functions there is linking based on the identifiers, but no inheritance takes place across functions. As we work with this model at greater levels of detail we can explore further the issues and capabilities of the object-oriented model. For example, there may be situations where a lower-level element needs to redefine an element inherited from a higher-level object. This capability may help us overcome some of the areas where a data element appears to be appropriate to more than one level.

Example: faster/fstr
The following example is decidedly schematic and incomplete, but it will serve to show how individual views can make use of the FRBR levels and functional data structures to create coherent but different views of the same bibliographic item. The structure and linking are implemented through levels of identifiers that follow the levels of the basic data model (see Figure 6).

Figure 6 Description function and discovery function

The greatest advantage of this record structure for library systems comes when we need to make use of data that serves a function beyond the traditional catalog record. For example, if you want to add preservation data for the audio tape, it is a simple matter of adding the preservation data at the item level. The new data does not need to use the same data structure that is used for the descriptive function; it only needs to link to the work/expression/manifestation/item using the appropriate identifiers:


Item: Preservation
  work_id = wkid:123
  exp_id = 2
  manif_id = asin:037540886X
  item_id = 2
  [This can contain any set of elements from an identified preservation scheme, such as METS]

In another example, we can carry promotional elements at various levels:

Work: Promotion
  work_id = wkid:123
  cover_art = faster.gif
  review = This latest work by James Gleick takes on our . . .

Expression: Promotion
  work_id = wkid:123
  exp_id = 2


  review = The audio tape of James Gleick's ‘‘Faster’’ is a great way to read this book while on the go . . .

The key point is that required information can be added to a library system and unambiguously associated with a bibliographic record without having to follow the structure and coding of that bibliographic record beyond identification of the appropriate level.

This means that library systems can make use of variable data structures in their records, and that the use of these data formats will not compromise the integrity of the basic description that is so vital to the full range of library functions. Each of these functions can share the descriptive record, or any other functional record, as needed by linking to the needed records using a standard identifier system.
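A sketch of that linking behavior, under the assumptions of the example above: functional packages in any internal format are filed and retrieved by the level identifiers alone. The in-memory store and the field names are hypothetical, not part of any existing system.

functional_store = {}   # (function, work_id, exp_id, manif_id, item_id) -> payload

def attach(function, payload, work_id, exp_id=None, manif_id=None, item_id=None):
    """File a functional record at whatever level its identifiers specify."""
    functional_store[(function, work_id, exp_id, manif_id, item_id)] = payload

def records_for(work_id):
    """Gather every functional package linked to a work, at any level."""
    return {key: value for key, value in functional_store.items() if key[1] == work_id}

# A preservation package (an opaque fragment from an identified scheme) at the item level:
attach("preservation",
       "<preservation-elements-from-an-identified-scheme/>",   # internal structure is opaque to the catalog
       work_id="wkid:123", exp_id="2", manif_id="asin:037540886X", item_id="2")

# A promotion package attached at the work level:
attach("promotion",
       {"cover_art": "faster.gif", "review": "This latest work by James Gleick takes on our . . ."},
       work_id="wkid:123")

print(records_for("wkid:123"))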


Conclusion
After 40 years of evolution, the time has come when we can no longer achieve our goals by incremental tweaking of the MARC bibliographic record. Libraries are passing up tremendous opportunities to serve their users by hanging on to a record structure that was brilliant in the mid-1960s but is limiting in the early 2000s. Although library systems need the flexibility to make use of data from a variety of sources and to interact with other systems, including non-library systems, there is also a need for a highly structured, standardized core of bibliographic description. This can be achieved by providing separate record structures for that core and for the myriad other functions that library systems may need to perform. The unitary MARC record attempts to fulfill both the core bibliographic function and a smattering of system functions, but this design threatens the integrity of the bibliographic core while it does a poor job of aiding systems design.

Although the particular design elements presented in this paper are far from fully developed, the principle of a multi-functional design and its advantages for library systems should be obvious. With such a design we can have a core bibliographic record that follows the strict rules of library cataloging, and at the same time we gain a great deal of flexibility for our library systems development.

Notes
1. Available at: www.editeur.org/onix.html
2. Available at: www.ifla.org/VII/s19/usefulinks.htm (some useful links on digital preservation).

Reference
IFLA Study Group on the Functional Requirements for Bibliographic Records (1998), Functional Requirements for Bibliographic Records, K.G. Saur, München, available at: www.ifla.org/VII/s13/frbr/frbr.pdf


A bibliographic metadata infrastructure for the twenty-first century

Roy Tennant

The author
Roy Tennant is the eScholarship Web and Services Manager, California Digital Library, Oakland, California, USA.

Keywords
Online cataloguing, Archives, Bibliographic systems

Abstract
The current library bibliographic infrastructure was constructed in the early days of computers – before the Web, XML, and a variety of other technological advances that now offer new opportunities. General requirements of a modern metadata infrastructure for libraries are identified, including such qualities as versatility, extensibility, granularity, and openness. A new kind of metadata infrastructure is then proposed that exhibits at least some of those qualities. Some key challenges that must be overcome to implement a change of this magnitude are identified.


Without question, the development of the Machine Readable Cataloging (MARC) standard (www.loc.gov/marc/) in the 1960s was a revolutionary advancement in modern librarianship. It formed the foundation for moving libraries into the computer age by providing a common syntax for recording and transferring bibliographic data between computers. In association with the Anglo-American Cataloging Rules (AACR), MARC allowed libraries to share cataloging on a massive scale, and thus greatly increase the efficiency of the cataloging task as well as set the stage for the creation of centralized library databases such as those managed by OCLC and RLG, which are now major worldwide resources.

But that was then. This is now. The technical environment has completely changed from the first days of MARC. When MARC was created, computer storage was very expensive – so expensive that every character was treasured. Very few people had access to a computer – not at work, and most certainly not at home. The Internet was no more than an idea. XML was decades away from being an idea.

In addition, we are no longer dealing only with library catalog systems. Bibliographic records are being used in a variety of computer systems within libraries; for example, interlibrary loan systems, working paper repositories, and directories of online resources such as e-journals and databases. In many cases, MARC is not a good fit for such systems, and the lack of a rich metadata infrastructure finds libraries making up solutions that may prevent them from building an integrated metadata management system.

Our cataloging practices have also been focused completely on the physical item, rather than the intellectual one. This has led to the creation of, in some cases, dozens of records for items with identical content, thereby sowing confusion and frustration among the users of our systems. Only through the application of the principles laid out in the Functional Requirements for Bibliographic Records (FRBR) (International Federation of Library Associations, 1998) do we have some hope of knitting this mess back together on behalf of our clientele.

But clearly we can – and must – do better. We now have the opportunity to recreate our foundational bibliographic standards to take advantage of a new array of opportunities, as well as to fix problems with our current set of standards. It will not be sufficient to tweak our existing standards, since we have been using that method and it is unlikely to provide the scope and scale of change proposed here.


We require computer systems, policies, and procedures that allow libraries to create bibliographic metadata, ingest bibliographic metadata from others, make enhancements to it, output it in both complex and simple forms, and do all of this and more with facility and effectiveness. We require a bibliographic metadata infrastructure that likes any metadata it sees, and can easily output simple records when needed, or complex records when called upon to do so.

What I am suggesting is different in scope and structure from what is implied by my ‘‘MARC must die’’ column in Library Journal, although I alluded to it in the follow-up ‘‘MARC exit strategies’’ column. What must die is not MARC and AACR2 specifically, despite their clear problems, but our exclusive reliance upon those components as the only requirements for library metadata. If for no other reason than easy migration, we must create an infrastructure that can deal with MARC (although the MARC elements may be encoded in XML rather than MARC codes) as easily as it deals with many other metadata standards. We must, in other words, assimilate MARC into a broader, richer, more diverse set of tools, standards, and protocols. The purpose of this article is to advance the discussion of such a possibility.

Infrastructure requirements

The qualities of the bibliographic metadata infrastructure we require are many, varied, and in some cases may be in opposition to each other (e.g. simplicity and versatility). Our challenge is therefore not only to build a sophisticated set of standards, protocols, and tools, but also to do it in such a way that it balances competing priorities. When faced with competing priorities, the needs of our users and our ability to serve those needs should weigh heavier in the balance than our needs for ease of implementation or maintenance.

Versatility
A modern metadata infrastructure should be capable of ingesting, merging, indexing, enhancing, and presenting to the user metadata from a variety of sources describing a variety of objects. A simple example would be accepting an Online Information Exchange (ONIX) record (www.editeur.org/onix.html) for a book in press, then enhancing that record with information from an OCLC record when it becomes available. Formats as simple as unqualified Dublin Core (http://dublincore.org) must be accommodated, as should records in more complex, granular, and qualified formats. We require an infrastructure that can take in any arbitrary set of metadata and be able to do something useful with it.
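As a sketch of that ‘‘ingest, then enhance’’ idea, a brief publisher-supplied record might later be enriched from a fuller cataloging record while remembering where each value came from. The field names and sample values below are invented for illustration and do not follow ONIX or MARC tagging.

onix_like = {
    "isbn": "037540886X",
    "title": "Faster",
    "contributor": "James Gleick",
    "status": "forthcoming",
}

oclc_like = {
    "isbn": "037540886X",
    "title": "Faster: the acceleration of just about everything",
    "creator": "Gleick, James",
    "subjects": ["Time pressure", "Technology -- Social aspects"],
}

def enhance(base, fuller):
    """Overlay a fuller record on a base record, keeping per-field provenance."""
    merged = {}
    for field in set(base) | set(fuller):
        if field in fuller:
            merged[field] = {"value": fuller[field], "source": "catalog record"}
        else:
            merged[field] = {"value": base[field], "source": "publisher record"}
    return merged

record = enhance(onix_like, oclc_like)
print(record["title"])    # value taken from the fuller catalog record
print(record["status"])   # retained from the original publisher record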

Extensibility
Our needs today will not be our needs tomorrow; therefore, we need an infrastructure that will allow for extensions to be developed and applied without breaking the whole. There must be room at the edges for experimentation, since it is often through such experimentation that the way forward is demonstrated. Extensibility can also be a problem, however, when it allows for differentiation beyond what can be accommodated by those relying on the infrastructure. Extensibility, therefore, should be crafted to allow metadata consumers to ignore extensions should they wish, without rendering the base metadata unusable. For example, with a metadata record format that allows for multiple, discrete ‘‘packages’’ of metadata within it (e.g. the Metadata Encoding and Transmission Standard or METS, see below), a consumer of such a record who wishes to ignore one or more of those packages in favor of others can easily do so. A specific example would be a record that has both an ONIX package and a MARC or MARC-like package (e.g. Metadata Object Description Schema (MODS), www.loc.gov/standards/mods/). A library may choose to ignore the ONIX package, while a publisher may choose to do the opposite, and a third party might use both.

Openness and transparency
To facilitate implementation and extensibility, standards, protocols, and software should be as open and transparent as possible. Efficiencies of sharing solutions and code can be realized if solutions are offered to others as open source without restrictions that prevent their useful implementation. Transparency is important for potential implementers to see how systems work (e.g. sharing of source code, human-readable metadata formats, etc.).


Low threshold, high ceiling
We need a metadata infrastructure that will allow as many people and organizations to participate as possible, which means a system that can accommodate simple uses. But that same infrastructure should also support the more complex requirements of those needing a more full-featured system. The challenge will be to architect a system that can accommodate such diversity without needlessly complicating things for low-threshold users or preventing more complex activities for those requiring a high ceiling.


Cooperative management
No single organization should own the essential pieces of a new bibliographic infrastructure. In particular, the creation and ongoing management of new metadata standards should occur in as cooperative and inclusive a process as is practicable. The METS draft metadata standard is a useful example of such cooperative standards development, in which a number of research libraries are participating through the Digital Library Federation in a process managed by the Library of Congress.

Modularity
The systems we use to create or ingest metadata, and to merge, index, and serve up or export that metadata, should be modular in nature. That is, with a modular system it is possible to replace a component that performs a specific function with a different component, without breaking the whole. For example, a metadata infrastructure that uses XML should be constructed in such a way that whichever XML parser is being used can be swapped out for a different one when needed, without adversely affecting other parts of the infrastructure.

Hierarchy
A modern bibliographic metadata infrastructure must be capable of handling hierarchical information. For example, the table of contents of a book is inherently hierarchical, and there is no good place to put this data in the MARC record. Given an appropriate metadata infrastructure (see below), hierarchy could be handled very easily.

Granularity
Granularity is a key quality of metadata. If a personal name is encoded as:

Gabriela García Márquez

rather than something like:

García Márquez
Gabriela

it will be difficult for software to process names consistently and correctly. Therefore, metadata must be of a sufficient granularity to support all intended uses. Metadata can easily be insufficiently granular, while it would be the rare case where metadata would be too granular to support a given purpose (for more discussion of granularity, see Library Journal, 2002).
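A rough sketch of why that matters for software (the field names are invented, not drawn from any metadata standard): with granular fields, filing under the family name is trivial; with only an undivided string, a program has to guess where the surname begins and will often guess wrongly.

# Hypothetical records: one with an undivided name string, one granular.
flat = {"name": "Gabriela García Márquez"}
granular = {"family": "García Márquez", "given": "Gabriela"}

def sort_key_granular(record):
    # Trivial with granular metadata: file under the family name.
    return (record["family"], record["given"])

def sort_key_flat(record):
    # With only a flat string, software must guess where the surname starts;
    # splitting on the last space files this author under "Márquez", which is wrong.
    given_guess, family_guess = record["name"].rsplit(" ", 1)
    return (family_guess, given_guess)

print(sort_key_granular(granular))   # ('García Márquez', 'Gabriela')
print(sort_key_flat(flat))           # ('Márquez', 'Gabriela García') -- an incorrect heading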

Graceful in failure
After experiencing the rather forgiving searching offered by Internet search systems such as Google, many of our users are likely dismayed to learn how easy it is to fail when searching our library catalogs. Many of our systems will return zero hits rather than do the best that can be done with what is entered. Modern search systems are capable of offering alternate spellings, returning hits ranked by the number of entered terms that are found in the records, or even performing the search using a different index after failing in the selected index. But such features are still rare in most of the bibliographic metadata search systems we offer our users.

A proposal
We do not need a bibliographic record format. We need a bibliographic metadata infrastructure that has a number of components, each of which may have multiple variations. Our systems must be able to accommodate a great diversity of record formats to provide us with the flexibility and power that only such diversity can provide. Therefore, although I touch on specific metadata formats that are in use today, or that promise to be useful in the future, this is not meant to be an inclusive and exclusive list. Rather, this proposal is aimed at creating an environment that is welcoming to – and effective for – metadata formats yet to be created. Should we do our work well, choosing to use a new metadata format will not require us to make substantial changes to our underlying infrastructure. A robust metadata infrastructure should be able to accommodate new metadata formats by creating or applying tools specific to that format, explained in greater detail below.

Transfer schema
The transfer schema (for which clearly XML is the most reasonable solution) must be able to accept any arbitrary package of metadata. We need a method to pass records that may have metadata containers using ONIX, the Metadata Object Description Schema (MODS), Dublin Core, or virtually any other format. A draft standard that does just this is the Metadata Encoding and Transmission Standard (METS, www.loc.gov/standards/mets/; see also related articles in this issue). Figure 1 illustrates a METS record with all major segments of the record collapsed. Note how one container holds a MODS record, consisting of a translated MARC record from the UC union catalog, while another holds a record called ‘‘ucpress’’, consisting of bibliographic metadata from an in-house database at the University of California Press.


Figure 1 A collapsed view of a METS record
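In place of the figure, the following sketch builds a stripped-down record of roughly that shape: a wrapper holding one MODS-like container and one container of local (‘‘ucpress’’-style) metadata. The element names approximate the METS and MODS schemas, but this is an abbreviated illustration rather than a validated METS instance, and the data values are invented.

import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
MODS = "{http://www.loc.gov/mods/v3}"

mets = ET.Element(METS + "mets")

# Container 1: a MODS-like record, e.g. translated from a MARC catalog record.
dmd_mods = ET.SubElement(mets, METS + "dmdSec", {"ID": "DMD-MODS"})
wrap_mods = ET.SubElement(dmd_mods, METS + "mdWrap", {"MDTYPE": "MODS"})
mods = ET.SubElement(ET.SubElement(wrap_mods, METS + "xmlData"), MODS + "mods")
title_info = ET.SubElement(mods, MODS + "titleInfo")
ET.SubElement(title_info, MODS + "title").text = "Faster"

# Container 2: locally defined metadata carried as-is, without conversion.
dmd_local = ET.SubElement(mets, METS + "dmdSec", {"ID": "DMD-LOCAL"})
wrap_local = ET.SubElement(dmd_local, METS + "mdWrap",
                           {"MDTYPE": "OTHER", "OTHERMDTYPE": "ucpress"})
local = ET.SubElement(wrap_local, METS + "xmlData")
ET.SubElement(local, "jacketCopy").text = "Publisher-supplied promotional text . . ."

# Both containers describe the same object; an indexing routine can draw
# fields from either or both of them for searching and display.
print(ET.tostring(mets, encoding="unicode"))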

This example illustrates how a transfer syntax like METS can carry containers of metadata adhering to different standards, or indeed no standard at all, and be associated with the same object. In this particular case, fields are indexed from both records for user searching and display.

Bibliographic schemata
As mentioned above, we need the ability to ingest, manipulate, and output metadata in a variety of formats. Some of these formats will initially include MARC, MODS, Dublin Core, and ONIX. There are many others, and still more that have yet to be developed, all of which may eventually need to be accommodated in some way. These various bibliographic schemata must be welcomed within our bibliographic metadata infrastructure, and be made searchable, displayable, and exportable.

Application rules
Schemata alone will be insufficient – we will also require rules and guidelines on their application and use. We will likely need general rules, as well as schema-specific rules, similar to the way that MARC has been the encoding and transfer syntax of the cataloging rules expressed in AACR2.

Best practices
Beyond specific rules that must be followed for compliance, there exists a grey area where implementations may vary.

This is both a good and a bad thing. The good aspects have to do with the ability to experiment, to make adjustments for local needs, etc. Where this becomes ‘‘bad’’ is when local variances harm interoperability. Therefore, it will be helpful to build a set of ‘‘best practices’’, beyond the scope of application rules, that illustrate the best ways to implement a given infrastructure component.

Crosswalks
I have recently said that librarians must be able to say ‘‘I've never metadata I didn't like’’ – or that we can walk, talk, eat, and drink metadata of all varieties. To be proficient at this will require crosswalks, or algorithms for translating metadata from one encoding scheme to another in an effective and accurate manner. A number of crosswalks already exist for formats such as MARC, MODS, and Dublin Core. Besides using crosswalks to move metadata from one format to another, they can also be used to merge two or more different metadata formats into a third, or into a set of searchable indexes.
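In its simplest form a crosswalk is little more than a field-by-field mapping plus rules for what to do with fields that do not line up. The sketch below maps a few unqualified Dublin Core elements onto MODS-like element names; the mapping and the sample record are deliberately tiny and illustrative, far coarser than the published MARC/MODS/Dublin Core crosswalks.

# A deliberately tiny Dublin Core -> MODS-like mapping, for illustration only.
DC_TO_MODS = {
    "title": "titleInfo/title",
    "creator": "name/namePart",
    "date": "originInfo/dateIssued",
    "subject": "subject/topic",
}

def crosswalk_dc(record):
    """Translate a dict of DC elements; unmapped fields are kept, not discarded."""
    mapped, leftovers = {}, {}
    for element, value in record.items():
        target = DC_TO_MODS.get(element)
        if target:
            mapped.setdefault(target, []).append(value)
        else:
            leftovers[element] = value   # real crosswalks need a policy for these
    return mapped, leftovers

dc_record = {"title": "Faster", "creator": "Gleick, James",
             "date": "1999", "format": "audio cassette"}
print(crosswalk_dc(dc_record))
# ({'titleInfo/title': ['Faster'], 'name/namePart': ['Gleick, James'],
#   'originInfo/dateIssued': ['1999']}, {'format': 'audio cassette'})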


Indexing and display
A heterogeneous metadata infrastructure presents particular challenges to effective indexing and display. When can a field in one metadata format be treated the same as a field in another? How can we logically deal with significant variances in the metadata we wish to search and display as a unified whole? How do we rectify differences in metadata quality, encoding practices, and granularity? Likely we will need to use a variety of strategies depending on the situation. Crosswalking may be sufficient in some cases, while on the other extreme we may find that only human intervention will fix some problems.

Enrichment
A robust metadata infrastructure will offer opportunities for metadata enrichment – both human and machine-based. For example, book records could be enriched with such things as book reviews, cover art, and the table of contents. These items are already making it into some library systems, but with a robust infrastructure they could also be augmented by such things as robot-collected metadata – wherein software queries other systems and collects relevant metadata to add to the record, in a special encoding for what may be only partially trusted information.

Tool sets
As we begin to build and use this new metadata infrastructure (as is already happening at OCLC, RLG, and large research libraries), we will begin to accrete tools that can be used to create and manage our metadata systems – for example, XSLT stylesheets for parsing records from one format to another, from XML to an HTML screen display, etc. These tools can be made available to others, and thus enable other libraries to implement this new infrastructure with greater facility and ease. We are already seeing this happen with the Library of Congress making available tools for translating MARC records into MODS, OCLC making available its FRBR algorithm, and METS implementers offering tools for METS record creation and translation.

Relationships with other standards and protocols
Given an appropriate container/transfer format, virtually any bibliographic metadata format could be accommodated by a well-architected metadata infrastructure. Therefore, existing standards such as MARC (as expressed in XML) and Dublin Core, as well as emerging standards such as MODS, can all be used as carriers of bibliographic metadata. This will enable us to absorb our legacy systems while also offering new opportunities hitherto impossible. Interoperability and access standards such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) (www.openarchives.org/OAI/openarchivesprotocol.html) and the Simple Object Access Protocol (SOAP) (www.w3.org/TR/SOAP) are likely candidates for support in a full-featured metadata infrastructure. These protocols offer a low-overhead way to make bibliographic metadata available to others, for services such as federated searching.
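For example, a harvesting service might pull unqualified Dublin Core records from an OAI-PMH repository with nothing more than an HTTP request. The sketch below uses a placeholder base URL; verb and metadataPrefix are standard OAI-PMH request parameters.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.edu/oai"   # hypothetical OAI-PMH endpoint
DC = "{http://purl.org/dc/elements/1.1/}"

params = urllib.parse.urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
with urllib.request.urlopen(BASE_URL + "?" + params) as response:
    tree = ET.parse(response)

# Print the title of each harvested record (namespaced Dublin Core elements).
for title in tree.iter(DC + "title"):
    print(title.text)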

Implementation issues
Large professional organizations such as OCLC, RLG, and ARL, the Library of Congress, large research libraries, and imaginative and committed individuals must lead the way. Luckily, they mostly already are. One of the prime examples of leadership in this area is the development of METS. Springing from a real need to have a metadata container capable of ingesting and preserving the richness of a variety of metadata standards, as well as the structure of a complex digital object or set of objects, the METS development effort holds great promise for the kind of metadata infrastructure I envision here. The leadership in developing this standard comes from the sources named above, which is no surprise. Those kinds of organizations are both the best suited for such activities (having generally more resources to apply) and the most in need of such cutting-edge solutions for digital library problems.

Challenges
Moving from a bibliographic infrastructure that is relatively homogenous (MARC21 and AACR2) into a diverse universe of metadata managed and controlled by a variety of library and non-library groups will clearly have its challenges. This short list of challenges is unlikely to be complete, but it may serve as the beginning of an honest assessment about what we must address to achieve the desired state as outlined in this article.

Adapting to a diversity of record formats
In moving into the brave new world I describe here, we will be leaving the familiar shores of MARC and venturing out into an ocean where we must be able to deal with just about anything that comes our way. For example, if we want to provide searching of working papers to our clientele, we will need to be proficient with the OAI Protocol for Metadata Harvesting and the Dublin Core metadata standard. If we wish to make tables of contents, book covers, book reviews, and other types of information available for the items we own, we will find a need for new metadata standards that will more easily and effectively accommodate such features (yes, many libraries and vendors are making MARC stand on its head to do these things now, but if they are based on MARC, they are stop-gap solutions that do not provide a strong foundation for the future).


OCLC has already begun laying the foundation for a diversity of bibliographic record formats and types by rebuilding WorldCat from the bottom up. ‘‘Extended WorldCat’’, as it is called by OCLC staff, stores records using an internal XWC (for Extended WorldCat) XML-encoded format in an Oracle 9i database. Although presently only taking in MARC21 and Dublin Core records, this infrastructure can potentially include records of a variety of types. The goal is to be able to accept virtually any bibliographic record, provide searching and display of the record, and output it in its original format when called upon to do so. This effort appears to be one of the first major projects to create something similar to the bibliographic infrastructure described here and will likely provide some early lessons on what works and what does not.

Crosswalking and merging
Taking records for the same object from different input streams and formats and making a merged record that retains the best of the granularity and qualification of the original records is clearly a challenge. But add to that the necessity of creating indexes, search result displays, etc., and the breadth and depth of the challenge begins to become clear. OCLC has done some interesting work in the area of crosswalking in their Metadata Switch project (www.oclc.org/research/projects/mswatch). The idea is to create a software service that can take a record in one format as input, and output that record in a different metadata format. This service would logically be offered via a Web Services interface, so that the entire interchange can happen using software only. Such a service would allow distributed systems to take advantage of a robust central infrastructure for record translation and crosswalking. Early findings in this project suggest that while some records can be crosswalked in a straightforward manner, others will require first mapping them to an ‘‘interoperable core’’ before the translation process can be completed (Godby et al., 2003). As is the case in many situations, to appear simple from the outside there must be sufficient internal complexity. OCLC's experience appears to indicate that we have not yet plumbed the full extent of the required internal complexity to create a simple service for metadata translation.

Accurate record merging is a challenge even with a relatively homogenous data stream (e.g. MARC and AACR2), but with heterogeneous record formats and rules for applying those formats, it is a challenge that may only be partially met for quite some time.

The International Standard Text Code (ISTC) may help (www.nlc-bnc.ca/iso/tc46sc9/istc.htm), as may perhaps the algorithms being developed in support of implementing the concepts of the FRBR. But widespread implementation will take time, and meanwhile we will need to do the best we can with what we have.

In addition, ‘‘merging’’ can have different meanings depending on the result desired. One type of merging takes two or more metadata records for an item and merges them into one record that is not intended to be displayed or exported as separate records again (i.e. ‘‘unification’’). Another type of merge would retain the information required to reconstruct the separate records again (i.e. ‘‘federation’’). Federation of records would be required if a system must be able to provide the original records from which the merged version was created (for example, if different contributing organizations needed to maintain their version of the record).

Indexing different record formats into a single index will require crosswalking different fields into the same virtual index for searching. Where record formats have fields not found in other formats, or metadata of a different granularity (e.g. no distinction between first and last personal names), there will be problems. The challenge of display can conceivably be met by the provision of different display profiles for different types of records, but doing this in a way that will not be confusing to the user will again be a challenge. It may be easier to create summary displays or brief records that appear relatively homogenous, but full record displays will likely exhibit more divergence.

System migration
To migrate from systems based on MARC/AACR2 to the infrastructure proposed here is clearly a significant undertaking. As anyone who has ever been involved with migrating from one integrated library system to another knows, even moving from one system based on MARC/AACR2 to another can be daunting. Within this context, the changes proposed here must clearly be fostered by cooperation at a national, and perhaps international, level and carefully staged. However, this proposal is about inclusion if it is about anything, and therefore our existing records can certainly be included, albeit in an envelope that can accommodate other record formats. But despite the very real challenges of a systemic and widespread migration to a new kind of metadata infrastructure, I believe that it is both necessary and achievable. We can no longer afford to have systems that are inadequate to meet both the challenges and opportunities that currently face libraries.


Staff retooling
One of the most significant barriers to the implementation of this proposal is ourselves. Most of us in the profession today have never known anything but MARC and AACR2 as an online metadata infrastructure. But now we must dramatically expand our understanding of what it means to have a modern bibliographic metadata infrastructure, which will clearly require sweeping professional learning and retooling. Such a vision may be daunting when viewed as a whole, but when attacked piecemeal over time, there is indeed hope for achieving it. There are already hopeful signs that librarians are rising to the challenge before them, whether by participating in metadata standards development activities such as the Dublin Core and METS efforts, or simply by learning more about metadata issues through reading and attending conference presentations.

The once and future infrastructure
With a robust bibliographic metadata infrastructure as a foundation, many things become possible that may have been more difficult or even impossible with the type of single-stream infrastructure we presently have. There is no doubt that engineering such an infrastructure will be a long and difficult task. The potential benefit to both libraries and library users, however, is likely to be both substantial and long-lasting – particularly if it is constructed with the essential qualities of extensibility and flexibility. Also, we are apparently already on the path to a better future, with important early work in process both within key organizations (e.g. OCLC) and among them (e.g. the cooperative METS effort). Likewise, individual librarians are learning how to use technologies like XML and XSLT that will form the foundation of their new bibliographic tool set.

These are hopeful signs that we are beginning to muster both the political will and technical skill to support the type of massive change proposed here. Having not been a part of the effort to create MARC those many decades ago, I cannot imagine what conditions fostered its birth. But in my ignorance I imagine that the opportunities created by computers inspired Henriette Avram and company to rise to the challenge of recreating our professional infrastructure in a revolutionary and farsighted way. We would do well to look to our past for the inspiration we need to create a future that our descendants will look back upon with similar amazement.

References
International Federation of Library Associations (1998), Functional Requirements for Bibliographic Records, K.G. Saur, München, available at: www.ifla.org/VII/s13/frbr/frbr.htm
Tennant, R. (2002a), ‘‘The importance of being granular’’, Library Journal, Vol. 127 No. 9, 15 May, pp. 32-4, available at: http://libraryjournal.reviewsnews.com/index.asp?layout=article&articleId=CA216337
Tennant, R. (2002b), ‘‘MARC must die’’, Library Journal, 15 October, pp. 26-7, available at: http://libraryjournal.reviewsnews.com/index.asp?layout=article&articleid=CA250046
Tennant, R. (2002c), ‘‘MARC exit strategies’’, Library Journal, 15 November, pp. 27-8, available at: http://libraryjournal.reviewsnews.com/index.asp?layout=article&articleid=CA256611

Further reading
Godby, C.J., Smith, D. and Childress, E. (2003), ‘‘Two paths to interoperable metadata’’, paper presented at the 2003 Dublin Core Conference, Seattle, WA, 28 September-2 October, available at: www.ischool.washington.edu/dc2003/
Library Journal (2002), ‘‘The importance of being granular’’, Library Journal, Vol. 127 No. 9, 15 May, pp. 32-4.


Other articles

A comparative review of common user interface products

Daniel G. Dorner and AnneMarie Curtis

The authors
Daniel G. Dorner is Director of Library and Information Management Programmes and AnneMarie Curtis is a Research Assistant, both at the School of Information Management, Victoria University of Wellington, Wellington, New Zealand.

Keywords
Common user interface, Library portals, Software evaluation

Abstract
A common user interface replaces the multiple interfaces found among individual electronic library resources, reducing the time and effort spent by the user in both searching and learning to use a range of databases. Although the primary function of a common user interface is to simplify the search process, such products can be holistic solutions designed to address requirements other than searching, such as user authentication and site branding. This review provides a detailed summary of software currently on the market. The products reviewed were EnCompass, MetaLib, Find-It-All OneSearch, ZPORTAL, CPORTAL, InfoTrac Total Access, MetaFind, MuseSearch, SiteSearch, Single Search, Chameleon Gateway, and WebFeat.


Library Hi Tech, Volume 22, Number 2, 2004, pp. 182-197. © Emerald Group Publishing Limited, ISSN 0737-8831, DOI 10.1108/07378830410543502. Received 23 June 2003; revised 11 August 2003; accepted 14 August 2003.

Introduction
Bradley (1995) describes the common user interface as the Holy Grail of the information industry. During the 1990s, as a number of databases were being made available to clients of academic and other research libraries, librarians were quick to recognise the difficulty that a variety of independent interfaces present to searchers, and the common user interface was identified as a potential solution. Common user interface software is designed for the hybrid library environment, where a library has both print-based and electronic document collections. In practice, hybrid libraries are likely to provide their clients with access to the Internet, electronic journals, electronic documents, databases of indexed or full-text journal articles, as well as one or more online public access catalogs (OPACs) and traditional print-based collections. In the past, library users were obliged to search each of these resources individually. Common user interface software provides a single search point for access to a hybrid library's diverse electronic collections and catalogues, reducing the time and effort spent by users in both searching and learning to use a range of databases. Although the primary function of a common user interface is to simplify the search process, library common user interface products can be holistic solutions designed to address other requirements, such as user authentication and site branding. Common user interface software is a subset of library portal software products, which aggregate multiple channels of information and communication through a single interface. Patron demand for remote access to library collections and services and the need for consortia to integrate their collections are increasingly important elements in the uptake of common user interface products by libraries. Curtis and Greene (2002) report that "the marketplace is filling rapidly as vendors, including most major ILS vendors, recognise how much it is worth to libraries to be able to provide a way to make searching their resources easier for their users". This review provides a detailed summary of the features of some of the products currently available from vendors. As part of this research, 79 common user interface and portal features were identified and classified into eight broader categories of like features:
(1) searching;
(2) user interaction;
(3) customisation;
(4) authentication;
(5) design;
(6) database communication protocols;
(7) after sale support; and
(8) software platforms supported.



The data for this research were obtained through a survey based on the evaluation criteria, which was sent to common user interface software vendors to collect information about their products. The results of the survey of software vendors include a detailed breakdown of each product's performance against each of the evaluation criteria. The vendors who participated in this survey were Endeavour, Ex Libris, Follett, Fretwell-Downing, Innovative Interfaces, MuseGlobal, OCLC, SIRSI, WebFeat, and VTLS. The results of this survey indicate that many of the criteria evaluated are established features which are present in the majority of products, with other features maturing rapidly. Common user interface capability, including broadcast searching, is readily available from a number of products, as are many complementary features. Survey responses were returned from December 2002 to April 2003 and collected information on current and next-release versions of each product. Overall, the standard of the common user interface software products reviewed was very high, with products supporting on average around 75 per cent of the evaluation criteria. MuseSearch, ENCompass, MetaLib, Single Search and WebFeat are the five highest scoring common user interface software products. Products with average or below average scores may have adequate functionality for many libraries, depending on the extent of their requirements. The presence of an open source product (SiteSearch) in this review should be an encouraging sign for libraries that have previously avoided common user interface software products on the grounds of cost.

Literature review
Over the last decade, librarians have continued both to laud the common user interface as an answer to the problems of searching the dozens of databases available in many libraries, and to lament the difficulties in implementing a search interface across a wide variety of database communications protocols. The literature falls into three main categories:
(1) articles discussing the potential of common user interface software, which tend to include "wish lists" of desired features;
(2) articles discussing database linking protocols and the technical details of developing common user interface software; and
(3) case studies describing the implementation of a common user interface in a particular library.

Wish-lists of desirable features
The "wish list" branch of the literature provides suggestions as to which features of library portal software, in addition to common user interface capability, are most desirable to librarians. The consensus about the role of a common user interface is that it should be able to broadcast a single search to a variety of databases in different locations and in different formats, to unify the results from these databases, and to present them in a useful order after de-duplicating them. Boss (2002) provides an exhaustive list of desirable features in a "sample request for proposals for a portal interface product". Pulkowski (2000), in his report on the development of the UniCats system, highlights the use of similar technologies for controlling data presentation, client authentication and cost control, and he predicts that this technology will become increasingly important in developing customised charging for database usage. Other features that library portal products can include are:
• personalised client accounts;
• customisable design;
• inter-library loan capability; and
• online reference (chat) services (Arant and Payne, 2001).

While the common user interface concept is most often praised in the literature, a recurring criticism is that searching might only work at the lowest common denominator (Bradley, 1995; Payette and Rieger, 1997; Wilson, 2001). Each individual database has its own unique fields, controlled vocabularies, commands and syntax. Elements that all databases share may be searchable through a common interface, but unique features of an individual database may be sacrificed in the process. This concern highlights the importance of database communication protocols and database linking to the development of common user interface software.

Database linking
The literature relating to database communication protocols and database linking is largely technical in nature, and describes practical efforts to develop techniques for searching across heterogeneous databases.


Van de Sompel and Hochstenbach (1999a) argue that: "The omnipresence of the World Wide Web has raised users' expectations in this regard. When using a library solution, the expectations of a net-traveller are inspired by his hyperlinked Web-experiences. To such a user, it is not comprehensible that secondary sources, catalogues and primary sources, that are logically related, are not functionally linked". The variety of formats is not only problematic for searchers, but also requires sophisticated mapping between different communications protocols on the part of the common user interface.

In the mid-1990s, people looked to the Z39.50 standard as a common framework for inter-database communication (Bradley, 1995; Friend, 1994; Payette and Rieger, 1997; Pope, 1998). Although the Z39.50 protocol provides a means of communication between database systems, databases that share the Z39.50 protocol may use the same field in different ways or might use different controlled vocabularies within a field. Bradley (1995) predicted that restricting database development along lines that support interoperability might lead to the stagnation of new developments in commercial databases. It seems the reverse has happened: commercial databases have been developed with useful and unique features at the expense of interoperability and standardisation, which contributes to the difficulty of creating a true broadcast search across heterogeneous databases. Producers of common user interface software have recognised the need to support a wide variety of information formats by complex mapping from each format to the common user interface. For a common user interface to be successful in today's environment, it must support a range of formats and protocols other than Z39.50. Formats that may be supported by a common user interface include OpenURL, HTTP, SQL, XML, MARC, CrossRef, DOI, EAD, Dublin Core and Telnet.

An aspect of common user interface technology that has also received attention is the development of technologies to provide persistent links between resources. Researchers at the University of Ghent Library developed a dynamic linking product called SFX in order to solve the problem of outdated links (Van de Sompel and Hochstenbach, 1999b). SFX has since been incorporated into MetaLib, an Ex Libris library portal product.
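To make the linking idea concrete, the following minimal sketch (an illustration rather than anything from the study itself) shows how a citation can be expressed as an OpenURL 0.1-style link for a resolver such as SFX to act on; the resolver address and all citation values are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical base address of an institutional link resolver (e.g. an SFX instance).
RESOLVER_BASE = "http://resolver.example.edu/openurl"

def article_openurl(aulast, atitle, jtitle, issn, volume, issue, spage, date):
    """Build an OpenURL 0.1-style link carrying a journal article citation."""
    params = {
        "genre": "article",   # kind of item being described
        "aulast": aulast,     # first author's surname
        "atitle": atitle,     # article title
        "title": jtitle,      # journal title
        "issn": issn,
        "volume": volume,
        "issue": issue,
        "spage": spage,       # starting page
        "date": date,         # year of publication
    }
    return RESOLVER_BASE + "?" + urlencode(params)

# Example with made-up citation data; the resolver, not the source database,
# decides which copy of the cited article the user is finally sent to.
print(article_openurl("Smith", "Portals in libraries", "Example Journal",
                      "1234-5678", "10", "2", "45", "2003"))
```

The point of the mechanism is that the source database only has to describe the citation; deciding on an appropriate, non-outdated target copy is delegated to the institution's resolver.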


Case studies
Case studies indicate that a common user interface is a desirable addition to library client services. In a study at Cornell University's Albert R. Mann Library, "89% of the faculty and 100% of the students expressed the view that a common user interface would significantly improve their bibliographic database search experiences" (Payette and Rieger, 1997). Mahoney and Di Giacomo (2001) report from the Los Alamos National Research Laboratory that "the migration of databases to a similar interface was prominent among customer requests. Another request was to have one place to search all the databases". Friend (1994) suggests that because clients don't need to know anything about the structure or syntax of individual databases, the common user interface "has tremendous promise for serving users who are reluctant to learn too many different interfaces but want to satisfy their information needs".

The first libraries to develop common user interfaces were academic and research libraries that had the resources to develop their own software, or that were the beneficiaries of academic research developing common user interface software, such as Penn State University (Friend, 1994), the Technical University of Denmark (Kvaerndrup, 2000), and Los Alamos National Laboratory Research Library (Blake, 2002). The Association of Research Libraries (ARL) convened a Scholars Portal Working Group in 2000 to investigate and develop a portal to electronic resources for several research libraries. As part of this project, ARL conducted a survey of portal functionality provided by ARL libraries in February 2002 to identify the state of current or planned research library applications of portals with common user interface capability (Wetzel, 2002). A total of 77 ARL member libraries responded to the survey, with 19 indicating that they offered some kind of portal. ARL determined that only six institutions offered a portal with common user interface capability. Subsequently, ARL surveyed library portal vendors to identify likely partners for its planned Scholars Portal project, eventually choosing to work with the Fretwell-Downing product ZPORTAL. As part of this process the ARL Scholars Portal Working Group developed a set of desired features for library portal products, which is included as an appendix to their final report (Association of Research Libraries Scholars Portal Working Group, 2002). Following a similar project, the Library of Congress Portals Applications Issues Group (LCPAIG) has released a draft list of portal application functionalities for public comment (Library of Congress Portals Applications Issues Group, 2003). Neither organisation has made the results of their evaluation of individual portal products available.

Systems evaluation


The field of library systems evaluation provides a theoretical framework for choosing a methodology with which to evaluate library portal products with common user interface capability. Lancaster and Sandore (1997) define a theory of systems evaluation that allows for the evaluation of software at each stage of software procurement, installation and use. Lancaster and Sandore (1997) separate systems evaluation approaches into two categories:
(1) evaluation without user involvement or with less than full user involvement; and
(2) evaluation with full user involvement.


Evaluation with user involvement tends to take place with a fully operating system, in which one can evaluate not only system characteristics but also how users interact with the system and with what degree of success. Evaluation with no user input may be undertaken before a system has been selected or is fully operational, as it focuses on system features and may be used in the selection of a system, the acceptance of a system, or in decisions about system enhancement or replacement. This current review of common user interface software products falls within the category of systems evaluation with partial or no user involvement, and replicates the style of evaluation that a library planning to purchase common user interface software would use. However, it cannot provide a complete picture of how effective the software is once installed. Lancaster and Sandore (1997) identified the checklist as a useful tool for systems evaluation with partial or no user involvement. They noted that:
"... weighting of system features is always desirable. Otherwise checklist scores can be deceptive. Without weighting, a system with many superfluous features may obtain a high overall score, but may actually be weak in features that librarians and users rate as most important."

The application of two weighting schemes, developed by the authors for the National Library of New Zealand, resulted in only minor changes to the product scores and rankings, and those weighted results are excluded from this article. Librarians who wish to apply weightings to the data presented in this article are encouraged to review the methodology outlined in our report to the National Library of New Zealand (Dorner and Curtis, 2003).

The need for a product review
The complexity of features supported by common user interface software products means that it may be difficult for librarians to ascertain which product best meets their library's needs. This comparative review of common user interface software products has been conducted to help make sense of the diversity of common user interface products currently available, in recognition of the importance of cross-database information retrieval within today's library environment. Part of the scope of this research was to develop a set of evaluation criteria that adequately reflects the current state of common user interface software products, which could be used to evaluate future products or adapted for use in a library's request for proposal for a common user interface product. Common user interface software products were evaluated against a specially developed set of evaluation criteria to identify software trends and current market leaders. This review provides librarians with a detailed summary of products currently on the market. Common user interface software is likely to be an expensive and complex purchase for any library, and the results of this research should help library managers to make well-informed decisions.

Methodology

Identification of common user interface products
Web addresses for potential common user interface product vendors were obtained from journal articles found in the literature review and in an extensive search of the Internet using Google and the Yahoo and BUBL Link directory sites. Product information was downloaded from the vendor Web sites. Some vendors offered more than one common user interface product. Products where a common user interface component was present but inseparable from another product, such as an OPAC, were excluded to ensure a homogeneous sample of software for comparison. Vendors who offered another vendor's common user interface product under their own brand were also excluded from the review.

Development of evaluation criteria
An extensive list of evaluation criteria for common user interface software products was obtained by analysing the common user interface product literature collected. Software features mentioned in the product information were indexed and grouped into categories. Common user interface software features were identified, and similar or identical features with proprietary names were compiled under a generic description. The resultant list of product features was supplemented by criteria from an RFP for a library Web portal, developed by Boss (2002), which were applicable to common user interface software. Any criteria identified by Boss (2002) that had not already appeared in the conceptual analysis of the product information were added.


The lists of library portal features and functionality developed by the Association of Research Libraries Scholars Portal Working Group (2002) and the Library of Congress Portals Applications Issues Group (2003) were not incorporated, as they were not publicly available at the time the evaluation criteria used in this study were developed. The list of software features was divided into the following broad categories:
• searching;
• user interaction;
• customisation;
• access control;
• design;
• database communication protocols;
• after sale support; and
• software platforms supported.


A survey of common user interface software vendors
Conducting a comparative review of software that was fair to each of the vendors required a standardised method for collecting data about each product. A survey based on the set of evaluation criteria was developed and distributed to common user interface software vendors in order to collect information about their products. This method of systems evaluation with partial or no user involvement was adapted from a study by Raol et al. (2002), in which a variety of portal products was successfully compared against a checklist of evaluation criteria, and the current state of maturity of any given software feature or category of features was determined by the percentage of products in which it was found and the relative maturity of individual enterprise information portal products. While this research employs the checklist approach to data collection, it departs from the methodology used by Raol et al. (2002) by requesting that the software vendors evaluate their products against the set of evaluation criteria, rather than having the researchers make the evaluations. The product information available on vendor Web sites is varied and often incomplete, and the researchers had limited access to live common user interface software products. These factors made it impossible to evaluate each of the products fairly from public information alone, and the researchers therefore decided that a more robust method of data collection was to distribute a detailed checklist-style survey based on the evaluation criteria, on which the vendors themselves were able to record information about their product's features. This method ensured that each product was evaluated as fairly as possible within Lancaster and Sandore's (1997) framework of systems evaluation with partial or no user involvement.

The researchers contacted the vendors by e-mail using addresses obtained from the product Web sites. The initial contact e-mail explained the purpose of the survey and included the survey document as an attachment in RTF format, which vendors could complete in electronic form and e-mail back to the researchers. Using the survey, participating vendors were asked to record whether their common user interface software product fulfils each of the evaluation criteria. A total of 14 vendors were invited to participate in the research project. Of the vendors, 10 agreed to participate and returned surveys for their products (71 per cent response rate). One vendor returned a survey for two common user interface software products, resulting in a total of 11 surveys returned. No products were reviewed without the co-operation of their vendors.

Comparing common user interface software products
The coded data obtained from the vendor survey were converted to numeric values in order to calculate overall scores for each of the products. The values received for each of the evaluation criteria were added together to obtain an overall score, which could be expressed as a percentage of the total score the products could possibly have obtained. The product scores provide a quantitative basis for comparison between different products and features. Two separate numeric conversions were conducted:
(1) conversion scheme A, to calculate the current product scores; and
(2) conversion scheme B, to calculate future product scores using the data relating to the next release of each product.

In conversion scheme A, a Y (yes) code resulted in a criterion score of 2, a P (partially) code resulted in a criterion score of 1, and an R (will be available in the next release) code resulted in a criterion score of 1. N (no) and U (unknown) codes resulted in criterion scores of 0. In conversion scheme B, any R codes were converted to 2 to calculate the scores for the products at their next release. An alternative conversion scheme was investigated in which a Y code resulted in a criterion score of 4, a P code resulted in a criterion score of 2, an R code resulted in a criterion score of 1, and N and U codes resulted in criterion scores of 0. Changes of this kind resulted in a negligible shift of 1 or 2 percentage points in the score for each vendor, and it was decided that it would be fairer to vendors to score P codes and R codes equally, as these codes were not exclusive and might change or improve in the near future.
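A minimal sketch of the two conversion schemes described above, assuming the survey responses for one product are held as a simple list of Y/P/R/N/U codes (the data structure itself is an assumption made for illustration, not the authors' own scoring sheet):

```python
# Scoring rules taken from the text: scheme A scores the current release,
# scheme B treats R ("will be available in the next release") as fully supported.
SCHEME_A = {"Y": 2, "P": 1, "R": 1, "N": 0, "U": 0}
SCHEME_B = {"Y": 2, "P": 1, "R": 2, "N": 0, "U": 0}

def product_score(responses, scheme):
    """Return the raw score and the percentage of the maximum possible score."""
    raw = sum(scheme[code] for code in responses)
    maximum = 2 * len(responses)          # 79 criteria give a maximum of 158
    return raw, round(100 * raw / maximum)

# Hypothetical product answering the 79 criteria.
responses = ["Y"] * 60 + ["P"] * 5 + ["R"] * 6 + ["N"] * 8
print(product_score(responses, SCHEME_A))   # score for the current release
print(product_score(responses, SCHEME_B))   # projected score for the next release
```

The next-release score can only rise relative to the current score, since the only difference between the two schemes is the value assigned to R codes.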


Limitations
Only stand-alone common user interface software products were included in this review. This is not a comprehensive review of common user interface software products, as not all the vendors who were invited to participate in this research chose to do so. Because this review is based on vendors' comparisons of their own common user interface software products against a list of criteria, without reference to live software and without user involvement, there can be no guarantee that products will be able to deliver all the functionality indicated by the vendors.


Common user interface software products reviewed
In order to provide a reliable and representative review of common user interface software products, it was important to include as many eligible products as possible. Information and reviews in the academic literature and subsequent analysis of potential common user interface software product sites on the Internet yielded contact details for 14 vendors of library common user interface software. The vendors invited to participate in this research project were AutoGraphics, Endeavour, Ex Libris, Follett, Fretwell-Downing, Gale, Innovative Interfaces, MuseGlobal, OCLC, Ovid, Sea Change, SIRSI, WebFeat, and VTLS. Every effort was made to ensure that all common user interface software vendors were invited to participate in the research, and that an appropriate employee of each of these companies was contacted. Ten of the vendors contacted agreed to participate, and completed a survey for their product (Table I). Fretwell-Downing returned a survey for two common user interface software products, resulting in a total of 11 completed surveys. Gale declined to participate in the project on the valid grounds that their product, Total Access, is not for sale outside the US due to technology rights issues. Three other common user interface software vendors were contacted but did not submit a survey by the final acceptance date, resulting in their exclusion from this review. They were AutoGraphics (Agent), Sea Change (WebClarity), and Ovid (WebSPIRS).

Results of survey of common user interface vendors

The data obtained in the survey of vendors describe the capabilities of the common user interface products reviewed and can be used to ascertain the maturity of each of the software features evaluated. Vendor responses have been coded as Y (yes), N (no), P (partially), R (will be present in the next release), and U (unknown). The frequency of positive (Y) responses among the products for each feature has been expressed as a percentage value in the right-hand column of each of the results tables (Tables II-IX). Features found in over 70 per cent of the products are described as "established features", features found in 50-70 per cent of products are described as "maturing features", and features found in less than 50 per cent of the products are considered to be "emerging features". The frequency with which individual evaluation criteria are supported by common user interface software products is a useful indication of each feature's maturity and provides a quantitative basis for assessing which of the claims of common user interface capability are hype and which are the current standard for common user interface software. Some of the release dates have passed since the survey was conducted, and the reader should note whether a superseded version of the product was reviewed (see Table X).
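The maturity labels can be expressed directly as thresholds on the Y-frequency. The short sketch below is an added illustration of those cut-offs; the sample row is the Table II entry for proximity searching, and the function names are invented for the example.

```python
# Maturity labels from the text: over 70 per cent = established,
# 50-70 per cent = maturing, under 50 per cent = emerging.

def yes_frequency(responses):
    """Percentage of products answering Y for one criterion."""
    return round(100 * responses.count("Y") / len(responses))

def maturity(freq_percent):
    if freq_percent > 70:
        return "established feature"
    if freq_percent >= 50:
        return "maturing feature"
    return "emerging feature"

# Sample data: the Table II row for proximity searching (search criterion 12).
row = ["Y", "N", "Y", "N", "Y", "N", "Y", "Y", "P", "N", "N"]
freq = yes_frequency(row)
print(freq, maturity(freq))   # 45 emerging feature
```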

Table I Participating common user interface software vendors

Vendor                 Product name              Product URL
Endeavour              EnCompass                 http://encompass.endinfosys.com/whatis/whatisENC2.htm
Ex Libris              MetaLib                   www.exlibris.co.il/metalib
Follett                Find-It-All OneSearch     www.fsc.follett.com/products/finditall_collection/index.cfm
Fretwell-Downing       ZPORTAL/CPORTAL           www.fdgroup.com/fdi/products/about.html
Innovative Interfaces  MetaFind                  www.iii.com/pdf/map0103.pdf
MuseGlobal             MuseSearch                www.museglobal.com/Products/MuseSearch/index.html
OCLC                   SiteSearch                www.oclc.org/oclc/promo/9275webz/9275webz.htm
SIRSI                  Single Search             www.dra.com/Sirsiproducts/broadcastsearch.html
VTLS                   Chameleon Gateway         www.vtls.com/Products/gateway/
WebFeat                WebFeat                   www.webfeat.org/prism.html


Table II Product performance on search criteria
Columns: CiP CP ENC FIA MF ML MS SinS SitS WF ZP | Frequency (Y), per cent

Search 1  Broadcast searching of databases using the same communication protocol: Y Y Y Y Y Y Y Y Y Y Y | 100
Search 2  Broadcast searching of databases using a variety of communication protocols: N Y Y Y Y Y Y Y N Y Y | 82
Search 3  Broadcast searching of Web sites and Internet search engines: P Y Y Y Y Y Y Y N Y Y | 82
Search 4  Broadcast searching of citations and full text in databases by keyword: N P Y Y Y Y Y Y Y Y Y | 82
Search 5  Broadcast searching of all databases by field: Y Y Y P Y Y Y Y P Y Y | 82
Search 6  Field searching of individual databases: Y Y Y P Y Y Y Y N Y Y | 82
Search 7  Keyword searching of citations and full text in individual databases: Y P Y Y Y Y Y Y Y Y Y | 91
Search 8  One or more Web-based OPAC databases can be searched: Y Y Y Y Y Y Y Y N Y Y | 91
Search 9  A list of databases and resources with descriptions can be searched: Y R Y Y Y Y Y Y P Y R | 73
Search 10 Boolean searching: Y Y Y Y Y Y Y Y Y Y Y | 100
Search 11 Wildcard and truncation searching: Y Y Y P Y Y Y Y Y Y Y | 91
Search 12 Proximity searching: Y N Y N Y N Y Y P N N | 45
Search 13 Hotlink searching (hyperlinks to a cross-referenced item or search result): Y Y Y Y R Y Y Y Y Y Y | 91
Search 14 Database thesauri can be searched: Y Y Y N Y N Y Y Y Y P | 73
Search 15 Searches can be saved: Y R Y Y N(a) Y Y Y P Y Y | 73
Search 16 Searching using diacritics (including macrons) is supported: Y R U Y Y Y Y Y P N N | 55
Search 17 Results returned from different databases can be merged: Y N Y N Y Y Y Y P N Y | 64
Search 18 Results returned from different databases can be de-duplicated: N N Y N Y Y Y Y P N Y | 55
Search 19 Results are ranked for relevance: N R Y N Y N Y Y N N N | 36
Search 20 Results can be limited by any field: Y P U P P N Y Y P Y N | 36
Search 21 Results can be sorted by any field in ascending or descending order: P N Y N P P Y P P P Y | 27
Search 22 Results can be saved or downloaded: Y R Y N R Y Y Y N Y Y | 64
Search 23 Results can be printed: Y R Y Y R Y Y Y P R Y | 64
Search 24 Results can be e-mailed: Y R Y N Y Y Y Y Y R Y | 73
Criteria supported (per cent): 75 42 92 50 75 79 100 96 33 67 75
Criteria partially supported or planned for next release (per cent): 8 42 0 17 21 4 0 4 42 13 8

Notes: CiP = Chameleon iPortal; CP = CPORTAL; ENC = ENCompass; FIA = Find-It-All; MF = MetaFind; ML = MetaLib; MS = MuseSearch; SinS = Single Search; SitS = SiteSearch; WF = WebFeat; ZP = ZPORTAL; Y = product supports this function; P = product supports this function partially; R = the next release will support this function; N = product does not support this function; U = not known whether product supports this function; (a) search criterion 15 will not be included in the next release of MetaFind but is planned for future releases

Search features
Advanced search functionality, particularly cross-database searching, is the primary purpose of common user interface software products. Accordingly, searching was the largest group of evaluation criteria, and many of the criteria chosen represent established features. Search criteria 1-5 relate directly to broadcast searching across databases and are features that particularly define common user interface software. That these criteria are present in over 80 per cent of products indicates that common user interface software is no longer the Holy Grail of the information industry but a viable reality.

Search criteria 6-8, which facilitate using the native search capabilities of individual databases, are present in over 80 per cent of common user interface software products, allaying any fears that common user interface software products will dumb down the search process. Boolean, wildcard and truncation searching are also widely supported features. Proximity searching is an emerging option found in 45 per cent of products. Searching using diacritics is available in 55 per cent of products.


Table III Product performance on user interaction criteria
Columns: CiP CP ENC FIA MF ML MS SinS SitS WF ZP | Frequency (Y), per cent

User 1 Search history is recorded: Y Y Y Y N(a) Y R P P R Y | 55
User 2 Previous searches can be refined: Y Y Y N N(a) Y Y Y P Y Y | 73
User 3 Users can specify fields to include when printing or downloading results: N N U N N(a) N P N N R P | 0
User 4 Users can select which records from search results to print or download: Y R Y N Y Y Y Y Y R Y | 73
User 5 Results can be downloaded to bibliographic software: N N Y N N(a) Y Y N N Y(b) R | 36
User 6 SDI is available (users can request that a particular search be run automatically and the results sent to them by e-mail): Y N Y N N(a) Y Y N N R N | 36
User 7 Context-sensitive help is provided: Y Y Y Y R Y N N P Y Y | 64
Criteria supported (per cent): 71 43 86 29 14 86 57 29 14 43 57
Criteria partially supported or planned for next release (per cent): 0 14 0 0 14 0 29 14 43 57 29

Notes: CiP = Chameleon iPortal; CP = CPORTAL; ENC = ENCompass; FIA = Find-It-All; MF = MetaFind; ML = MetaLib; MS = MuseSearch; SinS = Single Search; SitS = SiteSearch; WF = WebFeat; ZP = ZPORTAL; Y = product supports this function; P = product supports this function partially; R = the next release will support this function; N = product does not support this function; U = not known whether product supports this function; (a) user criteria 1, 2, 3, 5 and 6 will not be included in the next release of MetaFind but are planned for future releases; (b) WebFeat requires an additional module called Write Note to support this criterion

Table IV Product performance on customisation criteria
Columns: CiP CP ENC FIA MF ML MS SinS SitS WF ZP | Frequency (Y), per cent

Custom 1  Links to library's own Web-based OPAC: Y Y Y Y Y Y Y Y N Y Y | 91
Custom 2  Displays local holding information: Y N Y Y R Y Y Y P Y Y | 73
Custom 3  Database groups can be defined for specific client groups: Y Y Y N Y Y Y Y Y Y Y | 91
Custom 4  Default display fields can be set: Y Y Y N Y Y Y Y Y Y P | 82
Custom 5  Results displays can be customised: Y Y Y N Y Y Y Y Y Y P | 82
Custom 6  The design of user interfaces can be customised: Y Y Y R Y Y Y Y Y Y P | 82
Custom 7  Different user interfaces can be created for each client group: Y P Y N Y Y Y Y Y Y N | 73
Custom 8  Different access levels can be assigned for each client group: Y N Y Y Y Y Y Y Y Y Y | 91
Custom 9  SDI is available (librarians can request that a particular search be run automatically and the results sent to users by e-mail): Y N Y N N(a) Y Y N N R N | 36
Custom 10 Results can be linked to document delivery services: Y N Y(b) N R Y Y P P Y Y | 55
Custom 11 Results can be linked to interlibrary loan services: Y N Y N R Y Y P P Y Y | 55
Custom 12 Print limits can be set: Y N Y N N N P N N Y N | 27
Custom 13 Transaction logs and usage statistics can be obtained: Y Y Y R Y Y Y Y Y Y Y | 91
Custom 14 Vendor can provide pre-loaded links to a wide range of databases: N U Y N Y Y Y Y Y Y(c) P | 64
Custom 15 The software is capable of suspending a potentially long search at a pre-determined point and providing the user with options to narrow or terminate the search, examine a portion of the results or continue the search: P R P Y Y Y Y Y N Y P | 55
Criteria supported (per cent): 87 40 93 27 67 93 93 73 53 93 47
Criteria partially supported or planned for next release (per cent): 7 13 7 13 20 0 7 13 20 7 33

Notes: CiP = Chameleon iPortal; CP = CPORTAL; ENC = ENCompass; FIA = Find-It-All; MF = MetaFind; ML = MetaLib; MS = MuseSearch; SinS = Single Search; SitS = SiteSearch; WF = WebFeat; ZP = ZPORTAL; Y = product supports this function; P = product supports this function partially; R = the next release will support this function; N = product does not support this function; U = not known whether product supports this function; (a) custom criterion 9 will not be included in the next release of MetaFind but is planned for future releases; (b) LinkFinderPlus is required to support custom criterion 10 in ENCompass; (c) custom criterion 14 can be supported in WebFeat using their 1Cate service


Table V Product performance on authentication criteria
Columns: CiP CP ENC FIA MF ML MS SinS SitS WF ZP | Frequency (Y), per cent

Access 1 Authorisation based on IP address: P(a) N Y Y Y Y Y Y Y Y R | 73
Access 2 Authorisation based on a single user password: P(a) R Y Y Y Y Y Y Y Y Y | 82
Access 3 Authorisation based on domain name: P(a) N Y Y Y Y Y Y Y Y N | 73
Criteria supported (per cent): 0 0 100 100 100 100 100 100 100 100 33
Criteria partially supported or planned for next release (per cent): 100 33 0 0 0 0 0 0 0 0 33

Notes: CiP = Chameleon iPortal; CP = CPORTAL; ENC = ENCompass; FIA = Find-It-All; MF = MetaFind; ML = MetaLib; MS = MuseSearch; SinS = Single Search; SitS = SiteSearch; WF = WebFeat; ZP = ZPORTAL; Y = product supports this function; P = product supports this function partially; R = the next release will support this function; N = product does not support this function; U = not known whether product supports this function; (a) Chameleon iPortal provides access criteria 1, 2 and 3 to Virtua databases only

Table VI Product performance on design criteria
Columns: CiP CP ENC FIA MF ML MS SinS SitS WF ZP | Frequency (Y), per cent

Design 1 Library name, logos, images and links can be added to interface: Y Y Y R Y Y Y Y Y Y Y | 91
Design 2 Interface colour scheme can be modified: Y Y Y N Y Y Y Y Y Y Y | 91
Design 3 Printing and downloading options are simple to find and use: Y R Y R Y Y Y Y Y Y Y | 82
Design 4 Help options are available on every page: Y Y Y Y P Y Y Y Y Y Y | 91
Criteria supported (per cent): 100 75 100 25 75 100 100 100 100 100 100
Criteria partially supported or planned for next release (per cent): 0 25 0 50 25 0 0 0 0 0 0

Notes: CiP = Chameleon iPortal; CP = CPORTAL; ENC = ENCompass; FIA = Find-It-All; MF = MetaFind; ML = MetaLib; MS = MuseSearch; SinS = Single Search; SitS = SiteSearch; WF = WebFeat; ZP = ZPORTAL; Y = product supports this function; P = product supports this function partially; R = the next release will support this function; N = product does not support this function; U = not known whether product supports this function

Table VII Database communication protocols supported by product
Columns: CiP CP ENC FIA MF ML MS SinS SitS WF ZP | Frequency (Y), per cent

Data 1  Z39.50: Y Y Y Y Y Y Y Y Y Y Y | 100
Data 2  ERL: N U N N Y Y U Y N Y Y | 45
Data 3  OpenURL: Y N Y U Y Y Y Y N Y(a) Y | 73
Data 4  HTTP: Y Y Y Y Y Y Y Y N Y Y | 91
Data 5  MARC: Y Y Y Y Y Y Y Y Y Y Y | 100
Data 6  DOI: N N Y N Y Y P Y N Y Y | 55
Data 7  SQL: Y Y Y Y Y Y Y Y N Y Y | 91
Data 8  EAD: R Y Y U Y R Y Y N Y P | 55
Data 9  TEI: R U Y U Y R P N N Y Y | 36
Data 10 XML: R Y Y Y Y Y Y Y P Y Y | 82
Data 11 Dublin Core: Y Y Y U Y Y Y Y P Y P | 73
Data 12 Vendor products only: N N N N N N N Y N Y P | 18
Data 13 Other standards: N Y Y N Y R Y Y N N N | 45
Criteria supported (per cent): 46 62 85 38 92 69 69 92 15 92 69
Criteria partially supported or planned for next release (per cent): 23 0 0 0 0 23 15 0 15 0 23

Notes: CiP = Chameleon iPortal; CP = CPORTAL; ENC = ENCompass; FIA = Find-It-All; MF = MetaFind; ML = MetaLib; MS = MuseSearch; SinS = Single Search; SitS = SiteSearch; WF = WebFeat; ZP = ZPORTAL; Y = product supports this function; P = product supports this function partially; R = the next release will support this function; N = product does not support this function; U = not known whether product supports this function; (a) OpenURL can be supported by WebFeat using their 1Cate service


Table VIII Vendor support criteria offered
Columns: CiP CP ENC FIA MF ML MS SinS SitS WF ZP | Frequency (Y), per cent

Support 1 Vendor provides installation and initial support: Y Y Y Y Y Y Y Y N Y Y | 91
Support 2 The software is provided to libraries for installation: Y Y N N N(a) N N Y Y N Y | 45
Support 3 Software is hosted remotely by vendor: Y Y Y Y Y Y Y N N Y Y | 82
Support 4 Full documentation is included in the price of the product: Y Y Y Y Y Y Y Y Y Y Y | 100
Support 5 Technical support is readily available to New Zealand libraries: Y P Y P Y Y(b) Y Y P Y Y | 73
Support 6 In-house training is provided: Y Y Y P Y Y Y Y N Y Y | 82
Support 7 Libraries can join a product listserv: Y P(c) Y P Y Y N Y Y N Y | 64
Support 8 Other support is provided: Y Y Y N Y Y Y N N Y Y | 73
Criteria supported (per cent): 100 75 88 38 88 88 75 75 38 75 100
Criteria partially supported or planned for next release (per cent): 0 25 0 38 0 0 0 0 13 0 0

Notes: CiP = Chameleon iPortal; CP = CPORTAL; ENC = ENCompass; FIA = Find-It-All; MF = MetaFind; ML = MetaLib; MS = MuseSearch; SinS = Single Search; SitS = SiteSearch; WF = WebFeat; ZP = ZPORTAL; Y = vendor provides this service; P = vendor partially provides this service; R = vendor will provide this service in the next release; N = vendor does not provide this service; U = not known whether the vendor provides this service; (a) support criterion 2 will not be included in the next release of MetaFind but is planned for future releases; (b) support for MetaLib is provided by the Australian subsidiary of Ex Libris; (c) libraries using CPORTAL can join the FDI CPORTAL and seamlessUK mailing lists

Table IX System platforms supported
Columns: CiP CP ENC FIA MF ML MS SinS SitS WF ZP | Frequency (Y), per cent

System 1 Remote hosting by vendor: Y Y Y Y Y Y Y N N Y Y | 82
System 2 Windows: Y Y N N Y Y Y Y Y N Y | 73
System 3 Macintosh: N N N N N N N N N N N | 0
System 4 Linux: Y Y Y N Y Y Y N Y N N | 64
System 5 Other Unix: Y U U N Y Y Y U Y N Y | 55
System 6 Sun Solaris: U Y Y N U Y U Y Y N Y | 55
Criteria supported (per cent): 67 67 50 17 67 83 67 33 67 17 67
Criteria partially supported or planned for next release (per cent): 0 0 0 0 0 0 0 0 0 0 0

Notes: CiP = Chameleon iPortal; CP = CPORTAL; ENC = ENCompass; FIA = Find-It-All; MF = MetaFind; ML = MetaLib; MS = MuseSearch; SinS = Single Search; SitS = SiteSearch; WF = WebFeat; ZP = ZPORTAL; Y = vendor provides this service; P = vendor partially provides this service; R = vendor will provide this service in the next release; N = vendor does not provide this service; U = not known whether the vendor provides this service

Table X Product release dates
Product: Chameleon iPortal; CPORTAL; ENCompass with LinkFinderPlus; Find-It-All OneSearch; MetaFind; MetaLib; MuseSearch; SIRSI Single Search; SiteSearch; Web of Knowledge; WebFeat; ZPORTAL
Next release date: February 2003; August 2003; December 2002; Summer/Fall 2003; 2003; Summer 2003; 2003; 2003; No next release date given; April 2003; October 2003

Merging and de-duplication of results are highly desirable features in cross-database searching, and are maturing features found in 64 and 55 per cent of products, respectively, although it is likely that vendors supporting a large range of database communication protocols will have more difficulty supporting merging and de-duplication of results than vendors offering a smaller set of database communication protocols. WebFeat commented that they "would challenge any vendor claiming this function to demonstrate it successfully across the full result set (>10K records) and all databases (>50 authenticated databases)".


Printing and downloading of search results are also maturing features, as they appear in 64 per cent of products. Results sets can be e-mailed within 73 per cent of products. Limiting, sorting and relevancy ranking of results are emerging features supported by less than 40 per cent of products. The evaluation criteria relating to searching are widely supported by the common user interface software products reviewed. Seven of the 11 products fully support 75 per cent or more of the search criteria evaluated. MuseSearch is the most complete product, supporting 100 per cent of the search criteria. Single Search supports 96 per cent of the search criteria and ENCompass supports 92 per cent. All of the products reviewed fully or partially support 75 per cent or more of the search criteria (see Table II).

User interaction features
The user interaction criteria also relate to searching, but are particularly concerned with personalisation. User interaction features are less mature overall than the search features evaluated, although 73 per cent of products allow search queries to be refined and users to select which records are printed or downloaded from a set of results. MuseSearch and ZPORTAL partially allow users to specify which fields are included when results are downloaded or printed, and WebFeat should have this feature in the next release. Most products will at least partially record a user search history. Context-sensitive help is provided by 64 per cent of products. The ability to download results to bibliographic software and the availability of SDI options (where users can request that a particular search be run automatically and the results be sent to them by e-mail) are emerging features supported by 36 per cent of products. These features should be of interest to academic libraries in particular. User interaction options are less prevalent in common user interface software products than search functions. Improvement in this area would be enormously beneficial for library users, and it is hoped that increased user interaction will be supported by future generations of common user interface software products. Over half the products currently support under 50 per cent of user interaction features. ENCompass and MetaLib are the clear leaders in this area, supporting 86 per cent of user interaction features (see Table III).

Interface customisation features
The interface customisation criteria describe the level of control a library has over the common user interface software product, and how well the product can be integrated into the library's existing holdings and customised to suit particular groups of users. Several key customisation criteria were present in 90 per cent or more of products. These were the ability to link to the library OPAC, the ability to define groups of databases and to vary access levels for particular client groups, and the ability to generate transaction logs and usage statistics. Other established features were customisable results displays (including the ability to specify fields displayed) and customisable interfaces. Customised interfaces for each client group and the display of local holdings are features offered by 73 per cent of products. Other maturing features include linking results to document delivery and interlibrary loan services (see Table IV).


Authentication features
Authentication and access control requirements are inextricably linked to library provision of remote or distributed access to licensed databases and other electronic resources. This fact is reflected in the prevalence of authentication features in the common user interface products reviewed. Eight products supported all of the evaluation criteria relating to authentication, indicating that authentication is standard in common user interface software products. An exception to this trend is CPORTAL, which is designed to deliver citizens' information and cross-resource searching for the public library and e-government market, and is intended to give unauthenticated access to all, so that users can search for information without identifying themselves (useful for personal or health-related searching; see Table V).

Design features
The design evaluation criteria chosen represent the bare minimum standard for common user interface software design and relate to branding and usability. Design criteria 1 and 2 in Table VI enable the library to incorporate its branding into the common user interface, increasingly necessary as common user interface software or portal products become the online face of libraries. Design criteria 3 and 4 both relate to usability. Eight of the products supported all of the evaluation criteria relating to design. The remaining products support or plan to support at least three of the four criteria. Further evaluation criteria covering interface navigation and aesthetics would have provided useful information about design, but it would have been difficult for the product vendors to evaluate these criteria objectively, and hence they were excluded from this review.


Database communication protocols
The database communication protocol criteria that have been evaluated represent a selection of the database communication protocols supported by vendors rather than a comprehensive list. Some of these database communication protocols are standard in library environments, such as Z39.50, MARC and Dublin Core. Others are standard in any online environment, such as HTTP, SQL and XML. Many other standards could be added to this list. Other protocols suggested by vendors, librarians and the ARL Scholars Portal Working Group include CIMI, RDF, OAI, Telnet, Lotus Notes, SOAP, LDAP, SIP II, NCIP and ISO 10160/10161. There does not appear to be a standard set of database communication protocols supported by the majority of products at present, although the Z39.50 and MARC standards are used by all of the common user interface software products reviewed and can be considered to be standard. XML and SQL are also widely supported. All of the products support a range of database communication protocols, and none of the products reviewed support only their vendor's proprietary databases. MetaFind, Single Search and WebFeat each support 92 per cent of the database communication protocols evaluated, and many of the vendors support or plan to support other protocols. CPORTAL supports seamlessUK, GILS and GRS-1. ENCompass supports the Open Archives Initiative. MetaFind supports Innovative products. MetaLib plans to support OAI harvesting in the next release. MuseSearch supports Telnet, Lotus Notes, SOAP, LDAP, SIP II, NCIP, and ISO 10160/10161. SIRSI Single Search supports SIP, NCIP and Telnet (see Table VII).


Vendor support options
Adequate after-sales support is essential in any purchase of complex and expensive software, and the vendor support criteria were designed to discover the extent to which common user interface software vendors supply support services. Most of the support services evaluated are standard. Most vendors offer initial set-up, documentation, training and support as part of the product. Vendors were asked specifically about resource linking support options in the interface customisation section of the survey, and most vendors indicated that they could provide assistance in setting up and maintaining the profiles which enable their common user interface product to link to external databases (see Table IV, custom criterion 14, "Vendor can provide pre-loaded links to a wide range of databases"). This is a particularly valuable option for libraries maintaining access to a large variety of databases and should be considered to be a standard support feature. Chameleon iPortal and ZPORTAL include all of the support options evaluated. Seven of the remaining vendors offer at least 75 per cent of these support options (see Table VIII).

Additional vendor support
The vendors were asked to describe any additional support that they offer to clients, and reported the following additional services:
• VTLS provides an Account Management Support agreement that includes support for all enhancements and new releases to Chameleon iPortal, on-site visits for a service of the library's choice, and other support and maintenance services.
• Endeavour has a support Web site for ENCompass users and user group meetings, and answered "yes" to custom criterion 14 ("Vendor can provide pre-loaded links to a wide range of databases").
• Innovative configures profiles for MetaFind target resources as part of the set-up service and can also handle the maintenance of resource profiles. Innovative answered "yes" to custom criterion 14.
• Ex Libris offers an optional KnowledgeBase service with monthly updates of database profiles on MetaLib. Ex Libris answered "yes" to custom criterion 14.
• MuseGlobal has an automated service called the Source Factory that runs 24 hours a day, detecting changes in resource connections to MuseSearch and delivering and installing updated resource profiles to customers. MuseGlobal answered "yes" to custom criterion 14.
• SIRSI maintains all resource profiles for Single Search and answered "yes" to custom criterion 14.
• SiteSearch is an open source product to which OCLC provides free access. Limited support is available from OCLC and the SiteSearch community for no charge. A maintenance contract can be provided by OCLC for a fee. SiteSearch answered "yes" to custom criterion 14.
• WebFeat is offered as a hosted service and all set-up and maintenance is handled for the customer, including resource profiles for over 1,200 authenticated databases that are updated whenever database vendors make changes. WebFeat answered "yes" to custom criterion 14.
• Fretwell-Downing can also provide help desk support, project management and consulting and configuration services for ZPORTAL as required. Fretwell-Downing indicated that ZPORTAL could partially support custom criterion 14.



System platforms supported
The system platform criteria were included to ascertain the operating systems on which the products could be installed. The distinction between server and client was not made completely clear in the survey, and some vendors included information on client access to the common user interface software product via browser. Information on the possible server platforms is recorded for each product in Table IX. Possible client platforms were not evaluated. Common user interface software is readily available as a remote hosted service or on a Windows platform. Remote hosting is offered by 82 per cent of vendors. Where the software is installed locally, 73 per cent of products can be installed in a Windows environment and 64 per cent can operate on Linux. Sun Solaris or other Unix platforms can be used for 55 per cent of products. Common user interface software products are not currently available on a Macintosh platform, and it is unlikely that this will be offered in the future.

System support requirements
Vendors were asked to state the approximate amount of time a systems librarian would need to spend maintaining their product. Most of the products require minimal system support, and some vendors provide all support services as part of the product fee. Common maintenance tasks include regular backing up of the system, redesign of user interfaces and configuring new resources as they are added. The vendor descriptions of the system support requirements for their product are as follows:
• Chameleon iPortal maintenance times vary depending on the situation, including the size of the system and the degree of customisation.
• CPORTAL requires less than two hours per week in maintenance tasks, which include checking transaction logs and creating reports, checking the online harvest tool logs and adding new data sources if required.
• ENCompass requires the same kinds of duties that the systems librarian would be performing for maintenance of their library management system: ensuring nightly server backups are completed, performing server maintenance, and co-ordinating with Endeavour for upgrades to the server.
• Find-It-All One Search requires little administration beyond initial set-up and administration of account information for the databases and resources used by a given library, because it is a hosted Web service and no local software is required.


Pricing structures
As may be expected with relatively complex software, common user interface product pricing is highly dependent on the situation of the purchasing library. Pricing tends to depend on the number of users or the number of resources served. Libraries may find that one of these pricing models will suit their particular situation better than others. While most vendors could not supply a list price for their product because of the complexity of pricing models, those prices supplied by vendors ranged from free, for OCLC's open source Site Search software, to tens of thousands of dollars for MuseGlobal's common user interface software product. The vendor descriptions of their pricing structures are given below. However, these are subject to change and are given only as an indication:
• Chameleon iPortal pricing is based on the level of concurrent user licensing. Drop In Pull Out (DIPO) components such as enhanced bibliographic content or thesaurus utilities may be added for additional fees.
• CPORTAL list pricing is based on the size of population served, plus the number of information resources plugged into the CPORTAL search engine.
• ENCompass pricing is based on the number of institutions and the size of institutions in volumes.
• Find-It-All One Search charges an annual subscription of US$699 per year per library, which includes 12 months of access, Web-based training and online or downloadable user documentation.
• MetaFind is typically licensed for an unlimited number of public users. Costs may vary based on the size of the library, the number of target resources required or the number of additional interfaces required to support multiple languages or different levels of access.
• MetaLib pricing depends on the institution.
• MuseGlobal offers a range of annual subscriptions. Public library systems are priced on the number of physical locations in tiers ranging from US$29,400 for 1-5 locations to US$96,120 for up to 99 locations. Systems for academic institutions are priced on the number of FTEs at the institution. The price ranges from US$29,400 for 5,000 FTEs to US$96,120 for up to 50,000 FTEs. Consortia are priced on the number of institutions contracting at the same time. The price ranges from US$22,000 per institution for up to ten institutions to US$7,209 per institution for up to 99 institutions. Discounts are available for multi-year contracts.
• SIRSI Single Search is priced based on user population size: FTEs for academic and school libraries, employee count for special libraries and patron count for public libraries.
• Site Search is free for non-profit use. It was an OCLC product and is now available as open source.
• WebFeat pricing is based on the number of database translators to be maintained annually.
• ZPORTAL pricing varies according to the ZPORTAL license. Current licensing includes site license (usage restricted to institutional members), concurrent licenses (usage restricted to an agreed number of concurrent users) and ASP (licensing included as part of the total service).

Comparing common user interface products
The data collected from the vendors were converted to numeric values (as described in the section on methodology) to supply a quantitative basis for comparison. Two different conversion schemes were used to calculate the scores presented in Table XI. The scores obtained using conversion scheme A reflect the common user interface market in 2002. Some vendors had released the next version of their product at the time of writing, and most will have released a new version of their product during 2003. Conversion scheme B was used to calculate the score for the next release of each product. The scores obtained using conversion scheme B represent the vendors' description of the features that will be available in the next release of their product. The scores for each of the evaluation criteria were added together to obtain an overall score for each product, shown in Table XI. These scores have also been expressed as a percentage of the total score the product could possibly have obtained. The products have been ranked from 1 to 11.
The scores in Table XI identify the products with the most features. The scores for the product releases available in late 2002 (when the survey was conducted) ranged from 73 to 139 out of a possible 158, giving percentages of 46 to 88 per cent. The top five products received scores of over 80 per cent. MuseSearch and ENCompass are the current leaders at 88 per cent. The projected scores for the next product releases are from 0 to 7 percentage points higher, which should cause some minor position changes. The top seven next-release products are expected to receive scores of over 80 per cent, with a higher average score of 77 per cent, indicating an overall improvement in common user interface software functionality during 2003. MuseSearch should retain first place, with ENCompass and MetaLib projected to be equal second by next release.

Table XI Overall scores for the common user interface software products

Product               Total score          Percentage score      Ranking
                      2002   Next release  2002   Next release   2002   Next release
Chameleon iPortal     123    126           78     80             7      7
CPORTAL               95     106           60     67             9      9
ENCompass             139    139           88     88             1=     2=
Find-It-All           73     77            46     49             11     11
MetaFind              124    131           78     83             6      5
MetaLib               136    139           86     88             3      2=
MuseSearch            139    140           88     89             1=     1
Single Search         130    130           82     82             4      6
Site Search           85     85            54     54             10     10
WebFeat               126    133           80     84             5      4
ZPORTAL               121    124           77     78             8      8
Average score         117    121           74     77
Best possible score   158    158           100    100
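The arithmetic behind Table XI is simple to reproduce. The following sketch is purely illustrative — the criterion names and point values are invented, not the survey data — but it shows how per-criterion scores can be summed, expressed as a percentage of the best possible score of 158, and ranked, with tied positions marked by an equals sign.

```python
# Minimal sketch of the Table XI arithmetic (invented figures, not survey data):
# per-criterion scores are summed, expressed as a percentage of the best
# possible score, and ranked, with "=" marking tied positions.

BEST_POSSIBLE = 158  # maximum obtainable across all evaluation criteria

# Hypothetical criterion scores for two imaginary products.
products = {
    "Product A": {"cross-database searching": 4, "deduplication": 2, "personalisation": 3},
    "Product B": {"cross-database searching": 4, "deduplication": 3, "personalisation": 3},
}

totals = {name: sum(scores.values()) for name, scores in products.items()}
percentages = {name: round(100 * t / BEST_POSSIBLE) for name, t in totals.items()}

def rank_labels(totals):
    """Rank = 1 + number of products strictly ahead; '=' marks a shared position."""
    labels = {}
    for name, score in totals.items():
        ahead = sum(1 for s in totals.values() if s > score)
        tied = sum(1 for s in totals.values() if s == score)
        labels[name] = f"{ahead + 1}{'=' if tied > 1 else ''}"
    return labels

ranks = rank_labels(totals)
for name in products:
    print(f"{name}: total {totals[name]}, {percentages[name]}%, rank {ranks[name]}")
```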

Conclusion
This research shows that common user interface software products are widely available, and that many of the products reviewed are relatively mature. The key search feature associated with a common user interface, cross-database searching, is supported by most of the products reviewed, and the standard of other search features is encouragingly high. The number of complementary features enabling personalisation and customisation indicates that the trend towards holistic, portal-style library products is not a passing fad but a logical evolution of this kind of software. Overall, the standard of the common user interface software products reviewed was very high, with products on average supporting around 75 per cent of the evaluation criteria. MuseSearch, ENCompass, MetaLib, Single Search and WebFeat are the five highest scoring common user interface software products, and each is highly functional. Products with average or below average scores may have adequate functionality for many libraries, depending on the extent of their requirements. The presence of an open source product, Site Search, should be an encouraging sign for libraries that have previously avoided common user interface software products on the grounds of cost.
These common user interface software products were reviewed without reference to live software and without user involvement. This type of review is an effective first step toward choosing a product, but libraries should also evaluate live versions of the common user interface software products where possible. Access to live versions of products can usually be arranged through the vendors or by contacting a library that is already using that product. User testing before purchase can offer valuable insight into a product's effectiveness. Librarians are encouraged to use the evaluation criteria and evaluations presented in this report as a basis for selecting a common user interface software product, but these should be adapted to fit the circumstances of the individual library or group of libraries.
The evaluation criteria chosen for use in this research are representative of the features currently present in common user interface software products. The survey results indicate that many of the criteria evaluated are established features that are present in the majority of products and that other features are maturing rapidly. Ongoing research will be required to chart the evolution of common user interface software and the progression of individual products.


Visual image repositories at the Washington State University Libraries

Trevor J. Bond

The author
Trevor J. Bond is Special Collections Librarian, Manuscripts, Archives, and Special Collections, Washington State University Libraries, Pullman, Washington, USA.

Keywords
Visual databases, Copyright law, Partnership

Abstract
The World Civilizations Image Repository (WCIR) and Photos Online are two collaborative image database projects under way at the Washington State University (WSU) Libraries. These projects demonstrate how the WSU Libraries have employed OCLC/DiMeMa's (Digital Media Management) CONTENTdm in partnership with other University departments to develop visual collections free from copyright restrictions, as well as to manage "born digital" images on a collaborative basis.

Introduction

Since the publication in 2002 of papers by Raym Crow and Clifford Lynch (Crow, 2002; Lynch, 2003) about libraries serving as institutional repositories (i.e. academic libraries developing digital collections that preserve and provide access to the intellectual output – such as working papers, dissertations, and data sets – of their respective universities), there has been a great deal of interest among librarians in creating partnerships with various campus groups to begin collecting these materials. This article discusses two pilot projects under way at the Washington State University (WSU) Libraries that seek to organize, preserve and disseminate high-quality images created by WSU faculty and staff. Although these efforts represent only one small aspect of a comprehensive institutional repository (images), they nevertheless demonstrate practical approaches for employing differing methods of collaboration between the WSU Libraries and other campus departments, as well as the central role of the WSU Libraries in maintaining access to visual materials. The projects also provide an opportunity for the Department of Manuscripts, Archives, and Special Collections (MASC) to develop collections of images in both analog and digital formats that are relevant to the campus community and at the same ensure that fragile “born digital” photographs are cataloged and maintained in such a way that they will be available in the future. The projects that will be discussed are: . the World Civilizations Image Repository (WCIR), a collaborative image database for use by WSU faculty in teaching World Civilizations courses; and . Photos Online, a collection of current photographs taken by campus photographers intended for official University publications and marketing efforts. The WSU World Civilizations courses are called General Education 110 and 111, and are required for students entering the University. The classes are global and comparative in approach, with an emphasis on interdisciplinary content including the material base of each civilization, its social system, ideological framework and creative arts[1]. Both projects employ OCLC/DiMeMa’s CONTENTdm software[2].

Library Hi Tech, Volume 22 · Number 2 · 2004 · pp. 198-208. © Emerald Group Publishing Limited · ISSN 0737-8831 · DOI 10.1108/07378830410543511

Received 26 May 2003 Revised 30 June 2003 Accepted 12 August 2003


WCIR
The impetus for the WCIR came from the recent (2002) donations of two faculty collections to the WSU Libraries, as well as the need of the World Civilizations faculty for relevant images to use in lectures and course assignments. Given the rapid pace of change in classroom technology and the use of Web-based learning communities, faculties are especially interested in obtaining electronic images without fear of copyright concerns. Public domain images may be posted in Web-accessible syllabuses, integrated in streaming PowerPoint lectures for distance courses, and generally shared in ways that would violate the fair-use clause for copyrighted images. Furthermore, WSU does not maintain a slide library for the General Education faculty, so even copyrighted images can be difficult to find on campus. With this need on the part of the teaching faculty and the two donations mentioned above, the idea then took root to expand the Library's existing digital collections, which focus on Campus and Northwest History, with a series of image databases tailored to the World Civilizations Program. With the donated photographs covering Central Asia, China, Turkey and Japan, as well as historical engravings selected from the WSU Libraries' rare book collection, these image collections combined became the working test site for the World Civilizations Image Repository[3]. After a presentation by WSU librarians to the World Civilizations faculty demonstrating the WCIR and the criteria for adding images to the project, additional faculties have pledged collections and provided suggestions for content (see Figure 1). The basic criteria for accepting images are the following:
• the donor must hold copyright over the images and be willing to transfer those rights to the Washington State University Libraries; and
• the images should correlate with the cultures, topics and themes as outlined in the "Covenant" of course objectives and coverage for teaching World Civilizations 110 and 111 developed by the Department of General Education.
This Covenant is a document of agreements created by a group from the Washington State University World Civilizations faculty that not only outlines the course objectives, topics and cultures covered in the series, but also includes a library research assignment and a cultural assignment. According to the document, "the Covenant is binding on all of us who teach the course". These guidelines were developed by the core World Civilizations faculty in a curriculum project funded by the National Endowment for the Humanities[4].

The images should also be appropriate for classroom use. What is appropriate for classroom use? Generally speaking, personal family photographs are not included in the collections. We are especially interested in images of archeological sites that convey a sense of the geography, topographic features and material culture of various civilizations, architecture, religious rituals, and cultural events including traditional dress, dance and art. So far, we have worked with one professor who self-selected the images to be included in the WCIR. In another case, the faculty member asked us to choose what we thought was best. With all of the collections, we have only included photographs for which the faculty member holds the intellectual property rights, and is willing to donate these rights for academic use. CONTENTdm The image database software used for the WCIR and the Photos Online project is CONENTdm. We first began working with CONTENTdm at WSU in 1999, while the product was a collaborative research project between a team led by Professor Greg Zick in the University of Washington Center for Information Systems Optimization (CISO) and the University of Washington Libraries. In 2001 a new company, DiMeMa (Digital Media Management) Inc., independent of the University of Washington, was formed to focus on research and product development[5]. DiMeMa and OCLC then formed a partnership in which OCLC supplied the marketing and support for CONTENTdm[6]. We started working with CONTENTdm in 1999 as part of a Digital Images Initiative led by the State Library of Washington. After the series of grants ended we have continued to use the software, for several reasons. It is very flexible in creating the look of any given database. Library staff can design the image databases with a range of searching options including pre-selected searches, such as a single hyper-link for a particular search, a drop-down list of search topics, a simple key-word (or Boolean-enabled) search box, an advanced search engine, and the ability to browse all of the objects in a given collection. Collections may also be combined for cross-database searching. All of these features can then be placed on a Web site, with the result that the database interface can be designed for any intended audience. The software also allows one to determine how the search results will look, i.e. how many thumbnail images will appear, what the results screen will contain (i.e. navigation bars), and even the option to have designated descriptive fields, such as subject or genre terms, hyper-linked so that if users click on a


Figure 1 Working model of the World Civilizations Image Repository, utilizing CONTENTdm’s contextual client

given term CONTENTdm will launch a search on that word within the given field. For librarians accustomed to working with inflexible OPACs or in-house programmed databases after the programmer has left for another job, these are heady choices indeed. The decision made by Professor Zick in the 1990s to map the CONTENTdm metadata structure to the emerging Dublin Core standard was a good one[7]. Dublin Core metadata includes 15 elements such as title, creator, date, and subject. It is flexible and easy for temporary workers to learn to use. The adoption of the Dublin Core standard by the Open Archives Initiative ensures that the metadata created in CONTENTdm collections can be harvested by other projects. Also, the software allows for several export options, including ASCII and XML. According to the DiMeMa Web site, “CONTENTdm provides support for the Open Archives Initiative Protocol for Metadata Harvesting Version 2.0 (OAI-PMH v2), an

emerging standard for metadata harvesting. CONTENTdm Servers can function as OAI repositories” (DiMeMa, Inc., 2004). As an example of CONTENTdm’s OAI compatibility, Alan Cornish, a WSU Systems Librarian, worked with the ARC project at Old Dominion University to harvest 10,526 images from our CONTENTdm collections to their database[8]. CONTENTdm’s OAI compatibility is not yet totally seamless. Challenges still remain with the rate at which OAI harvesting occurs (CONTENTdm collections load all at once instead of in a steady flow) and the way that CONTENTdm displays some characters, such as apostrophes, which causes them to disappear after harvesting. DiMeMa is aware of these difficulties and will no doubt address them. At WSU, we have also seen DiMeMa’s willingness to take on product development for specific projects. In 2000, we began a project with the University of Washington Libraries to scan and describe historical maps. As part of the project, we


wanted to compress our high-resolution map scans into the Lizard Tech MrSID format[9]. DiMeMa programmed the CONTENTdm software to include MrSID files as one of the acceptable highresolution formats (TIFF and JPEG are the others). This allowed us to incorporate the benefits of the MrSID files within a CONTENTdm database[10]. The result is that users visiting the Early Washington Maps site can take advantage of CONTENTdm’s numerous search options, such as keyword searching, selecting from a predefined list of topics, and clicking on icons provided on a graphic index or in a historical timeline. Once an icon is selected and a full-screen image appears with its description, users then have the option to click on a URL which leads them to a MrSID image viewer that has the same look as the rest of the database and the functionality to click and zoom to fully study the details of a given map.
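As a rough illustration of the OAI-PMH support described above, the sketch below harvests Dublin Core records from an OAI-PMH v2 provider and prints a few fields. The base URL is a placeholder rather than the address of an actual CONTENTdm server, and a production harvester would also follow resumptionToken elements for large result sets.

```python
# Rough sketch of harvesting Dublin Core records over OAI-PMH (v2).
# The base URL is a placeholder; point it at any OAI-compliant repository.
from urllib.request import urlopen
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://example.org/oai"   # hypothetical OAI provider
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records(base_url):
    """Yield (identifier, title, subjects) for each harvested record."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    with urlopen(base_url + "?" + urlencode(params)) as response:
        tree = ET.fromstring(response.read())
    for record in tree.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        identifier = header.findtext("oai:identifier", default="", namespaces=NS)
        dc_block = record.find(".//oai_dc:dc", NS)
        if dc_block is None:          # deleted records carry no metadata block
            continue
        title = dc_block.findtext("dc:title", default="", namespaces=NS)
        subjects = [s.text for s in dc_block.findall("dc:subject", NS)]
        yield identifier, title, subjects
        # A full harvest would also request further batches via resumptionToken.

if __name__ == "__main__":
    for identifier, title, subjects in list_records(BASE_URL):
        print(identifier, "|", title, "|", "; ".join(filter(None, subjects)))
```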

Design features of the WCIR
The current front page (see Figure 1) of the WCIR makes use of CONTENTdm's numerous searching features embedded in a custom Web interface or contextual client. The page provides access to various collections:
(1) engravings from the MASC rare book collections;
(2) Central Asian and Chinese photographs by Professor Marina Tolmacheva; and
(3) photographs of Turkey and Japan by Professor Paul Brians.

Work flow for the WCIR Project
At the start of the project (spring of 2003), several members of the World Civilizations faculty mentioned that they had intended for some time to label and scan images, but had not had the chance to do so. Participating in the WCIR project allows the faculty member to concentrate only on the selection and description of images, while scanning, metadata entry, database design and maintenance all take place within the library. The basic work flow begins with an expression of interest on the part of a faculty member. If the professor holds the copyright to the images (that is, if he or she took the pictures) and the photographs fall within the cultures and topics covered in the World Civilizations courses, the faculty member is asked to sign a deed of gift transferring the intellectual property rights for academic use to the WSU Libraries. The faculty member is asked to provide as much metadata as possible, generally a good caption including location and date. The images are then scanned (or retrieved from a CD in the case of digitally processed photographs) by temporary employees working in the Department of Manuscripts, Archives, and Special Collections (MASC) at the WSU Libraries, and the electronic files are imported into the CONTENTdm acquisition station, where the Dublin Core metadata is added. CONTENTdm's template creator feature allows us to repeat regularly recurring elements, such as creator, source, and publisher. We use controlled subject and type/genre terms selected from the Library of Congress's Thesaurus for Graphic Materials[11]. The Getty's Thesaurus of Geographic Names has also proved invaluable for completing the Dublin Core coverage field[12]. Once the images and metadata are added to the database, we solicit additional comments from the donor and revise the database accordingly.
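The cataloguing step can be pictured as merging collection-level template defaults with item-level caption data and filtering terms against controlled vocabularies. The sketch below is only an illustration of that idea — the vocabulary lists, field values and record layout are invented stand-ins, not CONTENTdm's internal format.

```python
# Illustrative only: assemble a Dublin Core-style record for one image by
# merging template defaults (fields repeated across a collection) with the
# caption supplied by the donor. The vocabulary lists here are stand-ins for
# the LC Thesaurus for Graphic Materials and the Getty TGN.

TEMPLATE = {                      # repeated for every item in the collection
    "creator": "Brians, Paul",
    "source": "World Civilizations Image Repository",
    "publisher": "Washington State University Libraries",
    "rights": "Donated for academic use",
}

TGM_SUBJECTS = {"Mosques", "Public baths", "Silk industry"}      # stand-in list
TGN_PLACES = {"Turkey", "Istanbul (Turkey)", "Japan"}            # stand-in list

def make_record(title, date, subjects, coverage):
    record = dict(TEMPLATE)
    record.update({"title": title, "date": date})
    # Keep only terms present in the controlled vocabularies.
    record["subject"] = [s for s in subjects if s in TGM_SUBJECTS]
    record["coverage"] = [c for c in coverage if c in TGN_PLACES]
    return record

print(make_record(
    title="Courtyard of the Blue Mosque",   # invented example caption
    date="2002",
    subjects=["Mosques", "vacation snapshot"],
    coverage=["Istanbul (Turkey)"],
))
```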

The images in the WCIR are a mixture of scans made from analog prints or book illustrations as well as “born digital” images. Although Professor Tolmacheva used a traditional camera during a trip to China in 2003, the images she donated were also processed as digital images and stored on CDs. During his trip to Turkey in 2002, Professor Brians used a digital camera exclusively. After he returned to Pullman, he donated the full-resolution images along with captions to the Libraries. His collection of photographs of Japan was similarly provided on CD. Both of Professor Brians’ collections also feature CONTENTdm’s full-resolution option, whereby users can download high resolution images. There are two ways to access the full-resolution images. In the primary (contextual) search client, after an image is selected to view, a link labeled “Full Resolution” is provided at the bottom of the metadata. Clicking on this link will initiate the option to download the full-resolution file. The second method of accessing these files is through the advanced search (HTML) client. After the user selects an item from the list of results, CONENTdm opens a second window with the JPEG image[13]. Above this image are a series of three tabs with options for “Image”, “Description”, and “Full Resolution”. Selecting the “Full Resolution” tab will open a third browser window with the full-resolution file (see Figure 2). There are four primary ways to search from the WCIR front page. First, users can select any of the 29 pre-selected searches located in a drop-down box on the upper left of the screen under the heading “Browse by Subject”. This list will expand over time as new images are added to the various WCIR collections. A selection of current topics includes images of Africa, China, Maps, Mosques, Public baths, Silk making and Troy. The second way, placed to the right of the dropdown box, is a search box where users can enter keyword terms. This search option is set to accept Boolean operators. Also included in this top row of options is an option at the top of the page is a link to other WSU Digital Collections. The third way to access the collections is a series of four graphic buttons aligned to the left of the


Figure 2 CONTENTdm’s full resolution option in the contextual and HTML “Advanced Search” clients


screen, labeled Engravings (from MASC), Silk Road (including Central Asia and China by Tomacheva), Turkey, and Japan (Brians). Clicking on any of the graphics will open a new window to the corresponding individual collection, each with its own custom interface and results screen. The final method is located in the paragraph of text to the right of the buttons that describes the WCIR project and provides links to the individual collections. The framework of the WCIR therefore allows users to search across all collections or search individual databases for a given topic or assignment. The search results are displayed in two rows consisting of five columns of thumbnail images. The title displays beneath each thumbnail result. A navigation bar along the top the screen allows users to return to the front of the World Civilizations Image Repository or to choose from a number of other sites: a complete list of CONTENTdm digital collections at WSU, the MASC home page, and the World Civilizations home page maintained by the WSU Department of General Education. Once the user clicks on a thumbnail image or the text below the images, a full-screen JPEG image appears with the descriptive metadata below. Again, the same navigation banner appears at the top of the screen. One last feature of note on the World Civilizations Image Repository’s front page is a sample reserve page on Chinese silk making. We created this reserve page using CONTENTdm’s “My Favorites” feature. When browsing images in CONTENTdm, the option “Add to My Favorites” is provided. If one clicks on this link, CONTENTdm sends cookies to one’s machine which point to the specified images. CONTENTdm, through a series of basic steps, will then allow the instructor (or any user) toedit his or her “favorites” and then generate a basic HTML page. We followed this process after choosing a series of three engravings from a 1736 edition of Jean Bapist Du Halde’s Description Ge´ographique, Historique, Chronologique, Politique, et Physique de l’Empire de la Chine showing the process of silk production (see Figure 3). Once CONTENTdm generated the plain HTML page based on the selected “My Favorites” images, we simply added the navigation bar used in the WCIR, a title and a couple of sample study questions. This method allows us to highlight any given set of images in the WCIR as they match a given topic or culture in a World Civilizations class. The reserve pages also remove any concerns about the students’ inability to locate specific images within the database. However, it should be noted that we were unable to get CONTENTdm’s “My Favorites” to work on several WSU Library computers. Browsers configured for a high level of

security are incompatible with this feature (see Figure 4).
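Because the "My Favorites" export is plain HTML, the finishing touches described above amount to simple text insertion. The sketch below is hypothetical — the file names, banner markup and study questions are invented — but it indicates how such a reserve page might be assembled.

```python
# Hypothetical post-processing of a "My Favorites" HTML export: insert a
# navigation banner after <body> and append a title and study questions.
# File names and markup are illustrative, not the actual WSU pages.

BANNER = '<div class="wcir-nav"><a href="index.html">World Civilizations Image Repository</a></div>'
QUESTIONS = """
<h2>Chinese silk production</h2>
<ol>
  <li>What stages of silk making do the three engravings show?</li>
  <li>How does the 1736 source present the labour involved?</li>
</ol>
"""

def build_reserve_page(favorites_html_path, output_path):
    with open(favorites_html_path, encoding="utf-8") as f:
        html = f.read()
    html = html.replace("<body>", "<body>\n" + BANNER, 1)        # add navigation bar
    html = html.replace("</body>", QUESTIONS + "\n</body>", 1)   # add study questions
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(html)

# Example use (hypothetical file names):
# build_reserve_page("favorites_export.html", "silk_reserve_page.html")
```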

Sustainability and future plans The WCIR is regularly backed-up on magnetic tape. We also burn CD and DVD disks with the fullresolution images. Full-resolution images have also been made available to the Department of General Education for use by the teaching faculty. As we move beyond this pilot phase, we plan to make the World Civilizations Image Repository images and metadata available for OAI harvesting. Given our ability to export the metadata from CONTENTdm and the storage of the full resolution images both online and on CD/DVD, we should be ready to migrate the databases to other systems if necessary. With these first databases completed, we plan to solicit and begin work over the summer on additional collections, while applying for external funds. Even without grant support, both the WSU Libraries and General Education would like to expand the project, and will do so through general operating budgets. The WCIR exemplifies the close collaboration between the librarians and faculty teaching in the Department of General Education. WSU librarians have taken responsibility for much of the labor involved in building these image databases by scanning the images, adding metadata, designing the databases, and providing training for their use. At the same time, the teaching faculty has demonstrated a willingness to contribute images along with the corresponding intellectual rights and to assist in providing captions so that their visual collections can be used by other faculties in teaching. The Libraries have benefited by being the recipients of donations of collections that are relevant to ongoing instruction on campus. The WCIR also reflects several of the key goals (collaboration with other departments, to strengthen collections, and to adapt technology to meet undergraduate needs and expectations for access to information) outlined in the WSU Libraries’ 2002 strategic plan, entitled “The Information Union: the WSU Libraries of the Future”[14]. Furthermore, given the challenges of describing and providing access to electronic data (in this case digital images), the project shows how CONTENTdm can facilitate access to and management of digital materials.

Photos Online
Photos Online is another partnership between the WSU Libraries and the WSU Marketing and Communications Division to create a Web-accessible image database for use by the University


Figure 3 Course reserve page on Chinese silk production using the CONTENTdm “My Favorites” feature

Figure 4 Detail of Chinese silk making


community in Web sites, PowerPoint presentations (especially those focused on recruiting efforts), brochures and other publications[15]. Photos Online differs from the WCIR in that the Marketing and Communications Division oversees the addition of new images complete with metadata to the database.

Need for partnership
In 2000, the Marketing and Communications Division wanted to move their visual library (primarily slides and contact prints) to the Web so that users from around the WSU campus could browse and select images with minimal assistance from Marketing and Communications Division staff. The Marketing and Communications Division also wanted their photographs in some type of database format so that they could keep better control over them. The photographers were also in the process of changing to digital cameras for most of their work, so the difficulties of organizing and preserving digital files in addition to hard copies became manifest. At the same time, as the librarian in charge of the WSU Libraries' Historical Photograph Collections in MASC, I was becoming increasingly concerned over the long-term retention of these digitally created resources. The WSU Libraries has served for the last 30 years as the repository for the inactive photographs taken by campus photographers. Indeed, these images comprise the bulk of the Libraries' campus visual collection, and are used frequently by students, faculty, alumni and the public. Working again with my colleague Alan Cornish from Digital Collections and Systems, we proposed that the Libraries would provide image database software (CONTENTdm), training, and metadata advice, as well as hosting the database on a WSU Library server. In exchange, the Marketing and Communications Division would dedicate the staff resources to scan and describe the images, and agree to transfer the images to the WSU Libraries once they are no longer current. The Marketing and Communications Division also chose to restrict access to the Photos Online images to the WSU community. All such restrictions will be lifted once the Libraries receive the images. Given the recentness of this project, we have only transferred two images from the active Photos Online database to the Libraries' Photos Online Archive.
How long do images stay current? According to the Marketing and Communications Division, photographs of people are generally retired within five years, while buildings and natural scenes stay "current" a little longer. In the past, transfers of analog photographs (prints, contact sheets and slides) between the Marketing and Communications Division and the Libraries have been made ten to 15 years after the images were initially taken. So even with the Marketing and Communications Division retaining photographs of buildings and natural scenes beyond five years, I anticipate that we will receive the materials sooner than in the past, and unlike analog photographs, these digital images will arrive pre-selected with item-level metadata.
To get the Photos Online project started, we held a series of project meetings to work out the terms of the agreement, and to discuss unqualified Dublin Core metadata and the use of CONTENTdm software. The WSU Library provided a basic PC workstation with the CONTENTdm acquisition program loaded. The machine is mapped to a Library server that holds the database folders. This was necessary because the campus photographers in the Marketing and Communications Division work exclusively with Apple machines. As yet, CONTENTdm does not function with the Mac operating system. We also discussed the importance of using a controlled vocabulary, such as the Library of Congress's Thesaurus for Graphic Materials, in the database. The campus photographers developed a clever metadata sheet (see Figure 5) on which the photographers circle who took the pictures, equipment used, subject categories, etc. These sheets are then provided along with the images to the Marketing and Communications staff, who in turn enter the metadata into the CONTENTdm database. Once the images and metadata are transferred to the Libraries, my intent is to edit these terms globally to conform with controlled vocabularies (see Figure 5).
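Editing the photographers' terms "globally" to match controlled vocabularies is essentially a batch substitution over the subject field once the records reach the Libraries. The sketch below illustrates the idea with an invented mapping and sample records; it is not the actual WSU workflow.

```python
# Illustrative batch normalisation of photographer-supplied subject terms to
# controlled headings before records are transferred to the Libraries.
# The mapping and sample records are invented for the example.

TERM_MAP = {
    "campus scenes": "Universities & colleges",
    "sports": "Athletics",
    "labs": "Laboratories",
}

records = [
    {"title": "Chemistry class, Fulmer Hall", "subject": ["labs", "students"]},
    {"title": "Homecoming game", "subject": ["sports"]},
]

def normalise(record, term_map):
    # Replace any term found in the mapping; leave unmapped terms untouched.
    record["subject"] = sorted({term_map.get(term, term) for term in record["subject"]})
    return record

for r in records:
    print(normalise(r, TERM_MAP))
```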

Design features of Photos Online
The front page of Photos Online includes three search options:
(1) a keyword search;
(2) a drop-down box of predefined searches; and
(3) an advanced search engine.
The first two features are similar to those described in the World Civilizations Image Repository project above (see Figure 6). The advanced search option is a standard database interface provided by DiMeMa. It is ready for use as soon as objects are loaded into a database. The advanced search engine also provides an immediate database interface that is useful during the collection-building process, and may also serve as a more permanent search option for institutions without the design expertise (or time) required for creating a graphic interface (contextual client) with custom results screens. The advanced search engine allows users to search across individual or multiple fields, select search preferences, such as Boolean operators, and browse terms used in any


Figure 5 Metadata checklist designed by WSU’s campus photographers

searchable field. It also enables users to browse an entire collection and create the “My Favorites” pages described above (see Figure 7). Information about the purpose of the Photos Online database, the image request forms and official graphic identity photograph guidelines are also provided. As the department on campus charged with developing and policing WSU’s graphic identity program, staff from the Marketing and Communications Division were pleased with the flexibility that CONTENTdm provides in designing the database interface. Indeed, a Marketing and Communications Division Web designer incorporated the official WSU banner (required on every campus Web site) along the top of the results screens (see Figure 8). Images are selected for inclusion in the Photos Online database on the basis that they will be of general interest to departments around campus. Currently the images added to Photos Online are a mixture of scanned slides and images taken with a digital camera. With the growing use of digital cameras, photographers anticipate that they will soon stop scanning slides and work almost exclusively with these “born digital” images. Without this agreement, there is a high probability

that many of these digital images will be lost, or if not lost, then poorly described. One unforeseen by-product of the Photos Online database is that Web masters from around the WSU campus are using the Web-accessible, low-resolution images from the site for departmental homepages, and in the process improving the overall look of the WSU Web site.

Conclusion
The WCIR and Photos Online projects show two models of collaboration between the WSU Libraries and campus units to provide Internet access to and preservation of quality visual materials, while at the same time managing copyright issues. For the WCIR, WSU librarians have played a hands-on role in soliciting collections, working with faculty to organize and catalog images, and preparing a workable database environment with collections that can be cross-searched or examined individually. In addition, librarians have worked closely with the World Civilizations faculty to ensure that relevant topics


Figure 6 CONTENTdm’s advanced search HTML client

Figure 7 Photos Online front page


Figure 8 Search results for “events”

are emphasized and that these images may be used in teaching. On the other hand, the Photos Online database demonstrates a mutual agreement between the Marketing and Communications Division and the Libraries, where the Libraries provide training and support and serve as a repository for visual materials and the Marketing and Communications Division devotes the staff and the images to expand the database. With both projects, we are taking small steps to describe and maintain fragile digital images created by campus faculty and staff.

Notes
1 See www.wsu.edu:8080/%7Ewldciv/teachindex.html
2 DiMeMa, CONTENTdm software, available from http://contentdm.com/
3 The working pilot site for the World Civilizations Image Repository, www.wsulibs.wsu.edu/holland/masc/xworldciv.html
4 See www.wsu.edu:8080/%7Ewldciv/covenant.html
5 See www.contentdm.com/about-us.html
6 See www.oclc.org/digitalpreservation/services/
7 See http://dublincore.org/
8 See http://arc.cs.odu.edu/
9 See www.lizardtech.com
10 See www.wsulibs.wsu.edu/holland/masc/xmaps.html
11 See www.loc.gov/rr/print/tgm1/ and http://lcweb.loc.gov/rr/print/tgm2/
12 See www.getty.edu/research/tools/vocabulary/tgn/
13 See www.wsulibs.wsu.edu/holland/masc/xturkey.html
14 See www.wsulibs.wsu.edu/general/iuplan2.html
15 See www.wsu.edu/photos-online/ (access to this database is limited to those with WSU credentials)

References
Crow, R. (2002), "The case for institutional repositories: a SPARC position paper", ARL Bimonthly Report, No. 223, pp. 1-4.
DiMeMa, Inc. (2004), "CONTENTdm features", available at: www.contentdm.com/products/features.html
Lynch, C.A. (2003), "Institutional repositories: essential infrastructure for scholarship in the digital age", ARL Bimonthly Report, No. 226, pp. 1-7.


GIS in the management of library pick-up books

Jingfeng Xia

The author
Jingfeng Xia is a Masters Student in the GIS Program, Department of Geography, University of Calgary, Calgary, Canada.

Keywords
Geographic information systems, Libraries, Shelf space, Collections management, Books

Abstract
The management of library "pick-up books" – a phrase that refers to books pulled off the shelves by readers, discarded in the library after use, and picked up by library assistants for reshelving – is an issue for many collection managers. This research attempts to use geographic information system (GIS) software as a tool to monitor the use of such books so that their distributions by book shelf-ranges can be displayed visually. With GIS, library floor layouts are drawn as maps. This research produces some explanations of the habits of library patrons browsing shelved materials, and makes suggestions to librarians on the expansion of library collections and the rearrangement potential for library space.

Library Hi Tech, Volume 22 · Number 2 · 2004 · pp. 209-216. © Emerald Group Publishing Limited · ISSN 0737-8831 · DOI 10.1108/07378830410543520

Introduction

In order to create greater effectiveness in dealing with library management concerns such as reconfiguration of space, acquisition of material and control of usage, it is essential to obtain accurate information on book use in a library (Nkereuwem and Eteng, 1994, p. 37). The convention in collecting and analyzing such information has been to conduct scientific surveys (McGrath, 1971; Morse, 1978). The problem with such surveys is that they might not show the overall picture of collection utilization in a library if they are carried out at unsuitable times or with small samples, or if analytical strategies are applied unsuitably. A full survey involving all a library's collections would be quite expensive in terms of time and labor involvement. Further, no survey can provide a dynamic mechanism for librarians to maintain a continuing set of data to monitor book usage in the long term.
One solution is an automated tool that includes a database containing data on book use (which can be entered manually by people or scanned in by a device), an analytical mechanism to manipulate data, and an integrated interface to show the data in a dynamic and visual form. Working with this tool, librarians will be capable of controlling the necessary information regarding book utilization, and of making management decisions accordingly.
This project is an effort to use ArcView, the most popular geographic information system (GIS) software in the world, to develop a book management tool that links a database to visual maps. The maps represent the floor plans of the MacKimmie Library at the University of Calgary, with individual book shelf-ranges as the map features, whereas the database contains data of in-library book use. A multiple-week observation was designed to record the call numbers of all books that were used in selected collection areas of the library and to enter the numbers into the database. Upon analysis of the raw data, ArcView links the resulting figures to each individual feature (e.g. shelf-ranges) on the maps and draws the use frequencies of the books with different shades of colors. The tool is capable of providing information to indicate the actual extent of book use in the library, and immediately reflects any updates in the database.

Received 9 November 2003; revised 8 January 2004; accepted 9 January 2004.
The author wishes to thank the librarians and librarian assistants at the MacKimmie Library for allowing the recording of numbers of library pick-up books, and two anonymous reviewers for their helpful comments on the first draft.
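The heart of such a tool is a join between the observed call numbers and the shelf-range features drawn on the floor-plan maps. The sketch below shows only the aggregation step, with invented ranges and a naive string comparison standing in for proper Library of Congress call-number normalization; the ArcView linkage itself is not shown.

```python
# Sketch of the aggregation step: count pick-up observations per shelf range
# so the totals can be joined to range features in the floor-plan map.
# Ranges and call numbers are invented; real LC call numbers need proper
# normalisation before they can be compared like this.

SHELF_RANGES = [                      # (range_id, first_call_number, last_call_number)
    ("5-01", "AC1", "AZ999"),
    ("5-02", "B1", "BX4999"),
    ("5-03", "C1", "DX999"),
]

observed = ["B72 .R87 2001", "BL80 .S6", "AZ103 .K4", "D16 .M37 1998"]

def range_for(call_number, ranges):
    """Return the id of the first range whose bounds bracket the call number."""
    for range_id, start, end in ranges:
        if start <= call_number <= end:      # naive string comparison
            return range_id
    return None

counts = {}
for call_number in observed:
    range_id = range_for(call_number, SHELF_RANGES)
    if range_id:
        counts[range_id] = counts.get(range_id, 0) + 1

print(counts)   # e.g. {'5-02': 2, '5-01': 1, '5-03': 1}
```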


Background One of the problems that most libraries have been facing for a long time is the expansion of library collections and the reconfiguration of space (Fraley and Anderson, 1990, p. 67). Each year, numerous publications are added to the market. Libraries are involved in the ongoing acquisition of newly published materials, and these are placed on shelves for the use of library readers. Most libraries, however, have already reached their maximum construction potential. The dilemma of indefinite growth due to new publications against a library’s physical space constraints has become a great concern for many librarians and library managers. New technologies have provided some solutions, such as the emergence of electronic substitutes and the use of microfilms/microfiches to save collection storage room. However, experience has shown that electronic forms have been popular only in the area of e-journals. People still prefer a regular printed book to an electronic version. Microfilms and microfiches are currently restricted to storing archives and unpublished student theses. Some other strategies have been applied to manage library collections space, but they too have limitations. For example, compact shelves cram collections together in small areas. The problem with compact shelves is the inconvenience caused for library readers, who have to move the shelves before they can go into an aisle. If somebody is already in an aisle, those who need a neighboring aisle have to wait to shift shelves until the current reader moves out. Alternatively, librarians have relied on analyzing library collections individually to determine space requirements for future collection growth (Leighton and Weber, 1986; Montanelli, 1987). One way of doing so is to analyze the history of book development and project growth accordingly. “A more reliable technique is to average the number of pieces added during the past five years and consider this the annual growth figure in the space data file” (Fraley and Anderson, 1990, p. 50). Another method is to adjust collection content following the strategies of “zero-growth” and “weeding”. Most European research libraries and some US libraries have adopted the “zero-growth” and “weeding” concepts to manage their collections (Kent, 1979; Lancaster, 1988; Truewell, 1968). Their applications are based on the principle that some books are never used, and should be discarded to make room for other books. Inevitably, the criteria for “weeding” books are important and can only be established by performing thorough studies on the utilization of library materials. Successful management of the “weeding” will guarantee “zero-growth” so that the total number of collections in a library stays the same and the collection space is expected to

remain relatively unchanged (Arfield, 1993). Increases in book utilization can be accompanied by a rearrangement of different book categories. There are two ways of examining quantitatively the utilization of library books: (1) by counting books checked out and in; and (2) by calculating the floor pick-up books within the library. These methods deliver similar but slightly different information to elucidate book use by library patrons. Previous studies have focused on working with the former (Truewell, 1968). However, floor pick-up is a different reading behavior of library users (McGrath, 1971; Nkereuwem and Eteng, 1994). It can be assumed that when readers use books inside a library, they will expect to get a quick reference, or read only part of a book, or just browse periodicals. The types of books used inside a library will vary more widely than the types of books checked out of the library, and their numbers are larger. Hence, for this research, floor pick-up books convey more appropriate information than checked-out books. During the past two decades, a technological innovation – namely the introduction of radio frequency identification (RFID) – has enabled libraries to automate the gathering of book-use information. The RFID system uses an antenna to emit radio signals to activate tags and read and write data to them, while the tags are embedded in library items and are stored with vital bibliographic data. With the capability of tracking objects without contact, RFID can efficiently and effectively control data acquisition and communication of library books, in addition to improving library workflow and enhancing security. Libraries that have installed this system find monitoring the use of both library check-out and floor pick-up books easy. However, RFID was not designed for analyzing as well as presenting data. This is where GIS software can come into play. GIS is capable of demonstrating data of a spatial format in visualized forms to give people an easy understanding of the data. The distribution patterns of book use on shelfranges, either checked-out or picked-up books, are spatial in nature, though not geo-referenced. Books used inside a library can be clustered in certain layers of a shelf on the micro-scale, or they can be found grouping around certain regions of bookshelves. On the macro-scale, one may find heavily used books crowded on certain floors and sparse on other floors. Although these data are not geo-referenced, GIS will still handle them well and be able to compile the information for display so that the librarian is able to manage space effectively by “weeding” unused items. GIS was used as early as a decade ago by many European public libraries as a decision support


tool to assist the management of library routines (Hawkins, 1994; Tombarge, 1999). It was not until recent years that new GIS products, such as LibraryDecision, were marketed in the USA to fulfill the tasks of analyzing library needs and making the analytical information available for library management and operation. Such software packages are able to associate library statistical data with the geographical locations of neighboring communities, so that certain distribution patterns of library users can be displayed visually on maps. Interested readers can refer to the demonstration provided on the Web site of one of the GIS software companies[1]. However, there are several limitations of this type of GIS library management tool. For example, they may primarily suit the needs of public libraries rather than research libraries, where geographical analysis of library service territories is not the main basis for decision making. Research libraries – mostly university libraries – pay more attention to the infrastructures of research and teaching that they serve than to the geographical distributions of their patrons. Moreover, these products still rely on geo-referenced data for analysis, where all of their functions are linked to traditional maps, such as streets and city blocks, while library floor layouts are not what people consider to be traditional maps. As far as the current author is aware, there is a lack of research on the use of GIS products to deal with non-georeferenced spatial data for supporting the management of library collections and space planning. There is always room for different forms of data to be analyzed to solve the problems that currently exist in many libraries. To be specific, by incorporating library floor layouts into GIS systems, librarians can expect to have an efficient device through which information about book use becomes available and controllable. Some popular GIS products are now available in many research libraries as well as public libraries, making their use relatively inexpensive (e.g. Bergen, 1995; Boisse and Larsgaard, 1995; Cheverie, 1995; Cline and Adler, 1995; Larsgaard and Carver, 1995; Strasser, 1998; Suh and Lee, 1999). More importantly, GIS products are relatively easy to adapt to the needs of different tasks.

GIS contributions

Data collection

It is better to collect data both from books checked out of the library and from books read inside the library, to get an overall view of collection utilization. However, in this project, only books read inside the library were considered. This project is only a preliminary one, aiming to demonstrate how a GIS application can be applied to help librarians manage in-library book use.

A full-scale investigation would be beyond the scope of this paper and would require the full collaboration of library staff. Since an RFID device was unavailable, data collection was done through personal observations. The definition of floor pick-up books used in this paper refers to books and periodicals shelved in the library's collection sections and pulled from the stacks by readers. After being browsed or read, they are left on the floor, at desks, loosely on the shelves, or in places other than their original positions inside the library. Library assistants regularly go through each aisle and corner of the library to pick up these books and gather them in a sorting place, where the books are ordered according to their call numbers. The final step is to reshelve the books into their designated positions on the shelves. On a regular day, the MacKimmie Library runs this pick-up, sort and reshelve cycle three or four times, usually in the morning, afternoon and evening.

The survey was designed to note down manually every call number of the floor pick-up books. It recorded all call numbers for each of the three reshelving cycles on the floors surveyed over two weeks. In order to check for different book distribution patterns, the survey was undertaken during the week of final examinations as well as the week preceding it, between the times when book pick-ups were done and before book reshelving began. During the final exam week, the library was crowded with busy students who relied on library collections to prepare their term papers and examinations. Books were heavily used, and recording the call numbers for one floor took an average of two hours. Therefore, several student assistants were hired to work simultaneously on each floor.

Tables I and II list the total numbers of pick-up books over a period of 14 continuous days for five library floors. These figures were examined to make inter-floor comparisons of book use. Likewise, individual call numbers were entered into a database with the purpose of making comparisons between shelf-ranges on each floor, to look for the relative distribution patterns of library book use in the ArcView analysis.

In order to prepare ArcView to exhibit the different numbers of library pick-up books on maps, the floor layouts in most collection sections were measured. In the library, each rack of bookshelves contains eight similar-sized shelves placed together in a line. Each rack is labeled with a call number that represents the start number of the books on its shelves and another that represents the end number.


Table I Numbers of monographs picked up December 1-15, 2002 in the MacKimmie Library

Day   Fifth   Sixth   Seventh   Eighth   Tenth
1     100     82      56        76       10
2     174     148     153       235      134
3     247     172     146       157      123
4     221     109     79        146      168
5     248     217     124       312      116
6     165     209     78        195      110
7     99      97      86        70       12
8     156     136     31        4        25
9     152     150     158       161      118
10    159     170     81        21       56
11    134     158     61        70       71
12    150     91      45        127      91
13    153     160     65        105      113
14    51      20      55

Table II Numbers of periodicals picked up December 1-15, 2002 in the MacKimmie Library

Day   Fifth   Sixth   Seventh   Eighth   Tenth
1     63      33      67        8        22
2     89      80      78        57       61
3     122     78      94        31       33
4     58      63      81        30       33
5     92      73      71        28       84
6     42      60      44        17       51
7     7       20      38        18       46
8     8       32      25        1        2
9     46      39      55        33       47
10    39      59      32        6        8
11    40      33      61        9        68
12    49      52      31        23       12
13    31      47      31        16       65
14    7

Every two shelf-racks stand together back to back to construct a range of shelves, and an aisle is formed between two ranges to allow readers to browse. The shelf-range was used as the analytical and display unit. The choice of the shelf-range as the key unit makes it possible to display book utilization visually in an easy way.

ArcView was used to draw the shelf-ranges on "themes" for each library floor studied ("theme" is a GIS term for a collection of features drawn on a view). Each individual shelf-range was treated as a "polygon" on the floor maps (a "polygon" in GIS is a feature that represents an area enclosed by specific boundaries; in this research, shelf-ranges are drawn as polygons of a rectangular shape). By examining the distribution of books falling into the call-number scope of each shelf-range and plotting the figures onto the floor maps, ArcView displayed various shades of color (for book numbers) in different polygons (for shelf-ranges), showing the varying densities of in-library book use in different locations. Then, by consulting the start and end call numbers of each shelf-range, the librarian determines what types of books are located there.

The majority of the collections are stored in a 13-story tower building. Some areas, such as the reserve room, the current periodical and newspaper floor, the reference corner, the university archives and the special collection sections, were excluded from this research. To simplify the job, floors containing books on the fine arts, music, maps, government documents, and microfilms and microfiches were also excluded. Therefore, the project only included the fifth, sixth, seventh, eighth and tenth floors, where regular library books and periodicals are stored and are available to the public. The call numbers recorded were B, C, D, E, F, G, H-HN, HQ-HX, J, K, L, N, P-PM, PN-PZ, U, V and Z from the Library of Congress Classification system.
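The core data-manipulation step here is deciding which shelf-range a recorded call number falls into, i.e. which range's start and end call numbers bracket it. The helper below is a minimal sketch of that matching logic in Python; it is not part of the author's ArcView project, and its crude normalization of Library of Congress call numbers (class letters plus class number, ignoring Cutter numbers) is an assumption made only for illustration.

# Hypothetical helper for assigning a pick-up book to a shelf-range by its
# Library of Congress call number. The normalization is deliberately crude:
# it keys on the class letters and the integer class number only.
import re
from bisect import bisect_right

def lc_key(call_number):
    """Turn a call number such as 'BF713 L58' into a sortable key ('BF', 713.0)."""
    m = re.match(r"([A-Z]+)\s*([\d.]*)", call_number.strip().upper())
    letters = m.group(1) if m else ""
    number = float(m.group(2)) if m and m.group(2) else 0.0
    return (letters, number)

def build_index(ranges):
    """ranges: list of (range_id, start_call_number) tuples, one per shelf-range."""
    ranges = sorted(ranges, key=lambda r: lc_key(r[1]))
    keys = [lc_key(start) for _, start in ranges]
    ids = [range_id for range_id, _ in ranges]
    return keys, ids

def find_range(call_number, keys, ids):
    """Return the range whose start call number is the last one <= the book's."""
    pos = bisect_right(keys, lc_key(call_number)) - 1
    return ids[pos] if pos >= 0 else None

# Example with three of the fifth-floor ranges listed in Table III.
keys, ids = build_index([(12, "BF199 A1 L42"), (13, "BF456 R2 R47"), (14, "BF713 L58")])
print(find_range("BF600 B3", keys, ids))   # -> 13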


Call numbers not included were A, M, Q, R and S from the main library, as well as QE, T-TX and W, which are in branch libraries.

After recording two weeks' worth of floor pick-up books in the library, a relational database was designed in Microsoft Access to store the data, which was then imported into ArcView and later connected to the polygons. Several tables were created for the database:
. a RANGES table for shelf-ranges, which lists the start and end call numbers of every range;
. a PICKUPS table, containing the call number of each individual pick-up book; and
. a FLOORS table, with floor identifications, as a joint table.
After the process of database de-normalization, the FLOORS table was eliminated and the other two tables were retained, along with a new table, BOOKS, which held aggregated data. The tables shown in Figure 1 are fairly simple. The underlined fields in Figure 1 are primary and foreign keys. All the field names are self-explanatory except for Auto_ID, which represents a database-generated sequence number. The combination of Book_ID and Auto_ID is used in the PICKUPS table to handle situations where the same book was picked up multiple times.

Figure 1 Tables created for the database


If this database were to be used by the librarian as a management tool to monitor book use, the RANGES table would be filled with data once and updated only when the bookshelves are reorganized. If the database were designed for analyzing book use with other analytical units (e.g. bookshelf or shelf layer) instead of shelf-range, the designer would only need to alter the RANGES table. The operator has to add data to the PICKUPS table in the form of newly recorded call numbers of books; fortunately, RFID could be incorporated to fulfill this task. Whenever an update is done, users need only run a query (see below) to generate aggregated information from the RANGES and PICKUPS tables into the BOOKS table, which is the table connected directly to the GIS maps. The maps will then automatically mirror the update (see Figure 2).

Figure 2 Running a query of the database to generate aggregated information for GIS maps
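As a rough illustration of this workflow, the sketch below builds the de-normalized tables and the aggregating query in SQLite (via Python's sqlite3 module) rather than Microsoft Access. Only Book_ID and Auto_ID come from the description above; the other field names, and the storing of a Range_ID directly in PICKUPS, are assumptions made for the sketch, since Figure 1 is not reproduced here.

# Hypothetical sketch of the de-normalized pick-up database described above,
# using SQLite in place of Microsoft Access. Field names other than Book_ID
# and Auto_ID are assumptions.
import sqlite3

conn = sqlite3.connect("pickups.db")
cur = conn.cursor()

# RANGES: one row per shelf-range, filled once and updated only when the
# bookshelves are physically reorganized.
cur.execute("""CREATE TABLE IF NOT EXISTS RANGES (
    Range_ID   INTEGER PRIMARY KEY,
    Floor      INTEGER,
    Start_call TEXT,
    End_call   TEXT)""")

# PICKUPS: one row per recorded pick-up; Book_ID plus Auto_ID lets the same
# book appear more than once. Range_ID is assumed to be assigned when the
# call number is recorded (e.g. with a matching helper such as the one
# sketched earlier).
cur.execute("""CREATE TABLE IF NOT EXISTS PICKUPS (
    Book_ID  TEXT,
    Auto_ID  INTEGER,
    Range_ID INTEGER REFERENCES RANGES(Range_ID),
    PRIMARY KEY (Book_ID, Auto_ID))""")

# BOOKS: aggregated counts per shelf-range, the table joined to the GIS polygons.
cur.execute("""CREATE TABLE IF NOT EXISTS BOOKS (
    Range_ID INTEGER PRIMARY KEY,
    Pickups  INTEGER)""")

# Illustrative rows using values quoted in the text (fifth-floor range 12).
cur.execute("INSERT OR REPLACE INTO RANGES VALUES (12, 5, 'BF199 A1 L42', 'BF456 R2')")
cur.execute("INSERT OR REPLACE INTO PICKUPS VALUES ('BF319.5 B5', 1, 12)")

# The "query" run after each update: rebuild BOOKS from RANGES and PICKUPS.
cur.execute("DELETE FROM BOOKS")
cur.execute("""INSERT INTO BOOKS (Range_ID, Pickups)
    SELECT r.Range_ID, COUNT(p.Auto_ID)
    FROM RANGES r LEFT JOIN PICKUPS p ON p.Range_ID = r.Range_ID
    GROUP BY r.Range_ID""")
conn.commit()

After each new batch of recorded call numbers is appended to PICKUPS, re-running the final statement refreshes BOOKS, and a GIS layer joined on Range_ID would pick up the new counts, in the spirit of the update cycle described above.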

Discussion of the findings

"Space crunch" was a common phenomenon in the collection areas of the MacKimmie Library at the time this research was carried out. The width of an aisle between any two shelf-ranges had already been reduced to less than 90 cm. This distance makes accessibility marginal, and is too small for mobility-impaired patrons, which suggests that the aisles cannot be narrowed any further. At the same time, the shelves are very full. Conceptually, standard library shelf capacity requires three-quarter occupancy of a layer as "working capacity" (Fraley and Anderson, 1990, p. 49); small-scale shifting needs at least one-quarter of empty space. Most shelves in the Library are loaded far beyond the three-quarter mark. Unless the library adds additional floors or reduces current administration spaces, both of which seem unrealistic, no increase in physical space seems possible. The library will have to manage the development of its collection by rearranging existing collections and reconfiguring current spaces.

The numbers of pick-up books counted illustrate that book utilization does vary between

different floors and different shelf-ranges. Tables I and II show the discrepancies in the numbers of books recorded on the floors observed. It is noticeable that some floors attract more readers than others; for example, the fifth and sixth floors have almost twice as many visitors as the other floors studied. When the pick-ups are separated into monographs and periodicals, it can be seen that periodicals were more heavily browsed than regular books on the sparsely visited floors, which further reduces the number of books being pulled off the shelves on those floors. The tenth floor may be an exception. Novels and storybooks are the primary collections on this floor, and the numbers counted around the final examination week may be biased because readers would not be spending much time on leisure reading. Needless to say, this bias could easily be corrected by conducting a more extensive observation covering the time periods of different academic activities. Nevertheless, the differences in the book numbers listed in Tables I and II provide reasonable indications of book utilization at a floor-by-floor level, suggesting that the current space arrangements on these floors may need to be reconsidered as part of collection development. The library could implement plans such as compressing books on the less visited floors and expanding the space and number of collections on the more heavily used floors. Any decisions, of course, should be made in conjunction with a consideration of the conditions of collections at the shelf-range level.

The seventh floor tells a different story. It stores mostly law and education books. The University of Calgary has a law library and an education library, which are operated separately by their respective schools. The primary readers of these libraries may be people from departments inside those schools, who find it more convenient to use their own libraries as major sources. If this assumption is correct, the MacKimmie Library should maintain a close relationship with the branch subject libraries on campus to avoid retaining the same inventories. The main library should stock a small but unique collection to supplement, but not duplicate, the publications in the branch libraries.

Data at the shelf-range level paints a similar picture to that of the floor-level data. Books picked up are distributed unevenly over the space and across categories from range to range. Figures 3 and 4 show parts of the fifth floor, the busiest floor, displaying different distributions of book use among shelf-ranges. For the sake of clarity, only certain portions of the floor layout are exhibited; similarly, shelf-ranges are not drawn to exact scale. The degrees of color shading represent the frequencies with which books were pulled out of the shelf-ranges. Each shelf-range stores a span of books.


Figure 3 The layout of the fifth floor at the north-eastern corner. Different color levels represent the numbers of books pulled out of each individual book shelf-range by library users. The shelf-ranges are numbered 5-11 (from bottom up) and 12-26 (from left to right; see Table III). The floor layout is presented as an example, and is not to exact scale

For example, the first range at the north-eastern corner of the fifth floor consists of two back-to-back racks that currently hold books from call numbers BF199 A1 L42 to BF319.5 B5, and from BF319.5 B5 B53 to BF456 R2, respectively. For reference, Table III lists the call numbers for every range (not rack) on the fifth floor.

Figure 3 shows the layout of the fifth floor at the north-eastern corner, from which one can see that the shelf-ranges receive varying amounts of book use. For example, ranges 14-18 and 21-23 are darkest, showing more than 110 books pulled out of each range by users during the two weeks, while ranges 6-8 are the lightest, with fewer than 20 books read. The contrast between dark and light colors is shown even more strongly in Figure 4, the south-western corner, where some shelf-ranges (such as ranges 52-59) are close to white, which tells us that the books stored on them were hardly browsed at all. Shelf-ranges in other parts of the fifth floor, as well as on other floors, exhibit the same pattern: some shelf-ranges contain more heavily used items than others.

The importance of using the GIS visual tool to assist the management of library collections is thus very apparent. By referring to these floor maps, the librarian learns where adjustments are needed. Special attention should be paid to the shelf-ranges where books are seldom or most often pulled out, so that they can be investigated further.
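The shading logic itself is simple enough to reproduce outside ArcView. The sketch below, in Python with matplotlib, is only a generic illustration of mapping pick-up counts to grey levels on rectangular shelf-range polygons; it is not the author's ArcView theme, and the coordinates and counts are invented placeholders rather than the MacKimmie Library data.

# Generic illustration of shading shelf-range polygons by pick-up counts,
# in the spirit of the ArcView display described above. Coordinates and
# counts are invented placeholders.
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.colors import Normalize

# (range_id, x, y, width, height, pickup_count)
shelf_ranges = [
    (14, 0.0, 0.0, 1.0, 4.0, 120),
    (15, 1.5, 0.0, 1.0, 4.0, 95),
    (16, 3.0, 0.0, 1.0, 4.0, 18),
]

counts = [r[5] for r in shelf_ranges]
norm = Normalize(vmin=min(counts), vmax=max(counts))
cmap = plt.get_cmap("Greys")

fig, ax = plt.subplots()
for range_id, x, y, w, h, count in shelf_ranges:
    # Darker rectangles correspond to shelf-ranges with more pick-ups.
    ax.add_patch(Rectangle((x, y), w, h, facecolor=cmap(norm(count)),
                           edgecolor="black"))
    ax.text(x + w / 2, y + h / 2, str(range_id), ha="center", va="center")

ax.set_xlim(-0.5, 5.0)
ax.set_ylim(-0.5, 5.0)
ax.set_aspect("equal")
ax.set_title("Pick-up books per shelf-range (darker = more use)")
plt.show()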

If such shelf-ranges are consecutively distributed, like ranges 14-18 or ranges 52-59 on the fifth floor, the librarian may need to reconsider the layout of the ranges. In the Library, shelf-ranges 14-18 run from call number BF to BQ, containing books on psychology, aesthetics and religion, while shelf-ranges 52-59 hold books roughly from call numbers DG to DP, representing European history. The next concern for the librarian is to explore why such books were ignored and how to work on "weeding" or "expanding" them.

An understanding of book-use distributions can be reached at different levels. In general, at the floor level, materials usually fall into a group of categories. Many libraries organize their books at the section level on a floor to separate books of different categories: for example, law books are separated from economics books and located in a separate area. The MacKimmie Library, however, does not have section separations; rather, shelf-ranges are placed in consecutive order on a floor. This research suggests that efforts to determine the patterns of book distribution at the shelf and layer levels will not produce as much relevant information for the reconsideration of collection development as investigations at the shelf-range or rack levels.


Figure 4 The layout of the fifth floor at the south-western corner. Different color levels represent the numbers of books pulled out of each individual book shelf-range by library users. The shelf-ranges are numbered 40-45 (from top down) and 46-60 (from right to left; see Table III). The floor layout is presented as an example, and is not to exact scale

The shelf-range maps have shown their appropriateness for displaying the differences in use distribution of pick-up books. On the other hand, information on book use at the shelf and layer levels may help investigators wishing to examine certain aspects of the behavior of library readers, such as the influence of bookshelf height on readers' attitudes towards browsing books randomly within a library.

Conclusion

While monitoring the frequency of book use by location in a library is tiresome work, such information is necessary for collections management. This research attempts to make the task easier by developing GIS into an automated tool. For a scientific survey, two weeks may be too short to yield adequate data for the interpretation of different book uses in a library. However, since the present work concentrates on process rather than results, it is acceptable if its findings are only suggestions. In the future, there could be a seamless marriage between RFID and GIS, in which the former efficiently collects and updates data, and the latter analyzes and displays the data in visual form.

The current research has provided a number of interesting results.

It has found that the utilization of pick-up books varies in frequency from floor to floor, and from shelf-range to shelf-range. The uneven patterns of book distribution suggest some reasonable correlations. For example, the categories of books that are less used may find their counterparts in branch libraries on the campus; collections duplicated in the main library and the branch libraries represent a waste of resources. At the same time, the distribution data for shelf-ranges suggest the possibility of rearranging current collections and reconfiguring library spaces to meet growth needs if no increase in physical space is proposed.

What makes this project important is that such a practical device can be used by any library (especially large-scale research libraries) for the management of its collections, and is very simple to develop and refine. Unlike other currently commercially available GIS library management products, the development of such a small tool is virtually cost-free. The only requirement is GIS software, and since the tool uses only a few basic functions of that software, most GIS products can be applied. The tool developed can also be embedded into an Internet or intranet program for online access, to eliminate the need to install a GIS package on client machines. More research on this topic is recommended.


Table III Book shelf-ranges on the fifth floor with their start and end call numbers

Range   Start call number    End call number    Location
1       DT1 W64              DT365 Q95          N
2       DT365 S781           DT776 F3 W3        N
3       DT776 J3 C6          B1 A1              N
4       B1 A168              B3 A7              N
5       B3 A7                B398 I6            N
6       B398 I6              B829.5 S53         N
7       B829.5 S66           B2331 M54          N
8       B2340 A5             B4373 K5           N
9       B4373 P32            BF1 A51            N
10      BF1 A55              BF21 P843          N
11      BF21 P843            BF199 A1 L42       N
12      BF199 A1 L42         BF456 R2           E
13      BF456 R2 R47         BF713 L58          E
14      BF713 L58            BF1999 H34         E
15      BF1999 K54           BL51 A1 I56        E
16      BL51 A1 S64          BL1138.69 V46      E
17      BL1138.69 W67        BM503 G3           E
18      BM503 G3 G6          BQ1138 B85         E
19      BQ1138 C63           BR1 R44            E
20      BR1 S36              BR140 J6           E
21      BR140 J6             BR1720 A65         E
22      BR1720 A9            BS1192.5 H54       E
23      BS1192.5 M20         BT92.9 W35         E
24      BT98 A82             BV4241 I5          E
25      BV4241 K52           BX940 R47          E
26      BX940 R47            BX2350.2 S62       E
27      BX2350.2 S78         BX5110 W34         E
28      BX5117 W54           CB53 D85           E
29      CB53 D85             CC77 A1 I56        E
30      CC77 B3              CS410 H35          E
31      CS410 H352           D1 C46             E
32      D1 E85               D13 B965           E
33      D13 B97              D111 T72           E
34      D111 T72             D410 F71           E
35      D410 F71             D743 C56           S
36      D743 C56             D811 G34           S
37      D811 G66             DA20 A62           S
38      DA20 A62             DA122 V5           S
39      DA125 A1             DA483 W2           S
40      DA483 W2             DA566.9 Q79        S
41      DA556.9 P4 B73       DA763 T48          S
42      DA765 D66            DB361 J67          S
43      DB366 E8             DC1 N48            S
44      DC1 N48              DC146 B97          S
45      DC146 C12            DC367 R44          S
46      DC369 A52            DD120 U5           W
47      DD125 B82            DD282 W47          W
48      DD284 A44            DF10 R45           W
49      DF10 R45             DG90 T82           W
50      DG91 B3              DG737.42 Y8        W
51      DG737.5 C33          DK1 A75            W
52      DK1 B9               DK28 J63           W
53      DK28 K34             DK67.3 P62         W
54      DK67.4 D66           DK246 Y33          W
55      DK251 G64            DK266 A2           W
56      DK266 A2             DK275 V471         W
57      DK276 A1             DK600              W
58      DK601                DK4800 S64         W
59      DL1 A62              DP80 N38           W
60      DP81.5 G66           DR366 M5           W
61      DR367 B44            DS41 T72           W
62      DS41 T72             DS110 G3           W
63      DS110 H38            Missing            W
64      DS247 K85            DS450 R8 S82       W
65      DS450 T513           DS501 J593         W
66      DS501 J6             DS759 M521         W
67      DS759 R42            DS805.2 C43        W
68      DS805 K64            DT1 T72            W

Note 1 See www.civictechnologies.com/librarydecision/index.cfm

References

Arfield, J. (1993), "Pruning, weeding and grafting: strategies for the effective management of library stock", Library Management, Vol. 14 No. 3, pp. 9-16.
Bergen, P.F. (1995), "Of special interest: interactive access to Geographic Information Systems and numeric data on the World Wide Web", The Journal of Academic Librarianship, Vol. 21 No. 4, pp. 303-8.
Boisse, J.A. and Larsgaard, M. (1995), "GIS in academic libraries: a managerial perspective", The Journal of Academic Librarianship, Vol. 21 No. 4, pp. 288-91.
Cheverie, J.F. (1995), "Getting started: ready, set . . . get organized!", The Journal of Academic Librarianship, Vol. 21 No. 4, pp. 292-6.
Cline, N.M. and Adler, P.S. (1995), "GIS and research libraries: one perspective", Information Technology and Libraries, Vol. 14 No. 2, pp. 111-15.
Fraley, R.A. and Anderson, C.L. (1990), Library Space Planning, Neal-Schuman, New York, NY.
Hawkins, A.M. (1994), "Geographical Information Systems (GIS): their use as decision support tools in public libraries and the integration of GIS with other computer technology", New Library World, Vol. 95 No. 7, pp. 4-13.
Kent, A. (1979), Use of Library Materials: The University of Pittsburgh Study, Marcel Dekker, New York, NY.
Lancaster, F. (1988), "Obsolescence, weeding, and the utilization of space", Wilson Library Bulletin, Vol. 62, pp. 47-9.
Larsgaard, M.L. and Carver, L. (1995), "Accessing spatial data online: Project Alexandria", Information Technology and Libraries, Vol. 14 No. 2, pp. 93-7.
Leighton, P.D. and Weber, D.A. (1986), Planning Academic and Research Library Buildings, 2nd ed., American Library Association, Chicago, IL.
McGrath, W.E. (1971), "Correlating the subject of books taken out of and used within an open-stock library", College and Research Libraries, Vol. 32 No. 4, pp. 280-5.
Montanelli, D.S. (1987), "Space management for libraries", Illinois Libraries, Vol. 69, pp. 130-8.
Morse, P.M. (1978), Library Effectiveness: A Systems Approach, MIT Press, Cambridge, MA.
Nkereuwem, E.E. and Eteng, U. (1994), "The application of operations research in library management", Library Review, Vol. 43 No. 6, pp. 37-43.
Strasser, T.C. (1998), "Geographic Information Systems and the New York State Library: mapping new pathways for library service", Library Hi Tech, Vol. 16 No. 3, pp. 43-50.
Suh, H. and Lee, A. (1999), "Embracing GIS services in libraries: the Washington State University experience", The Reference Librarian, Vol. 64, pp. 125-37.
Tombarge, J. (1999), "Using management software for Geographic Information Systems (GIS)", The Bottom Line: Managing Library Finances, Vol. 12 No. 4, pp. 146-50.
Truewell, R. (1968), Analysis of Library User Circulation Requirements – Final Report, NSF Grant No. 435, Government Printing Office, Washington, DC.



PSU Gateway Library: electronic library in transition Lesley M. Moyo

The author Lesley M. Moyo is Head, Gateway Libraries, Penn State University Libraries, University Park, Pennsylvania, USA.

Keywords Digital libraries, Library systems, Academic libraries, Communication technologies, Information services

Abstract Developments in information technology have led to changes in the mode of delivery of library services, and in the perceptions of the role of librarians in the information-seeking context. In particular, the proliferation of electronic resources has led to the emergence of new service paradigms and new roles for librarians. The Gateway Library at Penn State University (PSU) is an electronic library in transition, with new technology-based services evolving to address the ever growing and changing needs of the academic community. It facilitates access to and navigation of electronic resources in an integrated technology environment.


Library Hi Tech
Volume 22 · Number 2 · 2004 · pp. 217-226
© Emerald Group Publishing Limited · ISSN 0737-8831
DOI 10.1108/07378830410543539
Received 4 August 2003
Revised 16 October 2003
Accepted 9 February 2004

Introduction

Librarians have in the past been referred to as "information gatekeepers". Over the years, the concept of "gate-keeping" has evolved from one implying prescription of appropriate information, to mediation in the information-seeking process, and now to facilitating information access and navigation. In the traditional library of the 1960s, "gate-keeping" activities focused on bibliographic instruction or information retrieval using printed indexes, abstracts, or catalogs. These were the primary retrieval tools of the information profession. By the 1970s and early 1980s, with the advent of automated library systems, bibliographic instruction revolved around OPACs, CD-ROMs and online databases. In addition to printed sources, these became the primary information retrieval tools. Consequently, bibliographic instruction incorporated elements of computer literacy. At that time, costs for online searching were significantly higher than they are at present, and therefore searching was largely done by designated reference librarians who had special training and were experienced in database searching.

The late 1980s and 1990s witnessed an unprecedented explosion of advanced technologies that have transformed libraries and the information-seeking process. In particular, the phenomenal growth of the Internet and related technologies has been the major driving force of change. Fountain (2000) discusses some key trends in Web-based services in academic libraries and projects future trends, which she believes put librarians in "the best possible position to assist in ushering in a world of improved access for both faculty and students". The concept of librarians as gatekeepers has taken on a new dimension – that of the librarian as an information facilitator within the electronic/digital environment. The electronic library is user-centered; emphasis is on empowering the user to become a competent, independent researcher within the networked electronic environment. Furthermore, the electronic library is extending far beyond the physical space to a virtual presence in the office, home or dormitory room. Libraries are now striving to offer fully fledged library services and transactions to patrons who come in through both the physical doorway and the electronic doorway. Some of the resultant challenges that libraries are facing now include:




. coping with the surge in library patron numbers as remote users come on board;
. offering equitable library services to both local and remote users;
. redesigning library public services to incorporate virtual services;
. offering instruction and reference services to remote users;
. facilitating database access for remote users and dealing with issues of licensing, authentication, or accessibility; and
. keeping up with technology changes that are defining new modes of service delivery.

The expectations of library patrons have also changed significantly as a result of the ubiquitous nature of information technology. Patrons who are aware of the capabilities of technology and have benefited from its convenience now have new and higher expectations of the library. For example, many library patrons, particularly in academic and research environments, nowadays expect libraries to facilitate access to full-text journals and databases. They also expect library Web sites to not only offer information about services, but also provide online sites where actual transactions such as ILL requests, book renewals, interactive instruction, access to course reserve materials and real-time reference can be conducted remotely. Although it is highly unlikely that the need for, and acquisition of, printed resources in libraries will diminish over the next decade, there is an apparent increasing preference for electronic resources offering remote accessibility. Therefore, we are witnessing a marked shift in library collections and services priorities and, subsequently, in the way libraries deploy their financial resources.

Electronic services librarianship

Librarians are recognizing many changes in their role as "information gatekeepers". Developments and changes in the information technologies that facilitate library and information work have necessitated corresponding changes in the skills and competencies of information professionals. This is evident from job postings in recent years: postings for "Electronic Services Librarian", "Digital Initiatives Librarian", "Information Technologies Librarian", "Multimedia Services Librarian", and "Internet Services Librarian" are a common feature of professional job advertisements. Guenther (2000) observes that:

. . . with the wide-scale adoption of the Internet, some question what the librarian's role will become in the digital age. Some predict the demise of the profession, that the Internet and the Web would

allow those we serve to serve themselves. Nothing could be further from the truth! As more information is offered online, librarians are increasingly called on not only to tame the wildness of the Web, but to design information products that feature ease of access to Web-based resources. I think the Web’s impact will be to shift librarians from serving patrons directly to developing information products for patrons’ self-service access.

Furthermore, Guenther (2000) suggests that:

Today's technology allows us to spin our jobs in any direction, from trainers and searchers who bridge the gap between patron needs and information, or increasingly, to behind-the-scenes product designers and Web site managers. The Internet doesn't just expand the opportunities within our profession, it forces us to do what we do so we'll reinvent ourselves.

The current technology environment, with its facility for sophisticated information analysis, processing and manipulation, has also fueled the information explosion, leading librarians and other information professionals to recognize that facilitating information access is rapidly becoming more significant than ownership of collections. The trend over the last decade has been for libraries to invest heavily in setting up information technology infrastructures that enable them to access information globally. Libraries are now able to facilitate access to information located anywhere in the world, which they may not necessarily "own". More than ever, the focus is on the information content rather than its source.

This dynamic context has led to the emergence of a new facet of librarianship, which may be termed "electronic services librarianship". It implies a focus on facilitating access to electronic resources as well as providing support services that enable patrons to access and navigate these resources more efficiently. In terms of scope and programming, electronic services librarianship may include instruction in the use of resources, development of electronic search and evaluation tools, research guidance in a networked electronic environment, packaging and repackaging of electronic information resources to meet specific user needs, development and management of Web sites, indexing of Web resources, and virtual reference services. It may also incorporate working with vendors and negotiating licenses for access to subscription databases, as well as addressing copyright and other scholarly communication issues that have emerged in the new electronic/digital environment.

In "keeping the electronic gateway", therefore, the electronic services librarian facilitates access to electronic and digital information in much the same way as librarians traditionally acted as "gatekeepers", facilitating access to printed


sources, and providing bibliographic instruction to support the use of resources in the traditional library. In the process of facilitating information access in the new library environment, librarians still need to point users to collections of relevant printed sources. These may be identifiable via electronic indexes and other finding tools. In the new environment, librarians are dealing with a hybrid of collections in both printed and electronic formats. A number of models exist for handling collection development of both print and electronic collections. At Penn State, the two roles are integrated, with subject specialist librarians selecting both print and electronic resources for the subject areas for which they are responsible. Another aspect of “keeping the electronic gateway” that requires librarians to deal with a hybrid situation is that libraries are now serving users in a physical space as well as a virtual space. Whereas the delivery mode of some of the current services/resources caters for both in-person and remote users, there are other services that only cater for in-person patrons. More services are required to meet the additional unique needs of remote patrons.

Challenges of electronic services librarianship

New service paradigms are now emerging as libraries redesign their services to meet the changing needs of their users. This is particularly apparent in academic libraries. For instance, the concept of the "information commons" has emerged and evolved in recent years. The scope and role of the evolving information commons within the academic library environment is still being actively addressed, as is evident from the professional literature (Bailey and Tierney, 2002; Beagle, 2002; Cowgill et al., 2001) and from professional meetings. In most cases, these "information commons" are a conglomeration of networked computing resources and facilities that provide gateways to access and navigate the library's electronic resources, and may include value-added services to support the use of these resources. The Gateway Library at Penn State is an example of such a gateway. It offers an integrated technology environment with access to the University Libraries' electronic resources, and offers reference and instructional services to assist patrons in navigating these resources.

Building and sustaining an electronic gateway library requires commitment to meeting changing user needs.

On the one hand, it entails keeping watch on the behavior of users and their response to various new technologies, harnessing the technologies that offer promise within the context of information access and navigation, developing services and products that respond proactively to user needs, and then helping users to use these services effectively by providing them with the necessary training and guidance. On the other hand, it entails prioritizing service objectives, deploying financial resources based on these priorities, and identifying cost-effective options in service design and programming within a dynamic environment.

The role of the librarian in the electronic library has many inherent challenges, some of which are outlined below.

The electronic library is technology-based, and it is important to keep pace with the technological developments that drive the library's services and resources. This entails continuous exploration of new technologies and a willingness on the part of the library to embrace the technologies that are deemed necessary for the continued development and enhancement of library services. New sets of skills and competencies are required of librarians who work within the new technology environment. Electronic services librarianship demands that the librarian has an understanding of the library's public services within the context of its organizational structure and culture, and also has the technical knowledge that will enable him or her to adopt technologies and design services that are appropriate for that environment.

In the electronic environment, where library services are not necessarily associated with a physical space, there is a greater need for access tools that patrons can use independently. Librarians may find themselves spending much of their time developing these access tools or evaluating and testing commercial access tools for possible use in their libraries. Library users have disparate computer literacy and information literacy competencies, and need assistance at their individual levels. Librarians working in electronic libraries are increasingly under the pressure of technical issues over which they may not have control. Moreover, it is becoming increasingly difficult for electronic services librarians to divorce the technology that provides the service platform from the services and information sources themselves. Meeting the needs of remote users brings unique challenges, because it is more difficult to ascertain their specific needs from a distance.

There are now more players in the information facilitation arena. In offering access to databases and other electronic resources, libraries enter into partnerships with database aggregators, information brokers, commercial online services, electronic publishers and government agencies.


Libraries must provide efficient access and competitive services within this arena. Previously, collection building was synonymous with ownership. Now, it could mean paying a subscription to access items owned and controlled by another agent. Agreements with other agencies could result in the limitation of remote users’ access rights to resources, depending on the licensing terms of the agent.


The Gateway Library at Penn State University

All aspects of technical and public services, administrative functions and organizational structures in every type of library have been affected by changes brought about by technological developments (Saunders, 1999). The evolution of the Gateway Library at Penn State University is an example of one response to such change. Dowler and Farwell (1996) define the gateway as "a transition from a traditional library to the emerging world of digital information and distributed computing". The concept behind the Penn State University Libraries' gateway library implies that:
. it provides services in a physical as well as a virtual space;
. it provides an electronic means of accessing and navigating all formats of information;
. it provides the necessary technological infrastructures to enable this access; and
. it supplies a cluster of value-added services that support and enhance access to the electronic and printed resources.

Context of the Libraries

Penn State University is unique in that it consists of one centrally administered university which is geographically dispersed – the largest campus at University Park, plus 23 additional locations throughout Pennsylvania. Within this structure and context, the University Libraries consist of eight subject libraries in the main adjoined Pattee and Paterno buildings at University Park (of which the Gateway Commons is one), five branch libraries elsewhere on the University Park campus (of which the Pollock Laptop Library is one), and libraries at each of the 23 campus locations, all performing as one research and teaching library. The collection is fully integrated and accessible from any location via the Library Information Access System (LIAS) on the Web. These libraries serve in excess of 84,000 students, plus university faculty and staff, and are open to all Pennsylvania residents.

The Gateway Library is made up of two physical spaces: the Pollock Laptop Library and the Gateway Commons. Both are relatively new libraries that were created in response to the needs of users to access, process and integrate data and information electronically.

The Pollock Laptop Library

The Pollock Laptop Library was originally the Undergraduate Library, with print collections and traditional undergraduate library services. In 1998, one of the librarians, Tona Henderson, led a task force that pioneered the concept of a "laptop library". Subsequently, the library was renovated and converted to an electronic library without any print collections. It became operational in the fall of 1998, to meet the increasing need for students to have additional computing space on campus with 24-hour access to the Internet, the LIAS and application programs.

Current services

The overall goal of the Pollock Laptop Library is to provide students with a "one-stop shopping" location where they can conduct their library research, create complete products such as documents, Web pages, presentations or multimedia products, and get help from library staff through the entire process. To facilitate this, a wide range of resources have been put in place, and special support services are offered.

Laptop circulation

Laptops are loaned for circulation within the library building. Each laptop is configured to link to the Dynamic Host Configuration Protocol (DHCP) Pollock Network, providing access to the Internet and LIAS. The laptops are bar-coded and circulated via the same library circulation system (Unicorn) that handles the rest of the Libraries' circulating collections. Fines and charges can be levied on defaulters, just as in the case of books and other library materials.

Support for use of enhanced workstations

One of the newer services is support for the use of the newly implemented enhanced workstations. Pollock staff help students in the use of specialized application programs such as Endnote, SPSS, Macromedia products and Adobe products that are loaded on the enhanced workstations.

Technical support


The main technical support for the facility is provided by the University Libraries' Information Technologies Department and the Digital Library Technologies Unit. On-site technical support is provided by the Pollock Laptop Library staff, who are responsible for assisting users with the "how to" aspects of use, troubleshooting, and reghosting the laptops once they are returned. The Pollock Laptop Library staff also carry out minor repairs such as replacement of network cables and damaged network cards.

Enhanced reference services

General reference to support researchers using the LIAS and the Internet is available. The reference service incorporates a streamlined referral system to connect patrons to other University Park subject libraries and to subject specialist librarians elsewhere on campus.

Media and Technology Support Services Unit drop-off/pick-up center

The Pollock Laptop Library is one of the few designated drop-off/pick-up locations for videos that patrons order from the Libraries' Media and Technology Support Services Unit, which is located on the periphery of the University Park campus. The videos can also be viewed within the Pollock Library.

24/7 service

The Pollock Laptop Library primarily serves on-site patrons. One of its unique features is that it offers services 24 hours a day, seven days a week, throughout the time that the university is in session (fall and spring semesters). Its close proximity to residence halls makes it a popular research and study facility. Furthermore, it has a relaxed food and drink policy; coffee supplies are provided by the library, and free coffee is brewed daily by library staff between midnight and 8.00 a.m.

Resources
. There are currently 40 Dell 800 laptops (there were 50 Dell Latitudes in the first generation of laptops, in use from 1998 to January 2002) loaded with the Windows operating system, the Internet Explorer browser, and the Microsoft Office suite. Each laptop has a floppy drive as well as a CD-ROM drive, and comes with a headset, a network card, and the necessary cables for connecting to the network. Additional laptop peripherals available on request include Zip drives and CD-RW drives.
. There are five enhanced workstations loaded with Macromedia products (Dreamweaver, Flash, Fireworks, Freehand), Adobe products (Acrobat, Photoshop, Illustrator and PageMaker), Endnote, SPSS and the Microsoft Office suite. All workstations have CD burners and Zip drives, and two of the workstations have scanners attached.
. There are 54 wired individual carrels equipped with data ports and power outlets.
. There are five PCs providing access to the Internet and LIAS.
. There are three multimedia stations with DVD.
. There is one network printing station.
. There is an instruction/seminar room with 23 PCs providing access to the Internet and LIAS. The instructor's workstation has projection facilities.
. There are two VCR stations.
. There is a quiet study room with 12 individual carrels.
. There are group study/discussion areas in the center of the main room.

The Pollock physical and technological infrastructure allows students to do their research in an integrated technology environment, having access to all PSU libraries’ electronic resources.

The Gateway Commons

The Gateway Commons is an electronic reference center. It comprises a cluster of services that provide entry to the myriad digital and electronic resources available to library patrons. It became operational in August 1999 and evolved from a perceived need for value-added support services to enhance user access to, and navigation of, the ever-growing body of electronic resources available. Although currently defined by a physical space in the adjoining Pattee/Paterno buildings at the University Park campus, its services incorporate the needs of remote users, and it is one of the University Libraries locations participating in offering virtual reference services to the entire Penn State community.

The genesis of the Gateway Commons at Penn State is similar to that of the Gateway Library at Harvard College (Dowler and Farwell, 1997): it has evolved out of a library in the process of physical and organizational transition. The construction of the new Paterno Library and the renovation of Pattee Library led to the decentralization of the Library's general reference services and their incorporation into the new subject libraries. In the new organizational structure there was no longer a General Reference Section to serve users who were initiating research projects, or researchers whose scope of research was general or multidisciplinary. There was a need for a referring library to help users to identify a home library based on their discipline.


Furthermore, the exponential growth of electronic resources created a need for more aggressive assistance for patrons in locating and searching appropriate databases. In response to these needs, and to the opportunity offered by a new physical facility, a proposal to create a Gateway Library was made. The concept proposal for a Gateway Library (later called the Gateway Commons) at Penn State was initiated by a team of librarians in the former General Reference Section. The Gateway Library as it is structured today incorporates the Pollock Laptop Library and the Gateway Commons. The first head, Tona Henderson, worked with a team to develop the strategic framework within which the Gateway Library currently operates. However, it was recognized within the initial development plan of the Gateway Library that its development would be of an evolutionary nature, because of the changing nature of technology coupled with the changing needs of users. The Initial Concept Statement of March 1995 articulated the following mission statement:

The Gateway Library integrates outreach, instruction and information packaging in a way that enhances patron awareness of the Libraries' resources, develops personal competence in library users, and improves individual skills in the information environment of the virtual library (Penn State University Libraries, 1995).

This mission statement formed the basis of programs and activities for the initial year of operation, and is likely to be at the core of future programming.

Functions of the Gateway Library

The functions of the Gateway Library are grounded in its three-pronged mission of:
(1) instruction;
(2) information packaging; and
(3) outreach.

Instruction

The University Libraries cannot make assumptions concerning the computer and information literacy of library users, particularly when new electronic resources are being incorporated into the system at an increasingly accelerated rate. Each new resource generates a need for user instruction and support. The teaching mission of the Gateway addresses this through various programs targeted at individuals as well as groups. The Gateway also works in liaison with the Instructional Programs Unit, which is responsible for the broader instructional mandate for the Penn State Libraries. Instructional activities in the Gateway include, among other things, instructive reference service,

teaching library research classes on the Library Information Access System, seminars on electronic research, development of Web-based tutorials, general research guidance and, more recently, one-on-one research mentoring and virtual reference services.

Information packaging

This is a rapidly growing aspect of the Gateway Library's services. It includes the creation and management of databases, Web pages, Web-based bibliographies, indexes, etc. The Online Reference Resources Web pages[1] and the FAQ[2] are examples of resources created and maintained by the Gateway. As the number of Gateway Web resources has grown, there has been a need for a designated Web master to manage the existing resources as well as to create other resources as needs arise.

Outreach

Outreach activities include tours of the Pattee and Paterno Libraries and orientation to overall library services. Other outreach activities are targeted at various user groups. The main objectives of the outreach drive are:
. to take aspects of the Library's services beyond the walls of the library to the users in their dorms, offices or even in their virtual space; and
. to serve as a promotional/marketing strategy demonstrating the support that the Libraries in general, and the Gateway Library in particular, can offer in an academic and research environment.
Activities so far include international student fairs, adult student fairs, on-the-road seminars in residence halls and the Writing Center, and lecturing in departments. More recently there has been outreach to student athletes, sororities and fraternities, and other non-subject-based groups. Outreach to subject-based user groups is the responsibility of the respective subject libraries.

Although most of the services are currently only offered in the Gateway's physical space, the Library is transitioning to virtual services. Apart from serving the needs of on-site patrons, the current physical space is also the laboratory for designing and testing the products and services of the future. Because the Gateway is technology-based and technology-driven, part of its charge has become to develop initiatives intended to harness the potential of technological developments and new access systems. It is also a place to take risks, to try new ideas and to develop new service modalities while continuously evaluating them.


Current resources

Current resources include:
. The main electronic information arcade, with 28 PCs providing access to the Internet and LIAS.
. An instruction/seminar room with 18 PCs providing access to the Internet and LIAS. The instructor's workstation has projection facilities.
. A network printing station.
. The multimedia room, which was developed recently and is still being upgraded. It currently has four workstations loaded with software for graphic design, Web authoring, desktop publishing, and video editing. Other hardware resources in the multimedia room include a camcorder, digital camera, flatbed scanner, art pad, laser printer and color ink-jet printer. These resources are available for the use of all library faculty and staff, as well as any parties working in collaboration with them.
. A Gateway service/help desk, staffed at all hours the library is open.
. A limited number of laptops (eight), which are available for circulation to patrons and for use within the Pattee and Paterno buildings.

New initiatives: Research Mentoring Program

The Gateway Library Research Mentoring Program initiative was conceived to address the need for very specific, one-on-one instruction for students, especially returning adults, transfer and international students, and to extend the presence of the Gateway concept throughout the University Park campus. The program was also conceived in response to a perceived need to assess levels of information literacy skills and competencies among undergraduates and returning adult learners, with a view to establishing further programs that would enhance current efforts. The program was launched in August 2000. The primary objectives of the program are to:
. demystify the technology-driven research process by humanizing it (a hi-tech/hi-touch mix);
. inculcate information literacy skills and competencies in the students;
. enrich the students' academic experience by making them aware of the enormous resources at their disposal and helping them share in the intellectual promise of a great university library;
. offer students instruction in accessing and navigating the myriad digital/electronic resources available to them; and
. identify any common problems being faced by students within the context of information literacy, and address them through appropriate programming.

This program employs the mentoring model to offer research guidance to students. The program is popular among adult learners experiencing difficulty readjusting to academic work, and among incoming freshmen who are still grappling with information literacy. The program is personalized, and help is offered at the time of need. Furthermore, it offers a "high-touch" service in a "high-tech" environment. Students self-select to participate; the only requirement is that students have a real assignment or project that they are working on (Moyo and Robinson, 2001).

Virtual reference services

The Gateway Library is part of the Penn State initiative that led to the establishment of virtual reference services for the Penn State community. The pilot phase of the service went live in September 2001. This initial phase involved three campuses participating in offering virtual reference services: the Delaware County campus, the Hazleton campus and a World Campus sub-group. A total of 15 hours of online real-time reference service was available weekly. Two of the three participating campuses (Delaware County and Hazleton) are undergraduate campuses. The World Campus (virtual campus) sub-group that participated in the pilot was made up of graduate students based in different locations globally, taking online courses. This diversity provided an opportunity to observe the different patterns of usage among undergraduate and graduate students, as well as among local and remote students. After the pilot, it was decided to proceed with the full service, which was launched in the fall of 2002. Now open to the entire Penn State community, the service has expanded its hours, and the number of participating locations and librarians has grown to include other Penn State campuses throughout Pennsylvania[3].

Financial implications of providing electronic library services

Making facilities of this nature available to an academic community has major financial implications. Apart from the initial costs of setting up the basic infrastructure for the electronic library services, there are ongoing costs for the technology hardware and software. Penn State University Libraries follow a three-year lifecycle replacement plan for most of the technology in place. Furthermore, apart from lifecycle replacements,


there are ongoing software upgrades and purchases to meet the growing/changing needs of users. These are major expenses associated with the basic infrastructure. Other costs are those of electronic information sources that the Libraries subscribe to. The costs of subscribing to these electronic databases (over 400 databases), e-books, full-text e-journals, etc., are part of the annual collections budget. Current trends show a rise in the costs of electronic resources subscriptions. Although electronic sources are being continuously enhanced and improved, escalating costs are a major concern to most academic libraries. Additional costs associated with electronic libraries are technology support costs. As a result of the complex technology infrastructure and the large amounts of hardware and software in use, technology support is an integral part of the operations and has to be taken into account when planning. At Penn State, Information Technology Services (ITS) is responsible for overall campuswide computing infrastructure and resources and technology facilitation. The Libraries are responsible for technology and resources that permit access to and navigation of the various information resources. The Libraries' Information Technology (I-Tech) and Digital Library Technologies (DLT) departments provide technology support for the Libraries. The support offered by these two departments is a key factor in the successful operations of the Gateway Commons and Pollock Laptop Library.

User response to changing service paradigms
Patron needs have been part of the driving force leading to the implementation of the Gateway Commons and the Pollock Laptop Library at Penn State. Therefore, the concept of an electronic library with no print collection and a laptop library with a myriad of other technology resources and facilities does not come as a surprise to most Penn State users, who are already technology savvy and accustomed to working in networked technology environments. Those patrons who walk into this environment for the first time may be a little surprised initially. Some of them ask "Where are the books?" We explain that there are no printed books in these locations, but we are quick to add "But we can get you the information you need". Once the students articulate their information needs and the staff help them identify and locate online journal articles, they are highly pleased. The recent addition of netLibrary and Safari e-books to the Penn State Libraries collection has further boosted the electronic libraries. Gateway staff have also been very careful not to minimize the importance of printed books when helping patrons. Often when students "discover" online full-text resources, and when they view e-books online, they tend to avoid printed books in their subsequent research. Gateway staff still emphasize the importance of printed books by highlighting to the students that many of the online support services do help expedite access to or delivery of printed books and other sources. For example, patrons requesting articles via ILL may have these documents sent to them as e-mail attachments in PDF format through the new ILLiad service (electronic delivery service). These services help to ensure that patrons do not lose sight of the importance of printed resources because of the convenience of online full-text resources. A recent (Summer 2003) patron survey at the Pollock Laptop Library drew very interesting feedback from patrons regarding user preferences and needs, as well as an indication of patrons' overall response to the new service modes of electronic libraries. The preliminary findings show that overall satisfaction was overwhelmingly positive: on a four-point Likert scale ("poor", "satisfactory", "good", "very good"), all respondents rated their overall satisfaction as "good" or "very good", and the majority (85 per cent) rated the services as "very good". The usage patterns of respondents also indicated that patrons used/visited the Laptop Library primarily because of its technology facility and access to networked resources, rather than for other more traditional services such as reference help. In open-ended questions asking respondents to suggest new services, suggestions ranged from access to more electronic resources to more space and longer hours of operation over the summer (currently, 24/7 operation is offered in the fall and spring semesters only, while during summer the library closes at midnight).

Similar technology initiatives at other institutions
Many academic libraries are experiencing increasing patron demands for access to a wide range of electronic resources and support services. Many libraries have responded with innovative ways of meeting this need, and have set up complex technology infrastructures, typically networked environments that make it possible for library patrons to conduct their research online. Assistance in the use of these resources has also come in a variety of ways, such as the introduction


of virtual reference services, and the creation of special units such as information commons offering services like those offered by the Gateway Commons at Penn State. Cowgill et al. (2001) cite several universities that have created information commons as a way of delivering resources and services to users in networked electronic environments, among them Emory University Libraries, Kansas State University, the University of Arizona, the University of Iowa, Colorado State University and others. Although models vary from institution to institution, the underlying concept is the same. Bailey and Tierney (2002) also discuss the concept of information commons and make suggestions for the sustainability of information commons through training and evaluation. Beagle’s (2002) discussion of the information commons concept emphasizes the instructional role of the information commons within an integrated learning environment, and explores a number of research concepts relating to information commons as a discovery bed, and the linking of information commons to other networked learning resources. The circulation of laptops is also a rising trend. Lyle (1999) and Vaughan and Burnes (2002) discuss laptop lending in their institutions as a service that has evolved as a result of the changing needs of users. Laptops offer flexibility and convenience to students in the modern academic environment. Students may borrow library laptops, or they may bring in their own and connect to the network to access library resources. The concepts of information commons and laptop lending are just examples of emerging service paradigms that are going to be characteristic of all libraries that are geared to stay relevant within the dynamic technology environment. As also discussed by Keating and Hafner (2002), within this new electronic environment, it is still important to provide services and resources tailored to users’ individual needs. Academic libraries’ efforts to meet these needs will lead to continued innovation and changes as enabled by technology.

Conclusion
Electronic services librarianship is all about experimentation, development of new resources, innovation, technology watch, new competencies for the information professional, frequent retooling as the technologies inevitably change, and user instruction to empower users to use electronic resources effectively. All of these characterize the concept of the Gateway Library at Penn State and other electronic libraries that have emerged over the past decade. The electronic library environment that has evolved is still grounded in the same librarianship values. The attributes of the information being sought remain the same, i.e. accurate, timely, verifiable, relevant, in an appropriate form, appropriate in scope and appropriate in depth. However, the facilitating technologies are changing rapidly, and moreover, pressure for access is increasing drastically because of a heightened recognition of information as an important resource in our daily lives as well as within business and academia. The librarian is now playing the role of facilitator in the information-seeking process. There has been a shift from direct mediation such as bibliographic instruction, personal assistance, and mediated online searching to indirect mediation such as information packaging, developing online pathfinders, and creating a virtual library research experience for the user in cyberspace. Although most users may no longer see the librarian, their experience is still very much influenced by the work that the librarian is doing in the technology background. The Gateway Library at Penn State facilitates access to and navigation of electronic resources in an integrated technology environment, and offers value-added services such as electronic reference and research mentoring to support the use of these resources. Because of the changing nature of technology, resources and services offered in a technology-driven environment can only be of an evolutionary nature. The services and facilities currently offered by the Gateway Libraries at Penn State are expected to continue to develop in response to user needs, as well as proactively as staff continue to explore new technologies on the horizon and harness them to offer new and better services.

Notes
1 See www.lias.psu.edu/gateway/referenceshelf/
2 See www.libraries.psu.edu/gateway/FAQ/
3 See www.de2.psu.edu/faculty/saw4/vrs/

References
Bailey, R. and Tierney, B. (2002), "Information commons redux: concept, evolution, and transcending the tragedy of the commons", The Journal of Academic Librarianship, Vol. 28 No. 5, pp. 277-86.


Beagle, D. (2002), "Extending the information commons: from instructional testbed to Internet", The Journal of Academic Librarianship, Vol. 28 No. 5, pp. 287-96.
Cowgill, A., Beam, J. and Wess, L. (2001), "Implementing an information commons in a university library", The Journal of Academic Librarianship, Vol. 27 No. 6, pp. 432-9.
Dowler, L. (1997), "Gateways to knowledge: a new direction for the Harvard College Library", in Dowler, L. (Ed.), Gateways to Knowledge: The Role of Academic Libraries in Teaching, Learning and Research, MIT Press, Cambridge, MA, pp. 95-107.
Dowler, L. and Farwell, L. (1996), "The gateway: a bridge to the library of the future", Reference Services Review, Vol. 24 No. 2, pp. 7-11.
Dugan, R.E. (2002), "Information technology budgets and costs: do you know what your information technology costs each year?", The Journal of Academic Librarianship, Vol. 28 No. 4, pp. 238-43.
Fountain, L.M. (2000), "Trends in Web-based services in academic libraries", in Fletcher, P.D. and Bertot, J.C. (Eds), World Libraries on the Information Superhighway: Preparing for the Challenges of the New Millennium, Idea Group Publishing, Hershey, PA, pp. 80-94.

Guenther, K. (2000), "From information finders to product designers", Computers in Libraries, Vol. 20 No. 3, pp. 61-3.
Keating, J.J. III and Hafner, A.W. (2002), "Supporting individual library patrons with information technologies: emerging one-to-one library services on the college or university campus", The Journal of Academic Librarianship, Vol. 28 No. 6, pp. 426-9.
Lyle, H. (1999), "Circulating laptop computers at West Virginia University", Information Outlook, Vol. 3 No. 11, pp. 30-2.
Moyo, L. and Robinson, A. (2001), "Beyond research guidance: Gateway Library Research Mentoring Program", Library Management, Vol. 22 No. 8/9, pp. 343-50.
Penn State University Libraries (1995), "Gateway concept statement", available at: www.libraries.psu.edu/gateway/history/gconcept.htm
Saunders, L.M. (1999), "The human element in the virtual library", Library Trends, Vol. 47 No. 4, pp. 771-87.
Vaughan, J. and Burnes, B. (2002), "Bringing them in and checking them out: laptop use in the modern academic library", Information Technology and Libraries, Vol. 21 No. 2, pp. 52-62.


ProPrint world-wide print-on-demand services for study and research

Elmar Mittler and Matthias Schulz

The authors
Elmar Mittler is Director, Göttingen State and University Library, Göttingen, Germany. Matthias Schulz is an Informatics Specialist, Computer and Media Services, Humboldt University, Berlin, Germany.

Keywords
Libraries, Digital libraries, Print media, Demand, Germany

Abstract
The libraries of more and more universities and research institutions have local digital repositories, and the amount of material is increasing every day. Users need an integrated retrieval interface that allows aggregated searching across multiple document servers without having to resort to manual processes. ProPrint offers an on-demand print service within Germany for over 2,000 monographs and 1,000 journals. Partners worldwide are now invited to join.

Library Hi Tech Volume 22 · Number 2 · 2004 · pp. 227-230 © Emerald Group Publishing Limited · ISSN 0737-8831 DOI 10.1108/07378830410543548
Received 9 February 2004; Revised 17 March 2004; Accepted 19 March 2004

Introduction
ProPrint currently offers an on-demand print service within Germany for over 2,000 monographs and 1,000 journals[1]. It does this by enabling virtual connections between document servers so that information within the metadata of all servers can be searched with a single search interface. This service provides users with unlimited access to distributed electronic documents via the central ProPrint search engine and sets up an efficient print-on-demand workflow for those libraries that publish electronic information.

The idea
The libraries of more and more universities and research institutions have local digital repositories, and the amount of material is increasing every day. Users need an integrated retrieval interface that allows aggregated searching across multiple document servers without having to resort to manual processes. To do this, networking the document servers using standards like the Open Archives Initiative (OAI) metadata harvesting protocol is a prerequisite. These document servers are not expected to replace printed information. Experience shows that the increased availability of digital information enhances the desire to print. Most people resort to the convenience of desktop printing, but there is also an ongoing demand to have loose pages bound in a book-like format, especially if the material is for long-term use. Such a book may also be unique, for example a compilation of texts on a particular subject taken from multiple sources. ProPrint is a software solution for this kind of print on demand, using OAI standards and (at least for the next few years) PDF as the output format. Part of the idea behind the use of PDF as a quasi-standardized print format is to enable its use worldwide. Data hosted by one or more remote servers can (in principle) be printed anywhere in the world that has the necessary technical facilities. Instead of shipping a print-on-demand work, users can fetch it locally, whether in Hong Kong, New York or New Delhi. ProPrint is currently far from realizing this dream, but the Computing and Media Services of Humboldt University in Berlin and the Göttingen State and University Library have taken the first steps in this direction through a grant by the


Deutsches Forschungsnetz (DFN) and the Ministry of Education and Research (BMBF). Partners worldwide are now invited to join.

The ProPrint project
Within the last five years most German universities have set up local document servers in order to distribute and archive digitally born scientific literature. Most of these universities have also changed or are changing their graduation rules in order to allow the publication of doctoral theses in digital format, rather than on paper. At the same time, university libraries are digitizing historical material in order to make it available via the Internet, and scholarly document servers are hosting more and more electronic journals as a long-term solution to the crisis in scholarly publication. The goal of all these activities is to make information available worldwide in a cost-effective way. The Computer and Media Services of Humboldt University is one of the leading providers of theses in digital format and the State and University Library of Göttingen is one of Germany's biggest digitization centers. Together they started the ProPrint project in November 2000 by merging the content of the servers of Humboldt and Göttingen. The first phase was completed in summer 2003. The ProPrint developers use a "LAMP" system that includes a Linux operating system, an Apache web server, a MySQL database, and PHP for programming. These components were chosen because they were free and freely available for others to use. Consistent standards for communication, for document formats, and for metadata have been and remain a core concern, and the central ProPrint search engine is an OAI-compliant service provider. The OAI protocol uses Hypertext Transfer Protocol (HTTP), Dublin Core metadata and eXtensible Markup Language (XML). OAI enables an efficient exchange of metadata between data and service providers using asynchronous retrieval, in which service providers regularly send requests to data providers and then store the metadata in local repositories. Search requests from end users run against these local repositories. An OAI-compliant document server has to deliver Dublin Core as its metadata format. ProPrint has extended the Dublin Core metadata set with specific elements that are defined in a separate (ProPrint) name space. This includes structural elements of the DIEPER metadata set[2] (e.g. chapters, sub-chapters and page formats) and sales metadata. Since the digital

documents of the servers in Berlin and Göttingen had different page formats, the metadata allow for partial structures. The results of these developments are:
. a service where users can order a single document or compose a new one from parts of others in the ProPrint database; and
. print-on-demand for these documents that can be made available locally.
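To make the harvesting model described above concrete, the following sketch shows how a service provider might issue a periodic OAI-PMH ListRecords request and read Dublin Core titles from the response. It is illustrative only: the endpoint URL is hypothetical, and resumption tokens and error handling are omitted.

```python
# Minimal OAI-PMH harvesting sketch; the endpoint URL below is a placeholder,
# not an actual ProPrint data provider.
from urllib.request import urlopen
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest(base_url, **params):
    """Yield (identifier, title) pairs from one ListRecords response."""
    params.setdefault("verb", "ListRecords")
    params.setdefault("metadataPrefix", "oai_dc")
    with urlopen(base_url + "?" + urlencode(params)) as response:
        tree = ET.parse(response)
    for record in tree.iter(OAI + "record"):
        identifier = record.findtext(".//" + OAI + "identifier")
        title = record.findtext(".//" + DC + "title")
        yield identifier, title

# The harvested metadata would then be stored in the local repository
# that end-user searches run against.
for oai_id, title in harvest("https://edoc.example.org/oai2"):
    print(oai_id, title)
```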

Technical details
The ProPrint Web service is based on dynamic Web pages. During implementation, the specifications shown below were used.

Metadata
Development of a central uniform metadata[3] set for all document servers that allows extensibility to include other document types and objects (such as maps). Like Dublin Core (DC), a ProPrint name space and a ProPrint application profile have been developed. The metadata are regularly updated between the document servers.

Documents
ProPrint produces a printable document either from single documents or from a patron-selected compilation of multiple documents from more than one ProPrint server (see Figure 1).

Access rights control
Users, administrators and print service providers each have specific profiles and different access rights to parts of ProPrint. Designated administrators control these rights.

Print
Decentralized professional print shops distribute the documents in printed and bound form to the ProPrint user. ProPrint supports sorting of colour and black-and-white pages, thus allowing the automatic use of separate output devices. Users without access to a local ProPrint outlet copy shop can order their book directly from Göttingen.

Billing and sale
Göttingen prepares the bills for the printing and binding via a workflow that integrates the local print shops and Göttingen's accounting department. Documents are checked for number of pages and colouring. Pricing per page is determined for black and white and for colour pages. ProPrint generates a printable invoice that includes the price plus an invoice number.
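As an illustration of the extended metadata described under "Metadata" above, the sketch below builds a record that mixes Dublin Core elements with structural and sales elements in a separate ProPrint namespace. The element names and the namespace URI are invented for this example; the article does not publish the actual schema.

```python
# Illustrative only: the "proprint" namespace URI and element names are
# hypothetical stand-ins for the extensions (chapters, page formats, sales
# data) that the article describes.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
PP_NS = "http://proprint.example.org/ns/1.0/"   # hypothetical namespace URI
ET.register_namespace("dc", DC_NS)
ET.register_namespace("proprint", PP_NS)

record = ET.Element("record")
ET.SubElement(record, "{%s}title" % DC_NS).text = "Digitised monograph"
ET.SubElement(record, "{%s}identifier" % DC_NS).text = "oai:example:12345"

# Structural extension in the spirit of the DIEPER set: a chapter with its
# page range, plus page format and per-page pricing for billing.
chapter = ET.SubElement(record, "{%s}chapter" % PP_NS,
                        {"firstPage": "17", "lastPage": "42"})
chapter.text = "Chapter 2"
ET.SubElement(record, "{%s}pageFormat" % PP_NS).text = "A4"
ET.SubElement(record, "{%s}pricePerPage" % PP_NS,
              {"currency": "EUR", "colour": "false"}).text = "0.05"

print(ET.tostring(record, encoding="unicode"))
```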


Figure 1 ProPrint workflow

System administration
ProPrint can be administered either directly on the server or via a Web interface. The administrative interface includes the following functions:
. text/language (including the help text);
. user administration;
. OAI server;
. calendar/tasks;
. print services (including the integration of professional print shops); and
. invoice numbering.

The use of PDF
ProPrint requires standardized formats and a certain level of technical quality in order to enable the document exchange between servers. ProPrint uses Adobe's PDF format, but the technical quality of documents can vary due to scanning mistakes or upgrades to higher PDF versions. This became clear during a test involving document servers from the Technical University Library in Stuttgart and the University Computing Center at the Technical University of Chemnitz. When representative documents were tested by trying to print them at a professional print shop, recurring failures led to the following recommendations:
. Embedding of fonts and pre-checking of documents. If the PDF format is going to be used for archiving, all fonts used should be embedded using a distiller or other software tools, even though this expands file size and increases the load time for Web use. Documents must be checked thoroughly prior to archiving on a server.
. No security settings for PDF files. To enable archiving in PDF format, security settings and user restrictions have to be avoided. Converting the PDF file via Ghostscript into a PostScript file and back to a PDF file erases the security settings and user restrictions (a sketch of this round-trip follows this list).
. Checking for coloured text. For ease of use and cost effectiveness, documents should be free of redundant colour elements. For example, the automatic colouring of URLs in Microsoft Word can cause a document to be considered a "color document" by the printer and increase the cost. Authors should be warned to avoid automated settings that cause colour-coding.
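The Ghostscript round-trip mentioned in the second recommendation can be scripted. The sketch below uses the standard pdf2ps and ps2pdf wrapper scripts that ship with Ghostscript, with placeholder file names; it illustrates the authors' suggestion rather than a ProPrint component, and the result should be checked before replacing the original file.

```python
# Sketch of the PDF -> PostScript -> PDF round-trip described above.
# File names are placeholders; Ghostscript must be installed so that the
# pdf2ps and ps2pdf wrapper scripts are on the PATH.
import subprocess

def strip_pdf_restrictions(src="restricted.pdf", dst="clean.pdf"):
    """Rewrite a PDF via PostScript, which drops its security settings."""
    intermediate = "intermediate.ps"
    subprocess.run(["pdf2ps", src, intermediate], check=True)  # PDF -> PostScript
    subprocess.run(["ps2pdf", intermediate, dst], check=True)  # PostScript -> PDF

if __name__ == "__main__":
    strip_pdf_restrictions()
```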

Electronic publishing in Germany
The German Initiative for Networked Information (DINI) is the association of computer centers, libraries and media centers in Germany, comparable to the Coalition for Networked Information (CNI) in the USA. Its working group on electronic publishing sponsored a study in 2003 that found a lack of standardization in local repositories in German universities and research institutions. The main findings were:
. in most cases, electronic documents are stored on the local server and are accessible via the OPAC;


. the preferred document format is PDF;
. 40 per cent of these local servers have written standards for electronic documents; and
. 23 per cent deliver detailed bibliographic, administrative and technical metadata.

The working group wanted to promote the idea of standardized local repositories. After publishing a booklet about university servers, the working group developed a certification process for OAI-compliant document and publication servers, with the main focus on server technology and quality management for electronic documents[4]. This document was published in January 2004. DINI-certified local repositories will be ideal partners of ProPrint in the future. See Figure 2 for a diagram of the ProPrint service in Germany.

The future of ProPrint
Today, over 4,000 different documents are available through ProPrint from the document servers of Humboldt University in Berlin and the State and University Library in Göttingen. These

include monographs, journals, digitized documents from historical archives, dissertations, master's theses, and publications from three national and international conferences. Including these heterogeneous materials, each of which posed a different challenge, has proven ProPrint's stability. ProPrint can handle digitally born as well as converted and retro-digitized works. ProPrint was designed intentionally as an open system so that every document server provider can use it. ProPrint is not restricted to public institutions, and offers useful functionality for commercial content providers. However, it is designed as a cooperative activity of information institutions. ProPrint is striving to build a networked environment for quality publications in local repositories with a distribution system that facilitates customized printing on demand. So far ProPrint combines the document servers of Humboldt and Göttingen. The University Library of Graz will be the next library in the group. New partners from all over the world are welcome. For contact information or the names of project staff, see the Appendix.

Figure 2 Scheme of the ProPrint service in Germany

Notes
1 See http://edoc.hu-berlin.de/proprint/flyer-en.pdf
2 See http://gdz.sub.uni-goettingen.de/dieper/
3 See www.edoc.hu-berlin.de/proprint/
4 See www.dini.de/documents/Zertifikat.pdf (English version in preparation).

Appendix
Website: www.proprint-service.de
E-mail contact: [email protected]

Project staff
. Project direction: Professor Dr Elmar Mittler, Dr Peter Schirmbacher.
. Management: Hans J. Becker, Matthias Schulz.
. Scientific assistants: Dr Karen Strehlow, Andres Imhof, Susanne Dobratz, Antje Groth.
. Library staff: Dr Michael Voss.
. Student workers: Hans-Werner Hilse, Cliff Richter.
. Technical assistants: Jürgen Braun, Markus Enders, Thomas Fischer, Frank Klaproth, Winfried Mühl.


About XML
Patently ridiculous

Judith Wusteman

The author
Judith Wusteman is based in the Department of Library and Information Studies, University College Dublin, Dublin, Ireland.

Keywords
Computer software, Public domain software, Extensible Markup Language, Libraries

Abstract
The Open Source Software movement has much to offer the library community. But can it survive the onslaught of patent applications?

Library Hi Tech Volume 22 · Number 2 · 2004 · pp. 231-237 © Emerald Group Publishing Limited · ISSN 0737-8831 DOI 10.1108/07378830410543557
Received 11 March 2004; Revised 12 March 2004; Accepted 19 March 2004

The open alternative
In May 2003, the local government in Munich voted to delete Microsoft Windows from its 14,000 computers and to install Linux, an open source operating system. Microsoft was so concerned that its chief executive, Steve Ballmer, interrupted his skiing holiday in Switzerland to try to persuade Munich's mayor to change his mind, but in vain (The Economist, 2003). The Munich decision reflects a worldwide trend: governments across Latin America, Europe and Asia are moving towards open source software. In some cases, the impetus is economic, in some political, and in some it is based on a respect for good software. Governments do not want to be reliant on proprietary standards or tied to commercial vendors, particularly when their products have a history of unreliability and poor security.

Open source software
The term "open source software" (OSS) was not coined until 1998[1], but the movement evolved from the Free Software initiative[2] that emerged in the 1970s. One of free software's most influential figures is Richard Stallman, author of the Emacs editor and founder of the Free Software Foundation[3], which has overseen the creation of GNU operating system components. It was Stallman who coined the terms "freeware" and "copyleft" to describe concepts very similar to those epitomised in today's OSS. Stallman explains the origins of the term "copyleft" as follows[4]:
Proprietary software developers use copyright to take away the users' freedom; we use copyright to guarantee their freedom. That's why we reverse the name, changing "copyright" into "copyleft".

Stallman used his concept of copyleft to license GNU. The GNU General Public License (GNU GPL, or simply GPL) forms the basis of the licence that is still used by much of the OSS community today. However, use of the GPL can prevent the code it licenses being included in commercial software. When Netscape released its source code in 1998, it used a license that would facilitate such commercial uses. The term "open source" was created to cover this wider definition of free software. All W3C software is described as "Open Source/Free Software, and GPL compatible"[5]. Software labelled as "open source" implies much more than simply access to its source code. It also


requires that the software be freely available to any party for any purpose, including all forms of modification or extension. Any such derivations must be distributed under the same open source terms[1]. The extensive list of software that has emerged from the OSS stable is impressive, and includes operating systems such as GNU/Linux, programming languages such as Perl[6] and PHP[7], Web servers such as Apache[8], the OpenOffice suite[9], the Mozilla browser[10] and databases such as MySQL[11] and PostGreSQL[12].

OSS development
OSS development methods vary between what the OSS guru Eric Raymond describes as the "cathedral" and the "bazaar" approaches. GNU and Apache epitomise the former, in that development proceeded "in a carefully coordinated way by a small, tightly-knit group of people". Linux epitomises the latter approach in being "rather casually hacked on by huge numbers of volunteers coordinating over the Internet" (Raymond, 2000). Approximately 10,000 developers have contributed to the production of the Linux kernel. As a leaked Microsoft (1998) strategy memorandum points out:
The ability of the OSS process to collect and harness the collective IQ of thousands of individuals across the Internet is simply amazing.

Both approaches to development can result in software of higher quality and greater stability than that of many commercial rivals, as the examples above illustrate. In addition, most mature open source projects provide a Web site and discussion lists for users and developers, as well as other documentation. The advantages of OSS for information users and developers are manifold. As well as the obvious economic advantage, there is also the major bonus for the technically minded of being able to fix bugs themselves rather than having to report them to a commercial company that may not get around to solving problems for weeks or even months. Similarly, products can be customised locally to fit a particular need, again without the delay of contacting the company.

OSS and commerce
Although a few of the more extreme proponents of free software, such as Richard Stallman, oppose the concept of proprietary software entirely, the involvement of commercial interests in the OSS movement is widely established. Many versions of Linux are maintained and sold for a profit by commercial companies such as Red Hat[13]. Such companies make their profit by packaging the software and improving the ease of installation and maintenance. Other companies offer commercial products that build on open source technology to add extra features. The IBM Web Server, for example, builds on the Apache server. Companies such as IBM and Oracle have ported software to Linux. In addition, commercial training and support are available for the most widely used OSS such as Linux and Apache (Bretthauer, 2002). Some companies that develop OSS offer open source and commercial licences for the same software[14]. The latter licences may incorporate fewer restrictions on use than the former. The MySQL commercial licence, for example, does not require developers of commercial applications incorporating MySQL to make their source code freely available[15]. An open source business model is not only viable, it can be very profitable. IBM attracted great attention when it invested $1 billion in improving Linux in 2001, but it recouped most of this in sales in the first year. And, in 2002, Hewlett Packard and IBM reported $3.5 billion of Linux-related revenue (Orzech, 2003). OSS is now mainstream; as OSS advocate Bruce Perens points out (Boyd, 2004): "We are no longer isolated geeks making a system only we know is good".

OSS and libraries
Not surprisingly, the development and use of OSS in libraries is growing rapidly. The benevolent nature of the open source ideal fits well with librarianship culture. As Eric Lease Morgan (2000), one of the most enthusiastic proponents of OSS in libraries, explains:
Open source software development and librarianship have a number of similarities – both are examples of gift cultures . . . and gain reputation by the amount of "stuff" they give away.

The list of OSS applications that have found uses in libraries is long and is growing. As well as general tools, such as those already mentioned, specific library applications have also been created for all areas of library technology. The oss4lib Web site[16], itself maintained as an open source project, provides links to dozens of examples. Morgan (2003) summarises some of the categories as follows; an example project is listed for each category:
. document delivery applications (Prospero[17]);


. Z39.50 clients and servers (Yaz[18]);
. systems to manage collections (Greenstone[19]);
. MARC record readers and writers (XMLMARC[20]);
. integrated library systems (Koha[21]); and
. systems to read and write bibliographies (bp[22]).

Central to an increasing number of these applications is XML. The concepts behind OSS have now spread to other creative content such as Web sites, scholarship, music, film, photography, literature and courseware. Creative Commons[23] is an attempt to make such content freely available for copying and creative reuse.

OSS as threat
Unfortunately, OSS is perceived by some organisations as a major threat to product dominance and revenue. In public, Microsoft representatives have dismissed OSS, and Linux in particular, as "PacMan-like" (Ricciuti, 2001), "a cancer, unAmerican" (McMillan, 2004) and "communist" (The Economist, 2003). However, in a leaked strategy memorandum, Microsoft (1998) admits that:
The intrinsic parallelism and free idea exchange in OSS has benefits that are not replicable with [Microsoft's] current licensing model and therefore present a long term developer mindshare threat.

One such "mindshare threat" is OpenOffice.org[9].

Open source versus shared source
Like Microsoft Office, OpenOffice.org is an office software suite incorporating word processing, spreadsheet, presentation and drawing applications. Unlike Microsoft Office, OpenOffice.org is open source and multiplatform. Both products use XML file formats: in the case of Microsoft, the incorporation of XML is a recent development, only appearing in Office 2003. The ability to save documents in XML means that they should be readable by other software. In addition, both software suites have the potential to become clients for viewing and manipulating data from applications such as Web services (Becker, 2004a). Microsoft has hailed its adoption of XML technology as an illustration of its move towards openness and standards. It has recently published the XML schemata used in Office. It is also considering making other sections of Office code available, for viewing only, to certain approved clients including some governments and large corporations. Microsoft refers to this as a "shared source initiative": it appears to have been adopted largely to calm the fears of governments concerned about "secret security backdoors" in Office (The Economist, 2003). OpenOffice.org has contributed its XML file format to OASIS, the Organization for the Advancement of Structured Information Standards, with the aim of standardising formats amongst the various open office suites. The developers of OpenOffice.org believe that XML can allow the user to "regain ownership to his/her own data, by allowing access and manipulation of office documents by arbitrary tools which support the file format"[9]. XML, announces Tidwell (2001), is "shifting the balance of power from software vendors to software users". But how far will Microsoft allow this to go? Efforts to standardize office document formats are described as posing one of the "few viable threats to MS desktop dominance" (Gonsalves, 2003). An increasing awareness of XML has escalated the concern of Microsoft customers about being locked into proprietary formats; Microsoft has to appear to support open formats. However, cynics suggest that it cannot afford to allow its formats to be truly open – users cannot be allowed to regain ownership of their own data.

Protecting market share

It is interesting to note that, in successive versions of Word, Microsoft has found it possible to create and disseminate filters to allow the import of virtually all the major word processing formats on the market. At the same time, it is often difficult for users to read a document in the latest version of Word using a previous version of Word software. Filters may or may not be available somewhere on the Microsoft Web site: in many cases, they might as well not exist because they are so difficult to find. This could be helpful in encouraging users to upgrade. OpenOffice.org, on the other hand, appears to have no difficulty in providing filters for all versions of Word currently in use[24]. And herein lies a problem for Microsoft: how can it retain ownership of Word documents? Might copyright help? The purpose of copyright is to protect forms of expression (Lesk, 1997). Hence, it can only be used to protect software code, not the ideas or algorithms on which it is based. Thus, for example, vendors are at liberty to produce independent


implementations of the algorithms behind Microsoft Word. This “reverse engineering” process is both legal and common practice. Patents, on the other hand, do not protect forms of expression, but the devices or processes themselves.

The new weapon
A leaked Microsoft (2002) study implies that Microsoft's battle against OSS has not been as successful as it might have wished. However:
Seventy-four percent of Americans and 82% of Swedes stated that the risk of being sued over Linux patent violations made them feel less favourably towards Linux. This was the only message that had a strong impact with any audience (my emphasis).

As with Linux, so with other competition. If, for example, Microsoft could claim to have invented new techniques for storing or manipulating Word documents using XML, it could then patent these techniques and prevent other vendors from using them. And this is what Microsoft is currently attempting. Microsoft is not the only organisation interested in patenting XML. As of February 2004, there are 101 XML-related patents pending at the US Patent and Trademark Office. But Microsoft is sponsoring 56 of them (Loli-Queru, 2004). Among the 56 are several concerning the processing of XML by Office. The patents will affect software, such as OpenOffice.org, that interoperates with Word through XML. For example, it could prevent competing applications from opening XML files created in Office without licensing the patent (Cover Pages, 2003). Office is the overwhelmingly dominant product in this sphere, and interoperability with Word is essential to the success of any word processor. Using XML in applications such as word processing is hardly novel. Microsoft claims that the ideas being patented are unique because they describe a method of storing all document information in one file rather than several, as is the case in OpenOffice.org. This rather stretches the definition of "unique".

Software patents
The patent is a relatively new class of weapon in the software world. Apart from the high-profile Unisys-LZW patent case over the GIF format[25], it is only in more recent years that software patents have been widely discussed.

The hardware industry is enveloped in patents. But it is a long time since Jobs and Wozniak designed the Apple I computer in Jobs's bedroom and built it in his parents' garage. These days, by and large, only major companies develop computer hardware. By comparison, there are hundreds of thousands of lone software developers and small groups creating useful software applications. This number includes thousands of librarians, often altruistically sharing their code with the international library community. For an invention to be patented, its developers must prove that it is a novel and non-obvious idea. "Prior art" refers to the technology relevant to an invention that is publicly available at the time that the invention is made. For a patent application to be accepted, the invention in question has to be distinguished from any prior art. It appears that in the US, the search for prior art goes no further than existing patents. If a patent does not yet exist in an area, the US Patent and Trademark Office (PTO) assumes that there is no prior art (Ulbricht, 1999). It is irrelevant if a dozen companies have already developed and are using something similar. If they have not patented their idea, that's their problem. Most hardware is patented, and has been for years: a search of patents is fairly likely to show prior art if it exists. Most software has not been patented. The nature of the software industry and the existence of the OSS culture make the patents approach particularly unfair. Traditionally, patents have only been used for "concrete and physical inventions"[26]. Software and other abstract subjects such as mathematics have been regarded by law in many countries as falling outside the scope of patentable products. In recent years, however, the European Patent Office has granted more than 30,000 software patents. As the Foundation for a Free Information Infrastructure (FFII) comments: "the patent system has gone out of control". The FFII blames a "closed community of patent lawyers" that is "creating, breaking and rewriting its own rules without much supervision from the outside"[26]. Within Europe, a battle is in progress between those organisations and governments that support the legitimisation of this process and those that oppose it. Microsoft itself has suffered from this trend in the form of the Eolas patent suit[27]. In 2003, Eolas, a one-person company, sued Microsoft for breach of patent relating to the automatic launching of embedded objects such as Flash, Real Player and PDF readers in Internet Explorer. Microsoft lost but the case gained it sympathy from unusual quarters. The case also put the W3C on its guard and a W3C Patent Policy[28] was quickly drawn up that "all but bans the use of patented technologies in its recommendations" (Festa, 2003). However,


in March 2004, the US PTO invalidated one of Eolas’s central claims. The outcome of a review is awaited. If the case fails, the patent will be one of only 152 out of nearly 4 million patents awarded since 1988 to be invalidated (Reuters, 2004).

We’re looking at a future where only the very largest companies will be able to implement software, and it will technically be illegal for other people to do so.

Attacking OSS FUD Many patent applications are blatantly silly, but they can still cause problems. Microsoft’s applications relating to XML and word processing are unlikely to succeed: there are too many “precedents for applications sharing XML data” (Becker, 2004a). But they may still cause problems. Even applying for a patent can cut out competition. Contesting a claimed patent infringement is prohibitively expensive for small firms. According to Stanford’s John Barton, such suits are “among the most expensive kind of litigation in the US today” (Pascual and Fernandez, 2000). Large companies have been known to drag out cases for years: small software companies have become bankrupt even before the case is decided. Such claimed infringements would be particularly difficult for loose collection of OSS developers to fight. The FUD (“Fear, Uncertainty, Doubt” (FUD)[29]) that a patent threat engenders simply results in an avoidance of such areas of research and development by all but the largest companies. As Perens points out (Boyd, 2004): [Y]ou can never finish a patent search. The definitions are so broad, you can’t ever be sure a company would or would not assert their patent on what you are doing.

Microsoft states that, in increasing its use of patents, it is simply following “the precedent of other technology companies that have had licensing programs in place for some time, such as Intel, IBM, Hewlett-Packard and Fujitsu” (Fried, 2003). Its actions are, says Microsoft, “standard moves for the company to protect its innovations and don’t affect its commitment to openly sharing the XML schemas used by Office” (Becker, 2004b). However, history should make us wary. FAT (file allocation table) technology is the software used to format hard drives and floppies. It is far from ideal, but has become the standard method of formatting such storage devices. Cleartype is a font display technology. Both of these standards have been patented by Microsoft for some time. They both have a large user base. Microsoft has recently decided to require licences for their use (Becker, 2004a). The worst case scenario, as described by Perens (Boyd, 2004), is bleak:

In March 2003, a company called SCO began an action against IBM, claiming that the latter had illegally donated code to Linux. This code, it asserted, belonged to SCO’s version of Unix, System V Unix. IBM counter-sued, claiming that SCO had released this code into the public domain by releasing a Linux distribution covered by the GNU GPL. SCO has announced a challenge to the legality of the GPL. It claims that the GPL violates the US Constitution, as well as copyright, antitrust and export control laws (Shankland, 2003). In January 2004, SCO wrote to all 535 members of the United States Congress to explain how the use of Linux and OSS was a “threat to the security and economy of the US” (McMillan, 2004). Ironically, SCO has used GPL-licensed software in some of its products. In addition, SCO has been severely criticised for failing to back up most of its claims with proof. Some commentators have suggested that, by paying an undisclosed amount of money to SCO for a Unix license, Microsoft is indirectly funding the SCO lawsuit. Some go as far as suggesting that software sales are now a secondary activity for SCO, its main function and source of income being the Linux lawsuit (McMillan, 2004). US federal regulators may have begun investigating the two companies in relation to these and other allegations (Preimesberger, 2004). Meanwhile, the management of SCO have become hate figures to some in the broader OSS community. Unfortunately, an extremist chose to demonstrate this by launching an email virus, “mydoom”, which attacked the SCO site on January 31, 2004 (Kotadia, 2004). The GPL has never been tested in a court of law: this uncertainly in relation to its legal status makes some lawyers nervous, and they welcome the SCO lawsuit. If the latter fails, confidence in the OSS sector will increase. However, as Perens points out (Boyd, 2004): “[T]he real threat to Linux and the open source movement is not from the SCO lawsuits, but from software patents”.

What should librarians do?
In the spring of 2000, the oss4lib mailing list hosted a debate on how the library profession could best take advantage of OSS. The themes that


emerged are discussed in detail by Morgan (2003). They include a call for national leadership by library organisations in funding and facilitating methods to provide "credibility, publicity, stability, and coordination" to library-based OSS projects. Also debated was the extent to which the current generation of library applications is "beyond [the] control" of librarians. OSS offers more control to librarians. But, given the uncertainty caused by lawsuits and patents, should librarians be using OSS? Consideration of the alternative may help put the discussion in context. Imagine a scenario in which a major digital library of several million documents is archived in Microsoft Office 2003 Word format. It can be saved as XML so, of course, it is future-proof and hence an appropriate archival format. After a couple of years, Microsoft upgrades to Word 2006. A couple of years later, it upgrades again, this time to Word 2008. At this point, Microsoft "sunsets" Word 2003, that is, it ceases to support it. Word 2008 may be able to read Word 2006 files but history tells us that it may not be able to read Word 2003 files. But the files are in XML so it should be easy enough to create a reader for them – except that there's a patent on the format so this would be illegal until that patent has expired. The result is several million unreadable documents. Archiving documents in formats encumbered by patents will always be a bad idea.

Notes
1 The Open Source Definition, available at: www.opensource.org/docs/definition.php
2 The Free Software Definition, available at: www.gnu.org/philosophy/free-sw.html
3 Free Software Foundation, www.gnu.org/fsf/fsf.html
4 "What is copyleft?", Free Software Foundation, available at: www.gnu.org/copyleft/copyleft.html
5 W3C Open Source Software, www.w3.org/Status
6 Perl, www.perl.org/
7 PHP, www.php.net/
8 Apache, http://httpd.apache.org
9 OpenOffice.org, www.openoffice.org
10 Mozilla, www.mozilla.org
11 MySQL, www.mysql.com/
12 PostGreSQL, www.postgresql.org/
13 Red Hat, www.redhat.com/
14 Open Source case for business, available at: www.opensource.org/advocacy/case_for_business.php
15 MySQL licensing policy, available at: www.mysql.com/products/licensing.html
16 oss4lib: open source systems for libraries, www.oss4lib.org/
17 Prospero, http://bones.med.ohio-state.edu/prospero/
18 Yaz, www.indexdata.dk/yaz/
19 Greenstone, www.greenstone.org/cgi-bin/library
20 XMLMARC, http://laneweb.stanford.edu:2380/wiki/medlane/xmlmarc
21 Koha, www.koha.org/

22 bp, a Perl bibliography package, www.ecst.csuchico.edu/~jacobsd/bib/bp/index.html
23 Creative Commons, http://creativecommons.org/learn/aboutus/
24 OpenOffice filter description, http://framework.openoffice.org/files/documents/25/897/filter_description.html
25 "LZW patent and software information", available at: www.unisys.com/about__unisys/lzw/
26 Foundation for a Free Information Infrastructure (FFII), "Software patents in Europe", available at: http://swpat.ffii.org/
27 "FAQ on US Patent 5,838,906 and the W3C", available at: http://www.w3.org/2003/09/public-faq
28 "W3C patent policy", February 5, 2004, available at: www.w3.org/Consortium/Patent-Policy-20040205/
29 FUD – a whatis definition, available at: http://whatis.techtarget.com/definition/0,sid9_gci214113,00.html

References
Becker, D. (2004a), "Microsoft seeks XML-related patents", CNET News.com, available at: http://news.com.com/2100-1013_3-5146581.html (accessed 23 January).
Becker, D. (2004b), "Microsoft: XML patent moves are no big deal", CNET News.com, available at: http://news.com.com/2100-1013-5147390.html (accessed 26 January).
Boyd, C. (2004), "Software patents 'threaten Linux'", BBC News, World Edition, available at: http://news.bbc.co.uk/2/hi/technology/3422853.stm (accessed 23 January).
Bretthauer, D. (2002), "Open source software: a history", Information Technology and Libraries, Vol. 21 No. 1, March, available at: www.ala.org/ala/lita/litapublications/ital/2101bretthauer.htm
Cover Pages (2003), "Microsoft announces licenses for use of Office 2003 XML reference schemas", Cover Pages, available at: http://xml.coverpages.org/ni2003-11-18-a.html (accessed 18 November).
(The) Economist (2003), "Microsoft at the power point", The Economist, available at: www.economist.com/business/displayStory.cfm?story_id=2054746 (accessed 11 September).
Festa, P. (2003), "Web patent critics spotlight old technology", CNET News.com, available at: http://zdnet.com.com/2100-1104-5100693.html (accessed 31 October).
Fried, I. (2003), "Microsoft opens technology to more licensing", CNET News.com, available at: http://news.com.com/2100-1012-5113033.html (accessed 3 December).
Gonsalves, A. (2003), "XML work poses one of few viable threats to MS desktop dominance", InternetWeek, available at: www.internetweek.com/story/showArticle.jhtml?articleID=8800539 (accessed 22 April).
Kotadia, M. (2004), "SCO recovers from MyDoom", Znet (UK), available at: http://zdnet.com.com/2100-1105_2-5171499.html (accessed 8 March).
Lesk, M. (1997), Practical Digital Libraries: Books, Bytes and Bucks, Morgan Kaufmann, San Mateo, CA, July.
Loli-Queru, E. (2004), "XML patent paradox", OSNews, available at: www.osnews.com/story.php?news_id=6068 (accessed 10 February).
McMillan, R. (2004), "SCO to Congress: Linux hurts the US", Computer World, available at: www.computerworld.com/softwaretopics/os/linux/story/0,10801,89335,00.html (accessed 23 January).


Microsoft (1998), "Open source software: a (new?) development methodology", Revision 1.00, available as Halloween Document I (Version 1.14) at: www.opensource.org/halloween/halloween1.php (accessed 8 November).
Microsoft (2002), "Research e-bulletin: attitudes towards shared source and open source research study", September, available as Halloween Document VII at: www.opensource.org/halloween/halloween7.php
Morgan, E.L. (2000), "Gift cultures, librarianship, and open source software development", available at: www.infomotions.com/musings/gift-cultures.shtml (accessed 28 December).
Morgan, E.L. (2003), "Possibilities for open source software in libraries", Information Technology and Libraries, Vol. 21 No. 1, March, available at: www.ala.org/ala/lita/litapublications/ital/2101morgan.htm
Orzech, D. (2003), "Can you make money selling Linux? Try $3.5 billion", CIO Information Network, available at: www.cioupdate.com/news/article.php/1574431 (accessed 24 January).
Pascual, J.S. and Fernandez, R.G. (2000), "Software patents and their effect on Europe", December, available at: www.dit.upm.es/~joaquin/report_en.pdf

Preimesberger, C. (2004), "Analysis: Microsoft, SCO have a lot more explaining to do", NewsForge, available at: www.newsforge.com/trends/04/03/08/0457259.shtml (accessed 8 March).
Raymond, E.S. (2000), "A brief history of hackerdom", available at: www.hackemate.com.ar/hacking/eng/part_00.htm (accessed 5 May).
Reuters (2004), "Patent central to Microsoft: case invalidated", available at: www.reuters.com/newsArticle.jhtml?type=topNews&storyID=4509756 (accessed 5 March).
Ricciuti, M. (2001), "Gates wades into open-source debate", CNET News.com, available at: http://news.com.com/2100-1001-268667.html (accessed 19 June).
Shankland, S. (2003), "SCO attacks open source foundation", ZDNet, available at: www.zdnet.com.au/news/software/0,2000061733,20280278,00.htm (accessed 29 October).
Tidwell, D. (2001), XSLT, O'Reilly & Associates, Sebastopol, CA.
Ulbricht, J. (1999), "WG: [Patents] 1998 US software patent statistics", Debate Mailing List, available at: www.fitug.de/debate/9907/msg00284.html (accessed 24 July).


Book review


Keeping Current: Advanced Internet Strategies to Meet Librarian and Patron Needs
Steve M. Cohen
American Library Association, Chicago, IL, 2003, 97 pp., ISBN 0-8389-0864-0

Keywords: Internet, Search engines, Librarians
Review DOI 10.1108/07378830410543566

". . . the purpose of Keeping Current is to make the 'keeping up' process easier and less time-consuming." And that is certainly what the author is doing in this book. Cohen starts by discussing how the fluidity of information provided by the Web has changed its currency. He also describes the pros and cons of print and electronic resources used to keep current in the field, and comes up with his own theory. "Steven's Theory of Currency" states that the simplest and fastest way to keep current is by making the information come to you in a compact manner by using a few simple tools. The rest of the book deals with the tools he found useful in keeping current: news on search engines, Web site monitoring software, weblogs (or "blogs" for short) and RSS Feeds (Rich Site Summary or Really Simple Syndication). For example, he discusses the use of Web site monitoring software to track any changes in sites that the user has specified, and to notify them when the changes occur. Other examples, installation procedures, suggestions for resources, and pros and cons are also included. It is a well-written book. At less than 100 pages, the author manages to keep the content

straightforward, and therefore easy to understand. The screen shot examples tell us what to expect when using the tools and resources. The information given is interesting and useful. For example, after discussing Web site monitoring tools and RSS Feeds, the author makes a comparison between the two. In addition to his useful analysis of each tool, he provides comparisons of tools of similar functionality. Also, the index is well designed and quite thorough, so readers can find a subject easily. The author clearly invested considerable time and effort in finding good tools and resources to support his professional work. And these tools and resources are good starting points. While we can explore other resources relevant to our specific work, the author cautions us to be selective – due to the abundant resources available, it is easy to suffer from information overload. The tools he lists are ideal for librarians who want to keep abreast of what's new in the field but don't have enough time or expertise to tinker with their technical aspects. These tools are particularly useful for subject specialists and reference librarians working in special libraries. A librarian whose job is specialized in other services, such as a serials cataloger, conservationist or systems librarian, will also find them useful. Public and academic reference librarians will probably find it a bit overwhelming to track the wide range of queries they may receive from patrons. Still, these tools will help them stay current within the library field. As an added bonus, librarians are not the only ones who can use these tools. Anybody who wants to keep current on certain issues can also take advantage of them. The method of choosing and utilizing the tools described in this book is clear and easy to understand, both for information professionals as well as the general public.

S.G. Ranti Junus
Systems Librarian, Michigan State University Libraries, East Lansing, Michigan, USA
