Aslib Proceedings: New Information Perspectives, Volume 59, Number 4/5, 2007. British Library and Information Schools: The Research of the Department of Information Science, City University, London. ISBN 9781846636035, 9781846636028

The papers in this e-book reflect the research and scholarly interests of the staff of the Department of Information Science, City University London.




Aslib Proceedings: New Information Perspectives

ISSN 0001-253X Volume 59 Number 4/5 2007

British library and information schools: the research of the Department of Information Science, City University London
Guest Editor: David Bawden

CONTENTS

Access this journal online ____ 303
Editorial advisory board ____ 304

GUEST EDITORIAL
Information Science at City University London
David Bawden ____ 305

Organised complexity, meaning and understanding: an approach to a unified view of information for information science
David Bawden ____ 307

Healthcare libraries in Saudi Arabia: analysis and recommendations
Ahmad Khudair and David Bawden ____ 328

Impact of digital information resources in the toxicology literature
Lyn Robinson ____ 342

Evaluation of web search for the information practitioner
A. MacFarlane ____ 352

Parallel methods for the update of partitioned inverted files
A. MacFarlane, J.A. McCann and S.E. Robertson ____ 367

Flickr and Democratic Indexing: dialogic approaches to indexing
Pauline Rafferty and Rob Hidderley ____ 397

Non-literal copying of factual information: architecture of knowledge
Tamara Eisenschitz ____ 411

Mixed reality (MR) interfaces for mobile information systems
David Mountain and Fotis Liarokapis ____ 422

Information history: its importance, relevance and future
Toni Weller ____ 437

Of the rich and the poor and other curious minds: on open access and "development"
Jutta Haider ____ 449

Continuing professional development for library and information science: case study of a network of training centres
Lyn Robinson and Audrone Glosiene ____ 462

Where do we go from here? An opinion on the future of LIS as an academic discipline in the UK
Toni Weller and Jutta Haider ____ 475

Access this journal electronically: the current and past volumes of this journal are available at www.emeraldinsight.com/0001-253X.htm. You can also search more than 150 additional Emerald journals in Emerald Management Xtra (www.emeraldinsight.com).


EDITORIAL ADVISORY BOARD

Ralph Adam, Freelance Journalist, City University, London, UK
John Akeroyd, Director, Learning Resources, South Bank University, London, UK
Andrew Boyd, Research Associate, CIBER, School of Library, Archive and Information Studies, University College London, UK
David Brown, Director, Publishing Relations, The British Library, UK
Peter Chapman, Information Consultant, Banbury, UK
Professor Peter Cole, Head of Department, Department of Journalism, University of Sheffield, UK
Dr Tom Dobrowolski, Senior Lecturer, Institute of Information Science and the Book, University of Warsaw, Poland
Professor David Ellis, Research Director, Department of Information and Library Studies, University of Wales, Aberystwyth, UK
Dr Ben Fouche, Principal, Knowledge Leadership Associates, South Africa
Eti Herman, Vice-Director, Library, University of Haifa, Israel
Nat Lievesley, Network Manager, Centre for Policy on Ageing, London, UK
Helen Martin, Media Information Consultant, London, UK
Anthony Watkinson, Publishing Consultant, Oxford, UK
Richard Withey, Global Director, Interactive Media, Independent News and Media, UK

Aslib Proceedings: New Information Perspectives, Vol. 59 No. 4/5, 2007, p. 304. © Emerald Group Publishing Limited, 0001-253X


GUEST EDITORIAL

Information Science at City University London

David Bawden
Department of Information Science, City University London, London, UK

Abstract
Purpose – The paper seeks to introduce a special issue of Aslib Proceedings, which contains a series of papers written by staff and research students at the Department of Information Science, City University London.
Design/methodology/approach – This introductory paper introduces the other papers in the special issue and sets them in context.
Findings – This editorial argues that the information science discipline, which has always been the focus of City's research and scholarship, is a valid academic discipline with a positive future.
Originality/value – The paper points out the particular strengths and historical continuity of the City Information Science Department.
Keywords: Information science, Research, Professional education, United Kingdom
Paper type: General review

The papers in this special issue reflect the research and scholarly interests of the staff of the Department of Information Science at City University London. Although the issue began with no explicitly intended theme, I believe that a theme emerges strongly: the continued vitality and variety of the information science discipline.

The City Department of Information Science is the only academic department within the library/information area in the UK to have retained its departmental name unchanged since its establishment: originally as the Centre for Information Science, and later as the Department for the same. It is also the only such department to have offered a course entitled "Information Science" since its inception, and to continue to do so. We are thus well placed to represent this aspect of the library/information spectrum. It should not be thought that City retains the somewhat sceptical attitude towards librarianship for which it was known in times gone by. On the contrary, in terms of teaching, our relatively new Master's course in Library and Information Studies is our largest course by a considerable margin. But our perspective and focus, particularly in terms of scholarship, are still very much oriented to "information" and to "science", and I hope the papers in this issue reflect this.

The papers span the whole scope of the subject. I cannot, as issue editor, claim any credit for this. I simply asked my colleagues to write something which they felt appropriate, and the resulting wide spread of content followed naturally. I am particularly pleased that several of our PhD students have contributed articles, in their own right and as co-authors.

Aslib Proceedings: New Information Perspectives, Vol. 59 No. 4/5, 2007, pp. 305-306. © Emerald Group Publishing Limited, 0001-253X. DOI 10.1108/00012530710817537


The issue begins and ends with articles focusing on the information science discipline. My own opening contribution – somewhat more theoretical in nature, perhaps, than is usual for Aslib Proceedings – seeks to gain a unified understanding of information itself, which must be the basis of our subject. I attempt to base this unification on an analysis of the changing understanding of the nature of information in the physical and biological sciences and, by extension, in recorded human information. The closing article by Toni Weller and Jutta Haider, two of our current PhD students, argues for a strengthening of information science as an academic discipline.

The traditional concerns of the department are well represented in several papers: healthcare and scientific information (Khudair and Bawden; Robinson); information retrieval (MacFarlane; MacFarlane, McCann and Robertson); information organisation (Rafferty and Hidderley); and legal and policy issues (Eisenschitz). These are complemented by newer strands of the department's activities: geographic information (Mountain and Liarokapis); information history (Weller); and discourse analysis (Haider). Our international collaborations, which have always been an important feature, are represented by a paper authored by one of our visiting professors (Robinson and Glosiene), dealing with issues of training for continuing professional development, which was the starting point for the City department in the form of the evening courses taught since 1961.

We firmly believe in the validity of information science as an academic discipline, and in its future. This belief is reflected in the work now beginning on our ambitious LIS-RES-2030 project. This project, which will involve Delphi studies and international symposia, has the aim of setting out an agenda for LIS research for the next 20 years. Our hope is that the papers in this issue will convince readers to share our interests and our optimism.

Corresponding author
David Bawden can be contacted at: [email protected]



Organised complexity, meaning and understanding
An approach to a unified view of information for information science

David Bawden
Department of Information Science, City University London, London, UK

Received 25 January 2007
Accepted 3 June 2007

Abstract
Purpose – The paper seeks to outline an approach to a unified framework for understanding the concept of "information" in the physical, biological and human domains, and to see what links and interactions may be found between them. It also aims to re-examine the information science discipline, with a view to locating it in a larger context, so as to reflect on the possibility that information science may not only draw from these other disciplines, but that its insights may contribute to them.
Design/methodology/approach – The paper takes the form of an extensive literature review and analysis, loosely based on the approaches of Stonier, Madden and Bates, and including analysis of both scientific and library/information literature.
Findings – The concept of information is identified with organised complexity in the physical domain, with meaning in context in the biological domain, and with Kvanvig's concept of understanding in the human domain. The linking thread is laws of emergent self-organised complexity, applicable in all domains. The paper argues that a unified perspective for the information sciences, based on Popperian ontology, may be derived, with the possibility of not merely drawing insights from physical and biological science, but also of contributing to them. Based on Hirst's educational philosophy, it derives a definition for the information sciences around two poles: information science and library/information management.
Originality/value – This is the only paper to approach the subject in this way.
Keywords: Information science, Information theory, Physics, Biology
Paper type: Conceptual paper

Introduction

The seemingly empty space around us is seething with information. Much of it we cannot be aware of because our senses do not respond to it. Much of it we ignore because we have more interesting things to attend to. But we cannot ignore it if we are seeking a general theory of information. We cannot live only by reading and writing books (Brookes, 1980, p. 132).

The purpose of this paper is to outline an approach to a unified framework for understanding the concept of "information" in the physical, biological and human domains, and to see what links and interactions may be found between them. This analysis is used to re-examine the information science discipline, with a view to locating it in a larger context. In turn, this picture is used to reflect on the possibility that information science may not only draw from these other disciplines, but that its insights may contribute to them.

The author is grateful to Jutta Haider, Jack Meadows and Toni Weller for insightful comments.

Aslib Proceedings: New Information Perspectives, Vol. 59 No. 4/5, 2007, pp. 307-327. © Emerald Group Publishing Limited, 0001-253X. DOI 10.1108/00012530710817546


Because the discussion must necessarily take in some rather technical areas, such as quantum mechanics, thermodynamics, and molecular biology, I have relied to a large extent on "popular" works for literature citation; these will lead the interested reader on to the scientific literature itself. In particular, anyone seeking an accessible account of the background to these issues should consult Davies (1987, 1998), Gell-Mann (1994), Von Baeyer (2004), and Penrose (2004). The lengthy reference list is intended to draw attention to an extensive literature that appears to have been largely overlooked by the information science community.

The nature of the information science discipline

There has been considerable debate in the literature as to the exact nature of the information science discipline, and its place within the wider spectrum of the "information-based" disciplines and professions: see, as recent examples, Hjørland (2000), Warner (2001), Webber (2003), Sturges (2005), Zins (2006), and also, for an earlier perspective, Brittain (1980); see also the article by Weller and Haider (2007) in this issue of Aslib Proceedings.

Webber's (2003) analysis is somewhat discouraging for those who believe in the concept of information science per se, at least in the UK, since she finds that, although three departments style themselves "Information Science" – in itself a small proportion of the 15 (at the time of Webber's survey) offering CILIP-accredited courses – only one has a course with that title[1].

A similar perspective on this issue comes from a consideration of the place of LIS departments in the faculty/school structure of their universities within Europe, established by a survey of such departments (Kajberg and Lørring, 2005; see also Webber, 2003, for a similar earlier analysis). There is a wide variety: most are in humanities faculties, but others are in social science, pure science, informatics/computing, business schools, education, etc. This illustrates the lack of any clear agreement as to where the library/information disciplines "fit" academically. This survey also showed that the only subject to be covered in all library/information programmes, without exception, was information seeking and retrieval, indicating a similar lack of clear agreement as to the necessary content of LIS education.

The emergence of the "I-school" phenomenon in the USA, with its formation of integrated departments bringing together the disciplines centred on information, particularly including computing and information systems, is an analogous change (Buckland, 2005; Cronin, 2005). For LIS departments in the UK, Webber (2003) draws attention to a similar process of restructuring, with only three of 15 retaining the same departmental name over a five-year period.

Professionally, in both the UK and the USA, there was until recently a distinct professional body representing information science: the Institute of Information Scientists and the American Society for Information Science, respectively. Both of these have changed their identity in recent years, but in very different ways. The Institute has merged with the (UK) Library Association, to form the Chartered Institute of Library and Information Professionals. Its American equivalent has moved in another direction, explicitly including information technology: ASIS has become ASIST, and remains entirely separate from the American Library Association. This shows the divergent views of the relations between the various information disciplines (see Webber, 2003, for a more detailed account).

To bring some clarity to this picture, it is helpful to use the ideas of the educational philosopher Paul Hirst (Hirst, 1974; Hirst and Peters, 1970; Walsh, 1993). He argues that, since disciplines are closely associated with their knowledge base, we can understand a discipline by understanding its "form of knowledge". There are, for Hirst, seven main domains or forms of knowledge, defined by the fundamental nature of the knowledge and concepts with which they deal:
(1) mathematics;
(2) physical sciences;
(3) human sciences;
(4) literature and the fine arts;
(5) morality;
(6) religion; and
(7) philosophy.

Where a discipline equates to one of these forms, it is what would be regarded as a "pure" academic subject. Hirst also recognises "practical disciplines", based on one of the forms, but oriented toward solving practical problems. Engineering, for example, would be a practical discipline based on the form of the physical sciences. Many academic subjects, however, do not align neatly with any form. Rather they are focused on a topic or subject of interest, using any of the forms that are useful in studying and understanding it. Hirst refers to these as "fields of study"; they are typically, though not necessarily, multidisciplinary.

In this sense, it seems clear that the information disciplines are best regarded as a field of study, the focus of which is recorded information. It is generally, though not universally, accepted that their main "form" is that of the human or social sciences, but other forms play a part. Given that the divergence of views about the information professions is just as great as that about the academic disciplines, we might want to say that this field of study underlies not one but several practical disciplines. The information science discipline would then be understood as:

. . . a multidisciplinary field of study, involving several forms of knowledge, given coherence by a focus on the central concept of human, recorded information, and underpinning several practical disciplines.

If we accept this, then it is hardly surprising that the field has no unique place in academic structures, nor that there is continual reassessment as to how its practical disciplines relate to each other. In order to attempt a clarification, we shall need to consider the nature of "human recorded information", the central concept for the field.

The nature of information (for information science)

There has been no shortage of attempts to define or explain the concept of information, as it applies to the information sciences[2] (see, for example, Bates, 2006; Noorlander, 2005; Cornelius, 2002; Losee, 1997; Meadow and Yuan, 1997; Mingers, 1997; Buckland, 1991; Liebenau and Backhouse, 1990; Belkin, 1978). Despite these efforts, there is no generally accepted and agreed understanding, at least in any degree of explicit detail, of the concept.


Furthermore, the concept of "information" appears in different guises, in disciplines far removed from the library/information area, including the physical and biological sciences (Bawden, 2001). The intriguing question which this raises is whether this is simply a consequence of the same word being used to denote different concepts – though clearly with something in common – or whether there is indeed a closer link, at a deep level, between the meaning of the term in these different domains. If the latter is the case, then we might hope that a better understanding of this deeper meaning, and its consequences, might be a source of a deeper "theory of information" for information science. It should also enable richer interactions between information science and the other disciplines for which information is an increasingly central concept.

Three authors in particular have addressed this idea, albeit in very different ways: Tom Stonier, Andrew Madden, and Marcia Bates. Stonier (1990, 1992, 1997) makes the earliest detailed attempt to unify the concept of information in the physical, biological and human domains. Starting from the concept of information as a fundamental constituent of the physical world:

Information exists. It does not need to be perceived to exist. It does not need be understood to exist. It requires no intelligence to interpret it. It does not have to have meaning to exist. It exists (Stonier, 1990, p. 22, author's italics).

he proposes relations between information and the basic physical quantities of energy and entropy. (His most radical proposal, that information may be carried by physical particles, which he terms "infons"[3], has not been supported.) Stonier postulates that a general theory of information may be possible based on the idea that the universe is organised into a hierarchy of information levels. He moves on from this to propose an evolutionary view of intelligence, as a property of "advanced information systems", from self-replicating crystals, through the intelligence of animal and human individuals and societies, through to machine intelligence and "super-intelligences", and then to an examination of the origins of meaning in the workings of the brain. He identifies self-organising information processing systems as the "physical roots of intelligence", based on his conception of information as a basic property of the universe.

Madden (2004) focuses on the biological domain in his evolutionary treatment of information, examining information processing as a fundamental characteristic of most forms of life. He argues that Lamarckian evolution – the idea that characteristics acquired by a biological organism during its lifetime can be passed on to its descendants – while discredited in general biology, may be appropriate for understanding the evolution of human societies, including their information behaviour. Madden proposes, for the first time so far as the author is aware, that insights from the information sciences may be valuable to the supposedly more "basic" sciences, in this case the biological sciences, because of the commonality of the "information" concept.

Bates (2005), seeking like Stonier to reconcile the physical, biological and human forms of information, takes the general definition that: "information is the pattern of organisation of everything". All information is "natural information", existing in the physical universe of matter and energy. "Represented information" is either "encoded" (having symbolic, linguistic or signal-based patterns of organisation) or "embodied" (encoded information expressed in physical form), and can only be found in association with living creatures. Beyond this, Bates defines three further forms of information:

(1) Information 1: the pattern of organisation of matter and energy;
(2) Information 2: some pattern of organisation of matter and energy given meaning by a living being (or its constituent parts); and
(3) Knowledge: information given meaning and integrated with other contents of understanding.

The remainder of this paper is aimed at building upon these three approaches, to outline an approach to the expansion of a unified concept of information of relevance to information science. Following Stonier and Bates, this will seek to account for information – perhaps of different kinds – in the physical, biological and human domains, and to allow for the ideas of meaning, understanding and knowledge. Following all three authors, it will assume an evolutionary approach. This, in an information science context, evokes Karl Popper's "evolutionary epistemology" (Popper, 1979), and Popper's "three world" model will be used later. Full use of the insights and discoveries of the physical and biological sciences will be made, while, following Madden, we allow for the intriguing possibility that the insights of the library and information sciences may contribute to the development of the physical and biological sciences, in so far as information concepts are involved.

Information in the physical domain

Biological entities, including human beings, are part of the physical world. What is implied here is what Bates terms "natural information", which is neither "embodied" nor "encoded"; the information which Stonier argues is a fundamental attribute of the physical universe: for a similar perspective, see Landauer (1991). An understanding of the ways in which "information physics" is developing is important in gaining a realistic understanding of the information concept in this domain. These matters are, however, complex, technical and often controversial, both within the physics community and outside. Only a brief account, with limited references, often to popular sources, will be given here.

Since Stonier presented his thesis – following pioneers of the area such as Brillouin (1956) – the role of information in physics has become much more widely accepted: see Beckenstein (2003), Von Baeyer (2004) and Lloyd (2006) for recent popular overviews, and Siegfried (2000) and Roederer (2005) for more technical discussion; see also Leff and Rex (1990, 2003) for collections of papers on these themes over a long period of time. The new discipline of "information physics", in which information is regarded as a fundamental feature of the universe – analogous to space, time or energy – has advanced to the extent that one of its exponents can give as a strapline to an article the idea that "The structure of the multiverse is determined by information flow" (Deutsch, 2002). Smolin (1997, 2002) has suggested that novel physical theories may not even involve the idea of space itself, but rather a web of information, with the most fundamental events being information exchanges between physical processes. The most ambitious working out of this view so far is due to Frieden (2004), who presents a recasting of physical science in information terms, deriving an "information Lagrangian" – the Lagrangian being the mathematical structure in which fundamental physics is usually now presented, albeit usually in terms of quantities such as energy (Penrose, 2004, chapter 20). Although such views are by no means universally accepted, they are an
indication of the extent of the adoption of an "information perspective" in the physical sciences. There are three main areas of the physical sciences in which "information" is widely regarded as a particularly important issue:
(1) the study of entropy;
(2) aspects of quantum mechanics; and
(3) the study of self-organising systems.

Other intriguing ideas may be found – for example, the analogy between the well-known social science "principle of least effort", or Zipf's Law, familiar in the information sciences (Egghe, 1988, 2005, chapter 1; Buckland and Hindle, 1969; Rousseau, 2005), and the physical "principle of least action" (Moore, 1996) – but, in the absence of detailed study, these have to be regarded as metaphors rather than anything more substantial.

Entropy is a concept emerging from the development of thermodynamics in the nineteenth century, and is in essence a measure of the disorder of a system (Penrose, 2004, chapter 27). Given that order, or organisation, is a quality generally associated with information, a qualitative link between information and entropy is evident. This was put on a quantitative footing by the observations by Brillouin, Wiener and others, that the "information content" or "entropy" in Shannon-Weaver information theory takes the same mathematical form as that of physical entropy, devised by Boltzmann and Planck[4] (see Denbigh, 1981; Roederer, 2005; Von Baeyer, 2004; Leff and Rex, 1990, 2003 for original writings). Information may, therefore, in this sense, be regarded as a kind of "negative entropy", an indication that it may indeed be a fundamental physical quantity. Indeed, entropy itself may be regarded as a measure of a lack of information about a physical system, another indication of a connection between human knowledge and the physical world. It is also worth noting that the overall tendency of entropy to increase – the universe tending towards disorder – is regarded as the probable underlying cause of the perception of time passing or flowing, since the laws of physics are generally invariant in time (Gell-Mann, 1994, chapter 15). The recording and deleting of information, in memory and in the form of physical records, activities that reduce entropy locally at the cost of expenditure of energy and increase of entropy in the universe as a whole, may therefore be closely associated with our sense of time (Penrose, 2004, chapter 27).

Quantum mechanics, devised in the first years of the twentieth century, is the most successful physical theory yet developed, in terms of its ability to account accurately for experiments and observations. Interpreting it, however, and understanding what it "means", is notoriously difficult. Intriguingly for our purposes, many of the interpretations available make some reference to information or knowledge (Penrose, 2004, chapter 29). One of the major difficulties is that the quantum formalism suggests that physical reality is a strange mingling of quantum states, so that the outcome of any observation is a mix of possibilities. Accounting for why this does not happen in practice is as yet problematic.

The most usual, "Copenhagen", interpretation of quantum mechanics suggests that the mixture of states drops to only one, accounting for what is invariably observed,
only after the intervention of a conscious observer. This is a problematic situation, leading to difficulties as to what exactly counts as an "observer". Some attempts to deal with this have led to the idea of an "information gathering and utilising system" (IGUS), which would include, but not be limited to, a human (Gell-Mann, 1994, chapter 11). At all events, this interpretation puts information and knowledge squarely at the centre of this fundamental physical theory.

The physicist John A. Wheeler is generally credited with initiating the trend to regard the physical world as basically made of information, with matter, energy, and even space and time, being secondary "incidentals" (Beckenstein, 2003), encapsulated in Wheeler's dictum: "Tomorrow we will have learned to understand and express all of physics in the language of information". His ideas, and their influence, are described in detail in the contributions in Barrow et al. (2004), and in particular Davies (2004); see also Von Baeyer (2004) and Davies (1987, 2006). Wheeler has taken this approach farther than most, in insisting that "meaningful information" is necessarily involved, and hence that "meaning", in the mind of a conscious observer, in effect constructs the physical world: "physics is the child of meaning even as meaning is the child of physics". This situation Wheeler terms the "participatory universe", in which conscious observers, gathering information and comprehending its meaning, are a vital part: "in short [. . .] all things physical are information-theoretic in origin and this is a participatory universe". Wheeler expressed these ideas in two of a series of challenging RBQs (really big questions). The first – "What makes meaning?" – addresses the issue of how meaning emerges from what conventional physics tells us is a meaningless universe. The second – "It from Bit?" – asks whether and how the "its", the material particles of the universe, arise from their "information-theoretic" origins.

In an alternative way of understanding quantum mechanics, referred to as the "many worlds interpretation", it is supposed that the many quantum states do not "collapse" to a single reality, but rather continue to exist together. We are only aware of one reality because it is the one that we happen to inhabit; other versions of us inhabit the others. This interpretation does not require the involvement of information or meaning in forming reality, but it does have one remarkable aspect. David Deutsch, an exponent of this view, argues that "knowledge bearing matter", containing, in Bates's terms, embodied information, is physically special, having a regular structure across the "multiverse", the totality of these multiple realities (Deutsch, 1997, chapter 8).

Finally, there should be mentioned an approach to quantum mechanics due to the physicist David Bohm (Bohm, 1980; Bohm and Hiley, 1994). This proposes that there is only one reality, and no role is assigned to observers in bringing it into being; rather, it is guided into being by a quantum "pilot wave", determining which of the various possibilities will become reality. Bohm referred to this pilot wave as "active information". Bohm also includes in his theory the idea of an "implicate order" in the universe, existing "folded up" in the physical world, and acting as the basis for emergent order. This idea is closely associated with the idea that the universe is in some way "holographic", leading again to the suggestion that information exchange may be a fundamental process (Susskind and Lindesay, 2005; Beckenstein, 2003; Talbot, 1991).

While it is not possible in an article such as this to attempt to deal with the technical issues, the points made above should serve to give the idea that information, organisation, knowledge and meaning are deeply involved, albeit in different ways, in this fundamental physical theory[5].
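To make the quantitative link described in the entropy discussion above explicit, the two expressions can be set side by side; these are the standard textbook forms rather than quotations from the works cited:

\[ H = -K \sum_i p_i \log p_i \qquad \text{(Shannon-Weaver information/entropy)} \]
\[ S = k_B \ln W \qquad \text{(Boltzmann-Planck thermodynamic entropy)} \]

Here the p_i are the probabilities of the possible messages or states, W is the number of microstates consistent with a given macrostate, and K and k_B are constants fixing the units. For equiprobable states (p_i = 1/W) the first expression reduces to K log W, formally identical to the second up to the choice of constant and logarithm base; this identity is what licenses the reading of information as a kind of "negative entropy" noted above.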


Self-organising systems are a topic of relatively recent interest, but are proving to be of importance in a variety of areas in the physical and biological sciences: see Davies (1987, 1998) for popular accounts. The interest in them comes from two perspectives. On the small scale, it may be observed that simple physical and chemical systems show a propensity to "self-organise": to move spontaneously towards a mode which is both organised and also highly complex. On the large scale, science must account for the emergence of highly complex organised structures – stars, galaxies, clusters of galaxies, and so on – in a universe which theorists assure us was entirely uniform and homogeneous immediately after its creation. It is still not clear what the origins of this complexity are; it is generally assumed to come from gravitational effects, acting on very small inhomogeneities (Davies, 1998, chapter 2). Gravity in the early universe can therefore be seen as "the fountainhead of all cosmic organization [. . .] triggering a cascade of self-organizing processes" (Davies, 1987, p. 135).

The ubiquitousness of self-organisation has led some scientists to propose that there may be "laws of complexity", such that the universe has an "in-built" propensity to organise itself in this way; this view is far from generally accepted, but is gaining support:

An increasing number of scientists and writers have come to realise that the ability of the physical world to organise itself constitutes a fundamental, and deeply mysterious, property of the universe. The fact that nature has creative power, and is able to produce a progressively richer variety of complex forms and structures, challenges the very foundation of contemporary science. "The greatest riddle of cosmology", writes Karl Popper [. . .] "may well be that the universe is, in a sense, creative" (Davies, 1987, p. 5, author's italics).

. . . it becomes possible to imagine that a great deal of the order and regularity we find in the physical world might have arisen just as the beauty of the living world would come to be: through a process of self-organisation, by means of which the world has evolved over time to become intricately structured (Smolin, 1997, p. 15).

. . . [making progress in explaining recent cosmological findings] will require us finding a way to somehow meld general relativity with complexity theory (Kolb, 2007; quoted in Clark, 2007).

The relevance of these issues to information science is that any such complexity laws would be informational in character; that is to say, they would act on the information content of the organisation of matter and energy, tending to its increase. This would therefore form the basis of any unified view of information, rooted in its emergence in the physical world.

We should distinguish here between "order" and "organisation" (Davies, 1987, chapter 6). Both are associated with information, but in rather different ways: order may be viewed as a measure of the quantity of information, and organisation as a measure of its quality. Order is, as noted above, associated with the physical quantity of entropy. Organisation is more difficult to quantify: various measures of "logical depth" or "algorithmic complexity" have been suggested, essentially showing to what extent the organisation of the system allows its description to be abstracted and summarised, or "compressed". (For readable summaries, see Lloyd, 2006; Gribbin, 2004; Davies, 1998, chapter 4; Gell-Mann, 1994, chapter 3; Von Baeyer, 2004, chapter 12[6]. For fuller and more detailed accounts of complexity, see Ellis, 2004; Chaitin, 2005.)
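As a concrete, if crude, illustration of the "compression" idea just mentioned, the following sketch uses off-the-shelf file compression as a stand-in for the formal measures; the function and sample strings are invented for illustration, and zlib ratios are not algorithmic complexity in the strict sense:

```python
# Illustrative sketch only: compressed size as a crude proxy for the
# "compressibility" notion of organisation described above. Formal measures
# (algorithmic complexity, logical depth) are more subtle than this.
import random
import string
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size divided by original size; lower means more compressible."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

random.seed(1)
n = 20_000
ordered = "AB" * (n // 2)                                    # high order, little complexity
vocab = ["information", "entropy", "order", "meaning", "complexity"]
organised = " ".join(random.choice(vocab) for _ in range(n // 10))   # patterned but varied
disordered = "".join(random.choices(string.ascii_letters, k=n))      # effectively random

for label, sample in (("ordered", ordered), ("organised", organised), ("random", disordered)):
    print(f"{label:10s} compresses to {compression_ratio(sample):.3f} of its original size")
# Expected ordering (roughly): ordered << organised < random.
```

The point is only qualitative: the crystal-like string compresses almost completely, the random one hardly at all, and the patterned-but-varied sample sits in between, which is where the "interesting information" discussed in the next paragraph is located.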

Systems with a high degree of organised complexity are those with the highest amount of "interesting information". They are intermediate between systems with a high degree of order, but little complexity (for example, an inorganic crystal), and those with a high degree of complexity, but little evident order or organisation (for example, the pattern of leaves on a lawn after a heavy leaf-fall). It is the presence of organised complexity in physical systems that embodies information in the physical world[7]. However, this is as yet information which has no way of being "meaningful", in any sense. It is worth noting, however, that several of the interpretations of quantum mechanics noted above allowed for some form of involvement of "meaning" in the wider picture.

Information in the biological domain

The "informatisation" of biology has been a remarkable feature of science over the past decades, from the elucidation of the structure of DNA in 1953 to the sequencing of the human genome exactly 50 years later, accompanied by a consequent detailed understanding of the ways in which information is passed through generations of living creatures; see Freeland and Hurst (2004) and Dawkins (1995) for concise and lengthy accounts respectively[8]. The concepts of information theory have been extensively applied to biological, and specifically genetic, information from a relatively early stage (Gatlin, 1972). These arguments very much follow on from those relating to the physical world, in terms of increasing levels of organised complexity and information content, the latter generally understood in terms of Shannon's formalism and its successors (Rosen, 1985; Avery, 2003; Yockey, 2005). The actions of living things in using energy to decrease entropy in their immediate surroundings – increasing order and organisation by developing, reproducing, etc. – while increasing the entropy of the universe as a whole in accordance with the second law of thermodynamics, are another informational link between the physical and biological domains[9] (Davies, 1998, chapter 2).

A major change that has come over biology simultaneously with the increasing emphasis on the understanding of genetic information is the tendency to describe life itself as an informational phenomenon. Rather than defining living things, and their differences from non-living things, in terms of arrangements of matter and energy, and of life processes – metabolism, reproduction, etc. – it is increasingly usual to refer to information concepts: "the major difference between living and non-living systems is related to the informational organization in the living cell" (Rich, 1980); "if you want to understand life, don't think about throbbing gels and oozes, think about information technology" (Dawkins, 1986; see also Roederer, 2005). Life, thought of in these terms, is the example of self-organised complexity par excellence.

But with life comes a change from the organised complexity of the physical universe: with life we find the emergence of meaning and context. The genetic code, for example, allows a particular triplet of DNA bases to have the meaning that a particular amino acid is to be added to a protein under construction, but only in the context of the cell nucleus[10].
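The genetic-code point can be made concrete with a toy sketch. The codon assignments below are a small fragment of the standard genetic code, but the code itself is purely illustrative and simplifies "context" to the reading frame rather than the cellular machinery the text refers to:

```python
# Toy illustration of "meaning in context": a DNA triplet (codon) stands for an
# amino acid only when read in the right context, here simplified to the
# correct reading frame. Only a tiny fragment of the standard genetic code.
CODON_TABLE = {
    "ATG": "Met",   # methionine, also the usual start signal
    "TTT": "Phe",   # phenylalanine
    "GGC": "Gly",   # glycine
    "AAA": "Lys",   # lysine
    "TAA": "STOP",  # stop signal
}

def translate(dna: str, frame: int = 0) -> list:
    """Read successive triplets from the given frame; unknown codons become '?'."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return [CODON_TABLE.get(codon, "?") for codon in codons]

sequence = "ATGTTTGGCAAATAA"
print(translate(sequence, frame=0))  # ['Met', 'Phe', 'Gly', 'Lys', 'STOP']: meaningful
print(translate(sequence, frame=1))  # mostly '?': same bases, wrong context, no meaning
```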
Further, it has become clear that the origin of life itself may best be viewed as an "information event", as is the subsequent evolution of all life, and the development of intelligence and culture; a number of authors have elucidated this view in various ways (see, for example, Küppers, 1990; Goonatilake, 1991; Plotkin, 1994; Harms, 2004). The crucial aspect is not the arrangement of materials to form the anatomy of a living
creature, nor the beginning of metabolic processes; rather it is the initiation of information storage and communication between generations that marks the origin of life (Davies, 1998, chapters 2 and 3). Maynard Smith and Szathmáry (1997, 1999) have shown how an "information centred" view of life may be used to treat evolution in terms of biological information. As life has become more complicated, they argue, this is matched by changes in the means by which biological information is coded, stored and transmitted within the living system. They identify a number of "transitions" in the evolution of biological complexity, corresponding to these information changes, including the origin of cells of different kinds, sexual reproduction, multicellular organisms, and animal co-operation and societies, leading ultimately to language and culture.

The record of life on earth is generally agreed to be one of increasing organised complexity. This should not be overstated, and there are biologists who criticise this view for focusing on exceptions (Jones, 2005): species of relatively low complexity still predominate[11], and there are instances of evolution causing a decrease in complexity. Nonetheless, the natural world continues to manifest examples of increasing complexity. This causes discomfort for biologists, for whom it has worrying connotations of purpose or plan. It may, however, be seen as a further example of "complexity laws" applying in the biological, as well as the physical, domain (Davies, 1998, chapter 10; see also Smith and Jenks, 2005).

"Meaning in context" is, of course, not restricted to the biological domain; as will be seen, it underlies all of human recorded information. Here we are dealing with a new development, the origin of consciousness. This is regarded by several of the authors quoted above as a natural extension of the increasing complexity of the biological world; among others who have considered this issue in the same way, though from different perspectives, are Mithen (1996), Penrose (1994), and Donald (1993, 2002).

Information in the human domain

Here we move to the sort of "information" most familiar in the library/information sciences: communicable recorded information, produced by humans in order to convey what Popper terms "objective knowledge". In addition to the organised complexity and meaning in context of the physical and biological domains, we have conscious participants with an internal mental comprehension of knowledge and an ability to make it explicit.

Even in this familiar domain, there is some controversy about the best way to understand information. Information may be regarded as an entity or "thing" (Buckland, 1991), as a cognitive attribute of the individual mind (Belkin, 1990), or as something created collaboratively (Talja et al., 2006). There is a particular issue of how information is to be understood to relate to similar entities, most particularly knowledge (see Meadow and Yuan, 1997; Zins, 2006). Two main frameworks to help in understanding this issue are in common use.

One framework, often termed a "scalar" or "pyramidal" model, regards information, knowledge and related concepts as closely related entities which can be transformed into one another, outside the human mind. It is a common sense model, relying on an appeal to the intuitive difficulty of distinguishing between information and knowledge in normal discourse. The usual entities involved are data, information, knowledge and (sometimes) wisdom[12]. Wisdom is usually not addressed directly in information
science circles (see Rowley, 2006). These are conventionally seen as forming a pyramid – or sometimes a simple linear or scalar progression – with the broad mass of data at the base being distilled to the peak of wisdom. It is pragmatically accepted that the "distillation" process involves what are generally termed "value added" activities: summarising, evaluating, comparing, classifying, etc. Also, moving from data to wisdom is generally seen as setting information within a context, or framework, of existing knowledge, the context giving the meaning. Checkland and Holwell (1998) give a clear account of this viewpoint, adding an additional element, capta – those data to which one pays some attention – between data and information. They see the transformation from capta to information as involving the addition of context, and hence meaning, and from information to knowledge as involving the creation of large structures of related information. Intuitively appealing though it is, this model is far from rigorous; in particular, it is unclear exactly how it is determined when the transition is made between the various states.

An alternative model regards knowledge as something intrinsic to, and only existing within, the human mind and cognition. Knowledge, being subjective, cannot be directly transferred or communicated from one person to another, but must be converted into information first. Information is then regarded as the objective – and therefore communicable and recordable – form of knowledge. Information is thus the bridge between the subjective knowledge in different people's heads. This model is described clearly by Orna and Pettitt (1998). There are some similarities between this view and Popper's three-world picture, and it falls within a human- or user-centred perspective of information science, while the scalar model is in an information- or system-centred perspective[13].

These conceptual frameworks, though useful, fall short of giving an account of "human information" which may be related to the unified vision being considered here. For this purpose, we need to focus on the entity termed "knowledge" in both frameworks, as this is the specific and distinctive entity at the "human" level. This entity has been studied for many centuries by philosophers, under the heading of "epistemology" (see, for example, Goldman, 2003; Moser and Vander Nat, 1987). These studies, however, are of limited value for information science, since they have usually been based on a consideration of what some person knows, generally expressed as an individual's "justified true belief". As with the issues of physical information, this is a complex and, in a different sense, technical topic, and justice cannot be done to it here; see Kvanvig (2003) for a detailed treatment. Suffice it to say that, from an information science perspective particularly, there are problems with all three elements of this explanation. It does not seem sensible or appropriate to express information, in a library/information context, in terms of what some particular person believes. Justification of information will usually be in terms of what is recorded "in the literature", rather than the usual philosophical justification in terms of what some person has experienced. The idea that knowledge should be "true" is particularly problematic, especially if one takes a Popperian view that all human knowledge is imperfect[14].
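A minimal sketch of the scalar/pyramid model described earlier in this section, including Checkland and Holwell's capta stage, may make the shape of the framework clearer; the stage names follow the text above, while the one-line glosses and the code are only an informal paraphrase, not a formal model of how the transitions happen:

```python
# Informal sketch of the "scalar"/pyramid model, with Checkland and Holwell's
# "capta" stage. The glosses paraphrase the discussion above; the mechanics of
# the "value added" transitions are exactly what the model leaves unclear.
PYRAMID = [
    ("data",        "recorded symbols, prior to any selection"),
    ("capta",       "data we select and pay attention to"),
    ("information", "capta set in a context that gives it meaning"),
    ("knowledge",   "larger structures of related information"),
    ("wisdom",      "rarely treated directly in the LIS literature"),
]

def next_stage(stage: str) -> str:
    """Name the next level up the pyramid (the 'value added' transition)."""
    names = [name for name, _ in PYRAMID]
    position = names.index(stage)
    return names[position + 1] if position + 1 < len(names) else stage

print(next_stage("capta"))  # -> information
```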
Floridi (2005), an exponent of a new interest in the “philosophy of information” within the discipline of philosophy itself, recasts the idea of knowledge as “justified, true belief” into the idea that information is “well-formed, meaningful and truthful data”[15]. This seems more suitable for the needs of information science, but does not reflect the rather muddled reality of the human record[16].


Perhaps the most interesting philosophical approach is that of Kvanvig (2003), who argues that we should replace "knowledge" with "understanding" as a focus for interest. Understanding, for Kvanvig, requires "the grasping of explanatory and other coherence-making relationships in a large and comprehensive body of information". It allows for there to be greater or lesser degrees of understanding, rather than just having knowledge/information or not. Crucially, it allows for understanding to be present even in the presence of missing, inconsistent, incompatible, and downright incorrect, information. It is firmly based on the idea of meaning in context, basic to biological information, and therefore underlying human information, which must build on the biological foundation. This seems to be a more appropriate entity than the philosophers' traditional ideas of knowledge, and the author argues elsewhere (Bawden, 2007) that Kvanvig's "understanding" is a valuable concept for the theoretical foundations of information science[17].

This can be set, as noted above, in a Popperian ontology, with its three "worlds" (Popper, 1979, 1992, chapters 38 and 39; Notturno, 2000, chapter 7):
(1) World 1: the physical world, of people, books, computers, buildings, etc.
(2) World 2: the internal, subjective mental state of a conscious individual, including their personal knowledge, or understanding in Kvanvig's terms.
(3) World 3: the world of communicable, objective knowledge, or information.

Popper's ontology, though it was regarded by Brookes (1980) as a suitable philosophical foundation for information science, has been largely overlooked by philosophers, but has recently been recast in detail for the information sciences (Ingwersen and Järvelin, 2005, pp. 48-9) and for the physical/mathematical sciences (Penrose, 2004, chapters 1 and 34)[18]. Ellis (2004) presents an extension of Popper's ideas to deal with the kind of organised complexity discussed in this article: this extends to four worlds, with human information "relegated" to World 2a. The author would argue that Popper's ontology, or a variant of it, is still the most appropriate framework for understanding this topic, and useful in practice (Bawden, 2002).

Information in three domains: a unified view?

This paper has argued that information may be seen in the physical domain as patterns of organised complexity of matter and energy, with the added implication that information, and perhaps meaning and consciousness, may underlie and suffuse the physical universe to a remarkable extent. In the biological domain, meaning-in-context emerges from the self-organised complexity of biological organisms. In the human domain, understanding emerges from the complex interactions of World 2, the mental product of the human consciousness, with World 3, the social product of recorded human knowledge. The term "emerges" is used deliberately, for these are emergent properties; that is to say, they appear appropriate to their level: physical, biological, conscious and social. The linking thread, and the unifying concept of information here, is organised complexity.

The crucial events which allow the emergence of new properties are:
(1) the origin of the universe, which spawned organised complexity itself;
(2) the origin of life, which allowed meaning-in-context to emerge; and
(3) the origin of consciousness, which allows self-reflection, and the emergence of understanding, at least partly occasioned when the self reflects on the recorded knowledge created by other selves.

If, therefore, we understood these three origins fully, we would, presumably, understand information itself equally fully, and the ways in which its various forms emerged. Sadly, the beginnings of the universe, of life, and of consciousness are among the deepest and most difficult problems for science (Gleiser, 2004)[19]. It is possible that an informational approach, of the kind being increasingly adopted, may shed light on these origins and transitions, as well as on processes within the three domains. This might take the form of the elucidation of laws of complexity and of self-organisation, operating at a deep level within all three domains.

It is not impossible that insights from the information sciences in the human domain could contribute to this. The contribution of information science to other disciplines has been discussed (Cronin and Pearson, 1990; Hahn, 2003), but these have so far not included any impact on the physical and biological sciences. This is the intriguing prospect offered by the unifying vision suggested tentatively here: that a better understanding of the human information domain could contribute to a better understanding of the emergence of complexity and organisation in the biological and physical realms. If this seems far-fetched, and in a sense the “wrong way round”, some of the issues of the role of consciousness and meaning in quantum mechanics remind us that it may not be an unreasonable prospect to consider. Conversely, a better understanding of informational issues in the physical and biological domains could shed light on issues of information science. This would certainly not be a reductive approach, as in the discredited approaches to “socio-biology”; rather it would rely on an understanding of similarities and analogies in emergent patterns of complexity and organisation in very different levels and domains.

The information science disciplines revisited
It was suggested earlier in this article that information science could best be understood as:

. . . a multidisciplinary field of study, involving several forms of knowledge, given coherence by a focus on the central concept of human, recorded information, and underpinning several practical disciplines.

We may now expand this, in light of the issues discussed above, by explaining further the idea of “human, recorded information” as “a form of organised complexity, providing meaning in context, and promoting understanding”. This can be understood in terms of the Popperian worlds ontology, with the information sciences concerned primarily with the interaction between World 2 and World 3 (Brookes, 1980; Ingwersen and Järvelin, 2005), albeit mediated through World 1 objects. Consideration of this interaction will help elucidate the “several practical disciplines” aspect.

This interaction may be single or multiple. By “single” is meant that what is studied and promoted is the interaction of a single person with World 3, and the World 1 objects and systems in which it is “carried”: this would include such topics as human information behaviour, information seeking, information retrieval, information organisation, information literacy, and so on. By “multiple” is meant the study and


promotion of the interaction of groups of people, even whole societies, with World 3, and the organisation systems which make this possible: this would include such disciplines as information management, librarianship, records management, archiving, knowledge management, etc. There will clearly be some overlap between these two in academic terms, for practical purposes such as course planning, and in some research; but they still form two “poles” for the academic discipline.

The former – individual – pole corresponds to a group of topics identified as a “core” for information science (Bawden, 2007). The latter – multiple – pole corresponds to what Bates (2005) refers to as the “collection sciences”, which bring together objects to aid research, learning, entertainment, etc., the nature of the objects determining the professional discipline. Published recorded information objects are the focus for librarianship, their unpublished equivalents the focus for archiving, records management and document management; information management might claim to cover both, and to be an umbrella concept. Non-living and living carriers of embodied information are mainly the province of museums, and of zoos and botanical gardens, respectively.

For convenience, and accepting that the terminology will not be acceptable to everyone, the first pole might be termed information science and the second library/information management. The practical disciplines are likely to be associated with the second pole: while we may regret the passing of a historic and honourable term, “Information Scientist” is no longer a recognised job description outside a few limited contexts. The topics included in the core may be seen as providing the necessary “intellectual underpinning” for the disciplines.

The explanation of the discipline may now be expanded, at the risk of long-windedness, to:

A multidisciplinary field of study, involving several forms of knowledge, given coherence by two foci: first on the central concept of human, recorded information – a form of self-organised complexity, providing meaning in context, and promoting understanding – and second on the interaction between Popper’s Worlds 2 and 3; with an intellectual core of information science which underpins several practical disciplines of library/information management.

This way of understanding the discipline seems to bridge the gap between the fundamental issues outlined above, with their potential for highly interdisciplinary research, and the practical needs of those who plan courses and contemplate the future of the information professions.

Epilogue
Information history, as outlined by Weller (2007) in this issue of Aslib Proceedings, considers developments over hundreds, or perhaps thousands, of years. The considerations above give a dramatically longer context, into “deep time”, for the origins of biological, and hence human and social, information:

In some as yet ill-understood way, a huge amount of information evidently lies secreted in the smooth gravitational field of a featureless, uniform gas. As the system evolves, the gas comes out of equilibrium, and information flows from the gravitational field to the matter. Part of this information ends up in the genomes of organisms, as biological information [. . .] Thus all life feeds off the entropy gap that gravitation has created. The ultimate source of biological information and order is gravitation (Davies, 1998, p. 36).

Looking to the future, it is intriguing to speculate that consciousness, intelligence and human information may play a major, and as yet unforeseeable, part in the development of the universe:

As our World 3 products become ever more elaborate and complex [. . .] so the possibility arises that a new threshold of complexity may be crossed, unleashing a still higher organizational level, with new qualities and laws of its own. There may emerge collective activity of an abstract nature that we can scarcely imagine, and may even be beyond our ability to conceptualize. It might even be that this threshold has been crossed elsewhere in the universe already, and that we do not recognize it for what it is (Davies, 1987, p. 196).

Notes
1. This is the MSc Information Science course at City University London; the first such course in the UK, and, at the time of writing, the only one remaining.
2. The author is not here addressing the wider issue of the “philosophy of information”: for recent overviews of this, see the special issues of Library Trends (Vol. 55 No. 3, 2004) and Journal of Documentation (Vol. 61 No. 1, 2005).
3. Not to be confused with the “infons” introduced by Devlin (1991), as part of an attempt to provide a mathematical theory of information and meaning; Devlin’s infons are abstract entities within his use of situation theory.
4. Information content is essentially the logarithm of the number of different messages possible from the available symbols, while physical entropy is essentially the logarithm of the number of distinguishable ways in which a physical system may be arranged (a brief numerical sketch follows these notes).
5. It may be noted that quantum theory has generated its own formalism of information theory, in which the “bits” of classical information theory are replaced by “qubits”, which may exist in quantum superposition, effectively being 0 and 1 at the same time, and may further be “entangled”, resulting in very unintuitive behaviour (also illustrated in the sketch following these notes). This forms the basis for the developing field of quantum computing: see Nielsen (2002) for a non-technical introduction, and also Deutsch (2004) and Roederer (2005), and for a more technical treatment, see Vedral (2006) and Nielsen and Chuang (2000).
6. It may be argued that all of science is a search for such “compressions”, seeking ways of abstracting observational or experimental data through finding patterns, or “laws”, usually through mathematics, the collection of all possible patterns.
7. Some physicists, most notably Smolin, argue that evolutionary self-organisation may provide not only the observable complex patterns of matter and energy, but also physical laws themselves.
8. The “river” in the title of Dawkins’ “River out of Eden” is the flow of information through generations; a striking metaphor, and one which would have made no sense before the understanding of the informational nature of the basis of life.
9. Those who lament the gap between the culture of science and that of the humanities may be cheered to know that the thermodynamics of living creatures features conspicuously in the early novels of Thomas Pynchon (Simberloff, 1978).
10. It may seem strange to speak of “meaning” when there is no conscious entity in the cell to understand the meaning; however, all that is implied here is that, in a particular context, the information in the DNA code causes a specific and intended response.
11. As Jones (2005, p. 153) puts it, “A billion years ago most organisms were bacteria – and they still are”.


12. This is sometimes referred to as the T.S. Eliot model, from the lines in that poet’s “Choruses from the Rock”: “Where is the wisdom we have lost in knowledge?/Where is the knowledge we have lost in information?”.
13. Those concerned about dealing with two different models can readily combine them, by considering that the data-capta-information-knowledge-wisdom spectrum refers only to the objective and communicable information transferred between people.
14. This is seen most clearly for scientific knowledge, which is generally subject to incremental, and occasionally revolutionary, advances. Newtonian mechanics, for example, was regarded as inarguably “true” for 200 years, until the insights of general relativity and quantum theory. While the Newtonian account is not “untrue”, in the sense that it accords with intuitive experience, and is the basis for most engineering and technology, it is no longer regarded as an adequate account of how the world is. A more specific example is given by our knowledge of the planet Mercury (Sobel, 2005, chapter 3). It was believed for several decades that the length of the day and the year on Mercury were the same, so that the planet had one hemisphere in perpetual sunlight, and the other in perpetual darkness. This view was promulgated in all astronomical literature between the 1880s and the 1960s: “the terms ‘day’ and ‘night’ have no real meaning so far as Mercury is concerned. One hemisphere is perpetually scorched by the solar rays, while no gleams of sunlight ever penetrate to the far side” (Moore, 1955, pp. 41-2). Lurid science fiction stories were written about the exploration of such a strange world, and speculations were made about the possibility of life in the “twilight zone” between the two hemispheres. Later observations showed this view to be mistaken; though the days and nights on Mercury are long, they do exist. It seems a strange perspective to say that there was “no knowledge” about this aspect of Mercury for 80 years, simply because the accepted knowledge of the time has since been shown to be in error.
15. In this, Floridi notes that he follows a recent trend to equate (human) information with some composite of data and meaning, for example Checkland’s “information equals data plus meaning”.
16. It is particularly at odds with Popper’s concept of objective knowledge; he insisted that World 3 must contain incorrect facts and erroneous arguments.
17. Although it is not a part of the argument, it is worth noting that this gives support for the continuing value of “traditional” library/information services in an era of increasingly capable information retrieval systems. An organised collection of appropriate information resources, with services appropriate to the known needs of users, could claim to aid understanding, in a way that a search engine never could.
18. One difficulty with the adoption of this ontology is a lack of clarity – starting with Popper’s own writings – as to exactly what are the constituents of World 3. Popper described it vaguely as “the contents of journals, books and libraries”, but also specifically allowed “problems, theories and critical arguments” (including erroneous arguments); he and his followers also allowed numbers and other mathematical objects, music, laws, institutions, etc. Penrose initially restricts it to mathematical truths (“Plato’s world”), but considers allowing in Beauty, Truth and Morality; Ingwersen and Järvelin say simply that it is objective knowledge, and exemplify it by direction signs. Davies (1987) includes social institutions, works of art, literature and religion.
19. They have also been recognised as such for some while: Smolin (1997) notes that, as a young man, he was inspired to study physics by the thought that he might contribute to answering three questions: “What is the universe?”; within the context of the answer to the first question, “What is a living thing?”; within the context of the answers to the first two questions, “What is a human being?”. It now seems likely that both the answers, and their context, may best be given in terms of information.
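To make the counting picture in note 4, and the qubit behaviour mentioned in note 5, a little more concrete, the following minimal Python sketch may be helpful. It is purely illustrative and not part of the original argument; the message and microstate counts used are arbitrary example values.

import math

# Note 4: Shannon's information content and Boltzmann's entropy are both
# logarithms of a count: of possible messages in one case, of
# distinguishable microstates in the other.
n_messages = 1024                            # arbitrary example value
information_bits = math.log2(n_messages)     # = 10.0 bits

k_B = 1.380649e-23                           # Boltzmann constant, J/K
n_microstates = 1e23                         # arbitrary example value
entropy_joules_per_kelvin = k_B * math.log(n_microstates)

print(f"Information content: {information_bits:.1f} bits")
print(f"Physical entropy:    {entropy_joules_per_kelvin:.3e} J/K")

# Note 5: two qubits in the entangled Bell state (|00> + |11>)/sqrt(2).
# Measurement gives 00 or 11 with equal probability, and never 01 or 10,
# so the two outcomes are perfectly correlated.
amplitudes = {"00": 1 / math.sqrt(2), "01": 0.0, "10": 0.0, "11": 1 / math.sqrt(2)}
probabilities = {outcome: amp ** 2 for outcome, amp in amplitudes.items()}
print(probabilities)   # {'00': 0.5, '01': 0.0, '10': 0.0, '11': 0.5} (to rounding)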

References
Avery, J. (2003), Information Theory and Evolution, World Scientific, Singapore.
Barrow, J.D., Davies, P.C.W. and Harper, C.L. (Eds) (2004), Science and Ultimate Reality, Cambridge University Press, Cambridge.
Bates, M.J. (2005), “Information and knowledge: an evolutionary framework”, Information Research, Vol. 10 No. 4, paper 239, available at: http://informationr.net/ir/10-4/paper239.html (accessed 8 January 2007).
Bates, M.J. (2006), “Fundamental forms of information”, Journal of the American Society for Information Science and Technology, Vol. 57 No. 8, pp. 1033-45.
Bawden, D. (2001), “The shifting terminologies of information”, Aslib Proceedings, Vol. 53 No. 3, pp. 93-8.
Bawden, D. (2002), “The three worlds of health information”, Journal of Information Science, Vol. 28 No. 1, pp. 51-62.
Bawden, D. (2007), “Information seeking and information retrieval: the core of the information curriculum?”, Journal of Education for Librarianship and Information Science, forthcoming.
Beckenstein, J.D. (2003), “Information in the holographic universe”, Scientific American, August, pp. 46-55, available at: www.fiz.huji.ac.il/~bekenste/Holographic_Univ.pdf (accessed 10 January 2007).
Belkin, N. (1990), “The cognitive viewpoint in information science”, Journal of Information Science, Vol. 16 No. 1, pp. 11-15.
Belkin, N.J. (1978), “Information concepts for information science”, Journal of Documentation, Vol. 34 No. 1, pp. 55-85.
Bohm, D. (1980), Wholeness and the Implicate Order, Routledge and Kegan Paul, London.
Bohm, D. and Hiley, B.J. (1994), The Undivided Universe: An Ontological Interpretation of Quantum Mechanics, Routledge, London.
Brillouin, L. (1956), Science and Information Theory, Academic Press, New York, NY.
Brittain, J.M. (1980), “What are the distinctive characteristics of information science?”, in Harbo, O. and Kajberg, L. (Eds), Theory and Application of Information Research, Mansell, London, pp. 34-47.
Brookes, B.C. (1980), “The foundations of information science: part 1: philosophical aspects”, Journal of Information Science, Vol. 2 Nos 3/4, pp. 125-33.
Buckland, M. (1991), “Information as thing”, Journal of the American Society for Information Science, Vol. 42 No. 5, pp. 351-60.
Buckland, M. (2005), “Information schools: a monk, library science and the information age”, in Hauke, P. and Saur, K.G. (Eds), Bibliothekswissenschaft – Quo Vadis?, Saur, München.
Buckland, M. and Hindle, A. (1969), “Library Zipf”, Journal of Documentation, Vol. 25 No. 1, pp. 52-7.
Chaitin, G. (2005), Megamaths: The Quest for Omega, Atlantic Books, London.
Checkland, P. and Holwell, S. (1998), Information, Systems and Information Systems – Making Sense of the Field, Wiley, Chichester.
Clark, S. (2007), “Heart of darkness”, New Scientist, Vol. 193 No. 2591, pp. 28-33.
Cornelius, R. (2002), “Theorising information for information science”, Annual Review of Information Science and Technology, Vol. 36, pp. 393-425.


Cronin, B. (2005), “An I-dentity crisis? The information schools movement”, International Journal of Information Management, Vol. 25 No. 4, pp. 363-5.
Cronin, B. and Pearson, S. (1990), “The export of ideas from information science”, Journal of Information Science, Vol. 16 No. 6, pp. 381-91.
Davies, P. (1987), The Cosmic Blueprint: Order and Complexity at the Edge of Chaos, Penguin, London.
Davies, P. (1998), The Fifth Miracle: The Search for the Origin of Life, Penguin, London.
Davies, P. (2006), The Goldilocks Enigma, Penguin, London.
Davies, P.C.W. (2004), “John Archibald Wheeler and the clash of ideas”, in Barrow, J.D., Davies, P.C.W. and Harper, C.L. (Eds), Science and Ultimate Reality, Cambridge University Press, Cambridge, pp. 3-23.
Dawkins, R. (1986), The Blind Watchmaker, Norton, New York, NY.
Dawkins, R. (1995), River out of Eden, Basic Books, New York, NY.
Denbigh, K. (1981), “How subjective is entropy?”, Chemistry in Britain, Vol. 17 No. 4, pp. 168-85 (reprinted in Leff, H.A. and Rex, A.F. (1990), Maxwell’s Demon: Entropy, Information, Computing, Adam Hilger, Bristol).
Deutsch, D. (1997), The Fabric of Reality, Penguin, London.
Deutsch, D. (2002), “The structure of the multiverse”, Proceedings of the Royal Society, Vol. 458 No. 2028, pp. 2911-23, available at: http://arxiv.org/ftp/quant-ph/papers/0104/0104033.pdf (accessed 8 January 2007).
Deutsch, D. (2004), “It from qubit”, in Barrow, J.D., Davies, P.C.W. and Harper, C.L. (Eds), Science and Ultimate Reality, Cambridge University Press, Cambridge, pp. 90-102.
Devlin, K. (1991), Logic and Information, Cambridge University Press, Cambridge.
Donald, M. (1993), Origins of the Modern Mind: Three Stages in the Evolution of Culture and Cognition, new edition, Harvard University Press, Cambridge, MA.
Donald, M. (2002), A Mind So Rare: The Evolution of Human Consciousness, rev. ed., W.H. Norton, New York, NY.
Egghe, L. (1988), “On the classification of the classical bibliometric laws”, Journal of Documentation, Vol. 44 No. 1, pp. 53-62.
Egghe, L. (2005), Power Laws in the Information Production Process: Lotkaian Informetrics, Elsevier, Amsterdam.
Ellis, G.F.R. (2004), “True complexity and its associated ontology”, in Barrow, J.D., Davies, P.C.W. and Harper, C.L. (Eds), Science and Ultimate Reality, Cambridge University Press, Cambridge, pp. 607-36.
Floridi, L. (2005), “Is semantic information meaningful data?”, Philosophy and Phenomenological Research, Vol. 70 No. 2, pp. 351-70, available at: www.philosophyofinformation.net/pdf/iimd.pdf (accessed 12 January 2007).
Freeland, S.J. and Hurst, L.D. (2004), “Evolution encoded”, Scientific American, April, pp. 56-63.
Frieden, B.R. (2004), Science from Fisher Information, Cambridge University Press, Cambridge.
Gatlin, L.L. (1972), Information Theory and the Living System, Columbia University Press, New York, NY.
Gell-Mann, M. (1994), The Quark and the Jaguar: Adventures in the Simple and the Complex, Little Brown, London.

Gleiser, M. (2004), “The three origins: cosmos, life, and mind”, in Barrow, J.D., Davies, P.C.W. and Harper, C.L. (Eds), Science and Ultimate Reality, Cambridge University Press, Cambridge, pp. 637-53.
Goldman, A. (2003), “Epistemology”, in Shand, J. (Ed.), Fundamentals of Philosophy, Routledge, London, chapter 1.
Goonatilake, S. (1991), The Evolution of Information: Lineages in Gene, Culture, and Artefact, Pinter, London.
Gribbin, J. (2004), Deep Simplicity, Allan Lane, London.
Hahn, T.B. (2003), “What has information science contributed to the world?”, Bulletin of the American Society for Information Science and Technology, Vol. 29 No. 4, pp. 2-3.
Harms, W.F. (2004), Information and Meaning in Evolutionary Processes, Cambridge University Press, Cambridge.
Hirst, P. (1974), Knowledge and the Curriculum, Routledge and Kegan Paul, London.
Hirst, P. and Peters, R.S. (1970), The Logic of Education, Routledge and Kegan Paul, London.
Hjørland, B. (2000), “Library and information science: practice, theory and philosophical basis”, Information Processing and Management, Vol. 36 No. 3, pp. 501-31.
Ingwersen, P. and Järvelin, K. (2005), The Turn: Integration of Information Seeking and Retrieval in Context, Springer, Dordrecht.
Jones, S. (2005), The Single Helix, Little Brown, London.
Kajberg, L. and Lørring, L. (2005), “European curriculum reflections on library and information science education”, Royal School of Librarianship and Information Science, Copenhagen, available at: www.db.dk/lis-eu
Küppers, B.-O. (1990), Information and the Origin of Life, MIT Press, Cambridge, MA.
Kvanvig, J.L. (2003), The Value of Knowledge and the Pursuit of Understanding, Cambridge University Press, Cambridge.
Landauer, R. (1991), “Information is physical”, Physics Today, May, pp. 23-9.
Leff, H.S. and Rex, A.F. (1990), Maxwell’s Demon: Entropy, Information, Computing, Adam Hilger, Bristol.
Leff, H.S. and Rex, A.F. (2003), Maxwell’s Demon 2: Entropy, Classical and Quantum Information, Computing, Institute of Physics Publishing, Bristol.
Liebenau, J. and Backhouse, R. (1990), Understanding Information, Macmillan, London.
Lloyd, S. (2006), Programming the Universe, Jonathan Cape, London.
Losee, R.M. (1997), “A discipline independent definition of information”, Journal of the American Society for Information Science, Vol. 48 No. 3, pp. 254-69.
Madden, A.D. (2004), “Evolution and information”, Journal of Documentation, Vol. 60 No. 1, pp. 9-23.
Maynard Smith, J. and Szathmáry, E. (1997), The Major Transitions in Evolution, Oxford University Press, Oxford.
Maynard Smith, J. and Szathmáry, E. (1999), The Origins of Life: From the Birth of Life to the Origin of Language, Oxford University Press, Oxford.
Meadow, C.T. and Yuan, W. (1997), “Measuring the impact of information: defining the concepts”, Information Processing and Management, Vol. 33 No. 6, pp. 697-714.
Mingers, J. (1997), “The nature of information and its relationship to meaning”, in Winder, R.L., Probert, S.K. and Beeson, A.A. (Eds), Philosophical Aspects of Information Systems, Taylor & Francis, London, pp. 73-84.


Mithen, S. (1996), The Prehistory of the Mind, Thames and Hudson, London.
Moore, P. (1955), Guide to the Planets, Eyre and Spottiswood, London.
Moore, T.A. (1996), “Least action principle”, MacMillan Encyclopaedia of Physics, Vol. 2, Simon and Schuster, New York, NY, pp. 840-2.
Moser, P. and Vander Nat, A. (1987), Human Knowledge: Classical and Contemporary Approaches, Oxford University Press, Oxford.
Nielsen, M.A. (2002), “Rules for a complex quantum world”, Scientific American, November, pp. 49-57.
Nielsen, M.A. and Chuang, I.L. (2000), Quantum Computation and Quantum Information, Cambridge University Press, Cambridge.
Noorlander, W. (2005), “What does ‘information’ really mean?”, Information Outlook, Vol. 9 No. 6, pp. 21-4.
Notturno, M.A. (2000), Science and the Open Society: The Future of Karl Popper’s Philosophy, Central European University Press, Budapest.
Orna, E. and Pettitt, C. (1998), Information Management in Museums, 2nd ed., Gower, Aldershot.
Penrose, R. (1994), Shadows of the Mind: A Search for the Missing Science of Consciousness, Oxford University Press, Oxford.
Penrose, R. (2004), The Road to Reality, Jonathan Cape, London.
Plotkin, H. (1994), The Nature of Intelligence: Concerning Adaptions, Instinct and the Evolution of Intelligence, Penguin, London.
Popper, K. (1992), Unended Quest: An Intellectual Autobiography, revised edition, Routledge, London.
Popper, K.R. (1979), Objective Knowledge: An Evolutionary Approach, revised edition, Oxford University Press, Oxford.
Rich, A. (1980), “Bits of life: how information is handled in living systems”, The Sciences, Vol. 20 No. 8, pp. 10-13, 25.
Roederer, J.G. (2005), Information and its Role in Nature, Springer Verlag, Berlin.
Rosen, R. (1985), Theoretical Biology and Complexity, Academic Press, New York, NY.
Rousseau, R. (2005), “Robert Fairthorne and the empirical power laws”, Journal of Documentation, Vol. 61 No. 2, pp. 194-202.
Rowley, J. (2006), “Where is the wisdom that we have lost in knowledge?”, Journal of Documentation, Vol. 62 No. 2, pp. 251-70.
Siegfried, T. (2000), The Bit and the Pendulum: From Quantum Computation to M Theory – The New Physics of Information, Wiley, New York, NY.
Simberloff, D. (1978), “Entropy, information and life: biophysics in the novels of Thomas Pynchon”, Perspectives in Biology and Medicine, Vol. 21 No. 4, pp. 617-25.
Smith, J. and Jenks, C. (2005), “Complexity, ecology and the materiality of information”, Theory, Culture and Society, Vol. 22 No. 5, pp. 141-63.
Smolin, L. (1997), The Life of the Universe, Weidenfeld and Nicholson, London.
Smolin, L. (2002), Three Roads to Quantum Gravity, Bantam Books, New York, NY.
Sobel, D. (2005), The Planets, Fourth Estate, London.
Stonier, T. (1990), Information and the Internal Structure of the Universe: An Exploration into Information Physics, Springer-Verlag, London.

Stonier, T. (1992), Beyond Information: The Natural History of Intelligence, Springer-Verlag, London.
Stonier, T. (1997), Information and Meaning: An Evolutionary Perspective, Springer-Verlag, London.
Sturges, P. (2005), “Clear thinking on the ‘unity’ of the information professions”, Journal of Documentation, Vol. 61 No. 4, pp. 471-5.
Susskind, L. and Lindesay, J. (2005), An Introduction to Black Holes, Information, and the String Theory Revolution: The Holographic Universe, World Scientific, Singapore.
Talbot, M. (1991), The Holographic Universe, Harper Collins, London.
Talja, S., Tuominen, K. and Savolainen, R. (2006), “‘Isms’ in information science: constructivism, collectivism and constructionism”, Journal of Documentation, Vol. 61 No. 1, pp. 79-101.
Vedral, V. (2006), Introduction to Quantum Information Science, Oxford University Press, Oxford.
Von Baeyer, C. (2004), Information: The New Language of Science, Harvard University Press, Harvard, MA.
Walsh, P. (1993), Education and Meaning: Philosophy in Practice, Cassell Educational, London.
Warner, J. (2001), “W(h)ither information science”, Library Quarterly, Vol. 71 No. 2, pp. 243-55.
Webber, S. (2003), “Information science in 2004: a critique”, Journal of Information Science, Vol. 29 No. 4, pp. 311-30.
Weller, T. (2007), “Information history: its importance, relevance and future”, Aslib Proceedings, Vol. 59 Nos 4/5, pp. 437-48.
Weller, T. and Haider, J. (2007), “Where do we go from here? An opinion on the future of LIS as an academic discipline in the UK”, Aslib Proceedings, Vol. 59 Nos 4/5, pp. 475-82.
Yockey, H.P. (2005), Information Theory, Evolution and the Origin of Life, Cambridge University Press, Cambridge.
Zins, C. (2006), “Redefining information science: from ‘information science’ to ‘knowledge science’”, Journal of Documentation, Vol. 62 No. 4, pp. 447-61.

Corresponding author
David Bawden can be contacted at: [email protected]





Healthcare libraries in Saudi Arabia: analysis and recommendations


Ahmad Khudair

Received 19 December 2006 Accepted 3 June 2007

Department of Information Science, King Saud University, Riyadh, Saudi Arabia, and

David Bawden
Department of Information Science, City University London, London, UK

Abstract
Purpose – The paper aims to gain a detailed understanding of the current health library/information environment in Saudi Arabia, to identify problems, issues, and areas for improvement, to make recommendations for improvement, and to instantiate these in models and prototypes.
Design/methodology/approach – A mixed method empirical approach is used in 11 health libraries, including literature survey, institutional profiling, questionnaire, interviews, non-participant observation, and examination of documents. A model for supporting change management in Saudi health libraries is proposed, and a prototype for a Saudi Health Information Network is developed.
Findings – The healthcare libraries are well-used, and appreciated by their users, and the staff are generally satisfied with their work. Problems and issues are identified: use of information communication technologies and digital resources; lack of proactive information services; education, training and continuing professional development for health library work; limited strategic planning and policy for these services. Recommendations are made for improvements.
Research limitations/implications – The empirical research is limited to health sciences libraries in Riyadh, the capital of Saudi Arabia. The prototype health information network has not been evaluated by users.
Practical implications – Recommendations are made to enable the government of Saudi Arabia and its various agencies to support improvements in the existing health sciences libraries and information provision.
Originality/value – This is a detailed study of the health library environment in Saudi Arabia, illustrating factors typical of the situation in many other countries. The paper outlines a novel organisational change model and prototype national health information network.
Keywords Libraries, Special libraries, Continuing professional development, Change management, Saudi Arabia
Paper type Research paper

Aslib Proceedings: New Information Perspectives
Vol. 59 No. 4/5, 2007, pp. 328-341
© Emerald Group Publishing Limited 0001-253X
DOI 10.1108/00012530710817555

Introduction and aims of the study
The aim of this study was to explore the state of health sciences libraries in Riyadh, the capital city of Saudi Arabia, to investigate their strengths and weaknesses, particularly with respect to information service provision and use of information and communication technologies (ICTs), and to make recommendations for future developments. This study has been reported in a PhD thesis (Khudair, 2005), which should be consulted for fuller details.

The more specific objectives of the study were:
• to explore the current state of health sciences libraries in Riyadh;
• to determine the adequacy or otherwise of the health sciences libraries’ resources, services, and co-operative activities in Riyadh;
• to identify the health professionals’ information needs, the information sources used by them, the adequacy of those sources, and the difficulties faced in acquiring information;
• to explore the perception of health professionals towards information provision and the use of information and communication technologies in health sciences libraries;
• to explore the condition of the health library profession in Saudi Arabia, and the need for programmes of training and professional development; and
• to develop an organisational model for improvement of health sciences library practice and information provision and to make appropriate recommendations based on this.

Background: healthcare libraries in Saudi Arabia
Historically, health sciences libraries in Saudi Arabia were founded concurrently with the foundation of medical teaching programmes and modern hospitals in major cities. The number of health sciences libraries is increasing with the establishment of new hospitals and universities around the country, the period of major growth being the 1970s (Khudair, 2005). A number of studies have been published on the various aspects of the Saudi health library system (see, for example, AbuOuf, 1995; Al-Ogla, 1998; AlShaya, 2002; Al-Zahrani, 2002; Arif, 1998; Aseel, 1996; and others quoted in Khudair, 2005). These studies point to a number of issues and constraints in the Saudi healthcare library system:
• a poor information infrastructure, with limited ICT access;
• the lack of a national strategic plan for health information;
• relatively little effective use of ICTs, with continued reliance on traditional manual methods, and “pen on paper”;
• limited understanding of ICTs among health professionals and health library staff; and
• reliance in many hospital libraries on a basic set of resources.

In recent years, there has been a very real concern about the implications of the adoption of various kinds of electronic information services into health sciences libraries. Therefore, the infrastructure of these libraries is becoming a collection of multiple technologies, including online databases, CD-ROMs, internet, etc. However, the weakness of libraries is that generally they have grown up without being carefully planned to fit in with existing facilities and the information and communication technology infrastructure. Moreover, their development has been ad hoc, without proper planning or co-operation with neighbouring libraries (Arif, 1998). This situation has been influenced by the growth and conditions of the parent hospitals and health organisations in the country. Little coordination exists among health provider agencies


in Saudi Arabia. Facility and equipment planning in one sector rarely takes the resources of another sector into account, and even within a sector joint use of resources is not widely practised. Duplication of resources and services is the direct result of lack of coordination between provider agencies, and contributes to the escalating cost of health services in the Kingdom.

AlShaya (2002) found two major problems, namely:
(1) the lack of information sources provided by health libraries in Riyadh to physicians; and
(2) inadequate information education for physicians.

AlShaya (2002) recommended extending physicians’ access to electronic information sources, and enhancing information education opportunities for physicians, so they can learn to use IT and electronic information sources. He found that several environmental factors can make quite large differences to the physicians’ use of new technologies, such as availability and accessibility of electronic information services, status of physicians, and information searching skills and training. AlShaya (2002) saw it as essential to develop and implement national policies and guidelines for the provision of electronic information services in hospital libraries in Saudi Arabia.

This was the background to the study reported here. It was focused on 11 healthcare libraries in Riyadh, the capital city of Saudi Arabia. These were:
(1) King Faisal Specialist Hospital and Research Centre;
(2) King AbdulAziz City of Medicine;
(3) King Khalid University Hospital;
(4) King Khalid Specialist Eye Hospital;
(5) King Abdulaziz University Hospital and College of Dentistry;
(6) Riyadh Armed Forces Hospital;
(7) Security Forces Hospital;
(8) Yamamah Hospital;
(9) Sulaimanyah Children’s Hospital;
(10) Saudi Centre for Organ Transplants; and
(11) Al-Iman General Hospital.

Study methods
The study used six separate data gathering methods:
(1) a literature review;
(2) profiling of libraries and library staff;
(3) a questionnaire analysis of library users and use;
(4) interviews with library staff;
(5) observation of activities and interactions within the libraries; and
(6) analysis of documents.

This mixed approach, which blended quantitative and qualitative data collection and analysis, was designed to give the richest picture of the situation (Khudair, 2005).

Literature review
The review aimed at identifying literature relevant to healthcare information generally, with a particular emphasis on material relevant to the situation of Saudi Arabia. In addition to searches of bibliographic databases, and library collections in the UK, material available locally in Riyadh and Jeddah (Saudi Arabia) and Cairo (Egypt) was consulted. Electronic mailing lists and discussion forums covering the relevant topics and regions were also scanned. The results were analysed and fed into the subsequent design of the study (Khudair, 2005).

Profiling
A “fact sheet” was drawn up to provide a profile of all the staff working in the Riyadh health libraries, including basic personal information (nationality, academic qualifications, etc.) together with job descriptions, work experience, professional activities, etc. This gave an initial “picture” of the library staff. There was a lack of background information about the libraries themselves, for example contact information, electronic and printed resources, staff, services, and even such basic facts as the number of hospitals that actually have a health sciences library in Riyadh. A directory was therefore created for all the libraries in this study including this baseline information, and as the starting point for a full national directory.

Questionnaires
The questionnaire designed for this research was for health professionals working in governmental hospitals in Riyadh. Its main purpose was to gather both quantitative and qualitative data, and to gain an accurate knowledge of present activities in health sciences libraries in Riyadh. In addition, the questionnaire focused on the level of user satisfaction as it related to health libraries and information services. The questionnaire was designed, so far as possible, to build on other studies carried out in the Saudi health library system, by using similar questions (Khudair, 2005). An initial version was tested in two hospital libraries, with a further pre-test in five hospital libraries. The final questionnaire contained 44 questions, in five sections:
(1) personal information;
(2) user views of the library;
(3) training received and training needs;
(4) reasons for information seeking, type of information resources preferred, adequacy of information services;
(5) future prospects and problems in information provision.

Questionnaires were distributed to randomly selected health professionals within randomly selected departments of the 11 hospitals in the study; 845 questionnaires were distributed, of which 493 were returned correctly completed, a response rate of 58 per cent.

Interviews
A total of 22 interviews were carried out with health library workers, out of a total of 37 persons working in ten of the 11 libraries in the study (the remaining library had no


designated staff at the time of the study). The interviews were unstructured, and took the form of an “informal conversation interview”, since most of the interviewees did not feel comfortable with structured or recorded interviews. The informal conversational interview is a type of interview that may occur spontaneously in the course of fieldwork, and the respondent may not know that an “interview” is taking place (Sewell, 2002). Questions emerge from the immediate context, so the wording of questions and even the topics are not predetermined. The major advantage is that the interview is highly individualised and relevant to the individual, and likely to produce information or insights that the interviewer could not have anticipated. The time taken with these interviews varied, depending on the mood of the staff and time available. The interviews generally highlighted the following topics:
• co-operation between Saudi health libraries in Riyadh and with other libraries in the country;
• needs, expectations and developments related to the health libraries in Riyadh;
• challenges encountered by health sciences librarians;
• type of planning required, especially for electronic information services in the libraries;
• role of health information professionals in developing and delivering information services; and
• role and position of the libraries and health information professionals within the health care environment.

Observation
Repeated non-participant observations were carried out in the libraries studied (participant observation was considered, but was not feasible due to time constraints, and to the restrictions imposed in some library settings). The main factors examined were:
• the physical environment within the health libraries;
• the human, social environment, including the ways in which health information professionals and health professionals interact and behave towards each other;
• the implementation of services and facilities;
• health professionals’ interactions with services and resources; and
• the role of health information professionals in the libraries as service providers.

This provided a better understanding of the context of the questionnaire and survey responses.

Document analysis
As wide a range as possible of library documentation – including library reports and plans, health ministry reports and plans, job descriptions, and service guides – was analysed, to provide complementary information to that gathered by other means.

Study results
The results are described and analysed fully by Khudair (2005), and only a summary of the main points is given here, organised into four sections:
(1) libraries, staff, users, usage;
(2) education and training;
(3) information services;
(4) information and communication technologies.

Libraries, staff, users, usage
Staff numbers in the libraries surveyed varied between one and seven, with one library awaiting the appointment of a librarian. Seventeen were regarded as professional level staff, and 20 as para-professional. A majority of library staff were not educated to university degree level, only half had any library/information qualification, and only one person had had any training in health information topics. All relied on substantial work experience to support their professionalism, given the limited impact of LIS formal education, and the lack of any continuing professional development (CPD) opportunities. Most were satisfied with their working environment, but had concerns about lack of facilities, training opportunities, their role and status (and consequent lack of input into decision making processes), and the lack of strategic policies and plans.

The largest groups of users, as might be expected, were physicians (39 per cent) and nurses (24 per cent); other users included pharmacists, paramedics, technicians and administrators. Over 8 per cent of users were educated to Bachelor’s degree or equivalent. The health sciences library was the first choice for seeking information (45 per cent), although 26 per cent relied on online searching. Use was mainly for “traditional” purposes of reading library material (40 per cent), borrowing printed items (25 per cent) and literature searching (18 per cent), with 92 per cent of interaction being by a personal visit to the library. Difficulties with use of the library reflected this, the main issue being opening hours, with an additional requirement, particular to the local culture, for better facilities for female users.

Almost all services of the libraries are provided only to staff of the parent institutions, though two allow external users. Co-operation with other libraries is almost entirely dependent on personal efforts and contacts. Only half of the libraries had automated systems, the remainder relying on card catalogues. All provided computer facilities, but two had no internet connectivity. There was very limited use of web services, and four had no e-journal access. Only two libraries provided any active “information services” (e.g. current awareness), the remainder providing only passive collection-based services. There was a clear “digital divide” or “information divide” between those libraries well-equipped with ICTs and the others, and between the few that provided any information services and the others. Lack of any formal co-operation forum and lack of CPD provision made this situation worse.

Education and training
As noted above, the lack of formal education in LIS generally, and in health information specifically, and the lack of CPD opportunities, emerged as a significant


factor preventing the development and improvement of the health sciences library services. Training in the use of ICTs came across as an important need. This emphasises a point made by others (see, for example, Rehman and Al-Ansari, 2003; Al-Ogla, 1998; Alsereihy, 1998; Marghalani, 1993; Siddiqui, 1996): that LIS educational programmes in Saudi Arabia have little influence or impact upon the work of Saudi librarians and information specialists in general, and health librarians in particular. Specialised training for healthcare library/information staff, recognised as an important factor in their success (Braude, 1997), is particularly lacking in the Saudi situation.

The study also showed that the health professional library users felt a need for training and support in the use of ICTs and digital information resources, confirming the findings of others (AlShaya, 2002; Al-Zahrani, 2002). Very few had received any training or support from library staff, and only half would ask for advice from them. This emphasises the need for library staff themselves to be well-trained and confident with these topics. This is by no means limited to the Saudi situation (see, for example, Maynard, 2002, for a UK equivalent), but it is seen particularly strongly here.

Information services
Usage of the health sciences library by health professionals (physicians, nurses, etc.) was for diverse reasons which are commonly found in health organisations. Predominantly, it is for keeping up-to-date (44 per cent) and dealing with clinical issues (31 per cent), but also for studying, teaching, publishing, etc. (It may be noted that there had been virtually no prior studies of healthcare information needs in the Saudi situation.) A total of 56 per cent preferred to use printed materials, while only 39 per cent preferred electronic sources, re-emphasising the “traditional” nature of these library services for some groups of users, especially nurses. Satisfaction with library services and library staff was generally high; dissatisfaction was noted for provision of electronic resources and ICT systems, and for training provision. Knowledge sharing, among both users and library staff, was very limited, because of both technical factors and a lack of organisational infrastructure and policy.

ICTs
ICTs were found to be playing an important role in the health sciences libraries, and have the potential to shape a paradigm shift of functions and activities. The libraries studied all provided various ICT facilities and electronic services, though the availability differed between libraries. These included personal computers (PCs), network and internet access, online catalogues, CD-ROM resources, online databases and electronic journals. However, there were found to be problems with all of these systems and services, in terms of availability, accessibility, the users’ knowledge of the existence of these facilities, and the ability of both users and library staff to use them effectively. Shortages of PCs, for example, forced users, to some extent, to use alternative means such as printed materials, and the majority of respondents indicated that they had no access to hospital computer networks from their homes. The ready availability of the internet service, and the difficulties and obstacles occurring with CD-ROM workstations, OPAC, and electronic databases, can be considered as factors in

respondents’ expressed preference for, and greater use of, the internet rather than other, arguably more appropriate, information tools. The high level of dissatisfaction expressed with current ICT facilities, and the adverse impact the shortage of facilities has on information provision, raises questions regarding the future of the information services provided in health science libraries in Riyadh.


Recommendations for improvement
The results of the study, summarised above, show problems in the Saudi health library situation, particularly with reference to the limitations of ICTs, electronic information sources, and proactive information services, with training and CPD, and with strategies and policies for health libraries, especially those that will improve co-operation and networking.

Health professionals are expecting faster access to health information and to be able to share such information with other professional bodies and individuals, but this is clearly not possible in the absence of a health information network. Health libraries would benefit from such a network by providing more convenient, accurate, and up-to-date information to all users. On the other hand, health librarians are not satisfied with the current condition of their libraries and services. They expect development related to various issues concerning health sciences libraries in Riyadh (i.e. co-operation, policy, access to electronic sources, development of information services and information networks). Information networks could create and improve co-operation among health libraries in Saudi Arabia and with other health libraries elsewhere. Regrettably, the current computing systems in most hospitals do not facilitate access to the health library database and other databases located in some hospitals and research departments. Furthermore, information services need to be developed, and a clear plan for them needs to be drawn up. In addition, there are problems facing the implementation of ICT, among them the lack of training programmes, lack of co-ordination, and poor management. Health sciences libraries need to develop longer-lasting forms of co-operation, supporting continuous development and attached to formulated policies upon which librarians and users can rely.

The results of the questionnaires and interviews showed that there is strong support for an initiative to provide Saudi Arabian institutions with a National Health Library, a Virtual Health Library, and an Association of Health Information Professionals in the country. Support for these was expressed by both health professionals and health library staff, as being ways of helping to overcome the kind of problems noted above. As a first step towards accomplishing these aims, the study reported here makes two recommendations:
(1) the creation of a model for organisational change and development, applicable to the Saudi health library situation; and
(2) the creation of a prototype website for a putative Saudi Health Information Network (SHIN).

Figure 1 shows this as a three-stage process. The first stage, the gathering of information on the current situation of the health science libraries, has been described


above, and leads to the second stage, the explicit recognition of the factors and constraints in the change process. This is followed by the development of an appropriate model. The main factors, coming from the results described above, may be summarised as:
• lack of health information professionals;
• weakness of libraries and information services;
• digital/information divide amongst health sciences libraries;
• information systems implemented are not fully utilised;
• users’ dissatisfaction with current ICT and services in the health sciences libraries;
• difficulties in accessing electronic information resources and weakness in the printed resources collections;
• low cooperation among Riyadh hospitals and among health sciences libraries;
• lack of studies conducted by hospitals to investigate health professionals’ information needs;
• centralised and structured bureaucracy in the management of health sciences libraries;
• lack of health library staff participation in decision making; and
• slow pace of development.

In order to include these diverse factors within a single useful model, a suitable framework had to be developed. This was based on the Brown University Library Model Organisation Framework (Brown University, 2000). This was derived in order to examine that library’s preparedness for adapting to the constant changes impacting on the academic, research and information environments, and allowing all staff to contribute to change management processes. The conceptual model comprises four major elements, referred to as “collaboratives”, each focusing on a set of user-centred activities:
(1) scholarly resources;
(2) learning and curricular resources;
(3) access and delivery; and
(4) organisational support.

Figure 1. Steps for change and development

The Brown model was modified and simplified (see Khudair, 2005, for details and rationale) to provide an “organisational visionary model” for the Saudi health library situation, which is shown in Figure 2. This proposed model attempts to represent the main factors in an understandable and usable way.

Looking towards the future, health sciences libraries in Riyadh, with the adoption of the organisational visionary model, are expected to move from a traditional organisational structure and modalities towards a continuous spectrum of change. In order to facilitate incorporating technology in the work process, many professional development opportunities should be available. Predictions should therefore be made to enable the implications of change to be positively managed rather than merely survived; health librarians’ participation can effectively manage that change in their own organisations. The environment is characterised by flexibility, collaboration, and interaction across units, with staff and users actively working together to foster an informative and successful environment. However, the change and development process in health sciences libraries in Riyadh needs to be simplified and presented in such a way that the participants in that change will accept and support the process. The technological infrastructure should make possible the support of a wide variety of options for offering various library and information services either within hospital buildings or online. The development of health sciences libraries in Riyadh should continue to move towards online and electronic resources to enhance remote access, meeting the need for resource access in any place and at any time.

The model incorporates a collaborative approach in order to bridge the gap between change decisions and progress in real time. The priority is to participate in changing


Figure 2. Organisational visionary model for health sciences libraries


and improving the current condition of health sciences libraries in Riyadh. This includes management style, advanced technology, improved communication channels, a trend towards innovation, organisational and people development, and teamwork. The proposed model attempts to bring together and balance the internal focus of the library staff with an external focus on library users and its mission. It reaffirms the library’s traditional mission while proposing changes in how that mission can best be achieved, utilising the new technologies and openness to change. The proposed change is to enable hospitals to achieve the libraries’ mission of being supportive and responsive to the eminence of healthcare, distinguished by a commitment to openness, innovation, and excellence in applying well planned strategies and change practice.

The organisational visionary model is proposed for health sciences libraries in order to solve key issues affecting the health sciences libraries in Riyadh. It is also intended to facilitate the utilisation and implementation of new technologies, for example the proposed health information network for Saudi Arabia, discussed below.

Saudi Health Information Network
In view of the importance of ICT in the healthcare system, it is very desirable that a network is developed to provide health information sources and services that will satisfy the information needs of health professionals in Saudi Arabia. Such a health information network could also help to identify and locate health information resources and services through the internet. The problem facing the health system in Riyadh is that the “body” (the health professional) and “soul” (the health information professional) are not joined as one to form a single entity. As a result of this separation, health professionals spend a great deal of time in information searching, while the health information professional’s role is underestimated. To overcome this problem, a prototype of a Saudi Health Information Network (SHIN) interface, instantiated as a website, was designed as part of this study (for details, see Khudair, 2005; Khudair and Bawden, 2004).

The proposed service is to offer regularly updated health and scientific articles and publications, and online health guidance relating to patients’ particular problems. This network would help health professionals and health information professionals perform effective functions within one setting, which will enhance their information seeking and satisfy their information needs. The proposed network will promote various channels of communication and co-operation in the healthcare environment. Importantly, it will help the healthcare environment to move towards the establishment of a flourishing health information society through popularising the use of electronic resources and highlighting the benefits and advantages of the electronic learning programmes. There will be links to all appropriate health sites, which would be authorised, authenticated and regulated. For example, users might find directories of governmental hospitals, governmental pharmacies, health sciences libraries, and guidance on organising and using personal health libraries. In addition, a web-based health information network prototype could be a key enabler and catalyst for such change in the health profession, because the internet has the capability to meet changes of this kind.
This prototype has not yet reached the level of real implementation and evaluation, because of time and resource limitations. Future development will be assessed after the network has been launched and tested by users.

Conclusions
The thorough empirical study, using a variety of methods, allowed a rich and reliable understanding of the Saudi healthcare library environment to be developed. Based on this, a visionary organisational model, to support the management of change, was developed, together with the prototype for a SHIN. Finally, the following specific recommendations could be made.

Recommendations for health professionals
• The electronic information services delivered to health professionals should be developed and improved to enable them to make more efficient use of their time.
• During formal education, health professionals should be provided with opportunities to acquire basic information handling skills.
• Health professionals should be provided with continuing educational programmes which cover their information competencies and keep up with technological advances to maintain their information management skills.
• Advice and training programmes should be conducted using various methodologies, for example: one-to-one, within group, online consultation, live training courses (either on site or e-learning).
• Policy makers for the healthcare system in the country should develop and implement policies and strategies to make certain that all health professionals have appropriate access to all forms of health and medical information.
• Health professionals should be more proactive and assertive in demanding improvement in and development of the current situation of few available resources and inadequate access to needed information.
• They should give more consideration to improving ICT skills in order to use various types of resources and not be limited to traditional printed materials.

Recommendations for health library/information specialists
• Improve their technical and technological skills to deliver effective information services.
• Develop their professional attitudes and their practice paradigm from a reactive to a proactive stance.
• LIS educational programmes should develop their curriculum to match the demands and challenges of the health information profession.
• Staff members should be developed through a continuous training facility (continuous professional development).
• Hospital management should work together with health information specialists to create and develop an information society in the healthcare environment.
• They should participate in national and international conferences and meetings to discuss various issues related to their profession.
• Establish a national association (The Association of Health Information Professionals).
• Training programme evaluation should be considered in order for hospitals and health sciences libraries to develop their training services.
• A professional qualification in Library and Information Science must be considered as a condition of employment for specialised positions in health libraries.

It is to be hoped that the results of the study reported here (and in more detail in Khudair, 2005) may be of interest and value in countries other than Saudi Arabia, since many of the issues and constraints will be the same.

References
AbuOuf, S. (1995), "Use of information resources by physicians in Jeddah hospital libraries", unpublished MSc dissertation, King AbdulAziz University, Saudi Arabia (in Arabic).
Al-Ogla, S. (1998), "A study of hospital and medical libraries in Riyadh, Kingdom of Saudi Arabia", Bulletin of the Medical Library Association, Vol. 86 No. 1, pp. 57-62.
Alsereihy, H. (1998), "The status of LIS education in Saudi Arabia", Journal of Education for Library and Information Science, Vol. 39 No. 4, pp. 334-8.
AlShaya, A. (2002), "A study of the use of information sources, with special emphasis on CD-ROMs and the internet, by Saudi physicians in major government hospitals in Riyadh, Saudi Arabia", unpublished PhD thesis, University of Wales, Aberystwyth.
Al-Zahrani, S. (2002), "Use of information and communication technology in Saudi Arabia hospitals", British Journal of Healthcare Computing and Information Management, Vol. 19 No. 10, pp. 17-20.
Arif, M. (1998), "Inter-library loan service in the Kingdom of Saudi Arabia: a case study of medical libraries", International Information and Library Review, Vol. 30 No. 4, pp. 341-65.
Aseel, G. (1996), "Attitudes of physicians in the City of Jeddah towards the use of Medline databases on CD-ROM", unpublished MSc dissertation, King AbdulAziz University, Saudi Arabia (in Arabic).
Braude, R. (1997), "On the origin of species: evolution of health sciences librarianship", Bulletin of the Medical Library Association, Vol. 85 No. 1, pp. 1-10.
Brown University (2000), "Managing organizational development through effective leadership: the model program", available at: www.brown.edu/Facilities/University_Library (accessed 28 May 2007).
Khudair, A. (2005), "Health sciences libraries: information services and ICTs", unpublished PhD thesis, Department of Information Science, City University London, London.
Khudair, A. and Bawden, D. (2004), "Saudi Health Information Network (SHIN): a proposed prototype", paper presented at the Fourth Regional Conference on EMR Health Sciences Virtual Library: Role in E-Learning and Building the Information Society, Regional Office for the Eastern Mediterranean, World Health Organisation, Cairo.
Marghalani, M. (1993), "Continuing education for librarians and information specialists in Saudi Arabia", in Woolls, B. (Ed.), Continuing Professional Education and IFLA: Past, Present and a Vision for the Future, K.G. Saur, München.
Maynard, S. (2002), "The knowledge workout for health: a report of a training needs census of NHS library staff", Journal of Librarianship and Information Science, Vol. 34 No. 1, pp. 17-32.
Rehman, S. and Al-Ansari, H. (2003), "The digital marketplace and library and information education in the GCC member nations: a critical review", Library Review, Vol. 52 No. 4, pp. 170-9.

Sewell, M. (2002), "The use of qualitative interviews in evaluation", University of Arizona, Tucson, AZ, available at: www.ag.arizona.edu/fcs/cyfernet/cyfar/Intervu5.htm (accessed 28 May 2007).
Siddiqui, M. (1996), "Library and information sciences education in Saudi Arabia", Education for Information, Vol. 14 No. 3, pp. 195-214.

Corresponding author
David Bawden can be contacted at: [email protected]


Impact of digital information resources in the toxicology literature


Lyn Robinson

Received 19 December 2006 Accepted 3 June 2007

Department of Information Science, City University London, London, UK

Abstract
Purpose – The purpose of the study reported here was to assess the degree to which new forms of web-based information and communication resources impact on the formal toxicology literature, and the extent of any change between 2000 and 2005.
Design/methodology/approach – The paper takes the form of an empirical examination of the full content of four toxicology journals for the year 2000 and for the year 2005, with analysis of the results, comparison with similar studies in other subject areas, and with a small survey of the information behaviour of practising toxicologists.
Findings – Scholarly communication in toxicology has been relatively little affected by new forms of information resource (weblogs, wikis, discussion lists, etc.). Citations in journal articles are still largely to "traditional" resources, though a significant increase in the proportion of web-based material being cited in the toxicology literature has occurred between 2000 and 2005, from a mean of 3 per cent to a mean of 19 per cent.
Research limitations/implications – The empirical research is limited to an examination of four journals in two samples of one year each.
Originality/value – This is the only recent study of the impact of new ICTs on toxicology communication. It adds to the literature on the citation of digital resources in scholarly publications.
Keywords Worldwide web, Sciences, Communication technologies, Information transfer
Paper type Research paper

Aslib Proceedings: New Information Perspectives, Vol. 59 No. 4/5, 2007, pp. 342-351, © Emerald Group Publishing Limited, 0001-253X, DOI 10.1108/00012530710817564

Introduction
The purpose of this study was to assess the extent to which digital information has become significant in the formal scholarly and professional literature of toxicology. More specifically, the interest was not in digital equivalents of printed products – i.e. electronic journals, computerised abstracting and indexing services, etc. – but in novel information entities – i.e. web pages, e-mail lists, weblogs, wikis etc.

The study follows on from a comprehensive analysis of the communication of toxicology information, and the impact of new ICTs upon it (Robinson, 2002). The study examined the toxicology domain, its information resources and the communication of information and knowledge within it, from a number of perspectives, including:
• analysis of the nature of toxicology as a discipline and of toxicological knowledge;
• historical study of the development of the discipline and its information infrastructure;
• construction of resource lists;
• bibliometric analyses;
• quantitative and qualitative evaluation of retrieval systems and services; and
• examination of vocabularies and terminologies.

It was therefore an example of domain analysis, as described by Hjørland (2002). Part of this study involved an empirical investigation of the extent and impact of new forms of information resources in the formal toxicology literature in the year 2000. An updating study was performed for the year 2005, to assess developments and changes, and is reported here.

Toxicology and its information resources
Toxicology, the "science of poisons", is concerned with actual and potential harmful effects of chemical substances upon humans and animals. It is a coherent subject in its own right, but overlaps with many other subject areas, especially chemistry, pharmacology, medicine, environmental sciences, and (increasingly) genomics. It thereby has multidisciplinary and interdisciplinary character, and in particular has two main strands:
(1) the scientific, by which the mechanisms of toxicity are characterised; and
(2) the legislative/regulatory, by which appropriate safeguards of human and animal health are maintained (Robinson, 2002; Gallo, 1996; Koeman, 1996).

Toxicology is a rapidly developing subject with a long history and – not surprisingly, given its wide scope, multidisciplinary nature, and economic, as well as scientific, importance – a rich and well-developed set of information resources (Robinson, 2002; Wexler et al., 2000; Kissman and Wexler, 1983). Robinson (2002) carried out a compilation of significant toxicology information resources extant around the turn of the millennium, according to a systematic procedure for creating resource lists. Conclusions drawn from this were that the main formal communication systems and resources within toxicology had been little affected by new ICTs, although access to "traditional" resources – journal articles, reports, conference papers, etc. – had been made more convenient by electronic databases and by the web. The newer forms of communication had had most impact on the area of informal "pre-primary" information transfer.

It is therefore possible to identify toxicological examples of most of the newer forms of resource. Websites predominate, but it is also possible to identify (with one current example shown for each):
• open access internet journals (e.g. Particle and Fibre Toxicology; see www.particleandfibretoxicology.com);
• metasearch engines for toxicology material (e.g. ToxSearch from the National Library of Medicine, Washington DC; see http://toxsearch.nlm.nih.gov);
• web rings (e.g. Forensic Entomology web ring; see http://nav.webring.yahoo.com/hub?ring=forent&list);
• web logs (e.g. The Toxicology Weblog from the Walther-Straub Institute, Munich; see http://radio.weblogs.com/01002537);
• electronic discussion lists (e.g. toxlist listserv from Syracuse Research consultants; see http://syracuseresearch.com/esc/tox-toxlist.htm);
• wikis (e.g. Chemical Safety page, Chemical Information Sources wiki, Indiana University; see http://cheminfo.informatics.indiana.edu/cicc/cis/index.php/Chemical_Safety);
• internet portals (e.g. ToxIndex: internet toxicology portal from Soteros Consultants Ltd; see www.toxindex.com); and
• public information portals (e.g. Toxicology Source from Cambridge Toxicology Group consultants; see www.toxicologysource.com).

The purpose of the study reported here was to assess the degree to which these new forms of resource impacted on the formal toxicology literature, and the extent of any change between 2000 and 2005.

Impact of e-resources: citation study
To assess the impact of new types of information resources, samples of the toxicology journal literature were examined. Reference to new information items might be made in a number of places, particularly as entries in a cited references list, or as footnotes, or parenthesised items in the text; and its citation might be done in many ways (Bird and Casserly, 2003). For these reasons, it was decided that no automated method would be reliable and a set of journals would have to be examined cover-to-cover. This would also allow elements such as instructions to authors to be examined. This follows the methodology used by other studies of the citation of electronic sources in library/information science journals (Zhang, 1998, 2001; Vaughan and Shaw, 2003), in conference papers in information science (Maharana et al., 2006), and in electronic journals in a variety of subject areas (Herring, 2002). One distinction is that these studies considered all electronic formats, including e-journal articles, while this study focused on novel forms of communication.

Four scholarly/professional journals with a strong toxicology focus were chosen:
(1) Toxicological Sciences (formerly Fundamental and Applied Toxicology) – a US journal, with a strong academic bias, and coverage of all aspects of toxicology;
(2) Human and Experimental Toxicology – a British journal, emphasising experimental toxicology studies;
(3) Archives of Toxicology – a European journal, emphasising mechanistic aspects; and
(4) Veterinary and Human Toxicology – a US journal, with a "practitioner" focus, emphasising clinical treatment of poisoning.

These four journals, because of their various national origins and emphasis on different aspects of the subject, give a good representation of the current journal literature.

2000 situation
All the printed issues of each journal published in 2000 were scanned cover-to-cover. The following were recorded:
• total number of significant items – articles, reviews, summaries, commentaries; and
• number and percentage having any reference, in any form, to novel digital information formats.

Then, only for those articles having some reference to digital information, the total number of references and the percentage of digital items were recorded. Web resources were sometimes mentioned in the text of an article, sometimes in the reference list, and sometimes both, with no clear rationale. For consistency, those mentioned in the text were treated as additional references in the counts. The results are shown in Table I.

The results in Table I show the very limited penetration of novel information formats into the toxicology literature in the year 2000. Less than 5 per cent of the articles in any journal had any reference to such a format, and even in those articles that did make such a reference, less than 10 per cent of the references were of this form in all cases. The overall penetration of these novel information entities, at that time, was very small.

All of the digital items found were web pages. (The only other "contenders" were two Hazardous Substance Databank records, and a substance directory on CD-ROM, all from Veterinary and Human Toxicology – these were not included, as being digital counterparts of printed resources.) They were largely used to reference governmental agency material, particularly from US sources such as the Environmental Protection Agency, National Institutes of Health, Food and Drug Administration and National Toxicology Program, and from the European Commission. Other references were to commercial data sources, and to statistics from groups such as the American Heart Association.

The great bulk of references were to "traditional" information sources: largely journal articles, but also books, reports, conference proceedings, patents, theses, etc. Some of the reports, from government sources, would have been likely to be available on the web, but only the address of the issuing agency was given. Similarly, the "methods" sections of many papers described equipment, laboratory supplies, methods, etc., identifying these by the name and postal address of the organisation, as required in the journal's "instructions to authors"; it is likely that many of these organisations would have had a web presence by that time, but this form of reference was not used.
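To make the counting procedure explicit, the short sketch below shows how per-journal figures of the kind reported in Table I could be derived from manually recorded judgements. The record structure and field names here are illustrative assumptions made for this example, not data or code from the study.

```python
# Illustrative sketch: aggregating manual citation judgements for one journal year.
# Each entry represents one significant item (article, review, etc.) and records how
# many references it contains and how many of those point to novel digital resources.

items = [
    {"refs": 42, "digital_refs": 0},
    {"refs": 35, "digital_refs": 2},
    {"refs": 28, "digital_refs": 0},
    # ... one entry per item scanned cover-to-cover
]

total_items = len(items)
with_digital = [it for it in items if it["digital_refs"] > 0]

pct_items_with_digital = 100 * len(with_digital) / total_items
refs_in_those = sum(it["refs"] for it in with_digital)
digital_in_those = sum(it["digital_refs"] for it in with_digital)
pct_digital_refs = 100 * digital_in_those / refs_in_those if refs_in_those else 0.0

print(f"{len(with_digital)}/{total_items} items ({pct_items_with_digital:.0f}%) cite digital material")
print(f"{digital_in_those}/{refs_in_those} references ({pct_digital_refs:.0f}%) are digital in those items")
```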

Table I. Reference to novel information formats in the toxicology literature in 2000

Journal | Items (all issues) | Items with digital references | Percentage | References (in those items) | Digital references | Percentage
Toxicological Sciences | 297 | 13 | 4 | 611 | 28 | 5
Human and Experimental Toxicology | 94 | 1 | 1 | 38 | 2 | 5
Archives of Toxicology | 111 | 1 | 1 | 45 | 1 | 2
Veterinary and Human Toxicology | 86 | 2 | 2 | 39 | 3 | 8


Personal communication was described as such, as a reference or an acknowledgement; it is likely that much of this might take the form of e-mail material, but this was not made clear. Similarly, "data on file" or "unpublished data" were a common form of reference, without any example of this being made available electronically.

The small extent of referencing of digital formats in 2000 is perhaps surprising, in view of the apparent eagerness of some of these journals at that time to embrace new technologies. Toxicological Sciences, for example, urged its readers to visit its website, and to utilise its online versions. Its authors had to submit electronic forms of manuscripts, and to give an e-mail address for correspondence (though only one author added a personal web-page address). However, this enthusiasm for technology in producing and delivering the product did not appear to have affected the nature of the scientific record itself, even in this journal, which was more advanced in its adaptation to the web environment than the other three.

2005 situation
To assess the changes that had come about over a time period which saw a greatly increased adoption of the web as an information environment, the process was repeated for publications in the year 2005. Veterinary and Human Toxicology ceased publication at the end of 2004, due to cutbacks in educational funding in the USA and withdrawal of support by the sponsoring academic institution (Robertson, 2004), and therefore the 2004 issues were used for this journal. Toxicological Sciences was available only in e-journal form at the study location (British Library, London), and was examined in this format; printed issues were used for the other three journals. Scanning, of the kind that formed the basis for this study, was easier in print format. Carrying out the equivalent examination of the e-journal was more time-consuming and prone to error, because of the journal's tendency to "hide" some material in supplementary sections, for which additional windows had to be opened. The search facility was used as an adjunct to "eye-balling", to minimise the possibility of overlooking references in the text.

It was clear that, in the 2005 sample, some materials available on the web (e.g. substance data or regulatory reports) were cited as digital items, with a web address, by some authors and not by others. In this study, the former were treated as digital citations, and the latter were not. This was for two reasons. It was not possible to be sure whether those authors who did not make any mention of the web resource had used the material in digital form, or were aware that it was available in this way. In any event, the purpose of the study was to assess the impact of web resources on the scholarly literature, and if they were not cited as such then their impact must be nil.

The results, in the same form as for 2000, are shown in Table II. Between 2000 and 2005 (2004 for Veterinary and Human Toxicology), the percentage of items with digital references increased for all four journals, from a mean of 3 per cent to a mean of 19 per cent. Increases for individual journals were between a factor of 6 and 14. The greatest extent in 2000 was 4 per cent; in 2005 the minimum was 10 per cent and the maximum 24 per cent.
The percentage of digital references, in those items that had any, also increased, though not by so dramatic a factor, from a mean of 5 per cent to a mean of 8 per cent: the maximum percentage for any one journal increased from 8 per cent to 11 per cent.

This shows that the citing of some new forms of resources became much more widespread through this period: although four out of five articles taken over the four journals still cited only "traditional" resources by the end. The extent to which such new resources were cited grew more slowly: only for one journal was more than one in ten citations to a new form of resource. This indicates that, despite an increasing recognition of web-based resources, their penetration into the scholarly and professional literature of toxicology is limited.

A wider range of material was included, compared with the 2000 situation, which could be categorised as follows:
• regulations and guidelines: 15 per cent;
• substance data: 15 per cent;
• laboratory procedures and methods: 12 per cent;
• unpublished data and reports: 12 per cent;
• official bodies and programmes: 11 per cent;
• genomic data sources: 10 per cent;
• data analysis techniques and software: 8 per cent;
• "general" information: 5 per cent;
• laboratory facilities and equipment: 4 per cent;
• news and announcements: 3 per cent;
• nomenclature and terminology: 3 per cent; and
• images: 2 per cent.


All of this variant material was present on web pages. No mailing lists, discussion forums, personal e-mail messages, weblogs, wikis etc. were cited at all. The largest contributions (15 per cent of the total) come jointly from national and international legislation, regulations and guidelines relating to toxic materials, and to chemical and biological hazards, and from the extensive files of data on toxic and potentially toxic substances maintained to support such regulation. These form the regulatory background to the largely scientific and medical studies reported in the four journals. Coming from bodies such as the US National Institutes of Health, the US Food and Drug Administration, the European Commission, and the Organisation for Economic Co-operation and Development, they are now largely communicated through the web. Web-based information on these bodies themselves, and their various programmes, accounts for 11 per cent of the total.

Table II. Reference to novel information formats in the toxicology literature in 2005

Journal | Items (all issues) | Items with digital references | Percentage | References (in those items) | Digital references | Percentage
Toxicological Sciences | 322 | 76 | 24 | 3,370 | 238 | 7
Human and Experimental Toxicology | 87 | 12 | 14 | 879 | 95 | 11
Archives of Toxicology | 92 | 9 | 10 | 302 | 14 | 5
Veterinary and Human Toxicology | 103 | 15 | 15 | 259 | 26 | 10

The next largest contributions (at 12 per cent) include descriptions of a variety of laboratory and environmental methods, procedures and "good practice guidelines" (with descriptions of laboratories themselves, and laboratory equipment and materials adding another 4 per cent), and also a variety of unpublished reports, bibliographies and data compilations. Web-based data sources in genomics – a type of resource that has gained greatly in importance since 2000 – account for 10 per cent, largely in the more "academic" Toxicological Sciences journal. Standards and software for data analysis and statistical reporting account for 8 per cent of the total. "General" information covers 5 per cent of the total, coming largely from the practitioner-oriented, and rather eclectic, Veterinary and Human Toxicology journal. This includes a wide variety of topics and materials, including census data, prescribing data, information on pet keeping, the metal composition of coinage, and encyclopaedia articles on a variety of subjects. Smaller coverage goes to web-based image resources (generally for pathology images), terminology, nomenclature, and definitions of terms and concepts, and news items, press releases, and announcements from a variety of sources.

In summary, it can be said that the web is now becoming a widely used – although by no means universal – way of communicating information of relevance to toxicology in the form of what has traditionally been described as "grey literature". It is also becoming significant for data on substances, whether this is toxicological, physical, chemical or economic and use data. This, in effect, gives more convenient access to well-established types of information resources, which would hitherto have been accessible only on paper, or through proprietary computer databases. There was no evident relationship between the citing of digital resources and the particular subject of the article, national affiliation of authors, etc. Citing was most commonly a small number of items per article, though two articles (dealing with the general topic of allergy and with legal and regulatory issues, respectively) each cited over 30 such resources.

There is no indication here that newer forms of communication – weblogs, wikis, discussion lists, etc. – are playing any role in this formal communication of toxicology knowledge. The journals themselves had generally adapted to a web environment to a greater extent than in 2000, although the instructions to authors still gave no advice on the citation of non-traditional material; the same is true in other subject areas (Bird and Casserly, 2003). All except Veterinary and Human Toxicology had an electronic version alongside print, operated through a journal web page, gave author e-mails for all articles, offered links to references through CrossRef, Medline, etc., and provided DOI identifiers for articles. Veterinary and Human Toxicology appeared rather different in nature, deliberately espousing a "magazine-like" printed format, with a good deal of non-scientific content, including job advertisements, announcements of meetings, opinions columns, and cartoons and jokes; the latter often with little obvious relevance to toxicology.
This format, deliberately designed to be accessible to professional readers, especially those whose first language is not English (Robertson, 2004), might seem to lend itself to inclusion of more web-based material. It is ironic to note that the last issue of this journal before it ceased publication had an unusually high proportion of digital citations, and intriguing to speculate how this might have developed had publication continued.

Discussion
These results are generally in accordance with those of earlier studies of the citation of e-resources, albeit in other subject areas. These found an initial very small impact of e-citation (Zhang, 1998), increasing somewhat with time (Zhang, 2001; Herring, 2002), to a level equivalent to print sources (Vaughan and Shaw, 2003; Maharana et al., 2006). It should be noted that these studies included resources such as journal articles in electronic form, which were not included here, and that they focused on citing sources published in e-journals themselves, or on the information science subject area: both factors likely to promote e-citation. That being so, the rather higher rates of e-citation, e.g. 35 per cent (Maharana et al., 2006) and 26 per cent (Herring, 2002), seem generally comparable to the mean 19 per cent found here for the 2005 case. They are also in accordance with the findings of researchers who have carried out extensive longitudinal studies of, largely American, scientists and technologists (Tenopir and King, 2000; King and Tenopir, 2001). While ICTs are used for many purposes, especially for informal communication, the traditional journal is still the overwhelmingly important medium for presentation of substantive information.

It is also noticeable that very few "internet journals" with toxicology content can be identified, compared with the many still current in printed (and, in some cases, also electronic) form (Robinson, 2002), confirming the view that even peer-reviewed e-journals have had limited impact (Harter, 1998). Robinson (2002) identified only two such journals relevant to toxicology: Internet Journal of Medical Toxicology and Internet Journal of Forensic Medicine. In the intervening years, Internet Journal of Medical Toxicology has ceased publication, while four others have been started:
(1) Internet Journal of Toxicology;
(2) Journal of Toxicological Sciences;
(3) Journal of Occupational Medicine and Toxicology; and
(4) Particle and Fibre Toxicology.

All are open access journals, available in electronic form only. Their initiation may indicate a move toward this form of resource for toxicology, but should be set against the large number of "conventional" journals in the field, over 70 being identified in Robinson's (2002) study.

Further verification of the low impact of new communication tools is given by a small survey of practising toxicologists carried out in mid-2005, as part of a Master's dissertation project (Papageorgiou, 2005). A total of 29 responses were received from 63 members of American and British toxicology associations who were asked to complete a short questionnaire on their information behaviour. The relevant results were:
• printed and electronic journals, and computerised databases, are used frequently by virtually all;
• websites and e-mail messaging are used frequently by virtually all;
• a majority made some use of discussion forums and mailing lists; and
• very few made any use of weblogs, wikis, portals, or instant messaging/chat.

Accepting the small sample, this confirms that reliance is still on the "traditional" forms of communication (journals, data collections), albeit facilitated by electronic access, that the web and e-mail are used for informal communication, and that the newer forms of resource have made little impact.

Conclusions
The formal communication system of toxicology has been little affected by new forms of information resource. Reliance is still placed on the "traditional" journal article, with citations largely to other "traditional" resources, though these may now be accessed electronically rather than in print form. A noticeable increase in the proportion of web-based material being cited in the toxicology literature has occurred between 2000 and 2005.

References
Bird, J.E. and Casserly, M.F. (2003), "Web citation availability: analysis and implications for scholarship", College and Research Libraries, Vol. 64 No. 4, pp. 300-17.
Gallo, M.A. (1996), "History and scope of toxicology", in Klaasen, C.D. (Ed.), Casarett and Doull's Toxicology: The Basic Science of Poisons, McGraw-Hill, New York, NY, pp. 3-11.
Harter, S.P. (1998), "Scholarly communication and electronic journals: an impact study", Journal of the American Society for Information Science, Vol. 49 No. 6, pp. 507-16.
Herring, S.D. (2002), "Use of electronic resources in scholarly electronic journals: a citation analysis", College and Research Libraries, Vol. 63 No. 4, pp. 334-40.
Hjørland, B. (2002), "Domain analysis in information science: eleven approaches – traditional as well as innovative", Journal of Documentation, Vol. 58 No. 2, pp. 422-62.
King, D.W. and Tenopir, C. (2001), "Using and reading scholarly literature", Annual Review of Information Science and Technology, Vol. 34, pp. 423-77.
Kissman, H.M. and Wexler, P. (1983), "Toxicological information", Annual Review of Information Science and Technology, Vol. 18, pp. 185-230.
Koeman, J.H. (1996), "Toxicology, history and scope of the field", in Niesink, R.J.M., de Vries, J. and Hollinger, M.A. (Eds), Toxicology: Principles and Applications, CRC Press, Boca Raton, FL, pp. 3-14.
Maharana, B., Kalpana, N. and Sahu, N.K. (2006), "Scholarly use of web resources in LIS journals: a citation analysis", Library Review, Vol. 55 No. 9, pp. 598-607.
Papageorgiou, I.V. (2005), "Information resources in toxicology", unpublished MSc dissertation, Department of Information Science, City University London, London.
Robertson, W.O. (2004), "V and HT's future – like all futures – remains uncertain", Veterinary and Human Toxicology, Vol. 46 No. 6, pp. 352-3.
Robinson, L. (2000), "A strategic approach to research using Internet tools and resources", Aslib Proceedings, Vol. 52 No. 1, pp. 1-19.
Robinson, L. (2002), "Toxicology information and knowledge: the impact of new information and communication technologies", unpublished doctoral thesis, University College, London.
Tenopir, C. and King, D.W. (2000), Towards Electronic Journals: Realities for Scientists, Librarians and Publishers, Special Libraries Association, Washington, DC.

Vaughan, L. and Shaw, D. (2003), "Bibliographic and web citations: what is the difference?", Journal of the American Society for Information Science and Technology, Vol. 54 No. 14, pp. 1313-22.
Wexler, P., Hakkinen, P.J., Kennedy, G. and Stoss, W. (Eds) (2000), Information Resources in Toxicology, 3rd ed., Academic Press, London.
Zhang, Y. (1998), "The impact of internet based electronic resources in formal scholarly communication", Journal of Information Science, Vol. 24 No. 4, pp. 241-54.
Zhang, Y. (2001), "Scholarly use of internet-based electronic resources", Journal of the American Society for Information Science and Technology, Vol. 52 No. 8, pp. 628-54.

Corresponding author
Lyn Robinson can be contacted at: [email protected]


Evaluation of web search for the information practitioner


A. MacFarlane

Centre for Interactive Systems Research, Department of Information Science, City University London, London, UK

Received 14 December 2006 Accepted 21 May 2007

Abstract
Purpose – The aim of the paper is to put forward a structured mechanism for web search evaluation. The paper seeks to point to useful scientific research and show how information practitioners can use these methods in evaluation of search on the web for their users.
Design/methodology/approach – The paper puts forward an approach which utilizes traditional laboratory-based evaluation measures such as average precision/precision at N documents, augmented with diagnostic measures such as link broken, etc., which are used to show why precision measures are depressed as well as the quality of the search engine's crawling mechanism.
Findings – The paper shows how to use diagnostic measures in conjunction with precision in order to evaluate web search.
Practical implications – The methodology presented in this paper will be useful to any information professional who regularly uses web search as part of their information seeking and needs to evaluate web search services.
Originality/value – The paper argues that the use of diagnostic measures is essential in web search, as precision measures on their own do not allow a searcher to understand why search results differ between search engines.
Keywords Worldwide web, Information searches, Retrieval performance evaluation, Search engines, Measurement
Paper type Research paper

Aslib Proceedings: New Information Perspectives, Vol. 59 No. 4/5, 2007, pp. 352-366, © Emerald Group Publishing Limited, 0001-253X, DOI 10.1108/00012530710817573

1. Introduction
Web search is an important part of the working life of an information professional, but a little understood issue is one of evaluation. How do such professionals evaluate the retrieval effectiveness of a given search engine with regard to a particular information need, or compare the retrieval effectiveness of several search engines? There has been some research in web search evaluation, but few attempts to practically apply evaluation methods in a real environment. There is a need for structured and formal techniques for evaluation that yield quantitative data, in which searchers can clearly see differences in search engines. Such techniques have been around for over 40 years (Aitchison and Cleverdon, 1963), using precision and recall measures, but these techniques do not tackle all the issues that may occur when evaluating web search. In this paper we show why such traditional IR measures on their own do not provide enough information for the researcher when evaluating web search, and show how diagnostic measures (such as recording the number of broken links) can be used to augment such traditional measures. The paper puts forward a methodology which was initially derived while working in the commercial sector, and has been subsequently refined over six years in teaching search and evaluation to Library and Information Science postgraduate students. We argue that this methodology gives a much better idea of the retrieval effectiveness of web search engines, as well as the ability to examine other processes in web search (such as crawling) which are not part of online search and are not addressed by such measures as recall and precision.

The author is grateful to Professor Stephen Robertson for advice on both what measures to use and how to interpret statistical significance on the experiments described in this paper.


2. Previous research in evaluation of web search
There is a significant amount of interest in the information retrieval research community in evaluating web search, including various tracks in the TREC conference series, including the VLC2 (Hawking et al., 1999) and Web Tracks (Craswell and Hawking, 2005). Strong arguments are made for the use of scientific methods to evaluate web search either in a live environment (Hawking et al., 2001) or on a static frozen collection (Craswell and Hawking, 2005). A live environment in this context is a real online searching situation where the document set changes (usually increases in size), while a static frozen collection is a document set used in laboratory style evaluations. Whilst we accept that variables must be controlled in scientific experiments in order to gather useful data, an information professional working with real web search needs to work with live web search engines in order to assess the complexities of web search. However, the practitioner can still learn many valuable lessons from scientific experiments, particularly the measures used by web IR researchers.

But what precisely is it that the practitioner needs to evaluate? Web queries can be divided into three main types (Broder, 2002):
(1) navigational;
(2) transactional; and
(3) informational.

A navigational query is one in which a user wants to find a particular website (e.g. the home page of City University), whereas a transactional query is where the user wants to find a site where some further interaction will take place (e.g. where can I buy bookcases?). An informational query is one which is needed to satisfy an anomalous state of knowledge, or ASK (Belkin et al., 1982). An example of this would be "what are the legal precedents for civil cases in conveyancing?" With navigational queries the user is looking for a single item for the most part, while the user requires multiple items for transactional and informational queries. The practitioner could potentially be faced with both navigational and transactional queries, but by and large their users' information needs may be classed as informational queries.

Table I describes some traditional IR measures used for evaluation in web search together with their target query type. It should be noted that binary decisions on relevance (relevant or not-relevant) are dominant in the field to date; however, there is some interest in using non-binary evaluation methods for web search (Clarke et al., 2005). Other non-traditional methods have been used for evaluation purposes. Vaughan (2004) uses a number of different mechanisms, for example result ranking quality (such as correlation between user ranked pages and pages ranked automatically by search engines), and stability measures (comparing the ranking of documents over a given period, e.g. two weeks). However, these measures rely for the most part on relevant documents (in effect a detailed comparison of precision) or the ranking mechanism (between two sets of results).

Table I. Traditional IR measures used to evaluate search

Measure | Calculation | Query type | References
Mean reciprocal rank | Descending scale from (say) 1-5 is used, e.g. hit at rank 1 is given score 1.0, at 2 score 0.5, etc. Hits outside of rank 5 are assigned 0 score. Scores are then averaged from all queries | Navigational | Chowdhury and Soboroff (2002), Hawking and Craswell (2002), Hawking and Craswell (2003), Hawking et al. (2004), Craswell and Hawking (2005)
R precision | Precision at R (total number of relevant documents) | Navigational | Hawking and Craswell (2002)
Success N | Proportion of queries in which good answers were found at rank N | Navigational | Hawking and Craswell (2002), Hawking et al. (2004)
Recall N | Recall at N documents retrieved | Informational, Transactional | Craswell and Hawking (2005)
Percentage top N | Proportion of queries where the right answer was found in the top N hits | Informational, Transactional, Navigational | Craswell and Hawking (2005)
Percentage fail N | Proportion of queries where no right answers were found in the top N hits | Informational, Transactional | Hawking et al. (2004)
Precision at N documents | Divide number of relevant documents by N (where N is the total number of documents retrieved) | Informational, Transactional | Hawking and Thistlewaite (1998), Hawking et al. (1999), Leighton and Srivastava (1999), Gordon and Pathak (1999), Wu and Li (1999), Hawking et al. (2000), Hawking et al. (2001), Hawking (2001), Hawking and Craswell (2002), Hawking and Craswell (2003), Hawking et al. (2004), Craswell and Hawking (2005)
Average precision | Average of all precision scores each time a document is retrieved (an example of how to calculate this measure is given in Table III) | Informational, Transactional | Hawking and Thistlewaite (1998), Hawking et al. (1999), Gordon and Pathak (1999), Hawking et al. (2000), Hawking et al. (2001), Hawking (2001), Hawking and Craswell (2002), Craswell and Hawking (2005), Clarke et al. (2005)

Other more frequently used measures try to examine in more detail why particular documents are not relevant, and if precision is affected adversely – we label these "diagnostic measures" (see Table II). These methods have been used in different contexts. For example, Leighton and Srivastava (1999) look at web search generally, while Wu and Li (1999) focus on web search for health information. It should be noted that these diagnostic measures can be used in conjunction with non-binary relevance judgements. In a review of web search methods Oppenheim et al. (2000) argue for a broad based approach using both traditional and diagnostic measures. The author fully agrees with this strategy, but if you are to use diagnostic measures you need to consider other aspects of web search (e.g. none of the studies referenced in this paper consider spam documents).
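As a small worked illustration of two of the navigational measures listed in Table I, the following sketch computes mean reciprocal rank and success at N over a handful of hypothetical queries. The ranks used are invented for the example and are not drawn from the paper.

```python
# Illustrative sketch of two navigational measures from Table I, computed over a
# set of hypothetical queries. For each query we record the rank (1-based) at which
# the single "right answer" appeared, or None if it was not retrieved at all.

answer_ranks = [1, 3, None, 2, 7]   # invented data for five navigational queries

def reciprocal_rank(rank, cutoff=5):
    """Score 1/rank for hits at or above the cutoff, 0 otherwise."""
    return 1.0 / rank if rank is not None and rank <= cutoff else 0.0

def success_at(n, ranks):
    """Proportion of queries whose right answer appears in the top n hits."""
    return sum(1 for r in ranks if r is not None and r <= n) / len(ranks)

mrr = sum(reciprocal_rank(r) for r in answer_ranks) / len(answer_ranks)
print(round(mrr, 3))                                   # mean reciprocal rank over the query set
print(success_at(1, answer_ranks), success_at(5, answer_ranks))
```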


3. Motivation for the study
The primary motivation for this study is to give information practitioners or professional search intermediaries some guidance on how to evaluate web search in the light of experience gained by researchers in the field. We believe that many of the traditional IR measures which have been used for many years are still useful for evaluation in our context, but many of the studies concentrate on evaluating test collections. This is very useful in a scientific context, but information practitioners have to deal with real dynamic collections, and these traditional measures in isolation do not provide the required mechanism for dealing with web search. The diagnostic measures provide information practitioners with the extra data they need in order to properly evaluate searching for information on the web for their users. The rest of the paper outlines this evaluation methodology and how to use it.

4. Proposed evaluation methodology
Before starting the evaluation the practitioner needs to make some important decisions on what they will be evaluating. The first (and most obvious) decision to make is to determine which search engines they will evaluate in their study. This may include mainstream search engines such as Google (www.google.com) and Altavista (www.altavista.com) or more specialist search engines such as the Health Library (www.health-library.com) and Law Crawler (lawcrawler.findlaw.com). The choice will often be determined by the information needs of the users the practitioner serves. The number of search engines to evaluate is also an important issue – this will depend on the resources available to the practitioner (the evaluation methodology described in this paper is very resource intensive).

Table II. Diagnostic measures

Measure | Calculation | Query type | References
Duplicates | Count the number of duplicate documents in the top N hits | Informational, Transactional | Leighton and Srivastava (1999), Wu and Li (1999), Oppenheim et al. (2000)
Broken links | Count the number of broken links (e.g. 404 not found) | Informational, Transactional, Navigational | Wu and Li (1999), Oppenheim et al. (2000)


When this is decided, the number of topics to evaluate needs to be chosen (we suggest 50, as used in many TREC experiments) and the number of pages to examine for each topic for every search engine (this should be consistent across all topics and search engines). For the latter we recommend that only the first ten sites are examined in the evaluation, as this reduces the evaluation workload significantly. Many case studies have shown that users very rarely view pages of hits beyond the first page of the hit list (e.g. in Silverstein et al., 1999, 85.2 per cent of individual queries only viewed one page of results from AltaVista). A further issue to consider is the number of URLs to navigate from a hit list to find a relevant web page – or alternatively the number of clicks the user requires to find that web page. This may be needed for some queries where a relevant web page satisfying the user's information need is buried somewhere within the website. A maximum of three clicks to find a relevant web page is considered reasonable – the "three click rule" (Zeldman, 2001). Alternatively the practitioner can use a more stringent method and assume that the user is only interested in pages that are linked directly from the hit list. In all cases the strategy used should be consistent, to ensure that the results produced for the evaluation make sense.

When the issue of what to evaluate has been decided on, the practitioner can then think about conducting the evaluation. In the next three sections we describe the measures and the process that can be used for web search evaluation. We assume for this example that the information professional will undertake the evaluation, but the methodology could be utilized with users if such is required. Note that for the rest of our discussion we make a binary relevance assumption for documents. The variables used in the experiment are as follows.

4.1 Traditional measures
We recommend the use of two measures:
(1) precision at N documents retrieved; and
(2) average precision (as described in section 2).

We used the precision at N calculation to see how precision deteriorates over a given number of blocks or chunks. If the recommendation on examining only the first ten hits is taken from above, a reasonable strategy is to calculate precision at five and ten sites retrieved. These are standard measures used for many years in laboratory-based evaluations, and have been used on all kinds of information including news stories, government reports and journal articles, as well as the web. Calculation of precision at five and ten is very simple – the total number of relevant documents found to that point is divided by either five or ten.

How well does the engine retrieve documents against the known total number of relevant documents (recall)? It is impossible to know the recall for collections the size of the web, so we need some estimate that we can sensibly use in order to give a figure to compare. If the practitioner inspects ten documents at most, they can make the assumption that we have at least ten relevant documents for our given information need. This strategy might come in for some criticism in the sense of how you can be sure that there are ten relevant documents for any information need you may have. The author's answer to this is that there are now eight billion-odd web pages indexed by Google as this paper is being written, and it is reasonable to assume that at least ten of those will be relevant for many users' information needs. If the practitioner is unhappy with this mechanism, they could use a pooling method for relevant documents (Voorhees and Harman, 2000), which would mean that the retrieved sets would need to be merged and the number of relevant documents found for each topic used instead of the assumed ten. This does, however, place an extra burden on the practitioner when conducting the evaluation. The author does not regard this as a significant issue provided the strategy taken on the assumption of relevant documents is consistent. Because of the assumption made with regard to relevant documents we label the measure "estimated average precision". Any evaluator uncomfortable with this assumption can use the traditional "pooling" method used by the TREC research community (Voorhees and Harman, 2000), at the cost of extra effort to build the pools.

We use this assumption to calculate average precision (see Table I). Average precision is a precision-based measure linked to recall. The evaluator uses this measure to see how our search engines are doing against the estimated recall and how this relates to precision. It also tells the evaluator how well relevant documents are being ranked across the whole hit list. Table III shows how average precision can be calculated (given our assumptions on ten documents retrieved, ten documents relevant). Each time a relevant document is retrieved, the total number of relevant documents found so far is divided by the current rank. The evaluator then accumulates the average precision scores, which in the case of Table III gives us a total of 3.33. Dividing this by ten (our assumed number of relevant documents) gives us an estimated average precision (EAP) of 0.33. The more relevant documents higher up the hit list rank, the better the EAP score.
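As a concrete illustration of the two calculations just described, the short sketch below computes precision at five and ten, and the estimated average precision, for the relevance judgements used in the worked example of Table III. It is an illustrative sketch written for this purpose, not code from the paper.

```python
# Sketch of the precision calculations described above, using the relevance
# judgements from the worked example in Table III (1 = relevant, 0 = not relevant).

relevance = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]   # top ten hits for one query
ASSUMED_RELEVANT = 10                        # assumption: at least ten relevant pages exist

def precision_at(n, rels):
    """Relevant documents in the first n hits divided by n."""
    return sum(rels[:n]) / n

def estimated_average_precision(rels, assumed_relevant=ASSUMED_RELEVANT):
    """Accumulate precision at each rank where a relevant page occurs,
    then divide by the assumed number of relevant pages."""
    found, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            found += 1
            total += found / rank
    return total / assumed_relevant

print(precision_at(5, relevance))                         # 0.6
print(precision_at(10, relevance))                        # 0.5
print(round(estimated_average_precision(relevance), 2))   # 0.33, as in Table III
```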


4.2 Diagnostic measures
These measures are used to show why documents are not relevant beyond the fact that many documents do not meet an information need, and the subsequent impact this has on the precision measures described in section 4.1. Two diagnostic measures have already been introduced – duplicates and broken links – but there are other measures which need to be considered such as spam and a figure for hit lists which do not retrieve a full ten documents (which we must consider if we assume that there are at least ten relevant documents). The calculations for these measures are simple – the occurrences of a particular metric are accumulated (scores are recorded between 0 and 10). We describe each of these diagnostic measures in turn below.

Table III. Calculating estimated average precision

Rank | Relevant | Rels/rank
1 | 1 | 1/1 = 1
2 | 0 | –
3 | 1 | 2/3 = 0.67
4 | 0 | –
5 | 1 | 3/5 = 0.6
6 | 0 | –
7 | 0 | –
8 | 1 | 4/8 = 0.5
9 | 1 | 5/9 = 0.56
10 | 0 | –


4.2.1 Repeated documents (or duplicates). It is often the case that searches will bring up identical pages in a retrieved list. Since they contain the same information, it makes sense to mark the first encountered page relevant, and treat subsequent pages as being irrelevant. Choosing criteria for duplicates can be difficult. Must the documents be identical in every sense? Is the information in the document identical? Are retrieved documents from the same site (when you only actually want one when completing navigational searches)? A good example of why this may happen is multinational companies that have offices in several places – some search engines are better at handling this type of problem than others. It may be best to use a simple method – if the page looks the same and has the same information it is a duplicate, otherwise it is not. However, the definition of duplicates will often depend on the type of information being searched for and the query type.

4.2.2 Not retrieved. As we want ten documents to be retrieved, any hit list which retrieves fewer than ten documents damages our precision, and we want to penalise search engines that do not retrieve our required number of pages.

4.2.3 Link broken. This occurs when a user clicks on a link and gets an error message (e.g. 404 not found). Sometimes you may find that the link returned is a redirected page – the author would suggest that if the target of the redirection is relevant, then you mark the page as being relevant (if the webmaster/author has taken the trouble to make sure the information is available we should give them credit). In such a case the evaluator should not mark the link as broken.

4.2.4 Spam. A big issue for search engines is web page designers putting in words that bear no relation to the content of a page. This can be done in the meta tags in HTML or by putting the words in the main body of the document, but using a font/colour that makes it invisible in the browser. This means that when a user types in their search words, they retrieve documents/pages that are completely irrelevant to their information needs. The user is puzzled, as it is obvious that the page is irrelevant, and they cannot find any trace of their search words in the retrieved page. These pages are called "spam" pages and they can be very annoying to the user. This technique tends to be used by the "adult entertainment" industry and there is something of an arms race between web search engines and such organisations. Spam pages harm precision (they are not relevant) so should be recorded. A good survey of spamming techniques can be found in Henzinger et al. (2002).
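As an illustration of how the judgements behind these measures might be recorded during an evaluation session, the short sketch below captures relevance and the diagnostic counts for a single query on a single engine. The data structure and field names are assumptions made for this example; they are not taken from the paper.

```python
# Hypothetical record of the judgements made for one query on one search engine.
# Relevance is binary, and the diagnostic counts refer to the (up to) ten hits examined.

from dataclasses import dataclass

@dataclass
class QueryJudgement:
    relevant: list         # 1/0 relevance judgement for each hit actually returned
    duplicates: int = 0    # pages repeating an earlier hit (first occurrence counted as relevant)
    broken_links: int = 0  # hits returning an error such as "404 not found"
    spam: int = 0          # pages whose visible content bears no relation to the query

    @property
    def not_retrieved(self) -> int:
        # shortfall against the ten hits we expect the engine to return
        return max(0, 10 - len(self.relevant))

# Example: only eight hits returned, one duplicate and two broken links among them.
q = QueryJudgement(relevant=[1, 0, 0, 1, 0, 0, 0, 1], duplicates=1, broken_links=2)
print(q.not_retrieved)   # 2
```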

4.3 The process of evaluating web search
The simple evaluation procedure for this type of experiment is as follows:
(1) use a given query on all the search engines;
(2) judge each engine for this query and record the results of each measure;
(3) when all the results for all the queries have been collected from all the search engines, calculate the average for every search engine on every measure;
(4) tabulate each measure separately, listing the search engine and its score on that measure;
(5) apply statistical techniques to find significant differences between the effectiveness of search engines; and
(6) compare and contrast each search engine for each measure to see how well the search engines did against each other, using the diagnostic measures to show why precision was reduced for any search engine.

5. An evaluation experiment
5.1 Data used for the experiment
We conducted an evaluation of 50 queries whilst working for a commercial organisation in 2000 (the queries are shown in the Appendix). The queries are mostly taken from the logs of a now defunct web search engine; the author added some informational queries to the set. We used the same method for choosing queries from the log as used at the TREC-8 Web Track (Hawking et al., 2000): that is, we inspected a number of queries and picked those for which we felt confident that we understood what the user was searching for and could therefore make appropriate relevance assessments. The average number of terms for the query set used is 2.68: this is about what you would expect from a set of web queries and is not far off the figure quoted by both Silverstein et al. (1999) and Jansen et al. (1998) of 2.35 terms per query.

Our classification of the query set found that 18 per cent were navigational, 46 per cent transactional and 36 per cent informational. This is a reasonable distribution of the queries for our experiment, as there are enough web type searches to be close to the type of searches that most web users will undertake (64 per cent for navigational/transactional queries). However, there are a sufficient number of informational queries to make the study of interest to practitioners whose users are more likely to require the resolution of information needs. We used all the assumptions and techniques for this evaluation declared and described in section 4. We did not inspect the URL beyond the first click: our requirement was that the URLs in the hit list should contain relevant information (hub sites were not therefore considered).
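Referring back to steps (3) and (4) of the procedure in section 4.3, the following minimal sketch shows one way the per-query scores could be averaged and tabulated for each engine. The engine names and numbers are placeholders invented for the illustration, not results from the experiment.

```python
# Hypothetical aggregation of per-query scores into a per-engine summary,
# mirroring steps (3) and (4) of the evaluation procedure. All values are invented.

from statistics import mean

# scores[engine][measure] holds the per-query values recorded while judging
scores = {
    "EngineA": {"P@5": [0.4, 0.6, 0.2], "Duplicates": [1, 0, 2]},
    "EngineB": {"P@5": [0.2, 0.4, 0.4], "Duplicates": [0, 0, 1]},
}

measures = ["P@5", "Duplicates"]
print("Engine     " + "".join(f"{m:>12}" for m in measures))
for engine, per_measure in scores.items():
    row = "".join(f"{mean(per_measure[m]):>12.2f}" for m in measures)
    print(f"{engine:<10} {row}")
```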


5.2 Experimental results
The precision results collected for the experiment are declared in Table IV. We use these results to show how the measures can be used in practice.

Table IV. Evaluation results

Search engine   P@5     P@10    Average precision   Spam   Duplicates   Link broken   Not retrieved
Google          0.424   0.386   0.290               0      0.82         0.50          0.20
AltaVista       0.280   0.256   0.178               0      0.28         0.72          0.00
Lycos           0.184   0.160   0.093               0      2.02         1.12          0.54
Yahoo           0.318   0.280   0.190               0      0.44         0.34          2.22

What stands out in the results is that no spam documents were retrieved, and we can therefore discount spam as a problem for this particular set of queries for a given time period. This could be because many of the queries are quite esoteric, or it could be that search engines were doing a good job of detecting spam at that time. Google clearly comes out on top with respect to all the precision measures, and quite clearly did a lot better than the other search engines for ranking documents (e.g. Google provided a third better precision at five documents retrieved than its nearest rival, Yahoo!). The worst performer on this set of queries using the precision measures is Lycos (www.lycos.com), and it is clear why this is the case from the evidence provided by the diagnostic measures: on average two hits were duplicates, while more than one broken link per query was found. This clearly demonstrates the value of diagnostic measures and the impact they have on precision. However, some diagnostic measures are useful to examine other aspects of web search (e.g. the “link broken” measure demonstrates that the Lycos web crawling mechanism was not as effective as the others in 2000). Yahoo! (www.yahoo.com) recorded the worst “not retrieved” score and did not do as well as Google, largely because it was not retrieving the required ten documents (missing over two sites per query on average). Interestingly, only Lycos did worse on the duplicates measure than Google, and Yahoo! has the least number of broken links of all the search engines. Overall, the conclusion is that, for this set of queries for that particular time period, Google was the most effective search engine and did not have as much of a problem as the other search engines with respect to diagnostic measures. Lycos overall is the search engine which has its precision results most adversely affected by errors recorded by diagnostic measures. We examine these figures in terms of statistical significance below.

5.3 Significance testing on precision results
It should be noted that an increase in precision does not necessarily mean that there is a real difference between search engines (i.e. that one web search engine is shown to provide better retrieval effectiveness). In order to establish this, some kind of significance testing is useful, but there is some controversy on the issue. Some argue (Van Rijsbergen, 1979) that parametric tests such as the t-test are not applicable as the form of the underlying distribution (of relevant documents) is unknown. Others, such as Hull (1993) and Sanderson and Zobel (2005), argue that parametric measures such as the t-test can be used even if the assumption of the underlying data having a normal distribution is violated. One method around this is to use a non-parametric test such as the Wilcoxon test (Hull, 1993) in conjunction with the t-test and only accept that there is a difference between the two systems if both measures agree that the difference is significant. This is the method we used on the data collected in the experiments (see Table V). T-test results are marked as “t-test”, while Wilcoxon test results are marked as “WLC” in Table V. The practitioner does not need to know the details of these tests, just that a result below 0.05 is regarded as being significant, while a result of 0.01 or below can be regarded as highly significant (Rowntree, 1981). Many such statistical tests are available in Microsoft’s Excel spreadsheet software, or can be downloaded from the web.

Table V. Significance tests on precision results (figures in italics are not statistically significant)

                    Google versus AltaVista   Google versus Yahoo   Google versus Lycos   AltaVista versus Yahoo   AltaVista versus Lycos   Yahoo versus Lycos
Measure             t-test    WLC             t-test    WLC         t-test    WLC         t-test    WLC             t-test    WLC             t-test    WLC
P@5                 0.001     0.002           0.015     0.024       0.000     0.000       0.472     0.567           0.021     0.033           0.005     0.024
P@10                0.000     0.000           0.001     0.000       0.000     0.000       0.573     0.414           0.004     0.006           0.003     0.007
Average precision   0.000     0.001           0.000     0.000       0.000     0.000       0.759     0.413           0.004     0.025           0.005     0.004

It can be seen from Table V that both tests agree on what is significant and what is not significant, which gives us a little more confidence in any conclusions we draw from this data. Given this we can see that both tests are in agreement that Google provides a retrieval effectiveness improvement over the other search engines which is highly significant for the most part, apart from the test against Yahoo! on five documents retrieved. These tests give us more confidence that the retrieval effectiveness advantage Google provided over the other search engines used is actually real (see section 5.2 above). It should be noted that the two tests do not always agree on which differences are significant and which are highly significant: for example, Yahoo! versus Lycos at five documents retrieved. In cases where the measures do not agree we recommend that the practitioner err on the side of caution when drawing any conclusions. Practitioners should be wary of using percentage increases in precision to differentiate between search engines (Sanderson and Zobel, 2005). A good example of this can be found in Table VI. It can seem that many of the increases in precision (particularly for Google over the other search engines) are very impressive. The percentage increase from AltaVista to Yahoo is also quite good (7 per cent for average precision). However, using the data from the significance tests applied, any difference between AltaVista and Yahoo is not regarded as being significant, even though on the surface Yahoo would appear to be the better search engine for the query set used.
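A minimal sketch of the paired-test procedure described above follows, assuming the per-query scores for two engines are held in two equal-length lists and that SciPy is available (the paper itself only mentions spreadsheet software and downloadable tools, so the library choice and the function names here are assumptions). Following the rule used in this paper, a difference is only accepted if both the paired t-test and the Wilcoxon signed-rank test return p-values below 0.05.

```python
from scipy.stats import ttest_rel, wilcoxon

def compare_engines(scores_a, scores_b, alpha=0.05):
    """scores_a, scores_b: per-query P@10 (or another measure) for two engines,
    in the same query order. Returns both p-values and the combined verdict."""
    t_p = ttest_rel(scores_a, scores_b).pvalue
    w_p = wilcoxon(scores_a, scores_b).pvalue
    # Accept a real difference only if both tests agree it is significant.
    return t_p, w_p, (t_p < alpha and w_p < alpha)

# Example usage with hypothetical per-query score lists:
# t_p, w_p, significant = compare_engines(engine_a_p10, engine_b_p10)
```

Note that wilcoxon will raise an error if the two score lists are identical for every query, so in that degenerate case no test is needed.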

Table VI. Percentage improvement for best result: precision results

Measure             Google versus AltaVista   Google versus Yahoo   Google versus Lycos   AltaVista versus Yahoo   AltaVista versus Lycos   Yahoo versus Lycos
P@5                 51.4                      33.3                  130.4                 13.6                     52.2                     72.8
P@10                50.8                      37.9                  141.3                 9.4                      60.0                     75.0
Average precision   63.1                      52.4                  212.9                 7.0                      91.9                     105.4

5.4 Significance testing on diagnostic measures
Table VII declares the results of significance tests on the diagnostic measures from Table IV.

Table VII. Significance tests on diagnostic results (figures in italics are not statistically significant)

                Google versus AltaVista   Google versus Yahoo   Google versus Lycos   AltaVista versus Yahoo   AltaVista versus Lycos   Yahoo versus Lycos
Measure         t-test    WLC             t-test    WLC         t-test    WLC         t-test    WLC             t-test    WLC             t-test    WLC
Duplicates      0.003     0.002           0.010     0.020       0.002     0.005       0.344     0.294           0.000     0.000           0.000     0.000
Link broken     0.132     0.167           0.242     0.353       0.011     0.009       0.008     0.013           0.058     0.074           0.001     0.001
Not retrieved   0.322     1.00            0.001     0.000       0.101     0.250       0.000     0.000           0.060     0.125           0.003     0.002

As with the precision measures, there is complete agreement between the two statistical tests as to which pairwise comparisons are significant. This is very encouraging indeed and gives us yet more confidence in any statements we may make on statistical significance, with respect to all the measures. The tests also agree for most of the comparisons on whether the difference is significant or highly significant, apart from the Link Broken measure on Google versus Lycos and on AltaVista versus Yahoo!. With respect to the statements made in section 5.2 with regard to diagnostic measures and their impact on precision, it is clear that, for the most part, the differences appear to be statistically significant for most of the worst performing search engines when completing pairwise comparisons. Yahoo!’s “not retrieved” and Lycos’s “duplicates” figures, when compared to the other search engines’ results, can be regarded as being very significantly different. The statement made about the reason for Yahoo!’s depressed precision against Google because of the “not retrieved” measure is validated by these test results. However, when comparing Lycos against AltaVista on both the “link broken” and “not retrieved” measures, we find that there is no statistical evidence of difference between the two search engines. The statistical significance recorded on the precision measures between these two engines must therefore be down largely to the poor performance of Lycos on the “duplicates” measure. We can draw a few other conclusions about differences in retrieval effectiveness from many other pairwise comparisons, which allow us to show which diagnostic measures are most likely to have an effect on precision. For example, with Google and AltaVista, “duplicates” appears to be the most likely reason for the difference in retrieval effectiveness. Only on one occasion (Yahoo! versus Lycos) do all three diagnostic measures appear to have an effect. It can be seen from Table VIII that using percentage improvements is a completely inappropriate method for distinguishing between search engine performance on diagnostic measures. A good example of this is the result from AltaVista on the “not retrieved” measure: as this was zero, any comparison between AltaVista and other search engines on this measure is rendered meaningless. Further evidence (if needed) is provided by the comparison between Google and Lycos on the “not retrieved” measure: an increase of 170 per cent is recorded from Google to Lycos, but both the t-test and Wilcoxon measures agree that the difference is not statistically significant. One of the main reasons for this behaviour is that diagnostic measures are not normalised like precision measures (between 0 and 1) and are therefore more sensitive to any increase.

Table VIII. Percentage improvement for best result: diagnostic results

Measure         Google versus AltaVista   Google versus Yahoo   Google versus Lycos   AltaVista versus Yahoo   AltaVista versus Lycos   Yahoo versus Lycos
Duplicates      65.9                      86.4                  146.3                 57.1                     621.4                    359.1
Link broken     30.6                      47.1                  124.0                 111.8                    55.6                     229.4
Not retrieved   Inf.                      1,010.0               170.0                 Inf.                     Inf.                     311.1

6. Conclusion
The evaluation methodology presented in this paper is a practical (if labour-intensive) mechanism for evaluation, which has been successfully used for teaching purposes at City University for the past six years. The source of this methodology was the need of a commercial organisation, which required an evaluation of search engine technology – this inspired the author to develop the methodology. The author found the method very useful when he applied it, and information science students at City University London have had the same experience in their working environments, having learned the method in their information retrieval module. We therefore believe that information practitioners will find this method a useful way of evaluating web search engines for the searches they conduct on behalf of their users. The advantage of this methodology is that it builds on a significant amount of work by the academic community, and it gives the evaluator much more information on why search engines do not do so well on

average using evidence provided by the diagnostic measures. The example evaluation in section 5 demonstrates this clearly, where the impact of diagnostic measures on precision is shown to be significant in many cases. Further work from this study would include measuring the direct impact of diagnostic results on precision for a single search engine using some form of statistical analysis (as opposed to the pairwise comparison method used in this paper).

References

Aitchison, J. and Cleverdon, C. (1963), “A report on a test of the index of metallurgical literature of Western Reserve University”, The College of Aeronautics, Cranfield.

Belkin, N., Oddy, R. and Brooks, H. (1982), “ASK for information retrieval: part 1. Background and theory”, Journal of Documentation, Vol. 38 No. 2 (reprinted in Sparck Jones, K. and Willett, P. (1997), Readings in Information Retrieval, Morgan Kaufmann, San Francisco, CA, pp. 299-304).

Broder, A. (2002), “A taxonomy of web search”, SIGIR Forum, Vol. 36 No. 2, pp. 3-10.

Chowdhury, A. and Soboroff, I. (2002), “Automatic evaluation of World Wide Web search services”, in Beaulieu, M., Baeza-Yates, R., Myaeng, S. and Jarvelin, K. (Eds), Proceedings of the Twenty-fifth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2002), Tampere, pp. 421-2.

Clarke, C., Craswell, N. and Soboroff, I. (2005), “Overview of the TREC 2004 Terabyte Track”, in Voorhees, E. and Buckland, L. (Eds), NIST Special Publication 500-261: The Eleventh Text REtrieval Conference, available at: http://trec.nist.gov/pubs/trec13/t13_proceedings.html

Craswell, N. and Hawking, D. (2005), “Overview of the TREC 2004 Web Track”, in Voorhees, E. and Buckland, L. (Eds), NIST Special Publication 500-261: The Eleventh Text REtrieval Conference, available at: http://trec.nist.gov/pubs/trec13/t13_proceedings.html (accessed 8 March 2007).

Gordon, M. and Pathak, P. (1999), “Finding information on the World Wide Web: the retrieval effectiveness of search engines”, Information Processing and Management, Vol. 35 No. 2, pp. 141-80.

Hawking, D. (2001), “Overview of the TREC-9 Web Track”, in Voorhees, E. and Harman, D. (Eds), NIST Special Publication 500-249: The Ninth Text REtrieval Conference, available at: http://trec.nist.gov/pubs/trec9/t9_proceedings.html

Hawking, D. and Craswell, N. (2002), “Overview of the TREC 2001 Web Track”, in Voorhees, E. and Harman, D. (Eds), NIST Special Publication 500-250: The Tenth Text REtrieval Conference, available at: http://trec.nist.gov/pubs/trec10/t10_proceedings.html

Hawking, D. and Craswell, N. (2003), “Overview of the TREC 2002 Web Track”, in Voorhees, E. and Buckland, L. (Eds), NIST Special Publication 500-251: The Eleventh Text REtrieval Conference, pp. 78-92, available at: http://trec.nist.gov/pubs/trec11/t11_proceedings.html

Hawking, D. and Thistlewaite, P. (1998), “Overview of the TREC-6 Very Large Collection Track”, in Voorhees, E. and Harman, D. (Eds), NIST Special Publication 500-242: The Sixth Text REtrieval Conference, pp. 93-106, available at: http://trec.nist.gov/pubs/trec6/t6_proceedings.html

Hawking, D., Craswell, N. and Thistlewaite, P. (1999), “Overview of the TREC-7 Very Large Collection Track”, in Voorhees, E. and Harman, D. (Eds), NIST Special Publication 500-242: The Seventh Text REtrieval Conference, pp. 91-104, available at: http://trec.nist.gov/pubs/trec7/t7_proceedings.html


Hawking, D., Voorhees, E. and Craswell, N. (2000), “Overview of the TREC-8 Web Track”, in Voorhees, E. and Harman, D. (Eds), NIST Special Publication 500-246: The Eighth Text REtrieval Conference, pp. 131-50, available at: http://trec.nist.gov/pubs/trec8/t8_proceedings.html

Hawking, D., Craswell, N., Bailey, P. and Griffiths, K. (2001), “Measuring search engine quality”, Information Retrieval, Vol. 4 No. 1, pp. 33-59.

Hawking, D., Craswell, N., Wilkinson, R. and Wu, M. (2004), “Overview of the TREC 2003 Web Track”, in Voorhees, E. and Buckland, L. (Eds), NIST Special Publication 500-255: The Twelfth Text REtrieval Conference, pp. 78-94, available at: http://trec.nist.gov/pubs/trec12/t12_proceedings.html

Henzinger, M., Motwani, R. and Silverstein, C. (2002), “Challenges in web search engines”, SIGIR Forum, Vol. 36 No. 2, pp. 11-22.

Hull, D. (1993), “Using statistical testing in the evaluation of retrieval experiments”, in Korfage, R., Rasmussen, E. and Willett, P. (Eds), Proceedings of the 16th Annual ACM Conference on Research and Development in Information Retrieval, SIGIR’93, Pittsburgh, PA, pp. 329-38.

Jansen, B., Spink, A., Bateman, J. and Saracevic, T. (1998), “Real life information retrieval: a study of user queries on the Web”, SIGIR Forum, Vol. 32 No. 1, pp. 5-17.

Leighton, H.V. and Srivastava, J. (1999), “First 20 precision among World Wide Web search services (search engines)”, Journal of the American Society for Information Science, Vol. 50 No. 10, pp. 870-81.

Oppenheim, C., Morris, A., McKnight, C. and Lowley, S. (2000), “The evaluation of WWW search engines”, Journal of Documentation, Vol. 56 No. 2, pp. 190-211.

Rowntree, D. (1981), Statistics Without Tears: An Introduction for Non-Mathematicians, Penguin, London.

Sanderson, M. and Zobel, J. (2005), “Information retrieval systems evaluation: effort, sensitivity and reliability”, Proceedings of the 28th Annual International ACM Conference on Research and Development in Information Retrieval, SIGIR 2005, Salvador, pp. 162-9.

Silverstein, C., Henzinger, M., Marais, H. and Moricz, M. (1999), “Analysis of a very large web search engine query log”, SIGIR Forum, Vol. 33 No. 1, pp. 6-12.

Van Rijsbergen, C. (1979), Information Retrieval, 2nd ed., Butterworths, London, available at: www.dcs.gla.ac.uk/Keith/Preface.html (accessed 7 March 2007).

Vaughan, L. (2004), “New measurements for search engine evaluation proposed and tested”, Information Processing and Management, Vol. 40 No. 4, pp. 677-91.

Voorhees, E. and Harman, D. (2000), “Overview of the Eighth Text REtrieval Conference”, in Voorhees, E. and Harman, D. (Eds), NIST Special Publication 500-246: The Eighth Text REtrieval Conference, pp. 1-24, available at: http://trec.nist.gov/pubs/trec8/t8_proceedings.htm

Wu, G. and Li, J. (1999), “Comparing web search performance in searching consumer health information: evaluation and recommendations”, Bulletin of the Medical Library Association, Vol. 87 No. 4, pp. 456-61.

Zeldman, J. (2001), Taking Your Talent to the Web: Making the Transition from Graphic Design to Web Design, Waite Group Press, Boston, MA.

Appendix. List of queries used for evaluation
(1) sade adu biography
(2) middle east crisis
(3) parallel computing
(4) information retrieval
(5) karl popper philosophy science
(6) scramble africa
(7) origins second world war
(8) urbanwear streetwear urbanclothing hiphop clothing
(9) flower arranging
(10) mountain climbing safety equipment
(11) loft insulation
(12) body building
(13) arvo part compositions
(14) norman conquest
(15) meiji restoration japan
(16) atomic clock accuracy
(17) curry
(18) led zeppelin
(19) levi jeans
(20) bookcase suppliers
(21) tour operators spain
(22) fiction novel the silver city
(23) bombay street children
(24) lou reed interview
(25) soprano singing
(26) gene therapy
(27) martin scorsese
(28) submersible pump manufacturer germany
(29) door fittings
(30) currency conversion
(31) serbian mafia
(32) investments software
(33) bodyguard training tuition
(34) pictures Linda Lusardi
(35) research forest pathology
(36) bonsai styles
(37) world social security rates
(38) land rover defender
(39) air fares germany britain
(40) telemetry alarm system
(41) nokia phone
(42) le surete french police
(43) restaurants kids central london
(44) autonomy
(45) microwave ovens
(46) engineer jobs uk
(47) soorento italy images
(48) festival diwali
(49) woodpigeon shooting
(50) sex

Corresponding author A. MacFarlane can be contacted at: [email protected]


Parallel methods for the update of partitioned inverted files


A. MacFarlane School of Informatics, City University London, London, UK


J.A. McCann Department of Computing, Imperial College, London, UK, and

Received 14 December 2006 Accepted 29 May 2007

S.E. Robertson
Microsoft Research Ltd, Cambridge, UK

Abstract
Purpose – An issue that tends to be ignored in information retrieval is the issue of updating inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. This paper aims to study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two types of partitioning for inverted files: document identifier and term identifier.
Design/methodology/approach – Raw update service and update with query service are studied with these partitioning schemes using an incremental update strategy. The paper uses standard measures used in parallel computing, such as speedup, to examine the computing results and also the costs of reorganising indexes while servicing transactions.
Findings – Empirical results show that for both transaction processing and index reorganisation the document identifier method is superior. However, there is evidence that the term identifier partitioning method could be useful in a concurrent transaction processing context.
Practical implications – There is an increasing need to service updates, which is now becoming a requirement of inverted files (for dynamic collections such as the web), demonstrating that a shift in the requirements of inverted file maintenance is needed from the past.
Originality/value – The paper is of value to database administrators who manage large-scale and dynamic text collections, and who need to use parallel computing to implement their text retrieval services.
Keywords Information retrieval, Parallel programming, Query languages, Databases
Paper type Research paper

This research is supported by the British Academy under grant number IS96/4203. The authors are also grateful to ACSys for awarding the first author a visiting student fellowship at the Australian National University in order to complete this research, and for the use of their equipment. They are particularly grateful to David Hawking for making the arrangements for the visit to the ANU.

Aslib Proceedings: New Information Perspectives, Vol. 59 No. 4/5, 2007, pp. 367-396, © Emerald Group Publishing Limited, 0001-253X, DOI 10.1108/00012530710817582

1. Introduction
One of the most neglected areas in information retrieval is the issue of servicing updates to inverted files. In most applications this is understandable, given that some databases will not be updated very frequently: for example, Dialog and DataStar have databases which are updated weekly, monthly, quarterly or even yearly (Thomson Dialog, 2005). However, some applications such as web search engines or news services like Reuters could have updates arriving 24 hours a day, and there is no time when the

system could be taken down and the updates serviced in a batch. Updating inverted files is very expensive and periodically requires the re-indexing of the whole database. It is therefore becoming increasingly important to examine the impact of update and query services on inverted files.

In this paper we describe the update mechanism for a parallel text retrieval system, PLIERS, on two different types of partitioning methods:
(1) term identifier partitioning (TermId), where each term, with all its associated postings, is assigned to a single fragment (and therefore information about any document is distributed among fragments); and
(2) document identifier partitioning (DocId), where the reverse is the case.

2. Experimental methodology
Much of the previous work in the area of inverted file maintenance (Reddaway, 1991; Shoens et al., 1994; Clarke and Cormack, 1995; Brown et al., 1994) has advocated the use of buffering updates to save on input/output. Some argue that to update the index for each individual arriving document is inefficient (Shoens et al., 1994), but use a synthetic workload performance analysis to support their arguments. We attempt to simulate a persistent service for updates without coding a complete transaction service, accepting that it is better to wait a little before updating an index. To do this we keep an in-core buffer to which updates are added when they are received. When this buffer is full, we initiate an index reorganisation, merging the in-core update index with the index kept on disk. In order to do this we use the following strategy (a sketch of this merge is given at the end of this section):
(1) read in the inverted list from disk;
(2) add the new postings to the inverted list; and
(3) save the new postings to a temporary postings file on disk.
As we are unlikely to be able to keep the dictionary in-core, we keep a subset of the keywords in memory, with each element of the subset a header of a keyword block held on disk. All hit keyword blocks are saved to a temporary keyword file for realism. The advantage of this method is that we can do a realistic disk reorganisation simulation without the need for expensive rollbacks in order to conduct repeated experiments on the same data set. We do not attempt to reorganise the whole index, as we assume that a large chunk of the database will never be referenced by incoming updates. The transactions we refer to as updates are collection updates or document insertions: we do not address the issue of document removals or document changes. Our assumption is that text collections are in the main archival. Our priority is to try to keep the index in a state that would allow us to service fast query processing. We do, however, allow the service of transactions while the reorganisation of the index is being done: there is a strict interleaving between the reorganisation of a term and transaction service to prevent concurrency problems. There may therefore be some delays to transactions while a reorganisation of the index is going on. A number of issues have not been addressed, such as simultaneous update and query processing together with concurrency control, due to time constraints. However, we recognise the importance of those issues and deal with them theoretically elsewhere (MacFarlane et al., 1996). The document availability semantics we use is Late Availability, which is defined by MacFarlane et al. (1996). A survey of strategies for updating inverted files is available in Zobel and Moffat (2006).
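The following is a minimal sketch of the buffered update strategy described above, not the PLIERS implementation itself: incoming documents accumulate postings in an in-core buffer, and when the buffer reaches its limit each term hit by the buffer has its on-disk inverted list read, extended with the buffered postings and written to a temporary postings file. All names (InCoreBuffer, read_inverted_list, write_temp_list) and the on-disk layout are illustrative assumptions.

```python
from collections import defaultdict

class InCoreBuffer:
    """Buffers postings for newly inserted documents until a reorganisation is triggered."""

    def __init__(self, limit_docs=500):
        # limit_docs is the buffer size in documents; the experiments below use 40-500.
        self.postings = defaultdict(list)   # term -> [(doc_id, term_frequency), ...]
        self.doc_count = 0
        self.limit_docs = limit_docs

    def add_document(self, doc_id, term_freqs):
        for term, tf in term_freqs.items():
            self.postings[term].append((doc_id, tf))
        self.doc_count += 1
        return self.doc_count >= self.limit_docs   # caller initiates a reorganisation if True

def reorganise(buffer, read_inverted_list, write_temp_list):
    """Merge buffered postings into the on-disk index, one term at a time.
    read_inverted_list(term) and write_temp_list(term, postings) stand in for the
    real storage layer; only terms hit by the buffer are touched."""
    for term, new_postings in buffer.postings.items():
        on_disk = read_inverted_list(term)          # step (1): read in the inverted list
        merged = on_disk + sorted(new_postings)     # step (2): append the new postings
        write_temp_list(term, merged)               # step (3): save to the temporary postings file
    buffer.postings.clear()
    buffer.doc_count = 0
```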

3. Transaction topologies
Given that we want to service search and updates simultaneously, the transaction topology cannot differ too much from the search topology described elsewhere (MacFarlane et al., 2000). We therefore define top and leaf nodes that can handle both search operations and the update operations implemented. We describe below the additional functionality needed by the nodes to support these update operations. Figure 1 shows an example of transaction topology with the service of updates.


Figure 1. Example of transaction topology configuration

3.1 Top node
The top node, being the interface to the topology, accepts new documents, breaks them down into their constituent words, and sends the index information to the relevant inverted file fragment. The main issue here is that the top node must know which type of partitioning method is being used in order to send data to the fragments accordingly. For example, in TermId partitioning a bucket of words will be formed for each fragment of the inverted file; these can be sent directly to the fragments. However, with DocId partitioning, a decision must be made as to how new documents are allocated to the fragments. We make the assumption that, over a given period of time, incoming documents that are distributed in a “round robin” fashion will give each fragment roughly the same amount of data, although it is unlikely to be evenly distributed. We therefore assign new documents to fragments using a “round robin” distribution method when document identifier partitioning is used. For both types of partitioning method a confirmation of update completion must be received before a commit notice is sent to the client.
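A minimal sketch of the top node's routing decision under the two partitioning schemes follows. It illustrates the behaviour described in section 3.1 rather than the PLIERS code; in particular, the simple hash used for TermId is an assumption standing in for the term-allocation heuristic the system actually relies on.

```python
def route_postings(doc_id, term_freqs, n_fragments, partitioning, rr_state):
    """Decide which inverted file fragment(s) receive the postings for one new document.
    term_freqs: dict mapping term -> within-document frequency.
    rr_state: single-element list holding the round robin counter for DocId.
    Returns a dict fragment_id -> {term: tf} bucket to send to that leaf node."""
    buckets = {}
    if partitioning == "DocId":
        # The whole document goes to one fragment, chosen round robin (section 3.1).
        frag = rr_state[0] % n_fragments
        rr_state[0] += 1
        buckets[frag] = dict(term_freqs)
    elif partitioning == "TermId":
        # One bucket of words per fragment; a simple hash stands in for the
        # term-allocation heuristic used in practice.
        for term, tf in term_freqs.items():
            frag = hash(term) % n_fragments
            buckets.setdefault(frag, {})[term] = tf
    return buckets
```

Under DocId only one leaf node is contacted per document, whereas under TermId every leaf may receive data, which is one reason for the extra communication cost noted later in section 6.1.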

3.2 Leaf node
The leaf node receives index data and merges it with the fragment index data handled by that particular leaf node. This sequential process is identical to that described in another paper by MacFarlane et al. (2005). Some collection statistics and document data must be shared amongst leaf nodes (e.g. collection size and document length). Each leaf has a document map structure that records such information. When a search transaction is received by the leaf, both the index and the in-core buffer are searched. If a reorganisation of the index is initiated, new updates are added to a separate temporary buffer: this is searched as well if new queries are received by the leaf. When the reorganisation is complete this temporary buffer becomes the main buffer.

4. Software and hardware used
PLIERS (Parallel Information Retrieval Research System) has been developed at City University using ideas from Okapi to investigate the use of parallelism in IR. PLIERS is designed to run on several parallel architectures and is currently implemented on those which use Sun Sparc, DEC Alpha and Pentium PII processors. The results presented in this paper were obtained on eight nodes of a 12-node AP3000 at the Australian National University, Canberra. Each node has its own local disk: the shared nothing architecture (DeWitt and Gray, 1992) is used by PLIERS. The Fujitsu AP3000 is a distributed memory parallel computer using Ultra 1 processors running Solaris 2.5.1. Each node has a speed of 167 MHz. The Torus network has a top bandwidth of 200 Mbytes per second.

5. Data and settings used
The data used in the experiments was the BASE1 and BASE10 collections, both sub-sets of the official 100 Gigabyte VLC2 collection (Hawking et al., 1999). BASE1 is 1 Gigabyte in size, while BASE10 is approximately 10 Gigabytes in size. We use two types of builds for indexes:
(1) distributed builds, where text is kept centrally and distributed to index nodes; and
(2) local builds, where text is physically distributed to nodes and indexed locally (MacFarlane et al., 2005).
For the distributed build method we use the BASE1 collection only, creating indexes on one to seven processors and servicing transactions on all of those indexes. Two types of index were built for these experiments, one set using TermId partitioning and one using DocId partitioning. The BASE1 and BASE10 collections were used for the local build method, running queries on eight nodes: the client and top node had to be placed on the same node as one leaf. The DocId partitioning method is used in these experiments. We built one set of indexes which contained position data and one set without position data, for both types of build methods and both types of partitioning methods (runs with position data are marked “position data”, while those without such data are marked “postings only”).

Table I shows the transaction sets used in our experiments.

Table I. Details of transaction sets used in experiments

Transaction set name   No. of updates   No. of queries   No. of transactions
UPDATE1                40               400              440
UPDATE2                80               400              480
UPDATE3                200              400              600
UPDATE4                400              400              800
UPDATE                 500              400              900

The queries are based on topics 1 to 450 of the TREC1 to TREC8 ad hoc tracks: 400 queries in all (the topics 201-250 in TREC4 did not have a title only field in the topics). The terms were extracted from TREC topic descriptions using an Okapi query generator utility to produce the final query. The average number of terms per query is 3.46. The document updates were

chosen from a Reuters-22173 collection (Lewis, 2006) not in the VLC2 set: we refer to this file as REUTERS. We chose this set because we can guarantee that the data is new to the VLC2 set. The REUTERS file is 1.2 Mb in size and has 1,000 records. We took both these sets and created transaction sets with differing numbers of updates and queries, varying the ratio of updates to queries. We do this at a number of rates ranging from ten queries per one update down to one query per 1.25 updates. We also examine update-only and query-only service. This allows us both to examine the interaction between updates and queries and to find a good point at which buffer reorganisation is needed. We apply these transactions to all the indexes built (described above), both in the presence and absence of an index reorganisation. All figures produced are averages of five runs per experiment. For the one-leaf experiments we use a client/server process. We record the raw index reorganisation speed to establish the best point to initiate it. We use a number of measures to examine the results. These are elapsed time in seconds, load imbalance (LI), speedup and scalability for transactions and index reorganisation, and transaction throughput (transactions per hour). Equations for most of these metrics are shown in the Appendix.
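The paper's Appendix (not reproduced in this excerpt) gives the exact equations; for orientation, the standard parallel-computing forms of these metrics, which the measures used here are assumed to follow in broad outline, are:

\[
\mathrm{Speedup}(n) = \frac{T_{1}}{T_{n}}, \qquad
\mathrm{LI} = \frac{\max_{i} w_{i}}{\bar{w}}, \qquad
\mathrm{Throughput} = \frac{3600\,N_{\mathrm{trans}}}{T_{\mathrm{elapsed}}},
\]
\[
\mathrm{Scaleup}(n) = \frac{T_{\mathrm{small}}(1)}{T_{\mathrm{large}}(n)} \quad \text{(problem size grown in proportion to } n\text{)},
\]

where \(T_{n}\) is the elapsed time on \(n\) leaf nodes, \(w_{i}\) is the work (elapsed time) on leaf \(i\) and \(\bar{w}\) its mean over leaves, \(N_{\mathrm{trans}}\) is the number of transactions and \(T_{\mathrm{elapsed}}\) the elapsed time in seconds.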


6. Experimental results on transaction processing
We have a number of aspects that we wish to examine by looking at the empirical results produced. The first of these is the issue of update performance (see section 6.1). Is there a big performance penalty in only allowing one update at a time in the system? We also need to examine the transactions as a whole, looking at aspects such as the interaction between queries and updates and its impact on performance (see section 6.2). Both updates and transactions are examined in the presence and absence of index reorganisation. The performance of index reorganisation is examined in section 6.3, together with a discussion of a good buffer size for the collections being examined. A summary of the experimental results is given in section 6.4.

6.1 Performance of update transactions
As there is no general criterion for response time for update transactions as there is for query transactions (Frakes, 1992), we need to define one here. The criterion we use is that updates should be done within one tenth of a second (or 100 milliseconds). This strict criterion is chosen because we want to ensure that queries are not delayed much, although users who submit documents for update would prefer a fast response. The elapsed time for update transactions is quite small for most runs (see Figures 2-5). All

Figure 2. BASE1 [DocId]: average elapsed time in ms for update transactions (postings only)

Figure 3. BASE1 [TermId]: average elapsed time in ms for update transactions (postings only)

Figure 4. BASE1 [DocId]: average elapsed time in ms for update transactions (position data)

Figure 5. BASE1 [TermId]: average elapsed time in ms for update transactions (position data)

times are under 100 milliseconds and times do reduce with increasing numbers of leaf nodes. There are two main observations from this. The first is that update transaction elapsed times meet our criterion and are therefore acceptable in our terms. Any delays by blocking other transactions while an update is done are therefore small. The second is that speedup is found in systems using parallelism, which is surprising given the restrictions on parallelism with the type of update transaction processing implemented (see Figures 6-9). The results show that DocId partitioning has a much more beneficial effect on elapsed times than TermId partitioning and the advantage in elapsed time using multiple leaf nodes is superior with DocId. The reasons for these effects are two-fold: (1) memory; and (2) communication.


With DocId the increase in memory affects elapsed time positively, and communication is done with one leaf node only. This memory advantage is offset with extra communication with TermId as document data must be communicated to all leaf nodes. It should be noted that most of the conclusions drawn here apply to updates

Figure 6. BASE1 [DocId]: speedup for update transactions (postings only)

Figure 7. BASE1 [TermId]: speedup for update transactions (postings only)


Figure 8. BASE1 [DocId]: speedup for update transactions (position data)

Figure 9. BASE1 [TermId]: speedup for update transactions (position data)

which record position data. The exception is that TermId partitioning in many cases, together with the single leaf node run, does not meet the 100 millisecond criterion (see Figure 5). Figures 10-13 show the effect of initiating an index reorganisation while serving update transactions. Elapsed times on both types of partitioning method are increased, but DocId partitioning is much better able to handle the resource contention than TermId. In terms of our 100 millisecond criterion, DocId meets our requirement while TermId partitioning does not. While DocId runs show a reduction in elapsed time over multiple leaf nodes, TermId runs actually record a reduction in performance. The reason for this is simple: index reorganisation on a DocId partitioned inverted file is done on much shorter lists. Therefore a request for transaction service on TermId partitioning is more likely to be delayed, hence the increase in percentage terms for elapsed time over DocId as shown in Figures 14-17. With respect to indexes that contain position data, most runs, apart from a few on DocId partitioning, exceed the 100 millisecond criterion. TermId partitioning runs are particularly badly affected with

Figure 10. BASE1 [DocId]: average elapsed time in ms for update transactions during index reorganisation (postings only)

Figure 11. BASE1 [TermId]: average elapsed time in ms for update transactions during index reorganisation (postings only)

Figure 12. BASE1 [DocId]: average elapsed time in ms for update transactions during index reorganisation (position data)

Figure 13. BASE1 [TermId]: average elapsed time in ms for update transactions during index reorganisation (position data)

Figure 14. BASE1 [DocId]: percentage increase in average elapsed time for update transactions during index reorganisation (postings only)

Figure 15. BASE1 [TermId]: percentage increase in average elapsed time for update transactions during index reorganisation (postings only)

Figure 16. BASE1 [DocId]: percentage increase in average elapsed time for update transactions during index reorganisation (position data)

Figure 17. BASE1 [TermId]: percentage increase in average elapsed time for update transactions during index reorganisation (position data)

some runs registering an increase of around 300 per cent over elapsed times when index reorganisation is done. Table II shows the details of comparable BASE1 and BASE10 runs using the DocId partitioning method. It should be noted that BASE10 runs are slightly higher than our criterion for elapsed times for update. It may not therefore be possible to set such a strict criterion for larger databases, and we may have to relax our requirements to, say, a second. All BASE10 elapsed times are under a second, even updates done on indexes with position data while an index reorganisation is being done. The scalability for update transactions on the BASE10 collection is very good indeed, particularly for indexes with postings only data. The scalability reduces while index reorganisation is being done, but is still good.

Table II. BASE1/BASE10 [DocId]: index update results for update transactions

Metric                                   Collection   UP     UP1    UP2    UP3    UP4
Postings only (no positions)
Elapsed time (ms)                        BASE1        43     43     46     40     43
Elapsed time (ms)                        BASE10       109    124    124    121    123
Scalability                              BASE10       3.97   3.46   3.72   3.33   3.51
Elapsed time (ms) during index update    BASE1        55     64     62     60     57
Elapsed time (ms) during index update    BASE10       268    380    359    310    299
Scalability during index update          BASE10       2.07   1.67   1.73   1.95   1.91
Position data
Elapsed time (ms)                        BASE1        52     48     51     48     51
Elapsed time (ms)                        BASE10       202    265    261    243    246
Scalability                              BASE10       2.55   1.83   1.97   1.98   2.06
Elapsed time (ms) during index update    BASE1        103    130    131    125    115
Elapsed time (ms) during index update    BASE10       621    971    975    892    817
Scalability during index update          BASE10       1.66   1.34   1.34   1.40   1.40

Figure 18. BASE1 [DocId]: transaction average elapsed times in ms (postings only)
Figure 19. BASE1 [TermId]: transaction average elapsed times in ms (postings only)
Figure 20. BASE1 [DocId]: transaction average elapsed times in ms (position data)
Figure 21. BASE1 [TermId]: transaction average elapsed times in ms (position data)

6.2 Performance of transactions as a whole
The average elapsed time for transactions as a whole is very good, with all times under a second, including the BASE10 experiments. Figures 18-21 show average elapsed times

for transactions on the BASE1 collection using all types of indexes and partitioning methods. From these elapsed times it can be seen that there is a reduction in average time when the number of update transactions is increased and when DocId partitioning is used. The reduction due to an increased level of updates is because updates are smaller in average time and will reduce the average transaction time. The DocId partitioning method outperforms TermId quite considerably on any of the transaction sets used. The performance problem found with runs on TermId partitioning in previous experiments (MacFarlane et al., 2000) severely affects the overall performance of those runs. No real speed advantage by the use of parallelism is demonstrated in any of the TermId partitioning experiments (Figures 22-25). In fact slowdown is registered for all parallel runs on indexes with postings only data (see Figure 23). Speed advantage on indexes containing position data is recorded, but is very slight (see Figure 25). With DocId partitioning we do gain speed advantage using parallelism (see Figures 22 and 24), but the proportion of updates in the transaction set may actually increase the average elapsed time when more leaf nodes are used (see Figures 18 and 20). The level


Figure 22. BASE1 [DocId]: speedup for all transactions (postings only)

Figure 23. BASE1 [TermId]: speedup for all transactions (postings only)

Figure 24. BASE1 [DocId]: speedup for all transactions (position data)


Figure 25. BASE1 [TermId]: speedup for all transactions (position data)

of parallelism that can be successfully deployed depends on the balance in time between updates and queries, at the point where gain in parallelism is outweighed by loss in servicing updates. Figures 26-29 show the effect of index reorganisation on transactions serviced over BASE1 collection. The results show that DocId partitioning outperforms TermId if an elapsed time criterion is used. While runs on DocId partitioning using parallelism reduce run times over the client/server runs, TermId runs actually increase in time. This evidence is consistent with the update transaction results described above. However, it is clear that DocId partitioning after a certain parallel machine size holds the run times constant, and the ability to cope with resource contention is far superior to that of TermId. There is some doubt as to the wisdom of deploying parallelism after a given point, but other factors such as the total time for an index reorganisation are important. Our choice of either parallelism or the actual level of parallelism will depend on the balance between normal transaction processing and transaction processing during an index update. A further interesting observation is that transaction sets with more update transactions are less affected by resource contention than others with more query transactions, which is particularly noticeable in TermId results (see

Figure 26. BASE1 [DocId]: average elapsed time in ms for all transactions during index reorganisation (postings only)

Figure 27. BASE1 [TermId]: average elapsed time in ms for all transactions during index reorganisation (postings only)

Figure 28. BASE1 [DocId]: average elapsed time in ms for all transactions during index reorganisation (position data)

Figure 29. BASE1 [TermId]: average elapsed time in ms for all transactions during index reorganisation (position data)

Figures 27 and 29). The reason for this is that update transactions are faster than query transactions and are therefore much less affected when the index is being updated. What effect do these results have on throughput? In Figures 30-34 the throughput figures are declared, with the data separated into transaction sets. The suffix “ro” in the diagrams signifies that the run was done in the presence of an index reorganisation. The throughput measure is thousands of transactions per hour. The main conclusion from these throughput results is that DocId partitioning outperforms TermId using any type of index (as would be expected from the elapsed time data). Using this measure demonstrates how disappointing the performance of TermId actually is: throughput is not improved by the addition of extra leaf nodes. Many runs are limited to a throughput of 20k transactions per hour. The best performing index type/partitioning pair is DocId with postings only indexes on any of the transaction sets. It can be seen in the diagrams through DocId with postings only data that the transaction set has an impact on trends in throughput. For example, on the UPDATE1 set there is a clear increase in throughput for increasing numbers of leaf


Figure 30. BASE1: combined transactions throughput for UPDATE1 transaction set

Figure 31. BASE1: combined transactions throughput for UPDATE2 transaction set

Figure 32. BASE1: combined transactions throughput for UPDATE3 transaction set

Figure 33. BASE1: combined transactions throughput for UPDATE4 transaction set

Figure 34. BASE1: combined transactions throughput for UPDATE transaction set

nodes, while throughput on the UPDATE set shows a clear tailing off effect with larger numbers of leaf nodes (see Figures 30 and 34). Throughput on the index type/partitioning method relative to each other is consistent irrespective of the transaction set under scrutiny. How does load imbalance affect the results given above? Load imbalance does not appear to be a significant problem: Figures 35-38 show the overall level of load imbalance for all transactions. It can be seen that imbalance is higher in DocId than it is in TermId, for both types of indexes. The imbalance figures for all results are relatively small, but clearly there is an increase in imbalance with increasing parallel machine size on DocId, while imbalance on TermId remains fairly constant. The key result here is that document updates do not harm overall load imbalance significantly. The “round robin” method of distributing document updates to nodes when DocId partitioning is used is a reasonable method. The results also show that it may be possible to offer better concurrent transaction service on TermId partitioning than DocId partitioning


Figure 35. BASE1 [DocId]: load imbalance for all transactions (postings only)

Figure 36. BASE1 [DocId]: load imbalance for all transactions (position data)

Figure 37. BASE1 [TermId]: load imbalance for all transactions (postings only)

Figure 38. BASE1 [TermId]: load imbalance for all transactions (position data)

(this is consistent with imbalance results found in probabilistic search; MacFarlane et al., 2000). Table III shows the scalability results for all transactions. Average elapsed times for BASE10 runs during normal transaction processing are all under half a second when postings only indexes are used and under a second for position data indexes. The delays on BASE10 while indexes are updated are considerable and runs are over double, a factor particularly significant for indexes with position data. It may not be viable to use the index update method for this task, particularly if queries are delayed beyond the ten-second elapsed time recommendation (Frakes, 1992) during an index reorganisation on much larger collections. Scalability is very good and increases with the number of updates in a transaction: as would be expected since updates provide much better scalability than queries (see Table II). Elapsed times trends are inverse to that of scalability and for the same reason. It is clear that within our experimental framework the best partitioning method for transaction processing is DocId. Both the experiments discussed here and work discussed throughout this paper show that DocId partitioning provides better

Table III. BASE1/BASE10 [DocId]: index update results for all transactions

Metric                                   Collection   UP      UP1     UP2     UP3     UP4
Postings only (no positions)
Elapsed time (ms)                        BASE1        60      75      73      66      60
Elapsed time (ms)                        BASE10       257     479     440     363     280
Scalability                              BASE10       2.35    1.56    1.66    1.83    2.14
Elapsed time (ms) during index update    BASE1        80      118     112     98      82
Elapsed time (ms) during index update    BASE10       551     1,021   949     776     604
Scalability during index update          BASE10       1.46    1.15    1.18    1.26    1.36
Position data
Elapsed time (ms)                        BASE1        72      103     97      84      73
Elapsed time (ms)                        BASE10       448     891     818     660     505
Scalability                              BASE10       1.61    1.15    1.18    1.27    1.45
Elapsed time (ms) during index update    BASE1        159     265     246     205     167
Elapsed time (ms) during index update    BASE10       1,368   2,562   2,364   1,924   1,517
Scalability during index update          BASE10       1.17    1.04    1.04    1.07    1.10

performance both in normal transaction processing and when an index reorganisation is initiated for all types of transactions. However, the imbalance figures demonstrate that concurrent transaction processing might work well on TermId partitioning, a conclusion which reinforces our previous experience with search (MacFarlane et al., 2000). Scalability of transactions using DocId partitioning is good, but results demonstrate that the index update task defined here may not be a viable solution for much larger collections than the ones considered here.

6.3 Performance of index reorganisation
The results found in our index reorganisation performance confirm that it is better to wait for a given period and do the reorganisation collectively than do it on a one-document basis (Shoens et al., 1994). Figures 39-42 show the index reorganisation results using elapsed time in seconds. Above all, these figures show how expensive index reorganisations are, particularly for indexes with position data. It should be noted that these figures are much reduced

Figure 39. BASE1 [DocId]: index reorganisation elapsed time in seconds (postings only)

Figure 40. BASE1 [DocId]: index reorganisation elapsed time in seconds (position data)

Figure 41. BASE1 [TermId]: index reorganisation elapsed time in seconds (postings only)

Figure 42. BASE1 [TermId]: index reorganisation elapsed time in seconds (position data)

from a method that would require a reorganisation of the whole index. The best buffer size for this data is 500 documents: there is very little difference between reorganisations done on buffer sizes of 500 documents and 400 documents, particularly for indexes with position data. There is an increase in the elapsed time for increasing buffer size on all runs, but the increase is not linear with the number of documents in the buffer: the results on multiple leaf nodes are the same. Comparing the partitioning methods, elapsed times on DocId are better than TermId using all buffer sizes and on all multiple leaf nodes runs apart from two leaf nodes on a 500-document buffer. Speed advantage is shown in both partitioning methods by increasing the leaf nodes set in a run. Figures 43-46 show the speedup for index reorganisation on both partitioning methods. Good speed advantage is shown by any number of leaf nodes using any type of partitioning method. Super-linear speedup is shown on both partitioning methods, apart from TermId, on any run using six leaf nodes with any type of index. The run on six leaf nodes on an 80-document buffer is particularly disappointing considering the other results. We will return to this factor when discussing load imbalance below. Why does this super-linear speedup occur? If we examine the total time needed for an index reorganisation we find that all parallel runs reduce the total time for an index reorganisation. Figures 47 and 48 show the underlying reason why the super-linear speedup occurs.


Figure 43. BASE1 [DocId]: index reorganisation speedup (postings only)

Figure 44. BASE1 [DocId]: index reorganisation speedup (position data)


Figure 45. BASE1 [TermId]: index reorganisation speedup (postings only)

Figure 46. BASE1 [TermId]: index reorganisation speedup (position data)

Figure 47. BASE1 [DocId]: millions of postings handled during index reorganisation

Figure 48. BASE1 [TermId]: millions of postings handled during index reorganisation

The total number of posting records handled for DocId actually reduces with increasing numbers of leaf nodes, but TermId runs move much the same amount of data. The reason for this effect in DocId partitioning is that as the number of leaf nodes is increased, the more frequent terms which both the buffer and index shared are spread over more blocks which have fewer records associated with them. The more frequent occurring terms are interspersed among less frequent terms as more blocks are handled. This effect does not happen on TermId partitioning as much the same blocks will be handled by parallel runs of any leaf node size. There is some variation in TermId but the effect is minimal. Note that the number of postings moved for a 500-document buffer is always slightly more than those for a 400-document buffer. The total number of postings in BASE1 collection is 22.6 million: just over half the index is reorganised for just 500 documents, reducing with increasing numbers of leaf nodes. The evidence suggests that a good buffer size for this data is 500 documents. From the evidence given above there is clearly an offset between the advantage gained in DocId partitioning by increasing the number of leaf nodes and improvements in performance gained by waiting until the buffer has reached a given size. We can therefore make a case for delaying the initiation of index reorganisation on more leaf nodes until their buffers contain more documents. In this way we can take advantage of both effects discussed, i.e. less data movement on more leaf nodes and less time when an index update is being done. Figures 49-52 provide more evidence of the increased buffer effect on load imbalance. The imbalance figures for DocId partitioning show that initiating index reorganisations with a 40-document buffer does not yield good load balance. Increasing the leaf nodes set size also has a tendency to increase imbalance. The TermId partitioning method is generally more consistent, but imbalance on six leaf nodes for any buffer size is noticeably worse than for other leaf nodes. This is a failure of the distribution process, which relies on a heuristic to distribute data to leaf nodes. This has a direct and significant impact on speedup for TermId partitioning runs for six leaf nodes (see Figures 45 and 46).


Figure 49. BASE1 [DocId]: index reorganisation load imbalance (postings only)

Figure 50. BASE1 [TermId]: index reorganisation load imbalance (postings only)

Figure 51. BASE1 [DocId]: index reorganisation load imbalance (position data)


Figure 52. BASE1 [TermId]: index reorganisation load imbalance (position data)

6.4 Summary of experimental results
In all aspects of transaction processing and index reorganisation, DocId partitioning is shown to be superior to TermId partitioning. For update transactions both methods are quick when data is added to the buffer, but DocId provides better transaction performance when an index reorganisation is being executed. Many update transactions meet our 100 millisecond requirement for elapsed times for document insertions. For transactions with both updates and queries, DocId is superior largely because of the performance improvement which is obtained with that method, shown in previous experiments with probabilistic search (MacFarlane et al., 2000). The total number of records moved during index reorganisation is reduced with increasing numbers of leaf nodes when DocId partitioning is used. There is, however, an offset between the buffer size for incoming updates and increasing the leaf nodes set in order to reduce the amount of data moved. Overall our empirical results demonstrate that DocId partitioning is the preferred method for servicing the inverted file index maintenance techniques outlined in this paper. One might question the viability of the method of index update if queries are delayed beyond the ten-second response time recommended by Frakes (1992) or updates are delayed more than the 100-millisecond or one-second requirement recommended in this paper. This issue will be examined further in the conclusion.

Conclusion
The empirical results from this research show that in all aspects of both transaction processing and index reorganisation, the DocId partitioning method is far superior. Problems highlighted in our probabilistic search experiments (MacFarlane et al., 2000) impose severe restrictions on transaction processing when the TermId method is used, which are difficult to solve within our experimental context. These problems (most notably the sort aspect of search) had an impact on the relative difference between the two partitioning methods during transaction processing. In index reorganisation when using DocId partitioning, the amount of data which needs to be moved reduces with increasing leaf node set size due to the qualities of



the keyword set for each element of the leaf node set (the assumption made in the synthetic model was correct). Provided the same term block strategy is used, this effect will be a generic one. We have found evidence, however, that TermId might be useful in a concurrent transaction processing context, and this would have to be the focus of any future research.

The methods outlined in this paper for dealing with new documents may not be viable in every realistic situation: consider a scenario in which the update rate was so high that buffer space would run out, thereby crashing the system or causing a denial of service. Such a problem would occur when more updates are being submitted to the system than it can handle, so that the time to reorganise the index with these new updates is greater than the actual time available on the system. There are limits to a storage method such as the inverted file, which is designed for fast search and which is expensive to maintain: for these high-update applications some other method of transaction processing and storage is therefore required. Where our methods are not usable we would recommend the use of two-phase signature searching (Cringean et al., 1990), which allows for cheap updates but also permits a high degree of parallelism.
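The viability question raised in the final paragraph reduces to a simple rate comparison, which can be sketched as follows; the figures and function names are illustrative assumptions, not measurements from this study. Buffered maintenance is sustainable only while the index can be reorganised faster than the buffer refills:

def buffer_fill_time(buffer_size_docs, arrival_rate_docs_per_sec):
    # Time for the update buffer to fill at the given document arrival rate.
    return buffer_size_docs / arrival_rate_docs_per_sec

def is_sustainable(reorg_time_sec, buffer_size_docs, arrival_rate_docs_per_sec):
    # Sustainable only if reorganisation finishes before the buffer refills.
    return reorg_time_sec < buffer_fill_time(buffer_size_docs, arrival_rate_docs_per_sec)

# Illustrative numbers only.
print(is_sustainable(reorg_time_sec=120, buffer_size_docs=500,
                     arrival_rate_docs_per_sec=2))    # 250 s to refill: True
print(is_sustainable(reorg_time_sec=120, buffer_size_docs=500,
                     arrival_rate_docs_per_sec=10))   # 50 s to refill: False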

References
Brown, E.W., Callan, J.P., Croft, W.B. and Moss, J.E.B. (1994), “Supporting full-text information retrieval with a persistent object store”, in Jarke, M., Bubenko, J. and Jeffery, K. (Eds), Proceedings of EDBT’94, LNCS 779, Springer-Verlag, Berlin, pp. 365-77.
Clarke, C.L.A. and Cormack, G.V. (1995), “Dynamic inverted indexes for a distributed full-text retrieval system”, MultiText Project Technical Report MT-95-01, Department of Computer Science, University of Waterloo, Waterloo.
Cringean, J.K., England, R., Manson, G.A. and Willett, P. (1990), “Parallel text searching in serial files using a processor farm”, in Vidick, J.L. (Ed.), Proceedings of the 13th International Conference on Research and Development in Information Retrieval, ACM Press, New York, NY, pp. 429-53.
DeWitt, D. and Gray, J. (1992), “Parallel database systems: the future of high performance database systems”, Communications of the ACM, Vol. 35 No. 6.
Frakes, W.B. (1992), “Introduction to information storage and retrieval systems”, in Frakes, W.B. and Baeza-Yates, R. (Eds), Information Retrieval, Data Structures and Algorithms, Prentice-Hall, Englewood Cliffs, NJ, pp. 1-12.
Hawking, D., Craswell, N. and Thistlewaite, P. (1999), “Overview of TREC-7 Very Large Collection Track”, in Voorhees, E.M. and Harman, D.K. (Eds), Proceedings of the Seventh Text Retrieval Conference, NIST Special Publication 500-242, National Institute of Standards and Technology, Gaithersburg, MD, pp. 91-104.
Lewis, D. (2006), “The Reuters 21578 test collection”, available at: www.daviddlewis.com/resources/testcollections/reuters21578/ (accessed 22 August 2006).
MacFarlane, A., McCann, J.A. and Robertson, S.E. (2005), “Parallel methods for the generation of partitioned inverted files”, Aslib Proceedings, Vol. 57 No. 5, pp. 434-59.
MacFarlane, A., Robertson, S.E. and McCann, J.A. (1996), “On concurrency control for inverted files”, in Johnson, F.C. (Ed.), Proceedings of the 18th BCS IRSG Annual Colloquium on Information Retrieval Research, Manchester, March 26-27, pp. 67-79.

MacFarlane, A., Robertson, S.E. and McCann, J.A. (2000), “Parallel methods for the search of partitioned inverted files”, in De La Fuente, P. (Ed.), Proceedings of String Processing and Information Retrieval – SPIRE 2000, La Coruna, September, IEEE Computer Society Press, Washington, DC, pp. 209-20.
Reddaway, S.F. (1991), “High speed text retrieval from large databases on a massively parallel processor”, Information Processing & Management, Vol. 27 No. 4, pp. 311-6.
Shoens, K., Tomasic, A. and Garcia-Molina, H. (1994), “Synthetic workload performance analysis of incremental updates”, in Croft, W.B. and Van Rijsbergen, C.J. (Eds), Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, SIGIR94, Springer-Verlag, London, pp. 329-38.
Thomson Dialog (2005), “Dialog Database Catalog”, Thomson Dialog Ltd, available at: www.dialog.com (accessed 29 May 2007).
Zobel, J. and Moffat, A. (2006), “Inverted files for text search engines”, ACM Computing Surveys, Vol. 38 No. 2, article 6.

Further reading
MacFarlane, A., Robertson, S.E. and McCann, J.A. (1999), “PLIERS at TREC8”, in Voorhees, E. and Harman, D.K. (Eds), Proceedings of the Eighth Text Retrieval Conference, Special Publication No. 500-246, National Institute of Standards and Technology, Gaithersburg, MD, pp. 241-52.

Appendix. Glossary

Distributed build: method of building indexes where text is distributed from a single node.

DocId: partitioning method which assigns all document data for a given document to one index partition.

Efficiency: measure of the effective use of processors. Definition: speedup on n processors/n processors.

LI: a measure of the amount of load imbalance on n processors: max time on n processors/average time on n processors.

Local build: method of indexing where all processing is kept local to the node.

MHz: megahertz, processor clock speed.

Partition: fragment of an inverted file on a node's disk.

Scalability: a measure of how well the algorithm scales on the same equipment. Definition: (time on small collection/time on large collection) × (size of large collection/size of small collection).

Speedup: measure of the speed advantage of parallelism. Definition: time on 1 processor/time on n processors.

TermId: partitioning method which assigns all term data for a given term to one partition.




TREC: annual Text Retrieval Conference run by the National Institute of Standards and Technology in the United States.

VLC: Very Large Collection, the collection of 100 GB of web data used in the TREC-7 VLC2 sub-track.
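The measures defined in this glossary can be computed directly from elapsed times. The following short Python fragment uses invented timings purely for illustration and applies the definitions of speedup, efficiency, load imbalance (LI) and scalability given above:

def speedup(time_1_proc, time_n_proc):
    # Time on 1 processor / time on n processors.
    return time_1_proc / time_n_proc

def efficiency(time_1_proc, time_n_proc, n):
    # Speedup on n processors / n processors.
    return speedup(time_1_proc, time_n_proc) / n

def load_imbalance(times_per_node):
    # Max time on n processors / average time on n processors.
    return max(times_per_node) / (sum(times_per_node) / len(times_per_node))

def scalability(time_small, time_large, size_small, size_large):
    # (Time on small collection / time on large collection) *
    # (size of large collection / size of small collection).
    return (time_small / time_large) * (size_large / size_small)

times = [102.0, 98.0, 110.0, 95.0]                  # invented per-node times (s)
print(round(speedup(400.0, max(times)), 2))         # about 3.64 on four leaf nodes
print(round(efficiency(400.0, max(times), 4), 2))   # about 0.91
print(round(load_imbalance(times), 2))              # about 1.09
print(scalability(40.0, 400.0, 1.0, 10.0))          # 1.0 would be ideal scale-up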

Corresponding author A. MacFarlane can be contacted at: [email protected]



Flickr and Democratic Indexing: dialogic approaches to indexing


Pauline Rafferty
Department of Information Science, City University London, London, UK, and


Rob Hidderley
Department of Computing, University of Central England, Birmingham, UK

Received 20 November 2006 Accepted 4 June 2007

Abstract
Purpose – The purpose of this paper is two-fold: to examine three models of subject indexing (i.e. expert-led indexing, author-generated indexing, and user-orientated indexing); and to compare and contrast two user-orientated indexing approaches (i.e. the theoretically-based Democratic Indexing project, and Flickr, a working system for describing photographs).
Design/methodology/approach – The approach to examining Flickr and Democratic Indexing is evaluative. The limitations of Flickr are described and examples are provided. The Democratic Indexing approach, which the authors believe offers a method of marshalling a “free” user-indexed archive to provide useful retrieval functions, is described.
Findings – The examination of both Flickr and the Democratic Indexing approach suggests that, despite Shirky’s claim of philosophical paradigm shifting for social tagging, there is a residing doubt amongst information professionals that self-organising systems can work without there being some element of control and some form of “representative authority”.
Originality/value – This paper contributes to the literature of user-based indexing and social tagging.
Keywords Indexing, Visual databases, Tagging
Paper type Conceptual paper

This article is based on a paper presented at the Ninth International ISKO Conference, July 4-7, 2006, Vienna, Austria. An earlier version is published as “Flickr and Democratic Indexing: disciplining desire lines”, in Budin, G., Swertz, C. and Mitgutsch, K. (Eds) (2006), Knowledge Organization for a Global Learning Society, Proceedings of the Ninth International ISKO Conference, 4-7 July, 2006, Vienna, Austria, Ergon-Verlag, Würzburg, pp. 405-12.

Aslib Proceedings: New Information Perspectives, Vol. 59 No. 4/5, 2007, pp. 397-410, © Emerald Group Publishing Limited, 0001-253X, DOI 10.1108/00012530710817591

Models of indexing
Traditionally, the subject indexing of textual documents has been the responsibility of professional librarians and indexers. Indexing practice involves the indexer first determining the meaning of the information document, and then assigning indexing terms which describe the meaningful content of the information document, an activity that might depend on the prior construction of information-rich controlled vocabularies, for example thesauri or classification schemes. Indexing practice involves the identification and representation of the information content of and about documents using descriptive and analytical information retrieval systems that allow specific user requests for information to be matched with the relevant information source(s). In the retrieval of textual documents the assumption is that the human indexer is able to decode the textual document and construct a representation of the significant



information content using the codes and conventions of cataloguing rules and indexing languages, for example classification codes and controlled vocabularies. This classical approach to subject indexing rests on the assumption that information documents have stable meanings that the trained expert is able to decode, and that will be universally accepted as constituting the meaning of the information document. In this theoretical framework, the meaning of information documents is located in the document, and issues about interpretation and reader response remain relatively unexplored. In fact, many early textbooks of librarianship skimmed over the issue of subject analysis, and even Derek Langridge’s 1989 book, Subject Analysis (Langridge, 1989), assumed that there is a correct approach to the analysis of the meaning of documents, and that students of librarianship could be educated into the correct approach.

In opposition to the view that the meaning of a document is to be located entirely in the document, whether the signs are text-based symbolic signs or indexical and iconological images, this paper takes as its starting point the view that the meaning of documents derives from the interaction of the document and the reader, or viewer, of the document, and that consequently there are interpretations of the document rather than a single authoritative interpretation. Following from this, we believe that in order to understand the theory and practice of subject indexing, it is useful to consider subject indexing as communicative practice. Considered in this light, subject indexing practice suggests that a social constructionist metatheoretical framework (Talja et al., 2005) could be fruitful as a way of exploring recent trends in subject indexing practice, which move beyond the indexing activity of the professionalised indexer towards an approach to assigning subject indexing terms often referred to as social tagging.

We believe that, in relation to both textual and non-textual information objects, Mikhail Bakhtin’s distinctions between monologic and dialogic aspects of utterance (Bakhtin, 1981) can become a useful way of theorising indexing practice. Bakhtin’s distinctions between dialogic and monologic aspects of utterance grew out of his analysis of literary texts, but the model has been used in Britain since the 1970s as a framework through which to undertake cultural studies analysis, media critique and critical discourse analysis (e.g. Chandler, 2002; Fairclough, 1995). Following Bakhtin, we can define monologic texts as texts that are closed and directed from an authoritative source to an audience. The dialogic aspects of utterance relate to openness and interpretation, to the relationships that exist between the text and the audience, and indeed to relationships between texts.

Traditional indexing practice can be characterised as monologic, whether it is indexing undertaken by the highly trained expert staff of a library or metadata designed for the authors of webpages. Recent interest in folksonomies, ethnoclassifications and social software suggests that there is a shift in interest towards dialogic indexing practice, where readers or viewers of documents are encouraged to add their own tags. There are a number of websites that use social tagging, and these include text-based websites (such as CiteULike and del.icio.us), music-based websites (such as last.fm), and image-based websites (such as Flickr).
In this paper we explore the move from monologic to dialogic indexing practice in relation to Flickr, and think about some of the issues that grow out of that move; however, the issues relating to social tagging are not confined solely to photographic websites.

Expert-led indexing
Expert-led approaches to subject indexing rely on the existence of controlled vocabulary systems, classification schemes, taxonomies, or ontologies prior to specific instantiations of indexing activity. What traditional expert-led subject indexing systems have in common, whether they derive from a belief in the consensus of adept intellectuals or from a post-modern pragmatic functionalism, is their tendency towards monologic utterance. Traditional expert-led subject indexing relies on the management of information through the intervention of intermediaries (librarians, indexers, publishers, volunteers). Monologic, expert-led indexing is expensive and time-consuming. It can be a facilitator of access to information, by providing routes into large groups of documents. It can also be an inhibitor of access to information, because any constructed, controlled vocabulary will privilege specific worldviews, and may ignore or marginalise other worldviews, with the result that certain concepts and terms are neglected (Berman, 1971; Olson, 2002).

Expert-led knowledge organisation is a very single-minded way to construct maps of knowledge. This matters, because representations of knowledge in knowledge organisation tools are always ideologically determined and politically consequential. Knowledge organisation tools are artificial constructions, and so historically and culturally contingent, and thus always already ideological. Although the worldview underpinning expert-led indexing is monologic, it is in the indexing process that dialogic aspects of utterance creep in. The existence of knowledge organisation tools points to monologic authority, but in practice the human activity of indexing is not consistent and universal but is open to individual interpretation. Mai’s recent work on semiotics and indexing reminds us that indexing practice is not “a neutral and objective representation of a document’s subject matter but the representation of an interpretation of a document for future use” (Mai, 2000).

Author-based indexing
Mathes (2004) argues that an alternative model of indexing is author-based indexing. Traditional automated text-based indexing develops indexes by extracting terms from the text. This approach assumes that the author will use terms that are commonly understood and generally accepted. This literary warrant approach can prove problematic if indexing and searching happen at a later historical moment than the moment of textual production. Terms that are ideologically acceptable at the moment of production might not be considered acceptable at the moment of reception.

Another author-based approach to solving the problem of the expense of expert-based indexing is seen most commonly in the development of the internet. The Dublin Core Metadata Initiative (2003) has been developed with a view to facilitating authorial indexing. A problem that this approach faces is that the author is not necessarily an information manager with the professional knowledge and understanding of an expert. The result is that the design of metadata frameworks is sometimes less detailed and specific, and the definitions of the content of fields and sub-fields simpler, than those developed for traditional knowledge organisation purposes, in order to enable authors to create their own metadata. Information professionals are sometimes critical of DCMI in comparison to the Anglo-American Cataloguing Rules (AACR).
DCMI is not as detailed as AACR, but is perhaps too detailed for the amateur indexer. The focus of AACR and DCMI is on




resource description. DCMI still refers authors/indexers back to existing traditional subject indexing tools, or allows them to add their own descriptive text, for subject access. Mathes (2004) argues that internet search engines have shown that author indexing is sometimes wrong: sometimes it is inaccurate, and sometimes author-based indexing is purposefully false and fraudulent. Even if the author is straightforward, truthful and honest, this approach to knowledge organisation still remains a monologic approach, as is the expert-led approach, but a monologic approach often implemented by knowledge organisation amateurs.

Underpinning the author-based indexing approach is the assumption that the author’s interpretation of his/her own work is the authoritative view. However, when a communicative object is created and then “liberated” from its producer, and disseminated within a public space, who is to say that the author continues to have complete and utter control over determining its meaning? The reader or viewer is an important element in the production of meaning once the document is no longer in the total control of the author, or of institutional facilitators of dissemination. Indeed, reader interpretation of the text becomes perhaps even more important when the text is interpreted at a later historical moment than the moment of production. Meaning is historically contingent. In addition, personal, private meanings attached by readers or viewers can transform the document from the “preferred” meaning that might be inscribed within it by the author (Hall, 2001) to other possible meanings.

User-based indexing
The move towards social software has generated interest in shared metadata. The challenge is to involve users in metadata production. Professionally produced metadata is of high quality but expensive to produce; Dublin Core offers a way for the author to create metadata; however, in both cases, users are disconnected from the process. User-generated subject-orientated metadata has started to develop as an alternative approach. Clusters of user-generated subject tags are sometimes referred to as “folksonomies”, a term generally considered to have been coined by Thomas Vander Wal (2005). Folksonomies do not have hierarchies, although there are automatically generated “related” tags in folksonomies. They are the set of terms “that a group of users tagged content with, they are not a predetermined set of classification terms or labels” (Mathes, 2004).

There are some advantages to folksonomy or user-generated indexing, chiefly that tagging is cheaper and more economical in terms of time and effort than traditional indexing practice, and that the instant feedback derived from user-generated tagging can facilitate a high level of community interaction that would probably not be possible if decisions had first to be made about the codes, conventions and rules governing a tightly controlled taxonomy. There is a question about who is doing the tagging, as the user must already have a certain level of IT literacy before engaging with social software. Mathes (2004) also cites as the limitations of these systems their ambiguity, the use of multiple words, and the lack of synonym control, whilst their strengths are that they facilitate serendipity and browsing. Merholz (2005) argues that folksonomies can reveal the digital equivalent of “desire lines”. Desire lines are foot-worn paths, which can appear in the landscape over time.
He suggests that favourite tags across the community may emerge over time and then a controlled vocabulary based on the favourites could be created. A related metaphor is

that of “information landscapes”, a term to be found in the literature of ethnoclassification. It may be that over time, a set of digital “desire lines” will develop, but it is often the case that when groups of humans get too large, they split and form sub-groups and cliques. This might lead to the organic production, not of a dominant controlled vocabulary, but of many splintered controlled vocabularies. Over time this might have cultural and political consequences.

Indexing images
In general, image retrieval systems are categorised either as concept- or text-based retrieval systems, in which words are used to retrieve images, or as content-based information retrieval (CBIR) systems. Content-based retrieval research is concerned with the perception and identification of fairly primitive features such as shapes and colour (Eakins and Graham, 1999). CBIR systems are being used for trademark recognition and fingerprint matching (McDonald et al., 2001), but the focus of many of the systems appears to be object recognition within a narrow domain. There are also attempts to use CBIR in a more conventional image-oriented retrieval system, for example the Leiden 19th Century Portrait Database (Huijsmans, 2005). From the perspective of an information scientist, it is difficult to imagine how an analysis of pixels, shapes and so on will lead to a system able to describe the denotative and connotative meaning of rich images such as newspaper photographs, although systems like Alipr (2006) that attempt to assign tags to images automatically are beginning to emerge (Lee, 2006). Currently, content-based systems are not able to deal with generalised image classification problems such as “photographs containing unhappy people”, but systems like Alipr are an attempt to begin to address that problem. Wieblut (1995) suggests that a range of user needs can be met by vision-oriented approaches and that applications in science and medicine, for example, are likely to be better served by image content (meaning lines, colours, shapes) rather than interpretive content.

In relation to concept- or text-based image retrieval, researchers acknowledge that establishing the meaning of images is a complex business (see, for example, Brown and Hidderley, 1995; Burke, 1999; Enser and McGregor, 1992; Krause, 1988; Shatford, 1986; Shatford-Layne, 1994; Svenonius, 1994). Enser and Burke in particular have referred to Panofsky’s “levels of meaning” model as a way of thinking about the operation of meaning in images. The chief problem acknowledged by virtually all commentators on text-based image indexing is that of the subjectivity of the indexer (e.g. Shatford, 1986; Markkula and Sormunen, 2000). It is highly unlikely that two indexers would use the same terms to describe an image, and it has even been suggested that the same indexer may well index an image differently at different times (Bjarnestam, 1998, p. 6). Enser et al. (2004) use the term “semantic gap” to describe the distance between low-level features, which might be automatically extracted from an image, and the “perceived semantics which may provide the initial formulation of a user’s query”. There is currently considerable interest in designing systems that bring together elements of text-based retrieval and content-based retrieval to bridge the “semantic gap”. This is the interesting, if disputed, research context within which we propose to consider the contribution made by social tagging of images on a photographic website.




Knowledge organisation and Flickr
Flickr is a photo sharing website that aims to provide new ways of organising photos. The Flickr website explains that:

Part of the solution is to make the process of organizing photos collaborative. In Flickr, you can give your friends, family, and other contacts permission to organize your photos – not just to add comments, but also notes and tags. [. . .] and as all this info accretes around the photos as metadata, you can find them so much easier later on, since all this info is also searchable (Flickr, 2006a).

Tags are like keywords or labels that you can add to a photo to make it easier to find later. You can tag a photo with tags like “catherine yosemite hiking mountain trail” and then later on if you are looking for pictures of Catherine you can just click on that tag and get all photos that have been tagged that way (Flickr, 2006b).

Flickr has been described as a folksonomy (Wright, 2005), but in practice Flickr works not as a user-indexed but as an author-indexed database, where the term “author” refers to the person who uploads the image on to the site and creates tags for the image. The construction and use of a tag is left entirely to the author. There are discussions of how tags should be used within the internet community (Ideant, 2005). In practice, tags often correspond to well-understood (usually English) words. However, some tags are private codes, and some are used as a code by a sub-group of users to facilitate semi-private communication. Other tags are developed that are actually phrases without spaces. Tags are uncontrolled (except by the author of an image) and unmediated; there is nothing to stop inappropriate use nor the generation of tags that are (nearly) identical in meaning or (mis-)spelling to other tags. There does not appear to be a single list of all the current tags in use. Flickr provides summaries of “hot tags” and “all time most popular tags”, neither of which provides any kind of comprehensive listing.

Flickr adopts an author-based indexing approach, but it is worth exploring whether this leads to a satisfactory retrieval mechanism. The traditional information retrieval measures, precision and recall, are worth considering in this context. The use of a tag for searching may deliver any number of results. There is no way of knowing if all of the relevant images have been retrieved, but one can certainly find examples of retrieved images that are irrelevant. Clearly this is due to the fact that an author may not decide to tag an image with a word that is relevant, or alternatively might tag an image inappropriately. Therefore, applying precision and recall measures is likely to indicate a poorly performing system, but one must ask if such measures are appropriate for such a system.

Problems with unmediated tag use in Flickr
Table I illustrates some of the difficulties that may be experienced through the use of unmediated tags as a mechanism for retrieval. More extensive searches are now possible within Flickr; for example, it is now possible to search the “full text”, although it is not entirely clear what that actually includes. For example, a “full text” search for “goes” using the “advanced search” found an image of a frozen platform (see www.flickr.com/photos/medalby/66893755/), but examining the page (caption, title and tag list) did not produce the search term “goes”.

Table I. Examples of difficulties with tag use from Flickr (columns: problem, tag, issue)

Too broad. Tag: “Wedding”. Issue: 795,280 images (14 February 2006); 2,481,553 images (14 November 2006).
Too specific. Tag: “Iijsselmeer”. Issue: one image (14 November 2006; www.flickr.com/photos/bennixview2/91925115/).
“False” use. Tag: “Wedding”. Issue: image www.flickr.com/photos/robwallace/99658661/, a picture of a squirrel, presumably at a wedding!
“False” use. Tag: “Naked”. Issue: picture of a puppy (www.flickr.com/photos/85606729@N00/83578673/).
Code (private language?). Tag: “XX1”. Issue: six photos, all from the same user (14 November 2006).
Code (private language?). Tag: “X1”. Issue: 481 photos, multiple users (14 November 2006; www.flickr.com/photos/tags/x1/).
Multiple words. Tag: “Technologyshowcaseday”. Issue: 53 images, one user (21 February 2006; www.flickr.com/photos/tags/technologyshowcaseday/).
Ambiguity. Tag: “Goes”. Issue: 1,639 images indexed (23 February 2006), 4,073 (14 November 2006); difficult to discern use or relevance!
Ambiguity. Tag: “It”. Issue: 9,460 images (23 February 2006), 25,050 (14 November 2006); wide variety of images!
Synonyms. Tag: “Photos”. Issue: 83,539 images (23 February 2006), 240,696 (14 November 2006).
Synonyms. Tag: “Photo”. Issue: 110,139 images (23 February 2006), 260,390 (14 November 2006).

It appears that a search for “goes” using the advanced search page produces a different set of images (1,492,687 images as of 14 November 2006) to that produced by using the “search everyone’s photos” box (1,492,685 images as of 14 November 2006). In addition, searches for the same word (“goes”) produced differently ordered sets of images, the only discernible difference between the various searches/results being the URL, indicating marginally different parameters passed between the client and Flickr, but no search term variations.

General/specific vocabulary. A key goal for any retrieval system is to use a set of terms that are neither so general as to apply to all items, nor so specific that they apply to only a very small number, so that the key terms are able to distinguish items. The uncontrolled use of tags leads to terms that are too broad, retrieving a set that is too big to browse, or so specific that few items are associated with the term. However, given that there is no upper limit to the number of images that a system like Flickr may store, this may be an insurmountable problem for any statistical approach to the control of tag use. The first tag in Table I, “wedding”, retrieved 795,280 images when it was used on Flickr on 14 February 2006, which suggests that this is an example of a tag that is perhaps too broad. The second tag, “Iijsselmeer”, only retrieved one image and is an example of a tag that is perhaps too specific for public searching purposes.
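The retrieval behaviour discussed here is easy to reproduce with a toy tag index. The Python fragment below is an illustration only: the photographs and tags are invented, and this is not Flickr's implementation. It shows “any” and “all” tag matching of the kind Flickr offers, how a very broad tag swamps a result set while a very specific one isolates a single item, and how precision against a hand-picked set of relevant images could then be measured:

# Invented author-tagged photo collection; not Flickr's implementation.
photos = {
    "p1": {"wedding", "bride", "church"},
    "p2": {"wedding", "squirrel"},          # the "false use" problem
    "p3": {"iijsselmeer", "lake", "boat"},
    "p4": {"wedding", "cake"},
}

def search(query_tags, mode="any"):
    # Flickr-style matching: "any" of the tags, or "all" of them.
    match = any if mode == "any" else all
    return {pid for pid, tags in photos.items()
            if match(q in tags for q in query_tags)}

def precision(retrieved, relevant):
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

print(search({"wedding"}))                     # broad tag: matches three of four photos
print(search({"iijsselmeer"}))                 # specific tag: matches only p3
print(search({"wedding", "church"}, "all"))    # "all" matching narrows the set to p1
print(precision(search({"wedding"}), {"p1", "p4"}))  # 0.67: the squirrel photo hurts precision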



Figure 1. Roman Jakobson’s model of communication

“False” use. Can there ever be “false use” in an author-based indexing system? From the perspective of someone searching for an image using a tag, searching Flickr with a specific tag sometimes retrieves images that are not immediately relevant to the tag’s use. The difficulty in Flickr is that there is no mechanism through which this may be changed or influenced, and there is a danger, therefore, that as tag use becomes more idiosyncratic, which is highly likely, retrieval based on tags becomes less and less reliable. The danger is that such indexing practice leads towards entropy and chaos. This would inevitably lead to tags being used solely for private indexes, and public searching would demand other approaches (or, perhaps, would not be implemented at all). Given the increasing volumes represented by the growth in the number of images associated with the particular tags identified in Table I, and the limited success of the “advanced search” functions, it seems that public retrieval is becoming more of a lottery. This may be of no concern to Flickr, but it is an important lesson for systems that are intended to manage large volumes of similar material where public searching is a priority.

Related to this is the question of assumptions about the function of indexing practice. As information professionals, we are interested in the informational content of tags, and traditionally we design, construct and implement informational indexing systems, whether they are free-form keywords or more structured and formalised taxonomies and ontologies. This is how we have been educated as information professionals; however, it is not always clear that Flickr users who assign tags are interested in recording and exchanging information about images. Once again, it is perhaps useful to consider theories of communicative practice to explore what might be happening when some users add certain kinds of tags. Roman Jakobson’s model of communicative practice has already been used in exploring information functions in images (Dawson and Rafferty, 2001) and might be of some use here.

Roman Jakobson devised a model of communication to demonstrate his theory that every act of communication contains a hierarchy of different functions. In his model, communication proceeds from the addresser and is directed towards the addressee. The message is only understood if the addressee is able to make contact with the message through understanding the language, or the lines represented on paper. The message requires a context to be referred to and is transmitted through a code, a language which is shared by both addresser and addressee (Rafferty and Hidderley, 2005, p. 87). Figure 1 illustrates the model. Jakobson argued that the importance of the functions varies according to the type of communication. When the communicative practice, specifically writing, is oriented towards the addresser, and the writer expresses his or her feelings or opinions through journals, diaries, free writing and poetry, Jakobson describes the function as expressive;

writing oriented towards the reader has conative purpose, which means that the writer is attempting to affect or persuade the reader; context-oriented writing is informative, for example maps or prose; message-oriented writing is poetic in function, focusing on the message itself, the language and structure and composition; contact-oriented writing is phatic in function, in that it is primarily concerned with establishing and maintaining contact; and code-oriented writing is metalinguistic, that is writing which comments on writing, for example a journal editorial (Rafferty and Hidderley, 2005, p. 87). This model is useful for exploring the function and purposes of writing. If we use the model to consider the function of the tags “wedding” and “naked”, included in Table I, it might be possible to suggest that the functions of these tags are less informational or informative than expressive, or possibly phatic.

Codes. Codes are tags that might be described as symbolic notations standing in place of subject terms representing concepts, as with most library classification notation, or as symbolic signs without any corresponding English (or other natural language) meaning. One might celebrate these private codes and conceive of their meaning as being entirely derived from the collection of images associated with the code. However, such codes are by definition private (Flickr provides no method of stating the intended meaning of a tag) and this cannot be seen as a transparent or useful practice for public retrieval. The fifth and sixth examples in Table I are of codes. The former tag, XX1, is a tag that is used by one user to tag a group of images. The latter, X1, is a tag that is used by a number of users to tag images, and is an example of a symbolic sign standing in place of subject terms representing concepts. There is a function on the Flickr database that allows users to browse clusters of tags, and browsing the X1 clusters reveals that the most recent clusters of images tagged with X1 are images relating to Harlequins Rugby Club, images relating to models of the Bell X1 aircraft, the first plane to exceed the speed of sound in controlled, level flight (Wikipedia), and images of Bell X1, a rock band. The single tag, X1, is here being used to denote a number of different concepts, only some of which might be linked.

Multiple words. In some ways these are similar to codes. However, they are an attempt by indexers to make their tags more exact, and are really phrases without spaces! This is partly likely to be a response to Flickr’s implementation of tags and the limited search language associated with tags (simply “all” or “any”, not even a Boolean language). The seventh example in Table I is the tag “technologyshowcaseday”, a tag that runs together four separate words. This tag is used to tag other images in the group, but all the images have been tagged and uploaded by the same author.

Ambiguity. Text-based document retrieval systems have “stop-lists” of terms that are of little value for indexing: such lists would include words like “it”, “a”, “be” and so on. However, there is no such control within Flickr and, although, like other tags, such words have to be associated with an image explicitly by the author, it is difficult to discern the value or meaning of such tags. Tags like this may be worse than private codes, due to the fact that people know they do have a linguistic meaning and might become confused when the images retrieved challenge their view of the meaning of the tag.
“Goes” and “it” are two examples of ambiguous terms in Table I. These terms retrieve a large number of images, but they are not necessarily helpful.

Synonyms. The uncontrolled use of tags leads to unfortunate indexing practice. Homonyms, synonyms and misspellings are all ignored in that they are all treated as




unique tags. This leads to poor retrieval performance, not helped by the absence of a comprehensive list of tags used across Flickr. The example in Table I is the use of the singular and plural forms, “photo” and “photos”. It is difficult to see either term as particularly valuable for use in a photographic database.

Flickr implements some alternative retrieval mechanisms, which might be more appropriate for the informal collection of user-created photographs that constitutes the Flickr database. These mechanisms focus on browsing, connecting (“clusters” of tags) and identifying photographs considered “interesting” by the users of the Flickr site. These functions facilitate the exploration of the database and the notion of serendipity (discovery by chance, exploration). A user’s tags may be viewed as an alphabetical list where the size of the tag reflects the number of times it has been applied by that user. This feature is also duplicated in the “hot tags” section for all users. This may be considered a crude attempt at implementing a “desire line” or providing a “map” of the “information landscape”. There are other functional improvements that could be made to Flickr to mitigate the practical problems of tag use, such as the introduction of a Boolean search language and the provision of a “global tag list”. However, it is unlikely that such functions would resolve the larger problem of improving the retrieval performance of the system (such an aim may not be of interest to Flickr at all!). Is it fair to criticise Flickr for providing such variable retrieval performance, and could the system be improved so that the freedom provided by author-based indexing is retained but the retrieval performance is also improved? The authors believe that the democratic approach (Rafferty and Hidderley, 2005, pp. 177-87) offers a method of marshalling a “free” user-indexed archive to provide useful retrieval functions.

Democratic Indexing: an alternative approach to concept-based retrieval
The Democratic Indexing project grew out of an interest in the challenges of designing image retrieval systems. It is a response to the issues of connotation, specifically to the issue of whether a “spectrum of connotation”, based on the range of possible meanings available in society at a particular moment, might exist. The design of the database allows changes in meaning over time to be captured. Thus, the database addresses synchronic issues about the structure of meaning at any one time, and diachronic issues about changes in the system over time. By focusing on user interpretation, Democratic Indexing differs from traditional expert-led models of indexing. In particular, the democratic approach considers that readers of the images play active roles in determining meaning by constructing their own interpretations of images, and that a collection of terms describing meanings constructed by readers should be used to create a subject-based index. The democratic approach determines authority from the agreement of its users: its warrant comes from the constructive interpretation of its users. The democratic approach does not cover all image contents, for example information such as photographer, date of creation and title, which is not subject to variation. However, the approach is applied to all forms of interpreted information that might be summarised as “what does this mean?” or “what is important here?”.
The principle of Democratic Indexing is that individuals will have their own, potentially different, interpretation(s) of an image: the differences may be manifested as a different

focus on parts of the image and different terms to describe the image. Democratic Indexing has incorporated a number of novel features:
- the information recorded for each information item includes descriptive cataloguing and subject indexing based on user perceptions of the item;
- the collection of user-generated indexes will be used to compile a “public” index through a process called “reconciliation” (a minimal sketch of one way such a step could work follows this list); and
- the ability of individual users to record their private indexes, offering a democratic approach to indexing.
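The reconciliation step mentioned in the second point can be illustrated with a small sketch. This is not the project's actual algorithm; it simply shows the idea, under the assumption of a simple agreement threshold, of promoting terms from private user indexes into a public index once enough users agree on them:

from collections import Counter

def reconcile(private_indexes, threshold=0.5):
    """private_indexes: one set of terms per user for a single image.
    Returns the terms agreed by at least the given fraction of users."""
    counts = Counter(term for terms in private_indexes for term in terms)
    n_users = len(private_indexes)
    return {term for term, c in counts.items() if c / n_users >= threshold}

# Invented private indexes from three users for the same image.
users = [
    {"thatcher", "portrait", "triumphant"},
    {"thatcher", "portrait", "defeated"},
    {"margaret thatcher", "portrait"},
]
print(reconcile(users))   # {'portrait', 'thatcher'} with a 0.5 agreement threshold

Re-running such a step as new private indexes arrive is one way in which the public view could be kept historically contingent, in the spirit of the ongoing reconciliation described below.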


Level of meaning tables
Central to the project has been the construction of “levels of meaning” indexing templates, initially to capture a range of information relating to images, but subsequently developed to capture information relating to film and fiction (Hidderley and Rafferty, 1997). The image-based levels of meaning template is shown in Table II. The template is partly built on Panofsky’s (1993) approach to interpreting images, distinguishing between iconography, pre-iconography and iconology, and partly influenced by Barthes’ (1973) distinctions between denotation and connotation. Our attempt to include a dialogical dimension to indexing practice comes in the form of ongoing reconciliation of private indexes. We believe that the meaning of documents is not ontological, but that the meanings attached to documents change over time, and so our approach to reconciliation means that we “update” the historically contingent public view of the meanings attached to indexed documents. The democratic approach reflects the changing meanings of images.

Table II. Levels of meaning (columns: level and category, description, some examples)

1.1 Biographical. Description: information about the image as a document. Examples: photographer/artist, date and time of creation, colour/B&W, size, title.
1.2 Structural contents. Description: significant objects and their physical relationship within the picture. Examples: object types, position of object, relative size (or importance) within the picture (e.g. car top right).
2.1 Overall content. Description: overall classification of the image. Examples: type of image, “landscape”, “portrait”.
2.2 Object content. Description: classification of each object defined in 1.2. Examples: precise name and details of each object (if known), e.g. Margaret Thatcher, Ford Orion.
3.1 Interpretation of whole image. Description: overall mood. Examples: words or phrases to summarise the image, e.g. “happy”, “shocking”.
3.2 Interpretation of objects. Description: mood of individual objects (when relevant). Examples: e.g. Margaret Thatcher triumphant, defeated.
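To make the template concrete, the sketch below shows how a single image record organised along the lines of Table II might be represented; the field names and the example values are assumptions for demonstration only, not data from the project:

# Illustrative record for one image, following the Table II levels (invented values).
image_record = {
    "1.1 biographical": {
        "photographer": "unknown",
        "date_time": "1983-06-09 22:00",
        "colour": "B&W",
        "title": "Election night",
    },
    "1.2 structural_contents": [
        {"object": "person", "position": "centre", "relative_size": "large"},
        {"object": "rosette", "position": "left", "relative_size": "small"},
    ],
    "2.1 overall_content": "portrait",
    "2.2 object_content": {"person": "Margaret Thatcher"},
    "3.1 interpretation_whole": ["triumphant"],          # user-contributed; may vary
    "3.2 interpretation_objects": {"Margaret Thatcher": ["triumphant"]},
}
print(image_record["2.1 overall_content"], image_record["3.1 interpretation_whole"])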



The approach adopted in the levels of meaning indexing template is based on some assumptions that still require verification. First, it is assumed that, at least for the higher levels of meaning (3.1 and 3.2 in Table II), there is no single interpretation of an image. Second, it is assumed that there will be common terms used by viewers to index images. Third, it is assumed that the natural way to describe images is through words and phrases.

Organic organisation or teleological discipline?
What is at stake in the user-based indexing paradigm is the issue of conventional knowledge organisation tools. Flickr and the Democratic Indexing project are examples of systems offering alternatives to imposed semantic structures, and both systems were initially developed to offer approaches to image retrieval. It might be that indexing practice is not an activity that demands an either/or solution, either author-based indexing or user-based indexing, but rather that choices regarding knowledge organisation tools and methods are dependent on the format and content of the signs, and on the function of the retrieval activity.

Clay Shirky (2005) distinguishes between domains in which ontologies operate successfully and those that are more difficult to discipline ontologically. He argues that ontologies do not work well when the domain is a large corpus with no formal categories, unstable and unrestricted entities and no clear edges, and when the participants are uncoordinated amateur users and naïve catalogers, with no authority. This list of factors makes the web an almost perfect fit for an information space in which ontologies do not work. Shirky’s view is that the process of social tagging heralds a philosophical shift in indexing, which takes us away from a binary process of categorisation to a probabilistic approach. Shirky argues that Flickr and del.icio.us provide us with a way of developing organic categorisation where alternative organisational systems will be built by letting users tag URLs, and then aggregating those tags.

The use of the word “aggregating” is suggestive of the limitations of user-driven systems. The term echoes Merholz’s (2005) suggestion that over time folksonomies will develop informational equivalents of “desire lines”, which will provide de facto controlled vocabularies, and Hidderley and Rafferty’s (1997) suggestion that democratic indexing projects should operate using a public/private indexing split. The discourse of user-based indexing is one of democracy, organic growth and user emancipation, but there are hints throughout the literature of the need for post hoc disciplining of some sort. This suggests that, despite Shirky’s claim of philosophical paradigm shifting for social tagging, there is a residing doubt amongst information professionals that self-organising systems can work without there being some element of control and some form of “representative authority” (Wright, 2005). Perhaps all that social tagging heralds is a shift towards user warrant.

References
Alipr (2006), available at: http://alipr.com/ (accessed 15 March 2007).
Bakhtin, M.M. (1981), The Dialogical Imagination, University of Texas, Austin, TX.
Barthes, R. (1973), Mythologies (trans. by Lavers, A.), Paladin, London.
Berman, S. (1971), Prejudices and Antipathies: A Tract on the LC Subject Heads Concerning People, Scarecrow, Metuchen, NJ.
Bjarnestam, A. (1998), “Text-based hierarchical image classification and retrieval of stock photography”, available at: http://ewic.bcs.org/conferences/1998/imageret/papers/paper4.pdf (accessed 25 October 2005).

Brown (Rafferty), P. and Hidderley, G.R. (1995), “Capturing iconology: a study in retrieval modelling and image indexing”, Proceedings of the 2nd International Elvira Conference, De Montfort University, ASLIB, London, pp. 79-91.
Burke, M.A. (1999), Organization of Multimedia Resources: Principles and Practice of Information Retrieval, Gower, Aldershot.
Chandler, D. (2002), Semiotics: The Basics, Routledge, London.
Dawson, E. and Rafferty, P. (2001), “‘Careless Talk Costs Lives’: a case study examining the operation of information in British domestic posters of the Second World War”, The New Review of Information and Library Research, Vol. 7, pp. 129-56.
Dublin Core Metadata Initiative (2003), available at: http://dublincore.org/documents/dcmi-terms/ (accessed 10 October 2006).
Eakins, J.P. and Graham, M.E. (1999), “Content-based image retrieval”, report to the JISC Technology Applications Programme, January, University of Northumbria, Newcastle.
Enser, P.G.B. and McGregor, C.G. (1992), Analysis of Visual Information Retrieval Queries, British Library Research & Development Report, No. 6104, British Library, London.
Enser, P.G.B., Lewis, P.H. and Sandom, C. (2004), “Views across the semantic gap”, paper presented at The International Workshop on Multidisciplinary Image, Video, and Audio Retrieval and Mining, Département d’Informatique, Université de Sherbrooke, Sherbrooke, Québec, 25-26 October, available at: www.cmis.brighton.ac.uk/Research/vir/Bton-Soton Quebec%20Paper2.pdf (accessed 4 July 2006).
Fairclough, N. (1995), Discourse Analysis: The Critical Study of Language, Longman, London.
Flickr (2006a), “About Flickr”, available at: www.flickr.com/about.gne (accessed 21 February 2006).
Flickr (2006b), “Tags, frequently asked questions”, available at: www.flickr.com/help/tags/#37 (accessed 21 February 2006).
Hall, S. (2001), “Encoding/decoding”, in Durham, M.G. and Kellner, D.M. (Eds), Media and Cultural Studies: Keyworks, Blackwell, Oxford, pp. 166-77.
Hidderley, R. and Rafferty, P. (1997), “Democratic indexing: an approach to the retrieval of fiction”, Information Services and Use, Vol. 17 Nos 2/3, pp. 101-11.
Huijsmans, D.P. (2005), “Content-based image retrieval in LCPD: the Leiden 19th-century Portrait Database”, available at: http://nies.liacs.nl:1860/ (accessed 15 March 2007).
Ideant (2005), “Tag literacy”, available at: http://ideant.typepad.com/ideant/2005/04/tag_literacy.html (accessed 23 February 2006).
Jakobson, R. (1960), “Closing statement: linguistics and poetics”, in Sebeok, T.A. (Ed.), Style in Language, MIT Press, Cambridge, MA, pp. 350-77.
Krause, M. (1988), “Intellectual problems of indexing picture collections”, Audiovisual Librarian, Vol. 14 No. 2, pp. 73-81.
Langridge, D. (1989), Subject Analysis: Principles and Procedures, Bowker-Saur, London.
Lee, J. (2006), “Software learns to tag photos”, Technology Review, 9 November, available at: www.technologyreview.com/read_article.aspx?id=17772&ch=infotech (accessed 15 March 2007).
McDonald, S., Lai, T.-S. and Tait, J. (2001), “Evaluating a content-based image retrieval system”, Proceedings of ACM SIGIR’01, September 9-12, New Orleans, LA.
Mai, J.-E. (2000), “The subject indexing process: an investigation of problems in knowledge representation”, PhD dissertation, The University of Texas at Austin, May, available at: www.ischool.washington.edu/mai/abstract.html (accessed 25 October 2005).




Markkula, M. and Sormunen, E. (2000), “End-user searching challenges indexing practices in the digital newspaper photo archive”, Information Retrieval, Vol. 1 No. 4, pp. 259-85.
Mathes, A. (2004), “Folksonomies – cooperative classification and communication through shared metadata”, available at: www.adammathes.com/academic/computer-mediated-communication/folksonomies.html (accessed 25 October 2005).
Merholz, P. (2005), “Metadata for the masses”, available at: www.adaptivepath.com/publications/essays/archives/000361.php (accessed 25 October 2005).
Olson, H. (2002), The Power to Name: Locating the Limits of Subject Representation in Libraries, Kluwer Academic, Dordrecht.
Panofsky, E. (1993), Meaning in the Visual Arts, Penguin, Harmondsworth (originally published in 1955).
Rafferty, P. and Hidderley, R. (2005), Indexing Multimedia and Creative Works: The Problems of Meaning and Interpretation, Ashgate, Aldershot.
Shatford, S. (1986), “Analyzing the subject of a picture: a theoretical approach”, Cataloging & Classification Quarterly, Vol. 6 No. 3, pp. 39-62.
Shatford-Layne, S. (1994), “Some issues in the indexing of images”, Journal of the American Society for Information Science, Vol. 45 No. 8, pp. 583-8.
Shirky, C. (2005), “Ontology is overrated: categories, links, tags”, available at: http://shirky.com/writings/ontology_overrated.html (accessed 25 October 2005).
Svenonius, E. (1994), “Access to nonbook materials: the limits of subject indexing for visual and aural languages”, Journal of the American Society for Information Science, Vol. 45 No. 8, pp. 600-6.
Talja, S., Tuominen, K. and Savolainen, R. (2005), “‘Isms’ in information science: constructivism, collectivism and constructionism”, Journal of Documentation, Vol. 61 No. 1, pp. 79-101.
Vander Wal, T. (2005), “Explaining and showing broad and narrow folksonomies”, available at: www.personalinfocloud.com/2005/02/explaining_and_.html (accessed 27 February 2006).
Wieblut, V. (1995), “Image content searching is here”, available at: http://sunsite.berkeley.edu/Imaging/Databases/Fall95papers/vlad2.html (accessed 14 March 2007).
Wright, A. (2005), “Folksonomy”, available at: www.agwright.com/blog/archives/000900.htm (accessed 25 October 2005).

Corresponding author
Pauline Rafferty can be contacted at: [email protected]



Non-literal copying of factual information: architecture of knowledge
Tamara Eisenschitz
Department of Information Science, City University London, London, UK

Received 21 November 2006
Accepted 22 May 2007

Abstract
Purpose – The paper seeks to explore the rights of researchers to use facts gathered from previous authors, even when there are only one or a small number of sources, and also to explore the limits of non-literal copying of textual materials.
Design/methodology/approach – The paper consists of a conceptual analysis of legislation and cases that illustrate the effects of the law.
Findings – The paper finds that the charge of non-literal copying of factual literary works is not accepted because of low levels of originality in the structure of the material. Public policy based on the needs of scholarship provides a more predictable level of access to the contents of works.
Practical implications – Originality arguments are always open to try again. Only a policy statement will give a degree of certainty.
Originality/value – The paper aids in distinguishing the originality and policy arguments and who benefits from each, and also relates this problem to the more familiar one of the protection of free speech.
Keywords Copyright law, Public policy
Paper type Conceptual paper

Introduction
Copyright protects rights holders against the copying and market exploitation of original creative works. It has always been accepted that copyright does not protect ideas but rather their expression, the “idea/expression dichotomy” (Bently and Sherman, 2004, p. 173). Anyone else can re-express concepts in their own words and re-use them in another creative work.

In this paper a High Court case is examined, the Da Vinci Code case (Baigent and Leigh v. Random House 2006; hereafter DVC Case; Baigent and Leigh, 2006), in which Dan Brown, the author of the novel The Da Vinci Code, hereafter DVC (Brown, 2004), was accused of breaching copyright by taking the historical and philosophical basis of his story from one main source. This was another book, The Holy Blood and the Holy Grail (hereafter HBHG) by Baigent, Leigh and Lincoln (Baigent et al., 1982). This history and its meaning are contentious; one influential book, HBHG, collated the evidence and set forth the theory, and has since been the source of much further work. It was agreed that Dan Brown used his own words, but it was alleged that he took the structure or “architecture” of the arguments from HBHG when these were original to their authors (the selection of events and people, their relative importance and order). The concept of copying architecture is taken from software copyright cases, where software as a set of commands does have a structure. This phenomenon is known as non-literal copying.

Aslib Proceedings: New Information Perspectives, Vol. 59 No. 4/5, 2007, pp. 411-421, © Emerald Group Publishing Limited, 0001-253X, DOI 10.1108/00012530710817609



The judgment in the DVC Case is important not just for its protagonists, but for all writers and researchers. The complainants allege that if an argument is complex and original enough then it cannot be repeated: the authors would have a monopoly on this contribution to knowledge by way of its structure; that is, they would have protection against non-literal copying. The key question is: should a sufficient degree of complexity and originality allow such protection in some cases? Or should potential users of such materials be protected in all circumstances by the recognition of a public policy presumption in favour of open access and freedom of expression? An important feature is that we are dealing with a collection of facts, used as evidence to build a case. The completed theory is then a fact in its own right. So we have nests of facts, and the building of a theory by a sequence of facts is something other scholars will want to reproduce, study, and then improve on or extend in some way.

The submission that the structure of the earlier book, HBHG, was copied was rejected (Baigent and Leigh v. Random House 2006, Section 46, Conclusion on rejection of central themes). So far, no harm has been done, as there has been no change in the legal position, but a loophole was created by the implication that, had a more complex structure been used, this submission of copying would have been considered seriously and with possible success. We examine the originality arguments and ask whether a public policy perspective is needed instead, to allow researchers the certainty necessary to make use of texts. The details are taken from UK law, except at one point when we compare with US law. The USA, owing to its origin in common law, has the same doctrine of originality, but it also has a Merger Doctrine for when a work can be expressed in only one way. This re-expresses the requirement of originality as public policy. We look briefly at how a similar outcome can be created in UK law. US economists have also written on the economic case for the public domain, and we show that this translates fairly simply to public policy arguments in the UK. Finally, it is concluded that internal consistency in English law, reinforced by the US position, requires more support for the public domain, and therefore the public policy interpretation is the more beneficial one.

The structure of information
Let us assume we have a text that conveys information by consisting either of pure facts, or of facts plus interpretation and speculation. Some of these facts and interpretations may well be in dispute, even to the extent of such a work being described as more fiction than fact (this being the situation with HBHG). It is arguably a matter for the authors and publishers of a work to decide whether or not we are dealing with fact, and with theories which will establish new facts in a line of scholarship. Once we assume this is such a work, then the factual material will be in the public domain upon publication. These facts may be of any type as long as they relate to knowledge rather than fiction. The two questions we are asking are:
(1) Is this work original?
(2) Should this work, once released to the public, be available for further exploitation as a matter of public policy?
This two-pronged approach of either originality or policy was articulated in case law by Lord Hoffmann in Designer Guild Ltd v. Russell Williams (Textiles) Ltd 2000 (Designer

Guild, 2000), where he revisited the cases on idea and expression. Ideas will not be protected if not original or so commonplace as not to form a substantial part of a work (Bently and Sherman, 2004, p. 173). However, not acknowledging the basis of public policy for this rule suggests that, if general ideas embody substantial labour and skill, they would be protected. This would be “novel and undesirable” (Bently and Sherman, 2004, p. 174).


Originality
The issue here, given that some material is factual, is whether the selection and emphasis of material, and the way the whole work is structured, can be original enough to allow protection despite the contents being factual. If so, we would have the concept of a structure separate from the actual expression of textual content, and non-literal copying would be a possibility. We will consider this for software and for literary works more generally.


Software
Software is written as text, but it is not textual expression; it is a set of commands, written in a specific language with a logical structure. The commands achieve specific outcomes based on that logic, and languages are differentiated by their logic. A major aspect of architecture in software concerns the logic and layout of the program, to make it work at maximum efficiency. Copying of architecture is called "non-literal" copying, and cases were brought to deal with the production of an independent piece of software which produced the same appearance and outcome as an existing product.

One of the earliest and best known of these cases involved the Lotus 123 spreadsheet. This was extremely popular and widely used, so any attempt to improve efficiency needed to end up with the same spreadsheet layout and functions in order to have any impact on the market. This duly happened: independent software was written to produce a more effective spreadsheet, but it looked and felt like Lotus 123 and threatened to replace it. On the basis of this resemblance the new company was successfully sued for copying the "look and feel" of Lotus 123 – in other words, for non-literal copying. The disputed core issue was that the new software had entirely independent logic and had solved problems in a different way because of that logic. All that was copied was the concept of solving that problem and the layout of the spreadsheet, and yet it was commercially very important for the new spreadsheet to be familiar and to involve no further learning to use.

There were two main cases brought against Lotus Development Corporation, and the decisions provided important clarification of the boundaries of what features are protected and what are not. In Lotus Corporation v. Paperback Software International 1990 (Lotus, 1990), the action of the menu command system by means of a two-line moving cursor was protected. This was held to be expression because different spreadsheets used different menu systems. The grid structure was not protected because it was inevitable that a spreadsheet would use such a grid; there was nothing special about it. However, in Lotus Corporation v. Borland International Inc. 1997 (Lotus, 1997), the menu was ruled not to be a work of copyright; rather it was a mode of operation, like the buttons on a video recorder. The court thought it most important that the software was operating a machine, so the copyright issues of idea and expression became irrelevant in this case. The second judgment reverses the first, but this is due to the anomalous position of software.



The first decision makes a clear distinction between common, unoriginal expression (the grid) and something that merits protection (the two-line moving cursor) (Bainbridge, 2004, pp. 34-5). A computer program can always be hard-wired, presented as a set of transistors and wires, and then protected by a patent. It is still the same entity and the same functionality is being protected. This makes the copyright term of the life of the author plus 70 years seem excessive. The difficulty is that software is not literary text, but a set of commands leading to an outcome. Protection is required for the problem-solving outcome, yet it is the text that gets literary protection. The alternative of protecting software by patent is beyond the scope of this paper and has its own problems. It is worth noting that in patent law, which is designed to protect inventions, protection lasts for around 20 years, much shorter than copyright protection, precisely because it is a monopoly based on novelty.

Copyright is based on originality and the protection is against copying, not against creation as such. An author can produce another work of extreme similarity provided they did the creative work themselves. A difficulty for authors is that copying is presumed to have occurred if it could have occurred. The author therefore has to be able to show that there was no contact with the previous work and no possibility of copying. If there was any contact, then there is always the possibility of "unconscious copying", which the author would have difficulty disproving. This legal construction of creation as purely original is a problem in the real world, where creation is recognized as arising out of the surrounding culture – a point taken into account in critical theory under the rubric of intertextuality.

Non-literal copying beyond software
This concept originated with software but has gone beyond it to other areas involving commands, such as cooking recipe books. The published descriptions of such recipes have copyright in the words and pictures used, in the same way as any printed text and photographs. The commands themselves – the actual recipes – have also been proposed for protection under copyright (Hourihan, 2006). Most discussion following this has been negative, pointing out that each chef prides themselves on an original twist to a common core. Some menus in America have been adorned with copyright notices for the food, but no infringement actions seem to have been brought. Where the preparation of food has been inventive, at least one chef has applied for patent protection (Wells, 2007).

There is no comparable structure for non-fictional printed sources in general, where the text conveys information rather than commands. We simply have blocks of text on different topics, and the architecture is the logical connections between these topics. Alternatively, the structure could be identified in the work done to put the text together, but it is not seen in the text. If this architecture is to be judged sufficiently complex to confer protection, the questions to ask are the following:
• Are there complex connections joining the topics in a multitude of ways, or just linear connections?
• How obvious are the connections?
• Are they not an essential part of the text, and is this not what scholarship is – presenting sets of connections, acknowledging those made earlier and hopefully extending or enhancing them in some way as one's own contribution?

Public policy arguments
Intellectual property laws were developed as a means of spreading knowledge and culture and of respecting the public's interest in access as separate from the economic exploitation cycle. Developments in digital copyright have, in the eyes of many commentators, tipped the balance decisively towards rights holders and away from public access (Laddie, 1996; Eisenschitz and Turner, 1997). Authors are publishing in a public space and therefore can expect and intend their material to be read by subscribers or purchasers and by the users of libraries. Particularly where an argument is new and complex, it will, initially at least, be difficult if not impossible to communicate these ideas in a language or sequence different from the original. The ideas will need to be digested before they can be expressed in a significantly new way. It may well be necessary to communicate them to a community of scholars as an aid to digestion. Therefore the more important the ideas (by virtue of being difficult and not obvious), the more they need to be communicated as an aid to understanding. In this way of looking at the issue, it is not in the public interest to lock facts and ideas up via copyright and make access difficult. It is not a question of degree of originality; it is a decision of public policy that facts and ideas will be freely available for further use. This conflict of approaches, and the need to clarify the underlying balance of interests in intellectual property law, has quite recently led to some campaigning against the operations of intellectual property law. This is discussed in the next section.

IPR backlash
Law is to some extent the art of the possible. The public will accept the restrictions of intellectual property laws as long as they can see that fair exceptions are available for normal and traditional uses of knowledge and culture. There have been increasing calls for a change in attitude to IPRs by the Royal Society of Arts (2005), the World Intellectual Property Organization (2004) and most recently by the British Library (Chillingworth, 2006), because the balance in the digital environment has become so tilted away from users. Briefly, this campaign calls for the exceptions to copyright to be maintained and strengthened. They currently allow limited copying for purposes of non-commercial research, private study, news reporting, reviews, preservation and various other strictly defined instances, but these are regarded as defences against infringement. Instead, the backlash campaign argues that the public domain needs to be valued as a positive asset, and that access via copyright licences should be more generous and is not incompatible with profiting from a work. A key figure is Lawrence Lessig and his movement for Creative Commons licences (Lessig, 2004). This is a very recent trend and indicates that the pendulum of public opinion is swinging against intellectual property rights and towards freer access. However, it is interesting that the main thrust of this backlash has come from consideration of the needs of authors, with users included insofar as they are also authors.

Burrell and Coleman (2005) argue that in the UK users have had a particularly poor deal. They struggle to be heard in a complex of supranational and domestic legislative and political processes. Their preferred model for reform would follow closely the wording of the Information Society Directive, which has flexible exceptions to allow for the collection of information for research and illustration for teaching. These could be interpreted generously to improve their effectiveness. They should be supplemented by a public interest defence, and minimum standards should be set for the practicalities of institutional copyright policies (Burrell and Coleman, 2005). It is now time to look at the details of the Da Vinci Code case and see its possible importance as an indicator of change.

The Da Vinci Code case
The Da Vinci Code (DVC) is a novel (Brown, 2004). It is based on the factual information contained in a number of books, most prominently The Holy Blood and the Holy Grail (HBHG) (Baigent et al., 1982). This sets out the key story of Jesus' bloodline and its merger with the Merovingian line of kings of France. This is taken as the key set of facts transferred from HBHG to DVC. There are a number of sub-facts which elaborate this, such as the existence of the Priory of Sion, linked to the Templar Knights who guard the Grail, and indications of this story in numerous texts and paintings, which are cited and explained. The complainants, Baigent and Leigh, alleged that the facts used in DVC were just their set of facts, which had been lifted primarily from the one source and reused (the third author, Henry Lincoln, took no part in this action).

If HBHG were accepted as being a piece of fiction, there would have been no contest. Even if the exact words were not used, a substantial number of details were taken, and this could have been a case of prima facie copying. But HBHG is labelled as fact, and facts can be taken and reused by others as long as they are cast in their own words. This is known as the "idea/expression dichotomy", a long-established aspect of copyright law (Bently and Sherman, 2004). The crucial point is that HBHG has been thoroughly mined as the source of a strand of DVC, though there are other aspects of DVC, in the detective story, that are original to its author, Brown. It is not important whether or not the HBHG strand is a substantial part of DVC, only whether the strand is a substantial part of HBHG, which it does seem to be. A good reason for this is that the theory in DVC is very new and is not readily available elsewhere. In the normal pattern of scholarship, theories are assembled from knowledge available from multiple sources, which can be put together in various ways to try for a better understanding of the whole. But this text was almost unique at the time of its writing, possibly because it may be fact but may also be substantially fiction, thus creating a hybrid work of faction. Therefore the grouping of facts was taken from this one source and melded into Dan Brown's work of pure fiction. If HBHG were acknowledged as fiction, this would constitute unacceptable copying. But if the contents of HBHG are taken to be factual, then these factual parts must be available to others for further exploration and scholarship as required.

The complainants accepted that the copying of facts would not be protected, and instead alleged that Brown had copied the structure of the argument. The term used was the "architecture", which is taken from software as discussed previously. The judgment makes clear that the hints in the early stages of the case of a complex architecture were exaggerated, and that by the end the architecture had been accepted as being simply the chronological order of a sequence of central themes (Baigent and Leigh, 2006, paragraph 115). Even the subdivision of the topic into a number of themes was deemed to have been artificial. Chronological order was regarded as being insufficiently original to merit protection, so the argument failed in this case (paragraph 261), though it is up for appeal. However, the possibility of protection of architecture remains, because Judge Smith declared that this argument might have been persuasive had there been a more definitive and complex structure. This is implicit in the statement that it was because the structure was simple that protection was refused (paragraph 348). It is therefore necessary to examine what is meant by such structure and whether it presents a real threat to scholarship. Arguably the greater threat to scholarship would be the inability to use reference works freely, which is why this should be a matter of public policy, so that there is no ambiguity.

Issues arising from the DVC judgement
The alleged copying from HBHG in DVC was of the evidence and theories: historical facts making up the stories of Christ's bloodline and its protection in France. The case rested on an identifiable structure or architecture in the book being copied. This is essentially non-literal copying. For most comparable scenarios of historical research, where a topic has produced many theories and authors, there will be many sources. Imagine, for instance, writing an historical novel about Anne Boleyn, one of the wives of Henry VIII: there are many resources available. But for DVC there was the one main source – HBHG. In overview, the sequence of key events and characters was much the same going from HBHG to DVC. For legal satisfaction one needed to ask how complex these connections were. Did they amount to something that should be protected?

Ultimately the copying was agreed to consist of the linear chronological order of events. This decision is similar to that in the US case of Feist Publications Inc. v. Rural Telephone Company 1991 (Feist, 1991), where a listing of telephone subscribers was copied, but it included all available subscribers in the area, in alphabetical order. This was not deemed to have required any creativity to put together. In the present case, chronological order was not deemed to require creativity either, and protection was denied to HBHG. This is the originality argument, but one could imagine that more substantial connections for more complex theories would attract protection. Given this part of the judgment, no historian would be safe from a speculative accusation that a more complex web bound the facts and theories together and that there was thus copying. This argument would be worth the rights-holder's while to put forward even if it failed. Its possibility would have a chilling effect on literature research. It denies the possibility of a complex web of arguments being a higher level of theory in its own right, and could be seen as quite destructive of trust between scholars and their readers in any field or circumstance. This is the weakness of this judgment.

However, Mr Justice Smith recognized this very point, because at the end of his DVC Case judgment he transfers to a policy-based argument, adding that authors should not have their sources pawed over in the way Brown's have been, and that this was undesirable as a matter of public policy (paragraph 348). The basis of this argument is that facts and the benefits of scholarship should be available for all, once in the public domain. A second prong to this argument is that of freedom of speech: freedom to put ideas and theories forward to whichever audience an author sees fit.




This is upheld at every opportunity, most particularly in Ashdown v. Telegraph Group Ltd 2001 (Ashdown, 2001). There, Lord Phillips notes that conflict between freedom of speech and copyright will be rare because "only the form of the literary work is protected" and copyright will not prevent publication. This is consistent with freedom of use for facts and supports the need for this as the legal position. This second argument, on freedom of expression, is clearly incompatible with the first, on originality. It implies an absolute and policy-based commitment to free expression. A free-speech argument would provide a definitive answer to any form of structure-based, non-literal copying argument for text, and would provide security for future writers and scholars as the first argument does not. But, of course, in these peculiar circumstances, where these authors had a unique, carefully tended theory, they had hoped that its originality would have conferred protection.

It is becoming increasingly recognized that some principles need to be derived to allow for accommodations between copyright enforcement and the expectation of being able to exercise freedom of speech. Consistency between laws is a powerful expectation, and Baerendt (2005) points to the public interest defence to actions for breach of confidence in UK law. Copyright is treated as a form of property analogous to land, and therefore as an absolute right. If it were instead to be treated as closer to breach of confidence – as a form of breach of trust – it would be easier to introduce such public interest arguments in copyright actions. Principles would need to be developed for when it would be appropriate to consider freedom of speech arguments, but this step would allow for considerable reconciliation (Baerendt, 2005). Suthersanen (2005), on the other hand, argues that the right to free expression needs to be written explicitly into the framework for intellectual property laws, and internationally, as this is a widespread problem. She suggests TRIPS, the international trade agreement, as the appropriate place, where provisions would be both nationally and trans-nationally enforceable (Suthersanen, 2005). This is a controversial topic, but the necessity for action is at last being recognized.

US law
This is not an aspect of copyright law which has been harmonized throughout the European Union. A more fruitful source of guidance is likely to be US law, which has a viewpoint stemming from the same common law origins as UK law. The basic originality argument, as given in the US Feist case (Feist v. Rural Telephones 1991), is similar to the DVC argument. There are also two policy-type arguments:
(1) the Merger Doctrine; and
(2) an economic analysis of intellectual property law.
These last two arguments are different from those used in the UK and can add a different perspective. US copyright provisions treat the originality of ideas and expression in a similar way to the UK. The Merger Doctrine is a policy argument and deals with the particular situation of a work that can be expressed in only one way. Then there is the economic analysis of copyright, which has been carried out predominantly by US academics, because the US philosophy of intellectual property rights focuses strongly on the motivations of economic reward. In the UK context, its conclusions relate better to our public policy argument.

The freedom of speech aspect is also covered in the USA, but rather differently, because of the constitutional right to free speech.

The Merger Doctrine in the USA
US copyright law has a special Merger Doctrine, which says that where something can be expressed in only one way, it cannot be protected, as protection would not be in the public interest (Clayton, 2005). In copyright, it would be a policy decision to say that an idea that is entirely new and can be expressed in only one way could be copied without infringing copyright, because how else is one to express and communicate the concept? This is the Merger Doctrine which, as a policy formulation, could be incorporated either into UK case law or into the statute by amendment, without needing a broader "doctrine". An entirely new idea may be in such a state (of only one possible form of expression) for a while, until people understand it well enough to paraphrase it and start to express it in their own words. In time new ideas become mainstream and accepted; then they can be re-expressed by readers in new ways. To take a different field, Einstein's theory of special relativity was thought to be incredibly abstruse when first published in 1905. It is now taught to undergraduate physics students and is not thought to be particularly difficult, because the concepts are integrated into the overall structure of classical physics.

The economic arguments
Economic arguments have been elaborated mostly in the USA, because the US theory of copyright is that creators are encouraged to produce more when they are well rewarded for what they produce. In contrast, WIPO and the European Union see copyright as predominantly an instrument to spread knowledge and culture. The argument is the essentially economic one that property rights arise from rewarding effort and that the rewards encourage further works. Specifically for copyright, the idea-expression dichotomy can be explained in terms of law and economics (Landes and Posner, 1989), and also in terms of doctrines of possession and of what is not capable of being possessed (Yen, 1990). Results from such analyses are difficult to interpret meaningfully and to reconcile with experience. Reward must play some part, but a more nuanced analysis, which distinguishes scholarly publishing from the writing of novels, and the economics of entertainment from any other category, might show up the differing priorities. Creative works are created at least in part for the purpose of creating, and to be used and enjoyed, not for the economic reward.

Free speech
The freedom of speech aspect of copyright is subsumed under the First Amendment to the Constitution, which guarantees free speech and publication as fundamental rights. Applications of copyright which could affect free speech negatively are linked to the Constitution and are therefore interpreted in its favour (Yen, 1989).




The US position overall
Both the originality and the policy arguments in the USA indicate that no protection should be available which prevents the further publication and elaboration of facts and factual theories. The originality argument in Feist requires a step in creativity that was not present. The Merger Doctrine requires that unique works can be copied freely as an exception to copyright, and the purely economic arguments for reward of creative activity are at best incomplete. The US position seems to protect uniqueness with a creative step that needs to be judged for each case on its merits. Otherwise there is an economically driven copyright policy. Free speech arguments are favoured by the Constitution and also to some extent as an exception to copyright. The European position on uniqueness is similarly based on an identifiable creativity. The IPR backlash noted above would indicate a public mood towards strengthening this aspect. Free speech considerations add to the purely property-based access rights to strengthen the need for a change to a more permissive climate towards users.

Conclusion
The Da Vinci Code case at the centre of our attention has exposed the contradiction in protection of fact. The need for a public policy interpretation, so as not to persecute researchers with questions of degree of originality, is reinforced by analogies with the US Merger Doctrine and with questions of public policy in an economic sense. We also need to protect freedom of speech and not let this be submerged in the clamour of property rights. The needs of users have become privatised into licences and control as intellectual property policy moved into the sphere of international trade. In the DVC case we have comprehensible issues concerning the freedom to research and write. It is time for policy on information use to be brought back to centre stage. The passage of the DVC case through the Court of Appeal would be an ideal starting point. (Note in proof: the appeal of the DVC Case was heard on 28 March 2007 and it upheld the judgement of the lower court.)

References
Ashdown (2001), Ashdown v. Telegraph Group Ltd 2001, Weekly Law Reports, Vol. 3, p. 1368.
Baerendt, E. (2005), "Copyright and free speech theory", in Griffiths, J. and Suthersanen, U. (Eds), Copyright and Free Speech – Comparative and International Analyses, Oxford University Press, Oxford, pp. 11-33.
Baigent and Leigh (2006), Baigent and Leigh v. Random House, England and Wales High Court (Chancery Division) Decisions 719, available at: www.bailii.org/ew/cases/EWHC/Ch/2006/719.html (accessed 18 September 2006).
Baigent, M., Leigh, R. and Lincoln, H. (1982), The Holy Blood and The Holy Grail, Random House, London.
Bainbridge, D. (2004), Introduction to Computer Law, 5th ed., Longman, Harlow.
Bently, L. and Sherman, B. (2004), Intellectual Property Law, 2nd ed., Oxford University Press, Oxford.
Brown, D. (2004), The Da Vinci Code, Random House, London.
Burrell, R. and Coleman, A. (2005), Copyright Exceptions: The Digital Impact, Cambridge University Press, Cambridge.

Chillingworth, M. (2006), "BL demands overhaul of intellectual property law", Information World Review, October, p. 1.
Clayton, L. (2005), "Copyright law: the Merger Doctrine", The National Law Journal, 6 June, available at: www.NLJ.com (accessed 15 October 2006).
Designer Guild (2000), Designer Guild Ltd v. Russell Williams (Textiles) Ltd 2000, United Kingdom House of Lords Decisions 58, available at: www.bailii.org/uk/cases/UKHL/2000/58.html (accessed 18 September 2006).
Eisenschitz, T. and Turner, P. (1997), "Rights and responsibilities in the digital age: problems with stronger copyright in an information society", Journal of Information Science, Vol. 23 No. 3, pp. 209-23.
Feist (1991), Feist Publications Inc. v. Rural Telephone Company 1991, United States Supreme Court Reports 1991, Vol. 499, p. 340.
Hourihan, M. (2006), "Keep recipes free", available at: www.megnut.com/2006/10/keep-recipes-free (accessed 9 February 2007).
Laddie, H. (1996), "Copyright: over-strength, over-regulated, over-rated?", European Intellectual Property Review, Vol. 18 No. 5, pp. 253-60.
Landes, W. and Posner, R. (1989), "An economic analysis of copyright", Journal of Legal Studies, No. 12, p. 325.
Lessig, L. (2004), Free Culture: The Nature and Future of Creativity, Penguin Books, London.
Lotus (1990), Lotus Development Corporation v. Paperback Software International 1990, 740 Federal Supplement (USA), p. 37 (District Court, Massachusetts 1990).
Lotus (1997), Lotus Development Corporation v. Borland International Inc. 1997, Fleet Street Reports 61.
Royal Society of Arts (2005), "Adelphi Charter on Creativity, Innovation and Intellectual Property", Royal Society of Arts, London, available at: www.adelphicharter.org (accessed 17 November 2006).
Suthersanen, U. (2005), "Towards an international public interest rule? Human rights and international copyright law", in Griffiths, J. and Suthersanen, U. (Eds), Copyright and Free Speech – Comparative and International Analyses, Oxford University Press, Oxford, pp. 97-124.
Wells, P. (2007), "New era of the recipe burglar", available at: www.foodandwine.com/articles/new-era-of-the-recipe-burglar (accessed 9 February 2007).
World Intellectual Property Organization (2004), "Principles of IP justice", available at: www.ipjustice.org/english.shtml (accessed 17 November 2006).
Yen, A. (1989), "A first amendment perspective on the idea/expression dichotomy and copyright in a work's total concept and feel", Emory Law Journal, No. 38, p. 393.
Yen, A. (1990), "Restoring the natural law: copyright as labour and possession", Ohio State Law Journal, Vol. 51, p. 517.

Corresponding author
Tamara Eisenschitz can be contacted at: [email protected]





Mixed reality (MR) interfaces for mobile information systems

David Mountain, giCentre, Department of Information Science, City University London, London, UK, and
Fotis Liarokapis, Coventry University, Coventry, UK and Department of Information Science, City University London, London, UK

Received 15 December 2006; accepted 12 June 2007

Abstract
Purpose – The motivation for this research is the emergence of mobile information systems where information is disseminated to mobile individuals via handheld devices. A key distinction between mobile and desktop computing is the significance of the relationship between the spatial location of an individual and the spatial location associated with information accessed by that individual. Given a set of spatially referenced documents retrieved from a mobile information system, this set can be presented using alternative interfaces, of which two presently dominate: textual lists and graphical two-dimensional maps. The purpose of this paper is to explore how mixed reality interfaces can be used for the presentation of information on mobile devices.
Design/methodology/approach – A review of relevant literature is followed by a proposed classification of four alternative interfaces. Each interface is the result of a rapid prototyping approach to software development. Some brief evaluation is described, based upon thinking aloud and cognitive walk-through techniques with expert users.
Findings – The most suitable interface for mobile information systems is likely to be user- and task-dependent; however, mixed reality interfaces offer promise in allowing mobile users to make associations between spatially referenced information and the physical world.
Research limitations/implications – Evaluation of these interfaces is limited to a small number of expert evaluators, and does not include a full-scale evaluation with a large number of end users.
Originality/value – The application of mixed reality interfaces to the task of displaying spatially referenced information for mobile individuals.
Keywords Reality, Mobile communication systems, Information systems, Geography
Paper type Research paper

The work presented in this paper is conducted within the LOCUS project, funded by EPSRC through the Pinpoint Faraday Partnership. The authors would also like to thank their partner on the project, GeoInformation Group, Cambridge, for contributing the building geometry and heights data used by the project. The authors are also grateful to Hulya Guzel for her assistance in the expert user evaluation.

1. Introduction
Two of the most significant technological trends of the past 15 years have been the increased portability of computer hardware – such as laptop computers and personal digital assistants (PDAs) – and the increasing availability of wireless networks such as mobile telecommunications and, more recently, wireless access points (Brimicombe and Li, 2006). The convergence of these technological drivers presents opportunities within the emerging field of mobile computing. Increasingly there is ubiquitous access to information stored in a variety of media (for example, text, audio, image and video) via mobile devices with wireless network connections. Advances in software development tools for mobile devices have resulted in the implementation of user-friendly interfaces that aim to appeal to a wide audience of end users. A key challenge for researchers of mobile information systems is to decide which type of interface to adopt when presenting this information on mobile devices. Additionally, developers should assess whether the most suitable interface is dependent upon the audience, the task in hand and the geographic context in which the mobile information system is likely to be used (Jiang and Yao, 2006).

The LOCUS project (LOcation Context tools for UMTS Services), being conducted within the Department of Information Science at City University, is addressing some of the research challenges described above (LOCUS, 2007). The main aim of the project is to enhance the effectiveness of location-based services (LBS) in urban environments by investigating how mixed reality interfaces compare with the current map- and text-based approaches used by the majority of location-based services for the tasks of navigation and wayfinding (Mountain and Liarokapis, 2005). To satisfy this aim, LOCUS is tackling a number of issues including the three-dimensional representation of urban environments, the presentation of spatially referenced information – such as the information retrieved as the result of a user query, and navigational information to specific locations – and advanced visualisation and interaction techniques (Liarokapis et al., 2006). The LOCUS system is built on top of the WebPark mobile client-server architecture (WebPark, 2006), which provides the basic functionality associated with LBS, including the retrieval of information based upon spatial and semantic criteria, and the presentation of this information as a list or on a map (see Figures 1a and b). In common with the majority of LBS, the basic architecture provides no mechanism for the display of information in a three-dimensional environment, such as a mixed reality interface.

Mixed reality environments occupy a spectrum between entirely real environments at one extreme and entirely virtual environments at the other. This mixing of the real and the virtual domain offers great potential for displaying information retrieved as a result of a location-based search, since this requires the presentation of digital information relative to the user's location in the physical world. This presentation may on the one hand be entirely synthetic, for example, placing virtual objects representing individual results within a virtual scene as a backdrop. Alternatively, an augmented reality interface can superimpose this information over the real world scene in the appropriate spatial location from the mobile user's perspective. Both interfaces can present the location of information within the scene as well as navigation tools that describe the routes to the spatial locations associated with retrieved information. The LOCUS project is extending the functionality of the WebPark architecture to allow the presentation of spatially referenced information via these mixed reality interfaces on mobile devices (see Figures 1c and d).

The rest of the paper is structured as follows. First, a review of relevant background literature in mobile computing and mixed reality is presented.
Next, candidate interfaces for information provision on mobile devices are suggested: these include the list, the map, and virtual and augmented reality interfaces. The paper closes with a discussion and conclusions.




Figure 1. Interfaces for presenting information retrieved from a mobile information system

2. Background
2.1 Mobile computing
Just as the evolution of the internet has had a profound impact upon application development, forcing a change from a stand-alone desktop architecture to a more flexible, client-server architecture (Peng and Tsou, 2003), researchers in mobile computing are currently having a similar impact, forcing the development of web resources and applications that can be run on a wider range of devices than traditional desktop machines. According to Peng and Tsou (2003), mobile computing environments have three defining characteristics:
(1) mobile clients that have limited processing and display capacity (e.g. PDAs and smart phones);

(2) non-stationary users who may use their devices whilst on the move; and
(3) wireless connections that are often more volatile, and have more constrained bandwidth, compared to the "fixed" internet.
These three characteristics suggest that mobile devices have both specific constraints and unique opportunities when compared to their desktop counterparts. First, screen real estate is limited; typically screens are small (usually less than 60 mm by 80 mm) with low resolution (typically 240 pixels in width), and a relatively large proportion of this space may be taken up with marginalia such as scroll bars and menus; hence every pixel should be used wisely. Next, the outdoor environment is more unpredictable and dynamic than the typically familiar indoor home and office environments in which desktop machines are used; hence, user attention is more likely to be distracted in the mobile context. Mobile computer usage tends to be characterised by multiple short sessions per day, as compared with desktop usage, which tends to consist of relatively few, longer sessions (Ostrem, 2002). Given these constraints, there is a clear need for information to be communicated concisely and effectively to mobile users.

Despite these constraints, the mobile computing environment offers a unique opportunity for the presentation of information, in particular taking advantage of location sensors to organise information relative to the device user's position, or their spatial behaviour (Mountain and MacFarlane, 2007). While spatial proximity is perhaps the most intuitive and easily calculated measure of geographic relevance, it may not be the most appropriate in all situations, and a variety of other measures of geographic relevance have been suggested (Mountain and MacFarlane, 2007; Raper, 2007). Individuals may be more interested in the relative accessibility of results, which can be quantified by travel time and can take account of natural and man-made boundaries (Golledge and Stimson, 1997) or the transportation network, to discount results that are relatively inaccessible despite being physically close (Mountain, 2005). Geographic relevance can also be quantified in terms of the results most likely to be visited in the future (Brimicombe and Li, 2006), or those that are most visible from the current location (Kray and Kortuem, 2004). However geographic relevance is quantified, there are opportunities to use this property to retrieve documents from document collections. Given a set of spatially referenced results that are deemed to be geographically relevant according to some criterion, there are a variety of different approaches to presenting this information.

Various mobile information systems have been developed. Kirste (1995) developed one of the first experimental mobile information systems based on wireless data communication. A few years later, Afonso et al. (1998) presented an adaptable framework for mobile computing information dissemination systems called UbiData. This model adopts a "push" approach, where relevant information is sent to the user, based upon their location, without them making a specific request. There are now a host of commercial and prototype mobile information systems that can present information dependent upon an individual's semantic and geographic criteria (Yell Group, 2006; WebPark, 2006), the majority of which present results either as a list or over a backdrop map.
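As an illustration of how semantic and geographic relevance might be combined when ordering such results, the sketch below ranks spatially referenced results using a simple distance-decay weighting. This is a minimal, hypothetical example rather than the WebPark or LOCUS implementation: the function names, the half-distance parameter and the exponential decay are all illustrative assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class GeoResult:
    title: str
    lat: float             # WGS84 latitude, degrees
    lon: float             # WGS84 longitude, degrees
    semantic_score: float  # 0..1, e.g. from a text-retrieval engine

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def geographic_relevance(result, user_lat, user_lon, half_distance_m=500.0):
    """Distance-decay weight: 1.0 at the user's position, 0.5 at half_distance_m."""
    d = haversine_m(user_lat, user_lon, result.lat, result.lon)
    return 0.5 ** (d / half_distance_m)

def rank_results(results, user_lat, user_lon):
    """Order results by combined semantic and geographic relevance (list interface)."""
    return sorted(
        results,
        key=lambda r: r.semantic_score * geographic_relevance(r, user_lat, user_lon),
        reverse=True,
    )
```

Because the geographic weighting is kept separate from the text-retrieval score, a distance-based measure could in principle be swapped for a travel-time or visibility measure without changing the rest of the ranking pipeline.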
2.2 Mixed reality




The mixed reality spectrum was proposed by Milgram and Kishino (1994), who depicted representations on a continuum with the tangible, physical ("real") world at one extreme and entirely synthetic virtual reality (VR) at the other. Two classes were identified between these extremes. Augmented reality (AR) refers to virtual information placed within the context of the real world scene, for example, virtual chess pieces on a real chessboard. The second case – augmented virtuality – refers to physical information being placed in a virtual scene, for example, real chess pieces on a virtual board. The resulting reality-virtuality continuum is shown in Figure 2.

The first VR system was introduced in the 1950s (Rheingold, 1991), and since then VR interfaces have taken two approaches:
(1) immersive head-mounted displays (HMDs); and
(2) "through the window" approaches.
HMDs are very effective at blocking the signals from the real world and replacing this natural sensory information with digital information. Navigation within the scene can be controlled by mounting orientation sensors on top of the HMD, a form of gesture computing whereby the user physically turning their head results in a rotation of the viewpoint in the virtual scene. The ergonomic limitations of HMDs proved unpopular with users, and this immersive interface has failed to be taken up on a wide scale (Ghadirian and Bishop, 2002). In contrast to HMDs, the "through the window" (Bodum, 2005) – or monitor-based VR/AR – approach exploits monitors on desktop machines to visualise the virtual scene, a far less immersive approach since the user is not physically cut off from the physical world around them. This simpler form of visualisation has the advantage that it is cost-effective (Azuma, 1997). Interaction is usually realised via standard input/output (I/O) devices such as the mouse or the keyboard, but more sophisticated devices (such as a spacemouse or inertia cube) may also be employed (Liarokapis, 2005).

Both HMD and through-the-window approaches to VR aim to replace the physical world with the virtual. The distinction of AR is that it aims to seamlessly combine real and virtual information (Tamura and Katayama, 1999) by superimposing digital information directly into a user's sensory perception (Feiner, 2002) (see Figure 3). Whilst VR and AR can process and display similar information (for example, three-dimensional buildings), the combination of the "real" and the "virtual" in the AR case is inherently more complex than the closed virtual worlds of VR systems. This combination of real and virtual requires accurate tracking of the location of the user (in three spatial dimensions: x, y and z) and the orientation of their view (around three axes of orientation: yaw, pitch and roll), in order to be able to superimpose digital information at the appropriate location with respect to the real world scene, a procedure known as registration. In the past few years, research has achieved great advances in tracking, display and interaction technologies, which can improve the effectiveness of AR systems (Liarokapis, 2005).

Figure 2. The reality-virtuality continuum

Figure 3. Augmented reality representation: a computer vision sensor recognises the doorway outline and augments the video stream with virtual information (the direction arrow). Developed as part of the LOCUS project

The required accuracy of the AR tracking depends to a degree upon the scenario of use. In order to correctly superimpose an alternative building façade (for example, a historic or planned façade) over an existing building, highly accurate tracking is required in terms of position and orientation, or else the illusion will fail since the real and virtual façades will not align, or may drift apart as the user moves or turns their head (Hallaway et al., 2004). However, if the real world scene is simply being augmented with annotations in the form of text or symbols, for example, an arrow indicating the direction to turn at an upcoming junction, the tracking need not be so accurate.

The two most common tracking techniques used in AR applications are computer vision and external sensor systems. The visual approach uses fiducial reference points, where a specific number of locations act as links between the real and virtual scenes (Hallaway et al., 2004). These locations are usually marked with distinctive high-contrast markers to assist identification, but can alternatively be distinctive landmarks within the real-world scene. Computer vision algorithms first need to identify at least three reference points in real time from a video camera input, then calculate the distance and orientation of the camera with respect to those reference points. Tracking using a computer vision-based system therefore establishes a relative spatial relationship between a finite number of locations in the real-world scene and the observer, via a video camera carried or worn by that observer (Hallaway et al., 2004), which can allow very accurate registration between the real and virtual scenes in a well-lit indoor environment. This computer vision approach nevertheless has significant constraints. First, the system must be trained to identify the fiducial reference points, and may further require the real-world scene to have markers placed within it. It requires both good lighting conditions (although infrared cameras can also be used for night vision) and significant computing resources to perform real-time tracking, and therefore has usually been applied in indoor, desktop environments (Liarokapis and Brujic-Okretic, 2006).

An alternative to the vision-based approach is to use external sensors to determine the position of the user and the orientation of their view.



Positioning sensors such as the Global Positioning System (GPS) can determine position in three dimensions, and digital compasses, gyroscopes and accelerometers can be employed to determine the orientation of the user's view. These sensor-based approaches have the advantage that they are not constrained to specific locations, unlike computer vision algorithms, which must be trained to recognise specific reference points within a scene. Also, the user's location is known with respect to an external spatial referencing system, rather than as a relative relationship between the user and specific reference points. A major disadvantage is the accuracy of the positioning systems, which can produce errors measured in tens of metres and hence poor results when attempting to augment the real-world scene with virtual information. While advances in GPS such as differential GPS and real-time kinematic GPS can bring the accuracy down to one metre and a few centimetres respectively, GPS receivers still struggle to attain a positional fix where there is no clear view of the sky, for example indoors. Digital compasses also have limitations; the main flaw is that they are prone to environmental factors such as magnetic fields.

Having identified a spatial relationship between the real-world scene and the user's location, virtual information needs somehow to be superimposed upon the real world scene. Traditionally there have been two approaches to achieving this:
(1) video see-through displays; and
(2) optical see-through displays.
Video see-through displays comprise a graphics system, a video camera, a monitor and a video combiner (Azuma, 1997). They operate by combining an HMD with a video camera. The video camera records the real environment and sends the recorded video to the graphics system for processing, where the video and the images generated by the graphics system are blended together. Finally, the user perceives the augmented view in the closed-view display. Optical see-through displays, in the alternative approach, usually comprise a graphics system, a monitor and an optical combiner (Azuma, 1997). They work by placing the optical combiner in front of the user's view. The main characteristic of optical combiners is that they are partially transmissive and partially reflective: the combiners operate like half-silvered mirrors, permitting only a portion of the light to penetrate. As a result, the intensity of the light that the user finally sees is reduced.

A novel approach to augmenting the real-world scene with virtual information, emerging from within the field of mobile computing, is to use the screen of a handheld device as a virtual window on the physical world. Knowing the position and orientation of the device, the information displayed on screen can respond to the movements and gestures of a mobile individual, for example, presenting the name of a building as text on the screen when a user points their mobile device at it, or updating navigational instructions via symbols or text as a user traverses a route.

MARS is one of the first outdoor AR systems and a characteristic example of a wireless mobile interface system for indoor and outdoor applications. MARS was developed to aid navigation and to deliver location-based information to tourists in a city (Höllerer et al., 1999).
The user stands in an outdoor environment wearing a prototype system consisting of a backpack computer, a GPS system, a tracked see-through head-worn display and a stylus-operated computer; interaction is via the stylus, and display is via the head-worn display.

MARS, like most current mobile AR systems, has significant ergonomic restrictions which stretch the definition of mobile and wearable computing beyond what is acceptable for most users (the system is driven by a computer contained in a backpack). Tinmith-Hand AR/VR is a unified interface technology designed to support outdoor mobile AR applications and indoor VR applications (Piekarski and Thomas, 2002). This system employs various techniques, including 3D interaction techniques, modelling techniques, tracked input gloves and a menu control system, in order to build VR/AR applications that can be used to construct complex models of objects in both indoor and outdoor environments. A location-based application designed for a mobile AR system is ARLib, which aims to assist the user in typical tasks performed within a library environment (Umlauf et al., 2002). The system follows a wide-area tracking approach (Hedley et al., 2002) based on fiducial registration. Many distinct markers are attached to bookshelves and walls so that the books' positions are superimposed on the shelves as the user navigates inside the library. To provide extra support to the user, a simple interface and a search engine are integrated to provide maximum usability and speed during book searches.

3. Interfaces for mobile information systems
There are many candidate interfaces for the presentation of the results of an information retrieval query on mobile devices (Mannings and Pearson, 2003; Schofield and Kubin, 2002; Mountain and Liarokapis, 2005). This section describes how the interfaces described previously can be applied to the task of presenting information retrieved as the result of a mobile query. As described in the introduction, the LOCUS project has developed alternative, mixed reality interfaces for existing mobile information system technology based upon the WebPark platform. The WebPark platform can assist users in formulating spatially referenced, mobile queries. The retrieved set of spatially referenced results can then be displayed using various alternative interfaces: a list, a map, virtual reality or augmented reality. Each interface is described in more detail in the rest of this section.

3.1 List interface
The most familiar interface for the presentation of the results of an information retrieval query is a list; this is the approach taken by the majority of internet search engines, where the most relevant result is placed at the top of the list, with relevance decreasing further down the list (see Figure 1a). In the domain of location-aware computing, results that are deemed to be particularly geographically relevant (Mountain and MacFarlane, 2007; Raper, 2007) will be presented higher up the list (Google, 2006; WebPark, 2006). While familiar, this approach of simply ordering the results does not convey their location relative to the user's current position.

3.2 Map interface
The current paradigm in the field of LBS is to present information relevant to an individual's query or task over a backdrop map (see Figure 1b). This information may include the individual's current position (and additionally some representation of the spatial accuracy), the locations of features of interest that were retrieved as the result of a user query (e.g. the results from a "find my nearest" search), or navigation information such as a route to be followed.




This graphical approach has the advantage of displaying the direction and distance of results relative to the user's location (a vector value), as opposed to just an ordering of results based on distance. The viewpoint is generally allocentric (Klatzky, 1998), adopting a bird's eye view looking straight down on a flat, two-dimensional scene (see Figure 1c). The backdrop contextual map used is usually an abstract representation and may display terrain, points or regions of interest, transportation links, or other information; alternatively, a degree of realism can be included by using aerial photography (WebPark, 2006; Google, 2006).

3.3 VR interface
An alternative to the allocentric viewpoint of a two-dimensional, abstract scene is to choose an egocentric viewpoint within a three-dimensional scene (see Figure 4). Such a perspective is familiar from VR, discussed in section 2.2. While the concept of VR has existed for many decades, only during the past few years has it been used on handheld mobile devices. Traditionally, VR applications have been deployed on desktop devices and have attempted to create realistic-looking models of environments to promote a feeling of immersion within a virtual scene. This has resulted in less opportunity for individuals to compare the virtual scene with its real-world counterpart. This separation of the real and the virtual is due in part to the static nature of desktop devices, and in addition to the fact that the appeal of many virtual scenes is that they allow the viewing of locations that cannot be visited easily, for example, virtual fly-throughs of other planets (NASA Jet Propulsion Laboratory, 2006) and imagined landscapes (Elf World, 2006). In a location-aware, mobile computing context, the position of the user's viewpoint within a VR scene can be controlled from an external location sensor such as GPS, and the orientation of the viewpoint can be controlled by sensing the direction of movement (from the GPS heading), or by an orientation sensor that gauges the direction an individual is facing (e.g. a digital compass).

The VR scenes themselves can adopt different levels of detail and realism (Bodum, 2005). A particular building may be represented with an exact three-dimensional geometric representation, and graphics added as textures to the façades of the building to create as true a representation as possible – known as a verisimilar representation (Bodum, 2005). Alternatively, the building may be modelled with a generalised approximation of the geometry within specific tolerances. For texturing the building façades, generic images may be applied that are typical of that class of building. The building block can also be left untextured, with more abstract information conveyed using shading, icons, symbols or text (Bodum, 2005). The level of detail and realism required by different users for different tasks is an open question currently under investigation (Liarokapis and Brujic-Okretic, 2006).

Traditionally, for VR applications deployed in a static, desktop context, there has been greater emphasis placed upon scenes looking realistic than on ensuring that the content of these scenes is spatially referenced. However, in a mobile context, accurate spatial referencing of VR scenes is required when setting the viewpoint within that scene (using position and orientation sensors) to ensure that the viewpoint in the virtual scene is registered accurately with the user's location in the real world scene.
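To make this sensor-driven viewpoint concrete, the sketch below shows one way a GPS fix and a compass heading might be mapped onto a camera pose in a local scene coordinate system. It is a minimal sketch under simplifying assumptions – a flat local tangent plane and illustrative function names – and is not the LOCUS rendering code.

```python
import math

EARTH_RADIUS_M = 6371000.0

def latlon_to_local_xy(lat, lon, origin_lat, origin_lon):
    """Convert a WGS84 position to metres east (x) and north (y) of the scene origin.
    A simple tangent-plane approximation, adequate for neighbourhood-scale scenes."""
    x = math.radians(lon - origin_lon) * EARTH_RADIUS_M * math.cos(math.radians(origin_lat))
    y = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    return x, y

def camera_pose(gps_lat, gps_lon, compass_heading_deg, origin_lat, origin_lon,
                eye_height_m=1.7, oblique_offset_m=0.0):
    """Return a camera pose (x, y, z, yaw_deg) for the virtual scene.

    Position comes from the GPS fix projected into scene coordinates; yaw comes
    from the digital compass (or the GPS heading while moving). Setting
    oblique_offset_m > 0 raises the viewpoint above the user for an oblique,
    part-way-to-allocentric perspective.
    """
    x, y = latlon_to_local_xy(gps_lat, gps_lon, origin_lat, origin_lon)
    return x, y, eye_height_m + oblique_offset_m, compass_heading_deg % 360.0
```

A real system would also need to smooth noisy GPS and compass readings before updating the viewpoint, otherwise the virtual camera would jitter as the sensor values fluctuate.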
Realism is still important since this can help the user make associations between objects in the virtual scene and those in the real world. For the applications developed as part of the LOCUS project, within this VR backdrop, additional, non-realistic visual information can be included to augment the scene. Such information can include nodes representing documents retrieved from a spatially referenced document collection (see Figure 1c), or navigational information and instructions (i.e. 3D textual directions).

Figure 4. A virtual representation of a London neighbourhood

This approach has the advantage of promoting a feeling of immersion, and creating a stronger association between the physical world and relevant geo-referenced information, but is potentially less effective than a map in providing a quick synopsis of larger volumes of information relative to the user's location. There are opportunities to adopt multiple viewpoints within the VR scene that fall between the extremes of the allocentric-egocentric spectrum, for example an oblique perspective several metres higher than the user's viewpoint (see Figure 4).

3.4 AR interface
A fourth approach to the display of information in mobile computing is to use the device to merge the real world scene with relevant, spatially referenced information by using an AR interface – the virtual window approach described in section 2.2. Just as for the mobile VR case described above, knowing the location and orientation of the device is an essential requirement for outdoor AR, in order to superimpose information in the correct location. As described in the literature review, a GPS receiver and digital compass can provide sufficient accuracy for displaying points of interest in the approximate location relative to the user's position. At present, however, these sensor solutions lack the accuracy required for more advanced AR functionality, such as aligning an alternative façade on the front of a building in the real world scene. In the LOCUS system, the handheld mobile device presents text, symbols and annotations in response to the location and orientation of the device. There is no need for an HMD, since the screen on the device can be aligned with the real world scene. On the screen of the device, information can either be overlaid on imagery captured from the device's internal camera, or the screen can display just the virtual information with the user viewing the real world scene directly. The information displayed is dependent upon the task in hand. When viewing a set of results, as the user pans the device around them, the name and distance of each result is displayed in turn as it coincides with the direction that the user is pointing the device, allowing the user to interrogate the real world scene by gesturing. By adopting an egocentric perspective to combine real and virtual information in this way, users of the system can base their decisions about which location to visit not only on more quantifiable criteria – such as the distance to a particular result and its relevance on semantic criteria – but also on the more subjective criteria that could never be quantified by an information system. For example, following a mobile search for places to eat conducted at a crossroads, by gesturing with a mobile device, users can see the distance and direction of candidate restaurants, and make an assessment based upon the ambience of the streets upon which different restaurants are located. Having selected a particular result from the list of candidates, the AR interface can then provide navigational information, in the form of distance and direction annotations (see Figure 1d), to guide the user to the location associated with those results.
Although most examples from location-based services suggest a "where's my nearest" shop or service, there is no reason why this information could not be the location of breaking news stories from a news website, or spatially referenced HTML pages providing historical information associated with a particular era or event.
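The behaviour of interrogating the scene by pointing the device reduces to computing the distance and bearing from the user's position to each spatially referenced result, and testing whether that bearing falls within some angular tolerance of the compass heading. The sketch below illustrates only that geometry; the 15-degree tolerance, the field names and the rounding are assumptions for the example and are not taken from the LOCUS system.

import math

EARTH_RADIUS_M = 6371000.0

def distance_and_bearing(lat1, lon1, lat2, lon2):
    # Haversine distance in metres and initial bearing in degrees from north.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    bearing = math.degrees(math.atan2(
        math.sin(dlam) * math.cos(phi2),
        math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)))
    return distance, bearing % 360.0

def results_in_view(user_lat, user_lon, compass_deg, results, tolerance_deg=15.0):
    # Return the name, distance and bearing of each result whose bearing lies
    # within the angular tolerance of the direction the device is pointing.
    visible = []
    for result in results:
        dist, brg = distance_and_bearing(user_lat, user_lon, result["lat"], result["lon"])
        offset = abs(((brg - compass_deg + 180.0) % 360.0) - 180.0)
        if offset <= tolerance_deg:
            visible.append((result["name"], round(dist), round(brg)))
    return visible

In a running interface, the returned names and distances would then be rendered as the text and annotations overlaid on the camera image, or shown against the directly viewed scene.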

4. Discussion
An evaluation exercise was undertaken to assess appropriate levels of detail, realism and interaction for the mobile virtual reality interface. Whilst there has been extensive evaluation of these requirements in a static desktop context (Döllner, 2005), relatively little attention has been paid to the specific needs of mobile users. In order to gauge these specific requirements, an expert evaluation was conducted. Two common evaluation techniques were applied: (1) think aloud; and (2) cognitive walkthrough (Dix et al., 2004). Think aloud is a form of observation that involves participants talking through the actions they are performing, and what they believe to be happening, whilst interacting with a system. The cognitive walkthrough technique was also used, where a prototype of a mobile VR application and scenario of use were presented to expert users: evaluating in this way allows fast assessment of early mock-ups, and hence can influence subsequent development and the suitability of the final application. Both forms of evaluation are appropriate for small numbers of participants testing prototype software, and it has been suggested that the majority of usability problems can be discovered from testing in this way (Dix et al., 2004). The expert user testing took place at City University with a total of four users with varied backgrounds: one human-computer interaction expert, one information visualization expert, one information retrieval expert and one geographic information scientist. Each user spent approximately one hour performing four tasks. The aims of the evaluation of the VR prototype included assessment of the expert user experience with particular focus on:
. the degree of realism required in the scene;
. the required spatial accuracy and level of detail of the building outlines; and
. a comparison of 3D virtual scenes with 2D paper maps.
A virtual reality scene was created of the University campus and surrounding area, and viewpoints placed to describe trajectories of movement through the scene. The expert-evaluation process covered two tasks: mobile search and navigation. The first scenario was in relation to searching for, then locating, specific features. For example, a user searching on a mobile system for entrances to the City University campus from a nearby station. The second scenario was in relation to navigation from one point to another, for example, from the station to the University. Starting and target locations were marked in the 3D maps, and sequences of viewpoints were presented, to mimic movement through the scene. There was a great deal of variation in terms of the level of photorealism required in the scene, and whether buildings should have image textures placed over the building faces, or whether the building outlines would be sufficient alone. Opinions varied between evaluators and according to the task in hand. Plain, untextured buildings are hard to distinguish from each other and, in contrast, buildings with realistic textures were considered easy to recognise in a micro-scale navigation context (for example, trying to find the entrance to a particular building). However, many evaluators thought that much of this realism would not be required or visible on a small screen device when an overview of the area was required, for example, when considering one's present location in relation to information retrieved from a mobile search.


Expert users also suggested various departures from the realism traditionally aspired to within the field of virtual reality. These included transparency, to allow users to see through buildings as an aid to navigation, since this will allow the identification of the location of a concealed destination point. Other suggestions included labelling of objects in the scene (for example, building and street names). The inclusion of symbology in the scene to represent points, and routes to those points, was considered to be beneficial to the task of navigation. In terms of the level of detail and spatial accuracy, some users thought that it was not important to have very detailed models of building geometry. Building outlines that are roughly the right size and shape are sufficient, especially when considering an overview of an area, as often required in the mobile search task. For micro-navigation, a higher degree of accuracy may be required. Virtual 3D scenes were found to have many advantages when compared to paper maps: the most positive feature was found to be the possibility to recognize the features in the surrounding environment, which provides a link between the real and virtual worlds. This removes the need to map-read, which is required when attempting to link one's position in the real world with a 2D map; hence the VR interface offers an effective way to gauge one's initial position and orientation. A more intangible response was that the majority of the users enjoyed interacting with the VR interface more than a 2D map. However, the 3D interface also has significant drawbacks. Some users said that they are so used to using 2D maps that they do not really need a 3D map for navigating; however, they thought this attitude may change with the next generation. The size, resolution and contrast of the device screen were also highlighted as potential problems for the VR interface.

5. Conclusions
This paper has presented some insights into how mixed reality interfaces can be used in conjunction with mobile information systems to enhance the user experience. We have explored how the LOCUS project has extended LBS through different interfaces to aid the tasks of urban navigation and wayfinding. In particular, we have described how virtual and augmented reality interfaces can be used in place of text- and map-based interfaces, providing an egocentric perspective on location-based information that is lacking from map- and text-based representations. Expert user evaluation has proven to be a useful technique to aid development, and suggests that the most suitable interface is likely to vary according to the user and task in hand. Continued research, development and evaluation are required to provide increasingly intuitive interfaces for location-based services that can allow users to make associations between spatially referenced information retrieved from mobile information systems, and their location in the physical world.

References
Afonso, A.P., Regateiro, F.S. and Silva, M.J. (1998), "UbiData: an adaptable framework for information dissemination to mobile users", Object Oriented Technology, ECOOP'98 Workshop on Mobility and Replication, Brussels, July 20-24, p. 1543.
Azuma, R. (1997), "A survey of augmented reality", Presence: Teleoperators and Virtual Environments, Vol. 6 No. 4, pp. 355-85.

Bodum, L. (2005), "Modelling virtual environments for geovisualization: a focus on representation", in Dykes, J.A., Kraak, M.J. and MacEachren, A.M. (Eds), Exploring Geovisualization, Elsevier, London, pp. 389-402.
Brimicombe, A. and Li, Y. (2006), "Mobile space-time envelopes for location-based services", Transactions in GIS, Vol. 10 No. 1, pp. 5-23.
Dix, A., Finlay, J.E., Abowd, G.D. and Beale, R. (2004), Human-Computer Interaction, Prentice-Hall, Harlow.
Döllner, J. (2005), "Geovisualization and real-time computer graphics", in Dykes, J.A., Kraak, M.J. and MacEachren, A.M. (Eds), Exploring Geovisualization, Elsevier, London, pp. 325-44.
Elf World (2006), "Elven Forest, 3D", available at: www.allelves.ru/forest/ (accessed 12 December 2006).
Feiner, S.K. (2002), "Augmented reality: a new way of seeing", Scientific American, Vol. 4 No. 24, pp. 48-55.
Ghadirian, P. and Bishop, I.D. (2002), "Composition of augmented reality and GIS to visualise environmental changes", Proceedings of the Joint AURISA and Institution of Surveyors Conference, Adelaide, 25-30 November.
Golledge, R.G. and Stimson, R.J. (1997), Spatial Behaviour: A Geographic Perspective, The Guildford Press, New York, NY.
Google (2006), Google Local, available at: http://local.google.co.uk/ (accessed 10 December 2006).
Hallaway, D., Höllerer, T. and Feiner, S. (2004), "Bridging the gaps: hybrid tracking for adaptive mobile augmented reality", Applied Artificial Intelligence, Vol. 18 No. 6, pp. 477-500.
Hedley, N.R., Billinghurst, M., Postner, L., May, R. and Kato, H. (2002), "Explorations in the use of augmented reality for geographic visualization", Presence, Vol. 11 No. 2, pp. 119-33.
Höllerer, T., Feiner, S.K., Terauchi, T., Rashid, G. and Hallaway, D. (1999), "Exploring MARS: developing indoor and outdoor user interfaces to a mobile augmented reality system", Computers and Graphics, Vol. 23 No. 6, pp. 779-85.
Jiang, B. and Yao, X. (2006), "Location-based services and GIS in perspective", Computers, Environment and Urban Systems, Vol. 30 No. 6, pp. 712-25.
Kirste, T. (1995), "An infrastructure for mobile information systems based on a fragmented object model", Distributed Systems Engineering Journal, Vol. 2 No. 3, pp. 161-70.
Klatzky, R.L. (1998), "Allocentric and egocentric spatial representations: definitions, distinctions, and interconnections", in Freksa, C., Habel, C. and Wender, K.F. (Eds), Spatial Cognition – An Interdisciplinary Approach to Representation and Processing of Spatial Knowledge, Springer, Berlin, pp. 1-18.
Kray, C. and Kortuem, G. (2004), "Interactive positioning based on object visibility", in Brewster, S. and Dunlop, M. (Eds), Mobile Human-Computer Interaction, Springer, Berlin, pp. 276-87.
Liarokapis, F. (2005), "Augmented reality interfaces – architectures for visualising and interacting with virtual information", DPhil thesis, Department of Informatics, School of Science and Technology, University of Sussex, Brighton.
Liarokapis, F. and Brujic-Okretic, V. (2006), "Location-based mixed reality for mobile information services", Advanced Imaging, Vol. 21 No. 4, pp. 22-5.
Liarokapis, F., Mountain, D., Papakonstantinou, S., Brujic-Okretic, V. and Raper, J. (2006), "Mixed reality for exploring urban environments", Proceedings of the 1st International Conference on Computer Graphics Theory and Applications, Setúbal, 25-28 February, pp. 208-15.
LOCUS (2007), Homepage, available at: www.locus.org.uk (accessed 22 January 2007).


Mannings, R. and Pearson, I. (2003), "'Virtual air': a novel way to consider and exploit location-based services with augmented reality", Journal of the Communications Network, Vol. 2 No. 1, pp. 29-33.
Milgram, P. and Kishino, F. (1994), "A taxonomy of mixed reality visual displays", IEICE Transactions on Information Systems E Series D, Vol. 77 No. 12, pp. 1321-9.
Milgram, P., Takemura, H., Utsumi, A. and Kishino, F. (1994), "Augmented reality: a class of displays on the reality-virtuality continuum", Telemanipulator and Telepresence Technologies, Vol. 2351, pp. 282-92.
Mountain, D.M. (2005), "Exploring mobile trajectories: an investigation of individual spatial behaviour and geographic filters for information retrieval", PhD thesis, Department of Information Science, City University London, London.
Mountain, D.M. and Liarokapis, F. (2005), "Interacting with virtual reality scenes on mobile devices", Human Computer Interaction with Mobile Devices and Services, University of Salzburg, Salzburg, 19-22 September.
Mountain, D.M. and MacFarlane, A. (2007), "Geographic information retrieval in a mobile environment: evaluating the needs of mobile individuals", Journal of Information Science, forthcoming.
NASA Jet Propulsion Laboratory (2006), "Lander and Rover on Mars", available at: http://mars.sgi.com/worlds/pathfinder/pathfinder.html (accessed 12 December 2006).
Ostrem, J. (2002), "Palm OS user interface guidelines", available at: www.palmos.com/dev/support/docs/ui/UIGuide_Front.html (accessed 10 April 2006).
Peng, Z.-R. and Tsou, M.-H. (2003), Internet GIS: Distributed Geographic Information Services for the Internet and Wireless Networks, Wiley, New York, NY.
Piekarski, W. and Thomas, B.H. (2002), Unifying Augmented Reality and Virtual Reality User Interfaces, University of South Australia, Adelaide.
Raper, J.F. (2007), "Geographic relevance", Journal of Documentation, forthcoming.
Rheingold, H.R. (1991), Virtual Reality, Summit Books, New York, NY.
Schofield, E. and Kubin, G. (2002), "On interfaces for mobile information retrieval", Lecture Notes in Computer Science, Vol. 2411, pp. 383-7.
Tamura, H. and Katayama, A. (1999), "Steps toward seamless mixed reality", in Ohta, Y. and Tamura, H. (Eds), Mixed Reality: Merging Real and Virtual Worlds, Ohmsha, Tokyo, pp. 59-84.
Umlauf, E., Piringer, H., Reitmayr, G. and Schmalstieg, D. (2002), "ARLib: the augmented library", Proceedings of the First IEEE International Augmented Reality ToolKit Workshop, Darmstadt.
WebPark (2006), "Geographically relevant information for mobile users in protected areas", available at: www.webparkservices.info (accessed 12 December 2006).
Yell Group (2006), The UK's Local Search Engine, available at: www.yell.com (accessed 12 December 2006).
Corresponding author
David Mountain can be contacted at: [email protected]



Information history: its importance, relevance and future Toni Weller Department of Information Science, City University London, London, UK

Received 28 September 2006 Accepted 10 June 2007

Abstract Purpose – The purpose of this paper is to explore the emergent field of information history (IH) and to move towards a definition of IH. Some of the more traditional historical approaches to information science are challenged in their claims to be information history. Design/methodology/approach – The historiography of the field is discussed, and an analysis of the continuing development of IH is explored. Findings – IH is a field that has been attracting increasing attention in recent years from historians and information scientists alike. Although still a relatively young area, this paper argues that IH has the potential to develop into a highly relevant and dynamic field of research. The paper concludes with a look at the future for this area of research, with some suggestions as to how IH needs to develop in order to gain the credence and recognition it deserves. Originality/value – This paper attempts to augment the debate on IH and to encourage a broader recognition of this young and dynamic field within LIS. Keywords History, Information science, Library studies, Curricula Paper type Conceptual paper

Defining information history
Information history is a distinct form of historical study in its own right, which looks at the role of information within past societies. It is grounded in historical evidence and it operates within the structures of historical research. It is history. But it also adopts ideologies from information science. The fundamental questions of the LIS discipline have always been related to information in society, politics, economics, the media, and business, how people understand information, and how it is transformed by changes in society. When these conceptual questions are allied to a rigorous historical methodology, something quite distinct and powerful emerges. This is information history: the historical study of information for its own sake. Information science as a discipline has existed for perhaps 50 or 60 years. Information has existed for centuries, millennia. As long as there have been records of human existence, there has been some form of information. Information history is thus the study of the relationship between humanity and its different forms of knowledge – of recording, manifesting, disseminating, preserving, collecting, using, and understanding information. Of how these issues affected, and were affected by, social, economic and political developments. It is not just concerned with information technologies, but with all aspects of an information society – in the most catholic sense of the term. This is a slightly different emphasis than that taken by Alistair Black, the strongest advocate of information history to date. While Black has rightly always stressed the

importance of context and rigorous historical research, for him, information history strongly encompasses the history of the library. Black has comprehensively examined the range and scope of library history, showing that there have been academic and professional bodies and publications dedicated to it since the 1960s (Black, 1997). He has examined the cultural and socio-political role of the library (Black, 2001), including a fascinating insight into the use of the library as a means of controlling and eradicating social "diseases" such as deviancy, disorder, and poor discipline (Black, 2005). Black has also discussed the history of information, but this has tended to be manifested as information societies throughout history, or the history of information management, focusing on the modern period post-1800 (Black, 1997, 2004; Black and Brunt, 1999, 2000), rather than the history of information itself. More recently, Black (2006) has suggested that information history could be used almost as an umbrella term for the distinct and respected areas of book history, library history, the histories of information management, information societies, infrastructures and systems. This is certainly one way to make sense of an abstract research field. However, it does not do justice to information history, which is more than simply the sum of its parts. Information history has the potential to be a field in its own right, despite being difficult to define. Its essence is not so much the definition of what information is or was, but rather, the way in which information is or has been thought of and applied in its own right. And, even more so, how information, in this particular sense, affected, and is or was affected by, the social, political, economic and cultural climates of the time. Developments in the LIS field often tend to make connections between library and information history – indeed, LIH is a recognised term. Library history is a significant topic, but it is not synonymous with information history, nor does information history have to focus on the library in any way to be valid in its own right. Library history has a huge scope itself, covering libraries as institutions, as cultural objects, as centres for the preservation of human knowledge, or as socio-political influences in society. Other fascinating topics of library history include the histories of individual librarians or the changing techniques for the control and access of information. But the material point is that library history – in whatever manifestation you choose – is examining a physical entity, a tangible body. Information history can be much more conceptual and abstract. Information is intangible and multifaceted. This is why much discussion of information history has broken it down into the more manageable histories of technology, the book, the library, and so forth. However, doing so underrates the most fascinating area of this field. Precisely because information is so intangible, and so enduring, its manifestation for each generation tells us much about that society's attitude towards control, culture, politics, knowledge and education. Is the way we examine information in the twenty-first century so unique? How does technology alter the way we interact with information? How does socio-political information affect everyday perceptions of reality? To what extent does access and control of information dominate social status? How has the value of information changed?
To help us understand our fascination with information in the twenty-first century, we must examine the longer-term historical trends and relationships between society and information. This may sound very abstract and ambiguous, and indeed one of the recurring criticisms of information history is that information is too vast a concept to define and examine. But, it is precisely this ambiguity which gives the field such potential depth. One example of this is the post-modern historical approach which argues that there is a history for everyone and everything should we choose to find it. This approach also favours the "everyday" person and experience, rather than the "great men" of history that was so favoured during the nineteenth century and the first half of the twentieth century. By adopting post-modernism in this way, it can be argued that information history research is exploring the everyday experience and attitudes of people. There have been historical explorations of such abstract notions as laughter and fear, for example Gatrell's (2006) history of English humour and laughter and Burke's (2005) Cultural History of Fear. These histories have examined fear and laughter in their own right, the relationships they have with individuals and society and the reciprocal effect socio-political context has upon them. Exactly the same argument is being proposed here for information history. These abstract and ambiguous themes of fear and laughter are considered valid history, so why not information as well? Because information can be understood in multiple ways, it presents a diverse field of enquiry:
. the economic history of information (the cost of collecting, disseminating, controlling, and knowing the right information at the right time);
. the political history of information (propaganda, surveillance, rise of the state, power);
. the social history of information (censorship, education, autodidacticism, social status); and
. the cultural history of information (literature, art, societies, museums).
Or, is the history of information different in a democratic country than in a state with communist traditions, or a developing nation? Broader ideologies can also be applied; perhaps a Marxist history of information which understands it purely in terms of an economic product and process, or a feminist history of information examining the differences in what types of information women have been encouraged or discouraged from pursuing historically – are innate gender issues involved? (Incidentally, there has been some fascinating work by Moody (2006) on the notion of gender behaviour in cyberspace and how the internet remains a predominantly male construction.) By looking at society's understanding and application of information in its past manifestations, we are able to meditate on the present, and the future. Interest in information is not a recent or transient phenomenon. Our contemporary interest in information should be understood in its historical context in order for us to appreciate the depth and longevity of the fundamental issues with which we are dealing today. Black (1998, p. 40) has argued that:
In terms of period, the focus of information history would surely be that of the age of modernity.

Although traditionally the modern period is understood by historians to begin from the late eighteenth century, Black suggests that in terms of information "the age of modernity" could stretch from the early modern period of the fourteenth to the sixteenth century.
Most work in this field to date has indeed focused on the period since 1750 (Black, 1998; Agar, 2003; Headrick, 2000; Weller and Bawden, 2005, 2006), although there have been some attempts at presenting more of a grand narrative, such as Edward Higgs's (2004) ambitious study of the development of the information state over five centuries from 1500 to the present day. However, as Weller has argued, there is no reason why it should be limited to modernity (Weller, 2005, pp. 273, 276-7). Information history is not limited to modernity since information has always been part of human society, right back to ancient times; indeed, a society cannot exist without information. To understand and study it in multiple contemporary and contextual terms gives us a hugely rich and intriguing field of research. This is not to suggest that all information historians study all periods; quite clearly this would be both unworkable and unrealistic. There are, of course, obvious practical limitations of evidence and sources the further back you go, but historians understand the sources and documents of their periods. They understand where documents can be found, the limitations of evidence and the weight of certain evidence over others. Just as there are specialist political, cultural or economic historians of differing eras who understand the contextual subtleties of their period, why not ultimately also specialist medieval, renaissance, or modern information historians? If information history is defined through the study of the origins of the information society, or the tools and processes for collecting information, then perhaps modernity would be a more appropriate period. However, if it is defined, as suggested here, as the study of information in its own right, of its relationship with, and impact upon, society, then this restriction surely no longer applies. Information transcends period.

Historiography
Information scientists are not the only ones interested in information history. There has been a great deal of movement in this direction by historical scholars in recent years. Quite aside from scholarly research and monographs in this area, the Institute of Historical Research in London now runs regular seminars on knowledge in society, and the annual international conference, started in 2002, on The Future Direction of the Humanities continually has a theme on "The stuff of knowledge in a 'knowledge society' or 'knowledge economy'". For a subject that has only seriously been considered by scholars for the last decade or so, it already has a striking bibliography. Early forms of information history used variations of the terms "historical information science" or "historical informatics", which focused on the history of the discipline or a particular technology, or even the application of digital technologies to the study of history (Marvin, 1987; Karvalics, 1994, 2002; Warner, 2000; Boonstra et al., 2004). These were not good history though, since they studied LIS themes without a historical methodology. As contemporary interest in the information society and information technologies grew, historians who traditionally focused on the history of science and technology began to apply these ideas to the notion of the "information state". Higgs (2004) recently published a book on the Information State in England Since 1500, arguing that the state has long collected data on its citizens in some form; it is only the technological processes that have changed over time.
Agar (2003) has also looked at the process of government data collecting since the nineteenth century, suggesting that the development of the civil service and of the computer are strongly linked. The mechanics of government data collection and processing in the late eighteenth and early nineteenth centuries were examined by David Eastwood (Eastwood, 1989) as early as 1989. James Beniger's key work is on the origins of the information society brought about by the technological and economic crisis of control during American industrialisation in the nineteenth century (Beniger, 1986). In the last few months Oz Frankel has published a book on the processes of state enquiry in the nineteenth century, in which he argues that through the new government enquiries and committees of this period, which were then published and circulated to the public, citizens assumed the standing of informants and readers (Frankel, 2006). Another trend has been to trace the histories of technologies used to disseminate or produce information, such as Elizabeth Eisenstein's analysis of the printing press as an agent of change (Eisenstein, 1979), or Brian Winston's account of the telegraph and telephone (Winston, 1998). Daniel Headrick has written about the pre-mechanised knowledge technologies from 1700 to 1850 (Headrick, 2000), including the rise of cartography, statistics, graphs and dictionaries – technologies that served to organise, display, store, or communicate knowledge. There have also been some attempts at social histories of information, such as Burke's (2000) social history of knowledge, or Briggs's (1977) social history of the telephone, although these are still limited by a view of information technology. For Briggs, it is obviously the telephone; for Burke, the printing press and distribution. In the last few years there has been more explicit discussion of information history as an area of research in its own right, stressing the importance of rigorous historical methodology. Although Alistair Black often associates information history with library history, he has done a huge amount for the recognition of this field. His research has ranged from work on surveillance, bureaucracy and public librarianship in nineteenth century Britain (Black, 2001), to the role of information management in MI5 (Black and Brunt, 2000). He has been consistent and unrelenting in stressing the importance of historical context and the appreciation of information history as a subject (Black, 1995, 1998, 2001, 2006). More recently, Weller (2005) has written on the emergence of information history, the crisis of control which helped lead to the nineteenth century origins of the information society (Weller and Bawden, 2005), and Victorian information history, by focusing on individual understandings and perceptions of information rather than on technological developments or institutions (Weller and Bawden, 2006). Such contextual examinations of information provide both a depth and richness to the field and help to show its validity as an area of research in its own right.

Information science "histories"
There have been arguments that information science needs to embrace more broadly accepted theories from other disciplines and consider the wider theoretical and philosophical basis of its own field in order to progress (Karvalics, 1994; Warner, 2000; Hjørland, 2005). Information history certainly serves this purpose, and it is also recognised by historians as an emerging new area. In Agar's (2003, p. 13) recent work he supports this view, arguing that:
Information – what it meant and how it was collected and used – must be understood in terms of its context [. . .] an informational history is emerging.
Historians of an older generation [. . .] are re-emphasizing informational aspects to their own work to reinterpret business and cultural history [. . .] There is potential in a new informational history.


Information science needs to pay more attention to this exciting and emergent field, whose fundamental research questions are so related to the LIS discipline. There is no reason why such research should be limited within the disciplines of either history or information science; indeed it would benefit most from more interaction and conversation between scholars in both areas. This is another reason why a rigorous historical methodology is so important to information history; it helps information scientists in this field to be taken seriously by historians, who, in turn, could benefit from an information science understanding of the philosophies of information and knowledge. One serious criticism of many "historical" information science works is that they apply or impose twenty-first century terms and ideas onto a historical background, using them in an ahistorical and anachronistic way. A recent example, by Spink and Currier (2006), attempted to look at historical examples of human information behaviour. In itself this is an interesting idea, but it applied a methodology based upon simply searching through the biographies of individuals (often using only a single volume as evidence) looking for explicit reference to the word "information". For example, in their discussion of Charles Darwin's information behaviour, only one source was studied – his autobiography – and from this, only three brief quotes were used to support their findings. These are simply cited, with no reference to their contextual placement or wider meaning, no other supporting evidence, no analysis of the source or the quotes, and no attempt to reference the alleged information behaviour within the bigger picture of Darwin's world view. The result was that much useful material was probably ignored, and the references that were picked out were simplistic and abstract. This is not information history. Information history research must be firmly contextualized with an appreciation of the wider events that may have influenced the individual or event:
[Research] must be rigorous in its methodology, that is, internally coherent and consistent. It must also ensure that the evidence used have been checked back to their original sources and that any bias or ideological stance of the author (or indeed, the historian) has been acknowledged [. . .] Any argument must be self-referencing, it must show where it fits into the existing literature [. . .] Evidence must not be wilfully ignored, but explored and challenged and, if necessary, revisions must be made [. . .] in order to accommodate it (Weller, 2006).

This is a view strongly supported by Black (1995, 1998, 2006). Too often historical information science research neglects this rigorous methodology and the result tends to be misleading and reductionist. Information scientists writing “historically” have also repeatedly attempted to fit modern concepts such as information and document management, the internet, and so forth into the developments and technologies of earlier centuries. Bawden and Robinson’s (2000) discussion of the similarities of the communications revolutions of the fifteenth century and the twentieth century (with the introduction of the printing press and the internet, respectively) is one such example, although the authors do acknowledge the limitations of the study. Karvalics (2002) examines the history of the internet, and in doing so constantly jumps from the sixteenth century to the twenty-first century in a way which not only suggests an inevitable teleological progression, but which also completely omits any reference to the context in which these technologies were developed, or why and how they were used within society.

Examples such as these focus on the development of specific communication technologies and ways of organising information – the telegraph and telephone in particular, and later the radio, typewriter, classification schemes – without considering the differing contexts in which each of these technologies emerged, or their contemporary impacts. These accounts also have a tendency to be deterministic, and often at least indirectly support the idea of a teleological progression into the "ultimate" form of information society. It could be argued from an information science viewpoint that these are all ways of writing about the past in terms of current concepts, but they are not good or valid history and thus they cannot be considered information history.

Information history as a field of research
Alistair Black has warned of the limitations of information history; that the very ubiquity of information in modern society makes the potential for the topic so vast that it becomes impossible to define. The very intangibility of information raises questions about its nature and properties, presenting problems for any potential information historian. He asks:
If information itself defies precise definition, what chance is there that its definition might be historicized? (Black, 2006, p. 2).

While this is a valid and important methodological issue, it seems to miss the point that just as there is not (and most likely cannot be) a single definition of information in the twenty-first century, nor was there likely to have been one in any other period of history. Therefore to try to impose a modern interpretation on the past is, in its very essence, unhistorical. This does not mean that research must be abstract. It simply means that we must be aware that the notions of information, knowledge, the information society and other information science ideas are often modern phraseology, and their application to the past can risk being anachronistic. One way of avoiding this is, again, to ensure research is firmly contextualized. A recent study of Victorian perceptions of information, for example, was grounded in the semantics and definitions of information through nineteenth century dictionaries (Weller, 2006). Information science debates over the meanings and distinction between knowledge and information and so forth must be applied carefully in historical study; we need to define the past in its own terms and not apply modern terminology or values where none existed. In fact, this approach could be regarded as the philosophical basis for the field of information history. The very fact that information can “relate to anything [. . .] the smell of a new perfume [. . .]; in Landseer’s ‘Stag at Bay’; in a letter to a sweetheart; a laboratory slide; a computer tape” (Ritchie, 1982) allows a huge diversity of understanding of its properties and nature. Contemporary modern society does not limit its discussion of information to one definition; it is understood and researched as a commodity, a process, education, through the media, propaganda, and policy, and through the techniques used to collect, organise, disseminate and preserve it, to name just a few. Information science debates over the properties of information, of the potential transitions from facts or data through to information and then knowledge and wisdom come into play here. From Plato’s three analyses of knowledge to Popper’s three
worlds, information has long been a topic of philosophical enquiry. To align such questions with information history provides further depth. It is not necessary or desired that there should be one single definition of information – such a definition is so subjective to the context in which it is created that it serves little historical purpose – each era will define it in a new way according to its own values and world view. These changing definitions in themselves allow insight into the understanding and importance of information, providing a philosophy of information. Empirical enquiry alongside philosophical and conceptual debate provides a strong framework for an information history rich in research and theory. This is where information scientists are better equipped than historians to tackle the philosophical issues of information history. They better understand the debates and theories surrounding the nature of information and knowledge. Allied with a historical methodology, information historians are in a strong position to recognise information as it was expressed and understood by past societies. This is what is so interesting about information history and gives the field scope. Since this field of interest is independently being recognised by historians, information science scholars need to become more aware that this is a lucrative and dynamic area. There are significant funded research projects in this field currently underway, ranging from "The early information society in Britain: the emergence of information management and information science, 1900-1975" at Leeds Metropolitan University, an AHRC funded project worth £78,000, to the smaller project at the University of Reading, also funded by the AHRC, the £38,000 "Designing information in everyday life, 1815-1914". There is also a project currently under way for a potential European LIS curriculum that includes an information history panel (Makinen et al., 2005). Information history allows for a deeper understanding of the role information and its uses (and misuses) have played in the past, thereby enriching both of its parent disciplines. History is, after all, a dynamic, changing story. One of its key purposes is to provide understanding of change, and information science as a discipline is often regarded as being a manifestation of a rapidly changing society (Warner, 2000; Saracevic, 1999). Such a combination should also allow for a deeper discussion of the issues central to our own culture (the information society, the digital divide, information literacy, privacy rights, etc.), enabling us to place them in their wider historical context and flesh out a more three-dimensional picture of the thematic roots of our contemporary society. Likewise, contemporary society also has had an influence on the growth of information history. As Weller (2006) has argued:
One of history's greatest strengths is that we can extract patterns and themes from the past to try to explain contemporary human behaviour or cultural climate, or in order to learn from the mistakes (or successes) of the past. In doing so, we add to the bigger story of human development. The emergence of information history and digital history in the last decade, for example, has to a great extent been down to a reinterpretation of history, based upon the contemporary values and concerns of the information society.

Weller continues that the dominant themes of the information age have led to a reconsideration of our history. The research by Agar (2003) and also Higgs (2004) on the histories of the government machine and the information state are essentially revisions of the role central government plays (or should not play) in the collection of data on its citizens, in the light of contemporary concerns over ID cards, data protection, and surveillance issues. Revisions have also been made to the role and conceptualisation of information and knowledge in society, as these themes have become more prominent in everyday culture (Black, 1998, 1995; Weller and Bawden, 2006; Burke, 2000).

The future of information history
So, where next for information history research? Despite all the existing research, there is one fundamental area that has been largely overlooked, and that is the social and cultural history of information. The technological and infrastructural developments in the processing and collection of information are significant, and it is important we understand them, but it is also crucial to realise that they did not occur in a vacuum. Throughout human history, perceptions of information in everyday life and culture have changed alongside the developments in information infrastructures in business, government and communication. This is an area of information history which has yet to be fully explored, and is a central thesis of the doctoral research on which this paper is based (Weller, forthcoming). What better way to explore the historic perceptions of information and knowledge than through the everyday cultural artefacts by which they were understood? Current research under way by Toni Weller uses an information history methodology to examine how information was disseminated and displayed in nineteenth century etiquette books and periodicals; how access to certain information through certain cultural channels could affect social behaviour and social status. Cultural understandings and manifestations of information are just as important as the political and technological, or the cultural institutions of information such as museums and libraries. There also needs to be much more information history taught at a degree level, particularly within LIS. The benefits are numerous for both the student and the future of the discipline. Teaching historical methodology encourages rigorous and critical thinking, which can be applied to any form of research or professional work, historical or not. Addressing some of the key historical and philosophical questions and issues of information science also provides a real backbone for the discipline. It gives roots and credence to something that can also be very contemporary and modern. It encourages broader thinking: to understand that technology is not the only aspect of LIS, and to recognise how our own society developed. Calls for more inclusion in the LIS curriculum have also been made by Black (2006, p. 445), who similarly argues that:
Dissemination and promotion of the subject to this community is arguably a prerequisite not only of the development of information history but also of its acceptance as a legitimate field in the history discipline.

Bringing a more academic focus to university courses provides depth to the discipline, without losing its professional and commercial interests. We need to give future students and up-and-coming academics more awareness of just how varied and exciting information science can be. Information history is a specialism just as much as GIS, legal research, web-log analysis, librarianship, or information retrieval. It requires an understanding of historical methodology and research, but it also requires an
understanding of the nature of information and current debates in the LIS field in order to make it relevant. Information historians should therefore have an understanding of the methodology of historical research, of challenging sources, providing context, and presenting rigorous evidence. They should also be able to understand the conceptual debates over the nature of information, and notions of the information society, so prevalent in the information science community. A new generation of researchers with backgrounds in both history and information science would strengthen the validity of the information history field. In order to provide such an opportunity, more credence must be paid to the study of information history within information science degree courses (and, ultimately, history degrees as well). There is undoubtedly a strong and emergent area of research here, both among information scientists and historians, and it is one that is producing successfully funded projects. It is also an important one for our own contemporary society, since we often overlook the increased role and influence information has in our everyday lives and how it is changing the way we think and interact. The focus is all too often on technology and the immediacy of "here and now". Although it is allied to information science and historical study, information history deserves to be recognised as an independent field in its own right. Not to do so would be to miss one of the most exciting and rewarding ways of studying this fundamental aspect of society. Information history is gaining momentum as a research area; we should embrace and encourage its development.

References
Agar, J. (2003), The Government Machine: A Revolutionary History of the Computer, MIT Press, Cambridge, MA.
Bawden, D. and Robinson, L. (2000), "A distant mirror? The internet and the printing press", Aslib Proceedings, Vol. 52 No. 2, pp. 51-7.
Beniger, J.R. (1986), The Control Revolution: Technological and Economic Origins of the Information Society, Harvard University Press, Cambridge, MA.
Black, A. (1995), "New methodologies in library history: a manifesto for the 'new' library history", Library History, Vol. 11, pp. 76-85.
Black, A. (1997), "Lost worlds of culture: Victorian libraries, library history and prospects for a history of information", Journal of Victorian Culture, Vol. 2 No. 1, pp. 124-41.
Black, A. (1998), "Information and modernity: the history of information and the eclipse of library history", Library History, Vol. 14 No. 1, pp. 39-45.
Black, A. (2001), "The Victorian information society: surveillance, bureaucracy, and public librarianship in 19th-century Britain", The Information Society, Vol. 17 No. 1, pp. 63-80.
Black, A. (2004), "Hidden worlds of the early knowledge economy: libraries in British companies before the middle of the 20th century", Journal of Information Science, Vol. 30 No. 5, pp. 418-35.
Black, A. (2005), "The library as a clinic: a Foucauldian interpretation of British public library attitudes to social and physical disease, ca. 1850-1950", Libraries & Culture, Vol. 40 No. 3, pp. 416-34.
Black, A. (2006), "Information history", in Cronin, B. (Ed.), Annual Review of Information Science and Technology, Vol. 40, Information Today, Medford, NJ, pp. 441-74.

Black, A. and Brunt, R. (1999), "Information management in business, libraries and British military intelligence: towards a history of information management", Journal of Documentation, Vol. 55 No. 4, pp. 361-74.
Black, A. and Brunt, R. (2000), "MI5, 1909-1945: an information management perspective", Journal of Information Science, Vol. 26 No. 3, pp. 185-97.
Boonstra, O., Breure, L. and Doorn, P. (2004), "Past, present and future of historical information science", Historical Social Research/Historische Sozialforschung, Vol. 29 No. 2, Netherlands Institute for Scientific Information and the Royal Netherlands Academy of Arts and Sciences, Amsterdam.
Briggs, A. (1977), "The pleasure telephone: a chapter in the prehistory of the media", in Pool, I. (Ed.), The Social Impact of the Telephone, MIT Press, Cambridge, MA, pp. 40-65.
Burke, J. (2005), A Cultural History of Fear, Virago Press, London.
Burke, P. (2000), A Social History of Knowledge From Gutenberg to Diderot, Polity Press, Cambridge.
Eastwood, D. (1989), "Amplifying the province of the legislature: the flow of information and the English state in the early nineteenth century", Historical Research, Vol. 62 No. 149, pp. 276-94.
Eisenstein, E. (1979), The Printing Press as an Agent of Change: Communication and Culture in Early Modern Europe, Cambridge University Press, Cambridge.
Frankel, O. (2006), States of Inquiry: Social Investigations and Print Culture in Nineteenth-century Britain and the United States, The Johns Hopkins University Press, Baltimore, MD.
Gatrell, V. (2006), City of Laughter: Sex and Satire in Eighteenth-Century London, Atlantic Books, London.
Headrick, D.R. (2000), When Information Came of Age: Technologies of Knowledge in the Age of Reason and Revolution, 1700-1850, Oxford University Press, Oxford.
Higgs, E. (2004), The Information State in England: The Central Collection of Information on Citizens Since 1500, Palgrave Macmillan, Basingstoke.
Hjørland, B. (2005), "Library and information science and the philosophy of science", Journal of Documentation, Vol. 61 No. 1, pp. 5-10.
Karvalics, L. (1994), "The claims, pre-history and programme of historical informatics", Periodica Polytechnica, Series on Human and Social Sciences, No. 1, pp. 19-30, available at: www.ittk.hu/english/docs/historical_informatics_zkl.pdf (accessed 10 June 2007).
Karvalics, L. (2002), "Internet: the real pre-history and its implications for social theory", paper presented at Internet Research 3.0: Net/Work/Theory, Maastricht, 13-16 October.
Makinen, I., Black, A., Kovac, M., Skouvig, L. and Torstensson, M. (2005), Information and Libraries in an Historical Perspective: From Library History to Library and Information History, draft report of the EU Curriculum Development Project.
Marvin, C. (1987), "Information and history", in Slack, J. and Fejes, F. (Eds), The Ideology of the Information Age, Ablex, Norwood, NJ, pp. 49-62.
Moody, E. (2006), "Women in cyberspace", paper presented at ASECS, Montreal, 30 March-2 April, available at: www.jimandellen.org/ConferencePapers.WomenCyberspace.html (accessed 10 June 2007).
Ritchie, S. (1982), Modern Library Practice, ELM Publications, Kings Ripton.
Saracevic, T. (1999), "Information science", Journal of the American Society for Information Science, Vol. 50 No. 12, pp. 1051-63.


Spink, A. and Currier, J. (2006), "Towards an evolutionary perspective for human information behaviour: an exploratory study", Journal of Documentation, Vol. 62 No. 2, pp. 171-93.
Warner, J. (2000), "What should we understand by information technology (and some hints at other issues)?", Aslib Proceedings, Vol. 52 No. 9, pp. 350-70.
Weller, T. (2005), "A new approach: the arrival of informational history", Proceedings of the XVI International Conference of the Association for History and Computing, Royal Netherlands Academy of Arts and Sciences, Amsterdam, 14-17 September, pp. 273-8.
Weller, T. (2006), "A continuation of Paul Grobstein's theory of science as story telling and story revising: a discussion of its relevance to history", Journal of Research Practice, Vol. 2 No. 1, article M3, available at: http://jrp.icaap.org/content/v2.1/weller.html
Weller, T. (forthcoming), A Cultural History of Information in Nineteenth Century England, Department of Information Science, City University London, London, funded by the Arts and Humanities Research Council.
Weller, T. and Bawden, D. (2005), "The social and technological origins of the information society: an analysis of the crisis of control in England, 1830-1900", Journal of Documentation, Vol. 61 No. 6, pp. 777-802.
Weller, T. and Bawden, D. (2006), "Individual perceptions: a new chapter on Victorian information history", Library History, Vol. 22 No. 2, pp. 137-56.
Winston, B. (1998), Media Technology and Society. A History: From the Telegraph to the Internet, Routledge, New York, NY.
Corresponding author
Toni Weller can be contacted at: [email protected]


Of the rich and the poor and other curious minds: on open access and "development"
Jutta Haider
Department of Information Science, City University London, London, UK


Received 6 November 2006; accepted 5 June 2007

Abstract
Purpose – The paper seeks to reconsider open access and its relation to issues of "development" by highlighting the ties the open access movement has with the hegemonic discourse of development, and to question some of the assumptions about science and scientific communication upon which the open access debates are based. The paper also aims to bring out the conflict arising from the convergence of the hegemonic discourses of science and development with the contemporary discourse of openness.
Design/methodology/approach – The paper takes the form of a critical reading of a range of published work on open access and the so-called "developing world", as well as of various open access declarations. The argument is supported by insights from post-development studies.
Findings – Open access is presented as an issue of moral concern beyond the narrow scope of scholarly communication. Claims are made based on hegemonic discourses that are positioned as a priori and universal. The construction of open access as an issue of unquestionable moral necessity also impedes the problematisation of its own heritage.
Originality/value – This paper is intended to open up a view of open access's less obvious alliances and conflicting discursive ties and thus to initiate a politicisation, which is necessary in order to further the debate in a more fruitful way.
Keywords Developing countries, Sciences, Communication technologies, Journal publishers
Paper type Conceptual paper

Introduction
An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds (Budapest Open Access Initiative, 2001).

These sentences open the "Budapest Open Access Initiative" (BOAI), which in 2001 gave the name "open access" (OA) to a phenomenon in scientific publishing and scholarly communication that had emerged in the preceding decade. As in other forms of publishing, scholarly publishing has been affected by the increased significance of digital media, and in particular by the internet.

Especially during the latter half of the twentieth century, the number of journals published had increased significantly and scientific publishing, or more precisely the distribution of journals, had undergone considerable changes. While earlier journals had been sold primarily to individuals, during the course of the twentieth century they were increasingly subscribed to only by institutions (Tenopir and King, 2000). Concurrently, scientific publishing had been subject to large-scale mergers of publishing houses and to price increases far beyond inflation, which have stretched the acquisition budgets of libraries and led to cancellations of subscriptions; a situation that has come to be known as the "serials crisis".

The most visible and most debated new model to emerge from these changed circumstances is OA. Superficially, OA takes advantage of two factors. First, scholarly authors do not derive any direct income from publishing, yet they have an interest in the high visibility of their publications. This is less for the "sake of inquiry and knowledge", as the BOAI quoted above would have it, and has more, or everything, to do with the symbolic capital that publications accrue. Second, in comparison to print, the internet has the potential to make the distribution of documents relatively inexpensive. Numerous definitions of what constitutes OA and how to achieve it exist. Despite some differences, all refer to a mode of enabling perpetual, free, online access to scholarly literature, either by publishing in OA journals or through the archiving of material published elsewhere in OA repositories or on the authors' own websites. Some are more restrictive and refer solely to the peer-reviewed literature, while others take a broader view and also include pre-prints and other un-refereed material, as well as, for example, teaching materials or data sets.

As argued below, the primary relevance of OA might not lie with what is considered the centre of the system of science and its actors. At its core, OA is about extending access to scientific information and, ultimately, it is about science, or what could and should be called modern or Western science: a particular and peculiar form of knowledge whose status depends on its claim to be universal and which is arguably one of the most powerful forms of knowledge to have shaped the world. Despite the fact that occasionally OA is also extended to include other materials, its root concerns lie with one of the most central institutions of science – the scientific journal – and with it the peer-reviewed article. These have come to depict an idealised version of scientific progress, perceived as a relatively straightforward cumulative venture, where each research project and each article is based on the preceding ones and where citations serve to give due credit. In a certain way, the scientific article, peer-reviewed, collected and distributed in journals, and connected to the preceding and surrounding ones via citations, has come to determine how modern science itself is imagined. Despite the emergence of post-Kuhnian science studies and strong voices of criticism, in particular from feminist quarters (e.g. Harding, 1991, 2006), perceptions of linear progress and of the simple cumulative nature of science still contribute significantly to the terminology in which many current debates – especially in LIS and librarianship – are couched.
The OA debates, too, seem to draw to a considerable degree on this idea of science and are largely rooted in an understanding of science and scientific communication that adheres to this depiction. Yet at the same time, as the opening sentences of the BOAI quoted at the outset illustrate, OA has always also been portrayed as more than simply a means to speed up and ease the process of scholarly communication. More importantly, it is also negotiated in terms of extending the accessibility of this type of information or, better, of these documents, to otherwise excluded populations or, in the words of the BOAI, to "other curious minds". In particular, if it is true that science's formal literature is hardly used at the research front and most published articles are never read (see Frohmann, 2004, p. 4), or at least never cited, then facilitating and widening access to these literatures within and for the scientific community cannot be the central significance of OA. Furthermore, by definition, being part of a discipline means having access to what is considered a field's core literature, even if it is the paradox of science's formal literature, as Frohmann (2004, p. 91) maintains, that it:

[. . .] conveys very little, if anything, of substance contributing to the performance of research science, perhaps only communicating a subtext about science's social systems of intellectual priority and status hierarchies. [T]he degree of use of information services, apparatuses and procedures turns out to be a function of how little rather than how much knowledge users possess.

It appears that the most interesting aspects of OA lie less with the scientific community at what is considered the centre of science than with its margins and fringes (i.e. with types of documents that do not form part of the official and formal literature and, in particular, with groups that are typically beyond the reach of these literatures). Two groups, which are continually referred to, seem especially significant in this regard: on the one hand, the so-called "public", and on the other, those researchers and institutions who, for financial reasons, cannot afford to purchase (access to) scholarly journals. Here, the most important groups are scholars and institutions in what is usually called the "developing world"[1].

Open access for the "developing world"
Significantly, the coinage of the term OA itself in 2001 took place in the context of a development project in the widest sense of the word. The BOAI was initiated and funded by the Soros Foundation's "Open Society Institute", a charitable foundation set up by billionaire philanthropist George Soros, which has as its prime areas of action and intervention a number of "developing countries" (Open Society Institute, n.d.). Since then the Soros Foundation has developed into one of the main funding bodies behind OA: it has financed countless workshops and conferences and sponsored such highly visible projects as, for instance, the "Directory of Open Access Journals", the "Open Access Newsletter", and also the EiFL "big deal" library consortia arrangements, which also include OA products. It is intriguing in this context that the Soros Foundation's "Open Society Institute" is specifically dedicated to the promotion of a liberal-democratic, Popperian so-called "open society", by which is meant a "society based on the recognition that nobody has a monopoly on the truth" (Open Society Institute, n.d.). While Popper did not necessarily privilege European forms of societal organization (Notturno, 2000), this is still interesting, since at the same time Popper's name cannot be separated from what he is best known for: his philosophy of science, which privileges the rationality of the scientific method, and his theory of falsification. This modernist view of science and of science's universality is based on the very claim that it has the monopoly on truth, or at least that science constitutes an a priori and universal form of knowledge. It has come under considerable attack with regard to its role in the process of colonialism and later of development, and it has been associated with the destruction of other knowledge systems, in particular those that have been relegated to the status of indigenous knowledge (e.g. Nandy, 1988, 1992; Marglin, 1996; Harding, 2006).


At the very least it seems intriguing, first, that one of the prime organisations behind the OA movement is located in the wider area of development initiatives and, second, that it was set up in the very name of one of the most prominent modernist philosophers of science.

In the vicinity of librarianship, OA emerged as a topic strongly associated with the "serials crisis". The "serials crisis" has impacted the budgets of libraries in general, but it is said to have particularly affected libraries in economically weaker countries, especially in Africa (Willemse, 2002; Muthayan, 2004), which over the same period have also undergone economic crises. A number of continuously re-emerging issues tie a supposed need for open or free access to scholarly publications to the "developing world". Open access is seen as constituting a way to better connect the "developing world" to the system of science, by potentially providing access to scientific literatures published in the "developed world" (e.g. Chan and Costa, 2005; Chan et al., 2005; Chan and Kirsop, 2001; Arunachalam, 2003; Ramachandran and Scaria, 2004; Deschamps, 2003; Scaria, 2003; Weitzman and Arunachalam, 2003; Tenopir, 2000; Smart, 2004; Durrant, 2004). Occasionally, while access to more literature from the "developed countries" is broadly favoured, this is also associated with threats to local journal production, which could suffer from the increased availability of this usually more prestigious material (e.g. Durrant, 2004; Smart, 2004; Scaria, 2003). Habitually, reference is made to three different types of divides or gaps, namely a North/South, a South/North, and a South/South divide. Each divide is related to a direction of "information flow", which OA is perceived as having the potential to enable or to intensify (e.g. Deschamps, 2003; Durrant, 2004; Smart, 2004; Chan and Costa, 2005; Chan et al., 2005).

A further connection between OA and the "developing world" is made via the so-called "big deal". To stay within their budgets, libraries began to negotiate with big publishing houses the provision of whole sets of journals at a fixed price. On the one hand this allowed them access to more journals, but on the other hand it restricted choice. At times it also forced libraries to restructure their budgets in ways that required cancelling subscriptions to journals published by smaller publishers or by scholarly societies. For the "developing world" the big deal is said to have had effects that go beyond merely restricting the availability of material (Chan and Costa, 2005; Chan et al., 2005). Since journals from the "developing world" are usually published by small publishers (Rosenberg, 2002), the logical conclusion seems to be that if these journals were OA they would still be used by readers in libraries which had cancelled their subscriptions, or which had never subscribed to them (Chan and Costa, 2005). Of course, this also applies to journals that, quite independently of the serials crisis, are excluded from collections and bibliographic databases, which show considerable bias against publications from "developing countries" (Sancho, 1992; Narvaez-Berthelemot and Russel, 2001). More generally, OA is perceived as potentially extending the readership and reach of scientific publications from the "developing world", and thus as increasing their visibility and impact (e.g. Arunachalam, 2003; Ramachandran and Scaria, 2004; Davison et al., 2005; Deschamps, 2003; Scaria, 2003; Weitzman and Arunachalam, 2003; Tenopir, 2000; Smart, 2004; Durrant, 2004; Rajashekar, 2004).

In short, OA is thought to benefit "developing countries" by increasing "information flows" and by "connecting" them to the "system of science", which, since it is persistently portrayed as synonymous with progress, is depicted as the necessary prerequisite for any form of "development" to take place.

Famously, the move to provide free online access to a considerable number of its scientific journals was undertaken at a nationally and internationally orchestrated level by Brazil, in the form of the Scientific Electronic Library Online (SciELO). It was set up – avant la lettre – in 1998 and has since been expanded across the whole of South and Latin America; it now also includes Spain and Portugal. It is based on a very stringent policy and a strict system of control, which measures quality largely by reference to the mainstream international bibliographic databases. Although SciELO includes literatures spanning from psychology via linguistics and the arts to engineering, by far most of its journals are in medicine and related areas. Its main funding bodies are large organisations active in health politics, and its methodology was originally developed in cooperation with BIREME (Latin America and Caribbean Centre on Health Sciences Information), the Pan American Health Organization (PAHO), and the World Health Organization (WHO) (Marcondes and Sayão, 2003). All three are major players in national, regional and global health politics. This is illustrative of two aspects of OA that are particularly relevant when seen in relation to the "developing world". First, it brings to the fore that OA is an issue which is also very much the concern of major international organisations. Second, it highlights the link between the "developing world" and OA in relation to the field of medicine and health and the politics surrounding it. This is characteristic of many of the debates on the "developing world" and also present in the various OA debates. Furthermore, the biggest free-access initiative, the Health Internetwork Access to Research Initiative (HINARI), also funded by the WHO, through which major commercial science publishers grant institutions in a number of "developing countries" free online access to scientific journals, is likewise situated in the area of health and medicine. Although it is not considered to be OA in the actual sense of the word – which requires unrestricted free access for everyone, whereas here access is granted only to certain groups, dependent on a country's GDP – the discussions surrounding it, as well as the language in which its usefulness is debated, are in relevant parts very similar to those connected to OA proper. While it could, of course, be argued that this is merely reflective of the fact that medical research plays a significant role in scientific publishing and in OA in general – for instance, BioMed Central, the biggest commercial OA publisher, is situated in this field – the potential of OA for medicine and health care in the "developing world" is still often emphasised separately (e.g. Weitzman and Arunachalam, 2003; Smart, 2004; Chan and Costa, 2005; Chan et al., 2005). Furthermore, despite voices of criticism, not only are notions of the "developing world" still primarily entangled with images of suffering and disease (Nandy and Visvanathan, 1990; Escobar, 1995), but also, as Nandy and Visvanathan (1990, p. 145) maintain, "the language of modern medicine has contributed handsomely to the language of development".
Correspondingly, not only are many of the OA or other free-access initiatives devoted to "developing countries" concentrated on medical research and health issues, which of course are relevant and legitimate concerns, but, more importantly, the debates preceding and surrounding them, however subtly, still draw on this vocabulary and almost invariably reinforce the perception of the poor, diseased and weak peoples of a global "South". This happens not least by introducing the notion of an "information famine" (Chan et al., 2005), thus evoking the misery of starvation and with it one of the strongest and, some say, most violent images (Escobar, 1995, p. 103) that have shaped relations with the "developing world"; or by referring to the "peoples of developing nations" alongside the "disabled", as is done in the "IFLA Statement on Open Access" (International Federation of Library Associations, 2004).

"Open access" as a movement
Since the BOAI convened in 2001, the number of charities, development agencies, and funding bodies involved in the politics of OA has increased steadily (compare Bailey, 2005). Concurrently, the number of initiatives, petitions, declarations, and mission statements has increased equally consistently. These are also the documents setting out the definitions of OA, its conditions and requirements, as well as its goals, which can be of a very all-encompassing nature. Besides the BOAI, the most relevant are the Bethesda Statement on Open Access Publishing (Bethesda Statement, 2003) and the Berlin Declaration on Open Access to Knowledge in Science and the Humanities (Berlin Declaration, 2003). The Salvador Declaration (2005) has been formulated specifically with "developing countries" in mind. The BOAI's envisioned effect of OA is to:

. . . accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge.

The Bethesda Statement is more specific and aimed at the biomedical research community. Its purpose is to:

. . . stimulate discussion [. . .] on how to proceed, as rapidly as possible, to the widely held goal of providing open access to the primary scientific literature.

The Berlin Declaration, on the other hand, has the more general "mission of disseminating knowledge". It sets out "to promote the Internet as a functional instrument for a global scientific knowledge base and human reflection" and defines "open access as a comprehensive source of human knowledge and cultural heritage that has been approved by the scientific community". The Salvador Declaration on Open Access, which is intended to provide a specific developing world perspective, and according to which "open access promotes equity", contains yet another version of OA: here it is simply said to "[mean] unrestricted access to and use of scientific information".

By referring to concepts such as humanity, poverty, cultural heritage, or equity, which are all highly charged notions entangled with strong connotations and related to various agendas, these few excerpts draw on very powerful images that tie OA to specific discourses, and whose use in this context has implications. Specifically, a certain idea of poverty has been fundamental in development discourse for the construction of underdevelopment and consequently for the division of the world into developing and developed parts, as well as for the related relations of dominance (Escobar, 1995; Rist, 2002; Mestrum, 2002). Furthermore, in contemporary ICT and information society debates, with which the OA debates overlap, a techno-centric and economistic notion of poverty contributes nicely to more recent constructs, such as the "digital divide" or "information poverty". This is not least achieved by drawing on the authority of the hegemonic discourse of development (Wilson, 2003; Haider and Bawden, 2006).

The reference to these concepts also clearly highlights that OA first and foremost has to be regarded as a movement, and that it is being tied to issues that position it in the realm of certain types of political engagement. This perception is reinforced by a closer look at the constantly growing literature on OA, which consists largely of opinion pieces, studies carried out in the name of specific interest groups, how-to guides, and policy documents (see Bailey, 2005). Myriad national and international organisations, charities, foundations, and various funding and government bodies have outlined policies, signed declarations, advanced mission statements, or otherwise got involved in the wider politics of scientific information, which can be said to have one current focal point in OA. This reaches, for instance, from the already introduced Soros Foundation to the Wellcome Trust in the UK, the National Institutes of Health in the USA, the Chinese Academy of Sciences, and the OECD. For example, the 2004 UK House of Commons Science and Technology Committee report on scientific publishing contains extensive references to the "developing world" throughout (House of Commons Science and Technology Committee, 2004).

The aims pronounced in the numerous OA petitions and mission statements and in the various reports and policy papers draw on discourses attempting to tie the need for OA to a number of factors that construct it as an issue of moral and political concern, quite beyond the seemingly narrow scope of scholarly communication. By doing so, the "developing world", its information needs and its fate are constructed in particular and often also in conflicting ways. For example, in a two-week-long e-mail discussion during the summer of 2006, organised by the Coady International Institute (2006) – a Canadian development agency – OA was not only discussed as a panacea for all things development; curiously, a great number of postings also ignored its origins within the science system. Rather, OA and its relevance for "development in the Global South" (Coady International Institute, 2006), which was the explicit purpose of the discussion, was debated largely in relation to infrastructure problems, general issues of poverty, malnourishment and education, and in terms of its significance for development workers. Having said that, concerns over the representation of the "developing world" did arise in the debate, specifically over its representation in the media. However, well-known images of powerlessness continued to be advanced simultaneously, and the friction between positions reaffirming what could be called stereotypical images and those trying to unravel them remained strangely subdued. Likewise, while a certain unease towards a lot of development practice could be sensed throughout, the ultimate belief in the possibility of development remained unshattered, as did ultimately the belief in technology as the facilitator of such evolutionary progress.
Science and development
To some extent the representation of OA, specifically in relation to development, is based on assuming a causal connection, first, between science, its (formal) literature and "information systems", and, subsequently, the possibility of development, and it is dependent on at least two factors. Both rest on assumptions that are problematic for various reasons and which usually ignore the instrumental relationship science had with colonialism, and furthermore neglect the connection between colonialism and development. First, it depends on a view of science as a neutral, privileged and, crucially, universal form of knowledge. Second, it is based on an evolutionary perception of development as the fundamentally positive, continuous advancement along known pathways towards a state of development that had already been reached previously by another society. Both are also bound to ideas about science and technology which uncritically equate scientific and technological advances with positive societal progressiveness. Of course, this is not a new phenomenon, but, as Arturo Escobar (1995, p. 36) reminds us, "[s]cience and technology had been the markers of civilization par excellence since the 19th century".

The ideas of science and of societal change that derive from the direct association of science with knowledge, and which build considerably on the assumption of science's universality, have been criticised, questioned, and challenged on a number of counts, and this has given way to various forms of post-studies, including post-Kuhnian, post-colonial, or post-development (Harding, 2000). Yet at the same time, both can still be said to underlie certain perceptions of science, technology and development that dominate the views of policy makers, large development institutions and international organisations. It is still a widely held belief that a causal connection exists between scientific advances and mostly positive social progress, and that more science and increased science and technology transfer can only benefit society (Harding, 2006, p. 1 et seq.). This becomes especially evident in the context of major international summits, one of the arenas where OA is debated. For example, in the context of the World Summit on the Information Society, the relevance of scientific information for development, more often than not tied up with technology, was amongst the foremost issues discussed. Here the notions of science as well as of development were fundamentally treated as unproblematic, and OA quite easily found its way into the declaration of principles (World Summit on the Information Society, 2003a) as well as the plan of action (World Summit on the Information Society, 2003b).

This has to be seen as situated within a theme that has a long-standing tradition. The notion of science and technology for development already appeared in the by now famous point four of US President Harry Truman's inauguration speech in 1949. This particular speech is understood to have heralded the age of development by first introducing the concept of underdevelopment into the language repertoire of the political mainstream after the Second World War (Esteva, 1992; Escobar, 1995; Rist, 2002). It has developed into a standard theme in the language of development institutions. To this day, while constantly being reinvented as presenting a unique opportunity facilitated by technological changes, it appears in documents issued by these and similar organisations in ways that have changed surprisingly little since the 1950s. In certain ways, the role OA is assigned, and the manner in which it is depicted, also have to be seen as a continuation of these themes and of the policies connected with them.
A statement such as "Scientific and technological research is essential for social and economic development", taken from the Salvador Declaration on Open Access, clearly marks out this continuation, and it affirms OA as tying into forms of representation that adhere to a depiction of the world according to the classic development paradigm which has by and large dominated the post-war era. Likewise, if the BOAI speaks of an "unprecedented public good" made possible through a "new technology", this affirmation of novelty, paired with an untarnished view of the possibilities of technology, also positions it in the long-standing tradition of development speak.

Conflicting discursive ties
The OA movement ties heavily into long-established discourses, while at the same time it draws on and shapes current ideas about openness, the commons, and networking. In this sense OA also provides a focal point for certain aspects of several wider developments that are particularly relevant in contemporary society and which cannot be disconnected from their histories and contingencies. These are the expansion and distribution of science, the changed circumstances of communication on the internet, the status and conceptions of information, as well as the inequalities that define international relations, to a large degree still captured neatly in the highly charged notion of "development". Despite the fact that some attempt to delineate OA from other "open" movements – for example, open source, free software, or creative commons (Harnad, 2006) – clear connections and convergences between them exist, in particular in the language used and in the ways in which all, albeit in different ways, are positioned as counter-currents to contemporary developments concerning various aspects of intellectual property regimes.

It is thus particularly intriguing to observe how OA links into two at least seemingly opposing ways of speaking, which are both particularly interesting when considered in regard to those who are marginalised. On the one hand, as has been discussed, OA is largely about what has been called "Western", European, or modern science, and ultimately it is about extending its reach through its texts. On the other hand, OA, more than just by virtue of its name, also ties into the contemporary and highly ambiguous discourse of openness, which is represented most prominently by the open source and free software movements. This brings it into the argumentative proximity of what is commonly perceived as a counter-movement, which positions itself in opposition to mainstream trends. Put differently, OA ties into at least two discursive spaces which, at least on the surface, seem to be, if not fundamentally opposed, at least conflicting: one that is firmly grounded in advancing the very type of knowledge that is associated with modernisation and modernity, and which to a degree has been interpreted as a symbol and expression of Western dominance and its quite concrete consequences (e.g. Alvares, 1992); and one that stands for opposition, collaboration, participation, and resistance. The problematique, or conflict, arising from this convergence, it seems to me, is particularly palpable when notions of the "developing world" are introduced, and becomes even more obvious when ideas about indigenous knowledge filter through in the debates and their role in relation to science becomes an issue in need of justification. It is this conflict that makes OA such an interesting phenomenon. Curiously, however, it seems that it is also this very conflict which positions OA firmly within the realm of the various contemporary open and free movements, which all appear to oscillate between providing platforms of resistance and merely supposedly better tools for capitalist advancement. Berry and Moss (2006) point to a general problem with regard to the arguments of free culture in general, and the Free Software Movement in particular, when they say these:

. . . are overwhelmingly made within a moral register. Claims to authority are made by reference to a priori human rights divorced from the political realm. Decisions are made between "right" and "wrong" [. . .] on the basis of a supposedly shared morality.


This is an important statement and it also rings true for OA. Here, equally, claims are made based on hegemonic discourses that are positioned as a priori and universal, and this seldom permits any serious engagement with the historic and political contingencies of these claims. The construction of OA as an issue of unquestionable moral necessity, while understandable from the perspective of the protagonists involved, also impedes an explicit politicisation and, frankly, also the problematisation of its own heritage, which is necessary in order to be able to determine where the less obvious fault lines lie and thus possibly to arrive at some conclusions about OA's alternative potential for change.

Acknowledgements
The author is grateful to David Bawden for his valuable comments. The author would also like to thank the participants at seminars at Lund University, Sweden, and at the Swedish School of Library and Information Science in Göteborg and Borås for discussing some of the ideas presented here, above all Jan Nolin. The author is in receipt of a doctoral award from the AHRC.

Note
1. The terms "developing world" or "developed world" are not taken here to imply certain individual countries, categorised as such according to one of the various classifications. One way to envisage the relation between the "developing" and the "developed world" that can also be fruitfully drawn on here is that of a dominant "meta-geography" which is the product of discourse. This is "a set of spatial structures through which people order their knowledge of the world: the often unconscious frameworks that organize studies of history, sociology, anthropology, economics, political science, or even natural history" (Lewis and Wigen, 1997, p. ix). As such, the terms will be understood here as relating to a particular historical and epistemological position. They are elements of popular and wider political discourses that have come to denote certain, yet not always clearly circumscribed, situations and types of relations. They are not understood as factual entities that describe actual geo-political borders or countries. To highlight this fact they will be surrounded by quotation marks.

References
Alvares, C. (1992), Science, Development and Violence. The Revolt Against Modernity, Oxford University Press, Oxford.
Arunachalam, S. (2003), "Information for research in developing countries: information technology – friend or foe?", Bulletin of the American Society for Information Science and Technology, Vol. 29 No. 5, pp. 16-21.
Bailey, C.W. (2005), "Open access bibliography. Liberating scholarly literature with e-prints and open access journals", Association of Research Libraries, Washington, DC, available at: www.escholarlypub.com/oab/oab.pdf (accessed 3 November 2006).
Berlin Declaration (2003), Berlin Declaration on Open Access to Knowledge in Science and the Humanities, available at: www.zim.mpg.de/openaccess-berlin/berlindeclaration.html (accessed 31 October 2006).
Berry, D.M. and Moss, G. (2006), "The politics of the libre commons", First Monday, Vol. 11 No. 9, available at: http://firstmonday.org/issues/issue11_9/berry/index.html (accessed 31 October 2006).
Bethesda Statement (2003), Bethesda Statement on Open Access Publishing, available at: www.earlham.edu/~peters/fos/bethesda.htm (accessed 31 October 2006).
Budapest Open Access Initiative (2001), available at: www.soros.org/openaccess (accessed 31 October 2006).
Chan, L. and Costa, S. (2005), "Participation in the global knowledge commons: challenges and opportunities for research dissemination in developing countries", New Library World, Vol. 106 Nos 3/4, pp. 141-53.

Chan, L. and Kirsop, B. (2001), "Open archiving opportunities for developing countries: towards equitable distribution of global knowledge", Ariadne, No. 30, available at: www.ariadne.ac.uk/issue30/oai-chan (accessed 3 November 2006).
Chan, L., Kirsop, B. and Arunachalam, S. (2005), "Open access archiving: the fast track to building research capacity in developing countries", Science and Development Network, available at: www.scidev.net/ms/openaccess (accessed 3 November 2006).
Coady International Institute (2006), "Open Access and Information for Development", e-discussion sponsored by the Coady International Institute, May 29-June 6, 2006, available at: www.dgroups.org/groups/openaccess (accessed 3 November 2006).
Davison, R.M., Harris, R.W., Licker, P.L. and Shoib, G. (2005), "Open access e-journals: creating sustainable value in developing countries", PACIS 2005. Proceedings of the Pacific Asia Conference on Information Systems, Bangkok, 7-10 July, available at: www.pacis-net.org/file/2005/183.pdf (accessed 25 February 2006).
Deschamps, Ch. (2003), "Round table: open access issues for developing countries", Information Services & Use, Vol. 23 Nos 2/3, pp. 149-59.
Durrant, S. (2004), "Overview of initiatives in the developing world", in Esanu, J.M. and Uhlir, P.F. (Eds), Open Access and the Public Domain in Digital Data and Information for Science: Proceedings of an International Symposium, US National Committee for CODATA, Board on International Scientific Organizations, Policy and Global Affairs Division, National Academies Press, Washington, DC, pp. 122-6.
Escobar, A. (1995), Encountering Development. The Making and Unmaking of the Third World, Princeton University Press, Princeton, NJ.
Esteva, G. (1992), "Development", in Sachs, W. (Ed.), The Development Dictionary. A Guide to Power as Knowledge, Zed Books, London, pp. 6-25.
Frohmann, B. (2004), Deflating Information. From Science Studies to Documentation, University of Toronto Press, Toronto.
Haider, J. and Bawden, D. (2006), "Pairing information with poverty: traces of development discourse in LIS", New Library World, Vol. 107 Nos 9/10, pp. 371-85.
Harding, S. (1991), Whose Science? Whose Knowledge? Thinking from Women's Lives, Open University Press, Buckingham.
Harding, S. (2000), "Gender, development, and post-enlightenment philosophies of science", in Narayan, U. and Harding, S. (Eds), Decentering the Centre. Philosophy for a Multicultural, Postcolonial, and Feminist World, Indiana University Press, Bloomington, IN, pp. 240-61.
Harding, S. (2006), Science and Social Inequality. Feminist and Postcolonial Issues, University of Illinois Press, Urbana, IL.
Harnad, S. (2006), "How open access is related to free software and open source", available at: www.p2pfoundation.net/index.php/How_Open_Access_is_related_to_Free_Software_and_Open_Source (accessed 27 October 2006).
House of Commons Science and Technology Committee (2004), "UK House of Commons Science and Technology Committee, Tenth Report, 7 July 2004", available at: www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/39902.htm (accessed 19 October 2006).
International Federation of Library Associations (2004), "IFLA Statement on open access to scholarly literature and research documentation", available at: www.ifla.org/V/cdoc/open-access04.html (accessed 31 October 2006).
Lewis, M.W. and Wigen, K.E. (1997), The Myth of Continents. A Critique of Metageography, University of California Press, Berkeley, CA.


Marcondes, C.H. and Sayão, L.F. (2003), "The SciELO Brazilian Scientific Journal Gateway and Open Archives: a report on the development of the SciELO-Open Archives Data Provider Server", D-Lib Magazine, Vol. 6 No. 3, available at: www.dlib.org/dlib/march03/marcondes/03marcondes.html (accessed 15 January 2007).
Marglin, S.A. (1996), "Farmers, seedsmen, and scientists: systems of agriculture and systems of knowledge", in Apffel-Marglin, F. and Marglin, S.A. (Eds), Decolonizing Knowledge. From Development to Dialogue, Clarendon Press, Oxford, pp. 185-248.
Mestrum, F. (2002), "De l'utilité de la 'lutte contre la pauvreté' pour le nouvel ordre mondial", in Rist, G. (Ed.), Les mots du pouvoir. Sens et non-sens de la rhétorique internationale, Presses Universitaires de France, Nouveaux cahiers de l'IUED, Genève, pp. 67-81.
Muthayan, S. (2004), "Open-access research and the public domain in South African universities: the public knowledge project's open journal system", in Esanu, J.M. and Uhlir, P.F. (Eds), Open Access and the Public Domain in Digital Data and Information for Science: Proceedings of an International Symposium, US National Committee for CODATA, Board on International Scientific Organizations, Policy and Global Affairs Division, National Academies Press, Washington, DC, pp. 134-45.
Nandy, A. (1988), "Introduction: science as a reason of state", in Nandy, A. (Ed.), Science, Hegemony and Violence. A Requiem for Modernity, United Nations University/Oxford University Press, Oxford, pp. 1-23.
Nandy, A. (1992), Traditions, Tyranny and Utopias. Essays in the Politics of Awareness, Oxford University Press/Oxford India Paperbacks, Delhi.
Nandy, A. and Visvanathan, S. (1990), "Modern medicine and its non-modern critics. A study in discourse", in Apffel-Marglin, F. and Marglin, S.A. (Eds), Dominating Knowledge. Development, Culture, and Resistance, Clarendon Press, Oxford, pp. 145-84.
Narvaez-Berthelemot, N. and Russel, J.M. (2001), "World distribution of social science journals: a view from the periphery", Scientometrics, Vol. 51 No. 1, pp. 223-39.
Notturno, M.A. (2000), Science and the Open Society. The Future of Karl Popper's Philosophy, Central European University Press, Budapest.
Open Society Institute (n.d.), "Frequently asked questions", available at: www.soros.org/about/faq (accessed 31 October 2006).
Rajashekar, T.B. (2004), "Open-access initiatives in India", in Esanu, J.M. and Uhlir, P.F. (Eds), Open Access and the Public Domain in Digital Data and Information for Science: Proceedings of an International Symposium, US National Committee for CODATA, Board on International Scientific Organizations, Policy and Global Affairs Division, National Academies Press, Washington, DC, pp. 154-7.
Ramachandran, P.V. and Scaria, V. (2004), "Open access publishing in the developing world: making a difference", Journal of Orthopaedics, Vol. 1 No. 1, available at: www.jortho.org/2004/1/1/e1/index.htm (accessed 3 November 2006).
Rist, G. (2002), The History of Development. From Western Origins to Global Faith, Zed Books, London.
Rosenberg, D. (2002), "African journals online: improving awareness and access", Learned Publishing, Vol. 15 No. 1, pp. 51-7.
Sancho, R. (1992), "Misjudgements and shortcomings in the measurement of scientific activities in less developed countries", Scientometrics, Vol. 23 No. 1, pp. 211-33.
Scaria, V. (2003), "Scholarly communication in biomedical sciences, open access and the developing world", Internet Health, Vol. 1 No. 1, available at: www.virtualmed.netfirms.com/internethealth/articleapril03.html (accessed 3 November 2006).

Salvador Declaration (2005), Salvador Declaration on Open Access: The Developing World Perspective, available at: www.icml9.org/meetings/openaccess/public/documents/declaration.htm (accessed 31 October 2006).
Smart, P. (2004), "Two-way traffic: information exchange between the developing and the developed world", Serials, Vol. 17 No. 2, pp. 183-7.
Tenopir, C. (2000), "Online journals and developing nations", Library Journal, Vol. 125, November, pp. 34-6.
Tenopir, C. and King, D.W. (2000), Towards Electronic Journals: Realities for Scientists, Librarians, and Publishers, Special Libraries Association, Washington, DC.
Weitzman, J.B. and Arunachalam, S. (2003), "Open access in the developing world", Open Access Now, 15 December, available at: www.biomedcentral.com/openaccess/archive/?page=features&issue=11 (accessed 3 November 2006).
Willemse, J. (2002), Library Funding. Adequate Financial Support for African University Libraries, INASP, SCANUL-ECS, Makere/Oxford.
Wilson, M. (2003), "Understanding the international ICT and development discourse: assumptions and implications", The Southern African Journal of Information and Communication, available at: http://link.wits.ac.za/journal/j0301-merridy-fin.pdf (accessed 3 November 2006).
World Summit on the Information Society (2003a), Declaration of Principles, Document WSIS-03/GENEVA/DOC/4-E, 12 December, available at: www.itu.int/wsis/docs/geneva/official/dop.html (accessed 3 November 2006).
World Summit on the Information Society (2003b), Plan of Action, Document WSIS-03/GENEVA/DOC/5-E, 12 December 2003, available at: www.itu.int/wsis/docs/geneva/official/poa.html (accessed 3 November 2006).

Corresponding author
Jutta Haider can be contacted at: [email protected]


Continuing professional development for library and information science


Case study of a network of training centres

Received 21 January 2007; accepted 3 June 2007

Lyn Robinson
Department of Information Science, City University London, London, UK, and

Audrone Glosiene
Institute of Library and Information Sciences, Vilnius University, Vilnius, Lithuania

Abstract
Purpose – The paper aims to describe a network of training centres (TCN-LIS) to support continuing professional development (CPD) of library and information specialists in countries of Central and Eastern Europe and Central Asia, funded by the Open Society Institute (OSI). It also aims to draw some general lessons for CPD in the library/information sector.
Design/methodology/approach – The paper reviews the development and activities of the training centre network, and reflects on issues raised and lessons learned.
Findings – The paper finds that TCN-LIS has been effective in raising standards of professional competence among library and information specialists in the countries of the OSI region. General conclusions can be drawn about good practice for CPD, on issues including the most appropriate topics to be covered, the most appropriate methods for teaching and learning, the situation of CPD training centres, and the relations between CPD and formal education.
Research limitations/implications – The study is based on a network of training centres in 23 countries between 1999 and 2006.
Originality/value – This is the only paper describing TCN-LIS, and the library/information training activities supported by OSI. It provides a unique perspective for considering library/information CPD issues.
Keywords Continuing professional development, Information science, Eastern Europe, Central Asia
Paper type Case study


Introduction
This paper describes the Training Centre Network initiative, a network of national centres for promoting the professional development of library/information workers in countries of Central and Eastern Europe and Central Asia. Initially set up under the auspices of, and funded by, the Open Society Institute (OSI), the centres of the network are now self-sustaining. The activities of the network are described here as an instance of international co-operation in professional development in rapidly changing political and economic circumstances. The experiences of the network also provide lessons on the provision of professional development that are of more general applicability for all library and information specialists.

Lyn Robinson was co-ordinator of the training centre network during its sponsorship by OSI and continues as an advisor. Audrone Glosiene is co-ordinator of the Lithuanian LIS CPD training centre and director of the Institute of Library and Information Sciences at Vilnius University, which now hosts the network infrastructure.


Continuing professional development for information specialists
Continuing professional development (CPD), sometimes referred to as continuing professional education (CPE), is, in a library/information context, the process by which library and information specialists maintain professional competence throughout their careers. It has been more fully defined as:


[A] career-long process of improving and updating the skills, abilities and competencies of staff by regular in-service training and education, supported by external courses (Prytherch, 2005)

and, in a more general professional context, as:

[The] systematic maintenance, improvement and broadening of knowledge and skills and the development of personal qualities necessary for the execution of professional and technical duties throughout the practitioner's working life (Corrall and Brewerton, 1999).

The relative balance of in-service (internal) training and external provision varies a good deal internationally and according to sector; increasingly, internal CPD is not available, except in the largest employer organisations. Similarly, the extent to which CPD may replace, rather than build on, an initial formal professional education is viewed differently in different circumstances. The need and rationale for CPD generally, and the issues arising, have been discussed by – inter alia – Brine (2004), Gorman (2004), Tedd (2003), Layzell Ward (2002), Kinnell (2000) and Farmer and Campbell (1997). CPD provision in particular countries has been outlined, for example, for Croatia (Horvat, 2004), for Denmark (Thorhauge, 2005) and for the UK (Information World Review, 2006), and a worldwide overview has recently been presented (Genoni and Walton, 2005). Regarding particular topics, CPD provision has been described for cataloguing (Hider, 2006), and for the knowledge, skills and competencies necessary for working with digital libraries. CPD provision in Slovenia and in the UK has been compared (Bawden et al., 2005), and the use of distance learning techniques for CPD has been described and evaluated (Dahl et al., 2002; Bawden and Robinson, 2002; Robinson and Bawden, 2002; DELCIS, 2001).

The ideal curriculum for library and information science per se is far from settled, with divergent views as to what should be its "core" – if, indeed, such a concept is realistic – and equally divergent views as to how to adapt library/information education to a changing technical, economic and social environment: see, for example, from various national and international perspectives, Kajberg and Lørring (2005), Aina (2005), Badovinac and Juznic (2005), Virkus and Wood (2004), van Heusden (2004), Ashcroft (2004), Markey (2004) and Horvat (2003). This is reflected in the CPD situation, where there is little agreement as to the most important and appropriate topics to be covered, although there is usually an assumption that CPD should cover the more immediately practical and vocational – and, perhaps, local and ephemeral – aspects. Nor is there agreement, or consistency of practice, as to where, and by whom, CPD should be provided. Practice differs internationally, with CPD providers including national libraries, academic departments, professional organisations, government agencies, special interest groups and commercial providers (see, for example, Bawden et al., 2005; Thorhauge, 2005; Information World Review, 2006).

The OSI and its library/information programmes
The OSI was established by the financier and philanthropist George Soros, in order to promote the concept of "Open Society" first propounded by the philosopher Karl Popper (Popper, 1992; Soros, 1995, 2000; Notturno, 2000). In brief, open society refers to a form of society characterised by the rule of law, a mechanism to replace the law-makers without resort to force, and free access to information to support critical and rational debate. The communication and sharing of information and knowledge are key to the idea of open society, and acceptance of the idea, in turn, has implications for libraries and information provision (Robinson and Bawden, 2001a, b).

OSI's activities during the period under review were largely in Central and Eastern Europe and in Central Asia, plus Haiti and Mongolia, and were largely directed towards assisting the emergent and transitional countries during and after the break-up of the Soviet Union and the socialist bloc. In "library terms", there are intriguing parallels between the situation in the early 1990s and that of the late 1940s, when support had to be given to re-establish the library systems of Europe after the Second World War (Danilewicz, 1945; Choldin, 2005). From its outset, among many activities and programmes, OSI was involved in promoting the free communication of information, and hence involved with the library/information sector, including such basics as the provision of photocopying, which had hitherto been closely regulated in the OSI region (Danyi, 2006). From 1994 to 2000 a distinct libraries programme, known initially as the Regional Library Programme and from 1997 as the Network Library Programme, was operated by OSI. In 2000 this was merged, along with OSI's Internet Programme and Media Programme, into a single Information Programme, which continues. (Information on OSI's Information Programme is available from www.soros.org/initiatives/information. Archival information on the Regional and Network Library Programmes is available from www.osi.hu.nlp)

The specific purpose of the library programmes, as stated on the NLP website, within OSI's overall remit of supporting emergent open societies, was to "help libraries within the region to transform themselves into modern service-oriented centres, serving their communities, and contributing to the establishment and maintenance of open societies". These programmes undertook a number of initiatives and projects, including:
. a variety of initiatives for collection development and service improvement;
. support for open access and consortium purchasing for international materials (Friend, 2004; Rowland, 2005);
. a variety of initiatives promoting preservation of, and better access to, local materials (Stalbovskaya, 2002);
. support for conference attendance by librarians from the region;
. a public library development initiative, aimed at strengthening the position of the public library as a community centre and public information point; and
. LIS Fellowships, for professional librarians from the OSI region to spend periods at equivalent institutions in the USA and Western Europe (e.g. the Library of Congress, Queens Public Library New York, The University of Graz Library, the British Library, and the Bodleian Library at Oxford University).

Many of these initiatives had a "training and development" component. For example, the Fellowships had the subsequent sharing of knowledge as an integral part of the programme, and the Fellows were brought together in "Librarians as Trainers" workshops. Four initiatives addressed this issue directly and can be seen as precursors to the training centre network:
(1) Financial support was provided for the creation of training materials for CPD courses for practising librarians: most notably the EDULIB programme, which developed several training modules, with a distance learning element, for Slovak librarians (Dahl et al., 2002).
(2) A series of training seminars on library automation and systems development was provided in many countries of the OSI region (for an example, see Robinson, 1997). These were intended to assist with the widespread introduction of computerised library management systems, which were appearing for the first time in many of these countries (Borgman, 1996). Although a standard format for the seminars was established, this had to be implemented very flexibly. In some cases, where the knowledge of the participants was relatively great, it took the form of an updating and exchange-of-experience seminar; in other cases, a basic introduction to ICTs in the library, and indeed to the basics of computing itself, was needed first. A similar, though smaller scale, seminar series on library management was also held, and some special seminars on topics such as fundraising were organised.
(3) Building on the LIS Fellowships concept, funding was made available for Fellowships for two LIS academics at the Department of Information Science, City University London. (Both were later involved as trainers and resource developers in their local TCN-LIS training centre and in the Network Teaching Programme.)
(4) A summer school was run for five years, between 1997 and 2001, at the Central European University (CEU), Budapest, for LIS practitioners and educators from the OSI region (Robinson et al., 2000; Bawden and Robinson, 2002). Initially focused on uses of the internet in libraries, the summer school, as this concept became better established, changed its focus to the more general concept of digital literacy for open society. The last of the summer schools introduced distance learning to the CEU's summer programmes (Bawden and Robinson, 2001; Robinson and Bawden, 2002). (Many participants in these summer schools became actively involved in their national LIS training centres.)

Partly based on the success of these initiatives, and recognising the importance of the continuing education of library/information workers in developing a modern and forward-looking library sector in the OSI region, a network of national LIS training centres was established with OSI funding.


The Training Centre Network for Librarianship and Information Science (TCN-LIS) TCN-LIS was set up under the auspices of OSI’s Network Library Programme in 1999, and subsequently transferred to the merged Information Programme. The first funding was provided in 2000 and the final funding in 2003. Funding was provided for two years on a “matching funds” basis (i.e. the centres were required to raise additional income equivalent to the OSI grant). Some smaller grants, without matching funding being required, were also made and OSI also funded the costs of the network infrastructure, including meetings. After the cessation of OSI funding the network continued on a self-sustaining basis. A total of 23 national training centres were involved in the network. Five centres – in general, the most active and best established – received OSI funding from the outset: Czech Republic, Hungary, Lithuania, Slovakia and Slovenia. A further 12 centres received OSI at a later stage: Bosnia, Bulgaria, Croatia, Georgia, Kazakhstan, Kyrgyzstan, Moldova, Mongolia, Montenegro, Russia, Ukraine and Yugoslavia (Serbia). Five centres, though affiliated to the network, were unable to fulfil the conditions for funding: Albania, Armenia, Azerbaijan, Kosovo and Uzbekistan. Finally, a well-established and active LIS training centre in Latvia was affiliated to the network, though it did not seek OSI funding. These centres operated in a wide variety of settings, and carried out training (largely by conventional face-to-face short courses, although with some innovative methods) across a very wide variety of topics. The network infrastructure comprised a website and mailing list for communication, a set of shared training resources, and a programme of meetings and special training events. The network was initially co-ordinated from the OSI regional offices in Budapest. After cessation of OSI funding, the co-ordination moved to City University London, and then, from 2006, Vilnius University, Lithuania. The main task of the network coordinator, apart from the administration of grants and maintenance of the infrastructure, was to promote exchange of experience and good practice between the centres. This was achieved partly through electronic communication, partly through individual visits and local meetings, partly through Fellowships and partly through a series of international meetings. Five main meetings have been held: (1) September 2000, Budapest, as an inaugural meeting for the network; (2) June 2002, Budapest, with the aim of supporting newly established centres, by the sharing of experience from longer established centres; (3) April 2003, Prague, with a focus on the role of the training centres in supporting and promoting the place of the library in civil society; (4) November 2003, London, devoted to the future education and training of LIS professionals; and (5) November 2006, Vilnius, with the theme of the impact of library/information services (the papers from this meeting will be published in the Lithuanian Journal of Information Science during 2007). A series of Training Centre Fellowships were also funded, by which 14 trainers from TCN-LIS centres were able to attend special fellowship programmes, focused on the development of curricula and training materials, at the Department of Information

Science, City University London, and at the Mortensen Centre, University of Illinois. This led to the establishment of a Network Teaching Programme, at which trainers from longer-established centres supported less developed centres by carrying out training directly and assisting the development of local training materials. At the end of 2006, TCN-LIS, now co-ordinated from Vilnius University, was still active, though in a very different environment from that in which it was established. Several of its more active centres are in countries which have joined the European Union, and in a much more favourable political and economic situation than at the start of this project. For others, particularly in the Caucasus and in Central Asia, the local situation is much less favourable, and this is reflected in the environment in which library/information services operate. The development of TCN-LIS has illuminated a number of issues of CPD in the library/information context, which are of relevance well beyond the OSI region, and the time of transition to democracy. General lessons for CPD Lessons learned from the operations of the TCN-LIS training centres over a six-year period, brought out in particular at the international meetings of the network, have relevance for CPD provision for LIS generally, beyond the specific time and place of the network’s activity. These are set out here under six main headings: (1) topics for CPD; (2) management and evaluation issues; (3) teaching and learning methods; (4) location of the training centre; (5) relations between CPD and formal education; and (6) local versus general training resources. Reference is made here to factors promoting the success of training centres. The “success” of centres was judged by their activity (number and variety of courses and participants), longevity, and sustainability after OSI funding ceased. Topics for CPD The range of topics covered in the courses of the TCN-LIS centres was very wide, while the variant nomenclature for course titles and levels made comparisons difficult. A “core curriculum” to be recommended to all centres was seen as desirable, but this proved difficult to achieve, for reasons including: . local needs, issues and concerns, depending on the local library/information environment; . levels of education and training of the local library workforce; . availability of training resources in local languages; and . availability of local trainers for some topics. However, it is clear that the course mix offered by centres fell into a general categorisation: . Basic “core” courses, on topics central to LIS work (e.g. reader services, classification, cataloguing and resource description). For some centres, these

were typically updating and refresher courses; for others, they compensated for a lack of formal LIS education.
. "Transition" topics, of particular importance to countries of the region as they emerged from a socialist environment and required new skills (e.g. of marketing and promoting libraries, and influencing decision makers and funders).
. "Ephemeral" topics, important to that local situation at the time. This might be, for example, grant writing if the local library community stood to gain from funding awards, or library management software if a new national system were being introduced.
. "Local" topics. Some topics offered by the centres (e.g. "psychology of the user") were clearly of established importance to that centre's local LIS "culture", while not being in the mainstream of LIS education or CPD.
. ICT use, which was important for all centres.

An awareness of this categorisation was as close as could be achieved to a recommended curriculum as a benchmarking for the centres’ provision. It may be that this, perhaps omitting “transition” topics, would be suitable as a template for LIS CPD more generally. Over time, and with political and economic development, the “transition” topics became of less paramount importance, particularly for those countries achieving EU accession status. That is not to say that such topics are not of importance, rather that they match the significance which they have in CPD in the countries of Western Europe. ICT topics retain an importance, though a trend seen was for the centres to move away from the “ICT for libraries” style and towards provision of “general” ICT qualifications, such as the European Computer Driving Licence (ECDL). This qualification has been suggested to be suitable as a basic grounding in ICTs for formal LIS education and as a basis for deeper skill sets (Poulter and McMenemy, 2004), and hence should be suitable for CPD. As a general principle, it may be worthwhile for LIS training centres to consider the provision of such generally recognised qualifications, rather than attempting to provide LIS-specific equivalents. This may be so, not just for the applications of ICTs, but also for such “generic” skills as marketing, people management and budgeting. Management and evaluation issues One of the factors which most markedly distinguished the successful centres within TCN-LIS from the less successful was the extent to which they were able to establish an appropriate management structure, handle finances successfully, including finding matching funds to the OSI grants, and market and promote their courses to their audience. This latter included assessing the needs of potential trainees – which might well differ between centres – and evaluating the success of the courses, with a mechanism in place to ensure that the results of evaluation were fed back into modification of courses. These issues were new to many of the centre operators, who had been working in a socialist style of command economy, with little need, or indeed opportunity, to consider them. Some centres dealt with them more successfully than others, often a reflection of

the economic and social conditions in the country generally. These aspects received a good deal of attention from within the network, with advice and exchange of good practice being encouraged, to assist the less developed centres.
Another management issue was the need for centres to adapt their training provision realistically to the available infrastructure and resources. While lack of expert local trainers could, to an extent, be overcome through the Network Teaching Programme, problems of adequate accommodation, electricity supply, IT facilities, etc., were not so easily dealt with. The internet has provided many useful and freely available resources, at least to those able to read and use material in English (Koltay, 2006), but this is far from a complete answer.
While many of these issues were specific to their time and place, there are some general lessons for CPD here. Marketing, promotion, needs analysis and self-evaluation are all essentials within any CPD programme in any situation, and may be too readily overlooked in the quest for better training materials and presentations. A realistic attitude to what can be provided within the constraints of the training situation is of universal relevance.

Teaching and learning methods
The great majority of training in the network was, and is, carried out by the "traditional" CPD methods of short courses delivered by face-to-face presentations and demonstrations. Some innovative methods have been tried by various centres. The Lithuanian training centre has made some of its courses available through the WebCT e-learning environment, building on its leadership of DELCIS (2001), an international project (Distance Education for Librarians: Creating an Information-Competent Society), funded by the EU Leonardo Da Vinci programme, which produced three distance learning modules:
(1) "Basic Internet";
(2) "Advanced Internet"; and
(3) "Webpage design".
As noted above, a series of CPD modules for library and information professionals in the Slovak Republic was designed with OSI support, comprising a week of face-to-face lectures, practical sessions and group discussions, followed by three months of guided work through printed self-study materials (Dahl et al., 2002). Again as noted above, the OSI-supported CEU summer school on "digital literacy for open societies" moved, in its final year, from a two-week summer school to a one-week school, preceded and followed by interaction through an e-learning system (Bawden and Robinson, 2001, 2002; Robinson and Bawden, 2002).
Experience within TCN-LIS indicates, however, that full, or even partial, distance learning CPD is not likely to be a popular option. Many of the perceived benefits of a CPD "course" – time away from work for reflection, face-to-face interaction with fellow participants from outside the normal workplace setting, social and networking aspects, opportunity to attend a central training venue, etc. – are largely lost in a distance learning setting. It is notable, for comparison, that of the many CPD courses available from several providers in the UK, only two courses from one provider have a distance learning version.


This is not to say that distance learning (to include open learning and e-learning) is not of value for CPD: far from it. Rather, it is to say that it should be used carefully, as part of a blended approach retaining the values of face-to-face interaction, and for those CPD topics for which it is most appropriate (Robinson and Bawden, 2002; Robinson et al., 2005). Typically, these topics will be those of a "technical" nature, either in the sense of applying information and communication technologies, or of involving some process such as cataloguing, indexing and resource description.

Location of training centre
It was noted above that CPD is offered by various types of provider in various settings, and this was true of the centres comprising TCN-LIS. These were located variously in national libraries, academic libraries, LIS academic departments, professional associations, government agencies, non-governmental organisations, and less formal settings. It was notable that the most successful centres were located either in LIS departments or in academic libraries. These locations provided a continuing focus for activity, physical space and resources, and access to people who could function both as trainers and support staff. Interaction between training activities and other activities of the organisation hosting the centre is also an advantage. Such settings also allow training activities to be spread among a group of personnel, and are hence less dependent on the availability and enthusiasm of a few individuals. Other settings within the network lacked some of these aspects, to the detriment of the centre.
This is different from the situation in the UK, for example, where the national library no longer participates regularly in CPD, and where academic LIS departments provide CPD courses only occasionally; the "national library or academic LIS department" model is therefore not to be recommended worldwide. In the UK, however, the lengthy development of an extensive LIS infrastructure has allowed the development of well-established CPD programmes in the professional association and in commercial and non-profit training organisations. This is not likely to be the case in smaller countries, or in those with a less developed LIS environment. Here, the focus of CPD in national libraries or academic departments is likely to be the norm.

Relations between CPD and formal education
It became clear that another factor influencing the success of the centres was the situation with respect to formal education at university level in library and information science in the relevant country. The more successful centres were those operating in an environment where there was also a well-established system of higher education providing academic and/or professional qualifications in LIS. Although there are some confounding factors – such an educational system was to an extent correlated with economic and social development – this seems a valid observation. In isolated cases where a training centre did not co-operate with providers of formal LIS education, the result was also a relative lack of success.
There are a number of reasons for this. A well-established formal LIS education system implies a workforce with a good basic set of professional skills and knowledge, so that CPD may be used for updating and adding specialist skills. The participants will also have a better understanding of the area, and will be better able to articulate CPD needs.
The lack of such a system means that CPD may be used as a surrogate for a

basic professional education, an intrinsically problematic situation. Such an education system will also provide a pool of trainers for CPD, whether or not the centre is located in an academic department. It will also make it more likely that there will be a set of local language training resources (textbooks, course notes, practical exercises with local relevance, etc.) available.
The general lesson from this is that CPD should complement, rather than try to replace or parallel, formal educational programmes. The two should be synergistic, in terms of topics covered and resources developed. A degree of "formality" for CPD was, in a sense, recognised, in the need for some kind of formal certification of completion of training. This was important for participants in the circumstances of the transition countries, but has also been noted as desirable for other forms of in-service training in Western Europe (Bawden and Robinson, 2002).

Local versus general training resources
A perennial issue for all CPD (and indeed for all LIS education and training) is the extent to which training materials must be customised to the local situation, and how far "generic" materials can be shared. In the case of TCN-LIS this was exacerbated by the number of local languages in the centres of the network. To a degree, English was the working lingua franca, though in some parts of the region Russian was more widely known as a second language, at least at the time of the network's initiation and still among older LIS personnel.
The advantages of sharing common materials, in terms of economics and of the ability to pool effort to create high quality materials, are clear. The value of using freely available Internet materials for this purpose has been emphasised (Koltay, 2006). Some of the earlier efforts in producing CPD materials were based on the translation of original Danish materials into local languages (DELCIS), and effort was put into the translation of materials for TCN-LIS centres, and into the preparation of recommended resource lists. However, the counter argument stresses the increased acceptability to participants of materials in local languages, referring to local issues and using local examples. This point was argued by several of the TCN-LIS centres at various times. The best solution to an issue which affects all CPD provision, to a greater or lesser extent, seems to be one which provides a "core" of basic material, sharable by all and translated into local languages if necessary, supported by local and customised instances and examples. This is very much the solution recommended for information literacy training, whether for CPD or for formal education (Robinson et al., 2005).

Conclusions
TCN-LIS has been effective in raising standards of professional competence among library and information specialists in the 23 countries in which centres operated, though the extent of the centres' effectiveness has varied greatly across the OSI region. The factors behind this variant success have been analysed, and general conclusions can be drawn about good practice for library/information CPD. These issues include the most appropriate topics to be covered, the most appropriate methods for teaching and learning, management and evaluation policies, the situation of CPD training centres,


relations between CPD and formal education, and the feasibility of a core curriculum based on general resources. TCN-LIS is a good example of the benefits that can be obtained from international co-operation through an ultimately self-sustaining network. References Aina, L.O. (2005), “Towards an ideal library and information studies (LIS) curriculum for Africa: some preliminary thoughts”, Education for Information, Vol. 23 No. 3, pp. 165-85. Ashcroft, L. (2004), “Developing competencies, critical analysis and personal transferable skills”, Library Review, Vol. 53 No. 2, pp. 82-8. Badovinac, B. and Juznic, P. (2005), “Toward library and information science education in the European Union: a comparative analysis of library and information science programmes of study for new members and new applicant countries to the European Union”, New Library World, Vol. 106 Nos 3/4, pp. 173-86. Bawden, D. and Robinson, L. (2001), “Multicultural library education: towards distance learning”, paper presented at the First Annual Joint UK and USA Conference on the Scholarship of Teaching and Learning, London, 6 June. Bawden, D. and Robinson, L. (2002), “Promoting literacy in a digital age: approaches to training for information literacy”, Learned Publishing, Vol. 15 No. 4, pp. 297-301. Bawden, D., Vilar, P. and Zabukovec, V. (2005), “Education and training for digital librarians: a Slovenia/UK comparison”, Aslib Proceedings, Vol. 57 No. 1, pp. 85-98. Borgman, C.L. (1996), “Automation is the answer, but what is the question? Progress and prospects for Central and Eastern European libraries”, Journal of Documentation, Vol. 52 No. 3, pp. 252-95. Brine, A. (2004), Continuing Professional Development: A Guide for Information Professionals, Chandos Publishing, Oxford. Choldin, M.T. (2005), “Libraries in continental Europe: the 40s and the 90s”, Journal of Documentation, Vol. 61 No. 3, pp. 356-61. Corrall, S. and Brewerton, A. (1999), The New Professional’s Handbook, Library Association Publishing, London, p. 266. Dahl, K., Francis, S., Tedd, L.A., Terevova, M. and Zihlavnikova, E. (2002), “Training for professional librarians in Slovakia by distance-learning methods: an overview of the PROLIB and EDULIB projects”, Library Hi Tech, Vol. 20 No. 3, pp. 340-51. Danilewicz, M. (1945), “The post-war problems of continental libraries”, Journal of Documentation, Vol. 1 No. 2, pp. 81-8 (reprinted in Journal of Documentation, 2005, Vol. 61 No. 3, pp. 334-340). Danyi, E. (2006), “Xerox project: photocopy machines as a metaphor for an ‘Open Society’”, Information Society, Vol. 22 No. 2, pp. 111-5. DELCIS (2001), “Distance education for librarians: creating an information-competent society”, available at: www.economicsoftware.ro/delcis (accessed 14 January 2007). Farmer, J. and Campbell, F. (1997), “Information professionals, CPD and transferable skills”, Library Management, Vol. 18 Nos 3/4, pp. 129-34. Friend, F.J. (2004), “How can there be open access to journal articles?”, Serials, Vol. 117 No. 1, pp. 37-40. Genoni, P. and Walton, G. (Eds) (2005), Continuing Professional Development: Preparing for New Roles in Libraries, IFLA Publications No. 116, K.G. Saur, Munich.

Gorman, M. (2004), “Whither library education?”, New Library World, Vol. 105 Nos 9/10, pp. 376-80. Hider, P. (2006), “A survey of continuing professional development activities and attitudes among cataloguers”, Cataloguing and Classification Quarterly, Vol. 42 No. 2, pp. 35-58. Horvat, A. (2003), “Fragmentation of the LIS curriculum: the case of Croatia”, New Library World, Vol. 104 No. 6, pp. 227-32. Horvat, A. (2004), “Continuing education of librarians in Croatia: problems and prospects”, New Library World, Vol. 105 Nos 9/10, pp. 370-5. Information World Review (2006), “Continuing professional development”, Information World Review, No. 220, pp. 23-5. Kajberg, L. and Lørring, L. (2005), “European Curriculum Reflections on Library and Information Science Education”, Royal School of Librarianship and Information Science, Copenhagen, available at: www.db.dk/lis-eu (accessed 12 May 2007). Kinnell, M. (2000), “From autonomy to systems: education for the information and library professions 1986-1999”, Journal of Documentation, Vol. 56 No. 4, pp. 399-411. Koltay, T. (2006), “The role of free Internet resources for library technical services and reference in a Hungarian LIS continuing education course”, Education for Information, Vol. 24 No. 1, pp. 51-70. Layzell Ward, P. (Ed.) (2002), Continuing Professional Education for the Information Society, IFLA Publications No. 100, K.G. Saur, Munich. Markey, K. (2004), “Current educational trends in the information and library science curriculum”, Journal of Education for Library and Information Science, Vol. 45 No. 4, pp. 317-39. Notturno, M.A. (2000), Science and the Open Society, Central European University Press, Budapest. Popper, K. (1992), Unended Quest: An Intellectual Autobiography, Routledge, London. Poulter, A. and McMenemy, D. (2004), “Beyond the European Computer Driving Licence: basic and advanced IT skills for the new library professional”, IFLA Journal, Vol. 30 No. 1, pp. 37-46. Prytherch, R. (2005), Harrod’s Librarians’ Glossary, 10th ed., Ashgate, Aldershot, p. 168. Robinson, L. (1997), “IT in Hungary: the librarian’s perspective”, Managing Information, Vol. 4 No. 7, pp. 40-2. Robinson, L. and Bawden, D. (2001a), “Libraries, information and knowledge in open societies”, Nordinfonytt, Vol. 2 No. 1, pp. 21-30. Robinson, L. and Bawden, D. (2001b), “Libraries and open society: Popper, Soros and digital information”, Aslib Proceedings, Vol. 53 No. 5, pp. 167-78. Robinson, L. and Bawden, D. (2002), “Distance learning and LIS professional development”, Aslib Proceedings, Vol. 54 No. 1, pp. 48-55. Robinson, L., Hilger-Ellis, J., Osborne, L., Rowlands, J., Smith, J.M., Weist, A., Whetherly, J. and Philips, R. (2005), “Healthcare librarians and learner support: competencies and methods”, Health Information and Libraries Journal, Vol. 22 No. 4, Supplement 2, pp. 42-50. Robinson, L., Kupryte, R., Burnett, P. and Bawden, D. (2000), “Libraries and the Internet; a multi-national training course”, Program, Vol. 34 No. 2, pp. 187-94. Rowland, F. (2005), “Journal access programmes for developing countries”, Serials, Vol. 18 No. 2, pp. 104-6. Soros, G. (1995), Soros on Soros, Wiley, New York, NY.


Soros, G. (2000), Open Society: Reforming Global Capitalism, Little, Brown and Company, London. Stalbovskaya, M.S. (2002), “The practice and perspectives of free access to the legal information of citizens of the Republic of Uzbekistan”, International Information and Library Review, Vol. 34 No. 2, pp. 201-7. Tedd, L.A. (2003), “The what? and how? of education and training for information professionals in a changing world: some experiences from Wales, Slovakia and the Asia-Pacific region”, Journal of Information Science, Vol. 29 No. 1, pp. 79-86. Thorhauge, J. (2005), “New demands old skills. A strategy for bridging the competence gap: building competencies in a daily working context”, IFLA Journal, Vol. 31 No. 2, pp. 162-8. van Heusden, M.R. (2004), “The challenge of developing a competence-oriented curriculum: an integrative framework”, Library Review, Vol. 53 No. 2, pp. 98-103. Virkus, S. and Wood, L. (2004), “Change and innovation in European LIS education”, New Library World, Vol. 105 Nos 8/9, pp. 320-9. Corresponding author Lyn Robinson can be contacted at: [email protected]


Where do we go from here? An opinion on the future of LIS as an academic discipline in the UK
Toni Weller and Jutta Haider
Department of Information Science, City University London, London, UK
(Toni Weller and Jutta Haider are pursuing doctoral research funded by the Arts and Humanities Research Council. They would like to thank David Bawden for his insightful comments.)

Received 11 December 2006
Accepted 25 May 2007

Abstract
Purpose – The purpose of this paper is to discuss the current situation of academic LIS research, specifically in the UK, and to provide some thoughts concerning the future of the discipline. According to the opinion of the authors, this situation is characterised by a lack of cohesion, the need for justification of academic research in terms of its immediate applicability to the professional education of practitioners, and a disjuncture between the information profession and information research. The paper attempts to offer introductory thoughts regarding these circumstances.
Design/methodology/approach – The current situation is briefly reviewed and commented on from the authors' viewpoint. Aspects of Pierre Bourdieu's study of the university as a hierarchically structured field of forces are considered. Some reference is made to previous literature.
Findings – The paper advances the view that the role of academic LIS research, debate and theory formation needs to be strengthened and that this needs to be reflected more strongly in the curriculum.
Originality/value – The paper attempts to highlight consistently overlooked contributing factors, and thus aims to shift the perspective towards the role and position of LIS research within academia, rather than vis-à-vis the professional education it is connected to. It aims to stimulate discussion of the current situation, of how it can be perceived, and of ways to address it.
Keywords Education, Information science, Curricula, Research
Paper type Viewpoint

Introduction
In choosing to study the social world in which we are involved, we are obliged to confront [. . .] a certain number of epistemological problems, all related to the question of the difference between practical knowledge and scholarly knowledge (Bourdieu, 1988, p. 1).

In the 1970s and 1980s there was a flurry of interest in the future of library and information science (LIS) as an academic discipline (e.g. Foskett, 1973; Bayless, 1977; Berry, 1987), a debate which has continued to be of importance up to the present day. In the 30 years or so that have since passed the field of LIS – specifically in the UK – has appeared to move closer to professional and business degrees, with less of an emphasis on academic research and academic (as opposed to vocational) teaching (e.g. Eaton and Bawden, 1991; Bud-Frierman, 1994; Rowley, 1998; Holtham, 2001; Layzell Ward, 2003; Oppenheim et al., 2004). As two doctoral students in this field, the authors feel that this shift of emphasis has helped to create and exacerbate some of the problems, or at least difficulties, inherent in the LIS field today: a certain lack of cohesion, the need for justification (specifically also for justification of academic research in terms of its immediate applicability to the professional education of library and information practitioners), and to a degree also a disjuncture between the information profession and information research.
This paper is not an attempt to define the discipline itself; instead it hopes to offer some introductory thoughts regarding this current situation in LIS – in particular in Britain – and to highlight some, maybe not always obvious, circumstances that might be contributing to it. In this sense, our interest here is not in what the core of LIS might be, but in grasping some aspects of its blurriness. As Haider and Bawden (2007) have argued, albeit in a different context, this:

This tension has already been discussed by Black (1983), Haddow and Klobbas (2004), and McNicol (2004) among others. We hope that the following discussion will add to the debate on this often overlooked aspect of LIS education. LIS curriculum and teaching According to a list of CILIP accredited degree courses in late 2006, there is very little undergraduate teaching of LIS in the UK. Only seven universities in the UK offer undergraduate degrees in this area. Aberystwyth is alone in offering a BA in Information and Library Studies. Brighton, Edinburgh, Liverpool, Loughborough, Manchester and Sheffield all offer undergraduate degrees in information, library, or business management, or computing, but not in LIS specifically. Likewise, many of these are LIS with another subject, rather than LIS in its own right. When we examine the offerings for postgraduate qualifications we can add Aberdeen, Birmingham, Bristol, Edinburgh, Glasgow, London (City, UCL, Metropolitan, and Thames Valley), Manchester, and Newcastle to the list. The faculties offering these degrees vary from business, computing, engineering and mathematical sciences, to social sciences and humanities[1]. One obvious result of this myriad different faculties with which LIS is associated, and of the fact that most students only enter the field at postgraduate level, is that LIS has ended up having a largely uncohesive postgraduate body. Thus students bring preferences and methodologies from their respective individual disciplinary backgrounds. This is a situation that can be considered a strength, but which also creates problems, specifically so with regard to the formation of the discipline and its position amongst other disciplines within academia. It certainly seems to be a contributing factor in the confusions that abound about the scope of LIS as an academic discipline. Furthermore, this lack of collective grounding evidently becomes a stumbling block at postgraduate level, and especially so at doctoral level, when in order to adopt the new discipline and to participate in its discourse community, it becomes necessary to share similar outlooks and, most importantly, to have a similar reference system, however wide in scope this may be. Evidently, we, the authors of this paper, are a product of this system ourselves and as such are good examples of these circumstances, one of us having a strong

background in history, the other in literature and linguistics, respectively. While we, of course, are also starting to be anchored within the LIS scholarly community (hence this paper), we have still relied upon these original academic teachings in our PhD methodologies, and hence to a degree also in our terms of reference. It seems that these original disciplinary backgrounds tend to take precedence over LIS when there is no evident theoretical or practical application in LIS. In the Information Science Department at City University London, academic staff have backgrounds as diverse as geography, pure science, computer science, and of course social sciences and the humanities. It can be argued that these multi-disciplinary backgrounds of LIS students, practitioners, and researchers, adds value to the field, and that such a combination of ideas and approaches provides depth of understanding of today’s fast-moving, socio-technological demands. It may also help to broaden LIS research. It seems futile to deny the benefits arising from LIS’s multi- or interdisciplinary nature, and to a degree such a multifarious approach helps LIS not only draw on already established theories and frameworks, but also move into diverse and already established and accepted fields. At the same time, however, it could also be argued that this situation leads to a lack of cohesiveness; reflected, to a degree in the comparatively small amount of academic theory specific to LIS. We recognise that in LIS there will always be a certain amount of “latent tension between research and education on the one hand and professional practice on the other” (Sundin and Johannisson, 2005, p. 40). Despite this, we suggest that uniquely to the LIS discipline, this communal focus on professional education is perhaps a way of allowing these multiple disciplinary approaches to share common ground. Of course, in itself the LIS profession is a very diverse area, including, for instance, information professionals and librarians on one side (e.g. information managers, specialist librarians, professional researchers), and more technical specialisations on the other (e.g. intranet managers, data retrieval specialists, GIS specialists). One could argue that the absence of an explicitly academic focus, which seems to be inherent to LIS almost from the start, also encourages implicit confusion or at least uncertainty which ultimately leads to a somewhat uncohesive form of community, of academics, and also of practitioners. It seems therefore that in terms of its disciplinary ties, this very heterogeneous field finds one of its most cohesive elements in the professional education it provides to practitioners. At the same time, the currency of the topic LIS teaches, or to use Pierre Bourdieu’s term, the “cultural or symbolic capital” (Bourdieu, 1992), arising from its connection with the practitioners it educates, does not appear to promote LIS’s status as an academic discipline or to give it academic credence[2]. Cynically, one could argue that it becomes necessary to associate LIS research with wider developments and changes of high economic value and societal prestige, and also with already established academic disciplines of high status, such as law or philosophy. It also becomes necessary to widen the scope of LIS practitioners from “merely” librarianship to include more high profile and progressive occupations in the IT sector or in business and management related areas. 
Therefore we often see that library and information practitioners tend to ignore any prestige that might be provided by LIS research, instead looking to other areas for an increase of its social or symbolic capital. This can manifest itself in attempts to link the profession to various other societal status symbols and changes in the field of power, for example, managerialism, other capitalist or new economy ventures, or the so-called "information revolution", as exemplified by the popularisation of the "information society". The mixed status of the disciplines encompassed within LIS research encourages a hierarchy of power both within the discipline itself, and also without, as LIS tries to assert itself as independent and equal to older disciplines, such as law, history, or philosophy. On the one hand, it seems that by drawing on theories closely allied with disciplines with high academic and also general societal prestige, LIS aims to tap into this value system. On the other hand, we argue that in order to assert its position within academia, it is equally necessary to develop an independent identity and ideally to export its theories to other academic disciplines.
Of course, in many ways this is hardly new or surprising, especially if we consider, as does Bourdieu (1988), academia as a structured field of forces in which individual agents, but also faculties and disciplines, hold certain positions according to the social capital they are able to accrue. Thus we see that "[t]he structure of the university field reflects the structure of the field of power, while its own activity of selection and indoctrination contributes to the reproduction of this structure" (Bourdieu, 1988, p. 40). The need for active alignment with the "field of power" (i.e. the wider field of politics and new economically relevant developments) in order to draw on its social capital is more relevant for some disciplines than for others. Typically this is seen in newer fields, which still need to establish themselves and the knowledge they produce, while older disciplines are already "dominant in the political order" (Bourdieu, 1988, p. 63). In the case of LIS research, and in keeping with its vocational and educational focus, it seems that this is also reflected in the proliferation of degree titles on offer, usually at the expense of the original "library" prefix. In the USA, for example, major LIS schools have affiliated to form what is called the "I-School" movement, equally dropping their "L", and thus clearly attempting to capitalise on the current obsession with the "information society", or at least a certain image it provides, trying to escape the constraints of its own original focus or cohesive element[3].
It is this blurriness of definition, over what LIS encompasses, what it actually is – to put it crudely – that contributes further to its uncertain status within academia. It is quite possible that it might also contribute to a lack of cohesion within the LIS community. One could also argue that a further contributing factor is the often very unspecific LIS education and the way this can be taught. This is not at all to decry interdisciplinary approaches or to call for the introduction of a narrow LIS curriculum at the expense of wider intellectual inquiry. We think a broad intellectual scope and an embrace of interdisciplinary approaches can, in principle, be vital. But to be so, they also require a distinct LIS academic discourse. Older professions and disciplines do not need the same justification since they are already accepted holistically as research disciplines. LIS needs to get this acceptance from elsewhere; thus the professional issues of the field tend to be taught and stressed over academic or epistemological questions regarding the nature of the discipline.
There are also more jobs for the professional information scientist than there are for the academic one. Why? Because there are only a limited number of UK universities offering undergraduate courses, and no degrees of any level which focus on the academic, the philosophical, the research potential of the discipline, rather than the professional opportunities and skills needed by managers, librarians, or computer scientists. In themselves these are, of course, important skills to teach, but we argue that they have begun to dominate the LIS discipline to the detriment of its future as a whole.
We mentioned above the existing tensions and discord within different sections of the discipline. A focus on the technical can make this type of research and work appear more valid than others, particularly in the UK climate which forces academics to justify their research in order to receive funding. More funding provides more credence for the LIS field, which gives these dominant areas more status within the discipline than those which bring in less money or fewer quantitative and applicable results. If LIS research were to judge its own "worth" less in terms of the professional education it provides, and more in terms of its status within academia itself, this issue would become much less acute. This does not imply that other disciplines could not be drawn upon, nor that professional education should cease, but rather that the criteria for valuing the contribution of LIS to academia more generally need to be reconsidered by those within the discipline in order to sustain and encourage the field's future development, both professionally and academically.

LIS academic research
Given this existing imbalance, we ask whether it is time to consider some kind of LIS research syllabus. This would not replace the existing professional curriculum; it would complement it. In the Nordic countries, for example, LIS PhD students are employed by their universities and their research work is considered a "professional" activity. More importantly, however, doctoral students are provided with a research education that connects them with their chosen discipline. With the clear aim of raising the quality of doctoral education, the universities have joined forces to form NorLIS (see www.norslis.net). This network of 15 Scandinavian and Baltic institutions offers (obligatory) PhD courses, which actively introduce the students (or junior researchers) to the discipline as researchers and academics themselves. Although maintaining and acknowledging the broad spectrum of possible approaches within LIS, and also the interdisciplinary nature of the discipline, they still provide a focal point and manage to lend some cohesion to the discipline and its community. Should we not attempt to instil a similar sense in LIS students and academics in the UK?
We accept that it is necessary to make academic research relevant to the students' various needs and that research is not for everyone. However, we feel the current emphasis on vocational teaching and degree options can also act to isolate those who are interested; that is, the LIS researchers of the future. Therefore these researchers will be coming, almost exclusively, from other disciplines, and since no further academic research education exists, these original disciplinary ties in many cases tend to continue to provide the frame of reference. In so doing, LIS gets caught in a circle of justification:
. there is no existing academic grounding in LIS;
. students bring their own disciplinary focus to postgraduate courses;
. PG courses emphasise the LIS professional over LIS research;
. students enter the profession, leaving a gap for new specialist researchers; and
. thus, there remains no taught academic grounding in LIS.


Currently the majority of doctoral students, in the UK at least, focus on vocational questions and areas of direct applicability, such as applications of digital technologies, information seeking behaviours, or information needs analyses. While these are worthwhile concerns and interesting research areas, they also tend to neglect issues of academic discourse, focusing instead on applied research. We must recognise the extent to which LIS tends to focus on the vocational; yet in order to avoid the circle of justification introduced above, there is a need for a new balance to be struck. If students are not being taught and shown the academic research under way by the academics who are teaching them, how can they be expected to become involved in any kind of cohesive LIS academic discourse? University LIS departments are at times heard to complain that there is a struggle to fill courses with students. Although common wisdom suggests offering more immediate practical skills to attract students, we feel that the student body is not this homogenous and that this approach, while understandable, might not always meet all of the needs and desires of many potential students. LIS degrees should, we suggest, include at least one well-developed and up-to-date compulsory module which demonstrates the richness of LIS research as a career, to engage students with current debates, and place the professional skills and discussions within their broader disciplinary context. Ideally, this would also include department staff sharing their cutting edge research, and challenging students to think of LIS academia in new ways. We argue it should be less about the basic origins of the field, and more about intellectually challenging and contemporary enquiry. For example, “fundamentals” or “introduction” modules would benefit from a greater involvement with what staff members are themselves researching, the field’s current intellectual debates, or the discipline’s key schools of thought. We feel this would also encourage a deeper sense of disciplinary cohesiveness which is lacking at the moment. Especially also since, as Sundin and Johannisson (2005, p. 40) poignantly argue: It is not the only objective of research to serve as a basis for the development of better professional tools; research also comprises a community that needs to develop its own tools. This is not an argument for placing researchers in an ivory tower, but an argument for balance between practical research and research devoted to the development of theory.

LON-LIS Research Group
One way we have attempted to redress this balance and to satisfy what we perceive to be a gap in LIS education, in the broadest sense, and community, is by creating an LIS forum for academic debate and discussion. Proactively, we set up a research seminar group in June 2006 for exactly this purpose. The focus of the seminars is academic discussion, specifically as opposed to any vocational emphasis. The "London LIS Research Seminars" attempt to connect research students and other university researchers at primarily London universities in the fields of Library and Information Science and Studies. Originally started by the authors in the Department of Information Science at City University, and two research students from SLAIS at UCL, the seminars are open to everyone. They are intended as informal arenas for peer review, debate, and discussion. The seminars are given, and attended by, a mixture of PhD students and researchers, as well as guest lecturers. We have

hosted seminars on a diverse range of topics including electronic publishing, Victorian information history, open access (through a discourse analysis), webcomics, and historical GIS, and had presenters not only from the UK, but also from Europe and the US[4]. We propose to use this forum for a round table debate on the issues raised in this paper. Specifically, we would like to encourage debate on the role of academic teaching and research in LIS, and how we might encourage existing and future LIS students to consider LIS research as a valid career in its own right, as well as LIS in its professional (i.e. vocational) capacity. We welcome papers and discussion on this topic; if you are interested in becoming involved, please contact one of us at the Department of Information Science, City University[5].
We hope this opinion piece encourages an engagement in the academic discourse of our discipline. We are all researching fascinating topics; we should ensure that the LIS students of tomorrow know what academic potential there is for them within the discipline, as well as what professional opportunities. Doing so can only serve to create a richer and more homogeneous, albeit hopefully not narrower, LIS discipline, which would be of benefit to us all.

Notes
1. With regard to LIS in different higher education faculties, see the LIS-EU European curriculum project on this point for Europe. See Kajberg and Lørring (2005).
2. We argue that this is a factor unique to LIS. To compare with a discipline such as medicine, which practises both academic and professional education, the emphasis is fundamentally different to that which we suggest here.
3. See, for example, Dillon et al. (2006): "In its academic plan, a school of information studies recently described itself as broadening its mandate, and thereby joining an elite group of North American faculties collectively known as the 'Information Schools' or i-schools. At the first formal Conference of the i-School Community, in September 2005, 19 institutions from the US and Canada were identified as i-conference schools. Of these, 15 are schools which include American Library Association-accredited Masters degree programs. Some who were not present would argue (and have done so) that they were i-schools long before the idea existed. The academic community in library and information science (LIS) has begun to wonder what the i-schools movement is all about, and how it evolved from a small occasional meeting of deans to an annual conference, and perhaps more".
4. More information can be found at the LON-LIS website. See http://lon-lis.soi.city.ac.uk/
5. E-mail Toni Weller at [email protected] or Jutta Haider at [email protected]

References
Bayless, S. (1977), "Librarianship is a discipline", Library Journal, 1 September, pp. 1715-7.
Berry, J. (1987), "What about the 'Library Discipline'?", Library Journal, 15 March, p. 4.
Black, A.R. (1983), "Information science research versus the practitioner", Nachrichten für Dokumentation, Vol. 34 No. 6, pp. 261-5.
Bourdieu, P. (1988), Homo Academicus (trans. Collier, P.), Polity Press, Cambridge.
Bourdieu, P. (1992), "Social space and the genesis of 'classes'", in Bourdieu, P. (Ed.), Language and Symbolic Power, Polity Press, Cambridge, pp. 229-51.
Bud-Frierman, L. (Ed.) (1994), Information Acumen: The Understanding and Use of Knowledge in Modern Business, Routledge, London.


Dillon, A., Bruce, H., Cloonan, M., Estabrook, L., Smith, L., King, J.L., Thomas, J. and Von Dran, R.F. (2006), "The i-School Movement", discussion at the ASIS&T Annual Meeting, Austin, TX, 3-9 November, available at: www.asis.org/Conferences/AM06/papers/131.html
Eaton, J. and Bawden, D. (1991), "What kind of resource is information?", International Journal of Information Management, Vol. 1 No. 2, pp. 156-65.
Foskett, D. (1973), "Information science as an emergent discipline – educational implications", Journal of Librarianship, Vol. 5 No. 3, pp. 161-74.
Haddow, G. and Klobbas, J.E. (2004), "Communication of research to practice in library and information science", Library and Information Science Research, Vol. 26 No. 1, pp. 29-43.
Haider, J. and Bawden, D. (2007), "Conceptions of 'information poverty' in LIS: a discourse analysis", Journal of Documentation, forthcoming.
Holtham, C. (2001), "Valuation has its price", Library Association Record, Vol. 103 No. 4, pp. 232-3.
Kajberg, L. and Lørring, L. (Eds) (2005), European Curriculum Reflections on Library and Information Science Education, The Royal School of Library and Information Science, Copenhagen, available at: http://biblis.db.dk/uhtbin/cgisirsi.exe/kT3CQx09Ii/DBI/77140014/523/462 (accessed 12 February 2007).
Layzell Ward, P. (2003), "Management and the management of information, knowledge-based and library services 2002", Library Management, Vol. 24 No. 3, pp. 126-59.
McNicol, S. (2004), "Is research an untapped resource in the library and information profession?", Journal of Librarianship and Information Science, Vol. 36 No. 3, pp. 119-26.
Oppenheim, C., Stenson, J. and Wilson, R. (2004), "Studies on information as an asset III: views of information professionals", Journal of Information Science, Vol. 30 No. 2, pp. 181-90.
Rowley, J. (1998), "Information policy pricing: factors and contexts", Information Services and Use, Vol. 18 No. 3, pp. 165-75.
Sundin, O. and Johannisson, J. (2005), "Pragmatism, neo-pragmatism and sociocultural theory. Communicative participation as a perspective in LIS", Journal of Documentation, Vol. 61 No. 1, pp. 23-43.

Corresponding author
Toni Weller can be contacted at: [email protected]
